Radical lexers
David Relson
relson at osagesoftware.com
Wed Dec 10 17:44:42 CET 2003
On Wed, 10 Dec 2003 17:27:20 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:
> David Relson wrote:
>
> > The following should show all the single character tokens (with
> > approx scores):
> >
> > bogoutil -d wordlist.db | grep "^? " | awk '{print $1}' | bogoutil -p wordlist.db
> . I guess
>
> Actually I tried something like this once. I was totally
> fooled by those results. Those showed only the probabilities
> using default parameters.
>
> pi
Bogoutil isn't smart enough to do a full scoring as it can't read config
files.
See the message I just posted for a histogram. Use "-vvv" for actual
scores.
The following command is probably slightly better for you:
bogoutil -d wordlist.db | egrep -v "^. " | egrep "^... " | awk '{print $1}' | bogofilter -vv -F -PH -C -m0.00000001
It tosses the single character tokens and keeps the triples. The "-PH"
turns off header tagging and the "-C -m0.00000001" turns off config file
loading and provides a very small min_dev.
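To see what the two egrep stages do without touching a real wordlist, here is a small sketch of just the filtering part. It assumes the dump prints one token per line followed by space-separated counts and a date; the sample tokens and numbers are made up, and sample_dump and three_char_tokens are hypothetical helper names. It uses "grep -E", which should behave the same as egrep here.

```shell
# Hypothetical stand-in for "bogoutil -d wordlist.db".
# Assumed layout: token, spam count, good count, date (all values invented).
sample_dump() {
  printf '%s\n' \
    'a 3 1 20031210' \
    'the 10 42 20031210' \
    'spam 7 0 20031210' \
    'xyz 1 1 20031210'
}

# Same filters as the command above:
#   1. drop lines whose token is a single character,
#   2. keep lines whose first three characters are followed by a space,
#   3. print only the token column.
three_char_tokens() {
  grep -E -v '^. ' | grep -E '^... ' | awk '{print $1}'
}

sample_dump | three_char_tokens
```

With this sample the output is "the" and "xyz". The first filter is not redundant: "." also matches a space, so a line like "a 3 1 ..." would pass the "^... " test on its own.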
Enjoy.