Radical lexers

David Relson relson at osagesoftware.com
Wed Dec 10 17:44:42 CET 2003


On Wed, 10 Dec 2003 17:27:20 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:

> David Relson wrote:
> 
> > The following should show all the single character tokens (with
> > approx scores):
> > 
> > bogoutil -d wordlist.db | grep "^? " | awk '{print $1}' | bogoutil
> > -p wordlist.db
>                                     . I guess
> 
> Actually I tried something like this once. I was totally
> fooled by those results. Those showed only the probabilities
> using default parameters.
> 
> pi

Bogoutil isn't smart enough to do a full scoring because it can't read
config files.  See the message I just posted for a histogram.  Use
"-vvv" for actual scores.

The following command is probably slightly better for you:

bogoutil -d wordlist.db | egrep -v "^. " | egrep "^... " | \
  awk '{print $1}' | bogofilter -vv -F -PH -C -m0.00000001

It tosses the single character tokens and keeps only the three
character tokens.  The "-PH" turns off header tagging and the
"-C -m0.00000001" turns off config file loading and provides a very
small min_dev.
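
For anyone who wants to check the token filter on its own before piping
into bogofilter, here is a minimal sketch.  It assumes the dump lines
are whitespace separated with the token first, which is what the
awk '{print $1}' step above already relies on.  The function name
three_char_tokens and the sample lines are made up for illustration,
and a single awk test stands in for the two egrep stages.  Whether awk's
length() counts characters or bytes depends on the awk and the locale,
so treat non-ASCII tokens with care.

```shell
# Hypothetical helper (not part of bogofilter): read "token count ..."
# lines on stdin and print only the tokens exactly three characters long.
three_char_tokens() {
    awk 'length($1) == 3 { print $1 }'
}

# Made-up sample lines standing in for "bogoutil -d wordlist.db" output.
printf '%s\n' 'a 10 2' 'the 50 40' 'spam 9 0' 'xyz 1 3' | three_char_tokens
# prints:
# the
# xyz
```

If that looks right, the helper could take the place of the egrep and
awk stages, with the real dump on its input and bogofilter reading its
output.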

Enjoy.



