Radical lexers
    David Relson 
    relson at osagesoftware.com
       
    Wed Dec 10 17:44:42 CET 2003
    
    
  
On Wed, 10 Dec 2003 17:27:20 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:
> David Relson wrote:
> 
> > The following should show all the single character tokens (with
> > approx scores):
> > 
> > bogoutil -d wordlist.db | grep "^? " | awk '{print $1}' | bogoutil
> > -p wordlist.db
>                                     . I guess
> 
> Actually I tried something like this once. I was totally
> fooled by those results. Those showed only the probabilities
> using default parameters.
> 
> pi

Bogoutil isn't smart enough to do full scoring, as it can't read config
files.

See the message I just posted for a histogram.  Use "-vvv" for actual
scores.

The following command is probably slightly better for you:

  bogoutil -d wordlist.db | egrep -v "^. " | egrep "^... " | awk '{print $1}' | bogofilter -vv -F -PH -C -m0.00000001

It tosses the single-character tokens and keeps the three-character
ones.  The "-PH" turns off header tagging, and the "-C -m0.00000001"
turns off config file loading and sets a very small min_dev.
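
To see what the two egrep stages do without touching a real wordlist, here is a small sketch that runs the same filter over made-up dump lines. The sample data and the helper names (sample_dump, three_char_tokens) are hypothetical, and the "token count count date" column layout is an assumption for illustration only. Note that the second pattern alone would let a line such as "a 1 2 20031210" through, because its fourth character is also a space; that is why the single-character tokens are removed first.

```shell
#!/bin/sh
# Hypothetical stand-in for "bogoutil -d wordlist.db" output.
# Assumed layout: token, two counts, a date, separated by single spaces.
sample_dump() {
    printf '%s\n' \
        'a 1 2 20031210' \
        'the 40 55 20031210' \
        'xyz 7 0 20031209' \
        'spam 90 1 20031210'
}

# Same filter as in the command above, minus the final bogofilter stage:
#   1. drop lines whose token is a single character
#   2. keep lines whose token is exactly three characters
#   3. print only the token column
three_char_tokens() {
    grep -E -v '^. ' | grep -E '^... ' | awk '{print $1}'
}

sample_dump | three_char_tokens
```

With the sample lines above this prints "the" and "xyz", one per line. On a real system the output of three_char_tokens would then be piped into bogofilter as shown earlier.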
Enjoy.
    
    