better tagging - results

David Relson relson at osagesoftware.com
Sat Sep 13 22:31:33 CEST 2003


Michael,

Looking at the various tokens, the differences appear to be
capitalization and spacing, presumably indicators of different mailers.
Looking at how a message with those tokens in it would be scored,
approximately half would be discarded by the default min_dev (which is
0.1).  Of the remaining tokens, 11 are ham and 2 are spam.
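
For anyone following along, here is a rough sketch of the min_dev
filtering described above. It is not bogofilter's actual code: the
function names, the token names and the scores are all made up for
illustration, and the exact boundary handling (>= versus >) is an
assumption. The idea is only that a token whose score lies within
min_dev of the neutral value 0.5 is ignored.

```python
MIN_DEV = 0.1  # default min_dev mentioned in the message


def useful_tokens(scores, min_dev=MIN_DEV):
    """Keep tokens whose score deviates from 0.5 by at least min_dev.

    `scores` maps token -> spamicity in [0, 1]. Hypothetical helper,
    not part of bogofilter.
    """
    return {tok: p for tok, p in scores.items() if abs(p - 0.5) >= min_dev}


def count_ham_spam(scores):
    """Count tokens leaning ham (< 0.5) and spam (> 0.5)."""
    ham = sum(1 for p in scores.values() if p < 0.5)
    spam = sum(1 for p in scores.values() if p > 0.5)
    return ham, spam


# Made-up scores, purely for illustration.
example = {"tok_a": 0.05, "tok_b": 0.45, "tok_c": 0.55, "tok_d": 0.92}
kept = useful_tokens(example)
print(sorted(kept))          # tok_b and tok_c are too close to 0.5
print(count_ham_spam(kept))
```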

Those observations and details aside, so high a percentage of useful
tokens (approximately 50%) justifies further testing.

Also worth noting is that embedded spaces are not compatible with
bogoutil's -l (load) function.  Likely I'll change them to underscores.
I'm also thinking of collapsing a series of them into a single one.
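
A minimal sketch of that change, under two assumptions of mine: that
"a series of them" means a run of consecutive spaces, and that each
run becomes a single underscore. The function name normalize_token and
the sample token are hypothetical. The last lines show why an embedded
space would be a problem for any loader that splits a text line into
whitespace-separated fields, which is also an assumption about the
load format rather than a description of bogoutil's code.

```python
import re


def normalize_token(token):
    """Replace each run of one or more spaces with a single underscore."""
    return re.sub(r" +", "_", token)


# Hypothetical token with two embedded spaces in a row.
raw = "x-mailer:some  mailer"
print(normalize_token(raw))  # x-mailer:some_mailer

# A line of "token count count" splits into too many fields if the
# token itself contains whitespace (assumed whitespace-separated format).
print(len(f"{raw} 3 5".split()))                   # 4 fields
print(len(f"{normalize_token(raw)} 3 5".split()))  # 3 fields
```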

David



