better tagging - results
David Relson
relson at osagesoftware.com
Sat Sep 13 22:31:33 CEST 2003
Michael,
Looking at the various tokens, the differences appear to be
capitalization and spacing, presumably indicators of different mailers.
Looking at how a message containing those tokens would be scored,
approximately half of them would be discarded by the default min_dev
(0.1). Of the remaining tokens, 11 are ham and 2 are spam.
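A minimal sketch of the min_dev cut described above. It assumes, as a simplification, that a token contributes to the score only when its spamicity deviates from 0.5 by at least min_dev, and that kept tokens above 0.5 lean spam while those below lean ham. The helper name partition_tokens and the token/spamicity values are hypothetical, not taken from the actual test run.

```python
MIN_DEV = 0.1      # default min_dev quoted above
EVEN_ODDS = 0.5    # neutral spamicity

def partition_tokens(spamicity, min_dev=MIN_DEV):
    """Hypothetical helper: split tokens into (discarded, ham, spam) lists.

    Assumption: a token is kept only if |p - 0.5| >= min_dev;
    kept tokens with p > 0.5 lean spam, the rest lean ham.
    """
    discarded, ham, spam = [], [], []
    for token, p in spamicity.items():
        if abs(p - EVEN_ODDS) < min_dev:
            discarded.append(token)   # too close to neutral to matter
        elif p > EVEN_ODDS:
            spam.append(token)
        else:
            ham.append(token)
    return discarded, ham, spam

# Hypothetical header tokens and spamicities (illustrative only)
example = {
    "X-Mailer:Foo": 0.52,
    "x-mailer:foo": 0.47,
    "X-Mailer:Bar": 0.15,
    "X-Mailer:Baz": 0.91,
}
discarded, ham, spam = partition_tokens(example)
useful = len(ham) + len(spam)
print(f"discarded={len(discarded)} ham={len(ham)} spam={len(spam)} "
      f"useful={useful / len(example):.0%}")
```

With these made-up values two of the four tokens fall inside the min_dev band, so the useful fraction comes out at 50%, the same proportion observed above.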
Those observations and details aside, a useful-token rate that high
(approximately 50%) justifies further testing.
Also worth noting: embedded spaces are not compatible with bogoutil's
-l (load) function, so I'll likely change them to underscores. I'm also
thinking of collapsing a run of consecutive spaces into a single one.
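A minimal sketch of that normalization, assuming each run of one or more embedded spaces should become a single underscore. The helper name normalize_token is hypothetical. The incompatibility with -l is presumably because the load format separates a token from its counts with whitespace; the "token count count" line below is an assumed layout used only to show why a space-free token stays in one field.

```python
import re

# One or more consecutive spaces (tabs and other whitespace are left alone)
_SPACE_RUN = re.compile(r" +")

def normalize_token(token: str) -> str:
    """Hypothetical helper: replace each run of embedded spaces with one underscore."""
    return _SPACE_RUN.sub("_", token)

token = normalize_token("Microsoft  Outlook Express")
print(token)                    # Microsoft_Outlook_Express
line = f"{token} 3 0"           # assumed "token count count" layout, made-up counts
print(line.split())             # ['Microsoft_Outlook_Express', '3', '0']
```

Collapsing the run before substituting means that variants differing only in the number of spaces map to the same token, which also removes one of the spacing differences noted above.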
David