testing partial wordlists

Tom Anderson tanderso at oac-design.com
Sat Feb 5 23:22:33 CET 2005


On Sat, 2005-02-05 at 13:51, David Relson wrote:
> Last week I did some counts of the tokens in my wordlist.  Of the 1.5M
> tokens I have, approx 1/3 have timestamps more than 2 yrs old and
> another 1/3 are more than 1 yr old.  I'm giving thought to removing some
> (or all) of those oldies.

I'd imagine that removing only the tokens older than 13 or 14 months
would be best, since you probably receive holiday- and season-specific
spam and ham.  E.g., in winter, "santa" and "snow" might be big scorers,
while in summer, "beach" and "surf" might be the big ones, with little
overlap from one season to the next.  Deleting anything newer than about
13 months is probably shooting yourself in the foot with regard to these
types of tokens, since they would be gone before their season comes
around again.
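
To make the cutoff concrete, here is a rough sketch of the idea in
Python.  It is only an illustration: the wordlist is a plain dict of
token -> last-seen date that I made up for the example, not bogofilter's
actual wordlist format, and the function names and the sample dates are
hypothetical.

```python
from datetime import date


def months_between(older: date, newer: date) -> int:
    """Whole calendar months elapsed from `older` to `newer`."""
    months = (newer.year - older.year) * 12 + (newer.month - older.month)
    if newer.day < older.day:
        months -= 1  # the last month is not complete yet
    return months


def prune_old_tokens(last_seen, today, max_age_months=14):
    """Return a copy of `last_seen` without tokens older than the cutoff."""
    return {
        token: seen
        for token, seen in last_seen.items()
        if months_between(seen, today) <= max_age_months
    }


# Hypothetical sample data: pruning in mid-November, just before the
# holiday season starts again.
today = date(2004, 11, 15)
wordlist = {
    "santa": date(2004, 1, 3),   # last seen at the end of the previous winter
    "beach": date(2004, 8, 30),  # last seen this summer
    "stale": date(2002, 6, 1),   # not seen for more than two years
}

kept_14 = prune_old_tokens(wordlist, today, max_age_months=14)
kept_6 = prune_old_tokens(wordlist, today, max_age_months=6)

print(sorted(kept_14))
print(sorted(kept_6))
```

With the 14-month cutoff "santa" survives until the holidays return and
only "stale" is dropped; with a 6-month cutoff "santa" is thrown away a
few weeks before it becomes useful again.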

Tom




