multipart spam

Tom Anderson tanderso at oac-design.com
Sun Nov 14 10:35:39 CET 2004


On Sun, 2004-11-14 at 04:10, Chris Fortune wrote:
> The ham collected from a group of people is self-similar, like the vocabulary of a language.   A shared wordlist is filled with the
> shared "languages" of ham and spam.  The group wordlist is surprisingly similar to an individual's wordlist, except that there are
> many more tokens, and tokens that would be strong ham or spam indicators in a personal wordlist are "watered down".

The best thing to do with this class of emails, IMHO, as with most others,
is simply to train on them.  The hammy tokens will tend toward
neutrality rather than spam, while the spammy tokens will stand out.
Unless you receive very little ham to balance it out, there's virtually
zero chance of false positives from doing so.  Give yourself a decent
unsure range just to be safe, though.  If you don't see any effect, train
to exhaustion.
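
To illustrate the point about hammy tokens drifting toward neutral, here is a rough Python sketch. It assumes a simplified per-token estimate based only on per-class message rates; this is not bogofilter's actual scoring code, and the `WordList` class, the token names, and the cutoff values are all made up for the example.

```python
from collections import defaultdict


class WordList:
    """Toy wordlist: counts how many spam/ham messages contain each token."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])  # token -> [spam msgs, ham msgs]
        self.n_spam = 0
        self.n_ham = 0

    def train(self, tokens, is_spam):
        idx = 0 if is_spam else 1
        for tok in set(tokens):  # count each token once per message
            self.counts[tok][idx] += 1
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def spamicity(self, token):
        """Simplified estimate: 0.0 = pure ham, 0.5 = neutral, 1.0 = pure spam."""
        spam_hits, ham_hits = self.counts.get(token, (0, 0))
        spam_rate = spam_hits / self.n_spam if self.n_spam else 0.0
        ham_rate = ham_hits / self.n_ham if self.n_ham else 0.0
        if spam_rate + ham_rate == 0:
            return 0.5  # never-seen token carries no evidence
        return spam_rate / (spam_rate + ham_rate)


def classify(score, ham_cutoff=0.45, spam_cutoff=0.99):
    """Three-way verdict with an unsure range (cutoffs are illustrative only)."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


wl = WordList()

# Plenty of ham that uses the ordinary word "meeting".
for _ in range(200):
    wl.train(["meeting", "thanks"], is_spam=False)
before = wl.spamicity("meeting")

# Spam that pads itself with the same hammy word, plus a spammy one.
for _ in range(20):
    wl.train(["meeting", "pills"], is_spam=True)
after = wl.spamicity("meeting")
pills = wl.spamicity("pills")

print(before, after, pills)
```

In this toy model the hammy token "meeting" moves from a strong ham indicator to neutral, not to spammy, because the ham already in the wordlist balances it out, while "pills" appears only in spam and stands out. The `classify` helper shows how a wide gap between the two cutoffs leaves room for an "unsure" verdict on borderline scores.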

Tom






More information about the Bogofilter mailing list