speaking of random words

Bob George mailings02 at ttlexceeded.com
Wed Mar 17 20:14:23 CET 2004


Tom Anderson wrote:

>[...]
>It seems as though the only defense is to register the email many times
>such that the very hammy words become more neutral and the somewhat
>spammy ones become very spammy.  This is a strong case for exhaustive
>training.  But of course it comes with the risk that some tokens will
>become spammy enough to push hams in the wrong direction.  So far that
>hasn't happened, but I'll keep my eyes peeled.

Tom,

Are you simply trying to analyze how to get bogofilter to recognize such 
spam, or are you trying to find an effective way to keep this stuff out 
of your inbox?

If the latter, there are other tools that can supplement bogofilter,
and the combination can be highly effective. bogofilter catches the
great majority of the spam I get, but some still slips past.

I've found that many of the "image" spams (an embedded graphic with
varying amounts of "bayes avoidance" text) are already registered in
pyzor or razor (or both). Others are listed in many of the blacklists.
These are quick network checks that you could run as a second level
without a significant performance hit. They might be enough to catch
the few messages that slip past bogofilter and flag them as
train-worthy.
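
To make the idea concrete, here is a rough sketch of that two-level
arrangement. Everything in it is an assumption for illustration: the
function name two_level_check and the Checker type are made up, and
each checker is just a callable returning True for spam. In practice
the callables would wrap bogofilter, pyzor, razor or a blacklist
lookup, which I have not shown here.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical type: a checker takes the raw message and returns True
# if it considers the message spam.
Checker = Callable[[bytes], bool]


def two_level_check(
    message: bytes,
    primary: Checker,
    secondary: Iterable[Checker],
) -> Tuple[bool, bool]:
    """Return (is_spam, train_worthy) for one message.

    primary stands in for the statistical filter (e.g. bogofilter);
    secondary stands in for the network checks (e.g. pyzor, razor,
    blacklists). The network checks only run when the primary filter
    did not flag the message, so most mail never pays their cost.
    """
    if primary(message):
        # The primary filter already caught it; nothing new to learn.
        return True, False
    for check in secondary:
        if check(message):
            # The primary filter missed it but a network check caught
            # it, so it is a candidate for training.
            return True, True
    return False, False
```

The second value in the result is the "train-worthy" flag: it is only
set for messages that got past the primary filter, which keeps the
training set limited to the misses rather than every spam seen.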

I'm no expert, but I suspect that overtraining bogofilter to
recognize specific "bayes poison" examples might cause more problems
than it prevents.

- Bob