[bogofilter] spamitarium [was: using block_on_subnets]

Fri Apr 30 17:02:53 CEST 2004

On Fri, 2004-04-30 at 09:49, Tom Anderson wrote:
> It took a little more time, but not much, and look at the reduced amount of
> processing that bogofilter has to do as a result!  After the first received
> line, none of the others followed the from/by chain.  My server
> (oac-design.com) received the mail from 213.210.179.114, which identified
> itself correctly as a DSL user in the Czech Republic.  The next line
> claimed to be Hotmail, which is not a DSL user in the Czech Republic, so
> we do not want to use that information to classify this message.
> Instead, our token will be
> "rcvd:untrusted" which, in my wordlist, was seen 1383 times in spams and 24
> times in hams, with a Fisher value of 0.621936.

By stripping the bogus Received headers, aren't you throwing away
potentially useful information which would help bogofilter classify the
email as spam?  I would think those headers contain valuable data about
the way that their spam software works.  Can your filter do ASN lookups
on only the valid Received headers, skipping the bogus ones for the
lookup but still leaving them in the email?
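
To make the question concrete, here is a rough Python sketch of what I
mean by marking headers instead of stripping them.  It is not how
spamitarium works; the function name annotate_received, the regexes,
and the sample host names are all made up for illustration.  It assumes
the newest Received header (the one added by your own server) is
trusted, and that an older header stays trusted only while its "by"
host matches the "from" host of the header above it.

```python
import re

# Simplistic patterns; real Received headers vary a lot more than this.
FROM_RE = re.compile(r"\bfrom\s+(\S+)", re.IGNORECASE)
BY_RE = re.compile(r"\bby\s+(\S+)", re.IGNORECASE)


def _host(pattern, header):
    """Return the lowercased host matched by pattern, or None."""
    match = pattern.search(header)
    return match.group(1).rstrip(";").lower() if match else None


def annotate_received(headers):
    """Return (header, trusted) pairs; headers are given newest first.

    Nothing is removed: untrusted headers are only flagged, so a later
    step can skip them for lookups and still pass them on unchanged.
    """
    annotated = []
    expected_by = None  # host the next (older) header should name in "by"
    trusted = True
    for index, header in enumerate(headers):
        by_host = _host(BY_RE, header)
        # Once the from/by chain breaks, every older header is untrusted.
        if index > 0 and trusted and (by_host is None or by_host != expected_by):
            trusted = False
        annotated.append((header, trusted))
        expected_by = _host(FROM_RE, header)
    return annotated


if __name__ == "__main__":
    sample = [
        # Hypothetical headers modelled on the case described above.
        "from dsl.example.cz ([213.210.179.114]) by oac-design.com with SMTP",
        "from mail.hotmail.example by relay.hotmail.example with HTTP",
    ]
    for header, ok in annotate_received(sample):
        print("trusted" if ok else "untrusted", "|", header)
```

With something like this, the ASN lookup would run only on the entries
flagged as trusted, while the flagged-untrusted headers stay in the
message for bogofilter to tokenize.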

I am going to try your filter because adding useful data like this can
only help bogofilter, but I wouldn't want to remove potentially useful
data from the spam email.

~Jason

PS.  Can your filter process multiple email messages in a single
file/pipe (mbox format)?  I would like to try processing my ham/spam
corpus through your filter to add these ASNs and then regenerate my
bogofilter wordlist.
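
If the filter only handles one message at a time, I could probably wrap
it myself.  A minimal sketch using Python's standard mailbox module is
below; filter_mbox and the transform callback are hypothetical names,
and the callback stands in for whatever the real filter does to a single
message.

```python
import mailbox


def filter_mbox(src_path, dst_path, transform):
    """Copy every message from one mbox file to another via transform.

    transform is any callable that takes a message and returns the
    (possibly modified) message to store.  Returns the message count.
    """
    src = mailbox.mbox(src_path, create=False)  # fail if the corpus is missing
    dst = mailbox.mbox(dst_path)                # created if it does not exist
    count = 0
    try:
        for message in src:
            dst.add(transform(message))
            count += 1
        dst.flush()  # write pending messages to disk
    finally:
        dst.close()
        src.close()
    return count


def add_marker(message):
    """Hypothetical stand-in for the real per-message filter."""
    message["X-Filtered"] = "yes"
    return message
```

Running filter_mbox("spam.mbox", "spam-filtered.mbox", add_marker) once
for the spam corpus and once for the ham corpus would give two new files
to rebuild the wordlist from; the file names here are only examples.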