Training bogofilter with Spamassassin collected spam.

Trevor Smith trevor at haligonian.com
Mon Aug 9 18:18:24 CEST 2004


On August 9, 2004 8:00 am, Tom Anderson wrote:

> Sorry, I misread your original message.  Not headers, the body, you're
> saying...  Yeah, that'll cause a problem.  Bogofilter doesn't look at
> attachment content.  You'll have to create (or find on the internet) a
> Spamassassin reverser script that will strip out the MIME stuff and just
> leave the original spam portion intact.

My hosting provider uses SpamAssassin (I think) and marks up almost every spam
that reaches me like this:

...
Spam detection software, running on the system "root.azhosting.biz", has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or block
similar future email.  If you have any questions, see
the administrator of that system for details.
...

I haven't bothered stripping the messages (because I have no idea how to
reconstruct the original). I just feed them straight into bogofilter.
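
For what it's worth, reconstructing the original may be simpler than it
sounds if the wrapper carries the untouched message as a message/rfc822
attachment. That layout is an assumption about how the provider's
SpamAssassin is configured (the notice only says the original "has been
attached"), and extract_original is a made-up helper name. A minimal Python
sketch, using only the standard library:

```python
import email
from email import policy


def extract_original(wrapped_bytes):
    """Return the first attached message/rfc822 part as bytes, or None.

    Assumes the wrapper is a MIME multipart message whose original
    email is attached with Content-Type: message/rfc822.
    """
    msg = email.message_from_bytes(wrapped_bytes, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part holds exactly one nested message.
            inner = part.get_payload(0)
            return inner.as_bytes()
    # No attached message found: not wrapped, or wrapped differently.
    return None
```

The returned bytes could then be fed to bogofilter in place of the wrapped
message. A None result means the message did not have the assumed layout
and is better inspected by hand.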

I suppose this will be a problem if my hosting provider ever misidentifies a
legitimate message as spam: SpamAssassin will add these lines to the body, and
bogofilter, having seen them in all my training spam, will say, hey, this looks
like spam, just because SpamAssassin said so. Right?

Hmmm... I wonder if there is any way to undo the "damage" of training with
3000+ spams that carry this SpamAssassin markup.
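
One possible approach, assuming the wrapped spams used for training are
still on disk and unchanged: bogofilter's -S option unregisters a message
previously registered as spam, and -s registers one as spam, so each wrapped
copy could be unregistered and its stripped original registered instead. The
retrain helper and its run parameter are hypothetical names for illustration,
and the stripped original has to come from a separate unwrapping step. A
rough Python sketch:

```python
import subprocess


def retrain(wrapped_bytes, original_bytes, run=subprocess.run):
    """Undo spam training on the wrapped copy, then train on the original.

    `run` defaults to subprocess.run and is a parameter only so the
    function can be exercised without bogofilter installed.
    """
    # -S: unregister a message that was earlier registered as spam.
    run(["bogofilter", "-S"], input=wrapped_bytes, check=True)
    # -s: register the stripped original as spam.
    run(["bogofilter", "-s"], input=original_bytes, check=True)
```

Unregistering only subtracts the right token counts if the bytes passed in
are the same message that was trained on in the first place, so this is
worth trying on a copy of the wordlist first.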

(For the record, the reason for running bogofilter on top of SpamAssassin is
that SpamAssassin as run by my hosting provider is nowhere near 100%
effective; it catches maybe 90% to 95%.)



