Bogofilter accuracy plummets starting around March 10, 2010

David Relson relson at osagesoftware.com
Tue Apr 13 05:21:39 CEST 2010


On Tue, 13 Apr 2010 04:23:47 +0400
Dmitry wrote:

> David Relson wrote:
>  > "Invisible headers" is not a term I recognize.
> 
> Headers like To, Cc, From, Subject, and Date are visible in almost
> all MUAs. Everything else is usually invisible. When you allow
> bogofilter to process invisible headers, you pollute the database
> with random data and make spam detection inaccurate. A real spammer has
> two ways to break a spam filter: 1) make the headers look the way known
> mail user agents make them; or 2) put random headers in each message.
> So, in either case, it does more harm than good from the viewpoint of
> spam detection.
> 
> 1) You allow random data
> 
> > pipe your message through an appropriate
> > "egrep -v '^(this|or|that):'" command.
> 
> Send millions of messages through a pipe with egrep? No, thanks. It is
> an unnecessary load on the server.

Dmitry,

Bogofilter recognizes a number of headers (to, from, return-path, etc)
and creates special symbols.  This gives them special significance.

Other header tokens don't get the same treatment, so they are treated
much like tokens in the body.

I don't perceive that bogofilter has a problem.  It does well for the
thousands of spam messages arriving daily at my domain.

Adding millions of header tokens is no different from adding millions
of body tokens, as far as I can tell. 

If you want bogofilter to provide special handling for other special
tokens, the source code is available and you can customize to your
heart's content.
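
For anyone who would rather drop headers before bogofilter sees them than
patch the source, the egrep suggestion quoted above could be sketched in
Python roughly as follows. This is only an illustration and not part of
bogofilter: the names VISIBLE_HEADERS and strip_hidden_headers are
hypothetical, and the allow-list simply mirrors the "visible" headers
mentioned in the quoted text. Unlike a line-based egrep, it also removes
the continuation lines of folded headers and leaves the body untouched.

```python
from email import message_from_bytes
from email.policy import compat32

# Hypothetical allow-list: the headers called "visible" in the quoted text.
VISIBLE_HEADERS = frozenset({"to", "cc", "from", "subject", "date"})


def strip_hidden_headers(raw: bytes, keep=VISIBLE_HEADERS) -> bytes:
    """Return the message with every header not named in `keep` removed."""
    msg = message_from_bytes(raw, policy=compat32)
    # Collect the names first so we do not modify the header list
    # while iterating over it.
    for name in set(msg.keys()):
        if name.lower() not in keep:
            # Deleting by name removes all occurrences, case-insensitively.
            del msg[name]
    return msg.as_bytes()
```

To use it as a filter, read the message from sys.stdin.buffer, pass it
through strip_hidden_headers(), and write the result to sys.stdout.buffer
before piping it on to bogofilter. It does not address the load concern
raised above, since it is still one extra process per message.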

Regards,

David


