Spam Filter Headers [was: spam_header_name]

David Relson relson at osagesoftware.com
Fri May 23 16:44:25 CEST 2003


At 10:29 AM 5/23/03, Matt Garretson wrote:
>Boris 'pi' Piwinger wrote:
>>Sorry, I mixed something up. But ignoring lines is needed, I
>>think.
>
>I'm not sure if this addresses the same issue, but when training
>bogofilter, i pipe all messages through a grep -v which strips
>out a few custom headers that my procmail recipes might have
>added along the way, in addition to the Status header added by
>mutt, which is what i use to double-check bogofilter's results.
>
>I saw mention of an ignore.db in a prior release of bogofilter,
>but wasn't able to grok any functionality to use it, but maybe
>i didn't look closely enough.
>
>-matt

Hi Matt,

There _is_ some code in bogofilter to handle an ignore.db, but I don't 
think it's complete.  It may work, or it may not.

A while back there was some discussion about it.  At the time, the thinking 
was that an ignore list would make bogofilter faster.  The idea was that a 
token would be looked up in the ignore list, and if found, it would be 
discarded.  If not found, the token would be looked for in spamlist.db and 
goodlist.db.  This _could_ save database lookups, i.e. disk accesses.

Analysis indicated that the ignore list would probably be pretty small, say 
1,000 or even 10,000 tokens, while spamlist.db and goodlist.db are likely 
to hold 100,000+ tokens.  Looking at the numbers, most lookups in the 
ignore list would probably fail and the two wordlist lookups would still be 
needed.  Thus, on average, the number of database accesses would go 
up.  This is _not_ what we want.
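
To put rough numbers on that argument, here is a minimal sketch in Python.
It is not bogofilter code.  It assumes every token costs one ignore-list
lookup, plus the two wordlist lookups whenever the ignore list misses.  The
function name and the hit rates are made up for illustration.

```python
def expected_lookups(hit_rate, wordlists=2):
    """Average database lookups per token when an ignore list is checked first.

    hit_rate is the (assumed) fraction of tokens found in the ignore list.
    Every token costs one ignore-list lookup; a miss also costs one lookup
    per wordlist (spamlist.db and goodlist.db, hence the default of 2).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return 1 + (1 - hit_rate) * wordlists


BASELINE = 2  # lookups per token with no ignore list at all

# Illustrative hit rates only; these are not measurements.
for hit_rate in (0.01, 0.10, 0.50, 0.90):
    cost = expected_lookups(hit_rate)
    print(f"hit rate {hit_rate:4.0%}: {cost:.2f} lookups vs {BASELINE} without")
```

Under these assumptions the ignore list only breaks even when half of all
tokens are found in it.  Below that hit rate it adds accesses.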

Using ignore.db as a way to discard tokens generated by other spam
filters is a wholly different use.  I'd suggest it's something to
investigate once there's evidence of a problem.  Until then it qualifies
as work that isn't needed and may not be useful.
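
In the meantime, stripping such headers before training, along the lines of
the grep -v approach quoted above, can be done outside bogofilter.  A minimal
sketch in Python follows.  The header names are hypothetical placeholders,
and unlike a plain grep -v it also drops the folded continuation lines of a
removed header.

```python
# Hypothetical header names; substitute whatever your own filters add.
IGNORED_HEADERS = ("status", "x-spam-status", "x-my-filter")


def strip_headers(lines, ignored=IGNORED_HEADERS):
    """Yield a message's lines minus the ignored headers.

    Only the header block (everything before the first blank line) is
    filtered; body lines pass through untouched.
    """
    in_headers = True
    skipping = False
    for line in lines:
        if in_headers:
            if line.strip() == "":
                in_headers = False   # blank line ends the header block
                skipping = False
            elif line[0] in " \t":
                if skipping:
                    continue         # continuation of a dropped header
            else:
                name = line.split(":", 1)[0].strip().lower()
                skipping = name in ignored
                if skipping:
                    continue
        yield line
```

Feeding it the lines of one message and writing the result to the training
command's standard input gives the same effect as the grep pipeline.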

Of course, if you're really interested, go ahead and work with it to see 
what you learn.  I'm willing to provide advice, but not assistance.

David




