Spam Filter Headers [was: spam_header_name]
David Relson
relson at osagesoftware.com
Fri May 23 16:44:25 CEST 2003
At 10:29 AM 5/23/03, Matt Garretson wrote:
>Boris 'pi' Piwinger wrote:
>>Sorry, I mixed something up. But ignoring lines is needed, I
>>think.
>
>I'm not sure if this addresses the same issue, but when training
>bogofilter, i pipe all messages through a grep -v which strips
>out a few custom headers that my procmail recipes might have
>added along the way, in addition to the Status header added by
>mutt, which is what i use to double-check bogofilter's results.
>
>I saw mention of an ignore.db in a prior release of bogofilter,
>but wasn't able to grok any functionality to use it, but maybe
>i didn't look closely enough.
>
>-matt
Hi Matt,
There _is_ some code in bogofilter to handle an ignore.db, but I don't
think it's complete. It may work, or it may not.
A while back there was some discussion about it. At the time, the thinking
was that an ignore list would make bogofilter faster. The idea was that a
token would be looked up in the ignore list, and if found, it would be
discarded. If not found, the token would be looked up in spamlist.db and
goodlist.db. This _could_ save database lookups, i.e. disk accesses.
Analysis indicated that the ignore list would probably be pretty small, say
1,000 or even 10,000 tokens, while spamlist.db and goodlist.db are likely
to hold 100,000+ tokens. Looking at the numbers, most lookups in the
ignore list would probably fail and the two wordlist lookups would still be
needed. Thus, on average, the number of database accesses would go
up. This is _not_ what we want.
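To put rough numbers on that argument, here is a minimal sketch in Python. It assumes every token costs one ignore-list lookup, plus two wordlist lookups (spamlist.db and goodlist.db) whenever the ignore list misses. The function name and the hit rates are made up for illustration; they are not measurements from bogofilter.

```python
def expected_lookups(hit_rate: float, wordlists: int = 2) -> float:
    """Average database lookups per token when an ignore list is checked first.

    hit_rate is the (assumed) fraction of tokens found in the ignore list.
    Every token costs one ignore-list lookup; tokens that miss also cost
    one lookup in each wordlist.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return 1.0 + (1.0 - hit_rate) * wordlists


# Without an ignore list, every token costs two wordlist lookups.
BASELINE = 2.0

if __name__ == "__main__":
    for rate in (0.05, 0.25, 0.50, 0.75):  # illustrative hit rates only
        cost = expected_lookups(rate)
        verdict = "fewer" if cost < BASELINE else "no fewer"
        print(f"hit rate {rate:.2f}: {cost:.2f} lookups per token ({verdict} than baseline)")
```

Under these assumptions the ignore list only pays off when more than half of all tokens are found in it, which is unlikely for a list that is small compared to the wordlists.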
Using ignore.db to discard tokens generated by other spam filters is a
wholly different use. I'd suggest it's something to investigate once
there's evidence of a problem. Until then it qualifies as work that isn't
needed and may not be useful.
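In the meantime, the kind of header stripping described in the quoted message can be done outside bogofilter. What follows is only a sketch of that idea in Python, not anything bogofilter provides. "Status" is the mutt header mentioned above; "X-Example-Filter" is a hypothetical stand-in for whatever headers your own procmail recipes add.

```python
import email

# "Status" comes from the quoted message; "X-Example-Filter" is a made-up
# placeholder for locally added headers. Adjust the names to your setup.
STRIP_HEADERS = ("Status", "X-Example-Filter")


def strip_headers(raw_message: str, names=STRIP_HEADERS) -> str:
    """Return the message text with the named headers removed.

    Only real header fields are touched; body lines that merely look
    like headers are left alone.
    """
    msg = email.message_from_string(raw_message)
    for name in names:
        # Deleting removes every occurrence and is a no-op if absent.
        del msg[name]
    return msg.as_string()
```

The cleaned text would then be piped to bogofilter for training in place of the grep -v step.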
Of course, if you're really interested, go ahead and work with it to see
what you learn. I'm willing to provide advice, but not assistance.
David