[PATCH] combined wordlist a.k.a. single list

David Relson relson at osagesoftware.com
Mon Jun 2 22:15:52 CEST 2003


At 03:12 PM 6/2/03, Malcolm Dew-Jones wrote:

>I will definitely play with that, thanks.  The production systems use
>bogofilter 0.7.  The only issue we have with that is that base64 is not
>checked.  The sample databases (based on the same messages) are about
>1/10th the size of the new version (0.11.2), and I have just started to
>investigate why and what best to do about it.

Trevor,

One of the flaws in older versions of bogofilter is that MIME boundary 
separators are put into the wordlists.  Current bogofilter versions don't 
do that.  From the size of your wordlists, you've processed many, many 
messages.  I wonder how many of them have contributed boundary tokens.
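
To get a rough feel for how many boundary strings a set of messages 
declares, something like the following could be run over a sample of the 
corpus.  This is only a sketch using Python's standard email package, not 
anything bogofilter itself provides; collect_boundaries is a made-up helper 
name, and how an old bogofilter actually split a boundary line into tokens 
is not modelled here.

```python
from email import message_from_string


def collect_boundaries(raw_messages):
    """Return the set of MIME boundary strings declared by the messages.

    raw_messages is any iterable of complete message texts (hypothetical
    input; how the messages are read from disk is left to the caller).
    """
    boundaries = set()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # walk() visits the message itself and every nested MIME part.
        for part in msg.walk():
            boundary = part.get_boundary()  # None when no boundary is set
            if boundary is not None:
                boundaries.add(boundary)
    return boundaries
```

Since boundary strings are normally unique per message, the size of the 
returned set gives an indication of how much one-off material the old 
parser may have been fed.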

Also, newer versions of bogofilter can store a date with each token, 
with the date being updated each time the token's count is updated.  A 
combination of an old date and a low count, e.g. 1, might be useful for 
discarding unwanted tokens.  On the down side, dates add 4 bytes per token 
and you'd have to update to a newer bogofilter.  On the plus side, the 
newer bogofilter is much smarter about parsing email.
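
The old-date-plus-low-count rule could look roughly like this.  It is a 
sketch only: the wordlist is assumed to be a plain in-memory dict mapping 
each token to a (count, last_seen) pair, which is not bogofilter's on-disk 
format, and prune_tokens, cutoff and max_count are illustrative names 
rather than bogofilter options.

```python
from datetime import date


def prune_tokens(wordlist, cutoff, max_count=1):
    """Drop tokens that are both stale and rare; keep everything else.

    wordlist:  dict of token -> (count, last_seen), last_seen a date
               (assumed layout, for illustration only)
    cutoff:    tokens last updated before this date count as old
    max_count: tokens with a count at or below this count as rare
    """
    return {
        token: (count, last_seen)
        for token, (count, last_seen) in wordlist.items()
        # A token is discarded only when BOTH conditions hold.
        if not (last_seen < cutoff and count <= max_count)
    }
```

Requiring both conditions means a frequently seen token survives even if 
it has not shown up recently, and a token seen only once survives if it is 
new.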

The decision is, of course, yours.

David




