[PATCH] combined wordlist a.k.a. single list
David Relson
relson at osagesoftware.com
Mon Jun 2 22:15:52 CEST 2003
At 03:12 PM 6/2/03, Malcolm Dew-Jones wrote:
>I will definitely play with that thanks. The production systems use
>bogofilter 0.7. The only issue we have with that is that base64 is not
>checked. The sample databases (based on the same messages) are about
>1/10th the size of the new version (.11.2), and I have just started to
>investigate why and what best to do about it.
Trevor,
One of the flaws in older versions of bogofilter is that MIME boundary
separators are put into the wordlists. Current bogofilter versions don't
do that. From the size of your wordlists, you've processed many, many
messages. I wonder how many of them have contributed boundary tokens.
Also, newer versions of bogofilter can include a date with each token,
with the date being updated each time the token's count is updated. A
combination of an old date and a low count, e.g. 1, might be useful for
discarding unwanted tokens. On the downside, dates add 4 bytes per token
and you'd have to update to a newer bogofilter. On the plus side, newer
versions are much smarter about parsing email.
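
To make the "old date and low count" idea concrete, here is a rough sketch in Python. It does not use bogofilter's own tools or database format; the in-memory dictionary, the TokenInfo type, the prune_tokens function, and the 90-day / count-below-2 thresholds are all assumptions chosen for illustration only.

```python
from datetime import date, timedelta
from typing import Dict, NamedTuple


class TokenInfo(NamedTuple):
    """Hypothetical per-token record: a count plus a last-update date."""
    count: int        # how many times the token has been counted
    last_seen: date   # date of the most recent count update


def prune_tokens(wordlist: Dict[str, TokenInfo],
                 today: date,
                 max_age_days: int = 90,
                 min_count: int = 2) -> Dict[str, TokenInfo]:
    """Return a copy of wordlist without tokens that are BOTH old and rare.

    A token is dropped only if its date is older than max_age_days and its
    count is below min_count. Both thresholds are illustrative guesses.
    """
    cutoff = today - timedelta(days=max_age_days)
    return {
        token: info
        for token, info in wordlist.items()
        if not (info.last_seen < cutoff and info.count < min_count)
    }


# Made-up sample data, including one stale boundary-like token.
sample = {
    "viagra": TokenInfo(120, date(2003, 5, 30)),
    "----=_nextpart_000_0012": TokenInfo(1, date(2002, 11, 4)),
    "meeting": TokenInfo(1, date(2003, 6, 1)),
    "invoice": TokenInfo(40, date(2002, 10, 1)),
}

kept = prune_tokens(sample, today=date(2003, 6, 2))
print(sorted(kept))  # ['invoice', 'meeting', 'viagra']
```

Requiring both conditions keeps tokens that are rare but recent (they may still accumulate counts) and tokens that are old but well established, so only the stale one-off entries are discarded.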
The decision is, of course, yours.
David