Clean the database from non-spam mails?

David Relson relson at osagesoftware.com
Tue Dec 2 23:04:11 CET 2003


On Tue, 02 Dec 2003 11:01:20 -0800
Bill Wohler <wohler at newt.com> wrote:

> Johannes Klug <derjoi at gmx.net> writes:
> 
> > Btw., why did you decide to use only one file?
> 
> The authors can probably cite more reasons, but two reasons I can
> think of include "smaller and faster." Note that words are often in
> both the spam and ham wordlists:
> 
> 1. The result is smaller since the word (and the timestamp) is only
>    mentioned once.
> 
> 2. The number of lookups is cut in half, since you only have to do one
>    lookup instead of two for a given word.


Yep.  "smaller and faster" is the answer.
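
The two points above can be illustrated with a rough sketch in Python. It is not bogofilter's actual on-disk format. The class names, the TokenRecord fields and the integer date stamp are hypothetical. The "lookups" counter stands in for database reads. The sketch contrasts two separate wordlists with one combined wordlist in which each word is stored once, with both counts and a single timestamp:

```python
from dataclasses import dataclass


@dataclass
class TokenRecord:
    """Hypothetical combined record: both counts share one key and one timestamp."""
    spam: int = 0
    ham: int = 0
    timestamp: int = 0  # assumed integer date stamp of the last update


class TwoListStore:
    """Two wordlists: a word seen in both spam and ham is stored twice."""

    def __init__(self):
        self.spam = {}  # word -> (count, timestamp)
        self.ham = {}   # word -> (count, timestamp)
        self.lookups = 0

    def add(self, word, is_spam, today):
        table = self.spam if is_spam else self.ham
        count, _ = table.get(word, (0, today))
        table[word] = (count + 1, today)

    def counts(self, word):
        # Two lookups per word: one in each wordlist.
        self.lookups += 2
        spam = self.spam.get(word, (0, 0))[0]
        ham = self.ham.get(word, (0, 0))[0]
        return spam, ham

    def stored_keys(self):
        return len(self.spam) + len(self.ham)


class CombinedStore:
    """One wordlist: each word appears once, holding both counts."""

    def __init__(self):
        self.words = {}  # word -> TokenRecord
        self.lookups = 0

    def add(self, word, is_spam, today):
        rec = self.words.setdefault(word, TokenRecord())
        if is_spam:
            rec.spam += 1
        else:
            rec.ham += 1
        rec.timestamp = today

    def counts(self, word):
        # One lookup fetches both the spam and the ham count.
        self.lookups += 1
        rec = self.words.get(word, TokenRecord())
        return rec.spam, rec.ham

    def stored_keys(self):
        return len(self.words)


# Made-up training data: "free" occurs in both spam and ham.
training = [("free", True), ("money", True), ("meeting", False), ("free", False)]
two, one = TwoListStore(), CombinedStore()
for word, is_spam in training:
    two.add(word, is_spam, 20031202)
    one.add(word, is_spam, 20031202)

for word in ["free", "money", "meeting"]:
    assert two.counts(word) == one.counts(word)

print(two.stored_keys(), one.stored_keys())  # 4 3
print(two.lookups, one.lookups)              # 6 3
```

Both layouts give the same counts. The combined one stores "free" (and its timestamp) once instead of twice, and it answers each query with one lookup instead of two.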
More information about the Bogofilter mailing list