Using the -u option and database size

John G Walker johngwalker at tiscali.co.uk
Wed Mar 21 21:51:56 CET 2007



On Wed, 21 Mar 2007 14:30:20 -0500 Tom Anderson
<tanderso at oac-design.com> wrote:

> I think that wordlist size will tend to obey Beer's Law.  Sooner 
> or later you start to saturate the number of unique tokens you will
> ever see and growth therefore slows significantly.


This is very probably true. I've only been using bogofilter for about a
year, but what I've noticed is that I go days with no more than a
single spam getting through, then I'll suddenly get three or four,
as though spammers are trying new tokens to break through the Bayesian
filters. They quickly disappear as bogofilter learns about them.

It seems likely that there's a finite number of tokens. Mind you, it helps
that I have my spam cutoff set to 0.65 and my ham cutoff set to 0.1.
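
For anyone wondering what those two cutoffs do in practice, here is a
rough Python sketch of the three-way split they produce. It is not
bogofilter's own code: the function name classify() is made up for
illustration, the score is assumed to be a spamicity between 0 and 1,
and the handling of scores exactly on a cutoff is my assumption.

```python
# Illustrative only: mimics a three-way ham/unsure/spam decision.
# The cutoff values are the ones mentioned in this message.
SPAM_CUTOFF = 0.65
HAM_CUTOFF = 0.10


def classify(score, spam_cutoff=SPAM_CUTOFF, ham_cutoff=HAM_CUTOFF):
    """Map a spamicity score in [0, 1] to 'spam', 'ham' or 'unsure'.

    Hypothetical helper; inclusive boundaries are an assumption.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    # Anything between the two cutoffs is left for manual review.
    return "unsure"


if __name__ == "__main__":
    for s in (0.02, 0.40, 0.65, 0.99):
        print(s, classify(s))
```

With cutoffs this far apart, a fairly wide band of scores lands in
"unsure" rather than being forced into ham or spam.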

-- 
 All the best,
 John


