Tipping point?

Geoff capsthorne at yahoo.co.uk
Fri Nov 14 19:22:24 CET 2003


Hi,

I am an enthusiastic but non-statistically-savvy user. 
Apologies in advance, therefore, if my question is
laughable.  Here goes:

Does a point come at which a word (or a small number of
words) has such a high spam count that its presence
will result in an email being categorised as spam
notwithstanding the presence of a few previously
unencountered words?

After several weeks of running bogofilter I am deeply
impressed, but I am puzzled by the fact that the mails which
are most effective at getting through to my inbox are
typically not html-bloated monstrosities, but short
and simple ones, advertising the usual
pharmaceuticals accompanied by a handful of rare,
previously unencountered, words. Running bogoutil -w on the
pharmaceutical names typically gives a spam count approaching 2000 (I
have artificially inflated this by putting some examples
through bogofilter more than once).

The offending mails are generally categorised as
spam after one pass through bogofilter, but the next
variation will still get through by the inclusion of only
a few new words. Is there any end to this if the spam counts
on the pharmaceuticals reach some level (and if so, what level)?
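
To make the question concrete, here is a rough Python sketch of the
kind of scoring I have in mind. It assumes a Robinson-style smoothed
word probability and a Fisher chi-square combination, which is only my
understanding of how this family of filters works and may well not
match what bogofilter actually does. The function names (word_prob,
chi2q, spamicity), the robs and robx values, and the word counts are
all made up for illustration.

```python
import math

def word_prob(spam_count, ham_count, robs=1.0, robx=0.5):
    """Smoothed spam probability for one word (assumed Robinson form).

    robx is the value assumed for a never-seen word and robs is how
    strongly that assumption is weighted; both are illustrative values,
    not necessarily bogofilter's defaults.
    """
    n = spam_count + ham_count
    p = spam_count / n if n else 0.0
    return (robs * robx + n * p) / (robs + n)

def chi2q(x2, dof):
    """Chi-square survival function; valid for even dof only."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def spamicity(probs):
    """Combine word probabilities Fisher-style; near 1.0 means spam."""
    n = len(probs)
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + s - h) / 2.0

# Hypothetical message: three heavily trained drug names ...
pharma = word_prob(spam_count=2000, ham_count=0)
# ... plus ten words the filter has never seen.
unknown = word_prob(spam_count=0, ham_count=0)

undiluted = spamicity([pharma] * 3)
diluted = spamicity([pharma] * 3 + [unknown] * 10)
print(undiluted, diluted)
```

If this is roughly right, my question is how far the second number
can be dragged down by the unknown words, and whether raising the
spam counts any further makes a difference.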

Thanks,

Geoff



