Dealing with wordlist mails

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Wed Jan 28 14:36:11 CET 2004


David Relson wrote:

>> Your work-rate on bogofilter is so high that I hesitate to
>> suggest that you should do the job, but surely
>> the random-words / min_dev / robx issue would benefit from
>> an entry in the FAQ?  Bogofilter is a success and is
>> attracting many users, who, like me, have only the
>> vaguest grasp of its underlying theory.  The spammer's
>> random words technique drives lots of us wild and I think
>> that there are many who would be willing to accept a slight
>> trade-off in accuracy in favour of more effective first-hit
>> filtration.
> 
> If you check the FAQ documents that bogofilter includes, I know you'll
> find info on min_dev and robx. 

Actually, this is in man bogofilter.

> I don't think the random-words subject is addressed :-<

That is correct; I am not sure how it could be addressed.

A word never seen before is not used in the calculation if
you use default parameters (robx=0.415, robs=0.01, min_dev=0.1).
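
To see why, here is a rough sketch of the arithmetic. It assumes the
Robinson smoothing f(w) = (robs*robx + n*p(w)) / (robs + n) and that a
token is dropped when |f(w) - 0.5| is below min_dev. The function names
are made up for illustration and are not bogofilter's own code; the
exact comparison (>= or >) is also an assumption.

```python
ROBX = 0.415    # score assumed for a word with no history
ROBS = 0.01     # weight given to that assumption
MIN_DEV = 0.1   # minimum distance from 0.5 for a word to count


def robinson_f(p, n, robx=ROBX, robs=ROBS):
    """Smoothed score: (robs*robx + n*p) / (robs + n)."""
    return (robs * robx + n * p) / (robs + n)


def is_used(f, min_dev=MIN_DEV):
    """True if the score is far enough from 0.5 to enter the calculation."""
    return abs(f - 0.5) >= min_dev


# A word never seen before has n = 0, so its score collapses to robx.
f_unseen = robinson_f(p=0.0, n=0)
print(f_unseen, is_used(f_unseen))
```

With n = 0 the score is just robx, and 0.415 is only 0.085 away from
0.5, which is less than min_dev, so the word is left out.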

Whether words seen only a few times are used depends on the
number of messages and on how often the words have been seen
on each side. Actually, an robs that small means that robx is
almost ignored for those words. That in turn means that they
can quickly become significant.
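
A small sketch of that effect, again assuming the Robinson smoothing
f(w) = (robs*robx + n*p(w)) / (robs + n). The word here is hypothetical:
seen n times, only in ham, so p(w) = 0. The value robs=1.0 is not a
suggested setting, it is only there for comparison.

```python
def robinson_f(p, n, robx=0.415, robs=0.01):
    """Smoothed score: (robs*robx + n*p) / (robs + n)."""
    return (robs * robx + n * p) / (robs + n)


# Hypothetical word seen n times, only in ham (p = 0).
for n in (1, 2, 5):
    small_robs = robinson_f(p=0.0, n=n)            # default robs=0.01
    large_robs = robinson_f(p=0.0, n=n, robs=1.0)  # comparison value only
    print(n, round(small_robs, 4), round(large_robs, 4))
```

With the small robs a single sighting already pushes the score almost
all the way to 0, while with the larger robs the score stays closer to
robx until more evidence comes in.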

If those random words happen to be words often used, they
are probably not important. If for some reason they happen
to be almost only seen in ham, that's bad luck, but
bogofilter will learn from your correction.

pi



