Dealing with wordlist mails

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Wed Jan 28 13:28:53 CET 2004


Lars Clausen wrote:

> I saw on my run-through of bogofiltered mail today that a huge number of
> mails had a bunch of random (but not nonsense) words attached.  Many of
> these had bogosity of 0.50000, which is a bad sign, as some ham mails
> come over that.  

We had this discussion just a few days ago. I assume that
either your training or your parameters are not optimal.
See the FAQ for how to improve this.

> Thinking back to the origins of bogofilter, is it not that only ham
> mails are likely to contain words that are specific to you?  When
> spammers send out wordlist spams, they put in a lot of words that are
> not known at all, so I'm guessing they are marked as
> neither-ham-nor-spam, thus tilting the mail towards the middle. 
> Shouldn't unknown words be considered slightly spammish, as they have
> never appeared in your ham?  Not a lot, as you'd want your friends to be
> able to introduce new words to you, but slightly?  Or is that just one
> of those tweakings that give poorer results?

You can simply try this by changing robx (see the man page for
an explanation). Of course, raising it increases the risk of false
positives. Also check whether those "random words" are
significant (using bogofilter -vvv). This depends on min_dev.
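
To illustrate how robx and min_dev interact for unknown words, here is
a minimal Python sketch. It assumes Robinson's smoothing formula
f(w) = (s*x + n*p) / (s + n), with s = robs, x = robx, n the number of
times the word was seen in training, and p its raw spam probability.
The function names and the parameter values below are invented for the
example and are not bogofilter defaults; bogofilter's exact
significance test may differ in detail.

```python
def robinson_fw(p, n, robs, robx):
    """Smoothed word score f(w) = (s*x + n*p) / (s + n).

    p    -- raw spam probability of the word (ignored when n == 0)
    n    -- how often the word was seen in training
    robs -- strength given to the prior (assumed > 0)
    robx -- score assumed for a word never seen before
    """
    return (robs * robx + n * p) / (robs + n)


def is_significant(fw, min_dev):
    """Assumed rule: a word counts only if its score lies at least
    min_dev away from the neutral value 0.5."""
    return abs(fw - 0.5) >= min_dev


def unknown_word_counts(robs, robx, min_dev):
    """Would a never-seen word (n == 0) influence the result?"""
    fw = robinson_fw(p=0.0, n=0, robs=robs, robx=robx)
    return is_significant(fw, min_dev)
```

For an unknown word n is 0, so its score is simply robx. With
illustrative values robx = 0.5 and min_dev = 0.1 such words are ignored
entirely; with robx = 0.65 they would be counted as slightly spammish.
Whether they count at all is what the -vvv output shows.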

pi



