Dealing with wordlist mails

Manvendra Bhangui mbhangui at yahoo.com
Wed Jan 28 13:35:02 CET 2004


I am facing the same problem. I wonder whether it would be possible to
have two new parameters in bogofilter.cf, for example:

unknown_words_count=30
unknown_words_robx=0.7

which would cause bogofilter to use 0.7 as robx, instead of the
default robx, when the number of unknown words in a mail exceeds 30.
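
A rough sketch of the idea, in Python rather than bogofilter's C, just to
make the proposal concrete. The two parameter names are the ones proposed
above and are not existing bogofilter options; the function names, the
default robx and the robs value are made up for illustration. The token
score follows Robinson's f(w) = (s*x + n*p) / (s + n), simplified here to
raw counts (which assumes ham and spam corpora of similar size), so a word
that was never seen scores exactly robx.

```python
# Illustrative values only; not taken from any bogofilter release.
DEFAULT_ROBX = 0.5   # hypothetical default score for an unknown word
ROBS = 0.1           # hypothetical weight given to robx


def choose_robx(unknown_words, default_robx=DEFAULT_ROBX,
                unknown_words_count=30, unknown_words_robx=0.7):
    """Pick the robx for one message (hypothetical helper).

    If the message has more than unknown_words_count unknown words,
    use unknown_words_robx; otherwise keep the default robx.
    """
    if unknown_words > unknown_words_count:
        return unknown_words_robx
    return default_robx


def token_score(spam_count, ham_count, robx, robs=ROBS):
    """Robinson-style score f(w) = (s*x + n*p) / (s + n), on raw counts."""
    n = spam_count + ham_count
    if n == 0:
        # Unknown word: the score is exactly robx.
        return robx
    p = spam_count / n
    return (robs * robx + n * p) / (robs + n)


if __name__ == "__main__":
    for unknown in (5, 45):
        robx = choose_robx(unknown)
        print(unknown, "unknown words -> robx", robx,
              "-> unknown-word score", token_score(0, 0, robx))
```

With these made-up numbers, a mail padded with 45 unknown words would have
each of those words scored at 0.7 instead of the default, while a mail
with only 5 unknown words would be scored as before.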

I am attaching two samples of these spam mails as a zip file. Looking
at the mails, I am convinced that this technique (adding new random words)
is being adopted to slip through Bayesian filters.


On Wed, 2004-01-28 at 17:43, Lars Clausen wrote:
> I saw on my run-through of bogofiltered mail today that a huge number of
> mails had a bunch of random (but not nonsense) words attached.  Many of
> these had a bogosity of 0.50000, which is a bad sign, as some ham mails
> score higher than that.  
> 
> Thinking back to the origins of bogofilter, isn't it the case that only
> ham mails are likely to contain words that are specific to you?  When
> spammers send out wordlist spams, they put in a lot of words that are
> not known at all, so I'm guessing they are marked as
> neither-ham-nor-spam, thus tilting the mail towards the middle. 
> Shouldn't unknown words be considered slightly spammish, as they have
> never appeared in your ham?  Not a lot, as you'd want your friends to be
> able to introduce new words to you, but slightly?  Or is that just one
> of those tweaks that give poorer results?
> 
> -Lars
> 
> P.S. Has anyone tried to make Gnus sort mails by bogosity?
-- 
Manvendra Bhangui <mbhangui at yahoo.com>




