Dealing with wordlist mails

Manvendra Bhangui mbhangui at yahoo.com
Wed Jan 28 14:25:20 CET 2004


So, can the ratio of unknown words to known words be taken as a sign
of spam? Is it possible to assign a higher robx if this ratio
exceeds some number, say 0.8?

I will make a small change to my bogofilter and start logging
the ratio unknown_words / (unknown_words + known_words) for
each and every mail that comes to my domain (sifycorp.com). Then
I will see whether the statistics prove anything.
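
As a minimal sketch of that ratio (in Python, not bogofilter's actual C
code): the tokenizer regex, the known_words set standing in for the
wordlist database, and the function names are all assumptions made for
illustration. The 0.8 cutoff is only the figure floated above, not a
tested value.

```python
import re

# Hypothetical cutoff taken from the question above; not a tuned value.
THRESHOLD = 0.8

def unknown_ratio(text, known_words):
    """Return unknown_words / (unknown_words + known_words) for one message.

    known_words is a plain set standing in for the wordlist database.
    The tokenizer is a crude assumption and does not match bogofilter's lexer.
    """
    tokens = re.findall(r"[a-z0-9'$-]+", text.lower())
    if not tokens:
        return 0.0  # empty message: nothing to measure
    unknown = sum(1 for tok in tokens if tok not in known_words)
    return unknown / len(tokens)

def looks_like_wordlist(text, known_words, threshold=THRESHOLD):
    """True when the unknown-word ratio strictly exceeds the threshold."""
    return unknown_ratio(text, known_words) > threshold
```

Logging unknown_ratio() for every incoming message, alongside the final
spam/ham verdict, would give the statistics needed to judge whether any
cutoff separates the two classes.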

Regards,
Manvendra

On Wed, 2004-01-28 at 18:27, David Fries wrote:
> On Wed, Jan 28, 2004 at 07:27:36AM -0500, David Relson wrote:
> > In practice, random words in spam messages have little effect.  If you
> > want more detail on how bogofilter classified a 0.500000 message, run it
> > with flags "-vv" and "-vvv".  The FAQ has info on the output generated
> > with those flag settings.
> > 
> > David
> 
> If they have such little effect then why are they the only e-mail that
> seems to get through these days and I'm wanting a filter that actually
> deals with this problem?  Are you saying these e-mails aren't getting
> through to you?  This is the most spam since I've started using
> bogofilter.  The other problem spam were mostly removing comments and
> invalid html tags and it was back to being detected, now I don't know
> what to do.
> 
> http://www.wired.com/news/infostructure/0,1377,61886,00.html?tw=wn_tophead_3
> Here is an article from Wired saying that some spam filters would
> treat what they called 'hash busters' as a sign of spam.  I don't know
> what counts as a hash buster, but maybe it just means seeing a lot of
> new words relative to the words that are in our dictionary.
> 
> My current database is over nine megs and it used to be around 1 meg
> before these darn e-mails started showing up.
-- 
Manvendra Bhangui <mbhangui at yahoo.com>




