Dealing with wordlist mails

David Relson relson at osagesoftware.com
Wed Jan 28 14:11:46 CET 2004


On Wed, 28 Jan 2004 06:57:08 -0600
David Fries wrote:

> On Wed, Jan 28, 2004 at 07:27:36AM -0500, David Relson wrote:
> > In practice, random words in spam messages have little effect.  If
> > you want more detail on how bogofilter classified a 0.500000
> > message, run it with flags "-vv" and "-vvv".  The FAQ has info on
> > the output generated with those flag settings.
> > 
> > David
> 
> If they have such little effect, then why are they the only e-mail that
> seems to get through these days?  I want a filter that actually deals
> with this problem.  Are you saying these e-mails aren't getting through
> to you?  This is the most spam I've seen since I started using
> bogofilter.  With the earlier problem spam, stripping comments and
> invalid HTML tags was enough to get it detected again; now I don't know
> what to do.
> 
> http://www.wired.com/news/infostructure/0,1377,61886,00.html?tw=wn_tophead_3
> Here is an article from Wired saying that some spam filters would see
> what they called 'hash busters' as a sign of spam.  I don't know what
> counts as a hash buster, but maybe it is just seeing a lot of new words
> relative to the words that are in our dictionary.
> 
> My current database is over nine megs and it used to be around 1 meg
> before these darn e-mails started showing up.
> 
> -- 
> David Fries <dfries at mail.win.org>
> http://fries.net/~david/pgpkey.txt

Hi David,

They have little effect for _me_.  FWIW, my wordlist has been trained on
100,000+ messages, holds approx 1,000,000 tokens, and is about 50MB.  Of
course, I've been doing this a bit longer.
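One way to see why random words carry little weight is Gary Robinson's approach, which bogofilter's scoring is based on: each token gets a smoothed probability f(w) = (s*x + n*p(w)) / (s + n), tokens whose f(w) lies within min_dev of 0.5 are dropped, and the survivors are combined with Fisher's chi-square method.  A token the wordlist has never seen has n = 0, so f(w) = x, and with x close to 0.5 it is discarded.  The following is a simplified Python model, not bogofilter's code; the names ROBS, ROBX and MIN_DEV, their values, and the sample wordlist are illustrative assumptions.

```python
import math

# Illustrative parameters (assumed, not necessarily bogofilter's defaults):
ROBS = 0.0178   # weight given to the background value ROBX
ROBX = 0.52     # assumed spamminess of a token never seen in training
MIN_DEV = 0.1   # tokens with |f(w) - 0.5| < MIN_DEV are ignored


def token_prob(spam_count, good_count, n_spam_msgs, n_good_msgs):
    """Robinson's f(w): the raw p(w) pulled toward ROBX when the token is rare."""
    n = spam_count + good_count
    if n == 0:
        return ROBX  # unseen token: f(w) is exactly ROBX
    b = spam_count / n_spam_msgs   # fraction of spam messages containing w
    g = good_count / n_good_msgs   # fraction of good messages containing w
    p = b / (b + g)
    return (ROBS * ROBX + n * p) / (ROBS + n)


def chi2q(x2, df):
    """Upper-tail probability of a chi-square statistic x2 with even df."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(token_probs):
    """Robinson's combined indicator I = (1 + H - S) / 2, in [0, 1]."""
    kept = [f for f in token_probs if abs(f - 0.5) >= MIN_DEV]
    if not kept:
        return 0.5  # nothing informative survives (a choice made for this sketch)
    n = len(kept)
    h = chi2q(-2.0 * sum(math.log(f) for f in kept), 2 * n)
    s = chi2q(-2.0 * sum(math.log(1.0 - f) for f in kept), 2 * n)
    return (1.0 + h - s) / 2.0


# Hypothetical wordlist: token -> (spam_count, good_count),
# trained on 100 spam and 100 good messages.
wordlist = {"viagra": (40, 0), "cheap": (30, 3), "meeting": (1, 25)}
known = [token_prob(*wordlist[t], 100, 100) for t in ("viagra", "cheap")]
random_words = [token_prob(0, 0, 100, 100) for _ in range(50)]

# The two scores are identical: every unseen token is filtered out by MIN_DEV.
print(fisher_score(known), fisher_score(known + random_words))
```

In this model the random words only matter once training has given them counts that push f(w) away from 0.5, which is one reason the way a wordlist is trained matters.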

Question:  How do you train bogofilter?  What are the numbers for your
wordlist?

David
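Fries's guess that a filter might flag "a lot of new words relative to the words that are in our dictionary" could be sketched as an unknown-token ratio.  This is a hypothetical heuristic, not a bogofilter feature described in this thread; the tokenizer regex, the function names and the 0.5 threshold are all assumptions for illustration.

```python
import re

# Hypothetical tokenizer: a rough stand-in for a real spam-filter lexer.
# Matches runs of 3+ characters that start with a letter.
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z'$-]{2,}")


def unknown_ratio(text, wordlist):
    """Fraction of distinct tokens in text that are absent from wordlist."""
    tokens = {t.lower() for t in TOKEN_RE.findall(text)}
    if not tokens:
        return 0.0
    unknown = sum(1 for t in tokens if t not in wordlist)
    return unknown / len(tokens)


def looks_hash_busted(text, wordlist, threshold=0.5):
    """Flag text whose share of never-seen tokens exceeds an arbitrary threshold."""
    return unknown_ratio(text, wordlist) > threshold


# Usage with a tiny hypothetical wordlist.
words = {"please", "review", "the", "agenda", "before", "meeting", "tomorrow",
         "cheap", "meds"}
print(unknown_ratio("Please review the agenda before the meeting tomorrow", words))
print(looks_hash_busted("Cheap meds zqxvbn plorkety wimblestan fradgemoor", words))
```

Counting distinct tokens rather than occurrences keeps a single repeated nonsense word from dominating the ratio; a real filter would also have to cope with legitimate mail that uses new vocabulary.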
