Dealing with wordlist mails
Manvendra Bhangui
mbhangui at yahoo.com
Wed Jan 28 14:25:20 CET 2004
So, can the ratio of unknown words to known words be taken as a sign
of spam? Is it possible to assign a higher robx if this ratio
exceeds some threshold, say 0.8?
I will make a small change to my bogofilter and start logging
the ratio unknown_words / (unknown_words + known_words) for
every mail that comes to my domain (sifycorp.com). Then
I will see whether the statistics prove anything.
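To make the measurement concrete, here is a rough sketch in Python of the
ratio I have in mind. It is not bogofilter code: the function name
unknown_ratio, the simple regex tokenizer and the in-memory set of known
words are all assumptions for illustration, and bogofilter's own lexer
and wordlist lookup would take their place in the real change.

```python
import re

def unknown_ratio(text, known_words):
    """Return unknown_words / (unknown_words + known_words) for one message.

    known_words is a hypothetical stand-in for the wordlist: any set of
    lowercase tokens. The tokenizer is a deliberately simple assumption.
    """
    tokens = re.findall(r"[a-z0-9$'-]+", text.lower())
    if not tokens:
        # No tokens at all: report 0.0 rather than dividing by zero.
        return 0.0
    unknown = sum(1 for t in tokens if t not in known_words)
    return unknown / len(tokens)

# Toy wordlist and message, both invented for this example.
known = {"meeting", "tomorrow", "the", "at", "noon"}
ratio = unknown_ratio("Meeting tomorrow at noon xqzt blorf", known)
print(f"unknown ratio: {ratio:.3f}")
```

Logging this value per message, alongside the final spam/ham verdict,
should be enough to see whether a cutoff such as 0.8 separates anything.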
Regards Manvendra
On Wed, 2004-01-28 at 18:27, David Fries wrote:
> On Wed, Jan 28, 2004 at 07:27:36AM -0500, David Relson wrote:
> > In practice, random words in spam messages have little effect. If you
> > want more detail on how bogofilter classified a 0.500000 message, run it
> > with flags "-vv" and "-vvv". The FAQ has info on the output generated
> > with those flag settings.
> >
> > David
>
> If they have so little effect, then why are they the only e-mails that
> seem to get through these days, leaving me wanting a filter that actually
> deals with this problem? Are you saying these e-mails aren't getting
> through to you? This is the most spam I have seen since I started using
> bogofilter. With the other problem spam it was mostly a matter of removing
> comments and invalid html tags, and it was back to being detected; now I
> don't know what to do.
>
> http://www.wired.com/news/infostructure/0,1377,61886,00.html?tw=wn_tophead_3
> Here is an article from Wired saying that some spam filters
> treat what they call 'hash busters' as a sign of spam. I don't know
> what would count as a hash buster, but maybe it is just seeing a lot of new
> words relative to the words that are in our dictionary.
>
> My current database is over nine megs and it used to be around 1 meg
> before these darn e-mails started showing up.
--
Manvendra Bhangui <mbhangui at yahoo.com>