OT: Chunking the cruft - random lettered words

David Relson relson at osagesoftware.com
Mon Mar 15 18:39:04 CET 2004


On 15 Mar 2004 09:37:52 -0500
Tom Anderson wrote:

> On Mon, 2004-03-15 at 09:07, Eric Wood wrote:
> > Let me say that I agree that modifications to bogofilter aren't
> > necessary, but I'm still getting lots of spam slipping through.  One
> > kind of email is from infected computers wanting you to click on a
> > yahoo link or ip address (been getting it for weeks now).   The link
> > keeps changing but the format of the link is constant so I made:
> > 
> > :0 B:
> > * ^http://.*\*-http://.*
> > spam
> > 
> > Solved.  Okay, my only other nuisance email comes with lots of
> > random words in it:
> 
> Unfortunately, you have to keep manually modifying this rule if the
> author of the virus/spam/whatever changes the link format.
> 
> > wogwo gwoehg gjjdjgdd ......
> > 
> > I've trained till I'm blue in the face.  The procmail list didn't
> > yield a magic rule to help me with this.  Does anyone have a trick
> > for this kind of email?
> 
> Make sure your min_dev value is larger than the distance between your
> robx and 0.5.  This way previously unseen random words, which score at
> robx, won't affect the classification.  The email will be scored based
> on the non-random words.

Have you any record of the scores for the false negatives?  Judiciously
lowering your spam_cutoff value may help.  Of course, if you have
sufficient stored email, bogotune will do its best to find an optimal
parameter set for you.

If you're using the default values for robx and min_dev (0.415 and
0.100, respectively), you're satisfying the constraint that Tom
mentions.
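
For anyone who wants to check their own settings, here is a minimal
sketch of that constraint in Python.  It assumes that a token never seen
in training scores exactly robx, and that tokens scoring within min_dev
of 0.5 are left out of the calculation.  The function name is made up
for illustration; it is not part of bogofilter.

```python
def unknown_tokens_ignored(robx: float, min_dev: float) -> bool:
    """Return True if a token scoring exactly robx falls inside the
    band around 0.5 that min_dev excludes from scoring.

    Hypothetical helper, not a bogofilter function. Assumes the
    exclusion test is |score - 0.5| < min_dev.
    """
    return abs(robx - 0.5) < min_dev


# Values quoted in this message: |0.415 - 0.5| = 0.085, which is < 0.100
print(unknown_tokens_ignored(0.415, 0.100))  # prints True
```

If this returns False for your configuration, unseen random words will
each nudge the score toward robx instead of being skipped.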

David



