Fighting bogus news spam

David Relson relson at osagesoftware.com
Sun Jul 27 14:20:03 CEST 2008


On Sun, 27 Jul 2008 13:41:24 +0200
Tomaž Šolc wrote:

> Hi
> 
> > I currently see some of them in my unsure folder with a 0.9x
> > rating. For now I just keep feeding them as usual, but if those
> > mails continue to be propagated on a larger scale, one might
> > think about it again.
> 
> I've just checked the spamicity of the mails I'm getting and they
> range from 0.5 to 0.9. I use two-state filtering with the default
> cut-off value (0.99).
> 
> I don't like the idea of lowering the cut-off. With the amount of spam
> I'm getting it's impossible to go through the spam box manually and
> check for false positives. I do know I get some occasionally even with
> this setting because two or three times in the last year I found a
> mail I was expecting in the spam box.
> 
> By the way, I'm using bogofilter with procmail and constant training
> (-u option).
> 
> Maybe it's time I switched to three-state filtering? I didn't set this
> up in the first place because I didn't see any particular benefit in
> it. You have to read through both the inbox and the unsure folder
> anyway, so I don't see why this is better than just having everything
> in the inbox.
> 
> > AFAIR there were some waves with quotes from classical literature
> > meant to poison statistical filters in the past, but so far
> > bogofilter has been able to cope with them for me. At least I don't
> > receive daily newsletters with such stuff.
> 
> I had pretty much the same experience with those.
> 
> Best regards
> Tomaž

Greetings Tomaž,

I've been using "-u" with 3-state filtering since bogofilter became
able to do that.  It works nicely so long as you train bogofilter with
_every_ error (both false positives and false negatives) and with
unsures.  However it will _not_ prevent the occasional error.
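
A rough sketch of the difference between the two modes, written in
Python rather than taken from bogofilter's own code: a score is compared
against one cut-off (two-state) or two (three-state).  The 0.99 spam
cut-off is the default you mention; the 0.45 ham cut-off and the
classify() helper are assumptions for illustration only.

```python
def classify(spamicity, spam_cutoff=0.99, ham_cutoff=None):
    """Label a spamicity score.

    ham_cutoff=None models two-state filtering (hypothetical convention
    for this sketch): anything below spam_cutoff is treated as ham.
    With a ham_cutoff, scores between the two cut-offs become "unsure".
    """
    if spamicity >= spam_cutoff:
        return "spam"
    if ham_cutoff is None or spamicity <= ham_cutoff:
        return "ham"
    return "unsure"


# Scores in the 0.5 to 0.9 range, plus one clear spam for comparison.
for score in (0.5, 0.9, 0.995):
    two_state = classify(score)
    three_state = classify(score, ham_cutoff=0.45)  # 0.45 is an assumed value
    print(score, two_state, three_state)
```

With these assumed values, mails scoring 0.5 to 0.9 move from the inbox
to the unsure folder, where they are easy to spot and to train on.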

I do get a false positive every once in a while.  Usually it's when
I've first subscribed to a new mailing list or made reservations at an
airline or a hotel I've not used before.  Over the years bogofilter has
learned that most of the incoming html email is spam and has to be told
that airlines and hotels are good, not bad.

This month my mailserver has received approx 45,000 spam and has had 10
false negatives and 45 unsures (with 22 being ham and 23 being spam).
I'm not aware of any false positives, though some _could_ be hidden
among the 45,000 spam :-<
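
For a rough sense of scale, the arithmetic on those numbers can be
sketched as below.  It assumes the approximate 45,000 figure can be used
as the spam total, so the percentages are estimates only.

```python
# Figures from this month's mail; spam_total is approximate.
spam_total = 45_000
false_negatives = 10
unsure_ham = 22
unsure_spam = 23

unsure_total = unsure_ham + unsure_spam

# Share of spam that slipped through as ham.
fn_rate = false_negatives / spam_total

# Share of spam that needed a manual decision (missed or unsure).
manual_spam_rate = (false_negatives + unsure_spam) / spam_total

print(f"unsures: {unsure_total}")
print(f"false negative rate: {fn_rate:.3%}")
print(f"spam needing manual handling: {manual_spam_rate:.3%}")
```

Under that assumption, well under 0.1% of the spam needed any manual
attention.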

Regards,

David


