bogotune vs mailing lists

David Relson relson at osagesoftware.com
Sat Sep 25 05:54:07 CEST 2004


Tom,

In another scan of your messages, I found that 13 spam messages had
scores below 0.1.  Running "bogofilter -vvv" on them, I saw that 12 of
them had header tokens indicating they came from a debian.org mailing
list. Eh???

Also, "grep -c head:lists.debian.org bt.??.mbox" gives:

  bt.ns.mbox:1117
  bt.sp.mbox:63

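which indicates you classified 1,117 debian.org messages as ham and 63
as spam.  With the list's header tokens showing up so much more often in
ham, they pull a message's score toward ham, which helps explain the
bunch of low scoring spam.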
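
For anyone wanting to repeat the count without grep, here is a minimal
sketch in Python. It is not part of bogofilter; the function name
count_list_messages is made up for illustration, and it assumes the
files are ordinary mbox files. Unlike "grep -c", which counts matching
lines, it counts each message once if any of its headers mentions the
list domain, so the numbers may differ somewhat from those above.

```python
import mailbox


def count_list_messages(mbox_path, list_domain="lists.debian.org"):
    """Count messages in an mbox file whose headers mention list_domain.

    Hypothetical helper for illustration only; not a bogofilter tool.
    """
    # create=False: fail on a missing file instead of creating an empty one.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        count = 0
        for msg in box:
            # Look only at header values (List-Id, Received, and so on),
            # never at the message body.
            headers = "\n".join(str(value) for value in msg.values())
            if list_domain in headers.lower():
                count += 1
        return count
    finally:
        box.close()
```

Calling count_list_messages("bt.ns.mbox") and
count_list_messages("bt.sp.mbox") would give the ham and spam counts
for the two tuning files named above.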

FWIW, for some time, I've been aware that it's hard to correctly
classify spam that's distributed by a mailing list whose traffic is
otherwise ham.  I've gotten fair amounts of spam through several gnu.org
lists (since their policy is one of open posting, with no subscription
needed).  Using an ignore.db with the list's header tokens helps, since
it makes bogofilter skip those tokens and score the message on its
content.

Having that bunch of very low scoring spam makes it tough for bogotune.
Possibly a bigger set of messages for tuning would help.  I'm not sure
:-<

More tomorrow after a night's sleep and when I have additional insight.

David
