bogotune vs mailing lists
relson at osagesoftware.com
Fri Sep 24 23:54:07 EDT 2004
In another scan of your messages, I found that 13 spam had scores less
than 0.1. Running "bogofilter -vvv" for them, I saw that 12 of them had
header tokens indicating they're from a debian.org mailing list. Eh???
Also, "grep -c head:lists.debian.org bt.??.mbox" gives:
which indicates you classified 1,117 debian.org messages as ham and 63
as spam. That helps explain the bunch of low-scoring spam.
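The check described above (find spam scoring below 0.1, see how many carry the list's header token, then count the token across ham and spam) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Msg` record, the helper names and the sample data are all hypothetical stand-ins, not anything bogofilter or bogotune produce. Only the token `head:lists.debian.org` comes from the message above.

```python
from dataclasses import dataclass


# Hypothetical record for one tuning message; bogofilter/bogotune do not
# expose messages in this form, it only stands in for illustration.
@dataclass(frozen=True)
class Msg:
    label: str          # how the message was filed for training: "ham" or "spam"
    score: float        # spamicity reported by bogofilter, 0.0 to 1.0
    tokens: frozenset   # tokens as listed by "bogofilter -vvv"


def low_scoring_spam(msgs, cutoff=0.1):
    """Spam messages whose score falls below the cutoff."""
    return [m for m in msgs if m.label == "spam" and m.score < cutoff]


def count_with_token(msgs, token):
    """How many ham and spam messages contain the given token."""
    counts = {"ham": 0, "spam": 0}
    for m in msgs:
        if token in m.tokens:
            counts[m.label] += 1
    return counts


LIST_TOKEN = "head:lists.debian.org"

# Made-up sample data, not the counts reported in this message.
msgs = [
    Msg("ham", 0.02, frozenset({LIST_TOKEN, "kernel"})),
    Msg("ham", 0.01, frozenset({LIST_TOKEN, "package"})),
    Msg("spam", 0.05, frozenset({LIST_TOKEN, "mortgage"})),
    Msg("spam", 0.98, frozenset({"mortgage", "click"})),
]

low = low_scoring_spam(msgs)
from_list = [m for m in low if LIST_TOKEN in m.tokens]
print(len(low), len(from_list))            # low-scoring spam, and how many came via the list
print(count_with_token(msgs, LIST_TOKEN))  # ham/spam split for the list token
```

A large ham count next to a small spam count for the same list token is the pattern that pulls list-borne spam toward very low scores.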
FWIW, for some time, I've been aware that it's hard to correctly
classify spam that's distributed by a mailing list (ham). I've gotten
fair amounts of spam through several gnu.org lists (since their policy
is one of open posting, with no subscription needed). Putting the
list's header tokens in an ignore.db helps, because bogofilter then
skips those tokens and classifies the message on its content.
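Why ignoring the list's header tokens lets the content decide can be shown with a rough sketch. It assumes a simplified Robinson/Fisher chi-square combining of per-token spam probabilities f(w), which is broadly the approach bogofilter takes. bogofilter's real computation also smooths f(w) with the robs/robx parameters and drops tokens within min_dev of 0.5, and neither step is modeled here. The f(w) values and every token name except `head:lists.debian.org` are hypothetical, and the plain Python set merely stands in for an ignore.db.

```python
import math


def chi2_q(chi: float, dof: int) -> float:
    """P(X >= chi) for a chi-square variable with an even number of dof."""
    assert dof % 2 == 0
    m = chi / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(fvals):
    """Combine per-token spam probabilities f(w) into one spamicity."""
    n = len(fvals)
    # Near 1 when the f(w) values lean toward 1 (spammy tokens).
    spamminess = chi2_q(-2.0 * sum(math.log(f) for f in fvals), 2 * n)
    # Near 1 when the f(w) values lean toward 0 (hammy tokens).
    hamminess = chi2_q(-2.0 * sum(math.log(1.0 - f) for f in fvals), 2 * n)
    return (1.0 + spamminess - hamminess) / 2.0


def score_message(token_f, ignore=frozenset()):
    """Score a message after dropping tokens listed in `ignore`."""
    kept = [f for tok, f in token_f.items() if tok not in ignore]
    return fisher_score(kept)


# Hypothetical f(w) values: the list header tokens look very hammy
# (the list mostly carries ham), while the body tokens look spammy.
token_f = {
    "head:lists.debian.org": 0.01,
    "head:debian-user": 0.01,  # hypothetical token name
    "head:bounce": 0.01,       # hypothetical token name
    "mortgage": 0.95,
    "refinance": 0.90,
    "click": 0.92,
}
# Stand-in for an ignore.db holding the list's header tokens.
ignore_db = {t for t in token_f if t.startswith("head:")}

print(round(score_message(token_f), 2))             # list tokens drag the score down
print(round(score_message(token_f, ignore_db), 2))  # content alone decides
```

With these made-up values the message scores below 0.5 when the hammy list tokens are counted, and close to 1 once they are ignored.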
Having that bunch of very low-scoring spam makes it tough for bogotune.
Possibly a bigger set of messages for tuning would help. I'm not sure
More tomorrow after a night's sleep and when I have additional insight.