Bogofilter accuracy plummets starting around March 10, 2010
Jonathan Kamens
jik at kamens.brookline.ma.us
Thu Apr 1 16:04:45 CEST 2010
Thomas Anderson wrote:
> It's important to have
> your robx between your ham and spam cutoffs and preferably within your
> min_dev... this way new tokens are graded neutrally and don't
> immediately affect classifications.
I would assume that bogotune would take care of this, wouldn't it? Here
are my current settings, generated by bogotune from up-to-date ham and
spam corpora on March 22, i.e., almost two weeks after this problem started:
db_cachesize=78
robs=0.0100
min_dev=0.394
robx=0.595174
sp_esf=0.065025
ns_esf=0.017818
spam_cutoff=0.996938 # for 0.05% fp (2); expect 6.31% fn (398).
ham_cutoff=0.442
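As a quick sanity check of the advice quoted above against these numbers, here is a minimal Python sketch. It assumes that "within your min_dev" means robx lies less than min_dev away from 0.5, and that a token's score follows Robinson's smoothing formula with robs as the prior strength and robx as the prior value; the function names are made up for illustration and are not part of bogofilter.

```python
# Settings copied from the bogotune output above.
settings = {
    "robs": 0.0100,
    "min_dev": 0.394,
    "robx": 0.595174,
    "spam_cutoff": 0.996938,
    "ham_cutoff": 0.442,
}


def token_score(n, p, robs, robx):
    """Robinson's smoothed score: (robs * robx + n * p) / (robs + n).

    n is how many times the token has been seen and p is its raw spam
    probability. With n == 0 the result is simply robx.
    """
    return (robs * robx + n * p) / (robs + n)


def robx_is_neutral(s):
    """Return (between_cutoffs, within_min_dev) for a settings dict."""
    between_cutoffs = s["ham_cutoff"] < s["robx"] < s["spam_cutoff"]
    # Assumption: tokens scoring within min_dev of 0.5 are left out of the
    # message score, so a robx in that band makes unseen tokens a no-op.
    within_min_dev = abs(s["robx"] - 0.5) < s["min_dev"]
    return between_cutoffs, within_min_dev


# Score of a never-before-seen token (n == 0), then the two checks.
new_token = token_score(0, 0.0, settings["robs"], settings["robx"])
print(new_token, robx_is_neutral(settings))
```

Under those assumptions both checks pass for these settings: robx sits between the two cutoffs, and it is about 0.095 away from 0.5, well inside a min_dev of 0.394.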
I'm still seeing well over 100 messages of this variety classified as
ham or unsure every day, even though I actively retrain on every
message as it comes in.
> I also run spamitarium on my emails
> prior to bogofilter, and this often tags the header with an SPF failure
> or missing rDNS, which helps bump it into a spam classification.
I've never looked at spamitarium before, but I just took a look at it
now, and I'm quite uncomfortable with the idea of throwing away
non-standard headers. On the one hand, I understand the argument that
when included in spam, these headers are likely intended to throw off
Bayesian filters, but on the other hand, I really don't like the idea of
discarding data in messages that turn out to be spam -- I want the
message that ends up in my inbox to be exactly what the sender intended
it to be. I think it would be preferable to add a configuration option
to bogofilter to tell it to ignore headers with certain prefixes, and
then to have spamitarium add those prefixes to the headers it believes
should be ignored.
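To make the idea concrete, here is a rough Python sketch of the filtering step such an option would imply. Everything in it is hypothetical: bogofilter has no such option as far as I know, and the "X-Ignore-" prefix and the function name are invented for illustration. The point is only that the classifier would see a copy of the message without the prefixed headers, while the message that gets delivered keeps every header.

```python
from email import message_from_string

# Hypothetical prefix that a preprocessor would add to headers it
# considers untrustworthy, instead of deleting them.
IGNORE_PREFIX = "X-Ignore-"


def view_for_classifier(raw, prefix=IGNORE_PREFIX):
    """Return a copy of the raw message with prefixed headers removed.

    The input string is not modified, so the delivered message still
    carries every header.
    """
    msg = message_from_string(raw)
    # Iterate over a snapshot of the names; del removes all occurrences
    # of a header and matches names case-insensitively.
    for name in set(msg.keys()):
        if name.lower().startswith(prefix.lower()):
            del msg[name]
    return msg.as_string()


raw = (
    "From: sender@example.com\n"
    "X-Ignore-X-Weird-Header: random words to confuse filters\n"
    "Subject: hello\n"
    "\n"
    "body text\n"
)
print(view_for_classifier(raw))
```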
In addition, since I use bogofilter in a milter rather than in my
delivery agent, it would be difficult for me to integrate spamitarium's
functionality into my incoming mail flow. I'd have to (a) switch from
the milter to procmail, (b) write a milter for spamitarium, or (c)
reimplement spamitarium's functionality inside the milter. The time and
energy necessary to do any of these are, alas, in short supply for me
right now.
Thanks,
jik