Filtering and ignorelists
Jozef Hitzinger
hitzinger at phobos.fphil.uniba.sk
Fri Mar 5 17:49:15 CET 2004
On Fri, 5 Mar 2004, Tom Allison wrote:
> What is the problem with the current training?
"Buckets". The same message (i.e. the stuff shown to the user) is treated
differently depending on where it came from.
A mailing list, a forward, or just headers cleverly crafted by a spammer may
help spam get through to you, while nonspam may get lost if it comes from an
otherwise spammy source (OK, this is less likely, but possible).
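To make the "buckets" point concrete, here is a toy sketch of a tokenizer that tags words from Received headers with a `rcvd:` prefix, as the `rcvd:` tokens discussed in this thread suggest bogofilter does. The regex, the `subj:` prefix and the `tokenize` function are assumptions for illustration, not bogofilter's actual lexer. Two messages with identical bodies end up with different token sets purely because of how they were routed.

```python
import re
from email import message_from_string

# Hypothetical word pattern: a letter followed by 2+ word-ish characters.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9.'-]{2,}")

def tokenize(raw: str) -> set[str]:
    """Toy tokenizer; the 'rcvd:'/'subj:' tagging only mimics bogofilter's idea."""
    msg = message_from_string(raw)
    tokens = set()
    # Every Received header contributes routing tokens, tagged 'rcvd:'.
    for value in msg.get_all("Received", []):
        tokens.update("rcvd:" + w for w in WORD.findall(value))
    # Subject words are tagged separately (assumed prefix).
    tokens.update("subj:" + w for w in WORD.findall(msg.get("Subject", "")))
    # Body words are untagged: this is "the stuff shown to the user".
    tokens.update(WORD.findall(msg.get_payload()))
    return tokens

BODY = "Cheap meds without prescription\n"
direct = "Received: from mx.example.net\nSubject: offer\n\n" + BODY
via_list = "Received: from lists.gnu.org\nSubject: offer\n\n" + BODY

# Same visible content, yet the token sets differ only in routing tokens.
print(sorted(tokenize(direct) ^ tokenize(via_list)))
# ['rcvd:lists.gnu.org', 'rcvd:mx.example.net']
```

Whatever spamicity the classifier has learned for `rcvd:lists.gnu.org` is then added to an otherwise identical message, which is exactly the "bucket" effect described above.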
> The direct manipulation of email headers smells a bit like the regex
> filtering that SA does. Only a little but I recognize there are some
> fundamental differences in your approach and theirs.
Thanks. I don't see SA as a viable alternative :).
> But all that aside and still subject to debate... what is not ok with
> the current training?
I've said it and given examples. I'd hoped others would see the problem as
quickly as I did, but now it's clear I need to provide hard evidence
(i.e. verifiable numbers) before anyone will even agree there's a problem at
all. Hmm. OK, that'll take some time, but I'll send them. Who knows, maybe
I'm just plain wrong.
Just one more try: don't the rcvd:gnu.org and rcvd:Feb cases hint at a
problem? Why should bogo give _any_ thought to these tokens?
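One way to keep tokens like these out of the score is to drop them before scoring. The sketch below assumes a hypothetical ignore list of shell-style patterns (matched with Python's `fnmatch.fnmatchcase`); the pattern entries and `drop_ignored` are illustrative names, not an existing bogofilter option.

```python
from fnmatch import fnmatchcase

# Hypothetical ignore list: exact tokens or shell-style wildcard patterns.
IGNORE_PATTERNS = ["rcvd:Feb", "rcvd:*gnu.org"]

def drop_ignored(tokens, patterns=IGNORE_PATTERNS):
    """Return tokens that match none of the ignore patterns (case-sensitive)."""
    return [t for t in tokens if not any(fnmatchcase(t, p) for p in patterns)]

tokens = ["rcvd:Feb", "rcvd:lists.gnu.org", "rcvd:gnu.org", "viagra", "meeting"]
print(drop_ignored(tokens))  # ['viagra', 'meeting']
```

Because the filtering happens before scoring, an ignored token contributes nothing, no matter how lopsided its counts in the wordlist are.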
> I suppose you could periodically remove words from your list using fgrep
> -vf as previously described.
>
> Could you try it and see what the differences are in results.
I'd need to select manually what should go out, and that's not possible.
The database is on a server handling 2000+ messages a day (not counting the
worm-induced mess) and is meant to serve 500+ different users. Manual
intervention like this is out of the question.
In fact, I've run for a month with the same database without noticing any
degradation. In numbers (for my own mail account): 0-15 spams got through
out of 30-70 a day (~4-10% FN), with 0 FP. Now I'm updating the database
with the spam that got through; we'll see whether the FNs go down.
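The FN rate quoted above is the share of spam that was not caught. A minimal sketch of computing it over a period, using hypothetical daily counts (not the figures from this post): summing the counts first gives each message equal weight, whereas averaging daily ratios would give quiet days too much weight.

```python
def fn_rate(days):
    """days: list of (missed_spam, total_spam) pairs, one per day (hypothetical)."""
    missed = sum(m for m, _ in days)
    total = sum(t for _, t in days)
    return missed / total if total else 0.0

# Hypothetical counts for three days.
sample = [(2, 50), (0, 30), (5, 70)]
print(f"{fn_rate(sample):.1%}")  # 4.7%
```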
That's all for now,
--
jozef :-)