Filtering and ignorelists

Jozef Hitzinger hitzinger at phobos.fphil.uniba.sk
Fri Mar 5 17:49:15 CET 2004


On Fri, 5 Mar 2004, Tom Allison wrote:

> What is the problem with the current training?

"Buckets". The same message (= the stuff shown to the user) is treated
differently depending on where it came from.

A mailing list, a forward, or just headers cleverly crafted by a spammer
may help spam get through to you, while nonspam may get lost if it comes
from an otherwise spammy source (ok, this is less likely, but possible).

> The direct manipulation of email headers smells a bit like the regex
> filtering that SA does.  Only a little but I recognize there are some
> fundamental differences in your approach and theirs.

Thanks. I don't see SA as a viable alternative :).

> But all that aside and still subject to debate...  what is not ok with
> the current training?

I've said it and given examples. I had hoped others would see the problem
as quickly as I did, but now it's clear I need to provide hard evidence
(i.e. verifiable numbers) before anyone even agrees there's a problem at
all. Hmmm. Ok, that'll take some time, but I'll send them. Who knows, maybe
I'm just plain wrong.

Just one more try: don't the rcvd:gnu.org and rcvd:Feb cases hint at a
problem? Why should bogo give _any_ thought to these tokens?
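
To show where tokens like these come from, here is a rough Python sketch.
It is not bogofilter's actual lexer: the function name, the splitting rule
and the sample header (hosts, address, date) are all made up for
illustration. The only point is that a Received: line carries a relay name
and a timestamp, so tagging its words yields tokens that describe the
delivery path and the calendar, not the message itself.

```python
import re

def received_tokens(header_value):
    """Split a Received: header value into words tagged with 'rcvd:'.

    Hypothetical, simplified tokenizer: keeps runs of letters, digits,
    dots and hyphens that are at least three characters long.
    """
    words = re.findall(r"[A-Za-z0-9][A-Za-z0-9.-]{2,}", header_value)
    return ["rcvd:" + word.rstrip(".-") for word in words]

# Made-up header value; example.org and 192.0.2.1 are placeholders.
example = ("from gnu.org (gnu.org [192.0.2.1]) by mx.example.org; "
           "Thu, 5 Feb 2004 10:00:00 +0100")

tokens = received_tokens(example)
for token in tokens:
    print(token)
```

Among the printed tokens are rcvd:gnu.org (the relay) and rcvd:Feb (the
month the message happened to be relayed in).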

> I suppose you could periodically remove words from your list using fgrep
> -vf as previously described.
>
> Could you try it and see what the differences are in results.

I'd need to manually select what should go out, and that's not possible.
The database is on a server that handles 2000+ messages a day (not counting
the worm-induced mess) and is meant to serve 500+ different users. You can
see that manual intervention like this is out of the question.
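
For reference, the fgrep -vf idea boils down to something like the Python
sketch below: drop every line of a text dump of the wordlist that contains
any string from an ignore list. The dump line format (token, spam count,
good count, date), the sample counts and the function name are assumptions
made for the example, and someone still has to come up with the ignore list.

```python
def drop_ignored(dump_lines, ignore_patterns):
    """Keep only the lines that contain none of the fixed strings.

    Mirrors the effect of `fgrep -vf patterns dump`, except that empty
    patterns are skipped here (an empty pattern would match every line).
    """
    patterns = [p for p in ignore_patterns if p]
    return [line for line in dump_lines
            if not any(p in line for p in patterns)]

# Assumed dump format: token, spam count, good count, date (made-up values).
dump = [
    "rcvd:gnu.org 3 120 20040305",
    "rcvd:Feb 410 390 20040305",
    "viagra 250 0 20040301",
]

kept = drop_ignored(dump, ["rcvd:"])
for line in kept:
    print(line)
```

With the single pattern "rcvd:" only the "viagra" line survives, which is
the automatic variant of the cleanup; picking tokens one by one is the part
that does not scale.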

In fact, I've run for a month with the same database without noticing any
degradation. In numbers (for my mail account): 0-15 spams got through out
of 30-70 a day, ~ 4-10% FN, and 0 FP. Now I'm updating it with the spam that
got through; we'll see if the FNs go down.
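
For anyone who wants to redo the arithmetic on their own mailbox, the rate
is simply the misclassified count over the total for that class. A small
Python sketch follows; the function name and the sample counts are
illustrative values picked from within the ranges above, not measured
pairs.

```python
def error_rate(misclassified, total):
    """Return the percentage of messages in a class that were misclassified."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * misclassified / total

# Illustrative day: 50 spams arrive, 5 of them reach the inbox.
fn_percent = error_rate(5, 50)
# Illustrative day: 40 good messages arrive, none is flagged as spam.
fp_percent = error_rate(0, 40)

print(f"FN: {fn_percent:.1f}%  FP: {fp_percent:.1f}%")
```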

That's all for now,
-- 
jozef  :-)



