advice on ignore-files
Eric Seppanen
eds at reric.net
Sat Oct 5 08:42:45 CEST 2002
On Fri, Oct 04, 2002 at 11:48:11PM -0400, David Relson wrote:
>
> Performance:
<snip>
>
> With ignore lists, the words would be collected as before. Then the
> (presumably) small ignore list would be searched. If word in small list,
> done. If not in small list, search the two giant lists. Then calculate
> spamicity (as before). Number of searches: 1 small one for each word and 2
> big ones for each word not in the ignore list. Search time will be less
> for words found in the ignore list and will be greater for words not in the
> ignore list. How much time is saved depends on number of "ignore" words
> encountered, which is partially a function of size of the ignore list.
I think you have a point. I'm willing to accept that there may be
no net performance gain as long as we're collecting all the words
first, then looking them up second.
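
The search counts David describes can be sketched as follows. This is only an illustration in Python, not bogofilter's code: the function name `count_lookups` is hypothetical, and it assumes each distinct word in a message is looked up once.

```python
def count_lookups(words, ignore_list=None):
    """Return (small_searches, big_searches) for one message.

    Assumption: every distinct word costs one search of the small
    ignore list (if there is one) and, unless it is found there,
    one search in each of the two big lists (good and spam).
    """
    small = 0
    big = 0
    for word in set(words):          # words are collected first
        if ignore_list is not None:
            small += 1               # one search of the small list
            if word in ignore_list:
                continue             # found: no big searches needed
        big += 2                     # good list + spam list
    return small, big
```

With three distinct words, one of which is on the ignore list, this gives 3 small and 4 big searches, against 0 small and 6 big without an ignore list. Whether that is a net win depends on how cheap the small search is and how many words it catches.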
I think it's still easy to demonstrate cases where hand-maintained
wordlists are necessary.
A few more examples I've thought of:
- A user notices that a lot of spam messages came through
secondary.mail.mydomain.com. While most mail comes through
primary.mail.mydomain.com, secondary is a legitimate mail machine and
the user wants to make sure it won't be used as an indicator of spam.
It shouldn't be whitelisted, however, because that would let some
spam through. It needs to be treated as neutral, which is what the
ignore list does. So the user adds secondary.mail.mydomain.com to
the ignore list.
- Joe sysadmin installs bogofilter for end-users, and wants to ensure
they don't spam-list critical messages from him. So he installs
bogofilter with a system-wide white-list so that his email address
will never be blocked on his own system.
- Jane's email address gets used as the From: address in a bunch of
spam. She can advise people using bogofilter to add her email
address to their ignore-list. Then her mail can get through, while
spam with her address in it can still be safely filtered.
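
The difference between "neutral" and "whitelisted" in the examples above can be shown with a small Python sketch. Everything here is an assumption made for illustration: the `score` function, the per-word probabilities, and the plain average used to combine them are stand-ins, not bogofilter's actual spamicity calculation.

```python
def score(words, spam_prob, ignore_list=frozenset(), white_list=frozenset()):
    """Return a toy spam score between 0.0 (good) and 1.0 (spam)."""
    tokens = set(words)
    if tokens & white_list:
        return 0.0                   # whitelisted: always let through
    # Ignored words are neutral: they simply do not take part.
    # Unknown words get 0.5 (an assumed default).
    probs = [spam_prob.get(w, 0.5) for w in tokens if w not in ignore_list]
    if not probs:
        return 0.5                   # nothing left to judge by
    return sum(probs) / len(probs)   # stand-in for the real calculation


# Made-up probabilities for the secondary mail host example.
spam_prob = {"secondary.mail.mydomain.com": 0.9, "viagra": 0.99, "meeting": 0.1}
host = {"secondary.mail.mydomain.com"}
good_mail = ["secondary.mail.mydomain.com", "meeting"]
spam_mail = ["secondary.mail.mydomain.com", "viagra"]
```

Under these assumptions, `score(good_mail, spam_prob)` is dragged up to about 0.5 by the host name, while `score(good_mail, spam_prob, ignore_list=host)` drops to about 0.1. The spam message still scores about 0.99 with the host ignored, but `score(spam_mail, spam_prob, white_list=host)` returns 0.0, which is the "would let some spam through" problem.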
I also think it's a nice safety net that potential users may look for.
They may not trust that bogofilter works as well as everyone says,
and they may want the reassurance that they can override it if
needed.