advice on ignore-files

Eric Seppanen eds at reric.net
Sat Oct 5 08:42:45 CEST 2002


On Fri, Oct 04, 2002 at 11:48:11PM -0400, David Relson wrote:
> 
> Performance:
<snip>
> 
> With ignore lists, the words would be collected as before.  Then the 
> (presumably) small ignore list would be searched.  If word in small list, 
> done.  If not in small list, search the two giant lists.  Then calculate 
> spamicity (as before).  Number of searches: 1 small one for each word and 2 
> big ones for each word not in the ignore list.  Search time will be less 
> for words found in the ignore list and will be greater for words not in the 
> ignore list.  How much time is saved depends on number of "ignore" words 
> encountered, which is partially a function of size of the ignore list.

I think you have a point.  I'm willing to accept that there may be 
no net performance gain as long as we're collecting all the words 
first, then looking them up second.
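
To make the search counts in the quoted paragraph concrete, here is a 
rough Python sketch.  The function and variable names are made up for 
illustration and are not bogofilter's code.  The in-memory set and dicts 
stand in for the real wordlists, and I'm assuming each distinct word is 
looked up once.

```python
def count_lookups(words, ignore_list, good_list, spam_list):
    """Count wordlist searches for one message with an ignore list in front.

    Hypothetical sketch: names and data structures are assumptions.
    """
    small = 0    # searches of the (presumably) small ignore list
    big = 0      # searches of the two large wordlists
    counts = {}  # word -> (good count, spam count) for words that get scored
    for word in set(words):   # words are collected first, looked up second
        small += 1
        if word in ignore_list:
            continue          # ignored word: skip both big searches
        big += 2              # one search each in the good and spam lists
        counts[word] = (good_list.get(word, 0), spam_list.get(word, 0))
    return small, big, counts


words = ["viagra", "the", "meeting", "the"]
small, big, counts = count_lookups(
    words,
    ignore_list={"the"},
    good_list={"meeting": 12},
    spam_list={"viagra": 40},
)
print(small, big)
```

With three distinct words and one of them ignored, that is 3 small 
searches plus 4 big ones, against 6 big searches with no ignore list.  
Whether that is a net win depends on how cheap the small search is and 
how many ignored words actually turn up.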

I think it's still easy to demonstrate places where hand-maintained 
wordlists are necessary.

A few more examples I've thought of:

- A user notices that a lot of spam messages came through 
secondary.mail.mydomain.com.  While most mail comes through
primary.mail.mydomain.com, secondary is a legitimate mail machine and 
the user wants to make sure it won't be used as an indicator of spam.  
It shouldn't be whitelisted, however, because that would let some 
spam through.  It needs to be treated as neutral, which is what the 
ignore list does.  So the user adds secondary.mail.mydomain.com to 
the ignore list.

- Joe sysadmin installs bogofilter for end-users, and wants to ensure 
they don't spam-list critical messages from him.  So he installs 
bogofilter with a system-wide whitelist so that his email address 
will never be blocked on his own system.

- Jane's email address gets used as the From: address in a bunch of 
spam.  She can advise people using bogofilter to add her email 
address to their ignore list.  Then her mail can get through, while 
spam with her address in it can still be safely filtered.
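
The difference between ignoring and whitelisting in these examples can 
be sketched in a few lines of Python.  Everything here is an assumption 
made for illustration: the function name, the 0.5 cutoff, the 0.5 score 
for unknown words, and especially the plain average, which is only a 
stand-in for bogofilter's real spamicity calculation.

```python
def classify(tokens, ignore_list, white_list, spamicity, cutoff=0.5):
    """Return "spam" or "ok" for a list of tokens (hypothetical sketch)."""
    # Whitelist: one whitelisted token lets the whole message through.
    if any(t in white_list for t in tokens):
        return "ok"
    # Ignore list: ignored tokens are dropped, so they count neither
    # for nor against the message.
    scores = [spamicity.get(t, 0.5) for t in tokens if t not in ignore_list]
    if not scores:
        return "ok"
    # Plain average as a placeholder for the real calculation.
    return "spam" if sum(scores) / len(scores) >= cutoff else "ok"


jane = "jane@example.org"   # placeholder address
spamicity = {jane: 0.99, "viagra": 0.99, "lunch": 0.2}
real_mail = [jane, "lunch"]
junk_mail = [jane, "viagra"]

print(classify(real_mail, set(), set(), spamicity))   # address counts against her
print(classify(real_mail, {jane}, set(), spamicity))  # address ignored
print(classify(junk_mail, {jane}, set(), spamicity))  # still judged on the rest
print(classify(junk_mail, set(), {jane}, spamicity))  # whitelisting lets it in
```

In this toy setup Jane's real mail is only accepted once her address is 
ignored, forged spam is still caught on its other words, and 
whitelisting her address would let the forged spam through.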

I also think it's a nice safety net that potential users may look for.  
They may not trust that bogofilter works as well as everyone says, 
and they may want the reassurance that they can override it if 
needed.


