advice on ignore-files

Eric Seppanen eds at reric.net
Sat Oct 5 02:19:13 CEST 2002


I'm looking for a few opinions on the best way to implement ignore-lists.

Ignore lists don't exist in the current CVS code.  I've implemented them a 
few different ways in past versions, and I'd like to bring that work up to 
date.

The benefits of ignore lists are mainly:
- performance improvements, because there's no value in searching giant 
wordlists for "the" or "and".
- improved user control, because you can force bogofilter to ignore 
certain "red herrings", like the time I fed it a heap of spam from the 
month of May and it decided that "may" was a very strong spam indicator.
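
As a rough illustration of the idea (in Python rather than bogofilter's C, 
and with every name made up for the example), ignoring words amounts to 
dropping them before any wordlist lookup happens.  The one-word-per-line 
layout and the "#" comment syntax are assumptions, not an existing format:

```python
def load_ignore_list(lines):
    """Parse a hand-edited ignore list.

    Assumed format (hypothetical): one word per line, '#' starts a comment,
    blank lines are skipped, matching is case-insensitive.
    """
    words = set()
    for line in lines:
        word = line.split("#", 1)[0].strip().lower()
        if word:
            words.add(word)
    return words


def filter_tokens(tokens, ignored):
    """Drop ignored tokens so they are never looked up in the big wordlists."""
    return [t for t in tokens if t.lower() not in ignored]


ignored = load_ignore_list(["the", "and", "may  # red herring from a May batch", ""])
kept = filter_tokens(["The", "offer", "may", "expire", "and", "vanish"], ignored)
```

Here only "offer", "expire" and "vanish" would go on to the real wordlist 
search.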

The main issue is that ignore lists will be maintained by hand, which 
pretty much means a plaintext file of words.  Bogofilter needs to be able 
to search this word list like any other.

This leads me to two likely solutions:

1. Add support for plaintext files alongside DB format. 
good: simplest for end users.
good: maybe supporting multiple formats is good for other reasons (?)
bad: race conditions likely if/when user edits ignore-list file.
bad: lots of work, more complex code.

-OR-

2. Add a "convert plaintext to DB" feature to bogofilter.
good: pretty simple code.
bad: prone to end-user error (forgetting to update db after editing).
bad: it's code that's not very useful to our main purpose: sorting spam.
bad: race conditions likely if user runs convert-to-db on the ignore list 
while bogofilter is reading it.
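
For what it's worth, here is a minimal sketch of how option 2 might soften 
its two worst "bad" points.  It is Python, not bogofilter code; sqlite3 
stands in for the real DB format purely to keep the example self-contained, 
and needs_rebuild, convert_to_db and is_ignored are hypothetical names.  
The converter builds the new DB in a temporary file and renames it into 
place, so a reader sees either the old DB or the new one, and a timestamp 
check catches a forgotten update:

```python
import os
import sqlite3
import tempfile


def needs_rebuild(text_path, db_path):
    """True if the DB is missing or older than the hand-edited text file."""
    if not os.path.exists(db_path):
        return True
    return os.path.getmtime(text_path) > os.path.getmtime(db_path)


def convert_to_db(text_path, db_path):
    """Build the DB in a temp file, then rename it over the old one."""
    with open(text_path, encoding="utf-8") as f:
        words = sorted({line.strip().lower() for line in f if line.strip()})

    # The temp file must live in the same directory as the target so the
    # final rename stays on one filesystem.
    db_dir = os.path.dirname(os.path.abspath(db_path))
    fd, tmp_path = tempfile.mkstemp(dir=db_dir, suffix=".tmp")
    os.close(fd)
    try:
        conn = sqlite3.connect(tmp_path)
        conn.execute("CREATE TABLE ignore_words (word TEXT PRIMARY KEY)")
        conn.executemany("INSERT INTO ignore_words VALUES (?)",
                         [(w,) for w in words])
        conn.commit()
        conn.close()
        # Rename over the old DB; atomic on POSIX filesystems.
        os.replace(tmp_path, db_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def is_ignored(db_path, word):
    """Look up one word in the converted ignore DB."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("SELECT 1 FROM ignore_words WHERE word = ?",
                           (word.lower(),)).fetchone()
        return row is not None
    finally:
        conn.close()
```

This does not remove the extra conversion step, and the timestamp check 
only helps if something actually calls it before each lookup.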

I'm pretty much on the fence.  I see things to like and dislike in both, 
so I'm curious if anyone can think of compelling reasons to go one way or 
the other, or if anyone can come up with a better solution than these two.



More information about the bogofilter-dev mailing list