advice on ignore-files
Eric Seppanen
eds at reric.net
Sat Oct 5 02:19:13 CEST 2002
I'm looking for a few opinions on the best way to implement ignore-lists.
Ignore lists don't exist in current CVS code; I've implemented them a few
different ways in past versions, but I'd like to bring it up to date.
The benefits of ignore lists are mainly:
- performance improvements, because there's no value in searching giant
wordlists for "the" or "and".
- improved user control, because you can force bogofilter to ignore
certain "red herrings", like the time I fed it a heap of spam from the
month of May and it decided that "may" was a very strong spam indicator.
The main issue is that ignore lists will be maintained by hand, which pretty
much means they should be plaintext files of words. We need bogofilter to
be able to search this word list like any other.
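To make "search it like any other word list" concrete, here is a minimal
sketch in C that reads a hand-edited ignore list (one word per line) into a
sorted in-memory array and checks tokens with bsearch(). The file format
(blank lines and '#' comments allowed) and every name here (struct
ignore_list, ignore_list_load(), ignore_list_contains(), ignore_list_free())
are hypothetical, not existing bogofilter code.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory ignore list: a sorted array of lowercase words. */
struct ignore_list {
    char **words;
    size_t count;
};

static int cmp_words(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

void ignore_list_free(struct ignore_list *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->words[i]);
    free(list->words);
    list->words = NULL;
    list->count = 0;
}

/* Load a hand-maintained file with one word per line. Blank lines and lines
 * starting with '#' are skipped; words are lowercased. Lines are assumed to
 * be shorter than 256 bytes. Returns 0 on success, -1 on error. */
int ignore_list_load(const char *path, struct ignore_list *list)
{
    char line[256];
    size_t cap = 0;
    FILE *fp = fopen(path, "r");

    list->words = NULL;
    list->count = 0;
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *p = line;
        size_t len;

        while (isspace((unsigned char)*p))      /* skip leading blanks */
            p++;
        len = strcspn(p, " \t\r\n");            /* first word on the line */
        p[len] = '\0';
        if (len == 0 || p[0] == '#')
            continue;
        for (size_t i = 0; i < len; i++)
            p[i] = (char)tolower((unsigned char)p[i]);

        if (list->count == cap) {               /* grow the array */
            size_t ncap = cap ? cap * 2 : 64;
            char **grown = realloc(list->words, ncap * sizeof *grown);
            if (grown == NULL)
                goto fail;
            list->words = grown;
            cap = ncap;
        }
        if ((list->words[list->count] = malloc(len + 1)) == NULL)
            goto fail;
        memcpy(list->words[list->count++], p, len + 1);
    }
    fclose(fp);
    if (list->count > 0)
        qsort(list->words, list->count, sizeof *list->words, cmp_words);
    return 0;

fail:
    fclose(fp);
    ignore_list_free(list);
    return -1;
}

/* Binary search; token is assumed to be lowercase already. */
int ignore_list_contains(const struct ignore_list *list, const char *token)
{
    return list->count > 0 &&
           bsearch(&token, list->words, list->count,
                   sizeof *list->words, cmp_words) != NULL;
}
```

Reading and sorting a list of a few hundred words on every run costs very
little, and each lookup is then a binary search, so the "search giant
wordlists" saving is not eaten up by the ignore list itself.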
This leads me to two likely solutions:
1. Add support for plaintext files alongside DB format.
good: simplest for end users.
good: maybe supporting multiple formats is good for other reasons (?)
bad: race conditions likely if/when user edits ignore-list file.
bad: lots of work, more complex code.
-OR-
2. Add a "convert plaintext to DB" feature to bogofilter.
good: pretty simple code.
bad: prone to end-user error (forgetting to update db after editing).
bad: it's code that's not very useful to our main purpose: sorting spam.
bad: race conditions likely if user runs convert-to-db on ignore-list.
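For option 2, one common way to avoid a race between a conversion and a
running bogofilter is to build the new file under a temporary name and
rename() it over the old one. The sketch below assumes that approach. Its
output is a sorted, de-duplicated plaintext file standing in for the real DB
(no Berkeley DB calls are shown), and ignore_list_compile() and the ".tmp"
suffix are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical "compile" step: read the hand-edited list from src, lowercase,
 * sort and de-duplicate it, write it to "<dest>.tmp", then rename() that file
 * over dest. Returns 0 on success, -1 on error. */
int ignore_list_compile(const char *src, const char *dest)
{
    char **words = NULL;
    size_t count = 0, cap = 0, i;
    char line[256], tmp_path[4096];
    FILE *in, *out;
    int rc = -1, write_err;

    if ((in = fopen(src, "r")) == NULL)
        return -1;
    while (fgets(line, sizeof line, in) != NULL) {
        char *p = line;
        size_t len;

        while (isspace((unsigned char)*p))
            p++;
        len = strcspn(p, " \t\r\n");
        p[len] = '\0';
        if (len == 0 || p[0] == '#')            /* skip blanks and comments */
            continue;
        for (i = 0; i < len; i++)
            p[i] = (char)tolower((unsigned char)p[i]);
        if (count == cap) {
            size_t ncap = cap ? cap * 2 : 64;
            char **grown = realloc(words, ncap * sizeof *grown);
            if (grown == NULL)
                goto done;
            words = grown;
            cap = ncap;
        }
        if ((words[count] = malloc(len + 1)) == NULL)
            goto done;
        memcpy(words[count++], p, len + 1);
    }
    if (count > 0)
        qsort(words, count, sizeof *words, cmp_str);

    /* Write the complete new list under a temporary name first... */
    if (snprintf(tmp_path, sizeof tmp_path, "%s.tmp", dest) >= (int)sizeof tmp_path)
        goto done;
    if ((out = fopen(tmp_path, "w")) == NULL)
        goto done;
    for (i = 0; i < count; i++)
        if (i == 0 || strcmp(words[i], words[i - 1]) != 0)  /* drop duplicates */
            fprintf(out, "%s\n", words[i]);
    write_err = ferror(out);
    if (fclose(out) != 0 || write_err) {
        remove(tmp_path);
        goto done;
    }
    /* ...then swap it into place in a single step. */
    if (rename(tmp_path, dest) != 0) {
        remove(tmp_path);
        goto done;
    }
    rc = 0;

done:
    fclose(in);
    for (i = 0; i < count; i++)
        free(words[i]);
    free(words);
    return rc;
}
```

On POSIX systems, rename() within one filesystem replaces the destination
atomically, so a bogofilter process opening the file sees either the complete
old list or the complete new one, never a half-written file. It does not
protect against two conversions running at once, since both would write the
same ".tmp" file, and it does nothing about the "forgot to reconvert" problem.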
I'm pretty much on the fence. I have likes and dislikes in both, so I'm
curious if anyone can think of compelling reasons to go one way or
another, or if anyone can come up with a better solution than these two.
More information about the bogofilter-dev mailing list