ignore lists [was: UI for correcting mistakes]

David Relson relson at osagesoftware.com
Fri Mar 14 16:55:20 CET 2003


At 10:35 AM 3/14/03, Matthias Andree wrote:

>David Relson <relson at osagesoftware.com> writes:
>
> > There's a lot of valuable information in the headers.  Preserving it is
> > important for training bogofilter.  Whether it's better to forward or
> > attach is something I'm not sure of.  It could depend on the MUA.  Also,
> > I'd bet that having a 'forward' stanza is pretty harmless and unlikely
> > to skew results.
>
>At some time in the past, someone suggested ignore lists: tokens that
>IMHO wouldn't ever be counted or registered. With the multiple-wordlists
>feature, this should be possible to implement without too much
>effort.

Matthias,

The implementation of ignore lists is incomplete.  There _is_ some code in
bogofilter, but some issues were never resolved.  For example, should the
ignore list be a simple text file that can be edited by the
user/sysadmin?  Should it be a BerkeleyDB list?  In the latter case,
bogoutil would need mods so it can be used to load ignorelist.db from
the user-maintained text file.

The last discussion of ignore lists was based on the premise that their 
purpose was to speed up bogofilter, by reducing database queries.  Assuming 
that ignore lists were fairly short, the number of queries would only be 
reduced a little.  Querying the ignore list would actually result in a net 
increase in processing time as all tokens would need an ignore query and 
most tokens would also need the spamlist/goodlist queries.
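
A rough count makes the argument concrete.  The sketch below is only an
illustration, not bogofilter's code: it assumes each token costs one query
per list (ignore list, spamlist, goodlist), and the function names are made
up for this example.  With N tokens, of which k are on the ignore list, the
query count goes from 2N to N + 2(N - k) = 3N - 2k, which is worse unless
more than half of the tokens are ignored.

```python
def lookups_without_ignore(tokens):
    """Each token is looked up in the spamlist and in the goodlist."""
    return 2 * len(tokens)


def lookups_with_ignore(tokens, ignore):
    """Each token is first looked up in the ignore list; only tokens
    that are NOT ignored go on to the spamlist and goodlist lookups."""
    count = 0
    for token in tokens:
        count += 1              # ignore-list query, paid by every token
        if token not in ignore:
            count += 2          # spamlist + goodlist queries
    return count
```

For a message with four tokens and one ignored token, this gives 8 queries
without the ignore list and 10 with it.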

The conclusion was, if the purpose of the ignore list is to make bogofilter 
faster, it would probably be a dud.

On the other hand, the ignore list could be used for tokens that produce 
undesirable results.  The goal would be to nullify such tokens.  An 
alternative approach would be to extend the wordlist maintenance abilities 
to allow deleting specific tokens, e.g. "bogoutil -d spamlist.db -D 
delete_this_token"
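
The two approaches can be sketched side by side.  This is a minimal
illustration under stated assumptions, not existing bogofilter or bogoutil
behavior: the ignore list is taken to be a plain text file with one token
per line and '#' comments (so a token containing '#' would be cut short), a
Python dict stands in for a wordlist database, and all function names are
hypothetical.

```python
def load_ignore_list(lines):
    """Parse a user-edited ignore list: one token per line,
    '#' starts a comment, blank lines are skipped."""
    ignore = set()
    for line in lines:
        token = line.split("#", 1)[0].strip()
        if token:
            ignore.add(token)
    return ignore


def filter_tokens(tokens, ignore):
    """Nullify unwanted tokens by dropping them before they are
    counted or registered."""
    return [t for t in tokens if t not in ignore]


def delete_token(wordlist, token):
    """Alternative: remove the token from a wordlist (a dict of
    token -> count here).  Returns True if the token was present."""
    return wordlist.pop(token, None) is not None
```

Filtering leaves the wordlists untouched and can be undone by editing the
text file.  Deleting changes the stored counts, and the token can reappear
the next time a message containing it is registered.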

It might be a good idea for someone to complete the implementation so that
its usefulness could be tested :-)

David




