ignore lists [was: UI for correcting mistakes]
David Relson
relson at osagesoftware.com
Fri Mar 14 16:55:20 CET 2003
At 10:35 AM 3/14/03, Matthias Andree wrote:
>David Relson <relson at osagesoftware.com> writes:
>
> > There's a lot of valuable information in the headers. Preserving it is
> > important for training bogofilter. Whether it's better to forward or
> > attach is something I'm not sure of. It could depend on the MUA. Also,
> > I'd bet that having a 'forward' stanza is pretty harmless and unlikely
> > to skew results.
>
>At some time in the past, someone suggested ignore lists: tokens that,
>IMHO, would never be counted or registered. With the multiple-wordlists
>feature, this should be possible to implement without too much
>effort.
Matthias,
The implementation of ignore lists is incomplete. There _is_ some code in
bogofilter, but some issues were never resolved. For example, should the
ignore list be a simple text file that can be edited by the
user/sysadmin? Should it be a Berkeley DB list? In the latter case,
bogoutil would need modifications so it can load ignorelist.db from
the user-maintained text file.
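For the plain-text option, the file format could be as simple as one token per line. The sketch below is hypothetical: the comment and blank-line rules are assumptions, not something bogofilter defines, and `parse_ignore_list` / `load_ignore_file` are illustrative names. A Berkeley DB variant would still need something like this to read the user-maintained file before loading it.

```python
from typing import Iterable, Set


def parse_ignore_list(lines: Iterable[str]) -> Set[str]:
    """Parse a hypothetical user-editable ignore list, one token per line.

    Assumed format: surrounding whitespace is stripped, blank lines and
    lines starting with '#' are skipped.
    """
    tokens: Set[str] = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment or blank line
        tokens.add(line)
    return tokens


def load_ignore_file(path: str) -> Set[str]:
    """Read the ignore list from a text file (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return parse_ignore_list(fh)


sample = ["# tokens to ignore\n", "forward\n", "\n", "  Fwd  \n"]
print(sorted(parse_ignore_list(sample)))  # ['Fwd', 'forward']
```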
The last discussion of ignore lists was based on the premise that their
purpose was to speed up bogofilter by reducing database queries. Assuming
that ignore lists were fairly short, the number of queries would be
reduced only slightly. Querying the ignore list would actually result in a net
increase in processing time, since every token would need an ignore query and
most tokens would still need the spamlist/goodlist queries as well.
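The arithmetic behind that argument can be made explicit with a rough cost model. It assumes, as a simplification, that every lookup costs the same and that each non-ignored token needs exactly one spamlist and one goodlist query; the token counts are illustrative, not measurements.

```python
def queries_without_ignore(n_tokens: int) -> int:
    """Lookups per message when every token is checked in spamlist and goodlist."""
    return 2 * n_tokens


def queries_with_ignore(n_tokens: int, n_ignored: int) -> int:
    """Lookups per message when an ignore list is consulted first.

    Every token costs one ignore-list query; only the tokens that are not
    ignored go on to the spamlist/goodlist queries.
    """
    return n_tokens + 2 * (n_tokens - n_ignored)


# Illustrative numbers only: 1000 tokens in a message, 10 of them ignored.
print(queries_without_ignore(1000))   # 2000
print(queries_with_ignore(1000, 10))  # 2980
```

Solving n + 2(n - k) < 2n gives k > n/2: under this model the ignore list only saves queries if more than half of a message's tokens are on it, which a short list will never achieve.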
The conclusion was, if the purpose of the ignore list is to make bogofilter
faster, it would probably be a dud.
On the other hand, the ignore list could be used for tokens that produce
undesirable results. The goal would be to nullify such tokens. An
alternative approach would be to extend the wordlist maintenance abilities
to allow deleting specific tokens, e.g. "bogoutil -d spamlist.db -D
delete_this_token"
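The two ways of nullifying a troublesome token can be compared on a toy in-memory model. Everything here is hypothetical: wordlists are plain dicts mapping token to count, `lookup_counts` and `delete_token` are illustrative names, and `delete_token` only mimics what the proposed `-D` option might do; it is not bogoutil's actual behaviour.

```python
from typing import Collection, Dict, List, Tuple

# Hypothetical in-memory stand-in for a wordlist: token -> message count.
Wordlist = Dict[str, int]


def lookup_counts(tokens: List[str], spamlist: Wordlist, goodlist: Wordlist,
                  ignore: Collection[str] = ()) -> Dict[str, Tuple[int, int]]:
    """Return (spam_count, good_count) for every token not in `ignore`."""
    counts: Dict[str, Tuple[int, int]] = {}
    for tok in tokens:
        if tok in ignore:
            continue  # ignore-list approach: token is skipped before scoring
        counts[tok] = (spamlist.get(tok, 0), goodlist.get(tok, 0))
    return counts


def delete_token(wordlist: Wordlist, token: str) -> bool:
    """Deletion approach: drop `token` from one wordlist; True if it was there."""
    return wordlist.pop(token, None) is not None


spam = {"forward": 40, "viagra": 90}   # made-up counts
good = {"forward": 35, "meeting": 20}
msg = ["forward", "viagra", "meeting"]

# Approach 1: leave the wordlists alone and skip the token via an ignore list.
print(lookup_counts(msg, spam, good, ignore={"forward"}))
# {'viagra': (90, 0), 'meeting': (0, 20)}

# Approach 2: delete the token from both wordlists.
delete_token(spam, "forward")
delete_token(good, "forward")
print(lookup_counts(msg, spam, good)["forward"])  # (0, 0)
```

In this model the two approaches differ in persistence: after deletion the token merely looks unseen (0, 0) and would come back the next time a message containing it is registered, whereas an ignore list keeps it out regardless of later training. Deleting from only one list (as in the `-d spamlist.db` example) would leave its goodlist count in place.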
It might be a good idea for someone to complete the implementation so that its
usefulness can be tested :-)
David