Modularity
David Relson
relson at osagesoftware.com
Mon Jan 13 16:12:26 CET 2003
At 09:34 AM 1/13/03, Adriano Nagelschmidt Rodrigues wrote:
>Matthias Andree writes:
> > Not quite. While -f and -r (Fisher-Robinson) and (Robinson) share their
> > registration, -g (Graham) uses a different registration procedure, and
> > the filter algorithm is queried for parameters.
>
>Can they be made common (eg a superset of the information is stored in the
>dbs)?
The difference in registration is very small. As a message is read, the
words are collected in a hash table and duplications are counted. For
graham, the count is capped at 4 and for robinson and robinson-fisher the
count is capped at 1. So wordlists generated with '-r' or '-f' will
generally have lower counts than wordlists generated with '-g'. When
computing a word's spamicity score, what matters is the ratios. For the
goodlist and the spamlist each word gets a value based on its count vs the
message count (for that list). The word's spamicity score is a ratio of
the goodlist and spamlist values. Given all the ratios involved, the
effect of the caps (1 and 4) is pretty small. It's not worth worrying about.
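
To make the registration and scoring steps concrete, here is a rough sketch in Python. It is not bogofilter's code. The names (register, build_wordlist, spamicity) are made up for illustration, and the formula b / (g + b) is only an assumed form of "a ratio of the goodlist and spamlist values"; the actual computation differs between the algorithms.

```python
from collections import Counter

def register(tokens, cap):
    """Count duplicate tokens in one message, capping each count at `cap`.

    cap=1 mimics the '-r'/'-f' registration described above,
    cap=4 mimics '-g'.
    """
    counts = Counter(tokens)
    return {word: min(n, cap) for word, n in counts.items()}

def build_wordlist(messages, cap):
    """Sum the capped per-message counts over a set of messages."""
    wordlist = Counter()
    for tokens in messages:
        wordlist.update(register(tokens, cap))
    return wordlist

def spamicity(word, goodlist, spamlist, good_msgs, spam_msgs):
    """Assumed illustrative score: each list's count is divided by that
    list's message count, then the two values are combined as a ratio."""
    g = goodlist.get(word, 0) / good_msgs
    b = spamlist.get(word, 0) / spam_msgs
    if g + b == 0:
        return None  # word never seen in either list
    return b / (g + b)

# Toy data, invented for this example only.
spam = [["free", "free", "free", "offer"], ["free", "money"]]
ham = [["meeting", "free", "lunch"], ["meeting", "notes"]]

for cap in (1, 4):
    spamlist = build_wordlist(spam, cap)
    goodlist = build_wordlist(ham, cap)
    score = spamicity("free", goodlist, spamlist, len(ham), len(spam))
    print(cap, spamlist["free"], goodlist["free"], score)
```

With only four messages a single repeated word can move the score noticeably; the sketch is meant to show the mechanics, not the size of the effect on real wordlists.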
>I may be asking naive questions, bogofilter is a blackbox to me.
>
> > OTOH, these programs share a lot of common code (like the tokenizer,
> > data base, and recently, MIME stuff), so that we'd exchange like 10
> > percent of these programs with the other 90% being linked in unchanged.
>
>What about a libbogo.{so,a}?
Part of the build process _does_ create libbogofilter.a which is then used
in building bogofilter, bogoutil, and bogolexer. It helps simplify the
Makefile. We don't distribute the library because there's no need to do so.
> > I wonder if that's worth the effort. It might make the man pages more
> > concise though ;-)
>
>Yes, for example, when I decided to add the '-f' switch last night, it wasn't
>clear to me if I would have compatibility problems with the existent ham/spam
>databases and if I needed to use it with [nNsS].
We're always interested in better documentation. If you care to rewrite
the sections that gave you trouble, send us patches. We'll review them and
use what's useful.
Ciao,
David