Modularity

David Relson relson at osagesoftware.com
Mon Jan 13 16:12:26 CET 2003


At 09:34 AM 1/13/03, Adriano Nagelschmidt Rodrigues wrote:

>Matthias Andree writes:
> > Not quite. While -f (Fisher-Robinson) and -r (Robinson) share their
> > registration, -g (Graham) uses a different registration procedure, and
> > the filter algorithm is queried for parameters.
>
>Can they be made common (e.g., a superset of the information is stored in
>the dbs)?

The difference in registration is very small.  As a message is read, its 
words are collected in a hash table and duplicates are counted.  For 
graham, the count is capped at 4; for robinson and robinson-fisher, the 
count is capped at 1.  So wordlists generated with '-r' or '-f' will 
generally have lower counts than wordlists generated with '-g'.  When 
computing a word's spamicity score, what matters is the ratios.  For the 
goodlist and the spamlist, each word gets a value based on its count vs the 
message count (for that list).  The word's spamicity score is a ratio of 
the goodlist and spamlist values.  Given all the ratios involved, the 
effects of the caps (1 and 4) are pretty small.  It's not worth worrying about.
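To make the registration-and-ratio idea concrete, here is a minimal Python sketch. It is not bogofilter's code, which is written in C. The caps (4 for -g, 1 for -r and -f) come from the description above. Everything else is an assumption: the naive regex tokenizer, the simplified per-word score spam/(spam+good), and the names CAPS, register_message, Wordlist and spamicity. Bogofilter's real scoring adds further terms and then combines the per-word scores. This sketch only shows the per-word ratio.

```python
import re
from collections import Counter

# Per-message caps on a word's count, per the description above:
# Graham (-g) caps at 4, Robinson (-r) and Fisher-Robinson (-f) cap at 1.
# The dict itself is a hypothetical illustration.
CAPS = {"g": 4, "r": 1, "f": 1}


def register_message(text, algorithm):
    """Count each word in one message, capping duplicates per algorithm.

    The regex tokenizer is a simple stand-in, not bogofilter's lexer.
    """
    cap = CAPS[algorithm]
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    return {word: min(n, cap) for word, n in counts.items()}


class Wordlist:
    """Hypothetical goodlist/spamlist: word counts plus a message count."""

    def __init__(self):
        self.words = Counter()
        self.msgcount = 0

    def add(self, text, algorithm):
        # Counter.update with a mapping adds the capped counts.
        self.words.update(register_message(text, algorithm))
        self.msgcount += 1

    def value(self, word):
        # A word's value for this list: its count vs the list's message count.
        if self.msgcount == 0:
            return 0.0
        return self.words[word] / self.msgcount


def spamicity(word, good, spam):
    """Simplified ratio of the spamlist value to the sum of both values.

    Returns 0.5 (neutral) for a word seen in neither list.
    """
    g, b = good.value(word), spam.value(word)
    return b / (b + g) if (b + g) else 0.5
```

Capping lowers a word's counts in both lists. As long as its good and spam counts shrink by similar factors, the ratio barely moves, which is the point made above.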

>I may be asking naive questions, bogofilter is a black box to me.
>
> > OTOH, these programs share a lot of common code (like the tokenizer,
> > data base, and recently, MIME stuff), so that we'd exchange like 10%
> > of these programs with the other 90% being linked in unchanged.
>
>What about a libbogo.{so,a}?

Part of the build process _does_ create libbogofilter.a, which is then used 
in building bogofilter, bogoutil, and bogolexer.  It helps simplify the 
Makefile.  We don't distribute the library because there's no need to do so.

> > I wonder if that's worth the effort. It might make the man pages more
> > concise though ;-)
>
>Yes, for example, when I decided to add the '-f' switch last night, it wasn't
>clear to me if I would have compatibility problems with the existing ham/spam
>databases and if I needed to use it with [nNsS].

We're always interested in better documentation.  If you care to rewrite 
the sections that gave you trouble, send us patches.  We'll review them and 
use what's useful.

Ciao,

David





More information about the Bogofilter mailing list