Inlining

David Relson relson at osagesoftware.com
Tue Jan 18 01:50:21 CET 2005


On Tue, 18 Jan 2005 01:09:12 +0100
Matthias Andree wrote:

> David Relson <relson at osagesoftware.com> writes:
> 
> > Inlining is fine.  Reflecting on your earlier messages, I realize you're
> > right that a "compute" function shouldn't be responsible for calling
> > "lookup". Likely there _is_ an appropriate higher level place for the
> > lookup call.  I'll take a look when I have time.
> 
> Most of what bogofilter does is transform data sets.
> 
> Top-down, we have:
> 
> 1. transform mail storage to list of messages
> 2. transform message to a list of tokens
> 3. transform list of tokens into list of probabilities (unless it is
>    message-count format input)
> 4. transform list of probabilities into a single spamicity
> 
> How a particular transformation looks in detail depends on what we get
> as input, but the output is (ideally) always the same.
> 
> Bogotune introduced some switches for different data structures, and I
> am not at all happy with the fBogotune switch that is buried deep in the
> code - it clouds what the functions are doing and why.
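
As a rough illustration of steps 3 and 4 in the list above, here is a
minimal C sketch in which the lookup is passed in from a higher level, so
the "compute" step never calls "lookup" itself.  Every name here
(lookup_fn, tokens_to_probs, probs_to_spamicity, demo_lookup) is
hypothetical and not taken from bogofilter's source, and the plain mean in
step 4 is only a stand-in for whatever combining method is actually used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not bogofilter's real API. */

/* A lookup maps one token to a spam probability in [0, 1]. */
typedef double (*lookup_fn)(const char *token);

/* Step 3: transform a list of tokens into a list of probabilities.
 * The caller chooses the lookup, so this function has no database
 * knowledge of its own. */
static void tokens_to_probs(const char *const *tokens, size_t n,
                            lookup_fn lookup, double *probs)
{
    assert(lookup != NULL);
    for (size_t i = 0; i < n; i++)
        probs[i] = lookup(tokens[i]);
}

/* Step 4: transform a list of probabilities into a single spamicity.
 * ASSUMPTION: an arithmetic mean is used purely as a placeholder. */
static double probs_to_spamicity(const double *probs, size_t n)
{
    double sum = 0.0;

    if (n == 0)
        return 0.5;             /* no evidence: stay neutral */
    for (size_t i = 0; i < n; i++)
        sum += probs[i];
    return sum / (double)n;
}

/* Stub lookup for demonstration only: tokens starting with 'v' are
 * treated as spammy, everything else as hammy. */
static double demo_lookup(const char *token)
{
    return (token[0] == 'v') ? 0.9 : 0.1;
}
```

With this split, message-count format input could skip step 3 and feed
step 4 directly, which is the kind of "same output, different input"
arrangement described above.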

There are some significant processing differences between bogofilter and
bogotune.  Given a group of messages, bogofilter iterates through the
group, processing one message at a time.  Bogotune, on the other hand,
loads them all into memory, splits them for training and tuning (if need
be), converts each message to an array of spam and ham scores (with
database lookups, if need be), then makes multiple passes over the whole
collection of messages.  Hopefully, I've included the major phases.
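
The contrast between the two processing styles might be sketched in C
roughly as follows.  This is a simplified illustration under stated
assumptions, not the actual code: msg_t, score_fn, classify_stream,
score_all, count_above and stub_score are all hypothetical names, and each
message is reduced to a single score rather than an array of spam and ham
scores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message type; a real message would carry tokens. */
typedef struct {
    double raw;                 /* stand-in for parsed content */
} msg_t;

typedef double (*score_fn)(const msg_t *msg);

/* bogofilter-like: a single pass, scoring and judging one message at a
 * time.  Returns how many messages scored at or above the cutoff. */
static size_t classify_stream(const msg_t *msgs, size_t n,
                              score_fn score, double cutoff)
{
    size_t spam = 0;

    assert(score != NULL);
    for (size_t i = 0; i < n; i++)
        if (score(&msgs[i]) >= cutoff)
            spam++;
    return spam;
}

/* bogotune-like, phase 1: convert every message to a score once, up
 * front (this is where any database lookups would happen). */
static void score_all(const msg_t *msgs, size_t n,
                      score_fn score, double *scores)
{
    assert(score != NULL);
    for (size_t i = 0; i < n; i++)
        scores[i] = score(&msgs[i]);
}

/* bogotune-like, phase 2: cheap repeated passes over the stored scores,
 * e.g. once per candidate parameter value. */
static size_t count_above(const double *scores, size_t n, double cutoff)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (scores[i] >= cutoff)
            hits++;
    return hits;
}

/* Stub scorer for demonstration only. */
static double stub_score(const msg_t *msg)
{
    return msg->raw;
}
```

In this shape the only code the two programs share is the scorer itself;
the looping strategy lives in the caller, so no flag needs to be tested
deep inside the scoring functions.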

Likely the code can be structured better to fit the above description.
If so, it'll be much cleaner.  I'll take a look, though it won't be
today.

David


