print_stats

David Relson relson at osagesoftware.com
Tue Oct 8 22:25:42 CEST 2002


At 04:05 PM 10/8/02, Eric Seppanen wrote:
>On Tue, Oct 08, 2002 at 03:57:27PM -0400, David Relson wrote:
> > At 03:54 PM 10/8/02, Eric Seppanen wrote:
> > >On Tue, Oct 08, 2002 at 11:21:04AM -0400, David Relson wrote:
> > > > Below are a couple of variations on formatting for the output of
> > > > print_stats().  They all display the word and its spamicity.  They differ
> > > > in some small details:
> > >
> > >I don't want the stats in the message, I want it going to stderr with
> > >all the other verbose logging data.  How can I get this?
> >
> > I think what you want is in function compute_spamicity() in bogofilter.c.
>
>Why have two pieces of code performing essentially the same action, but in
>subtly different ways?

compute_spamicity() has the ability to print each token, its spamicity, and 
the cumulative spamicity for the computation.  As the extrema array doesn't 
store all this information, print_stats() cannot print as much.

The debug info from compute_spamicity() was put there so that people could 
better understand the results of the calculation.  print_stats() is there 
so that information can be written to stdout for inclusion in the email itself.

Remember the '-u' option.  If it's used, there is only one chance to show 
why the message was classified as spam or non-spam.  After the 
classification, the wordlists are updated, so a second run of bogofilter 
to see the numbers would be working from a different database.

compute_spamicity() could be given a file handle, e.g. stderr or stdout, as 
a parameter.  Then the one routine could be used for both displays.  In 
fact, the private version of bogofilter that I am running does this.
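
To make the file-handle idea concrete, here is a minimal sketch.  It is not 
the code from bogofilter.c or from the private version mentioned above.  The 
struct token_stat, the name compute_spamicity_sketch(), and the output 
format are all made up for illustration, and the combining formula is a 
plain Graham-style product used as a stand-in for the real calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record: one token and its per-token spam probability.
 * Not the real bogofilter data structure. */
struct token_stat {
    const char *word;
    double prob;    /* assumed to lie strictly between 0 and 1 */
};

/*
 * Sketch only: combine the per-token probabilities and, if `out` is
 * non-NULL, print each token, its spamicity, and the cumulative
 * spamicity to that stream.  The caller picks the destination.
 */
double compute_spamicity_sketch(const struct token_stat *toks, size_t n,
                                FILE *out)
{
    double product = 1.0;       /* running product of p_i */
    double invproduct = 1.0;    /* running product of (1 - p_i) */
    double spamicity = 0.5;     /* neutral result when there are no tokens */
    size_t i;

    assert(toks != NULL || n == 0);

    for (i = 0; i < n; i++) {
        product *= toks[i].prob;
        invproduct *= 1.0 - toks[i].prob;
        spamicity = product / (product + invproduct);
        if (out != NULL)
            fprintf(out, "%-20s %8.6f %8.6f\n",
                    toks[i].word, toks[i].prob, spamicity);
    }
    return spamicity;
}
```

With something along these lines, a caller would pass stderr to get the 
numbers with the verbose logging, stdout to get them into the message, or 
NULL to skip printing altogether.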



