print_stats
David Relson
relson at osagesoftware.com
Tue Oct 8 22:25:42 CEST 2002
At 04:05 PM 10/8/02, Eric Seppanen wrote:
>On Tue, Oct 08, 2002 at 03:57:27PM -0400, David Relson wrote:
> > At 03:54 PM 10/8/02, Eric Seppanen wrote:
> > >On Tue, Oct 08, 2002 at 11:21:04AM -0400, David Relson wrote:
> > > > Below are a couple of variations on formatting for the output of
> > > > print_stats(). They all display the word and its spamicity. They differ
> > > > in some small details:
> > >
> > >I don't want the stats in the message, I want it going to stderr with
> > >all the other verbose logging data. How can I get this?
> >
> > I think what you want is in function compute_spamicity() in bogofilter.c.
>
>Why have two pieces of code performing essentially the same action, but in
>subtly different ways?

compute_spamicity() can print each token, its spamicity, and the
cumulative spamicity for the computation. Because the extrema array doesn't
store all this information, print_stats() cannot print as much.

The debug info from compute_spamicity() was put there so that people could
better understand the results of the calculation. print_stats() is there
so that information can be written to stdout for inclusion in the email itself.

Remember the '-u' option. If it's used, there is only one chance to show
why the message was classified as spam or non-spam. After the
classification, the wordlists are updated, so a second run of bogofilter to
see the numbers would be using a different database.

compute_spamicity() could be given a file handle, e.g. stderr or stdout, as
a parameter. Then the one routine could be used for both displays. In
fact, the private version of bogofilter that I am running does this.
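
A minimal sketch of the file-handle idea, to make the suggestion concrete. It is not the actual bogofilter code: the token_stat record, the function name compute_spamicity_to(), and the Graham-style product formula are all assumptions chosen for illustration, and the real routine may combine probabilities differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-token record; not bogofilter's real data structure. */
typedef struct {
    const char *word;   /* the token text */
    double      prob;   /* per-token spamicity, expected in [0, 1] */
} token_stat;

/*
 * Combine per-token spamicities and optionally trace the computation.
 *
 * Assumed formula (Graham-style product, used here only as a stand-in):
 *     P = prod(p_i) / (prod(p_i) + prod(1 - p_i))
 *
 * If `out` is non-NULL, one line per token is written to it: the word,
 * its spamicity, and the cumulative spamicity so far. Passing stderr,
 * stdout, or NULL selects where (or whether) the trace appears.
 */
double compute_spamicity_to(const token_stat *toks, size_t n, FILE *out)
{
    double product = 1.0;      /* running prod(p_i) */
    double inv_product = 1.0;  /* running prod(1 - p_i) */
    double spamicity = 0.5;    /* neutral result when there are no tokens */

    assert(toks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        product *= toks[i].prob;
        inv_product *= 1.0 - toks[i].prob;

        double denom = product + inv_product;
        /* Fall back to neutral if both products have collapsed to zero. */
        spamicity = (denom > 0.0) ? product / denom : 0.5;

        if (out != NULL)
            fprintf(out, "%-20s %8.6f %8.6f\n",
                    toks[i].word, toks[i].prob, spamicity);
    }
    return spamicity;
}
```

With a routine shaped like this, calling it with stderr would put the trace alongside the other verbose logging, calling it with stdout would make it available for inclusion in the message, and calling it with NULL would compute the result silently. Because the trace is produced during the one classification pass, it reflects the wordlists as they were before any '-u' update.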