nan in bogofilter stats

David Relson relson at osagesoftware.com
Wed Nov 19 02:03:36 CET 2008


On Wed, 19 Nov 2008 10:47:25 +1030
Stephen Davies wrote:

> Thanks for the feedback David.
> 
> I get:
> 
> bogoutil -w wordlist.db .MSG_COUNT
>                                  spam   good
> .MSG_COUNT                     312870      1
> 
> What does this actually mean?
> 
> Cheers,
> Stephen

To compute a token's spamicity, bogofilter needs to know how
many spam and ham messages have been registered (in the
wordlist).  .MSG_COUNT is the special token that provides this info.
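
FWIW, here is a rough sketch of why those two message counts matter. It is
NOT bogofilter's actual code, only a simplified frequency ratio that I am
assuming for illustration; the function name and the numbers in the comments
are made up. It normalizes a token's spam and ham counts by the message
counts from .MSG_COUNT and treats the result as undefined when either message
count is zero.

```python
def raw_spamicity(spam_count, ham_count, spam_msgs, ham_msgs):
    """Simplified estimate of how spammy a token is (0.0 = ham, 1.0 = spam).

    Illustrative assumption only, not bogofilter's real scoring code.
    spam_msgs and ham_msgs play the role of the two .MSG_COUNT values.
    """
    # Without registered messages of both kinds the ratio is undefined.
    if spam_msgs <= 0 or ham_msgs <= 0:
        return float("nan")

    # Fraction of spam (ham) messages that contain the token.
    spam_freq = spam_count / spam_msgs
    ham_freq = ham_count / ham_msgs

    # A token never seen in either class carries no information.
    if spam_freq + ham_freq == 0:
        return float("nan")

    return spam_freq / (spam_freq + ham_freq)
```

With spam_msgs=312870 and ham_msgs=1, a token seen in 100 spam messages and
in the single ham message gets a value very close to 0, because one ham
occurrence counts as "present in 100% of ham".  That is the kind of skew a
lopsided .MSG_COUNT can produce.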

The numbers 312870 and 1 indicate that 312870 spam messages and 1 ham
message have been registered.  The value 312870 is reasonable while the
value 1 seems unreasonably low.

FWIW, "bogoutil -d wordlist.db > wordlist.txt" will dump your wordlist
as a text file.  Each line has a token, its spam and ham counts, and a
timestamp.  .MSG_COUNT's "good" value _should_ be at least as large as any
token's ham count.
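
If you'd rather not eyeball the dump, something like the following could do
the comparison for you.  It is only a sketch: it assumes every line of the
dump has the four whitespace-separated fields described above (token, spam
count, ham count, timestamp), and check_msg_count is a name I made up, not
part of bogofilter.

```python
def check_msg_count(lines):
    """Scan lines of a 'bogoutil -d' text dump.

    Returns (good_msg_count, offenders), where offenders lists the
    (token, ham_count) pairs whose ham count exceeds .MSG_COUNT's
    "good" value.
    """
    good_msgs = None
    rows = []
    for line in lines:
        # Split from the right so the last three fields are always
        # spam count, ham count and timestamp.
        parts = line.strip().rsplit(None, 3)
        if len(parts) != 4:
            continue  # blank or malformed line
        token, spam, ham, _stamp = parts
        try:
            int(spam)
            ham = int(ham)
        except ValueError:
            continue  # counts are not numeric, skip the line
        if token == ".MSG_COUNT":
            good_msgs = ham
        else:
            rows.append((token, ham))

    if good_msgs is None:
        raise ValueError("no .MSG_COUNT line found in the dump")

    offenders = [(tok, ham) for tok, ham in rows if ham > good_msgs]
    return good_msgs, offenders
```

To use it, open wordlist.txt and pass the file object to check_msg_count().
An empty offenders list means the counts are at least consistent with each
other; a long list suggests the "good" message count has been damaged.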

It might be time to start a new wordlist and register all the ham and
spam you have available.  I'd also recommend backing up your wordlist
periodically in case of future problems.  Lastly, switching from
non-transactional bogofilter to transactional bogofilter will give you a
more robust database environment, one that copes better with crashes.

HTH,

David


