bogotune problem

David Relson relson at osagesoftware.com
Thu Jan 1 22:33:43 CET 2004


On Fri, 2 Jan 2004 05:21:10 +0800
Bopolissimus Platypus <bopolissimus at sni.ph> wrote:

> hello all,
> 
...[snip]...

> Reading good
> Reading /home/tiger/.bogofilter/wordlist.db
> 7434 messages 
> Reading spam
> 2353 messages 
>     4m:59s for 9787 messages.  avg: 32.7 msg/sec
>     7m:34s for 9787 messages.  avg: 21.6 msg/sec
> The wordlist contains 36 non-spam and 20 spam messages.
> Bogotune must be run with at least 2000 of each.

"7434 messages" belongs to the "Reading good" line, i.e. your good
mailbox.  After starting to read that file, bogotune realizes it also
needs to read /home/tiger/.bogofilter/wordlist.db so it can save the
ham/spam counts for each token.  Those counts are needed to score every
message during the tuning process.

"2353 messages" belongs to the "Reading spam" line, i.e. your spam
mailbox.

"36 non-spam and 20 spam messages" refers to your wordlist.db.  If you
run "bogoutil -w $HOME/.bogofilter .MSG_COUNT" you should see those
numbers (20 and 36) again.

> which doesn't make sense, since clearly it found 7434 ham and 2353
> spam. when i do the same thing, except i specify the 43873 spam mbox,
> i still get the same error.

Bogotune is complaining that wordlist.db holds too little information
(only 36 non-spam and 20 spam messages) for it to operate.  Without a
good starting wordlist, the tuning results are meaningless.
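
To make the distinction concrete, here is a rough Python sketch of the
kind of check involved.  It is not bogotune's actual code: the function
name and the constant are invented for illustration, and the only things
taken from your output are the 2000-message threshold and the counts.

```python
# Hypothetical sketch only; bogotune's real check lives in its own source.
MIN_MESSAGES = 2000  # threshold quoted in bogotune's error message


def wordlist_is_usable(good_count: int, spam_count: int,
                       minimum: int = MIN_MESSAGES) -> bool:
    """Return True when the wordlist was trained on at least `minimum`
    non-spam messages and at least `minimum` spam messages."""
    return good_count >= minimum and spam_count >= minimum


# The check looks at the counts stored in wordlist.db,
# not at the sizes of the mailboxes given on the command line.
print(wordlist_is_usable(36, 20))      # counts reported for wordlist.db
print(wordlist_is_usable(7434, 2353))  # counts found in the two mailboxes
```

The first call prints False and the second prints True, which is why a
bigger spam mbox doesn't change the error: the mailboxes are large
enough, but the wordlist is not.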

Bogotune doesn't complain when you use "-D" because it uses some of the
input messages to create a wordlist (in RAM) and uses those counts for
the tuning run.

Hope this helps.

David



