how to bogotune?

Trevor Smith trevor at haligonian.com
Thu Sep 30 01:49:46 CEST 2004


On September 29, 2004 8:07 pm, David Relson wrote:

> Bogotune is a program that searches for the best combination it can
> find.  To do this, it needs a bunch of spam and ham messages against

Right.

> minimum number of false negatives (spam not caught).  For such a search
> to be meaningful, bogotune must be given at least 2000 ham and 2000 spam
> to use for the scoring tests.

OK, after reading your message, I *still* do not understand the wording of the 
man page, nor do I know how to properly run bogotune. The man page says:

"...
       In  order to produce useful results, bogotune has minimum message count
       requirements. The wordlist it uses must have at least  2,000  spam  and
       2,000  non-spam  in  it and the message files must contain at least 500
       spam and 500 non-spam messages. Also, the ratio  of  spam  to  non-spam
..."

What are the "wordlist" and the "message files"? Are they the same thing? 
Apparently not, since one needs 2,000 and one needs 500 each of spam and ham.

OK, so the 2,000 ham and 2,000 spam you mention add up to 4,000+ messages, 
which must be the "wordlist" from the man page. Fine. So what are these 
"message files" that require only 500 spam and 500 non-spam?
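
To pin down the two separate requirements as I currently read them, here is a 
minimal Python sketch. It assumes the "wordlist" means the messages already 
trained into bogofilter's database and the "message files" are a separate set 
of messages handed to bogotune; that reading is my guess, not something the 
man page excerpt states. The function name and labels are made up for 
illustration, and only the 2,000 and 500 figures come from the man page.

```python
# Minimums quoted in the bogotune man page excerpt above.
WORDLIST_MIN = 2000   # per class: messages already trained into the wordlist (assumed meaning)
TEST_FILE_MIN = 500   # per class: messages in the files given to bogotune (assumed meaning)


def check_minimums(wordlist_spam, wordlist_ham, test_spam, test_ham):
    """Return a list of unmet requirements; an empty list means all four are met."""
    counts = [
        ("wordlist spam", wordlist_spam, WORDLIST_MIN),
        ("wordlist non-spam", wordlist_ham, WORDLIST_MIN),
        ("test spam", test_spam, TEST_FILE_MIN),
        ("test non-spam", test_ham, TEST_FILE_MIN),
    ]
    problems = []
    for label, count, minimum in counts:
        if count < minimum:
            problems.append(f"{label}: {count} < {minimum}")
    return problems


if __name__ == "__main__":
    # Hypothetical counts, only to show the shape of the check.
    print(check_minimums(2500, 1800, 600, 500))
```

If that reading is right, the two numbers describe two different piles of 
mail, which is exactly what I would like confirmed.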

Furthermore, how could sending messages that bogofilter has already been 
trained with give bogofilter any useful info? If I train bogofilter on a 
message, it will (almost?) always then classify that message as 
either .99something (if I told it it was spam) or .00something (if I told it 
it was ham). So if bogotune gets 4,000 messages and every one it thinks is 
ham scores 0.0000 and every one it thinks is spam scores 1.0000, it's going 
to tell me:

hamcutoff: 0.000
spamcutoff: 1.000

isn't it? So this would imply that the 4,000 messages must be messages that 
have NOT been trained on yet, so that some percentage of them will have 
spamicity values varying throughout the range of 0.000 to 1.000. No?
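
What I have in mind is something like the following Python sketch, which sets 
aside part of a corpus before any training so that the set-aside messages are 
ones bogofilter has never seen. This is only my assumption about how the 
messages should be prepared, not a documented bogotune procedure; 
`split_holdout` is a hypothetical helper and the message names are 
placeholders.

```python
import random


def split_holdout(messages, test_count, seed=0):
    """Shuffle a copy of messages and return (train, test) with no overlap.

    `test` holds `test_count` messages that must never be used for training;
    `train` holds the rest.
    """
    if test_count > len(messages):
        raise ValueError("not enough messages for the requested test set")
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    return shuffled[test_count:], shuffled[:test_count]


if __name__ == "__main__":
    # Placeholder identifiers standing in for real spam messages.
    spam = [f"spam-{i}" for i in range(2500)]
    train, test = split_holdout(spam, 500)
    print(len(train), len(test))
```

With 2,500 spam this leaves 2,000 for training and 500 untouched for scoring, 
which would line up with the two minimums in the man page, and the same split 
would be done for the ham.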

-- 
Trevor Smith // trevor at haligonian.com


