how to bogotune?
Trevor Smith
trevor at haligonian.com
Thu Sep 30 01:49:46 CEST 2004
On September 29, 2004 8:07 pm, David Relson wrote:
> Bogotune is a program that searches for the best combination it can
> find. To do this, it needs a bunch of spam and ham messages against
Right.
> minimum number of false negatives (spam not caught). For such a search
> to be meaningful, bogotune must be given at least 2000 ham and 2000 spam
> to use for the scoring tests.
OK, after reading your message, I *still* do not understand the wording of the
man page, nor do I know how to properly run bogotune. The man page says:
"...
In order to produce useful results, bogotune has minimum message count
requirements. The wordlist it uses must have at least 2,000 spam and
2,000 non-spam in it and the message files must contain at least 500
spam and 500 non-spam messages. Also, the ratio of spam to non-spam
..."
What are the "wordlist" and the "message files"? Are they the same thing?
Apparently not, since one needs 2,000 and one needs 500 each of spam and ham.
OK, so it needs 4,000+ messages, so you're referring to the "wordlist" from
the man page. Fine. So what are these "message files" that require only 500
spam and non-spam?
Furthermore, how could sending messages that bogofilter has already been
trained with give bogofilter any useful info? If I train bogofilter on a
message, it will (almost?) always then classify that message as
either .99something (if I told it it was spam) or .00something (if I told it
it was ham). So if bogotune gets 4,000 messages and every one it thinks is
ham scores 0.0000 and every one it thinks is spam scores 1.0000, it's going
to tell me:
hamcutoff: 0.000
spamcutoff: 1.000
isn't it? So this would imply that the 4,000 messages must be messages that
have NOT been trained on yet, so that some percentage of them will have
spamicity values varying throughout the range of 0.000 to 1.000. No?
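
To make the worry concrete, here is a minimal Python sketch. It is not how
bogotune works internally; the function name count_errors, the candidate
cutoffs, and all of the scores are made up for illustration. It only shows
that when every score sits at 0.0 or 1.0, any cutoff looks perfect, so a
search over cutoffs has nothing to choose between, while messages with
scores spread across the range do give it something to measure.

```python
def count_errors(ham_scores, spam_scores, spam_cutoff):
    """Return (false_positives, false_negatives) for one spam cutoff.

    A message is treated as spam when its score is >= spam_cutoff.
    This is an illustrative stand-in, not bogotune's real scoring test.
    """
    false_positives = sum(1 for s in ham_scores if s >= spam_cutoff)
    false_negatives = sum(1 for s in spam_scores if s < spam_cutoff)
    return false_positives, false_negatives


# Hypothetical scores for messages the filter was already trained on:
# everything is pinned at one extreme or the other.
trained_ham = [0.0] * 5
trained_spam = [1.0] * 5

# Hypothetical scores for messages the filter has NOT seen before:
# values are spread across the range and the two classes overlap.
fresh_ham = [0.02, 0.10, 0.35, 0.48, 0.61]
fresh_spam = [0.44, 0.58, 0.83, 0.95, 0.99]

cutoffs = [0.1, 0.3, 0.5, 0.7, 0.9]

trained_results = [count_errors(trained_ham, trained_spam, c) for c in cutoffs]
fresh_results = [count_errors(fresh_ham, fresh_spam, c) for c in cutoffs]

for cutoff, trained, fresh in zip(cutoffs, trained_results, fresh_results):
    print(f"cutoff {cutoff:.1f}: trained (fp, fn) = {trained}, fresh (fp, fn) = {fresh}")
```

With the trained scores every cutoff reports zero errors, so none is better
than any other. With the fresh scores the error counts change from cutoff to
cutoff, which is the kind of signal a tuning search would need.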
--
Trevor Smith // trevor at haligonian.com