how to bogotune?
Trevor Smith
trevor at haligonian.com
Thu Sep 30 04:30:19 CEST 2004
On September 29, 2004 9:26 pm, David Relson wrote:
> Let's work on this together, OK? I'll explain and you let me know when
> it becomes clear. Then we can work on fixing the man page so it's
> intelligible. OK?
:-) Thanks. (Honestly, I sometimes do not seem like an imbecile.)
> For tuning, bogotune needs a wordlist with a decent amount
> of training history and it needs some additional (untrained) messages to
> run the tuning tests on. Experience has shown that the wordlist needs
> the contents of 500 each spam and non-spam messages (or more) and that
> there also need to be 2000 each of spam and non-spam messages used for
> the tuning process. Thus, in total, 5000 messages is the minimum
(picking up on the typo correction from your *next* email...)
That's the major clarification I needed (the 2000+2000 is for a *new*
wordlist.db and the 500+500 should be untrained).
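To spell out the arithmetic as I now understand it, here is a minimal Python sketch. The constants come from the corrected figures in this thread; the function name and the dictionary keys are hypothetical, and nothing here is part of bogofilter or bogotune itself.

```python
# Minimum corpus sizes as discussed in this thread (assumed from the
# corrected figures; not read from bogotune itself).
MIN_WORDLIST_EACH = 2000  # spam and ham already trained into wordlist.db
MIN_TUNING_EACH = 500     # untrained spam and ham handed to bogotune

# 2000 + 2000 for the wordlist, plus 500 + 500 for the tuning run.
MIN_TOTAL = 2 * MIN_WORDLIST_EACH + 2 * MIN_TUNING_EACH


def corpus_shortfalls(trained_spam, trained_ham, tuning_spam, tuning_ham):
    """Return how many more messages each corpus needs (empty dict if none)."""
    needed = {
        "trained spam": MIN_WORDLIST_EACH - trained_spam,
        "trained ham": MIN_WORDLIST_EACH - trained_ham,
        "tuning spam": MIN_TUNING_EACH - tuning_spam,
        "tuning ham": MIN_TUNING_EACH - tuning_ham,
    }
    # Keep only the corpora that fall short of the minimum.
    return {name: gap for name, gap in needed.items() if gap > 0}


if __name__ == "__main__":
    print(MIN_TOTAL)
    print(corpus_shortfalls(1500, 2000, 500, 100))
```

This only checks counts. It says nothing about whether the tuning messages were already trained on, which is the other condition mentioned above.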
One more clarification (it may have been covered somewhere, but I'm forgetting
if I've ever seen a definitive answer): can I just use the wordlist.db I
already have? Or is it preferable to pick 2000+2000 spam/ham that have not
yet been trained on to use as the new/temporary wordlist? Since I already
have a wordlist with 10,000 or 20,000 messages in it, it seems unnecessary to
build a new, smaller wordlist.db. Unless it's an issue of having a correct
ratio of ham/spam in the wordlist.db, which I couldn't begin to guess at...
Until told definitively otherwise, I am still assuming that bogotune gave me
recommended thresholds of 0.000 and 0.000 for one of two reasons: either the
above (unbalanced spam/ham counts fed through bogofilter into the wordlist),
or the fact that the new spam/ham I fed into bogotune had already been
trained on. My wordlist.db has had tons of emails fed through it over the
months, and I fed ~1000 each of spam/ham into bogotune.
--
Trevor Smith // trevor at haligonian.com