how to bogotune?

Trevor Smith trevor at haligonian.com
Thu Sep 30 04:30:19 CEST 2004


On September 29, 2004 9:26 pm, David Relson wrote:

> Let's work on this together, OK?  I'll explain and you let me know when
> it becomes clear.  Then we can work on fixing the man page so it's
> intelligible.  OK?

:-) Thanks. (Honestly, I sometimes do not seem like an imbecile.)

> For tuning, bogotune needs a wordlist with a decent amount
> of training history and it needs some additional (untrained) messages to
> run the tuning tests on.  Experience has shown that the wordlist needs
> the contents of 500 each spam and non-spam messages (or more) and that
> there also need to be 2000 each of spam and non-spam messages used for
> the tuning process.  Thus, in total, 5000 messages is the minimum

(picking up on the typo correction from your *next* email...)

That's the major clarification I needed (the 2000+2000 is for a *new* 
wordlist.db and the 500+500 should be untrained).
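
As a rough illustration of that split (not bogotune's own code; the function 
name and the use of plain message IDs are assumptions made up for this sketch), 
the 2000+2000 and 500+500 sets could be carved out of a larger corpus so that 
no message lands in both:

```python
import random

TRAIN_PER_CLASS = 2000  # messages of each class used to build the wordlist
TUNE_PER_CLASS = 500    # untrained messages of each class kept for tuning


def split_corpus(messages, train_n=TRAIN_PER_CLASS, tune_n=TUNE_PER_CLASS, seed=0):
    """Return (train, tune): two disjoint random samples of one class."""
    needed = train_n + tune_n
    if len(messages) < needed:
        raise ValueError(f"need at least {needed} messages, got {len(messages)}")
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_n], shuffled[train_n:needed]


# Hypothetical message identifiers standing in for real mail files.
spam = [f"spam-{i}" for i in range(3000)]
ham = [f"ham-{i}" for i in range(2500)]

spam_train, spam_tune = split_corpus(spam)
ham_train, ham_tune = split_corpus(ham)

total = len(spam_train) + len(spam_tune) + len(ham_train) + len(ham_tune)
print(total)  # 5000
```

The train halves would go through bogofilter into the wordlist; the tune 
halves would be handed to bogotune without ever being registered.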

One more clarification (it may have been covered somewhere, but I'm forgetting 
if I've ever seen a definitive answer): can I just use the wordlist.db I 
already have? Or is it preferable to pick 2000+2000 spam/ham that have not 
yet been trained on to use as the new/temporary wordlist? Since I already 
have a wordlist with 10,000 or 20,000 messages in it, it seems unnecessary to 
build a new, smaller wordlist.db. Unless it's an issue of having a correct 
ratio of ham/spam in the wordlist.db, which I couldn't begin to guess at...

Until told definitively otherwise, I am still assuming that one of two things 
caused bogotune to give me my 0.000 and 0.000 recommended thresholds: either 
the above (unbalanced spam/ham counts fed through bogofilter into the 
wordlist), or the fact that the new spam/ham I fed into bogotune had already 
been trained on. My wordlist.db has had tons of emails fed through it over 
the months, and I fed ~1000 each of spam/ham into bogotune.
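
A quick way to tell those two suspicions apart might look like the sketch 
below. It is only a sanity check written for this message, not something 
bogotune does; the function name, the idea of tracking trained message IDs, 
and the sample numbers are all assumptions.

```python
def check_tuning_inputs(trained_ids, tune_ids, spam_trained, ham_trained):
    """Report overlap between trained and tuning messages, and the spam/ham ratio."""
    trained = set(trained_ids)
    # Tuning messages that were already registered in the wordlist.
    overlap = [m for m in tune_ids if m in trained]
    ratio = spam_trained / ham_trained if ham_trained else float("inf")
    return {
        "already_trained": len(overlap),
        "fraction_already_trained": len(overlap) / len(tune_ids) if tune_ids else 0.0,
        "spam_to_ham_ratio": ratio,
    }


# Hypothetical numbers: 1000 tuning messages, all of them already in the wordlist.
trained_ids = [f"msg-{i}" for i in range(20000)]
tune_ids = [f"msg-{i}" for i in range(1000)]
report = check_tuning_inputs(trained_ids, tune_ids, spam_trained=15000, ham_trained=5000)
print(report["already_trained"], report["spam_to_ham_ratio"])  # 1000 3.0
```

A high overlap would point at the "already trained on" explanation, while a 
lopsided ratio with little overlap would point at the unbalanced wordlist.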


-- 
Trevor Smith // trevor at haligonian.com


