Bogofilter Tuning Issues...

elijah elijah at riseup.net
Wed Apr 30 19:29:07 CEST 2003


On Wed, 30 Apr 2003, David Relson wrote:

> I'd recommend a two-part attack on database size.  First enable
> "ham_cutoff=0.1" in your bogofilter.cf file.  This will activate tristate
> mode in which messages are labeled as "Yes", "No", and "Unsure".  This will
> enable you to easily find the messages that bogofilter couldn't classify
> within its level of certainty (as determined by the spam_cutoff and
> ham_cutoff values).  Then manually train bogofilter with all the Unsures,
> as well as any false positives or false negatives that occur.
>
> Also, if you've been using '-u', I hope you've been checking the results
> and correcting any mistakes.  If you choose to follow the suggestion in the
> above paragraph, remove the "-u" from your procmail recipe.
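The tristate labeling described in the quote can be sketched as a simple threshold function. This is a minimal sketch, not bogofilter's code: ham_cutoff=0.1 comes from the quoted bogofilter.cf suggestion, while the spam_cutoff value of 0.95 and the inclusive boundary comparisons are assumptions made for illustration.

```python
# Assumed example values: ham_cutoff=0.1 is from the quoted suggestion,
# spam_cutoff=0.95 is a hypothetical value chosen for illustration.
SPAM_CUTOFF = 0.95
HAM_CUTOFF = 0.1


def classify(score: float,
             spam_cutoff: float = SPAM_CUTOFF,
             ham_cutoff: float = HAM_CUTOFF) -> str:
    """Map a spamicity score in [0, 1] to a tristate label.

    Assumes scores at or above spam_cutoff are spam and scores at or
    below ham_cutoff are ham; bogofilter's exact boundary handling may differ.
    """
    if score >= spam_cutoff:
        return "Yes"     # spam
    if score <= ham_cutoff:
        return "No"      # ham
    return "Unsure"      # between the cutoffs: a candidate for manual training


if __name__ == "__main__":
    for s in (0.02, 0.5, 0.99):
        print(s, classify(s))
```

Under this scheme, only the "Unsure" messages plus any misclassified "Yes" or "No" messages would be fed back by hand, which is what keeps the amount of training (and hence database growth) small.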

Ah, database size. Based on past posts, my understanding was that
training only on Unsures and training only on corrections make it
impossible to trim the database by removing old tokens. This is because
useful tokens that lead to correct classifications never have their
timestamps updated.

Am I correct in this understanding?
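The concern can be illustrated with a toy model. This is a hypothetical sketch, not bogofilter's database format or trimming tool: it assumes each token stores the "day" it was last written, that training rewrites that timestamp, and that trimming drops tokens not written within the last 30 days. A token that keeps classifying mail correctly is refreshed under train-on-everything but never touched under train-on-error, so an age-based trim removes it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    count: int = 0
    last_updated: int = 0  # hypothetical: day the token was last written


def train(db: dict[str, Token], tokens: set[str], day: int) -> None:
    """Register a message's tokens, refreshing their timestamps."""
    for t in tokens:
        entry = db.setdefault(t, Token())
        entry.count += 1
        entry.last_updated = day


def trim(db: dict[str, Token], cutoff_day: int) -> dict[str, Token]:
    """Hypothetical age-based trim: keep tokens written on or after cutoff_day."""
    return {t: e for t, e in db.items() if e.last_updated >= cutoff_day}


def simulate(train_on_everything: bool, days: int = 100) -> dict[str, Token]:
    db: dict[str, Token] = {}
    # Day 0: initial training teaches a strongly spammy token.
    train(db, {"viagra", "hello"}, day=0)
    for day in range(1, days + 1):
        message = {"viagra", f"noise{day}"}
        # Assume every message is classified correctly.
        if train_on_everything:
            train(db, message, day)
        # Under train-on-error, a correct classification writes nothing,
        # so the useful token's timestamp stays at day 0.
    return trim(db, cutoff_day=days - 30)


if __name__ == "__main__":
    print("train on everything keeps 'viagra':", "viagra" in simulate(True))
    print("train on error keeps 'viagra':", "viagra" in simulate(False))
```

In this model, the useful token survives trimming only when every message is trained; with train-on-error it looks "old" and is discarded even though it is still doing useful work.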

I am worried that a short-term way of keeping the database small could
result in a gradually growing database that cannot be trimmed by a
long-term method. Database size matters to me because I am working in
an ISP-type setting.

-elijah
