no filtering without ham

jkinz at kinz.org
Sun Jan 18 16:47:20 CET 2009


On Sun, Jan 18, 2009 at 08:16:23AM -0500, David Relson wrote:
> This process works -- but will require user diligence because the
> error rate (number of false positives and negatives) will be
> significant.

Until bogo can ship pre-trained with its own initial databases of
spam and ham, experience suggests that the biggest issue in all of
this will be user education.

We all know the problem with pre-training bogo: one man's ham is
another man's spam, and this is likely to be insurmountable.

The user education issue is likewise insurmountable.

However, these two insurmountables become trivial if the
performance requirement is changed from:

"Never do harm" [the current choice]

to

"Make some kind of reasonable choice of spam/ham and do our 
best to inform the users about it and how they can override it"


My idea: add a second installation choice to the Bogo package.
This one would come pre-trained with a select population of
spam and ham. When installing this version, the user is
responsible for any retraining they need done.

This second choice could be done very simply by shipping the
spam and ham file sets as raw email collections, not bogo
databases, along with a script the user can run to train their
bogo install using them.
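
A minimal sketch of what such a training script might look like,
in Python. The corpus layout (a "spam" and a "ham" subdirectory,
one raw message per file) and the function names are assumptions
made for illustration, not anything bogofilter ships. The only
bogofilter behaviour relied on is that "-s" registers the message
on stdin as spam and "-n" registers it as non-spam.

```python
import subprocess
from pathlib import Path

# Hypothetical layout: <corpus>/spam/* and <corpus>/ham/*, one raw message per file.
# bogofilter flags: -s registers a message as spam, -n as non-spam.
FLAGS = {"spam": "-s", "ham": "-n"}


def plan_training(corpus_dir):
    """Return (flag, path) pairs for every message file in the corpus."""
    plan = []
    for kind, flag in FLAGS.items():
        for path in sorted(Path(corpus_dir, kind).glob("*")):
            if path.is_file():
                plan.append((flag, path))
    return plan


def train(corpus_dir, bogofilter="bogofilter"):
    """Feed each message to bogofilter on stdin; return how many were registered."""
    plan = plan_training(corpus_dir)
    for flag, path in plan:
        with open(path, "rb") as msg:
            # check=True stops at the first message bogofilter fails on.
            subprocess.run([bogofilter, flag], stdin=msg, check=True)
    return len(plan)
```

Calling train() on the shipped corpus directory would be the "one
button"; nothing touches the user's word lists until they run it.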

This keeps the original "pure" install intact unless the user
runs that script.

Additionally, it keeps the new install delivery very simple and
gives the user a "push one button" mechanism to get the benefit
of it.

Another benefit - it acts as a nice smoke test.  If the training
doesn't work, then the install is broken.
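
One way the smoke test could be sketched in Python: classify one
message known to be spam and one known to be ham, and check the
verdicts. The function names are hypothetical. The sketch relies
on bogofilter's exit codes when classifying stdin (0 spam, 1
non-spam, 2 unsure, 3 error). Whether a given message is scored
decisively depends on the corpus and thresholds, so treating
"unsure" as a failure is an assumption here.

```python
import subprocess

# bogofilter exit codes when classifying: 0 = spam, 1 = non-spam, 2 = unsure.
# Anything else (3 is the documented error code) is reported as "error".
VERDICTS = {0: "spam", 1: "ham", 2: "unsure"}


def classify(message_path, bogofilter="bogofilter"):
    """Run bogofilter on one message and map its exit code to a verdict."""
    with open(message_path, "rb") as msg:
        result = subprocess.run([bogofilter], stdin=msg)
    return VERDICTS.get(result.returncode, "error")


def smoke_test(known_spam, known_ham, classify_fn=classify):
    """True if the known spam and known ham messages get the expected verdicts."""
    return classify_fn(known_spam) == "spam" and classify_fn(known_ham) == "ham"
```

The classify_fn parameter exists only so the check can be
exercised without a bogofilter install.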

Jeff Kinz


