[PATCH] -U option as the opposite of -u

David Relson relson at osagesoftware.com
Tue Jan 28 21:44:06 CET 2003


Greetings,

Having watched bogofilter for several months, I've come to believe that an 
error in classifying any one message is almost totally irrelevant to 
classifying subsequent messages.  Even errors in 10 messages probably don't 
matter.

A lot depends on the size of the wordlists, i.e. the number of messages 
that have gone into building them.  Each new message adds a _small_ amount 
of info.  When enough new messages are added, changes become visible.

I wouldn't worry about a few misclassified messages.  Bogofilter can be 
viewed as working by a majority vote.  Given an established (large) number of 
voters, a few additional voters will have little effect.
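
To put rough numbers on the "majority vote" picture, here is a minimal 
sketch.  It uses a simplified relative-frequency estimate for a single 
word, not bogofilter's actual Fisher computation, and every count in it 
is hypothetical.  It compares how far the estimate moves when ten good 
messages containing the word are wrongly registered as spam, first 
against a large wordlist and then against a small one.

```python
def spam_probability(spam_hits, spam_total, good_hits, good_total):
    """Simplified per-word spam estimate from relative frequencies.

    This is an illustrative stand-in, not bogofilter's real formula.
    """
    spam_freq = spam_hits / spam_total
    good_freq = good_hits / good_total
    return spam_freq / (spam_freq + good_freq)


def shift_after_errors(spam_hits, spam_total, good_hits, good_total, errors=10):
    """Change in the estimate after `errors` good messages that contain
    the word are mistakenly registered as spam."""
    before = spam_probability(spam_hits, spam_total, good_hits, good_total)
    after = spam_probability(
        spam_hits + errors, spam_total + errors, good_hits, good_total
    )
    return abs(after - before)


# Hypothetical counts: the same word proportions at two wordlist sizes.
large_shift = shift_after_errors(900, 10_000, 100, 10_000)
small_shift = shift_after_errors(9, 100, 1, 100)

print(f"large wordlist shift: {large_shift:.4f}")
print(f"small wordlist shift: {small_shift:.4f}")
```

With these made-up counts the shift against the large wordlist is far 
smaller than against the small one, which is the point about 
established voters.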

Using Fisher ternary classification, my practice is to put incorrectly 
classified messages into files named good.mmdd.hhmm.txt or 
spam.mmdd.hhmm.txt and Unsures into either unsure-good... or unsure-spam... 
and let a cronjob feed them to bogofilter each hour.  If I ran the cronjob 
either more frequently (say every minute) or less frequently (say once a 
day), computed spam scores might well differ by a _tiny_ amount because the 
info would be more or less current.  Frankly, I doubt that there's a noticeable 
difference between updates by the minute, hour, or day.  By the same token, 
I don't worry about an occasional classification error (by the human or by 
the program).
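
A minimal sketch of what such an hourly job could look like follows.  It 
is not the actual cronjob; the function names, the directory layout, 
the one-message-per-file assumption, and the deletion of each file 
after it is fed are all assumptions made for illustration.  It also 
assumes the messages were not auto-registered earlier, so a plain 
registration with -n (non-spam) or -s (spam) is enough.

```python
import subprocess
from pathlib import Path

# Assumed mapping from filename prefix to bogofilter registration flag:
# -n registers a message as non-spam, -s registers it as spam.
PREFIX_FLAGS = {
    "good.": "-n",
    "unsure-good": "-n",
    "spam.": "-s",
    "unsure-spam": "-s",
}


def registration_flag(filename):
    """Return the flag for a filename, or None if the name is not recognized."""
    for prefix, flag in PREFIX_FLAGS.items():
        if filename.startswith(prefix):
            return flag
    return None


def feed_directory(directory):
    """Feed each recognized .txt file to bogofilter on stdin, then delete it.

    Deleting after a successful run is an assumption, made so that a
    later run does not register the same message twice.
    """
    for path in sorted(Path(directory).glob("*.txt")):
        flag = registration_flag(path.name)
        if flag is None:
            continue  # leave unrelated files alone
        with path.open("rb") as message:
            subprocess.run(["bogofilter", flag], stdin=message, check=True)
        path.unlink()
```

A cron entry would then call feed_directory() on the holding directory 
once an hour; changing that interval only changes how soon the 
corrections reach the wordlists.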

The question I ask about '-U' is whether it contributes to bogofilter and 
whether it would be used.

David




