[PATCH] -U option as the opposite of -u
David Relson
relson at osagesoftware.com
Tue Jan 28 21:44:06 CET 2003
Greetings,
Having watched bogofilter for several months, I've come to believe that an
error in classifying any one message is almost totally irrelevant to
classifying subsequent messages. Even errors in 10 messages probably don't
matter.
A lot depends on the size of the wordlists, i.e. the number of messages
that have gone into building them. Each new message adds a _small_ amount
of info. When enough new messages are added, changes become visible.
I wouldn't worry about a few misclassified messages. Bogofilter can be
viewed as working by a majority vote. Given an established (large) number of
voters, a few additional voters will have little effect.
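
To put rough numbers on the majority-vote picture, here is a minimal sketch. It is NOT bogofilter's actual Fisher calculation: it assumes a simple per-token ratio of spam occurrences to all occurrences, and the function names are hypothetical. It only shows how the same handful of misfiled messages moves the estimate far less when the counts are large.

```python
def spam_probability(spam_count, good_count):
    """Simplified per-token estimate: share of occurrences seen in spam.

    Assumption: a plain ratio, not bogofilter's real scoring.
    """
    total = spam_count + good_count
    if total == 0:
        return 0.5  # no evidence either way
    return spam_count / total


def shift_after_errors(spam_count, good_count, misfiled):
    """Drop in the estimate if `misfiled` spam messages containing the
    token were wrongly registered as good."""
    before = spam_probability(spam_count, good_count)
    after = spam_probability(spam_count, good_count + misfiled)
    return before - after


# Same 10 errors, two wordlist sizes (counts are made up for illustration).
large_list_shift = shift_after_errors(900, 100, 10)
small_list_shift = shift_after_errors(9, 1, 10)
print(f"large wordlist: {large_list_shift:.4f}")
print(f"small wordlist: {small_list_shift:.4f}")
```

With the made-up counts above, the large wordlist moves by under one percentage point, while the small one swings heavily. That is the "few additional voters" effect.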
Using Fisher ternary classification, my practice is to put incorrectly
classified messages into files named good.mmdd.hhmm.txt or
spam.mmdd.hhmm.txt and Unsures into either unsure-good... or unsure-spam...
and let a cronjob feed them to bogofilter each hour. If I ran the cronjob
either more frequently (say every minute) or less frequently (say once a
day), computed spam scores might well differ by a _tiny_ amount because the
wordlist info would be more (or less) current. Frankly, I doubt that there's a noticeable
difference between updates by the minute, hour, or day. By the same token,
I don't worry about an occasional classification error (by the human or by
the program).
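
The hourly feeding job described above could be sketched roughly as follows. This is a hypothetical reconstruction, not the actual cron script. It assumes each queued file holds a single message, that bogofilter reads the message on stdin, that `-n` registers non-spam and `-s` registers spam, and that a file is deleted once it has been fed. The names `flag_for` and `feed_queue` are made up for illustration.

```python
import os
import subprocess

# Assumption: filename prefix -> bogofilter registration flag
# (-n registers a message as non-spam, -s registers it as spam).
PREFIX_TO_FLAG = {
    "good.": "-n",
    "spam.": "-s",
    "unsure-good": "-n",
    "unsure-spam": "-s",
}


def flag_for(filename):
    """Return the registration flag for a queued file, or None to skip it."""
    base = os.path.basename(filename)
    for prefix, flag in PREFIX_TO_FLAG.items():
        if base.startswith(prefix):
            return flag
    return None


def feed_queue(directory, run=subprocess.run):
    """Feed every recognized file in `directory` to bogofilter, then delete it.

    `run` is injectable so the function can be exercised without
    bogofilter installed. Returns a list of (filename, flag) pairs.
    """
    fed = []
    for name in sorted(os.listdir(directory)):
        flag = flag_for(name)
        if flag is None:
            continue  # not one of the queued message files
        path = os.path.join(directory, name)
        with open(path, "rb") as message:
            # Assumption: one message per file, passed on stdin.
            run(["bogofilter", flag], stdin=message, check=True)
        os.remove(path)  # assumption: a fed file is not needed again
        fed.append((name, flag))
    return fed
```

A cron entry would then call `feed_queue()` on the queue directory at whatever interval is chosen. The interval (minute, hour, day) only changes how soon corrections reach the wordlists.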
The question I ask about '-U' is whether it contributes to bogofilter and
whether it would be used.
David