
Barry Gould BarryGould at
Wed Jan 22 00:04:56 CET 2003

At 02:16 PM 1/21/2003, Greg Louis wrote:
>Nevertheless, I think there are sound arguments against what -u does,
>except if (1) you use it in binary mode like Boris wants, and (2) you
>go in and fix it very frequently with -S and -N as appropriate.

Hi Greg,

I absolutely agree with you that training on errors is most productive.

However, you seem to be repeatedly saying that using -u is _inherently_ BAD 
in a ternary system.

I don't see any reason why -u would be inherently harmful in a ternary 
system, especially if manual training on Unsures is also done. I do realize 
the databases may get slightly bigger that way, but my DBs are still under 
10MB each with 40,000 messages passed through.

Furthermore, I have some users who do not give me their uncaught spam, much 
less their Unsures. Therefore, I _cannot_ train on all Unsures unless I 
decide to cc them all to myself, which would be an invasion of everyone's 
privacy. In this case, I think it should be _helpful_ to use -u, so that bogofilter
does evolve even when I cannot train it manually.

>In ternary mode, -u discards the messages that are most valuable for
>training, and you train only on those messages that bogofilter already
>gets right

I don't understand what you mean by "discards".
With -p, I can still see the Unsure status in my MUA, and use those 
messages for manual training. Therefore, it hasn't discarded anything in 
any sense. Maybe you're not using -p?
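A toy model may help pin down what "discards" means here. The Python sketch below is hypothetical and is not bogofilter's algorithm: the `ToyFilter` class, its averaged add-one token score, and the cutoffs `SPAM_CUTOFF = 0.7` and `HAM_CUTOFF = 0.3` are all made up for illustration. It assumes that -u in ternary mode registers a message only when the verdict is spam or ham, so an Unsure message leaves the wordlists untouched but is still returned to the caller, much as -p leaves it visible in the MUA for manual training.

```python
from collections import Counter

# Hypothetical cutoffs for a ternary (spam / unsure / ham) verdict.
SPAM_CUTOFF = 0.7
HAM_CUTOFF = 0.3


class ToyFilter:
    """Hypothetical token-count filter; a stand-in, not bogofilter's algorithm."""

    def __init__(self):
        self.spam_words = Counter()  # token -> number of spam messages containing it
        self.ham_words = Counter()   # token -> number of ham messages containing it

    def score(self, words):
        # Toy score: mean per-token spam ratio with add-one smoothing.
        tokens = set(words)
        if not tokens:
            return 0.5
        total = 0.0
        for w in tokens:
            s = self.spam_words[w] + 1
            h = self.ham_words[w] + 1
            total += s / (s + h)
        return total / len(tokens)

    def classify(self, words):
        p = self.score(words)
        if p >= SPAM_CUTOFF:
            return "spam"
        if p <= HAM_CUTOFF:
            return "ham"
        return "unsure"

    def register(self, words, label):
        # Manual training (or auto-update) of one message as "spam" or "ham".
        target = self.spam_words if label == "spam" else self.ham_words
        target.update(set(words))

    def classify_and_autoupdate(self, words):
        # Assumed -u behaviour in ternary mode: register confident verdicts only.
        verdict = self.classify(words)
        if verdict != "unsure":
            self.register(words, verdict)
        return verdict


f = ToyFilter()
msg = ["cheap", "pills"]
print(f.classify_and_autoupdate(msg))  # unsure: nothing is registered
f.register(msg, "spam")                # manual training on the Unsure
f.register(msg, "spam")
print(f.classify_and_autoupdate(msg))  # spam: now auto-registered as well
```

In this model the Unsure message is not lost; it simply contributes nothing until someone registers it by hand. Whether that is acceptable depends on whether anyone actually does that manual training.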

>Bottom line was that training on unsures and errors is as
>effective as (and less laborious than) training by manually classifying
>every message.

Training on unsures and errors is just as laborious as anything else.
In fact, training on unsures is more work than just training on errors, at 
least in the short term. (I'm not saying it's not worthwhile.)

In summary, I don't understand why you feel -u is inherently harmful with 
ternary mode, and I have been quite confused by such comments in the past.

I'm not trying to start a "religious" debate, just trying to reduce confusion.

The only problem I've personally had with -u is that false positives AND 
false negatives MUST be corrected, or things will get worse, but that has 
nothing to do with whether one is using binary or ternary mode.
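The need to correct both kinds of error can be sketched the same way. In this hypothetical Python model, `ToyWordlists`, `register`, `unregister` and `correct` are invented names: `register` stands in for -s / -n and `unregister` for the -S / -N undo flags Greg mentions. The model assumes a correction consists of removing the message from the list that -u wrongly put it in and adding it to the right one; until that happens, the wrong list keeps counting the message's tokens, whichever mode is in use.

```python
from collections import Counter


class ToyWordlists:
    """Hypothetical spam/ham token counts; a stand-in for bogofilter's wordlists."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def register(self, words, label):
        # Analogue of -s (label="spam") or -n (label="ham").
        self.tokens[label].update(set(words))
        self.messages[label] += 1

    def unregister(self, words, label):
        # Analogue of -S / -N: undo one earlier registration of this message.
        counts = self.tokens[label]
        for w in set(words):
            counts[w] -= 1
            if counts[w] <= 0:
                del counts[w]
        self.messages[label] = max(0, self.messages[label] - 1)

    def correct(self, words, wrong, right):
        # Fix an error made by auto-update: move the message to the right list.
        self.unregister(words, wrong)
        self.register(words, right)


wl = ToyWordlists()
legit = ["invoice", "attached"]
wl.register(legit, "spam")        # auto-update wrongly filed a false positive
wl.correct(legit, "spam", "ham")  # the manual fix
print(wl.messages)                # {'spam': 0, 'ham': 1}
```

Nothing in this correction step depends on whether the filter produced a binary or a ternary verdict.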


More information about the Bogofilter mailing list