New version

Tom Anderson tanderso at oac-design.com
Wed Mar 17 04:59:30 CET 2004


On Tue, 2004-03-16 at 12:55, Greg Louis wrote:
> That's only true if every token's fw is within min_dev of 0.5.  If you
> have any unknowns and x is outside 0.5 +/- min_dev, it's not true.  But
> yes, if you want an even worse straw man than Tom's all-unknowns
> message ;) an all-0.5 message will be scored at robx and (in my case)
> classed as spam.

I wouldn't call it a straw man, as that implies it is false.  It is not
a false case, just a worst case.  And for me at least, the worst case
must be assumed likely, since false positives are intolerable.  I
prefer to permit no situation in which a valid ham could fall
through the cracks on a technicality such as this.  The all-unknown
and all-ambiguous (~0.5) cases are entirely possible, and quite likely
given a sufficiently large volume of email.  Therefore they must be
accounted for properly.  Of course, you may choose to ignore such cases,
but please do not advise less-informed users to do so unless they
understand the risk involved.

I feel that full training is not a practical option for most users,
especially in large deployments where users do not have ssh or terminal
access to the mail server.  In such cases, they will start with an empty
database or a minimal database, and therefore will necessarily receive
all-unknown and all-ambiguous emails.  Bogofilter would not be an option
if these were discarded, or even just buried in a spam box.
This claim is not so humble; it is a firm statement of the reality for me
and my users.  And it is my humble opinion that keeping robx within
min_dev of 0.5 serves to prevent false positives in these cases.  I'm not
the statistician that you may be, so please let me know if this
conclusion is wrong.
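
To make the two cases concrete, here is a rough sketch of the behaviour
being discussed.  It is not bogofilter's code: the function names and the
numeric settings are made up for illustration, and it combines token
scores with Robinson's geometric-mean formula as a stand-in.  The only
things taken from this thread are that tokens within min_dev of 0.5 are
ignored, that an unknown token gets fw = robx, and that a message with
no remaining tokens is scored at robx.

```python
import math

def score(fws, robx, min_dev):
    """Score a message from its per-token fw values (illustrative only)."""
    # Ignore tokens whose fw lies inside the band 0.5 +/- min_dev.
    kept = [fw for fw in fws if abs(fw - 0.5) >= min_dev]
    if not kept:
        # Nothing left to combine: the message is scored at robx.
        return robx
    n = len(kept)
    # Robinson's geometric-mean combination (an assumed stand-in).
    p = 1.0 - math.prod(1.0 - fw for fw in kept) ** (1.0 / n)
    q = 1.0 - math.prod(kept) ** (1.0 / n)
    return (1.0 + (p - q) / (p + q)) / 2.0

def classify(fws, robx, min_dev, spam_cutoff):
    """Return "spam" if the score reaches spam_cutoff, else "ham"."""
    return "spam" if score(fws, robx, min_dev) >= spam_cutoff else "ham"

# Hypothetical settings; every token in the message is unknown (fw = robx).
MIN_DEV = 0.05
SPAM_CUTOFF = 0.55

# robx outside the band and above the cutoff: the unknowns count as spammy.
print(classify([0.6] * 3, robx=0.6, min_dev=MIN_DEV, spam_cutoff=SPAM_CUTOFF))

# robx inside the band: the unknowns are ignored, score falls back to robx.
print(classify([0.52] * 3, robx=0.52, min_dev=MIN_DEV, spam_cutoff=SPAM_CUTOFF))

# All-0.5 message with robx above the cutoff: scored at robx.
print(classify([0.5] * 3, robx=0.6, min_dev=MIN_DEV, spam_cutoff=SPAM_CUTOFF))
```

In this sketch the outcome for an all-unknown or all-0.5 message comes
down to where robx sits relative to spam_cutoff.  Keeping robx within
min_dev of 0.5 only protects such a message when spam_cutoff is above
that band, as it is with the values chosen here.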

Out of tens of thousands of emails over the past few months, I've not
received a single false positive.  That's how it should be.  Bill
McClain boasted a 0.08% fp rate.  And while that sounds low at face value,
I think it is horrible.  When an email enters the filtered box, even if
still delivered, it's generally as good as dead.  If I received 5 false
positives a month, I'd stop using bogofilter.

Tom
