new writeup re varying Robinson's s and the minimum deviation

Greg Louis glouis at dynamicro.on.ca
Sat Mar 29 18:39:43 CET 2003


Hi.

The big experiment I had been wanting to do, long delayed by hardware
problems, has now been completed.  The report is at
http://www.bgl.nu/bogofilter/smindev.html

It appears to be a good idea to throw away tokens with f(w) between
0.15 and 0.85 (i.e. to set mindev somewhere around 0.35, so that tokens
within 0.35 of the neutral 0.5 are ignored), and to use an s value of
0.1 or thereabouts; the good news is that the peak is very flat, so the
exact values don't matter a whole lot.  It
remains to be determined, however, whether this finding is broadly
applicable; so far, all the email I've used in this type of experiment
has come from one source, the email server of the company I work for. 
I have enough mail of my own accumulated now that I can conduct an
experiment with that, which will be my next little project; but it
would be an extremely good thing if other people would do similar tests
with other (large) bodies of mail and report the results.  To
facilitate that, I've again included, in an appendix to the report, all
of the scripts I used to perform the experiment and reduce the data.
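
For anyone who wants a feel for what the two parameters do before
running a full test, here is a minimal sketch in Python.  It is not
bogofilter's code and not one of the scripts from the report.  It
assumes the usual form of Robinson's smoothing,
f(w) = (s*x + n*p(w)) / (s + n), where p(w) is the raw spam
probability of the token, n is the number of messages it was seen in,
and x is the value assumed for a token never seen before.  The value
x = 0.5, the function names and the sample tokens are all made up for
illustration.

```python
def robinson_f(p, n, s=0.1, x=0.5):
    """Smoothed token probability f(w) = (s*x + n*p) / (s + n).

    p: raw spam probability of the token (hypothetical input)
    n: number of messages the token was seen in
    s: weight given to the prior x (Robinson's s)
    x: assumed probability for an unseen token (0.5 is an assumption)
    """
    return (s * x + n * p) / (s + n)


def keep_token(fw, mindev=0.35):
    """True if f(w) lies at least mindev away from the neutral 0.5."""
    return abs(fw - 0.5) >= mindev


# Made-up tokens: name -> (raw probability p, message count n)
tokens = {
    "unsubscribe": (0.99, 40),
    "meeting": (0.05, 25),
    "the": (0.48, 900),
    "rare": (0.90, 0),   # never seen, so f(w) falls back to x
}

# Keep only the tokens whose f(w) deviates enough from 0.5
kept = {}
for word, (p, n) in tokens.items():
    fw = robinson_f(p, n)
    if keep_token(fw):
        kept[word] = fw

for word in sorted(kept):
    print(word, round(kept[word], 3))
```

With these made-up numbers only "unsubscribe" and "meeting" survive
the cut; "the" sits too close to 0.5, and "rare" has no data behind it,
so it stays at x.  A small s means the prior x is overridden after very
few sightings of a token.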

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |



