Linus is dubious... but we get some praise
Greg Louis
glouis at dynamicro.on.ca
Sun Jan 25 13:25:41 CET 2004

There's an interesting thread on the Linux kernel mailing list --
off-topic for them but interesting for us.  A good starting point is
here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0401.3/0106.html
I particularly like:
 "The filter I use (bogofilter.sf.net) regularly catches and properly
 categorizes these spams."
Linus is skeptical, though:
 "Especially if the "random words" in the spam end up being weighted by
 real frequency, you just _cannot_ use single-word bayes filters on it."
Kevin O'Connor doesn't buy it:
 "A "random" word will not occur frequently enough in spam messages (when
 measured over a large sample of spam) to become a "spam" token or to
 affect adversely it becoming a "ham" token."
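To make Kevin's point concrete, here is a minimal Python sketch (not bogofilter's code) of the smoothed per-token estimate Gary Robinson proposed, the approach bogofilter's Robinson-style calculation is based on: f(w) = (s*x + n*p(w)) / (s + n), where p(w) = b/(b+g), b and g are the fractions of spam and ham containing the word, and n is the number of training messages containing it. The prior values s=1.0 and x=0.5, and all the corpus counts below, are hypothetical and chosen only for illustration.

```python
def robinson_f(spam_count, ham_count, n_spam, n_ham, s=1.0, x=0.5):
    """Smoothed spam probability for a single token.

    spam_count, ham_count: training messages of each kind containing the token.
    n_spam, n_ham: total spam and ham training messages.
    s, x: strength and value of the background prior (hypothetical defaults).
    """
    b = spam_count / n_spam  # fraction of spam containing the token
    g = ham_count / n_ham    # fraction of ham containing the token
    n = spam_count + ham_count
    if n == 0:
        return x  # token never seen: fall back to the prior
    p = b / (b + g)
    return (s * x + n * p) / (s + n)


# Hypothetical word found in 200 of 1000 hams.
clean = robinson_f(0, 200, 1000, 1000)
# Same word, after spammers also salt it into 20 of 1000 spams.
salted = robinson_f(20, 200, 1000, 1000)
print(f"{clean:.4f} {salted:.4f}")  # prints "0.0025 0.0928"
```

Even after the salting, the word's score stays far below 0.5, so over a large spam sample it remains a "ham" token rather than flipping to "spam".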
And then comes this jewel, highly recommended reading for
bogofilterers:
 "A lot of testing has been done on these filters (see for example
 http://spambayes.sourceforge.net/background.html) with a very large
 email corpus. If you haven't looked at bayes filters recently, or have
 only been looking at the simplistic ones, then I think you might have
 better luck trying again."
The spambayes page, besides being a good overview, has links to a few
of those invaluable discussion threads that I and others have been
mentioning over the last year.  Unfortunately, it then points only
generally to "the spambayes mailing list"
(http://mail.python.org/pipermail/spambayes/), in which one has to do
one's own spelunking.
-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |