Linus is dubious... but we get some praise

Greg Louis glouis at dynamicro.on.ca
Sun Jan 25 13:25:41 CET 2004


There's an interesting thread on the linux kernel mailing list --
off-topic for them but interesting for us.  A good starting point is
here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0401.3/0106.html

I particularly like:

 "The filter I use (bogofilter.sf.net) regularly catches and properly
 categorizes these spams."

Linus is skeptical, though:

 "Especially if the "random words" in the spam end up being weighted by
 real frequency, you just _cannot_ use single-word bayes filters on it."

Kevin O'Connor doesn't buy it:

 "A "random" word will not occur frequently enough in spam messages (when
 measured over a large sample of spam) to become a "spam" token or to
 affect adversely it becoming a "ham" token."
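
Kevin's point can be sketched with a toy calculation.  This is only an
illustration, not bogofilter's actual code: it assumes a Robinson-style
smoothed per-token score, and the function name, the corpus sizes, the
counts and the s/x values are all made up for the example.

```python
# Illustrative sketch only: hypothetical counts and parameters, not
# taken from bogofilter or from the thread.

def token_score(spam_count, ham_count, n_spam, n_ham, s=1.0, x=0.5):
    """Smoothed spam probability for one token (Robinson-style).

    spam_count / ham_count: messages of each kind containing the token
    n_spam / n_ham:         total messages of each kind in the corpus
    s:                      weight given to the prior (assumed value)
    x:                      score assumed for a never-seen token
    """
    n = spam_count + ham_count
    if n == 0:
        return x  # no evidence at all: fall back to the prior
    spam_freq = spam_count / n_spam
    ham_freq = ham_count / n_ham
    p = spam_freq / (spam_freq + ham_freq)
    # Blend the raw estimate with the prior; few sightings => near x.
    return (s * x + n * p) / (s + n)

N_SPAM, N_HAM = 10000, 10000

# An ordinary word a spammer happened to pick at random: it shows up in
# a handful of spams, and about as often in ham.
random_word = token_score(3, 4, N_SPAM, N_HAM)

# A word that really is characteristic of spam.
spammy_word = token_score(900, 2, N_SPAM, N_HAM)

print(f"random word: {random_word:.4f}")
print(f"spammy word: {spammy_word:.4f}")
```

With these made-up numbers the random word stays close to neutral while
the genuinely spammy word scores very high, which is the behaviour
Kevin describes.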

And then comes this jewel, highly recommended reading for
bogofilterers:

 "A lot of testing has been done on these filters (see for example
 http://spambayes.sourceforge.net/background.html) with a very large
 email corpus. If you haven't looked at bayes filters recently, or have
 only been looking at the simplistic ones, then I think you might have
 better luck trying again."

The spambayes page, besides being a good overview, links to a few
of the invaluable discussion threads that I and others have been
mentioning over the last year.  Unfortunately, beyond those it only
points to the spambayes mailing list archive in general
(http://mail.python.org/pipermail/spambayes/), where one has to do
one's own spelunking.

-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |



