Linus is dubious... but we get some praise
Greg Louis
glouis at dynamicro.on.ca
Sun Jan 25 13:25:41 CET 2004
There's an interesting thread on the Linux kernel mailing list --
off-topic for them but interesting for us. A good starting point is
here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0401.3/0106.html
I particularly like:
"The filter I use (bogofilter.sf.net) regularly catches and properly
categorizes these spams."
Linus is skeptical, though:
"Especially if the "random words" in the spam end up being weighted by
real frequency, you just _cannot_ use single-word bayes filters on it."
Kevin O'Connor doesn't buy it:
"A "random" word will not occur frequently enough in spam messages (when
measured over a large sample of spam) to become a "spam" token or to
affect adversely it becoming a "ham" token."
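
To make that argument concrete, here is a rough sketch in Python. It
assumes a Robinson-style smoothed token score, f(w) = (s*x + n*p) / (s + n);
the function name, the corpus sizes, the word counts and the values of s
and x are all made up for illustration. They are not bogofilter's code or
its defaults.

```python
def token_score(spam_count, ham_count, n_spam, n_ham, s=1.0, x=0.5):
    """Smoothed spamicity of one token (Robinson-style; illustrative only).

    spam_count / ham_count: messages of each kind containing the token.
    n_spam / n_ham: total messages of each kind in the training corpus.
    s: strength given to the prior; x: score assumed for an unseen token.
    Both s and x are hypothetical values chosen for this sketch.
    """
    n = spam_count + ham_count
    if n == 0:
        return x  # never seen: fall back to the neutral prior
    spam_freq = spam_count / n_spam
    ham_freq = ham_count / n_ham
    p = spam_freq / (spam_freq + ham_freq)  # raw estimate from the counts
    return (s * x + n * p) / (s + n)        # pulled toward x when n is small


# Hypothetical corpus: 10000 spam and 10000 ham messages.
# A "random" dictionary word shows up a handful of times on each side.
random_word = token_score(spam_count=3, ham_count=2, n_spam=10000, n_ham=10000)

# A word that really is characteristic of spam shows up constantly.
spammy_word = token_score(spam_count=1200, ham_count=3, n_spam=10000, n_ham=10000)

print(f"random word: {random_word:.3f}")
print(f"spammy word: {spammy_word:.3f}")
```

With counts like these, the rare random word stays close to the neutral
value while the genuinely spammy word scores close to 1. A filter that
ignores tokens scoring near 0.5 would then give the random word little or
no weight.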
And then comes this jewel, highly recommended reading for
bogofilterers:
"A lot of testing has been done on these filters (see for example
http://spambayes.sourceforge.net/background.html) with a very large
email corpus. If you haven't looked at bayes filters recently, or have
only been looking at the simplistic ones, then I think you might have
better luck trying again."
The spambayes page, besides being a good overview, links to a few
of the invaluable discussion threads that I and others have been
mentioning over the last year. Unfortunately, beyond those it points
only to "the spambayes mailing list" in general
(http://mail.python.org/pipermail/spambayes/), where one has to do
one's own spelunking.
--
| G r e g L o u i s | gpg public key: 0x400B1AA86D9E3E64 |
| http://www.bgl.nu/~glouis | (on my website or any keyserver) |
| http://wecanstopspam.org in signatures helps fight junk email. |