Paul Graham's spam-conference article
Greg Louis
glouis at dynamicro.on.ca
Thu Jan 30 17:12:09 CET 2003
In his latest paper (a version of what he said at the spam conference),
Paul Graham remarks that he's tried zero HTML processing, 100% HTML
processing and a lot in between, and advises that looking at the a, img and
font tags is desirable and probably sufficient (this, of course, is
independent of the comments-and-bogus-tags issue). Has any of our
lexperts considered doing something like that? Would it be expensive?
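
To make the question concrete, here is a rough sketch of what "look only at a, img and font tags" could mean. It is not bogofilter's lexer and not Paul's code; it uses Python's standard html.parser, and the names SelectiveTagTokenizer and tokenize and the "tag*attr*word" token format are made up for illustration. Body text is tokenized as usual, attributes of the three named tags become tagged tokens, and every other tag (and every comment) is dropped.

```python
from html.parser import HTMLParser
import re

# The three tags named in the post; everything else is ignored.
KEPT_TAGS = {"a", "img", "font"}
# Hypothetical, deliberately simple notion of a "word".
WORD_RE = re.compile(r"[A-Za-z0-9]+")


class SelectiveTagTokenizer(HTMLParser):
    """Collect tokens from text plus the attributes of a, img and font tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased.
        if tag not in KEPT_TAGS:
            return
        for name, value in attrs:
            if value is None:
                # Attribute without a value, e.g. <img ismap>.
                self.tokens.append(f"{tag}*{name}")
                continue
            for word in WORD_RE.findall(value):
                self.tokens.append(f"{tag}*{name}*{word.lower()}")

    def handle_data(self, data):
        # Ordinary body text between tags.
        self.tokens.extend(w.lower() for w in WORD_RE.findall(data))


def tokenize(html_text):
    parser = SelectiveTagTokenizer()
    parser.feed(html_text)
    parser.close()
    return parser.tokens


sample = '<p class="x">Buy <a href="http://spam.example/now">now</a></p>'
print(tokenize(sample))
```

Since this is a single pass over the message with a set lookup per tag, the extra cost in a sketch like this is small; whether the same holds inside a flex-based lexer is a separate question.
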
Paul also mentions the min_dev thing; for himself, he still uses the
top 15 extrema (with a little fudging to favour well-known extrema),
but he says that if one does something like min_dev instead, it should
be set fairly high, because otherwise the spammers will just spoof us
by including lots of innocent words. In the early days, I used to keep
min_dev at 0.4, but nowadays I find that discrimination degrades rather
badly with min_dev above about 0.3 (and pi and I have both had good
results with 0.025 (!) with the new releases of bogofilter). This
difference between Paul's advice and our experience is puzzling.
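
For anyone who wants to play with the two selection rules side by side, here is a minimal sketch. It covers only the choice of which tokens get scored, not how bogofilter or Paul's filter combines them, and it leaves out Paul's fudging in favour of well-known extrema. The function names and the word probabilities are invented for illustration.

```python
def deviation(p):
    """Distance of a token's spam probability from the neutral value 0.5."""
    return abs(p - 0.5)


def select_top_extrema(probs, n=15):
    """Top-n rule: keep the n tokens whose probabilities lie farthest from 0.5."""
    return sorted(probs, key=lambda tok: deviation(probs[tok]), reverse=True)[:n]


def select_min_dev(probs, min_dev):
    """min_dev rule: keep every token at least min_dev away from 0.5."""
    return [tok for tok, p in probs.items() if deviation(p) >= min_dev]


# Made-up per-token spam probabilities, for illustration only.
probs = {
    "viagra": 0.99,
    "free": 0.90,
    "meeting": 0.10,
    "the": 0.50,
    "hello": 0.45,
    "patch": 0.02,
}

print(select_top_extrema(probs, n=2))
print(select_min_dev(probs, 0.3))
print(select_min_dev(probs, 0.025))
```

With a low min_dev, mildly innocent words such as "hello" are admitted to the calculation, which is the opening Paul warns spammers could exploit by padding a message; with a high min_dev or a fixed top-n cut they are ignored.
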
--
| G r e g L o u i s | gpg public key: |
| http://www.bgl.nu/~glouis | finger greg at bgl.nu |
| Help free our mailboxes. Include |
| http://wecanstopspam.org in your signature. |