Paul Graham's spam-conference article

Greg Louis glouis at dynamicro.on.ca
Thu Jan 30 17:12:09 CET 2003


In his latest paper (a version of what he said at the spam conference),
Paul Graham remarks that he's tried zero HTML processing, 100% HTML
processing and a lot in between, and advises that looking at the a, img
and font tags is desirable and probably sufficient (this, of course, is
independent of the comments-and-bogus-tags issue).  Has any of our
lexperts considered doing something like that?  Would it be expensive?
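
To make the question concrete, here is a rough sketch in Python of what
"look only at a, img and font tags" might mean. It is not bogofilter's
lexer and not Paul's code; the class name, the function name and the
"tag*attr*value" token format are all made up for illustration, and
Python's standard html.parser stands in for a real lexer.

```python
from html.parser import HTMLParser

# Tags whose attributes are tokenized; everything else is ignored.
KEPT_TAGS = {"a", "img", "font"}


class SelectiveTagLexer(HTMLParser):
    """Hypothetical lexer: body text plus attributes of a/img/font only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag not in KEPT_TAGS:
            return
        for name, value in attrs:
            if value:
                # Assumed token format, chosen only for this sketch.
                self.tokens.append(f"{tag}*{name}*{value.lower()}")

    def handle_data(self, data):
        # Plain body text is split on whitespace and lowercased.
        self.tokens.extend(word.lower() for word in data.split())


def tokenize_html(text):
    lexer = SelectiveTagLexer()
    lexer.feed(text)
    lexer.close()
    return lexer.tokens
```

Fed something like '<p class="x">Buy <a href="http://Example.com/x">now</a></p>',
this keeps the words and the href but drops the p tag and its class
attribute. The cost is one extra set lookup per tag, which suggests it
need not be expensive, though a real lexer would have to cope with
malformed markup as well.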

Paul also mentions the min_dev thing; for himself, he still uses the
top 15 extrema (with a little fudging to favour well-known extrema),
but he says that if one does something like min_dev instead, it should
be set fairly high, because otherwise the spammers will just spoof us
by including lots of innocent words.  In the early days, I used to keep
min_dev at 0.4, but nowadays I find that discrimination degrades rather
badly with min_dev above about 0.3 (and pi and I have both had good
results with 0.025 (!) with the new releases of bogofilter).  This
difference between his advice and our experience is puzzling.
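
For anyone who wants to compare the two selection rules side by side,
here is a minimal sketch in Python. It assumes each token already has a
spam probability between 0 and 1, that "extrema" means the values
farthest from 0.5, and that min_dev is an inclusive cutoff on that
distance. The function names are hypothetical, and Paul's fudging to
favour well-known extrema is left out.

```python
def select_top_n(probs, n=15):
    """Keep the n probabilities farthest from 0.5 (top-n extrema)."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]


def select_min_dev(probs, min_dev=0.3):
    """Keep every probability at least min_dev away from 0.5."""
    return [p for p in probs if abs(p - 0.5) >= min_dev]


if __name__ == "__main__":
    # Made-up token probabilities, for illustration only.
    probs = [0.99, 0.01, 0.45, 0.6, 0.9, 0.51]
    print(select_top_n(probs, n=3))
    print(select_min_dev(probs, min_dev=0.3))
    print(select_min_dev(probs, min_dev=0.025))
```

With a low min_dev, mildly innocent words all get a vote, which is the
spoofing route Paul warns about; with a high one, only strong
indicators count, as with a fixed top 15.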

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |
| Help free our mailboxes. Include                   |
|        http://wecanstopspam.org in your signature. |



