question about new spam encoding

Tom Anderson tanderso at oac-design.com
Thu Nov 20 15:21:40 CET 2003


On Wed, 2003-11-19 at 19:13, David Relson wrote:
> Tokens are limited to 30 chars, so long URLs are excluded :-(

That sounds dangerous... maybe we should make an exception for URLs
only?  It seems to me that URLs are among the most important tokens we
can use.  At a minimum, we should break them up and record the domain,
leaving off the query-string junk and maybe the subdomain.  BTW,
www.quick-home-loan-search.biz is only 30 characters, and
quick-home-loan-search.biz is only 26, so both would fit within the
current limit if broken up.
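Here is a minimal Python sketch of the "break it up" idea, for illustration only. It is not bogofilter's lexer (which is written in C), and the helper name url_tokens is hypothetical. It keeps the hostname, drops the scheme, path and query string, and optionally strips the leftmost label as a naive stand-in for "leave off the subdomain". A real implementation would need something like the Public Suffix List to find the registered domain reliably.

```python
from urllib.parse import urlsplit

MAX_TOKEN_LEN = 30  # the current token length limit mentioned in this thread


def url_tokens(url: str, max_len: int = MAX_TOKEN_LEN):
    """Hypothetical helper: reduce a URL to host-based tokens.

    Drops the scheme, path and query string, then emits the full hostname
    and (if it has more than two labels) the hostname with its leftmost
    label removed. The "drop one label" rule is an assumption: it does not
    consult the Public Suffix List, so e.g. example.co.uk is mishandled.
    """
    host = urlsplit(url).hostname  # lowercased, without port or userinfo
    if not host:
        return []
    candidates = [host]
    labels = host.split(".")
    if len(labels) > 2:
        candidates.append(".".join(labels[1:]))
    # Keep only candidates that fit within the token length limit.
    return [t for t in candidates if len(t) <= max_len]
```

For example, url_tokens("http://www.quick-home-loan-search.biz/apply?id=12345&ref=xyz") keeps both the 30-character and the 26-character host tokens. A hostname whose shortened form is still over 30 characters yields no tokens at all.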

Chances are, spammers are going to use the same domain for a while since
it's an investment, so that's the ideal spam indicator.  It's at least
as important as any other two tokens, so let's give it two tokens' worth
of characters and make the limit 60.

Otherwise, you'll be getting URLs like:
http://haha.imaspammer.you-loser-cant-bogofilter-my-emails.com
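To check the proposed 60-character allowance against that example, here is a hypothetical length filter in Python. keep_token and both constants are illustrative names, not bogofilter options. It simply applies a separate, doubled limit to tokens that came from a URL.

```python
from urllib.parse import urlsplit

# Hypothetical limits: 30 is the current limit described in the thread;
# 60 is the proposed "two tokens' worth" allowance for URL-derived tokens.
MAX_WORD_LEN = 30
MAX_URL_TOKEN_LEN = 2 * MAX_WORD_LEN


def keep_token(token: str, from_url: bool) -> bool:
    """Return True if the token fits the length limit for its kind."""
    limit = MAX_URL_TOKEN_LEN if from_url else MAX_WORD_LEN
    return len(token) <= limit


# Hostname of the example spam URL (scheme stripped, lowercased).
host = urlsplit("http://haha.imaspammer.you-loser-cant-bogofilter-my-emails.com").hostname
```

The example hostname is 55 characters, so keep_token(host, from_url=False) rejects it under the 30-character limit, while keep_token(host, from_url=True) accepts it under the proposed 60.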

Tom



More information about the Bogofilter mailing list