info about spam messages

David Relson relson at osagesoftware.com
Thu Jun 17 16:04:45 CEST 2004


On Thu, 17 Jun 2004 09:46:42 -0400
Tom Anderson wrote:

> From: "David Relson" <relson at osagesoftware.com>
> > issue, bogofilter already had the idea of IPADDR.  It doesn't have
> > any concept of EMAIL_ADDR.  In fact, "@" is a delimiter, so
> > "username@domain.com" becomes two separate tokens "username" and
> > "domain.com".
> 
> I've always had an issue with this.  @ should not be a delimiter.  In
> what sense does it ever break up tokens except in an email address? 
> And breaking up email addresses is like saying everyone on the same
> street is a criminal just because one guy is.  The @ should be removed
> from the set of delimiters, and that would solve part of the problem. 
> Now, I'm not saying that logging the FROM address would be useful, but
> you already have code to detect a domain, right?  So, if you were
> inclined to detect an email, it would essentially be
> [a-zA-Z_\-.+]+\@$DOMAIN.
> 
> Tom

Tom,

No special code to detect a domain.  Periods are allowed within tokens,
so "domain.com" just works.
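
To make the tokenizing behaviour concrete, here is a rough Python sketch. It is not bogofilter's actual lexer; the function name `tokenize`, the `at_is_delimiter` switch, and the exact set of token characters are assumptions made for illustration only. It shows how treating "@" as a delimiter while allowing periods inside tokens splits an address into a user part and a domain part.

```python
import re

# Hypothetical toy tokenizer, NOT bogofilter's real lexer.
# Assumed token characters: letters, digits, ".", "_", "+", "-".
def tokenize(text, at_is_delimiter=True):
    chars = r"A-Za-z0-9._+\-"
    if not at_is_delimiter:
        # Allow "@" inside tokens so an address stays in one piece.
        chars += "@"
    return re.findall("[" + chars + "]+", text)

# "@" as delimiter: the address splits, the domain keeps its period.
print(tokenize("mail username@domain.com now"))

# "@" allowed within tokens: the address survives as a single token.
print(tokenize("mail username@domain.com now", at_is_delimiter=False))
```

No domain-specific rule is needed in this sketch: "domain.com" stays whole simply because "." is in the allowed character set.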

Using "@" as a delimiter was part of bogofilter's initial
implementation.  Bogofilter has a MAXTOKENLEN restriction of 30
characters.  "userid@domain.com" is more likely to run afoul of this
restriction than "userid" or "domain.com".  Remember, too, that mailing
lists often use VERPs which lengthen email addresses.  Bogofilter does
some checking for VERPs so that they won't fill the wordlist with
hapaxes.
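
A small Python sketch of the length concern, using the 30-character limit mentioned above. The VERP-style address is made up, and the helper `within_limit` is hypothetical; how bogofilter actually treats an overlong token is not shown here.

```python
MAXTOKENLEN = 30  # limit quoted in this message

def within_limit(token, limit=MAXTOKENLEN):
    """Return True if the token fits within the length limit."""
    return len(token) <= limit

# Made-up VERP-style address of the kind mailing lists generate.
address = "list-bounces+user=example.org@lists.example.com"
local, domain = address.split("@", 1)

# Compare the whole address with its two halves.
for tok in (address, local, domain):
    print(len(tok), within_limit(tok), tok)
```

In this example the whole address exceeds the limit while each half fits, which is the situation splitting on "@" avoids.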

From a parsing point of view, it's a trivial change to allow "@" within
tokens.  I don't know what the effect on scoring or wordlists would be.
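
One way to start looking at the wordlist side is to tokenize the same messages both ways and compare the number of distinct tokens and hapaxes (tokens seen exactly once). The Python sketch below does that on a three-line made-up corpus. The tokenizer is a toy, not bogofilter's lexer, and `wordlist_stats` is a hypothetical helper; a real test would need a real corpus and bogofilter's own scoring.

```python
import re
from collections import Counter

# Hypothetical toy tokenizer, NOT bogofilter's real lexer.
def tokenize(text, at_is_delimiter):
    chars = r"A-Za-z0-9._+\-"
    if not at_is_delimiter:
        chars += "@"
    return re.findall("[" + chars + "]+", text)

def wordlist_stats(messages, at_is_delimiter):
    """Return (distinct tokens, hapaxes) for a list of messages."""
    counts = Counter()
    for msg in messages:
        counts.update(tokenize(msg, at_is_delimiter))
    hapaxes = sum(1 for n in counts.values() if n == 1)
    return len(counts), hapaxes

# Made-up sample lines, for illustration only.
messages = [
    "From alice@example.com",
    "From alice@example.org",
    "From bob@example.com",
]

print("split on @:", wordlist_stats(messages, True))
print("keep @    :", wordlist_stats(messages, False))
```

Three lines prove nothing about real mail; the point is only the shape of the comparison.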

If you're inclined to run an experiment to test the effect, have at it!

Regards,

David


