tanderso at oac-design.com
Fri Jul 9 02:23:09 CEST 2004
On Thu, 2004-07-08 at 09:20, David Relson wrote:
> On Thu, 8 Jul 2004 09:14:43 -0400
> Tom Anderson wrote:
> > From: "Tom Allison" <tallison at tacocat.net>
> > > Could you modify anything that exceeds the MAXTOKENLEN to become the
> > > token, "MAXTOKENLEN" with a counter (+1) against it?
> > >
> > > This would tend to pool all these excessively long tokens into one
> > > "virtual" token to measure for spamicity.
> > Good idea, but it would also count email addresses and URLs and
> > perhaps signatures and whatnot. I'm not sure I'd appreciate an email
> > full of URLs from a friend being counted as spam just because they all
> > exceed the max length.
> > Tom
> It would just be a single token among many. It would have little effect
> on a hammish message but might be valuable for an unsure.
My interpretation was that every single token that went over the max
would simply be converted to "MAXTOKENLEN" for scoring. So if I had an
email that said something like, "Here are the articles: URL1 ... URLN",
where URL1 through URLN are URLs longer than MAXTOKENLEN, every one of
those URLs would be scored as the same token. It would be better not to
convert them all to a single, presumably spammy token. I prefer the
idea of breaking on case transitions to that.
Then again, maybe this "problem" doesn't need a solution at all... let's
see how it plays out for a while.
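
To make the two options above concrete, here is a rough sketch in Python.
It is not bogofilter's lexer (which is written in C); the limit of 30 and
the function names pool_long_tokens, split_on_case and split_long_tokens
are made up purely for illustration.

```python
import re

# Assumed limit for illustration only; not necessarily bogofilter's value.
MAXTOKENLEN = 30


def pool_long_tokens(tokens, max_len=MAXTOKENLEN):
    """Option 1: replace every over-long token with one shared pseudo-token."""
    return ["MAXTOKENLEN" if len(t) > max_len else t for t in tokens]


def split_on_case(token):
    """Split where a lowercase letter or digit is followed by an uppercase letter."""
    # Insert a space at each lower-to-upper transition, then split on whitespace.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", token).split()


def split_long_tokens(tokens, max_len=MAXTOKENLEN):
    """Option 2: break only the over-long tokens on case transitions."""
    result = []
    for t in tokens:
        if len(t) > max_len:
            result.extend(split_on_case(t))
        else:
            result.append(t)
    return result


if __name__ == "__main__":
    message = [
        "articles",
        "http://example.com/ViewArticleAboutSpamFiltering",
        "http://example.com/AnotherLongArticleTitleHere",
    ]
    # Both long URLs collapse into the same pseudo-token under option 1.
    print(pool_long_tokens(message))
    # Under option 2 each URL contributes its own pieces instead.
    print(split_long_tokens(message))
```

With option 1, a message with N long URLs just carries N copies of the
same token, whatever the URLs point to. With option 2 the pieces stay
distinct, though a piece can still be over the limit if it contains no
case transition.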