counters [was: StudlyCaps]

David Relson relson at osagesoftware.com
Thu Jul 8 15:17:07 CEST 2004


On Thu, 08 Jul 2004 09:02:42 -0400
Tom Allison wrote:

> Could you modify anything that exceeds the MAXTOKENLEN to become the 
> token, "MAXTOKENLEN" with a counter (+1) against it?
> 
> This would tend to pool all these excessively long tokens into one 
> "virtual" token to measure for spamicity.
> 
> You might only get one token per email, but it helps.

Long tokens could simply be truncated to MAXTOKENLEN.
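
As a minimal sketch of the two options (truncating versus pooling into
one "virtual" token), assuming a limit of 30 characters and two
hypothetical helper names that are not taken from the bogofilter source:

```c
#include <string.h>

#define MAXTOKENLEN 30  /* assumed limit, for illustration only */

/* Option 1: cut an over-long token down to MAXTOKENLEN characters.
 * The token is modified in place; short tokens are left alone. */
static void truncate_token(char *tok)
{
    if (strlen(tok) > MAXTOKENLEN)
        tok[MAXTOKENLEN] = '\0';
}

/* Option 2: pool every over-long token into the single virtual token
 * "MAXTOKENLEN".  Returns the token that should be counted. */
static const char *pool_long_token(const char *tok)
{
    return strlen(tok) > MAXTOKENLEN ? "MAXTOKENLEN" : tok;
}
```

With truncation, long tokens that share a 30-character prefix still
collapse into the same token; with pooling, every long token in a
message adds to one shared count.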

At one time, bogofilter had some feature-counting code.  The lexer would
count various features (like no_body, html_break, html_comment,
html_tag, html_unk, ipaddr, html_char, url_char, money, ...) and create
tokens giving counts.  Perhaps I'll resurrect the code to see if it's of
value.
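
A rough sketch of how such count tokens might be produced.  This is not
the original bogofilter code: the enum, the table, both function names,
and the "name:count" token spelling are all assumptions made for
illustration, and only a few of the features listed above are shown.

```c
#include <stdio.h>

/* A few of the features named above; this enum is hypothetical. */
enum feature { F_HTML_TAG, F_HTML_COMMENT, F_IPADDR, F_MONEY, F_COUNT };

static const char *const feature_names[F_COUNT] = {
    "html_tag", "html_comment", "ipaddr", "money"
};

/* One counter per feature, zero at start of message. */
static unsigned feature_counts[F_COUNT];

/* The lexer would call this each time it recognizes a feature. */
static void count_feature(enum feature f)
{
    feature_counts[f]++;
}

/* Format a count token such as "html_tag:3" into buf.
 * Returns what snprintf returns (length the token needs). */
static int feature_token(enum feature f, char *buf, size_t size)
{
    return snprintf(buf, size, "%s:%u", feature_names[f], feature_counts[f]);
}
```

At the end of a message, each token built this way would be registered
like any other word, so the classifier can learn whether, say, a given
number of HTML tags leans toward spam or ham.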


