What is a word (lexertest)

Allyn Fratkin allyn at fratkin.com
Tue Oct 22 17:15:26 CEST 2002


David Relson wrote:

> If a line contains exactly one token (composed only of letters and
> digits), the lexer will ignore it.
>
> If there're delimiters (spaces, punctuation, control characters) at the
> beginning or the end of the line, the lexer will return it.
>
> If there're special characters (underscore, dash, etc) in the token, the
> lexer will return it.

yes, i mentioned this last night in another email. if a line consists
only of a-zA-Z0-9+/ characters, it is "assumed" to be base64 and discarded.
that is why the lexer ignores it.  in other words, this is a purposeful
design decision.
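
A rough sketch of that heuristic in Python, for illustration only. This is
not bogofilter's actual lexer rule; the names looks_like_base64 and
filter_lines are hypothetical, and stripping the line terminator before
matching is an assumption. It shows why a lone alphanumeric word is dropped
while a line with a leading space, trailing punctuation, or an underscore
survives.

```python
import re

# Hypothetical pattern: one or more characters from the base64 alphabet
# named in the message (a-z, A-Z, 0-9, '+', '/'). Padding ('=') is not
# included because the message does not mention it.
BASE64_ONLY = re.compile(r"[A-Za-z0-9+/]+")


def looks_like_base64(line: str) -> bool:
    """Return True if the whole line consists only of base64-alphabet characters."""
    # Assumption: only the line terminator is removed; any other leading or
    # trailing delimiter stays and makes the match fail.
    body = line.rstrip("\r\n")
    return BASE64_ONLY.fullmatch(body) is not None


def filter_lines(lines):
    """Drop lines assumed to be base64; keep the rest for tokenizing."""
    return [ln for ln in lines if not looks_like_base64(ln)]


if __name__ == "__main__":
    sample = ["hello", " hello", "hello.", "hello_world", "SGVsbG8gd29ybGQh"]
    for ln in sample:
        verdict = "discarded" if looks_like_base64(ln) else "kept"
        print(repr(ln), verdict)
```

Under these assumptions, the single-word line "hello" is indistinguishable
from a short run of base64 and is discarded, which is the tradeoff being
discussed.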

yes, it would be better if base64 data was recognized correctly.
it would be better still if base64 text attachments were decoded.
but unless and until one of those things happens, bogofilter needs to
ignore base64 data any way it can.  if it loses some single word lines,
then that is a tradeoff i would be willing to make.
-- 
Allyn Fratkin             allyn at fratkin.com
Escondido, CA             http://www.fratkin.com/




