[PATCH] lexer fix for base64 and CRLF

Allyn Fratkin allyn at fratkin.com
Sat Oct 19 18:19:58 CEST 2002


this is a fix for the problem i reported in bug #618368: when the input
lines end in CRLF (e.g., when training bogofilter with mailboxes created on
Windows), the lexer fails to recognize base64 data and so does not ignore it.
instead, the base64 data is treated as normal input, parsed and learned,
which causes major bloat of the word lists.

this is a quick fix and perhaps not the correct long-term one, but that would
require someone a lot more skilled with lex/flex than i am.  the comments in
lexer.l mention ignoring carriage returns in the input, but for whatever
reason the carriage return is still present in the input line at the point
where base64 data is recognized.

this is a three-character fix that adds \r? to the regular expression that
checks for a base64 input line.
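
to illustrate the idea outside of flex, here is a minimal python sketch.  the
two patterns and the helper is_base64_line are hypothetical and are not the
actual rule from lexer.l; they only show why a trailing carriage return
defeats a whole-line base64 match and how an optional \r restores it.

```python
import re

# Hypothetical patterns for illustration only; NOT the real lexer.l rule.
# Strict: the whole line must consist of base64 characters plus padding.
BASE64_STRICT = re.compile(r"[A-Za-z0-9+/]+={0,2}")
# Tolerant: same pattern, with the three extra characters \r? at the end.
BASE64_CR_TOLERANT = re.compile(r"[A-Za-z0-9+/]+={0,2}\r?")


def is_base64_line(line, pattern):
    """Return True if the line (with its \\n already removed) fully matches."""
    return pattern.fullmatch(line) is not None


unix_line = "SGVsbG8sIHdvcmxkIQ=="   # line from an LF-terminated mailbox
dos_line = unix_line + "\r"          # same line from a CRLF mailbox

print(is_base64_line(unix_line, BASE64_STRICT))      # LF input, strict pattern
print(is_base64_line(dos_line, BASE64_STRICT))       # CRLF input, strict pattern
print(is_base64_line(dos_line, BASE64_CR_TOLERANT))  # CRLF input, tolerant pattern
```

with the strict pattern the CRLF line is rejected and would fall through to
normal tokenizing; with the tolerant pattern both line endings are accepted.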

please let me know if there is a more helpful way for me to send these.
thanks.
-- 
Allyn Fratkin             allyn at fratkin.com
Escondido, CA             http://www.fratkin.com/
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lexer.l.crpatch
URL: <https://www.bogofilter.org/pipermail/bogofilter-dev/attachments/20021019/b34db418/attachment.ksh>

