pipe chars and the lexer
David Relson
relson at osagesoftware.com
Sat Feb 21 18:40:54 CET 2004
On Sat, 21 Feb 2004 10:38:56 -0500
Clint Adams wrote:
> Why don't the words like w|o|r|d in the attached spam fragment get
> picked up by the lexer?
Hi Clint,
The vertical bar is a special character, one of the many that bogofilter
uses to delimit tokens. That splits w|o|r|d into single letters, which are
too short to match TOKEN. The bar can easily be allowed inside tokens (see
patch below).
Question: do people want "|" included in tokens?
David
Index: lexer_v3.l
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/src/lexer_v3.l,v
retrieving revision 1.138
diff -u -r1.138 lexer_v3.l
--- lexer_v3.l 31 Jan 2004 00:09:24 -0000 1.138
+++ lexer_v3.l 21 Feb 2004 17:34:22 -0000
@@ -136,9 +136,9 @@
MSG_COUNT ^\".MSG_COUNT\"
TOKENFRONT [^[:blank:][:cntrl:][:digit:][:punct:]]
-TOKENMID [^[:blank:][:cntrl:]<>;=():&%$#@+|/\\{}^\"?*,\[\]]+
-BOGOLEX_TOKEN [^[:blank:][:cntrl:]<>; &% @ |/\\{}^\" *,\[\]]+
-TOKENBACK [^[:blank:][:cntrl:]<>;=():&%$#@+|/\\{}^\"?*,\[\]._~\'\`\-]
+TOKENMID [^[:blank:][:cntrl:]<>;=():&%$#@+/\\{}^\"?*,\[\]]+
+BOGOLEX_TOKEN [^[:blank:][:cntrl:]<>; &% @ /\\{}^\" *,\[\]]+
+TOKENBACK [^[:blank:][:cntrl:]<>;=():&%$#@+/\\{}^\"?*,\[\]._~\'\`\-]
TOKEN {TOKENFRONT}{TOKENMID}{TOKENBACK}
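To illustrate the effect of the patch, here is a rough Python approximation of the TOKEN pattern with and without "|" in the delimiter set. It is a sketch only: build_token_re and its include_pipe flag are made-up names, the POSIX classes [:blank:], [:cntrl:] and [:punct:] are approximated with \s, ASCII control ranges and string.punctuation, and the rest of the lexer's rules are ignored, so real bogofilter output may differ.

```python
import re
import string

# Delimiters copied from TOKENMID in the diff (pre-patch, "|" included).
MID_DELIMS = '<>;=():&%$#@+|/\\{}^"?*,[]'
# TOKENBACK additionally rejects these characters at the end of a token.
BACK_EXTRA = "._~'`-"
# Rough stand-in for [:blank:][:cntrl:] (an assumption, not flex semantics).
WS_CTRL = r"\s\x00-\x1f\x7f"


def build_token_re(include_pipe: bool) -> re.Pattern:
    """Approximate {TOKENFRONT}{TOKENMID}{TOKENBACK} as a Python regex.

    include_pipe=False mimics the pattern before the patch,
    include_pipe=True mimics it after the patch.
    """
    mid = MID_DELIMS.replace("|", "") if include_pipe else MID_DELIMS
    front = "[^" + WS_CTRL + "0-9" + re.escape(string.punctuation) + "]"
    middle = "[^" + WS_CTRL + re.escape(mid) + "]+"
    back = "[^" + WS_CTRL + re.escape(mid + BACK_EXTRA) + "]"
    return re.compile(front + middle + back)


if __name__ == "__main__":
    text = "cheap w|o|r|d here"
    print(build_token_re(False).findall(text))  # ['cheap', 'here']
    print(build_token_re(True).findall(text))   # ['cheap', 'w|o|r|d', 'here']
```

With "|" treated as a delimiter, the obfuscated word yields no token at all; once it is allowed in TOKENMID and TOKENBACK, w|o|r|d comes through as a single token.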