pipe chars and the lexer

David Relson relson at osagesoftware.com
Sat Feb 21 18:40:54 CET 2004


On Sat, 21 Feb 2004 10:38:56 -0500
Clint Adams wrote:

> Why don't the words like w|o|r|d in the attached spam fragment get
> picked up by the lexer?

Hi Clint,

The vertical bar is a special character, one of the many that bogofilter
uses to delimit tokens.  It can easily be allowed in tokens (see patch
below).
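
As a rough illustration of the effect, here is a minimal Python sketch
(not bogofilter code) that approximates the TOKENFRONT, TOKENMID and
TOKENBACK definitions with and without "|" in the delimiter list.  The
names build_token_regex and allow_pipe are made up for this example, the
POSIX classes are approximated with plain ASCII sets, and the rest of
the lexer's rules are ignored, so treat it only as a model of these
three definitions.

```python
import re
import string

# ASCII approximation of [:blank:] and [:cntrl:] (assumption: C locale).
BLANK_CNTRL = " " + "".join(chr(c) for c in range(0x20)) + "\x7f"
# Delimiters excluded by TOKENMID before the patch (includes "|").
MID_DELIMS = '<>;=():&%$#@+|/\\{}^"?*,[]'
# TOKENBACK additionally excludes these trailing characters.
BACK_EXTRA = "._~'`-"


def char_class_not(chars):
    """Return a negated regex character class for the given characters."""
    return "[^" + re.escape(chars) + "]"


def build_token_regex(allow_pipe):
    """Model TOKEN = TOKENFRONT TOKENMID TOKENBACK (hypothetical helper).

    allow_pipe=False mimics the definitions before the patch,
    allow_pipe=True mimics them with "|" dropped from the delimiters.
    """
    mid = MID_DELIMS.replace("|", "") if allow_pipe else MID_DELIMS
    # TOKENFRONT rejects every punctuation character in both versions.
    front = char_class_not(BLANK_CNTRL + string.digits + string.punctuation)
    middle = char_class_not(BLANK_CNTRL + mid) + "+"
    back = char_class_not(BLANK_CNTRL + mid + BACK_EXTRA)
    return re.compile(front + middle + back)


before = build_token_regex(allow_pipe=False)
after = build_token_regex(allow_pipe=True)

text = "cheap w|o|r|d here"
print(before.findall(text))  # ['cheap', 'here']
print(after.findall(text))   # ['cheap', 'w|o|r|d', 'here']
```

In this model a token needs at least three characters (front, one or
more middle, back), so once w|o|r|d is cut at each bar the leftover
single letters match nothing.  With "|" allowed in the middle, the whole
string is one token.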

Question:  do people want "|" included in tokens?

David

Index: lexer_v3.l
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/src/lexer_v3.l,v
retrieving revision 1.138
diff -u -r1.138 lexer_v3.l
--- lexer_v3.l	31 Jan 2004 00:09:24 -0000	1.138
+++ lexer_v3.l	21 Feb 2004 17:34:22 -0000
@@ -136,9 +136,9 @@
 MSG_COUNT	^\".MSG_COUNT\"
 
 TOKENFRONT	[^[:blank:][:cntrl:][:digit:][:punct:]]
-TOKENMID	[^[:blank:][:cntrl:]<>;=():&%$#@+|/\\{}^\"?*,\[\]]+
-BOGOLEX_TOKEN	[^[:blank:][:cntrl:]<>;    &%  @ |/\\{}^\" *,\[\]]+
-TOKENBACK	[^[:blank:][:cntrl:]<>;=():&%$#@+|/\\{}^\"?*,\[\]._~\'\`\-]
+TOKENMID	[^[:blank:][:cntrl:]<>;=():&%$#@+/\\{}^\"?*,\[\]]+
+BOGOLEX_TOKEN	[^[:blank:][:cntrl:]<>;    &%  @ /\\{}^\" *,\[\]]+
+TOKENBACK	[^[:blank:][:cntrl:]<>;=():&%$#@+/\\{}^\"?*,\[\]._~\'\`\-]
 
 TOKEN		{TOKENFRONT}{TOKENMID}{TOKENBACK}



