obscured URL not being tokenized

Tom Anderson tanderso at oac-design.com
Sun Dec 21 21:08:08 CET 2003


On Sun, 2003-12-21 at 14:58, Dan Singletary wrote:
> Has any thought been put into not only registering single tokens as 
> bogofilter does now, but registering dual tokens so that "color, white" 
> was a token, or "display, none" was a token-- this might enhance 
> bogofilter's accuracy because often you get enhanced meaning from looking 
> at two adjacent tokens .. "click here" comes to mind.

It has been discussed on this list in the past, and I believe someone
created a version that does that.  I think it's a good idea if it
shows promising results... what were the results?  However, I don't
think "color" and "white" should even be separate tokens in the first
place if they appear in the form "color='white'".  That should be a
single token.  The only separators, IMHO, should be whitespace and
maybe a comma or semicolon.  Mostly just whitespace.
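
To make the two ideas concrete, here is a rough sketch in Python.  It is
not bogofilter's actual lexer code; the function names and the exact
separator set (whitespace, comma, semicolon) are assumptions made up for
illustration.  It splits only on those separators, so "color='white'"
stays a single token, and then builds tokens from adjacent pairs such as
"click here".

```python
import re

# Assumed separator set: runs of whitespace, commas, and semicolons.
# This is an illustrative choice, not bogofilter's real token rules.
SEPARATORS = re.compile(r"[\s,;]+")


def single_tokens(text):
    """Split text on the assumed separators, dropping empty strings."""
    return [tok for tok in SEPARATORS.split(text) if tok]


def pair_tokens(tokens):
    """Join each token with its right-hand neighbour into one pair token."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


if __name__ == "__main__":
    sample = "color='white' click here"
    tokens = single_tokens(sample)
    print(tokens)
    print(pair_tokens(tokens))
```

Whether the pair tokens should be registered in addition to the single
tokens or instead of them is left open here; that would have to be
decided by testing against real mail.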

Tom


