obscured URL not being tokenized
Tom Anderson
tanderso at oac-design.com
Sun Dec 21 21:08:08 CET 2003
On Sun, 2003-12-21 at 14:58, Dan Singletary wrote:
> Has any thought been put into not only registering single tokens as
> bogofilter does now, but registering dual tokens so that "color, white"
> was a token, or "display, none" was a token-- this might enhance
> bogofilter's accuracy because often you get enhanced meaning from looking
> at two adjacent tokens .. "click here" comes to mind.
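The pairing idea in the quoted message can be sketched roughly as follows. This is a minimal Python illustration, not bogofilter's actual lexer (bogofilter itself is written in C). The function names `single_tokens`, `pair_tokens` and `features` are hypothetical, and splitting on non-alphanumeric characters is an assumed stand-in for whatever the real tokenizer does. Each token is emitted as usual, and each adjacent pair is emitted as an extra combined token.

```python
import re

def single_tokens(text):
    # Assumed simple lexer (not bogofilter's): lowercase the text
    # and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def pair_tokens(tokens):
    # Join each token with its successor: ["click", "here"] -> ["click here"].
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def features(text):
    # Single tokens followed by adjacent-pair tokens.
    toks = single_tokens(text)
    return toks + pair_tokens(toks)

print(features("Click here now"))
# ['click', 'here', 'now', 'click here', 'here now']
```

A message with n single tokens yields n - 1 additional pair tokens, so the token database grows accordingly; whether the extra features actually improve accuracy is an empirical question.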
It has been discussed on this list in the past, and I believe someone
had created a version which does that. I think it's a good idea if it
shows promising results... what were the results? However, I don't
think "color" and "white" should even be separate tokens in the first
place if they appear in the form color='white'. That should be a
single token. The only separators, IMHO, should be whitespace and
perhaps a comma or semicolon. Mostly just whitespace.
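To make the difference concrete, here is a minimal Python sketch comparing a lexer that splits on every punctuation character with one that splits only on whitespace, optionally plus a few extra separator characters such as a comma or semicolon. The function names and the `extra` parameter are hypothetical; neither function is taken from bogofilter's source.

```python
import re

def split_on_punctuation(text):
    # Any non-alphanumeric character separates tokens,
    # so color='white' becomes two tokens: "color" and "white".
    return re.findall(r"[A-Za-z0-9]+", text)

def split_on_whitespace(text, extra=""):
    # Only whitespace (plus any characters listed in `extra`) separates
    # tokens, so color='white' survives as a single token.
    pattern = r"[\s" + re.escape(extra) + r"]+"
    # Drop empty strings produced by leading/trailing separators.
    return [tok for tok in re.split(pattern, text) if tok]

sample = "color='white' display:none; click here"
print(split_on_punctuation(sample))
# ['color', 'white', 'display', 'none', 'click', 'here']
print(split_on_whitespace(sample))
# ["color='white'", 'display:none;', 'click', 'here']
print(split_on_whitespace(sample, extra=";,"))
# ["color='white'", 'display:none', 'click', 'here']
```

Whitespace-only splitting keeps attribute/value pairs such as color='white' intact as distinctive tokens, but it also glues trailing punctuation onto neighbouring text (display:none; above), which the optional extra separators partly address.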
Tom
More information about the Bogofilter mailing list