How to avoid s p lit up wor ds?
Chris Wilkes
cwilkes-bf at ladro.com
Fri Jan 17 22:19:26 CET 2003
On Fri, Jan 17, 2003 at 04:06:24PM -0500, David Relson wrote:
>
> The bad news is that the html tags _do_ break up the words.
I'm almost of a mind to send all HTML mail to a spam bin, and then
tell BF to rate it non-spam if it gets some low BF value.
However, that still doesn't get around the problem of humans being able
to read text that a program looking for text can't ("ton er" =
"ton<BR>er" = "toner" are all the same case).
I'm not sure how you can write a tokenizer to combine word fragments
that belong together.
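One possible direction, purely as a rough sketch and not anything
bogofilter is known to do: delete HTML tags outright, then try gluing
adjacent tokens back together whenever the result is a word the filter
already knows. Everything named here (KNOWN_WORDS, strip_tags,
join_fragments, tokenize, max_parts) is hypothetical and made up for
illustration; the vocabulary would really have to come from the
filter's own word list.

```python
import re

# Hypothetical vocabulary; assumed to come from the filter's word list.
KNOWN_WORDS = {"toner", "cheap", "cartridges"}

# Matches anything that looks like an HTML tag, e.g. <BR> or </b>.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(text):
    """Delete tags with no replacement, so "ton<BR>er" becomes "toner"."""
    return TAG_RE.sub("", text)


def join_fragments(tokens, known=KNOWN_WORDS, max_parts=3):
    """Greedily merge up to max_parts adjacent tokens into a known word."""
    out = []
    i = 0
    while i < len(tokens):
        merged = False
        # Try the longest run of fragments first, then shorter ones.
        for n in range(min(max_parts, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + n]).lower()
            if candidate in known:
                out.append(candidate)
                i += n
                merged = True
                break
        if not merged:
            out.append(tokens[i].lower())
            i += 1
    return out


def tokenize(text):
    """Strip tags, split on whitespace, then rejoin known fragments."""
    return join_fragments(strip_tags(text).split())
```

With this, tokenize("cheap ton er") and tokenize("cheap ton<BR>er")
both yield the tokens "cheap" and "toner". The obvious weaknesses:
deleting tags also glues together words that a <BR> legitimately
separated, the join only works for words already in the list, and a
spammer can split a word into more pieces than max_parts allows.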
Chris