How to avoid s p lit up wor ds?

Chris Wilkes cwilkes-bf at ladro.com
Fri Jan 17 22:19:26 CET 2003


On Fri, Jan 17, 2003 at 04:06:24PM -0500, David Relson wrote:
> 
> The bad news is that the html tags _do_ break up the words.

I'm almost of the mind to send all HTML mail to a spam bin, and then
tell BF to rate it non-spam if it gets some low BF value.

However, that still doesn't get around the problem of humans being able
to read text that a program looking for text can't ("ton er",
"ton<BR>er" and "toner" all read as the same word to a person).

I'm not sure how you can write a tokenizer to combine word fragments
that should be combined together.
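
One rough possibility, offered only as a sketch and not as anything
bogofilter actually does: strip the tags, then rejoin adjacent fragments
whenever the joined form is already a known word. The wordlist
KNOWN_WORDS, the function name tokenize and the max_parts limit are all
hypothetical names made up for this illustration.

```python
import re

# Hypothetical wordlist for illustration only; a real filter would
# consult its own token database instead.
KNOWN_WORDS = {"toner", "cartridge", "mortgage"}

TAG_RE = re.compile(r"<[^>]*>")


def tokenize(text, known=KNOWN_WORDS, max_parts=3):
    """Split text into tokens, rejoining fragments that form a known word."""
    # Replace each tag with a space, so "ton<BR>er" becomes "ton er".
    plain = TAG_RE.sub(" ", text.lower())
    parts = re.findall(r"[a-z0-9]+", plain)

    tokens = []
    i = 0
    while i < len(parts):
        merged = None
        # Try the longest run of fragments first, down to a pair.
        for n in range(min(max_parts, len(parts) - i), 1, -1):
            candidate = "".join(parts[i:i + n])
            if candidate in known:
                merged = (candidate, n)
                break
        if merged:
            tokens.append(merged[0])
            i += merged[1]
        else:
            tokens.append(parts[i])
            i += 1
    return tokens


print(tokenize("cheap ton<BR>er"))
```

This only recovers words that are already in the list, and it can
wrongly glue together two legitimate neighbouring words that happen to
spell a known one, so it does not really solve the general problem.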

Chris



