How to avoid s p lit up wor ds?
Barry Gould
BarryGould at PennySaverUSA.net
Sat Jan 18 03:32:50 CET 2003
At 01:23 PM 1/17/2003, David Relson wrote:
>Why bother with the tags? I can read "buy to ner car tri dg es", though
>it's a bit of a pain. Combining such fragments calls for an AI type
>algorithm...
In the message I received, it looked like
to<!--fred-->ner ...
The (HTML-aware) MUA ignores the HTML comments (they are not rendered to
the screen, i.e. they are null).
Note this is different than if it were
to ner
or
to<br>ner
Both of those DO require some sort of AI to understand.
However, I have NOT seen email that actually looks like this.
IMHO, the best way for bogofilter to deal with this would be to convert the
message from HTML to plain text at some point. Maybe run over it once as
HTML, and once as text, for messages that are HTML. This is what I am
currently doing with base-64 messages, etc.
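
To make the HTML-to-text idea concrete, here is a rough sketch in Python. It is not bogofilter code; the names html_to_text and _TextExtractor are made up for illustration. It assumes that comments should simply vanish (so the fragments on either side rejoin) and that any real tag, such as <br>, should act as a word break.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, dropping comments."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # A real tag (e.g. <br>) is treated as a word break.
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)

    # handle_comment is left at its default (do nothing), so
    # "to<!--fred-->ner" contributes "to" and "ner" back to back.


def html_to_text(html):
    """Return the text roughly as an HTML-aware MUA would show it."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(html_to_text("buy to<!--fred-->ner car<!--x-->tridges"))
    print(html_to_text("to<br>ner"))
```

The tokens from the converted text could then be scored alongside (or instead of) the tokens from the raw HTML.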
Other alternatives that come to mind would include making the HTML comment
tag <!-- --> get highly penalized, as it would only show up infrequently
except in spam.
However, this would require modifying the way the statistics are computed,
as 1.0 would not be high enough for such tags.
Unless each instance were to get counted! (multiplying the probabilities
somehow I suppose.)
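
A rough illustration of what counting each instance might do, in Python. This uses the simple naive-Bayes style combination (product of p over product of p plus product of 1-p), which is an assumption for the sake of the example and not necessarily how bogofilter computes its score. The function names and the 0.9 figure for the comment tag are likewise invented.

```python
import math
import re

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def combined_spam_probability(probs):
    """Combine per-token spam probabilities, each strictly between 0 and 1."""
    # Work in log space so long messages do not underflow.
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


def score_with_comment_tags(html, other_probs, comment_prob=0.9, count_each=True):
    """Score a message, adding the comment-tag probability once or per instance.

    other_probs stands in for the probabilities of the ordinary tokens;
    comment_prob is an assumed value, not a measured one.
    """
    n = len(COMMENT_RE.findall(html))
    if not count_each:
        n = min(n, 1)
    return combined_spam_probability(list(other_probs) + [comment_prob] * n)


if __name__ == "__main__":
    msg = "buy to<!--a-->ner car<!--b-->tri<!--c-->dges"
    print(score_with_comment_tags(msg, [0.2, 0.2], count_each=False))
    print(score_with_comment_tags(msg, [0.2, 0.2], count_each=True))
```

With the tag counted once, two hammy tokens can outweigh it; with every instance counted, the repeated factor dominates, which is the effect suggested above.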
BTW, SpamAssassin already has some basic rules for what it calls "Gappy
Text", e.g.
B U Y M E N O W !
but this is of course different than
bu y me n ow
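
To show why the two cases differ, here is a small Python sketch of a single-letter "gappy" check. It is NOT SpamAssassin's actual rule, just an assumed pattern (four or more single letters separated by single spaces); the name looks_gappy is made up.

```python
import re

# Assumed pattern: at least four single letters, each separated by one space.
GAPPY_RE = re.compile(r"\b(?:[A-Za-z] ){3,}[A-Za-z]\b")


def looks_gappy(text):
    """Return True if the text contains a run of spaced-out single letters."""
    return GAPPY_RE.search(text) is not None


if __name__ == "__main__":
    print(looks_gappy("B U Y M E N O W !"))
    print(looks_gappy("bu y me n ow"))
```

A pattern like this catches the fully spaced-out form but says nothing about words broken at irregular points.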
Barry