HTML entities (Was: Re: mass processing with mutt and Fcc)
David Relson
relson at osagesoftware.com
Tue Apr 1 23:33:39 CEST 2003
At 04:16 PM 4/1/03, Janne Nikula wrote:
>* Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:
> > The problem is that we need HTML processing to avoid the
> > spammers' tricks with tags in the middle of words. [...]
>
>Thinking about this led me to think about other possibilities of
>intentional obfuscation in HTML.
>
>I don't recall receiving junk mail like this so far, but one of the ways
>to effectively break bogofilter's functionality to analyze HTML messages
>is to randomly replace normal characters with numerical entities.
Janne,
The code to scan a line for numeric entities ("&#[0-9]+;") and convert
them to characters isn't difficult. Until we need it, there's no point
in writing it; it would only slow down processing.
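For illustration, such a per-line scan might look like the following minimal Python sketch. It is not bogofilter's code (bogofilter is written in C). The function name `decode_numeric_entities` is hypothetical, and handling the hexadecimal form (`&#x65;`) in addition to the decimal form is an assumption beyond what the post describes.

```python
import re

# Matches decimal (&#101;) and hexadecimal (&#x65;) numeric character
# references. The trailing ";" is required in this sketch; real HTML
# parsers are more lenient, which is deliberately ignored here.
NUMERIC_ENTITY = re.compile(r"&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));")


def decode_numeric_entities(line):
    """Replace numeric character references in `line` with the characters they name."""

    def replace(match):
        dec, hexa = match.groups()
        code = int(dec) if dec is not None else int(hexa, 16)
        # Leave references to invalid code points (beyond U+10FFFF or in
        # the surrogate range) exactly as they appeared.
        if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)

    return NUMERIC_ENTITY.sub(replace, line)
```

With this, an obfuscated token such as `fr&#101;e` or `fr&#x65;e` is turned back into `free` before tokenizing, while named entities like `&amp;` are left alone. If named entities matter too, Python's standard `html.unescape` decodes both kinds, at the processing cost David mentions.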
David