HTML entities (Was: Re: mass processing with mutt and Fcc)

David Relson relson at osagesoftware.com
Tue Apr 1 23:33:39 CEST 2003


At 04:16 PM 4/1/03, Janne Nikula wrote:

>* Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:
> > The problem is that we need HTML processing to avoid the
> > spammers' tricks with tags in the middle of words. [...]
>
>Thinking about this led me to think about other possibilities of
>intentional obfuscation in HTML.
>
>I don't recall receiving junk mail like this so far, but one of the ways
>to effectively break bogofilter's functionality to analyze HTML messages
>is to randomly replace normal characters with numerical entities.

Janne,

The code to scan a line for numeric entities ("&#[0-9]+;") and convert them 
to characters isn't difficult.  Until we need it, there's no point in 
writing it.  It would only slow down processing.
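For illustration, here is a minimal sketch of such a scan-and-convert pass. It is written in Python for brevity (bogofilter itself is C), and the function name `decode_numeric_entities` is hypothetical, not part of bogofilter. It assumes the only goal is to undo numeric character references: decimal (`&#105;`) and, as an extra beyond the pattern above, hexadecimal (`&#x69;`). Named entities such as `&amp;` are ignored. A cheap `"&#" in line` pre-check skips lines without entities, which keeps the common case fast.

```python
import re

# Decimal (&#105;) or hexadecimal (&#x69;) numeric character references.
# A trailing semicolon is required; semicolon-less forms are not handled.
NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_numeric_entities(line):
    """Replace numeric character references in `line` with the characters they encode."""
    # Fast path: most lines contain no entities, so skip the regex entirely.
    if "&#" not in line:
        return line

    def replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave out-of-range and surrogate code points untouched instead of raising.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return NUMERIC_ENTITY.sub(replace, line)


print(decode_numeric_entities("m&#111;&#x6E;ey"))  # money
```

In Python specifically, the standard library's `html.unescape` already decodes numeric and named references; the hand-written version above just shows how small the scan is.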

David





More information about the Bogofilter mailing list