obscured URL not being tokenized

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Sun Dec 21 16:15:02 CET 2003


David Relson <relson at osagesoftware.com> wrote:

>The general subject can be termed "hidden text".  One technique is
>matching text color to background, which is how this thread started. 
>Another technique is having unrelated mime parts, for example the
>text/html section selling dirty pictures and the text/plain section
>being totally different, for example several paragraphs on archery. 
>There are other techniques, but I can't think of them at the moment.

You can add arbitrarily large parts in HTML and hide them with
CSS, using display:none or similar.
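
To make the display:none case concrete, here is a minimal sketch in
Python. It is not bogofilter's code (bogofilter is written in C), and
the names HiddenTextSplitter and split_hidden are made up for this
illustration. It only looks at inline style attributes, so rules in a
<style> block or an external stylesheet are not detected.

```python
from html.parser import HTMLParser
import re

# Elements that never have an end tag, so they must not be pushed.
VOID = {"br", "img", "hr", "meta", "link", "input"}
HIDDEN_RE = re.compile(r"display\s*:\s*none", re.IGNORECASE)


class HiddenTextSplitter(HTMLParser):
    """Separate text that inline CSS hides from text that is shown."""

    def __init__(self):
        super().__init__()
        self.stack = []    # entries are (tag, hidden_flag)
        self.visible = []
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        style = dict(attrs).get("style") or ""
        # A child of a hidden element is hidden as well.
        parent_hidden = self.stack[-1][1] if self.stack else False
        hidden = parent_hidden or bool(HIDDEN_RE.search(style))
        self.stack.append((tag, hidden))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        hidden = self.stack[-1][1] if self.stack else False
        (self.hidden if hidden else self.visible).append(text)


def split_hidden(html):
    """Return (visible_chunks, hidden_chunks) for an HTML string."""
    parser = HiddenTextSplitter()
    parser.feed(html)
    parser.close()
    return parser.visible, parser.hidden
```

A tokenizer could then score the two lists differently, or drop the
hidden one entirely.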

>Anyhow, as regards white text on a white background, it's relatively
>easy to look at the most recent color directives and make a decision
>based on them.  Unfortunately, this isn't adequate.  Directives are
>nested, for example a table contains table data which can include font
>directives.  Proper processing of all this requires a stack for saving
>the previous state and popping the stack as end tags are encountered. 
>It all gets more complicated since the html may be improperly formed, as
>in <table><tr><td><font>...</table>, where the end directive pops
>several stack levels.

And probably worse: CSS always takes precedence, so it is
already very hard to know what to put on your stack.
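
For illustration, the stack handling David describes could look
roughly like the sketch below. This is Python, not bogofilter's C
code, and the class name ColorStack is invented here. It assumes
colors come only from the color and bgcolor attributes, compares them
as plain strings (so "white" and "#ffffff" are not recognized as
equal), and ignores CSS completely, which is exactly the weakness
mentioned above.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed.
VOID = {"br", "img", "hr", "meta", "link", "input"}


class ColorStack(HTMLParser):
    """Track foreground/background colors with a stack of open tags."""

    def __init__(self, default_fg="black", default_bg="white"):
        super().__init__()
        # The root entry is never popped; it holds the default colors.
        self.stack = [("#root", default_fg, default_bg)]
        self.invisible = []  # text whose color equals its background

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        a = dict(attrs)
        _, fg, bg = self.stack[-1]           # inherit from the parent
        fg = (a.get("color") or fg).lower()
        bg = (a.get("bgcolor") or bg).lower()
        self.stack.append((tag, fg, bg))

    def handle_endtag(self, tag):
        # One end tag may close several levels, as in
        # <table><tr><td><font>...</table>.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return
        # No matching open tag: ignore the stray end tag.

    def handle_data(self, data):
        _, fg, bg = self.stack[-1]
        if fg == bg and data.strip():
            self.invisible.append(data.strip())


def find_invisible(html):
    """Return the text chunks drawn in the same color as their background."""
    parser = ColorStack()
    parser.feed(html)
    parser.close()
    return parser.invisible
```

Fed the malformed example from the quoted text, the single </table>
pops the font, td, tr and table entries in one go, so text after the
table is judged against the default colors again.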

If scripts are enabled (not sure if this is the default in
Outbreak Excess), then you are completely lost.

Actually, I don't see any problems so far with messages
being scored incorrectly due to colors.

But I also haven't seen any need to set HTML mode when a
DOCTYPE is seen.

As Jef says: If it were a problem, we would see many false
negatives.

pi



