Spammers catching on

Matthias Andree matthias.andree at gmx.de
Wed Dec 18 13:39:21 CET 2002


Matt Armstrong <matt at lickey.com> writes:

> It might also be possible to have pseudo-tokens in the "word"
> database.  E.g. .html-font-same-foreground-as-background could be in
> the database with a good and SPAM count.  Then it could count against
> messages having that property.

Static rules aren't buying us anything. I'd rather think that IF we
settled on looking at HTML colors (note they come in two flavours
though, as tag attribute and as style sheet, which is the first problem,
besides figuring out the scope of this color), we might want to weight the
sections by their contrast or something, which is "magically" 0 for same
color. The contrast could be figured by taking the absolute differences of
the RGB components and then weighting them per the standard CCIR 601 luma
rules, i.e. level = 30/100 red + 59/100 green + 11/100 blue if I recall
correctly. Pure yellow-on-red would be 0.59 (green 1 vs 0), and
green-on-red would be .89, while cyan-on-similar-grey might be as low as
.1 or something. This would avoid the static barrier that a spammer
might want to circumvent.
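
A rough sketch of the contrast measure described above, in Python. The
function name, the 0..1 channel scaling and the color constants are
assumptions made for illustration; the weights are the ones recalled in
the paragraph, not values taken from bogofilter.

```python
# Weights as recalled above: 30/100 red, 59/100 green, 11/100 blue.
WEIGHTS = (0.30, 0.59, 0.11)


def contrast(fg, bg):
    """Weighted sum of the absolute per-channel differences.

    fg and bg are (r, g, b) tuples with each channel scaled to 0..1.
    Identical colors give exactly 0.
    """
    return sum(w * abs(f - b) for w, f, b in zip(WEIGHTS, fg, bg))


RED = (1.0, 0.0, 0.0)
GREEN = (0.0, 1.0, 0.0)
YELLOW = (1.0, 1.0, 0.0)

yellow_on_red = contrast(YELLOW, RED)  # only green differs: about 0.59
green_on_red = contrast(GREEN, RED)    # red and green differ: about 0.89
red_on_red = contrast(RED, RED)        # same color: 0
```

Because the result is a continuous weight rather than a yes/no test,
nearly-invisible text scores close to 0 instead of slipping past an
exact same-color check.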

OTOH, killing all HTML doesn't seem too bad an idea after all: no HTML
also means no scripts, no HTML images that might be web bugs, no bloat,
and eventually no cheating. I'm still wondering how much effort a
minimal, color-aware HTML parser would be.
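
To get a feel for the effort, here is a minimal sketch using Python's
standard html.parser module. Everything in it is an assumption made for
illustration: the class name ColorScopeParser, the black-on-white
defaults, and the restriction to "#rrggbb" values. It handles both
flavours (color/bgcolor attributes and inline style declarations) and
tracks scope with a simple stack, but it ignores <style> blocks, named
colors, and unclosed tags, so it is nowhere near a real parser.

```python
import re
from html.parser import HTMLParser

WEIGHTS = (0.30, 0.59, 0.11)  # weights as recalled earlier in this message


def parse_hex(value):
    """Return (r, g, b) scaled to 0..1 for '#rrggbb', else None."""
    m = re.fullmatch(r"#([0-9a-fA-F]{6})", value.strip())
    if not m:
        return None
    h = m.group(1)
    return tuple(int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))


def contrast(fg, bg):
    """Weighted sum of absolute per-channel differences."""
    return sum(w * abs(f - b) for w, f, b in zip(WEIGHTS, fg, bg))


class ColorScopeParser(HTMLParser):
    """Hypothetical helper: collects (text, contrast) pairs."""

    # Tags without an end tag must not push a scope.
    VOID = {"br", "img", "hr", "meta", "input", "link"}

    def __init__(self):
        super().__init__()
        # Assumed defaults: black text on a white background.
        self.stack = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))]
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        fg, bg = self.stack[-1]
        attrs = dict(attrs)
        # Flavour 1: colors given as tag attributes.
        fg = parse_hex(attrs.get("color") or "") or fg
        bg = parse_hex(attrs.get("bgcolor") or "") or bg
        # Flavour 2: colors given as inline style declarations.
        for decl in (attrs.get("style") or "").split(";"):
            name, _, value = decl.partition(":")
            name = name.strip().lower()
            if name == "color":
                fg = parse_hex(value) or fg
            elif name == "background-color":
                bg = parse_hex(value) or bg
        self.stack.append((fg, bg))

    def handle_endtag(self, tag):
        if tag not in self.VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            fg, bg = self.stack[-1]
            self.chunks.append((text, contrast(fg, bg)))


parser = ColorScopeParser()
parser.feed('<body bgcolor="#ffffff"><p>hello</p>'
            '<font color="#ffffff">hidden</font></body>')
parser.close()
# parser.chunks now pairs "hello" with a contrast near 1
# and "hidden" with a contrast of 0.
```

The scope handling is the fragile part: real-world HTML mail is rarely
well nested, so a plain stack like this would need some recovery logic
before it could be trusted.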

-- 
Matthias Andree



