OT: finding small fonts 1px; vs 1;

Eric Wood eric at interplas.com
Wed Feb 25 15:43:33 CET 2004


I just noticed that people might want to look for font-size specifications
written both as 1; and as 1px;, and likewise for 0, 1.5 and 2.  Right now, if
I find such a font specification anywhere in the message, the message gets
chucked.

Rather than bluntly chucking the whole email, I'd rather just strip out
those sections of HTML, because the spammer didn't want me to see them
anyway, right?  So theoretically, the remaining "see-able" text should be put
to the bogo test.

I have this in my main recipe:

# Strip Useless TABS and HTML comments, used to split up words
:0 HB
* < 20000
* text/html
{
  :0fwb
  | expand | sed -e :a -e 's/<!--[^-]*-->//g;/</N;//ba'
}

Can someone help me strip out the <SPAN and <FONT tags that specify small
fonts?  There's this one stubborn email that has a ton of generic US history
text hidden in the message.  I'd just hate to do a -Ns on a message that has
more good content than bad.
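
As a starting point, here is a rough sketch of the kind of filter this would
need.  It is written in Python rather than sed, because sed has no non-greedy
match and the element's contents have to be removed along with the tag.  The
function name strip_small_fonts and the size thresholds are made up for
illustration: CSS font-size values from 0 up to (but not including) 3, with
px, pt or no unit, and FONT size attributes of 0 or 1, are treated as hidden.
It also assumes the body has already been decoded (no quoted-printable
"=3D") and that the hidden elements do not contain nested tags of the same
name.

```python
import re

# CSS form: font-size:1;  font-size: 1px;  font-size:1.5pt  (0 up to 2.x).
# The lookahead rejects larger sizes such as 12px and relative units like 1em.
SMALL_CSS = r"font-size\s*:\s*[0-2](?:\.\d+)?(?:px|pt)?(?=[\s;\"'>])"

# Attribute form: <font size=1> or <font size="0">.  Treating only 0 and 1
# as "small" is an assumption; size=2 is common in legitimate mail.
SMALL_ATTR = r"\bsize\s*=\s*[\"']?[01][\"']?(?=[\s>])"

# A whole SPAN or FONT element whose opening tag asks for a small font.
# The lazy .*? stops at the first matching close tag, so nested elements
# of the same name are not handled.
HIDDEN = re.compile(
    r"<(span|font)\b[^>]*?(?:" + SMALL_CSS + "|" + SMALL_ATTR + r")[^>]*>"
    r".*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_small_fonts(html: str) -> str:
    """Remove small-font SPAN/FONT elements together with their contents."""
    return HIDDEN.sub("", html)


sample = (
    '<p>Buy now!<span style="font-size:1px">'
    "The Continental Congress met in 1774.</span></p>"
)
print(strip_small_fonts(sample))  # -> <p>Buy now!</p>
```

To use something like this from the recipe, a small wrapper script that
writes strip_small_fonts(sys.stdin.read()) to stdout could take the place of
the sed command in the :0fwb filter, so that only the remaining visible text
reaches bogofilter.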

This is a tough one.....
-Eric Wood





More information about the Bogofilter mailing list