OT: finding small fonts 1px; vs 1;
Eric Wood
eric at interplas.com
Wed Feb 25 15:43:33 CET 2004
I just noticed that people might want to look for font sizes specified
both as 1; and as 1px; (and 0, 1.5 and 2 as well). Right now, if I find
such a font specification anywhere in the message, the whole message gets
chucked. Rather than bluntly chucking the whole email, I'd rather just strip
out those sections of HTML, because the spammer didn't want me to see them
anyway, right? So theoretically, only the remaining "see-able" text should
be put to the bogo test.
I have this in my main recipe:
# Strip Useless TABS and HTML comments, used to split up words
:0 HB
* < 20000
* text/html
{
  :0fwb
  | expand | sed -e :a -e 's/<!--[^-]*-->//g;/</N;//ba'
}
Can someone help me strip out the <SPAN and <FONT tags that specify small
fonts? There's this one stubborn email that has a ton of generic US history
text hidden in the message. I'd just hate to do a -Ns on a message that has
more good content than bad.
This is a tough one.....
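One possible starting point, offered as a rough sketch and not a tested
recipe: slurp the whole body into sed's pattern space and delete any FONT or
SPAN element whose opening tag asks for a size of 0 to 2 (bare, px or pt).
The function name strip_small_fonts is made up for illustration. The sketch
assumes a sed that accepts -E (older GNU sed spells it -r), and it only
handles elements whose hidden text contains no nested tags.

```shell
#!/bin/sh
# Sketch only: strip_small_fonts is a hypothetical helper, not part of
# bogofilter or procmail. Reads HTML on stdin, writes filtered HTML to stdout.

strip_small_fonts() {
    # Tag names spelled out per letter, since POSIX sed has no "ignore case" flag.
    tag='([Ff][Oo][Nn][Tt]|[Ss][Pp][Aa][Nn])'
    # "size=" or "font-size:" (any case), optionally followed by a quote.
    key="[Ss][Ii][Zz][Ee][[:space:]]*[=:][[:space:]]*[\"']?"
    # 0, 1 or 2, optional fraction (1.5), optional px/pt, and then either the
    # end of the tag or a delimiter, so that 12px, 2em and 1% are left alone.
    num='[0-2](\.[0-9]+)?([Pp][XxTt])?([^[:alnum:].%>][^>]*)?'

    # First five -e parts: append every line to the pattern space so that
    # hidden text spanning several lines is matched as one unit.
    # Last -e: delete opening tag, tag-free content, and closing tag.
    sed -E -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' \
        -e "s/<${tag}[^>]*${key}${num}>[^<]*<\/${tag}[^>]*>//g"
}
```

If this were saved as a small script, it could presumably be added to the
end of the existing filter pipe, after the comment-stripping sed. Because it
reads the whole message into memory, it relies on a size limit like the
"< 20000" condition already in the recipe.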
-Eric Wood