OT: finding small fonts 1px; vs 1;
    Eric Wood 
    eric at interplas.com
       
    Wed Feb 25 15:43:33 CET 2004
    
    
  
I just noticed that people might want to look for font-size specifications
written both as "1;" and as "1px;", and for sizes of 0, 1.5 and 2 as well.
Right now, if I find such a font specification anywhere in the message, the
whole message gets chunked.
Rather than bluntly chunking the whole email, I'd rather strip out just those
sections of HTML, because the spammer didn't want me to see them anyway,
right?  So in theory, only the remaining "see-able" text should be put to the
bogofilter test.
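Here is a minimal Python sketch of a pattern that catches both the "1;" and "1px;" spellings, plus 0, 1.5 and 2. The 2-pixel cutoff (TINY_LIMIT) and the name tiny_font_sizes are assumptions for illustration and are not part of bogofilter or procmail. It only looks at CSS font-size declarations, not the HTML size= attribute.

```python
import re
from typing import List

# Hypothetical cutoff: sizes at or below 2 (px or unitless) are treated as unreadable.
TINY_LIMIT = 2.0

# Matches CSS declarations like "font-size:1px;", "font-size: 1;", "font-size:1.5px"
# or "font-size:0". The lookahead rejects other units, so "font-size:1em" (a normal
# size) does not match, and "font-size:10px" is read as 10, never as 1.
FONT_SIZE = re.compile(r"font-size\s*:\s*(\d+(?:\.\d+)?)(?:px)?(?![\w.%])", re.IGNORECASE)


def tiny_font_sizes(html: str) -> List[float]:
    """Return every CSS font-size in `html` that is at or below TINY_LIMIT."""
    sizes = (float(m.group(1)) for m in FONT_SIZE.finditer(html))
    return [size for size in sizes if size <= TINY_LIMIT]


sample = '<span style="font-size:1px;">a</span><font style="font-size: 1.5;">b</font>'
print(tiny_font_sizes(sample))  # [1.0, 1.5]
```

Rejecting units other than px matters here: "1em" or "1pt" look like "1" to a naive pattern but are not hidden text.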
I have this in my main procmail recipe:
# Expand useless TABs and strip HTML comments, which are used to split up words
:0 HB
* < 20000
* text/html
{
  :0fwb
  | expand | sed -e :a -e 's/<!--[^-]*-->//g;/</N;//ba'
}
Can someone help me strip out the <SPAN> and <FONT> tags that specify small
fonts, along with the text they hide?  There's this one stubborn email that
has a ton of generic US history text hidden in the message, and I'd hate to
do a -Ns on a message that has more good content than bad.
This is a tough one...
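A minimal Python sketch of such a body filter follows. It removes <SPAN> and <FONT> elements (the tags plus the text they enclose) whose inline style sets a font-size of 2 or less, and passes everything else through unchanged so the visible text still reaches bogofilter. The script name strip_tiny_fonts.py, the 2-pixel cutoff and the "-" argument are hypothetical choices for illustration. It is regex-based, not a real HTML parser. It assumes the body is plain HTML rather than base64 or quoted-printable, and only inline style attributes are checked. Class-based CSS and <font size=...> are ignored.

```python
#!/usr/bin/env python3
"""Hypothetical body filter (not part of bogofilter or procmail): drop <span> and
<font> elements whose inline style sets a tiny font-size, pass the rest through."""
import re
import sys

TINY_LIMIT = 2.0                # assumption: <= 2 (px or unitless) counts as hidden
HIDING_TAGS = {"span", "font"}  # the two tags asked about; extend as needed

FONT_SIZE = re.compile(r"font-size\s*:\s*(\d+(?:\.\d+)?)(?:px)?(?![\w.%])", re.IGNORECASE)
TAG = re.compile(r"(<[^<>]*>)")  # one tag, possibly spanning lines; kept by split()
TAG_NAME = re.compile(r"<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)")


def is_tiny(start_tag: str) -> bool:
    """True if the tag's markup declares a font-size at or below TINY_LIMIT."""
    return any(float(m.group(1)) <= TINY_LIMIT for m in FONT_SIZE.finditer(start_tag))


def strip_tiny_fonts(html: str) -> str:
    out = []
    hidden_tag = None  # name of the element currently being skipped, if any
    depth = 0          # how many nested hidden_tag elements are still open
    for piece in TAG.split(html):
        m = TAG_NAME.match(piece)
        if hidden_tag is not None:
            # Inside a hidden element: drop everything, but count nested
            # open/close tags of the same name to find the matching close.
            if m and m.group(2).lower() == hidden_tag:
                depth += -1 if m.group(1) else 1
                if depth == 0:
                    hidden_tag = None
            continue
        name = m.group(2).lower() if m else ""
        if m and not m.group(1) and name in HIDING_TAGS and is_tiny(piece):
            hidden_tag, depth = name, 1  # start skipping this element
            continue
        out.append(piece)
    return "".join(out)


if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    # Filter stdin to stdout. latin-1 maps every byte to one character and back,
    # so 8-bit message bodies pass through unchanged.
    body = sys.stdin.buffer.read().decode("latin-1")
    sys.stdout.buffer.write(strip_tiny_fonts(body).encode("latin-1"))
```

Saved as strip_tiny_fonts.py, it could be appended to the existing :0fwb pipeline as "| python3 strip_tiny_fonts.py -", which filters only the body. Dropping the enclosed text, not just the tags, is the point here: that text was never meant to be seen, so it should not be scored.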
-Eric Wood
    
    
More information about the bogofilter mailing list