HTML again

Jeff Kinz jkinz at kinz.org
Thu May 8 15:05:08 CEST 2003


On Thu, May 08, 2003 at 08:11:32AM -0400, David Relson wrote:
> At 05:15 AM 5/8/03, Boris 'pi' Piwinger wrote:
> >Today I received several mails in "HTML" which were not
> >detected. bogolexer shows why. I attach a ZIP file so that
> >your filter does not see it.
> Yuck!  The message is full of invalid html tags.  Bogofilter treats them as 
> <br>, while galeon (mozilla) discards them.  Guess it's time to extend the 
> processing of html tags so bogofilter's parsing matches mozilla's.

Is there any chance that the particular pattern of invalid HTML tags in a
message would itself be valid data for bogofilter to score on?

Come to think of it, what about valid HTML? Wouldn't certain patterns of
valid tags also be good markers for spam/not-spam?
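
To make the question concrete, here is a minimal sketch of the idea in Python.
It is not bogofilter's actual lexer. The names KNOWN_TAGS, TagTokenizer and
tokenize are hypothetical, and the set of "valid" tags is a deliberately small
assumption for illustration. Instead of discarding tags, it emits them as
tokens, with unrecognized tags marked separately so a scorer could count them.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately small set of tags treated as "valid" here.
KNOWN_TAGS = {"html", "head", "body", "p", "br", "a", "b", "i",
              "font", "table", "tr", "td", "img"}


class TagTokenizer(HTMLParser):
    """Collect words plus one token per opening tag (assumed scheme)."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        prefix = "tag" if tag in KNOWN_TAGS else "badtag"
        self.tokens.append(f"{prefix}:{tag}")

    def handle_data(self, data):
        # Ordinary text is split on whitespace into word tokens.
        self.tokens.extend(data.split())


def tokenize(html_text):
    """Return the word and tag tokens for one HTML fragment."""
    parser = TagTokenizer()
    parser.feed(html_text)
    parser.close()
    return parser.tokens


if __name__ == "__main__":
    print(tokenize("<p>Buy <xyzzy>now</xyzzy> <font color=red>cheap</font></p>"))
```

Tokens such as "badtag:xyzzy" or "tag:font" could then be fed to the scorer
alongside ordinary words; whether they actually separate spam from non-spam
would have to be measured on real mail.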

-- 
Jeff Kinz, Open-PC, Emergent Research,  Hudson, MA.  jkinz at kinz.org
copyright 2003.  Use is restricted. Any use is an 
acceptance of the offer at http://www.kinz.org/policy.html.
Don't forget to change your password often.



