HTML again

David Relson relson at osagesoftware.com
Thu May 8 15:01:45 CEST 2003


At 08:49 AM 5/8/03, Marek Kowal wrote:

> > Yuck!  The message is full of invalid html tags.  Bogofilter treats
> > them as <br>, while galeon (mozilla) discards them.  Guess it's time
> > to extend the processing of html tags so bogofilter's parsing matches
> > mozilla's.
>
>There is just one issue I'd like you to remember: bogofilter is very good
>because of the algorithm used, the robust development team, and because it is
>fast. And I mean really fast: I've managed to process up to 100 mails/sec
>with it (using bulk mode). SpamAssassin usually rates at 1-2 mails/sec. Bugs
>must be fixed, but please try hard to keep the code fast. Mozilla seems to
>be laaaaazy, so using its parser might be easy, but it will probably slow
>things down a lot.

Marek,

Don't worry.  We're not going to use mozilla's parser.  I ran mozilla to 
see what it does with the invalid html that pi sent.  Browser behavior 
indicates how the world "sees" html.  Having bogofilter "see" html in a 
similar way is what we want.
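
To make the difference concrete, here is a rough sketch in Python. It is not bogofilter's code (bogofilter is written in C), and the tag lists and function names are made up for illustration. It only shows how treating an unknown tag as a break splits a word into pieces, while discarding the tag, as the browser does, leaves the word whole.

```python
import re

# Hypothetical tag sets, for illustration only; the real list of tags
# that bogofilter recognizes is not given in this thread.
BREAK_TAGS = {"br", "p"}               # rendered as a word break
INLINE_TAGS = {"a", "b", "i", "font"}  # rendered as nothing

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"<\s*/?\s*([A-Za-z][A-Za-z0-9]*)[^>]*>")


def render(text, unknown_as_break):
    """Replace tags by a space or by nothing, depending on the tag."""
    def repl(match):
        name = match.group(1).lower()
        if name in BREAK_TAGS:
            return " "
        if name in INLINE_TAGS:
            return ""
        # Unknown (invalid) tag: either a break or silently discarded.
        return " " if unknown_as_break else ""
    return TAG_RE.sub(repl, text)


def tokens(text, unknown_as_break):
    """Split the rendered text into whitespace-separated tokens."""
    return render(text, unknown_as_break).split()


sample = "mort<xyz>ga<qq>ge rates <b>low</b><br>today"

# Unknown tags treated like <br>: the word is chopped into fragments.
print(tokens(sample, unknown_as_break=True))

# Unknown tags discarded, as the browser does: the word stays intact.
print(tokens(sample, unknown_as_break=False))
```

With the first setting the sample yields the fragments "mort", "ga", "ge"; with the second it yields "mortgage", which is what a person reading the message in a browser sees. The substitution is still a single pass over the text, so this kind of change need not cost much speed.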

David




