HTML again
David Relson
relson at osagesoftware.com
Thu May 8 15:01:45 CEST 2003
At 08:49 AM 5/8/03, Marek Kowal wrote:
> > Yuck! The message is full of invalid html tags. Bogofilter treats them as
> > <br>, while galeon (mozilla) discards them. Guess it's time to extend the
> > processing of html tags so bogofilter's parsing matches mozilla's.
>
>There is just one issue I'd like you to remember: bogofilter is very good
>because of the algorithm used, the robust development team, and because it is
>fast. And I mean really fast - I've managed to process up to 100 mails/sec
>with it (using bulk mode). SpamAssassin usually rates at 1-2 mails/sec. Bugs
>must be fixed, but please, try hard to keep the code fast - mozilla seems to
>be laaaaazy, so using its parser might be easy, but will probably slow
>things down a lot.
Marek,
Don't worry. We're not going to use mozilla's parser. I ran mozilla to
see what it does with the invalid html that pi sent. Browser behavior
indicates how the world "sees" html. Having bogofilter "see" html in a
similar way is what we want.
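
As a rough illustration of the difference between the two behaviors (treating an invalid tag as a break versus discarding it), here is a minimal Python sketch. It is not bogofilter's lexer and not mozilla's parser; the tag list, the sample string, and the names KNOWN_TAGS, strip_tags, and tokens are all made up for the example.

```python
import re

# Hypothetical, deliberately tiny set of tag names considered valid.
# A real list would be far longer; this is only for illustration.
KNOWN_TAGS = {"a", "b", "br", "font", "i", "p"}

# Known tags that separate words when rendered (assumption for this sketch).
BREAKING_TAGS = {"br", "p"}

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^<>]*>")


def strip_tags(text, unknown_as_break):
    """Remove tags from text.

    unknown_as_break=True  : an unknown tag becomes a space (like <br>).
    unknown_as_break=False : an unknown tag is dropped with no separator.
    """
    def replace(match):
        name = match.group(1).lower()
        if name in KNOWN_TAGS:
            return " " if name in BREAKING_TAGS else ""
        return " " if unknown_as_break else ""

    return TAG_RE.sub(replace, text)


def tokens(text, unknown_as_break):
    """Split the tag-free text into whitespace-separated tokens."""
    return strip_tags(text, unknown_as_break).split()


# Made-up sample: a word broken up by tags that are not valid html.
sample = "Buy V<xyzzy>iag<qwerty>ra <b>now</b>"

as_break = tokens(sample, unknown_as_break=True)
discarded = tokens(sample, unknown_as_break=False)
```

With the invalid tags treated as breaks, the word is split into fragments; with the tags discarded, the tokens match what a reader would see on screen. Both variants are a single regular-expression pass over the text, so in this sketch the choice does not change the amount of work done.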
David