DOCTYPE

David Relson relson at osagesoftware.com
Sat Nov 8 19:04:44 CET 2003


On Sat, 08 Nov 2003 17:06:49 +0100
Matthias Andree <matthias.andree at gmx.de> wrote:

> David Relson <relson at osagesoftware.com> writes:
> 
> > My test results don't change at all if <!DOCTYPE...> is used as an
> > html indicator.  The DOCTYPE directive is pretty common in my
> > incoming mail, specifically in my "unsures".  Recognizing it has
> > minimal impact on bogofilter's size and speed and follows the
> > principle of least surprise.
> >
> > Anyhow, for the above reasons, I'm inclined to make the lexer
> > change.
> >
> > What do you think?
> 
> It should only happen if there's little preceding it. I'd rather
> bogofilter not score a whole plaintext discussion as HTML when
> someone happens to explain <!DOCTYPE>. There is another indicator in
> the given mail: the meta stuff in the HTML HEAD section. I wonder
> which one makes lookOut, Exanthema switch to HTML mode.

Given that the parser doesn't look ahead, only text after the DOCTYPE
directive is affected, and the effect ends at the end of the MIME body
part.  A line-anchored lexer pattern such as
^"<"\!DOCTYPE\ HTML\ PUBLIC\ .*">" should be specific enough to work
well.
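
To illustrate the behavior described above, here is a minimal Python
sketch. It is not bogofilter's lexer code: the regular expression is only
an approximation of the flex-style pattern, and the names DOCTYPE_RE and
classify_lines are hypothetical. It assumes the caller passes the lines of
a single MIME body part, so HTML mode ends naturally when the part ends.

```python
import re

# Assumed Python equivalent of the line-anchored pattern from the message;
# case-sensitive here, which may differ from the real lexer's settings.
DOCTYPE_RE = re.compile(r"^<!DOCTYPE HTML PUBLIC .*>")


def classify_lines(body_part_lines):
    """Tag each line of ONE MIME body part as 'text' or 'html'.

    There is no lookahead: lines before the DOCTYPE directive stay 'text'.
    The DOCTYPE line and everything after it are tagged 'html' until the
    body part ends (i.e. until this function returns).
    """
    html_mode = False
    tagged = []
    for line in body_part_lines:
        # Only a line that *starts* with the full directive flips the mode,
        # so a DOCTYPE mentioned mid-sentence does not trigger it.
        if not html_mode and DOCTYPE_RE.match(line):
            html_mode = True
        tagged.append(("html" if html_mode else "text", line))
    return tagged


if __name__ == "__main__":
    part = [
        "Someone explains <!DOCTYPE> in a plaintext discussion.",
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">',
        "<html><body>hello</body></html>",
    ]
    for mode, line in classify_lines(part):
        print(mode, line)
```

Because the pattern requires the full "HTML PUBLIC" form at the start of a
line, a bare <!DOCTYPE> quoted inside running text leaves the part in text
mode, which addresses the concern about plaintext discussions.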



