What to do for HTML comment processing ???
David Relson
relson at osagesoftware.com
Fri Feb 28 18:51:12 CET 2003
Hi,
Two things today...
First, bogofilter has options for turning off the killing of html
comments. If there are no objections, I am going to remove the options, which
means that bogofilter will _always_ kill html comments (in html text).
Second, it has been suggested that bogofilter be more aggressive in its
handling of html comments. According to the standard, a non-empty comment
looks like "<!--comment-->" with white space allowed between the closing pair
of dashes and the ">" (but not between "<!" and the opening dashes). The
"comment" part (between the pairs of dashes) can include most anything. In
particular it can include angle brackets, but not pairs of dashes. You can
have multiple comments inside the "<!" and ">" delimiters, as in
"<!-- comment one -- -- <this comment has angle brackets> -- >"
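The comment grammar described above can be sketched as a regular
expression. This is only an illustration in Python, not bogofilter's
actual lexer; the names COMMENT_DECL and strip_standard_comments are made
up for the example, and the pattern deliberately tolerates white space
right after "<!" as well, since a spam filter has little reason to reject
it.

```python
import re

# Hypothetical pattern for a comment declaration: "<!", then zero or more
# "--text--" comments, each optionally followed by white space, then ">".
# The text of a comment may contain anything except a pair of dashes.
COMMENT_DECL = re.compile(r"<!\s*(?:--(?:(?!--).)*--\s*)*>", re.DOTALL)


def strip_standard_comments(text: str) -> str:
    """Remove every well-formed comment declaration from text."""
    return COMMENT_DECL.sub("", text)


sample = "a<! -- comment one -- -- <this comment has angle brackets> -- >b"
print(strip_standard_comments(sample))            # ab
print(strip_standard_comments("a<!no dashes>b"))  # left unchanged
```

Because the group may repeat zero times, the empty declaration "<!>" is
matched too; anything without properly paired dashes is left alone.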
A while back spam without the trailing dashes was reported and bogofilter
was modified to consider that as a valid comment and discard it. Now
there's spam without the leading dashes.
Should bogofilter simply forget that dashes are in the html standard and
treat "<!whatever is inside the angle brackets>" as a comment?
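A minimal sketch of that dash-free rule in Python, purely as an
illustration of what it would and would not catch (this is not
bogofilter code; ANY_BANG_TAG and strip_bang_tags are hypothetical
names):

```python
import re

# Hypothetical "forget the dashes" rule: anything from "<!" up to the next ">".
ANY_BANG_TAG = re.compile(r"<![^>]*>")


def strip_bang_tags(text: str) -> str:
    """Remove every "<!...>" span, with or without dashes."""
    return ANY_BANG_TAG.sub("", text)


print(strip_bang_tags("vi<!whatever>agra"))  # viagra
print(strip_bang_tags("vi<!-- x -->agra"))   # viagra
print(strip_bang_tags("a<!-- <b> -->c"))     # a -->c
```

The last call shows the cost: a proper comment that contains ">" is cut
at the first ">", so the rest of it leaks into the text. The same rule
would also discard declarations such as "<!DOCTYPE html>".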
Probably the lexer can be written so that proper comments (with
leading/trailing dashes) are recognized as well as comments without any
dashes. The proper treatment of "<!-- proper start, improper end>" remains
a question.
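One way to picture such a lexer is as two rules tried in order at each
"<!". The Python sketch below is an assumption-laden illustration, not
bogofilter's lexer; PROPER, FALLBACK and strip_comments are hypothetical
names.

```python
import re

# Hypothetical ordering of two rules, tried at each "<!":
#   1. a proper comment declaration (one or more "--text--" comments), else
#   2. a dash-free fallback that runs to the next ">".
PROPER = r"<!\s*(?:--(?:(?!--).)*--\s*)+>"
FALLBACK = r"<![^>]*>"
COMMENT = re.compile(PROPER + "|" + FALLBACK, re.DOTALL)


def strip_comments(text: str) -> str:
    """Remove proper comments first, dash-free "<!...>" spans otherwise."""
    return COMMENT.sub("", text)


print(strip_comments("a<!-- <b> -->c"))                     # ac
print(strip_comments("a<!no dashes>c"))                     # ac
print(strip_comments("a<!-- start, bad end>c"))             # ac
print(strip_comments("a<!-- start, bad end>c<!-- x -->d"))  # acd
print(strip_comments("a<!-- start, bad end>c -->d"))        # ad
```

The last two calls are the open question in miniature: with a proper
start and an improper end, the outcome depends on whether a "-->"
happens to occur later in the message. In the final call the visible
"c" is swallowed along with the comment.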
Comments requested ...
David