mass processing with mutt and Fcc
Michael Kenneth Ter Louw
mterlo1 at uic.edu
Tue Apr 1 21:54:16 CEST 2003
On Tue, 1 Apr 2003, Boris 'pi' Piwinger wrote:
> > At the present time, when processing html, bogofilter discards html
> > comments, valid html tags (and their innards), and invalid html tags (and
> > their innards). Basically everything between angle brackets is being
> > ignored at this time.
> >
> > The rationale is that many tokens within html tags are not worth
> > scoring as spam indicators.
>
> I see. I thought that the use of html would be useful (I
> remember the early versions of bogofilter said so). Also web
> addresses as in links or img elements might be useful.
Graham mentions the use of HTML tags in his article:
"In fact, "ff0000" (html for bright red) turns out to be as good an
indicator of spam as any pornographic term."
I don't know whether the benefit from this single case would justify
analyzing *all* the HTML tags. Just thought I'd throw it out there.
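
To make the trade-off concrete, here is a rough sketch of the two approaches.
It assumes a naive regex tokenizer, which is not bogofilter's actual lexer, and
the function names are made up for illustration:

```python
import re

# Hypothetical helpers for illustration only; bogofilter's real lexer differs.
TAG_RE = re.compile(r"<[^>]*>")        # anything between angle brackets
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")  # crude alphanumeric tokens

def tokens_ignoring_tags(html):
    """Drop everything between angle brackets, then tokenize the rest."""
    return [t.lower() for t in TOKEN_RE.findall(TAG_RE.sub(" ", html))]

def tokens_including_tags(html):
    """Tokenize the whole message, tag contents included."""
    return [t.lower() for t in TOKEN_RE.findall(html)]

sample = '<font color="#ff0000">Buy now</font>'
print(tokens_ignoring_tags(sample))   # tag contents are gone
print(tokens_including_tags(sample))  # "ff0000" shows up, but so does "font"
```

With the tags stripped, "ff0000" never reaches the scorer. With them kept, it
does, along with a lot of tokens like "font" and "color" that probably say
little either way.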
Mike