mass processing with mutt and Fcc

Michael Kenneth Ter Louw mterlo1 at uic.edu
Tue Apr 1 21:54:16 CEST 2003



On Tue, 1 Apr 2003, Boris 'pi' Piwinger wrote:

> > At the present time, when processing html, bogofilter discards html
> > comments, valid html tags (and their innards), and invalid html tags (and 
> > their innards).  Basically everything between angle brackets is being 
> > ignored at this time.
> > 
> > The rationale is that many tokens within html tags are not worth
> > scoring as spam indicators.
> 
> I see. I thought that the use of html would be useful (I
> remember the early versions of bogofilter said so). Also web
> addresses as in links or img elements might be useful.

Graham mentions the use of HTML tags in his article:

"In fact, "ff0000" (html for bright red) turns out to be as good an
indicator of spam as any pornographic term."

I don't know whether the benefit from this single case would justify
analyzing *all* the HTML tags.  Just thought I'd throw it out there.
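
As a rough illustration of the trade-off, here is a minimal Python sketch.
It is not bogofilter's actual lexer: the function names and both regexes
are made up for this example, and the tag pattern is deliberately naive.
It compares discarding everything between angle brackets with tokenizing
the raw text, so that a value like "ff0000" survives as a token.

```python
import re

# Hypothetical helpers for illustration only; not bogofilter's real lexer.
TAG_RE = re.compile(r"<[^>]*>")      # naive: anything between angle brackets
WORD_RE = re.compile(r"[A-Za-z0-9]+")  # crude alphanumeric word tokens


def tokens_discarding_tags(html):
    """Drop everything between angle brackets, then tokenize what is left."""
    return WORD_RE.findall(TAG_RE.sub(" ", html))


def tokens_keeping_tags(html):
    """Tokenize the raw text, so tag names and attribute values survive."""
    return WORD_RE.findall(html)


sample = '<font color="#ff0000">Buy now</font>'
print(tokens_discarding_tags(sample))  # ['Buy', 'now']
print(tokens_keeping_tags(sample))     # ['font', 'color', 'ff0000', 'Buy', 'now', 'font']
```

The second variant keeps "ff0000", but it also picks up tokens such as
"font" and "color" that appear in almost any html mail, which is the
noise the current behaviour avoids.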

Mike




