html processing [was: Wanting a pre-db4 bogofilter]

David Relson relson at osagesoftware.com
Tue Mar 8 01:27:57 CET 2005


On Tue, 08 Mar 2005 01:07:19 +0100
Matthias Andree wrote:

> "Eric Wood" <eric at interplas.com> writes:

...[snip]...

> 
> > Also, a very new kind of spam I see is the creation of a table with many 
> > cells.  Then they put word fragments in each cell, valign="top" and "bottom" 
> > some cells, and it forms a completely readable and lined up spam message 
> > that bogofilter and my keyword filter couldn't have caught.
> 
> I was under the impression that HTML tag removal was supposed to take
> care of this, but I have neither written nor looked at the HTML tag
> handling code.

Bogofilter removes HTML comments and tags, so "th<!--comment-->is"
becomes "this" and "bef<font>aft" becomes "befaft".  Including spaces,
e.g. "th <!--comment--> is", would result in two two-character
fragments, "th" and "is".
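
As a rough illustration of the behaviour described above, here is a
small Python sketch.  It is not bogofilter's actual lexer; the names
MARKUP, strip_markup and tokens are made up for this example, and the
regular expression is only a simplified stand-in for real HTML parsing.

```python
import re

# Hypothetical pattern (an assumption for this sketch, not bogofilter code):
# matches either an HTML comment or a single tag.
MARKUP = re.compile(r"<!--.*?-->|<[^>]*>", re.DOTALL)


def strip_markup(text):
    """Delete comments and tags, inserting nothing in their place."""
    return MARKUP.sub("", text)


def tokens(text):
    """Split the stripped text on whitespace."""
    return strip_markup(text).split()


print(tokens("th<!--comment-->is"))    # ['this']
print(tokens("bef<font>aft"))          # ['befaft']
print(tokens("th <!--comment--> is"))  # ['th', 'is']
```

Because the markup is deleted rather than replaced by a space, fragments
that touch the markup are joined, while fragments already separated by
whitespace stay separate.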

I'd have to see a sample of the table/array to determine why bogofilter
isn't doing what's wanted.

Regards,

David

_______________________________________________
Bogofilter mailing list
Bogofilter at bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter


