html processing [was: Wanting a pre-db4 bogofilter]
David Relson
relson at osagesoftware.com
Tue Mar 8 01:27:57 CET 2005
On Tue, 08 Mar 2005 01:07:19 +0100
Matthias Andree wrote:
> "Eric Wood" <eric at interplas.com> writes:
...[snip]...
>
> > Also, a very new kind of spam I see is the creation of a table with many
> > cells. Then they put word fragments in each cell, valign="top" and "bottom"
> > some cells, and it forms a completely readable and lined up spam message
> > that bogofilter and my keyword filter couldn't have caught.
>
> I was under the impression that HTML tag removal was supposed to take
> care of this, but I have neither written nor looked at the HTML tag
> handling code.
Bogofilter removes HTML comments and tags, so "th<!--comment-->is" becomes
"this" and "bef<font>aft" becomes "befaft". Adding spaces,
i.e. "th <!--comment--> is", would instead yield two two-character fragments, "th" and "is".
I'd have to see a sample of the table/array to determine why bogofilter
isn't doing what's wanted.
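A minimal Python sketch of the behaviour described above. It assumes that comments and tags are simply deleted without inserting any whitespace, and that a token is a run of letters. The regular expressions and the `strip_html`/`tokens` helpers are hypothetical illustrations, not bogofilter's actual lexer:

```python
import re

# Hypothetical sketch, not bogofilter's lexer: comments and tags are deleted
# outright (replaced by nothing, not by a space), so the text on either side
# is glued together.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")
# Assumption: a token is just a run of ASCII letters.
TOKEN_RE = re.compile(r"[A-Za-z]+")


def strip_html(text: str) -> str:
    """Remove comments first (they may contain '>'), then remaining tags."""
    text = COMMENT_RE.sub("", text)
    return TAG_RE.sub("", text)


def tokens(text: str) -> list[str]:
    """Split the de-tagged text into letter-only tokens."""
    return TOKEN_RE.findall(strip_html(text))


if __name__ == "__main__":
    print(tokens("th<!--comment-->is"))    # ['this']
    print(tokens("bef<font>aft"))          # ['befaft']
    print(tokens("th <!--comment--> is"))  # ['th', 'is']
```

Because the markup is replaced by an empty string, only whitespace already present in the source separates the fragments. That is why the spaced version produces two short tokens instead of one word.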
Regards,
David
_______________________________________________
Bogofilter mailing list
Bogofilter at bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter