obscured URL not being tokenized

Tom Anderson tanderso at oac-design.com
Sat Dec 20 19:18:17 CET 2003


On Sat, 2003-12-20 at 11:16, Dan Singletary wrote:
> filter.  I've mentioned it before, but there should be some way to tell 
> bogofilter to ignore text that is the same color as its background - I 
> know this would require more interpretation of the HTML, and I'm not 
> sure how much more code there would need to be for this.  I've attached 
> the entire offending email for your reference.

That would be simple enough if you only handled white backgrounds and
fonts specified literally as "white".  But once you start getting
color="#fffffe", it gets more difficult, because the filter then has to
decide how close two colors must be to count as the same.  Moreover, it
is nearly impossible to compare a background _image_ to a foreground
color without doing all kinds of image recognition.  Even if it were
possible, the overhead would be extreme.
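
To make the "#fffffe" problem concrete, here is a minimal sketch in
Python of a near-match color test.  It is not bogofilter code; the
names parse_color and nearly_same, the tiny table of color names, and
the tolerance of 8 are all assumptions made up for illustration.  It
does nothing about background images.

```python
# Hypothetical sketch, not part of bogofilter.
# Only a handful of named colors are listed; real HTML has many more.
NAMED_COLORS = {
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "red": (255, 0, 0),
}

HEX_DIGITS = "0123456789abcdef"


def parse_color(value):
    """Return (r, g, b) for a known name, #rgb or #rrggbb; else None."""
    v = value.strip().lower()
    if v in NAMED_COLORS:
        return NAMED_COLORS[v]
    if v.startswith("#"):
        v = v[1:]
    if len(v) == 3:
        # Expand shorthand such as "fff" to "ffffff".
        v = "".join(ch * 2 for ch in v)
    if len(v) != 6 or any(ch not in HEX_DIGITS for ch in v):
        return None
    return tuple(int(v[i:i + 2], 16) for i in (0, 2, 4))


def nearly_same(foreground, background, tolerance=8):
    """True if every RGB channel differs by at most `tolerance`."""
    fg = parse_color(foreground)
    bg = parse_color(background)
    if fg is None or bg is None:
        # Unknown color syntax: make no claim rather than guess.
        return False
    return all(abs(a - b) <= tolerance for a, b in zip(fg, bg))
```

With these assumptions, nearly_same("white", "#fffffe") is true and
nearly_same("#000", "white") is false.  Any fixed tolerance is a guess,
and a spammer only has to step just outside it.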

However, if you simply leave everything as is and just register your
spams, the Bayesian method should start to recognize things like
color="white" and background="something.jpg" as spammish tokens.

Assuming, that is, that bogofilter doesn't throw away such valuable
information as an equals sign and quotes.
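
For illustration only, this is roughly what keeping that information
could look like: a Python sketch that emits each attribute="value" pair
inside a tag as one token.  It does not describe how bogofilter's lexer
actually works; attribute_tokens and both regular expressions are
hypothetical, and the regexes are far too naive for real-world HTML.

```python
import re

# Hypothetical sketch, not bogofilter's lexer.
# Matches anything that looks like an HTML tag.
TAG_RE = re.compile(r"<[^>]+>")
# Matches name="value", name='value' or name=value inside a tag.
ATTR_RE = re.compile(r"([a-zA-Z-]+)\s*=\s*(\"[^\"]*\"|'[^']*'|[^\s>]+)")


def attribute_tokens(html):
    """Return each attribute as a single token of the form name="value"."""
    tokens = []
    for tag in TAG_RE.findall(html):
        for name, value in ATTR_RE.findall(tag):
            # Normalize quoting and case so variants map to one token.
            value = value.strip("\"'").lower()
            tokens.append('{}="{}"'.format(name.lower(), value))
    return tokens
```

Fed '<body background="something.jpg"><font color="white">', this
yields the two tokens background="something.jpg" and color="white",
which a Bayesian filter could then score like any other word.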

Tom