It's getting worse

David Relson relson at osagesoftware.com
Sat Mar 29 01:19:55 CET 2003


At 07:12 PM 3/28/03, Clint Adams wrote:

> > <html><body bgcolor="blue">
> > <font color="white">white</font>
> > <font color="blue">blue</font>
> > <font color="red">red</font>
> > </body></html>
> >
> > The word "blue" is rendered in bgcolor, hence is effectively
> > whitespace.  If bogofilter's goal is eye-space, then "blue" is not a token
> > to process.
> >
> > It's an interesting (difficult) problem just figuring what _ought_ to be
> > done.
>
>I've been getting a lot lately where they set the font color to #ffffff,
>the font size to -4, then just spew garbage.

Assuming default settings, the first time bogofilter sees that garbage, 
each token will get spamicity 0.415 (which is Robinson's "x" parameter) and 
it will be ignored since min_dev of 0.100 ignores values from 0.400 to 
0.600.  Assuming the message is then added to the spam wordlist and the 
same message arrives again, every piece of the garbage will be "known spam" 
and bogofilter will nail the spammer.
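
To make the arithmetic concrete, here is a rough sketch of the calculation in Python.  It is not bogofilter's code.  The 0.415 and 0.100 values come from the paragraph above; the strength parameter s = 1.0 and the simplified p(w) (spam count over total count, ignoring the wordlist message totals) are assumptions chosen only for illustration.

```python
ROBX = 0.415     # Robinson's "x": spamicity assumed for a never-seen token
ROBS = 1.0       # Robinson's "s": ASSUMED strength value, for illustration only
MIN_DEV = 0.100  # tokens within this distance of 0.5 are ignored


def token_spamicity(spam_count, good_count, x=ROBX, s=ROBS):
    """Robinson's f(w) = (s*x + n*p) / (s + n) for a single token."""
    n = spam_count + good_count
    if n == 0:
        return x  # unknown token: the formula collapses to x
    p = spam_count / n  # simplified p(w); ignores wordlist message totals
    return (s * x + n * p) / (s + n)


def is_ignored(f, min_dev=MIN_DEV):
    """True when the token is too close to neutral (0.5) to be scored."""
    return abs(f - 0.5) < min_dev


first_sighting = token_spamicity(0, 0)   # garbage token never seen before
after_training = token_spamicity(1, 0)   # same token registered once as spam

print(first_sighting, is_ignored(first_sighting))
print(after_training, is_ignored(after_training))
```

Under these assumptions the unseen token sits inside the ignored band, and a single registration as spam is enough to push it outside the band.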

That's the good news.  The bad news is that all the garbage will now be in 
the wordlist.  (Note: a periodic maintenance pass that discards all tokens 
with low counts could be used to clean out the garbage.)
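
A maintenance pass of the kind mentioned in the note could look roughly like the sketch below.  The in-memory dictionary, the function name prune_low_counts and the min_count threshold are all hypothetical; this does not reflect bogofilter's actual wordlist format or tools.

```python
def prune_low_counts(wordlist, min_count=2):
    """Return a copy of the wordlist without tokens seen fewer than min_count times.

    wordlist maps token -> (spam_count, good_count).  Both the layout and
    the threshold are assumptions made for this sketch.
    """
    return {
        token: (spam, good)
        for token, (spam, good) in wordlist.items()
        if spam + good >= min_count
    }


wordlist = {
    "xqzvkpw": (1, 0),    # one-off garbage token from a single spam
    "mortgage": (40, 2),  # real token seen many times
    "meeting": (0, 25),
}

pruned = prune_low_counts(wordlist)
print(sorted(pruned))
```

The trade-off is that a legitimate but rare word seen only once is discarded along with the garbage, so the threshold would need some care.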

David





