Ways to trick the lexer

John G Walker johngwalker at tiscali.co.uk
Sat Jun 9 11:38:21 CEST 2007



On Sat, 9 Jun 2007 11:00:11 +0200 Andreas Pardeike
<andreas at pardeike.net> wrote:

> On 9 jun 2007, at 01.24, David Relson wrote:
> 
> > The cunning trick tends to be a one-time success.  After you train with
> > such a message, the next time "SEIX8UALLY" appears in a Subject line it
> > is known to bogofilter as spam.  I think of it as being a red flag
> > saying "look at me, I'm spam".
> 
> My point was that I get spam that substitutes most key words with
> words that contain random character replacements. If I look at the
> last say 50-100 spams of category "sexually explicit" I can see that
> there isn't a single repetition of SEIX8UALLY.
> 
> I compared the messages and very little is common to them. They have
> 
> - different subjects (except that they all look *similar*)
> - different IPs and subnets
> - the non-common header lines are different too
> - the payload is different in every word
> - they contain truly random text (about 2 lines) at the end of
>    the message
> 
> I doubt that training has such an effect. It started about a month ago
> and grew to such an extent that I doubted the state of my db. So I erased
> the db and started over fresh a week ago. Since then, I have not
> seen any improvement.
> 
> Here are two examples, fresh from my INBOX this morning (followed by
> the output of bogofilter -vv / -vvv):
> 
> -------------------------------------------------------------------------------------
> Received: from MARIELAJUAN ([24.138.201.172])
> 	by mail.roundline.net (8.13.4/8.13.4) with ESMTP id l58MBNvI002718
> 	for <info at loops.twosailors.net>; Sat, 9 Jun 2007 00:11:25 +0200
> Received: from babyfeel (cardsfunny.sadlyraysays.com [135.121.172.180])
> 	by mail.mail.com (Postfix) with ESMTP id placepaid
> 	for <info at loops.twosailors.net>; Fri, 08 Jun 2007 18:59:56 +0100

{snip}

> X-Bogosity: No, tests=bogofilter, spamicity=0.684388, version=1.1.5

The message has been given a spamicity of 68%. 

I would suggest that your cutoff point is set too high. Do you actually
get ham with such a high spamicity? If not, you should bring the cutoff
point down. I've got mine set at 60% and don't get false positives.
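
To make the cutoff comparison concrete, here is a minimal Python sketch
that pulls the spamicity out of an X-Bogosity header like the one quoted
above and tests it against two candidate cutoffs. The function names are
hypothetical, and the "at or above the cutoff counts as spam" rule is an
assumption made for illustration; it is not taken from bogofilter's source.

```python
import re

# Matches the "spamicity=0.684388" field of an X-Bogosity header.
_SPAMICITY = re.compile(r"spamicity=([0-9]*\.?[0-9]+)")


def spamicity_from_header(header: str) -> float:
    """Return the spamicity value found in an X-Bogosity header line."""
    match = _SPAMICITY.search(header)
    if match is None:
        raise ValueError("no spamicity field in header")
    return float(match.group(1))


def is_spam(score: float, spam_cutoff: float) -> bool:
    """Assumed rule: a score at or above the cutoff is treated as spam."""
    return score >= spam_cutoff


# Header copied from the quoted message.
header = "X-Bogosity: No, tests=bogofilter, spamicity=0.684388, version=1.1.5"
score = spamicity_from_header(header)

print(is_spam(score, 0.60))  # cutoff of 60%: the message is caught
print(is_spam(score, 0.99))  # a much higher cutoff: the message gets through
```

In bogofilter itself the threshold is, as far as I recall, the spam_cutoff
setting in bogofilter.cf; check the man page for your version before
relying on that.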

-- 
 All the best,
 John


