Ways to trick the lexer
John G Walker
johngwalker at tiscali.co.uk
Sat Jun 9 11:38:21 CEST 2007
On Sat, 9 Jun 2007 11:00:11 +0200 Andreas Pardeike
<andreas at pardeike.net> wrote:
> On 9 jun 2007, at 01.24, David Relson wrote:
>
> > The cunning trick tends to be a one-time success. After you train
> > with such a message, the next time "SEIX8UALLY" appears in a Subject
> > line it is known to bogofilter as spam. I think of it as being a red
> > flag saying "look at me, I'm spam".
>
> My point was that I get spam that substitutes most key words with
> words that contain random character replacements. If I look at the
> last, say, 50-100 spams of category "sexually explicit", I can see
> that there isn't a single repetition of SEIX8UALLY.
>
> I compared the messages and very little is common to them. They have
>
> - different subjects (except that they all look *similar*)
> - different IPs and subnets
> - different non-common header lines
> - a payload that differs in every word
> - truly random text (about 2 lines) at the end of the message
>
> I doubt that training has such an effect. It started about a month ago
> and grew to such an extent that I doubted the state of my db. So I
> erased the db and started over fresh a week ago. Since then, I have
> not seen any improvement.
>
> Here are two examples, fresh from my INBOX this morning (followed by
> the output of bogofilter -vv / -vvv):
>
> ------------------------------------------------------------------------
> Received: from MARIELAJUAN ([24.138.201.172])
>     by mail.roundline.net (8.13.4/8.13.4) with ESMTP id l58MBNvI002718
>     for <info at loops.twosailors.net>; Sat, 9 Jun 2007 00:11:25 +0200
> Received: from babyfeel (cardsfunny.sadlyraysays.com [135.121.172.180])
>     by mail.mail.com (Postfix) with ESMTP id placepaid
>     for <info at loops.twosailors.net>; Fri, 08 Jun 2007 18:59:56 +0100
{snip}
> X-Bogosity: No, tests=bogofilter, spamicity=0.684388, version=1.1.5
The message has been given a spamicity of 68%.
I would suggest that your cutoff point is set too high. Do you actually
get ham with such a high spamicity? If not, you should bring the cutoff
point down. I've got mine set at 60% and don't get false positives.
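
To make the effect of the cutoff concrete, here is a minimal Python sketch
that reads the spamicity out of the X-Bogosity header quoted above and
compares it against two cutoffs. The helper names (spamicity, is_spam) are
made up for illustration and are not part of bogofilter. The 0.99 value is
just a hypothetical stricter cutoff, not a setting taken from this thread,
and the sketch assumes a message counts as spam when its score is at or
above the cutoff.

```python
import re

# Hypothetical helpers for illustration; only the header line itself
# comes from the message quoted above.
SPAMICITY_RE = re.compile(r"spamicity=([0-9]*\.?[0-9]+)")


def spamicity(header: str) -> float:
    """Extract the spamicity score from an X-Bogosity header line."""
    match = SPAMICITY_RE.search(header)
    if match is None:
        raise ValueError("no spamicity field in header")
    return float(match.group(1))


def is_spam(header: str, cutoff: float) -> bool:
    """Assumption: a score at or above the cutoff is treated as spam."""
    return spamicity(header) >= cutoff


header = "X-Bogosity: No, tests=bogofilter, spamicity=0.684388, version=1.1.5"

print(is_spam(header, 0.60))  # cutoff of 60%, as suggested above
print(is_spam(header, 0.99))  # hypothetical stricter cutoff
```

With a cutoff of 0.60 the quoted message would be flagged; with the stricter
hypothetical cutoff it would not, which matches the "No" in the header.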
--
All the best,
John