Ways to trick the lexer
Thomas Anderson
tanderso at oac-design.com
Fri Jun 8 23:50:09 CEST 2007
As the others have basically said, something you may consider to be a
cunning trick at first actually loses its effectiveness rather quickly.
Just train bogofilter on them. Come back if they're still causing
problems in a few days or weeks (depending on how many you get a day).
Tom
On Fri, 2007-06-08 at 22:21 +0200, Andreas Pardeike wrote:
> Hi,
>
> I am getting hundreds of spams with variations of the subject
> "Sexually explicit". They create tokens like
>
> subj:SEIX8UALLY-E8XPLICITI
>
> in the database, and since they differ from each other in at least
> one letter, they all get counts of 1. As a result, none of those
> seemingly random letter strings will get a high spam score.
>
> Is this behaviour intended? Wouldn't splitting on more boundaries
> give higher word counts, resulting in e.g.
>
> subj:UALLY
> ...
>
> or at least
>
> subj:SEIX8UALLY
> subj:E8XPLICITI
>
> ?
>
> Regards,
> Andreas Pardeike
> _______________________________________________
> Bogofilter mailing list
> Bogofilter at bogofilter.org
> http://www.bogofilter.org/mailman/listinfo/bogofilter
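
For illustration, here is a minimal sketch of the counting effect Andreas
describes. It uses a toy tokenizer, not bogofilter's actual lexer, and the
function names (tokenize, count_tokens) are hypothetical. Only the first
subject comes from the message above; the other two variations are made up
for the example.

```python
import re
from collections import Counter

def tokenize(subject, split_hyphens=False):
    """Toy tokenizer (an assumption, not bogofilter's lexer).

    Splits on whitespace, and optionally on hyphens as well,
    then prefixes each word with "subj:".
    """
    pattern = r"[-\s]+" if split_hyphens else r"\s+"
    words = re.split(pattern, subject.strip())
    return ["subj:" + word for word in words if word]

def count_tokens(subjects, split_hyphens=False):
    """Count how often each token occurs across all subjects."""
    counts = Counter()
    for subject in subjects:
        counts.update(tokenize(subject, split_hyphens))
    return counts

# First entry is from the original message; the rest are invented variations.
subjects = [
    "SEIX8UALLY-E8XPLICITI",
    "SEIX8UALLY-EXP7LICIT",
    "SE3XUALLY-E8XPLICITI",
]

whole = count_tokens(subjects)                      # hyphenated word kept whole
split = count_tokens(subjects, split_hyphens=True)  # also split on "-"

for name, counts in (("whole", whole), ("split", split)):
    print(name)
    for token, n in sorted(counts.items()):
        print("  ", token, n)
```

In this sample every whole-subject token is seen once, while splitting on
the hyphen lets a half that repeats across messages accumulate a count. A
half that is itself mangled differently each time would still be seen only
once, so splitting alone does not defeat the trick.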