Random lettered word examples

Tom Allison tallison at tacocat.net
Tue Mar 16 12:47:35 CET 2004


Eric Wood wrote:
> These slipped through bogofilter:
> 
> http://www.interplas.com/spam.txt
> http://www.interplas.com/spam2.txt
> 
> And I get email like this very consistently, which is why I'm looking for
> procmail rules that can maybe score words consisting of all consonants, or
> a string of impossible consecutive consonants, as spammish.
> 
> My buddy's MS Outlook spam filter (Spam Inspector) catches email like this
> virtually all the time when bogofilter lets it through.  My guess is that it
> can spell check only the [a-zA-Z] words then it gets trapped if there are
> over 50% or so misspellings in certain areas.
> 
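
For what it's worth, the consonant idea could be sketched roughly like
this. It is Python, not a procmail rule, and everything in it is an
assumption made for illustration: the function names, the 6-consonant
run length, and the 4-letter minimum are hypothetical and untested
against real mail. Real words such as "catchphrase" would trip it.

```python
import re

# Only purely alphabetic words are considered, as in the [a-zA-Z] guess above.
WORD = re.compile(r"[A-Za-z]+")
# Six or more consecutive consonants; 'y' is treated as a vowel here (assumption).
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-xz]{6,}", re.IGNORECASE)
VOWELS = set("aeiouy")


def looks_random(word):
    """Return True if the word has no vowels or a long consonant run."""
    lower = word.lower()
    if len(lower) < 4:  # short words and abbreviations are too noisy to judge
        return False
    has_vowel = bool(set(lower) & VOWELS)
    return (not has_vowel) or bool(CONSONANT_RUN.search(lower))


def random_word_ratio(text):
    """Fraction of alphabetic words in text that look random."""
    words = WORD.findall(text)
    if not words:
        return 0.0
    return sum(looks_random(w) for w in words) / len(words)
```

A ratio above some cutoff (the 50% guessed at above, say) could then
be used to add to a spam score.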

I played with some email like this (of my own) and found that all the
gibberish at the end scores at the robx value (0.415). Since that lies
within min_dev (0.10) of 0.5, it is ignored when the email is scored.

I accidentally set my min_dev below 0.085, which brought all the
robx-valued tokens into consideration, and the messages scored 'ham'
like clockwork.  It might be worthwhile to add a note in the
bogofilter.cf.example file to this effect: you want
min_dev > abs(0.5 - robx)
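
To make the arithmetic concrete, here is a small sketch in Python. It
is not bogofilter's actual code; the function names are made up, and
the exact comparison bogofilter uses at the boundary is an assumption.

```python
EVEN_ODDS = 0.5  # the neutral point that min_dev is measured from


def is_scored(prob, min_dev):
    """True if a token's deviation from 0.5 is at least min_dev (assumed rule)."""
    return abs(prob - EVEN_ODDS) >= min_dev


def min_dev_floor(robx):
    """min_dev must exceed this for robx-valued tokens to be ignored."""
    return abs(EVEN_ODDS - robx)


robx = 0.415
print(is_scored(robx, 0.10))  # unknown tokens ignored with min_dev = 0.10
print(is_scored(robx, 0.08))  # unknown tokens counted once min_dev < 0.085
print(min_dev_floor(robx))    # roughly 0.085
```

With robx = 0.415 the floor is about 0.085, which is why a min_dev
below that lets every never-seen token pull the score toward ham.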

--------

tallison at janus:~> bogofilter -vv < spam.txt
X-Bogosity: Unsure, tests=bogofilter, spamicity=0.649109, version=0.17.2
    int  cnt   prob  spamicity histogram
   0.00    5 0.024668 0.007974 #####
   0.10    0 0.000000 0.007974
   0.20    0 0.000000 0.007974
   0.30    4 0.366755 0.098442 ####
   0.40    0 0.000000 0.098442
   0.50    0 0.000000 0.098442
   0.60    3 0.650643 0.208138 ###
   0.70    1 0.751575 0.244896 #
   0.80    1 0.809796 0.282436 #
   0.90   13 0.994421 0.588911 #############

tallison at janus:~> bogofilter -vv < spam2.txt
X-Bogosity: Unsure, tests=bogofilter, spamicity=0.503607, version=0.17.2
    int  cnt   prob  spamicity histogram
   0.00    5 0.002805 0.000747 #####
   0.10    0 0.000000 0.000747
   0.20    1 0.200982 0.012152 #
   0.30    8 0.334807 0.132136 ########
   0.40    0 0.000000 0.132136
   0.50    0 0.000000 0.132136
   0.60    1 0.623312 0.159974 #
   0.70    0 0.000000 0.159974
   0.80    2 0.835902 0.244146 ##
   0.90   11 0.977496 0.533678 ###########
tallison at janus:~>

--------

These would have slipped through mine too.  But after one round of
training they are classified correctly.  I do use a train-to-exhaustion
process on my mailbox.




