Pure image based spam in the wild

David Relson relson at osagesoftware.com
Fri Jan 31 15:44:02 CET 2003


At 09:16 AM 1/31/03, Jonathan Buzzard wrote:

>matthias.andree at gmx.de said:
> > Could you run bogolexer on the full mail and let us know what it
> > prints?
> >
>
>Here goes, I have upgraded to 0.10.1.4 just for the test. As you can see
>apart from "enchanted holiday", there is nothing for bogofilter to go on
>really.

... [snip] ...

>normal mode.
>get_token: 2 'from'

...[snip]...

>157 tokens read.


Jonathan,

With 157 tokens found by bogolexer, there's a fair amount of info there for
classifying. Running with fisher tristate, 0.10.1.4 scores it at 0.372536,
which is in the Unsure zone. Running bogofilter with "-vv" will generate
a histogram (shown below), and running with "-vvv" will list all the tokens
along with their good and spam counts, their spam score, and other info.
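For readers unfamiliar with "fisher tristate": below is a minimal sketch of Robinson-style Fisher (chi-square) combining of per-token spam probabilities into one score, followed by a three-way Spam/Unsure/Ham decision. This is the technique as commonly described, not bogofilter's actual source. The function names and the two cutoff values are hypothetical, chosen only for illustration. Token probabilities are assumed to lie strictly between 0 and 1.

```python
import math

def chi2q(x2: float, dof: int) -> float:
    """Upper-tail probability P(chi^2 >= x2) for an even number of degrees of freedom."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(probs):
    """Combine per-token spam probabilities (each strictly in (0, 1)) into one score."""
    n = len(probs)
    # Spamminess: a tiny product of (1 - p) means many strongly spammy tokens.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Hamminess: a tiny product of p means many strongly hammy tokens.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # 0.0 = clearly ham, 1.0 = clearly spam, near 0.5 = evidence cancels out.
    return (1.0 + s - h) / 2.0

# Hypothetical cutoffs, for illustration only (not bogofilter's defaults).
HAM_CUTOFF = 0.10
SPAM_CUTOFF = 0.95

def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Tristate decision: anything between the two cutoffs is Unsure."""
    if score >= spam_cutoff:
        return "Spam"
    if score <= ham_cutoff:
        return "Ham"
    return "Unsure"

if __name__ == "__main__":
    print(classify(0.372536))
```

With these made-up cutoffs, the 0.372536 score from this message lands between them and comes out as "Unsure": the 31 tokens in the 0.40 bin and the 12 in the 0.00 bin pull against the handful of spammy ones.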

Here is bogofilter's histogram:

[relson at osage test$ bogofilter -d /var/lib/bogofilter -vv < image-spam.txt
X-Bogosity: No, tests=bogofilter, spamicity=0.372536, version=0.10.1.4
           int  cnt    prob   spamicity  histogram
          0.00   12  0.024535  0.005865  ############
          0.10    4  0.173421  0.019200  ####
          0.20    2  0.266996  0.029464  ##
          0.30    6  0.348798  0.067810  ######
          0.40   31  0.423770  0.226333  ###############################
          0.50   15  0.544654  0.292030  ###############
          0.60    4  0.653296  0.311264  ####
          0.70    5  0.750714  0.339379  #####
          0.80    4  0.837027  0.364658  ####
          0.90    1  0.919484  0.372536  #
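Reading the table: the int, cnt and prob columns appear to give the start of each 0.1-wide probability interval, the number of tokens whose spam probability falls in it, and the mean probability of those tokens. The spamicity column appears to be a running combined score, since its last row matches the overall 0.372536. The sketch below reproduces only the first three columns and the bar, under that reading. It leaves out the spamicity column, and the names are hypothetical.

```python
def histogram(probs, bins=10):
    """Group token probabilities into equal-width intervals.

    Returns (interval_start, count, mean_prob) for each bin.
    """
    counts = [0] * bins
    sums = [0.0] * bins
    for p in probs:
        # Clamp so that a probability of exactly 1.0 lands in the top bin.
        idx = min(int(p * bins), bins - 1)
        counts[idx] += 1
        sums[idx] += p
    rows = []
    for i in range(bins):
        mean = sums[i] / counts[i] if counts[i] else 0.0
        rows.append((i / bins, counts[i], mean))
    return rows

def render(rows):
    """Format rows roughly like the -vv output above (without spamicity)."""
    lines = ["  int  cnt      prob  histogram"]
    for start, cnt, mean in rows:
        lines.append(f"{start:5.2f} {cnt:4d} {mean:9.6f}  {'#' * cnt}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render(histogram([0.02, 0.03, 0.42, 0.45, 0.47, 0.92])))
```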

True, the message content is hidden in the image. However, there's still
info for bogofilter to work with.

More information about the Bogofilter mailing list