Pure image based spam in the wild
David Relson
relson at osagesoftware.com
Fri Jan 31 15:44:02 CET 2003
At 09:16 AM 1/31/03, Jonathan Buzzard wrote:
>matthias.andree at gmx.de said:
> > Could you run bogolexer on the full mail and let us know what it
> > prints?
> >
>
>Here goes, I have upgraded to 0.10.1.4 just for the test. As you can see
>apart from "enchanted holiday", there is nothing for bogofilter to go on
>really.
... [snip] ...
>normal mode.
>get_token: 2 'from'
...[snip]...
>157 tokens read.
Jonathan,
With 157 words found by bogolexer, there's a fair amount of info there for
classifying. Running with fisher tristate, 0.10.1.4 scores it at 0.372536
which is in the Unsure zone. Running bogofilter using "-vv" will generate
a histogram (shown below) and running with "-vvv" will list all the tokens
along with their good and spam counts, their spam score, and other info.
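For anyone wondering how a list of per-token probabilities turns into one number, here is a minimal Python sketch of the Fisher-style (inverse chi-square) combination. It is an illustration and not bogofilter's actual code. The function names chi2q and fisher_spamicity and the sample probabilities are made up for this example, and the real program also applies things like robs, robx and min_dev that are left out here.

```python
import math

def chi2q(x, df):
    """Chi-square survival function, closed form valid for even df only."""
    if df % 2 != 0 or df <= 0:
        raise ValueError("df must be a positive even integer")
    m = x / 2.0
    term = math.exp(-m)          # i = 0 term of exp(-m) * sum(m**i / i!)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)       # guard against rounding slightly above 1

def fisher_spamicity(probs):
    """Combine per-token spam probabilities (each strictly between 0 and 1)."""
    n = len(probs)
    # Many low-probability tokens make ham_ind approach 1.
    ham_ind = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # Many high-probability tokens make spam_ind approach 1.
    spam_ind = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    return (1.0 + spam_ind - ham_ind) / 2.0

if __name__ == "__main__":
    # Hypothetical token probabilities, not taken from the message above.
    sample = [0.02, 0.17, 0.42, 0.43, 0.55, 0.65, 0.84]
    print("spamicity=%.6f" % fisher_spamicity(sample))
```

A message whose tokens lean neither way comes out near 0.5, which is why a score such as 0.372536 lands between the ham and spam cutoffs.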
Here is bogofilter's histogram:
[relson at osage test]$ bogofilter -d /var/lib/bogofilter -vv < image-spam.txt
X-Bogosity: No, tests=bogofilter, spamicity=0.372536, version=0.10.1.4
int   cnt  prob      spamicity  histogram
0.00   12  0.024535  0.005865   ############
0.10    4  0.173421  0.019200   ####
0.20    2  0.266996  0.029464   ##
0.30    6  0.348798  0.067810   ######
0.40   31  0.423770  0.226333   ###############################
0.50   15  0.544654  0.292030   ###############
0.60    4  0.653296  0.311264   ####
0.70    5  0.750714  0.339379   #####
0.80    4  0.837027  0.364658   ####
0.90    1  0.919484  0.372536   #
True, the message content is hidden in the image. However there's still
info for bogofilter to work with.
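To make the histogram layout concrete, here is a rough Python sketch that buckets token probabilities into intervals of width 0.10 and prints a count, a mean probability, and a bar for each interval. It assumes the "prob" column is the mean of the token probabilities in each interval. The function name bucket_tokens and the sample values are hypothetical, and the "spamicity" column is left out of the sketch.

```python
def bucket_tokens(probs, buckets=10):
    """Group probabilities in [0, 1] into equal-width intervals.

    Returns a list of (interval_start, count, mean_probability) tuples.
    """
    counts = [0] * buckets
    sums = [0.0] * buckets
    for p in probs:
        # A probability of exactly 1.0 goes into the last interval.
        i = min(int(p * buckets), buckets - 1)
        counts[i] += 1
        sums[i] += p
    rows = []
    for i in range(buckets):
        mean = sums[i] / counts[i] if counts[i] else 0.0
        rows.append((i / buckets, counts[i], mean))
    return rows

if __name__ == "__main__":
    # Hypothetical token probabilities, not taken from the message above.
    sample = [0.02, 0.05, 0.41, 0.44, 0.47, 0.55, 0.92]
    print("int   cnt  prob      histogram")
    for start, count, mean in bucket_tokens(sample):
        print("%.2f  %3d  %.6f  %s" % (start, count, mean, "#" * count))
```

A tall bar in the 0.40 and 0.50 rows, as in the output above, means most tokens are close to neutral, so the few tokens at either end carry most of the weight.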