spamicity statistics?

Peter Bishop pgb at adelard.com
Mon Jul 7 09:47:08 CEST 2003


On 6 Jul 2003 at 18:25, David Carmean wrote:

> 
> Is anybody logging and plotting or otherwise analyzing message 
> spamicity as a means of judging how well their filter is tuned?
> 
> I intend to, once I overcome all the trouble I'm having getting gnuplot 
> to compile on this ancient FreeBSD installation :/
> 

I did try this a long time ago, using bogofilter v0.9;
the results are shown in the attached PDF file.

Note this was done with the Robinson GM algorithm, where there is a fairly
even spread between 0 and 1 spamicity. Note also that the results are a bit
notchy, as the analysis was performed on a limited number of messages
(especially ham).

As you can see, there are two distinct "bell curves" for ham and spam
(ham centred at about 0.3 and spam centred at about 0.7).
You can also see that with a spam cutoff of 0.54, there will be some false
negatives.
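
For anyone wanting to reproduce this kind of analysis without gnuplot, here
is a minimal sketch in Python. It is not the script used for the attached
PDF. The function names, the bin count, and the sample scores are
hypothetical, and it assumes that a message scoring at or above the cutoff
is classified as spam.

```python
def histogram(scores, bins=10):
    """Count scores per equal-width bin over the range [0, 1]."""
    counts = [0] * bins
    for s in scores:
        # A score of exactly 1.0 is placed in the last bin.
        idx = min(int(s * bins), bins - 1)
        counts[idx] += 1
    return counts


def error_counts(ham_scores, spam_scores, cutoff=0.54):
    """Return (false_positives, false_negatives) for a given spam cutoff.

    Assumption: score >= cutoff means "spam", score < cutoff means "ham".
    """
    false_positives = sum(1 for s in ham_scores if s >= cutoff)
    false_negatives = sum(1 for s in spam_scores if s < cutoff)
    return false_positives, false_negatives


if __name__ == "__main__":
    # Hypothetical scores for illustration only, not measured data.
    ham = [0.21, 0.28, 0.30, 0.33, 0.41, 0.58]
    spam = [0.49, 0.62, 0.68, 0.70, 0.74, 0.81]

    print("ham histogram: ", histogram(ham))
    print("spam histogram:", histogram(spam))
    print("(false positives, false negatives):", error_counts(ham, spam))
```

Feeding it the per-message spamicity values from your own ham and spam
corpora, and trying a few cutoff values, should show how the overlap between
the two curves trades false negatives against false positives.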

I have not done any analysis with the latest version of bogofilter,
but my feeling is that the bell curves are sharper and more widely
separated.

-- 
Peter Bishop 
pgb at adelard.com
pgb at csr.city.ac.uk


-------------- next part --------------
A non-text attachment was scrubbed...
Name: spamicity.pdf
Type: application/octet-stream
Size: 13496 bytes
Desc: not available
URL: <http://www.bogofilter.org/pipermail/bogofilter/attachments/20030707/ec11708e/attachment.obj>

