Results - Filter on subject & body

Jozef Hitzinger hitzinger at phobos.fphil.uniba.sk
Tue Mar 9 15:26:50 CET 2004


On Tue, 9 Mar 2004, Boris 'pi' Piwinger wrote:

> Could you please describe your testing setup. How did you
> train and how did you then go on?

Setup: two mboxes, h with 8066 ham, s with 11848 spam (classified manually,
surely with some errors)

To get the all-headers wordlist:

# start empty

rm ~/.bogofilter/wordlist.db

# kill surplus headers introduced by Pine

cat h | formail -s formail -I Status: -I X-Status: > h-
mv h- h
cat s | formail -s formail -I Status: -I X-Status: > s-
mv s- s

# train & set

bogofilter -s <s
bogofilter -n <h
bogoutil -R ~/.bogofilter/	#robx        = 0.39something

# tests

classify

Tests were run on the 's' mbox and on another mbox, 'u'. I've run them on
'h' too, but the results are just as expected: with either wordlist it's
all ham plus a few unsures.
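
The "classify" step is not spelled out above. One possible way to do it,
as a rough sketch and not necessarily what was actually run here, is to
feed each message to bogofilter separately and count the exit codes. The
helper name tally_codes is made up for illustration, and the sketch assumes
bogofilter's documented exit codes (0 = spam, 1 = ham, 2 = unsure).

```shell
# tally_codes: read one bogofilter exit code per line on stdin and
# print how many messages fell into each class.
# Assumption: 0 = spam, 1 = ham, 2 = unsure; anything else is "other".
tally_codes() {
    awk '
        NF == 0   { next }            # skip blank lines
        $1 == "0" { spam++;   next }
        $1 == "1" { ham++;    next }
        $1 == "2" { unsure++; next }
                  { other++ }
        END { printf "spam=%d ham=%d unsure=%d other=%d\n", spam, ham, unsure, other }
    '
}
```

It could then be driven per mbox with something like

    formail -s sh -c 'bogofilter >/dev/null; echo $?' <s | tally_codes

where formail -s splits the mbox and runs the command once per message.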


To get the subject & body wordlist, proceed exactly as above, but with a
different prefilter command (the From\ is the mbox message separator):

cat h | formail -s formail -k -X Subject: -X From\  > h+
cat s | formail -s formail -k -X Subject: -X From\  > s+

and then

bogofilter -s <s+
bogofilter -n <h+
bogoutil -R ~/.bogofilter/   #robx        = 0.473960

classify


Parameters in both cases:

bogofilter version 0.16.3
algorithm   = fisher
robs        = 0.010000 (1.00e-02)
min_dev     = 0.300000 (3.00e-01)
ham_cutoff  = 0.100000 (1.00e-01)
spam_cutoff = 0.950000 (9.50e-01)
block_on_subnets  = no
replace_nonascii_characters = no
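
For reference, the two cutoffs above split scores into three classes. A
minimal sketch of that mapping follows; the function name classify_score is
hypothetical, and treating both boundaries as inclusive is an assumption,
not something checked against the 0.16.3 source.

```shell
# classify_score SCORE [HAM_CUTOFF] [SPAM_CUTOFF]
# Defaults are the cutoffs listed above (0.10 and 0.95).
# Assumption: score >= spam_cutoff is spam, score <= ham_cutoff is ham,
# everything in between is unsure.
classify_score() {
    awk -v s="$1" -v h="${2:-0.10}" -v c="${3:-0.95}" 'BEGIN {
        if (s + 0 >= c + 0)      print "spam"
        else if (s + 0 <= h + 0) print "ham"
        else                     print "unsure"
    }'
}
```

For example, classify_score 0.5 lands between the two cutoffs and so
counts as one of the "unsures" mentioned in the tests.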

-- 
jozef  :-)



