article on blocking by subnets - Justification

David Relson relson at osagesoftware.com
Thu Dec 5 21:06:00 CET 2002


Barry,

Lots of data. Lots of fun :-)

Here's my thought on how to determine whether subnets provide useful 
information, i.e. whether they help classification.

First, take a month's messages and separate spam from ham.

Phase 1: run the script contrib/randomtrain.  Afterwards, display MSG_COUNT 
from spamlist.db and goodlist.db to see how many messages were 
misclassified and therefore trained on.

Phase 2: turn on blocking_by_subnets and rerun phase 1.

Are the counts different?  Since the counts indicate how many messages 
bogofilter got wrong, the counts should go down when bogofilter has better 
data for classifying.
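
To make the comparison concrete, here is a rough sketch of the arithmetic. 
It assumes you have read the MSG_COUNT values off by hand after each phase. 
The function name, the dict keys and the numbers in the usage note are made 
up for illustration; nothing here is part of bogofilter itself.

```python
def compare_runs(baseline, with_subnets):
    """Compare trained-on message counts from two randomtrain runs.

    Each argument is a dict with the hypothetical keys "spam" and "good",
    holding the MSG_COUNT values read from spamlist.db and goodlist.db
    after that run.
    """
    # Total messages trained on, i.e. messages bogofilter got wrong.
    base_total = baseline["spam"] + baseline["good"]
    subnet_total = with_subnets["spam"] + with_subnets["good"]

    # A negative change means fewer mistakes with subnets turned on.
    change = subnet_total - base_total
    percent = 100.0 * change / base_total if base_total else 0.0

    return {
        "baseline": base_total,
        "with_subnets": subnet_total,
        "change": change,
        "percent": percent,
    }
```

For example, with placeholder counts (not real measurements), 
compare_runs({"spam": 120, "good": 80}, {"spam": 100, "good": 75}) 
gives a change of -25 messages, or -12.5 percent.  Since randomtrain 
shuffles the messages, a small difference may just be noise, so it is 
probably worth repeating both phases a few times before drawing conclusions.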

Notes: Be sure to use new wordlists for each run, so that counts from one 
run do not carry over into the next.  The newest CVS versions of bogofilter 
allow "block_on_subnet=Yes" to be put into the config file, which makes 
testing easier.

David




