Nigerian spam [was: multiple types of spam]

David Relson relson at osagesoftware.com
Fri Jul 4 01:10:38 CEST 2003


At 06:36 PM 7/3/03, Jef Poskanzer wrote:
> >It can be difficult.  If you want, I can send you a tarball with 60 or so
> >419 messages (before removal of duplicates).  Don't know if it'll be useful
> >or not.
>
>If we're collecting a training resource, I've got over 400!

Jef,

If you're up for an experiment:

Take your 400 Nigerian messages and split them into two groups.  Train on
group 1 as spam.
Take an equal number of ham messages and split them the same way.  Train on
group 1 as ham.

Then evaluate the remaining ham and spam (group 2 of each) and let us know
how well bogofilter does in scoring them.
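
The split-and-score bookkeeping for this experiment could be sketched roughly as below.  This is only an illustration, not part of bogofilter: split_in_two, percent_correct, and the classify callable are hypothetical names, and classify stands in for whatever wrapper you use to get a "ham"/"spam" verdict from the filter.

```python
import random


def split_in_two(messages, seed=0):
    """Shuffle a copy of the messages and return (group1, group2) halves."""
    shuffled = list(messages)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def percent_correct(messages, expected, classify):
    """Percentage of messages for which classify(msg) returns the expected label."""
    if not messages:
        return 0.0
    hits = sum(1 for msg in messages if classify(msg) == expected)
    return 100.0 * hits / len(messages)
```

Group 1 of each corpus goes to training; percent_correct is then called once on group 2 of the ham and once on group 2 of the spam.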

As a second experiment, use pi's train-on-error script for the initial
training.  Then evaluate and report as above.
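
For anyone unfamiliar with the idea, train-on-error in general means a message is registered only when the filter currently gets it wrong, so far fewer messages end up in the training set than were offered.  The loop below is a generic sketch of that idea, not pi's actual script; train_on_error, classify, and train are hypothetical names standing in for calls to the filter.

```python
def train_on_error(labelled, classify, train):
    """Train only on misclassified messages.

    labelled is an iterable of (message, label) pairs.
    classify(message) returns the filter's current verdict.
    train(message, label) registers the message under its true label.
    Returns the number of messages actually trained on.
    """
    trained = 0
    for msg, label in labelled:
        # Messages the filter already scores correctly are skipped.
        if classify(msg) != label:
            train(msg, label)
            trained += 1
    return trained
```

The return value is the figure that would go in the "training" columns of a results summary.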

As a third experiment, use contrib/randomtrain.  Then evaluate and report again.

A result table like the one below (the numbers are made up) would make a nice summary:

            number      training     % correct
           ham  spam    ham  spam    ham  spam
Test 1     200   200    200   200     80    90
Test 2     200   200     35    50     75    80
Test 3     200   200     40    45     80    75

Sound doable???

David





