Nigerian spam [was: multiple types of spam]
David Relson
relson at osagesoftware.com
Fri Jul 4 01:10:38 CEST 2003
At 06:36 PM 7/3/03, Jef Poskanzer wrote:
> >It can be difficult. If you want, I can send you a tarball with 60 or so
> >419 messages (before removal of duplicates). Don't know if it'll be useful
> >or not.
>
>If we're collecting a training resource, I've got over 400!
Jef,
If you're up for an experiment:
Take your 400 Nigerian messages and split them into two groups. Train on
group 1 as spam.
Take an equal number of ham messages and split them. Train on group 1 as ham.
Then evaluate the remaining ham and spam and let us know how well
bogofilter does in scoring them.
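The split-and-score part of the first experiment can be sketched in Python. This is a hypothetical helper, not part of bogofilter: `split_in_two` and `percent_correct` are illustrative names, and `classify` stands for whatever wrapper you put around bogofilter to turn its verdict into the string "spam" or "ham".

```python
import random
from typing import Callable, List, Sequence, Tuple


def split_in_two(messages: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle a copy of the messages and cut it into group 1 and group 2."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def percent_correct(classify: Callable[[str], str],
                    messages: Sequence[str],
                    expected: str) -> float:
    """Percentage of messages whose label from classify() equals the expected label."""
    if not messages:
        return 0.0
    hits = sum(1 for m in messages if classify(m) == expected)
    return 100.0 * hits / len(messages)
```

Shuffling before the cut keeps the two groups from differing just because of when the messages arrived. After training on group 1, `percent_correct(classify, spam_group2, "spam")` and the matching call for ham give the two "% correct" figures for one row of the table.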
As a second experiment, use pi's train-on-error script for the initial
training. Then do the above evaluate/report.
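Train-on-error registers a message only when the filter currently gets it wrong, so the number of messages actually trained on can end up well below the 200 available. A minimal sketch of that loop, assuming hypothetical `classify` and `train` callables that wrap bogofilter (this is not pi's script):

```python
from typing import Callable, Iterable, Tuple


def train_on_error(examples: Iterable[Tuple[str, str]],
                   classify: Callable[[str], str],
                   train: Callable[[str, str], None]) -> int:
    """Register a message only if the current filter mislabels it; return how many were registered."""
    trained = 0
    for message, label in examples:
        if classify(message) != label:  # filter got it wrong
            train(message, label)        # so teach it the correct label
            trained += 1
    return trained
```

Because each decision depends on what has already been registered, the order in which group 1's messages are presented affects the result.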
As a third experiment, use contrib/randomtrain. Then do the evaluate/report.
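If randomtrain works the way its name suggests, it presents ham and spam interleaved in random order rather than one corpus after the other. That behavior is an assumption here; check the script itself. The ordering step alone might look like this (`random_training_order` is a hypothetical helper name):

```python
import random
from typing import List, Sequence, Tuple


def random_training_order(ham: Sequence[str],
                          spam: Sequence[str],
                          seed: int = 0) -> List[Tuple[str, str]]:
    """Label every message, then shuffle ham and spam together into one training sequence."""
    labeled = [(m, "ham") for m in ham] + [(m, "spam") for m in spam]
    random.Random(seed).shuffle(labeled)  # assumed randomtrain-style interleaving
    return labeled
```

The resulting sequence would then go through an error-driven training pass as in the second experiment; a different seed gives the filter a different order and can leave it with a different word list.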
A result table like the one below would make a nice summary:
            number        training      % correct
          ham   spam     ham   spam     ham   spam
Test 1    200   200      200   200       80     90
Test 2    200   200       35    50       75     80
Test 3    200   200       40    45       80     75
Sound doable???
David