testing with public-corpus
David Relson
relson at osagesoftware.com
Sat Oct 12 15:52:04 CEST 2002
Hello,
I've had the message bodies from the public-corpus sitting on my hard drive
(gathering dust) and have been wondering what to do with them. I'd been
thinking of testing all the messages with _my_ wordlists. That might be
interesting, but it didn't seem very useful, and it's not something any of
you could reproduce, since each of you has his own lists.
Last night I had an idea :-)
Split each of the three groups (easy_ham, hard_ham, and spam) in
half. Use three of the halves to build wordlists and then test bogofilter
using the other three halves. The first test sequence will just do
classification, i.e. "bogofilter -p" or something similar. The second test
sequence will do classification and updating, i.e. "bogofilter -p -u".
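
For anyone who wants to reproduce the split, here is a minimal sketch in
Python. It is only an illustration, not the script actually used: the function
names, the fixed seed, and the assumption that each group lives in its own
directory of one-message-per-file are all hypothetical. It covers the split
only and does not run bogofilter.

```python
import random
from pathlib import Path


def split_in_half(items, seed=0):
    """Return (train, test) halves of items, shuffled with a fixed seed.

    Sorting first makes the result independent of the input order, so
    two people with the same files and seed get the same halves.
    """
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def split_corpus(root, groups=("easy_ham", "hard_ham", "spam"), seed=0):
    """Map each group name to its (train, test) lists of message paths.

    Assumes (hypothetically) that root contains one directory per group,
    each holding one file per message.
    """
    root = Path(root)
    result = {}
    for group in groups:
        files = [p for p in (root / group).iterdir() if p.is_file()]
        result[group] = split_in_half(files, seed)
    return result
```

The "train" halves would be used to build the wordlists and the "test" halves
fed to bogofilter. Fixing the seed is what makes the split reproducible by
someone else.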
I'll report on the results when I have them. FWIW, it takes a while to
process 175 easy_ham, 175 hard_ham, and 250 spam messages - especially when
I'm testing several versions of the spamicity computation for each message.
David
For summary digest subscription: bogofilter-digest-subscribe at aotto.com