Method of training

jxz jxz at uol.com.br
Thu Sep 4 20:53:51 CEST 2003


Hello!

I was reading some old messages from the list, and one message from Greg
Louis made me think about switching my method of training.

I would let bogofilter score the incoming messages. Procmail would add
an X-Week header holding the week of the year (from `date +%U`). On
Sunday, when the week number changes, I would grep the mboxes with
mboxgrep, extract the previous week's messages that have, e.g.,
"X-Week: 35" and "X-Bogosity: Unsure", plus the fp and fn, and save
them in a temporary mbox.
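To make the weekly selection step concrete, here is a rough sketch in
Python instead of mboxgrep. It is only an illustration: the function
names are made up, it assumes the X-Week header holds the bare `%U`
value, and it assumes X-Bogosity starts with the word "Unsure". It only
picks the unsures; the fp and fn would still have to be collected by
hand.

```python
import datetime
import mailbox
from email.message import Message


def week_number(day: datetime.date) -> str:
    """Week of the year with Sunday as the first day, like `date +%U`."""
    return day.strftime("%U")


def wanted(msg: Message, week: str) -> bool:
    """True if the message has the given X-Week value and was scored Unsure."""
    if (msg.get("X-Week") or "").strip() != week:
        return False
    bogosity = (msg.get("X-Bogosity") or "").strip()
    return bogosity.lower().startswith("unsure")


def extract_week(src_path: str, dst_path: str, week: str) -> int:
    """Copy the wanted messages from one mbox to another; return the count."""
    src = mailbox.mbox(src_path)
    dst = mailbox.mbox(dst_path)
    copied = 0
    try:
        for msg in src:
            if wanted(msg, week):
                dst.add(msg)
                copied += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return copied
```

For example, `extract_week("inbox", "unsure.35", "35")` would copy last
week's unsures from a (hypothetical) "inbox" mbox into "unsure.35".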

Afterwards, I would classify the messages in this temp mbox manually,
sort them into train.ham.35 and train.spam.35 mboxes, and train
bogofilter on those.
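The training step could then be scripted roughly like this. Again just
a sketch: it uses bogofilter's -n (register as ham) and -s (register as
spam) options and assumes the installed version accepts a whole mbox on
stdin when registering, which should be checked against the man page.
The function names are made up.

```python
import subprocess


def training_plan(week):
    """Pair each bogofilter registration command with its weekly mbox."""
    return [
        (["bogofilter", "-n"], f"train.ham.{week}"),   # -n: register as ham
        (["bogofilter", "-s"], f"train.spam.{week}"),  # -s: register as spam
    ]


def train(week):
    """Feed each weekly mbox to bogofilter on stdin; stop on the first failure."""
    for argv, path in training_plan(week):
        with open(path, "rb") as mbox:
            subprocess.run(argv, stdin=mbox, check=True)
```

Calling `train("35")` would register train.ham.35 and train.spam.35
from the current directory.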

With this method, I would train bogofilter only on its errors, and
there would be no need to keep the whole pile of spam, only the unsures,
in case of db corruption or db schema updates.

Now I ask: does the train-on-error method work well? Or do I need to
receive hundreds of thousands of trillions of billions of emails before
it begins to be accurate? :)

What do you think of this method? What method do you currently use, and
are you satisfied with its accuracy?

For some time I have been testing bogofilter with full training, but it
gives many fn (~87% accuracy), with a spam corpus of ~2400 messages and
a ham corpus of ~6000.

TIA


-- 
jxz at uol.com.br
http://jxz.dontexist.org/

