breaking the training db

Greg Louis glouis at dynamicro.on.ca
Sun Sep 21 14:42:24 CEST 2003


Some of the changes we make require that users rebuild their training
databases to update token counts in the light of the new parsing.  It
looks as though 0.15.4 is one such; I got 29 false positives (fp), out
of about 950 nonspam, within 20 hours of installing it on my personal
mail server.  I haven't analysed the fp at all, but I'm pretty sure the
change in header tagging is part of the cause; these are the first fp
I've seen in my personal mbox for many (at least 6) weeks.  After
registering them I reclassified them, and every one had a score less
than DBL_EPSILON.
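
For anyone who wants to put numbers on this, here is a minimal sketch in C
of the two quantities mentioned above: the fp rate and the "score less than
DBL_EPSILON" check.  The helper names are hypothetical and are not taken
from the bogofilter source; the sketch also assumes a score near 0 means
nonspam, and it uses the approximate counts quoted above.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper (not bogofilter code): true when a score is
 * smaller than DBL_EPSILON, i.e. effectively zero for a double. */
static int score_is_negligible(double score)
{
    return score < DBL_EPSILON;
}

/* Hypothetical helper: fraction of nonspam messages that were
 * misclassified as spam (false positives). */
static double fp_rate(unsigned fp, unsigned nonspam)
{
    assert(nonspam > 0);            /* avoid dividing by zero */
    return (double)fp / (double)nonspam;
}
```

With the figures above, fp_rate(29, 950) comes to roughly 0.03, i.e. about
3% of nonspam misclassified, which is a lot for a filter that had shown
none for weeks.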
 
It makes sense that this should happen, and I expect that, in the
present case, the effect will be transient as people train on the new
errors; but I feel sorry for our -u'sers.

Perhaps it would be helpful, especially to the users whose experience
is limited, to issue a very explicit "needs retraining" notice when
changes that impact db counts are included in a bogofilter release.

-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |



