Cats and dogs
Greg Louis
glouis at dynamicro.on.ca
Fri Jul 4 22:39:07 CEST 2003
On 20030704 (Fri) at 2022:26 +0100, Peter Bishop wrote:
> The spamlist is getting pretty big now (>2000 messages), so
> rather than cutting off the spam feed completely, I could:
>
> 1) first classify the spamtrap email using bogofilter
> 2) if the test results in No or Unsure, add the message
> to the spamlist database.
That is _exactly_ how I've been updating my spamlist.db, and it works
very well for me. Periodically I take the accumulated nonspam messages
and do the same with those: classify each one and register only the
ones bogofilter gets wrong or is unsure about. The only difference is
that I waited until I had 10,000 spams and about 8,000 nonspams before
switching from full training. 2,000 is a bit small, but the only
disadvantage is that it will take longer to get bogofilter trained to
accuracies like <1% of spam and <0.01% of nonspam misclassified.
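
The classify-then-register procedure described above could be sketched
roughly as follows. This is only an illustration, not the script
actually used here: the function names are made up, and the two
bogofilter wrappers assume a bogofilter version that reads one message
on stdin, exits with 0 for spam, 1 for nonspam and 2 for unsure, and
registers with -s (spam) or -n (nonspam).

```python
import subprocess

# Assumed mapping from bogofilter's exit code to its verdict.
_VERDICTS = {0: "Yes", 1: "No", 2: "Unsure"}


def bogofilter_classify(raw):
    """Classify one raw message (bytes); return "Yes", "No" or "Unsure"."""
    code = subprocess.run(["bogofilter"], input=raw).returncode
    if code not in _VERDICTS:
        raise RuntimeError("bogofilter failed with exit code %d" % code)
    return _VERDICTS[code]


def bogofilter_register(raw, is_spam):
    """Register one raw message as spam (-s) or nonspam (-n)."""
    flag = "-s" if is_spam else "-n"
    subprocess.run(["bogofilter", flag], input=raw, check=True)


def train_on_error(messages, classify, register, is_spam):
    """Register only the messages that are misclassified or unsure.

    messages: iterable of raw messages known to be all spam or all nonspam.
    classify: callable returning "Yes", "No" or "Unsure" for a message.
    register: callable(message, is_spam) that adds it to the word lists.
    Returns the list of messages that were registered.
    """
    wanted = "Yes" if is_spam else "No"
    registered = []
    for msg in messages:
        # Classify just before deciding, so that earlier registrations
        # in this same run already influence the verdict.
        if classify(msg) != wanted:
            register(msg, is_spam)
            registered.append(msg)
    return registered
```

For the spamtrap feed one would call something like
train_on_error(spamtrap_messages, bogofilter_classify,
bogofilter_register, is_spam=True), and later the same with the
accumulated nonspam and is_spam=False. Correctly classified messages
are never added, which is what keeps the database from growing with
every message received.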
> In case you are wondering why I worry about database size,
> my bogofilter runs on an outsourced mail server supplied by our ISP
> and there is a hard limit on available disk storage.
It helps a lot to dump and reload the databases periodically. That
keeps the .db file sizes down, since loading the entries in key order
builds a more compact .db file.
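
A minimal sketch of such a dump and reload, assuming bogoutil's -d
(dump a database to stdout) and -l (load a database from stdin)
options. The function name and the .dump/.new temporary file names are
made up for the example, and mail delivery should not be touching the
database while this runs.

```python
import os
import subprocess


def compact_wordlist(db_path, run=subprocess.run):
    """Dump db_path to text, reload it into a fresh file, swap it in.

    run is injectable so the steps can be exercised without bogoutil.
    """
    dump_path = db_path + ".dump"   # hypothetical temporary names
    new_path = db_path + ".new"

    # Step 1: dump the existing database as text.
    with open(dump_path, "wb") as out:
        run(["bogoutil", "-d", db_path], stdout=out, check=True)

    # Step 2: load the dump into a brand-new database file.
    with open(dump_path, "rb") as src:
        run(["bogoutil", "-l", new_path], stdin=src, check=True)

    # Step 3: replace the old database with the rebuilt one.
    os.replace(new_path, db_path)
    os.remove(dump_path)
```

Calling compact_wordlist("spamlist.db") and then the same for the
nonspam list would rebuild both files. Keeping a copy of the old .db
until the new one has been checked is probably wise.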
--
| G r e g L o u i s | gpg public key: finger |
| http://www.bgl.nu/~glouis | glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |