Cats and dogs

Greg Louis glouis at dynamicro.on.ca
Fri Jul 4 22:39:07 CEST 2003


On 20030704 (Fri) at 2022:26 +0100, Peter Bishop wrote:

> The spamlist is getting pretty big now (>2000 messages), so
> rather than cutting off the spam feed completely, I could:
> 
> 1) first classify the spamtrap email using bogofilter
> 2) if the test results in No or Unsure, add the message 
>     to the spamlist database.

That is _exactly_ how I've been updating my spamlist.db, and it works
excellently for me.  Periodically I also take the accumulated nonspam
messages and do the same with those.  The only difference is that I
waited until I had 10,000 spams and about 8,000 nonspams before
switching from full training (registering every message) to registering
only messages that come out misclassified or Unsure.  2,000 is a bit
small, but the only disadvantage is that it will take longer to get
bogofilter trained to accuracies like <1% of spam and <0.01% of nonspam
misclassified.
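The update scheme above (classify each spamtrap message, and register it only when the verdict is No or Unsure) is a form of train-on-error. What follows is a minimal Python sketch of that loop, not bogofilter's code. `ToyWordlist`, `Verdict`, `train_on_error` and the 0.9/0.1 cutoffs are hypothetical. The toy score is a plain average of per-token ratios, not bogofilter's actual scoring. The same function also covers the periodic nonspam pass when called with `is_spam=False`.

```python
"""Toy train-on-error sketch (hypothetical names; not bogofilter's code)."""
import re
from collections import Counter
from enum import Enum


class Verdict(Enum):
    SPAM = "Spam"
    HAM = "No"          # the email calls a nonspam verdict "No"
    UNSURE = "Unsure"


class ToyWordlist:
    """Hypothetical in-memory stand-in for spamlist.db / goodlist.db."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.good_tokens = Counter()  # token -> number of nonspam messages containing it
        self.spam_msgs = 0
        self.good_msgs = 0

    @staticmethod
    def tokens(msg):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z0-9$']+", msg.lower()))

    def register_spam(self, msg):
        self.spam_tokens.update(self.tokens(msg))
        self.spam_msgs += 1

    def register_good(self, msg):
        self.good_tokens.update(self.tokens(msg))
        self.good_msgs += 1

    def classify(self, msg, spam_cutoff=0.9, ham_cutoff=0.1):
        # Toy score: mean of s / (s + g) over known tokens, where s and g are
        # the fractions of spam and nonspam messages containing the token.
        probs = []
        for tok in self.tokens(msg):
            s = self.spam_tokens[tok] / max(self.spam_msgs, 1)
            g = self.good_tokens[tok] / max(self.good_msgs, 1)
            if s + g > 0:
                probs.append(s / (s + g))
        if not probs:
            return Verdict.UNSURE  # no known tokens: cannot decide
        score = sum(probs) / len(probs)
        if score >= spam_cutoff:
            return Verdict.SPAM
        if score <= ham_cutoff:
            return Verdict.HAM
        return Verdict.UNSURE


def train_on_error(wordlist, messages, is_spam):
    """Register only the messages the current wordlist does not already
    classify correctly; return how many were registered."""
    wanted = Verdict.SPAM if is_spam else Verdict.HAM
    register = wordlist.register_spam if is_spam else wordlist.register_good
    registered = 0
    for msg in messages:
        # For spamtrap mail this registers anything that comes out No or Unsure.
        if wordlist.classify(msg) is not wanted:
            register(msg)
            registered += 1
    return registered


if __name__ == "__main__":
    wl = ToyWordlist()
    wl.register_spam("cheap pills buy now")
    wl.register_good("meeting agenda for monday")
    trap = ["cheap pills buy now", "lottery winner claim prize", "cheap lottery pills"]
    # Only the message with no known tokens (Unsure) gets registered.
    print(train_on_error(wl, trap, is_spam=True))  # prints 1
```

Registration happens inside the loop, so each newly learned message already influences how the next one is classified. That is also why the wordlist grows much more slowly than it would under full training.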

> In case you are wondering why I worry about database size,
> my bogofilter runs on an outsourced mail server supplied by our ISP
> and there is a hard limit on available disk storage.

It also helps a lot to dump the databases and reload them periodically;
that keeps the .db file sizes down, since loading the tokens in sorted
order builds a more compact .db file.

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |
