mailing lists and hapaxes

Peter Bishop pgb at adelard.com
Thu Sep 25 22:30:48 CEST 2003


On 25 Sep 2003 at 14:06, David Relson wrote:

> Bogofilter's default YYYYMMDD timestamps take up 4 bytes per token. 
> Since a database entry already has the token, its ham and spam counts,
> plus standard DB overhead, these 4 bytes are a minor part of the space
> used.  The timestamp is updated whenever a token's ham or spam count is
> updated.  The easiest way of keeping track of all tokens is to let
> bogofilter register all messages.

I know that, but I don't want the space overhead.

What I was suggesting was a way of "touching" the date value
of the token without changing the counts in the database.

> Alternatively, you could create a
> "recent_hits" directory and just have bogofilter register all incoming
> tokens in _that_ wordlist.  Then you'd need some sort of script to dump
> the working wordlist and trim according to the content of the
> recent_hits list. 

That was basically what I was thinking for the "hard way".
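
A rough sketch of that "hard way", to make the idea concrete. It assumes
both wordlists have already been dumped to text, one entry per line in the
form "token spam ham yyyymmdd"; that line format, the function names and
the cutoff parameter are assumptions for illustration only, not anything
bogofilter provides. Tokens seen in the recent_hits list get their date
"touched" with the counts left alone, and anything older than the cutoff
is dropped.

```python
from typing import Dict, Iterable, List, Tuple

# (spam_count, ham_count, yyyymmdd) -- layout assumed for illustration
Entry = Tuple[int, int, int]


def parse_dump(lines: Iterable[str]) -> Dict[str, Entry]:
    """Parse 'token spam ham yyyymmdd' lines (assumed dump format)."""
    entries: Dict[str, Entry] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        token, spam, ham, date = fields
        entries[token] = (int(spam), int(ham), int(date))
    return entries


def touch_and_trim(working: Dict[str, Entry],
                   recent: Dict[str, Entry],
                   cutoff: int) -> List[str]:
    """Return trimmed dump lines for the working wordlist.

    A token found in `recent` has its date moved forward to the date
    recorded there; its spam and ham counts are never changed. Tokens
    whose (possibly touched) date is older than `cutoff` are dropped.
    Tokens that appear only in `recent` are not added.
    """
    kept: List[str] = []
    for token in sorted(working):
        spam, ham, date = working[token]
        if token in recent:
            date = max(date, recent[token][2])  # "touch" the date only
        if date >= cutoff:
            kept.append(f"{token} {spam} {ham} {date}")
    return kept
```

The returned lines would then have to be loaded back into a fresh working
wordlist by whatever means is used to restore a dump; that step is left
out here.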
-- 
Peter Bishop 
pgb at adelard.com
pgb at csr.city.ac.uk





