mailing lists and hapaxes
Peter Bishop
pgb at adelard.com
Thu Sep 25 22:30:48 CEST 2003
On 25 Sep 2003 at 14:06, David Relson wrote:
> Bogofilter's default YYYYMMDD timestamps take up 4 bytes per token.
> Since a database entry already has the token, its ham and spam counts,
> plus standard DB overhead, these 4 bytes are a minor part of the space
> used. The timestamp is updated whenever a token's ham or spam count is
> updated. The easiest way of keeping track of all tokens is let
> bogofilter register all messages.
I know that, but I don't want the space overhead.
What I was suggesting was a way of "touching" the date value
of the token without changing the counts in the database.
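
A minimal sketch of the "touch" idea in Python. It models a wordlist entry as a (spam, ham, date) record in a plain dict; that layout and the name `touch_tokens` are illustrative assumptions, not bogofilter's storage format or API. Skipping unknown tokens (so that hapaxes are never added) is also an assumption about the intended behaviour.

```python
from datetime import date

def touch_tokens(wordlist, tokens, today=None):
    """Refresh the date of tokens already in the wordlist; counts are untouched.

    wordlist maps token -> (spam_count, ham_count, yyyymmdd).
    This layout is hypothetical. Unknown tokens are NOT added,
    so the wordlist does not grow.
    """
    stamp = int((today or date.today()).strftime("%Y%m%d"))
    touched = 0
    for tok in set(tokens):  # each distinct token is touched once
        entry = wordlist.get(tok)
        if entry is not None:
            spam, ham, _old_stamp = entry
            wordlist[tok] = (spam, ham, stamp)
            touched += 1
    return touched
```

Only the date field changes, so scoring is unaffected while tokens that still turn up in incoming mail stay "fresh".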
> Alternatively, you could create a
> "recent_hits" directory and just have bogofilter register all incoming
> tokens in _that_ wordlist. Then you'd need some sort of script to dump
> the working wordlist and trim according to the content of the
> recent_hits list.
That was basically what I was thinking of for the "hard way".
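
A rough sketch of such a trimming script in Python. It assumes each wordlist has been dumped to text with one "token spam ham yyyymmdd" entry per line; that format is assumed here to match a bogoutil dump and should be checked against real output. The function names and the cutoff parameter are hypothetical.

```python
def parse_dump(lines):
    """Parse 'token spam ham yyyymmdd' lines into a dict (assumed dump format)."""
    entries = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        token, spam, ham, stamp = parts
        entries[token] = (int(spam), int(ham), int(stamp))
    return entries

def trim_wordlist(working_lines, recent_lines, cutoff):
    """Keep working-list tokens last seen on or after cutoff (yyyymmdd).

    A token's "last seen" date is the later of its own date and its
    date in the recent_hits list. Counts come only from the working
    list; tokens that appear only in recent_hits are not added.
    """
    working = parse_dump(working_lines)
    recent = parse_dump(recent_lines)
    kept = []
    for token, (spam, ham, stamp) in sorted(working.items()):
        last_seen = max(stamp, recent.get(token, (0, 0, 0))[2])
        if last_seen >= cutoff:
            kept.append(f"{token} {spam} {ham} {last_seen}")
    return kept
```

The returned lines could then be loaded into a fresh working wordlist, replacing the old one.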
--
Peter Bishop
pgb at adelard.com
pgb at csr.city.ac.uk