token age format

Matthias Andree matthias.andree at gmx.de
Mon Dec 16 03:14:44 CET 2002


David Relson <relson at osagesoftware.com> writes:

> I've been thinking about dates and formats.  There are at least three
> different things going on.
>
> 1 - internal format - how dates are stored in wordlists
> 2 - external format - how dates are dumped/loaded by bogoutil
> 3 - ages - how the user says "discard tokens older than X"
>
> While time_t may be good for #1, it's not good for #2 or #3.
>
> For #2, something human readable like yyyymmdd is more useful than the
> time_t equivalent.  For today, the two values would be 20021215 and
> 1040001069.
>
> For #3, using "days" as the unit of measurement is good.  "Discard
> tokens older than 100 days" is easier than its time_t equivalent
> "discard tokens older than 8,640,000 seconds".
>
> "A Plan for Spam" was published in August and /.'ed on August 16.  ESR
> started bogofilter around then (0.2 is dated Aug 22 and its oldest file
> is dated Aug 18 05:51).  Bogofilter could assign an August 2002 date to
> tokens without dates and wouldn't be too far off.

#2 and #3 are user interface issues. The database has already suffered
from endianness problems; let's not make it suffer again, this time from
choosing the wrong internal representation. Let's also not make our lives
harder than need be: use existing tools. And please, let's not store
time_t directly as binary, but convert it to a string, in hex (slightly
faster) or decimal representation. We must also be prepared for systems
switching to a 64-bit time_t; we're going to lose big time when that
happens and we read a 32-bit value into the wrong half of the time_t.
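
A rough sketch of what I mean by storing the stamp as text, so that
neither byte order nor the width of time_t leaks into the wordlist. The
function names are made up for illustration, not existing bogofilter
code, and this assumes time_t is an integer type, as it is on POSIX
systems:

```c
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: write a timestamp as decimal text.
 * Returns 0 on success, -1 if the buffer is too small. */
int stamp_to_text(time_t t, char *buf, size_t len)
{
    /* Widen to intmax_t so the same code works for 32- and 64-bit time_t. */
    int n = snprintf(buf, len, "%jd", (intmax_t)t);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: parse decimal text back into a time_t.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
int text_to_stamp(const char *s, time_t *out)
{
    char *end;
    intmax_t v;
    time_t t;

    errno = 0;
    v = strtoimax(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1;

    /* Reject values that do not fit this platform's time_t. */
    t = (time_t)v;
    if ((intmax_t)t != v)
        return -1;

    *out = t;
    return 0;
}
```

The text form reads the same on big- and little-endian machines, and a
file written with a 32-bit time_t still parses after a switch to 64 bits.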

Printing the date to the user is a matter of strftime or something
similar. An age given on the command line can simply be a count of
days: turn it into a reference time_t with "time(NULL) - 86400 *
age_in_days" and compare each token's timestamp against it.
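
Roughly, the age check and the display conversion could look like the
following. This is only a sketch: the helper names are invented for
illustration, and it formats in UTC via gmtime, which is my assumption
(localtime would work the same way if local dates are wanted):

```c
#include <stddef.h>
#include <time.h>

#define SECONDS_PER_DAY 86400

/* Hypothetical helper: oldest timestamp that is still kept.
 * Pass time(NULL) as "now" in real use. */
time_t age_cutoff(time_t now, long age_in_days)
{
    /* Multiply in time_t so large day counts do not overflow a long. */
    return now - (time_t)age_in_days * SECONDS_PER_DAY;
}

/* Hypothetical helper: nonzero if the token is older than the cutoff. */
int token_is_expired(time_t token_stamp, time_t cutoff)
{
    return token_stamp < cutoff;
}

/* Hypothetical helper: format a stamp as yyyymmdd (UTC) for display.
 * Returns 0 on success, -1 on failure. */
int stamp_to_yyyymmdd(time_t t, char *buf, size_t len)
{
    struct tm *tm = gmtime(&t);
    if (tm == NULL)
        return -1;
    return strftime(buf, len, "%Y%m%d", tm) == 0 ? -1 : 0;
}
```

The user only ever sees days and yyyymmdd; the wordlist only ever sees the
time_t value.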

There is no need to put a human-readable format into the database that
requires us to write our own set of tools when time_t tools are available
in every libc.

-- 
Matthias Andree



