message tags

Chris Wilkes cwilkes-bf at ladro.com
Sun Aug 29 04:16:12 CEST 2004


On Sat, Aug 28, 2004 at 02:33:06PM -0400, David Relson wrote:
> 
> It has been suggested that bogofilter should be able to avoid
> duplicate registration of a message and avoid unregistering a message
> that was never registered.  Given a unique tag, bogofilter could check
> the tag before registering or unregistering a message.  To take the idea
> a step further, the tag would have a value of 1 for its ham count or its
> spam count.  The tag value would also make it easy to fix an incorrect
> classification.

How about comparing the timestamps of all the tokens in that email?  If
any of the tokens is missing from the wordlist then one can assume that
the email hasn't been registered yet (that is, unless the wordlist has
been trimmed).

If all the words are in there, then a date cutoff value can be used --
if there's a token with a timestamp older than that then assume the
email hasn't been registered.
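
A rough sketch of that check in Python, to make the two rules concrete.
Everything here is an assumption for illustration: the wordlist is a
plain dict of token -> timestamp rather than bogofilter's real database,
timestamps are comparable integers (written as YYYYMMDD below), and
looks_registered is a made-up name, not a bogofilter function.

```python
from typing import Iterable, Mapping


def looks_registered(tokens: Iterable[str],
                     wordlist: Mapping[str, int],
                     cutoff: int) -> bool:
    """Heuristic guess at whether a message was already registered.

    wordlist maps token -> last-update timestamp (hypothetical layout).
    cutoff is the earliest date the message could have been registered.
    """
    for token in tokens:
        stamp = wordlist.get(token)
        if stamp is None:
            # Rule 1: a missing token means the message was never registered
            # (unless the wordlist has been trimmed since).
            return False
        if stamp < cutoff:
            # Rule 2: a token last touched before the cutoff cannot have
            # been updated by registering this message.
            return False
    return True
```

A return value of True only means "nothing rules it out"; another
message with the same tokens would produce the same timestamps.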

One could get trickier and look for the last Received timestamp in the
email and use that as the cutoff date (make sure you use the Received
header that was added last, i.e. the topmost one, as that's the only one
you can trust, since your own server wrote it).
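
Pulling that cutoff out of a message might look like the sketch below.
It uses Python's standard email module. The function name
received_cutoff and the YYYYMMDD integer result are assumptions made for
illustration, and the date is taken in the time zone written in the
header, which may not match how the wordlist timestamps were recorded.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def received_cutoff(raw_message: str) -> int:
    """Return the date of the topmost Received header as a YYYYMMDD int."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    if not received:
        raise ValueError("message has no Received header")
    # Each hop prepends its header, so index 0 is the one added last,
    # normally by the local mail server.
    _, _, date_part = received[0].rpartition(";")
    when = parsedate_to_datetime(date_part.strip())
    return int(when.strftime("%Y%m%d"))
```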

For a test of this you could run all your email through and print out
the oldest timestamp value of one of its tokens or a note that a token
is missing.
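
Such a test could be as small as the following sketch. Again the
wordlist is assumed to be a dict of token -> integer timestamp, and
oldest_timestamp_report is a hypothetical helper; producing the token
list with bogofilter's own tokenizer is left out.

```python
from typing import Iterable, Mapping


def oldest_timestamp_report(tokens: Iterable[str],
                            wordlist: Mapping[str, int]) -> str:
    """Describe the oldest token timestamp, or name the first missing token."""
    oldest = None
    for token in tokens:
        stamp = wordlist.get(token)
        if stamp is None:
            # Stop at the first token the wordlist has never seen.
            return f"missing token: {token}"
        if oldest is None or stamp < oldest:
            oldest = stamp
    if oldest is None:
        return "no tokens"
    return f"oldest timestamp: {oldest}"
```

Running this over a whole mailbox and eyeballing the output should show
how well the oldest timestamps separate registered mail from the rest.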

Of course, this wouldn't work if the tokenizer has been changed or if
someone often trims the singletons from their wordlist.

Chris


