message tags

David Relson relson at osagesoftware.com
Sat Aug 28 20:33:06 CEST 2004


Greetings,

It has been suggested that bogofilter should be able to avoid
duplicate registration of a message and avoid unregistering a message
that was never registered.  Given a unique tag, bogofilter could check
the tag before registering or unregistering a message.  To take the idea
a step further, the tag would have a value of 1 for its ham count or its
spam count.  The tag value would also make it easy to fix an incorrect
classification.
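
To make the idea concrete, here is a rough Python sketch of the checks
described above.  Everything in it is hypothetical: TagStore, register()
and unregister() are illustrative names, the dictionary stands in for
the wordlist, and the counts stand in for the real token updates.  It is
not bogofilter code.

```python
# Hypothetical sketch of tag-guarded registration; not bogofilter's
# actual code or wordlist format.
HAM, SPAM = "ham", "spam"


class TagStore:
    def __init__(self):
        self.tags = {}                   # tag -> HAM or SPAM (the "count of 1")
        self.counts = {HAM: 0, SPAM: 0}  # stand-in for real token counts

    def register(self, tag, kind):
        """Register a message; return False if it is a duplicate."""
        current = self.tags.get(tag)
        if current == kind:
            return False                 # already registered this way
        if current is not None:
            # Registered under the other class: undo it first, which
            # is how the tag helps fix an incorrect classification.
            self.counts[current] -= 1
        self.tags[tag] = kind
        self.counts[kind] += 1
        return True

    def unregister(self, tag, kind):
        """Unregister a message; return False if it was never registered."""
        if self.tags.get(tag) != kind:
            return False
        del self.tags[tag]
        self.counts[kind] -= 1
        return True
```

With this shape, registering the same tag twice as spam changes the
counts only once, and registering it afterwards as ham moves the count
from spam to ham in a single step.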

All this is well and good, but what part of the message can be used for
the tag (or to generate a tag)?  Bogofilter already recognizes the
Message-ID, Queue-ID, and IP address for a message.  However, none of
these is unique (see note 1), and uniqueness is the characteristic a
tag needs.

One approach might be to generate an md5sum (or other checksum)
for the header.  To ensure that bogofilter generates the same checksum
each time it sees the message, it'll be necessary to trim lines like
the envelope, X-Bogosity:, etc.
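
A minimal Python sketch of the header checksum might look like the
following.  The function name header_tag() and the VOLATILE_PREFIXES
list are assumptions made for illustration; which lines really need to
be trimmed is exactly the open question.  Continuation lines of a
trimmed header are dropped along with it.

```python
import hashlib

# Assumed set of lines that can differ between two copies of the same
# message: the mbox envelope ("From " with a space, not "From:") and
# the X-Bogosity: line.  A real list would likely be longer.
VOLATILE_PREFIXES = (b"from ", b"x-bogosity:")


def header_tag(message: bytes) -> str:
    """Return an MD5 hex digest of the header minus volatile lines."""
    # The header ends at the first blank line.
    header = message.replace(b"\r\n", b"\n").split(b"\n\n", 1)[0]
    kept = []
    skipping = False
    for line in header.split(b"\n"):
        if line[:1] in (b" ", b"\t"):
            # A folded (continuation) line shares the fate of its header.
            if not skipping:
                kept.append(line)
            continue
        skipping = line.lower().startswith(VOLATILE_PREFIXES)
        if not skipping:
            kept.append(line)
    return hashlib.md5(b"\n".join(kept)).hexdigest()
```

Two copies of a message that differ only in the envelope line, the
X-Bogosity: line, or the body would get the same tag.  Anything else
that changes in transit (Received: lines, for instance) would still
change the tag unless it is added to the list.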

Do y'all think a message tag is of value?  How should it be generated?

To minimize the impact of message tags (see note 2), using them will be
an option (which defaults to "off").  

Regards,

David

Note 1:  As an aside, checking my August spam, I see Message-IDs that
occur as often as 8 times.  For one of those, postfix generates 2
different SMTP IDs (each being used for 4 messages).

Note 2: Computing message tags will require time and storing them will
take wordlist space.  A way to trim old tags may also be needed.


