Honeytraps and garbage removal
David Relson
relson at osagesoftware.com
Tue Apr 15 22:35:47 CEST 2003
At 03:53 PM 4/15/03, Herman Oosthuysen wrote:
>Maybe, what is needed is a 'forget' feature to delete tokens that have
>become unused and aged beyond a preset time in order to keep the database
>current.
The maintenance capabilities in bogoutil allow discarding of old tokens,
tokens with low counts, etc. Peter is using his honeytrap to update
spamlist.db and only updating goodlist.db when a user sends him a false
negative. This means that tokens' timestamps will typically not get
updated even though the tokens are being used to score incoming
messages. Since the tokens are not being updated, their timestamps will
get older and older.
He wants to keep a separate list of recently used tokens so that he knows
which ones are still needed.
Having written the above, I thought of a way the task can be accomplished
without requiring any coding changes in bogofilter.
Bogolexer can be used to generate a list of tokens for a message (or a
mailbox). The token list could be appended to a file for each day. At the
end of the day, the list could be loaded into a 'status' wordlist. The
info in the 'status' wordlist would have a count of how many times (days)
the token has been added to it and would have a timestamp for the last
day. Periodically the spamlist, goodlist and statuslist could be dumped
and the words from the statuslist could be used to select current words
from the other two lists. Finally, the current wordlists could be used to
build new spamlist and goodlist files.
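
As a rough illustration of the selection step, here is a small Python
sketch. It assumes each dumped line is whitespace-separated with the token
in the first field; the function name select_current and the sample lines
are made up for the example and are not part of bogofilter or bogoutil.

```python
def select_current(dump_lines, status_tokens):
    """Keep only the dump lines whose token is in the status list.

    Assumption: the token is the first whitespace-separated field of
    each dumped line; the remaining fields are passed through as-is.
    """
    kept = []
    for line in dump_lines:
        fields = line.split()
        if fields and fields[0] in status_tokens:
            kept.append(line)
    return kept


# Hypothetical sample data standing in for the dumped wordlists.
status = {"viagra", "meeting"}
spam_dump = ["viagra 42 20030415", "oldword 3 20021101"]
good_dump = ["meeting 17 20030414", "stale 1 20020901"]

print(select_current(spam_dump, status))
print(select_current(good_dump, status))
```

The kept lines would then be what gets loaded into the new spamlist and
goodlist files.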
The generation/updating of the status list could be done several ways. A
copy of each day's mail could be saved and processed using "bogolexer -p"
at the end of the day. Alternatively, "bogolexer -p" could be run for each
incoming message and a file of parsed tokens could be accumulated during
the day. Either of these outputs could be piped to "bogoutil -l
statuslist.db".
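
To make the bookkeeping concrete, here is a minimal Python sketch of the
per-day update. It is only a model of the idea: a plain dict stands in for
the 'status' wordlist, the function name update_status is invented for the
example, and the token lists are assumed to have already been produced by
something like "bogolexer -p".

```python
def update_status(status, day_tokens, day):
    """Record one day's tokens in the status table.

    status maps token -> (days_seen, last_day). Each token is counted
    at most once per day, however often it appeared in that day's mail.
    """
    for token in set(day_tokens):
        days_seen, _last_day = status.get(token, (0, None))
        status[token] = (days_seen + 1, day)
    return status


# Hypothetical token lists for two consecutive days.
status = {}
update_status(status, ["offer", "meeting", "offer"], "20030414")
update_status(status, ["offer"], "20030415")

for token in sorted(status):
    print(token, status[token])
```

Tokens whose last day is older than some cutoff would then be the
candidates for removal from the spamlist and goodlist.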
Hope these ideas spark the desired capability :-)
David