Honeytraps and garbage removal

David Relson relson at osagesoftware.com
Tue Apr 15 22:35:47 CEST 2003


At 03:53 PM 4/15/03, Herman Oosthuysen wrote:

>Maybe what is needed is a 'forget' feature to delete tokens that have 
>become unused and aged beyond a preset time in order to keep the database 
>current.


The maintenance capabilities in bogoutil allow discarding of old tokens, 
tokens with low counts, etc.  Peter is using his honeytrap to update 
spamlist.db and only updating goodlist.db when a user sends him a false 
negative.  This means that the tokens' timestamps will typically not get 
updated even though the tokens are being used to score incoming 
messages.  Since the tokens are not being updated, their timestamps will 
get older and older, and an age-based cleanup would eventually discard 
tokens that are still in use.

He wants to keep a separate list of recently used tokens so that he knows 
which ones are still needed.

Having written the above, I thought of a way the task can be accomplished 
without requiring any coding changes in bogofilter.

Bogolexer can be used to generate a list of tokens for a message (or a 
mailbox).  The token list could be appended to a file for each day.  At the 
end of the day, the list could be loaded into a 'status' wordlist.  The 
'status' wordlist would hold, for each token, a count of how many times 
(days) the token has been added to it and a timestamp for the last such 
day.  Periodically the spamlist, goodlist and statuslist could be dumped 
and the words from the statuslist could be used to select current words 
from the other two lists.  Finally, the current wordlists could be used to 
build new spamlist and goodlist files.
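
The bookkeeping described above can be sketched in Python.  This is only 
an illustration under stated assumptions: the wordlists are modeled as 
in-memory dicts rather than real .db files, and the names update_status 
and select_current are made up for this sketch and are not part of 
bogofilter.

```python
from datetime import date

def update_status(status, day_tokens, day):
    """Record that each token in day_tokens was seen on `day`.

    status maps token -> (days_seen, last_seen_date).  This is a
    hypothetical in-memory stand-in for the 'status' wordlist.
    """
    for token in set(day_tokens):  # count a token at most once per call
        days_seen, _ = status.get(token, (0, None))
        status[token] = (days_seen + 1, day)
    return status

def select_current(wordlist, status, cutoff):
    """Keep only wordlist entries whose token was last seen on or after cutoff."""
    return {
        token: count
        for token, count in wordlist.items()
        if token in status and status[token][1] >= cutoff
    }

# Two days of (made-up) tokens seen while scoring mail.
status = {}
update_status(status, ["viagra", "free", "free"], date(2003, 4, 14))
update_status(status, ["free", "meeting"], date(2003, 4, 15))

# A made-up dump of the spam list: token -> count.
spamlist = {"viagra": 40, "free": 25, "stale": 3}
current_spam = select_current(spamlist, status, date(2003, 4, 15))
```

In this sketch the cutoff date plays the role of the "aged beyond a preset 
time" threshold; the same selection would be applied to the good list.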

The generation/updating of the status list could be done several ways.  A 
copy of each day's mail could be saved and processed using "bogolexer -p" 
at the end of the day.  Alternatively, "bogolexer -p" could be run for each 
incoming message and a file of parsed tokens could be accumulated during 
the day.  Either of these outputs could be piped to "bogoutil -l 
statuslist.db" (-l loads a wordlist from standard input, whereas -d 
dumps one).
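
As a rough sketch of the step between the lexer output and the status 
list, the day's tokens could be reduced to one entry per distinct token 
before loading.  The code below assumes the lexer output is one token per 
line and that the loader accepts "token count YYYYMMDD" lines; both are 
assumptions to check against your bogofilter version, and the function 
names are hypothetical.

```python
from datetime import date

def unique_tokens(lexer_lines):
    """Collapse one-token-per-line text into a sorted list of distinct tokens."""
    return sorted({line.strip() for line in lexer_lines if line.strip()})

def to_load_lines(tokens, day):
    """Format tokens as 'token count YYYYMMDD' lines (assumed load format)."""
    stamp = day.strftime("%Y%m%d")
    # A count of 1 means "seen on this day", however often it occurred.
    return [f"{token} 1 {stamp}" for token in tokens]

# Made-up lexer output accumulated over one day.
sample = ["free\n", "meeting\n", "free\n", "\n"]
load_lines = to_load_lines(unique_tokens(sample), date(2003, 4, 15))
```

Using a count of 1 per day keeps the status count meaning "number of days 
seen" rather than "number of occurrences".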

Hope these ideas spark the desired capability :-)

David




