invalid html warfare

Peter Bishop pgb at adelard.com
Wed May 28 22:28:59 CEST 2003


On 28 May 2003 at 13:45, John McCain wrote:

> What if we removed from the database each token occurring only once in the
> database?  (bogoutil -c 1  *.db??)  This would only be practical if done on a
> sufficiently infrequent interval for "good data" to accumulate more than one
> hit, but often enough to prevent database pollution.
> 

This gets more difficult if the "production" messages are NOT used for 
training (as is the case for me). A token that occurs only once in the 
database might still be hit many times during production runs, so I need 
a separate "hitlist".

It occurs to me that a little procmail magic could do this for me.
After the spam-labelling recipe, add:

# Register a copy of every message tagged as spam in the separate hit database
:0Hcw
* ^X-Bogosity: Yes
| bogofilter -s -d $HITDIR

This should count the number of token hits in a separate "hit" database,
and I can use the hit counts to decide whether the tokens in my main spam 
database are redundant, i.e. trash tokens in the main spam db will occur 
zero times in the hitlist and so can be deleted.
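
A rough sketch of that comparison, untested against a real wordlist. It
assumes both databases have been dumped to text (for example with
bogoutil -d) and that the token is the first whitespace-separated field
on each line; the function names and the sample data are made up for
illustration.

```python
def read_tokens(dump_lines):
    """Collect the token (first whitespace-separated field) from each dump line."""
    tokens = set()
    for line in dump_lines:
        fields = line.split()
        if fields:  # skip blank lines
            tokens.add(fields[0])
    return tokens


def unused_tokens(main_dump, hit_dump):
    """Return the main-database tokens that never appear in the hitlist, sorted."""
    return sorted(read_tokens(main_dump) - read_tokens(hit_dump))


if __name__ == "__main__":
    # Hypothetical dump excerpts: a token followed by a count and a date.
    main_dump = ["viagra 12 20030520", "xq7zk9 1 20030521", "mortgage 8 20030519"]
    hit_dump = ["viagra 40 20030528", "mortgage 3 20030527"]
    print(unused_tokens(main_dump, hit_dump))
```

The tokens it reports are only candidates for deletion; the hitlist would
need to cover a long enough period of production mail before a zero count
means much.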

-- 
Peter Bishop 
pgb at adelard.com
pgb at csr.city.ac.uk





