invalid html warfare
Peter Bishop
pgb at adelard.com
Wed May 28 22:28:59 CEST 2003
On 28 May 2003 at 13:45, John McCain wrote:
> What if we removed from the database each token occurring only once in the
> database? (bogoutil -c 1 *.db??) This would only be practical if done on a
> sufficiently infrequent interval for "good data" to accumulate more than one
> hit, but often enough to prevent database pollution.
>
This gets more difficult if the "production" messages are NOT used for
training (as is the case for me). A database singleton might be hit many
times during production runs, so in my case I need a separate "hit list".
It occurs to me that a little procmail magic could do this for me.
After spam labelling, add:
:0Hcw
* ^X-Bogosity: Yes
| bogofilter -s -d $HITDIR
This should count the number of token hits in a separate "hit" directory,
and I can use the hit counts to decide whether the tokens in my main spaml
database are redundant, i.e. trash tokens in the main spaml db will occur
zero times in the hit list and so can be deleted.
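
As a rough sketch of that comparison (not something I have run against real
databases): assume both databases have been dumped to text, e.g. with
bogoutil -d, and that each dump line looks like "token count [date]". That
line format is an assumption, and the names parse_dump and unused_tokens are
made up for illustration. The tokens that never appear in the hit list are
then the deletion candidates:

```python
def parse_dump(lines):
    """Map token -> count from lines assumed to look like 'token count [date]'."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            # Assumption: the first field is the token, the second its count.
            counts[fields[0]] = int(fields[1])
    return counts


def unused_tokens(main_lines, hit_lines):
    """Return main-database tokens that have zero hits in the hit list."""
    main = parse_dump(main_lines)
    hits = parse_dump(hit_lines)
    return sorted(tok for tok in main if hits.get(tok, 0) == 0)


if __name__ == "__main__":
    # Invented sample data, only to show the shape of the input.
    main_dump = ["viagra 12 20030528", "xyzzy 1 20030501", "free 30 20030520"]
    hit_dump = ["viagra 4 20030528", "free 9 20030528"]
    for token in unused_tokens(main_dump, hit_dump):
        print(token)
```

The hit list would need to accumulate for a while before acting on the
result, otherwise useful but rare tokens would look unused too.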
--
Peter Bishop
pgb at adelard.com
pgb at csr.city.ac.uk