bogofilter Digest 25 Sep 2003 21:41:21 -0000 Issue 187

Tom Anderson tanderso at oac-design.com
Fri Sep 26 18:53:25 CEST 2003


> It's up to the site administrator to determine the policy for
> bogofilter.  Using '-u' for auto-updating is one policy.  Train-on-error
> is another policy.  A maintenance policy for discarding singletons after
> N days may be appropriate for the former but not the
> latter.  'Tis up to the site administrator to determine what works for
> his/her site!

I have two possible methods:

1) Store a last-read timestamp as an epoch day count that wraps every 30
days, packed into a bit field... that would require only 5 bits per token
(assuming the other format is turned off).
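
A rough sketch of what method 1 could look like, in Python for brevity. Everything here is hypothetical: the function names, the packing layout (count in the high bits, stamp in the low 5 bits), and the choice to wrap at 32 rather than 30, since 5 bits hold 32 values anyway. None of it reflects bogofilter's actual database format.

```python
SECONDS_PER_DAY = 86400
STAMP_BITS = 5
STAMP_MODULUS = 1 << STAMP_BITS  # 5 bits give 32 distinct day values


def day_stamp(epoch_seconds):
    """Reduce the epoch day number to a 5-bit value in 0..31."""
    return (epoch_seconds // SECONDS_PER_DAY) % STAMP_MODULUS


def age_in_days(stamp, now_epoch_seconds):
    """Days since the stamp was written.

    Only meaningful if fewer than 32 days have passed; older stamps
    wrap around and look young again.
    """
    return (day_stamp(now_epoch_seconds) - stamp) % STAMP_MODULUS


def pack(count, stamp):
    """Pack a token count and its stamp into one integer (assumed layout)."""
    return (count << STAMP_BITS) | (stamp & (STAMP_MODULUS - 1))


def unpack(value):
    """Inverse of pack(): return (count, stamp)."""
    return value >> STAMP_BITS, value & (STAMP_MODULUS - 1)
```

The wrap-around is the catch: a token untouched for longer than the window becomes indistinguishable from a recent one, so the purge would have to run at least once per window.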

2-a) Store a single "time_since_last_purged" and simply purge ALL
hapaxes after some arbitrary number of days.  The ones that happened to
be added on the day of purging, if very important in identifying an
email as spam or not, will appear more than once in the subsequent
non-purge period (assuming -u).  Since purging could possibly be an
expensive operation, bogofilter could fork a copy of itself to the
background under low priority in order to do this.
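
Method 2-a might be sketched roughly like this, again in Python. The wordlist is modeled as a plain dict of token to count, and `maybe_purge`, `purge_hapaxes`, and the `state` dict holding `last_purged` are names made up for the illustration, not anything in bogofilter. The forking and low-priority part is left out.

```python
SECONDS_PER_DAY = 86400


def purge_hapaxes(counts):
    """Delete every token seen exactly once; return how many were removed."""
    hapaxes = [token for token, n in counts.items() if n == 1]
    for token in hapaxes:
        del counts[token]
    return len(hapaxes)


def maybe_purge(counts, state, now, interval_days=30):
    """Purge all hapaxes if interval_days have passed since the last purge.

    state is a dict with a single 'last_purged' epoch timestamp, which is
    the only date stored anywhere (no per-token timestamps).
    """
    if now - state["last_purged"] < interval_days * SECONDS_PER_DAY:
        return 0
    removed = purge_hapaxes(counts)
    state["last_purged"] = now
    return removed
```

Calling `maybe_purge` on every run costs one comparison until the interval has elapsed, so the expensive pass over the wordlist happens only once per interval.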

2-b) Same as "2-a", but don't store any date at all, and simply purge on
the 1st of every month (or the 1st and 15th, etc.).
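
For 2-b the check reduces to a calendar test. A minimal sketch, with `is_purge_day` and its `purge_days` parameter being hypothetical names:

```python
import datetime


def is_purge_day(today, purge_days=(1,)):
    """Return True if today's day-of-month is one of the purge days."""
    return today.day in purge_days
```

For example, `is_purge_day(datetime.date.today(), purge_days=(1, 15))` would cover the twice-a-month variant. Since no date is stored, every run on a purge day passes this test, so the purge would presumably have to be triggered once a day by something external (cron, say) rather than on every message.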

BTW, for whoever asked how "hapaxes" came about, I know that "hap" means
one, as in "haploid cells"... the suffix I don't know.
