Radical lexers

David Relson relson at osagesoftware.com
Thu Jan 22 13:30:12 CET 2004


On 22 Jan 2004 03:04:58 -0500
Tom Anderson <tanderso at oac-design.com> wrote:

> On Tue, 2004-01-20 at 11:03, David Relson wrote:
> > I'd warrant that your wordlists have a lot of hapaxes (tokens that
> > have occurred once and only once) taking up space.  This seems
> > contrary to your efforts to minimize wordlist size :-(
> 
> IMHO, more hapaxes for more accuracy is a good trade-off.  However,
> wordlist size is definitely important.  Therefore, someone should
> write a hapax stripping script that can be run in a cronjob.  If I had
> time, I'd do it...

The cronjob is, roughly, a one-liner.  Use bogoutil's maintenance
functions to prune low-count tokens such as hapaxes.
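To make the idea concrete, here is a minimal sketch of what "hapax stripping" means, assuming a hypothetical in-memory wordlist that maps each token to a (spam_count, ham_count) pair. This is not bogofilter's on-disk format and not bogoutil's implementation; the function name strip_hapaxes and the max_count parameter are hypothetical, for illustration only.

```python
from typing import Dict, Tuple

# Hypothetical in-memory wordlist: token -> (spam_count, ham_count).
# This is NOT bogofilter's database format; it only illustrates the pruning rule.
Wordlist = Dict[str, Tuple[int, int]]


def strip_hapaxes(wordlist: Wordlist, max_count: int = 1) -> Wordlist:
    """Return a copy of the wordlist without tokens whose combined
    spam + ham count is at most max_count (max_count=1 removes hapaxes)."""
    return {
        token: (spam, ham)
        for token, (spam, ham) in wordlist.items()
        if spam + ham > max_count
    }


if __name__ == "__main__":
    wl = {
        "viagra": (40, 0),
        "meeting": (1, 25),
        "xq7zz": (1, 0),   # seen once, in spam: a hapax
        "relson": (0, 1),  # seen once, in ham: a hapax
    }
    pruned = strip_hapaxes(wl)
    print(sorted(pruned))                            # ['meeting', 'viagra']
    print(len(wl) - len(pruned), "hapaxes removed")  # 2 hapaxes removed
```

The trade-off Tom describes shows up directly in max_count: raising it shrinks the wordlist further but discards more of the evidence that rare tokens provide.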
