Radical lexers
David Relson
relson at osagesoftware.com
Thu Jan 22 13:30:12 CET 2004
On 22 Jan 2004 03:04:58 -0500
Tom Anderson <tanderso at oac-design.com> wrote:
> On Tue, 2004-01-20 at 11:03, David Relson wrote:
> > I'd warrant that your wordlists have a lot of hapaxes (tokens that
> > have occurred once and only once) taking up space. This seems
> > contrary to your efforts to minimize wordlist size :-(
>
> IMHO, more hapaxes for more accuracy is a good trade-off. However,
> wordlist size is definitely important. Therefore, someone should
> write a hapax stripping script that can be run in a cronjob. If I had
> time, I'd do it...
The cron job is, roughly, a one-liner: use bogoutil's maintenance
functions.
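
For anyone who wants to see what hapax stripping amounts to, here is a minimal sketch in Python. It is not bogoutil's implementation and not the one-liner itself. It assumes the wordlist has been dumped to text with one token per line in the form "token spam_count good_count date"; that layout, the function name strip_hapaxes, and the choice to keep tokens starting with "." (treated here as bookkeeping entries) are all assumptions made for illustration.

```python
import sys


def strip_hapaxes(dump_lines, min_count=2):
    """Yield only the dump lines whose spam + good count is >= min_count.

    Assumed line layout (hypothetical): token spam_count good_count [date]
    """
    for line in dump_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or malformed line: drop it
        token = fields[0]
        if token.startswith("."):
            # Assumption: dot-prefixed tokens are bookkeeping entries, keep them.
            yield line
            continue
        try:
            spam, good = int(fields[1]), int(fields[2])
        except ValueError:
            continue  # counts are not integers: drop the line
        if spam + good >= min_count:
            yield line  # seen more than once in total, so not a hapax


if __name__ == "__main__":
    # Filter a text dump from standard input to standard output.
    sys.stdout.writelines(strip_hapaxes(sys.stdin))
```

Fed a text dump on standard input, the script writes the same dump without the once-only tokens, which could then be loaded into a fresh wordlist. The exact commands for dumping and reloading, and the maintenance options themselves, are in the bogoutil man page.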