Radical lexers
Tom Anderson
tanderso at oac-design.com
Thu Jan 22 09:04:58 CET 2004
On Tue, 2004-01-20 at 11:03, David Relson wrote:
> I'd warrant that your wordlists have a lot of hapaxes (tokens that have
> occurred once and only once) taking up space. This seems contrary to
> your efforts to minimize wordlist size :-(
IMHO, more hapaxes for more accuracy is a good trade-off. However,
wordlist size is definitely important. Therefore, someone should write
a hapax-stripping script that can be run from a cron job. If I had time,
I'd do it...
Tom
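
A minimal sketch of the kind of hapax-stripping filter suggested above, not a tested maintenance tool. It assumes the wordlist has first been dumped to text with one token per line in the form "token spam_count good_count [date]"; that layout, the file names, and the function names total_count and strip_hapaxes are assumptions made for illustration. It also assumes that entries starting with a dot are bookkeeping records that must be kept. Lines are handled as bytes so that tokens in any character set pass through unchanged.

```python
#!/usr/bin/env python3
"""Drop hapaxes from a text dump of a wordlist (illustrative sketch)."""
import sys


def total_count(line):
    """Return spam + good count for a dump line, or None if it does not parse."""
    fields = line.split()
    if len(fields) < 3:
        return None
    try:
        # Assumed layout: token, spam count, good count, optional date.
        return int(fields[1]) + int(fields[2])
    except ValueError:
        return None


def strip_hapaxes(lines, max_total=1):
    """Yield every line except ordinary tokens whose total count is <= max_total."""
    for line in lines:
        total = total_count(line)
        keep = (
            line.startswith(b".")  # assumed: leading dot marks bookkeeping entries
            or total is None       # unparseable lines pass through untouched
            or total > max_total
        )
        if keep:
            yield line


if __name__ == "__main__":
    # Hypothetical usage: strip_hapaxes.py dump.txt > stripped.txt
    for path in sys.argv[1:]:
        with open(path, "rb") as dump:
            sys.stdout.buffer.writelines(strip_hapaxes(dump))
```

A cron job could dump the wordlist, run it through this filter, and reload the result into a fresh database file. Since a token seen once yesterday may well turn up again tomorrow, it would probably make sense to strip only hapaxes that are also old, which this sketch does not attempt.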