Radical lexers

Tom Anderson tanderso at oac-design.com
Thu Jan 22 09:04:58 CET 2004


On Tue, 2004-01-20 at 11:03, David Relson wrote:
> I'd warrant that your wordlists have a lot of hapaxes (tokens that have
> occurred once and only once) taking up space.  This seems contrary to
> your efforts to minimize wordlist size :-(

IMHO, keeping more hapaxes in exchange for more accuracy is a good
trade-off.  However, wordlist size is definitely important, so someone
should write a hapax-stripping script that can be run from a cron job.
If I had time, I'd do it...
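The hapax-stripping script suggested above might look something like the following minimal Python sketch. It assumes the wordlist has already been dumped to a text file with one token per line, laid out as `token spamcount goodcount [date]`. That layout is an assumption, so check the dump format of your bogofilter version. It also assumes that tokens beginning with `.` are bookkeeping entries (such as message counts) that must be kept. The function name `strip_hapaxes` and the script name `strip_hapaxes.py` are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def strip_hapaxes(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the dump lines worth keeping, dropping tokens seen at most once."""
    for line in lines:
        fields = line.split()
        # Assumed layout: token spamcount goodcount [date]. Lines that do
        # not match are passed through untouched rather than guessed at.
        if len(fields) < 3:
            yield line
            continue
        token = fields[0]
        # Assumption: tokens starting with "." are bookkeeping entries
        # (such as message counts) and must never be stripped.
        if token.startswith(b"."):
            yield line
            continue
        try:
            spam, good = int(fields[1]), int(fields[2])
        except ValueError:
            yield line
            continue
        # A hapax has a combined count of one; keep only tokens seen more often.
        if spam + good > 1:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Filter every dump file named on the command line to stdout, in binary
    # mode so that tokens in any character set pass through unchanged.
    for path in sys.argv[1:]:
        with open(path, "rb") as dump:
            sys.stdout.buffer.writelines(strip_hapaxes(dump))
```

A cron job could dump the wordlist, run `python3 strip_hapaxes.py dump.txt > stripped.txt`, and load the result into a fresh wordlist. The dump and load steps depend on your bogofilter tools and are not shown here. Loading into a new wordlist instead of editing the old one in place means a failed run leaves the original intact.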

Tom
