next release [was: ' or ` at TOKENBACK]

David Relson relson at osagesoftware.com
Mon Nov 17 15:12:04 CET 2003


On Mon, 17 Nov 2003 14:56:47 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:

> David Relson wrote:
> 
...[snip]
> 
> I am not sure about BOGOLEX_TOKEN, which additional
> characters can show up there which may not show up in a
> TOKEN? Right now we have: ?=():#+

BOGOLEX_TOKEN is fine as it is.  It's not used in parsing messages.  It
exists to recognize messages in msg-count format, as generated by
bogolex.sh.  Basically, it needs to accept any token written by
"bogoutil -d".

> BTW: After so many lexer changes, when will we have the next
> release?
> 
> pi

At the moment, bogotune's memory usage is the hot button.  It currently
uses 20 bytes per token, and its locality (working set) is poor.  Once
the set of messages is large enough to force swapping, performance is
terrible.  I'm working to reduce the storage to 8 bytes per token (plus
some per-message overhead).  Done properly, this will let bogotune
efficiently process larger data sets than it can handle now.

David



