lexer change

David Relson relson at osagesoftware.com
Tue Nov 11 15:12:42 CET 2003


On 11 Nov 2003 08:46:56 -0500
Tom Anderson <tanderso at oac-design.com> wrote:

> On Mon, 2003-11-10 at 11:17, Boris 'pi' Piwinger wrote:
> > make too much sense. So my question is more if we need the
> > dollar case (not allowing 123.45$ or $100,000 at the same
> > time). Or if we need it why only this special case of a price?
> 
> Why would we be excluding these in the first place?  If it has a
> non-numeric in it, then it shouldn't fit the all-numeric rule, which
> probably shouldn't even be there in the first place.  Unless a rule is
> absolutely required, it shouldn't be instituted.  The default action
> should be to use Bayesian filtering, not rules.  In short, yes, all
> prices should be evaluated, as they could very well be good
> indicators.
> 
> Tom

Tom,

As you know, bogofilter is evolving.  The initial implementation was case
insensitive and didn't include monetary quantities, dates, message ids, etc.,
because that seemed the right way to go.  Over time, experiments have
shown that some of the early decisions were correct and some were
incorrect, so bogofilter gets changed.

The rules determine how the message is parsed and how tokens are created.
After that, the Bayesian technique is applied to the tokens.
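To make the two stages concrete, here is a minimal sketch in Python.  It
is not bogofilter's code: bogofilter's lexer is a flex grammar, and the
regex rules below are hypothetical, chosen only to mirror this thread
(keep prices like $100,000 and 123.45$, drop bare all-numeric tokens).
The scoring is a simple Graham-style naive-Bayes combination standing in
for whatever method bogofilter actually uses; token_prob, spam_score and
the count dictionaries are illustrative names.

```python
import math
import re

# Hypothetical lexer rules (NOT bogofilter's flex grammar).  Alternatives
# are tried left to right, so the price patterns win over bare numbers.
TOKEN_RE = re.compile(
    r"\$\d[\d,]*(?:\.\d+)?"      # leading-dollar price: $100,000, $19.95
    r"|\d[\d,]*(?:\.\d+)?\$"     # trailing-dollar price: 123.45$
    r"|\d[\d,.]*"                # all-numeric run (dropped in tokenize)
    r"|[A-Za-z][A-Za-z'-]*"      # ordinary words
)

def tokenize(text):
    """Stage 1: apply the rules; case-fold, keep prices, drop bare numbers."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok[0].isdigit() and not tok.endswith("$"):
            continue  # hypothetical "all-numeric" rule under discussion
        tokens.append(tok.lower())
    return tokens

def token_prob(tok, spam_counts, ham_counts, n_spam, n_ham):
    """Per-token spam probability from training counts, clamped to [0.01, 0.99]."""
    s = spam_counts.get(tok, 0) / max(n_spam, 1)
    h = ham_counts.get(tok, 0) / max(n_ham, 1)
    if s + h == 0:
        return 0.5  # never-seen token: neutral
    return min(0.99, max(0.01, s / (s + h)))

def spam_score(text, spam_counts, ham_counts, n_spam, n_ham):
    """Stage 2: combine token probabilities, prod(p) / (prod(p) + prod(1-p))."""
    log_p = log_q = 0.0
    for tok in set(tokenize(text)):
        p = token_prob(tok, spam_counts, ham_counts, n_spam, n_ham)
        log_p += math.log(p)
        log_q += math.log(1.0 - p)
    d = log_p - log_q  # work in log space, then a numerically stable sigmoid
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)

spam_counts = {"$100,000": 40, "free": 30}
ham_counts = {"lexer": 20, "free": 5}
print(tokenize("FREE offer: $100,000 in 2003!"))
print(spam_score("FREE offer: $100,000 in 2003!", spam_counts, ham_counts, 50, 50))
```

The point of the split: whether "$100,000" survives is decided entirely
by the rules in stage 1; once it is a token, stage 2 learns from the
training counts whether it is a good indicator, as Tom suggests.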

Hope this helps,

David
