Algorithm limitations.

michael at optusnet.com.au
Mon Apr 12 06:07:15 CEST 2004


"Boris 'pi' Piwinger" <3.14 at logic.univie.ac.at> writes:
> michael at optusnet.com.au wrote:
> 
> >1. The absence of a feature.
> >2. the XOR problem.
> 
> OK, so you found the first is of no use. The second is a
> funny idea, but it would blow up the database enormously. You

Doing it the naive way would. I was trying to work out
whether there's a slightly more intelligent way of doing it. After
all, two-layer non-linear neural networks can learn it. :)
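
To make the XOR case concrete, here is a minimal Python sketch
(illustrative only, not bogofilter code; the tokens "refinance" and
"meeting" are made up) of two tokens that are individually useless to
a unigram filter but decisive as a pair:

    # Hypothetical XOR pattern: a message is spam when exactly one of
    # "refinance" and "meeting" appears.
    corpus = [
        ({"refinance"},            "spam"),
        ({"meeting"},              "spam"),
        ({"refinance", "meeting"}, "ham"),
        (set(),                    "ham"),
    ] * 100   # repeat so the counts aren't tiny

    def spamminess(feature):
        # P(spam | all tokens in `feature` present), by counting.
        hits = [label for tokens, label in corpus if feature <= tokens]
        return hits.count("spam") / len(hits) if hits else 0.5

    print(spamminess({"refinance"}))             # 0.5 -- useless alone
    print(spamminess({"meeting"}))               # 0.5 -- useless alone
    print(spamminess({"refinance", "meeting"}))  # 0.0 -- the pair decides

No weighting of the two individual scores gets past 0.5; only the
combined feature (or a hidden layer) separates the classes.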

> would need to save all combinations of tokens. It seems more
> interesting to save neighboured tokens like:
> interesting to save
> interesting to
> interesting * save
> etc.
> 
> That already gets huge.
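
For illustration, generating those neighboured forms from a word
stream might look roughly like this (a sketch only; the spellings and
the '*' gap marker just follow the examples quoted above). It also
shows how quickly the token count multiplies:

    def neighboured_tokens(words):
        # Emit unigrams, adjacent pairs, adjacent triples, and
        # one-gap pairs ("interesting * save") for each position.
        for i, w in enumerate(words):
            yield w
            if i + 1 < len(words):
                yield w + " " + words[i + 1]
            if i + 2 < len(words):
                yield w + " " + words[i + 1] + " " + words[i + 2]
                yield w + " * " + words[i + 2]

    print(list(neighboured_tokens("interesting to save".split())))
    # ['interesting', 'interesting to', 'interesting to save',
    #  'interesting * save', 'to', 'to save', 'save']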

I'm already doing word pairs. You might have seen the patch I posted
previously for a lossy token database. That was designed to support
exactly what you're talking about. Basically it would allow
bogofilter to generate a vast array of tokens and only keep the
ones that occur 'frequently'. 'frequently' here means "at a frequency
high enough that a second instance comes along before the first has been
discarded from the database". :)
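
Sketched very roughly, the scheme amounts to something like this (a
toy Python sketch assuming a capped in-memory table with a simple
eviction policy; the actual patch will differ in detail):

    from collections import OrderedDict

    class LossyTokenCounts:
        # Toy capped token table: tokens seen only once are the first
        # to be thrown away when space runs out, so only tokens whose
        # second instance arrives "soon enough" survive.
        def __init__(self, max_entries=1000000):
            self.max_entries = max_entries
            self.counts = OrderedDict()      # token -> count, oldest first

        def add(self, token):
            if token in self.counts:
                self.counts[token] += 1
                self.counts.move_to_end(token)   # refreshed, keep longer
            else:
                self.counts[token] = 1
            while len(self.counts) > self.max_entries:
                # Drop the stalest token that never recurred.
                for tok in self.counts:
                    if self.counts[tok] == 1:
                        del self.counts[tok]
                        break
                else:
                    break   # every entry recurred; stop evicting

    db = LossyTokenCounts(max_entries=4)
    for tok in ["a", "b", "a", "c", "d", "e"]:
        db.add(tok)
    print(dict(db.counts))   # 'a' recurred and is kept; the oldest
                             # singleton 'b' was evicted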

Michael.



