tuning and archives

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Mon Feb 23 17:25:16 CET 2004


Stroller wrote:

>> I really don't know if it matters that the tokens in the
>> database are balanced in number. Balancing them in content
>> is done with training to exhaustion. This would just let
>> your problem vanish.
> 
> I appear to get relatively little spam - well, my archive of ham is 
> MUCH larger, but perhaps that's just because I've been more 
> conscientious about storing it.

Good enough. With this concept, bogofilter just learns
enough about what it needs to understand. My impression is
that the ratio has nothing to do with the input sizes.
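
As a rough illustration of training to exhaustion, here is a toy
sketch. It is not bogofilter's code or scoring algorithm: ToyFilter,
train_to_exhaustion and the sample corpus are made-up names and data,
and the scoring is a deliberately crude token count. The point is only
the loop: a message is trained on only when it is misclassified, and
the passes repeat until one pass produces no errors.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical stand-in for a token-counting filter (not bogofilter)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def train(self, text, label):
        # Count each distinct token once per trained message.
        self.counts[label].update(set(text.lower().split()))

    def classify(self, text):
        # Crude score: spam occurrences minus ham occurrences per token.
        score = 0
        for tok in set(text.lower().split()):
            score += self.counts["spam"][tok] - self.counts["ham"][tok]
        # Ties (including unknown tokens only) default to ham.
        return "spam" if score > 0 else "ham"


def train_to_exhaustion(filt, corpus, max_passes=20):
    """Train only on misclassified messages until a pass is error-free.

    Returns the number of passes made (capped at max_passes).
    """
    for passes in range(1, max_passes + 1):
        errors = 0
        for text, label in corpus:
            if filt.classify(text) != label:
                filt.train(text, label)
                errors += 1
        if errors == 0:
            return passes
    return max_passes


# Made-up sample archive: more ham than spam, as in the case discussed.
corpus = [
    ("meeting agenda for monday", "ham"),
    ("lunch on friday", "ham"),
    ("patch for the build script", "ham"),
    ("notes from the meeting", "ham"),
    ("cheap pills online", "spam"),
    ("win cheap prizes now", "spam"),
]
```

In this sketch the number of messages that end up in the database
depends on how many were misclassified along the way, not on how large
the ham and spam archives are.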

> But casual observation yesterday seemed to indicate that bogofilter 
> only catches c 95% of incoming spam. I had noticed already that 
> Bogofilter was not as accurate as might be anticipated (I think others 
> expect 97% - 99%?) and had assumed this was due to the imbalance.

Actually, with only 99% I would look for another program.

pi



