Keeping the cruft out (was Re: no To: header in emails)

Tom Allison tallison at tacocat.net
Sat Mar 13 16:01:19 CET 2004


Eric Wood wrote:
> Boris 'pi' Piwinger wrote:
> 
>>># Strip Useless TABS and HTML comments, used to split up words
>>
>>Why do something bogofilter already does?
> 
> 
> So later, you can do your own grep searches a little easier.
> 

But if bogofilter does it, why grep for it later yourself?

I don't get this need for continued development on bogofilter.
Unless there is tangible evidence that something is not up to par, 
there isn't much need for a continued bugfix process.  In other words: 
if it ain't broke, don't fix it.

I haven't seen anything to indicate that bogofilter is in any way broken.

If I could make a wish, it would be for bogotune to require less than 
2,000 words for analysis.  However, while I'm not a professional 
statistician, I do know that if you don't have enough of a sample base, 
you can't reliably work with the data and expect good results.
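
To put a rough number on the sample-size point, here is a minimal sketch. 
It assumes each message is an independent draw and that we are estimating 
how often some token shows up in spam.  This is the textbook standard 
error of a proportion, not anything bogotune itself computes, and the 
function name is made up for illustration.

```python
import math

def proportion_std_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)

# Worst case is p = 0.5; watch the uncertainty shrink as the sample grows.
for n in (20, 200, 2000):
    se = proportion_std_error(0.5, n)
    print(f"n={n:5d}  std error ~ {se:.3f}")
```

The error only falls off with the square root of the sample size, so 
getting ten times the precision takes roughly a hundred times the mail.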

Therefore I am forced to simply use what I have, which works well, and 
modify things by hand if I'm really feeling a need to.  Until then, I'll 
leave the automated tools to the auspices of those who actually know 
what the F\w+ bogofilter really does under the hood.

One thing I do know is that if we start meddling with the statistical 
process it will no longer be statistical.  And once that's done it's a 
little hard to be certain where you are headed next.

I personally have messed with bogofilter in the wrong way and created a 
database that was both HUGE and incredibly stupid.  I didn't bother to 
wait around for the heuristic analysis of the data.  I realized I had 
hosed things up, was thankful for a lot of archives, and promptly rebuilt everything 
from scratch.

Frankly, I'm amazed at how well bogofilter works and how easily it went 
from a base install to 99.9%.




