Keeping the cruft out (was Re: no To: header in emails)
Tom Allison
tallison at tacocat.net
Sat Mar 13 16:01:19 CET 2004
Eric Wood wrote:
> Boris 'pi' Piwinger wrote:
>
>>># Strip Useless TABS and HTML comments, used to split up words
>>
>>Why do something bogofilter already does?
>
>
> So later, you can do your own grep searches a little easier.
>
But if bogofilter does it, why grep for it later yourself?
I don't get this need for continued development on bogofilter.
Unless there is tangible evidence that something is not up to par, there
isn't much need for a continued bugfix process. In other words, if it
ain't broke, don't fix it.
I haven't seen anything to indicate that bogofilter is in any way broken.
If I could make a wish, it would be for bogotune to require less than
2,000 words for analysis. However, while I'm not a professional
statistician, I do know that if you don't have enough of a sample base,
you can't reliably work with the data and expect good results.
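
To put a rough number on the sample-size point, here is a minimal sketch. It is not how bogotune works internally; it is only a generic illustration of how the uncertainty in an estimated proportion shrinks as the sample grows. The function name wilson_interval and the "token seen in 90% of samples" scenario are made up for the example.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    Hypothetical helper for illustration; not part of bogofilter or bogotune.
    """
    if n == 0:
        # With no data at all, the proportion could be anything.
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (center - half, center + half)

# Same observed rate (90%), three different sample sizes.
for n in (20, 200, 2000):
    lo, hi = wilson_interval(round(0.9 * n), n)
    print(f"n={n:5d}  interval width = {hi - lo:.3f}")
```

The interval gets narrower as n grows, which is the general reason a tuning tool would want a large corpus before trusting its own estimates.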
Therefore I am forced to simply use what I have, which works well, and
modify things by hand if I'm really feeling a need to. Until then, I'll
leave the automated tools in the hands of those who actually know
what the F\w+ bogofilter really does under the hood.
One thing I do know is that if we start meddling with the statistical
process it will no longer be statistical. And once that's done it's a
little hard to be certain where you are headed next.
I personally have messed with bogofilter in the wrong way and created a
database that was both HUGE and incredibly stupid. I didn't bother to
wait around for the heuristic analysis of the data. I realized I had hosed
it up, was thankful for a lot of archives, and promptly rebuilt everything
from scratch.
Frankly, I'm amazed at how well bogofilter works and how easily it went
from a base install to 99.9%.