Self-adjusting bogofilter.cf settings

Mark Constable markc at renta.net
Sat Feb 28 00:57:18 CET 2004


On Sat, 28 Feb 2004 08:06 am, Chris Fortune wrote:
> I had a good look at bogofilter.cf.example and honestly couldn't see what
> parameters should be usefully tweaked on a daily basis, with perhaps the
> limited exception of the BLOCK ON SUBNETS setting (e.g. your server is
> getting slammed by a spammer from a single IP block with lots of subnets).

> Which settings do you feel would be most useful to optimize?

I wish I knew :) I mean manually, let alone in any automated sense.

Some folks on this list seem to have a good sense of which parameters
to adjust for certain results but, to me, most of the discussion of
stats and results is a blur. Fortunately I get along fine with the
defaults plus one single setting, ham_cutoff = 0.312000, which gives
me the tristate system that has virtually freed my inbox from spam for
the past month. That 0.312000 figure came, arbitrarily, from some
posting where I discovered this was the new way to engage tristate
filtering post ~v0.16. I have absolutely no idea what 0.312000
actually means, nor whether 0.311000 or 0.313000 would be any better
or worse.
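
To make the tristate idea concrete, here is a minimal sketch of how two
cutoffs split a spamicity score into three verdicts. It is only an
illustration, not bogofilter's actual code: the function name classify,
the spam_cutoff value of 0.99 and the inclusive boundaries are all
assumptions made for the example.

```python
def classify(score, ham_cutoff=0.312, spam_cutoff=0.99):
    """Return 'Spam', 'Ham' or 'Unsure' for a spamicity score in [0, 1].

    Assumed for illustration: spam_cutoff=0.99 and inclusive boundaries.
    """
    if score >= spam_cutoff:
        return "Spam"
    if score <= ham_cutoff:
        return "Ham"
    # Anything strictly between the two cutoffs is left undecided.
    return "Unsure"


for s in (0.05, 0.312, 0.5, 0.995):
    print(s, classify(s))
```

Under these assumptions, moving ham_cutoff from 0.312 to 0.311 or 0.313
only changes the verdict for messages whose score falls inside that
narrow band, which may be why such small changes are hard to judge by
eye.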

The only way I can see to increase my understanding of the interaction
between min_dev vs spam_cutoff vs ham_cutoff vs robx vs every other
tweakable parameter is to spend a _lot_ of time poring over the
archives of this list, semi-massive googling, and hit-and-miss
experimentation until something starts to make sense. I'm busy enough
anyway, so I'd rather spend that time relaxing in front of a TV. So
it occurred to me that maybe there is some way to leverage the
formidable understanding that some/most folks on this list have of
how these tweakable bogo parameters interact. At a minimum, though,
there needs to be a log/history of settings from a wordlist.db stored
somewhere. Hence...

> > If not then, for instance, if a one line summary of the current
> > wordlist.db settings could be logged daily then surely some
> > _obvious_ (if they exist) trends could be analyzed, and according
> > to some _obvious_ rules, write out an optimised bogofilter.cf ?

I did not mean that any tweaking should be on a daily basis, only
the logging of certain state values like wordlist counts etc.
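
A one-line-per-day log of that kind could be as simple as the sketch
below. Everything here is an assumption for illustration: the function
names, the CSV layout and the choice of fields (spam and ham message
counts plus the current ham_cutoff) are hypothetical, and the caller is
expected to obtain the counts however it can.

```python
import csv
import datetime
import os

# Hypothetical column layout; extend with whatever state is worth tracking.
FIELDS = ["date", "spam_msgs", "ham_msgs", "ham_cutoff"]


def log_daily_state(path, spam_msgs, ham_msgs, ham_cutoff, day=None):
    """Append one summary line for `day` (default: today) to a CSV log."""
    day = day or datetime.date.today()
    # Write the header only when the log is created or still empty.
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh)
        if is_new:
            writer.writerow(FIELDS)
        writer.writerow([day.isoformat(), spam_msgs, ham_msgs, ham_cutoff])


def read_history(path):
    """Return the log as a list of dicts, oldest entry first."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))
```

Run once a day from cron, something like this would build up the
history that any later trend analysis needs, without touching the
filter's configuration at all.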

As an analogy: as an ISP admin I track traffic and hard drive usage
and produce pretty graphs which, after half a year or so, indicate
trends to the point where it's possible to predict when bandwidth and
HDDs have to be upgraded.

If the various bogo* tools logged certain important settings, wordlist
counts, whatever, on a daily basis, then I could _imagine_ that when
some parameters reached certain limits relative to other parameter
settings, it would be probable that parameter x, y or z should be
tweaked, plus or minus some value according to yet another parameter
(or the sum of them over time). A new bogofilter.cf could then be
written out, which in turn gets tracked and perhaps self-adjusted yet
again down the track. What these "rules of adjustment" could be is
loosely tied up in the group understanding of some/most of the people
on this list. If these rules could be extracted to code then anyone,
even me, could start to apply them and fine-tune bogofilter usage to
something even better than "good enough".
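
The mechanics of such a "rule of adjustment" might look like the sketch
below. The rule itself is a PLACEHOLDER, not a tuning recommendation:
the thresholds, the step size, the per-day ham/unsure/spam counts and
all function names are hypothetical, and the output simply uses the
"name = value" notation from this message. Real rules would have to
come from people who understand how the parameters interact.

```python
def unsure_rate(day):
    """Fraction of a day's messages that were classified as unsure."""
    total = day["ham"] + day["unsure"] + day["spam"]
    return day["unsure"] / total if total else 0.0


def propose_settings(history, settings, window=7, max_unsure=0.10, step=0.01):
    """Placeholder rule: if the average unsure rate over the last `window`
    days exceeds `max_unsure`, nudge ham_cutoff up by `step`."""
    recent = history[-window:]
    new = dict(settings)
    if len(recent) < window:
        return new  # not enough history yet, change nothing
    avg = sum(unsure_rate(d) for d in recent) / window
    if avg > max_unsure:
        # Never let ham_cutoff reach spam_cutoff.
        limit = settings["spam_cutoff"] - step
        new["ham_cutoff"] = round(min(settings["ham_cutoff"] + step, limit), 6)
    return new


def render_cf(settings):
    """Render settings as 'name = value' lines, sorted by name."""
    return "".join(f"{k} = {v:.6f}\n" for k, v in sorted(settings.items()))
```

Keeping the rule as a small function over the logged history means it
can be swapped out or argued about on the list without changing the
logging or the code that writes the file.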

Sorry about the long rave. If I knew what I was talking about I would 
provide precise details, examples and even code.

--markc



