compile time options
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Tue Sep 30 14:28:59 CEST 2003
David Relson wrote:
>> > I'm anticipating 0.15.5 being promoted from "current release" to
>> > "stable release" in a week. That'll be an appropriate time to
>> > announce cleanup plans.
>>
>> Good point. We could then make a cleanup for 0.16. Should I
>> start with suggestions?
>
> Feel free!
OK, I'll start from the man page (0.15.4):
> The -t (terse) option tells bogofilter to print an
> abbreviated spamicity message containing 1 letter and the
> score. Spam is indicated with "Y", ham by "N", and unsure
> by "U". Note: the formatting can be customized using the
> config file.
I think this can go. -T is for machine readability and does
what we need.
> The -2 option tells bogofilter to binary classify the
> message as either ham or spam, and never as unsure. When this
> option is used with -u, a wordlist is always updated.
>
>
> The -3 option tells bogofilter to use tristate
> classification for the message, i.e. classify the message as ham,
> spam, or unsure. This option is effective only if
> ham_cutoff is non-zero.
Those can go, the decision can be made by choosing
appropriate cutoffs.
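A minimal Python sketch of the point that cutoffs alone can express both modes. It assumes a spamicity score in [0, 1] and a hypothetical rule: spam at or above spam_cutoff, ham below ham_cutoff, unsure in between. Under that rule, setting ham_cutoff equal to spam_cutoff leaves no unsure band, which is what -2 asks for. The function name and the exact boundary handling are illustrative assumptions, not bogofilter's code; the Y/N/U letters are the ones the -t description uses.

```python
def classify(score: float, spam_cutoff: float, ham_cutoff: float) -> str:
    """Return "Y" (spam), "N" (ham) or "U" (unsure) for a spamicity score.

    Hypothetical rule, not taken from bogofilter's source:
      score >= spam_cutoff -> "Y"
      score <  ham_cutoff  -> "N"
      otherwise            -> "U"
    """
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("expected 0 <= ham_cutoff <= spam_cutoff <= 1")
    if score >= spam_cutoff:
        return "Y"
    if score < ham_cutoff:
        return "N"
    return "U"


# Tristate: a gap between the cutoffs leaves room for "unsure".
print([classify(s, 0.95, 0.10) for s in (0.05, 0.50, 0.99)])  # ['N', 'U', 'Y']
# Binary: equal cutoffs leave no unsure band, i.e. the -2 behaviour.
print([classify(s, 0.95, 0.95) for s in (0.05, 0.50, 0.99)])  # ['N', 'N', 'Y']
```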
> When reading mbox format, bogofilter relies on the empty
> line after a mail.
BTW: We should mention formail -es here, which fixes this in
mboxes.
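To illustrate why the empty line matters, here is a hypothetical Python sketch of an mbox splitter that treats a "From " line as a message separator only when it is the first line or follows an empty line, which is the behaviour the quoted sentence suggests. It is not bogofilter's parser. When the blank line before a real separator is missing, two mails run together into one, which is the case pi suggests handling with formail -es.

```python
def split_mbox(text: str) -> list[str]:
    """Split mbox text into messages (illustrative rule, see lead-in).

    A line starting with "From " opens a new message only if it is the
    first line or the previous line is empty.
    """
    messages: list[list[str]] = []
    prev_blank = True  # the very first line counts as following a boundary
    for line in text.splitlines():
        if line.startswith("From ") and prev_blank:
            messages.append([])
        if messages:  # ignore anything before the first separator
            messages[-1].append(line)
        prev_blank = (line == "")
    return ["\n".join(m) for m in messages]


good = "From a@x Mon\nSubject: 1\n\nbody\n\nFrom b@x Tue\nSubject: 2\n\nbody\n"
bad = "From a@x Mon\nSubject: 1\n\nbody\nFrom b@x Tue\nSubject: 2\n\nbody\n"
print(len(split_mbox(good)), len(split_mbox(bad)))  # 2 1
```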
> The -Bfilename (bulk mode) option tells bogofilter to
> classify multiple objects (see the previous paragraph)
Do we need both -b and -B? Isn't one enough?
> The -F (force) ignores threshold values when printing
> spamicity statistics.
I don't understand this one, which makes me feel it is not
needed;-)
> The -d dir option allows you to set the directory under
> which the wordlists will be found to dir. If omitted, the
> default directory will be $BOGOFILTER_DIR if
> BOGOFILTER_DIR is set and $HOME/.bogofilter otherwise.
Is that correct? Doesn't the config file come in here?
Anyhow, this is explained later. So "If omitted ..." should
be deleted here.
> The -k tag option sets the cache size for the BerkeleyDB
> subsystem. Properly sizing the cache improves bogofilter's
> performance. Run the bogotune script to determine the
> recommended size.
Having this only in the config file would be enough.
> The -L tag option configures a tag which can be included
> in the information being logged by the -l option, but it
> requires a custom format that includes the %l string for
> now. This option implies -l.
Having this only in the config file would be enough.
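For the -L description: the tag only shows up if the log format actually contains %l. A minimal sketch of that substitution, assuming a printf-like format in which %l expands to the tag and %% to a literal percent sign; any other directives are left alone here, and the function name is hypothetical, not bogofilter's logging code.

```python
def expand_log_format(fmt: str, tag: str) -> str:
    """Replace %l with `tag` and %% with '%'; leave other text untouched."""
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            nxt = fmt[i + 1]
            if nxt == "l":  # the tag directive from the -L description
                out.append(tag)
                i += 2
                continue
            if nxt == "%":  # escaped percent sign (assumption)
                out.append("%")
                i += 2
                continue
        out.append(fmt[i])
        i += 1
    return "".join(out)


print(expand_log_format("tag=%l 100%%", "inbox"))  # tag=inbox 100%
# Without %l in the format, the tag is silently dropped.
print(expand_log_format("no tag here", "inbox"))   # no tag here
```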
> The -I filename option tells bogofilter to read its input
> from the specified file, rather than from stdin
I cannot see a situation where we could not read from stdin.
So this would be superfluous.
> The -O filename option tells bogofilter where to write its
> output in passthrough mode. Note that this only works when
> -p is explicitly given.
Why not capture this from stdout? So this could also go.
> The -W option tells bogofilter to operate with a single
> wordlist, named wordlist.db. Each token in wordlist.db is
> stored as an ASCII string with two counts (for spam and
> ham) and (optionally) a timestamp.
>
>
> The -WW option tells bogofilter to operate with a pair of
> wordlists, named spamlist.db and goodlist.db. Spamlist.db
> stores tokens, counts, and timestamps for tokens from spam
> messages. Goodlist.db stores tokens, counts, and
> timestamps for tokens from ham messages.
I think those can go. Either we drop the two lists
completely or you can set it in the config file.
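A hypothetical sketch of what the single-wordlist layout described for -W amounts to: one table keyed by token, each entry holding a spam count, a ham count and an optional timestamp, where -WW would need a lookup in both spamlist.db and goodlist.db. The in-memory dict, the field names, the once-per-message counting and the text encoding are assumptions for illustration; the man page only says each token is stored as an ASCII string with two counts and, optionally, a timestamp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenRecord:
    spam: int = 0
    ham: int = 0
    timestamp: Optional[int] = None  # e.g. YYYYMMDD; optional, as in the man page


def register(wordlist: dict[str, TokenRecord], tokens: list[str],
             is_spam: bool, today: int) -> None:
    """Add one message's tokens to a single combined wordlist."""
    for tok in set(tokens):  # count each token once per message (assumption)
        rec = wordlist.setdefault(tok, TokenRecord())
        if is_spam:
            rec.spam += 1
        else:
            rec.ham += 1
        rec.timestamp = today


def encode(rec: TokenRecord) -> str:
    # Hypothetical ASCII layout "spam ham [timestamp]"; not bogofilter's format.
    fields = [str(rec.spam), str(rec.ham)]
    if rec.timestamp is not None:
        fields.append(str(rec.timestamp))
    return " ".join(fields)


wl: dict[str, TokenRecord] = {}
register(wl, ["cheap", "pills", "cheap"], True, 20030930)
register(wl, ["meeting", "cheap"], False, 20030930)
print(encode(wl["cheap"]))  # 1 1 20030930
```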
> The -O filename option tells bogofilter where to write its
> output in passthrough mode. Note that this only works when
> -p is explicitly given.
We had that before. Needs to be fixed in the man page.
> The -g option selects the original Graham form of the
> calculation method.
>
> The -r option selects the Robinson modifications to the
> calculation method.
>
> The -f option selects the Robinson-Fisher modifications to
> the calculation method.
Those can go, config file is enough.
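Since -g, -r and -f only select the combining method, a short sketch of the Robinson-Fisher computation may help when deciding what the config file has to express. It follows Gary Robinson's published description rather than bogofilter's source: each token's probability f(w) is smoothed with robs (strength s) and robx (assumed probability x), tokens within min_dev of 0.5 are skipped, and Fisher's inverse chi-square combines the rest into a score. Function names and the input format are illustrative assumptions.

```python
import math


def chi2q(x2: float, dof: int) -> float:
    """P(chi-square with an even number `dof` of degrees of freedom >= x2)."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def robinson_f(spam: int, ham: int, nspam: int, nham: int,
               robs: float, robx: float) -> float:
    """Smoothed spam probability of one token: (s*x + n*p) / (s + n)."""
    bad = spam / nspam if nspam else 0.0
    good = ham / nham if nham else 0.0
    if bad + good == 0.0:
        return robx  # no evidence: fall back to the assumed probability
    p = bad / (bad + good)
    n = spam + ham
    return (robs * robx + n * p) / (robs + n)


def fisher_score(fs: list[float], min_dev: float) -> float:
    """Combine token probabilities into a spamicity in [0, 1]."""
    used = [f for f in fs if abs(f - 0.5) > min_dev]  # skip near-neutral tokens
    if not used:
        return 0.5
    n = len(used)
    h = chi2q(-2.0 * sum(math.log(f) for f in used), 2 * n)
    s = chi2q(-2.0 * sum(math.log(1.0 - f) for f in used), 2 * n)
    return (1.0 + h - s) / 2.0
```

The code assumes robs > 0 and 0 < robx < 1, which keeps every f(w) strictly between 0 and 1 so the logarithms are defined.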
> Bogofilter has three special parsing options which can be
> enabled (or disabled) at the user's discretion. The
> options are of form -Px and -PX where x designates an
> option letter. For the parsing options, a lower case
> letter enables the option and an upper case letter disables
> it.
I think they can all go completely. Let's fix the defaults.
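For comparison, the lower/upper-case convention is only a few lines of code. A hypothetical sketch: the function name, accepting several letters in one argument and storing the result in a dict are assumptions, and no actual bogofilter option letters are assumed.

```python
def parse_p_option(arg: str, settings: dict[str, bool]) -> None:
    """Apply one -P argument such as "x" or "X" to `settings`.

    Lower case enables the option, upper case disables it, as the man
    page describes. Several letters in one argument are accepted here
    (an assumption).
    """
    for ch in arg:
        if not ch.isalpha():
            raise ValueError(f"bad parsing option letter: {ch!r}")
        settings[ch.lower()] = ch.islower()
```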
> The -m [value][,value][,value] option allows setting the
> min_dev value and, optionally, the robs and robx values.
> The -o [value][,value] option allows setting the
> spam_cutoff value and, optionally, the ham_cutoff value.
Useful for testing, but it could be done using the -c
switch. I'd leave them in.
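The [value][,value][,value] syntax shared by -m and -o can be sketched as below, assuming an empty field means "keep the current value". Names follow the man page (min_dev, robs, robx, spam_cutoff, ham_cutoff); the function and the starting values in the example are hypothetical and are not bogofilter's defaults.

```python
def parse_value_list(arg: str, names: list[str],
                     current: dict[str, float]) -> dict[str, float]:
    """Parse value[,value][,value] into the given names.

    Each comma-separated field sets the corresponding name; an empty or
    missing field keeps the current value (an assumption).
    """
    fields = arg.split(",")
    if len(fields) > len(names):
        raise ValueError(f"at most {len(names)} values expected: {arg!r}")
    result = dict(current)
    for name, field in zip(names, fields):
        if field.strip():
            result[name] = float(field)
    return result


current = {"min_dev": 0.0, "robs": 1.0, "robx": 0.5}  # placeholder values
print(parse_value_list("0.1,,0.52", ["min_dev", "robs", "robx"], current))
# {'min_dev': 0.1, 'robs': 1.0, 'robx': 0.52}
```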
> Option -y date specifies the date to give to tokens that
> don't have dates.
Is that relevant for bogofilter? Or should that be bogoutil?
> ENVIRONMENT
> Bogofilter will initialize its data base directory to
> $BOGOFILTER_DIR if BOGOFILTER_DIR is set. If it is not
> set, bogofilter will use $HOME/.bogofilter instead. If
> neither BOGOFILTER_DIR nor HOME is set, the -d dir option
> must be present.
With the combined wordlist, we only have one file in that
directory. So it would be good enough to name the file directly.
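The lookup order described under -d and ENVIRONMENT can be sketched as follows. This is a hypothetical Python illustration: -d wins, then $BOGOFILTER_DIR, then $HOME/.bogofilter, and an error if none applies. Where the config file fits, which pi asks about, is deliberately not modelled because the quoted text does not say.

```python
import os
from typing import Mapping, Optional


def wordlist_dir(cmdline_dir: Optional[str], env: Mapping[str, str]) -> str:
    """Pick the wordlist directory in the order the man page describes."""
    if cmdline_dir:  # -d dir given on the command line
        return cmdline_dir
    if env.get("BOGOFILTER_DIR"):
        return env["BOGOFILTER_DIR"]
    if env.get("HOME"):
        return os.path.join(env["HOME"], ".bogofilter")
    raise ValueError("neither BOGOFILTER_DIR nor HOME is set; -d dir is required")
```

In practice this would be called as wordlist_dir(None, os.environ). With a single combined wordlist, naming the file directly as suggested above would just mean ending the lookup at os.path.join(directory, "wordlist.db").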
General remark: Of course, some people use one or another of
these options and will have to make some changes, but that
should not be too complicated.
pi