compile time options

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Tue Sep 30 14:28:59 CEST 2003


David Relson wrote:

>> > I'm anticipating 0.15.5 being promoted from "current release" to
>> > "stable release" in a week.  That'll be an appropriate time to
>> > announce cleanup plans.
>> 
>> Good point. We could then make a cleanup for 0.16. Should I
>> start with suggestions?
> 
> Feel free!

OK, I'll start from the man page (1.15.4):

>        The -t (terse) option tells bogofilter to print an
>        abbreviated spamicity message containing 1 letter and the
>        score. Spam is indicated with "Y", ham by "N", and unsure
>        by "U". Note: the formatting can be customized using the
>        config file.

I think this can go. -T is meant for machine readability and
does what we need.

>        The -2 option tells bogofilter to binary classify the
>        message as either ham or spam, and never as unsure. When
>        this option is used with -u, a wordlist is always updated.
> 
> 
>        The -3 option tells bogofilter to use tristate
>        classification for the message, i.e. classify the message
>        as ham, spam, or unsure. This option is effective only if
>        ham_cutoff is non-zero.

Those can go; the decision can be made by choosing
appropriate cutoffs.
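
To illustrate the point, here is a minimal sketch of how the two
cutoffs alone could cover both modes. Only the names spam_cutoff and
ham_cutoff come from the man page; the function name and the exact
comparisons (>= and <=) are my assumptions, not bogofilter's code.

```python
def classify(score, spam_cutoff, ham_cutoff):
    """Hypothetical sketch, not bogofilter's actual implementation."""
    # At or above the spam cutoff: always spam (comparison is an assumption).
    if score >= spam_cutoff:
        return "spam"
    # ham_cutoff == 0 means binary mode: everything else is ham.
    # Otherwise only scores at or below ham_cutoff count as ham.
    if ham_cutoff == 0 or score <= ham_cutoff:
        return "ham"
    # Tristate mode: the band between the two cutoffs is "unsure".
    return "unsure"
```

With ham_cutoff set to 0 this never returns "unsure", which is what
-2 gives us; any non-zero ham_cutoff gives the -3 behaviour.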

>        When reading mbox format, bogofilter relies on the empty
>        line after a mail.

BTW: We should mention formail -es here, which fixes this in
mboxes.

>        The -Bfilename (bulk mode) option tells bogofilter to
>        classify multiple objects (see the previous paragraph)

Do we need both -b and -B? Isn't one enough?


>        The -F (force) ignores threshold values when printing
>        spamicity statistics.

I don't understand this one, which makes me feel it is not
needed ;-)

>        The -d dir option allows you to set the directory under
>        which the wordlists will be found to dir. If omitted, the
>        default directory will be $BOGOFILTER_DIR if
>        BOGOFILTER_DIR is set and $HOME/.bogofilter otherwise.

Is that correct? Doesn't the config file come in here?
Anyhow, this is explained later, so "If omitted ..." should
be deleted here.

>        The -k tag option sets the cache size for the BerkeleyDB
>        subsystem. Properly sizing the cache improves bogofilter's
>        performance. Run the bogotune script to determine the
>        recommended size.

It is enough to have this in the config file only.

>        The -L tag option configures a tag which can be included
>        in the information being logged by the -l option, but it
>        requires a custom format that includes the %l string for
>        now. This option implies -l.

It is enough to have this in the config file only.

>        The -I filename option tells bogofilter to read its input
>        from the specified file, rather than from stdin

I cannot see a situation where we could not read from stdin.
So this would be superfluous.

>        The -O filename option tells bogofilter where to write its
>        output in passthrough mode. Note that this only works when
>        -p is explicitly given.

Why not capture this from stdout? So this could also go.

>        The -W option tells bogofilter to operate with a single
>        wordlist, named wordlist.db. Each token in wordlist.db is
>        stored as an ASCII string with two counts (for spam and
>        ham) and (optionally) a timestamp.
> 
> 
>        The -WW option tells bogofilter to operate with a pair of
>        wordlists, named spamlist.db and goodlist.db. Spamlist.db
>        stores tokens, counts, and timestamps for tokens from spam
>        messages. Goodlist.db stores tokens, counts, and
>        timestamps for tokens from ham messages.

I think those can go. Either we drop the two-list mode
completely, or it can be set in the config file.

>        The -O filename option tells bogofilter where to write its
>        output in passthrough mode. Note that this only works when
>        -p is explicitly given.

We had that before. The duplicate needs to be fixed in the man page.

>        The -g option selects the original Graham form of the
>        calculation method.
> 
>        The -r option selects the Robinson modifications to the
>        calculation method.
> 
>        The -f option selects the Robinson-Fisher modifications to
>        the calculation method.

Those can go; the config file is enough.

>        Bogofilter has three special parsing options which can be
>        enabled (or disabled) at the user's discretion. The
>        options are of form -Px and -PX where x designates an
>        option letter. For the parsing options, a lower case
>        letter enables the option and an upper case letter
>        disables it.

I think they can all go completely. Let's fix the defaults.

>        The -m [value][,value][,value] option allows setting the
>        min_dev value and, optionally, the robs and robx values.

>        The -o [value][,value] option allows setting the
>        spam_cutoff value and, optionally, the ham_cutoff value.

Useful for testing, but it could be done using the -c
switch. I'd leave them in.
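
For reference, the [value][,value] syntax quoted above could be read
roughly like this. This is only a sketch of the idea: parse_values is
a made-up helper, and I am assuming that an empty field means "leave
that setting alone"; bogofilter's real parser may behave differently.

```python
def parse_values(arg, names):
    """Map a '[value][,value]...' argument onto the given setting names.

    Hypothetical helper, not bogofilter's parser. Empty fields are
    skipped, so ',0.1' sets only the second name (an assumption).
    """
    fields = arg.split(",")
    if len(fields) > len(names):
        raise ValueError("too many values")
    # Pair each field with its setting name; ignore fields left empty.
    return {name: float(field)
            for name, field in zip(names, fields)
            if field.strip()}
```

So something like parse_values("0.9,0.1", ["spam_cutoff", "ham_cutoff"])
would cover -o, and the same helper with min_dev, robs and robx would
cover -m.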

>        Option -y date specifies the date to give to tokens that
>        don't have dates.

Is that relevant for bogofilter? Or should that be bogoutil?

> ENVIRONMENT
>        Bogofilter will initialize its data base directory to
>        $BOGOFILTER_DIR if BOGOFILTER_DIR is set. If it is not
>        set, bogofilter will use $HOME/.bogofilter instead. If
>        neither BOGOFILTER_DIR nor HOME is set, the -d dir option
>        must be present.

With the combined wordlist, we only have one file in that
directory. So it would be good enough to name the file directly.
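
For comparison, the lookup order the man page describes today, as a
small sketch. The function name is made up, I am assuming that -d
wins over the environment, and the config file is left out on purpose
since the quoted text does not say where it fits.

```python
import os

def wordlist_dir(d_option=None, env=None):
    """Hypothetical sketch of the documented directory lookup order."""
    env = os.environ if env is None else env
    # An explicit -d dir is assumed to take precedence.
    if d_option:
        return d_option
    # Next comes $BOGOFILTER_DIR, if it is set.
    if env.get("BOGOFILTER_DIR"):
        return env["BOGOFILTER_DIR"]
    # Then $HOME/.bogofilter.
    if env.get("HOME"):
        return os.path.join(env["HOME"], ".bogofilter")
    # Neither variable is set: the man page says -d must be present.
    raise ValueError("neither BOGOFILTER_DIR nor HOME is set; -d dir is required")
```

Naming the file directly would replace all three steps with a single
path.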


General remark: Of course, some people use one or another of
these options and will have to make some changes, but that
should not be too complicated.

pi




