I use bogofilter to additionally calculate an incoming email's 
"archivicity" (the likelihood that I will want to save it in my 
permanent email archive, excluding messages from machines, "uplifting" 
email forwards from one of my sisters, etc).

I just use -c to select a different config file, which specifies a 
different wordlist file, a different email header, different labels 
instead of "Ham" and "Spam", etc.
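
As a rough illustration, the invocation could be wrapped like this in 
Python.  The config path is a made-up placeholder (nothing above says 
where the file lives), and the function names are invented for the 
example; only the -c and -p flags come from bogofilter itself.

```python
import os
import subprocess

# Hypothetical location: an assumption for this sketch, not the real path.
ARCHIVE_CONFIG = os.path.expanduser("~/.bogofilter/config.archive")


def archive_score_command(config_path=ARCHIVE_CONFIG):
    """Build the argv that scores one message with the alternate config.

    -c selects the alternate config file; -p (passthrough) writes the
    message back out with the configured header added.
    """
    return ["bogofilter", "-c", config_path, "-p"]


def score_message(raw_message, config_path=ARCHIVE_CONFIG):
    """Pipe one raw message (bytes) through bogofilter on stdin."""
    result = subprocess.run(
        archive_score_command(config_path),
        input=raw_message,
        stdout=subprocess.PIPE,
        check=False,  # a nonzero exit status is not necessarily an error
    )
    return result.stdout
```

Keeping the argv in its own function makes it easy to check the flags 
without having bogofilter installed.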

One word of caution: bogofilter's classification command line arguments 
(-n and -s) are very clearly for nonspam versus spam classification 
(versus another Bayesian tool, dbacl, where you have to specify the 
category explicitly), and that has profound usage implications.  Let me 
explain.

When I set this up initially, I wanted the messages I intended to keep 
in my archive to have a high numerical "archivicity", so a message 
I was certain to want in the archive would have a reported bogosity of 
1.00.  So I set up my tools to classify those messages with -s (OK, 
that's weird), and changed the "Spam" label to "Yes".  Vice versa for 
messages I was certain to not want: classify with -n, and change "Ham" 
to "No".

As long as the tools were handling it, everything would be fine, but a 
couple of times a year I'd try to classify something by hand, and I 
would *always* think "I want to retain this message, so classify with 
-n", which would change the statistics 100% in the wrong direction.

So finally I threw away my existing database, changed all my tools to 
switch -n and -s, and relabeled things so a message with a high numeric 
bogosity means that I probably *don't* want to keep it, and it's labeled 
"X-Archive: No".  The numbers don't make sense, but it keeps me from 
messing up my database.
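
One way to take the flag choice out of hand classification entirely is 
a small wrapper that is called with the intent ("keep" or "discard") 
and looks up the flag itself.  This is only a sketch of that idea in 
Python, using the final arrangement described above; the names 
FLAG_FOR_DECISION, train_command and train are invented for the 
example, and the config path is whatever the caller passes in.

```python
import subprocess

# Intent-named decisions mapped to bogofilter's registration flags,
# following the final arrangement: low bogosity means "keep".
FLAG_FOR_DECISION = {
    "keep": "-n",     # register as non-spam, reported as X-Archive: Yes
    "discard": "-s",  # register as spam, reported as X-Archive: No
}


def train_command(decision, config_path):
    """Build the argv that registers one message for the given decision."""
    if decision not in FLAG_FOR_DECISION:
        raise ValueError("decision must be 'keep' or 'discard', got %r" % decision)
    return ["bogofilter", "-c", config_path, FLAG_FOR_DECISION[decision]]


def train(decision, raw_message, config_path):
    """Feed one raw message (bytes) to bogofilter on stdin.

    Returns bogofilter's exit status unchanged.
    """
    result = subprocess.run(
        train_command(decision, config_path),
        input=raw_message,
        check=False,
    )
    return result.returncode
```

With this in place, classifying by hand means typing "keep" rather than 
remembering that keeping corresponds to -n.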

I've experimented with dbacl, which supports comparing a message to 
multiple categories (eg, email that might be work, personal, or 
hobby-related), in addition to not being tuned for spam vs ham.  It 
looks good, but my existing setup satisfies my current desires, so I 
just haven't taken the time to play with it very much.

I've attached my config.archive.  I hope this helps.

spam_cutoff=0.501122    # for 0.20% fp (1); expect 1.00% fn (5).

#### WORDLIST: define additional word lists
#	char type: 'r' (regular) or 'i' (ignore)
#	char *name: name of list, e.g. "system", "user", "ignore"
#	char *path: absolute path to file or
#	            file name (relative to bogofilter_dir)
#	int  order - once found, skip higher numbered lists
wordlist r,archivewords,archivewords.db,1

#	used in reporting spamicity and
#	in removing already existing headers

#### Format of spamicity output
# for two-state output the third entry is not needed and not used
spamicity_tags = No, Yes, Unsure
spamicity_formats = %0.6f, %0.6f, %0.6f

#### Format of SPAM_HEADER
#	formatting characters:
#	    h - spam_header_name, e.g. "X-Bogosity"
#	    c - classification, e.g. Yes/No, Spam/Ham/Unsure, +/-/?
#	    D - date, fixed ISO-8601 format for Universal Time ("GMT")
#	    e - spamicity as 'e' format
#	    f - spamicity as 'f' format
#	    g - spamicity as 'g' format
#	    A - IP address (from first Received: statement having one)
#		Not guaranteed to be the originating address of the message.
#	    I - Message ID
#	    Q - Queue ID (from first id tag found in Received: headers)
#	    l - logging tag (from '-l' option)
#	    o - spam_cutoff, ex. cutoff=%o
#	    p - spamicity value
#	    d - if ham or unsure, the spamicity
#		if spam, difference of spamicity from 1.0
#	    r - runtype
#	        w - word count
#	        m - message count
#	    u - username - this will either be the login from getlogin(),
#			   if that is empty, the pw_name obtained from
#			   the password database, or the user id
#			   prefixed by #, for instance, #1003
#	    v - version
#    customizable messages:
#	header_format - the "X-Bogosity" line that '-p' adds to
#		the message header and '-v' outputs.
#	terse_format - an abbreviated form of header_format;
#		selected by command line option '-t'
#	log_header_format - written to syslog by '-u' option
#		when classifying messages.
#	log_update_format - written to syslog by '-u' option
#		when registering messages.
header_format = %h: %c, tests=bogofilter, archivicity=%p, version=%v
#terse_format = %1.1c %f
log_header_format = %h: %c, archivicity=%p, version=%v
#log_update_format = register-%r, %w words, %m messages
##log_header_format = %h: %c, spamicity=%f, ipaddr=%A, queueID=%Q, msgID=%I, version=%v
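
For anyone consuming the resulting header downstream, here is a minimal 
Python sketch that parses a line produced by the header_format above.  
It assumes the header name is X-Archive (as mentioned earlier in this 
message) and that %p is rendered as a plain decimal; the sample line 
and its version string are made up for illustration.

```python
import re

# Mirrors: header_format = %h: %c, tests=bogofilter, archivicity=%p, version=%v
HEADER_RE = re.compile(
    r"^(?P<name>[A-Za-z0-9-]+): (?P<label>\w+), tests=bogofilter, "
    r"archivicity=(?P<score>\d+\.\d+), version=(?P<version>\S+)$"
)


def parse_archive_header(line):
    """Return the header's fields as a dict, or None if the line does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "label": match.group("label"),
        "score": float(match.group("score")),
        "version": match.group("version"),
    }


# Made-up sample line; a high score pairs with the "No" label in this setup.
sample = "X-Archive: No, tests=bogofilter, archivicity=0.998877, version=1.2.4"
parsed = parse_archive_header(sample)
```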
