[Bogofilter]UNSURE messages end up in inbox folder
Daniel Moyne
daniel.moyne at neuf.fr
Tue Aug 19 12:42:57 CEST 2008
Le Wednesday 13 August 2008, Daniel Moyne a écrit :
> Nigel, thanks for all these details ; I have found what was wrong about
> UNSURE messages wrongly forwarded : I had "X-Bogosity " set in filter
> rather than "X- Bogosity".
>
> I have to leave for a few days ; when I am back I will have to go back to
> your 4 filters as set in my KMail KDE-4 and compare their efficiency with
> what I had set in my KMail KDE-3 baucuse apparently training on Bogofilter
> is not as good : some spam messages keep coming back
>
> Daniel.
Nigel before going any further in discussing efficiency of your proposed
filtering system I have attached my file "bogofilter.cf" that is part of my
setting ; may be there is something wrong in it as after 4 days of absence I
collected plenty of spam mail in my inbox.
Regards.
--
Daniel Moyne (Nulix)---------------------------------------------------------
Distribution : Ubuntu 8.04 Hardy Heron \\|||// Machine : x86_64
kernel 2.6.24-19-generic / --- \ ATI Radeon X300 Express
KDE 3.5.9 + 4.1 (test) (' o-o ')
----------------------------------------oOO-(_)-OOo--------------------------
-------------- next part --------------
# Comment lines MUST have their hash mark in the leftmost column.
# Comments can be added at the end of any line (after whitespace and a '#').
# Blank lines are allowed.
########### General Settings ########################################
#### BOGOFILTER_DIR
#
# directory for wordlists
#
bogofilter_dir=~/.bogofilter
##bogofilter_dir=/var/spool/bogofilter
#### name/location of user config file
#
user_config_file=~/.bogofilter/.bogofilter.cf
##user_config_file=~/.bogofilterrc
##user_config_file=~/.bogofilter/config
#### TRANSACTIONS: enable/disable database transactions
#
# boolean indicating whether transactions
# should be enabled (yes) or disabled (no)
#
db_transaction=no # default
##db_transaction=yes # (alternate)
#### WORDLIST: define additional word lists
#
# char type: 'r' (regular) or 'i' (ignore)
# char *name: name of list, e.g. "system", "user", "ignore"
# char *path: absolute path to file or
# file name (relative to bogofilter_dir)
# int order - once found, skip higher numbered lists
#
##wordlist i,ignore,~/ignorelist.db,1
##wordlist r,wordlist,~/wordlist.db,2
#### SPAM_HEADER_NAME
#
# used in reporting spamicity and
# in removing already existing headers
#
spam_header_name=X-Bogosity
#### SPAM_SUBJECT_TAG
#
# tag added to "Subject: " line for identifying spam or unsure
# default is to add nothing.
#
spam_subject_tag=***SPAM***
unsure_subject_tag=???UNSURE???
#### STATS_IN_HEADER
#
# non-zero (default): put spamicity info in message header
# zero: put spamicity info in message body
# can use "bool" values of True, False, Yes, No, 1, or 0
#
stats_in_header=Yes # default
##stats_in_header=No # (alternate)
#### DB_CACHESIZE
#
# non-zero: set this as DB cache size (in Mbytes)
# zero: use DB default cache size (.25 Mbyte in 4.0.14)
#
# note that Berkeley DB increases any buffer size below 500 MB
# by 25%!
# This helps most when doing massive changes to the data base that
# involve a lot of overwrites, such as registering mail boxes,
# whereas it is mostly a waste of memory for read-only
# applications such as scoring.
# WARNING: If you set this too large, bogofilter will fail.
#
db_cachesize=0 # default
##db_cachesize=16 # (alternate)
#### DB_LOG_AUTOREMOVE
#
# boolean indicating whether auto-removing of
# logs should be enabled (yes) or disabled (no)
#
db_log_autoremove=yes # default
##db_log_autoremove=no # (alternate)
#### TIMESTAMP
#
# enables or disables token timestamps
#
timestamp=Yes
#### Format of spamicity output
#
# for two-state output the third entry is not needed and not used
#
spamicity_tags = Spam, Ham, Unsure
spamicity_formats = %0.6f, %0.6f, %0.6f
#
##spamicity_tags = Yes, No, Unsure
##spamicity_formats = %0.6f, %0.6f, %0.6f
#### Format of SPAM_HEADER
#
# formatting characters:
#
# h - spam_header_name, e.g. "X-Bogosity"
#
# c - classification, e.g. Yes/No, Spam/Ham/Unsure, +/-/?
#
# D - date, fixed ISO-8601 format for Universal Time ("GMT")
#
# e - spamicity as 'e' format
# f - spamicity as 'f' format
# g - spamicity as 'g' format
#
# A - IP address (from first Received: statement having one)
# Not guaranteed to be the originating address of the message.
# I - Message ID
# Q - Queue ID (from first id tag found in Received: headers)
#
# l - logging tag (from '-l' option)
#
# o - spam_cutoff, ex. cutoff=%o
#
# p - spamicity value
# d - if ham or unsure, the spamicity
# if spam, difference of spamicity from 1.0
#
# r - runtype
# w - word count
# m - message count
#
# u - username - this will either be the login from getlogin(),
# if that is empty, the pw_name obtained from
# the password database, or the user id
# prefixed by #, for instance, #1003
#
# v - version
#
# customizable messages:
#
# header_format - the "X-Bogosity" line that '-p' adds to
# the message header and '-v' outputs.
# terse_format - an abbreviated form of header_format;
# selected by command line option '-t'
# log_header_format - written to syslog by '-u' option
# when classifying messages.
# log_update_format - written to syslog by '-u' option
# when registering messages.
#
#
#header_format = %h: %c, tests=bogofilter, spamicity=%p, version=%v
#terse_format = %1.1c %f
#log_header_format = %h: %c, spamicity=%p, version=%v
#log_update_format = register-%r, %w words, %m messages
##log_header_format = %h: %c, spamicity=%f, ipaddr=%A, queueID=%Q, msgID=%I, version=%v
#### TERSE
#
# if enabled, format the X-Bogosity using the 'terse_format' specificaton.
#
terse=no # default
##terse=yes # (alternate)
########### Tokenizer Settings ######################################
#### BLOCK ON SUBNETS
#
# convert IPADDRs into a special token, url:1.2.3.4,
# and also return url:1.2.3, url:1.2, and url:1
# to allow identifying spammers by ip address / subnets.
#
#block_on_subnets=no
#### CHARSET handling
#
# specify default charset
#
charset_default=utf-8 # default
#charset_default=iso-8859-1 # default
#charset_default=us-ascii # (alternate)
##charset_default=cp866 # for Russian
#### REPLACE_NONASCII_CHARACTERS
#
# replace non-7bit chars with '?'
#
#replace_nonascii_characters=N # default
##replace_nonascii_characters=Y # (alternate)
#### UNICODE handling
#
# boolean indicating whether raw storage (no) or unicode (yes)
# is the default encoding for the wordlist
#
unicode=yes # default
##unicode=no # (alternate)
#### lexer parameters
#
# minimum and maximum lengths for single tokens
#
#min-token-len=3 # default
#max-token-len=30 # default
#
# count and length for multi-word tokens
# Note: if length not specified, defaults to
# multi-token-count * max-token-len (approx)
#
#multi-token-count=1 # default
#max-multi-token-len=0 # default
########### Classification Constants Settings #######################
#
# See man page for a more detailled description of the parameters.
#### MINIMUM DEVIATION
#
# if token spamicity closer to EVEN_ODDS (0.5)
# than MIN_DEV, don't use the word in the
# spamicity calculation
#
#min_dev=0.375 # default
#### Robinson Constants
#
# floating point values for
# Robinson S and X coefficients.
#
#robs=0.0178 # default
#robx=0.52 # default
#### CUTOFF Values
#
# both ham_cutoff and spam_cutoff are allowed.
# setting ham_cutoff to a non-zero value will
# enable tri-state results (Spam/Ham/Unsure).
#
ham_cutoff = 0.45 # default
spam_cutoff= 0.99 # default
#
# for two-state classification:
#
##ham_cutoff = 0.00 # default
##spam_cutoff = 0.99 # default
#### Effective Size Factor Values
#
#ns_esf = 1.000 # default
#sp_esf = 1.000 # default
#### Auto-update threshold
#
# Skip autoupdating if the spamicity is within this value
# of 0.000000 (surely ham) or 1.000000 (surely spam).
#
## thresh_update=0.01 # (optional)
More information about the Bogofilter
mailing list