some questions about bogofilter 0.13.6&0.15.7

David Relson relson at osagesoftware.com
Tue Nov 4 13:15:53 CET 2003


Greetings Mike,

Welcome to the bogofilter mailing list.  We're glad to have you.

As you know, bogofilter is constantly evolving as we learn better ways
to analyze email to detect spam.  In the past few months, many of the
changes have involved using more detail, rather than less.  For example,
bogofilter used to be case insensitive and didn't differentiate between
"TEST", "test", and "teSt".  When it became case sensitive, these three
spellings became different tokens, but only "test" was in an existing database.
An option was added to allow old databases to be used.  Changes and the
need for using old databases are why bogofilter has options like the
"three special parsing options ..." you mention.  That's also why the
"-H" option exists.
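
To make the case sensitivity point concrete, here is a small shell
sketch.  It is not bogofilter's tokenizer, and the two function names
are made up for this example.  It only counts how many distinct tokens
a line yields with and without case folding:

```shell
# Hypothetical helpers for illustration; bogofilter's real lexer is more involved.

# Count distinct space-separated tokens, treating case as significant.
count_tokens_case_sensitive() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | sed '/^$/d' \
        | LC_ALL=C sort -u | wc -l | tr -d ' '
}

# Same count, but fold everything to lower case first.
count_tokens_case_folded() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '\n' \
        | sed '/^$/d' | LC_ALL=C sort -u | wc -l | tr -d ' '
}

count_tokens_case_sensitive "TEST test teSt"
count_tokens_case_folded "TEST test teSt"
```

The first call sees three different tokens and the second sees one,
which is why a database built while case was folded holds only "test".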

If you have a bunch of saved messages (both ham and spam), the best
thing to do would be to create a totally new word database.  The
commands would be (roughly):

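	# Directory that holds the wordlists (placeholder path)
	export BOGOFILTER_DIR="your_directory"
	# Remove the old wordlist files (quoted in case the path has spaces)
	rm -f "$BOGOFILTER_DIR"/????list.db
	# Register the saved spam, then the saved ham (non-spam)
	bogofilter -s < spam_mbox
	bogofilter -n < ham_mbox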

To answer other questions:

If you continue to see base64 text going into your wordlist, there's a
problem.  If the email is correctly formatted, bogofilter should _not_
add base64 text to the wordlist.  If it does, there's a bug and I'd like
to see the original message so the bug can be fixed.  Bogofilter relies
on correctly formatted email with proper identification of base64 text. 
If the identification is missing, bogofilter will parse the message
incorrectly.
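
If you want a quick look at whether a message carries that
identification, a rough check is sketched below.  The function name is
hypothetical, and it assumes the header sits on a single line (a folded
header would be missed), so treat it as a first look rather than a real
MIME parser:

```shell
# Hypothetical helper: succeeds if the message on stdin contains a
# "Content-Transfer-Encoding: base64" header line (case-insensitive).
has_base64_label() {
    grep -qi '^Content-Transfer-Encoding:[[:space:]]*base64'
}

# Example use with a placeholder file name:
#   has_base64_label < message.txt && echo "base64 is labeled"
```

A message whose body is base64 but which fails this check is probably
one of the incorrectly formatted ones described above.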

Boundary tokens should not go into the wordlist.  Again, if a properly
formatted email causes boundary tokens to go into the wordlist, I want
to see the email so bogofilter can be fixed.

Regarding applications and images, they tend to have binary content
which does not provide useful information for bogofilter.  When it
encounters a MIME part that's labeled as a program or a picture, it
ignores that MIME part.  Bogofilter _never_ changes the email (except to
add its X-Bogosity line).  Describing the action as "discarding" the
information was a bad choice of wording by me.  I apologize for that.
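
To see how the parts of a message are labeled, something like the
following sketch lists the declared content types.  The function name
is made up, and it assumes each Content-Type header starts at the
beginning of a line.  Which types bogofilter actually skips is decided
by its own parser; my guess that "program or picture" corresponds to
application/... and image/... is only an assumption here:

```shell
# Hypothetical helper: print the type/subtype from every Content-Type
# header found on stdin, lower-cased, one per line.
list_content_types() {
    grep -i '^Content-Type:' \
        | sed -e 's/^[^:]*:[[:space:]]*//' -e 's/[;[:space:]].*$//' \
        | tr '[:upper:]' '[:lower:]'
}

# Example use with a placeholder file name:
#   list_content_types < message.txt
```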

Hope this helps!

David



