default settings for creating wordlists in 0.15.7

David Relson relson at osagesoftware.com
Wed Nov 12 13:12:25 CET 2003


On Wed, 12 Nov 2003 14:04:44 +0400
Mike Lykov <combr at vesna.ru>(by way of Mike Lykov <combr at vesna.ru>)
wrote:

> On Tuesday, 11 November 2003 16:34, David Relson wrote:
> > Bogofilter used to be case insensitive, so "Mike", "mike", and
> > "MIKE" would all go into the wordlist as "mike".  It was changed
> > some time ago to be case sensitive and now capitalization matters
> > and "Mike", "mike", and "MIKE" are all different wordlist entries.
> 
> I don't understand why. My argument is:
> when I have "Spam" in my wordlist and a spammer sends me the word
> "sPam", it isn't found in the wordlist and isn't classified as spam ...

Mike,

There will always be some words that bogofilter doesn't know.  Training
expands the wordlist.  When "sPam" arrives, train on it.  Then the next
time bogofilter sees it, bogofilter will say "Aha, this is a spammer's
message."

Bogofilter used to convert everything to lower case, which kept the
wordlist small.  Then it was discovered that Bayesian filters are more
effective when case is preserved.  This was verified by testing a
modified bogofilter.  The improvement was significant, so bogofilter was
changed, even though it meant that the new bogofilter would do poorly
with old (lowercased) wordlists.
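To make the trade-off concrete, here is a toy sketch in Python. It is not bogofilter's code (bogofilter is written in C and has a far more elaborate lexer); `build_wordlist` and its split-on-whitespace tokenizer are hypothetical. It shows that folding case collapses "Mike", "mike", and "MIKE" into one entry, while preserving case keeps them apart, so a variant like "sPam" stays unknown until you train on it.

```python
from collections import Counter


def build_wordlist(messages, fold_case):
    """Count tokens across messages, optionally folding them to lower case.

    Hypothetical toy tokenizer: splits on whitespace only.
    """
    counts = Counter()
    for text in messages:
        for token in text.split():
            counts[token.lower() if fold_case else token] += 1
    return counts


spam = ["Mike mike MIKE", "Spam sPam"]

folded = build_wordlist(spam, fold_case=True)      # old behaviour
preserved = build_wordlist(spam, fold_case=False)  # case-sensitive behaviour

print(len(folded))     # 2 entries: 'mike', 'spam'
print(len(preserved))  # 5 entries: each capitalization is distinct

# A case-sensitive wordlist has no entry for "sPam" until it is trained on.
wordlist = build_wordlist(["Spam"], fold_case=False)
print("sPam" in wordlist)  # False
wordlist.update(build_wordlist(["sPam"], fold_case=False))  # "training"
print("sPam" in wordlist)  # True
```

The smaller folded wordlist is the size advantage mentioned above; the case-preserving one needs more training but, per the testing described, classifies better.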

Hope this helps ...

David



