Bootstrapping the database in a MUA-embedded scenario

David Relson relson at osagesoftware.com
Wed Aug 10 17:48:28 CEST 2005


On Wed, 10 Aug 2005 07:08:30 -0700 (PDT)
Charles Hewson wrote:

> On Wed, 10 Aug 2005, Mikhail Zabaluev wrote:
> 
> > Hello,
> >
> > I've used Bogofilter for a time now, and I love it so much that I've
> > written an Evolution plugin for it.
> > Now, I have a problem with learning starting from clean user setup.
> > Bogofilter needs at least one ham message in the database for the
> > algorithm to work properly (at least with the default parameters). The
> > problem is, Evolution will only let you report a message as non-spam if
> > it has been classified as spam before (the problem is symmetrical for
> > spam, but untagged spam messages aren't usually in short supply :)).
> > I'd like to provide the users with a working setup out of the box,
> > without resorting to manual learning procedures involving CLI. The
> > solution I came up with is to feed bogofilter a made-up seed message
> > once in order to initialize the ham message count. The message has an
> > empty body and minimal headers to avoid upsetting the word counts too
> > much.
> > I'll appreciate any comments to my approach. Maybe I'm missing
> > something.
> >
> Hi all,
> 
> 	Would the following load do what you need:
> 
> bogoutil -l .bogofilter/wordlist.db < .MSG_COUNT 00000 00001 20050810
> 
> Charles

Charles,

Very close, but not quite.  The following will create a minimal wordlist:


   mkdir ~/.bogofilter
   echo .MSG_COUNT 0 1 20050810 | bogoutil -l ~/.bogofilter/wordlist.db

Alternatively, one could do something like:

   mkdir ~/.bogofilter
   echo ham  | bogofilter -n -H
   echo spam | bogofilter -s -H

The "-H" option tells bogofilter to skip the normal header tagging.  This
approach is actually slightly better because the resulting wordlist also
includes the .ENCODING and .WORDLIST_VERSION meta-tokens.  Of course, one
can substitute whatever words are desired for "ham" and "spam", possibly:

   echo bogofilter bogofilter.org | bogofilter -n -H
   echo p0rn pron sex | bogofilter -s -H 
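
For a plugin that has to do this automatically on first run, the two
commands above could be wrapped so that they only fire when no wordlist
exists yet.  What follows is a rough sketch, not a tested recipe: the
function name seed_wordlist and the BOGOFILTER variable are made up for
illustration, and it assumes the wordlist file is named wordlist.db
(other database backends may use a different file name).  The "-d"
option selects the bogofilter directory.

```shell
#!/bin/sh
# Hypothetical helper: seed a bogofilter wordlist once for a fresh user setup.
# BOGOFILTER and seed_wordlist are illustrative names, not part of bogofilter.

BOGOFILTER="${BOGOFILTER:-bogofilter}"

seed_wordlist() {
    dir="${1:-$HOME/.bogofilter}"

    # Assumption: the wordlist is stored as wordlist.db.  If it already
    # exists, leave the user's real training data alone.
    if [ -e "$dir/wordlist.db" ]; then
        return 0
    fi

    mkdir -p "$dir" || return 1

    # -n registers the input as ham, -s as spam; -H skips header tagging.
    echo bogofilter bogofilter.org | "$BOGOFILTER" -d "$dir" -n -H || return 1
    echo p0rn pron sex             | "$BOGOFILTER" -d "$dir" -s -H || return 1
}
```

Calling "seed_wordlist" with no argument would use ~/.bogofilter; calling
it again later is a no-op once the wordlist is in place.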


Regards,

David


