Bootstrapping the database in a MUA-embedded scenario
David Relson
relson at osagesoftware.com
Wed Aug 10 17:48:28 CEST 2005
On Wed, 10 Aug 2005 07:08:30 -0700 (PDT)
Charles Hewson wrote:
> On Wed, 10 Aug 2005, Mikhail Zabaluev wrote:
>
> > Hello,
> >
> > I've used Bogofilter for a time now, and I love it so much that I've
> > written an Evolution plugin for it.
> > Now, I have a problem with learning starting from clean user setup.
> > Bogofilter needs at least one ham message in the database for the
> > algorithm to work properly (at least with the default parameters). The
> > problem is, Evolution will only let you report a message as non-spam if
> > it has been classified as spam before (the problem is symmetrical for
> > spam, but untagged spam messages aren't usually in short supply :)).
> > I'd like to provide the users with a working setup out of the box,
> > without resorting to manual learning procedures involving CLI. The
> > solution I came up with is to feed bogofilter a made-up seed message
> > once in order to initialize the ham message count. The message has an
> > empty body and minimal headers to avoid upsetting the word counts too
> > much.
> > I'll appreciate any comments to my approach. Maybe I'm missing
> > something.
> >
> Hi all,
>
> Would the following load do what you need:
>
> bogoutil -l .bogofilter/wordlist.db < .MSG_COUNT 00000 00001 20050810
>
> Charles
Charles,
Very close, but not quite. The following will create a minimal wordlist:

    mkdir ~/.bogofilter
    echo .MSG_COUNT 0 1 20050810 | bogoutil -l ~/.bogofilter/wordlist.db
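
The last field of that line is a date stamp, hard-coded above to the day of
this post. As a rough sketch only, a plugin could build the same line with the
current date instead. The function name seed_line is made up for illustration,
and the field order (token, spam count, ham count, YYYYMMDD date) is assumed
from the example above:

```shell
# seed_line: hypothetical helper, not part of bogofilter.
# Prints a one-line wordlist dump with zero spam messages and one ham
# message, stamped with today's date (assumed format: YYYYMMDD).
seed_line() {
    printf '.MSG_COUNT 0 1 %s\n' "$(date +%Y%m%d)"
}
```

Its output could then be piped to "bogoutil -l ~/.bogofilter/wordlist.db" in
place of the echo.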

Alternatively, one could do something like:

    mkdir ~/.bogofilter
    echo ham | bogofilter -n -H
    echo spam | bogofilter -s -H

Here "-H" tells bogofilter to skip the normal header tagging. This approach
is actually slightly better because it also writes the .ENCODING and
.WORDLIST_VERSION meta-tokens. Of course, one can substitute whatever words
are desired for "ham" and "spam", possibly:

    echo bogofilter bogofilter.org | bogofilter -n -H
    echo p0rn pron sex | bogofilter -s -H
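
For a plugin that has to do this unattended, the second approach might be
wrapped so that it runs only once. This is a sketch under assumptions, not a
tested recipe: the function name seed_wordlist is hypothetical, the database
file is assumed to be called wordlist.db, and bogofilter's -d option is used
to point it at the directory:

```shell
# seed_wordlist: hypothetical first-run helper for a MUA plugin.
# Assumes bogofilter is on PATH and that the database file is wordlist.db.
seed_wordlist() {
    bf_dir="${1:-$HOME/.bogofilter}"
    # Leave an existing database alone so real training data is never touched.
    if [ -e "$bf_dir/wordlist.db" ]; then
        return 0
    fi
    mkdir -p "$bf_dir" || return 1
    # Register one tiny ham and one tiny spam message, skipping header tags.
    echo ham  | bogofilter -d "$bf_dir" -n -H
    echo spam | bogofilter -d "$bf_dir" -s -H
}
```

Calling "seed_wordlist" with no argument would seed ~/.bogofilter; calling it
again after the database exists does nothing.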
Regards,
David