Testing if it works
tallison at tacocat.net
Sat Jul 17 14:02:18 CEST 2004
> After reading the man page again, I started thinking that the following
> command would help me identify good -o values:
> cat /home/mike/spamfile | bogofilter -e -p -M -o 0.8,0.2 | grep -e
The command-line option can easily be replaced by a bogofilter
configuration file (bogofilter.cf). This is generally preferred since
it simplifies the command line.
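As a rough sketch, the -o 0.8,0.2 setting from the quoted command might look
like this in a per-user configuration file. The option names spam_cutoff and
ham_cutoff and the ~/.bogofilter.cf location are what I'd expect from the
sample config, but treat them as assumptions and check the bogofilter.cf
example shipped with your version.

```
# ~/.bogofilter.cf (assumed per-user location)
# Scores at or above spam_cutoff are classified as Spam.
spam_cutoff=0.8
# Scores at or below ham_cutoff are classified as Ham;
# anything in between comes back as Unsure.
ham_cutoff=0.2
```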
More on bogofilter.cf is in the man pages, but bogofilter.cf is self
> If an entry comes back as Unsure, then my values need to be changed.
> This assumes that all the mail in /home/mike/spamfile is in fact spam.
> I could do the reverse for ham.
You may lose your mind doing this.
Bogofilter has to learn about spam, and that is probably going to take a
minimum of 100 messages each of ham and spam before it even starts to
recognize the most basic spam with any regularity.
It's a process of continuous evolution, but once you reach 2,000 each of
ham and spam the changes get much slower. Almost to the point of zero
maintenance.
I would suggest starting with something like 0.8/0.2 and leaving it
there for a month. If your Unsure section is consistently showing only
spam or only ham, then you can review the scores and adjust the cutoffs
accordingly. But go slowly.
bogotune is supposed to automate a lot of this for you so all you have
to do is fire it off and go do something else for a bit.
> I also believe that this would not reclassify any the spam...because
> there is the missing -s, -n, or -u.
> Is this even close to right?
You need to do something in order for bogofilter to learn and store the
words it is seeing.
-u will do this for you automatically on every email read, registering
the words according to its own guess of ham/spam, but you'll have to make
corrections with the -Ns / -nS options when it guesses wrong.
-s/-n will do this for you on the assumption that you already know what
the email message is (spam/ham) and store the words accordingly.
You could replace your command line with:
bogofilter -Mv < /home/mike/spamfile
and get (approximately) the same results.
(Approximately because I don't have any mbox files to test with)
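To see at a glance how many messages land in each class, the verbose output
could be piped through a small tally like the one below. This is only a sketch:
tally_bogosity is a made-up helper, and it assumes each verdict line has the
form "X-Bogosity: Spam, tests=bogofilter, ..." with the labels Spam, Ham and
Unsure. The header format is configurable, so adjust the pattern if yours
differs.

```shell
# Count bogofilter verdict lines read from standard input.
# Assumed line format: "X-Bogosity: <Label>, tests=bogofilter, ..."
tally_bogosity() {
    awk '
        /^X-Bogosity:/ {
            label = $2
            sub(/,$/, "", label)   # drop the comma that follows the label
            count[label]++
        }
        END {
            # Print the three assumed labels in a fixed order (0 if never seen).
            printf "Spam=%d Ham=%d Unsure=%d\n", count["Spam"], count["Ham"], count["Unsure"]
        }
    '
}

# Intended use (not run here):
#   bogofilter -Mv < /home/mike/spamfile | tally_bogosity
```

If the file really contains only spam, a large Unsure or Ham count suggests
either more training or different cutoffs.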