A Tale of Two Sisters
David Relson
relson at osagesoftware.com
Wed Jul 27 12:53:29 CEST 2005
On Wed, 27 Jul 2005 09:10:47 +0200 (CEST)
Pavel Kankovsky wrote:
> On Sun, 24 Jul 2005, JoeHill wrote:
>
> > It was kinda hit and miss for both, so I sat down today and did training
> > on a whole *whack* of mail from each sister (about 20MB each)
>
> Those 20 megabytes were not a single message but a mailbox containing
> multiple messages, were they?
>
> > joehill at node3:~/mail$ bogofilter -vv < sister1
> > joehill at node3:~/mail$ bogofilter -n < sister1
>
> This is not the right way to deal with multiple messages in a mailbox.
> It may appear to work at first glance but many messages can be
> interpreted incorrectly (e.g. BASE64 and QP parts not decoded, binary
> attachments not ignored, header tokens not tagged).
>
> You should have done bogofilter -n -M < sisterN to train mailboxes.
> Afaik, there is no simple command to get some kind of aggregate scoring
> results for multiple messages in a mailbox but you can do that with
> a script.
Pavel is right about -M (for mailboxes).

For training, I'd suggest using:

    bogofilter -n -v -M < sister1

The switches mean:

    -n -- register the input as ham (non-spam)
    -v -- print summary counts of messages and tokens
    -M -- input is in mailbox (mbox) format

To check scoring after registration, you can use:

    bogofilter -v -M < sister1
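
For the aggregate results Pavel mentions, a small script can tally the per-message classifications. The following is only a sketch. It assumes that "bogofilter -v -M" prints one "X-Bogosity:" line per message and that the first field after the colon is the label (Ham, Spam or Unsure); older versions may use other labels, which end up in the "other" count. The function name tally_bogosity is made up for illustration and is not part of bogofilter.

```shell
# Hypothetical helper: count X-Bogosity labels read from standard input.
# Assumes lines of the form "X-Bogosity: <Label>, tests=..., spamicity=...".
tally_bogosity() {
    awk '
        /^X-Bogosity:/ {
            label = $2               # e.g. "Ham," with a trailing comma
            sub(/,.*/, "", label)    # strip the comma and anything after it
            count[label]++
            total++
        }
        END {
            # Labels other than Ham/Spam/Unsure are reported as "other".
            other = total - count["Ham"] - count["Spam"] - count["Unsure"]
            printf "total=%d ham=%d spam=%d unsure=%d other=%d\n",
                   total, count["Ham"], count["Spam"], count["Unsure"], other
        }
    '
}
```

With the function defined in the current shell, it could be used as:

    bogofilter -v -M < sister1 | tally_bogosity

A mailbox of trained ham should come out almost entirely in the ham column; a large spam or unsure count suggests the training did not take.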
HTH,
David