A Tale of Two Sisters

David Relson relson at osagesoftware.com
Wed Jul 27 12:53:29 CEST 2005


On Wed, 27 Jul 2005 09:10:47 +0200 (CEST)
Pavel Kankovsky wrote:

> On Sun, 24 Jul 2005, JoeHill wrote:
> 
> > It was kinda hit and miss for both, so I sat down today and did training
> > on a whole *whack* of mail from each sister (about 20MB each)
> 
> Those 20 megabytes were not a single message but a mailbox containing 
> multiple messages, were they?
> 
> > joehill@node3:~/mail$ bogofilter -vv < sister1
> > joehill@node3:~/mail$ bogofilter -n < sister1
> 
> This is not the right way to deal with multiple messages in a mailbox.
> It may appear to work at first glance but many messages can be
> interpreted incorrectly (e.g. BASE64 and QP parts not decoded, binary
> attachments not ignored, header tokens not tagged).
> 
> You should have done bogofilter -n -M < sisterN to train mailboxes.
> AFAIK, there is no simple command to get some kind of aggregate scoring 
> results for multiple messages in a mailbox but you can do that with
> a script.

Pavel is right about -M (for mailboxes).  

I'd suggest using

  bogofilter -n -v -M < sister1

for training.  The switches mean:

   -n -- for training ham
   -v -- print summary counts of messages and tokens
   -M -- input is mailbox format

To check scoring after registration, you can use:

  bogofilter -v -M < sister1
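
As for the aggregate results Pavel mentions, here is a minimal sketch of
such a script in Python.  It rests on assumptions: that the command above
prints one "X-Bogosity:" line per message, shaped like the example in the
comment, and that a classification count plus a mean spamicity is a useful
aggregate.  The names summarize and score_mailbox are made up for
illustration and are not part of bogofilter.

```python
import re
import subprocess
from collections import Counter

# Assumed shape of a verbose result line (check it against your version):
#   X-Bogosity: Ham, tests=bogofilter, spamicity=0.000123, version=...
RESULT = re.compile(r"^X-Bogosity:\s*(\w+),.*?spamicity=([0-9.]+)")


def summarize(lines):
    """Tally classifications and the mean spamicity from result lines."""
    counts = Counter()
    total = 0.0
    for line in lines:
        m = RESULT.match(line)
        if m is None:
            continue  # skip anything that is not a result line
        counts[m.group(1)] += 1
        total += float(m.group(2))
    n = sum(counts.values())
    mean = total / n if n else 0.0
    return dict(counts), mean


def score_mailbox(path):
    """Run 'bogofilter -v -M' on a mailbox file and summarize its output."""
    with open(path, "rb") as mbox:
        # No check=True: the exit status is assumed to reflect the
        # classification, so a nonzero status need not mean failure.
        out = subprocess.run(["bogofilter", "-v", "-M"], stdin=mbox,
                             capture_output=True, text=True).stdout
    return summarize(out.splitlines())
```

Calling score_mailbox("sister1") would then return something like a
dictionary of counts per classification together with the mean spamicity,
provided the output format matches the assumption above.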

HTH,

David



