questions from a newbie

David Relson relson at osagesoftware.com
Mon Mar 10 13:22:55 CET 2003


At 04:06 AM 3/10/03, Jean-Paul Le Fèvre wrote:


>The following is not clear to me:
>
>- Does the format of the input text matter? Does it have to be a Unix mbox
>   file? Are my Emacs Rmail folders understood correctly?
>
>- What if I mistakenly give the same file to bogofilter twice or more?
>   Does that lessen the quality of the processing?

Jean-Paul,

Greetings!

When training bogofilter, i.e. using the '-s' and '-n' flags to build up
the wordlists with spam and ham messages, you can use either single
messages or Unix mbox format files.  bogofilter doesn't understand folders
(directories of message files).

If you do have a folder of messages, you can do something like:

         for msg in spam-folder/* ; do bogofilter -s < "$msg" ; done

For an mbox-formatted file, the following two commands give the same results:

         bogofilter -s < spam.mbx
         formail -s bogofilter -s < spam.mbx

For the second question, registering the same file twice with '-s' or '-n'
will lessen the quality, but the effect will be very, very small.  You can
ignore the mistake, or you can correct it using the '-S' and '-N' flags.
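
If you would rather avoid the double registration in the first place, one
option is to keep your own record of what has already been fed to
bogofilter.  The following is only a minimal sketch, not a bogofilter
feature: the function name train_once, the log file ~/.bogofilter-trained
and the BOGOFILTER variable are all made up for illustration, and it
assumes a POSIX shell with cksum and grep available.

```sh
#!/bin/sh
# Sketch: register each message file at most once per flag.
# TRAIN_LOG, BOGOFILTER and train_once are hypothetical names,
# not part of bogofilter itself.
TRAIN_LOG=${TRAIN_LOG:-$HOME/.bogofilter-trained}
BOGOFILTER=${BOGOFILTER:-bogofilter}

train_once() {
    flag=$1    # -s for spam, -n for ham
    msg=$2     # path to a single message or an mbox file

    # cksum on stdin prints "<crc> <size>"; use that as a fingerprint.
    sum=$(cksum < "$msg") || return 1

    # Make sure the log exists so grep does not complain.
    touch "$TRAIN_LOG"

    # Skip the file if this flag/fingerprint pair is already logged.
    if grep -qxF -- "$flag $sum" "$TRAIN_LOG"; then
        echo "skipping $msg: already registered with $flag" >&2
        return 0
    fi

    # Register the file, and log it only if bogofilter succeeded.
    "$BOGOFILTER" "$flag" < "$msg" &&
        printf '%s %s\n' "$flag" "$sum" >> "$TRAIN_LOG"
}
```

With that function loaded, the folder loop above would call
train_once -s "$msg" in place of bogofilter -s < "$msg".  The fingerprint
is taken from the file contents, so a renamed copy of the same file is
still skipped, while the same message stored with different headers is not.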

David




