can't train bogofilter

David Relson relson at osagesoftware.com
Sun Jun 10 14:53:00 CEST 2012


Oops, I mixed up upper case and lower case flags.  Here's the straight
story (from the man page):

"-s" register as spam
"-n" register as ham
"-S" undo prior spam registration
"-N" undo prior ham registration

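Put together for corrections, the flags above combine as in this small sketch. The helper name correction_flags and its "none" state (a message that was never registered, such as an Unsure one) are made up for illustration and are not part of bogofilter:

```shell
# Hypothetical helper (not part of bogofilter): print the flags for a correction.
#   $1 = how the message is currently registered: spam, ham, or none
#   $2 = what it should be: spam or ham
correction_flags() {
    case "$1:$2" in
        ham:spam)  echo "-Ns" ;;  # undo ham registration, register as spam
        spam:ham)  echo "-Sn" ;;  # undo spam registration, register as ham
        none:spam) echo "-s"  ;;  # never registered: just register as spam
        none:ham)  echo "-n"  ;;  # never registered: just register as ham
        *) echo "unsupported correction: $1 -> $2" >&2; return 1 ;;
    esac
}

# Example: a message registered as ham that is really spam.
#   bogofilter $(correction_flags ham spam) < message
```

The undo flags only make sense for a message that was registered earlier. For a message that was never entered in the wordlist, plain "-s" or "-n" is enough.
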
My apologies for any confusion I caused.
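
Applied to the loop from the original question, a rough sketch might look like the following. It assumes one message per file, that the messages are spam which was never registered (so plain "-s" applies), and that rewriting the files in place is really wanted. The function names and the BOGOFILTER variable are made up for illustration. "-p" (passthrough) and "-e" (exit code 0 for any successful classification) are described in the man page.

```shell
#!/bin/sh
# BOGOFILTER is a hypothetical override so the command can be swapped out.
BOGOFILTER=${BOGOFILTER:-bogofilter}

# Register every regular file in directory $1 as spam.
# The files themselves are not modified by this step.
train_dir_as_spam() {
    for msg in "$1"/*; do
        [ -f "$msg" ] || continue
        "$BOGOFILTER" -s < "$msg"
    done
}

# Classify every file in directory $1 again and rewrite it with the
# X-Bogosity header from the new run.
reclassify_dir() {
    for msg in "$1"/*; do
        [ -f "$msg" ] || continue
        tmp=$(mktemp) || return 1
        # Write to a temporary file first so a failed run cannot
        # truncate the original message.
        if "$BOGOFILTER" -p -e < "$msg" > "$tmp"; then
            cat "$tmp" > "$msg"   # keeps the original file's permissions
        else
            echo "bogofilter failed on $msg; file left unchanged" >&2
        fi
        rm -f "$tmp"
    done
}
```

Globbing with "$1"/* instead of $(ls) keeps file names with spaces intact. Try it on a copy of the folder first.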

David


On Sun, 10 Jun 2012 07:57:41 -0400
David Relson wrote:

> Hello Renato,
> 
> There are several problems with what you're doing.
> 
> 
> The "-N" option increases ham counts in the wordlist and "-s"
> decreases spam counts in the word list. Using "-Ns" is the proper flag
> combination for reclassifying a spam message as ham, i.e. for
> correcting a false positive.
> 
> Use "-nS" when the message has been classified as ham, but should
> have been classified as spam.
> 
> Bogofilter's unsure classification indicates that bogofilter can't
> determine if the message is ham or spam.  So, when bogofilter
> classifies a message as Unsure, the message's words are not entered in
> the wordlist. To train an unsure message as spam, use "-S".
> 
> The flag combinations "-Ns" and "-Sn" will not change the wordlist
> size.  As they are classification corrections, the words in the
> messages being used are already in the wordlist.  These flag
> combinations change counts for the already existing wordlist
> entries.
> 
> Training bogofilter does not change the input file(s) used.  You need
> to run those messages through bogofilter again to see how the
> classification will change.
> 
> You can use bogoutil to see a token's counts.  The command
> 
> 	bogoutil -p <path to wordlist> word1 word2 etc
> 
> will give you the spam and ham counts for word1, word2, etc.
> 
> Hope this helps!
> 
> David
> 
> On Sun, 10 Jun 2012 12:40:55 +0200 renato wrote:
> 
> > Hello, I have a mailbox folder with spam mails that bogofilter has
> > incorrectly filed as "Unsure" or "Ham", and I'd like to train it. So
> > in the directory I gave this command:
> > 
> > for i in $(ls); do bogofilter -Ns < $i; done
> > 
> > which takes a while, but after that all messages still have an
> > unmodified X-Bogosity header (i.e. it still reads "Unsure" or
> > "Ham") - is this the supposed behaviour? If I also give the -p
> > option, the message is written to stdout with a correct X-Bogosity
> > header (i.e. spam), but my bash-fu is weak and I can't figure out
> > how to correctly redirect this output to overwrite the original
> > file - and most importantly I don't know if that's really what I
> > want to/should do.
> > 
> > Also, after that command the file ~/.bogofilter/wordlist.db hasn't
> > increased in size.
> > 
> > What am I missing?
> > 
> > cheers,
> > renato


