O'Reilly article [was: training is SLOW]

David Relson relson at osagesoftware.com
Sun Aug 10 23:08:15 CEST 2003


At 03:17 PM 8/10/03, Lane P. Lester wrote:
>"Rodney D. Myers" <rdmyers at pe.net> wrote:
> > Correct. The syntax has changed, along with a few other items, and the
> > article, I think, was geared towards mbox format, and I use the MH
> > format.
>
>Maybe =I= need a different method, because of 10 messages caught by
>bogofilter, only one was truly spam. I used over 100 spam and 100
>nonspam to train. Was that not enough, or when you say "syntax has
>changed" do you mean the command "cat * | bogofilter -s -v" was the
>wrong one to use for spam?

Hi Lane,

"cat * | bogofilter -s -v" may work or it may not.  It depends on what's in 
the files being "cat"ed.  If the output is equivalent to mbox format, all 
is fine.

Assuming individual messages, you want the message count output by 
bogofilter to match the number of files.  If there isn't a match, then 
"cat" shouldn't be used.
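
One rough way to check this before training is sketched below.  It is only a heuristic, not anything bogofilter itself provides: it assumes each mbox message starts with a line beginning "From " (with the trailing space) and that no message body contains such a line unescaped.  The function names (count_files, count_from_lines, safe_for_cat, train_spam_each) are made up for illustration.

```shell
#!/bin/sh
# Heuristic sketch: does "cat DIR/*" look like one mbox message per file?
# All function names here are illustrative, not part of bogofilter.

count_files() {
    # Count the regular files directly inside directory $1.
    n=0
    for f in "$1"/*; do
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

count_from_lines() {
    # Count mbox-style separator lines ("From " at the start of a line).
    # grep -c still prints 0 when nothing matches, but exits non-zero,
    # hence the "|| true".
    cat "$1"/* 2>/dev/null | grep -c '^From ' || true
}

safe_for_cat() {
    # Succeeds only when every file contributes exactly one separator
    # line in total, i.e. the counts agree.
    [ "$(count_files "$1")" -eq "$(count_from_lines "$1")" ]
}

train_spam_each() {
    # Fallback when the counts differ: feed bogofilter one file at a
    # time, using the same -s -v options as the command quoted above.
    for f in "$1"/*; do
        if [ -f "$f" ]; then
            bogofilter -s -v < "$f"
        fi
    done
}
```

If safe_for_cat succeeds for your spam directory, the "cat *" pipeline is probably fine; if it fails, something like train_spam_each is the safer route.  Either way, the count that bogofilter reports should still match the number of files.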

As to training, the more messages you use to train, the better bogofilter 
will do its job.  Suppose I told you that any message with "sex" or 
"porn" in it is spam.  If you then received a message with "viagra", you'd 
think it was OK, because nothing you had been taught covers that word.

I'm not sure what Rodney meant by "syntax has changed", but it might refer 
to the change in meaning of "-S" and "-N" (an event that took place with 
release 0.11.0 back in March).

The O'Reilly article appeared last fall and there were changes in the two 
months or so between its being written and its being published.

David




