best practices question
Ben Rosengart
br at panix.com
Fri Sep 20 22:12:28 CEST 2002
On Fri, Sep 20, 2002 at 02:01:54PM -0400, David Relson wrote:
>
> 1 - Create good and spam word lists (using the '-h' and '-s' options). Let
> bogofilter classify messages. For incorrectly classified messages, feed
> them into the word lists (again using the '-h' and '-s' options).
>
> 2 - Create word lists (as above). When a message is classified as spam,
> automatically merge it into the word list (using '-s'). This will expand
> the spam list by including words that have "appeared in a spam
> context". For incorrectly classified messages, use the '-H' and '-S'
> options so that probabilities will shift from the wrong answer to the right
> answer.
>
> What do y'all think is the best practice for handling word list updating?
If I understand you correctly, option 2, by far.
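To pin down what I take option 2 to mean, here is a rough sketch in
Python. It is a toy model, not bogofilter's code: the two Counters stand
in for the good and spam word lists, and tokens(), register() and
reclassify() are names made up for illustration. register() plays the
role of '-s' / '-h', and reclassify() the role of '-S' / '-H' as David
describes them above.

```python
from collections import Counter

# Hypothetical stand-ins for the good and spam word lists.
good, spam = Counter(), Counter()

def tokens(message):
    # Crude whitespace tokenizer, for illustration only.
    return message.lower().split()

def register(message, is_spam):
    # Like '-s' (spam) or '-h' (good): add the words to one list.
    (spam if is_spam else good).update(tokens(message))

def reclassify(message, is_spam):
    # Like '-S' / '-H': move the words from the wrong list to the
    # right one, never letting a count drop below zero.
    wrong, right = (good, spam) if is_spam else (spam, good)
    for word in tokens(message):
        if wrong[word] > 0:
            wrong[word] -= 1
        right[word] += 1

# The filter called this spam, so it was merged automatically ...
register("cheap pills border", is_spam=True)
# ... but it was good mail after all, so the counts are shifted back.
reclassify("cheap pills border", is_spam=False)
```

After the correction, the words count toward the good list only, which
is the "shift from the wrong answer to the right answer" in option 2.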
For me, a big part of the utility of a Bayesian spam filter is that I
don't have to do the work of figuring out what makes spam identifiable
as spam. All I have to do is identify it, and let the software find
the interesting words.
Looking at the counts kept by ifile, I see that "border" has appeared
26 times in nonspam and 1874 times in spam. It would normally not
occur to me to filter on this word.
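To put a number on how lopsided those counts are, here is a small
Python sketch. It computes only the naive fraction of a word's
occurrences that were in spam. That is an assumption made for
illustration and not how ifile or bogofilter actually score a word; a
real filter would presumably also account for how much mail of each
kind it has seen. The name spam_fraction() is made up.

```python
def spam_fraction(spam_count, good_count):
    # Naive share of a word's occurrences that were in spam.
    total = spam_count + good_count
    if total == 0:
        return 0.5  # never seen: no evidence either way
    return spam_count / total

# "border": 1874 times in spam, 26 times in nonspam.
print(round(spam_fraction(1874, 26), 3))  # 0.986
```

So about 98.6% of the times "border" has turned up, it was in spam.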
I also train the filter on good mail, and I don't see why anyone
wouldn't, if they're using a private set of word lists.
--
Ben Rosengart (212) 741-4400 x215
Microsoft has argued that open source is bad for business, but you
have to ask, "Whose business? Theirs, or yours?" --Tim O'Reilly
For summary digest subscription: bogofilter-digest-subscribe at aotto.com