Bogofilter for general filesystem classification
David Relson
relson at osagesoftware.com
Sat Sep 13 17:57:36 CEST 2003
Hello Ben,
You raise some interesting questions. It seems probable that a Bayesian
filter can assign emblems, given the needed data, i.e. wordlists.
That's a matter of training, which is simple enough. Returning values
in the range -100 to +100 is easy too. Determining optimal values for
ham_cutoff and spam_cutoff is harder. There are two main methods: the
empirical method (a.k.a. trial and error) and bogotune. I would
recommend creating your test corpora and running bogotune (which is in
the bogofilter/tuning subdirectory) to determine parameters that fit
_your_ mix of messages (files).
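
As a rough illustration of the rescaling, here is a minimal Python sketch.
It assumes the terse output is a single line holding a one-letter class
followed by a spamicity between 0 and 1 (for example "U 0.500000"); that
format, and the function names parse_terse and spamicity_to_score, are
assumptions made for this example, not part of bogofilter. The mapping is
a plain linear stretch and does not take ham_cutoff or spam_cutoff into
account.

```python
from typing import Tuple


def parse_terse(line: str) -> Tuple[str, float]:
    """Split an assumed terse line such as 'U 0.500000' into (class, spamicity)."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected '<class> <spamicity>', got %r" % line)
    return fields[0], float(fields[1])


def spamicity_to_score(spamicity: float) -> float:
    """Linearly map a spamicity in [0, 1] onto [-100, +100].

    0.0 -> -100 (ham), 0.5 -> 0 (unsure), 1.0 -> +100 (spam).
    """
    if not 0.0 <= spamicity <= 1.0:
        raise ValueError("spamicity must lie in [0, 1]")
    return 200.0 * spamicity - 100.0


if __name__ == "__main__":
    # Hypothetical sample line, not real bogofilter output.
    label, value = parse_terse("U 0.500000")
    print(label, spamicity_to_score(value))
```

If the unsure band should map to exactly 0, a piecewise mapping built
around the tuned ham_cutoff and spam_cutoff values would be needed
instead of this straight line.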
FWIW, as a minor shortcut, "-PH -Pi -Pt" can be shortened to "-PHit".
Keep us posted on your project. It sounds interesting.
Also, it'd be quite easy to add a "scoring_range={min},{max}" option to
the config file. Let me know if you need it.
David
On 14 Sep 2003 01:26:24 +1000
Ben Martin <monkeyiq at users.sourceforge.net> wrote:
> Hi,
> Although it seems that most Bayesian classifiers are being used for
> SPAM detection I'm looking to use them to help assign emblems to files
> in a filesystem [1] (a classification by any other name).
>
> The top screenshot on [2] shows some of the new agent stuff I've added
> to libferris to go about allowing SVM and Bayesian stuff to interact
> with general filesystems.
>
> Which brings me to my main question from RTFMing on bogofilter.
> You'll notice that the command I use to get bogofilter's
> classification includes -T -3:
> "bogofilter -d /tmp/my-new-agent -W -PH -Pi -PT -T -3"
>
> From the man page for the -3 option:
> "This option is effective only if ham_cutoff is non-zero"
>
> What would folks recommend for the spam/ham cutoffs here? From the
> point of view of libferris I want to turn the result value into a
> double from -100 to 100, with 0 meaning unsure, 100 being SPAM, and
> -100 being HAM. Any values in between are captured as a fuzzy
> assertion toward that classification (this assumes that the training
> cases treat an emblem being assigned as SPAM).
>
> Thoughts on this would be great, and we should add some more comment
> to the man page for -o [v][v] from this thread.
>
> Thanks.
>
> [1] http://witme.sourceforge.net/libferris.web/
> [2] http://witme.sourceforge.net/libferris.web/research/shots.html
--
David Relson Osage Software Systems, Inc.
relson at osagesoftware.com Ann Arbor, MI 48103
www.osagesoftware.com tel: 734.821.8800