Bogofilter for general filesystem classification

Sat Sep 13 17:57:36 CEST 2003

Hello Ben,

You raise some interesting questions.  It seems probable that a Bayesian
filter can assign emblems, given the needed data, i.e. wordlists.
That's a matter of training, which is simple enough.  Returning values
in the range -100..+100 is easy too.  Determining optimal values for
ham_cutoff and spam_cutoff is harder.  There are two main methods: the
empirical method (a.k.a. trial and error) and bogotune.  I would
recommend creating your test corpora and running bogotune (which is in
the bogofilter/tuning subdirectory) to determine parameters that fit
_your_ mix of messages (files).
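
For illustration only, the trial-and-error method might look something
like the sketch below.  It is not how bogotune works.  It assumes you
have already scored two test corpora and collected the spamicity values
(0.0 to 1.0) into lists, and that a score >= spam_cutoff counts as spam,
a score <= ham_cutoff counts as ham, and anything in between is unsure.
The names error_counts and sweep, and the error_weight cost, are made up
for this example.

```python
def error_counts(ham_scores, spam_scores, ham_cutoff, spam_cutoff):
    """Return (false positives, false negatives, unsures) for one cutoff pair."""
    # Assumption: >= spam_cutoff means spam, <= ham_cutoff means ham.
    false_pos = sum(1 for s in ham_scores if s >= spam_cutoff)
    false_neg = sum(1 for s in spam_scores if s <= ham_cutoff)
    unsure = sum(1 for s in list(ham_scores) + list(spam_scores)
                 if ham_cutoff < s < spam_cutoff)
    return false_pos, false_neg, unsure


def sweep(ham_scores, spam_scores, candidates, error_weight=10):
    """Try every ham_cutoff < spam_cutoff pair and keep the cheapest one.

    error_weight is a hypothetical knob: how many unsures one outright
    misclassification is worth.  Ties keep the first pair found.
    """
    best_pair, best_cost = None, None
    for ham_cutoff in candidates:
        for spam_cutoff in candidates:
            if ham_cutoff >= spam_cutoff:
                continue
            fp, fn, unsure = error_counts(ham_scores, spam_scores,
                                          ham_cutoff, spam_cutoff)
            cost = error_weight * (fp + fn) + unsure
            if best_cost is None or cost < best_cost:
                best_pair, best_cost = (ham_cutoff, spam_cutoff), cost
    return best_pair, best_cost
```

With real corpora the candidate grid would be finer, and the weighting
depends on how costly a wrongly assigned emblem is compared with an
unsure result.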

FWIW, as a minor shortcut, "-PH -Pi -Pt" can be shortened to "-PHit".  

Keep us posted on your project.  It sounds interesting.

Also, it'd be quite easy to add a "scoring_range={min},{max}" option to
the config file.  Let me know if you need it.
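
Until such an option exists, the rescaling can be done on the caller's
side.  A minimal sketch, assuming the terse (-T) output is a single
line holding a one-letter class and the spamicity, e.g. "S 0.987654"
(check your version's man page for the exact format).  parse_terse and
rescale are hypothetical helper names, not part of bogofilter.

```python
def parse_terse(line):
    """Split an assumed 'X d.dddddd' terse line into (letter, spamicity)."""
    letter, score = line.split()[:2]
    return letter, float(score)


def rescale(spamicity, lo=-100.0, hi=100.0):
    """Map a spamicity in [0, 1] linearly onto [lo, hi]."""
    if not 0.0 <= spamicity <= 1.0:
        raise ValueError("spamicity must be between 0 and 1")
    return lo + (hi - lo) * spamicity


if __name__ == "__main__":
    letter, score = parse_terse("S 0.75")
    print(letter, rescale(score))
```

Note that a linear map puts 0 at a spamicity of 0.5, so "0 means
unsure" only holds if ham_cutoff and spam_cutoff sit symmetrically
around 0.5.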

David

On 14 Sep 2003 01:26:24 +1000
Ben Martin <monkeyiq at users.sourceforge.net> wrote:

> Hi,
>   Although it seems that most Bayesian classifiers are being used for
> SPAM detection I'm looking to use them to help assign emblems to files
> in a filesystem [1] (a classification by any other name).
> 
> The top screenshot on [2] shows some of the new agent stuff I've added
> to libferris to go about allowing SVM and Bayesian stuff to interact
> with general filesystems.
> 
> Which brings me to my main question from RTFMing on bogofilter.
> You'll notice that the command I am using to get bogofilter to give
> its classification includes -T -3:
> "bogofilter  -d /tmp/my-new-agent -W  -PH -Pi  -PT  -T -3"
> 
> From the man page for the -3 option:
>   "This option is effective only if ham_cutoff is non-zero"
> 
> What would folks recommend for the spam/ham cutoffs here? From the
> point of view of libferris I want to turn the result value into a
> double from -100 to 100, with 0 meaning unsure, 100 being SPAM and
> -100 being HAM. Any values in between are captured as a fuzzy
> assertion toward that classification (this assumes that the training
> cases are treating an emblem being assigned as SPAM).
> 
> Thoughts on this would be great, and we should add some more comment
> to the man page for -o [v][v] from this thread.
> 
> Thanks.
> 
> [1] http://witme.sourceforge.net/libferris.web/
> [2] http://witme.sourceforge.net/libferris.web/research/shots.html

-- 
David Relson                   Osage Software Systems, Inc.
relson at osagesoftware.com       Ann Arbor, MI 48103
www.osagesoftware.com          tel:  734.821.8800
