Bogofilter for general filesystem classification

Sat Sep 13 17:26:24 CEST 2003

Hi,
  Although it seems that most Bayesian classifiers are being used for
SPAM detection, I'm looking to use them to help assign emblems to files
in a filesystem [1] (classification by any other name).

The top screenshot on [2] shows some of the new agent stuff I've added
to libferris to go about allowing SVM and Bayesian stuff to interact
with general filesystems.

Which brings me to my main question from RTFMing on bogofilter.
You'll notice that the command I am using to get bogofilter to give its
classification includes -T -3:
"bogofilter  -d /tmp/my-new-agent -W  -PH -Pi  -PT  -T -3"

From the man page for the -3 option:
  "This option is effective only if ham_cutoff is non-zero"

What would folks recommend for the spam/ham cutoffs here? From the point
of view of libferris I want to turn the result value into a double from
-100 to 100, with 0 meaning unsure, 100 being SPAM and -100 being HAM. Any
values in between are captured as a fuzzy assertion toward that
classification (this assumes that the training cases treat an emblem
being assigned as SPAM).
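
To make the mapping I have in mind concrete, here is a rough sketch in
Python. It is only an illustration under assumptions: it assumes the
terse output is a single line holding a class letter followed by a
spamicity between 0 and 1 (something like "U 0.500000"), and it uses a
plain linear scaling that ignores the cutoffs entirely. The function
names parse_terse and to_assertion are made up for this example and are
not part of bogofilter or libferris.

```python
def parse_terse(line):
    """Split an assumed terse result line, e.g. 'S 0.987654',
    into (class letter, spamicity as float)."""
    label, score = line.split()[:2]
    return label, float(score)


def to_assertion(score):
    """Linearly map a spamicity in [0, 1] onto [-100, 100].

    0.0 -> -100 (HAM, emblem not assigned)
    0.5 ->    0 (unsure)
    1.0 ->  100 (SPAM, emblem assigned)
    """
    # Clamp first so a slightly out-of-range value cannot escape [-100, 100].
    score = min(1.0, max(0.0, score))
    return (score - 0.5) * 200.0


if __name__ == "__main__":
    # Hypothetical result line, not captured from a real bogofilter run.
    label, score = parse_terse("U 0.500000")
    print(label, to_assertion(score))
```

With a linear scaling like this the cutoffs only affect the class letter,
not the fuzzy value, which is part of why I am asking what cutoffs make
sense.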

Thoughts on this would be great, and we should add some more commentary
to the man page for -o [v][v] based on this thread.

Thanks.

[1] http://witme.sourceforge.net/libferris.web/
[2] http://witme.sourceforge.net/libferris.web/research/shots.html