Alternative use for bogofilter
David Carmean
dlc at halibut.com
Mon Jun 6 23:54:46 CEST 2005
You might want to look for "ifile", which was designed to sort incoming
email into multiple categories.
Thirty seconds on Google turned up this page, which might be of use to you:
http://www.cs.ualberta.ca/~dunwei/TM%20papers/Text%20Mining.htm
On Mon, Jun 06, 2005 at 04:44:22PM +0200, Helge Preuss wrote:
> Hi,
>
> I need to automatically categorize HTML pages based on their content. I
> had the idea of using bogofilter for this.
>
> This is how I go about it:
> - download examples of web pages of a category, and counterexamples
> - train bogofilter with the pages belonging to the desired category as
> ham and the counterexamples as spam
> - move the generated database to a separate directory
> - repeat for every category I want to autodetect
> When I want to detect whether an HTML page belongs to a specific
> category, I give the path to the corresponding database with the -d switch.
>
> My first tests showed encouraging results, but before I go further I'd
> like to ask you whether anyone has done this before, whether I am
> overlooking any principal limitations of bogofilter or of Bayesian
> filtering in general, and whether you have any other thoughts or comments.
>
> Thanks,
>
> Helge
> _______________________________________________
> Bogofilter mailing list
> Bogofilter at bogofilter.org
> http://www.bogofilter.org/mailman/listinfo/bogofilter
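The per-category scheme Helge describes (one database per category, in-category pages trained as ham, counterexamples as spam, then one lookup per database) can be modelled in a few lines of Python. This is a toy sketch, not bogofilter, and it does not call bogofilter: it swaps in a simplified version of the general approach bogofilter uses (Robinson's smoothed per-token probability combined with Fisher's chi-square method), with a crude tokenizer and assumed parameter values (`s`, `x`, `min_dev`, `cutoff`). `CategoryDB` stands in for one database directory and `categorize` for running a page against each directory via `-d`; all names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(html: str) -> set[str]:
    """Crude tokenizer (assumption): strip HTML tags, keep lowercase words of 3+ letters."""
    text = re.sub(r"<[^>]+>", " ", html)
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def chi2q(x2: float, dof: int) -> float:
    """Upper-tail probability of a chi-square statistic with an even number of degrees of freedom."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


class CategoryDB:
    """Hypothetical stand-in for one per-category database directory.

    In-category pages play the role of ham, counterexamples the role of spam,
    so a LOW score means "looks like this category".
    """

    def __init__(self, s: float = 1.0, x: float = 0.5, min_dev: float = 0.1):
        self.ham, self.spam = Counter(), Counter()  # per-token page counts
        self.n_ham = self.n_spam = 0                # pages trained per side
        self.s, self.x, self.min_dev = s, x, min_dev  # assumed, not bogofilter's defaults

    def train(self, page: str, in_category: bool) -> None:
        tokens = tokenize(page)  # each token counted at most once per page
        if in_category:
            self.ham.update(tokens)
            self.n_ham += 1
        else:
            self.spam.update(tokens)
            self.n_spam += 1

    def token_prob(self, tok: str) -> float:
        """Robinson's f(w): smoothed probability that a page containing tok is a counterexample."""
        g = self.ham[tok] / max(self.n_ham, 1)
        b = self.spam[tok] / max(self.n_spam, 1)
        n = self.ham[tok] + self.spam[tok]
        p = b / (b + g) if b + g > 0 else self.x
        return (self.s * self.x + n * p) / (self.s + n)

    def score(self, page: str) -> float:
        """Fisher-combined score in [0, 1]; tokens too close to 0.5 are ignored."""
        probs = [f for f in (self.token_prob(t) for t in tokenize(page))
                 if abs(f - 0.5) >= self.min_dev]
        if not probs:
            return 0.5  # no evidence either way
        dof = 2 * len(probs)
        s_ind = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - f) for f in probs), dof)
        h_ind = 1.0 - chi2q(-2.0 * sum(math.log(f) for f in probs), dof)
        return (1.0 + s_ind - h_ind) / 2.0


def categorize(page: str, dbs: dict[str, CategoryDB], cutoff: float = 0.5) -> list[str]:
    """Score the page against every category database; cutoff is an assumed threshold."""
    return sorted(name for name, db in dbs.items() if db.score(page) < cutoff)


if __name__ == "__main__":
    cooking, sports = CategoryDB(), CategoryDB()
    recipes = ["<p>simmer the sauce with garlic and basil</p>",
               "<p>bake the bread dough in a hot oven</p>"]
    matches = ["<p>the striker scored a late goal</p>",
               "<p>the referee showed a red card</p>"]
    for page in recipes:
        cooking.train(page, in_category=True)
        sports.train(page, in_category=False)
    for page in matches:
        cooking.train(page, in_category=False)
        sports.train(page, in_category=True)
    dbs = {"cooking": cooking, "sports": sports}
    print(categorize("<p>garlic bread from the oven</p>", dbs))  # ['cooking']
```

Because each category gets its own independent yes/no model, a page can match several categories or none at all; a tool like ifile, which was designed for multiple categories, chooses among them directly instead.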