Alternative use for bogofilter

David Carmean dlc at halibut.com
Mon Jun 6 23:54:46 CEST 2005


You might want to look for "ifile", which was designed to sort incoming 
email into multiple categories. 

A quick Google search turned up this page, which might be of use to you:

    http://www.cs.ualberta.ca/~dunwei/TM%20papers/Text%20Mining.htm



On Mon, Jun 06, 2005 at 04:44:22PM +0200, Helge Preuss wrote:
> Hi,
> 
> I need to automatically categorize HTML pages based on their content. I 
> had the idea to use bogofilter for this.
> 
> This is how I go about it:
> - download examples of web pages of a category, and counterexamples
> - train bogofilter to use the pages belonging to the desired category as 
> ham, and the counterexamples as spam
> - move the generated database to a separate directory
> - repeat for every category I want to autodetect
> When I want to detect if an HTML page belongs to a specific category, I 
> give the path to the corresponding database with the -d switch.
> 
> My first tests showed encouraging results, but before I go further I'd 
> like to ask whether anyone has done this before, whether I am overlooking 
> principal limitations of bogofilter or Bayes filtering in general, or 
> whether you have any other thoughts or comments.
> 
> Thanks,
> 
> Helge
> _______________________________________________
> Bogofilter mailing list
> Bogofilter at bogofilter.org
> http://www.bogofilter.org/mailman/listinfo/bogofilter
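
For reference, the per-category workflow Helge describes could be scripted
roughly as follows. This is only a sketch: the directory layout (one
database directory per category under a common root) and all function names
are made up for illustration. The flags used (-d for the database directory,
-n to register ham, -s to register spam, -I to read from a file) and the
exit codes (0 = spam, 1 = ham, 2 = unsure) are as I understand them from the
bogofilter man page, so check them against your installed version.

```python
import subprocess
from pathlib import Path

# bogofilter exit codes as documented in its man page (assumed here):
# 0 = spam, 1 = ham, 2 = unsure, anything else = error.
# In this scheme the category's pages are trained as ham, so "ham" means a match.
EXIT_MEANING = {0: "counterexample", 1: "match", 2: "unsure"}


def train_cmd(db_dir, page, is_example):
    """Build the command that registers one page in a category database.

    Pages belonging to the category are registered as ham (-n),
    counterexamples as spam (-s).
    """
    flag = "-n" if is_example else "-s"
    return ["bogofilter", "-d", str(db_dir), flag, "-I", str(page)]


def classify_cmd(db_dir, page):
    """Build the command that scores one page against a category database."""
    return ["bogofilter", "-d", str(db_dir), "-I", str(page)]


def interpret(returncode):
    """Translate a bogofilter exit code into a verdict for this scheme."""
    return EXIT_MEANING.get(returncode, "error")


def categorize(page, db_root, categories):
    """Run bogofilter once per category database and collect the verdicts.

    db_root/<category> is a hypothetical layout; each directory must
    already contain a trained database.
    """
    verdicts = {}
    for name in categories:
        proc = subprocess.run(classify_cmd(Path(db_root) / name, page))
        verdicts[name] = interpret(proc.returncode)
    return verdicts
```

Something like categorize("page.html", "dbs", ["sports", "news"]) would then
return one verdict per category. Note that each database is trained and
queried independently, so a page can match several categories or none.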


