Alternative use for bogofilter
Helge Preuss
helge.preuss at gmx.net
Mon Jun 6 16:44:22 CEST 2005
Hi,
I need to automatically categorize HTML pages based on their content. I
had the idea to use bogofilter for this.
This is how I go about it:
- download examples of web pages of a category, and counterexamples
- train bogofilter with the pages belonging to the desired category as
ham, and the counterexamples as spam
- move the generated database to a separate directory
- repeat for every category I want to autodetect
When I want to detect whether an HTML page belongs to a specific
category, I give the path to the corresponding database with the -d
switch.
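For illustration, here is a minimal Python sketch of the procedure
described above. It is not a tested tool. It assumes bogofilter is on the
PATH and that the -d, -n, -s and -I options and the exit statuses
(0 = spam, 1 = non-spam, 2 = unsure, 3 = error) behave as described in
the bogofilter man page. The function names and the idea of training
directly into a per-category directory (instead of moving the database
afterwards) are my own assumptions for the example.

```python
import subprocess
from pathlib import Path

# Exit statuses as described in the bogofilter man page. In this scheme
# "ham" (status 1) means the page belongs to the category.
VERDICTS = {0: "not in category", 1: "in category", 2: "unsure"}


def train_command(db_dir, page, in_category):
    """Build the command that registers one page in a category database."""
    flag = "-n" if in_category else "-s"  # -n: register as ham, -s: as spam
    return ["bogofilter", "-d", str(db_dir), flag, "-I", str(page)]


def classify_command(db_dir, page):
    """Build the command that scores one page against a category database."""
    return ["bogofilter", "-d", str(db_dir), "-I", str(page)]


def verdict(exit_status):
    """Translate a bogofilter exit status into a category verdict."""
    if exit_status not in VERDICTS:
        raise RuntimeError(f"bogofilter failed with status {exit_status}")
    return VERDICTS[exit_status]


def train(db_dir, examples, counterexamples):
    """Train one category database from example and counterexample pages."""
    Path(db_dir).mkdir(parents=True, exist_ok=True)
    labelled = [(p, True) for p in examples] + [(p, False) for p in counterexamples]
    for page, in_category in labelled:
        result = subprocess.run(train_command(db_dir, page, in_category))
        if result.returncode == 3:  # assumed to signal an I/O or other error
            raise RuntimeError(f"bogofilter could not register {page}")


def classify(db_dir, page):
    """Return the verdict for one page against one category database."""
    result = subprocess.run(classify_command(db_dir, page))
    return verdict(result.returncode)
```

With one directory per category, train() would be called once per
category and classify() once per (page, category) pair. The directory
and file names passed in are up to the caller.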
My first tests showed encouraging results, but before I go further I'd
like to ask whether anyone has done this before, whether I am overlooking
principal limitations of bogofilter or of Bayesian filtering in general,
or whether you have any other thoughts or comments.
Thanks,
Helge