A note about robustness of the different scoring schemes
Olivier Cappe
comcap at free.fr
Fri Mar 14 10:18:43 CET 2003
Dear all,
I recently switched to bogofilter and at first had the impression that the
Fisher method of computing scores performed really badly (compared to the other
two options and to other programs I had used before). After some investigation,
I found a problem in my training database: some groups of consecutive emails
had been analyzed as a single email. The interesting thing is that the Graham
option was definitely more robust than the Fisher one with regard to this
problem (although it performed somewhat worse than the others, the Robinson
option in particular, once I had corrected the database problem).
I wrote up the details (with numbers and histograms) on this page
http://comcap.free.fr/bogo.html
for those who are interested.
Thanks to all the bogo-people for the great programming and stimulating ideas on
statistical text processing,
--
olivier