A note about robustness of the different scoring schemes

Olivier Cappe comcap at free.fr
Fri Mar 14 10:18:43 CET 2003


Dear all,

I recently switched to bogofilter, and my first impression was that the fisher
way of computing scores performed really badly (compared with the other two
options and with other programs I had used before). After some investigation, I
found a problem in my training database: some groups of consecutive emails had
been analyzed as a single email. The interesting thing is that the graham option
was definitely more robust than the fisher one with regard to this problem,
although it did somewhat worse, in particular compared with the robinson option,
once I had corrected the database problem.
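
For readers who do not have the three options in mind, here is a rough sketch
of how each one combines per-token spam probabilities into a message score. It
follows the commonly published forms of the graham, robinson (geometric mean)
and fisher (chi-square) rules; it is not taken from the bogofilter sources, and
the function names, the 15-token cutoff and the clamping bounds are assumptions
made only for illustration.

```python
import math


def clamp(p, lo=0.01, hi=0.99):
    # Assumed bounds: keep probabilities away from 0 and 1 so logs and products stay finite.
    return min(max(p, lo), hi)


def chi2q(x2, df):
    # Upper-tail probability of a chi-square variable; valid for even df only.
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def graham(probs, keep=15):
    # Naive-Bayes style product over the `keep` tokens farthest from 0.5.
    top = sorted((clamp(p) for p in probs), key=lambda p: abs(p - 0.5), reverse=True)[:keep]
    spam = math.prod(top)
    ham = math.prod(1.0 - p for p in top)
    return spam / (spam + ham)


def robinson(probs):
    # Geometric-mean combination over all tokens.
    ps = [clamp(p) for p in probs]
    n = len(ps)
    p = 1.0 - math.prod(1.0 - x for x in ps) ** (1.0 / n)
    q = 1.0 - math.prod(ps) ** (1.0 / n)
    return (1.0 + (p - q) / (p + q)) / 2.0


def fisher(probs):
    # Fisher's method: -2 * sum(log p) is compared with chi-square on 2n degrees of freedom.
    ps = [clamp(p) for p in probs]
    n = len(ps)
    p = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - x) for x in ps), 2 * n)
    q = 1.0 - chi2q(-2.0 * sum(math.log(x) for x in ps), 2 * n)
    return (1.0 + p - q) / 2.0
```

All three return a value between 0 (ham) and 1 (spam). The graham rule looks
only at a fixed number of the most extreme tokens, whereas the other two use
every token passed in, which is one plausible reason why they could react
differently when several emails are merged into one.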

I have written up the details (with numbers and histograms) on this page

  http://comcap.free.fr/bogo.html

for those who are interested.

Thanks to all the bogo-people for great programming and stimulating ideas on
statistical text processing,

--
olivier



