Chung-Kwei algorithm

Chris Fortune cfortune at telus.net
Fri Aug 13 08:33:29 CEST 2004


Chung-Kwei: a Pattern-discovery-based System for the
Automatic Identification of Unsolicited E-mail Messages
(SPAM)

Isidore Rigoutsos and Tien Huynh
Bioinformatics and Pattern Discovery Group
IBM Thomas J Watson Research Center
Yorktown Heights, NY 10598, USA
rigoutso at us.ibm.com and gdesktop at us.ibm.com

Abstract. 
In this paper, we present Chung-Kwei, a system for the analysis of
electronic messages and the automatic identification of unsolicited email
messages (=SPAM). The method uses pattern-discovery as its underlying tool
and is another instance of a generic approach that has been the basis of
previously successful solutions developed by our group to tackle problems in
computational biology such as gene finding and protein annotation.
Chung-Kwei can be trained very quickly; as new examples of SPAM become
available, the system can re-train itself without interrupting the classification of
incoming e-mail. We trained Chung-Kwei on a repository of 87,000 messages,
then tested it with a very large collection of 88,000 pieces of SPAM and
WHITE email: the current prototype achieved a sensitivity of 96.56% whereas
the false positive rate was 0.066%, or one-in-six-thousand. In terms of speed,
we are currently capable of classifying 214 messages/second, on a 2.2 GHz
Intel-Pentium platform. The Chung-Kwei system is part of SpamGuru, a
collaborative antispam filtering solution that is currently under development at
IBM Research.
http://www.ceas.cc/papers-2004/153.pdf
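
For anyone who wants to sanity-check the figures quoted in the abstract, here
is a minimal sketch of how sensitivity, false positive rate and throughput
relate to one another. The function names and the confusion-matrix counts are
hypothetical, made up for illustration; the abstract does not give the
SPAM/WHITE split of the test set, so nothing here reproduces the paper's
actual counts.

```python
# Illustrative arithmetic only. All names and counts below are assumptions,
# not taken from the Chung-Kwei paper.

def sensitivity(true_pos, false_neg):
    """Fraction of SPAM messages that were correctly flagged as SPAM."""
    return true_pos / (true_pos + false_neg)

def false_positive_rate(false_pos, true_neg):
    """Fraction of WHITE messages that were wrongly flagged as SPAM."""
    return false_pos / (false_pos + true_neg)

def one_in_n(rate_percent):
    """Convert a rate given in percent into the N of 'one in N'."""
    return 100.0 / rate_percent

def seconds_to_classify(n_messages, msgs_per_second):
    """Wall-clock time needed at a steady classification throughput."""
    return n_messages / msgs_per_second

if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, chosen only to show the formulas.
    print(f"sensitivity: {sensitivity(9656, 344):.4f}")
    print(f"false positive rate: {false_positive_rate(1, 5999):.6f}")

    # Figures quoted in the abstract.
    print(f"0.066% is one in {one_in_n(0.066):.0f}")
    print(f"88,000 messages at 214/s: {seconds_to_classify(88000, 214):.0f} s")
```

Taken literally as a fraction of WHITE mail, 0.066% works out to roughly one
in 1,500 rather than one in 6,000, so the two figures in the abstract may be
computed over different denominators; the abstract alone does not say which,
and the full paper at the URL above is the place to check.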






More information about the Bogofilter mailing list