Nigerian spam [was: multiple types of spam]

Joerg Over Dexia over at dexia.de
Thu Jul 3 15:39:37 CEST 2003


Hi there,

At 09:19 03.07.2003 -0400, Andrew Pimlott wrote:
->On Wed, Jul 02, 2003 at 09:22:40PM -0700, Max Rible wrote:
->> Most of the 419 mail I get doesn't get recognized as such by
->> bogofilter.
->
->I'd been meaning to do some more research and experimentation before
->writing, but since this came up:  Is this the common experience?
->I've been disappointed at how easily these things slip by
->bogofilter.  When I look at the -vvv diagnostics, it seems clear
->that the reason is that the large number of harmless words (since
->these are long and varied narratives) swamps the spam words.

My experience is different. Our installation of bogofilter
catches 419 spam (named after the section of the Nigerian criminal
code covering advance-fee fraud, AFAIK) more reliably than any other
spam. I'd wager that this has a lot to do with the ham corpus, i.e.
how often you get "urgent business proposals strictly confidential"
in your mail, how often large dollar sums appear, how often you
correspond with partners in African countries with names like
Mobutu Sese Seko, or how often you discuss political affairs in
those countries.

Another factor might be that our spam corpus is still rather small
(~250 messages), and around 40% of it consists of 419 messages.
(I'm afraid this message will trigger some filters :)
If it does, I for one won't register it as "nonspam", since doing
so would very likely water down the database.)

Greetings, jo




More information about the Bogofilter mailing list