Nigerian spam [was: multiple types of spam]

Andrew Pimlott andrew at pimlott.net
Thu Jul 3 15:19:09 CEST 2003


On Wed, Jul 02, 2003 at 09:22:40PM -0700, Max Rible wrote:
> Most of the 419 mail I get doesn't get recognized as such by
> bogofilter.

I'd been meaning to do some more research and experimentation before
writing, but since this came up:  Is this the common experience?
I've been disappointed at how easily these things slip by
bogofilter.  When I look at the -vvv diagnostics, the reason seems
clear: these are long and varied narratives, so the large number of
harmless words swamps the spam words.

Paul Graham's articles suggest that he doesn't have a problem with
these spams.  The difference that immediately jumps out is that he
bases his scores on only a handful of words.  I haven't seen any
discussion of why bogofilter uses all words.  Using all of them seems
to make it trivial for spammers to disguise their spam.
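
To make the comparison concrete, here is a rough sketch (not
bogofilter's actual code).  It assumes a made-up message with 5
strongly spammy tokens and 200 mildly hammy ones; the per-token
probabilities are hypothetical.  graham_score() follows the
combination described in "A Plan for Spam" (the 15 tokens farthest
from 0.5), and all_words_score() uses a Robinson-style geometric mean
over every token, which I am assuming is roughly what an all-words
scorer does.

```python
import math

def graham_score(probs, n_interesting=15):
    """Combine only the tokens whose probability is farthest from 0.5."""
    top = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n_interesting]
    log_spam = sum(math.log(p) for p in top)
    log_ham = sum(math.log(1.0 - p) for p in top)
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)), computed in log space.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

def all_words_score(probs):
    """Robinson-style geometric-mean combination over every token."""
    n = len(probs)
    P = 1.0 - math.exp(sum(math.log(1.0 - p) for p in probs) / n)
    Q = 1.0 - math.exp(sum(math.log(p) for p in probs) / n)
    S = (P - Q) / (P + Q)
    return (1.0 + S) / 2.0

# Hypothetical token probabilities: 5 spammy words, 200 harmless ones.
tokens = [0.99] * 5 + [0.4] * 200

graham = graham_score(tokens)
all_words = all_words_score(tokens)
print("top-15 tokens: %.4f" % graham)
print("all tokens:    %.4f" % all_words)
```

With these made-up numbers the top-15 score lands well above 0.9,
while the all-words score stays a little below 0.5, which would be
consistent with both the swamping and the scores clustering near .5.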

Can I throw in another question?  Why do so many scores end up
within epsilon of .5?

I'm still using 0.12.2, with a default configuration, if it matters.

Andrew