How to deal with extremely high spam levels

Chris Fortune cfortune at telus.net
Wed Jun 23 00:51:29 CEST 2004


>On Tue, Jun 22, 2004 at 01:07:26PM -0700, Chris Fortune wrote:
>> The answer is to collect good email from people's PC's, your friends
>> and family will let you do it.  Copy everything in their Sent box
>> (under 35kb in size, attachments are useless to you!) to a zip file
>> and upload it to your server.  (Make sure they aren't sending spam
>> themselves.)

> From: "Bob Vincent" <bogofilter at bobvincent.org>
> Dubious.  Most of my friends and family have VERY different interests.
> Their ham doesn't look anything like my ham.

I'm dubious about your dubiousness.  Surprisingly, the public's tastes fall neatly into a bell curve without a lot of deviation.
I have 179 users from 120 domains on a single wordlist, and their tastes are varied, yet any email written in standard
conversational/business English (or French, Dutch, Spanish) is classified by bogofilter as < 2% bogosity (Nigerian scams excepted).
Spam is remarkably different from ham, yet similar to itself, so nearly all of it is classified as > 95%.  The mail that falls into
the 'grey zone' between those two scores is newsletters, ads from ISPs, chain letters, and catalogs - similar to color glossy
magazine mail.  These go into quarantine on the server, where people can review them via the web every week or so.  Most of my
users just ignore them and let them self-destruct after 30 days.  I recommended using your friends' and family's mail to train
bogofilter because their language patterns are likely to be _very similar_ to yours.
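
As a rough illustration of the three-way split described above (not the
actual server setup, which isn't shown here), the routing could be sketched
in Python like this.  The cutoffs and the 30-day lifetime are the figures
quoted in this message, not bogofilter defaults, and the names triage() and
quarantine_expired() are made up for the example.  It assumes you already
have the spamicity score as a number between 0 and 1.

```python
from datetime import date, timedelta

# Figures taken from the message above; tune them for your own wordlist.
HAM_CUTOFF = 0.02        # below this: ordinary conversational/business mail
SPAM_CUTOFF = 0.95       # above this: almost certainly spam
QUARANTINE_DAYS = 30     # grey-zone mail self-destructs after this long


def triage(spamicity: float) -> str:
    """Return 'ham', 'spam', or 'quarantine' for a score in [0, 1]."""
    if not 0.0 <= spamicity <= 1.0:
        raise ValueError(f"spamicity out of range: {spamicity}")
    if spamicity < HAM_CUTOFF:
        return "ham"
    if spamicity > SPAM_CUTOFF:
        return "spam"
    # Newsletters, ISP ads, chain letters, catalogs tend to land here.
    return "quarantine"


def quarantine_expired(received: date, today: date) -> bool:
    """True once a quarantined message has been held QUARANTINE_DAYS days."""
    return today - received >= timedelta(days=QUARANTINE_DAYS)


if __name__ == "__main__":
    for score in (0.001, 0.40, 0.999):
        print(score, triage(score))
```

Scores that sit exactly on a cutoff fall into quarantine here, which errs on
the side of letting a person look at the message.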




