How to deal with extremely high spam levels

Tom Anderson tanderso at oac-design.com
Wed Jun 23 14:41:41 CEST 2004


On Tue, 2004-06-22 at 18:01, Bob Vincent wrote:
> This is called "training to exhaustion".  No, I haven't been doing
> that; I suspect that my "ham" corpus isn't large enough to make it
> effective as of yet.

The size of the corpus is irrelevant.  In fact, a small corpus is
exactly when this is a good idea.  If your hams are scoring around 0.5,
training to exhaustion will help, since it makes slightly hammy tokens
very hammy.  I do this, and none of my hams ever scores above 0.15.
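
To illustrate the idea of training to exhaustion, here is a minimal
sketch in Python.  It is a toy model only: ToyFilter, its add-one
smoothing, the naive log-odds combination, and the 0.15/0.85 cutoffs are
all assumptions made for illustration, and this is not bogofilter's
actual scoring algorithm.  It only shows how repeated train-on-error
passes push scores away from 0.5, even with a tiny corpus.

```python
import math
from collections import Counter


class ToyFilter:
    """Tiny token-count classifier (hypothetical, NOT bogofilter's algorithm)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, tokens, is_spam):
        # Count each distinct token once per registered message.
        (self.spam if is_spam else self.ham).update(set(tokens))

    def score(self, tokens):
        # Combine per-token spamicity (add-one smoothed) as summed log-odds.
        # An unseen token has p = 0.5 and contributes nothing.
        log_odds = 0.0
        for t in set(tokens):
            s, h = self.spam[t], self.ham[t]
            p = (s + 1) / (s + h + 2)
            log_odds += math.log(p / (1 - p))
        return 1 / (1 + math.exp(-log_odds))


def train_to_exhaustion(filt, corpus, ham_cutoff=0.15, spam_cutoff=0.85,
                        max_passes=50):
    """Repeat train-on-error passes until no message scores on the wrong
    side of its cutoff.  Returns the number of passes that were run."""
    for passes in range(1, max_passes + 1):
        errors = 0
        for tokens, is_spam in corpus:
            s = filt.score(tokens)
            wrong = s < spam_cutoff if is_spam else s > ham_cutoff
            if wrong:
                filt.train(tokens, is_spam)
                errors += 1
        if errors == 0:
            return passes
    return max_passes


if __name__ == "__main__":
    # Two made-up messages that share the ambiguous token "free".
    corpus = [
        (["lunch", "tomorrow", "free"], False),  # ham
        (["free", "pills", "offer"], True),      # spam
    ]
    filt = ToyFilter()
    passes = train_to_exhaustion(filt, corpus)
    print("passes:", passes)
    for tokens, is_spam in corpus:
        print("spam" if is_spam else "ham ", round(filt.score(tokens), 3))
```

In this toy, a single registration of each message leaves the ham well
above the cutoff; it takes further passes over the same two messages
before both settle clearly on their own side.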

> Dude.  I get over 1,000 spams per day, and I'm filtering them with a
> compiled "c" program partly because it keeps my loads well below the
> radar of my ISP.  I am NOT going to add a perl script to the mix,
> especially when it loads a new copy of the interpreter for each and
> every incoming message.

The load is minimal.  Processing an email usually takes under 1s.  I
just watched some email coming in using "top": I saw procmail for a
split second, and spamitarium didn't even register.  It was either too
fast or too far down the list (sorted by CPU load, 1s intervals).  And
I'm running on a K6.  Dude, 1000 emails is nothing, and C isn't
necessarily faster than Perl.  On a Linux system, a great many things
run on Perl.  I just ran spamitarium 1000 times, and it (plus the bash
loop) used 71.6 CPU seconds, on a K6.
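
To put that figure in perspective, the arithmetic can be worked out as
below.  The only inputs are the numbers quoted in this thread (1000
runs, 71.6 CPU seconds, roughly 1000 messages per day); the variable
names are mine, and because the measurement includes the bash loop, the
per-message cost is an upper bound.

```python
# Figures quoted in the thread.
runs = 1000                 # spamitarium invocations in the test loop
cpu_seconds_total = 71.6    # CPU time for all runs, bash loop included
messages_per_day = 1000     # "over 1,000 spams per day"

cpu_per_message = cpu_seconds_total / runs          # seconds per message
daily_cpu = cpu_per_message * messages_per_day      # CPU seconds per day
share_of_day = daily_cpu / 86_400                   # fraction of one CPU-day

print(f"{cpu_per_message * 1000:.1f} ms CPU per message")
print(f"{daily_cpu:.1f} CPU seconds per day")
print(f"{share_of_day:.4%} of one CPU-day")
```

That works out to about 72 ms of CPU per message, or well under a tenth
of a percent of one CPU over a day at that volume.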

> I've been doing that.  Most of my "unsure" spam is still scoring very
> near 0.5.

Exhaustive training should move both hams and spams away from 0.5.

> Haven't done *any* initial training.  Just training on error.  Like I
> said, I had an unfortunate accident which wiped out my email spool
> (and my carefully trained bogofilter database) and I'm having to start
> over from scratch.

I didn't do any initial training either.  I use -u and register every
error.  Works fine.
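
For anyone wanting to script the "register every error" step, here is a
rough sketch.  The function names and the path handling are hypothetical;
the flags are bogofilter's registration options (-s/-n to register as
spam/non-spam, -S/-N to undo an earlier registration).  With -u, a
misclassified message has already been registered in the wrong category,
so the correction has to undo that as well.

```python
import subprocess


def correction_args(actual_is_spam, autoupdated=True):
    """Build the bogofilter command line that corrects one misclassified
    message.  `autoupdated` means the message was scored with -u and so
    was already registered under the wrong category."""
    if actual_is_spam:
        flags = "-Ns" if autoupdated else "-s"   # undo ham, register spam
    else:
        flags = "-Sn" if autoupdated else "-n"   # undo spam, register ham
    return ["bogofilter", flags]


def register_error(message_path, actual_is_spam, autoupdated=True):
    """Feed one saved message to bogofilter on standard input.
    The exit status is returned as-is and not interpreted here."""
    with open(message_path, "rb") as fh:
        result = subprocess.run(correction_args(actual_is_spam, autoupdated),
                                stdin=fh)
    return result.returncode
```

For example, a spam that slipped through as ham under -u would be fixed
with correction_args(True), which yields "bogofilter -Ns".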

Tom




