How to deal with extremely high spam levels
Tom Anderson
tanderso at oac-design.com
Wed Jun 23 14:41:41 CEST 2004
On Tue, 2004-06-22 at 18:01, Bob Vincent wrote:
> This is called "training to exhaustion". No, I haven't been doing
> that; I suspect that my "ham" corpus isn't large enough to make it
> effective as of yet.
The size of the corpus is irrelevant. In fact, a small corpus is
exactly why this would be a good idea. If you're getting hams scoring
around 0.5, then this will help you since it will make slightly hammy
tokens very hammy. I do this and none of my hams ever score above 0.15.
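
For anyone unsure what "training to exhaustion" amounts to, here is a
rough sketch in Python. It is not bogofilter's algorithm: ToyFilter, its
averaged token score, the sample messages, and the 0.85 spam cutoff are
all invented for illustration. Only the 0.15 ham ceiling comes from the
figure I quoted above. The idea is simply to keep re-registering every
message that still scores on the wrong side of its cutoff until a full
pass over the corpus finds no errors.

```python
from collections import Counter


class ToyFilter:
    """Tiny token-count filter; a stand-in, NOT bogofilter's real scoring."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it

    def train(self, tokens, is_spam):
        # Count each distinct token once per registered message.
        (self.spam if is_spam else self.ham).update(set(tokens))

    def score(self, tokens):
        # Mean of smoothed per-token spam probabilities; 0.5 means "no idea".
        probs = [
            (self.spam[t] + 1) / (self.spam[t] + self.ham[t] + 2)
            for t in set(tokens)
        ]
        return sum(probs) / len(probs) if probs else 0.5


def train_to_exhaustion(filt, corpus, ham_cutoff=0.15, spam_cutoff=0.85,
                        max_rounds=100):
    """Re-register misscored messages until one full pass has no errors.

    Returns the number of the first clean pass, or None if max_rounds
    passes were not enough.
    """
    for round_no in range(1, max_rounds + 1):
        errors = 0
        for tokens, is_spam in corpus:
            s = filt.score(tokens)
            wrong = s < spam_cutoff if is_spam else s > ham_cutoff
            if wrong:
                filt.train(tokens, is_spam)
                errors += 1
        if errors == 0:
            return round_no
    return None


# Hypothetical four-message corpus; "offer" shows up in both ham and spam.
corpus = [
    ("meeting agenda report friday".split(), False),
    ("report invoice offer friday".split(), False),
    ("offer pills cheap winner".split(), True),
    ("cheap pills winner casino".split(), True),
]
filt = ToyFilter()
rounds = train_to_exhaustion(filt, corpus)
print("passes needed:", rounds)
```

Each extra pass pushes the tokens that appear only in ham further toward
0 and the spam-only tokens further toward 1, which is why a message
carrying one ambiguous token can still end up well clear of 0.5.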
> Dude. I get over 1,000 spams per day, and I'm filtering them with a
> compiled "c" program partly because it keeps my loads well below the
> radar of my ISP. I am NOT going to add a perl script to the mix,
> especially when it loads a new copy of the interpreter for each and
> every incoming message.
The load is minimal. Usually it takes under 1s to process an email. I
just watched some email coming in using "top"... I saw procmail for a
split second, and spamitarium didn't even register... it was either too
fast or too far down the list (sorted by CPU load, 1s intervals). And
I'm running on a K6. Dude, 1000 emails is nothing, and C isn't
necessarily faster than Perl. On a Linux system, a great many things
already run on Perl. I just ran spamitarium 1000 times, and it (plus
the bash loop) used 71.6 CPU seconds in total, or about 0.07 s per
message, on a K6.
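
To put that measurement in terms of daily load, here is the arithmetic
as a small Python sketch. The function name cpu_budget is made up; the
inputs are the 71.6 CPU seconds for 1000 runs measured above and the
roughly 1000 messages per day you mentioned.

```python
SECONDS_PER_DAY = 86_400


def cpu_budget(total_cpu_seconds, runs, messages_per_day):
    """Return (CPU s per message, CPU s per day, fraction of one CPU-day)."""
    per_message = total_cpu_seconds / runs
    per_day = per_message * messages_per_day
    return per_message, per_day, per_day / SECONDS_PER_DAY


per_message, per_day, share = cpu_budget(71.6, 1000, 1000)
print(f"{per_message * 1000:.1f} ms of CPU per message")
print(f"{per_day:.1f} CPU seconds per day")
print(f"{share:.4%} of one CPU-day")
```

Even if the real volume is several times higher, the daily share stays
well under one percent of a single CPU on this hardware.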
> I've been doing that. Most of my "unsure" spam is still scoring very
> near 0.5.
Exhaustive training should move both hams and spams away from 0.5.
> Haven't done *any* initial training. Just training on error. Like I
> said, I had an unfortunate accident which wiped out my email spool
> (and my carefully trained bogofilter database) and I'm having to start
> over from scratch.
I didn't do any initial training either. I use -u and register every
error. Works fine.
Tom