How to deal with extremely high spam levels

Jake Di Toro karrde+bogofilter at viluppo.net
Tue Jun 22 20:48:21 CEST 2004


On Tue, Jun 22, 2004 at 01:32:50PM -0400, Bob Vincent wrote:
> Bogofilter is apparently designed for the situation where the number
> of spams per day roughly equals the number of non-spams per day.
> 
> In my situation, the ratio exceeds 100:1.  In the two weeks I've been
> re-training bogofilter, I've collected:
> 
>   Correctly filtered Ham: 101
>   Unsures registered as Ham: 26
>   Correctly filtered Spam: 1067
>   Unsures registered as Spam: 14177
> 
> Part of the reason for my unusually high spam-load is that I'm
> receiving catch-all emails for several domains.  Part of the reason is
> that my email address is listed in several places on the internet.
> 
> I am unwilling to change email addresses or remove my catch-all
> accounts.  I would rather just filter the crap out at the server.
> 
> However, at this rate, it will take nearly a year to collect enough
> non-spam to run bogotune.  I'm not willing to wait that long.
> 
> If Bogofilter is inadequate to this situation, are there any
> recommendations for how to properly deal with it?

I have a similar situation, but I started with a corpus of 500
messages each of spam and ham that I could train on.  Like you, I
filter tri-state (spam, ham, unsure).  It is unclear to me whether you
are using -u or not.  I am not, and frankly I think it causes more
long-term work than train-on-error, but I digress for now.

I have two setups, a personal and a work domain, that both started
with training databases of roughly 500 spam and 500 ham approximately
six months ago.  This is what they look like today:

                                 spam   good
.MSG_COUNT                       1205    570

                                 spam   good
.MSG_COUNT                       1406    690

Somewhat unbalanced, but working effectively.  I've never run
bogotune, and have received no false positives that I know about.  I
maintain a 14-day running log of caught spam.  I get 5-10 unsures and
100ish spams a day on the personal domain, and 20-30 unsures and
400ish spams a day on the work domain.
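
To put a number on "somewhat unbalanced", the spam-to-good ratio of
each .MSG_COUNT listing above can be worked out directly.  This is only
an illustrative sketch; the labels "first wordlist" and "second
wordlist" are placeholders, since the listings are not labeled by
domain.

```python
# .MSG_COUNT values copied from the two listings above.
# The dictionary keys are placeholder labels, not names from the post.
MSG_COUNTS = {
    "first wordlist": (1205, 570),
    "second wordlist": (1406, 690),
}


def spam_to_good_ratio(spam: int, good: int) -> float:
    """Return how many registered spam messages there are per good message."""
    if good <= 0:
        raise ValueError("good count must be positive")
    return spam / good


if __name__ == "__main__":
    for name, (spam, good) in MSG_COUNTS.items():
        print(f"{name}: {spam_to_good_ratio(spam, good):.2f}:1")
```

Both wordlists come out at roughly 2:1.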

Maybe I just started with good corpora.  I also kept on top of the
unsures religiously for the first two weeks, registering them either
way and testing each one before registering.  If a message had already
shifted to the proper classification before I got to it, it was left
unregistered.
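
The "test before registering" routine described above can be sketched
roughly as follows.  This is a minimal illustration, not the exact
procedure used here: review_unsure, bogofilter_classify, and
bogofilter_register are hypothetical helper names, and the sketch
assumes bogofilter reads the message on stdin, exits 0 for spam, 1 for
non-spam, and 2 for unsure, and registers with -s (spam) or -n
(non-spam) against the default wordlist.

```python
import subprocess
from typing import Callable

SPAM, HAM, UNSURE = "spam", "ham", "unsure"

# Assumed bogofilter exit codes: 0 = spam, 1 = non-spam, 2 = unsure.
_EXIT_TO_LABEL = {0: SPAM, 1: HAM, 2: UNSURE}


def bogofilter_classify(message: bytes) -> str:
    """Classify one message by piping it to bogofilter (hypothetical wrapper)."""
    result = subprocess.run(["bogofilter"], input=message)
    if result.returncode not in _EXIT_TO_LABEL:
        raise RuntimeError(f"bogofilter failed with exit code {result.returncode}")
    return _EXIT_TO_LABEL[result.returncode]


def bogofilter_register(message: bytes, label: str) -> None:
    """Register one message as spam (-s) or non-spam (-n)."""
    flag = "-s" if label == SPAM else "-n"
    subprocess.run(["bogofilter", flag], input=message, check=True)


def review_unsure(
    message: bytes,
    true_label: str,
    classify: Callable[[bytes], str] = bogofilter_classify,
    register: Callable[[bytes, str], None] = bogofilter_register,
) -> bool:
    """Re-test a message and register it only if it is still not
    classified as true_label.  Returns True if it was registered."""
    if true_label not in (SPAM, HAM):
        raise ValueError("true_label must be 'spam' or 'ham'")
    if classify(message) == true_label:
        # Earlier registrations already shifted it; leave it unregistered.
        return False
    register(message, true_label)
    return True
```

Passing classify and register as parameters keeps the decision logic
separate from the command-line calls, so it can be exercised without
touching a real wordlist.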

-- 
Till Later, Jake <karrde+bogofilter at viluppo.net>
-----------------------------------------------
Direct replies are likely to be flagged as spam.
Drop the +addy if you need to reply direct.


