How to deal with extremely high spam levels
Jake Di Toro
karrde+bogofilter at viluppo.net
Tue Jun 22 20:48:21 CEST 2004
On Tue, Jun 22, 2004 at 01:32:50PM -0400, Bob Vincent wrote:
> Bogofilter is apparently designed for the situation where the number
> of spams per day roughly equals the number of non-spams per day.
>
> In my situation, the ratio exceeds 100:1. In the two weeks I've been
> re-training bogofilter, I've collected:
>
> Correctly filtered Ham: 101
> Unsures registered as Ham: 26
> Correctly filtered Spam: 1067
> Unsures registered as Spam: 14177
>
> Part of the reason for my unusually high spam-load is that I'm
> receiving catch-all emails for several domains. Part of the reason is
> that my email address is listed in several places on the internet.
>
> I am unwilling to change email addresses or remove my catch-all
> accounts. I would rather just filter the crap out at the server.
>
> However, at this rate, it will take nearly a year to collect enough
> non-spam to run bogotune. I'm not willing to wait that long.
>
> If Bogofilter is inadequate to this situation, are there any
> recommendations for how to properly deal with it?
I have a similar situation, but I started with a corpus of 500 each
that I could train on. Like you, I filter tri-state. It is unclear to
me whether you are using -u or not. I am not, and frankly I think it
causes more long-term work than train-on-error, but I digress for now.
I have 2 setups, a personal and a work domain, that both started with
training dbs of roughly 500 spam and 500 ham approximately 6 months
ago. This is what they look like today:
              spam   good
.MSG_COUNT    1205    570

              spam   good
.MSG_COUNT    1406    690
Somewhat unbalanced, but working effectively. I've never run bogotune,
and have received no false positives that I know about. I maintain a
14-day running log of currently caught spam. I get 5-10 unsures and
100ish spams a day on the personal domain, and 20-30 unsures and
400ish spams a day on the work domain.
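To put a number on "somewhat unbalanced", here is a quick sketch in
Python. The helper name spam_to_good_ratio is made up for illustration;
the inputs are the .MSG_COUNT figures above and the two-week tallies
quoted from the original question.

```python
def spam_to_good_ratio(spam: int, good: int) -> float:
    """Return how many spams there are per good message."""
    if good <= 0:
        raise ValueError("need at least one good message")
    return spam / good


# .MSG_COUNT figures from the two training databases above.
first_db = spam_to_good_ratio(1205, 570)
second_db = spam_to_good_ratio(1406, 690)

# Two-week tallies from the quoted question:
# (filtered spam + unsures registered as spam) versus
# (filtered ham + unsures registered as ham).
quoted = spam_to_good_ratio(1067 + 14177, 101 + 26)

print(f"first db:  {first_db:.2f}:1")
print(f"second db: {second_db:.2f}:1")
print(f"quoted:    {quoted:.1f}:1")
```

Both training databases sit at roughly 2:1, while the quoted mail
stream is around 120:1. The ratio of mail received and the ratio of
what gets registered do not have to match.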
Maybe I just started with good corpora. I also kept on top of the
unsures religiously for the first two weeks, registering either way
and testing each one before registering. If it had already shifted to
the proper classification before I got to it, it was left unregistered.
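The test-before-registering routine described above could be sketched
roughly as follows. This is a minimal illustration, not the exact
commands I run: the function names are made up, and it assumes
bogofilter is on the PATH, reads the message on stdin, and reports its
verdict through the exit code (0 spam, 1 ham, 2 unsure), with -s and
-n registering a message as spam or non-spam.

```python
import subprocess

# Exit codes as documented for bogofilter: 0 = spam, 1 = ham, 2 = unsure.
EXIT_TO_LABEL = {0: "spam", 1: "ham", 2: "unsure"}


def classify(message: bytes) -> str:
    """Ask bogofilter for its current verdict without registering."""
    result = subprocess.run(["bogofilter"], input=message)
    if result.returncode not in EXIT_TO_LABEL:
        raise RuntimeError(f"bogofilter failed with exit code {result.returncode}")
    return EXIT_TO_LABEL[result.returncode]


def register(message: bytes, label: str) -> None:
    """Register the message as spam (-s) or non-spam (-n)."""
    flag = "-s" if label == "spam" else "-n"
    subprocess.run(["bogofilter", flag], input=message, check=True)


def train_on_error(message: bytes, true_label: str,
                   classify=classify, register=register) -> bool:
    """Register only if the current verdict differs from the true label.

    Returns True if the message was registered, False if it was left
    alone because bogofilter already classifies it correctly.
    """
    if true_label not in ("spam", "ham"):
        raise ValueError("true_label must be 'spam' or 'ham'")
    if classify(message) == true_label:
        return False  # already shifted to the proper classification
    register(message, true_label)
    return True
```

Passing classify and register in as parameters is only there so the
decision logic can be tried out without touching a real wordlist.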
--
Till Later, Jake <karrde+bogofilter at viluppo.net>
-----------------------------------------------
Direct replies are likely to be flagged as spam.
Drop the +addy if you need to reply direct.