Still a problem with spam management

RW rwmaillists at googlemail.com
Wed Apr 6 01:17:56 CEST 2011


On Mon, 4 Apr 2011 17:40:01 +0100
Stéphane Guedon <stephane at 22decembre.eu> wrote:

> Hi everyone
> 
> I have been using bogofilter for months now, and I still have a
> classification problem between ham and spam.
> Half of my spam is still classified as ham.
> 
> Description of the system:
> My bogofilter runs on my mail server. The mail is delivered by
> postfix; bogofilter reads it, scores it, marks it as spam or not
> (___SPAM___ in the header), auto-updates its wordlist, and hands it
> back to postfix. Postfix passes it to dovecot deliver, which puts it in the

Sounds like you are doing two-way classification and autolearning
everything.

> corresponding IMAP mailbox. Messages marked as spam go into INBOX.spam.
> 
> After that, if I receive spam in my normal boxes, I put it in the
> spam box.
> 
> End of process: each day, a script reads the messages in INBOX.spam
> and registers those without ___SPAM___ (i.e. the misclassified ones)
> as spam. It does the opposite for the rare ham that ended up in the
> spam box.

If you autolearned during classification, you need to unlearn before
relearning. Are you doing that?
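A minimal Python sketch of the unlearn-then-relearn step, assuming every message was autolearned (-u) at delivery time and that the ___SPAM___ tag reliably records the verdict bogofilter gave. The function names are hypothetical. The flags are bogofilter's registration options (-s/-n register as spam/ham, -S/-N unregister as spam/ham), combined so that one run moves a message from the wrong class to the right one.

```python
from typing import List, Optional

SPAM_TAG = "___SPAM___"  # header marker written by the setup described above


def autolearned_as_spam(headers: str) -> bool:
    """Infer the delivery-time verdict from the header tag (assumption)."""
    return SPAM_TAG in headers


def correction_flags(learned_spam: bool, actually_spam: bool) -> List[str]:
    """Flags that unlearn a wrong autolearned verdict and relearn the right one.

    Assumes the message was autolearned with -u at delivery, so it is
    already registered under the class bogofilter chose at that time.
    """
    if learned_spam == actually_spam:
        return []          # verdict was right; nothing to correct
    if actually_spam:
        return ["-Ns"]     # unregister as ham, then register as spam
    return ["-Sn"]         # unregister as spam, then register as ham


def correction_command(headers: str, actually_spam: bool) -> Optional[List[str]]:
    """Build an argv for bogofilter; the raw message goes on stdin."""
    flags = correction_flags(autolearned_as_spam(headers), actually_spam)
    return ["bogofilter", *flags] if flags else None
```

For an untagged message found in INBOX.spam, correction_command(headers, True) yields ["bogofilter", "-Ns"], which would then be run with the raw message on standard input (for example via subprocess.run(cmd, input=raw_bytes)). Relearning with plain -s instead would leave the earlier ham registration in the wordlist, so the message's tokens end up counted as both ham and spam.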


> This script also runs bogoutil -l wordlist.db and similar commands
> after correcting the mistakes...
> 
> Despite all this elaborate process, half of my spam still has a
> bogofilter score of around 0.42: close to being considered spam,
> but not crossing the line!


Bogofilter produces results that cluster three ways, around 0.0, 0.5
and 1.0, representing ham, unsure and spam. In my experience it's
unusual to get spam below 0.49. I suspect that your wordlist is
mistrained.
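To show why two-way classification hides the unsure cluster, here is a hypothetical sketch of mapping a score to a verdict. Bogofilter's real thresholds come from its spam_cutoff and ham_cutoff settings; the exact comparison rules and the cutoff values used below are illustrative assumptions, not bogofilter's defaults.

```python
from collections import Counter
from typing import Iterable, Optional


def classify(score: float, spam_cutoff: float,
             ham_cutoff: Optional[float] = None) -> str:
    """Map a spamicity score to a verdict.

    With ham_cutoff=None this is two-way (ham/spam). Otherwise it is
    three-way, and scores strictly between the cutoffs are "unsure".
    """
    if score >= spam_cutoff:
        return "spam"
    if ham_cutoff is None or score <= ham_cutoff:
        return "ham"
    return "unsure"


def verdict_counts(scores: Iterable[float], spam_cutoff: float,
                   ham_cutoff: Optional[float] = None) -> Counter:
    """Tally verdicts for a batch of scores, e.g. one mail folder."""
    return Counter(classify(s, spam_cutoff, ham_cutoff) for s in scores)
```

With a single hypothetical cutoff of 0.5, classify(0.42, 0.5) files the message as "ham". With three-way cutoffs of 0.10 and 0.95, the same score lands in "unsure". A folder whose spam piles up in the unsure band points to a mistrained wordlist rather than a threshold that merely needs nudging.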



More information about the Bogofilter mailing list