still problem in spam management

Stéphane Guedon stephane at 22decembre.eu
Wed Apr 6 09:41:56 CEST 2011


On Wednesday 06 April 2011 01:17:56 RW wrote:
> On Mon, 4 Apr 2011 17:40:01 +0100
> 
> Stéphane Guedon <stephane at 22decembre.eu> wrote:
> > Hi everyone
> > 
> > I have been using bogofilter for months now, and I still have a
> > classification problem between ham and spam.
> > Half of my spam is still classified as ham.
> > 
> > Description of the system:
> > My bogofilter runs on my mail server. The mail is delivered by
> > postfix; bogofilter reads it, scores it, marks it as spam or not
> > (___SPAM___ in the header), auto-updates, and hands it back to
> > postfix. Postfix gives it to dovecot deliver, which puts it in the
> 
> Sounds like you are doing two-way classification and autolearning
> everything.
> 
> > corresponding IMAP box. Messages marked as spam go to INBOX.spam...
> > 
> > After that, if I receive spam in my normal boxes, I put it in the
> > spam box.
> > 
> > End of the process: each day, a script reads the messages in
> > INBOX.spam and trains those without ___SPAM___ (that is, the
> > misclassified ones) as real spam. The reverse is done for the rare
> > ham that was in the spam box.
> 
> If you autolearned during classification you need to unlearn before
> relearning. Are you doing that?

I do this:

For messages in the spam box without "___SPAM___", I tell bogofilter
that they really are spam:
grep -L "___SPAM___" * | bogofilter -Ns -d /var/lib/bogofilter


And the reverse in the ham boxes:
grep -l "___SPAM___" * | bogofilter -Sn -d /var/lib/bogofilter

The scripts are not mine, but they look sound!

Any remarks?
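
One point I am not sure about myself: grep -l and grep -L only print
file names, so as written bogofilter would receive a list of names on
stdin rather than the messages themselves. As far as I understand the
man page, the -b option makes bogofilter read file names from stdin and
process each named file as a message. Below is a minimal sketch of what
I think the script is meant to do, assuming one message per file. The
function names are my own (hypothetical), not taken from the script.

```shell
#!/bin/sh
# Hypothetical helpers; names are assumptions, not the original script.
DB=/var/lib/bogofilter   # wordlist directory, as in the commands above

# Print the files in directory $1 that do NOT contain the marker.
list_unmarked() {
    grep -sL -- "___SPAM___" "$1"/*
}

# Print the files in directory $1 that DO contain the marker.
list_marked() {
    grep -sl -- "___SPAM___" "$1"/*
}

# Missed spam: unregister as ham (-N), register as spam (-s).
# -b (assumed needed here) reads the file names from stdin.
relearn_as_spam() {
    list_unmarked "$1" | bogofilter -b -Ns -d "$DB"
}

# False positives: unregister as spam (-S), register as ham (-n).
relearn_as_ham() {
    list_marked "$1" | bogofilter -b -Sn -d "$DB"
}
```

The idea would be to call relearn_as_spam on the spam box directory and
relearn_as_ham on each ham box directory. File names containing
newlines would break this, which I assume is not an issue for a mail
store.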

> 
> > This script also runs bogoutil -l wordlist.db and similar things
> > after correcting the mistakes...
> > 
> > Despite all this sophisticated process, half of my spam still has a
> > bogofilter score of around 0.42, close to being considered spam,
> > but not crossing the threshold!
> 
> Bogofilter produces results that cluster three ways, around 0.0, 0.5,
> and 1.0, representing ham, unsure and spam. In my experience it's
> unusual to get spam below 0.49. I suspect that your wordlist is
> mistrained.
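
To make the three ranges concrete, here is a small sketch that sorts a
score into ham, unsure or spam. The cutoffs 0.45 and 0.99 are values I
am assuming for illustration only (the real ones are whatever
ham_cutoff and spam_cutoff are set to in bogofilter.cf), and the
function name is made up.

```shell
#!/bin/sh
# Illustrative cutoffs only; the real ones come from bogofilter.cf.
HAM_CUTOFF=0.45
SPAM_CUTOFF=0.99

# Map a score in [0,1] to ham / unsure / spam.
classify_score() {
    awk -v s="$1" -v h="$HAM_CUTOFF" -v c="$SPAM_CUTOFF" 'BEGIN {
        if (s >= c)      print "spam"
        else if (s <= h) print "ham"
        else             print "unsure"
    }'
}
```

With these assumed cutoffs, classify_score 0.42 prints "ham", which is
the situation I am seeing for half of my spam.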

-- 
Stéphane Guedon
web page: http://www.22decembre.eu/
business card: http://www.22decembre.eu/downloads/Stephane-Guedon.vcf
GPG public key: http://www.22decembre.eu/downloads/Stephane-Guedon.asc