How do I filter out spam that turns up on mailing lists?

Nigel Henry cave.dnb at tiscali.fr
Sat Jan 26 21:55:13 CET 2008


On Saturday 26 January 2008 20:41, David Relson wrote:
> On Sat, 26 Jan 2008 19:14:54 +0100
> Nigel Henry wrote:
> > On Saturday 26 January 2008 01:16, David Relson wrote:
> > > On Fri, 25 Jan 2008 20:22:45 +0100
> > > Nigel Henry wrote:
> > >
> > > ...[snip]...
> > >
> > > > Hi David. Meanwhile back at the ranch, I'm not really on my way to
> > > > creating this ignore.db. Not being one to give up (although a few
> > > > days have passed), here's how things stand at present.
> > > >
> > > > I already had an /etc/bogofilter.cf.example file, but also created
> > > > an /etc/bogofilter.cf file. I have added the following 2 lines to
> > > > this newly created file.
> > > >
> > > > wordlist i,ignore,ignore.db,1
> > > > wordlist r,word,wordlist.db,2
> > >
> > > good
> > >
> > > > Next I created a file named ignore_list.txt, and put the full
> > > > headers from one of my Debian list emails within.
> > > >
> > > > Now I ran the following command.
> > > > [djmons at localhost djmons]$ bogoutil -l ~/.bogofilter/ignore.db <
> > > > ignore_list.txt
> > > > bogoutil: Unexpected input [ Received:] on line 2. Expecting
> > > > whitespace before count.
> > > > read or write error, aborting.
> > > > [djmons at localhost djmons]$
> > >
> > > bogoutil expects lines containing 1 token, 2 counts, and a
> > > timestamp. It isn't smart enough to parse real headers.
> > >
> > > You could use the following to parse and import in a single command:
> > >
> > >    bogolexer < message.headers | bogoutil -l ignore.db
> >
> > Ok, but I'm still rather clueless here. Anyway, I've run the stuff
> > below.
> >
> > [djmons at localhost djmons]$ bogolexer < ignore_list.txt | bogoutil -l
> > ~/.bogofilter/ignore.db
> > [djmons at localhost djmons]$ bogoutil -d .bogofilter/ignore.db
> > 195 0 0 20080126
> > get_token: 220 0 20080126
> > normal 0 0 20080126
> > [djmons at localhost djmons]$
> >
> > The ignore_list.txt above is the full headers from a Debian mailing
> > list email.
> >
> > Does that output above look any better?
>
> It looks better, though not quite right -- due to a flag I forgot to
> include, i.e. "-p".  Use
>
>    rm ignore.db
>    bogolexer -p < message.headers | bogoutil -l ignore.db
>
> The output should list each token from the headers and the counts
> should be "0 0 YYYYMMDD".

Ok. I've just run the stuff below.

[djmons at localhost djmons]$ rm .bogofilter/ignore.db
[djmons at localhost djmons]$ bogolexer -p < ignore_list.txt | bogoutil -l .bogofilter/ignore.db
[djmons at localhost djmons]$ bogoutil -d .bogofilter/ignore.db
rtrn:127.0.0.1 0 0 20080126
rtrn:202.124.106.199 0 0 20080126
rtrn:202.36.170.3 0 0 20080126
rtrn:82.195.75.100 0 0 20080126
rtrn:AES256-SHA 0 0 20080126
rtrn:AWL 0 0 20080126
rtrn:Bannister 0 0 20080126
rtrn:CA4.6070201 0 0 20080126
rtrn:Chris 0 0 20080126
rtrn:CkNsjqaKbqG.A.baG.3polHB 0 0 20080126
rtrn:Content-Disposition 0 0 20080126
rtrn:Content-Type 0 0 20080126
rtrn:D.2000405 0 0 20080126
rtrn:D5.50706 0 0 20080126
rtrn:Date 0 0 20080126
rtrn:Delivered-To 0 0 20080126
rtrn:ESMTP 0 0 20080126
rtrn:Exim 0 0 20080126
rtrn:FOURLA 0 0 20080126
rtrn:From 0 0 20080126
rtrn:GA24974 0 0 20080126
rtrn:GC4214 0 0 20080126
rtrn:How 0 0 20080126
rtrn:In-Reply-To 0 0 20080126
rtrn:Jan 0 0 20080126
rtrn:LDOSUBSCRIBER 0 0 20080126
rtrn:LDO_WHITELIST 0 0 20080126
rtrn:List-Help 0 0 20080126
rtrn:List-Id 0 0 20080126
rtrn:List-Post 0 0 20080126
rtrn:List-Subscribe 0 0 20080126
rtrn:List-Unsubscribe 0 0 20080126
rtrn:MIME-Version 0 0 20080126
rtrn:Message-ID 0 0 20080126
rtrn:Mutt 0 0 20080126
rtrn:NZDT 0 0 20080126
rtrn:Old-Return-Path 0 0 20080126
rtrn:Postfix 0 0 20080126
rtrn:Precedence 0 0 20080126
rtrn:QMQP 0 0 20080126
rtrn:Received 0 0 20080126
rtrn:References 0 0 20080126
rtrn:Resent-Date 0 0 20080126
rtrn:Resent-From 0 0 20080126
rtrn:Resent-Message-ID 0 0 20080126
rtrn:Resent-Sender 0 0 20080126
rtrn:SpamAssassin 0 0 20080126
rtrn:Status 0 0 20080126
rtrn:Subject 0 0 20080126
rtrn:TLSv1 0 0 20080126
rtrn:UTC 0 0 20080126
rtrn:User-Agent 0 0 20080126
rtrn:Wed 0 0 20080126
rtrn:X-KMail-EncryptionState 0 0 20080126
rtrn:X-KMail-MDN-Sent 0 0 20080126
rtrn:X-KMail-SignatureState 0 0 20080126
rtrn:X-Loop 0 0 20080126
rtrn:X-Mailing-List 0 0 20080126
rtrn:X-Original-To 0 0 20080126
rtrn:X-Rc-Spam 0 0 20080126
rtrn:X-Rc-Virus 0 0 20080126
rtrn:X-Spam-Checker-Version 0 0 20080126
rtrn:X-Spam-Level 0 0 20080126
rtrn:X-Spam-Status 0 0 20080126
rtrn:X-Spam-Virus 0 0 20080126
rtrn:X-Status 0 0 20080126
rtrn:X-UID 0 0 20080126
rtrn:X-policyd-weight 0 0 20080126
rtrn:archive 0 0 20080126
rtrn:autolearn 0 0 20080126
rtrn:bits 0 0 20080126
rtrn:bounce-debian-user 0 0 20080126
rtrn:box 0 0 20080126
rtrn:cached 0 0 20080126
rtrn:cave.dnb 0 0 20080126
rtrn:certificate 0 0 20080126
rtrn:charset 0 0 20080126
rtrn:cipher 0 0 20080126
rtrn:client 0 0 20080126
rtrn:cox.net 0 0 20080126
rtrn:debian-user 0 0 20080126
rtrn:debian-user-request 0 0 20080126
rtrn:earthlight.co.nz 0 0 20080126
rtrn:esmtp 0 0 20080126
rtrn:failed 0 0 20080126
rtrn:for 0 0 20080126
rtrn:from 0 0 20080126
rtrn:helo 0 0 20080126
rtrn:help 0 0 20080126
rtrn:inline 0 0 20080126
rtrn:latest 0 0 20080126
rtrn:list 0 0 20080126
rtrn:lists-debian-user 0 0 20080126
rtrn:lists.debian.org 0 0 20080126
rtrn:liszt 0 0 20080126
rtrn:liszt.debian.org 0 0 20080126
rtrn:localhost 0 0 20080126
rtrn:mahler.earthlight.co.nz 0 0 20080126
rtrn:mahler2.earthlight.co.nz 0 0 20080126
rtrn:mail.libertysurf.net 0 0 20080126
rtrn:mailto 0 0 20080126
rtrn:mockingbird 0 0 20080126
rtrn:osamu.debian.net 0 0 20080126
rtrn:painter-decorator.eu 0 0 20080126
rtrn:plain 0 0 20080126
rtrn:rate 0 0 20080126
rtrn:requested 0 0 20080126
rtrn:required 0 0 20080126
rtrn:result 0 0 20080126
rtrn:score 0 0 20080126
rtrn:subject 0 0 20080126
rtrn:subscribe 0 0 20080126
rtrn:tests 0 0 20080126
rtrn:text 0 0 20080126
rtrn:tiscali.fr 0 0 20080126
rtrn:unsubscribe 0 0 20080126
rtrn:us-ascii 0 0 20080126
rtrn:use 0 0 20080126
rtrn:userid 0 0 20080126
rtrn:using 0 0 20080126
rtrn:version 0 0 20080126
rtrn:with 0 0 20080126
[djmons at localhost djmons]$

There's a whole bunch more output now from running
bogoutil -d .bogofilter/ignore.db

Does that look better to you?
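
One way to check the dump against David's description (each token followed
by "0 0 YYYYMMDD") might be a small awk filter. This is only a sketch: the
function name check_ignore_dump is made up for illustration, it assumes
tokens contain no whitespace, and it checks the line format only, not
whether the tokens themselves are the right ones.

```shell
# Hypothetical helper (not part of bogofilter): reads "bogoutil -d" style
# output on stdin and prints every line that is not "token 0 0 YYYYMMDD".
# Assumption: tokens contain no whitespace, so a good line has 4 fields.
check_ignore_dump() {
    awk '
        NF != 4 || $2 != "0" || $3 != "0" ||
        $4 !~ /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/ {
            print "unexpected: " $0
            bad++
        }
        # Exit status 0 if every line matched, 1 otherwise.
        END { exit (bad > 0) }
    '
}

# Possible usage:
#   bogoutil -d ~/.bogofilter/ignore.db | check_ignore_dump && echo "format ok"
```

Run against the earlier dump (the one made without "-p"), this would flag
the "get_token: 220 0 20080126" line, since its first count is not 0.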

The Debian headers that I'm using for this are attached below as
ignore_list.txt.

> Look at the FAQ for info on bogofilter's "-vvv" flags which tell
> bogofilter to display each token and its ham/spam counts and spamicity
> score.  With a test message, save the "-vvv" results before and after
> creating the  ignorelist, and then compare them.  You should see a
> difference in the final score as well as the header tokens.  The last
> column of the "-vvv" output has a "+" for tokens used in scoring the
> message, an "i" for tokens in the ignore list, and a "-" for tokens near
> 0.5 that are not used in scoring the message.

I'll go for the -vvv comparison a bit later, as I'm feeling a bit worn out.
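
For that comparison, something along these lines might do as a starting
point. It is a sketch only. Going purely by the description above, it
assumes the marker ("+", "i" or "-") is the last whitespace-separated field
on each token line of the saved "-vvv" output. The function name
count_markers and the file names test.msg, before.txt and after.txt are made
up for illustration.

```shell
# Hypothetical helper: counts lines of saved "-vvv" output by the marker in
# the last column ("+" used in scoring, "i" in the ignore list, "-" unused).
# Lines ending in anything else (headings, totals) are skipped.
count_markers() {
    awk '
        $NF == "+" { used++ }
        $NF == "i" { ignored++ }
        $NF == "-" { unused++ }
        END { printf "used=%d ignored=%d unused=%d\n", used, ignored, unused }
    ' "$1"
}

# Possible usage, once both runs have been saved:
#   bogofilter -vvv < test.msg > before.txt   (before creating the ignore list)
#   bogofilter -vvv < test.msg > after.txt    (after creating it)
#   count_markers before.txt
#   count_markers after.txt
#   diff before.txt after.txt
```

If the ignore list is being picked up, the "ignored" count for after.txt
should be above zero, and the diff should show the header tokens changing
along with the final score.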
>
> HTH,
>
> David

Thanks for your help.

Nigel.
-------------- next part --------------
Return-Path: <bounce-debian-user=cave.dnb=tiscali.fr at lists.debian.org>
 Received: from liszt.debian.org (82.195.75.100) by mail.libertysurf.net (7.1.026)
        id 47849605010C9C17 for cave.dnb at tiscali.fr; Wed, 23 Jan 2008 01:29:47 +0100
 Received: from localhost (localhost [127.0.0.1])
        by liszt.debian.org (Postfix) with QMQP
        id E50F213A57E7; Wed, 23 Jan 2008 00:29:43 +0000 (UTC)
 Old-Return-Path: <mockingbird at earthlight.co.nz>
 X-Spam-Virus: No
 X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on liszt.debian.org
 X-Spam-Level: 
 X-Spam-Status: No, score=-7.7 required=4.0 tests=AWL,FOURLA,LDOSUBSCRIBER,
        LDO_WHITELIST autolearn=failed version=3.2.3
 X-Original-To: debian-user at lists.debian.org
 Delivered-To: lists-debian-user at liszt.debian.org
 X-policyd-weight: using cached result; rate: -6.1
 Received: from mahler2.earthlight.co.nz (mahler.earthlight.co.nz [202.36.170.3])
        (using TLSv1 with cipher AES256-SHA (256/256 bits))
        (No client certificate requested)
        by liszt.debian.org (Postfix) with ESMTP id 9458713A538A
        for <debian-user at lists.debian.org>; Wed, 23 Jan 2008 00:29:34 +0000 (UTC)
 Received: from [202.124.106.199] (helo=box)
        by mahler2.earthlight.co.nz with esmtp (Exim 4.50)
        id 1JHTUX-000556-KV
        for debian-user at lists.debian.org; Wed, 23 Jan 2008 13:29:29 +1300
 Received: by box (Postfix, from userid 1000)
        id 19ADE57042; Wed, 23 Jan 2008 13:33:45 +1300 (NZDT)
 Date: Wed, 23 Jan 2008 13:33:45 +1300
 From: Chris Bannister <mockingbird at earthlight.co.nz>
 To: debian-user at lists.debian.org
 Subject: Re: How to use Mutt?
 Message-ID: <20080123003345.GC4214 at box>
 References: <479245D5.50706 at painter-decorator.eu> <20080119190446.GA24974 at osamu.debian.net> <47925CA4.6070201 at cox.net> <4792672D.2000405 at painter-decorator.eu>
 MIME-Version: 1.0
 Content-Type: text/plain;
  charset=us-ascii
 Content-Disposition: inline
 In-Reply-To: <4792672D.2000405 at painter-decorator.eu>
 User-Agent: Mutt/1.5.17 (2007-11-01)
 X-Rc-Virus: 2007-09-13_01
 X-Rc-Spam: 2007-10-04_01
 Resent-Message-ID: <CkNsjqaKbqG.A.baG.3polHB at liszt>
 Resent-From: debian-user at lists.debian.org
 X-Mailing-List: <debian-user at lists.debian.org> archive/latest/508945
 X-Loop: debian-user at lists.debian.org
 List-Id: <debian-user.lists.debian.org>
 List-Post: <mailto:debian-user at lists.debian.org>
 List-Help: <mailto:debian-user-request at lists.debian.org?subject=help>
 List-Subscribe: <mailto:debian-user-request at lists.debian.org?subject=subscribe>
 List-Unsubscribe: <mailto:debian-user-request at lists.debian.org?subject=unsubscribe>
 Precedence: list
 Resent-Sender: debian-user-request at lists.debian.org
 Resent-Date: Wed, 23 Jan 2008 00:29:43 +0000 (UTC)
 X-UID: 
 Status: R
 X-Status: N
 X-KMail-EncryptionState: 
 X-KMail-SignatureState: 
 X-KMail-MDN-Sent:
 
  

