How do I filter out spam that turns up on mailing lists?

Nigel Henry cave.dnb at tiscali.fr
Sat Jan 26 19:14:54 CET 2008


On Saturday 26 January 2008 01:16, David Relson wrote:
> On Fri, 25 Jan 2008 20:22:45 +0100
> Nigel Henry wrote:
>
> ...[snip]...
>
> > Hi David. Meanwhile back at the ranch, I'm not really on my way to
> > creating this ignore.db. Not being one to give up (although a few
> > days have passed), here's how things stand at present.
> >
> > I already had an /etc/bogofilter.cf.example file, but also created
> > an /etc/bogofilter.cf file. I have added the following 2 lines to
> > this newly created file.
> >
> > wordlist i,ignore,ignore.db,1
> > wordlist r,word,wordlist.db,2
>
> good
>
> > Question 1:
> > Do entries in /etc/bogofilter.cf override default settings
> > in /etc/bogofilter.cf.example?
>
> bogofilter.cf.example is only an example.  It is not used by
> bogofilter.
>
> > Next I created a file named ignore_list.txt, and put the full headers
> > from one of my Debian list emails within.
> >
> > Now I ran the following command.
> > [djmons@localhost djmons]$ bogoutil -l ~/.bogofilter/ignore.db < ignore_list.txt
> > bogoutil: Unexpected input [ Received:] on line 2. Expecting whitespace before count.
> > read or write error, aborting.
> > [djmons@localhost djmons]$
>
> bogoutil expects lines containing 1 token, 2 counts, and a timestamp.
> It isn't smart enough to parse real headers.
>
> You could use the following to parse and import in a single command:
>
>    bogolexer < message.headers | bogoutil -l ignore.db
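
To illustrate the line format David describes above (one token, two counts, and a timestamp), here is a minimal Python sketch. The function name, the zero counts, and the sample tokens are placeholders invented for illustration; they are not part of bogofilter, and the exact field order should be checked against what `bogoutil -d` prints on your system.

```python
from datetime import date


def wordlist_lines(tokens, count_a=0, count_b=0, day=None):
    """Format tokens as 'token count count yyyymmdd' lines.

    Assumption: fields are separated by single spaces and the
    timestamp is written as yyyymmdd, as in the dump shown in this thread.
    """
    stamp = (day or date.today()).strftime("%Y%m%d")
    # Skip empty tokens and tokens containing whitespace, since a space
    # inside a token would be misread as a field separator.
    clean = [t for t in tokens if t and not any(c.isspace() for c in t)]
    return [f"{t} {count_a} {count_b} {stamp}" for t in clean]


if __name__ == "__main__":
    # Hypothetical tokens; real ones would come from the list headers.
    for line in wordlist_lines(["debian-user", "lists.debian.org"]):
        print(line)
```

The printed lines could be saved to a file and fed to `bogoutil -l`, which is the step that failed when it was given raw headers instead.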

OK, but I'm still rather clueless here. Anyway, I've run the commands below.

[djmons@localhost djmons]$ bogolexer < ignore_list.txt | bogoutil -l ~/.bogofilter/ignore.db
[djmons@localhost djmons]$ bogoutil -d .bogofilter/ignore.db
195 0 0 20080126
get_token: 220 0 20080126
normal 0 0 20080126
[djmons@localhost djmons]$

The ignore_list.txt file above contains the full headers from a Debian
mailing list email.

Does that output above look any better?

In an earlier post you said:

> What you _could_ do is create an ignore list with headers from the
> debian list.  This would eliminate those tokens from the scoring
> effectively telling bogofilter to score using only body tokens.

This is still what I'm looking for. It's not too easy to test the 
effectiveness of the ignore.db at the moment, as I only get the odd spam 
email from the Debian list, but if I can get it to work, it will be a job 
well done.

Thanks for your help with this problem.

Nigel.

btw: Sorry about the bad language in the previous post. I've sent a lot of
spam to spamcop, but with limited success. Someone on a list said that they
had had success with knujon, which does seem to have shut down many
spammers, but it is a horrendous problem. Filtering at the user end is OK,
but ideally spam needs to be stopped at source, and that is probably not
going to happen.
http://www.knujon.com/sendusspam.html




> David


