best way to create DBs

David N Murray dnm at jsbsystems.com
Tue Nov 18 14:18:36 CET 2003


In my environment, I'm doing 'global' filtering for the whole company.
I keep a copy of every message that gets delivered, because there's no
reasonable way to get the headers out of Outlook and Outlook Express.
The users forward their spam to spam at example.com, and a nightly
process matches each forwarded message to its original so the headers
can be recovered.  The copy of the original message then gets fed back
into bogofilter (-s).
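
For illustration only, here is a minimal Python sketch of how such a
nightly matching step could work.  It is not my actual process: matching
on a normalized Subject line is an assumption, and the names
subject_key, index_originals and find_originals are made up for this
example.

```python
import re
from email import message_from_string

# Assumption: forwards are recognizable by "FW:" / "Fwd:" subject prefixes.
_FORWARD_PREFIX = re.compile(r"^\s*((fw|fwd)\s*:\s*)+", re.IGNORECASE)


def subject_key(subject):
    """Drop forward prefixes, collapse whitespace, and lowercase."""
    stripped = _FORWARD_PREFIX.sub("", subject or "")
    return " ".join(stripped.split()).lower()


def index_originals(raw_originals):
    """Map normalized subject -> list of archived raw messages."""
    index = {}
    for raw in raw_originals:
        msg = message_from_string(raw)
        index.setdefault(subject_key(msg["Subject"]), []).append(raw)
    return index


def find_originals(raw_forward, index):
    """Return the archived originals whose subject matches the forward."""
    fwd = message_from_string(raw_forward)
    return index.get(subject_key(fwd["Subject"]), [])
```

A subject can match more than one archived message, so the function
returns a list; a real job would probably narrow it down further (by
date or body text, say) before registering anything as spam.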

Since I have a copy of every message, for the first couple of days I
treated that file as an mbox and used pine to sort the messages into
good and bad folders.  I then fed the entire folders into bogofilter to
construct the DB.
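
As a rough sketch of that last step, assuming the good and bad folders
are mbox files and bogofilter is on the PATH, the registration loop
could look like this in Python.  The function name train_folder and the
injectable run parameter are made up for the example; only the -s and -n
options come from bogofilter itself.

```python
import mailbox
import subprocess


def train_folder(mbox_path, flag, run=subprocess.run):
    """Register every message in an mbox folder with bogofilter.

    flag is "-s" (spam) or "-n" (non-spam).  `run` defaults to
    subprocess.run and can be swapped out to try the loop without
    bogofilter installed.
    """
    if flag not in ("-s", "-n"):
        raise ValueError("flag must be -s or -n")
    count = 0
    for msg in mailbox.mbox(mbox_path):
        # One bogofilter invocation per message, fed on stdin.
        run(["bogofilter", flag], input=msg.as_bytes(), check=True)
        count += 1
    return count
```

Usage would be along the lines of train_folder("bad", "-s") followed by
train_folder("good", "-n"), where the folder names are placeholders.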

Yes, it's extremely time-consuming.  Out of sheer boredom, I replaced
pine with a perl script that shows me some of the headers and the first
couple of lines of each email, then routes it to bogofilter with the
appropriate designation (-n or -s).  After I got through the first
thousand or so, I modified the perl script to pass each message through
bogofilter first (to rate it) and then show me how bogofilter scored it,
to help me decide what is spam and what is ham.
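
The perl script itself isn't included here, but the idea can be sketched
in Python along these lines.  Treat it as an illustration under
assumptions: the function names are invented, and the mapping of
bogofilter's exit status (0 spam, 1 non-spam, 2 unsure) is taken from
the bogofilter man page, so check it against your version.

```python
import subprocess
from email import message_from_bytes

# Exit statuses as described in the bogofilter man page (assumption:
# unchanged in the installed version); anything else counts as an error.
VERDICTS = {0: "spam", 1: "ham", 2: "unsure"}


def preview(raw, headers=("From", "To", "Subject"), body_lines=5):
    """Return a few headers plus the first non-blank body lines."""
    msg = message_from_bytes(raw)
    lines = ["%s: %s" % (h, msg.get(h, "")) for h in headers]
    part = msg
    while part.is_multipart():
        part = part.get_payload(0)  # descend into the first sub-part
    payload = part.get_payload(decode=True) or b""
    text = payload.decode("latin-1", errors="replace")
    body = [ln for ln in text.splitlines() if ln.strip()]
    return lines + body[:body_lines]


def suggest(raw, run=subprocess.run):
    """Ask bogofilter to classify the message and return its verdict."""
    result = run(["bogofilter"], input=raw)
    return VERDICTS.get(result.returncode, "error")


def register(raw, is_spam, run=subprocess.run):
    """Register the message as spam (-s) or non-spam (-n)."""
    run(["bogofilter", "-s" if is_spam else "-n"], input=raw, check=True)
```

The human still makes the final call: suggest() only supplies a hint
next to the preview, and register() is what actually trains the DB.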

The site I'm supporting gets about 22K msgs/day with >65% spam.  It took
me about 30 hours to build the DB.  That's the tough part about training:
you have to train it.

HTH,
Dave

On Nov 18, Alessandro de Manzano scribed:

> Hello,
>
> I'm quite new to Bogofilter, and I'm very interested in deploying it at
> my company.
>
> I've read the docs and FAQs but I still have an unanswered question, so
> I'm posting it here ;)
>
> My mail server's setup is quite common: a public-IP FreeBSD 4.9 machine
> running Postfix as MTA and qpopper for POP3. Behind it, a bunch of users
> poll it with a wide range of MUAs (OE5, OE6, Eudora, PMMail, Netscape
> Mail, etc.) from everywhere on the Net. I'm already running some
> anti-spam measures using Postfix features, such as a few regexps, some
> DNS-RBLs, and removal of executable attachments.
>
> My problem is that with this setup I don't know how to (can't?) create
> a "good database", since I don't really have a true message base: all
> mail is only in transit on my server (good and bad mixed, since some
> spam bypasses the static filters).
> If I had, as in my home setup, a bunch of mailboxes (standard UNIX
> mailboxes read with Mutt or similar), I could create the "good" and
> "bad" message databases Bogofilter needs.
>
> Since I guess my office setup is quite common, I'd like to ask how you
> solved this problem.
>
> I also thought about making a copy of _every_ mail my Postfix sees
> (the always_bcc keyword), but IMHO this is not a solution since spam
> would be copied too...
>
>
> Could someone please enlighten me?
> What am I missing? :)
>
>
> Many thanks in advance!!
>
>
>
>
> Alessandro de Manzano
>
> Senior Network Manager
> Playstos - TIMA S.p.A.
> Corso Sempione 63
> 20149 Milano, Italy
>
> tel.: +39-023314153
> fax: +39-02315678
> email: demanzano at playstos.com
>
> http://www.playstos.com
>



