bogofilter setup in multi-user

Tom Anderson tanderso at oac-design.com
Wed Jun 30 14:18:22 CEST 2004


On Tue, 2004-06-29 at 19:08, .rp wrote:
> > > :0H
> > > *  ^Subject:.*(p0rn| pr0n| HGH |ha1f| curn |GB2312)|\
> > >   ^Subject:.*(paris hilton|p4ris|par1s|p4r1s )|\
> > >   ^Subject:.*(iagra |@gra|1agra|lagra|Cialis)|\
> > > /dev/null
> > 
> > This is dangerous... what if you have legitimate emails that talk about
> > these topics... like emails in this list?
> > 
> so? the subject in the header is checked, not the body.

Still, the same comment applies.  People may write perfectly innocent
emails which will get shafted by this rule.  E.g., consider these subjects:

recent influx of viagra spam increased my server load 200%
pics of my trip to niagra falls
I need help solving the lagrange equation for this surface
This diet pill has HGH in it, is that safe?
here's that diagra m we discussed in the meeting
network gaming is a flagrant misuse of company resources
I heard bob is on viagra
does Cialis work the same as viagra?
employees surfing for "pr0n" at work
etc

Are you sure none of your users are going to receive legitimate emails
with subjects such as these?
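
One quick way to check a candidate subject against the rule is to
approximate the recipe's condition outside procmail.  The sketch below is
a rough Python translation of the three quoted Subject conditions; it
assumes Python's re module behaves closely enough to procmail's
egrep-style, case-insensitive matching for these patterns, and the
function name would_discard is made up for illustration.

```python
import re

# Rough Python translation of the quoted recipe's Subject conditions.
# procmail matches case-insensitively by default, hence re.IGNORECASE.
# Assumption: Python's regex dialect is close enough for these patterns.
SUBJECT_RULE = re.compile(
    r"^Subject:.*(p0rn| pr0n| HGH |ha1f| curn |GB2312"
    r"|paris hilton|p4ris|par1s|p4r1s "
    r"|iagra |@gra|1agra|lagra|Cialis)",
    re.IGNORECASE,
)


def would_discard(subject: str) -> bool:
    """Return True if the approximated rule matches this subject line."""
    return SUBJECT_RULE.search("Subject: " + subject) is not None


if __name__ == "__main__":
    candidates = [
        "pics of my trip to niagra falls",
        "I need help solving the lagrange equation for this surface",
        "network gaming is a flagrant misuse of company resources",
        "meeting moved to 3pm",
    ]
    for subject in candidates:
        verdict = "DROPPED" if would_discard(subject) else "kept"
        print(f"{verdict:8} {subject}")
```

Running a sample of real, known-good subjects from the mail logs through
a check like this before pointing a rule at /dev/null would give at least
a rough idea of the false-positive rate.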

> > > #no legit mail should have "|" in its subject
> > > :0H
> > > * ^Subject:.*(\|)
> > > /var/spool/mail/junkbox
> > 
> > Why not?  I don't see anything in the RFC banning it.  Do you often receive
> > spams with a pipe in the subject?  This seems odd to me.
> >
> who said anything about the RFC? this is real world. And yes, we did get tons of it, 
> otherwise it wouldn't have made an issue.

Well, making random characters illegal would make no sense to someone
sending an email to your users, and they would have no idea that they
would be filtered because of it.  It's not "standard" practice.

> > > :0H:
> > > * ^Subject:.*(â|í|Ò|Æ|¶|¯|º|Í|Á|ª|Í|¨|È|«|¿|ª|Í|¨|Ë|ë|ä|ö|ü|ï|é|¡|ã|ò)
> > > /dev/null
> > 
> > Again, dangerous.
> >
> Not for us, but yes if you did run a foreign language shop you should probably adjust 
> it to your purposes.

You don't need to "run a foreign language shop" for this to be
dangerous.  All you need is someone copying and pasting something from
an external document in a different format or from the web to get some
weird characters.  They may not even notice, or they just won't care, or
they'll even think it's cool, and try to send it anyway.  Or what about
people who accent certain English words like "resume"?  Trying to be
correct and thorough, they'll end up in the bit bucket with no clue why,
and not even a bounce message telling them so.  
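
To make the accent case concrete, here is a small Python sketch.  It uses
only a subset of the characters from the quoted rule, and it assumes the
subject reaches the filter as raw characters rather than as an RFC 2047
encoded word; the name hits_accent_rule is hypothetical.

```python
import re

# Subset of the accented characters listed in the quoted rule
# (illustrative only, not the full list).
ACCENT_RULE = re.compile(r"^Subject:.*[éäöüïë]")


def hits_accent_rule(subject: str) -> bool:
    """Return True if the subject contains one of the listed characters."""
    return ACCENT_RULE.search("Subject: " + subject) is not None


if __name__ == "__main__":
    for subject in ("Updated résumé attached", "Updated resume attached"):
        print(hits_accent_rule(subject), subject)
```

A single accented letter is enough to match, and the sender never learns
the mail was discarded.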

This is dangerous because it does not conform to a standard, and it is
undocumented as far as general email users are concerned.  Furthermore,
you yourself can't even determine how many legitimate emails you are
losing this way.  It would be far safer to consider such characters
within bogofilter's calculation so that one or two such characters don't
kill an entire email.
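
As a rough illustration of why a statistical score is more forgiving than
a hard rule, here is a simplified Python sketch of a Robinson-style
geometric-mean combination of per-token spam probabilities.  This is not
bogofilter's actual code, and the token probabilities are invented for
the example; the point is only that one very "spammy" token among
otherwise innocent ones does not decide the outcome by itself.

```python
import math


def combined_score(token_probs):
    """Combine per-token spam probabilities (each strictly between 0 and 1).

    Simplified geometric-mean sketch, NOT bogofilter's implementation.
    Returns a value in (0, 1); higher means more spam-like.
    """
    n = len(token_probs)
    # Geometric means are computed in log space to avoid underflow.
    spam_side = 1.0 - math.exp(sum(math.log(1.0 - p) for p in token_probs) / n)
    ham_side = 1.0 - math.exp(sum(math.log(p) for p in token_probs) / n)
    return (1.0 + (spam_side - ham_side) / (spam_side + ham_side)) / 2.0


if __name__ == "__main__":
    # Hypothetical values: one odd character (0.99) among ordinary words.
    mostly_innocent = [0.99, 0.2, 0.1, 0.2, 0.15]
    # Hypothetical values: every token looks spammy.
    all_spammy = [0.99, 0.95, 0.9, 0.97, 0.99]
    print(round(combined_score(mostly_innocent), 3))
    print(round(combined_score(all_spammy), 3))
```

With these made-up numbers the first message stays below the midpoint of
0.5 while the second scores well above it, which is the behaviour you
want from a filter that weighs evidence rather than matching one pattern.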
 
> > > Then we go through and see if it is for someone with special needs in
> > > their filtering
> > > #if for Rick -  do a special run for them
> > 
> > Why not specify these rules in each users own procmailrc file and specify
> > their bogofilter settings in their bogofilter.cf file?  Managing this in a
> > single large file could be burdensome with a much larger set of users.
> >
> Depends - if the user already needs and has a shell account, then a .procmail is 
> setup in their directories. But if they don't , we handle it in one central location. This 
> actually turned out to be much easier than handling lots of individual files. In addition, 
> we already had a procmailrc setup to handle other email flows unrelated to BF.

I'm sure it works well for a few users.  However, if you had lots of
them, it could become a management problem.  This is true of any large
config file such as httpd.conf, named.conf, or virtusertable.  This is
why these configs are usually split up into multiple files these days. 
With more than a few users, I'd imagine that using each individual's own
config files would be easier.

> > It seems like you could just eliminate most of these procmail rules and rely
> > solely on bogofilter instead.  Bogofilter should handle most of the obvious
> > spam terms like "pr0n" and whatnot, and then you wouldn't have to maintain
> > your own ad hoc list which will likely go out of date very quickly.
> > 
> We see no need to have bogofilter handle the obvious spam anymore and would 
> prefer to spend computer resources on those that are not obvious.
> As for using /dev/null - those rules that got it 100% after a 3 month testing period 
> were assigned /dev/null, the ones that didn't get placed into folders that are rotated 
> every day and kept for a month.

That's fine for the 3 months you tested it.  But what about 3 months
from now, or a year from now?  The mail matching these rules might be
completely different by then, but you won't know that you're throwing
away legitimate mail.  It's safer to run it all through bogofilter.  It
really does not use many resources.  At least in my opinion, adding more
RAM and upgrading a CPU would be far cheaper than losing vital email.

Thanks for sharing your config.  I hope it is successful for you, but
it's good to be aware of the potential for trouble.

Tom




