Qmail with Spamassassin (bogofilter to be installed) Question

bogo at escom.com
Sat Mar 25 04:59:20 CET 2006


Ryan Pagquil wrote:
> My users' emails go directly to ~theirhome/Maildir/new or 
> ~theirhome/Maildir/cur, so do I need to collect those for my 1000+ users? 
> Is there a simpler approach for this?

I did the initial training on a sendmail server by taking daily snapshots 
of /var/spool/mail for a couple of days:

  cd /var/spool/mail
  touch /tmp/hamdata
  foreach i (*)
	cat "$i" >> /tmp/hamdata
	echo "" >> /tmp/hamdata
  end

It might have been possible to just cat * > /tmp/hamdata, but I wanted to 
be sure there were blank lines between the last message of one mailbox
and the first message of the next mailbox.

FTP it over to the bogofilter platform and feed it to bogofilter:

  bogofilter -vn < hamdata

Bogofilter swallowed the whole hamdata file without spilling anything,
and I didn't have to look at any of the user messages (if that is a concern).
All you need is a stream of messages, beginning with "From xxx" and
separated by blank lines.

I don't know anything about the qmail spool directory format, but 
you should be able to make a script that will create a file that 
Bogofilter can scan.  (Hmmm.  Didn't look, maybe there's something
in contributions?)
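
For the Maildir case, something along these lines might work as a starting
point.  This is only a sketch, not a tested tool: it assumes each file under
Maildir/new and Maildir/cur holds exactly one message, and it makes up a
"From MAILER-DAEMON" separator line for each one.  The function name and
the script name in the usage note are hypothetical.

```python
import os
import sys
import time


def maildir_to_mbox(maildir, out):
    """Append every message in maildir/new and maildir/cur to the binary
    stream `out` in mbox style. Returns the number of messages written."""
    count = 0
    for sub in ("new", "cur"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                body = f.read()
            # Made-up separator line; Maildir files do not carry one.
            stamp = time.asctime().encode("ascii")
            out.write(b"From MAILER-DAEMON " + stamp + b"\n")
            for line in body.splitlines(keepends=True):
                # Escape body lines that would look like a new message.
                if line.startswith(b"From "):
                    line = b">" + line
                out.write(line)
            if not body.endswith(b"\n"):
                out.write(b"\n")
            # Blank line between messages, as in the mbox snapshots above.
            out.write(b"\n")
            count += 1
    return count


if __name__ == "__main__":
    total = sum(maildir_to_mbox(d, sys.stdout.buffer) for d in sys.argv[1:])
    print(f"{total} messages written", file=sys.stderr)
```

Run as something like "python3 maildir2mbox.py /home/*/Maildir > hamdata",
the output could then be fed to bogofilter the same way as the snapshot
file.  It only reads the top-level inbox, so any per-user subfolders (a
spam folder, for instance) are left out of the ham data.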

This should give really good results if you're using 3-state classification 
and reviewing Unsures before you send them to the final server.  Otherwise,
you could get into a cycle of sending spam, wrongly classifying it as ham,
which would cause it to be sent again... and again.  If I recall your 
network correctly (SpamAssassin running in front), that's functionally like 
our architecture.  The more stuff SpamAssassin knocks down, the sooner you'll
converge and the fewer problems you'll have with misclassification cycles.  But I 
think the real key to the problem is manually classifying Unsures
on your front end filters, if that is possible.
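
As a rough illustration of what routing Unsures for manual review could
look like, here is a minimal sketch.  It relies on the exit statuses
documented in the bogofilter man page (0 spam, 1 non-spam, 2 unsure,
anything else an error) and assumes three-state classification is enabled
in the bogofilter configuration.  The function names and queue names are
made up for the example.

```python
import subprocess

# Exit statuses as documented in the bogofilter man page.
VERDICTS = {0: "spam", 1: "ham", 2: "unsure"}


def verdict_for(returncode):
    """Map a bogofilter exit status to a queue name; other codes are errors."""
    return VERDICTS.get(returncode, "error")


def classify(message, bogofilter="bogofilter"):
    """Pipe one message (bytes) through bogofilter and return its verdict."""
    proc = subprocess.run([bogofilter], input=message,
                          stdout=subprocess.DEVNULL)
    return verdict_for(proc.returncode)
```

Messages that come back "unsure" would be held for a person to look at
rather than forwarded or trained on automatically, which is what keeps a
misclassified spam from being fed back in as ham.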

Al


