Qmail with Spamassassin (bogofilter to be installed) Question
bogo at escom.com
Sat Mar 25 04:59:20 CET 2006
Ryan Pagquil wrote:
> My user's emails directly goes to ~theirhome/Maildir/new or
> ~theirhome/Maildir/cur so I need to get those for my 1000+ users?
> Is there any simplified approach for this?
I did initial training with a sendmail server by making daily snapshots
of /var/spool/mail for a couple days:
cd /var/spool/mail
touch /tmp/hamdata
foreach i (*)
cat $i >> /tmp/hamdata
echo "" >> /tmp/hamdata
end
It might have been possible to just cat * > /tmp/hamdata, but I wanted to
be sure there were blank lines between the last message of one mailbox
and the first message of the next mailbox.
FTP it over to the bogofilter platform and feed it to bogofilter:
bogofilter -vn < hamdata
Bogofilter swallowed the whole hamdata file without spilling anything,
and I didn't have to look at any of the user messages (if that is a concern).
All you need is a stream of messages, each beginning with a "From xxx"
line and separated from the next by a blank line.
I don't know anything about the qmail spool directory format, but
you should be able to make a script that will create a file that
Bogofilter can scan. (Hmmm. Didn't look, maybe there's something
in contributions?)
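
For what it's worth, here is a rough sketch of what such a script might look
like. It assumes the usual Maildir layout (one message per file under new/
and cur/, with no "From " separator line). The function name
maildir_to_mbox and the placeholder "From " line are made up for
illustration; they don't come from bogofilter or qmail.

```shell
#!/bin/sh
# Hypothetical helper: turn one Maildir into an mbox-style stream on stdout.
# $1 is the path to a Maildir (the directory containing new/ and cur/).
maildir_to_mbox() {
    for msg in "$1"/new/* "$1"/cur/*; do
        # Skip unmatched globs (empty directories) and anything not a file.
        [ -f "$msg" ] || continue
        # Placeholder separator line; the sender and date are dummies.
        echo "From MAILER-DAEMON Thu Jan  1 00:00:00 1970"
        # Quote body lines that start with "From " so they are not
        # mistaken for message separators.
        sed 's/^From />From /' "$msg"
        # Blank line between messages.
        echo ""
    done
}
```

Something like `for d in /home/*/Maildir; do maildir_to_mbox "$d"; done > /tmp/hamdata`
would then build the file for every user in one pass, assuming the home
directories live under /home (adjust for your layout).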
This should give really good results if you're using 3-state classification
and reviewing Unsures before you send them to the final server. Otherwise,
you could get into a cycle of sending spam, wrongly classifying it as ham,
which would cause it to be sent again... and again. If I recall your
network correctly (SpamAssassin running in front), it's functionally like
our architecture. The more stuff SpamAssassin knocks down, the sooner you'll
converge and the fewer problems you'll have with misclassification cycles. But I
think the real key to the problem is manually classifying Unsures
on your front end filters, if that is possible.
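
To make the Unsure handling concrete, here is a small sketch of routing on
bogofilter's exit status, which is 0 for spam, 1 for ham, 2 for unsure and
3 for errors. The function name route_message is hypothetical, and it
accepts an alternate classifier command only so the sketch can be tried
without bogofilter installed. What you do with each result (deliver, drop,
hold for review) is up to your setup.

```shell
#!/bin/sh
# Hypothetical helper: print spam, ham, unsure or error for one message.
# $1 is the message file; any further arguments are the classifier
# command to run (defaults to plain "bogofilter").
route_message() {
    msg=$1
    shift
    [ $# -gt 0 ] || set -- bogofilter
    rc=0
    # Feed the message on stdin and capture the exit status.
    "$@" < "$msg" > /dev/null || rc=$?
    case $rc in
        0) echo spam ;;
        1) echo ham ;;
        2) echo unsure ;;   # hold these for manual review
        *) echo error ;;
    esac
}
```

Messages that come back as "unsure" are the ones to park in a review folder
and classify by hand before anything is registered as ham or spam.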
Al