mime question
David Relson
relson at osagesoftware.com
Thu May 1 01:16:35 CEST 2003
At 05:54 PM 4/30/03, Michael O'Reilly wrote:
>>>Check the Received lines and if the name doesn't match the reversed
>>>domain name add 'markup:forged'.
>>IMHO, there is no room for such features as DNS lookups in bogofilter.
>>Bogofilter is meant as a statistical text processor with data base back
>>end. Again, these DNS checks can be requested in SpamAssassin. No need
>>to reimplement this. (It would make bogofilter dependent on the
>>performance of external systems, and may make bogofilter unable to
>>decide if a message is spam or not when the DNS client cannot complete
>>its work.)
>
>I wasn't looking to use DNS for this. The MTA already does the DNS
>lookup for us. The only question is, does it match.
>
>Received: from bsd1.extrafilm.com.au (bsd1.extrafilm.com.au [203.52.211.2])
> by mail012.syd.optusnet.com.au (8.11.6p2/8.11.6) with ESMTP id
>
>The bit inside the parentheses is the MTA doing a reverse lookup. The
>question is: does the bit outside (user supplied) match the bit inside
>(from the DNS)? A mismatch is fairly frequent, but the bit outside
>is normally a subset of the bit inside. That's not data that
>bogofilter can currently detect.
You could pretty easily have lexer_v3.l pass the whole Received: line to a
routine which creates a special token.
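As a rough illustration of such a routine (in Python rather than in the lexer itself), the sketch below compares the name outside the parentheses with the name inside and returns a single token. The token names (rcvd:match, rcvd:partial, rcvd:mismatch, rcvd:no-rdns) and the function name are hypothetical, and "subset" is read here simply as one name containing the other; none of this is existing bogofilter behaviour.

```python
import re

# Matches the "from HELO (RDNS [IP])" part of a Received line.
# The reverse-DNS name is optional because some MTAs write only "([IP])".
RECEIVED_RE = re.compile(
    r"from\s+(?P<helo>\S+)\s+\((?P<rdns>[^\s\[\]()]+)?\s*\[(?P<ip>[^\]]+)\]"
)

def received_token(line):
    """Return a hypothetical special token for one Received line, or None."""
    m = RECEIVED_RE.search(line)
    if m is None:
        return None  # not in the "from name (name [ip])" form
    helo = m.group("helo").lower().rstrip(".")
    rdns = (m.group("rdns") or "").lower().rstrip(".")
    if not rdns or rdns == "unknown":
        return "rcvd:no-rdns"   # the MTA found no reverse name
    if helo == rdns:
        return "rcvd:match"     # claimed name equals looked-up name
    if helo in rdns or rdns in helo:
        return "rcvd:partial"   # one name is contained in the other
    return "rcvd:mismatch"

if __name__ == "__main__":
    print(received_token(
        "Received: from bsd1.extrafilm.com.au "
        "(bsd1.extrafilm.com.au [203.52.211.2])"
    ))
```

No DNS lookup happens here; the routine only uses what the MTA already wrote into the header.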
>>>My casual glance at my corpus indicates that all these things are
>>>spam indicators (not certain, but they do mean something) and they're
>>>currently not clues that bogofilter can see.
>>Indeed they're not, but bogofilter will extract the IP or a (forged)
>>address and see it as a token that its score is based upon.
>
>Of course, but that doesn't help much if this is the only message
>that's come from that IP address (as is very common with spam
>originating from dial-up or DSL ranges).
Michael,
FWIW, I just did a quick check on the spam scores of "may", "forged", and
"unknown" and got:
[relson at osage backup.d]$ bogoutil -p $BOGOFILTER_DIR may forged unknown
                spam   good  Gra prob  Rob prob
may             3340   6005  0.573095  0.572728
forged            12    199  0.127051  0.163150
unknown         5027   3000  0.801758  0.800925
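For anyone wondering where the "Gra prob" figures come from: they are consistent with a plain count-ratio estimate, p = (spam/Nspam) / (good/Ngood + spam/Nspam). The sketch below is only a check of that arithmetic. The message totals are not shown in the output above, so the ratio Ngood/Nspam of about 2.4136 is an assumption obtained by fitting the three rows, and the function name is made up. The "Rob prob" column is not reproduced here.

```python
# Assumed ratio of good messages to spam messages in the wordlist.
# It is NOT given in the bogoutil output; it was fitted to the three rows.
GOOD_PER_SPAM_MSGS = 2.4136

def ratio_prob(spam, good, good_per_spam_msgs=GOOD_PER_SPAM_MSGS):
    """Count-ratio spam probability for one token.

    Algebraically the same as (spam/Nspam) / (good/Ngood + spam/Nspam),
    with both sides multiplied by Ngood.
    """
    scaled_spam = spam * good_per_spam_msgs
    return scaled_spam / (good + scaled_spam)

if __name__ == "__main__":
    # (token, spam count, good count, "Gra prob" as listed above)
    rows = [
        ("may", 3340, 6005, 0.573095),
        ("forged", 12, 199, 0.127051),
        ("unknown", 5027, 3000, 0.801758),
    ]
    for token, spam, good, listed in rows:
        print(token, round(ratio_prob(spam, good), 6), listed)
```

This is why "may" sits above 0.5 despite having more good hits than spam hits: under the fitted ratio the database holds well over twice as many good messages as spam.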
"unknown" is the strongest spam indicator of the three words (for
me). Then I ran "grep -w forged 2003.04/mail.*" and got a variety of
differing results. At least one site shows "may be forged" and is
not. Sample follows:
From jobhunter at flipdog.com Wed Apr 2 05:58:24 2003
Return-Path: <jobhunter at flipdog.com>
Received: from fs1.i.flipdog.com (ftp.flipdog.com [63.121.30.222])
        by example.com (Postfix) with ESMTP id 3AFEB27ECE
        for <relson at example.com>; Wed, 2 Apr 2003 05:58:24 -0500 (EST)
Received: from mail.flipdog.com (app1.i.flipdog.com [192.168.24.46])
        by fs1.i.flipdog.com (Postfix) with ESMTP id 53036231C3
        for <relson at example.com>; Wed, 2 Apr 2003 03:58:21 -0700 (MST)
Received: from app1.i.flipdog.com (localhost.i.flipdog.com [127.0.0.1] (may
        be forged))
        by mail.flipdog.com with SMTP (8.8.6 (PHNE_17190)/8.7.1) id DAA09718
        for <relson at example.com>; Wed, 2 Apr 2003 03:58:19 -0700 (MST)
From: jobhunter at flipdog.com
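Note that the annotation in this sample is folded across two lines, so a line-by-line search for the whole phrase "may be forged" would miss it. A minimal sketch of a header-aware check, using Python's standard email package, is below. The function name and the shortened sample message are made up for illustration.

```python
from email import message_from_string

def forged_hops(raw_message):
    """Return the Received headers that carry a "may be forged" note.

    Each header is unfolded first, so a note split across
    continuation lines is still found.
    """
    msg = message_from_string(raw_message)
    hops = []
    for value in msg.get_all("Received", []):
        flat = " ".join(value.split())  # collapse folds and runs of spaces
        if "may be forged" in flat:
            hops.append(flat)
    return hops

# Shortened, hypothetical version of the sample above.
SAMPLE = (
    "Received: from fs1.i.flipdog.com (ftp.flipdog.com [63.121.30.222])\n"
    "\tby example.com (Postfix) with ESMTP id 3AFEB27ECE\n"
    "Received: from app1.i.flipdog.com (localhost.i.flipdog.com [127.0.0.1] (may\n"
    "\tbe forged))\n"
    "\tby mail.flipdog.com with SMTP id DAA09718\n"
    "Subject: example\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    for hop in forged_hops(SAMPLE):
        print(hop)
```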
Of course, my token scores and your token scores are expected to vary.
David
P.S. I've CC'd the mailing list since it's up and working.