mime question

David Relson relson at osagesoftware.com
Thu May 1 01:16:35 CEST 2003


At 05:54 PM 4/30/03, Michael O'Reilly wrote:

>>>Check the Received lines and if the name doesn't match the reversed
>>>domain name add 'markup:forged'.
>>IMHO, there is no room for such features as DNS lookups in bogofilter.
>>Bogofilter is meant as a statistical text processor with data base back
>>end. Again, these DNS checks can be requested in SpamAssassin. No need
>>to reimplement this. (It would make bogofilter dependent on the
>>performance of external systems, and may make bogofilter unable to
>>decide if a message is spam or not when the DNS client cannot complete
>>its work.)
>
>I wasn't looking to use DNS for this. The MTA already does the DNS
>lookup for us. The only question is, does it match.
>
>Received: from bsd1.extrafilm.com.au (bsd1.extrafilm.com.au [203.52.211.2])
>         by mail012.syd.optusnet.com.au (8.11.6p2/8.11.6) with ESMTP id
>
>The bit inside the parentheses is the MTA doing a reverse lookup. The
>question is: Does the bit outside (user supplied) match the bit inside
>(from the DNS)? A mismatch is fairly frequent, but the bit outside
>is normally a subset of the bit inside. That's not data that
>bogofilter can currently detect.

You could pretty easily have lexer_v3.l pass the whole Received: line to a 
routine which creates a special token.
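
A minimal sketch of what such a routine might do, written in Python rather
than bogofilter's C for brevity. The function name received_token and the
token names (rcvd:match, rcvd:partial, rcvd:mismatch, rcvd:unknown) are made
up for illustration and are not part of bogofilter. It only looks at the
first line of the header and assumes the common "from NAME (RDNS [IP])"
layout, which not every MTA produces:

```python
import re

# Hypothetical helper, not bogofilter code: map one Received: header to a
# single synthetic token describing how the claimed name (outside the
# parentheses) relates to the name the MTA looked up (inside them).
RECEIVED_RE = re.compile(
    r"^Received:\s+from\s+(?P<helo>\S+)\s+"
    r"\((?P<rdns>[^\s\[\]()]+)?\s*\[(?P<ip>[^\]]+)\]",
    re.IGNORECASE,
)


def _is_label_subset(short, long):
    """True if `short` is a leading or trailing run of labels of `long`."""
    return long.startswith(short + ".") or long.endswith("." + short)


def received_token(header):
    """Return a token for a Received: line, or None if it does not parse."""
    m = RECEIVED_RE.match(header)
    if m is None:
        return None
    helo = m.group("helo").lower().rstrip(".")
    rdns = (m.group("rdns") or "").lower().rstrip(".")
    if not rdns or rdns == "unknown":
        # The MTA recorded no reverse name for the connecting IP.
        return "rcvd:unknown"
    if helo == rdns:
        return "rcvd:match"
    if _is_label_subset(helo, rdns) or _is_label_subset(rdns, helo):
        # One name is contained in the other, e.g. "mail" vs "mail.example.com".
        return "rcvd:partial"
    return "rcvd:mismatch"
```

The lexer would then hand the returned string to the tokenizer like any
other word, so the usual statistics decide how much a mismatch is worth.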

>>>My casual glance at my corpus indicates that all these things are
>>>spam indicators (not certain, but they do mean something) and they're
>>>currently not clues that bogofilter can see.
>>Indeed they're not, but bogofilter will extract the IP or a (forged)
>>address and see it as a token that its score is based upon.
>
>Of course, but that doesn't help much if this is the only message
>that's come from that IP address (as is very common with spam
>originating from dial-up or DSL ranges).


Michael,

FWIW, I just did a quick check on the spam scores of "may", "forged", and 
"unknown" and got:

[relson@osage backup.d]$ bogoutil -p $BOGOFILTER_DIR may forged unknown
                        spam    good  Gra prob  Rob prob
may                    3340    6005  0.573095  0.572728
forged                   12     199  0.127051  0.163150
unknown                5027    3000  0.801758  0.800925

"unknown" is the strongest spam indicator of the three words (for me).
Then I ran "grep -w forged 2003.04/mail.*" and got a variety of results.
At least one site shows "may be forged" on mail that is not forged.
Sample follows:

 From jobhunter at flipdog.com  Wed Apr  2 05:58:24 2003
Return-Path: <jobhunter at flipdog.com>
Received: from fs1.i.flipdog.com (ftp.flipdog.com [63.121.30.222])
         by example.com (Postfix) with ESMTP id 3AFEB27ECE
         for <relson at example.com>; Wed,  2 Apr 2003 05:58:24 -0500 (EST)
Received: from mail.flipdog.com (app1.i.flipdog.com [192.168.24.46])
         by fs1.i.flipdog.com (Postfix) with ESMTP id 53036231C3
         for <relson at example.com>; Wed,  2 Apr 2003 03:58:21 -0700 (MST)
Received: from app1.i.flipdog.com (localhost.i.flipdog.com [127.0.0.1] (may be forged))
         by mail.flipdog.com with SMTP (8.8.6 (PHNE_17190)/8.7.1) id DAA09718
         for <relson at example.com>; Wed, 2 Apr 2003 03:58:19 -0700 (MST)
From: jobhunter at flipdog.com

Of course, my token scores and your token scores are expected to vary.

David

P.S.  I've CC'd the mailing list since it's up and working.





