mime question

David Relson relson at osagesoftware.com
Thu May 1 03:00:49 CEST 2003


At 07:30 PM 4/30/03, m at mo.optusnet.com.au wrote:

>David Relson <relson at osagesoftware.com> writes:
> > At 05:54 PM 4/30/03, Michael O'Reilly wrote:
> > >The bit inside the parentheses is the MTA doing a reverse lookup. The
> > >question is: Does the bit outside (user supplied) match the bit inside
> > >(from the DNS)? A mismatch is fairly frequent, but the bit outside
> > >is normally a subset of the bit inside. That's not data that
> > >bogofilter can currently detect.
> >
> > You could pretty easily have lexer_v3.l pass the whole Received: line to a
> > routine which creates a special token.
>
>That's what I was thinking.
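
A rough sketch of what such a routine might do, in Python rather than in the lexer itself. It assumes the common "from HELO (RDNS [IP])" layout of a Received: line; the function name and the "rcvd:*" token names are made up for illustration and are not anything bogofilter produces today.

```python
import re

# Assumed layout: "Received: from HELO (RDNS [IP]) ...".
# Other MTAs format this line differently; those fall through to "unparsed".
_RECEIVED_RE = re.compile(
    r"^Received:\s+from\s+(?P<helo>\S+)\s+"
    r"\((?P<rdns>[^\s\[\]]+)?\s*\[(?P<ip>[^\]]+)\]",
    re.IGNORECASE,
)


def received_token(line):
    """Return one hypothetical token describing HELO vs. reverse-DNS agreement."""
    m = _RECEIVED_RE.match(line)
    if m is None:
        return "rcvd:unparsed"
    helo = m.group("helo").lower().rstrip(".")
    rdns = (m.group("rdns") or "unknown").lower().rstrip(".")
    if rdns == "unknown":
        # The MTA could not resolve the connecting address.
        return "rcvd:unknown"
    if helo == rdns:
        return "rcvd:match"
    if helo in rdns:
        # The user-supplied name is a substring of the looked-up name.
        return "rcvd:subset"
    return "rcvd:mismatch"


if __name__ == "__main__":
    samples = [
        "Received: from mail.example.com (mail.example.com [192.0.2.1]) by mx",
        "Received: from mail (mail.example.com [192.0.2.1]) by mx",
        "Received: from yahoo.com (dsl-1.example.org [192.0.2.7]) by mx",
        "Received: from yahoo.com (unknown [192.0.2.7]) by mx",
    ]
    for s in samples:
        print(received_token(s), "<-", s)
```

The token would then be scored like any other word, so a frequent "rcvd:mismatch" in spam would show up in the word list without any special-case logic.
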
> >
> > Michael,
> >
> > FWIW, I just did a quick check on the spam scores of "may", "forged", and
> > "unknown" and got:
> >
> > [relson at osage backup.d]$ bogoutil -p $BOGOFILTER_DIR may forged unknown
> >                         spam    good  Gra prob  Rob prob
> > may                    3340    6005  0.573095  0.572728
> > forged                   12     199  0.127051  0.163150
> > unknown                5027    3000  0.801758  0.800925
>
>It's interesting that the numbers are so radically different.
>                        spam    good  Gra prob  Rob prob
>may                    8768    4982  0.537313  0.537128
>forged                 4151     645  0.809398  0.807506
>unknown                1564    3284  0.239110  0.239773
>
>I suspect a different MTA? (are you using sendmail or
>something else?)

postfix.

>At the risk of digressing, what are your top spam indicators?
>
>$ bogoutil -d spamlist.db | awk '{print $1}' | bogoutil -p . | sort -n +4 | tail -n6
>safelist               1235       8  0.990278  0.979093
>recurring              1459      11  0.988703  0.979247
>opted-in               1979      16  0.987896  0.980909
>x-list-unsubscribe     1514       6  0.994030  0.984779
>h4f                    1272       1  0.998810  0.987689
>x-info                 2220       6  0.995921  0.989547

Here're my top 20:

$ bogoutil -d spamlist.db | awk '{print $1}' | bogoutil -p . | sort -n +4 | tail -n20
raton                   586       2  0.998588  0.985428
boca                    588       2  0.998593  0.985477
opted                  1088      18  0.993192  0.986134
znex                    730       5  0.997170  0.986600
url:66.216              885       6  0.997199  0.988451
osagesoftware          1298      17  0.994603  0.988652
unsub.php               804       1  0.999485  0.989811
remove.asp              852       1  0.999514  0.990376
pbz                    1117       6  0.997779  0.990816
customerservice        1831      20  0.995495  0.991253
url:65.61              1184       3  0.998951  0.992357
bfntrfbsgjner          1120       0  1.000000  0.993013
m25                    1325       3  0.999063  0.993161
recurring              2427      16  0.997276  0.994055
postal                 3494      29  0.996573  0.994336
t.pl                   1625       3  0.999236  0.994412
x-id                   1589       1  0.999739  0.994801
f2.6                   2665      11  0.998293  0.995350
x-list-unsubscribe     2613       3  0.999525  0.996513
h4f                    2497       1  0.999834  0.996681

Interestingly, "x-list-unsubscribe", "h4f" and "recurring" are in both 
lists.  Also, my domain name (without the .com) is right up there, as is 
"boca raton".  Go figger!
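
For comparing two such lists without eyeballing them, something like the following might do. It is only a sketch: it assumes the five-column "token spam good Gra Rob" rows shown above, the helper names are invented, and the sample rows are copied from the two listings in this thread.

```python
def parse_rows(text):
    """Parse 'token spam good gra_prob rob_prob' rows; skip anything else."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # header or wrapped line
        token, spam, good, gra, rob = fields
        try:
            rows[token] = (int(spam), int(good), float(gra), float(rob))
        except ValueError:
            continue
    return rows


def top_tokens(rows, n):
    """Highest Rob prob last, like 'sort -n' on that column piped to 'tail'."""
    ranked = sorted(rows, key=lambda tok: rows[tok][3])
    return ranked[-n:]


MICHAEL = """\
safelist               1235       8  0.990278  0.979093
recurring              1459      11  0.988703  0.979247
opted-in               1979      16  0.987896  0.980909
x-list-unsubscribe     1514       6  0.994030  0.984779
h4f                    1272       1  0.998810  0.987689
x-info                 2220       6  0.995921  0.989547
"""

DAVID = """\
recurring              2427      16  0.997276  0.994055
postal                 3494      29  0.996573  0.994336
t.pl                   1625       3  0.999236  0.994412
x-id                   1589       1  0.999739  0.994801
f2.6                   2665      11  0.998293  0.995350
x-list-unsubscribe     2613       3  0.999525  0.996513
h4f                    2497       1  0.999834  0.996681
"""

if __name__ == "__main__":
    a, b = parse_rows(MICHAEL), parse_rows(DAVID)
    print("top 3 (first list):", top_tokens(a, 3))
    print("in both lists:", sorted(set(a) & set(b)))
```
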
