info about spam messages

David Relson relson at osagesoftware.com
Sat Jun 19 21:47:46 CEST 2004


On Sat, 19 Jun 2004 12:24:03 -0700
Charlie Wyble wrote:

> 100% agree with NOT adding the from field in the equations.  I have 
> gotten untold numbers of bounce messages that I DID NOT SEND, but my 
> address had been placed in the from address.  I would NOT want my
> email to end up getting flagged as a spammer just because of things
> like this.
> 
> Charlie

Hi Charlie,

You're a bit off target here (I think).  The discussion hasn't been
whether to include the From: line in the scoring.  The discussion has
been whether bogofilter should make additional characteristics of an
email available for (optional) addition to the X-Bogosity: line and/or
the system logging record.  '%I' was added to make the originating IP
address available (though the implementation may be incorrect for some
MTAs).  Whether to make the From: address available through an (optional)
'%F' has been the main subject of the discussion.

People have generally been against adding '%F' because that field is
easily and commonly forged.

FWIW, the tokens in the From: line are already parsed and appear as
"from:Charlie", "from:Wyble", etc., in the wordlist.
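To illustrate the idea of header-specific prefixes, here is a simplified Python sketch. It is not bogofilter's lexer. The prefix table is inferred from the prefixes that appear in the wordlist listing further down (from:, to:, subj:, rtrn:, rcvd:, head:). Mapping rtrn: to Return-Path, rcvd: to Received, and everything else to head: is an assumption, and the word pattern is a hypothetical stand-in.

```python
import re

# Hypothetical header-to-prefix table, inferred from the wordlist listing;
# not copied from bogofilter's source.
PREFIXES = {
    "from": "from:",
    "to": "to:",
    "subject": "subj:",
    "return-path": "rtrn:",   # assumption
    "received": "rcvd:",      # assumption
}

# Hypothetical word pattern: a letter followed by letters, digits, ' _ or -.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9'_-]*")


def header_tokens(header: str, value: str) -> list[str]:
    """Tag each word of a header value with a header-specific prefix.

    Case is preserved, so "Relson" and "relson" stay distinct tokens.
    Headers not in PREFIXES fall back to "head:" (an assumption).
    """
    prefix = PREFIXES.get(header.lower(), "head:")
    return [prefix + word for word in WORD.findall(value)]


print(header_tokens("From", "Charlie Wyble"))  # ['from:Charlie', 'from:Wyble']
```

Because the prefix is part of the token, "from:Relson" and "subj:Relson" accumulate separate spam/good counts, which is why the same name can score very differently depending on where it appears.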

Out of curiosity, I took a look at _my_ wordlist to see how my name
fares in the ham/spam scoring.  Here's what I found:

[relson@osage spam-fixups]$ bogoutil -p wordlist.h.08/wordlist.db 
   RELSON Relson from:Relson from:relson head:RELSON head:Relson
   head:relson rcvd:RELSON rcvd:Relson rcvd:relson relson rtrn:Relson
   rtrn:relson subj:RELSON subj:Relson subj:relson to:RELSON to:Relson
   to:relson 

                                 spam    good    Fisher
RELSON                            230     206  0.572436
Relson                           1708    7515  0.214170
from:Relson                        16    4953  0.003861
from:relson                        37    8102  0.005447
head:RELSON                        37      97  0.313875
head:Relson                         8     876  0.010843
head:relson                     13436   66726  0.194496
rcvd:RELSON                       241     199  0.592204
rcvd:Relson                         1      17  0.066338
rcvd:relson                     12656   65063  0.189138
relson                           1521    6469  0.219935
rtrn:Relson                         7       0  0.998783
rtrn:relson                       561    4787  0.123216
subj:RELSON                        15       2  0.899538
subj:Relson                       670      10  0.987694
subj:relson                       393    1139  0.292664
to:RELSON                         456     149  0.785852
to:Relson                        1931    5478  0.297110
to:relson                        9437   13654  0.453190

As you can see, location and capitalization strongly affect which forms
of "relson" are hammish and which are spammish.
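How do raw counts turn into the Fisher column? A minimal Python sketch, assuming the column is Gary Robinson's smoothed estimate f(w), which bogofilter's Fisher method builds on. The post does not show the wordlist's total spam and ham message counts, so spam_msgs and good_msgs are placeholders. The values s = 0.0178 and x = 0.52 are hypothetical. They were picked only because they reproduce the rtrn:Relson row, where a good count of 0 makes the message totals irrelevant. They are not taken from the author's configuration.

```python
def robinson_fw(spam: int, good: int, spam_msgs: int, good_msgs: int,
                s: float = 0.0178, x: float = 0.52) -> float:
    """Smoothed spam probability f(w) for one token (Robinson's formula).

    spam, good           -- messages of each class containing the token
    spam_msgs, good_msgs -- total messages of each class (not given in the post)
    s, x                 -- prior strength and prior probability (hypothetical)
    """
    n = spam + good
    if n == 0:
        return x  # an unseen token falls back to the prior
    spam_rate = spam / spam_msgs
    good_rate = good / good_msgs
    p = spam_rate / (spam_rate + good_rate)  # class-size-normalized ratio
    return (s * x + n * p) / (s + n)         # pull p toward x when n is small


# rtrn:Relson has good == 0, so p == 1 whatever the totals are.
print(round(robinson_fw(7, 0, spam_msgs=1000, good_msgs=1000), 6))  # 0.998783
```

The smoothing explains why rtrn:Relson, seen in only 7 messages, scores just below 1.0 rather than exactly 1.0. With a small n, the prior x still has a slight pull.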

Also, since my configuration includes min_dev=0.435, any token whose
score lies within 0.435 of 0.5 (i.e., between 0.065 and 0.935) is
ignored, and that covers most of the variations.  If I'm counting
properly, only 5 of the 19 forms won't be excluded by min_dev.
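The count can be checked mechanically. This Python sketch assumes that min_dev excludes any token whose score lies within min_dev of the neutral 0.5, and it applies that rule to the Fisher scores from the listing. Whether the boundary comparison is >= or > is an assumption, but it doesn't matter here because no score lies exactly on the boundary.

```python
# Fisher scores copied from the wordlist listing above.
SCORES = {
    "RELSON": 0.572436, "Relson": 0.214170,
    "from:Relson": 0.003861, "from:relson": 0.005447,
    "head:RELSON": 0.313875, "head:Relson": 0.010843, "head:relson": 0.194496,
    "rcvd:RELSON": 0.592204, "rcvd:Relson": 0.066338, "rcvd:relson": 0.189138,
    "relson": 0.219935,
    "rtrn:Relson": 0.998783, "rtrn:relson": 0.123216,
    "subj:RELSON": 0.899538, "subj:Relson": 0.987694, "subj:relson": 0.292664,
    "to:RELSON": 0.785852, "to:Relson": 0.297110, "to:relson": 0.453190,
}

MIN_DEV = 0.435


def passes_min_dev(score: float, min_dev: float = MIN_DEV) -> bool:
    """True if the score is far enough from 0.5 to be used in scoring."""
    return abs(score - 0.5) >= min_dev  # >= vs > at the boundary is assumed


kept = [token for token, score in SCORES.items() if passes_min_dev(score)]
print(len(kept), "of", len(SCORES), kept)
# 5 of 19 ['from:Relson', 'from:relson', 'head:Relson', 'rtrn:Relson', 'subj:Relson']
```

Note that rcvd:Relson (0.066338) just misses the cut: it deviates from 0.5 by 0.433662, slightly less than 0.435.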

Regards,

David


