info about spam messages
David Relson
relson at osagesoftware.com
Sat Jun 19 21:47:46 CEST 2004
On Sat, 19 Jun 2004 12:24:03 -0700
Charlie Wyble wrote:
> 100% agree with NOT adding the from field in the equations. I have
> gotten untold numbers of bounce messages that I DID NOT SEND, but my
> address had been placed in the from address. I would NOT want my
> email to end up getting flagged as a spammer just because of things
> like this.
>
> Charlie
Hi Charlie,
You're a bit off target here (I think). The discussion hasn't been
about whether to include the From: line in the scoring. It has been
about whether bogofilter should make additional characteristics of an
email available for (optional) addition to the X-Bogosity: line and/or
the system logging record. '%I' was added to make the originating IP
address available (though the implementation may be incorrect for some
MTAs). Whether to make the From: address available through an
(optional) '%F' has been the main subject of the discussion.
People have been generally against doing it (adding '%F') because that
field is easily and commonly forged.
FWIW, the tokens in the From: line are already parsed and appear as
"from:Charlie", "from:Wyble", etc. in the wordlist.
Out of curiosity, I took a look at _my_ wordlist to see how my name
fares in the ham/spam scoring. Here's what I found:
[relson@osage spam-fixups]$ bogoutil -p wordlist.h.08/wordlist.db \
    RELSON Relson from:Relson from:relson head:RELSON head:Relson \
    head:relson rcvd:RELSON rcvd:Relson rcvd:relson relson rtrn:Relson \
    rtrn:relson subj:RELSON subj:Relson subj:relson to:RELSON to:Relson \
    to:relson
              spam    good    Fisher
RELSON         230     206  0.572436
Relson        1708    7515  0.214170
from:Relson     16    4953  0.003861
from:relson     37    8102  0.005447
head:RELSON     37      97  0.313875
head:Relson      8     876  0.010843
head:relson  13436   66726  0.194496
rcvd:RELSON    241     199  0.592204
rcvd:Relson      1      17  0.066338
rcvd:relson  12656   65063  0.189138
relson        1521    6469  0.219935
rtrn:Relson      7       0  0.998783
rtrn:relson    561    4787  0.123216
subj:RELSON     15       2  0.899538
subj:Relson    670      10  0.987694
subj:relson    393    1139  0.292664
to:RELSON      456     149  0.785852
to:Relson     1931    5478  0.297110
to:relson     9437   13654  0.453190
As you can see, where the token appears and how it is capitalized
strongly affect whether a given form of "relson" is hammish or
spammish.
Also, since my configuration includes min_dev=0.435, most of the
variations will be ignored: a token is only used in scoring if its
score is at least 0.435 away from 0.5. If I'm counting properly, only
5 of the 19 forms are far enough from 0.5 to be used.
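To double-check that count, here is a small Python sketch that applies the min_dev cut to the scores listed above. It is not bogofilter's code. It assumes the rule is simply "keep a token when its score is at least min_dev away from 0.5", and the names ROWS, survives_min_dev and kept_tokens are made up for this illustration.

```python
# Rows copied from the bogoutil output quoted in this message:
# (token, spam count, good count, score from the "Fisher" column).
ROWS = [
    ("RELSON", 230, 206, 0.572436),
    ("Relson", 1708, 7515, 0.214170),
    ("from:Relson", 16, 4953, 0.003861),
    ("from:relson", 37, 8102, 0.005447),
    ("head:RELSON", 37, 97, 0.313875),
    ("head:Relson", 8, 876, 0.010843),
    ("head:relson", 13436, 66726, 0.194496),
    ("rcvd:RELSON", 241, 199, 0.592204),
    ("rcvd:Relson", 1, 17, 0.066338),
    ("rcvd:relson", 12656, 65063, 0.189138),
    ("relson", 1521, 6469, 0.219935),
    ("rtrn:Relson", 7, 0, 0.998783),
    ("rtrn:relson", 561, 4787, 0.123216),
    ("subj:RELSON", 15, 2, 0.899538),
    ("subj:Relson", 670, 10, 0.987694),
    ("subj:relson", 393, 1139, 0.292664),
    ("to:RELSON", 456, 149, 0.785852),
    ("to:Relson", 1931, 5478, 0.297110),
    ("to:relson", 9437, 13654, 0.453190),
]

MIN_DEV = 0.435  # the min_dev value quoted in this message


def survives_min_dev(score, min_dev=MIN_DEV):
    """Assumed rule: a token counts only if its score is at least
    min_dev away from the neutral value 0.5."""
    return abs(score - 0.5) >= min_dev


def kept_tokens(rows, min_dev=MIN_DEV):
    """Return the tokens whose scores pass the assumed min_dev test."""
    return [token for token, _spam, _good, score in rows
            if survives_min_dev(score, min_dev)]


if __name__ == "__main__":
    kept = kept_tokens(ROWS)
    print(f"{len(kept)} of {len(ROWS)} tokens kept:")
    for token in kept:
        print("  " + token)
```

Under that assumption the survivors are from:Relson, from:relson, head:Relson, rtrn:Relson and subj:Relson. rcvd:Relson just misses, since 0.066338 is only about 0.4337 away from 0.5.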
Regards,
David