headers - example

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Mon Mar 8 11:06:25 CET 2004


Jozef Hitzinger wrote:

>> I cannot see your point. Your output just reflects what your training
>> shows. Certain constellations seem to be more unlikely in spam. That is
>> normal and intended.
> 
> My point was to demonstrate what I was arguing previously (headers except
> Subject should not go into db):
> 
> "195.80.171.24"                     53  0.006570  0.000000  0.000074 +
> "rcvd:mail.slovanet.sk"             52  0.006446  0.000000  0.000075 +
> "212.55.234.133"                     1  0.000124  0.000000  0.003877 +
> "rcvd:mtx1.www.ematrix.sk"           1  0.000124  0.000000  0.003877 +
> "rcvd:proxy.ematrix.sk"              1  0.000124  0.000000  0.003877 +
> "to:hotmail.com"                   266  0.029999  0.002026  0.063266 +
> "head:UTC"                         661  0.061609  0.013842  0.183460 +
> 
> are neither "hammy" nor "spammy" in nature.

Let's have a look. There are two IP addresses. They were
found in the body, so they don't count here. There are three
hostnames in rcvd; one was seen pretty often in ham and not at
all in spam. That will have some reason, so it is correct to
use it as it is. Two have been seen only once, which also
might have a good reason. Next is the Hotmail thing: you
seem to get a lot of legitimate mail with this in To; what's
wrong with that observation? Finally, there is this UTC,
which indicates that, according to your observations, it is
not often used by spammers.
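
To make "by your training" concrete, here is a minimal sketch of how
such a score can follow from the counts alone. It assumes that the
columns quoted above are count, ham frequency, spam frequency and
Robinson's f(w), and that f(w) = (s*x + n*p) / (s + n) with
p = spam / (ham + spam). The function name robinson_fw and the values
s = 0.01 and x = 0.39 are assumptions, chosen only because they
roughly reproduce the quoted rows; they are not taken from this
thread or from any configuration file.

```python
def robinson_fw(n, p_good, p_bad, s=0.01, x=0.39):
    """Robinson's f(w) for one token.

    n      -- how often the token was seen in training (ham + spam)
    p_good -- relative frequency of the token in ham
    p_bad  -- relative frequency of the token in spam
    s, x   -- assumed prior strength and prior score (hypothetical values)
    """
    total = p_good + p_bad
    # Raw spamminess of the token; fall back to the prior if never seen.
    p = p_bad / total if total > 0 else x
    # Pull the raw value toward the prior x; the pull weakens as n grows.
    return (s * x + n * p) / (s + n)


# (token, count, ham frequency, spam frequency) as quoted in the message
rows = [
    ("195.80.171.24", 53, 0.006570, 0.000000),
    ("rcvd:mail.slovanet.sk", 52, 0.006446, 0.000000),
    ("rcvd:proxy.ematrix.sk", 1, 0.000124, 0.000000),
    ("to:hotmail.com", 266, 0.029999, 0.002026),
    ("head:UTC", 661, 0.061609, 0.013842),
]

for token, n, p_good, p_bad in rows:
    print(f"{token:26s} {n:4d}  {robinson_fw(n, p_good, p_bad):.6f}")
```

Under these assumptions a token seen 53 times in ham and never in spam
ends up very close to 0, while one seen only once stays a bit nearer
to the prior. Nothing but the training counts enters the calculation.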

> Yet they are the only ones on the
> hammy side of this message. How did they get there?

By your training.

> They come from
> headers. Because I trained on full messages, including headers (current
> recommended way), they're in.

I don't recommend this;-)

> So I agree with you, this reflects my training. But I don't agree with
> "certain constellations".

This is just the result of your training.

> The message was pure spam.

It had some indications for ham, though. If you fix this,
the same message is probably seen differently. Try it!

> If the junk headers
> were just "noise" I wouldn't care, as bogofilter wouldn't care either.

Right, this is not noise, there is significance to those
observations.

pi



