headers - example

Jozef Hitzinger hitzinger at phobos.fphil.uniba.sk
Tue Mar 9 08:21:30 CET 2004


On Mon, 8 Mar 2004, Boris 'pi' Piwinger wrote:

> Let's have a look. There are two IP-addresses. They were
> found in the body, so they don't count here.

False. Of course they're in the headers: they are the IP addresses in the
Received: lines, and they match the two domain names.
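To illustrate where such tokens come from, here is a minimal Python sketch that pulls hostnames and IP addresses out of Received: header lines only and ignores the body. The regular expression, the helper name received_tokens and the "rcvd:" prefix are illustrative assumptions, loosely modelled on the "rcvd" label used in this thread; this is not bogofilter's actual lexer.

```python
import re

# Illustrative pattern: dotted-quad IPv4 addresses, or dotted hostnames ending in letters
HOST_OR_IP = re.compile(
    r"\b(?:\d{1,3}\.){3}\d{1,3}\b"
    r"|\b(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}\b"
)


def received_tokens(raw: str) -> list[str]:
    """Collect hostname/IP tokens from Received: headers only (hypothetical helper)."""
    text = raw.replace("\r\n", "\n")
    # The header block ends at the first empty line; the body is ignored here
    head = text.partition("\n\n")[0]
    # Unfold continuation lines (lines starting with whitespace) into their header
    head = re.sub(r"\n[ \t]+", " ", head)
    tokens = []
    for line in head.split("\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "received":
            # The "rcvd:" prefix is an assumed tag marking where the token came from
            tokens.extend("rcvd:" + t.lower() for t in HOST_OR_IP.findall(value))
    return tokens
```

An IP address that appears only in the body is not picked up, while the relay names and addresses from the Received: lines are.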

> There are three hostnames in rcvd; one was seen pretty often in ham and not
> at all in spam, and there will be a reason for that, so it is correct to use
> it as it is;

Wrong again. Just because I got more ham than spam from a certain ISP
doesn't mean I should go easier on spam coming from it... but that's what
bogofilter does when trained with headers.
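Bogofilter scores tokens with Gary Robinson's method. A rough sketch of the per-token calculation shows why a header token seen often in ham and never in spam pulls a message toward ham regardless of its text. It assumes the textbook form of Robinson's f(w); the function name and the parameter values s = 1 and x = 0.5 are illustrative, not necessarily bogofilter's configured robs/robx defaults.

```python
def robinson_fw(good: int, bad: int, n_good: int, n_bad: int,
                s: float = 1.0, x: float = 0.5) -> float:
    """Smoothed spam probability f(w) of one token, after Gary Robinson.

    good/bad: times the token was seen in ham/spam
    n_good/n_bad: number of ham/spam messages trained
    s, x: strength and assumed probability of an unknown token (illustrative values)
    """
    g = good / n_good  # relative frequency in ham
    b = bad / n_bad    # relative frequency in spam
    n = good + bad
    if n == 0:
        return x  # never seen: fall back to the prior x
    p = b / (b + g)  # raw spamminess of the token
    # Shrink p toward x; the more often the token was seen, the less shrinkage
    return (s * x + n * p) / (s + n)
```

With the message counts from this post, robinson_fw(40, 0, 8067, 11848), for a hypothetical relay seen in 40 hams, gives about 0.012: a strong ham indicator that says nothing about the message itself. A hapax seen once in spam, robinson_fw(0, 1, 8067, 11848), gives only 0.75.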

> two have been seen only once. This might also have a good reason. Next
> is the Hotmail thing, so you seem to get a lot of legitimate mail with this
> in To; what's wrong with that observation?

What's wrong is that we shouldn't bias _either_way_ based on where the heck
it came from; only the message itself is important.

> > They come from headers. Because I trained on full messages, including
> > headers (current recommended way), they're in.
>
> I don't recommend this;-)

I beg your pardon, but aren't you the one who opposes any mention of
stripping headers before passing mail to bogofilter?
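For reference, stripping headers before training can be as simple as the following Python sketch. It keeps only the named headers (Subject by default) plus the body. The helper name and the choice of kept headers are assumptions for illustration, not a bogofilter feature. Note that it ignores MIME: if Content-Type and Content-Transfer-Encoding are dropped, an encoded body can no longer be decoded, so you may want to add them to keep.

```python
import re


def strip_headers(raw: str, keep: tuple[str, ...] = ("Subject",)) -> str:
    """Drop all header lines except those named in `keep` (hypothetical helper)."""
    text = raw.replace("\r\n", "\n")
    # The header block ends at the first empty line (RFC 5322)
    head, _, body = text.partition("\n\n")
    # Unfold continuation lines so each header is one logical line
    head = re.sub(r"\n[ \t]+", " ", head)
    wanted = {k.lower() for k in keep}
    kept = [line for line in head.split("\n")
            if line.partition(":")[0].strip().lower() in wanted]
    if not kept:
        # No headers left: an empty line marks the start of the body
        return "\n" + body
    return "\n".join(kept) + "\n\n" + body
```

The output could then be piped to bogofilter when registering ham or spam, so Received: and To: tokens never enter the wordlist.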

> > So I agree with you, this reflects my training. But I don't agree with
> > "certain constellations".
>
> This is just the result of your training.

My training is as good as it can be: 8067 hams and 11848 spams. bogoutil -H
reports:

hapaxes:  ham  151118 (32.15%), spam  140149 (29.82%)
   pure:  ham  238036 (50.64%), spam  204481 (43.51%)
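For readers unfamiliar with these figures: a hapax is a token seen only once, and a pure token presumably occurs in only one of the two corpora. The figures above are consistent with all four percentages sharing one denominator of roughly 470,000 tokens (for example 151118 / 0.3215 and 238036 / 0.5064 both come to about 470,000). The sketch below recomputes such statistics from a plain dict of (ham_count, spam_count) pairs; the counting rules are assumed here, not taken from bogoutil's source.

```python
def hapax_pure_stats(db: dict[str, tuple[int, int]]) -> dict[str, tuple[int, float]]:
    """Count hapaxes and pure tokens per corpus (assumed definitions, hypothetical helper)."""
    total = len(db)
    counts = {"ham_hapax": 0, "spam_hapax": 0, "ham_pure": 0, "spam_pure": 0}
    for ham, spam in db.values():
        if ham == 1:
            counts["ham_hapax"] += 1   # seen exactly once in ham
        if spam == 1:
            counts["spam_hapax"] += 1  # seen exactly once in spam
        if ham > 0 and spam == 0:
            counts["ham_pure"] += 1    # only ever seen in ham
        if spam > 0 and ham == 0:
            counts["spam_pure"] += 1   # only ever seen in spam
    # Percentages are relative to the total number of distinct tokens
    return {k: (v, 100.0 * v / total if total else 0.0) for k, v in counts.items()}
```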


> > The message was pure spam.
>
> It had some indications for ham, though. If you fix this,
> the same message will probably be seen differently. Try it!

What would you do to "fix" it? I don't understand what you mean.

> > If the junk headers were just "noise" I wouldn't care, as bogofilter
> > wouldn't care either.
>
> Right, this is not noise, there is significance to those
> observations.

No! There's _information_ in these tokens, and that information steers
bogofilter the wrong way.

-- 
jozef  :-)



