headers - example
Jozef Hitzinger
hitzinger at phobos.fphil.uniba.sk
Tue Mar 9 08:21:30 CET 2004
On Mon, 8 Mar 2004, Boris 'pi' Piwinger wrote:
> Let's have a look. There are two IP addresses. They were
> found in the body, so they don't count here.
False. Of course they're in the headers: they are the IP addresses in the
Received: lines, and they match the two domain names.
> There are three hostnames in rcvd; one was seen pretty often in ham, not
> at all in spam; there will be a reason for that, so it is correct to use it
> as it is;
Wrong again. Just because I got more ham than spam from a certain ISP
doesn't mean I should be softer on spam coming from that ISP... but that's
exactly what bogofilter does when trained with headers.
> two have been seen only once. Also this might have a good reason. Next
> is the Hotmail thing, so you seem to get a lot of legitimate mail with this
> in To; what's wrong with that observation?
What's wrong is that we shouldn't bias _either_ way based on where the heck
it came from; only the message is important.
> > They come from headers. Because I trained on full messages, including
> > headers (current recommended way), they're in.
>
> I don't recommend this;-)
I beg your pardon, isn't it you who opposes any mention of stripping
headers before passing mail to bogofilter?
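For readers who want to try the header-stripping approach, a minimal Python sketch of such a pre-filter is below. It assumes the message arrives as raw bytes, and keeping only Subject: is a hypothetical choice for illustration, not a bogofilter feature.

```python
import re

# Hypothetical choice: header fields to keep (lowercase names).
KEEP = (b"subject",)


def strip_headers(raw: bytes, keep=KEEP) -> bytes:
    """Return the message with every header except those in `keep` removed."""
    # The header block ends at the first empty line (LF or CRLF endings).
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    header_block = parts[0]
    body = parts[1] if len(parts) > 1 else b""

    kept = []        # kept header fields, each a list of raw lines
    current = None   # field being collected, or None if it is dropped
    for line in re.split(rb"\r?\n", header_block):
        if line[:1] in (b" ", b"\t"):
            # Folded continuation line: belongs to the previous field.
            if current is not None:
                current.append(line)
            continue
        name = line.split(b":", 1)[0].strip().lower()
        if name in keep:
            current = [line]
            kept.append(current)
        else:
            current = None   # drop Received:, To:, and the rest

    headers = b"\n".join(b"\n".join(field) for field in kept)
    return headers + b"\n\n" + body
```

The output of strip_headers(raw) could then be fed to bogofilter's registration options (-s for spam, -n for ham), so that Received: hostnames and addresses never enter the wordlist.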
> > So I agree with you, this reflects my training. But I don't agree with
> > "certain constellations".
>
> This is just the result of your training.
My training is as good as it gets: 8067 hams, 11848 spams. bogoutil -H reports:
hapaxes: ham 151118 (32.15%), spam 140149 (29.82%)
pure: ham 238036 (50.64%), spam 204481 (43.51%)
> > The message was pure spam.
>
> It had some indications for ham, though. IF you fix this,
> the same message is probably seen differently. Try it!
What would you do to "fix" it? I don't understand what you mean.
> > If the junk headers were just "noise" I wouldn't care, as bogofilter
> > wouldn't care either.
>
> Right, this is not noise, there is significance to those
> observations.
No! There's _information_ in these tokens, and that information steers
bogofilter the wrong way.
--
jozef :-)
More information about the Bogofilter mailing list