headers - example

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Tue Mar 9 08:43:44 CET 2004


Jozef Hitzinger <hitzinger at phobos.fphil.uniba.sk> wrote:

>> Let's have a look. There are two IP addresses. They were
>> found in the body, so they don't count here.
>
>False. Of course they're in headers: they're the IP addresses in the
>Received: lines, and they match the two domain names.

Then you must be using a really old version of bogofilter, since
all recent versions tag *all* header tokens.

>> There are three hostnames in rcvd; one was seen pretty often in ham and
>> not at all in spam. There will be a reason for that, so it is correct to
>> use it as it is;
>
>Wrong again. Just because I got more ham than spam from a certain ISP
>doesn't mean I should be softer on spam coming from him .. 

Just because you get $goodword in more ham than spam doesn't
mean you should be softer on spam with it.

>but that's what bogofilter does, when trained with headers.

That is the whole point of bogofilter: observe, learn,
repeat parrot-fashion without the slightest idea what it
means.
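To make "seen pretty often in ham, not at all in spam" and "seen only once" concrete: bogofilter's default scoring is based on Gary Robinson's per-token formula, which shrinks a token's raw spam probability toward a neutral prior according to how often the token has been seen. The Python sketch below is illustrative only. The function name and the values s = 1 and x = 0.5 are assumptions, not bogofilter's configured defaults, and the per-token counts are hypothetical; only the corpus sizes (8067 hams, 11848 spams) are the ones quoted in this thread.

```python
def robinson_f(ham_count: int, spam_count: int,
               n_ham: int, n_spam: int,
               s: float = 1.0, x: float = 0.5) -> float:
    """Robinson's smoothed spam probability f(w) for one token.

    ham_count/spam_count: number of ham/spam messages containing the token.
    n_ham/n_spam: total number of ham/spam messages trained.
    s: strength of the prior; x: assumed probability of an unseen token.
    The default s and x are illustrative, not bogofilter's configuration.
    """
    g = ham_count / n_ham       # relative frequency of the token in ham
    b = spam_count / n_spam     # relative frequency of the token in spam
    n = ham_count + spam_count  # how much evidence we have for this token
    p = b / (b + g) if n else x  # raw p(w); undefined for an unseen token
    # Shrink p(w) toward x: tokens seen only a few times stay close to x.
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    N_HAM, N_SPAM = 8067, 11848  # corpus sizes quoted in the thread
    # Hypothetical counts: a hostname found in 40 hams and no spam,
    # and another hostname found in a single ham.
    print(robinson_f(40, 0, N_HAM, N_SPAM))  # about 0.012: strongly ham-leaning
    print(robinson_f(1, 0, N_HAM, N_SPAM))   # 0.25: pulled toward x = 0.5
```

Under these assumptions a token seen once lands near the neutral value and carries little weight, while a hostname seen in many hams and no spam earns a strong ham score precisely because the training data says so.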

>> two have been seen only once. This, too, might have a good reason. Next
>> is the Hotmail thing, so you seem to get a lot of legitimate mail with this
>> in To; what's wrong with that observation?
>
>Wrong is that we shouldn't bias _either_way_ based on where the heck it
>came from; only the message is important.

Headers are part of the message and they are highly
significant. As I said already, some people work only with
headers (much faster!) and perform excellently.

If you don't like it, formail is your friend (or, in procmail,
feed only bodies). Matthias will likely explain to you how
to do it in maildrop >;->
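For those who do want to keep headers away from bogofilter, here is a minimal Python sketch of the "feed only bodies" idea using the standard email package instead of formail or procmail. It is an assumption-laden illustration, not a recommended setup: the function names are hypothetical, and it keeps the Content-* and MIME-Version headers on the assumption that bogofilter still needs them to decode MIME bodies. It also assumes bogofilter is on PATH and reads the message from standard input.

```python
import email
import subprocess


def strip_headers(raw: bytes,
                  keep_prefixes=("content-", "mime-version")) -> bytes:
    """Drop every top-level header except the MIME ones needed to decode the body."""
    msg = email.message_from_bytes(raw)
    for name in set(msg.keys()):
        if not name.lower().startswith(keep_prefixes):
            del msg[name]  # removes all occurrences; no error if already gone
    return msg.as_bytes()


def classify_without_headers(raw: bytes) -> int:
    """Hypothetical wrapper: pipe the stripped message to bogofilter on stdin.

    bogofilter's exit status: 0 = spam, 1 = ham, 2 = unsure, 3 = error.
    """
    return subprocess.run(["bogofilter"], input=strip_headers(raw)).returncode
```

Whatever stripping is used, it would have to be applied identically when training (bogofilter -s / -n), otherwise the wordlist and the messages being classified describe different token sets.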

>> > They come from headers. Because I trained on full messages, including
>> > headers (current recommended way), they're in.
>>
>> I don't recommend this;-)
>
>I beg your pardon, isn't it you who opposes any mention of stripping
>headers before passing mails to bogofilter?

I thought you meant full training.

>> > So I agree with you, this reflects my training. But I don't agree with
>> > "certain constellations".
>>
>> This is just the result of your training.
>
>My training is as fine as it can be. 8067 hams, 11848 spams, bogoutil -H:

That says your training is simply not sufficient to capture
one particular spam. I haven't heard a reliable report from
anybody who has no false negatives.

>> > The message was pure spam.
>>
>> It had some indications for ham, though. If you fix this,
>> the same message will probably be seen differently. Try it!
>
>What would you do to "fix" it? I don't understand what you mean.

Train it as spam. Untrain as ham if you used -u.
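In bogofilter's command-line terms, "train it as spam" is -s, and undoing a ham registration that -u (autoupdate) already made is -N; the two are commonly combined as -Ns. A minimal Python sketch, assuming bogofilter is on PATH and reads the message from standard input; both function names are hypothetical.

```python
import subprocess


def retrain_argv(was_autoregistered_as_ham: bool) -> list[str]:
    """Build the bogofilter command for a spam that slipped through as ham."""
    # -s registers the message as spam; -N first removes it from the ham
    # counts, which is only needed if -u already registered it as ham.
    return ["bogofilter", "-Ns" if was_autoregistered_as_ham else "-s"]


def retrain_as_spam(raw_message: bytes, was_autoregistered_as_ham: bool) -> None:
    # bogofilter reads the message to register from stdin.
    subprocess.run(retrain_argv(was_autoregistered_as_ham),
                   input=raw_message, check=True)
```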

>> > If the junk headers were just "noise" I wouldn't care, as bogofilter
>> > wouldn't care either.
>>
>> Right, this is not noise, there is significance to those
>> observations.
>
>No! There's _information_ in these tokens, and that information steers
>bogofilter the wrong way.

And many more in the right direction.
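The point of that reply is that bogofilter never decides on a single token; it combines all of them. Its default method is based on Robinson's use of Fisher's inverse chi-square combination. The Python sketch below follows the formulation popularized by SpamBayes and is not bogofilter's actual code (bogofilter, for instance, can exclude tokens whose score lies within a configurable distance of 0.5). The token scores in the usage lines are hypothetical.

```python
import math
from typing import Sequence


def chi2q(x2: float, dof: int) -> float:
    """P(chi-square with an even number of degrees of freedom >= x2)."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_combine(fs: Sequence[float]) -> float:
    """Combine per-token scores f(w) in (0, 1) into one spamminess value."""
    n = len(fs)
    # s near 1 means the tokens jointly look spammy; h near 1 means hammy.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - f) for f in fs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(f) for f in fs), 2 * n)
    return (1.0 + s - h) / 2.0


if __name__ == "__main__":
    header_tokens = [0.01, 0.25, 0.25]  # hypothetical ham-leaning header tokens
    body_tokens = [0.99] * 15           # hypothetical strongly spammy tokens
    print(fisher_combine(header_tokens + body_tokens))
```

With three ham-leaning header tokens against fifteen strongly spammy ones, the combined score in this sketch stays close to 1: the ham indications only flip the verdict if they outweigh the spam evidence.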

pi




More information about the Bogofilter mailing list