headers - example

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Mon Mar 8 15:06:31 CET 2004


Tom Anderson wrote:

>> > "195.80.171.24"                     53  0.006570  0.000000  0.000074 +
>> > "rcvd:mail.slovanet.sk"             52  0.006446  0.000000  0.000075 +
>> > "212.55.234.133"                     1  0.000124  0.000000  0.003877 +
>> > "rcvd:mtx1.www.ematrix.sk"           1  0.000124  0.000000  0.003877 +
>> > "rcvd:proxy.ematrix.sk"              1  0.000124  0.000000  0.003877 +
>> > "to:hotmail.com"                   266  0.029999  0.002026  0.063266 +
>> > "head:UTC"                         661  0.061609  0.013842  0.183460 +
> 
>> Since you consider sources so important, create whitelists and
>> blacklists.
>> 
>> If a message was "pure spam" that's how bogofilter would classify it.
>> Your message included messages that you've used in ham training, else
>> bogofilter wouldn't classify them as ham.  Sounds like additional
>> training is needed.  Bogofilter needs sufficient information to do a
>> good job, and that doesn't happen overnight.
> 
> "hotmail.com" should not show up individually in the "to:" token.  It
> should be a full email address. 

You can change your lexer to allow @ if you like.
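
As a rough illustration of what that change does to the tokens, here is a
minimal sketch in Python. It is not bogofilter's actual lexer; the
function name `tokenize`, the `BASE_CHARS` set, and the sample addresses
are assumptions made up for the example.

```python
import re
import string

# Hypothetical set of characters allowed inside a token.
BASE_CHARS = string.ascii_letters + string.digits + ".-_"

def tokenize(text, extra_chars="", prefix="to:"):
    """Return prefixed tokens built from runs of allowed characters."""
    allowed = re.escape(BASE_CHARS + extra_chars)
    return [prefix + tok for tok in re.findall("[" + allowed + "]+", text)]

header = "friend@hotmail.com, spammer@hotmail.com"

# "@" splits tokens: local parts and domain become separate tokens.
split_tokens = tokenize(header)

# "@" is a valid token character: each full address is one token.
full_tokens = tokenize(header, extra_chars="@")
```

With "@" as a separator the domain token is shared by both addresses,
while the local parts still show up as tokens of their own. With "@"
allowed, each address is scored as a whole and the bare domain no longer
appears.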

> This way bogofilter would automatically
> maintain a white/black list (weighted) for you.  The domain alone should
> not be a strong indicator,

But it often is. hotmail.com is one example: spammers put long
lists of hotmail.com addresses in To, whether they are real or not.

> as you may receive a large majority of your
> hams from domains such as hotmail and yahoo and aol, and only occasional
> spams from these same sources. 

If that is how it is, then that is how it is. Nothing wrong
there. For example, I get mail under different addresses,
some of which I have not actively used for some years now. So it
is a strong spam indicator if I receive mail at one of those addresses.

> The fact that it came from that same
> email service as some of your friends use does not predict that it is
> ham.  However, if you left the "@" out of the list of splitting
> characters (include it in valid token characters), then a full email
> address is fairly indicative of ham or spam (assuming the to: is not the
> same as the from:).  "friend at hotmail.com" will be hammy while
> "spammer at hotmail.com" will be spammy. 

You'll have the local parts anyway.

> Meanwhile, "friend at yahoo.com" may
> also be a spammer, so shouldn't receive special recognition just because
> they use the same handle as "friend at hotmail.com".  Hotmail and yahoo
> should remain neutral, as the domain should not be registered or
> classified separately.  Do you see the logic of this?

I don't. I just observe the opposite. As you know, my lexer
is much stricter; it even splits hotmail.com into two
tokens, but this still works great. I am wondering whether I
should allow . in the middle of tokens again, which would
also make the IP rule unneeded for me (probably breaking the
rule to catch subnets, but I don't mind ;-).
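
To make the trade-off concrete, here is a minimal sketch in Python. It
is not the real lexer or the real IP rule; the patterns and the helper
names (`strict_tokens`, `dotted_tokens`, `subnet_prefixes`) are
assumptions for the example only.

```python
import re

# Strict variant: only letters and digits, so "." always splits.
STRICT = re.compile(r"[A-Za-z0-9]+")

# Relaxed variant: "." is allowed in the middle of a token,
# but never at its start or end.
DOTTED = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")

def strict_tokens(text):
    return STRICT.findall(text)

def dotted_tokens(text):
    return DOTTED.findall(text)

def subnet_prefixes(ip):
    """Hypothetical stand-in for a subnet rule: leading parts of an address."""
    parts = ip.split(".")
    return [".".join(parts[:n]) for n in range(1, len(parts))]

sample = "hotmail.com 195.80.171.24"
strict = strict_tokens(sample)
dotted = dotted_tokens(sample)
prefixes = subnet_prefixes("195.80.171.24")
```

With the relaxed pattern the whole address comes out as a single token
without any special IP handling, but prefixes such as 195.80.171 would
then have to be generated separately if subnets should still be caught.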

pi



