headers - example

David Relson relson at osagesoftware.com
Mon Mar 8 13:44:38 CET 2004


On Mon, 8 Mar 2004 10:46:01 +0100 (CET)
Jozef Hitzinger wrote:

> On Mon, 8 Mar 2004, Boris 'pi' Piwinger wrote:
> 
> > I cannot see your point. Your output just reflects what your
> > training shows. Certain constellations seem to be more unlikely in
> > spam. That is normal and intended.
> 
> My point was to demonstrate what I was arguing previously (headers
> except Subject should not go into db):
> 
> "195.80.171.24"                     53  0.006570  0.000000  0.000074 +
> "rcvd:mail.slovanet.sk"             52  0.006446  0.000000  0.000075 +
> "212.55.234.133"                     1  0.000124  0.000000  0.003877 +
> "rcvd:mtx1.www.ematrix.sk"           1  0.000124  0.000000  0.003877 +
> "rcvd:proxy.ematrix.sk"              1  0.000124  0.000000  0.003877 +
> "to:hotmail.com"                   266  0.029999  0.002026  0.063266 +
> "head:UTC"                         661  0.061609  0.013842  0.183460 +
> 
> are neither "hammy" nor "spammy" in nature. Yet they are the only ones
> on the hammy side of this message. How did they get there? They come
> from headers. Because I trained on full messages, including headers
> (the currently recommended way), they're in.
> 
> So I agree with you, this reflects my training. But I don't agree with
> "certain constellations". The message was pure spam. If the junk
> headers were just "noise" I wouldn't care, as bogofilter wouldn't care
> either. But it's not noise, it discriminates between sources of spam,
> allowing the new or less potent sources to get through.
> 
> In this case it was due to coming from sources (or "buckets") we
> could label "mail.slovanet.sk", "hotmail.com" and "UTC".

Jozef,

Since you consider sources so important, create whitelists and
blacklists.

If a message were "pure spam", that's how bogofilter would classify it.
Your message included tokens that you've used in ham training, else
bogofilter wouldn't score them as hammy.  Sounds like additional
training is needed.  Bogofilter needs sufficient information to do a
good job, and that doesn't happen overnight.

David
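
For readers following the quoted table: its last numeric column looks
consistent with Robinson's smoothed token score,
f(w) = (s*x + n*p) / (s + n), where p = pbad / (pgood + pbad). The sketch
below is only an illustration under assumptions. It assumes the columns
are count, ham frequency, spam frequency and score (the post does not
label them), and the values s = 0.01 and x = 0.3916 were chosen to fit
the quoted figures; they are not stated anywhere in this thread. The
function name robinson_fw is hypothetical.

```python
def robinson_fw(pgood, pbad, n, s=0.01, x=0.3916):
    """Robinson's smoothed token score f(w) = (s*x + n*p) / (s + n).

    s (strength of the prior) and x (score assumed for an unseen token)
    are ASSUMED values fitted to the quoted table, not taken from the post.
    """
    total = pgood + pbad
    # p is the raw spam share for the token; fall back to x if never seen.
    p = pbad / total if total > 0 else x
    return (s * x + n * p) / (s + n)


# Rows copied from the quoted table: (token, count, pgood, pbad).
# The column meanings are an assumption.
rows = [
    ("195.80.171.24", 53, 0.006570, 0.000000),
    ("rcvd:mail.slovanet.sk", 52, 0.006446, 0.000000),
    ("212.55.234.133", 1, 0.000124, 0.000000),
    ("to:hotmail.com", 266, 0.029999, 0.002026),
    ("head:UTC", 661, 0.061609, 0.013842),
]

for token, n, pgood, pbad in rows:
    score = robinson_fw(pgood, pbad, n)
    # Scores below 0.5 pull the message toward ham.
    print(f"{token:<26} {n:>4}  {score:.6f}")
```

Under these assumptions every one of these header tokens scores well
below 0.5, which is why they land on the hammy side of the message even
though a token seen only once in ham carries very little evidence.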



