headers - example

David Relson relson at osagesoftware.com
Tue Mar 9 13:15:25 CET 2004


On Tue, 9 Mar 2004 08:34:30 +0100 (CET)
Jozef Hitzinger wrote:

> On Mon, 8 Mar 2004, David Relson wrote:
> 
> > Since you consider sources so important, create whitelists and
> > blacklists.
> 
> No .. important thing is to _not_ use the sources. To _not_ let the
> information on where it came from damage bogofilter abilities. But if
> you let it train on headers, this information gets in.
> 
> > If a message was "pure spam" that's how bogofilter would classify
> > it. Your message included messages that you've used in ham training,
> > else bogofilter wouldn't classify them as ham.  Sounds like
> > additional training is needed.  Bogofilter needs sufficient
> > information to do a good job, and that doesn't happen overnight.
> 
> Again .. I've trained on full messages, i.e. with headers. Additional
> training would not do any good, because I constantly get more ham than
> spam from there.
> 
> I'm trying to tell you all, that bogofilter can do _better_, if we
> drop the headers (except Subject), on both training and filtering. I
> find this much much better than changing the database on the fly by
> constant training and/or ad-hoc removing of tokens, as is suggested.

Our tests have shown that headers are useful and that it's useful to
provide special tagging for certain header lines.  If you would rather
not use headers, that is your decision.  Using procmail and/or formail,
it's easy to strip the headers before bogofilter sees the message.
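
For illustration only, here is a rough Python sketch of the same idea:
drop every header except Subject and pass the body through untouched.
It is not part of bogofilter, and the function name strip_headers and
its keep parameter are made up for this example.  It assumes a single
message with headers separated from the body by the first blank line.
Note that it also drops Content-Type, which may affect how MIME parts
are decoded later.

```python
def strip_headers(raw: bytes, keep=("subject",)) -> bytes:
    """Return the message with only the headers named in `keep`, plus the body.

    Hypothetical helper for illustration; header names in `keep` are lowercase.
    """
    # Headers end at the first empty line; normalize CRLF to LF first.
    normalized = raw.replace(b"\r\n", b"\n")
    head, _, body = normalized.partition(b"\n\n")

    kept = []
    keeping = False
    for line in head.split(b"\n"):
        if line[:1] in (b" ", b"\t"):
            # Folded continuation line: it belongs to the previous header.
            if keeping:
                kept.append(line)
            continue
        name = line.split(b":", 1)[0].strip().lower().decode("ascii", "replace")
        keeping = name in keep
        if keeping:
            kept.append(line)

    # Reassemble: kept headers, a blank separator line, then the body.
    return b"\n".join(kept) + b"\n\n" + body
```

A small wrapper that reads the message from standard input and writes
the result to standard output could then sit in front of bogofilter,
for both training and filtering, so that the two see the same input.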

David



