Ignore lists
Tom Anderson
tanderso at oac-design.com
Thu Mar 4 05:47:09 CET 2004
When I brought up this subject about two weeks ago in my "headers"
discussion, I advocated bogofilter ignoring certain headers and
emphasizing others. I've since changed my mind. Bogofilter is, and
should remain, a statistical filter without ad hoc heuristics. Instead,
I now believe that any massaging of headers should be done prior to
bogofilter, such that bogofilter is just a highly tuned member of an
email assembly line which happens to include other steps. These other
steps may be SpamAssassin, virus scanners, procmail recipes, etc.
Bogofilter should not try to be the end-all-be-all solution for
filtering email, but just a tool that may be used toward that end. This
is the Unix way.
In this vein, I'm currently building a program which will strip out
x-headers, dates, etc., and add emphasis to important bits. This will
sit right in front of bogofilter. You could use SpamAssassin or other
rule-based filters in a similar way. But adding this directly to
bogofilter would just clutter the code and muddy the purpose of the project.
That is the Microsoft way, which I assume we want to avoid. Let's push
feature creep into separate sub-projects.
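
A minimal sketch of what such a pre-filter might look like, assuming
Python and its standard email module. The choice of headers to drop
(Date and anything starting with X-) and the function name
strip_headers are illustrative assumptions, not the actual program
described above, and the "add emphasis" step is left out:

```python
import sys
from email import message_from_string

# Hypothetical selection of headers to remove; tune for your own mail.
DROP_EXACT = {"date"}
DROP_PREFIXES = ("x-",)


def strip_headers(raw: str) -> str:
    """Return the message text with the unwanted headers removed."""
    msg = message_from_string(raw)
    # Copy the key list first, since we delete while iterating.
    for name in list(msg.keys()):
        lower = name.lower()
        if lower in DROP_EXACT or lower.startswith(DROP_PREFIXES):
            # Deletes every occurrence of this header name.
            del msg[name]
    return msg.as_string()


if __name__ == "__main__":
    # Read one message on stdin, write the cleaned message to stdout,
    # so the script can sit in a pipe in front of bogofilter.
    sys.stdout.write(strip_headers(sys.stdin.read()))
```

Saved under a name of your choosing, this could be placed in a pipe
ahead of bogofilter (for example from a procmail recipe), so that
bogofilter only ever sees the cleaned message on its standard input.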
Tom
On Wed, 2004-03-03 at 19:46, Matthias Andree wrote:
> On Wed, 03 Mar 2004, David Relson wrote:
>
> > [relson at osage bogofilter]$ bogoutil -p $BOGOFILTER_DIR rcvd:Jan rcvd:Feb
> > rcvd:Mar rcvd:Apr rcvd:May rcvd:Jun rcvd:Jul rcvd:Aug rcvd:Sep rcvd:Oct
> > rcvd:Nov rcvd:Dec
>
> adding "|sort -k4" is handy here to sort by Fisher value.
>
> Looks like your values are in the vicinity of 0.5. Not so in my
> database - it's just historic coincidence that I didn't have much spam
> to register from April or July (2003 FWIW), and the two spam mails may
> have been mails with a bogus clock on some system (and yes, this
> database started service life last fall, and I seem to have had more
> spam than ham from September):
>
>            spam  good  Fisher
> rcvd:Apr      1   354  0.006887
> rcvd:Jul      1   340  0.007168
> rcvd:May     12   307  0.087427
> rcvd:Jun     15   316  0.104215
> rcvd:Mar     28   438  0.135449
> rcvd:Jan    117   758  0.274451
> rcvd:Feb    105   547  0.319921
> rcvd:Nov    109   443  0.376162
> rcvd:Aug     68   217  0.434369
> rcvd:Dec    162   325  0.549860
> rcvd:Oct    906  1068  0.675208
> rcvd:Sep    783   321  0.856683
>
> I wonder if this reflects spam criteria :-)