Ignore lists

Tom Anderson tanderso at oac-design.com
Thu Mar 4 05:47:09 CET 2004


When I brought up this subject about two weeks ago in my "headers"
discussion, I advocated bogofilter ignoring certain headers and
emphasizing others.  I've since changed my mind.  Bogofilter is, and
should remain, a statistical filter without ad hoc heuristics.  Instead,
I now believe that any massaging of headers should be done prior to
bogofilter, such that bogofilter is just a highly tuned member of an
email assembly line which happens to include other steps.  These other
steps may be SpamAssassin, virus scanners, procmail recipes, etc. 
Bogofilter should not try to be the end-all-be-all solution for
filtering email, but just a tool that may be used toward that end.  This
is the Unix way.

In this vein, I'm currently building a program which will strip out
x-headers, dates, etc., and add emphasis to important bits.  This will
sit right in front of bogofilter.  You could use SpamAssassin or other
rule-based filters in a similar way.  But adding this directly to
bogofilter will just clutter the code and the purpose of the project. 
That is the Microsoft way, which I assume we want to avoid.  Let's push
feature creep into separate sub-projects.
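
To make the idea concrete, here is a minimal sketch of such a pre-filter,
not the actual program.  It assumes Python's standard email package; the
function name strip_headers and the choice of which headers to drop (Date
and anything starting with X-) are purely illustrative.  It reads one
message on stdin and writes the trimmed message to stdout, so it can sit
in a pipeline ahead of bogofilter.  The "add emphasis" step is left out.

```python
import sys
from email import message_from_string

# Illustrative assumptions: which headers to drop is a local policy choice.
DROP_EXACT = {"date"}
DROP_PREFIXES = ("x-",)


def strip_headers(raw):
    """Return the message text with the unwanted headers removed."""
    msg = message_from_string(raw)
    # Copy the key list first, since we delete while iterating.
    for name in list(msg.keys()):
        lower = name.lower()
        if lower in DROP_EXACT or lower.startswith(DROP_PREFIXES):
            # Deletes every occurrence; no error if already removed.
            del msg[name]
    return msg.as_string()


if __name__ == "__main__":
    sys.stdout.write(strip_headers(sys.stdin.read()))
```

Saved under a hypothetical name such as strip_headers.py, it could be run
as "python3 strip_headers.py < message.eml | bogofilter", or called from
a procmail recipe ahead of the bogofilter one.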

Tom


On Wed, 2004-03-03 at 19:46, Matthias Andree wrote:
> On Wed, 03 Mar 2004, David Relson wrote:
> 
> > [relson at osage bogofilter]$ bogoutil -p $BOGOFILTER_DIR rcvd:Jan rcvd:Feb
> >    rcvd:Mar rcvd:Apr rcvd:May rcvd:Jun rcvd:Jul rcvd:Aug rcvd:Sep rcvd:Oct
> >    rcvd:Nov rcvd:Dec
> 
> adding "|sort -k4" is handy here to sort by Fisher value.
> 
> Looks like your values are in the vicinity of 0.5. Not so in my
> database - it's just historic coincidence I didn't have much spam to
> register from April or July (2003 FWIW), and the two spam mails may have
> been mails with a bogus clock on some system (and yes, this database
> started service life last fall, and I seem to have had more spam than
> ham from September):
> 
>                                  spam    good    Fisher
> rcvd:Apr                            1     354  0.006887
> rcvd:Jul                            1     340  0.007168
> rcvd:May                           12     307  0.087427
> rcvd:Jun                           15     316  0.104215
> rcvd:Mar                           28     438  0.135449
> rcvd:Jan                          117     758  0.274451
> rcvd:Feb                          105     547  0.319921
> rcvd:Nov                          109     443  0.376162
> rcvd:Aug                           68     217  0.434369
> rcvd:Dec                          162     325  0.549860
> rcvd:Oct                          906    1068  0.675208
> rcvd:Sep                          783     321  0.856683
> 
> I wonder if this reflects spam criteria :-)