understanding bogofilter

Javier Castillo Alcibar javier.castillo at euroview-spain.com
Tue May 6 17:33:24 CEST 2003


  Hi all,

  One thing I still don't understand is why message headers are so important. In fact, I think it would be better to ignore them.

  Look at this: here is a spam email that bogofilter classifies as not spam, because typical non-spam words such as "smtp", "mime-version", etc. are included in my goodlist.db:


linux-list:~# bogofilter -vvv < /var/mail/spamfilter 
X-Bogosity: No, tests=bogofilter, spamicity=0.500001, version=0.12.2
                                     n    pgood     pbad      fw     U
"194.69.248.103"                    38  0.481481  0.002308  0.007887 +
"194.69.248.5"                      38  0.481481  0.002308  0.007887 +
"alhsybarigw.alhantigengw.local"     38  0.481481  0.002308  0.007887 +
"smtpsvc"                           38  0.481481  0.002308  0.007887 +
"smtp"                              84  0.703704  0.006001  0.009860 +
"isp1.alhsys.es"                    86  0.666667  0.006278  0.010699 +
"esmtp"                             36  0.222222  0.002770  0.015557 +
"194.69.248.2"                      41  0.222222  0.003231  0.017175 +
"debian"                            41  0.222222  0.003231  0.017175 +
"dns1.alhsys.es"                    36  0.185185  0.002862  0.018450 +
"rfc822"                            49  0.185185  0.004062  0.023816 +
"mime-version"                    9668  0.518519  0.891248  0.632195 -
"multipart"                       4979  0.259259  0.459010  0.639049 -
"subject"                        10833  0.555556  0.998708  0.642559 -
"x-priority"                      4531  0.111111  0.418021  0.790004 +
"receive"                         1611  0.037037  0.148634  0.800498 +
"return-path"                     8372  0.185185  0.772434  0.806614 +
"non-accredited"                    79  0.000000  0.007293  0.998990 +
"prestigious"                       79  0.000000  0.007293  0.998990 +
"universities"                      79  0.000000  0.007293  0.998990 +
"confidentiality"                   83  0.000000  0.007662  0.999038 +
"classes"                           85  0.000000  0.007847  0.999061 +
"interviews"                        87  0.000000  0.008032  0.999083 +
"jmartine"                          97  0.000000  0.008955  0.999177 +
"turned"                            98  0.000000  0.009047  0.999185 +
"earning"                          111  0.000000  0.010247  0.999281 +
"including"                        117  0.000000  0.010801  0.999317 +
"power"                            187  0.000000  0.017264  0.999573 +
"experience"                       270  0.000000  0.024926  0.999704 +
"call"                             335  0.000000  0.030927  0.999761 +
"marketing"                        406  0.000000  0.037482  0.999803 +
"life"                             530  0.000000  0.048929  0.999849 +
"required"                         629  0.000000  0.058069  0.999873 +
"money"                            806  0.000000  0.074409  0.999901 +
"aol.com"                          842  0.000000  0.077733  0.999905 +
N_P_Q_S_s_x_md                      94  0.00e+00  1.65e-06  5.00e-01
                                        2.00e-01  6.00e-01  0.200
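
For reference, the fw column above appears to follow Robinson's f(w) formula. The sketch below is in Python and is only an illustration, not bogofilter's own code. It assumes that the last line of the output (2.00e-01, 6.00e-01, 0.200) gives s, x and min_dev, as the "s_x_md" label suggests, and that a token is marked "+" in the U column when its fw is at least min_dev away from 0.5. The function names are made up for this example.

```python
def robinson_fw(n, pgood, pbad, s=0.2, x=0.6):
    """Robinson's f(w) for one token.

    n     -- total number of times the token was seen (column "n")
    pgood -- token frequency in the good list (column "pgood")
    pbad  -- token frequency in the spam list (column "pbad")
    s, x  -- assumed to be the 2.00e-01 and 6.00e-01 values printed
             on the last line of the output
    """
    p = pbad / (pgood + pbad)          # raw spam probability of the token
    return (s * x + n * p) / (s + n)   # pulled toward x when n is small


def is_used(fw, min_dev=0.2):
    """Assumed meaning of the "U" column: the token counts only if
    fw is at least min_dev away from the neutral value 0.5."""
    return abs(fw - 0.5) >= min_dev


# Values copied from the table above.
smtp_fw = robinson_fw(84, 0.703704, 0.006001)
mime_fw = robinson_fw(9668, 0.518519, 0.891248)
print(f'"smtp"         fw={smtp_fw:.6f} used={is_used(smtp_fw)}')
print(f'"mime-version" fw={mime_fw:.6f} used={is_used(mime_fw)}')
```

Under these assumptions "mime-version" lands inside the 0.3 to 0.7 band and is left out of the final score, which matches the "-" shown for it in the table.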


   Am I wrong? How can I avoid including these typical headers in my goodlist?

   Best regards.
   Javier.


-----Original Message-----
From: Tony L. Svanstrom [mailto:tony at svanstrom.com] 
Sent: Tuesday, May 6, 2003 17:11
To: Simon Huggins
CC: Bogofilter
Subject: Re: understanding bogofilter


On Tue, 6 May 2003 the voices made Simon Huggins write:

SH> Hiya Bogofilter,
SH>
SH> On Tue, May 06, 2003 at 08:04:26AM -0400, David Relson wrote:
SH> > In particular message/rfc822 isn't handled.  Using a perl script 
SH> > to extract the forwarded message would be a good solution.
SH>
SH> > P.S.  If anyone wants to write the perl script, there's room in 
SH> > the bogofilter/contrib directory.
SH>
SH> This was actually very trivial to do (the code is smaller than the 
SH> POD and the copyright boilerplate stuff).
SH>
SH> I've attached something you might like to include if you think it 
SH> will be of any use to people.
SH>
SH> It just extracts all message/rfc822 parts from a message and dumps 
SH> them on STDOUT.

 Just remember that MIME::Parser isn't available to everyone who might want to use this script, and that if people add it to their procmailrc it might mess things up if it extracts every rfc822 part rather than only the first level.
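
 To illustrate the "first level only" point, here is a minimal sketch. It is written in Python with the standard email package rather than Perl and MIME::Parser, so it is not the script Simon attached, and the function name first_level_rfc822 is made up for this example. It stops descending as soon as it meets a message/rfc822 part, so messages forwarded inside a forwarded message are not extracted separately.

```python
from email.message import Message
from typing import List


def first_level_rfc822(part: Message) -> List[Message]:
    """Return the messages embedded as message/rfc822 parts,
    without descending into those embedded messages themselves."""
    if part.get_content_type() == "message/rfc822":
        # The payload of a parsed message/rfc822 part is a list
        # holding the embedded message; return it and stop here.
        return [part.get_payload(0)]

    found: List[Message] = []
    if part.is_multipart():
        # Ordinary multipart container: look at each child in turn.
        for sub in part.get_payload():
            found.extend(first_level_rfc822(sub))
    return found
```

 A caller would parse the incoming mail with email.message_from_file() or similar, pass the result to this function, and write each returned message out with as_string().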


-- 
      /\___/\                                              /\___/\
      \_@ @_/                                              \_@ @_/
 +--oOO-(_)-OOo------------------------------------------oOO-(_)-OOo--+
 | Per scientiam ad libertatem! // Through knowledge towards freedom! |
 +---ôôô---ôôô--------------------------------------------ôôô---ôôô---+
     \O/   \O/      (c)1998-2003  tony at svanstrom.com      \O/   \O/






