tuning [was: understanding bogofilter]

David Relson relson at osagesoftware.com
Tue May 6 18:01:11 CEST 2003


Javier,

Bogofilter generally looks at the whole message - header and body.  Using 
procmail, you can give bogofilter the header and/or the body of the 
message.  Bogofilter has been working nicely for me using the whole message 
(header and body).  Over time, as it gets trained, it learns which domains, 
IP addresses, etc. are spammish and which are good.  That information, along 
with the words in the message, contributes to the message's 
score.  Generally, the more information bogofilter has to work with, the 
better it will do its job.
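
If you do want to experiment with header-only or body-only scoring, here is a minimal sketch of the split (Python, standard library only).  The function name split_message is made up for illustration and is not part of bogofilter or procmail; it only assumes the usual convention that the first blank line separates header from body.

```python
def split_message(raw: str) -> tuple[str, str]:
    """Split a raw mail message into (header, body) at the first blank line.

    Hypothetical helper for illustration; not part of bogofilter.
    """
    # Normalize CRLF line endings so the blank-line search is uniform.
    text = raw.replace("\r\n", "\n")
    header, _sep, body = text.partition("\n\n")
    return header, body


if __name__ == "__main__":
    sample = "Subject: hi\nFrom: a@example.com\n\nbody text\n"
    header, body = split_message(sample)
    print(header)
    print("---")
    print(body)
```

Either part could then be fed to bogofilter on stdin, the same way the whole message is fed to "bogofilter -vvv" in the quoted example below.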

By the way, looking at the U column it appears that you're using a min_dev 
different from the default value of 0.1.  When doing that, it's appropriate 
to also change the spam_cutoff value.  You'll have to do some testing to 
determine the optimal values for the mail at _your_ location.  For more 
information, look in the bogofilter/tuning directory.
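
To make the min_dev point concrete, here is a rough sketch (Python) of how I read the columns in your output.  It assumes the last line of the listing means robs=0.2, robx=0.6 and min_dev=0.200, that fw is Robinson's smoothed estimate, and that a token is used only when its fw is at least min_dev away from 0.5.  The function names are invented for the example, and the exact boundary handling inside bogofilter may differ.

```python
def robinson_fw(n, pgood, pbad, robs=0.2, robx=0.6):
    """Smoothed spamicity of one token (Robinson's f(w)).

    n     -- number of times the token was seen (good + bad)
    pgood -- fraction of good messages containing the token
    pbad  -- fraction of spam messages containing the token
    robs, robx -- assumed from the summary line of the quoted output
    """
    if pgood + pbad == 0:
        return robx  # unseen token: fall back to the prior
    p = pbad / (pgood + pbad)
    return (robs * robx + n * p) / (robs + n)


def is_used(fw, min_dev=0.2):
    """True if the token is far enough from 0.5 to count toward the score."""
    return abs(fw - 0.5) >= min_dev


if __name__ == "__main__":
    # Three rows copied from the quoted listing: token, n, pgood, pbad.
    rows = [
        ("smtp", 84, 0.703704, 0.006001),
        ("mime-version", 9668, 0.518519, 0.891248),
        ("x-priority", 4531, 0.111111, 0.418021),
    ]
    for token, n, pgood, pbad in rows:
        fw = robinson_fw(n, pgood, pbad)
        mark = "+" if is_used(fw) else "-"
        print(f"{token:15s} {fw:.6f} {mark}")
```

For these three rows the sketch gives the same +/- marks as the U column: "mime-version" at about 0.63 falls inside the 0.3 to 0.7 band and is ignored, while it would be counted with the default min_dev of 0.1.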

David

Note:  The bogofilter/tuning directory is new with version 0.12

At 11:33 AM 5/6/03, Javier Castillo Alcibar wrote:


>   Hi all,
>
>   One thing I still don't understand is why message headers are so 
> important. In fact, I think it is better to ignore them.
>
>   Look at this: this is a spam email that bogofilter considers not 
> spam, because typical non-spam words such as "smtp", "mime-version", 
> etc. are included in my goodlist.db:
>
>
>linux-list:~# bogofilter -vvv < /var/mail/spamfilter
>X-Bogosity: No, tests=bogofilter, spamicity=0.500001, version=0.12.2
>                                      n    pgood     pbad      fw     U
>"194.69.248.103"                    38  0.481481  0.002308  0.007887 +
>"194.69.248.5"                      38  0.481481  0.002308  0.007887 +
>"alhsybarigw.alhantigengw.local"     38  0.481481  0.002308  0.007887 +
>"smtpsvc"                           38  0.481481  0.002308  0.007887 +
>"smtp"                              84  0.703704  0.006001  0.009860 +
>"isp1.alhsys.es"                    86  0.666667  0.006278  0.010699 +
>"esmtp"                             36  0.222222  0.002770  0.015557 +
>"194.69.248.2"                      41  0.222222  0.003231  0.017175 +
>"debian"                            41  0.222222  0.003231  0.017175 +
>"dns1.alhsys.es"                    36  0.185185  0.002862  0.018450 +
>"rfc822"                            49  0.185185  0.004062  0.023816 +
>"mime-version"                    9668  0.518519  0.891248  0.632195 -
>"multipart"                       4979  0.259259  0.459010  0.639049 -
>"subject"                        10833  0.555556  0.998708  0.642559 -
>"x-priority"                      4531  0.111111  0.418021  0.790004 +
>"receive"                         1611  0.037037  0.148634  0.800498 +
>"return-path"                     8372  0.185185  0.772434  0.806614 +
>"non-accredited"                    79  0.000000  0.007293  0.998990 +
>"prestigious"                       79  0.000000  0.007293  0.998990 +
>"universities"                      79  0.000000  0.007293  0.998990 +
>"confidentiality"                   83  0.000000  0.007662  0.999038 +
>"classes"                           85  0.000000  0.007847  0.999061 +
>"interviews"                        87  0.000000  0.008032  0.999083 +
>"jmartine"                          97  0.000000  0.008955  0.999177 +
>"turned"                            98  0.000000  0.009047  0.999185 +
>"earning"                          111  0.000000  0.010247  0.999281 +
>"including"                        117  0.000000  0.010801  0.999317 +
>"power"                            187  0.000000  0.017264  0.999573 +
>"experience"                       270  0.000000  0.024926  0.999704 +
>"call"                             335  0.000000  0.030927  0.999761 +
>"marketing"                        406  0.000000  0.037482  0.999803 +
>"life"                             530  0.000000  0.048929  0.999849 +
>"required"                         629  0.000000  0.058069  0.999873 +
>"money"                            806  0.000000  0.074409  0.999901 +
>"aol.com"                          842  0.000000  0.077733  0.999905 +
>N_P_Q_S_s_x_md                      94  0.00e+00  1.65e-06  5.00e-01
>                                         2.00e-01  6.00e-01  0.200
>
>
>    Am I wrong? How can I avoid including these typical headers in my 
> goodlist?
>
>    Best regards.
>    Javier.
>
>
>-----Original Message-----
>From: Tony L. Svanstrom [mailto:tony at svanstrom.com]
>Sent: Tuesday, May 06, 2003 17:11
>To: Simon Huggins
>CC: Bogofilter
>Subject: Re: understanding bogofilter
>
>
>On Tue, 6 May 2003 the voices made Simon Huggins write:
>
>SH> Hiya Bogofilter,
>SH>
>SH> On Tue, May 06, 2003 at 08:04:26AM -0400, David Relson wrote:
>SH> > In particular message/rfc822 isn't handled.  Using a perl script
>SH> > to extract the forwarded message would be a good solution.
>SH>
>SH> > P.S.  If anyone wants to write the perl script, there's room in
>SH> > the bogofilter/contrib directory.
>SH>
>SH> This was actually very trivial to do (the code is smaller than the
>SH> POD and the copyright boilerplate stuff).
>SH>
>SH> I've attached something you might like to include if you think it
>SH> will be of any use to people.
>SH>
>SH> It just extracts all message/rfc822 parts from a message and dumps
>SH> them on STDOUT.
>
>  Just remember that MIME::Parser isn't available for everyone that might 
> want to use this script, and that if people want to add this to their 
> procmailrc it might mess things up if it extracts every rfc822-part and 
> not only the first level.
>
>
>--
>       /\___/\                                              /\___/\
>       \_@ @_/                                              \_@ @_/
>  +--oOO-(_)-OOo------------------------------------------oOO-(_)-OOo--+
>  | Per scientiam ad libertatem! // Through knowledge towards freedom! |
>  +---ôôô---ôôô--------------------------------------------ôôô---ôôô---+
>      \O/   \O/      (c)1998-2003  tony at svanstrom.com      \O/   \O/
>
>
>---------------------------------------------------------------------
>FAQ: http://bogofilter.sourceforge.net/bogofilter-faq.html
>To unsubscribe, e-mail: bogofilter-unsubscribe at aotto.com
>For summary digest subscription: bogofilter-digest-subscribe at aotto.com
>For more commands, e-mail: bogofilter-help at aotto.com




