understanding bogofilter
Javier Castillo Alcibar
javier.castillo at euroview-spain.com
Tue May 6 17:33:24 CEST 2003
Hi all,
One thing I still don't understand is why message headers are so important. In fact, I think it would be better to ignore them.
Look at this: here is a spam email that bogofilter classifies as non-spam, because typical non-spam words such as "smtp", "mime-version", etc. are included in my goodlist.db:
linux-list:~# bogofilter -vvv < /var/mail/spamfilter
X-Bogosity: No, tests=bogofilter, spamicity=0.500001, version=0.12.2
                                       n     pgood      pbad        fw  U
"194.69.248.103"                      38  0.481481  0.002308  0.007887  +
"194.69.248.5"                        38  0.481481  0.002308  0.007887  +
"alhsybarigw.alhantigengw.local"      38  0.481481  0.002308  0.007887  +
"smtpsvc"                             38  0.481481  0.002308  0.007887  +
"smtp"                                84  0.703704  0.006001  0.009860  +
"isp1.alhsys.es"                      86  0.666667  0.006278  0.010699  +
"esmtp"                               36  0.222222  0.002770  0.015557  +
"194.69.248.2"                        41  0.222222  0.003231  0.017175  +
"debian"                              41  0.222222  0.003231  0.017175  +
"dns1.alhsys.es"                      36  0.185185  0.002862  0.018450  +
"rfc822"                              49  0.185185  0.004062  0.023816  +
"mime-version"                      9668  0.518519  0.891248  0.632195  -
"multipart"                         4979  0.259259  0.459010  0.639049  -
"subject"                          10833  0.555556  0.998708  0.642559  -
"x-priority"                        4531  0.111111  0.418021  0.790004  +
"receive"                           1611  0.037037  0.148634  0.800498  +
"return-path"                       8372  0.185185  0.772434  0.806614  +
"non-accredited"                      79  0.000000  0.007293  0.998990  +
"prestigious"                         79  0.000000  0.007293  0.998990  +
"universities"                        79  0.000000  0.007293  0.998990  +
"confidentiality"                     83  0.000000  0.007662  0.999038  +
"classes"                             85  0.000000  0.007847  0.999061  +
"interviews"                          87  0.000000  0.008032  0.999083  +
"jmartine"                            97  0.000000  0.008955  0.999177  +
"turned"                              98  0.000000  0.009047  0.999185  +
"earning"                            111  0.000000  0.010247  0.999281  +
"including"                          117  0.000000  0.010801  0.999317  +
"power"                              187  0.000000  0.017264  0.999573  +
"experience"                         270  0.000000  0.024926  0.999704  +
"call"                               335  0.000000  0.030927  0.999761  +
"marketing"                          406  0.000000  0.037482  0.999803  +
"life"                               530  0.000000  0.048929  0.999849  +
"required"                           629  0.000000  0.058069  0.999873  +
"money"                              806  0.000000  0.074409  0.999901  +
"aol.com"                            842  0.000000  0.077733  0.999905  +
N_P_Q_S_s_x_md                        94  0.00e+00  1.65e-06  5.00e-01  2.00e-01  6.00e-01  0.200
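The numbers in this output look consistent with Robinson's per-token score plus Fisher (chi-square) combining. Recomputing fw for "194.69.248.103" as (s*x + n*p)/(s + n), with p = pbad/(pgood + pbad) and s = 0.2, x = 0.6 taken from the last line, gives 0.007887. The printed spamicity also equals (1 + Q - P)/2 for the printed P and Q, and the U column appears to mark "-" for tokens whose fw lies within md = 0.200 of 0.5. The Python sketch below assumes that reading (it is not bogofilter's source, and the meaning given to s, x and md is an assumption). It shows why the message lands at about 0.5: strongly ham-like header tokens and strongly spam-like body tokens push both P and Q toward zero, so the two kinds of evidence cancel. Because it uses only the tokens listed above, it will not reproduce the printed P and Q exactly.

```python
import math

# Parameters copied from the last line of the -vvv output; their meaning
# (s, x, md) is an assumption based on Robinson's method.
S = 0.2        # s: weight given to the background value x
X = 0.6        # x: assumed score of a token with no history
MIN_DEV = 0.2  # md: tokens with |fw - 0.5| <= md appear to be skipped ("-")

# fw values of the "+" tokens shown above, split by where they come from.
HEADER_HAM = [0.007887, 0.007887, 0.007887, 0.007887, 0.009860, 0.010699,
              0.015557, 0.017175, 0.017175, 0.018450, 0.023816]
BODY_SPAM = [0.998990, 0.998990, 0.998990, 0.999038, 0.999061, 0.999083,
             0.999177, 0.999185, 0.999281, 0.999317, 0.999573, 0.999704,
             0.999761, 0.999803, 0.999849, 0.999873, 0.999901, 0.999905]


def token_score(n, pgood, pbad, s=S, x=X):
    """Robinson's f(w): p = pbad / (pgood + pbad), pulled toward x."""
    p = pbad / (pgood + pbad)
    return (s * x + n * p) / (s + n)


def is_used(fw, md=MIN_DEV):
    """True if the token is far enough from the neutral 0.5 to count."""
    return abs(fw - 0.5) > md


def chi2q(x2, dof):
    """Upper tail P(chi2 >= x2) for an even number of degrees of freedom."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_spamicity(scores):
    """Fisher/chi-square combining; P is small for spammy, Q for hammy input."""
    n = len(scores)
    P = chi2q(-2.0 * sum(math.log(1.0 - f) for f in scores), 2 * n)
    Q = chi2q(-2.0 * sum(math.log(f) for f in scores), 2 * n)
    return (1.0 + Q - P) / 2.0


if __name__ == "__main__":
    print(round(token_score(38, 0.481481, 0.002308), 6))  # "194.69.248.103"
    print(fisher_spamicity(HEADER_HAM))
    print(fisher_spamicity(BODY_SPAM))
    print(fisher_spamicity(HEADER_HAM + BODY_SPAM))
```

With these lists, the header tokens alone score near 0, the body tokens alone score near 1, and the two together score close to 0.5, which is the same pattern as the X-Bogosity line above.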
Am I wrong? How can I avoid including these typical headers in my goodlist?
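One way to keep Received-style headers out of the word lists is to pre-filter each message before bogofilter ever sees it. The sketch below is only an illustration, not a bogofilter feature: the allow-list KEEP and the script name strip_headers.py are hypothetical, and it relies on Python's standard email package. It keeps a few top-level headers (including the MIME ones, so multipart bodies can still be decoded) and drops everything else.

```python
import email
import sys

# Hypothetical allow-list: top-level headers that bogofilter will still see.
# Everything else (Received, Return-Path, Message-ID, X-*, ...) is dropped.
KEEP = {"from", "to", "subject", "mime-version",
        "content-type", "content-transfer-encoding"}


def strip_headers(raw, keep=KEEP):
    """Return the message text with only the allow-listed top-level headers."""
    msg = email.message_from_string(raw)
    for name in set(msg.keys()):
        if name.lower() not in keep:
            del msg[name]  # removes every occurrence of that header
    return msg.as_string()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 strip_headers.py message.eml
    with open(sys.argv[1]) as fh:
        sys.stdout.write(strip_headers(fh.read()))
```

For example, python3 strip_headers.py message.eml | bogofilter -v. The same filter would have to be applied when registering messages as well as when classifying them; otherwise the existing goodlist.db keeps the header tokens it has already learned.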
Best regards.
Javier.
-----Original Message-----
From: Tony L. Svanstrom [mailto:tony at svanstrom.com]
Sent: Tuesday, May 6, 2003 17:11
To: Simon Huggins
CC: Bogofilter
Subject: Re: understanding bogofilter
On Tue, 6 May 2003 the voices made Simon Huggins write:
SH> Hiya Bogofilter,
SH>
SH> On Tue, May 06, 2003 at 08:04:26AM -0400, David Relson wrote:
SH> > In particular message/rfc822 isn't handled. Using a perl script
SH> > to extract the forwarded message would be a good solution.
SH>
SH> > P.S. If anyone wants to write the perl script, there's room in
SH> > the bogofilter/contrib directory.
SH>
SH> This was actually very trivial to do (the code is smaller than the
SH> POD and the copyright boilerplate stuff).
SH>
SH> I've attached something you might like to include if you think it
SH> will be of any use to people.
SH>
SH> It just extracts all message/rfc822 parts from a message and dumps
SH> them on STDOUT.
Just remember that MIME::Parser isn't available to everyone who might want to use this script, and that if people add it to their procmailrc, it might mess things up if it extracts every rfc822 part rather than only the first level.
--
/\___/\ /\___/\
\_@ @_/ \_@ @_/
+--oOO-(_)-OOo------------------------------------------oOO-(_)-OOo--+
| Per scientiam ad libertatem! // Through knowledge towards freedom! |
+---ôôô---ôôô--------------------------------------------ôôô---ôôô---+
\O/ \O/ (c)1998-2003 tony at svanstrom.com \O/ \O/
More information about the Bogofilter mailing list