Question
Stephen Davies
scldad at sdc.com.au
Wed May 20 07:05:55 CEST 2009
Here is an example.
The first two lines of the token listing below seem to have a
disproportionate influence on the end result.
After I feed this email through bogofilter -Sn, subsequent runs give a
spamicity of 1.
Cheers,
Stephen
-bash-3.2# bogofilter -vvv < spam3
X-Bogosity: Ham, tests=bogofilter, spamicity=0.776434, version=1.2.0
n pgood pbad fw U
"Cain" 2 0.000146 0.000000 0.004587 +
"www.lufriek.com" 1 0.000073 0.000000 0.009094 +
"May" 16446 0.209690 0.042807 0.169536 -
"amp" 9852 0.106958 0.026452 0.198274 -
"scldad" 41942 0.439417 0.113300 0.204987 -
"Issue" 5361 0.052022 0.014661 0.219867 -
"Capell" 39 0.000291 0.000110 0.274895 -
"here.Copyright" 39 0.000291 0.000110 0.274895 -
"here.To" 39 0.000291 0.000110 0.274895 -
"newsletter.Issue" 39 0.000291 0.000110 0.274895 -
"sdc.com.au" 70468 0.494572 0.200911 0.288880 -
"Bill" 1077 0.006922 0.003098 0.309210 -
"remove" 6423 0.040656 0.018504 0.312783 -
"view" 12837 0.076138 0.037204 0.328244 -
"subj:incredible" 94 0.000510 0.000274 0.349917 -
"subj:diet" 904 0.004809 0.002644 0.354763 -
"rcvd:May" 72404 0.369472 0.212436 0.365069 -
"Info" 2046 0.009763 0.006032 0.381904 -
"mime:iso-8859-1" 46835 0.220765 0.138205 0.385005 -
"subj:loss" 1388 0.006412 0.004102 0.390133 -
"head:V6.00.2900.2180" 57798 0.248962 0.171573 0.407987 -
"message" 165183 0.693770 0.491112 0.414482 -
"This" 191332 0.796284 0.569174 0.416837 -
"Berry" 3636 0.014718 0.010834 0.424011 -
"subj:weight" 5821 0.023242 0.017359 0.427547 -
"rcvd:from" 181290 0.707322 0.541343 0.433537 -
"rcvd:Wed" 29851 0.112641 0.089303 0.442215 -
"subj:will" 9572 0.035337 0.028670 0.447917 -
"subj:lead" 62 0.000219 0.000186 0.459949 -
"head:Content-Type" 206432 0.645537 0.623342 0.491254 -
"Try" 5433 0.016976 0.016406 0.491459 -
"head:Date" 219291 0.673880 0.662685 0.495812 -
"Dr's" 167 0.000510 0.000505 0.497432 -
"mime:quoted-printable" 114172 0.333770 0.345761 0.508823 -
"mime:plain" 130170 0.380474 0.394212 0.508867 -
"mime:Content-Type" 133617 0.390310 0.404662 0.509027 -
"mime:charset" 130277 0.380255 0.394559 0.509231 -
"mime:Content-Transfer-Encoding" 131330 0.378652 0.397951 0.512425 -
"mime:text" 134505 0.382222 0.407814 0.516196 -
"head:multipart" 136306 0.387322 0.413275 0.516208 -
"privacy" 12380 0.035118 0.037538 0.516653 -
"mime:html" 130326 0.365392 0.395358 0.519695 -
"Berry.More" 0 0.000000 0.000000 0.520000 -
"Oprah's" 0 0.000000 0.000000 0.520000 -
"from:friscoplano" 0 0.000000 0.000000 0.520000 -
"from:marshallmn.com" 0 0.000000 0.000000 0.520000 -
"goxjvuqvofxj" 0 0.000000 0.000000 0.520000 -
"head:From" 0 -------- -------- 0.520000 i
"head:May" 0 -------- -------- 0.520000 i
"head:OZb" 0 0.000000 0.000000 0.520000 -
"head:Status" 0 -------- -------- 0.520000 i
"head:Wed" 0 -------- -------- 0.520000 i
"head:X-KMail-EncryptionState" 0 -------- -------- 0.520000 i
"head:X-KMail-MDN-Sent" 0 -------- -------- 0.520000 i
"head:X-KMail-SignatureState" 0 -------- -------- 0.520000 i
"head:X-Status" 0 -------- -------- 0.520000 i
"head:X-Virus-Scanned" 0 -------- -------- 0.520000 i
"head:amavisd-new" 0 -------- -------- 0.520000 i
"head:friscoplano" 0 0.000000 0.000000 0.520000 -
"head:marshallmn.com" 0 0.000000 0.000000 0.520000 -
"head:sdc.com.au" 0 -------- -------- 0.520000 i
"rcvd:ESMTP" 0 -------- -------- 0.520000 i
"rcvd:mustang.sdc.com.au" 0 -------- -------- 0.520000 i
"rtrn:friscoplano" 0 0.000000 0.000000 0.520000 -
"rtrn:marshallmn.com" 0 0.000000 0.000000 0.520000 -
"url:78.171.200" 0 0.000000 0.000000 0.520000 -
"url:78.171.200.136" 0 0.000000 0.000000 0.520000 -
"weightlos" 0 0.000000 0.000000 0.520000 -
"secret" 4412 0.011803 0.013409 0.531840 -
"unsubscribe" 23370 0.061494 0.071070 0.536119 -
"format" 153476 0.388415 0.467399 0.546145 -
"unsubscribe.php" 289 0.000729 0.000880 0.547129 -
"head:alternative" 112408 0.276648 0.342669 0.553301 -
"head:X-MimeOLE" 148448 0.362987 0.452637 0.554958 -
"head:Produced" 149024 0.363716 0.454423 0.555435 -
"head:Microsoft" 150258 0.364517 0.458281 0.556979 -
"head:MIME-Version" 197494 0.478106 0.602393 0.557514 -
"head:Express" 119544 0.289253 0.364637 0.557643 -
"head:MimeOLE" 148457 0.357158 0.452918 0.559105 -
"head:Message-ID" 193077 0.457413 0.589353 0.563022 -
"here" 59764 0.140765 0.182461 0.564499 -
"head:X-Mailer" 178652 0.416466 0.545615 0.567119 -
"head:X-MSMail-Priority" 127712 0.297195 0.390064 0.567565 -
"head:Normal" 158595 0.367140 0.484471 0.568888 -
"click" 34323 0.078033 0.104910 0.573459 -
"head:X-Priority" 162488 0.367067 0.496757 0.575067 -
"MIME" 147671 0.331949 0.451530 0.576314 -
"url:78" 4726 0.010492 0.014456 0.579454 -
"url:78.171" 99 0.000219 0.000303 0.580822 -
"head:Outlook" 139700 0.302732 0.427646 0.585513 -
"multi-part" 144920 0.302878 0.444109 0.594534 -
"policy" 6454 0.013260 0.019788 0.598759 -
"mail" 15626 0.031840 0.047921 0.600812 -
"Arial" 108047 0.207577 0.331901 0.615226 -
"face" 132373 0.223024 0.407981 0.646558 -
"More" 17609 0.029508 0.054279 0.647819 -
"our" 82451 0.135956 0.254247 0.651575 -
"Doctor" 4349 0.006922 0.013421 0.659753 -
"href" 184887 0.286849 0.570899 0.665579 -
"size" 145426 0.214062 0.449551 0.677430 -
"http" 272478 0.388124 0.842865 0.684706 -
"to:sdc.com.au" 331278 0.466521 1.024985 0.687215 -
"color" 96446 0.129253 0.298691 0.697967 -
"Pharmacy" 4574 0.004736 0.014226 0.750240 -
"Copyright" 5097 0.005246 0.015854 0.751376 -
"newsletter" 10094 0.009326 0.031443 0.771246 -
"Acai" 17439 0.005537 0.054780 0.908197 +
"subj:Acai" 57043 0.006557 0.179687 0.964791 +
"to:scldad" 160570 0.012095 0.506077 0.976659 +
"rcvd:with" 129609 0.000291 0.408905 0.999288 +
"from:Cain" 34 0.000000 0.000107 0.999749 +
"head:j!!" 161 0.000000 0.000508 0.999947 +
"rcvd:forged" 20182 0.000000 0.063674 1.000000 +
"rcvd:may" 20183 0.000000 0.063678 1.000000 +
"head:X-UIDL" 45032 0.000000 0.142077 1.000000 +
N_P_Q_S_s_x_md 11 0.000000 0.552868 0.776434
0.017800 0.520000 0.375000
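
To make the effect of the first two tokens concrete, here is a minimal sketch that recombines the fw values of the 11 tokens marked "+" in the listing above. It assumes bogofilter combines token scores with the Fisher chi-square method, with S = (1 + Q - P) / 2, which is consistent with the P, Q and S values printed on the summary line. The function names and the EPS clamp are illustrative assumptions, not bogofilter's own code.

```python
import math

# fw values of the 11 tokens marked "+" in the listing above, in listing order.
USED_FW = [
    0.004587,  # "Cain" (count 2)
    0.009094,  # "www.lufriek.com" (count 1)
    0.908197,  # "Acai"
    0.964791,  # "subj:Acai"
    0.976659,  # "to:scldad"
    0.999288,  # "rcvd:with"
    0.999749,  # "from:Cain"
    0.999947,  # "head:j!!"
    1.000000,  # "rcvd:forged"
    1.000000,  # "rcvd:may"
    1.000000,  # "head:X-UIDL"
]

# Assumption: clamp the values printed as 1.000000 so log(1 - fw) stays finite.
EPS = 1e-6


def chi2_upper_tail(chi2, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    m = chi2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def spamicity(fws):
    """Return (P, Q, S), where S = (1 + Q - P) / 2."""
    clamped = [min(max(f, EPS), 1.0 - EPS) for f in fws]
    df = 2 * len(clamped)
    p = chi2_upper_tail(-2.0 * sum(math.log(1.0 - f) for f in clamped), df)
    q = chi2_upper_tail(-2.0 * sum(math.log(f) for f in clamped), df)
    return p, q, (1.0 + q - p) / 2.0


P_ALL, Q_ALL, S_ALL = spamicity(USED_FW)
P_REST, Q_REST, S_REST = spamicity(USED_FW[2:])  # drop "Cain" and "www.lufriek.com"

if __name__ == "__main__":
    print(f"all 11 tokens:     P={P_ALL:.6f} Q={Q_ALL:.6f} S={S_ALL:.6f}")
    print(f"without first two: P={P_REST:.6f} Q={Q_REST:.6f} S={S_REST:.6f}")
```

With all 11 tokens, S should come out close to the reported 0.776434. With the two low-count tokens dropped, it should be very close to 1. Small differences in the last digits are expected because the inputs are rounded to six decimals.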
On Wednesday 20 May 2009 11:57:22 RW wrote:
> On Wed, 20 May 2009 10:10:37 +0930
>
> Stephen Davies <scldad at sdc.com.au> wrote:
> > > Is it possible to configure bogofilter so that it ignores tokens
> > > with an occurrence count less than N?
> > >
> > > I am seeing many obvious spams where tokens with an occurrence
> > > count of one are outweighing other tokens with big occurrence
> > > counts, big bad counts and zero good counts.
>
> Do you have an example of this? The example you gave before involved
> previous mistraining where you had learned some obviously spammy
> tokens as ham - including a Mexican IP address that's in Spamhaus XBL.
>
> > Sorry. I just realised that I have misrepresented my question.
> >
> > What I should have asked is whether it is possible to ignore (for the
> > calculation of total spamicity) tokens that have never been seen
> > before. That is, with occurrence count of zero.
>
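
On the question quoted above about tokens with an occurrence count of zero, a small sketch of how the fw column appears to be derived may help. It assumes Robinson's formula fw = (s*x + n*p) / (s + n) with p = pbad / (pgood + pbad), and takes s, x and md from the last line of the listing (0.0178, 0.52, 0.375). This reproduces the printed fw values for the first rows, but the function names are illustrative and the formula is an assumption about bogofilter's internals.

```python
ROBS = 0.0178     # "s" on the last line of the listing
ROBX = 0.52       # "x"
MIN_DEV = 0.375   # "md"


def token_fw(n, pgood, pbad, s=ROBS, x=ROBX):
    """Score for one token from its count n and its good/bad fractions."""
    if n == 0 or pgood + pbad == 0.0:
        return x  # never-seen token: the score falls back to x
    p = pbad / (pgood + pbad)
    return (s * x + n * p) / (s + n)


def is_used(fw, min_dev=MIN_DEV):
    """A token takes part in the final score only if it is far enough from 0.5."""
    return abs(fw - 0.5) >= min_dev


if __name__ == "__main__":
    rows = [
        ("Cain", 2, 0.000146, 0.000000),
        ("www.lufriek.com", 1, 0.000073, 0.000000),
        ("May", 16446, 0.209690, 0.042807),
        ("goxjvuqvofxj", 0, 0.000000, 0.000000),
    ]
    for name, n, pgood, pbad in rows:
        fw = token_fw(n, pgood, pbad)
        print(f"{name:18s} fw={fw:.6f} used={is_used(fw)}")
```

Under these assumptions a token with a count of zero scores exactly x = 0.52, which is only 0.02 away from 0.5 and so falls below md = 0.375. That matches the "-" in the U column for those rows. A token seen once or twice only as ham scores close to 0 because s is so small, which is why "Cain" and "www.lufriek.com" are marked "+".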
--
=============================================================================
Stephen Davies Consulting P/L Voice: 08-8177 1595
Adelaide, South Australia. Fax : 08-8177 0133
Computing & Network solutions. Mobile:040 304 0583
VoIP:sip:1132210 at sip1.bbpglobal.com