<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2800.1400" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY>
<DIV><FONT face=Arial size=2>From: "Boris 'pi' Piwinger" &lt;</FONT><A
href="mailto:3.14@piology.org"><FONT face=Arial
size=2>3.14@piology.org</FONT></A><FONT face=Arial size=2>&gt;</FONT></DIV>
<DIV><FONT face=Arial size=2>> I don't subscribe to this point of view. I
need the filter<BR>> to block unwanted mail of every kind. I do use a
virus</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>I concur. Bogofilter needs to be able to
filter any kind of unwanted mail. If I register every email from the
bogofilter mailing list as spam, then it should filter this list, no questions
asked. Virus spams are still spam... they are sent en masse to unwilling
recipients with a payload, but instead of a marketing message, the payload is a
virus (read this in the voice of Agent Smith ;). They should be
filterable.</FONT></DIV>
<DIV><BR><FONT face=Arial size=2>> luckily, it works, and my error rate is in
the magnitude of<BR>> one in a thousand.<BR></FONT></DIV>
<DIV><FONT face=Arial size=2>Good for you. I wish I could get that.
I still get a virus spam every day or two, sometimes several in a day, usually
classified as unsure. Since I started adding ASN (autonomous system number)
tokens, such as "rcvd:as6478" below, it has been getting better. However, it's
hard for that one token to push a message over my spam cutoff, so these often
still end up as unsure. Here's the breakdown of one of them (use a fixed-width
font):</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<PRE>
X-Bogosity: Yes, tests=bogofilter, spamicity=0.468896, version=0.17.5

                                        n     pgood      pbad        fw  U
"mime:application"                    332  0.020702  0.000771  0.036174  +
"mime:Content-Disposition"            486  0.026857  0.001232  0.044046  +
"mime:attachment"                     305  0.016786  0.000775  0.044432  +
"mime:octet-stream"                   284  0.015387  0.000729  0.045550  +
"mime:base64"                         401  0.016646  0.001182  0.066503  +
"document"                            853  0.028955  0.002708  0.085612  +
"mime:bit"                            937  0.030774  0.003006  0.089056  +
"attached"                           1284  0.039586  0.004196  0.095897  +
"rcvd:Mar"                           5390  0.138341  0.018448  0.117676  +
"mime:plain"                         2452  0.050497  0.008765  0.147932  +
"rcvd:216.109.145.120"               7179  0.133445  0.026094  0.163569  +
"mime:charset"                       1776  0.032172  0.006481  0.167695  +
"head:mixed"                         2957  0.049937  0.010899  0.179171  +
"format"                            19217  0.213037  0.074171  0.258251  +
"mime:text"                          4484  0.049098  0.017325  0.260838  +
"mime:Content-Type"                  4458  0.047419  0.017266  0.266936  +
"mime:Content-Transfer-Encoding"     4384  0.045461  0.017015  0.272351  +
"subj:website"                         45  0.000420  0.000176  0.296278  +
"MIME"                              23202  0.168275  0.092217  0.354011  -
"multi-part"                        17427  0.118198  0.069510  0.370308  -
"This"                              72541  0.474752  0.289855  0.379091  -
"Your"                              38280  0.235557  0.153406  0.394397  -
"rcvd:oac-design.com"              217650  0.939712  0.884200  0.484782  -
"mime:Windows-1252"                   166  0.000699  0.000675  0.491046  -
"head:Date"                        244940  1.000560  0.996772  0.499052  -
"rcvd:from"                        244922  0.998461  0.996760  0.499574  -
"rcvd:for"                         222991  0.892433  0.908005  0.504325  -
"message"                          118798  0.470695  0.483880  0.506906  -
"head:Message-Id"                  133397  0.487341  0.544578  0.527733  -
"rcvd:tanderso"                    203455  0.692684  0.832099  0.545716  -
"rcvd:Wed"                          45157  0.151909  0.184740  0.548760  -
"head:Content-Type"                235179  0.781508  0.962420  0.551869  -
"to:oac-design.com"                223169  0.506085  0.920329  0.645205  -
"head:multipart"                   129453  0.290670  0.533939  0.647506  -
"head:MIME-Version"                212537  0.467898  0.876906  0.652070  -
"to:tanderso"                      205168  0.337110  0.849935  0.716009  +
"subj:Your"                         17893  0.026997  0.074196  0.733212  +
"from:lovebreeze.com"                   1  0.000000  0.000004  0.910000  +
"rtrn:lovebreeze.com"                   1  0.000000  0.000004  0.910000  +
"mime:your_website.pif"                 2  0.000000  0.000008  0.950909  +
"rcvd:24.7.114.120"                    10  0.000000  0.000042  0.989412  +
"rcvd:helo-oac-design.com"            177  0.000000  0.000742  0.999391  +
"rcvd:as6478"                         519  0.000000  0.002176  0.999792  +
N_P_Q_S_s_x_md                         26  7.20e-02  9.74e-03  4.69e-01  2.00e-01  4.60e-01  0.200
</PRE>
<DIV><FONT face=Arial size=2>
<BR>As you can see, some tokens such as "document" and "attached" are hammy
(0.085612 and 0.095897), yet I doubt I've ever received a ham that said "Your
document is attached." Some variation of it (e.g. "Your file is attached") turns
up in these virus spams all the time. With a Markovian filter, that 3-4 word
phrase would count as a feature in its own right and would be exponentially more
relevant than the individual tokens.</FONT></DIV>
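<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>A rough sketch of the phrase idea, in Python: it turns a line of
text into every contiguous run of 1 to 4 words, so "Your document is attached" becomes a token
of its own that can be trained like any other. This is a simplified, hypothetical phrase
tokenizer (the word regex, the function names and max_len=4 are assumptions), not the feature
generator of an actual Markovian filter, which typically also builds sparse combinations that
skip words and weights longer phrases more heavily.</FONT></DIV>
<PRE>
```python
import re


def words(text):
    """Crude word split (an assumption; bogofilter's own lexer differs)."""
    return re.findall(r"[A-Za-z0-9']+", text)


def phrase_tokens(text, max_len=4):
    """Return every contiguous run of 1..max_len words as one token."""
    ws = words(text)
    tokens = []
    for i in range(len(ws)):
        # Phrases starting at word i, shortest first, clipped at the end of the text.
        for n in range(1, min(max_len, len(ws) - i) + 1):
            tokens.append(" ".join(ws[i:i + n]))
    return tokens


for tok in phrase_tokens("Your document is attached."):
    print(tok)
```
</PRE>
<DIV><FONT face=Arial size=2>The four words yield ten tokens, among them "Your document is
attached" and "document is attached"; with training, such phrases can become strongly spammy
even while "document" and "attached" on their own stay hammy.</FONT></DIV>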
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Also of note: even though I've stripped out the
non-standard headers with spamitarium, it's still largely the "administrative"
tokens (MIME structure, dates, Received lines) that make this email look hammy.
Dates are especially frustrating (the "rcvd:Mar" token scores 0.117676)... I
wish bogofilter would ignore them. I would strip them with spamitarium if they
weren't a required part of the spec and used extensively by email clients for
sorting and such. Removing "X-Priority", "X-MSMail-Priority", "ESMTP", etc., has
helped a bit; adding "helo-oac-design.com" and "as6478" helped a lot. Without
spamitarium, this email scored 0.067239. Yet even at 0.468896 it is still
classified as "unsure". I need something more to overcome the hamminess of the
"mime:" tokens. Perhaps simply registering messages like this exhaustively,
until all of those tokens become neutral, is the answer. However, the Markovian
method is also tempting.</FONT></DIV>
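<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>As a check on that arithmetic, here is a minimal Python sketch of
how, as far as I understand it, the fw column turns into a spamicity: Robinson's f(w) per token,
then Fisher chi-square combining of the tokens outside min_dev, with S = (1 + Q - P)/2. The s,
x and md values are the ones on the N_P_Q_S_s_x_md line above; the function names are made up
for illustration, and this is a re-derivation, not bogofilter's own code. Fed the fw column
above, it reproduces the n, P, Q and S of that summary line to within rounding.</FONT></DIV>
<PRE>
```python
import math

# Values from the N_P_Q_S_s_x_md line of the breakdown above:
# s = 0.2 (strength of the prior), x = 0.46 (prior for rarely seen tokens),
# md = 0.2 (min_dev: tokens this close to 0.5 are skipped).
S_PRIOR, X_PRIOR, MIN_DEV = 0.2, 0.46, 0.2

# The fw column of the breakdown, top to bottom (43 tokens).
FW = [
    0.036174, 0.044046, 0.044432, 0.045550, 0.066503, 0.085612, 0.089056,
    0.095897, 0.117676, 0.147932, 0.163569, 0.167695, 0.179171, 0.258251,
    0.260838, 0.266936, 0.272351, 0.296278, 0.354011, 0.370308, 0.379091,
    0.394397, 0.484782, 0.491046, 0.499052, 0.499574, 0.504325, 0.506906,
    0.527733, 0.545716, 0.548760, 0.551869, 0.645205, 0.647506, 0.652070,
    0.716009, 0.733212, 0.910000, 0.910000, 0.950909, 0.989412, 0.999391,
    0.999792,
]


def robinson_fw(n, pgood, pbad, s=S_PRIOR, x=X_PRIOR):
    """Robinson's f(w): p(w) = pbad / (pgood + pbad), pulled toward x when n is small."""
    p = pbad / (pgood + pbad)
    return (s * x + n * p) / (s + n)


def chi2q(chi, df):
    """Upper-tail probability of a chi-square statistic; df must be even."""
    m = chi / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def spamicity(fws, min_dev=MIN_DEV):
    """Fisher/Robinson chi-square combining; returns (n, P, Q, S)."""
    # Drop tokens within min_dev of 0.5 (the "-" rows in the U column).
    used = [f for f in fws if abs(f - 0.5) >= min_dev]
    n = len(used)
    # P gets small when many tokens sit near 1, Q when many sit near 0.
    p = chi2q(-2.0 * sum(math.log(1.0 - f) for f in used), 2 * n)
    q = chi2q(-2.0 * sum(math.log(f) for f in used), 2 * n)
    return n, p, q, (1.0 + q - p) / 2.0


print(round(robinson_fw(519, 0.0, 0.002176), 6))  # "rcvd:as6478" -> 0.999792
n, p, q, s = spamicity(FW)
print(n, round(p, 4), round(q, 5), round(s, 4))   # 26 tokens, S near 0.469
print(round(spamicity(FW + [0.9998])[3], 3))      # one more very spammy token
```
</PRE>
<DIV><FONT face=Arial size=2>The last line bears out the point about the ASN token: one more
token at f(w) = 0.9998 lifts S only from about 0.469 to about 0.505, because the 18 tokens
between 0.03 and 0.30 still pull the other way. If those tokens were neutral, only the spammy
ones would remain and S would go to nearly 1.</FONT></DIV>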
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Tom</FONT></DIV>
<DIV> </DIV></BODY></HTML>