"qsf", a light-weight alternative to bogofilter?

Jef Poskanzer jef at acme.com
Sat Feb 26 18:13:03 CET 2005


C. Fischer:
><URL:http://www.ivarch.com/programs/qsf.shtml>

Thanks, that looks very interesting!  There's a FreeBSD port, so I installed
it and will give it a try soon.  One feature I noticed it lacks is training
on MH folders, so I'll have to revive my early bogofilter scripts for
training on each message in an MH folder separately.

>since "qsf" documentation never mentions bayes

Yeah, but here's a comment block from spam/check.c:

  /*
   * Return a probability that the message is spam, where 0.9 and above means
   * "definitely spam", using the Robinson method.
   *
   * Robinson's method:
   *
   *   Set pN = "spam probability" of token N, where pN = f(p(w)), where p(w)
   *   is (bad / tb) / ((bad / tb) + (good / tg)) - bad is number of times
   *   token seen in bad messages, good is times token seen in good messages,
   *   tb and tg are total number of bad and good messages seen; f(w) is (robs
   *   * robx + gbtot * p(w)) / (robs + gbtot), where gbtot is good + bad,
   *   robs is a constant, and robx is a fudge factor calculated from the
   *   average p(w) of all tokens that have been seen over 10 times.
   *
   *   Then:
   *
   *   P = 1 - ((1-p1)(1-p2)(1-p3)...(1-pN))^(1/n)
   *   Q = 1 - (p1p2p3...pN)^(1/n)
   *   S = (P - Q) / (P + Q)
   *
   * S is then a number from -1 to +1, so we scale it to 0-1 and then divide
   * by (0.54/0.9=0.6) and clip to 1, since the spam cutoff point for this
   * algorithm is 0.54 and we want it to be 0.9.
   */

So it is Bayesian.
---
Jef

         Jef Poskanzer  jef at acme.com  http://www.acme.com/jef/
_______________________________________________
Bogofilter mailing list
Bogofilter at bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter


