No subject

Tamer Yousef tamer.yousef at gmail.com
Thu Jul 25 23:00:31 CEST 2013


I have bogofilter version 1.2.4, with the following message counts in the
training set:
spam: 82,799, good: 101,798

I ran the following command and the text was classified as Unknown with a
score of 0.52.

bogofilter -t -vvv <<< "

I work in games and simulations. And over the years games have become
increasingly more prevalent as a kind of touchstone for design more
broadly. So the project that I’m working on now that I’m going to talk
about is an attempt to generalize from my experience in game design to
almost anything else."

Here is the output of the command:

                                        n    pgood     pbad      fw     U
  "going"                           34609  0.247117  0.114168  0.316006 -
  "anything"                        15831  0.109992  0.055967  0.337233 -
  "else"                            11879  0.080719  0.044228  0.353973 -
  "talk"                            10472  0.069530  0.040991  0.370889 -
  "And"                             30372  0.199022  0.122127  0.380282 -
  "working"                         16125  0.105621  0.064892  0.380570 -
  "now"                             41486  0.268483  0.170956  0.389033 -
  "almost"                          15381  0.099334  0.063636  0.390477 -
  "about"                           66305  0.423879  0.279653  0.397499 -
  "kind"                            14878  0.094894  0.063020  0.399079 -
  "project"                          6709  0.042771  0.028442  0.399397 -
  "work"                            33685  0.214700  0.142864  0.399549 -
  "over"                            42124  0.266862  0.180654  0.403682 -
  "generalize"                         45  0.000285  0.000193  0.404213 -
  "have"                            92352  0.583793  0.397626  0.405154 -
  "that"                           107708  0.644286  0.508714  0.441209 -
  "from"                            86081  0.509372  0.413387  0.447990 -
  "years"                           27694  0.163687  0.133226  0.448704 -
  "for"                            123322  0.726920  0.595696  0.450392 -
  "the"                            149189  0.871324  0.730564  0.456065 -
  "attempt"                          3886  0.022358  0.019445  0.465154 -
  "and"                            147442  0.847639  0.738584  0.465624 -
  "more"                            67875  0.388377  0.342263  0.468442 -
  "become"                          14658  0.078931  0.079989  0.503329 -
  "experience"                      13936  0.073282  0.078214  0.516275 -
  "game"                             8702  0.043321  0.051836  0.544743 -
  "games"                            5233  0.024499  0.033080  0.574511 -
  "prevalent"                         337  0.001444  0.002295  0.613760 -
  "touchstone"                         19  0.000079  0.000133  0.628221 -
  "broadly"                           167  0.000609  0.001268  0.675534 -
  "design"                           7857  0.028507  0.059844  0.677339 -
  "increasingly"                     1865  0.005982  0.015169  0.717163 -
  "simulations"                        62  0.000138  0.000580  0.808173 -
  N_P_Q_S_s_x_md                        0  0.000000  0.000000  0.520000
                                           0.017800  0.520000  0.375000


I do not really understand the meaning of the headers at the top ("pgood",
"pbad", "fw"), so these values do not make sense to me. I'm wondering why
this text, which is non-spam, was not identified as such and instead fell
into the Unknown bucket. The training set I have is mostly well categorized,
apart from a few cases, but the text I'm examining does not have any
spam-like tokens.
Any help on this is appreciated!

thanks,
Tamer
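
Note on the columns: the numbers in the output above look consistent with
Robinson's f(w) formula, where pgood and pbad are the fractions of good and
spam messages containing the token, n is the total number of training
messages containing it, and fw = (s*x + n*p) / (s + n) with
p = pbad / (pgood + pbad). The sketch below is only a reading of the table
under that assumption, not bogofilter's actual code. The function names fw
and is_used are made up for illustration, and it assumes that "md" on the
last line is a minimum-deviation cutoff, so that a token is scored only when
its fw is at least md away from 0.5.

```python
# Parameters copied from the last two lines of the output (s, x, md).
S = 0.0178      # "s": weight given to the prior x (assumed meaning)
X = 0.52        # "x": score assumed for a token with no history
MIN_DEV = 0.375 # "md": assumed minimum deviation from 0.5 for a token to count


def fw(n, pgood, pbad, s=S, x=X):
    """Robinson-style token score; hypothetical helper, not a bogofilter API."""
    p = pbad / (pgood + pbad)
    return (s * x + n * p) / (s + n)


def is_used(fw_value, min_dev=MIN_DEV):
    """True if the token would pass the assumed min-deviation cutoff."""
    return abs(fw_value - 0.5) >= min_dev


# (token, n, pgood, pbad, fw as printed in the output)
rows = [
    ("going", 34609, 0.247117, 0.114168, 0.316006),
    ("the", 149189, 0.871324, 0.730564, 0.456065),
    ("design", 7857, 0.028507, 0.059844, 0.677339),
    ("simulations", 62, 0.000138, 0.000580, 0.808173),
]

for token, n, pgood, pbad, printed in rows:
    value = fw(n, pgood, pbad)
    print(f"{token:12s} computed={value:.6f} printed={printed:.6f} "
          f"used={is_used(printed)}")
```

Under these assumptions no token in the list is 0.375 or more away from 0.5
(the most extreme values are 0.316006 and 0.808173), which would match the
"-" in the U column for every row and the N of 0 on the last line. With no
tokens scored, the final score would simply equal x = 0.52.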


