When is spam_cutoff too low?

Pavel Kankovsky peak at argo.troja.mff.cuni.cz
Thu Dec 23 16:02:39 CET 2004


On Sun, 12 Dec 2004, Matej Cepl wrote:

> Would you have some tool to get these statistics from the email
> corpora, or should I make myself some combination of grep, procmail,
> and other shell tools (or Python)?

I've got this crude (*) script that runs the mailbox through
bogofilter -t -M and prints a histogram of the bogosity distribution.

(*) It is not configurable, and it implicitly assumes that every line of
bogofilter output is a classification letter (N, U or Y) followed by a
score.

--Pavel Kankovsky aka Peak  [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."
-------------- next part --------------
#!/bin/sh
bogofilter ${1+"$@"} -t -M | \
perl -w -e '
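  # Bit flags for the classifications seen in a bin: N=ham, U=unsure, Y=spam.
  %t2n = ( "N"=>1, "U"=>2, "Y"=>4 );
  # Label for each combination of flags; "?" marks a bin holding both N and Y.
  %n2t = ( 0=>" ", 1=>"N", 2=>"U", 3=>"NU",
           4=>"Y", 5=>"?", 6=>"UY", 7=>"*" );
  @at = ();      # per-bin classification flags
  @an = ();      # per-bin message counts
  $scale = 10;   # number of bins across the score range 0..1
  $total = 0;    # number of recognized input lines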

  for my $i(0..$scale-1) {
    $at[$i] = $an[$i] = 0;
  }

  while (<STDIN>) {
    if (/^([NUY]) ([01]\.\d+)$/) {
      # Map the score to a bin index; a score of 1.0 goes into the top bin.
      my $i = int($2 * $scale);
      $i = $scale-1 if ($i >= $scale);
      $at[$i] |= $t2n{$1};
      $an[$i]++;
      $total++;
    }
  }

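  # Avoid dividing by zero when no input line matched the expected format.
  die "no classification lines matched; is this bogofilter -t output?\n"
    unless $total;

  for my $i(0..$scale-1) {
    $pct = 100*$an[$i]/$total;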
    printf("%6d  %6.2f%% %-4s %4.2f  %s\n",
           $an[$i], $pct, $n2t{$at[$i]},
           $i/$scale, "*"x($pct/3));
  }
'
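
The binning step of the script above can be tried without bogofilter or a
mail corpus. The following is only a sketch of that step, not the script
itself: it uses awk instead of Perl, the function name bin_scores is made
up for illustration, and the sample lines are hand-written stand-ins for
the "letter score" format that the script's regular expression expects.

```shell
#!/bin/sh
# bin_scores (hypothetical helper): read lines of the assumed form
# "<N|U|Y> <score>" on stdin and print one "<bin start> <count>" line
# for each of 10 equal-width bins. Lines in any other format are skipped.
bin_scores() {
  awk -v scale=10 '
    /^[NUY] [01]\.[0-9]+$/ {
      b = int($2 * scale)              # bin index from the score
      if (b >= scale) b = scale - 1    # a score of 1.0 joins the top bin
      n[b]++
    }
    END {
      for (b = 0; b < scale; b++)
        printf "%.2f %d\n", b / scale, n[b] + 0
    }'
}

# Hand-written sample input; real input would come from bogofilter -t -M.
printf '%s\n' 'N 0.000012' 'N 0.043210' 'U 0.512345' \
              'Y 0.987654' 'Y 1.000000' | bin_scores
```

With this sample, two messages land in the 0.00 bin, one in the 0.50 bin,
and two in the 0.90 bin; every other bin is reported with a count of 0.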

