When is spam_cutoff too low?
Pavel Kankovsky
peak at argo.troja.mff.cuni.cz
Thu Dec 23 16:02:39 CET 2004
On Sun, 12 Dec 2004, Matej Cepl wrote:
> Would you have some tool to do get this statistics from the email
> corpora, or should I made myself some combination of grep, procmail,
> and other shell tools (or Python)?
I've got this crude (*) script that runs the mailbox through
bogofilter -M and prints a histogram of the bogosity distribution.
(*) It is not configurable, and makes implicit assumptions about
bogofilter output.
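Judging by the pattern the script matches, the main assumption is that each line of output is a one-letter class (N, U, or Y), a space, and a score written as 0.xxx or 1.xxx; anything else is silently skipped. Before trusting the histogram, it may be worth checking how many lines actually fit that form. A minimal sketch, assuming awk is available; check_format is a made-up helper name, not part of the script or of bogofilter:

```sh
# Hypothetical helper: count the lines on stdin that match the
# "<N|U|Y> <score>" form the histogram script expects, and the
# lines it would skip.
# usage: bogofilter -t -M < mailbox | check_format
check_format() {
    awk '
        /^[NUY] [01]\.[0-9]+$/ { ok++; next }
        { bad++ }
        END { printf "matched=%d skipped=%d\n", ok, bad }
    '
}
```

A large "skipped" count would suggest that the local bogofilter configuration prints something other than what the script expects.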
--Pavel Kankovsky aka Peak [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."
-------------- next part --------------
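#!/bin/sh
# Classify every message in the mailbox (-M) in terse mode (-t) and
# print a histogram of the scores. Extra arguments go to bogofilter.
bogofilter ${1+"$@"} -t -M | \
perl -w -e '
    # Bit flags for the classes seen in a bucket, and their labels.
    %t2n = ( "N"=>1, "U"=>2, "Y"=>4 );
    %n2t = ( 0=>" ", 1=>"N", 2=>"U", 3=>"NU",
             4=>"Y", 5=>"?", 6=>"UY", 7=>"*" );
    @at = ();       # classes seen per bucket (bitmask)
    @an = ();       # message count per bucket
    $scale = 10;    # number of buckets
    $total = 0;
    for my $i(0..$scale-1) {
        $at[$i] = $an[$i] = 0;
    }
    while (<STDIN>) {
        # Expect lines such as "N 0.012345"; skip anything else.
        if (/^([NUY]) ([01]\.\d+)$/) {
            my $i = int($2 * $scale);
            $i = $scale-1 if ($i >= $scale);    # 1.0 goes in the top bucket
            $at[$i] |= $t2n{$1};
            $an[$i]++;
            $total++;
        }
    }
    # Avoid dividing by zero when no line matched.
    die "no scores found in bogofilter output\n" if ($total == 0);
    for my $i(0..$scale-1) {
        $pct = 100*$an[$i]/$total;
        printf("%6d %6.2f%% %-4s %4.2f %s\n",
            $an[$i], $pct, $n2t{$at[$i]},
            $i/$scale, "*"x($pct/3));
    }
'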
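To see which row of the histogram a given score lands in, the bucketing step can be tried on its own. A minimal sketch, assuming awk is available; bucket_of is a made-up helper name, and its default of 10 buckets mirrors $scale in the script:

```sh
# Hypothetical helper, not part of the script above.
# usage: bucket_of SCORE [BUCKETS]
bucket_of() {
    awk -v s="$1" -v n="${2:-10}" 'BEGIN {
        i = int(s * n)           # truncate toward zero
        if (i >= n) i = n - 1    # a score of 1.0 joins the top bucket
        print i
    }'
}
```

With the default of 10 buckets, bucket 3 is the row the script labels 0.30, covering scores from 0.30 up to but not including 0.40; the last row also absorbs a score of exactly 1.0.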