-vvv output [was: FAQ: Asian spam]
David Relson
relson at osagesoftware.com
Thu Mar 27 14:27:31 CET 2003
At 07:40 AM 3/27/03, Boris 'pi' Piwinger wrote:
>Boris 'pi' Piwinger wrote:
>
> > This fails on multipart, but the fix is too risky I think.
>
>I just had one coming through. Here is part of the -vvv
>output (which I don't really like all that much, it is hard
>to read and wider than my terminal windows):
>
> >
> "°ÅºÎÇØ" 1 0.000000 0.000149 0.999416
> -7.44490 -0.00058 +
> >
> "¸ÞÀϹ߼ÛÀÌ" 1 0.000000 0.000149 0.999416
> -7.44490 -0.00058 +
> >
> "¼ö½Å°ÅºÎÇÒ" 1 0.000000 0.000149 0.999416
> -7.44490 -0.00058 +
> >
> "ÀÔ·ÂÈÄ¿¡" 1 0.000000 0.000149 0.999416
> -7.44490 -0.00058 +
> >
> "ÁØÈñ" 1 0.000000 0.000149 0.999416
> -7.44490 -0.00058 +
> >
> "ÁÙ±î" 1 0.000000 0.000149 0.999416
> -7.44490 -0.00058 +
> >
> "ó¸" 1 0.000000 0.000149 0.999416
> -7.44490 -0.00058 +
> >
> "211.172.115.17" 2 0.000000 0.000297 0.999708
> -8.13755 -0.00029 +
> >
> "µÇÁö" 2 0.000000 0.000297 0.999708
> -8.13755 -0.00029 +
> >
> "ÁÖ¼Ò¸¦" 2 0.000000 0.000297 0.999708
> -8.13755 -0.00029 +
> >
> "Áֽøé" 2 0.000000 0.000297 0.999708
> -8.13755 -0.00029 +
> >
> "hanmail.net" 4 0.000000 0.000594 0.999854
> -8.83044 -0.00015 +
> >
> "¾Ê½À´Ï´Ù" 6 0.000000 0.000891 0.999903
> -9.23582 -0.00010 +
>
>So it really adds to the database, maybe it won't hurt more
>to add all?
>
>Anyhow, the FAQ should say something on that issue.
Would you care to draft a few lines on Asian spam? One idea would be to
describe the two primary ways of handling it: 1 - check for
"charset=gb2512|kc_5601_..." and discard the message; 2 - add it to the
database.
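As a rough illustration of option 1, here is a minimal Python sketch of a
charset check done outside bogofilter. It is not how bogofilter itself works.
The function name has_unwanted_charset and the contents of UNWANTED_CHARSETS
are my own assumptions (the pattern above is abbreviated, so the exact list is
a guess). Walking every MIME part means multipart messages are covered too.

```python
from email import message_from_string

# Assumed list of charsets to reject; adjust to whatever you actually see.
UNWANTED_CHARSETS = {"gb2312", "big5", "euc-kr", "ks_c_5601-1987"}

def has_unwanted_charset(raw_message: str) -> bool:
    """Return True if any MIME part declares a charset from the list."""
    msg = message_from_string(raw_message)
    # walk() yields the message itself and, for multipart mail, every subpart.
    for part in msg.walk():
        charset = part.get_content_charset()  # lowercased, or None if absent
        if charset in UNWANTED_CHARSETS:
            return True
    return False
```

A delivery script could call this first and only hand the message to
bogofilter when it returns False.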
>Talking about the FAQ: David, I believe, gave an excellent
>explanation of the above output. That would also fit the FAQ
>well.
>
>It would also be nice if the above output could be modified
>by the config file. I don't care for the last two columns
>before the +/- for example.
With the inclusion of Robinson-Fisher months ago, Greg added a '-R' switch
and detailed token/value output. The output is in a format that the R
statistical tool/language can utilize to verify correct spamicity
calculation and to experiment with the calculation. The format of the
output is properly known as an R table. More information on "The R Project
for Statistical Computing" can be found at www.r-project.org.
Since bogofilter needed a way to display detailed info, it was reasonable
to adopt the R table ("-R") output for "-vvv". Like you, I don't find the
last two columns particularly useful. They are intermediate results in the
spamicity calculation and do not have easily understood interpretations (at
least for a person like me). Perhaps the thing to do is have "-R" generate
_all_ columns of output and to have "-vvv" leave out the last two numeric
columns. What do you think?
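Until something like that exists, the same effect can be had by
post-processing the output. The Python sketch below is only an assumption
about the layout: it expects each record on a single line (the wrapped lines
quoted above rejoined), ending in two numeric columns and the +/- flag. The
helper name drop_last_two_numeric is hypothetical.

```python
def drop_last_two_numeric(line: str) -> str:
    """Remove the two numeric columns that precede the trailing +/- flag."""
    # Split from the right so any whitespace inside the quoted token survives.
    head, _col_a, _col_b, flag = line.rsplit(None, 3)
    return f"{head} {flag}"

record = '"hanmail.net" 4 0.000000 0.000594 0.999854 -8.83044 -0.00015 +'
print(drop_last_two_numeric(record))
```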
By the way, any explanation of the table format would be Greg's, as he's
our "R" expert.