-vvv output [was: FAQ: Asian spam]

David Relson relson at osagesoftware.com
Thu Mar 27 14:27:31 CET 2003


At 07:40 AM 3/27/03, Boris 'pi' Piwinger wrote:

>Boris 'pi' Piwinger wrote:
>
> > This fails on multipart, but the fix is too risky I think.
>
>I just had one come through. Here is part of the -vvv
>output (which I don't really like all that much; it is hard
>to read and wider than my terminal windows):
>
> > "°ÅºÎÇØ"                             1  0.000000  0.000149  0.999416  -7.44490  -0.00058 +
> > "¸ÞÀϹ߼ÛÀÌ"                         1  0.000000  0.000149  0.999416  -7.44490  -0.00058 +
> > "¼ö½Å°ÅºÎÇÒ"                         1  0.000000  0.000149  0.999416  -7.44490  -0.00058 +
> > "ÀÔ·ÂÈÄ¿¡"                           1  0.000000  0.000149  0.999416  -7.44490  -0.00058 +
> > "ÁØÈñ"                               1  0.000000  0.000149  0.999416  -7.44490  -0.00058 +
> > "ÁÙ±î"                               1  0.000000  0.000149  0.999416  -7.44490  -0.00058 +
> > "ó¸"                                1  0.000000  0.000149  0.999416  -7.44490  -0.00058 +
> > "211.172.115.17"                     2  0.000000  0.000297  0.999708  -8.13755  -0.00029 +
> > "µÇÁö"                               2  0.000000  0.000297  0.999708  -8.13755  -0.00029 +
> > "ÁÖ¼Ò¸¦"                             2  0.000000  0.000297  0.999708  -8.13755  -0.00029 +
> > "Áֽøé"                             2  0.000000  0.000297  0.999708  -8.13755  -0.00029 +
> > "hanmail.net"                        4  0.000000  0.000594  0.999854  -8.83044  -0.00015 +
> > "¾Ê½À´Ï´Ù"                           6  0.000000  0.000891  0.999903  -9.23582  -0.00010 +
>
>So it really adds to the database, maybe it won't hurt more
>to add all?
>
>Anyhow, the FAQ should say something on that issue.

Would you care to draft a few lines on Asian spam?  One idea would be to 
describe the two main ways of handling it:  1 - check for 
"charset=gb2312|kc_5601_..." and discard the message ; 2 - add it to the database.

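As a rough illustration of option 1, here is a minimal Python sketch of a
charset check done outside bogofilter.  The function name and the list of
charsets are assumptions made for the example (they are not a bogofilter
feature and not the exact pattern quoted above); adjust the set to whatever
you actually want to discard.

```python
from email import message_from_string

# Hypothetical blocklist: these charset names are illustrative assumptions,
# not a list taken from bogofilter or from this thread.
BLOCKED_CHARSETS = {"gb2312", "euc-kr", "big5"}


def has_blocked_charset(raw_message, blocked=BLOCKED_CHARSETS):
    """Return True if any MIME part declares a charset listed in `blocked`."""
    msg = message_from_string(raw_message)
    # walk() yields the top-level message and every nested MIME part.
    for part in msg.walk():
        # get_content_charset() returns the charset lowercased, or None.
        charset = part.get_content_charset()
        if charset is not None and charset in blocked:
            return True
    return False
```

Walking every part is one way to also catch multipart messages, where the
charset is declared on an inner part rather than in the top-level header.
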
>Talking about the FAQ: David, I believe, gave an excellent
>explanation of the above output. That would also fit the FAQ
>well.
>
>It would also be nice if the above output could be modified
>by the config file. I don't care for the last two columns
>before the +/-, for example.

With the inclusion of Robinson-Fisher months ago, Greg added a '-R' switch 
and detailed token/value output.  The output is in a format that the R 
statistical tool/language can use to verify that the spamicity 
calculation is correct and to experiment with the calculation.  The format of the 
output is properly known as an R table.  More information on "The R Project 
for Statistical Computing" can be found at www.r-project.org.

Since bogofilter needed a way to display detailed info, it was reasonable 
to adopt the R table ("-R") output for "-vvv".  Like you, I don't find the 
last two columns particularly useful.  They are intermediate results in the 
spamicity calculation and do not have easily understood interpretations (at 
least for a person like me).  Perhaps the thing to do is have "-R" generate 
_all_ columns of output and to have "-vvv" leave out the last two numeric 
columns.  What do you think?
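
For what it's worth, the numbers in the quoted rows are consistent with the 
last two columns being ln(1 - f(w)) and ln(f(w)), where f(w) is Robinson's 
smoothed token probability in the third numeric column.  The Python sketch 
below is only my reading of the figures, not a description of the actual 
code: the column meanings, the function names, and the parameter values 
s = 0.001 and x = 0.415 are assumptions chosen because they fit the rows 
shown above.

```python
import math

# Assumed Robinson parameters, chosen because they fit the quoted rows;
# they are not stated anywhere in this thread.
ROBS = 0.001   # s: weight given to the prior
ROBX = 0.415   # x: assumed spamicity of a token never seen before


def robinson_fw(pgood, pbad, n, s=ROBS, x=ROBX):
    """Robinson's smoothed token probability f(w) = (s*x + n*p) / (s + n)."""
    total = pgood + pbad
    # p is the raw spam probability; fall back to the prior with no data.
    p = pbad / total if total > 0 else x
    return (s * x + n * p) / (s + n)


def log_columns(fw):
    """Return (ln(1 - f(w)), ln(f(w))), the assumed last two columns."""
    return math.log(1.0 - fw), math.log(fw)


# First quoted row: count 1, pgood 0.000000, pbad 0.000149.
fw = robinson_fw(0.0, 0.000149, 1)
inv_log, direct_log = log_columns(fw)
print(f"{fw:.6f}  {inv_log:.5f}  {direct_log:.5f}")
```

With those assumed parameters the sketch should reproduce the quoted values 
to within rounding, which is why the two columns read as intermediate 
results rather than as anything a person would interpret directly.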

By the way, any explanation of the table format would be Greg's, as he's 
our "R" expert.






