Minimum usable counts [was: Question]

David Relson relson at osagesoftware.com
Mon May 25 16:05:17 CEST 2009


On Mon, 25 May 2009 23:03:32 +0930
Stephen Davies wrote:

> Sorry David, I thought my example made it clear.
> 
> The actual texts were:
> 
> h=<CR><LF>ere
> 
> and 
> 
> h=<LF>ere
> 
> The first case is the raw text received by sendmail, amavis-milter
> and amavisd.
> 
> The second is the text presented by kmail.
> 
> In the second case, bogofilter is smart enough to get "here" as the
> token but in the first case, the CR broke the algorithm.
> 
> Presumably, the = at end-of-line is part of a protocol that
> bogofilter knows but =<CR> is not.
> 
> HTH,
> Stephen

Hi Stephen,

Ah, more info!  The trailing "=" hadn't been mentioned previously...

I've created a sample message containing "alpha" ( "a=<CR><LF>lpha" )
and "beta" ( "b=<LF>eta" ) split as you've described:

### begin message ###
Subject: test

CRLF:
a=
lpha

LF:
b=
eta
### end message ###

Since claws-mail doesn't show <CR> or <LF>, I've attached a tgz file
with the message.  The tgz also includes a hex dump of the message so
you can see the CR and LF characters.
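
For readers without a hex dump tool handy, the following minimal Python
sketch makes the line endings visible using the same <CR>/<LF> notation
as this thread.  The function name show_line_endings and the sample
bytes are illustrative assumptions; they are not the exact contents of
the attached message.

```python
def show_line_endings(data: bytes) -> str:
    """Return data as text with CR and LF spelled out as <CR> and <LF>."""
    # latin-1 maps every byte to one character, so decoding never fails.
    text = data.decode("latin-1")
    # Replace CR first, then LF; keep a real newline after each <LF>
    # so the output still reads line by line.
    return text.replace("\r", "<CR>").replace("\n", "<LF>\n")


# Illustrative bytes only: one line split with CRLF, one with a bare LF.
sample = b"a=\r\nlpha\nb=\neta\n"
print(show_line_endings(sample), end="")
# a=<CR><LF>
# lpha<LF>
# b=<LF>
# eta<LF>
```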

Bogofilter's -vvv results (with and without "--min-token-len=1") are at
the end of the message.  In neither case do I see bogofilter joining
the characters before and after a trailing "=" sign.  Perhaps you can
gzip a sample message and send it?
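
For reference, a "=" immediately followed by a line break is what the
quoted-printable encoding (RFC 2045) calls a soft line break: a decoder
drops both the "=" and the line break and joins the two halves.  Note
that the test message above carries no Content-Transfer-Encoding
header, and a MIME-aware parser would normally apply this rule only
when that header says quoted-printable.  The Python sketch below shows
the joining step in isolation.  Both function names are hypothetical,
and the LF-only variant only illustrates the behaviour Stephen
suspects; it is not taken from bogofilter's source.

```python
import re


def join_soft_breaks(data: bytes) -> bytes:
    """Drop soft line breaks: "=" followed by CRLF or by a bare LF."""
    return re.sub(rb"=\r?\n", b"", data)


def join_soft_breaks_lf_only(data: bytes) -> bytes:
    """Hypothetical naive variant: only "=" directly followed by LF."""
    return re.sub(rb"=\n", b"", data)


crlf_sample = b"a=\r\nlpha"   # "alpha" split with CRLF
lf_sample = b"b=\neta"        # "beta" split with a bare LF

print(join_soft_breaks(crlf_sample))          # b'alpha'
print(join_soft_breaks(lf_sample))            # b'beta'
# The LF-only rule does not match "=<CR>", so the CRLF case stays split.
print(join_soft_breaks_lf_only(crlf_sample))  # b'a=\r\nlpha'
print(join_soft_breaks_lf_only(lf_sample))    # b'beta'
```

If a tokenizer handled only the bare-LF form, it would produce "here"
from "h=<LF>ere" but not from "h=<CR><LF>ere", which matches the
symptom described in the quoted message.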

Regards,

David

relson at osage $ bogofilter -C -vvv < Message.0525.txt
X-Bogosity: Unsure, tests=bogofilter, spamicity=0.459094, version=1.2.0
                                        n    pgood     pbad      fw
U "CRLF"                              153  0.000829  0.000001  0.000759
+ "subj:test"                         917  0.001303  0.000393  0.231703
- "lpha"                                0  0.000000  0.000000  0.520000
- "eta"                               532  0.000005  0.000308  0.982578
+ N_P_Q_S_s_x_md                        2  0.087930  0.006118  0.459094
                                           0.017800  0.520000  0.375000

relson at osage $ bogofilter --min-token-len=1 -C -vvv < Message.0525.txt
X-Bogosity: Unsure, tests=bogofilter, spamicity=0.459094, version=1.2.0
                                        n    pgood     pbad      fw
U "CRLF"                              153  0.000829  0.000001  0.000759
+ "subj:test"                         917  0.001303  0.000393  0.231703
- "LF"                                  0  0.000000  0.000000  0.520000
- "a"                                   0  0.000000  0.000000  0.520000
- "b"                                   0  0.000000  0.000000  0.520000
- "lpha"                                0  0.000000  0.000000  0.520000
- "eta"                               532  0.000005  0.000308  0.982578
+ N_P_Q_S_s_x_md                        2  0.087930  0.006118  0.459094
                                           0.017800  0.520000  0.375000




