Minimum usable counts [was: Question]
David Relson
relson at osagesoftware.com
Mon May 25 16:05:17 CEST 2009
On Mon, 25 May 2009 23:03:32 +0930
Stephen Davies wrote:
> Sorry David, I thought my example made it clear.
>
> The actual texts were:
>
> h=<CR><LF>ere
>
> and
>
> h=<LF>ere
>
> The first case is the raw text received by sendmail, amavis-milter
> and amavisd.
>
> The second is the text presented by kmail.
>
> In the second case, bogofilter is smart enough to get "here" as the
> token but in the first case, the CR broke the algorithm.
>
> Presumably, the = at end-of-line is part of a protocol that
> bogofilter knows but =<CR> is not.
>
> HTH,
> Stephen
Hi Stephen,
Ah, more info! The trailing "=" hadn't been mentioned previously...
I've created a sample message containing "alpha" ( "a=<CR><LF>lpha" )
and "beta" ( "b=<LF>eta" ) split as you've described:
### begin message ###
Subject: test
CRLF:
a=
lpha
LF:
b=
eta
### end message ###
Since claws-mail doesn't show <CR> or <LF>, I've attached a tgz file
with the message. The tgz also includes a hex dump of the message so
you can see the CR and LF characters.
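For anyone without a hex dump tool to hand, the line endings can also be made visible with a few lines of Python. This is only a minimal sketch: the helper name show_line_endings is hypothetical, and the byte strings are the two splits described above, not the contents of the attached file.

```python
def show_line_endings(raw: bytes) -> str:
    """Render CR and LF as visible <CR>/<LF> markers (hypothetical helper)."""
    text = raw.decode("ascii", "replace")
    # Mark CR first, then mark LF while keeping a real newline for readability.
    return text.replace("\r", "<CR>").replace("\n", "<LF>\n")


# The two splits discussed in this thread, written out as raw bytes.
crlf_split = b"a=\r\nlpha\r\n"
lf_split = b"b=\neta\n"

print(show_line_endings(crlf_split), end="")
print(show_line_endings(lf_split), end="")
```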
Bogofilter's -vvv results (with and without "--min-token-len=1") are at
the end of the message. In neither case do I see bogofilter combining
the characters before and after a final "=" sign. Perhaps you can gzip a
sample message and send it?
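For reference, an "=" at the very end of a line is the soft line break of the quoted-printable encoding (RFC 2045): a decoder removes the "=" together with the line ending and joins the two halves. The sketch below shows that joining step for both line-ending styles. It is an illustration only, not bogofilter's code: the names SOFT_BREAK and join_soft_breaks are hypothetical, and it assumes the text really is quoted-printable, which a message normally declares in a Content-Transfer-Encoding header (the test message above has none).

```python
import re

# Hypothetical pattern, not taken from bogofilter: a quoted-printable soft
# line break is "=" immediately followed by CRLF or by a bare LF.
SOFT_BREAK = re.compile(rb"=\r?\n")


def join_soft_breaks(raw: bytes) -> bytes:
    """Remove soft line breaks so that split words are joined again."""
    return SOFT_BREAK.sub(b"", raw)


print(join_soft_breaks(b"a=\r\nlpha"))  # CRLF split
print(join_soft_breaks(b"b=\neta"))     # LF split
```

A tokenizer that only recognises "=<LF>" would join the second case and leave the first split, which would match the behaviour Stephen describes.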
Regards,
David
relson at osage $ bogofilter -C -vvv < Message.0525.txt
X-Bogosity: Unsure, tests=bogofilter, spamicity=0.459094, version=1.2.0
                      n    pgood     pbad       fw
U "CRLF"            153 0.000829 0.000001 0.000759
+ "subj:test"       917 0.001303 0.000393 0.231703
- "lpha"              0 0.000000 0.000000 0.520000
- "eta"             532 0.000005 0.000308 0.982578
+ N_P_Q_S_s_x_md      2 0.087930 0.006118 0.459094
                        0.017800 0.520000 0.375000
relson at osage $ bogofilter --min-token-len=1 -C -vvv < Message.0525.txt
X-Bogosity: Unsure, tests=bogofilter, spamicity=0.459094, version=1.2.0
                      n    pgood     pbad       fw
U "CRLF"            153 0.000829 0.000001 0.000759
+ "subj:test"       917 0.001303 0.000393 0.231703
- "LF"                0 0.000000 0.000000 0.520000
- "a"                 0 0.000000 0.000000 0.520000
- "b"                 0 0.000000 0.000000 0.520000
- "lpha"              0 0.000000 0.000000 0.520000
- "eta"             532 0.000005 0.000308 0.982578
+ N_P_Q_S_s_x_md      2 0.087930 0.006118 0.459094
                        0.017800 0.520000 0.375000