DOCTYPE
David Relson
relson at osagesoftware.com
Fri Nov 7 17:09:18 CET 2003
Greetings,
I just took a look at the 57 "unsures" I've received so far this month.
16 of them contain DOCTYPE directives.
2 of them are text/html while 14 are text/plain.
Here's what bogofilter thinks of the 16 messages, with each being scored
the old way (without special treatment of DOCTYPE) and the new way (with
special treatment).
msg  Content-Type  w/o DOCTYPE  with DOCTYPE
  1  text/plain    S 1.000000   S 1.000000
  2  text/plain    U 0.500000   U 0.500000
  7  text/plain    S 1.000000   S 1.000000
 11  text/html     U 0.500000   U 0.500000
 13  text/plain    U 0.500000   U 0.500000
 17  text/plain    U 0.500052   U 0.500095  *
 22  text/plain    U 0.500119   U 0.500660  *
 28  text/plain    U 0.500000   U 0.500000
 30  text/plain    U 0.500000   S 0.572000  *
 34  text/plain    U 0.500000   U 0.500000
 35  text/html     U 0.500001   U 0.500001
 49  text/plain    U 0.500000   U 0.500001  *
 50  text/plain    U 0.500000   U 0.500000
 51  text/plain    U 0.500000   U 0.500000
 52  text/plain    U 0.500000   U 0.500000
 57  text/plain    U 0.500000   S 0.640000  *
The messages were scored using a wordlist from the end of October. The
5 messages where DOCTYPE processing changes the score are marked with
an asterisk (S = spam, U = unsure).
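
For anyone who wants to re-derive the markers, here is a minimal sketch in
Python. It is not part of bogofilter; the function names and the tuple
layout are illustrative assumptions, and the rows are simply copied from
the table above. It lists the messages whose score changes and those
whose classification changes.

```python
# Rows copied from the table above:
# (msg, content_type, old_class, old_score, new_class, new_score)
ROWS = [
    (1,  "text/plain", "S", 1.000000, "S", 1.000000),
    (2,  "text/plain", "U", 0.500000, "U", 0.500000),
    (7,  "text/plain", "S", 1.000000, "S", 1.000000),
    (11, "text/html",  "U", 0.500000, "U", 0.500000),
    (13, "text/plain", "U", 0.500000, "U", 0.500000),
    (17, "text/plain", "U", 0.500052, "U", 0.500095),
    (22, "text/plain", "U", 0.500119, "U", 0.500660),
    (28, "text/plain", "U", 0.500000, "U", 0.500000),
    (30, "text/plain", "U", 0.500000, "S", 0.572000),
    (34, "text/plain", "U", 0.500000, "U", 0.500000),
    (35, "text/html",  "U", 0.500001, "U", 0.500001),
    (49, "text/plain", "U", 0.500000, "U", 0.500001),
    (50, "text/plain", "U", 0.500000, "U", 0.500000),
    (51, "text/plain", "U", 0.500000, "U", 0.500000),
    (52, "text/plain", "U", 0.500000, "U", 0.500000),
    (57, "text/plain", "U", 0.500000, "S", 0.640000),
]


def changed_scores(rows):
    """Message numbers whose score differs between the two runs."""
    return [msg for msg, _, _, old, _, new in rows if old != new]


def reclassified(rows):
    """Message numbers whose S/U classification differs between the two runs."""
    return [msg for msg, _, old_cls, _, new_cls, _ in rows if old_cls != new_cls]


if __name__ == "__main__":
    print("score changed: ", changed_scores(ROWS))
    print("class changed: ", reclassified(ROWS))
```

The comparison is done on the six-decimal scores as printed, so any
difference smaller than that would not show up here.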
Conclusion: DOCTYPE processing makes a slight difference. 5 of the 16
scores change, and 2 messages (30 and 57) move from unsure to spam.
David
P.S. I'm running a more extensive test using the 70,000 messages
received this year to better quantify the effect of DOCTYPE processing.
I'll report on that later.