DOCTYPE

David Relson relson at osagesoftware.com
Fri Nov 7 17:09:18 CET 2003


Greetings,

I just took a look at the 57 "unsures" I've received so far this month.

16 of them contain DOCTYPE directives.  Of those 16, 2 are text/html
and 14 are text/plain.

Here's what bogofilter thinks of the 16 messages, with each being scored
the old way (without special treatment of DOCTYPE) and the new way (with
special treatment).  S means spam and U means unsure.

msg  Content-Type  w/o DOCTYPE  with DOCTYPE
  1  text/plain    S 1.000000   S 1.000000
  2  text/plain    U 0.500000   U 0.500000
  7  text/plain    S 1.000000   S 1.000000
 11  text/html     U 0.500000   U 0.500000
 13  text/plain    U 0.500000   U 0.500000
 17  text/plain    U 0.500052   U 0.500095 *
 22  text/plain    U 0.500119   U 0.500660 *
 28  text/plain    U 0.500000   U 0.500000
 30  text/plain    U 0.500000   S 0.572000 *
 34  text/plain    U 0.500000   U 0.500000
 35  text/html     U 0.500001   U 0.500001
 49  text/plain    U 0.500000   U 0.500001 *
 50  text/plain    U 0.500000   U 0.500000
 51  text/plain    U 0.500000   U 0.500000
 52  text/plain    U 0.500000   U 0.500000
 57  text/plain    U 0.500000   S 0.640000 *

The messages were scored using a wordlist from the end of October.  The
5 messages where DOCTYPE processing changes the score are marked with
an asterisk.

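Conclusion:  DOCTYPE processing makes a slight difference.  It changes
the score of 5 of the 16 messages, and for 2 of them (messages 30 and
57) the classification moves from unsure to spam.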
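
For anyone who wants to redo the tally, here is a minimal sketch in
Python.  It is not part of bogofilter; the rows are copied by hand from
the table above, and the names ROWS and summarize are made up for this
illustration.

```python
# Rows copied from the table above:
# msg, content type, old class, old score, new class, new score
ROWS = """\
1  text/plain U 1.000000 U 1.000000
"""

ROWS = """\
1  text/plain S 1.000000 S 1.000000
2  text/plain U 0.500000 U 0.500000
7  text/plain S 1.000000 S 1.000000
11 text/html  U 0.500000 U 0.500000
13 text/plain U 0.500000 U 0.500000
17 text/plain U 0.500052 U 0.500095
22 text/plain U 0.500119 U 0.500660
28 text/plain U 0.500000 U 0.500000
30 text/plain U 0.500000 S 0.572000
34 text/plain U 0.500000 U 0.500000
35 text/html  U 0.500001 U 0.500001
49 text/plain U 0.500000 U 0.500001
50 text/plain U 0.500000 U 0.500000
51 text/plain U 0.500000 U 0.500000
52 text/plain U 0.500000 U 0.500000
57 text/plain U 0.500000 S 0.640000
"""


def summarize(rows_text):
    """Return (messages whose score changed, messages whose class changed)."""
    score_changed = []
    class_changed = []
    for line in rows_text.strip().splitlines():
        msg, _ctype, old_cls, old_score, new_cls, new_score = line.split()
        if float(old_score) != float(new_score):
            score_changed.append(int(msg))
        if old_cls != new_cls:
            class_changed.append(int(msg))
    return score_changed, class_changed


score_changes, class_changes = summarize(ROWS)
print("score changed:", score_changes)
print("class changed:", class_changes)
```

The first list should match the rows marked in the table; the second
shows which of those also switched classification.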

David

P.S.  I'm running a more extensive test using the 70,000 messages
received this year to better quantify the effect of DOCTYPE processing. 
I'll report on that later.



