[cvs] bogofilter/src charset.c, 1.13, 1.14 charset.h, 1.1, 1.2 collect.c, 1.36, 1.37 lexer.c, 1.103, 1.104 mime.c, 1.32, 1.33 mime.h, 1.19, 1.20
Evgeny Kotsuba
evgen at shatura.laser.ru
Mon Jan 3 22:46:00 CET 2005
Matthias Andree wrote:
>relson at users.sourceforge.net writes:
>
>
>
>> static void map_windows_1251(void)
>> {
>>-#ifdef WINDOWS_1251_to_CYRILLIC
>>+#ifdef CP866
>> /* Map: windows-1251 -> KOI8-R (Cyrillic) */
>> /* Contributed by: Yar Tikhiy (yarq at users.sourceforge.net) */
>> static char xlate_1251[] = {
>>- 0xA8, 0xB3,
>>- 0xB8, 0xA3,
>>+ 0xA8, 0xB3,
>>+ 0xB8, 0xA3,
>> 0xE0, 0xC1, 0xE1, 0xC2, 0xE2, 0xD7, 0xE3, 0xC7, 0xE4, 0xC4, 0xE5, 0xC5, 0xE6, 0xD6, 0xE7, 0xDA,
>> 0xE8, 0xC9, 0xE9, 0xCA, 0xEA, 0xCB, 0xEB, 0xCC, 0xEC, 0xCD, 0xED, 0xCE, 0xEE, 0xCF, 0xEF, 0xD0,
>> 0xF0, 0xD2, 0xF1, 0xD3, 0xF2, 0xD4, 0xF3, 0xD5, 0xF4, 0xC6, 0xF5, 0xC8, 0xF6, 0xC3, 0xF7, 0xDE,
>>@@ -285,6 +290,98 @@
>> #endif
>> }
>>
>>
>
>What is this function doing?
>
>
This function should be renamed to
static void map_windows_1251_to_koi8r(void)
and... oops, the guard should be
+#ifndef CP866
i.e. it is the old function for converting from other codepages to the base KOI8-R codepage. I added
static void map_windows_1251_to_cp866(void);
static void map_koi8_r_to_cp866(void);
static void map_iso_8859_5_to_cp866(void);
because my working codepage is CP866. KOI8-R is the native Russian codepage on UNIX systems, so perhaps someone will later add
map_iso_8859_5_to_koi8r and Unicode-to-KOI8-R support.
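The old windows-1251 -> KOI8-R mapping keeps its data as (from, to) byte pairs, like xlate_1251[] in the quoted diff. A minimal sketch of how such a pair list could be expanded into a 256-byte lookup table and applied, assuming unlisted bytes pass through unchanged. build_table() and translate() are hypothetical helpers, not bogofilter's code, and only the rows quoted in the diff are reproduced. The sketch uses unsigned bytes because the quoted table is declared as plain char, whose signedness is implementation-defined.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;

/* (from, to) pairs: windows-1251 byte -> KOI8-R byte.
   Excerpt only: the rows quoted in the diff (Ё, ё, then а..ч). */
static const byte xlate_1251_excerpt[] = {
    0xA8, 0xB3,
    0xB8, 0xA3,
    0xE0, 0xC1, 0xE1, 0xC2, 0xE2, 0xD7, 0xE3, 0xC7, 0xE4, 0xC4, 0xE5, 0xC5, 0xE6, 0xD6, 0xE7, 0xDA,
    0xE8, 0xC9, 0xE9, 0xCA, 0xEA, 0xCB, 0xEB, 0xCC, 0xEC, 0xCD, 0xED, 0xCE, 0xEE, 0xCF, 0xEF, 0xD0,
    0xF0, 0xD2, 0xF1, 0xD3, 0xF2, 0xD4, 0xF3, 0xD5, 0xF4, 0xC6, 0xF5, 0xC8, 0xF6, 0xC3, 0xF7, 0xDE,
};

/* Hypothetical helper: expand a pair list into a 256-entry table.
   Bytes not listed in the pairs map to themselves. */
static void build_table(byte table[256], const byte *pairs, size_t n)
{
    size_t c, k;

    assert(n % 2 == 0);               /* the list must hold whole pairs */
    for (c = 0; c < 256; c++)
        table[c] = (byte)c;
    for (k = 0; k < n; k += 2)
        table[pairs[k]] = pairs[k + 1];
}

/* Hypothetical helper: translate buf in place, one byte at a time. */
static void translate(byte *buf, size_t len, const byte table[256])
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

Applied to the CP1251 bytes F1 EF E0 EC EC E5 F0, the excerpt yields D3 D0 C1 CD CD C5 D2. A full table lookup costs one array index per byte, which is why pair lists are usually expanded once rather than searched per character.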
>Why are we converting directly from one codepage to another?
>
>
Because one word may have different binary representations. For example, the Russian word for "spammer" may be encoded as:
CP866        E1 AF A0 AC ? AC A5 E0
KOI8-R       D3 D0 C1 CD ? CD C5 D2
CP1251       F1 EF E0 EC ? EC E5 F0
ISO-8859-5   E1 DF D0 DC ? DC D5 E0
There are two reasons to convert. First, a human user looking at tokens (in debug output, with bogoutil, etc.) can read words only in his current codepage. Second, database size: without conversion the same word can be stored once per encoding.
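To make the database-size point concrete, here is a minimal sketch that normalizes all four encodings to CP866, assuming tables derived from the standard layouts of these codepages rather than from bogofilter's own pair lists. enum codepage, build_to_cp866() and to_cp866() are hypothetical names, not bogofilter's API, and the byte marked ? above is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;

enum codepage { CP_866, CP_KOI8_R, CP_1251, CP_ISO_8859_5 };

/* Entry k is the alphabet index (0 = а ... 31 = я, without ё) of the letter
   stored at KOI8-R byte 0xC0 + k (lower case) or 0xE0 + k (upper case). */
static const int koi8_order[32] = {
    30,  0,  1, 22,  4,  5, 20,  3, 21,  8,  9, 10, 11, 12, 13, 14,
    15, 31, 16, 17, 18, 19,  6,  2, 28, 27,  7, 24, 29, 25, 23, 26
};

/* CP866 byte for alphabet index i: А..Я at 0x80, а..п at 0xA0, р..я at 0xE0. */
static byte cp866_letter(int i, int upper)
{
    if (upper)
        return (byte)(0x80 + i);
    return (byte)(i < 16 ? 0xA0 + i : 0xE0 + (i - 16));
}

/* Fill table[] so that table[b] is the CP866 byte for byte b of `cp`.
   Non-letter bytes pass through unchanged (a simplification). */
static void build_to_cp866(enum codepage cp, byte table[256])
{
    int c, i;

    for (c = 0; c < 256; c++)
        table[c] = (byte)c;
    for (i = 0; i < 32; i++) {
        switch (cp) {
        case CP_1251:
            table[0xE0 + i] = cp866_letter(i, 0);
            table[0xC0 + i] = cp866_letter(i, 1);
            break;
        case CP_ISO_8859_5:
            table[0xD0 + i] = cp866_letter(i, 0);
            table[0xB0 + i] = cp866_letter(i, 1);
            break;
        case CP_KOI8_R:
            table[0xC0 + i] = cp866_letter(koi8_order[i], 0);
            table[0xE0 + i] = cp866_letter(koi8_order[i], 1);
            break;
        case CP_866:
            break;
        }
    }
    /* ё (CP866 0xF1) and Ё (CP866 0xF0) sit outside the main letter blocks. */
    switch (cp) {
    case CP_1251:       table[0xB8] = 0xF1; table[0xA8] = 0xF0; break;
    case CP_ISO_8859_5: table[0xF1] = 0xF1; table[0xA1] = 0xF0; break;
    case CP_KOI8_R:     table[0xA3] = 0xF1; table[0xB3] = 0xF0; break;
    case CP_866:        break;
    }
}

/* Convert buf in place from codepage `cp` to CP866. */
static void to_cp866(enum codepage cp, byte *buf, size_t len)
{
    byte table[256];
    size_t k;

    assert(buf != NULL || len == 0);
    build_to_cp866(cp, table);
    for (k = 0; k < len; k++)
        buf[k] = table[buf[k]];
}
```

Fed any of the KOI8-R, CP1251 or ISO-8859-5 rows, to_cp866() yields E1 AF A0 AC AC A5 E0, so all four spellings would be stored as one token instead of four.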
>
>
>>+int htmlUNICODE_decode(byte *buf, int len)
>>
>>
>
>And what does this function do?
>
>
This function decodes Unicode HTML character references and should be renamed to
int decode_and_htmlUNICODE_to_cp866(byte *buf, int len)
It decodes things like м (numeric references such as &#1084;) to CP866 and translates all other, normal characters with charset_table[].
In any case, all of these changes are active only when the CP866 macro is defined.
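A minimal sketch of the reference-decoding half of that function, assuming decimal references of the form &#NNNN; and an in-place rewrite that returns the new length. decode_html_ncr_to_cp866() and unicode_to_cp866() are hypothetical names; the charset_table[] pass for ordinary bytes is left out, so other bytes are copied unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef unsigned char byte;

/* Map a Unicode code point to a CP866 byte, or -1 if this sketch has no
   mapping (only ASCII and the basic Cyrillic letters are covered). */
static int unicode_to_cp866(unsigned long cp)
{
    if (cp > 0 && cp < 0x80)
        return (int)cp;
    if (cp >= 0x0410 && cp <= 0x043F)       /* А..Я, а..п are contiguous in CP866 */
        return (int)(0x80 + (cp - 0x0410));
    if (cp >= 0x0440 && cp <= 0x044F)       /* р..я */
        return (int)(0xE0 + (cp - 0x0440));
    if (cp == 0x0401)                       /* Ё */
        return 0xF0;
    if (cp == 0x0451)                       /* ё */
        return 0xF1;
    return -1;
}

/* Replace decimal references "&#NNNN;" in buf with CP866 bytes, in place.
   Returns the new length; anything that is not a mappable reference is
   copied unchanged. */
static int decode_html_ncr_to_cp866(byte *buf, int len)
{
    int in = 0, out = 0;

    assert(buf != NULL || len == 0);
    while (in < len) {
        if (buf[in] == '&' && in + 1 < len && buf[in + 1] == '#') {
            int j = in + 2;
            unsigned long cp = 0;

            /* Stop accumulating once the value exceeds the Unicode range. */
            while (j < len && isdigit(buf[j]) && cp <= 0x10FFFF) {
                cp = cp * 10 + (unsigned long)(buf[j] - '0');
                j++;
            }
            if (j > in + 2 && j < len && buf[j] == ';') {
                int c = unicode_to_cp866(cp);
                if (c >= 0) {
                    buf[out++] = (byte)c;   /* output never overtakes input */
                    in = j + 1;
                    continue;
                }
            }
        }
        buf[out++] = buf[in++];
    }
    return out;
}
```

For example, "x&#1084;y" (9 bytes) shrinks to the 3 bytes 'x', 0xAC, 'y', while an unmappable reference such as &#8364; is left as-is.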
>>+void mime_type2(word_t * text)
>>
>>
>
>What does this do? Why a mile-long #if 0? The whole mime.* change is
>undocumented and I don't see why we might need it, what it changes or does.
>
>
This function is used in the proposed "EK binary problem hack". As for the #if 0: it only disables an empty switch, doesn't it? The old mime_type() is still
in place.
SY,
EK
More information about the bogofilter-dev mailing list