... convert_unicode.c ...

David Relson relson at osagesoftware.com
Thu Jun 23 21:12:51 CEST 2005


On Thu, 23 Jun 2005 20:49:13 +0400
Yar Tikhiy wrote:

> On Thu, Jun 23, 2005 at 02:56:35PM +0200, Matthias Andree wrote:
> > On Thu, 23 Jun 2005, David Relson wrote:
> > 
> > > Having both options also allows personal preferences.  I know some of
> > > our Cyrillic users have CP866 and KOI8-R as their default charsets.
> > > Perhaps those would be useful as their default-to-charset.  Maybe so,
> > 
> > It is their default-from-charset, not "-to-charset".
> 
> Hoping I may speak for Cyrillic users, they would rather choose
> between Windows-1251 and KOI8-R as their default-from-charset since
> literally nobody uses CP866 on the Net side.  Interestingly, I
> receive most ham in KOI8-R and most spam in Windows-1251, and I've
> never seen an email in CP866.  However, today most non-English
> spammers seem to specify the charset correctly so that their
> recipients can read the junk in one click--who will ever spend two
> clicks to read spam?  Therefore US-ASCII is a reasonable
> default-from-charset for Cyrillic users.  I hope that it is for
> Chinese folks, too :-)
> 
> And of course I vote for using UTF-8 as the default -to-charset
> without giving special support for national encodings like CP866.
> So bogofilter developers from all over the world won't have to fight
> over national issues, while people trying to localize software will
> have free time to spend on more fruitful projects than l10n :-)

Yar & Matthias,

It sounds like there's a consensus.  "./configure --with-charset=name"
will set the "from" charset (with US-ASCII being used if the option
isn't specified) and the "to" charset will be "utf-8" (with no
./configure option).  If configure's "--disable-unicode" option is used,
bogofilter will operate as it has done in the past.
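
To make the proposal concrete, here is a rough sketch of how the
selection might look in C.  This is not the actual convert_unicode.c
code.  The macro names (DEFAULT_CHARSET, DISABLE_UNICODE), the struct
and the function name are all hypothetical, and I'm assuming that a
charset declared by the message itself takes precedence over the
configured default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro that "./configure --with-charset=name" might define.
 * If the option isn't given, the "from" charset falls back to US-ASCII. */
#ifndef DEFAULT_CHARSET
#define DEFAULT_CHARSET "US-ASCII"
#endif

/* The "to" charset is fixed; there is no configure option for it. */
#define TARGET_CHARSET "UTF-8"

/* Hypothetical macro that "./configure --disable-unicode" might define. */
#ifdef DISABLE_UNICODE
#define UNICODE_ENABLED 0
#else
#define UNICODE_ENABLED 1
#endif

/* Illustrative type: which charsets a conversion would use. */
struct charset_plan {
    const char *from;  /* source charset of the message text */
    const char *to;    /* target charset, or NULL for "no conversion" */
};

/* Fill in the plan for one message.
 * 'declared' is the charset named by the message, or NULL/"" if none.
 * Assumption: a declared charset overrides the configured default. */
void plan_conversion(const char *declared, struct charset_plan *out)
{
    assert(out != NULL);

    if (declared != NULL && strlen(declared) > 0)
        out->from = declared;
    else
        out->from = DEFAULT_CHARSET;

    /* With unicode disabled, nothing is converted (the old behavior). */
    out->to = UNICODE_ENABLED ? TARGET_CHARSET : NULL;
}
```

With a plan like this, the caller would hand "from" and "to" to
whatever conversion routine is in use and skip conversion entirely when
"to" is NULL.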

David


