unicode [was: bogofilter-0.95.0 - New Current Release]

David Relson relson at osagesoftware.com
Wed Jun 22 00:10:14 CEST 2005


On Tue, 21 Jun 2005 19:54:16 +0200 (CEST)
Boris 'pi' Piwinger wrote:

> David Relson said:
> 
> >> Also: How about word boundaries? In Unicode there is much more whitespace
> >> than in the small charsets. How do we handle this? The question of which
> >> characters make up a word really changes now. I use this as the legal character class
> >> for tokens: [^[:blank:][:cntrl:]<>;&%@|/\\{}^"*,[\]=()+?:#$._!'`~-]
> >> Does this fully translate to Unicode? That would seem great.
> >
> > Again, the answer is "insufficient information and test cases".
> 
> Actually, my question is mainly about [:blank:]. Punctuation (like French
> quotes) will be a problem anyway, but that is unchanged.
> 
> >> > Command line options "--unicode=yes" and "--unicode=no" can be used.
> >>
> >> Are there also config file options?
> >
> > Yes.
> 
> Which are?

_Every_ long command line option has a config file option.  Simply
remove the leading "--" and convert hyphens (from command line) to
underscores (for config file).
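
As a rough illustration of that renaming rule (not bogofilter's own code), here
is a minimal Python sketch. The helper names cli_to_config_name and
cli_to_config_line are made up for this example, and "--some-long-option" is a
hypothetical option used only to show the hyphen handling.

```python
def cli_to_config_name(option: str) -> str:
    """Map a long command line option name to its config file name."""
    if not option.startswith("--"):
        raise ValueError(f"not a long option: {option!r}")
    # Drop the leading "--", then turn hyphens into underscores.
    return option[2:].replace("-", "_")


def cli_to_config_line(argument: str) -> str:
    """Map "--name=value" to the "name=value" form used in the config file."""
    # Only the option name is rewritten; the value is passed through untouched.
    name, sep, value = argument.partition("=")
    return cli_to_config_name(name) + sep + value


if __name__ == "__main__":
    print(cli_to_config_line("--unicode=yes"))  # unicode=yes
    # Hypothetical option, not a real bogofilter flag:
    print(cli_to_config_name("--some-long-option"))  # some_long_option
```

Splitting on the first "=" keeps any hyphens inside the value intact, so only
the option name is affected.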

The config file options are:

   unicode=yes
   unicode=no

AFAIK, _all_ the config file options are present in
bogofilter.cf.example, which is part of all the source packages
(.tar.gz, .tar.bz2, and .src.rpm).


> > Bogofilter checks the database for the .ENCODING token and, if present,
> > uses its value.  The config file option only affects bogofilter when
> > creating a new wordlist.
> 
> Good enough.
> 
> >> > For a wordlist containing tokens from multiple languages, particularly
> >> > non-european languages, the conversion methods described above may not
> >> > work well for you.  Building a new wordlist (from scratch) will likely
> >> > work better as the new wordlist will be based solely on unicode.
> >>
> >> I will (once I upgrade) certainly do that, since for all train-on-error
> >> methods (in particular training to exhaustion) the set of messages used
> >> will certainly look different.
> >
> > You can build it and test it from the command line.  No need to replace
> > your mail delivery tool chain :->
> 
> Well, I won't be able to build before the weekend (could be the weekend a
> month from now ;-). But I will rebuild the database for sure.

OK


