First Unicode experience

Sat Jul 2 21:23:37 CEST 2005

Hi!

I am currently retraining my database with about 100,000
mails. The third run has just finished, and I already have
more entries in my database than in my old one, which had
gone through several retrainings (and, just a few days ago,
a period of ten days without a single mistake). Also, in the
third run 22 spam messages were skipped, i.e., not
recognized as spam after they had been used for training
once. We will see how the database performs.

My first guess is that, due to the unification, special
encoding variants in spam are no longer visible, and hence
spam is harder to identify. This hints that having only the
language to go on makes the decision harder. The effect is
small, though.
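
To illustrate the guess above, here is a minimal sketch in
Python. It assumes that "unification" means decoding every
message from its declared charset into one common Unicode
form before tokenizing. The sample word and the three
charsets are made up for illustration; this is not the
filter's actual code.

```python
# Hypothetical sample: one word as it might arrive in three charsets.
WORD = "Größe"
CHARSETS = ["iso-8859-1", "utf-8", "cp850"]

# Raw bytes as they would appear on the wire, one variant per charset.
wire_variants = {cs: WORD.encode(cs) for cs in CHARSETS}

# Without unification: every byte sequence is its own token.
raw_tokens = set(wire_variants.values())

# With unification: decode with the declared charset first,
# so all variants collapse into a single token.
unified_tokens = {data.decode(cs) for cs, data in wire_variants.items()}

print(len(raw_tokens), "raw tokens")
print(len(unified_tokens), "unified token(s)")
```

If a spammer favours an unusual charset, the raw byte form is
a token in its own right and can count as spam evidence.
After decoding, that evidence is gone and only the word
itself remains.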

Now the question is: if we did it in a much simpler way,
would it be as efficient as today, or even better?

pi