lexer change

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Wed Nov 5 11:33:41 CET 2003


Boris 'pi' Piwinger wrote:

>> 2 - acceptance of digits at the beginning of tokens and acceptance of
>> numbers as tokens
>> 
>>     Rejected.  I don't see value in this change.
> 
> Here are my personal results. I have *lots* of those tokens
> (4806), only 11 of which are significant (all hammish).

Forget that. I was fooled by bogoutil -p, which claims to
give probabilities but uses default values. This option
would only be useful if you could pass it the real values in use.

New test (I wrap everything into a test e-mail and look at -vvv): I
have 4723 tokens starting with a digit, of which 4199 are significant
(\+$). So they are really very useful.
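
A minimal sketch of how such a count could be reproduced. The line
layout is an assumption made only for illustration (token as the first
whitespace-separated field, optionally in double quotes, with a
trailing "+" marking a significant token, as the \+$ pattern above
suggests); the sample lines and the name count_numeric_tokens are
invented and are not real bogofilter output or code.

```python
import re

SIGNIFICANT = re.compile(r"\+$")     # the pattern quoted in the text
STARTS_NUMERIC = re.compile(r"^\d")  # token begins with a digit

def count_numeric_tokens(lines):
    """Return (total, significant) for tokens that start with a digit."""
    total = significant = 0
    for line in lines:
        line = line.rstrip()
        fields = line.split()
        if not fields:
            continue
        # Assumption: the token is the first field, possibly quoted.
        token = fields[0].strip('"')
        if not STARTS_NUMERIC.match(token):
            continue
        total += 1
        if SIGNIFICANT.search(line):
            significant += 1
    return total, significant

# Invented sample lines, only to exercise the function.
sample = [
    '"100mg"    0.99  +',
    '"2003"     0.45',
    '"hello"    0.01  +',
    '"3dwidth"  0.98  +',
]
print(count_numeric_tokens(sample))  # (3, 2)
```

If the real layout differs, only the line that extracts the token
should need to change.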

> I have 141 of the numeric-starting tokens with subj:, of
> which only 1 is significant.

I now have 155 of those, 134 of which are significant.

> So this is enough for me to accept that those are not
> helpful. It is a surprise to me.

OK, I am no longer surprised, I now have strong evidence of
their importance.

> I do see a lot starting with 3D (which is = in
> quoted-printable). So I'll have to investigate if there is a
> problem. Not today.

I have 41 of those. It is really hard to check them, since
they show up in mails quite often and most of them are decoded,
so it is tough to find the ones which are not.
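
To illustrate where a token starting with 3D can come from: in
quoted-printable, "=3D" encodes a literal "=". The sketch below uses
Python's standard quopri module; the regular expression is only a crude
stand-in for a tokenizer (an assumption, not bogofilter's lexer), and
the sample string is invented.

```python
import quopri
import re

encoded = b"<td width=3D100>"            # invented sample line
decoded = quopri.decodestring(encoded)   # "=3D" becomes "="

def crude_tokens(data):
    # Stand-in tokenizer: runs of ASCII letters and digits only.
    return re.findall(rb"[0-9A-Za-z]+", data)

print(decoded)                 # b'<td width=100>'
print(crude_tokens(encoded))   # [b'td', b'width', b'3D100']
print(crude_tokens(decoded))   # [b'td', b'width', b'100']
```

So a token beginning with 3D points at text that was tokenized without
being decoded, unless the string really occurs in the message.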

Actually the one spam message I did identify was completely
broken, so it could not have been decoded correctly anyway. No bug:-))

In ham they usually come from quoted messages or are real
strings (ZIP codes etc.). So also no bug, great:-))

pi