Test with different lexers

Tue Dec 2 15:35:50 CET 2003

David Relson wrote:

>> > As a second example, "!" is accepted at the end
>> > (but not the beginning), reflecting common spammer usage.
>> 
>> This is a nice example of an idea that sounds totally
>> reasonable. In my test (which I did post), though, it made
>> no consistent difference: in some tests it worked better, in
>> others worse. With rules like this we try to code some
>> technique we observe as humans into bogofilter, that is, we
>> try to be cleverer than the statistics. It might well
>> work out in some cases; it might also surprise us or change
>> nothing in effect.
> 
> "reasonable" isn't why it's in bogofilter.  Paul Graham tested and found
> it useful.  Greg and I tested and found it useful.  That's why it's
> present.

Yes, but the idea comes from human observation. With
some test results in its favor, it got this special
treatment.

>> We also have no understanding of how different rules play
>> together. Do they remain useful when combined? Maybe, maybe
>> not. So this test was designed to take out as many of those a
>> priori judgements as seemed reasonable to me (others
>> might go even further, or not as far). My result is
>> that we can just as well leave those rules out.
> 
> You don't indicate which special characters you allow and which ones you
> don't allow.

I actually did not change much. I use a single class, TOKENBORDER,
in place of both TOKENFRONT and TOKENBACK. Compared with
TOKENBACK, TOKENBORDER does not allow ! and ~, but does allow $ (the
latter permits more general $-tokens than the special
rule does). So the TOKENBACK side does not change a lot. TOKENFRONT does
not allow $ or digits, whereas my TOKENBORDER does. I hope I have not missed
anything.
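
To make the difference concrete, here is a rough Python sketch of the two
setups. The character classes are my simplification for illustration only;
they are NOT the real definitions from bogofilter's lexer, and the names
make_lexer, stock_like and unified are hypothetical. The sketch only mirrors
the differences described above: ! and ~ allowed at the back in the stock-like
variant, $ and digits allowed at both borders in the unified one.

```python
import re

# Simplified, hypothetical character classes (not bogofilter's real ones).
LETTERS = "A-Za-z"
DIGITS = "0-9"
MID = LETTERS + DIGITS + "$!~'.-"   # characters allowed inside a token


def make_lexer(front, mid, back):
    """Return a function that splits text into tokens of the form
    one FRONT char, any number of MID chars, one BACK char."""
    pattern = re.compile(f"[{front}][{mid}]*[{back}]")
    return pattern.findall


# Stock-like setup: no $ or digits at the front, ! and ~ allowed at the back.
stock_like = make_lexer(front=LETTERS, mid=MID, back=LETTERS + DIGITS + "!~")

# Unified setup: one border class for both ends; $ and digits allowed,
# ! and ~ not allowed.
TOKENBORDER = LETTERS + DIGITS + "$"
unified = make_lexer(front=TOKENBORDER, mid=MID, back=TOKENBORDER)

text = "FREE!!! only $99 now"
print(stock_like(text))  # ['FREE!!!', 'only', 'now']
print(unified(text))     # ['FREE', 'only', '$99', 'now']
```

In this toy version the stock-like lexer keeps the trailing "!!!" but drops
"$99" entirely, while the unified one strips the exclamation marks and keeps
"$99" as a token. Whether either choice helps classification is exactly what
has to be measured rather than assumed.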

pi