Test with different lexers

Tue Dec 2 15:17:20 CET 2003

On Tue, 02 Dec 2003 15:01:52 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:

> David Relson wrote:
> 
> >> Over time we have introduced several special rules to
> >> deal with specific problematic messages. My version has
> >> removed some of those (different token front and back
> >> characters, dollar rule, no short tokens, no numeric tokens,
> >> doctype switch, maybe more).
> > 
> > This description concerns me.  Some of the removed rules have only
> > been in your private version of bogofilter. 
> 
> This must be a misunderstanding. Maybe I did not write clearly
> enough.

Sorry, I misinterpreted.  You're _allowing_ short tokens, numbers, and
digits at the beginning of tokens.  You're _not_ checking for money or
"doctype".
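For illustration only, here is a toy tokenizer sketch (hypothetical, and NOT bogofilter's flex lexer) in which three of the rules mentioned above can be switched on or off: a minimum token length, dropping purely numeric tokens, and dropping tokens that start with a digit. The function name, flags, and default length of 3 are assumptions chosen for the example.

```python
import re

# Hypothetical toy tokenizer, NOT bogofilter's real lexer.
# Tokens are runs of ASCII letters and digits; each flag
# approximates one of the special rules discussed in this thread.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")

def tokenize(text, min_len=3, drop_numeric=True, drop_leading_digit=True):
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if len(tok) < min_len:
            continue  # "no short tokens"
        if drop_numeric and tok.isdigit():
            continue  # "no numeric tokens"
        if drop_leading_digit and tok[0].isdigit():
            continue  # no digits at the beginning of a token
        tokens.append(tok)
    return tokens

text = "Win 1000 dollars in 2 days 4you now"
print(tokenize(text))  # strict: special rules applied
print(tokenize(text, min_len=1, drop_numeric=False, drop_leading_digit=False))
```

With the rules enabled, "1000", "in", "2" and "4you" never reach the word list; with them disabled, every token does, and the statistics alone decide whether they matter.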

...[snip]...

> > As a second example, "!" is accepted at the end
> > (but not the beginning), reflecting common spammer usage.
> 
> This is a nice example of an idea which sounds totally
> reasonable. In my test (which I did post), though, it made
> no real difference: in some tests it worked better, in
> others worse. With rules like this we try to encode into
> bogofilter some technique that we, as humans, see in actual
> spam; we want to be cleverer than the statistics. It might
> well work out in some cases; it might also surprise us or
> change nothing in effect.

"Reasonable" isn't why it's in bogofilter.  Paul Graham tested it and
found it useful.  Greg and I tested it and found it useful.  That's why
it's present.
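To make the "!" rule concrete, a minimal sketch, assuming a simple regex tokenizer (hypothetical, not bogofilter's actual lexer): a trailing "!" stays part of the token, a leading "!" is stripped. The function name and the `keep_trailing` flag are illustrative assumptions.

```python
import re

# Hypothetical illustration only, not bogofilter's real lexer.
# Leading "!" characters are consumed outside the capture group and
# discarded; trailing "!" characters are kept inside the token.
BANG_RE = re.compile(r"!*([A-Za-z0-9]+!*)")
PLAIN_RE = re.compile(r"[A-Za-z0-9]+")

def tokenize_bang(text, keep_trailing=True):
    if keep_trailing:
        return BANG_RE.findall(text)   # returns the captured group
    return PLAIN_RE.findall(text)      # "!" never part of a token

print(tokenize_bang("!!FREE!!! money !now"))
print(tokenize_bang("!!FREE!!! money !now", keep_trailing=False))
```

With the rule, "FREE!!!" is a separate token from "FREE", so the classifier can learn a distinct spamicity for the shouted form; without it, both collapse into the same token.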

> We also have no understanding of how different rules play
> together: do they remain useful when combined? Could be, maybe
> not. So this test was designed to remove as many of those
> a priori judgements as seemed reasonable to me (others might
> go even further, or not nearly as far). My conclusion is
> that we can just as well leave those rules out.

You don't indicate which special characters you allow and which ones you
don't allow.