Test with different lexers
David Relson
relson at osagesoftware.com
Tue Dec 2 15:17:20 CET 2003
On Tue, 02 Dec 2003 15:01:52 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:
> David Relson wrote:
>
> >> Over the time we have introduced several special rules to
> >> deal with specific problematic messages. My version has
> >> removed some of those (different token front and back,
> >> dollar rule, no short tokens, no numeric tokens, doctype
> >> switch, maybe more).
> >
> > This description concerns me. Some of the removed rules have only
> > been in your private version of bogofilter.
>
> This must be a misunderstanding. Maybe I did not write clearly
> enough.
Sorry, I misinterpreted. You're _allowing_ short tokens, numbers, and
digits at the beginning of tokens. You're _not_ checking for money or
"doctype".
...[snip]...
> > As a second example, "!" is accepted at the end
> > (but not the beginning), reflecting common spammer usage.
>
> This is a nice example of an idea which sounds totally
> reasonable. In my test (which I did post), though, it made
> no consistent difference: in some tests it worked better, in
> others worse. With rules like this we try to code some
> actual technique we see as humans into bogofilter, so we
> want to be more clever than the statistics. It might well
> work out in some cases, it might also surprise us or change
> nothing in effect.
"reasonable" isn't why it's in bogofilter. Paul Graham tested and found
it useful. Greg and I tested and found it useful. That's why it's
present.
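To make the "!" rule concrete, here is a minimal Python sketch. It is
not bogofilter's actual lexer; the pattern and the names TOKEN_RE and
tokenize are assumptions made up for this illustration, and allowing a
run of several trailing "!" characters is also an assumption.

```python
import re

# Hypothetical token pattern, for illustration only (not bogofilter's lexer):
# a token starts with a letter or digit, continues with word-like characters,
# and may end with one or more "!" characters. A leading "!" never matches,
# so it is skipped rather than kept as part of the token.
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9'_-]*!*")


def tokenize(text):
    """Return lowercase tokens; '!' survives only at the end of a token."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    print(tokenize("!!FREE!!! money now!"))
```

Whether such a rule helps can only be settled by scoring the same
message corpus with and without it, which is the kind of test being
discussed in this thread.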
> We also have no understanding how different rules play
> together, do they remain useful if combined? Could be, maybe
> not. So this test was designed to get as much of those a
> priori judgements out as seemed reasonable to me (others
> might go even further or not all that far). My result being
> that we can just as well leave those out.
You don't indicate which special characters you allow and which ones you
don't allow.