Minimal rules [was: Test with different lexers]

Tom Anderson tanderso at oac-design.com
Wed Dec 3 14:40:32 CET 2003


On Tue, 2003-12-02 at 12:21, Boris 'pi' Piwinger wrote:
> That sounds like an approach really going way too far. I
> cannot see why you would not do decoding, since that is
> something just hidden from the recipient. The idea is after
> all to look at the text the reader will see. And clearly
> headers are separated from the body, so this should be
> reflected.
> 
> Simplification does not mean to be dumb.

I concur.  The idea here is to remove rule-based filtering, not
decoding.  HTML, attachments, header tagging, quoted-printable, MIME,
etc., should still be decoded; the resulting text should then be split
into tokens on whitespace alone, with no special handling of numbers or
punctuation and no length restrictions.  If my friend always greets me
with "<%)>", then I want to recognize that as a hammish token, and if a
spammer always uses "!!!!!!!", then I want to recognize that as a
spammish token.  Another token that currently falls outside the length
and punctuation restrictions, and which would certainly be important to
recognize, is a PGP signature.
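
To make the "decode first, then split on whitespace only" idea concrete,
here is a rough sketch in Python using the standard email package.  It
is not bogofilter's lexer, and the function name whitespace_tokens is
made up for illustration.  It assumes the message arrives as a single
string and only looks at text parts of the body; header tagging and
HTML stripping are left out.

```python
from email import message_from_string
from email.policy import default
from typing import List


def whitespace_tokens(raw_message: str) -> List[str]:
    """Decode the text parts of a message, then split on whitespace only.

    Hypothetical helper for illustration: no rules about punctuation,
    digits, or token length are applied after decoding.
    """
    msg = message_from_string(raw_message, policy=default)
    tokens: List[str] = []
    for part in msg.walk():
        # Skip multipart containers and non-text attachments.
        if part.get_content_maintype() != "text":
            continue
        # get_content() undoes quoted-printable/base64 and the charset.
        text = part.get_content()
        # str.split() with no argument splits on runs of whitespace.
        tokens.extend(text.split())
    return tokens
```

With a splitter like this, "<%)>" and "!!!!!!!" each survive as a single
token, so the classifier can score them like any other word.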

Tom
