OT: Chunking the cruft - random lettered words

Bob George mailings02 at ttlexceeded.com
Wed Mar 17 13:14:59 CET 2004


Tom Allison wrote:

> [...]
> I would try the perl script first and see how it pans out.
> pipe it after bogofilter since bogofilter is already 99.9% effective 
> and the theory we're testing here is that the remaining 0.1% can't 
> spell worth a d@rn.

Rather than incur the penalty of doing a spell-check on all the words in 
such a message -- which will fail on the "random word" technique anyhow 
-- many on the spamassassin list have had good luck with things like:

Lack of conjunctions (and, but, etc.) and punctuation
"abnormal" letter pairings (for English anyhow) and consonants

These messages probably fail a lot of other "un-English"-style rules too. 
Just watch out for mailing lists with lots of program code and things 
like that. Even then, "long words with weird pairings" might fit.
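
For anyone who wants to experiment, here is a rough Python sketch of those 
checks. It is not how spamassassin or bogofilter actually do it: the word 
lists, the thresholds, the function name looks_unenglish, and reading 
"consonants" as "long consonant runs" are all my own assumptions, picked 
only for illustration.

```python
import re

# Illustrative, assumed lists and thresholds; not taken from any real filter.
CONJUNCTIONS = {"and", "but", "or", "so", "because", "if"}
RARE_PAIRS = {"qx", "zx", "jq", "vq", "xj", "qz", "wq", "kx"}
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{5,}")


def looks_unenglish(text, min_words=8, odd_ratio=0.2):
    """Return True if the text trips at least two of three heuristics."""
    words = re.findall(r"[a-z]+", text.lower())
    if len(words) < min_words:
        return False  # too short to judge either way

    # Heuristic 1: no common conjunctions anywhere in the text.
    no_conjunctions = not any(w in CONJUNCTIONS for w in words)
    # Heuristic 2: no sentence punctuation at all.
    no_punctuation = re.search(r"[.,;:!?]", text) is None
    # Heuristic 3: many words with rare letter pairs or long consonant runs.
    odd = sum(
        1
        for w in words
        if CONSONANT_RUN.search(w) or any(p in w for p in RARE_PAIRS)
    )
    many_odd_words = odd / len(words) > odd_ratio

    return sum([no_conjunctions, no_punctuation, many_odd_words]) >= 2
```

Something like this would run only on mail that bogofilter has already 
passed, and as noted above it will likely misfire on messages full of 
program code, so treat the result as one more hint rather than a verdict.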

If you're going to be running something besides bogo that's rather 
processing intensive, perhaps spamassassin is a good fit. I use the two 
together myself.

- Bob






