OT: Chunking the cruft - random lettered words
Bob George
mailings02 at ttlexceeded.com
Wed Mar 17 13:14:59 CET 2004
Tom Allison wrote:
> [...]
> I would try the perl script first and see how it pans out.
> pipe it after bogofilter since bogofilter is already 99.9% effective
> and the theory we're testing here is that the remaining 0.1% can't
> spell worth a d@rn.
Rather than incur the penalty of doing a spell-check on all the words in
such a message -- which will fail on the "random word" technique anyhow
-- many on the spamassassin list have had good luck with things like:
Lack of conjunctions (and, but, etc.) and punctuation
"Abnormal" letter pairings (for English, anyhow) and consonants
There are probably a lot of other "un-English" rules that these messages fail.
Just watch out for mailing lists with lots of program code and things
like that. Even then, "long words with weird pairings" might fit.
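
As a rough illustration of the kind of checks described above, here is a minimal Python sketch. It is not how spamassassin implements its rules; the word list, the thresholds, and the function names (max_consonant_run, features, looks_unenglish) are all hypothetical and picked only for the example. It counts common function words and punctuation per word, and flags words with long consonant runs as a crude stand-in for "abnormal" letter pairings.

```python
import re

# Hypothetical word list and thresholds, chosen only for illustration.
FUNCTION_WORDS = {"and", "but", "or", "the", "a", "of", "to", "in", "is", "that"}
VOWELS = set("aeiouy")  # assumption: treat 'y' as a vowel to spare words like "rhythm"


def max_consonant_run(word):
    """Return the length of the longest run of consecutive consonants in word."""
    longest = run = 0
    for ch in word.lower():
        if ch.isalpha() and ch not in VOWELS:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


def features(text):
    """Return (function-word ratio, punctuation ratio, odd-word ratio) per word."""
    words = re.findall(r"[A-Za-z]+", text)
    n = len(words) or 1  # avoid dividing by zero on empty input
    function_ratio = sum(w.lower() in FUNCTION_WORDS for w in words) / n
    punct_ratio = len(re.findall(r"[.,;:!?]", text)) / n
    odd_ratio = sum(max_consonant_run(w) >= 5 for w in words) / n
    return function_ratio, punct_ratio, odd_ratio


def looks_unenglish(text, min_function=0.05, min_punct=0.02, max_odd=0.3):
    """Flag text that lacks both function words and punctuation,
    or that is dominated by words with long consonant runs."""
    f, p, o = features(text)
    return (f < min_function and p < min_punct) or o > max_odd


if __name__ == "__main__":
    print(looks_unenglish("xkqzvbt plmnrgh wqrtzx vbnmkl zzxcvbn"))
    print(looks_unenglish("I use the two together myself, and it is a good fit."))
```

A check like this is far cheaper than a dictionary lookup per word, but the caveat about program code still applies: source listings are full of identifiers with odd letter runs, so the thresholds would need tuning against real mail before trusting them.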
If you're going to be running something besides bogo that's rather
processing-intensive, perhaps spamassassin is a good fit. I use the two
together myself.
- Bob