Spam in images
relson at osagesoftware.com
Wed Aug 9 20:34:00 EDT 2006
On Wed, 09 Aug 2006 20:27:49 -0400
Tom Anderson wrote:
> .rp wrote:
> >>These spams don't have a single URL in them, so I suppose
> >>stripsearch wouldn't help.
> >>Or can stripsearch read the URLs in the GIF?
> > Hmm, I wonder if it would be worth hooking in an OCR program to
> > read the image, and what the minimum hardware requirements would be
> > to scan them without bringing a system to a crawl.
> Feel free to test, but my feeling is that running OCR on even a
> moderate rate of spam would be infeasible. And you could easily DoS
> yourself given just a slight uptick over the expected volume. You
> would have to build in limits so that OCR is skipped on new emails
> when a few other OCR processes are already running. You could also
> perform OCR only on emails in the unsure range, allowing the text
> alone to damn or bless the fairly certain ones. But in the end,
> spammers would simply add characteristics to baffle the OCR reader,
> like CAPTCHAs do. At least they couldn't do those phishing emails
> where they make it look like a regular text email, though.
I'd suggest that OCR be used only when the message scores "unsure".
Using procmail or maildrop, it'd be easy enough to test for "unsure",
then run a script (to test if there's an image attachment), then run OCR and
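
The image-attachment test mentioned above could be sketched roughly as
follows, using only Python's standard email package. The function names
has_image_attachment and exit_code_for are hypothetical, and the exit-code
convention (0 when an image is present, 1 otherwise) is an assumption
chosen so that a delivery agent can branch on the result.

```python
import email
from email import policy


def has_image_attachment(raw_message: bytes) -> bool:
    """Return True if any MIME part of the message is an image/* part."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    # walk() visits the message itself and every nested MIME part.
    return any(part.get_content_maintype() == "image" for part in msg.walk())


def exit_code_for(raw_message: bytes) -> int:
    """Shell-style status: 0 if an image part is present, 1 otherwise."""
    return 0 if has_image_attachment(raw_message) else 1
```

A small wrapper script would read the message from standard input and call
sys.exit(exit_code_for(sys.stdin.buffer.read())), so the OCR step runs only
when the check succeeds.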