Spam in images
tanderso at oac-design.com
Wed Aug 9 20:27:49 EDT 2006
>>These spams don't have a single URL in them, so I suppose stripsearch
>>Or can stripsearch read the URLs in the GIF?
> hmm, I wonder if it would be worth hooking in an OCR program to read the
> image and what the min hardware requirements would be to scan them
> without bringing a system to a crawl.
Feel free to test, but my feeling is that running OCR on even a moderate
rate of spam would be infeasible. And you could easily DoS yourself given
just a slight uptick over the expected volume. You would have to build in
limits so that OCR is skipped on new emails when a few other OCR processes
are already running. You could also run OCR only on emails in the unsure
range, letting the text alone damn or bless the fairly certain ones.
But in the end, spammers would simply add noise and distortions to baffle
the OCR reader, as CAPTCHAs do. At least then they couldn't make those
phishing emails look like a regular text email.