DCC & bf & a new spam-fighting-service (maybe)

Sun Oct 8 17:22:52 CEST 2006

On Sat, 7 Oct 2006 the voices made Tony L. Svanstrom write:

TLS>  I'm thinking about DCC <URL: http://www.rhyolite.com/anti-spam/dcc/ >

 I installed DCC (Debian-stable), and "head:many" more or less instantly became
a sign of spam.

 That was just a default installation, as I haven't had the time to change any
settings, and I only tested it on new e-mails as they arrived. Without knowing
how effective DCC is nowadays it's hard to say for sure whether "head:many"
will remain a sign of spam, but for me it most likely will.
 If I play around with the settings a bit I'm sure the results will be much
better.

 DCC is basically just telling us the bulkiness of something, and without any
special trust or security in the system we can only use it to tell if an e-mail
has been seen by someone else or not; using that for only the images in our
e-mails isn't enough to find spam-images.
 Sure, the images might remain the same even though the text changes, and thus
the spam would get by DCC, but an image-centric solution would catch the image.
The problem remains the same though, as we still only find out the bulkiness
and not the spamminess.

 I think that a (fuzzy-)spam-image-matching-service would do great if we just
separate the spamminess from the bulkiness, and I've started working on
something like that to play around with locally. A few questions...

 1. Do you see a need for a service able to answer if an image is verified
	as belonging to a spammer?
 2. Would you consider such a service both new enough and potentially useful
	enough to be of service?
 3. Would you be willing to take part in such a system as it's being
	developed?

 There'd be no strict limitations on how it could be used by those testing it;
but the intention is that you let the special header be added before the e-mail
is sent to bogofilter, that it's only used on e-mails as they arrive (no
processing of your 15GB spam-archive unless asked for, please =)); and that if
you work at Google you don't get the smart idea of testing every e-mail passing
through gmail... ;-)

 Automatically reporting images as spam-images is ok during testing, as long as
you feel confident enough in your filters (i.e. probably not confident enough
if you only use bf); the live version would of course not allow just anyone to
report an image as spam, as we would then be back to the bulkiness-vs-spamminess
problem (and it would allow people to basically DoS very image-heavy
HTML-newsletters, or undermine the trust in the system).
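
 A minimal sketch of how the two counts could be kept apart, assuming the
service holds a list of trusted reporters. The class name, method names and
the idea of a simple in-memory store are all made up for illustration; nothing
here is the actual service.

```python
from collections import defaultdict


class ImageRegistry:
    """Hypothetical store keeping bulkiness and spamminess separate."""

    def __init__(self, trusted_reporters):
        self.trusted = set(trusted_reporters)
        self.checked = defaultdict(int)   # bulkiness: how often a hash was looked up
        self.reported = defaultdict(int)  # spamminess: reports from trusted users only

    def check(self, image_hash):
        """Count the lookup and return (spam_count, bulk_count)."""
        self.checked[image_hash] += 1
        return self.reported.get(image_hash, 0), self.checked[image_hash]

    def report(self, image_hash, reporter):
        """Count a spam report only if the reporter is trusted."""
        if reporter not in self.trusted:
            return False
        self.reported[image_hash] += 1
        return True
```

 Anyone can raise the bulk count by checking a hash, but only a trusted
reporter can raise the spam count, so a widely seen newsletter image stays at
zero reports.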

 The header added to your e-mails would initially only show the spam-count
(i.e. the number of times the image has been reported as spam); but when there
are enough people using the system the bulkiness of the images (i.e. how many
times they've been checked) would also be included.
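
 As a rough illustration of what such a header might look like: the header
name "X-Image-Spam" and the "spam="/"bulk=" field names below are assumptions
for the sake of the example, not a decided format.

```python
from typing import Optional

HEADER_NAME = "X-Image-Spam"  # hypothetical header name, not a decided format


def format_header(spam_count: int, bulk_count: Optional[int] = None) -> str:
    """Build the header line; the bulk count is left out until it is available."""
    fields = [f"spam={spam_count}"]
    if bulk_count is not None:
        fields.append(f"bulk={bulk_count}")
    return f"{HEADER_NAME}: " + "; ".join(fields)
```

 Keeping the two values as separate fields would let bogofilter learn from
each of them independently.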

 The actual images would not be sent, only the hash-value.
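
 A small sketch of the client side, assuming an exact SHA-256 digest as a
stand-in for whatever (fuzzy) hash ends up being used; the function name is
made up. Only the hex digests would leave the machine.

```python
import hashlib
from email import message_from_bytes
from typing import List


def image_hashes(raw_email: bytes) -> List[str]:
    """Return one hex digest per image part; the image bytes stay local."""
    msg = message_from_bytes(raw_email)
    digests = []
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        payload = part.get_payload(decode=True)  # undo base64/quoted-printable
        if payload:
            digests.append(hashlib.sha256(payload).hexdigest())
    return digests
```

 Note that an exact digest changes if a single pixel changes, which is why a
fuzzy hash would be needed against spammers who randomize their images.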

 And finally... The point of taking part in this would be to give feedback, but
don't worry if you feel that you later on might not have the time to do so; the
data sent will still be helpful, and if the protocol changes too much your
client will simply stop checking the e-mails/images. If that happens the header
added will show that the service has been disabled, and why.
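
 One way the client could decide to switch itself off, sketched under the
assumption that client and server each announce a simple integer protocol
version; the version numbers, header name and wording of the reason are all
hypothetical.

```python
from typing import Optional

CLIENT_PROTOCOL = 1            # hypothetical protocol version of this client
HEADER_NAME = "X-Image-Spam"   # hypothetical header name


def disabled_header(server_protocol: int) -> Optional[str]:
    """Return a 'disabled' header if the versions differ, else None."""
    if server_protocol == CLIENT_PROTOCOL:
        return None
    reason = f"protocol mismatch (client {CLIENT_PROTOCOL}, server {server_protocol})"
    return f"{HEADER_NAME}: disabled; reason={reason}"
```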

 Any thoughts? (I'm basically just typing away as I'm brainstorming; but I
have left out the security-precautions for a full-scale implementation.)

	/Tony
-- 
        /\___/\                                          /\___/\
        \_@ @_/                                          \_@ @_/
   .--oOO-(_)-OOo--------------------------------------oOO-(_)-OOo--.
   |  perl -e'print$_{$_} for sort%_=`lynx -dump svanstrom.org/t`'  |
   `---ôôô---ôôô----------------------------------------ôôô---ôôô---´
       \O/   \O/        ©1998-2006 svanstrom.org        \O/   \O/