Training ham seems difficult

Andreas Pardeike andreas at pardeike.net
Thu Jan 15 08:19:24 CET 2004


On 2004-01-14, at 18.03, Matej Cepl wrote:

>> the confirmation request is unlikely to be detected as spam and
>> since no reply is send out when something is received at the
>> confirmation account no such loop will ever happen.
>
> ``unlikely'' -- why you think so?

And even if it is detected as spam and the recipient also runs my
system, this is what will happen:

Assumption: Persons A and B both run "my system". Because of that,
there are two extra accounts that are used only for sending out the
system messages and that wait solely for legitimate responses. I will
call those A.admin and B.admin.

A sends email to B
B puts message on hold in database
B.admin sends out confirmation request to A
A puts confirmation request in database
A.admin sends out confirmation request to B.admin
B.admin ignores that message (like all other non-confirmations)

After 7 days:
System B will expire and delete msg [A->B]
System A will expire and delete msg [B.admin->A]
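
To make the sequence above concrete, here is a minimal sketch in
Python (it is not the actual Perl code). The names System, receive,
expire and deliver are made up for illustration, and every sender is
treated as unknown. Only the 7-day hold and the rule that the admin
account never replies to non-confirmations come from the description
above.

```python
from dataclasses import dataclass, field

HOLD_DAYS = 7  # expiry period taken from the description above


@dataclass
class System:
    """One user running the hold-and-confirm scheme (hypothetical model)."""
    user: str
    held: list = field(default_factory=list)  # (sender, day_received)
    sent: list = field(default_factory=list)  # (from, to) confirmation requests

    @property
    def admin(self):
        return self.user + ".admin"

    def receive(self, sender, recipient, day):
        """Handle one incoming message from an unknown sender.

        Returns the outgoing confirmation request as (from, to),
        or None if nothing is sent.
        """
        if recipient == self.admin:
            # The admin account only waits for confirmations and
            # ignores everything else, so no reply goes out.
            return None
        self.held.append((sender, day))
        request = (self.admin, sender)
        self.sent.append(request)
        return request

    def expire(self, today):
        """Delete held messages that are HOLD_DAYS old; return them."""
        expired = [m for m in self.held if today - m[1] >= HOLD_DAYS]
        self.held = [m for m in self.held if today - m[1] < HOLD_DAYS]
        return expired


def deliver(systems, sender, recipient, day):
    """Deliver one message, follow the requests it triggers, count hops."""
    hops = 0
    message = (sender, recipient)
    while message is not None:
        src, dst = message
        owner = dst.split(".")[0]  # "B.admin" belongs to system "B"
        message = systems[owner].receive(src, dst, day)
        hops += 1
    return hops
```

Calling deliver(systems, "A", "B", day=0) with one System for A and
one for B stops after three messages (A->B, B.admin->A,
A.admin->B.admin), and expire(7) on each side then removes the one
held message.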

And that is only if I don't add some extra detection for
confirmation requests. If I do so I can easily detect that
the original sender (A) runs "my system" and immediately
unlock his message as ham.
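
One way such detection could look is sketched below in Python. This is
only an assumption about a possible design, not how the system works
today: the header name X-Confirmation-Request, the idea of putting the
account owner's address into it, and the functions make_request and
handle_admin_mail are all hypothetical.

```python
from email.message import EmailMessage

MARKER = "X-Confirmation-Request"  # hypothetical header name


def make_request(admin_addr, owner_addr, to_addr):
    """Build a confirmation request that names the account owner."""
    msg = EmailMessage()
    msg["From"] = admin_addr
    msg["To"] = to_addr
    msg[MARKER] = owner_addr  # assumption: owner address travels here
    msg.set_content("Please reply to confirm your message.")
    return msg


def handle_admin_mail(msg, held):
    """Process mail arriving at the admin account.

    held maps a sender address to the list of messages on hold from
    that sender. If msg is itself a confirmation request, the sender
    evidently runs the same system, so the held messages are released
    as ham and returned. Anything else is ignored.
    """
    owner = msg.get(MARKER)
    if owner is None:
        return []
    return held.pop(str(owner).strip(), [])
```

A real version would also have to make sure that a spammer cannot
simply forge the marker to get his messages unlocked.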

Bottom line
-----------

If you, like me, receive about 150-250 spams per day (the life of a
shareware author), then any false positive that ends up in the spam
trap is almost as good as deleted immediately, because I don't have
the nerves or the time to go through my spam trap at all.

As a result, I need a system that deals with false positives with
100% autonomy and at the same time tries to minimise the false
negatives. Since I run MacOS X, my email client does a great job of
filtering out spam with a filter similar to bogofilter, and thus files
most spam into an IMAP Junk mailbox where the server can easily train
on it.

All I have to do is to train the false negatives that appear in my
inbox by clicking "This is Junk" in my mail client.

It's true that my system needs refinements, but that's the whole idea
of making it public and easy to test and try. That way I can
implement all the new ideas that I couldn't possibly come up with on
my own. I am a very good programmer, but for getting ideas nothing
beats collaboration.

I am running this now as a Perl implementation, but as soon as I get
the feeling that it works well I can easily port it to C. I have
written a full-featured IMAP server in merely two weeks (4D Mail).

Regards,
Andreas Pardeike
-- If no symptoms manifest, does a problem exist?




