http://www.epilogue.com/~dab/delegation-protocol.html (World Wide Web Directory, 06/1995)
Lice - A Delegation Protocol
Notes on the design of a delegation protocol.
Overview
We're designing a system that lets us delegate computation to someone
else. We're trying to be as general as possible so as not to limit
the range of uses for such a system but some of the possible uses that
are leading us on are: distributed computations like the RSA129
project or distributed ray tracing, networkwide search engines such as
knowbots, delegated network management, and remote control of devices
where it's useful to give programmatic instructions to be carried out
locally (deep space probes for instance). Our dream is that
eventually much of the idle computational power on the Internet
will be available through such a system.
Such a system has two parts, a protocol to distribute pieces of a
computation about the net and a language in which to specify those
pieces. Instead of designing our own language we chose scheme (a
discussion of why scheme). Most of
the work here is designing the protocol.
The overall architecture is very straightforward. You open a TCP
connection to a computation server and end up talking to a scheme
listener. You send it scheme expressions, the server evaluates them
and sends back the result. The expressions can, of course, be scheme
programs which you then direct the server to run.
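To make the shape of that exchange concrete, here is a minimal sketch in Python rather than scheme. Everything in it is an assumption made for illustration: the one-expression-per-line framing, the toy evaluator that only knows integers, + and *, and the names parse, evaluate, serve and ask. A connected socket pair stands in for the TCP connection so the sketch runs without picking a port.

```python
import socket
import threading

def parse(text):
    """Parse one S-expression made of integers, symbols and parentheses."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        tok = tokens[pos]
        if tok == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return items, pos + 1
        try:
            return int(tok), pos + 1
        except ValueError:
            return tok, pos + 1  # anything else is kept as a symbol name

    expr, _ = read(0)
    return expr

# Hypothetical toy operator table; a real server would run a full scheme.
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def evaluate(expr):
    """Evaluate an integer or a nested (op arg arg ...) list."""
    if isinstance(expr, int):
        return expr
    op, *args = expr
    values = [evaluate(a) for a in args]
    result = values[0]
    for v in values[1:]:
        result = OPS[op](result, v)
    return result

def serve(conn):
    """Listener loop: one expression per line in, one result per line out."""
    with conn, conn.makefile("r") as reader, conn.makefile("w") as writer:
        for line in reader:
            writer.write(f"{evaluate(parse(line))}\n")
            writer.flush()

def ask(reader, writer, text):
    """Client side: send one expression and wait for its result."""
    writer.write(text + "\n")
    writer.flush()
    return reader.readline().strip()

# A socket pair stands in for "open a TCP connection to a computation server".
client, server = socket.socketpair()
threading.Thread(target=serve, args=(server,), daemon=True).start()
with client, client.makefile("r") as reader, client.makefile("w") as writer:
    answer = ask(reader, writer, "(+ 1 (* 2 3))")
```

Note that this toy has no way to report an error and handles one expression at a time, which is exactly where the open issues below come in.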
That was easy, now the hard parts.
Open Issues
- Which scheme
- Scheme is scheme, right? Well, not quite. Each implementation
has its own peculiarities. Eventually we'll have to specify just what
we need for this to be a standard. To begin we'll just pick one
implementation and that will be enough standard for us to design and
build the protocol.
I suggest scheme48. It seems to be under active development by a
group at MIT (The
Scheme Underground) who will probably add several things that
we'll find useful and it seems a fairly complete and reasonable
implementation of scheme.
- Session layer
- Computations could take a significant amount of time. In fact,
some uses of this protocol involve long term monitoring of hundreds to
thousands of remote computations. There's really no need to keep the
transport connection open during all that time. So we'd like a small
session layer that holds the higher level information about who's
talking even through transport layer outages.
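As a rough sketch of what such a session layer might hold, assume the server hands out an opaque token when a session opens and the client presents it again after reconnecting. The class name SessionTable, its methods, and the token format are all hypothetical; nothing here is a settled design.

```python
import secrets

class SessionTable:
    """Server-side sessions that outlive any single transport connection."""

    def __init__(self):
        self._sessions = {}

    def open(self, owner):
        """Start a session and return the token the client must keep."""
        token = secrets.token_hex(8)
        self._sessions[token] = {"owner": owner, "pending": []}
        return token

    def resume(self, token):
        """Called when a client reconnects; raises KeyError if unknown."""
        return self._sessions[token]

    def post_result(self, token, result):
        """Queue a result, whether or not the client is connected now."""
        self._sessions[token]["pending"].append(result)

    def collect(self, token):
        """Hand over and clear the results queued for this session."""
        session = self._sessions[token]
        results, session["pending"] = session["pending"], []
        return results

table = SessionTable()
token = table.open("dab")
table.post_result(token, "42")   # computation finishes while transport is down
session = table.resume(token)    # client reconnects and presents its token
waiting = table.collect(token)
```

The token is also the obvious place for security to hook in, since whoever holds it can pick up the results.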
- Security
- Yuck. As soon as we go beyond the simple compute server we need
this. It should be in the basic design and probably interacts with
the session layer.
Some thoughts on what we need to do for
security.
- Async vs synchronous
- In a synchronous system you send down an expression to be
evaluated and sometime later it gives back an answer whereupon you can
send down the next expression. A traditional lisp listener works this
way. If you want to run a second expression before the first
finishes, you have to start up a second lisp listener.
An asynchronous system lets you send down a second expression and have
it worked on before the first finishes. The execution of the two
expressions is not synchronized.
Synchronous is a lot easier to implement and use but it's somewhat
inflexible. It turns out there's a trivial tasking system that
appears asynchronous in the flexibility it gives you but is really
synchronous underneath. See Frank's writeup
for a description of how we're planning to handle this (based
on an old writeup by Dave).
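The writeup itself isn't reproduced here, so the following is not that design. It is only one guess at how a system can look asynchronous to the client while staying synchronous underneath: submitting an expression returns a ticket right away, and the server still evaluates queued expressions strictly one at a time. The names TicketListener, submit, step and poll are invented for this sketch, and a plain Python callable stands in for the scheme evaluator.

```python
from collections import deque

class TicketListener:
    """Accepts expressions at any time; evaluates them one at a time."""

    def __init__(self, evaluate):
        self._evaluate = evaluate   # stand-in for the scheme evaluator
        self._queue = deque()
        self._results = {}
        self._next_ticket = 1

    def submit(self, expr):
        """Queue an expression and return its ticket without evaluating."""
        ticket = self._next_ticket
        self._next_ticket += 1
        self._queue.append((ticket, expr))
        return ticket

    def step(self):
        """Evaluate exactly one queued expression, synchronously."""
        if not self._queue:
            return False
        ticket, expr = self._queue.popleft()
        self._results[ticket] = self._evaluate(expr)
        return True

    def poll(self, ticket):
        """Return (done, value); value is None until the ticket is done."""
        if ticket in self._results:
            return True, self._results[ticket]
        return False, None

listener = TicketListener(sum)        # "evaluating" here just sums a list
first = listener.submit([1, 2, 3])
second = listener.submit([10, 20])    # sent before the first has finished
before = listener.poll(second)
while listener.step():
    pass
after = listener.poll(second)
```

The client never has to wait before sending the next expression, yet the server never runs two evaluations at once.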
- Protocol
- What does the protocol look like? I'd suggest that
syntactically it's just S-expressions (I bet you knew that was coming)
but semantically it's a little different than just talking to a
normal, human usable scheme listener. A couple of reasons: a) we need
to be able to reliably separate error responses from real responses
and b) if we want to allow for asynchronous operation we need extra
stuff for that.
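One way to meet both requirements is to wrap every reply in an envelope whose first element says whether it is a result or an error and whose second element names the request it answers. The (ok id value) and (error id "message") shapes below are assumptions made for illustration, as are the function names reply_for and classify.

```python
def reply_for(request_id, thunk):
    """Run thunk and wrap its outcome in an S-expression envelope."""
    try:
        value = thunk()
    except Exception as exc:
        # Escape backslashes and quotes so the message stays one string.
        message = str(exc).replace("\\", "\\\\").replace('"', '\\"')
        return f'(error {request_id} "{message}")'
    return f"(ok {request_id} {value})"

def classify(reply):
    """Split a reply into (kind, request_id, payload), payload unparsed."""
    kind, request_id, payload = reply[1:-1].split(" ", 2)
    return kind, int(request_id), payload

def unbound():
    # Hypothetical failure standing in for a scheme evaluation error.
    raise ValueError("unbound variable: x")

good = reply_for(1, lambda: 6 * 7)
bad = reply_for(2, unbound)
```

Because the kind comes first, a client can tell an error from a real response without looking at the payload, and the request id lets replies arrive in any order.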
- Locating a compute server
- We might be able to get by with hardwiring this while we
experiment but things don't get interesting until we have more compute
servers than will work with hardwiring.
- Extension Query
- We need extensions to scheme for various things having to do
with the distributed nature of this protocol. We'll also add
extensions to give the servers various capabilities. We need a way of
asking the server what extensions it has. Some of this information
may be available to the compute server locator system too.
- Demand Paging
- We had this idea a while back that the server could give scheme
expressions back to the client. It could do this to give back its
answers as expressions that need to be evaluated (this helps if the
answer is circular). We also thought about using this to demand load
libraries that are missing from the server. I suspect that this
requires async.
- Jobs
- The question here is should we have a job structure of some
sort. This would give us a handle to ask about status of multiple
computations in a single instance of a server. The handle is also
useful for suspending or killing specific computations.
The alternative is we only have control at a per-server-instance
granularity. If you want multiple jobs then you fire up a new
instance of the server. This is supposed to be distributed
computation after all, let's distribute it.
Why would we want multiple jobs in a single instance of a server? One
reason is that the communications requirements between the pieces may
require, or be much better served by, shared memory. Another is that
we may be downloading a lot of code that's used by both jobs and only
want to download it once.
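If we did go with a job structure, the bookkeeping might look something like the sketch below. It only tracks handles and status and does not run anything; the names ServerInstance, Status, load, start, suspend and kill are made up for illustration. The shared_code table shows the second argument above: code is loaded once and is visible to every job in the instance.

```python
from enum import Enum

class Status(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"
    KILLED = "killed"

class ServerInstance:
    """One server instance holding several jobs and one copy of shared code."""

    def __init__(self):
        self.shared_code = {}   # downloaded once, visible to every job
        self._jobs = {}
        self._next_handle = 1

    def load(self, name, definition):
        """Record a piece of downloaded code under its name."""
        self.shared_code[name] = definition

    def start(self, description):
        """Register a new job and return the handle used to refer to it."""
        handle = self._next_handle
        self._next_handle += 1
        self._jobs[handle] = {"description": description,
                              "status": Status.RUNNING}
        return handle

    def status(self, handle):
        return self._jobs[handle]["status"]

    def suspend(self, handle):
        self._jobs[handle]["status"] = Status.SUSPENDED

    def kill(self, handle):
        self._jobs[handle]["status"] = Status.KILLED

server = ServerInstance()
server.load("trace-ray", "(define (trace-ray ray) ...)")  # placeholder text
top = server.start("render top half")
bottom = server.start("render bottom half")
server.suspend(top)
server.kill(bottom)
```

With only per-server-instance control, none of this table exists: the handle is the instance itself, and the shared code gets downloaded once per instance.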
- Debugging
- What sort of cross-the-net debugging do we need? This could get
involved.
It seems like everything I get involved with starts with a new
vocabulary. Here are our new words.
People
Here are the loons involved so far.
David Bridgham <dab@epilogue.com>
last updated: Thu Sep 8 10:19:43 1994 by Dave Bridgham