Lice - A Delegation Protocol

http://www.epilogue.com/~dab/delegation-protocol.html (World Wide Web Directory, 06/1995)

Notes on the design of a delegation protocol.

Overview

We're designing a system that lets us delegate computation to someone else. We're trying to be as general as possible so as not to limit the range of uses for such a system, but some of the possible uses that are leading us on are: distributed computations like the RSA129 project or distributed ray tracing, networkwide search engines such as knowbots, delegated network management, and remote control of devices where it's useful to give programmatic instructions to be carried out locally (deep space probes, for instance). Our dream is that eventually much of the idle computational power on the Internet will be available through such a system.

Such a system has two parts: a protocol to distribute pieces of a computation about the net, and a language in which to specify those pieces. Instead of designing our own language we chose scheme (a discussion of why scheme). Most of the work here is designing the protocol.

The overall architecture is very straightforward. You open a TCP connection to a computation server and end up talking to a scheme listener. You send it scheme expressions, the server evaluates them and sends back the result. The expressions can, of course, be scheme programs which you then direct the server to run.
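
One detail the architecture above glosses over is that TCP delivers a byte stream, not whole expressions, so the listener has to work out where one expression ends and the next begins. Here is a minimal sketch in Python of one way to do that by counting parentheses. The function name split_expressions is hypothetical, and the sketch assumes every message is a single parenthesized list (bare atoms at top level are ignored).

```python
def split_expressions(buffer):
    """Return (complete_expressions, leftover) for text received so far.

    Assumption: each message is one parenthesized list. Parentheses
    inside double-quoted strings are not counted.
    """
    expressions = []
    depth = 0          # current nesting depth
    start = None       # index where the current top-level list began
    in_string = False  # inside a "..." literal
    escaped = False    # previous character was a backslash in a string
    consumed = 0       # index just past the last complete expression
    for i, ch in enumerate(buffer):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                expressions.append(buffer[start:i + 1])
                consumed = i + 1
    return expressions, buffer[consumed:]
```

The leftover text is kept and prepended to whatever arrives next on the connection, so an expression split across two reads is still recovered whole.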

That was easy, now the hard parts.

Open Issues

Which scheme

Scheme is scheme, right? Well, not quite. Each implementation has its own peculiarities. Eventually we'll have to specify just what we need for this to be a standard. To begin we'll just pick one implementation, and that will be enough of a standard for us to design and build the protocol.

I suggest scheme48. It seems to be under active development by a group at MIT (The Scheme Underground) who will probably add several things that we'll find useful, and it seems a fairly complete and reasonable implementation of scheme.

Session layer

Computations could take a significant amount of time. In fact, some uses of this protocol involve long term monitoring of hundreds to thousands of remote computations. There's really no need to keep the transport connection open during all that time. So we'd like a small session layer that holds the higher level information about who's talking even through transport layer outages.
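
To make the idea concrete, here is a minimal sketch in Python of the bookkeeping such a session layer might do on the server side. Nothing here is settled design: the class SessionTable, its method names, and the use of a random identifier as the session handle are all assumptions for illustration. The point is only that results are filed under a session identifier rather than under a TCP connection, so they survive the connection going away.

```python
import secrets


class SessionTable:
    """Hypothetical server-side table of sessions, independent of TCP."""

    def __init__(self):
        # session id -> results produced while the client was away
        self._pending = {}

    def open(self):
        """Create a session and return its identifier to the client."""
        session_id = secrets.token_hex(8)
        self._pending[session_id] = []
        return session_id

    def deliver(self, session_id, result):
        """Record a finished result, whether or not a connection exists."""
        self._pending[session_id].append(result)

    def resume(self, session_id):
        """Called when the client reconnects: hand over buffered results."""
        if session_id not in self._pending:
            raise KeyError("unknown session")
        results = self._pending[session_id]
        self._pending[session_id] = []
        return results
```

A client would keep the identifier from open, drop the connection, and later present the identifier on a fresh connection to collect whatever finished in the meantime. How the identifier is protected is a security question and is not addressed by this sketch.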

Security

Yuck. As soon as we go beyond the simple compute server we need this. It should be in the basic design and probably interacts with the session layer.

Some thoughts on what we need to do for security.

Async vs synchronous

In a synchronous system you send down an expression to be evaluated and sometime later it gives back an answer whereupon you can send down the next expression. A traditional lisp listener works this way. If you want to run a second expression before the first finishes, you have to start up a second lisp listener.

An asynchronous system lets you send down a second expression and have it worked on before the first finishes. The execution of the two expressions is not synchronized.

Synchronous is a lot easier to implement and use but it's somewhat inflexible. It turns out there's a trivial tasking system that appears asynchronous in the flexibility it gives you but is really synchronous underneath. See Frank's writeup for a description of how we're planning to handle this (based on an old writeup by Dave).
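
The writeup mentioned above isn't reproduced in these notes, so the following is not the planned design. It is only a sketch, in Python, of the general idea that a single synchronous loop can look asynchronous from outside: each computation is written as a generator that periodically gives up control, and one loop takes turns running them. The names run_tasks and count_to are hypothetical.

```python
from collections import deque


def run_tasks(tasks):
    """Take turns running generator tasks in one thread.

    tasks maps a name to a generator. Returns {name: result}, with
    entries added in the order the tasks finished.
    """
    queue = deque(tasks.items())
    results = {}
    while queue:
        name, task = queue.popleft()
        try:
            next(task)                  # run the task up to its next yield
            queue.append((name, task))  # not finished: back of the line
        except StopIteration as done:
            results[name] = done.value  # the generator's return value
    return results


def count_to(n):
    """Example task: sum 1..n, yielding control after every step."""
    total = 0
    for i in range(1, n + 1):
        total += i
        yield
    return total
```

Started together, count_to(1000) and count_to(3) both make progress and the short one finishes first, even though only one piece of code is ever running at a time.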

Protocol

What does the protocol look like? I'd suggest that syntactically it's just S-expressions (I bet you knew that was coming) but semantically it's a little different than just talking to a normal, human usable scheme listener. A couple of reasons: a) we need to be able to reliably separate error responses from real responses and b) if we want to allow for asynchronous operation we need extra stuff for that.
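
As an illustration of point a), here is a minimal sketch in Python of a reply envelope. The (ok ...) and (error ...) forms, the numeric request tag, and the function name reply_for are all assumptions, not a settled wire format. Because every reply is wrapped, a computation whose value happens to look like an error is still reported inside an ok form and can't be mistaken for a failure. The tag is the sort of extra stuff point b) would need so that replies can be matched to requests when they come back out of order.

```python
def reply_for(request_id, thunk):
    """Run thunk and wrap the outcome in a tagged S-expression reply.

    thunk is a zero-argument Python callable standing in for the
    evaluation of one expression (an assumption of this sketch).
    """
    try:
        value = thunk()
    except Exception as exc:
        # Failure: report the kind of error, tagged with the request id.
        return f'(error {request_id} "{type(exc).__name__}")'
    # Success: the value is always nested inside an ok form.
    return f"(ok {request_id} {value})"
```

For instance, reply_for(1, lambda: 1 + 2) gives the text (ok 1 3), while a thunk that divides by zero gives an error form carrying the same request id.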

Locating a compute server

We might be able to get by with hardwiring this while we experiment but things don't get interesting until we have more compute servers than will work with hardwiring.

Extension Query

We need extensions to scheme for various things having to do with the distributed nature of this protocol. We'll also add extensions to give the servers various capabilities. We need a way of asking the server what extensions it has. Some of this information may be available to the compute server locator system too.

Demand Paging

We had this idea a while back that the server could give scheme expressions back to the client. It could do this to give back its answers as expressions that need to be evaluated (this helps if the answer is circular). We also thought about using this to demand load libraries that are missing from the server. I suspect that this requires async.
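
The library-loading case can be sketched in a few lines of Python. This is only an illustration under assumptions: the class Server, the method require, and the fetch_from_client callback (which stands in for a request sent back over the connection) are hypothetical names. A library is asked for from the client the first time it is needed and kept afterwards.

```python
class Server:
    """Hypothetical compute server that loads missing libraries on demand."""

    def __init__(self, fetch_from_client):
        self.libraries = {}                  # name -> library source text
        self.fetch_from_client = fetch_from_client
        self.fetch_count = 0                 # how many times we asked

    def require(self, name):
        """Return the named library, asking the client only if it is missing."""
        if name not in self.libraries:
            self.libraries[name] = self.fetch_from_client(name)
            self.fetch_count += 1
        return self.libraries[name]
```

Note that the call to fetch_from_client happens while the client is still waiting for the answer to its own request. On a real connection that means traffic flowing in both directions in the middle of an evaluation, which is consistent with the suspicion above that this needs async.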

Jobs

The question here is whether we should have a job structure of some sort. This would give us a handle to ask about the status of multiple computations in a single instance of a server. The handle is also useful for suspending or killing specific computations.

The alternative is we only have control at a per-server-instance granularity. If you want multiple jobs then you fire up a new instance of the server. This is supposed to be distributed computation after all, let's distribute it.

Why would we want multiple jobs in a single instance of a server? One reason is if the communications requirements between the pieces are such that they require, or are much better served by, shared memory. Another is if we're going to download a lot of code that's used by both jobs and we only want to download it once.
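
If we do go with a job structure, the bookkeeping could be as small as the following Python sketch. It tracks state only and does not run anything. The class JobTable, the use of small integers as handles, and the three state names are assumptions made for illustration.

```python
import itertools


class JobTable:
    """Hypothetical per-server table mapping job handles to their state."""

    def __init__(self):
        self._handles = itertools.count(1)  # hands out 1, 2, 3, ...
        self._jobs = {}                     # handle -> [expression, state]

    def submit(self, expression):
        """Register a computation and return its handle."""
        handle = next(self._handles)
        self._jobs[handle] = [expression, "running"]
        return handle

    def status(self, handle):
        return self._jobs[handle][1]

    def suspend(self, handle):
        if self._jobs[handle][1] == "running":
            self._jobs[handle][1] = "suspended"

    def resume(self, handle):
        if self._jobs[handle][1] == "suspended":
            self._jobs[handle][1] = "running"

    def kill(self, handle):
        # A killed job stays in the table so its status can still be queried.
        self._jobs[handle][1] = "killed"
```

With the per-server-instance alternative none of this is needed: the handle is simply the server instance itself.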

Debugging

What sort of cross-the-net debugging do we need? This could get involved.

Glossary

It seems like everything I get involved with starts with a new vocabulary. Here are our new words.

People

Here are the loons involved so far.

David Bridgham <dab@epilogue.com>

last updated: Thu Sep 8 10:19:43 1994 by Dave Bridgham