Our work on making parallel and distributed programs easier to write began with the Prelude system and continues in the Autopilot project, a collaboration with Prof. Kaashoek's Parallel and Distributed Operating Systems group. Autopilot combines an easy-to-use shared-memory programming model with an efficient runtime system based on a scalable distributed-memory message-passing architecture. It simplifies the programmer's job by managing locality and the decomposition of work into parallel tasks.
Our work on efficient runtime system mechanisms covers two areas. The first is scheduling and resource management (register relocation and lottery scheduling), designed to reduce the cost of managing parallel tasks and to let multiple tasks share computing resources fairly. The second is efficient communication mechanisms that provide very low-overhead communication with predictable performance, for both point-to-point messages and global communication patterns.
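To make the fair-sharing idea behind lottery scheduling concrete, the following is a minimal sketch in Python, not the group's implementation. It assumes the usual formulation: each task holds some number of tickets, and each scheduling quantum goes to the holder of a randomly drawn ticket, so a task's expected share of the processor is proportional to its ticket count. The names hold_lottery and simulate, and the task names in the usage note, are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def hold_lottery(tickets, rng):
    """Return one task name, chosen with probability proportional to its tickets.

    `tickets` maps a task name to a non-negative integer ticket count
    (an assumed representation, used here only for illustration).
    """
    total = sum(tickets.values())
    if total <= 0:
        raise ValueError("at least one ticket must be outstanding")
    winner = rng.randrange(total)  # winning ticket number in [0, total)
    # Walk the tasks, treating each as owning a contiguous range of tickets.
    for task, count in tickets.items():
        if winner < count:
            return task
        winner -= count
    raise AssertionError("unreachable: winner is always below the total")


def simulate(tickets, quanta, seed=0):
    """Run `quanta` lotteries and count how many quanta each task received."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    return Counter(hold_lottery(tickets, rng) for _ in range(quanta))
```

For example, simulate({"render": 3, "backup": 1}, 10000) should give "render" roughly three quarters of the quanta. A task holding zero tickets is never selected, and changing a task's ticket count changes its share without touching any other scheduler state.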
To bridge the gap between parallel and distributed systems, we are collaborating with other groups in LCS on the Exokernel and Fugu projects. A principal goal of these projects is to provide low-overhead protected communication, thus enabling a convergence of architectures between parallel and distributed systems. We are also developing resource management mechanisms (gang scheduling and global load distribution) that permit a range of applications, including parallel supercomputing, sequential, and distributed jobs, to share the same underlying computing resources.
We track usage of the PSG WWW server through usage statistics.
carl@lcs.mit.edu