Direct manipulation and object orientation

The main interaction style for updating the common ground in a typical graphical human-computer interface is direct manipulation: through physical actions the user manipulates graphical objects on the screen. The impact on the object is immediately visible, as the user receives continuous feedback. Well-implemented direct manipulation tends to give the user a sense of control. This subsection is dedicated entirely to object orientation and direct manipulation. Besides the benefits, I present some serious problems of direct manipulation interfaces. I also state the case for an action-object type of interaction. Finally, I present a cognitive account of direct manipulation.

The text is mainly based on the work of Hutchins et al. [1986], Jacob [1989], Laurel [1993], and Shneiderman [1992], who coined the term.

Object orientation in the human-computer interface

Almost inevitably, direct manipulation interfaces are object-oriented in nature. Object orientation in the human-computer interface allows real-world objects to be depicted and used. Indeed, the objects on the screen should be modeled on real-world entities and experiences (i.e., real-world metaphors). After all, humans must fully comprehend the real-world object (or at least their mental model of it) before they can carry that knowledge across to the manipulation of the corresponding objects on the screen (Laurel). An interface should be in line with a human's world.

Hellman [1992] illustrates this point by comparing the representation of activities in the UNIX environment with that in the Macintosh environment. In the UNIX environment, the activity is somewhat 'clumsy' in two ways: first, as often stated, the commands themselves do not communicate, and second, focusing and operating on objects and activities take place at a very syntax-oriented level where the differences between the two are difficult to perceive.

According to Hellman, this is in conflict with the natural way of perceiving environmental entities: one usually focuses on the object separately from planning and performing the activity. In the Macintosh environment, the object is always focused on in a comprehensible manner - by activating it. In addition, activities that cannot be performed at a particular moment are marked as such (they are greyed out). Briefly stated, the user of an object-oriented, visually realised interface seldom needs, wants or gets the idea of doing things in a remarkably different manner. This is because of the style of representation: sense-making is inherently easy and acting is fluent.
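The idea of focusing on an object first and then seeing which activities are available can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names ScreenObject, ACTIONS, menu_state and render_menu, the object kinds, and the rules themselves are all hypothetical and are not taken from Hellman or from the Macintosh system software.

```python
from dataclasses import dataclass


@dataclass
class ScreenObject:
    """A hypothetical on-screen object the user can focus on."""
    name: str
    kind: str            # assumed kinds: "document" or "folder"
    locked: bool = False


# Hypothetical action table: which object kinds an action applies to,
# and whether the object must be unlocked for the action to be performable.
ACTIONS = {
    "Open":   {"kinds": {"document", "folder"}, "needs_unlocked": False},
    "Print":  {"kinds": {"document"},           "needs_unlocked": False},
    "Rename": {"kinds": {"document", "folder"}, "needs_unlocked": True},
}


def menu_state(selected):
    """Return {action: enabled} for the focused object (None = no focus)."""
    state = {}
    for action, rule in ACTIONS.items():
        state[action] = (
            selected is not None
            and selected.kind in rule["kinds"]
            and not (rule["needs_unlocked"] and selected.locked)
        )
    return state


def render_menu(selected):
    """Render the menu as text; parentheses stand in for greyed-out items."""
    return [
        action if enabled else f"({action})"
        for action, enabled in menu_state(selected).items()
    ]


if __name__ == "__main__":
    folder = ScreenObject("Projects", "folder", locked=True)
    print(render_menu(folder))
```

In this sketch the menu is always derived from the object in focus, so the user never has to find out by trial and error that an activity cannot be performed.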

Thus, objects allow humans to think in the familiar terms of the application domain rather than those of the medium of computation. As you will see, these are essential characteristics for direct manipulation.

Direct manipulation defined

For a characterisation of direct manipulation interfaces, I follow the terminology proposed by Shneiderman. These are the main principles describing direct manipulation:

Using these three principles, it is possible to design systems which have beneficial attributes that are examined in the following section.

Benefits of direct manipulation

This section lists a number of benefits of direct manipulation. As you can see, these fit the five usability attributes nicely:

However, there appear to be some concerns with regard to direct manipulation.

Problems with direct manipulation

Problems exist not only with regard to the technical implementation of direct manipulation interfaces (i.e., the amount of resources required); the literature also mentions some concerns about the underlying ideas:

While there are numerous disadvantages, they are in essence technical problems, which are likely to yield to future research, particularly since direct manipulation interfaces are still relatively new. According to Jacob, the advantages are more fundamental, rooted in human psychological characteristics, less likely to change, and thus decisive.

The case for an action-object type of interaction

Barfield [1993] distinguishes two possible ways to organise the order in which the action and the object are selected. In the first, the object-action type, the user first selects the object to perform an action on and then selects the action. Alternatively, in the action-object type, the user first selects the action and then applies this action to the required objects. With object-action type selection the user issues a command to the computer and the computer does the actual work. This way, the computer is seen as a tool.

In the alternative selection style the user himself performs the action and he experiences a greater sense of control, a feeling of directness. According to Hutchins et al., there is a feeling of involvement directly with a world of objects rather than of communicating with an intermediary. The interactions are much like interacting with objects in the physical world. Actions apply to objects, observations are made directly upon those objects, and the interface and the computer become invisible (i.e., the principle of transparency mentioned by Shneiderman). The user has a different perception: instead of a tool, the computer is now seen as a medium, able to represent tools. He feels he has true control over the objects in the task domain.

The following figure has been adapted from Cox and Walker [1993]. This example shows how modal dialogues are often used to implement the action-object model of interaction whereas modeless dialogues in windowing systems are often used to establish an object-action style of interaction. The designers of direct manipulation interfaces such as the Macintosh often choose the object-action type of selection because the action-object type leads to mode problems. For example, if you are in 'modify mode' (option 2 in the modal dialogue), you must go out, select a new mode, and select the record again before you can add or delete. This is very inflexible.

Figure: The difference between object-action and action-object interaction

The example features dialogue-style interaction; the act of selection is inherently part of a dialogue. With action-object type selection you would still use a menu, but rather than selecting an action that the computer carries out, you would select the possibility of performing an action yourself. The menu has turned into a toolbox. Menus for object-action and menus for action-object type selection are in fact inherently different concepts. One is a menu for commands, the other is a toolbox. According to Barfield, confusion about their difference stems from the fact that they are very often presented in exactly the same way.
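The trade-off between the two orderings can be made concrete with a small step-counting sketch in Python. It is a deliberately simplified model, not something taken from Barfield or from Cox and Walker: the function names, the task format, and the assumption that leaving one mode and choosing another costs a single step are all mine.

```python
def object_action_steps(tasks):
    """Count user steps when the object is selected first, then the action.

    tasks is a list of (action, record) pairs. A record stays selected until
    another record is picked, so consecutive actions on the same record need
    no reselection.
    """
    steps = 0
    selected = None
    for action, record in tasks:
        if record != selected:
            steps += 1        # select the record
            selected = record
        steps += 1            # invoke the action on the current selection
    return steps


def action_object_steps(tasks):
    """Count user steps when the action (mode or tool) is chosen first.

    The chosen action stays active until another one is picked, and every
    application requires pointing at the record again.
    """
    steps = 0
    mode = None
    for action, record in tasks:
        if action != mode:
            steps += 1        # switch to the new mode (assumed to cost one step)
            mode = action
        steps += 1            # apply the active action to the record
    return steps


if __name__ == "__main__":
    # Several different actions on one record: the mode problem shows up.
    same_record = [("modify", "A"), ("delete", "A")]
    print(object_action_steps(same_record), action_object_steps(same_record))

    # One action on several records: the toolbox style needs fewer steps.
    same_action = [("delete", "A"), ("delete", "B"), ("delete", "C")]
    print(object_action_steps(same_action), action_object_steps(same_action))
```

Under these assumptions, performing different actions on one record is cheaper with object-action selection, whereas applying one action to many records is cheaper with action-object selection, which is where the toolbox reading of the menu pays off.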

The distinction between these styles of interaction led Laurel [1993] to state her primary human-computer interface design principle: focus on designing the action. The design of objects, environments, and characters is all subsidiary to this central goal. Whenever possible, specification of activities should be transformed into direct manipulation interaction styles. This will eventually lead to more usable computer systems, though it will probably take some time before we stare at our virtual world without noticing the computer system through which we interact with it.

A cognitive account of direct manipulation

Hutchins et al. examined what it is that provides the sensation of directness. They found two different ways in which human-computer interfaces are direct. One is direct engagement, the sense of manipulating objects directly on a screen rather than conversing about them, which was discussed in the previous section. It is the most visible characteristic of a direct manipulation interface.

The other form of directness is found in a reduction of cognitive distance, which is the distance between a person's thoughts and the physical requirements of the system in use. A short distance means that the translation is simple and straightforward: thoughts are readily translated into the physical actions required by the system, and the system output is in a form readily interpreted in terms of the goals of interest to the operator of the system (Hutchins et al.). Clearly then, the interface must try to match the goals of the person.

The hope, expressed by Hutchins et al., is that people will be able to spend their time learning the task domain, not learning the computer system.



Sjoerd Michels, Tilburg, The Netherlands.