Direct manipulation and object orientation

The main interaction style for updating the common ground in a typical graphical human-computer interface is direct manipulation: through physical actions the user manipulates graphical objects on the screen. The effect on the object is immediately visible, because the user receives continuous feedback. Well-implemented direct manipulation tends to give the user a sense of control. This subsection is dedicated entirely to object orientation and direct manipulation. Alongside the benefits, I present some serious problems of direct manipulation interfaces. I also state the case for an action-object type of interaction. Finally, I present a cognitive account of direct manipulation.

The text is mainly based on the work of Hutchins et al. [1986], Jacob [1989], Laurel [1993], and Shneiderman [1992], who coined the term 'direct manipulation'.

Object orientation in the human-computer interface

Almost inevitably, direct manipulation interfaces are object oriented in nature. Object orientation in the human-computer interface allows real-world objects to be depicted and used. Indeed, the objects on the screen should be modeled on real-world entities and experiences (i.e., real-world metaphors). After all, humans must fully comprehend the real-world object (or at least their mental model of it) before they can carry that knowledge across to the manipulation of the corresponding objects on the screen (Laurel). An interface should be in line with a human's world.

Hellman [1992] illustrates this point by comparing the representation of activities in the UNIX environment with that in the Macintosh environment. In the UNIX environment, the activity is somewhat 'clumsy' in two ways: first, as often stated, the commands themselves do not communicate, and second, focusing and operating on objects and activities take place at a very syntax-oriented level where differences between these two are difficult to perceive.

According to Hellman, this is in conflict with the natural way of perceiving environmental entities: one usually focuses on the object separately from planning and performing the activity. In the Macintosh environment, the object is always focused on in a comprehensible manner - by activating it. In addition, activities that cannot be performed at a particular moment are marked (they are greyed out). Briefly stated, the user of an object oriented, visually realised interface seldom needs, wants or gets the idea of doing things in a remarkably different manner. This is because of the style of representation: sense-making is inherently easy and acting is fluent.
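
Hellman's two observations, that the object is focused on before the action is chosen and that actions which cannot be performed are greyed out, can be sketched in a few lines of Python. This is only an illustration: the action table, the ScreenObject class and the menu_for function are hypothetical names, and nothing here describes how the Macintosh actually implements its menus.

```python
from dataclasses import dataclass

# Hypothetical action table: which actions make sense for which kind of object.
ACTIONS = {
    "open": {"document", "folder"},
    "print": {"document"},
    "empty": {"trash"},
}


@dataclass
class ScreenObject:
    name: str
    kind: str  # e.g. "document", "folder" or "trash"


def menu_for(selected):
    """Return (action, enabled) pairs for the current selection.

    Actions that cannot be performed on the selected object stay in the
    menu but are marked as disabled, i.e. greyed out.
    """
    return [
        (action, selected is not None and selected.kind in kinds)
        for action, kinds in ACTIONS.items()
    ]


report = ScreenObject("report.txt", "document")
for action, enabled in menu_for(report):
    # Parentheses stand in for grey text on a real display.
    print(action if enabled else f"({action})")
```

With a document selected, 'open' and 'print' are available and 'empty' is shown as unavailable. With nothing selected, every action is greyed out, so the user never has to guess which commands apply.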

Thus, objects allow humans to think in the familiar terms of the application domain rather than those of the medium of computation. As you will see, these are essential characteristics for direct manipulation.

Direct manipulation defined

For a characterisation of direct manipulation interfaces, I follow the terminology proposed by Shneiderman. The main principles describing direct manipulation are:

Continuous representation of the objects and actions of interest. Laurel stresses this point by turning it into 'continuous representation of the potential for action'.
Physical actions or presses of labeled buttons instead of complex syntax.
Rapid incremental and reversible operations whose effect on the object of interest is immediately visible.
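
The third principle, together with the first, can be made concrete with a small sketch. It assumes a toy Canvas class (a hypothetical name, not taken from Shneiderman) in which every drag is a small step, the display is redrawn after each step, and each step can be undone.

```python
class Canvas:
    """A toy model of named objects on screen, each with an (x, y) position."""

    def __init__(self):
        self.positions = {}
        self._undo = []  # stack of (name, previous position)

    def add(self, name, x, y):
        self.positions[name] = (x, y)

    def drag(self, name, dx, dy):
        """Move an object by a small step and redraw immediately."""
        x, y = self.positions[name]
        self._undo.append((name, (x, y)))
        self.positions[name] = (x + dx, y + dy)
        return self.render()

    def undo(self):
        """Reverse the most recent step, if there is one."""
        if self._undo:
            name, previous = self._undo.pop()
            self.positions[name] = previous
        return self.render()

    def render(self):
        # Continuous representation: every object is always shown.
        return ", ".join(f"{n}@{p}" for n, p in sorted(self.positions.items()))


canvas = Canvas()
canvas.add("icon", 0, 0)
print(canvas.drag("icon", 5, 0))  # icon@(5, 0)
print(canvas.drag("icon", 5, 3))  # icon@(10, 3)
print(canvas.undo())              # icon@(5, 0)
```

Because every operation returns the new picture, the user sees the result of each step at once, and because each step is recorded, a counterproductive move costs no more than one undo.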

Using these three principles, it is possible to design systems with the beneficial attributes examined in the following section.

Benefits of direct manipulation

This section lists a number of benefits of direct manipulation. As you can see, these fit the five usability attributes nicely:

Because the system lacks a complex syntax, novices can learn basic functionality quickly, usually through a demonstration by a more experienced user.
Experts can work rapidly to carry out a wide range of tasks, even defining new functions and features.
Knowledgeable intermittent users can retain operational concepts because no complex syntax has to be remembered.
Less syntax results in reduced error rates, and errors are easier to prevent. As a result, error messages are rarely needed.
Users can immediately see if their actions are furthering their goals, and, if the actions are counterproductive, they can simply change the direction of their activity.
Users experience less anxiety because the system is comprehensible and because actions can be reversed so easily. This encourages exploration, which helps users to comprehend the system and its shortcomings even further, as the discussion on metaphors makes clear.
Users gain confidence and mastery because they are the initiators of action, they feel in control, and the system responses are predictable. This point is examined in more detail in upcoming paragraphs.

However, there appear to be some concerns with regard to direct manipulation.

Problems with direct manipulation

Problems exist not only with regard to the technical implementation of direct manipulation interfaces (i.e., the amount of resources required); the literature also mentions concerns with regard to the underlying ideas:

In general, offering direct manipulation interfaces requires more system resources. Considering the minimal configuration we had to face, this could be difficult. Spatial or visual representations are not necessarily an improvement over text, because they may be too spread out, causing off-page connectors on paper or tedious scrolling on displays. Similarly, direct manipulation designs may consume valuable screen space and thus force valuable information offscreen, requiring scrolling or multiple actions (Shneiderman).
A repetitive operation is probably best done via a script or macro. According to Hutchins et al., direct manipulation interfaces have difficulty handling variables, or distinguishing the depiction of an individual element from a representation of a set or class of elements. Cypher [1990] subscribes to this point.
Direct manipulation interfaces have problems with accuracy, for the notion of mimetic action puts the responsibility on the user to control the action with precision. However, that responsibility is often best handled through the intelligence of the system, and sometimes best communicated symbolically.
A more fundamental problem arises from the fact that direct manipulation requires substantial knowledge of the real world. If we restrict ourselves to only building interfaces that allow us to do things we can do already in ways we think already, we will miss the most exciting potential of new technology: to provide new ways to think and to interact with a domain. This is especially valid in the case of cooperative work. As Sørgaard [1988] states, shared material requires the construction of new design metaphors like hypertext technologies.
According to Hutchins et al., it is important not to equate the direct sense of interaction with ease of use. Direct manipulation interfaces do not pretend to assist in overcoming problems that result from poor understanding of the task domain. Again, especially in computer supported cooperative work, which is an entirely new field, this aspect should not be underestimated.
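
The point about repetitive operations and variables in the list above can be illustrated with a short script. This is a minimal sketch under assumed names (rename_matching and the file list are hypothetical): the set of objects is described by a rule rather than by pointing at each element in turn, which is exactly what is hard to express by direct manipulation.

```python
def rename_matching(names, old_suffix, new_suffix):
    """Replace old_suffix by new_suffix in every name that ends with it."""
    if not old_suffix:
        return list(names)  # nothing to match on, so nothing changes
    return [
        name[: -len(old_suffix)] + new_suffix if name.endswith(old_suffix) else name
        for name in names
    ]


files = ["ch1.txt", "ch2.txt", "notes.doc", "ch3.txt"]
print(rename_matching(files, ".txt", ".bak"))
# ['ch1.bak', 'ch2.bak', 'notes.doc', 'ch3.bak']
```

The script costs the same effort for four files as for four thousand, whereas a purely direct manipulation interface would require one gesture per file.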

Although these disadvantages are numerous, many of them are in essence technical problems, which are likely to yield to future research, particularly since direct manipulation interfaces are still relatively new. According to Jacob, the advantages are more fundamental, rooted in human psychological characteristics, less likely to change, and thus decisive.

The case for an action-object type of interaction

Barfield [1993] distinguishes two possible ways to order the selection of the action and the object. In the first, the object-action type, the user first selects the object to perform an action on and then selects the action. Alternatively, in the action-object type, the user first selects the action and then applies this action to the required objects. With object-action type selection the user issues a command to the computer and the computer does the actual work. In this way, the computer is seen as a tool.

In the alternative selection style the user himself performs the action and he experiences a greater sense of control, a feeling of directness. According to Hutchins et al., there is a feeling of involvement directly with a world of objects rather than of communicating with an intermediary. The interactions are much like interacting with objects in the physical world. Actions apply to objects, observations are made directly upon those objects, and the interface and the computer become invisible (i.e., the principle of transparency mentioned by Shneiderman). The user has a different perception: instead of a tool, the computer is now seen as a medium, able to represent tools. He feels he has true control over the objects in the task domain.

The following figure has been adapted from Cox and Walker [1993]. This example shows how modal dialogues are often used to implement the action-object model of interaction, whereas modeless dialogues in windowing systems are often used to establish an object-action style of interaction. The designers of direct manipulation interfaces such as the Macintosh often choose the object-action type of selection because the action-object type leads to mode problems. For example, if you are in 'modify mode' (option 2 in the modal dialogue), you must leave that mode, select a new mode, and select the record again before you can add or delete. This is very inflexible.

Figure: The difference between object-action and action-object interaction

The example features dialogue-style interaction; the act of selection is inherently part of a dialogue. With action-object type selection you would still use a menu, but rather than selecting an action that the computer carries out, you would select the possibility of performing an action yourself. The menu has turned into a toolbox. Menus for object-action and menus for action-object type selection are in fact inherently different concepts. One is a menu for commands, the other is a toolbox. According to Barfield, confusion about their difference stems from the fact that they are very often presented in exactly the same way.
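
The two selection orders, and the mode problem mentioned above, can be sketched as two small classes. This is a hypothetical model, not the design from Cox and Walker's figure: the class names, the record store and the step counter are all assumptions made for illustration.

```python
def apply(action, key, store, value=None):
    """Carry out one action on one record of the store."""
    if action == "modify":
        store[key] = value
    elif action == "delete":
        del store[key]
    else:
        raise ValueError(f"unknown action: {action}")


class ObjectActionUI:
    """Modeless: select a record first, then pick any action for it."""

    def __init__(self, store):
        self.store, self.selected, self.steps = store, None, 0

    def select(self, key):
        self.selected = key
        self.steps += 1

    def act(self, action, value=None):
        apply(action, self.selected, self.store, value)
        self.steps += 1


class ActionObjectUI:
    """Modal: pick an action (the mode) first, then point at records."""

    def __init__(self, store):
        self.store, self.mode, self.steps = store, None, 0

    def enter_mode(self, action):
        self.mode = action
        self.steps += 1

    def point_at(self, key, value=None):
        apply(self.mode, key, self.store, value)
        self.steps += 1


modeless = ObjectActionUI({"smith": "Leeds", "jones": "York"})
modeless.select("smith")
modeless.act("modify", "Hull")
modeless.act("delete")        # same record, new action, no reselection
print(modeless.steps)         # 3

modal = ActionObjectUI({"smith": "Leeds", "jones": "York"})
modal.enter_mode("modify")
modal.point_at("smith", "Hull")
modal.enter_mode("delete")    # the user must switch modes first
modal.point_at("smith")       # and point at the record again
print(modal.steps)            # 4
```

Modifying and then deleting the same record takes an extra step in the modal version, which is the inflexibility described above. The balance reverses when one action is applied to many records: a single enter_mode followed by repeated point_at calls behaves like picking up a tool from a toolbox and using it on one object after another.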

The distinction between these styles of interaction led Laurel [1993] to state her primary human-computer interface design principle: focus on designing the action. The design of objects, environments, and characters is all subsidiary to this central goal. Whenever possible, specification of activities should be transformed into direct manipulation interaction styles. This will eventually lead to more usable computer systems, though it will probably take some time before we stare at our virtual world without noticing the computer system through which we interact with it.

A cognitive account of direct manipulation

Hutchins et al. examined what it is that provides the sensation of directness. They found two different ways in which human-computer interfaces are direct. One is direct engagement, the sense of manipulating objects directly on a screen rather than conversing about them, which was discussed in the previous section. It is the most visible characteristic of a direct manipulation interface.

The other form of directness is found in a reduction of cognitive distance, which is the distance between a person's thoughts and the physical requirements of the system under use. A short distance means that the translation is simple and straightforward. Thoughts are readily translated into the physical actions required by the system and the system output is in a form readily interpreted in terms of the goals of interest to the operator of the system (Hutchins et al.). Clearly then, the interface must try to match the goals of the person.

The hope, expressed by Hutchins et al., is that people will be able to spend their time learning the task domain, not learning the computer system.

Sjoerd Michels, Tilburg, The Netherlands.