|Lev Manovich on Tue, 18 Aug 1998 23:12:59 -0700 (PDT)|
|Syndicate: A THEORY OF CULTURAL INTERFACES 2/3|
II. Cinema

The printed word tradition, which initially dominated the language of cultural interfaces, is becoming less important, while the part played by cinematic elements is getting progressively stronger. This is consistent with a general trend in modern society towards presenting more and more information in the form of time-based audio-visual moving image sequences, rather than as text. As new generations of both computer users and computer designers grow up in a media-rich environment dominated by television rather than by printed texts, it is not surprising that they favor cinematic language over the language of print.

A hundred years after cinema's birth, cinematic ways of seeing the world, of structuring time, of narrating a story, of linking one experience to the next, are being extended to become the basic ways in which computer users access and interact with all cultural data. In this way, the computer fulfills the promise of cinema as a visual Esperanto which preoccupied many film artists and critics in the 1920s, from Griffith to Vertov. Indeed, millions of computer users communicate with each other through the same computer interface. And, in contrast to cinema, where most of its "users" were able to "understand" cinematic language but not "speak" it (i.e., make films), all computer users can "speak" the language of the interface. They are active users of the interface, employing it to perform many tasks: send email, run basic applications, organize files and so on. The original Esperanto never became truly popular. But cultural interfaces are widely used and easily learned. We have a truly unprecedented situation in the history of cultural languages: something designed by a rather small group of people is immediately adopted by millions of computer users. How is it possible that people around the world adopt today something which a 20-something programmer in Northern California hacked together just the night before?
Shall we conclude that we are somehow biologically "wired" to the interface language, the way we are "wired," according to the original hypothesis of Noam Chomsky, to different natural languages? Interestingly, the speed with which the language of cultural interfaces is being formulated at the end of the twentieth century is comparable to the speed with which cinematic language was formulated exactly a hundred years ago. In both cases, the ease with which users "acquired" these languages was to a large extent due to the fact that these languages drew on previous, already well-acquired cultural forms. In the case of cinema, these were theater, magic lantern shows and other nineteenth-century forms of public entertainment. Cultural interfaces in their turn draw on older cultural forms such as the printed word and cinema. I have already discussed some ways in which the printed word tradition structures interface language; now it is cinema's turn.

I will begin with probably the most important case of cinema's influence on cultural interfaces -- the mobile camera. Originally developed as part of 3-D computer graphics technology for such applications as computer-aided design, flight simulators and computer movie making, during the 1980's and 1990's the camera model became as much of an interface convention as scrollable windows or the cut-and-paste function. It became an accepted way of interacting with any data represented in three dimensions -- which, in a computer culture, means literally anything and everything: the results of a physical simulation, an architectural site, the design of a new molecule, financial data, the structure of a computer network and so on. As computer culture gradually spatializes all representations and experiences, they become subject to the camera's particular grammar of data access. Zoom, tilt, pan and track: we now use these operations to interact with data spaces, models, objects and bodies.
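This grammar of data access -- zoom, tilt, pan, track (and dolly) -- can be sketched as a handful of operations on a camera's state. The sketch below is purely illustrative: the class and parameter names are invented for this example and are not taken from any actual 3-D package.

```python
class VirtualCamera:
    """A minimal virtual camera exposing cinema's grammar of data access.

    Position is (x, y, z); pan and tilt are rotations in degrees; zoom
    narrows or widens the field of view. All names are illustrative.
    """

    def __init__(self):
        self.position = [0.0, 0.0, 10.0]   # where the camera sits
        self.pan = 0.0                     # rotation around the vertical axis
        self.tilt = 0.0                    # rotation around the horizontal axis
        self.fov = 60.0                    # field of view, in degrees

    def do_pan(self, degrees):
        # Panning rotates the view left or right around a fixed position.
        self.pan = (self.pan + degrees) % 360

    def do_tilt(self, degrees):
        # Tilting rotates the view up or down, clamped to straight up/down.
        self.tilt = max(-90.0, min(90.0, self.tilt + degrees))

    def zoom(self, factor):
        # Zooming in (factor > 1) narrows the field of view; out widens it.
        self.fov = max(1.0, min(120.0, self.fov / factor))

    def track(self, dx, dy):
        # Tracking moves the camera itself, parallel to the image plane.
        self.position[0] += dx
        self.position[1] += dy

    def dolly(self, dz):
        # Dollying moves the camera forward or back along its viewing axis.
        self.position[2] -= dz
```

The point of the sketch is how few primitives are involved: the same five operations serve equally as an interface to a molecule, a financial dataset, or a virtual city.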
Abstracted from its historical, temporary "imprisonment" within the physical body of a movie camera directed at physical reality, the virtualized camera also becomes an interface to all types of media besides 3-D space. As an example, consider the GUI (Graphical User Interface) of the leading computer animation software -- PowerAnimator from Alias/Wavefront. In this interface, each window, regardless of whether it displays a 3-D model, a graph or even plain text, contains Dolly, Track and Zoom buttons. In this way, the model of a virtual camera is extended to navigation through any kind of information, not only that which has been spatialized. It is particularly telling that the user is expected to dolly and pan over text as though it were a 3-D scene. Cinematic vision triumphed over the print tradition, with the camera subsuming the page. The Gutenberg galaxy turned out to be just a subset of the Lumières' universe.

Another feature of cinematic perception which persists in cultural interfaces is the rectangular framing of represented reality. Cinema itself inherited this framing from Western painting. Since the Renaissance, the frame has acted as a window onto a larger space assumed to extend beyond it. This space is cut by the frame's rectangle into two parts: "onscreen space," the part inside the frame, and the part which lies outside it. In the famous formulation of Leon Battista Alberti, the frame acted as a window onto the world. Or, in a more recent formulation of Jacques Aumont and his co-authors, "The onscreen space is habitually perceived as included within a more vast scenographic space. Even though the onscreen space is the only visible part, this larger scenographic part is nonetheless considered to exist around it." Just as the rectangular frame in painting and photography presents a part of a larger space outside it, a window in HCI presents a partial view of a larger document.
But if in painting (and later in photography) the framing chosen by the artist was final, the computer interface benefits from a new invention introduced by cinema: the mobility of the frame. As a kino-eye moves around a space revealing its different regions, so can a computer user scroll through a window's contents. It is not surprising that screen-based interactive 3-D environments, such as VRML worlds, also use cinema's rectangular framing, since they rely on other elements of cinematic vision, specifically a mobile virtual camera. It may be more surprising to realize that the Virtual Reality (VR) interface, often promoted as the most "natural" interface of all, utilizes the same framing. As in cinema, the world presented to a VR user is cut by a rectangular frame. As in cinema, this frame presents a partial view of a larger space. As in cinema, the virtual camera moves around to reveal different parts of this space. Of course, the camera is now controlled by the user and in fact is identified with his/her own sight. Yet it is crucial that in VR one sees the virtual world through a rectangular frame, and that this frame always presents only a part of a larger whole. This frame creates a distinct subjective experience which is much closer to cinematic perception than to unmediated sight.

Interactive virtual worlds, whether accessed through a screen-based or a VR interface, are often discussed as the logical successor to cinema, as potentially the key cultural form of the twenty-first century, just as cinema was the key cultural form of the twentieth century. These discussions usually focus on the issues of interaction and narrative. So, the typical scenario for twenty-first-century cinema involves a user represented as an avatar existing literally "inside" a narrative space, rendered with photorealistic 3-D computer graphics, interacting with virtual characters and perhaps other users, and affecting the course of narrative events.
It is an open question whether this and similar scenarios, commonly invoked in new media discussions of the 1990's, indeed represent an extension of cinema, or whether they should rather be thought of as a continuation of certain theatrical traditions, such as improvisational or avant-garde theater. But what undoubtedly can be observed in the 1990's is that virtual technology's dependence on cinema's mode of seeing and language is becoming progressively stronger. This coincides with the move from proprietary and expensive VR systems to more widely available and standardized technologies, such as VRML (Virtual Reality Modeling Language). The creator of a VRML world can define a number of viewpoints which are stored with the world. These viewpoints automatically appear in a special menu in a VRML browser, which allows the user to step through them, one by one. Just as in cinema, ontology is coupled with epistemology: the world is designed to be viewed from particular points of view. The designer of a virtual world is thus a cinematographer as well as an architect. The user can wander around the world, or she can save time by assuming the familiar position of a cinema viewer for whom the cinematographer has already chosen the best viewpoints.

Equally interesting is another option which controls how a VRML browser moves from one viewpoint to the next. By default, the virtual camera smoothly travels through space from the current viewpoint to the next as though on a dolly, its movement automatically calculated by the software. Selecting the "jump cuts" option makes it cut from one view to the next. Both modes are obviously derived from cinema. Both are more efficient than trying to explore the world on one's own. With a VRML interface, nature is firmly subsumed under culture. The eye is subordinated to the kino-eye. The body is subordinated to the virtual body of a virtual camera.
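The two navigation modes just described -- the smooth, dolly-like travel between authored viewpoints, and the "jump cut" -- can be sketched as follows. This is an illustrative model only, with invented names; it is not code from any actual VRML browser.

```python
def travel(current, target, steps, jump_cut=False):
    """Move a virtual camera between two authored viewpoints.

    `current` and `target` are (x, y, z) positions, standing in for the
    viewpoints a VRML author stores with a world. With jump_cut=True the
    camera cuts directly to the target in a single frame; otherwise its
    position is interpolated over `steps` frames, as though on a dolly.
    """
    if jump_cut:
        return [target]                        # cinematic cut: one frame
    frames = []
    for i in range(1, steps + 1):
        t = i / steps                          # fraction of the path covered
        frames.append(tuple(c + (g - c) * t    # linear dolly path
                            for c, g in zip(current, target)))
    return frames
```

Either way, the path is computed by the software, not walked by the user: the "camera movement" is authored into the world, exactly as the passage argues.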
While the user can investigate the world on her own, freely selecting trajectories and viewpoints, the interface privileges cinematic perception -- cuts, pre-computed dolly-like smooth motions of a virtual camera, and pre-selected viewpoints.

The area of computer culture where the cinematic interface is being transformed into a cultural interface most aggressively is computer games. By the 1990's, game designers had moved from two to three dimensions and had begun to incorporate cinematic language in an increasingly systematic fashion. Games started featuring lavish opening cinematic sequences (called "cinematics" in the game business) to set the mood, establish the setting and introduce the narrative. Frequently, a whole game would be structured as an oscillation between interactive fragments requiring the user's input and non-interactive cinematic sequences, i.e. "cinematics." As the decade progressed, game designers created increasingly complex -- and increasingly cinematic -- interactive virtual worlds. Regardless of a game's genre -- action/adventure, fighting, flight simulator, first-person action, racing or simulation -- games came to rely on cinematography techniques borrowed from traditional cinema, including the expressive use of camera angles and depth of field, and the dramatic lighting of 3-D sets to create mood and atmosphere. At the beginning of the decade, games used digital video of actors superimposed over 2-D or 3-D backgrounds; by its end, they had switched to fully synthetic characters. This switch also made virtual worlds more cinematic, as the characters could be better visually integrated with their environments. A particularly important example of how computer games use -- and extend -- cinematic language is their implementation of a dynamic point of view.
In driving and flying simulators and in combat games such as Tekken 2 (Namco, 1994 -), after a certain event takes place (a car crashes, a fighter is knocked down), it is automatically replayed from a different point of view. Other games, such as the Doom series (Id Software, 1993 -) and Dungeon Keeper (Bullfrog Productions, 1997), allow the user to switch between the point of view of the hero and a top-down "bird's eye" view. Finally, Nintendo went even further by dedicating four buttons on its N64 joypad to controlling the view of the action. While playing Nintendo games such as Super Mario 64 (Nintendo, 1996), the user can continuously adjust the position of the camera. Some Sony Playstation games, such as Tomb Raider (Eidos, 1996), also use the buttons on the Playstation joypad for changing the point of view.

The incorporation of virtual camera controls into the very hardware of game consoles is truly a historic event. Directing the virtual camera becomes as important as controlling the hero's actions. This is admitted by the game industry itself. For instance, the package for Dungeon Keeper lists four key features of the game, of which the first two concern control over the camera: "switch your perspective," "rotate your view," "take on your friend," "unveil hidden levels." In games such as this one, cinematic perception functions as a subject in its own right. Here, computer games return to "The New Vision" movement of the 1920s (Moholy-Nagy, Rodchenko, Vertov and others), which foregrounded the new mobility of the photo and film camera, and made unconventional points of view a key part of its poetics. The fact that computer games continue to encode, step by step, the grammar of the kino-eye in software and in hardware is not an accident. This encoding is consistent with the overall trajectory driving the computerization of culture since the 1940's, namely the automation of all cultural operations.
This automation gradually moves from basic to more complex operations: from image processing and spell checking to software-generated characters, 3-D worlds, and Web sites. A side effect of this automation is that once particular cultural codes are implemented in low-level software and hardware, they are no longer seen as choices but as unquestionable defaults. To take the automation of imaging as an example, in the early 1960's the newly emerging field of computer graphics incorporated linear one-point perspective in 3-D software, and later directly in hardware. As a result, linear perspective became the default mode of vision in digital culture, be it computer animation, computer games, visualization or VRML worlds. Now we are witnessing the next stage of this process: the translation of the cinematic grammar of points of view into software and hardware. As Hollywood cinematography is translated into algorithms and computer chips, its conventions become the default method of interacting with any data subjected to spatialization, with a narrative, and with other human beings. (At SIGGRAPH '97 in Los Angeles, one of the presenters called for the incorporation of Hollywood-style editing in multi-user virtual worlds software. In such an implementation, user interaction with other avatars would be automatically rendered using classical Hollywood conventions for filming dialog.) Element by element, cinema is being poured into the computer: first one-point linear perspective; next the mobile camera and the rectangular window; next cinematography and editing conventions; then digital personas based on acting conventions borrowed from cinema, to be followed by make-up, set design, and, of course, the narrative structures themselves. From one cultural language among others, cinema is becoming the cultural interface, a toolbox for all cultural communication, overtaking the printed word.
But in one sense, all computer software has already been based on a particular cinematic logic. Consider the key feature shared by all modern human-computer interfaces -- overlapping windows. All modern interfaces display information in overlapping and resizable windows arranged in a stack, similar to a pile of papers on a desk. As a result, the computer screen can present the user with a practically unlimited amount of information despite its limited surface. The overlapping windows of HCI can be understood as a synthesis of two basic techniques of twentieth-century cinema: temporal montage and montage within a shot. In temporal montage, different images follow each other in time, while in montage within the shot, these images co-exist within the screen. The first technique defines the cinematic language as we know it; the second is used more rarely. Examples of the latter are the vignettes within a frame employed in early cinema to show the interlocutor of a telephone conversation; the superimpositions of a few images and the multiple screens used by avant-garde filmmakers; and the use of deep focus and a particular compositional strategy (for instance, a character looking through a window, as in Citizen Kane, Ivan the Terrible and Rear Window) to juxtapose close and far-away scenes.

As testified by its popularity, temporal montage works. However, it is not a very efficient method of communication: the display of each additional piece of information takes time to watch, thus slowing communication. It is not accidental that the European avant-garde of the 1920's, inspired by the engineering ideal of efficiency, experimented with various alternatives, trying to load the screen with as much information at one time as possible. In his 1927 Napoleon, Abel Gance uses a multiscreen system which shows three images side by side.
Two years later, in A Man with a Movie Camera (1929), we watch Dziga Vertov speeding up the temporal montage of individual shots, more and more, until he seems to realize: why not simply superimpose them in one frame? Vertov overlaps the shots, achieving temporal efficiency -- but he also pushes the limits of a viewer's cognitive capacities. His superimposed images are hard to read -- information becomes noise. Here cinema reaches one of the limits imposed on it by human psychology; from that moment on, cinema retreats, relying on temporal montage or deep focus, and reserving superimpositions for infrequent cross-dissolves.

In the window interface, the two opposites -- temporal montage and montage within the shot -- finally come together. The user is confronted with a montage within the shot: a number of windows present at once, each window opening up into its own reality. This, however, does not lead to the cognitive confusion of Vertov's superimpositions, because the windows are opaque rather than transparent, so the user is only dealing with one of them at a time. In the process of working with a computer, the user repeatedly switches from one window to another, i.e. the user herself becomes the editor, accomplishing montage between different shots. In this way, the window interface synthesizes the two different techniques of presenting information within a rectangular screen developed by cinema. This last example shows once again the extent to which human-computer interfaces -- and the cultural interfaces which follow them -- are cinematic, inheriting cinema's particular ways of organizing perception, attention and memory. Yet it also demonstrates the cognitive distance between cinema and the computer age.
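This synthesis -- windows co-existing on screen, while the user attends to only one at a time -- can be sketched as the simplest possible window stack. The names below are invented for illustration and are not taken from any real window system.

```python
class WindowStack:
    """Overlapping windows as a stack: only the topmost is 'in focus'.

    All windows co-exist on screen (montage within the shot), but the
    user raises one to the top at a time (temporal montage), becoming
    the editor who cuts between shots.
    """

    def __init__(self):
        self.stack = []          # bottom ... top, like papers on a desk

    def open(self, title):
        # A newly opened window lands on top of the pile.
        self.stack.append(title)

    def focus(self, title):
        # Raising a window to the top is the user's own 'cut'.
        self.stack.remove(title)
        self.stack.append(title)

    def active(self):
        # The topmost window is the only one the user deals with.
        return self.stack[-1] if self.stack else None
```

The opacity of the windows is what the stack order encodes: everything below the top remains present but covered, available for the next cut.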
For viewers of the 1920's, the temporal replacement of one image by another, as well as the superimposition of two images, was an aesthetic and perceptual shock, a truly modern and unfamiliar experience -- as testified, for instance, by Walter Benjamin's description of cinema in his Artwork essay. Film directors were able to use montage to create meaning because the cut from one image to another was a meaningful, even traumatic (if we are to believe Benjamin) event. At the end of the century, however, anaesthetized first by cinema and then by television channel flipping, we feel at home with a number of overlapping windows on a computer screen. We switch back and forth between different applications, processes, tasks. Not only are we no longer shocked, but in fact we feel angry when a computer occasionally crashes because we opened too many windows at once.

Cinema, the major cultural form of the twentieth century, has found a new life as the toolbox of the computer user. What was an individual artistic vision -- of Griffith, Eisenstein, Gance, Vertov -- has become a way of work and a way of life for millions in the computer age. Cinema's aesthetic strategies have become basic organizational principles of computer software. The window onto a fictional world of a cinematic narrative has become a window onto a datascape. In short, what was cinema has become human-computer interface.

I will conclude this section by discussing a few artistic projects which, in different ways, offer alternatives to this trajectory. To summarize it once again: the trajectory involves the gradual translation of elements and techniques of cinematic perception and language into a decontextualized set of tools to be used as an interface to any data. In the process of this translation, cinematic perception is divorced from its original material embodiment (camera, film stock), as well as from the historical contexts of its formation.
If in cinema the camera functioned as a material object, co-existing, spatially and temporally, with the world it was showing us, it has now become a set of abstract operations. The art projects described below refuse this separation of cinematic vision from the material world. They reunite perception and material reality by making the camera and what it records part of a virtual world's ontology. They also refuse the universalization of cinematic vision by computer culture, which (just as post-modern visual culture in general) treats cinema as a toolbox, a set of "filters" which can be used to process any input. In contrast, each of these projects employs a unique cinematic strategy which has a specific relation to the particular virtual world it reveals to the user.

In The Invisible Shape of Things Past, Joachim Sauter and Dirk Lüsenbrink of the Berlin-based Art+Com collective created a truly innovative cultural interface for accessing historical data about Berlin. The interface de-virtualizes cinema, so to speak, by placing the records of cinematic vision back into their historical and material context. As the user navigates through a 3-D model of Berlin, he or she comes across elongated shapes lying on city streets. These shapes, which the authors call "filmobjects," correspond to documentary footage recorded at the corresponding points in the city. To create each shape, the original footage is digitized and the frames are stacked one after another in depth, with the original camera parameters determining the exact shape. The user can view the footage by clicking on the first frame. As the frames are displayed one after another, the shape becomes correspondingly thinner. In keeping with the already noted general trend of computer culture towards the spatialization of every cultural experience, this cultural interface spatializes time, representing it as a shape in 3-D space.
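The "filmobject" construction -- frames stacked in depth, with the stack thinning as the footage is viewed -- can be sketched as a simple data structure. The names here are invented for illustration; this is not Art+Com's actual software.

```python
class FilmObject:
    """A sketch of the 'filmobject' idea: digitized frames stacked one
    after another in depth, forming an elongated shape in 3-D space.

    Frames are placeholders (any objects); depth() stands in for the
    shape's extent along the camera's original trajectory.
    """

    def __init__(self, frames):
        self.frames = list(frames)    # stacked front to back

    def depth(self):
        # One unit of depth per stacked frame: duration becomes extent.
        return len(self.frames)

    def play_frame(self):
        # Viewing consumes the front frame, so the shape gets thinner.
        return self.frames.pop(0) if self.frames else None
```

The structure makes the passage's point literal: the footage's running time is stored as spatial depth, and playback is the gradual erosion of that depth.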
This shape can be thought of as a book, with individual frames stacked one after another like book pages. The trajectory through time and space taken by a camera becomes a book to be read, page by page. The records of the camera's vision become material objects, sharing the space with the material reality which gave rise to this vision. Cinema is solidified. This project, then, can also be understood as a virtual monument to cinema. The (virtual) shapes situated around the (virtual) city remind us of the era when cinema was the defining form of cultural expression -- as opposed to a toolbox for data retrieval and use, as it is becoming today in a computer.

Hungarian-born artist Tamás Waliczky openly refuses the default mode of vision imposed by computer software, that of one-point linear perspective. Each of his computer-animated films The Garden (1992), The Forest (1993) and The Way (1994) utilizes a particular perspectival system: a water-drop perspective in The Garden, a cylindrical perspective in The Forest and a reverse perspective in The Way. Working with computer programmers, the artist created custom-made 3-D software to implement these perspectival systems. Each of the systems has an inherent relationship to the subject of the film in which it is used. In The Garden, the subject is the perspective of a small child, for whom the world does not yet have an objective existence. In The Forest, the mental trauma of emigration is transformed into the endless roaming of a camera through a forest which is actually just a set of transparent cylinders. Finally, in The Way, the self-sufficiency and isolation of a Western subject from his/her environment are conveyed by the use of a reverse perspective. In Waliczky's films the camera and the world are made into a single whole, whereas in The Invisible Shape of Things Past the records of the camera are placed back into the world.
Rather than simply subjecting his virtual worlds to different types of perspectival projection, Waliczky modified the spatial structure of the worlds themselves. In The Garden, a child playing in a garden becomes the center of the world; as he moves around, the actual geometry of all the objects around him is transformed, with objects getting bigger as he gets closer to them. To create The Forest, a number of cylinders were placed inside each other, each cylinder mapped with a picture of a tree, repeated a number of times. In the film, we see a camera moving through this endless static forest along a complex spatial trajectory -- but this is an illusion. In reality, the camera does move, but the architecture of the world is constantly changing as well, because each cylinder is rotating at its own speed. As a result, the world and its perception are fused together.