Lev Manovich 1 Jan 2000

Principles of New Media (1)

The identity of media has changed even more dramatically. Below I summarize some of the key differences between old and new media.

In compiling this list of differences I tried to arrange them in a logical order. That is, principles 3-5 are dependent on principles 1-2. This is not dissimilar to axiomatic logic, where certain axioms are taken as starting points and further theorems are proved on their basis.

Not every new media object obeys these principles. They should be considered not as absolute laws but rather as general tendencies of a culture undergoing computerization. As computerization affects deeper and deeper layers of culture, these tendencies will manifest themselves more and more.

 

1. Numerical Representation

All new media objects, whether they are created from scratch on computers or converted from analog media sources, are composed of digital code; they are numerical representations. This has two key consequences:

1.1. A new media object can be described formally (mathematically). For instance, an image or a shape can be described using a mathematical function.

1.2. A new media object is subject to algorithmic manipulation. For instance, by applying appropriate algorithms, we can automatically remove noise from a photograph, improve its contrast, locate the edges of the shapes, or change its proportions. In short, media becomes programmable.
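The two consequences can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the tiny image size and the choice of a linear contrast stretch are hypothetical and are not drawn from any particular image editing program. The first function describes an image by a formula; the second manipulates it algorithmically.

```python
# Hypothetical illustration: an image defined by a function (1.1),
# then manipulated by an algorithm, here a linear contrast stretch (1.2).

def gradient_image(width, height):
    """Describe an image formally: brightness is a function of x."""
    # Values are deliberately squeezed into the dull range 100..150.
    return [[100 + (50 * x) // (width - 1) for x in range(width)]
            for _ in range(height)]

def stretch_contrast(image, out_min=0, out_max=255):
    """Rescale pixel values linearly so they span out_min..out_max."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # a flat image has no contrast to stretch
        return [row[:] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row]
            for row in image]

image = gradient_image(6, 2)
stretched = stretch_contrast(image)
```

Because the image is nothing but numbers, improving its contrast reduces to arithmetic on those numbers.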

When new media objects are created on computers, they originate in numerical form. But many new media objects are converted from various forms of old media. Although most readers understand the difference between analog and digital media, a few notes should be added on the terminology and the conversion process itself. This process assumes that data is originally continuous, i.e. 'the axis or dimension that is measured has no apparent indivisible unit from which it is composed.' Converting continuous data into a numerical representation is called digitization. Digitization consists of two steps: sampling and quantization. First, data is sampled, most often at regular intervals, such as the grid of pixels used to represent a digital image. Technically, a sample is defined as 'a measurement made at a particular instant in space and time, according to a specified procedure.' The frequency of sampling is referred to as resolution. Sampling turns continuous data into discrete data, that is, data occurring in distinct units: people, pages of a book, pixels. Second, each sample is quantified, i.e. assigned a numerical value drawn from a defined range (such as 0-255 in the case of an 8-bit greyscale image).2
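The two steps of digitization can be sketched in a few lines of Python. This is a minimal sketch, assuming a smooth sine curve as a stand-in for continuous data and eight samples as the resolution; both choices, like the function names, are illustrative assumptions only.

```python
import math

def continuous_signal(t):
    """A stand-in for continuous data: brightness varying smoothly in 0.0..1.0."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * t)

def sample(signal, n_samples):
    """Step 1, sampling: measure the signal at regular intervals over [0, 1)."""
    return [signal(i / n_samples) for i in range(n_samples)]

def quantize(samples, bits=8):
    """Step 2, quantization: map each sample in 0.0..1.0 to an integer in 0..2**bits - 1."""
    levels = 2 ** bits - 1
    return [round(s * levels) for s in samples]

samples = sample(continuous_signal, 8)   # discrete, but still real-valued
pixels = quantize(samples)               # discrete and numerical (0-255)
```

Raising `n_samples` increases the resolution; raising `bits` widens the range from which each sample's value is drawn.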

While some old media such as photography and sculpture are truly continuous, most involve a combination of continuous and discrete coding. One example is motion picture film: each frame is a continuous photograph, but time is broken into a number of samples (frames). Video goes one step further by sampling the frame along the vertical dimension (scan lines). Similarly, a photograph printed using a halftone process combines discrete and continuous representations. Such a photograph consists of a number of orderly dots (i.e., samples); however, the diameters and areas of the dots vary continuously.

As the last example demonstrates, while old media contains level(s) of discrete representation, the samples were never quantified. This quantification of samples is the crucial step accomplished by digitization. But why, we may ask, were modern media technologies often in part discrete? The key assumption of modern semiotics is that communication requires discrete units. Without discrete units, there is no language. As Roland Barthes has put it, "language is, as it were, that which divides reality (for instance the continuous spectrum of the colors is verbally reduced to a series of discontinuous terms)." In postulating this, semioticians took human language as a prototypical example of a communication system. A human language is discrete on most scales: we speak in sentences; a sentence is made from words; a word consists of morphemes, and so on. If we are to follow the assumption that any form of communication requires discrete representation, we may expect that media used in cultural communication will have discrete levels. At first this explanation seems to work. Indeed, a film samples the continuous time of human existence into discrete frames; a drawing samples visible reality into discrete lines; and a printed photograph samples it into discrete dots. This assumption does not universally work, however: photographs, for instance, do not have any apparent units. (Indeed, in the 1970s semiotics was criticized for its linguistic bias, and most semioticians came to recognize that a language-based model of distinct units of meaning can't be applied to many kinds of cultural communication.) More importantly, the discrete units of modern media are usually not units of meaning, the way morphemes are. Neither film frames nor halftone dots have any relation to how a film or a photograph affects the viewer (except in modern art and avant-garde film - think of paintings by Roy Lichtenstein and films of Paul Sharits - which often make the material units of media into the units of meaning).

The more likely reason why modern media has discrete levels is that it emerged during the Industrial Revolution. In the nineteenth century, a new organization of production known as the factory system gradually replaced artisan labor. It reached its classical form when Henry Ford installed the first assembly line in his factory in 1913. The assembly line relied on two principles. The first was standardization of parts, already employed in the production of military uniforms in the nineteenth century. The second, newer principle was the separation of the production process into a set of repetitive, sequential, and simple activities that could be executed by workers who did not have to master the entire process and could be easily replaced.

Not surprisingly, modern media follows the factory logic, not only in terms of division of labor as witnessed in Hollywood film studios, animation studios or television production, but also on the level of its material organization. The invention of typesetting machines in the 1880s industrialized publishing while leading to standardization of both type design and the number and types of fonts used. In the 1890s cinema combined automatically produced images (via photography) with a mechanical projector. This required standardization of both image dimensions (size, frame ratio, contrast) and of the sampling rate of time (see Digital Cinema section for more detail). Even earlier, in the 1880s, the first television systems already involved standardization of sampling both in time and in space. These modern media systems also followed the factory logic in that once a new model (a film, a photograph, an audio recording) was introduced, numerous identical media copies would be produced from this master. As I will show below, new media follows, or actually runs ahead of, a quite different logic of post-industrial society - that of individual customization, rather than that of mass standardization.

2. Modularity

This principle can be called 'fractal structure of new media.' Just as a fractal has the same structure on different scales, a new media object has the same modular structure throughout. Media elements, be they images, sounds, shapes, or behaviors, are represented as collections of discrete samples (pixels, polygons, voxels, characters, scripts). These elements are assembled into larger-scale objects but they continue to maintain their separate identity. The objects themselves can be combined into even larger objects -- again, without losing their independence. For example, a multimedia movie authored in the popular Macromedia Director software may consist of hundreds of still images, QuickTime movies, and sounds which are all stored separately and are loaded at run time. Because all elements are stored independently, they can be modified at any time without having to change the Director movie itself. These movies can be assembled into a larger movie, and so on. Another example of modularity is the concept of object used in Microsoft Office applications. When an object is inserted into a document (for instance, a media clip inserted into a Word document), it continues to maintain its independence and can always be edited with the program used originally to create it. Yet another example of modularity is the structure of an HTML document: with the exception of text, it consists of a number of separate objects - GIF and JPEG images, media clips, VRML scenes, Shockwave and Flash movies -- which are all stored independently locally and/or on a network. In short, a new media object consists of independent parts which, in their turn, consist of smaller independent parts, and so on, down to the level of the smallest atoms such as pixels, 3D points or characters.
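The nested, independent structure described above can be sketched as a small Python data model. This is a hypothetical illustration: the class names `MediaElement` and `MediaObject` and the file names are assumptions made for the example, not the data model of Director, Office or HTML.

```python
from dataclasses import dataclass, field

@dataclass
class MediaElement:
    """A leaf element stored on its own (an image, a sound, a clip)."""
    name: str
    data: str

@dataclass
class MediaObject:
    """A larger object that refers to independent parts rather than absorbing them."""
    name: str
    parts: list = field(default_factory=list)

    def element_names(self):
        """Walk the nested structure and list every leaf element."""
        names = []
        for part in self.parts:
            if isinstance(part, MediaObject):
                names.extend(part.element_names())
            else:
                names.append(part.name)
        return names

logo = MediaElement("logo.gif", "version 1")
intro = MediaObject("intro", [logo, MediaElement("theme.wav", "audio")])
movie = MediaObject("movie", [intro, MediaElement("clip.mov", "video")])

# Editing the stored element does not require touching the objects that use it.
logo.data = "version 2"
```

The larger objects hold references to their parts, so a change to `logo` is visible from `movie` without `movie` itself being rebuilt.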

The World Wide Web as a whole is also completely modular. It consists of numerous Web pages, each in its turn consisting of separate media elements. Every element can always be accessed on its own. Normally we think of elements as belonging to their corresponding Web sites, but this is just a convention, reinforced by commercial Web browsers. The Netomat browser, which extracts elements of a particular media type from different Web pages (for instance, only images) and displays them together without identifying the Web sites they come from, highlights for us this fundamentally discrete and non-hierarchical organization of the Web (see introduction to Interface chapter for more on this browser).

In addition to using the metaphor of a fractal, we can also make an analogy between the modularity of new media and structured computer programming. Structured programming, which became standard in the 1970s, involves writing small and self-sufficient modules (called in different computer languages subroutines, functions, procedures, scripts) which are assembled into larger programs. Many new media objects are in fact computer programs which follow the structured programming style. For example, most interactive multimedia applications are programs written in Macromedia Director's Lingo. A Lingo program defines scripts which control various repeated actions, such as clicking on a button; these scripts are assembled into larger scripts. In the case of new media objects which are not computer programs, an analogy with structured programming can still be made because their parts can be accessed, modified or substituted without affecting the overall structure of an object. This analogy, however, has its limits. If a particular module of a computer program is deleted, the program will not run. In contrast, just as is the case with traditional media, deleting parts of a new media object does not render it meaningless. In fact, the modular structure of new media makes such deletion and substitution of parts particularly easy. For example, since an HTML document consists of a number of separate objects, each represented by a line of HTML code, it is very easy to delete, substitute or add new objects. Similarly, since in Photoshop the parts of a digital image are usually placed on separate layers, these parts can be deleted and substituted with a click of a button.

3. Automation

Numerical coding of media (principle 1) and the modular structure of a media object (principle 2) allow many operations involved in media creation, manipulation and access to be automated. Thus human intentionality can be removed from the creative process, at least in part.

The following are some examples of what can be called low-level automation of media creation, in which the computer user modifies or creates from scratch a media object using templates or simple algorithms. These techniques are robust enough that they are included in most commercial software for image editing, 3D graphics, word processing, graphic layout, and so on. Image editing programs such as Photoshop can automatically correct scanned images, improving contrast range and removing noise. They also come with filters which can automatically modify an image, from creating simple variations of color to changing the whole image as though it was painted by Van Gogh, Seurat or another brand-name artist. Other computer programs can automatically generate 3D objects such as trees, landscapes, human figures and detailed ready-to-use animations of complex natural phenomena such as fire and waterfalls. In Hollywood films, flocks of birds, ant colonies and crowds of people are automatically created by AL (artificial life) software. Word processing, page layout, presentation and Web creation programs come with agents which can automatically create the layout of a document. Writing software helps the user to create literary narratives using highly formalized genre conventions. Finally, in what may be the most familiar experience of automation of media generation to most computer users, many Web sites automatically generate Web pages on the fly when the user reaches the site. They assemble the information from databases and format it using generic templates and scripts.
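The last case, a Web page assembled on the fly from a database and generic templates, can be sketched with Python's standard `string.Template` class. The record list standing in for a database, the template text and the `render_page` function are all hypothetical assumptions made for illustration; real sites use far more elaborate systems.

```python
from string import Template

# Hypothetical stand-in for a site's database.
DATABASE = [
    {"title": "Numerical Representation", "summary": "Media as digital code."},
    {"title": "Modularity", "summary": "Objects built from independent parts."},
]

# Generic templates: the same markup serves any set of records.
PAGE = Template("<html><body><h1>$heading</h1>\n$items\n</body></html>")
ITEM = Template("<p><b>$title</b>: $summary</p>")

def render_page(heading, records):
    """Assemble a page on request by pouring records into generic templates."""
    items = "\n".join(ITEM.substitute(record) for record in records)
    return PAGE.substitute(heading=heading, items=items)

html = render_page("Principles", DATABASE)
```

No page exists until `render_page` is called; changing a record in the database changes every page generated afterwards.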

Researchers are also working on what can be called high-level automation of media creation, which requires a computer to understand, to a certain degree, the meanings embedded in the objects being generated, i.e. their semantics. This research can be seen as a part of the larger initiative of artificial intelligence (AI). As is well known, the AI project has achieved only very limited success since its beginnings in the 1950s. Correspondingly, work on media generation which requires understanding of semantics is also in the research stage and is rarely included in commercial software. Beginning in the 1970s, computers were often used to generate poetry and fiction. In the 1990s, the users of Internet chat rooms became familiar with bots - computer programs which simulate human conversation. Researchers at New York University showed a virtual theater composed of a few virtual actors which adjust their behavior in real-time in response to the user's actions. The MIT Media Lab developed a number of different projects devoted to high-level automation of media creation and use: a smart camera which can automatically follow the action and frame the shots given a script; ALIVE, a virtual environment where the user interacted with animated characters; a new kind of human-computer interface where the computer presents itself to a user as an animated talking character. The character, generated by a computer in real-time, communicates with the user using natural language; it also tries to guess the user's emotional state and to adjust the style of interaction accordingly.

The area of new media where the average computer user encountered AI in the 1990s was not, however, human-computer interface, but computer games. Almost every commercial game includes a component called an AI engine. It stands for the part of the game's computer code which controls its characters: car drivers in a car race simulation, the enemy forces in a strategy game such as Command and Conquer, the single enemies which keep attacking the user in first-person shooters such as Quake. AI engines use a variety of approaches to simulate human intelligence, from rule-based systems to neural networks. Like AI expert systems, these characters have expertise in some well-defined but narrow area such as attacking the user. But because computer games are highly codified and rule-based, these characters function very effectively. That is, they effectively respond to the few things the user is allowed to ask them to do: run forward, shoot, pick up an object. They can't do anything else, but then the game does not provide the opportunity for the user to test this. For instance, in a martial arts fighting game, I can't ask questions of my opponent, nor do I expect him or her to start a conversation with me. All I can do is 'attack' my opponent by pressing a few buttons; and within this highly codified situation the computer can 'fight' me back very effectively. In short, computer characters can display intelligence and skills only because the programs put severe limits on our possible interactions with them. Put differently, computers can pretend to be intelligent only by tricking us into using a very small part of who we are when we communicate with them. So, to use another example, at the 1997 SIGGRAPH convention I was playing against both human and computer-controlled characters in a VR simulation of a non-existent sports game. All my opponents appeared as simple blobs covering a few pixels of my VR display; at this resolution, it made absolutely no difference who was human and who was not.
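The point about severely limited interaction can be made concrete with a toy rule-based opponent in Python. This is a deliberately simplified sketch, not a description of how any commercial AI engine works; the rule table and the `opponent_response` function are hypothetical.

```python
# Hypothetical rule table for an opponent in a fighting game:
# each action the interface allows is paired with a fixed counter-move.
RULES = {
    "punch": "block",
    "kick": "jump",
    "block": "throw",
}

def opponent_response(player_action):
    """Answer a recognized action by rule; anything else falls outside the game."""
    if player_action not in RULES:
        raise ValueError(f"the game offers no way to '{player_action}'")
    return RULES[player_action]
```

Within the three permitted actions the opponent always has an answer. It appears competent only because the interface never lets the player do anything the table does not cover.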

Along with low-level and high-level automation of media creation, another area of media use which is being subjected to increasing automation is media access. The switch to computers as a means to store and access enormous amounts of media material, exemplified by the media assets stored in the databases of stock agencies and global entertainment conglomerates, as well as by the public media assets distributed across numerous Web sites, created the need to find more efficient ways to classify and search media objects. Word processors and other text management software have long provided the ability to search for specific strings of text and automatically index documents. The UNIX operating system has also always included powerful commands to search and filter text files. In the 1990s software designers started to provide media users with similar abilities. Virage introduced the Virage VIR Image Engine, which allows searching for visually similar image content among millions of images, as well as a set of video search tools for indexing and searching video files. By the end of the 1990s, the key Web search engines already included options to search the Internet by specific media such as images, video and audio.
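One simple way to search for visually similar images is to compare brightness histograms. The Python sketch below assumes that approach purely for illustration; it is not the method used by the Virage engine or by any search engine mentioned above, and the function names and the tiny four-pixel 'images' are hypothetical.

```python
def histogram(pixels, bins=4, max_value=255):
    """Count how many pixels fall into each brightness band, as fractions."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // (max_value + 1), bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def similarity(a, b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(histogram(a), histogram(b)))

def search(query, collection):
    """Rank image names in the collection from most to least similar to the query."""
    return sorted(collection,
                  key=lambda name: similarity(query, collection[name]),
                  reverse=True)

# Hypothetical collection of greyscale images, each reduced to four pixel values.
collection = {
    "night": [10, 20, 30, 40],
    "noon": [220, 230, 240, 250],
    "dusk": [20, 40, 100, 120],
}
ranking = search([15, 25, 100, 110], collection)
```

The search is possible only because the images are numerical representations: similarity becomes a quantity that can be computed and sorted.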

The Internet, which can be thought of as one huge distributed media database, also crystallized the basic condition of the new information society: over-abundance of information of all kinds. One response was the popular idea of software agents designed to automate searching for relevant information. Some agents act as filters which deliver small amounts of information given the user's criteria. Others allow users to tap into the expertise of other users, following their selections and choices. For example, the MIT Software Agents Group developed such agents as BUZZwatch, which 'distills and tracks trends, themes, and topics within collections of texts across time' such as Internet discussions and Web pages; Letizia, 'a user interface agent that assists a user browsing the World Wide Web by… scouting ahead from the user's current position to find Web pages of possible interest'; and Footprints, which 'uses information left by other people to help you find your way around.'

By the end of the twentieth century, the problem was no longer how to create a new media object such as an image; the new problem was how to find an object which already exists somewhere. That is, if you want a particular image, chances are it already exists -- but it may be easier to create one from scratch than to find the existing one. Beginning in the nineteenth century, modern society developed technologies which automated media creation: a photo camera, a film camera, a tape recorder, a video recorder, etc. These technologies allowed us, over the course of one hundred and fifty years, to accumulate an unprecedented amount of media materials: photo archives, film libraries, audio archives… This led to the next stage in media evolution: the need for new technologies to store, organize and efficiently access these media materials. These new technologies are all computer-based: media databases; hypermedia and other ways of organizing media material such as the hierarchical file system itself; text management software; programs for content-based search and retrieval. Thus automation of media access is the next logical stage of the process which was already put into motion when the first photograph was taken. The emergence of new media coincides with this second stage of a media society, now concerned as much with accessing and re-using existing media as with creating new ones. (See Database section for more on databases.)

For the continuation of this text, see: Principles of New Media (2)