(comments by SXW)
Charles described some experiences from his residency at the Apple Media Lab in the early '90s. Charles and colleagues (C. H-Woolsey, S. Gano, K. _ et al.) experimented with digital video in many forms and situations.
Treating video as a "language" is highly problematic; there's no double articulation, for example. Even in the simplest iconic systems, it proved impossible to disambiguate a video sign without annotation (outside the video band). [Contrary claim by Marc Davis, who invented MediaStreams, an iconic video annotation system at the MIT Media Lab, and is now working on extensions of such work at Interval Research. - SXW]
But some interesting issues were raised by the video form, aspects of which were peculiar to the technology of the times. (e.g., small size, slow delivery rate, unrealistic sound...)
Digital video brought into high relief the implied viewer, whereas film -- cinema that filled the perceptual field -- traditionally maintained the artifice of immersion.
Digital video was always seen as an object, rather than a full world. [Xerox, though, unlike Apple, had people who disdained this view of video, and tried to design video as portals into (other) perceptual worlds.]
Barthes said "film is then, video is now." We played with designs that enhanced or weakened this sense of video not as memory but as vision.
Video is "flat" [an example of "opaque" media in the general case 1/96 - SXW] -- the digital representation encodes no semantics other than the raw audio-visual signal. There was no other layer of information like speaker, point of view, even place or time.
We came up with a distinction between REPRESENTATION and DEPICTION. [Recall raw content / structure / rendering trichotomy mentioned in Jan. - SXW]
In one experiment, we had students exchange video messages via "baseball cards" -- small pieces of paper printed with barcodes that could be read back into a Mac to retrieve video segments from a videodisk. Students would dive into a video-base (videodisk in those days), and mark segments they found useful for narratives they were constructing for peers in informal or formal situations. Students even wrote up a newspaper/bulletin board in which video segments were sequenced via the indirection of barcodes. A kind of economy emerged from the exchange of the physical proxies for the video segments. Students traded "video." Some scripts [My term, in two senses. - SXW] were worth more than others. A kind of authority was invested -- the first student to discover/mark a segment was informally associated with that video. (e.g., "let's put in so-and-so's video...") One interesting finding: no one seemed concerned at all with the ethics of decontextualizing video citations. If you wanted to make a narrative point, you could cut out any fragment of video that could be interpreted to fit. Even the teacher didn't seem to be much concerned about this issue.
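The barcode indirection above can be sketched as a simple lookup table: each card is a physical proxy that resolves to a frame-range address on the videodisk, and a sequence of scanned cards becomes a playlist. All identifiers, field names, and frame numbers below are invented for illustration; this is not the original system.

```python
# Barcode ID -> segment address on the videodisk (hypothetical data).
SEGMENTS = {
    "BC-0042": {"disc": 1, "start_frame": 1200, "end_frame": 1890,
                "marked_by": "student_A"},
    "BC-0097": {"disc": 1, "start_frame": 5400, "end_frame": 6010,
                "marked_by": "student_B"},
}

def play_sequence(barcodes):
    """Resolve scanned barcodes into a playlist of video segments,
    in the order they appear on the card or bulletin board."""
    playlist = []
    for code in barcodes:
        seg = SEGMENTS.get(code)
        if seg is None:
            continue  # unreadable or unknown card
        playlist.append((seg["disc"], seg["start_frame"], seg["end_frame"]))
    return playlist

# A "newspaper" column sequencing two traded segments:
print(play_sequence(["BC-0042", "BC-0097"]))
```

The point of the indirection is that the paper card carries no video at all, only an address; the economy of trading and attribution ("so-and-so's video") lives entirely in the proxies.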
Passed out paper descriptions of projects: Ross Report, Ross Bulletin Board, Clipster, Network Wrapper, ... [SEE handouts (to be scanned)]
There was an interesting one by Steve Gano and Christine ____, where videos were embedded in a (drawn) Victorian house's windows. Clicking on a scene seen through a window could either start a tableau in situ, or pull up an enlarged video clip.
Showed two videos: (1) Heart project (5th grade kids documenting selves doing heart experiment); (2) Exploratorium meteorology project (5th graders post messages in video "pads" that come back with answers from professional scientists).
Re (1): The video itself was terrible; the interesting process here was the students' negotiation of the sound bites recited to annotate the videoed activities. (Who would say what.) Charles said that it was interesting how it was possible for the video editor to construct familiarities with these projected video selves of the kids with no physical acquaintance.
Re (2): Charles mentioned that the most successful exchanges were those where the answers came back with suggested experiments. [So much for learning IN disembodied media. - SXW] Scientists' responses were typically in jargon and of little use to the students. Turnaround time of 2-3 days worked fine for these kids.
Bead catalog. Point: the catalog of proxies earned just as much money for the company as sales of the actual beads.
Continuity becomes a problem (by its absence?). Video blows apart film's traditional relationship with the editing process. A lot of film editing achieves continuity (and other effects) by the construction of motion within the frame. Computer scientists had a very literal notion of video as a linear representation of 3+1 space. [Though this may be slowly changing - SXW] Digital video engineers (and editors) have worked in ignorance of the entire grammar of film editing.
One example is the use of a cut to close-up for temporal ellipsis. [Here, see Daniel Arijon's _Grammar of the Film Language_ to get a flavor of the complexity and richness of cinematic grammar. - SXW] Digital video today resembles the naivete of early film. For example, if you look at Edwin S. Porter's _Life of an American Fireman_, you'll see that it's done entirely in long shots. He used no close-ups for fear of losing the audience.
Larry: Maybe one difference here is the issue of editorial control. In digital video, we lose one layer of authorial control. Another problem is that video is rigid -- the video segments come with their own already frozen points of view, etc.
That's why I find Project Oz's protocols for narration very interesting: they don't seem as clunky or chunky as the modular video editing I've seen.
XW: Well, this is the fundamental chunk problem of all (digital) media. It is also not limited to video. For example, we had the same problem in the Paris Theater -- gluing together scanned images of theater interiors was a problem because each piece of art came with its own perspectival focal point(s), which could not be registered well.
But this need not be the only way to have digital video. For example, people now conceive of video sprites -- characters that are clothed in video; object-oriented video; video with embedded synthetic artifacts; dynamically editable video (synthetic POV, etc.). This goes back to the problem that today, digital video is "opaque" or "flat" and its representation is not rich enough to encode enough information to afford a grip to richer editing.
L: A video segment cannot be "neutral" like a letter.
C: Another issue that came up was that video is a broadcast medium, rather than an archival medium. Home videos dated extremely fast, even for the video subjects themselves. We asked, can video be used to make/show commitments? Can it be used to share experience? What does it take to make a video _message_, rather than a "home video"?
We also tried various hardware augmentations: e.g., encoding the video with extra camera data like brightness, color, focus (range to focal point in scene), gamma, time, GPS. [The Global Positioning System locates anyone with a device to within a couple of meters, or less. - SXW]
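Such an augmentation can be sketched as a per-frame metadata "sidecar" track recorded alongside the raw signal, so the video is no longer semantically "flat". The field names, units, and values here are assumptions for illustration, not a real file format:

```python
from dataclasses import dataclass

@dataclass
class FrameMetadata:
    timestamp: float      # seconds since start of recording
    brightness: float     # mean luminance, 0.0-1.0 (assumed scale)
    focus_range_m: float  # distance to focal point in the scene, meters
    gamma: float
    gps: tuple            # (latitude, longitude)

# A hypothetical two-sample metadata track:
track = [
    FrameMetadata(0.0, 0.62, 3.5, 2.2, (37.33, -122.03)),
    FrameMetadata(1.0, 0.18, 3.5, 2.2, (37.33, -122.03)),
]

# With such a track, segments can be retrieved by conditions on the
# metadata rather than by scrubbing the raw audio-visual signal:
dark_frames = [f.timestamp for f in track if f.brightness < 0.3]
print(dark_frames)
```

The design point is that each field gives editing software a "grip" on the otherwise opaque signal: queries over metadata substitute for human scanning.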
L: We can self-mark, can't we? I.e., we can annotate the video ourselves?
X: Scott Minneman and colleagues at Xerox, in a different approach, have been studying the use of video as memory intensely. There has been a sequence of very good papers from that group at PARC. One project was "Where We Were," in which engineering design meetings were videotaped along with multiple parallel streams of data, tracking for example what was on an electronic and analog whiteboard, speaker turns, etc. Participants had instant replay of the entire video-recorded "space", retrievable by gestures like "when J. started to draw something on the board." People started to invent semiotic neologisms, like making a mark on the board just for the video, that carried not very much semantic content by itself. The mark was just a way to set a retrieval point. This way, there was no need to copy out a lot of information, knowing that the video memory would be there.
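Retrieval by marks in parallel event streams can be sketched as follows: each stream logs timestamped events, and a query like "when J. started to draw on the board" resolves to a playback timestamp. The stream contents, actor names, and event labels are invented for illustration and are not from the PARC system:

```python
# Hypothetical parallel-stream log: (timestamp_sec, actor, event).
events = [
    (12.5, "J", "board_draw_start"),
    (48.0, "K", "speaker_turn"),
    (73.2, "J", "board_draw_start"),
]

def seek_to(actor, event, occurrence=1):
    """Return the timestamp of the nth matching mark, or None.
    A deliberate low-content mark (a scribble made 'just for the
    video') works the same way as any other logged event."""
    hits = [t for (t, a, e) in events if a == actor and e == event]
    return hits[occurrence - 1] if occurrence <= len(hits) else None

print(seek_to("J", "board_draw_start"))  # first time J drew on the board
```

The key property is that the mark itself can be nearly meaningless; it only needs to be findable, because the full video "space" is replayed from that point.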
C: Cameramen used tricks like those, too: put a hand over the lens to introduce black that could be searched for later. Or shoot one's own face to mark a segment as important for later retrieval during high-speed scan by a human.
X: Is it possible to mark a video during live recording in such a way to allow rich interactions to be constructed afterward?
L: How can a performer or a director introduce "marks" into a performance's video record?
X: It seems like we've seen several forms emerge out of these experiments, and even if the various framing constraints (like perceptual field coverage, call-response time) change, it would be interesting to see how these _functions_ can evolve:
kiosk: stand-alone, public location around which people collect in physical space.
calling card: exchangeable representations of self
CU-SeeMe: communication channels
letters: narratives
One lesson is of course that the sign is no less slippery in video than in text.
Another issue is the indexical level of a video. Why/how is it that video "must be" annotated in order to make sense? How shall we annotate video (or any time-based media, like an animation script)? Even the most mundane problem: voice annotation takes up too much time. The different time-based media may not "register" properly. I recommend an article by Jeff Schnapp on paratext and the general problem of commentary and annotation in manuscript and book cultures. [Materialities of Communication]
Teaching music, we face the same problem. How does a musician deal with this? By pre-discussion, by performing and talking over it, by looking at a score -- the best way analytically. Music has an extremely rich abstraction available to it -- the musical notation and the score. Video (to an even greater extent, film) doesn't enjoy this re-performability. [This will change with the next generation of kids, if minicams become commodities as cheap as any other elementary school writing instrument.]
L: Maybe we could somehow "impoverish" the video in order to make it skimmable?
X: This thinning is analogous to making a piano reduction of a full symphonic score in order to make it easier to perform and to study a piece of music. This recalls the issues of scripting and shaping a performance. We simply do not have a "human-readable" representation of video or performance anywhere close to the precision and efficiency of musical notation and its interpretative apparatus. [Qualifications: I certainly don't claim that musical notation is all that precise for many sorts of music. And it's not at all clear how to split the descriptions between human- and machine-interpretable representations.]