CSA3020

Lecture 3 - Overview of Multimedia Systems

Introduction

So, what does it mean for a system to be a multimedia system?

The word multimedia in a computer environment implies that many media are under computer control. In its loosest possible sense, a multimedia computer should support more than one of the following media types: text, images, video, audio and animation. However, that means that a computer which manipulates only text and images would qualify as a multimedia computer.
Happily, there is a stronger definition of a multimedia computer: a computer which controls at least one discrete and at least one continuous media type.
Text and images are examples of discrete media (i.e., they are time-independent), whereas video and audio are time-dependent, and consequently, continuous.

The processing of time-independent media should happen as fast as possible, but this processing is not time critical because the validity of the data does not depend on any time condition. However, in the case of time-dependent media, their values change over time - and, in fact, processing values in the wrong sequence can invalidate (part of) the data.

Characteristics of a stand-alone multimedia system

A stand-alone multimedia system may have I/O devices for the capture of both discrete and continuous media types, and must at least have devices for their playback.
Raw audio and video bit-rates often exceed the throughput that the system can sustain. Additionally, audio and video in their raw form are very large. For example, PAL standard video has 25 frames per second. A single full-size frame (at a resolution of 640 x 480, and a 16-bit colour depth) would require 4,915,200 bits, or 614,400 bytes, to represent it. Full-motion video of that resolution and colour depth would require 15,360,000 bytes, or 14.65 Mbytes, for a single second of video. Even a 30 second video clip would require more than 439 Mbytes of storage. For these reasons, and to facilitate transmission of continuous media over a network, video and audio are usually compressed. Real-time decompression of the video and audio streams is possible in software, although currently hardware support is required for real-time compression (of video).
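The storage figures above can be reproduced with a short back-of-the-envelope calculation (a sketch using the resolution, colour depth and frame rate quoted in the text):

```python
# Uncompressed PAL-style video storage requirements.
WIDTH, HEIGHT = 640, 480   # pixels per frame
COLOUR_DEPTH = 16          # bits per pixel
FPS = 25                   # PAL frame rate

bits_per_frame = WIDTH * HEIGHT * COLOUR_DEPTH
bytes_per_frame = bits_per_frame // 8
bytes_per_second = bytes_per_frame * FPS
mbytes_per_second = bytes_per_second / (1024 * 1024)
mbytes_30s = bytes_per_second * 30 / (1024 * 1024)

print(bits_per_frame)                # 4915200 bits per frame
print(bytes_per_frame)               # 614400 bytes per frame
print(round(mbytes_per_second, 2))   # 14.65 Mbytes per second
print(round(mbytes_30s, 2))          # 439.45 Mbytes for a 30 second clip
```

Note that the megabyte figures use 1024 x 1024 bytes per Mbyte, which is how the 14.65 Mbytes/second figure in the text is obtained.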
Storage of compressed video and audio still consumes vast amounts of magnetic or optical storage media. CD-DA (Compact Disk - Digital Audio) technology is also used to store video. Originally, the maximum data transfer rate was 150 Kbytes/second (compare this with the requirement of a data transfer rate of 14.65 Mbytes/second for uncompressed full-screen full-motion video!). Even compressed full-motion video optimised for playback in a smaller (typically 160 x 120) window required significantly high data transfer rates. Compromises to ensure that the video playback remained watchable included dropping the frame rate (the human eye is deceived into perceiving motion at a minimum of 16 frames per second) and the colour depth of the stored video. If an audio stream was also present it was usually presented in 8-bit mono to reduce the overhead on the system bus.
Modern CD devices operate at around 50x the original - capable of producing data transfer rates in the region of 7.5 Mbytes/second - (DVD manages about 10 Mbytes/s; a modern AV hard disk is capable of approximately 59.7 Mbytes per second). An advantage that CDs have over HDs is that data is written to contiguous sectors, meaning that there is little or no head movement (and hence, latency), giving almost constant retrieval times. HDs are re-usable: as blocks are allocated and released, there may be enough space overall to store high capacity data, but there is no guarantee that the data will be stored in contiguous blocks. To overcome this, modern file systems allocate space in larger blocks (typically 64k/128k blocks), and make more of an effort to store data in cylinders, also reducing seek time, in an effort to maintain constant transfer rates, while still catering for the more bursty rates required by discrete media.
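The CD transfer rates above all scale from the original 150 Kbytes/second; a minimal sketch of that scaling (using 1000 Kbytes per Mbyte, as the 7.5 Mbytes/second figure in the text does) also shows why even a 50x drive cannot sustain uncompressed full-motion video:

```python
BASE_RATE_KB = 150.0   # original (1x) CD transfer rate, Kbytes/second

def cd_transfer_mb_per_s(speed_multiplier):
    # Scale the 1x rate; divide by 1000 to match the text's Mbyte figures.
    return speed_multiplier * BASE_RATE_KB / 1000.0

UNCOMPRESSED_VIDEO_MB_S = 14.65   # uncompressed full-motion video, from above

rate_50x = cd_transfer_mb_per_s(50)
print(rate_50x)                              # 7.5 Mbytes/second
print(rate_50x >= UNCOMPRESSED_VIDEO_MB_S)   # False: still too slow for raw video
```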
A stand-alone multimedia system, then, has a large HD capacity, a CD-ROM/DVD-ROM player, and a sound-card. There are also specifications governing the computer monitor, but these are outside the scope of this lecture. Additionally, a multimedia computer will have speakers and optionally a microphone. For video playback no hardware is required, but for video capture, a video capture card (which has at least a resident Digital Signal Processor, and optionally a video codec (real-time compressor/decompressor)) is necessary. External ports enable the coupling of a scanner, digital camera, etc., to capture high volume multimedia data.

Characteristics of a networked multimedia system

A networked multimedia system typically shares the characteristics of a stand-alone multimedia system with the additional requirement that it is connected to a (reasonably high-speed) network.
The main implications are that a networked multimedia system is likely to i) be a shared resource, possibly serving multimedia data to other networked computers; ii) run distributed multimedia applications (e.g., CSCW, video conferencing, etc.); and iii) at the very least, be used to remotely access multimedia data.
The network connection is a bottleneck. Additionally, if the networked computer is a server, or is running distributed multimedia applications, then there is also the overhead of preparing data for distribution over the network, possibly in conjunction with capturing the multimedia data in the first place. For example, a Web server which has video files available for download may be simultaneously accessed by several clients. A computer used for video conferencing must capture audio and video, compress it, and prepare it for transmission over the network. The slower the network connection, or the more traffic there is on the network, the less likely it is that data, even if captured in real-time, can be transmitted at the corresponding rate.
There are a number of significant disadvantages with the more common existing network standards, which are really a disadvantage of the network protocols. Take TCP/IP, the underlying protocol suite of the Internet. A data file to be transmitted is divided into packets of equal size (typically 1K), and numbered sequentially. The underlying motivation of the protocol is to find the fastest route from the source to the destination. Each packet may take a different route to arrive at the destination. This implies that the packets may (and usually do) arrive in a different sequence from the one in which they were sent. With text and graphics files, or any file which is to be downloaded prior to being viewed, this is not a problem. The destination computer simply waits for all packets to be received prior to reassembling them. If packets do not arrive in time, then the destination computer re-requests the missing packets. So far, so good.
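The download case above is easy precisely because sequence numbers let the receiver restore the original order once everything has arrived. A minimal sketch (illustrative only; real TCP reassembly happens inside the protocol stack):

```python
import random

def reassemble(packets):
    # packets: list of (sequence_number, payload) pairs, in arrival order.
    # For a file download, arrival order is irrelevant: sort by sequence
    # number once all packets are in, then concatenate the payloads.
    return b"".join(payload for _, payload in sorted(packets))

data = b"MULTIMEDIA SYSTEMS"
sent = [(i, data[i:i + 4]) for i in range(0, len(data), 4)]

received = sent[:]
random.shuffle(received)          # packets arrive in arbitrary order

print(reassemble(received) == data)   # True: order restored at destination
```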
Video and audio files are typically large (often inconvenient to download in their entirety first), and the Internet is also used for live broadcasts (which, by implication, cannot be stored and played back later!). Video and audio streaming allows continuous multimedia data to be played as it is received. However, what happens to packets that are received out of sequence? Sometimes, if the delay is very small, they can be buffered, but typically, as it does not make sense to play video and audio data out of sequence, the "bad" packets are discarded, resulting in jitter (when the system either waits for missing packets or skips bad packets). The situation can be improved somewhat if the client (destination) informs the server (source) that many packets are being "lost". Round-Trip Delay (RTD) information can also be relayed to the server so that the server sends less data to the client by, for example, dropping the video frame or audio sampling rates. This can be an on-going negotiation during the session. On the Internet this negotiation is a feature of the applications; with ATM and B-ISDN, it is an essential part of the network itself.
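The discard decision can be sketched as a simple playout loop: each packet is due at a fixed interval after the previous one, and a packet that arrives after its scheduled playback time is too late to be of use and is dropped (all names and timings below are hypothetical, for illustration):

```python
def stream_playout(arrivals, interval):
    # arrivals: dict mapping sequence number -> arrival time (seconds).
    # Packet seq is scheduled for playback at time seq * interval.
    played, dropped = [], []
    for seq in sorted(arrivals):
        due = seq * interval
        if arrivals[seq] <= due:
            played.append(seq)
        else:
            dropped.append(seq)   # arrived after its playback slot: discard
    return played, dropped

# Packet 2 is delayed in the network and misses its playback slot.
arrivals = {0: 0.00, 1: 0.03, 2: 0.11, 3: 0.12}
played, dropped = stream_playout(arrivals, interval=0.04)
print(played, dropped)   # [0, 1, 3] [2] -- the gap at packet 2 is the jitter
```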

Traditional Data Stream Characteristics

Asynchronous Transmission Mode
There is no time restriction on the transmission of packets. Packets reach the receiver (client) as fast as possible. Transmission of discrete data typically requires only this transmission mode.

Synchronous Transmission Mode
A maximum end-to-end delay is specified for the transmission of packets, and this maximal delay is never violated, although each packet may be received at any arbitrary earlier time. This is essential for multimedia applications which require that no packets are dropped due to network and server overheads. All packets sent will be received in a timely fashion. Note, however, that the sequence in which packets are received is not guaranteed. The client will still need to buffer packets which arrive out of sequence. There may also be a slight jitter in the playback of the stream as the playback system waits for data which has not yet arrived, but which is still within the specified bounds.

Isochronous Transmission Mode
As well as a maximum end-to-end delay, a minimum end-to-end delay is also specified and guaranteed. This reduces (but does not completely eliminate) the need for temporary storage in the client to buffer out-of-sequence packets, as well as jitter. However, the implication is that nodes on the network are responsible for storing packets which have already been sent by the server, but which are not yet ready to be received by the client.
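The three transmission modes above differ only in which end-to-end delay bounds they impose, which can be summarised in one small check (a sketch; the function name and timings are illustrative, not part of any protocol API):

```python
def delays_within_bounds(delays, max_delay=None, min_delay=None):
    # delays: per-packet end-to-end delays (seconds).
    # Asynchronous mode imposes no bounds; synchronous mode imposes only a
    # maximum delay; isochronous mode imposes both a maximum and a minimum.
    return all(
        (max_delay is None or d <= max_delay) and
        (min_delay is None or d >= min_delay)
        for d in delays
    )

delays = [0.02, 0.05, 0.08]
print(delays_within_bounds(delays))                                # asynchronous: True
print(delays_within_bounds(delays, max_delay=0.1))                 # synchronous: True
print(delays_within_bounds(delays, max_delay=0.1, min_delay=0.03)) # isochronous: False
```

In the last call the 0.02 s packet arrives earlier than the guaranteed minimum delay, which is exactly the case an isochronous network handles by holding the packet at an intermediate node until the client is ready for it.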


In case of any difficulties or for further information e-mail cstaff@cs.um.edu.mt

Date last amended: 2nd September 2002