CSA402

Lecture 12 - Hypermedia and the World-Wide Web: Part 1

References:
Steinmetz, R., and Nahrstedt, K. (1995). Multimedia: Computing, Communications & Applications. Prentice Hall. Chapter 15.

Introduction

Hypermedia is probably the most popular, and certainly the most widely used, distributed multimedia system. The philosophy behind hypermedia is simple: in a computer-based environment, enable users to immediately and seamlessly access documents which are presented through a common interface, regardless of the documents' location and type.
Hypertext has a long and varied history, largely in the domain of research. Popular use of hypertext exploded only when Mosaic was released as the first widely used graphical user interface to the distributed hypertext system developed at CERN. That system has now become the World-Wide Web (WWW), and Web browsers, like Microsoft Internet Explorer and Netscape Navigator/Communicator, are present on virtually all computer systems.
This lecture assumes familiarity with the Web, and a basic understanding of what hypermedia is. We will see just how the Web is constructed, by first explaining how the Internet provides services, and then taking a closer look at the HyperText Transfer Protocol (HTTP) and the HyperText Markup Language (HTML). Finally, we will compare the characteristics of the TCP/IP protocol suite, the Internet's underlying protocols, with the requirements of distributed multimedia systems.

Hypertext

In hypertext, individual documents can be referred to within another document, and support is provided for users to perform a simple action to immediately access the referred-to document. This simple statement hides a considerable number of implementation issues:

in order to support distributed hypertext, computers need to be interconnected
documents need to be uniquely identified in a distributed environment, and the method of naming files needs to be supported in a distributed fashion
hypermedia implies that documents of any type (continuous or discrete) can be referred to; however, the user interface should provide a standard method of interaction
when a document contains a reference to another document, users should be able to perform a simple action to access the document
references to other documents can be made from any document type
the distributed hypertext system needs to be an open system, so that any computer running any operating system can access or serve documents
usually, but not always, hypermedia can link to live data, although it is rare to link from live data

As we shall see, these requirements are met by a number of different layers in the communication, operating, and application systems.

The Internet

The basis for communication between computers is provided by the Internet. The Internet progressively joins local area networks (LANs), allowing a variety of otherwise incompatible computers to communicate. The basic protocol which makes this possible is the Internet Protocol (IP). IP is predominantly an addressing and data routing scheme. Some computers on a network act as routers. Routers typically know the identity of the computers on their local network, and know the address of another router to which data addressed to an unknown computer should be sent. At the level of IP, data is simply routed from a source computer to a destination computer through a series of zero or more routers (zero if the destination computer is on the same LAN as the source computer). IP processes relatively small chunks of data. A single file may be decomposed into smaller parts (datagrams) before it is transmitted over the Internet. Each router decides, at the time it receives a datagram, to which router it will forward that datagram. It is possible, therefore, that the individual datagrams comprising the original file will take different routes before they arrive at the destination. For this reason, datagrams may be received out of sequence. Additionally, data may get corrupted or even lost. This is where TCP (Transmission Control Protocol) comes in.
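The hop-by-hop forwarding just described can be sketched as a toy simulation. This is only an illustration under simplifying assumptions, not real IP: the router names, the host addresses, and the idea that each router has exactly one default next router are all made up for the example.

```python
# Toy model of hop-by-hop forwarding. Every name and address here is
# hypothetical; real IP routing tables are considerably richer.

class Router:
    def __init__(self, name, local_hosts, default_router=None):
        self.name = name
        self.local_hosts = set(local_hosts)   # computers on this router's LAN
        self.default_router = default_router  # where unknown addresses are sent


def route(gateway, destination, max_hops=16):
    """Return the names of the routers crossed by a datagram sent from a
    computer on `gateway`'s LAN to `destination`."""
    if destination in gateway.local_hosts:
        return []                      # same LAN: no router is involved
    path = [gateway.name]
    router = gateway.default_router
    while router is not None and len(path) < max_hops:
        path.append(router.name)
        if destination in router.local_hosts:
            return path                # this router can deliver locally
        router = router.default_router
    raise LookupError(f"no route to {destination}")


# Three routers chained together: lan_a -> backbone -> lan_b
lan_b = Router("lan_b", ["10.0.2.7"])
backbone = Router("backbone", [], default_router=lan_b)
lan_a = Router("lan_a", ["10.0.1.5"], default_router=backbone)

print(route(lan_a, "10.0.2.7"))   # ['lan_a', 'backbone', 'lan_b']
print(route(lan_a, "10.0.1.5"))   # []
```

Each router makes its decision using only local knowledge, which is why, in a real network with several possible next routers, two datagrams from the same file can end up following different paths.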
IP is solely responsible for taking datagrams and delivering them to their destination over any available route. TCP is responsible for preparing the data for transmission, ensuring that the communication is reliable, and re-assembling the data received at the destination. When, for example, a file is being prepared for transfer, TCP will divide the file into datagrams (strictly speaking, TCP calls its own units segments), the maximum size of which is negotiated with the destination computer. TCP adds a sequence number and error detection data (a checksum), amongst other things, to each datagram. TCP then hands each datagram to IP, which adds the source and destination computer addresses (IP addresses) and attempts to deliver it. When datagrams arrive at the destination computer, TCP attempts to re-assemble the original data, using the sequence numbers. At this point it may notice that a datagram has been corrupted during transfer. TCP at the destination will simply discard any bad datagrams. However, whenever TCP at the destination encounters a valid datagram it will send an acknowledgement back to the source computer. If the source computer does not receive an acknowledgement within some time-out period, it re-transmits the datagram.
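The following is a minimal sketch of the idea in the paragraph above: split the data, number each piece, attach a checksum, discard damaged pieces at the destination, and re-send whatever was not acknowledged. It is a simulation, not TCP itself. The function names, the loss and corruption rates, and the use of CRC-32 (real TCP uses a different, 16-bit checksum) are assumptions made for illustration.

```python
import random
import zlib


def segment(data: bytes, size: int):
    """Split data into (sequence_number, checksum, payload) tuples."""
    return [
        (seq, zlib.crc32(data[i:i + size]), data[i:i + size])
        for seq, i in enumerate(range(0, len(data), size))
    ]


def receive(arrived):
    """Destination side: keep only pieces whose checksum still matches.
    The returned sequence numbers are the ones that get acknowledged."""
    return {
        seq: payload
        for seq, checksum, payload in arrived
        if zlib.crc32(payload) == checksum
    }


def transfer(data: bytes, size: int = 8, seed: int = 0) -> bytes:
    """Simulate sending `data` over a network that reorders, corrupts,
    and loses pieces, re-sending until everything is acknowledged."""
    rng = random.Random(seed)
    outstanding = segment(data, size)
    total = len(outstanding)
    received = {}
    while len(received) < total:
        in_flight = list(outstanding)
        rng.shuffle(in_flight)                 # pieces arrive out of sequence
        arrived = []
        for seq, checksum, payload in in_flight:
            fate = rng.random()
            if fate < 0.2:
                continue                       # lost on the way
            if fate < 0.4:                     # corrupted: flip the first byte
                payload = bytes([payload[0] ^ 0xFF]) + payload[1:]
            arrived.append((seq, checksum, payload))
        received.update(receive(arrived))
        # Anything not acknowledged "times out" and is sent again.
        outstanding = [s for s in outstanding if s[0] not in received]
    # Re-assemble in sequence-number order.
    return b"".join(received[seq] for seq in sorted(received))


message = b"The quick brown fox jumps over the lazy dog"
print(transfer(message) == message)   # True
```

Note that the destination never asks for a specific piece to be re-sent. In this sketch, as in the description above, re-transmission is driven entirely by the source noticing that an acknowledgement is missing.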
Armed with a method of reliably transferring data between any two computers anywhere in the world, it becomes possible to offer a range of services. Traditionally, Internet services were based around telnet (remote login), ftp (file transfer), and electronic mail. These soon expanded to include network file systems, remote printing, and remote execution of programs. An important service built on top of TCP/IP, which abstracts away the specific binding of a computer to an IP address, is provided by the Domain Name System (DNS). In this scheme, computers are known by a domain name (e.g., saturn.cs.um.edu.mt) and a domain name server can be queried to return the actual IP address of the computer known by its domain name. Services can now be mobile (in the case of a server being relocated, either onto another computer within the same LAN, or onto another LAN entirely). Although the computer offering the service has a different IP address, it will still be known by its domain name.
For example, in the Department of Computer Science and AI, University of Malta, mail services are currently provided by the machine with the IP address 193.188.34.1. The computer is also known as zeus.cs.um.edu.mt. The mail service, however, is known as mail.cs.um.edu.mt. As far as DNS is concerned, the mail service is provided by mail.cs.um.edu.mt. It is possible to seamlessly relocate the mail service to any other computer, or even attach zeus.cs.um.edu.mt to any other LAN. In both cases, this will result in the IP address of the mail server changing. However, all that is required is to update the IP address of the mail server in the DNS records, so that all data directed to mail.cs.um.edu.mt will be sent to the correct computer providing the mail service. DNS entries must be unique, just as IP addresses must be unique. Otherwise, if there are two computers both known as mail.cs.um.edu.mt (or 193.188.34.1), then routers will be unable to determine to which computer datagrams should be directed.
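The relocation scenario above can be sketched with a toy name table. This is not how a real name server is implemented. The `NameServer` class and its methods are hypothetical, and the new address 192.0.2.25 is a placeholder from a range reserved for documentation; only the two domain names and the address 193.188.34.1 come from the example above.

```python
# A toy stand-in for a domain name server: a table from names to addresses.
# The class and method names are invented for this illustration.

class NameServer:
    def __init__(self):
        self._records = {}

    def register(self, name, address):
        """Create or update the record for `name`. Each name has exactly
        one entry, so a later registration replaces the earlier one."""
        self._records[name.lower()] = address

    def resolve(self, name):
        """Return the IP address currently bound to `name`."""
        return self._records[name.lower()]


dns = NameServer()
dns.register("zeus.cs.um.edu.mt", "193.188.34.1")
dns.register("mail.cs.um.edu.mt", "193.188.34.1")   # same machine, service name

print(dns.resolve("mail.cs.um.edu.mt"))   # 193.188.34.1

# Relocate the mail service to another computer (placeholder address).
dns.register("mail.cs.um.edu.mt", "192.0.2.25")

print(dns.resolve("mail.cs.um.edu.mt"))   # 192.0.2.25
print(dns.resolve("zeus.cs.um.edu.mt"))   # 193.188.34.1
```

Clients that always look up mail.cs.um.edu.mt before connecting never need to know that the service has moved; only the one record changes.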
Each country in the world has a Network Information Centre (NIC) associated with it. The NICs are responsible for coordinating IP address usage, to ensure that IP addresses are used efficiently (as only a finite number are available), and are the bodies through which domain names can be registered.

Internet Services and Ports

Internet services are provided by running appropriate server software on a computer connected to the Internet. For example, to run an FTP service, an FTP server is run on a computer connected to the Internet. Any other computer can then download files from this server by connecting to it using an FTP client.
It is possible to run several services from the same computer. The receiving computer then has the task of determining to which service the various datagrams are addressed, given that many simultaneous conversations with different servers may be in progress. This is achieved using ports. A server application sits on top of TCP/IP and "listens" to a communication channel called a port. Each server listens on a different port. When client software sends data to a particular server, it needs to be addressed to the appropriate port. The server will know that data is for its attention when data arrives at the port on which it is listening. Port numbers are "well known" or "assigned": for example, connecting (using telnet, say) to port 25 on any machine will, if there is a mail server listening on that port, result in a connection being established for a mail session. Each conversation is identified by four numbers: two identify the client and server computers by their IP addresses, one is the port on the server on which the conversation is taking place, and the last is the port on the client computer which initiated the conversation. It is possible for the same client to be running two FTP (or any other service) sessions with the same server concurrently. In this case, the client will use a different port number for each session so that the separate conversations do not become garbled. Each application has its own protocol for controlling the conversation. In the case of file transfer, it is FTP. Mail services use the Simple Mail Transfer Protocol (SMTP), etc.
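The four numbers that identify a conversation can be observed directly with Python's standard socket module. The sketch below assumes a machine where connections to the loopback address 127.0.0.1 are permitted. The tiny echo server and the function names are made up for the example, and the server port is chosen by the operating system rather than being a well-known one.

```python
import socket
import threading


def serve(listener, connections):
    """Accept `connections` clients in turn and echo one message to each."""
    for _ in range(connections):
        conn, _client_address = listener.accept()
        with conn:
            conn.sendall(conn.recv(1024))


def demo():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
        listener.listen()
        server_address = listener.getsockname()
        worker = threading.Thread(target=serve, args=(listener, 2), daemon=True)
        worker.start()

        # Two concurrent sessions from the same client to the same server port.
        with socket.create_connection(server_address, timeout=5) as a, \
             socket.create_connection(server_address, timeout=5) as b:
            # (client IP, client port, server IP, server port) for each session
            conversations = [s.getsockname() + s.getpeername() for s in (a, b)]
            a.sendall(b"first session")
            b.sendall(b"second session")
            replies = [a.recv(1024), b.recv(1024)]
        worker.join(timeout=5)
    return conversations, replies


conversations, replies = demo()
for four_numbers, reply in zip(conversations, replies):
    print(four_numbers, reply)
```

Running this prints two tuples that agree in three positions (both IP addresses and the server port) and differ only in the client port, which is how the two sessions are kept apart.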

In case of any difficulties or for further information e-mail cstaff@cs.um.edu.mt

Date last amended: Monday, 22 March, 1999