CSA402
Lecture 13 - Hypermedia and the World-Wide Web: Part 2
References: 
Steinmetz, R., and Nahrstedt, K. (1995). Multimedia: 
Computing, Communications & Applications. Prentice Hall. Chapter 15.
WWW
The World-Wide Web is an Internet service. Consequently, it is implemented 
on top of TCP/IP, and it uses the HyperText Transfer Protocol (HTTP) as its 
application protocol. Usually, HTTP servers listen on port 80. HTTP, like 
other TCP-based Internet services, assumes that TCP/IP will correctly 
transfer data between computers. 
In HTTP, documents are uniquely identified using a Uniform Resource 
Identifier (URI), usually in the form of a Uniform Resource Locator 
(URL).
The HTTP protocol is based on a request/response paradigm. A client establishes a connection 
with a server and sends a request to the server 
in the form of a request method, URI, and protocol version, followed by 
a message containing request modifiers, client information, 
and possible body content. The server responds with a status line, 
including the message's protocol version and a success or error code, 
followed by a message containing server information, entity 
metainformation (e.g., document accounting information, like "date last 
modified"), and possibly body content. 
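As a rough illustration of the request layout just described, the Python sketch below assembles a request as raw bytes: the request line (method, URI, protocol version), header lines carrying request modifiers and client information, an empty line, and an optional body. It is a minimal sketch, not a complete client; the function name build_request and the User-Agent value in the tests are assumptions for illustration.

```python
def build_request(method, uri, headers=None, body=b"", version="HTTP/1.0"):
    """Return the raw bytes of an HTTP request (illustrative sketch).

    Layout: request line, zero or more header lines, an empty line,
    then the optional body. Every line ends in CRLF.
    """
    lines = [f"{method} {uri} {version}"]
    # Request modifiers and client information, one "Name: value" per line.
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # An HTTP/1.0 request that carries a body must state its length.
        lines.append(f"Content-Length: {len(body)}")
    # The empty line (CRLF CRLF) marks the end of the header section.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

For example, build_request("GET", "/~cstaff/index.html") yields the bytes GET /~cstaff/index.html HTTP/1.0 followed by CRLF CRLF.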
The URL is the generic document naming strategy for HTTP. It usually takes 
the form of
protocol://host [:port] [absolute_pathname] [#fragment]
where protocol is the protocol to use, e.g., http, 
host is the domain name or IP address of the HTTP server hosting 
the document, e.g., www.cs.um.edu.mt, port is the port on 
which the HTTP server is listening - the default is 80, 
absolute_pathname is the fully qualified path of the document 
from the HTTP server's document root, and fragment is a symbolic offset 
within the document. If the absolute_pathname is omitted, then the 
HTTP server returns a default document (often index.html).
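The fields of this URL format map directly onto urlsplit from Python's standard urllib.parse module. The sketch below, with the hypothetical helper name url_parts, fills in the default port 80 when an http URL omits it and asks for "/" (the server's default document) when the path is omitted. The fragment #courses used in the tests is made up; the fragment is interpreted by the client and is not sent to the server.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80

def url_parts(url):
    """Split a URL into protocol, host, port, absolute_pathname and fragment."""
    parts = urlsplit(url)
    port = parts.port
    if port is None and parts.scheme == "http":
        port = DEFAULT_HTTP_PORT  # port omitted: use the HTTP default
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": port,
        # An omitted path means "the server's default document"; "/" requests it.
        "path": parts.path or "/",
        # Used by the client to locate a point inside the document.
        "fragment": parts.fragment,
    }
```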
You will have noticed that http is, in fact, not the only protocol that 
can be named in the URL. An HTTP client can be designed to support other 
Internet protocols (e.g., ftp, gopher), in which case it will interact 
with the appropriate application layer on the server side. HTTP clients 
(Web browsers such as Microsoft Internet Explorer and Netscape 
Communicator) can therefore offer a consistent user interface to many 
Internet services.
HTTP requires that the client opens a TCP connection to the HTTP server 
(on the appropriate port) before the request for a document is sent. (The 
telnet program can be used to open such a connection by hand, as in the 
example below.) The request can then be sent to the server. Normally, once 
the server has satisfied the request, the server will terminate the 
connection. However, either the client or the server can abnormally 
terminate the connection (if, for instance, the user interrupts the 
download of a document, or if the server crashes). The client must be 
resilient enough to recognise an abnormal termination of the connection.
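The connection life-cycle described above (connect, send one request, read until the server closes the connection) can be sketched with Python's standard socket module. The helper names fetch_raw, read_until_closed and looks_truncated are hypothetical, and the truncation test is only a heuristic: it assumes the server sent a Content-Length header, which HTTP/1.0 servers are not obliged to do.

```python
import socket

def fetch_raw(host, port, request, timeout=10.0):
    """Open a TCP connection, send one request, and read until the server closes.

    Raises socket.timeout (an OSError subclass) if the server stops responding.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        return read_until_closed(sock)

def read_until_closed(sock, chunk_size=4096):
    """Collect bytes until recv() returns b"", i.e. the peer closed the connection."""
    chunks = []
    while True:
        data = sock.recv(chunk_size)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)

def looks_truncated(raw):
    """Heuristic check for an abnormally terminated response (assumption-based).

    If the headers promise Content-Length bytes but fewer arrived, the
    connection was probably cut (user interrupt, server crash, ...).
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        return True  # closed before the header section was even complete
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return len(body) < int(value.strip())
    return False  # no length given: cannot tell from the data alone
```

A call such as fetch_raw("www.cs.um.edu.mt", 80, b"GET /~cstaff/index.html HTTP/1.0\r\n\r\n") would reproduce the session below, assuming the server still answers in the same way.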
An HTTP request generally takes the form of
<request_type> <URL> <HTTP_version_number> CRLF
followed by zero or more header lines (each terminated by CRLF) and an 
empty line, which marks the end of the request. For example, 
GET http://www.cs.um.edu.mt/~cstaff/index.html HTTP/1.0, followed by an 
empty line. The server will respond with a message that includes the 
status (whether the document was found/not found/relocated/etc.), document 
metainformation, and the document body (if the document was found). For 
example,
[24] zeus telnet www.cs.um.edu.mt 80 -- connect to HTTP server on port 80
Trying 193.188.34.81... 
Connected to babe.cs.um.edu.mt.
Escape character is '^]'.
GET http://www.cs.um.edu.mt/~cstaff/index.html HTTP/1.0 -- HTTP request
HTTP/1.1 200 OK -- status line, 200 indicates document exists and can be downloaded
Date: Tue, 16 Mar 1999 10:01:31 GMT -- date and time the response was sent
Server: Apache/1.2.1 -- identity of HTTP server
Last-Modified: Wed, 27 Jan 1999 15:25:47 GMT -- date document last modified
Content-Length: 2319 -- in bytes
Accept-Ranges: bytes
Connection: close
Content-Type: text/html -- document type
The "Never to be Completed" Site
Chris Staff's not at home, Page
Well, you've stumbled across my site. I'm a firm believer that 99% of all
Web sites should be under construction, so I'm not even going to bother
putting graphics of construction workers up on this page, because they'll
never be removed!
One day this page will be neatly laid out, but it isn't going to happen
for the foreseeable future.
A little bit about me
 I lecture in Computer Science in
the Dept. of Computer Science and
A.I. at the University of Malta.
I'm also reading a PhD in Adaptive Hypertext at
The School of Cognitive and Computing
Sciences, University of Sussex. I'm
about half way through at the moment (March 1997).
The courses I teach
You can follow the links to course notes, where they're on-line
CSM202:
Operating Systems. 
CSM210:
Systems Programming in C (Part I).
CSA402: Graphics and Multimedia Systems: Multimedia Systems.
These lectures form part of the University's BSc IT (Hons.) degree. The Web
pages for the degree also give information like course descriptions, when
lectures are scheduled for delivery, how many credits they're worth,
etc.
I also service some other lecture courses.
Practical I.T. for Human Resource
Development, for the MA in Human Resource Development.
Marketing on the WWW, for the MA in Marketing.
Contact Details
E-mail: cstaff@cs.um.edu.mt
Postal Address: 
University of Malta, Dept. of Computer Science
and A.I., Tal-Qroqq, Msida MSD 06, Malta, Europe
Location on Campus: Room 402, New Computer Building (off Car Park 5)
Telephone: (356)-32902506
Fax: (356)-320539
Connection closed by foreign host. -- server closes connection
[25] zeus
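Programmatically, a response like the one in the session above is split at the first empty line into a header section and a body (the empty line is not visible in the transcript). The sketch below, a hypothetical parse_response helper in Python, returns the protocol version, the numeric status code, the reason phrase, the header fields and the body. It assumes CRLF line endings and one value per header name, and lower-cases header names because HTTP header names are case-insensitive.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # empty line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line, e.g. "HTTP/1.1 200 OK"; the reason phrase may be missing.
    parts = lines[0].split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # skip anything that is not a "Name: value" line
            headers[name.strip().lower()] = value.strip()
    return version, status, reason, headers, body
```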
The Content-Type field is an important part of the Web. The Web supports a 
variety of multimedia types, and the type is used to indicate to the Web 
client how the data being downloaded should be handled. For example, GIF 
images have the content type image/gif, so that the Web client can handle 
the stream as a (compressed) image. Other types might require the loading 
of a separate application in which to display the data (e.g., video, or a 
Microsoft Word document).
The request_type can also be HEAD, in which case the server returns only 
the document metainformation, without the document body. This is 
particularly useful to see if the document has been changed since the last 
time it was downloaded. In order to make efficient use of the Internet, Web 
clients use a local cache, in which recently downloaded documents are 
stored. If the document is still in the cache and hasn't changed since it 
was cached, then the document is loaded from the cache instead. 
Consequently, unless directed otherwise by the user, a client holding a 
cached copy will typically check whether it is still current before 
downloading it again, either with a HEAD request preceding the GET, or 
with a conditional GET (using the If-Modified-Since header), which returns 
the document only if it has changed.
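The way a client uses the Content-Type value can be sketched as a simple dispatch on the MIME type. In this Python sketch the handler descriptions, the helper_apps table and the "save to disk" fallback are all assumptions. A real browser renders HTML and images itself and launches helper applications for other types.

```python
# Hypothetical built-in handlers: types the client can display itself.
INLINE_HANDLERS = {
    "text/html": "render as HTML",
    "text/plain": "display as plain text",
    "image/gif": "decode and display image",
}

def choose_handler(content_type, helper_apps):
    """Decide how to handle a downloaded entity from its Content-Type value.

    content_type may carry parameters, e.g. "text/html; charset=iso-8859-1".
    helper_apps maps MIME types to external applications (an assumption).
    """
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in INLINE_HANDLERS:
        return INLINE_HANDLERS[mime]
    if mime in helper_apps:
        return f"launch {helper_apps[mime]}"  # e.g. a video player
    return "save to disk"  # unknown type: let the user decide what to do
```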
The request typically originates within a Web client application. A user 
will be browsing through a document, using a Web browser, and will click on 
text which is marked up as being a "link" to another multimedia document. 
The majority of documents on the Web are so-called HTML documents: 
documents which contain instructions that the Web browser uses to decide 
how the document is to be displayed to the user. HTML, the HyperText Markup 
Language, is a markup language which is used to indicate the structure and 
presentation of text. Text is modified in accordance with recognised 
tags (which are composed of a < and > surrounding a 
modifier). For example, the following text is interpreted as being 
emboldened by the presence of the <b> and </b> tags 
surrounding it; the </b> indicates that emboldening ends at that 
point. <b>This text is emboldened</b>. Typically, tags come in 
pairs: <tag>text</tag>. The majority of tags are used to enable a browser 
to modify how the content of the document is displayed to the user, 
according to user preferences. The glue that links documents to each other 
is the HTML anchor tag with a hypertext reference (HREF) attribute, 
<A HREF="URL">anchor_text</A>, where 
the anchor_text is the text in the document which, when clicked 
on, results in the generation of an HTTP request to display the document 
identified by the URL.
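To find the links in a page, a client has to locate each <A HREF="URL">anchor_text</A> element. The sketch below does this with HTMLParser from Python's standard html.parser module, which reports tag and attribute names in lower case, so <A HREF=...> and <a href=...> are treated alike. The class and function names are hypothetical, and nested or malformed anchors are not handled.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, anchor_text) pairs from <A HREF="...">anchor_text</A>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the anchor we are currently inside, if any
        self._text = []    # pieces of that anchor's text

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased ("a", "href").
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # text inside nested tags like <i> counts too

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```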
When a user clicks on a link, the application will generate a request for 
the associated document to be displayed. Typically, this involves 
downloading the document to the client's computer and processing it 
locally. The client will set up a communications channel by opening a TCP 
connection to the server on the port identified in the URL (port 80 if 
none is given). If the connection is successful, and if the document is 
already in the client's cache, the application may generate a HEAD request 
to check the modification date of the document. If the document has not 
been modified since it was last cached, then the client will reuse the 
version in the cache; otherwise, or if the document is not already in the 
cache, the application will generate a GET request (possibly first having 
to reconnect to the server, since the server normally closes the 
connection after each response). The client waits until it receives a 
response from the server, or until the request times out. It then 
processes the response, or reports a time-out error.
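The sequence of steps above can be condensed into a small Python sketch. To keep it self-contained, the network is abstracted as a hypothetical transport(method, url) function that opens a fresh connection for each request and returns the status, the headers (with lower-cased names) and the body, raising TimeoutError when the server does not answer in time. Comparing Last-Modified strings for equality is a simplification of a proper date comparison.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport: (method, url) -> (status, headers, body).
# Assumed to open a new connection per request and to raise TimeoutError.
Transport = Callable[[str, str], Tuple[int, Dict[str, str], bytes]]

@dataclass
class CachedDoc:
    last_modified: Optional[str]
    body: bytes

def follow_link(url: str, cache: Dict[str, CachedDoc], transport: Transport) -> bytes:
    """Return the document for url, reusing the cached copy when it is unchanged.

    A TimeoutError from the transport propagates so the caller can report it.
    """
    cached = cache.get(url)
    if cached is not None and cached.last_modified is not None:
        status, headers, _ = transport("HEAD", url)  # metainformation only
        # Simplification: exact string match instead of parsing the dates.
        if status == 200 and headers.get("last-modified") == cached.last_modified:
            return cached.body  # not modified since it was cached
    status, headers, body = transport("GET", url)  # fetch the full document
    if status != 200:
        raise RuntimeError(f"server returned status {status}")
    cache[url] = CachedDoc(headers.get("last-modified"), body)
    return body
```

A caller would wrap follow_link in try ... except TimeoutError and report the time-out to the user.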
Deficiencies of TCP/IP
The deficiencies of TCP/IP should be fairly obvious, given what you 
already know about multimedia and the earlier description of TCP/IP. 
However, for the sake of completeness, they are listed below.
TCP/IP is perfectly suited to the requirements of discrete data, but the 
inherent properties of TCP/IP mean that it is not suitable for continuous 
media types.
IP is a best-effort service: each datagram is routed towards the sink 
independently, over whatever route the routers currently consider best. 
This means that datagrams can be delivered out of sequence. It also means 
that QoS parameters are severely restricted, as no minimum or maximum 
delivery times can be specified. Also, round-trip delays are subject to 
network loads, and service can deteriorate rapidly, requiring the sink and 
source to renegotiate QoS parameters frequently during a dialogue. 
TCP is primarily concerned with ensuring that data has been received 
correctly. If the source has not received an acknowledgement that a 
datagram has arrived within a timeout period, then it re-sends it. For 
video and audio broadcasts, this is wasteful of resources as, more often 
than not, by the time the retransmitted data arrives the client 
application will have missed the deadline for processing it. 
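The cost of out-of-sequence and late delivery for continuous media can be illustrated with a receiver-side playout buffer. The hypothetical Python class below plays packets strictly in sequence order, one per playout interval; a packet that arrives after its slot has passed, for example a retransmitted copy, is simply discarded, which is the wasted effort described above. It is a sketch of the idea, not part of TCP or IP, and the sequence numbers are assumed to be supplied by the application.

```python
import heapq

class PlayoutBuffer:
    """Reorder sequence-numbered packets and drop those that arrive too late.

    Hypothetical receiver-side sketch: packets are played strictly in
    sequence order; once a slot has been played, a late copy is useless.
    """

    def __init__(self):
        self._heap = []    # min-heap of (seq, payload) waiting to be played
        self.next_seq = 0  # next sequence number due for playout
        self.dropped = 0   # packets that missed their deadline

    def arrive(self, seq, payload):
        if seq < self.next_seq:
            self.dropped += 1  # its playout time has already gone
            return
        heapq.heappush(self._heap, (seq, payload))  # out-of-order arrivals are fine

    def play_tick(self):
        """Called once per playout interval; returns the payload, or None for a gap."""
        seq = self.next_seq
        self.next_seq += 1
        # Discard anything older than the slot being played (e.g. duplicates).
        while self._heap and self._heap[0][0] < seq:
            heapq.heappop(self._heap)
        if self._heap and self._heap[0][0] == seq:
            return heapq.heappop(self._heap)[1]
        return None  # missing: play silence or repeat a frame instead of waiting
```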
If many sinks are receiving the same broadcast, TCP/IP at the source is 
responsible for generating as many individually addressed datagrams as are 
necessary. This implies that the sinks may not be receiving the data in 
synchrony (it may take a variable amount of time for each datagram to be generated and routed to 
the individual sinks), and that the broadcast is contributing to network 
congestion. The TCP/IP network is unable to give any guarantees over and 
above the guarantee that as long as there is a route between the computers 
involved in the multimedia session, then data will eventually be routed 
between the source(s) and the sink(s). It is up to the application layers 
to provide additional guarantees, in so far as they are supported by TCP/IP. 
For example, in the case of real-time video and audio playback or 
telephony over the Internet, servers and clients can negotiate the frame 
rate and buffer size, taking into account the current round-trip delay. If 
the client experiences a degradation or an improvement of service, it can 
renegotiate the frame rate. 
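The renegotiation described above might look like the following Python sketch. The policy is entirely hypothetical: it picks the highest frame rate from a fixed list whose bit rate fits the currently measured throughput, and sizes the receive buffer to hold two round-trip times' worth of frames. The candidate rates, the factor of two and the per-frame size are illustrative assumptions, not values from any particular protocol.

```python
import math

def renegotiate(throughput_kbps, rtt_s, kbits_per_frame, rates=(30, 25, 15, 10, 5, 1)):
    """Pick (frame_rate, buffer_frames) for the current network conditions.

    Hypothetical policy: highest candidate rate whose bit rate fits the
    measured throughput; buffer enough frames to cover two round-trip times.
    """
    for fps in sorted(rates, reverse=True):
        if fps * kbits_per_frame <= throughput_kbps:
            buffer_frames = max(1, math.ceil(2 * rtt_s * fps))
            return fps, buffer_frames
    return None  # even the lowest rate does not fit: stop or warn the user
```

For instance, with 1000 kbit/s available, 40 kbit frames and a 0.25 s round-trip delay, the sketch chooses 25 frames per second and a 13-frame buffer; if throughput drops to 300 kbit/s it falls back to 5 frames per second.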
The Multicast Backbone - MBONE
The Multicast Backbone was designed specifically to overcome some of the 
shortcomings of the Internet, especially in the area of network efficiency. 
Consider how packets on an (Ethernet) LAN are sent from the source to the 
destination. On the LAN, all connected computers share the same medium and 
can all listen to it simultaneously. This is essential, as a computer may 
only send data onto the network when the network is silent. It is still 
possible for two computers to start sending data simultaneously. However, 
in this case, there will be a collision, which the computers will 
recognise, and they will re-send their data after a random delay. While 
the data is on the network, any computer (typically the one to which it is 
addressed) can read the data off the network. If a packet is addressed to 
all computers on the LAN (a broadcast), then all computers receive the 
same data, without the packet being duplicated.
However, the more computers are attached to the LAN, the less likely it is 
that a given computer can send data at any point in time, because the 
probability that the network is already active is increased. This makes 
the network appear slow to its users. Consequently, IP broadcasts are 
typically not allowed to cross the boundaries created by routers. A router 
creates a physical divide between computers on one LAN (attached to one of 
the router's Ethernet cards) and computers on another LAN (attached to the 
router's other Ethernet card). However, as broadcasting makes it possible 
to reach many computers without linearly increasing the amount of required 
bandwidth, it remains attractive. One of the downsides of broadcasting is 
that all computers will receive the data, even those for which it is not 
intended. Injudicious use of broadcasting would result in a severely 
saturated network.
Multicasting provides the benefits of broadcasting, but it limits the 
broadcast to those computers which should be exposed to the data (except, 
of course, where this is unavoidable). Implementing this is quite a 
departure from IP, so specialist hardware and software need to be employed 
to benefit from multicasting on the Internet; MBONE applications also 
typically use the Real-time Transport Protocol (RTP) to carry audio and 
video. 
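On the receiving side, a host signals that it wants a multicast group's traffic by joining the group; the operating system then announces the membership (using IGMP) so that multicast routers forward the group's datagrams to that network, and not to every network. The Python sketch below uses the standard socket and ipaddress modules; the group address 224.2.0.1 and the helper names are illustrative assumptions, and joining only works on a host whose operating system supports multicast.

```python
import ipaddress
import socket

def is_multicast_group(addr):
    """True for IPv4 class D addresses (224.0.0.0 to 239.255.255.255)."""
    return ipaddress.IPv4Address(addr).is_multicast

def membership_request(group, interface="0.0.0.0"):
    """Pack an ip_mreq structure: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)

def open_multicast_receiver(group, port):
    """Join a multicast group and return a UDP socket bound to port.

    Joining makes the OS report membership via IGMP, so the group's
    datagrams are delivered here only because this host asked for them.
    """
    if not is_multicast_group(group):
        raise ValueError(f"{group} is not a multicast address")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```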
Multicast hardware and software support
At the end of the day, multicast datagrams cannot be relayed across the 
Internet in their raw form, because most Internet routers (at the time of 
writing) do not forward multicast datagrams. Instead, they are 
encapsulated ("tunnelled") inside ordinary unicast datagrams which those 
routers can forward. Operating systems need to be modified or patched in 
order to address IP datagrams to multiple destinations, and then to wrap 
each multicast datagram inside a unicast datagram. Currently, most 
operating systems which can be modified to support multicast routing are 
UNIX-based. This solves the problem of addressing data to multiple 
computers engaged in a multi-way conferencing session, or receiving a live 
broadcast, without creating undue traffic on the Internet, and without the 
host duplicating effort to create multiple datagrams that are identical 
apart from the address. However, we are still unable to provide 
isochronous transfer modes, which permit applications to impose and adhere 
to real-time delivery time scales, because this requires the support of 
lower layers that actually have control over resources in switches and 
routers.
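The encapsulation step can be sketched in Python as IP-in-IP encapsulation (IP protocol number 4), one way of carrying a complete multicast datagram inside a unicast datagram addressed to the far end of a tunnel. This is a simplified, hypothetical sketch: it builds a minimal 20-byte IPv4 header with no options or fragmentation and only constructs the bytes; actually sending them would require raw-socket privileges. The tunnel endpoint addresses used in the tests are made up.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IP protocol number for IP-in-IP encapsulation

def checksum(header):
    """Internet checksum: ones' complement of the ones' complement sum of 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

def ipv4_header(src, dst, protocol, payload_len, ttl=64, ident=0):
    """Build a minimal 20-byte IPv4 header (no options, no fragmentation)."""
    fields = [
        (4 << 4) | 5,      # version 4, header length 5 x 32-bit words
        0,                 # type of service
        20 + payload_len,  # total length
        ident,             # identification
        0,                 # flags + fragment offset
        ttl,
        protocol,
        0,                 # checksum placeholder, filled in below
        socket.inet_aton(src),
        socket.inet_aton(dst),
    ]
    header = struct.pack("!BBHHHBBH4s4s", *fields)
    return header[:10] + struct.pack("!H", checksum(header)) + header[12:]

def encapsulate(multicast_datagram, tunnel_src, tunnel_dst):
    """Wrap a whole multicast IP datagram in a unicast one for the tunnel's far end."""
    outer = ipv4_header(tunnel_src, tunnel_dst, IPPROTO_IPIP, len(multicast_datagram))
    return outer + multicast_datagram

def decapsulate(datagram):
    """At the far end of the tunnel: strip the outer header to recover the original."""
    header_len = (datagram[0] & 0x0F) * 4
    return datagram[header_len:]
```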
In case of any difficulties or for further information e-mail 
cstaff@cs.um.edu.mt
Date last amended: Monday, 22 March, 1999