CSA2060 Assessed Project for the January 2004 exam session

Date due: Monday 26th January 2004 at midday

This assessed programming task is worth 100% of the study-unit CSA2060 (Introduction to C).

Changes!!!!

13/11/03... following lecture...

I have introduced the concept of CPU and Resource clock ticks to make the separation between the two easier to understand (I hope!). One resource clock tick can be the equivalent of several CPU clock ticks (because CPUs are generally faster than resources!). I have changed every reference to clock tick to either CPU clock tick or resource clock tick. Apart from that, the changed parts of this specification are indicated by "==>".

Rules and Regulations

These are IMPORTANT. Please read them, and if in any doubt, seek clarification from me PRIOR to the submission of the assessed practical task.

Your programs MUST be compilable using the version of UNIX gcc installed on a UNIX server of the Department of Computer Science and AI. If the examiner is unable to compile the program on at least one of the UNIX servers provided by the Department of Computer Science and AI, the examiner will penalise the submission accordingly.

Plagiarism will not be tolerated. Evidence of plagiarism in the assignment will attract a Fail grade for the candidate, and may result in further disciplinary action being taken against the candidate in accordance with University guidelines. For more information please visit the departmental Web site on plagiarism.

The deadline for the assessed practical task is midday on Monday 26th January 2004. The task must be submitted to Room 202, New Computing Building, University of Malta, Tal-Qroqq, Msida, and must be signed in as proof of submission. Late submission of tasks will attract an immediate 50% penalty (regardless of the reason for lateness) with an additional 10% penalty for each subsequent day of late submission, weekends included. In the event that a candidate is sick on the day of the deadline, the candidate must ensure that the assessed practical task is delivered to the location specified above in conjunction with the medical certificate (which must arrive by midday on Monday 26th January 2004). Missing "final touches" due to illness on and slightly before the day of the deadline will be taken into consideration. If you are ill during the semester, and the illness prevents you from working on this assignment, then you must inform me as soon as possible (and the Chair of the Board of Studies for IT, or the Dean of the Faculty of Science, if it is affecting other courses). In any case, as with anything else, you must be prepared for all sorts of eventualities. Short-term illnesses (of a few days' duration) should not really have any significant impact on your ability to complete all tasks - assuming that you have planned your work well, and you work steadily throughout the semester :-) If you leave everything till the last moment and then get sick, well...

The examiner reserves the right to ask *any* candidate to defend his or her submission via an oral examination prior to the results being published for this credit.

Failing candidates: in the event of a resit, candidates will sit for a written paper the following September. Expect compulsory questions about the assessed assignment :-) Candidates who have the right to a delayed first sit, because a legitimate reason prevented them from working on and submitting the project by the deadline at the end of this semester, will also sit for a written paper the following September. If the September session contains both resitting and first-sitting students, they will all take the same written paper. All students will be expected to have sufficient knowledge of the assignment.

Submission Guidelines

Please follow these instructions carefully. Failure to comply with these instructions may lead to loss of marks. If in doubt, please seek the lecturer's advice.

Your project will be anonymous. Your name, ID number, student registration number, etc. must not appear on anything that will be given to the examiner. Disciplinary action may be taken against students who identify themselves in their submission.

When you submit your project to Rm 202 (see above), you will be given a cover sheet to fill in with your personal details. This information will not be passed on to the examiner.

Project Description

Good, you've made it this far :-) 

An Operating System environment typically consists of a number of processes all competing for computer resources to complete their task. For example, a program which reads data from a file, processes it, and then produces a hard copy of its results requires, along with the CPU and access to memory (RAM) to execute in the first place,  access to long-term storage (typically a hard disk) and a printer. If any of these resources is denied to the process, then the program's task cannot complete.

In your program, you will simulate requests being generated which must then be processed by the resource manager. Although some resources can be shared (used by many processes simultaneously), some resources are non-shareable. A non-shareable resource is one, like a printer, which must complete one job in its entirety before it is able to start the next one. For example, it would be considered to be an error if a printer began printing somebody else's job before it had finished printing the current job (even if it was possible to return to it later!). On the other hand, disks, for example, can be shared because although a program may request a file to be written (or read), it is possible to attend to another request (to read from or write to another file stored on the same disk) before the first file has been written (or read) in its entirety.

In a real operating system environment, most resources can be used simultaneously by different processes. For instance, it is possible for one program to read data off a CD while another prints a file. The resources are managed independently of each other. However, in your simulated environment you will not be managing real resources, so we will use a single resource manager to control several different types of resource.

NB: This specification may contain mistakes! If you encounter something that looks like it may be a mistake, please e-mail me at cstaff@cs.um.edu.mt.

The Resource Manager

Resources hang around waiting for requests. Upon receiving a request, the resource will service it. A non-shareable resource must service the request to completion before it can process another request. This is not the case with a shareable resource, which is able to service many requests from different processes apparently concurrently.

At the heart of the  Resource Manager is a random event generator which generates events. These events include: the id of the currently running process and a request for a resource.

Whereas the request for a resource is generated by the currently running process, the resource itself will generate an interrupt to indicate that a request has been serviced. If we assume a uniprocessor environment then only one process at a time (in a CPU clock tick) can generate a single request. However, technically resources (that do not require the central processing unit to operate) can each generate an interrupt at the same point in time to indicate that they have completed some task. To model this, each of your CPU clock ticks will last long enough for a single resource request to be generated, and to check whether a resource has generated an interrupt.

If a resource is currently servicing a request, what happens if a new request of the same resource is generated? Each resource will maintain its own queue of incoming requests. What happens to the incoming request depends on whether the resource is shareable or non-shareable. If the resource is non-shareable, then the queued requests must wait their turn for the resource to become available. Each process that is waiting for a resource is suspended until the request is serviced and the resource signals that the resource has completed the task. On the other hand, with a shareable resource, waiting requests can be attended to one at a time in sequence with a little bit of work done on each in turn. The requesting process will also be suspended waiting for the resource to signal that its request has been completed. This means that whereas the queue for non-shareable resources will be a FIFO queue, the queue for shareable resources will resemble a circular queue.
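The circular queue for a shareable resource can be sketched as a doubly linked circular list in which a new request is spliced in just before the request currently being serviced, which is the logical end of the circle. This is only one possible layout: the names struct cnode, cq_insert and cq_remove_current, and the choice of fields, are hypothetical and not part of this specification.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request node; field names are illustrative only. */
struct cnode {
    short pid;            /* process that made the request */
    long  bytes_left;     /* data still to be transferred  */
    struct cnode *prev;
    struct cnode *next;
};

/*
 * Insert a new request just before *current, i.e. at the logical end
 * of the circle. If the circle is empty the new node becomes current.
 * Returns 0 on success, -1 if memory could not be allocated.
 */
int cq_insert(struct cnode **current, short pid, long bytes)
{
    struct cnode *n;

    assert(current != NULL && bytes > 0);
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->pid = pid;
    n->bytes_left = bytes;
    if (*current == NULL) {
        n->prev = n->next = n;        /* a single node links to itself */
        *current = n;
    } else {
        n->next = *current;           /* splice between current->prev and current */
        n->prev = (*current)->prev;
        (*current)->prev->next = n;
        (*current)->prev = n;
    }
    return 0;
}

/* Remove the current node (its request is complete) and move on to the next. */
void cq_remove_current(struct cnode **current)
{
    struct cnode *old;

    assert(current != NULL && *current != NULL);
    old = *current;
    if (old->next == old) {
        *current = NULL;              /* that was the last request */
    } else {
        old->prev->next = old->next;
        old->next->prev = old->prev;
        *current = old->next;
    }
    free(old);
}
```

Moving to the next waiting request without removing anything is then just a matter of following the next pointer of the current node.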

In general, shareable resources are requested more frequently than non-shareable ones. Also, non-shareable resources require significantly more CPU clock ticks to complete a task than shareable resources. Several megabytes of data can be written to disk much faster than a printer can print a few tens of kilobytes of data.

Whenever a process requests a resource you will need to know the following things:

You will also need to know the following about each resource:

==> At each resource clock tick, the number of bytes left to transfer for a resource is the number of bytes that were left at the previous tick minus the number of bytes that can be transferred in one resource clock tick. Once a resource has transferred all of the data that it can process in a single resource clock tick, an interrupt can be generated.
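As a worked illustration of this arithmetic, the sketch below computes the bytes left after one resource clock tick and the number of resource clock ticks a whole job needs. The function names are hypothetical, and clamping the remainder at zero (rather than letting it go negative) is my own simplification.

```c
#include <assert.h>

/* Bytes still to transfer after one resource clock tick (never below zero). */
long bytes_left_after_tick(long bytes_left, long block_size)
{
    assert(bytes_left >= 0 && block_size > 0);
    return (bytes_left > block_size) ? bytes_left - block_size : 0;
}

/* Resource clock ticks needed for a whole job, rounding a partial block up. */
long resource_ticks_needed(long job_size, long block_size)
{
    assert(job_size >= 0 && block_size > 0);
    return (job_size + block_size - 1) / block_size;
}
```

For example, a 50-byte job on a resource that moves 5 bytes per resource clock tick needs 10 resource clock ticks. If each resource clock tick lasts 10 CPU clock ticks, that is 100 CPU clock ticks in total.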

When your program is run, the first thing it will do is locate and open a configuration file that contains certain data which will control how the environment behaves. The information in the configuration file should be read into an appropriate data structure.

The program will then enter a loop in which it will scan the input channel to determine if a 't', 's', or 'g' has been entered by the user (the user should not need to enter any input to cause the program to continue executing). If a 't' has been input, then the program will terminate. If an 's' has been input, then the program will enter single-step mode and report on the current status of all resources, pausing at each iteration until the user types a <return> or a 'g'. The report will be printed to a file as well as to the screen. If the user types 'g', the program will enter go mode and stop reporting the status of each resource. If the user presses <return> while in single-step mode, then the program will continue to report on the status of each resource after each iteration (and prompt the user for input at the end of each iteration). An iteration consists of the following steps, which are explained further below.

An iteration is the equivalent of a CPU clock tick. In a typical CPU clock tick, the following will logically happen:

  1. Generate the id of the currently running process. This is the process that may make a request.
  2. Generate a request for a resource, and add the request to the appropriate queue.
  3. For each resource in the resource structure:
    1. If it has been servicing a request, has it finished?
      1. If it has finished, then generate an interrupt and start servicing the next request in the resource queue
      2. If it has not finished, then:
        1. If it is a nonshareable resource, then:
          1. Continue processing the current request
        2. Otherwise, if it is a shareable resource:
          1. Start or resume servicing the next request in the circular queue

Generate the ID of the currently running process

Generate a random PID as a short integer.

Generate a request for a resource

There is a 30% chance that a process will request a resource, and the chances of requesting a shareable resource are 3 times higher than the chance of the process requesting a non-shareable resource. However, there is then an equal chance of any of the shareable (or nonshareable) resources being requested. As an example, consider the following. On average, a resource request will be generated only 3 times in every 10 "goes" (CPU clock ticks). However, for every 4 requests that are made, 3 requests will be for a shareable resource and only 1 will be for a non-shareable resource. If your system has 5 shareable resources, then each one has an equal opportunity of being requested if the process has made a request for a shareable resource.
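The odds described above could be turned into code along the following lines. This is a sketch under assumptions: the names enum req_kind, classify_request, generate_request and pick_resource are hypothetical, the two percentages are assumed to come from the configuration file, and rand() % 100 is used for simplicity even though it has a very slight bias.

```c
#include <assert.h>
#include <stdlib.h>

enum req_kind { REQ_NONE, REQ_SHAREABLE, REQ_NONSHAREABLE };

/*
 * Decide what kind of request (if any) is made, given two rolls in 0..99.
 * pct_request is the chance of any request (e.g. 30); pct_nonshare is the
 * chance that a request is for a nonshareable resource (e.g. 25).
 */
enum req_kind classify_request(int roll_request, int roll_kind,
                               int pct_request, int pct_nonshare)
{
    assert(roll_request >= 0 && roll_request < 100);
    assert(roll_kind >= 0 && roll_kind < 100);
    if (roll_request >= pct_request)
        return REQ_NONE;
    return (roll_kind < pct_nonshare) ? REQ_NONSHAREABLE : REQ_SHAREABLE;
}

/* Draw both rolls with rand() and classify them. */
enum req_kind generate_request(int pct_request, int pct_nonshare)
{
    int roll_request = rand() % 100;
    int roll_kind = rand() % 100;

    return classify_request(roll_request, roll_kind, pct_request, pct_nonshare);
}

/* Pick one of `count` resources of the chosen kind with equal probability. */
int pick_resource(int count)
{
    assert(count > 0);
    return rand() % count;
}
```

Keeping the decision (classify_request) separate from the random draw makes the odds easy to test with fixed rolls. Remember to seed the generator once, for example with srand, if you want different runs to behave differently.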

Once a request for a specific resource has been made the request is sent to the Resource Manager which checks if the resource is free. If it is, the status of the resource is set to busy, the resource is "given" as much data as it can process in a single resource clock tick, and the rest of the data, if any, is placed on the request queue. If the resource is currently busy, then the request is added to the appropriate queue. If there is more than one resource of a given type (e.g., 3 identical printers), then those resources will share a queue so that the next job can be given to the first resource in its class that becomes available. You will also need to generate the size of the data that needs to be transferred, remembering that a nonshareable resource will normally be given significantly less data to transfer than a shareable resource. Also, record the time at which the request was added to the resource queue, together with two other time slots, one which will be used to record the number of CPU clock ticks spent being serviced and the other to record the number of CPU clock ticks spent waiting in the queue.
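A request record carrying the three time slots mentioned above, together with a simple FIFO queue of the kind a nonshareable resource needs, might look like the sketch below. The structure layout and the names struct request, struct fifo, fifo_enqueue and fifo_dequeue are hypothetical. You are free to organise your data differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request record; field names are illustrative only. */
struct request {
    short pid;             /* requesting process                      */
    long  bytes_left;      /* size of the data still to be transferred */
    long  tick_queued;     /* CPU clock tick at which it was queued    */
    long  ticks_serviced;  /* CPU clock ticks spent being serviced     */
    long  ticks_waiting;   /* CPU clock ticks spent waiting in queue   */
    struct request *next;
};

struct fifo {
    struct request *head;  /* next request to be serviced */
    struct request *tail;  /* most recently added request */
};

/* Append a new request to the tail. Returns 0 on success, -1 on failure. */
int fifo_enqueue(struct fifo *q, short pid, long size, long now)
{
    struct request *r;

    assert(q != NULL && size > 0);
    r = malloc(sizeof *r);
    if (r == NULL)
        return -1;
    r->pid = pid;
    r->bytes_left = size;
    r->tick_queued = now;
    r->ticks_serviced = 0;
    r->ticks_waiting = 0;
    r->next = NULL;
    if (q->tail == NULL)
        q->head = r;
    else
        q->tail->next = r;
    q->tail = r;
    return 0;
}

/* Detach and return the head request, or NULL if the queue is empty.
 * The caller owns the returned record and must free() it. */
struct request *fifo_dequeue(struct fifo *q)
{
    struct request *r;

    assert(q != NULL);
    r = q->head;
    if (r != NULL) {
        q->head = r->next;
        if (q->head == NULL)
            q->tail = NULL;
        r->next = NULL;
    }
    return r;
}
```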

Managing Resources

Each resource is represented by a resource descriptor that records the following information:

Note that multiple resources of the same type will each require a separate resource descriptor, but each resource descriptor will have the same Type. For example, you can have three PRINTER1 resources each with a unique Resource ID.

==> Note: Mapping resource clock ticks to CPU clock ticks...

In a real operating system, resources operate slower than the computer's CPU. If Time is real time (measured in nanoseconds) then, for example, a single CPU clock tick can occur in one Time interval. However, a printer may take a whole second of Time to register a single one of its resource clock ticks. The resource descriptor records how many bytes of data can be processed in a single resource clock tick. However, one resource's clock tick may last the equivalent of several CPU clock ticks. The resource descriptor will record the number of equivalent CPU clock ticks that are required to process the data that can be processed in one resource clock tick.

Resource descriptors are linked together through the resource structure. The resource structure is a linked list of resource descriptors. The interrupt vector table is a table of pairs, each consisting of an interrupt together with a pointer to the function which acts as the resource handler. The resource handler is responsible for transferring a block of data to the resource so that it can be processed. The resource handler is invoked by the resource manager when a resource has generated an interrupt at the appropriate location in the interrupt vector table. The resource handler will reset the interrupt in the vector table and will change the status of the resource to "free", if the resource has completed its current task and there are no more tasks in the resource's request queue.
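One way to picture the interrupt vector table is as an array of entries, each holding an interrupt flag and a pointer to a handler function. The sketch below is a deliberately reduced illustration: struct resource, struct ivt_entry, sample_handler and ivt_dispatch are hypothetical names, and the sample handler simply assumes that the request queue is empty, which a real handler would have to check.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down, hypothetical resource descriptor. */
struct resource {
    int id;
    int busy;                 /* 1 = busy, 0 = free */
    int interrupts_handled;   /* kept only to make the example observable */
};

struct ivt_entry;
typedef void (*handler_fn)(struct ivt_entry *);

/* One slot in the table: an interrupt flag paired with its handler. */
struct ivt_entry {
    int interrupt_set;
    handler_fn handler;
    struct resource *res;
};

/* Example handler: resets the interrupt and frees the resource.
 * Assumes no further requests are queued for this resource. */
void sample_handler(struct ivt_entry *e)
{
    assert(e != NULL && e->res != NULL);
    e->interrupt_set = 0;
    e->res->busy = 0;
    e->res->interrupts_handled++;
}

/* Traverse the table and call the handler of every entry whose interrupt
 * is set. Returns the number of handlers that were called. */
int ivt_dispatch(struct ivt_entry *table, size_t n)
{
    size_t i;
    int called = 0;

    assert(table != NULL);
    for (i = 0; i < n; i++) {
        if (table[i].interrupt_set && table[i].handler != NULL) {
            table[i].handler(&table[i]);
            called++;
        }
    }
    return called;
}
```

Using a function pointer per entry means that shareable and nonshareable resources can be given different handlers without the traversal code needing to know which is which.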

Each CPU clock tick the interrupt vector table is traversed. If an interrupt is set, then the Resource Manager will call the function for the associated resource handler. For a nonshareable resource, if the resource is busy, the resource handler will reduce the size of data remaining to be transferred (stored with the resource request in the resource queue) by the size of the data block that can be processed in one resource clock tick. If the resulting value is 0 or negative, the request has been completed. The resource handler will generate an interrupt (through the interrupt vector table), and do whatever else you consider to be good housekeeping (remember to document what you do!). For each shareable resource, do the same as for nonshareable resources but even if there is still data remaining to be transferred (after reducing it by the amount that can be processed in a single resource clock tick) you will now transfer control to the next waiting process so that it can make some progress the next time around. You will need to implement this structure at least as a circular queue. However, because you will want to add new requests to the "end" of the list, you'll also need to implement it as a doubly linked circular list (because you know that the logical end of the list is just before the process you are currently servicing, because it is the one which will take you longest to reach). ==> Do remember that one resource clock tick may be equivalent to several CPU clock ticks. When a resource begins to process a block of data, you will need to initialise an associated counter to either the number of CPU equivalent clock ticks that it will take to transfer the maximum amount of data that the resource can process in one resource clock tick, or if the amount of data to transfer is less than the maximum amount that can be handled in a single resource clock tick, then the actual number of CPU equivalent clock ticks that will be required. 
With each CPU clock tick that passes, the counter will be decremented. When the counter reaches 0, an interrupt will be generated signalling the completion of the current task.

If the resource is free, then you'll need to see if there's anything in the resource queue waiting to be serviced. If there is then load it into the resource descriptor. For each request, whether or not it is being serviced, update the appropriate time serviced/time waiting time slots.

Scanning the input buffer

Your program normally runs non-interactively. With no user interaction, the program terminates after a number of iterations (read in from the configuration file). Just before the program terminates, it produces a short report on the statistics that have been kept.

At each iteration, however, the program will scan its input buffer, without pausing, to see if the user has entered a directive. The possible directives are:

All other inputs are discarded. If there is more than one character in the input buffer, obey the first legitimate directive, and discard all the others. For instance if the input buffer contains 'ast', ignore the 'a' (because it is invalid), obey the single-step directive ('s'), and discard the remaining input. Similarly, if the program is in go mode and the input buffer contains '<return>gts', then the <return> ('\n') and 'g' should be ignored (because they both require the program to first be in single-step mode), the 't' directive will be obeyed, and the remaining characters will be discarded.
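The rule of obeying the first legitimate directive and discarding everything else can be sketched as a small function over the characters found in the input buffer. The names enum mode and first_directive are hypothetical. The function only decides which directive to obey, and leaves reading the buffer to you.

```c
#include <assert.h>
#include <stddef.h>

enum mode { MODE_GO, MODE_STEP };

/*
 * Return the first legitimate directive in buf for the given mode:
 * 't' is always valid; 's' is valid only in go mode; 'g' and '\n'
 * are valid only in single-step mode. Returns '\0' if there is none.
 * Everything after the returned directive is meant to be discarded.
 */
char first_directive(const char *buf, enum mode m)
{
    const char *p;

    assert(buf != NULL);
    for (p = buf; *p != '\0'; p++) {
        switch (*p) {
        case 't':
            return 't';
        case 's':
            if (m == MODE_GO)
                return 's';
            break;
        case 'g':
        case '\n':
            if (m == MODE_STEP)
                return *p;
            break;
        default:
            break;              /* invalid character: skip it */
        }
    }
    return '\0';
}
```

For the two examples above, in go mode first_directive("ast", MODE_GO) yields 's' and first_directive("\ngts", MODE_GO) yields 't'.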

t: terminate program

Forces the program to end before it reaches the maximum number of possible iterations. The behaviour of the program should be identical to a normal termination. It will still print the vital statistics, but obviously, they will have been collected over a smaller number of iterations.

s: enter single-step mode

The current state of all processes and statistical information are printed to the screen and appended to file. Pause for user input. The user is allowed to enter <return>, 't', and 'g' only. All other input is rejected.

The following information will be printed (to screen and to file):

Current iteration number:

Resource [Resource_ID] (Resource Name):
Current Status: <busy, free>
Current PID serviced: (if busy)
Last PID serviced:
Average wait for service:
%age of up-time spent in free state:
Queued Requests: [PID, job size[,PID, job size]...]
...
(obviously, time is measured in iterations. Time created will be an absolute number, whereas the other time references will be an offset from the time of creation).

<return>: continue single-step mode

Iterate once through the process queue and then display (and append to file) the updated statistics and information about each process. Pause for user input. The user is allowed to enter <return>, 't', or 'g' only. All other input is rejected.

g: go mode

Only allowed in single-step mode. Interactivity is switched off, and reporting to the screen and file is disabled. In this state, only 't' and 's' are valid directives.
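In go mode the program must keep iterating while still noticing a 't' or an 's' if one arrives. On a POSIX system, one possible way to check whether input is waiting without pausing is to call select() with a zero timeout, as sketched below. The function name input_ready is hypothetical, and this is only one of several techniques you could use. Bear in mind that a terminal in its usual line-buffered mode only delivers characters to your program once the user has pressed <return>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Return 1 if at least one byte can be read from fd without blocking,
 * 0 otherwise (including on error). Never waits.
 */
int input_ready(int fd)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = 0;              /* zero timeout: poll and return at once */
    tv.tv_usec = 0;
    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    return rc > 0 && FD_ISSET(fd, &readfds);
}
```

A main loop could call input_ready(STDIN_FILENO) once per iteration and only read from standard input when it returns 1.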

When your program terminates

Your program can terminate for two legitimate reasons. Either the user has input 't', to force early termination, or else the total number of iterations has exceeded the maximum value specified in the configuration file.

The statistics that you will print (to screen and to file) are:

Total number of iterations:
[The next lines are repeated for each resource]
Resource Resource_ID (Resource Type):

%age of up-time spent in free state:
Number of requests serviced:
Average length of service per request:
Average length of time request spent waiting for service:
Resource Resource_ID (Resource Type):
...
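The per-resource figures in this report can all be derived from a handful of running totals. The sketch below assumes a hypothetical struct res_stats that your program updates at every CPU clock tick. The field and function names are illustrative, and returning 0 when nothing has been serviced yet is my own choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical running totals kept for one resource. */
struct res_stats {
    long ticks_total;          /* CPU clock ticks the resource has existed */
    long ticks_free;           /* CPU clock ticks spent in the free state  */
    long requests_serviced;    /* completed requests                       */
    long total_service_ticks;  /* ticks spent servicing those requests     */
    long total_wait_ticks;     /* ticks those requests spent waiting       */
};

/* Percentage of up-time spent in the free state. */
double pct_free(const struct res_stats *s)
{
    assert(s != NULL);
    return s->ticks_total > 0 ? 100.0 * s->ticks_free / s->ticks_total : 0.0;
}

/* Average length of service per request, in CPU clock ticks. */
double avg_service(const struct res_stats *s)
{
    assert(s != NULL);
    return s->requests_serviced > 0
        ? (double)s->total_service_ticks / s->requests_serviced : 0.0;
}

/* Average time a request spent waiting for service, in CPU clock ticks. */
double avg_wait(const struct res_stats *s)
{
    assert(s != NULL);
    return s->requests_serviced > 0
        ? (double)s->total_wait_ticks / s->requests_serviced : 0.0;
}
```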

Configuration file

The contents of the configuration file are:

The initial state: {s | g} (s = single step mode, g = go mode)
Maximum number of iterations: {9999 = infinite}
Odds of a process generating a request for a resource: 30%
Odds of a nonshareable resource being requested, if a resource has been requested: 25%
List of Resource Descriptors
Resource Type: string
Resource ID: unique integer
Shareable: 0 = nonshareable, 1 = shareable
Data Block Size: (in bytes, for max data transfer size in one resource clock tick)
==> Clock ticks: the number of CPU clock ticks that are executed for one resource clock tick to pass
Max Job Size: the maximum size of job that can be handled by the resource (artificial limit, in bytes)

Resources of the same type will have the same string value, e.g., PRINTER.

You should use the following representations of information in the configuration file:

INIT=g
MAXITER=30000
REQUEST=30
REQUESTNONSHARE=25
RESOURCESEPARATOR
RESOURCETYPE=PRINTER
RESOURCEID=1
SHAREABLE=0
DATABLOCKSIZE=50
==> CLOCKTICKS=7
MAXJOBSIZE=1000
RESOURCESEPARATOR
RESOURCETYPE=...

You should check that the token on the left-hand side is recognisable, rejecting any errors in the input. You may ignore tokens that are not recognised without aborting, but you must have all of the expected tokens to continue. For example, you can ignore ITER=9, but if DATABLOCKSIZE is missing for the first resource, then you must abort the program. You should also ensure that legal values are provided for each resource. You should also ensure that MAXJOBSIZE, which is the maximum size of job that a resource will accept, is reasonable (MAXJOBSIZE / DATABLOCKSIZE << MAXITER). On average, jobs processed by shareable resources will tend to be much smaller than those processed by nonshareable resources, otherwise requests for nonshareable resources will be left unserviced for unreasonable lengths of time.
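Reading the TOKEN=value lines might start from helpers like the ones sketched below. The names split_line, parse_positive and job_size_reasonable are hypothetical. In particular, interpreting "<<" as "at least ten times smaller" is an arbitrary threshold chosen for illustration, so pick and document your own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split "TOKEN=value" into its two halves, dropping any trailing newline.
 * Returns 1 on success, 0 if the line has no '=' (e.g. RESOURCESEPARATOR),
 * an empty token, or parts that do not fit in the buffers supplied.
 */
int split_line(const char *line, char *key, size_t keysz,
               char *val, size_t valsz)
{
    const char *eq;
    size_t klen, vlen;

    assert(line != NULL && key != NULL && val != NULL);
    assert(keysz > 0 && valsz > 0);
    eq = strchr(line, '=');
    if (eq == NULL)
        return 0;
    klen = (size_t)(eq - line);
    vlen = strcspn(eq + 1, "\r\n");
    if (klen == 0 || klen >= keysz || vlen >= valsz)
        return 0;
    memcpy(key, line, klen);
    key[klen] = '\0';
    memcpy(val, eq + 1, vlen);
    val[vlen] = '\0';
    return 1;
}

/* Parse a strictly positive decimal number (sizes, tick counts).
 * Returns 1 and stores the value on success, 0 otherwise. */
int parse_positive(const char *s, long *out)
{
    char *end;
    long v;

    assert(s != NULL && out != NULL);
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v <= 0)
        return 0;
    *out = v;
    return 1;
}

/* One possible reading of MAXJOBSIZE / DATABLOCKSIZE << MAXITER:
 * the quotient must be at least ten times smaller than MAXITER. */
int job_size_reasonable(long maxjob, long block, long maxiter)
{
    assert(block > 0);
    return (maxjob / block) * 10 <= maxiter;
}
```

A line for which split_line returns 0 can then be compared against RESOURCESEPARATOR with strcmp before being rejected or ignored.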

Deliverables

C Source code
Documentation, including a brief section on weaknesses (if any) of your approach. If your solution has no weaknesses, please explain why.
Evidence that your program has been adequately tested.
An answer to the question posed at the end of this document.

Guidelines for the documentation

Your documentation should be from 15-30 pages in length (excluding source code listing). You will normally describe, in your own words, the problem you are trying to solve; the solution you implemented, and why that particular solution, rather than any other solution; problems you encountered, and how you solved them; major data structures used and the operations on those data structures; evidence that your program works; an example session (with screen shots, if appropriate); weaknesses of your approach (including things required but not implemented); and future enhancements.

The documentation should also contain a section which reports comparisons of the final vital statistics of the program (which should be allowed to terminate normally) when it has been run with different values in the configuration file for the odds of occurrence of different events, different maximum loads and a different number of resources. You should provide final statistics, an explanation of why the statistics differ, and which configuration file appears to result in "better system behaviour" for at least the following two experiments. ==> Each configuration file also contains references to CLOCKTICKS which are the number of CPU clock ticks that are equivalent to a single resource clock tick.

Experiment 1: Initial configuration file

INIT=g
MAXITER=3000
REQUEST=30
REQUESTNONSHARE=25
RESOURCESEPARATOR
RESOURCETYPE=PRINTER
RESOURCEID=1
SHAREABLE=0
DATABLOCKSIZE=5
MAXJOBSIZE=50
CLOCKTICKS=10
RESOURCESEPARATOR
RESOURCETYPE=DISK
RESOURCEID=2
SHAREABLE=1
CLOCKTICKS=3
DATABLOCKSIZE=250
MAXJOBSIZE=5000
RESOURCESEPARATOR
RESOURCETYPE=PRINTER
RESOURCEID=3
SHAREABLE=0
DATABLOCKSIZE=5
CLOCKTICKS=10
MAXJOBSIZE=50
RESOURCESEPARATOR
RESOURCETYPE=RAM
RESOURCEID=4
SHAREABLE=1
DATABLOCKSIZE=500
CLOCKTICKS=1
MAXJOBSIZE=10000

Experiment 2: Initial configuration file

INIT=g
MAXITER=5000
REQUEST=30
REQUESTNONSHARE=25
RESOURCESEPARATOR
RESOURCETYPE=PRINTER1
RESOURCEID=1
SHAREABLE=0
DATABLOCKSIZE=5
MAXJOBSIZE=50
CLOCKTICKS=10
RESOURCESEPARATOR
RESOURCETYPE=DISK
RESOURCEID=2
SHAREABLE=1
DATABLOCKSIZE=250
MAXJOBSIZE=5000
CLOCKTICKS=3
RESOURCESEPARATOR
RESOURCETYPE=PRINTER2
RESOURCEID=3
SHAREABLE=0
DATABLOCKSIZE=5
MAXJOBSIZE=50
CLOCKTICKS=8
RESOURCESEPARATOR
RESOURCETYPE=RAM
RESOURCEID=4
SHAREABLE=1
DATABLOCKSIZE=500
MAXJOBSIZE=10000
CLOCKTICKS=1

You should, of course, also test your program with more resources, fewer and more processes, different data transfer rates, etc.

When the program terminates, one statistic that is given for each resource is the amount of time it was free (idle). Of course, it is not efficient for resources to be idle for most of their time. Another statistic is the average amount of time a request is waiting to be serviced. It is not efficient for requests to be delayed indefinitely, because the processes they belong to cannot continue executing, resulting in many frustrated users. Experiment with combinations of values and resources in the initial configuration files. Which combinations appear to approach the ideal situation of all resources being in constant use and requests being serviced almost immediately?

On average, and assuming a reasonable level of competence with C, this program should take you approximately 2.5 days of effort to code and test, and another 1.5 days of effort to document.

Have fun!