CSM217 Assessed Project for the January 2002 session

Date due: The practical must be submitted before the end of semester test for CSM217 due to be held in January 2002.

This assessed programming task is worth 40% of the credit CSM217 (C for Computer Scientists).

Rules and Regulations

These are IMPORTANT. Please read them, and if in any doubt, seek clarification from me PRIOR to the submission of the assessed practical task.

Your programs MUST be compilable using the version of UNIX gcc installed on a UNIX server of the Department of Computer Science and AI. If the examiner is unable to compile the program on at least one of the UNIX servers provided by the Department of Computer Science and AI, the examiner will penalise the submission accordingly.

Plagiarism will not be tolerated. Students found to have plagiarised will fail the credit and will risk being expelled from their respective degree course. THIS IS FOR REAL.

The deadline for the assessed practical task will be the time of the credit test for CSM217. The task must be submitted to Room 202, New Computing Building, University of Malta, Tal-Qroqq, Msida, and must be signed in as proof of submission. Late submission of tasks will attract an immediate 50% penalty (regardless of the reason for lateness) with an additional 10% penalty for each subsequent day of late submission, weekends included. In the event that a candidate is sick on the day of the credit test, the candidate must ensure that the assessed practical task is delivered to the location specified above together with the medical certificate (which must arrive within 1 hour of the start of the exam). Note that the penalties referred to in this document apply only to the 40% allocated to the assessed practical task.

The examiner reserves the right to ask *any* candidate to defend his or her submission via an oral examination prior to the results being published for this credit.

Failing candidates: in the event of a resit, the marks awarded for the assessed practical task in the first sit will stand, unless the candidate gives notice that they intend to resubmit the assignment at the time they register for the resit. In this case, the submission will be worth only 20%, with the written part of the examination worth 60%. It is not possible for the marks awarded to the first submission to be disclosed to resitting candidates.

Submission Guidelines

Please follow these instructions carefully. Failure to comply with these instructions may lead to loss of marks. If in doubt, please seek the lecturer's advice.

Your project will be anonymous. Your name, ID number, student registration number, etc. must not appear on anything that will be given to the examiner.

When you submit your project to Rm 202 (see above), you will be given a cover sheet to fill in with your personal details. This information will not be passed on to the examiner.

Project Description

One of the more laborious aspects of application development is unit testing program code. When this is done properly, it means creating data which will exhaustively test a program, so that logical errors can be found. It is usually not possible to exhaustively test a program in a single run. For example, if a program contains an instruction to open a file and to terminate if the file cannot be opened, then it is necessary to run the program at least twice - once when the input file can be opened, and once when it cannot.

Properly testing a program means that all possible flow paths through the program must be executed, and that the output from each flow path must be identical to the expected output.

You are to write a utility which will assist a unit tester to determine whether all of a program has been tested. This means that each decision point in a program must be uniquely identified and marked up to determine if control has passed through it on a particular run.

As a program can be tested cumulatively, by running it several times using different data sets, it is necessary to establish which decision points have been executed during each run, and to identify which decision points have never been executed, so that the tester can create specific data to force those points to be executed.

Write a C program, called dataTester.c which takes a C program file as its input and which marks up all decision points in the input.

At each decision point in the input program, dataTester.c will add to the input program calls to a function called dataTester which will indicate that that specific point has been reached when the input program is run.

For example, given the original source code

...

if (isprint(a)) {
 //if a is a printable character then do...
 
} else {
 // otherwise do...
}

dataTester.c may produce

...

dataTester(24);
 if (isprint(a)) {
  dataTester(25);
  // if a is a printable character then do...
 
 } else { 
   dataTester(26);
  // otherwise do...  

 }
dataTester(27);

The call to dataTester does something to record that program control has passed through that point. In this way, when the program is executed, you can tell which path through the program the data caused the flow of control to take (e.g., either points 24, 25, and 27, or 24, 26, and 27).

You will notice that the calls to dataTester are inserted just before, inside and just after each decision point. This proves that the decision point was reached, that its body was executed when the conditional was true, and that the decision point was successfully passed through. However, you must not have two successive calls to dataTester. For example,

dataTester(1);
for (x = 0; x < 10; x++) {
 dataTester(2);
 printf("Hello\n");
}
dataTester(3);
dataTester(4);
for (x = 0; x < 10; x++) {
 dataTester(5);
 printf("Goodbye\n");
}
dataTester(6);

In this case, dataTester(4) is unnecessary, because it is impossible to execute dataTester(4) without also executing dataTester(3). They represent a sequence of events, rather than a branch. By executing dataTester(3) you demonstrate not only that the first loop successfully terminated, but also that you are about to execute the second loop.
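
One possible way to enforce this rule is sketched below. It is only an illustration, not a required design: the Emitter structure and the functions emit_code and emit_marker are hypothetical names, and the sketch assumes that the markup tool writes its output piece by piece. A call to dataTester is suppressed whenever nothing but whitespace has been written since the previous call.

```c
#include <stdio.h>
#include <ctype.h>

/* Hypothetical output state kept by the markup tool while it writes. */
typedef struct {
    FILE *out;            /* where the marked-up source is written        */
    int next_id;          /* identifier for the next dataTester call      */
    int last_was_marker;  /* non-zero if only whitespace has been written
                             since the last dataTester call               */
} Emitter;

/* Copy a piece of the original source to the output. */
void emit_code(Emitter *e, const char *text)
{
    const char *p;

    fputs(text, e->out);
    for (p = text; *p != '\0'; p++) {
        if (!isspace((unsigned char)*p)) {
            e->last_was_marker = 0;   /* real code now separates markers */
            break;
        }
    }
}

/* Write a dataTester call unless the previous output was also one.
 * Returns the identifier used, or 0 if the call was suppressed. */
int emit_marker(Emitter *e)
{
    if (e->last_was_marker)
        return 0;
    fprintf(e->out, "dataTester(%d);\n", e->next_id);
    e->last_was_marker = 1;
    return e->next_id++;
}
```

With this approach the redundant call in the example above is never written, so the identifiers after it are numbered consecutively without a gap.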

The decision points which you are to mark up are

if
while
do
(user defined function) function calls (you may ignore system library function calls)
conditional expressions of the form ?:
for
switch

Ignore decision points that the user has commented out. Do not merely mark up each and every statement :-)
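
As a rough sketch of how commented-out decision points might be ignored, the code below blanks out comments and the contents of string and character literals before any keyword search takes place. The function names blank_comments and find_keyword are hypothetical, and the sketch assumes that the whole source file has been read into one modifiable buffer. It does not handle a backslash-newline inside a // comment, and find_keyword expects a non-empty keyword.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Overwrite comments and the contents of string/character literals with
 * spaces, in place. Newlines are kept so that line numbers stay valid. */
void blank_comments(char *s)
{
    enum { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING, CHARLIT } state = CODE;
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        char next = s[i + 1];

        switch (state) {
        case CODE:
            if (c == '/' && next == '/') {
                state = LINE_COMMENT;
                s[i] = ' ';
            } else if (c == '/' && next == '*') {
                state = BLOCK_COMMENT;
                s[i] = s[i + 1] = ' ';
                i++;
            } else if (c == '"') {
                state = STRING;
            } else if (c == '\'') {
                state = CHARLIT;
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') state = CODE;
            else s[i] = ' ';
            break;
        case BLOCK_COMMENT:
            if (c == '*' && next == '/') {
                s[i] = s[i + 1] = ' ';
                i++;
                state = CODE;
            } else if (c != '\n') {
                s[i] = ' ';
            }
            break;
        case STRING:
        case CHARLIT:
            if (c == '\\' && next != '\0') {
                s[i] = ' ';                      /* blank the backslash  */
                if (next != '\n') s[i + 1] = ' ';
                i++;                             /* skip the escaped char */
            } else if (c == (state == STRING ? '"' : '\'')) {
                state = CODE;                    /* closing quote is kept */
            } else if (c != '\n') {
                s[i] = ' ';
            }
            break;
        }
    }
}

/* Find kw as a whole word in s, so that "if" does not match "ifdef". */
const char *find_keyword(const char *s, const char *kw)
{
    size_t len = strlen(kw);
    const char *p = s;

    while ((p = strstr(p, kw)) != NULL) {
        int start_ok = (p == s) ||
                       !(isalnum((unsigned char)p[-1]) || p[-1] == '_');
        int end_ok = !(isalnum((unsigned char)p[len]) || p[len] == '_');
        if (start_ok && end_ok)
            return p;
        p++;
    }
    return NULL;
}
```

Because the blanked buffer has the same length and the same newlines as the original, a position found in it can be used directly as a position in the original source.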

dataTester.c should take care not to overwrite the original source file, and must write protect the source file created by dataTester.c to prevent the user from accidentally modifying the marked up code. Given a source file, dataTester.c should check if it has already created a modified file, and if so the user should be interactively asked if the output file should be overwritten. dataTester.c must change the write protection on the existing output file before attempting to overwrite the file if the user gives permission. For example, if dataTester.c is given test.c to process, it normally creates a write protected file test.dataTester.c. If test.dataTester.c already exists, the user must give permission to overwrite the file.
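
The file handling described above could be sketched as follows. This is a minimal illustration that assumes a POSIX system (stat and chmod from sys/stat.h); all of the function names are hypothetical, and the permission bits chosen are only one reasonable option.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Build "test.dataTester.c" from "test.c". Returns 0 on success, -1 if
 * the input name does not end in ".c" or the buffer is too small. */
int make_output_name(const char *in, char *out, size_t outsize)
{
    size_t len = strlen(in);
    int n;

    if (len < 3 || strcmp(in + len - 2, ".c") != 0)
        return -1;
    n = snprintf(out, outsize, "%.*s.dataTester.c", (int)(len - 2), in);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}

/* Return 1 if path exists, 0 otherwise. */
int file_exists(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0;
}

/* Make the file read-only for everyone. Returns 0 on success. */
int write_protect(const char *path)
{
    return chmod(path, S_IRUSR | S_IRGRP | S_IROTH);
}

/* Give the owner write permission again before overwriting. */
int write_unprotect(const char *path)
{
    return chmod(path, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
}

/* Ask the user for permission; only an answer starting with y or Y
 * counts as yes. */
int confirm_overwrite(const char *path)
{
    char answer[16];

    printf("%s already exists. Overwrite? (y/n) ", path);
    fflush(stdout);
    if (fgets(answer, sizeof answer, stdin) == NULL)
        return 0;
    return answer[0] == 'y' || answer[0] == 'Y';
}
```

A typical sequence would be: build the output name, and if file_exists reports that it is already there, call confirm_overwrite and then write_unprotect before opening it for writing. After the marked-up code has been written and closed, write_protect is called on it.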

After test.dataTester.c is compiled and executed, it must tell the user which path(s) through the program were executed, how much of the program was executed on this run, how much of the program has been tested over cumulative runs, and which decision points have not yet been tested (cumulatively).

It would be useful if a temporary file contains the last decision point successfully reached, so that if the program terminates abnormally, then the user can look up the location of the last decision point which will indicate at which point the program may have failed.
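
A minimal sketch of such a temporary file is given below. The function names and the idea of storing the identifier as a single line of text are assumptions made for illustration. The file is opened and closed on every call, so that its contents are already on disk if the program under test then terminates abnormally.

```c
#include <stdio.h>

/* Overwrite the file at path with the identifier of the decision point
 * just reached. Returns 0 on success, -1 on failure. */
int record_last_point(const char *path, int id)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;
    if (fprintf(fp, "%d\n", id) < 0) {
        fclose(fp);
        return -1;
    }
    /* Closing flushes the buffered data before control returns. */
    return fclose(fp) == 0 ? 0 : -1;
}

/* Read back the last recorded decision point, or -1 if none is found. */
int read_last_point(const char *path)
{
    FILE *fp = fopen(path, "r");
    int id = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &id) != 1)
        id = -1;
    fclose(fp);
    return id;
}
```

Reopening the file at every decision point is slow but simple. Keeping the file open and calling fflush after each write is an alternative with the same intent.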

Beware of single-step statements inside conditionals which are not enclosed within control blocks ({}), because adding a line of code may result in a logical error being introduced by your program. For example, if the original code was

...

if (a > 0)
 printf("Hello\n");
...

The *correct* way to amend this is

...
dataTester();
if (a > 0) { // notice the introduction of { and ...
 dataTester();
 printf("Hello\n");
} // the introduction of }
dataTester();
...

Without the introduction of { and }, only the call to dataTester would be conditionally executed depending on the value of a. The printf statement would be executed unconditionally. Because of the complexity of tackling this particular problem, it is sufficient if dataTester.c simply indicates that it has failed to modify the original program because the { is missing. In this case, give the user the line number in the original program from which the { is missing. You may otherwise assume that the original source code given as input to dataTester.c compiles without error.
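
Detecting the missing { might be approached along the following lines. This sketch covers only keywords that are followed by a parenthesised condition (if, while, for, switch), uses the hypothetical names body_is_braced and line_of, and assumes that comments and string and character literals have already been blanked out, so that every parenthesis seen belongs to the condition.

```c
#include <ctype.h>
#include <stddef.h>

/* p points just past a keyword such as "if". Returns 1 if the statement
 * controlled by the keyword starts with '{', 0 if it does not, and -1 if
 * no balanced parenthesised condition follows the keyword. */
int body_is_braced(const char *p)
{
    int depth = 0;

    while (isspace((unsigned char)*p))
        p++;
    if (*p != '(')
        return -1;
    do {                              /* skip the balanced ( ... ) */
        if (*p == '(')
            depth++;
        else if (*p == ')')
            depth--;
        else if (*p == '\0')
            return -1;
        p++;
    } while (depth > 0);
    while (isspace((unsigned char)*p))
        p++;
    return *p == '{';
}

/* 1-based line number of pos within the source text src. */
int line_of(const char *src, const char *pos)
{
    int line = 1;

    for (; src < pos; src++)
        if (*src == '\n')
            line++;
    return line;
}
```

When body_is_braced returns 0, line_of can be applied to the position of the keyword to obtain the line number to report to the user.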

What should the function dataTester do?

dataTester must keep track of which decision points have been executed in each run of the program being tested. In particular, it must report which decision points have not been executed in any of the test runs. At the end of each program run it must indicate what percentage of the decision points have never been tested. dataTester must also keep a record (on disk) of the last decision point that was executed during a test run, so that if the program being tested terminates abnormally, the programmer can see between which decision points the program was when it terminated.

You are unable to calculate the overall percentage of decision points executed without knowing the total number of decision points in the source code. Armed with this information, it is technically possible to define a fixed-length array to store the information about whether a decision point has ever been accessed. However, to slightly increase the complexity of the program, and because this course is entitled "C for Computer Scientists", you are required to utilise a more space efficient data structure to store this information. You must also use dynamic memory allocation to extend the data structure as more space is needed to store encountered decision points. You may use any data structure to achieve this, as long as you do not pre-allocate all the required space for it. Some suitable data structures are: linked lists, balanced trees.... Dynamic arrays are allowed, but you will lose 5% of your marks (i.e., your assignment will be graded out of 35% instead of 40%).

You must also store a list of decision points in the order that they are executed each run. This last list must be preserved for post-run checking, so that a unit tester will be able to determine that the flow of execution was as expected, given the input data. For example, consider the execution of the following program fragment

...
int a[] = {5, 10, 0};
int x;
dataTester(1);
for (x = 0; a[x] != 0; x++) {
 dataTester(2);
 if (a[x] > 6) {
  dataTester(3);
  printf("Error!\n");
 }
 dataTester(4);
}
dataTester(5);
...

Assuming that the program runs to completion, the following path of execution is produced:

1, 2, 4, 2, 3, 4, 5

dataTester would have been able to verify that 100% of the decision points were executed. On the other hand, if a = {5, 4, 0}, then dataTester would have reported that the path of execution was 1, 2, 4, 2, 4, 5; that this run tested points 1, 2, 4, and 5; that 80% of decision points were executed; and that decision point 3 was never executed. If a = {5, 4, 0} was run after a = {5, 10, 0}, then as decision point 3 was executed on a previous run, dataTester would still be able to report that all decision points had been executed.
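
The bookkeeping in this example can be sketched with two singly linked lists that grow one node at a time: one holding the path in execution order, the other holding each distinct decision point reached. This is only one possible design. The names List, Tracker, tracker_hit and tracker_coverage are hypothetical, the total number of decision points is assumed to be supplied by the markup tool, and saving the lists to disk between runs is not shown.

```c
#include <stdio.h>
#include <stdlib.h>

/* One node per recorded decision point, allocated only when needed. */
typedef struct Node {
    int id;
    struct Node *next;
} Node;

typedef struct {
    Node *head;
    Node *tail;
    size_t count;
} List;

/* Append id to the list. Returns 0 on success, -1 if out of memory. */
int list_append(List *l, int id)
{
    Node *n = malloc(sizeof *n);

    if (n == NULL)
        return -1;
    n->id = id;
    n->next = NULL;
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
    l->count++;
    return 0;
}

int list_contains(const List *l, int id)
{
    const Node *n;

    for (n = l->head; n != NULL; n = n->next)
        if (n->id == id)
            return 1;
    return 0;
}

void list_free(List *l)
{
    Node *n = l->head;

    while (n != NULL) {
        Node *next = n->next;
        free(n);
        n = next;
    }
    l->head = l->tail = NULL;
    l->count = 0;
}

typedef struct {
    List path;   /* every decision point reached, in execution order */
    List seen;   /* each distinct decision point reached, once       */
} Tracker;

/* Record that decision point id has been reached. */
int tracker_hit(Tracker *t, int id)
{
    if (list_append(&t->path, id) != 0)
        return -1;
    if (!list_contains(&t->seen, id))
        return list_append(&t->seen, id);
    return 0;
}

/* Percentage of the total decision points reached so far. */
double tracker_coverage(const Tracker *t, size_t total)
{
    if (total == 0)
        return 0.0;
    return 100.0 * (double)t->seen.count / (double)total;
}

/* Print the path as a comma-separated list, e.g. "1, 2, 4, 5". */
void tracker_print_path(const Tracker *t, FILE *out)
{
    const Node *n;

    for (n = t->path.head; n != NULL; n = n->next)
        fprintf(out, "%d%s", n->id, n->next != NULL ? ", " : "\n");
}
```

Feeding the identifiers 1, 2, 4, 2, 4, 5 to tracker_hit leaves six entries in path and four in seen, so tracker_coverage with a total of 5 gives 80, which matches the figures quoted above.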

Assumptions

You may assume that each program file in a multi-file program is tested separately. If a function in one program file, which is being tested, calls a function in another program file, you may assume that the function in the other program file has already been tested and that it works correctly. dataTester.c will accept only one program file as its input. It will place tracking information around the calls to the function in the calling program, but it will not add tracking information to the called function in the program file containing it. For example, assume you have two program files, a.c and b.c. a.c #includes b.h (which is the header file for b.c), and calls a function boo() which is defined in b.c. Although dataTester.c will place calls to the function dataTester around the call to boo() in a.c, it will not open b.c and place calls to dataTester in function boo() in the program file b.c. However, you will need to run gcc a.dataTester.c b.c to be able to run the compiled, marked up program correctly.

Deliverables:

C Source code
Object Code, compiled on babe.cs.um.edu.mt
Documentation, including a brief section on weaknesses (if any) of your approach. If your solution has no weaknesses, please explain why
Evidence that your program has been adequately tested
A further section of the documentation on conditional compilation. The #if family of preprocessor commands can be used to conditionally compile different parts of the source code. Although you do not have to cater for this specifically in dataTester.c, you should explain the side effects of either ignoring or including conditional compilation commands.

Guidelines for the documentation

Your documentation should normally not exceed 15 pages in length (excluding source code listing). You will normally describe, in your own words, the problem you are trying to solve; the solution you implemented, and why that particular solution, rather than any other solution; problems you encountered, and how you solved them; major data structures used and the operations on those data structures; evidence that your program works; an example user session; weaknesses of your approach (including things required but not implemented); and future enhancements. For this particular exercise, you should also explain the side effects of dataTester.c ignoring or including conditional compilation commands (#if, etc.).

On average, and assuming a reasonable level of competence with C, this program should take you approximately 2 days of effort to code and test, and another 2 days of effort to document.