CSM210 Assessed Project for the January 2001 session

Date due: The practical must be submitted before the end-of-semester test for CSM210, which is due to be held in January 2001.

This assessed programming task is worth 40% of the credit CSM210 (Programming in C).

Rules and Regulations

These are IMPORTANT. Please read them, and if in any doubt, seek clarification from me PRIOR to the submission of the assessed practical task.

Your programs MUST be compilable using the version of UNIX gcc installed on a UNIX server of the Department of Computer Science and AI. If the examiner is unable to compile the program on at least one of the UNIX servers provided by the Department of Computer Science and AI, the examiner will penalise the submission accordingly.

Plagiarism will not be tolerated. Students found to have plagiarised will fail the credit and will risk being expelled from their respective degree course. THIS IS FOR REAL.

The deadline for the assessed practical task will be the time of the credit test for CSM210. The task must be submitted to Room 202, New Computing Building, University of Malta, Tal-Qroqq, Msida, and must be signed in as proof of submission. Late submission of tasks will attract an immediate 50% penalty (regardless of the reason for lateness) with an additional 10% penalty for each subsequent day of late submission, weekends included. In the event that a candidate is sick on the day of the credit test, the candidate must ensure that the assessed practical task is delivered to the location specified above together with the medical certificate (which must arrive within 1 hour of the start of the exam). Note that the penalties referred to in this document apply only to the 40% allocated to the assessed practical task.

The examiner reserves the right to ask *any* candidate to defend his or her submission via an oral examination prior to the results being published for this credit.

Failing candidates: in the event of a resit, the marks awarded for the assessed practical task in the first sitting will stand, unless the candidate gives notice, at the time of registering for the resit, that he or she intends to resubmit the assignment. In this case, the submission will be worth only 20%, with the written part of the examination worth 60%. It is not possible for the marks awarded to the first submission to be disclosed to resitting candidates.

Project Description

The project entails a re-implementation of the UNIX wc (word count) utility program, with some modifications and additions.

It is important that your implementation of wc (called wcPlus) follows the requirements detailed in this description. Failure to do so could result in the deduction of marks.

Program Requirements

NAME
wcPlus - display a count of lines, words, and characters in a file. Additionally, wcPlus will display lists of words in the input sorted both alphabetically and according to word frequency.

SYNOPSIS
wcPlus [ -clLws ] [ name ... ]

DESCRIPTION
wcPlus counts lines, words, and characters in the named text files, or in the standard input if no filenames appear. It also keeps a total count for all named files. A word is a string of characters delimited by a SPACE, a TAB, or any other character for which the library function iswspace() returns true (see the UNIX manual pages for a description of iswspace()). wcPlus also returns an alphabetically sorted list of the words in each file and a list sorted numerically in descending order of word frequency. It also returns the words and frequencies for the most frequently occurring words in each named file.
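The counting rule in the DESCRIPTION above can be sketched as follows. This is only an illustration under stated assumptions, not a model answer: the struct wc_counts type and the count_text() function are hypothetical names, and the sketch works on a wide string already held in memory rather than on a file.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical container for the three counts (not part of the brief). */
struct wc_counts {
    unsigned long lines;
    unsigned long words;
    unsigned long chars;
};

/*
 * Count lines, words, and characters in a NUL-terminated wide string.
 * A word starts at the first non-space character that follows either
 * the start of the text or a character accepted by iswspace().
 */
static struct wc_counts count_text(const wchar_t *text)
{
    struct wc_counts c = {0, 0, 0};
    int in_word = 0;

    assert(text != NULL);
    for (; *text != L'\0'; text++) {
        c.chars++;
        if (*text == L'\n')
            c.lines++;
        if (iswspace((wint_t)*text)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            c.words++;
        }
    }
    return c;
}
```

A file-based version would presumably read its input with fgetwc() after a call to setlocale(), and would feed each character through the same state machine.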

OPTIONS
When filenames are specified on the command line, they are printed along with the counts.

If no option is specified, the default is -lwc (count lines, words, and characters).

-c Count characters.

-l Count lines.

-w Count words delimited by white space characters or new line characters. Delimiting characters are Extended Unix Code (EUC) characters from any code set defined by iswspace().

-L Return an alphabetically sorted list of words that appear in the input. Words must not appear more than once in the list. Capital letters are not significant (i.e., "West" and "west" are considered identical) and acronyms are treated as whole words (e.g., "A.R.C" is one word). Words composed of digits (e.g., telephone numbers) should be sorted according to their numerical value (e.g., 12345 is greater than 213).

-s Return a list of words and their frequency in descending order of word frequency.
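The ordering rules given for -L above (capital letters not significant, words composed of digits ordered by numerical value) could be captured in a single comparison function, roughly as sketched below. The function names are hypothetical. The brief does not say how digit words rank against other words, so the sketch assumes that digit words come first. It also ignores leading zeros, which is another assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if s is non-empty and consists only of decimal digits. */
static int is_all_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Compare two digit strings by value without converting them, so that
 * long numbers (e.g. telephone numbers) cannot overflow an integer. */
static int compare_digits(const char *a, const char *b)
{
    size_t la, lb;
    int r;

    while (*a == '0' && a[1] != '\0') a++;   /* skip leading zeros */
    while (*b == '0' && b[1] != '\0') b++;
    la = strlen(a);
    lb = strlen(b);
    if (la != lb)
        return la < lb ? -1 : 1;             /* more digits: larger value */
    r = strcmp(a, b);
    return (r > 0) - (r < 0);
}

/* Return <0, 0, or >0 as word a sorts before, with, or after word b.
 * Assumption: digit words sort before all other words. */
static int compare_words(const char *a, const char *b)
{
    int da, db;

    assert(a != NULL && b != NULL);
    da = is_all_digits(a);
    db = is_all_digits(b);
    if (da && db)
        return compare_digits(a, b);
    if (da != db)
        return da ? -1 : 1;
    for (;; a++, b++) {                      /* case-insensitive compare */
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == '\0')
            return 0;
    }
}
```

Because a result of 0 means that two words are considered identical, the same function could also be used to keep duplicates out of the list.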

Additional Requirements

Apart from the functions for -L and -s, which can be coded in the same C source file, each function for the other command line arguments, the main() function, and the function to test for error conditions must be coded in separate C source files.

In addition to the usual checks (that the named files exist, that the user has permission to read them, that the command line arguments given are supported, etc.), the program must check that the named files are text files.
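The brief does not define what counts as a text file, so any test is necessarily a heuristic. The sketch below shows one possible heuristic and nothing more: it rejects a buffer containing a NUL byte or an ASCII control character other than white space, and it accepts bytes above 0x7F on the assumption that they may belong to an extended character set. The function name looks_like_text() is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Heuristic check: return 1 if the first len bytes of buf look like text,
 * 0 otherwise. An empty buffer is treated as text.
 */
static int looks_like_text(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        unsigned char ch = buf[i];

        if (ch == '\0')
            return 0;                 /* NUL bytes suggest binary data */
        if (ch < 0x80 && iscntrl(ch) && !isspace(ch))
            return 0;                 /* control character, not white space */
    }
    return 1;
}
```

In practice the function might be applied to the first few hundred bytes obtained with fread() before any counting is done.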

The functions which cater for the -L and -s command line arguments must use linked lists of data structures to store the words and word frequencies. Memory allocated for the lists must be managed dynamically. If you wish to implement a more complex data structure (e.g., a balanced tree), you must first seek approval from your lecturer (i.e., me!).
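As a rough illustration of the linked-list requirement above, the following sketch stores each distinct word with its frequency in a dynamically allocated node and reorders the list by descending frequency. All names are hypothetical, and the sketch compares words exactly with strcmp(). A real submission would need the case-insensitive rule given for -L.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One list node per distinct word. */
struct word_node {
    char *word;
    unsigned long count;
    struct word_node *next;
};

/* Record one occurrence of word. Returns 0 on success, -1 if out of memory. */
static int list_add(struct word_node **head, const char *word)
{
    struct word_node *n;

    assert(head != NULL && word != NULL);
    for (n = *head; n != NULL; n = n->next) {
        if (strcmp(n->word, word) == 0) {
            n->count++;               /* word already in the list */
            return 0;
        }
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->word = malloc(strlen(word) + 1);
    if (n->word == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->word, word);
    n->count = 1;
    n->next = *head;                  /* new words go at the front */
    *head = n;
    return 0;
}

/* Insertion sort: relink the nodes in descending order of count. */
static void list_sort_by_count(struct word_node **head)
{
    struct word_node *sorted = NULL;
    struct word_node *n;

    assert(head != NULL);
    n = *head;
    while (n != NULL) {
        struct word_node *next = n->next;
        struct word_node **p = &sorted;

        while (*p != NULL && (*p)->count >= n->count)
            p = &(*p)->next;
        n->next = *p;
        *p = n;
        n = next;
    }
    *head = sorted;
}

/* Release every node and the word stored in it. */
static void list_free(struct word_node *head)
{
    while (head != NULL) {
        struct word_node *next = head->next;

        free(head->word);
        free(head);
        head = next;
    }
}
```

Searching the list on every insertion is linear in the number of distinct words, which is acceptable for a sketch but may be slow on large inputs.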

The C source code must be compilable by the UNIX gcc compiler, and the executable must run on one of the UNIX hosts belonging to the Department of Computer Science and AI (e.g., babe.cs.um.edu.mt).

Documentation

The documentation must describe each function of the wcPlus program, and how the functions interact. Evidence that the wcPlus program has been fully tested must be documented. The program listing of each C source file and any header files you have written must also be included.

Deliverables

1. Electronic version of the C source code, which must be compilable by gcc on one of the department's UNIX servers. The source code should be adequately commented.

2. Electronic version of the C object code, which must have been generated by gcc on one of the department's UNIX servers.

3. Written documentation to include: