CSM210 (Part I) Assessed Project.

Date due: The project must be submitted at the end-of-semester test for CSM210 
(Part I), which is due to be held in January 1997.

Project Description

The project entails a re-implementation of the UNIX wc (word count) utility program, 
with some modifications and additions.

It is important that your implementation of wc (called wc+) follows the requirements 
detailed in this description. Failure to do so could result in the deduction of marks.

Program Requirements

NAME
	wc+ - display a count of lines, words, and characters in a file. Additionally, wc+ 
will display lists of words in the input sorted both alphabetically and according to 
word frequency.
 
SYNOPSIS
	wc+ [ -clLws ] [ name ... ]
 
DESCRIPTION
	wc+ counts lines, words, and characters in the named text files, or in the standard 
input if no names appear. It also keeps a total count for all named files. A word 
is a string of characters delimited by a SPACE, a TAB, or any other character 
recognised by the library function iswspace() (see the UNIX manual pages for 
a description of iswspace()). wc+ also returns an alphabetically sorted list of 
the words in each file and a list of those words sorted in descending order of word 
frequency. It also returns the most frequently occurring word in each named file, 
together with its frequency. 
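
The counting rule described above can be sketched as a small state machine. This is 
only an illustration, not a required design: the names struct counts, count_char() 
and count_stream() are hypothetical, and the sketch assumes that "characters" means 
wide characters as read by fgetwc() rather than bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical running totals for one input; the names are illustrative. */
struct counts {
    long lines;
    long words;
    long chars;
    int  in_word;   /* non-zero while the scanner is inside a word */
};

/* Update the totals for a single character. */
void count_char(struct counts *c, wint_t ch)
{
    assert(c != NULL);
    c->chars++;
    if (ch == L'\n')
        c->lines++;
    if (iswspace(ch)) {
        c->in_word = 0;         /* a delimiter ends the current word */
    } else if (!c->in_word) {
        c->in_word = 1;         /* first character of a new word */
        c->words++;
    }
}

/* Count a whole stream. The caller is assumed to have called
 * setlocale(LC_ALL, "") so that multibyte input decodes correctly. */
struct counts count_stream(FILE *fp)
{
    struct counts c = {0, 0, 0, 0};
    wint_t ch;

    assert(fp != NULL);
    while ((ch = fgetwc(fp)) != WEOF)
        count_char(&c, ch);
    return c;
}
```

Keeping the per-character step separate from the stream loop makes it possible to 
test the word-boundary logic on a string without opening a file.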
 
OPTIONS
	When names are specified on the command line, each name is printed along with 
its counts.
 
	If no option is specified, the default is -lwc (count lines, words, and characters).
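
One possible way to read the option letters in the SYNOPSIS is the POSIX getopt() 
function. The sketch below is a suggestion only: struct options and parse_options() 
are hypothetical names, and it assumes that the -lwc default applies only when no 
option at all is given.

```c
/* Ask for the POSIX declarations (getopt) even in strict ISO C mode. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Hypothetical flag set; the field names are illustrative. */
struct options {
    int chars, lines, words;    /* -c, -l, -w */
    int list, freq;             /* -L, -s */
};

/* Parse the command line. Returns the index of the first file name,
 * or -1 if an unsupported option was given. */
int parse_options(int argc, char *argv[], struct options *opt)
{
    struct options none = {0, 0, 0, 0, 0};
    int ch;

    assert(argv != NULL && opt != NULL);
    *opt = none;
    optind = 1;                 /* start scanning at the first argument */
    while ((ch = getopt(argc, argv, "clLws")) != -1) {
        switch (ch) {
        case 'c': opt->chars = 1; break;
        case 'l': opt->lines = 1; break;
        case 'L': opt->list  = 1; break;
        case 'w': opt->words = 1; break;
        case 's': opt->freq  = 1; break;
        default:  return -1;    /* getopt() has already printed a message */
        }
    }
    /* Assumption: the default applies only when no option was given. */
    if (!opt->chars && !opt->lines && !opt->words && !opt->list && !opt->freq)
        opt->chars = opt->lines = opt->words = 1;
    return optind;
}
```

The returned index tells main() where the file names start; if it equals argc, the 
program reads the standard input.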
 
	-c	Count characters.
  
	-l 	Count lines.
 
	-w	Count words delimited by white space characters or newline characters. 
	Delimiting characters are Extended Unix Code (EUC) characters from any 
	code set defined by iswspace().

	-L	Return an alphabetically sorted list of words that appear in the input. Words 
	must not appear more than once in the list. Capital letters are not significant 
	(i.e., "West" and "west" are considered identical) and acronyms are treated as 
	whole words (e.g., "A.R.C" is one word). Words composed of digits (e.g., 
	telephone numbers) should be sorted according to their numerical value (e.g., 
	12345 is greater than 213).

	-s	Return a list of words and their frequency in descending order of word 
	frequency. 
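
The ordering rules for -L can be captured in a single comparison function. The 
following is a sketch under stated assumptions, not a prescribed solution: 
compare_words() is a hypothetical name, words made only of digits are assumed to 
sort before all other words, and leading zeros are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return non-zero if s is non-empty and consists only of digits. */
static int all_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Compare two digit strings by numerical value without converting them,
 * so that long telephone numbers cannot overflow an integer type. */
static int compare_numbers(const char *a, const char *b)
{
    size_t la, lb;

    while (*a == '0' && a[1] != '\0') a++;   /* skip leading zeros */
    while (*b == '0' && b[1] != '\0') b++;
    la = strlen(a);
    lb = strlen(b);
    if (la != lb)
        return la < lb ? -1 : 1;             /* more digits: larger value */
    return strcmp(a, b);                     /* same length: digit order */
}

/* Ordering for the -L list: negative, zero or positive, like strcmp(). */
int compare_words(const char *a, const char *b)
{
    int na, nb;

    assert(a != NULL && b != NULL);
    na = all_digits(a);
    nb = all_digits(b);
    if (na && nb)
        return compare_numbers(a, b);
    if (na != nb)
        return na ? -1 : 1;     /* assumption: numbers precede other words */
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == '\0')
            return 0;           /* equal apart from letter case */
    }
}
```

Two words for which compare_words() returns zero count as the same entry, which 
gives the "no duplicates" rule. Placing all numbers in one group keeps the ordering 
consistent; mixing numeric and character comparison freely could produce cycles.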

Additional Requirements

The functions for -L and -s may be coded in the same C source file. The function for 
each of the other command line options, the main() function, and the function that 
tests for error conditions must each be coded in a separate C source file.

In addition to checking the usual error conditions (the named files exist, the user has 
permission to read them, the command line options given are supported, etc.), the 
program must check that the named files are text files.

The functions which cater for the -L and -s command line arguments must use linked 
lists of data structures to store the words and word frequencies. Memory allocated for 
the lists must be managed dynamically.
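
A linked list meeting this requirement might look like the sketch below. It is 
illustrative only: struct word_node, list_add(), list_sort_by_count() and 
list_free() are hypothetical names, and plain strcmp() is used for ordering, so 
words are assumed to have been converted to lower case before they are added.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: one node per distinct word. */
struct word_node {
    char *word;                 /* heap copy of the word */
    unsigned long count;        /* occurrences seen so far */
    struct word_node *next;
};

/* Record one occurrence of word, keeping the list in strcmp() order.
 * Returns 0 on success or -1 if memory cannot be allocated. */
int list_add(struct word_node **head, const char *word)
{
    struct word_node **link = head;
    struct word_node *node;
    int cmp = 1;

    assert(head != NULL && word != NULL);
    /* Advance to the first node that does not sort before word. */
    while (*link != NULL && (cmp = strcmp((*link)->word, word)) < 0)
        link = &(*link)->next;
    if (*link != NULL && cmp == 0) {
        (*link)->count++;       /* word already present */
        return 0;
    }
    node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->word = malloc(strlen(word) + 1);
    if (node->word == NULL) {
        free(node);
        return -1;
    }
    strcpy(node->word, word);
    node->count = 1;
    node->next = *link;         /* splice in before the larger word */
    *link = node;
    return 0;
}

/* Relink the nodes in descending order of count (insertion sort).
 * Words with equal counts keep their existing relative order. */
struct word_node *list_sort_by_count(struct word_node *head)
{
    struct word_node *sorted = NULL;

    while (head != NULL) {
        struct word_node *node = head;
        struct word_node **link = &sorted;

        head = head->next;
        while (*link != NULL && (*link)->count >= node->count)
            link = &(*link)->next;
        node->next = *link;
        *link = node;
    }
    return sorted;
}

/* Release every node and the word it owns. */
void list_free(struct word_node *head)
{
    while (head != NULL) {
        struct word_node *next = head->next;
        free(head->word);
        free(head);
        head = next;
    }
}
```

After list_sort_by_count() the head of the list holds the most frequently occurring 
word. Insertion sort is quadratic in the number of distinct words; it is used here 
only because it is short and relinks nodes without extra allocation.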

The C source code must be compilable by the UNIX gcc compiler.

Documentation

The documentation must describe each function of the wc+ program, and how the 
functions interact. Evidence that the wc+ program has been fully tested must be 
documented. The program listing of each C source file and any header files you have 
written must also be included.

Important Note

Plagiarism will not be tolerated and any candidates found to have plagiarised will risk 
failing the entire double credit for CSM210.