CSM210 Assessed Project for the May/June 2001 session

Date due: The practical must be submitted before the end-of-semester test for CSM210, to be held in May/June 2001.

This assessed programming task is worth 40% of the credit for CSM210 (Programming in C).

Rules and Regulations

These are IMPORTANT. Please read them, and if in any doubt, seek clarification from me PRIOR to the submission of the assessed practical task.

Your programs MUST be compilable using the version of gcc installed on a UNIX server of the Department of Computer Science and AI. If the examiner is unable to compile the program on at least one of the UNIX servers provided by the Department of Computer Science and AI, the examiner will penalise the submission accordingly.

Plagiarism will not be tolerated. Students found to have plagiarised will fail the credit and will risk being expelled from their respective degree course. THIS IS FOR REAL.

The deadline for the assessed practical task will be the time of the credit test for CSM210. The task must be submitted to Room 202, New Computing Building, University of Malta, Tal-Qroqq, Msida, and must be signed in as proof of submission. Late submission of tasks will attract an immediate 50% penalty (regardless of the reason for lateness), with an additional 10% penalty for each subsequent day of late submission, weekends included. If a candidate is sick on the day of the credit test, the candidate must ensure that the assessed practical task is delivered to the location specified above together with the medical certificate, which must arrive within one hour of the start of the exam. Note that the penalties referred to in this document apply only to the 40% allocated to the assessed practical task.

The examiner reserves the right to ask *any* candidate to defend his or her submission via an oral examination prior to the results being published for this credit.

Failing candidates: in the event of a resit, the marks awarded for the assessed practical task in the first sit will stand, unless the candidate gives notice, at the time they register for the resit, that they intend to resubmit the assignment. In this case, the submission will be worth only 20%, with the written part of the examination worth 60%. Marks awarded for the first submission cannot be disclosed to resitting candidates.

Submission Guidelines

Please follow these instructions carefully. Failure to comply with these instructions may lead to loss of marks. If in doubt, please seek the lecturer's advice.

Your project will be anonymous. Your name, ID number, student registration number, etc. must not appear on anything that will be given to the examiner.

When you submit your project to Rm 202 (see above), you will be given a cover sheet to fill in with your personal details. This information will not be passed on to the examiner.

Project Description

The project entails a re-implementation of the UNIX grep (pattern search) utility program, with some modifications and additions.

It is important that your implementation of grep (called myGrep) follows the requirements detailed in this description. Failure to do so could result in the deduction of marks.

Program Requirements

NAME
myGrep - search a file for a pattern.

SYNOPSIS
myGrep [ -uicChw ] string [ file_list ... ]
myGrep [ -licCw ] string [ file_list ... ]

DESCRIPTION
myGrep searches files for a string and prints all sentences containing that string.

If a sentence contains n occurrences of string, then the sentence will be printed n times. Each printed sentence is preceded by the name of the file containing it.

By default, comparisons to string are case-sensitive. A sentence is the text between a SENTENCE_START and a SENTENCE_END. A SENTENCE_START occurs when one of the following conditions is true:

The character just read is the first character in the input stream.
The character just read is a white-space character (tab, space, return, etc.) and the previous character was a SENTENCE_END character.

A SENTENCE_END character is one of: period, exclamation mark, question mark, or EOF. Except for EOF, a SENTENCE_END character must be followed by white space, a single or double quotation mark, or a closing bracket. You must handle quotation marks and brackets sensibly.
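The sentence-end rule can be checked one character at a time. The C sketch below is one possible reading of the rule, not a required design: the name sentence_end is hypothetical, the string's NUL terminator stands in for EOF, and it assumes that closing quotation marks or brackets immediately after the terminator belong to the sentence they close, so that the white space after them marks the next SENTENCE_START.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if c can close a quotation or bracket. */
static int is_closer(int c)
{
    return c == '"' || c == '\'' || c == ')' || c == ']' || c == '}';
}

/* Hypothetical helper: if a sentence ends at text[i], return the index one
   past the last character belonging to that sentence; otherwise return 0.
   Assumption: closing quotes/brackets after the terminator stay with the
   sentence they close. The NUL terminator plays the role of EOF. */
size_t sentence_end(const char *text, size_t i)
{
    size_t j;

    if (text[i] != '.' && text[i] != '!' && text[i] != '?')
        return 0;
    j = i + 1;
    while (is_closer((unsigned char)text[j]))
        j++;                        /* absorb  ."  !)  ?'  and similar */
    /* The terminator (plus any closers) must be followed by white space or
       end of input, so the period in "3.14" does not end a sentence. */
    if (text[j] == '\0' || isspace((unsigned char)text[j]))
        return j;
    return 0;
}
```

For example, in He said "Stop!" Then left. the first sentence ends after the closing quotation mark, while the period in 3.14 is followed by a digit and so does not end a sentence.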

If the file_list is missing, myGrep takes its input from standard input.

OPTIONS

-u A sentence is printed only once, even if the string occurs more than once.

-i Ignore the case of string.

-l Print file names only; do not print sentences. Each file name is printed only once.

-c Print the count of occurrences of string in each file.

-C Print the count of sentences containing string.

-h Do not print the file name.

-w Match string only against whole words.
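The difference between -c and -C only shows when a sentence contains string more than once. As a worked example, if one sentence of a file contains three occurrences and another contains one, -c reports 4 while -C reports 2. A minimal C sketch of both tallies, assuming a hypothetical array counts that holds the number of occurrences found in each sentence of the file (struct tally and tally_file are illustrative names):

```c
#include <stddef.h>

/* Hypothetical per-file tallies for the -c and -C options. */
struct tally {
    size_t occurrences;   /* total matches of string: reported by -c */
    size_t sentences;     /* sentences with at least one match: -C   */
};

/* counts[k] is the number of occurrences of string in sentence k. */
struct tally tally_file(const size_t *counts, size_t n)
{
    struct tally t = {0, 0};
    size_t k;

    for (k = 0; k < n; k++) {
        t.occurrences += counts[k];
        if (counts[k] > 0)
            t.sentences++;
    }
    return t;
}
```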

Options -u and -l are mutually exclusive, as are -l and -h. myGrep must report an error if either pair appears together, and must also report an error if an unknown option is encountered. If myGrep reports an error in the options, processing must stop immediately.
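A hand-written option scanner is one way to meet these rules. The sketch below is illustrative only: struct options and parse_options are hypothetical names, combined flags such as -ic are accepted (as the synopsis suggests), and the first argument that does not start with '-' is taken as string. It returns -1 after printing an error so that the caller can stop immediately.

```c
#include <stdio.h>

/* Hypothetical flag set: one field per option in the synopsis. */
struct options {
    int u, i, l, c, C, h, w;
};

/* Scan leading "-xyz" arguments into *opt. Returns the index of the first
   non-option argument (the search string), or -1 after printing an error;
   on -1 the caller should stop processing immediately. */
int parse_options(int argc, char *argv[], struct options *opt)
{
    struct options zero = {0};
    const char *p;
    int k;

    *opt = zero;
    for (k = 1; k < argc && argv[k][0] == '-' && argv[k][1] != '\0'; k++) {
        for (p = argv[k] + 1; *p != '\0'; p++) {
            switch (*p) {
            case 'u': opt->u = 1; break;
            case 'i': opt->i = 1; break;
            case 'l': opt->l = 1; break;
            case 'c': opt->c = 1; break;
            case 'C': opt->C = 1; break;
            case 'h': opt->h = 1; break;
            case 'w': opt->w = 1; break;
            default:
                fprintf(stderr, "myGrep: unknown option -%c\n", *p);
                return -1;
            }
        }
    }
    if (opt->u && opt->l) {
        fprintf(stderr, "myGrep: options -u and -l are mutually exclusive\n");
        return -1;
    }
    if (opt->l && opt->h) {
        fprintf(stderr, "myGrep: options -l and -h are mutually exclusive\n");
        return -1;
    }
    if (k >= argc) {
        fprintf(stderr, "myGrep: missing search string\n");
        return -1;
    }
    return k;
}
```

Checking the mutually exclusive pairs only after all flags have been read means that -lu and -u -l are rejected in the same way.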

For option -w, a whole word is a string that is surrounded by non-alphanumeric characters.
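Because strcmp and strncmp are not allowed, matching has to be done character by character. The sketch below (count_matches and same_char are hypothetical names) counts occurrences of a pattern in one sentence, honouring -i through tolower and -w through isalnum. Two choices here are assumptions rather than requirements: the start and end of the sentence count as word boundaries, and overlapping occurrences are not counted twice.

```c
#include <ctype.h>
#include <stddef.h>

/* Compare two characters, folding case when ignore_case (-i) is set. */
static int same_char(char a, char b, int ignore_case)
{
    if (ignore_case)
        return tolower((unsigned char)a) == tolower((unsigned char)b);
    return a == b;
}

/* Count non-overlapping occurrences of pat in sentence without using
   strcmp or strncmp. With whole_word (-w) set, a match counts only if the
   characters on either side are not alphanumeric; the start and end of
   the sentence are treated as boundaries (an assumption). */
size_t count_matches(const char *sentence, const char *pat,
                     int ignore_case, int whole_word)
{
    size_t n = 0, i, j;

    if (pat[0] == '\0')
        return 0;                       /* design choice: empty pattern */
    for (i = 0; sentence[i] != '\0'; i++) {
        /* Advance j while the pattern agrees with the sentence at i. */
        for (j = 0; pat[j] != '\0' && sentence[i + j] != '\0'; j++)
            if (!same_char(sentence[i + j], pat[j], ignore_case))
                break;
        if (pat[j] != '\0')
            continue;                   /* mismatch, or sentence ran out */
        if (whole_word &&
            ((i > 0 && isalnum((unsigned char)sentence[i - 1])) ||
             isalnum((unsigned char)sentence[i + j])))
            continue;                   /* part of a longer word */
        n++;
        i += j - 1;                     /* resume just after this match */
    }
    return n;
}
```

With -w, searching for cat in concatenate the cat counts one occurrence; without -w it counts two.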

By default, myGrep prints the file name and the matching sentence for each match of string, in the format filename: sentence. This means that if a sentence contains three occurrences of string, then filename: sentence is printed three times.
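The default output, -u and -h can all be handled at the point where a matching sentence is printed. A minimal sketch, with print_sentence as a hypothetical helper that takes the number of occurrences already found in the sentence and returns how many lines it wrote:

```c
#include <stdio.h>

/* Print one sentence according to the output options.
   occurrences: matches of string found in this sentence.
   unique (-u): print the sentence once, however many matches it has.
   hide_name (-h): omit the "filename: " prefix.
   Returns the number of lines written. */
size_t print_sentence(FILE *out, const char *filename, const char *sentence,
                      size_t occurrences, int unique, int hide_name)
{
    size_t times = (unique && occurrences > 0) ? 1 : occurrences;
    size_t k;

    for (k = 0; k < times; k++) {
        if (!hide_name)
            fprintf(out, "%s: ", filename);
        fprintf(out, "%s\n", sentence);
    }
    return times;
}
```

Passing the output stream as a parameter, rather than writing to stdout directly, makes it easy to send the output to a temporary file when testing.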

Additional Requirements

The program must be written using at least three separate source files. The sentence-handling functions and the error-handling functions should each be placed in their own source file.

Apart from the usual error conditions (named files exist, the user has permission to read the named files, the command-line arguments given are supported, etc.), the program must check that the named files are text files. An invalid file should be skipped after printing an appropriate error message.
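The brief does not say how to decide whether a file is a text file, so any test is a judgment call. One common heuristic, sketched below with the hypothetical name looks_like_text, examines the first 512 bytes and rejects the file if it finds a NUL byte or a control character other than ordinary white space; bytes above 127 are accepted so that accented text is not rejected.

```c
#include <ctype.h>
#include <stdio.h>

/* Heuristic text-file test (an assumption, not a rule from the brief).
   Reads up to 512 bytes and returns 0 if it finds a NUL byte or an ASCII
   control character other than white space, 1 otherwise. The stream is
   rewound afterwards so it can then be searched from the beginning. */
int looks_like_text(FILE *fp)
{
    int ch, count = 0, ok = 1;

    while (count < 512 && (ch = getc(fp)) != EOF) {
        count++;
        if (ch == '\0' || (ch < 128 && iscntrl(ch) && !isspace(ch))) {
            ok = 0;
            break;
        }
    }
    rewind(fp);
    return ok;
}
```

A caller would first check that fopen succeeds (reporting a missing or unreadable file with perror), then call looks_like_text and, if it returns 0, print an error such as myGrep: name: not a text file before moving on to the next file.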

You must not use the C string library functions strcmp, strncmp, or any of their derivatives. You must write the complete string matching functions yourself.

Dynamic memory management should be used to manipulate sentences. DO NOT USE ARRAYS OF PREDEFINED LENGTHS.
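One way to avoid fixed-length arrays is to grow each sentence buffer with realloc as characters arrive. The sketch below is an illustration under stated assumptions: read_sentence and append are hypothetical names, capacity starts at 16 bytes and doubles when full, and it applies a simplified version of the sentence-end rule (a terminator, plus any closing quotes or brackets, followed by white space or EOF).

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Append c to the heap buffer *buf (length *len, capacity *cap), doubling
   the capacity with realloc when needed and keeping it NUL-terminated.
   Returns 0 on allocation failure. */
static int append(char **buf, size_t *len, size_t *cap, int c)
{
    if (*len + 1 >= *cap) {
        size_t ncap = *cap ? *cap * 2 : 16;
        char *p = realloc(*buf, ncap);
        if (p == NULL)
            return 0;
        *buf = p;
        *cap = ncap;
    }
    (*buf)[(*len)++] = (char)c;
    (*buf)[*len] = '\0';
    return 1;
}

/* Read the next sentence from fp into a new heap string that the caller
   must free. Leading white space is skipped. Returns NULL at EOF or if
   memory runs out. */
char *read_sentence(FILE *fp)
{
    char *buf = NULL;
    size_t len = 0, cap = 0;
    int c, next;

    while ((c = getc(fp)) != EOF && isspace(c))
        ;                                   /* skip leading white space */
    while (c != EOF) {
        if (!append(&buf, &len, &cap, c)) {
            free(buf);
            return NULL;
        }
        if (c == '.' || c == '!' || c == '?') {
            next = getc(fp);
            while (next == '"' || next == '\'' || next == ')' ||
                   next == ']' || next == '}') {
                if (!append(&buf, &len, &cap, next)) {
                    free(buf);
                    return NULL;
                }
                next = getc(fp);
            }
            if (next == EOF || isspace(next))
                return buf;                 /* sentence complete */
            c = next;                       /* e.g. "3.14": keep reading */
            continue;
        }
        c = getc(fp);
    }
    return buf;                             /* EOF ends the last sentence */
}
```

Each call returns a new heap string, so the caller frees it once the sentence has been searched and printed.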

The C source code must be compilable by the UNIX gcc compiler, and the executable must run on one of the UNIX hosts belonging to the Department of Computer Science and AI (e.g., babe.cs.um.edu.mt).

Deliverables

1. Electronic version of the C source code, which must be compilable by gcc on one of the department's UNIX servers. The source code should be adequately commented.

2. Electronic version of the C object code, which must have been generated by gcc on one of the department's UNIX servers.

Electronic versions of your files should be submitted on floppy disk (either Mac or DOS formatted). The disk should be attached to the written documentation (see below), and should be anonymous.

3. Compilation instructions.

4. Written documentation to include: