Assessed Exercise

The exercise has four aims:

  1. Mark up named entities from this example of news text by hand. (30%)
  2. Mark up named entities from the same example by algorithm. (50%)
  3. Calculate precision and recall of your algorithm. (10%)
  4. Comment on the results (10%)
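
For aim 3, precision is the proportion of the entities your algorithm marked that also appear in the hand markup, and recall is the proportion of the hand-marked entities that your algorithm found. The following is a minimal sketch of that calculation in Python, not a required method. It assumes strict matching (same element, type, text and position), whereas the MUC6 scoring is more lenient, and it assumes tags are never nested. The function names and the sample sentence are made up for illustration.

```python
import re

# Matches the element names and type attributes used in this exercise,
# e.g. <namex type=org>Morgan Stanley</namex>. Assumes no nested tags.
TAG = re.compile(r"<(numex|namex|timex) type=(\w+)>(.*?)</\1>", re.DOTALL)


def entities(marked_up):
    """Return the set of (element, type, text, start) entities in a string.

    `start` is the character offset in the text with all tags removed, so
    the hand and algorithm versions can be compared position by position.
    """
    found = set()
    removed = 0  # number of tag characters seen before the current match
    for m in TAG.finditer(marked_up):
        found.add((m.group(1), m.group(2), m.group(3), m.start() - removed))
        open_len = m.start(3) - m.start()
        close_len = m.end() - m.end(3)
        removed += open_len + close_len
    return found


def precision_recall(hand, algorithm):
    """Score the algorithm's markup against the hand markup (strict match)."""
    gold = entities(hand)
    predicted = entities(algorithm)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Illustrative sentence only: the algorithm misses the location.
hand = ("<namex type=org>Morgan Stanley</namex> opened in "
        "<namex type=loc>Frankfurt</namex> at <timex type=time>1645</timex> GMT")
algorithm = ("<namex type=org>Morgan Stanley</namex> opened in "
             "Frankfurt at <timex type=time>1645</timex> GMT")
print(precision_recall(hand, algorithm))
```

Here both entities the algorithm marked are correct, so precision is 1.0, but it found only two of the three hand-marked entities, so recall is 2/3.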

 

Examples of markup are given below. Further details, if required, can be found on the MUC6 pages (see the IE Lectures pages).

Type                   Example          Markup
Numerical Expressions  2.5 per cent     <numex type=pc>2.5 per cent</numex>
                       $100             <numex type=money>$100</numex>
Organisations          Morgan Stanley   <namex type=org>Morgan Stanley</namex>
Persons                Thierry Lacraz   <namex type=per>Thierry Lacraz</namex>
Time Expressions       1645 GMT         <timex type=time>1645</timex>
Locations              Frankfurt        <namex type=loc>Frankfurt</namex>
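
To give an idea of what markup "by algorithm" can look like, here is a minimal sketch in Python that tags only the numerical and time expressions with regular expressions. It is an illustration under stated assumptions, not a model answer: the patterns are guesses that cover the examples above and little else, times are only recognised as four digits followed by "GMT", and names (organisations, persons, locations) are not handled at all. The pattern names and the function mark_up are hypothetical.

```python
import re

# One alternative per entity type. The group name doubles as the type
# attribute. These patterns are assumptions fitted to the examples above.
PATTERN = re.compile(
    r"(?P<money>\$\d+(?:,\d{3})*(?:\.\d+)?)"             # $100, $1,500.50
    r"|(?P<pc>\d+(?:\.\d+)? per ?cent)"                  # 2.5 per cent, 3 percent
    r"|(?P<time>\b(?:[01]\d|2[0-3])[0-5]\d(?= GMT\b))"   # 1645 (before " GMT")
)

# Which element each type belongs to.
ELEMENT = {"money": "numex", "pc": "numex", "time": "timex"}


def mark_up(text):
    """Wrap every match in its element, scanning the text in a single pass."""
    def tag(m):
        kind = m.lastgroup  # name of the alternative that matched
        element = ELEMENT[kind]
        return f"<{element} type={kind}>{m.group(0)}</{element}>"
    return PATTERN.sub(tag, text)


# Illustrative sentence only.
print(mark_up("Shares rose 2.5 per cent to $100 at 1645 GMT."))
```

A single combined pattern is used so that text already wrapped in a tag is never scanned a second time by a later rule.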

 

Notes

This exercise is worth 7.5% of the marks for the double credit. Translated into hours, this means 7.5 hours of work. Please indicate the number of hours spent on each part.

I suggest (but do not insist) that you use a combination of Unix tools and xfst to handle this exercise. Whatever you use, I will need to understand the operation of the algorithms you describe.