Some useful resources
From time to time I put up some stuff I've done
that might be useful to others.
- Tools for Maltese NLP:
Various tools for Maltese, including a POS tagger, a tokeniser and a phonetic transcriber. Mostly written in Python. Hosted on the Maltese Language Resource Server.
- A Java library for morphological generation and syntactic
realisation. This used to be hosted on Google Code, but is now on GitHub.
- The GenChal Repository: an online repository of
datasets related to the Generation Challenges, a series of Shared
Task challenges organised since 2007.
- Maltese Language Resource Server (MLRS): a server for language
resources and tools in Maltese. Currently hosts a corpus of ca.
100m tokens of Maltese text. This is continuously being updated.
- The TUNA Corpus of Referring Expressions, a
semantically transparent, annotated corpus of references to objects
in visual domains. This corpus has been used in three Shared Task
Evaluations since its development.
- Experiment on temporal structure in narrative: I've
recently run this experiment, and have collected a large corpus of
narratives that I'll make available once annotation is complete.
In the meantime, you can read a summary here.
- Annotated bibliography on the generation of
referring expressions (and related problems): a
collection of publications on reference and its computational
treatment in generation, compiled as part of the TUNA Project. Not up to date at all!