Examples of Corpora
-
Brown Corpus: A tagged corpus of about one million words compiled
at Brown University during the 1960s and 1970s.
The Brown Corpus is balanced: it was intended as a representative sample of
American English and contains texts from many genres, including newspapers,
fiction, scientific text, legal text, and others.
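A tagged corpus pairs each word with a part-of-speech tag. The following is a minimal sketch. It assumes the common plain-text convention of writing each token as word/TAG. The sample sentence and its lowercase Brown-style tags are illustrative and are not copied from the corpus files. The sketch reads tagged text into (word, tag) pairs and counts how often each tag occurs:

```python
from collections import Counter


def parse_tagged(line: str) -> list[tuple[str, str]]:
    """Split a 'word/TAG word/TAG ...' line into (word, tag) pairs.

    Assumes the word/TAG convention; rpartition splits on the LAST '/',
    so a word that itself contains '/' keeps it.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


# Illustrative sample in the word/TAG style (not an actual corpus line).
sample = "The/at jury/nn said/vbd the/at election/nn was/bedz fair/jj ./."
pairs = parse_tagged(sample)
tag_counts = Counter(tag for _, tag in pairs)

print(pairs[:3])                  # [('The', 'at'), ('jury', 'nn'), ('said', 'vbd')]
print(tag_counts.most_common(2))  # [('at', 2), ('nn', 2)]
```

Counts like these, gathered over the whole corpus, are the raw material for statistical taggers trained on tagged corpora.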
-
Lancaster-Oslo/Bergen (LOB) Corpus: A British English replication
of the Brown Corpus.
-
SUSANNE Corpus: A freely available 130,000-word subset of the Brown Corpus,
annotated with the syntactic structure of its sentences.
-
Penn Treebank: A corpus of parsed sentences based on text from the Wall
Street Journal.
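The Penn Treebank stores each parse as a labelled bracketing. The following is a minimal sketch. It assumes the usual bracketed notation, in which a tree may be wrapped in an extra unlabelled pair of parentheses. The sample tree is made up for illustration. The sketch reads such a string into nested (label, children) tuples and recovers the words at its leaves:

```python
def tokenize(text: str) -> list[str]:
    """Split a bracketed tree into '(', ')' and label/word tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_tree(tokens: list[str]):
    """Parse one bracketed tree into (label, children) tuples.

    Leaves are plain strings. An outer bracket with no label gets the label "".
    """
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # a nested constituent
            else:
                children.append(tokens[pos])  # a word at the leaf
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    return node()


def leaves(tree) -> list[str]:
    """Collect the words of a parsed tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


# Made-up example in Penn Treebank-style bracketing.
sample = "( (S (NP (DT The) (NN market)) (VP (VBD fell)) (. .)) )"
tree = parse_tree(tokenize(sample))
print(leaves(tree))  # ['The', 'market', 'fell', '.']
```

The sketch uses a recursive-descent reader because the bracketing is itself a recursive structure. Each "(" starts a constituent, and the matching ")" ends it.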
-
Canadian Hansards: A bilingual corpus of the proceedings of the Canadian
Parliament. It contains parallel texts in English and French, which have been
used to investigate statistical machine translation.
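The notes do not say which statistical methods were applied to the Hansards. As one classic illustration, chosen here rather than taken from the text, the sketch below runs IBM Model 1. Model 1 is an expectation-maximisation procedure that learns word-translation probabilities t(f | e) from sentence-aligned English-French pairs. The three sentence pairs are a made-up toy corpus, not Hansard data:

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(f | e) with IBM Model 1 EM.

    pairs: list of (english_tokens, french_tokens), assumed sentence-aligned.
    A special "NULL" English token lets French words align to nothing.
    """
    french_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(french_vocab)
    t = defaultdict(lambda: uniform)  # start with uniform translation probabilities

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: spread each French word over the English words it could align to.
        for es, fs in pairs:
            es = ["NULL"] + es
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / z
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Toy sentence-aligned corpus (made up for illustration).
corpus = [
    ("the house".split(), "la maison".split()),
    ("the flower".split(), "la fleur".split()),
    ("a flower".split(), "une fleur".split()),
]
t = ibm_model1(corpus)
for f in ("la", "fleur", "une"):
    print(f, round(t[(f, "flower")], 3))
```

No word alignments are given to the algorithm. "la" co-occurs with "the" in two sentence pairs, so EM gradually attributes it to "the". This leaves "fleur" as the most probable translation of "flower".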
-
WordNet: An electronic dictionary of English in which words are organised
into a hierarchy, rather like a thesaurus. Each node is a synset: a set
of words with identical or near-identical meanings.
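The following is a minimal sketch of a hypernym ("is-a") hierarchy of synsets, to make the idea concrete. The synsets, their word lists, the unique names and the single-parent tree shape are all simplified, hand-written assumptions rather than real WordNet entries. Real WordNet has richer links, and a synset can have more than one hypernym. The sketch walks up from two synsets to find the most specific node they share, which is the kind of lookup a thesaurus-like hierarchy supports:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Synset:
    """A node in a toy hierarchy: a set of (near-)synonymous words."""
    name: str                            # hypothetical unique identifier, not a WordNet key
    lemmas: list[str]                    # words sharing (near-)identical meaning
    hypernym: Optional["Synset"] = None  # more general parent; None at the root


def path_to_root(s: Synset) -> list[Synset]:
    """Return s followed by its successively more general ancestors."""
    path = []
    node: Optional[Synset] = s
    while node is not None:
        path.append(node)
        node = node.hypernym
    return path


def lowest_common_hypernym(a: Synset, b: Synset) -> Optional[Synset]:
    """Most specific synset that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = {node.name for node in path_to_root(a)}
    for node in path_to_root(b):
        if node.name in ancestors_of_a:
            return node
    return None


# Hand-written toy hierarchy (illustrative only).
entity = Synset("entity", ["entity"])
animal = Synset("animal", ["animal", "beast", "creature"], entity)
dog = Synset("dog", ["dog", "domestic dog"], animal)
cat = Synset("cat", ["cat", "true cat"], animal)
car = Synset("car", ["car", "auto", "automobile"], entity)

print([s.name for s in path_to_root(dog)])    # ['dog', 'animal', 'entity']
print(lowest_common_hypernym(dog, cat).name)  # animal
print(lowest_common_hypernym(dog, car).name)  # entity
```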
Last modified: Wed Apr 5 09:35:21 MET DST