What is a Corpus? 

A corpus is a body of texts, utterances, or other specimens considered more or less representative of a language, and usually stored as an electronic database. 

Corpora differ according to:

  • Medium: printed, electronic text, digitized speech, video.

  • Language: monolingual or multilingual.

  • Information content: plain, as in

    PRIDE AND PREJUDICE 

    vol. 1 

    chapter 1 

    IT is a truth universally acknowledged, that a single man in 
    possession of a good fortune, must be in want of a wife. 

    versus tagged, where each word is followed by an underscore and its part-of-speech tag:
    A01 2 ^ *'_*' stop_VB electing_VBG life_NN peers_NNS **'_**' ._. 
    A01 3 ^ by_IN Trevor_NP Williams_NP ._. 
    A01 4 ^ a_AT move_NN to_TO stop_VB \0Mr_NPT Gaitskell_NP from_IN 
    A01 4 nominating_VBG any_DTI more_AP labour_NN 
    A01 5 life_NN peers_NNS is_BEZ to_TO be_BE made_VBN at_IN a_AT meeting_NN 
    A01 5 of_IN labour_NN \0MPs_NPTS tomorrow_NR ._. 
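
    The word_TAG layout of the tagged sample can be read mechanically. The following is a minimal Python sketch, not a reference reader for any particular corpus. The function name parse_tagged_line is hypothetical, and the sketch assumes that the first two fields of each line are a text identifier and a line number, and that any token without an underscore (such as "^") is a marker rather than a word.

```python
def parse_tagged_line(line):
    """Split one tagged line into (text_id, line_no, [(word, tag), ...]).

    Assumption: fields 1 and 2 are a text identifier and a line number;
    every later field is either word_TAG or a marker such as "^".
    """
    fields = line.split()
    text_id, line_no = fields[0], int(fields[1])
    pairs = []
    for token in fields[2:]:
        if "_" not in token:
            continue  # skip markers that carry no tag, e.g. "^"
        # Split on the LAST underscore so the tag is always the final part.
        word, tag = token.rsplit("_", 1)
        pairs.append((word, tag))
    return text_id, line_no, pairs


sample = "A01 3 ^ by_IN Trevor_NP Williams_NP ._."
text_id, line_no, pairs = parse_tagged_line(sample)
print(text_id, line_no)
for word, tag in pairs:
    print(word, tag)
```

    Splitting on the last underscore keeps punctuation tokens such as "._." intact: the word is "." and the tag is ".".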

    More on part-of-speech tags 

    Examples of corpora