Markup


  • Text is either "raw" (plain characters with no annotation) or marked up in some way.

  • The term "markup" has its origins in the publishing industry, where it refers to annotations on a manuscript that indicate layout, font size and so on.

  • Every electronic document standard involves markup: material that is not actually part of the text, but that explains something about the structure of the text.

  • Markup for layout (how the text should look) versus markup for content (what a piece of text is).

  • Examples of electronic markup languages are RTF, PostScript (strictly speaking a page description language), HTML, SGML and XML.

  • Most word processing software hides the markup from the user, who only sees the end result.

  • Normally, when dealing with corpora, we will want explicit markup that we can see. For the most part, such markup can be read and edited with an ordinary text editor.
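
  To make the distinction between raw text, layout markup and content markup concrete, here is a minimal Python sketch. The two sample sentences and the tag names in them (i, b, title, author) are made up for illustration and are not taken from any particular standard or corpus. The function names strip_markup and list_tags are likewise hypothetical. The sketch assumes simple angle-bracket tags and is not a real SGML or XML parser.

```python
import re

# Hypothetical samples: the same sentence with two kinds of explicit markup.
# Layout markup says how the text should look (italic, bold).
layout_marked = "<i>Moby Dick</i> was written by <b>Herman Melville</b>."
# Content markup says what each piece of text is (a title, an author).
content_marked = "<title>Moby Dick</title> was written by <author>Herman Melville</author>."

# Assumption: a tag is anything between "<" and the next ">".
TAG = re.compile(r"<[^>]+>")

def strip_markup(text):
    """Return the raw text with every <...> tag removed."""
    return TAG.sub("", text)

def list_tags(text):
    """Return the tags found in the text, in order of appearance."""
    return TAG.findall(text)

if __name__ == "__main__":
    print(strip_markup(layout_marked))
    print(strip_markup(content_marked))
    print(list_tags(layout_marked))
    print(list_tags(content_marked))
```

  Both samples reduce to the same raw text once the tags are removed. Only the markup differs, and only the content markup would let a program find, say, all author names in a corpus.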