LIN3022 Natural Language Processing

Course tutor Albert Gatt
Venue CCT 404A
Lectures Mondays, 12:00 - 14:00

bibliography

  1. Jurafsky, D. & Martin, J.H. (2009). Speech and Language Processing (2nd Ed.). New Jersey: Prentice Hall.

further reading

Lectures will often be accompanied by specific readings. These will be made available in good time.

course description

This course will build on the basic concepts that students will have acquired in the introductory course in Computational Linguistics, LIN2160. Participants will be given a state-of-the-art overview of current work in Natural Language Processing, focusing in particular on (a) theoretical paradigms (statistical vs. symbolic); and (b) NLP tasks and techniques.

Content Covered

  1. Paradigms in NLP: Statistical vs. Symbolic and knowledge-based approaches to NLP tasks
  2. Natural Language Analysis: tagging, parsing and understanding
  3. Discourse: Theoretical concepts underlying discourse structure and rhetorical structure.
  4. Natural Language Generation

lectures

This page contains details of lectures and readings for each lecture. Following each lecture, I will put up the lecture notes for download.

Note: The topics of lectures may change at short notice, so please check back regularly. The relevant resources listed for a lecture are usually linked from the resources page.
Acknowledgement: Some of the slides in Lectures 3 & 4 are based on online course materials by James Martin, University of Colorado at Boulder.

  1. Introduction to NLP; historical overview
    • Reading: Jurafsky & Martin Chapter 1
    • Lecture notes: ppt
    • Task for next week: try out Weizenbaum's ELIZA. Try in particular to test its capacity for handling ambiguous utterances (e.g. structurally ambiguous ones). You could also try Winograd's SHRDLU (a downloadable demo is available); there is a nice Wikipedia article about it.
  2. Turing and the Turing Test; Finite State Machines
    • Reading: Jurafsky & Martin, Chapter 2
    • Lecture notes: pptx
    • Supplementary notes on regular expressions: ppt on regular expressions
  3. Finite State Transducers and Morphology
  4. Spell checking, minimum edit distance; Introduction to language models
    • Reading: Jurafsky & Martin, Chapter 3, Sections 3.10 -- 3.11
    • Lecture notes: pptx
    • Task: try out the Google Books Ngram Viewer (think about phrases of varying lengths whose usage may have changed over time)
  5. Language models and Markov assumptions
  6. Markov models and Part of Speech Tagging
    • Reading: Jurafsky & Martin, Chapter 5, up to Section 5.5.2
    • Lecture notes: pptx
  7. Parsing I: Definitions and basic approaches
    • Reading: Jurafsky & Martin, Chapter 13, up to Section 13.2
    • Lecture notes: pptx
  8. Parsing II: Dynamic programming and probabilistic approaches
    • Reading: J&M, Chapter 13, Section 13.3 up to Section 13.4.1; J&M, Chapter 14, up to Section 14.2
    • Lecture notes: pptx
  9. Natural Language Generation and Models of Discourse I: Document planning and Rhetorical Structure Theory
  10. Natural Language Generation and Models of Discourse II: reference and Centering
    • Reading: Jurafsky & Martin, Chapter 21: Introduction, Section 21.2, Sections 21.3-21.5, Section 21.6.2
    • Lecture notes: pptx
  11. Machine Translation
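
As a concrete taste of one technique from the list above (Lecture 4's minimum edit distance), here is a minimal Python sketch of the standard dynamic-programming formulation; the function name and the unit costs for all three operations are illustrative choices, not taken from the course materials (Jurafsky & Martin also discuss a variant with substitution cost 2).

```python
def min_edit_distance(source, target):
    """Minimum edit distance (Levenshtein) by dynamic programming.
    Insertion, deletion and substitution each cost 1 (unit costs)."""
    n, m = len(source), len(target)
    # table[i][j] = cost of turning source[:i] into target[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i                      # delete all of source[:i]
    for j in range(1, m + 1):
        table[0][j] = j                      # insert all of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(table[i - 1][j] + 1,        # deletion
                              table[i][j - 1] + 1,        # insertion
                              table[i - 1][j - 1] + sub)  # substitution
    return table[n][m]

# Jurafsky & Martin's running example: with unit costs the distance is 5
print(min_edit_distance("intention", "execution"))
```

With the textbook's alternative costing (substitution = 2), the same pair yields a distance of 8; only the `sub` term changes.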

assessment

Assessment for this course is via a two-hour examination, split into two parts: Part A (40%) consists of 10 multiple-choice questions; Part B (60%) consists of 8 short questions, of which you must answer 4.

A short model paper is provided here.

resources