LIN3022 Natural Language Processing

Course tutor Albert Gatt
Venue CCT 404A
Lectures Mondays, 12:00 - 14:00

bibliography

  1. Jurafsky, D. & Martin, J.H. (2009). Speech and Language Processing (2nd Ed.). New Jersey: Prentice Hall.

further reading

Lectures will often be accompanied by specific readings. These will be made available in good time.

course description

This course will build on the basic concepts that students will have acquired in the introductory course in Computational Linguistics, LIN2160. Participants will be given a state-of-the-art overview of current work in Natural Language Processing, focusing in particular on (a) theoretical paradigms (statistical vs. symbolic); and (b) NLP tasks and techniques.

Content Covered

  1. Paradigms in NLP: Statistical vs. Symbolic and knowledge-based approaches to NLP tasks
  2. Natural Language Analysis: tagging, parsing and understanding
  3. Discourse: Theoretical concepts underlying discourse structure and rhetorical structure.
  4. Natural Language Generation

lectures

This page contains details of lectures and readings for each lecture. Following each lecture, I will put up the lecture notes for download.

Note: The topics of lectures may change at short notice, so please check back regularly. The relevant resources listed for a lecture are usually linked from the resources page.
Acknowledgement: Some of the slides in Lectures 3 & 4 are based on online course materials by James Martin, University of Colorado at Boulder.

  1. Introduction to NLP; historical overview
    • Reading: Jurafsky & Martin Chapter 1
    • Lecture notes: ppt
    • Task for next week: try out Weizenbaum's ELIZA. Try in particular to test its capacity for handling ambiguous utterances (e.g. structurally ambiguous ones). You could also try Winograd's SHRDLU (a downloadable demo is available); there is a nice Wikipedia article about it.
  2. Turing and the Turing Test; Finite State Machines
    • Reading: Jurafsky & Martin, Chapter 2
    • Lecture notes: pptx
    • Supplementary notes on regular expressions: ppt on regular expressions
  3. Finite State Transducers and Morphology
  4. Spell checking, minimum edit distance; Introduction to language models
    • Reading: Jurafsky & Martin, Chapter 3, Sections 3.10 -- 3.11
    • Lecture notes: pptx
    • Task: try out the Google Books Ngram Viewer (think about phrases of varying lengths whose usage may have changed over time)
  5. Language models and Markov assumptions
  6. Markov models and Part of Speech Tagging
    • Reading: Jurafsky & Martin, Chapter 5, up to Section 5.5.2
    • Lecture notes: pptx
  7. Parsing I: Definitions and basic approaches
    • Reading: Jurafsky & Martin, Chapter 13, up to Section 13.2
    • Lecture notes: pptx
  8. Parsing II: Dynamic programming and probabilistic approaches
    • Reading: J&M, Chapter 13, Section 13.3 up to Section 13.4.1; J&M, Chapter 14, up to Section 14.2
    • Lecture notes: pptx
  9. Natural Language Generation and Models of Discourse I: Document planning and Rhetorical Structure Theory
  10. Natural Language Generation and Models of Discourse II: reference and Centering
    • Reading: Jurafsky & Martin, Chapter 21: Introduction, Section 21.2, Sections 21.3-21.5, Section 21.6.2
    • Lecture notes: pptx
  11. Machine Translation
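
As a concrete taste of one technique from the list above (Lecture 4's minimum edit distance), here is a minimal Python sketch of the standard dynamic-programming formulation; the function name and the unit costs for all three operations are illustrative choices, not taken from the course materials (Jurafsky & Martin also discuss a variant with substitution cost 2).

```python
def min_edit_distance(source, target):
    """Minimum edit distance (Levenshtein) by dynamic programming.
    Insertion, deletion and substitution each cost 1 (unit costs)."""
    n, m = len(source), len(target)
    # table[i][j] = cost of turning source[:i] into target[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i                      # delete all of source[:i]
    for j in range(1, m + 1):
        table[0][j] = j                      # insert all of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(table[i - 1][j] + 1,        # deletion
                              table[i][j - 1] + 1,        # insertion
                              table[i - 1][j - 1] + sub)  # substitution
    return table[n][m]

# Jurafsky & Martin's running example: with unit costs the distance is 5
print(min_edit_distance("intention", "execution"))
```

With the textbook's alternative costing (substitution = 2), the same pair yields a distance of 8; only the `sub` term changes.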

assessment

Assessment for this course is via a two-hour examination, split into two parts: Part A (40%) consists of 10 multiple-choice questions; Part B (60%) consists of 8 short questions, of which you must answer 4.

A short model paper is provided here.

resources