Course tutor: Albert Gatt
Venue: CCT 404A
Lectures: Mondays, 12:00 - 14:00
bibliography
- Jurafsky, D. & Martin, J. H. (2009). Speech and Language
Processing (2nd Ed.). New Jersey: Prentice Hall
further reading
Lectures will often be accompanied by specific readings. These
will be made available in good time.
course description
This course will build on the basic concepts that students will
have acquired in the introductory course in Computational Linguistics,
LIN2160. Participants will be given a state-of-the-art overview of
current work in Natural Language Processing, focusing in particular on
(a) theoretical paradigms (statistical vs. symbolic); (b) NLP tasks and
techniques.
Content Covered
- Paradigms in NLP: Statistical vs. Symbolic and knowledge-based
approaches to NLP tasks
- Natural Language Analysis: tagging, parsing and understanding
- Discourse: Theoretical concepts underlying discourse structure
and rhetorical structure.
- Natural Language Generation
lectures
This page contains details of lectures and readings for each
lecture. Following each lecture, I will put up the lecture notes for
download.
Note: The topics of lectures may change at short
notice, so please check back regularly. The relevant resources listed
for a lecture are usually linked from the resources page.
Acknowledgement: Some of the slides in Lectures 3 & 4
are based on online course materials by James Martin, University of
Colorado at Boulder.
- Introduction to NLP; historical overview
- Reading: Jurafsky & Martin Chapter 1
- Lecture notes: ppt
- Task for next week: try out Weizenbaum's Eliza here.
Try in particular to test its capacity for handling ambiguous
utterances (e.g. structurally ambiguous ones). You could also try
using Winograd's SHRDLU here
(with downloadable demo). There's a nice Wikipedia
article about it.
- Turing and the Turing Test; Finite State Machines
- Reading: Jurafsky & Martin, Chapter 2
- Lecture notes: pptx
- Supplementary notes on regular expressions: ppt on regular expressions
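As a warm-up for the finite state machine material, the "sheeptalk" language (baa!, baaa!, baaaa!, ...) from Jurafsky & Martin Chapter 2 can be recognised either with a regular expression or with an explicit state machine. The following sketch is illustrative only and is not taken from the course materials:

```python
import re

# The sheeptalk language: 'b', then two or more 'a's, then '!'
SHEEPTALK = re.compile(r"^baa+!$")

def accepts_sheeptalk(s):
    """Equivalent hand-coded finite state machine:
    state 0 --b--> 1 --a--> 2 --a--> 3 --!--> 4 (accept),
    with a self-loop on 'a' in state 3."""
    state = 0
    for ch in s:
        if state == 0 and ch == "b":
            state = 1
        elif state == 1 and ch == "a":
            state = 2
        elif state == 2 and ch == "a":
            state = 3
        elif state == 3 and ch == "a":
            state = 3          # loop: any number of extra a's
        elif state == 3 and ch == "!":
            state = 4
        else:
            return False       # no transition: reject
    return state == 4          # accept only in the final state

print(accepts_sheeptalk("baaa!"))   # True
print(accepts_sheeptalk("ba!"))     # False
```

Both formulations define exactly the same regular language, which is the point of the equivalence between regular expressions and finite state automata.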
- Finite State Transducers and Morphology
- Spell checking, minimum edit distance; Introduction to
language models
- Reading: Jurafsky & Martin, Chapter 3, Sections 3.10 --
3.11
- Lecture notes: pptx
- Task: try out the Google Ngrams book search (think about phrases of
varying lengths whose usage could have changed over time)
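Minimum edit distance is a standard dynamic programming example from Jurafsky & Martin Chapter 3. The sketch below (not the course's own code) uses unit costs for insertion, deletion, and substitution; note that J&M also discuss a variant with substitution cost 2:

```python
def min_edit_distance(source, target):
    """Levenshtein distance with unit insert/delete/substitute costs."""
    n, m = len(source), len(target)
    # D[i][j] = edit distance between source[:i] and target[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i                       # i deletions
    for j in range(1, m + 1):
        D[0][j] = j                       # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,       # delete
                          D[i][j - 1] + 1,       # insert
                          D[i - 1][j - 1] + sub) # substitute / match
    return D[n][m]

print(min_edit_distance("intention", "execution"))  # 5
```

"intention" to "execution" is the worked example in the textbook; with unit substitution cost the distance is 5.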
- Language models and Markov assumptions
- Reading: Jurafsky & Martin, Chapter 4, Sections 4.1 -- 4.3;
Section 4.5 up to 4.5.1
- Lecture notes: pptx
- Practical task: docx
(to be discussed next week)
- A couple of interesting articles about Google Ngrams:
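The bigram Markov assumption covered in this lecture, P(w_i | w_1 ... w_{i-1}) ≈ P(w_i | w_{i-1}), can be illustrated with maximum likelihood estimates over the toy "Sam" corpus used in Jurafsky & Martin Chapter 4. A hedged sketch (not the course's own code):

```python
from collections import Counter

# Toy corpus from J&M Ch. 4, with sentence boundary markers
corpus = [
    "<s> i am sam </s>",
    "<s> sam i am </s>",
    "<s> i do not like green eggs and ham </s>",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_bigram(w, prev):
    """MLE bigram probability: count(prev, w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p_bigram("i", "<s>"))   # 2/3: two of three sentences start with 'i'
print(p_bigram("sam", "am"))  # 1/2: 'am' occurs twice, once followed by 'sam'
```

In practice these raw MLE counts are smoothed (the topic of Section 4.5), since unseen bigrams would otherwise receive probability zero.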
- Markov models and Part of Speech Tagging
- Reading: Jurafsky & Martin, Chapter 5, up to Section 5.5.2
- Lecture notes: pptx
- Parsing I: Definitions and basic approaches
- Reading: Jurafsky & Martin, Chapter 13, up to Section 13.2
- Lecture notes: pptx
- Parsing II: Dynamic programming and probabilistic approaches
- Reading: J&M, Chapter 13, Sections 13.3 up to Section 13.4.1;
J&M Chapter 14, up to Section 14.2
- Lecture notes: pptx
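The dynamic programming approach to parsing can be previewed with a minimal CKY recogniser. The toy grammar and lexicon below are assumptions for illustration (the grammar must be in Chomsky Normal Form); this is a sketch, not the course's own code:

```python
from itertools import product

# Binary rules A -> B C, stored inverted: (B, C) -> {A, ...}
grammar = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Lexical rules A -> word, stored inverted: word -> {A, ...}
lexicon = {
    "the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"},
}

def cky_recognise(words):
    """Return True iff the word list is in the language of the grammar."""
    n = len(words)
    # table[i][j] = set of nonterminals that span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexicon.get(w, ()))
    for span in range(2, n + 1):          # fill shorter spans first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # every split point
                for B, C in product(table[i][k], table[k][j]):
                    table[i][j] |= grammar.get((B, C), set())
    return "S" in table[0][n]

print(cky_recognise("the dog saw the cat".split()))  # True
print(cky_recognise("dog the saw".split()))          # False
```

The same chart, with probabilities attached to rules and a max instead of set union, becomes the probabilistic CKY algorithm of Chapter 14.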
- Natural Language Generation and Models of Discourse I:
Document planning and Rhetorical Structure Theory
- Natural Language Generation and Models of Discourse II:
Reference and Centering
- Reading: Jurafsky & Martin, Chapter 21: Introduction, Section
21.2, Sections 21.3-21.5, Section 21.6.2
- Lecture notes: pptx
- Machine Translation
- Reading: Jurafsky & Martin, Chapter 25: Sections 25.1-25.3
- Lecture notes: pptx
assessment
Assessment for this course is via a two-hour examination. The
examination will be split into two parts: Part A (40%) will consist of 10
multiple-choice questions. Part B (60%) will consist of 8 short
questions, from which you will need to choose 4.
A short model paper is provided here