Computational Psycholinguistics, Spring 2018
1 Course information
Lecture Times | Mondays & Wednesdays 9:30-11:00am |
Lecture Location | 46-4199 |
Class website | http://stellar.mit.edu/S/course/9/sp18/9.19/index.html |
Syllabus | http://www.mit.edu/~rplevy/teaching/2018spring/9.19 |
2 Instructor information
Instructor | Roger Levy (rplevy@mit.edu) |
Instructor's office | 46-3033 |
Instructor's office hours | Mondays 11am-12pm, Tuesdays 2-3pm |
Teaching Assistants | Yevgeni Berzak (berzak@mit.edu), Richard Futrell (futrell@mit.edu) |
TA Offices | Yevgeni: 46-3027G; Richard: 46-3037 |
TA Office Hours | Yevgeni: Wednesdays 12:30-2pm; Richard: Thursdays 2-3pm |
3 Course Description
Over the last two and a half decades, computational linguistics has been revolutionized as a result of three closely related developments: increases in computing power, the advent of large linguistic datasets, and a paradigm shift toward probabilistic modeling. At the same time, similar theoretical developments in cognitive science have led to a view of major aspects of human cognition as instances of rational statistical inference. These developments have set the stage for renewed interest in computational approaches to human language use. Correspondingly, this course covers some of the most exciting developments in computational psycholinguistics over the past decade. The course spans human language comprehension, production, and acquisition, and covers key phenomena in phonetics, phonology, morphology, syntax, semantics, and pragmatics. Students will learn technical tools including probabilistic models, formal grammars, neural networks, and decision theory, and will learn how theory, computational modeling, and data can be combined to advance our fundamental understanding of human language acquisition and use.
4 Course organization
We'll meet twice a week; the course format will be a combination of lecture, discussion, and in-class exercises as class size, structure, and interests permit.
5 Intended Audience
Undergraduate or graduate students in Brain & Cognitive Sciences, Linguistics, Electrical Engineering & Computer Science, and any of a number of related disciplines. The undergraduate section is 9.19, the graduate section is 9.190. Postdocs and faculty are also welcome to participate!
The course prerequisites are:
- One semester of Python programming (fulfillable by 6.00/6.0001+6.0002, for example), plus
- Either:
- one semester of probability/statistics/machine learning (fulfilled by, for example, 6.041B or 9.40), or
- one semester of introductory linguistics (fulfilled by 24.900).
If you think you have the requisite background but have not taken the specific courses just mentioned, please talk to the instructor to work out whether you should take this course or do other prerequisites first.
We will be doing some Python programming in this course, and also using programs that must be run from the Unix/Linux/OS X command line.
6 Readings & Textbooks
Readings will frequently be drawn from the following textbooks:
- Daniel Jurafsky and James H. Martin. Speech and Language Processing. Third edition (draft). Draft chapters are freely available on the book's website. (I refer to this book as "SLP" in the syllabus.)
This textbook is the single most comprehensive and up-to-date introduction available to the field of computational linguistics.
- Bird, Steven, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly Media. (I refer to this book as "NLTK" in the syllabus.)
This is the book for the Natural Language Toolkit (NLTK), the Python library we will use extensively for programming in this course; a short illustrative example appears at the end of this section. You can buy this book, or you can freely access it on the Web at http://www.nltk.org/book.
- Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. Book chapter PDFs can be obtained through the MIT library website. (I refer to this book as "M&S" in the syllabus.)
This is an older but still very useful book on NLP.
- Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press. Book chapter PDFs can be obtained through the MIT library website. (I refer to this book as "MRS" in the syllabus.)
We'll also occasionally draw upon other sources for readings, including original research papers in computational linguistics, psycholinguistics, and other areas of the cognitive science of language.
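As a taste of what programming with NLTK looks like, here is a minimal sketch (illustrative only, not an official course example; the example sentence is made up) that tokenizes a sentence and counts its word bigrams, the building blocks of the \(N\)-gram language models covered early in the course. It assumes NLTK is installed and can download the Punkt tokenizer models:

```python
import nltk

# One-time setup: fetch the Punkt tokenizer models used by word_tokenize.
nltk.download('punkt')

sentence = "Colorless green ideas sleep furiously."

# Split the sentence into word tokens.
tokens = nltk.word_tokenize(sentence)

# Count adjacent word pairs (bigrams), the core statistic of a bigram language model.
bigram_counts = nltk.FreqDist(nltk.bigrams(tokens))

for (w1, w2), count in bigram_counts.most_common():
    print(w1, w2, count)
```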
7 Syllabus (subject to modification)
| Week | Day | Topic | Readings | Related readings | Problem sets |
|---|---|---|---|---|---|
| Week 1 | Wed 7 Feb | Course intro; intro to probability theory | Feldman et al., 2009 | | |
| Week 2 | Mon 12 Feb | Speech perception and the perceptual magnet | MRS Chapter 13 | | Pset 1 out |
| | Wed 14 Feb | Elementary text classification in NLTK; Naive Bayes; word sequences, language models, and \(N\)-grams | SLP 4.1-4.4 | | |
| Week 3 | Tue 20 Feb | More advanced issues in \(N\)-gram modeling | SLP 4.5-4.8 | | |
| | Wed 21 Feb | Prediction in human language understanding; surprisal | Kutas et al., 2011; Piantadosi et al., 2011 | Smith & Levy, 2013 | |
| Week 4 | Mon 26 Feb | Regular expressions | SLP 2.1-2.1.6 | | Pset 1 due; Pset 2 out |
| | Wed 28 Feb | Finite-state machines | SLP 2.2-2.4, 3 | | |
| Week 5 | Mon 5 Mar | Finite-state machines II | | | |
| | Wed 7 Mar | Finite-state transducers | | | |
| Week 6 | Mon 12 Mar | Weighted finite-state machines; noisy-channel models; policy optimization and modeling human eye movements in reading | M&S 3.1; Sutton & Barto in progress, 1.1-1.6; Bicknell & Levy, 2010 | | Pset 2 due; Pset 3 out |
| | Wed 14 Mar | Bayes Nets and interventions | Kraljic et al., 2008; Russell & Norvig, 2010, chapter 14 (on Stellar); Levy in progress, Directed Graphical Models appendix | Bayes Nets lecture notes | |
| Week 7 | Mon 19 Mar | Multi-factor models: logistic regression; word order preferences in language; hierarchical models; the binomial construction | SLP 7; graphical models intro; Morgan & Levy, 2015 | | |
| | Wed 21 Mar | Midterm exam | | | |
| Spring Break | 26-30 Mar | Spring break, no class | | | |
| Week 8 | Mon 2 Apr | Is human language finite-state? | SLP 11; SLP 2nd edition chapter 16 (under Readings on Stellar) | Chomsky, 1956 | Pset 3 due |
| | Wed 4 Apr (46-3189) | Context-free grammars; syntactic analysis | SLP 12; NLTK 8.1-8.5; Levy & Andrew, 2006 | Gazdar, 1981 (in particular Section 2); Müller, 2018 (in particular Section 5.4); Joshi et al., 1991 (on formalisms that go beyond context-free) | |
| Week 9 | Mon 9 Apr | Probabilistic context-free grammars; incremental parsing; human syntactic processing | SLP 13; NLTK 8.6; Levy, 2013 | Jurafsky, 1996; Hale, 2001; Levy, 2008 | Pset 4 out |
| | Wed 11 Apr | Searching treebanks; parsing as (weighted) intersection of context-free grammars and finite-state machines; noisy-channel syntactic comprehension | Levy, 2011 | | |
| Week 10 | Mon 16 Apr | Patriots' Day (student holiday), no class | | | |
| | Wed 18 Apr | Word embeddings | SLP 15, 16; Levy, Goldberg, & Dagan, 2015 | Mikolov et al., 2013; Pennington et al., 2014; Gutierrez et al., 2016 | Pset 4 due (April 20); Pset 5 out |
| Week 11 | Mon 23 Apr | Implicit associations in word embeddings | Caliskan et al., 2017 | | |
| | Wed 25 Apr | Recurrent neural network models for language | Collobert et al., 2011 | | |
| Week 12 | Mon 30 Apr | What do neural networks learn about language structure? | Linzen et al., 2016 | | |
| | Wed 2 May | Pragmatics in language understanding; scalar inference; the Rational Speech Acts model | Goodman & Frank, 2016 (see also Frank & Goodman, 2012) | | Pset 5 due (Friday, May 4); Pset 6 out |
| Week 13 | Mon 7 May | Advanced pragmatics models: lexical uncertainty, scalar adjectives | Lassiter & Goodman, 2015 | | |
| | Wed 9 May (46-3189) | Statistical word learning in humans; modeling with nonparametric Bayes | Saffran et al., 1996; Goldwater, Griffiths, & Johnson, 2009 | | Pset 6 due (Friday, May 11) |
| Week 14 | Mon 14 May | The emergence of syntactic productivity in language development | Meylan et al., 2017 | | |
| | Wed 16 May | End-of-semester review (bring your questions!) | | | Final projects due Thursday, May 17 |
| | Wed 23 May, 9am-noon | Final exam (in 46-3310) | | | |
8 Requirements & grading
You'll be graded on:
| Work | Grade percentage (9.19) | Grade percentage (9.190) |
|---|---|---|
| A number of homework assignments throughout the semester | 50% | 37.5% |
| A midterm exam | 20% | 15% |
| A final exam | 30% | 22.5% |
| A final project (if you are enrolled in 9.190) | -- | 25% |
Active participation in the class is also encouraged and taken into account in borderline grade cases!
8.1 Homework late policy
Homework assignments can be turned in up to 7 days late; 10% of your score will be deducted for each 24 hours of lateness (rounded up to the next full day). For example, if a homework assignment is worth 80 points, you turn it in 3 days late, and you earn a 70 before lateness is taken into account, your final score will be (1 - 0.3) * 70 = 49.
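To make the arithmetic concrete, here is a small Python sketch of the policy (an illustrative helper, not official course code; the function name and the treatment of work more than 7 days late are my assumptions):

```python
import math

def late_score(raw_score: float, hours_late: float) -> float:
    """Apply the late policy: 10% of the earned score is deducted per
    24-hour period of lateness, with partial days rounded up.
    Illustrative helper; assumes work over 7 days late gets no credit."""
    days_late = math.ceil(hours_late / 24)
    if days_late > 7:
        return 0.0  # assumption: beyond 7 days, the assignment is not accepted
    return (1 - 0.1 * days_late) * raw_score

# The example from the policy: a 70 turned in 3 days (72 hours) late scores 49.
print(late_score(70, 72))  # 49.0
```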
8.2 Medical or personal circumstances impacting psets, exams, or projects
If medical or personal circumstances such as illness impact your work on a pset or project, or your ability to take an exam on the scheduled date with adequate preparation, please work with Student Support Services (S3) to verify these circumstances and be in touch with the instructor. We are happy to work with you in whatever way is most appropriate to your individual circumstances to help ensure that you are able to achieve your best performance in class while maintaining your health, happiness, and well-being.
8.3 Mapping of class score to letter grade
I grade the entire course on a curve, so that your end-of-semester letter grade will be determined by your overall points score in light of the distribution of scores among all class members. However, I guarantee minimum grades on the basis of the following thresholds:
| Threshold | Guaranteed minimum grade |
|---|---|
| >=90% | A- |
| >=80% | B- |
| >=70% | C- |
| >=60% | D |
So, for example, an overall score of 90.0001% of points guarantees you an A-, but you could well wind up with a higher grade depending on the curve.
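For concreteness, the guarantee can be read as a simple threshold lookup; here is a minimal Python sketch (an illustrative helper, not official course code):

```python
def guaranteed_minimum_grade(score_percent: float) -> str:
    """Map an overall points score to the guaranteed minimum letter grade.
    The curve can only raise a grade above this floor.
    Illustrative helper, not official course code."""
    if score_percent >= 90:
        return "A-"
    if score_percent >= 80:
        return "B-"
    if score_percent >= 70:
        return "C-"
    if score_percent >= 60:
        return "D"
    return "no guaranteed minimum"  # below 60%, the grade is set entirely by the curve

# The example from the text: 90.0001% guarantees at least an A-.
print(guaranteed_minimum_grade(90.0001))  # A-
```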
9 Mailing list
There will be a mailing list for this course, which you can access at https://mailman.mit.edu/mailman/listinfo/9.19-2018-spring. Please make sure you're signed up for it! This list is both for discussion of ideas in the class and for communications about organizational issues.