Computational Psycholinguistics, Spring 2018

1 Course information

2 Instructor information

Instructor Roger Levy (rplevy@mit.edu)
Instructor's office 46-3033
Instructor's office hours Mondays 11am-12pm, Tuesdays 2-3pm
Teaching Assistants Yevgeni Berzak (berzak@mit.edu)
  Richard Futrell (futrell@mit.edu)
TA Offices Yevgeni: 46-3027G
  Richard: 46-3037
TA Office Hours Yevgeni: Wednesdays 12:30-2pm
  Richard: Thursdays 2-3pm

3 Course Description

Over the last two and a half decades, computational linguistics has been revolutionized as a result of three closely related developments: increases in computing power, the advent of large linguistic datasets, and a paradigm shift toward probabilistic modeling. At the same time, similar theoretical developments in cognitive science have led to a view of major aspects of human cognition as instances of rational statistical inference. These developments have set the stage for renewed interest in computational approaches to human language use. Correspondingly, this course covers some of the most exciting developments in computational psycholinguistics over the past decade. The course spans human language comprehension, production, and acquisition, and covers key phenomena in phonetics, phonology, morphology, syntax, semantics, and pragmatics. Students will learn technical tools including probabilistic models, formal grammars, neural networks, and decision theory, and will learn how theory, computational modeling, and data can be combined to advance our fundamental understanding of human language acquisition and use.

4 Course organization

We'll meet twice a week; the course format will be a combination of lecture, discussion, and in-class exercises as class size, structure, and interests permit.

5 Intended Audience

Undergraduate or graduate students in Brain & Cognitive Sciences, Linguistics, Electrical Engineering & Computer Science, and any of a number of related disciplines. The undergraduate section is 9.19, the graduate section is 9.190. Postdocs and faculty are also welcome to participate!

The course prerequisites are:

  1. One semester of Python programming (fulfillable by 6.00/6.0001+6.0002, for example), plus
  2. Either:
    • one semester of probability/statistics/machine learning (fulfilled by, for example, 6.041B or 9.40), or
    • one semester of introductory linguistics (fulfilled by 24.900).

If you think you have the requisite background but have not taken the specific courses just mentioned, please talk to the instructor so we can work out whether you should take this course now or complete other prerequisites first.

We will be doing some Python programming in this course, and also using programs that must be run from the Unix/Linux/OS X command line.

6 Readings & Textbooks

Readings will frequently be drawn from the following textbooks:

  1. Daniel Jurafsky and James H. Martin. Speech and Language Processing. Third edition (draft). Draft chapters can be found at https://web.stanford.edu/~jurafsky/slp3/. (I refer to this book as "SLP" in the syllabus.)

    This textbook is the single most comprehensive and up-to-date introduction available to the field of computational linguistics.

  2. Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly Media. (I refer to this book as "NLTK" in the syllabus.)

    This is the book for the Natural Language Toolkit (or NLTK), a Python library that we will use extensively for our programming in this course. You can buy this book, or you can freely access it on the Web at http://www.nltk.org/book. (A short example of NLTK in action appears at the end of this section.)

  3. Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. Book chapter PDFs can be obtained through the MIT library website. (I refer to this book as "M&S" in the syllabus.)

    This is an older but still very useful book on NLP.

  4. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press. Book chapter PDFs can be obtained through the MIT library website. (I refer to this book as "MRS" in the syllabus.)

We'll also occasionally draw upon other sources for readings, including original research papers in computational linguistics, psycholinguistics, and other areas of the cognitive science of language.
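
For a first taste of what working with NLTK looks like, here is a minimal sketch (it assumes you have installed NLTK, e.g. with pip install nltk; the downloads fetch the models the later calls rely on):

  import nltk

  # One-time downloads of the models the calls below rely on.
  nltk.download("punkt")                       # word/sentence tokenizer models
  nltk.download("averaged_perceptron_tagger")  # part-of-speech tagger model

  sentence = "Colorless green ideas sleep furiously."
  tokens = nltk.word_tokenize(sentence)  # split the string into word tokens
  print(tokens)                # ['Colorless', 'green', 'ideas', 'sleep', 'furiously', '.']
  print(nltk.pos_tag(tokens))  # label each token with a part-of-speech tag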

7 Syllabus (subject to modification)

| Week | Day | Topic | Readings | Related readings | Problem sets |
| Week 1 | Wed 7 Feb | Course intro; intro to probability theory | | Feldman et al., 2009 | |
| Week 2 | Mon 12 Feb | Speech perception and the perceptual magnet | MRS Chapter 13 | | Pset 1 out |
| | Wed 14 Feb | Elementary text classification in NLTK; Naive Bayes; word sequences, language models, and N-grams | SLP 4.1-4.4 | | |
| Week 3 | Tue 20 Feb | More advanced issues in N-gram modeling | SLP 4.5-4.8 | | |
| | Wed 21 Feb | Prediction in human language understanding; surprisal | Kutas et al., 2011; Piantadosi et al., 2011 | Smith & Levy, 2013 | |
| Week 4 | Mon 26 Feb | Regular expressions | SLP 2.1-2.1.6 | | Pset 1 due; Pset 2 out |
| | Wed 28 Feb | Finite-state machines | SLP 2.2-2.4, 3 | | |
| Week 5 | Mon 5 Mar | Finite-state machines II | | | |
| | Wed 7 Mar | Finite-state transducers | | | |
| Week 6 | Mon 12 Mar | Weighted finite-state machines; noisy-channel models; policy optimization and modeling human eye movements in reading | M&S 3.1; Sutton & Barto (in progress), 1.1-1.6; Bicknell & Levy, 2010 | | Pset 2 due; Pset 3 out |
| | Wed 14 Mar | Bayes nets and interventions | Kraljic et al., 2008; Russell & Norvig, 2010, chapter 14 (on Stellar); Levy (in progress), Directed Graphical Models appendix; Bayes Nets lecture notes | | |
| Week 7 | Mon 19 Mar | Multi-factor models: logistic regression; word order preferences in language; hierarchical models; the binomial construction | SLP 7; Graphical models intro; Morgan & Levy, 2015 | | |
| | Wed 21 Mar | Midterm exam | | | |
| Spring Break | 26-30 Mar | Spring break, no class | | | |
| Week 8 | Mon 2 Apr | Is human language finite-state? | SLP 11; SLP 2nd edition chapter 16 (under Readings on Stellar) | Chomsky, 1956 | Pset 3 due |
| | Wed 4 Apr (46-3189) | Context-free grammars; syntactic analysis | SLP 12; NLTK 8.1-8.5; Levy & Andrew, 2006 | Gazdar, 1981 (in particular Section 2); Müller, 2018 (in particular Section 5.4); Joshi et al., 1991 (on formalisms that go beyond context-free) | |
| Week 9 | Mon 9 Apr | Probabilistic context-free grammars; incremental parsing; human syntactic processing | SLP 13; NLTK 8.6; Levy, 2013 | Jurafsky, 1996; Hale, 2001; Levy, 2008 | Pset 4 out |
| | Wed 11 Apr | Searching treebanks; parsing as (weighted) intersection of context-free grammars and finite-state machines; noisy-channel syntactic comprehension | Levy, 2011 | | |
| Week 10 | Mon 16 Apr | Patriots' Day, no class (student holiday) | | | |
| | Wed 18 Apr | Word embeddings | SLP 15, 16; Levy, Goldberg, & Dagan, 2015 | Mikolov et al., 2013; Pennington et al., 2014; Gutierrez et al., 2016 | Pset 4 due (April 20); Pset 5 out |
| Week 11 | Mon 23 Apr | Implicit associations in word embeddings | Caliskan et al., 2017 | | |
| | Wed 25 Apr | Recurrent neural network models for language | Collobert et al., 2011 | | |
| Week 12 | Mon 30 Apr | What do neural networks learn about language structure? | | Linzen et al., 2016 | |
| | Wed 2 May | Pragmatics in language understanding; scalar inference; the Rational Speech Acts model | Goodman & Frank, 2016 (see also Frank & Goodman, 2012) | | Pset 5 due (Friday, May 4); Pset 6 out |
| Week 13 | Mon 7 May | Advanced pragmatics models: lexical uncertainty, scalar adjectives | Lassiter & Goodman, 2015 | | |
| | Wed 9 May (46-3189) | Statistical word learning in humans; modeling with nonparametric Bayes | Saffran et al., 1996; Goldwater, Griffiths, & Johnson, 2009 | | Pset 6 due (Friday, May 11) |
| Week 14 | Mon 14 May | The emergence of syntactic productivity in language development | Meylan et al., 2017 | | |
| | Wed 16 May | End-of-semester review (bring your questions!) | | | Final projects due Thursday, May 17 |
| | Wed 23 May, 9am-noon | Final exam (in 46-3310) | | | |
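
One quantity that recurs throughout the schedule above is surprisal: the surprisal of a word is its negative log probability in context, -log2 P(word | context). Here is a minimal illustrative sketch (the toy corpus and the unsmoothed bigram estimate are invented for illustration; in the course we work with real corpora and better-estimated models):

  import math
  from collections import Counter

  # Toy corpus of whitespace-separated tokens, invented for this example.
  corpus = "the dog ran . the dog barked . the cat ran .".split()

  bigrams = Counter(zip(corpus, corpus[1:]))  # counts of adjacent word pairs
  unigrams = Counter(corpus)                  # counts of individual words

  def surprisal(context, word):
      # Surprisal in bits: -log2 P(word | context), using an unsmoothed
      # maximum-likelihood bigram estimate.
      return -math.log2(bigrams[(context, word)] / unigrams[context])

  print(surprisal("the", "dog"))  # "dog" follows "the" 2 of 3 times: ~0.58 bits
  print(surprisal("the", "cat"))  # "cat" follows "the" 1 of 3 times: ~1.58 bits

Under surprisal theories of incremental processing (Hale, 2001; Levy, 2008), words with higher surprisal are predicted to be harder to process.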

8 Requirements & grading

You'll be graded on:

| Work | Grade percentage (9.19) | Grade percentage (9.190) |
| A number of homework assignments throughout the semester | 50% | 37.5% |
| A midterm exam | 20% | 15% |
| A final exam | 30% | 22.5% |
| A final project (9.190 only) | -- | 25% |

Active participation in the class is also encouraged and taken into account in borderline grade cases!

8.1 Homework late policy

Homework assignments can be turned in up to 7 days late; 10% of your score will be deducted for each 24 hours of lateness (rounded up). For example, if a homework assignment is worth 80 points, you turn it in 3 days late, and earn a 70 before lateness is taken into account, your score will be (1-0.3)*70=49.
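
In code form, the policy looks like this (a minimal sketch; the function name is invented for illustration):

  import math

  def late_adjusted_score(raw_score, days_late):
      # Deduct 10% of the score per 24 hours of lateness, rounded up;
      # assignments may be turned in at most 7 days late.
      penalty = 0.10 * math.ceil(days_late)  # e.g., 2.5 days late counts as 3
      return (1 - penalty) * raw_score

  print(late_adjusted_score(70, 3))  # the example above: (1 - 0.3) * 70 = 49.0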

8.2 Medical or personal circumstances impacting psets, exams, or projects

If medical or personal circumstances such as illness impact your work on a pset or project, or your ability to take an exam on the scheduled date with adequate preparation, please work with Student Support Services (S3) to verify these circumstances and be in touch with the instructor. We are happy to work with you in whatever way is most appropriate to your individual circumstances to help ensure that you are able to achieve your best performance in class while maintaining your health, happiness, and well-being.

8.3 Mapping of class score to letter grade

I grade the entire course on a curve, so that your end-of-semester letter grade will be determined by your overall points score in light of the distribution of scores among all class members. However, I guarantee minimum grades on the basis of the following thresholds:

| Threshold | Guaranteed minimum grade |
| >=90% | A- |
| >=80% | B- |
| >=70% | C- |
| >=60% | D |

So, for example, an overall score of 90.0001% of points guarantees you an A-, but you could well wind up with a higher grade depending on the curve.
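
Expressed as a small lookup function (a sketch of the guarantee only, with an invented function name; the curve can raise, but never lower, the grade these thresholds imply):

  def guaranteed_minimum_grade(overall_percent):
      # Map an overall course score (in percent) to the guaranteed minimum
      # letter grade from the threshold table above.
      for cutoff, grade in [(90, "A-"), (80, "B-"), (70, "C-"), (60, "D")]:
          if overall_percent >= cutoff:
              return grade
      return None  # below 60%: no guaranteed minimum

  print(guaranteed_minimum_grade(90.0001))  # 'A-', as in the example above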

9 Mailing list

There will be a mailing list for this course, which you can access at https://mailman.mit.edu/mailman/listinfo/9.19-2018-spring. Please make sure you're signed up for it! This list is both for discussion of ideas in the class and for communications about organizational issues.

Author: Roger Levy

Created: 2018-05-15 Tue 12:14
