Stefan Th. Gries
Contact information
Last updated: 18 May 2022

Teaching at the University of California, Santa Barbara

Ling 105: Predictive modeling in linguistics (Spring 2022)

Syllabus and overview

This course is a selective introduction to predictive modeling applications in linguistics. We start with a one-session introduction to predictive modeling with an emphasis on regression modeling, surveying model formulation, model selection, multifactoriality, and validation. We then work through a variety of regression modeling applications: linear regression, binary logistic regression, multinomial regression, and ordinal regression. After that, one session is devoted to model diagnostics and, perhaps, model validation. Finally, two sessions cover tree-based approaches: classification and regression trees as well as random forests. Like its prerequisite course Ling 104, this course is based on the third edition of my textbook Statistics for linguistics with R: a practical introduction (2021) and uses the open-source programming language R.

Downloads for class sessions
(files will be made available when appropriate)

Session 01 (predictive modeling overview): slides and html
Session 02 (linear regression modeling 1): data and html
Session 03 (linear regression modeling 2): data and html
Session 04 (binary logistic regression modeling 1): data, a function for C, a function for R², and html
Session 05 (binary logistic regression modeling 2): data, a function for C, a function for R², and html
Session 06 (multinomial regression modeling): data and html
Session 07 (ordinal logistic regression modeling): data and html
Session 08 (model assumptions, diagnostics, & validation): html
Session 09 (classification & regression trees): data, html, and a quick script on bootstrapping for linear models
Session 10 (random forests): data and html


Note: pick either assignment 1 or assignment 2, and either assignment 3 or assignment 4
Deadline for final submission: 09 June 2022, 23:59 Pacific Time (no extensions!)

assignment 1 and data (.csv)
assignment 2 and data (.csv)
assignment 3 and data (.csv)
assignment 4 and data (.csv)

Links to relevant software and sites

R (at least version 4.1.3)
RStudio (at least version 1.4)
my 2021 statistics textbook, its companion website, and its StatForLing with R newsgroup, which I moderate.