Stefan Th. Gries
Contact information
Last updated: 14 May 2024

Teaching at the University of California, Santa Barbara

Ling 105: Predictive modeling in linguistics (Spring 2024)

Syllabus and overview

This course is a selective introduction to predictive modeling applications in linguistics. We start with a one-session intro to predictive modeling with an emphasis on regression modeling, which will provide an overview of several fundamental aspects of survey model formulation, model selection, multifactoriality, and validation. Then, we work our way through a variety of regression modeling applications: linear regression, binary logistic regression, multinomial, and ordinal regression models. After that, one session will be concerned with model diagnostics and, perhaps, model validation. Finally, there will be a bit on predictive modeling with classification and regression trees. Like its prerequisite course Ling 104, this course is based on the third edition of my textbook Statistics for linguistics with R: a practical introduction (2021) and uses the open source programming language R.

Downloads for class sessions
(files will be available as appropriate)

Folder for the whole course

Additional files to be added to that folder per session:
For session 01: PDF of slides
For session 02: HTML
For session 03: HTML
For session 04: HTML
For session 05: HTML
For session 06: HTML
For session 07: HTML
For session 08: Markdown/Quarto file and the Google doc
For session 09: Markdown/Quarto file and the Google doc
For session 10: Markdown/Quarto file and the Google doc

Graded assignments

Attendance is not required and will not be monitored. Choose and work on as many assignments from this page as you need for the sum of their difficulty levels to add up to at least 5 points, and send them to the TA by 14 June 2024, 23:59 PDT (no extensions!). Your assignments can be submitted as R scripts, as RMarkdown or Quarto documents, or as R reports and must have the following file name structure: <105_lastname_assignment##.html> (as in <105_smith_assignment02.html>). The assignments will be graded on (i) whether your statistical analysis 'makes sense' (does the code work? did you explore and prepare the data? choose the right method? visualize properly? summarize the findings in a short paragraph properly?) and (ii) the form in which you submit it (on a scale from a haphazardly formatted R script to a nicely formatted HTML knitted from Quarto). Students' preparation of the assignments must comply with UCSB's academic integrity principles.


R (from CRAN) (required, ideally at least version 4.3.3). Also, make sure that (i) you have the packages car, effects, magrittr, multcomp, plotly, rgl, and tree installed and (ii) all your packages are up to date.
RStudio (required, ideally version 2024.04.0+402)
Quarto (optional, ideally at least version quarto-1.4.553)
LibreOffice (even more optional, ideally at least version 24.2)
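
One possible way to check the R requirements listed above from within R is sketched below. The package names and the version number 4.3.3 come from the list; the helper name missing_packages is hypothetical and only used for illustration. The installation and update steps are wrapped in interactive() so that they only run when you execute the code yourself in an R session with internet access.

```r
# Packages named in the software requirements above
required <- c("car", "effects", "magrittr", "multcomp", "plotly", "rgl", "tree")

# Hypothetical helper: returns the required packages that are not yet installed.
# By default it compares against the packages installed in the current R library.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

# Is the running R version at least the recommended 4.3.3?
r_recent_enough <- getRversion() >= "4.3.3"

if (interactive()) {
  # Install only the packages that are missing
  to_install <- missing_packages(required)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }

  # Bring all installed packages up to date without asking for each one
  update.packages(ask = FALSE)

  # Report the R version check
  message("R version: ", as.character(getRversion()),
          if (r_recent_enough) " (ok)" else " (older than 4.3.3)")
}
```

Running missing_packages(required) on its own afterwards should return an empty character vector once everything is installed.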