Stefan Th. Gries
Last updated: 12 December 2021

Teaching at the University of California, Santa Barbara


Ling 104: Statistical Methods in Linguistics (Winter 2022)

Syllabus and overview

This course is a hands-on introduction to the fundamentals of quantitative/statistical methodology in linguistics. It is based on the third edition of my textbook Statistics for linguistics with R: a practical introduction (2021), which also forms the basis for Ling 105 next quarter! We begin with a few basic notions such as variables and hypotheses. We then discuss the logic of quantitative studies using the null-hypothesis falsification approach and how data from experiments and corpora should be set up for subsequent statistical evaluation. Next, we cover a variety of descriptive graphs and statistics for frequency data, averages, dispersions, and correlations. The largest part of the course covers statistical tests: distribution-fitting tests, tests for independence, tests for differences between frequencies, means, and dispersions, and elementary aspects of correlation/regression. We end with a short primer on the kind of multifactorial regression and tree-based methods that are the subject of Ling 105. We use the open-source software tool R.


Downloads for class sessions
(files will be made available when appropriate)


Session 01 (fundamentals): slides
Session 02 (intro to R): exercise code, exercise data (must be unzipped), and the answer key
Session 03 (descriptive stats, univariate): exercise code, exercise data (must be unzipped), and the answer key
Session 04 (descriptive stats, bivariate): exercise code, exercise data (must be unzipped), and the answer key
Session 05 (distributions & frequencies): exercise code and the answer key
Session 06 (dispersions & means): exercise code and the answer key
Session 07 (means & correlation/regression): exercise code and the answer key
Session 08 (plotting, functions, code tips): developing a more complex graph of this data set, code for function writing, and tips for readable code
Session 09 (exercise/practice session): exercise code, the answer key, and the Berkeley admissions data
Session 10 (outlook towards multifactorial)


Assignments


assignment 1: code (.r) and data (.csv)
assignment 2: code (.r)
assignment 3: code (.r)
assignment 'final': code (.r) and data (must be unzipped)


Links to relevant software and sites


R (at least version 4.1.1)
RStudio (at least version 1.4)
my 2021 statistics textbook, its companion website, and its StatForLing with R newsgroup, which I moderate.