Stefan Th. Gries
Last updated: 23 February 2021

Ling 104: Statistical Methods in Linguistics (Winter 2022)

Syllabus and overview

This course is a hands-on introduction to the fundamentals of quantitative/statistical methodology in linguistics. It is based on the third edition of my textbook Statistics for linguistics with R: a practical introduction (2021), which also forms the basis for Ling 105 next quarter!

We begin by looking at a few basic notions such as variables and hypotheses. We then discuss the logic of quantitative studies using the null-hypothesis falsification approach, and how data from experiments and corpora should be set up for subsequent statistical evaluation. Next, we turn to a variety of descriptive graphs and statistics for frequency data, averages, dispersions, and correlations. The largest part of the course covers a variety of statistical tests: distribution-fitting tests, tests for independence, and tests for differences between frequencies, means, and dispersions, as well as elementary aspects of correlation/regression. We end with a small primer on the kind of multifactorial regression and tree-based methods that are the subject of Ling 105.

We use the open source software tool R.
Downloads for class sessions (files will be made available when appropriate)

Session 01 (fundamentals): slides
Session 02 (intro to R): exercise code, exercise data (must be unzipped), and the answer key
Session 03 (descriptive stats, univariate): exercise code, exercise data (must be unzipped), and the answer key
Session 04 (descriptive stats, bivariate): exercise code, exercise data (must be unzipped), and the answer key
Session 05 (distributions & frequencies): exercise code and the answer key
Session 06 (dispersions & means): exercise code and the answer key
Session 07 (means & correlation/regression): exercise code, exercise data (must be unzipped), and the answer key
Session 08 (plotting, functions, code tips): developing a more complex graph of this data set, two more data files, and tips for readable code
Session 09 (exercise/practice session): exercise code, the answer key, and the Berkeley admissions data
Session 10 (outlook towards multifactorial)

Assignments

assignment 1: code (.r) and data (.csv)
assignment 2: code (.r)
assignment 3: code (.r)
assignment 'final': code (.r) and data (must be unzipped)

Links to relevant software and sites

R (at least version 4.1.1)
RStudio (at least version 1.4)
my 2021 statistics textbook, its companion website, and its StatForLing with R newsgroup, which I moderate