Ling 201: Research methodology and statistics in linguistics (2006, 2008, 2014: S; 2008–2012, 2014–2018, 2020, 2023: F; 2025: W)


This course was a hands-on introduction to fundamental aspects of quantitative/statistical methodology. We began by looking at a few basic notions such as variables and hypotheses. We then discussed the logic of quantitative studies in general as well as the design of factorial experiments in particular. We dealt with how data from experiments and corpora should be set up for subsequent statistical evaluation. In terms of analysis and evaluation, we explored a variety of descriptive graphs and statistics for frequency data, averages, dispersions, distributions, and correlations. The largest part of the course was concerned with hands-on practice on a variety of statistical tests: practicing different methods each session, we worked on distribution-fitting tests, tests for independence, and tests for differences in frequencies, means, dispersions, distributions, and correlations. We used corpus and psycholinguistic example data, sometimes from published research. This course used the open-source programming language R and was based on the third edition (2021) of my book Statistics for Linguistics with R: […], which comes with sample data, exercises, answer keys, etc. Since the class required no prior knowledge of statistics and only very little knowledge of mathematics, it was an entry class for absolute beginners from degree programs, esp. in the humanities and social sciences. See also the StatForLing with R Google group, which I moderate and which leads to the companion website of my book.

Ling 202: Advanced research methods and statistics in linguistics (2009: F; 2011, 2013, 2015, 2017, 2021: W; 2019, 2023, 2025: S)


This course was a hands-on introduction to more advanced statistical methods for analyzing observational and experimental data. After a short recap of monofactorial methods and graphs and an introduction to a process called modeling or model selection, we systematically extended monofactorial tests to their multifactorial and multivariate counterparts. We began with the linear model and extended correlations and t-tests to multiple linear regression, ANOVAs, and ANCOVAs. We then broadened the scope to the powerful methods included in generalized linear modeling (such as binomial logistic regression for binary dependent variables) as well as multinomial and ordinal logistic regression. There was also one session on model assumptions and diagnostics, and we concluded with a session on classification and regression trees. We used the open-source software tool R and the third edition (2021) of my book Statistics for Linguistics with R: […].

Ling 204: Statistical methodology (2014, 2016, 2020: W; 2018: S; 2023: F)


This course was a more advanced course on statistical modeling, with an emphasis on more sophisticated aspects of regression modeling and other multivariate methods; it presupposed a good understanding of Chapters 1 to 5 of the third edition (2021) of my Statistics for Linguistics with R: [...]. We began with a brief recap of linear and generalized linear regression modeling. We then discussed the use of contrasts and general linear hypothesis tests for linear and generalized linear regression models, followed by some ideas on how to explore curvature in data (regressions with breakpoints, polynomial regressions, and generalized additive models). This was followed by a larger chunk on linear and generalized linear mixed-effects (or multilevel) modeling, where we reanalyzed published data and discussed numerical and visual exploration of regression results. The last parts were devoted to random forests and clustering as well as other similarity-based methods. We used the open-source software tool R.

Ling 210/110: Computational linguistics (2007: W; 2010: S)


This course was a (highly selective) introduction to the discipline known as Computational Linguistics. It featured (i) a brief general introduction to some main areas of research within this field, (ii) an introduction to the programming language R, based on my book Quantitative Corpus Linguistics with R: […], which we used to work with linguistic data, and (iii) hands-on work in a computer lab on a variety of case studies from domains such as computational lexicography, word sense and synonym disambiguation, information retrieval, and automatic text processing, as well as a few other things such as orthographic similarities of words and spell-checking, computational methods for authorship attribution, and others. Given its practical orientation, this course was ideally suited for students who were thinking of practical applications and wanted to acquire some first programming experience (prior experience with R was not necessary, but larger-than-average computer savviness was recommended). Reading assignments included parts of Manning and Schütze's (2000) Foundations of Statistical Natural Language Processing as well as Jurafsky and Martin's (2000) Speech and Language Processing, supplemented with a variety of introductory chapters and research articles.

Ling 218: Corpus linguistics (2007, 2020: S; 2012: F; 2024: W)


This course was an introduction to computerized research methods, which are applied to large databases of language used in natural communicative settings to supplement more traditional ways of linguistic analysis in all linguistic subdisciplines. There was a bit of 'theoretical' reading on what a corpus is, what kinds of corpora there are, and how they are created/compiled, but nearly all of the course was hands-on practice with the open-source programming language and environment R. We began with some basics of R and then dealt with how to write R scripts for the four main corpus-linguistic methods (frequencies, dispersions, association, and keyness) as well as some other applications. For that, we read a few overview articles and a few corpus-linguistic studies that covered topics including syntax (patterns and alternations), lexis/semantics (key words in different cultures and near synonymy), psycholinguistics (disfluencies), and others. Note: This course was based on the second edition of my textbook Quantitative corpus linguistics with R: […]. New York: Routledge, Taylor & Francis Group, which students were required to have: it teaches most fundamentals of R programming for text analysis (and can therefore be useful well beyond this course).

Ling 225: Semantics (2011: S; 2015: W)


In this course, we explored a small range of topics in semantics. Topics we dealt with were structuralist approaches involving necessary and sufficient conditions, the Natural Semantic Metalanguage approach, lexical relations, cognitive semantics (esp. with regard to polysemy and prototypes), computational / distributional semantics, and the acquisition of meaning.

Ling 237/137: Introduction to first language acquisition (2007–2008: F; 2010: W)


This course was a selective introduction to the interdisciplinary enterprise of research on first language acquisition. It covered several different though interrelated topics: an introduction to 'the problem of language acquisition', overviews of different theoretical and methodological approaches towards first language acquisition, and introductions to aspects and processes of first language acquisition in different linguistic subdisciplines: phonology/morphology, semantics/lexicon, syntax.
