Ling 201: Research methodology and statistics in linguistics (2006, 2008, 2014: S; 2008–2012, 2014–2018, 2020: F)


This course was a hands-on introduction to fundamental aspects of quantitative/statistical methodology. We began by looking at a few basic notions such as variables and hypotheses. We then discussed the logic of quantitative studies in general as well as the design of factorial experiments in particular. We dealt with how data from experiments and corpora should be set up for subsequent statistical evaluation. In terms of analysis and evaluation, we explored a variety of descriptive graphs and statistics for frequency data, averages, dispersions, distributions, and correlations. The largest part of the course was concerned with hands-on practice on a variety of statistical tests: practicing different methods each session, we worked on distribution fitting tests, tests for independence, and tests for differences for frequencies, means, dispersions, distributions, and correlations. We used corpus and psycholinguistic example data, sometimes from published research. The course used the open source programming language R and was based on the second edition of my book Statistics for Linguistics with R, which comes with sample data, exercises, answer keys, etc. Since the class required no prior knowledge of statistics and only very little knowledge of mathematics, it was an entry-level class for absolute beginners from degree programs, esp. in the humanities and social sciences. See also the StatForLing with R Google group, which I moderate and which leads to the companion website of my book.

Ling 202: Advanced research methods and statistics in linguistics (2009: F; 2011, 2013, 2015, 2017, 2021: W; 2019: S)


This course was a hands-on introduction to more advanced statistical methods to analyze observational and experimental data. After a small recap of monofactorial methods and graphs and an introduction to a process called modeling or model selection, we systematically extended monofactorial tests to their multifactorial and multivariate counterparts. We began with the linear model and extended correlations and t-tests to multiple linear regression, ANOVAs, and ANCOVAs. We then broadened the scope to the powerful methods included in generalized linear modeling (such as binomial logistic regression for binary dependent variables and Poisson regression for dependent variables that are counts) as well as ordinal logistic and multinomial regression. There was also one session on tree-based methods (classification and regression trees as well as random forests) and half a session on similarity-based prediction. In addition to these modeling techniques, we also discussed the exploratory method of hierarchical cluster analysis to find structure in large, potentially messy data sets. We used the open source software tool R and the second edition of my book Statistics for Linguistics with R.

Ling 204: Statistical methodology (2014, 2016, 2020: W, 2018: S)


This course was a more advanced course on statistical modeling with an emphasis on more sophisticated aspects of regression modeling and other multivariate methods; it presupposed a good understanding of the second edition (2013) of my Statistics for Linguistics with R: [...]. We began with a first recap of linear and generalized linear regression modeling. We then discussed the use of contrasts and general linear hypothesis tests for linear and generalized linear regression models, followed by some ideas on how to explore curvature in data (regressions with breakpoints, polynomial regressions, and generalized additive models). This was followed by a larger chunk on linear and generalized linear mixed-effects (or multilevel) modeling, where we reanalyzed published data and discussed numerical and visual exploration of regression results. The last parts were then devoted to influential data points and validation approaches as well as classification/regression trees and random forests. We used the open source software tool R.

Ling 210/110: Computational linguistics (2007: W, 2010: S)


This course was a (highly selective) introduction to a discipline known as Computational Linguistics. It featured (i) a brief general introduction to some main areas of research within this field, (ii) an introduction to the programming language R based on my book Quantitative Corpus Linguistics with R: […], with which we worked on linguistic data, and (iii) hands-on work in a computer lab on a variety of case studies from domains such as computational lexicography as well as word sense and synonym disambiguation, information retrieval, automatic text processing, and a few other things such as orthographic similarities of words and spellchecking, computational methods for authorship attribution, and others. Given its practical orientation, the course was ideally suited for students who were thinking of practical applications and wanted to acquire some first computational programming experience (prior experience with R was not necessary, but a larger-than-average computer savviness was recommended). Reading assignments included parts of Manning and Schütze's (2000) Foundations of Statistical Natural Language Processing as well as Jurafsky and Martin's (2000) Speech and Language Processing, supplemented with a variety of introductory chapters and research articles.

Ling 218: Corpus linguistics (2007, 2020: S, 2012: F)


This course was a hands-on introduction to advanced corpus-linguistic research methods, which are applied to large databases of language used in natural communicative settings to supplement more traditional ways of linguistic analysis in all linguistic subdisciplines. It was broadly based on my (2016) textbook Quantitative corpus linguistics with R: a practical introduction and McEnery & Hardie's (2012) Corpus Linguistics, supplemented with a variety of research articles. We began with an introduction to R programming, especially for textual data, before we read a wide variety of papers on corpus-linguistic applications, in particular in usage-/exemplar-based and psycholinguistic approaches, while writing R scripts that cover the four main corpus-linguistic methods – frequency, dispersion, co-occurrence, and concordancing – on the basis of a variety of different corpora and corpus formats. We concluded by looking at slightly more advanced applications involving anonymous functions and scripts using parallel execution. We used the open source software tool R.

Ling 225: Semantics (2011: S, 2015: W)


In this course, we explored a small range of topics in semantics. Topics we dealt with were structuralist approaches involving necessary and sufficient conditions, the Natural Semantic Metalanguage approach, lexical relations, cognitive semantics (esp. with regard to polysemy and prototypes), computational / distributional semantics, and the acquisition of meaning.

Ling 237/137: Introduction to first language acquisition (2007, 2008: F, 2010: W)


This course was a selective introduction to the interdisciplinary enterprise of research on first language acquisition. It covered several different though interrelated topics: an introduction to 'the problem of language acquisition', overviews of different theoretical and methodological approaches towards first language acquisition, and introductions to aspects and processes of first language acquisition in different linguistic subdisciplines: phonology/morphology, semantics/lexicon, syntax.

Ling 252A/B: Cognitive Linguistics (2006: F; 2007: W)


In the first quarter of this two-quarter seminar, we explored the set of related approaches known as Cognitive Linguistics. The course provided a brief general introduction to the assumptions governing or underlying most of the field, followed by a variety of case studies focusing on central notions of, and areas of research within, Cognitive Linguistics; these notions and areas of research include metaphor/metonymy, polysemy, Cognitive Grammar, (argument structure) constructions, etc.

Ling 257A/B: Psycholinguistics (2010: F; 2011: W)


In the first quarter of this two-quarter seminar, we explored topics in psycholinguistics from (i) the theoretical perspective of exemplar/usage-based cognitive/functional linguistics and (ii) the methodological perspective of experimental and observational data and analysis. We read and discussed a variety of papers on topics in language acquisition, language production, 'distributional linguistics', and, depending on participants' choices, language change and sociolinguistics.
