Stefan Th. Gries
Contact information
Last updated: 26 May 2020

Teaching at the University of California, Santa Barbara

Ling 218: Corpus Linguistics (S 2020)

Syllabus and overview

This course is a hands-on introduction to advanced corpus-linguistic research methods, which are applied to large databases of language used in natural communicative settings to supplement more traditional methods of linguistic analysis in all linguistic sub-disciplines. It is broadly based on my (2016) textbook Quantitative corpus linguistics with R: a practical introduction and McEnery & Hardie's (2012) Corpus Linguistics, supplemented with a variety of research articles. We begin with an introduction to R programming, especially for textual data. We then read a wide variety of papers on corpus-linguistic applications, in particular in usage-/exemplar-based and psycholinguistic approaches, while writing R scripts that cover the four main corpus-linguistic methods (frequency, dispersion, co-occurrence, and concordancing) on the basis of a variety of different corpora and corpus formats. We conclude by looking at slightly more advanced applications involving anonymous functions and scripts using parallel execution. Throughout, we use the open-source software tool R.

Course downloads

Course folder

Files for sessions 01-04
Files for session 05
Files for session 06
Files for session 07
Files for session 08
Files for session 09
Files for session 10


R from CRAN (make sure you have version 3.6.3)
RStudio (make sure you have version 1.2.5033 or 1.3.919-2)
LibreOffice (make sure you have version 6.4.1)