Ling 201: assignments (class)

Author: Stefan Th. Gries

Affiliation: UC Santa Barbara & JLU Giessen

Published: 05 Jan 2025 12-34-56

1 Fundamentals/descriptives

1.1 Assignment 01

A study on relative clauses investigated the frequencies of three types of relative clauses. A small corpus search resulted in 37, 26, and 6 instances of relative clause types RC1, RC2, and RC3 respectively. Your task is to

- input the data into R;
- draw a barplot of the data such that
  - the labels used for the three relative clause types are “RC1”, “RC2”, and “RC3”;
  - the colors for the three relative clause types are blue, red, and green;
  - the observed frequencies are shown on top of the bars in the same colors;
- determine the 95% confidence intervals of the percentages of each relative clause type and discuss briefly how the proportions relate to
  - what might be expected on the basis of chance;
  - what the confidence intervals suggest concerning the different frequencies of the three relative clause types.
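One possible way to approach these steps is sketched below; it is not meant as the expected solution. The object names (rc, cols, mids, ci) are illustrative, and the exact binomial interval from binom.test() is an assumption about which interval to use (prop.test() would be another defensible choice). With three types, the chance expectation for each proportion is 1/3.

```r
# observed frequencies of the three relative clause types
rc <- c(RC1 = 37, RC2 = 26, RC3 = 6)
cols <- c("blue", "red", "green")

# barplot() returns the x-coordinates of the bar midpoints,
# which text() can use to place the frequencies above the bars
mids <- barplot(rc, col = cols, ylim = c(0, max(rc) * 1.2),
                xlab = "Relative clause type", ylab = "Observed frequency")
text(mids, rc, labels = rc, pos = 3, col = cols)

# 95% confidence intervals of each type's proportion
# (assumption: exact binomial intervals; one column per type)
ci <- sapply(rc, function(x) binom.test(x, sum(rc))$conf.int)
rownames(ci) <- c("lower", "upper")
round(ci, 3)

# chance expectation for comparison
1 / length(rc)
```

Intervals that exclude 1/3 suggest that a type is more or less frequent than chance would predict; intervals that do not overlap each other suggest that the types differ from one another.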

1.2 Assignment 02

You have collected data from several different speakers on the acceptability of particular nouns in the subject (S), direct object (DO), and indirect object (IO) position. Create this data frame in R and call it d:

CASE	SPEAKER	RELATION	NOUN	ACC
1	S1	S	x	4
2	S2	S	x	3
3	S3	S	y	0
4	S4	S	y	5
5	S5	S	z	5
6	S6	S	z	7
7	S1	S	a	5
8	S2	S	a	8
9	S3	DO	b	3
10	S4	DO	b	9
11	S5	DO	c	0
12	S6	DO	c	3
13	S1	DO	d	8
14	S2	DO	d	3
15	S3	DO	e	1
16	S4	DO	e	6
17	S5	IO	f	9
18	S6	IO	f	8
19	S1	IO	g	4
20	S2	IO	g	0
21	S3	IO	h	8
22	S4	IO	h	1
23	S5	IO	i	4
24	S6	IO	i	5
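A minimal sketch of one way to enter this table follows. Using rep() to exploit the regular patterns in SPEAKER, RELATION, and NOUN is a design choice, not a requirement; typing the columns out in full or reading the table from a file would work equally well.

```r
# the data frame from the table above, built column by column
d <- data.frame(
  CASE     = 1:24,
  SPEAKER  = rep(paste0("S", 1:6), times = 4),     # S1..S6, repeated 4 times
  RELATION = rep(c("S", "DO", "IO"), each = 8),    # 8 rows per relation
  NOUN     = rep(c("x", "y", "z", "a", "b", "c",
                   "d", "e", "f", "g", "h", "i"), each = 2),
  ACC      = c(4, 3, 0, 5, 5, 7, 5, 8,
               3, 9, 0, 3, 8, 3, 1, 6,
               9, 8, 4, 0, 8, 1, 4, 5),
  stringsAsFactors = TRUE
)
str(d)   # check the structure
head(d)  # inspect the first rows
```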

1.3 Assignment 03

Generate a data frame dd by re-sorting d according to

- the speaker (ascending), then
- the relation (descending), then
- the acceptability rating (descending).
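The sorting can be sketched with order(), as below. The data frame is rebuilt here only so that the snippet runs on its own, and the name ord is illustrative. Because RELATION is not numeric, the sketch wraps it in xtfrm() before negating it; this is one of several ways to obtain a descending sort on a non-numeric column.

```r
# d rebuilt so that this snippet is self-contained
d <- data.frame(
  CASE     = 1:24,
  SPEAKER  = rep(paste0("S", 1:6), times = 4),
  RELATION = rep(c("S", "DO", "IO"), each = 8),
  NOUN     = rep(c("x", "y", "z", "a", "b", "c",
                   "d", "e", "f", "g", "h", "i"), each = 2),
  ACC      = c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
               8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
)

# order(): speaker ascending, relation descending, rating descending;
# xtfrm() turns RELATION into sortable numbers so that it can be negated
ord <- order(d$SPEAKER, -xtfrm(d$RELATION), -d$ACC)
dd <- d[ord, ]
head(dd, 8)
```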

1.4 Assignment 04

Generate two vectors ACC_HI and ACC_LO that contain the acceptability ratings larger than and smaller than the mean of all acceptability ratings, respectively; then compute the means of these two vectors.
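A minimal sketch using logical subsetting is shown below; the ratings are re-entered as a plain vector ACC (standing in for d$ACC) only so that the snippet runs on its own.

```r
# the 24 acceptability ratings from d$ACC
ACC <- c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
         8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)

# logical subsetting: keep the ratings above / below the overall mean
ACC_HI <- ACC[ACC > mean(ACC)]
ACC_LO <- ACC[ACC < mean(ACC)]

mean(ACC_HI)
mean(ACC_LO)
```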

1.5 Assignment 05

Show how often each speaker provided ratings larger than or equal to vs. smaller than the overall average rating.
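One way to obtain such counts is a cross-tabulation, sketched below. Only the two relevant columns are rebuilt so that the snippet runs on its own; the name rating_class and its two labels are illustrative choices.

```r
# the two relevant columns of d
SPEAKER <- rep(paste0("S", 1:6), times = 4)
ACC <- c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
         8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)

# classify each rating relative to the overall mean
rating_class <- ifelse(ACC >= mean(ACC), "larger/equal", "smaller")

# speakers in the rows, rating classes in the columns
tab <- table(SPEAKER, rating_class)
tab
```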

1.6 Assignment 06

Compute the mean acceptability ratings for the subject, direct object, and indirect object positions and represent them in a boxplot.
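A sketch of one possible approach follows. Since a boxplot shows medians rather than means, the sketch adds the means as extra points; that addition, the plotting symbol, and the name means are assumptions about what a helpful plot might look like. The two relevant columns are rebuilt so that the snippet runs on its own.

```r
# the two relevant columns of d
d <- data.frame(
  RELATION = rep(c("S", "DO", "IO"), each = 8),
  ACC      = c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
               8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
)

# mean rating per grammatical relation
means <- tapply(d$ACC, d$RELATION, mean)
means

# boxplot of the ratings per relation, with the means added as crosses;
# both tapply() and boxplot() order the relations alphabetically
boxplot(ACC ~ RELATION, data = d,
        xlab = "Grammatical relation", ylab = "Acceptability rating")
points(seq_along(means), means, pch = 4)
```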

1.7 Assignment 07

Add a column ACC_GTAV to d which, for each acceptability rating in ACC, says whether this rating is larger than or equal to vs. smaller than the average; then cross-tabulate d$ACC_GTAV with d$RELATION such that

- the relations are in the columns;
- you provide column proportions, i.e. values that add up to 1 in each column.
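The two steps can be sketched as follows. The labels "larger/equal" and "smaller" and the names tab and pt are illustrative; the relevant columns of d are rebuilt so that the snippet runs on its own.

```r
# the two relevant columns of d
d <- data.frame(
  RELATION = rep(c("S", "DO", "IO"), each = 8),
  ACC      = c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
               8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
)

# new column: is each rating at/above or below the overall mean?
d$ACC_GTAV <- ifelse(d$ACC >= mean(d$ACC), "larger/equal", "smaller")

# cross-tabulation with the relations in the columns
tab <- table(ACC_GTAV = d$ACC_GTAV, RELATION = d$RELATION)

# margin = 2 normalizes within columns, so each column sums to 1
pt <- prop.table(tab, margin = 2)
pt
```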

1.8 Assignment 08

In a study on the frequencies of 5 prepositions you obtained the following frequencies: 35, 73, 23, 89, 30. What’s the normalized entropy for this frequency distribution?
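As a sketch: normalized entropy divides the Shannon entropy of the observed proportions by the maximum entropy possible for that number of categories, so it ranges from 0 (one category has everything) to 1 (all categories are equally frequent). The names freqs, p, H, and H_norm are illustrative, and the code assumes no frequency is 0 (zeros would need to be dropped before taking logs).

```r
# observed frequencies of the 5 prepositions
freqs <- c(35, 73, 23, 89, 30)

# proportions
p <- freqs / sum(freqs)

# Shannon entropy in bits (assumes all p > 0)
H <- -sum(p * log2(p))

# normalize by the maximum entropy for this number of categories
H_norm <- H / log2(length(freqs))
H_norm
```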

2 Statistical tests 01

2.1 Assignment 01

Are the lengths of disfluencies (in ms) in _input/disfluencies.csv (see _input/disfluencies.r) normally distributed?

2.2 Assignment 02

Is it correct to say that the three kinds of disfluencies in _input/disfluencies.csv (see _input/disfluencies.r) are equally frequent in general?

2.3 Assignment 03

In a new data set (but also one on disfluencies), do uh and uhm differ with regard to how often they appear before content/lexical words and function words? (If so, this might be explained by the different kinds of planning/processing effort that the speaker is facing.)

Here are the data, which you then have to enter into R:

	Before content word	Before function word
uh	19	32
uhm	38	15
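One way to enter and test this table is sketched below. The choice of a chi-squared test for independence without continuity correction is an assumption about the intended analysis, and the object and dimension names (uh_uhm, DISFLUENCY, FOLLOWING, test) are illustrative.

```r
# the 2x2 table from above, entered row by row
uh_uhm <- matrix(c(19, 32,
                   38, 15),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(DISFLUENCY = c("uh", "uhm"),
                                 FOLLOWING  = c("content", "function")))
uh_uhm

# chi-squared test for independence (assumption: no continuity correction)
test <- chisq.test(uh_uhm, correct = FALSE)

# the test presupposes expected frequencies that are not too small
test$expected

test
```

If the result is significant, the residuals (test$residuals) indicate which cells are responsible for the effect.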

3 Statistical tests 02

3.1 Assignment 01

Are the lengths of initial sentences of apologies produced by men generally distributed the same way as the lengths of initial sentences of apologies produced by women? The data from a small pilot study are stored in _input/apologies.csv, with their structure described, as always, in _input/apologies.r.

3.2 Assignment 02

Are different kinds of disfluencies more likely in dialogs or in monologs? We revisit our data from _input/disfluencies.csv (see _input/disfluencies.r).

3.3 Assignment 03

An interesting phenomenon in English is adjective suffixation with -ic and -ical, especially when both forms are attested. For example, it is difficult to detect any pattern governing the distribution of suffixes: when does an adjective end in -ic only (e.g., acrobatic) and when does it end in -ical only (e.g., zoological)? Also, with regard to the adjectives’ general frequency, Marchand (1969) suggested that words in wider common use tend to end in -ical. Your task is to test Marchand’s claim on the data in _input/icical.csv (see _input/icical.r) and check whether the average frequency of all adjectives ending in -ical is indeed higher than the average frequency of all adjectives ending in -ic. Note: since I am not providing the exact adjectives, you are allowed to test them not in the pairwise fashion that would normally be required here (e.g., politic vs. political, economic vs. economical) but just ‘across the board’, which would of course be less than ideal for a real study.

3.4 Assignment 04

Is the frequency with which the verb to be is contracted before a progressive verb form (e.g., I’m saying) correlated with the frequency of that same following lexical verb? You expect that the more frequent the verb in the progressive form, the more likely the form of to be will be contracted. The data are in _input/contractions.csv (see _input/contractions.r).

