Ling 201: assignments (class)
1 Fundamentals/descriptives
1.1 Assignment 01
A study on relative clauses investigated the frequencies of three types of relative clauses. A small corpus search resulted in 37, 26, and 6 instances of relative clause types RC1, RC2, and RC3 respectively. Your task is to
- input the data into R;
- draw a barplot of the data such that
- the labels used for the three relative clause types are “RC1”, “RC2”, and “RC3”;
- the colors for the three relative clause types are blue, red, and green;
- the observed frequencies are shown on top of the bars in the same colors.
- determine the 95% confidence intervals of the percentages of each relative clause type and briefly discuss how the proportions relate to
- what might be expected on the basis of chance;
- what the confidence intervals suggest concerning the different frequencies of the three relative clause types.
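A minimal sketch of one way to approach this in base R. It assumes exact (Clopper-Pearson) binomial intervals from binom.test(); prop.test() would give slightly different (Wilson-type) intervals, so the choice of interval is an assumption here. The chance baseline of 1/3 assumes that, under chance, all three types would be equally frequent.

```r
# Observed frequencies of the three relative clause types
rcs  <- c(RC1 = 37, RC2 = 26, RC3 = 6)
cols <- c("blue", "red", "green")

# Barplot; barplot() returns the x-positions of the bar midpoints
mids <- barplot(rcs, names.arg = c("RC1", "RC2", "RC3"), col = cols,
                ylim = c(0, max(rcs) * 1.2), ylab = "Observed frequency")
# Print each frequency just above its bar (pos = 3), in the bar's color
text(mids, rcs, labels = rcs, pos = 3, col = cols)

# Proportions and exact binomial 95% confidence intervals
n     <- sum(rcs)
props <- rcs / n
ci    <- t(sapply(rcs, function(k) binom.test(k, n, conf.level = 0.95)$conf.int))
colnames(ci) <- c("lower", "upper")
round(cbind(percent = props, ci) * 100, 1)   # everything in percent

# Under chance, each type would be expected in 1/3 of the cases
chance <- 1 / 3
```

Comparing each interval with the chance value (and with the other intervals) is the basis for the discussion; note that overlapping intervals are only a rough, conservative heuristic for comparing two types.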
1.2 Assignment 02
You have collected data from several different speakers on the acceptability of particular nouns in the subject (S), direct object (DO), and indirect object (IO) positions. Create this data frame in R and call it d:
CASE | SPEAKER | RELATION | NOUN | ACC |
---|---|---|---|---|
1 | S1 | S | x | 4 |
2 | S2 | S | x | 3 |
3 | S3 | S | y | 0 |
4 | S4 | S | y | 5 |
5 | S5 | S | z | 5 |
6 | S6 | S | z | 7 |
7 | S1 | S | a | 5 |
8 | S2 | S | a | 8 |
9 | S3 | DO | b | 3 |
10 | S4 | DO | b | 9 |
11 | S5 | DO | c | 0 |
12 | S6 | DO | c | 3 |
13 | S1 | DO | d | 8 |
14 | S2 | DO | d | 3 |
15 | S3 | DO | e | 1 |
16 | S4 | DO | e | 6 |
17 | S5 | IO | f | 9 |
18 | S6 | IO | f | 8 |
19 | S1 | IO | g | 4 |
20 | S2 | IO | g | 0 |
21 | S3 | IO | h | 8 |
22 | S4 | IO | h | 1 |
23 | S5 | IO | i | 4 |
24 | S6 | IO | i | 5 |
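One compact way to build d, sketched under the assumption that typing all 24 rows is not required: the regular patterns in the table are generated with rep() (speakers cycle through S1 to S6, each relation covers eight consecutive rows, each noun two). It is worth checking the result against the table row by row.

```r
# Build d column by column; the rep() patterns mirror the table above
d <- data.frame(
  CASE     = 1:24,
  SPEAKER  = factor(rep(paste0("S", 1:6), times = 4)),      # S1..S6, cycled 4 times
  RELATION = factor(rep(c("S", "DO", "IO"), each = 8)),     # 8 rows per relation
  NOUN     = factor(rep(c("x", "y", "z", "a", "b", "c",
                          "d", "e", "f", "g", "h", "i"), each = 2)),  # 2 rows per noun
  ACC      = c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
               8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
)
str(d)
```

factor() uses alphabetical level order by default (DO, IO, S for RELATION), which matters for any later sorting or plotting by level.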
1.3 Assignment 03
Generate a data frame dd by re-sorting d according to
- the speaker (ascending), then
- the relation (descending), then
- the acceptability rating (descending).
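A sketch using order() with one sorting direction per key, which base R allows when method = "radix". It assumes that "descending" for the relation means reverse alphabetical order (S, IO, DO), which is what you get when RELATION is a factor with default (alphabetical) levels; d is rebuilt here so the block runs on its own.

```r
# Rebuild d from Assignment 02 so that this block runs on its own
d <- data.frame(
  CASE     = 1:24,
  SPEAKER  = factor(rep(paste0("S", 1:6), times = 4)),
  RELATION = factor(rep(c("S", "DO", "IO"), each = 8)),  # levels: DO < IO < S
  NOUN     = factor(rep(c("x", "y", "z", "a", "b", "c",
                          "d", "e", "f", "g", "h", "i"), each = 2)),
  ACC      = c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
               8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
)

# Speaker ascending, relation descending, rating descending;
# with method = "radix", 'decreasing' can be given once per sorting key
dd <- d[order(d$SPEAKER, d$RELATION, d$ACC,
              decreasing = c(FALSE, TRUE, TRUE), method = "radix"), ]
head(dd)
```

An equivalent alternative with the default method is order(d$SPEAKER, -xtfrm(d$RELATION), -d$ACC).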
1.4 Assignment 04
Generate two vectors ACC_HI and ACC_LO that contain the acceptability ratings larger than and smaller than, respectively, the mean of all acceptability ratings; then compute the means of these two vectors.
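A minimal sketch with logical subsetting; only the ACC column of d is needed, so only that column is rebuilt here. Ratings exactly equal to the mean would end up in neither vector, but with these data that cannot happen, since the mean (109/24) is not a whole number.

```r
# Only the ACC column of d (Assignment 02) is needed here
d <- data.frame(ACC = c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
                        8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5))

ACC_mean <- mean(d$ACC)                  # overall mean, about 4.54
ACC_HI   <- d$ACC[d$ACC > ACC_mean]      # ratings above the mean
ACC_LO   <- d$ACC[d$ACC < ACC_mean]      # ratings below the mean

mean(ACC_HI)
mean(ACC_LO)
```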
1.5 Assignment 05
Show how often each speaker provided ratings larger than or equal to / smaller than the overall average rating.
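A sketch with ifelse() and table(); the labels "larger/equal" and "smaller" are arbitrary choices, and only the two columns of d needed here are rebuilt.

```r
# SPEAKER and ACC columns of d (Assignment 02)
d <- data.frame(
  SPEAKER = factor(rep(paste0("S", 1:6), times = 4)),
  ACC     = c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
              8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
)

# Classify each rating relative to the overall mean
rating_class <- ifelse(d$ACC >= mean(d$ACC), "larger/equal", "smaller")

# Rows: speakers; columns: how many of their ratings fall into each class
tab <- table(SPEAKER = d$SPEAKER, RATING = rating_class)
tab
```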
1.6 Assignment 06
Compute the mean acceptability ratings for the subject, direct object, and indirect object positions and represent the ratings with a boxplot.
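A sketch with tapply() and boxplot(). Since a boxplot shows medians rather than means, the sketch overlays the group means as crosses; that overlay is an added choice, not part of the task. Only the RELATION and ACC columns of d are rebuilt.

```r
# RELATION and ACC columns of d (Assignment 02); levels: DO, IO, S
d <- data.frame(
  RELATION = factor(rep(c("S", "DO", "IO"), each = 8)),
  ACC      = c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
               8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
)

# Mean rating per grammatical relation (in factor-level order)
means <- tapply(d$ACC, d$RELATION, mean)
means

# One box per relation; boxes are drawn at x = 1, 2, 3 in level order
boxplot(ACC ~ RELATION, data = d, ylab = "Acceptability rating")
points(seq_along(means), means, pch = 4, cex = 1.5)   # means as crosses
```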
1.7 Assignment 07
Add a column ACC_GTAV to d which, for each acceptability rating in ACC, says whether this rating is larger than or equal to / smaller than the average; then cross-tabulate d$ACC_GTAV with d$RELATION such that
- the relations are in the columns;
- you provide proportions that add up to 1 in each column.
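A sketch using prop.table() with margin = 2, which divides each cell by its column total. The labels "larger/equal" and "smaller" are assumed for illustration, and only the columns of d needed here are rebuilt.

```r
# RELATION and ACC columns of d (Assignment 02)
d <- data.frame(
  RELATION = factor(rep(c("S", "DO", "IO"), each = 8)),
  ACC      = c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
               8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
)

# New column: is each rating >= the overall mean?
d$ACC_GTAV <- factor(ifelse(d$ACC >= mean(d$ACC), "larger/equal", "smaller"))

# Rows: ACC_GTAV; columns: RELATION
tab <- table(ACC_GTAV = d$ACC_GTAV, RELATION = d$RELATION)

# margin = 2: column proportions, so each column sums to 1
col_props <- prop.table(tab, margin = 2)
round(col_props, 3)
```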
1.8 Assignment 08
In a study on the frequencies of 5 prepositions you obtained the following frequencies: 35, 73, 23, 89, 30. What’s the normalized entropy for this frequency distribution?
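A sketch assuming the usual definition of normalized (relative) entropy: the Shannon entropy H = -sum(p_i * log2(p_i)) divided by its maximum log2(k) for k categories, so that the result lies between 0 (all tokens in one category) and 1 (all categories equally frequent). The log base cancels out in the ratio.

```r
freqs <- c(35, 73, 23, 89, 30)

p      <- freqs / sum(freqs)        # relative frequencies
H      <- -sum(p * log2(p))         # Shannon entropy in bits
H_max  <- log2(length(p))           # maximum entropy for 5 equiprobable categories
H_norm <- H / H_max
H_norm                              # approx. 0.917
```

If a frequency were 0, p * log2(p) would be NaN in R; by convention such cells contribute 0, so they would have to be dropped before summing.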
2 Statistical tests 01
2.1 Assignment 01
Are the lengths of disfluencies (in ms) in _input/disfluencies.csv (see _input/disfluencies.r) normally distributed?
2.2 Assignment 02
Is it correct to say that the three kinds of disfluencies in _input/disfluencies.csv (see _input/disfluencies.r) are equally frequent in general?
2.3 Assignment 03
In a new data set (but also one on disfluencies), do uh and uhm differ with regard to how often they appear before content/lexical words and before function words? (If so, this might be explained by the different kinds of planning/processing effort the speaker is facing.)
Here are the data, which you then have to enter into R:
Disfluency | Before content word | Before function word |
---|---|---|
uh | 19 | 32 |
uhm | 38 | 15 |
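A sketch of one possible analysis. The choice of a chi-squared test of independence via chisq.test() is an assumption here (the assignment leaves the test open); the sketch checks its expected-frequency requirement. For 2×2 tables, chisq.test() applies Yates' continuity correction by default. The phi coefficient is added as an effect size.

```r
# Rows: disfluency type; columns: class of the following word
tab <- matrix(c(19, 32,
                38, 15),
              nrow = 2, byrow = TRUE,
              dimnames = list(DISFLUENCY = c("uh", "uhm"),
                              NEXT_WORD  = c("content", "function")))
tab

res <- chisq.test(tab)    # Yates-corrected by default for 2x2 tables
res$expected              # requirement check: all expected frequencies >= 5
res
res$residuals             # sign shows which cells are over-/underrepresented

# Effect size: phi, computed from the uncorrected chi-squared statistic
phi <- unname(sqrt(chisq.test(tab, correct = FALSE)$statistic / sum(tab)))
phi
```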
3 Statistical tests 02
3.1 Assignment 01
Are the lengths of initial sentences of apologies produced by men generally distributed the same way as the lengths of initial sentences of apologies produced by women? The data from a small pilot study are stored in _input/apologies.csv; as always, their structure is described in _input/apologies.r.
3.2 Assignment 02
Are different kinds of disfluencies more likely in dialogs or in monologs? We revisit our data from _input/disfluencies.csv (see _input/disfluencies.r).
3.3 Assignment 03
An interesting phenomenon in English is adjective suffixation with -ic and -ical, especially when both forms are attested. For one, it is difficult to detect any pattern governing the distribution of the suffixes: when does an adjective end in -ic only (e.g., acrobatic) and when in -ical only (e.g., zoological)? Also, with regard to the adjectives’ general frequency, Marchand (1969) suggested that words in wider common use tend to end in -ical. Your task is to test Marchand’s claim on the data in _input/icical.csv (see _input/icical.r) by checking whether the average frequency of all adjectives ending in -ical is indeed higher than the average frequency of all adjectives ending in -ic. Note: since I am not providing the adjectives themselves, you may test them not in the pairwise fashion that would normally be required here (i.e., politic vs. political, economic vs. economical) but just ‘across the board’, which would of course be less than ideal in a real study.
3.4 Assignment 04
Is the frequency with which the verb to be is contracted before a progressive verb form (e.g., I’m saying) correlated with the frequency of that same following lexical verb? You expect that the more frequent the verb in the progressive form, the more likely it is that the preceding form of to be is contracted. The data are in _input/contractions.csv (see _input/contractions.r).