1 Fundamentals/descriptives

1.1 Assignment 01

A study on relative clauses investigated the frequencies of three types of relative clauses. A small corpus search resulted in 37, 26, and 6 instances of relative clause types RC1, RC2, and RC3 respectively. Your task is to

  • input the data into R;
  • draw a barplot of the data such that
    • the labels used for the three relative clause types are “RC1”, “RC2”, and “RC3”;
    • the colors for the three relative clause types are blue, red, and green;
    • the observed frequencies are shown on top of the bars in the same colors.
  • determine the 95%-confidence intervals of the percentages of each relative clause type and discuss briefly how the proportions relate to
    • what might be expected on the basis of chance;
    • what the confidence intervals suggest concerning the different frequencies of the three relative clause types.
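
One possible way to approach this, as a rough sketch rather than a model solution: the object names rc, cols, mids, and cis are made up for illustration, and the exact binomial interval from binom.test is only one of several reasonable choices (prop.test would be another). With three types, the chance expectation for each proportion is 1/3.

```r
# Observed frequencies of the three relative clause types (from the task)
rc <- c(RC1 = 37, RC2 = 26, RC3 = 6)
cols <- c("blue", "red", "green")

# barplot() returns the x-coordinates of the bar midpoints,
# which are then used to place the frequencies on top of the bars
mids <- barplot(rc, col = cols, ylim = c(0, max(rc) * 1.2),
                ylab = "Observed frequency")
text(mids, rc, labels = rc, col = cols, pos = 3)

# Exact binomial 95% confidence intervals for each proportion
# (assumption: binom.test is an acceptable method here)
cis <- sapply(rc, function(k) binom.test(k, sum(rc))$conf.int)
rownames(cis) <- c("lower", "upper")
round(cis * 100, 1)  # percentages

# Chance expectation with three categories
1 / 3
```

Comparing each interval with 1/3, and the intervals with each other, gives a starting point for the discussion the task asks for.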

1.2 Assignment 02

You have collected data from several different speakers on the acceptability of particular nouns in the subject (S), direct object (DO), and indirect object (IO) position. Create this data frame in R and call it x:

CASE SPEAKER RELATION NOUN ACC
1 S1 S x 4
2 S2 S x 3
3 S3 S y 0
4 S4 S y 5
5 S5 S z 5
6 S6 S z 7
7 S1 S a 5
8 S2 S a 8
9 S3 DO b 3
10 S4 DO b 9
11 S5 DO c 0
12 S6 DO c 3
13 S1 DO d 8
14 S2 DO d 3
15 S3 DO e 1
16 S4 DO e 6
17 S5 IO f 9
18 S6 IO f 8
19 S1 IO g 4
20 S2 IO g 0
21 S3 IO h 8
22 S4 IO h 1
23 S5 IO i 4
24 S6 IO i 5
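
A minimal sketch of how the table above could be entered, assuming one takes advantage of its regular layout (speakers cycle S1 to S6, relations come in blocks of eight, nouns in pairs) instead of typing every cell. Reading the table from a file would work just as well.

```r
# Build the data frame from the listing above; rep() exploits its regular layout
x <- data.frame(
  CASE     = 1:24,
  SPEAKER  = rep(paste0("S", 1:6), times = 4),
  RELATION = rep(c("S", "DO", "IO"), each = 8),
  NOUN     = rep(c("x", "y", "z", "a", "b", "c",
                   "d", "e", "f", "g", "h", "i"), each = 2),
  ACC      = c(4, 3, 0, 5, 5, 7, 5, 8,   # S
               3, 9, 0, 3, 8, 3, 1, 6,   # DO
               9, 8, 4, 0, 8, 1, 4, 5)   # IO
)
str(x)
```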

1.3 Assignment 02

Generate a data frame xx by re-sorting x according to

  • the speaker (ascending), then
  • the relation (descending), then
  • the acceptability rating (descending).
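
The sorting could be sketched as follows. Two assumptions: "descending" for the relation is taken to mean reverse alphabetical order of the labels (S, IO, DO), and the per-key decreasing argument of order() is used, which requires method = "radix". The data frame is rebuilt here only so that the snippet runs on its own.

```r
# Rebuild x (same values as in the listing above)
x <- data.frame(
  CASE = 1:24,
  SPEAKER = rep(paste0("S", 1:6), times = 4),
  RELATION = rep(c("S", "DO", "IO"), each = 8),
  NOUN = rep(c("x", "y", "z", "a", "b", "c",
               "d", "e", "f", "g", "h", "i"), each = 2),
  ACC = c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
          8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
)

# One decreasing flag per sorting key: speaker up, relation down, rating down
ord <- order(x$SPEAKER, x$RELATION, x$ACC,
             decreasing = c(FALSE, TRUE, TRUE), method = "radix")
xx <- x[ord, ]
head(xx)
```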

1.4 Assignment 03

Generate two vectors ACC.HI and ACC.LO which contain the acceptability ratings that are larger than and smaller than the mean of all acceptability ratings, respectively; then compute the means of these two vectors.
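
A short sketch using logical subsetting. Only the ratings are needed, so the vector ACC is entered directly here (copied from the table of x) to keep the snippet self-contained; with the data frame available, x$ACC would be used instead.

```r
# Acceptability ratings in case order (the ACC column of x)
ACC <- c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
         8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)

# Split the ratings at the overall mean
ACC.HI <- ACC[ACC > mean(ACC)]
ACC.LO <- ACC[ACC < mean(ACC)]

# Means of the two subsets
mean(ACC.HI)
mean(ACC.LO)
```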

1.5 Assignment 04

Show how often each speaker provided ratings larger than or equal to vs. smaller than the overall average rating.
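
One way to get these counts is a two-way table of speaker by a logical comparison. This is a sketch; the name tab and the dimension labels SPEAKER and GE.MEAN are illustrative, and only the two relevant columns of x are rebuilt.

```r
# Speakers and ratings in case order (columns SPEAKER and ACC of x)
SPEAKER <- rep(paste0("S", 1:6), times = 4)
ACC <- c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
         8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)

# TRUE = rating at or above the overall mean, FALSE = below it
tab <- table(SPEAKER = SPEAKER, GE.MEAN = ACC >= mean(ACC))
tab
```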

1.6 Assignment 05

Compute the mean acceptability ratings for the subject, direct object, and indirect object positions and represent the ratings in a boxplot.
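
A possible sketch with tapply() and the formula interface of boxplot(). Since a boxplot marks medians rather than means, the group means are added as crosses here; that addition is a choice made for this illustration, not something the task prescribes.

```r
# Relations and ratings in case order (columns RELATION and ACC of x)
x <- data.frame(
  RELATION = rep(c("S", "DO", "IO"), each = 8),
  ACC = c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
          8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
)

# Mean rating per grammatical relation (groups come out alphabetically: DO, IO, S)
means <- tapply(x$ACC, x$RELATION, mean)
means

# Boxplot of the ratings per relation, with the means overlaid as crosses
boxplot(ACC ~ RELATION, data = x, ylab = "Acceptability rating")
points(seq_along(means), means, pch = 4)
```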

1.7 Assignment 06

Add a column ACC.GTAV to x which, for each acceptability rating in ACC, says whether this rating is larger than or equal to vs. smaller than the average; then cross-tabulate x$ACC.GTAV with x$RELATION such that

  • the relations are in the columns;
  • you provide percentages that add up to 1 in each column.
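
A sketch of one way to do this with ifelse(), table(), and prop.table(). The level labels ">=mean" and "<mean" are arbitrary choices for this illustration, and only the columns needed here are rebuilt.

```r
# Relations and ratings in case order (columns RELATION and ACC of x)
x <- data.frame(
  RELATION = rep(c("S", "DO", "IO"), each = 8),
  ACC = c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
          8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
)

# New column: is each rating at/above or below the overall mean?
x$ACC.GTAV <- ifelse(x$ACC >= mean(x$ACC), ">=mean", "<mean")

# Relations in the columns; margin = 2 makes each column sum to 1
tab <- prop.table(table(ACC.GTAV = x$ACC.GTAV, RELATION = x$RELATION),
                  margin = 2)
tab
```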

1.8 Assignment 07

In a study on the frequencies of 5 prepositions you obtained the following frequencies: 35, 73, 23, 89, 30. What’s the normalized entropy for this frequency distribution?
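
As a sketch, normalized entropy is understood here as the Shannon entropy of the relative frequencies divided by its maximum for the given number of categories, i.e. H.norm = -sum(p * log2(p)) / log2(k). The names freqs, p, H, and H.norm are illustrative.

```r
# Observed frequencies of the 5 prepositions (from the task)
freqs <- c(35, 73, 23, 89, 30)

# Relative frequencies
p <- freqs / sum(freqs)

# Shannon entropy in bits (no zero frequencies here, so log2(p) is finite)
H <- -sum(p * log2(p))

# Normalize by the maximum entropy for this many categories
H.norm <- H / log2(length(freqs))
H.norm
```

A value of 1 would correspond to a perfectly even distribution, and values closer to 0 to a distribution dominated by one category.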

2 Statistical tests 01

2.1 Assignment 01

Are the lengths of disfluencies (in ms) normally distributed?

2.2 Assignment 02

Is it correct to say that the three kinds of disfluencies are equally frequent in general?

2.3 Assignment 03

In a new data set (but also one on disfluencies), do uh and uhm differ with regard to how often they appear before content/lexical words and function words? (If so, this might be explainable with the different kinds of planning/processing effort that the speaker is facing.)

3 Statistical tests 02

3.1 Assignment 01

Are the lengths of initial sentences of apologies produced by men generally distributed the same way as the lengths of initial sentences of apologies produced by women? The data from a small pilot study are stored in <inputfiles/201_11_apologies.csv>.

3.2 Assignment 02

Are disfluencies more likely in dialogs or in monologs? The data from a small pilot study are stored in <inputfiles/201_04-05_uh(m).csv>.

3.3 Assignment 03

An interesting phenomenon in English is adjective suffixation with -ic and -ical, especially when both forms are attested. For example, it is difficult to detect any pattern governing the distribution of suffixes: when does an adjective end in -ic only (e.g., acrobatic) and when does it end in -ical only (zoological)? Also, with regard to the adjectives’ general frequency, Marchand (1969) suggested that words in wider common use tend to end in -ical. Your task is to test Marchand’s claim on the data in <inputfiles/201_11_icical.csv> and check whether the average frequency of all adjectives ending in -ical is indeed higher than the average frequency of all adjectives ending in -ic. Note: since I am not providing the exact adjectives, you are allowed to test them not in a pairwise fashion (i.e., politic vs. political, economic vs. economical) but just ‘across the board’, which would of course be less than ideal for a real study.

3.4 Assignment 04

Is the frequency with which the verb to be is contracted before a progressive verb form (e.g., I’m saying) correlated with the frequency of that same following lexical verb? You expect that the more frequent the verb in the progressive form, the more likely the form of to be will be contracted. The data are in <inputfiles/201_11_contractions.csv>.