#+ fig.width=12, fig.height=8
# Assignment 2
rm(list=ls(all=TRUE)) # clear memory
# (1) A study on relative clauses investigated the frequencies of three types of relative clauses. A small corpus search resulted in 37, 26, and 6 instances of relative clause types RC1, RC2, and RC3 respectively.
# (a) Enter the data into R as a vector rcs. Draw a barplot of the data such that (i) the labels used for the three relative clause types are "RelCl 1", "RelCl 2", and "RelCl 3", (ii) the colors for the three relative clause types are blue, red, and green, and (iii) the middle of each bar contains the observed frequency.
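# A minimal sketch for (1a), not the only possible solution. It relies on
# barplot() returning the x-coordinates of the bar midpoints; the object name
# 'mids', the y-axis label, and the white text color are illustrative choices.
rcs <- c(37, 26, 6)
names(rcs) <- c("RelCl 1", "RelCl 2", "RelCl 3")
mids <- barplot(rcs, col = c("blue", "red", "green"),
                ylab = "Observed frequency")
# place each frequency at half the height of its bar
text(mids, rcs / 2, labels = rcs, col = "white")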
# (b) Determine the confidence intervals of the percentage of each relative clause type. Discuss briefly how the proportions found in the small corpus sample relate to what might be expected on the basis of chance (i.e., from the population) and what the confidence intervals suggest concerning the different frequencies of the three relative clause types.
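# A possible sketch for (1b), assuming exact (Clopper-Pearson) 95% intervals
# from binom.test() are acceptable and that "chance" means equal proportions
# of 1/3 per type. Both are assumptions; the assignment does not fix either.
rcs <- c("RelCl 1" = 37, "RelCl 2" = 26, "RelCl 3" = 6)
n <- sum(rcs)
ci <- sapply(rcs, function(k) binom.test(k, n)$conf.int)
rownames(ci) <- c("lower", "upper")
round(rbind(observed = rcs / n, ci), 3)
# does each interval contain the assumed chance proportion of 1/3?
contains.chance <- ci["lower", ] <= 1/3 & ci["upper", ] >= 1/3
contains.chance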
# (2) You have collected data from several different speakers on the acceptability of particular nouns in the subject, direct object, and indirect object positions. Create this data frame in R by combining several vectors and call it number2.
# CASE SPEAKER RELATION NOUN ACC
# 1 1 S x 4
# 2 2 S x 3
# 3 3 S y 0
# 4 4 S y 5
# 5 5 S z 5
# 6 6 S z 7
# 7 1 S a 5
# 8 2 S a 8
# 9 3 DO b 3
# 10 4 DO b 9
# 11 5 DO c 0
# 12 6 DO c 3
# 13 1 DO d 8
# 14 2 DO d 3
# 15 3 DO e 1
# 16 4 DO e 6
# 17 5 IO f 9
# 18 6 IO f 8
# 19 1 IO g 4
# 20 2 IO g 0
# 21 3 IO h 8
# 22 4 IO h 1
# 23 5 IO i 4
# 24 6 IO i 5
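# One way to build the data frame for (2), sketched with rep() for the
# regular columns. The column values are taken from the table above; only
# the use of rep() to generate the repeating patterns is an illustrative choice.
CASE <- 1:24
SPEAKER <- rep(1:6, 4)
RELATION <- rep(c("S", "DO", "IO"), each = 8)
NOUN <- rep(c("x", "y", "z", "a", "b", "c", "d", "e", "f", "g", "h", "i"),
            each = 2)
ACC <- c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
         8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
number2 <- data.frame(CASE, SPEAKER, RELATION, NOUN, ACC)
str(number2)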
# (a) Generate a data frame number2.b by re-sorting number2 according to (i) the speaker (ascending), (ii) the relation (descending), and (iii) the acceptability rating (descending).
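# A sketch for (2a), assuming "descending" for the relation means reverse
# alphabetical order (S, IO, DO). The data frame is rebuilt here so that the
# lines run on their own; xtfrm() turns the relation into sortable numbers so
# that it can be negated. The helper name 'ord' is hypothetical.
number2 <- data.frame(
  CASE = 1:24,
  SPEAKER = rep(1:6, 4),
  RELATION = rep(c("S", "DO", "IO"), each = 8),
  NOUN = rep(c("x", "y", "z", "a", "b", "c", "d", "e", "f", "g", "h", "i"),
             each = 2),
  ACC = c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
          8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
)
ord <- order(number2$SPEAKER, -xtfrm(number2$RELATION), -number2$ACC)
number2.b <- number2[ord, ]
head(number2.b)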
# (b) Generate two vectors ACC.HI and ACC.LO that contain the acceptability ratings larger than and smaller than the mean of all acceptability ratings, respectively. Compute the means of these two vectors and compare the vectors to each other with boxplots.
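# A minimal sketch for (2b) using logical indexing. ACC is re-entered from
# the table above so that the lines run on their own; the axis label is an
# illustrative choice.
ACC <- c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
         8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
ACC.HI <- ACC[ACC > mean(ACC)]
ACC.LO <- ACC[ACC < mean(ACC)]
mean(ACC.HI)
mean(ACC.LO)
boxplot(ACC.HI, ACC.LO, names = c("ACC.HI", "ACC.LO"),
        ylab = "Acceptability rating")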
# (c) Use frequency lists to show how often which speakers provided ratings larger and smaller than the overall average rating.
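# One possible reading of (2c): tabulate the speakers of the above-average
# and of the below-average ratings separately with table(). The object names
# hi.by.speaker and lo.by.speaker are hypothetical; SPEAKER and ACC are
# re-entered from the table above so that the lines run on their own.
SPEAKER <- rep(1:6, 4)
ACC <- c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
         8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
hi.by.speaker <- table(SPEAKER[ACC > mean(ACC)])
lo.by.speaker <- table(SPEAKER[ACC < mean(ACC)])
hi.by.speaker
lo.by.speaker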
# (d) What are the mean acceptability ratings for subjects, direct objects, and indirect objects? Generate a boxplot to compare the acceptability ratings for these three positions.
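# A sketch for (2d) with tapply() and the formula interface of boxplot().
# RELATION is made a factor with the level order S, DO, IO purely so that
# the output follows the order of the question; that order is a choice made
# here, not something the assignment requires.
RELATION <- factor(rep(c("S", "DO", "IO"), each = 8),
                   levels = c("S", "DO", "IO"))
ACC <- c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
         8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
means.by.relation <- tapply(ACC, RELATION, mean)
means.by.relation
boxplot(ACC ~ RELATION, xlab = "Relation", ylab = "Acceptability rating")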
# (e) Generate a vector x which, for each acceptability rating, says whether this rating is larger or smaller than the average. Crosstabulate this vector with the vector RELATION such that the relations are in the columns and that you provide percentages that add up to 1 in each column.
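# A sketch for (2e): ifelse() builds the label vector, and prop.table() with
# margin = 2 converts the counts to column proportions. The labels "larger"
# and "smaller" and the name 'crosstab' are illustrative; no rating equals
# the mean exactly, so two labels suffice for these data.
RELATION <- rep(c("S", "DO", "IO"), each = 8)
ACC <- c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
         8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
x <- ifelse(ACC > mean(ACC), "larger", "smaller")
crosstab <- prop.table(table(x, RELATION), margin = 2)
crosstab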
# (3) Which of the speakers exhibits the largest variability with regard to ACC?
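# A sketch for (3), assuming variability is measured by the standard
# deviation per speaker. Other measures (interquartile range, variation
# coefficient) are equally defensible and may rank the speakers differently.
# The name sd.by.speaker is hypothetical.
SPEAKER <- rep(1:6, 4)
ACC <- c(4, 3, 0, 5, 5, 7, 5, 8, 3, 9, 0, 3,
         8, 3, 1, 6, 9, 8, 4, 0, 8, 1, 4, 5)
sd.by.speaker <- tapply(ACC, SPEAKER, sd)
sd.by.speaker
# name of the speaker with the largest standard deviation
names(which.max(sd.by.speaker))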
# (4) In a study on the frequencies of 5 prepositions you obtained the following frequencies: 35, 73, 23, 89, 30. Compute the relative/normalized entropy for this frequency distribution.
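# A sketch for (4), assuming the usual definition: Shannon entropy
# H = -sum(p * log2(p)) divided by its maximum log2(k) for k categories.
# The names preps, p, H, and H.rel are illustrative. None of the frequencies
# is 0 here; zero frequencies would have to be dropped before taking logs.
preps <- c(35, 73, 23, 89, 30)
p <- preps / sum(preps)
H <- -sum(p * log2(p))
H.rel <- H / log2(length(preps))
H.rel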