#+ fig.width=12, fig.height=8
#### Chapter 3: Descriptive bivariate statistics
# (1) Load the file <104_03-04_uh(m).csv> into a data frame called UHM.
# (2) Determine whether men or women produce more of the disfluencies that are longer than average.
# (3) What does the following line do? (If you cannot see that immediately from the plot, execute the function call without the plotting, i.e., without barplot().)
barplot(prop.table(table(UHM$FILLER, UHM$GENRE), 2))
# (4) Compute the average positions in sentences for "uh" and "uhm" and their 95% confidence intervals, and discuss briefly what the confidence intervals suggest concerning the different average positions of these two disfluency markers.
# (5) Generate separate summary statistics for the lengths in the two genres.
# (6) Ten bilingual students (English/German) took one dictation in English and one in German. They made the following numbers of mistakes in English and German respectively:
# English: 29, 20, 10, 16, 12, 15, 25, 22, 20, 23
# German:  21, 19, 28, 28, 26, 18, 16, 22, 20, 28
# (a) Compute a measure of correlation to quantify the association between the numbers of errors.
# (b) Illustrate the correlation in a graph and interpret the results (in one sentence).
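# A possible sketch for (6), not the only valid approach. The vector name english.dict is an assumption chosen by analogy with german.dict, the name used in (7). Pearson's r is shown first; with only ten data points and several tied values, Kendall's tau may be the more cautious choice.
english.dict <- c(29, 20, 10, 16, 12, 15, 25, 22, 20, 23) # errors in the English dictation
german.dict <- c(21, 19, 28, 28, 26, 18, 16, 22, 20, 28) # errors in the German dictation
r.pearson <- cor(english.dict, german.dict) # Pearson product-moment correlation
tau.kendall <- cor(english.dict, german.dict, method="kendall") # rank-based alternative
# scatterplot of the paired error counts with a least-squares line as a visual aid
plot(english.dict, german.dict, xlab="Mistakes in the English dictation", ylab="Mistakes in the German dictation")
abline(lm(german.dict ~ english.dict), lty=2)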
# (7) Compute the number of mistakes expected from a student in the German dictation, if that student made 12 mistakes in the English dictation (i.e., use german.dict as the dependent variable).
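# A minimal sketch for (7), assuming a simple linear regression is the intended method. german.dict is the name given in the exercise; english.dict, model, and predicted are assumed names.
english.dict <- c(29, 20, 10, 16, 12, 15, 25, 22, 20, 23) # errors in the English dictation
german.dict <- c(21, 19, 28, 28, 26, 18, 16, 22, 20, 28) # errors in the German dictation
model <- lm(german.dict ~ english.dict) # German errors as the dependent variable
# predicted number of German errors for a student with 12 English errors
predicted <- predict(model, newdata=data.frame(english.dict=12))
predicted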
# (8) Now you also obtained the sexes of the students: students 2 to 6 were girls, the rest boys.
# (a) Enter this information into R.
# (b) Compute the average numbers of errors in the German dictation for boys and girls.
# (c) Represent the numbers of mistakes in the German dictation as a function of the sex of the students graphically.
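# One way (8) could be approached, as a sketch: the factor name sex and the labels "boy" and "girl" are assumptions, and a boxplot is only one of several suitable graphs.
german.dict <- c(21, 19, 28, 28, 26, 18, 16, 22, 20, 28) # errors in the German dictation
# student 1 is a boy, students 2 to 6 are girls, students 7 to 10 are boys
sex <- factor(c("boy", rep("girl", 5), rep("boy", 4)))
german.means <- tapply(german.dict, sex, mean) # one mean per level of sex
german.means
boxplot(german.dict ~ sex, xlab="Sex", ylab="Mistakes in the German dictation")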
# (9) The file <104_03-04_vpcs.csv> contains data from a corpus study on the alternation of particle placement that was introduced in Section 1.3.1.
# - column 1: the number of the data point
# - column 2: whether the example is from spoken or written language
# - column 3: which construction is used
# - column 4: how complex the direct object is (3 levels)
# - column 5: how long the direct object is (in syllables)
# - column 6: whether the verb-particle construction is followed by a directional PP (2 levels)
# - column 7: whether the referent of the direct object is animate or inanimate (2 levels)
# - column 8: whether the referent of the direct object is concrete or abstract (2 levels)
# (a) Read in this file, make the column names available, and test whether the input was successful.
# (b) Represent the correlation between the choice of construction and the complexity of the direct object graphically.
# (c) Create a table representing the correlation between the choice of construction and the complexity of the direct object and briefly summarize the result.
# (d) Represent the correlation between the choice of construction and the length of the direct object graphically and briefly summarize the result.
# (e) Compute whether the choice of construction is influenced by the length of the direct object.
# (f) Investigate whether the choice of construction depends on the animacy of the referent of the direct objects and the presence/absence of a directional prepositional phrase.
# (10) 50 students took a statistics exam, and 80% of them passed. What is the 95% confidence interval for this result?
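# A hedged sketch for (10), with assumed variable names: the interval is first computed by hand with the normal approximation, then with binom.test, which returns an exact (Clopper-Pearson) interval. The two methods give slightly different bounds.
n <- 50 # number of students
p <- 0.8 # proportion that passed
se <- sqrt(p * (1 - p) / n) # standard error of a proportion
ci.normal <- p + c(-1, 1) * qnorm(0.975) * se # normal-approximation interval
ci.normal
ci.exact <- binom.test(x=n * p, n=n, conf.level=0.95)$conf.int # exact interval for 40 successes out of 50
ci.exact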