#+ fig.width=12, fig.height=8
#### Chapter 4: Analytical statistics
# The file <104_03-04_vpcs.csv> contains data from a corpus study on the alternation of particle placement that was introduced in Section 1.3.1.
# - column 1: the number of the data point
# - column 2: whether the example is from spoken or written language
# - column 3: which of the two constructions is used (e.g., pick up the book vs. pick the book up)
# - column 4: how complex the direct object is (3 levels)
# - column 5: how long the direct object is (in syllables)
# - column 6: whether the verb-particle construction is followed by a directional PP (2 levels)
# - column 7: whether the referent of the direct object is animate or inanimate (2 levels)
# - column 8: whether the referent of the direct object is concrete or abstract (2 levels)
# Read in this file, make the column names available, and check whether the data were read in correctly.
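# A minimal sketch of this step. It rests on assumptions the task does not state:
# the file sits in the working directory, has a header row, and is comma-separated
# (use read.delim() instead if it turns out to be tab-delimited). The data frame
# name vpcs and the helper check_vpcs() are hypothetical.
check_vpcs <- function(d) {
  # stops with an error if the structure differs from the column list above
  stopifnot(is.data.frame(d),
            ncol(d) == 8,
            is.numeric(d[[5]]),                     # DO length in syllables
            nlevels(factor(d[[4]])) == 3,           # DO complexity (3 levels)
            all(sapply(d[c(2, 3, 6, 7, 8)],         # the binary variables
                       function(v) nlevels(factor(v)) == 2)))
  invisible(TRUE)
}
if (file.exists("104_03-04_vpcs.csv")) {
  vpcs <- read.csv("104_03-04_vpcs.csv", stringsAsFactors = TRUE)
  str(vpcs)             # column names and types
  print(summary(vpcs))  # value ranges and factor levels
  check_vpcs(vpcs)
  attach(vpcs)          # makes the column names available as variables
}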
# (1) Last time, we saw (using the Kolmogorov-Smirnov test) that the distributions of the lengths of the direct objects differ across the two verb-particle constructions. You now want to pinpoint what exactly is responsible for this result; specifically, you want to test whether the lengths of the direct objects in the two constructions differ in their dispersions.
# (a) Formulate the text and statistical hypotheses for this study.
# (b) Explore/summarize the data and represent them graphically.
# (c) Compute the required statistical test and briefly summarize the result.
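# One possible way to approach (1), sketched in base R. compare_dispersion() is a
# hypothetical helper. len stands for the lengths (column 5) and cx for the
# construction (column 3).
#   H0: the variances of the DO lengths are the same in both constructions
#   H1: the variances of the DO lengths differ between the constructions
# var.test() is the F-test for two variances. It assumes normally distributed data,
# which syllable counts often are not, so fligner.test() is reported alongside it.
# fligner.test() is a rank-based test of homogeneity of variances.
compare_dispersion <- function(len, cx) {
  cx <- droplevels(factor(cx))
  groups <- split(len, cx)
  # descriptive dispersion measures per construction
  print(sapply(groups, function(x) c(sd = sd(x), IQR = IQR(x), n = length(x))))
  boxplot(groups, xlab = "Construction", ylab = "Length of DO (syllables)")
  list(f_test = var.test(groups[[1]], groups[[2]]),  # ratio of the two variances
       fligner = fligner.test(len, cx))
}
# Usage, with the data in a data frame called vpcs (hypothetical name):
# compare_dispersion(vpcs[[5]], vpcs[[3]])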
# (2) You want to test whether the average length of all direct objects in your data corresponds to the average length of direct objects in general as reported by Boring (2007): 5.8 syllables (mean) and 6 syllables (median).
# (a) Formulate the text and statistical hypotheses for this study.
# (b) Explore/summarize the data and represent them graphically.
# (c) Compute the required statistical test and briefly summarize the result.
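# A sketch for (2). compare_to_reference() is a hypothetical helper, and len again
# stands for the lengths in column 5.
#   H0: the DO lengths have a mean of 5.8 (or a median of 6) syllables
#   H1: they do not
# t.test() with mu = 5.8 compares the sample mean to Boring's mean. Lengths in
# syllables tend to be right-skewed, so wilcox.test() with mu = 6 serves as a
# rank-based alternative. Note that it assumes a distribution symmetric around its
# median. exact = FALSE uses the normal approximation, which copes with ties.
compare_to_reference <- function(len, ref_mean = 5.8, ref_median = 6) {
  print(summary(len))
  hist(len, main = "", xlab = "Length of DO (syllables)")
  abline(v = c(ref_mean, ref_median), lty = c(1, 2))  # Boring's (2007) values
  list(t = t.test(len, mu = ref_mean),
       wilcox = wilcox.test(len, mu = ref_median, exact = FALSE))
}
# Usage, with the data in a data frame called vpcs (hypothetical name):
# compare_to_reference(vpcs[[5]])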
# (3) Ihalainen (1991) investigates the frequencies of different explicitly marked habitual past tense forms in East and West Somerset dialects and reports the following distribution:
# Dialect   Did  Used to  Would  Totals
# East       20       36     16      72
# West        2       48     33      83
# Totals     22       84     49     155
# Ihalainen reports a chi-squared test only for the overall frequencies of the East and West Somerset dialects (72 vs. 83); he does not report a chi-squared test for the whole table. Improve on his analysis:
# (a) Formulate the text and statistical hypotheses for this study.
# (b) Explore/summarize the data and represent them graphically.
# (c) Compute the required statistical test(s) and measures and briefly summarize the result.
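# A sketch for (3) in base R. The counts are those of Ihalainen's table, and the
# object names (habit, gof, indep, cramers_v) are arbitrary.
#   H0: the choice among did, used to, and would is independent of dialect
#   H1: the choice of form depends on dialect
habit <- matrix(c(20, 36, 16,
                   2, 48, 33),
                nrow = 2, byrow = TRUE,
                dimnames = list(Dialect = c("East", "West"),
                                Form = c("Did", "Used to", "Would")))
addmargins(habit)              # the table with its totals (labelled Sum)
mosaicplot(t(habit), shade = TRUE, main = "")
# Ihalainen's test: do the dialect totals (72 vs. 83) differ from an even split?
gof <- chisq.test(rowSums(habit))
gof                            # X-squared ~ 0.78, df = 1, p ~ 0.38
# The test of the whole 2x3 table. R applies a continuity correction only to
# 2x2 tables, so none is used here.
indep <- chisq.test(habit)
indep                          # X-squared ~ 21.67, df = 2, p < 0.001
indep$expected                 # all expected frequencies are > 5
indep$residuals                # Pearson residuals: which cells drive the result
# Effect size: Cramer's V = sqrt(X^2 / (N * (min(rows, columns) - 1)))
cramers_v <- sqrt(unname(indep$statistic) / (sum(habit) * (min(dim(habit)) - 1)))
cramers_v                      # ~ 0.37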