Ling 104, session 07: means & corrs (key)

Author

Affiliation

UC Santa Barbara & JLU Giessen

Published

14 May 2026 12-34-56

rm(list=ls(all=TRUE)); library(magrittr)

1 Exercise 01

Question: We have known for a long time that the frequency of a word is one of several powerful predictors of how quickly people can recognize the word as a word on a screen: more frequent words are recognized faster. We now want to determine whether that’s also true for recently collected data on word recognition/reaction times collected for native speakers of English and Spanish when they were shown both English and Spanish words. Load the file _input/rts.csv into a data frame d. This file contains data from an experimental study on reaction times to words; you can find information about this data set in _input/rts.r. We want to see whether there is a correlation between ZIPFFREQ and RT; for that correlation, check the assumptions of the correlation you should want to compute and collect those results, but – for didactic reasons only – don’t act on the assumptions/results even if they turn out to be not met and pretend they’re fine. Once you’re done, consider briefly the following: ideally, we would want to see whether that frequency effect is observed even if we somehow control for length. (This is an important exercise and data set in the sense that, if you take more stats classes with me, this data set will feature in all of them to develop successively more comprehensive analyses.)

1.1 Hypotheses

The

dependent/response variable is RT;
independent/predictor variable is ZIPFFREQ.

What are the hypotheses?

text hypotheses:
- H₁: there is a negative correlation between the (Zipf) frequency of a word and its recognition/reaction time;
- H₀: there is no negative correlation between the (Zipf) frequency of a word and its recognition/reaction time;
statistical hypotheses:
- H₁: r<0;
- H₀: r≥0.

1.2 Descriptive stats/visualization

We load the data:

rm(list=ls(all=TRUE))
summary(d <- read.delim(
   "_input/rts.csv",
   stringsAsFactors=TRUE))

      CASE            RT             LENGTH         LANGUAGE         GROUP
 Min.   :   1   Min.   : 271.0   Min.   :3.000   english:4023   english :3961
 1st Qu.:2150   1st Qu.: 505.0   1st Qu.:4.000   spanish:3977   heritage:4039
 Median :4310   Median : 595.0   Median :5.000
 Mean   :4303   Mean   : 661.5   Mean   :5.198
 3rd Qu.:6450   3rd Qu.: 732.0   3rd Qu.:6.000
 Max.   :8610   Max.   :4130.0   Max.   :9.000
                NAs    :248
      CORRECT        FREQPMW           ZIPFFREQ     SITE
 correct  :7749   Min.   :   1.00   Min.   :3.000   a:2403
 incorrect: 251   1st Qu.:  17.00   1st Qu.:4.230   b:2815
                  Median :  42.00   Median :4.623   c:2782
                  Mean   :  81.14   Mean   :4.591
                  3rd Qu.: 101.00   3rd Qu.:5.004
                  Max.   :1152.00   Max.   :6.061

We do descriptive stuff: We visualize the dependent variable (the response) and the independent variable (the predictor):

par(mfrow=c(2,2))
hist(d$RT)
hist(d$RT, breaks="FD")
hist(d$ZIPFFREQ)
hist(d$ZIPFFREQ, breaks="FD")

par(mfrow=c(1,1))

Are the two variables normally distributed? No, they are not, but I told you to ignore that for now.

nortest::lillie.test(d$RT)


    Lilliefors (Kolmogorov-Smirnov) normality test

data:  d$RT
D = 0.15117, p-value < 2.2e-16

nortest::lillie.test(d$ZIPFFREQ)


    Lilliefors (Kolmogorov-Smirnov) normality test

data:  d$ZIPFFREQ
D = 0.038074, p-value < 2.2e-16

What are the descriptive results for the correlation between the two variables?

cor(d$ZIPFFREQ, d$RT, use="pairwise.complete.obs") # because of the NA!

[1] -0.1300222

plot(d$ZIPFFREQ, d$RT, pch=16, col="#00000020"); grid()
abline(lm(d$RT ~ d$ZIPFFREQ), col="red")

1.3 Statistical testing

We test the correlation with a Pearson product-moment correlation (with a one-tailed p-value, remember the hypotheses) and a linear regression model:

# the NA arguments are not needed here, but are good habits to adopt (esp. for lm)
cor.test(d$ZIPFFREQ, d$RT, alternative="less", use="pairwise.complete.obs")
summary(lm(RT ~ ZIPFFREQ, data=d, na.action=na.exclude))


    Pearson's product-moment correlation

data:  d$ZIPFFREQ and d$RT
t = -11.544, df = 7750, p-value < 2.2e-16
alternative hypothesis: true correlation is less than 0
95 percent confidence interval:
 -1.00000 -0.11161
sample estimates:
       cor
-0.1300222


Call:
lm(formula = RT ~ ZIPFFREQ, data = d, na.action = na.exclude)

Residuals:
   Min     1Q Median     3Q    Max
-411.4 -153.1  -64.6   71.5 3486.9

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   949.36      25.11   37.80   <2e-16 ***
ZIPFFREQ      -62.46       5.41  -11.54   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 260.9 on 7750 degrees of freedom
  (248 observations deleted due to missingness)
Multiple R-squared:  0.01691,   Adjusted R-squared:  0.01678
F-statistic: 133.3 on 1 and 7750 DF,  p-value: < 2.2e-16

What is the exact p-value?

pt(q=-11.54, df=7750, lower.tail=TRUE)

[1] 7.36185e-31

1.4 Write-up

According to a one-tailed Pearson product-moment correlation, there is a highly significant but apparently very weak negative correlation between RT and ZIPFFREQ (r=-0.13, mult. R²=0.017, t=-11.54, df=7750, p_one-tailed<10^-30). For every one-unit increase of ZIPFFREQ, words are recognized 62.5 ms faster. [show plot]

1.5 Excursus

Now, how we might we – in an amateurish way for now, we’ll do this better later – ‘control’ for the effect of length? We could compute the above correlation between ZIPFFREQ and RT for each length by looping over the data and computing as well as collecting the correlations and the regression line slopes for each length:

correlations <- slopes <- CIlowers <- CIuppers <- rep(NA, length(unique(d$LENGTH)))
for (i in seq(correlations)) {
   d_curr <- d[d$LENGTH==i+2,]
   m_curr <- lm(RT ~ ZIPFFREQ, data=d_curr, na.action=na.exclude)
   correlations[i] <- summary(m_curr)$r.squared %>% sqrt %>% "*"(sign(coef(m_curr)[2]))
   slopes[i] <- coef(m_curr)[2]
   CIlowers[i] <- confint(m_curr)[2,1]
   CIuppers[i] <- confint(m_curr)[2,2]
}

Let’s see what the results are:

data.frame(
   LENGTH=3:9,
   NS=as.numeric(table(d$LENGTH)),
   CORRS=correlations,
   SLOPES=slopes,
   SLOPESlower=CIlowers,
   SLOPESupper=CIuppers)

  LENGTH   NS       CORRS     SLOPES SLOPESlower SLOPESupper
1      3  198 -0.15758496  -69.60829  -133.22773   -5.988848
2      4 1910 -0.10650861  -41.74682   -59.63420  -23.859430
3      5 3066 -0.11900365  -58.07387   -75.43092  -40.716818
4      6 2025 -0.07532891  -40.92908   -64.80203  -17.056142
5      7  583 -0.09729075  -44.04661   -81.89019   -6.203032
6      8  176 -0.32784937 -244.93730  -353.09566 -136.778938
7      9   42 -0.24854012 -735.44933 -1663.79051  192.891851

It seems as if there is a correlation between ZIPFFREQ and RT even if LENGTH is controlled for (by holding it constant): every correlation and every slope between ZIPFFREQ and RT is negative as well and, with the exception of LENGTH being 9, all slopes of ZIPFFREQ have confidence intervals that do not include 0. While this is not how this should be done, it at least recognizes the general need for a multifactorial approach; later courses will tell you how to do this right.

2 Exercise 02

Question: In the same reaction time data, we want to see whether the two speaker groups – the variable GROUP, which distinguished between native speakers of English who were learners of Spanish (english) and heritage speakers of Spanish (heritage) – differed in their average reaction times to words. As above, check the assumptions of the test you should want to compute and collect those results, but – for didactic reasons only – don’t act on the assumptions/results even if they turn out to be not met and pretend they’re fine (This is an important exercise and data set in the sense that, if you take more stats classes with me, this data set will feature in all of them to develop successively more comprehensive analyses.)

2.1 Hypotheses

The

dependent/response variable is RT;
independent/predictor variable is GROUP.

What are the hypotheses?

text hypotheses:
- H₁: there is a difference between the average reaction/recognition times for English learners of Spanish and Heritage speakers of Spanish;
- H₀: there is no difference between the average reaction/recognition times for English learners of Spanish and Heritage speakers of Spanish;
statistical hypotheses:
- H₁: t≠0;
- H₀: t=0.

2.2 Descriptive stats/visualization

We already loaded the data so/but we again now We visualize the dependent variable (the response) and the independent variable (the predictor):

par(mfrow=c(2,2))
hist(d$RT)
hist(d$RT, breaks="FD")
hist(d$RT[d$GROUP=="english"] , main="Engl", breaks="FD")
hist(d$RT[d$GROUP=="heritage"], main="Heri", breaks="FD")

par(mfrow=c(1,1))

Are the two sets of values of the response normally distributed? No, they are not, but I told you to ignore that for now.

tapply(     # apply to 
   d$RT,    # these values
   d$GROUP, # a grouping by these values
   # and run lillie.test on each group but as an anonymous function
   # so we can have R return only the p-value & nothing else
   FUN=nortest::lillie.test)

$english

    Lilliefors (Kolmogorov-Smirnov) normality test

data:  X[[i]]
D = 0.15741, p-value < 2.2e-16


$heritage

    Lilliefors (Kolmogorov-Smirnov) normality test

data:  X[[i]]
D = 0.14387, p-value < 2.2e-16

What are the descriptive results for the correlation between the two variables?

tapply(     # apply to
   d$RT,    # these values
   d$GROUP, # a grouping by these values
   \(af) c(summary(af), "sd"=sd(af, na.rm=TRUE))) # then collect summaries

$english
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.       NAs        sd
 271.0000  499.0000  588.0000  663.9734  737.0000 4130.0000  168.0000  289.1716

$heritage
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.       NAs        sd
 281.0000  514.5000  598.0000  659.0687  729.0000 2594.0000   80.0000  235.4064

boxplot(d$RT ~ d$GROUP, notch=TRUE); grid()

It certainly does seem as if the two speaker groups were about equally fast.

2.3 Statistical testing

Because we ignore the violation of normality for didactic reasons, we test the difference correlation with a t-test for independent samples and also a linear regression model, but now not with a numeric independent variable/predictor (like always before and just above) but a binary independent variable/predictor. For the same didactic reasons as before, we also compute not the usual t-test version (Welch’s t-test), but a version where we force t.test to assume the standard deviations/variances in the two groups are identical; this one is often called Student’s t-test.

t.test(d$RT ~ d$GROUP, var.equal=TRUE)


    Two Sample t-test

data:  d$RT by d$GROUP
t = 0.82051, df = 7750, p-value = 0.412
alternative hypothesis: true difference in means between group english and group heritage is not equal to 0
95 percent confidence interval:
 -6.81303 16.62237
sample estimates:
 mean in group english mean in group heritage
              663.9734               659.0687

summary(lm(RT ~ GROUP, data=d, na.action=na.exclude))


Call:
lm(formula = RT ~ GROUP, data = d, na.action = na.exclude)

Residuals:
   Min     1Q Median     3Q    Max
-393.0 -156.1  -67.0   70.9 3466.0

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    663.973      4.272 155.431   <2e-16 ***
GROUPheritage   -4.905      5.978  -0.821    0.412
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 263.1 on 7750 degrees of freedom
  (248 observations deleted due to missingness)
Multiple R-squared:  8.686e-05, Adjusted R-squared:  -4.216e-05
F-statistic: 0.6732 on 1 and 7750 DF,  p-value: 0.412

2.4 Write-up

According to a two-tailed Student’s t-test for independent samples, there is no significant difference between the average reaction/recognition times between the English learners of Spanish (mean: 663.97 ms, sd=289.17 ms) and the Heritage speakers of Spanish (mean: 659.07 ms, sd=235.41 ms): t=0.82, df=7750, p_two-tailed0.412). [show plot]

3 Exercise 03

Question: Earlier, we saw in the data from _input/partplacement.csv (see _input/partplacement.r) that the distributions of the lengths of the DOs of the verb-particle constructions differ across the two constructions (using the Kolmogorov-Smirnov test). Then, we followed that up with a test of whether the dispersions of the lengths of the DOs of the verb-particle constructions differ across the two constructions (using the Fligner-Killeen test). You now want to also test whether there is also a difference in the central tendency between the DO lengths in the two constructions. (That, too, could be explainable with processing considerations along the lines of short-before-long (since most particles are just one word or syllable long).)

3.1 Hypotheses

The

dependent/response variable is DO_LENSYLL;
independent/predictor variables is CONSTRUCTION.

What are the hypotheses?

text hypotheses:
- H₁: The mean length of DOs in the construction v_do_prt is different from the mean length of DOS in the construction v_prt_do;
- H₀: The mean length of DOs in the construction v_do_prt is not different from the mean length of DOS in the construction v_prt_do;
statistical hypotheses:
- H₁: t≠0;
- H₀: t=0.

3.2 Descriptive stats/visualization

We load the data:

rm(list=ls(all=TRUE)); library(magrittr)
summary(d <- read.delim(
   "_input/partplacement.csv",
   stringsAsFactors=TRUE))

      CASE          CONSTRUCTION     MEDIUM       DO_COMPLX     DO_LENSYLL
 Min.   :  1.00   v_do_prt:100   spoken :100   clausmod:  6   Min.   : 1.00
 1st Qu.: 50.75   v_prt_do:100   written:100   phrasmod: 67   1st Qu.: 2.00
 Median :100.50                                simple  :127   Median : 3.00
 Mean   :100.50                                               Mean   : 4.72
 3rd Qu.:150.25                                               3rd Qu.: 6.00
 Max.   :200.00                                               Max.   :31.00
      DO_ANIM        DO_CONC      PP
 animate  : 27   abstract: 95   no :167
 inanimate:173   concrete:105   yes: 33

We first describe the data; it seems the pattern could hardly be clearer …

tapply(            # apply to
   d$DO_LENSYLL,   # these values
   d$CONSTRUCTION, # a grouping by these values
   summary)        # then apply hist to each group

$v_do_prt
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    1.00    2.00    2.46    3.00    9.00

$v_prt_do
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    3.00    6.00    6.98    9.25   31.00

spineplot(d$CONSTRUCTION ~ d$DO_LENSYLL, breaks=12)

boxplot(d$DO_LENSYLL ~ d$CONSTRUCTION, varwidth=TRUE, notch=TRUE); grid()

3.3 Statistical testing

For this question, we would prefer to use a t-test for independent samples, which requires normality of the two groups to be compared, which we already know isn’t the case:

tapply(            # apply to 
   d$DO_LENSYLL,   # these values
   d$CONSTRUCTION, # a grouping by these values
   # and run lillie.test on each group but as an anonymous function
   # so we can have R return only the p-value & nothing else
   FUN=\(af) nortest::lillie.test(af)$p.value)

    v_do_prt     v_prt_do
4.519140e-15 1.897841e-07

Thus, we adjust our hypotheses to the ordinal level to which we fall back now:

text hypotheses:
- H₀: The median length of DOs in the construction v_do_prt is not different from the median length of DOS in the construction v_prt_do;
- H₁: The median length of DOs in the construction v_do_prt is different from the median length of DOS in the construction v_prt_do;
statistical hypotheses:
- H₀: W=0;
- H₁: W>0.

Then we compute the U-test:

wilcox.test(                      # compute a U-test
   d$DO_LENSYLL ~ d$CONSTRUCTION, # of the DO lengths in the two constructions
   correct=FALSE, exact=FALSE)    # no continuity correction, no exact test


    Wilcoxon rank sum test

data:  d$DO_LENSYLL by d$CONSTRUCTION
W = 1509.5, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

3.4 Write-up

[Show descriptive statistics.] To compare the average DO lengths in the two constructions, a t-test for independent samples was considered, but was not permitted because, according to a Lilliefors test, the DO lengths in the each construction are not normally distributed (both p<10^-6). Instead, a U-test was computed, which concluded that the DO lengths of the two constructions are significantly different from each other (W=1509.5, p<10^-15).

3.5 Excursus

What if our hypothesis had been one-tailed? Because of an expected short-before-long preference, it would have been reasonable and completely defensible to expect that the average length of DOs in v_do_prt is smaller than in v_prt_do. In that case, the test changes to the following:

wilcox.test(                      # compute a U-test
   d$DO_LENSYLL ~ d$CONSTRUCTION, # of the DO lengths in the two constructions
   correct=FALSE, exact=FALSE,    # no continuity correction, no exact test
   alternative="less") # spell out H1, which in the formula case applies to


    Wilcoxon rank sum test

data:  d$DO_LENSYLL by d$CONSTRUCTION
W = 1509.5, p-value < 2.2e-16
alternative hypothesis: true location shift is less than 0

                       # the first level of the predictor (see page 197f.)

The conclusion doesn’t change much: [Show descriptive statistics.] To see whether the average DO length in the construction v_do_prt is indeed shorter than in the construction v_prt_do, a one-tailed t-test for independent samples was considered, but was not permitted because, according to a Lilliefors test, the DO lengths in the each construction are not normally distributed (both p<10^-6). Instead, a one-tailed U-test was computed, which concluded that the average DO length in the construction v_do_prt is indeed shorter than in v_prt_do (W=1509.5, p_one-tailed<10^-15).

4 Exercise 04

Question: Does a particular grammar exercise help improve the performance of second language learners. 10 learners do a grammar test and you note their numbers of correct answers. Then, the learners first do the grammar exercise you want to test and then participate in a second test (with the same number of questions). Again, you note the numbers of correct responses, hoping and expecting that the exercise leads to a better performance.

4.1 Hypotheses

The

dependent/response variable is the number of correct responses;
independent/predictor variable is the point of time of testing.

What are the hypotheses?

text hypotheses:
- H₁: The mean number of correct responses after the exercise is higher than the mean number of correct responses before the exercise;
- H₀: The mean number of correct responses after the exercise is not higher than the mean number of correct responses before the exercise;
statistical hypotheses (with t being based on before minus after):
- H₁: t<0.
- H₀: t&ge0;

4.2 Descriptive stats/visualization

The data are in _input/grmtest.csv, with their explanation in _input/grmtest.r; we load them from there:

rm(list=ls(all=TRUE))
summary(d <- read.delim(
   "_input/grmtest.csv",
   stringsAsFactors=TRUE))

    STUDENT          BEFORE        AFTER
 Min.   : 1.00   Min.   :1.0   Min.   :4.00
 1st Qu.: 3.25   1st Qu.:2.5   1st Qu.:5.25
 Median : 5.50   Median :5.5   Median :6.00
 Mean   : 5.50   Mean   :4.9   Mean   :6.40
 3rd Qu.: 7.75   3rd Qu.:7.0   3rd Qu.:7.75
 Max.   :10.00   Max.   :8.0   Max.   :9.00

This is not the usual case-by-variable format! We can actually do everything we want to do with this so-called wide format but, for consistency, we change this to what we are used to and what is nearly always better or even required:

d <- data.frame(            # create a data frame x
   CASE=1:20,               # with case numbers
   SUBJ=rep(1:10, 2),       # with subject identifiers
   CR=c(d$BEFORE, d$AFTER), # all numbers of correct responses in 1 column/variable
   TT=rep(c("before", "after"), each=10), # all times of testing in 1 column/variable
   stringsAsFactors=TRUE)
summary(d)

      CASE            SUBJ            CR            TT
 Min.   : 1.00   Min.   : 1.0   Min.   :1.00   after :10
 1st Qu.: 5.75   1st Qu.: 3.0   1st Qu.:4.00   before:10
 Median :10.50   Median : 5.5   Median :6.00
 Mean   :10.50   Mean   : 5.5   Mean   :5.65
 3rd Qu.:15.25   3rd Qu.: 8.0   3rd Qu.:7.00
 Max.   :20.00   Max.   :10.0   Max.   :9.00

Out of pure ‘anality’, we change the order of the levels of TT to one that is not alphabetical but chronological. (Actually, it’s not just anality, this kind of stuff can be really important later.)

d$TT <- factor(d$TT, levels=levels(d$TT)[2:1])
summary(d)

      CASE            SUBJ            CR            TT
 Min.   : 1.00   Min.   : 1.0   Min.   :1.00   before:10
 1st Qu.: 5.75   1st Qu.: 3.0   1st Qu.:4.00   after :10
 Median :10.50   Median : 5.5   Median :6.00
 Mean   :10.50   Mean   : 5.5   Mean   :5.65
 3rd Qu.:15.25   3rd Qu.: 8.0   3rd Qu.:7.00
 Max.   :20.00   Max.   :10.0   Max.   :9.00

We first describe the data, bearing in mind these are dependent samples already:

tapply(  # apply to 
   d$CR, # these values
   d$TT, # a grouping by these values
   # and run an anonymous function that returns both
   # the usual summary of the vector and its sd
   \(af) c(summary(af), "sd"=sd(af)))

$before
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.       sd
1.000000 2.500000 5.500000 4.900000 7.000000 8.000000 2.514403

$after
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.       sd
4.000000 5.250000 6.000000 6.400000 7.750000 9.000000 1.837873

summary(test_diffs <- d$CR[d$TT=="before"] - d$CR[d$TT=="after"])

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  -8.00   -3.75   -0.50   -1.50    1.00    3.00

Let’s visualize with the arrows plot:

plot(1, 10, type="n", axes=FALSE, # plot a point at {1,10} NOT, and no axes
     xlim=1:2, ylim=c(0, 10),     # w/ these axis limits &
     xlab="Test", ylab="Correct responses") # these axis labels
   axis(1, at=1:2, labels=c("Before", "After")) # add an x-axis with these tickmarks
   axis(2); box(); grid() # add a y-axis, a box, a grid
arrows(1,                    # draw arrows from the starting points with x=1
       d$CR[d$TT=="before"], # and y= the before scores
       2,                    # to the end points with x=2
       d$CR[d$TT=="after"],  # and y= the after scores
       col=ifelse(test_diffs>0, # if after > before
                  "green",      # make the arrow green
                  "red"))       # otherwise red
arrows(1,                          # draw one arrow from x=1 and
       mean(d$CR[d$TT=="before"]), # y= the mean before score
       2,                          # to x=2 and
       mean(d$CR[d$TT=="after"]),  # y= the mean after score
       lwd=2)                      # with a heavy/bold line

This looks like there might be an improvement (but also a lot of variability).

4.3 Statistical testing

For this question, we would prefer to use a t-test for dependent samples, which requires normality of the pairwise differences, which we check:

nortest::lillie.test(test_diffs)


    Lilliefors (Kolmogorov-Smirnov) normality test

data:  test_diffs
D = 0.16034, p-value = 0.6601

We can proceed with the t-test:

t.test(                    # compute the t-test
   x=d$CR[d$TT=="before"], # comparing the before...
   y=d$CR[d$TT=="after"],  # ... to the after scores
   paired=TRUE,            # as two dependent samples
   alternative="less")     # spell out H1: x should be lower!


    Paired t-test

data:  d$CR[d$TT == "before"] and d$CR[d$TT == "after"]
t = -1.3072, df = 9, p-value = 0.1118
alternative hypothesis: true mean difference is less than 0
95 percent confidence interval:
      -Inf 0.6034255
sample estimates:
mean difference
           -1.5

Yes, we could have gotten this with t.test(x=d$BEFORE, y=d$AFTER, paired=TRUE, alternative="less"), but now we’re getting it from the data being organized the right way … Also, see the excursus below for why this would have also worked: t.test(d$BEFORE-d$AFTER, mu=0, alternative="less").

4.4 Write-up

According to a one-tailed t-test for dependent samples, the before results (mean=4.9, sd=2.51) and the after results (mean=6.4, sd=1.84) are not significantly different from each other (t=-1.307, df=9, p_one-tailed=0.1118).

4.5 Excursus

Note that the t-test for dependent samples A and B is actually a test for whether the pairwise differences of A and B have a mean of 0:

# (using a bidirectional test for ease of exposition)
t.test(test_diffs, mu=0) # two-tailed


    One Sample t-test

data:  test_diffs
t = -1.3072, df = 9, p-value = 0.2235
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -4.095737  1.095737
sample estimates:
mean of x
     -1.5

t.test(test_diffs, mu=0, alternative="less") # our one-tailed


    One Sample t-test

data:  test_diffs
t = -1.3072, df = 9, p-value = 0.1118
alternative hypothesis: true mean is less than 0
95 percent confidence interval:
      -Inf 0.6034255
sample estimates:
mean of x
     -1.5

Also, given the really small number of data points, it might still be advisable to also check with a Wilcoxon test:

wilcox.test(               # compute the Wilcoxon test
   x=d$CR[d$TT=="before"], # comparing the before...
   y=d$CR[d$TT=="after"],  # ... to the after scores
   paired=TRUE,            # as two dependent samples
   correct=FALSE,          # don't use a continuity correction
   alternative="less")     # spell out H1: x should be lower!


    Wilcoxon signed rank exact test

data:  d$CR[d$TT == "before"] and d$CR[d$TT == "after"]
V = 17.5, p-value = 0.1836
alternative hypothesis: true location shift is less than 0

Ok, this confirms our result, excellent: According to a one-tailed Wilcoxon test for dependent samples, the before results (median=5.5, IQR=4.5) and the after results (median=6, IQR=2.5) are not significantly different from each other (V=13.5, p_one-tailed=0.1421).

How might one test this using a simulation approach? Like this, and its one-tailed p-value is really not all that different from the one we got from the t-test for dependent samples:

set.seed(123) # set a random number generator
collector <- rep(NA, 2000) # set up a collector vector
for (i in 1:2000) { # do the following 2000 times
   sampled_TT <- sample(d$TT) # reorder the testing times randomly
   collector[i] <- diff( # store the difference between 
      tapply(        # the two means resulting from
         d$CR,       # grouping these values
         sampled_TT, # by these values
         mean))      # & computing the mean for each group
}
hist(collector); abline(v=1.5, lty=2, col="red")

sum(collector>=1.5) / 2000

[1] 0.0915

5 Exercise 05

What does this script exemplify?

rm(list=ls(all=TRUE)); set.seed(12) # clear all memory
x1 <- rnorm(50)  ; y1 <- rnorm(50)
x2 <- rnorm(50)+5; y2 <- rnorm(50)+5
x <- append(x1, x2); y <- append(y1, y2)
plot(y ~ x, type="n"); abline(lm(y ~ x)); grid()
   points(y1 ~ x1, col="#0000FF40", pch=19); abline(lm(y1 ~ x1), col="#0000FF40")
   points(y2 ~ x2, col="#FF000040", pch=19); abline(lm(y2 ~ x2), col="#FF000040")
text(0, 6, "r for the\nblack line=0.88")

It exemplifies that interpreting a correlation without visual inspection of the data is really dangerous: In a way that should remind you of slides 10-11 of session 1, if you just compute a correlation for all data points, you are lead to believe that there is a fairly strong positive correlation. A visual inspection (supported by the different colors of points and regression lines) immediately reveals that the data set actually appears to consist of two separate data sets, in each of which no correlation whatsoever can be found. The high r-value for the black regression line could only be taken somewhat seriously if the data looked like this, where the same correlation was found in both the blue and the red data:

rm(list=ls(all=TRUE)); set.seed(12) # clear all memory
x1 <- rnorm(50)  ; y1 <- x1+rnorm(50, 0, 0.5)
x2 <- rnorm(50)+5; y2 <- x2+rnorm(50, 0, 0.5)
x <- append(x1, x2); y <- append(y1, y2)
plot(y ~ x, type="n"); abline(lm(y ~ x)); grid()
   points(y1 ~ x1, col="#0000FF40", pch=19); abline(lm(y1 ~ x1), col="#0000FF80")
   points(y2 ~ x2, col="#FF000040", pch=19); abline(lm(y2 ~ x2), col="#FF000080")
text(0, 6, "r for every\nline > 0.88")

6 Exercise 06

Question: Deane (1987) posited that the choice of the of-genitive (e.g., the speech of the President) vs. the s-genitive (e.g., the President’s speech) was explainable with regard to the relevant referents’ position on an 11-point cognitive-entrenchment hierarchy (from 1 ‘lowest’ to 11 ‘highest’). Gries (1999) wanted to determine whether this entrenchment hierarchy would also explain the verb-particle construction alternation. He created examples of both constructions that used DOs from all 11 entrenchment levels (yielding 22 factor level combinations). Subjects were then given test sentences representing these factor level combinations and were instructed to rate the acceptability of the test sentences; 60 judgments were collected per factor level combination. The file in _input/entrenchment.csv contains the average judgments for each of the 22 combinations resulting from the experiment, see _input/entrenchment.r for the usual explanations. Based on Deane’s work, Gries had the hypotheses listed below; your job here is to test them and interpret the results.

6.1 Hypotheses

The

dependent/response variable is JUDGMENT;
independent/predictor variable is ENTRENCHMENT.

but separately for each construction!

What are the hypotheses?

text hypotheses for V_DO_PRT:
- H₁: there is a positive correlation between the degree of entrenchment of the DO and the acceptability of the sentence;
- H₀: there is no positive correlation between the degree of entrenchment of the DO and the acceptability of the sentence;
statistical hypotheses for V_DO_PRT:
- H₁: some correlation coefficient is >0;
- H₀: some correlation coefficient is ≤0.
text hypotheses for V_PRT_DO:
- H₁: there is a negative correlation between the degree of entrenchment of the DO and the acceptability of the sentence;
- H₀: there is no negative correlation between the degree of entrenchment of the DO and the acceptability of the sentence;
statistical hypotheses for V_PRT_DO:
- H₁: some correlation coefficient is <0;
- H₀: some correlation coefficient is ≥0.

6.2 Descriptive stats/visualization

We load the data:

rm(list=ls(all=TRUE))
summary(d <- read.delim(
   "_input/entrenchment.csv",
   stringsAsFactors=TRUE))

      CASE         CONSTRUCTION  ENTRENCHMENT      JUDGMENT
 Min.   : 1.00   v_do_prt:11    Min.   : 1.00   Min.   :26.20
 1st Qu.: 6.25   v_prt_do:11    1st Qu.: 3.25   1st Qu.:59.32
 Median :11.50                  Median : 6.00   Median :72.88
 Mean   :11.50                  Mean   : 6.00   Mean   :69.08
 3rd Qu.:16.75                  3rd Qu.: 8.75   3rd Qu.:84.56
 Max.   :22.00                  Max.   :11.00   Max.   :92.23

We compute descriptive correlations:

d_split <- split(d, d$CONSTRUCTION, drop=TRUE)
with(d_split$v_do_prt, {
   c("Spearman's rho"=cor(ENTRENCHMENT, JUDGMENT, method="spearman"),
     "Kendall's tau" =cor(ENTRENCHMENT, JUDGMENT, method="kendall")) })

Spearman's rho  Kendall's tau
     0.7727273      0.6000000

with(d_split$v_prt_do, {
   c("Spearman's rho"=cor(ENTRENCHMENT, JUDGMENT, method="spearman"),
     "Kendall's tau" =cor(ENTRENCHMENT, JUDGMENT, method="kendall")) })

Spearman's rho  Kendall's tau
    -0.6000000     -0.4181818

This looks great! Let’s visualize the trend:

plot(d$JUDGMENT ~ d$ENTRENCHMENT,
   xlab="Entrenchment"         , xlim=c(1,  11),
   ylab="Average acceptability", ylim=c(0, 100),
   pch=substr(d$CONSTRUCTION, 3, 3), # pch = what's after the verb
   col=rainbow(2)[d$CONSTRUCTION]); grid() # in two different colors; add grid
# add a regression line for v_do_prt
abline(model_d <- lm(
   JUDGMENT ~ ENTRENCHMENT, data=d,
   subset=CONSTRUCTION==levels(CONSTRUCTION)[1]),
   col=rainbow(2)[1])
# add a regression line for v_do_prt
abline(model_p <- lm(
   JUDGMENT ~ ENTRENCHMENT, data=d,
   subset=CONSTRUCTION==levels(CONSTRUCTION)[2]),
   col=rainbow(2)[2])
# add a legend
legend(x=4, xjust=0.5, y=20, yjust=0.5, # add a centered legend at these coordinates
   legend=levels(d$CONSTRUCTION), # w/ these legend labels and
   pch=substr(as.character(levels(d$CONSTRUCTION)), 3, 3), # these point characters &
   col=rainbow(2), bty="n")       # w/ these colors, no box

6.3 Statistical testing

Let’s again use both ordinal correlation coefficients (Gries 1999 used Kendall’s τ only):

with(d_split$v_do_prt, { list(
   cor.test(ENTRENCHMENT, JUDGMENT, method="spearman", alternative="greater"),
   cor.test(ENTRENCHMENT, JUDGMENT, method="kendall", alternative="greater")
)})

[[1]]

    Spearman's rank correlation rho

data:  ENTRENCHMENT and JUDGMENT
S = 50, p-value = 0.004031
alternative hypothesis: true rho is greater than 0
sample estimates:
      rho
0.7727273


[[2]]

    Kendall's rank correlation tau

data:  ENTRENCHMENT and JUDGMENT
T = 44, p-value = 0.004973
alternative hypothesis: true tau is greater than 0
sample estimates:
tau
0.6

with(d_split$v_prt_do, { list(
   cor.test(ENTRENCHMENT, JUDGMENT, method="spearman", alternative="less"),
   cor.test(ENTRENCHMENT, JUDGMENT, method="kendall", alternative="less")
)})

[[1]]

    Spearman's rank correlation rho

data:  ENTRENCHMENT and JUDGMENT
S = 352, p-value = 0.02809
alternative hypothesis: true rho is less than 0
sample estimates:
 rho
-0.6


[[2]]

    Kendall's rank correlation tau

data:  ENTRENCHMENT and JUDGMENT
T = 16, p-value = 0.04328
alternative hypothesis: true tau is less than 0
sample estimates:
       tau
-0.4181818

Everything’s significant in our one-tailed tests. And yet, if one pays attention, at least half of it is actually problematic. We can see that from fitting a smoother, which one should always do:

# this is all as before:
plot(d$JUDGMENT ~ d$ENTRENCHMENT,
   xlab="Entrenchment",        , xlim=c(1, 11),
   ylab="Average acceptability", ylim=c(0, 100),
   pch=substr(d$CONSTRUCTION, 3, 3), # pch = what's after the verb
   col=rainbow(2)[d$CONSTRUCTION]); grid() # in two different colors; add grid
# add a legend
legend(x=4, xjust=0.5, y=20, yjust=0.5, # add a centered legend at these coordinates
   legend=levels(d$CONSTRUCTION), # w/ these legend labels and
   pch=substr(as.character(levels(d$CONSTRUCTION)), 3, 3), # these point characters &
   col=rainbow(2), bty="n")       # w/ these colors, no box
# now the new stuff: add the smoother
lines(lowess(d_split$v_do_prt$JUDGMENT ~ 1:11), col=rainbow(2)[1])
lines(lowess(d_split$v_prt_do$JUDGMENT ~ 1:11), col=rainbow(2)[2])

The correlation for v_prt_do looks not that great anymore. In fact, the overall negative correlation is completely due to the highest three entrenchment levels:

with(d[d$ENTRENCHMENT<9,], {
plot(JUDGMENT ~ ENTRENCHMENT,
   xlab="Entrenchment",        , xlim=c(1, 11),
   ylab="Average acceptability", ylim=c(0, 100),
   pch=substr(CONSTRUCTION, 3, 3), # pch = what's after the verb
   col=rainbow(2)[CONSTRUCTION]); grid() # in two different colors; add grid
# add a regression line for v_do_prt
abline(model_d <- lm(
   JUDGMENT ~ ENTRENCHMENT,
   subset=CONSTRUCTION==levels(CONSTRUCTION)[1]),
   col=rainbow(2)[1])
# add a regression line for v_do_prt
abline(model_p <- lm(
   JUDGMENT ~ ENTRENCHMENT,
   subset=CONSTRUCTION==levels(CONSTRUCTION)[2]),
   col=rainbow(2)[2])
# add a legend
legend(x=4, xjust=0.5, y=20, yjust=0.5, # add a centered legend at these coordinates
   legend=levels(d$CONSTRUCTION), # w/ these legend labels and
   pch=substr(as.character(levels(d$CONSTRUCTION)), 3, 3), # these point characters &
   col=rainbow(2), bty="n")       # w/ these colors, no box
})

Now the correlations are much worse – the ones for v_do_prt mostly due to the reduction in sample size, but the ones for v_prt_do because now the only thing that’s responsible for it is gone …

with(d_split$v_do_prt, { list(
   cor.test(ENTRENCHMENT[1:8], JUDGMENT[1:8], method="spearman", alternative="greater"),
   cor.test(ENTRENCHMENT[1:8], JUDGMENT[1:8], method="kendall", alternative="greater")
)})

[[1]]

    Spearman's rank correlation rho

data:  ENTRENCHMENT[1:8] and JUDGMENT[1:8]
S = 32, p-value = 0.05749
alternative hypothesis: true rho is greater than 0
sample estimates:
      rho
0.6190476


[[2]]

    Kendall's rank correlation tau

data:  ENTRENCHMENT[1:8] and JUDGMENT[1:8]
T = 21, p-value = 0.05434
alternative hypothesis: true tau is greater than 0
sample estimates:
tau
0.5

with(d_split$v_prt_do, { list(
   cor.test(ENTRENCHMENT[1:8], JUDGMENT[1:8], method="spearman", alternative="less"),
   cor.test(ENTRENCHMENT[1:8], JUDGMENT[1:8], method="kendall", alternative="less")
)})

[[1]]

    Spearman's rank correlation rho

data:  ENTRENCHMENT[1:8] and JUDGMENT[1:8]
S = 86, p-value = 0.4884
alternative hypothesis: true rho is less than 0
sample estimates:
        rho
-0.02380952


[[2]]

    Kendall's rank correlation tau

data:  ENTRENCHMENT[1:8] and JUDGMENT[1:8]
T = 14, p-value = 0.5476
alternative hypothesis: true tau is less than 0
sample estimates:
tau
  0

The sensitivity of the regression line to what the three highest ENTRENCHMENT values do is shown in the animation here; note how the regression line only gets to have a negative slope when ENTRENCHMENT becomes 9 or higher:

plot(d$JUDGMENT[12:22] ~ d$ENTRENCHMENT[12:22],
     type="n", lty=2, lwd=2,
     xlim=c(1, 11), ylim=c(0, 100),
     xlab="Entrenchment", ylab="Mean acceptability judgment"); grid()
points(d$JUDGMENT[12:13] ~ d$ENTRENCHMENT[12:13], type="p", pch="p", col=rainbow(2)[2])
for (i in 14:22) {
   points(d$JUDGMENT[i] ~ d$ENTRENCHMENT[i], type="p", pch="p", col=rainbow(2)[2])
   abline(lm((d$JUDGMENT[12:i] ~ d$ENTRENCHMENT[12:i])), col=rainbow(2)[2])
   Sys.sleep(1)
}

6.4 Write-up

Initial correlations were promising: [show plot and initial correlation coefficients]. However, closer inspection (visually, with smoothers and subsets of the data [explain in more detail]) indicate that, while the positive correlation for v_do_prt seems supported overall, the negative correlation for v_prt_do seems entirely due to a bifurcation in the data:

when ENTRENCHMENT is 8 or lower, there is no correlation with JUDGMENT and the sentences exhibit an overall high degree of acceptability (of ≈70);
when ENTRENCHMENT is 9 or higher, there is no correlation with JUDGMENT and the sentences exhibit an overall low degree of acceptability (of ≈30).

Thus, it seems that one might consider the overall entrenchment hierarchy supported for v_do_prt, but not for v_prt_do, where, if anything, the 11-point entrenchment hierarchy collapses into a two-way distinction of 1st/2nd/3rd person pronouns (ENTRENCHMENT values in the interval [9, 11]) and everything else. While the acceptability differences of those two groups of ENTRENCHMENT values is still compatible with the overall thrust of the hypothesis, it seems to certainly weakens the overall utility of Deane’s hierarchy for particle placement. At least if the analysis wasn’t a load of crap in one other major way as well, namely …?

6.5 Excursus

Is the difference in the judgments between ENTRENCHMENT being 1:8 vs. being 9:11 significant? Because if it is, that could be at least a tiny bit of an effect as desired …

tapply(
   d_split$v_prt_do$JUDGMENT,
   d_split$v_prt_do$ENTRENCHMENT<9,
   mean) # a difference of -40.13333

   FALSE     TRUE
29.76667 69.90000

A t-test for independent samples doesn’t seem like a good idea because one of the two groups of course has only 3 data points. Thus, we do something that I haven’t mentioned so far (and won’t mention again so, breathe, “this isn’t on the test”): permutation-based exact tests. They are somewhat ‘similar’ (can I hedge even more?) to the bootstrapping/simulation approaches, but this time we don’t draw randomly (with replacement) a certain number of times, this time we literally generate all possible combinations of the data exhaustively and count how many of all possible results are as or more extreme than ours. Step 1: all possible ways to take three values from the 11 for v_prt_do:

(all_ways_to_take_3 <- t(combn(1:11, 3)))

       [,1] [,2] [,3]
  [1,]    1    2    3
  [2,]    1    2    4
  [3,]    1    2    5
  [4,]    1    2    6
  [5,]    1    2    7
  [6,]    1    2    8
  [7,]    1    2    9
  [8,]    1    2   10
  [9,]    1    2   11
 [10,]    1    3    4
 [11,]    1    3    5
 [12,]    1    3    6
 [13,]    1    3    7
 [14,]    1    3    8
 [15,]    1    3    9
 [16,]    1    3   10
 [17,]    1    3   11
 [18,]    1    4    5
 [19,]    1    4    6
 [20,]    1    4    7
 [21,]    1    4    8
 [22,]    1    4    9
 [23,]    1    4   10
 [24,]    1    4   11
 [25,]    1    5    6
 [26,]    1    5    7
 [27,]    1    5    8
 [28,]    1    5    9
 [29,]    1    5   10
 [30,]    1    5   11
 [31,]    1    6    7
 [32,]    1    6    8
 [33,]    1    6    9
 [34,]    1    6   10
 [35,]    1    6   11
 [36,]    1    7    8
 [37,]    1    7    9
 [38,]    1    7   10
 [39,]    1    7   11
 [40,]    1    8    9
 [41,]    1    8   10
 [42,]    1    8   11
 [43,]    1    9   10
 [44,]    1    9   11
 [45,]    1   10   11
 [46,]    2    3    4
 [47,]    2    3    5
 [48,]    2    3    6
 [49,]    2    3    7
 [50,]    2    3    8
 [51,]    2    3    9
 [52,]    2    3   10
 [53,]    2    3   11
 [54,]    2    4    5
 [55,]    2    4    6
 [56,]    2    4    7
 [57,]    2    4    8
 [58,]    2    4    9
 [59,]    2    4   10
 [60,]    2    4   11
 [61,]    2    5    6
 [62,]    2    5    7
 [63,]    2    5    8
 [64,]    2    5    9
 [65,]    2    5   10
 [66,]    2    5   11
 [67,]    2    6    7
 [68,]    2    6    8
 [69,]    2    6    9
 [70,]    2    6   10
 [71,]    2    6   11
 [72,]    2    7    8
 [73,]    2    7    9
 [74,]    2    7   10
 [75,]    2    7   11
 [76,]    2    8    9
 [77,]    2    8   10
 [78,]    2    8   11
 [79,]    2    9   10
 [80,]    2    9   11
 [81,]    2   10   11
 [82,]    3    4    5
 [83,]    3    4    6
 [84,]    3    4    7
 [85,]    3    4    8
 [86,]    3    4    9
 [87,]    3    4   10
 [88,]    3    4   11
 [89,]    3    5    6
 [90,]    3    5    7
 [91,]    3    5    8
 [92,]    3    5    9
 [93,]    3    5   10
 [94,]    3    5   11
 [95,]    3    6    7
 [96,]    3    6    8
 [97,]    3    6    9
 [98,]    3    6   10
 [99,]    3    6   11
[100,]    3    7    8
[101,]    3    7    9
[102,]    3    7   10
[103,]    3    7   11
[104,]    3    8    9
[105,]    3    8   10
[106,]    3    8   11
[107,]    3    9   10
[108,]    3    9   11
[109,]    3   10   11
[110,]    4    5    6
[111,]    4    5    7
[112,]    4    5    8
[113,]    4    5    9
[114,]    4    5   10
[115,]    4    5   11
[116,]    4    6    7
[117,]    4    6    8
[118,]    4    6    9
[119,]    4    6   10
[120,]    4    6   11
[121,]    4    7    8
[122,]    4    7    9
[123,]    4    7   10
[124,]    4    7   11
[125,]    4    8    9
[126,]    4    8   10
[127,]    4    8   11
[128,]    4    9   10
[129,]    4    9   11
[130,]    4   10   11
[131,]    5    6    7
[132,]    5    6    8
[133,]    5    6    9
[134,]    5    6   10
[135,]    5    6   11
[136,]    5    7    8
[137,]    5    7    9
[138,]    5    7   10
[139,]    5    7   11
[140,]    5    8    9
[141,]    5    8   10
[142,]    5    8   11
[143,]    5    9   10
[144,]    5    9   11
[145,]    5   10   11
[146,]    6    7    8
[147,]    6    7    9
[148,]    6    7   10
[149,]    6    7   11
[150,]    6    8    9
[151,]    6    8   10
[152,]    6    8   11
[153,]    6    9   10
[154,]    6    9   11
[155,]    6   10   11
[156,]    7    8    9
[157,]    7    8   10
[158,]    7    8   11
[159,]    7    9   10
[160,]    7    9   11
[161,]    7   10   11
[162,]    8    9   10
[163,]    8    9   11
[164,]    8   10   11
[165,]    9   10   11

Step 2: we take each row of all_ways_to_take_3 and use it to

split the 11 values of x_split$v_prt_do$JUDGMENT into
- the 3 values that are indexed in the row of all_ways_to_take_3;
- the remaining 8; and to then
compute the difference between the means of JUDGMENT of those two groups:

all_means_differences <- apply(
   all_ways_to_take_3,
   1,
   \(af) {
      selector <- rep(FALSE, 11);
      selector[af] <- TRUE
      diff(tapply(
         d_split$v_prt_do$JUDGMENT,
         selector,
         mean)) })

Step 3: we check how many of all the theoretically possible differences between those means is as small or smaller than ours and express that as a p-value:

"/"(
   sum(all_means_differences<=-40.13333),
   length(all_means_differences))

[1] 0.006060606

Thus, according to an exact (permutation-based) t-test for independent samples, the averages of JUDGMENT for ENTRENCHMENT being 1 to 8 and for ENTRENCHMENT being 9 to 11 is significant (p_one-tailed=0.006).

On my computer, there’s of course a function for that:

exact.t.test.indep(
   d_split$v_prt_do$JUDGMENT[1:8 ],
   d_split$v_prt_do$JUDGMENT[9:11],
   tails=1)

7 Session info

sessionInfo()

R version 4.6.0 (2026-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Linux Mint 22.3

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  compiler  methods
[8] base

other attached packages:
[1] STGmisc_1.06   Rcpp_1.1.1-1.1 magrittr_2.0.5

loaded via a namespace (and not attached):
 [1] digest_0.6.39     fastmap_1.2.0     xfun_0.57         nortest_1.0-4
 [5] knitr_1.51        htmltools_0.5.9   rmarkdown_2.31    cli_3.6.6
 [9] rstudioapi_0.18.0 tools_4.6.0       evaluate_1.0.5    yaml_2.3.12
[13] otel_0.2.0        htmlwidgets_1.6.4 rlang_1.2.0       jsonlite_2.0.0
[17] MASS_7.3-65