Ling 202: session 07: ordinal regression (key)

Author
Affiliations

UC Santa Barbara

JLU Giessen

Published

14 May 2025 12-34-56

1 Intro

Let’s load a data set for an example of an ordinal model; the data are in _input/agreementratings.csv and you can find information about the variables/columns in _input/agreementratings.r. Importantly, we make sure the response variable is an ordered factor:

rm(list=ls(all.names=TRUE))
library(car); library(effects); library(MASS) # & now load small helper functions:
source("_helpers/R2s.r"); source("_helpers/summarize.polr.r")
str(d <- read.delim(       # summarize d, the result of loading
   file="_input/agreementratings.csv", # this file
   stringsAsFactors=TRUE)) # change categorical variables into factors
'data.frame':   4300 obs. of  8 variables:
 $ CASE      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ JUDGMENT  : int  3 -3 3 3 -3 2 -2 2 -3 3 ...
 $ AGREEMENT : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 2 ...
 $ NEGATION  : Factor w/ 2 levels "affirmative",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ EMPHPRON  : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
 $ SUBJORDER : Factor w/ 2 levels "gramsem","semgram": 1 1 1 1 1 1 1 1 2 2 ...
 $ NUMBERGRAM: Factor w/ 2 levels "plural","singular": 1 1 1 1 1 1 1 1 1 1 ...
 $ PERSONGRAM: Factor w/ 3 levels "first","second",..: 3 3 3 3 3 3 3 3 3 3 ...
d$JUDGMENT <- factor(d$JUDGMENT, ordered=TRUE)
summary(d)
      CASE      JUDGMENT  AGREEMENT         NEGATION    EMPHPRON    SUBJORDER    NUMBERGRAM    PERSONGRAM
 Min.   :   1   -3: 620   no :2077   affirmative:2139   no :2133  gramsem:2173  plural  :1989   first :1381
 1st Qu.:1137   -2: 480   yes:2223   negative   :2161   yes:2167  semgram:2127  singular:2311   second:1561
 Median :2262   -1: 387                                                                         third :1358
 Mean   :2266   0 : 329
 3rd Qu.:3394   1 : 355
 Max.   :4528   2 : 826
                3 :1303

We want to determine whether the rating of a Spanish construction involving the semantic role of an experiencer is predictable from

  • AGREEMENT, NEGATION, EMPHPRON, SUBJORDER, NUMPERGRAM, and PERSONGRAM;
  • any interaction of AGREEMENT with any of the other predictors.

2 Deviance & baseline(s)

Let’s already compute a multinomial kind of baseline for what will be the response variable, JUDGMENT:

(baselines <- c(
   "baseline 1"=max(          # make baselines[1] the highest
      prop.table(             # proportion in the
         table(d$JUDGMENT))), # frequency table of the response
   "baseline 2"=sum(             # make baselines[2] the sum of the
      prop.table(                # proportions in the
         table(d$JUDGMENT))^2))) # frequency table of the response squared
baseline 1 baseline 2
 0.3030233  0.1827431 

But this is actually not ideal here because it only checks whether a prediction is correct or not, but not how far off it is. If JUDGMENT is 2 and the model predicts 3, it is an incorrect prediction, but it’s also clear that the prediction is much better than if it had been -3, which neither of these baselines can reflect. To address this, let’s compute a baseline that takes the ordinal nature of the response (and the predictions) into consideration. We will use a very simple/crude simulation approach for this: We will randomly reshuffle the response variable 1000 times and compute for each reshuffling how far the random reshufflings are away from the actual data, where the ‘distance’ is operationalized as the sum of the absolute differences between observed and reshuffled ranks. This is how one might visualize that with a small histogram of those reshuffled sums-of-absolute-deviations; I am adding

  • the quantiles one might use as a 95% confidence interval (in orange);
  • the sum that corresponds to baseline 1 (in blue);
  • the worst possible prediction result, i.e. the score that would result from always guessing as far away as possible from the actual score, (in red):
op <- par(mar=c(4,4,1,1))
collector <- numeric(1000); set.seed(1); for (i in seq(collector)) {
   random.judgments <- sample(d$JUDGMENT)
   collector[i] <- "-"(
      as.numeric(random.judgments),
      as.numeric(d$JUDGMENT))       %>%
      abs %>% sum
}
sim.eval.plot.1 <- function () {
   hist(collector, xlim=c(0, 22000), main="")
      abline(v=quantile(collector, probs=c(0.025, 0.5, 0.975)), lty=2, col="orange")
      abline(v=sum(abs(as.numeric(d$JUDGMENT)-7)), lty=2, col="blue")
      abline(v=sum(pmax(7-as.numeric(d$JUDGMENT), abs(1-as.numeric(d$JUDGMENT)))), lty=2, col="red")
}; sim.eval.plot.1()

par(op)

With this, we also update our vector baselines:

baselines["simulated"] <- median(collector); baselines
  baseline 1   baseline 2    simulated
3.030233e-01 1.827431e-01 1.070700e+04 

Let’s compute the deviance of the null model:

deviance(m.00 <- polr(JUDGMENT ~ 1, Hess=TRUE, data=d, na.action=na.exclude))
[1] 15669.11

3 Exploration & preparation

Some exploration of the relevant variables:

# the predictor(s)/response on its/their own
# just check the summary for the main effects
table(d$NEGATION, d$AGREEMENT)

                no  yes
  affirmative 1049 1090
  negative    1028 1133
table(d$EMPHPRON, d$AGREEMENT)

        no  yes
  no  1038 1095
  yes 1039 1128
table(d$SUBJORDER, d$AGREEMENT)

            no  yes
  gramsem 1047 1126
  semgram 1030 1097
table(d$NUMBERGRAM, d$AGREEMENT)

             no  yes
  plural    926 1063
  singular 1151 1160
table(d$PERSONGRAM, d$AGREEMENT)

          no yes
  first  657 724
  second 809 752
  third  611 747
# the predictor(s) w/ the response
table(d$AGREEMENT, d$JUDGMENT)

       -3  -2  -1   0   1   2   3
  no  463 304 225 158 190 344 393
  yes 157 176 162 171 165 482 910
ftable(d$NEGATION, d$AGREEMENT, d$JUDGMENT)
                  -3  -2  -1   0   1   2   3

affirmative no   231 146 114  86  96 176 200
            yes   80  94  75  78  83 224 456
negative    no   232 158 111  72  94 168 193
            yes   77  82  87  93  82 258 454
ftable(d$EMPHPRON, d$AGREEMENT, d$JUDGMENT)
          -3  -2  -1   0   1   2   3

no  no   240 149 113  78  99 171 188
    yes   74  82  66  91  75 242 465
yes no   223 155 112  80  91 173 205
    yes   83  94  96  80  90 240 445
ftable(d$SUBJORDER, d$AGREEMENT, d$JUDGMENT)
              -3  -2  -1   0   1   2   3

gramsem no   253 159 124  87  97 161 166
        yes   98 107 104  98  99 254 366
semgram no   210 145 101  71  93 183 227
        yes   59  69  58  73  66 228 544
ftable(d$NUMBERGRAM, d$AGREEMENT, d$JUDGMENT)
               -3  -2  -1   0   1   2   3

plural   no   162 113  89  70  97 184 211
         yes   64  84  75  88  80 243 429
singular no   301 191 136  88  93 160 182
         yes   93  92  87  83  85 239 481
ftable(d$PERSONGRAM, d$AGREEMENT, d$JUDGMENT)
             -3  -2  -1   0   1   2   3

first  no   173 112  85  50  51  81 105
       yes   64  64  61  52  50 141 292
second no   190 122  86  65  71 139 136
       yes   41  51  47  62  62 173 316
third  no   100  70  54  43  68 124 152
       yes   52  61  54  57  53 168 302

4 Modeling & numerical interpretation

Let’s fit our initial regression model:

summary(m.01 <- polr(     # summarize m.01, which models
   JUDGMENT ~ 1 +         # JUDGMENT ~ an overall intercept (1)
   # AGREEMENT interacting w/ all other predictors
   AGREEMENT * (NEGATION+EMPHPRON+SUBJORDER+NUMBERGRAM+PERSONGRAM),
   Hess=TRUE,             # recommended for using summary
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data
Call:
polr(formula = JUDGMENT ~ 1 + AGREEMENT * (NEGATION + EMPHPRON +
    SUBJORDER + NUMBERGRAM + PERSONGRAM), data = d, na.action = na.exclude,
    Hess = TRUE)

Coefficients:
                                   Value Std. Error t value
AGREEMENTyes                     1.00762    0.23047  4.3720
NEGATIONnegative                -0.07963    0.07825 -1.0177
EMPHPRONyes                      0.01820    0.07869  0.2313
SUBJORDERsemgram                 0.26155    0.07833  3.3390
NUMBERGRAMsingular              -0.28824    0.12844 -2.2441
PERSONGRAMsecond                 0.12173    0.10676  1.1402
PERSONGRAMthird                  0.41046    0.16241  2.5273
AGREEMENTyes:NEGATIONnegative    0.06444    0.10980  0.5869
AGREEMENTyes:EMPHPRONyes        -0.14410    0.11020 -1.3077
AGREEMENTyes:SUBJORDERsemgram    0.39130    0.11081  3.5311
AGREEMENTyes:NUMBERGRAMsingular  0.24572    0.18523  1.3266
AGREEMENTyes:PERSONGRAMsecond   -0.05795    0.15453 -0.3750
AGREEMENTyes:PERSONGRAMthird    -0.36910    0.23059 -1.6007

Intercepts:
      Value   Std. Error t value
-3|-2 -1.2230  0.1653    -7.4002
-2|-1 -0.4605  0.1635    -2.8162
-1|0   0.0101  0.1632     0.0617
0|1    0.3675  0.1634     2.2493
1|2    0.7353  0.1637     4.4917
2|3    1.6316  0.1652     9.8743

Residual Deviance: 15099.88
AIC: 15137.88 

Let’s also compute the t-values for these coefficient values (however uninterpretable those are):

(m.01.tscores <- summary(m.01)$coefficients[,"t value"])
                   AGREEMENTyes                NEGATIONnegative
                     4.37198298                     -1.01767960
                    EMPHPRONyes                SUBJORDERsemgram
                     0.23126312                      3.33897026
             NUMBERGRAMsingular                PERSONGRAMsecond
                    -2.24412620                      1.14023537
                PERSONGRAMthird   AGREEMENTyes:NEGATIONnegative
                     2.52729216                      0.58690684
       AGREEMENTyes:EMPHPRONyes   AGREEMENTyes:SUBJORDERsemgram
                    -1.30766349                      3.53108615
AGREEMENTyes:NUMBERGRAMsingular   AGREEMENTyes:PERSONGRAMsecond
                     1.32657549                     -0.37498824
   AGREEMENTyes:PERSONGRAMthird                           -3|-2
                    -1.60067926                     -7.40016289
                          -2|-1                            -1|0
                    -2.81623304                      0.06170461
                            0|1                             1|2
                     2.24927021                      4.49166614
                            2|3
                     9.87427499 

And from that we compute the p-values:

(m.01.pvalues <- 2*pnorm(abs(m.01.tscores), lower.tail=FALSE))
                   AGREEMENTyes                NEGATIONnegative
                   1.231231e-05                    3.088302e-01
                    EMPHPRONyes                SUBJORDERsemgram
                   8.171104e-01                    8.408957e-04
             NUMBERGRAMsingular                PERSONGRAMsecond
                   2.482428e-02                    2.541883e-01
                PERSONGRAMthird   AGREEMENTyes:NEGATIONnegative
                   1.149458e-02                    5.572663e-01
       AGREEMENTyes:EMPHPRONyes   AGREEMENTyes:SUBJORDERsemgram
                   1.909875e-01                    4.138569e-04
AGREEMENTyes:NUMBERGRAMsingular   AGREEMENTyes:PERSONGRAMsecond
                   1.846491e-01                    7.076692e-01
   AGREEMENTyes:PERSONGRAMthird                           -3|-2
                   1.094480e-01                    1.360175e-13
                          -2|-1                            -1|0
                   4.859041e-03                    9.507981e-01
                            0|1                             1|2
                   2.449531e-02                    7.066814e-06
                            2|3
                   5.382075e-23 

Also, let’s quickly check the model’s overall significance value and see which predictors may get dropped:

anova(update(m.01, ~ 1), # compare the null model generated here ad hoc
   m.01, test="Chisq")   # to the model w/ the predictor using a LRT
Likelihood ratio tests of ordinal regression models

Response: JUDGMENT
                                                                        Model
1                                                                           1
2 1 + AGREEMENT * (NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM)
  Resid. df Resid. Dev   Test    Df LR stat. Pr(Chi)
1      4294   15669.11
2      4281   15099.88 1 vs 2    13 569.2355       0
drop1(m.01,      # drop each droppable predictor at a time from m.01 &
   test="Chisq") # test its significance w/ a LRT
Single term deletions

Model:
JUDGMENT ~ 1 + AGREEMENT * (NEGATION + EMPHPRON + SUBJORDER +
    NUMBERGRAM + PERSONGRAM)
                     Df   AIC     LRT Pr(>Chi)
<none>                  15138
AGREEMENT:NEGATION    1 15136  0.3444  0.55727
AGREEMENT:EMPHPRON    1 15138  1.7102  0.19095
AGREEMENT:SUBJORDER   1 15148 12.4862  0.00041 ***
AGREEMENT:NUMBERGRAM  1 15138  1.7590  0.18474
AGREEMENT:PERSONGRAM  2 15137  3.4406  0.17901
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 2nd model with the interaction AGREEMENT:NEGATION deleted:

summary(m.02 <- update(   # make m.02 an update of
   m.01, .~.              # m.01, namely all of it (.~.), but then
   - AGREEMENT:NEGATION)) # remove this interaction
Call:
polr(formula = JUDGMENT ~ AGREEMENT + NEGATION + EMPHPRON + SUBJORDER +
    NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER +
    AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM, data = d, na.action = na.exclude,
    Hess = TRUE)

Coefficients:
                                   Value Std. Error t value
AGREEMENTyes                     1.03982    0.22386  4.6450
NEGATIONnegative                -0.04691    0.05489 -0.8545
EMPHPRONyes                      0.01919    0.07868  0.2439
SUBJORDERsemgram                 0.26067    0.07832  3.3283
NUMBERGRAMsingular              -0.29084    0.12836 -2.2658
PERSONGRAMsecond                 0.11651    0.10639  1.0951
PERSONGRAMthird                  0.40454    0.16209  2.4958
AGREEMENTyes:EMPHPRONyes        -0.14501    0.11019 -1.3160
AGREEMENTyes:SUBJORDERsemgram    0.39248    0.11080  3.5423
AGREEMENTyes:NUMBERGRAMsingular  0.24393    0.18521  1.3171
AGREEMENTyes:PERSONGRAMsecond   -0.05529    0.15446 -0.3579
AGREEMENTyes:PERSONGRAMthird    -0.36913    0.23059 -1.6008

Intercepts:
      Value   Std. Error t value
-3|-2 -1.2119  0.1641    -7.3827
-2|-1 -0.4494  0.1624    -2.7674
-1|0   0.0210  0.1622     0.1295
0|1    0.3783  0.1623     2.3310
1|2    0.7462  0.1627     4.5874
2|3    1.6425  0.1642    10.0033

Residual Deviance: 15100.22
AIC: 15136.22 
anova(m.01, m.02, # compare m.01 to m.02
   test="Chisq")  # using a LRT
Likelihood ratio tests of ordinal regression models

Response: JUDGMENT
                                                                                                                                                           Model
1 AGREEMENT + NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
2                                                                                    1 + AGREEMENT * (NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM)
  Resid. df Resid. Dev   Test    Df  LR stat.   Pr(Chi)
1      4282   15100.22
2      4281   15099.88 1 vs 2     1 0.3444451 0.5572746
drop1(m.02,      # drop each droppable predictor at a time from m.02 &
   test="Chisq") # test its significance w/ a LRT
Single term deletions

Model:
JUDGMENT ~ AGREEMENT + NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM +
    PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM +
    AGREEMENT:PERSONGRAM
                     Df   AIC     LRT  Pr(>Chi)
<none>                  15136
NEGATION              1 15135  0.7302 0.3928118
AGREEMENT:EMPHPRON    1 15136  1.7323 0.1881216
AGREEMENT:SUBJORDER   1 15147 12.5658 0.0003929 ***
AGREEMENT:NUMBERGRAM  1 15136  1.7338 0.1879251
AGREEMENT:PERSONGRAM  2 15136  3.4864 0.1749558
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 3rd model with the main effect NEGATION deleted:

summary(m.03 <- update( # make m.03 an update of
   m.02, .~.            # m.02, namely all of it (.~.), but then
   - NEGATION))         # remove this predictor
Call:
polr(formula = JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER +
    NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER +
    AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM, data = d, na.action = na.exclude,
    Hess = TRUE)

Coefficients:
                                   Value Std. Error t value
AGREEMENTyes                     1.02402    0.22306  4.5908
EMPHPRONyes                      0.02060    0.07866  0.2618
SUBJORDERsemgram                 0.25944    0.07831  3.3130
NUMBERGRAMsingular              -0.29455    0.12828 -2.2962
PERSONGRAMsecond                 0.10906    0.10604  1.0285
PERSONGRAMthird                  0.39612    0.16178  2.4485
AGREEMENTyes:EMPHPRONyes        -0.14650    0.11018 -1.3297
AGREEMENTyes:SUBJORDERsemgram    0.39322    0.11080  3.5490
AGREEMENTyes:NUMBERGRAMsingular  0.25412    0.18481  1.3751
AGREEMENTyes:PERSONGRAMsecond   -0.04408    0.15391 -0.2864
AGREEMENTyes:PERSONGRAMthird    -0.35195    0.22969 -1.5323

Intercepts:
      Value   Std. Error t value
-3|-2 -1.1959  0.1631    -7.3332
-2|-1 -0.4336  0.1613    -2.6873
-1|0   0.0368  0.1611     0.2283
0|1    0.3940  0.1612     2.4437
1|2    0.7619  0.1616     4.7141
2|3    1.6581  0.1632    10.1612

Residual Deviance: 15100.95
AIC: 15134.95 
anova(m.02, m.03, # compare m.02 to m.03
   test="Chisq")  # using a LRT
Likelihood ratio tests of ordinal regression models

Response: JUDGMENT
                                                                                                                                                           Model
1            AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
2 AGREEMENT + NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
  Resid. df Resid. Dev   Test    Df  LR stat.   Pr(Chi)
1      4283   15100.95
2      4282   15100.22 1 vs 2     1 0.7302203 0.3928118
drop1(m.03,      # drop each droppable predictor at a time from m.03 &
   test="Chisq") # test its significance w/ a LRT
Single term deletions

Model:
JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM +
    AGREEMENT:PERSONGRAM
                     Df   AIC     LRT Pr(>Chi)
<none>                  15135
AGREEMENT:EMPHPRON    1 15135  1.7682 0.183607
AGREEMENT:SUBJORDER   1 15146 12.6131 0.000383 ***
AGREEMENT:NUMBERGRAM  1 15135  1.8905 0.169147
AGREEMENT:PERSONGRAM  2 15134  3.3222 0.189929
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 4th model with the interaction AGREEMENT:PERSONGRAM deleted:

summary(m.04 <- update( # make m.04 an update of
   m.03, .~.            # m.03, namely all of it (.~.), but then
   - AGREEMENT:PERSONGRAM)) # remove this interaction
Call:
polr(formula = JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER +
    NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER +
    AGREEMENT:NUMBERGRAM, data = d, na.action = na.exclude, Hess = TRUE)

Coefficients:
                                   Value Std. Error t value
AGREEMENTyes                     0.76959    0.11295  6.8139
EMPHPRONyes                      0.01133    0.07850  0.1443
SUBJORDERsemgram                 0.26537    0.07813  3.3965
NUMBERGRAMsingular              -0.41025    0.10669 -3.8454
PERSONGRAMsecond                 0.08777    0.07689  1.1415
PERSONGRAMthird                  0.21723    0.11466  1.8946
AGREEMENTyes:EMPHPRONyes        -0.13700    0.11004 -1.2449
AGREEMENTyes:SUBJORDERsemgram    0.38909    0.11063  3.5171
AGREEMENTyes:NUMBERGRAMsingular  0.48785    0.11089  4.3995

Intercepts:
      Value   Std. Error t value
-3|-2 -1.3213  0.1329    -9.9452
-2|-1 -0.5594  0.1305    -4.2860
-1|0  -0.0895  0.1301    -0.6880
0|1    0.2674  0.1301     2.0556
1|2    0.6350  0.1304     4.8681
2|3    1.5310  0.1323    11.5759

Residual Deviance: 15104.28
AIC: 15134.28 
anova(m.03, m.04, # compare m.03 to m.04
   test="Chisq")  # using a LRT
Likelihood ratio tests of ordinal regression models

Response: JUDGMENT
                                                                                                                                                Model
1                        AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
2 AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
  Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
1      4285   15104.28
2      4283   15100.95 1 vs 2     2 3.322208 0.1899291
drop1(m.04,      # drop each droppable predictor at a time from m.04 &
   test="Chisq") # test its significance w/ a LRT
Single term deletions

Model:
JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
                     Df   AIC     LRT  Pr(>Chi)
<none>                  15134
PERSONGRAM            2 15134  3.6143 0.1641241
AGREEMENT:EMPHPRON    1 15134  1.5498 0.2131632
AGREEMENT:SUBJORDER   1 15145 12.3871 0.0004323 ***
AGREEMENT:NUMBERGRAM  1 15152 19.3703 1.077e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 5th model with the interaction AGREEMENT:EMPHPRON deleted:

summary(m.05 <- update(   # make m.05 an update of
   m.04, .~.              # m.04, namely all of it (.~.), but then
   - AGREEMENT:EMPHPRON)) # remove this interaction
Call:
polr(formula = JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER +
    NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM,
    data = d, na.action = na.exclude, Hess = TRUE)

Coefficients:
                                   Value Std. Error t value
AGREEMENTyes                     0.69713    0.09674   7.206
EMPHPRONyes                     -0.05837    0.05502  -1.061
SUBJORDERsemgram                 0.26392    0.07812   3.378
NUMBERGRAMsingular              -0.42288    0.10623  -3.981
PERSONGRAMsecond                 0.08768    0.07688   1.141
PERSONGRAMthird                  0.21233    0.11460   1.853
AGREEMENTyes:SUBJORDERsemgram    0.39157    0.11061   3.540
AGREEMENTyes:NUMBERGRAMsingular  0.49044    0.11087   4.424

Intercepts:
      Value    Std. Error t value
-3|-2  -1.3652   0.1282   -10.6526
-2|-1  -0.6033   0.1257    -4.7990
-1|0   -0.1334   0.1252    -1.0654
0|1     0.2235   0.1252     1.7846
1|2     0.5910   0.1256     4.7069
2|3     1.4865   0.1273    11.6748

Residual Deviance: 15105.83
AIC: 15133.83 
anova(m.04, m.05, # compare m.04 to m.05
   test="Chisq")  # using a LRT
Likelihood ratio tests of ordinal regression models

Response: JUDGMENT
                                                                                                                         Model
1                      AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
2 AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
  Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
1      4286   15105.83
2      4285   15104.28 1 vs 2     1 1.549812 0.2131632
drop1(m.05,      # drop each droppable predictor at a time from m.05 &
   test="Chisq") # test its significance w/ a LRT
Single term deletions

Model:
JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
                     Df   AIC     LRT  Pr(>Chi)
<none>                  15134
EMPHPRON              1 15133  1.1264 0.2885398
PERSONGRAM            2 15133  3.4489 0.1782729
AGREEMENT:SUBJORDER   1 15144 12.5500 0.0003962 ***
AGREEMENT:NUMBERGRAM  1 15151 19.5864 9.615e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 6th model with the main effect EMPHPRON deleted:

summary(m.06 <- update( # make m.06 an update of
   m.05, .~.            # m.05, namely all of it (.~.), but then
   - EMPHPRON))         # remove this predictor
Call:
polr(formula = JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM +
    PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM,
    data = d, na.action = na.exclude, Hess = TRUE)

Coefficients:
                                   Value Std. Error t value
AGREEMENTyes                     0.70368    0.09656   7.288
SUBJORDERsemgram                 0.26511    0.07811   3.394
NUMBERGRAMsingular              -0.41126    0.10563  -3.893
PERSONGRAMsecond                 0.08941    0.07685   1.163
PERSONGRAMthird                  0.21791    0.11446   1.904
AGREEMENTyes:SUBJORDERsemgram    0.39147    0.11060   3.539
AGREEMENTyes:NUMBERGRAMsingular  0.47677    0.11012   4.330

Intercepts:
      Value    Std. Error t value
-3|-2  -1.3264   0.1228   -10.8030
-2|-1  -0.5646   0.1203    -4.6947
-1|0   -0.0948   0.1198    -0.7917
0|1     0.2619   0.1198     2.1853
1|2     0.6293   0.1202     5.2342
2|3     1.5247   0.1222    12.4814

Residual Deviance: 15106.95
AIC: 15132.95 
anova(m.05, m.06, # compare m.05 to m.06
   test="Chisq")  # using a LRT
Likelihood ratio tests of ordinal regression models

Response: JUDGMENT
                                                                                                    Model
1            AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
2 AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
  Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
1      4287   15106.95
2      4286   15105.83 1 vs 2     1 1.126422 0.2885398
drop1(m.06,      # drop each droppable predictor at a time from m.06 &
   test="Chisq") # test its significance w/ a LRT
Single term deletions

Model:
JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
                     Df   AIC     LRT  Pr(>Chi)
<none>                  15133
PERSONGRAM            2 15133  3.6433 0.1617624
AGREEMENT:SUBJORDER   1 15144 12.5430 0.0003977 ***
AGREEMENT:NUMBERGRAM  1 15150 18.7609 1.482e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 7th model with the predictor PERSONGRAM deleted:

summary(m.07 <- update( # make m.07 an update of
   m.06, .~.            # m.06, namely all of it (.~.), but then
   - PERSONGRAM))       # remove this predictor
Call:
polr(formula = JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM +
    AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM, data = d, na.action = na.exclude,
    Hess = TRUE)

Coefficients:
                                  Value Std. Error t value
AGREEMENTyes                     0.7064    0.09623   7.341
SUBJORDERsemgram                 0.2666    0.07795   3.421
NUMBERGRAMsingular              -0.5460    0.07867  -6.941
AGREEMENTyes:SUBJORDERsemgram    0.3958    0.10971   3.608
AGREEMENTyes:NUMBERGRAMsingular  0.4667    0.11000   4.243

Intercepts:
      Value    Std. Error t value
-3|-2  -1.4980   0.0786   -19.0583
-2|-1  -0.7364   0.0743    -9.9046
-1|0   -0.2670   0.0732    -3.6463
0|1     0.0893   0.0730     1.2232
1|2     0.4562   0.0733     6.2265
2|3     1.3510   0.0760    17.7800

Residual Deviance: 15110.60
AIC: 15132.60 
anova(m.06, m.07, # compare m.06 to m.07
   test="Chisq")  # using a LRT
Likelihood ratio tests of ordinal regression models

Response: JUDGMENT
                                                                                         Model
1              AGREEMENT + SUBJORDER + NUMBERGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
2 AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
  Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
1      4289   15110.60
2      4287   15106.95 1 vs 2     2 3.643254 0.1617624
drop1(m.07,      # drop each droppable predictor at a time from m.07 &
   test="Chisq") # test its significance w/ a LRT
Single term deletions

Model:
JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + AGREEMENT:SUBJORDER +
    AGREEMENT:NUMBERGRAM
                     Df   AIC    LRT  Pr(>Chi)
<none>                  15133
AGREEMENT:SUBJORDER   1 15144 13.034 0.0003059 ***
AGREEMENT:NUMBERGRAM  1 15149 18.017  2.19e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

No, we can’t delete any more predictors, we’re done with model selection:

summary(m.final <- m.07); rm(m.07); invisible(gc())
Call:
polr(formula = JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM +
    AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM, data = d, na.action = na.exclude,
    Hess = TRUE)

Coefficients:
                                  Value Std. Error t value
AGREEMENTyes                     0.7064    0.09623   7.341
SUBJORDERsemgram                 0.2666    0.07795   3.421
NUMBERGRAMsingular              -0.5460    0.07867  -6.941
AGREEMENTyes:SUBJORDERsemgram    0.3958    0.10971   3.608
AGREEMENTyes:NUMBERGRAMsingular  0.4667    0.11000   4.243

Intercepts:
      Value    Std. Error t value
-3|-2  -1.4980   0.0786   -19.0583
-2|-1  -0.7364   0.0743    -9.9046
-1|0   -0.2670   0.0732    -3.6463
0|1     0.0893   0.0730     1.2232
1|2     0.4562   0.0733     6.2265
2|3     1.3510   0.0760    17.7800

Residual Deviance: 15110.60
AIC: 15132.60 
(m.final.tscores <- summary(m.final)$coefficients[,"t value"])
                   AGREEMENTyes                SUBJORDERsemgram
                       7.341353                        3.420638
             NUMBERGRAMsingular   AGREEMENTyes:SUBJORDERsemgram
                      -6.941128                        3.607670
AGREEMENTyes:NUMBERGRAMsingular                           -3|-2
                       4.242926                      -19.058294
                          -2|-1                            -1|0
                      -9.904564                       -3.646271
                            0|1                             1|2
                       1.223185                        6.226538
                            2|3
                      17.779967 
(m.final.pvalues <- 2*pnorm(abs(m.final.tscores), lower.tail=FALSE))
                   AGREEMENTyes                SUBJORDERsemgram
                   2.114448e-13                    6.247452e-04
             NUMBERGRAMsingular   AGREEMENTyes:SUBJORDERsemgram
                   3.889817e-12                    3.089594e-04
AGREEMENTyes:NUMBERGRAMsingular                           -3|-2
                   2.206240e-05                    5.607383e-81
                          -2|-1                            -1|0
                   3.977021e-23                    2.660737e-04
                            0|1                             1|2
                   2.212599e-01                    4.768550e-10
                            2|3
                   1.010396e-70 

Let’s also compute the overall p-value for this model:

anova(m.00, m.final, # compare the null model to m.final
   test="Chisq")     # using a LRT
Likelihood ratio tests of ordinal regression models

Response: JUDGMENT
                                                                            Model
1                                                                               1
2 AGREEMENT + SUBJORDER + NUMBERGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
  Resid. df Resid. Dev   Test    Df LR stat. Pr(Chi)
1      4294   15669.11
2      4289   15110.60 1 vs 2     5 558.5191       0
pchisq(              # compute the area under the chi-squared curve
   q=558.5191,       # for this chi-squared value
   df=5,             # at 5 df
   lower.tail=FALSE) # only using the right/upper tail/side
[1] 1.848523e-118

On to model quality: We make ‘predictions’ …

d <- cbind(d,       # add to d the columns resulting from
   predict(         # predicting
      m.final,      # from m.final
      type="prob")) # predicted probabilities
# give those columns better names
names(d)[9:15] <- paste0( # rename those columns to the result of pasting
   "PREDS.PP.",           # "PREDS.PP."
   names(d)[9:15])        # in front of the existing names
d <- cbind(d,         # add to d a column
   PREDS.CAT=predict( # called PREDS.CAT, the predictions
      m.final,        # from m.final, namely
      type="class"))  # categorical predictions
head(d) # check the result
  CASE JUDGMENT AGREEMENT    NEGATION EMPHPRON SUBJORDER NUMBERGRAM PERSONGRAM
1    1        3        no affirmative      yes   gramsem     plural      third
2    2       -3        no affirmative      yes   gramsem     plural      third
3    3        3        no affirmative      yes   gramsem     plural      third
4    4        3        no affirmative      yes   gramsem     plural      third
5    5       -3        no affirmative      yes   gramsem     plural      third
6    6        2        no affirmative      yes   gramsem     plural      third
  PREDS.PP.-3 PREDS.PP.-2 PREDS.PP.-1 PREDS.PP.0 PREDS.PP.1 PREDS.PP.2 PREDS.PP.3 PREDS.CAT
1   0.1827313   0.1410666    0.109845 0.08866082 0.08981659  0.1821742  0.2057054         3
2   0.1827313   0.1410666    0.109845 0.08866082 0.08981659  0.1821742  0.2057054         3
3   0.1827313   0.1410666    0.109845 0.08866082 0.08981659  0.1821742  0.2057054         3
4   0.1827313   0.1410666    0.109845 0.08866082 0.08981659  0.1821742  0.2057054         3
5   0.1827313   0.1410666    0.109845 0.08866082 0.08981659  0.1821742  0.2057054         3
6   0.1827313   0.1410666    0.109845 0.08866082 0.08981659  0.1821742  0.2057054         3

… and then we evaluate them (again, confusion matrices here are not as good as for the previous kinds of models):

(c.m <- table(           # confusion matrix: cross-tabulate
   "OBS"  =d$JUDGMENT,   # observed judgments in the rows
   "PREDS"=d$PREDS.CAT)) # predicted judgments in the columns
    PREDS
OBS    -3   -2   -1    0    1    2    3
  -3  301    0    0    0    0    0  319
  -2  191    0    0    0    0    0  289
  -1  136    0    0    0    0    0  251
  0    88    0    0    0    0    0  241
  1    93    0    0    0    0    0  262
  2   160    0    0    0    0    0  666
  3   182    0    0    0    0    0 1121
c(tapply(as.character(d$JUDGMENT)==d$PREDS.CAT, d$JUDGMENT, mean), # accuracies for ...
  "overall"=mean(as.character(d$JUDGMENT)==d$PREDS.CAT)) # ... each level & overall
       -3        -2        -1         0         1         2         3   overall
0.4854839 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.8603223 0.3306977 

That means, we’re getting 301+1121=1422 out of 4300 right, which is the accuracy of 0.3306977. But remember, this accuracy is not sensitive to how far off a prediction was: if the actual value is 2, then each of the 666 predictions of 3 is wrong but still much better than each of the 160 predictions of -3. Thus, let’s now see how our final model compares to our simulated baseline:

sum.of.preddiffs.from.obs <- "-"(
   as.numeric(d$PREDS.CAT),
   as.numeric(d$JUDGMENT)) %>% abs %>% sum
op <- par(mar=c(4,4,1,1))
sim.eval.plot.1()
abline(v=sum.of.preddiffs.from.obs, lty=2, col="green")

par(op)

In terms of significance, it does really well: the observed sum of absolute differences is so small it’s never observed in the random reshuffles:

sum(collector <= sum.of.preddiffs.from.obs)
[1] 0

At the same, the effect size is again not great. Our simulated baseline meant that, on average, we’d misestimate the right judgment by 2.49 points on our 7-point scale and now, with all our modeling efforts, we’re still off by an average of 2.16 steps, not exactly great:

median(collector) / 4300
[1] 2.49
sum.of.preddiffs.from.obs / 4300
[1] 2.155116

Which also means that our improvement is quite small, because it’s only a 13.45% improvement in rank steps away from the right judgment:

(2.49 - 2.155116) / 2.49
[1] 0.1344916

Plus, we need the model’s R2-values and compute both McFadden’s R2 and the more widely-used version of Nagelkerke’s R2:

R2s(m.final)
  McFadden R-squared Nagelkerke R-squared
          0.03564459           0.12507688 

Of course, if you paid attention to the book and worked through everything there, much of this you could have done with this:

summarize.polr(m.final)
$Coefficients
                                   Estimate  Std.Error        2.5%      97.5%          z     p2tailed
-3|-2                           -1.49795116 0.07859839 -1.65200117 -1.3439012 -19.058294 5.607383e-81
-2|-1                           -0.73637259 0.07434680 -0.88208964 -0.5906555  -9.904564 3.977021e-23
-1|0                            -0.26700342 0.07322644 -0.41052461 -0.1234822  -3.646271 2.660737e-04
0|1                              0.08927411 0.07298498 -0.05377382  0.2323221   1.223185 2.212599e-01
1|2                              0.45623362 0.07327244  0.31262227  0.5998450   6.226538 4.768550e-10
2|3                              1.35100907 0.07598490  1.20208140  1.4999367  17.779967 1.010396e-70
AGREEMENTyes                     0.70643359 0.09622662  0.51783289  0.8950343   7.341353 2.114448e-13
SUBJORDERsemgram                 0.26664660 0.07795231  0.11386288  0.4194303   3.420638 6.247452e-04
NUMBERGRAMsingular              -0.54604077 0.07866744 -0.70022612 -0.3918554  -6.941128 3.889817e-12
AGREEMENTyes:SUBJORDERsemgram    0.39580665 0.10971255  0.18077401  0.6108393   3.607670 3.089594e-04
AGREEMENTyes:NUMBERGRAMsingular  0.46670731 0.10999657  0.25111801  0.6822966   4.242926 2.206240e-05

$`Confusion matrix`
    PRED
OBS    -3   -2   -1    0    1    2    3
  -3  301    0    0    0    0    0  319
  -2  191    0    0    0    0    0  289
  -1  136    0    0    0    0    0  251
  0    88    0    0    0    0    0  241
  1    93    0    0    0    0    0  262
  2   160    0    0    0    0    0  666
  3   182    0    0    0    0    0 1121

$`Classification metrics`
        Baseline      Acc. for -3      Acc. for -2      Acc. for -1
       0.3030233        0.4854839        0.0000000        0.0000000
      Acc. for 0       Acc. for 1       Acc. for 2       Acc. for 3
       0.0000000        0.0000000        0.0000000        0.8603223
Acc. for overall          logloss
       0.3306977        1.7570460

$R2s
Deviance of null model Deviance of this model            McFadden R2         Nagelkerke R2
          1.566911e+04           1.511060e+04           3.564459e-02          1.250769e-01

$`Model significance test`
         LR statistic                    Df               P-value  Number of data points
         5.585191e+02          5.000000e+00         1.848534e-118           4.300000e+03

5 Visual interpretation

Now let’s visualize the results of the two interactions based on the predictions; we begin with AGREEMENT:SUBJORDER:

agsu.d <- data.frame(        # make agsu.d a data frame of
   agsu <- effect(           # the effect
      "AGREEMENT:SUBJORDER", # of AGREEMENT:SUBJORDER
      m.final))              # in m.final
agsu.d[,1:9] # show just the predicted probabilities
  AGREEMENT SUBJORDER   prob.X.3   prob.X.2   prob.X.1    prob.X0    prob.X1   prob.X2   prob.X3
1        no   gramsem 0.23067814 0.16037026 0.11556664 0.08791856 0.08457945 0.1590298 0.1618572
2       yes   gramsem 0.10323854 0.09455074 0.08497318 0.07743151 0.08810081 0.2170518 0.3346534
3        no   semgram 0.18677053 0.14292687 0.11054340 0.08874982 0.08947773 0.1801735 0.2013581
4       yes   semgram 0.05603033 0.05675284 0.05614353 0.05603617 0.07028809 0.2109386 0.4938105
plot(agsu,                      # plot the effect of AGREEMENT:SUBJORDER
   type="probability",          # but plot predicted probabilities
   ylim=c(0, 1),                # w/ these y-axis limits
   grid=TRUE,                   # & w/ a grid
   lines=list(multiline=TRUE),  # make all lines be in one panel
   confint=list(style="auto"),  # make the lines come with confidence bands
   lattice=list(key.args=list(  # make the legend ...
      columns=5,                # ... have 5 columns &
      cex.title=0)))            # no title

plot(agsu,                      # plot the effect of AGREEMENT:SUBJORDER
   type="probability",          # but plot predicted probabilities
   ylim=c(0, 1),                # w/ these y-axis limits
   x.var="SUBJORDER",           # w/ this variable on the x-axis
   grid=TRUE,                   # & w/ a grid
   lines=list(multiline=TRUE),  # make all lines be in one panel
   confint=list(style="auto"),  # make the lines come with confidence bands
   lattice=list(key.args=list(  # make the legend ...
      columns=5,                # ... have 5 columns &
      cex.title=0)))            # no title

Then, the other interaction, AGREEMENT:NUMBERGRAM:

agnu.d <- data.frame(         # make agnu.d a data frame of
   agnu <- effect(            # the effect
      "AGREEMENT:NUMBERGRAM", # of AGREEMENT:NUMBERGRAM
      m.final))               # in m.final
agnu.d[,1:9] # show just the predicted probabilities
  AGREEMENT NUMBERGRAM   prob.X.3   prob.X.2   prob.X.1    prob.X0    prob.X1   prob.X2   prob.X3
1        no     plural 0.16385116 0.13176333 0.10596206 0.08776934 0.09103486 0.1915265 0.2280928
2       yes     plural 0.07364045 0.07184102 0.06849506 0.06594663 0.07949663 0.2191469 0.4214333
3        no   singular 0.25278617 0.16734648 0.11658620 0.08654335 0.08156201 0.1490296 0.1461462
4       yes   singular 0.07923881 0.07638507 0.07199828 0.06856743 0.08169195 0.2198990 0.4022195
plot(agnu,                      # plot the effect of AGREEMENT:NUMBERGRAM
   type="probability",          # but plot predicted probabilities
   ylim=c(0, 1),                # w/ these y-axis limits
   grid=TRUE,                   # & w/ a grid
   lines=list(multiline=TRUE),  # make all lines be in one panel
   confint=list(style="auto"),  # make the lines come with confidence bands
   lattice=list(key.args=list(  # make the legend ...
      columns=5,                # ... have 5 columns &
      cex.title=0)))            # no title

plot(agnu,                      # plot the effect of AGREEMENT:NUMBERGRAM
   type="probability",          # but plot predicted probabilities
   ylim=c(0, 1),                # w/ these y-axis limits
   x.var="NUMBERGRAM",          # w/ this variable on the x-axis
   grid=TRUE,                   # & w/ a grid
   lines=list(multiline=TRUE),  # make all lines be in one panel
   confint=list(style="auto"),  # make the lines come with confidence bands
   lattice=list(key.args=list(  # make the legend ...
      columns=5,                # ... have 5 columns &
      cex.title=0)))            # no title

6 Write-up

To determine whether the rating of a Spanish construction involving the semantic role of an experiencer is predictable from

  • AGREEMENT, NEGATION, EMPHPRON, SUBJORDER, NUMBERGRAM, and PERSONGRAM;
  • any interaction of AGREEMENT with any of the other predictors,

an ordinal model selection process was undertaken (using backwards stepwise model selection of all predictors and controls based on significance testing). The final model resulting from the elimination of ns predictors contained the interactions AGREEMENT:SUBJORDER and AGREEMENT:NUMBERGRAM and was highly significant (LR=558.52, df=5, p<0.0001) with a very low amount of explanatory/predictive power (McFadden’s R2=0.036, Nagelkerke’s R2=0.125); the overall accuracy of the model (in terms of making the one exactly correct prediction for each case) is 33.07%, but the model really only predicts the highest rating (86% accuracy/recall) and the lowest rating (48.5% accuracy/recall). The summary table for the model is shown below:

Estimate 95%-CI se z p2-tailed
Intercept-3→-2 -1.50 [-1.65, -1.34] 0.08 -19.06 <0.0001
Intercept-2→-1 -0.74 [-0.88, -0.59] 0.07 -9.90 <0.0001
Intercept-1→0 -0.27 [-0.41, -0.12] 0.07 -3.65 0.0003
Intercept0→1 0.09 [-0.05, 0.23] 0.07 1.22 0.2213
Intercept1→2 0.46 [0.31, 0.60] 0.07 6.23 <0.0001
Intercept2→3 1.35 [1.20, 1.50] 0.08 17.78 <0.0001
AGREEMENTno→yes 0.71 [0.52, 0.90] 0.10 7.34 <0.0001
SUBJORDERgramsem→semgram 0.27 [0.11, 0.42] 0.08 3.42 0.0006
NUMBERGRAMplural→sing -0.55 [-0.70, -0.39] 0.08 -6.94 <0.0001
AGREEMENTno→yes:SUBJORDERgramsem→semgram 0.40 [0.18, 0.61] 0.11 3.61 0.0003
AGREEMENTno→yes:NUMBERGRAMplural→sing 0.47 [0.25, 0.68] 0.11 4.24 <0.0001

The way the two interactions in the model are correlated with JUDGMENT are as follows [show plots]:

  • with regard to AGREEMENT:SUBJORDER,
    • when AGREEMENT is no, all levels of JUDGMENT have relatively low predicted probabilities, but
      • when SUBJORDER is gramsem, JUDGMENT is predicted to be -3 (with a predicted probability of 0.231);
      • when SUBJORDER is semgram, JUDGMENT is predicted to be +3 (with a predicted probability of 0.201);
    • when AGREEMENT is yes, most levels of JUDGMENT have relatively low predicted probabilities, but
      • the model predicts 3, followed by 2 regardless of SUBJORDER, but
      • the predicted probability of JUDGMENT being 3 is much higher when SUBJORDER is semgram than when it is gramsem;
    • AGREEMENT being yes strongly boosts the predicted probability of JUDGMENT being 3;
  • with regard to AGREEMENT:NUMBERGRAM,
    • when AGREEMENT is no, all levels of JUDGMENT have relatively low predicted probabilities, but
      • when NUMBERGRAM is plural, JUDGMENT is predicted to be 3 (with a predicted probability of 0.228);
      • when NUMBERGRAM is singular, JUDGMENT is predicted to be -3 (with a predicted probability of 0.253);
    • when AGREEMENT is yes, most levels of JUDGMENT have relatively low predicted probabilities, but
      • the model predicts 3, followed by 2 regardless of NUMBERGRAM, but
      • AGREEMENT being yes strongly boosts the predicted probability of JUDGMENT being 3.

7 Excursus: could we have run a linear model?

Many people run linear models in cases where, strictly speaking, one should use ordinal models (because, like here, the response variable is an ordinal ranking), and they do that most likely because (i) like me, they hate ordinal models for their complexity and/or (ii) they don’t know ordinal models (well (enough)). Thus, the question might arise whether we could have done the same here and fit a much more straightforward linear model – let’s find out in two steps: in this session, we go through the model selection process; in the next session, we look at model diagnostics.

Let’s do some linear modeling:

d$JUDGMENT <- as.numeric(as.character(d$JUDGMENT)) # or as.numeric(d$JUDGMENT)-4
summary(m.01.lin <- lm(   # summarize m.01.lin, which models
   JUDGMENT ~ 1 +         # JUDGMENT ~ an overall intercept (1)
   # AGREEMENT interacting w/ all other predictors
   AGREEMENT * (NEGATION+EMPHPRON+SUBJORDER+NUMBERGRAM+PERSONGRAM),
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data

Call:
lm(formula = JUDGMENT ~ 1 + AGREEMENT * (NEGATION + EMPHPRON +
    SUBJORDER + NUMBERGRAM + PERSONGRAM), data = d, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max
-4.8290 -1.8116  0.4351  1.5707  3.6784

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                     -0.212296   0.193797  -1.095 0.273377
AGREEMENTyes                     1.230055   0.271536   4.530 6.06e-06 ***
NEGATIONnegative                -0.101803   0.092732  -1.098 0.272346
EMPHPRONyes                     -0.003634   0.093477  -0.039 0.968990
SUBJORDERsemgram                 0.323943   0.092946   3.485 0.000497 ***
NUMBERGRAMsingular              -0.360704   0.153352  -2.352 0.018711 *
PERSONGRAMsecond                 0.171865   0.126062   1.363 0.172848
PERSONGRAMthird                  0.534925   0.193091   2.770 0.005624 **
AGREEMENTyes:NEGATIONnegative    0.122107   0.128877   0.947 0.343451
AGREEMENTyes:EMPHPRONyes        -0.132033   0.129462  -1.020 0.307851
AGREEMENTyes:SUBJORDERsemgram    0.315881   0.129826   2.433 0.015010 *
AGREEMENTyes:NUMBERGRAMsingular  0.268044   0.219243   1.223 0.221554
AGREEMENTyes:PERSONGRAMsecond   -0.020778   0.180547  -0.115 0.908382
AGREEMENTyes:PERSONGRAMthird    -0.490037   0.271843  -1.803 0.071514 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.101 on 4286 degrees of freedom
Multiple R-squared:  0.1229,    Adjusted R-squared:  0.1202
F-statistic: 46.19 on 13 and 4286 DF,  p-value: < 2.2e-16
drop1(m.01.lin, # drop each droppable predictor at a time from m.01.lin &
   test="F")    # test its significance w/ an F-test
Single term deletions

Model:
JUDGMENT ~ 1 + AGREEMENT * (NEGATION + EMPHPRON + SUBJORDER +
    NUMBERGRAM + PERSONGRAM)
                     Df Sum of Sq   RSS    AIC F value  Pr(>F)
<none>                            18916 6398.1
AGREEMENT:NEGATION    1    3.9620 18920 6397.0  0.8977 0.34345
AGREEMENT:EMPHPRON    1    4.5906 18921 6397.1  1.0401 0.30785
AGREEMENT:SUBJORDER   1   26.1284 18942 6402.0  5.9201 0.01501 *
AGREEMENT:NUMBERGRAM  1    6.5970 18923 6397.6  1.4947 0.22155
AGREEMENT:PERSONGRAM  2   23.5046 18940 6399.4  2.6628 0.06987 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 2nd model with the interaction AGREEMENT:NEGATION deleted:

summary(m.02.lin <- update( # make m.02.lin an update of
   m.01.lin, .~.            # m.01.lin, namely all of it (.~.), but then
   - AGREEMENT:NEGATION)) # remove this interaction

Call:
lm(formula = JUDGMENT ~ AGREEMENT + NEGATION + EMPHPRON + SUBJORDER +
    NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER +
    AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM, data = d, na.action = na.exclude)

Residuals:
   Min     1Q Median     3Q    Max
-4.805 -1.826  0.442  1.577  3.641

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                     -0.23466    0.19235  -1.220 0.222556
AGREEMENTyes                     1.29157    0.26366   4.899    1e-06 ***
NEGATIONnegative                -0.03858    0.06440  -0.599 0.549109
EMPHPRONyes                     -0.00293    0.09347  -0.031 0.974991
SUBJORDERsemgram                 0.32283    0.09294   3.474 0.000519 ***
NUMBERGRAMsingular              -0.36495    0.15328  -2.381 0.017315 *
PERSONGRAMsecond                 0.16318    0.12573   1.298 0.194392
PERSONGRAMthird                  0.52475    0.19279   2.722 0.006518 **
AGREEMENTyes:EMPHPRONyes        -0.13258    0.12946  -1.024 0.305825
AGREEMENTyes:SUBJORDERsemgram    0.31769    0.12981   2.447 0.014432 *
AGREEMENTyes:NUMBERGRAMsingular  0.26414    0.21920   1.205 0.228274
AGREEMENTyes:PERSONGRAMsecond   -0.01663    0.18049  -0.092 0.926581
AGREEMENTyes:PERSONGRAMthird    -0.49109    0.27184  -1.807 0.070903 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.101 on 4287 degrees of freedom
Multiple R-squared:  0.1227,    Adjusted R-squared:  0.1202
F-statistic: 49.96 on 12 and 4287 DF,  p-value: < 2.2e-16
anova(m.01.lin, m.02.lin, # compare m.01.lin to m.02.lin
   test="F")              # using an F-test
Analysis of Variance Table

Model 1: JUDGMENT ~ 1 + AGREEMENT * (NEGATION + EMPHPRON + SUBJORDER +
    NUMBERGRAM + PERSONGRAM)
Model 2: JUDGMENT ~ AGREEMENT + NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM +
    PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM +
    AGREEMENT:PERSONGRAM
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1   4286 18916
2   4287 18920 -1    -3.962 0.8977 0.3435
drop1(m.02.lin, # drop each droppable predictor at a time from m.02.lin &
   test="F")    # test its significance w/ an F-test
Single term deletions

Model:
JUDGMENT ~ AGREEMENT + NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM +
    PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM +
    AGREEMENT:PERSONGRAM
                     Df Sum of Sq   RSS    AIC F value  Pr(>F)
<none>                            18920 6397.0
NEGATION              1    1.5843 18922 6395.3  0.3590 0.54911
AGREEMENT:EMPHPRON    1    4.6291 18925 6396.0  1.0489 0.30582
AGREEMENT:SUBJORDER   1   26.4337 18947 6401.0  5.9894 0.01443 *
AGREEMENT:NUMBERGRAM  1    6.4083 18927 6396.4  1.4520 0.22827
AGREEMENT:PERSONGRAM  2   24.0208 18944 6398.4  2.7213 0.06590 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 3rd model with the main effect NEGATION deleted:

summary(m.03.lin <- update( # make m.03.lin an update of
   m.02.lin, .~.            # m.02.lin, namely all of it (.~.), but then
   - NEGATION))             # remove this main effect

Call:
lm(formula = JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM +
    PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM +
    AGREEMENT:PERSONGRAM, data = d, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max
-4.8208 -1.8208  0.4241  1.5598  3.6183

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                     -0.248304   0.190984  -1.300 0.193628
AGREEMENTyes                     1.279564   0.262874   4.868 1.17e-06 ***
EMPHPRONyes                     -0.002501   0.093463  -0.027 0.978653
SUBJORDERsemgram                 0.322154   0.092924   3.467 0.000532 ***
NUMBERGRAMsingular              -0.367543   0.153212  -2.399 0.016486 *
PERSONGRAMsecond                 0.157881   0.125406   1.259 0.208115
PERSONGRAMthird                  0.518533   0.192496   2.694 0.007093 **
AGREEMENTyes:EMPHPRONyes        -0.133114   0.129446  -1.028 0.303851
AGREEMENTyes:SUBJORDERsemgram    0.317910   0.129800   2.449 0.014356 *
AGREEMENTyes:NUMBERGRAMsingular  0.272071   0.218785   1.244 0.213732
AGREEMENTyes:PERSONGRAMsecond   -0.008359   0.179949  -0.046 0.962953
AGREEMENTyes:PERSONGRAMthird    -0.477517   0.270872  -1.763 0.077991 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.101 on 4288 degrees of freedom
Multiple R-squared:  0.1226,    Adjusted R-squared:  0.1204
F-statistic: 54.48 on 11 and 4288 DF,  p-value: < 2.2e-16
anova(m.02.lin, m.03.lin, # compare m.02.lin to m.03.lin
   test="F")              # using an F-test
Analysis of Variance Table

Model 1: JUDGMENT ~ AGREEMENT + NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM +
    PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM +
    AGREEMENT:PERSONGRAM
Model 2: JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM +
    AGREEMENT:PERSONGRAM
  Res.Df   RSS Df Sum of Sq     F Pr(>F)
1   4287 18920
2   4288 18922 -1   -1.5843 0.359 0.5491
drop1(m.03.lin, # drop each droppable predictor at a time from m.03.lin &
   test="F")    # test its significance w/ an F-test
Single term deletions

Model:
JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM +
    AGREEMENT:PERSONGRAM
                     Df Sum of Sq   RSS    AIC F value  Pr(>F)
<none>                            18922 6395.3
AGREEMENT:EMPHPRON    1    4.6664 18927 6394.4  1.0575 0.30385
AGREEMENT:SUBJORDER   1   26.4711 18948 6399.3  5.9988 0.01436 *
AGREEMENT:NUMBERGRAM  1    6.8240 18929 6394.9  1.5464 0.21373
AGREEMENT:PERSONGRAM  2   23.5069 18945 6396.7  2.6635 0.06982 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 4th model with the interaction AGREEMENT:EMPHPRON deleted:

summary(m.04.lin <- update( # make m.04.lin an update of
   m.03.lin, .~.            # m.03.lin, namely all of it (.~.), but then
   - AGREEMENT:EMPHPRON))   # remove this interaction

Call:
lm(formula = JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM +
    PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM +
    AGREEMENT:PERSONGRAM, data = d, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max
-4.7927 -1.8276  0.4417  1.5314  3.6569

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                     -0.200926   0.185345  -1.084 0.278398
AGREEMENTyes                     1.201596   0.251704   4.774 1.87e-06 ***
EMPHPRONyes                     -0.071895   0.064665  -1.112 0.266278
SUBJORDERsemgram                 0.322319   0.092925   3.469 0.000528 ***
NUMBERGRAMsingular              -0.384056   0.152369  -2.521 0.011753 *
PERSONGRAMsecond                 0.156027   0.125394   1.244 0.213461
PERSONGRAMthird                  0.508766   0.192263   2.646 0.008170 **
AGREEMENTyes:SUBJORDERsemgram    0.318659   0.129799   2.455 0.014127 *
AGREEMENTyes:NUMBERGRAMsingular  0.282862   0.218535   1.294 0.195612
AGREEMENTyes:PERSONGRAMsecond   -0.005005   0.179921  -0.028 0.977810
AGREEMENTyes:PERSONGRAMthird    -0.466902   0.270677  -1.725 0.084610 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.101 on 4289 degrees of freedom
Multiple R-squared:  0.1224,    Adjusted R-squared:  0.1204
F-statistic: 59.82 on 10 and 4289 DF,  p-value: < 2.2e-16
anova(m.03.lin, m.04.lin, # compare m.03.lin to m.04.lin
   test="F")              # using an F-test
Analysis of Variance Table

Model 1: JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM +
    AGREEMENT:PERSONGRAM
Model 2: JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1   4288 18922
2   4289 18927 -1   -4.6664 1.0575 0.3039
drop1(m.04.lin, # drop each droppable predictor at a time from m.04.lin &
   test="F")    # test its significance w/ an F-test
Single term deletions

Model:
JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
                     Df Sum of Sq   RSS    AIC F value  Pr(>F)
<none>                            18927 6394.4
EMPHPRON              1    5.4549 18932 6393.6  1.2361 0.26628
AGREEMENT:SUBJORDER   1   26.5967 18953 6398.4  6.0271 0.01413 *
AGREEMENT:NUMBERGRAM  1    7.3930 18934 6394.1  1.6754 0.19561
AGREEMENT:PERSONGRAM  2   22.8054 18949 6395.6  2.5840 0.07559 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 5th model with the main effect EMPHPRON deleted:

summary(m.05.lin <- update( # make m.05.lin an update of
   m.04.lin, .~.            # m.04.lin, namely all of it (.~.), but then
   - EMPHPRON))             # remove this main effect

Call:
lm(formula = JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM +
    PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM +
    AGREEMENT:PERSONGRAM, data = d, na.action = na.exclude)

Residuals:
   Min     1Q Median     3Q    Max
-4.761 -1.859  0.459  1.500  3.617

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                     -0.250011   0.180015  -1.389 0.164955
AGREEMENTyes                     1.216166   0.251369   4.838 1.36e-06 ***
SUBJORDERsemgram                 0.322148   0.092927   3.467 0.000532 ***
NUMBERGRAMsingular              -0.366948   0.151594  -2.421 0.015536 *
PERSONGRAMsecond                 0.157948   0.125386   1.260 0.207847
PERSONGRAMthird                  0.518885   0.192053   2.702 0.006924 **
AGREEMENTyes:SUBJORDERsemgram    0.319862   0.129798   2.464 0.013767 *
AGREEMENTyes:NUMBERGRAMsingular  0.259298   0.217511   1.192 0.233282
AGREEMENTyes:PERSONGRAMsecond   -0.005234   0.179925  -0.029 0.976797
AGREEMENTyes:PERSONGRAMthird    -0.476064   0.270559  -1.760 0.078554 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.101 on 4290 degrees of freedom
Multiple R-squared:  0.1221,    Adjusted R-squared:  0.1203
F-statistic: 66.33 on 9 and 4290 DF,  p-value: < 2.2e-16
anova(m.04.lin, m.05.lin, # compare m.04.lin to m.05.lin
   test="F")              # using an F-test
Analysis of Variance Table

Model 1: JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
Model 2: JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1   4289 18927
2   4290 18932 -1   -5.4549 1.2361 0.2663
drop1(m.05.lin, # drop each droppable predictor at a time from m.05.lin &
   test="F")    # test its significance w/ an F-test
Single term deletions

Model:
JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
                     Df Sum of Sq   RSS    AIC F value  Pr(>F)
<none>                            18932 6393.6
AGREEMENT:SUBJORDER   1   26.7998 18959 6397.7  6.0728 0.01377 *
AGREEMENT:NUMBERGRAM  1    6.2716 18938 6393.1  1.4211 0.23328
AGREEMENT:PERSONGRAM  2   23.7335 18956 6395.0  2.6890 0.06806 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 6th model with the interaction AGREEMENT:AGREEMENT:NUMBERGRAM deleted:

summary(m.06.lin <- update( # make m.06.lin an update of
   m.05.lin, .~.            # m.05.lin, namely all of it (.~.), but then
   - AGREEMENT:NUMBERGRAM)) # remove this interaction

Call:
lm(formula = JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM +
    PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:PERSONGRAM,
    data = d, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max
-4.8413 -1.8413  0.4113  1.4952  3.6185

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                   -0.37752    0.14480  -2.607 0.009160 **
AGREEMENTyes                   1.47360    0.12867  11.453  < 2e-16 ***
SUBJORDERsemgram               0.32528    0.09289   3.502 0.000467 ***
NUMBERGRAMsingular            -0.24100    0.10872  -2.217 0.026693 *
PERSONGRAMsecond               0.20719    0.11839   1.750 0.080177 .
PERSONGRAMthird                0.64459    0.16053   4.015 6.03e-05 ***
AGREEMENTyes:SUBJORDERsemgram  0.32447    0.12975   2.501 0.012427 *
AGREEMENTyes:PERSONGRAMsecond -0.11173    0.15619  -0.715 0.474410
AGREEMENTyes:PERSONGRAMthird  -0.73507    0.16125  -4.559 5.29e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.101 on 4291 degrees of freedom
Multiple R-squared:  0.1219,    Adjusted R-squared:  0.1202
F-statistic: 74.43 on 8 and 4291 DF,  p-value: < 2.2e-16
anova(m.05.lin, m.06.lin, # compare m.05.lin to m.06.lin
   test="F")              # using an F-test
Analysis of Variance Table

Model 1: JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
Model 2: JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:SUBJORDER + AGREEMENT:PERSONGRAM
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1   4290 18932
2   4291 18938 -1   -6.2716 1.4211 0.2333
drop1(m.06.lin, # drop each droppable predictor at a time from m.06.lin &
   test="F")    # test its significance w/ an F-test
Single term deletions

Model:
JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM +
    AGREEMENT:SUBJORDER + AGREEMENT:PERSONGRAM
                     Df Sum of Sq   RSS    AIC F value    Pr(>F)
<none>                            18938 6393.1
NUMBERGRAM            1    21.688 18960 6396.0  4.9140   0.02669 *
AGREEMENT:SUBJORDER   1    27.603 18966 6397.3  6.2542   0.01243 *
AGREEMENT:PERSONGRAM  2   106.819 19045 6413.2 12.1014 5.744e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now, we can’t delete any more predictors, we’re done with model selection:

summary(m.final.lin <- m.06.lin); rm(m.06.lin); invisible(gc())

Call:
lm(formula = JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM +
    PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:PERSONGRAM,
    data = d, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max
-4.8413 -1.8413  0.4113  1.4952  3.6185

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                   -0.37752    0.14480  -2.607 0.009160 **
AGREEMENTyes                   1.47360    0.12867  11.453  < 2e-16 ***
SUBJORDERsemgram               0.32528    0.09289   3.502 0.000467 ***
NUMBERGRAMsingular            -0.24100    0.10872  -2.217 0.026693 *
PERSONGRAMsecond               0.20719    0.11839   1.750 0.080177 .
PERSONGRAMthird                0.64459    0.16053   4.015 6.03e-05 ***
AGREEMENTyes:SUBJORDERsemgram  0.32447    0.12975   2.501 0.012427 *
AGREEMENTyes:PERSONGRAMsecond -0.11173    0.15619  -0.715 0.474410
AGREEMENTyes:PERSONGRAMthird  -0.73507    0.16125  -4.559 5.29e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.101 on 4291 degrees of freedom
Multiple R-squared:  0.1219,    Adjusted R-squared:  0.1202
F-statistic: 74.43 on 8 and 4291 DF,  p-value: < 2.2e-16

So, how do the models compare?

Predictor Final ordinal model Final linear model Present & directly comparable
AGREEMENT yes yes
NEGATION
EMPHPRON
SUBJORDER yes yes
NUMBERGRAM yes
PERSONGRAM yes
AGREEMENT:NEGATION
AGREEMENT:EMPHPRON
AGREEMENT:SUBJORDER yes yes yes
AGREEMENT:NUMBERGRAM yes
AGREEMENT:PERSONGRAM yes

So let’s make at least that one straightforward comparison:

plot(effect("AGREEMENT:SUBJORDER", m.final.lin), ylim=c(-3, 3), grid=TRUE)

plot(effect("AGREEMENT:SUBJORDER", m.final.lin), ylim=c(-3, 3), grid=TRUE, x.var="SUBJORDER")

With regard to this interaction at least, the the result from the final linear model is quite similar to the one in our final ordinal model:

  • when AGREEMENT is yes, then subjects rated the stimuli more highly than when AGREEMENT is no (as expected), but that difference is much higher when SUBJORDER is semgram than when it is gramsem;
  • from the reverse perspective, when SUBJORDER is semgram, then subjects rated the stimuli more highly than when SUBJORDER is gramsem, but that difference is much higher when AGREEMENT is yes than when it is no.

However, if we compare the R2s, and we need to compare the ones that can be compared, sort of, then there’s quite some difference:

  • McFadden’s R2 for the final ordinal model was 0.0356, i.e. very small;
  • (McFadden’s) R2 for the final linear model was 0.1219, i.e. small but considerably better.

So, the results are mixed, but not particularly encouraging in the sense of allowing us to go with the linear model. We’ll reach a more definitive verdict during the next session.

8 Homework

To prepare for next session, read SFLWR3, Section 5.6 & 5.7.

9 Session info

sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  compiler  methods
[8] base

other attached packages:
[1] MASS_7.3-65    effects_4.2-2  car_3.1-3      carData_3.0-5  STGmisc_1.02
[6] Rcpp_1.0.14    magrittr_2.0.3

loaded via a namespace (and not attached):
 [1] Matrix_1.7-3       jsonlite_2.0.0     survey_4.4-2       splines_4.5.0
 [5] boot_1.3-31        yaml_2.3.10        fastmap_1.2.0      lattice_0.22-7
 [9] Formula_1.2-5      knitr_1.50         rbibutils_2.3      htmlwidgets_1.6.4
[13] nloptr_2.2.1       insight_1.2.0      nnet_7.3-20        minqa_1.2.8
[17] DBI_1.2.3          rlang_1.1.6        xfun_0.52          estimability_1.5.1
[21] cli_3.6.5          Rdpack_2.6.4       digest_0.6.37      grid_4.5.0
[25] rstudioapi_0.17.1  lme4_1.1-37        nlme_3.1-168       reformulas_0.4.1
[29] evaluate_1.0.3     mitools_2.4        survival_3.8-3     abind_1.4-8
[33] colorspace_2.1-1   rmarkdown_2.29     tools_4.5.0        htmltools_0.5.8.1