Let’s load a data set for an example of an ordinal regression model; importantly, we make sure the response variable is an ordered factor:
rm(list=ls(all.names=TRUE))
library(car); library(effects); library(MASS)
source("helpers/202_R2.r"); source("helpers/202_summarize.polr.r") # load small helper functions
str(d <- read.delim( # summarize d, the result of loading
file="inputfiles/202_07_RATG.csv", # this file
stringsAsFactors=TRUE)) # change categorical variables into factors
## 'data.frame': 4300 obs. of 8 variables:
## $ CASE : int 1 2 3 4 5 6 7 8 9 10 ...
## $ JUDGMENT : int 3 -3 3 3 -3 2 -2 2 -3 3 ...
## $ AGREEMENT : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 2 ...
## $ NEGATION : Factor w/ 2 levels "affirmative",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ EMPHPRON : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ SUBJORDER : Factor w/ 2 levels "gramsem","semgram": 1 1 1 1 1 1 1 1 2 2 ...
## $ NUMBERGRAM: Factor w/ 2 levels "plural","singular": 1 1 1 1 1 1 1 1 1 1 ...
## $ PERSONGRAM: Factor w/ 3 levels "first","second",..: 3 3 3 3 3 3 3 3 3 3 ...
d$JUDGMENT <- factor(d$JUDGMENT, ordered=TRUE)
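As a quick aside, here is a minimal toy illustration (a hypothetical vector x, not part of the data) of what ordered=TRUE buys us: the factor’s levels acquire an ordering, and as.numeric returns each value’s rank on that scale, which is what the distance computations further below rely on:

```r
# toy ordered factor (hypothetical values on the same -3..3 scale as JUDGMENT)
x <- factor(c(-3, 0, 3, -3), levels=-3:3, ordered=TRUE)
is.ordered(x)  # TRUE: the levels are ordered -3 < -2 < ... < 3
as.numeric(x)  # the ranks on that 7-point scale: 1 4 7 1
```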
summary(d)
## CASE JUDGMENT AGREEMENT NEGATION EMPHPRON
## Min. : 1 -3: 620 no :2077 affirmative:2139 no :2133
## 1st Qu.:1137 -2: 480 yes:2223 negative :2161 yes:2167
## Median :2262 -1: 387
## Mean :2266 0 : 329
## 3rd Qu.:3394 1 : 355
## Max. :4528 2 : 826
## 3 :1303
## SUBJORDER NUMBERGRAM PERSONGRAM
## gramsem:2173 plural :1989 first :1381
## semgram:2127 singular:2311 second:1561
## third :1358
##
##
##
##
Here’s what the columns are/contain:

CASE
is a simple case/row number just so that each case has a unique ID;

JUDGMENT
is the response variable: it’s an ordinal rating variable with 7 levels centered around 0 (neutral);

AGREEMENT
is a binary predictor: it’s whether the experimental stimulus (a sentence) exhibited proper agreement between the subject and the verb or not: no (e.g., He do the dishes) vs. yes (e.g., He does the dishes);

NEGATION
is a binary predictor: it’s whether the experimental stimulus was affirmative (e.g., He does the dishes) vs. negative (e.g., He doesn’t do the dishes);

EMPHPRON
is a binary predictor: it’s whether the experimental stimulus contained an emphatic pronoun: no (e.g., He did the dishes) vs. yes (e.g., He, HE, did the dishes);

SUBJORDER
is a binary predictor: it’s whether the experimental stimulus has a grammatical subject before an experiencer subject or not: gramsem (e.g., Birds please me) vs. semgram (e.g., I like birds) (these are not great examples; the study was actually done on Spanish);

NUMBERGRAM
is a binary predictor: it’s whether the grammatical subject of the experimental stimulus was plural (e.g., Birds please me) or singular (e.g., This pleases me);

PERSONGRAM
is a ternary predictor: it’s whether the grammatical subject of the experimental stimulus was first person (e.g., I like birds) vs. second person (e.g., You like birds) vs. third person (e.g., He likes birds).

We want to determine whether the rating of a Spanish construction involving the semantic role of an experiencer is predictable from AGREEMENT, NEGATION, EMPHPRON, SUBJORDER, NUMBERGRAM, and PERSONGRAM, as well as from the interactions of AGREEMENT with any of the other predictors.

Let’s first compute a multinomial kind of baseline for what will be the response variable, JUDGMENT:
(baselines <- c(
"baseline 1"=max( # make baselines[1] the highest
prop.table( # proportion in the
table(d$JUDGMENT))), # frequency table of the response
"baseline 2"=sum( # make baselines[2] the sum of the
prop.table( # proportions in the
table(d$JUDGMENT))^2))) # frequency table of the response squared
## baseline 1 baseline 2
## 0.3030233 0.1827431
But this is actually not ideal here – why?
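Here is a minimal sketch (hypothetical toy values, not from d) of the problem: accuracy-style baselines treat the response as nominal, so a prediction that misses by one scale point is penalized exactly like one that misses by six:

```r
obs   <- c(3,  3)  # two hypothetical observed judgments
preds <- c(2, -3)  # two hypothetical predictions: a near miss vs. a far miss
obs == preds       # both count as plain misses: FALSE FALSE
abs(obs - preds)   # but their ordinal distances differ a lot: 1 6
```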
To address this, let’s compute a baseline that takes the ordinal nature of the response (and of the predictions) into consideration. We will use a very simple/crude simulation approach: we randomly reshuffle the response variable 1000 times and, for each reshuffling, compute how far the reshuffled values are from the actual data, where ‘distance’ is operationalized as the sum of the absolute differences between the observed and the reshuffled ranks. We then visualize those reshuffled sums of absolute deviations with a small histogram, to which I am adding (i) the median and the quantiles one might use as a 95% confidence interval and (ii) the sum that corresponds to baseline 1 (i.e., always predicting the most frequent judgment, 3):
collector <- numeric(1000); set.seed(1); for (i in seq(collector)) {
random.judgments <- sample(d$JUDGMENT)
collector[i] <- sum(abs("-"(
as.numeric(random.judgments),
as.numeric(d$JUDGMENT))))
}
hist(collector, xlim=c(9250, 11750), main="")
abline(v=quantile(collector, probs=c(0.025, 0.5, 0.975)), lty=2, col="red")
abline(v=sum(abs(as.numeric(d$JUDGMENT)-7)), lty=2, col="blue")
With this, we also update our vector baselines
:
baselines["simulated"] <- median(collector); baselines
## baseline 1 baseline 2 simulated
## 3.030233e-01 1.827431e-01 1.070700e+04
Let’s compute the deviance of the null model:
deviance(m.00 <- polr(JUDGMENT ~ 1, Hess=TRUE, data=d, na.action=na.exclude))
## [1] 15669.11
Some exploration of the relevant variables:
# the predictor(s)/response on its/their own
# just check the summary for the main effects
table(d$NEGATION, d$AGREEMENT)
##
## no yes
## affirmative 1049 1090
## negative 1028 1133
table(d$EMPHPRON, d$AGREEMENT)
##
## no yes
## no 1038 1095
## yes 1039 1128
table(d$SUBJORDER, d$AGREEMENT)
##
## no yes
## gramsem 1047 1126
## semgram 1030 1097
table(d$NUMBERGRAM, d$AGREEMENT)
##
## no yes
## plural 926 1063
## singular 1151 1160
table(d$PERSONGRAM, d$AGREEMENT)
##
## no yes
## first 657 724
## second 809 752
## third 611 747
# the predictor(s) w/ the response
table(d$AGREEMENT, d$JUDGMENT)
##
## -3 -2 -1 0 1 2 3
## no 463 304 225 158 190 344 393
## yes 157 176 162 171 165 482 910
ftable(d$NEGATION, d$AGREEMENT, d$JUDGMENT)
## -3 -2 -1 0 1 2 3
##
## affirmative no 231 146 114 86 96 176 200
## yes 80 94 75 78 83 224 456
## negative no 232 158 111 72 94 168 193
## yes 77 82 87 93 82 258 454
ftable(d$EMPHPRON, d$AGREEMENT, d$JUDGMENT)
## -3 -2 -1 0 1 2 3
##
## no no 240 149 113 78 99 171 188
## yes 74 82 66 91 75 242 465
## yes no 223 155 112 80 91 173 205
## yes 83 94 96 80 90 240 445
ftable(d$SUBJORDER, d$AGREEMENT, d$JUDGMENT)
## -3 -2 -1 0 1 2 3
##
## gramsem no 253 159 124 87 97 161 166
## yes 98 107 104 98 99 254 366
## semgram no 210 145 101 71 93 183 227
## yes 59 69 58 73 66 228 544
ftable(d$NUMBERGRAM, d$AGREEMENT, d$JUDGMENT)
## -3 -2 -1 0 1 2 3
##
## plural no 162 113 89 70 97 184 211
## yes 64 84 75 88 80 243 429
## singular no 301 191 136 88 93 160 182
## yes 93 92 87 83 85 239 481
ftable(d$PERSONGRAM, d$AGREEMENT, d$JUDGMENT)
## -3 -2 -1 0 1 2 3
##
## first no 173 112 85 50 51 81 105
## yes 64 64 61 52 50 141 292
## second no 190 122 86 65 71 139 136
## yes 41 51 47 62 62 173 316
## third no 100 70 54 43 68 124 152
## yes 52 61 54 57 53 168 302
Let’s fit our initial regression model:
summary(m.01 <- polr( # summarize m.01, which models
JUDGMENT ~ 1 + # JUDGMENT ~ an overall intercept (1)
# AGREEMENT interacting w/ all other predictors
AGREEMENT * (NEGATION+EMPHPRON+SUBJORDER+NUMBERGRAM+PERSONGRAM),
Hess=TRUE, # recommended for using summary
data=d, # those vars are in d
na.action=na.exclude)) # skip cases with NA/missing data
## Call:
## polr(formula = JUDGMENT ~ 1 + AGREEMENT * (NEGATION + EMPHPRON +
## SUBJORDER + NUMBERGRAM + PERSONGRAM), data = d, na.action = na.exclude,
## Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## AGREEMENTyes 1.00762 0.23047 4.3720
## NEGATIONnegative -0.07963 0.07825 -1.0177
## EMPHPRONyes 0.01820 0.07869 0.2313
## SUBJORDERsemgram 0.26155 0.07833 3.3390
## NUMBERGRAMsingular -0.28824 0.12844 -2.2441
## PERSONGRAMsecond 0.12173 0.10676 1.1402
## PERSONGRAMthird 0.41046 0.16241 2.5273
## AGREEMENTyes:NEGATIONnegative 0.06444 0.10980 0.5869
## AGREEMENTyes:EMPHPRONyes -0.14410 0.11020 -1.3077
## AGREEMENTyes:SUBJORDERsemgram 0.39130 0.11081 3.5311
## AGREEMENTyes:NUMBERGRAMsingular 0.24572 0.18523 1.3266
## AGREEMENTyes:PERSONGRAMsecond -0.05795 0.15453 -0.3750
## AGREEMENTyes:PERSONGRAMthird -0.36910 0.23059 -1.6007
##
## Intercepts:
## Value Std. Error t value
## -3|-2 -1.2230 0.1653 -7.4002
## -2|-1 -0.4605 0.1635 -2.8162
## -1|0 0.0101 0.1632 0.0617
## 0|1 0.3675 0.1634 2.2493
## 1|2 0.7353 0.1637 4.4917
## 2|3 1.6316 0.1652 9.8743
##
## Residual Deviance: 15099.88
## AIC: 15137.88
Let’s also compute the t-values for these coefficient values (however uninterpretable those are):
(m.01.tscores <- summary(m.01)$coefficients[,"t value"])
## AGREEMENTyes NEGATIONnegative
## 4.37198298 -1.01767960
## EMPHPRONyes SUBJORDERsemgram
## 0.23126312 3.33897026
## NUMBERGRAMsingular PERSONGRAMsecond
## -2.24412620 1.14023537
## PERSONGRAMthird AGREEMENTyes:NEGATIONnegative
## 2.52729216 0.58690684
## AGREEMENTyes:EMPHPRONyes AGREEMENTyes:SUBJORDERsemgram
## -1.30766349 3.53108615
## AGREEMENTyes:NUMBERGRAMsingular AGREEMENTyes:PERSONGRAMsecond
## 1.32657549 -0.37498824
## AGREEMENTyes:PERSONGRAMthird -3|-2
## -1.60067926 -7.40016289
## -2|-1 -1|0
## -2.81623304 0.06170461
## 0|1 1|2
## 2.24927021 4.49166614
## 2|3
## 9.87427499
And from that we compute the p-values:
(m.01.pvalues <- 2*pnorm(abs(m.01.tscores), lower.tail=FALSE))
## AGREEMENTyes NEGATIONnegative
## 1.231231e-05 3.088302e-01
## EMPHPRONyes SUBJORDERsemgram
## 8.171104e-01 8.408957e-04
## NUMBERGRAMsingular PERSONGRAMsecond
## 2.482428e-02 2.541883e-01
## PERSONGRAMthird AGREEMENTyes:NEGATIONnegative
## 1.149458e-02 5.572663e-01
## AGREEMENTyes:EMPHPRONyes AGREEMENTyes:SUBJORDERsemgram
## 1.909875e-01 4.138569e-04
## AGREEMENTyes:NUMBERGRAMsingular AGREEMENTyes:PERSONGRAMsecond
## 1.846491e-01 7.076692e-01
## AGREEMENTyes:PERSONGRAMthird -3|-2
## 1.094480e-01 1.360175e-13
## -2|-1 -1|0
## 4.859041e-03 9.507981e-01
## 0|1 1|2
## 2.449531e-02 7.066814e-06
## 2|3
## 5.382075e-23
Also, let’s quickly check the model’s overall significance and see which predictors may get dropped:
anova(update(m.01, ~ 1), # compare the null model generated here ad hoc
m.01, test="Chisq") # to the model w/ the predictor using a LRT
## Likelihood ratio tests of ordinal regression models
##
## Response: JUDGMENT
## Model
## 1 1
## 2 1 + AGREEMENT * (NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM)
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 4294 15669.11
## 2 4281 15099.88 1 vs 2 13 569.2355 0
drop1(m.01, # drop each droppable predictor at a time from m.01 &
test="Chisq") # test its significance w/ a LRT
## Single term deletions
##
## Model:
## JUDGMENT ~ 1 + AGREEMENT * (NEGATION + EMPHPRON + SUBJORDER +
## NUMBERGRAM + PERSONGRAM)
## Df AIC LRT Pr(>Chi)
## <none> 15138
## AGREEMENT:NEGATION 1 15136 0.3444 0.55727
## AGREEMENT:EMPHPRON 1 15138 1.7102 0.19095
## AGREEMENT:SUBJORDER 1 15148 12.4862 0.00041 ***
## AGREEMENT:NUMBERGRAM 1 15138 1.7590 0.18474
## AGREEMENT:PERSONGRAM 2 15137 3.4406 0.17901
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
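The update() calls used for the following models rely on R’s formula-updating shorthand; here is a toy illustration (generic variables y, a, and b, no model involved) of how . ~ . - a:b removes just one term:

```r
f <- y ~ a * b          # a * b expands to a + b + a:b
update(f, . ~ . - a:b)  # drop only the interaction; the main effects stay
## y ~ a + b
```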
Let’s fit the 2nd model with the interaction
AGREEMENT:NEGATION
deleted:
summary(m.02 <- update( # make m.02 an update of
m.01, .~. # m.01, namely all of it (.~.), but then
- AGREEMENT:NEGATION)) # remove this interaction
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + NEGATION + EMPHPRON + SUBJORDER +
## NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER +
## AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM, data = d, na.action = na.exclude,
## Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## AGREEMENTyes 1.03982 0.22386 4.6450
## NEGATIONnegative -0.04691 0.05489 -0.8545
## EMPHPRONyes 0.01919 0.07868 0.2439
## SUBJORDERsemgram 0.26067 0.07832 3.3283
## NUMBERGRAMsingular -0.29084 0.12836 -2.2658
## PERSONGRAMsecond 0.11651 0.10639 1.0951
## PERSONGRAMthird 0.40454 0.16209 2.4958
## AGREEMENTyes:EMPHPRONyes -0.14501 0.11019 -1.3160
## AGREEMENTyes:SUBJORDERsemgram 0.39248 0.11080 3.5423
## AGREEMENTyes:NUMBERGRAMsingular 0.24393 0.18521 1.3171
## AGREEMENTyes:PERSONGRAMsecond -0.05529 0.15446 -0.3579
## AGREEMENTyes:PERSONGRAMthird -0.36913 0.23059 -1.6008
##
## Intercepts:
## Value Std. Error t value
## -3|-2 -1.2119 0.1641 -7.3827
## -2|-1 -0.4494 0.1624 -2.7674
## -1|0 0.0210 0.1622 0.1295
## 0|1 0.3783 0.1623 2.3310
## 1|2 0.7462 0.1627 4.5874
## 2|3 1.6425 0.1642 10.0033
##
## Residual Deviance: 15100.22
## AIC: 15136.22
anova(m.01, m.02, # compare m.01 to m.02
test="Chisq") # using a LRT
## Likelihood ratio tests of ordinal regression models
##
## Response: JUDGMENT
## Model
## 1 AGREEMENT + NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
## 2 1 + AGREEMENT * (NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM)
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 4282 15100.22
## 2 4281 15099.88 1 vs 2 1 0.3444451 0.5572746
drop1(m.02, # drop each droppable predictor at a time from m.02 &
test="Chisq") # test its significance w/ a LRT
## Single term deletions
##
## Model:
## JUDGMENT ~ AGREEMENT + NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM +
## PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM +
## AGREEMENT:PERSONGRAM
## Df AIC LRT Pr(>Chi)
## <none> 15136
## NEGATION 1 15135 0.7302 0.3928118
## AGREEMENT:EMPHPRON 1 15136 1.7323 0.1881216
## AGREEMENT:SUBJORDER 1 15147 12.5658 0.0003929 ***
## AGREEMENT:NUMBERGRAM 1 15136 1.7338 0.1879251
## AGREEMENT:PERSONGRAM 2 15136 3.4864 0.1749558
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s fit the 3rd model with the main effect NEGATION
deleted:
summary(m.03 <- update( # make m.03 an update of
m.02, .~. # m.02, namely all of it (.~.), but then
- NEGATION)) # remove this predictor
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER +
## NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER +
## AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM, data = d, na.action = na.exclude,
## Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## AGREEMENTyes 1.02402 0.22306 4.5908
## EMPHPRONyes 0.02060 0.07866 0.2618
## SUBJORDERsemgram 0.25944 0.07831 3.3130
## NUMBERGRAMsingular -0.29455 0.12828 -2.2962
## PERSONGRAMsecond 0.10906 0.10604 1.0285
## PERSONGRAMthird 0.39612 0.16178 2.4485
## AGREEMENTyes:EMPHPRONyes -0.14650 0.11018 -1.3297
## AGREEMENTyes:SUBJORDERsemgram 0.39322 0.11080 3.5490
## AGREEMENTyes:NUMBERGRAMsingular 0.25412 0.18481 1.3751
## AGREEMENTyes:PERSONGRAMsecond -0.04408 0.15391 -0.2864
## AGREEMENTyes:PERSONGRAMthird -0.35195 0.22969 -1.5323
##
## Intercepts:
## Value Std. Error t value
## -3|-2 -1.1959 0.1631 -7.3332
## -2|-1 -0.4336 0.1613 -2.6873
## -1|0 0.0368 0.1611 0.2283
## 0|1 0.3940 0.1612 2.4437
## 1|2 0.7619 0.1616 4.7141
## 2|3 1.6581 0.1632 10.1612
##
## Residual Deviance: 15100.95
## AIC: 15134.95
anova(m.02, m.03, # compare m.02 to m.03
test="Chisq") # using a LRT
## Likelihood ratio tests of ordinal regression models
##
## Response: JUDGMENT
## Model
## 1 AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
## 2 AGREEMENT + NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 4283 15100.95
## 2 4282 15100.22 1 vs 2 1 0.7302203 0.3928118
drop1(m.03, # drop each droppable predictor at a time from m.03 &
test="Chisq") # test its significance w/ a LRT
## Single term deletions
##
## Model:
## JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM +
## AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM +
## AGREEMENT:PERSONGRAM
## Df AIC LRT Pr(>Chi)
## <none> 15135
## AGREEMENT:EMPHPRON 1 15135 1.7682 0.183607
## AGREEMENT:SUBJORDER 1 15146 12.6131 0.000383 ***
## AGREEMENT:NUMBERGRAM 1 15135 1.8905 0.169147
## AGREEMENT:PERSONGRAM 2 15134 3.3222 0.189929
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s fit the 4th model with the interaction
AGREEMENT:PERSONGRAM
deleted:
summary(m.04 <- update( # make m.04 an update of
m.03, .~. # m.03, namely all of it (.~.), but then
- AGREEMENT:PERSONGRAM)) # remove this interaction
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER +
## NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER +
## AGREEMENT:NUMBERGRAM, data = d, na.action = na.exclude, Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## AGREEMENTyes 0.76959 0.11295 6.8139
## EMPHPRONyes 0.01133 0.07850 0.1443
## SUBJORDERsemgram 0.26537 0.07813 3.3965
## NUMBERGRAMsingular -0.41025 0.10669 -3.8454
## PERSONGRAMsecond 0.08777 0.07689 1.1415
## PERSONGRAMthird 0.21723 0.11466 1.8946
## AGREEMENTyes:EMPHPRONyes -0.13700 0.11004 -1.2449
## AGREEMENTyes:SUBJORDERsemgram 0.38909 0.11063 3.5171
## AGREEMENTyes:NUMBERGRAMsingular 0.48785 0.11089 4.3995
##
## Intercepts:
## Value Std. Error t value
## -3|-2 -1.3213 0.1329 -9.9452
## -2|-1 -0.5594 0.1305 -4.2860
## -1|0 -0.0895 0.1301 -0.6880
## 0|1 0.2674 0.1301 2.0556
## 1|2 0.6350 0.1304 4.8681
## 2|3 1.5310 0.1323 11.5759
##
## Residual Deviance: 15104.28
## AIC: 15134.28
anova(m.03, m.04, # compare m.03 to m.04
test="Chisq") # using a LRT
## Likelihood ratio tests of ordinal regression models
##
## Response: JUDGMENT
## Model
## 1 AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## 2 AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 4285 15104.28
## 2 4283 15100.95 1 vs 2 2 3.322208 0.1899291
drop1(m.04, # drop each droppable predictor at a time from m.04 &
test="Chisq") # test its significance w/ a LRT
## Single term deletions
##
## Model:
## JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM +
## AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## Df AIC LRT Pr(>Chi)
## <none> 15134
## PERSONGRAM 2 15134 3.6143 0.1641241
## AGREEMENT:EMPHPRON 1 15134 1.5498 0.2131632
## AGREEMENT:SUBJORDER 1 15145 12.3871 0.0004323 ***
## AGREEMENT:NUMBERGRAM 1 15152 19.3703 1.077e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s fit the 5th model with the interaction
AGREEMENT:EMPHPRON
deleted:
summary(m.05 <- update( # make m.05 an update of
m.04, .~. # m.04, namely all of it (.~.), but then
- AGREEMENT:EMPHPRON)) # remove this interaction
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER +
## NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM,
## data = d, na.action = na.exclude, Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## AGREEMENTyes 0.69713 0.09674 7.206
## EMPHPRONyes -0.05837 0.05502 -1.061
## SUBJORDERsemgram 0.26392 0.07812 3.378
## NUMBERGRAMsingular -0.42288 0.10623 -3.981
## PERSONGRAMsecond 0.08768 0.07688 1.141
## PERSONGRAMthird 0.21233 0.11460 1.853
## AGREEMENTyes:SUBJORDERsemgram 0.39157 0.11061 3.540
## AGREEMENTyes:NUMBERGRAMsingular 0.49044 0.11087 4.424
##
## Intercepts:
## Value Std. Error t value
## -3|-2 -1.3652 0.1282 -10.6526
## -2|-1 -0.6033 0.1257 -4.7990
## -1|0 -0.1334 0.1252 -1.0654
## 0|1 0.2235 0.1252 1.7846
## 1|2 0.5910 0.1256 4.7069
## 2|3 1.4865 0.1273 11.6748
##
## Residual Deviance: 15105.83
## AIC: 15133.83
anova(m.04, m.05, # compare m.04 to m.05
test="Chisq") # using a LRT
## Likelihood ratio tests of ordinal regression models
##
## Response: JUDGMENT
## Model
## 1 AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## 2 AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 4286 15105.83
## 2 4285 15104.28 1 vs 2 1 1.549812 0.2131632
drop1(m.05, # drop each droppable predictor at a time from m.05 &
test="Chisq") # test its significance w/ a LRT
## Single term deletions
##
## Model:
## JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM +
## AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## Df AIC LRT Pr(>Chi)
## <none> 15134
## EMPHPRON 1 15133 1.1264 0.2885398
## PERSONGRAM 2 15133 3.4489 0.1782729
## AGREEMENT:SUBJORDER 1 15144 12.5500 0.0003962 ***
## AGREEMENT:NUMBERGRAM 1 15151 19.5864 9.615e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s fit the 6th model with the main effect EMPHPRON
deleted:
summary(m.06 <- update( # make m.06 an update of
m.05, .~. # m.05, namely all of it (.~.), but then
- EMPHPRON)) # remove this predictor
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM +
## PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM,
## data = d, na.action = na.exclude, Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## AGREEMENTyes 0.70368 0.09656 7.288
## SUBJORDERsemgram 0.26511 0.07811 3.394
## NUMBERGRAMsingular -0.41126 0.10563 -3.893
## PERSONGRAMsecond 0.08941 0.07685 1.163
## PERSONGRAMthird 0.21791 0.11446 1.904
## AGREEMENTyes:SUBJORDERsemgram 0.39147 0.11060 3.539
## AGREEMENTyes:NUMBERGRAMsingular 0.47677 0.11012 4.330
##
## Intercepts:
## Value Std. Error t value
## -3|-2 -1.3264 0.1228 -10.8030
## -2|-1 -0.5646 0.1203 -4.6947
## -1|0 -0.0948 0.1198 -0.7917
## 0|1 0.2619 0.1198 2.1853
## 1|2 0.6293 0.1202 5.2342
## 2|3 1.5247 0.1222 12.4814
##
## Residual Deviance: 15106.95
## AIC: 15132.95
anova(m.05, m.06, # compare m.05 to m.06
test="Chisq") # using a LRT
## Likelihood ratio tests of ordinal regression models
##
## Response: JUDGMENT
## Model
## 1 AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## 2 AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 4287 15106.95
## 2 4286 15105.83 1 vs 2 1 1.126422 0.2885398
drop1(m.06, # drop each droppable predictor at a time from m.06 &
test="Chisq") # test its significance w/ a LRT
## Single term deletions
##
## Model:
## JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM +
## AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## Df AIC LRT Pr(>Chi)
## <none> 15133
## PERSONGRAM 2 15133 3.6433 0.1617624
## AGREEMENT:SUBJORDER 1 15144 12.5430 0.0003977 ***
## AGREEMENT:NUMBERGRAM 1 15150 18.7609 1.482e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s fit the 7th model with the predictor PERSONGRAM
deleted:
summary(m.07 <- update( # make m.07 an update of
m.06, .~. # m.06, namely all of it (.~.), but then
- PERSONGRAM)) # remove this predictor
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM +
## AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM, data = d, na.action = na.exclude,
## Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## AGREEMENTyes 0.7064 0.09623 7.341
## SUBJORDERsemgram 0.2666 0.07795 3.421
## NUMBERGRAMsingular -0.5460 0.07867 -6.941
## AGREEMENTyes:SUBJORDERsemgram 0.3958 0.10971 3.608
## AGREEMENTyes:NUMBERGRAMsingular 0.4667 0.11000 4.243
##
## Intercepts:
## Value Std. Error t value
## -3|-2 -1.4980 0.0786 -19.0583
## -2|-1 -0.7364 0.0743 -9.9046
## -1|0 -0.2670 0.0732 -3.6463
## 0|1 0.0893 0.0730 1.2232
## 1|2 0.4562 0.0733 6.2265
## 2|3 1.3510 0.0760 17.7800
##
## Residual Deviance: 15110.60
## AIC: 15132.60
anova(m.06, m.07, # compare m.06 to m.07
test="Chisq") # using a LRT
## Likelihood ratio tests of ordinal regression models
##
## Response: JUDGMENT
## Model
## 1 AGREEMENT + SUBJORDER + NUMBERGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## 2 AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 4289 15110.60
## 2 4287 15106.95 1 vs 2 2 3.643254 0.1617624
drop1(m.07, # drop each droppable predictor at a time from m.07 &
test="Chisq") # test its significance w/ a LRT
## Single term deletions
##
## Model:
## JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + AGREEMENT:SUBJORDER +
## AGREEMENT:NUMBERGRAM
## Df AIC LRT Pr(>Chi)
## <none> 15133
## AGREEMENT:SUBJORDER 1 15144 13.034 0.0003059 ***
## AGREEMENT:NUMBERGRAM 1 15149 18.017 2.19e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
No, we can’t delete any more predictors; we’re done with model selection:
summary(m.final <- m.07); rm(m.07); invisible(gc())
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM +
## AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM, data = d, na.action = na.exclude,
## Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## AGREEMENTyes 0.7064 0.09623 7.341
## SUBJORDERsemgram 0.2666 0.07795 3.421
## NUMBERGRAMsingular -0.5460 0.07867 -6.941
## AGREEMENTyes:SUBJORDERsemgram 0.3958 0.10971 3.608
## AGREEMENTyes:NUMBERGRAMsingular 0.4667 0.11000 4.243
##
## Intercepts:
## Value Std. Error t value
## -3|-2 -1.4980 0.0786 -19.0583
## -2|-1 -0.7364 0.0743 -9.9046
## -1|0 -0.2670 0.0732 -3.6463
## 0|1 0.0893 0.0730 1.2232
## 1|2 0.4562 0.0733 6.2265
## 2|3 1.3510 0.0760 17.7800
##
## Residual Deviance: 15110.60
## AIC: 15132.60
(m.final.tscores <- summary(m.final)$coefficients[,"t value"])
## AGREEMENTyes SUBJORDERsemgram
## 7.341353 3.420638
## NUMBERGRAMsingular AGREEMENTyes:SUBJORDERsemgram
## -6.941128 3.607670
## AGREEMENTyes:NUMBERGRAMsingular -3|-2
## 4.242926 -19.058294
## -2|-1 -1|0
## -9.904564 -3.646271
## 0|1 1|2
## 1.223185 6.226538
## 2|3
## 17.779967
(m.final.pvalues <- 2*pnorm(abs(m.final.tscores), lower.tail=FALSE))
## AGREEMENTyes SUBJORDERsemgram
## 2.114448e-13 6.247452e-04
## NUMBERGRAMsingular AGREEMENTyes:SUBJORDERsemgram
## 3.889817e-12 3.089594e-04
## AGREEMENTyes:NUMBERGRAMsingular -3|-2
## 2.206240e-05 5.607383e-81
## -2|-1 -1|0
## 3.977021e-23 2.660737e-04
## 0|1 1|2
## 2.212599e-01 4.768550e-10
## 2|3
## 1.010396e-70
Let’s also compute the overall p-value for this model:
anova(m.00, m.final, # compare the null model to m.final
test="Chisq") # using a LRT
## Likelihood ratio tests of ordinal regression models
##
## Response: JUDGMENT
## Model
## 1 1
## 2 AGREEMENT + SUBJORDER + NUMBERGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 4294 15669.11
## 2 4289 15110.60 1 vs 2 5 558.5191 0
pchisq( # compute the area under the chi-squared curve
q=558.5191, # for this chi-squared value
df=5, # at 5 df
lower.tail=FALSE) # only using the right/upper tail/side
## [1] 1.848523e-118
On to model quality: We make ‘predictions’ …
d <- cbind(d, # add to d the columns resulting from
predict( # predicting
m.final, # from m.final
type="prob")) # predicted probabilities
# give those columns better names
names(d)[9:15] <- paste0( # rename those columns to the result of pasting
"PREDS.", # "PREDS."
names(d)[9:15]) # in front of the existing names
d <- cbind(d, # add to d a column
PREDS.CAT=predict( # called PREDS.CAT, the predictions
m.final, # from m.final, namely
type="class")) # categorical predictions
head(d) # check the result
## CASE JUDGMENT AGREEMENT NEGATION EMPHPRON SUBJORDER NUMBERGRAM PERSONGRAM
## 1 1 3 no affirmative yes gramsem plural third
## 2 2 -3 no affirmative yes gramsem plural third
## 3 3 3 no affirmative yes gramsem plural third
## 4 4 3 no affirmative yes gramsem plural third
## 5 5 -3 no affirmative yes gramsem plural third
## 6 6 2 no affirmative yes gramsem plural third
## PREDS.-3 PREDS.-2 PREDS.-1 PREDS.0 PREDS.1 PREDS.2 PREDS.3
## 1 0.1827313 0.1410666 0.109845 0.08866082 0.08981659 0.1821742 0.2057054
## 2 0.1827313 0.1410666 0.109845 0.08866082 0.08981659 0.1821742 0.2057054
## 3 0.1827313 0.1410666 0.109845 0.08866082 0.08981659 0.1821742 0.2057054
## 4 0.1827313 0.1410666 0.109845 0.08866082 0.08981659 0.1821742 0.2057054
## 5 0.1827313 0.1410666 0.109845 0.08866082 0.08981659 0.1821742 0.2057054
## 6 0.1827313 0.1410666 0.109845 0.08866082 0.08981659 0.1821742 0.2057054
## PREDS.CAT
## 1 3
## 2 3
## 3 3
## 4 3
## 5 3
## 6 3
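To make the link between the coefficients and these predicted probabilities concrete, here is a sketch reconstructing the probabilities of row 1 by hand (the threshold values are copied from the summary of m.final above and are therefore rounded): row 1 has only reference levels (AGREEMENT=no, SUBJORDER=gramsem, NUMBERGRAM=plural), so its linear predictor is 0 and the cumulative probabilities are simply plogis(threshold):

```r
zeta <- c(-1.4980, -0.7364, -0.2670, 0.0893, 0.4562, 1.3510) # thresholds/intercepts of m.final (rounded)
cum.probs <- plogis(zeta - 0)       # P(JUDGMENT <= k) for k = -3 ... 2 at linear predictor 0
round(diff(c(0, cum.probs, 1)), 3)  # successive differences = the 7 class probabilities
## 0.183 0.141 0.110 0.089 0.090 0.182 0.206 -- cf. the PREDS.* columns of row 1
```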
… and then we evaluate them (again, confusion matrices are not as informative here as they were for the previous kinds of models):
(c.m <- table( # confusion matrix: cross-tabulate
"OBS" =d$JUDGMENT, # observed judgments in the rows
"PREDS"=d$PREDS.CAT)) # predicted judgments in the columns
## PREDS
## OBS -3 -2 -1 0 1 2 3
## -3 301 0 0 0 0 0 319
## -2 191 0 0 0 0 0 289
## -1 136 0 0 0 0 0 251
## 0 88 0 0 0 0 0 241
## 1 93 0 0 0 0 0 262
## 2 160 0 0 0 0 0 666
## 3 182 0 0 0 0 0 1121
c(tapply(as.character(d$JUDGMENT)==d$PREDS.CAT, d$JUDGMENT, mean), # accuracies for ...
"overall"=mean(as.character(d$JUDGMENT)==d$PREDS.CAT)) # ... each level & overall
## -3 -2 -1 0 1 2 3 overall
## 0.4854839 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.8603223 0.3306977
How does the model compare to our simulated baseline?
(sum.of.preddiffs.from.obs <- sum(abs("-"(
as.numeric(d$PREDS.CAT),
as.numeric(d$JUDGMENT)))))
## [1] 9267
hist(collector, xlim=c(9250, 11750), main="")
abline(v=quantile(collector, probs=c(0.025, 0.975)), lty=2, col="red")
abline(v=sum(abs(as.numeric(d$JUDGMENT)-7)), lty=2, col="blue")
abline(v=sum.of.preddiffs.from.obs, lty=2, col="green")
In terms of significance, it does really well: the observed sum of absolute differences is so small it’s never observed in the random reshuffles:
sum(collector <= sum.of.preddiffs.from.obs)
## [1] 0
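A small, general sketch (toy numbers, not rerunning the simulation): when none of the B reshuffled statistics is as extreme as the observed one, a common convention is to report the Monte-Carlo p-value as (0+1)/(B+1) rather than 0, since an estimated permutation p-value can never be exactly 0:

```r
B <- 1000                    # number of reshufflings used above
n.as.extreme <- 0            # reshuffles at or below the observed sum
(n.as.extreme + 1) / (B + 1) # i.e., p < 0.001
```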
At the same time, the effect size is again not great. Our simulated baseline meant that, on average, we’d misestimate the observed judgment by 2.49 points on our 7-point scale, and now, with all our modeling efforts, we’re still off by an average of 2.16 points, which is not exactly great:
median(collector) / 4300
## [1] 2.49
sum.of.preddiffs.from.obs / 4300
## [1] 2.155116
Finally, we need the model’s R2-values. We begin with the widely used version of Nagelkerke’s R2, but then also add McFadden’s R2:
c("Nagelkerke's R2"=R2(m.final),
"McFadden's R2" =(m.00$deviance-m.final$deviance) / m.00$deviance)
## Nagelkerke's R2 McFadden's R2
## 0.12507688 0.03564459
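As a cross-check (with the literal deviance values reported in the output above), McFadden’s R2 is just the proportional reduction in deviance from the null model to the final model:

```r
# (deviance of null model - deviance of final model) / deviance of null model
(15669.11 - 15110.60) / 15669.11  # ~ 0.0356, matching the McFadden value above
```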
Of course, if you paid attention to the book and worked through everything there, you could have obtained much of this with a single call:
summarize.polr(m.final)
## $Coefficients
## Estimate Std.Error 2.5% 97.5%
## -3|-2 -1.49795116 0.07859839 -1.65200117 -1.3439012
## -2|-1 -0.73637259 0.07434680 -0.88208964 -0.5906555
## -1|0 -0.26700342 0.07322644 -0.41052461 -0.1234822
## 0|1 0.08927411 0.07298498 -0.05377382 0.2323221
## 1|2 0.45623362 0.07327244 0.31262227 0.5998450
## 2|3 1.35100907 0.07598490 1.20208140 1.4999367
## AGREEMENTyes 0.70643359 0.09622662 0.51783289 0.8950343
## SUBJORDERsemgram 0.26664660 0.07795231 0.11386288 0.4194303
## NUMBERGRAMsingular -0.54604077 0.07866744 -0.70022612 -0.3918554
## AGREEMENTyes:SUBJORDERsemgram 0.39580665 0.10971255 0.18077401 0.6108393
## AGREEMENTyes:NUMBERGRAMsingular 0.46670731 0.10999657 0.25111801 0.6822966
## z p2tailed
## -3|-2 -19.058294 5.607383e-81
## -2|-1 -9.904564 3.977021e-23
## -1|0 -3.646271 2.660737e-04
## 0|1 1.223185 2.212599e-01
## 1|2 6.226538 4.768550e-10
## 2|3 17.779967 1.010396e-70
## AGREEMENTyes 7.341353 2.114448e-13
## SUBJORDERsemgram 3.420638 6.247452e-04
## NUMBERGRAMsingular -6.941128 3.889817e-12
## AGREEMENTyes:SUBJORDERsemgram 3.607670 3.089594e-04
## AGREEMENTyes:NUMBERGRAMsingular 4.242926 2.206240e-05
##
## $`Confusion matrix`
## PRED
## OBS -3 -2 -1 0 1 2 3
## -3 301 0 0 0 0 0 319
## -2 191 0 0 0 0 0 289
## -1 136 0 0 0 0 0 251
## 0 88 0 0 0 0 0 241
## 1 93 0 0 0 0 0 262
## 2 160 0 0 0 0 0 666
## 3 182 0 0 0 0 0 1121
##
## $`Classification metrics`
## Baseline Acc. for -3 Acc. for -2 Acc. for -1
## 3.030233e-01 4.854839e-01 0.000000e+00 0.000000e+00
## Acc. for 0 Acc. for 1 Acc. for 2 Acc. for 3
## 0.000000e+00 0.000000e+00 0.000000e+00 8.603223e-01
## Acc. for overall pbinom logloss
## 3.306977e-01 4.761551e-05 1.757046e+00
##
## $R2s
## Deviance of null model Deviance of this model McFadden R2
## 1.566911e+04 1.511060e+04 3.564459e-02
## Nagelkerke R2
## 1.250769e-01
##
## $`Model significance test`
## LR statistic Df P-value
## 5.585191e+02 5.000000e+00 1.848534e-118
## Number of data points
## 4.300000e+03
Now let’s visualize the results of the two interactions based on the
predictions; we begin with AGREEMENT:SUBJORDER
:
agsu.d <- data.frame( # make agsu.d a data frame of
agsu <- effect( # the effect
"AGREEMENT:SUBJORDER", # of AGREEMENT:SUBJORDER
m.final)) # in m.final
agsu.d[,1:9] # show the predictor combinations & predicted probabilities
## AGREEMENT SUBJORDER prob.X.3 prob.X.2 prob.X.1 prob.X0 prob.X1
## 1 no gramsem 0.23067814 0.16037026 0.11556664 0.08791856 0.08457945
## 2 yes gramsem 0.10323854 0.09455074 0.08497318 0.07743151 0.08810081
## 3 no semgram 0.18677053 0.14292687 0.11054340 0.08874982 0.08947773
## 4 yes semgram 0.05603033 0.05675284 0.05614353 0.05603617 0.07028809
## prob.X2 prob.X3
## 1 0.1590298 0.1618572
## 2 0.2170518 0.3346534
## 3 0.1801735 0.2013581
## 4 0.2109386 0.4938105
plot(agsu, # plot the effect of AGREEMENT:SUBJORDER
type="probability", # but plot predicted probabilities
ylim=c(0, 1), # w/ these y-axis limits
grid=TRUE, # & w/ a grid
multiline=TRUE, # put all lines in one panel
confint=list(style="auto"), # keep confidence intervals
lattice=list(key.args=list( # make the
columns=5, # legend have 5 columns &
cex.title=0))) # no title