1 Intro

Let’s load a data set for an example of an ordinal regression model; importantly, we make sure the response variable is an ordered factor:

rm(list=ls(all.names=TRUE))
library(car); library(effects); library(MASS)
source("helpers/202_R2.r"); source("helpers/202_summarize.polr.r") # load small helper functions
str(d <- read.delim(       # summarize d, the result of loading
   file="inputfiles/202_07_RATG.csv", # this file
   stringsAsFactors=TRUE)) # change categorical variables into factors
## 'data.frame':    4300 obs. of  8 variables:
##  $ CASE      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ JUDGMENT  : int  3 -3 3 3 -3 2 -2 2 -3 3 ...
##  $ AGREEMENT : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 2 ...
##  $ NEGATION  : Factor w/ 2 levels "affirmative",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ EMPHPRON  : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ SUBJORDER : Factor w/ 2 levels "gramsem","semgram": 1 1 1 1 1 1 1 1 2 2 ...
##  $ NUMBERGRAM: Factor w/ 2 levels "plural","singular": 1 1 1 1 1 1 1 1 1 1 ...
##  $ PERSONGRAM: Factor w/ 3 levels "first","second",..: 3 3 3 3 3 3 3 3 3 3 ...
d$JUDGMENT <- factor(d$JUDGMENT, ordered=TRUE)
summary(d)
##       CASE      JUDGMENT  AGREEMENT         NEGATION    EMPHPRON  
##  Min.   :   1   -3: 620   no :2077   affirmative:2139   no :2133  
##  1st Qu.:1137   -2: 480   yes:2223   negative   :2161   yes:2167  
##  Median :2262   -1: 387                                           
##  Mean   :2266   0 : 329                                           
##  3rd Qu.:3394   1 : 355                                           
##  Max.   :4528   2 : 826                                           
##                 3 :1303                                           
##    SUBJORDER       NUMBERGRAM    PERSONGRAM  
##  gramsem:2173   plural  :1989   first :1381  
##  semgram:2127   singular:2311   second:1561  
##                                 third :1358  
##                                              
##                                              
##                                              
## 

Here’s what the columns are/contain:

  • the column CASE is a simple case/row number just so that each case has a unique ID;
  • the column JUDGMENT is the response variable: it’s an ordinal rating variable with 7 levels ranging from -3 to 3, centered around 0 (neutral);
  • the column AGREEMENT is a binary predictor: it’s whether the experimental stimulus (a sentence) exhibited proper agreement between the subject and the verb or not: no (e.g., He do the dishes) vs. yes (e.g., He does the dishes);
  • the column NEGATION is a binary predictor: it’s whether the experimental stimulus was affirmative (e.g., He does the dishes) vs. negative (e.g., He doesn’t do the dishes);
  • the column EMPHPRON is a binary predictor: it’s whether the experimental stimulus contained an emphatic pronoun: no (e.g., He did the dishes) vs. yes (e.g., He, HE, did the dishes);
  • the column SUBJORDER is a binary predictor: it’s whether, in the experimental stimulus, the grammatical subject comes before the experiencer subject or not: gramsem (e.g., Birds please me) vs. semgram (e.g., I like birds) (these are not great examples; the study was actually done on Spanish);
  • the column NUMBERGRAM is a binary predictor: it’s whether the grammatical subject of the experimental stimulus was plural (e.g., Birds please me) or singular (e.g., This pleases me);
  • the column PERSONGRAM is a ternary predictor: it’s whether the grammatical subject of the experimental stimulus was first person (e.g., I like birds) vs. second person (e.g., You like birds) vs. third person (e.g., He likes birds).

We want to determine whether the rating of a Spanish construction involving the semantic role of an experiencer is predictable from

  • AGREEMENT, NEGATION, EMPHPRON, SUBJORDER, NUMBERGRAM, and PERSONGRAM;
  • any interaction of AGREEMENT with any of the other predictors (see the formula preview right after this list).
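
In R’s formula notation, this amounts to the following model specification (shown here only as a preview; it is exactly what we will fit in section 4):

JUDGMENT ~ 1 + AGREEMENT * (NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM)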

2 Deviance & baseline(s)

Let’s first compute the kind of baselines we would use for a multinomial model of what will be the response variable, JUDGMENT:

(baselines <- c(
   "baseline 1"=max(          # make baselines[1] the highest
      prop.table(             # proportion in the
         table(d$JUDGMENT))), # frequency table of the response
   "baseline 2"=sum(             # make baselines[2] the sum of the
      prop.table(                # proportions in the
         table(d$JUDGMENT))^2))) # frequency table of the response squared
## baseline 1 baseline 2 
##  0.3030233  0.1827431
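
To make explicit what these mean: baseline 1 is the accuracy we would achieve by always predicting the most frequent judgment, and baseline 2 is the accuracy expected when predicting judgments randomly in proportion to their observed frequencies. A quick check of the former (nothing new is computed here, this just unpacks the definition):

mean(d$JUDGMENT ==                      # how often is the observed judgment identical
   names(which.max(table(d$JUDGMENT)))) # to the most frequent judgment ("3")?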

But this is actually not ideal here – why?

The problem is that both baselines treat JUDGMENT as an unordered categorical variable: predicting 3 for a case whose actual judgment is -3 is penalized no more heavily than predicting 2. To address this, let’s compute a baseline that takes the ordinal nature of the response (and of the predictions) into consideration. We will use a very simple/crude simulation approach: we randomly reshuffle the response variable 1000 times and compute, for each reshuffling, how far the reshuffled judgments are from the actual ones, where ‘distance’ is operationalized as the sum of the absolute differences between the observed and the reshuffled (numeric) judgment levels. We then visualize those reshuffled sums of absolute deviations with a small histogram, to which I am adding (i) the median and the two quantiles one might use as a 95% confidence interval and (ii) the sum that corresponds to baseline 1 (i.e., to always predicting the most frequent judgment, 3):

collector <- numeric(1000); set.seed(1); for (i in seq(collector)) {
   random.judgments <- sample(d$JUDGMENT)
   collector[i] <- sum(abs("-"(
      as.numeric(random.judgments),
      as.numeric(d$JUDGMENT))))
}
hist(collector, xlim=c(9250, 11750), main="")
   abline(v=quantile(collector, probs=c(0.025, 0.5, 0.975)), lty=2, col="red")
   abline(v=sum(abs(as.numeric(d$JUDGMENT)-7)), lty=2, col="blue")

With this, we also update our vector baselines:

baselines["simulated"] <- median(collector); baselines
##   baseline 1   baseline 2    simulated 
## 3.030233e-01 1.827431e-01 1.070700e+04

Let’s compute the deviance of the null model:

deviance(m.00 <- polr(JUDGMENT ~ 1, Hess=TRUE, data=d, na.action=na.exclude))
## [1] 15669.11
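
As a quick sanity check: for polr models, the deviance reported here is just -2 times the model’s log-likelihood, so the following should return the same value:

-2 * as.numeric(logLik(m.00)) # same value as deviance(m.00) above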

3 Exploration & preparation

Some exploration of the relevant variables:

# the predictor(s)/response on its/their own
# just check the summary for the main effects
table(d$NEGATION, d$AGREEMENT)
##              
##                 no  yes
##   affirmative 1049 1090
##   negative    1028 1133
table(d$EMPHPRON, d$AGREEMENT)
##      
##         no  yes
##   no  1038 1095
##   yes 1039 1128
table(d$SUBJORDER, d$AGREEMENT)
##          
##             no  yes
##   gramsem 1047 1126
##   semgram 1030 1097
table(d$NUMBERGRAM, d$AGREEMENT)
##           
##              no  yes
##   plural    926 1063
##   singular 1151 1160
table(d$PERSONGRAM, d$AGREEMENT)
##         
##           no yes
##   first  657 724
##   second 809 752
##   third  611 747
# the predictor(s) w/ the response
table(d$AGREEMENT, d$JUDGMENT)
##      
##        -3  -2  -1   0   1   2   3
##   no  463 304 225 158 190 344 393
##   yes 157 176 162 171 165 482 910
ftable(d$NEGATION, d$AGREEMENT, d$JUDGMENT)
##                   -3  -2  -1   0   1   2   3
##                                             
## affirmative no   231 146 114  86  96 176 200
##             yes   80  94  75  78  83 224 456
## negative    no   232 158 111  72  94 168 193
##             yes   77  82  87  93  82 258 454
ftable(d$EMPHPRON, d$AGREEMENT, d$JUDGMENT)
##           -3  -2  -1   0   1   2   3
##                                     
## no  no   240 149 113  78  99 171 188
##     yes   74  82  66  91  75 242 465
## yes no   223 155 112  80  91 173 205
##     yes   83  94  96  80  90 240 445
ftable(d$SUBJORDER, d$AGREEMENT, d$JUDGMENT)
##               -3  -2  -1   0   1   2   3
##                                         
## gramsem no   253 159 124  87  97 161 166
##         yes   98 107 104  98  99 254 366
## semgram no   210 145 101  71  93 183 227
##         yes   59  69  58  73  66 228 544
ftable(d$NUMBERGRAM, d$AGREEMENT, d$JUDGMENT)
##                -3  -2  -1   0   1   2   3
##                                          
## plural   no   162 113  89  70  97 184 211
##          yes   64  84  75  88  80 243 429
## singular no   301 191 136  88  93 160 182
##          yes   93  92  87  83  85 239 481
ftable(d$PERSONGRAM, d$AGREEMENT, d$JUDGMENT)
##              -3  -2  -1   0   1   2   3
##                                        
## first  no   173 112  85  50  51  81 105
##        yes   64  64  61  52  50 141 292
## second no   190 122  86  65  71 139 136
##        yes   41  51  47  62  62 173 316
## third  no   100  70  54  43  68 124 152
##        yes   52  61  54  57  53 168 302

4 Modeling & numerical interpretation

Let’s fit our initial regression model:

summary(m.01 <- polr(     # summarize m.01, which models
   JUDGMENT ~ 1 +         # JUDGMENT ~ an overall intercept (1)
   # AGREEMENT interacting w/ all other predictors
   AGREEMENT * (NEGATION+EMPHPRON+SUBJORDER+NUMBERGRAM+PERSONGRAM),
   Hess=TRUE,             # recommended for using summary
   data=d,                # those vars are in d
   na.action=na.exclude)) # skip cases with NA/missing data
## Call:
## polr(formula = JUDGMENT ~ 1 + AGREEMENT * (NEGATION + EMPHPRON + 
##     SUBJORDER + NUMBERGRAM + PERSONGRAM), data = d, na.action = na.exclude, 
##     Hess = TRUE)
## 
## Coefficients:
##                                    Value Std. Error t value
## AGREEMENTyes                     1.00762    0.23047  4.3720
## NEGATIONnegative                -0.07963    0.07825 -1.0177
## EMPHPRONyes                      0.01820    0.07869  0.2313
## SUBJORDERsemgram                 0.26155    0.07833  3.3390
## NUMBERGRAMsingular              -0.28824    0.12844 -2.2441
## PERSONGRAMsecond                 0.12173    0.10676  1.1402
## PERSONGRAMthird                  0.41046    0.16241  2.5273
## AGREEMENTyes:NEGATIONnegative    0.06444    0.10980  0.5869
## AGREEMENTyes:EMPHPRONyes        -0.14410    0.11020 -1.3077
## AGREEMENTyes:SUBJORDERsemgram    0.39130    0.11081  3.5311
## AGREEMENTyes:NUMBERGRAMsingular  0.24572    0.18523  1.3266
## AGREEMENTyes:PERSONGRAMsecond   -0.05795    0.15453 -0.3750
## AGREEMENTyes:PERSONGRAMthird    -0.36910    0.23059 -1.6007
## 
## Intercepts:
##       Value   Std. Error t value
## -3|-2 -1.2230  0.1653    -7.4002
## -2|-1 -0.4605  0.1635    -2.8162
## -1|0   0.0101  0.1632     0.0617
## 0|1    0.3675  0.1634     2.2493
## 1|2    0.7353  0.1637     4.4917
## 2|3    1.6316  0.1652     9.8743
## 
## Residual Deviance: 15099.88 
## AIC: 15137.88

Let’s also compute the t-values for these coefficient values (however uninterpretable those are):

(m.01.tscores <- summary(m.01)$coefficients[,"t value"])
##                    AGREEMENTyes                NEGATIONnegative 
##                      4.37198298                     -1.01767960 
##                     EMPHPRONyes                SUBJORDERsemgram 
##                      0.23126312                      3.33897026 
##              NUMBERGRAMsingular                PERSONGRAMsecond 
##                     -2.24412620                      1.14023537 
##                 PERSONGRAMthird   AGREEMENTyes:NEGATIONnegative 
##                      2.52729216                      0.58690684 
##        AGREEMENTyes:EMPHPRONyes   AGREEMENTyes:SUBJORDERsemgram 
##                     -1.30766349                      3.53108615 
## AGREEMENTyes:NUMBERGRAMsingular   AGREEMENTyes:PERSONGRAMsecond 
##                      1.32657549                     -0.37498824 
##    AGREEMENTyes:PERSONGRAMthird                           -3|-2 
##                     -1.60067926                     -7.40016289 
##                           -2|-1                            -1|0 
##                     -2.81623304                      0.06170461 
##                             0|1                             1|2 
##                      2.24927021                      4.49166614 
##                             2|3 
##                      9.87427499

And from those we compute the (normal-approximation) p-values:

(m.01.pvalues <- 2*pnorm(abs(m.01.tscores), lower.tail=FALSE))
##                    AGREEMENTyes                NEGATIONnegative 
##                    1.231231e-05                    3.088302e-01 
##                     EMPHPRONyes                SUBJORDERsemgram 
##                    8.171104e-01                    8.408957e-04 
##              NUMBERGRAMsingular                PERSONGRAMsecond 
##                    2.482428e-02                    2.541883e-01 
##                 PERSONGRAMthird   AGREEMENTyes:NEGATIONnegative 
##                    1.149458e-02                    5.572663e-01 
##        AGREEMENTyes:EMPHPRONyes   AGREEMENTyes:SUBJORDERsemgram 
##                    1.909875e-01                    4.138569e-04 
## AGREEMENTyes:NUMBERGRAMsingular   AGREEMENTyes:PERSONGRAMsecond 
##                    1.846491e-01                    7.076692e-01 
##    AGREEMENTyes:PERSONGRAMthird                           -3|-2 
##                    1.094480e-01                    1.360175e-13 
##                           -2|-1                            -1|0 
##                    4.859041e-03                    9.507981e-01 
##                             0|1                             1|2 
##                    2.449531e-02                    7.066814e-06 
##                             2|3 
##                    5.382075e-23
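
If you prefer everything in one table, you could combine the coefficient matrix with these p-values (a mere convenience; nothing new is computed):

cbind(                          # combine into one matrix
   summary(m.01)$coefficients,  # the estimates, SEs, & t-values
   "p value"=m.01.pvalues)      # & the p-values computed above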

Also, let’s quickly check the model’s overall significance and see which predictors might get dropped:

anova(update(m.01, ~ 1), # compare the null model generated here ad hoc
   m.01, test="Chisq")   # to the model w/ the predictor using a LRT
## Likelihood ratio tests of ordinal regression models
## 
## Response: JUDGMENT
##                                                                         Model
## 1                                                                           1
## 2 1 + AGREEMENT * (NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM)
##   Resid. df Resid. Dev   Test    Df LR stat. Pr(Chi)
## 1      4294   15669.11                              
## 2      4281   15099.88 1 vs 2    13 569.2355       0
drop1(m.01,      # drop each droppable predictor at a time from m.01 &
   test="Chisq") # test its significance w/ a LRT
## Single term deletions
## 
## Model:
## JUDGMENT ~ 1 + AGREEMENT * (NEGATION + EMPHPRON + SUBJORDER + 
##     NUMBERGRAM + PERSONGRAM)
##                      Df   AIC     LRT Pr(>Chi)    
## <none>                  15138                     
## AGREEMENT:NEGATION    1 15136  0.3444  0.55727    
## AGREEMENT:EMPHPRON    1 15138  1.7102  0.19095    
## AGREEMENT:SUBJORDER   1 15148 12.4862  0.00041 ***
## AGREEMENT:NUMBERGRAM  1 15138  1.7590  0.18474    
## AGREEMENT:PERSONGRAM  2 15137  3.4406  0.17901    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 2nd model with the interaction AGREEMENT:NEGATION deleted:

summary(m.02 <- update(   # make m.02 an update of
   m.01, .~.              # m.01, namely all of it (.~.), but then
   - AGREEMENT:NEGATION)) # remove this interaction
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + NEGATION + EMPHPRON + SUBJORDER + 
##     NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + 
##     AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM, data = d, na.action = na.exclude, 
##     Hess = TRUE)
## 
## Coefficients:
##                                    Value Std. Error t value
## AGREEMENTyes                     1.03982    0.22386  4.6450
## NEGATIONnegative                -0.04691    0.05489 -0.8545
## EMPHPRONyes                      0.01919    0.07868  0.2439
## SUBJORDERsemgram                 0.26067    0.07832  3.3283
## NUMBERGRAMsingular              -0.29084    0.12836 -2.2658
## PERSONGRAMsecond                 0.11651    0.10639  1.0951
## PERSONGRAMthird                  0.40454    0.16209  2.4958
## AGREEMENTyes:EMPHPRONyes        -0.14501    0.11019 -1.3160
## AGREEMENTyes:SUBJORDERsemgram    0.39248    0.11080  3.5423
## AGREEMENTyes:NUMBERGRAMsingular  0.24393    0.18521  1.3171
## AGREEMENTyes:PERSONGRAMsecond   -0.05529    0.15446 -0.3579
## AGREEMENTyes:PERSONGRAMthird    -0.36913    0.23059 -1.6008
## 
## Intercepts:
##       Value   Std. Error t value
## -3|-2 -1.2119  0.1641    -7.3827
## -2|-1 -0.4494  0.1624    -2.7674
## -1|0   0.0210  0.1622     0.1295
## 0|1    0.3783  0.1623     2.3310
## 1|2    0.7462  0.1627     4.5874
## 2|3    1.6425  0.1642    10.0033
## 
## Residual Deviance: 15100.22 
## AIC: 15136.22
anova(m.01, m.02, # compare m.01 to m.02
   test="Chisq")  # using a LRT
## Likelihood ratio tests of ordinal regression models
## 
## Response: JUDGMENT
##                                                                                                                                                            Model
## 1 AGREEMENT + NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
## 2                                                                                    1 + AGREEMENT * (NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM)
##   Resid. df Resid. Dev   Test    Df  LR stat.   Pr(Chi)
## 1      4282   15100.22                                 
## 2      4281   15099.88 1 vs 2     1 0.3444451 0.5572746
drop1(m.02,      # drop each droppable predictor at a time from m.02 &
   test="Chisq") # test its significance w/ a LRT
## Single term deletions
## 
## Model:
## JUDGMENT ~ AGREEMENT + NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + 
##     PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + 
##     AGREEMENT:PERSONGRAM
##                      Df   AIC     LRT  Pr(>Chi)    
## <none>                  15136                      
## NEGATION              1 15135  0.7302 0.3928118    
## AGREEMENT:EMPHPRON    1 15136  1.7323 0.1881216    
## AGREEMENT:SUBJORDER   1 15147 12.5658 0.0003929 ***
## AGREEMENT:NUMBERGRAM  1 15136  1.7338 0.1879251    
## AGREEMENT:PERSONGRAM  2 15136  3.4864 0.1749558    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 3rd model with the main effect NEGATION deleted:

summary(m.03 <- update( # make m.03 an update of
   m.02, .~.            # m.02, namely all of it (.~.), but then
   - NEGATION))         # remove this predictor
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + 
##     NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + 
##     AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM, data = d, na.action = na.exclude, 
##     Hess = TRUE)
## 
## Coefficients:
##                                    Value Std. Error t value
## AGREEMENTyes                     1.02402    0.22306  4.5908
## EMPHPRONyes                      0.02060    0.07866  0.2618
## SUBJORDERsemgram                 0.25944    0.07831  3.3130
## NUMBERGRAMsingular              -0.29455    0.12828 -2.2962
## PERSONGRAMsecond                 0.10906    0.10604  1.0285
## PERSONGRAMthird                  0.39612    0.16178  2.4485
## AGREEMENTyes:EMPHPRONyes        -0.14650    0.11018 -1.3297
## AGREEMENTyes:SUBJORDERsemgram    0.39322    0.11080  3.5490
## AGREEMENTyes:NUMBERGRAMsingular  0.25412    0.18481  1.3751
## AGREEMENTyes:PERSONGRAMsecond   -0.04408    0.15391 -0.2864
## AGREEMENTyes:PERSONGRAMthird    -0.35195    0.22969 -1.5323
## 
## Intercepts:
##       Value   Std. Error t value
## -3|-2 -1.1959  0.1631    -7.3332
## -2|-1 -0.4336  0.1613    -2.6873
## -1|0   0.0368  0.1611     0.2283
## 0|1    0.3940  0.1612     2.4437
## 1|2    0.7619  0.1616     4.7141
## 2|3    1.6581  0.1632    10.1612
## 
## Residual Deviance: 15100.95 
## AIC: 15134.95
anova(m.02, m.03, # compare m.02 to m.03
   test="Chisq")  # using a LRT
## Likelihood ratio tests of ordinal regression models
## 
## Response: JUDGMENT
##                                                                                                                                                            Model
## 1            AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
## 2 AGREEMENT + NEGATION + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
##   Resid. df Resid. Dev   Test    Df  LR stat.   Pr(Chi)
## 1      4283   15100.95                                 
## 2      4282   15100.22 1 vs 2     1 0.7302203 0.3928118
drop1(m.03,      # drop each droppable predictor at a time from m.03 &
   test="Chisq") # test its significance w/ a LRT
## Single term deletions
## 
## Model:
## JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + 
##     AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + 
##     AGREEMENT:PERSONGRAM
##                      Df   AIC     LRT Pr(>Chi)    
## <none>                  15135                     
## AGREEMENT:EMPHPRON    1 15135  1.7682 0.183607    
## AGREEMENT:SUBJORDER   1 15146 12.6131 0.000383 ***
## AGREEMENT:NUMBERGRAM  1 15135  1.8905 0.169147    
## AGREEMENT:PERSONGRAM  2 15134  3.3222 0.189929    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 4th model with the interaction AGREEMENT:PERSONGRAM deleted:

summary(m.04 <- update( # make m.04 an update of
   m.03, .~.            # m.03, namely all of it (.~.), but then
   - AGREEMENT:PERSONGRAM)) # remove this interaction
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + 
##     NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + 
##     AGREEMENT:NUMBERGRAM, data = d, na.action = na.exclude, Hess = TRUE)
## 
## Coefficients:
##                                    Value Std. Error t value
## AGREEMENTyes                     0.76959    0.11295  6.8139
## EMPHPRONyes                      0.01133    0.07850  0.1443
## SUBJORDERsemgram                 0.26537    0.07813  3.3965
## NUMBERGRAMsingular              -0.41025    0.10669 -3.8454
## PERSONGRAMsecond                 0.08777    0.07689  1.1415
## PERSONGRAMthird                  0.21723    0.11466  1.8946
## AGREEMENTyes:EMPHPRONyes        -0.13700    0.11004 -1.2449
## AGREEMENTyes:SUBJORDERsemgram    0.38909    0.11063  3.5171
## AGREEMENTyes:NUMBERGRAMsingular  0.48785    0.11089  4.3995
## 
## Intercepts:
##       Value   Std. Error t value
## -3|-2 -1.3213  0.1329    -9.9452
## -2|-1 -0.5594  0.1305    -4.2860
## -1|0  -0.0895  0.1301    -0.6880
## 0|1    0.2674  0.1301     2.0556
## 1|2    0.6350  0.1304     4.8681
## 2|3    1.5310  0.1323    11.5759
## 
## Residual Deviance: 15104.28 
## AIC: 15134.28
anova(m.03, m.04, # compare m.03 to m.04
   test="Chisq")  # using a LRT
## Likelihood ratio tests of ordinal regression models
## 
## Response: JUDGMENT
##                                                                                                                                                 Model
## 1                        AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## 2 AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM + AGREEMENT:PERSONGRAM
##   Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1      4285   15104.28                                
## 2      4283   15100.95 1 vs 2     2 3.322208 0.1899291
drop1(m.04,      # drop each droppable predictor at a time from m.04 &
   test="Chisq") # test its significance w/ a LRT
## Single term deletions
## 
## Model:
## JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + 
##     AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
##                      Df   AIC     LRT  Pr(>Chi)    
## <none>                  15134                      
## PERSONGRAM            2 15134  3.6143 0.1641241    
## AGREEMENT:EMPHPRON    1 15134  1.5498 0.2131632    
## AGREEMENT:SUBJORDER   1 15145 12.3871 0.0004323 ***
## AGREEMENT:NUMBERGRAM  1 15152 19.3703 1.077e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 5th model with the interaction AGREEMENT:EMPHPRON deleted:

summary(m.05 <- update(   # make m.05 an update of
   m.04, .~.              # m.04, namely all of it (.~.), but then
   - AGREEMENT:EMPHPRON)) # remove this interaction
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + 
##     NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM, 
##     data = d, na.action = na.exclude, Hess = TRUE)
## 
## Coefficients:
##                                    Value Std. Error t value
## AGREEMENTyes                     0.69713    0.09674   7.206
## EMPHPRONyes                     -0.05837    0.05502  -1.061
## SUBJORDERsemgram                 0.26392    0.07812   3.378
## NUMBERGRAMsingular              -0.42288    0.10623  -3.981
## PERSONGRAMsecond                 0.08768    0.07688   1.141
## PERSONGRAMthird                  0.21233    0.11460   1.853
## AGREEMENTyes:SUBJORDERsemgram    0.39157    0.11061   3.540
## AGREEMENTyes:NUMBERGRAMsingular  0.49044    0.11087   4.424
## 
## Intercepts:
##       Value    Std. Error t value 
## -3|-2  -1.3652   0.1282   -10.6526
## -2|-1  -0.6033   0.1257    -4.7990
## -1|0   -0.1334   0.1252    -1.0654
## 0|1     0.2235   0.1252     1.7846
## 1|2     0.5910   0.1256     4.7069
## 2|3     1.4865   0.1273    11.6748
## 
## Residual Deviance: 15105.83 
## AIC: 15133.83
anova(m.04, m.05, # compare m.04 to m.05
   test="Chisq")  # using a LRT
## Likelihood ratio tests of ordinal regression models
## 
## Response: JUDGMENT
##                                                                                                                          Model
## 1                      AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## 2 AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:EMPHPRON + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
##   Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1      4286   15105.83                                
## 2      4285   15104.28 1 vs 2     1 1.549812 0.2131632
drop1(m.05,      # drop each droppable predictor at a time from m.05 &
   test="Chisq") # test its significance w/ a LRT
## Single term deletions
## 
## Model:
## JUDGMENT ~ AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + 
##     AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
##                      Df   AIC     LRT  Pr(>Chi)    
## <none>                  15134                      
## EMPHPRON              1 15133  1.1264 0.2885398    
## PERSONGRAM            2 15133  3.4489 0.1782729    
## AGREEMENT:SUBJORDER   1 15144 12.5500 0.0003962 ***
## AGREEMENT:NUMBERGRAM  1 15151 19.5864 9.615e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 6th model with the main effect EMPHPRON deleted:

summary(m.06 <- update( # make m.06 an update of
   m.05, .~.            # m.05, namely all of it (.~.), but then
   - EMPHPRON))         # remove this predictor
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + 
##     PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM, 
##     data = d, na.action = na.exclude, Hess = TRUE)
## 
## Coefficients:
##                                    Value Std. Error t value
## AGREEMENTyes                     0.70368    0.09656   7.288
## SUBJORDERsemgram                 0.26511    0.07811   3.394
## NUMBERGRAMsingular              -0.41126    0.10563  -3.893
## PERSONGRAMsecond                 0.08941    0.07685   1.163
## PERSONGRAMthird                  0.21791    0.11446   1.904
## AGREEMENTyes:SUBJORDERsemgram    0.39147    0.11060   3.539
## AGREEMENTyes:NUMBERGRAMsingular  0.47677    0.11012   4.330
## 
## Intercepts:
##       Value    Std. Error t value 
## -3|-2  -1.3264   0.1228   -10.8030
## -2|-1  -0.5646   0.1203    -4.6947
## -1|0   -0.0948   0.1198    -0.7917
## 0|1     0.2619   0.1198     2.1853
## 1|2     0.6293   0.1202     5.2342
## 2|3     1.5247   0.1222    12.4814
## 
## Residual Deviance: 15106.95 
## AIC: 15132.95
anova(m.05, m.06, # compare m.05 to m.06
   test="Chisq")  # using a LRT
## Likelihood ratio tests of ordinal regression models
## 
## Response: JUDGMENT
##                                                                                                     Model
## 1            AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## 2 AGREEMENT + EMPHPRON + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
##   Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1      4287   15106.95                                
## 2      4286   15105.83 1 vs 2     1 1.126422 0.2885398
drop1(m.06,      # drop each droppable predictor at a time from m.06 &
   test="Chisq") # test its significance w/ a LRT
## Single term deletions
## 
## Model:
## JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM + 
##     AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
##                      Df   AIC     LRT  Pr(>Chi)    
## <none>                  15133                      
## PERSONGRAM            2 15133  3.6433 0.1617624    
## AGREEMENT:SUBJORDER   1 15144 12.5430 0.0003977 ***
## AGREEMENT:NUMBERGRAM  1 15150 18.7609 1.482e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s fit the 7th model with the predictor PERSONGRAM deleted:

summary(m.07 <- update( # make m.07 an update of
   m.06, .~.            # m.06, namely all of it (.~.), but then
   - PERSONGRAM))       # remove this predictor
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + 
##     AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM, data = d, na.action = na.exclude, 
##     Hess = TRUE)
## 
## Coefficients:
##                                   Value Std. Error t value
## AGREEMENTyes                     0.7064    0.09623   7.341
## SUBJORDERsemgram                 0.2666    0.07795   3.421
## NUMBERGRAMsingular              -0.5460    0.07867  -6.941
## AGREEMENTyes:SUBJORDERsemgram    0.3958    0.10971   3.608
## AGREEMENTyes:NUMBERGRAMsingular  0.4667    0.11000   4.243
## 
## Intercepts:
##       Value    Std. Error t value 
## -3|-2  -1.4980   0.0786   -19.0583
## -2|-1  -0.7364   0.0743    -9.9046
## -1|0   -0.2670   0.0732    -3.6463
## 0|1     0.0893   0.0730     1.2232
## 1|2     0.4562   0.0733     6.2265
## 2|3     1.3510   0.0760    17.7800
## 
## Residual Deviance: 15110.60 
## AIC: 15132.60
anova(m.06, m.07, # compare m.06 to m.07
   test="Chisq")  # using a LRT
## Likelihood ratio tests of ordinal regression models
## 
## Response: JUDGMENT
##                                                                                          Model
## 1              AGREEMENT + SUBJORDER + NUMBERGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
## 2 AGREEMENT + SUBJORDER + NUMBERGRAM + PERSONGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
##   Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1      4289   15110.60                                
## 2      4287   15106.95 1 vs 2     2 3.643254 0.1617624
drop1(m.07,      # drop each droppable predictor at a time from m.07 &
   test="Chisq") # test its significance w/ a LRT
## Single term deletions
## 
## Model:
## JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + AGREEMENT:SUBJORDER + 
##     AGREEMENT:NUMBERGRAM
##                      Df   AIC    LRT  Pr(>Chi)    
## <none>                  15133                     
## AGREEMENT:SUBJORDER   1 15144 13.034 0.0003059 ***
## AGREEMENT:NUMBERGRAM  1 15149 18.017  2.19e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can’t delete any more predictors, so we’re done with model selection:

summary(m.final <- m.07); rm(m.07); invisible(gc())
## Call:
## polr(formula = JUDGMENT ~ AGREEMENT + SUBJORDER + NUMBERGRAM + 
##     AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM, data = d, na.action = na.exclude, 
##     Hess = TRUE)
## 
## Coefficients:
##                                   Value Std. Error t value
## AGREEMENTyes                     0.7064    0.09623   7.341
## SUBJORDERsemgram                 0.2666    0.07795   3.421
## NUMBERGRAMsingular              -0.5460    0.07867  -6.941
## AGREEMENTyes:SUBJORDERsemgram    0.3958    0.10971   3.608
## AGREEMENTyes:NUMBERGRAMsingular  0.4667    0.11000   4.243
## 
## Intercepts:
##       Value    Std. Error t value 
## -3|-2  -1.4980   0.0786   -19.0583
## -2|-1  -0.7364   0.0743    -9.9046
## -1|0   -0.2670   0.0732    -3.6463
## 0|1     0.0893   0.0730     1.2232
## 1|2     0.4562   0.0733     6.2265
## 2|3     1.3510   0.0760    17.7800
## 
## Residual Deviance: 15110.60 
## AIC: 15132.60
(m.final.tscores <- summary(m.final)$coefficients[,"t value"])
##                    AGREEMENTyes                SUBJORDERsemgram 
##                        7.341353                        3.420638 
##              NUMBERGRAMsingular   AGREEMENTyes:SUBJORDERsemgram 
##                       -6.941128                        3.607670 
## AGREEMENTyes:NUMBERGRAMsingular                           -3|-2 
##                        4.242926                      -19.058294 
##                           -2|-1                            -1|0 
##                       -9.904564                       -3.646271 
##                             0|1                             1|2 
##                        1.223185                        6.226538 
##                             2|3 
##                       17.779967
(m.final.pvalues <- 2*pnorm(abs(m.final.tscores), lower.tail=FALSE))
##                    AGREEMENTyes                SUBJORDERsemgram 
##                    2.114448e-13                    6.247452e-04 
##              NUMBERGRAMsingular   AGREEMENTyes:SUBJORDERsemgram 
##                    3.889817e-12                    3.089594e-04 
## AGREEMENTyes:NUMBERGRAMsingular                           -3|-2 
##                    2.206240e-05                    5.607383e-81 
##                           -2|-1                            -1|0 
##                    3.977021e-23                    2.660737e-04 
##                             0|1                             1|2 
##                    2.212599e-01                    4.768550e-10 
##                             2|3 
##                    1.010396e-70

Let’s also compute the overall p-value for this model:

anova(m.00, m.final, # compare the null model to m.final
   test="Chisq")     # using a LRT
## Likelihood ratio tests of ordinal regression models
## 
## Response: JUDGMENT
##                                                                             Model
## 1                                                                               1
## 2 AGREEMENT + SUBJORDER + NUMBERGRAM + AGREEMENT:SUBJORDER + AGREEMENT:NUMBERGRAM
##   Resid. df Resid. Dev   Test    Df LR stat. Pr(Chi)
## 1      4294   15669.11                              
## 2      4289   15110.60 1 vs 2     5 558.5191       0
pchisq(              # compute the area under the chi-squared curve
   q=558.5191,       # for this chi-squared value
   df=5,             # at 5 df
   lower.tail=FALSE) # only using the right/upper tail/side
## [1] 1.848523e-118
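
(As an aside, this kind of backward model selection can also be automated; here is a minimal sketch using MASS::stepAIC. Note that it is AIC-based, so it need not retrace the p-value-driven path we took above.)

m.auto <- stepAIC(m.01,    # start from the full initial model m.01
   direction="backward",   # only consider dropping terms
   trace=FALSE)            # suppress the step-by-step log
formula(m.auto)            # inspect which terms were retained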

On to model quality: We make ‘predictions’ …

d <- cbind(d,       # add to d the columns resulting from
   predict(         # predicting
      m.final,      # from m.final
      type="prob")) # predicted probabilities
# give those columns better names
names(d)[9:15] <- paste0( # rename those columns to the result of pasting
   "PREDS.",              # "PREDS."
   names(d)[9:15])        # in front of the existing names
d <- cbind(d,         # add to d a column
   PREDS.CAT=predict( # called PREDS.CAT, the predictions
      m.final,        # from m.final, namely
      type="class"))  # categorical predictions
head(d) # check the result
##   CASE JUDGMENT AGREEMENT    NEGATION EMPHPRON SUBJORDER NUMBERGRAM PERSONGRAM
## 1    1        3        no affirmative      yes   gramsem     plural      third
## 2    2       -3        no affirmative      yes   gramsem     plural      third
## 3    3        3        no affirmative      yes   gramsem     plural      third
## 4    4        3        no affirmative      yes   gramsem     plural      third
## 5    5       -3        no affirmative      yes   gramsem     plural      third
## 6    6        2        no affirmative      yes   gramsem     plural      third
##    PREDS.-3  PREDS.-2 PREDS.-1    PREDS.0    PREDS.1   PREDS.2   PREDS.3
## 1 0.1827313 0.1410666 0.109845 0.08866082 0.08981659 0.1821742 0.2057054
## 2 0.1827313 0.1410666 0.109845 0.08866082 0.08981659 0.1821742 0.2057054
## 3 0.1827313 0.1410666 0.109845 0.08866082 0.08981659 0.1821742 0.2057054
## 4 0.1827313 0.1410666 0.109845 0.08866082 0.08981659 0.1821742 0.2057054
## 5 0.1827313 0.1410666 0.109845 0.08866082 0.08981659 0.1821742 0.2057054
## 6 0.1827313 0.1410666 0.109845 0.08866082 0.08981659 0.1821742 0.2057054
##   PREDS.CAT
## 1         3
## 2         3
## 3         3
## 4         3
## 5         3
## 6         3

… and then we evaluate them (again, confusion matrices are less informative here than for the previous kinds of models because they ignore how far off a wrong prediction is):

(c.m <- table(           # confusion matrix: cross-tabulate
   "OBS"  =d$JUDGMENT,   # observed judgments in the rows
   "PREDS"=d$PREDS.CAT)) # predicted judgments in the columns
##     PREDS
## OBS    -3   -2   -1    0    1    2    3
##   -3  301    0    0    0    0    0  319
##   -2  191    0    0    0    0    0  289
##   -1  136    0    0    0    0    0  251
##   0    88    0    0    0    0    0  241
##   1    93    0    0    0    0    0  262
##   2   160    0    0    0    0    0  666
##   3   182    0    0    0    0    0 1121
c(tapply(as.character(d$JUDGMENT)==d$PREDS.CAT, d$JUDGMENT, mean), # accuracies for ...
  "overall"=mean(as.character(d$JUDGMENT)==d$PREDS.CAT)) # ... each level & overall
##        -3        -2        -1         0         1         2         3   overall 
## 0.4854839 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.8603223 0.3306977
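
One could also test whether the overall accuracy is significantly better than baseline 1, e.g. with an exact one-sided binomial test (a sketch; presumably this is also what the pbinom value returned by summarize.polr below refers to):

binom.test(
   x=sum(as.character(d$JUDGMENT)==d$PREDS.CAT), # number of correct predictions
   n=nrow(d),                                    # number of cases
   p=baselines["baseline 1"],                    # chance level: baseline 1
   alternative="greater")                        # one-sided: better than baseline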

How does the model compare to our simulated baseline?

(sum.of.preddiffs.from.obs <- sum(abs("-"(
   as.numeric(d$PREDS.CAT),
   as.numeric(d$JUDGMENT)))))
## [1] 9267
hist(collector, xlim=c(9250, 11750), main="")
   abline(v=quantile(collector, probs=c(0.025, 0.975)), lty=2, col="red")
   abline(v=sum(abs(as.numeric(d$JUDGMENT)-7)), lty=2, col="blue")
   abline(v=sum.of.preddiffs.from.obs, lty=2, col="green")

In terms of significance, the model does really well: the observed sum of absolute differences is so small that it is never reached in any of the 1000 random reshufflings:

sum(collector <= sum.of.preddiffs.from.obs)
## [1] 0
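
If one wanted to turn this into an actual Monte-Carlo p-value, a common (slightly conservative) estimate adds 1 to both the count and the number of reshufflings:

(sum(collector <= sum.of.preddiffs.from.obs) + 1) / # reshufflings at least as good as the model, plus 1,
   (length(collector) + 1)                          # divided by the number of reshufflings, plus 1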

At the same time, the effect size is again not great: the simulated baseline corresponds to missing the right judgment by an average of 2.49 points on our 7-point scale, and, even with all our modeling efforts, we are still off by an average of 2.16 points, not exactly great:

median(collector) / 4300         # the baseline: avg. number of points off per judgment
## [1] 2.49
sum.of.preddiffs.from.obs / 4300 # the model: avg. number of points off per judgment
## [1] 2.155116

Finally, we need the model’s R2-values. We begin with the widely used Nagelkerke’s R2, but then also add McFadden’s R2:

c("Nagelkerke's R2"=R2(m.final),
  "McFadden's R2"  =(m.00$deviance-m.final$deviance) / m.00$deviance)
## Nagelkerke's R2   McFadden's R2 
##      0.12507688      0.03564459
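
Since the helper function R2() is loaded from a file not shown here, here is, for transparency, a sketch of how Nagelkerke’s R2 can be computed directly from the two deviances (an assumption about what the helper implements, but this is the standard formula and should essentially reproduce the value above):

n <- nrow(d)                                # number of data points
cox.snell <- 1 - exp(                       # Cox & Snell's R2
   (deviance(m.final) - deviance(m.00)) / n)
cox.snell / (1 - exp(-deviance(m.00) / n))  # rescaled to its maximum: Nagelkerke's R2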

Of course, if you paid attention to the book and worked through everything there, you know that much of the above could have been obtained with a single function call:

summarize.polr(m.final)
## $Coefficients
##                                    Estimate  Std.Error        2.5%      97.5%
## -3|-2                           -1.49795116 0.07859839 -1.65200117 -1.3439012
## -2|-1                           -0.73637259 0.07434680 -0.88208964 -0.5906555
## -1|0                            -0.26700342 0.07322644 -0.41052461 -0.1234822
## 0|1                              0.08927411 0.07298498 -0.05377382  0.2323221
## 1|2                              0.45623362 0.07327244  0.31262227  0.5998450
## 2|3                              1.35100907 0.07598490  1.20208140  1.4999367
## AGREEMENTyes                     0.70643359 0.09622662  0.51783289  0.8950343
## SUBJORDERsemgram                 0.26664660 0.07795231  0.11386288  0.4194303
## NUMBERGRAMsingular              -0.54604077 0.07866744 -0.70022612 -0.3918554
## AGREEMENTyes:SUBJORDERsemgram    0.39580665 0.10971255  0.18077401  0.6108393
## AGREEMENTyes:NUMBERGRAMsingular  0.46670731 0.10999657  0.25111801  0.6822966
##                                          z     p2tailed
## -3|-2                           -19.058294 5.607383e-81
## -2|-1                            -9.904564 3.977021e-23
## -1|0                             -3.646271 2.660737e-04
## 0|1                               1.223185 2.212599e-01
## 1|2                               6.226538 4.768550e-10
## 2|3                              17.779967 1.010396e-70
## AGREEMENTyes                      7.341353 2.114448e-13
## SUBJORDERsemgram                  3.420638 6.247452e-04
## NUMBERGRAMsingular               -6.941128 3.889817e-12
## AGREEMENTyes:SUBJORDERsemgram     3.607670 3.089594e-04
## AGREEMENTyes:NUMBERGRAMsingular   4.242926 2.206240e-05
## 
## $`Confusion matrix`
##     PRED
## OBS    -3   -2   -1    0    1    2    3
##   -3  301    0    0    0    0    0  319
##   -2  191    0    0    0    0    0  289
##   -1  136    0    0    0    0    0  251
##   0    88    0    0    0    0    0  241
##   1    93    0    0    0    0    0  262
##   2   160    0    0    0    0    0  666
##   3   182    0    0    0    0    0 1121
## 
## $`Classification metrics`
##         Baseline      Acc. for -3      Acc. for -2      Acc. for -1 
##     3.030233e-01     4.854839e-01     0.000000e+00     0.000000e+00 
##       Acc. for 0       Acc. for 1       Acc. for 2       Acc. for 3 
##     0.000000e+00     0.000000e+00     0.000000e+00     8.603223e-01 
## Acc. for overall           pbinom          logloss 
##     3.306977e-01     4.761551e-05     1.757046e+00 
## 
## $R2s
## Deviance of null model Deviance of this model            McFadden R2 
##           1.566911e+04           1.511060e+04           3.564459e-02 
##          Nagelkerke R2 
##           1.250769e-01 
## 
## $`Model significance test`
##          LR statistic                    Df               P-value 
##          5.585191e+02          5.000000e+00         1.848534e-118 
## Number of data points 
##          4.300000e+03

5 Visual interpretation

Now let’s visualize the results of the two interactions based on the predictions; we begin with AGREEMENT:SUBJORDER:

agsu.d <- data.frame(        # make agsu.d a data frame of
   agsu <- effect(           # the effect
      "AGREEMENT:SUBJORDER", # of AGREEMENT:SUBJORDER
      m.final))              # in m.final
agsu.d[,1:9] # show the factor combinations & the predicted probabilities
##   AGREEMENT SUBJORDER   prob.X.3   prob.X.2   prob.X.1    prob.X0    prob.X1
## 1        no   gramsem 0.23067814 0.16037026 0.11556664 0.08791856 0.08457945
## 2       yes   gramsem 0.10323854 0.09455074 0.08497318 0.07743151 0.08810081
## 3        no   semgram 0.18677053 0.14292687 0.11054340 0.08874982 0.08947773
## 4       yes   semgram 0.05603033 0.05675284 0.05614353 0.05603617 0.07028809
##     prob.X2   prob.X3
## 1 0.1590298 0.1618572
## 2 0.2170518 0.3346534
## 3 0.1801735 0.2013581
## 4 0.2109386 0.4938105
plot(agsu,                     # plot the effect of AGREEMENT:SUBJORDER
   type="probability",         # but plot predicted probabilities
   ylim=c(0, 1),               # w/ these y-axis limits
   grid=TRUE,                  # & w/ a grid
   multiline=TRUE,             # put all lines in one panel
   confint=list(style="auto"), # keep confidence intervals
   lattice=list(key.args=list( # make the
      columns=5,               # legend have 5 columns &
      cex.title=0)))           # no title