Let’s load a data set for an example of a multinomial model; the data are in _input/dataltvoice.csv and you can find information about the variables/columns in _input/dataltvoice.r.
rm(list=ls(all.names=TRUE))library(car); library(effects); library(nnet) # & now load small helper functions:source("_helpers/R2s.r"); source("_helpers/summarize.multinom.r"); source("_helpers/cohens.kappa.r")summary(d <-read.delim( # summarize d, the result of loadingfile="_input/dataltvoice.csv", # this filestringsAsFactors=TRUE)) # change categorical variables into factors
CASE TRANSITIVITY REC_LEN PAT_LEN
Min. : 1.0 d_a:810 Min. : 1.000 Min. : 1.000
1st Qu.: 474.8 d_p:349 1st Qu.: 1.000 1st Qu.: 2.000
Median : 935.5 p_a:476 Median : 2.000 Median : 3.000
Mean : 939.4 p_p:141 Mean : 3.128 Mean : 4.443
3rd Qu.:1412.2 3rd Qu.: 3.000 3rd Qu.: 6.000
Max. :1871.0 Max. :44.000 Max. :33.000
REC_ANIM PAT_SEM REC_ACCESSIB PAT_ACCESSIB
animate :1106 abstract :1265 given:993 given: 312
inanimate: 670 concrete : 335 new :783 new :1464
information: 176
We want to know whether TRANSITIVITY is predictable from
REC_ANIM, REC_ACCESSIB, PAT_ACCESSIB;
the length difference of the recipient and the patient;
the interaction of the accessibility predictors;
the interaction of the REC_ANIM and the length difference.
2 Deviance & baseline(s)
Let’s already compute the baselines for what will be the response variable, TRANSITIVITY:
(baselines <-c("baseline 1"=max( # make baselines[1] the highestprop.table( # proportion in thetable(d$TRANSITIVITY))), # frequency table of the response"baseline 2"=sum( # make baselines[2] the sum of theprop.table( # proportions in thetable(d$TRANSITIVITY))^2))) # frequency table of the response squared
baseline 1 baseline 2
0.4560811 0.3247625
Let’s also compute the deviance of the null model:
d_a d_p p_a p_p
given given 98 48 39 20
new 469 165 117 37
new given 21 21 36 29
new 222 115 284 55
4 Modeling & numerical interpretation
Let’s fit our initial regression model:
summary(m.01<-multinom( # make/summarize the multinomial model m.01: TRANSITIVITY ~1+# TRANSITIVITY ~ an overall intercept (1) REC_ACCESSIB + PAT_ACCESSIB +# the accessibility predictors & ... REC_ACCESSIB:PAT_ACCESSIB +# their interaction REC_ANIM + LEN_RECmPAT_LOG +# the rec animacy & length diff preds & ... REC_ANIM:LEN_RECmPAT_LOG, # their interactiondata=d, # those vars are in dmodel=TRUE, # & save the model frame (optional)na.action=na.exclude)) # skip cases with NA/missing data
# weights: 32 (21 variable)
initial value 2462.058785
iter 10 value 1814.322650
iter 20 value 1767.454974
final value 1762.521921
converged
So there are definitely significant results. Since we can’t use drop1, let’s do anova two times, once for each interaction:
summary(m.02a <-update( # make m.02a an update of m.01, .~. # m.01, namely all of it (.~.), but then- REC_ACCESSIB:PAT_ACCESSIB)) # remove this interaction
# weights: 28 (18 variable)
initial value 2462.058785
iter 10 value 1801.290105
iter 20 value 1767.457695
final value 1765.590290
converged
anova(m.01, m.02a, # compare m.01 to m.02atest="Chisq") # using a LRT
Likelihood ratio tests of Multinomial Models
Response: TRANSITIVITY
Model
1 REC_ACCESSIB + PAT_ACCESSIB + REC_ANIM + LEN_RECmPAT_LOG + REC_ANIM:LEN_RECmPAT_LOG
2 1 + REC_ACCESSIB + PAT_ACCESSIB + REC_ACCESSIB:PAT_ACCESSIB + REC_ANIM + LEN_RECmPAT_LOG + REC_ANIM:LEN_RECmPAT_LOG
Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
1 5310 3531.181
2 5307 3525.044 1 vs 2 3 6.136739 0.1051438
summary(m.02b <-update( # make m.02b an update of m.01, .~. # m.01, namely all of it (.~.), but then- REC_ANIM:LEN_RECmPAT_LOG)) # remove this interaction
# weights: 28 (18 variable)
initial value 2462.058785
iter 10 value 1790.360208
iter 20 value 1764.915297
final value 1764.868000
converged
anova(m.01, m.02b, # compare m.01 to m.02btest="Chisq") # using a LRT
Likelihood ratio tests of Multinomial Models
Response: TRANSITIVITY
Model
1 REC_ACCESSIB + PAT_ACCESSIB + REC_ANIM + LEN_RECmPAT_LOG + REC_ACCESSIB:PAT_ACCESSIB
2 1 + REC_ACCESSIB + PAT_ACCESSIB + REC_ACCESSIB:PAT_ACCESSIB + REC_ANIM + LEN_RECmPAT_LOG + REC_ANIM:LEN_RECmPAT_LOG
Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
1 5310 3529.736
2 5307 3525.044 1 vs 2 3 4.692158 0.1957774
Both interactions are ns so we first delete the one with the higher p-value, i.e. we delete REC_ANIM:LEN_RECmPAT_LOG and go with m.02b. But can we also delete the other interaction?
summary(m.03<-update( # make m.03 an update of m.02b, .~. # m.02b, namely all of it (.~.), but then- REC_ACCESSIB:PAT_ACCESSIB)) # remove this interaction
# weights: 24 (15 variable)
initial value 2462.058785
iter 10 value 1794.993382
iter 20 value 1767.868111
final value 1767.866967
converged
Let’s also compute the overall p-value for this model:
anova(m.00, m.final, # compare the null model to m.finaltest="Chisq") # using a LRT
Likelihood ratio tests of Multinomial Models
Response: TRANSITIVITY
Model Resid. df Resid. Dev
1 1 5325 4375.423
2 REC_ACCESSIB + PAT_ACCESSIB + REC_ANIM + LEN_RECmPAT_LOG 5313 3535.734
Test Df LR stat. Pr(Chi)
1
2 1 vs 2 12 839.6888 0
pchisq( # compute the area under the chi-squared curveq=839.6888, # for this chi-squared valuedf=12, # at 12 dflower.tail=FALSE) # only using the right/upper tail/side
[1] 5.073936e-172
On to model quality: We make ‘predictions’ …
d <-cbind(d, # add to d the columns resulting frompredict( # predicting m.final, # from m.finaltype="prob")) # predicted probabilities# give those columns better namesnames(d)[10:13] <-paste0( # rename those columns to the result of pasting"PREDS.PP.", # "PREDS.PP."names(d)[10:13]) # in front of the existing namesd <-cbind(d, # add to d a column(dPREDS.CAT=predict( # called PREDS.CAT, the predictions m.final, # from m.final, namelytype="class")) # categorical predictionshead(d) # check the result
CASE TRANSITIVITY REC_LEN PAT_LEN REC_ANIM PAT_SEM REC_ACCESSIB
1 1 d_a 5 8 animate abstract new
2 2 d_a 2 3 animate abstract new
3 3 d_a 2 2 inanimate information given
4 4 p_a 5 3 inanimate abstract new
5 5 d_a 2 3 animate abstract new
6 6 d_a 1 4 animate concrete given
PAT_ACCESSIB LEN_RECmPAT_LOG PREDS.PP.d_a PREDS.PP.d_p PREDS.PP.p_a
1 new -0.6780719 0.4729958 0.2570037 0.2051686
2 new -0.5849625 0.4558345 0.2579896 0.2178814
3 new 0.0000000 0.3272929 0.2219580 0.3700426
4 new 0.7369656 0.1261111 0.1330745 0.6243988
5 new -0.5849625 0.4558345 0.2579896 0.2178814
6 given -2.0000000 0.6460504 0.2493809 0.0530535
PREDS.PP.p_p PREDS.CAT
1 0.06483201 d_a
2 0.06829445 d_a
3 0.08070648 p_a
4 0.11641564 p_a
5 0.06829445 d_a
6 0.05151519 d_a
… and then we evaluate them:
(c.m <-table( # confusion matrix: cross-tabulate"OBS"=d$TRANSITIVITY, # observed constructions in the rows"PREDS"=d$PREDS.CAT)) # predicted constructions in the columns
We can also add PREDS.PP.obs, like we did for binary logistic regression, but it’s a bit more complex here so I’ll just show how it’s done without more details:
d$PREDS.PP.obs <-predict(m.final, type="prob")[ # make this column the predictionsmatrix(c( # that are in the positions defined hereseq(nrow(d)), # one prediction for each rowas.numeric(d$TRANSITIVITY)), ncol=2)] # namely the column of the observed result
Why are we doing this? To see whether we can compute the deviance of the final model from its contributions to logloss – i.e., -log(d$PREDS.PP.obs)) – again. And we can, the logic is the same as before:
sum(-log(d$PREDS.PP.obs)) *2
[1] 3535.734
deviance(m.final)
[1] 3535.734
Of course, if you paid attention to the book and worked through everything there, much of this you could have done with this:
summarize.multinom(m.final)
# weights: 8 (3 variable)
initial value 2462.058785
iter 10 value 2187.711475
iter 10 value 2187.711469
final value 2187.711344
converged
Now let’s visualize the results of the predictors based on the predictions, one at a time. We begin with REC_ACCESSIB:
ra.d <-data.frame( # make ra.d a data frame of ra <-effect( # the effect"REC_ACCESSIB", # of REC_ACCESSIB m.final)) # in m.finalra.d[,1:5] # show just the predicted probabilities
REC_ACCESSIB prob.d_a prob.d_p prob.p_a prob.p_p
1 given 0.4995828 0.2663792 0.1736107 0.06042732
2 new 0.3951903 0.2374149 0.2789172 0.08847765
plot(ra, # plot the effect of REC_ACCESSIBtype="probability", # but plot predicted probabilitiesylim=c(0, 1), # w/ these y-axis limitsgrid=TRUE, # & w/ a gridlines=list(multiline=TRUE), # make all lines be in one panelconfint=list(style="auto"), # make the lines come with confidence bandslattice=list(key.args=list( # make the legend ...columns=4, # ... have 4 columns &cex.title=0))) # no title
Next, PAT_ACCESSIB:
pa.d <-data.frame( # make pa.d a data frame of pa <-effect( # the effect"PAT_ACCESSIB", # of PAT_ACCESSIB m.final)) # in m.finalpa.d[,1:5] # show just the predicted probabilities
PAT_ACCESSIB prob.d_a prob.d_p prob.p_a prob.p_p
1 given 0.3442863 0.2595748 0.2316187 0.16452012
2 new 0.4774871 0.2520581 0.2105525 0.05990231
plot(pa, # plot the effect of PAT_ACCESSIBtype="probability", # but plot predicted probabilitiesylim=c(0, 1), # w/ these y-axis limitsgrid=TRUE, # & w/ a gridlines=list(multiline=TRUE), # make all lines be in one panelconfint=list(style="auto"), # make the lines come with confidence bandslattice=list(key.args=list( # make the legend ...columns=4, # ... have 4 columns &cex.title=0))) # no title
REC_ANIM is next:
rca.d <-data.frame( # make rca.d a data frame of rca <-effect( # the effect"REC_ANIM", # of REC_ANIM m.final)) # in m.finalrca.d[,1:5] # show just the predicted probabilities
plot(rca, # plot the effect of REC_ANIMtype="probability", # but plot predicted probabilitiesylim=c(0, 1), # w/ these y-axis limitsgrid=TRUE, # & w/ a gridlines=list(multiline=TRUE), # make all lines be in one panelconfint=list(style="auto"), # make the lines come with confidence bandslattice=list(key.args=list( # make the legend ...columns=4, # ... have 4 columns &cex.title=0))) # no title
Finally, the numeric predictor LEN_RECmPAT_LOG:
ld.d <-data.frame( # make ld.d a data frame of ld <-effect( # the effect"LEN_RECmPAT_LOG", # of LEN_RECmPAT_LOG m.final)) # in m.finalld.d[,1:5] # show just the predicted probabilities
plot(ld, # plot the effect of LEN_RECmPAT_LOGtype="probability", # but plot predicted probabilitiesylim=c(0, 1), # w/ these y-axis limitsgrid=TRUE, # & w/ a gridlines=list(multiline=TRUE), # make all lines be in one panelconfint=list(style="auto"), # make the lines come with confidence bandslattice=list(key.args=list( # make the legend ...columns=4, # ... have 4 columns &cex.title=0))) # no title
6 Write-up
To determine whether TRANSITIVITY varies as a function of
REC_ANIM, REC_ACCESSIB, PAT_ACCESSIB;
the length difference of the recipient and the patient;
the interaction of the accessibility predictors;
the interaction of the REC_ANIM and the length difference,
a multinomial model selection process was undertaken (using backwards stepwise model selection of all predictors and controls based on significance testing). The final model resulting from the elimination of ns predictors contained all four main effects and was highly significant (LR=839.6888, df=12, p<0.0001) with a decent amount of explanatory/predictive power (McFadden’s R2=0.192, Nagelkerke’s R2=0.412); the overall accuracy of the model is 60.8%, but the accuracies/recalls for each construction are very varied [insert those values]. The summary table for the model is shown below:
Estimate
95%-CI
se
z
p2-tailed
Interceptd_a→d_p
-0.08
[-0.42, 0.27]
0.18
-0.43
0.6666
Interceptd_a→p_a
-0.41
[-0.80, -0.03]
0.20
-2.12
0.0336
Interceptd_a→p_p
-0.62
[-1.05, -0.19]
0.22
-2.81
0.005
REC_ACCESSIBgiven→new+d_a→d_p
0.12
[-0.16, 0.40]
0.14
0.83
0.4043
REC_ACCESSIBgiven→new+d_a→p_a
0.71
[ 0.41, 1.01]
0.15
4.66
<0.0001
REC_ACCESSIBgiven→new+d_a→p_p
0.62
[ 0.20, 1.03]
0.21
2.90
0.0037
PAT_ACCESSIBgiven→new+d_a→d_p
-0.36
[-0.70, -0.02]
0.17
-2.06
0.0397
PAT_ACCESSIBgiven→new+d_a→p_a
-0.42
[-0.80, -0.04]
0.19
-2.17
0.0296
PAT_ACCESSIBgiven→new+d_a→p_p
-1.34
[-1.79, -0.89]
0.23
-5.83
<0.0001
REC_ANIManim→inanim+d_a→d_p
0.04
[-0.24, 0.33]
0.15
0.30
0.7627
REC_ANIManim→inanim+d_a→p_a
0.96
[ 0.67, 1.25]
0.15
6.43
<0.0001
REC_ANIManim→inanim+d_a→p_p
0.55
[ 0.15, 0.96]
0.21
2.68
0.0074
LEN_RECmPAT_LOG0→1+d_a→d_p
0.44
[ 0.34, 0.54]
0.05
8.48
<0.0001
LEN_RECmPAT_LOG0→1+d_a→p_a
1.04
[ 0.93, 1.16]
0.06
17.74
<0.0001
LEN_RECmPAT_LOG0→1+d_a→p_p
0.96
[ 0.81, 1.10]
0.07
12.85
<0.0001
The way the main effects in the model are correlated with TRANSITIVITY are as follows [show plots]:
with regard to REC_ACCESSIB,
ditransitive actives become much more likely with given recipients;
prepositional dative actives become much more likely with new recipients;
the predicted probabilities of the passives do not differ much in response to REC_ACCESSIB;
with regard to PAT_ACCESSIB,
ditransitive actives become much more likely with new patients;
prepositional dative passives become much more likely with given patients;
the predicted probabilities of the other two constructions do not differ much in response to PAT_ACCESSIB;
with regard to REC_ANIM,
ditransitive actives become much more likely with animate recipients;
prepositional dative actives become much more likely with inanimate recipients;
ditransitive passives become somewhat more likely with animate recipients;
the predicted probabilities of the prepositional dative passives do not differ much in response to REC_ANIM;
with regard to the length difference predictor,
ditransitive actives become much more likely when the recipient is shorter then the patient;
prepositional dative actives become much more likely when the recipient is longer then the patient;
the predicted probabilities of the passives do not differ much in response to the length difference predictor.
7 Excursus 1: prototypes
Note that, just like last session, we can use the predicted probabilities to explore the prototypical uses, the best examples, of these constructions:
d.split <-split(d, d$TRANSITIVITY) # split up data by construction# the prototype of the ditransitive activehead(unique(d.split$d_a[order(d.split$d_a$PREDS.PP.d_a, decreasing=TRUE),c(2, 5, 7:9, 10)]), 20)
TRANSITIVITY REC_ANIM REC_ACCESSIB PAT_ACCESSIB LEN_RECmPAT_LOG
963 d_a animate given new -5.044394
1665 d_a animate given new -4.857981
728 d_a animate given new -4.392317
217 d_a animate given new -4.321928
784 d_a animate given new -4.247928
54 d_a animate given new -4.169925
247 d_a animate new new -4.459432
623 d_a animate given new -4.087463
529 d_a animate given new -4.000000
663 d_a animate given new -3.906891
724 d_a animate given new -3.807355
1164 d_a inanimate given new -4.000000
1278 d_a animate given new -3.700440
353 d_a animate new new -4.000000
399 d_a inanimate given new -3.906891
231 d_a animate given new -3.584963
142 d_a animate given new -3.459432
899 d_a inanimate given new -3.700440
199 d_a animate given new -3.321928
801 d_a animate given given -4.087463
PREDS.PP.d_a
963 0.9305632
1665 0.9247412
728 0.9079811
217 0.9051423
784 0.9020637
54 0.8987107
247 0.8962209
623 0.8950415
529 0.8910051
663 0.8865381
724 0.8815612
1164 0.8768473
1278 0.8759735
353 0.8725102
399 0.8713634
231 0.8696441
142 0.8624003
899 0.8582752
199 0.8540085
801 0.8511656
# the prototype of the prepositional dative activehead(unique(d.split$p_a[order(d.split$p_a$PREDS.PP.p_a, decreasing=TRUE),c(2, 5, 7:9, 12)]), 20)
TRANSITIVITY REC_ANIM REC_ACCESSIB PAT_ACCESSIB LEN_RECmPAT_LOG
1627 p_a inanimate new new 4.247928
999 p_a inanimate new new 4.087463
1447 p_a inanimate new new 3.954196
239 p_a inanimate new new 3.906891
188 p_a inanimate new new 3.807355
1538 p_a inanimate new new 3.584963
380 p_a inanimate new new 3.459432
1437 p_a inanimate new new 3.321928
163 p_a inanimate new new 3.169925
205 p_a inanimate given new 4.087463
170 p_a inanimate new new 3.115477
727 p_a inanimate new new 3.087463
150 p_a inanimate new new 3.000000
121 p_a inanimate given new 3.857981
109 p_a inanimate new new 2.807355
1646 p_a inanimate given new 3.700440
143 p_a inanimate new new 2.700440
47 p_a inanimate new new 2.584963
186 p_a inanimate new new 2.459432
951 p_a inanimate given new 3.321928
PREDS.PP.p_a
1627 0.8560652
999 0.8520718
1447 0.8485311
239 0.8472217
188 0.8443709
1538 0.8374944
380 0.8332723
1437 0.8283350
163 0.8224616
205 0.8223316
170 0.8202437
727 0.8190779
150 0.8153269
121 0.8129771
109 0.8064299
1646 0.8059040
143 0.8010844
47 0.7949559
186 0.7878461
951 0.7863926
Nice:
the prototypical ditransitive active involves
a short, given, and animate recipient;
a long new patient;
examples such as He told him a very long and convoluted story;
the prototypical prepositional dative active involves
a long, new, and inanimate recipient;
a short new patient;
examples such as She gave some serious thought to the situation that had presented itself to her.
The above solution used the predicted probabilities for the attested cases, the data in d. And in this particular case, this approach is probably sufficient because we don’t have that many predictors and most have them have only few levels so our data cover what’s possible relatively well. But remember from last session that we can also use all possible combinations of predictors’ levels, values, or value ranges. For that, but also for other reasons, we should really always create a data frame like nd here, which contains all possible combinations of things:
That data frame can then be sorted by these 4 new columns of predicted probabilities to see what combinations come out on top for each construction. Here’s the result of this for the same two constructions as above:
REC_ACCESSIB PAT_ACCESSIB REC_ANIM LEN_RECmPAT_LOG d_a
3 given new animate -5.044394 0.9305632
7 given new inanimate -5.044394 0.9239690
4 new new animate -5.044394 0.9200185
8 new new inanimate -5.044394 0.9095308
1 given given animate -5.044394 0.9012998
5 given given inanimate -5.044394 0.8906063
11 given new animate -3.918632 0.8871113
2 new given animate -5.044394 0.8852398
15 given new inanimate -3.918632 0.8720688
6 new given inanimate -5.044394 0.8680667
REC_ACCESSIB PAT_ACCESSIB REC_ANIM LEN_RECmPAT_LOG p_a
80 new new inanimate 5.087463 0.8731474
79 given new inanimate 5.087463 0.8530508
72 new new inanimate 3.961701 0.8487362
71 given new inanimate 3.961701 0.8173373
76 new new animate 5.087463 0.8093854
64 new new inanimate 2.835939 0.8078080
75 given new animate 5.087463 0.7747445
68 new new animate 3.961701 0.7661525
63 given new inanimate 2.835939 0.7550298
78 new given inanimate 5.087463 0.7491943
Here, the results don’t change all that much, but there are scenarios where this can make a bigger difference.
8 Excursus 2: predictions that are most wrong
It can again be interesting to explore the case where the model does worst, i.e. where the contributions to logloss are highest:
lapply( # apply to d.split, # the list w/ the 4 data frames \(af) # an anonymous function (SFLWR 3.5.4)head(af[ # that takes the head of each data frameorder( # when the data frame is sorted-log(af$PREDS.PP.obs), # by the contributions to loglossdecreasing=TRUE),])) # in decreasing order
$d_a
CASE TRANSITIVITY REC_LEN PAT_LEN REC_ANIM PAT_SEM REC_ACCESSIB
558 592 d_a 4 1 inanimate abstract new
540 572 d_a 5 2 inanimate abstract new
148 158 d_a 13 2 animate abstract new
288 303 d_a 7 2 inanimate abstract new
780 825 d_a 6 2 inanimate abstract new
1063 1123 d_a 14 5 animate concrete new
PAT_ACCESSIB LEN_RECmPAT_LOG PREDS.PP.d_a PREDS.PP.d_p PREDS.PP.p_a
558 new 2.000000 0.04098797 0.07520977 0.7572852
540 given 1.321928 0.04348256 0.08467242 0.6044306
148 new 2.700440 0.04642982 0.11081498 0.6820561
288 new 1.807355 0.04909686 0.08279865 0.7420416
780 new 1.584963 0.06027015 0.09220711 0.7224019
1063 given 1.485427 0.07438122 0.14891127 0.4696682
PREDS.PP.p_p PREDS.CAT PREDS.PP.obs
558 0.1265171 p_a 0.04098797
540 0.2674144 p_a 0.04348256
148 0.1606991 p_a 0.04642982
288 0.1260629 p_a 0.04909686
780 0.1251208 p_a 0.06027015
1063 0.3070393 p_a 0.07438122
$d_p
CASE TRANSITIVITY REC_LEN PAT_LEN REC_ANIM PAT_SEM REC_ACCESSIB
1105 1167 d_p 17 1 inanimate abstract new
76 82 d_p 16 1 inanimate abstract new
1012 1064 d_p 10 2 inanimate abstract new
1235 1311 d_p 5 1 inanimate abstract new
606 642 d_p 6 1 inanimate abstract new
94 100 d_p 5 1 inanimate abstract new
PAT_ACCESSIB LEN_RECmPAT_LOG PREDS.PP.d_a PREDS.PP.d_p PREDS.PP.p_a
1105 new 4.087463 0.005232179 0.02395599 0.8520718
76 new 4.000000 0.005716255 0.02518862 0.8497722
1012 given 2.321928 0.016823555 0.05076708 0.6633506
1235 given 2.321928 0.016823555 0.05076708 0.6633506
606 new 2.584963 0.023381506 0.05543348 0.7949559
94 new 2.321928 0.030160357 0.06372313 0.7794830
PREDS.PP.p_p PREDS.CAT PREDS.PP.obs
1105 0.1187400 p_a 0.02395599
76 0.1193229 p_a 0.02518862
1012 0.2690588 p_a 0.05076708
1235 0.2690588 p_a 0.05076708
606 0.1262291 p_a 0.05543348
94 0.1266335 p_a 0.06372313
$p_a
CASE TRANSITIVITY REC_LEN PAT_LEN REC_ANIM PAT_SEM REC_ACCESSIB
795 841 p_a 1 18 inanimate information new
730 773 p_a 1 10 inanimate concrete given
1435 1517 p_a 2 17 inanimate abstract given
1413 1494 p_a 1 4 animate concrete given
32 34 p_a 1 10 inanimate concrete new
668 711 p_a 1 3 animate abstract given
PAT_ACCESSIB LEN_RECmPAT_LOG PREDS.PP.d_a PREDS.PP.d_p PREDS.PP.p_a
795 new -4.169925 0.8611907 0.1059172 0.02558573
730 new -3.321928 0.8305757 0.1314493 0.02941342
1435 new -3.087463 0.8107024 0.1421820 0.03665998
1413 new -2.000000 0.7434795 0.2009380 0.04001856
32 new -3.321928 0.7878249 0.1404810 0.05666255
668 new -1.584963 0.6952908 0.2253799 0.05768800
PREDS.PP.p_p PREDS.CAT PREDS.PP.obs
795 0.007306420 d_a 0.02558573
730 0.008561551 d_a 0.02941342
1435 0.010455667 d_a 0.03665998
1413 0.015563978 d_a 0.04001856
32 0.015031572 d_a 0.05666255
668 0.021641305 d_a 0.05768800
$p_p
CASE TRANSITIVITY REC_LEN PAT_LEN REC_ANIM PAT_SEM REC_ACCESSIB
93 99 p_p 1 7 animate concrete given
886 933 p_p 1 6 animate concrete given
1106 1168 p_p 2 12 animate concrete given
990 1042 p_p 2 10 animate abstract given
452 482 p_p 1 4 animate concrete given
889 936 p_p 3 10 animate information given
PAT_ACCESSIB LEN_RECmPAT_LOG PREDS.PP.d_a PREDS.PP.d_p PREDS.PP.p_a
93 new -2.807355 0.8179065 0.1552057 0.01897277
886 new -2.584963 0.7997475 0.1672878 0.02339257
1106 new -2.584963 0.7997475 0.1672878 0.02339257
990 new -2.321928 0.7760426 0.1821528 0.02986146
452 new -2.000000 0.7434795 0.2009380 0.04001856
889 new -1.736966 0.7137766 0.2164683 0.05054233
PREDS.PP.p_p PREDS.CAT PREDS.PP.obs
93 0.007915070 d_a 0.007915070
886 0.009572164 d_a 0.009572164
1106 0.009572164 d_a 0.009572164
990 0.011943120 d_a 0.011943120
452 0.015563978 d_a 0.015563978
889 0.019212724 d_a 0.019212724
In a real study, one could now check the concrete examples and their contexts for what might explain the unexpected speaker choices.
9 Homework
To prepare for next session, read SFLWR3, Section 5.4.2 on ordinal regression.
---title: "Ling 202: session 06: multinomial regression (key)"author: - name: "[Stefan Th. Gries](https://www.stgries.info)" affiliation: - UC Santa Barbara - JLU Giessen orcid: 0000-0002-6497-3958date: "2025-04-01 12:34:56"date-format: "DD MMM YYYY HH-mm-ss"editor: sourceformat: html: page-layout: full code-fold: false code-link: true code-copy: true code-tools: true code-line-numbers: true code-overflow: scroll number-sections: true smooth-scroll: true toc: true toc-depth: 4 number-depth: 4 toc-location: left monofont: lucida console tbl-cap-location: top fig-cap-location: bottom fig-width: 5 fig-height: 5 fig-format: png fig-dpi: 300 fig-align: center embed-resources: trueexecute: cache: false echo: true eval: true warning: false---# IntroLet's load a data set for an example of a multinomial model; the data are in [_input/dataltvoice.csv](_input/dataltvoice.csv) and you can find information about the variables/columns in [_input/dataltvoice.r](_input/dataltvoice.r).```{r}rm(list=ls(all.names=TRUE))library(car); library(effects); library(nnet) # & now load small helper functions:source("_helpers/R2s.r"); source("_helpers/summarize.multinom.r"); source("_helpers/cohens.kappa.r")summary(d <-read.delim( # summarize d, the result of loadingfile="_input/dataltvoice.csv", # this filestringsAsFactors=TRUE)) # change categorical variables into factors```We want to know whether `TRANSITIVITY` is predictable from* `REC_ANIM`, `REC_ACCESSIB`, `PAT_ACCESSIB`;* the length difference of the recipient and the patient;* the interaction of the accessibility predictors;* the interaction of the `REC_ANIM` and the length difference.# Deviance & baseline(s)Let's already compute the baselines for what will be the response variable, `TRANSITIVITY`:```{r}(baselines <-c("baseline 1"=max( # make baselines[1] the highestprop.table( # proportion in thetable(d$TRANSITIVITY))), # frequency table of the response"baseline 2"=sum( # make baselines[2] the sum of theprop.table( # proportions in thetable(d$TRANSITIVITY))^2))) # frequency table of the response squared```Let's also compute the deviance of the null model:```{r session06_nulldev}deviance(m.00 <- multinom(TRANSITIVITY ~ 1, data=d, na.action=na.exclude))```# Exploration & preparationSome exploration of the relevant variables and we prepare our length-difference predictor:```{r}# the predictor(s)/response on its/their ownhist(d$REC_LEN)hist(d$PAT_LEN)hist(d$LEN_RECmPAT_LOG <-log2(d$REC_LEN)-log2(d$PAT_LEN), main="")table(d$REC_ANIM)table(d$REC_ACCESSIB)table(d$PAT_ACCESSIB)table(d$REC_ACCESSIB, d$PAT_ACCESSIB)# the predictor(s) w/ the responsespineplot(d$TRANSITIVITY ~ d$LEN_RECmPAT_LOG)table(d$REC_ANIM, d$TRANSITIVITY)ftable(d$REC_ACCESSIB, d$PAT_ACCESSIB, d$TRANSITIVITY)```# Modeling & numerical interpretationLet's fit our initial regression model:```{r}summary(m.01<-multinom( # make/summarize the multinomial model m.01: TRANSITIVITY ~1+# TRANSITIVITY ~ an overall intercept (1) REC_ACCESSIB + PAT_ACCESSIB +# the accessibility predictors & ... REC_ACCESSIB:PAT_ACCESSIB +# their interaction REC_ANIM + LEN_RECmPAT_LOG +# the rec animacy & length diff preds & ... REC_ANIM:LEN_RECmPAT_LOG, # their interactiondata=d, # those vars are in dmodel=TRUE, # & save the model frame (optional)na.action=na.exclude)) # skip cases with NA/missing data```Let's also compute the *z*-scores for these coefficient values (however uninterpretable those are):```{r}(m.01.zscores <-summary(m.01)$coefficients /summary(m.01)$standard.errors)```And from that we compute the *p*-values:```{r}(m.01.pvalues <-2*pnorm(abs(m.01.zscores), lower.tail=FALSE))```So there are definitely significant results. Since we can't use `drop1`, let's do `anova` two times, once for each interaction:```{r}summary(m.02a <-update( # make m.02a an update of m.01, .~. # m.01, namely all of it (.~.), but then- REC_ACCESSIB:PAT_ACCESSIB)) # remove this interactionanova(m.01, m.02a, # compare m.01 to m.02atest="Chisq") # using a LRTsummary(m.02b <-update( # make m.02b an update of m.01, .~. # m.01, namely all of it (.~.), but then- REC_ANIM:LEN_RECmPAT_LOG)) # remove this interactionanova(m.01, m.02b, # compare m.01 to m.02btest="Chisq") # using a LRT```Both interactions are ns so we first delete the one with the higher *p*-value, i.e. we delete `REC_ANIM:LEN_RECmPAT_LOG` and go with `m.02b`. But can we also delete the other interaction?```{r}summary(m.03<-update( # make m.03 an update of m.02b, .~. # m.02b, namely all of it (.~.), but then- REC_ACCESSIB:PAT_ACCESSIB)) # remove this interactionanova(m.02b, m.03, # compare m.02b to m.03test="Chisq") # using a LRT```Yes, we can so we delete `REC_ACCESSIB:PAT_ACCESSIB` and go with `m.03`. But can we also delete main effects?```{r}summary(m.04a <-update( # make m.04a an update of m.03, .~. # m.03, namely all of it (.~.), but then- REC_ACCESSIB)) # remove this predictoranova(m.03, m.04a, test="Chisq") # compare models with a LRTsummary(m.04b <-update( # make m.04b an update of m.03, .~. # m.03, namely all of it (.~.), but then- PAT_ACCESSIB)) # remove this predictoranova(m.03, m.04b, test="Chisq") # compare models with a LRTsummary(m.04c <-update( # make m.04c an update of m.03, .~. # m.03, namely all of it (.~.), but then- REC_ANIM)) # remove this predictoranova(m.03, m.04c, test="Chisq") # compare models with a LRTsummary(m.04d <-update( # make m.04d an update of m.03, .~. # m.03, namely all of it (.~.), but then- LEN_RECmPAT_LOG)) # remove this predictoranova(m.03, m.04d, test="Chisq") # compare models with a LRT```No, we can't delete any more predictors, we're done with model selection:```{r}summary(m.final <- m.03); rm(m.04a, m.04b, m.04c, m.04d); invisible(gc())(m.final.zscores <-summary(m.final)$coefficients /summary(m.final)$standard.errors)(m.final.pvalues <-2*pnorm(abs(m.final.zscores), lower.tail=FALSE))```Let's also compute the overall *p*-value for this model:```{r}anova(m.00, m.final, # compare the null model to m.finaltest="Chisq") # using a LRTpchisq( # compute the area under the chi-squared curveq=839.6888, # for this chi-squared valuedf=12, # at 12 dflower.tail=FALSE) # only using the right/upper tail/side```On to model quality: We make 'predictions' ...```{r}d <-cbind(d, # add to d the columns resulting frompredict( # predicting m.final, # from m.finaltype="prob")) # predicted probabilities# give those columns better namesnames(d)[10:13] <-paste0( # rename those columns to the result of pasting"PREDS.PP.", # "PREDS.PP."names(d)[10:13]) # in front of the existing namesd <-cbind(d, # add to d a column(dPREDS.CAT=predict( # called PREDS.CAT, the predictions m.final, # from m.final, namelytype="class")) # categorical predictionshead(d) # check the result```... and then we evaluate them:```{r}(c.m <-table( # confusion matrix: cross-tabulate"OBS"=d$TRANSITIVITY, # observed constructions in the rows"PREDS"=d$PREDS.CAT)) # predicted constructions in the columnsc(tapply(d$TRANSITIVITY==d$PREDS.CAT, d$TRANSITIVITY, mean), # accuracies for ..."overall"=mean(d$TRANSITIVITY==d$PREDS.CAT)) # ... each level & overallcohens.kappa(c.m)```Plus, we need the model's *R*^2^-values and compute both McFadden's *R*^2^ and the more widely-used version of Nagelkerke's *R*^2^:```{r}R2s(m.final)```We can also add `PREDS.PP.obs`, like we did for binary logistic regression, but it's a bit more complex here so I'll just show how it's done without more details:```{r}d$PREDS.PP.obs <-predict(m.final, type="prob")[ # make this column the predictionsmatrix(c( # that are in the positions defined hereseq(nrow(d)), # one prediction for each rowas.numeric(d$TRANSITIVITY)), ncol=2)] # namely the column of the observed result```Why are we doing this? To see whether we can compute the deviance of the final model from its contributions to logloss -- i.e., `-log(d$PREDS.PP.obs))` -- again. And we can, the logic is the same as before:```{r}sum(-log(d$PREDS.PP.obs)) *2deviance(m.final)```Of course, if you paid attention to the book and worked through everything there, much of this you could have done with this:```{r}summarize.multinom(m.final)```# Visual interpretationNow let's visualize the results of the predictors based on the predictions, one at a time. We begin with `REC_ACCESSIB`:```{r}ra.d <-data.frame( # make ra.d a data frame of ra <-effect( # the effect"REC_ACCESSIB", # of REC_ACCESSIB m.final)) # in m.finalra.d[,1:5] # show just the predicted probabilitiesplot(ra, # plot the effect of REC_ACCESSIBtype="probability", # but plot predicted probabilitiesylim=c(0, 1), # w/ these y-axis limitsgrid=TRUE, # & w/ a gridlines=list(multiline=TRUE), # make all lines be in one panelconfint=list(style="auto"), # make the lines come with confidence bandslattice=list(key.args=list( # make the legend ...columns=4, # ... have 4 columns &cex.title=0))) # no title```Next, `PAT_ACCESSIB`:```{r}pa.d <-data.frame( # make pa.d a data frame of pa <-effect( # the effect"PAT_ACCESSIB", # of PAT_ACCESSIB m.final)) # in m.finalpa.d[,1:5] # show just the predicted probabilitiesplot(pa, # plot the effect of PAT_ACCESSIBtype="probability", # but plot predicted probabilitiesylim=c(0, 1), # w/ these y-axis limitsgrid=TRUE, # & w/ a gridlines=list(multiline=TRUE), # make all lines be in one panelconfint=list(style="auto"), # make the lines come with confidence bandslattice=list(key.args=list( # make the legend ...columns=4, # ... have 4 columns &cex.title=0))) # no title````REC_ANIM` is next:```{r}rca.d <-data.frame( # make rca.d a data frame of rca <-effect( # the effect"REC_ANIM", # of REC_ANIM m.final)) # in m.finalrca.d[,1:5] # show just the predicted probabilitiesplot(rca, # plot the effect of REC_ANIMtype="probability", # but plot predicted probabilitiesylim=c(0, 1), # w/ these y-axis limitsgrid=TRUE, # & w/ a gridlines=list(multiline=TRUE), # make all lines be in one panelconfint=list(style="auto"), # make the lines come with confidence bandslattice=list(key.args=list( # make the legend ...columns=4, # ... have 4 columns &cex.title=0))) # no title```Finally, the numeric predictor `LEN_RECmPAT_LOG`:```{r}ld.d <-data.frame( # make ld.d a data frame of ld <-effect( # the effect"LEN_RECmPAT_LOG", # of LEN_RECmPAT_LOG m.final)) # in m.finalld.d[,1:5] # show just the predicted probabilitiesplot(ld, # plot the effect of LEN_RECmPAT_LOGtype="probability", # but plot predicted probabilitiesylim=c(0, 1), # w/ these y-axis limitsgrid=TRUE, # & w/ a gridlines=list(multiline=TRUE), # make all lines be in one panelconfint=list(style="auto"), # make the lines come with confidence bandslattice=list(key.args=list( # make the legend ...columns=4, # ... have 4 columns &cex.title=0))) # no title```# Write-upTo determine whether `TRANSITIVITY` varies as a function of* `REC_ANIM`, `REC_ACCESSIB`, `PAT_ACCESSIB`;* the length difference of the recipient and the patient;* the interaction of the accessibility predictors;* the interaction of the `REC_ANIM` and the length difference,a multinomial model selection process was undertaken (using backwards stepwise model selection of all predictors and controls based on significance testing). The final model resulting from the elimination of ns predictors contained all four main effects and was highly significant (*LR*=839.6888, *df*=12, *p*<0.0001) with a decent amount of explanatory/predictive power (McFadden's *R*^2^=0.192, Nagelkerke's *R*^2^=0.412); the overall accuracy of the model is 60.8%, but the accuracies/recalls for each construction are very varied [insert those values]. The summary table for the model is shown below:| | Estimate | 95%-CI | *se* | *z* | *p*~2-tailed~ ||:------------------------------------------|:---------|:---------------|:-----|:------|:--------------|| Intercept~d_a→d_p~ | -0.08 | [-0.42, 0.27] | 0.18 | -0.43 | 0.6666 || Intercept~d_a→p_a~ | -0.41 | [-0.80, -0.03] | 0.20 | -2.12 | 0.0336 || Intercept~d_a→p_p~ | -0.62 | [-1.05, -0.19] | 0.22 | -2.81 | 0.005 || REC_ACCESSIB~given→new+d_a→d_p~ | 0.12 | [-0.16, 0.40] | 0.14 | 0.83 | 0.4043 || REC_ACCESSIB~given→new+d_a→p_a~ | 0.71 | [ 0.41, 1.01] | 0.15 | 4.66 | <0.0001 || REC_ACCESSIB~given→new+d_a→p_p~ | 0.62 | [ 0.20, 1.03] | 0.21 | 2.90 | 0.0037 || PAT_ACCESSIB~given→new+d_a→d_p~ | -0.36 | [-0.70, -0.02] | 0.17 | -2.06 | 0.0397 || PAT_ACCESSIB~given→new+d_a→p_a~ | -0.42 | [-0.80, -0.04] | 0.19 | -2.17 | 0.0296 || PAT_ACCESSIB~given→new+d_a→p_p~ | -1.34 | [-1.79, -0.89] | 0.23 | -5.83 | <0.0001 || REC_ANIM~anim→inanim+d_a→d_p~ | 0.04 | [-0.24, 0.33] | 0.15 | 0.30 | 0.7627 || REC_ANIM~anim→inanim+d_a→p_a~ | 0.96 | [ 0.67, 1.25] | 0.15 | 6.43 | <0.0001 || REC_ANIM~anim→inanim+d_a→p_p~ | 0.55 | [ 0.15, 0.96] | 0.21 | 2.68 | 0.0074 || LEN_RECmPAT_LOG~0→1+d_a→d_p~ | 0.44 | [ 0.34, 0.54] | 0.05 | 8.48 | <0.0001 || LEN_RECmPAT_LOG~0→1+d_a→p_a~ | 1.04 | [ 0.93, 1.16] | 0.06 | 17.74 | <0.0001 || LEN_RECmPAT_LOG~0→1+d_a→p_p~ | 0.96 | [ 0.81, 1.10] | 0.07 | 12.85 | <0.0001 |The way the main effects in the model are correlated with `TRANSITIVITY` are as follows [show plots]:* with regard to `REC_ACCESSIB`, + ditransitive actives become much more likely with given recipients; + prepositional dative actives become much more likely with new recipients; + the predicted probabilities of the passives do not differ much in response to `REC_ACCESSIB`;* with regard to `PAT_ACCESSIB`, + ditransitive actives become much more likely with new patients; + prepositional dative passives become much more likely with given patients; + the predicted probabilities of the other two constructions do not differ much in response to `PAT_ACCESSIB`;* with regard to `REC_ANIM`, + ditransitive actives become much more likely with animate recipients; + prepositional dative actives become much more likely with inanimate recipients; + ditransitive passives become somewhat more likely with animate recipients; + the predicted probabilities of the prepositional dative passives do not differ much in response to `REC_ANIM`;* with regard to the length difference predictor, + ditransitive actives become much more likely when the recipient is shorter then the patient; + prepositional dative actives become much more likely when the recipient is longer then the patient; + the predicted probabilities of the passives do not differ much in response to the length difference predictor.```{r}#| echo: false#| eval: falsera.d[,c(1, grep("prob.d_a", colnames(ra.d)))]ra.d[,c(1, grep("prob.p_a", colnames(ra.d)))]ra.d[,c(1, grep("prob.d_p", colnames(ra.d)))]ra.d[,c(1, grep("prob.p_p", colnames(ra.d)))]pa.d[,c(1, grep("prob.d_a", colnames(pa.d)))]pa.d[,c(1, grep("prob.p_a", colnames(pa.d)))]pa.d[,c(1, grep("prob.d_p", colnames(pa.d)))]pa.d[,c(1, grep("prob.p_p", colnames(pa.d)))]rca.d[,c(1, grep("prob.d_a", colnames(rca.d)))]rca.d[,c(1, grep("prob.p_a", colnames(rca.d)))]rca.d[,c(1, grep("prob.d_p", colnames(rca.d)))]rca.d[,c(1, grep("prob.p_p", colnames(rca.d)))]ld.d[,c(1, grep("prob.d_a", colnames(ld.d)))]ld.d[,c(1, grep("prob.p_a", colnames(ld.d)))]ld.d[,c(1, grep("prob.d_p", colnames(ld.d)))]ld.d[,c(1, grep("prob.p_p", colnames(ld.d)))]```# Excursus 1: prototypesNote that, just like last session, we can use the predicted probabilities to explore the prototypical uses, the best examples, of these constructions:```{r}d.split <-split(d, d$TRANSITIVITY) # split up data by construction# the prototype of the ditransitive activehead(unique(d.split$d_a[order(d.split$d_a$PREDS.PP.d_a, decreasing=TRUE),c(2, 5, 7:9, 10)]), 20)# the prototype of the prepositional dative activehead(unique(d.split$p_a[order(d.split$p_a$PREDS.PP.p_a, decreasing=TRUE),c(2, 5, 7:9, 12)]), 20)```Nice:* the prototypical ditransitive active involves + a short, given, and animate recipient; + a long new patient; + examples such as *He told him a very long and convoluted story*;* the prototypical prepositional dative active involves + a long, new, and inanimate recipient; + a short new patient; + examples such as *She gave some serious thought to the situation that had presented itself to her*.The above solution used the predicted probabilities for the attested cases, the data in `d`. And in this particular case, this approach is probably sufficient because we don't have that many predictors and most have them have only few levels so our data cover what's possible relatively well. But remember from last session that we can also use all possible combinations of predictors' levels, values, or value ranges. For that, but also for other reasons, we should really always create a data frame like `nd` here, which contains all possible combinations of things:```{r}summary(nd <-expand.grid(REC_ACCESSIB=levels(d$REC_ACCESSIB),PAT_ACCESSIB=levels(d$PAT_ACCESSIB),REC_ANIM=levels(d$REC_ANIM),LEN_RECmPAT_LOG=seq(min(d$LEN_RECmPAT_LOG),max(d$LEN_RECmPAT_LOG),length.out=10)))```And then we would add to it the predicted probabilities of all levels of the response variable:```{r}nd <-cbind(nd, predict(m.final, newdata=nd, type="probs"))```That data frame can then be sorted by these 4 new columns of predicted probabilities to see what combinations come out on top for each construction. Here's the result of this for the same two constructions as above:```{r}head(nd[order(nd$d_a, decreasing=TRUE), c(1:4, 5)], 10)head(nd[order(nd$p_a, decreasing=TRUE), c(1:4, 7)], 10)```Here, the results don't change all that much, but there are scenarios where this can make a bigger difference.# Excursus 2: predictions that are most wrongIt can again be interesting to explore the case where the model does worst, i.e. where the contributions to logloss are highest:```{r}lapply( # apply to d.split, # the list w/ the 4 data frames \(af) # an anonymous function (SFLWR 3.5.4)head(af[ # that takes the head of each data frameorder( # when the data frame is sorted-log(af$PREDS.PP.obs), # by the contributions to loglossdecreasing=TRUE),])) # in decreasing order```In a real study, one could now check the concrete examples and their contexts for what might explain the unexpected speaker choices.# HomeworkTo prepare for next session, read *SFLWR*^3^, Section 5.4.2 on ordinal regression.# Session info```{r}sessionInfo()```