1 Session 04: Binary logistic regression modeling 1

rm(list=ls(all.names=TRUE))
library(car); library(effects)
source("105_04-05_R2.r"); source("105_04-05_C.score.r") # load small helper functions
summary(x <- read.delim(      # summarize x, the result of loading
   file="105_04-05_GENs.csv", # this file
   stringsAsFactors=TRUE))    # change categorical variables into factors
##       CASE      SPEAKER       MODALITY    GENITIVE    POR_LENGTH    
##  Min.   :   2   nns:2666   spoken :1685   of:2720   Min.   :  1.00  
##  1st Qu.:1006   ns : 934   written:1915   s : 880   1st Qu.:  8.00  
##  Median :2018                                       Median : 11.00  
##  Mean   :2012                                       Mean   : 14.58  
##  3rd Qu.:3017                                       3rd Qu.: 17.00  
##  Max.   :4040                                       Max.   :204.00  
##    PUM_LENGTH         POR_ANIMACY   POR_FINAL_SIB        POR_DEF    
##  Min.   :  2.00   animate   : 920   absent :2721   definite  :2349  
##  1st Qu.:  6.00   collective: 607   present: 879   indefinite:1251  
##  Median :  9.00   inanimate :1671                                   
##  Mean   : 10.35   locative  : 243                                   
##  3rd Qu.: 13.00   temporal  : 159                                   
##  Max.   :109.00

Right away, let’s compute the baselines for what will be the response variable in all case studies, GENITIVE:

(baselines <- c(
   "baseline 1"=max(          # make baselines[1] the highest
      prop.table(             # proportion in the
         table(x$GENITIVE))), # frequency table of the response
   "baseline 2"=sum(             # make baselines[2] the sum of the
      prop.table(                # proportions in the
         table(x$GENITIVE))^2))) # frequency table of the response squared
## baseline 1 baseline 2 
##  0.7555556  0.6306173
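
To make baseline 2 a bit more concrete (this little check is not part of the original code): it is the accuracy we would expect if we predicted of and s at random, but proportionally to how frequent they are in the data:

props <- prop.table(table(x$GENITIVE)) # the proportions of "of" (~0.756) & "s" (~0.244)
props["of"]^2 + # P(predict "of") * P(observe "of") plus
   props["s"]^2 # P(predict "s") * P(observe "s"): ~0.6306, i.e. baseline 2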

1.1 Deviance

In the case of lm, we defined deviance as the sum of squared residuals and we said we want to reduce the deviance of our model relative to a null model by adding (hopefully good) predictors. What is deviance here in the case of a glm? To understand that, we first compute a null model:

summary(m.00 <- glm(GENITIVE ~ 1, data=x, family=binomial, na.action=na.exclude))
## 
## Call:
## glm(formula = GENITIVE ~ 1, family = binomial, data = x, na.action = na.exclude)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7487  -0.7487  -0.7487  -0.7487   1.6785  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.12847    0.03878   -29.1   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4004.3  on 3599  degrees of freedom
## Residual deviance: 4004.3  on 3599  degrees of freedom
## AIC: 4006.3
## 
## Number of Fisher Scoring iterations: 4
deviance(m.00)
## [1] 4004.273
sum(residuals(m.00)^2)
## [1] 4004.273

We see that the deviance of m.00 is still the sum of this model’s squared residuals (and also what is called “residual deviance” in the summary of m.00). But what are squared residuals here? We will discuss that in more detail after we have fitted our first ‘real model’ …
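
For now, just to see where that value of 4004.273 comes from (a sketch that is not part of the original code): with no predictors, the model predicts the overall sample proportions of of and s for every single case, and the deviance is -2 times the sum of the logged predicted probabilities of what was actually observed; we will return to this logic in the excursus below:

-2 * sum(log(                     # -2 times the summed logged
   prop.table(table(x$GENITIVE))[ # overall proportions of "of" & "s",
   as.character(x$GENITIVE)]))    # looked up for each observed genitive: ~4004.273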

1.2 A binary predictor

Does the choice of a genitive construction vary as a function of the modality in which the construction was produced (spoken vs. written)?

Some exploration of the relevant variables:

# the predictor(s)/response on its/their own
table(x$GENITIVE)
## 
##   of    s 
## 2720  880
table(x$MODALITY)
## 
##  spoken written 
##    1685    1915
# the predictor(s) w/ the response
(mod_x_gen <- table(x$MODALITY, x$GENITIVE))
##          
##             of    s
##   spoken  1232  453
##   written 1488  427

How would we have dealt with this in Ling 104? First, we would have computed a chi-squared test like this, where the chi-squared value is computed from the differences between observed and expected values:

chisq.test(mod_x_gen, correct=FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  mod_x_gen
## X-squared = 10.21, df = 1, p-value = 0.001397
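
Just as a reminder of how that value comes about (a small sketch that is not part of the original code), the chi-squared value is the sum of the squared differences between observed and expected frequencies, each divided by the expected frequency:

exp.vals <- chisq.test(                # make exp.vals the expected frequencies
   mod_x_gen, correct=FALSE)$expected  # of the cross-tabulation
sum((mod_x_gen-exp.vals)^2 / exp.vals) # the chi-squared value: ~10.21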

Second, we might have computed the odds ratio and the logged odds ratio for the s-genitives:

(odds.ratio <-
   (mod_x_gen[2,2] / mod_x_gen[2,1]) /
   (mod_x_gen[1,2] / mod_x_gen[1,1]))
## [1] 0.7804363
log(odds.ratio)
## [1] -0.2479022

How do we do this now? With a binary logistic regression, which returns the G2-statistic, a statistic that is computed from ratios of observed and expected values (not their differences, as with chi-squared), and which also returns the (logged) odds ratio. Let’s fit our regression model:

summary(m.01 <- glm(      # make/summarize the gen. linear model m.01:
   GENITIVE ~ 1 +         # GENITIVE ~ an overall intercept (1)
   MODALITY,              # & the predictor MODALITY
   data=x,                # those variables are in x
   family=binomial,       # the response is binary
   na.action=na.exclude)) # skip cases with NA/missing data
## 
## Call:
## glm(formula = GENITIVE ~ 1 + MODALITY, family = binomial, data = x, 
##     na.action = na.exclude)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7914  -0.7914  -0.7103  -0.7103   1.7325  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -1.00050    0.05495 -18.208  < 2e-16 ***
## MODALITYwritten -0.24790    0.07767  -3.192  0.00141 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4004.3  on 3599  degrees of freedom
## Residual deviance: 3994.1  on 3598  degrees of freedom
## AIC: 3998.1
## 
## Number of Fisher Scoring iterations: 4
pchisq(              # compute the area under the chi-squared curve
   q=4004.3-3994.1,  # for this chi-squared value: null - residual deviance
   df=3599-3598,     # at this df: null - residual df
   lower.tail=FALSE) # only using the right/upper tail/side
## [1] 0.001404407
drop1(           # drop each droppable predictor at a time
   m.01,         # from the model m.01 &
   test="Chisq") # test its significance w/ a LRT
## Single term deletions
## 
## Model:
## GENITIVE ~ 1 + MODALITY
##          Df Deviance    AIC    LRT Pr(>Chi)   
## <none>        3994.1 3998.1                   
## MODALITY  1   4004.3 4006.3 10.194 0.001409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
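
As a quick sanity check (a sketch that is not part of the original code, re-using mod_x_gen, odds.ratio, and m.01 from above), we can connect this output back to the Ling 104 quantities: the LRT value returned by drop1 is the G2-statistic computed from ratios of observed and expected frequencies, and the slope of MODALITY is exactly the logged odds ratio we computed by hand:

exp.vals <- chisq.test(               # the expected frequencies
   mod_x_gen, correct=FALSE)$expected # of the cross-tabulation (as above)
2 * sum(mod_x_gen *                   # G2: 2 times the sum of observed values times
   log(mod_x_gen/exp.vals))           # logged observed/expected ratios: ~10.194
coef(m.01)["MODALITYwritten"]         # the slope of MODALITY: ~-0.2479 ...
log(odds.ratio)                       # ... = the logged odds ratio from above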

(Note how similar the chi-squared value and the G2-statistic and their p-values are.) Let’s make ‘predictions’:

x$PREDS.NUM.2 <- predict( # make x$PREDS.NUM.2 the result of predicting
   m.01,                  # from m.01
   type="response")       # predicted probabilities of "s"
# if you don't say type="response", predict gives you log odds
x$PREDS.CAT <- factor(        # make x$PREDS.CAT the result of checking
   ifelse(x$PREDS.NUM.2>=0.5, # if the predicted prob. of "s" is >=0.5
      levels(x$GENITIVE)[2],  # if yes, predict "s"
      levels(x$GENITIVE)[1])) # otherwise, predict "of"

On top of that, let’s also add something else, namely a column called PREDS.NUM.obs that contains, for each case, the predicted probability of the level that was observed in the data, i.e. what the model should have predicted. How does that work? Consider the head of x:

head(x)
##   CASE SPEAKER MODALITY GENITIVE POR_LENGTH PUM_LENGTH POR_ANIMACY
## 1    2     nns   spoken       of         13          7  collective
## 2    3     nns   spoken       of         22          7     animate
## 3    4     nns   spoken       of         11          8     animate
## 4    5     nns   spoken       of         26          4  collective
## 5    6     nns   spoken        s          8          4     animate
## 6    7     nns   spoken        s          7          3     animate
##   POR_FINAL_SIB  POR_DEF PREDS.NUM.2 PREDS.CAT
## 1        absent definite   0.2688427        of
## 2        absent definite   0.2688427        of
## 3       present definite   0.2688427        of
## 4        absent definite   0.2688427        of
## 5        absent definite   0.2688427        of
## 6        absent definite   0.2688427        of

In the first case, the model predicted s with a probability of 0.2688427 and, thus, of with a probability of 1-0.2688427=0.7311573. The actual observation, what the speaker really said, is of, so PREDS.NUM.obs[1] should be 0.7311573. In the fifth case, the model again predicted s with a probability of 0.2688427 and, thus, of with a probability of 1-0.2688427=0.7311573. The actual observation, what the speaker really said, is s, so PREDS.NUM.obs[5] should be 0.2688427. How do we do that efficiently for all cases (and in the spirit of Section 3.5.2, efficiently means ‘without a loop’)?

x$PREDS.NUM.obs <- ifelse( # x$PREDS.NUM.obs is determined by ifelse
   x$GENITIVE=="s",        # if the obs genitive is the 2nd level of the response
     x$PREDS.NUM.2,        # take its predicted probability
   1-x$PREDS.NUM.2)        # otherwise take 1 minus its predicted probability
head(x)
##   CASE SPEAKER MODALITY GENITIVE POR_LENGTH PUM_LENGTH POR_ANIMACY
## 1    2     nns   spoken       of         13          7  collective
## 2    3     nns   spoken       of         22          7     animate
## 3    4     nns   spoken       of         11          8     animate
## 4    5     nns   spoken       of         26          4  collective
## 5    6     nns   spoken        s          8          4     animate
## 6    7     nns   spoken        s          7          3     animate
##   POR_FINAL_SIB  POR_DEF PREDS.NUM.2 PREDS.CAT PREDS.NUM.obs
## 1        absent definite   0.2688427        of     0.7311573
## 2        absent definite   0.2688427        of     0.7311573
## 3       present definite   0.2688427        of     0.7311573
## 4        absent definite   0.2688427        of     0.7311573
## 5        absent definite   0.2688427        of     0.2688427
## 6        absent definite   0.2688427        of     0.2688427

We’ll get to why this is useful in a moment … But let’s now evaluate the predictive capacity of the model:

(qwe <- table(           # confusion matrix: cross-tabulate
   "OBS"=x$GENITIVE,     # observed genitives in the rows
   "PREDS"=x$PREDS.CAT)) # predicted orders in the columns
##     PREDS
## OBS    of
##   of 2720
##   s   880
c("R2"=R2(m.01), "C"=C.score(m.01)) # R-squared & C
##          R2           C 
## 0.004212717 0.530915775
# evaluating the confusion matrix here in the usual way doesn't make much sense
# because the model never predicts "s" ...
# c( # evaluate the confusion matrix
#    "Class. acc." =mean(x$GENITIVE==x$PREDS.CAT), # (tp+tn) / (tp+tn+fp+fn)
#    "Prec. for s" =qwe[ "s", "s"] / sum(qwe[    , "s"]),
#    "Rec. for s"  =qwe[ "s", "s"] / sum(qwe[ "s",    ]),
#    "Prec. for of"=qwe["of","of"] / sum(qwe[    ,"of"]),
#    "Rec. for of" =qwe["of","of"] / sum(qwe["of",    ]))
c("R2"=R2(m.01), "C"=C.score(m.01)) # R-squared & C
##          R2           C 
## 0.004212717 0.530915775

Let’s visualize the nature of the effect of MODALITY based on the predictions:

(ph <- data.frame( # make ph a data frame of
   mod <- effect(  # the effect
      "MODALITY",  # of MODALITY
      m.01)))      # in m.01
##   MODALITY       fit          se     lower     upper
## 1   spoken 0.2688427 0.010800758 0.2482073 0.2905308
## 2  written 0.2229765 0.009511784 0.2048903 0.2421729
plot(mod,           # plot the effect of MODALITY
   type="response", # but plot predicted probabilities
   ylim=c(0, 1),    # ylim: the whole range of possible probabilities
   grid=TRUE)       # & w/ a grid
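
A quick check that is not part of the original code: with a single binary predictor, these fitted probabilities are nothing but the observed proportions of s-genitives per modality, which we can recover from the cross-tabulation mod_x_gen:

mod_x_gen[,"s"] /     # the s-genitive counts per modality divided by
   rowSums(mod_x_gen) # the totals per modality: ~0.2688 (spoken) & ~0.2230 (written)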

1.2.1 Excursus: deviance derived from logloss

We saw above that the deviance of a glm is again the sum of squared residuals, which means we again want to reduce the deviance of our model relative to a null model by adding (hopefully good) predictors. But what exactly is deviance here in the case of a glm, and how is it computed? To understand that, let’s look at the predictions of m.01 again:

head(x[,c(4,10:12)])
##   GENITIVE PREDS.NUM.2 PREDS.CAT PREDS.NUM.obs
## 1       of   0.2688427        of     0.7311573
## 2       of   0.2688427        of     0.7311573
## 3       of   0.2688427        of     0.7311573
## 4       of   0.2688427        of     0.7311573
## 5        s   0.2688427        of     0.2688427
## 6        s   0.2688427        of     0.2688427

What would the best model ever have returned for these 6 cases? Something like this, where the model

  • predicts all instances of s correctly because it returns a predicted probability of s of essentially 1;
  • predicts all instances of of correctly because it returns a predicted probability of s of essentially 0:
##   GENITIVE PREDS.NUM.2 PREDS.CAT PREDS.NUM.obs
## 1       of     0.00001        of       0.99999
## 2       of     0.00001        of       0.99999
## 3       of     0.00001        of       0.99999
## 4       of     0.00001        of       0.99999
## 5        s     0.99999         s       0.99999
## 6        s     0.99999         s       0.99999

That means we could use 1-PREDS.NUM.obs to quantify how far off from a perfect prediction the model is, but there is an alternative: We could take the negative logs of PREDS.NUM.obs because then

  • if the predicted probability of what was actually observed, which should be close to 1, is in fact close to 1 – i.e. the model is good – then we end up -logging something that’s very close to 1, which makes the negative logs really close to 0:
-log(0.8)
## [1] 0.2231436
-log(0.9)
## [1] 0.1053605
-log(0.95)
## [1] 0.05129329
  • if the predicted probability of what was actually observed, which should be close to 1, is in fact close to 0 – i.e. the model is bad – then we end up -logging something that’s very close to 0, which makes the negative logs grow and grow:
-log(0.2)
## [1] 1.609438
-log(0.1)
## [1] 2.302585
-log(0.05)
## [1] 2.995732
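
The following plot (a sketch that is not part of the original code) visualizes this behavior: the closer the predicted probability of what was observed gets to 0, the more its contribution to logloss explodes, and the closer it gets to 1, the closer its contribution gets to 0:

curve(-log(x),       # plot -log(p), the contribution to logloss,
   from=0.001, to=1, # for predicted probabilities from (nearly) 0 to 1
   xlab="predicted probability of what was observed",
   ylab="contribution to logloss"); grid() # & add a grid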

I am calling these values contributions to logloss (because the mean of all those values is an evaluation metric called logloss). But what does this have to do with deviance? This:

deviance(m.01) # same as 
## [1] 3994.079
# the sum of all contributions to logloss times 2
sum(-log(x$PREDS.NUM.obs)) * 2
## [1] 3994.079
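
And since logloss itself was just mentioned: the logloss of m.01, i.e. the mean of all contributions to logloss, is therefore just the deviance divided by twice the number of cases (a computation that is not part of the original code):

mean(-log(x$PREDS.NUM.obs))  # the logloss of m.01 ...
deviance(m.01) / (2*nrow(x)) # ... = deviance / (2*n): ~0.5547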

That means that the deviance of such a glm, what the summary calls “residual deviance”, expresses how far all the predicted probabilities of the events that were actually observed, i.e. of what the model should have predicted, are from their theoretical ideal of (very nearly) 1! That also means we can again express how much the deviance of the null model, the null deviance, is reduced by the predictors (here as a proportion):

(m.01$null.deviance-m.01$deviance) / m.01$null.deviance
## [1] 0.002545694

This is also an R2-value – not the one that is most commonly used and that we computed above, Nagelkerke’s R2, but one that is called McFadden’s R2.
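
If you want McFadden’s R2 as a reusable one-liner, here is a minimal helper function (a sketch; this is not one of the functions loaded at the top of the script):

McFadden.R2 <- function(model) {          # for a glm called model, return
   (model$null.deviance-model$deviance) / # the reduction of the null deviance
    model$null.deviance }                 # relative to the null deviance
McFadden.R2(m.01) # ~0.0025 for the current MODALITY-only model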

1.3 A categorical predictor

Does the choice of a genitive construction vary as a function of the degree/kind of animacy of the possessor (animate vs. collective vs. inanimate vs. locative vs. temporal)?

Some exploration of the relevant variables:

# the predictor(s)/response on its/their own
table(x$GENITIVE)
## 
##   of    s 
## 2720  880
table(x$POR_ANIMACY)
## 
##    animate collective  inanimate   locative   temporal 
##        920        607       1671        243        159
# the predictor(s) w/ the response
table(x$POR_ANIMACY, x$GENITIVE)
##             
##                of    s
##   animate     370  550
##   collective  408  199
##   inanimate  1638   33
##   locative    199   44
##   temporal    105   54

Let’s fit our regression model:

summary(m.01 <- glm(      # make/summarize the gen. linear model m.01:
   GENITIVE ~ 1 +         # GENITIVE ~ an overall intercept (1)
   POR_ANIMACY,           # & the predictor POR_ANIMACY
   data=x,                # those variables are in x
   family=binomial,       # the response is binary
   na.action=na.exclude)) # skip cases with NA/missing data
## 
## Call:
## glm(formula = GENITIVE ~ 1 + POR_ANIMACY, family = binomial, 
##     data = x, na.action = na.exclude)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3497  -0.6321  -0.1997  -0.1997   2.8017  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            0.39642    0.06724   5.896 3.73e-09 ***
## POR_ANIMACYcollective -1.11438    0.10953 -10.174  < 2e-16 ***
## POR_ANIMACYinanimate  -4.30114    0.18823 -22.851  < 2e-16 ***
## POR_ANIMACYlocative   -1.90553    0.17965 -10.607  < 2e-16 ***
## POR_ANIMACYtemporal   -1.06139    0.18045  -5.882 4.06e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4004.3  on 3599  degrees of freedom
## Residual deviance: 2766.0  on 3595  degrees of freedom
## AIC: 2776
## 
## Number of Fisher Scoring iterations: 6
pchisq(              # compute the area under the chi-squared curve
   q=4004.3-2766,    # for this chi-squared value: null - residual deviance
   df=3599-3595,     # at this df: null - residual df
   lower.tail=FALSE) # only using the right/upper tail/side
## [1] 7.926259e-267
drop1(           # drop each droppable predictor at a time
   m.01,         # from the model m.01 &
   test="Chisq") # test its significance w/ a LRT
## Single term deletions
## 
## Model:
## GENITIVE ~ 1 + POR_ANIMACY
##             Df Deviance    AIC    LRT  Pr(>Chi)    
## <none>           2766.0 2776.0                     
## POR_ANIMACY  4   4004.3 4006.3 1238.3 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s make (numerical and categorical) ‘predictions’ and evaluate them:

x$PREDS.NUM.2 <- predict( # make x$PREDS.NUM.2 the result of predicting
   m.01,                  # from m.01
   type="response")       # predicted probabilities of "s"
# if you don't say type="response", predict gives you log odds
x$PREDS.CAT <- factor(        # make x$PREDS.CAT the result of checking
   ifelse(x$PREDS.NUM.2>=0.5, # if the predicted prob. of "s" is >=0.5
      levels(x$GENITIVE)[2],  # if yes, predict "s"
      levels(x$GENITIVE)[1])) # otherwise, predict "of"
(qwe <- table(           # confusion matrix: cross-tabulate
   "OBS"=x$GENITIVE,     # observed genitives in the rows
   "PREDS"=x$PREDS.CAT)) # predicted orders in the columns
##     PREDS
## OBS    of    s
##   of 2350  370
##   s   330  550
c( # evaluate the confusion matrix
   "Class. acc." =mean(x$GENITIVE==x$PREDS.CAT, na.rm=TRUE), # (tp+tn) / (tp+tn+fp+fn)
   "Prec. for s" =qwe[ "s", "s"] / sum(qwe[    , "s"]),
   "Rec. for s"  =qwe[ "s", "s"] / sum(qwe[ "s",    ]),
   "Prec. for of"=qwe["of","of"] / sum(qwe[    ,"of"]),
   "Rec. for of" =qwe["of","of"] / sum(qwe["of",    ]))
##  Class. acc.  Prec. for s   Rec. for s Prec. for of  Rec. for of 
##    0.8055556    0.5978261    0.6250000    0.8768657    0.8639706
c("Nagelkerke's R2"=R2(m.01),
  "McFadden's R2"  =(m.01$null.deviance-m.01$deviance) / m.01$null.deviance,
  "C"=C.score(m.01)) # R-squared values & C
## Nagelkerke's R2   McFadden's R2               C 
##       0.4336234       0.3092390       0.8472389

Again, we also add our column PREDS.NUM.obs with, for each case, the predicted probability of the level that was observed in the data:

x$PREDS.NUM.obs <- ifelse( # x$PREDS.NUM.obs is determined by ifelse
   x$GENITIVE=="s",        # if the obs genitive is the 2nd level of the response
     x$PREDS.NUM.2,        # take its predicted probability
   1-x$PREDS.NUM.2)        # otherwise take 1 minus its predicted probability
head(x)
##   CASE SPEAKER MODALITY GENITIVE POR_LENGTH PUM_LENGTH POR_ANIMACY
## 1    2     nns   spoken       of         13          7  collective
## 2    3     nns   spoken       of         22          7     animate
## 3    4     nns   spoken       of         11          8     animate
## 4    5     nns   spoken       of         26          4  collective
## 5    6     nns   spoken        s          8          4     animate
## 6    7     nns   spoken        s          7          3     animate
##   POR_FINAL_SIB  POR_DEF PREDS.NUM.2 PREDS.CAT PREDS.NUM.obs
## 1        absent definite   0.3278418        of     0.6721582
## 2        absent definite   0.5978261         s     0.4021739
## 3       present definite   0.5978261         s     0.4021739
## 4        absent definite   0.3278418        of     0.6721582
## 5        absent definite   0.5978261         s     0.5978261
## 6        absent definite   0.5978261         s     0.5978261

Let’s visualize the nature of the effect of POR_ANIMACY based on the predictions:

(ph <- data.frame(   # make ph a data frame of
   ani <- effect(    # the effect
      "POR_ANIMACY", # of POR_ANIMACY
      m.01)))        # in m.01
##   POR_ANIMACY        fit          se      lower      upper
## 1     animate 0.59782609 0.016165922 0.56577463 0.62906282
## 2  collective 0.32784185 0.019053448 0.29164055 0.36621363
## 3   inanimate 0.01974865 0.003403457 0.01407325 0.02764863
## 4    locative 0.18106996 0.024702646 0.13756935 0.23458435
## 5    temporal 0.33962264 0.037557428 0.27028269 0.41659580
plot(ani,           # plot the effect of POR_ANIMACY
   type="response", # but plot predicted probabilities
   ylim=c(0, 1),    # ylim: the whole range of possible probabilities
   grid=TRUE)       # & w/ a grid
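
As with MODALITY above, these fitted probabilities can also be recovered directly from the coefficients of m.01 (a quick check that is not part of the original code): the intercept is the log odds of s for the reference level animate, every other level adds its coefficient to the intercept, and plogis back-transforms the resulting log odds into probabilities:

plogis(c(                         # from log odds to predicted probabilities:
   coef(m.01)[1],                 # animate, the reference level: ~0.5978
   coef(m.01)[1]+coef(m.01)[-1])) # intercept + each other level's coefficient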

1.3.1 Excursus: model comparison

Given a lot of previous literature on the genitive alternation, you might actually have hypothesized that the only thing one needs is a distinction between (typically) animate and inanimate possessors. Let’s use model comparison to check this. First, we create a version of POR_ANIMACY that conflates levels as desired:

x$POR_ANIMACY.confl <- x$POR_ANIMACY # create a copy of POR_ANIMACY
levels(x$POR_ANIMACY.confl) <-       # overwrite the copy's levels such that
   c("anim/coll", "anim/coll",       # "animate" and "collective" become "anim/coll"
      "inanim/loc/temp", "inanim/loc/temp", "inanim/loc/temp") # all others become "inanim/loc/temp"
table(POR_ANIMACY=x$POR_ANIMACY, POR_ANIMACY.confl=x$POR_ANIMACY.confl) # check you did it right
##             POR_ANIMACY.confl
## POR_ANIMACY  anim/coll inanim/loc/temp
##   animate          920               0
##   collective       607               0
##   inanimate          0            1671
##   locative           0             243
##   temporal           0             159
head(x <- x[,c(1:7,13,8:12)])
##   CASE SPEAKER MODALITY GENITIVE POR_LENGTH PUM_LENGTH POR_ANIMACY
## 1    2     nns   spoken       of         13          7  collective
## 2    3     nns   spoken       of         22          7     animate
## 3    4     nns   spoken       of         11          8     animate
## 4    5     nns   spoken       of         26          4  collective
## 5    6     nns   spoken        s          8          4     animate
## 6    7     nns   spoken        s          7          3     animate
##   POR_ANIMACY.confl POR_FINAL_SIB  POR_DEF PREDS.NUM.2 PREDS.CAT PREDS.NUM.obs
## 1         anim/coll        absent definite   0.3278418        of     0.6721582
## 2         anim/coll        absent definite   0.5978261         s     0.4021739
## 3         anim/coll       present definite   0.5978261         s     0.4021739
## 4         anim/coll        absent definite   0.3278418        of     0.6721582
## 5         anim/coll        absent definite   0.5978261         s     0.5978261
## 6         anim/coll        absent definite   0.5978261         s     0.5978261

We already have the model with POR_ANIMACY as the only predictor so we only need to fit a new model with POR_ANIMACY.confl as the only predictor:

summary(m.01.animconfl <- glm( # make/summarize the gen. linear model m.01.animconfl:
   GENITIVE ~ 1 + POR_ANIMACY.confl,    # GENITIVE ~ an overall intercept (1) & POR_ANIMACY.confl
   data=x, family=binomial, na.action=na.exclude)) # the other arguments
## 
## Call:
## glm(formula = GENITIVE ~ 1 + POR_ANIMACY.confl, family = binomial, 
##     data = x, na.action = na.exclude)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1613  -0.3613  -0.3613  -0.3613   2.3501  
## 
## Coefficients:
##                                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                      -0.03799    0.05119  -0.742    0.458    
## POR_ANIMACY.conflinanim/loc/temp -2.65829    0.10377 -25.616   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4004.3  on 3599  degrees of freedom
## Residual deviance: 3093.4  on 3598  degrees of freedom
## AIC: 3097.4
## 
## Number of Fisher Scoring iterations: 5

… and then we do a comparison with anova just like above, the only difference being the argument to test (because we’re now fitting a generalized linear model).

anova(             # compare
   m.01,           # the model w/ the 5-level predictor POR_ANIMACY
   m.01.animconfl, # & the model w/ the 2-level predictor POR_ANIMACY.confl
   test="Chisq")   # using a LRT
## Analysis of Deviance Table
## 
## Model 1: GENITIVE ~ 1 + POR_ANIMACY
## Model 2: GENITIVE ~ 1 + POR_ANIMACY.confl
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1      3595     2766.0                          
## 2      3598     3093.4 -3  -327.39 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The difference between the two models is hugely significant: The finer resolution of POR_ANIMACY compared to POR_ANIMACY.confl makes the model significantly better (in the sense of having a significantly smaller deviance): LR-test statistic (the difference between the models’ deviances)=327.39, df=3, p<0.00001. (If you really understood everything well, you would have known this already from the summary of m.01.animconfl – why?)
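
To spell out where those numbers come from (a sketch that is not part of the original code): the LR-test statistic is simply the difference between the two residual deviances, and its p-value is the area under the chi-squared curve for that value at df=3 (the difference between the models’ residual df, 3598-3595):

(lrt <- deviance(m.01.animconfl) - # the LRT statistic: the conflated model's deviance
   deviance(m.01))                 # minus the full model's deviance: ~327.39
pchisq(q=lrt, df=3,  # the area under the chi-squared curve for that value at df=3,
   lower.tail=FALSE) # only using the right/upper tail: a vanishingly small p-value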

1.3.2 Excursus: deviance derived from logloss

The way the contributions to logloss and the deviance are computed does not change just because we now have (a) different predictor(s): the deviance is still the sum of all contributions to logloss, times 2:

deviance(m.01)
## [1] 2765.995
sum(-log(x$PREDS.NUM.obs)) * 2
## [1] 2765.995
m.01$deviance
## [1] 2765.995
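
Accordingly, the logloss of this model, the mean of all contributions to logloss, is quite a bit lower than that of the MODALITY-only model from above (again a computation that is not part of the original code):

mean(-log(x$PREDS.NUM.obs)) # the logloss of the POR_ANIMACY model: ~0.384 (vs. ~0.555 above)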

1.4 A numeric predictor

Does the choice of a genitive construction vary as a function of the length of the possessor (in characters)?

# the predictor(s)/response on its/their own
table(x$GENITIVE)
## 
##   of    s 
## 2720  880
hist(x$POR_LENGTH); hist(x$POR_LENGTH, breaks="FD")