```
rm(list=ls(all.names=TRUE))
library(car); library(effects)
source("105_04-05_R2.r"); source("105_04-05_C.score.r") # load small helper functions
summary(x <- read.delim( # summarize x, the result of loading
file="105_04-05_GENs.csv", # this file
stringsAsFactors=TRUE)) # change categorical variables into factors
```

```
## CASE SPEAKER MODALITY GENITIVE POR_LENGTH
## Min. : 2 nns:2666 spoken :1685 of:2720 Min. : 1.00
## 1st Qu.:1006 ns : 934 written:1915 s : 880 1st Qu.: 8.00
## Median :2018 Median : 11.00
## Mean :2012 Mean : 14.58
## 3rd Qu.:3017 3rd Qu.: 17.00
## Max. :4040 Max. :204.00
## PUM_LENGTH POR_ANIMACY POR_FINAL_SIB POR_DEF
## Min. : 2.00 animate : 920 absent :2721 definite :2349
## 1st Qu.: 6.00 collective: 607 present: 879 indefinite:1251
## Median : 9.00 inanimate :1671
## Mean : 10.35 locative : 243
## 3rd Qu.: 13.00 temporal : 159
## Max. :109.00
```

Let’s already compute the baselines for what will be the response
variable in all case studies, `GENITIVE`

:

```
(baselines <- c(
"baseline 1"=max( # make baselines[1] the highest
prop.table( # proportion in the
table(x$GENITIVE))), # frequency table of the response
"baseline 2"=sum( # make baselines[2] the sum of the
prop.table( # proportions in the
table(x$GENITIVE))^2))) # frequency table of the response squared
```

```
## baseline 1 baseline 2
## 0.7555556 0.6306173
```

In the case of `lm`

, we defined deviance as the sum of
squared residuals and we said we want to reduce the deviance of our
model relative to a null model by adding (hopefully good) predictors.
What is deviance here in the case of a `glm`

? To understand
that, we first compute a null model:

`summary(m.00 <- glm(GENITIVE ~ 1, data=x, family=binomial, na.action=na.exclude))`

```
##
## Call:
## glm(formula = GENITIVE ~ 1, family = binomial, data = x, na.action = na.exclude)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7487 -0.7487 -0.7487 -0.7487 1.6785
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.12847 0.03878 -29.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4004.3 on 3599 degrees of freedom
## Residual deviance: 4004.3 on 3599 degrees of freedom
## AIC: 4006.3
##
## Number of Fisher Scoring iterations: 4
```

`deviance(m.00)`

`## [1] 4004.273`

`sum(residuals(m.00)^2)`

`## [1] 4004.273`

We see that the deviance of `m.00`

is still the sum of
this model’s squared residuals (and also what is called “residual
deviance” in the summary of `m.00`

). But what are squared
residuals here? We will discuss that in more detail after we have fitted
our first ‘real model’ …

Does the choice of a genitive construction vary as a function of the
modality in which the construction was produced (*spoken*
vs. *written*)?

Some exploration of the relevant variables:

```
# the predictor(s)/response on its/their own
table(x$GENITIVE)
```

```
##
## of s
## 2720 880
```

`table(x$MODALITY)`

```
##
## spoken written
## 1685 1915
```

```
# the predictor(s) w/ the response
(mod_x_gen <- table(x$MODALITY, x$GENITIVE))
```

```
##
## of s
## spoken 1232 453
## written 1488 427
```

How would we have dealt with this in Ling 104? First, we would have computed a chi-squared test like this, where the chi-squared value is computed involving differences of observed and expected values:

`chisq.test(mod_x_gen, correct=FALSE)`

```
##
## Pearson's Chi-squared test
##
## data: mod_x_gen
## X-squared = 10.21, df = 1, p-value = 0.001397
```

Second, we might have computed the odds ratio and the logged odds
ratio for the *s*-genitives:

```
(odds.ratio <-
(mod_x_gen[2,2] / mod_x_gen[2,1]) /
(mod_x_gen[1,2] / mod_x_gen[1,1]))
```

`## [1] 0.7804363`

`log(odds.ratio)`

`## [1] -0.2479022`

How do we do this now? With a binary logistic regression, which
returns the *G*^{2}-statistic, which is computed based on
ratios of observed and expected values (not differences like
chi-squared) and which also returns the (logged) odds ratio. Let’s fit
our regression model:

```
summary(m.01 <- glm( # make/summarize the gen. linear model m.01:
GENITIVE ~ 1 + # GENITIVE ~ an overall intercept (1)
MODALITY, # & the predictor MODALITY
data=x, # those variables are in x
family=binomial, # the response is binary
na.action=na.exclude)) # skip cases with NA/missing data
```

```
##
## Call:
## glm(formula = GENITIVE ~ 1 + MODALITY, family = binomial, data = x,
## na.action = na.exclude)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7914 -0.7914 -0.7103 -0.7103 1.7325
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.00050 0.05495 -18.208 < 2e-16 ***
## MODALITYwritten -0.24790 0.07767 -3.192 0.00141 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4004.3 on 3599 degrees of freedom
## Residual deviance: 3994.1 on 3598 degrees of freedom
## AIC: 3998.1
##
## Number of Fisher Scoring iterations: 4
```

```
pchisq( # compute the area under the chi-squared curve
q=4004.3-3994.1, # for this chi-squared value: null - residual deviance
df=3599-3598, # at this df: null - residual df
lower.tail=FALSE) # only using the right/upper tail/side
```

`## [1] 0.001404407`

```
drop1( # drop each droppable predictor at a time
m.01, # from the model m.01 &
test="Chisq") # test its significance w/ a LRT
```

```
## Single term deletions
##
## Model:
## GENITIVE ~ 1 + MODALITY
## Df Deviance AIC LRT Pr(>Chi)
## <none> 3994.1 3998.1
## MODALITY 1 4004.3 4006.3 10.194 0.001409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

(Note how similar the chi-squared value and the
*G*^{2}-statistic and their *p*-values are.) Let’s
make ‘predictions’:

```
x$PREDS.NUM.2 <- predict( # make x$PREDS.NUM.2 the result of predicting
m.01, # from m.01
type="response") # predicted probabilities of "s"
# if you don't say type="response", predict gives you log odds
x$PREDS.CAT <- factor( # make x$PREDS.CAT the result of checking
ifelse(x$PREDS.NUM.2>=0.5, # if the predicted prob. of "s" is >=0.5
levels(x$GENITIVE)[2], # if yes, predict "s"
levels(x$GENITIVE)[1])) # otherwise, predict "of"
```

On top of that, let’s also add something else, namely a column called
`PREDS.NUM.obs`

that contains, for each case, the predicted
probability of the level that was observed in the data, i.e. what the
model *should* have predicted. How does that work? Consider the
head of `x`

:

`head(x)`

```
## CASE SPEAKER MODALITY GENITIVE POR_LENGTH PUM_LENGTH POR_ANIMACY
## 1 2 nns spoken of 13 7 collective
## 2 3 nns spoken of 22 7 animate
## 3 4 nns spoken of 11 8 animate
## 4 5 nns spoken of 26 4 collective
## 5 6 nns spoken s 8 4 animate
## 6 7 nns spoken s 7 3 animate
## POR_FINAL_SIB POR_DEF PREDS.NUM.2 PREDS.CAT
## 1 absent definite 0.2688427 of
## 2 absent definite 0.2688427 of
## 3 present definite 0.2688427 of
## 4 absent definite 0.2688427 of
## 5 absent definite 0.2688427 of
## 6 absent definite 0.2688427 of
```

In the first case, the model predicted *s* with a probability
of 0.2688427 and, thus *of* with a probability of
1-0.2688427=0.7311573. The actual observation, what the speaker really
said, is *of* so PREDS.NUM.obs[1] should be 0.7311573. In the
fifth case, the model again predicted *s* with a probability of
0.2688427 and, thus *of* with a probability of
1-0.2688427=0.7311573. The actual observation, what the speaker really
said, is *s* so `PREDS.NUM.obs[2]`

should be
0.2688427. How do we do that efficiently for all cases (and in the
spirit of Section 3.5.2, efficiently means ‘without a loop’)?

```
x$PREDS.NUM.obs <- ifelse( # x$PREDS.NUM.obs is determined by ifelse
x$GENITIVE=="s", # if the obs genitive is the 2nd level of the response
x$PREDS.NUM.2, # take its predicted probability
1-x$PREDS.NUM.2) # otherwise take 1 minus its predicted probability
head(x)
```

```
## CASE SPEAKER MODALITY GENITIVE POR_LENGTH PUM_LENGTH POR_ANIMACY
## 1 2 nns spoken of 13 7 collective
## 2 3 nns spoken of 22 7 animate
## 3 4 nns spoken of 11 8 animate
## 4 5 nns spoken of 26 4 collective
## 5 6 nns spoken s 8 4 animate
## 6 7 nns spoken s 7 3 animate
## POR_FINAL_SIB POR_DEF PREDS.NUM.2 PREDS.CAT PREDS.NUM.obs
## 1 absent definite 0.2688427 of 0.7311573
## 2 absent definite 0.2688427 of 0.7311573
## 3 present definite 0.2688427 of 0.7311573
## 4 absent definite 0.2688427 of 0.7311573
## 5 absent definite 0.2688427 of 0.2688427
## 6 absent definite 0.2688427 of 0.2688427
```

We’ll get to why this is useful in a moment … But let’s now evaluate the predictive capacity of the model:

```
(qwe <- table( # confusion matrix: cross-tabulate
"OBS"=x$GENITIVE, # observed genitives in the rows
"PREDS"=x$PREDS.CAT)) # predicted orders in the columns
```

```
## PREDS
## OBS of
## of 2720
## s 880
```

`c("R2"=R2(m.01), "C"=C.score(m.01)) # R-squared & C`

```
## R2 C
## 0.004212717 0.530915775
```

```
# evaluating the confusion matrix here in the usual way doesn't make much sense ...
# c( # evaluate the confusion matrix
# "Class. acc." =mean(x$GENITIVE==x$PREDS.CAT), # (tp+tn) / (tp+tn+fp+fn)
# "Prec. for s" =qwe[ "s", "s"] / sum(qwe[ , "s"]),
# "Rec. for s" =qwe[ "s", "s"] / sum(qwe[ "s", ]),
# "Prec. for of"=qwe["of","of"] / sum(qwe[ ,"of"]),
# "Rec. for of" =qwe["of","of"] / sum(qwe["of", ]))
c("R2"=R2(m.01), "C"=C.score(m.01)) # R-squared & C
```

```
## R2 C
## 0.004212717 0.530915775
```

Let’s visualize the nature of the effect of `MODALITY`

based on the predictions:

```
(ph <- data.frame( # make ph a data frame of
mod <- effect( # the effect
"MODALITY", # of MODALITY
m.01))) # in m.01
```

```
## MODALITY fit se lower upper
## 1 spoken 0.2688427 0.010800758 0.2482073 0.2905308
## 2 written 0.2229765 0.009511784 0.2048903 0.2421729
```

```
plot(mod, # plot the effect of MODALITY
type="response", # but plot predicted probabilities
ylim=c(0, 1), # ylim: the whole range of possible probabilities
grid=TRUE) # & w/ a grid
```

We saw above that the deviance of a `glm`

is again the sum
of squared residuals, which means we again want to reduce the deviance
of our model relative to a null model by adding (hopefully good)
predictors. But what is deviance here in the case of a `glm`

exactly, how it is computed? To understand, let’s look at the
predictions of `m.01`

again:

`head(x[,c(4,10:12)])`

```
## GENITIVE PREDS.NUM.2 PREDS.CAT PREDS.NUM.obs
## 1 of 0.2688427 of 0.7311573
## 2 of 0.2688427 of 0.7311573
## 3 of 0.2688427 of 0.7311573
## 4 of 0.2688427 of 0.7311573
## 5 s 0.2688427 of 0.2688427
## 6 s 0.2688427 of 0.2688427
```

What would the best model ever have returned for these 6 cases? Something like this, where the model

- predicts all instances of
*s*correctly because it returns a predicted probability of*s*of essentially 1; - predicts all instances of
*of*correctly because it returns a predicted probability of*s*of essentially 0:

```
## GENITIVE PREDS.NUM.2 PREDS.CAT PREDS.NUM.obs
## 1 of 0.00001 of 0.99999
## 2 of 0.00001 of 0.99999
## 3 of 0.00001 of 0.99999
## 4 of 0.00001 of 0.99999
## 5 s 0.99999 s 0.99999
## 6 s 0.99999 s 0.99999
```

That means we could use `1-PREDS.NUM.obs`

to quantify how
much off from a perfect prediction the model is, but there is an
alternative: We could taking the negative logs of
`PREDS.NUM.obs`

because then

- if the predicted probability of what was actually observed, which should be close to 1, is in fact close to 1 – i.e. the model is good – then we end up -logging something that’s very close to 1, which makes the negative logs really close to 0:

`-log(0.8)`

`## [1] 0.2231436`

`-log(0.9)`

`## [1] 0.1053605`

`-log(0.95)`

`## [1] 0.05129329`

- if the predicted probability of what was actually observed, which should be close to 1, is in fact close to 0 – i.e. the model is bad – then we end up -logging something that’s very close to 0, which makes the negative logs grow and grow:

`-log(0.2)`

`## [1] 1.609438`

`-log(0.1)`

`## [1] 2.302585`

`-log(0.05)`

`## [1] 2.995732`

I am calling these values **contributions to logloss**
(because the mean of all those values is a evaluation metric called
*logloss*. But what does this have to do with deviance? This:

`deviance(m.01) # same as `

`## [1] 3994.079`

```
# the sum of all contributions to logloss times 2
sum(-log(x$PREDS.NUM.obs)) * 2
```

`## [1] 3994.079`

That means, the deviance of such a `glm`

, what the summary
calls “residual deviance”, expresses how far all the predicted
probabilities of the events that *should* have been predicted are
from their theoretical ideals of (very near to) 0 and (very near to) 1!
That actually means we can also again express how much the deviance of
the null model, the null deviance, is reduced by the predictors in
percent:

`(m.01$null.deviance-m.01$deviance) / m.01$null.deviance`

`## [1] 0.002545694`

This is also an *R*^{2}-value – not the one that is
mostly used and that we computed above, Nagelkerke’s
*R*^{2}, but one that is called McFadden’s
*R*^{2}.

Does the choice of a genitive construction vary as a function on the
degree/kind of animacy of the possessor (*animate*
vs. *collective* vs. *inanimate* vs. *locative*
vs. *temporal*)?

Some exploration of the relevant variables:

```
# the predictor(s)/response on its/their own
table(x$GENITIVE)
```

```
##
## of s
## 2720 880
```

`table(x$POR_ANIMACY)`

```
##
## animate collective inanimate locative temporal
## 920 607 1671 243 159
```

```
# the predictor(s) w/ the response
table(x$POR_ANIMACY, x$GENITIVE)
```

```
##
## of s
## animate 370 550
## collective 408 199
## inanimate 1638 33
## locative 199 44
## temporal 105 54
```

Let’s fit our regression model:

```
summary(m.01 <- glm( # make/summarize the gen. linear model m.01:
GENITIVE ~ 1 + # GENITIVE ~ an overall intercept (1)
POR_ANIMACY, # & the predictor POR_ANIMACY
data=x, # those variables are in x
family=binomial, # the response is binary
na.action=na.exclude)) # skip cases with NA/missing data
```

```
##
## Call:
## glm(formula = GENITIVE ~ 1 + POR_ANIMACY, family = binomial,
## data = x, na.action = na.exclude)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3497 -0.6321 -0.1997 -0.1997 2.8017
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.39642 0.06724 5.896 3.73e-09 ***
## POR_ANIMACYcollective -1.11438 0.10953 -10.174 < 2e-16 ***
## POR_ANIMACYinanimate -4.30114 0.18823 -22.851 < 2e-16 ***
## POR_ANIMACYlocative -1.90553 0.17965 -10.607 < 2e-16 ***
## POR_ANIMACYtemporal -1.06139 0.18045 -5.882 4.06e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4004.3 on 3599 degrees of freedom
## Residual deviance: 2766.0 on 3595 degrees of freedom
## AIC: 2776
##
## Number of Fisher Scoring iterations: 6
```

```
pchisq( # compute the area under the chi-squared curve
q=4004.3-2766, # for this chi-squared value: null - residual deviance
df=3599-3595, # at this df: null - residual df
lower.tail=FALSE) # only using the right/upper tail/side
```

`## [1] 7.926259e-267`

```
drop1( # drop each droppable predictor at a time
m.01, # from the model m.01 &
test="Chisq") # test its significance w/ a LRT
```

```
## Single term deletions
##
## Model:
## GENITIVE ~ 1 + POR_ANIMACY
## Df Deviance AIC LRT Pr(>Chi)
## <none> 2766.0 2776.0
## POR_ANIMACY 4 4004.3 4006.3 1238.3 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Let’s make (numerical and categorical) ‘predictions’ and evaluate them:

```
x$PREDS.NUM.2 <- predict( # make x$PREDS.NUM.2 the result of predicting
m.01, # from m.01
type="response") # predicted probabilities of "s"
# if you don't say type="response", predict gives you log odds
x$PREDS.CAT <- factor( # make x$PREDS.CAT the result of checking
ifelse(x$PREDS.NUM.2>=0.5, # if the predicted prob. of "s" is >=0.5
levels(x$GENITIVE)[2], # if yes, predict "s"
levels(x$GENITIVE)[1])) # otherwise, predict "of"
(qwe <- table( # confusion matrix: cross-tabulate
"OBS"=x$GENITIVE, # observed genitives in the rows
"PREDS"=x$PREDS.CAT)) # predicted orders in the columns
```

```
## PREDS
## OBS of s
## of 2350 370
## s 330 550
```

```
c( # evaluate the confusion matrix
"Class. acc." =mean(x$GENITIVE==x$PREDS.CAT, na.rm=TRUE), # (tp+tn) / (tp+tn+fp+fn)
"Prec. for s" =qwe[ "s", "s"] / sum(qwe[ , "s"]),
"Rec. for s" =qwe[ "s", "s"] / sum(qwe[ "s", ]),
"Prec. for of"=qwe["of","of"] / sum(qwe[ ,"of"]),
"Rec. for of" =qwe["of","of"] / sum(qwe["of", ]))
```

```
## Class. acc. Prec. for s Rec. for s Prec. for of Rec. for of
## 0.8055556 0.5978261 0.6250000 0.8768657 0.8639706
```

```
c("Nagelkerke's R2"=R2(m.01),
"McFadden's R2" =(m.01$null.deviance-m.01$deviance) / m.01$null.deviance,
"C"=C.score(m.01)) # R-squared values & C
```

```
## Nagelkerke's R2 McFadden's R2 C
## 0.4336234 0.3092390 0.8472389
```

Again, we also add our column `PREDS.NUM.obs`

with, for
each case, the predicted probability of the level that was observed in
the data:

```
x$PREDS.NUM.obs <- ifelse( # x$PREDS.NUM.obs is determined by ifelse
x$GENITIVE=="s", # if the obs genitive is the 2nd level of the response
x$PREDS.NUM.2, # take its predicted probability
1-x$PREDS.NUM.2) # otherwise take 1 minus its predicted probability
head(x)
```

```
## CASE SPEAKER MODALITY GENITIVE POR_LENGTH PUM_LENGTH POR_ANIMACY
## 1 2 nns spoken of 13 7 collective
## 2 3 nns spoken of 22 7 animate
## 3 4 nns spoken of 11 8 animate
## 4 5 nns spoken of 26 4 collective
## 5 6 nns spoken s 8 4 animate
## 6 7 nns spoken s 7 3 animate
## POR_FINAL_SIB POR_DEF PREDS.NUM.2 PREDS.CAT PREDS.NUM.obs
## 1 absent definite 0.3278418 of 0.6721582
## 2 absent definite 0.5978261 s 0.4021739
## 3 present definite 0.5978261 s 0.4021739
## 4 absent definite 0.3278418 of 0.6721582
## 5 absent definite 0.5978261 s 0.5978261
## 6 absent definite 0.5978261 s 0.5978261
```

Let’s visualize the nature of the effect of `POR_ANIMACY`

based on the predictions:

```
(ph <- data.frame( # make ph a data frame of
ani <- effect( # the effect
"POR_ANIMACY", # of POR_ANIMACY
m.01))) # in m.01
```

```
## POR_ANIMACY fit se lower upper
## 1 animate 0.59782609 0.016165922 0.56577463 0.62906282
## 2 collective 0.32784185 0.019053448 0.29164055 0.36621363
## 3 inanimate 0.01974865 0.003403457 0.01407325 0.02764863
## 4 locative 0.18106996 0.024702646 0.13756935 0.23458435
## 5 temporal 0.33962264 0.037557428 0.27028269 0.41659580
```

```
plot(ani, # plot the effect of POR_ANIMACY
type="response", # but plot predicted probabilities
ylim=c(0, 1), # ylim: the whole range of possible probabilities
grid=TRUE) # & w/ a grid
```

Given a lot of previous literature on the genitive alternation, you
might actually have hypothesized that the only thing one needs is a
distinction between (typically) animate and inanimate possessors. Let’s
use model comparison to check this. First, we create a version of
`POR_ANIMACY`

that conflates levels as desired:

```
x$POR_ANIMACY.confl <- x$POR_ANIMACY # create a copy of POR_ANIMACY
levels(x$POR_ANIMACY.confl) <- # overwrite the copy's levels such that
c("anim/coll", "anim/coll", # "animate" and "collective" become "anim/coll"
"inanim/loc/temp", "inanim/loc/temp", "inanim/loc/temp") # all others become "inanim/loc/temp"
table(POR_ANIMACY=x$POR_ANIMACY, POR_ANIMACY.confl=x$POR_ANIMACY.confl) # check you did it right
```

```
## POR_ANIMACY.confl
## POR_ANIMACY anim/coll inanim/loc/temp
## animate 920 0
## collective 607 0
## inanimate 0 1671
## locative 0 243
## temporal 0 159
```

`head(x <- x[,c(1:7,13,8:12)])`

```
## CASE SPEAKER MODALITY GENITIVE POR_LENGTH PUM_LENGTH POR_ANIMACY
## 1 2 nns spoken of 13 7 collective
## 2 3 nns spoken of 22 7 animate
## 3 4 nns spoken of 11 8 animate
## 4 5 nns spoken of 26 4 collective
## 5 6 nns spoken s 8 4 animate
## 6 7 nns spoken s 7 3 animate
## POR_ANIMACY.confl POR_FINAL_SIB POR_DEF PREDS.NUM.2 PREDS.CAT PREDS.NUM.obs
## 1 anim/coll absent definite 0.3278418 of 0.6721582
## 2 anim/coll absent definite 0.5978261 s 0.4021739
## 3 anim/coll present definite 0.5978261 s 0.4021739
## 4 anim/coll absent definite 0.3278418 of 0.6721582
## 5 anim/coll absent definite 0.5978261 s 0.5978261
## 6 anim/coll absent definite 0.5978261 s 0.5978261
```

We already have the model with `POR_ANIMACY`

as the only
predictor so we only need to fit a new model with
`POR_ANIMACY.confl`

as the only predictor:

```
summary(m.01.animconfl <- glm( # make/summarize the gen. linear model m.01.animconfl:
GENITIVE ~ 1 + POR_ANIMACY.confl, # ORDER ~ an overall intercept (1) & POR_ANIMACY.confl
data=x, family=binomial, na.action=na.exclude)) # the other arguments
```

```
##
## Call:
## glm(formula = GENITIVE ~ 1 + POR_ANIMACY.confl, family = binomial,
## data = x, na.action = na.exclude)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1613 -0.3613 -0.3613 -0.3613 2.3501
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.03799 0.05119 -0.742 0.458
## POR_ANIMACY.conflinanim/loc/temp -2.65829 0.10377 -25.616 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4004.3 on 3599 degrees of freedom
## Residual deviance: 3093.4 on 3598 degrees of freedom
## AIC: 3097.4
##
## Number of Fisher Scoring iterations: 5
```

… and then we do a comparison with `anova`

just like
above, the only difference being the argument to `test`

(because we’re now fitting a generalized linear model).

```
anova( # compare
m.01, # the model w/ the 5-level predictor POR_ANIMACY
m.01.animconfl, # & the model w/ the 2-level predictor POR_ANIMACY.confl
test="Chisq") # using a LRT
```

```
## Analysis of Deviance Table
##
## Model 1: GENITIVE ~ 1 + POR_ANIMACY
## Model 2: GENITIVE ~ 1 + POR_ANIMACY.confl
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 3595 2766.0
## 2 3598 3093.4 -3 -327.39 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The difference between the two models is hugely significant: The
finer resolution of `POR_ANIMACY`

compared to
`POR_ANIMACY.confl`

makes the model significantly better (in
the sense of having a significantly smaller deviance): LR-test statistic
(the difference between the models’ deviances)=327.39, *df*=3,
*p*<0.00001. (If you really understood everything well, you
would have know this already from the summary of
`m.01.animconfl`

– why?)

The way the contributions to logloss and the deviance are computed do not change just because we have (a) different predictor(s) now: The deviance is still the sum of all contributions to logloss times 2:

`deviance(m.01)`

`## [1] 2765.995`

`sum(-log(x$PREDS.NUM.obs)) * 2`

`## [1] 2765.995`

`m.01$deviance`

`## [1] 2765.995`