Germán Rodríguez

Multilevel Models
Princeton University
The Stata manual [XT, p.249] analyzes data from the 1989 Bangladesh fertility survey, previously analyzed by Huq and Cleland, and by Ng et al.

. webuse bangladesh, clear (Bangladesh Fertility Survey, 1989) . desc Contains data from https://www.stata-press.com/data/r11/bangladesh.dta obs: 1,934 Bangladesh Fertility Survey, 1989 vars: 7 28 May 2009 20:27 size: 27,076 (99.9% of memory free) (_dta has notes) --------------------------------------------------------------------------------------------------------------- storage display value variable name type format label variable label --------------------------------------------------------------------------------------------------------------- district byte %9.0g District c_use byte %9.0g yesno Use contraception urban byte %9.0g urban Urban or rural age float %6.2f Age, mean centered child1 byte %9.0g 1 child child2 byte %9.0g 2 children child3 byte %9.0g 3 or more children --------------------------------------------------------------------------------------------------------------- Sorted by: district

We will fit a logit model predicting contraceptive use by urban residence, age and number of children, with a random intercept by district. We use xtmelogit to be able to obtain estimates of the district effect.

. xtmelogit c_use urban age child1 child2 child3 || district: , mle Refining starting values: Iteration 0: log likelihood = -1219.2682 Iteration 1: log likelihood = -1209.3544 Iteration 2: log likelihood = -1207.1895 Performing gradient-based optimization: Iteration 0: log likelihood = -1207.1895 Iteration 1: log likelihood = -1206.8323 Iteration 2: log likelihood = -1206.8322 Iteration 3: log likelihood = -1206.8322 Mixed-effects logistic regression Number of obs = 1934 Group variable: district Number of groups = 60 Obs per group: min = 2 avg = 32.2 max = 118 Integration points = 7 Wald chi2(5) = 109.60 Log likelihood = -1206.8322 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ c_use | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- urban | .7322764 .1194857 6.13 0.000 .4980887 .9664641 age | -.0264982 .0078916 -3.36 0.001 -.0419654 -.0110309 child1 | 1.116002 .1580921 7.06 0.000 .8061466 1.425856 child2 | 1.365895 .174669 7.82 0.000 1.02355 1.70824 child3 | 1.344031 .1796549 7.48 0.000 .9919141 1.696148 _cons | -1.68929 .1477592 -11.43 0.000 -1.978892 -1.399687 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ district: Identity | sd(_cons) | .4643477 .0789531 .3327464 .6479975 ------------------------------------------------------------------------------ LR test vs. logistic regression: chibar2(01) = 43.39 Prob>=chibar2 = 0.0000 . estimates store xtme . mata exp(st_matrix("e(b)")) 1 2 3 4 5 6 7 +---------------------------------------------------------------------------------------------------+ 1 | 2.07980972 .9738498451 3.052623893 3.919229511 3.834469892 .1846506578 .4643477362 | +---------------------------------------------------------------------------------------------------+

The estimates show that when we compare women with the same age and family size, contraceptive use is higher in urban than in rural areas, with double the odds of using contraception [exp(0.732)=2.08]. Within an area and family size, contraceptive use declines with age, with 2.6% lower odds per year of age [exp(-.026)-1 = -.026]. Within an area and age, contraceptive use is higher among women with a child, and much higher among women with two or more children, than among those with no children, with odds ratios of three and almost four [exp(1.116)=3.05, exp(1.366)=3.92 and exp(1.344)=3.834].

There is very subtantial variation in contraceptive use across districts. The standard deviation of 0.46 indicates that women in a district which is one standard deviation above the mean have odds of using contraception that are 59% higher than comparable women in an average district [exp(0.464) = 1.59]. The standard deviation is also equivalent to a correlation of 0.06 in the latent propensities to use contraception of comparable women in the same district [0.464^2/(0.464^2+ π^2/3) = 0.06].

We can identify districts where women are more or less likely to
use contraception by "estimating" the random effects. We can
obtain maximum likelihood estimates by treating the estimated
linear predictor from this model as an offset and then fitting
a separate regression model in each district, something easy to
do with `statsby`

as shown by Rabe-Hesketh and
Skrondal(2008, p. 265). So here we go:

. predict xb, xb . statsby mle=_b[_cons], by(district) saving(ml,replace): /// > logit c_use, offset(xb) (running logit on estimation sample) command: logit c_use, offset(xb) mle: _b[_cons] by: district Statsby groups ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 ..x.......x.....................................x. 50 .......... . merge m:1 district using ml Result # of obs. ----------------------------------------- not matched 0 matched 1,934 (_merge==3) -----------------------------------------

The mle is not defined when all women in a district have the same outcome, so everybody (or nobody) uses contraception. This occurs for 3 districts.

The empirical Bayes estimate, based on the mode of the posterior
distribution of the random effects, can be obtained with
`predict, ref`

. estimates restore xtme (results xtme are active now) . predict re, ref . bysort district: gen first=_n==1 . scatter re mle if first || function y=x, range(-1 1) /// > ,title(ML and EB Estimates of District Effects) /// > xtitle(Maximum likleihood) ytitle(Empirical Bayes) /// > legend(off) . graph export bd1.png, width(400) replace (file bd1.png written in PNG format)

Comparing the EB and ML estimates we see the usual shrinkage towards zero, which is particularly noticeable in four districts:

. egen ni = count(1), by(district) . egen pi = mean(c_use), by(district) . list district ni pi mle re if first & mle < -1, clean district ni pi mle re 352. 10 13 .0769231 -1.563583 -.3736235 764. 24 14 .0714286 -1.790981 -.4534406 1740. 55 6 .1666667 -1.609306 -.3196343 1851. 59 10 .1 -1.700366 -.3979423

These are all fairly small districts with less than 15 women each. However, there are five other districts with small sizes that do not exhibit this phenomenon. The reason is that in those cases the estimated random effect was close to zero anyway. (There are also two small districts with all zeroes or all ones.)

. list district ni pi mle re if first & ni < 15 & mle > -1, clean district ni pi mle re 138. 3 2 1 . .2074448 845. 26 13 .3846154 .0580263 .0228793 1101. 33 14 .4285714 -.1926461 -.0801968 1215. 37 13 .5384616 .7878244 .3173832 1228. 38 14 .2857143 -.4925295 -.1871244 1335. 42 11 .5454546 .5789949 .2067375 1600. 49 4 0 . -.1864481

One final word on EB estimates. If you use `gllamm`

to
fit this model you can use `gllapred`

to obtain
EB estimates. That procedure computes posterior means, whereas
`predict, ref`

computes posterior modes. Because the
posterior distributions are not symmetric these two estimates can
differ, but usually they are quite close.

Subject-specific probabilities can be computed easily from first principles by setting the fixed covariates and the random effects to illustrative values. In this dataset age is centered at the sample mean. Here are probabilities for a woman with one child and average age in urban and rural areas of the average district:

. di invlogit(_b[_cons] + _b[child1]) .36047847 . di invlogit(_b[_cons] + _b[child1] + _b[urban]) .53966357

We see that the odds ratio of about two translates to probabilities of 36.0 and 54.0%, or 18 percentage points.

Population-average probabilities can also be computed, but they
require integration over the distribution of the random effect.
At this time these probabilities can only be computed by using
`gllapred`

after `gllamm`

, as they have not
(yet) been added to `predict`

after `xtmelogit`

.
They can, however, be computed from first principles using Mata,
as illustrated below.

I wrote a Mata function to compute abscissas and weights for
integrating over a standard normal random variable by translating
the C code for Gauss-Hermite quadrature in Numerical Recipes in C.
The values for 12 quadrature points are also available for general
use in a Stata file called `gauher12.dta`

right here.
(Gauss-Hermite integrates over exp(-t^2), but I changed variables
to a standard normal, multipying the abcsissas by sqrt(2) and
dividing the weights by sqrt(π).)

Here's the calculation of the rural and urban probabilities averaged over the distribution of the district effects. We list the estimates to find the location of the coefficients of interest: the constant is 6th, the urban coef is first, the one-child coef is 3rd, and the log of sigma is 7th:

. mat list e(b) e(b)[1,7] eq1: eq1: eq1: eq1: eq1: eq1: lns1_1_1: urban age child1 child2 child3 _cons _cons y1 .73227641 -.02649815 1.1160015 1.3658951 1.3440312 -1.6892896 -.76712158 . mata ------------------------------------------------- mata (type end to exit) ------------------------------------- : gh12 = gauher(12) // first column has abcissas, second weights : b = st_matrix("e(b)") : sig = exp(b[7]) : u = r = 0 : for (i=1; i <= 12; i++) { > r = r + invlogit(b[6] + b[3] + sig*gh12[i,1]) * gh12[i,2] > u = u + invlogit(b[6] + b[3] + sig*gh12[i,1] + b[1]) * gh12[i,2] > } : r, u 1 2 +-----------------------------+ 1 | .3668268221 .537737042 | +-----------------------------+ : end ---------------------------------------------------------------------------------------------------------------

The population-average effect of urban residence is a difference from 36.7 to 53.8 or 17.1 percentage points, averaged across all districts.

The next model will treat the urban-rural difference as random at the district level. (Most districts have urban and rural areas.) This is somewhat equivalent to allowing an interation between urban residence and district, but instead of estimating a separate urban-rural difference for each district, we assume that the differences are drawn from a normal distribution:

. xtmelogit c_use urban age child1 child2 child3 /// > || district: urban, covariance(unstructured) mle Refining starting values: Iteration 0: log likelihood = -1215.8594 (not concave) Iteration 1: log likelihood = -1204.0802 Iteration 2: log likelihood = -1199.7994 Performing gradient-based optimization: Iteration 0: log likelihood = -1199.7994 Iteration 1: log likelihood = -1199.4784 Iteration 2: log likelihood = -1199.3158 Iteration 3: log likelihood = -1199.315 Iteration 4: log likelihood = -1199.315 Mixed-effects logistic regression Number of obs = 1934 Group variable: district Number of groups = 60 Obs per group: min = 2 avg = 32.2 max = 118 Integration points = 7 Wald chi2(5) = 97.50 Log likelihood = -1199.315 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ c_use | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- urban | .8157872 .1715519 4.76 0.000 .4795516 1.152023 age | -.026415 .008023 -3.29 0.001 -.0421398 -.0106902 child1 | 1.13252 .1603285 7.06 0.000 .818282 1.446758 child2 | 1.357739 .1770522 7.67 0.000 1.010724 1.704755 child3 | 1.353827 .1828801 7.40 0.000 .9953882 1.712265 _cons | -1.71165 .1605617 -10.66 0.000 -2.026345 -1.396954 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ district: Unstructured | sd(urban) | .8162856 .1975237 .5080068 1.31164 sd(_cons) | .6242943 .1035136 .451079 .8640247 corr(urban,_cons) | -.7964729 .1151556 -.9361775 -.4394904 ------------------------------------------------------------------------------ LR test vs. logistic regression: chi2(3) = 58.42 Prob > chi2 = 0.0000 Note: LR test is conservative and provided only for reference. . mata exp(st_matrix("e(b)")[1]) 2.260954838

It is *very* important in this model to specify the covariance
matrix of the random effects as unstructured, so we allow
correlation between the level of contraceptive use and the
urban-rural difference in each district.

The estimates show that the odds of using contraception on the average district are about double in urban compared to urban areas, but the difference varies substantially by district, as shown by the standard deviation of the urban coefficient, 0.82. For example the urban-rural difference in logits is essentially zero in a district with differential one standard deviation below the mean (0.8158-0.8163 = -0.001), and equivalent to an odds ratio of about five in a district with differential one standard deviation above the mean (0.8158+0.8163 = 1.632).

We now compute Empirical Bayes (EB) estimates of the random intercept and random slop, representing the estimated district effects on the leel of contraceptive use and the urban-rural difference. (We could compute ML estimates as well, using a procedure similar to the previous section, but we will focus on the EB estimates.)

. predict ri rs, reffects . scatter rs ri, title(District Effects on Contraceptive Use) /// > xtitle(Effect on Level of Use) /// > ytitle(Effect on Urban-Rural Differential) . graph export bd2.png, width(400) replace (file bd2.png written in PNG format)

We see a clear negative correlation. Districts where contraceptive use is higher than average, after considering the age and number of children of the residents, tend to show a much smaller urban-rural difference, reflecting the correlation of -0.796 estimated above.

We could also estimate separate random effects for urban and rural areas in each district by including dummies for both urban and rural and omitting the constant in the fixed and random parts. If you do that you will find that the two effects are nearly independent.

Continue with Poisson Models