1 Introduction
Data with correlated categorical responses arise frequently in applications. This may arise from units grouped into clusters (clustered data) or multiple measurements taken on the same unit (longitudinal data). For instance, we might expect the unemployment outcomes (employed, unemployed, not in labour force) of residents living in the same region to be correlated, due to similar job opportunities and socioeconomic levels. Ignoring the correlation structure and assuming that all observations are independent by fitting an ordinary multinomial regression model may result in biased estimates and inaccurate predictions. Multinomial mixed models can account for correlation by using group level random effects Daniels & Gatsonis (1997); Hartzel et al. (2001); Hedeker (2003).
For multinomial mixed models, it is a common practice to assume a multivariate normal distribution for the random effects. The multivariate normal distribution is easy to interpret and is convenient when we want to build more complicated correlation structures into our model. However, the resulting likelihood involves multidimensional integrals that cannot be solved analytically. The computational effort to evaluate the likelihood increases with the number of groups and categories, making it not suitable for large scale applications. In fact,
Lee, Green & Ryan (2017) showed that closedform likelihoods for multinomial mixed models do not exist regardless of the random effect distribution, except for the special case when there are no covariates.Various methods have been proposed to circumvent the computational obstacle for fitting multinomial mixed models. Among them are quadrature Hartzel et al. (2001); Hedeker (2003), Monte Carlo EM algorithm, pseudolikelihood approach Hartzel et al. (2001)
and Markov Chain Monte Carlo methods
Daniels & Gatsonis (1997). Jain et al. (1994)proposed a random effects estimation approach using a discrete probability distribution approximation. Simulation based methods such as method of simulated moments
McFadden (1989) and method of simulated maximum likelihood Gong et al. (2004); Hann & Uhlendorff (2006) are widely used in the econometrics literature. Recently, Perry (2016) proposed a fast momentbased estimation method that scales well for large samples and which arguably can be extended for fitting multinomial mixed models. Kuss & McLerran (2007) used the fact that the multinomial model is a multivariate binary model and exploited a procedure proposed by Wright (1998) for model fitting. Their approach has been criticized by de Rooij & Worku (2012) as they failed to realize that a multivariate link function is needed in the context of multinomial models. An alternative strategy using clustered bootstrap was subsequently proposed by de Rooij & Worku (2012). Although some authors have considered the Dirichletmultinomial model that results in a closedform likelihood, it does not allow the incorporation of individual level covariates.Chen & Kuo (2001)
advocate using Poisson loglinear or nonlinear mixed models, both with random effects, as surrogates to multinomial mixed models. Their method capitalizes on existing mixed models software packages for fitting generalized linear models with random effects. This allows multinomial mixed models to be fitted using an approximate likelihood from the Poisson surrogate models. Their results are based on extensions of the well known “Poisson Trick”
Baker (1994); McCullagh & Nelder (1989); Venables & Ripley (2002) that relates multinomial models with Poisson models via a respecification of the model formulae. Although clever, their methods have been criticized for being computationally inefficient MalchowMøller & Svarer (2003); Kuss & McLerran (2007) and sometimes producing nonsense results Kuss & McLerran (2007). This might be due to the intractable likelihoods of their models and the various approximation methods being used in different software packages. The considerable execution time is especially problematic, where it can take up to months to fit the model to a moderatesized dataset MalchowMøller & Svarer (2003)! In this article, we propose a new approach based on an extension of the GammaPoisson model Lee, Brown & Ryan (2017), where the likelihood can be expressed in closedform. Using the proposed estimation procedure, the parameters can be estimated via readily available packages for fitting generalized linear models.The remaining paper is organized as follows. Section 2 reviews the “Poisson Trick” for multinomial regression models with independent responses, and suggests a parsimonious fitting strategy when all covariates are categorical. In Section 3 we propose a new approach for approximating the likelihood of multinomial regression models with random effects. The empirical performance of the proposed model is demonstrated via a simulation study and a reanalysis of the yogurt brand choice dataset as discussed by Chen & Kuo (2001) in Section 4. Finally, we conclude with a discussion and a summary of our findings in Section 5.
2 "Poisson Trick" for Independent Multinomial Responses
This section describes the relationship between multinomial and Poisson regression models for independent responses, which we refer to as the “Poisson Trick”. The results are based on the well known fact that given the sum, Poisson counts are jointly multinomially distributed McCullagh & Nelder (1989).
2.1 Derivation
Let be the
response vector for observation
with the corresponding probability vector , where indexes the multinomial category. A common approach to satisfy the two characteristics of probability (i) for all and ; and (ii) is via(1) 
where is a positive userspecified function of covariates and fixed effects , and . Depending on the type of variable under consideration, and can be indexed by various combinations of and (Croissant, 2013, pp. 78). Conditional on the multinomial sums , are independently multinomially distributed for each , i.e.
(2) 
In multinomial models, is treated as fixed. Suppose we instead treat as random and assume
(3) 
independently for each . This results in a multinomialPoisson mixture with the following joint probability function for each :
(4) 
The marginal probability of can then be obtained by summing the joint probability over all possible values of :
(5) 
Thus, allowing the multinomial sums to be random according to a Poisson distribution results in
(6) 
independently for each and . Summing over all observations, the loglikelihood is
(7) 
where and denote the Poisson and multinomial loglikelihood functions respectively, and . The second term on the left hand side is the model we would like to fit, and the term on the right hand side is the model we actually fit.
To show that the Poisson surrogate model is an exact fit to the multinomial model, first note the loglikelihood corresponding to the multinomial model is
(8) 
and the loglikelihood of the Poisson surrogate model is
(9) 
Differentiating Equation 9 with respect to and setting it to , we obtain . Plugging in the maximizing value of into Equation 9, we have
(10) 
Equation 10 is identical to Equation 8
, up to an additive constant. It follows that the maximum likelihood estimates, their asymptotic variances and tests for the fixed effects can be
exactly recovered under the Poisson surrogate model Richards (1961). That is, likelihood inference for is the same whether we regard as fixed (multinomial) or randomly sampled from independent Poissons. This result applies to the fixed effects model, with any parameterization of , including:
Exponential transformations of linear combinations of categorical variables and regression coefficients McCullagh & Nelder (1989); Agresti (2013),

Exponential transformations of linear combinations of continuous variables and regression coefficients,

Any monotonic transformations of linear combinations of covariates and regression coefficients,

Nonlinear functions of covariates and regression coefficients,

Nonparametric formulations.
The Poisson surrogate model eliminates from the denominator of the multinomial probabilities. This makes sense intuitively, as we do not expect the multinomial sums to provide any useful information in estimating the fixed effects. Given that can also be obtained by setting the fitted values of the multinomial sums equal to the observed counts in the Poisson surrogate model, has the effect of recovering the multinomial sums. The key idea is to include a separate constant for each unique combination of covariates in the Poisson surrogate models.
2.2 Specifying Model Formulae for Poisson Surrogate Models
For purposes of exposition, the model formulae in this section are written in terms of the R language R Development Core Team (2017), although this article is not concerned with software packages per se. Multinomial models are fitted using the multinom() function within the nnet package Ripley & Venables (2016); Poisson models are fitted using the glm() function within the stats package.
For concreteness, consider the nonparallel baseline category logit models. The “baseline category logit” assumption refers to the following: treating category as the baseline category with without loss of generality, we model = as a linear function of covariates and regression coefficients . This assumption is not necessary, but chosen so that the model formulae can be illustrated using functions within the stats package. The “nonparallel” assumption refers to covariate effects that vary across categories Fullerton & Xu (2016), i.e. all elements of the vector are indexed by . That is, if the set of logits are plotted against the covariate on the same graph, a set of straight lines with slopes that are in general different will be obtained. Later we shall discuss cases where we relax this assumption.
Consider a hypothetical dataset with two predictors and (these can be categorical or continuous) and a multinomial outcome vector with categories. In short format, each row of data represents an observation with a dimensional outcome vector . Poisson models treat the outcomes of each observation as independent and glm() requires data to be presented in long format. This requires an additional factor that denotes the category memberships. Each row now comprises a scalar outcome, resulting in rows of data per observation. The first few rows of data in both short and long format are shown in Table 1.


Table 2 shows the equivalant relationship between nonparallel multinomial models and the corresponding Poisson models, where the parameters satisfy the usual constraints for identifiability. The Poisson surrogate models possess several important features:

The model includes an indicator variable (that corresponds to in Section 2.1) for each observation, although this can be simplified when all covariates are categorical. This is to ensure the exact recovery of the multinomial sums, as the fixed sums are treated as random in the Poisson models. As a result, we do not interpret the coefficients of since they are just nuisance parameters.

The category membership indicator enters as a covariate in the Poisson models, where the coefficients correspond to the intercepts in the multinomial modelsv so that the counts are allowed to vary by category.

The model includes interaction terms between and (denoted by in the model formula), where the coefficients correspond to the slopes in the multinomial models. This is due to the nonparallel assumption where each category has a separate slope, and also the fact that multinomial models treat the response counts jointly for each observation, whereas Poisson models treat each response count as a separate observation. It is important that these interaction terms are included even if they are not significant. For multinomial models where some (partial models) or all (parallel models) of the covariate effects do not vary across categories, the equivalent Poisson models can be obtained by modifying the interaction structure between and accordingly. For instance, in parallel models where all categories share the same covariate effects, there is no need to include the interation terms between and , since the slopes do not vary across categories.
Multinomial  Poisson 

Syntax for using multinom() in R, where data are presented in short format and is a vector of response counts.

Syntax for using glm() in R, where data are presented in long format and is a scalar response count.
When writing the model formula, it is important to specify and as factors due to their categorical nature. This can be achieved via the factor() function in R.
Special Case: Categorical Covariates
When all covariates are categorical, the model formulae in Table 3 offer a more parsimonious option for fitting the Poisson models without having to estimate a separate parameter for each observation.
Multinomial  Poisson 

Syntax for using multinom() in R, where data are presented in short format and is a vector of response counts.

Syntax for using glm() in R, where data are presented in long format and is a scalar response count.
As stated above, the key to achieving the 11 correspondence between multinomial and Poisson models (with the same link function) is to include a separate constant for each unique combination of covariates. For categorical covariates, this can be achieved by including the full interaction among the predictors in the Poisson model. When all the covariates are categorical, the interaction term has the precise effect of pooling groups of observations with identical covariates. Fitting such models is equivalent to fitting the observation index as a factor (Table 2), but the pooling results in a smaller effective data frame, and therefore smaller storage requirements and faster fitting speed, with no loss of information. Of course, if there are many factors, there may not be much saving, because it will be comparatively rare for different observations to have all the same factor level combinations.
3 Extending the “Poisson Trick” for Correlated Multinomial Responses
3.1 Derivation
Consider a set of observations which fall into a collection of groups and let be a vectorvalued random effect for group . Each observation belongs to only a single group. Extending the notation in Section 2, the response vector for observation in group is , with the corresponding probability vector , where . Conditional on the multinomial sums and the random effects , the counts are Multinomially distributed:
(11) 
In analogy to the results in Section 2, given the random effects, we treat as random and assume
(12) 
independently for each and . This gives
(13) 
independently for each and . The probability argument in Equation 7 still holds, now conditional on the random effects:
(14) 
where . If the random effects are observed, the conditional probability statement above imply a 11 exact correspondence between the multinomial and the Poisson surrogate models. However, due to the unobserved nature of the random effects, interest lies in the marginal distribution, obtained by integrating out the random effects. This results in an approximate relationship between the two models. It turns out that the marginal likelihood of the approximating Poisson surrogate model (right hand side of Equation 14) can be expressed in closedform if we assume an independent Gamma model for the random effects, with and , i.e. Lee, Green & Ryan (2017).
With this assumption for the distribution of the random effects, the marginal likelihood of the multinomial model that we would like to fit (second term on the left hand side of Equation 14) is given by
(15) 
This does not generally exhibit a closedform solution regardless of the random effect distribution, unless in the special cases of no covariate or with only group specific covariates Lee, Green & Ryan (2017). Numerical or simulation methods can be used to approximate the likelihood, with computational efforts increasing with increasing number of groups and categories. On the other hand, the marginal likelihood of the Poisson surrogate model can be expressed in closedform:
(16) 
The Poisson surrogate model is an extension of the GammaPoisson model as proposed by Lee, Brown & Ryan (2017) and Lee, Green & Ryan (2017) to allow the modelling of counts for multiple categories.
As a consequence of Equation 16, we have
(17) 
Refer to the appendix for details. This is the populationaveraged expected value and is not suitable for prediction in general, as it does not take into account the cluster effect. However, it can be useful for out of sample prediction, when there are no samples present in a particular group.
Special Case: Var() approaches
When Var() approaches for all , the model reduces to the special case of no random effects as outlined in Section 2, and the exact correspondence between the multinomial and the Poisson models can be regained.
3.2 Identifiability
There is some lack of identifiability with the model formulation given by Equation 16, characterized by nonuniqueness of the maximum likelihood estimates. There is an identifiability issue between and , and also between and the category intercepts. To fix this, we impose the constraint of so that . As a consequence, and . Also, we only require a random effect for each logit, and thus a constraint for the random effects associated with the baseline category is needed. Denote . Several authors such as Agresti (2013) (pp.514) and Hartzel et al. (2001) considered a multivariate normal distribution for the random effects, i.e. , where is a by variancecovariance matrix. This is equivalent to saying that for all , or . The equivalent statement in our proposed model is to fix for all . This is tantamount to saying approaches , and thus approaches .
3.3 Prediction of Random Effects and Fitted Values
We focus on the best predictor (BP) for random effects prediction, i.e. the predictor that minimises the overall mean squared error of prediction. McCulloch et al. (2008) shows that the BP is given by the posterior expectation of the random effect. Under the proposed Poisson surrogate model, the BP is given by
(18) 
which can be calculated via
(19) 
Solving for the integral, the BP is
(20) 
where . depends on the parameters , and , in which we replace by their estimators, leading to the empirical best predictor (EBP). The fitted values can then be defined as
(21) 
where we replace and by their respective estimators and .
3.4 Parameter Estimation
Consider the parameterization which is widely adopted in practice, where , where and are both vectors. The chosen index structure for and encompasses a variety of possible scenarios: (i) categoryspecific predictors with generic coefficients , (ii) categoryspecific predictors with categoryspecific coefficients , and (iii) observationspecific predictors with categoryspecific coefficients
. This can be achieved by creating the appropriate interaction terms between the predictor and the category indicator variable, thus modifying the model matrix. Note that observationspecific predictors must be paired with choicespecific coefficients. Otherwise they will disappear in the differentiation when we consider the logodds.
Denote , where includes the incidental parameters for all and . Algorithm 1 presents an Expectation/Conditional Maximization (ECM) algorithm Meng & Rubin (1993) for parameter estimation of the Poisson surrogate model. Refer to the appendix for a detailed derivation.

Obtain by fitting a Poisson loglinear model with as the response and as the design matrix, with as offset. includes indicator variables for each unique combination of covariates.

Obtain for each for to by maximizing
where is the gamma function.
4 Yogurt Brand Choice Dataset
We consider the yogurt brand choice dataset previously analyzed by Jain et al. (1994) and Chen & Kuo (2001). Jain et al. (1994) approximated the likelihood of a multinomial logit model with Gaussian random effects using a discrete distribution. Chen & Kuo (2001) approximated the multinomial logit model using the Poisson loglinear model and Poisson nonlinear model, both with Gaussian random effects.
The dataset consists of purchases of yogurt by a panel of households in Springfield, Missouri, and were originally provided by A. C. Nielsen. The data
were collected by optical scanners for about two years and correspond to purchases. Variables collected include brand, price and presence of newspaper feature advertisements for each purchase made by households in the panel. Price and feature advertisements are choicespecific variables. We assume a parallel baseline logit model by assigning generic coefficients to these variables, as we do not expect the effect of price and feature advertisements on the probability of purchase to vary according to brands. The four brands of yogurt: Yoplait, Dannon, Weight Watchers, and Hiland account for market shares of 34%, 40%, 23%, and 3% respectively. Following Chen & Kuo (2001), we put Hiland as the reference brand. Table 4 presents the yogurt data in both long and short format. The letters ‘f’ and ‘p’ represent the feature and price variables respectively, with the letter that follows denoting the brand. For instance, ‘fy’ stands for ‘feature of Yoplait’ and ‘pd’ stands for ‘price of Dannon’.


We fit the models proposed in Sections 2 and 3, and compare our results to that of Chen & Kuo (2001), fitted using the SAS macro GLIMMIX and the SAS procedure NLMIXED. The results are presented in Table 5. The preference ordering of the brands are the same for all models, i.e. Yoplait is the most preferred brand, followed by Dannon, Weight Watchers and Hiland. The slope parameters estimates have the expected signs for all models. An increase in price is associated with a decrease in the probability of purchase. Feature advertisement tends to increase the chance of purchase. The householdtohousehold variation in the probability of purchase for Weight Watchers is much larger than the other brands, although none are significant (p > 0.05).
In comparing the estimates between models, we note that the fixed effects model is likely to produce biased estimates as it did not take into account of the correlation induced by multiple purchases from the same household. The parameter estimates of NLMIXED and GammaPoisson are uniformly larger than that of GLIMMIX, except for the intercept associated with Weight Watchers (for NLMIXED) and for the slope associated with price (for GammaPoisson). The estimates of the standard errors from NLMIXED and GammaPoisson are also uniformly larger than that of GLIMMIX. These differences can be attributed to the different distributional assumptions of the random effects, and also the different approximations used in GLIMMIX and NLMIXED to estimate the intractable likelihood. In this regard, our model exhibit a closedform likelihood that allows exact inference to be performed with respect to the approximating model.
We tried to fit a simplified version of GLIMMIX using the glmer() function within the lme4 package in R, with just a random effect per household (ignoring the choice effect). However, the model failed to converge within a few months, even though Chen & Kuo (2001) claimed that the GLIMMIX model coverged in SAS.
Random Effects  
Variables  Fixed Effects  GLIMMIX  NLMIXED  GammaPoisson 
Dannon  3.716 (0.145)  3.838 (0.231)  4.130 (0.648)  4.616 (0.309) 
Weight  3.074 (0.145)  2.242 (0.241)  1.046 (0.671)  3.677 (0.392) 
Yoplait  4.450 (0.187)  4.626 (0.261)  4.805 (0.699)  5.275 (0.342) 
Feature  0.491 (0.120)  0.730 (0.121)  0.956 (0.185)  0.785 (0.178) 
Price  36.658 (2.437)  40.012 (2.562)  36.686 (3.725)  40.881 (3.778) 
N/A  N/A  N/A  2.203 (0.134)  
N/A  N/A  N/A  6.067 (0.374)  
N/A  N/A  N/A  1.918 (0.135) 

Fitted using the glm() function in R, using the “Poisson Trick” as outlined in Section 2.

Poisson loglinear model with Gaussian random effects, fitted by Chen & Kuo (2001) using the SAS macro GLIMMIX.

Poisson nonlinear model with Gaussian random effects, fitted by Chen & Kuo (2001) using the SAS procedure NLMIXED.

Poisson loglinear model with Gamma (multiplicative) random effects fitted using the ECM algorithm, as outlined in Section 3.
5 Concluding Remarks
In this article, we presented methods for fitting various multinomial regression models via the socalled “Poisson Trick” and its extensions. The “Poisson Trick” for fitting fixed effects multinomial regression models is handy when the direct fitting of multinomial models is not supported, for instance the INLA package Rue et al. (2009) in R. For multinomial regression models with random effects, there exist a variety of experience for using the existing extensions proposed by Chen & Kuo (2001), from taking months to fit a moderate sized dataset MalchowMøller & Svarer (2003), producing nonsense results Kuss & McLerran (2007) to nonconvergence in our experience of fitting the yogurt brand choice dataset. We proposed an extension of the “Poisson Trick” using Gamma (multiplicative) random effects. In contrast to the models by Chen & Kuo (2001), our model exhibits a closedform likelihood and can be maximized using existing functions for fitting generalized linear models that are stable and heavily optimized, without having to approximate the integrals.
6 Appendix
6.1 Derivation of the PopulationAveraged Expected Values in Equation 17
Equation 16 is also equivalent to
(22) 
This results in two different interpretations for the extended GammaPoisson surrogate model:
Taking expectation of both Equations 16 and 22 with respect to gives rise to the populationaveraged expected value given in Equation 17
. The definitions of negative multinomial and negative binomial distributions are given in the following subsections.
6.1.1 Negative Multinomial Distribution
This is the distribution on the nonnegative integers outcomes , with corresponding probability of occurence and probability mass function
for parameters and , where for all , and is the Gamma function. We write . For positive integer
, the negative multinomial distribution can be recognized as the joint distribution of the ntuple
when performing sampling until reaches the predetermined value . The mean vector of negative multinomial distribution is given by .6.1.2 Negative Binomial Distribution
This is the distribution on the nonnegative integers outcome , with corresponding probability of occurence and probability mass function
for parameters and . We write . For positive integer , the negative binomial distribution can be recognized as the distribution for the number of heads before the th tail in biased cointossing, but it is a valid distribution for all . In engineering, it is sometimes called the Pólya distribution in the case where is not integer.
6.2 Derivation of the Expectation/Conditional Maximixation (ECM) Algorithm in Section 3.4
Treating for all and to as missing data and for all , and as observed data, the complete data is . Denote , where includes the incidental parameters for all and . The complete data loglikeliood is
(23) 
The th Estep involves finding the conditional expectation of the complete data loglikelihood with respect to to the conditional distribution of given and the current estimated parameter . Straightforward algebra establishes that
(24) 
independently for each and
, where the gamma distribution is parameterized in terms of scale parameter. It follows that
(25)  
(26) 
Thus, in the th Estep, we replace and in Equation 23 with and , giving . The th CMstep then finds to maximize via a sequence of conditional maximization steps, each of which maximizes the function over a subset of , with the rest fixed at its previous value. In our application, it is natural to partition into and for each to . Differentiating Equation 23 with respect to , we obtain
(27) 
which is the score equation of the Poisson loglinear model McCullagh & Nelder (1989) with an additional offset . This allows us to leverage existing functions for fitting generalized linear models available in most statistical software packages for maximizing in the CM step. This is an important feature as often contains a huge amount of parameters in our applications, due to the inclusion the incidental parameter for every unique combination of covariates. Existing functions for fitting generalized linear models are typically stable and heavily optimized, even for a large number of parameters. Maximizing in the CM step for each q is straightforward, as it only involves univariate optimization.
Acknowledgements
Lee’s research is partially supported by the Australian Bureau of Statistics. The authors are grateful to Zhen Chen for providing the yogurt data.
References
 (1)
 Agresti (2013) Agresti, A. (2013), Categorical Data Analysis, Wiley.
 Baker (1994) Baker, S. G. (1994), ‘The multinomialpoisson transformation’, Journal of the Royal Statistical Society: Series D 43(4), 495–504.
 Chen & Kuo (2001) Chen, Z. & Kuo, L. (2001), ‘A note on the estimation of the multinomial logit model with random effects’, The American Statistician 55(2), 89–95.

Croissant (2013)
Croissant, Y. (2013), Estimation of
multinomial logit models in R: The mlogit Packages.
R package version 0.24.
https://cran.rproject.org/web/packages/mlogit/vignettes/mlogit.pdf  Daniels & Gatsonis (1997) Daniels, M. J. & Gatsonis, C. (1997), ‘Hierarchical polytomous regression models with applications to health services research’, Statistics in Medicine 16, 2311–2325.
 de Rooij & Worku (2012) de Rooij, M. & Worku, H. M. (2012), ‘A warning concerning the estimation of multinomial logistic models with correlated rresponse in SAS’, Computer methods and programs in biomedicine 107, 341–346.
 Fullerton & Xu (2016) Fullerton, A. S. & Xu, J. (2016), Ordered Regression Models: Parallel, Partial, and NonParallel Alternatives, CRC Press.
 Gong et al. (2004) Gong, X., van Soest, A. & Villagomez, E. (2004), ‘Mobility in the urban labor market: A panel data analysis for Mexico’, Economic Development and Cultural Change 53(1), 1–36.
 Hann & Uhlendorff (2006) Hann, P. & Uhlendorff, A. (2006), ‘Estimation of multinomial logit models with unobserved heterogeneity using maximum simulated likelihood’, The Stata Journal 6(2), 229–245.
 Hartzel et al. (2001) Hartzel, J., Agresti, A. & Caffo, B. (2001), ‘Multinomial logit random effects models’, Statistical Modelling 1(2), 81–102.

Hedeker (2003)
Hedeker, D. (2003), ‘A mixedeffects multinomial logistic regression model’,
Statistics in Medicine 22(9), 1433–1446.  Jain et al. (1994) Jain, D. C., Vilcassim, N. J. & Chintagunta, P. K. (1994), ‘A randomcoefficients logit brandchoice model applied to panel data’, Journal of Business & Economic Statistics 12(3), 317–328.
 Kuss & McLerran (2007) Kuss, O. & McLerran, D. (2007), ‘A note on the estimation of the multinomial logistic model with correlated rresponse in SAS’, Computer Methods and Programs in Biomedicine 87, 262–269.
 Lee, Brown & Ryan (2017) Lee, J. Y. L., Brown, J. J. & Ryan, L. (2017), ‘Sufficiency revisited: Rethinking statistical algorithms in the big data era’, The American Statistician (to appear) .
 Lee, Green & Ryan (2017) Lee, J. Y. L., Green, P. J. & Ryan, L. M. (2017), ‘Conjugate generalised mixed models for twolevel data’, Paper in preparation .
 MalchowMøller & Svarer (2003) MalchowMøller, N. & Svarer, M. (2003), ‘Estimation of the multinomial logit model with random effects’, Applied Economics Letters 10(7), 389–392.
 McCullagh & Nelder (1989) McCullagh, P. & Nelder, J. A. (1989), Generalized Linear Models, Chapman & Hall.
 McCulloch et al. (2008) McCulloch, C. E., Searle, S. R. & Neuhaus, J. M. (2008), Generalized, linear and mixed models, John Wiley & Sons.
 McFadden (1989) McFadden, D. (1989), ‘A method of simulated moments for estimation of discrete response models without numerical integration’, Econometrica 57(5), 995–1026.
 Meng & Rubin (1993) Meng, X.L. & Rubin, D. B. (1993), ‘Maximum likelihood estimation via the ECM algorithm: A general framework’, Biometrika 80(2), 267–278.
 Perry (2016) Perry, P. O. (2016), ‘Fast momentbased estimation for hierarchical models’, Journal of the Royal Statistical Society: Series B 79(1), 267–291.

R Development Core Team (2017)
R Development Core Team (2017), R: A
Language and Environment for Statistical Computing, Vienna, Austria: R
Foundation for Statistical Computing.
ISBN 3900051070, http://www.Rproject.org.
http://www.Rproject.org  Richards (1961) Richards, F. (1961), ‘A method of maximumlikelihood estimation’, Journal of Royal Statistical Society (Series B) 23, 469–475.

Ripley & Venables (2016)
Ripley, B. & Venables, W. (2016),
nnet: FeedForward Neural Networks and Multinomial LogLinear Models
. R package version 7.312. 
Rue et al. (2009)
Rue, H., Martino, S. & Chopin, N. (2009), ‘Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations (with discussion)’,
Journal of the Royal Statistical Society, Series B 71(2), 319–392.  Venables & Ripley (2002) Venables, W. & Ripley, B. (2002), Modern Applied Statistics with S, Springer.

Wright (1998)
Wright, S. (1998), Multivariate analysis using the MIXED procedure (paper 22923),
in ‘Proceedings of the 23rd Annual SAS Users Group (SUGI) International Conference’.
Comments
There are no comments yet.