1 Introduction
Cumulative link models were proposed by McCullagh (1980), see also Agresti (2010), and are the most popular tool for handling ordinal outcomes, which are pervasive in many disciplines. One of the reasons for their popularity lies in the use of a single regression coefficient for all response levels, which makes the effect simple to summarize. For these models, maximum likelihood (ML) is the most common estimation method. Nevertheless, ML estimation presents some problems, and several proposals have been developed to address them. One problem concerns the asymptotic approximation for the distribution of the ML estimator, which can be highly inaccurate with moderate sample information or sparse data. Another lies in boundary estimates, which can arise with positive probability in models for ordinal data and can cause several difficulties in the fitting process and in inferential procedures.
The literature is rich in methods for bias reduction of the ML estimator. Such methods can be distinguished (Kosmidis, 2014a) into explicit methods, which correct the estimate, and implicit methods, which correct the estimating function. The main disadvantage of the former is that they require finite ML estimates; the latter do not, which is one of the reasons for their spread in applied statistics.
The estimation approaches based on an adjustment of the score allow one, by introducing an asymptotically negligible bias in the score function, to obtain the mean bias reduced (mean BR) estimator, proposed by Firth (1993) and developed in Kosmidis and Firth (2009, 2010), and the median bias reduced (median BR) estimator, proposed by Kenne Pagui et al. (2017). A unified presentation is given by Kosmidis et al. (2020) for generalized linear models and by Kenne Pagui et al. (2019) for general models. Such approaches do not require finiteness of the ML estimates and, in addition, are effective in preventing boundary estimates. The main difference between the two methods lies in the use of the mean and the median, respectively, as the centering index for the estimator. Mean BR achieves a first-order bias correction. Its lack of equivariance under nonlinear reparameterizations is a disadvantage, which is, however, outweighed by practical advantages in applications. Median BR, developed in Kenne Pagui et al. (2017) and in a subsequent paper (Kenne Pagui et al., 2019), aims at median centering of the estimator, which is componentwise third-order median unbiased in the continuous case and equivariant under componentwise monotone reparameterizations.
Mean BR for cumulative link models is developed in Kosmidis (2014b), where finiteness and optimal frequentist properties are illustrated. Here we obtain the quantities needed to compute the median BR estimator in cumulative link models, using the simplified algebraic form of the adjustment term developed in Kenne Pagui et al. (2019). We show, through extensive simulation studies, that the proposed method succeeds in achieving componentwise median centering, outperforms ML and is competitive with mean BR. Considering an ordinal probability effect measure proposed by Agresti and Kateri (2017), we also analyze the behaviour under componentwise monotone reparameterizations, showing the good performance achieved by the median BR estimator. Finally, we present an application where the median BR approach, like mean BR, is seen to prevent boundary estimates.
2 Cumulative link models
Let $Y_i$ be the ordinal outcome, with $c$ categories, for subject $i$, $i=1,\ldots,n$. Let $\pi_{ij}$ be the probability of observing category $j$, $j=1,\ldots,c$, for subject $i$, and $\gamma_{ij}=\pi_{i1}+\cdots+\pi_{ij}$ the cumulative probability. With $x_i$, $i=1,\ldots,n$, a $p$-dimensional row vector of covariates, the cumulative link model (McCullagh, 1980) links the cumulative probabilities to a linear predictor, $\eta_{ij}=\alpha_j+x_i\beta$, $j=1,\ldots,c-1$, via the relationship

$$g(\gamma_{ij})=\eta_{ij}, \quad j=1,\ldots,c-1, \tag{1}$$

where $g$ is a given link function and $\beta=(\beta_1,\ldots,\beta_p)^\top$ is the regression parameter vector. This class of models assumes that the effects of $x_i$, expressed through $\beta$, are the same for each $j$. The intercept parameters $\alpha_j$, $j=1,\ldots,c-1$, satisfy $\alpha_1<\cdots<\alpha_{c-1}$, since $\gamma_{ij}$ is increasing in $j$ for each fixed $i$. Model (1) has an interpretation in terms of an underlying latent variable (see e.g. Agresti, 2010, Section 3.3.2): the ordinal outcome $Y_i$ can be seen as the discretization of a latent continuous random variable $Z_i$, satisfying a regression model $Z_i=-x_i\beta+\varepsilon_i$, $i=1,\ldots,n$. The random variables $\varepsilon_i$ are independent and identically distributed with cumulative distribution function $F$. By assigning threshold values $\alpha_j$, $j=0,\ldots,c$, with $-\infty=\alpha_0<\alpha_1<\cdots<\alpha_{c-1}<\alpha_c=+\infty$, such that we observe $Y_i=j$ if $\alpha_{j-1}<Z_i\leq\alpha_j$, the equivalent formulation of model (1) is obtained

$$\gamma_{ij}=F(\alpha_j+x_i\beta),$$

with $g=F^{-1}$. Common choices for $F$ are the logistic, standard normal or extreme value distribution. The cumulative logit model, also known as proportional odds model (McCullagh, 1980, Section 2), is obtained assuming $g(\gamma)=\log\{\gamma/(1-\gamma)\}$; the cumulative probit model is recovered with $g(\gamma)=\Phi^{-1}(\gamma)$; and the cumulative complementary log-log link model, also known as proportional hazards model (McCullagh, 1980, Section 3), is obtained setting $g(\gamma)=\log\{-\log(1-\gamma)\}$.

The popularity of model (1) is linked to its parsimony, since it uses a single parameter for each predictor, in addition to the latent variable interpretation. The cumulative link model can be inadequate because of misspecification of the linear predictor or due to departure from the assumption that the covariate effect is the same for each $j$. Several models have been proposed that relax the latter assumption (for a detailed description see Fullerton and Xu, 2016). Instances are the partial cumulative link model, which first appeared in the literature as partial proportional odds model (Peterson and Harrell, 1990), or the nonparallel cumulative link model. Both include the cumulative link model as a special case. However, despite their flexibility, they may present some difficulties either from a computational or from an interpretation point of view, especially with data sets with several predictors.
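As a concrete illustration of model (1) and its latent-variable formulation, the following Python sketch (all parameter values are arbitrary, chosen only for illustration) computes the category probabilities implied by the three link functions discussed above:

```python
import numpy as np
from scipy.stats import norm

# Inverse link F = g^{-1}: the cdf of the latent error term
INV_LINK = {
    "logit":   lambda z: 1.0 / (1.0 + np.exp(-z)),   # logistic cdf
    "probit":  norm.cdf,                              # standard normal cdf
    "cloglog": lambda z: 1.0 - np.exp(-np.exp(z)),    # extreme value cdf
}

def category_probs(alpha, beta, x, link="logit"):
    """Category probabilities pi_{i1}, ..., pi_{ic} for one subject.

    alpha: increasing intercepts (alpha_1, ..., alpha_{c-1})
    beta:  regression coefficients; x: covariate row vector.
    Uses gamma_{ij} = F(alpha_j + x beta), with gamma_{i0} = 0, gamma_{ic} = 1.
    """
    F = INV_LINK[link]
    eta = np.asarray(alpha, dtype=float) + np.dot(x, beta)
    gamma = np.concatenate(([0.0], F(eta), [1.0]))  # cumulative probabilities
    return np.diff(gamma)                           # pi_{ij} = gamma_{ij} - gamma_{i,j-1}

# Example with c = 3 categories and two covariates (illustrative values)
probs = category_probs(alpha=[-1.0, 1.0], beta=[0.5, -0.3], x=[1.0, 2.0], link="logit")
```

The returned vector sums to one by construction, since the cumulative probabilities are differenced over the full set of categories.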
2.1 Maximum likelihood, bias reduction and boundary estimates
As the sample size increases, the probability of unique ML estimates tends to one (McCullagh, 1980, Section 6.3). However, the ML estimator has a positive probability of being on the boundary of the parameter space. In cumulative link models (1), boundary estimates are estimates of the regression parameters with infinite components, and/or consecutive intercept estimates having the same value. Pratt (1981) showed that zero counts for a middle category $j$, $1<j<c$, produce consecutive equal intercept estimates, that is $\hat\alpha_{j-1}=\hat\alpha_j$, and that if the first or the last category has zero observed counts, then the estimate of $\alpha_1$ or $\alpha_{c-1}$ is infinite. Agresti (2010, Section 3.4.5) describes some settings where infinite ML estimates occur for the regression parameters.
Kosmidis (2014b) demonstrates that mean BR is a generally effective strategy to prevent boundary estimates. The same advantage will be seen to hold for median BR in Sections 4 and 5. With particular regard to boundary estimates of the intercept parameters, Kosmidis (2014b, Section 8.3, Remark 1) showed that the ML estimate of the regression parameters is invariant with respect to grouping of unobserved categories with the adjacent ones. So, likelihood inference on the regression parameters is possible even if one or more categories are unobserved. The same appears to hold for mean BR and will be seen to hold in all examples considered for median BR. The only difference with respect to ML estimates is that if the first or the last category has zero counts, then the mean and median BR estimates are typically finite.
2.2 An ordinal probability effect measure
A useful monotone transformation of the regression parameters related to binary covariates was proposed by Agresti and Kateri (2017) to overcome the difficulty practitioners have in interpreting nonlinear measures, such as probits and odds ratios. This reparameterization allows an interpretation in terms of “ordinal superiority”, that is the probability that an observation from one group falls above an independent observation from the other group, adjusting for other covariates. For a vector of covariates $x_i$, let $x_{ik}$ be a binary variable which is a group indicator for an observation. Let $Y_1$ and $Y_2$ be the independent outcomes from the groups $x_{ik}=1$ and $x_{ik}=0$, respectively. For ordinal responses, the ordinal superiority measure, $\Delta$, is defined as

$$\Delta = P(Y_1 > Y_2) + \frac{1}{2}\,P(Y_1 = Y_2).$$

Based on model (1), Agresti and Kateri (2017) show that the exact or approximate expressions of $\Delta$ for the parameter related to the binary covariate, $\beta_k$, are $\Delta \approx \exp(\beta_k/\sqrt{2})/\{1+\exp(\beta_k/\sqrt{2})\}$ considering the logit link function, $\Delta = \Phi(\beta_k/\sqrt{2})$ for the probit link, and $\Delta = \exp(\beta_k)/\{1+\exp(\beta_k)\}$ for the complementary log-log link.
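The ordinal superiority measure can also be evaluated numerically from the two groups' category probabilities; the following Python sketch (the function name is ours, not from the paper) computes $P(Y_1 > Y_2) + \frac{1}{2}P(Y_1 = Y_2)$ for independent ordinal outcomes:

```python
import numpy as np

def ordinal_superiority(p1, p2):
    """Delta = P(Y1 > Y2) + 0.5 * P(Y1 = Y2) for independent ordinal
    outcomes Y1 and Y2 with category probability vectors p1 and p2."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    # Joint probability of (Y1 = j, Y2 = k) is p1[j] * p2[k] by independence
    joint = np.outer(p1, p2)
    above = np.tril(joint, -1).sum()  # entries with j > k: Y1 falls above Y2
    ties = np.trace(joint)            # entries with j == k
    return above + 0.5 * ties

# Identical groups: Y1 and Y2 are exchangeable, so Delta = 0.5
delta = ordinal_superiority([0.2, 0.5, 0.3], [0.2, 0.5, 0.3])
```

This direct computation is useful as a check on the closed-form expressions, since it makes no assumption about the link function.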
3 Median bias reduction
For a regular parametric model with $p$-dimensional parameter $\theta=(\theta_1,\ldots,\theta_p)^\top$, let $\ell(\theta)$ be the log-likelihood based on a sample of size $n$ and $U_r(\theta)=\partial\ell(\theta)/\partial\theta_r$, $r=1,\ldots,p$, the $r$-th component of the score vector $U(\theta)$. Moreover, let $j(\theta)=-\partial^2\ell(\theta)/\partial\theta\,\partial\theta^\top$ be the observed information matrix and $i(\theta)=E_\theta\{j(\theta)\}$ the expected information matrix, which we assume to be of order $O(n)$. The median BR estimator, $\tilde\theta$, is obtained as solution of the estimating equation $\tilde U(\theta)=0$, where

$$\tilde U(\theta) = U(\theta) + \tilde A(\theta), \tag{2}$$

with $\tilde A(\theta)$ an adjustment term of order $O(1)$ involving expected values of products of log-likelihood derivatives. We refer to Kenne Pagui et al. (2019) for its simplified algebraic form, for further details about the computation of $\tilde A(\theta)$, and for the relation with the mean BR estimator (Firth, 1993), $\hat\theta^*$. The latter is based on an adjusted score of the form (2), with $\tilde A(\theta)$ replaced by the mean BR adjustment term $A^*(\theta)$.
Kenne Pagui et al. (2017) show that, in the continuous case, each component of $\tilde\theta$, $\tilde\theta_r$, $r=1,\ldots,p$, is median unbiased with an error of order $O(n^{-3/2})$, that is $P_\theta(\tilde\theta_r \leq \theta_r) = 1/2 + O(n^{-3/2})$, compared to the ML estimator, which is median unbiased with an error of order $O(n^{-1/2})$. Moreover, the asymptotic distribution of $\tilde\theta$ is the same as that of the ML estimator, $\hat\theta$, and of the mean BR estimator, $\hat\theta^*$, namely normal with mean $\theta$ and variance-covariance matrix $i(\theta)^{-1}$.
The equation $\tilde U(\theta)=0$ is usually solved numerically, and a finite solution is not always guaranteed. The numerical solution can be obtained by a Fisher scoring-type algorithm, whose $(k+1)$-th iteration is

$$\theta^{(k+1)} = \theta^{(k)} + i(\theta^{(k)})^{-1}\{U(\theta^{(k)}) + \tilde A(\theta^{(k)})\}, \tag{3}$$

which differs from the analogue for the ML estimates only by the addition of the term $i(\theta^{(k)})^{-1}\tilde A(\theta^{(k)})$. We adopt, as a stopping criterion for the algorithm, the condition $|\theta_r^{(k+1)} - \theta_r^{(k)}| < \epsilon$ for every $r=1,\ldots,p$, with $\epsilon$ set to a small default tolerance.
The algorithm needs a starting value, $\theta^{(0)}$, whose determination is not trivial; a poor choice can result in nonconvergence of (3). When available, the ML estimate, $\hat\theta$, or the mean BR estimate, $\hat\theta^*$, are suitable starting values, which can also speed up convergence. We set the starting values following a strategy similar to that used in Christensen (2019) for cumulative link models (1). The starting value for the regression coefficients, $\beta^{(0)}$, is set to zero. The intercept parameters, $\alpha_j^{(0)}$, $j=1,\ldots,c-1$, are initialized through the inverse of $F$, where $F$ is the cumulative distribution function of the error terms, according to the latent variable interpretation discussed in Section 2.
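Iteration (3) can be sketched generically, with the adjustment term supplied as a function of $\theta$ (setting it to zero recovers ML Fisher scoring). As a minimal runnable check, we apply it to ML estimation of a Poisson mean, whose score and expected information have simple closed forms; all names here are ours, and this is a sketch rather than the paper's implementation:

```python
import numpy as np

def adjusted_fisher_scoring(score, info, adjustment, theta0,
                            tol=1e-10, max_iter=100):
    """Iterate theta <- theta + i(theta)^{-1} {U(theta) + A(theta)},
    stopping when every component of the step is below tol."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    for _ in range(max_iter):
        step = np.linalg.solve(np.atleast_2d(info(theta)),
                               np.atleast_1d(score(theta) + adjustment(theta)))
        theta = theta + step
        if np.all(np.abs(step) < tol):  # componentwise stopping criterion
            break
    return theta

# Toy check: ML for a Poisson mean (adjustment = 0), where the
# iteration must converge to the sample mean.
y = np.array([2.0, 3.0, 4.0, 7.0])
n = y.size
score = lambda th: np.array([y.sum() / th[0] - n])   # U(theta)
info = lambda th: np.array([[n / th[0]]])            # i(theta)
zero = lambda th: np.array([0.0])                    # A(theta) = 0
theta_hat = adjusted_fisher_scoring(score, info, zero, theta0=[1.0])
```

Passing a nonzero `adjustment` function in place of `zero` yields the adjusted-score analogue of the same iteration.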
In order to recognize boundary estimates, we adapt the diagnostics in Lesaffre and Albert (1989), identifying estimates as infinite if their absolute value and the corresponding standard error are both greater than suitable thresholds. Categories with zero observed counts are grouped with adjacent ones, except when zero counts occur at the extreme categories.
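A minimal sketch of such a diagnostic follows; the threshold values are arbitrary placeholders, not those used in the paper:

```python
import numpy as np

def flag_boundary(estimates, std_errors, est_threshold=10.0, se_threshold=50.0):
    """Flag components whose estimate and standard error both exceed
    their thresholds, suggesting a boundary (infinite) estimate."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    return (np.abs(estimates) > est_threshold) & (std_errors > se_threshold)

# The third coefficient looks divergent: a huge estimate paired with a
# huge standard error, the pattern targeted by separation diagnostics.
flags = flag_boundary([-1.3, 0.8, 24.7], [0.5, 0.4, 3000.0])
```

The joint condition matters: a large estimate with a small standard error is a genuine strong effect, not a boundary estimate.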
4 Simulation study
We conducted a simulation study to assess the performance of the median BR estimator, $\tilde\theta$, in cumulative link models (1). We compare it with the ML, $\hat\theta$, and mean BR, $\hat\theta^*$, estimators in terms of empirical probability of underestimation (PU%), estimated relative (mean) bias (RB%), and empirical coverage of the 95% Wald-type confidence interval (WALD%).

We consider three increasing sample sizes, $n$, and different link functions $g$, namely the logit, probit and complementary log-log (cloglog) link functions. We generate the covariate $x_{i1}$ from a standard Normal, $x_{i2}$ and $x_{i3}$ from Bernoulli distributions with probabilities 0.5 and 0.8, respectively, and $x_{i4}$ from a Poisson with mean 2.5. Assuming that the response has three categories, we fit the model

$$g(\gamma_{ij}) = \alpha_j + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4}, \quad j = 1, 2,$$

considering 10,000 replications, with covariates fixed at the observed values and the true parameter held fixed. The true parameter value is set under the logit link function, and the approximate relations between coefficients under different link functions are used to obtain the corresponding true values for the probit and complementary log-log link functions.
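The data-generating step of this design can be sketched as follows, here under the logit link; the true parameter values below are illustrative placeholders, not those used in the study:

```python
import numpy as np

def simulate_dataset(n, alpha, beta, rng):
    """Draw covariates as in the simulation design and an ordinal
    response with c = 3 categories from a cumulative logit model."""
    X = np.column_stack([
        rng.standard_normal(n),      # x1 ~ N(0, 1)
        rng.binomial(1, 0.5, n),     # x2 ~ Bernoulli(0.5)
        rng.binomial(1, 0.8, n),     # x3 ~ Bernoulli(0.8)
        rng.poisson(2.5, n),         # x4 ~ Poisson(2.5)
    ])
    expit = lambda z: 1.0 / (1.0 + np.exp(-z))
    # Cumulative probabilities gamma_ij = F(alpha_j + x_i beta), logit link
    gamma = expit(np.add.outer(X @ np.asarray(beta, float), np.asarray(alpha, float)))
    gamma = np.hstack([gamma, np.ones((n, 1))])   # gamma_ic = 1
    # Inverse-cdf sampling: Y_i = j iff gamma_{i,j-1} < U_i <= gamma_{ij}
    u = rng.random(n)
    y = 1 + (u[:, None] > gamma).sum(axis=1)
    return X, y

rng = np.random.default_rng(2020)
# Placeholder parameter values, for illustration only
X, y = simulate_dataset(200, alpha=[-0.5, 1.0], beta=[0.6, -0.4, 0.3, 0.2], rng=rng)
```

Keeping the covariate matrix fixed across replications and redrawing only the responses reproduces the "covariates fixed at the observed values" scheme described above.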
Table 1 contains the numerical results for all link functions considered. Boundary estimates occurred using ML, at the two smaller sample sizes, with percentage frequencies 2.82%, 2.75% and 2.44%, and 0.08%, 0.1% and 0.04%, for the logit, probit and complementary log-log link functions, respectively. Instead, mean and median BR estimates are always finite. The new method proves to be remarkably accurate in achieving median centering and shows an estimated relative bias lower than that of ML and comparable with that of the mean BR estimator, as well as good empirical coverage of the 95% Wald-type confidence intervals. The differences between the three estimators are appreciable in the smaller sample size settings and become much less pronounced as the sample size increases.
Link | PU% | RB% | WALD% | PU% | RB% | WALD% | PU% | RB% | WALD% | |
---|---|---|---|---|---|---|---|---|---|---|
logit | 40.94 | 14.50 | 94.97 | 43.46 | 6.30 | 94.77 | 45.83 | 2.80 | 94.75 | |
55.34 | 14.90 | 94.76 | 54.27 | 6.60 | 94.93 | 52.06 | 2.50 | 94.88 | ||
44.63 | 13.50 | 96.48 | 46.91 | 9.10 | 95.32 | 47.39 | 4.60 | 94.97 | ||
62.99 | 16.50 | 95.19 | 59.19 | 7.00 | 94.92 | 56.22 | 3.20 | 95.36 | ||
54.14 | -0.50 | 95.94 | 51.99 | -0.20 | 95.34 | 51.64 | -0.30 | 95.23 | ||
48.38 | 0.90 | 96.35 | 49.51 | 0.60 | 95.77 | 48.60 | -0.30 | 95.45 | ||
53.01 | -0.30 | 96.96 | 52.64 | -0.50 | 96.06 | 51.27 | 0.00 | 95.52 | ||
45.71 | 0.40 | 94.96 | 47.47 | 0.00 | 95.11 | 47.89 | -0.10 | 95.35 | ||
50.83 | 2.90 | 95.92 | 50.05 | 1.20 | 95.47 | 50.01 | 0.40 | 95.25 | ||
50.12 | 4.20 | 95.89 | 50.67 | 2.10 | 95.64 | 49.62 | 0.40 | 95.34 | ||
50.12 | 8.70 | 97.03 | 50.60 | 2.90 | 95.97 | 49.99 | 1.50 | 95.39 | ||
50.22 | 4.30 | 95.54 | 50.34 | 1.70 | 95.25 | 50.07 | 0.70 | 95.51 | ||
probit | 40.31 | 14.50 | 94.12 | 42.82 | 6.17 | 94.21 | 45.23 | 2.83 | 94.41 | |
55.40 | 14.67 | 94.26 | 53.65 | 6.33 | 94.62 | 52.44 | 2.67 | 94.61 | ||
45.35 | 12.67 | 96.35 | 46.58 | 8.50 | 95.02 | 47.63 | 4.17 | 94.82 | ||
63.26 | 15.83 | 94.16 | 59.23 | 6.67 | 94.56 | 56.74 | 3.17 | 95.20 | ||
53.79 | -0.83 | 95.56 | 52.18 | -0.33 | 95.15 | 51.66 | -0.17 | 94.99 | ||
48.67 | 0.67 | 96.06 | 49.30 | 0.33 | 95.65 | 48.69 | -0.17 | 95.06 | ||
52.93 | -1.33 | 96.79 | 52.18 | -0.67 | 95.82 | 51.58 | -0.33 | 95.45 | ||
44.93 | -0.33 | 94.87 | 46.40 | -0.17 | 95.18 | 47.80 | 0.00 | 95.17 | ||
50.81 | 2.33 | 95.54 | 50.08 | 1.00 | 95.01 | 50.23 | 0.50 | 94.89 | ||
50.46 | 3.50 | 95.71 | 50.23 | 1.50 | 95.49 | 49.37 | 0.33 | 94.99 | ||
50.24 | 6.00 | 96.89 | 50.37 | 2.33 | 95.63 | 50.42 | 1.17 | 95.23 | ||
49.67 | 3.33 | 95.36 | 49.35 | 1.33 | 95.35 | 49.90 | 0.67 | 95.36 | ||
cloglog | 39.59 | 15.29 | 94.07 | 42.58 | 7.14 | 94.47 | 44.69 | 3.29 | 94.89 | |
55.42 | 13.86 | 94.25 | 53.82 | 5.86 | 94.60 | 52.85 | 2.86 | 94.79 | ||
46.72 | 15.57 | 95.46 | 46.31 | 11.43 | 95.57 | 47.27 | 5.86 | 95.33 | ||
62.53 | 16.00 | 94.23 | 59.16 | 7.14 | 94.87 | 56.04 | 3.29 | 95.11 | ||
55.26 | -1.14 | 95.36 | 53.07 | -0.29 | 94.89 | 52.19 | -0.29 | 95.04 | ||
48.95 | 0.57 | 96.09 | 49.17 | 0.00 | 95.53 | 49.46 | 0.00 | 95.21 | ||
54.39 | -0.86 | 95.83 | 52.99 | -0.43 | 95.86 | 52.02 | 0.14 | 95.73 | ||
44.90 | 0.29 | 94.73 | 47.13 | 0.14 | 94.94 | 47.32 | 0.00 | 95.37 | ||
51.31 | 2.57 | 95.40 | 50.33 | 1.43 | 95.01 | 50.28 | 0.71 | 95.07 | ||
50.55 | 3.43 | 95.72 | 50.20 | 1.29 | 95.33 | 50.25 | 0.71 | 95.12 | ||
50.77 | 12.14 | 96.04 | 50.16 | 4.71 | 95.86 | 50.10 | 2.57 | 95.69 | ||
49.95 | 4.14 | 95.29 | 50.73 | 2.00 | 95.17 | 49.52 | 0.86 | 95.50 |
Table 2 shows the estimated relative bias under monotone reparameterization of the parameters related to the binary covariates, considering the ordinal probability effect measure presented in Section 2.2. In the new parameterization, the median BR estimator shows the best performance in terms of estimated relative bias, compared with ML and with mean BR, the latter not being equivariant under this type of reparameterization.
link | |||||||
---|---|---|---|---|---|---|---|
logit | 1.58 | -1.05 | -0.42 | -1.30 | 4.15 | 1.21 | |
0.79 | -0.49 | -0.18 | -1.70 | 2.27 | 0.88 | ||
0.24 | -0.39 | -0.22 | -1.00 | 1.03 | 0.33 | ||
probit | 1.99 | -0.74 | -0.18 | -2.23 | -3.43 | 0.80 | |
0.93 | -0.36 | -0.09 | -2.09 | 1.73 | 0.48 | ||
0.38 | -0.26 | -0.14 | -1.10 | 0.80 | 0.21 | ||
cloglog | 1.39 | -1.11 | -0.55 | -1.18 | 5.18 | 1.30 | |
0.63 | -0.61 | -0.33 | -2.11 | 2.59 | 0.54 | ||
0.33 | -0.30 | -0.16 | -1.36 | 1.12 | 0.06 |
5 Application
We consider the data analysed in Randall (1989), related to a factorial experiment investigating the factors that affect the bitterness of wine. There are two factors: temperature at the time of crushing the grapes and contact between juice and skin. Each factor has two levels, “cold” and “warm” for temperature and “yes” and “no” for contact. For each of the four treatment conditions, two bottles were assessed by a panel of nine judges, giving $n = 72$ observations. As in Christensen (2019, Section 4.8), we consider the outcomes obtained by combining the three central categories, and we fit the corresponding cumulative link model.
Table 3 shows the coefficient estimates obtained with ML, mean BR and median BR. Both mean and median BR approaches are able to solve the boundary estimates problem.
ML | -1.32 (0.53) | () | () | -1.31 (0.71) |
meanBR | -1.25 (0.51) | 5.48 (1.48) | -3.43 (1.42) | -1.19 (0.67) |
medianBR | -1.29 (0.52) | 6.46 (2.32) | -4.48 (2.29) | -1.24 (0.68) |
Table 4 shows the simulation results for the regression parameters, considering 10,000 replications with covariates fixed at the observed values and a fixed true parameter value. In several samples out of 10,000, ML produced boundary estimates; instead, mean and median BR estimates are always finite. The median BR estimator is again highly accurate in achieving median centering and shows a lower estimated relative bias than ML, as well as good empirical coverage of the 95% Wald-type confidence intervals.
Parameter | Parameter | |||||
---|---|---|---|---|---|---|
PU% | RB% | WALD% | PU% | RB% | WALD% | |
ML | 55.08 | 1.80 | 96.92 | 53.20 | 8.20 | 96.50
meanBR | 43.91 | -0.65 | 95.88 | 48.10 | 0.50 | 96.60 |
medianBR | 49.71 | 8.95 | 96.48 | 50.35 | 4.90 | 96.28 |
Under the monotone reparameterization of the coefficients related to the binary covariates, proposed by Agresti and Kateri (2017) and presented in Section 2.2, the median BR estimator shows the lowest estimated relative mean bias, compared with the ML and mean BR estimators. For ML, it should be recalled that the estimated relative bias is conditional upon finiteness of the estimates.
References
- Agresti, A. and Kateri, M. (2017). Ordinal probability effect measures for group comparisons in multinomial cumulative link models. Biometrics 73, pp. 214–219.
- Agresti, A. (2010). Analysis of Ordinal Categorical Data, 2nd ed. New York: Wiley.
- Christensen, R. H. B. (2019). ordinal — regression models for ordinal data. R package version 2019.12-10. http://CRAN.R-project.org/package=ordinal.
- Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, pp. 27–38.
- Fullerton, A. S. and Xu, J. (2016). Ordered Regression Models: Parallel, Partial, and Non-parallel Alternatives. Boca Raton, FL: CRC Press.
- Kenne Pagui, E. C., Salvan, A. and Sartori, N. (2017). Median bias reduction of maximum likelihood estimates. Biometrika 104, pp. 923–938.
- Kenne Pagui, E. C., Salvan, A. and Sartori, N. (2019). Efficient implementation of median bias reduction. Submitted. https://arxiv.org/abs/2004.08630.
- Kosmidis, I. and Firth, D. (2009). Bias reduction in exponential family nonlinear models. Biometrika 96, pp. 793–804.
- Kosmidis, I. and Firth, D. (2010). A generic algorithm for reducing bias in parametric estimation. Electronic Journal of Statistics 4, pp. 1097–1112.
- Kosmidis, I., Kenne Pagui, E. C. and Sartori, N. (2020). Mean and median bias reduction in generalized linear models. Statistics and Computing 30, pp. 43–59.
- Kosmidis, I. (2014a). Bias in parametric estimation: reduction and useful side-effects. Wiley Interdisciplinary Reviews: Computational Statistics 6, pp. 185–196.
- Kosmidis, I. (2014b). Improved estimation in cumulative link models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76, pp. 169–194.
- Lesaffre, E. and Albert, A. (1989). Partial separation in logistic discrimination. Journal of the Royal Statistical Society: Series B (Methodological) 51, pp. 109–116.
- McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society: Series B (Methodological) 42, pp. 109–142.
- Peterson, B. and Harrell, F. E. (1990). Partial proportional odds models for ordinal response variables. Journal of the Royal Statistical Society: Series C (Applied Statistics) 39, pp. 205–217.
- Pratt, J. W. (1981). Concavity of the log likelihood. Journal of the American Statistical Association 76, pp. 103–106.
- Randall, J. H. (1989). The analysis of sensory data by generalized linear model. Biometrical Journal 31, pp. 781–793.