Median bias reduction in cumulative link models

This paper presents a novel estimation approach for cumulative link models, based on median bias reduction as developed in Kenne Pagui et al. (2017). The median bias reduced estimator is obtained as the solution of an estimating equation based on an adjustment of the score. It yields higher-order median centering of the maximum likelihood estimates without requiring their finiteness. Moreover, the estimator is equivariant under componentwise monotone reparameterizations, and the method is effective in preventing boundary estimates. We evaluate the properties of the median bias reduced estimator through simulation studies and compare it with its two main competitors, the maximum likelihood and the mean bias reduced (Firth, 1993) estimators. Finally, we show an application where the proposed estimator solves the boundary estimates problem.


1 Introduction

Cumulative link models were proposed by McCullagh (1980), see also Agresti (2010), and are the most popular tool for handling ordinal outcomes, which are pervasive in many disciplines. One reason for their popularity lies in the use of a single regression coefficient for all response levels, which makes the effect simple to summarize. For these models, maximum likelihood (ML) is the most common estimation method. Nevertheless, it presents some problems, and several proposals have been developed to solve them. One problem concerns the asymptotic approximation for the distribution of the ML estimator, which can be highly inaccurate with moderate sample information or sparse data. Another lies in boundary estimates, which can arise with positive probability in models for ordinal data and can cause several difficulties in the fitting process and in inferential procedures.

The literature is rich in methods for bias reduction of the ML estimator. Such methods can be distinguished (Kosmidis, 2014a) into explicit methods, which correct the estimate, and implicit methods, based on correction of the estimating function. The main disadvantage of the former lies in the need for finiteness of the ML estimates, which is overcome by the latter; this is one of the reasons for their spread in applied statistics.

The estimation approaches based on an adjustment of the score introduce an asymptotically negligible bias in the score function to obtain either the mean bias reduced (mean BR) estimator, proposed by Firth (1993) and developed in Kosmidis and Firth (2009, 2010), or the median bias reduced (median BR) estimator, proposed by Kenne Pagui et al. (2017). A unified presentation is given by Kosmidis et al. (2020) for generalized linear models and by Kenne Pagui et al. (2019) for general models. Such approaches do not require the finiteness of the ML estimates. In addition, they are effective in preventing boundary estimates. The main difference between the two methods lies in the use of the mean and the median, respectively, as the centering index for the estimator. Mean BR achieves a first-order bias correction. The lack of equivariance under nonlinear reparameterizations is a disadvantage of this approach, which is, however, offset by practical advantages in applications. Median BR, developed in Kenne Pagui et al. (2017) and in a subsequent paper (Kenne Pagui et al., 2019), aims at median centering of the estimator, which is componentwise third-order median unbiased in the continuous case and equivariant under componentwise monotone reparameterizations.

Mean BR for cumulative link models is developed in Kosmidis (2014b), where finiteness and optimal frequentist properties are illustrated. Here we obtain the quantities needed to compute the median BR estimator in cumulative link models, using the simplified algebraic form of the adjustment term developed in Kenne Pagui et al. (2019). We show, through extensive simulation studies, that the proposed method succeeds in achieving componentwise median centering, outperforms ML and is competitive with mean BR. Considering an ordinal probability effect measure proposed by Agresti and Kateri (2017), we also analyze the behaviour under componentwise monotone reparameterizations, showing the good performance achieved by the median BR estimator. Finally, we present an application where the median BR approach, like mean BR, is seen to prevent boundary estimates.

2 Cumulative link models

Let $Y_i$ be the ordinal outcome, with $c$ categories, for subject $i$, $i = 1, \ldots, n$. Let $\pi_{ij}$ be the probability of observing category $j$, $j = 1, \ldots, c$, for subject $i$, and $\gamma_{ij} = \pi_{i1} + \cdots + \pi_{ij}$ the cumulative probability. With $x_i$, $i = 1, \ldots, n$, a $p$-dimensional row vector of covariates, the cumulative link model (McCullagh, 1980) links the cumulative probabilities to a linear predictor, $\eta_{ij} = \alpha_j + x_i \beta$, $j = 1, \ldots, c-1$, via the relationship

$g(\gamma_{ij}) = \eta_{ij} = \alpha_j + x_i \beta$,    (1)

where $g(\cdot)$ is a given link function and $\beta$ is the $p$-dimensional regression parameter vector. This class of models assumes that the effects of $x_i$, expressed through $\beta$, are the same for each $j$. The intercept parameters $\alpha_j$, $j = 1, \ldots, c-1$, satisfy $\alpha_1 \leq \cdots \leq \alpha_{c-1}$, since $\gamma_{ij}$ is increasing in $j$ for each fixed $i$. Model (1) has an interpretation in terms of an underlying latent variable (see e.g. Agresti, 2010, Section 3.3.2): the ordinal outcome $Y_i$ can be seen as the discretization of a latent continuous random variable $Z_i$, satisfying the regression model $Z_i = -x_i \beta + \epsilon_i$, $i = 1, \ldots, n$. The random variables $\epsilon_i$ are independent and identically distributed with density $f$ and cumulative distribution function $F$. By assigning threshold values $-\infty = \alpha_0 \leq \alpha_1 \leq \cdots \leq \alpha_{c-1} \leq \alpha_c = +\infty$, such that we observe $Y_i = j$ if $\alpha_{j-1} < Z_i \leq \alpha_j$, the equivalent formulation of model (1) is obtained as

$\gamma_{ij} = F(\alpha_j + x_i \beta)$,

with $g = F^{-1}$. Common choices for $F$ are the logistic, standard normal or extreme value distribution. The cumulative logit model, also known as the proportional odds model (McCullagh, 1980, Section 2), is obtained assuming $F(z) = e^z/(1 + e^z)$; the cumulative probit model is recovered with $F(z) = \Phi(z)$, the standard normal distribution function; and the cumulative complementary log-log link model, also known as the proportional hazards model (McCullagh, 1980, Section 3), is obtained setting $F(z) = 1 - \exp(-e^z)$.
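As an illustrative sketch (our own, not code from the paper), the category probabilities implied by the model can be computed from the intercepts and the linear predictor by differencing consecutive cumulative probabilities; the function names below are assumptions for exposition.

```python
import math

def cum_prob(eta, link="logit"):
    """Inverse link F applied to eta: the cumulative probability gamma."""
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-eta))
    if link == "probit":
        return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
    if link == "cloglog":
        return 1.0 - math.exp(-math.exp(eta))
    raise ValueError("unknown link: " + link)

def category_probs(alpha, xbeta, link="logit"):
    """Category probabilities pi_j = gamma_j - gamma_{j-1},
    with gamma_0 = 0 and gamma_c = 1; alpha must be nondecreasing."""
    gammas = [cum_prob(a + xbeta, link) for a in alpha] + [1.0]
    return [g - g_prev for g, g_prev in zip(gammas, [0.0] + gammas[:-1])]
```

For instance, with two intercepts the model has three response categories, and the probabilities returned by `category_probs` sum to one for any linear predictor value.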

The popularity of model (1) is linked to its parsimony, since it uses a single parameter for each predictor, in addition to the latent variable interpretation. The cumulative link model can be inadequate because of misspecification of the linear predictor or because of departure from the assumption that the covariate effect is the same for each $j$, $j = 1, \ldots, c-1$. Several models have been proposed that relax the latter assumption (for a detailed description see Fullerton and Xu, 2016). Instances are the partial cumulative link model, which first appeared in the literature as the partial proportional odds model (Peterson and Harrell, 1990), and the nonparallel cumulative link model. Both include the cumulative link model as a special case. However, despite their flexibility, they may present difficulties from either a computational or an interpretive point of view, especially in data sets with several predictors.

2.1 Maximum likelihood, bias reduction and boundary estimates

As the sample size increases, the probability of unique ML estimates tends to one (McCullagh, 1980, Section 6.3). However, the ML estimator has a positive probability of being on the boundary of the parameter space. In cumulative link models (1), boundary estimates are estimates of the regression parameters with infinite components, and/or consecutive intercept estimates having the same value. Pratt (1981) showed that zero counts for a middle category $j$, $1 < j < c$, produce consecutive equal intercept estimates, that is $\hat\alpha_{j-1} = \hat\alpha_j$, and that if the first or the last category has zero observed counts, then the estimate of $\alpha_1$ or $\alpha_{c-1}$ is infinite. Agresti (2010, Section 3.4.5) describes some settings where infinite ML estimates occur for the regression parameters.

Kosmidis (2014b) demonstrates that mean BR is a generally effective strategy to prevent boundary estimates. The same advantage will be seen to hold for median BR in Sections 4 and 5. With particular regard to boundary estimates of the intercept parameters, Kosmidis (2014b, Section 8.3, Remark 1) showed that the ML estimate of the regression parameters is invariant with respect to grouping of unobserved categories with the adjacent ones. So, likelihood inference on the regression parameters is possible if one or more categories are unobserved. The same appears to hold for mean BR and will be seen to hold in all examples considered for median BR. The only difference with respect to ML estimates is that if the first or the last category has zero counts, then the mean and median BR estimates are typically finite.
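The mechanism behind Pratt's result can be seen numerically in an intercept-only cumulative logit model, where the ML estimates of the intercepts are the logits of the empirical cumulative proportions. The following sketch (our own, not from the paper) shows that a zero count in a middle category forces two consecutive intercept estimates to coincide.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

# Observed counts for c = 4 categories; the third (middle) category has zero counts.
counts = [10, 15, 0, 25]
n = sum(counts)

# In an intercept-only cumulative logit model, the ML estimates of the
# intercepts are the logits of the empirical cumulative proportions.
cum = 0
alpha_hat = []
for count in counts[:-1]:
    cum += count
    alpha_hat.append(logit(cum / n))

# The zero count in category 3 leaves the cumulative proportion unchanged,
# so alpha_hat[1] and alpha_hat[2] are identical: a boundary estimate.
```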

2.2 An ordinal probability effect measure

A useful monotone transformation of the regression parameters related to binary covariates was proposed by Agresti and Kateri (2017) to overcome the difficulty practitioners have in interpreting nonlinear measures, such as probits and odds ratios. This reparameterization allows an interpretation in terms of "ordinal superiority", that is, the probability that an observation from one group falls above an independent observation from the other group, adjusting for other covariates. For a vector of covariates $x_i$, let $s_i$ be a binary variable which is a group indicator for an observation. Let $Y_1$ and $Y_2$ be the independent outcomes from the groups $s = 1$ and $s = 0$, respectively. For ordinal responses, the ordinal superiority measure, $\gamma$, is defined as

$\gamma = P(Y_1 > Y_2) + \frac{1}{2} P(Y_1 = Y_2)$.

Based on model (1), Agresti and Kateri (2017) show that the exact or approximate expressions of $\gamma$ for the parameter related to the binary covariate, $\beta_s$, are (up to the sign convention adopted for the linear predictor) $e^{\beta_s/\sqrt{2}}/(1 + e^{\beta_s/\sqrt{2}})$, approximately, considering the logit link function, $\Phi(\beta_s/\sqrt{2})$ for the probit link, and $e^{\beta_s}/(1 + e^{\beta_s})$ for the complementary log-log link.
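The probit-link expression can be checked by Monte Carlo on the latent scale: for two independent standard normal latent variables whose means differ by a shift $\delta$ (a hypothetical value below), $P(Z_1 > Z_2) = \Phi(\delta/\sqrt{2})$, since the difference of the two latent variables is normal with variance 2. A small sketch, not from the paper:

```python
import math
import random

random.seed(42)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

delta = 0.8          # hypothetical group shift on the latent scale
m = 200_000

# Monte Carlo estimate of P(Z1 > Z2) with Z1 ~ N(delta, 1), Z2 ~ N(0, 1).
wins = sum(random.gauss(delta, 1.0) > random.gauss(0.0, 1.0) for _ in range(m))
gamma_mc = wins / m

# Exact value: the difference Z1 - Z2 is N(delta, 2).
gamma_exact = Phi(delta / math.sqrt(2.0))
```

With continuous latent variables the tie term vanishes, so the simulation targets $P(Z_1 > Z_2)$ directly.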

3 Median bias reduction

For a regular parametric model with $p$-dimensional parameter $\theta = (\theta_1, \ldots, \theta_p)$, let $\ell(\theta)$ be the log-likelihood based on a sample of size $n$ and $U_r = \partial \ell(\theta)/\partial \theta_r$, $r = 1, \ldots, p$, the $r$-th component of the score $U = U(\theta)$. Moreover, let $j(\theta) = -\partial^2 \ell(\theta)/\partial\theta \partial\theta^\top$ be the observed information matrix and $i(\theta) = E_\theta\{j(\theta)\}$ the expected information matrix, which we assume to be of order $O(n)$. We denote with $i_r$ the $r$-th column of $i(\theta)$ and with $i^{rs}$ the $(r,s)$ element of $i(\theta)^{-1}$.

The median BR estimator, $\tilde\theta$, is obtained as the solution of the estimating equation $\tilde U(\theta) = 0$, where

$\tilde U(\theta) = U(\theta) + \tilde A(\theta)$,    (2)

with $\tilde A(\theta)$ an adjustment term of order $O(1)$ depending on the first two cumulants of the log-likelihood derivatives. We refer to Kenne Pagui et al. (2019) for the simplified algebraic form of $\tilde A(\theta)$, for the details of its computation and for the relation with the mean BR estimator (Firth, 1993), $\hat\theta^*$. The latter is based on an adjusted score of the form (2), with $\tilde A(\theta)$ replaced by Firth's adjustment $A^*(\theta)$.

Kenne Pagui et al. (2017) show that, in the continuous case, each component of $\tilde\theta = (\tilde\theta_1, \ldots, \tilde\theta_p)$ is median unbiased with an error of order $O(n^{-3/2})$, i.e. $P_\theta(\tilde\theta_r \leq \theta_r) = 1/2 + O(n^{-3/2})$, compared to the ML estimator, which is median unbiased with an error of order $O(n^{-1/2})$. Moreover, the asymptotic distribution of $\tilde\theta$ is the same as that of the ML estimator, $\hat\theta$, and of the mean BR estimator, $\hat\theta^*$, that is $N_p(\theta, i(\theta)^{-1})$.

The equation $\tilde U(\theta) = 0$ is usually solved numerically. Moreover, a finite solution is not always guaranteed. The numerical solution can be obtained by a Fisher scoring-type algorithm, whose $(k+1)$-th iteration is

$\theta^{(k+1)} = \theta^{(k)} + i(\theta^{(k)})^{-1} U(\theta^{(k)}) + i(\theta^{(k)})^{-1} \tilde A(\theta^{(k)})$,    (3)

which differs from the analogue for the ML estimates only by the addition of the term $i(\theta^{(k)})^{-1} \tilde A(\theta^{(k)})$. We adopt, as a stopping criterion for the algorithm, the condition $|\tilde U_r(\theta^{(k)})| < \epsilon$ for every $r = 1, \ldots, p$, for a small fixed tolerance $\epsilon$.

The algorithm needs a starting value, $\theta^{(0)}$, whose determination is not trivial and can result in nonconvergence of (3). When available, the ML estimate, $\hat\theta$, or the mean BR estimate, $\hat\theta^*$, are suitable starting values, which are also able to speed up convergence. We set the starting values following a strategy similar to that used in Christensen (2019) for cumulative link models (1). The starting value for the regression coefficients, $\beta^{(0)}$, is set to zero. The intercept parameters, $\alpha_j$, $j = 1, \ldots, c-1$, are initialized through $F^{-1}$, the quantile function of the error distribution, according to the latent variable interpretation discussed in Section 2.
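The structure of iteration (3), Fisher scoring applied to an adjusted score, can be sketched generically. In the toy check below the adjustment is identically zero, which reduces the scheme to plain ML scoring for an i.i.d. Poisson mean; it is only meant to show where an adjustment term such as the median BR one would plug in, and is not the paper's implementation.

```python
def fisher_scoring(score, info, adj, theta0, tol=1e-10, max_iter=100):
    """Fisher scoring for a scalar adjusted score equation U(theta) + A(theta) = 0.
    With adj identically zero this is plain ML scoring; an implicit bias
    reduction method would supply its own adjustment A."""
    theta = theta0
    for _ in range(max_iter):
        theta += (score(theta) + adj(theta)) / info(theta)
        if abs(score(theta) + adj(theta)) < tol:  # stopping criterion on the score
            break
    return theta

# Toy check: i.i.d. Poisson(mu) observations, whose ML estimate is the sample mean.
y = [2, 4, 1, 3, 5]
n, s = len(y), sum(y)
mu_hat = fisher_scoring(score=lambda m: s / m - n,   # score of the Poisson mean
                        info=lambda m: n / m,        # expected information
                        adj=lambda m: 0.0,           # no adjustment: plain ML
                        theta0=1.0)
```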

In order to recognize boundary estimates, we adapt the diagnostics in Lesaffre and Albert (1989), identifying infinite estimates when their absolute value and the corresponding standard error are both greater than some thresholds. Categories with zero observed counts are grouped with the adjacent ones, except when this happens at the extreme categories.
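A minimal version of such a diagnostic might look as follows; the threshold values are illustrative assumptions, not those used in the paper, and should be calibrated to the scale of the problem.

```python
def flag_boundary(estimates, std_errors, est_threshold=10.0, se_threshold=10.0):
    """Flag components whose estimate and standard error are both very large,
    a practical symptom of divergence towards the boundary of the
    parameter space (in the spirit of Lesaffre and Albert, 1989)."""
    return [abs(est) > est_threshold and se > se_threshold
            for est, se in zip(estimates, std_errors)]
```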

4 Simulation study

We conducted a simulation study to assess the performance of the median BR estimator, $\tilde\theta$, in cumulative link models (1). We compare it with the ML, $\hat\theta$, and mean BR, $\hat\theta^*$, estimators in terms of empirical probability of underestimation (PU%), estimated relative (mean) bias (RB%), and empirical coverage of the 95% Wald-type confidence interval (WALD%).
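The three summary criteria can be computed from replicated estimates as follows; this is our own sketch of the standard definitions, not code from the paper.

```python
def simulation_summary(estimates, std_errors, true_value, z=1.96):
    """PU%: share of estimates below the true value (about 50% indicates
    median centering). RB%: relative bias of the mean of the estimates.
    WALD%: empirical coverage of the interval estimate +/- z * se."""
    m = len(estimates)
    pu = 100.0 * sum(e < true_value for e in estimates) / m
    rb = 100.0 * (sum(estimates) / m - true_value) / abs(true_value)
    cover = 100.0 * sum(abs(e - true_value) <= z * se
                        for e, se in zip(estimates, std_errors)) / m
    return pu, rb, cover
```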

We consider three increasing sample sizes, $n$, and different link functions $g$, namely the logit, probit and complementary log-log (cloglog) link functions. We generate the covariate $x_{i1}$ from a standard normal distribution, $x_{i2}$ and $x_{i3}$ from Bernoulli distributions with probabilities 0.5 and 0.8, respectively, and $x_{i4}$ from a Poisson distribution with mean 2.5. Assuming that the response has three categories, we fit the model

$g(\gamma_{ij}) = \alpha_j + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4}$, $j = 1, 2$,

considering 10,000 replications, with covariates fixed at the observed values. Setting the true parameter values under the logit link function, we use the approximate relations between the coefficients under different link functions to obtain the corresponding true values for the probit and for the complementary log-log link.
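One common device for such approximate relations, which we assume here since the excerpt does not spell it out, is to rescale coefficients by the ratio of the standard deviations of the standard latent error distributions: $\pi/\sqrt{3}$ for the logistic, 1 for the standard normal, and $\pi/\sqrt{6}$ for the extreme value distribution.

```python
import math

# Standard deviations of the standard latent error distributions.
SD = {"logit": math.pi / math.sqrt(3.0),    # logistic
      "probit": 1.0,                        # standard normal
      "cloglog": math.pi / math.sqrt(6.0)}  # extreme value (Gumbel)

def convert_coef(beta, from_link, to_link):
    """Rescale a regression coefficient so that the standardized latent
    effect beta / SD[from_link] is preserved under the target link."""
    return beta * SD[to_link] / SD[from_link]
```

For instance, a logit coefficient is shrunk by the familiar factor of roughly 0.55 when mapped to the probit scale.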

Table 1 contains the numerical results for all link functions considered. Boundary estimates occurred using ML with percentage frequencies 2.82%, 2.75% and 2.44% for the smallest sample size, and 0.08%, 0.1% and 0.04% for the intermediate one, for the logit, probit and complementary log-log link functions, respectively. Instead, mean and median BR estimates are always finite. The new method proves to be remarkably accurate in achieving median centering and shows a lower estimated relative bias than ML, comparable with that of the mean BR estimator, as well as a good empirical coverage of the 95% Wald-type confidence intervals. The differences between the three estimators are appreciable in lower sample size settings and become much less pronounced as the sample size increases.

Link     Estimator  Par.  PU%    RB%    WALD%   PU%    RB%    WALD%   PU%    RB%    WALD%
logit    ML         β1    40.94  14.50  94.97   43.46   6.30  94.77   45.83   2.80  94.75
                    β2    55.34  14.90  94.76   54.27   6.60  94.93   52.06   2.50  94.88
                    β3    44.63  13.50  96.48   46.91   9.10  95.32   47.39   4.60  94.97
                    β4    62.99  16.50  95.19   59.19   7.00  94.92   56.22   3.20  95.36
         mean BR    β1    54.14  -0.50  95.94   51.99  -0.20  95.34   51.64  -0.30  95.23
                    β2    48.38   0.90  96.35   49.51   0.60  95.77   48.60  -0.30  95.45
                    β3    53.01  -0.30  96.96   52.64  -0.50  96.06   51.27   0.00  95.52
                    β4    45.71   0.40  94.96   47.47   0.00  95.11   47.89  -0.10  95.35
         median BR  β1    50.83   2.90  95.92   50.05   1.20  95.47   50.01   0.40  95.25
                    β2    50.12   4.20  95.89   50.67   2.10  95.64   49.62   0.40  95.34
                    β3    50.12   8.70  97.03   50.60   2.90  95.97   49.99   1.50  95.39
                    β4    50.22   4.30  95.54   50.34   1.70  95.25   50.07   0.70  95.51
probit   ML         β1    40.31  14.50  94.12   42.82   6.17  94.21   45.23   2.83  94.41
                    β2    55.40  14.67  94.26   53.65   6.33  94.62   52.44   2.67  94.61
                    β3    45.35  12.67  96.35   46.58   8.50  95.02   47.63   4.17  94.82
                    β4    63.26  15.83  94.16   59.23   6.67  94.56   56.74   3.17  95.20
         mean BR    β1    53.79  -0.83  95.56   52.18  -0.33  95.15   51.66  -0.17  94.99
                    β2    48.67   0.67  96.06   49.30   0.33  95.65   48.69  -0.17  95.06
                    β3    52.93  -1.33  96.79   52.18  -0.67  95.82   51.58  -0.33  95.45
                    β4    44.93  -0.33  94.87   46.40  -0.17  95.18   47.80   0.00  95.17
         median BR  β1    50.81   2.33  95.54   50.08   1.00  95.01   50.23   0.50  94.89
                    β2    50.46   3.50  95.71   50.23   1.50  95.49   49.37   0.33  94.99
                    β3    50.24   6.00  96.89   50.37   2.33  95.63   50.42   1.17  95.23
                    β4    49.67   3.33  95.36   49.35   1.33  95.35   49.90   0.67  95.36
cloglog  ML         β1    39.59  15.29  94.07   42.58   7.14  94.47   44.69   3.29  94.89
                    β2    55.42  13.86  94.25   53.82   5.86  94.60   52.85   2.86  94.79
                    β3    46.72  15.57  95.46   46.31  11.43  95.57   47.27   5.86  95.33
                    β4    62.53  16.00  94.23   59.16   7.14  94.87   56.04   3.29  95.11
         mean BR    β1    55.26  -1.14  95.36   53.07  -0.29  94.89   52.19  -0.29  95.04
                    β2    48.95   0.57  96.09   49.17   0.00  95.53   49.46   0.00  95.21
                    β3    54.39  -0.86  95.83   52.99  -0.43  95.86   52.02   0.14  95.73
                    β4    44.90   0.29  94.73   47.13   0.14  94.94   47.32   0.00  95.37
         median BR  β1    51.31   2.57  95.40   50.33   1.43  95.01   50.28   0.71  95.07
                    β2    50.55   3.43  95.72   50.20   1.29  95.33   50.25   0.71  95.12
                    β3    50.77  12.14  96.04   50.16   4.71  95.86   50.10   2.57  95.69
                    β4    49.95   4.14  95.29   50.73   2.00  95.17   49.52   0.86  95.50
Table 1: Estimation of the regression parameters β1, …, β4. Simulation results for the ML, mean BR and median BR estimators; the three column groups correspond to increasing sample sizes. For ML, RB% and WALD% are conditional upon finiteness of the estimates

Table 2 shows the estimated relative bias under monotone reparameterizations of the parameters related to the binary covariates, considering the ordinal probability effect measure presented in Section 2.2. In the new parameterization, it appears that the median BR estimator has the best performance in terms of estimated relative bias, if compared with ML and mean BR, which is not equivariant under this type of reparameterization.

Link                First binary covariate        Second binary covariate
         ML      mean BR  median BR      ML      mean BR  median BR
logit    1.58    -1.05    -0.42          -1.30    4.15     1.21
         0.79    -0.49    -0.18          -1.70    2.27     0.88
         0.24    -0.39    -0.22          -1.00    1.03     0.33
probit   1.99    -0.74    -0.18          -2.23   -3.43     0.80
         0.93    -0.36    -0.09          -2.09    1.73     0.48
         0.38    -0.26    -0.14          -1.10    0.80     0.21
cloglog  1.39    -1.11    -0.55          -1.18    5.18     1.30
         0.63    -0.61    -0.33          -2.11    2.59     0.54
         0.33    -0.30    -0.16          -1.36    1.12     0.06
Table 2: Estimated relative bias (RB%) for the ordinal superiority measures associated with the two binary covariates. Within each link, rows correspond to increasing sample sizes. For ML, RB% is conditional upon finiteness of the estimates

5 Application

We consider the data analysed in Randall (1989), related to a factorial experiment investigating the factors that affect the bitterness of wine. There are two factors, temperature at the time of crushing the grapes and contact between juice and skin. Each factor has two levels, "cold" and "warm" for temperature and "yes" and "no" for contact. For each of the four treatment conditions, two bottles were assessed by a panel of nine judges, giving n = 72 observations. As in Christensen (2019, Section 4.8), we consider the outcomes obtained by combining the three central categories and we fit the corresponding cumulative link model.

Table 3 shows the coefficient estimates obtained with ML, mean BR and median BR. Both mean and median BR approaches are able to solve the boundary estimates problem.

ML         -1.32 (0.53)       ()            ()        -1.31 (0.71)
mean BR    -1.25 (0.51)  5.48 (1.48)  -3.43 (1.42)    -1.19 (0.67)
median BR  -1.29 (0.52)  6.46 (2.32)  -4.48 (2.29)    -1.24 (0.68)
Table 3: Coefficient estimates and corresponding standard errors in parentheses; the empty ML entries correspond to boundary estimates

Table 4 shows the simulation results for the regression parameters, considering 10,000 replications, with covariates fixed at the observed values and a given true parameter vector. A number of the 10,000 samples yielded ML boundary estimates, whereas mean and median BR estimates are always finite. The median BR estimator is again highly accurate in achieving median centering and shows a lower estimated relative bias than ML, as well as a good empirical coverage of the 95% Wald-type confidence intervals.

           Parameter                    Parameter
           PU%    RB%    WALD%          PU%    RB%   WALD%
ML         55.08   1.80  96.92          53.20  8.20  96.50
mean BR    43.91  -0.65  95.88          48.10  0.50  96.60
median BR  49.71   8.95  96.48          50.35  4.90  96.28
Table 4: Estimation of regression parameters . Simulation results for ML, mean BR and median BR estimators. For ML, RB% and WALD% are conditional upon finiteness of the estimates

Under the monotone reparameterization of the coefficients related to the binary covariates, proposed by Agresti and Kateri (2017) and presented in Section 2.2, the median BR estimator shows a lower estimated relative mean bias than both the ML and the mean BR estimators. For ML, it should be recalled that the estimated relative bias is conditional upon finiteness of the estimates.

References

  • A. Agresti and M. Kateri (2017) Ordinal probability effect measures for group comparisons in multinomial cumulative link models. Biometrics 73, pp. 214–219.
  • A. Agresti (2010) Analysis of Ordinal Categorical Data. 2nd ed. New York: Wiley.
  • R. H. B. Christensen (2019) ordinal — Regression Models for Ordinal Data. R package version 2019.12-10. http://CRAN.R-project.org/package=ordinal.
  • D. Firth (1993) Bias reduction of maximum likelihood estimates. Biometrika 80, pp. 27–38.
  • A. S. Fullerton and J. Xu (2016) Ordered Regression Models: Parallel, Partial, and Non-Parallel Alternatives. Boca Raton, FL: CRC Press.
  • E. C. Kenne Pagui, A. Salvan, and N. Sartori (2017) Median bias reduction of maximum likelihood estimates. Biometrika 104, pp. 923–938.
  • E. C. Kenne Pagui, A. Salvan, and N. Sartori (2019) Efficient implementation of median bias reduction. arXiv preprint, https://arxiv.org/abs/2004.08630.
  • I. Kosmidis and D. Firth (2009) Bias reduction in exponential family nonlinear models. Biometrika 96, pp. 793–804.
  • I. Kosmidis and D. Firth (2010) A generic algorithm for reducing bias in parametric estimation. Electronic Journal of Statistics 4, pp. 1097–1112.
  • I. Kosmidis, E. C. Kenne Pagui, and N. Sartori (2020) Mean and median bias reduction in generalized linear models. Statistics and Computing 30, pp. 43–59.
  • I. Kosmidis (2014a) Bias in parametric estimation: reduction and useful side-effects. Wiley Interdisciplinary Reviews: Computational Statistics 6, pp. 185–196.
  • I. Kosmidis (2014b) Improved estimation in cumulative link models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76, pp. 169–194.
  • E. Lesaffre and A. Albert (1989) Partial separation in logistic discrimination. Journal of the Royal Statistical Society: Series B (Methodological) 51, pp. 109–116.
  • P. McCullagh (1980) Regression models for ordinal data. Journal of the Royal Statistical Society: Series B (Methodological) 42, pp. 109–142.
  • B. Peterson and F. E. Harrell Jr. (1990) Partial proportional odds models for ordinal response variables. Journal of the Royal Statistical Society: Series C (Applied Statistics) 39, pp. 205–217.
  • J. W. Pratt (1981) Concavity of the log likelihood. Journal of the American Statistical Association 76, pp. 103–106.
  • J. H. Randall (1989) The analysis of sensory data by generalized linear model. Biometrical Journal 31, pp. 781–793.