Efficient implementation of median bias reduction

In numerous regular statistical models, median bias reduction (Kenne Pagui et al., 2017) has proven to be a noteworthy improvement over maximum likelihood, alternative to mean bias reduction. The estimator is obtained as solution to a modified score ensuring smaller asymptotic median bias than the maximum likelihood estimator. This paper provides a simplified algebraic form of the adjustment term. With the new formula, the estimation procedure benefits from a considerable computational gain by avoiding multiple summations and thus allows an efficient implementation for general parametric models. More importantly, the new formulation allows to highlight how the median bias reduction adjustment can be obtained by adding an extra term to the mean bias reduction adjustment. Illustrations are provided through new applications of median bias reduction to extended beta regression and beta-binomial regression. Mean bias reduction is also provided here for the latter model. Simulation studies show remarkable componentwise median centering of the median bias reduced estimator, while dispersion and coverage of related confidence intervals are comparable with those of its main competitors. Moreover, for the beta-binomial model the method is successful in solving the boundary estimate problem.



There are no comments yet.


page 1

page 2

page 3

page 4


Median bias reduction in cumulative link models

This paper presents a novel estimation approach for cumulative link mode...

Estimation of Dirichlet distribution parameters with bias-reducing adjusted score functions

The Dirichlet distribution, also known as multivariate beta, is the most...

Geometry of asymptotic bias reduction of plug-in estimators with adjusted likelihood

A geometric framework to improve a plug-in estimator in terms of asympto...

Mean and median bias reduction: A concise review and application to adjacent-categories logit models

The estimation of categorical response models using bias-reducing adjust...

Accurate inference in negative binomial regression

Negative binomial regression is commonly employed to analyze overdispers...

Mean and median bias reduction in generalized linear models

This paper presents an integrated framework for estimation and inference...

Conditional bias reduction can be dangerous: a key example from sequential analysis

We present a key example from sequential analysis, which illustrates tha...

Code Repositories

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Consider estimation of a -dimensional parameter in a regular parametric model based on a sample of size . With moderate information, the maximum likelihood estimator can be highly inaccurate. Several proposals have been developed to correct the estimate or the estimating function, with the latter approach having the advantage of not requiring the finiteness of the maximum likelihood estimate. Adjusted score functions for mean bias reduction were proposed by Firth (1993) and in a subsequent paper by Kosmidis and Firth (2009). Median bias reduction was developed in Kenne Pagui et al. (2017)

, and is such that each component of the estimator has, to third-order, the same probability of underestimating and overestimating the corresponding parameter component. Both mean and median bias reduction methods consist of adding a suitable adjustment term to the score function and then solving the resulting adjusted score equation.

While mean bias reduction is tied to a specific parameterization and only equivariance under linear transformations of the parameters is guaranteed, median bias reduction delivers estimators that are exactly invariant in terms of their improved median bias properties under monotone component-wise transformations of the parameters.

The median modified score function proposed by Kenne Pagui et al. (2017) is obtained by adding an adjustment term of order to the score function. Such adjustment term was expressed in Kenne Pagui et al. (2017, Section 2.2) using index notation, which is a tool that allows to write complex formulae in a compact form, but is not necessarily optimal for their implementation. In this paper, we give a new matrix expression of the adjustment term, similar to that of Kosmidis and Firth (2010, Section 3) for mean bias reduction. The benefit is twofold: the new expression allows an more efficient implementation for general parametric models, and it also highlights a general connection between mean and median bias reduction, as noted for generalized linear models in Kosmidis et al. (2019).

In Section 3 the new formulation is applied to the double index beta regression, while Section 4 is devoted to beta-binomial regression, where the expression for mean bias reduction is also derived. In both models, all methods are assessed and compared through simulation experiments. The results confirm that the median bias reduced estimator succeeds in achieving componentwise median centring and in solving the boundary estimate problem that may arise for maximum likelihood in beta-binomial regression.

2 Median modified score

Let us denote the vector parameter by

and a generic component of by . Let (), be the elements of the score vector , where is the log likelihood function for . Let be the observed information matrix and be the Fisher information matrix. We assume that and third-order cumulants of are finite and of order , where is the sample size or, more generally, an index of information in the data about the model parameters. We denote by a generic entry of and an entry of its inverse, . Let and be matrices of expected values of log likelihood derivatives. Let, in addition, be the th column of matrix .

The modified score equation proposed by Kenne Pagui et al. (2017) has the form


where is the efficient score, , with (, ) the multiple regression coefficients of on the remaining components of , while is an adjustment term of order , involving the approximate first three cumulants of the profile score for . The expression of requires multiple summation, whose direct implementation is highly demanding in terms of computational time, especially for large .

Here we show that (1) is equivalent to


where , with . The -dimensional vector has -th component

and and are -dimensional vectors with entries

with is a matrix. The proof of equivalence of (1) and (2) is given at the end of this section. The new expression of median bias adjustment based on was already applied in Kyriakou et al. (2019) for meta regression models, but its derivation is actually given here.

We note that the adjustment term can be written as

where , with the vector having entries . The term equals the mean bias adjustment of Firth (1993) as given in Kosmidis and Firth (2010, formula (2.2)). The fadditional term is required for median bias reduction. This highlights a general connection between mean and median bias reduction, that was noted for generalized linear models in Kosmidis et al. (2019). Indeed, in such models the general expression of adjustment term given above simplifies in a compact form (Kosmidis et al., 2019, formula (9)). Details are given in the Appendix.

We let be the maximum likelihood estimator, the mean bias-reduced estimator of Firth (1993) and , the median bias-reduced estimator obtained as a solution of (2). In all cases, a quasi-Fisher scoring-type algorithm has th iteration

where can be the null vector, or , giving as a solution , or , respectively.

Proof of equivalence of (1) and (2).

Let be a generic entry of the inverse of the matrix with entries . Let and be the entries of and in (2) for fixed , respectively, where . In the following, we adopt the Einstein summation convention, i.e., summation is implied over repeated indices taking values in , with being any index in . Einstein summation convention does not affect the index . The modified score function proposed by Kenne Pagui et al. (2017) has the form


where and and are terms of order and , respectively, and represent the approximate first three cumulants of the profile score for given by

In (3), all quantities are evaluated at .

Using block matrix inversion of , we have . After some algebra, the modified score function (3) can be rewritten as


with and cumulants rewritten as

Above, indices take values in and Einstein summation convention is adopted.

Therefore, the modified score function can be represented in matrix form as

with and solving is equivalent to solve the equation with


where . With some algebra, we can write , where and are vectors with generic entries

where , with . It is straightforward to show that the -th component of is

where and are vectors with components

Putting together all the ingredients, the modified score function (5) can be written as


where , with the vector having components .

3 Double index beta regression

The double index beta regression model was introduced by Smithson and Verkuilen (2006) and Simas et al. (2010) as a generalization of beta regression (Ferrari and Cribari-Neto, 2004) in order to account for covariate effects both on the mean and on the precision parameter when modelling rates and proportions. An R implementation of mean bias reduction was developed in Grün et al. (2012), while median bias reduction for the case with constant precision was considered in Kenne Pagui et al. (2017, Example 8). Simulation results in the latter paper show that both mean and median bias reduction have considerable success in achieving the respective goals, for estimation of the precision parameter, as compared to maximum likelihood. On the other hand, all methods show similar accurate behaviour for estimation of mean parameters, with mean and median bias reduction providing nevertheless improved coverage of confidence intervals. Here we extend median bias reduction to the double index case.


be realizations of independent random variables

, each having a beta distribution with parameters

and , i.e. with expected value and precision parameter . The density of is


where , , , and is the gamma function. Double index beta regression assumes a regression structure for the expected value and for the precision , where and , with and representing row vectors of covariates. Above, and are vectors of unknown regression parameters. Additionally, the link functions and are monotonic and should have the mapping property and , respectively. Obvious choices for

are e.g. the logit and probit, while, for

, the log, the square root and the identity, with only the log satisfying the mapping property.

Unlike with generalised linear models, a compact simplified expression for the adjustment is not available for the current model. The ingredients and required for (2) are given in the Appendix. They are obtained along the lines of Grün et al. (2012, Section 2.3), who gave the quantity for mean bias reduction.

We use Monte Carlo simulation to assess the performance of the median bias reduced estimator. For this purpose, we consider a model having a regression structure on both mean and precision, with logit and log link, respectively. In particular, we let


where the are independent realizations of a standard normal and , with generated from a uniform (). The true values of the parameters are , and .

The sample sizes considered were and 60. For each , we run 10000 Monte Carlo replications, where the values of the explanatory variables and were held constant throughout the simulations. For each replication, the model was fitted using maximum likelihood, mean bias reduction and median bias reduction. We summarize the simulation results through the estimated probability of underestimation (PU), estimated bias (BIAS), root mean square error (RMSE) and estimated coverage probability of 95% Wald-type confidence intervals (WALD).

The maximum likelihood and mean bias reduced estimates were obtained from the R package betareg (Grün et al., 2012), while the median bias-reduced estimates were calculated from the R function mbrbetareg (Kenne Pagui and Sartori, 2019).

PU 45.9 50.5 52.0 31.7 46.9 54.8
54.7 49.9 47.2 57.5 49.8 44.5
51.3 50.4 49.1 52.6 50.6 47.5
BIAS 0.07 -0.01 -0.07 0.54 0.03 -0.35
-0.04 0.00 0.05 -0.08 -0.01 0.16
0.01 -0.01 0.00 0.04 -0.02 0.02
RMSE 0.51 0.22 0.94 1.15 0.52 2.07
0.50 0.21 0.93 0.95 0.45 1.87
0.50 0.21 0.92 0.94 0.45 1.85
WALD 82.9 82.4 83.3 80.5 79.8 84.0
90.3 88.8 90.7 86.7 85.0 86.9
89.5 88.5 90.0 87.5 85.1 87.3
PU 47.1 50.8 51.1 38.9 44.3 49.5
51.7 50.2 49.4 52.8 49.5 48.8
50.3 50.2 49.9 50.5 49.7 49.0
BIAS 0.03 0.00 -0.01 0.17 0.03 0.01
0.00 0.00 0.01 -0.01 0.00 0.03
0.01 0.00 0.01 0.02 0.00 0.02
RMSE 0.29 0.12 0.64 0.55 0.25 1.28
0.29 0.12 0.63 0.51 0.23 1.24
0.29 0.12 0.63 0.51 0.23 1.24
WALD 91.0 88.8 90.0 89.3 86.9 89.7
93.4 91.6 92.7 91.3 89.8 91.0
93.2 91.3 92.4 91.4 89.8 91.0
PU 47.5 50.6 50.8 41.0 51.9 50.2
50.6 48.4 50.1 51.2 47.2 50.3
49.4 49.0 50.6 49.5 48.5 50.9
BIAS 0.02 -0.01 -0.01 0.11 -0.02 -0.01
0.00 0.00 0.00 0.01 0.01 -0.01
0.01 0.00 -0.01 0.03 0.00 -0.02
RMSE 0.27 0.12 0.53 0.45 0.23 0.96
0.26 0.12 0.53 0.43 0.22 0.94
0.27 0.12 0.53 0.43 0.22 0.94
WALD 92.4 91.4 92.3 91.7 90.5 92.1
93.6 93.0 93.5 93.0 91.2 92.6
93.4 92.7 93.3 92.8 91.3 92.7
Table 1: Simulation results for double index beta regression.

The results are presented in Table 1. Both the median and the mean bias reduced estimators are superior to the maximum likelihood estimator. In particular, is effective in median centering, especially for . Even in terms of mean bias reduction, seems to show a slight improvement over . Both estimators have comparable coverages of confidence intervals.

Covariate effects on the dispersion parameter are typically expressed in the reparameterization and . Estimated bias of estimators , and () are given in Table 2. Bias reduction is not expected to be effective for the transformed estimator and, indeed, in most cases does not show the smallest bias.

0.355 0.185 0.166 0.128 0.051 0.050 0.015 0.072 0.055
68.505 91.987 75.348 25.734 24.185 23.869 11.190 10.763 10.322
Table 2: Estimated bias of estimators , and in the model (8).

4 Beta binomial regression

The beta binomial model is popular for analysing data in form of proportions. It is suitable in situations where these proportions exhibit variation greater than that predicted by a binomial model. The model is defined by assuming that the binomial mean parameter is itself beta distributed. In particular, let us assume that , with and , where (). Hence, and . Moreover, let . In the latter setting, the random variable is marginally distributed as beta binomial with mean

and variance

. The probability mass function of is


where . We consider a regression structure both for the expected value and for the precision , i.e.  and , where and , with and representing row vectors of covariates and , suitable link functions. The overall parameter is denoted by , where and are vectors of unknown regression parameters.

Here we provide mean and median bias reduction for the model (9). With constant mean and precision, a bias corrected estimator was developed by Saha and Paul (2005). Like for the beta regression, a compact expression for the adjustment term in (2) is not available here. The quantities required for its calculation are reported in the Appendix.

As an example, we consider the low-iron rat teratology dataset analysed in Liang and McCullagh (1993) and available in the R package VGAM. The goal is to study the effects of dietary regimens on fetal development in laboratory rats. Fifty-eight female rats were put on iron-deficient diets and divided into four groups. Group 1 is the untreated (low-iron) group; group 2 received injections on day 7 or day 10 only; group 3 received injections on days 0 and 7 and group 4 received injections weekly. The rats were made pregnant, sacrificed 3 weeks later, and the total number of fetuses and the number of dead fetuses in each litter were counted along with the level of mother’s hemoglobin. We assume model (9) for the response in the th litter, with


The covariates () are indicator variables for the th group, while is the level of mother’s hemoglobin.

Estimates , and of parameters of model (10) are displayed in Table 3, with (d1) corresponding to estimates computed on a subset of the data with litter size less or equal to 11, while (d2) corresponds to estimates obtained with the whole data set. The maximum likelihood estimates are obtained using the function vglm from the R package VGAM, while mean and median bias-reduced estimates were calculated from the R package brbetabinomial, available on GitHub (Kenne Pagui et al., 2019).

(d1) (d2)
 0.866 (1.130)  0.870 (1.128)  0.882 (1.141)  2.129 (0.847)  2.039 (0.853) 2.055 (0.858)
-4.144 (1.441) -3.793 (1.428) -3.890 (1.449) -2.440 (0.856) -2.369 (0.867) -2.394 (0.872)
-5.413 (2.070) -4.803 (1.998) -4.918 (2.028) -2.837 (1.354) -2.662 (1.343) -2.716 (1.354)
-6.079 (2.978) -5.402 (2.921) -5.548 ( 2.963) -2.287 (1.796) -2.207 (1.809) -2.244 (1.819)
 0.172 (0.253)  0.151 (0.251)  0.157 (0.254) -0.169 (0.173) -0.157 (0.174) -0.157 (0.175)
 0.226 (0.087)  0.268 (0.090)  0.269 (0.092)  0.236 (0.059)  0.260 (0.060)  0.261 (0.061)
Table 3:

Low-iron rat teratology data. Estimates (standard errors) of the parameters of model (

9). The label (d1) indicates results obtained from the subset of the data with litter size less or equal to 11, while (d2) corresponds to those obtained with the whole data.

To assess the properties of the estimators, we performed a simulation study with sample size and covariates as in the the two data sets (d1) and (d2) from the low-iron rat teratology data. For each of the two settings (d1) and (d2), we considered 10000 replications with parameter values equal to the maximum likelihood fit of the model (9) and covariates held fixed at the observed values. For each sample, we calculated the three estimates , and . Infinite estimates occurred using maximum likelihood with percentage frequencies 16% for (d1) and 58% for (d2), respectively, so that results for should be judged accordingly. For detecting infinite estimates the diagnostics in Lesaffre and Albert (1989) were adapted to the current model.

(d1) PU 49.4 55.8 63.4 63.5 49.3 66.8
49.4 46.2 44.7 45.3 52.3 49.1
49.2 49.5 50.1 48.1 51.0 48.1
BIAS 0.01 -0.32 0.13 -0.12 0.02 -0.04
0.01 0.01 0.17 0.12 0.00 0.01
0.01 -0.13 -0.08 -0.10 0.01 0.01
RMSE 1.26 1.67 2.34 3.34 0.28 0.10
1.16 1.50 1.99 2.99 0.26 0.09
1.19 1.58 2.08 3.09 0.27 0.09
WALD 92.4 92.8 93.3 93.3 92.6 81.6
94.8 94.8 95.4 95.4 95.0 90.0
94.8 94.9 95.6 95.4 94.8 91.2
(d2) PU 47.5 52.5 55.2 53.4 51.5 64.3
50.7 48.5 47.2 49.3 49.9 50.5
50.0 49.7 50.0 50.6 49.9 49.7
BIAS 0.06 -0.09 0.04 -0.10 -0.01 -0.02
-0.01 0.00 0.03 -0.01 0.00 0.00
0.01 -0.03 -0.12 -0.06 0.00 0.00
RMSE 0.89 0.92 1.31 1.89 0.18 0.06
0.85 0.87 1.35 1.83 0.17 0.06
0.86 0.89 1.48 1.87 0.18 0.06
WALD 93.8 93.7 95.2 94.0 93.7 88.6
95.1 94.8 95.8 95.1 95.2 92.3
95.1 94.7 96.0 95.0 95.1 93.2
Table 4: Low-iron rat teratology data. Simulation of size of the regression coefficient estimates with constant dispersion parameter under the maximum likelihood fit. The label (d1) indicates results obtained from the subset of the data with litter size less or equal to 11 while (d2) corresponds to those obtained with the whole data. For maximum likelihood, the bias, root mean squared error and coverage are conditional upon finiteness of the estimates.

The results are presented in Table 4. The estimated bias, root mean squared error and coverage probability of confidence intervals for are conditional upon its finiteness. Although this favours , both and are uniformly better. Especially for small samples (d1), the median bias reduced estimator is superior in achieving median centering, and also in terms of mean bias for some coefficients. In the larger sample sizes setting (d2), both mean and median bias reduced estimators achieve the desired goals and are highly preferable to maximum likelihood.


for generalized linear models

Let be realizations of independent random variables , each with probability density or mass function of the exponential dispersion family form

for some sufficiently smooth functions , , and , and fixed observation weights . The expected value of is and its variance is . Let be a

model matrix where each column corresponds to a predictor variable. The model matrix

is linked to through the relation with the linear predictor, where is the element of , the regression coefficients and a monotone link function. For some models, the dispersion parameter, may also be estimated along with . We show below that the adjustment term (2) gives, as special case, the closed form expression obtained in Kosmidis et al. (2019, Section 2.4) for median bias reduction in generalized linear models.

Let and be the and blocks of . Let denote a diagonal matrix having as its main diagonal. Let, in addition, be a -vector of zeros, a -vector of ones and a matrix of zeros. The ingredients needed to calculate the adjustment term in (2) are as follows

where , with , , with , and , , , with and , , , , , , is the ‘hat’ value for the th observation, obtained as the th diagonal element of the matrix , is the th diagonal element of , with . Using all the above ingredients in (2), the adjustments for and simplify to


respectively, where and with