DeepAI
Log In Sign Up

Robustness against outliers in ordinal response model via divergence approach

09/24/2022
by   Tomotaka Momozaki, et al.
0

This study deals with the problem of outliers in ordinal response model, which is a regression on ordered categorical data as the response variable. “Outlier” means that the combination of ordered categorical data and its covariates is heterogeneous compared to other pairs. Although the ordinal response model is important for data analysis in various fields such as medicine and social sciences, it is known that the maximum likelihood method with probit, logit, log-log and complementary log-log link functions, which are often used, is strongly affected by outliers, and statistical analysts are forced to limit their analysis when there may be outliers in the data. To solve this problem, this paper provides inference methods with two robust divergences (the density-power and γ-divergences). We also derive influence functions for the proposed methods and show conditions on the link function for them to be bounded and to redescendence. Since the commonly used link functions satisfy these conditions, the analyst can perform robust and flexible analysis with our methods. In addition, and this is a result that further highlights our contributions, we show that the influence function in the maximum likelihood method does not have redescendence for any link function in the ordinal response model. Through numerical experiments on artificial data, we show that the proposed methods perform better than the maximum likelihood method with and without outliers in the data for various link functions.

READ FULL TEXT VIEW PDF

page 1

page 2

page 3

page 4

11/16/2017

Categorical data analysis using a skewed Weibull regression model

In this paper, we present a Weibull link (skewed) model for categorical ...
04/23/2018

A constrained regression model for an ordinal response with ordinal predictors

A regression model is proposed for the analysis of an ordinal response v...
04/04/2022

A Modification of McFadden's R^2 for Binary and Ordinal Response Models

A lot of studies on the summary measures of predictive strength of categ...
05/27/2019

Deep ordinal classification based on cumulative link models

This paper proposes a deep convolutional neural network model for ordina...
08/30/2022

Robust and Efficient Estimation in Ordinal Response Models using the Density Power Divergence

In real life, we frequently come across data sets that involve some inde...
02/16/2015

On the Predictive Properties of Binary Link Functions

This paper provides a theoretical and computational justification of the...

1 Introduction

Ordered categorical data has become popular in a wide range of fields such as medicine, sociology, psychology, political sciences, economics, marketing, and so on (Breiger, 1981; Ashby et al., 1989; Uebersax, 1999). Ordered categorical data are, for example, the progression of a disease expressed as stage 1, 2, 3, or 4, or opinions on a policy expressed as opposition, neutrality, or approval. In addition, when continuous data are summarized into categorical data, such as ages 0-20, 21-40, 41-60, 61-80, and 80 or more, the categorical data are ordered categorical data. For this reason, ordered categorical data are often considered to be discretized values of latent continuous variables.

There have been many studies on how to analyze ordered categorical data from half a century ago to the present (McCullagh, 1980; Albert and Chib, 1993; Tomizawa et al., 2006; Agresti, 2010; Agresti and Kateri, 2017; Baetschmann et al., 2020

). This paper focuses on one of the most commonly used methods, the ordinal response model, that is a regression on ordered categorical data as a response variable. The ordinal response model is one of the frameworks of generalized linear models, and is called the ordinal regression model, or the cumulative link model since it connects the cumulative probability of belonging to a certain category to the covariates with the “link” function (

McCullagh, 1980

). The link functions most often used in data analysis are the probit link, which is the distribution function of the standard normal distribution, the logit link, which is the distribution function of the standard logistic distribution, the log-log link, which is the distribution function of the right-skewed Gumbel distribution, and the complementary log-log link, which is the distribution function of the left-skewed log-Weibull distribution. The formulation of the ordinal response model and its details are described in Section

2.

Outliers can be caused by various reasons, such as typos in the values and misrecognition of units. Since it is well known that the maximum likelihood method is strongly affected by outliers, another statistical method that can appropriately deal with outliers is required nowadays. There have been many studies on how to deal with outliers, for example, the well-known inferences based on Huber-type loss and robust divergences (Hampel et al., 1986; Basu et al., 1998; Jones et al., 2001; Fujisawa and Eguchi, 2008; Huber and Ronchetti, 2009; Ghosh and Basu, 2013; Maronna et al., 2019; Castilla et al., 2021), but most of them have focused on continuous, binary, or counted data, with few focusing on ordered categorical data.

Of course, the ordinal response model is not an exception to the model affected by outliers, although the response variable is discrete type data. The maximum likelihood method in the ordinal response model with the commonly used link functions, such as above mentioned is strongly affected by outliers. Although outliers in discrete data may be harder to imagine than in continuous data, an ”outlier” in the ordered response model is defined as a combination of ordered categorical data and its covariates that is heterogeneous relative to other pairs (Riani et al., 2011). This may be taken to mean that the combination of ordered categorical data and its covariates is inconsistent. Unless otherwise noted, we use the same definition of outliers in this paper.

There are various methods to check whether an inference method is robust against outliers, and often an influence function (Hampel, 1974

) may be used in linear regression. Simply put, the influence function is an index to check the “influence” of data on an inference method, and its value must be bounded at least to be robust against outliers. This is because if the value diverges for a given data, the result of the inference depends almost entirely on the data without regard to other data.

As mentioned earlier, there are few studies on outliers in the ordinal response model, but there are studies that derive the conditions for the inference method and link function such that the influence function is bounded in the ordinal response model as well. Croux et al. (2013) and Iannario et al. (2017)

proposed the weighted maximum likelihood methods using the Student, 0/1, and Huber weights so that the influence function in the ordinal response model is bounded. However, these papers do not specify the algorithms to perform their inference methods, which makes it difficult for practitioners who wish to implement the robust ordinal response model. We also attempted to implement their inference methods, but found it extremely difficult to converge the numerical optimization algorithm to compute their estimators.

Scalera et al. (2021) derived a class of link functions for bounded influence functions in the maximum likelihood method in the ordinal response model. However, the class does not include probit, logit, log-log and complementary log-log link functions which are commonly used in the analysis. That is, analysts cannot perform robust and flexible modeling for ordered categorical data. We have also confirmed that misspecification of the link function causes a substantial bias in the parameter estimation, although this is not discussed in depth here. This is also the case in the framework of binary regression, which is discussed in Czado and Santner (1992), and confirms that the flexibility in the choice of the link function is very important.

The boundedness of the influence function is an important property to ensure that the result of inference is not too much sensitive by only certain data, but it does not mean that data that are largely outliers compared to other data do not influence the result of inference at all. In other words, even if the influence function is bounded, there is still some influence from outliers. Therefore, it is desirable for the influence function to satisfy not only boundedness but also redescendence (Maronna et al., 2019), which means that the influence of large outliers on inference can be ignored. In this paper, we consider a class of link functions for the influence function to satisfy redescendence in the ordinal response model in the maximum likelihood method, and show that such a class does not exist (Theorem 3 in Section 4).

Since the influence function for the maximum likelihood method in the ordinal response model does not satisfy redescendence, it is necessary to consider another inference method. The weighted maximum likelihood methods of Croux et al. (2013) and Iannario et al. (2017) do not satisfy redescendence because the weights used are the Student, 0/1, and Huber types (Maronna et al., 2019), although the influence function is bounded. Thus, we propose inference methods in the ordinal response model with two robust divergences, the density-power (DP) (Basu et al., 1998) and -divergences (Jones et al., 2001; Fujisawa and Eguchi, 2008), which are expected to satisfy the redescendence of their influence functions.

Although recently, Pyne et al. (2022) proposed an inference method in the ordinal response model using the DP divergence, they did not discuss the condition for link functions to satisfy boundedness and redescendence of the influence function in their method. Our contribution is not only to derive the condition for link functions satisfying boundedness and redescendence of the influence function in the DP divergence (Theorem 4 in Section 4), but also to propose an inference method in the ordinal response model using the -divergence, which is also commonly used for robust inference, and derive the condition of link functions to achieve robust inference (Theorem 5 in Section 4), and compare the performance between the DP and -divergences. The derivation of this condition provides a guideline for practitioners to decide which of the various link functions to use when conducting robust analysis against outliers using the ordinal response model. It will also help to answer the question of whether the DP or -divergences should be used.

The paper is organized as follows Section 2 introduces the ordinal response model and its maximum likelihood method. Section 3 introduces two robust divergences, the DP and -divergences, and proposes estimators based on these divergences. We also briefly show that the proposed estimators are robust against outliers. Section 4 derives the influence functions of the proposed methods and considers the robustness of the proposed methods in terms of the influence functions. Section 5 considers the asymptotic properties of the proposed estimators and the testing procedures. Section 6 conducts some numerical experiments using artificial data for the proposed methods and evaluates the performance of the proposed methods. Section 7 provides the conclusion and some remarks.

2 Ordinal response model and its maximum likelihood method

In this section, we discuss the ordinal response model and the maximum likelihood method, a parameter estimation method commonly used in the model.

Consider the following latent variable model for ordered categorical data () as the response variable.

(1)

for , where is called the (continuous) latent variable,

, , (known) and has the density function , and are the cutpoints. For the identification, in the there is no error scale and the latent model has no the intercept term. Note that the observed data is , where and , and the latent variable is unobserved. The parameters to be estimated in this model are , where .

The probability mass function of given with is

where the indicator function and

Namely, the likelihood function for the ordinal response model is expressed as

(2)

The maximum likelihood estimator (MLE) of are obtained as that minimizes the negative log-likelihood function. That is, the MLE is

It is well known that the MLE is strongly affected by the outliers in the observations (). This is because if is large enough, i.e., if is outliers compared to the other data, then takes a large value and the entire objective function is strongly affected by the outlier .

3 Estimators via robust divergence for the ordinal response model

This section introduces two robust divergences, the DP and -divergences, proposes robust estimators for the ordinal response model using these divergences, and describes their properties against outliers. Section 3.1 describes the case using the DP divergence, and Section 3.2 describes the case using the -divergence.

3.1 Case: Density-Power divergence

Suppose that , , and are the underlying probability density (or mass) functions of , given , and , respectively. Let be the probability density (or mass) function of given with parameter . This paper uses the definition of the DP cross entropy

(3)

for . Using the DP cross entropy, the DP divergence is defined as

About the properties of the DP divergence, see Basu et al. (1998).

Using the DP divergence, the target parameter can be considered by

and when , .

Suppose that the observed data are randomly drawn from the underlying distribution . Then, from the formulation (3), the DP cross entropy is empirically estimated by

Namely, the DP estimator is defined by

Based on the above, we propose the following DP-estimator for the ordinal response model.

(4)

where the empirically estimated DP cross entropy

(5)

We briefly confirm that the proposed DP-estimator is robust against outliers. Suppose that is large enough, i.e., is outliers compared to the other data. Then, the empirically estimated DP cross entropy (5) is expressed as

since takes a small value. Namely, the objective function in (4) is expressed by removing the outliers , and as a result, the effect of the outliers can be ignored in the parameter estimation.

3.2 Case: -divergence

Suppose that , , and are the underlying probability density (or mass) functions of , given , and , respectively. Let be the probability density (or mass) function of given with parameter . This paper uses the definition of the -cross entropy of Kawashima and Fujisawa (2017) as follows.

(6)

for . Using the -cross entropy, the -divergence is defined as follows.

About the properties of the -divergence, see Fujisawa and Eguchi (2008) and Kawashima and Fujisawa (2017).

Using the -divergence, the target parameter can be considered by

and when , .

Suppose that the observed data are randomly drawn from the underlying distribution . Then, from the formulation (6), the -cross entropy is empirically estimated by

Namely, the -estimator is defined by

(7)

Based on the above, we propose the following -estimator for the ordinal response model.

(8)

where the empirically estimated -cross entropy

(9)

We briefly confirm that the proposed -estimator is robust against outliers. Suppose that is large enough, i.e., is outliers compared to the other data. Then, the empirically estimated -cross entropy (9) is expressed as

since takes a small value. Namely, the objective function in (8) is expressed by removing the outliers , and as a result, the effect of the outliers can be ignored in the parameter estimation.

4 Influence functions for the ordinal response model

The influence function is classically often used as a measure of robustness of an estimator, and it is required that the influence function is bounded for the observations (see, Hampel, 1974). This is because the influence function represents the effect on the estimator when there are a few outliers in the observed values. Furthermore, it preferably has the redescendence, that is the property that large outliers have little effect on the parameter estimation (Maronna et al., 2019).

This section derives the respective influence functions in the ordinal response model using the DP and -divergences, and discusses the robustness of each proposed estimation method against outliers in terms of the influence functions. The influence functions are plotted and compared. Before that, we describe notations for the discussion and discuss the influence function in the maximum likelihood method.

Suppose that are generated from the true distribution , and consider the contaminated model , where is the contamination ratio and is the contaminating distribution degenerated at . The influence function in the ordinal response model using the maximum likelihood method is expressed as follows.

where

and is the fisher information matrix. Note that since is the expected value for the distribution , is constant and bounded with respect to . That is, for the boundedness of the influence function , the function must be bounded.

The influence function of the parameter () in the maximum likelihood method for the ordinal response model is expressed as follows.

(10)

The influence function of the parameter () in the maximum likelihood method for the ordinal response model is expressed as follows.

(11)

In order for these influence functions (10) and (11) to be bounded in the maximum likelihood method, the conditions for the distribution of in the latent variable model (1) were derived by Scalera et al. (2021).

Theorem 1 (Proposition 3.2 of Scalera et al., 2021).

The influence function for (10) is bounded if and only if

(12)

where the is the density function of .

Theorem 2 (Proposition 3.1 of Scalera et al., 2021).

The influence function for (11) is bounded if and only if

(13)

where the is the density function of .

The distributions that satisfy the conditions (12) and (13) include the Student’s

-distribution where the degree of freedom

is sufficiently small (Scalera et al., 2021). When the distribution is the Student’s -distribution, since the influence function of is for large outliers, the influence function of is bounded for , however the parameter estimation is always affected by outliers constantly. Of course, the same is true for .

Although Scalera et al. (2021) mention the boundedness of the influence functions in the maximum likelihood method, they do not mention redescendence. Hence, we further give the following theorem on the redescendence of the influence functions in the maximum likelihood method.

Theorem 3.

The influence functions for (10) and (11) in the maximum likelihood method for the ordinal response model are not redescendence.

The proof is given in Appendix A.1.

Remark (Redescendence of the influence function of the maximum likelihood method in the ordinal response model).

There are various studies on heavy-tailed distributions, and recently the log-regularly varying function (Desgagné, 2015) whose order of tail is same as , such as the log-Pareto truncated normal distribution proposed by Gagnon et al. (2020) and the extremely heavily-tailed distribution proposed by Hamura et al. (2022) is well known. However, even if these distributions are applied to , the influence function of for the maximum likelihood method in the ordinal response model do not have redescendence. It is also known that is of constant order for regularly varying and log-regularly varying functions (O’Hagan and Pericchi, 2012; Desgagné, 2015). Therefore, when Student’s -distribution or the heavy tailed distributions described above is applied to the distribution , when and , or and , the influence function of is zero for large outliers.

4.1 Case: Density-Power divergence

In the similar manner as in the maximum likelihood method, the influence function using the DP divergence is expressed as follows.

where

and is the conditional expectation for the conditional distribution . Thus, the influence functions of the parameters () and () for the ordinal response model with the DP divergence are expressed by

(14)

and

(15)

respectively.

Here, the following theorem holds for the influence functions (14) and (15) using the DP divergence.

Theorem 4.

If there exists a certain such that

(16)

then the influence functions (14) and (15) are bounded and redescendent.

The proof is given in Appendix A.2.

Theorem 4 ensures that the proposed method using the DP divergence is robust against outliers in terms of the influence function. That is, the estimator of the proposed method is hardly affected by the large outliers.

4.2 Case: -divergence

In the similar manner as in the maximum likelihood method, the influence function using the -divergence is expressed as follows.

where

Thus, the influence functions of the parameters () and () for the ordinal response model with the -divergence are expressed by

(17)

and

(18)

respectively.

Here, the following theorem holds for the influence functions (17) and (18) using the -divergence.

Theorem 5.

If there exists a certain such that

(19)

then the influence functions (17) and (18) are bounded and redescendent.

The proof of Theorem 5 is omitted since it can be done in the same way as the proof of Theorem 4.

Theorem 5 ensures that the proposed method using the -divergence is robust against outliers in terms of the influence function. That is, the estimator of the proposed method is hardly affected by the large outliers.

4.3 Numerical experience of influence functions

Sections 4.1 and 4.2 prove that the influence functions for the ordinal response model using the DP and -divergences are bounded and redescendent under the conditions (16) and (19). In this section, we plot and compare these influence functions. Note that when and , the methods with the DP and -divergences are the equivalent to the maximum likelihood method, respectively.

The following settings are considered in plotting the influence functions. The ordinal response generated by (1), where (i.e., using one covariate) and , has 4 categories. For the distribution of the error term in (1), the standard normal distribution and the standard logistic distribution are used.

Figure 1 plots the influence function (vertical axis) for each parameter against the covariate (horizontal axis) with the maximum likelihood (black), the DP divergence (blue), and the -divergence (red). The first and second rows of Figure 1 show the standard normal and standard logistic distributions for , respectively. The first and second columns are the influence function of and , respectively. The line type differs according to the tuning parameter in the divergences, “dash” for , “dot” for , “dot-dash” when , and “long-dash” when . In plotting the influence functions, we set .

Since conditions (16) and (19) are satisfied when distribution is the standard normal distribution or the standard logistic distribution, as can be seen from Figure 1, the influence functions with the DP and -divergences are both bounded and redescendence. On the other hand, since the standard normal distribution does not satisfy the conditions (12) and (13), the influence function in the maximum likelihood method is not bounded, as can be seen from Figure 1, and since the standard logistic distribution satisfies only the condition (13), the influence function of is not bounded and that of is bounded.

In addition, since the order of (or ) is smaller when the distribution is lighter in conditions (16) and (19), from the proof of Theorems 4 and 5 (see Appendix A.2), the influence function converges to zero faster when is lighter. Thus, since the standard normal distribution is lighter than the standard logistic distribution, the first row of Figure 1 converges to zero faster than the second row.

For the influence functions with the DP and -divergences, when the tuning parameters and are the same value, the influence function with the -divergence takes smaller values when is around , but when , the influence functions with the DP and -divergences take the same value.

Figure 1: Plot of influence function for each parameter with the maximum likelihood (black), the DP divergence (blue) and the -divergence (red): the horizontal is the value of covariates, the vertical is the value of influence function: the first row for the distribution with standard normal distribution, the second row for standard logistic distribution: the first column is for the coefficient parameter , the second column is for the cutoff parameter

5 Asymptotic properties for proposed robust estimators and their test procedures

This section presents the asymptotic distributions of the proposed DP and -estimators and proposes the test procedures using them.

5.1 Case: Density-Power divergence

Under general regularity conditions (Huber and Ronchetti, 2009), the proposed DP-estimator (4) is asymptotically a multivariate normal distribution, i.e.,

(20)

where the asymptotic covariance matrix is

with

and

Since the estimator of the asymptotic covariance matrix is

where

and

and is consistent for , then with the equation (20), is asymptotically a multivariate normal distribution .

Hence under the null hypothesis

for ,

where is the -th element on the diagonal of