## 1 Introduction

Estimation of the ratio of two density functions from independent samples drawn from the two distributions is an important problem in a variety of fields. In many contexts, it may be known that this ratio is non-decreasing. In this article, we derive and study the nonparametric maximum likelihood estimators of the individual distribution functions and of the ratio of their densities under the assumption that the density ratio is non-decreasing.

Comparing the distributions of two independent samples is a fundamental problem in statistics. Suppose that $X_1, \dots, X_{n_0}$ and $Y_1, \dots, Y_{n_1}$ are independent real-valued samples with distribution functions $F_0$ and $F_1$, respectively. In many situations, we might hypothesize that $F_0$ and $F_1$ are *stochastically ordered*, meaning intuitively that samples from $F_1$ tend to be larger than those from $F_0$. A particular type of stochastic order that arises in many applications is the *likelihood ratio order*. We say that $F_0$ and $F_1$ satisfy a likelihood ratio order if the density ratio $\theta_0 := f_1 / f_0$ is monotone non-decreasing over the support of $F_0$, where $f_0 := dF_0 / d\lambda$ and $f_1 := dF_1 / d\lambda$ for some dominating measure $\lambda$. For this reason, the likelihood ratio order is also called a *density ratio order*.

A likelihood ratio order can arise for a variety of scientific reasons. For example, Dykstra et al. (1995) and Yu et al. (2017) considered its application to biomedical problems, while Beare and Moon (2015) and Roosen and Hennessy (2004) discussed numerous examples of its use in economics, business, and finance. Statistically, the likelihood ratio order assumption is a useful nonparametric generalization of many parametric and semiparametric models. For instance, monotone transformations of location families of log-concave densities satisfy a likelihood ratio order: if $F_0 = G(h(\cdot) - \mu_0)$ and $F_1 = G(h(\cdot) - \mu_1)$, where $\mu_0 \le \mu_1$, $G$ is the distribution function corresponding to a log-concave density $g$, and $h$ is non-decreasing, then $\theta_0 = g(h(\cdot) - \mu_1) / g(h(\cdot) - \mu_0)$ is non-decreasing. As another example, if $F_1$ is any monotone exponential tilt of $F_0$, meaning $f_1 \propto \exp(\eta)\, f_0$ for a monotone non-decreasing function $\eta$, then $\theta_0 \propto \exp(\eta)$ is clearly also monotone.

In this article, we address nonparametric estimation under the likelihood ratio order. We are especially interested in estimation and inference for the density ratio function $\theta_0$. In the biomedical sciences and elsewhere, the ratio of two density functions is an object of interest for describing the relative likelihood of a binary status indicator conditional on a covariate. If $A$ is a binary random variable, $Z$ is a scalar random variable, $F_0$ is the conditional distribution function of $Z$ given $A = 0$, $F_1$ is the conditional distribution function of $Z$ given $A = 1$, and $f_0$ and $f_1$ are the corresponding conditional densities, then the density ratio equals

$$\theta_0(z) = \frac{f_1(z)}{f_0(z)} = \frac{P(A = 1 \mid Z = z)\,/\,P(A = 0 \mid Z = z)}{P(A = 1)\,/\,P(A = 0)}. \tag{1}$$

Therefore, the density ratio in this context may be interpreted as the relative odds of $A = 1$ given $Z = z$ to the overall odds of $A = 1$. Since the transformation $p \mapsto p / (1 - p)$ is strictly increasing, monotonicity of the density ratio is equivalent to monotonicity of the conditional probability $P(A = 1 \mid Z = z)$ in $z$.

One specific situation in which the representation given in (1) is of scientific interest is biomarker evaluation. Over the past few decades, there has been a rapid increase in the development of assays to measure the concentration of various biochemicals in human sera, with the goal of predicting clinical disease status. In these contexts, $A$ represents disease status and $Z$ represents the value of a biomarker. Equation (1) implies that the ratio of the densities of biomarker values among infected patients to the same among uninfected patients can be interpreted as the odds ratio of infection given biomarker level relative to overall odds of infection. Monotonicity of the density ratio corresponds to the assumption that the conditional probability of infection given biomarker level increases with biomarker level, which is a reasonable assumption if the biomarker is actually predictive of disease.
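Identity (1) follows directly from Bayes' rule. As a numerical sanity check, the short sketch below (the joint distribution and all values are hypothetical, chosen only for illustration) compares the ratio of the conditional mass functions of a discrete covariate with the odds representation:

```python
# Hypothetical joint distribution of (Z, A); all numbers are illustrative.
p_z = {0: 0.5, 1: 0.3, 2: 0.2}            # marginal mass function of Z
mu = {0: 0.2, 1: 0.5, 2: 0.8}             # mu(z) = P(A = 1 | Z = z), non-decreasing
pi = sum(mu[z] * p_z[z] for z in p_z)     # P(A = 1)

for z in p_z:
    f1 = mu[z] * p_z[z] / pi              # mass of Z given A = 1, by Bayes' rule
    f0 = (1 - mu[z]) * p_z[z] / (1 - pi)  # mass of Z given A = 0
    ratio = f1 / f0                       # density (mass) ratio at z
    odds_rep = (mu[z] / (1 - mu[z])) / (pi / (1 - pi))
    assert abs(ratio - odds_rep) < 1e-12  # the two sides of (1) agree
```

Because the odds transformation is strictly increasing, the non-decreasing $\mu$ above automatically yields a non-decreasing ratio, illustrating the equivalence noted in the text.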

A second example of situations in which a density ratio may be of scientific interest is experiments with continuous exposures and binary outcomes. Suppose now that $Z$ represents a continuous exposure, and $Y_z$ represents the potential outcome under assignment to exposure level $z$. The causal odds of the exposure on the potential outcome is then defined as $P(Y_z = 1) / P(Y_z = 0)$. If $Y_z$ is independent of the observed exposure $Z$, as is true in randomized experiments, and additional causal conditions hold, then the causal odds equals $P(Y = 1 \mid Z = z) / P(Y = 0 \mid Z = z)$, where $Y$ is the observed outcome. If $f_1$ is the density of $Z$ in units with $Y = 1$ and $f_0$ is the density of $Z$ in units with $Y = 0$, then by (1) this further equals $\{f_1(z) / f_0(z)\}\, P(Y = 1) / P(Y = 0)$. In some contexts, it may be known that $P(Y_z = 1)$ is monotone in the exposure $z$, in which case the density ratio $f_1 / f_0$ is as well.

In this article, we derive the nonparametric maximum likelihood estimators of $F_0$, $F_1$, and $\theta_0$ under the likelihood ratio order restriction and derive certain asymptotic properties of these estimators, including consistency and convergence in distribution. In particular, we connect estimation of $\theta_0$ to the classical isotonic regression problem with a binary outcome, which both simplifies the derivation of large-sample results and suggests that existing inference methods for the isotonic regression problem can be used to perform inference for $\theta_0$ as well. Our results generalize those of Dykstra et al. (1995), who derived the maximum likelihood estimator of $F_0$ and $F_1$ under a likelihood ratio order in the special case where $F_0$ and $F_1$ are discrete distributions. We illustrate our results using numerical experiments and an analysis of a biomarker for predicting bacterial infection in children with systemic inflammatory response syndrome.

Recently, Yu et al. (2017) considered estimation of a monotone density ratio function by maximizing a smoothed likelihood function, and demonstrated certain asymptotic properties of their estimator. Yu et al. (2017) considered maximizing a smoothed likelihood rather than maximizing the likelihood directly because they claimed a maximum likelihood estimator does not exist. However, we show that, using a definition of the likelihood ratio ordered model based on convexity of the ordinal dominance curve, a well-defined nonparametric maximum likelihood estimator does exist. Furthermore, unlike the smoothed estimator, the derivation of the maximum likelihood estimator does not rely on the existence of Lebesgue density functions, and works equally well when $F_0$ and $F_1$ are discrete or have discrete components.

As is common in the monotonicity-constrained literature, there are certain tradeoffs between maximizing the smoothed and non-smoothed likelihood functions. In particular, the non-smoothed estimator converges pointwise at the $n^{1/3}$ rate, while the smoothed estimator converges at the faster $n^{2/5}$ rate, albeit under stronger smoothness assumptions. While Yu et al. (2017) did not propose a method for conducting inference, smoothed estimators typically possess an asymptotic bias that complicates the task of performing valid inference. In contrast, we demonstrate that the non-smoothed estimator converges pointwise to a mean-zero limit distribution, which we use to construct asymptotically valid confidence intervals. An additional benefit of the non-smoothed estimator is that it does not depend on a bandwidth or any other tuning parameter. Finally, while the smoothed estimator relies on absolute continuity of $F_0$ and $F_1$ with respect to Lebesgue measure, we demonstrate that the maximum likelihood estimator does not, and indeed performs well even for distributions with mixed continuous and discrete parts.

Additional relevant references include: Lehmann and Rojo (1992) and Shaked and Shanthikumar (2007), which contain more examples and details regarding stochastic orders; Carolan and Tebbs (2005) and Beare and Moon (2015), which studied tests of the likelihood ratio order; and Rojo and Samaniego (1991), Rojo and Samaniego (1993), Mukerjee (1996), Arcones and Samaniego (2000), Davidov and Herman (2012), and Tang et al. (2017), which considered testing and estimation under other stochastic orders.

## 2 Likelihood ratio orders

We observe two independent real-valued samples $X_1, \dots, X_{n_0}$ and $Y_1, \dots, Y_{n_1}$, with distribution functions $F_0$ and $F_1$, respectively. We define $\mathcal{S}_0$ as the support of $F_0$ and $\mathcal{S}_1$ as the support of $F_1$. We denote $n := n_0 + n_1$, and by $F_{0,n}$ and $F_{1,n}$ the empirical distribution functions of $X_1, \dots, X_{n_0}$ and $Y_1, \dots, Y_{n_1}$, respectively. We define $u_1 < \cdots < u_{m_0}$ as the unique values of $X_1, \dots, X_{n_0}$, $v_1 < \cdots < v_{m_1}$ as the unique values of $Y_1, \dots, Y_{n_1}$, and $z_1 < \cdots < z_m$ as the unique values of the combined sample. Throughout, we assume that it is not the case that $F_{0,n} = F_{1,n}$ — i.e. we assume that $F_{0,n}(z) \neq F_{1,n}(z)$ for some $z$. We also assume that $n_1 / n \to \rho \in (0, 1)$ as $n \to \infty$.

We let $\mathcal{F}$ be the space of distribution functions on $\mathbb{R}$; i.e. all non-decreasing, càdlàg functions $F$ such that $\lim_{z \to -\infty} F(z) = 0$ and $\lim_{z \to \infty} F(z) = 1$. For any non-decreasing function $F$, we define its *generalized inverse* pointwise as $F^{-1}(t) := \inf\{z \in \mathbb{R} : F(z) \ge t\}$. When $F \in \mathcal{F}$, $F^{-1}$ is called the *quantile function* of $F$. For any interval $I \subseteq \bar{\mathbb{R}}$, for $\bar{\mathbb{R}}$ the extended real line, and any function $h : I \to \mathbb{R}$, we define the *greatest convex minorant* (GCM) of $h$ on $I$, denoted $\mathrm{GCM}_I(h)$, as the pointwise supremum of all convex functions on $I$ bounded above by $h$. The least concave majorant (LCM) operator is defined analogously. We say a function $h$ is *convex over a set* $S \subseteq \mathbb{R}$ if $h(\alpha z_1 + (1 - \alpha) z_2) \le \alpha h(z_1) + (1 - \alpha) h(z_2)$ for every $z_1, z_2 \in S$ and $\alpha \in [0, 1]$ such that $\alpha z_1 + (1 - \alpha) z_2 \in S$. We also define $\partial_-$ as the left derivative operator for a left-differentiable function.
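Computationally, the GCM of a finite collection of points is the lower boundary of their convex hull, and its left derivative is piecewise constant between vertices. A minimal sketch (the function names are ours, purely illustrative):

```python
def gcm_vertices(points):
    """Vertices of the GCM (lower convex hull) of a finite point set."""
    hull = []
    for x2, y2 in sorted(points):
        while len(hull) >= 2:
            (x0, y0), (x1, y1) = hull[-2], hull[-1]
            # Drop the middle vertex whenever it lies on or above the chord.
            if (y1 - y0) * (x2 - x1) >= (y2 - y1) * (x1 - x0):
                hull.pop()
            else:
                break
        hull.append((x2, y2))
    return hull

def left_slope(hull, x):
    """Left derivative of the piecewise-linear GCM at x (slope just below x)."""
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 < x <= x1:
            return (y1 - y0) / (x1 - x0)
    raise ValueError("x outside the abscissa range of the GCM")
```

For example, the points $(0,0), (1,2), (2,3)$ are not in convex position, so the GCM keeps only the endpoints and its left slope at $x = 1$ is $3/2$.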

The unrestricted nonparametric model for the pair of distribution functions of the observed data is $\mathcal{M} := \mathcal{F} \times \mathcal{F}$. As mentioned in the introduction, the likelihood ratio order can be defined as the ratio $f_1 / f_0$ of the density functions $f_0$ and $f_1$ of $F_0$ and $F_1$, with respect to some dominating measure $\lambda$, being non-decreasing. By varying the dominating measure $\lambda$, both discrete and continuous distributions can be handled this way. However, as noted by Yu et al. (2017), this definition does not lend itself to the derivation of a maximum likelihood estimator, since the likelihood defined through the densities can be made arbitrarily large. Instead, other authors have defined the likelihood ratio order as convexity of the *ordinal dominance curve*, defined as $R_0(t) := F_1(F_0^{-1}(t))$ for $t \in [0, 1]$ (Bamber, 1975; Hsieh and Turnbull, 1996). Lehmann and Rojo (1992) demonstrated the equivalence of this definition to that using the density functions in the special case that $F_0$ and $F_1$ are strictly increasing and continuous on their supports, which were assumed to be intervals. In Theorem 1 below, we generalize this result.

###### Theorem 1.

If $F_1 \ll F_0$ and $\theta_0 := dF_1 / dF_0$ is continuous on the support $\mathcal{S}_0$ of $F_0$, then (1) $R_0 = F_1 \circ F_0^{-1}$ is convex on $[0, 1]$ if and only if $\theta_0$ is non-decreasing on $\mathcal{S}_0$, and (2) if $\theta_0$ is non-decreasing on $\mathcal{S}_0$, then $\theta_0(z) = \partial_- R_0(F_0(z))$ for all $z \in \mathcal{S}_0$.

To our knowledge, Theorem 1 is the most general result to date connecting the likelihood ratio ordered model, defined via monotonicity of the density ratio function, to convexity of the ordinal dominance curve. Theorem 1 then justifies the following definitions. We say $F_0$ and $F_1$ satisfy a likelihood ratio order, and write $F_0 \le_{\mathrm{lr}} F_1$, if $F_1 \circ F_0^{-1}$ is convex on $[0, 1]$. We then define the likelihood ratio ordered model as $\mathcal{M}_{\mathrm{lr}} := \{(F_0, F_1) \in \mathcal{M} : F_0 \le_{\mathrm{lr}} F_1\}$. For any $(F_0, F_1) \in \mathcal{M}_{\mathrm{lr}}$, we further define $\theta_{F_0, F_1}$ as $\partial_-(F_1 \circ F_0^{-1}) \circ F_0 \in \Theta$, where $\Theta$ is defined as the set of non-negative, non-decreasing functions on $\mathbb{R}$. By Theorem 1, $\theta_{F_0, F_1} = dF_1 / dF_0$ on $\mathcal{S}_0$ for all $(F_0, F_1) \in \mathcal{M}_{\mathrm{lr}}$ such that $F_1 \ll F_0$ and $dF_1 / dF_0$ is continuous on $\mathcal{S}_0$. We define $\theta_0 := \theta_{F_0, F_1}$.

In the context of the likelihood ratio order, many existing works either assume that $F_0$ and $F_1$ are discrete (e.g. Dykstra et al., 1995) or that $F_0$ and $F_1$ are continuous (e.g. Lehmann and Rojo, 1992; Yu et al., 2017). In the discrete setting, if $F_0$ and $F_1$ are discrete distributions with common support and mass functions $p_0$ and $p_1$ such that $p_1 / p_0$ is non-decreasing, then $\theta_{F_0, F_1} = p_1 / p_0$ on $\mathcal{S}_0$. Alternatively, if $F_0$ and $F_1$ both possess Lebesgue density functions $f_0$ and $f_1$ and $f_1 / f_0$ is non-decreasing, then $\theta_{F_0, F_1} = f_1 / f_0$ on $\mathcal{S}_0$. However, for the purpose of deriving a maximum likelihood estimator, we will demonstrate that these two cases do not need to be treated separately. Furthermore, in some applied settings, $F_0$ and $F_1$ are neither discrete nor continuous, but rather a mixture of discrete and continuous components, and we will derive results that apply in these situations as well. For instance, exposures that are bounded below may have positive mass at their lower boundary, and be continuous thereafter. Many biomarkers exhibit this property. Similarly, some measurements are “clumpy”, exhibiting positive mass at integers or other “round” numbers due to the measurement process, but also possessing positive Lebesgue density between such points. In all cases, $\theta_0$ has a meaningful interpretation as the ratio of the conditional odds of a sample being from the $F_1$ distribution to the unconditional odds of a sample being from $F_1$.

## 3 Estimation under a likelihood ratio order

### 3.1 Maximum likelihood estimator

The pair $(F_0, F_1)$ determines the joint distribution of the observed data. The nonparametric maximum likelihood estimator of $(F_0, F_1)$, i.e. in the model $\mathcal{M}$, is $(F_{0,n}, F_{1,n})$ for $F_{0,n}$ the empirical distribution function of $X_1, \dots, X_{n_0}$, and $F_{1,n}$ the same of $Y_1, \dots, Y_{n_1}$. This suggests taking as an estimator of $R_0$ the plug-in estimator $R_n := F_{1,n} \circ F_{0,n}^{-1}$. The function $R_n$ is known as the *empirical ordinal dominance curve*, and its properties were studied by Hsieh and Turnbull (1996).

In this section, we demonstrate, amongst other results, that the left derivative of the GCM of the empirical ordinal dominance curve yields the maximum likelihood estimator of $\theta_0$ in the likelihood ratio ordered model $\mathcal{M}_{\mathrm{lr}}$. Defining the nonparametric likelihood of the observed data as

$$L_n(F_0, F_1) := \prod_{i=1}^{n_0} F_0\{X_i\} \prod_{j=1}^{n_1} F_1\{Y_j\},$$

where $F\{z\}$ denotes the mass assigned by $F$ to the point $z$, a maximum likelihood estimator of $(F_0, F_1)$ in $\mathcal{M}_{\mathrm{lr}}$ is defined as $(\hat F_0, \hat F_1) := \operatorname{argmax}_{(F_0, F_1) \in \mathcal{M}_{\mathrm{lr}}} L_n(F_0, F_1)$, and a maximum likelihood estimator of $\theta_0$ is defined as $\hat\theta_n := \theta_{\hat F_0, \hat F_1}$.

We define $G_n$ as the empirical distribution function of the combined sample $X_1, \dots, X_{n_0}, Y_1, \dots, Y_{n_1}$, and $t_j := G_n(z_j)$ for each unique value $z_j$ of the combined sample. Our first result characterizes $(\hat F_0, \hat F_1)$.

###### Theorem 2.

Let $\hat\Gamma_1(t_j)$ be the value at $t_j$ of the GCM over $[0, 1]$ of the linear interpolation of the cumulative sum diagram $\{(t_j, (n_1/n)\, F_{1,n}(z_j)) : 0 \le j \le m\}$, and let $\hat\Gamma_0(t_j)$ be the value at $t_j$ of the LCM over $[0, 1]$ of the linear interpolation of $\{(t_j, (n_0/n)\, F_{0,n}(z_j)) : 0 \le j \le m\}$, where $t_0 := 0$ and $z_0 := -\infty$. Then $\hat F_1$ is a right-continuous step function with jumps only at $z_1, \dots, z_m$, and $\hat F_0$ is given by a right-continuous step function with jumps only at $z_1, \dots, z_m$, where, for any $j \in \{1, \dots, m\}$, the mass of $\hat F_1$ at $z_j$ is given by

$$\hat F_1\{z_j\} = \frac{n}{n_1}\left\{\hat\Gamma_1(t_j) - \hat\Gamma_1(t_{j-1})\right\}.$$

For any $j \in \{1, \dots, m\}$, the mass of $\hat F_0$ at $z_j$ is given by

$$\hat F_0\{z_j\} = \frac{n}{n_0}\left\{\hat\Gamma_0(t_j) - \hat\Gamma_0(t_{j-1})\right\}.$$

We also note that $\sum_{j=1}^m \hat F_0\{z_j\} = 1$ and $\sum_{j=1}^m \hat F_1\{z_j\} = 1$.

A proof of Theorem 2, and proofs of all other theorems, are provided in the Supplementary Material. We note that if there are points $z_j$ at which $\hat F_0$ assigns positive mass but no $X_i$ equals $z_j$, or at which $\hat F_1$ assigns positive mass but no $Y_i$ equals $z_j$, then there are infinitely many maximizers, because any distribution function that assigns the same mass anywhere in the interval between $z_j$ and the neighboring support point yields the same likelihood and satisfies the constraints. In these cases, for the sake of uniqueness, we will put the mass at the point $z_j$. Theorem 2 agrees with the main result of Dykstra et al. (1995) in the special case that $F_0$ and $F_1$ are finite discrete distributions.

Theorem 2 implies the following result characterizing $\hat\theta_n$.

###### Corollary 1.

The points $\{(\hat F_0(z_j), \hat F_1(z_j)) : j \in \{0, 1, \dots, m\}\}$, where $z_0 := -\infty$, lie on the GCM over $[0, 1]$ of the empirical ordinal dominance curve $R_n = F_{1,n} \circ F_{0,n}^{-1}$. Specifically, the vertices of the GCM of the cumulative sum diagram appearing in Theorem 2 correspond to the vertices of the GCM of the empirical ordinal dominance curve. Therefore, $\hat\theta_n$ is equal to $\partial_-\{\mathrm{GCM}_{[0,1]}(R_n)\} \circ \hat F_0$.

We illustrate the use of Theorem 2 and Corollary 1 using hypothetical data. We first derive $\hat F_1$: we plot the cumulative sum diagram of Theorem 2 and compute its GCM, displayed in the upper left panel of Figure 1. The scaled increments of this GCM between successive points $t_j$ yield the masses of $\hat F_1$. The estimators $F_{1,n}$ and $\hat F_1$ are compared in the bottom left panel of Figure 1.

We next derive $\hat F_0$: we plot the corresponding cumulative sum diagram and compute its LCM, displayed in the center left panel of Figure 1. The scaled increments of this LCM yield the masses of $\hat F_0$. The estimators $F_{0,n}$ and $\hat F_0$ are compared in the bottom left panel of Figure 1.

Finally, we derive $\hat\theta_n$. We plot the empirical ordinal dominance curve and compute the vertices of its GCM, displayed in the bottom left panel of Figure 1. The left-hand slopes of this GCM, composed as in Corollary 1, yield the piecewise constant estimator $\hat\theta_n$, which is displayed in the bottom right panel of Figure 1.
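The construction in Theorem 2 is easy to carry out numerically. The sketch below (the two small samples and all function names are our own, purely illustrative) builds the cumulative sum diagram, takes its GCM, and reads off the masses of the constrained estimator of $F_1$ as scaled increments:

```python
def gcm_vertices(points):
    """Vertices of the greatest convex minorant (lower convex hull)."""
    hull = []
    for x2, y2 in sorted(points):
        while len(hull) >= 2:
            (x0, y0), (x1, y1) = hull[-2], hull[-1]
            # Pop the middle point if it lies on or above the chord.
            if (y1 - y0) * (x2 - x1) >= (y2 - y1) * (x1 - x0):
                hull.pop()
            else:
                break
        hull.append((x2, y2))
    return hull

def gcm_value(hull, t):
    """Value of the piecewise-linear GCM at t."""
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= t <= x1:
            return y0 + (y1 - y0) * (t - x0) / (x1 - x0)
    raise ValueError("t outside the diagram")

# Hypothetical samples (ours, for illustration): x from F0, y from F1.
x, y = [1, 2, 2, 5], [2, 3, 4, 4]
n0, n1, n = len(x), len(y), len(x) + len(y)
zs = sorted(set(x) | set(y))
# Cumulative sum diagram: abscissa = combined ECDF, ordinate = (n1/n) * F1n.
t = [sum(v <= z for v in x + y) / n for z in zs]
c1 = [sum(v <= z for v in y) / n for z in zs]
hull = gcm_vertices([(0.0, 0.0)] + list(zip(t, c1)))
# Masses of the constrained estimator of F1: scaled increments of the GCM.
mass1, prev = [], 0.0
for tj in t:
    val = gcm_value(hull, tj)
    mass1.append(n / n1 * (val - prev))
    prev = val
print(dict(zip(zs, mass1)))  # masses sum to one
```

The masses of $\hat F_0$ follow analogously from the LCM of the companion diagram; the ratio of the two sets of masses reproduces the slopes described in Corollary 1.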

### 3.2 Representation as a transformation of isotonic regression

The form of $\hat\theta_n$ can be derived in a simpler way, without relying on Theorem 2, by reframing the problem as a transformation of an isotonic regression with a binary outcome. We let $A_1, \dots, A_n$ be independent Bernoulli random variables with common probability $\pi$ and such that $n_1 = \sum_{i=1}^n A_i$. Letting $i_1 < \cdots < i_{n_1}$ be the indices such that $A_{i_k} = 1$ for each $k$, we then define $Z_{i_k} := Y_k$ for each $k \in \{1, \dots, n_1\}$. Similarly, letting $i_1' < \cdots < i_{n_0}'$ be the indices such that $A_{i_k'} = 0$ for each $k$, we define $Z_{i_k'} := X_k$ for each $k \in \{1, \dots, n_0\}$. Defining the data unit $O := (Z, A)$, observing the independent samples $X_1, \dots, X_{n_0}$ from $F_0$ and $Y_1, \dots, Y_{n_1}$ from $F_1$ is then equivalent to observing $n$ independent observations $O_1, \dots, O_n$ from a distribution $P_0$, where $P_0$ satisfies

$$P_0(Z \le z \mid A = a) = F_a(z) \ \text{ for } a \in \{0, 1\}, \qquad P_0(A = 1) = \pi.$$

Thus, $Z_1, \dots, Z_n$ represent the pooled values of $X_1, \dots, X_{n_0}, Y_1, \dots, Y_{n_1}$, and each $A_i$ represents an indicator that $Z_i$ corresponds to a sample from $F_1$. Furthermore, $F_0(z) = P_0(Z \le z \mid A = 0)$, $F_1(z) = P_0(Z \le z \mid A = 1)$, and $\pi = P_0(A = 1)$. Estimating $(F_0, F_1)$ given the independent samples $X_1, \dots, X_{n_0}$ and $Y_1, \dots, Y_{n_1}$ is therefore equivalent to estimating $(F_0, F_1)$ given $n$ independent observations $O_1, \dots, O_n$ from $P_0$, where $O_i := (Z_i, A_i)$.

The benefit of the above reframing of the problem is that $\theta_0$, $F_0$, and $F_1$ can then be written as transformations of $P_0$. First, we have that $\theta_0 = g(\mu_0) / g(\pi)$, where $\mu_0(z) := P_0(A = 1 \mid Z = z)$ and $g$ is the odds transformation, defined as $g(p) := p / (1 - p)$. Since $g$ is strictly increasing, $\theta_0$ is monotone if and only if $\mu_0$ is. Since the maximum likelihood estimator of $\mu_0$ under the assumption that $\mu_0$ is non-decreasing is given by the isotonic regression $\hat\mu_n$ of $A_1, \dots, A_n$ on $Z_1, \dots, Z_n$, and the maximum likelihood estimator of $\pi$ is given by $\hat\pi_n := n_1 / n$, the maximum likelihood estimator of $\theta_0$ is then given by $\hat\theta_n = g(\hat\mu_n) / g(\hat\pi_n)$. Similarly, $F_0(z) = E_0\{(1 - \mu_0(Z))\, 1(Z \le z)\} / (1 - \pi)$ and $F_1(z) = E_0\{\mu_0(Z)\, 1(Z \le z)\} / \pi$. Since the maximum likelihood estimator of the marginal distribution of $Z$ is given by the empirical distribution of $Z_1, \dots, Z_n$, the maximum likelihood estimators of $F_0$ and $F_1$ are given by $\hat F_0(z) = \{n(1 - \hat\pi_n)\}^{-1} \sum_{i=1}^n \{1 - \hat\mu_n(Z_i)\}\, 1(Z_i \le z)$ and $\hat F_1(z) = (n \hat\pi_n)^{-1} \sum_{i=1}^n \hat\mu_n(Z_i)\, 1(Z_i \le z)$. It is straightforward to see that these forms of the maximum likelihood estimators are equivalent to the forms given above. In the next section, we will utilize this form of $\hat\theta_n$ to derive its asymptotic properties and to construct asymptotic confidence intervals.
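The isotonic-regression route amounts to a few lines of code. The following sketch (the function names and toy samples are ours, purely illustrative) pools the two samples, runs the pool-adjacent-violators algorithm on the group indicators, and maps the fitted probabilities through the odds transformation:

```python
from itertools import groupby

def pava(values, weights):
    """Weighted isotonic (non-decreasing) regression by pool adjacent violators.
    Returns the fitted value at each input position."""
    blocks = [[v, w, 1] for v, w in zip(values, weights)]  # [mean, weight, size]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] > blocks[i + 1][0] + 1e-12:  # violator: pool the blocks
            v0, w0, c0 = blocks[i]
            v1, w1, c1 = blocks[i + 1]
            blocks[i:i + 2] = [[(v0 * w0 + v1 * w1) / (w0 + w1), w0 + w1, c0 + c1]]
            i = max(i - 1, 0)  # pooling may create a new violator to the left
        else:
            i += 1
    fitted = []
    for v, _, c in blocks:
        fitted.extend([v] * c)
    return fitted

def density_ratio_mle(x_sample, y_sample):
    """MLE of the density ratio at the unique pooled values, via isotonic
    regression of the group indicator A on the pooled sample Z."""
    pooled = sorted([(z, 0) for z in x_sample] + [(z, 1) for z in y_sample])
    zs, means, weights = [], [], []
    for z, grp in groupby(pooled, key=lambda o: o[0]):
        labels = [a for _, a in grp]
        zs.append(z)
        means.append(sum(labels) / len(labels))
        weights.append(len(labels))
    mu_hat = pava(means, weights)
    odds = lambda p: p / (1 - p)
    pi_hat = len(y_sample) / (len(x_sample) + len(y_sample))
    return {z: (odds(m) / odds(pi_hat) if m < 1 else float("inf"))
            for z, m in zip(zs, mu_hat)}
```

For the toy samples `[1, 2, 2, 5]` and `[2, 3, 4, 4]`, the fitted isotonic probabilities are $0, 1/3, 3/4, 3/4, 3/4$, giving estimated ratios $0, 1/2, 3, 3, 3$, in agreement with the GCM construction of the previous subsection.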

## 4 Asymptotic results

### 4.1 Discrete distributions

We first consider the situation where both $F_0$ and $F_1$ have common finite support $\mathcal{S}$ and $\theta_0$, which in this case corresponds to the ratio $p_1 / p_0$ of the mass functions, is strictly increasing on $\mathcal{S}$. In this case, $R_0$ is strictly convex on $[0, 1]$ in the sense that its slopes between successive support points are strictly increasing, and with probability tending to one, the empirical ordinal dominance curve is also strictly convex. As a result, $\|\hat F_0 - F_{0,n}\|_\infty = o_P(n^{-1/2})$ and $\|\hat F_1 - F_{1,n}\|_\infty = o_P(n^{-1/2})$, where $\|\cdot\|_\infty$ denotes the supremum norm. Therefore, letting $p_0$ and $p_1$ be the mass functions corresponding to $F_0$ and $F_1$, respectively, we have that for each $s \in \mathcal{S}$, $n_0^{1/2}\{\hat F_0\{s\} - p_0(s)\}$ and $n_1^{1/2}\{\hat F_1\{s\} - p_1(s)\}$ converge in distribution to independent normal distributions with mean zero and variances $p_0(s)\{1 - p_0(s)\}$ and $p_1(s)\{1 - p_1(s)\}$, respectively. A straightforward application of the delta-method then implies that $n^{1/2}\{\hat\theta_n(s) - \theta_0(s)\}$ converges in distribution to a mean-zero normal with variance

$$\theta_0(s)^2 \left[\frac{1 - p_1(s)}{\rho\, p_1(s)} + \frac{1 - p_0(s)}{(1 - \rho)\, p_0(s)}\right],$$

where $\rho$ denotes the limit of $n_1 / n$.

### 4.2 Continuous distributions

Now we address the situation where $F_0$ and $F_1$ are both absolutely continuous with respect to Lebesgue measure and $\theta_0$, which now corresponds to the ratio $f_1 / f_0$ of the density functions, is strictly increasing. We first consider the large-sample behavior of $\hat F_0$ and $\hat F_1$. This study is aided by the work of Beare and Fang (2017), who demonstrated that the LCM operation is a directionally Hadamard differentiable mapping at any concave function. In particular, the Hadamard derivative at a concave function is equal to the identity operator if and only if the function is strictly concave. The functional delta-method therefore implies that $n^{1/2}\|\hat F_0 - F_{0,n}\|_\infty$ and $n^{1/2}\|\hat F_1 - F_{1,n}\|_\infty$ converge in probability to zero, since $R_0$ is strictly convex by assumption. When $\theta_0$ has flat sections, so that $R_0$ has affine sections, the form of the Hadamard derivative provided by Beare and Fang (2017) can be used in conjunction with the chain rule to derive the weak limits of the processes $n^{1/2}(\hat F_0 - F_0)$ and $n^{1/2}(\hat F_1 - F_1)$. Since the Hadamard derivative of the GCM operation is weakly contractive, these limit processes are more concentrated than those of $n^{1/2}(F_{0,n} - F_0)$ and $n^{1/2}(F_{1,n} - F_1)$, which implies consistency at the $n^{1/2}$ rate of $\hat F_0$ and $\hat F_1$.

We now turn to large-sample results for $\hat\theta_n$ at points where both $F_0$ and $F_1$ possess Lebesgue density functions $f_0$ and $f_1$, respectively. First, consistency of the isotonic regression estimator $\hat\mu_n$ implies consistency of $\hat\theta_n$.

###### Theorem 3 (Consistency).

If $F_0$ is continuous at $z_0$, $\theta_0$ is continuous at $z_0$, and $z_0$ lies in the interior of $\mathcal{S}_0$, then $\hat\theta_n(z_0) \to \theta_0(z_0)$ in probability. If $F_0$ and $\theta_0$ are uniformly continuous on an interval $I$, then $\sup_{z \in J} |\hat\theta_n(z) - \theta_0(z)| \to 0$ in probability for any strict sub-interval $J \subset I$.

We recall that, at any $z_0$ such that the marginal density $q_0$ of $Z$ is positive and continuous in a neighborhood of $z_0$, $\mu_0'(z_0) > 0$, and $\mu_0$ is continuously differentiable in a neighborhood of $z_0$, it holds that

$$n^{1/3}\{\hat\mu_n(z_0) - \mu_0(z_0)\} \rightsquigarrow \left[\frac{4\, \mu_0(z_0)\{1 - \mu_0(z_0)\}\, \mu_0'(z_0)}{q_0(z_0)}\right]^{1/3} \mathbb{W}, \tag{2}$$

where $\mathbb{W}$ follows *Chernoff’s distribution*, defined as the point of maximum of $t \mapsto B(t) - t^2$ for $B$ a two-sided standard Brownian motion originating from zero. We can then use the delta-method to see that

$$n^{1/3}\{\hat\theta_n(z_0) - \theta_0(z_0)\} \rightsquigarrow \frac{1 - \pi}{\pi\{1 - \mu_0(z_0)\}^2} \left[\frac{4\, \mu_0(z_0)\{1 - \mu_0(z_0)\}\, \mu_0'(z_0)}{q_0(z_0)}\right]^{1/3} \mathbb{W}.$$

The scale parameter in the above limit distribution is equal to $\kappa_0(z_0)$ for

$$\kappa_0(z_0) := \frac{1 - \pi}{\pi\{1 - \mu_0(z_0)\}^2} \left[\frac{4\, \mu_0(z_0)\{1 - \mu_0(z_0)\}\, \mu_0'(z_0)}{q_0(z_0)}\right]^{1/3}.$$

This yields the following result.

###### Theorem 4 (Pointwise convergence in distribution).

Suppose that, in a neighborhood of $z_0$, $\mu_0$ is continuously differentiable with $\mu_0'(z_0) > 0$, and $f_0$ and $f_1$ are positive and continuous. Then

$$n^{1/3}\{\hat\theta_n(z_0) - \theta_0(z_0)\} \rightsquigarrow \kappa_0(z_0)\, \mathbb{W}$$

for $\kappa_0(z_0)$ the scale parameter given above.

Theorem 4 provides a means to construct asymptotically valid confidence intervals for $\theta_0(z_0)$ at any $z_0$ such that $\kappa_0(z_0) > 0$. Defining $\hat\kappa_n(z_0)$ as an estimator of $\kappa_0(z_0)$ and $q_{1-\alpha/2}$ the $1 - \alpha/2$ quantile of $\mathbb{W}$, a Wald-type confidence interval for $\theta_0(z_0)$ is given by

$$\hat\theta_n(z_0) \pm n^{-1/3}\, \hat\kappa_n(z_0)\, q_{1-\alpha/2}. \tag{3}$$

If $\hat\kappa_n(z_0) \to \kappa_0(z_0)$ in probability, then this interval has asymptotic coverage of $1 - \alpha$. The quantiles of $\mathbb{W}$ were computed by Groeneboom and Wellner (2001), and in particular $q_{0.975} \approx 0.998$.
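Interval (3) is mechanical to compute once the point estimate and a consistent scale estimate are available. A sketch (the numeric inputs are hypothetical; only the quantile value $q_{0.975} \approx 0.998$ is taken from Groeneboom and Wellner (2001)):

```python
Q975 = 0.998  # approximate 0.975 quantile of Chernoff's distribution

def chernoff_wald_ci(theta_hat, kappa_hat, n, q=Q975):
    """Wald-type interval (3): theta_hat +/- n^(-1/3) * kappa_hat * q."""
    half_width = n ** (-1 / 3) * kappa_hat * q
    return theta_hat - half_width, theta_hat + half_width

# Hypothetical point estimate and scale estimate at a fixed z0.
lo, hi = chernoff_wald_ci(theta_hat=1.8, kappa_hat=2.5, n=1000)
```

Note the $n^{-1/3}$ half-width, in contrast with the usual $n^{-1/2}$ scaling of parametric Wald intervals; in practice the authors instead recommend transforming intervals for $\mu_0(z_0)$, as described next.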

In practice, we recommend an alternative method of constructing confidence intervals for $\theta_0(z_0)$. We recommend first constructing confidence intervals for $\mu_0(z_0)$ using either of two existing methods, then transforming these intervals into intervals for $\theta_0(z_0)$. Specifically, if $[\ell_n, u_n]$ represents a confidence interval for