
Large Deviations of the Estimated Cumulative Hazard Rate

by   Niklas Hohmann, et al.

Survivorship analysis allows the statistical analysis of situations that can be modeled as waiting times to an event. These waiting times are characterized by the cumulative hazard rate, which can be estimated by the Nelson-Aalen estimator or by various confidence estimators based on asymptotic statistics. To better understand the small sample properties of these estimators, the speed of convergence of the estimate to the exact value is examined. This is done by deriving large deviation principles and their rate functions for the estimators and examining their properties. It is shown that these rate functions are asymmetric, leading to a tendency of the estimated cumulative hazard rate to overestimate the true cumulative hazard rate. This tendency is strongest in the cases of (1) small sample sizes and (2) low tail probabilities. Taking this tendency into account can improve risk assessments of rare events and of cases where only little data is available.



1 Introduction

Survival analysis is the standard framework to statistically analyze waiting times until an event occurs (Miličič, 2008). Classical applications of survival analysis arise in epidemiology, where the event can be the recovery or the death of a patient, and the results of the statistical analysis can decide on the admission or nonadmission of a new medical procedure (Sasieni and Brentnall, 2014; Kantoff et al., 2010).
A fundamental concept in characterizing the waiting time up to an event is the cumulative hazard rate (short CH) (Aalen, Borgan, and Gjessing, 2008). It is estimated using the Nelson-Aalen estimator and can be complemented by different confidence intervals and bands to compensate for uncertainties of the results (Aalen, Borgan, and Gjessing, 2008; Aalen, 1978; Bie, Borgan, and Liestøl, 1986; Nelson, 1969, 1972).
However, these confidence region estimators are based on the asymptotic properties of the Nelson-Aalen estimator (Bie, Borgan, and Liestøl, 1986), making it impossible to derive analytical statements regarding their performance for small sample sizes. As a result, no reliable assessments of the uncertainties for small samples are available, which is problematic for studies where larger sample sizes cannot be achieved, be it because of ethical considerations, financial reasons, or rarity.
The aim of this paper is to derive analytical expressions regarding the small sample properties of the estimated CH to better assess its deviations from the true CH in these cases.
For this, the pointwise speed of convergence of the estimated CH to the true CH is examined using the theory of large deviations.

2 Outline

The conventions and will be used throughout the paper.

Let $X_1, X_2, \dots$ be i.i.d. positive random variables, whose values model the waiting times for an event to occur. With the survival function

$$S(t) = P(X_1 > t)$$

(Kleinbaum and Klein, 2010, p. 9), the cumulative hazard rate (short CH) can be written as (Kleinbaum and Klein, 2010, p. 294)

$$\Lambda(t) = -\log S(t).$$

It can be estimated using the $n$-th empirical CH

$$\Lambda_n(t) = -\log S_n(t),$$

where $S_n(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{\{X_i > t\}}$ is the $n$-th empirical survival function. The aim of this paper is to examine the pointwise speed of convergence of $\Lambda_n(t)$ towards $\Lambda(t)$.
From the theory of large deviations, it is known that for integrable i.i.d. $Z_1, Z_2, \dots$, the relation

$$P\left(\frac{1}{n} \sum_{i=1}^{n} Z_i \in A\right) \approx \exp\left(-n \inf_{x \in A} I(x)\right) \tag{3}$$

holds for a large class of sets $A$ (Varadhan, 2016). The function $I$ is called a rate function and determines the speed of convergence of the averaged random variables towards their expectation value.
In this paper, a rate function for the empirical CH $\Lambda_n(t)$ is derived (section 3). The properties of this rate function are then examined to draw conclusions about the convergence of $\Lambda_n(t)$ to $\Lambda(t)$ (section 4).
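For this, fix any $t$, and define the tail probabilities as $p = P(X_1 > t) = S(t)$. Assume that $p \in (0, 1)$, since the cases $p = 0$ and $p = 1$ are trivial. Define the i.i.d. random variables

$$Y_i = \mathbb{1}_{\{X_i > t\}},$$

so each $Y_i$ is Bernoulli distributed with success probability $p$. The behavior of $\Lambda_n$ at $t$ is uniquely determined by the $Y_i$, since

$$\Lambda_n(t) = -\log\left(\frac{1}{n} \sum_{i=1}^{n} Y_i\right).$$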
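The estimator in this uncensored i.i.d. setting can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the function name `empirical_cumulative_hazard` and the sample values are assumptions made for the example. It averages the indicators of the events "waiting time exceeds t" to obtain the empirical survival function and returns its negative logarithm.

```python
import math


def empirical_cumulative_hazard(sample, t):
    """Return -log of the fraction of observations that exceed t."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    # Sum of the indicator variables 1{X_i > t}.
    survivors = sum(1 for x in sample if x > t)
    if survivors == 0:
        # No observation exceeds t, so -log(0) is taken as +infinity.
        return math.inf
    return -math.log(survivors / n)


# Hypothetical waiting times, chosen only for illustration.
waiting_times = [0.3, 0.7, 1.2, 1.9, 2.4, 3.1, 4.8, 5.5]
for t in (1.0, 2.0, 5.0):
    print(t, empirical_cumulative_hazard(waiting_times, t))
```

The estimate at a fixed time depends on the sample only through the number of observations exceeding that time, which is why the analysis can be reduced to Bernoulli variables.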


3 Establishing the Rate Functions

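First, a large deviation principle (LDP) for the $Y_i$ is derived. By either Cramér's theorem (Cramér, 1938) (Klenke, 2008, p. 508) or by Sanov's theorem (Sanov, 1958) (Klenke, 2008, p. 518), the sequence of probability measures

$$P_n = P\left(\frac{1}{n} \sum_{i=1}^{n} Y_i \in \cdot\,\right)$$

satisfies a large deviation principle with rate $n$ and rate function

$$I_p(x) = x \log\frac{x}{p} + (1 - x) \log\frac{1 - x}{1 - p} \quad \text{for } x \in [0, 1],$$

and $I_p(x) = \infty$ otherwise. This is the relative entropy of two Bernoulli distributions, one with success probability $x$ and one with success probability $p$ (Klenke, 2008, p. 515).
Applying the contraction principle (Klenke, 2008, p. 518) to this LDP and the function $x \mapsto -\log x$ shows that the sequence $(\Lambda_n(t))_n$ satisfies an LDP with rate $n$ and rate function

$$J_p(y) = e^{-y} \log\frac{e^{-y}}{p} + \left(1 - e^{-y}\right) \log\frac{1 - e^{-y}}{1 - p} \tag{9}$$

for $y \geq 0$. This is the rate function of $\Lambda_n(t)$, and is displayed in fig. 1 for four different values of $p$. Next, substitute $y = z - \log p$ and define the centered rate function as

$$C_p(z) = J_p(z - \log p) = -z\,p\,e^{-z} + \left(1 - p e^{-z}\right) \log\frac{1 - p e^{-z}}{1 - p} \tag{11}$$

for $z \geq \log p$ and $p \in (0, 1)$. The centering is introduced because the main interest of this examination is to compare the behavior of the rate functions for different $p$ close to the pointwise limit value $-\log p$ of the empirical cumulative hazard rate. This value is shifted to the origin in the centered rate function, which makes it possible to compare the behavior of the empirical cumulative hazard rate at fixed distances from the respective limit values for different $p$.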
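The three rate functions of this section can be evaluated numerically as follows. This is a sketch under stated assumptions: the function names are hypothetical, the Bernoulli rate function is taken to be the relative entropy with the usual convention that 0 times log 0 equals 0, and the contraction is carried out with the map from x to -log x.

```python
import math


def bernoulli_rate(x, p):
    """Relative entropy of Ber(x) with respect to Ber(p), with 0 * log(0) = 0."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if x < 0.0 or x > 1.0:
        return math.inf
    first = x * math.log(x / p) if x > 0.0 else 0.0
    second = (1.0 - x) * math.log((1.0 - x) / (1.0 - p)) if x < 1.0 else 0.0
    return first + second


def hazard_rate_function(y, p):
    """Rate function of the empirical CH: the Bernoulli rate evaluated at exp(-y)."""
    if y < 0.0:
        return math.inf  # the empirical CH is never negative
    return bernoulli_rate(math.exp(-y), p)


def centered_rate_function(z, p):
    """Rate function shifted so that the true CH, -log(p), sits at z = 0."""
    return hazard_rate_function(z - math.log(p), p)


for p in (0.5, 0.1, 0.01):
    below = centered_rate_function(-0.5, p)
    above = centered_rate_function(0.5, p)
    print(p, below, above)
```

For each tail probability in the loop, the value half a unit above the true cumulative hazard rate is smaller than the value half a unit below it, which is the asymmetry examined in section 4.2.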

4 Properties of the Rate Functions

4.1 Monotonicity in p

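In this section, it is shown that the centered rate function $C_p(z)$, taken as a function of $p$, is strictly increasing. This implies that the speed of convergence is decreasing as $t$ increases and correspondingly the tail probabilities decrease. Without loss of generality, it is assumed that $S$ takes on every tail probability, so $C_p$ is well defined for all $p \in (0, 1)$.
First, the function cannot be expected to be strictly increasing in $p$ for $z = 0$, since by definition $C_p(0) = 0$ for all $p$. Therefore the case $z = 0$ will be excluded.
The first derivative of $C_p(z)$ with respect to $p$ is given by

$$\frac{\partial}{\partial p} C_p(z) = \frac{1 - p e^{-z}}{1 - p} - e^{-z} \left(1 + z + \log\frac{1 - p e^{-z}}{1 - p}\right)$$

and the second derivative of $C_p(z)$ with respect to $p$ by

$$\frac{\partial^2}{\partial p^2} C_p(z) = \frac{e^{-2z}}{1 - p e^{-z}} - \frac{e^{-z}}{1 - p} + \frac{1 - e^{-z}}{(1 - p)^2} = \frac{\left(1 - e^{-z}\right)^2}{\left(1 - p e^{-z}\right)(1 - p)^2}. \tag{13}$$

Since the inequality $1 - p e^{-z} > 0$ holds for all $p \in (0, 1)$ and all $z > \log p$, inspecting the factors of eq. (13) shows that the second derivative is positive for all feasible $z$, and zero only when $z = 0$, which was excluded above. This makes the second derivative strictly positive, so $C_p(z)$ is strictly convex in $p$.
If $C_p(z)$ as defined in eq. (11) is taken as a function of $p$ including the boundary value $p = 0$, it is well-defined and its first derivative, evaluated at $p = 0$, yields

$$\left.\frac{\partial}{\partial p} C_p(z)\right|_{p = 0} = 1 - (1 + z)\,e^{-z},$$

which is strictly positive for $z \neq 0$ because $e^{z} > 1 + z$ for all $z \neq 0$. So $C_p(z)$ is strictly convex in $p$ and its gradient at $p = 0$ is strictly positive, therefore $C_p(z)$ is strictly increasing in $p$ for all admissible $p \in (0, 1)$. This shows that the rate of convergence is decreasing for all $z \neq 0$ as the tail probabilities decrease.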
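The monotonicity and convexity argument can be checked numerically. The sketch below is an illustration under assumptions, not part of the original derivation: it uses the centered rate function in the form -z p e^{-z} + (1 - p e^{-z}) log((1 - p e^{-z}) / (1 - p)), compares a closed-form second derivative in p, obtained by differentiating that expression twice, with a central finite difference, and tests strict monotonicity on a grid. All function names, the grid, and the step size `h` are hypothetical choices.

```python
import math


def centered_rate(z, p):
    """Centered rate function; requires 0 < p < 1 and p * exp(-z) < 1."""
    q = p * math.exp(-z)
    if not (0.0 < p < 1.0 and q < 1.0):
        raise ValueError("need 0 < p < 1 and z > log(p)")
    return -z * q + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))


def second_derivative_closed_form(z, p):
    """Second derivative in p, from differentiating centered_rate twice."""
    a = math.exp(-z)
    return (1.0 - a) ** 2 / ((1.0 - p * a) * (1.0 - p) ** 2)


def second_derivative_numeric(z, p, h=1e-4):
    """Central finite-difference approximation of the second derivative in p."""
    upper = centered_rate(z, p + h)
    lower = centered_rate(z, p - h)
    return (upper - 2.0 * centered_rate(z, p) + lower) / h ** 2


def is_strictly_increasing_in_p(z, grid):
    """Check strict monotonicity of p -> centered_rate(z, p) on an increasing grid."""
    values = [centered_rate(z, p) for p in grid]
    return all(a < b for a, b in zip(values, values[1:]))


# p = 0.01, ..., 0.59; for z = -0.5 the domain requires p < exp(-0.5), about 0.607.
grid = [k / 100 for k in range(1, 60)]
for z in (-0.5, 0.5, 1.5):
    print(
        z,
        is_strictly_increasing_in_p(z, grid),
        second_derivative_closed_form(z, 0.3),
        second_derivative_numeric(z, 0.3),
    )
```

The grid stops below 0.6 because for negative z the centered rate function is only defined for tail probabilities smaller than e^{z}.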

4.2 Asymmetry in z

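First, it is shown that the centered rate function $C_p$ is not axis symmetric with respect to the ordinate axis. As an aid, the identity

$$\log(1 - x) = -\sum_{k=1}^{\infty} \frac{x^k}{k}, \quad |x| < 1, \tag{15}$$

is used. Splitting the fraction in the logarithm in $C_p$ from equation (11) and then applying the power series from eq. (15) with $x = p e^{-z}$ yields

$$C_p(z) = -p\,e^{-z}(1 + z) - \left(1 - p e^{-z}\right) \log(1 - p) + \sum_{k=2}^{\infty} \frac{p^k e^{-kz}}{k(k-1)} \tag{16}$$

for $z > \log p$. Therefore $C_p$ is asymmetric by the asymmetry of $e^{-z}$.
Next, the symmetry defect of $C_p$, given by $D_p(z) = C_p(z) - C_p(-z)$, is examined. For this, the identities

$$\sinh(x) = \frac{e^{x} - e^{-x}}{2}, \qquad \cosh(x) = \frac{e^{x} + e^{-x}}{2}$$

are used. With the representation of $C_p$ in eq. (16), directly subtracting $C_p(z)$ and $C_p(-z)$ yields

$$D_p(z) = 2p\,\bigl(1 - \log(1 - p)\bigr) \sinh(z) - 2pz \cosh(z) - 2 \sum_{k=2}^{\infty} \frac{p^k \sinh(kz)}{k(k-1)}.$$

Using only the second order term of the sum gives the approximation for the symmetry defect

$$D_p(z) \approx 2p\,\bigl(1 - \log(1 - p)\bigr) \sinh(z) - 2pz \cosh(z) - p^2 \sinh(2z)$$

for $|z| < -\log p$ and $p \in (0, 1)$. These two statements show that the rate of convergence is not symmetric around the true cumulative hazard rate $\Lambda(t)$, and will always be slower from above the cumulative hazard rate than from below.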
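The symmetry defect can also be evaluated directly. The following sketch is an illustration under assumptions: it takes the centered rate function in the form -z p e^{-z} + (1 - p e^{-z}) log((1 - p e^{-z}) / (1 - p)), computes the difference between its values at z and -z, and compares it with a hyperbolic series obtained by expanding log(1 - x) as a power series. The function names and the truncation length `terms` are hypothetical.

```python
import math


def centered_rate(z, p):
    """Centered rate function; requires 0 < p < 1 and p * exp(-z) < 1."""
    q = p * math.exp(-z)
    if not (0.0 < p < 1.0 and q < 1.0):
        raise ValueError("need 0 < p < 1 and z > log(p)")
    return -z * q + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))


def symmetry_defect(z, p):
    """Difference between the rate at distance z above and below the true CH."""
    return centered_rate(z, p) - centered_rate(-z, p)


def symmetry_defect_series(z, p, terms=100):
    """Series form of the defect; it converges when p * exp(|z|) < 1."""
    lead = 2.0 * p * (1.0 - math.log(1.0 - p)) * math.sinh(z)
    lead -= 2.0 * p * z * math.cosh(z)
    tail = sum(
        p ** k * math.sinh(k * z) / (k * (k - 1)) for k in range(2, terms + 2)
    )
    return lead - 2.0 * tail


for p, z in ((0.3, 0.5), (0.1, 0.5), (0.5, 0.3)):
    print(p, z, symmetry_defect(z, p), symmetry_defect_series(z, p))
```

A negative defect for positive z means that the rate function is smaller above the true cumulative hazard rate than at the same distance below it, so deviations upward decay more slowly.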

5 Example

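Let $X$ be a positive random variable with distribution function $F$ and survival function $S = 1 - F$. Taking the tail probabilities as functions of $t$, meaning $p(t) = S(t)$, the rate function in equation (9) is a function in the variables $t$ and $y$:

$$J(t, y) = e^{-y} \log\frac{e^{-y}}{S(t)} + \left(1 - e^{-y}\right) \log\frac{1 - e^{-y}}{1 - S(t)}. \tag{22}$$

For the case where $X$ is exponentially distributed with mean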

, the contour plot of $J(t, y)$ is displayed in fig. 2, alongside the cumulative hazard rate. The asymmetry of the rate functions is clearly visible.
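The surface behind such a contour plot can be tabulated with a short Python sketch. Everything specific here is an assumption made for illustration: the function name `rate_exponential`, the default mean of 1.0 (the value used for the figure is not restated here), and the grid of evaluation points. The tail probability at time t is taken to be exp(-t / mean), so the true cumulative hazard rate is the line y = t / mean.

```python
import math


def rate_exponential(t, y, mean=1.0):
    """Rate function at time t > 0 and level y >= 0 for exponential waiting times.

    The mean is an illustrative assumption, not a value from the paper.
    """
    if t <= 0.0 or y < 0.0 or mean <= 0.0:
        raise ValueError("need t > 0, y >= 0 and mean > 0")
    p = math.exp(-t / mean)  # tail probability S(t)
    x = math.exp(-y)         # survival fraction corresponding to the level y
    first = x * math.log(x / p)
    second = (1.0 - x) * math.log((1.0 - x) / (1.0 - p)) if x < 1.0 else 0.0
    return first + second


# Coarse table: rows are times, columns are offsets from the true CH t / mean.
offsets = (-0.5, -0.25, 0.0, 0.25, 0.5)
for t in (1.0, 2.0, 3.0):
    row = [rate_exponential(t, t + dz) for dz in offsets]
    print(t, [round(value, 4) for value in row])
```

In each row the entries to the right of the zero in the middle are smaller than their mirror images on the left, and the gap widens as t grows and the tail probability shrinks.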

6 Discussion

The main result of this paper is that the estimate of the cumulative hazard rate (short CH) has a tendency to overestimate the true CH, which is strongest in the cases of (1) small sample sizes or (2) small tail probabilities. This is a direct conclusion from the asymmetry of the rate function governing the convergence of the estimated cumulative hazard rate in combination with the fundamental relation of the theory of large deviations given in equation (3). The degree of this effect is determined by the tail probabilities and therefore unique for every distribution.
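For finite samples, the overestimation tendency can be illustrated with exact binomial probabilities, since the number of observations exceeding t is binomially distributed with parameters n and p. The sketch below is an illustration under assumptions: the function names and the values of n, p, and z are hypothetical, and `math.comb` requires Python 3.8 or later. It compares the probability that the empirical CH lies at least z above the true value with the probability that it lies at least z below.

```python
import math


def binomial_pmf(n, k, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def deviation_probabilities(n, p, z):
    """Return (P(estimate >= true CH + z), P(estimate <= true CH - z)).

    With K observations above t, the estimate is -log(K / n) and the true
    value is -log(p). The case K = 0 (infinite estimate) counts as an
    overestimate.
    """
    # Estimate too large  <=>  K / n <= p * exp(-z).
    max_survivors = math.floor(n * p * math.exp(-z))
    # Estimate too small  <=>  K / n >= p * exp(z).
    min_survivors = math.ceil(n * p * math.exp(z))
    overestimate = sum(binomial_pmf(n, k, p) for k in range(0, max_survivors + 1))
    underestimate = sum(binomial_pmf(n, k, p) for k in range(min_survivors, n + 1))
    return overestimate, underestimate


for n in (20, 50, 200):
    over, under = deviation_probabilities(n, 0.2, 0.5)
    print(n, over, under)
```

In these illustrative settings the probability of overestimating by z exceeds the probability of underestimating by the same amount, in line with the asymmetry of the rate function.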
It is notable that the asymmetry observed is not generated by the distribution of the random variables, but is rooted in the asymmetry of the rate function of the Bernoulli distribution. It is amplified by the logarithm in the definition of the cumulative hazard rate, and unavoidable in the sense that it naturally arises from observing whether a random variable takes on a value over or under a given threshold.
What remains unclear is how these results translate into the standard framework of survival analysis, i.e. censoring and using point processes instead of i.i.d. random variables. Censoring in particular might have a strong influence on the case with low tail probabilities, since early censoring can render long survival times (that are commonly associated with low tail probabilities) irrelevant. Extending the results presented in this paper to this more general setting is relevant for applications, but requires further work.

Figure 1: The rate function $J_p$ (thick black line) from equation (9) for four different tail probabilities $p$, one per panel (top left, top right, bottom left, and bottom right). The grey square is located at $y = -\log p$, which is the minimum of the rate functions and the exact value of the cumulative hazard rate at the given tail probability. The dotted lines indicate the values for .
As the tail probabilities decrease, the rate function to the right of the grey square becomes increasingly flat, which indicates a low rate of convergence to the cumulative hazard rate. Although the same effect can be observed on the left side of the grey square, it is a lot weaker. The figure was generated using R (R version 3.2.3, 2015).
Figure 2: A contour plot showing lines of equal rate of convergence (thin black lines) to a cumulative hazard rate (thick grey dashed line). The black contour lines are determined by $J(t, y)$ from eq. (22) for the special case of an exponential distribution with mean and are lines of equal rate of convergence for the rates , where . The grey dashed line is the cumulative hazard rate, which is a linear function through the origin with gradient equal to the reciprocal of the mean. The four rate functions displayed in fig. 1 are sections through this contour plot along the dotted black lines: the grey squares in fig. 1 are the intersections of the dotted black lines and the dashed grey cumulative hazard rate, and the thin dashed lines in fig. 1 represent some of the contour lines from the figure shown here.
As $t$ increases and the tail probabilities decrease accordingly, the lines on the upper left half of the picture diverge faster from the cumulative hazard rate than on the lower right half of the picture, showing that the rate of convergence is lower on the upper left side.
Note that since the rate functions are only determined by the tail probabilities, and all tail probabilities occur in an exponential distribution, the corresponding contour plot for any random variable can be derived from this plot by a transformation of the $t$-axis. The figure was generated using R (R version 3.2.3, 2015).


  • Aalen (1978) Aalen, O. 1978. Nonparametric inference for a family of counting processes. The Annals of Statistics 6(4):701–726.
  • Aalen, Borgan, and Gjessing (2008) Aalen, O., Ø. Borgan, and H. Gjessing. 2008. Survival and event history analysis: a process point of view. New York: Springer.
  • Bie, Borgan, and Liestøl (1986) Bie, O., Ø. Borgan, and K. Liestøl. 1986. Confidence intervals and confidence bands for the cumulative hazard rate function and their small sample properties. Scandinavian Journal of Statistics 9(3):221–233.
  • Cramér (1938) Cramér, H. 1938. Sur un nouveau théorème-limite de la théorie des probabilités. Actualités Scientifiques et Industrielles 736:5–23.
  • Kantoff et al. (2010) Kantoff, P. W., T. J. Schuetz, B. A. Blumenstein, L. M. Glode, D. L. Bilhartz, M. Wyand, K. Manson, D. L. Panicali, R. Laus, J. Schlom, et al. 2010. Overall survival analysis of a phase II randomized controlled trial of a poxviral-based PSA-targeted immunotherapy in metastatic castration-resistant prostate cancer. Journal of Clinical Oncology 28(7):1099.
  • Kleinbaum and Klein (2010) Kleinbaum, D. G., and M. Klein. 2010. Survival analysis. New York: Springer.
  • Klenke (2008) Klenke, A. 2008. Probability Theory. A Comprehensive Course. London: Springer.
  • Miličič (2008) Miličič, B. 2008. Survival Analysis, p. 1367–1371. Dordrecht, Netherlands: Springer.
  • Nelson (1969) Nelson, W. 1969. Hazard plotting for incomplete failure data. Journal of Quality Technology 1(1):27–52.
  • Nelson (1972) Nelson, W. 1972. Theory and applications of hazard plotting for censored failure data. Technometrics 14(4):945–966.
  • R version 3.2.3 (2015) R Core Team. 2015. R: A Language and Environment for Statistical Computing (version 3.2.3). Vienna, Austria: R Foundation for Statistical Computing.
  • Sanov (1958) Sanov, I. N. 1958. On the probability of large deviations of random variables. Technical report, North Carolina State University. Dept. of Statistics.
  • Sasieni and Brentnall (2014) Sasieni, P. D., and A. R. Brentnall. 2014. Survival Analysis, p. 1195–1239. New York: Springer.
  • Varadhan (2016) Varadhan, S. R. S. 2016. Large deviations. Rhode Island: American Mathematical Society.