# The Delta-Method and Influence Function in Medical Statistics: a Reproducible Tutorial

Approximate statistical inference via determination of the asymptotic distribution of a statistic is routinely used for inference in applied medical statistics (e.g. to estimate the standard error of the marginal or conditional risk ratio). One method for variance estimation is the classical Delta-method but there is a knowledge gap as this method is not routinely included in training for applied medical statistics and its uses are not widely understood. Given that a smooth function of an asymptotically normal estimator is also asymptotically normally distributed, the Delta-method allows approximating the large-sample variance of a function of an estimator with known large-sample properties. In a more general setting, it is a technique for approximating the variance of a functional (i.e., an estimand) that takes a function as an input and applies another function to it (e.g. the expectation function). Specifically, we may approximate the variance of the function using the functional Delta-method based on the influence function (IF). The IF explores how a functional ϕ(θ) changes in response to small perturbations in the sample distribution of the estimator and allows computing the empirical standard error of the distribution of the functional. The ongoing development of new methods and techniques may pose a challenge for applied statisticians who are interested in mastering the application of these methods. In this tutorial, we review the use of the classical and functional Delta-method and their links to the IF from a practical perspective. We illustrate the methods using a cancer epidemiology example and we provide reproducible and commented code in R and Python using symbolic programming. The code can be accessed at https://github.com/migariane/DeltaMethodInfluenceFunction


## 1 Introduction

A fundamental problem in inferential statistics is to approximate the distribution of an estimator constructed from the sample (i.e. a statistic). The standard error (SE) of an estimator characterises its variability.Boos2013 Oftentimes, it is not the estimator itself that is of interest but a function of it. In this case, the Delta-method can approximate the SE of a function of an estimator with known asymptotic properties using a Taylor expansion, because a smooth function of an asymptotically normal estimator is also asymptotically normal.Vaart1998 In a more general setting, this technique is also useful for approximating the variance of some functionals. For instance, in epidemiology the Delta-method is used to compute the SE of functions such as the risk difference (RD) and the risk ratio (RR),Agresti2010 which are both functions of the risk (a parameter representing the probability of the outcome).Armitage2005, Boos2013

As an alternative to the Delta-method for approximating the SE in large samples,Boos2013, MillarMaximumADMB we can use computational methods such as the bootstrap.Efron1993, efron1982 In the course of their research, applied statisticians may need to assess whether a large-sample approximation of the distribution of a statistic is appropriate, how to derive the approximation, and how to use it for inference in applications. The distribution of a statistic can be determined analytically only for a narrow range of inference problems. It therefore usually has to be approximated in order to estimate the statistic's variance and hence its SE.

In this tutorial we introduce the use of the classical and functional Delta-method, the Influence Function (IF), and their relationship from a practical perspective. Hampel introduced the concept of the IF in 1974.hampel1974 He highlighted that most estimators can actually be viewed as functionals constructed from the distribution functions. The IF was further developed in the context of robust statistics but is now used in many fields, including causal inference.hampel1974 The IF is often used to approximate the SE of a plug-in asymptotically linear estimator.Tsiatis:2007aa Mathematically, the IF is derived using the second term of the first order Taylor expansion used to empirically approximate the distribution of the plug-in estimator.Boos2013 It can be easily derived for most common estimators and it appears in the formulas for asymptotic variances of asymptotically normally distributed estimators. The IF is equivalent to the normalized score functions of maximum likelihood estimators.hampel1974

Furthermore, the tutorial includes boxes with R code (R Foundation for Statistical Computing, Vienna, Austria)R2020 to support the implementation of the methods and to allow readers to learn by doing. The code can be accessed at https://github.com/migariane/DeltaMethodInfluenceFunction.

In section 1, we introduce the importance of the Delta-method in statistics and justify the need for a tutorial for applied statisticians. In section 2, we review the theory of the classical and functional Delta-methods and the influence function (IF). In section 3, we provide multiple worked examples and code for applications of the classical and functional Delta-method, and the IF. The first examples involve deriving the SE for the sample mean of a variable, the ratio of two means of two independent variables, and the ratio of two sample proportions (i.e. the risk ratio). We also provide an example where the conditions required for the Delta-method do not hold. We then show how to use the functional Delta-method based on the IF to derive the SE for the quantile function and the correlation coefficient. Our final example is motivated by an application in cancer epidemiology and involves a parameter of interest that is a combination of coefficients in a logistic regression model. Finally, in section 4, we provide a concise conclusion where we mention additional methods interconnected with the Delta-method and the IF, such as M-estimation and the Huber sandwich estimator.

## 2 Theory: The Classical Delta-method

Let $\theta$ be a parameter. For this tutorial, we are interested in working with an estimand that can be written as a function of $\theta$ (i.e., $\phi(\theta)$) rather than $\theta$ itself. For instance, we may not be interested in the probability of having a particular disease, but in the ratio of two probabilities $\theta_1/\theta_2$. Here the first probability ($\theta_1$) is that of developing the disease under treatment, and the second ($\theta_2$) is that of developing the disease without treatment. The estimand $\phi(\theta)=\theta_1/\theta_2$ represents the relative risk. Define the estimator of $\phi(\theta)$ to be $\phi(\hat\theta)=\hat\theta_1/\hat\theta_2$, the ratio of the estimators of the respective probabilities. The question is: if we know the variances of $\hat\theta_1$ and $\hat\theta_2$, how do we obtain the variance of $\phi(\hat\theta)$? The Delta-method is one approach to answer this.

Let $\hat\theta$ be an estimator of $\theta$ from a random sample $X_1,\dots,X_n$, where the $X_i$ are independent and identically distributed (i.i.d.) with a distribution defined by the parameter $\theta$ (i.e. $X_i\sim F_\theta$). Examples of parameters include the rate of an exponential variable ($\lambda$), the mean and variance of a normal distribution ($\mu$, $\sigma^2$), or the probabilities of the categories under a multinomial model with $k$ different categories: $\theta=(p_1,\dots,p_k)$ with $\sum_{j=1}^k p_j=1$.

Any (measurable) function of the random sample is called a statistic.Casella1998TheoryEstimation In particular, any estimator $\hat\theta$ of $\theta$ is a function of the random sample, making it a statistic. For example, if $\theta=\mu$, the mean, then $\hat\theta=\bar X=\frac1n\sum_{i=1}^nX_i$ is a function of the $X_i$. To emphasize the dependency of the estimator $\hat\theta$ on the sample size $n$, we write $\hat\theta_n$. Thus $\hat\theta_m$ would denote the estimator under a random sample of size $m$, and $\hat\theta_\infty$ denotes the estimator under a random sample "of infinite size". Any (measurable) function of the estimator, $\phi(\hat\theta_n)$, also depends upon the random sample, and hence it is a statistic too. Because of this dependency upon the random sample, any statistic is itself a random variable. We can thus characterise the estimator in terms of its distribution. As an example, if the $X_i\sim\text{Normal}(\mu,\sigma^2)$ are i.i.d., then $\bar X_n$ also has a normal distribution, with parameters $(\mu,\sigma^2/n)$. Furthermore, the statistic … has a … distribution.

More often than not, the distribution of a statistic cannot be derived directly, and we rely on the asymptotic (large-sample) properties of $\hat\theta_n$ as $n$ approaches $\infty$. One of the most powerful and well-known results is the central limit theorem. It states that, under reasonable regularity conditions (i.i.d. variables with mean $\mu$ and finite variance $\sigma^2$),Billingsley1961StatisticalChains if $\hat\theta_n=\bar X_n$, then for large $n$

$$\sqrt{n}\,(\hat\theta_n-\mu)\overset{\text{approx}}{\sim}\text{Normal}(0,\sigma^2). \tag{1}$$

This is the property that allows us to construct the Wald-type asymptotic confidence intervals $\hat\theta_n\pm Z_{1-\alpha/2}\,\sigma/\sqrt{n}$.Agresti2012ApproximateProportions

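However, consider a function $\phi$ of one or more estimators that are asymptotically normal with known variance. When $\phi$ is not linear (e.g. the ratio of two proportions) and there is no closed-form expression for the SE, we use the Delta-method. The classical Delta-method places certain regularity conditions on the function $\phi$, the statistic $\hat\theta_n$, and the i.i.d. random variables $X_i$. Under these conditions, the distribution of $\phi(\hat\theta_n)$ can be approximated by a normal distribution whose variance is proportional to the square of $\phi$'s rate of change at $\theta$, the derivative $\phi'(\theta)$. Take the one-dimensional case of $\theta\in\mathbb R$ and $\phi:\mathbb R\to\mathbb R$, with $\hat\theta_n$ asymptotically normal as in (1) and $\theta=\mu$. The theorem then states that, for large $n$ (Appendix: Delta-method proof):

$$\sqrt{n}\,\big(\phi(\hat\theta_n)-\phi(\theta)\big)\overset{\text{approx}}{\sim}\text{Normal}\big(0,\phi'(\theta)^2\sigma^2\big).$$

This provides the researcher with confidence intervals based on asymptotic normality, in which $\theta$ and $\sigma^2$ are replaced by their estimates:

$$\phi(\hat\theta_n)\pm Z_{1-\alpha/2}\sqrt{\frac{\phi'(\hat\theta_n)^2\,\hat\sigma^2}{n}}.$$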
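As a minimal sketch of the interval above, assume that $\theta$ is a population mean estimated by the sample mean, that $\hat\sigma^2=S^2$, and that $\phi(\theta)=e^\theta$, the exponential function also used to illustrate Taylor's approximation in Section 2.1. The function `delta_method_ci` and the data are hypothetical illustrations, not the code from the tutorial's boxes.

```python
import math
import statistics

def delta_method_ci(sample, phi, dphi, alpha=0.05):
    """Classical (univariate) Delta-method estimate, SE and Wald CI for phi(theta).

    Assumes theta is the population mean, estimated by the sample mean, so the CLT
    gives sqrt(n) * (theta_hat - theta) ~ Normal(0, sigma^2). The Delta-method then
    gives Var[phi(theta_hat)] ~= phi'(theta)^2 * sigma^2 / n, evaluated here at the
    plug-in values theta_hat and S^2.
    """
    n = len(sample)
    theta_hat = statistics.mean(sample)
    s2 = statistics.variance(sample)              # S^2 with an (n - 1) denominator
    se = abs(dphi(theta_hat)) * math.sqrt(s2 / n)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    est = phi(theta_hat)
    return est, se, (est - z * se, est + z * se)

# Hypothetical data; phi(theta) = exp(theta), whose derivative is also exp(theta).
x = [0.2, -0.1, 0.4, 0.0, 0.3, 0.1, -0.2, 0.5]
est, se, ci = delta_method_ci(x, math.exp, math.exp)
print(f"exp(mean) = {est:.4f}, SE = {se:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

Passing the derivative `dphi` explicitly keeps the sketch close to the formula. A symbolic tool such as SymPy, used in the tutorial's notebooks, could supply it instead.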

To better understand the Delta-method we need to discuss four concepts. First, we need to discuss how derivatives approximate functions such as via a Taylor expansion. Second, we describe convergence in distribution which is what allows us to characterise the asymptotic properties of the estimator. Third, we present the central limit theorem, which is at the core of the Delta-method. Finally we’ll generalize these results to the functional Delta-method using influence functions.

### 2.1 Taylor’s Approximation

Figure 1(a): The derivative of the function $\phi(\theta)=e^\theta$ (black) corresponds to the slope of the tangent line $T_1$ (green). The tangent is constructed by taking the line $L(x)=\frac{\phi(\theta+h)-\phi(\theta)}{h}(x-\theta)+\phi(\theta)$ (red) and sending $h\to0$.

For Taylor’s approximation to work we need a function $\phi$ that is differentiable at $\theta$. Following the classical definition of differentiability,Courant1988DifferentialCalculus a real-valued function $\phi$ with domain $D$, a subset of $\mathbb R$ ($\phi:D\to\mathbb R$), is differentiable at $\theta\in D$ and has derivative $\phi'(\theta)$ if the following limit exists:

$$\phi'(\theta):=\lim_{h\to0,\;h\neq0}\frac{\phi(\theta+h)-\phi(\theta)}{h}.$$

Intuitively, this definition states that one can estimate a unique tangent line to $\phi$ at $\theta$, with slope $\phi'(\theta)$. To do so, we evaluate the function at $\theta$ and $\theta+h$ and reduce the size of $h$ (see Figure 1(a)).

This definition can be extended to the multivariate case via directional derivatives (Gâteaux derivatives).Gateaux1919FonctionsIndependantes In multiple dimensions, no unique tangent line can be generated (see Figure 1(b)). Hence, in addition to the function $\phi$, one must also specify the direction vector $v$ along which the tangent line will be calculated. This results in $\tilde\partial_v\phi(\theta)$, the derivative of $\phi$ at $\theta$ in the direction $v$:¹

$$\tilde\partial_v\phi(\theta):=\lim_{h\downarrow0}\frac{\phi(\theta+h\cdot v)-\phi(\theta)}{h}. \tag{2}$$

¹ You might notice a slight change in notation: the limit is stated as $h\downarrow0$ instead of the classical $h\to0$. The notation $h\downarrow0$ implies that the limit is taken with $h$ decreasing towards zero, in order to distinguish the direction $v$ from $-v$.

As an example, Figure 1(b) is a graph of a bivariate function $\phi$ with two different vectors, $v_1$ and $v_2$. Each vector results in a different directional derivative, $\tilde\partial_{v_1}\phi(\theta)$ and $\tilde\partial_{v_2}\phi(\theta)$. These correspond to the slopes of the tangent lines in the directions of $v_1$ and $v_2$, respectively.

It turns out that, for the Delta-method to be generalized to functionals (i.e. functions of functions), having a Gâteaux derivative is not enough. We require not only that the directional derivative exists, but also that it coincides with the limit obtained along any sequence of directions $v_h$ that converges to $v$ (i.e. $v_h\to v$). This is called (equivalently) the compact derivative or the HadamardBeutner2016FunctionalFunctionals (one-sided directional)Zajicek2014HadamardDifferentiability derivative of $\phi$ at $\theta$ in the direction $v$, as long as it is a linear function of $v$. It is usually denoted as:

$$\partial_v\phi(\theta):=\lim_{h\downarrow0}\frac{\phi(\theta+h\cdot v_h)-\phi(\theta)}{h}\quad\text{for any sequence } v_h\to v \text{ as } h\downarrow0. \tag{3}$$

This concept is illustrated in Figure 1(c), where a specific sequence $v_h$ converges to $v$.

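An equivalent definition of the Hadamard (one-sided directional) derivative is useful for calculations. It involves setting $v_h=G_h-\theta$ and $v=G-\theta$ for some functions $G_h$ and $G$ with $G_h\to G$, so that $\theta+h\cdot v_h=(1-h)\theta+h\cdot G_h$. This allows us to rewrite (3) as:

$$\partial_{G-\theta}\phi(\theta)=\lim_{h\downarrow0}\frac{\phi\big((1-h)\theta+h\cdot G_h\big)-\phi(\theta)}{h}\quad\text{for any sequence } G_h\to G \text{ as } h\downarrow0. \tag{4}$$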

In the particular case of a constant sequence, $G_h=G$ for all $h$, the expression reduces to a Gâteaux derivative, which can oftentimes be computed as a classical derivative. We discuss a particular case of this derivative, the influence function (IF, also known as the influence curve), in Section 2.4. It is interpreted as the rate of change of our functional in the direction of a new observation, $Y$.

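Recall that the derivative, $\phi'(\theta)$, represents the slope of the line tangent to the function. Intuitively, if $\hat\theta$ is close to $\theta$, the tangent line at $\theta$ should provide an adequate approximation of $\phi(\hat\theta)$ (Figure 1(d)). This is stated in the first-order Taylor approximation of $\phi(\hat\theta_n)$ around $\theta$ as follows:

$$\phi(\hat\theta_n)\approx\phi(\theta)+\partial_v\phi(\theta) \tag{5}$$

with $v=\hat\theta_n-\theta$, where the sign $\approx$ is interpreted as "approximately equal". This can be rewritten in the more classical form:

$$\phi(\hat\theta)-\phi(\theta)\approx\partial_v\phi(\theta)\quad\text{with } v=\hat\theta-\theta. \tag{6}$$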

Readers might be familiar with the theorem in the classical notation of univariate calculus, which states the approximation:

$$\phi(\hat\theta)\approx\phi(\theta)+\phi'(\theta)\underbrace{(\hat\theta-\theta)}_{v}. \tag{7}$$

In this case the Hadamard derivative coincides with the classical one multiplied by $v$:

$$\partial_v\phi(\theta)=\phi'(\theta)(\hat\theta-\theta).$$

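The justification for this connection is given by the Fréchet derivative, which represents the slope of the tangent plane. Intuitively, if the Hadamard (one-sided directional) derivatives exist for all directions $v$, we can talk about the tangent plane to $\phi$ at $\theta$. The tangent plane is "made up" of all the individual (infinitely many) tangent lines. The slope of the tangent plane is the Fréchet derivative.Zajicek2014HadamardDifferentiability, ciarlet2013linear For univariate functions in $\mathbb R$ the Fréchet derivative is $\phi'(\theta)$. Consider instead functions of a multivariate $\theta=(\theta_1,\dots,\theta_n)$ returning one value, $\phi:\mathbb R^n\to\mathbb R$. For these, the Fréchet derivative is called the gradient and corresponds to the partial derivative of the function with respect to each entry:

$$\nabla\phi=\left(\frac{\partial\phi}{\partial\theta_1},\frac{\partial\phi}{\partial\theta_2},\dots,\frac{\partial\phi}{\partial\theta_n}\right).$$

For multivariate functions, $\phi=(\phi_1,\dots,\phi_m):\mathbb R^n\to\mathbb R^m$, the Fréchet derivative is an $m\times n$ matrix called the Jacobian (matrix):

$$\nabla\phi=\begin{pmatrix}\frac{\partial\phi_1}{\partial\theta_1}&\frac{\partial\phi_1}{\partial\theta_2}&\cdots&\frac{\partial\phi_1}{\partial\theta_n}\\\frac{\partial\phi_2}{\partial\theta_1}&\frac{\partial\phi_2}{\partial\theta_2}&\cdots&\frac{\partial\phi_2}{\partial\theta_n}\\\vdots&\vdots&\ddots&\vdots\\\frac{\partial\phi_m}{\partial\theta_1}&\frac{\partial\phi_m}{\partial\theta_2}&\cdots&\frac{\partial\phi_m}{\partial\theta_n}\end{pmatrix}. \tag{8}$$

To obtain the Hadamard (one-sided directional) derivative from the Fréchet derivative, either $\phi'(\theta)$ or $\nabla\phi(\theta)$, one needs to apply the derivative operator to the direction vector $v$. This operation can be seen as "projecting" the tangent plane onto the direction of $v$, hence resulting in the directional derivative:

$$\partial_v\phi(\theta)=\phi'(\theta)\cdot v\quad\text{or}\quad\partial_v\phi(\theta)=\nabla\phi(\theta)^Tv. \tag{9}$$

Thus the notation $\partial_v\phi(\theta)$ in (6), which we use for the remainder of the paper, covers the functional scenario. It also covers the classical cases of functions on $\mathbb R$ and $\mathbb R^n$, respectively, which are obtained as the usual (classical) Fréchet derivatives projected onto $v$.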
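The projection in (9) can be checked numerically. The sketch below uses the ratio $\phi(\theta_1,\theta_2)=\theta_1/\theta_2$ from the relative-risk example, with hypothetical values of $\theta$ and $v$. It compares $\nabla\phi(\theta)^Tv$ with the one-sided difference quotient of (2). It also evaluates the same quotient along a perturbed sequence of directions $v_h\to v$, as in the Hadamard definition (3). The helper names are illustrative.

```python
def ratio(theta):
    """phi(theta1, theta2) = theta1 / theta2, e.g. a ratio of two probabilities."""
    t1, t2 = theta
    return t1 / t2

def ratio_gradient(theta):
    """Analytic gradient (Frechet derivative) of the ratio: (1/t2, -t1/t2^2)."""
    t1, t2 = theta
    return (1.0 / t2, -t1 / t2 ** 2)

def one_sided_quotient(f, theta, v, h):
    """[f(theta + h*v) - f(theta)] / h, the quantity whose limit defines (2)."""
    shifted = tuple(t + h * vi for t, vi in zip(theta, v))
    return (f(shifted) - f(theta)) / h

theta = (0.3, 0.2)     # hypothetical parameter value
v = (0.05, -0.02)      # hypothetical direction
projected = sum(g * vi for g, vi in zip(ratio_gradient(theta), v))  # grad^T v, as in (9)

for h in (1e-1, 1e-3, 1e-5):
    v_h = (v[0] + h, v[1] - h)   # a sequence of directions with v_h -> v as h -> 0
    print(f"h={h:g}: fixed v {one_sided_quotient(ratio, theta, v, h):.6f}, "
          f"v_h {one_sided_quotient(ratio, theta, v_h, h):.6f}, grad^T v {projected:.6f}")
```

Both quotients approach $\nabla\phi(\theta)^Tv$ as $h$ shrinks, which is what Hadamard differentiability requires for this smooth function.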

Finally, as a side note, we remark that it is possible to improve the approximation via a higher-order Taylor expansion around $\theta$ (see Figure 1(d)):Courant1988DifferentialCalculus, ren2001second

$$T_m(\hat\theta)=\phi(\theta)+\sum_{k=1}^{m}\frac{\phi^{(k)}(\theta)\,(\hat\theta-\theta)^k}{k!}$$

where $\phi^{(k)}$ denotes the $k$-th derivative of $\phi$, defined as the derivative of the $(k-1)$-th derivative. Readers interested in pursuing higher-order Hadamard derivatives can consult Ren and Sen (2001) and Tung and Bao (2022).REN2001187, tung2022higher
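To see how higher-order terms improve the approximation, the following sketch evaluates $T_m$ for $\phi(\theta)=e^\theta$. The exponential is chosen because every derivative equals the function itself. The expansion point and evaluation point are hypothetical.

```python
import math

def taylor_exp(theta, theta_hat, order):
    """Order-m Taylor polynomial T_m of phi = exp around theta, evaluated at theta_hat.

    Every derivative of exp equals exp, so phi^(k)(theta) = exp(theta) for all k.
    """
    d = theta_hat - theta
    return math.exp(theta) + sum(
        math.exp(theta) * d ** k / math.factorial(k) for k in range(1, order + 1)
    )

theta, theta_hat = 0.0, 0.5   # hypothetical expansion and evaluation points
for m in (1, 2, 3, 4):
    approx = taylor_exp(theta, theta_hat, m)
    print(f"T_{m} = {approx:.6f}, error = {abs(math.exp(theta_hat) - approx):.2e}")
```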

### 2.2 Convergence in distribution

For any random variable, $X$, the cumulative distribution function (CDF), also commonly referred to as the distribution function, quantifies the probability that $X$ is less than or equal to a real number $z$. Thus $X$'s CDF, $F_X$, is given by:

$$F_X(z)=P(X\le z)$$

Here the sign $\le$ is interpreted componentwise if $X$ is a random vector of size $k$. That is, $X\le z$ means $X_1\le z_1$, $X_2\le z_2$, etc. for the vector $z=(z_1,\dots,z_k)$. The distribution function completely determines all the probabilities associated with a random variable. For example, $P(a<X\le b)$ can be computed as $F_X(b)-F_X(a)$ for any $a<b$.

Given a statistic $\hat\theta_n$ that depends upon the sample size, $n$, the statistic's distribution function also depends on $n$. Let $F_n$ denote the distribution of $\hat\theta_n$ and $F_\Theta$ the distribution of a random variable, $\Theta$. We say that $\hat\theta_n$ converges in distribution to the random variable $\Theta$ if the CDF of $\hat\theta_n$ and the distribution of $\Theta$ coincide at infinity, at every point $z$ where $F_\Theta$ is continuous:

$$\lim_{n\to\infty}F_n(z)=F_\Theta(z).$$

We remark that convergence in distribution does not imply that the random variables $\hat\theta_\infty$ and $\Theta$ are the same. It solely entails that their probabilistic models are identical (e.g. both are $\text{Normal}(0,1)$); they are different random variables with a common distribution. Convergence in distribution is usually interpreted as an approximation stating that, for large $n$, the distribution of $\hat\theta_n$ is approximately $F_\Theta$ (written $\hat\theta_n\overset{\text{approx}}{\sim}F_\Theta$).

Figure 2: Convergence in distribution for the transformation $Z_n=\sqrt n\cdot\frac{\hat\theta_n-\lambda}{\sqrt\lambda}$ under different sample sizes, $n$. The sample of $X_i$s comes from a Poisson distribution with parameter $\lambda=1$, and $\hat\theta_n=\frac1n\sum_{i=1}^nX_i$.

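One of the most important results concerning convergence in distribution is the Central Limit Theorem (CLT). The CLT applies to any random sample $X_1,\dots,X_n$ with mean $E[X_i]=\mu$ and finite variance $\operatorname{Var}[X_i]=\sigma^2<\infty$. It states that the error of the sample mean, $\hat\theta_n-\mu$, multiplied by $\sqrt{n/\sigma^2}$, converges in distribution to a standard normal variable:

$$\sqrt{\frac{n}{\sigma^2}}\,(\hat\theta_n-\mu)\xrightarrow{d}Z, \tag{10}$$

where $Z\sim\text{Normal}(0,1)$ and $\xrightarrow{d}$ stands for convergence in distribution as $n\to\infty$. Figure 2 illustrates the distribution of $Z_n$ for different sample sizes, $n$, when the $X_i$s are $\text{Poisson}(1)$ distributed.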
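A simulation in the spirit of Figure 2 can be sketched as follows. The Poisson($\lambda=1$) sample is generated with Knuth's multiplication method, since Python's standard library has no Poisson sampler. By construction $Z_n$ has mean 0 and variance 1 for every $n$. The sketch therefore also reports the proportion of draws with $|Z_n|\le 1.96$, which should be close to the standard normal value of about 0.95 for large $n$. The seed, the number of replications and the sample sizes are arbitrary choices.

```python
import math
import random
import statistics

def poisson_draw(rng, lam):
    """One Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_zn(n, lam=1.0, reps=2000, seed=1):
    """Return `reps` simulated values of Z_n = sqrt(n) * (theta_hat_n - lam) / sqrt(lam)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        theta_hat = sum(poisson_draw(rng, lam) for _ in range(n)) / n
        draws.append(math.sqrt(n) * (theta_hat - lam) / math.sqrt(lam))
    return draws

for n in (5, 30, 200):
    z = simulate_zn(n)
    inside = sum(abs(zi) <= 1.96 for zi in z) / len(z)   # about 0.95 under Normal(0, 1)
    print(f"n={n}: mean={statistics.mean(z):.3f}, var={statistics.variance(z):.3f}, "
          f"P(|Z_n|<=1.96)={inside:.3f}")
```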

### 2.3 Two sides of the same coin: the classical and functional Delta-method

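The Delta-method uses both the Taylor approximation and the concept of convergence in distribution. Suppose that, for some sequence of numbers $r_n\to\infty$ that depend on the sample size, $r_n(\hat\theta_n-\theta)$ converges in distribution to $Z$. The Delta-method states that the weighted difference $r_n\big(\phi(\hat\theta_n)-\phi(\theta)\big)$ then converges to the distribution of the derivative of $\phi$ in the direction of $Z$:

$$r_n\big(\phi(\hat\theta_n)-\phi(\theta)\big)\xrightarrow{d}\nabla\phi(\theta)^TZ,$$

as long as $\phi$ is a function that can be approximated via its Taylor series around $\theta$. Examples of such numbers include $r_n=\sqrt n$, as in (10). The idea behind the Delta-method relates to the fact that we can transform (7) into:

$$r_n\big(\phi(\hat\theta_n)-\phi(\theta)\big)\approx\nabla\phi(\theta)^T\,r_n(\hat\theta_n-\theta)\xrightarrow{d}\nabla\phi(\theta)^TZ \tag{11}$$

Here the random quantity $r_n(\hat\theta_n-\theta)$ converges in distribution to $Z$. Thus $r_n\big(\phi(\hat\theta_n)-\phi(\theta)\big)$ converges (approximately) to $\nabla\phi(\theta)^TZ$, the derivative in the direction of $Z$.

In practical terms, this implies that the variance of $\phi(\hat\theta_n)$ can be approximated by a scaling of the variance of $\hat\theta_n$, i.e.:

$$\operatorname{Var}[\phi(\hat\theta_n)]\approx\nabla\phi(\theta)^T\operatorname{Var}[\hat\theta_n]\,\nabla\phi(\theta). \tag{12}$$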
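Approximation (12) translates directly into code. The sketch below rests on three stated assumptions. The gradient is obtained by central finite differences, a numerical stand-in for the analytic derivative. It is evaluated at the plug-in estimate $\hat\theta_n$. The covariance matrix of $\hat\theta_n$ is supplied by the user. The inputs, two independent proportions with binomial variances $p(1-p)/n$ and their ratio, are hypothetical.

```python
def numerical_gradient(phi, theta, eps=1e-6):
    """Central finite-difference approximation of the gradient of phi at theta."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((phi(up) - phi(down)) / (2 * eps))
    return grad

def delta_variance(phi, theta_hat, cov):
    """Approximation (12): Var[phi(theta_hat)] ~= grad^T Var[theta_hat] grad.

    `cov` is the estimated covariance matrix of theta_hat (list of rows); the
    gradient is evaluated at the plug-in estimate theta_hat.
    """
    g = numerical_gradient(phi, theta_hat)
    k = len(g)
    return sum(g[a] * cov[a][b] * g[b] for a in range(k) for b in range(k))

def risk_ratio(p):
    """phi(p1, p2) = p1 / p2."""
    return p[0] / p[1]

# Hypothetical inputs: two independent proportions from samples of size 100 and 150.
p_hat = [0.30, 0.20]
cov = [[0.30 * 0.70 / 100, 0.0],
       [0.0, 0.20 * 0.80 / 150]]
var_rr = delta_variance(risk_ratio, p_hat, cov)
print(f"RR = {risk_ratio(p_hat):.3f}, SE = {var_rr ** 0.5:.4f}")
```

With the analytic gradient $(1/p_2,-p_1/p_2^2)=(5,-7.5)$, (12) gives $25\cdot0.0021+56.25\cdot0.0010\overline{6}=0.1125$. The numerical gradient reproduces this value up to finite-difference error.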

The same idea can be extended to the case in which the parameter of interest, $\theta$, is not a real number (or vector of numbers) but a function. In this case, $\phi$ is a functional (i.e. a function of functions) and the corresponding method is oftentimes called the functional Delta-method. The result is that if $r_n(\hat\theta_n-\theta)\xrightarrow{d}Z$, with $Z$ now denoting a random function, then:

$$r_n\big(\phi(\hat\theta_n)-\phi(\theta)\big)\xrightarrow{d}\partial_Z\phi(\theta) \tag{13}$$

where $\partial_Z\phi(\theta)$ denotes the Hadamard derivative of $\phi$ in the direction $Z$, as in (6). We remark that the theorem in (13) is general: it works for classical derivatives ($\phi'(\theta)Z$), gradients and Jacobians ($\nabla\phi(\theta)^TZ$), and Hadamard derivatives ($\partial_Z\phi(\theta)$), all following the notation from (3).

The reader is invited to consult the supplementary material for the classical proof of the Delta-Method as well as the more general proof of the functional one.

### 2.4 The influence function

It is common to represent scientific questions by estimands (i.e., quantities we are interested in estimating from our data). For example, suppose we are interested in a random variable $X$ which follows a (possibly unknown) discrete distribution $P_X$. The variable $X$ might be, for example, a binary indicator of disease status in a particular population. If we are interested in the probability of having the given disease, our estimand is $\psi=P_X(X=1)$. In this case, since $X\in\{0,1\}$, the estimand is equivalent to the expectation of $X$, i.e. $\psi=E[X]$. The estimand can thus be seen as the parameter of the Bernoulli distribution. However, a second interpretation is also important: the estimand can be seen as a functional. It takes a function, specifically the probability mass function, as an input and applies a function to it: the expectation. For $X$ taking discrete values, we have

$$\psi=\phi(P_X)=\sum_{x\in\mathcal X}x\cdot P_X(X=x)=E[X] \tag{14}$$

where $\mathcal X$ denotes the support (i.e. possible values) of $X$. In the binary case, $\mathcal X=\{0,1\}$. If $X$ is continuous, an estimand defined as the expectation of $X$ is a functional of the probability density function $f_X$, such that $\psi=\phi(f_X)=\int x\,f_X(x)\,dx$.

It is important to highlight that the estimand $\psi$, which represents our scientific question, relates to a functional of the mass $P_X$. Following the previous notation, we have that $\theta=P_X$. If we have a random sample, $X_1,\dots,X_n$, we can compute the empirical probability mass function (ePMF):

$$\hat\theta=\hat P_X(z):=\frac1n\sum_{i=1}^nI_{\{X_i\}}(z)\;\left(=\frac{\text{Number of }X_i\text{s equal to }z}{n}\right) \tag{15}$$

where the indicator function of a set $A$ is defined as

$$I_A(z):=\begin{cases}1&\text{if } z\in A,\\0&\text{otherwise.}\end{cases}$$

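The ePMF can be used to estimate $P_X$, which gives us $\hat\psi=\phi(\hat P_X)$. This is called a "plug-in" estimator, as we plug the estimator of $\theta$ (i.e. of $P_X$) into the function $\phi$. In the above example, this implies calculating:

$$\hat\psi=\phi(\hat\theta)=\phi(\hat P_X)=\sum_{x\in\mathcal X}x\cdot\hat P_X(X=x)$$

which, for an observed dataset $x_1,\dots,x_n$, is equivalent to taking its mean:Vaart1998

$$\hat\psi=\sum_{x\in\mathcal X}x\cdot\hat P_X(X=x)=\sum_{x\in\mathcal X}x\left[\frac1n\sum_{i=1}^nI_{\{x_i\}}(x)\right]=\frac1n\sum_{i=1}^n\sum_{x\in\mathcal X}x\cdot I_{\{x_i\}}(x)=\frac1n\sum_{i=1}^nx_i$$

The last equality follows from the fact that $I_{\{x_i\}}(x)=1$ only when $x=x_i$, and in that case the product $x\cdot I_{\{x_i\}}(x)$ is $x_i$ (we exchange $x$ with $x_i$ using that $x=x_i$ in this scenario). The cases where $x\neq x_i$ don't appear in the sum, as $I_{\{x_i\}}(x)=0$ results in adding $0$ to the sum.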
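The equivalence between the plug-in estimator and the sample mean can be checked with exact rational arithmetic. In this sketch the binary sample is hypothetical. The ePMF is stored only on the observed values, which is enough because unobserved values have empirical probability zero.

```python
from collections import Counter
from fractions import Fraction

def empirical_pmf(sample):
    """ePMF (15): P_hat(z) = (number of X_i equal to z) / n, stored for observed z only."""
    n = len(sample)
    return {z: Fraction(count, n) for z, count in Counter(sample).items()}

def expectation_functional(pmf):
    """phi(P) = sum over the support of x * P(X = x), as in (14)."""
    return sum(x * p for x, p in pmf.items())

x = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1]   # hypothetical binary disease indicators
plug_in = expectation_functional(empirical_pmf(x))
print(plug_in, Fraction(sum(x), len(x)))   # 3/5 3/5
```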

The functional notation $\psi=\phi(P_X)$ allows us to study the robustness of our estimates using Hadamard derivatives. In particular, if the data are distributed according to the mass $P_X$, we can study the rate of change from distribution $P_X$ in the direction of another distribution, $Q$, by analyzing the derivative:

$$\partial_{Q-P_X}\phi(P_X)=\lim_{h\downarrow0}\frac{\phi\big((1-h)P_X+h\cdot Q\big)-\phi(P_X)}{h}$$

where we have substituted $\theta=P_X$ and $G_h=Q$ for all $h$ in (4).

Intuitively, this quantifies the rate of change in $\psi$ if the model deviates a little from $P_X$ towards $Q$ (for example, in the case of noisy data). We can choose $Q=I_{\{Y\}}$, the indicator of the set that only contains the value $Y$. This lets us study the rate of change of $\psi$ in the direction of an observation, $Y$. In particular, $(1-h)P_X+h\cdot I_{\{Y\}}$ stands for the contaminated model that produces the value $Y$ with probability $h$ and otherwise follows $P_X$. Hence the derivative analyzes how an observation, $Y$, influences our estimation of $\psi$.

The Hadamard derivative, in this special case, is called the influence function (IF) of the functional $\phi$ under the model $P_X$ at $Y$ and is denoted:

$$IF_{\phi,P_X}(Y):=\partial_{I_{\{Y\}}-P_X}\phi(P_X)=\lim_{h\downarrow0}\frac{\phi\big((1-h)P_X+h\cdot I_{\{Y\}}\big)-\phi(P_X)}{h}. \tag{16}$$
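Definition (16) can be approximated numerically. We contaminate a discrete model with a small point mass at $Y$ and compute the one-sided difference quotient. The sketch assumes a hypothetical three-point model $P_X$ and a fixed small $h$. For the mean functional the quotient should reproduce $Y-E[X]$, the result derived in Section 3.2. For comparison, it also evaluates the variance functional, whose IF, $(Y-\mu)^2-\sigma^2$, is a standard result not derived in this tutorial.

```python
def contaminate(pmf, y, h):
    """Mixture (1 - h) * P + h * I_{y}: extra probability mass h placed on the value y."""
    mixed = {x: (1 - h) * p for x, p in pmf.items()}
    mixed[y] = mixed.get(y, 0.0) + h
    return mixed

def mean_functional(pmf):
    """phi(P) = E[X] = sum_x x * P(x)."""
    return sum(x * p for x, p in pmf.items())

def variance_functional(pmf):
    """phi(P) = Var[X] = sum_x (x - E[X])^2 * P(x)."""
    mu = mean_functional(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def numerical_if(phi, pmf, y, h=1e-6):
    """One-sided difference quotient approximating the IF in (16) at the point y."""
    return (phi(contaminate(pmf, y, h)) - phi(pmf)) / h

P = {0: 0.2, 1: 0.5, 2: 0.3}   # hypothetical discrete model P_X
mu, sigma2 = mean_functional(P), variance_functional(P)   # 1.1 and 0.49
for y in (0, 2, 10):           # y = 10 lies outside the support of P_X
    print(y,
          round(numerical_if(mean_functional, P, y), 4), round(y - mu, 4),
          round(numerical_if(variance_functional, P, y), 4), round((y - mu) ** 2 - sigma2, 4))
```

The point $y=10$ shows why the IF of the mean is unbounded: one extreme observation shifts the estimate in proportion to its distance from $E[X]$.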

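The IF is the Hadamard derivative in a special case, and $\hat P_X-P_X=\frac1n\sum_{i=1}^n\big(I_{\{X_i\}}-P_X\big)$. The linearity of the Hadamard derivative in its direction therefore allows the Taylor expansion in (5) to be rewritten as:

$$\underbrace{\phi(\hat P_X)}_{\hat\psi}\approx\underbrace{\phi(P_X)}_{\psi}+\partial_{\hat P_X-P_X}\phi(P_X)=\phi(P_X)+\frac1n\sum_{i=1}^nIF_{\phi,P_X}(X_i). \tag{17}$$

Note that the Hadamard derivative establishes the change in the value of a parameter (written as a functional) resulting from small perturbations of the estimator in the direction of an observation $Y$. Plotting the IF provides a tool to discover outliers and is informative about the robustness of the estimator $\hat\psi$. Finally, if the difference $\hat\psi_n-\psi$ is (asymptotically) normally distributed, the Delta-method implies that:

$$\sqrt n\,(\hat\psi_n-\psi)=\sqrt n\,\big(\phi(\hat\theta_n)-\phi(\theta)\big)\overset{\text{approx}}{\sim}\text{Normal}\big(0,\operatorname{Var}[IF_{\phi,P_X}(Y)]\big) \tag{18}$$

where the variance, $\operatorname{Var}[IF_{\phi,P_X}(Y)]$, is taken with respect to the random variable $Y$ (with mass $P_X$). Equivalently, $\hat\psi_n-\psi$ is approximately $\text{Normal}\big(0,\operatorname{Var}[IF_{\phi,P_X}(Y)]/n\big)$. We remind the reader that an estimator for such a variance given by a random sample is:

$$\widehat{\operatorname{Var}}[IF_{\phi,P_X}(Y)]=\frac1n\sum_{i=1}^n\big(IF_{\phi,P_X}(X_i)\big)^2. \tag{19}$$

Notice that this estimator is the classical variance estimator for $IF_{\phi,P_X}(Y)$ when its mean is known (the mean of the influence function is always $0$). In practice, $P_X$ is unknown and the IF is evaluated at the plug-in estimate $\hat P_X$.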

### 2.5 Summary

The Delta-method estimates the SE of any particular estimator $\phi(\hat\theta_n)$ of $\phi(\theta)$, a Hadamard-differentiable function of a parameter $\theta$. It can be summarized in the following steps:

1. Determine the asymptotic distribution of $r_n(\hat\theta_n-\theta)$. This variable is a function of the distance between the estimator $\hat\theta_n$ and the true value $\theta$.

2. Define the function $\phi$ related to the scientific question of interest, and compute its Hadamard derivative $\partial_v\phi(\theta)$. Usually $\theta$ can be obtained from the mass $P_X$ or the distribution $F_X$ (i.e. the CDF). Recall that for real-valued functions, $\partial_v\phi(\theta)$ coincides with the classical derivative applied in the direction of $v$, as in equation (9).

3. Use the asymptotic distribution of $r_n(\hat\theta_n-\theta)$ obtained in step 1 and multiply it by the Hadamard derivative from step 2. Then, estimate the variance of the resulting distribution and compute the confidence intervals accordingly. In most cases (e.g. when step 1 relies on the CLT), the difference $\phi(\hat\theta_n)-\phi(\theta)$ is approximately normal. Wald-type confidence intervals, $\phi(\hat\theta_n)\pm Z_{1-\alpha/2}\sqrt{\widehat{\operatorname{Var}}[IF_{\phi,P_X}(Y)]/n}$, can then be constructed using the variance in (19). That is, we estimate the variance through the sample variance of the estimated IF to derive the SE of $\phi(\hat\theta_n)$.Agresti2012ApproximateProportions

## 3 Examples

In the following sections we provide several examples of applications of the classical and functional Delta-method based on the Hadamard derivative and the IF, together with R code in a set of 6 boxes. The code in the boxes can be accessed at https://github.com/migariane/DeltaMethodInfluenceFunction. All calculations and analytical derivations for the classical method were verified using the sympy packagemeurer2017sympy in Python 3.7 in a notebook.python The notebook can be accessed either in the same repository or in our Google Colab: https://github.com/migariane/DeltaMethodInfluenceFunction/tree/main/CalculationsDerivationsSympy.

### 3.1 Derivation of the Standard Error for the Sample Mean based on the Influence Function (Classical Delta-method)

In this section we derive the standard error for the sample mean. We illustrate how to apply the proposed steps in practice, i.e. by applying equations (3), (8) and (7). Note that the classical statistical inference for the sample mean is straightforward. The interest here is to show how to derive the IF for the sample mean and then compute the SE by applying the steps highlighted before. To derive the SE of the mean for a random sample $X_1,\dots,X_n$ we proceed as follows. First (Step 1), we find the distribution of the difference between the estimator $\hat\theta_n=\bar X$ and the parameter $\theta=\mu$. We know from the central limit theorem that

$$\sqrt{\frac{n}{\sigma^2}}\cdot(\hat\theta_n-\theta)\overset{\text{approx}}{\sim}\text{Normal}(0,1).$$

In this case, $\phi$ corresponds to the identity function: $\phi(\theta)=\theta$. Then, following Step 2, we calculate the Hadamard derivative, which in this case corresponds to the classical derivative in the direction of $v=\hat\theta-\theta$. Hence, following (9), we have:

$$\partial_{\hat\theta-\theta}\phi(\theta)=\frac{\partial\phi}{\partial\theta}\cdot(\hat\theta-\theta)=1\cdot(\hat\theta-\theta)$$

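We use Taylor’s expansion around $\theta$ to obtain:

$$\phi(\hat\theta)=\phi(\theta)+\partial_{\hat\theta-\theta}\phi(\theta)=\phi(\theta)+\frac{\partial\phi}{\partial\theta}\left(\frac1n\sum_{i=1}^nX_i-\theta\right)=\theta+\frac1n\sum_{i=1}^n\underbrace{1\cdot(X_i-\theta)}_{IF_{\phi,\theta}(X_i)}=\frac1n\sum_{i=1}^nX_i, \tag{20}$$

so the influence function of the mean is $IF_{\phi,\theta}(X)=X-\theta$. Due to the asymptotic normality we can use (18) to proceed with Step 3:

$$\phi(\hat\theta)-\phi(\theta)\overset{\text{approx}}{\sim}\text{Normal}\left(0,\frac{\operatorname{Var}[IF_{\phi,\theta}(X)]}{n}\right)$$

The variance of the influence function is

$$\operatorname{Var}[IF_{\phi,\theta}(X)]=\operatorname{Var}[X-\theta]=\operatorname{Var}[X]=\sigma^2 \tag{21}$$

and thus:

$$\phi(\hat\theta)-\phi(\theta)\overset{\text{approx}}{\sim}\text{Normal}\left(0,\frac{\sigma^2}{n}\right) \tag{22}$$

The variance of the influence function can be estimated using the standard estimator of the variance, i.e. $S^2=\frac1{n-1}\sum_{i=1}^n(X_i-\bar X)^2$:

$$\widehat{\operatorname{Var}}[IF_{\phi,\theta}(X)]=S^2 \tag{23}$$

Two-sided confidence intervals for $\theta$ can thus be estimated through

$$\hat\theta\pm Z_{1-\alpha/2}\sqrt{\frac{S^2}{n}}$$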

This shows how to obtain the results which are widely known from textbooks through the use of the IF.

In Box 1 we provide the code to compute the SE for a sample mean using the IF. We compare the results with the Delta-method implementation from the R package msm,kavroudakis2015 and in Figure 3 we plot the IF for the sample mean.

Box 1. Derivation of the IF for the sample mean

Figure 3: Influence function for the sample mean from the example of Box 1. The IF for the sample mean is unbounded with mean zero. It reflects the deviation of every single sample observation from the empirical mean, i.e., it shows that the sample mean is not robust against outliers. Furthermore, graphically displaying the IFs of different estimators gives a clear visual picture of how sensitive each estimator is to the data.
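Box 1 contains the authors' R code. The following is an independent Python sketch, not a translation of Box 1. It applies (19) to the estimated IF of the mean, $X_i-\bar X$ (the IF $X-\theta$ evaluated at the plug-in estimate), using hypothetical data. Because (19) divides by $n$ rather than $n-1$, the resulting SE equals $S/\sqrt n$ multiplied by $\sqrt{(n-1)/n}$. This difference vanishes in large samples.

```python
import math
import statistics

def if_standard_error(if_values):
    """SE of a plug-in estimator from its estimated IF values, using (19):
    Var_hat[IF] = (1/n) * sum IF(X_i)^2, and SE = sqrt(Var_hat[IF] / n)."""
    n = len(if_values)
    var_if = sum(v * v for v in if_values) / n
    return math.sqrt(var_if / n)

x = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9]   # hypothetical measurements
theta_hat = statistics.mean(x)
if_hat = [xi - theta_hat for xi in x]          # estimated IF of the mean at each X_i
se_if = if_standard_error(if_hat)
se_classical = statistics.stdev(x) / math.sqrt(len(x))   # S / sqrt(n)
z = statistics.NormalDist().inv_cdf(0.975)
print(f"mean = {theta_hat:.4f}, SE (19) = {se_if:.4f}, S/sqrt(n) = {se_classical:.4f}")
print(f"95% CI: ({theta_hat - z * se_if:.4f}, {theta_hat + z * se_if:.4f})")
```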

### 3.2 Derivation of the Standard Error for the Sample Mean seen as a Functional (Functional Delta-Method) based on the Influence Function

To develop the intuition of how to use the functional delta method we first derive the IF for the sample mean as in section 3.1 but writing the mean as a functional. Afterwards we’ll derive the IF for a more complicated situation: the quantile function.

Consider again the problem of estimating the mean. From the empirical probability mass function, $P_n$, we obtain the empirical mean, $\hat\mu=\phi(P_n)$, as a functional of $P_n$. Here we are considering $\theta=P$. To simplify the example, assume that the $X_i$ are sampled from a discrete probability mass function with only $N$ possible values, $x_1,\dots,x_N$. In this case, following step 1, the central limit theorem applies to each value $x$. The difference between the empirical probability mass function (which is an average) and its true value is asymptotically normal:

$$\sqrt n\,\big(P_n(x)-P(x)\big)\xrightarrow{d}\text{Normal}\big(0,P(x)(1-P(x))\big), \tag{24}$$

where we have defined the empirical probability mass function as in (15):

$$P_n(x)=\frac{\text{Number of times }x\text{ is in the sample}}{n}=\frac1n\sum_{i=1}^nI_{\{X_i\}}(x),$$

with the indicator functions defined in Section 2.4. We remark that the variance in (24) results from the variance of the indicators, which are Bernoulli distributed.

We then follow step 2 to write the functional in terms of the estimator. In this case, the population mean is written as:

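$$\phi(P):=\mu=\sum_{i=1}^Nx_iP(x_i),$$

while the sample mean is given by the following expression:

$$\phi(P_n)=\hat\mu=\sum_{i=1}^Nx_iP_n(x_i).$$

We remark that in this case we use the functional Delta-method, as $\mu$ is a functional of the function $P$. Hence, to obtain the approximation in this case (step 3), we calculate the influence function from the definition in (16). We use the fact that $\phi$ is linear in its argument, so that $\phi\big((1-h)P+h\cdot I_{\{Y\}}\big)=(1-h)\phi(P)+h\,\phi(I_{\{Y\}})$:

$$\begin{aligned}IF_{\phi,P_X}(Y)&=\lim_{h\downarrow0}\frac{\phi\big((1-h)P+h\cdot I_{\{Y\}}\big)-\phi(P)}{h}\\&=\sum_{i=1}^Nx_i\cdot I_{\{Y\}}(x_i)-\sum_{i=1}^Nx_i\cdot P(x_i)\\&=\phi(I_{\{Y\}})-\phi(P)\\&=Y-\phi(P)\end{aligned} \tag{25}$$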

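Finally, the variance of the influence function corresponds to the variance of $Y$:

$$\operatorname{Var}[IF_{\phi,P_X}(Y)]=\operatorname{Var}[Y]=\sigma^2 \tag{26}$$

hence:

$$\sqrt n\,\big(\phi(P_n)-\phi(P)\big)\xrightarrow{d}\text{Normal}(0,\sigma^2). \tag{27}$$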

which is equivalent to the expression found by the classical method in (22).
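Equations (25) to (27) can be checked numerically for a small discrete model. In this sketch the three-point PMF is hypothetical. The exact $\operatorname{Var}[IF_{\phi,P_X}(Y)]$ is computed from the PMF. The variance of $\sqrt n(\phi(P_n)-\phi(P))$ is approximated by Monte Carlo, using the fact (Section 2.4) that $\phi(P_n)$ is the sample mean.

```python
import math
import random
import statistics

P = {0: 0.2, 1: 0.5, 2: 0.3}                        # hypothetical PMF with N = 3 values
mu = sum(x * p for x, p in P.items())                # phi(P)
if_values = {y: y - mu for y in P}                   # IF(y) = y - phi(P), from (25)
var_if = sum(P[y] * if_values[y] ** 2 for y in P)    # Var[IF(Y)] = sigma^2, as in (26)

def scaled_error(n, rng):
    """One draw of sqrt(n) * (phi(P_n) - phi(P)), where phi(P_n) is the sample mean."""
    sample = rng.choices(list(P), weights=list(P.values()), k=n)
    return math.sqrt(n) * (sum(sample) / n - mu)

rng = random.Random(2024)
draws = [scaled_error(400, rng) for _ in range(3000)]
print(f"Var[IF] = {var_if:.4f}, Monte Carlo variance = {statistics.variance(draws):.4f}")
```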

### 3.3 Derivation of the Standard Error for the Ratio of Two Means

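Consider a random sample of size $n$ of i.i.d. pairs of random variables $(X_i,Y_i)$. Here $X$ and $Y$ are both normally distributed, with respective means $\mu_X$ and $\mu_Y$, which are estimated by their sample means $\bar X$ and $\bar Y$. We are interested in deriving the variance of the ratio of the two means (i.e. the ratio estimator), defined as $\phi(\mu_X,\mu_Y)=\mu_X/\mu_Y$. In this case (following step 1) it is known that the difference $(\bar X-\mu_X,\bar Y-\mu_Y)^T$ is asymptotically normal.

Second (step 2), we obtain the Hadamard derivative, which in this case corresponds to the gradient applied in the direction of

$$v=\hat\theta-\theta=\begin{pmatrix}\bar X\\\bar Y\end{pmatrix}-\begin{pmatrix}\mu_X\\\mu_Y\end{pmatrix}=\begin{pmatrix}\bar X-\mu_X\\\bar Y-\mu_Y\end{pmatrix},$$

$$\nabla\phi=\begin{pmatrix}\frac{\partial\phi}{\partial\mu_X}\\\frac{\partial\phi}{\partial\mu_Y}\end{pmatrix}=\begin{pmatrix}\frac1{\mu_Y}\\-\frac{\mu_X}{\mu_Y^2}\end{pmatrix}$$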

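where we assume $\mu_Y\neq0$. Applying the gradient to $v$ gives the Hadamard derivative, which is an average of per-observation influence functions:

$$\partial_v\phi(\mu_X,\mu_Y)=\left(\frac1{\mu_Y},-\frac{\mu_X}{\mu_Y^2}\right)\begin{pmatrix}\bar X-\mu_X\\\bar Y-\mu_Y\end{pmatrix}=\frac1{\mu_Y}(\bar X-\mu_X)-\frac{\mu_X}{\mu_Y^2}(\bar Y-\mu_Y)=\frac1n\sum_{i=1}^nIF_{\phi,P}(X_i,Y_i),$$

with $IF_{\phi,P}(X,Y)=\frac1{\mu_Y}(X-\mu_X)-\frac{\mu_X}{\mu_Y^2}(Y-\mu_Y)$. The variance of the ratio estimator is hence approximated by the variance of the Hadamard derivative:

$$\operatorname{Var}\big(\partial_v\phi(\mu_X,\mu_Y)\big)=\frac1n\operatorname{Var}\big(IF_{\phi,P}(X,Y)\big)=\frac1n\left(\frac{\operatorname{Var}(X)}{\mu_Y^2}+\frac{\mu_X^2}{\mu_Y^4}\operatorname{Var}(Y)-\frac{2\mu_X}{\mu_Y^3}\operatorname{Cov}(X,Y)\right) \tag{28}$$

Here we used that the $n$ pairs are independent, so that $\operatorname{Var}(\bar X)=\operatorname{Var}(X)/n$, $\operatorname{Var}(\bar Y)=\operatorname{Var}(Y)/n$ and $\operatorname{Cov}(\bar X,\bar Y)=\operatorname{Cov}(X,Y)/n$. If $X$ and $Y$ are themselves independent, the covariance term vanishes.

For step 3, the estimated standard error is then obtained as the square root of the estimated variance, and Wald-type confidence intervals (level $1-\alpha$) follow:

$$\frac{\bar X}{\bar Y}\pm Z_{1-\alpha/2}\sqrt{\frac{\widehat{\operatorname{Var}}\big(IF_{\phi,P}(X,Y)\big)}{n}},$$

where the estimator for the variance replaces the unknown moments in (28) by their sample counterparts:

$$\widehat{\operatorname{Var}}\big(IF_{\phi,P}(X,Y)\big)=\frac{S_X^2}{\bar Y^2}+\frac{\bar X^2}{\bar Y^4}S_Y^2-\frac{2\bar X}{\bar Y^3}S_{XY},$$

with $S_X^2$ and $S_Y^2$ the sample variances and $S_{XY}$ the sample covariance.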

Box 2. Derivation of the IF for the ratio of two sample means
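Two variance estimators for the ratio of means can be computed side by side. The sketch below assumes hypothetical paired data $(x_i,y_i)$. It evaluates the estimated per-observation IF, $(x_i-\bar x)/\bar y-\bar x(y_i-\bar y)/\bar y^2$, and applies (19). It also evaluates the plug-in version of (28) with the usual $(n-1)$-denominator sample moments. The two SEs differ only by the factor $\sqrt{(n-1)/n}$. This is an illustration, not the code of Box 2.

```python
import math
import statistics

def ratio_of_means_se(x, y):
    """Estimate, IF-based SE (19) and plug-in SE from (28) for mean(x) / mean(y).

    Assumes n independent pairs (x_i, y_i) and mean(y) != 0.
    """
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    # Estimated per-observation IF: (x_i - xbar)/ybar - xbar * (y_i - ybar)/ybar^2
    if_hat = [(xi - xbar) / ybar - xbar * (yi - ybar) / ybar ** 2 for xi, yi in zip(x, y)]
    se_if = math.sqrt(sum(v * v for v in if_hat) / n / n)
    # Plug-in version of (28) using (n - 1)-denominator sample moments
    s_x2, s_y2 = statistics.variance(x), statistics.variance(y)
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    var_if = s_x2 / ybar ** 2 + xbar ** 2 * s_y2 / ybar ** 4 - 2 * xbar * s_xy / ybar ** 3
    return xbar / ybar, se_if, math.sqrt(var_if / n)

x = [2.1, 2.5, 1.9, 2.8, 2.2, 2.6]   # hypothetical paired data
y = [4.0, 4.4, 3.9, 5.1, 4.2, 4.8]
ratio, se_if, se_plug_in = ratio_of_means_se(x, y)
z = statistics.NormalDist().inv_cdf(0.975)
print(f"ratio = {ratio:.4f}, SE (19) = {se_if:.4f}, SE plug-in (28) = {se_plug_in:.4f}")
print(f"95% CI: ({ratio - z * se_plug_in:.4f}, {ratio + z * se_plug_in:.4f})")
```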

### 3.4 Derivation of the Standard Error for the Ratio of Two Probabilities (Risk Ratio)

In medical statistics, we are often interested in marginal and conditional (sometimes causal) risk ratios. Consider Table 1, where we are interested in the mortality risk by cancer status. Let denote the probability of being alive given that the patient has cancer and