1 Introduction
A fundamental problem in inferential statistics is to approximate the distribution of an estimator constructed from the sample (i.e. a statistic). The standard error (SE) of an estimator characterises its variability [Boos2013]. Oftentimes, it is not the estimator itself which is of interest but a function of it. In this case, the Delta-method can approximate the standard error (with known asymptotic properties) using Taylor expansions, because a smooth function of an asymptotically normal estimator is also asymptotically normal [Vaart1998]. In a more general setting, this technique is also useful for approximating the variance of some functionals. For instance, in epidemiology the Delta-method is used to compute the SE of functions such as the risk difference (RD) and the risk ratio (RR) [Agresti2010], which are both functions of the risk (a parameter representing the probability of the outcome) [Armitage2005, Boos2013].
As an alternative to the Delta-method for approximating the distribution of the SE in large samples [Boos2013, MillarMaximumADMB], we can use computational methods such as the bootstrap [Efron1993, efron1982]. In the course of their research, applied statisticians may need to assess whether a large-sample approximation of the distribution of a statistic is appropriate, how to derive the approximation, and how to use it for inference in applications. Because the exact distribution of a statistic, and hence its variance and SE, can be derived analytically only for a narrow range of inference problems, the distribution usually has to be approximated. In this tutorial we introduce the use of the classical and functional Delta-method, the Influence Function (IF), and their relationship from a practical perspective. Hampel introduced the concept of the IF in 1974 [hampel1974]. He highlighted that most estimators can actually be viewed as functionals of the distribution function. The IF was further developed in the context of robust statistics but is now used in many fields, including causal inference [hampel1974]. The IF is often used to approximate the SE of a plug-in, asymptotically linear estimator [Tsiatis:2007aa]. Mathematically, the IF arises as the second (first-order) term of the Taylor expansion used to approximate the distribution of the plug-in estimator [Boos2013]. It can be easily derived for most common estimators, and it appears in the formulas for the asymptotic variances of asymptotically normal estimators. For maximum likelihood estimators, the IF is equivalent to the normalized score function [hampel1974].
Furthermore, the tutorial includes boxes with R code (R Foundation for Statistical Computing, Vienna, Austria) [R2020] to support the implementation of the methods and to allow readers to learn by doing. The code can be accessed at https://github.com/migariane/DeltaMethodInfluenceFunction.
In section 1, we introduce the importance of the Delta-method in statistics and justify the need for a tutorial for applied statisticians. In section 2, we review the theory of the classical and functional Delta-methods and the influence function (IF). In section 3, we provide multiple worked examples and code for applications of the classical and functional Delta-method and the IF. The first examples involve deriving the SE for the sample mean of a variable, the ratio of the means of two independent variables, and the ratio of two sample proportions (i.e. the risk ratio). We also provide an example where the required conditions for the Delta-method do not hold. We then show how to use the functional Delta-method based on the IF to derive the SE for the quantile function and the correlation coefficient. Our final example is motivated by an application in cancer epidemiology and involves a parameter of interest that is a combination of coefficients in a logistic regression model. Finally, in section 4, we provide a concise conclusion where we mention additional methods interconnected with the Delta-method and the IF, such as M-estimation and the Huber sandwich estimator.
2 Theory: The Classical Delta-method
Let $\theta$ be a parameter. In this tutorial, we are interested in working with an estimand that can be written as a function of $\theta$ (i.e., $\phi(\theta)$) rather than with $\theta$ itself. For instance, we may not be interested in the probability of having a particular disease, but in the ratio of two probabilities, $\phi(\theta) = \theta_1/\theta_2$, where the first probability ($\theta_1$) is that of developing the disease under treatment and the second ($\theta_2$) is that of developing the disease without treatment. The estimand $\phi(\theta)$ represents the relative risk. Define the estimator of $\phi(\theta)$ to be $\phi(\hat\theta) = \hat\theta_1/\hat\theta_2$, the ratio of the estimators of the respective probabilities. The question is: if we know the variances of $\hat\theta_1$ and $\hat\theta_2$, how do we obtain the variance of $\phi(\hat\theta)$? The Delta-method is one approach to answer this.
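Let $\hat\theta$ be an estimator of $\theta$ from a random sample $X_1, X_2, \dots, X_n$, where the $X_i$s are independent and identically distributed (i.i.d.) with a distribution $F_\theta$ indexed by a parameter $\theta$ (i.e. $X_i \sim F_\theta$). Examples of parameters include the rate of an exponential variable ($X_i \sim \mathrm{Exponential}(\theta)$), the mean and variance of a normal distribution ($\theta = (\mu, \sigma^2)$), or the probabilities of the categories under a multinomial model with $k$ different categories: $\theta = (p_1, \dots, p_k)$ with $\sum_{j=1}^{k} p_j = 1$.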
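Any (measurable) function of the random sample is called a statistic [Casella1998TheoryEstimation]. In particular, any estimator of $\theta$ is a function of the random sample, which makes it a statistic. For example, if $\theta = E[X_i]$, the sample mean $\hat\theta = \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a function of the $X_i$s. To emphasize the dependency of the estimator, $\hat\theta$, on the sample size, $n$, we write $\hat\theta_n$. Thus $\hat\theta_m$ denotes the estimator under a random sample of size $m$, and $\hat\theta_\infty$ denotes the estimator under a random sample “of infinite size”. Any (measurable) function of the estimator, $\phi(\hat\theta_n)$, also depends upon the random sample and hence it is a statistic too. Due to this dependency upon the random sample, any statistic is itself a random variable. We can thus characterise the estimator in terms of its distribution. As an example, if the $X_i \sim N(\mu, \sigma^2)$ are i.i.d., then $\bar X_n$ also has a normal distribution with parameters $N(\mu, \sigma^2/n)$. Furthermore, the standardized statistic $\sqrt{n}(\bar X_n - \mu)/\sigma$ has a $N(0, 1)$ distribution. More often than not, the distribution of a statistic cannot be derived exactly, and we rely on the asymptotic (large-sample) properties of $\hat\theta_n$ as $n$ approaches $\infty$. The most powerful and well-known such result is the central limit theorem, which states that, under reasonable regularity conditions (i.i.d. variables with mean $\mu$ and finite variance $\sigma^2$) [Billingsley1961StatisticalChains], if $\hat\theta_n = \bar X_n$ then, for large $n$,

$$\sqrt{n}\,(\bar X_n - \mu) \;\dot\sim\; N(0, \sigma^2), \qquad (1)$$

where $\dot\sim$ reads “is approximately distributed as”. This is the property that allows us to construct the Wald-type asymptotic confidence intervals

$$\bar X_n \pm z_{1-\alpha/2}\,\frac{\hat\sigma}{\sqrt{n}},$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution [Agresti2012ApproximateProportions]. However, when the function $\phi$ (a function of one or more estimators that are asymptotically normal with known variance) is not linear (e.g. the ratio of two proportions) and there is no closed form to derive the SE, we use the Delta-method. The classical Delta-method states that, under certain regularity conditions for the function $\phi$, the statistic $\hat\theta_n$, and the i.i.d. random variables $X_i$, the distribution of $\phi(\hat\theta_n)$ can be approximated by a normal distribution whose variance is proportional to the square of $\phi$'s rate of change at $\theta$, the derivative $\phi'(\theta)$. In the one-dimensional case of $\theta \in \mathbb{R}$ and $\phi: \mathbb{R} \to \mathbb{R}$, if $\hat\theta_n$ is asymptotically normal, $\sqrt{n}(\hat\theta_n - \theta) \;\dot\sim\; N(0, \sigma^2)$, this theorem states that, for large $n$ (Appendix: Delta-method proof):

$$\sqrt{n}\,\big(\phi(\hat\theta_n) - \phi(\theta)\big) \;\dot\sim\; N\big(0, [\phi'(\theta)]^2 \sigma^2\big)$$

This provides the researcher with confidence intervals based on asymptotic normality:

$$\phi(\hat\theta_n) \pm z_{1-\alpha/2}\,\frac{|\phi'(\hat\theta_n)|\,\hat\sigma}{\sqrt{n}}$$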
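As an illustration of the univariate result, the following minimal Python sketch (standard library only, and not the R code from the tutorial's boxes) computes $\phi(\hat\theta_n)$, its Delta-method SE $|\phi'(\hat\theta_n)|\,\mathrm{SE}(\hat\theta_n)$, and the corresponding Wald-type 95% interval. The function name `delta_method_ci` and the inputs ($\hat\theta_n = 2$, $\mathrm{SE}(\hat\theta_n) = 0.1$, $\phi = \log$) are hypothetical choices for illustration.

```python
import math

def delta_method_ci(theta_hat, se_theta, phi, dphi, z=1.959963984540054):
    """Classical Delta-method: estimate, SE and Wald interval for phi(theta).

    se_theta is the standard error of theta_hat (e.g. sigma_hat / sqrt(n));
    dphi is the derivative phi'; z is the 0.975 standard normal quantile.
    """
    est = phi(theta_hat)
    se = abs(dphi(theta_hat)) * se_theta  # SE(phi(theta_hat)) ~ |phi'(theta_hat)| * SE(theta_hat)
    return est, se, (est - z * se, est + z * se)

# Hypothetical inputs: theta_hat = 2.0 with SE 0.1, and phi = log (so phi'(t) = 1 / t)
est, se, ci = delta_method_ci(2.0, 0.1, math.log, lambda t: 1.0 / t)
print(round(est, 4), round(se, 4), tuple(round(c, 4) for c in ci))
```

With $\phi = \log$ we have $\phi'(\theta) = 1/\theta$, so the SE here is $0.1/2 = 0.05$.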
To better understand the Delta-method we need to discuss four concepts. First, we discuss how derivatives approximate functions such as $\phi$ via a Taylor expansion. Second, we describe convergence in distribution, which allows us to characterise the asymptotic properties of the estimator. Third, we present the central limit theorem, which is at the core of the Delta-method. Finally, we generalize these results to the functional Delta-method using influence functions.
2.1 Taylor’s Approximation
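For Taylor's approximation to work we need a function $\phi$ that is differentiable at $\theta$. Following the classical definition of differentiability [Courant1988DifferentialCalculus], a real-valued function $\phi$ with domain $D$, a subset of $\mathbb{R}$ ($\phi: D \subseteq \mathbb{R} \to \mathbb{R}$), is differentiable at $\theta \in D$ and has derivative $\phi'(\theta)$ if the following limit exists:

$$\phi'(\theta) = \lim_{h \to 0} \frac{\phi(\theta + h) - \phi(\theta)}{h}$$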
Intuitively, this definition states that one can estimate a unique tangent line to $\phi$ with slope $\phi'(\theta)$ at $\theta$ by calculating the values of the function at $\theta$ and $\theta + h$ and reducing the size of $h$ (see Figure 0(a)).
This definition can be extended to the multivariate case via directional derivatives (Gâteaux derivatives) [Gateaux1919FonctionsIndependantes]. In multiple dimensions, there is no single tangent line (see Figure 0(b)); hence, in addition to the function $\phi$, one must also specify the direction of the vector $h$ in which the tangent line will be calculated. This results in $\phi'_\theta(h)$, the derivative of $\phi$ at $\theta$ in the direction $h$:¹

$$\phi'_\theta(h) = \lim_{t \downarrow 0} \frac{\phi(\theta + t h) - \phi(\theta)}{t} \qquad (2)$$

¹ You might notice a slight change in notation where the limit is stated as $t \downarrow 0$ instead of the classical $h \to 0$. The notation $t \downarrow 0$ implies that the limit is taken with $t$ decreasing towards zero, in order to distinguish the direction $h$ from the step size $t$.
As an example, Figure 0(b) is a graph of the function $\phi$ with two different vectors, $h_1$ and $h_2$. Each vector results in a different directional derivative, $\phi'_\theta(h_1)$ and $\phi'_\theta(h_2)$, corresponding to the slopes of the tangent lines in the directions of $h_1$ and $h_2$, respectively.
It turns out that, for the Delta-method to be generalized to functionals (i.e. functions of functions), having a Gâteaux derivative is not enough. We require not only that the directional derivative exists, but also that it coincides with the one obtained for any sequence of directions $h_t$ that converges to $h$ (i.e. $h_t \to h$ as $t \downarrow 0$). This is called (equivalently) the compact derivative or the Hadamard [Beutner2016FunctionalFunctionals] (one-sided directional) [Zajicek2014HadamardDifferentiability] derivative of $\phi$ at $\theta$ in the direction $h$ (as long as it is a linear function of $h$) and is usually denoted as:

$$\phi'_\theta(h) = \lim_{t \downarrow 0} \frac{\phi(\theta + t h_t) - \phi(\theta)}{t}, \qquad h_t \to h \qquad (3)$$

This concept is illustrated in Figure 0(c), where the specific sequence $h_t$ converges to $h$.
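An equivalent definition of the Hadamard (one-sided directional) derivative, which is useful for calculations, involves setting $h_t = Q_t - \theta$ for some function $Q_t$ and $h = Q - \theta$, with $Q_t \to Q$, which allows us to rewrite (3) as:

$$\phi'_\theta(Q - \theta) = \lim_{t \downarrow 0} \frac{\phi\big((1 - t)\theta + t\,Q_t\big) - \phi(\theta)}{t} \qquad (4)$$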
In the particular case of a constant sequence, $h_t = h$ for all $t$, the expression reduces to a Gâteaux derivative, which can oftentimes be computed as a classical derivative. We discuss a particular case of this derivative, the influence function (IF, also known as the influence curve), in Section 2.4. It is interpreted as the rate of change of our functional in the direction of a new observation, $x$.
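Recall that the derivative, $\phi'_\theta(h)$, represents the slope of the line tangent to the function. Intuitively, if $\hat\theta$ is close to $\theta$, the tangent line at $\theta$ should provide an adequate approximation of $\phi(\hat\theta)$ (Figure 0(d)). This is stated in the first-order Taylor approximation of $\phi$ around $\theta$ as follows:

$$\phi(\hat\theta) \approx \phi(\theta) + \phi'_\theta(h) \qquad (5)$$

with $h = \hat\theta - \theta$, where the sign $\approx$ is interpreted as approximately equal. This can be rewritten in the more classical form:

$$\phi(\hat\theta) - \phi(\theta) \approx \phi'_\theta(\hat\theta - \theta) \qquad (6)$$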
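Readers might be familiar with the theorem in the classical notation of univariate calculus, which states the approximation:

$$\phi(\hat\theta) - \phi(\theta) \approx \phi'(\theta)\,(\hat\theta - \theta) \qquad (7)$$

In this case the Hadamard derivative coincides with the classical one multiplied by $h$: $\phi'_\theta(h) = \phi'(\theta)\,h$.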
The justification for this connection is given by Fréchet's derivative, which represents the slope of the tangent plane. Intuitively, if the Hadamard (one-sided directional) derivatives exist for all directions $h$, we can talk about the tangent plane to $\phi$ at $\theta$. The tangent plane is “made up” of all the individual (infinitely many) tangent lines. The slope of the tangent plane is the Fréchet derivative $D\phi(\theta)$ [Zajicek2014HadamardDifferentiability, ciarlet2013linear]. For univariate functions $\phi: \mathbb{R} \to \mathbb{R}$, the Fréchet derivative is $\phi'(\theta)$; for functions of a multivariate $\theta = (\theta_1, \dots, \theta_k)$ returning one value, $\phi: \mathbb{R}^k \to \mathbb{R}$, this derivative is called the gradient and collects the partial derivatives of the function with respect to each entry:

$$\nabla\phi(\theta) = \left(\frac{\partial\phi}{\partial\theta_1}(\theta), \dots, \frac{\partial\phi}{\partial\theta_k}(\theta)\right)^{\top}$$
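For multivariate functions, $\phi = (\phi_1, \dots, \phi_m): \mathbb{R}^k \to \mathbb{R}^m$, the Fréchet derivative is an $m \times k$ matrix called the Jacobian (matrix):

$$J_\phi(\theta) = \begin{pmatrix} \frac{\partial\phi_1}{\partial\theta_1} & \cdots & \frac{\partial\phi_1}{\partial\theta_k} \\ \vdots & \ddots & \vdots \\ \frac{\partial\phi_m}{\partial\theta_1} & \cdots & \frac{\partial\phi_m}{\partial\theta_k} \end{pmatrix} \qquad (8)$$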
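To obtain the Hadamard (one-sided directional) derivative from the Fréchet derivatives, either $\nabla\phi(\theta)$ or $J_\phi(\theta)$, one needs to apply the derivative operator to the direction vector $h$. This operation can be seen as “projecting” the tangent plane onto the direction of $h$, hence resulting in the directional derivative:

$$\phi'_\theta(h) = \nabla\phi(\theta)^{\top} h \quad \text{or} \quad \phi'_\theta(h) = J_\phi(\theta)\, h \qquad (9)$$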
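To see that the directional derivative in (2) and the projected gradient in (9) agree, the sketch below (Python, using the hypothetical function $\phi(\theta_1, \theta_2) = \theta_1/\theta_2$ at $\theta = (0.3, 0.6)$) approximates the one-sided limit with a small step $t$ and compares it with $\nabla\phi(\theta)^{\top} h$ for a few directions. The helper names are illustrative, and the step `t=1e-6` is an assumption that trades truncation error against rounding error.

```python
def phi(theta):
    """Hypothetical function of interest: phi(theta1, theta2) = theta1 / theta2."""
    t1, t2 = theta
    return t1 / t2

def gradient(theta):
    """Analytic gradient of phi: (1 / theta2, -theta1 / theta2^2)."""
    t1, t2 = theta
    return (1.0 / t2, -t1 / t2 ** 2)

def directional_derivative_fd(f, theta, h, t=1e-6):
    """One-sided finite-difference version of (2): [f(theta + t h) - f(theta)] / t."""
    shifted = tuple(th + t * hi for th, hi in zip(theta, h))
    return (f(shifted) - f(theta)) / t

def directional_derivative_grad(grad, theta, h):
    """Equation (9): project the gradient onto the direction h."""
    return sum(g * hi for g, hi in zip(grad(theta), h))

theta = (0.3, 0.6)
for h in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]:
    print(h, directional_derivative_fd(phi, theta, h), directional_derivative_grad(gradient, theta, h))
```

Each direction gives a different slope, as in Figure 0(b), but for every direction the two columns should agree to several decimal places.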
Thus the notation $\phi'_\theta(h)$ in (6), which we use for the remainder of the paper, covers not only the functional scenario but also the classical cases of functions on $\mathbb{R}$ and $\mathbb{R}^k$, respectively, whose directional derivatives are obtained as the usual (classical) Fréchet derivatives projected onto $h$.
Finally, as a side note, we remark that it is possible to improve the approximation via a higher-order Taylor expansion around $\theta$ (see Figure 0(d)) [Courant1988DifferentialCalculus, ren2001second]:

$$\phi(\hat\theta) \approx \phi(\theta) + \sum_{k=1}^{r} \frac{\phi^{(k)}(\theta)}{k!}\,(\hat\theta - \theta)^k$$

where $\phi^{(k)}$ denotes the $k$th derivative of $\phi$, defined as the derivative of the $(k-1)$th derivative. Readers interested in pursuing higher-order Hadamard derivatives can consult Ren and Sen (2001) and Tung and Bao (2022) [REN2001187, tung2022higher].
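A small numerical check of the higher-order expansion, assuming $\phi = \log$ expanded around $\theta = 1$ and evaluated at $\hat\theta = 1.2$ (hypothetical values): adding the second-order term reduces the approximation error. The sketch uses the closed form of the $k$th derivative of the logarithm, $\phi^{(k)}(\theta) = (-1)^{k-1}(k-1)!/\theta^k$.

```python
import math

def taylor_log(theta, x, order):
    """Taylor polynomial of log around theta, up to `order`, evaluated at x.

    Uses the k-th derivative of log: (-1)**(k - 1) * (k - 1)! / theta**k.
    """
    approx = math.log(theta)
    for k in range(1, order + 1):
        kth_derivative = (-1) ** (k - 1) * math.factorial(k - 1) / theta ** k
        approx += kth_derivative / math.factorial(k) * (x - theta) ** k
    return approx

theta, x = 1.0, 1.2            # hypothetical expansion point and evaluation point
exact = math.log(x)
err1 = abs(taylor_log(theta, x, 1) - exact)   # first-order (tangent line) error
err2 = abs(taylor_log(theta, x, 2) - exact)   # second-order error
print(round(err1, 5), round(err2, 5))
```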
2.2 Convergence in distribution
For any random variable, $X$, the cumulative distribution function (CDF), also commonly referred to as the distribution function, quantifies the probability that $X$ is less than or equal to a real number $x$. Thus $X$'s CDF, $F_X$, is given by:

$$F_X(x) = \Pr(X \le x)$$

where the sign $\le$ is interpreted pointwise if $X$ is a random vector of size $k$ (i.e. $X \le x$ implies $X_1 \le x_1$, $X_2 \le x_2$, etc. for the vector $x = (x_1, \dots, x_k)$). The distribution function completely determines all the probabilities associated with a random variable as, for example, $\Pr(a < X \le b)$ can be computed as $F_X(b) - F_X(a)$ for any $a < b$.
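Given a statistic $\hat\theta_n$ that depends upon the sample size, $n$, the statistic's distribution function also depends on $n$. Let $F_n$ denote the distribution of $\hat\theta_n$ and $F$ be the distribution of a random variable, $T$. We say that $\hat\theta_n$ converges in distribution to the random variable $T$ if the CDF of $\hat\theta_n$ and the distribution of $T$ coincide in the limit:

$$\lim_{n \to \infty} F_n(x) = F(x)$$

for every $x$ at which $F$ is continuous.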
We remark that convergence in distribution does not imply that the random variables $\hat\theta_n$ and $T$ are the same; it solely entails that the probabilistic models of $\hat\theta_n$ (in the limit) and $T$ are identical (e.g. both are $N(0, 1)$). They are different random variables with a common distribution. Convergence in distribution is usually interpreted as an approximation stating that, for large $n$, the distribution of $\hat\theta_n$ is approximately $F$ (written $\hat\theta_n \;\dot\sim\; F$).
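One of the most important results concerning convergence in distribution is the Central Limit Theorem (CLT). The CLT applies to any random sample $X_1, \dots, X_n$ with $E[X_i] = \mu$ and finite variance $\mathrm{Var}(X_i) = \sigma^2 < \infty$. It states that the error of the sample mean, $\bar X_n - \mu$, times the square root of the sample size is asymptotically normally distributed:

$$\sqrt{n}\,(\bar X_n - \mu) \xrightarrow{d} T \qquad (10)$$

where $T \sim N(0, \sigma^2)$ and $\xrightarrow{d}$ stands for convergence in distribution as $n \to \infty$. Figure 2 illustrates the distribution of $\sqrt{n}(\bar X_n - \mu)$ for different sample sizes, $n$, for a given distribution of the $X_i$s.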
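The CLT statement in (10) can be checked by simulation. The sketch below is a hypothetical setup, not the one used for Figure 2: it draws Exponential(1) samples (so $\mu = \sigma^2 = 1$), computes $\sqrt{n}(\bar X_n - \mu)$ over repeated samples, and reports its variance together with the share of draws inside $\pm 1.96$, which should approach the normal value of 0.95 as $n$ grows. The seed, sample sizes and number of replications are arbitrary choices.

```python
import random
import statistics

def scaled_errors(n, reps, rate=1.0, seed=1):
    """Simulate sqrt(n) * (sample mean - mu) for Exponential(rate) samples, mu = 1 / rate."""
    rng = random.Random(seed)
    mu = 1.0 / rate
    errors = []
    for _ in range(reps):
        xbar = statistics.fmean([rng.expovariate(rate) for _ in range(n)])
        errors.append(n ** 0.5 * (xbar - mu))
    return errors

for n in (5, 50, 500):
    z = scaled_errors(n, reps=2000)
    # For Exponential(1), sigma^2 = 1: the variance stays near 1 for every n, while the
    # share of draws inside +/-1.96 should approach the normal value 0.95 as n grows.
    inside = sum(abs(v) < 1.96 for v in z) / len(z)
    print(n, round(statistics.variance(z), 3), round(inside, 3))
```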
2.3 Two sides of the same coin: the classical and functional Delta-method
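The Delta-method uses both the Taylor approximation and the concept of convergence in distribution. It states that if, for some sequence of numbers $r_n$ that depend on the sample size with $r_n \to \infty$, $r_n(\hat\theta_n - \theta)$ converges in distribution to $T$, then the weighted difference, $r_n\big(\phi(\hat\theta_n) - \phi(\theta)\big)$, converges in distribution to the derivative of $\phi$ in the direction of $T$:

$$r_n\big(\phi(\hat\theta_n) - \phi(\theta)\big) \xrightarrow{d} \phi'_\theta(T)$$

as long as $\phi$ is a function that can be approximated via its Taylor series around $\theta$. Examples of the numbers $r_n$ include $r_n = \sqrt{n}$, as in (10). The idea behind the Delta-method relates to the fact that we can transform (7) into:

$$r_n\big(\phi(\hat\theta_n) - \phi(\theta)\big) \approx \phi'(\theta)\, r_n(\hat\theta_n - \theta) \qquad (11)$$

where the random quantity $r_n(\hat\theta_n - \theta)$ converges in distribution to $T$, and thus $r_n\big(\phi(\hat\theta_n) - \phi(\theta)\big)$ converges (approximately) to $\phi'(\theta)\,T = \phi'_\theta(T)$ (the derivative in the direction of $T$).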
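In practical terms, this implies that the variance of $\phi(\hat\theta_n)$ can be approximated by a scaling of the variance of $\hat\theta_n$, i.e. (in the univariate case):

$$\mathrm{Var}\big(\phi(\hat\theta_n)\big) \approx [\phi'(\theta)]^2\, \mathrm{Var}(\hat\theta_n) \qquad (12)$$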
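Equation (12) can also be checked by Monte Carlo. Assuming, for illustration only, normal data with $\mu = 2$, $\sigma = 1$, $n = 100$ and $\phi(\theta) = \theta^2$, the Python sketch below compares the simulated variance of $\phi(\bar X_n)$ with the Delta-method value $[\phi'(\mu)]^2\sigma^2/n = 0.16$. The seed and the number of replications are arbitrary.

```python
import random
import statistics

def phi(t):
    # Hypothetical smooth function of interest
    return t ** 2

def dphi(t):
    # Its derivative
    return 2 * t

def mc_variance_of_phi(mu, sigma, n, reps, seed=7):
    """Monte Carlo variance of phi(sample mean) over `reps` normal samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        values.append(phi(xbar))
    return statistics.variance(values)

mu, sigma, n = 2.0, 1.0, 100
mc = mc_variance_of_phi(mu, sigma, n, reps=4000)
delta = dphi(mu) ** 2 * sigma ** 2 / n   # equation (12): [phi'(mu)]^2 * Var(sample mean)
print(round(mc, 4), round(delta, 4))
```

For this particular $\phi$ the exact variance is $4\mu^2\sigma^2/n + 2\sigma^4/n^2 = 0.1602$, so the first-order approximation is already very close.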
The same idea can be extended to the case where the parameter of interest, $\theta$, is not a real number (or vector of numbers) but a function. In this case, $\phi$ is a functional (i.e. a function of functions) and the corresponding method is oftentimes called the functional Delta-method. The result is that if $r_n(\hat\theta_n - \theta) \xrightarrow{d} T$, with $T$ now denoting a random function, then:

$$r_n\big(\phi(\hat\theta_n) - \phi(\theta)\big) \xrightarrow{d} \phi'_\theta(T) \qquad (13)$$

where $\phi'_\theta$ denotes the Hadamard derivative of $\phi$ as in (6). We remark that the theorem in (13) is general in the sense that it works for classical derivatives ($\phi: \mathbb{R} \to \mathbb{R}$), gradients and Jacobians ($\phi: \mathbb{R}^k \to \mathbb{R}^m$), and Hadamard derivatives ($\phi$ a functional), all following the notation from (3).
The reader is invited to consult the supplementary material for the classical proof of the Delta-method as well as the more general proof of the functional one.
2.4 The influence function
It is common to represent scientific questions by estimands (i.e., quantities we are interested in estimating from our data). For example, suppose we are interested in a random variable $X$ which follows a (possibly unknown) discrete distribution $P$. The variable $X$ might be a binary indicator for disease status, for example, in a particular population. If we are interested in the probability of having the given disease, our estimand is $\theta = \Pr(X = 1)$. In this case, as $X \in \{0, 1\}$, the estimand is equivalent to the expectation of $X$, i.e. $\theta = E[X]$. The estimand can thus be seen as the parameter of the Bernoulli distribution. However, a second interpretation is of importance: the estimand can also be seen as a functional, as it takes a function (specifically, the probability mass function) as an input and applies a function to it: the expectation. For $X$ taking discrete values, we have

$$\theta = \phi(P) = \sum_{x \in \mathcal{X}} x\, P(x) \qquad (14)$$

where $P(x) = \Pr(X = x)$ and $\mathcal{X}$ denotes the support (i.e. possible values) of $X$. In the binary case, $\mathcal{X} = \{0, 1\}$. If $X$ is continuous, an estimand defined as the expectation of $X$ is a functional of the probability density function $f$, such that $\theta = \phi(f) = \int x f(x)\,dx$. It is important to highlight that the estimand $\theta$, which represents our scientific question, relates to a functional of the mass $P$. Following the previous notation, we have that $\theta = \phi(P)$. If we have a random sample, $X_1, \dots, X_n$, we can compute the empirical probability mass function (ePMF):

$$\hat P_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}_{\{x\}}(X_i) \qquad (15)$$

where the indicator function of a set $A$ is defined as

$$\mathbb{1}_A(y) = \begin{cases} 1 & \text{if } y \in A \\ 0 & \text{otherwise} \end{cases}$$
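The ePMF can be used to estimate $P$, which gives us $\hat\theta_n = \phi(\hat P_n)$. This is called a “plug-in” estimator, as we plug the estimator of $P$ (i.e. $\hat P_n$) into the function $\phi$. In the above example, this implies calculating:

$$\phi(\hat P_n) = \sum_{x \in \mathcal{X}} x\, \hat P_n(x) = \sum_{x \in \mathcal{X}} x\, \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}_{\{x\}}(X_i)$$

which, for an observed dataset, is equivalent to taking its mean [Vaart1998]:

$$\phi(\hat P_n) = \frac{1}{n}\sum_{i=1}^{n} \sum_{x \in \mathcal{X}} x\, \mathbb{1}_{\{x\}}(X_i) = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar X_n$$

where the last equality follows from the fact that $\mathbb{1}_{\{x\}}(X_i) = 1$ only when $X_i = x$, and in that case the product $x\, \mathbb{1}_{\{x\}}(X_i)$ is $X_i$ (we exchange $x$ with $X_i$ by using that $x = X_i$ in this scenario). The cases where $X_i \neq x$ do not appear in the sum, as $\mathbb{1}_{\{x\}}(X_i) = 0$ results in adding $0$ to the sum.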
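The plug-in construction can be sketched in a few lines of Python. Assuming a hypothetical binary disease-status sample, the code builds the ePMF of (15) with exact fractions and applies the mean functional of (14); the result coincides with the sample mean, as derived above. Values that do not occur in the sample have ePMF equal to zero, so they are simply omitted from the dictionary.

```python
from collections import Counter
from fractions import Fraction

def empirical_pmf(sample):
    """Equation (15): hat P_n(x) = (1/n) * sum_i 1{X_i = x}, for each observed value x."""
    n = len(sample)
    return {x: Fraction(count, n) for x, count in Counter(sample).items()}

def mean_functional(pmf):
    """Equation (14): phi(P) = sum over the support of x * P(x)."""
    return sum(x * p for x, p in pmf.items())

# Hypothetical binary disease-status sample (1 = diseased)
sample = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
p_hat = empirical_pmf(sample)
plug_in = mean_functional(p_hat)
print(p_hat, plug_in)  # the plug-in estimate equals the sample mean, 2/5
```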
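The functional notation $\theta = \phi(P)$ allows us to study the robustness of our estimates using Hadamard derivatives. In particular, if the data are distributed according to the mass $P$, we can study the rate of change from distribution $P$ in the direction of another distribution, $Q$, by analyzing the derivative:

$$\phi'_P(Q - P) = \lim_{t \downarrow 0} \frac{\phi\big((1 - t)P + tQ\big) - \phi(P)}{t}$$

where we have substituted $Q_t = Q$ for all $t$ and $\theta = P$ in (4) (equivalently, the constant direction $h_t = h = Q - P$ in (3)).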
Intuitively, this quantifies the rate of change in $\phi$ if the model deviates a little from $P$ towards $Q$ (for example, in the case of noisy data). Choosing $Q = \mathbb{1}_{\{x\}}$, the indicator of the set that only contains the value $x$ (defined above), we can study the rate of change of $\phi$ in the direction of an observation, $x$. In particular, $\mathbb{1}_{\{x\}}$ stands for the model that assigns probability $1$ to $X$ taking the value $x$. Hence the derivative analyzes how an observation, $x$, influences our estimate of $\theta$.
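The Hadamard derivative, in this special case, is called the influence function (IF) of the functional $\phi$ under model $P$ at $x$ and is denoted:

$$\mathrm{IF}_{\phi,P}(x) = \phi'_P\big(\mathbb{1}_{\{x\}} - P\big) = \lim_{t \downarrow 0} \frac{\phi\big((1 - t)P + t\,\mathbb{1}_{\{x\}}\big) - \phi(P)}{t} \qquad (16)$$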
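Definition (16) can be approximated numerically by taking a small $t$. The Python sketch below assumes a hypothetical discrete model $P$ on $\{0, 1, 2\}$ and the mean functional; the numerical IF matches $x - \mu$, the IF of the mean derived in Section 3.2, including at $x = 5$, a value outside the support that mimics an outlier and shows that the IF of the mean is unbounded in $x$. The helper names and the step `t=1e-6` are illustrative assumptions.

```python
def mean_functional(pmf):
    """Equation (14): phi(P) = sum over the support of x * P(x)."""
    return sum(x * p for x, p in pmf.items())

def mix_towards_point(pmf, x, t):
    """Return (1 - t) P + t 1_{x}: shift probability mass t of P onto the single value x."""
    mixed = {v: (1 - t) * p for v, p in pmf.items()}
    mixed[x] = mixed.get(x, 0.0) + t
    return mixed

def influence_function(functional, pmf, x, t=1e-6):
    """Finite-t approximation of (16): [phi((1 - t) P + t 1_{x}) - phi(P)] / t."""
    return (functional(mix_towards_point(pmf, x, t)) - functional(pmf)) / t

P = {0: 0.2, 1: 0.5, 2: 0.3}   # hypothetical discrete model
mu = mean_functional(P)        # mean of P
for x in (0, 1, 2, 5):         # x = 5 lies outside the support (an "outlier")
    print(x, round(influence_function(mean_functional, P, x), 6), round(x - mu, 6))
```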
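Since the IF is the Hadamard derivative in a special case, and $\hat P_n - P = \frac{1}{n}\sum_{i=1}^{n}\big(\mathbb{1}_{\{X_i\}} - P\big)$, the linearity of the Hadamard derivative allows the Taylor expansion in (5) to be rewritten as:

$$\phi(\hat P_n) \approx \phi(P) + \frac{1}{n}\sum_{i=1}^{n} \mathrm{IF}_{\phi,P}(X_i) \qquad (17)$$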
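Note that the Hadamard derivative establishes the change in the value of a parameter (written as a functional) resulting from small perturbations of the distribution in the direction of $x$. Plotting the IF provides a tool to discover outliers and is informative about the robustness of the estimator $\hat\theta_n = \phi(\hat P_n)$. Finally, if the difference $\sqrt{n}\big(\phi(\hat P_n) - \phi(P)\big)$ is (asymptotically) normally distributed, the Delta-method implies that:

$$\sqrt{n}\big(\phi(\hat P_n) - \phi(P)\big) \xrightarrow{d} N\big(0, \mathrm{Var}[\mathrm{IF}_{\phi,P}(X)]\big) \qquad (18)$$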
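where the variance, $\mathrm{Var}$, is taken with respect to the random variable $X$ (with mass $P$). We remind the reader that an estimator of this variance from a random sample is:

$$\widehat{\mathrm{Var}}[\mathrm{IF}_{\phi,P}(X)] = \frac{1}{n}\sum_{i=1}^{n} \mathrm{IF}_{\phi,\hat P_n}(X_i)^2 \qquad (19)$$

Notice that this estimator is the classical variance estimator for $\mathrm{IF}_{\phi,P}(X)$ when the mean is known (the mean of the influence function is always $0$).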
2.5 Summary
The Delta-method to estimate the SE of an estimator of $\phi(\theta)$, a Hadamard-differentiable function of a parameter $\theta$, can be summarized in the following steps:

Step 1. Determine the asymptotic distribution of $T$, the limit in distribution of $r_n(\hat\theta_n - \theta)$. This variable is a (scaled) function of the distance between the estimator $\hat\theta_n$ and the true value $\theta$.

Step 2. Define the function $\phi$ related to the scientific question of interest, and compute its Hadamard derivative $\phi'_\theta$. Usually $\phi$ can be written in terms of the mass $P$ or the distribution $F$ (i.e. the CDF). Recall that, in the case of real-valued functions, $\phi'_\theta(h)$ coincides with the classical derivative in the direction of $h$, as in equation (3).

Step 3. Take the asymptotic distribution of $T$ obtained in Step 1 and apply the Hadamard derivative from Step 2 to it (in the classical case, multiply it by the derivative), which gives the asymptotic distribution $\phi'_\theta(T)$. Then, estimate the variance of this distribution and compute the confidence intervals accordingly. Note that in most cases (e.g. when $T$ comes from the central limit theorem with $r_n = \sqrt{n}$), the difference $\phi(\hat\theta_n) - \phi(\theta)$ is approximately normal and Wald-type confidence intervals can be constructed using the variance in (19), i.e. by estimating the variance through the sample variance of the estimated IF to derive the SE of $\phi(\hat\theta_n)$ [Agresti2012ApproximateProportions].
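The three steps can be sketched for a finite-dimensional $\theta$ when $\mathrm{Cov}(\hat\theta_n)$ is available. The following Python sketch uses the relative-risk example from the start of Section 2 with hypothetical proportions $\hat\theta_1 = 0.2$ and $\hat\theta_2 = 0.4$ from two independent groups of 100 subjects each (Step 1: binomial variances $\hat\theta_j(1 - \hat\theta_j)/n_j$ and zero covariance), the gradient of $\phi(\theta) = \theta_1/\theta_2$ (Step 2), and the projection $\nabla\phi^{\top}\mathrm{Cov}(\hat\theta_n)\nabla\phi$ for the variance (Step 3). It is an illustration under these assumptions, not the tutorial's R implementation, and the helper names are hypothetical.

```python
import math

def delta_method_se(grad, cov):
    """Steps 2-3 for a finite-dimensional theta: SE = sqrt(grad^T Cov(theta_hat) grad)."""
    k = len(grad)
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))
    return math.sqrt(var)

def wald_ci(estimate, se, z=1.959963984540054):
    """Wald-type interval; z is the 0.975 quantile of the standard normal."""
    return estimate - z * se, estimate + z * se

# Step 1 (hypothetical data): proportions from two independent groups,
# each asymptotically normal with variance p (1 - p) / n.
p1, n1 = 0.2, 100   # risk under treatment
p2, n2 = 0.4, 100   # risk without treatment
cov = [[p1 * (1 - p1) / n1, 0.0],
       [0.0, p2 * (1 - p2) / n2]]   # independent groups: zero covariance

# Step 2: phi(theta) = theta1 / theta2 (relative risk) and its gradient at the estimates
rr = p1 / p2
grad = (1 / p2, -p1 / p2 ** 2)

# Step 3: variance via the gradient and Cov(theta_hat), then a Wald interval
se = delta_method_se(grad, cov)
ci = wald_ci(rr, se)
print(rr, round(se, 5), ci)
```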
3 Examples
In the following sections, we provide several examples of applications of the classical and functional Delta-method based on the Hadamard derivative and the IF, with R code in a set of 6 boxes. The code in the boxes can be accessed at https://github.com/migariane/DeltaMethodInfluenceFunction. All calculations and analytical derivations for the classical method were verified using the sympy package [meurer2017sympy] in Python 3.7 in a notebook [python], which can be accessed either in the same repository or in our Google Colab: https://github.com/migariane/DeltaMethodInfluenceFunction/tree/main/CalculationsDerivationsSympy.
3.1 Derivation of the Standard Error for the Sample Mean based on the Influence Function (Classical Delta-method)
In this section we derive the standard error for the sample mean. We illustrate how to apply the proposed steps in practice, i.e. by applying equations (3), (8) and (7). Note that classical statistical inference for the sample mean is straightforward; the interest here is to show how to derive the IF for the sample mean and then compute the SE by applying the steps highlighted before. To derive the SE of the mean for a random sample $X_1, \dots, X_n$, we proceed as follows. First (Step 1), we find the distribution of the difference between the estimator $\hat\theta_n = \bar X_n$ and the parameter $\theta = \mu$. We know from the central limit theorem that

$$\sqrt{n}\,(\bar X_n - \mu) \xrightarrow{d} N(0, \sigma^2).$$
In this case, $\phi$ corresponds to the identity function: $\phi(\theta) = \theta$. Then, following Step 2, we calculate the Hadamard derivative, which in this case corresponds to the classical derivative in the direction of $h$. Hence, following (9), we have:

$$\phi'_\theta(h) = \phi'(\theta)\,h = h$$
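We use Taylor's expansion around $\mu$ to obtain (exactly, since $\phi$ is linear):

$$\bar X_n - \mu = \phi(\bar X_n) - \phi(\mu) = \phi'_\mu(\bar X_n - \mu) = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu) \qquad (20)$$

so that, comparing with (17), the IF of the mean is $\mathrm{IF}(x) = x - \mu$.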
Due to the asymptotic normality, we can use (18) to proceed with Step 3:

$$\sqrt{n}\,(\bar X_n - \mu) \xrightarrow{d} N\big(0, \mathrm{Var}[\mathrm{IF}(X)]\big)$$
The variance of the influence function is

$$\mathrm{Var}[\mathrm{IF}(X)] = \mathrm{Var}(X - \mu) = \mathrm{Var}(X) = \sigma^2 \qquad (21)$$
and thus:

$$\bar X_n \;\dot\sim\; N\!\left(\mu, \frac{\sigma^2}{n}\right) \qquad (22)$$
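The variance of the influence function can be estimated by plugging the estimated IF, $X_i - \bar X_n$, into (19), i.e. using the standard estimator of the variance, $\hat\sigma^2$:

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X_n)^2 \qquad (23)$$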
Two-sided confidence intervals for $\mu$ can thus be estimated through

$$\bar X_n \pm z_{1-\alpha/2}\,\frac{\hat\sigma}{\sqrt{n}}$$
This shows how to obtain the results which are widely known from textbooks through the use of the IF.
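Following the steps of this section, a minimal Python sketch (hypothetical data; not the R code of Box 1) computes the IF-based SE of the sample mean via (23) and the Wald interval, and compares it with the textbook $s/\sqrt{n}$, which uses the $n - 1$ denominator. The two SEs differ only by the factor $\sqrt{(n-1)/n}$, which tends to 1 as $n$ grows.

```python
import math
import statistics

def mean_se_ci_via_if(x, z=1.959963984540054):
    """IF-based SE and Wald interval for the mean (z: 0.975 standard normal quantile)."""
    n = len(x)
    xbar = statistics.fmean(x)
    if_hat = [xi - xbar for xi in x]          # estimated IF of the mean: X_i - mean
    var_if = sum(v * v for v in if_hat) / n   # equation (23), i.e. (19)
    se = math.sqrt(var_if / n)
    return xbar, se, (xbar - z * se, xbar + z * se)

x = [2.1, 3.4, 1.9, 4.2, 2.8, 3.0]   # hypothetical data
xbar, se, ci = mean_se_ci_via_if(x)
textbook_se = statistics.stdev(x) / math.sqrt(len(x))   # s / sqrt(n), with n - 1 in s
print(round(se, 4), round(textbook_se, 4))
```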
In Box 1 we provide the code to compute the SE for a sample mean using the IF and compare the results with the Delta-method implementation from the R package msm [kavroudakis2015]; in Figure 1 we plot the IF for the sample mean.
Box 1. Derivation of the IF for the sample mean
3.2 Derivation of the Standard Error for the Sample Mean seen as a Functional (Functional Delta-method) based on the Influence Function
To develop the intuition of how to use the functional delta method we first derive the IF for the sample mean as in section 3.1 but writing the mean as a functional. Afterwards we’ll derive the IF for a more complicated situation: the quantile function.
Consider again the problem of estimating the mean. From the empirical probability mass function, $\hat P_n$, we obtain the empirical mean, $\bar X_n = \phi(\hat P_n)$, as a functional of $\hat P_n$. Here we are considering $\phi(P) = \sum_{x} x\,P(x)$. To simplify the example, assume that the $X_i$ are sampled from a discrete probability mass function such that there are only $K$ possible values of the $X_i$, namely $x_1, \dots, x_K$. In this case, following Step 1, we know from the central limit theorem that, for each value $x_k$, the difference between the empirical probability mass function (which is an average) and its true value is asymptotically normal:

$$\sqrt{n}\big(\hat P_n(x_k) - P(x_k)\big) \xrightarrow{d} N\big(0, P(x_k)[1 - P(x_k)]\big) \qquad (24)$$

where we have defined the empirical probability mass function as in (15):

$$\hat P_n(x_k) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}_{\{x_k\}}(X_i)$$

where the indicator variables are defined in section 2.4. We remark that the variance in (24) results from the variance of the indicators, which are Bernoulli distributed.
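We then follow Step 2 to write the functional in terms of the estimator. In this case, the population mean is written as:

$$\mu = \phi(P) = \sum_{k=1}^{K} x_k\, P(x_k)$$

while the sample mean is given by the following expression:

$$\bar X_n = \phi(\hat P_n) = \sum_{k=1}^{K} x_k\, \hat P_n(x_k)$$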
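We remark that in this case we will use the functional Delta-method, as $\phi$ is a functional of the function $P$. Hence, to obtain the approximation in this case (Step 3), we calculate the influence function from the definition in (16):

$$\mathrm{IF}_{\phi,P}(x) = \lim_{t \downarrow 0} \frac{\sum_{k=1}^{K} x_k\big[(1 - t)P(x_k) + t\,\mathbb{1}_{\{x\}}(x_k)\big] - \sum_{k=1}^{K} x_k P(x_k)}{t} = \lim_{t \downarrow 0} \frac{t\,(x - \mu)}{t} = x - \mu \qquad (25)$$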
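Finally, the variance of the influence function corresponds to the variance of $X$:

$$\mathrm{Var}[\mathrm{IF}_{\phi,P}(X)] = \mathrm{Var}(X - \mu) = \mathrm{Var}(X) = \sigma^2 \qquad (26)$$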
hence:

$$\bar X_n \;\dot\sim\; N\!\left(\mu, \frac{\sigma^2}{n}\right) \qquad (27)$$
which is equivalent to the expression found by the classical method in (22).
3.3 Derivation of the Standard Error for the Ratio of Two Means
Consider a random sample of size $n$ of the i.i.d. random variables $X$ and $Y$, which are independent and both normally distributed, with respective means $\mu_X$ and $\mu_Y$ estimated by their sample means $\bar X_n$ and $\bar Y_n$. We are interested in deriving the variance of the ratio of the two means (i.e. the ratio estimator), defined as $\phi(\mu_X, \mu_Y) = \mu_X/\mu_Y$. In this case (following Step 1), it is known that the difference $\sqrt{n}\big((\bar X_n, \bar Y_n) - (\mu_X, \mu_Y)\big)$ is asymptotically normal.
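Second (Step 2), we obtain the Hadamard derivative, which in this case corresponds to the gradient in the direction of $h = (h_X, h_Y)^{\top}$.
The gradient is given by:

$$\nabla\phi(\mu_X, \mu_Y) = \left(\frac{1}{\mu_Y}, -\frac{\mu_X}{\mu_Y^2}\right)^{\top}$$

where we assume $\mu_Y \neq 0$. The Hadamard derivative in the direction of an observation $(x, y)$ (i.e. the influence function) is given by:

$$\mathrm{IF}(x, y) = \nabla\phi(\mu_X, \mu_Y)^{\top}\begin{pmatrix} x - \mu_X \\ y - \mu_Y \end{pmatrix} = \frac{x - \mu_X}{\mu_Y} - \frac{\mu_X\,(y - \mu_Y)}{\mu_Y^2}$$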
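The variance is hence given by the variance of the influence function (i.e. the Hadamard derivative):

$$\mathrm{Var}[\mathrm{IF}(X, Y)] = \frac{\sigma_X^2}{\mu_Y^2} + \frac{\mu_X^2\,\sigma_Y^2}{\mu_Y^4} \qquad (28)$$

where we used that $\mathrm{Cov}(X, Y) = 0$ under the independence assumption, $\mathrm{Var}(X) = \sigma_X^2$ and $\mathrm{Var}(Y) = \sigma_Y^2$.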
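For Step 3, the estimated standard error is then obtained as the square root of the estimated variance divided by $n$, and Wald-type confidence intervals (level $1 - \alpha$) follow:

$$\frac{\bar X_n}{\bar Y_n} \pm z_{1-\alpha/2}\sqrt{\frac{\widehat{\mathrm{Var}}[\mathrm{IF}(X, Y)]}{n}}$$

where the estimator for the variance is:

$$\widehat{\mathrm{Var}}[\mathrm{IF}(X, Y)] = \frac{\hat\sigma_X^2}{\bar Y_n^2} + \frac{\bar X_n^2\,\hat\sigma_Y^2}{\bar Y_n^4}$$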
Box 2. Derivation of the IF for the ratio of two sample means
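Independently of the R code in Box 2, the ratio-of-means derivation can be sketched in Python with simulated data. Assuming independent normal $X \sim N(5, 1)$ and $Y \sim N(2, 0.5^2)$ with $n = 1000$ (all hypothetical), the code estimates $\mathrm{Var}[\mathrm{IF}]$ via (19) from the estimated IF and compares the resulting SE with the plug-in version of (28). The two agree closely here; they are not identical because (19) also picks up the (near-zero) sample covariance between $X$ and $Y$, which (28) sets to zero.

```python
import math
import random
import statistics

def ratio_of_means_if(x, y, z=1.959963984540054):
    """IF-based SE and Wald CI for mean(x) / mean(y), with paired observations (x_i, y_i)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    # Estimated IF: (x - mx) / my - mx (y - my) / my^2, i.e. the gradient applied to the centred data
    if_hat = [(xi - mx) / my - mx * (yi - my) / my ** 2 for xi, yi in zip(x, y)]
    var_if = sum(v * v for v in if_hat) / n   # equation (19)
    est = mx / my
    se = math.sqrt(var_if / n)
    return est, se, (est - z * se, est + z * se)

rng = random.Random(2023)
n = 1000
x = [rng.gauss(5.0, 1.0) for _ in range(n)]   # hypothetical, independent of y
y = [rng.gauss(2.0, 0.5) for _ in range(n)]
est, se, ci = ratio_of_means_if(x, y)

# Plug-in version of (28), which drops the covariance term because X and Y are independent
mx, my = statistics.fmean(x), statistics.fmean(y)
vx, vy = statistics.pvariance(x), statistics.pvariance(y)
se_formula = math.sqrt((vx / my ** 2 + mx ** 2 * vy / my ** 4) / n)
print(round(est, 4), round(se, 5), round(se_formula, 5))
```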
3.4 Derivation of the Standard Error for the Ratio of Two Probabilities (Risk Ratio)
In medical statistics, we are often interested in marginal and conditional (sometimes causal) risk ratios. Consider Table 1, where we are interested in the mortality risk by cancer status. Let denote the probability of being alive given that the patient has cancer and