Instrumental variables regression

06/15/2018 ∙ by Andzhey Koziuk, et al. ∙ Weierstrass Institute

This work considers instrumental variables (IV) regression in the context of re-sampling. Its main contribution is a structural identification result for the IV model. The work also contains a justification of the multiplier bootstrap.




1 Introduction

This work considers non-parametric regression with instrumental variables. A general framework is introduced and identification of the target of inference is discussed. Furthermore, the multiplier bootstrap is considered in a general form and justified. Finally, the procedure is used to test a hypothesis on the target function.

2 Identification in non-parametric IV regression

2.1 iid model

Introduce independent identically distributed observations


from a sample set

on a probability space

. Let

be a compact set, and let the random variables come respectively from

, and .

Assume a system of non-linear equations


A parametric relaxation of the system introduces a non-parametric bias. For an orthonormal functional basis define


such that

Then a substitution transforms (2.2) and gives


with a bias


A particular case of (2.4), under the parametric assumption () and with a single instrument (), is a popular choice of a model with instrumental variables ([1],[8]). The system is rewritten as


with the definition .

Lemma 2.1.

The statements are equivalent.

  1. There exists a unique solution to (2.6).

  2. such that is a solution of (2.6).


A solution to (2.6) can be represented as

for a fixed , and such that and is a rotation of an orthogonal to linear subspace in

. If the vector

is unique then must be zero otherwise there exist infinitely many distinct solutions ( ). On the other hand for the vector is unique. ∎

The second statement helps to obtain the exact form of a solution to (2.6)


Hence, the correlation of the instrumental variable with the features (note ) identifies (up to a scaling), making the choice of the variable a crucial task. An empirical relaxation of (2.6) in the literature (see [1],[8]) closely resembles the following form


for , , , and

or alternatively (lemma [2.1])

corresponding to the latter system up to a notational convention

The model has been investigated theoretically and numerically in a number of papers (see [1],[8]) and is used in this article as a numerical benchmark (see ’Numerical’).
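The role of the instrument in identifying the structural parameter can be illustrated on a toy version of the benchmark model. The sketch below is not the simulation design of the paper: it assumes one regressor, one instrument, Gaussian errors and no intercept, and the names `simulate_linear_iv`, `ols_slope`, `iv_slope` as well as the parameters `pi` and `rho` are hypothetical. It contrasts the least squares slope, which is biased under endogeneity, with the ratio of the instrument-outcome and instrument-regressor sample moments.

```python
import random


def simulate_linear_iv(n, beta, pi, rho, rng):
    """Draw n observations (y, x, z) from a toy linear IV model.

    x = pi * z + v      first stage with a single instrument z
    y = beta * x + e    structural equation
    corr(e, v) = rho    source of endogeneity of x
    """
    ys, xs, zs = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        # e shares the component v with x, so x is endogenous when rho != 0.
        e = rho * v + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        x = pi * z + v
        ys.append(beta * x + e)
        xs.append(x)
        zs.append(z)
    return ys, xs, zs


def ols_slope(ys, xs):
    """Least squares slope without intercept (uncentered sample moments)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def iv_slope(ys, xs, zs):
    """IV slope: sample moment of z*y divided by sample moment of z*x."""
    num = sum(z * y for z, y in zip(zs, ys))
    den = sum(z * x for z, x in zip(zs, xs))
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    ys, xs, zs = simulate_linear_iv(n=20000, beta=2.0, pi=1.0, rho=0.8, rng=rng)
    print("OLS slope:", ols_slope(ys, xs))
    print("IV slope: ", iv_slope(ys, xs, zs))
```

In this design the population bias of the least squares slope equals rho / (pi^2 + 1), while the IV ratio is consistent as long as pi is non-zero, which mirrors the remark that the correlation between instrument and features drives identification.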

Lemma [2.1] is a special case of a more general statement on identification in (2.4).

Lemma 2.2.

The statements are equivalent.

  1. There exists a unique solution to the system (2.4).

  2. A solution to (2.4) is given by where is a solution to an optimization problem


    with .


The model (2.4) turns into


A solution to (2.10) is an intersection of a

-sphere and a hyperplane

. If it is unique, the hyperplane is a tangent linear subspace to the -sphere and the optimization problem (2.9) is solved by definition of the intersection point. Conversely, if there exists a solution to the optimization problem, then it is guaranteed to be unique, as a solution to a convex problem with linear constraints, and by definition satisfies (2.4).
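The geometry of the argument can be checked in the simplest setting. The sketch below assumes a single linear constraint a·θ = b and the Euclidean norm; it does not reproduce the optimization problem (2.9), whose display is not available in this text, and the function name `min_norm_on_hyperplane` is hypothetical. The sphere centred at the origin touches the hyperplane at exactly one point, namely the minimum-norm point θ = b·a / ‖a‖².

```python
import math


def min_norm_on_hyperplane(a, b):
    """Return the point of {theta : a . theta = b} closest to the origin."""
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("the normal vector a must be non-zero")
    # The minimiser is proportional to the normal vector a.
    return [b * ai / norm_sq for ai in a]


def euclidean_norm(theta):
    return math.sqrt(sum(t * t for t in theta))


if __name__ == "__main__":
    a, b = [3.0, 4.0], 10.0
    theta = min_norm_on_hyperplane(a, b)
    other = [2.0, 1.0]  # another point satisfying 3*2 + 4*1 = 10
    print(theta, euclidean_norm(theta), euclidean_norm(other))
```

Any other point of the hyperplane, such as `other` above, lies strictly outside the tangent sphere, which is the uniqueness claim in its most elementary form.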

2.2 non-iid model



on a probability space . Let be a compact, random variables from , , and let the observations identify uniquely a solution to the system


in the particular case with

Identification in the non-iid case is complicated by the fact that is normally larger than , leading to possibly different identifiability scenarios. Distinguish them based on the rank of a matrix


Note that the rank and, thus, a solution to [2.12] depend on the sample size ( is assumed to be fixed). However, there is no prior knowledge of which corresponds to the identifiable function . Therefore, the discussion requires an agreement on the target of inference.

A way to reconcile uniqueness with the observed dependence is to require the function and to be independent of . The model (2.12) makes sense if it points consistently at a single function independently of the number of observations. Define a target function accordingly.

Definition 2.3.

Assume s.t. the rank , then call a function a target if it solves (2.12) .

Remark 2.1.

In the case of a bias between a solution and the target has to be considered. However, in the subsequent text it is implicitly assumed that a sample size .

Based on the convention [2.3] introduce a classification:

  1. Complete model: s.t. the rank .

  2. Incomplete model: s.t. the rank .

Identification in the ’incomplete’ model is equivalent to the iid case, with a notational change for the number of instruments and a respective change of the equations with instruments to the equations from (2.12). Otherwise, ’completeness’ of a model allows for a direct inversion of (2.12). Generally a complete model is given without the restriction


In this case a natural objective function for an inference is a quasi log-likelihood


again with


3 Testing a linear hypothesis: bootstrap log-likelihood ratio test

Introduce an empirical relaxation of the biased (2.4)


with centered errors . By lemma [2.2], a natural objective function is a penalized quasi log-likelihood



The maximum likelihood estimator (MLE) and its target are given by

For a fixed projector introduce a linear hypothesis and define a log-likelihood ratio test


The test statistic converges weakly

to a chi-square distribution (theorem


) and it is convenient to define a quantile as

It implies that and that weakly depends on a dimension s.t. , .

For a set of re-sampling multipliers

define bootstrap conditional on the original data

and corresponding bootstrap MLE (bMLE) and its target

A centered hypothesis and a respective test are defined accordingly


And analogously . The theorem [4.4] enables the same convergence in growing dimension .

Under the parametric assumption, when the non-parametric bias is zero, the bootstrap log-likelihood test is empirically attainable and the quantile is computed explicitly. On the other hand, an unattainable quantile calibrates . The two are in direct correspondence. In section 5 (’Gaussian comparison and approximation’) it is demonstrated that can be used instead of .

Multiplier bootstrap procedure: (3.5)
  • Sample computing satisfying

  • Test against using the inequalities

The idea is numerically validated in the section ’Numerical’. Its theoretical justification follows immediately.
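The two steps of the procedure can be sketched on a deliberately simple model. The example below is an illustration under stated assumptions rather than the procedure of the paper: it uses a scalar location model with a unit-variance Gaussian quasi log-likelihood, no instruments and no penalty, and exponential multipliers with mean one and variance one. All function names are hypothetical. The bootstrap statistic is centered at the MLE of the original data, and its empirical quantile replaces the unknown quantile of the original statistic.

```python
import math
import random


def lr_statistic(ys, theta0):
    """L(theta_hat) - L(theta0) for L(theta) = -0.5 * sum((y - theta)^2)."""
    n = len(ys)
    theta_hat = sum(ys) / n
    return 0.5 * n * (theta_hat - theta0) ** 2


def bootstrap_lr_statistic(ys, weights):
    """Weighted analogue, with the hypothesis centered at the original MLE."""
    theta_hat = sum(ys) / len(ys)
    total = sum(weights)
    theta_boot = sum(w * y for w, y in zip(weights, ys)) / total
    return 0.5 * total * (theta_boot - theta_hat) ** 2


def bootstrap_quantile(ys, alpha, n_boot, rng):
    """Empirical (1 - alpha) quantile of the bootstrap statistic given ys."""
    stats = []
    for _ in range(n_boot):
        # Exponential(1) multipliers: positive, mean 1, variance 1.
        weights = [rng.expovariate(1.0) for _ in ys]
        stats.append(bootstrap_lr_statistic(ys, weights))
    stats.sort()
    index = min(n_boot - 1, math.ceil((1.0 - alpha) * n_boot) - 1)
    return stats[index]


def multiplier_bootstrap_test(ys, theta0, alpha=0.05, n_boot=500, rng=None):
    """Return (reject, statistic, quantile) for H0: theta = theta0."""
    rng = rng or random.Random()
    statistic = lr_statistic(ys, theta0)
    quantile = bootstrap_quantile(ys, alpha, n_boot, rng)
    return statistic > quantile, statistic, quantile


if __name__ == "__main__":
    rng = random.Random(0)
    data = [1.0 + rng.gauss(0.0, 1.0) for _ in range(200)]
    print(multiplier_bootstrap_test(data, theta0=0.0, rng=rng))
```

Positive multipliers are chosen here only to keep the weighted estimator well defined in every bootstrap draw; other multiplier distributions with unit mean and unit variance would serve the same purpose.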

4 Finite sample theory

In the most general case neither an objective estimates consistently nor is a model (2.1) justified as suitable for arbitrary . Moreover, a regression with instrumental variables raises an additional concern: the chosen instruments can be weakly identified (see section [7.1]), and inference in the problem might involve a separate test for weakness, complicating the original problem.

The finite sample approach (Spokoiny 2012 [9]) is an option to marry a structure of with the properties of a probability space (2.1) and to account automatically for the unknown nature of instruments in a regression problem.

Finite sample theory: (4.1)
  • Theorem 4.1.

    Suppose conditions (4.1) are fulfilled. Define a score vector

    then it holds with a universal constant

    at least with the probability .

    A bootstrap analogue of the Wilks expansion also follows. It was stated in theorem B.4, section B.2 in Spokoiny, Zhilova 2015 [11].

    Theorem 4.2.

    Suppose conditions (4.1) are fulfilled. Define a bootstrap score vector

    then it holds with a universal constant

    at least with the probability .

    Moreover, the log-likelihood statistic follows the same local approximation in the context of hypothesis testing and the satisfies (see appendix - section (8.5)).

    Theorem 4.3.

    Assume conditions (4.1) are satisfied then with a universal constant

    with probability . The score vector is defined respectively

    and Fisher information matrix

    A similar statement can be proven in the bootstrap world.

    Theorem 4.4.

    Assume conditions (4.1) are fulfilled then with probability holds

    with a universal constant , where a score vector is given

    The theorem is effectively the same for , as the re-sampling procedure replicates the assumptions on the quasi log-likelihood that are sufficient for the statement (shown in the appendix, section 8.3).

4.1 Small Modelling Bias

    In view of the re-sampling justification, the small modeling bias condition from Spokoiny, Zhilova 2015 [11] deserves a separate discussion. The condition arises from the general way of proving the validity of the re-sampling procedure. Namely, for a small error term it is claimed

    with the matrices

    where the term is assumed to be of the error order essentially meaning that the deterministic bias is small. However, the assumption

    appears in the current development only in the form of the condition ’Target’ in (4.1). The substitution is possible due to the next theorem.

    Theorem 4.5.

    Assume that the condition ’Target’ holds, then .


    By definition of a target of estimation

    The condition ’Target’ implies that . This means that any particular term with the index is also zero - . Thus, and the statement follows. ∎

5 Gaussian comparison and approximation

    Two results form the basis for the re-sampling procedure (3.5). The first, Gaussian comparison, is taken from Götze, Naumov, Spokoiny and Ulyanov [4] and adapted to the notation of this work.

    Theorem 5.1.

    Assume centered Gaussian vectors and then it holds

    with a universal constant , where stands for the operator norm of a matrix.

    The second - Gaussian approximation - has been developed in the appendix (section [8.7]).

    Introduce the notations for the vectors

    such that

    1. and are independent and sub-Gaussian

    2. .

    Then a simplified version of the theorem [8.27] from the appendix holds.

    Theorem 5.2.

    Assume the framework above, then

    with the universal constant .

    Finally, the critical value and the empirical are glued together by matrix concentration inequalities from the section (8.6).

    The essence of the re-sampling is to translate the closeness of and into the closeness of the matrices , with the help of the Wilks expansion (theorems [4.3,4.4]) and the Gaussian comparison result, and to approximate the unknown by the respective Gaussian counterparts. It all amounts to the central theorem.

    Theorem 5.3.

    The parametric model (2.4) in the introduction - - under the assumption (4.1) enables

    with a dominating probability and universal constants .

    Remark 5.1.

    Note that the critical value depends on experimental data at hand and is fixed when the expectation is taken with respect to the data generating statistics.

6 Numerical: conditional and bootstrap log-likelihood ratio tests

    The BLR test is calibrated on a model from Andrews, Moreira and Stock [1]. In that paper the authors proposed the conditional likelihood ratio test (CLR - ), used here as a benchmark. The simulated model reads as


    where , , and with a matrix , and (see section 1). And the hypothesis

    on a value of a structural parameter . For the hypothesis Moreira [8] and later Andrews, Moreira and Stock [1] construct a CLR test based on the two vectors


    with the notations , and . and are independent and together form sufficient statistics for the model (6.1) with only depending on instruments’ identification, thus conditioning on and CLR test. The log-likelihood ratio statistic in (6.1) is represented as (see Moreira 2003 [8])

    Additionally Lagrange multiplier and Anderson-Rubin tests are given by

    The latter two are known to perform acceptably except in the weakly identified case.
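The displays defining the statistics are not available in this text, so the sketch below assumes the standard forms used in the literature following [8] and [1]: with b0 = (1, -β0)ᵀ, a0 = (β0, 1)ᵀ and a known reduced-form covariance Ω, S and T are the standardized projections of Y·b0 and Y·Ω⁻¹·a0 onto the instruments, AR = SᵀS / k, LM = (SᵀT)² / TᵀT and LR = ½(SᵀS − TᵀT + √((SᵀS + TᵀT)² − 4(SᵀS·TᵀT − (SᵀT)²))). The function names and the Gram-Schmidt step are illustrative choices, not taken from the paper.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def orthonormal_basis(columns):
    """Gram-Schmidt: orthonormal vectors spanning the instrument columns."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm < 1e-12:
            raise ValueError("instrument columns are linearly dependent")
        basis.append([vi / norm for vi in v])
    return basis


def weak_iv_statistics(y1, y2, z_columns, beta0, omega):
    """Return (AR, LM, LR) for H0: beta = beta0.

    omega = ((o11, o12), (o12, o22)) is the covariance of the reduced-form
    errors, assumed known in this sketch.
    """
    (o11, o12), (_, o22) = omega
    det = o11 * o22 - o12 * o12
    b_omega_b = o11 - 2.0 * beta0 * o12 + beta0 ** 2 * o22  # b0' omega b0
    c1 = (o22 * beta0 - o12) / det   # first entry of omega^{-1} a0
    c2 = (o11 - o12 * beta0) / det   # second entry of omega^{-1} a0
    a_oinv_a = beta0 * c1 + c2       # a0' omega^{-1} a0
    basis = orthonormal_basis(z_columns)
    yb = [u - beta0 * w for u, w in zip(y1, y2)]     # Y b0
    yc = [c1 * u + c2 * w for u, w in zip(y1, y2)]   # Y omega^{-1} a0
    # Coordinates in an orthonormal basis of span(Z) equal (Z'Z)^{-1/2} Z'(.)
    # up to a common rotation, which leaves S'S, S'T and T'T unchanged.
    s = [dot(q, yb) / math.sqrt(b_omega_b) for q in basis]
    t = [dot(q, yc) / math.sqrt(a_oinv_a) for q in basis]
    qs, qt, qst = dot(s, s), dot(t, t), dot(s, t)
    ar = qs / len(basis)
    lm = qst ** 2 / qt
    # (qs + qt)^2 - 4 (qs*qt - qst^2) is rewritten as (qs - qt)^2 + 4 qst^2.
    lr = 0.5 * (qs - qt + math.sqrt((qs - qt) ** 2 + 4.0 * qst ** 2))
    return ar, lm, lr
```

Under these definitions the three statistics are ordered as LM ≤ LR ≤ k·AR for any data set, which gives a quick sanity check of an implementation.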
    First, a correctly specified model is generated for the sample of and with weak instruments (). In this case the powers of , and true tests are drawn on the figure (8.1). To be consistent, is also compared to and . The comparison is given on the figure (8.2) and the data in the case is aggregated in the table (1).
    Moreover, an important step is to check how robust is to a misspecification of the model. Three special examples are simulated:

    1. ,

    2. ,

    3. .

    Experiment (1) can be found on the figures (8.3), (8.4) and in the table (2). Numerical study of the experiment (2) with misspecified heteroskedastic error is given on the figure (8.5) and collected in the table (3). The last experiment is shown on the figure (8.6) and in the table (4).

    Remark 6.1.

    All the figures and tables are collected at the end of the work.

7 Strength of instrumental variables

    In practice one wants to distinguish instruments by their strength. For clarity of exposition the section considers a simplified log-likelihood (2.15) identifying a complete model with the Fisher information matrix

    Weak instrumental variables introduce an unavoidable lower bound on estimation error (lemma [7.1], see the proof in the appendix (8.1)).
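The effect of instrument strength on estimation error can be seen in a small simulation. This is a sketch under assumptions that are not part of the paper: a linear model with one regressor and one instrument, Gaussian errors, and a first-stage coefficient `pi` playing the role of instrument strength. The function names are hypothetical. The median absolute error is reported rather than the variance, because the ratio estimator has heavy tails when the instrument is weak.

```python
import random
import statistics


def iv_estimate(n, beta, pi, rho, rng):
    """Simulate one sample and return the ratio IV estimate of beta."""
    szy = 0.0
    szx = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        e = rho * v + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        x = pi * z + v          # pi controls the strength of the instrument
        y = beta * x + e
        szy += z * y
        szx += z * x
    return szy / szx


def median_abs_error(n, beta, pi, rho, reps, rng):
    """Median of |estimate - beta| over independent replications."""
    errors = [abs(iv_estimate(n, beta, pi, rho, rng) - beta) for _ in range(reps)]
    return statistics.median(errors)


if __name__ == "__main__":
    rng = random.Random(0)
    for pi in (1.0, 0.3, 0.05):
        err = median_abs_error(n=200, beta=1.0, pi=pi, rho=0.8, reps=200, rng=rng)
        print("pi =", pi, "median absolute error =", err)
```

In this toy design the error does not shrink to the level of the strong-instrument case when `pi` is small, even though the sample size is the same, which is the qualitative content of a lower bound driven by instrument strength.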

    Lemma 7.1.

    Let conditions (4.1) hold then

    with a factor