 # Instrumental variables regression

The work considers instrumental variables (IV) regression in the context of re-sampling. Its main contribution is a structural identification result for the IV model; the work also contains a justification of the multiplier bootstrap.

## 1 Introduction

The work considers non-parametric regression with instrumental variables. A general framework is introduced, and the identification of the target of inference is discussed. Furthermore, the multiplier bootstrap is considered in a general form and justified. Finally, the procedure is used to test a hypothesis about the target function.

## 2 Identification in non-parametric IV regression

### 2.1 iid model

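Introduce independent identically distributed observations

$$(Y_i, X_i, \{W_{ki}\}_{k=\overline{1,K}})_{i=\overline{1,n}} \in \Omega \tag{2.1}$$

from the sample set

$$\Omega \stackrel{\text{def}}{=} \mathbb{R} \otimes Q \otimes \mathbb{R}^{\otimes K}$$

on a probability space. Let $Q$ be a compact set, and let the random variables $Y_i$, $X_i$ and $W_{ki}$ take values in $\mathbb{R}$, $Q$ and $\mathbb{R}$, respectively.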

Assume a system of non-linear equations

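$$\begin{cases}
\mathbb{E}W_{11}\big(Y_1-f(X_1)\big)=0,\\
\mathbb{E}W_{21}\big(Y_1-f(X_1)\big)=0,\\
\quad\vdots\\
\mathbb{E}W_{K1}\big(Y_1-f(X_1)\big)=0,\\
\int_Q f^2(x)\,dx=\text{const}.
\end{cases}\tag{2.2}$$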

A parametric relaxation of the system introduces a non-parametric bias. For an orthonormal functional basis $\{\psi_j\}_{j=1}^{J}$ on $Q$, define

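$$\hat f(x)\stackrel{\text{def}}{=}\sum_{j=1}^{J}\psi_j(x)\theta^*_j\stackrel{\text{def}}{=}\Psi(x)^T\theta^* \tag{2.3}$$

such that

$$\theta^*_j\stackrel{\text{def}}{=}\int_Q f(x)\psi_j(x)\,dx.$$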

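Substituting $\hat f$ for $f$ in (2.2) gives

$$\begin{cases}
\mathbb{E}W_{11}\big(Y_1-\hat f(X_1)\big)=\delta_1,\\
\mathbb{E}W_{21}\big(Y_1-\hat f(X_1)\big)=\delta_2,\\
\quad\vdots\\
\mathbb{E}W_{K1}\big(Y_1-\hat f(X_1)\big)=\delta_K,\\
\int_Q \hat f^2(x)\,dx=\text{const},
\end{cases}\tag{2.4}$$

with the bias

$$\forall k>0\quad \delta_k\stackrel{\text{def}}{=}\mathbb{E}W_{k1}\big(f(X_1)-\hat f(X_1)\big). \tag{2.5}$$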

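A particular case of (2.4) under the parametric assumption ($\delta_k=0$) and with a single instrument ($K=1$) is a popular model with instrumental variables in the literature. The system is rewritten as

$$\begin{cases}
\eta_1^{*T}\theta=\mathbb{E}W_{11}Y_1,\\
\|\theta\|^2=\text{const}
\end{cases}\tag{2.6}$$

with the definition $\eta^*_1\stackrel{\text{def}}{=}\mathbb{E}W_{11}\Psi(X_1)$.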

###### Lemma 2.1.

The following statements are equivalent.

1. There exists a unique solution to (2.6).

2. There exists $\beta\in\mathbb{R}$ such that $\theta^*=\beta\eta^*_1$ is a solution of (2.6).

###### Proof.

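A solution to (2.6) can be represented as

$$\theta^*=\alpha Q_\perp\eta^*_\perp+\beta\eta^*_1$$

for a fixed $\beta$ and some $\alpha$ such that $\|\theta^*\|^2=\text{const}$, where $\eta^*_\perp\perp\eta^*_1$ and $Q_\perp$ is a rotation of the linear subspace of $\mathbb{R}^J$ orthogonal to $\eta^*_1$. If the vector $\theta^*$ is unique, then $\alpha$ must be zero; otherwise there exist infinitely many distinct solutions (one for each rotation $Q_\perp$). On the other hand, for $\alpha=0$ the vector is unique. ∎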

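The second statement yields the exact form of the solution to (2.6):

$$\hat f(x)=\beta\sum_{j=1}^{J}\psi_j(x)\eta^*_{1j}=\frac{\mathbb{E}W_{11}Y_1}{\sum_{j=1}^{J}\big(\mathbb{E}W_{11}\psi_j(X_1)\big)^2}\sum_{j=1}^{J}\psi_j(x)\,\mathbb{E}W_{11}\psi_j(X_1). \tag{2.7}$$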
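A minimal numerical sketch of the closed form (2.7), under assumptions that are not part of the text: $Q=[0,1]$ with the cosine basis $\psi_1=1$, $\psi_j(x)=\sqrt{2}\cos(\pi(j-1)x)$, every expectation replaced by a sample mean, and a hypothetical data-generating process in which the error is correlated with $X$ but not with the instrument $W$. The function names are illustrative; the sketch only evaluates the formula and makes no claim about recovering $f$.

```python
import math
import random


def basis(x, J):
    """Cosine basis, orthonormal on Q = [0, 1] (an illustrative choice of psi_j)."""
    return [1.0] + [math.sqrt(2.0) * math.cos(math.pi * j * x) for j in range(1, J)]


def single_instrument_fit(ys, xs, ws, J):
    """Plug-in version of (2.7): every expectation is replaced by a sample mean."""
    n = len(ys)
    feats = [basis(x, J) for x in xs]
    # eta_1j = E W psi_j(X), estimated by (1/n) sum_i W_i psi_j(X_i)
    eta = [sum(w * f[j] for w, f in zip(ws, feats)) / n for j in range(J)]
    # E W Y, estimated by (1/n) sum_i W_i Y_i
    wy = sum(w * y for w, y in zip(ws, ys)) / n
    beta = wy / sum(e * e for e in eta)
    theta = [beta * e for e in eta]  # theta = beta * eta, statement 2 of Lemma 2.1
    return theta, eta, wy


# Hypothetical data: W is independent of the error source V, X depends on both.
rng = random.Random(0)
n, J = 2000, 4
ws = [rng.random() for _ in range(n)]
vs = [rng.random() for _ in range(n)]
xs = [(w + v) / 2 for w, v in zip(ws, vs)]
ys = [math.cos(math.pi * x) + (v - 0.5) + 0.1 * rng.gauss(0.0, 1.0) for x, v in zip(xs, vs)]

theta, eta, wy = single_instrument_fit(ys, xs, ws, J)
# The moment equation eta^T theta = E W Y of (2.6) holds by construction.
print(sum(e * t for e, t in zip(eta, theta)), wy)
```

The coefficient vector is proportional to the estimated $\eta^*_1$, which makes visible why the correlation between the instrument and the features drives the solution.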

By (2.7), the correlation of the instrumental variable with the features (note that $\eta^*_{1j}=\mathbb{E}W_{11}\psi_j(X_1)$) identifies $\hat f$ up to scaling, which makes the choice of the instrument a crucial task. Empirical relaxations of (2.6) in the literature closely resemble the following form

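$$\begin{cases}
Y_1=Z^T\pi\beta+\varepsilon_1,\\
Y_2=Z^T\pi+\varepsilon_2,
\end{cases}\tag{2.8}$$

with the errors distributed as

$$\begin{pmatrix}\varepsilon_{1,i}\\ \varepsilon_{2,i}\end{pmatrix}\sim\mathcal{N}\left(0,\begin{pmatrix}\lambda_1 & \rho\\ \rho & \lambda_2\end{pmatrix}\right)$$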

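or alternatively (Lemma 2.1)

corresponding to the latter system up to the notational convention

$$W_{11,i}Y_{1,i}\stackrel{\text{def}}{=}Y_{1,i},\quad \|W_{11,i}\Psi(X_{1,i})\|^2\stackrel{\text{def}}{=}Y_{2,i},\quad W_{11,i}\psi_j(X_{1,i})\stackrel{\text{def}}{=}Z_{ji}\quad\text{and}\quad\theta\stackrel{\text{def}}{=}\beta\pi.$$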

The model has been investigated theoretically and numerically in a number of papers, and in this article (see Section 6) it is used as a numerical benchmark.

Lemma 2.1 is a special case of a more general statement on identification in (2.4).

###### Lemma 2.2.

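The following statements are equivalent.

1. There exists a unique solution to the system (2.4).

2. A solution to (2.4) is given by $\hat f(x)=\Psi(x)^T\theta_{id}$, where $\theta_{id}$ is a solution to the optimization problem

$$\theta_{id}=\operatorname*{argmin}_{x\in\mathbb{R}^J}\|x\|^2\quad\text{s.t.}\quad\begin{cases}
\eta^{*T}_1x=\mathbb{E}W_{11}Y_1-\delta_1,\\
\eta^{*T}_2x=\mathbb{E}W_{21}Y_1-\delta_2,\\
\quad\vdots\\
\eta^{*T}_Kx=\mathbb{E}W_{K1}Y_1-\delta_K
\end{cases}\tag{2.9}$$

with $\eta^*_k\stackrel{\text{def}}{=}\mathbb{E}W_{k1}\Psi(X_1)$.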

###### Proof.

The model (2.4) turns into

$$\begin{cases}
\eta^{*T}_k\theta=\mathbb{E}W_{k1}Y_1-\delta_k,\quad k=\overline{1,K},\\
\|\theta\|^2=\text{const}.
\end{cases}\tag{2.10}$$

A solution to (2.10) is an intersection of a sphere in $\mathbb{R}^J$ and the hyperplane defined by the linear constraints. If the solution is unique, the hyperplane is tangent to the sphere, and the optimization problem (2.9) is solved by the intersection point by definition. Conversely, if there exists a solution to the optimization problem, then it is guaranteed to be unique as the solution of a strictly convex problem with linear constraints, and by definition it satisfies (2.4). ∎
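Problem (2.9) is a minimum-norm problem under linear equality constraints. Writing the Lagrangian shows that $\theta_{id}=\sum_k\mu_k\eta^*_k$, where the multipliers solve the $K\times K$ Gram system $\sum_l(\eta^{*T}_k\eta^*_l)\mu_l=\mathbb{E}W_{k1}Y_1-\delta_k$. A pure-Python sketch, assuming the vectors $\eta^*_k$ and the right-hand sides are known (in practice they would have to be estimated) and linearly independent; the inputs and function names are hypothetical.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def min_norm_solution(etas, rhs):
    """argmin ||x||^2 subject to eta_k^T x = rhs_k, k = 1..K (independent eta_k)."""
    K, J = len(etas), len(etas[0])
    gram = [[sum(p * q for p, q in zip(etas[k1], etas[k2])) for k2 in range(K)]
            for k1 in range(K)]
    mult = solve(gram, rhs)  # Lagrange multipliers mu_k
    return [sum(mult[k] * etas[k][j] for k in range(K)) for j in range(J)]


# Hypothetical eta*_k and right-hand sides E W_k1 Y_1 - delta_k (K = 2, J = 3).
etas = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
rhs = [1.0, 2.0]
print(min_norm_solution(etas, rhs))  # [0.0, 1.0, 1.0]
```

The solution lies in the span of the $\eta^*_k$; any other point satisfying the constraints differs from it by a vector orthogonal to that span and therefore has a larger norm.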

### 2.2 non-iid model

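Redefine the observations

$$(Y_i, X_i, \{W_{ki}\}_{k=\overline{1,K}})_{i=\overline{1,n}}\in\Omega=\mathbb{R}\otimes Q\otimes\mathbb{R}^{\otimes K}\tag{2.11}$$

on a probability space. Let $Q$ be a compact set, let the random variables $Y_i$, $X_i$ and $W_{ki}$ take values in $\mathbb{R}$, $Q$ and $\mathbb{R}$, respectively, and let the observations uniquely identify a solution to the system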

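$$\forall i=\overline{1,n}\quad
\begin{cases}
\mathbb{E}W_{1i}\big(Y_i-\hat f(X_i)\big)=\delta_1,\\
\mathbb{E}W_{2i}\big(Y_i-\hat f(X_i)\big)=\delta_2,\\
\quad\vdots\\
\mathbb{E}W_{Ki}\big(Y_i-\hat f(X_i)\big)=\delta_K,\\
\int_Q\hat f^2(x)\,dx=C_I
\end{cases}
\;\Rightarrow\;
\forall i=\overline{1,n}\quad
\begin{cases}
\eta^*_{1,i}\eta^{*T}_{1,i}\theta=\eta^*_{1,i}\,\mathbb{E}Z_{i1},\\
\eta^*_{2,i}\eta^{*T}_{2,i}\theta=\eta^*_{2,i}\,\mathbb{E}Z_{i2},\\
\quad\vdots\\
\eta^*_{K,i}\eta^{*T}_{K,i}\theta=\eta^*_{K,i}\,\mathbb{E}Z_{iK},\\
\sum_{j=1}^{J}\theta_j^2=C_I
\end{cases}\tag{2.12}$$

in the particular case with

$$\eta^{*T}_{k,i}\stackrel{\text{def}}{=}\big(\mathbb{E}W_{ki}\psi_1(X_i),\mathbb{E}W_{ki}\psi_2(X_i),\dots,\mathbb{E}W_{ki}\psi_J(X_i)\big)\quad\text{and}\quad Z_{ik}\stackrel{\text{def}}{=}W_{ki}Y_i-\delta_k.$$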

Identification in the non-iid case is complicated by the fact that the number of equations $nK$ is normally larger than the number of unknowns $J$, which leads to different identifiability scenarios. They are distinguished by the rank of the matrix

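$$r\stackrel{\text{def}}{=}\operatorname{rank}\Big(\sum_{i=1}^{n}\sum_{k=1}^{K}\eta^*_{k,i}\eta^{*T}_{k,i}\Big)=\operatorname{rank}\Big(\sum_{i=1}^{n}\sum_{k=1}^{K}\mathbb{E}W_{ki}\Psi(X_i)\,\mathbb{E}\Psi^T(X_i)W_{ki}\Big).\tag{2.13}$$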
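The rank $r$ in (2.13) decides which identification scenario applies. A pure-Python sketch, assuming the vectors $\eta^*_{k,i}$ are given (here small hypothetical examples with $J=3$): it accumulates $\sum\eta\eta^T$ and counts the non-zero pivots of Gaussian elimination, treating pivots below a fixed absolute tolerance as zero.

```python
def outer_sum(vectors):
    """sum_v v v^T for a list of equal-length vectors."""
    J = len(vectors[0])
    m = [[0.0] * J for _ in range(J)]
    for v in vectors:
        for r in range(J):
            for c in range(J):
                m[r][c] += v[r] * v[c]
    return m


def rank(mat, tol=1e-10):
    """Numerical rank via Gaussian elimination; pivots below `tol` count as zero."""
    m = [row[:] for row in mat]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        piv = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[piv][c]) <= tol:
            continue  # no usable pivot in this column
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            for k in range(c, cols):
                m[i][k] -= f * m[r][k]
        r += 1
    return r


# Hypothetical vectors eta*_{k,i}, J = 3.
incomplete = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
complete = incomplete + [[0.0, 0.0, 1.0]]
print(rank(outer_sum(incomplete)), rank(outer_sum(complete)))  # 2 3
```

For badly scaled data a tolerance relative to the largest pivot would be a more robust choice than the absolute `tol` used here.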

Note that the rank $r$, and thus a solution to (2.12), depends on the sample size $n$ ($J$ and $K$ are assumed to be fixed). However, there is no prior knowledge of which sample size corresponds to the identifiable function. The discussion therefore requires an agreement on the target of inference.

One way to reconcile uniqueness with this dependence is to require the identified function $\hat f$ and the rank $r$ to be independent of $n$. The model (2.12) makes sense only if it consistently points at a single function, independently of the number of observations. Accordingly, define a target function.

###### Definition 2.3.

Assume $\exists N$ such that the rank $r$ is the same for all $n>N$; then a function $\hat f$ is called a target if it solves (2.12) for all $n>N$.

###### Remark 2.1.

In the case $n\le N$, a bias between a solution and the target has to be considered. However, the subsequent text implicitly assumes a sample size $n>N$.

Based on Definition 2.3, introduce a classification:

1. Complete model: $\exists N$ s.t. the rank $r=J$ for all $n>N$.

2. Incomplete model: $\exists N$ s.t. the rank $r<J$ for all $n>N$.

Identification in the 'incomplete' model is equivalent to the iid case, up to a change of notation for the number of instruments and the corresponding replacement of the instrument equations by the equations from (2.12). Otherwise, the 'completeness' of a model allows a direct inversion of (2.12). In general, a complete model is given without the norm restriction:

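$$\forall n>N:\ \forall i=\overline{1,n}\quad
\begin{cases}
\mathbb{E}W_{1i}\big(Y_i-\hat f(X_i)\big)=\delta_1,\\
\mathbb{E}W_{2i}\big(Y_i-\hat f(X_i)\big)=\delta_2,\\
\quad\vdots\\
\mathbb{E}W_{Ki}\big(Y_i-\hat f(X_i)\big)=\delta_K.
\end{cases}\tag{2.14}$$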

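In this case a natural objective function for inference is the quasi log-likelihood

$$L(\theta)\stackrel{\text{def}}{=}-\frac{1}{2}\sum_{k=1}^{K}\sum_{i=1}^{n}\big(Z_{ik}-\eta_{ik}^T\theta\big)^2\tag{2.15}$$

where now

$$\eta_{ik}^T\stackrel{\text{def}}{=}\big(W_{ki}\psi_1(X_i),W_{ki}\psi_2(X_i),\dots,W_{ki}\psi_J(X_i)\big)$$

and

$$Z_{ik}\stackrel{\text{def}}{=}W_{ki}Y_i-\delta_k.$$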

## 3 Testing a linear hypothesis: bootstrap log-likelihood ratio test

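Introduce an empirical relaxation of the biased system (2.4)

$$\begin{cases}
W_{1i}\Psi^T(X_i)\theta=W_{1i}Y_i-\delta_1+\varepsilon_{1,i},\\
W_{2i}\Psi^T(X_i)\theta=W_{2i}Y_i-\delta_2+\varepsilon_{2,i},\\
\quad\vdots\\
W_{Ki}\Psi^T(X_i)\theta=W_{Ki}Y_i-\delta_K+\varepsilon_{K,i},\\
\|\theta\|^2=C_I
\end{cases}\tag{3.1}$$

with centered errors $\varepsilon_{k,i}$. In view of Lemma 2.2, a natural objective function is the penalized quasi log-likelihood

$$L(\theta)\stackrel{\text{def}}{=}\sum_{i=1}^{n}\ell_i(\theta)\stackrel{\text{def}}{=}-\frac{1}{2}\sum_{k=1}^{K}\sum_{i=1}^{n}\big(Z_{ik}-\eta_{ik}^T\theta\big)^2-\frac{\lambda\|\theta\|^2}{2}\tag{3.2}$$

with

$$\eta_{ik}^T\stackrel{\text{def}}{=}\big(W_{ki}\psi_1(X_i),W_{ki}\psi_2(X_i),\dots,W_{ki}\psi_J(X_i)\big)\quad\text{and}\quad Z_{ik}\stackrel{\text{def}}{=}W_{ki}Y_i-\delta_k.$$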

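The maximum likelihood estimator (MLE) and its target are given by

$$\tilde\theta\stackrel{\text{def}}{=}\operatorname*{argmax}_{\theta\in\mathbb{R}^p}L(\theta)\quad\text{and}\quad\theta^*\stackrel{\text{def}}{=}\operatorname*{argmax}_{\theta\in\mathbb{R}^p}\mathbb{E}L(\theta).$$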
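For the quadratic objective (3.2) the MLE has a closed form: setting the gradient $\sum_{i,k}\eta_{ik}(Z_{ik}-\eta_{ik}^T\theta)-\lambda\theta$ to zero gives the ridge-type normal equations $\big(\sum_{i,k}\eta_{ik}\eta_{ik}^T+\lambda I\big)\tilde\theta=\sum_{i,k}\eta_{ik}Z_{ik}$. The sketch assumes the penalty $\lambda\|\theta\|^2/2$ and flattens the pairs $(i,k)$ into one list; the function name and the data are hypothetical.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def penalized_mle(etas, zs, lam):
    """Maximiser of (3.2): solves (sum eta eta^T + lam I) theta = sum eta Z."""
    J = len(etas[0])
    a = [[lam if r == c else 0.0 for c in range(J)] for r in range(J)]
    b = [0.0] * J
    for eta, z in zip(etas, zs):  # one entry per flattened pair (i, k)
        for r in range(J):
            b[r] += eta[r] * z
            for c in range(J):
                a[r][c] += eta[r] * eta[c]
    return solve(a, b)


# Hypothetical exact-fit data generated by theta = (1, 2).
etas = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
zs = [1.0, 2.0, 3.0]
print(penalized_mle(etas, zs, lam=0.0))  # [1.0, 2.0]
print(penalized_mle(etas, zs, lam=1.0))  # shrunk towards zero
```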

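For a fixed projector $\Pi$, introduce a linear hypothesis and define the log-likelihood ratio test

$$H_0:\ \theta^*\in\{\Pi\theta=0\},$$
$$H_1:\ \theta^*\in\mathbb{R}^p\setminus\{\Pi\theta=0\},$$
$$T_{LR}\stackrel{\text{def}}{=}\sup_{\theta}L(\theta)-\sup_{\theta\in H_0}L(\theta).\tag{3.3}$$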
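A sketch of the statistic (3.3), assuming that $\Pi$ selects a subset of coordinates (so that $H_0$ fixes those coordinates at zero) and the penalty $\lambda\|\theta\|^2/2$ of (3.2) with flattened pairs $(i,k)$. Under this assumption the supremum over $H_0$ is again a ridge fit, now on the remaining coordinates only. Names and data are hypothetical.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def penalized_mle(etas, zs, lam):
    """Solves (sum eta eta^T + lam I) theta = sum eta Z."""
    J = len(etas[0])
    a = [[lam if r == c else 0.0 for c in range(J)] for r in range(J)]
    b = [0.0] * J
    for eta, z in zip(etas, zs):
        for r in range(J):
            b[r] += eta[r] * z
            for c in range(J):
                a[r][c] += eta[r] * eta[c]
    return solve(a, b)


def loglik(theta, etas, zs, lam):
    """Penalized quasi log-likelihood (3.2) with penalty lam * ||theta||^2 / 2."""
    res = sum((z - sum(e * t for e, t in zip(eta, theta))) ** 2 for eta, z in zip(etas, zs))
    return -0.5 * res - 0.5 * lam * sum(t * t for t in theta)


def lr_statistic(etas, zs, lam, tested):
    """T_LR of (3.3) for H0: theta_j = 0 for all j in `tested`."""
    J = len(etas[0])
    free = [j for j in range(J) if j not in tested]
    theta_full = penalized_mle(etas, zs, lam)
    sub = penalized_mle([[eta[j] for j in free] for eta in etas], zs, lam)
    theta_h0 = [0.0] * J
    for j, v in zip(free, sub):
        theta_h0[j] = v
    return loglik(theta_full, etas, zs, lam) - loglik(theta_h0, etas, zs, lam)


etas = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(lr_statistic(etas, [1.0, 2.0, 3.0], 0.0, tested=[1]))  # 3.0
```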

The test statistic weakly converges to a chi-square distribution (Theorem 4.3), and it is convenient to define the quantile

$$z_\alpha:\ \mathbb{P}\big((T_{LR}-J)/\sqrt{J}>z_\alpha\big)=\alpha.$$

It implies that $z_\alpha$ depends only weakly on the dimension $J$.

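For a set of re-sampling multipliers

$$\{u_i\sim\mathcal{N}(1,1)\}_{i=\overline{1,n}}$$

define the bootstrap log-likelihood conditional on the original data

$$L^\flat(\theta)=\sum_{i=1}^{n}\ell_i(\theta)u_i\stackrel{\text{def}}{=}\sum_{i=1}^{n}\left(\sum_{k=1}^{K}\left(-\frac{(Z_{ik}-\eta_{ik}^T\theta)^2}{2}-\frac{\lambda\|\theta\|^2}{2nK}\right)\right)u_i,$$

and the corresponding bootstrap MLE (bMLE) and its target

$$\tilde\theta^\flat\stackrel{\text{def}}{=}\operatorname*{argmax}_{\theta\in\mathbb{R}^p}L^\flat(\theta)\quad\text{and}\quad\tilde\theta\stackrel{\text{def}}{=}\operatorname*{argmax}_{\theta\in\mathbb{R}^p}\mathbb{E}L^\flat(\theta)=\operatorname*{argmax}_{\theta\in\mathbb{R}^p}L(\theta).$$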
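Setting the gradient of $L^\flat$ to zero gives $\big(\sum_iu_i\sum_k\eta_{ik}\eta_{ik}^T+\lambda\bar u I\big)\theta=\sum_iu_i\sum_k\eta_{ik}Z_{ik}$ with $\bar u=n^{-1}\sum_iu_i$. A sketch under the penalty $\lambda\|\theta\|^2/(2nK)$ per equation, with a hypothetical data layout where `obs[i]` holds the $K$ pairs $(\eta_{ik},Z_{ik})$. Because some $u_i$ can be negative, $L^\flat$ need not be concave; the sketch returns the stationary point, which is the maximiser when the matrix on the left is positive definite (typically the case for moderate $n$).

```python
import random


def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def bootstrap_mle(obs, lam, weights):
    """Stationary point of L_flat(theta) = sum_i u_i l_i(theta)."""
    n, J = len(obs), len(obs[0][0][0])
    pen = lam * sum(weights) / n  # sum over (i, k) of u_i * lam / (n K)
    a = [[pen if r == c else 0.0 for c in range(J)] for r in range(J)]
    b = [0.0] * J
    for u, pairs in zip(weights, obs):
        for eta, z in pairs:
            for r in range(J):
                b[r] += u * eta[r] * z
                for c in range(J):
                    a[r][c] += u * eta[r] * eta[c]
    return solve(a, b)


# Hypothetical data with K = 1 equation per observation.
rng = random.Random(1)
obs = []
for _ in range(100):
    eta = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    obs.append([(eta, eta[0] - eta[1] + rng.gauss(0.0, 1.0))])
theta_tilde = bootstrap_mle(obs, 1.0, [1.0] * len(obs))  # u_i = 1 gives the MLE
weights = [rng.gauss(1.0, 1.0) for _ in obs]             # u_i ~ N(1, 1)
print(theta_tilde, bootstrap_mle(obs, 1.0, weights))     # MLE and one replicate
```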

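A centered hypothesis and the respective test are defined accordingly:

$$H_0^\flat:\ \tilde\theta\in\{\Pi(\theta-\tilde\theta)=0\},$$
$$T_{BLR}\stackrel{\text{def}}{=}\sup_{\theta}L^\flat(\theta)-\sup_{\theta\in H_0^\flat}L^\flat(\theta).\tag{3.4}$$

Analogously, define the bootstrap quantile $z^\flat_\alpha:\ \mathbb{P}^\flat\big((T_{BLR}-J)/\sqrt{J}>z^\flat_\alpha\big)=\alpha$, where $\mathbb{P}^\flat$ denotes the probability with respect to the multipliers conditional on the data. Theorem 4.4 yields the same convergence in a growing dimension $J$.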

Under the parametric assumption (the non-parametric bias is zero), the bootstrap log-likelihood ratio test is empirically attainable, and its quantile $z^\flat_\alpha$ can be computed explicitly. On the other hand, the unattainable quantile $z_\alpha$ calibrates $T_{LR}$. There is a direct correspondence between the two: Section 5 demonstrates that $z^\flat_\alpha$ can be used instead of $z_\alpha$.

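Multiplier bootstrap procedure: (3.5)

- Sample the multipliers $\{u_i\}$ and compute $z^\flat_\alpha$ satisfying $\mathbb{P}^\flat\big((T_{BLR}-J)/\sqrt{J}>z^\flat_\alpha\big)=\alpha$.

- Test $H_0$ against $H_1$ using the inequalities

$$H_0:\ T_{LR}\le J+z^\flat_\alpha\sqrt{J},\qquad H_1:\ T_{LR}>J+z^\flat_\alpha\sqrt{J}.$$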
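A minimal end-to-end sketch of procedure (3.5), under assumptions that are not part of the text: $\Pi$ selects a subset of coordinates, the penalty is $\lambda\|\theta\|^2/(2nK)$ per equation as in the definition of $L^\flat$, the data are hypothetical, and $\mathbb{P}^\flat$ is approximated by Monte Carlo over `draws` multiplier samples. It illustrates the steps and is not the author's implementation.

```python
import math
import random


def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_fit(obs, lam, u, fixed):
    """Stationary point of sum_i u_i l_i(theta) with theta_j = fixed[j] for j in fixed."""
    n, J = len(obs), len(obs[0][0][0])
    free = [j for j in range(J) if j not in fixed]
    pen = lam * sum(u) / n
    a = [[pen if r == c else 0.0 for c in range(len(free))] for r in range(len(free))]
    b = [0.0] * len(free)
    for ui, pairs in zip(u, obs):
        for eta, z in pairs:
            z_free = z - sum(eta[j] * v for j, v in fixed.items())  # fixed part moved to Z
            for r, jr in enumerate(free):
                b[r] += ui * eta[jr] * z_free
                for c, jc in enumerate(free):
                    a[r][c] += ui * eta[jr] * eta[jc]
    theta = [0.0] * J
    for j, v in list(fixed.items()) + list(zip(free, solve(a, b))):
        theta[j] = v
    return theta


def weighted_loglik(theta, obs, lam, u):
    """sum_i u_i l_i(theta), with l_i as in the definition of L_flat."""
    n, K = len(obs), len(obs[0])
    pen = lam * sum(t * t for t in theta) / (2 * n * K)
    return sum(ui * sum(-0.5 * (z - sum(e * t for e, t in zip(eta, theta))) ** 2 - pen
                        for eta, z in pairs) for ui, pairs in zip(u, obs))


def lr_stat(obs, lam, u, fixed):
    """Supremum over all theta minus supremum over the hypothesis `fixed`."""
    full, h0 = weighted_fit(obs, lam, u, {}), weighted_fit(obs, lam, u, fixed)
    return weighted_loglik(full, obs, lam, u) - weighted_loglik(h0, obs, lam, u)


def multiplier_bootstrap_test(obs, lam, tested, alpha=0.05, draws=500, seed=0):
    """Procedure (3.5) for H0: theta_j = 0, j in `tested`; returns (T_LR, z_flat, reject)."""
    rng = random.Random(seed)
    n, J = len(obs), len(obs[0][0][0])
    ones = [1.0] * n
    t_lr = lr_stat(obs, lam, ones, {j: 0.0 for j in tested})
    theta_tilde = weighted_fit(obs, lam, ones, {})
    centred = {j: theta_tilde[j] for j in tested}  # H0_flat: Pi(theta - theta_tilde) = 0
    stats = []
    for _ in range(draws):
        u = [rng.gauss(1.0, 1.0) for _ in range(n)]  # multipliers u_i ~ N(1, 1)
        stats.append((lr_stat(obs, lam, u, centred) - J) / math.sqrt(J))
    stats.sort()
    z_flat = stats[max(0, math.ceil((1 - alpha) * draws) - 1)]  # empirical upper quantile
    return t_lr, z_flat, t_lr > J + z_flat * math.sqrt(J)


# Hypothetical data satisfying H0 for coordinate 1 (K = 1 equation per observation).
rng = random.Random(42)
obs = []
for _ in range(200):
    eta = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    obs.append([(eta, eta[0] + rng.gauss(0.0, 1.0))])
print(multiplier_bootstrap_test(obs, lam=1.0, tested=[1]))
```

Since the same affine normalization $(\cdot-J)/\sqrt{J}$ is applied to both statistics, the decision is equivalent to comparing $T_{LR}$ with the empirical $(1-\alpha)$-quantile of $T_{BLR}$.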

The idea is validated numerically in Section 6. Its theoretical justification follows.

## 4 Finite sample theory

In the most general case, neither is the objective guaranteed to estimate the target consistently, nor is the model (2.1) justified as suitable for an arbitrary sample size $n$. Moreover, regression with instrumental variables raises an additional concern: the chosen instruments can be weak (see Section 7.1), and inference might then require a separate test for weakness, which complicates the original problem.

The finite sample approach (Spokoiny 2012) makes it possible to marry the structure of the objective with the properties of the probability space (2.1) and to account automatically for the unknown nature of the instruments in a regression problem.

 Finite sample theory: (4.1)
###### Theorem 4.1.

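Suppose conditions (4.1) are fulfilled. Define the score vector

$$\xi\stackrel{\text{def}}{=}\big(-\Delta\mathbb{E}L(\theta^*)\big)^{-1/2}\nabla L(\theta^*).$$

Then, with a universal constant $C$,

$$\Big|\sqrt{2L(\tilde\theta,\theta^*)}-\|\xi\|\Big|\le C\,\frac{J+x}{\sqrt{Kn}},$$

where $L(\tilde\theta,\theta^*)=L(\tilde\theta)-L(\theta^*)$, with dominating probability.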
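A numerical sanity check of the Wilks expansion in the simplest setting, under assumptions made only for this illustration: the unpenalized quadratic quasi log-likelihood (2.15) with the design treated as fixed, so that $-\Delta\mathbb{E}L(\theta^*)=D^2=\sum\eta\eta^T$. A second-order Taylor expansion is then exact, one Newton step from $\theta^*$ reaches $\tilde\theta$, and $2\{L(\tilde\theta)-L(\theta^*)\}=\|\xi\|^2$ with no remainder. The quantity $\|\xi\|^2=\nabla L^TD^{-2}\nabla L$ is computed by a linear solve instead of a matrix square root; the data are hypothetical.

```python
import random


def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def loglik(theta, etas, zs):
    """Quadratic quasi log-likelihood (2.15) with the pairs (i, k) flattened."""
    return -0.5 * sum((z - sum(e * t for e, t in zip(eta, theta))) ** 2
                      for eta, z in zip(etas, zs))


def wilks_check(etas, zs, theta_star):
    """Returns (2 {L(theta_tilde) - L(theta_star)}, ||xi||^2)."""
    J = len(theta_star)
    d2 = [[sum(eta[r] * eta[c] for eta in etas) for c in range(J)] for r in range(J)]
    resid = [z - sum(e * t for e, t in zip(eta, theta_star)) for eta, z in zip(etas, zs)]
    grad = [sum(eta[r] * res for eta, res in zip(etas, resid)) for r in range(J)]
    step = solve(d2, grad)                          # D^{-2} grad L(theta_star)
    xi_sq = sum(g * s for g, s in zip(grad, step))  # ||xi||^2
    theta_tilde = [t + s for t, s in zip(theta_star, step)]  # exact maximiser
    return 2.0 * (loglik(theta_tilde, etas, zs) - loglik(theta_star, etas, zs)), xi_sq


rng = random.Random(3)
theta_star = [0.5, -1.0, 2.0]
etas = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
zs = [sum(e * t for e, t in zip(eta, theta_star)) + rng.gauss(0.0, 1.0) for eta in etas]
lhs, rhs = wilks_check(etas, zs, theta_star)
print(lhs, rhs)  # equal up to rounding error
```

In the general case the remainder is not zero; Theorem 4.1 bounds it by $C(J+x)/\sqrt{Kn}$.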

A bootstrap analogue of the Wilks expansion also holds. It was stated as Theorem B.4 in Section B.2 of Spokoiny and Zhilova (2015).

###### Theorem 4.2.

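Suppose conditions (4.1) are fulfilled. Define the bootstrap score vector

$$\xi^\flat\stackrel{\text{def}}{=}\big(-\Delta\mathbb{E}L(\theta^*)\big)^{-1/2}\nabla\big(L^\flat(\theta^*)-L(\theta^*)\big).$$

Then, with a universal constant $C$,

$$\Big|\sqrt{2L^\flat(\tilde\theta^\flat,\tilde\theta)}-\|\xi^\flat\|\Big|\le C\,\frac{J+x}{\sqrt{Kn}}$$

with dominating probability.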

Moreover, the log-likelihood ratio statistic admits the same local approximation in the context of hypothesis testing, and $T_{LR}$ satisfies the following bound (see the appendix, Section 8.5).

###### Theorem 4.3.

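Assume conditions (4.1) are satisfied. Then, with a universal constant $C$,

$$\Big|\sqrt{2T_{LR}}-\|\xi_s\|\Big|\le C\,\frac{J+x}{\sqrt{Kn}}$$

with dominating probability. The score vector is defined as

$$\xi_s\stackrel{\text{def}}{=}\big(D_0^2\big)^{-1/2}\Big(\nabla_{\Pi\theta}L(\theta^*)-(I-\Pi)\Delta\mathbb{E}L(\theta^*)\Pi^T\big((I-\Pi)\Delta\mathbb{E}L(\theta^*)(I-\Pi)^T\big)^{-1}\nabla_{(I-\Pi)\theta}L(\theta^*)\Big),$$

and the Fisher information matrix is

$$D_0^2\stackrel{\text{def}}{=}-\Pi\Delta\mathbb{E}L(\theta^*)\Pi^T+(I-\Pi)\Delta\mathbb{E}L(\theta^*)\Pi^T\big((I-\Pi)\Delta\mathbb{E}L(\theta^*)(I-\Pi)^T\big)^{-1}\Pi\Delta\mathbb{E}L(\theta^*)(I-\Pi)^T.$$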
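Reading $\Pi$ as the selection of the tested coordinates (an assumption of this sketch), the expression for $D_0^2$ is the Schur complement of the nuisance block in $F=-\Delta\mathbb{E}L(\theta^*)$: $D_0^2=F_{aa}-F_{ab}F_{bb}^{-1}F_{ba}$, the information about the tested coordinates that remains after the nuisance coordinates are profiled out. Equivalently, $(D_0^2)^{-1}$ is the tested block of $F^{-1}$. The matrix $F$ in the example is hypothetical.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def efficient_information(F, tested):
    """Schur complement F_aa - F_ab F_bb^{-1} F_ba, a = tested, b = remaining coordinates."""
    rest = [j for j in range(len(F)) if j not in tested]
    F_bb = [[F[r][c] for c in rest] for r in rest]
    # column c of F_bb^{-1} F_ba, one linear solve per tested coordinate
    cols = {c: solve(F_bb, [F[j][c] for j in rest]) for c in tested}
    return [[F[r][c] - sum(F[r][j] * x for j, x in zip(rest, cols[c])) for c in tested]
            for r in tested]


F = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical -Delta E L(theta*)
print(efficient_information(F, tested=[0]))  # [[1.5]]
```

When the nuisance coordinates are uncorrelated with the tested ones ($F_{ab}=0$), nothing is lost and $D_0^2=F_{aa}$.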

A similar statement can be proven in the bootstrap world.

###### Theorem 4.4.

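Assume conditions (4.1) are fulfilled. Then, with dominating probability,

$$\Big|\sqrt{2T_{BLR}}-\|\xi_s^\flat\|\Big|\le C\,\frac{J+x}{\sqrt{Kn}},$$

with a universal constant $C$, where the score vector is given by

$$\xi_s^\flat\stackrel{\text{def}}{=}\big(D_0^2\big)^{-1/2}\Big(\nabla_{\Pi\theta}L^\flat(\theta^*)-(I-\Pi)\Delta\mathbb{E}L(\theta^*)\Pi^T\big((I-\Pi)\Delta\mathbb{E}L(\theta^*)(I-\Pi)^T\big)^{-1}\nabla_{(I-\Pi)\theta}L^\flat(\theta^*)\Big).$$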

The theorem is effectively the same for $T_{BLR}$, since the re-sampling procedure replicates the assumptions on the quasi log-likelihood that are sufficient for the statement (shown in the appendix, Section 8.3).

### 4.1 Small Modelling Bias

In view of the re-sampling justification, the small modelling bias condition from Spokoiny and Zhilova (2015) deserves a separate discussion. The condition arises from the general way of proving the validity of the re-sampling procedure. Namely, for a small error term it is claimed that

 supt|IP(TLR

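with the matrices

$$H_0^2=\sum_{i=1}^{n}\mathbb{E}\nabla\ell_i(\theta^*)\nabla^T\ell_i(\theta^*)\quad\text{and}\quad B_0^2=\sum_{i=1}^{n}\nabla\mathbb{E}\ell_i(\theta^*)\nabla^T\mathbb{E}\ell_i(\theta^*),$$

where the term $\|H_0^{-1}B_0^2H_0^{-1}\|_{op}$ is assumed to be of the order of the error, which essentially means that the deterministic bias is small. However, the assumption

$$\|H_0^{-1}B_0^2H_0^{-1}\|_{op}\sim\text{error}$$

appears in the current development only in the form of the condition 'Target' in (4.1). The substitution is possible due to the next theorem.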

###### Theorem 4.5.

Assume that the condition 'Target' holds; then $B_0^2=0$.

###### Proof.

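By definition of the target of estimation,

$$\sum_{i=1}^{N}\nabla\mathbb{E}\ell_i(\theta^*_0)=0\quad\text{and}\quad\nabla\mathbb{E}\ell_j(\theta^*_1)+\sum_{i=1}^{N}\nabla\mathbb{E}\ell_i(\theta^*_1)=0.$$

The condition 'Target' implies that $\theta^*_0=\theta^*_1$. Hence the term with any particular index $j$ is also zero: $\nabla\mathbb{E}\ell_j(\theta^*_0)=0$. Thus, $B_0^2=0$ and the statement follows. ∎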

## 5 Gaussian comparison and approximation

Two results constitute the basis for the re-sampling procedure (3.5). The first, a Gaussian comparison, is taken from Götze, Naumov, Spokoiny and Ulyanov and adapted to the needs and notation of this work.

###### Theorem 5.1.

Let $\xi_1$ and $\xi_2$ be centered Gaussian vectors; then it holds that

 supt|IP(∥ξ1∥

with a universal constant $C$, where $\|\cdot\|_{op}$ stands for the operator norm of a matrix.

The second, a Gaussian approximation, is developed in the appendix (Section 8.7).

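Introduce the notation for the vectors

$$\xi_1\stackrel{\text{def}}{=}\sum_{i=1}^{n}\xi_{1,i}\quad\text{and}\quad\xi_0\stackrel{\text{def}}{=}\sum_{i=1}^{n}\xi_{0,i}$$

such that

1. the summands $\{\xi_{1,i}\}$ and $\{\xi_{0,i}\}$ are independent and sub-Gaussian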

2. .

Then a simplified version of Theorem 8.27 from the appendix holds.

###### Theorem 5.2.

Assume the framework above; then

 supt|IP(∥ξ1∥

with a universal constant $C$.

Finally, the critical value $z_\alpha$ and its empirical counterpart $z^\flat_\alpha$ are glued together by the matrix concentration inequalities from Section 8.6.

The essence of the re-sampling is to reduce the closeness of $T_{LR}$ and $T_{BLR}$ to the closeness of the covariance matrices of the score vectors (with the help of the Wilks expansions, Theorems 4.3 and 4.4, and the Gaussian comparison result) and to approximate the unknown score vectors by their Gaussian counterparts. All of this amounts to the central theorem.

###### Theorem 5.3.

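The parametric model (2.4), i.e. with zero non-parametric bias ($\delta_k=0$), under the assumptions (4.1) enables

$$\Big|\mathbb{P}\big((T_{LR}-J)/\sqrt{J}>z^\flat_\alpha\big)-\alpha\Big|\le C_0\,\frac{J^{3/2}}{\sqrt{Kn}}+C_1\sqrt{\frac{J\log J+x}{Kn}}$$

with dominating probability and universal constants $C_0, C_1$.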

###### Remark 5.1.

Note that the critical value $z^\flat_\alpha$ depends on the experimental data at hand and is treated as fixed when the expectation is taken with respect to the data-generating distribution.

## 6 Numerical: conditional and bootstrap log-likelihood ratio tests

The BLR test is calibrated on a model from Andrews, Moreira and Stock. In that paper the authors proposed the conditional likelihood ratio (CLR) test, which is used here as a benchmark. The simulated model reads

$$Y_1=Z^T\pi\beta+\varepsilon_1,\tag{6.1}$$
$$Y_2=Z^T\pi+\varepsilon_2,\tag{6.2}$$

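where the instruments $Z$, the coefficients $\pi$ and the error covariance matrix are specified as in section 1. The hypothesis

$$H_0:\ \beta^*=\beta_0\quad\text{against}\quad H_1:\ \beta^*\neq\beta_0$$

concerns the value of the structural parameter $\beta$. For this hypothesis, Moreira and, later, Andrews, Moreira and Stock construct a CLR test based on the two vectors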

$$S=(Z^TZ)^{-1/2}Z^TYb\,(b^T\Omega b)^{-1/2}$$

and

$$T=(Z^TZ)^{-1/2}Z^TY\Omega^{-1}a\,(a^T\Omega^{-1}a)^{-1/2}$$

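with the notation $Y=[Y_1,Y_2]$, $b=(1,-\beta_0)^T$ and $a=(\beta_0,1)^T$, where $\Omega$ is the covariance matrix of the errors. $S$ and $T$ are independent and together form sufficient statistics for the model (6.1), (6.2), with only $T$ depending on the strength of identification of the instruments; this motivates conditioning on $T$ in the CLR test. The log-likelihood ratio statistic in (6.1) can be represented as (see Moreira 2003)

$$T_{LR}=\frac{1}{2}\Big(S^TS-T^TT+\sqrt{(S^TS-T^TT)^2+4(S^TT)^2}\Big).$$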

Additionally, the Lagrange multiplier and Anderson-Rubin statistics are given by

$$T_{LM}=\frac{(S^TT)^2}{T^TT},\qquad T_{AR}=\frac{S^TS}{J}.$$
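Given the vectors $S$ and $T$, the three statistics are elementary functions of $S^TS$, $T^TT$ and $S^TT$. The sketch takes $S$ and $T$ as inputs (computing them requires $(Z^TZ)^{-1/2}$ and $\Omega$) and uses Moreira's (2003) form of the LR statistic, $\frac12\big(S^TS-T^TT+\sqrt{(S^TS-T^TT)^2+4(S^TT)^2}\big)$; the function name is illustrative.

```python
import math


def moreira_statistics(S, T):
    """LR (Moreira's form, with the factor 1/2), LM and AR statistics from S and T."""
    ss = sum(s * s for s in S)
    tt = sum(t * t for t in T)
    st = sum(s * t for s, t in zip(S, T))
    lr = 0.5 * (ss - tt + math.sqrt((ss - tt) ** 2 + 4.0 * st ** 2))
    lm = st ** 2 / tt
    ar = ss / len(S)  # len(S) is the number of instruments, J in the text
    return lr, lm, ar


print(moreira_statistics([3.0, 0.0], [1.0, 0.0]))  # (9.0, 9.0, 4.5)
print(moreira_statistics([1.0, 0.0], [0.0, 1.0]))  # (0.0, 0.0, 0.5)
```

When $S$ is collinear with $T$, the LR and LM statistics both reduce to $S^TS$, as the first example shows; when $S\perp T$, both vanish while AR does not.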

The LM and AR tests are known to perform acceptably except in the weakly identified case.

First, a correctly specified model is generated for a sample with weak instruments. In this case the powers of the tests, together with the true test, are shown in Figure 8.1. For consistency, the comparison is also extended to the remaining statistics; it is given in Figure 8.2, and the corresponding data are aggregated in Table 1.

Moreover, an important step is to check how robust the test is to misspecification of the model. Three special examples are simulated:

1. ,

2. ,

3. .

Experiment 1 is shown in Figures 8.3 and 8.4 and in Table 2. The numerical study of experiment 2, with a misspecified heteroskedastic error, is given in Figure 8.5 and collected in Table 3. The last experiment is shown in Figure 8.6 and in Table 4.

###### Remark 6.1.

All figures and tables are collected at the end of the work.

## 7 Strength of instrumental variables

In practice one wants to distinguish instruments by their strength. For clarity of exposition, this section considers the simplified log-likelihood (2.15), which identifies a complete model, with the Fisher information matrix

$$D_0^2=-\Delta\mathbb{E}L(\theta^*)=\sum_{i=1}^{n}\sum_{k=1}^{K}\mathbb{E}\,\eta_{ik}\eta_{ik}^T=\sum_{i=1}^{n}\sum_{k=1}^{K}\mathbb{E}\,W_{ki}\Psi(X_i)\Psi^T(X_i)W_{ki}.$$
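A rough way to gauge instrument strength in this setting is the largest eigenvalue of $D_0^2$, which equals $\sup_{\|u\|=1}\sum_{i,k}\mathbb{E}(u^T\Psi(X_i)W_{ki})^2$, the quantity in the denominator of Lemma 7.1. The sketch relies on assumptions not in the text: the expectation is replaced by the plain sample sum, the basis is the cosine basis on $[0,1]$, the unknown constant $C$ is dropped, and a "weak" instrument is mimicked by scaling a hypothetical instrument down. The largest eigenvalue is found by power iteration.

```python
import math
import random


def basis(x, J):
    """Cosine basis, orthonormal on [0, 1] (an illustrative choice of psi_j)."""
    return [1.0] + [math.sqrt(2.0) * math.cos(math.pi * j * x) for j in range(1, J)]


def fisher_information(xs, ws_by_k, J):
    """Sample analogue of D_0^2: sum_k sum_i W_ki^2 Psi(X_i) Psi(X_i)^T."""
    d2 = [[0.0] * J for _ in range(J)]
    for ws in ws_by_k:
        for x, w in zip(xs, ws):
            psi = basis(x, J)
            for r in range(J):
                for c in range(J):
                    d2[r][c] += w * w * psi[r] * psi[c]
    return d2


def lambda_max(m, iters=200):
    """Largest eigenvalue of a positive semi-definite matrix by power iteration."""
    v = [1.0] * len(m)
    lam = 0.0
    for _ in range(iters):
        mv = [sum(a * b for a, b in zip(row, v)) for row in m]
        lam = math.sqrt(sum(x * x for x in mv))
        v = [x / lam for x in mv]
    return lam


rng = random.Random(5)
n, J = 500, 4
xs = [rng.random() for _ in range(n)]
strong = [[x + 0.1 * rng.gauss(0.0, 1.0) for x in xs]]  # one instrument (K = 1)
weak = [[0.1 * w for w in strong[0]]]                   # same instrument, scaled down
for name, ws in (("strong", strong), ("weak", weak)):
    # J / lambda_max: the lower bound of Lemma 7.1 without the unknown constant C
    print(name, J / lambda_max(fisher_information(xs, ws, J)))
```

Scaling the instrument by $0.1$ divides $D_0^2$ by $100$ and multiplies the lower-bound factor by $100$, which is the mechanism behind the unavoidable error for weak instruments.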

Weak instrumental variables impose an unavoidable lower bound on the estimation error (Lemma 7.1; see the proof in Appendix 8.1).

###### Lemma 7.1.

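Let conditions (4.1) hold; then

$$\exists N>0\ \text{s.t.}\ \forall n>N\quad\mathbb{E}\|\tilde\theta-\theta^*\|^2\ge\frac{CJ}{\sup_{\|u\|=1}\sum_{i=1}^{n}\sum_{k=1}^{K}\mathbb{E}\big(u^T\Psi(X_i)W_{ki}\big)^2},$$

with a factor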