# Variable Selection with the Knockoffs: Composite Null Hypotheses

The Fixed-X knockoff filter is a flexible framework for variable selection with false discovery rate (FDR) control in linear models with arbitrary (non-singular) design matrices and it allows for finite-sample selective inference via the LASSO estimates. In this paper, we extend the theory of the knockoff procedure to tests with composite null hypotheses, which are usually more relevant to real-world problems. The main technical challenge lies in handling composite nulls in tandem with dependent features from arbitrary designs. We develop two methods for composite inference with the knockoffs, namely, shifted ordinary least-squares (S-OLS) and feature-response product perturbation (FRPP), building on new structural properties of test statistics under composite nulls. We also propose two heuristic variants of the S-OLS method that outperform the celebrated Benjamini-Hochberg (BH) procedure for composite nulls, which serves as a heuristic baseline under dependent test statistics. Finally, we analyze the loss in FDR when the original knockoff procedure is naively applied on composite tests.


## 1 Introduction

Selecting variables from a large collection of potential explanatory variables that are associated with responses of interest is a fundamental problem in many fields of science including genome-wide association study (GWAS), geophysics, and economics. In this paper, we focus on the classical linear regression model,

$$y = X\beta + w, \qquad w \sim \mathcal{N}(0, \sigma^2 I_n), \tag{1}$$

where $y$ and $w$ are $n$-dimensional random vectors with elements denoting response and error variables, respectively, $X \in \mathbb{R}^{n \times p}$ denotes a fixed design matrix containing $n$ samples of $p$ explanatory features/variables, and $\beta \in \mathbb{R}^p$ is the vector of unknown fixed coefficients relating $X$ and $y$. For the following hypotheses,

$$\begin{cases} H_{0,i}: \beta_i = 0 \\ H_{1,i}: \beta_i \neq 0 \end{cases}, \qquad 1 \le i \le p,$$

the problem of interest is to test these hypotheses while controlling a simultaneous measure of type I error called the false discovery rate (FDR), defined by Benjamini and Hochberg (1995) as follows,

$$\mathrm{FDR} := \mathbb{E}\left[\frac{|\hat{S} \cap S_0|}{\max(|\hat{S}|, 1)}\right], \tag{2}$$

where $S_0$ denotes the set of true null variables, $\hat{S}$ the set of variables selected by some variable selection procedure, and $|\cdot|$ the cardinality of a set. A selection rule controls the FDR at level $q$ if its corresponding FDR is guaranteed to be at most $q$ for some predetermined $q \in [0, 1]$.
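As a small illustration of definition (2), the sketch below computes the false discovery proportion (the quantity inside the expectation) for a single selection. The function name and the toy index sets are hypothetical and chosen only for this example; the FDR itself is the average of this proportion over repeated draws of the data.

```python
def false_discovery_proportion(selected, true_nulls):
    """FDP = |S_hat ∩ S_0| / max(|S_hat|, 1) for one selected set."""
    selected = set(selected)
    true_nulls = set(true_nulls)
    return len(selected & true_nulls) / max(len(selected), 1)

# Toy setting (hypothetical): variables 0..9, of which 0..6 are true nulls.
true_nulls = range(7)

# Four selections, one of them (index 1) is a true null.
fdp_a = false_discovery_proportion([7, 8, 9, 1], true_nulls)

# An empty selection has FDP 0 thanks to the max(., 1) in the denominator.
fdp_b = false_discovery_proportion([], true_nulls)

print(fdp_a, fdp_b)
```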

Recently, Barber and Candès proposed the (Fixed-X) knockoff filter procedure (Barber and Candès, 2015), a data-dependent selection rule that controls the FDR in finite-sample settings and under arbitrary designs. In this procedure, a test statistic is computed for each feature by constructing a knockoff variable, and features are selected by (data-dependent) thresholding of the statistics according to the target FDR. The knockoff construction allows for correlated features, and this framework has higher statistical power than the Benjamini-Hochberg (BH) procedure (Benjamini and Hochberg, 1995; Benjamini and Yekutieli, 2001; Storey et al., 2004; Efron et al., 2001) (using z-scores for the variables) in a range of settings (Barber and Candès, 2015). The knockoff filter has inspired various formulations such as the model-X knockoffs and deep-learning-based knockoffs, among others (Barber and Candès, 2019; Candes et al., 2018; Barber et al., 2020; Romano et al., 2019; Jordon et al., 2018; Lu et al., 2018; Fan et al., 2019; Pournaderi and Xiang, 2021).

The original fixed-X knockoff filter focuses on simple nulls ($\beta_i = 0$). However, in practice we are often interested in composite hypotheses rather than simple ones. The multiple testing of composite null hypotheses using (mutually) independent p-values has been studied in Benjamini and Yekutieli (2001); Sun and McLain (2012); Dickhaus (2013); Cabras (2010). Given that the knockoff selection framework inherently deals with the dependencies between statistics, a natural question is whether one can extend it to handle composite nulls, namely,

$$\begin{cases} H'_{0,i}: |\beta_i| \le \delta \\ H'_{1,i}: |\beta_i| > \delta \end{cases}, \qquad 1 \le i \le p,$$

for some given $\delta > 0$. In this paper, we provide an affirmative answer to the above question by developing two methods: shifted ordinary least-squares (S-OLS) and feature-response product perturbation (FRPP). We show that both methods achieve FDR control in finite-sample settings under arbitrary designs, leveraging new structural properties of test statistics under composite nulls. The main technical difficulty is to handle composite nulls in tandem with dependent features. This is highly nontrivial and, to the best of our knowledge, the closest existing solution is the BH procedure for composite nulls with independent test statistics (Benjamini and Yekutieli, 2001, Theorem 5.2), referred to as the composite BH in this paper. We use the composite BH as a heuristic baseline in our simulations, as its theoretical guarantees no longer hold for composite nulls with dependent test statistics. Our S-OLS method motivates two LASSO-based heuristic variants that outperform the composite BH in power. Furthermore, we quantify the loss in FDR when one uses the original fixed-X knockoff filter for composite nulls; the result reduces to exact FDR control for simple nulls.

The paper is organized as follows. In Section 2, we briefly present the knockoff filter framework by introducing its main steps to set the stage for our analysis. In Section 3, we present our main results and theoretical guarantees, along with two heuristic methods, with all proofs deferred to the appendices. In Section 4, we report our experimental results for a range of composite nulls, amplitudes of the alternatives, and correlation coefficients.

## 2 Background: Fixed-X Knockoff Filter

The knockoff variable selection procedure (Barber and Candès, 2015) consists of two main steps: (I) computing a statistic $W_i$ for each variable in the model, and (II) selecting $\hat{S} = \{i : W_i \ge T\}$, where the threshold $T$ depends on the target FDR and the set of computed statistics, i.e., $T = T(q, \{W_i\}_{i=1}^{p})$. In this section we look at this procedure in more detail.

### 2.1 Knockoff Design

The knockoff methodology for detecting non-null variables involves creating a fake (or knockoff) design $\tilde{X}$ that maintains the correlation structure of $X$ but breaks down the relationship between $\tilde{X}$ and $y$.

###### Assumption 1.

$\Sigma := X^\top X$ is invertible.

Specifically, if $n \ge 2p$, Barber and Candès (2015) propose the following construction to produce knockoff designs,

$$\tilde{X}(s) = X\left(I_p - \Sigma^{-1}\mathrm{diag}\{s\}\right) + \tilde{U} C, \tag{3}$$

where $s \in \mathbb{R}^p$ is a free vector of parameters as long as it satisfies $s_i \ge 0$ and $\mathrm{diag}\{s\} \preceq 2\Sigma$, $C$ is obtained by Cholesky decomposition of the Schur complement of the Gram matrix $G$ in (4), i.e., $C^\top C = 2\,\mathrm{diag}\{s\} - \mathrm{diag}\{s\}\Sigma^{-1}\mathrm{diag}\{s\}$, and $\tilde{U} \in \mathbb{R}^{n \times p}$ is an orthonormal matrix that satisfies $\tilde{U}^\top X = 0$ (see Barber and Candès (2015) for details). Therefore, knockoff matrices are not unique, and they are constructed according to the original design matrix. Using (3), the following relation for the Gram matrix of the augmented design $[X\ \tilde{X}]$ can be easily verified,

$$G := [X\ \tilde{X}]^\top [X\ \tilde{X}] = \begin{bmatrix} \Sigma & \Sigma - \mathrm{diag}\{s\} \\ \Sigma - \mathrm{diag}\{s\} & \Sigma \end{bmatrix}. \tag{4}$$

In fact, this construction not only preserves the correlation structure of $X$, but also has another subtle yet important geometrical implication: $X_i$ (the $i$-th column of $X$) and $\tilde{X}_i$ are second-order "exchangeable" in a deterministic sense, i.e., swapping $X_i$ and $\tilde{X}_i$ does not change the inner product structure (Gram matrix) of the augmented design. This property makes $\tilde{X}$ an appropriate tool for FDR control. It should be noted that the detection power depends strongly on the parameter $s$, as it determines the angle between a feature and its corresponding knockoff. In other words, $s_i$ (the $i$-th element of $s$) controls how different (or orthogonal) $X_i$ and $\tilde{X}_i$ are. Assuming the columns of $X$ are normalized by the Euclidean norm, one way to choose $s$ is to solve the following convex problem,

$$\begin{aligned} \text{minimize} \quad & \sum_{i=1}^{p} |1 - s_i| \\ \text{subject to} \quad & s_i \ge 0, \quad \mathrm{diag}\{s\} \preceq 2\Sigma. \end{aligned} \tag{5}$$

This semidefinite program minimizes the average correlation between the variables and their corresponding knockoff variables.
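The construction (3) and the Gram structure (4) can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: instead of solving the semidefinite program (5), it assumes an equicorrelated-style choice with all $s_i$ equal, and the `shrink` factor, the `seed`, and the function name are hypothetical parameters introduced only to keep the Cholesky factor well defined in this example.

```python
import numpy as np

def fixed_x_knockoffs(X, shrink=0.9, seed=0):
    """Build a knockoff matrix following (3), with all entries of s equal.

    `shrink` and `seed` are illustrative parameters, not taken from the paper.
    """
    n, p = X.shape
    if n < 2 * p:
        raise ValueError("this construction needs n >= 2p")
    Sigma = X.T @ X
    lam_min = np.linalg.eigvalsh(Sigma)[0]  # eigenvalues come back in ascending order
    s = np.full(p, shrink * min(2.0 * lam_min, 1.0))
    D = np.diag(s)
    Sigma_inv = np.linalg.inv(Sigma)
    # C satisfies C^T C = 2 diag{s} - diag{s} Sigma^{-1} diag{s}
    A = 2.0 * D - D @ Sigma_inv @ D
    C = np.linalg.cholesky(A).T
    # U_tilde: p orthonormal columns orthogonal to the column space of X,
    # taken from a QR factorization of X padded with random columns.
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(np.hstack([X, rng.standard_normal((n, p))]))
    U_tilde = Q[:, p:2 * p]
    X_tilde = X @ (np.eye(p) - Sigma_inv @ D) + U_tilde @ C
    return X_tilde, s

# Toy design with unit-norm columns (sizes chosen arbitrarily).
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 5))
X /= np.linalg.norm(X, axis=0)

X_tilde, s = fixed_x_knockoffs(X)
Sigma = X.T @ X

# Both blocks of the Gram matrix (4) should be reproduced.
print(np.allclose(X_tilde.T @ X_tilde, Sigma))
print(np.allclose(X.T @ X_tilde, Sigma - np.diag(s)))
```

Taking `shrink` strictly below 1 keeps $\mathrm{diag}\{s\} \prec 2\Sigma$, so the matrix passed to the Cholesky routine is positive definite.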

###### Remark 1.

Since a column-wise normalization of the design matrix is natural for variable selection and essential in terms of statistical power, throughout this paper we always assume that the columns of $X$ are normalized to unit Euclidean norm.

### 2.2 Statistics

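Using the knockoff features, we now compute a vector of anti-symmetric statistics $W$ by regression over the augmented design $[X\ \tilde{X}]$. Let $\hat{\theta}_i$ and $\hat{\theta}'_i$ denote some estimated parameters corresponding to the variables $X_i$ and $\tilde{X}_i$. In this case, one can define a statistic as follows,

$$W_i = |\hat{\theta}_i| - |\hat{\theta}'_i|, \qquad 1 \le i \le p. \tag{6}$$

We can also define the statistics differently,

$$W_i = \mathrm{sgn}\left(|\hat{\theta}_i| - |\hat{\theta}'_i|\right) \max\left(|\hat{\theta}_i|, |\hat{\theta}'_i|\right). \tag{7}$$

To be more precise, the term anti-symmetric here means that swapping the estimates $\hat{\theta}_i$ and $\hat{\theta}'_i$ for any subset of indices has the effect of switching the signs of the corresponding $W_i$. Specifically, the knockoff framework guarantees FDR control when the (anti-symmetric) statistics are computed based on estimators that depend on the data through the following form,

$$\hat{\theta}_{2p \times 1} = \mathcal{T}\left(G, [X\ \tilde{X}]^\top y\right), \tag{8}$$

where $\mathcal{T}$ is a deterministic operator, and swapping $X_i$ and $\tilde{X}_i$ results in swapping $\hat{\theta}_i$ and $\hat{\theta}'_i$. For instance, the LASSO (Tibshirani, 1996) regression estimates given by

$$\hat{\beta}^{\mathrm{LAS}}(\lambda) = \arg\min_{b \in \mathbb{R}^{2p}} \left\{ \tfrac{1}{2} \left\| y - [X\ \tilde{X}]\, b \right\|_2^2 + \lambda \|b\|_1 \right\} \tag{9}$$

can be considered as an example of estimators satisfying (8).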
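The two statistics (6) and (7), and the anti-symmetry property described above, can be sketched as follows. The coefficient values are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def w_difference(theta, theta_knock):
    """Statistic (6): |theta_i| - |theta'_i|."""
    return np.abs(theta) - np.abs(theta_knock)

def w_signed_max(theta, theta_knock):
    """Statistic (7): sgn(|theta_i| - |theta'_i|) * max(|theta_i|, |theta'_i|)."""
    a, b = np.abs(theta), np.abs(theta_knock)
    return np.sign(a - b) * np.maximum(a, b)

# Made-up estimates for the original variables and their knockoffs.
theta = np.array([2.0, -0.3, 0.0, 1.1])
theta_knock = np.array([0.5, 0.8, 0.0, -1.4])

W6 = w_difference(theta, theta_knock)
W7 = w_signed_max(theta, theta_knock)

# Swap the estimates of variables 1 and 3 with those of their knockoffs.
swap = np.array([False, True, False, True])
theta_sw = np.where(swap, theta_knock, theta)
theta_knock_sw = np.where(swap, theta, theta_knock)

# Anti-symmetry: the swapped coordinates flip sign, the rest are unchanged.
flip = np.where(swap, -1.0, 1.0)
print(np.allclose(w_difference(theta_sw, theta_knock_sw), flip * W6))
print(np.allclose(w_signed_max(theta_sw, theta_knock_sw), flip * W7))
```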

### 2.3 FDR Control

In this subsection, we briefly discuss the existing approaches to showing FDR control in the knockoff procedure. Let $P_F$ denote the permutation matrix corresponding to swapping $X_i$ and $\tilde{X}_i$ for all $i \in F$; then, by the structure of the knockoff matrix, we have

$$P_F^\top G P_F = G, \qquad F \subseteq \{1, \dots, p\}. \tag{10}$$

Also, relying on (1), we get

$$P_F^\top [X\ \tilde{X}]^\top y \overset{d}{=} [X\ \tilde{X}]^\top y, \qquad F \subseteq S_0. \tag{11}$$

The identities (10) and (11) immediately imply an interesting property of the estimates $\hat{\theta}$: the estimated parameters for null variables and their corresponding knockoff variables are exchangeable. This property, along with the anti-symmetric structure of the statistics, is the main ingredient of the FDR control proof in the original paper. In fact, under the simple null hypotheses, these properties lead to the so-called i.i.d. sign property of the nulls, i.e., the signs of the null statistics (signs of $W_i$, $i \in S_0$) are independent of the magnitudes and have an i.i.d. Rademacher distribution (i.e., $\mathbb{P}(\mathrm{sgn}(W_i) = 1) = \mathbb{P}(\mathrm{sgn}(W_i) = -1) = 1/2$). In this case, using martingale theory, it is shown that rejecting $H_{0,i}$ whenever $W_i \ge T$, with the following threshold $T$, controls the FDR at level $q$,

$$T = \inf\left\{ t \in \Psi : \widehat{\mathrm{FDP}}(t) \le q \right\}, \tag{12}$$

$$\widehat{\mathrm{FDP}}(t) := \frac{1 + \#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\} \vee 1}, \tag{13}$$

where $\Psi := \{|W_j| : 1 \le j \le p\} \setminus \{0\}$. Although this approach is elegant, we shall not follow it in the development of composite tests, since in this situation (11) immediately fails to hold. On the other hand, Barber et al. (2020) provide another proof of FDR control which requires weaker conditions on the statistics (Barber et al., 2020, equation (16)). (The results in Barber et al. (2020) concern robustness of the Model-X knockoff framework, where $X$ is random, but the FDR control proof works for the Fixed-X setting as well, since it only relies on the antisymmetry of the statistics and (14).) Specifically, it suggests that if the null statistics satisfy

$$\mathbb{P}\left\{ W_j > 0 \,\middle|\, |W_j|, W_{-j} \right\} \le C \cdot \mathbb{P}\left\{ W_j < 0 \,\middle|\, |W_j|, W_{-j} \right\}, \tag{14}$$

for some constant $C$, then the knockoff procedure with the target FDR $q$ controls the FDR at level $Cq$. It is straightforward to verify that in the case of simple nulls, this condition holds with $C = 1$ according to the i.i.d. sign property of the null statistics, resulting in FDR control.
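The thresholding rule (12)-(13) can be sketched in a few lines. This is a minimal illustration under the assumption that the candidate set $\Psi$ consists of the nonzero magnitudes $|W_j|$, as in Barber and Candès (2015); the function names and the example statistics are hypothetical.

```python
import numpy as np

def knockoff_threshold(W, q):
    """Smallest t among the nonzero |W_j| with estimated FDP (13) at most q."""
    W = np.asarray(W, dtype=float)
    candidates = np.unique(np.abs(W[W != 0]))  # np.unique returns sorted values
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf  # no admissible threshold: nothing is selected

def knockoff_select(W, q):
    """Indices i with W_i >= T."""
    T = knockoff_threshold(W, q)
    return np.flatnonzero(np.asarray(W, dtype=float) >= T)

# Made-up statistics: large positive values suggest non-nulls.
W = np.array([5.0, 4.0, 3.5, -1.0, 3.0, 2.5, -0.5, 2.0, 1.5, 0.2])
T = knockoff_threshold(W, q=0.2)
selected = knockoff_select(W, q=0.2)
print(T, selected)
```

If a bound of the form (14) holds with a constant larger than one, the same routine can be run with a smaller target level to compensate.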

## 3 Main Results

In the case of a single composite test, the common approach is to compute super-uniform p-values under the null (i.e., $\mathbb{P}(\text{p-value} \le u) \le u$ for all $u \in [0, 1]$), which clearly controls the probability of type I error by definition. This usually happens when the p-value is computed according to the distribution corresponding to a parameter on the boundary of the null region. However, in composite multiple testing problems, this argument gets more complicated because the dependencies between the statistics must be considered. The composite BH procedure (Benjamini and Yekutieli, 2001, Theorem 5.2) guarantees FDR control when the null p-values are super-uniform, but it is only valid for independent p-values. On the other hand, the knockoff filter is designed to utilize the model and covariate structure for computing one-bit p-values that handle the dependencies naturally. In developing the composite tests, it turns out that we can actually maintain this property of the knockoff procedure. To show FDR control, we rely on manipulating the estimates so that we can (upper) bound the following ratio by a constant $C$, as it is known that this leads to rigorous FDR control at level $Cq$ (Barber et al., 2020, Theorem 2),

$$\frac{\mathbb{P}\left\{ W_j > 0 \,\middle|\, |W_j|, W_{-j} \right\}}{\mathbb{P}\left\{ W_j < 0 \,\middle|\, |W_j|, W_{-j} \right\}}, \qquad j \in S_0. \tag{15}$$

(Recall that the knockoff procedure discards variables with $W_j = 0$, so $|W_j| > 0$ by definition.)

In fact, showing that the ratio (15) is at most $1$ means that the knockoff procedure overestimates the FDP, and can therefore be interpreted as the counterpart of super-uniformity of null p-values for the BH procedure. However, for bounds with $C > 1$, we need to correct the test size by a factor of $C$, i.e., use the target level $q/C$. In some cases, we use a generalization of this argument which leads to tighter bounds. More precisely, we bound the following quantity,

$$\frac{\mathbb{P}\left\{ W_j > 0, \ \mathcal{E}_j \,\middle|\, |W_j|, W_{-j} \right\}}{\mathbb{P}\left\{ W_j < 0 \,\middle|\, |W_j|, W_{-j} \right\}}, \tag{16}$$

where $\mathcal{E}_j$ is some event regarding the $j$-th variable.

###### Theorem 1 (Barber et al. (2020), Theorem 2).

If , we get .

This bound characterizes the FDR loss term and provides intuition and insight about the loss of FDR under different values of parameters. In the following subsection we present composite inference methods that allow for theoretical FDR control.

### 3.1 Composite Testing with FDR Control

The following theorem concerns the composite knockoff procedure based on ordinary least-squares (OLS). (Note that performing the knockoff procedure using the OLS estimator requires that $G$ is invertible; Lemma 1 gives a sufficient condition on $s$ for this.)

We consider both one-sided and two-sided null hypotheses. We show that shifting the estimates corresponding to the knockoff variables by $\delta$ results in exact FDR control in the former case. Regarding the latter case, we derive a bound on the FDR of this approach.

###### Theorem 2 (S-OLS).

Consider the knockoff procedure (target FDR=) and based on the estimates with some for all . Let and denote the -th and -th elements of .

(I) Consider testing for . If , then the procedure controls the FDR at level .

(II) Consider testing for . If , then

 FDR≤
###### Remark 2.

We note that . This evidences how larger shifts would help the theoretical FDR control.

###### Corollary 1 (Alternative method for Theorem 2 (II)).

Consider testing for but let . If , then

 FDR≤

Our next method generalizes the knockoff framework, i.e., it is not restricted to any specific estimator. This makes the situation difficult, since the analysis of the shifted estimates becomes infeasible. In this case, we propose to introduce artificial randomness into the procedure in order to perform composite inference in such a general setting. Specifically, we perturb the feature-response products by noise generated from a Laplace distribution. In this case, we will be able to show that (14) holds with a constant $C$ determined by the noise variance.

###### Theorem 3 (FRPP).

Fix some . Define the null variables and let where , . The knockoff procedure with the target FDR , and using (antisymmetric) statistics based on any estimator of the form controls the FDR at level .

###### Remark 3.

As we discussed in the previous section, the original knockoff framework focuses on the estimators of the form (8) that satisfy where is the (symmetric) permutation matrix swapping -th and -th elements. We also keep assuming this property as we refer to general estimators, e.g., we assume . Observe that this is a very mild assumption (since ) and is satisfied by almost every estimators of linear models.

We note that Theorem 3 gives a stochastic generalization of the knockoff procedure, i.e., when the perturbation vanishes, the FRPP method reduces to the original method without any additional assumptions.

### 3.2 Heuristic Methods

Motivated by our results that show the shifting argument is theoretically valid in case of using the OLS estimator, we propose the following two methods based on shifting the LASSO estimates.

Method S-LAS1: This method shifts the LASSO estimates just as in the S-OLS method. Namely, we use the following formula to compute the statistics,

$$\hat{\beta}^{\text{S-LAS1}} = \hat{\beta}^{\mathrm{LAS}}(\lambda) + \begin{pmatrix} 0 \\ \delta \end{pmatrix}.$$

Method S-LAS2: This method estimates the coefficients by solving the following LASSO problem.

where .

###### Remark 4.

We note that both methods reduce to the S-OLS method if $\lambda = 0$.
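To make the shifting step concrete, here is a minimal sketch that applies the displayed S-LAS1 formula literally: it adds $\delta$ to the knockoff half of a $2p$-dimensional estimate vector and then forms the signed max statistic (7). The function names and the numerical values are hypothetical, and the estimates could come from OLS or LASSO on the augmented design; this is an illustration of the shift only, not the authors' code.

```python
import numpy as np

def shift_knockoff_estimates(beta_hat, delta):
    """Add delta to the last p entries (knockoff coefficients) of a 2p-vector."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    p = beta_hat.size // 2
    shift = np.concatenate([np.zeros(p), np.full(p, delta)])
    return beta_hat + shift

def signed_max_statistics(beta_hat):
    """Statistic (7) computed from a stacked (original, knockoff) estimate."""
    p = beta_hat.size // 2
    a, b = np.abs(beta_hat[:p]), np.abs(beta_hat[p:])
    return np.sign(a - b) * np.maximum(a, b)

# Made-up estimates for p = 2: originals first, knockoffs second.
beta_hat = np.array([1.2, 0.3, -0.1, 0.2])
delta = 0.5

W_plain = signed_max_statistics(beta_hat)
W_shifted = signed_max_statistics(shift_knockoff_estimates(beta_hat, delta))
print(W_plain, W_shifted)
```

In this toy example the weak second coefficient (0.3, below $\delta$) yields a positive statistic without the shift and a negative one with it, while the strong first coefficient stays positive.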

### 3.3 FDR Bound for Naive Selection

###### Theorem 4.

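Naive application of the (fixed-X) knockoff procedure (with target FDR $q$) to the composite null hypotheses $H'_{0,i}: |\beta_i| \le \delta$, using (antisymmetric) statistics based on any estimator of the form (8), results in an FDR bounded as follows,

$$\mathrm{FDR} \le \min_{\epsilon \ge 0} \left\{ q \cdot e^{\epsilon} + \mathbb{P}\left( \frac{\delta}{\sigma^2} \max_{j \in S_0} \left| \gamma_j - \gamma'_j \right| > \epsilon \right) \right\}, \tag{17}$$

where $(\gamma, \gamma') := [X\ \tilde{X}]^\top y$.

###### Remark 5.

We note that the bound (17) reduces to $q$ in the case of simple nulls, i.e., $\delta = 0$. It also indicates that the FDR loss is negligible when $\delta$ is sufficiently small.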

## 4 Simulations

In this section we present simulation results on synthetic data sets for all methods. We set the sample size and dimension to be and , respectively. We generate samples (rows of ) i.i.d. according to where and then normalize the columns of by the -norm. The responses are generated according to the linear model (1) with noise variance and the number of true alternatives is . The composite null boundary is set to be , and we consider two distributions for generating coefficients corresponding to null variables:

(a) which is presented in the left column of the figures. We consider this setting as a practical case.

(b) , which is presented in the right column of the figures. This setting tries to examine the methods in the hardest situation (worst case) for FDR control.

The super uniform p-values for the composite BH procedure are computed according to . We adopt the equicorrelated knockoffs and set the elements of (the vector used in creating the knockoff matrices) to be , , and for the S-OLS, FRPP, and heuristic methods (S-LAS1 and S-LAS2), respectively. We adopt the structure (7) to construct the coefficient signed max statistics and use the LASSO estimator (9) with . The target FDR is and the plots are based on averaging 200 trials. The power is defined as follows,

$$\mathrm{Power} := \frac{1}{k}\, \mathbb{E}\left| \hat{S} \cap S_0^c \right|. \tag{18}$$

###### Remark 6.

In Barber and Candès (2015), it is suggested to set for equicorrelated knockoffs. However, this is not possible in case of using OLS estimator as it would result in a singular augmented design if (See Lemma 1). Regarding the FRPP method, note that unlike the deterministic methods larger values of does not result in higher detection power necessarily because the variance of the additive Laplacian noise is proportional to . In our experiments it turns out that for (no correlation) case setting the procedure reaches to its highest power when . However this is not the case for higher correlations since will be small. In this case the maximum power is reached when .
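For reference, the empirical counterparts of the FDR (2) and the power (18) used when averaging over trials can be sketched as below. The helper name, the tiny problem size, and the three hand-written selections are hypothetical and serve only to show the bookkeeping.

```python
import numpy as np

def empirical_fdr_and_power(selections, true_nulls, p):
    """Average FDP and average fraction of detected alternatives over trials."""
    nulls = set(true_nulls)
    k = p - len(nulls)  # number of true alternatives
    fdps, hits = [], []
    for sel in selections:
        sel = set(sel)
        fdps.append(len(sel & nulls) / max(len(sel), 1))
        hits.append(len(sel - nulls) / k)
    return float(np.mean(fdps)), float(np.mean(hits))

# Toy problem: p = 6 variables, nulls {0, 1, 2, 3}, alternatives {4, 5}.
selections = [[4, 5], [4, 0], []]
fdr_hat, power_hat = empirical_fdr_and_power(selections, range(4), p=6)
print(fdr_hat, power_hat)
```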

## 5 Discussion

The Fixed-X knockoff procedure is an elegant method for selective inference in general linear models with an FDR control guarantee. However, the composite extension of this method had not yet been developed. In this paper, we have investigated the Fixed-X knockoff filter approach to the variable selection problem with composite nulls and under arbitrary dependencies among statistics. The knockoff inference procedure handles the dependencies between variables very naturally by computing model-based statistics. We have shown that this structure is still useful under composite nulls and allows for the development of methods with theoretical FDR control guarantees. We have derived a full stochastic generalization of the knockoff procedure which has reasonable statistical power. We have shown that if we restrict ourselves to the ordinary least-squares estimates, the intuitive (and deterministic) method of shifting the estimates is theoretically valid. We have also derived a general bound on the FDR for cases where the original knockoff procedure is applied to a composite problem, without any additional assumptions.

## Appendix A Technical Lemmas

###### Lemma 1.

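If $s_i > 0$ for all $i$ and $\mathrm{diag}(s) \prec 2\Sigma$, then $G$ is invertible.

###### Proof.

Recall,

$$G = \begin{bmatrix} \Sigma & \Sigma - \mathrm{diag}(s) \\ \Sigma - \mathrm{diag}(s) & \Sigma \end{bmatrix}, \tag{19}$$

where $\Sigma = X^\top X$. We note,

$$\begin{bmatrix} I & -I \\ O & I \end{bmatrix} \begin{bmatrix} A & B \\ B & A \end{bmatrix} \begin{bmatrix} I & I \\ O & I \end{bmatrix} = \begin{bmatrix} A - B & O \\ B & A + B \end{bmatrix}.$$

Since the two outer matrices on the left-hand side are inverses of each other, this is a similarity transformation. Therefore, with $A = \Sigma$ and $B = \Sigma - \mathrm{diag}(s)$,

$$\det(G - \lambda I_{2p}) = \det\left(\mathrm{diag}(s) - \lambda I_p\right) \det\left(2\Sigma - \mathrm{diag}(s) - \lambda I_p\right),$$

and as a result, the set of eigenvalues of $G$ is the union of the eigenvalues of $A - B = \mathrm{diag}(s)$ and $A + B = 2\Sigma - \mathrm{diag}(s)$. If $s_i > 0$ for all $i$ and $\mathrm{diag}(s) \prec 2\Sigma$, then all of these eigenvalues are positive, which implies that $G$ is positive definite and therefore invertible. ∎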
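The eigenvalue statement in the proof of Lemma 1 is easy to check numerically. The sketch below builds $G$ directly from its block form (19) for a random normalized design; the sizes and the particular choice of $s$ are arbitrary assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
X = rng.standard_normal((40, p))
X /= np.linalg.norm(X, axis=0)          # unit-norm columns
Sigma = X.T @ X

# Any s with s_i > 0 and diag(s) strictly below 2*Sigma works; here all equal.
lam_min = np.linalg.eigvalsh(Sigma)[0]
s = np.full(p, 0.5 * min(2.0 * lam_min, 1.0))
D = np.diag(s)

# Gram matrix of the augmented design, written via its block structure (19).
G = np.block([[Sigma, Sigma - D], [Sigma - D, Sigma]])

eig_G = np.sort(np.linalg.eigvalsh(G))
eig_union = np.sort(np.concatenate([s, np.linalg.eigvalsh(2.0 * Sigma - D)]))

print(np.allclose(eig_G, eig_union))    # spectrum of G is the union
print(eig_G.min() > 0)                  # hence G is positive definite
```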

###### Lemma 2.

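Let $\Sigma = X^\top X$ and $G = [X\ \tilde{X}]^\top [X\ \tilde{X}]$. If $G$ is invertible, $G^{-1}$ has the following structure,

$$G^{-1} = \begin{bmatrix} A & A - D \\ A - D & A \end{bmatrix},$$

where $D$ is a diagonal matrix.

###### Proof.

From (4) recall,

$$G = \begin{bmatrix} \Sigma & \Sigma - D^* \\ \Sigma - D^* & \Sigma \end{bmatrix}, \tag{20}$$

with some diagonal $D^* = \mathrm{diag}\{s\}$. From the inverse of block matrices we have

$$G^{-1} = \begin{bmatrix} J & -J\left(I_p - D^* \Sigma^{-1}\right) \\ -J\left(I_p - D^* \Sigma^{-1}\right) & J \end{bmatrix},$$

where $J = \left(\Sigma - (\Sigma - D^*)\Sigma^{-1}(\Sigma - D^*)\right)^{-1} = \left(2D^* - D^* \Sigma^{-1} D^*\right)^{-1}$. We observe,

$$J - \left(-J\left(I_p - D^* \Sigma^{-1}\right)\right) = J\left(2I_p - D^* \Sigma^{-1}\right) = {D^*}^{-1}. \tag{21}$$

Therefore, $A = J$ and $D = {D^*}^{-1}$. ∎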
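The structure claimed in Lemma 2 can likewise be verified numerically. The following sketch inverts a Gram matrix of the form (20) and checks that the diagonal blocks coincide, the off-diagonal blocks coincide, and their difference is the diagonal matrix with entries $1/s_i$; the design size and the entries of $s$ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4
X = rng.standard_normal((40, p))
X /= np.linalg.norm(X, axis=0)
Sigma = X.T @ X

# Unequal entries of s, all strictly below 2 * lambda_min(Sigma).
lam_min = np.linalg.eigvalsh(Sigma)[0]
s = lam_min * np.array([0.5, 1.0, 1.5, 1.0])
D_star = np.diag(s)

G = np.block([[Sigma, Sigma - D_star], [Sigma - D_star, Sigma]])
G_inv = np.linalg.inv(G)

A_block = G_inv[:p, :p]      # top-left block
off_block = G_inv[:p, p:]    # top-right block

print(np.allclose(G_inv[p:, p:], A_block))                 # equal diagonal blocks
print(np.allclose(G_inv[p:, :p], off_block))               # equal off-diagonal blocks
print(np.allclose(A_block - off_block, np.diag(1.0 / s)))  # difference is diag(1/s)
```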

###### Lemma 3.

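Let $(\gamma, \gamma') = [X\ \tilde{X}]^\top y$. It holds that $\left| \mathbb{E}(\gamma_j - \gamma'_j) \right| \le s_j \delta$ for all $j \in S_0$.

###### Proof.

We observe $\mathbb{E}(\gamma, \gamma') = [X\ \tilde{X}]^\top X \beta$. According to the structure of (20), we get

$$\left| \mathbb{E}(\gamma_j - \gamma'_j) \right| = \left| \left( \Sigma^{(j)} - \left( \Sigma^{(j)} - D^{*(j)} \right) \right) \beta \right| = s_j |\beta_j| \le s_j \delta,$$

where the index $(j)$ denotes the $j$-th row of the matrices and the inequality holds according to the definition of $S_0$. ∎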

## Appendix B Proof of Theorem 4

###### Lemma 4.

Let $(\gamma, \gamma') = [X\ \tilde{X}]^\top y$. Any anti-symmetric statistic $W$ satisfies the following properties.

(I) For all and ,

where .

(II) For all we have,

 (23)

where $\{\cdot, \cdot\}_V$ denotes a vector of unordered pairs and operates coordinate-wise.

###### Proof.

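(I) We compute the conditional distribution of $(\gamma_j, \gamma'_j)$ given $(\gamma_{-j}, \gamma'_{-j})$. We note that $(\gamma, \gamma') \sim \mathcal{N}\left((\breve{\gamma}, \breve{\gamma}'), \sigma^2 G\right)$, where $(\breve{\gamma}, \breve{\gamma}') = [X\ \tilde{X}]^\top X \beta$ according to (1). Therefore we have,

$$\left( (\gamma_j, \gamma'_j) \,\middle|\, (\gamma_{-j}, \gamma'_{-j}) = (a, b) \right) \sim \mathcal{N}\left(\eta_j, \sigma^2 R_j\right), \tag{24}$$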

where

and

From the definition of the knockoff variables we have with some and according to Lemma 2 we have

with some symmetric positive-definite and diagonal . Therefore we get

with some . Hence,

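$$\begin{aligned} \eta_j &= \begin{pmatrix} \breve{\gamma}_j \\ \breve{\gamma}'_j \end{pmatrix} + \begin{bmatrix} c_j & c_j \\ c_j & c_j \end{bmatrix}^\top \begin{pmatrix} a - \breve{\gamma}_{-j} \\ b - \breve{\gamma}'_{-j} \end{pmatrix} \\ &= \begin{pmatrix} \breve{\gamma}_j + c_j^\top \left( a + b - \breve{\gamma}_{-j} - \breve{\gamma}'_{-j} \right) \\ \breve{\gamma}'_j + c_j^\top \left( a + b - \breve{\gamma}_{-j} - \breve{\gamma}'_{-j} \right) \end{pmatrix} = \begin{pmatrix} \breve{\gamma}_j + d_j \\ \breve{\gamma}'_j + d_j \end{pmatrix}. \end{aligned} \tag{25}$$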

Similarly, for the covariance matrix we have,

 (26)

where does not depend on or . Let denote . The proof for is trivial as it implies . Suppose . In this case, since swapping and would switch the sign of , we get

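$$\begin{aligned} \frac{\mathbb{P}\left\{ W_j > 0 \,\middle|\, (\gamma_j, \gamma'_j) \in U, (\gamma_{-j}, \gamma'_{-j}) \right\}}{\mathbb{P}\left\{ W_j < 0 \,\middle|\, (\gamma_j, \gamma'_j) \in U, (\gamma_{-j}, \gamma'_{-j}) \right\}} &\le \max\left\{ \frac{\;\frac{h(u,v)}{h(u,v) + h(v,u)}\;}{\frac{h(v,u)}{h(u,v) + h(v,u)}}, \ \frac{\;\frac{h(v,u)}{h(u,v) + h(v,u)}\;}{\frac{h(u,v)}{h(u,v) + h(v,u)}} \right\} \\ &= \max\left\{ \frac{h(u,v)}{h(v,u)}, \ \frac{h(v,u)}{h(u,v)} \right\}. \end{aligned} \tag{27}$$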

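By a counter-clockwise rotation of the coordinates by $\pi/4$ we get

$$\frac{h(u,v)}{h(v,u)} = \frac{h_1\left(\tfrac{1}{\sqrt{2}}(u - v)\right) h_2\left(\tfrac{1}{\sqrt{2}}(u + v)\right)}{h_1\left(-\tfrac{1}{\sqrt{2}}(u - v)\right) h_2\left(\tfrac{1}{\sqrt{2}}(u + v)\right)} = \frac{h_1\left(\tfrac{1}{\sqrt{2}}(u - v)\right)}{h_1\left(-\tfrac{1}{\sqrt{2}}(u - v)\right)}, \tag{28}$$

where $h_1$ and $h_2$ are the probability density functions of

$$\mathcal{N}\left( \tfrac{1}{\sqrt{2}}(\eta_j - \eta'_j), \ \sigma^2\left( R_j^{(11)} - R_j^{(12)} \right) \right) \quad \text{and} \quad \mathcal{N}\left( \tfrac{1}{\sqrt{2}}(\eta_j + \eta'_j), \ \sigma^2\left( R_j^{(11)} + R_j^{(12)} \right) \right),$$

respectively.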

According to (25) and (26), we have

 (29)

where and since the columns of are normalized. Therefore, is free of the given values for , which also means

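$$\left( \gamma_j - \gamma'_j \,\middle|\, \gamma_{-j}, \gamma'_{-j} \right) \overset{d}{=} \left( \gamma_j - \gamma'_j \right), \tag{30}$$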

and hence . We now use (28) and (29) to bound the RHS of (27) under ,

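$$\begin{aligned} \max\left\{ \frac{h_1\left(\tfrac{1}{\sqrt{2}}(u - v)\right)}{h_1\left(-\tfrac{1}{\sqrt{2}}(u - v)\right)}, \ \frac{h_1\left(-\tfrac{1}{\sqrt{2}}(u - v)\right)}{h_1\left(\tfrac{1}{\sqrt{2}}(u - v)\right)} \right\} &\le \frac{\exp\left\{ -\frac{1}{4\sigma_j^2} \left( \left| \breve{\gamma}_j - \breve{\gamma}'_j \right| - |u - v| \right)^2 \right\}}{\exp\left\{ -\frac{1}{4\sigma_j^2} \left( \left| \breve{\gamma}_j - \breve{\gamma}'_j \right| + |u - v| \right)^2 \right\}} \\ &= \exp\left\{ \frac{1}{\sigma_j^2} \left| \breve{\gamma}_j - \breve{\gamma}'_j \right| |u - v| \right\} \\ &\le \exp\left\{ \frac{\delta}{\sigma^2} |u - v| \right\}, \end{aligned}$$

where the second inequality is a consequence of $\left| \breve{\gamma}_j - \breve{\gamma}'_j \right| \le s_j \delta$ under the null (Lemma 3).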

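(II) According to (25) and (26), the conditional distribution (24) only depends on $a + b$ (through $d_j$). Hence, we have

$$\left( \gamma_j, \gamma'_j \,\middle|\, \gamma_{-j}, \gamma'_{-j} \right) \overset{d}{=} \left( \gamma_j, \gamma'_j \,\middle|\, \{\gamma_{-j}, \gamma'_{-j}\}_V \right).$$

We observe that the unordered pair $\{\gamma_j, \gamma'_j\}$ is a function of $(\gamma_j, \gamma'_j)$. Therefore, by conditioning on it we get,

$$\left( \gamma_j, \gamma'_j \,\middle|\, (\gamma_{-j}, \gamma'_{-j}), \{\gamma_j, \gamma'_j\} \right) \overset{d}{=} \left( \gamma_j, \gamma'_j \,\middle|\, \{\gamma, \gamma'\}_V \right).$$

Therefore,

$$\left( \mathrm{sgn}(W_j) \,\middle|\, \mathrm{sgn}(W_{-j}), \{\gamma, \gamma'\}_V \right) \overset{d}{=} \left( \mathrm{sgn}(W_j) \,\middle|\, \{\gamma, \gamma'\}_V \right),$$

which implies (23) immediately. ∎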

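###### Lemma 5.

For all $j \in S_0$ and $\epsilon \ge 0$ we have

$$\mathbb{P}\left\{ W_j > 0, \ \tfrac{\delta}{\sigma^2} \left| \gamma_j - \gamma'_j \right| \le \epsilon \,\middle|\, |W_j|, W_{-j} \right\} \le e^{\epsilon}\, \mathbb{P}\left\{ W_j < 0 \,\middle|\, |W_j|, W_{-j} \right\}.$$

###### Proof.

$$\begin{aligned} \mathbb{P}&\left\{ W_j > 0, \ \tfrac{\delta}{\sigma^2} \left| \gamma_j - \gamma'_j \right| \le \epsilon \,\middle|\, |W_j|, W_{-j} \right\} \\ &= \mathbb{E}\left\{ \mathbb{P}\left( W_j > 0, \ \tfrac{\delta}{\sigma^2} \left| \gamma_j - \gamma'_j \right| \le \epsilon \,\middle|\, \{\gamma_j, \gamma'_j\}, (\gamma_{-j}, \gamma'_{-j}) \right) \,\middle|\, |W_j|, W_{-j} \right\} \\ &= \mathbb{E}\left\{ \mathbf{1}\left\{ \tfrac{\delta}{\sigma^2} \left| \gamma_j - \gamma'_j \right| \le \epsilon \right\} \mathbb{P}\left( W_j > 0 \,\middle|\, \{\gamma_j, \gamma'_j\}, (\gamma_{-j}, \gamma'_{-j}) \right) \,\middle|\, |W_j|, W_{-j} \right\} \\ &\le \mathbb{E}\left\{ \mathbf{1}\left\{ \tfrac{\delta}{\sigma^2} \left| \gamma_j - \gamma'_j \right| \le \epsilon \right\} e^{\frac{\delta}{\sigma^2} \left| \gamma_j - \gamma'_j \right|}\, \mathbb{P}\left( W_j < 0 \,\middle|\, \{\gamma_j, \gamma'_j\}, (\gamma_{-j}, \gamma'_{-j}) \right) \,\middle|\, |W_j|, W_{-j} \right\} \\ &\le e^{\epsilon}\, \mathbb{E}\left\{ \mathbb{P}\left( W_j < 0 \,\middle|\, \{\gamma_j, \gamma'_j\}, (\gamma_{-j}, \gamma'_{-j}) \right) \,\middle|\, |W_j|, W_{-j} \right\} \\ &= e^{\epsilon}\, \mathbb{P}\left\{ W_j < 0 \,\middle|\, |W_j|, W_{-j} \right\}, \end{aligned}$$

where $\{\cdot, \cdot\}$ denotes an unordered pair, the first inequality holds according to Lemma 4 (I), and we used the tower property since $(|W_j|, W_{-j})$ is a function of $\left( \{\gamma_j, \gamma'_j\}, (\gamma_{-j}, \gamma'_{-j}) \right)$. ∎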

Hence (17) holds according to Theorem 1.

## Appendix C Proof of Theorem 2

###### Lemma 6.

If we use estimates with distribution to compute the statistics, then, the following properties hold for any .

(I) If , then

$$\mathrm{sgn}(W_j) \perp\!\!\!\perp W_{-j} \,\Big|\, \{\hat{\beta}_j, \hat{\beta}'_j\}, \tag{31}$$

where $\{\cdot, \cdot\}$ denotes an unordered pair.

(II) If , then

$$W_j \perp\!\!\!\perp W_{-j}. \tag{32}$$

###### Proof.

(I) We compute the conditional distribution,

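$$\left( \hat{\beta}_{-j}, \hat{\beta}'_{-j} \,\middle|\, (\hat{\beta}_j, \hat{\beta}'_j) = (a, b) \right) \sim \mathcal{N}(\xi, C).$$

We note that according to Lemma 2, $G$ and $G^{-1}$ have the same structure. Therefore, calculations similar to those in the proof of Lemma 4 give,

$$\xi = \begin{pmatrix} \breve{\beta}_{-j} \\ \breve{\beta}'_{-j} \end{pmatrix} + \begin{bmatrix} c & c \\ c & c \end{bmatrix} \begin{pmatrix} a - \breve{\beta}_j \\ b - \breve{\beta}'_j \end{pmatrix},$$

$$C = \mathrm{Cov}\left( \hat{\beta}_{-j}, \hat{\beta}'_{-j} \right) - \begin{bmatrix} Q & Q \\ Q & Q \end{bmatrix},$$

with some $c$ and symmetric $Q$ that do not depend on $a$ and $b$. We note that the conditional distribution depends on the pair $(a, b)$ only through the sum $a + b$, so it is free of the order of the pair. Thus, we get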

 (^β−j,^β′−j∣∣^βj,^β′j)d=(^β−j,^β′−j∣∣{^βj,^β′j