Semidefinite relaxations for certifying robustness to adversarial examples

11/02/2018 ∙ by Aditi Raghunathan, et al. ∙ Stanford University

Despite their impressive performance on diverse tasks, neural networks fail catastrophically in the presence of adversarial inputs: imperceptibly but adversarially perturbed versions of natural inputs. We have witnessed an arms race between defenders who attempt to train robust networks and attackers who try to construct adversarial examples. One promise of ending the arms race is developing certified defenses, ones which are provably robust against all attackers in some family. These certified defenses are based on convex relaxations which construct an upper bound on the worst case loss over all attackers in the family. Previous relaxations are loose on networks that are not trained against the respective relaxation. In this paper, we propose a new semidefinite relaxation for certifying robustness that applies to arbitrary ReLU networks. We show that our proposed relaxation is tighter than previous relaxations and produces meaningful robustness guarantees on three different "foreign networks" whose training objectives are agnostic to our proposed relaxation.






1 Introduction

Many state-of-the-art classifiers have been shown to fail catastrophically in the presence of small, imperceptible but adversarial perturbations. Since the discovery of such adversarial examples (Szegedy et al., 2014), numerous defenses have been proposed in an attempt to build classifiers that are robust to adversarial examples. However, defenses are routinely broken by new attackers who adapt to the proposed defense, leading to an arms race. For example, distillation was proposed (Papernot et al., 2016) but shown to be ineffective (Carlini and Wagner, 2017). A proposed defense based on transformations of test inputs (Lu et al., 2017) was broken in only five days (Athalye and Sutskever, 2017). Recently, seven defenses published at ICLR 2018 fell to the attacks of Athalye et al. (2018).

A recent body of work aims to break this arms race by training classifiers that are certifiably robust to all attacks within a fixed attack model Hein and Andriushchenko (2017); Raghunathan et al. (2018); Wong and Kolter (2018); Dvijotham et al. (2018a). These approaches construct a convex relaxation for computing an upper bound on the worst-case loss over all valid attacks; this upper bound serves as a certificate of robustness. In this work, we propose a new convex relaxation based on semidefinite programming (SDP) that is significantly tighter than previous relaxations based on linear programming (LP) Wong and Kolter (2018); Dvijotham et al. (2018a, b) and handles an arbitrary number of layers (unlike the formulation in Raghunathan et al. (2018), which was restricted to two). We summarize the properties of our relaxation as follows:

1. Our new SDP relaxation reasons jointly about intermediate activations and captures interactions that the LP relaxation cannot. Theoretically, we prove that there is a square root dimension gap between the LP relaxation and our proposed SDP relaxation for neural networks with random weights.

2. Empirically, the tightness of our proposed relaxation allows us to obtain tight certificates for foreign networks—networks that were not specifically trained towards the certification procedure. For instance, adversarial training against the Projected Gradient Descent (PGD) attack (Madry et al., 2018) has led to networks that are “empirically” robust against known attacks, but which have only been certified against small perturbations (e.g. in the -norm for the MNIST dataset (Dvijotham et al., 2018b)). We use our SDP to provide the first non-trivial certificate of robustness for a moderate-size adversarially-trained model on MNIST at .

3. Furthermore, training a network to minimize the optimum of a particular relaxation produces networks for which the respective relaxation provides good robustness certificates Raghunathan et al. (2018). Notably and surprisingly, on such networks, our relaxation provides tighter certificates than even the relaxation that was optimized for during training.

Related work.

Certification methods which evaluate the performance of a given network against all possible attacks roughly fall into two categories. The first category leverages convex optimization and our work adds to this family. Convex relaxations are useful for various reasons. Wong and Kolter (2018); Raghunathan et al. (2018) exploited the theory of duality to train certifiably robust networks on MNIST. In recent work, Dvijotham et al. (2018a); Wong et al. (2018) extended this approach to train bigger networks with improved certified error and on larger datasets. Solving a convex relaxation for certification typically involves standard techniques from convex optimization. This enables scalable certification by providing valid upper bounds at every step in the optimization Dvijotham et al. (2018b).

The second category draws techniques from formal verification such as SMT (Katz et al., 2017a, b; Carlini et al., 2017; Huang et al., 2017), which aim to provide tight certificates for any network using discrete optimization. These techniques, while providing tight certificates on arbitrary networks, are often very slow and worst-case exponential in network size. In prior work, certification would take up to several hours or longer for a single example even for a small network with around 100 hidden units Carlini et al. (2017); Katz et al. (2017a). However, in concurrent work, Tjeng and Tedrake (2017) impressively scaled up exact verification through careful preprocessing and efficient pruning that dramatically reduces the search space. In particular, they concurrently obtain non-trivial certificates of robustness on a moderately-sized network trained using the adversarial training objective of Madry et al. (2018) on MNIST at perturbation level .

2 Setup

Our main contribution is a semidefinite relaxation of an optimization objective that arises in certification of neural networks against adversarial examples. In this section, we set up relevant notation and present the optimization objective that will be the focus of the rest of the paper.


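For a vector $z \in \mathbb{R}^n$, we use $z_i$ to denote the $i^{\text{th}}$ coordinate of $z$. For a matrix $Z \in \mathbb{R}^{m \times n}$, $Z_i \in \mathbb{R}^n$ denotes the $i^{\text{th}}$ row. For any function $f: \mathbb{R} \to \mathbb{R}$ and a vector $z \in \mathbb{R}^n$, $f(z)$ is a vector in $\mathbb{R}^n$ with $(f(z))_i = f(z_i)$, e.g., $z^2 \in \mathbb{R}^n$ represents the function that squares each component. For $z, y \in \mathbb{R}^n$, $z \succeq y$ denotes that $z_i \ge y_i$ for $i = 1, \ldots, n$. We use $z_1 \odot z_2$ to represent the elementwise product of the vectors $z_1$ and $z_2$. We use $B_\epsilon(\bar{z}) = \{z : \|z - \bar{z}\|_\infty \le \epsilon\}$ to denote the $\ell_\infty$ ball around $\bar{z}$. When it is necessary to distinguish vectors from scalars (in Section 4.1), we use $\vec{x}$ to represent a vector in $\mathbb{R}^n$ that is semantically associated with the scalar $x$. Finally, we denote the vector of all zeros by $\mathbf{0}$ and the vector of all ones by $\mathbf{1}$.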

Multi-layer ReLU networks for classification.

We focus on multi-layer neural networks with ReLU activations. A network with $L$ hidden layers is defined as follows: let $x^0 \in \mathbb{R}^d$ denote the input and $x^1, \ldots, x^L$ denote the activation vectors at the intermediate layers. Suppose the network has $m_i$ units in layer $i$. $x^i$ is related to $x^{i-1}$ as $x^i = \text{ReLU}(W^{i-1} x^{i-1}) = \max(W^{i-1} x^{i-1}, 0)$, where $W^{i-1} \in \mathbb{R}^{m_i \times m_{i-1}}$ are the weights of the network. For simplicity of exposition, we omit the bias terms associated with the activations (but consider them in the experiments). We are interested in neural networks for classification where we classify an input into one of $k$ classes. The output of the network is $f(x^0) \in \mathbb{R}^k$ such that $f(x^0)_j = c_j^\top x^L$ represents the score of class $j$. The class label $y$ assigned to the input $x^0$ is the class with the highest score: $y = \arg\max_{j = 1, \ldots, k} f(x^0)_j$.
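The network definition above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `relu`, `matvec`, `forward` and `predict`, the list-of-rows matrix layout, and the toy weights are hypothetical and not taken from the paper; biases are omitted, as in the exposition.

```python
def relu(v):
    """Apply ReLU to every coordinate of a vector."""
    return [max(a, 0.0) for a in v]


def matvec(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def forward(weights, C, x0):
    """Return the class scores of a bias-free ReLU network.

    weights: list of layer matrices, applied in order (hypothetical layout).
    C: matrix whose j-th row holds the output weights for class j.
    """
    x = list(x0)
    for W in weights:
        x = relu(matvec(W, x))
    return matvec(C, x)


def predict(weights, C, x0):
    """Return the index of the class with the highest score."""
    scores = forward(weights, C, x0)
    return max(range(len(scores)), key=lambda j: scores[j])


# Toy one-hidden-layer network with two classes (illustrative values only).
W0 = [[1.0, -1.0], [0.5, 2.0]]
C = [[1.0, 0.0], [0.0, 1.0]]
scores = forward([W0], C, [1.0, 0.5])
label = predict([W0], C, [1.0, 0.5])
```

For the toy input, the hidden activations are 0.5 and 1.5, so the second class (index 1) receives the higher score.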

Attack model and certificate of robustness.

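We study classification in the presence of an attacker $A$ that takes a clean test input $\bar{x} \in \mathbb{R}^d$ and returns an adversarially perturbed input $A(\bar{x})$. In this work, we focus on attackers that are bounded in the $\ell_\infty$ norm: $A(\bar{x}) \in B_\epsilon(\bar{x})$ for some fixed $\epsilon > 0$. The attacker is successful on a clean input label pair $(\bar{x}, \bar{y})$ if $\arg\max_j f(A(\bar{x}))_j \neq \bar{y}$, or equivalently if $f(A(\bar{x}))_y > f(A(\bar{x}))_{\bar{y}}$ for some $y \neq \bar{y}$.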

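We are interested in bounding the error against the worst-case attack (we assume the attacker has full knowledge of the neural network). Let $\ell^\star_y(\bar{x}, \bar{y})$ denote the worst-case margin of an incorrect class $y$ that can be achieved in the attack model:

$$\ell^\star_y(\bar{x}, \bar{y}) = \max_{A(\bar{x}) \in B_\epsilon(\bar{x})} \big( f(A(\bar{x}))_y - f(A(\bar{x}))_{\bar{y}} \big). \tag{1}$$

A network is certifiably robust on $(\bar{x}, \bar{y})$ if $\ell^\star_y(\bar{x}, \bar{y}) < 0$ for all $y \neq \bar{y}$. Computing $\ell^\star_y(\bar{x}, \bar{y})$ for a neural network involves solving a non-convex optimization problem, which is intractable in general. In this work, we study convex relaxations to efficiently compute an upper bound $\bar{\ell}_y(\bar{x}, \bar{y}) \ge \ell^\star_y(\bar{x}, \bar{y})$. When $\bar{\ell}_y(\bar{x}, \bar{y}) < 0$, we have a certificate of robustness of the network on input $(\bar{x}, \bar{y})$.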

Optimization objective.

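For a fixed class $y$, the worst-case margin $\ell^\star_y(\bar{x}, \bar{y})$ of a neural network $f$ with weights $W$ can be expressed as the following optimization problem. The decision variable is the input $A(\bar{x})$, which we denote here by $x^0$ for notational convenience. The quantity we are interested in maximizing is $f(x^0)_y - f(x^0)_{\bar{y}} = (c_y - c_{\bar{y}})^\top x^L$, where $x^L$ is the final layer activation. We set up the optimization problem by jointly optimizing over all the activations $x^0, x^1, \ldots, x^L$, imposing consistency constraints dictated by the neural network, and restricting the input $x^0$ to be within the attack model. Formally,

$$\begin{aligned}
\ell^\star_y(\bar{x}, \bar{y}) = \max_{x^0, \ldots, x^L} \quad & (c_y - c_{\bar{y}})^\top x^L \\
\text{subject to} \quad & x^i = \text{ReLU}(W^{i-1} x^{i-1}) \ \text{for } i = 1, \ldots, L && \text{(Neural network constraints)} \\
& \|x^0 - \bar{x}\|_\infty \le \epsilon && \text{(Attack model constraints)}
\end{aligned} \tag{2}$$

Computing $\ell^\star_y$ is computationally hard in general. In the following sections, we present how to relax this objective to a convex semidefinite program and discuss some properties of this relaxation.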

3 Semidefinite relaxations

In this section, we present our approach to obtaining a computationally tractable upper bound to the solution of the optimization problem described in (2).

Key insight.

The source of the non-convexity in (2) is the ReLU constraints. Consider a ReLU constraint of the form $z = \max(x, 0)$. The key observation is that this constraint can be expressed equivalently as the following three linear and quadratic constraints between $z$ and $x$: (i) $z(z - x) = 0$, (ii) $z \ge x$, and (iii) $z \ge 0$. Constraint (i) ensures that $z$ is equal to either $x$ or $0$ and constraints (ii) and (iii) together then ensure that $z$ is at least as large as both. This reformulation allows us to replace the non-linear ReLU constraints of the optimization problem in (2) with linear and quadratic constraints, turning it into a quadratically constrained quadratic program (QCQP). We first show how this QCQP can be relaxed to a semidefinite program (SDP) for networks with one hidden layer. The relaxation for multiple layers is a straightforward extension and is presented in Section 5.
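As a quick sanity check of this equivalence, the sketch below tests the three constraints numerically for a scalar pair. The helper name `satisfies_relu_constraints` and the tolerance are illustrative assumptions, not part of the paper.

```python
def satisfies_relu_constraints(x, z, tol=1e-12):
    """Check the quadratic/linear encoding of z = max(x, 0) for scalars."""
    quadratic = abs(z * (z - x)) <= tol   # (i)   z is either x or 0
    above_input = z >= x - tol            # (ii)  z is at least x
    nonnegative = z >= -tol               # (iii) z is at least 0
    return quadratic and above_input and nonnegative


inputs = [-2.0, -0.5, 0.75, 3.0]

# The true ReLU output satisfies all three constraints.
relu_ok = [satisfies_relu_constraints(x, max(x, 0.0)) for x in inputs]

# The other root of z(z - x) = 0 is ruled out by (ii) or (iii).
wrong_root = [x if x < 0 else 0.0 for x in inputs]
wrong_ok = [satisfies_relu_constraints(x, z) for x, z in zip(inputs, wrong_root)]
```

Constraint (i) alone admits two candidates, and the inequalities select the larger one, which is exactly the ReLU output.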

3.1 Relaxation for one hidden layer

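Consider a neural network with one hidden layer containing $m$ nodes. Let the input be denoted by $x \in \mathbb{R}^d$. The hidden layer activations are denoted by $z \in \mathbb{R}^m$ and related to the input as $z = \text{ReLU}(Wx)$ for weights $W \in \mathbb{R}^{m \times d}$.

Suppose that we have lower and upper bounds $l, u \in \mathbb{R}^d$ on the inputs such that $l_j \le x_j \le u_j$. For example, in the $\ell_\infty$ attack model we have $l = \bar{x} - \epsilon \mathbf{1}$ and $u = \bar{x} + \epsilon \mathbf{1}$, where $\bar{x}$ is the clean input. For the multi-layer case, we discuss how to obtain these bounds for the intermediate activations in Section 5.2. We are interested in optimizing a linear function of the hidden layer: $\max c^\top z$, where $c \in \mathbb{R}^m$. For instance, while computing the worst case margin of an incorrect label $y$ over true label $\bar{y}$, $c = c_y - c_{\bar{y}}$.

We use the key insight that the ReLU constraints can be written as linear and quadratic constraints, allowing us to embed these constraints into a QCQP. We can also express the input constraint $l_j \le x_j \le u_j$ as a quadratic constraint, which will be useful later. In particular, $l_j \le x_j \le u_j$ if and only if $(x_j - l_j)(x_j - u_j) \le 0$, thereby yielding the quadratic constraint $x_j^2 \le (l_j + u_j) x_j - l_j u_j$. This gives us the final QCQP below:

$$\begin{aligned}
f_{\text{QCQP}} = \max_{x, z} \quad & c^\top z \\
\text{s.t.} \quad & z \ge 0, \quad z \ge Wx, \quad z \odot z = z \odot (Wx) && \text{(ReLU constraints)} \\
& x \odot x \le (l + u) \odot x - l \odot u && \text{(Input constraints)}
\end{aligned} \tag{3}$$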

We now relax the non-convex QCQP (3) to a convex SDP. The basic idea is to introduce a new set of variables representing all linear and quadratic monomials in and ; the constraints in (3) can then be written as linear functions of these new variables.

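In particular, let $v = [1; x; z]$. We define a matrix $P = vv^\top$ and use symbolic indexing $P[\cdot]$ to index the elements of $P$, i.e.

$$P = \begin{pmatrix} P[1] & P[x^\top] & P[z^\top] \\ P[x] & P[xx^\top] & P[xz^\top] \\ P[z] & P[zx^\top] & P[zz^\top] \end{pmatrix}.$$

The SDP relaxation of (3) can be written in terms of the matrix $P$ as follows.

$$\begin{aligned}
f_{\text{SDP}} = \max_{P} \quad & c^\top P[z] \\
\text{s.t.} \quad & P[z] \ge 0, \quad P[z] \ge W P[x], \quad \operatorname{diag}(P[zz^\top]) = \operatorname{diag}(W P[xz^\top]) && \text{(ReLU constraints)} \\
& \operatorname{diag}(P[xx^\top]) \le (l + u) \odot P[x] - l \odot u && \text{(Input constraints)} \\
& P[1] = 1, \quad P \succeq 0
\end{aligned} \tag{4}$$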

When the matrix $P$ admits a rank-one factorization $vv^\top$, the entries of the matrix $P$ exactly correspond to linear and quadratic monomials in $x$ and $z$. In this case, the ReLU and input constraints of the SDP are identical to the constraints of the QCQP. However, this rank-one constraint on $P$ would make the feasible set non-convex. We instead consider the relaxed constraint on $P$ that allows factorizations of the form $P = VV^\top$, where $V$ can be full rank. Equivalently, we consider the set of matrices $P$ such that $P \succeq 0$. This set is convex and is a superset of the original non-convex set. Therefore, the above SDP is a relaxation of the QCQP in (3) with $f_{\text{SDP}} \ge f_{\text{QCQP}}$, providing an upper bound on $\ell^\star_y(\bar{x}, \bar{y})$ that could serve as a certificate of robustness. We note that this SDP relaxation is different from the one proposed in Raghunathan et al. (2018), which applies only to neural networks with one hidden layer. In contrast, the construction presented here naturally generalizes to multiple layers, as we show in Section 5. Moreover, we will see in Section 6 that our new relaxation often yields substantially tighter bounds than the approach of Raghunathan et al. (2018).
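The claim that every QCQP-feasible point yields an SDP-feasible matrix can be checked directly. The sketch below is a hypothetical illustration (function names, block layout and toy values are assumptions): it builds the rank-one matrix from a valid input and its ReLU activations, then tests the linear constraints of the relaxation entry by entry. It does not solve the SDP, and it does not test positive semidefiniteness, which holds automatically for a rank-one outer product.

```python
def relu_layer(W, x):
    """Hidden activations z = ReLU(W x) for a list-of-rows matrix W."""
    return [max(sum(w * a for w, a in zip(row, x)), 0.0) for row in W]


def rank_one_moment_matrix(x, z):
    """Outer product of v = [1; x; z] with itself."""
    v = [1.0] + list(x) + list(z)
    return [[a * b for b in v] for a in v]


def sdp_constraints_hold(P, W, l, u, tol=1e-9):
    """Check the linear constraints of the one-hidden-layer relaxation."""
    d, m = len(l), len(W)
    Px = [P[1 + j][0] for j in range(d)]        # linear monomials in x
    Pz = [P[1 + d + i][0] for i in range(m)]    # linear monomials in z
    ok = abs(P[0][0] - 1.0) <= tol
    for i in range(m):
        W_Px = sum(W[i][j] * Px[j] for j in range(d))
        # i-th diagonal entry of W times the x-z block of P
        W_Pxz = sum(W[i][j] * P[1 + j][1 + d + i] for j in range(d))
        Pzz = P[1 + d + i][1 + d + i]
        ok = ok and Pz[i] >= -tol and Pz[i] >= W_Px - tol
        ok = ok and abs(Pzz - W_Pxz) <= tol
    for j in range(d):
        Pxx = P[1 + j][1 + j]
        ok = ok and Pxx <= (l[j] + u[j]) * Px[j] - l[j] * u[j] + tol
    return ok


W = [[1.0, 1.0], [1.0, -1.0]]
l, u = [-0.1, -0.1], [0.1, 0.1]

x_inside = [0.05, -0.02]
feasible = sdp_constraints_hold(
    rank_one_moment_matrix(x_inside, relu_layer(W, x_inside)), W, l, u)

x_outside = [0.2, 0.0]
infeasible = sdp_constraints_hold(
    rank_one_moment_matrix(x_outside, relu_layer(W, x_outside)), W, l, u)
```

An input inside the box passes all checks, while an input outside the box violates the quadratic input constraint.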

4 Analysis of the relaxation

Figure 1: (a) Plot showing the feasible regions for the vectors $\vec{x}$ (green) and $\vec{z}$ (red). The input constraints restrict $\vec{x}$ to lie within the green circle. The ReLU constraint $\vec{z} \perp \vec{z} - \vec{x}$ forces $\vec{z}$ to lie on the dashed red circle and the constraint $\vec{z} \cdot \vec{e} \ge \vec{x} \cdot \vec{e}$ restricts it to the solid arc. (b) For a fixed value of input $\vec{x} \cdot \vec{e}$, when the angle $\theta$ made by $\vec{x}$ with $\vec{e}$ increases, the arc spanned by $\vec{z}$ has a larger projection on $\vec{e}$, leading to a looser relaxation. Secondly, for a fixed value of $\vec{x} \cdot \vec{e}$, as $\theta$ increases, the norm $\|\vec{x}\|$ increases and vice versa.

Before extending the SDP relaxation defined in (4) to multiple layers, we will provide some geometric intuition for the SDP relaxation.

4.1 Geometric interpretation

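First consider the simple case where $m = d = 1$ and $W = c = 1$, so that the problem is to maximize $z$ subject to $z = \text{ReLU}(x)$ and $l \le x \le u$. In this case, the SDP relaxation of (4) is as follows:

$$\begin{aligned}
f_{\text{SDP}} = \max_{P} \quad & P[z] \\
\text{s.t.} \quad & P[z] \ge 0, \quad P[z] \ge P[x], \quad P[z^2] = P[xz] && \text{(ReLU constraints)} \\
& P[x^2] \le (l + u) P[x] - lu && \text{(Input constraints)} \\
& P[1] = 1, \quad P \succeq 0
\end{aligned} \tag{5}$$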

The SDP operates on a PSD matrix $P$ and imposes linear constraints on the entries of the matrix. Since feasible $P$ can be written as $VV^\top$, the entries of $P$ can be thought of as dot products between vectors, and constraints as operating on these dot products. For the simple example above, $V = [\vec{e}; \vec{x}; \vec{z}]$ for some vectors $\vec{e}, \vec{x}, \vec{z} \in \mathbb{R}^3$. The constraint $P[1] = 1$, for example, imposes $\vec{e} \cdot \vec{e} = 1$, i.e., $\vec{e}$ is a unit vector. The linear monomials $P[x], P[z]$ correspond to projections on this unit vector, $\vec{x} \cdot \vec{e}$ and $\vec{z} \cdot \vec{e}$. Finally, the quadratic monomials $P[xz]$, $P[x^2]$ and $P[z^2]$ correspond to $\vec{x} \cdot \vec{z}$, $\|\vec{x}\|^2$ and $\|\vec{z}\|^2$ respectively. We now reason about the input and ReLU constraints and visualize the geometry (see Figure 1(a)).

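Input constraints. The input constraint $P[x^2] \le (l + u) P[x] - lu$ equivalently imposes $\|\vec{x} - \frac{l + u}{2} \vec{e}\|^2 \le \big(\frac{u - l}{2}\big)^2$. Geometrically, this constrains vector $\vec{x}$ to a ball with center at $\frac{l + u}{2} \vec{e}$ and radius $\frac{u - l}{2}$. Notice that this implicitly bounds the norm of $\vec{x}$. This is illustrated in Figure 1(a) where the green circle represents the space of feasible vectors $\vec{x}$, projected onto the plane containing $\vec{e}$ and $\vec{x}$.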
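The step from the matrix constraint to the ball can be written out by completing the square. The derivation below is a sketch that uses the dot-product reading of the matrix entries, with the quadratic monomial in $x$ read as $\|\vec{x}\|^2$, the linear monomial as $\vec{x} \cdot \vec{e}$, and $\vec{e}$ a unit vector; the symbols follow the notation section and are otherwise assumptions of this illustration.

```latex
\begin{align*}
\|\vec{x}\|^2 \le (l+u)\,\vec{x}\cdot\vec{e} - lu
&\iff \|\vec{x}\|^2 - (l+u)\,\vec{x}\cdot\vec{e} + \tfrac{(l+u)^2}{4}\,\|\vec{e}\|^2
      \le \tfrac{(l+u)^2}{4} - lu \\
&\iff \Big\|\vec{x} - \tfrac{l+u}{2}\,\vec{e}\Big\|^2 \le \Big(\tfrac{u-l}{2}\Big)^2 ,
\end{align*}
% using \|\vec{e}\| = 1 and (l+u)^2/4 - lu = (u-l)^2/4.
```

In the attack model with a clean scalar input and perturbation budget $\epsilon$, the center is the clean input times $\vec{e}$ and the radius is $\epsilon$.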

ReLU constraints. The constraint on the quadratic terms ($P[z^2] = P[xz]$) is the core of the SDP. It says that the vector $\vec{z}$ is perpendicular to $\vec{z} - \vec{x}$. We can visualize $\vec{z}$ on the plane containing $\vec{x}$ and $\vec{e}$ in Figure 1(a); the component of $\vec{z}$ perpendicular to this plane is not relevant to the SDP, because it is neither constrained nor appears in the objective. The feasible $\vec{z}$ trace out a circle with $\frac{1}{2} \vec{x}$ as the center (because the angle inscribed in a semicircle is a right angle). The linear constraints restrict $\vec{z}$ to the arc that has a larger projection on $\vec{e}$ than $\vec{x}$, and is positive.

Remarks. This geometric picture allows us to make the following important observation about the objective value $\max \vec{z} \cdot \vec{e}$ of the SDP relaxation. The largest value that $\vec{z} \cdot \vec{e}$ can take depends on the angle $\theta$ that $\vec{x}$ makes with $\vec{e}$. In particular, as $\theta$ decreases, the relaxation becomes tighter and as the vector $\vec{x}$ deviates from $\vec{e}$, the relaxation gets looser. Figure 1(b) provides an illustration. For large $\theta$, the radius of the circle that $\vec{z}$ traces increases, allowing $\vec{z} \cdot \vec{e}$ to take large values.

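That leads to the natural question: For a fixed input value $P[x]$ (corresponding to $\vec{x} \cdot \vec{e}$), what controls $\theta$? Since $\cos\theta = \frac{\vec{x} \cdot \vec{e}}{\|\vec{x}\|}$, as the norm of $\vec{x}$ increases, $\theta$ increases. Hence a constraint that forces $\|\vec{x}\|$ to be close to $\vec{x} \cdot \vec{e}$ will cause the output $\vec{z} \cdot \vec{e}$ to take smaller values. Porting this intuition into the matrix interpretation, this suggests that constraints forcing $P[x^2] = \|\vec{x}\|^2$ to be small lead to tighter relaxations.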

4.2 Comparison with linear programming relaxation

Figure 2: (a) Visualization of the LP and SDP for a single ReLU unit with input and output . The LP is bounded by the line joining the extreme points. (b) Let and . On fixing the inputs and (both equal to ), we plot the feasible activations of the LP and SDP relaxation. The LP feasible set is a simple product over the independent sets, while the SDP enforces joint constraints to obtain a more complex convex set. (c) We plot the set across all feasible inputs for the same setup as (b) and the objective of maximizing . We see that .

In contrast to the SDP, another approach is to relax the objective and constraints in (2) to a linear program (LP) Kolter and Wong (2017); Ehlers (2017); Dvijotham et al. (2018b). As we will see below, a crucial difference from the LP is that our SDP can “reason jointly” about different activations of the network in a stronger way than the LP can. We briefly review the LP approach and then elaborate on this difference.

Review of the LP relaxation.

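We present the LP relaxation for a neural network with one hidden layer, where the hidden layer activations $z \in \mathbb{R}^m$ are related to the input $x \in \mathbb{R}^d$ as $z = \text{ReLU}(Wx)$. As before, we have bounds $l, u \in \mathbb{R}^d$ such that $l \le x \le u$.

In the LP relaxation, we replace the ReLU constraints at hidden node $i$ with a convex outer envelope as illustrated in Figure 2(a). The envelope is lower bounded by the linear constraints $z \ge Wx$ and $z \ge 0$. In order to construct the upper bounding linear constraints, we compute the extreme points $s = \min_{l \le x \le u} Wx$ and $t = \max_{l \le x \le u} Wx$ and construct lines that connect $(s, \text{ReLU}(s))$ and $(t, \text{ReLU}(t))$. The final LP for the neural network is then written by constructing the convex envelopes for each ReLU unit and optimizing over this set as follows:

$$\begin{aligned}
f_{\text{LP}} = \max_{l \le x \le u,\ z} \quad & c^\top z \\
\text{s.t.} \quad & z \ge 0, \quad z \ge Wx && \text{(Lower bound lines)} \\
& z \le \frac{\text{ReLU}(t) - \text{ReLU}(s)}{t - s} \odot (Wx - s) + \text{ReLU}(s) && \text{(Upper bound lines)}
\end{aligned} \tag{6}$$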

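The extreme points $s$ and $t$ are the optima of a linear transformation (by $W$) over a box in $\mathbb{R}^d$ and can be computed using interval arithmetic. In the $\ell_\infty$ attack model where $l = \bar{x} - \epsilon \mathbf{1}$ and $u = \bar{x} + \epsilon \mathbf{1}$, we have $s_i = W_i \bar{x} - \epsilon \|W_i\|_1$ and $t_i = W_i \bar{x} + \epsilon \|W_i\|_1$ for $i = 1, \ldots, m$.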
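A minimal sketch of this computation follows, assuming the box is the ball of radius `eps` around a clean input `x_bar` in the max norm. The function names `preactivation_bounds` and `upper_envelope` and the toy weights are hypothetical. The first function returns, per hidden unit, the smallest and largest pre-activation over the box (center minus or plus `eps` times the absolute row sum); the second evaluates the upper bounding line of the LP envelope at a given pre-activation.

```python
def preactivation_bounds(W, x_bar, eps):
    """Smallest and largest value of each row of W x over the box around x_bar."""
    s, t = [], []
    for row in W:
        center = sum(w * a for w, a in zip(row, x_bar))
        radius = eps * sum(abs(w) for w in row)  # eps times the row's 1-norm
        s.append(center - radius)
        t.append(center + radius)
    return s, t


def upper_envelope(pre, s_i, t_i):
    """Line through (s, ReLU(s)) and (t, ReLU(t)), evaluated at pre."""
    relu_s, relu_t = max(s_i, 0.0), max(t_i, 0.0)
    if t_i == s_i:  # degenerate interval: the unit's input is constant
        return relu_t
    slope = (relu_t - relu_s) / (t_i - s_i)
    return slope * (pre - s_i) + relu_s


W = [[1.0, 1.0], [1.0, -1.0]]
s, t = preactivation_bounds(W, [0.0, 0.0], 0.1)

# Largest activation the LP allows for unit 0 when its pre-activation is 0.
z_max_at_zero = upper_envelope(0.0, s[0], t[0])
```

In this toy setting the true activation at pre-activation 0 is 0, while the envelope allows values up to about 0.1, which is the slack the LP relaxation introduces for a unit whose interval straddles zero.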

From Figure 2(a), we see that for a single ReLU unit taken in isolation, the LP is tighter than the SDP. However, when we have multiple units, the SDP is tighter than the LP. We illustrate this with a simple example in $2$ dimensions with $2$ hidden nodes (see Figure 2(b)).

Simple example to compare the LP and SDP.

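Consider a two dimensional example with input $x = [x_1, x_2]$ and lower and upper bounds $l = [-\epsilon, -\epsilon]$ and $u = [\epsilon, \epsilon]$, respectively. The hidden layer activations $z_1$ and $z_2$ are related to the input as $z_1 = \text{ReLU}(x_1 + x_2)$ and $z_2 = \text{ReLU}(x_1 - x_2)$. The objective is to maximize $z_1 + z_2$.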

The LP constrains $z_1$ and $z_2$ independently. To see this, let us set the input $x$ to a fixed value and look at the feasible values of $z_1$ and $z_2$. In the LP, the convex outer envelope that bounds $z_1$ only depends on the input $x$ and the bounds $l$ and $u$ and is independent of the value of $z_2$. Similarly, the outer envelope of $z_2$ does not depend on the value of $z_1$, and the feasible set for $(z_1, z_2)$ is simply the product of the individual feasible sets.

In contrast, the SDP has constraints that couple $z_1$ and $z_2$. As a result, the feasible set of $(z_1, z_2)$ is a strict subset of the product of the individual feasible sets. Figure 2(b) plots the LP and SDP feasible sets for a fixed input $x$. Recall from the geometric observations (Section 4.1) that the arc of $\vec{z}_1$ depends on the configuration of $\vec{x}_1 + \vec{x}_2$, while that of $\vec{z}_2$ depends on $\vec{x}_1 - \vec{x}_2$. Since the vectors $\vec{x}_1 + \vec{x}_2$ and $\vec{x}_1 - \vec{x}_2$ are dependent, the feasible sets of $\vec{z}_1$ and $\vec{z}_2$ are also dependent on each other. An alternative way to see this is from the matrix constraint that $P \succeq 0$ in (4). This matrix constraint does not factor into terms that decouple the entries $P[z_1]$ and $P[z_2]$, hence $z_1$ and $z_2$ cannot vary independently.

When we reason about the relaxation over all feasible points $x$, the joint reasoning of the SDP allows it to achieve a better objective value. Figure 2(c) plots the feasible sets $(z_1, z_2)$ over all valid $x$ where the optimal value of the SDP, $f_{\text{SDP}}$, is less than that of the LP, $f_{\text{LP}}$.
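To make the looseness of the LP on a two-unit example concrete, the sketch below compares the LP optimum with the exact worst case. It rests on several assumptions that are not confirmed by the text above: the two hidden units are taken to be ReLU of the sum and of the difference of the two inputs, the objective is the sum of the two activations, and the box has half-width 0.1. With nonnegative objective weights, each activation sits on its upper envelope at the LP optimum, so the LP objective is linear in the input and is maximized at a box vertex; the exact objective is convex in the input, so it is also maximized at a vertex. The sketch does not solve the SDP.

```python
from itertools import product


def relu(a):
    return max(a, 0.0)


def lp_and_exact_values(W, l, u):
    """LP optimum and exact maximum of sum_i ReLU(W_i x) over the box [l, u].

    Assumes every row of W is nonzero, so each interval [s, t] has width > 0.
    """
    lines = []
    for row in W:
        s = sum(min(w * lo, w * hi) for w, lo, hi in zip(row, l, u))
        t = sum(max(w * lo, w * hi) for w, lo, hi in zip(row, l, u))
        slope = (relu(t) - relu(s)) / (t - s)
        lines.append((slope, s))
    best_lp = best_exact = float("-inf")
    for x in product(*zip(l, u)):  # enumerate the vertices of the box
        pre = [sum(w * a for w, a in zip(row, x)) for row in W]
        lp = sum(slope * (p - s) + relu(s) for (slope, s), p in zip(lines, pre))
        exact = sum(relu(p) for p in pre)
        best_lp, best_exact = max(best_lp, lp), max(best_exact, exact)
    return best_lp, best_exact


eps = 0.1
W = [[1.0, 1.0], [1.0, -1.0]]  # assumed weights for the two hidden units
f_lp, f_exact = lp_and_exact_values(W, [-eps, -eps], [eps, eps])
```

Under these assumptions the LP value is three times the box half-width while the exact maximum is twice the half-width, so the LP overestimates the worst case by half; any valid relaxation, including the SDP, must lie at or above the exact value.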

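We can extend the preceding example to exhibit a dimension-dependent gap between the LP and the SDP for random weight matrices. In particular, for a random network with $m$ hidden nodes and input dimension $d$, with high probability, $f_{\text{LP}} = \Theta(md)$ while $f_{\text{SDP}} = \Theta(m\sqrt{d} + d\sqrt{m})$. More formally: Suppose that the weight matrix $W \in \mathbb{R}^{m \times d}$ is generated randomly by sampling each element $W_{ij}$ uniformly and independently from $\{-1, +1\}$. Also let the output vector $c$ be the all-$1$s vector, $\mathbf{1}$. Take $\bar{x} = \mathbf{0}$ and $\epsilon = 1$. Then, for some universal constant $\gamma$,

$$f_{\text{LP}} \ge \tfrac{1}{2} md \ \text{ almost surely, while } \ f_{\text{SDP}} \le \gamma \cdot (m\sqrt{d} + d\sqrt{m}) \ \text{ with probability } 1 - \exp(-(d + m)).$$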

We defer the proof of this to Section A.
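The two quantities that drive this gap can be compared numerically on a small random sign matrix. The sketch below is an illustration only: it evaluates half the entrywise 1-norm of the weights (a lower bound on the LP value in the setting with zero clean input, unit perturbation and all-ones output weights) against the norm of the output weights times the square root of the input dimension times the operator norm of the weights (the quantity appearing in the SDP upper bound of the appendix). The operator norm is only approximated by power iteration, which converges from below, so the second number is an estimate rather than a rigorous bound. The matrix size, seeds and function name are arbitrary choices.

```python
import math
import random


def operator_norm(W, iters=200, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    m, d = len(W), len(W[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        Wv = [sum(w * a for w, a in zip(row, v)) for row in W]
        v = [sum(W[i][j] * Wv[i] for i in range(m)) for j in range(d)]
        scale = math.sqrt(sum(a * a for a in v))
        v = [a / scale for a in v]
    Wv = [sum(w * a for w, a in zip(row, v)) for row in W]
    return math.sqrt(sum(a * a for a in Wv))


rng = random.Random(1)
m = d = 36
W = [[rng.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(m)]

# Half the entrywise 1-norm: equals m * d / 2 for a sign matrix.
lp_lower = 0.5 * sum(abs(w) for row in W for w in row)

# sqrt(m) is the 2-norm of the all-ones output vector.
sdp_upper = math.sqrt(m) * math.sqrt(d) * operator_norm(W)
```

For square matrices the first quantity grows quadratically in the dimension and the second roughly like the dimension to the power 1.5, which is the square root gap referred to in the introduction.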

5 Multi-layer networks

The SDP relaxation to evaluate robustness for multi-layer networks is a straightforward generalization of the relaxation presented for one hidden layer in Section 3.1.

5.1 General SDP

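The interactions between $x^{i-1}$ and $x^i$ in (2) (via the ReLU constraint) are analogous to the interaction between the input and hidden layer for the one layer case. Suppose we have bounds $l^{i-1}, u^{i-1} \in \mathbb{R}^{m_{i-1}}$ on the inputs to the ReLU units at layer $i$ such that $l^{i-1} \le x^{i-1} \le u^{i-1}$. We discuss how to obtain these bounds and their significance in Section 5.2. Writing the constraints for each layer iteratively gives us the following SDP:

$$\begin{aligned}
f^{\text{SDP}}_y(\bar{x}, \bar{y}) = \max_{P} \quad & (c_y - c_{\bar{y}})^\top P[x^L] \\
\text{s.t. for } i = 1, \ldots, L: \quad & P[x^i] \ge 0, \quad P[x^i] \ge W^{i-1} P[x^{i-1}], \\
& \operatorname{diag}(P[x^i (x^i)^\top]) = \operatorname{diag}(W^{i-1} P[x^{i-1} (x^i)^\top]) && \text{(ReLU constraints for layer } i\text{)} \\
& \operatorname{diag}(P[x^{i-1} (x^{i-1})^\top]) \le (l^{i-1} + u^{i-1}) \odot P[x^{i-1}] - l^{i-1} \odot u^{i-1} && \text{(Input constraints for layer } i\text{)} \\
& P[1] = 1, \quad P \succeq 0
\end{aligned} \tag{7}$$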

5.2 Bounds on intermediate activations

From the geometric interpretation of Section 4.1, we made the important observation that adding constraints that keep $P[x^2]$ small aids in obtaining tighter relaxations. For the multi-layer case, since the activations at layer $i - 1$ act as input to the next layer $i$, adding constraints that restrict $P[(x^i_j)^2]$ will lead to a tighter relaxation for the overall objective. The SDP automatically obtains some bound on $P[(x^i_j)^2]$ from the bounds on the input, hence the SDP solution is well-defined and finite even without these bounds. However, we can tighten the bound on $P[(x^i_j)^2]$ by relating it to the linear monomial $P[x^i_j]$ via bounds on the value of the activation $x^i_j$. One simple way to obtain bounds on activations is to treat each hidden unit separately, using simple interval arithmetic to obtain

$$\begin{aligned}
l^0 &= \bar{x} - \epsilon \mathbf{1}, & u^0 &= \bar{x} + \epsilon \mathbf{1}, \\
l^i &= [W^{i-1}]_+ l^{i-1} + [W^{i-1}]_- u^{i-1}, & u^i &= [W^{i-1}]_+ u^{i-1} + [W^{i-1}]_- l^{i-1},
\end{aligned} \tag{8}$$

where $([M]_+)_{ij} = \max(M_{ij}, 0)$ and $([M]_-)_{ij} = \min(M_{ij}, 0)$.
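Layer-wise interval arithmetic of this kind can be sketched as follows. The recursion splits each weight into its positive and negative part, pairing positive weights with the matching bound and negative weights with the opposite one. The function name `interval_bounds`, the list-of-rows layout and the `clamp_at_zero` flag are assumptions of this illustration; the flag additionally clips both bounds at zero so that they describe post-ReLU activations, which is a choice made here and not something stated in the text.

```python
def interval_bounds(weights, x_bar, eps, clamp_at_zero=False):
    """Propagate box bounds through the layers with interval arithmetic.

    Returns a list of (lower, upper) pairs: first for the input, then one
    pair per weight matrix. Biases are omitted, as in the exposition.
    """
    l = [a - eps for a in x_bar]
    u = [a + eps for a in x_bar]
    bounds = [(l, u)]
    for W in weights:
        new_l, new_u = [], []
        for row in W:
            # Positive weights take the same-side bound, negative the opposite.
            lo = sum(max(w, 0.0) * a + min(w, 0.0) * b
                     for w, a, b in zip(row, l, u))
            hi = sum(max(w, 0.0) * b + min(w, 0.0) * a
                     for w, a, b in zip(row, l, u))
            if clamp_at_zero:  # hypothetical option: bounds after the ReLU
                lo, hi = max(lo, 0.0), max(hi, 0.0)
            new_l.append(lo)
            new_u.append(hi)
        l, u = new_l, new_u
        bounds.append((l, u))
    return bounds


W0 = [[1.0, -1.0], [2.0, 0.0]]
raw = interval_bounds([W0], [0.5, 0.5], 0.1)
clamped = interval_bounds([W0], [0.5, 0.5], 0.1, clamp_at_zero=True)
```

For the toy layer, the first unit's interval straddles zero (about -0.2 to 0.2) while the second stays positive (about 0.8 to 1.2); these are the per-unit bounds that feed the input constraints of the next layer.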

In our experiments on real networks (Section 6), we observe that these simple bounds are sufficient to obtain good certificates. However tighter bounds could potentially lead to tighter certificates.

6 Experiments

Grad-NN Raghunathan et al. (2018) LP-NN Wong and Kolter (2018) PGD-NN
SDP-cert (this work)
Grad-cert n/a
Table 1: Fraction of non-certified examples on MNIST. Different certification techniques (rows) on different networks (columns). SDP-cert is consistently better than other certificates. All numbers are reported for attacks at .

In this section, we evaluate the performance of our certificate (7) on neural networks trained using different robust training procedures, and compare against other certificates in the literature.


We consider feedforward networks that are trained on the MNIST dataset of handwritten digits using three different robust training procedures.

1. Grad-NN. We use the two-layer network with hidden nodes from Raghunathan et al. (2018), obtained by using an SDP-based bound on the gradient of the network (different from the SDP presented here) as a regularizer. We obtained the weights of this network from the authors of Raghunathan et al. (2018).

2. LP-NN. We use a two-layer network with hidden nodes (matching that of Grad-NN) trained via the LP-based robust training procedure of Wong and Kolter (2018). The authors of Wong and Kolter (2018) provided the weights.

3. PGD-NN. We consider a fully-connected network with four layers containing 200, 100 and 50 hidden nodes (i.e., the architecture is 784-200-100-50-10). We train this network using adversarial training Goodfellow et al. (2015) against the strong PGD attack Madry et al. (2018). We train to minimize a weighted combination of the regular cross entropy loss and adversarial loss. We tuned the hyperparameters based on the performance of the PGD attack on a holdout set. The stepsize of the PGD attack was set to , number of iterations to , perturbation size and weight on adversarial loss to .

The training procedures for Grad-NN and LP-NN yield certificates of robustness (described in their corresponding papers), but the training procedure of PGD-NN does not. Note that all the networks are “foreign networks” to our SDP, as their training procedures do not incorporate the SDP relaxation.

Certification procedures.

Recall from Section 2 that an upper bound on the maximum incorrect margin can be used to obtain certificates. We consider certificates from three different upper bounds.

1. SDP-cert. This is the certificate we propose in this work. This uses the SDP upper bound that we defined in Section 5. The exact optimization problem is presented in (7) and the bounds on intermediate activations are obtained using the interval arithmetic procedure presented in (8).

2. LP-cert. This uses the upper bound based on the LP relaxation discussed in Section 4.2 which forms the basis for several existing works on scalable certification Dvijotham et al. (2018b); Ehlers (2017); Weng et al. (2018); Wong and Kolter (2018). The LP uses layer-wise bounds for intermediate nodes, similar to the bounds $l^i, u^i$ in our SDP formulation (7). For Grad-NN and LP-NN with a single hidden layer, the layer-wise bounds can be computed exactly using interval arithmetic. For the four-layer PGD-NN, in order to have a fair comparison with SDP-cert, we use the same interval arithmetic procedure (8).

3. Grad-cert. We use the upper bound proposed in Raghunathan et al. (2018). This upper bound is based on the maximum norm of the gradient of the network predictions and only holds for two-layer networks.

Table 1 presents the performance of the three different certification procedures on the three networks. For each certification method and network, we evaluate the associated upper bounds on the same random test points and report the fraction of points that were not certified. Computing the exact worst-case adversarial error is not computationally tractable. Therefore, to provide a comparison, we also compute a lower bound on the adversarial error—the error obtained by the PGD attack.

Performance of proposed SDP-cert.

SDP-cert provides non-vacuous certificates for all networks considered. In particular, we can certify that the four layer PGD-NN has an error of at most at . To compare, a lower bound on the robust error (PGD attack error) is . On the two-layer networks, SDP-cert improves the previously-known bounds. For example, it certifies that Grad-NN has an error of at most compared to the previously known . Similarly, SDP-cert improves the bound for LP-NN from to .

Figure 3: Histogram of PGD margins for (a) points that are certified by the SDP and (b) points that are not certified by the SDP.

The gap between the lower bound (PGD) and upper bound (SDP) is because of points that cannot be misclassified by PGD but are also not certified by the SDP. In order to further investigate these points, we look at the margins obtained by the PGD attack to estimate the robustness of different points. Formally, let $x_{\text{PGD}}$ be the adversarial example generated by the PGD attack on clean input $\bar{x}$ with true label $\bar{y}$. We compute $\min_{y \neq \bar{y}} \big[ f(x_{\text{PGD}})_{\bar{y}} - f(x_{\text{PGD}})_y \big]$, the margin of the closest incorrect class. A small value indicates that $x_{\text{PGD}}$ was close to being misclassified. Figure 3 shows the histograms of the above PGD margin. The examples which are not certified by the SDP have much smaller margins than those examples that are certified: the average PGD margin is 1.2 on points that are not certified and 4.5 on points that are certified. From Figure 3, we see that a large number of the SDP uncertified points have very small margin, suggesting that these points might be misclassified by stronger attacks.
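The margin statistic described here is simple to compute from the network's class scores on the attacked input. The helper below is a minimal sketch; the name `pgd_margin` and the toy score vectors are hypothetical, and the scores are assumed to come from evaluating the network on the example returned by the attack.

```python
def pgd_margin(scores, true_label):
    """Score of the true class minus the score of the closest incorrect class."""
    return min(scores[true_label] - s
               for j, s in enumerate(scores) if j != true_label)


# Correctly classified with some slack: the closest competitor is class 2.
margin_safe = pgd_margin([2.0, 0.5, 1.25], true_label=0)

# A negative margin means the attack already flipped the prediction.
margin_broken = pgd_margin([0.5, 1.5, 0.0], true_label=0)
```

A small positive margin marks a point that the attack nearly misclassified, which is the regime where the uncertified points in Figure 3 concentrate.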


As discussed in Section 5, we could consider a version of the SDP that does not include the constraints relating linear and quadratic terms at the intermediate layers of the network. Empirically, such an SDP produces vacuous certificates ( error). Therefore, these constraints at intermediate layers play a significant role in improving the empirical performance of the SDP relaxation.

Comparison with other certification approaches.

From Table 1, we observe that SDP-cert consistently performs better than both LP-cert and Grad-cert for all three networks.

Grad-cert and LP-cert provide vacuous ( error) certificates on networks that are not trained to minimize these certificates. This is because these certificates are tight only under some special cases that can be enforced by training. For example, LP-cert is tight when the ReLU units do not switch linear regions Wong and Kolter (2018). While a typical input causes only of the hidden units of LP-NN to switch regions, of the hidden units of Grad-NN switch on a typical input. Grad-cert bounds the gradient uniformly across the entire input space. This makes the bound loose on arbitrary networks that could have a small gradient only on the data distribution of interest.

Comparison to concurrent work Tjeng and Tedrake (2017).

A variety of robust MNIST networks are certified by Tjeng and Tedrake (2017). On Grad-NN, their certified error is which is looser than our SDP certified error (). They also consider the CNN counterparts of LP-NN and PGD-NN, trained using the procedures of Wong and Kolter (2018) and Madry et al. (2018). The certified errors are and respectively. This reduction in the errors is due to the CNN architecture. Further discussion on applying our SDP to CNNs appears in Section 7.

Optimization setup.

We use the YALMIP toolbox Löfberg (2004) with MOSEK as a backend to solve the different convex programs that arise in these certification procedures. On a 4-core CPU, the average SDP computation took around minutes and the LP around minutes per example.

7 Discussion

In this work, we focused on fully connected feedforward networks for computational efficiency. In principle, our proposed SDP can be directly used to certify convolutional neural networks (CNNs); unrolling the convolution would result in a (large) feedforward network. Naively, current off-the-shelf solvers cannot handle the SDP formulation of such large networks. Robust training on CNNs leads to better error rates: for example, adversarial training against the PGD adversary on a four-layer feedforward network has error against the PGD attack, while a four-layer CNN trained using a similar procedure has error less than (Madry et al., 2018). An immediate open question is whether the network in (Madry et al., 2018), which has so far withstood many different attacks, is truly robust on MNIST. We are hopeful that we can scale up our SDP to answer this question, perhaps borrowing ideas from work on highly scalable SDPs Ahmadi and Majumdar (2017) and explicitly exploiting the sparsity and structure induced by the CNN architecture.

Current work on certification of neural networks against adversarial examples has focused on perturbations bounded in some norm ball. In our work, we focused on the common $\ell_\infty$ attack because the problem of securing multi-layer ReLU networks remains unsolved even in this well-studied attack model. Different attack models lead to different constraints only at the input layer; our SDP framework can be applied to any attack model where these input constraints can be written as linear and quadratic constraints. In particular, it can also be used to certify robustness against attacks bounded in $\ell_2$ norm. Hein and Andriushchenko (2017) provide alternative bounds for $\ell_2$ norm attacks based on the local gradient.

Guarantees for the bounded $\ell_\infty$ norm attack model in general are sufficient but not necessary for robustness against adversaries in the real world. Many successful attacks involve inconspicuous but clearly visible perturbations (Evtimov et al., 2017; Sharif et al., 2016; Carlini et al., 2016; Brown et al., 2017), or large but semantics-preserving perturbations in the case of natural language (Jia and Liang, 2017). These perturbations do not currently have well-defined mathematical models and present yet another layer of challenge. However, we believe that the mathematical ideas we develop for the bounded $\ell_\infty$ norm will be useful building blocks in the broader adversarial game.


All code, data and experiments for this paper are available on the Codalab platform at


This work was partially supported by a Future of Life Institute Research Award and an Open Philanthropy Project Award. JS was supported by a Fannie & John Hertz Foundation Fellowship and an NSF Graduate Research Fellowship. We thank Eric Wong for providing relevant experimental results. We are also grateful to Moses Charikar, Zico Kolter and Eric Wong for several helpful discussions and anonymous reviewers for useful feedback.


Appendix A Proof of Proposition 4.2

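We first lower bound the LP value $f_{\text{LP}}$, and then upper bound the SDP value $f_{\text{SDP}}$.

Part 1: Lower-bounding $f_{\text{LP}}$.

It suffices to exhibit a feasible solution for the LP constraints. Note that for a given hidden unit $i$, we have $s_i = -\|W_i\|_1$ and $t_i = \|W_i\|_1$. In particular, at $x = \mathbf{0}$ a feasible value for $z_i$ is $\frac{1}{2} \|W_i\|_1$.

For this feasible value of $(x, z)$, we get that $f_{\text{LP}} \ge \frac{1}{2} \sum_{i=1}^m \|W_i\|_1$. In other words, $f_{\text{LP}}$ is at least half the element-wise $\ell_1$-norm of $W$. Since $W$ is a random sign matrix we have $\|W_i\|_1 = d$ for all $i$, hence $f_{\text{LP}} \ge \frac{1}{2} md$ with probability $1$.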

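Part 2: Upper-bounding $f_{\text{SDP}}$.

We start by exhibiting a general upper bound on $f_{\text{SDP}}$ implied by the constraints: For any weight matrices $W$ and $c$, we have $f_{\text{SDP}} \le \|c\|_2 \cdot \sqrt{d} \cdot \|W\|_2$, where $\|W\|_2$ is the operator norm of $W$. The proof of Lemma A is given later in this section. To apply the lemma, note that in our case $\|c\|_2 = \sqrt{m}$, while with probability $1 - \exp(-(d + m))$, $\|W\|_2 \le \gamma (\sqrt{d} + \sqrt{m})$ for some universal constant $\gamma$ (see Theorem 5.39 of [Vershynin, 2010]). Therefore, Lemma A yields the bound $f_{\text{SDP}} \le \gamma \cdot (m\sqrt{d} + d\sqrt{m})$ with probability $1 - \exp(-(d + m))$, as claimed.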

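A.1 Proof of Lemma A

First note that since $P \succeq 0$ and $P[1] = 1$, we have $P[zz^\top] \succeq P[z] P[z]^\top$ by Schur complements, and in particular $\operatorname{tr}(P[zz^\top]) \ge \|P[z]\|_2^2$ (by taking the trace of both sides).

Using this, we have for every feasible $P$

$$c^\top P[z] \le \|c\|_2 \|P[z]\|_2 \le \|c\|_2 \sqrt{\operatorname{tr}(P[zz^\top])}.$$

But we also have, letting $\|\cdot\|_*$ denote the nuclear norm (sum of singular values),

$$\operatorname{tr}(P[zz^\top]) = \operatorname{tr}(W P[xz^\top]) \overset{(i)}{\le} \|W\|_2 \|P[xz^\top]\|_* \overset{(ii)}{\le} \|W\|_2 \sqrt{\operatorname{tr}(P[xx^\top]) \operatorname{tr}(P[zz^\top])} \overset{(iii)}{\le} \|W\|_2 \sqrt{d \cdot \operatorname{tr}(P[zz^\top])}.$$

Here (i) is Hölder’s inequality, (ii) follows from the positive semidefiniteness of the block of $P$ indexed by $x$ and $z$, and (iii) uses the fact that $P[x_j^2] \le 1$ for all $j$ (due to the constraints imposed by $l = -\mathbf{1}$ and $u = \mathbf{1}$).

Solving for $\operatorname{tr}(P[zz^\top])$, we obtain the bound $\operatorname{tr}(P[zz^\top]) \le d \|W\|_2^2$. Plugging back into the preceding inequality, we obtain $f_{\text{SDP}} \le \|c\|_2 \cdot \sqrt{d} \cdot \|W\|_2$, as was to be shown.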