The behavior of deep neural networks is as complex as it is powerful. The relation of individual parameters to the network’s output is highly nonlinear and is generally unclear to an external observer. Consequently, it has been widely supposed in the field that it is impossible to recover the parameters of a network merely by observing its output on different inputs.
Beyond informing our understanding of deep learning, going from function to parameters could have serious implications for security and privacy. In many deployed deep learning systems, the output is freely available, but the network used to generate that output is not disclosed. The ability to uncover a confidential network not only would make it available for public use but could even expose data used to train the network if such data could be reconstructed from the network’s weights.
This topic also has implications for the study of biological neural networks. Experimental neuroscientists can record some variables within the brain (e.g. the output of a complex cell in primary visual cortex) but not others (e.g. the pre-synaptic simple cells), and many biological neurons appear to be well modeled as the ReLU of a linear combination of their inputs (Chance et al., 2002). It would be highly useful if we could reverse engineer the internal components of a neural circuit based on recordings of the output and our choice of input stimuli.
In this work, we show that it is, in fact, possible in many cases to recover the structure and weights of an unknown ReLU network by querying it. Our method leverages the fact that a ReLU network is piecewise linear and transitions between linear pieces exactly when one of the ReLUs of the network transitions from its inactive to its active state. We attempt to identify the piecewise linear surfaces in input space where individual neurons transition from inactive to active. For neurons in the first layer, such boundaries are hyperplanes, for which the equations determine the weights and biases of the first layer (up to sign and scaling). For neurons in subsequent layers, the boundaries are “bent hyperplanes” that bend where they intersect boundaries associated with earlier layers. Measuring these intersections allows us to recover the weights between the corresponding neurons.
Our major contributions are:
- We identify how the architecture, weights, and biases of a network can be recovered from the arrangement of boundaries between linear regions in the network.
- We implement this procedure and demonstrate its success in recovering trained and untrained ReLU networks.
- We show that this algorithm “degrades gracefully,” providing partial weights even when full weights are not recovered, and show that these situations can indicate intrinsic ambiguities in the network.
2 Related work
Various works within the deep learning literature have considered the problem of learning a network given its output on inputs drawn (non-adaptively) from a given distribution. It is known that this problem is in general hard (Goel et al., 2017), though positive results have been found for certain specific choices of distribution in the case that the network has only one or two layers (Ge et al., 2019; Goel et al., 2017; Goel and Klivans, 2017). By contrast, we consider the problem of learning about a network of arbitrary depth, given the ability to issue queries at specified input points. In this work, we leverage the theory of linear regions within a ReLU network, an area that has been studied e.g. by Telgarsky (2015); Raghu et al. (2017); Hanin and Rolnick (2019a). Most recently Hanin and Rolnick (2019b) considered the boundaries between linear regions as arrangements of “bent hyperplanes”.
Neuroscientists have long considered similar problems with biological neural networks, albeit armed with prior knowledge about network structure. For example, it is believed that complex cells in the primary visual cortex, which are often seen as translation-invariant edge detectors, obtain their invariance through what is effectively a two-layer neural network (Kording et al., 2004). A first layer is believed to extract edges, while a second layer essentially implements max-pooling. Heggelund (1981) performed physical experiments akin to our approach of identifying one ReLU at a time, applying inputs that move individual neurons above their critical threshold one by one. Being able to solve such problems more generically would be useful for a range of neuroscience applications.
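We write $\mathbf{W}^k$ for the weight matrix into layer $k$ (with layer 0 being the input), and $\mathbf{b}^k$ denotes the bias vector for layer $k$. Given a neuron $z$ in the network, we use $z(\mathbf{x})$ to denote its preactivation for input $\mathbf{x}$. Thus, for the $j$th neuron in layer $k$, we have
$$z^k_j(\mathbf{x}) = \sum_i W^k_{ij}\,\mathrm{ReLU}\big(z^{k-1}_i(\mathbf{x})\big) + b^k_j,$$
where for $k = 1$ the term $\mathrm{ReLU}(z^{0}_i(\mathbf{x}))$ is read as the input coordinate $x_i$.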
For each neuron $z$, we will use $\mathcal{B}_z$ to denote the set of $\mathbf{x}$ for which $z(\mathbf{x}) = 0$. In general, $\mathcal{B}_z$ will be an $(n_{\mathrm{in}}-1)$-dimensional piecewise linear surface in the input space $\mathbb{R}^{n_{\mathrm{in}}}$ (see Figure 1, in which input dimension is 2 and the $\mathcal{B}_z$ are simply lines). More precisely, this holds for all but a measure zero set of networks, and any network for which it does not hold may simply be perturbed slightly. We call $\mathcal{B}_z$ the boundary associated with neuron $z$, and we say that $\mathcal{B} = \bigcup_z \mathcal{B}_z$ is the boundary of the overall network. We refer to the connected components of $\mathbb{R}^{n_{\mathrm{in}}} \setminus \mathcal{B}$ as regions. Throughout this paper, we will make the Linear Regions Assumption: The set of regions is the set of linear pieces of the piecewise linear function $\mathcal{N}(\mathbf{x})$. While this assumption has tacitly been made in the prior literature, it is noted in Hanin and Rolnick (2019b) that there are cases where it does not hold – for example, if an entire layer of the network is zeroed out for some inputs.
3.2 Isomorphisms of networks
Before showing how to infer the parameters of a neural network, we must consider to what extent these parameters can be inferred unambiguously. Given a network $\mathcal{N}$, there are a number of other networks that define exactly the same function from input space to output space. We say that such networks are isomorphic to $\mathcal{N}$. For multilayer perceptrons with ReLU activation, we consider the following network isomorphisms:
Permutation. The order of neurons in each layer of a network $\mathcal{N}$ does not affect the underlying function. Formally, let $p_{k,\sigma}(\mathcal{N})$ be the network obtained from $\mathcal{N}$ by permuting layer $k$ according to $\sigma$ (along with the corresponding weight vectors and biases). Then, $p_{k,\sigma}(\mathcal{N})$ is isomorphic to $\mathcal{N}$ for every layer $k$ and permutation $\sigma$.
Scaling. Because the ReLU is equivariant under multiplication by a positive constant, it is possible to scale the incoming weights and biases of any neuron, while inversely scaling the outgoing weights, leaving the overall function unchanged. Formally, for $z$ the $i$th neuron in layer $k$ and $c$ any positive constant, let $s_{z,c}(\mathcal{N})$ be the network obtained from $\mathcal{N}$ by replacing $W^k_{ji}$, $b^k_i$, and $W^{k+1}_{ih}$ (for all $j$ and $h$) by $c\,W^k_{ji}$, $c\,b^k_i$, and $(1/c)\,W^{k+1}_{ih}$, respectively. It is simple to prove that $s_{z,c}(\mathcal{N})$ is isomorphic to $\mathcal{N}$ (see Appendix A).
Thus, we can hope to recover a network only up to layer-wise permutation and neuron-wise scaling. Formally, $p_{k,\sigma}$ and $s_{z,c}$ are generators for a group of isomorphisms of $\mathcal{N}$. (As we shall see in §5, some networks also possess additional isomorphisms.)
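The scaling isomorphism can be checked numerically. The following is a minimal sketch, assuming a small randomly initialized network stored as a list of `(W, b)` pairs with one row of incoming weights per neuron; the helper names (`forward`, `scale_neuron`, `rand_layer`) and the layer sizes are illustrative and not taken from the paper.

```python
import random

def relu(v):
    return [max(0.0, t) for t in v]

def affine(W, b, x):
    # W holds one row of incoming weights per neuron in the layer.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(params, x):
    """Evaluate an MLP given as [(W, b), ...]; ReLU follows every layer but the last."""
    h = x
    for idx, (W, b) in enumerate(params):
        h = affine(W, b, h)
        if idx < len(params) - 1:
            h = relu(h)
    return h

def scale_neuron(params, layer, i, c):
    """Copy of the network with hidden neuron i of `layer` (0-based) rescaled by c > 0."""
    new = [([row[:] for row in W], b[:]) for W, b in params]
    W, b = new[layer]
    W[i] = [c * w for w in W[i]]      # incoming weights times c
    b[i] = c * b[i]                   # bias times c
    for row in new[layer + 1][0]:     # outgoing weights divided by c
        row[i] /= c
    return new

rng = random.Random(0)

def rand_layer(n_out, n_in):
    W = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.gauss(0, 1) for _ in range(n_out)]
    return W, b

# Hypothetical 3-4-4-2 network; rescale one neuron in the second hidden layer.
net = [rand_layer(4, 3), rand_layer(4, 4), rand_layer(2, 4)]
scaled = scale_neuron(net, layer=1, i=2, c=3.7)

max_diff = 0.0
for _ in range(100):
    x = [rng.gauss(0, 2) for _ in range(3)]
    diff = max(abs(a - b) for a, b in zip(forward(net, x), forward(scaled, x)))
    max_diff = max(max_diff, diff)
print(max_diff)  # expected to sit at floating-point rounding level
```

The two networks have different parameters but agree on every sampled input, which is why queries alone can determine each neuron only up to a positive scale factor.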
4 The algorithm
Consider a network $\mathcal{N}$ and neuron $z \in \mathcal{N}$, so that $\mathcal{B}_z$ is the boundary associated with neuron $z$. Recall that $\mathcal{B}_z$ is piecewise linear. We say that $\mathcal{B}_z$ bends at a point if $\mathcal{B}_z$ is nonlinear at that point (that is, if the point lies on the boundary of several regions). As observed in Hanin and Rolnick (2019b), $\mathcal{B}_z$ can bend only at points where it intersects boundaries $\mathcal{B}_{z'}$ for $z'$ in an earlier layer of the network. In general, the converse also holds; $\mathcal{B}_z$ bends wherever it intersects such a boundary $\mathcal{B}_{z'}$ (see Appendix A). Then, for any two boundaries $\mathcal{B}_z$ and $\mathcal{B}_{z'}$, one of the following must hold: $\mathcal{B}_z$ bends at their intersection (in which case $z$ occurs in a deeper layer of the network), $\mathcal{B}_{z'}$ bends (in which case $z'$ occurs in a deeper layer), or neither bends (in which case $z$ and $z'$ occur in the same layer). It is not possible for both $\mathcal{B}_z$ and $\mathcal{B}_{z'}$ to bend at their intersection – unless that intersection is also contained in another boundary, which is vanishingly unlikely in general. Thus, the architecture of the network can be determined by evaluating the boundaries $\mathcal{B}_z$ and where they bend in relation to one another.
Moving beyond architecture, the weights and biases of the network can also be determined from the boundaries, one layer at a time. Boundaries for neurons in the first layer do not bend and are simply hyperplanes; the equations of these hyperplanes expose the weights from the input to the first layer (up to permutation, scaling, and sign). For each subsequent layer, the weight between neurons $z$ and $z'$ can be determined by calculating how $\mathcal{B}_{z'}$ bends when it crosses $\mathcal{B}_z$. The details of our algorithm below are intended to make these intuitions concrete and perform efficiently even when the input space is high-dimensional.
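Input $P_k$ and $S_1, \ldots, S_{k-1}$
Initialize $S_k = \{\}$
for $\mathbf{p}_1 \in P_k$ on boundary $\mathcal{B}_z$ do
    Initialize $A_z = \{\mathbf{p}_1\}$, $L_z = H_z = \{\}$
    while $L_z \neq$ Layer $k-1$ do
        Pick $\mathbf{p}_i \in A_z$ and $\mathbf{v}$
        $\mathbf{p}', \mathcal{B}_{z'} =$ ClosestBoundary$(\mathbf{p}_i, \mathbf{v})$
        if $\mathbf{p}'$ on boundary $\mathcal{B}_z$ then
            $A_z \leftarrow A_z \cup \{\mathbf{p}' + \boldsymbol{\epsilon}\}$
            $L_z \leftarrow L_z \cup \{z'\}$
            $H_z \leftarrow H_z \cup \{$InferHyperplane$(\mathbf{p}_i)\}$
        else
            $P_{k+1} \leftarrow P_{k+1} \cup \{\mathbf{p}_1\}$; break
        end if
    end while
    if $L_z =$ Layer $k-1$ then
        $S_k \leftarrow S_k \cup$ GetParams$(A_z, H_z)$
    end if
end for
return parameters $S_k$, unused sample points $P_{k+1}$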
4.2 The first layer
We begin by identifying the first layer of the network $\mathcal{N}$, for which we must infer the number of neurons, the weight matrix $\mathbf{W}^1$, and the bias vector $\mathbf{b}^1$. As noted above, for each $z = z^1_j$ in the first layer, the boundary $\mathcal{B}_z$ is a hyperplane with equation $\sum_i W^1_{ij} x_i + b^1_j = 0$. For each neuron $z$ in a later layer of the network, the boundary $\mathcal{B}_z$ will, in general, bend and not be a (complete) hyperplane (see Appendix A). We may therefore find the number of neurons in layer 1 by counting the hyperplanes contained in the network’s boundary $\mathcal{B}$, and we can infer weights and biases by determining the equations of these hyperplanes.
Boundary points along a line. Our algorithm is based upon the identification of points on the boundary $\mathcal{B}$. One of our core algorithmic primitives is a subroutine PointsOnLine that takes as input a line segment $\ell \subset \mathbb{R}^{n_{\mathrm{in}}}$ and approximates the set $\mathcal{B} \cap \ell$ of boundary points along $\ell$. The algorithm proceeds by binary search, leveraging the fact that boundary points subdivide $\ell$ into regions within which $\mathcal{N}$ is linear. We maintain a list of points in order along $\ell$ (initialized to the endpoints and midpoint of $\ell$) and iteratively perform the following operation: For each three consecutive points on our list, $\mathbf{p}_{i-1}, \mathbf{p}_i, \mathbf{p}_{i+1}$, we determine if the vectors $(\mathcal{N}(\mathbf{p}_i) - \mathcal{N}(\mathbf{p}_{i-1}))/\|\mathbf{p}_i - \mathbf{p}_{i-1}\|_2$ and $(\mathcal{N}(\mathbf{p}_{i+1}) - \mathcal{N}(\mathbf{p}_i))/\|\mathbf{p}_{i+1} - \mathbf{p}_i\|_2$ are equal (to within computation error) – if so, we remove the point $\mathbf{p}_i$ from our list, otherwise we add two new points to our list, one a weighted average of $\mathbf{p}_{i-1}$ and $\mathbf{p}_i$ and the other a weighted average of $\mathbf{p}_i$ and $\mathbf{p}_{i+1}$. (These weighted averages speed up the search algorithm by biasing it towards points closer to the center of the segment, which is where we expect the most intersections given our choice of segments.) The points in the list converge to discontinuities of the gradient $\nabla \mathcal{N}$, which are our desired boundary points. Note that PointsOnLine is where we make use of our ability to query the network.
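As an illustration of this primitive, here is a minimal sketch of a simplified variant: plain recursive bisection with a midpoint-versus-chord linearity check, rather than the weighted-average list update described above. It assumes the network output along the segment has already been reduced to a scalar function `f(t)` for `t` in `[0, 1]`; the toy function `f`, the tolerances, and all names are assumptions made for the example.

```python
def relu(t):
    return max(0.0, t)

def f(t):
    # Toy scalar output along a segment with x = 4t - 2; kinks at t = 0.3, 0.5, 0.75.
    x = 4.0 * t - 2.0
    return relu(x - 1.0) - 2.0 * relu(x) + 0.5 * relu(x + 0.8)

def bends(f, a, b, eps):
    """True if f deviates from the chord of [a, b] at the midpoint."""
    m = 0.5 * (a + b)
    return abs(f(m) - 0.5 * (f(a) + f(b))) > eps

def points_on_line(f, lo=0.0, hi=1.0, tol=1e-6, eps=1e-10):
    """Approximate the gradient discontinuities of a piecewise linear f on [lo, hi]."""
    raw = []
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        if not bends(f, a, b, eps):
            # Midpoint lies on the chord: treat the piece as linear.
            # This is a heuristic; several kinks can cancel at the midpoint.
            continue
        m = 0.5 * (a + b)
        if b - a <= tol:
            raw.append(m)
            continue
        stack.append((a, m))
        stack.append((m, b))
        # A kink sitting exactly on the split point is invisible to both halves,
        # so probe a tiny window around it directly.
        if bends(f, m - 0.5 * tol, m + 0.5 * tol, eps):
            raw.append(m)
    # Merge near-duplicate reports of the same kink.
    raw.sort()
    clusters = []
    for t in raw:
        if clusters and t - clusters[-1][-1] <= 2.0 * tol:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return [sum(c) / len(c) for c in clusters]

kinks = points_on_line(f)
print(kinks)
```

Each call to `f` corresponds to one query of the network, so the cost per boundary point is logarithmic in the requested precision.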
Sampling boundary points. In order to identify the boundaries $\mathcal{B}_z$ for $z$ in layer 1, we begin by identifying a set of boundary points with at least one on each $\mathcal{B}_z$. A randomly chosen line segment through input space will intersect some of the $\mathcal{B}_z$ – indeed, if it is long enough, it will intersect any fixed hyperplane with probability 1. We sample line segments $\ell$ in $\mathbb{R}^{n_{\mathrm{in}}}$ and run PointsOnLine on each. Many sampling distributions are possible, but in our implementation we choose to sample segments of fixed (long) length, tangent at their midpoints to a sphere of fixed (large) radius. This ensures that each of our sample lines remains far from the origin, where boundaries are in closer proximity and therefore more easily confused with one another (this will become useful in the next step). Let $P_1$ be the overall set of boundary points identified on our sample line segments.
Inferring hyperplanes. We now proceed to fit a hyperplane to each of the boundary points we have just identified. For each $\mathbf{p} \in P_1$, there is a neuron $z$ such that $\mathbf{p}$ lies on $\mathcal{B}_z$. The boundary $\mathcal{B}_z$ is piecewise linear, with nonlinearities only along other boundaries $\mathcal{B}_{z'}$, and with probability 1, $\mathbf{p}$ does not lie on a boundary besides $\mathcal{B}_z$. Therefore, within a small enough neighborhood of $\mathbf{p}$, $\mathcal{B}_z$ is given by a hyperplane, which we call the local hyperplane at $\mathbf{p}$. If $z$ is in layer 1, then $\mathcal{B}_z$ equals the local hyperplane. The subroutine InferHyperplane takes as input a point $\mathbf{p}$ on a boundary $\mathcal{B}_z$ and approximates the local hyperplane within which $\mathbf{p}$ lies. This algorithm proceeds by sampling many small line segments around $\mathbf{p}$, running PointsOnLine to find their points of intersection with $\mathcal{B}_z$, and performing a linear regression to find the equation of the hyperplane containing these points.
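The regression step can be sketched as follows. This is a minimal illustration, assuming the boundary points near the query point have already been collected and that the hyperplane does not pass through the origin, so it can be normalized as w · x = 1; the normal-equations solver, the function names, and the synthetic points are assumptions made for the example.

```python
import random

def solve(A, y):
    """Solve the square system A w = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def fit_hyperplane(points):
    """Least-squares w with w . p = 1 for every sampled p (hyperplane must miss the origin)."""
    n = len(points[0])
    AtA = [[sum(p[i] * p[j] for p in points) for j in range(n)] for i in range(n)]
    Atb = [sum(p[i] for p in points) for i in range(n)]
    return solve(AtA, Atb)

# Synthetic boundary points on the hypothetical hyperplane 2x - y + 0.5z = 3.
rng = random.Random(1)
points = []
for _ in range(30):
    x = 1.0 + rng.uniform(-0.5, 0.5)
    y = rng.uniform(-0.5, 0.5)
    z = (3.0 - 2.0 * x + y) / 0.5
    points.append([x, y, z])

w_hat = fit_hyperplane(points)
normal = [3.0 * wi for wi in w_hat]  # rescale so the offset is 3, for comparison
print(normal)
```

Because the hyperplane equation is only defined up to a constant factor, any normalization works; the recovered vector gives the neuron's weights and bias up to that factor.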
Testing hyperplanes. Not all of the hyperplanes we have identified are actually boundaries for neurons in layer 1, so we need to test which hyperplanes are contained in $\mathcal{B}$ in their entirety, and which are the local hyperplanes of boundaries that bend. The subroutine TestHyperplane takes as input a point $\mathbf{p}$ and a hyperplane $H$ containing that point, and determines whether $H$ is contained in the boundary $\mathcal{B}$ of the network. This algorithm proceeds by sampling points within $H$ that lie far from $\mathbf{p}$ and applying PointsOnLine to verify the existence of a boundary within a short line segment. Applying TestHyperplane to those hyperplanes inferred in the preceding step allows us to determine those $\mathcal{B}_z$ for which $z$ is in layer 1.
From hyperplanes to parameters. Finally, we identify the first layer of $\mathcal{N}$ from the equations of hyperplanes contained in $\mathcal{B}$. The number of neurons in layer 1 is given simply by the number of distinct $\mathcal{B}_z$ which are hyperplanes. As we have observed, for $z = z^1_j$ in layer 1, the hyperplane $\mathcal{B}_z$ is given by $\sum_i W^1_{ij} x_i + b^1_j = 0$. We can thus determine $W^1_{ij}$ and $b^1_j$ up to multiplication by a constant. However, we have already observed that scaling $W^1_{ij}$ and $b^1_j$ by a positive constant (while inversely scaling $W^2_{jh}$) is a network isomorphism (§3.2). Therefore, we need only determine the true sign of the multiplicative constant, corresponding to determining which side of the hyperplane is zeroed out by the ReLU. This determination of sign will be performed below in §4.3.
Sample complexity. We expect that the number of queries necessary to obtain weights and biases (up to sign) for the first layer should grow as $O(n_{\mathrm{in}}\, n_1 \log n_1)$, where $n_1$ is the number of neurons in layer 1, which for constant-width networks is only slightly above the number of parameters being inferred. Assuming that biases in the network are bounded above, each sufficiently long line has at least a constant probability of hitting a given hyperplane, suggesting that $O(\log n_1)$ lines are required according to a coupon collector-style argument. Hanin and Rolnick (2019a) show that under natural assumptions, the number of boundary points intersecting a given line through input space grows linearly in the total number of neurons in the network. Finally, each boundary point on a line requires $O(n_{\mathrm{in}})$ queries in order to fit a hyperplane.
4.3 Additional layers
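We now assume that the weights $\mathbf{W}^1, \ldots, \mathbf{W}^{k-1}$ and biases $\mathbf{b}^1, \ldots, \mathbf{b}^{k-1}$ have already been determined within the network $\mathcal{N}$, with the exception of the sign choice for weights and biases at each neuron in layer $k-1$. We now show how it is possible to determine the weights $\mathbf{W}^k$ and biases $\mathbf{b}^k$, along with the correct signs for $\mathbf{W}^{k-1}$ and $\mathbf{b}^{k-1}$.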
Closest boundary along a line. In this part of our algorithm, we will need the ability to move along a boundary to its intersection with another boundary. For this purpose, the subroutine ClosestBoundary will be useful. It takes as input a point $\mathbf{p}$, a vector $\mathbf{v}$ and the network parameters as determined up to layer $k-1$, and outputs the smallest $c > 0$ such that $\mathbf{q} = \mathbf{p} + c\mathbf{v}$ lies on $\mathcal{B}_z$ for some $z$ in layer at most $k-1$. In order to compute $c$, we consider the region $R$ within which $\mathbf{p}$ lies, which is associated with a certain pattern of active and inactive ReLUs. For each boundary $\mathcal{B}_z$, we can calculate the hyperplane equation which would define $\mathcal{B}_z$ were it to intersect $R$, due to the fixed pattern of active and inactive neurons within $R$, and we can calculate the distance from $\mathbf{p}$ to this hyperplane. While not every boundary intersects $R$, the closest boundary does, allowing us to find the desired $c$.
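The computation can be sketched as follows, assuming the known layers are given as a list of `(W, b)` pairs with one row of incoming weights per neuron. Within the region containing the start point, every known neuron is an affine function of the input, so the first zero crossing along the ray can be found in closed form. The function names, the 0-based `(layer, neuron)` indices in the return value, and the toy parameters are assumptions made for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def local_affine_maps(params, p):
    """For each known neuron, return (a, off) with z(x) = a . x + off on p's region."""
    n = len(p)
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity: the input
    c = [0.0] * n
    maps = []
    for W, b in params:
        layer = []
        for row, bias in zip(W, b):
            a = [sum(w * A[i][j] for i, w in enumerate(row)) for j in range(n)]
            off = sum(w * c[i] for i, w in enumerate(row)) + bias
            layer.append((a, off))
        maps.append(layer)
        # Apply the ReLU using the activation pattern observed at p.
        A, c = [], []
        for a, off in layer:
            active = dot(a, p) + off > 0.0
            A.append(a if active else [0.0] * n)
            c.append(off if active else 0.0)
    return maps

def closest_boundary(params, p, v, tiny=1e-12):
    """Smallest t > 0 with p + t v on a known boundary, as (t, layer, neuron), or None."""
    best = None
    for k, layer in enumerate(local_affine_maps(params, p)):
        for j, (a, off) in enumerate(layer):
            denom = dot(a, v)
            if abs(denom) < tiny:
                continue  # ray is parallel to this neuron's local hyperplane
            t = -(dot(a, p) + off) / denom
            if t > tiny and (best is None or t < best[0]):
                best = (t, k, j)
    return best

# Toy network: z1 = x - 1, z2 = y + 2, z3 = relu(z1) - relu(z2) + 3.
params = [([[1.0, 0.0], [0.0, 1.0]], [-1.0, 2.0]),
          ([[1.0, -1.0]], [3.0])]
hit_a = closest_boundary(params, [0.0, 0.0], [1.0, 0.5])
hit_b = closest_boundary(params, [0.0, 0.0], [0.0, 1.0])
print(hit_a, hit_b)
```

No network queries are needed here: the routine relies only on parameters already recovered, which is what allows the exploration step to move along a boundary cheaply.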
Unused boundary points. In order to identify the boundaries $\mathcal{B}_z$ for $z$ in layer $k$, we wish to identify a set of boundary points with at least one on each such boundary. However, in previous steps of our algorithm, a set of boundary points $P_{k-1}$ was created, of which some were used in ascertaining the parameters of earlier layers. We now consider the subset $P_k \subset P_{k-1}$ of points that were not found to belong to $\mathcal{B}_z$, for $z$ in layers $1$ through $k-1$. These points have already had their local hyperplanes determined.
Exploring boundary intersections. Consider a point $\mathbf{p}_1 \in P_k$ such that $\mathbf{p}_1 \in \mathcal{B}_z$. Note that $\mathcal{B}_z$ will, in general, have nonlinearities where it intersects each $\mathcal{B}_{z'}$ for which $z'$ lies in an earlier layer than $z$. We explore these intersections, and in particular attempt to find a point of $\mathcal{B}_z \cap \mathcal{B}_{z'}$ for every $z'$ in layer $k-1$. Given the local hyperplane $H$ at $\mathbf{p}_1$, we pick a direction $\mathbf{v}$ along $H$ and apply ClosestBoundary to calculate the closest point of intersection $\mathbf{p}'$ with $\mathcal{B}_{z'}$ for all $z'$ already identified in the network. (Below we discuss how best to pick $\mathbf{v}$.) Note that if $z$ is in layer $k$, then $\mathbf{p}'$ must be on $\mathcal{B}_z$ as well as $\mathcal{B}_{z'}$, while if $z$ is in a later layer of the network, then there may exist unidentified neurons in layers below $z$ and therefore $\mathcal{B}_z$ may bend before meeting $\mathcal{B}_{z'}$. We check if $\mathbf{p}'$ lies on $\mathcal{B}_z$ by applying PointsOnLine, and if so apply InferHyperplane to calculate the local hyperplane of $\mathcal{B}_z$ on the other side of $\mathcal{B}_{z'}$ from $\mathbf{p}_1$. We select a representative point $\mathbf{p}_2$ on this local hyperplane. We repeat the process of exploration from the points $\mathbf{p}_1, \mathbf{p}_2, \ldots$ until one of the following occurs: (i) a point of $\mathcal{B}_z \cap \mathcal{B}_{z'}$ has been identified for every $z'$ in layer $k-1$ (this may be impossible; see §5), (ii) $z$ is determined to be in a layer deeper than $k$ (as a result of $\mathbf{p}'$ not lying on $\mathcal{B}_z$), or (iii) a maximum number of iterations has been reached.
How to explore. An important step in our algorithm is exploring points of $\mathcal{B}_z$ that lie on other boundaries. Given a set of points $A_z = \{\mathbf{p}_1, \ldots, \mathbf{p}_m\}$ on $\mathcal{B}_z$, we briefly consider several methods for picking a point $\mathbf{p}_i$ and direction $\mathbf{v}$ along the local hyperplane at $\mathbf{p}_i$ to apply ClosestBoundary. One approach is to pick a random point $\mathbf{p}_i$ from those already identified and a random direction $\mathbf{v}$; this has the advantage of simplicity. However, it is somewhat faster to consider $z'$ for which the intersection $\mathcal{B}_z \cap \mathcal{B}_{z'}$ has not yet been identified and attempt specifically to find points on these intersections. One approach for this is to pick a missing $z'$ and identify $\mathbf{p}_i$ for which the boundary $\mathcal{B}_{z'}$ lies on the boundary of the region containing $\mathbf{p}_i$ and solve a linear program to find $\mathbf{v}$. Another approach is to pick a missing $z'$ and a point $\mathbf{p}_i$, calculate the hyperplane $H$ which would describe $\mathcal{B}_{z'}$ under the activation pattern of $\mathbf{p}_i$, and choose $\mathbf{v}$ along the local hyperplane to $\mathbf{p}_i$ such that the distance to $H$ is minimized. This is the approach which we take in our implementation, though more sophisticated approaches may exist and present an interesting avenue for further work.
From boundaries to parameters. We now identify layer $k$ of $\mathcal{N}$, along with the sign of the parameters of layer $k-1$, by measuring the extent to which hyperplanes bend at their intersection. We are, in addition, able to identify the correct signs at layer $k-1$ by solving an overconstrained system of constraints capturing the influence of neurons in layer $k-1$ on different regions of input space. The following theorem formalizes the inductive step that allows us to go from what we know at layer $k-1$ (weights and biases, up to scaling and sign) to the equivalent set of information for layer $k$, plus filling in the signs for layer $k-1$. The proof is given in Appendix B.
Theorem. The following holds true for deep multi-layer perceptrons $\mathcal{N}$ satisfying the Linear Regions Assumption (§3.1), excluding a set of networks with measure zero:
Suppose that the weights and biases of $\mathcal{N}$ are known up through layer $k-1$, with the exception that for each neuron in layer $k-1$, the sign of the incoming weights and the bias is unknown. Suppose also that for each $z$ in layer $k$, there exists an ordered set of points $A_z = \{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_m\}$ such that: (i) each point lies on the boundary of $z$, and in (the interior of) a distinct region with respect to the earlier-layer boundaries already known; (ii) each point (except for $\mathbf{p}_1$) has a precursor in an adjacent region; (iii) for each such pair of points, the local hyperplanes of $\mathcal{B}_z$ are known, as is the boundary $\mathcal{B}_{z'}$ dividing them ($z'$ in an earlier layer); (iv) the set of such $z'$ includes all of layer $k-1$.
Then, it is possible to recover the weights and biases for layer $k$, with the exception that for each neuron, the sign of the incoming weights and the bias is unknown. It is also possible to recover the sign for every neuron in layer $k-1$.
Note that even when assumption (iv) of the Theorem is violated, the algorithm recovers the weights corresponding to whichever boundaries are successfully crossed (as we verify empirically in §6).
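To illustrate how a weight can be read off from a bend, consider a neuron $z$ in layer $k$ whose boundary crosses the boundary of a neuron $z'$ in layer $k-1$. On the two sides of the crossing the gradient of $z$ differs by the connecting weight times the gradient of $z'$. The sketch below assumes that all sign ambiguities have been resolved and that $z'$ is inactive on the side of the first point, and solves $c\,\mathbf{v}_s - w\,\mathbf{v}' = \mathbf{v}_i$ for the unknown scale $c$ and weight $w$ by least squares. The function name `recover_bend` and the numerical values are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(u):
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]

def recover_bend(v_i, v_s, v_prime):
    """Least-squares solution (c, w) of c * v_s - w * v_prime = v_i."""
    # Normal equations for the two columns v_s and -v_prime.
    a11 = dot(v_s, v_s)
    a12 = -dot(v_s, v_prime)
    a22 = dot(v_prime, v_prime)
    r1 = dot(v_s, v_i)
    r2 = -dot(v_prime, v_i)
    det = a11 * a22 - a12 * a12
    c = (r1 * a22 - a12 * r2) / det
    w = (a11 * r2 - a12 * r1) / det
    return c, w

# Hypothetical ground truth: the gradient of z is g_i where z' is inactive and
# g_s = g_i + 2.0 * grad(z') where z' is active, with grad(z') = (1, 0, 0).
g_i = [0.0, 0.5, -1.5]
g_s = [2.0, 0.5, -1.5]
v_prime = [1.0, 0.0, 0.0]

# Only the unit normals of the two local hyperplanes are observable.
v_i, v_s = unit(g_i), unit(g_s)
c, w = recover_bend(v_i, v_s, v_prime)

scale = math.sqrt(dot(g_i, g_i))
print(c, w * scale)
```

The recovered `w` is the weight in the rescaled network where the gradient at the first point has unit norm, which is consistent with weights being identifiable only up to the scaling isomorphism of §3.2.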
We here explore some reasons why our algorithm may fail, motivate our recursive approach, and discuss the potential for generalizations to different architectures.
Non-intersecting boundaries. It is possible that for some neurons $z$ and $z'$ in consecutive layers, there is no point of intersection between the boundaries $\mathcal{B}_z$ and $\mathcal{B}_{z'}$ (or that this intersection is very small), making it impossible to infer the weight between $z$ and $z'$ by our algorithm. Some such cases represent an ambiguity in the underlying network – an additional isomorphism to those described in §3.2. Namely, $\mathcal{B}_z \cap \mathcal{B}_{z'}$ is empty if one of the following cases holds: (1) whenever $z$ is active, $z'$ is inactive; (2) whenever $z$ is active, $z'$ is active; (3) whenever $z$ is inactive, $z'$ is inactive; or (4) whenever $z$ is inactive, $z'$ is active. In case 1, observe that a slight perturbation to the weight $w$ between $z$ and $z'$ has no effect upon the network’s output; thus $w$ is not uniquely determined. Cases 2-4 present a more complicated picture; depending on the network, there may or may not be additional isomorphisms.
Boundary topology. For simplicity in our algorithm, we have not considered the relatively rare cases where boundaries $\mathcal{B}_z$ are disconnected or bounded. If $\mathcal{B}_z$ is disconnected, then it may not be possible to find a connected path along it that intersects all boundaries arising from the preceding layer. In this case, it is simple to infer that two independently identified pieces of the boundary belong to the same neuron, and hence to infer the full weight vector. Next, if $\mathcal{B}_z$ is bounded for some $z$, then it is a closed $(n_{\mathrm{in}}-1)$-dimensional surface within $n_{\mathrm{in}}$-dimensional input space. (For 2D input, such $\mathcal{B}_z$ must be topological circles, but for higher dimensions, it is conceivable for them to be more complicated surfaces, such as toroidal polyhedra.) While our algorithm requires no modification in this case, bounded $\mathcal{B}_z$ may be more difficult to find by intersection with randomly chosen lines, and a more principled sampling method may be helpful.
Our recursive approach. Our approach proceeds layer by layer, leveraging the fact that each boundary bends only at those boundaries corresponding to earlier neurons in the network. Our approach in the first layer is, however, distinct from (and simpler than) the algorithm for subsequent layers. One might wonder why, once the first $k-1$ layers have been identified, it is not possible simply to apply our first-layer algorithm to the $n_{k-1}$-dimensional “input space” arising from activations of layer $k-1$. Unfortunately, this is not possible in general, as this would require the ability to evaluate layer $k$ for arbitrary settings of layer $k-1$. ReLU networks are hard to invert, and therefore it is unclear how one could manufacture an input for a specified layer $k-1$ activation, even while knowing the parameters for the first $k-1$ layers.
While we have expressed our algorithm in terms of multilayer perceptrons with ReLU activation, it also extends to various other neural network architectures. Other piecewise linear activation functions (such as leaky ReLU) admit similar algorithms. For a network with convolutional layers, it is possible to use the same approach to infer the weights between neurons, with two caveats: (i) As we have stated it, the algorithm does not account for weight-sharing – the number of “neurons” in each layer is thus dependent on the input size, and is very large for reasonably sized images. (ii) Pooling layers do affect the partition into activation regions, and indeed introduce new discontinuities into the gradient; our algorithm therefore does not apply. For ResNets (He et al., 2016), our algorithm holds with slight modification, which we defer until future work.
, networks were initialized using i.i.d. normal weights with varianceand i.i.d. normal biases with unit variance. Networks were then trained to memorize 1000 datapoints with arbitrary binary labels. Training was performed using the Adam optimizer and a cross-entropy loss applied to the softmax of the final layer. The trained networks (when sufficiently large) were able to attain near-perfect accuracy. We observed that both the first-layer algorithm and additional-layer algorithm identified weights and biases to within extremely high accuracy (see Figures 3 and 4). Even in cases where, for the additional-layer algorithm, a small fraction of neurons were not identified (see §5), the algorithm was able to correctly predict the remaining parameters.
In this work, we have shown that it is often possible to recover the architecture, weights, and biases of deep ReLU networks by repeated queries. We proceed by identifying the boundaries between linear regions of the network and the intersections of these boundaries. Our approach is theoretically justified and empirically validated on networks before and after training. Where the algorithm does not succeed in giving a complete set of weights, it is nonetheless able to give a partial set of weights, and incompleteness in some cases reflects unresolvable ambiguities about the network.
Our approach works for a wide variety of networks, though not all. It is limited to ReLU or otherwise piecewise linear activation functions, though a continuous version of this method could potentially be developed in future work for use with sigmoidal activations. If used with convolutional layers, our method does not account for the symmetries of the network and therefore scales with the size of the input as well as the number of features, resulting in high computational cost. Finally, the method is not robust to defenses such as adding noise to the outputs of the network, and therefore can be thwarted by a network designer who seeks to hide their weights or architecture.
We believe that the methods we have introduced here will lead to considerable advances in identifying neural networks from their outputs, both in the context of deep learning and, more speculatively, in neuroscience. While the implementation we have demonstrated here is effective in small instances, we anticipate future work that optimizes these methods for efficient use with different architectures and at scale.
- Chance et al. (2002). Gain modulation from background synaptic input. Neuron 35 (4), pp. 773–782.
- Ge et al. (2019). Learning two-layer neural networks with symmetric inputs. In International Conference on Learning Representations (ICLR).
- Goel et al. (2017). Reliably learning the ReLU in polynomial time. In Conference on Learning Theory (COLT).
- Goel and Klivans (2017). Learning neural networks with two nonlinear layers in polynomial time. Preprint arXiv:1709.06010.
- Hanin and Rolnick (2018). How to start training: The effect of initialization and architecture. In Neural Information Processing Systems (NeurIPS).
- Hanin and Rolnick (2019a). Complexity of linear regions in deep networks. In International Conference on Machine Learning (ICML).
- Hanin and Rolnick (2019b). Deep ReLU networks have surprisingly few activation patterns. In Neural Information Processing Systems (NeurIPS).
- He et al. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In IEEE International Conference on Computer Vision (ICCV).
- He et al. (2016). Deep residual learning for image recognition. In Conference on Computer Vision and Pattern Recognition (CVPR).
- Heggelund (1981). Receptive field organization of simple cells in cat striate cortex. Experimental Brain Research 42 (1), pp. 89–98.
- Kording et al. (2004). How are complex cell properties adapted to the statistics of natural stimuli? Journal of Neurophysiology 91 (1), pp. 206–212.
- Raghu et al. (2017). On the expressive power of deep neural networks. In International Conference on Machine Learning (ICML).
- Telgarsky (2015). Representation benefits of deep feedforward networks. Preprint arXiv:1509.08101.
Appendix A Useful Lemmata
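Lemma 1 (Isomorphism under scaling).
Given an MLP $\mathcal{N}$ with ReLU activation, the network $s_{z,c}(\mathcal{N})$ is isomorphic to $\mathcal{N}$ for every neuron $z$ and constant $c > 0$.
Proof. Suppose that $z = z^k_i$ is the $i$th neuron in layer $k$. Then, for each neuron $z^{k+1}_h$ in layer $k+1$ of the network $\mathcal{N}$, we have:
$$z^{k+1}_h(\mathbf{x}) = b^{k+1}_h + \sum_j W^{k+1}_{jh}\,\mathrm{ReLU}\Big(b^k_j + \sum_r W^k_{rj}\,\mathrm{ReLU}\big(z^{k-1}_r(\mathbf{x})\big)\Big).$$
By comparison, in network $s_{z,c}(\mathcal{N})$, we have:
$$z^{k+1}_h(\mathbf{x}) = b^{k+1}_h + \sum_{j \neq i} W^{k+1}_{jh}\,\mathrm{ReLU}\Big(b^k_j + \sum_r W^k_{rj}\,\mathrm{ReLU}\big(z^{k-1}_r(\mathbf{x})\big)\Big) + \frac{1}{c}\,W^{k+1}_{ih}\,\mathrm{ReLU}\Big(c\,b^k_i + \sum_r c\,W^k_{ri}\,\mathrm{ReLU}\big(z^{k-1}_r(\mathbf{x})\big)\Big),$$
which equals the expression above, where we used the property that $\mathrm{ReLU}(cx) = c\,\mathrm{ReLU}(x)$ for any $c > 0$. ∎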
Lemma 2 (Bending hyperplanes).
The set of networks with the following property has measure zero in the space of networks: There exist neurons $z$ and $z'$ in consecutive layers such that the boundary $\mathcal{B}_z$ intersects $\mathcal{B}_{z'}$ but does not bend at the intersection.
Proof. Write $z = z^k_j$ and $z' = z^{k-1}_r$. Observe that $\mathcal{B}_z$ is defined by the equation:
$$0 = b^k_j + \sum_i W^k_{ij}\,\mathrm{ReLU}\big(z^{k-1}_i(\mathbf{x})\big).$$
As $\mathcal{B}_z$ does not bend when it intersects $\mathcal{B}_{z'}$, the gradient of the RHS must remain unchanged when $z'$ flips between active and inactive. Unless another neuron transitions simultaneously with $z'$ (an event that occurs with measure zero), this can happen only if $W^k_{rj} = 0$, which itself is a measure zero event. ∎
Appendix B Proof of Theorem
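In this proof, we will show how the information we are given by the assumptions of the theorem is enough to recover the weights and biases for each neuron $z$ in layer $k$. We will proceed for each $z$ individually, progressively learning weights between $z$ and each of the neurons in the preceding layer (though for ResNets this procedure could also easily be generalized to learn weights from $z$ to earlier layers).
For each of the points $\mathbf{p}_i \in A_z$, suppose that $H_i$ is the local hyperplane associated with $\mathbf{p}_i$ on boundary $\mathcal{B}_z$. The gradient $\nabla z(\mathbf{p}_i)$ at $\mathbf{p}_i$ is orthogonal to $H_i$, and we thus already know the direction of the gradient, but its magnitude is unknown to us. We will proceed in order through the points $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_m$, with the goal of identifying $\nabla z(\mathbf{p}_i)$ for each $\mathbf{p}_i$, up to a single scaling factor, as this computation will end up giving us the incoming weights for $z$.
We begin with $\mathbf{p}_1$ by assigning $\nabla z(\mathbf{p}_1)$ arbitrarily to either one of the two unit vectors orthogonal to $H_1$. Due to scaling invariance (Lemma 1), the weights of $\mathcal{N}$ can be rescaled without changing the function so that $\nabla z(\mathbf{p}_1)$ is multiplied by any positive constant. Therefore, our arbitrary choice can be wrong at most in its sign, and we need not determine the sign at this stage. Now, suppose towards induction that we have identified $\nabla z(\mathbf{p}_i)$ (up to sign) for $i = 1, \ldots, s-1$. We wish to identify $\nabla z(\mathbf{p}_s)$.
By assumption (ii), there exists a precursor $\mathbf{p}_i$ to $\mathbf{p}_s$ such that $H_i$ and $H_s$ intersect on a boundary $\mathcal{B}_{z'}$. Let $\mathbf{v}_i$ be our estimate of $\nabla z(\mathbf{p}_i)$, so that $\nabla z(\mathbf{p}_i) = \epsilon\,\mathbf{v}_i$ for unknown sign $\epsilon \in \{-1, +1\}$. Let $\mathbf{v}_s$ be a unit normal vector to $H_s$, so that $\nabla z(\mathbf{p}_s) = \epsilon\,c\,\mathbf{v}_s$ for some unknown constant $c$. We pick the sign of $\mathbf{v}_s$ so that it has the same orientation as $\mathbf{v}_i$ with respect to the surface $\mathcal{B}_z$, and thus $c > 0$. Finally, let $\mathbf{v}'$ be our estimate of the gradient of $z'$, so that $\nabla z' = \epsilon'\,\mathbf{v}'$, where $\epsilon'$ is also an unknown sign (recall that since $z'$ is in layer $k-1$ we know its gradient up to sign). We will use $\mathbf{v}_i$ and $\mathbf{v}'$ to identify $\nabla z(\mathbf{p}_s)$.
Suppose that $z$ is the $j$th neuron in layer $k$ and that $z'$ is the $r$th neuron in layer $k-1$. Recall that
$$z(\mathbf{x}) = b^k_j + \sum_h W^k_{hj}\,\mathrm{ReLU}\big(z^{k-1}_h(\mathbf{x})\big). \tag{3}$$
As $\mathcal{B}_{z'}$ is the boundary between inputs for which $z'$ is active and inactive, $\mathrm{ReLU}(z'(\mathbf{x}))$ must equal zero either (Case 1) on $H_i$ or (Case 2) on $H_s$.
In Case 1, we have
$$\nabla z(\mathbf{p}_s) - \nabla z(\mathbf{p}_i) = W^k_{rj}\,\nabla z',$$
which gives us the equation:
$$\epsilon\,(c\,\mathbf{v}_s - \mathbf{v}_i) = \epsilon'\,W^k_{rj}\,\mathbf{v}'.$$
Since we know the vectors $\mathbf{v}_i$, $\mathbf{v}_s$, and $\mathbf{v}'$, we are able to deduce the constant $c$.
A similar equation arises in Case 2:
$$\epsilon\,(c\,\mathbf{v}_s - \mathbf{v}_i) = -\epsilon'\,W^k_{rj}\,\mathbf{v}',$$
giving rise to the same value of $c$. We thus may complete our induction. In the process, observe that we have calculated a constant $\pm\epsilon\epsilon' W^k_{rj}$, where the sign is $+$ in Case 1 and $-$ in Case 2. Note that $\pm\epsilon'$ can be calculated based on whether $\mathbf{v}'$ points towards $\mathbf{p}_s$ or $\mathbf{p}_i$. Therefore, we have obtained $\epsilon W^k_{rj}$, which is exactly the weight (up to $z$-dependent sign) that we wished to find. Once we have all weights incoming to $z$ (up to sign), it is simple to identify the bias for this neuron (up to sign) by calculating the equation of any known local hyperplane for $z$ and using the known weights and biases from earlier layers.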
To complete the proof, we must now also calculate the correct signs of the neurons in layer . Pick some in layer and observe that for all points there corresponds an equation, obtained by taking gradients in equation (3):
where equals if is on the active side of . We can substitute in our (sign-unknown) values for these various quantities:
Now, we may estimate by a function that is 1 if and are on the same side of . This estimate will be wrong exactly when . Thus, , giving us the equation:
All the terms of this equation are known, with the exception of and the variables – giving us a linear system in variables. For a given , there are different representing the intersections with for each in layer ; choosing these should in general give linearly independent constraints. Moreover, the equation is in fact a vector equality with dimension ; hence, it is a highly overconstrained system, enabling us to identify the signs for each . This completes the proof of the theorem.