From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction

12/12/2019 ∙ by Hidenori Tanaka, et al. ∙ 0

Recently, deep feedforward neural networks have achieved considerable success in modeling biological sensory processing, in terms of reproducing the input-output map of sensory neurons. However, such models raise profound questions about the very nature of explanation in neuroscience. Are we simply replacing one complex system (a biological circuit) with another (a deep network), without understanding either? Moreover, beyond neural representations, are the deep network's computational mechanisms for generating neural responses the same as those in the brain? Without a systematic approach to extracting and understanding computational mechanisms from deep neural network models, it can be difficult both to assess the degree of utility of deep learning approaches in neuroscience, and to extract experimentally testable hypotheses from deep networks. We develop such a systematic approach by combining dimensionality reduction and modern attribution methods for determining the relative importance of interneurons for specific visual computations. We apply this approach to deep network models of the retina, revealing a conceptual understanding of how the retina acts as a predictive feature extractor that signals deviations from expectations for diverse spatiotemporal stimuli. For each stimulus, our extracted computational mechanisms are consistent with prior scientific literature, and in one case yields a new mechanistic hypothesis. Thus overall, this work not only yields insights into the computational mechanisms underlying the striking predictive capabilities of the retina, but also places the framework of deep networks as neuroscientific models on firmer theoretical foundations, by providing a new roadmap to go beyond comparing neural representations to extracting and understand computational mechanisms.



There are no comments yet.


page 2

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Deep convolutional neural networks (CNNs) have emerged as state of the art models of a variety of visual brain regions in sensory neuroscience, including the retina

[1, 2], primary visual cortex (V1), [3, 4, 5, 6], area V4 [3], and inferotemporal cortex (IT) [3, 4]

. Their success has so far been primarily evaluated by their ability to explain reasonably large fractions of variance in biological neural responses across diverse visual stimuli. However, fraction of variance explained is not of course the same thing as scientific explanation, as we may simply be replacing one inscrutable black box (the brain), with another (a potentially overparameterized deep network).

Indeed, any successful scientific model of a biological circuit should succeed along three fundamental axes, each of which goes above and beyond the simple metric of mimicking the circuit’s input-output map. First, the intermediate computational mechanisms used by the hidden layers to generate responses should ideally match the intermediate computations in the brain. Second, we should be able to extract conceptual insight into how the neural circuit generates nontrivial responses to interesting stimuli (for example responses to stimuli that cannot be generated by a linear receptive field). And third, such insights should suggest new experimentally testable hypotheses that can drive the next generation of neuroscience experiments.

However, it has been traditionally difficult to systematically extract computational mechanisms, and consequently conceptual insights, from deep CNN models due to their considerable complexity [7, 8]

. Here we provide a method to do so based on the idea of model reduction, whose goal is to systematically extract a simple, reduced, minimal subnetwork that is most important in generating a complex CNN’s response to any given stimulus. Such subnetworks then both summarize computational mechanisms and yield conceptual insights. We build on ideas from interpretable machine learning, notably methods of input attribution that can decompose a neural response into a sum of contributions either from individual pixels

[9] or hidden neurons [10]. To achieve considerable model reduction for responses to spatiotemporal stimuli, we augment and combine such input attribution methods with dimensionality reduction, which, for carefully designed artificial stimuli employed in neurophysiology experiments, often involves simple spatiotemporal averages over stimulus space.


Figure 1: Deep learning models of the retina trained only on natural scenes reproduce an array of retinal phenomena with artificial stimuli (reproduced from ref. [2]).
(A) Training procedure: We analyzed a three-layer convolutional neural network (CNN) model of the retina which takes as input a spatiotemporal natural scene movie and outputs a nonnegative firing rate, corresponding to a retinal ganglion cell response. The first layer consists of eight spatiotemporal convolutional filters (i.e., cell types) with the size of , the second layer of eight convolutional filters , and the fully connected layer predicting the ganglion cells’ response. As previously reported in [2], the deep learning model reproduces (B) an omitted stimulus response, (C) latency coding, (D) the motion reversal response, and (E) motion anticipation.

We demonstrate the power of our systematic model reduction procedure to attain mechanistic insights into deep CNNs by applying them to state of the art deep CNN models of the retina [1, 2]. The retina constitutes an ideal first application of our methods because the considerable knowledge (see e.g. [11]) about retinal mechanisms for transducing spatiotemporal light patterns into neural responses enables us to assess whether deep CNNs successfully learn the same computational structure. In particular, we obtain deep CNN models from [2]

which were trained specifically to mimic the input-output transformation from natural movies to retinal ganglion cell outputs measured in the salamander retina. The model architecture involved a three-layer CNN model of the retina with ReLU nonlinearities (Fig.

1A). This network was previously shown [1, 2] to: (i) yield state of the art models of the retina’s response to natural scenes that are almost as accurate as possible given intrinsic retinal stochasticity; (ii) possess internal subunits with similar response properties to those of retinal interneurons, such as bipolar and amacrine cell types; (iii) generalize from natural movies, to a wide range of eight different classes of artificially structured stimuli used over decades of neurophysiology experiments to probe retinal response properties. This latter generalization capacity from natural movies to artificially structured stimuli (that were never present in the training data) is intriguing given the vastly different spatiotemporal statistics of the artificial stimuli versus natural stimuli, suggesting the artificial stimuli were indeed well chosen to engage the same retinal mechanisms engaged under natural vision [2].

Here, we focus on understanding the computational mechanisms underlying the deep CNN’s ability to reproduce the neural responses to four classes of artificial stimuli (Fig. 1B-E), each of which, through painstaking experiments and theory, have revealed striking nonlinear retinal computations that advanced our scientific understanding of the retina. The first is the omitted stimulus response (OSR) [12, 13] (Fig. 1B), in which a periodic sequence of full field flashes entrains a retinal ganglion cell to respond periodically, but when a single flash is omitted, the ganglion cell produces an even larger response at the expected time of the response to the omitted flash. Moreover, the timing of this omitted stimulus response occurs at the expected time over a range of frequencies of the periodic flash train, suggesting the retina is somehow retaining a memory trace of the period of the train of flashes. The second is latency encoding [14], in which stronger stimuli yield earlier responses (Fig. 1C). The third is motion reversal [15], in which a bar suddenly reversing its motion near a ganglion cell receptive field generates a much larger response after the motion reversal (Fig. 1D). The fourth is motion anticipation [16], where the neural population responding to a moving bar is advanced in the direction of motion to compensate for propagation delays through the retina (Fig. 1E). These responses are striking because they imply the retina has implicitly built into it a predictive world model codifying simple principles like temporal periodicity, and the velocity based extrapolation of future position. The retina can then use these predictions to improve visual processing (e.g. in motion anticipation), or when these predictions are violated, the retina can generate a large response to signal that deviation (e.g. in the OSR and motion reversal).

While experimentally motivated prior theoretical models have been employed to explain the OSR [17, 18], latency encoding [14, 19], motion reversal [20, 21], and motion anticipation [16], to date, no single model other than the deep CNN found in [2] has been able to simultaneously account for retinal ganglion cell responses to both natural scenes and all four of these classes of stimuli, as well as several other classes of artificial stimuli. However, it is difficult to explain the computational mechanisms underlying the deep CNN’s ability to generate these responses simply by examining the complex network in Fig. 1A. For example, why does the deep CNN fire more when a stimulus is omitted, or when a bar reverses? How can it anticipate motion to compensate for propagation delays? And why do stronger responses cause earlier firing?

These are foundational scientific questions about the retina whose answers require conceptual insights that are not afforded by the existence of a complex but highly predictive CNN alone. And even more importantly, if we could extract conceptual insights into the computational mechanisms underlying CNN responses, would these mechanisms match those used in the biological retina? Or is the deep CNN only accurate at the level of modelling the input-output map of the retina, while being fundamentally inaccurate at the level of underlying mechanisms? Adjudicating between these two possibilities is essential for validating whether the deep learning approach to modelling in sensory neuroscience can indeed succeed in elucidating biological neural mechanisms, which has traditionally been the gold-standard of circuit based understanding in systems neuroscience [11, 22, 23, 24].

In the following we will show how a combination of dimensionality reduction and hidden neuron or stimulus attribution can yield simplified subnetwork models of the deep CNNs response to stimuli, finding models that are consistent with prior mechanistic models with experimental support in the case of latency encoding, motion reversal, and motion anticipation. In addition, our analysis yields a new model that cures the inadequacies of previous models of the OSR. Thus our overall approach provides a new roadmap to extract mechanistic insights into deep CNN function, confirms in the case of the retina that deep CNNs do indeed learn computational mechanisms that are similar to those used in biological circuits, and yields a new experimentally testable hypothesis about retinal computation. Moreover, our results in the retina yield hope (to be tested in future combined theory and experiments) that more complex deep CNN models of higher visual cortical regions, may not only yield accurate black box models of input-output transformations, but may also yield veridical and testable hypotheses about intermediate computational mechanisms underlying these transformations, thereby potentially placing deep CNN models of sensory brain regions on firmer epistemological foundations.

2 From deep CNNs to neural mechanisms through model reduction

To extract understandable reduced models from the millions of parameters comprising the deep CNN in Fig. 1A and [2], we first reduce dimensionality by exploiting spatial invariance present in the artificial stimuli carefully designed to specifically probe retinal physiology (Fig.1B-E), and then carve out important sub-circuits using modern attribution methods [9, 10]. We proceed in 3 steps:

Step (1): Quantify the importance of a model unit with integrated gradients. The nonlinear input-output map of our deep CNN can be expressed as , where denotes the nonnegative firing rate of a ganglion cell at time bin and denotes the recent spatiotemporal history of the visual stimulus spanning two dimensions of space (with 50 spatial bins in each dimension) as well as 40 preceding time bins parameterized by

. Thus a single component of the vector

is given by , which denotes the stimulus contrast at position at time . We assume a zero contrast stimulus yields no response (i.e. ). We can decompose, or attribute the response to each preceding spacetime point by considering a straight path in spatiotemporal stimulus space from the zero stimulus to given by where the path parameter ranges from to [9]. Using the line integral , we obtain


This equation represents an exact decomposition of the response into attributions from each preceding spacetime stimulus pixel . Intuitively, the magnitude of tells us how important each pixel is in generating the response, and the sign tells us whether or not turning on each pixel from to yields a net positive or negative contribution to . When is linear, this decomposition reduces to a Taylor expansion of about . However, in the nonlinear case, this decomposition has the advantage that it is exact, while the linear Taylor expansion is only approximate. The coefficient vector of this Taylor expansion is often thought of as the linear spacetime receptive field (RF) of the model ganglion cell, a concept that dominates sensory neuroscience. Thus choosing to employ this attribution method enables us to go beyond the dominant but imperfect notion of an RF, in order to understand nonlinear neural responses to arbitrary spatiotemporal stimuli. In supplementary material, we discuss how this theoretical framework of attribution to input space can be temporally extended to answer different questions about how deep networks process spatiotemporal inputs.

However, since our main focus here is model reduction, we consider instead attributing the ganglion cell response back to the first layer of hidden units, to quantify their importance. We denote by the pre-nonlinearity activation of the layer hidden units, where and are the convolutional filters and biases of a unit in channel () at convolutional position (with ). Now computing the line integral over the same stimulus path employed in (1) yields


This represents an exact decomposition of the response into attributions from each subunit at the same time (since all CNN filters beyond the first layer are purely spatial). This attribution further splits into a product of , reflecting the activity of that subunit originating from spatiotemporal filtering of the preceding stimulus history, and an effective stimulus dependent weight from each subunit to the ganglion cell, reflecting how variations in subunit activity as the stimulus is turned on from to yield a net impact on the response . A positive (negative) effective weight indicates that increasing subunit activity along the stimulus path yields a net excitatory (inhibitory) effect on .

Step (2): Exploiting stimulus invariances to reduce dimensionality. The attribution of the response to first layer subunits in (2) still involves attributions. We can, however, leverage the spatial uniformity of artificial stimuli used in neurophysiology experiments to reduce this dimensionality. For example, in the OSR and latency coding, stimuli are spatially uniform, implying is independent of spatial indices . Thus, we can reduce the number of attributions to the number of channels via


For the moving bar in both motion reversal and motion anticipation, is independent of the index and we can reduce the dimensionality from 10,368 down to 288 by


More generally for other stimuli with no obvious spatial invariances, one could still attempt to reduce dimensionality by performing PCA or other dimensionality reduction methods on the space of hidden unit pre-activations or attributions over time. We leave this intriguing direction for future work.

Step (3): Building reduced models from important subunits. Finally, we can construct minimal circuit models by first identifying “important” units defined as those with large magnitude attributions . We then construct our reduced model as a one hidden layer neural network composed of only the important hidden units, with effective connectivity from each hidden unit to the ganglion cell determined by the effective weights in (2), (3), or (4).

3 Results: the computational structure of retinal prediction

We now apply the systematic model reduction steps described in the previous section to each of the retinal stimuli in Fig. 1B-E. We show that in each case the reduced model yields scientific hypotheses to explain the response, often consistent with prior experimental and theoretical work, thereby validating deep CNNs as a method for veridical scientific hypothesis generation in this setting. Moreover, our approach yields integrative conceptual insights into how these diverse computations can all be simultaneously produced by the same set of hidden units.

3.1 Omitted stimulus response

As shown in Fig. 1B, periodic stimulus flashes trigger delayed periodic retinal responses. However, when this periodicity is violated by omitting a flash, the ganglion cell signals the violation with a large burst of firing [25, 26]. This OSR phenomenon is observed across several species including salamander [12, 13]. Interestingly, for periodic flashes in the range of 6-12Hz, the latency between the last flash before the omitted one, and the burst peak in the response, is proportional to the period of the train of flashes [12, 13], indicating the retina retains a short memory trace of this period. Moreover, pharmacological experiments suggest ON bipolar cells are required to produce the OSR [13, 17], which have been shown to correspond to the first layer hidden units in the deep CNN [1, 2].

These phenomena raise two fundamental questions: what computational mechanism causes the large amplitude burst, and how is the timing of the peak sensitive to the period of the flashes? There are two theoretical models in the literature that aim to answer these questions. One proposes that the bipolar cell activity responds to each individual flash with an oscillatory response whose period adapts to the period of the flash train [18]. However, recent direct recordings of bipolar cells suggest that such period adaptation is not present [27]. The other model claims that having dual pathways of ON and OFF bipolar cells are enough to reproduce most of the aspects of the phenomena observed in experiments [17]. However, the model only reproduces the shift of the onset of the burst, and not a shift in the peak of the burst, which has the critical predictive latency [18].

Direct model reduction (Fig. 2) of the deep CNN in Fig. 1A using the methods of section 2 yields a more sophisticated model than any prior model, comprised of three important pathways that combine one OFF temporal filter with two ON temporal filters. Unlike prior models, the reduced model exhibits a shift in the peak of the OSR burst as a function of the frequency of input flashes.

Figure 2: Omitted stimulus response. (A-1,2) Schematics of the model reduction procedure by only leaving three (1 OFF, 2 ON) highly contributing units. (B) Attribution for each of the cell types over time. (C) Effective stimulus dependent weight for each of the cell types over time. (D) The combination of the two pathways of filter 2 and 6 reproduces the period dependent latency. (E) Two ON bipolar cells are necessary to capture the predictive latency. Cell 2 with earlier peak is only active in a high-frequency regime, while the cell 6 with later peak is active independent of the frequency.

Fig. 2A presents a schematics of the model reduction steps described in (3). We first attribute a ganglion cell’s response to individual channels and then average across both spatial dimensions (Fig. 2A-1) as in steps (1) and (2). Then we build a reduced model from the identified important subunits that capture essential features of the omitted stimulus response phenomenon (Fig. 2A-2). In Fig. 2B, we present the time dependence of the attribution in (3) for the eight channels, or cell-types. Red (blue) traces reflect positive (negative) attributions. Channel temporal filters are to the left of each attribution row and the total output response is on the top row. The stimulus consists of three flashes, yielding three small responses, and then one large response after the end of the three flashes (grey line). Quantitatively, we identify that cell-type 3 dominantly explains the small responses to preceding flashes, while cell types 2 and 6 are necessary to explain the large burst after the flash train ends. The final set of units included in the reduced model should be the minimal set required to capture the defining features of the phenomena of interest. In the case of omitted stimulus response, the defining feature is the existence of the large amplitude burst whose peak location is sensitive to the period of the applied flashes. Once we identify the set of essential temporal filters, we then proceed to determine the sign and magnitude of contribution (excitatory or inhibitory) of the cell types. In Fig. 2C, we present the time-dependent effective weights from in (3) for the eight cell types, or channels. Red (blue) reflects positive (negative) weights. Given the product of the temporal filters and the weights, cell-types 2 and 6 are effectively ON cells, which cause positive ganglion cell responses to contrast increments, while cell-type 3 is an OFF cell, which is a cell type that causes positive responses to contrast decrements. Following the prescribed procedures, carving out the important cell-types and effective weights yields a novel, mechanistic three pathway model of the OSR, with 1 OFF and 2 ON pathways. Unlike prior models, the reduced model exhibits a shift in the peak of the OSR burst as a function of the frequency of input flashes (with dark to light blue indicating high to low frequency variation in the flash train) as in Fig. 2D. Furthermore, the reduced model is consistent across the frequency range that produces the phenomena. Finally, model reduction yields conceptual insights into how cell-types and enable the timing of the burst peak to remember the period of the flash train (Fig. 2E). The top row depicts the decomposition of the overall burst response (grey) into time dependent attributions (red) and (blue), obeying the relation . Cell-type 2, which has an earlier peak in its temporal filter, preferentially causes ganglion cell responses in high-frequency flash trains (left) compared to low frequency trains (right), while cell-type 6 is equally important in both. The middle row shows the temporal filter , which has an earlier peak with a long tail, enabling it to integrate across not only the last flash, but also preceding flashes (yellow bars). Time increases into the past from left to right. Thus, the activation of this cell type 2 decreases as the flash train frequency decreases, explaining the decrease in attribution in the top row. The bottom row shows that the temporal filter of cell type 6, in contrast, has a later peak with a rapidly decaying tail. Thus the temporal convolution of this filter with the flash train is sensitive only to the last flash, and is therefore independent of flash train frequency. The late peak and rapid tail explain why it supports the response at late times independent of frequency in the top row.

Thus, our systematic model reduction approach yields a new model of the OSR that cures important inadequacies of prior models. Moreover, it yields a new, experimentally testable scientific hypothesis that the OSR is an emergent property of three bipolar cell pathways with specific and diverse temporal filtering properties.

3.2 Latency coding

Figure 3: Latency coding (A) The decomposition of the overall response (grey) into dominant attributions (blue) from an OFF pathway, and (red) from an ON pathway, obeying the relation . Under a contrast decrement, the OFF pathway activated first, followed by delayed inhibitory input from the ON pathway. (B) As the amount of contrast decrement increases (yellow bars), delayed inhibition from the ON pathway (red) strengthens, which cuts off the total response in at late times more strongly, thereby shifting the location of the peak of to earlier times.

Rapid changes in contrast (which often occur for example right after saccades) elicit a burst of firing in retinal ganglion cells with a latency that is shorter for larger contrast changes [14] (Fig. 1C). Moreover, pharmacological studies demonstrate that both ON and OFF bipolar cells (corresponding to first layer hidden neurons in the deep CNN [1, 2]) are necessary to produce this phenomenon [19].

Model reduction via (3) in section 2 reveals that a single pair of slow ON and fast OFF pathways can explain the shift in the latency Fig. 3. First, under a contrast decrement, there is a strong, fast excitatory contribution from the OFF pathway. Second, as the magnitude of the contrast decrement increases, delayed inhibition from the slow ON pathway becomes stronger. This negative delayed contribution truncates excitation from the OFF pathway at late times, thereby causing a shift in the location of the total peak response to earlier times (Fig. 3). The dual pathway mechanism formed by slow ON and fast OFF bipolar cells is consistent with all existing experimental facts. Moreover, it has been previously proposed as a theory of latency coding [14, 19]. Thus this example illustrates the power of a general natural scene based deep CNN training approach, followed by model reduction, to automatically generate veridical scientific hypotheses that were previously discovered only through specialized experiments and analyses requiring significant effort [14, 19].

3.3 Motion reversal

Figure 4: Motion reversal of a moving bar. (A) Schematics of spatiotemporal model reduction obtained via (4). By averaging over we obtain cell types at different positions yielding units. Attribution values reveal that only cell types and play a dominant role in motion reversal. (B) The properties of cell-type explains the existence of the burst. On the left are time-series of pre-nonlinearity activations of hidden units whose RF center is at spatial position . Time indicates the time of motion reversal. The boxed region indicates the spatial and temporal extent of the retinal burst in response to motion reversal. The offset of the box from time indicates the fixed latency. A fixed linear combination with constant coefficents of this activation cannot explain the existence of the burst due to cancellations along the vertical -axis in the boxed region. However, due to downstream nonlinearities, the effective weight coefficients from subunits to ganglion cell responses rapidly flip in sign (middle), and generating a burst of motion reversal response (right). (C) Schematics of the reduced model keeping only important subunits. (D) Attribution contributions from the two dominant cell types (in pink) and (in blue), where . With only cell-type , the further the reversal location is from a ganglion cell’s RF center, the longer we would expect it to take to generate a reversal response. However, the inhibition coming from cell type 2 increases the further away the reversal occurs, truncating the late response and thus fixing the latency of the motion reversal response.

As shown in Fig. 1D and [15], when a moving bar suddenly reverses its direction of motion, ganglion cells near the reversal location exhibit a sharp burst of firing. While a ganglion cell classically responds as the bar moves through its receptive field (RF) center from left to right before the motion reversal, the sharp burst after the motion reversal does not necessarily coincide with the spatial re-entry of the bar into the center of the RF as it moves back from right to left. Instead, the motion reversal burst response occurs at a fixed temporal latency relative to the time of motion reversal, for a variety of reversal locations within 110 m of the RF center. These observations raise two fundamental questions: why does the burst even occur and why does it occur at a fixed latency?

The classical linear-nonlinear model cannot reproduce the reversal response; it only correctly reproduces the initial peak associated with the initial entry of a bar into the RF center [15]. Thus a nonlinear mechanism is required. Model reduction of the deep CNN obtained via (4) reveals that two input channels arrayed across 1D space can explain this response through a specific nonlinear mechanism (Fig. 4). Moreover, the second important channel revealed by model reduction yields a cross cell-type inhibition that explains the fixed latency (Fig. 4D). Intriguingly, this reduced model is qualitatively consistent with a recently proposed and experimentally motivated model [20] that points out the crucial role of dual pathways of ON and OFF bipolar cells.

Figure 5: Motion anticipation of a moving bar. Contributions from the two dominant cell types. in pink, in blue, in grey, where . Depending on the direction of motion of a bar, activity that lags behind the leading edge gets asymmetrically truncated by the inhibition from the cell type 2 (pink). (A) The bar is moving to the right and the inhibition (pink) is slightly stronger on the left side. (B) the bar is moving to the left and the inhibition (pink) is stronger on the right side.

3.4 Motion anticipation

As shown in Fig. 1E and [16] the retina already starts to compensate for propagation delays by advancing the retinal image of a moving bar along the direction of motion, so that the retinal image does not lag behind the instantaneous location as one might naively expect.

Model reduction of our deep CNN reveals a mechanism for this predictive tracking. First, since ganglion cell RFs have some spatial extent, a moving bar naturally triggers some ganglion cells before entering their RF center, yielding a leading edge of a retinal wave. What is then required for motion anticipation is some additional motion direction sensitive inhibition that cuts off the lagging edge of the wave so its peak activity shifts towards the leading edge. Indeed, model reduction reveals a computational mechanism in which one cell type feeds an excitatory signal to a ganglion cell while the other provides direction sensitive inhibition that truncates the lagging edge. This model is qualitatively consistent with prior theoretical models that employ such direction selective inhibition to anticipate motion [16].

4 Discussion


Figure 6: A unified framework to reveal computational structure in the brain. We outlined an automated procedure to go from large-scale neural recordings to mechanistic insights and scientific hypotheses through deep learning and model reduction. We validate our approach on the retina, demonstrating how only three cell-types with different ON/OFF and fast/slow spatiotemporal filtering properties can nonlinearly interact to simultaneously generate diverse retinal responses.

In summary, in the case of the retina, we have shown that complex CNN models obtained via machine learning can not only mimic sensory responses to rich natural scene stimuli, but also can serve as a powerful and automatic mechanism for generating valid scientific hypotheses about computational mechanisms in the brain, when combined with our proposed model reduction methods (Fig. 6). Applying this approach to the retina yields conceptual insights into how a single model consisting of multiple nonlinear pathways with diverse spatiotemporal filtering properties can explain decades of painstaking physiological studies of the retina. This suggests in some sense an inverse roadmap for experimental design in sensory neuroscience. Rather than carefully designing special artificial stimuli to probe specific sensory neural responses, and generating individual models tailored to each stimulus, one could instead fit a complex neural network model to neural responses to a rich set of ethologically relevant natural stimuli, and then apply model reduction methods to understand how different parts of a single model can simultaneously account for responses to artificial stimuli across many experiments. The interpretable mechanisms extracted from model reduction then constitute specific hypotheses that can be tested in future experiments. Moreover, the complex model itself can be used to design new stimuli, for example by searching for stimuli that yield divergent responses in the complex model, versus a simpler model of the same sensory region. Such stimulus searches could potentially elucidate functional reasons for the existence of model complexity.

In future studies, it will be interesting to conduct a systematic exploration of universality and individuality [28] in the outcome of model reduction procedures applied to deep learning models which recapitulate desired phenomena, but are obtained from different initializations, architectures, and experimental recordings. An intriguing hypothesis is that the reduced models required to explain specific neurobiological phenomena arise as universal computational invariants across the ensemble of deep learning models parameterized by these various design choices, while many other aspects of such deep learning models may individually vary across these choices, reflecting mere accidents of history in initialization, architecture and training.

It would also be extremely interesting to stack this model reduction procedure to obtain multilayer reduced models that extract computational mechanisms and conceptual insights into deeper CNN models of higher cortical regions. The validation of such extracted computational mechanisms would require further experimental probes of higher responses with carefully chosen stimuli, perhaps even stimuli chosen to maximize responses in the deep CNN model itself [29, 30]. Overall the success of this combined deep learning and model reduction approach to scientific inquiry in the retina, which was itself not at all a priori obvious before this work, sets a foundation for future studies to explore this combined approach deeper in the brain.


We thank Daniel Fisher for insightful discussions and support. We thank the Masason foundation (HT), grants from the NEI (R01EY022933, R01EY025087, P30-EY026877) (SAB), and the Simons, James S. McDonnell foundations, and NSF Career 1845166 (SG) for funding.


  • McIntosh et al. [2016] Lane McIntosh, Niru Maheswaranathan, Aran Nayebi, Surya Ganguli, and Stephen Baccus. Deep learning models of the retinal response to natural scenes. Advances in neural information processing systems, pages 1369–1377, 2016.
  • Maheswaranathan et al. [2018] Niru Maheswaranathan, Lane T McIntosh, David B Kastner, Josh Melander, Luke Brezovec, Aran Nayebi, Julia Wang, Surya Ganguli, and Stephen A Baccus. Deep learning models reveal internal structure and diverse computations in the retina under natural scenes. bioRxiv, page 340943, 2018.
  • Yamins et al. [2014] Daniel L. K. Yamins, Ha Hong, Charles F. Cadieu, Ethan A. Solomon, Darren Seibert, and James J. DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014. doi: 10.1073/pnas.1403112111.
  • Khaligh-Razavi and Kriegeskorte [2014] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. Deep supervised, but not unsupervised, models may explain it cortical representation. PLoS computational biology, 10(11):e1003915, 2014.
  • Güçlü and van Gerven [2015] Umut Güçlü and Marcel AJ van Gerven. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. The Journal of Neuroscience, 35(27):10005–10014, 2015.
  • Cadena et al. [2017] Santiago A Cadena, George H Denfield, Edgar Y Walker, Leon A Gatys, Andreas S Tolias, Matthias Bethge, and Alexander S Ecker. Deep convolutional models improve predictions of macaque v1 responses to natural images. bioRxiv, page 201764, 2017.
  • Barrett et al. [2019] David GT Barrett, Ari S Morcos, and Jakob H Macke. Analyzing biological and artificial neural networks: challenges with opportunities for synergy? Current opinion in neurobiology, 55:55–64, 2019.
  • Glaser et al. [2019] Joshua I Glaser, Ari S Benjamin, Roozbeh Farhoodi, and Konrad P Kording. The roles of supervised machine learning in systems neuroscience. Progress in neurobiology, 2019.
  • Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3319–3328, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR. URL
  • Dhamdhere et al. [2019] Kedar Dhamdhere, Mukund Sundararajan, and Qiqi Yan. How important is a neuron. In International Conference on Learning Representations, 2019. URL
  • Gollisch and Meister [2010] Tim Gollisch and Markus Meister. Eye smarter than scientists believed: neural computations in circuits of the retina. Neuron, 65(2):150–164, 2010.
  • Schwartz et al. [2007a] Greg Schwartz, Rob Harris, David Shrom, and Michael J Berry II. Detection and prediction of periodic patterns by the retina. Nature neuroscience, 10(5):552, 2007a.
  • Schwartz and Berry 2nd [2008] Greg Schwartz and Michael J Berry 2nd.

    Sophisticated temporal pattern recognition in retinal ganglion cells.

    Journal of neurophysiology, 99(4):1787–1798, 2008.
  • Gollisch and Meister [2008a] Tim Gollisch and Markus Meister. Rapid neural coding in the retina with relative spike latencies. science, 319(5866):1108–1111, 2008a.
  • Schwartz et al. [2007b] Greg Schwartz, Sam Taylor, Clark Fisher, Rob Harris, and Michael J Berry II. Synchronized firing among retinal ganglion cells signals motion reversal. Neuron, 55(6):958–969, 2007b.
  • Berry II et al. [1999] Michael J Berry II, Iman H Brivanlou, Thomas A Jordan, and Markus Meister. Anticipation of moving stimuli by the retina. Nature, 398(6725):334, 1999.
  • Werner et al. [2008] Birgit Werner, Paul B Cook, and Christopher L Passaglia. Complex temporal response patterns with a simple retinal circuit. Journal of neurophysiology, 100(2):1087–1097, 2008.
  • Gao et al. [2009] Juan Gao, Greg Schwartz, Michael J Berry, and Philip Holmes. An oscillatory circuit underlying the detection of disruptions in temporally-periodic patterns. Network: Computation in Neural Systems, 20(2):106–135, 2009.
  • Gollisch and Meister [2008b] Tim Gollisch and Markus Meister. Modeling convergent on and off pathways in the early visual system. Biological cybernetics, 99(4-5):263–278, 2008b.
  • Chen et al. [2014] Eric Y Chen, Janice Chou, Jeongsook Park, Greg Schwartz, and Michael J Berry. The neural circuit mechanisms underlying the retinal response to motion reversal. Journal of Neuroscience, 34(47):15557–15575, 2014.
  • Chen et al. [2013] Eric Y Chen, Olivier Marre, Clark Fisher, Greg Schwartz, Joshua Levy, Rava Azeredo da Silveira, and Michael J Berry. Alert response to motion onset in the retina. Journal of Neuroscience, 33(1):120–132, 2013.
  • Vaney et al. [2012] David I Vaney, Benjamin Sivyer, and W Rowland Taylor. Direction selectivity in the retina: symmetry and asymmetry in structure and function. Nature Reviews Neuroscience, 13(3):194, 2012.
  • Konishi [2003] Masakazu Konishi. Coding of auditory space. Annual review of neuroscience, 26(1):31–55, 2003.
  • Marder and Bucher [2007] Eve Marder and Dirk Bucher. Understanding circuit dynamics using the stomatogastric nervous system of lobsters and crabs. Annu. Rev. Physiol., 69:291–316, 2007.
  • Bullock et al. [1990] Theodore H Bullock, Michael H Hofmann, Frederick K Nahm, John G New, and James C Prechtl. Event-related potentials in the retina and optic tectum of fish. Journal of Neurophysiology, 64(3):903–914, 1990.
  • Bullock et al. [1994] Theodore H Bullock, Sacit Karamürsel, Jerzy Z Achimowicz, Michael C McClune, and Canan Başar-Eroglu. Dynamic properties of human visual evoked and omitted stimulus potentials. Electroencephalography and clinical neurophysiology, 91(1):42–53, 1994.
  • Deshmukh [2015] Nikhil Rajiv Deshmukh. Complex computation in the retina. PhD thesis, Princeton University, 2015.
  • Maheswaranathan et al. [2019] Niru Maheswaranathan, Alex H Williams, Matthew D Golub, Surya Ganguli, and David Sussillo. Universality and individuality in neural dynamics across large populations of recurrent networks. Advances in neural information processing systems, 2019.
  • Bashivan et al. [2019] Pouya Bashivan, Kohitij Kar, and James J DiCarlo. Neural population control via deep image synthesis. Science, 364(6439):eaav9436, 2019.
  • Walker et al. [2018] Edgar Y Walker, Fabian H Sinz, Emmanouil Froudarakis, Paul G Fahey, Taliah Muhammad, Alexander S Ecker, Erick Cobos, Jacob Reimer, Xaq Pitkow, and Andreas S Tolias. Inception in visual cortex: in vivo-silico loops reveal most exciting images. bioRxiv, page 506956, 2018.
  • Aumann and Shapley [1974] Robert J Aumann and Lloyd S Shapley. Values of non-atomic games. Princeton University Press, 1974.

5 Path-integrated gradients for nonlinear dynamical systems

A specific firing burst of a ganglion cell (e.g. omitted stimulus response, motion reversal etc.) is a result of spatiotemporal visual stimuli processed through a non-linearly entangled web of neurons, . Here we aim to identify and isolate the minimal neural circuit at play, in order to obtain theoretical and biophysical insights responsible for an array of retinal phenomena. Towards this goal, we need to attribute the contributions to each of the input pixels or interneurons. For a linear system, the product of the response function and an input stimulus quantifies how much an input pixel changes the output. However, it is not obvious how to introduce such measure local in space for non-linear systems, thus raises questions: how can we, in non-linear systems

, quantify the contributions of each of the (1) spatiotemporal input pixels, (2) internal neurons, and (3) different cell types (filters)? Here we harness attribution methods based on path-integrated gradients, first introduced in game theory

[31], and then recently in machine learning [9]. We then extend the theoretical framework to a movie input with an additional dimension of time.

Retina as a non-linear functional
Mathematically, a mapping from a space of spatiotemporal stimulus to a scalar response of a ganglion cell at time can be represented by a functional,


Here we assumed (i) causality and (ii) finite temporal memory of a ganglion cell with a cutoff at . Thus, the input is a subset of the entire spatiotemporal stimulus in a finite region and parametrized by .

Expansion of a general functional v.s. linear response theory
The Taylor expansion of the above functional around is,


In comparison, the linear response theory takes the form of,


where is the linear response function (e.g. receptive field around zero stimuli).
Two main approximations are made in the linear response theory; (i) no stimulus dependence of , (ii) truncation of the Taylor expansion at the first order. These two assumptions are valid only when holds everywhere. However, this condition hardly holds in visual systems in action, and the above conventional formulation is due to theoretical and experimental limitations rather than a well-justified approximation as in Figure 7A for schematics.

Figure 7: How important is an input pixel or a cell type (filter)? (A) [Linear theory: red] Conventional linear theory, such as classical receptive field, extrapolates linear trend to all over the input stimulus space. Deep learning model trained on natural scenes is able to compute local gradient, “instantaneous receptive field”, given an input of structured stimuli with high efficiency. However, the instantaneous receptive field only explains the local property of the non-linear function that does not necessalily reflect the global structure. [IntegratedGradients in 1D: blue] By integrating the local gradients captured by a deep learning model over input axis, we can obtain a more acculate description of the non-linear function. (B) The path-integrated gradients becomes dependents on the path in higher dimensions (), but straight path is shown to satisfy desired axioms [9]. Schematic illustrates the case of two-dimensions.

While both of the biological retina and the deep learning model gives upon a presentation of a stimulus, the stimulus-dependent instantaneous receptive field


is easily computable in the deep learning model. Below, we harness the quantity to investigate where a ganglion cell is looking at during a burst of neural firings.

Review: Path integrated gradients for an image input
Here we first review a case of an image input , where represents a two dimensional image. For a general path parameterized by ,


Sundararajan et al. [9] proved that a straight path satisfies an array of desirable axioms: Sensitivity, Insensitivity, Linearity, Preservation, Implementation invariance, Completeness, and Symmetry. In that case, the above further simplifies to,


See Figure 7 for schematics.

Space-time attribution by path integrated gradients for a movie input
Here we try to extend the theoretical framework of the above IntegratedGradients to the case of a movie input. In addition to the spatial components , a movie input has an additional dimension of time .

In this case, the attribution score can be computed by taking a path integral parametrized by ,


Thus, attribution score can be defined as


For a linear system, where is independent of , we can simply replace it by a linear response function , and the attribution score reduces to a product of the linear response function and the difference between the current stimulus and the baseline,


However, for a non-linear system, depends on and the attribution score depends on the space of function that we integrate over. Namely, each “path” corresponds to a different attribution method and provides different explanations to different questions. In particular, for a movie input, the history of stimuli forms a path naturally parametrized by time “”.

IntegratedGradients: Effective linear response function for non-linear systems
Here we focus on the straight path,


In this case, the integral of the gradients act as an effective linear response function for a nonlinear system as below,


By computing the attribution over the full temporal region, we can obtain the full attribution . Note that the function now depends both on and due to nonlinearity. Furthermore, we can reformulate the above equation into two forms to answer specific questions raised below.

Figure 8: Straight path integration from the null stimulus We computed IntegratedGradients for a periodic flashes with omitted stimulus by taking a straight path integration of gradients from the null stimulus to an actual stimulus . (A) (Left) Attribution map on response time v.s input time space. Causality limits the attribution in the region of . (Right) Attribution on input stimuli quantifies the total amount of output created by each of the pixels of an input movie. (B) (Left) Attribution map on response time v.s. integration time space. (Right) Attribution on integration time integrated over a single periodic burst triggered by flashes (blue) and omitted stimulus response (red). The peak of attribution for the flash response (blue) is placed earlier than the peak of attribution for the omitted stimulus response (orange). This is consistent with our finding that the responses for periodic flashes are caused by a fast OFF bipolar cell, while the omitted stimulus response is triggered by slow ON bipolar cells. (See main text.)
Figure 9: Straight path integration between two stimuli We computed IntegratedGradients to discriminate between two stimuli continuing periodic flashes , and periodic flashes with omitted stimulus . (A) (Left) Attribution map on response time v.s input time space. Causality limits the attribution in the region of . (Right) Attribution on input stimuli quantifies the total amount of output created by each of the pixels of ano input movie. Strikingly, the attribution concentrates at the location of the first omitted stimulus among the four omitted stimulus. (B) (Left) Attribution map on response time v.s. integration time space. (Right) Attribution on integration time integrated over a single periodic burst triggered by flashes (blue) and omitted stimulus response (orange).

Q1. Which pixel in the spatiotemporal input stimulus “” is responsible for a ganglion cell’s response?


See Figure 8 for an example with OSR.

Q2. Which part of the spatiotemporal kernel is “” responsible for a ganglion cell’s response?


See Figure 9 for an example with OSR.

6 Extension to deeper CNNs without spatial invariances in stimuli

In the main text, we demonstrated how we can reduce CNNs with three layers to multi-pathways linear non-linear models, specifically to validate deep CNNs in the retina where we had neurobiological ground truth. Here, we discuss one possible way to extend our method to deeper CNNs of depth processing natural movies through a dynamic programming (DP) approach that works backwards from layer to layer . First, note a natural movie of limited duration without spatial invariances is still well approximated by a low dimensional trajectory in both pixel space and every hidden layer. Let be the max dimensionality for spatial input patterns for any channel in any layer. Then the basic idea is to attribute the response in layer to the dimensional space of inputs to each channel in layer using integrated gradients. We first find the important channels in layer using methods presented in the main text. Then we recursively iterate via the same method to layer and so on down to the pixel layer. Because of the DP-like nature of our algorithm, the computational complexity (after dimensionality reduction to ) is where is the max number of channels in a layer, and not exponential in . The end result is a set of important channels in each layer, along with, for each important channel linear combinations of neurons that matter for generating the response in layer .