1 Mechanistic Explanations in (Neuro)science
Due to recent progress in AI, deep neural networks (DNNs) can be trained to solve a variety of increasingly interesting tasks, including perceptual tasks that are natural for humans but historically difficult for artificial systems. A line of related work has proposed that DNNs can serve as quantitative cognitive models of behavioral patterns on such tasks, as well as computational neuroscience models of the brain systems underlying these human abilities. But are such neural networks really explanatory models, either of human and animal behavior, or neural activity in the brain? Can we map them to the brain at all? Can they be used not just as tools for quantitative data analysis and curve fitting, but also to provide substantive scientific insights?
Defining what makes a model of a system as complex as the brain “explanatory” has itself been a challenging conceptual problem, whether for neural network models or any other kinds of models. This problem is longstanding, but the recent quantitative successes of neural networks have raised the stakes. If we can articulate a set of criteria to assess the explanatory value of models in neuroscience more generally, we can then apply what we find to a variety of examples – most notably, the case of deep neural networks as models of the primate visual system.
Intuitively, an explanation should illuminate the dependencies between the phenomenon of interest and the factors involved in generating it. [Footnote 1: These dependencies and factors can be understood very broadly: everything from physical constituents to mathematical constraints, causal antecedents, or optimality considerations could in principle count as “factors upon which the phenomena of interest depend”.] And of course there is no single goal for modeling in science; diverse goals support a diversity of explanatory uses. But we focus here on what we take to be a dominant mode of explanation in neuroscience, which involves a kind of functional decomposition of a system in order to explain how it produces behavior or capacities of interest. Drawing on previous work in the philosophy of explanation, we start with the claim that to be “explanatory” for a target natural system, a model should:
be a mechanistic description of causal relationships in that target system, and
be intelligible in the sense of being cognitively manipulable.
Of course, it is somewhat controversial what it means for a model to describe a mechanism or be intelligible. So in attempting to apply these criteria in the complex context of the brain and modern computational methods, we will need to clarify our intended use of these terms, making clear the underlying theoretical and practical motivations. This first paper starts by dealing with the notion of mechanism; its successor deals with the issue of intelligibility. [Footnote 2: See https://arxiv.org/abs/2104.01489.]
In recent years, the vision of mechanistic explanation [1, 2, 3, 4] has been prominent in philosophy of science. This, together with the idea of interventionism [5, 6] as a means of gaining access to the causal structure of a system or phenomenon, is often presented as the goal of explanation in biology.
Broadly speaking, to construct a mechanistic explanation, we identify the relevant functional parts involved, articulate how those parts are organized, and show how their coordinated activities bring about the outcome or phenomenon of interest. In the context of models in systems neuroscience, Kaplan and Craver (2011) have articulated this mechanistic ideal as a Model-Mechanism-Mapping or “3M” constraint, to distinguish between those models that merely provide accurate descriptions or predictions of phenomena, and those that play genuine explanatory roles.
According to 3M, a model of a target phenomenon explains that phenomenon when “(a) the variables in the model correspond to identifiable components, activities, and organizational features of the target mechanism that produces, maintains, or underlies the phenomenon and (b) the (perhaps mathematical) dependencies posited among these (perhaps mathematical) variables in the model correspond to causal relations among the components of the target mechanism”.
This account of mechanistic model-building captures a central endeavor of experimental systems neuroscience in practice, which attempts to explain capacities of interest by linking physiology to behavior. However, it has been argued that computational models in neuroscience fit uneasily into the framework, if at all [8, 9]. We believe that if enough weight is given to the central role of abstraction in the practice of such modeling, a mapping criterion very much in the spirit of 3M can serve as a unified framework for assessing computational models, including neural network models, as mechanisms for neuroscience. [Footnote 3: Our discussion of models in perceptual neuroscience builds on the existing literature about the relationship between scientific abstraction and mechanistic modeling [4, 43, 39].]
In particular, we note the intimate relationship between measurement (and the choice of what to measure) in experimental neuroscience, on the one hand, and abstraction (and the choice of what details to ignore) in computational neuroscience, on the other. We also note the importance of being able to compare data across animals, and survey extant strategies for doing so, all of which, again, require some details to be discarded and others to be preserved. In both cases, we make explicit the key role of abstraction that is already implicitly assumed in practice throughout systems neuroscience.
Putting these two pieces together, we propose a version of the 3M requirement that highlights the role that abstraction must play for quantitatively assessing the functional similarity of a model to its explanatory target.
To converge on the right level of abstraction, we propose that mechanistic models should be “runnable”, in the sense of capturing enough of the causal structure of the phenomenon to be able to reproduce it. This helps to guarantee that in abstracting away from some details, we have nonetheless retained those features that are causally sufficient to generate the phenomenon of interest.
Then, to capture the necessity of making comparisons across individuals, as well as mapping models to individual targets, we introduce the idea of a similarity transform. Together, these “3M++” criteria (as we call them) are not only well-suited for application in assessing the value of the new generation of neural network models, but provide a unifying perspective from which to evaluate earlier (and now canonical) models from computational neuroscience as well.
Here is a roadmap for the rest of this paper: In Section 2 we discuss shared empirical foundations from neuroscience that underlie core abstractions that make the rest of thinking about neuroscience (and the ideas we talk about here) possible. In Section 3, we discuss the original 3M criteria and make explicit two additional roles for abstraction in “Predictively Adequate Runnable Abstraction” and “Transform Similarity”, both based in the foundational notions from neuroscience surveyed in Section 2. Together, these broadened notions constitute what we call 3M++. In Section 4, we develop a series of examples of 3M++, both positive and negative: models that illustrate either the presence or absence of mechanistic mappings. In Section 5, we return to our original motivating question: the case of DNNs. We conclude that in some cases, DNNs can indeed provide reasonably good abstract mechanistic models, under the 3M++ criteria. Finally, in Section 6, we draw some general lessons about finding the appropriate level(s) of abstraction for neuroscientific modeling.
2 Foundational Abstractions from Neuroscience
A caricature of two approaches to neuroscience might pit an industrious army of “more-details-better” experimentalists, painstakingly gathering as much data as possible, against the mounted horsemen of “do-more-with-less” computationalists, enchanted by the mathematical appeal of simple models for complex phenomena. But of course, this is a caricature, for (we claim) the two camps in fact share deep common foundations – at least when the phenomenon they are both ultimately interested in explaining is the same. [Footnote 4: Both camps would agree that different abstractions are likely needed to account for animal-level behavior, for example, than to account for the biophysics of neurons qua excitatory cells. The difference in outlook between the two camps might instead be captured as an expression of different kinds of humility: we don’t know what matters, but if the brain exhibits it, it is probably important; we don’t know what matters, but let’s see how far we can get with minimal assumptions and a few variables. And of course, given the tools available on either side, it may be that different aspects of the phenomena look like more promising candidates for the target of explanation.]
There are many cell types in the brain, and a plethora of biological activities. Every experimental neuroscientist is faced with choices about what to measure in the brain. What are the variables that are functionally important in producing the behaviors or capacities of interest? In practice, then, experimentalists are forced to make abstractions that capture the details that matter while ignoring those that don’t. It is this very activity that has led to a convergence on the central role of neurons (and neural activity) and their organization (including connectivity and functional localization).
Of course, computationalists are also forced to deal in abstractions, perhaps more obviously so. In order to check their models against data, they need to be able to make clear predictions, so that they can assess quantitatively how accurate the models are. While small models with relatively few components can be tested against data in a qualitative way, this becomes increasingly difficult as the number of components and interactions goes up. At some point, it becomes necessary to have a computational model that can be run on an external system rather than by hand – and that requires precise formal descriptions of the relevant variables and interactions.
But are the experimental and computational abstractions the same? We claim that they often are – and summarize this common ground below, juxtaposing the standard non-mathematical “experimentalist” descriptions with their formal mathematized “computationalist” counterparts. Of course, it should be made clear at the outset that all of these abstractions are in a strong sense provisional. That is, they embody hypotheses about what features of the target system are important for explaining the phenomena. If it turns out that they are inadequate, because they neglect important features so that we cannot recapture the phenomena of interest, or inaccurate, in a way that leads to poor predictions, then, like all hypotheses, they can and should be revised. The adequacy and accuracy of these hypotheses require empirical validation – we have to see the abstractions in action in a model that can be held responsible to empirical evidence, by making testable predictions about real brains.
That said, although particular abstractions can and will be revised, together, they define an overall conceptual framework that is stable to individual changes in how we understand the precise role of particular components (or even what those components are). This in turn implies that the general applicability of 3M++ for mapping such models to the brain is retained through changes to the particular abstractions employed. [Footnote 5: … up to a point. Of course, if we lose the idea that the brain is a network of cells whose spiking activities matter, we will also have lost most of the content of the framework.]
2.1 The Strong Neural Doctrine
The consensus in much of systems neuroscience is that neurons are what matter for behavior and in particular, the patterns of neural action potentials (or spikes) propagating through the brain, either intrinsic or evoked by external stimuli. [Footnote 6: At least for behavior on short time-scales of a few hundred milliseconds to seconds.]
In this picture, neurons are functional units that respond to the spike rates of units synapsing onto them by aggregating those synaptic inputs (perhaps in a history-dependent way) and producing the appropriate spiking output by applying some relatively simple operation.
The strong neural doctrine, first inferred by Cajal from neuroanatomical observations, is thus the idea that the brain is composed of a network of simple units. Now if all we care about are spikes and spike rates with respect to individual neurons, then all we care about are scalar-valued inputs and scalar-valued outputs (as opposed to vector-valued or multi-dimensional ones) to cells. The propagation of these spikes, in turn, depends on the idea that these cells are connected in a network. And the mathematical formalization of these ideas above describes the operation of a directed network of such units asynchronously applying their computation repeatedly, operating massively in parallel. This is essentially the concept of parallel distributed processing (PDP) as introduced in the early 1940s by McCulloch and Pitts, and which also first gave artificial neural networks their name.
More practically, what is often recorded and analyzed are spikes or their metabolic proxies, averaged in various ways: over varying time windows, across multiple cells, or over repeated trials. To the extent that a neuroscientist expects to explain their phenomenon of interest by appealing to such data, they are committed to the idea that this is an adequate level of abstraction at which to describe their mechanism of interest, and moreover, that characterizing such spiking activity is sufficient to explain the behavior of interest. And indeed, much of the animal behavior that has been studied in a laboratory setting is well-predicted by these spike-rate averages. [Footnote 7: The gap between those lab behaviors and complex behavior in the wild may still be significant.]
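To make the averaging concrete, the following is a minimal sketch of one common summary, a trial-averaged firing rate computed in fixed time bins. The function name, the bin width, and the spike times are hypothetical values chosen only for illustration; they are not data from any experiment discussed here.

```python
def trial_averaged_rate(trials, t_start, t_stop, bin_width):
    """Spikes per second in each time bin, averaged across repeated trials."""
    n_bins = int(round((t_stop - t_start) / bin_width))
    counts = [0] * n_bins
    for spike_times in trials:
        for t in spike_times:
            if t_start <= t < t_stop:
                # Clamp the index so rounding error cannot push a spike past the last bin.
                index = min(int((t - t_start) / bin_width), n_bins - 1)
                counts[index] += 1
    # Divide by (number of trials x bin duration) to convert counts to a rate.
    return [c / (len(trials) * bin_width) for c in counts]


# Hypothetical spike times (in seconds) from three repeats of the same stimulus.
trials = [[0.012, 0.055, 0.061], [0.058, 0.070], [0.052, 0.066, 0.091]]
rates = trial_averaged_rate(trials, t_start=0.0, t_stop=0.1, bin_width=0.05)
print(rates)  # one rate per 50 ms bin; the second bin is far more active
```

Averaging across cells instead of trials has the same form, with the outer loop running over neurons rather than repeats. In either case the summary discards individual spike times, which is exactly the abstraction at issue.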
2.2 The Canonical Neural Unit
The canonical description of a neuron is that of a functional unit that integrates its many synaptic inputs (excitatory or inhibitory, and of varying effective strengths), and then fires according to its biophysical properties whenever those inputs exceed some threshold.
A wide variety of experiments have suggested that, under normal physiological conditions, the neurons in the visual system can be described to some level of accuracy in a very simple form: that of the Linear-Nonlinear (LN) unit [11, 12]. The LN description is merely the formalization of experimental observations that purport to describe neural behavior at an accepted level of abstraction: as a function of their neural (spike-rate) input, the neural (spike-rate) output of a neuron is well-predicted as a linear combination of its inputs (with to-be-determined weights), followed by some simple non-linearity, which in a biological neuron might, for example, manifest as a spiking threshold (Fig. 1).
Mathematically, this is equivalent to saying that:
$$ r_i(s) = f\Big( \sum_{j \in N(i)} w_{ij}\, r_j(s) + b_i \Big), $$
where $r_i(s)$ represents the spike-rate output of a given neuron $i$ as a function of stimulus input $s$, $N(i)$ represents the set of all the neurons that synapse onto $i$, the numbers $w_{ij}$ and $b_i$ are constants, and $f$ is some fairly simple nonlinear function such as rectification.
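The LN form can be written out as a short sketch: a weighted sum of presynaptic spike rates plus a constant, passed through a simple nonlinearity. Rectification is assumed here as the nonlinearity, and the function names, weights, and rates are hypothetical, chosen only for illustration.

```python
def rectify(x):
    """Rectification: one simple choice for the nonlinearity."""
    return max(0.0, x)


def ln_unit(input_rates, weights, bias, nonlinearity=rectify):
    """Spike-rate output of one LN unit: nonlinearity(sum_j w_j * r_j + bias)."""
    if len(input_rates) != len(weights):
        raise ValueError("need exactly one weight per presynaptic input")
    drive = sum(w * r for w, r in zip(weights, input_rates)) + bias
    return nonlinearity(drive)


# Hypothetical example: two excitatory inputs and one inhibitory input.
rates = [10.0, 4.0, 6.0]      # presynaptic spike rates (arbitrary units)
weights = [0.5, 0.25, -1.0]   # positive = excitatory, negative = inhibitory
print(ln_unit(rates, weights, bias=1.0))    # net drive is 1.0, so output is 1.0
print(ln_unit(rates, weights, bias=-2.0))   # net drive is -2.0, rectified to 0.0
```

The second call shows the role of the nonlinearity: a net inhibitory drive produces no output rather than a negative rate, loosely analogous to a neuron staying below its spiking threshold.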
Let us re-emphasize that the LN formulation is a provisional abstraction: while it may be “reasonably” accurate, the LN form is definitely not a completely accurate description of real neurons. Nonetheless, three further points are worth re-emphasizing in the context of this particular example.
First, the LN description is a direct absorption by computational models of what brain scientists who are concerned with functional outputs have actually measured. Since the specific version of neural networks typically used as neuroscience models are based on this LN approximation, to the extent that the LN form is inaccurate, neural network models of the brain based upon them will be as well – but so too will be the many analyses in experimental neuroscience that presume that such a description is adequate for characterizing their data.
Second, if the inaccuracies of the LN approximation are sufficiently small or orthogonal to the capacity of interest, then models that posit a network of connected LN units could still be reasonably accurate for explaining that capacity of interest.
Finally, and perhaps most importantly, if systems neurophysiologists of vision were to robustly validate a better approximation for the functional form of single neurons in the visual system, that new description could be substituted into all the arguments about using 3M++ to map models (such as DNNs) to the brain as described below, without substantially disturbing the form of the argument.
2.3 Gross Functional Connectivity
Throughout this paper, we use a part of the primate visual system known as the ventral visual pathway as our primary instance of an explanatory target. In addition to being one of the best-characterized complex multi-area brain systems in neuroscience, it is one for which the kind of explanation we argue for is especially apt.
The ventral visual pathway solves a difficult ecological problem: that of rendering the “blooming, buzzing confusion” of incoming visual stimuli into internal states that subserve high-level behavioral goals such as scene understanding, navigation, and action planning. We want to explain how this is achieved, in light of the patterns of neural activity observed in the system that we take to be involved.
In textbooks, the system is often presented as encoding a stimulus in patterns of neural activity. That is, the retina transduces incoming stimulation into patterns of neural activity, which are then transformed as they propagate through successive later brain areas. [Footnote 8: The visual system in humans and non-human primates is highly homologous – well-identified, anatomically distinguishable areas are primary visual cortex (V1), intermediate visual areas V2, V3 and V4, and inferior temporal cortex (IT). (See Fig. 2a.) The cortical pathway beginning with V1 is connected to photoreceptor input in the eye via a series of subcortical pathways in the retina and lateral geniculate nucleus (LGN). We will treat these subcortical circuits as an integral part of the system we seek to explain, but note that the retina, LGN, and in fact the first cortical area V1, are not exclusive to the ventral pathway, but also subserve the dorsal pathway and other parts of the visual system.]
Early visual areas, such as V1 cortex, respond well to low-level stimulus features including edges and center-surround patterns [15, 16]. Mid-level visual areas such as V2, V3, V4 and posterior IT (pIT) are less well-characterized by simple intuitive ascriptions than higher or lower visual areas closer to the sensorimotor periphery. Nonetheless, neural activity in these intermediate areas appears to be at least in part describable as responses to features of an intermediate level of complexity between simple edges and complex objects, along a pipeline of increasing receptive field size [17, 18, 19, 20, 21, 22, 23, 24, 25].
The brain areas near the end of the ventral pathway can provide useful support for many different visual behaviors. That is, neural response patterns measured from IT cortex can be decoded (by human scientists, and presumably by downstream brain areas themselves) in the service of guiding a range of possible behaviors. Typically, IT is associated with robust object recognition [26, 27]. However, in addition to object category, attributes such as fine-grained within-category identification, object position, size, and pose, and complex lighting and material properties, can be easily decoded from neural activity in IT. [Footnote 9: By “easily decodable” we specifically mean “linearly decodable”. The rationale for this is that proximate downstream neurons are only able to perform close-to-linear transformations on their input, and so any relevant features on which downstream neurons can be expected to condition their own outputs and behavior must be approximately linearly decodable from their inputs. (Not everyone agrees.) Nonetheless, from the traditional neurophysiological perspective, “easily decodable” means that there is a clear correlation, relatively easy to extract without too much further processing, between features of interest and the response properties of the neurons. Of course, this usage of correlation essentially implies a linear readout model on the raw data, as exemplified by the definition of ordinary least squares regression.]
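The notion of linear decodability can be illustrated with a minimal sketch of an ordinary least squares readout: a weighted sum of neural responses (plus a constant) fitted to predict a stimulus attribute. The helper names, the two-neuron population, and the "object size" values are all hypothetical toy choices, and the fit solves the normal equations directly rather than calling any particular library.

```python
def solve(a, b):
    """Solve a x = b for a square matrix a by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear_decoder(responses, targets):
    """Ordinary least squares: weights (last entry is the bias) mapping responses to targets."""
    rows = [r + [1.0] for r in responses]              # append a constant input for the bias
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def decode(weights, response):
    """Linear readout of a single population response."""
    return sum(w * r for w, r in zip(weights, response + [1.0]))


# Hypothetical responses of two neurons to four stimuli, and the attribute to decode.
responses = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
object_size = [2.5, -0.5, 1.5, 3.5]   # toy values that happen to be exactly linear
weights = fit_linear_decoder(responses, object_size)
print(weights)                         # approximately [2.0, -1.0, 0.5]
print(decode(weights, [3.0, 2.0]))     # readout for a new response pattern
```

In practice the decoder would be fitted on one set of stimuli and evaluated on held-out stimuli, and the fit would rarely be exact; the toy data are noiseless only so that the recovered weights are easy to check by hand.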
While the processing stages in this cascade are simple, it is critical that they are at least somewhat nonlinear: the composition of linear operations is linear, so additional complexity can’t be built up by a sequence of purely linear operations, and (plausibly) there would be no evolutionary point to allocating multiple brain areas for them in the first place. Given the non-linearities, complex transformations then arise from multiple such stages applied in series. Since the original visual input is highly non-linear and tangled along the dimensions of the stimulus relevant for behavior, the untangling process by which the brain parses visual data is likely to be as well.
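The point about linear composition can be checked directly. In the sketch below, two purely linear stages collapse into a single matrix, while inserting rectification between the same two stages yields a response that no single linear stage built from them could produce. The weight matrices are hypothetical and were picked so that the collapse is easy to see.

```python
def matvec(matrix, vector):
    """Apply a linear stage (a matrix) to an input vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Matrix product a b, i.e. the single stage equivalent to applying b then a."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def rectify(vector):
    """Elementwise rectification."""
    return [max(0.0, v) for v in vector]


# Hypothetical weights: stage 1 computes two opponent signals, stage 2 sums them.
w1 = [[1.0, -1.0], [-1.0, 1.0]]
w2 = [[1.0, 1.0]]
x = [1.0, 0.0]

collapsed = matmul(w2, w1)                          # the two linear stages as one matrix
linear_out = matvec(w2, matvec(w1, x))              # two stages applied in series
nonlinear_out = matvec(w2, rectify(matvec(w1, x)))  # same stages with rectification between

print(collapsed)       # [[0.0, 0.0]]: the purely linear cascade ignores its input
print(linear_out)      # [0.0]
print(nonlinear_out)   # [1.0]: the rectified cascade responds
```

Here the purely linear cascade cancels to zero for every input, whereas the rectified cascade signals any difference between its two inputs, a computation that requires the intermediate nonlinearity.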
A successful model of the visual system then, should be able to reproduce its output, as well as accurately predict neural activities in these intermediate areas that are thought to subserve the production of that functionally relevant output.
2.4 Spatial Locality and Retinotopy
Within the hierarchy of the ventral pathway, the activity in each distinct cortical area is itself retinotopically organized, so that the arrangement of neurons on the cortical surface mirrors the arrangement of their receptive fields (i.e. the parts of visual space that they are responsive to). In this way, neurons responsive to nearby regions of space are also located close to each other on the cortical surface. Together, the receptive fields of the neurons in each brain region completely cover the visual field, and as we go up the hierarchy, receptive fields become larger (and thus less specific to a small region of visual space) as well as responsive to more complex visual properties in the environment. We also see that each part of visual space is treated more or less the same by the cells responsible for responding to it (at least within the foveal region) – and this makes sense, given that any part of the scene may be in any cell’s receptive field at a time, given eye and head movements. See Fig. 2b.
In other words, each area is separately organized as a spatially-distributed two-dimensional array, with activity in small-ish local regions within the array reflecting responses to inputs at a corresponding set of closely spatially-clustered locations in the original (two-dimensional) image stimulus. Each cortical area thus contains a spatial map of the whole visual field, composed of an array of these locally responding regions. The spatial resolution of this map decreases with each successive area within the visual pathway, from high resolution in V1 to low resolution in area IT. Moreover, the distribution of response patterns as a function of input is observed to be very similar at each different spatial location within each area’s spatial map. Thus, each cortical area can be thought of as tiling the overall visual field with a set of local response functions [30, 31, 23, 32]. [Footnote 10: We are only talking about foveal vision for the purposes of object recognition.]
In fact, the single most celebrated set of findings in visual neuroscience characterizes neurons in early cortical area V1 in essentially these terms. A line of experiments begun by Hubel and Wiesel [33, 30] and carried further by many others [12, 16, 34] showed that V1 neurons can at least in part be described as responding optimally to “edges” of varying frequencies and orientations.
Over the course of several decades, it became clear that Hubel and Wiesel’s array of ‘edge-detecting’ V1 neurons could be redescribed in a very mathematically compact way: as performing spatial convolution with the so-called “Gabor wavelet filterbank.” Filterbank convolution is, by definition, the operation of multiplying each small patch of an input with a set of fixed patterns (the “filters”), in a tiled fashion across an input array. This concept is a natural formalization of the spatially local, tiled, linear computation strongly suggested by the neurophysiological data. Models convolving input images with the Gabor filterbank, followed by simple rectification and normalization nonlinearities, have achieved striking success in characterizing V1 neural responses [12, 34, 35]. See Fig. 2c. While this characterization turns out to be quite a bit too simple given what we now know, it is nonetheless a good enough approximation to serve many useful predictive purposes.
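A toy version of filterbank convolution followed by rectification can be sketched as follows. The sketch assumes an odd-phase Gabor (a Gaussian envelope times an oriented sine grating), omits the normalization stage, and uses a hypothetical 9x9 step-edge image and two-filter bank; none of the parameter values are fitted to neural data.

```python
import math


def gabor(size, wavelength, theta, sigma):
    """Odd-phase Gabor patch (size must be odd): Gaussian envelope times an oriented sinusoid."""
    half = size // 2
    patch = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            along = x * math.cos(theta) + y * math.sin(theta)   # position along the grating
            envelope = math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            row.append(envelope * math.sin(2 * math.pi * along / wavelength))
        patch.append(row)
    return patch


def filter_response(image, patch):
    """Tile the patch across the image (a sliding dot product), then rectify."""
    k = len(patch)
    out = []
    for i in range(len(image) - k + 1):
        out_row = []
        for j in range(len(image[0]) - k + 1):
            drive = sum(patch[a][b] * image[i + a][j + b]
                        for a in range(k) for b in range(k))
            out_row.append(max(0.0, drive))
        out.append(out_row)
    return out


# Toy image: dark left half, bright right half, i.e. a single vertical edge.
image = [[1.0 if col >= 4 else 0.0 for col in range(9)] for _ in range(9)]
bank = {
    "vertical": gabor(5, 8.0, 0.0, 2.0),            # grating varies left to right
    "horizontal": gabor(5, 8.0, math.pi / 2, 2.0),  # grating varies top to bottom
}
maps = {name: filter_response(image, patch) for name, patch in bank.items()}
peak = {name: max(max(row) for row in m) for name, m in maps.items()}
print(peak)  # the vertical filter responds strongly; the horizontal one is essentially silent
```

The sliding dot product used here is, strictly speaking, cross-correlation, which is the operation most neural network libraries also call convolution. The same filters are applied at every image location, which is the formal counterpart of each part of visual space being treated more or less the same.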
2.5 Deep Hierarchical LN Cascades
With all of the preceding abstractions on board, we can put the familiar pieces together to get something much less unfamiliar than it might initially have seemed. We’ve described the idea of a linear-nonlinear cascade, as well as the way that the linear contribution to neural response profiles within a region can be reformulated in mathematical terms as filterbank convolutions. Taken together, the series of linear convolutions, interspersed with nonlinear operations, is a hierarchical convolutional neural network (HCNN). See Fig. 2d.
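The way these pieces combine can be sketched in a deliberately reduced form: a one-dimensional cascade in which each stage applies a local linear filter, rectifies, and subsamples. The kernels, strides, and input are hypothetical, and a realistic HCNN would use two-dimensional inputs with many filters per stage; the sketch is meant only to show how spatial resolution falls and receptive field size grows with depth.

```python
def ln_stage(signal, kernel, stride):
    """One LN stage: local linear filtering, rectification, spatial subsampling."""
    k = len(kernel)
    return [max(0.0, sum(w * signal[i + a] for a, w in enumerate(kernel)))
            for i in range(0, len(signal) - k + 1, stride)]


def cascade(signal, stages):
    """Apply the stages in series, returning the activity after every stage."""
    activity = [signal]
    for kernel, stride in stages:
        activity.append(ln_stage(activity[-1], kernel, stride))
    return activity


def receptive_field(stages):
    """Number of input samples that can influence a single unit at the top stage."""
    size, jump = 1, 1
    for kernel, stride in stages:
        size += (len(kernel) - 1) * jump
        jump *= stride
    return size


# Hypothetical three-stage cascade: (kernel, stride) for each stage.
stages = [([-1.0, 0.0, 1.0], 2), ([1.0, 1.0, 1.0], 2), ([1.0, 1.0], 1)]
signal = [0.0] * 8 + [1.0] * 8          # a step "edge" in a 16-sample input

activity = cascade(signal, stages)
print([len(a) for a in activity])        # [16, 7, 3, 2]: resolution falls with depth
print([receptive_field(stages[:n]) for n in range(1, 4)])   # [3, 7, 11]: fields grow
```

Each stage is an LN operation applied identically at every location, so the whole cascade is specified by a handful of kernels. What distinguishes one candidate model from another is then the choice of those kernels, which is where the difficulty described next arises.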
Though the filterbanks corresponding to early cortical areas were reasonably well-characterized by the Gabor wavelet model, it turned out to be quite difficult to identify a simple formula for filterbanks that would accurately describe activity in higher cortical areas. In fact, because of lack of success in extending the receptive-field-characterization program from early to higher cortical areas in the period up through (roughly) 2010, three key assumptions that were tacitly made in the above discussion began to be questioned.
First, the hierarchical model might have been overly simplified. Perhaps we also needed to take into account the (empirically observed) feedback and recurrent connectivity when describing neural responses in the visual pathway. Second, perhaps the LN idea was too simple. Even if the relationship between inputs and neural responses could often be characterized by an LN function and even if the relationship between neural responses and behavior could be expressed in the form of a linear decoder, perhaps the functional relationships between neurons internal to the visual pathway were more complicated. Finally, maybe the rate-code assumption was incorrect. Perhaps we had to return to a more detailed spike-timing description to adequately capture responses in higher cortical areas.
At this point, it should be clear that these worries are essentially of the same form as earlier worries about whether we were employing the right abstractions. And just as before, they can be addressed by looking at the quality of the predictions they generate. Success in characterizing the receptive field response properties of the visual system would help to allay these worries. As we will discuss in the next section, it turns out that deep hierarchical convolutional neural network models have been able, in the last 10 years, to produce substantial progress toward this end.
Before we get there however, we will return to the idea of model-mechanism-mapping (3M) and relate it to roles played in neuroscience by the shared abstractions just described.
3 3M++: An Expanded Notion of Model-Mechanism Mapping
3.1 The Model-Mechanism-Mapping (3M) Constraint
Recall that we are trying to employ the 3M requirement for evaluating explanatory models for neuroscience. In its most straightforward form, it seems to give a clear formula for assessing whether a model describes a mechanism - i.e. whether it is a mechanistic model.
The relationship between mechanism and explanatory power is relatively intuitive, but perhaps worth making explicit. When we identify the parts, activities, and organization of a system, we gain an intuitive understanding of its causal structure – and moreover, a causal structure composed of concrete entities (in our case, biological) performing electromechanical activities familiar from other contexts, and for whose further explanation we may easily defer to well-established areas of science such as chemistry or physics. There is a way in which mechanistic explanations are thus satisfying, in that they fit well into our existing conceptual structures. Happily, they are also the very explanations that most life sciences practitioners explicitly aim at.
That said, the exact definition of what does or does not constitute a mechanism is somewhat contentious, precisely because of debate over the role of abstraction. What exactly counts as an ‘entity’? Must it be spatiotemporally localized and contiguous? How concrete must the characterization of its effects or activities be? Sometimes, computational explanations are held out as prime examples of non-mechanistic explanation; other times, they are said to be merely incomplete mechanisms, sketches that must be filled in in order to qualify as full-fledged explanatory models.
To clarify these issues, we propose a revised version of 3M that gives an explicit place to common kinds of abstractions in neuroscience. As presaged in the previous section, the kinds of abstractions commonly employed, far from being mechanistically disqualifying, are in fact the same ones relied upon by paradigm examples of mechanistic models. That being so, with respect to this revised (3M++) condition, we think that some computational models are in fact prime examples of mechanistic explanation.
3.2 Predictively Adequate Runnable Abstraction
Skeptics have taken it as obvious that no model-to-mechanism mapping can be found for NN models because the components, activities, and organizational features of those models seem very different from those of the brain – ironically, given their name and historical origin. And it’s true that the units in an artificial neural network abstract away from many apparently important features of neurons, such as their action potentials (spikes), and the fact that some are excitatory and others are inhibitory, while the standard gradient-descent-based techniques used to train artificial NNs are commonly considered to be biologically implausible.
But how important are these differences for our modeling purposes? Certainly we don’t expect the same drugs to work on an artificial neural network as on a biological brain. Nor would we expect GPUs to respond the same way as brains do to changes in temperature or oxygen level. And if we zoom in far enough, no two animal brains are identical either. On the other hand, if we are primarily interested in a single clearly delineated capacity (say, immediate responses to visual stimuli under normal conditions), it might be that we can abstract away from these properties because these are not differences that make a difference in the context of that specific capacity, operationalized as performance on a particular task. There are many levels of description at which two systems may be similar, but one naturally illuminating and parsimonious level of abstraction is the least detailed one that still captures the features functionally relevant to our capacity of interest, and is moreover adequate to predict the behavior of the target system which displays that capacity.
What we have stated explicitly here is already implicitly assumed in neuroscientific practice. To suppose that any model system (even another biological one) could be similar to a target brain is to have already accepted that some details can be dropped and some differences of internal mechanism may be safely ignored. Which ones those are will depend on the phenomenon of interest.
Footnote 11: Notice too that some degree of abstraction is already implicit in the notion of mechanism from which the 3M requirement arises. Take the classical example of the mechanism of synaptic transmission. This mechanism has components at many different size scales: ion channels, vesicles, second messenger molecules, and so on. The components of the mechanism are “biological entities” – not described at some maximal level of physical detail, but rather, those remaining after we have abstracted away from all physical details that are not functionally relevant to the biological phenomenon of interest, and where the explanatory target itself is characterized at some level of idealization. (There is no synapse that looks exactly like the diagram in a textbook, and the predictions made by a textbook 3M-satisfying mechanistic model for synaptic transmission are for the most part qualitative rather than mathematically precise.)
We have already emphasized that every experimental neuroscientist is faced with choices about what to measure in the brain. To be undertaking neuroscience in a way that yields scientific insight requires us, like map-makers, to choose a level of abstraction far above that of the finest physical detail. In section 2, we claim that modelers have for the most part made the same choices as experimentalists in regard to these abstractions. But to what extent are these choices well-motivated in the first place? In measuring (or modeling) brains, why shouldn’t we have picked some other more or less detailed level of description than networks of spike-rate units? Or one that focused on different functional units? We claim that there is a reasonable answer to this question that imposes fairly strong constraints on what counts as a sufficient level of abstraction.
The fundamental observation is that neuroscientists are interested in a particular task or behavioral capacity, and the functional contribution of components of the system to the task of interest. For this reason, we want our model to be able to actually perform that task. Thus, the level of abstract description we want must be one that still captures the features functionally relevant to our capacity of interest (and may vary depending on what that capacity is). To this end, we introduce a notion of predictively adequate runnable abstraction (PARA), which requires that the model actually runs successfully on novel instances of the same kind of input that the real system gets, to produce the output behavior of interest. The requirement that the instances be novel rules out models that merely describe existing data, instead of capturing something important about how it is generated.
The capacity of interest itself may be characterized more or less finely – so that an abstraction that is predictively adequate to the capacity, understood coarsely, may no longer be adequate when we become interested in more details of performance (requiring us to re-operationalize the capacity with a finer-grained task). For example, if we are only interested in maximizing the average accuracy across many ecologically-valid stimulus types, this places potentially less stringent constraints on the model than accurately predicting the target’s pattern of accuracy and errors for each stimulus individually. In practice, however, performance constraints can be very strong: achieving high performance on ImageNet categorization imposes a high degree of consistency (although not complete consistency) with human error patterns. See for instance Rajalingham et al.
Critically, the requirement of runnability can end up forcing the preservation of many internal details. That is, it can turn out that more details matter for the function of the system than we would have guessed from the armchair. If the model does run successfully, we may be reassured that we have not discarded more detail than we should have; runnability validates something like the “Salmon-completeness” of the model being run.
Footnote 12: In Craver and Kaplan’s terminology, “the Salmon-complete constitutive mechanism for P versus P’ is the set of all and only the factors constitutively relevant to P versus P’.”
Take the capacity of the visual system to categorize images by the objects they portray. The criterion of runnability for the visual system model imposes strong constraints on how coarse the level of description can be. It is very unlikely that a model that collapses all the distinctions between multiple units within a cortical area and replaces them with a single “area-level” identity token would be sufficiently detailed to run at all. So the level of description that is typically used in analyses of high-level brain connectivity (e.g. ) may be informative about many scientific questions but is almost certainly insufficient for runnability on standard perceptual tasks. A finer grain of detail – at least preserving the “unit-level” of detail, with the capacity for having many variable units within each cortical area – is needed for a model of visual behavior that runs based on pixel-level input.
In a sense, then, a predictively adequate runnable abstraction is more faithful than the kind of “classical” box-and-arrow mechanistic model often mentioned in the biological sciences, because we have imposed the further requirement that you must be able to run the model forward in order to predict – quantitatively – future states of the target system. This is especially important because it gives us a way to test the adequacy of the abstraction (as well as the causal relevance of the variables invoked), by directly assessing the accuracy of the output predictions.
We claim that such a predictively adequate abstraction is a useful way to elaborate on the 3M mechanistic requirement (call the new requirement 3M+). That is, if we use the notion of adequate abstraction to pick out just the functionally relevant components, activities, and organizational features, then a correspondence between these and components, activities, and features of the target system will satisfy the spirit of 3M.
So far, our extension of 3M is quite modest. Many of the points regarding the ubiquity of abstraction have been explored by others [41, 42, 39, 43]; we have just made explicit the kinds of neuroscientific abstraction that we will allow ourselves in constructing mechanistic NN models. In addition, we emphasize how important it is that the model be runnable – that it be not just a scientific representation of the capacity of interest, but a working instantiation of that capacity.
3.3 Transform Similarity for Biological Populations
With a notion of abstraction in hand, we turn now to the question of how to assess the extent to which a model “corresponds to” or resembles its target. Our starting point is that a baseline standard for resemblance is established by the similarity of one typical animal brain to another of the same species. But this immediately raises a prior and subtler problem that must be addressed before we can really ask whether an artificial model is similar to the brain. This is the fact that there is no single primate visual system; every monkey and every human is different.
Footnote 13: Similarity is a vexed notion. Everything is similar to everything else in some respects and not others. Similarity fades in and out depending on our perspective, how closely we look, and which details we take to be important. See  for a full discussion of the notion of similarity in the context of scientific modeling.
In fact, for most biological systems of interest, every individual of the species is different. That means that to define a notion of similarity that is suitable for biological systems, we must deal with the fact that the target of explanation is inherently a population of individuals. We must discover some sense in which the targets nonetheless share the “same” functional physiological organization, allowing us to map the components (and activities and organizational features) of one to those of another, even when they are not identical in fairly obvious ways.
Footnote 14: Even the same individual can look different at different times. Even in fully-developed adult brains, the exact roles of individual neurons can change over time. To give a very simple example, in the hippocampus, the well-known place cells that code an animal’s location in its environment get remapped regularly, and the same cell may not code for the same location before and after remappings. Some key regularities must remain constant over time – place cells retain their identities as place cells over time, and the distribution of locations coded for by the population of place cells remains the same – or else nothing stable could be coded at all. Nonetheless, the variability of the hippocampal population over time is substantial, and such variability is common throughout neuroscience. Thus, throughout the rest of this section, whenever we say “different individual”, this encompasses the possibility of dealing with the same brain at different times.
What allows us to do this is the acceptance of some kind of transform that supports the mapping from one animal to another – and just as importantly for our purposes, the mapping from a neural network model to an animal. It should be systematic, principled and not ad hoc. Such transforms are already used in order to compare data across individuals (e.g. hyperalignment for fMRI data) – we propose them as a means to assess model-target accuracy as well. In order to come up with the right transform, the key question is: what transform class is needed to make accurate predictions about a given brain area of target animal $A$ on the basis of facts about the anatomically homologous brain area of source animal $B$? If $A$ and $B$ were exact duplicates of each other, we could predict the activity of every neuron in $A$ by observing neurons in $B$. But given that there is typically no one-to-one mapping of neurons from $B$ to $A$, we must open up the class of inter-animal transforms a bit wider.
In the case of the higher areas of the visual system, such as V4 and IT, this transform class is often taken to be the set of linear maps. That is, it is posited that the stimulus-driven activity of any neuron in a given area of individual $A$ can be reproduced by a linear combination of the activity of neurons in the analogous area of individual $B$. Mathematically, this means that for all input stimuli $s$, the neuronal response to $s$ by neuron $i$ in individual $A$, which we will denote $n^A_i(s)$, can be written as:
$$n^A_i(s) = \sum_j c_{ij}\, n^B_j(s), \tag{2}$$
where $j$ ranges over all neurons in the corresponding brain area of $B$. (This is the same as equation 1 except there’s no nonlinearity.) The numbers $c_{ij}$ are constants representing how much contribution each source neuron $j$ makes to replicating some target neuron $i$. This is essentially a measure of similarity between the two neurons. If one source neuron made a hundred percent of the contribution to predicting the target neuron, then the mapping would be one-to-one. That multiple source neurons might need to contribute (i.e. $c_{ij} \neq 0$ for multiple $j$’s) is just a statement of the fact that there isn’t a one-to-one mapping of the source animal’s neurons onto the target.
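As a minimal sketch of the linear inter-animal mapping just described, the following Python (NumPy) code fits the mixing coefficients by ordinary least squares and scores the fit on held-out stimuli. Everything here is an illustrative assumption rather than the procedure of any cited study: the two “animals” are synthetic populations that read out shared latent stimulus features through different random weights, and the function names `fit_linear_map` and `heldout_correlations` are hypothetical.

```python
import numpy as np

def fit_linear_map(source, target):
    """Least-squares weights W such that source @ W approximates target.

    source: (n_stimuli, n_source_neurons), target: (n_stimuli, n_target_neurons).
    Column i of W holds the coefficients used to rebuild target neuron i.
    """
    W, *_ = np.linalg.lstsq(source, target, rcond=None)
    return W

def heldout_correlations(source, target, n_train):
    """Fit on the first n_train stimuli; return, per target neuron, the Pearson
    correlation between predicted and actual responses on the remaining stimuli."""
    W = fit_linear_map(source[:n_train], target[:n_train])
    pred = source[n_train:] @ W
    true = target[n_train:]
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)
    num = (pred_c * true_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))
    return num / den

# Synthetic data (an assumption for illustration): both populations mix the same
# 10 latent stimulus features with different random weights, plus private noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 10))
animal_b = latent @ rng.standard_normal((10, 30)) + 0.1 * rng.standard_normal((200, 30))  # source
animal_a = latent @ rng.standard_normal((10, 20)) + 0.1 * rng.standard_normal((200, 20))  # target

r = heldout_correlations(animal_b, animal_a, n_train=150)
print("median held-out correlation:", float(np.median(r)))
```

Because the synthetic populations are linearly related by construction, the held-out correlations come out high; with real recordings, how well such a map generalizes is exactly the empirical question at issue.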
But why is this assumption about linear mapping made? The reasons are partially mathematical and partially empirical. To understand the theoretical reason, we need to go back to a basic principle of the hierarchical LN cascade described in section 2.3.
While the processing stages in this cascade are simple, it is critical that they are at least somewhat nonlinear: the composition of linear operations is linear, so additional complexity can’t be built up by a sequence of purely linear operations, and (plausibly) there would be no evolutionary point to allocating multiple brain areas for them in the first place. Given the non-linearities, complex transformations can only arise from multiple such stages applied in series . Since the original visual input is highly non-linear and tangled along the dimensions of the stimulus relevant for behavior, the untangling process by which the brain parses visual data is likely to be as well.
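The point that purely linear stages collapse into a single linear stage, while an intervening nonlinearity prevents the collapse, can be checked with a toy example. The weights below are hand-picked for illustration and the rectifier is just one convenient choice of nonlinearity; neither is taken from the models discussed in the text.

```python
import numpy as np

def relu(v):
    """Rectifying nonlinearity applied elementwise."""
    return np.maximum(v, 0.0)

# Hand-picked toy weights (illustrative only).
W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])   # first stage: computes x1 - x2 and x2 - x1
W2 = np.array([[1.0, 1.0]])    # second stage: sums the two intermediate units
x = np.array([2.0, 0.5])

# Two linear stages are equivalent to the single linear stage W2 @ W1.
two_linear_stages = W2 @ (W1 @ x)
one_linear_stage = (W2 @ W1) @ x

# Inserting a rectifier between the stages yields |x1 - x2|, which no single
# linear stage can compute.
ln_cascade = W2 @ relu(W1 @ x)

print(two_linear_stages, one_linear_stage, ln_cascade)
```

Here the stacked linear stages cancel to zero for every input, whereas the rectified cascade computes the absolute difference of its inputs: new computational structure appears only once the stages are nonlinear.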
To return to the problem of comparing two animals (assumed hereafter to be conspecifics and typical members of their population), we start with the uncontroversial assumption that they typically have the same number of constituent brain areas (i.e. processing stages) in their visual pathway (the linear-nonlinear cascade), and moreover, that the corresponding brain areas are themselves similar. We can then ask, how do we assess the similarity of two corresponding stages in a Linear-Nonlinear cascade? Anticipating the application of this question to neural network models later, we’ll call these stages layers - with the caveat that these are meant to suggest the layers in a NN, and do not refer to the cortical layers anatomically arranged within a given brain area.
We can make the comparison simpler by noting that the nonlinearities are drawn from a limited number of simple (unparameterized) forms corresponding to discrete functional “cell types” that are plausibly the same for any two animals. Now our question has been reduced to the question of when the linear portion of the transform is the same.
The simplest and strictest notion of similarity across corresponding brain areas in two animals would require that the two sets of neurons can be placed in one-to-one correspondence, i.e. that there is a permutation of the indices of one animal’s neurons such that it becomes equal to the other’s. In early areas along the visual pathway, such as the retina, this strict mapping may largely be correct. But for intermediate and higher cortical areas such as V4 and IT, where neurons will be highly experience-dependent and represent objects and parts of objects, such a one-to-one mapping is almost certainly too strict.
Instead, a natural next-strictest step is to say that sets of neurons in such areas are similar when one can be constructed from the other via an invertible linear transform. When this is possible, we will have used the linear transform as a remapping between the two animals. Or equivalently, we’ve taken one animal to be a “linear model” for the other.
The class of linear transforms is more powerful than just permutations, and thus defines a looser equivalence notion than one-to-one mappability. But the linear transform class is not all-powerful; for example, it is not the case that just any two groups of neurons can be transformed to each other via linear transform. In fact, linear transforms are comparatively weak, because (and this is obvious once you think about it) any two brain areas that are related to each other by a non-linear relationship (as we believe V4 and IT to be) are by definition not linearly transformable to each other.
A convenient consequence of this “weakness” of linear transforms (and thus the relative strictness of the similarity class they define) is that performance on any task decoded by the modeler from linearly equivalent areas will be the same, assuming that a simple linear decoder is used to assess performance. For example, if one uses a standard method such as a support vector machine (SVM) or linear discriminant analysis (LDA) to assess the explicitly accessible information for a given task that is present in a given brain area, then performing an invertible linear transform on the data will not change the results. Thus, if two individuals are said to be equivalent when their neural responses in corresponding areas line up under linear transform, this means that the two individuals’ performance on any behavioral task (assuming the brain areas involved are actually being used to support that task behaviorally) will be the same. The linear transform class is essentially the largest (and thus least assumption-prone) transform class that has this property.
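The invariance of linear decoding under an invertible linear remapping can be illustrated numerically. The sketch below uses an unregularized least-squares readout as a stand-in for the decoders named above, because for that readout the invariance is exact; regularized decoders may show small differences. The data are synthetic and all names (`linear_readout`, `tuning`, `transform`) are hypothetical.

```python
import numpy as np

def with_bias(x):
    """Append a constant column so the readout can learn an offset."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

def linear_readout(train_x, train_y, test_x):
    """Unregularized least-squares linear decoder; returns predicted labels (+1 or -1)."""
    w, *_ = np.linalg.lstsq(with_bias(train_x), train_y, rcond=None)
    return np.sign(with_bias(test_x) @ w)

# Synthetic two-category task (an assumption for illustration).
rng = np.random.default_rng(1)
n_stimuli, n_neurons, n_train = 300, 12, 200
labels = rng.choice([-1.0, 1.0], size=n_stimuli)
tuning = rng.standard_normal(n_neurons)
responses = rng.standard_normal((n_stimuli, n_neurons)) + 0.8 * labels[:, None] * tuning

# A mild perturbation of the identity: an invertible, well-conditioned remapping.
transform = np.eye(n_neurons) + 0.1 * rng.standard_normal((n_neurons, n_neurons))
remapped = responses @ transform

pred_original = linear_readout(responses[:n_train], labels[:n_train], responses[n_train:])
pred_remapped = linear_readout(remapped[:n_train], labels[:n_train], remapped[n_train:])

accuracy_original = float(np.mean(pred_original == labels[n_train:]))
accuracy_remapped = float(np.mean(pred_remapped == labels[n_train:]))
print(accuracy_original, accuracy_remapped)
```

The two readouts agree on every held-out stimulus because the remapped population spans the same space of linear readouts as the original one; the decoder simply absorbs the inverse transform into its weights.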
Aside from these mathematical reasons, the choice is also somewhat empirically motivated. To be clear, it is not yet known empirically whether response properties in two animals’ brain areas really are equivalent up to linear transform. The reason is simply that no single experiment has yet amassed enough neural data in a single cleanly comparable experimental condition to settle the question, though the hypothesis has not been ruled out. But what is known empirically [46, 27] is that activity patterns in corresponding brain areas between animals are substantially more similar to each other up to linear transform than are the activity patterns in two different brain areas within the same animal. That is, for brain areas V4 and IT and source and target animals $B$ and $A$:
$$d(\mathrm{V4}^A, \mathrm{V4}^B) < d(\mathrm{V4}^A, \mathrm{IT}^A) \quad \text{and} \quad d(\mathrm{IT}^A, \mathrm{IT}^B) < d(\mathrm{V4}^A, \mathrm{IT}^A),$$
where $d$ means distance in linear span. Similar empirical inequalities have been shown to hold when comparing other pairs of areas in the ventral pathway, e.g. V1 vs V4 or V1 vs IT. In fact, the distances between pairs of areas mirror the structure of the ventral pathway itself, with areas that lie farther apart along the pathway also lying farther apart in linear span.
Linear transform thus does appear to empirically capture a reasonably effective notion of similarity in which the abstract structure that is supposed to be the target of description (“V4” or “IT” in “the brain”) is well defined, up to linear transform. In mathematical terms, we can “quotient out” the inter-animal differences by means of the proper class of inter-animal transforms, creating a “population equivalence class” as the target of explanation.
While linear transforms are a reasonable first pass, we are not claiming that linear transform is the best or only concept of similarity in any deep sense, only that whatever turns out to be the strictest (i.e. least expressive) empirical mapping class demanded by the population is what should be used to define similarity between different individuals. Linear mappings are just the current simplest transform class that appears to be consistent with the known data, and the pseudo-mathematical argument above is just a heuristic that explains why this might be true.
However, were a very large set of ventral stream neural responses on a common set of images to be collected from a sufficiently numerous population of macaques, it would become empirically feasible to determine the actual inter-animal mapping class. If it turned out that the full strength of linear maps was needed to map one animal to the other, the choices described above would be justified. However, if animals were more similar than linear (e.g. perhaps only orthonormal transforms are required), or animals were less similar than linear (e.g. non-linear mappings are required), the choice of linearity would have to be revisited, and the specific results in the literature that depend on this choice would also need to be revised – but the overall structure of our argument here would remain (much as the argument in §3.2 would remain even if neuroscientists validated a better model for neurons than the LN abstraction).
Footnote 15: In fact, the empirical data from various areas in the visual pathway give clues to ways in which stricter, more “one-to-one”-like, classes of transforms are probably indicated. As mentioned above, in the retina, at the very beginning of the visual pathway, neuronal circuits appear to be highly stereotyped within species, and it is perhaps possible that, for any retinal neuron in one animal, a single neuron could be found in any other animal of the same species that matches the original neuron’s response pattern very closely. Somewhat further away from the sensory periphery in cortical area V1, a strict one-to-one match between organisms would be much less effective (especially since different individuals can have different sized V1s). Nonetheless, the strictest transform class between different animals’ V1s must be in some sense narrower than the full space of linear transforms, because there is no guarantee that the stereotypical V1 simple and complex cells that are so robustly observed would be preserved under linear transform (since a linear transform of a Gabor wavelet filterbank needn’t contain any Gabor wavelets). Going further down the ventral pathway, to (e.g.) V4 or IT, the flexibility of the linear transform class is likely to be increasingly important – the further one gets from the sensory periphery, as layers of non-linear synapses pile up, the more slip-room there is likely to be in how different the response patterns are between individuals. However, even in IT cortex far from the sensory periphery, the existence of large fractions of units with particular selectivity patterns (e.g. the neurons in the well-known face, body, and place areas [48, 49, 50]) shows that the correct transform class can’t quite be linear. That is because the fraction of units in a population with a particular selectivity is not a linear invariant: a linear transform can too-loosely up- or down-weight the prevalence of units with specific selectivities, changing the selectivity profile of the population in abiological ways. However, despite all these ways in which the linear transform class might be too loose to properly account for the inter-animal similarities in various visual areas, it is still a much better mapping class than the obvious alternative – namely, the much stricter one-to-one mapping. Hopefully, as the field progresses, characterizations of the proper transform class for each visual area that improve on both one-to-one and full-linear mappings will emerge.
So far in this section, we have been concerned with defining the scientific target phenomenon itself. But now returning to the question of comparing models to the brain, we need to modify the definition of a mechanistic model given by the 3M constraint to accommodate the nature of the target. We propose that whatever class of transforms is used to define similarity between individuals should also be the one used to make mechanistic mappings between mathematical models of the system and any one target individual in the population. Thus if we decide based on empirical grounds that the proper class of transforms between individuals in the population is linear, then the proper notion of a mechanistic mapping between a model and any one individual should also then be linear. In other words, we should be able to write the response $n^A_i(s)$ of any neuron $i$ in monkey $A$ to a stimulus $s$ as a linear combination of the responses $m_j(s)$ of units in the model,
$$n^A_i(s) = \sum_{j=1}^{N} c_{ij}\, m_j(s), \tag{3}$$
where $N$ is the number of units in the model layer. We can think of $\sum_{j} c_{ij}\, m_j$ as a “synthetic neuron” (created from the linear combination of model units) that maps directly (one-to-one) to a monkey neuron.
It might be tempting to object that the modeling of a real target neuron as an amalgam of artificial neurons in the model doesn’t feel very “mechanistic”. But remember that this same formula is the very one needed to even determine the sense in which a target neuron is similar to neurons in the same brain area of another animal of the same species. To the extent that activity in brain areas across individuals are predictable phenomena in the first place, the modeling relation expressed by eq. 3 is as mechanistic as it can ever get (unless our aim is to build a different model for each individual).
Footnote 16: A consequence of this idea is that, instead of requiring that a computational model perfectly predict activity in a target system, we should only judge accuracy up to the level at which one population member (one animal) can predict another population member (a conspecific). This defines a clear “noise ceiling” with which to normalize prediction accuracy: even if the literal match between predictions of a model and a given individual is imperfect (i.e. a correlation of less than 1.0), as long as this similarity is as good as the typical individual-individual similarity (which itself might exhibit intra-specific correlations of less than 1.0), the model can be said to be predictively adequate.
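The “noise ceiling” normalization can be made concrete with a small sketch. The helper `normalized_predictivity` is a hypothetical name, the use of the median as the “typical” value is our assumption, and the correlation values are placeholders chosen for illustration, not measurements.

```python
import numpy as np

def normalized_predictivity(model_to_animal_r, animal_to_animal_r):
    """Typical (median) model-to-animal correlation divided by the typical
    animal-to-animal correlation; a value of 1.0 means the model predicts an
    individual about as well as a conspecific does."""
    return float(np.median(model_to_animal_r) / np.median(animal_to_animal_r))

# Placeholder per-neuron correlations (illustrative values only).
model_to_animal_r = np.array([0.55, 0.60, 0.65])
animal_to_animal_r = np.array([0.70, 0.75, 0.80])

score = normalized_predictivity(model_to_animal_r, animal_to_animal_r)
reaches_ceiling = score >= 1.0
print(round(score, 3), reaches_ceiling)
```

On this reading, a model whose raw correlation with an individual is well below 1.0 can still count as predictively adequate, provided the normalized score reaches the level set by inter-animal predictions.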
Adding this second elaboration of which kinds of abstraction make sense in constructing our model-to-mechanism mapping, we arrive at a set of revised criteria we’ll call “3M++”. And just as the requirement for runnability discussed in the last section imposes some constraints on the level of detail of internal structure that can allowably be left out of an acceptable model, so too does the requirement that the model-to-target mapping be of the same kind that we use to map individuals to each other impose some sharp constraints on acceptable models.
To illustrate the kind of model that would be ruled out, consider the famous result in theoretical neuroscience called the Universal Approximation Theorem (UAT). The UAT (roughly) says that any functionality that can be generated by a deep multi-layer neural network can be approximated by a shallow network with a single, potentially very large, hidden layer. However, even if such a single-hidden-layer network passed the functional adequacy criterion of §3.2, it would immediately fail the mappability criterion discussed here, since the kind of mapping required to go from a huge shallow network to a multi-layer network with fewer units doing the same task is not the same kind of mapping that would be needed to use data from one monkey to predict activity in another. In fact, recent mathematical analyses suggest that the number of units required by a shallow network to replicate the functionality of a deep network is exponentially greater – e.g. the complexity (in a suitable sense) of the functions computable by ANNs grows linearly with network width (number of units per layer) but exponentially with depth (number of layers). Thus, the violation of the transform similarity constraint would be very large. See  for a discussion of these ideas.
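One standard toy illustration of the depth-versus-width gap (our choice of example, not taken from the analyses cited above) composes a “tent” map with itself. Each layer uses only two rectified units, yet stacking `depth` such layers produces a piecewise-linear function with 2^depth linear pieces. A one-hidden-layer rectifier network with N units and a scalar input can have at most N + 1 pieces, so matching the deep network requires on the order of 2^depth hidden units. The function names are hypothetical.

```python
import numpy as np

def relu(v):
    """Rectifying nonlinearity applied elementwise."""
    return np.maximum(v, 0.0)

def tent_layer(x):
    """One hidden layer with two ReLU units computing the tent map on [0, 1]:
    2x for x < 0.5 and 2 - 2x for x >= 0.5."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def count_linear_pieces(depth, grid_bits=12):
    """Count the linear pieces of the depth-fold composed tent map by counting
    slope sign changes on a dyadic grid that contains every breakpoint."""
    x = np.linspace(0.0, 1.0, 2 ** grid_bits + 1)
    y = x.copy()
    for _ in range(depth):
        y = tent_layer(y)
    slopes = np.diff(y) / np.diff(x)
    return int(np.count_nonzero(np.diff(np.sign(slopes)) != 0)) + 1

pieces_by_depth = {depth: count_linear_pieces(depth) for depth in range(1, 7)}
for depth, pieces in pieces_by_depth.items():
    # Deep network: 2 units per layer. Shallow network: at least pieces - 1 units.
    print(depth, 2 * depth, pieces, pieces - 1)
```

The unit count of the deep network grows linearly with depth while the lower bound for the shallow network doubles with every added layer, which gives a concrete sense of how different the two architectures are even when they compute the same function.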
3.4 The 3M++ Criteria
The force of the “transform similarity” expansion of 3M described in §3.3 is to allow for a class of systems with the same types of underlying components (e.g. neurons) to be considered a single explanatory target, even when any two observable instances of the system are different in some substantial regard (e.g. having responses that differ up to a linear transform). The force of the “predictively adequate runnable abstraction” expansion of 3M (§3.2) is to allow for systems with different very-low-level implementations (GPU transistors vs biological cells) to be compared at a meaningful functional level.
This distinction can be visualized as a diagram (Fig. 3) in which the abstraction mappings are the dotted diagonal arrows that attempt to bring implementations with different physical details (blue vs. purple) into registration at a shared level of description (green), while the similarity mappings are the green curved dotted arrows that attempt to bring different abstract descriptions (either of individuals within a biological population, or of a model and a target system) into registration. The abstraction mappings are non-invertible, and differ from implementation type to implementation type: you lose detail when you go from GPUs to activations in the abstract description and these are different types of details than the ones you lose when you go from biological cells to activations.
Transform similarity by contrast is invertible and should be the same whether the sources are models or biological instances: insofar as you are required to discard details in going from data collected from Animal $A$ to data collected from Animal $B$, the same exact sort of discardings must be demanded when going from Animal $B$ to Animal $A$, or from abstract descriptions of the model to abstract descriptions of any animal instance, or vice-versa.
Moreover, the constraint imposed by the mapping similarity requirement interacts with the constraints imposed by the predictive adequacy criterion, which require that the model be articulated and detailed enough that it can be run forward to predict the activity of the target system, on the same kinds of stimuli as those for which we want to explain the target system’s responses. Together, these constraints help us locate the right level of abstraction at which to explain the phenomenon of interest. First, in describing what we are interested in explaining, we fix the level of grain of the explanandum in part by loosening the description of that explanandum until it can capture what is shared among members of a population (e.g. “what can be mapped by such-and-such similarity transform to a shared space”). Then, we want to capture all the details needed to explain that explanandum – but no more. (Just as when building a minimal model, we want to throw away as much detail as possible – but no more). Runnability serves to validate the abstractions chosen, guaranteeing that they are sufficient to reproduce the phenomenon of interest, at the level of generality desired.
To illustrate using the same examples given earlier: the PARA constraint would rule out, as potential mechanistic models for the monkey visual system, very coarse models that collapse the multiple units within each area into a single area-level token. Meanwhile, the similarity mapping constraint would rule out a huge shallow network, which would keep the units intact but collapse the organizational structure of brain areas. Together, the two impose significant constraints on how abstract or intuitively unlike the brain models can really be – indeed, likely stronger constraints than those articulated in the original 3M criteria for what constitutes a mechanistic model.
4 Some Illustrative Examples of 3M++
We turn now to some cases to illustrate the contrast between the presence and absence of genuinely deep similarity between a model and the phenomena it predicts, to flesh out how the 3M++ criteria work in practice.
Neural Networks in Genomics (Negative example): Recent work in computational genomics has shown that neural networks can be successfully fit to various aspects of genotype-to-phenotype relationships. For example, the state-of-the-art prediction of 3D protein folding structure from the protein’s 1D genetic sequence is currently done by deep neural networks . In this case, there is no claim that the internal layers of the neural network have any kind of part-level mapping to the biological phenomena that actually give rise to the protein (e.g. either the ribosome that mechanistically builds the protein on the basis of the genetic sequence or the energy-landscape considerations that cause proteins to have reliable folding dynamics). These were never intended to be mechanistic models, and the level of coarse-graining at which a “part-level mapping” would be possible would completely quotient out all the details of the network into a single “box”, at which level the model would no longer be runnable. Thus, the 3M++ criterion fails in this case.
Single Neurons (Positive): Returning to neuroscience, it is instructive to first look at the case of a single electrically-isolated neuron.
As a positive example of applying our criteria to a well-known model, we’ll describe a disagreement over whether the Hodgkin-Huxley model of neural firing qualifies as a mechanistic model. Hodgkin and Huxley famously proposed their model of neural firing on the basis of the interaction of several ionic currents by fitting exponential functions to observed voltage changes in squid giant axons. Their model matched the data (having been constructed from it), but also had provocative features that suggested as-yet-undiscovered features of the structure of the sodium- and potassium-specific ion channels.
One of the main exponents of the mechanistic framework (Craver, 2006) has argued the Hodgkin-Huxley model is not a full mechanistic model, because it appeals to undischarged filler-terms such as “activation” or “inactivation” (particles). In response, Levy (2013) points to the role of deliberate abstraction for the purposes of describing a phenomenon at the relevant level, as well as the ubiquity of “aggregative abstraction” in the description of biological phenomena at all scales. So for example, when tracking phenomena in population genetics, or trying to explain cellular physiology, we must often consider not single concrete entities as the relevant “entities” in our mechanism, but rather more generalized “things” like ionic fluxes, the “spread” of an allele, or averages and distributions of smaller distributed entities. We agree with Levy that the abstractions indispensably employed throughout systems neuroscience have this distributed, partially mathematized flavor. Nonetheless, all parties accept that once we interpret the relevant variables as mapping to subunits of the ion channels (at least provisionally, in a way that, again, can be revised if needed), the resulting model is clearly mechanistic. And in the decades since Hodgkin and Huxley, the details of models of neural firing have been developed substantially, resulting in detailed compartment models that confirm and build upon the initial mechanistic mapping.
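For reference, the variables at issue in this debate can be made concrete by writing out the Hodgkin-Huxley equations in their standard textbook form. The notation below is the conventional one and is not taken from the text above: $m$ and $h$ are the sodium “activation” and “inactivation” variables, $n$ is the potassium activation variable, and $\alpha_x$, $\beta_x$ are the fitted voltage-dependent rate functions.

```latex
\begin{aligned}
C_m \frac{dV}{dt} &= I_{\mathrm{ext}}
  - \bar{g}_{\mathrm{Na}}\, m^{3} h\, (V - E_{\mathrm{Na}})
  - \bar{g}_{\mathrm{K}}\, n^{4}\, (V - E_{\mathrm{K}})
  - \bar{g}_{L}\, (V - E_{L}), \\
\frac{dx}{dt} &= \alpha_x(V)\,(1 - x) - \beta_x(V)\, x,
  \qquad x \in \{m, h, n\}.
\end{aligned}
```

On the provisional interpretation discussed above, the gating variables $m$, $h$ and $n$ are the terms that get mapped to subunits of the sodium and potassium channels.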
One cell type that has been the target of detailed modeling work is the cortical layer 5 pyramidal cell (L5PC). Painstaking low-level electrophysiological measurements have shown how the electrical responses of L5PCs are well-characterized by a biophysical multi-compartment model that elaborates the original Hodgkin-Huxley model, but expands the list of specific parts in the model, and the accuracy with which they are mapped. For the same reasons that the HH model meets the 3M++ criteria once its components are mapped, so too do these multi-compartment models. In fact, such biophysical models are sufficiently accurate, at least in isolated circumstances, that they have been used to serve as “ground-truth” descriptions of L5PCs for other modeling purposes, as we’ll describe next.
Single Neurons (Negative): In recent work, David et al. construct a deep neural network in order to produce a fine-grained model of a single neuron. Specifically, the authors show that it is possible to use deep neural networks with fully connected layers – also known as multi-layer perceptrons (MLPs) – to reproduce the input-output profile of (the multi-compartment model of) L5PCs. The MLP models are created by fitting the weight parameters to the responses generated by this “ground-truth” biophysical compartment model. The authors find that a seven-layer MLP can learn this I/O relationship effectively, achieving high performance on held-out testing data.
Though it is predictively adequate, the seven-layer MLP does not satisfy the 3M++ criterion because it does not satisfy a model-to-mechanism mapping. This is because the components of the MLP do not map to real-world components. No physical or explanatory significance can be assigned to the weight parameters of the MLP. The seven fully-connected layers in the model do not correspond to seven physical cellular subsystems connected in series within the (ground-truth) L5PC cell biophysical compartment model. Nor do the components of each layer (the artificial neurons) correspond to physical entities such as molecules or molecular aggregations that are components of the biological cell being modeled. Indeed, the only coarse-graining under which the 7-layer MLP model and the L5PC compartment model could be brought into correspondence would be one so coarse that it abstracts away all these non-corresponding details – at which point neither model would be runnable.
Gabor Model of V1 (Positive): The observation that the Gabor filterbank model of V1 (Fig. 2c) is an abstract mechanistic model according to 3M++ largely boils down to restating the results described in section 2.4. It is evident that the Gabor model is runnable, since there is a well-defined algorithm for computing output responses for any arbitrary input image. Furthermore, to the extent that Gabor wavelets actually do a reasonable job at predicting real V1 neural responses (i.e. well, but not perfectly), the model is somewhat predictively accurate. The Gabor model can be written in the form of a neural network with one hidden layer, and the mapping from the scalar-valued activities of the artificial neurons to the spike-rates of real V1 neurons is direct. In fact, unlike in higher cortical areas such as V4 and IT, where a looser transform class (e.g. linear transforms) is required to identify similarity between neural samples, in V1 models (and V1 data), it is likely that a single neuron-to-neuron mapping is possible – at least, for those V1 neurons that the Gabor model adequately predicts in the first place.
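To make the sense in which the Gabor model is “runnable” concrete, the following is a minimal sketch of a Gabor filterbank written as a single linear-nonlinear layer. It is an illustration only, not the specific model referred to in Fig. 2c: the function names `gabor_kernel` and `gabor_layer`, the filter size, wavelength, envelope width, and the choice of half-wave rectification as the nonlinearity are all assumptions made for this example.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, phase=0.0):
    """Return a size x size Gabor filter (Gaussian envelope times a sinusoid)."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the sinusoidal carrier varies along direction theta.
    x_rot = xs * np.cos(theta) + ys * np.sin(theta)
    y_rot = -xs * np.sin(theta) + ys * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + y_rot**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength + phase)
    kernel = envelope * carrier
    return kernel - kernel.mean()  # zero mean: no response to uniform images

def gabor_layer(image, kernels):
    """Linear-nonlinear (LN) layer: filter the image, then rectify.

    Returns an array of shape (n_kernels, H - k + 1, W - k + 1) of
    non-negative "rates", one response map per filter.
    """
    k = kernels[0].shape[0]
    patches = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    # Linear stage: dot product of every image patch with every kernel.
    drive = np.einsum("hwij,nij->nhw", patches, np.stack(kernels))
    return np.maximum(drive, 0.0)  # nonlinear stage: half-wave rectification

# A small bank of four orientations (parameter values are illustrative).
thetas = np.linspace(0.0, np.pi, 4, endpoint=False)
bank = [gabor_kernel(11, wavelength=6.0, theta=t, sigma=3.0) for t in thetas]

# Test stimulus: a grating whose luminance varies along the x axis.
xs = np.arange(32)
grating = np.tile(np.cos(2.0 * np.pi * xs / 6.0), (32, 1))

rates = gabor_layer(grating, bank)
print(rates.shape)
print(rates.mean(axis=(1, 2)))  # mean response of each orientation channel
```

Any image array can be passed to `gabor_layer`, which is the point: the model yields a definite response for arbitrary input, and each output unit is a scalar that can be compared directly with a measured spike rate.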
5 The Case of Deep Neural Networks
Recall that the original 3M criteria required us to be able to map model components to brain components, model organization to brain organization, and model activities to brain activities. If we understand this as the model satisfying 3M++, so that both it and the target system can be mapped to a common class of runnable abstractions, we think that deep HCNN models of the ventral visual pathway can, in certain circumstances, count as mechanistic models. To see this, we’ll check the requirements needed to establish 3M++ as exemplified in fig. 3.
On the one hand, starting with the model (the blue boxes in fig. 3), the HCNN is constructed to be runnable: it is a well-defined computational program accepting as input any image-like stimulus, and performing the ecologically-relevant task that is a proxy for the capacity of interest. Second, there is (obviously) a coarse-graining from the components of the physically-implemented HCNN running on (say) a GPU to a more generic abstract level of description. In other words, we’ve passed from the blue boxes in Fig. 3 to the bottom row of green boxes (the “abstracted model”) in that figure.
Turning to the target biological system (the purple boxes in fig. 3), the arguments from section 2 can be rephrased as saying that there is a coarse-graining from the components of any instance of a real brain’s ventral visual system to that same abstract level of description as used for the model. Once at that shared level of abstraction, we can use the same transform as that used to register the activity of any two animal brains to map and predict the activity of components in the model to the activity of components in the real brain, in a way that respects the organizational structure of both the HCNN and the ventral pathway. In other words, we’ve constructed the top row of green boxes (the “abstracted target”) in Fig 3, and have posited that the dotted green arrows exist.
But there is one major ingredient that is missing: we have not shown that the abstract dynamics of the two systems (e.g. the target and model dynamics) are actually similar. (In mathematical terms: does the inner diagram actually commute?) To address this, we arrive finally at the place where the recent results from comparing modern HCNNs to neurophysiology are brought to bear. Indeed, everything up to this point could actually largely have been said about the very earliest multi-layer convolutional networks, such as the neocognitron. At this point, however, we need to ask whether the activities in the layers of the network correspond to neural activities in brain areas throughout the ventral pathway – and that is where modern HCNNs have succeeded to an extent that earlier models did not.
In these results, a deep HCNN is built with about the same number of layers as there are observed brain areas in the ventral visual pathway. Then the parameters of the HCNN are optimized such that the resulting network is able to solve a challenging behavioral task – typically, 1000-way object categorization in real-world images. Intriguingly, it turns out that the error patterns of the outputs of these goal-optimized networks correspond to a large degree to those measured from data in human and primate behavioral experiments, even though the networks were optimized for overall performance rather than for producing any particular error pattern. Most importantly for our purposes, it also turns out that HCNNs built this way are by a very long margin the best quantitative models of neural responses in every measured cortical area of the primate ventral visual pathway. Specifically:
Inferior Temporal Cortex: Model responses from hidden layers near the top of HCNNs are highly predictive of neural responses in IT cortex, up to linear transform, both in electrophysiological [58, 60] and fMRI data [59, 61]. In published works such as Yamins et al. 2014, the predictivities are approximately 50% explained variance (i.e. 70% correlation similarity). However, this number does not take into account the noise ceiling implied by inter-animal variability. Very recent work (personal communication) suggests that the inter-animal correlation similarity up to linear transform is about 0.8, and thus that the normalized correlation similarity is about 0.93, i.e. an explained variance of about 87%.
Importantly for our purposes, the HCNN models are also substantially better at predicting neural response variance in IT than ideal-observer semantic models which have perfect access to object category or other attributes. Though the ideal-observer models “solve” the posited behavioral task (e.g. categorization) perfectly, they are not in fact runnable models at all, and so are not constrained to generate the answer from only the real inputs that the system has (e.g. pixels). The fact that the HCNNs, which are runnable models, end up being much better predictors of neural responses shows that runnability is an important constraint on the system.
Intermediate Cortical Areas: Intermediate layers of the same HCNNs whose later layers match IT neurons also yield state-of-the-art predictions of neural responses in V4 cortex [58, 61], the dominant cortical input to IT. Again, the mapping between models and brain data uses a linear transform to perform the match. Similarly, recent models with especially good task performance have distinct layers clearly segregating late-intermediate visual area PIT neurons from downstream central IT (CIT) and AIT neurons. These results are important because they show that high-level ecologically-relevant constraints on network function – i.e. the categorization task imposed at the network’s output layer – are strong enough to shape upstream neural responses in a non-trivial way. In other words, HCNN models suggest that the computations performed by the circuits in V4 are structured so that downstream computations in PIT and, subsequently, AIT, can support robust categorization in tasks that require the ability to deal with high-variation images.
Early Visual Cortex: Results in early visual cortex are equally striking. Extending the correspondence between HCNN layers and ventral stream areas down further, it has been shown that lower HCNN layers match neural responses in early visual cortex areas such as V1 [59, 61]. The filters that emerge from the learning process in early HCNN layers naturally resemble the Gabor wavelets seen qualitatively in V1, without modelers having to build this structure in explicitly. In fact, recent high-resolution results show that early-intermediate layers of performance-optimized HCNNs are substantially better models of macaque V1 neural responses to natural images than previous state-of-the-art models that were hand-designed to replicate qualitative neuroscience observations such as those described in Fig 2b.
Taking all these results together, what has been shown is that a deep HCNN is a runnable model that can be mapped, at the spike-rate abstraction level, to both the structure and functional activity of the ventral visual pathway, using the same transform classes that are needed to map two animals’ ventral pathways to each other. In other words, the categorical aspects of the 3M++ criteria are satisfied. Moreover, the quantitative accuracy of the mapping can be assessed: as mentioned above, the match of HCNN activities to real neural activities in corresponding brain areas, using the linear transform similarity class, is reasonably quantitatively good (though not perfect).
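The notion of mapping model activities to neural activities “up to linear transform” can be illustrated with a small synthetic example. The sketch below is not the procedure used in the studies cited above: the data are randomly generated stand-ins, ridge regression is chosen here only as one simple way of fitting a linear map, and the names `fit_linear_map` and `heldout_correlation` are hypothetical. It shows the general recipe: fit the map on some stimuli, then score predictions on held-out stimuli by correlation, whose square gives the explained variance (so that a correlation of about 0.7 corresponds to roughly 50% explained variance, as quoted above).

```python
import numpy as np

def fit_linear_map(features, responses, ridge=1.0):
    """Ridge regression from model features to neural responses.

    features:  (n_stimuli, n_model_units)
    responses: (n_stimuli, n_neurons)
    Returns weights of shape (n_model_units, n_neurons).
    """
    n_units = features.shape[1]
    gram = features.T @ features + ridge * np.eye(n_units)
    return np.linalg.solve(gram, features.T @ responses)

def heldout_correlation(pred, actual):
    """Pearson correlation, per neuron, between predicted and measured responses."""
    pred_c = pred - pred.mean(axis=0)
    act_c = actual - actual.mean(axis=0)
    num = (pred_c * act_c).sum(axis=0)
    den = np.sqrt((pred_c**2).sum(axis=0) * (act_c**2).sum(axis=0))
    return num / den

# Synthetic stand-ins (assumptions): "model layer" activations and noisy
# "neural" responses that are a linear function of them.
rng = np.random.default_rng(0)
n_stim, n_units, n_neurons = 400, 50, 10
features = rng.standard_normal((n_stim, n_units))
true_map = rng.standard_normal((n_units, n_neurons)) / np.sqrt(n_units)
responses = features @ true_map + 0.5 * rng.standard_normal((n_stim, n_neurons))

# Fit on the first 300 stimuli, evaluate on the remaining 100.
train, test = slice(0, 300), slice(300, 400)
weights = fit_linear_map(features[train], responses[train])
r = heldout_correlation(features[test] @ weights, responses[test])

print(f"median held-out correlation: {np.median(r):.2f}")
print(f"median explained variance:   {np.median(r**2):.2f}")
```

In the setting described in the text, the same transform class would also be fit between two animals’ recordings, and the resulting inter-animal score would serve as the ceiling against which the model’s score is normalized.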
Before the deep HCNN results described above, there had not yet been direct experimental evidence that the neural responses throughout the visual system could be adequately characterized as a series of LN operations, organized into a feedforward hierarchical cascade, purely defined in terms of spike rates. Because these assumptions had not been experimentally confirmed, it has been suggested that to capture the (spike-rate) responses in downstream areas of the ventral pathway, it might be necessary for modelers either to employ primitives more complex than LN units, or to introduce recurrent connections and long-range inter-area feedbacks, or to model computations at a finer scale of abstraction than spike-rates, such as at the level of individual spikes. The existence of a runnable feedforward LN-cascade model that reasonably predicts spike-rates throughout the ventral pathway gives an existence proof that these more complex hypotheses are largely not yet necessary, at least not for explaining the capacity to perform this particular sort of fast categorization task.
The meaning of imperfection.
It is worth focusing for a moment on the imperfections of the HCNN models, and what implications those imperfections have for our advocacy of the 3M++ criteria. It is clear that HCNNs do not explain 100% of the stimulus-driven variability of ventral pathway neurons, up to inter-animal noise ceiling. This failure is not a mere trifle. Finding a model that does meet the strict 100%-explained-variance criterion would ultimately be needed to complete the mechanistic program required by 3M++. Moreover, it is possible (and indeed likely) that some of the key structural issues described above (e.g. the lack of feedback and recurrent connections) will need to be resolved to develop models that map to the visual system with 100% statistical accuracy, i.e. statistically indistinguishable from inter-animal maps.
However – and this is a crucial point for understanding our contribution in this paper – such specific imperfections of the model class, and the changes needed to rectify them, will not substantively change our arguments that the 3M++ criteria provide a good framework for describing (and quantifying) what is required of a good mechanistic model in the first place. Even as improved models (e.g. recurrent deep neural networks with long-range feedback connections) are developed, and the quantitative fits refined, the form of all the main ingredients – the abstractions, the transform similarities, and how they fit together – will remain the same.
6 Using 3M++ to triangulate the level of abstraction
To illustrate the significance of picking the right level of abstraction, consider a third, more loosely analogous case of digital computer chips. In such chips, there is a level of description at which it is best to think of the operation of the chip as digital. The circuit gates describing the system at this level are appropriately modelled with the abstract mathematics of Boolean logic, on which higher-level algorithmic abstractions can in turn be built. However, the underlying physical system implementing the chip is actually analog (since all real physics is analog). The voltage equations governing the chip can in theory have much more complex response patterns than allowed by the digital logic gate that the circuit component is used to implement. If one wanted to describe these further patterns with “digital logic”, the description would be very much more complicated than the single simple gate that actually describes the component’s I/O. Why is this? The reason why digital programming works is that during normal usage the chip is clocked at a safe speed, i.e. driven by electrical switching events with a frequency below a pre-determined cutoff. In the safe regime, the complex voltage-driven response patterns of the analog guts of the circuit component can be guaranteed to produce a final I/O relationship that is consistent with the Boolean logic gate: “digital discipline” has been achieved. It is possible to clock the system at a faster frequency than is safe for maintaining digital discipline (this is called “over-clocking”), but the correct operation of the chip can no longer be assumed. Faster chips are produced by smaller and more precise components at which higher clock frequencies are possible without violating digital discipline.
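A toy simulation can make the idea of “digital discipline” concrete. The sketch below is a deliberately crude stand-in for real chip physics: it assumes that an inverter’s analog output relaxes exponentially toward its ideal value with a single time constant `tau`, and that the output is read out by thresholding at half the supply voltage at the end of each clock cycle. The function `run_inverter` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def run_inverter(bits, clock_period, tau=1.0, v_high=1.0):
    """Drive an RC-limited inverter with one input bit per clock cycle.

    The analog output voltage relaxes exponentially toward the ideal NOT
    value with time constant tau. It is sampled at the end of each cycle
    and thresholded at v_high / 2 to give a digital output bit.
    """
    v = v_high  # output starts high, as if the previous input had been 0
    decay = np.exp(-clock_period / tau)
    out = []
    for b in bits:
        target = 0.0 if b else v_high
        v = target + (v - target) * decay  # analog state after one cycle
        out.append(int(v > 0.5 * v_high))
    return out

bits = [1, 0, 1, 1, 0, 0, 1, 0]
ideal = [1 - b for b in bits]  # the Boolean description: NOT

safe = run_inverter(bits, clock_period=5.0)         # period much longer than tau
overclocked = run_inverter(bits, clock_period=0.3)  # period shorter than tau * ln(2)

print("safe clock matches Boolean NOT:  ", safe == ideal)
print("over-clocked matches Boolean NOT:", overclocked == ideal)
```

In this toy model the Boolean description is an adequate abstraction of the analog dynamics only when the clock period is long enough for the voltage to cross the threshold within one cycle; once the clock is too fast, describing the component’s behavior requires reference to its analog state.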
We are not suggesting that the situation with neurons is tightly connected to that of electronic circuits in any mechanistic way. However, at a high level the relationship between the analog chip implementation and its digital description is somewhat analogous to the relationship between the biophysics of the neuron and its LN-level spike-rate description. The lower-level descriptions are in both cases more complex, and when one tries to use the high-level description language (Boolean gates for CPUs, LN layers for neural networks) to describe the operations at a too-low level, the complexity of the description blows up as its fidelity to the physical mechanism falls apart. In the case of the 7-layer MLP for the L5PC cell discussed earlier, a deep neural network has been used merely as a nonlinear regression method — definitely not useless as a data analysis tool, but not implementing what we would consider a predictively adequate runnable abstraction for the purposes of the 3M++ criteria. By contrast, in both the case of the CPU chip and the spike-rate-DNN, the coarser-grain descriptions do – unlike the seven-layer biophysical MLP – satisfy the 3M++ requirements when assessed at their proper level of abstraction.
So it is important to emphasize that the results described in the previous sections indicated a good match between DNNs and neurons precisely because they did not descend to the biophysical level. Instead of requiring a mapping to the ion channel and voltage level of description, the “neurons” of the DNNs in that work are mapped only to real neurons’ spiking rates. While this means they make less detailed predictions, the spike-rate-DNNs are operating at a completely self-consistent description level (another way of saying that they have enough detail to actually be “run”), and – at their stipulated level of detail – the spike-rate networks are both component-to-component mappable and predictively accurate.
Much of what we have said here may be reminiscent of Marr’s influential distinction between the implementational level and the algorithmic. We think that Marr’s terminology, while helpful in some cases, can lead us astray in others. This is because it conflates the value of providing abstract descriptions with the value of providing representational descriptions at an algorithmic level, descriptions all examples of which were also both concise and intuitively interpretable.
As we have seen, in real biological systems, such concise, intermediate-level descriptions may not be available – not just because we haven’t found them yet, but because they may not exist, given the nature of the system. Instead, we should understand that implementational facts (what we have been calling “mechanism”), too, can come at different levels of abstraction. The way that these mechanistic facts produce the function or behavior (or “computation”, as Marr called his top level) of interest does not go via simple algorithms involving variables with clear psychological interpretations, or subfunctions whose composition intuitively produces the desired outcome, but rather, as we discuss in the companion paper (see https://arxiv.org/abs/2104.01489), is shaped by selection processes. We can probe those selection processes, but we cannot expect anything like Marr’s algorithmic level to show up.
Respecting the 3M++ constraint forces us to build explanatory models whose parameters have clear interpretations as values of causal variables. Because the parameters have some significance not bound to the particular data-set being fit, they can generalize to new contexts. Models that satisfy 3M++ capture real causal patterns and dependencies in the world.
References
-  Machamer, P., Darden, L. & Craver, C. F. Thinking about mechanisms. Philosophy of science 67, 1–25 (2000).
-  Glennan, S. S. Mechanisms and the nature of causation. Erkenntnis 44, 49–71 (1996).
-  Craver, C. F. Explaining the Brain: Mechanisms and the Mosaic Unity of Neuroscience (Oxford University Press, 2007).
-  Levy, A. & Bechtel, W. Abstraction and the organization of mechanisms. Philosophy of science 80, 241–261 (2013).
-  Pearl, J. Causality: Models, Reasoning, and Inference (Cambridge University Press, USA, 2000).
-  Woodward, J. Explanation in neurobiology: An interventionist perspective (2014). URL http://philsci-archive.pitt.edu/10974/.
-  Kaplan, D. M. & Craver, C. F. The explanatory force of dynamical and mathematical models in neuroscience: A mechanistic perspective. Philosophy of science 78, 601–627 (2011).
-  Ross, L. N. Dynamical models and explanation in neuroscience. Philosophy of Science 82, 32–54 (2015).
-  Chirimuuta, M. Explanation in computational neuroscience: Causal and non-causal. The British Journal for the Philosophy of Science 69, 849–880 (2017).
-  Shepherd, G. M. Foundations of the neuron doctrine (Oxford University Press, 2015).
-  Pillow, J. W. et al. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature 454, 995–999 (2008).
-  Carandini, M. et al. Do We Know What the Early Visual System Does? Journal of Neuroscience 25, 10577–10597 (2005).
-  Poirazi, P., Brannon, T. & Mel, B. W. Pyramidal neuron as two-layer neural network. Neuron 37, 989–999 (2003).
-  James, W., Burkhardt, F., Bowers, F. & Skrupskelis, I. K. The principles of psychology, vol. 1 (Macmillan London, 1890).
-  Carandini, M. et al. Do we know what the early visual system does? J Neurosci 25, 10577–97 (2005).
-  Movshon, J. A., Thompson, I. D. & Tolhurst, D. J. Spatial summation in the receptive fields of simple cells in the cat’s striate cortex. The Journal of physiology 283, 53–77 (1978).
-  Freeman, J. & Simoncelli, E. Metamers of the ventral stream. Nature Neuroscience 14, 1195–1201 (2011).
-  DiCarlo, J. J. & Cox, D. D. Untangling invariant object recognition. Trends Cogn Sci 11, 333–41 (2007).
-  DiCarlo, J. J., Zoccolan, D. & Rust, N. C. How does the brain solve visual object recognition? Neuron 73, 415–34 (2012).
-  Schmolesky, M. T. et al. Signal timing across the macaque visual system. J Neurophysiol 79, 3272–8 (1998).
-  Lennie, P. & Movshon, J. A. Coding of color and form in the geniculostriate visual pathway (invited review). J Opt Soc Am A Opt Image Sci Vis 22, 2013–33 (2005).
-  Schiller, P. Effect of lesion in visual cortical area v4 on the recognition of transformed objects. Nature 376, 342–344 (1995).
-  Gallant, J., Connor, C., Rakshit, S., Lewis, J. & Van Essen, D. Neural responses to polar, hyperbolic, and cartesian gratings in area v4 of the macaque monkey. Journal of Neurophysiology 76, 2718–2739 (1996).
-  Brincat, S. L. & Connor, C. E. Underlying principles of visual shape selectivity in posterior inferotemporal cortex. Nat Neurosci 7, 880–6 (2004).
-  Yau, J. M., Pasupathy, A., Brincat, S. L. & Connor, C. E. Curvature processing dynamics in macaque area v4. Cerebral Cortex bhs004 (2012).
-  Hung, C. P., Kreiman, G., Poggio, T. & Dicarlo, J. J. Fast readout of object identity from macaque inferior temporal cortex. Science 310, 863–866 (2005).
-  Majaj, N. J., Hong, H., Solomon, E. A. & DiCarlo, J. J. Simple learned weighted sums of inferior temporal neuronal firing rates accurately predict human core object recognition performance. The Journal of Neuroscience 35, 13402–13418 (2015).
-  Rahnev, D. & Denison, R. N. Suboptimality in perceptual decision making. Behavioral and Brain Sciences 41, e223 (2018).
-  Sharpee, T. O., Kouh, M. & Reynolds, J. H. Trade-off between curvature tuning and position invariance in visual area v4. PNAS 110, 11618–11623 (2012).
-  Hubel, D. H. & Wiesel, T. N. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of physiology 160, 106–154 (1962).
-  Pasupathy, A. & Connor, C. Population coding of shape in area v4. Nature neuroscience 5, 1332–1338 (2002).
-  DiCarlo, J. J. & Cox, D. D. Untangling invariant object recognition. Trends in Cognitive Sciences 11, 333–341 (2007).
-  Hubel, D. H. & Wiesel, T. N. Receptive fields of single neurones in the cat’s striate cortex. The Journal of physiology 148, 574–591 (1959).
-  Ringach, D. L., Shapley, R. M. & Hawken, M. J. Orientation selectivity in macaque v1: diversity and laminar dependence. Journal of Neuroscience 22, 5639–5651 (2002).
-  Willmore, B., Prenger, R. J., Wu, M. C.-K. & Gallant, J. L. The berkeley wavelet transform: a biologically inspired orthogonal wavelet transform. Neural computation 20, 1537–1564 (2008).
-  Cadena, S. A. et al. Deep convolutional models improve predictions of macaque v1 responses to natural images. PLoS computational biology 15, e1006897 (2019).
-  Gilbert, C. D. & Li, W. Top-down influences on visual processing. Nature Reviews Neuroscience 14, 350 (2013).
-  Rajalingham, R. et al. Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. Journal of Neuroscience 38, 7255–7269 (2018).
-  Craver, C. F. & Kaplan, D. M. Are More Details Better? On the Norms of Completeness for Mechanistic Explanations. The British Journal for the Philosophy of Science 71, 287–319 (2018). URL https://doi.org/10.1093/bjps/axy015.
-  Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature reviews neuroscience 10, 186–198 (2009).
-  From implausible artificial neurons to idealized cognitive models: Rebooting philosophy of artificial intelligence (2019). URL http://philsci-archive.pitt.edu/16602/. Forthcoming in Philosophy of Science.
-  Boone, W. & Piccinini, G. The cognitive neuroscience revolution. Synthese 193, 1509–1534 (2016). URL https://doi.org/10.1007/s11229-015-0783-4.
-  Piccinini, G. & Craver, C. Integrating psychology and neuroscience: functional analyses as mechanism sketches. Synthese 183, 283–311 (2011). URL https://doi.org/10.1007/s11229-011-9898-4.
-  Weisberg, M. Simulation and similarity: Using models to understand the world (Oxford University Press, 2012).
-  Ocko, S. A., Hardcastle, K., Giocomo, L. M. & Ganguli, S. Emergent elasticity in the neural code for space. Proceedings of the National Academy of Sciences 115, E11798–E11806 (2018).
-  Kriegeskorte, N. et al. Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60, 1126–41 (2008).
-  Schwarzkopf, D. S., Song, C. & Rees, G. The surface area of human v1 predicts the subjective experience of object size. Nature neuroscience 14, 28–30 (2011).
-  Downing, P., Jiang, Y., Shuman, M. & Kanwisher, N. A cortical area selective for visual processing of the human body. Science 293, 2470–2473 (2001).
-  Kanwisher, N., McDermott, J. & Chun, M. M. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J Neurosci 17, 4302–11 (1997).
-  Epstein, R. & Kanwisher, N. A cortical representation of the local visual environment. Nature 392, 592–601 (1998).
-  Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems 2, 303–314 (1989).
-  Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B. & Why and when can deep-but not shallow-networks avoid the curse of dimensionality: a review. International Journal of Automation and Computing 14, 503–519 (2017).
-  Wei, G.-W. Protein structure prediction beyond alphafold. Nature Machine Intelligence 1, 336–337 (2019).
-  Hay, E., Hill, S., Schürmann, F., Markram, H. & Segev, I. Models of neocortical layer 5b pyramidal cells capturing a wide range of dendritic and perisomatic active properties. PLoS computational biology 7, e1002107 (2011).
-  David, B., Idan, S. & Michael, L. Single cortical neurons as deep artificial neural networks. bioRxiv 613141 (2019).
-  Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybernetics (1980).
-  Deng, J., Li, K., Do, M., Su, H. & Fei-Fei, L. Construction and analysis of a large scale image ontology. In Vision Sciences Society (2009).
-  Yamins*, D. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences (2014).
-  Khaligh-Razavi, S. M. & Kriegeskorte, N. Deep supervised, but not unsupervised, models may explain it cortical representation. PLOS Comp. Bio. (2014).
-  Cadieu, C. F. et al. Deep neural networks rival the representation of primate it cortex for core visual object recognition. PLoS computational biology 10, e1003963 (2014).
-  Güçlü, U. & van Gerven, M. A. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. The Journal of Neuroscience 35, 10005–10014 (2015).
-  Nayebi, A. et al. Task-driven convolutional recurrent models of the visual system. In Advances in Neural Information Processing Systems, 5290–5301 (2018).
-  Krizhevsky, A., Sutskever, I. & Hinton, G. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems (2012).
-  Ghosh-Dastidar, S. & Adeli, H. Spiking neural networks. International journal of neural systems 19, 295–308 (2009).