A classic neuroscience method for understanding the brain is to find and study the preferred stimuli that highly activate an individual cell or group of cells. Recent advances in machine learning enable a family of methods to synthesize preferred stimuli that cause a neuron in an artificial or biological brain to fire strongly. Those methods are known as Activation Maximization (AM) or Feature Visualization via Optimization. In this chapter, we (1) review existing AM techniques in the literature; (2) discuss a probabilistic interpretation for AM; and (3) review the applications of AM in debugging and explaining networks.
Understanding the human brain has been a long-standing quest in human history. One path to understanding the brain is to study what each neuron (in this chapter, “neuron”, “cell”, “unit”, and “feature” are used interchangeably) codes for, or what information its firing represents. In the classic 1950s experiments, Hubel and Wiesel studied a cat’s brain by showing the subject different images on a screen while recording the neural firings in the cat’s primary visual cortex (Fig. 1). Among a variety of test images, the researchers found oriented edges to cause high responses in one specific cell. That cell is referred to as an edge detector, and such images are called its preferred stimuli. The same technique later enabled scientists to make fundamental discoveries about how neurons along the visual pathway detect increasingly complex patterns: from circles and edges to faces and high-level concepts such as one’s grandmother or specific celebrities like the actress Halle Berry.
Similarly, in machine learning (ML), visually inspecting the preferred stimuli of a unit can shed light on what the neuron is doing [49, 48]. An intuitive approach is to find such preferred inputs in an existing, large image collection, e.g. the training or test set. However, that method may have undesired properties. First, it requires testing each neuron on a large image set. Second, many informative images that would activate the unit may not exist in such a dataset, because the image space is vast and neural behaviors can be complex. Third, it is often ambiguous which visual features in an image are causing the neuron to fire, e.g. if a unit is activated by a picture of a bird on a tree branch, it is unclear whether the unit “cares about” the bird or the branch (Fig. 13b). Fourth, it is not trivial to extract a holistic description of what a neuron is for from the typically large set of stimuli it prefers.
Instead of finding real images in an existing dataset, one can synthesize the visual stimuli from scratch [32, 10, 27, 25, 42, 46, 29]. The synthesis approach offers multiple advantages: (1) given a strong image prior, one may synthesize (i.e. reconstruct) stimuli without needing access to the target model’s training set, which may not be available in practice (see Sec. 5); and (2) it offers more control over the types and contents of images to synthesize, which enables more controlled research experiments.
Activation Maximization. Let $\theta$ be the parameters of a classifier that maps an image $x \in \mathbb{R}^{C \times H \times W}$ (that has $C$ color channels, each of which is $W$ pixels wide and $H$ pixels high) onto a probability distribution over the output classes. Finding an image $x^*$ that maximizes the activation $a^l_i(\theta, x)$ of a neuron indexed $i$ in a given layer $l$ of the classifier network can be formulated as an optimization problem:

$$x^* = \operatorname*{arg\,max}_x \; a^l_i(\theta, x) \qquad \text{(1)}$$
This problem was introduced as activation maximization (AM), also sometimes referred to as feature visualization [32, 29, 48], by Erhan, Bengio and others. In this chapter, the phrase “visualize a unit” means “synthesize preferred images for a single neuron”. Here, $a^l_i(\cdot)$ returns the activation value of a single unit, as in many previous works [28, 27, 29]; however, it can be extended to return any neural response that we wish to study, e.g. the activation of a group of neurons [24, 33, 26]. The remarkable DeepDream visualizations were created by running AM to activate all the units across a given layer simultaneously. In this chapter, we will write $a_i$ instead of $a^l_i$ when the exact indices can be omitted for generality.
AM is a non-convex optimization problem for which one can attempt to find a local optimum via gradient-based or non-gradient methods. In post-hoc interpretability, we often assume access to the parameters and architecture of the network being studied. In this case, a simple approach is to perform gradient ascent [48, 10, 27, 31] with an update rule such as:
$$x_{t+1} = x_t + \epsilon \, \frac{\partial a^l_i(\theta, x_t)}{\partial x_t} \qquad \text{(2)}$$

That is, starting from a random initialization $x_0$ (here, a random image), we iteratively take steps in the input space following the gradient of $a^l_i(\theta, x)$ to find an input that highly activates the given unit. $\epsilon$ is the step size and is chosen empirically.
Note that this gradient ascent process is similar to the gradient descent process used to train neural networks via backpropagation, except that here we are optimizing the network input instead of the network parameters $\theta$, which are frozen. (Therefore, hereafter, we will write $a_i(x)$ instead of $a_i(\theta, x)$, omitting $\theta$, for simplicity.) We may stop the optimization when the neural activation has reached a desired threshold or a certain number of steps has passed.
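The plain gradient-ascent loop above can be sketched in a few lines. Below is a minimal NumPy sketch, not the chapter's actual implementation: it assumes a hypothetical toy "unit" whose activation is a dot product with a fixed weight template, so the gradient with respect to the input is the template itself (for a real CNN, automatic differentiation would supply this gradient).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "neuron": its activation is a dot product with a fixed template w,
# so the gradient of the activation w.r.t. the input x is just w.
w = rng.normal(size=(8, 8))

def activation(x):
    return float(np.sum(w * x))

def activation_grad(x):
    return w  # d(sum(w * x))/dx = w

# Gradient ascent on the *input*; the "network" (here, w) stays frozen.
x = rng.normal(size=(8, 8))  # random initialization x_0
eps = 0.1                    # step size, chosen empirically
initial = activation(x)
for _ in range(100):
    x = x + eps * activation_grad(x)

assert activation(x) > initial  # the unit's activation has increased
```

For this linear toy unit the ascent is unbounded, which already hints at why unconstrained AM drifts toward extreme inputs, the problem the regularizers below address.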
In practice, synthesizing an image from scratch to maximize the activation alone (i.e. an unconstrained optimization problem) often yields uninterpretable images. In the high-dimensional image space, we often find rubbish examples (also known as fooling examples), e.g. patterns of high-frequency noise that look like nothing but that highly activate a given unit (Fig. 2).
Relatedly, if we start AM optimization from a real image (instead of a random one), we may easily encounter adversarial examples, e.g. an image that is only slightly different from the starting image (e.g. of a school bus), but to which the network assigns an entirely different label, e.g. “ostrich”.
Those early AM visualizations [44, 28] revealed huge security and reliability concerns with machine learning applications and informed a plethora of follow-up adversarial attack and defense research [1, 16].
Examples like those in Fig. 2b are not human-recognizable. While the fact that the network responds strongly to such images is intriguing and has strong implications for security, if we cannot interpret the images, our ability to understand the unit’s purpose is limited. Therefore, we want to constrain the search to a distribution of images that we can interpret, e.g. photo-realistic images or images that look like those in the training set. That can be accomplished by incorporating natural image priors into the objective function, which was found to substantially improve the recognizability of AM images [48, 21, 29, 27, 32]. For example, an image prior may encourage smoothness or penalize pixels of extreme intensity. Such constraints are often incorporated into the AM formulation as a regularization term $R(x)$:

$$x^* = \operatorname*{arg\,max}_x \; \big( a^l_i(x) - R(x) \big) \qquad \text{(3)}$$
For example, to encourage smoothness in AM images, $R(x)$ may compute the total variation (TV) across an image. That is, in each update, we follow the gradients to (1) maximize the neural activation; and (2) minimize the total variation loss:

$$x_{t+1} = x_t + \epsilon \left( \frac{\partial a^l_i(x_t)}{\partial x_t} - \lambda \, \frac{\partial R(x_t)}{\partial x_t} \right)$$
However, in practice, we do not always compute the analytical gradient $\partial R / \partial x$. Instead, we may define a regularization operator $r(\cdot)$ (e.g. a Gaussian blur kernel), and map $x$ to a more regularized (e.g. slightly blurrier) version of itself in every step. In this case, the update step becomes:

$$x_{t+1} = r\!\left( x_t + \epsilon \, \frac{\partial a^l_i(x_t)}{\partial x_t} \right)$$
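The operator-based update can be illustrated concretely. The NumPy sketch below is hypothetical: a 3×3 box blur stands in for the Gaussian kernel $r(\cdot)$, and the same toy dot-product unit as before supplies an analytic ascent direction. Comparing against an unregularized run shows the operator's smoothing effect via total variation.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=(16, 16))  # toy unit: activation = sum(w * x), gradient = w

def blur(x):
    # 3x3 box blur standing in for the Gaussian regularization operator r(.)
    p = np.pad(x, 1, mode="edge")
    return sum(p[1 + dy : 17 + dy, 1 + dx : 17 + dx]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0

def total_variation(x):
    return np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()

x0 = rng.normal(size=(16, 16))
eps = 0.05

x_plain = x0.copy()
x_reg = x0.copy()
for _ in range(50):
    x_plain = x_plain + eps * w    # unregularized ascent
    x_reg = blur(x_reg + eps * w)  # ascend, then regularize: x <- r(x + eps * grad)

# The operator-regularized result is much smoother (lower total variation).
assert total_variation(x_reg) < total_variation(x_plain)
```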
Local statistics. AM images synthesized without priors often exhibit high-frequency patterns and unnatural colors (Fig. 2b). Many regularizers have been designed in the literature to ameliorate these problems, including:
While substantially improving the interpretability of images (compared to high-frequency rubbish examples), these methods effectively attempt to match only the local statistics of natural images.
Global structures Many AM images still lack global coherence; for example, an image synthesized to highly activate the “bell pepper” output neuron (Fig. 3b–e) may exhibit multiple bell-pepper segments scattered around the same image rather than a single bell pepper. Such stimuli suggest that the network has learned some local discriminative features e.g. the shiny, green skin of bell peppers, which are useful for the classification task. However, it raises an interesting question: Did the network ever learn the global structures (e.g. the whole pepper) or only the local discriminative parts? The high-frequency patterns as in Fig. 3b–e might also be a consequence of optimization in the image space. That is, when making pixel-wise changes, it is non-trivial to ensure global coherence across the entire image. Instead, it is easy to increase neural activations by simply creating more local discriminative features in the stimulus.
Previous attempts to improve the global coherence include:
Gradually paint the image by scaling it and alternately following the gradients from multiple output layers of the network.
While these methods somewhat improved the global coherence of images (Fig. 3) [54, 29], there is still a large realism gap between the real images and these visualizations (Fig. 3a vs. h).
Diversity. A neuron can be multifaceted in that it responds strongly to multiple distinct types of stimuli, i.e. facets. Indeed, higher-level features are more invariant to changes in the input [49, 19]. For example, a face-detecting unit in CaffeNet was found to respond to both human and lion faces. Therefore, we wish to uncover the different facets via AM in order to have a fuller understanding of a unit.
However, AM optimizations starting from different random images often converge to similar results [10, 29], a phenomenon also observed when training neural networks with different initializations. Researchers have proposed different techniques to improve image diversity, such as:
Drop out certain neural paths in the network when performing backpropagation to produce different facets .
Cluster the training set images into groups, and initialize from an average image computed from each group’s images .
Add noise to the image in every update to increase image diversity .
While achieving limited success, these methods also introduce extra hyperparameters and require further investigation. For example, if we enforce two stimuli to be different, exactly how far apart should they be, and in which similarity metric should the difference be measured?
Much previous AM research optimized the preferred stimuli directly in the high-dimensional image space, where pixel-wise changes are often slow and uncorrelated, yielding high-frequency visualizations (Fig. 3b–e).
Instead, Nguyen et al. proposed to optimize in the low-dimensional latent space of a deep generator network, an approach they call Deep Generator Network Activation Maximization (DGN-AM).
They train an image generator network $G$ to take in a highly compressed code $h$ and output a synthetic image $G(h)$ that looks as close to real images from the ImageNet dataset as possible.
To produce an AM image for a given neuron, the authors optimize in the input latent space of the generator so that it outputs an image that activates the unit of interest (Fig. 4).
Intuitively, DGN-AM restricts the search to only the set of images that can be drawn by the prior and encourages the image updates to be more coherent and correlated compared to pixel-wise changes (where each pixel is modified independently).
Generator networks. We denote the sub-network of CaffeNet that maps images onto 4096-D features as an encoder $E$. We train a generator network $G$ to invert $E$, i.e. $G(E(x)) \approx x$. In addition to the reconstruction losses, the generator was trained with the Generative Adversarial Network (GAN) loss to improve image realism. More training details are in [27, 9]. Intuitively, $G$ can be viewed as an artificial general “painter” that is capable of painting a variety of different types of images, given an arbitrary input description (i.e. a latent code or a condition vector). The idea is that $G$ would be able to faithfully portray what a target network has learned, whether the learned patterns are recognizable or unrecognizable to humans.
Optimizing in the latent space. Intuitively, we search in the input code space of the generator $G$ to find a code $h^*$ such that the image $G(h^*)$ maximizes the neural activation (see Fig. 4). The AM problem in Eq. 3 now becomes:

$$h^* = \operatorname*{arg\,max}_h \; \big( a^l_i(G(h)) - R(h) \big)$$
That is, we take steps in the latent space following the update rule:

$$h_{t+1} = h_t + \epsilon \, \frac{\partial}{\partial h_t} \big( a^l_i(G(h_t)) - R(h_t) \big)$$
Note that, here, the regularization term $R(h)$ is on the latent code $h$ instead of the image $x$. Nguyen et al. implemented a small amount of regularization and also clipped the code. These hand-designed regularizers can be replaced by a strong, learned prior for the code.
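To make the latent-space idea concrete, here is a small hypothetical NumPy sketch (not DGN-AM's actual networks): a toy "generator" maps a 10-D code through one tanh layer to a 64-pixel "image", and we ascend the activation of a linear readout unit by backpropagating through the generator. A small L2 pull and clipping on the code stand in for the hand-designed regularizers; these exact forms are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "generator" G: 10-D latent code -> 64-pixel image via one tanh layer.
W = 0.3 * rng.normal(size=(64, 10))
def G(h):
    return np.tanh(W @ h)

# Unit to visualize: a linear readout of the generated image.
v = rng.normal(size=64)
def activation(h):
    return float(v @ G(h))

def grad_h(h):
    # Chain rule through the generator:
    # d(v . tanh(W h))/dh = W^T (v * (1 - tanh(W h)^2))
    pre = W @ h
    return W.T @ (v * (1.0 - np.tanh(pre) ** 2))

h = rng.normal(size=10)  # random initial code
eps, lam = 0.05, 0.01
start = activation(h)
for _ in range(200):
    h = h + eps * (grad_h(h) - lam * h)  # ascend, with a small L2 pull on the code
    h = np.clip(h, -3.0, 3.0)            # keep the code within a "realistic" range

assert activation(h) > start
```

Because every pixel of $G(h)$ moves together when $h$ changes, the updates are correlated across the image, the property the text credits for DGN-AM's coherence.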
Optimizing in the latent space of a deep generator network yielded a great improvement in image quality compared to previous methods that optimize in the pixel space (Fig. 5; and Fig. 3b–h vs. Fig. 3i). However, images synthesized by DGN-AM have limited diversity: they are qualitatively similar to the real top-9 validation images that most highly activate a given unit (Fig. 6).
To improve the image diversity, Nguyen et al. harnessed a learned realism prior for $h$ via a denoising autoencoder (DAE), and added a small amount of Gaussian noise in every update step. In addition to improving image diversity, this AM procedure also has a theoretical probabilistic justification, which is discussed in Section 4.
In this section, we first make a note about the AM objective, then discuss a probabilistically interpretable formulation for AM, which was first proposed in Plug and Play Generative Networks (PPGNs), and finally interpret other AM methods under this framework. Intuitively, the AM process can be viewed as sampling from a generative model composed of (1) an image prior and (2) a recognition network that we want to visualize.
We start with a discussion of AM objectives. In the original AM formulation (Eq. 1), we only explicitly maximize the activation of a unit indexed $i$ in layer $l$; however, in practice, this objective may surprisingly also increase the activations of some other units in the same layer, even above that of unit $i$. For example, maximizing the output activation for the “hartebeest” class is likely to yield an image that also strongly activates the “impala” unit, because these two animals are visually similar. As a result, there is no guarantee that the target unit will be the most highly activated across the layer. In that case, the resultant visualization may not portray what is unique about the target unit.
Instead, we are interested in selective stimuli that highly activate only unit $i$, but not the others. That is, we wish to maximize $a^l_i$ such that it is the highest single activation across the layer $l$. To enforce that selectivity, we can maximize either the softmax or the log of the softmax of the raw activations across the layer [42, 26], where the softmax transformation for unit $i$ in layer $l$ is given as $s^l_i = \exp(a^l_i) / \sum_j \exp(a^l_j)$. Such selective stimuli (1) are more interpretable and preferred in neuroscience because they contain only visual features exclusive to the unit of interest; and (2) naturally fit into the probabilistic interpretation discussed below.
Let us assume a joint probability distribution $p(x, y)$, where $x$ denotes images and $y$ is a categorical variable for a given neuron indexed $i$ in layer $l$. This model can be decomposed into an image density model and an image classifier model:

$$p(x, y) = p(x) \, p(y \mid x)$$
Note that, when $l$ is the output layer of an ImageNet 1000-way classifier, $y$ also represents the image category (e.g. “volcano”), and $p(y \mid x)$ is the classification probability distribution (often modeled via a softmax).
We can construct a Metropolis-adjusted Langevin (MALA) sampler for our model $p(x, y)$. This variant of MALA does not have the accept/reject step, and uses the following transition operator. (We abuse notation slightly in the interest of space and denote $N(0, \epsilon_3^2)$ as a sample from that distribution. The first step size is written $\epsilon_{12}$ in anticipation of later splitting it into separate $\epsilon_1$ and $\epsilon_2$ terms.)
$$x_{t+1} = x_t + \epsilon_{12} \, \nabla \log p(x_t, y) + N(0, \epsilon_3^2)$$
Since $y$ is a categorical variable chosen to be a fixed neuron $y_c$ outside the sampler, the above update rule can be re-written as:
$$x_{t+1} = x_t + \epsilon_{12} \, \nabla \log p(y = y_c \mid x_t) + \epsilon_{12} \, \nabla \log p(x_t) + N(0, \epsilon_3^2)$$
Decoupling $\epsilon_{12}$ into explicit $\epsilon_1$ and $\epsilon_2$ multipliers, and expanding $\nabla$ into explicit partial derivatives, we arrive at the following update rule:
$$x_{t+1} = x_t + \epsilon_1 \frac{\partial \log p(y = y_c \mid x_t)}{\partial x_t} + \epsilon_2 \frac{\partial \log p(x_t)}{\partial x_t} + N(0, \epsilon_3^2)$$
An intuitive interpretation of the roles of these three terms is illustrated in Fig. 7 and described as follows:
$\epsilon_1$ term: take a step toward an image that causes the neuron $y_c$ to be the most highly activated across a layer (Fig. 7; red arrow).
$\epsilon_2$ term: take a step toward a generic, realistic-looking image (Fig. 7; blue arrow).
$\epsilon_3$ term: add a small amount of noise to jump around the search space and encourage image diversity (Fig. 7; green arrow).
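The three-term update can be demonstrated end-to-end with a toy model for which both gradients have closed forms. The NumPy sketch below is illustrative only (the class count, step sizes, and linear classifier are made-up assumptions): a standard-Gaussian prior gives $\partial \log p(x)/\partial x = -x$, and a linear softmax classifier gives the condition gradient analytically.

```python
import numpy as np

rng = np.random.default_rng(3)
W = 0.5 * rng.normal(size=(5, 20))  # toy linear classifier: 5 classes, 20-D "image"
y_c = 2                             # the fixed target neuron/class

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_log_softmax(x):
    # d log p(y = y_c | x) / dx for logits W x:  W[y_c] - sum_j p_j W[j]
    p = softmax(W @ x)
    return W[y_c] - p @ W

x = rng.normal(size=20)  # random initial "image"
e1, e2, e3 = 0.5, 0.05, 0.01
for _ in range(300):
    x = (x
         + e1 * grad_log_softmax(x)   # epsilon_1 term: toward the target class
         + e2 * (-x)                  # epsilon_2 term: grad log p(x) = -x under N(0, I)
         + e3 * rng.normal(size=20))  # epsilon_3 term: noise for diversity

# The sampler settles on inputs the classifier assigns to y_c far above chance (1/5).
assert softmax(W @ x)[y_c] > 0.2
```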
Maximizing raw activations vs. softmax. Note that the $\epsilon_1$ term in the update rule above is not the same as the gradient of the raw activation in Eq. 2. We summarize below three variants of computing this gradient term: (1) the derivative of the logits; (2) the derivative of the softmax; and (3) the derivative of the log of the softmax. Several previous works empirically reported that maximizing the raw, pre-softmax activations produces better visualizations than directly maximizing the softmax values (a vs. b below); however, this observation had not been fully justified. Nguyen et al. found the log-of-softmax gradient term (1) to work well empirically; and (2) to be theoretically justifiable under the probabilistic framework in Section 4.2.
a. Derivative of raw activations, $\partial a_i / \partial x$. Worked well in practice [27, 10] but may produce non-selective stimuli, and is not quite the right term under the probabilistic framework in Sec. 4.2.
b. Derivative of softmax, $\partial s_i / \partial x$. Previously avoided due to poor performance [42, 48], but the poor performance may have been due to ill-conditioned optimization rather than the inclusion of logits from other classes.
c. Derivative of log of softmax, $\partial \log s_i / \partial x$. The correct term under the sampler framework in Sec. 4.2. Well-behaved under optimization, perhaps because the $\partial a_i / \partial x$ term is left untouched by the softmax multiplier.
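For a linear layer, all three derivative variants have closed forms, so their relationship is easy to check numerically. The NumPy snippet below is illustrative (the shapes are made up): it verifies the analytic softmax derivative against a finite difference, and shows that the softmax derivative (b) equals the log-softmax derivative (c) scaled by the probability $s_i$, which can be vanishingly small, one plausible source of the ill-conditioning noted above.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(size=(5, 20))  # logits a = W x, so the raw gradient da_i/dx = W[i]
x = rng.normal(size=20)
i = 0

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

s = softmax(W @ x)
grad_raw = W[i]                         # (a) derivative of the raw activation a_i
grad_log_softmax = W[i] - s @ W         # (c) derivative of log s_i
grad_softmax = s[i] * grad_log_softmax  # (b) derivative of s_i, damped by s_i

# Verify (b) against a central finite difference of s_i itself.
def s_i(x):
    return softmax(W @ x)[i]

h = 1e-6
num = np.zeros(20)
for k in range(20):
    d = np.zeros(20)
    d[k] = h
    num[k] = (s_i(x + d) - s_i(x - d)) / (2 * h)
assert np.allclose(num, grad_softmax, atol=1e-5)

# (c) keeps the raw da_i/dx term intact and subtracts a probability-weighted
# mix of the other classes' gradients, so it differs from (a) in general.
assert not np.allclose(grad_log_softmax, grad_raw)
```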
We refer readers to the PPGN paper for a more complete derivation and discussion of the above MALA sampler. Using this update rule, we will next interpret other AM algorithms in the literature.
Here, we consider four representative approaches in light of the probabilistic framework:
Activation maximization with no priors. From the update rule above, if we set $(\epsilon_1, \epsilon_2, \epsilon_3) = (1, 0, 0)$, we obtain a sampler that follows the neuron gradient directly, without contributions from a $p(x)$ term or the addition of noise. In a high-dimensional space, this results in adversarial or rubbish images [44, 28] (as discussed in Sec. 2). We can also interpret the optimization procedure in [44, 28] as a sampler with a non-zero $\epsilon_2$ but with a $p(x)$ such that $\partial \log p(x) / \partial x = 0$, i.e. a uniform $p(x)$ where all images are equally likely.
Activation maximization with a Gaussian prior. To avoid producing high-frequency images that are uninterpretable, several works have used $L_2$ decay, which can be thought of as a simple zero-mean Gaussian prior over images [42, 48, 46]. From the update rule above, if we define a Gaussian $p(x)$ centered at the origin (assume the mean image has been subtracted) and set $\epsilon_3 = 0$, pulling the Gaussian constants into $\epsilon_2$, we obtain the following noiseless update rule:
$$x_{t+1} = (1 - \lambda) \, x_t + \epsilon_1 \frac{\partial \log p(y = y_c \mid x_t)}{\partial x_t}$$
The first term decays the current image slightly toward the origin, as is appropriate under a Gaussian image prior, and the second term pulls the image toward higher-probability regions for the chosen neuron. Here, the second term is computed as the derivative of the log of a softmax transformation of all activations across a layer (see the variants summarized above).
Activation maximization with hand-designed priors. In an effort to outdo the simple Gaussian prior, many works have proposed more creative, hand-designed image priors such as Gaussian blur, total variation, jitter, rotate, scale, and data-driven patch priors. These priors effectively serve as a simple $p(x)$ component in the update rule above. Note that all the methods considered under this category are noiseless ($\epsilon_3 = 0$).
Activation maximization in the latent space of generator networks To ameliorate the problem of poor mixing in the high-dimensional pixel space , several works instead performed optimization in a semantically meaningful, low-dimensional feature space of a generator network [27, 47, 6, 53, 26].
That approach can be viewed as re-parameterizing $x$ as $x = G(h)$ and sampling from the joint probability distribution $p(h, y)$ instead of $p(x, y)$, treating $x$ as a deterministic function of $h$. That is, the update rule above is now changed to the following:
$$h_{t+1} = h_t + \epsilon_1 \frac{\partial \log p(y = y_c \mid h_t)}{\partial h_t} + \epsilon_2 \frac{\partial \log p(h_t)}{\partial h_t} + N(0, \epsilon_3^2)$$
In this category, DGN-AM follows the above rule with $(\epsilon_1, \epsilon_2, \epsilon_3) = (1, 1, 0)$; $\epsilon_3 = 0$ because noise was not used in DGN-AM. Specifically, we hand-designed a $p(h)$ via clipping and $L_2$ regularization (i.e. a Gaussian prior) to keep the code within a “realistic” range. PPGNs follow exactly the update rule above, with a better $p(h)$ prior learned via a denoising autoencoder. PPGNs produce images with better diversity than DGN-AM.
In this section, we review how one may use activation maximization to understand and explain a pre-trained neural network.
The results below are specifically generated by DGN-AM  and PPGNs  where the authors harnessed a general image generator network to synthesize AM images.
Visualize output units for new tasks. We can harness a general learned ImageNet prior to synthesize images for networks trained on a different dataset, e.g. the MIT Places dataset or UCF-101 activity videos (Figs. 5 & 8).
Visualize hidden units
Instead of synthesizing preferred inputs for output neurons (Fig. 5), one may apply AM to the hidden units.
Compared with visualizing the real image regions that most highly activate a unit, we found that AM images may provide similar, but sometimes also complementary, evidence suggesting what a unit is for (Fig. 9).
For example, via DGN-AM, we found that a unit that detects “TV screens” also detects people on TV (Fig. 9).
Synthesize preferred images activating multiple neurons
First, one may synthesize images activating a group of units at the same time to study the interaction between them [27, 32].
For example, it might be useful to study how a network distinguishes two related and visually similar concepts such as “impala” and “hartebeest” animals in ImageNet .
One way to do this is to synthesize images that maximize the “impala” neuron’s activation but also minimize the “hartebeest” neuron’s activation.
Second, one may reveal different facets of a neuron  by activating different pairs of units.
That is, activating two units at the same time, e.g. (castle + candle) and (piano + candle), would produce two distinct images of candles that activate the same “candle” unit (Fig. 10).
In addition, this method sometimes also produces interesting, creative art [27, 12].
Watch feature evolution during training
We can watch how features evolve during the training of a target classifier network.
Example videos of AM visualizations for sample output and hidden neurons during the training of CaffeNet  are at: https://www.youtube.com/watch?v=q4yIwiYH6FQ and https://www.youtube.com/watch?v=G8AtatM1Sts.
One may find that features at lower layers tend to converge faster than those at higher layers.
To gain insights into the inner functions of an activity recognition network , one can synthesize a single frame (Fig. 8; right) or an entire preferred video.
By synthesizing videos, Nguyen et al.  found that a video recognition network (LRCN ) classifies videos without paying attention to temporal correlation across video frames.
That is, the AM videos666https://www.youtube.com/watch?v=IOYnIK6N5Bg appear to be a set of uncorrelated frames of activity e.g. a basketball game.
Further tests confirmed that the network produces similar top-1 predicted labels regardless of whether the frames of the original UCF-101 videos  are randomly shuffled.
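The shuffle diagnostic itself is easy to reproduce in miniature. Below is a hypothetical NumPy sketch (not the LRCN experiment): a toy "video classifier" that averages frame features before a linear readout is, by construction, invariant to frame order, which is exactly the kind of temporal insensitivity the shuffle test detects.

```python
import numpy as np

rng = np.random.default_rng(5)
frames = rng.normal(size=(16, 32))  # a "video": 16 frames of 32-D features
W = rng.normal(size=(101, 32))      # toy 101-way activity classifier (as in UCF-101)

def predict(video):
    # Averaging frame features discards temporal order entirely.
    return int(np.argmax(W @ video.mean(axis=0)))

shuffled = frames[rng.permutation(len(frames))]

# Same top-1 label with and without shuffling -> no temporal sensitivity.
assert predict(frames) == predict(shuffled)
```

A model that truly used temporal correlations (e.g. one that differenced consecutive frames) would generally fail this equality.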
Activation maximization as a debugging tool We discuss here a case study where AM can be used as a debugging tool. Suppose there is a bug in your neural network image classifier implementation that internally and unexpectedly converts all input RGB images (Fig. 10(a)) into BRG images (Fig. 10(b)) before feeding them to the neural network. This bug might be hard to notice by only examining accuracy scores or attribution heatmaps . Instead, AM visualizations could reflect the color space of the images that were fed to the neural network and reveal this bug (Fig. 10(c)).
Synthesize preferred images conditioned on a sentence Instead of synthesizing images preferred by output units in an image classifier, we can also synthesize images that cause an image captioning network to output a desired sentence (examples in Fig. 12).
This reverse-engineering process may uncover interesting insights into the system’s behaviors. For example, we discovered an interesting failure of a state-of-the-art image captioner  when it declares birds even when there is no bird in an image (Fig. 13).
Synthesize preferred images conditioned on a semantic segmentation map We can extend AM methods to synthesize images with more fine-grained controls of where objects are placed by matching a semantic map output of a segmentation network (Fig. 14) or a target spatial feature map of a convolutional layer.
Synthesize preferred stimuli for real, biological brains. While this survey focuses on visualizing artificial networks, it is also possible to harness AM techniques to study biological brains. Two teams of neuroscientists [22, 36] have recently been able to reconstruct stimuli for neurons in live macaque brains using either the ImageNet PPGN (as discussed in Sec. 4) or DGN-AM (as discussed in Sec. 3). The synthesized images surprisingly resemble the monkeys and human nurses that the subject macaque meets frequently, or show eyes in neurons previously shown to be tuned for detecting faces. Similar AM frameworks have also been applied to reconstruct stimuli from EEG or MRI signals of human brains [41, 34].
While activation maximization has proven a useful tool for understanding neural networks, there are still open challenges and opportunities such as:
One might wish to harness AM to compare and contrast the features learned by different models. That would require a robust, principled AM approach that produces faithful and interpretable visualizations of the learned features for networks trained on different datasets or of different architectures. This is challenging due to two problems: (1) the image prior may not be general enough and may have a bias toward a target network or one dataset over the others; (2) AM optimization on different network architectures, especially of different depths, often requires different hyper-parameter settings to obtain the best performance.
It is important for the community to propose rigorous approaches for evaluating AM methods. A powerful image prior incurs a higher risk of producing misleading visualizations: it can be unclear whether a synthesized visual feature comes from the image prior, from the target network being studied, or from both. Note that we investigated this and surprisingly found the DGN-AM prior able to generate a wide diversity of images, including non-realistic ones (e.g. blurry, cut-up, and BRG images).
One may also perform AM in the parameter space of a 3D renderer (e.g. modifying the lighting, object geometry or appearances in a 3D scene) that renders a 2D image that strongly activates a unit . AM in a 3D space allows us to synthesize stimuli by varying a controlled factor (e.g. lighting) and thus might offer deeper insights into a model’s inner-workings.
Activation maximization techniques enable us to shine light into black-box neural networks. As this survey shows, improving activation maximization techniques improves our ability to understand deep neural networks. We are excited about future improved techniques that make neural networks more interpretable and less opaque, so that we can better understand how deep neural networks do the amazing things that they do.
Anh Nguyen is supported by Amazon Research Credits, Auburn University, and donations from Adobe Systems Inc and Nvidia.
Alcorn, M.A., Li, Q., Gong, Z., Wang, C., Mai, L., Ku, W.S., Nguyen, A.: Strike (with) a pose: Neural networks are easily fooled by strange poses of familiar objects. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. vol. 1, p. 4. IEEE (2019)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. pp. 1097–1105 (2012)
Le, Q.V.: Building high-level features using large scale unsupervised learning. In: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. pp. 8595–8598. IEEE (2013)
Li, Y., Yosinski, J., Clune, J., Lipson, H., Hopcroft, J.: Convergent learning: Do different neural networks learn the same representations? In: Feature Extraction: Modern Questions and Challenges. pp. 196–212 (2015)
Nguyen, A., Yosinski, J., Clune, J.: Understanding innovation engines: Automated creativity and improved stochastic optimization via deep learning. Evolutionary Computation 24(3), 545–572 (2016)