Analytical tools for visualizing and understanding the neurons of a GAN
Generative Adversarial Networks (GANs) have recently achieved impressive results for many real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, visualization and understanding of GANs is largely missing. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level. We first identify a group of interpretable units that are closely related to object concepts with a segmentation-based network dissection method. Then, we quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output. Finally, we examine the contextual relationship between these units and their surroundings by inserting the discovered object concepts into new images. We show several practical applications enabled by our framework, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in the scene. We provide open source interpretation tools to help peer researchers and practitioners better understand their GAN models.
Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have been able to produce photorealistic images, often indistinguishable from real images. This remarkable ability has powered many real-world applications ranging from visual recognition (Wang et al., 2017), to image manipulation (Isola et al., 2017; Zhu et al., 2017), to video prediction (Mathieu et al., 2016). Since its invention in 2014, many GAN variants have been proposed (Radford et al., 2016; Zhang et al., 2018), often producing more realistic and diverse samples with better training stability.
Despite this tremendous success, many questions remain to be answered. For example, to produce a church image (Figure 1a), what knowledge does a GAN need to learn? Alternatively, when a GAN sometimes produces terribly unrealistic images (Figure 1f), what causes the mistakes? Why does one GAN variant work better than another? What fundamental differences are encoded in their weights?
In this work, we study the internal representations of GANs. To a human observer, a well-trained GAN appears to have learned facts about the objects in the image: for example, a door can appear on a building but not on a tree. We wish to understand how a GAN represents such a structure. Do the objects emerge as pure pixel patterns without any explicit representation of objects such as doors and trees, or does the GAN contain internal variables that correspond to the objects that humans perceive? If the GAN does contain variables for doors and trees, do those variables cause the generation of those objects, or do they merely correlate? How are relationships between objects represented?
We present a general method for visualizing and understanding GANs at different levels of abstraction, from each neuron, to each object, to the contextual relationship between different objects. We first identify a group of interpretable units that are related to object concepts (Figure 1b). These units’ featuremaps closely match the semantic segmentation of a particular object class (e.g., trees). Second, we directly intervene within the network to identify sets of units that cause a type of object to disappear (Figure 1c) or appear (Figure 1d). We quantify the causal effect of these units using a standard causality metric. Finally, we examine the contextual relationship between these causal object units and the background. We study where we can insert the object concepts in new images and how this intervention interacts with other objects in the image (Figure 1d). To our knowledge, our work provides the first systematic analysis for understanding the internal representations of GANs.
Finally, we show several practical applications enabled by this analytic framework, from comparing internal representations across different layers, GAN variants and datasets; to debugging and improving GANs by locating and ablating “artifact” units (Figure 1e); to understanding contextual relationships between objects in scenes; to manipulating images with interactive object-level control.
The quality and diversity of results from GANs (Goodfellow et al., 2014) have continued to improve, from generating simple digits and faces (Goodfellow et al., 2014), to synthesizing natural scene images (Radford et al., 2016; Denton et al., 2015), to generating 1k photorealistic portraits (Karras et al., 2018), to producing one thousand object classes (Miyato et al., 2018; Zhang et al., 2018). In addition to image generation, GANs have also enabled many applications such as visual recognition (Wang et al., 2017; Hoffman et al., 2018), image manipulation (Isola et al., 2017; Zhu et al., 2017), and video generation (Mathieu et al., 2016; Wang et al., 2018). Despite the huge success, little work has been done to visualize what GANs have learned. Prior work (Radford et al., 2016; Zhu et al., 2016) manipulates latent vectors and observes how the results change accordingly.
Visualizing deep neural networks. Various methods have been developed to understand the internal representations of networks, such as visualizations for CNNs (Zeiler & Fergus, 2014) and RNNs (Karpathy et al., 2016; Strobelt et al., 2018). We can visualize a CNN by locating and reconstructing salient image features (Simonyan et al., 2014; Mahendran & Vedaldi, 2015) or by mining patches that maximize hidden layers’ activations (Zeiler & Fergus, 2014), or we can synthesize input images to invert a feature layer (Dosovitskiy & Brox, 2016). Alternatively, we can identify the semantics of each unit (Zhou et al., 2015; Bau et al., 2017; Zhou et al., 2018a) by measuring agreement between unit activations and object segmentation masks. Visualization of an RNN has also revealed interpretable units that track long-range dependencies (Karpathy et al., 2016). Most previous work on network visualization has focused on networks trained for classification; our work explores deep generative models trained for image generation.
Explaining the decisions of deep neural networks. We can explain individual network decisions using informative heatmaps (Zhou et al., 2018b, 2016; Selvaraju et al., 2017) or modified back-propagation (Simonyan et al., 2014; Bach et al., 2015; Sundararajan et al., 2017). The heatmaps highlight which regions contribute most to the categorical prediction given by the networks. Recent work has also studied the contribution of feature vectors (Kim et al., 2017; Zhou et al., 2018b) or individual channels (Olah et al., 2018) to the final prediction. Morcos et al. (2018) have examined the effect of individual units by ablating them. Those methods explain discriminative classifiers. Our method aims to explain how an image can be generated by a network, which is much less explored.
Our goal is to analyze how objects such as trees are encoded by the internal representations of a GAN generator $G: z \to x$. Here $z$ denotes a latent vector sampled from a low-dimensional distribution, and $x$ denotes a generated image. We use representation to describe the tensor $r$ output from a particular layer of the generator $G$, where the generator creates an image $x$ from random $z$ through a composition of layers: $r = h(z)$ and $x = f(r) = f(h(z)) = G(z)$.
Since $r$ has all the data necessary to produce the image $x = f(r)$, $r$ certainly contains the information to deduce the presence of any visible class $c$ in the image. Therefore the question we ask is not whether information about $c$ is present in $r$ (it is) but how such information is encoded in $r$. In particular, for any class from a universe of concepts $c \in \mathbb{C}$, we seek to understand whether $r$ explicitly represents $c$ in some way where it is possible to factor $r$ at locations P into two components
$$r_{\mathbb{U},P} = (r_{U,P},\; r_{\overline{U},P}), \tag{1}$$
where the generation of the object $c$ at locations P depends mainly on the units $r_{U,P}$, and is insensitive to the other units $r_{\overline{U},P}$. Here we refer to each channel of the featuremap as a unit: U denotes the set of unit indices of interest and $\overline{U}$ is its complement; we will write $\mathbb{U}$ and $\mathbb{P}$ to refer to the entire set of units and featuremap pixels in $r$. We study the structure of $r$ in two phases:
Dissection: starting with a large dictionary of object classes, we identify the classes that have an explicit representation in $r$ by measuring the agreement between individual units of $r$ and every class $c$ (Figure 1b).
Intervention: for the represented classes identified through dissection, we identify causal sets of units and measure causal effects between units and object classes by forcing sets of units on and off (Figure 1c,d).
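We first focus on individual units of the representation. Recall that $r_{u,\mathbb{P}}$ is the one-channel $h \times w$ featuremap of unit $u$ in a convolutional generator, where $h \times w$ is typically smaller than the image size. We want to know if a specific unit $r_{u,\mathbb{P}}$ encodes a semantic class such as a “tree”. For image classification networks, Bau et al. (2017) have observed that many units can approximately locate emergent object classes when the units are upsampled and thresholded. In that spirit, we select a universe of concepts $c \in \mathbb{C}$ for which we have a semantic segmentation $s_c(x)$ for each class. Then we quantify the spatial agreement between the unit $u$’s thresholded featuremap and a concept $c$’s segmentation with the following intersection-over-union (IoU) measure:
$$\mathrm{IoU}_{u,c} \equiv \frac{\mathbb{E}_z \left| (r^{\uparrow}_{u,\mathbb{P}} > t_{u,c}) \wedge s_c(x) \right|}{\mathbb{E}_z \left| (r^{\uparrow}_{u,\mathbb{P}} > t_{u,c}) \vee s_c(x) \right|}, \quad \text{where } t_{u,c} = \arg\max_t \frac{I(r^{\uparrow}_{u,\mathbb{P}} > t;\; s_c(x))}{H(r^{\uparrow}_{u,\mathbb{P}} > t,\; s_c(x))}, \tag{2}$$
where $\wedge$ and $\vee$ denote intersection and union operations, and $x = G(z)$ denotes the image generated from $z$. The one-channel feature map $r_{u,\mathbb{P}}$ slices the entire featuremap $r = h(z)$ at unit $u$. As shown in Figure 2a, we upsample $r_{u,\mathbb{P}}$ to the output image resolution as $r^{\uparrow}_{u,\mathbb{P}}$. $(r^{\uparrow}_{u,\mathbb{P}} > t_{u,c})$ produces a binary mask by thresholding $r^{\uparrow}_{u,\mathbb{P}}$ at a fixed level $t_{u,c}$. $s_c(x)$ is a binary mask where each pixel indicates the presence of class $c$ in the generated image $x$. The threshold $t_{u,c}$ is chosen to be as informative as possible by maximizing the information quality ratio $I/H$ (using a separate validation set), that is, it maximizes the portion of the joint entropy H which is mutual information I (Wijaya et al., 2017).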
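The IoU measure can be illustrated with a minimal NumPy sketch. It assumes nearest-neighbour upsampling and a threshold that has already been chosen; the function names `upsample_nearest` and `unit_concept_iou` are hypothetical and not part of the released tools. Intersections and unions are summed over all images before dividing, which mirrors taking the expectation over $z$ in the numerator and denominator separately.

```python
import numpy as np

def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour upsampling of an (h, w) featuremap to (out_h, out_w)."""
    h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fmap[rows[:, None], cols[None, :]]

def unit_concept_iou(featuremaps, seg_masks, threshold):
    """IoU between one unit's thresholded featuremaps and one concept's masks.

    featuremaps: (N, h, w) activations of a single unit over N generated images.
    seg_masks:   (N, H, W) boolean segmentation masks for a single concept.
    threshold:   scalar activation threshold (assumed to be chosen beforehand).
    """
    _, out_h, out_w = seg_masks.shape
    intersection = 0
    union = 0
    for fmap, mask in zip(featuremaps, seg_masks):
        unit_mask = upsample_nearest(fmap, out_h, out_w) > threshold
        intersection += np.logical_and(unit_mask, mask).sum()
        union += np.logical_or(unit_mask, mask).sum()
    return intersection / union if union > 0 else 0.0

# Toy example: a 2x2 featuremap firing in its top-left cell, and a 4x4 mask
# covering the top half of the image.
toy_fmap = np.array([[[1.0, 0.0], [0.0, 0.0]]])
toy_mask = np.zeros((1, 4, 4), dtype=bool)
toy_mask[0, :2, :] = True
toy_iou = unit_concept_iou(toy_fmap, toy_mask, threshold=0.5)
```

In the toy example the unit covers 4 of the 8 concept pixels and nothing else, so the intersection is 4 and the union is 8.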
We can use $\mathrm{IoU}_{u,c}$ to rank the concepts related to each unit and label each unit with the concept that matches it best. Figure 3 shows examples of interpretable units with high $\mathrm{IoU}_{u,c}$. They are not the only units to match tables and sofas: layer3 of the dining room generator has further units that match tables and table parts, and layer4 of the living room generator has further sofa units.
Once we have identified an object class that a set of units match closely, we next ask: which units are responsible for triggering the rendering of that object? A unit that correlates highly with an output object might not actually cause that output. Furthermore, any output will jointly depend on several parts of the representation. We need a way to identify combinations of units that cause an object.
To answer the above question about causality, we probe the network using interventions: we test whether a set of units U in $r$ causes the generation of $c$ by forcing the units of U on and off.
Recall that $r_{U,P}$ denotes the featuremap $r$ at units U and locations P. We ablate those units by forcing $r_{U,P} = 0$. Similarly, we insert those units by forcing $r_{U,P} = k$, where $k$ is a per-class constant, as described in Section S-6.4. We decompose the featuremap $r$ into two parts $(r_{U,P}, r_{\overline{U,P}})$, where $r_{\overline{U,P}}$ are unforced components of $r$:
$$\begin{aligned} \text{Original image:}\quad & x = G(z) \equiv f(r) \equiv f(r_{U,P},\, r_{\overline{U,P}}) \\ \text{Image with U ablated at pixels P:}\quad & x_a = f(0,\, r_{\overline{U,P}}) \\ \text{Image with U inserted at pixels P:}\quad & x_i = f(k,\, r_{\overline{U,P}}) \end{aligned} \tag{3}$$
An object is caused by U if the object appears in $x_i$ and disappears from $x_a$. Figure 1c demonstrates the ablation of units that remove trees, and Figure 1d demonstrates insertion of units at specific locations to make trees appear. This causality can be quantified by comparing the presence of trees in $x_i$ and $x_a$ and averaging effects over all locations and images. Following prior work (Holland, 1988; Pearl, 2009), we define the average causal effect (ACE) of units U on the generation of class $c$ as:
$$\delta_{U \to c} \equiv \mathbb{E}_{z,P}[s_c(x_i)] - \mathbb{E}_{z,P}[s_c(x_a)], \tag{4}$$
where $s_c(x)$ denotes a segmentation indicating the presence of class $c$ in the image $x$ at P. To permit comparisons of $\delta_{U \to c}$ between classes $c$ which are rare, we normalize our segmentation $s_c$ by $\mathbb{E}_{z,P}[s_c(x)]$. While these measures can be applied to a single unit, we have found that objects tend to depend on more than one unit. Thus we need to identify a set of units U that maximize the average causal effect $\delta_{U \to c}$ for an object class $c$.
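The ablation and insertion interventions and the resulting average causal effect can be sketched as follows. This is only an illustration under strong assumptions: the downstream layers and the segmenter are replaced by toy callables (`toy_render`, `toy_segment`), the intervention is applied at every location rather than at sampled locations P, and the normalization for rare classes is omitted. All names are hypothetical.

```python
import numpy as np

def average_causal_effect(r, units, k, render, segment):
    """Estimate the ACE of a unit set on one class, intervening at all locations.

    r:       (N, d, h, w) featuremaps for N sampled latents.
    units:   list of unit indices to force off and on.
    k:       (d,) per-class insertion constants.
    render:  callable standing in for the downstream layers f.
    segment: callable standing in for s_c, returning class presence per image.
    """
    r_ablate = r.copy()
    r_ablate[:, units] = 0.0                             # force units off
    r_insert = r.copy()
    r_insert[:, units] = k[units][None, :, None, None]   # force units to k
    present_inserted = segment(render(r_insert)).mean()
    present_ablated = segment(render(r_ablate)).mean()
    return present_inserted - present_ablated

# Toy stand-ins: only unit 0 influences the "image", and the class is
# "present" wherever the rendered value exceeds 0.5.
toy_weights = np.array([1.0, 0.0, 0.0])

def toy_render(fm):
    return (fm * toy_weights[None, :, None, None]).sum(axis=1)

def toy_segment(img):
    return (img > 0.5).mean(axis=(1, 2))

rng = np.random.default_rng(0)
toy_r = rng.uniform(0.0, 0.4, size=(5, 3, 4, 4))
toy_k = np.ones(3)
ace_unit0 = average_causal_effect(toy_r, [0], toy_k, toy_render, toy_segment)
ace_unit1 = average_causal_effect(toy_r, [1], toy_k, toy_render, toy_segment)
```

In this toy setting unit 0 has the maximal effect, since forcing it on fills the image with the class and forcing it off removes it, while unit 1 has no effect at all.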
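Given a representation $r$ with $d$ units, exhaustively searching for a fixed-size set U with high $\delta_{U \to c}$ is prohibitive as it has $\binom{d}{|U|}$ subsets. Instead, we optimize a continuous intervention $\alpha \in [0,1]^d$, where each dimension $\alpha_u$ indicates the degree of intervention for a unit $u$. We maximize the following average causal effect formulation $\delta_{\alpha \to c}$:
$$\begin{aligned} \text{Image with partial ablation at pixels P:}\quad & x'_a = f\big((1-\alpha) \odot r_{\mathbb{U},P},\; r_{\mathbb{U},\overline{P}}\big) \\ \text{Image with partial insertion at pixels P:}\quad & x'_i = f\big(\alpha \odot k + (1-\alpha) \odot r_{\mathbb{U},P},\; r_{\mathbb{U},\overline{P}}\big) \\ \text{Objective:}\quad & \delta_{\alpha \to c} = \mathbb{E}_{z,P}[s_c(x'_i)] - \mathbb{E}_{z,P}[s_c(x'_a)], \end{aligned} \tag{5}$$
where $r_{\mathbb{U},P}$ denotes the all-channel featuremap at locations P, $r_{\mathbb{U},\overline{P}}$ denotes the all-channel featuremap at other locations $\overline{P}$, and $\odot$ applies a per-channel scaling vector $\alpha$ to the featuremap $r_{\mathbb{U},P}$. We optimize $\alpha$ over the following loss with an L2 regularization:
$$\alpha^* = \arg\min_{\alpha} \big( -\delta_{\alpha \to c} + \lambda \|\alpha\|_2 \big), \tag{6}$$
where $\lambda$ controls the relative importance of each term. We add the L2 loss as we seek a minimal set of causal units. We optimize using stochastic gradient descent, sampling over both $z$ and featuremap locations P and clamping the coefficient $\alpha$ within the range $[0,1]^d$ at each step ($d$ is the total number of units). More details of this optimization are discussed in Section S-6.4. Finally, we can rank units by $\alpha^*_u$ and achieve a stronger causal effect (i.e., removing trees) when ablating successively larger sets of tree-causing units as shown in Figure 4.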
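The clamped stochastic gradient descent over a continuous intervention vector can be sketched on a toy problem. Here the downstream network and segmenter are replaced by a single logistic function of the features at one location, so the gradient can be written by hand; the weights, the regularization strength, the learning rate, and the uniform feature distribution are all assumptions chosen only to make the example run, not values from the paper.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def optimize_alpha(w, b, k, lam=0.05, lr=0.5, steps=500, batch=64, seed=0):
    """Projected SGD on -delta + lam * ||alpha||_2 for a toy logistic 'segmenter'.

    w, b: weights and bias of the toy class-presence model (hypothetical).
    k:    (d,) insertion constants.
    """
    rng = np.random.default_rng(seed)
    d = w.shape[0]
    alpha = np.full(d, 0.5)
    for _ in range(steps):
        r = rng.uniform(0.0, 1.0, size=(batch, d))   # toy features at sampled locations
        v_ins = alpha * k + (1.0 - alpha) * r        # partial insertion
        v_abl = (1.0 - alpha) * r                    # partial ablation
        s_ins = sigmoid(v_ins @ w + b)
        s_abl = sigmoid(v_abl @ w + b)
        # Analytic gradient of delta = E[s_ins] - E[s_abl] with respect to alpha.
        g_ins = (s_ins * (1.0 - s_ins))[:, None] * w * (k - r)
        g_abl = (s_abl * (1.0 - s_abl))[:, None] * w * (-r)
        grad_delta = (g_ins - g_abl).mean(axis=0)
        grad_reg = lam * alpha / (np.linalg.norm(alpha) + 1e-12)
        # Gradient step on the loss, then clamp back into [0, 1]^d.
        alpha = np.clip(alpha - lr * (-grad_delta + grad_reg), 0.0, 1.0)
    return alpha

# Toy setup: only the first two of eight units influence the class.
toy_w = np.array([3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
toy_alpha = optimize_alpha(toy_w, b=-3.0, k=np.full(8, 2.0))
ranked_units = np.argsort(-toy_alpha)
```

In this toy setup the regularizer drives the coefficients of the six irrelevant units toward zero, while the two units that influence the class are pushed to the upper clamp, so ranking by the optimized coefficients recovers the causal units.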
We study three variants of Progressive GANs (Karras et al., 2018) trained on LSUN scene datasets (Yu et al., 2015). To segment the generated images, we use a recent model (Xiao et al., 2018) trained on the ADE20K scene dataset (Zhou et al., 2017). The model can segment the input image into object classes, parts of large objects, and materials. To further identify units that specialize in object parts, we expand each object class into additional object part classes c-t, c-b, c-l, and c-r, which denote the top, bottom, left, or right half of the bounding box of a connected component.
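The expansion of an object class into part classes can be sketched as below. The sketch assumes the input is the boolean mask of a single, non-empty connected component (finding the components is left out), and the function name `part_masks` is hypothetical.

```python
import numpy as np

def part_masks(component_mask):
    """Split one connected component into the top, bottom, left, and right
    halves of its bounding box. Expects a non-empty boolean (H, W) mask."""
    rows = np.flatnonzero(component_mask.any(axis=1))
    cols = np.flatnonzero(component_mask.any(axis=0))
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    mid_row = (top + bottom) // 2
    mid_col = (left + right) // 2

    def keep(row_slice, col_slice):
        # Copy only the pixels of the component that fall inside the half-box.
        out = np.zeros_like(component_mask)
        out[row_slice, col_slice] = component_mask[row_slice, col_slice]
        return out

    return {
        "t": keep(slice(top, mid_row), slice(left, right)),
        "b": keep(slice(mid_row, bottom), slice(left, right)),
        "l": keep(slice(top, bottom), slice(left, mid_col)),
        "r": keep(slice(top, bottom), slice(mid_col, right)),
    }

# Toy example: a 4x4 block inside a 6x6 image.
toy_component = np.zeros((6, 6), dtype=bool)
toy_component[1:5, 2:6] = True
toy_parts = part_masks(toy_component)
```

Each half is then treated as the segmentation of its own class (for example c-t for the top half), so a unit that fires only on the upper part of an object can still obtain a high IoU.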
Below, we use dissection for analyzing and comparing units across datasets, layers, and models (Section 4.1), and locating artifact units (Section 4.2). Then, we start with a set of dominant object classes and use intervention to locate causal units that can remove and insert objects in different images (Section 4.3 and 4.4). In addition, our video demonstrates our interactive tool.
We are particularly interested in any units that are correlated with instances of an object class with diverse visual appearances; these would suggest that GANs generate those objects using similar abstractions as humans. Figure 3 illustrates two such units. In the dining room dataset, a unit emerges to match dining table regions. More interestingly, the matched tables have different colors, materials, geometry, viewpoints, and levels of clutter: the only obvious commonality among these regions is the concept of a table. This unit’s featuremap correlates with the output of the fully supervised segmentation model (Xiao et al., 2018) with a high IoU.
The set of all object classes matched by the units of a GAN provides a map of what a GAN has learned about the data. Figure 5 examines units from GANs trained on four LSUN scene categories (Yu et al., 2015). The units that emerge are object classes appropriate to the scene type: for example, when we examine a GAN trained on kitchen scenes, we find units that match stoves, cabinets, and the legs of tall kitchen stools. Another striking phenomenon is that many units represent parts of objects: for example, the conference room GAN contains separate units for the body and head of a person.
In classifier networks, the type of information explicitly represented changes from layer to layer (Zeiler & Fergus, 2014). We find a similar phenomenon in a GAN. Figure 6 compares early, middle, and late internal convolutional layers of a Progressive GAN. The output of the first convolutional layer, one step away from the input $z$, remains entangled: individual units do not correlate well with any object classes except for two units that are biased towards the ceiling of the room. Mid-level layers have many units that match semantic objects and object parts. Units in the later layers match local pixel patterns such as materials, edges and colors. All layers are shown in Section S-6.7.
Interpretable units can provide insights about how GAN architecture choices affect the structures learned inside a GAN. Figure 7 compares three models from Karras et al. (2018): a baseline Progressive GAN, a modification that introduces minibatch stddev statistics, and a further modification that adds pixelwise normalization. By examining unit semantics, we confirm that providing minibatch stddev statistics to the discriminator increases not only the realism of results, but also the diversity of concepts represented by units: the number of types of objects, parts, and materials matching units increases. The pixelwise normalization also increases the number of units that match semantic classes.
Table 1.
|Method|Fréchet Inception Distance (FID)|
|---|---|
|“artifacts” units ablated (ours)|27.14|
|random units ablated|43.17|

|Method|Human preference score (vs. original images)|
|---|---|
|“artifacts” units ablated (ours)|72.4%|
|random units ablated|49.9%|
While our framework can reveal how GANs succeed in producing realistic images, it can also analyze the causes of failures in their results. Figure 8a shows several annotated units that are responsible for typical artifacts consistently appearing across different images. We can identify these units efficiently by human annotation: out of a sample of 1000 images, we visualize the top ten highest activating images for each unit, and we manually identify units with noticeable artifacts in this set. Locating the artifact-causing units among the units in layer4 typically takes minutes.
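Selecting the top-activating images for each unit, which are then shown to a human annotator, can be sketched in a few lines. The sketch assumes each image has already been reduced to one activation score per unit (for example the peak of the featuremap); that reduction and the function name are assumptions.

```python
import numpy as np

def top_activating_images(activations, k=10):
    """Return, for every unit, the indices of the k images that activate it most.

    activations: (num_images, num_units) score of each unit on each image.
    Returns an array of shape (num_units, k), highest activation first.
    """
    order = np.argsort(-activations, axis=0)   # sort images per unit, descending
    return order[:k].T

# Toy example with three images and two units.
toy_scores = np.array([[0.1, 5.0],
                       [0.9, 1.0],
                       [0.5, 3.0]])
toy_top = top_activating_images(toy_scores, k=2)
```

For the toy scores, unit 0 is most active on images 1 and 2, and unit 1 on images 0 and 2.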
More importantly, we can fix these errors by ablating the above artifact-causing units. Figure 8b shows that artifacts are successfully removed, and the artifact-free pixels stay the same, improving the generated results. In Table 1 we report two standard metrics, comparing our improved images to both the original artifact images and a simple baseline that ablates randomly chosen units. First, we compute the widely used Fréchet Inception Distance (Heusel et al., 2017) between the generated images and real images. We use real images and generate images with high activations on these units. Second, we score images per method on Amazon MTurk, collecting human annotations regarding whether the modified image looks more realistic compared to the original. Both metrics show significant improvements. Strikingly, this simple manual change to a network beats state-of-the-art GAN models. The manual identification of “artifact” units can be approximated by an automatic scoring of the realism of each unit, as detailed in Section S-6.1.
Errors are not the only type of output that can be affected by directly intervening in a GAN. A variety of specific object types can also be removed from GAN output by ablating a set of units in a GAN. In Figure 9 we apply the method in Section 3.2 to identify sets of units that have causal effects on common object classes in conference room scenes. We find that, by turning off these small sets of units, most of the output of people, curtains, and windows can be removed from the generated scenes. However, not every object can be erased: tables and chairs cannot be removed. Ablating those units will reduce the size and density of these objects, but will rarely eliminate them.
The ease of object removal depends on the scene type. Figure 10 shows that, while windows can be removed well from conference rooms, they are more difficult to remove from other scenes. In particular, windows are just as difficult to remove from a bedroom as tables and chairs from a conference room. We hypothesize that the difficulty of removal reflects the level of choice that a GAN has learned for a concept: a conference room is defined by the presence of chairs, so they cannot be altered. And modern building codes mandate that all bedrooms must have windows; the GAN seems to have caught on to that pattern.
We can also learn about the operation of a GAN by forcing units on and inserting these features into specific locations in scenes. Figure 11 shows the effect of inserting layer4 causal door units in church scenes. In this experiment, we insert these units by setting their activation to the fixed mean value for doors (further details in Section S-6.4). Although this intervention is the same in each case, the effects vary widely depending on the objects’ surrounding context. For example, the doors added to the five buildings in Figure 11 appear with a diversity of visual attributes, each with an orientation, size, material, and style that matches the building.
We also observe that doors cannot be added in most locations. The locations where a door can be added are highlighted by a yellow box. The bar chart in Figure 11 shows average causal effects of insertions of door units, conditioned on the background object class at the location of the intervention. We find that the GAN allows doors to be added in buildings, particularly in plausible locations such as where a window is present, or where bricks are present. Conversely, it is not possible to trigger a door in the sky or on trees. Interventions provide insight on how a GAN enforces relationships between objects. Even if we try to add a door in layer4, that choice can be vetoed later if the object is not appropriate for the context. These downstream effects are further explored in Section S-6.5.
By carefully examining representation units, we have found that many parts of GAN representations can be interpreted, not only as signals that correlate with object concepts but as variables that have a causal effect on the synthesis of objects in the output. These interpretable effects can be used to compare, debug, modify, and reason about a GAN model. Our method can be potentially applied to other generative models such as VAEs (Kingma & Welling, 2014) and RealNVP (Dinh et al., 2017).
We have focused on the generator rather than the discriminator (as Radford et al. (2016) did) because the generator must represent all the information necessary to approximate the target distribution, while the discriminator only learns to capture the difference between real and fake images. Alternatively, we can train an encoder to invert the generator (Donahue et al., 2017; Dumoulin et al., 2017). However, this incurs additional complexity and errors. Many GANs also do not have an encoder.
Our method is not designed to compare the quality of GANs to one another, and it is not intended as a replacement for well-studied GAN metrics such as FID, which estimate realism by measuring the distance between the generated distribution of images and the true distribution (Borji (2018) surveys these methods). Instead, our goal has been to identify the interpretable structure and provide a window into the internal mechanisms of a GAN.
Prior visualization methods (Zeiler & Fergus, 2014; Bau et al., 2017; Karpathy et al., 2016) have brought new insights into CNN and RNN research. Motivated by that, in this work we have taken a small step towards understanding the internal representations of a GAN, and we have uncovered many questions that we cannot yet answer with the current method. For example: why can a door not be inserted in the sky? How does the GAN suppress the signal in the later layers? Further work will be needed to understand the relationships between layers of a GAN. Nevertheless, we hope that our work can help researchers and practitioners better analyze and develop their own GANs.
We thank Zhoutong Zhang, Guha Balakrishnan, Didac Suris, Adrià Recasens, and Zhuang Liu for valuable discussions. We are grateful for the support of the MIT-IBM Watson AI Lab, the DARPA XAI program FA8750-18-C000, NSF 1524817 on Advancing Visual Recognition with Feature Visualizations, NSF BIGDATA 1447476, and a hardware donation from NVIDIA.
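Strobelt, H., Gehrmann, S., Pfister, H., and Rush, A. M. LSTMVis: A tool for visual analysis of hidden state dynamics in recurrent neural networks. IEEE TVCG, 24(1):667–676, Jan 2018.
Xiao, T., Liu, Y., Zhou, B., Jiang, Y., and Sun, J. Unified perceptual parsing for scene understanding. In ECCV, 2018.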
In Section 4.2, we have improved GANs by manually identifying and ablating artifact-causing units. Now we describe an automatic procedure to identify artifact units using unit-specific FID scores.
To compute the FID score (Heusel et al., 2017) for a unit $u$, we generate images and select the images that maximize the activation of unit $u$; this subset of images is compared to the true distribution of real images using FID. Although every such unit-maximizing subset of images represents a skewed distribution, we find that the per-unit FID scores fall in a wide range, with most units scoring well in FID while a few units stand out with bad FID scores: many of them were also manually flagged by humans, as they tend to activate on images with clearly visible artifacts.
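For reference, the Fréchet distance underlying FID compares two Gaussians fitted to feature vectors: $\|\mu_1 - \mu_2\|^2 + \mathrm{Tr}(\Sigma_1 + \Sigma_2 - 2(\Sigma_1 \Sigma_2)^{1/2})$. The sketch below computes this quantity with NumPy only; it assumes the feature vectors (Inception activations in the standard metric) have been extracted elsewhere, and the function names are hypothetical.

```python
import numpy as np

def gaussian_stats(features):
    """Mean and covariance of an (N, D) array of feature vectors."""
    return features.mean(axis=0), np.cov(features, rowvar=False)

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians with the given statistics."""
    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) equals the sum of the square roots of the eigenvalues
    # of S1 S2, which are real and non-negative for covariance matrices;
    # small negative or imaginary parts from round-off are discarded.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)

# Toy checks with identity covariances in three dimensions.
eye = np.eye(3)
same = frechet_distance(np.zeros(3), eye, np.zeros(3), eye)
shifted = frechet_distance(np.zeros(3), eye, np.array([1.0, 2.0, 2.0]), eye)
```

A per-unit score would apply `gaussian_stats` to the features of the images that most activate a unit and compare them with the statistics of real images; with equal covariances the distance reduces to the squared distance between the means.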
Figure 12 shows the performance of FID scores as a predictor of manually flagged artifact units. The per-unit FID scores can achieve 50% precision and 50% recall. That is, of the 20 worst-FID units, 10 are also among the 20 units manually judged to have the most noticeable artifacts. Furthermore, repairing the model by ablating the highest-FID units works: qualitative results are shown in Figure 13 and quantitative results are shown in Table 2.
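The precision and recall figures follow directly from the overlap of the two unit sets. The unit indices below are hypothetical and chosen only so that the counts match the text: 20 worst-FID units, 20 manually flagged units, and 10 in common.

```python
def precision_recall(predicted, flagged):
    """Precision and recall of a predicted unit set against a flagged set."""
    predicted, flagged = set(predicted), set(flagged)
    hits = len(predicted & flagged)
    return hits / len(predicted), hits / len(flagged)

# Hypothetical indices: the two sets of 20 units share exactly 10 members.
worst_fid_units = range(0, 20)
manual_units = range(10, 30)
precision, recall = precision_recall(worst_fid_units, manual_units)
union_size = len(set(worst_fid_units) | set(manual_units))
```

With 10 shared units, both precision and recall are 10/20, and the union of the two sets contains 20 + 20 - 10 = 30 units, consistent with the 30 units ablated in the union row of Table 2.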
Table 2.
|Method|Fréchet Inception Distance (FID)|
|---|---|
|manually chosen “artifact” units ablated (as in Section 4.2)|27.14|
|highest-20 FID units ablated|27.6|
|union of manual and highest FID (30 total) units ablated|26.1|
|random units ablated|43.17|
As a sanity check, we evaluate the gap between human labeling of object concepts correlated with units and our automatic segmentation-based labeling, for one model, as follows.
For each of 512 units of layer4 of a “living room” Progressive GAN, 5 to 9 human annotations were collected (3728 labels in total). In each case, an AMT worker is asked to provide one or two words describing the highlighted patches in a set of top-activating images for a unit. Of the 512 units, 201 units were described by the same consistent word (such as “sofa”, “fireplace” or “wicker”) in 50% or more of the human labels. These units are interpretable to humans.
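The consensus rule (one word appearing in at least half of a unit's labels) can be sketched as follows. The function name and the example labels are hypothetical, and any normalization of the workers' free-text answers is left out.

```python
from collections import Counter

def consensus_label(labels, min_fraction=0.5):
    """Return the most common label if it accounts for at least
    min_fraction of all labels for a unit, otherwise None."""
    if not labels:
        return None
    word, count = Counter(labels).most_common(1)[0]
    return word if count / len(labels) >= min_fraction else None

# Hypothetical annotations for two units.
agreed = consensus_label(["sofa", "sofa", "couch", "sofa", "chair"])
no_agreement = consensus_label(["lamp", "wall", "door", "rug", "sofa"])
```

The first unit counts as interpretable because "sofa" covers three of its five labels; the second has no word reaching the 50% threshold.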
Applying our segmentation-based dissection method, 154/201 of these units are also labeled with a confident label with IoU > 0.05 by dissection. In 104/154 cases, the segmentation-based model gave the same label word as the human annotators, and most others are slight shifts in specificity. For example, the segmentation labels “ottoman” or “curtain” or “painting” when a person labels “sofa” or “window” or “picture,” respectively. A second AMT evaluation was done to rate the accuracy of both segmentation-derived and human-derived labels. Human-derived labels scored 100% (of the 201 human-labeled units, all of the labels were rated as consistent by most raters). Of the 154 segmentation-generated labels, 149 (96%) were rated by most AMT raters as accurate as well.
The five failure cases (where the segmentation is confident but rated as inaccurate by humans) arise from situations in which human evaluators saw one concept after observing only 20 top-activating images, while the algorithm, in evaluating 1000 images, counted a different concept as dominant. Figure 14a shows one example: in the top images, mostly sofas are highlighted and few ceilings, whereas in the larger sample, mostly ceilings are triggered.
There are also 47/201 cases where the segmenter is not confident while humans have consensus. Some of these are due to missing concepts in the segmenter. Figure 14b shows a typical example, where a unit is devoted to letterboxing (white stripes at the top and bottom of images), but the segmentation has no confident label to assign to these. We expect that as future semantic segmentation models are developed to be able to identify more concepts such as abstract shapes, more of these units can be automatically identified.
Our method relies on having a segmentation function $s_c(x)$ that identifies pixels of class $c$ in the output $x$. However, the segmentation model $s_c$ can perform poorly in the cases where $x$ does not resemble the original training set of $s_c$. This phenomenon is visible when analyzing earlier GAN models. For example, Figure 15 visualizes two units from a WGAN-GP model (Gulrajani et al., 2017) for LSUN bedrooms (this model was trained by Karras et al. (2018) as a baseline in the original paper). For these two units, the segmentation network seems to be confused by the distorted images.
To protect against such spurious segmentation labels, we can use a technique similar to that described in Section S-6.1: automatically identify units that produce unrealistic images, and omit those “unrealistic” units from semantic segmentation. An appropriate threshold to apply will depend on the distribution being modeled: in Figure 16, we show how applying a filter, ignoring segmentation on units with FID 55 or higher, affects the analysis of this base WGAN model. In general, fewer irrelevant labels are associated with units.
In this section we provide more details about the ACE optimization described in Section 3.2.
In Eqn. 3, the negative intervention is defined as zeroing the intervened units, and a positive intervention is defined as setting the intervened units to some big class-specific constant $k$. For interventions for class $c$, we set $k$ to be the mean featuremap activation conditioned on the presence of class $c$ at that location in the output, with each pixel weighted by the portion of the featuremap locations that are covered by the class $c$. Setting all units at a pixel to $k$ will tend to strongly cause the target class. The goal of the optimization is to find the subset of units that is causal for $c$.
When optimizing the causal objective (Eqn. 5), the intervention locations P are sampled from individual featuremap locations. When the class $c$ is rare, most featuremap locations are uninformative: for example, when class $c$ is a door in church scenes, most regions of the sky, grass, and trees are locations where doors will not appear. Therefore, we focus the optimization as follows: during training, minibatches are formed by sampling locations P that are relevant to class $c$ by including locations where the class $c$ is present in the output (and are therefore candidates for removal by ablating a subset of units), and an equal portion of locations where class $c$ is not present at P, but it would be present if all the units are set to the constant $k$ (candidate locations for insertion with a subset of units). During the evaluation, causal effects are evaluated using uniform samples: the region P is set to the entire image when measuring ablations, and to uniformly sampled pixels P when measuring single-pixel insertions.
When optimizing causal $\alpha$ for class $c$, we initialize with
$$\alpha_u = \frac{\mathrm{IoU}_{u,c}}{\max_v \mathrm{IoU}_{v,c}}.$$
That is, we set the initial $\alpha$ so that the largest component corresponds to the unit with the largest IoU for class $c$, and we normalize the components so that this largest component is 1.
When applying the interventions, we clip $\alpha$ by keeping only its top $n$ components and zeroing the remainder. To compare the interventions of different classes and different models on an equal basis, we examine interventions at a fixed $n$.
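The IoU-based initialization and the top-component clipping can be sketched as below. The sketch assumes the normalization makes the largest initial component equal to 1, which is consistent with clamping the coefficients to the unit interval; the function names and the IoU values are hypothetical.

```python
import numpy as np

def init_alpha(iou_for_class):
    """Initial coefficients: per-unit IoU for the class, scaled so that the
    best-matching unit gets the value 1 (assumed normalization)."""
    return iou_for_class / iou_for_class.max()

def keep_top_n(alpha, n):
    """Keep the n largest components of alpha and zero the remainder."""
    clipped = np.zeros_like(alpha)
    top = np.argsort(-alpha)[:n]
    clipped[top] = alpha[top]
    return clipped

# Hypothetical IoU scores of four units for one class.
toy_iou = np.array([0.02, 0.10, 0.05, 0.0])
toy_init = init_alpha(toy_iou)
toy_clipped = keep_top_n(toy_init, n=2)
```

Starting from the dissection scores gives the optimizer a head start on units that already correlate with the class, and clipping to the same number of components makes interventions comparable across classes and models.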
To investigate the mechanism for suppressing the visible effects of some interventions seen in Section 4.4, in this section we insert 20 door-causal units on a sample of individual featuremap locations at layer4 and measure the changes caused in later layers.
To quantify effects on downstream features, the change in each feature channel is normalized by that channel’s mean L1 magnitude, and we examine the mean change in these normalized featuremaps at each layer. In Figure 17, these effects that propagate to layer14 are visualized as a heatmap: brighter colors indicate a stronger effect on the final feature layer when the door intervention is in the neighborhood of a building instead of trees or sky. Furthermore, we plot the average effect on every layer at right in Figure 17, separating interventions that have a visible effect from those that do not. A small identical intervention at layer4 is amplified to larger changes up to a peak at layer12.
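The normalized downstream change can be sketched as follows. The sketch assumes that the change is measured as an absolute difference and that each channel's mean L1 magnitude is estimated from a reference sample of unmodified featuremaps; both choices, and the function name, are assumptions rather than details given in the text.

```python
import numpy as np

def normalized_mean_change(before, after, reference):
    """Mean per-channel-normalized change at one layer.

    before, after: (d, h, w) featuremaps without and with the intervention.
    reference:     (N, d, h, w) sample used to estimate each channel's
                   mean L1 magnitude (assumed non-zero for every channel).
    """
    channel_scale = np.abs(reference).mean(axis=(0, 2, 3))   # shape (d,)
    change = np.abs(after - before) / channel_scale[:, None, None]
    return float(change.mean())

# Toy example: every channel has mean magnitude 2 and changes by 1.
toy_reference = np.full((4, 2, 3, 3), 2.0)
toy_change = normalized_mean_change(np.zeros((2, 3, 3)), np.ones((2, 3, 3)), toy_reference)
```

Dividing by the channel's typical magnitude keeps channels with large activations from dominating the average, so effects can be compared across layers.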
Dissection can also be used to monitor the progress of training by quantifying the emergence, diversity, and quality of interpretable units. For example, in Figure 18 we show dissections of layer4 representations of a Progressive GAN model trained on bedrooms, captured at a sequence of checkpoints during training. As training proceeds, the number of units matching objects increases, as does the number of object classes with matching units, and the quality of object detectors as measured by average IoU over units increases. During this successful training, dissection suggests that the model is gradually learning the structure of a bedroom, as more and more units converge to meaningful bedroom concepts.
In Section 4.1 we show a small selection of layers of a GAN; in Figure 19 we show a complete listing of all the internal convolutional layers of that model (a Progressive GAN trained on LSUN living room images). As can be seen, the diversity of units matching high-level object concepts peaks at layer4-layer6, then declines in later layers, with the later layers dominated by textures, colors, and shapes.