Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents

04/02/2019 · Christian Rupprecht et al.

As deep reinforcement learning driven by visual perception becomes more widely used, there is a growing need to better understand and probe the learned agents. Understanding the decision making process and its relationship to visual inputs can be very valuable for identifying problems in learned behavior. However, this topic has been relatively under-explored in the research community. In this work we present a method for synthesizing visual inputs of interest for a trained agent. Such inputs or states could be situations in which specific actions are necessary. Further, critical states in which a very high or a very low reward can be achieved are often interesting for understanding the situational awareness of the system, as they can correspond to risky states. To this end, we learn a generative model over the state space of the environment and use its latent space to optimize a target function for the state of interest. In our experiments we show that this method can generate insights for a variety of environments and reinforcement learning methods. We explore results in the standard Atari benchmark games as well as in an autonomous driving simulator. Based on the efficiency with which we have been able to identify behavioural weaknesses with this technique, we believe this general approach could serve as an important tool for AI safety applications.


1 Introduction

Humans can naturally learn and perform well at a wide variety of tasks, driven by instinct and practice; more importantly, they are able to justify why they would take a certain action. Artificial agents should be equipped with the same capability, so that their decision making process is interpretable by researchers. Following the enormous success of deep learning in various domains, such as the application of convolutional neural networks (CNNs) to computer vision [23, 22, 24, 36], a need for understanding and analyzing the trained models has arisen. Several such methods have been proposed and work well in this domain, for example for image classification [38, 46, 10], sequential models [16] or through attention [44].

Deep reinforcement learning (RL) agents also use CNNs to gain perception and learn policies directly from image sequences. However, little work has so far been done on analyzing RL networks. We found that directly applying common visualization techniques to RL agents often leads to poor results. In this paper, we present a novel technique for generating insightful visualizations of pre-trained agents.

Currently, the generalization capability of an agent is, in the best case, evaluated on a validation set of scenarios. However, this means that the validation set has to be carefully crafted to encompass as many potential failure cases as possible. As an example, consider the case of a self-driving agent: it is nearly impossible to exhaustively model all interactions of the agent with other drivers, pedestrians, cyclists and weather conditions, even in simulation. Our goal is to extrapolate from the training scenes to novel states that induce a specified behavior in the agent.

In our work, we learn a generative model of the environment as an input to the agent. This allows us to probe the agent's behavior in novel states created by an optimization scheme to induce specific actions in the agent. For example, we could optimize for states in which the agent sees the only option as being to slam on the brakes, or states in which the agent expects to score exceptionally low. Visualizing such states allows us to observe the agent's interaction with the environment in critical scenarios and to understand its shortcomings. Furthermore, it is possible to generate states based on an objective function specified by the user. Lastly, our method neither affects nor depends on the training of the agent and is thus applicable to a wide variety of reinforcement learning algorithms.

Our contributions are:

  1. This is one of the first works to visualize and analyze deep reinforcement learning agents.

  2. We introduce a series of objectives to quantify different forms of interestingness and danger of states for RL agents.

  3. We evaluate our algorithm on 50 Atari games and a driving simulator, and compare performance across three different reinforcement learning algorithms.

  4. We quantitatively evaluate parts of our model in a comprehensive loss study (Tab. 1) and analyze generalization through a pixel-level analysis of synthesized unseen states (Tab. 2).

  5. An extensive supplement shows additional comprehensive visualizations on 50 Atari games.

We describe our method first and then discuss relevant related work from the literature.

2 Methods

We first introduce the notation and definitions used throughout the remainder of the paper. We formulate the reinforcement learning problem as a discounted, infinite-horizon Markov decision process $(\mathcal{S}, \mathcal{A}, P, r, \gamma)$, where at every time step $t$ the agent finds itself in a state $s_t \in \mathcal{S}$ and chooses an action $a_t \in \mathcal{A}$ following its policy $\pi(a_t \mid s_t)$. The environment then transitions from state $s_t$ to state $s_{t+1}$ given the model $P(s_{t+1} \mid s_t, a_t)$. Our goal is to visualize RL agents given a user-defined objective function, without adding constraints on the optimization process of the agent itself, i.e. assuming that we are given a previously trained agent with fixed parameters $\theta$.

We approach visualization via a generative model over the state space and synthesize states that lead to an interesting, user-specified behavior of the agent. This could be, for instance, states in which the agent expresses high uncertainty regarding which action to take, or states in which it sees no good way out. This approach is fundamentally different from saliency-based methods, which always require test-set inputs on which the saliency maps are computed. The generative model constrains the optimization of states to induce specific agent behavior.

2.1 State Model

Often in feature visualization for CNNs, an image is optimized starting from random noise. However, we found this formulation too unconstrained, often ending up in local minima or fooling examples (Figure 4a). To constrain the optimization problem, we learn a generative model on a set $\mathcal{S}$ of states generated by the given agent acting in the environment. The model is inspired by variational autoencoders (VAEs) [20] and consists of an encoder $E$ that maps inputs to a Gaussian distribution in latent space and a decoder $G$ that reconstructs the input. The training of our generator has three objectives. First, we want the generated samples to be close to the manifold of valid states $\mathcal{S}$. To avoid fooling examples, the samples should also induce correct behavior in the agent, and lastly, sampling states needs to be efficient. We encode these goals in three corresponding loss terms:

$\mathcal{L} = \mathcal{L}_{\mathrm{rec}} + \lambda_{a}\,\mathcal{L}_{\mathrm{agent}} + \lambda_{\mathrm{KL}}\,\mathcal{L}_{\mathrm{KL}}. \qquad (1)$

The role of $\mathcal{L}_{\mathrm{rec}}$ is to ensure that the reconstruction $G(E(s))$ is close to the input $s$. We observe that in the typical reinforcement learning benchmarks, such as Atari games, small details (e.g. the ball in Pong or Breakout) are often critical for the decision making of the agent. However, a typical VAE model tends to yield blurry samples that are not able to capture such details. To address this issue, we model the reconstruction error with an attentive loss term, which leverages the saliency of the agent to put focus on critical regions of the reconstruction:

$\mathcal{L}_{\mathrm{rec}} = \sum_{i} \eta_{i}\,\big(s_{i} - G(E(s))_{i}\big)^{2}, \qquad (2)$

where $\eta$ is a per-pixel weighting derived from the agent's saliency map, computed by guided backpropagation of the policy's gradient with respect to the state $s$.

As discussed earlier, gradient-based reconstruction methods might not be ideal for explaining a CNN's reasoning process [17]. Here, however, we only use guided backpropagation to focus the reconstruction on regions that are salient to the agent, a task to which these methods are well suited, and not to explain the agent's behavior. This puts emphasis on details (salient regions) when training the generative model.
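As an illustration, the sketch below implements a saliency-weighted reconstruction loss of this kind in PyTorch. It is a minimal sketch rather than the authors' implementation: it uses plain input gradients of the maximal Q-value as the saliency signal (the paper uses guided backpropagation), and the max-normalization of the weight map is an assumption.

```python
import torch


def attentive_recon_loss(q_net, states, reconstructions, eps=1e-8):
    """Reconstruction loss weighted by the agent's saliency (Eq. 2, sketched).

    q_net:            frozen agent network mapping states to Q-values
    states:           batch of original frames,      shape (B, C, H, W)
    reconstructions:  decoder output for the batch,  shape (B, C, H, W)
    """
    # Saliency: gradient magnitude of the greedy Q-value w.r.t. the input.
    # (The paper uses guided backpropagation; plain gradients keep the sketch short.)
    inputs = states.detach().requires_grad_(True)
    q_max = q_net(inputs).max(dim=1).values.sum()
    grads = torch.autograd.grad(q_max, inputs)[0]

    # Collapse channels and normalize to a per-pixel weight map in [0, 1].
    weights = grads.abs().sum(dim=1, keepdim=True)
    weights = weights / (weights.amax(dim=(2, 3), keepdim=True) + eps)

    # Weighted squared error: salient pixels contribute more to the loss.
    per_pixel = (states - reconstructions) ** 2
    return (weights.detach() * per_pixel).mean()
```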

Since we are interested in the actions of the agent on synthesized states, the second objective models the perception of the agent:

$\mathcal{L}_{\mathrm{agent}} = \big\| f\big(G(E(s))\big) - f(s) \big\|^{2}, \qquad (3)$

where $f$ is a generic formulation of the output of the agent; for a DQN, for example, the output is derived from the $Q$-values and the final action is the one with the maximal $Q$-value. This term encourages the reconstructions to be interpreted by the agent in the same way as the original inputs $s$. The last term $\mathcal{L}_{\mathrm{KL}}$ ensures that the distribution predicted by the encoder $E$ stays close to a Gaussian distribution. This allows us to initialize the optimization with a reasonable random vector later and forms the basis of a regularizer. Thus, after training, the model approximates the distribution of states by decoding samples $z \sim \mathcal{N}(0, I)$. We will now use the generator inside an optimization scheme to generate state samples that satisfy a user-defined target objective.
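Putting the three terms together, one training step of the generator could look like the sketch below. Here `encoder`/`decoder` are the VAE-style networks described above, `recon_loss_fn` is a reconstruction loss such as the attentive one sketched earlier, the agent outputs are compared through softmaxed Q-values, and the weights `w_agent` and `w_kl` are placeholders; none of these names come from the paper.

```python
import torch
import torch.nn.functional as F


def generator_training_step(encoder, decoder, q_net, states, optimizer,
                            recon_loss_fn, w_agent=1.0, w_kl=1.0):
    """One generator update combining the three losses of Eq. 1 (sketched).

    recon_loss_fn(states, recon) -> scalar reconstruction loss.
    The agent q_net is frozen; w_agent / w_kl are placeholder weights.
    """
    mu, logvar = encoder(states)

    # Reparameterization trick: sample z ~ N(mu, sigma^2) differentiably.
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    recon = decoder(z)

    # 1) Reconstruction term (e.g. the attentive loss from the previous sketch).
    loss_rec = recon_loss_fn(states, recon)

    # 2) Agent perception term: the agent should "read" the reconstruction
    #    like the original state (compared here via softmaxed Q-values).
    with torch.no_grad():
        target = F.softmax(q_net(states), dim=1)
    loss_agent = F.mse_loss(F.softmax(q_net(recon), dim=1), target)

    # 3) KL divergence between the encoder distribution and N(0, I).
    loss_kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    loss = loss_rec + w_agent * loss_agent + w_kl * loss_kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

For example, `recon_loss_fn = functools.partial(attentive_recon_loss, q_net)` would reuse the weighted loss from the previous sketch.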

2.2 Sampling States of Interest

Training a generator with the objective function of Equation 1 allows us to sample states that are not only visually close to the real ones, but which the agent can also interpret and act upon as if they were states from a real environment.

We can further exploit this property and formulate an energy optimization scheme to generate samples that satisfy a specified objective. The energy operates on the latent space of the generator and is defined as the sum of a target function $T$ on the agent's policy and a regularizer $R$:

$\mathcal{E}(z) = T\big(\pi(G(z))\big) + \lambda\, R(z). \qquad (4)$

The target function $T$ can be defined freely by the user and depends on the agent that is being visualized. For a DQN, one could for example define $T$ via the $Q$-value of a certain action, e.g. pressing the brakes of a car. In Section 2.3, we show several examples of targets that are interesting to analyze. The regularizer $R$ can again be chosen as the KL divergence between the sampling distribution around $z$ and the standard normal distribution,

$R(z) = D_{\mathrm{KL}}\big(\mathcal{N}(z, \sigma^{2} I)\,\big\|\,\mathcal{N}(0, I)\big), \qquad (5)$

forcing the samples drawn from this distribution to stay close to the Gaussian that the generator was trained with. We can optimize Equation 4 with gradient descent on $z$.
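A minimal sketch of this latent-space optimization is given below. It assumes a trained decoder `decoder`, a frozen agent `q_net`, and a user-supplied `target_fn` mapping the agent's Q-values to a scalar energy; a simple squared-norm penalty on z stands in for the KL regularizer of Equation 5.

```python
import torch


def optimize_state(decoder, q_net, target_fn, latent_dim,
                   steps=500, lr=0.05, reg_weight=1.0, device="cpu"):
    """Sample a state of interest by minimizing Eq. 4 in latent space (sketched).

    target_fn(q_values) -> scalar energy; lower means "more interesting".
    A squared norm of z is used as a simple stand-in for the KL regularizer.
    """
    z = torch.randn(1, latent_dim, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)

    for _ in range(steps):
        state = decoder(z)              # synthesized state G(z)
        q_values = q_net(state)         # frozen agent's view of that state
        energy = target_fn(q_values) + reg_weight * 0.5 * (z ** 2).sum()
        optimizer.zero_grad()
        energy.backward()
        optimizer.step()

    return decoder(z).detach()
```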

(a) Pong: scoring a point. (b) Space Invaders: shooting an enemy. (c) Enduro: overtaking an opponent. (d) Name This Game: whether to refill air. (e) Seaquest: out of oxygen. (f) Beamrider: avoiding the enemy. (g) Kung Fu Master: enemies on both sides. (h) Kung Fu Master: easy, many points to score. (i) Kung Fu Master: no enemies.
Figure 1: Qualitative results. Visualization of different target functions (Sec. 2.3). $T^{+}$ generates high reward and $T^{-}$ low reward states; $T^{\pm}$ generates states in which one action is highly beneficial and another is bad. For a long list of results, with over 50 Atari games, please see the supplementary material.

2.3 Target Functions

Depending on the agent, one can define several interesting target functions; we present and explore seven in this work, including the action maximization discussed above and the objectives defined below and in the supplementary material. For a DQN, action maximization is interesting for finding situations in which the agent assigns a high value to a certain action $a$ (e.g. braking), i.e. maximizing $Q(s, a)$ over states. Other states of interest are those to which the agent assigns a low (or high) value for all possible actions $a \in \mathcal{A}$. Consequently, one can optimize towards a low $Q$-value for the highest valued action by minimizing the soft maximum

$T^{-}(s) = \frac{1}{\beta}\,\log \sum_{a}\exp\big(\beta\, Q(s, a)\big), \qquad (6)$

where $\beta$ controls the sharpness of the soft maximum formulation. Analogously, one can maximize the lowest $Q$-value using the soft minimum $T^{+}(s) = -\frac{1}{\beta}\,\log \sum_{a}\exp\big(-\beta\, Q(s, a)\big)$. We can also optimize for interesting situations in which one action is of very high value and another is of very low value by maximizing

$T^{\pm}(s) = \max_{a} Q(s, a) - \min_{a} Q(s, a). \qquad (7)$

As before, the resulting energy (Equation 4) is optimized with gradient descent on $z$.
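As an illustration, the following sketch expresses these targets as energies to be plugged into the optimization of Equation 4, i.e. quantities to be minimized with respect to z. The concrete signs and the value of beta are assumptions made for the sketch and may differ from the authors' exact definitions.

```python
import torch


def soft_max_q(q, beta=10.0):
    """Smooth approximation of max_a Q(s, a); beta controls the sharpness."""
    return torch.logsumexp(beta * q, dim=1) / beta


def soft_min_q(q, beta=10.0):
    """Smooth approximation of min_a Q(s, a)."""
    return -torch.logsumexp(-beta * q, dim=1) / beta


# Energies to minimize (lower energy = more "interesting" state):
def target_low_reward(q):    # T^-: push even the best available action to a low value
    return soft_max_q(q).mean()


def target_high_reward(q):   # T^+: push even the worst action to a high value
    return -soft_min_q(q).mean()


def target_critical(q):      # T^+-: one action very good, another very bad
    return -(soft_max_q(q) - soft_min_q(q)).mean()
```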

3 Related Work

We divide prior work into two parts. First we discuss the large body of visualization techniques developed primarily for image recognition, followed by related efforts in reinforcement learning.

3.1 Feature Visualization

In the field of computer vision, there is a growing body of literature on visualizing features and neuron activations of CNNs. As outlined in [12], we differentiate between saliency methods, which highlight decision-relevant regions given an input image; methods that synthesize an image (pre-image) fulfilling a certain criterion, such as activation maximization [9] or input reconstruction; and perturbation-based methods, which quantify how input modifications affect the output of the model.

3.1.1 Saliency Methods

Saliency methods typically use the gradient of a prediction or neuron with respect to the input image to estimate the importance of pixels. Following gradient magnitude heatmaps [38] and class activation mapping [48], more sophisticated methods such as guided backpropagation [39, 28], excitation backpropagation [47], GradCAM [37] and GradCAM++ [6] have been developed. [49] distinguish between regions in favor of and regions speaking against the current prediction, while [40] distinguish between sensitivity and implementation invariance.

An interesting observation is that such methods seem to generate believable saliency maps even for networks with random weights [1]. [18] show that saliency methods do not produce analytically correct explanations for linear models and further reliability issues are discussed in [2, 13, 17].

3.1.2 Perturbation Methods

Perturbation methods modify a given input to understand the importance of individual image regions. [46] slide an occluding rectangle across the image and measure the change in the prediction, which results in a heatmap of importance for each occluded region. This technique is revisited by [10], who introduce blurring/noise in the image instead of a rectangular occluder and iteratively find a minimal perturbation mask that reduces the classifier's score, while [7] train a network for masking salient regions.

3.1.3 Input Reconstruction

As our method synthesizes inputs to the agent, the most closely related work includes input reconstruction techniques. [25] reconstruct an image from an average of image patches based on nearest neighbors in feature space. [27] propose to reconstruct images by inverting representations learned by CNNs, while [8] train a CNN to reconstruct the input from its encoding.

When maximizing the activation of a specific class or neuron, regularization is crucial because the optimization procedure, which starts from a random noise image and maximizes an output, is vastly under-constrained and often tends to generate fooling examples that fall outside the manifold of realistic images [32]. In [28] total variation (TV) is used for regularization, while [3] propose an update scheme based on Sobolev gradients. In [32] Gaussian filters are used to blur the pre-image or the update computed in every iteration. Since there are usually multiple input families that excite a neuron, [33] propose an optimization scheme for the distillation of these clusters. [41] show that even CNNs with random weights can be used for regularization. More variations of regularization can be found in [34, 35]. Instead of regularization, [30, 31] use a denoising autoencoder and optimize in latent space to reconstruct pre-images for image classification.

Figure 2: Seaquest with ACKTR. Visualization results for a network trained with ACKTR on Seaquest. The $T^{\pm}$ objective indicates situations that can be rewarding but can also have a low-scoring outcome. The generated states show low oxygen or close proximity to enemies.

3.2 Explanations for Reinforcement Learning

In deep reinforcement learning, however, feature visualization is to date relatively unexplored. [45] apply t-SNE [26] to the last layer of a deep Q-network (DQN) to cluster states of behavior of the agent. [29] also use t-SNE embeddings for visualization, while [11] examine how the current state affects the policy in a vision-based approach using saliency methods. [42] use saliency methods from [38] to visualize the value and advantage function of their dueling Q-network. [14] find critical states of an agent based on the entropy of the policy's output. Interestingly, we could not find prior work using activation maximization methods for visualization. In our experiments we show that the typical methods fail for RL networks and generate images far outside the manifold of valid game states, even with all typical forms of regularization. In the next section, we show how to overcome these difficulties.

4 Experiments

In this section we thoroughly evaluate and analyze our method on Atari games [4] using the OpenAI Gym [5] and a driving simulator. We present qualitative results for three different reinforcement learning algorithms, show examples of how the method helps find flaws in an agent, analyze the loss contributions and compare to previous techniques.

4.1 Implementation details

In all our experiments we use the same weighting factors to balance the loss terms in Equation 1 for the KL divergence and the agent perception loss. The generator is trained on frames collected by running the agent with an epsilon-greedy policy. Optimization is done with Adam [19]; training takes approximately four hours on a Titan Xp. The encoder consists of four stages, each comprised of a strided convolution (stride 2), batch normalization [15] and a ReLU layer; the starting number of filters is 32 and is doubled at every stage. A fully connected layer predicts the mean and log-variance of the latent distribution. Decoding is inversely symmetric to encoding, using deconvolutions and halving the number of channels at each of the four steps.

For the experiments on the Atari games we train a double DQN [42] for two million steps. Our generator up-samples to a resolution slightly larger than the agent's input, and the output is then center cropped to the input size. The agents are trained on grayscale images; for better visual quality, however, our generator is trained with color frames and converts to grayscale using a differentiable, weighted sum of the color channels. In the interest of reproducibility we will make the visualization code available.
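To make the architecture description concrete, a generator along these lines could be implemented as sketched below. The 3x3 kernel, the 84x84 input/crop resolution, the 96x96 decoder output and the 128-dimensional latent space are assumptions chosen so that the sketch runs; the paper's exact values are not reproduced here.

```python
import torch
import torch.nn as nn


class StateVAE(nn.Module):
    """VAE-style generator sketch: four stride-2 conv stages (32 filters, doubled
    at every stage) with BatchNorm + ReLU, a fully connected head for mean and
    log-variance, and a symmetric deconvolutional decoder. Kernel size, input
    resolution and latent dimensionality are assumptions of this sketch."""

    def __init__(self, in_channels=3, latent_dim=128):
        super().__init__()
        enc, c = [], in_channels
        for out_c in [32, 64, 128, 256]:
            enc += [nn.Conv2d(c, out_c, 3, stride=2, padding=1),
                    nn.BatchNorm2d(out_c), nn.ReLU(inplace=True)]
            c = out_c
        self.encoder = nn.Sequential(*enc)             # 84x84 -> 6x6 spatially
        self.fc_mu = nn.Linear(256 * 6 * 6, latent_dim)
        self.fc_logvar = nn.Linear(256 * 6 * 6, latent_dim)

        self.fc_dec = nn.Linear(latent_dim, 256 * 6 * 6)
        dec, c = [], 256
        for out_c in [128, 64, 32, in_channels]:
            dec += [nn.ConvTranspose2d(c, out_c, 4, stride=2, padding=1),
                    nn.BatchNorm2d(out_c), nn.ReLU(inplace=True)]
            c = out_c
        dec = dec[:-2] + [nn.Sigmoid()]                # last stage: no BN/ReLU
        self.decoder = nn.Sequential(*dec)             # 6x6 -> 96x96

    def encode(self, x):
        h = self.encoder(x).flatten(1)
        return self.fc_mu(h), self.fc_logvar(h)

    def decode(self, z):
        h = self.fc_dec(z).view(-1, 256, 6, 6)
        out = self.decoder(h)                          # 96x96 output
        return out[:, :, 6:90, 6:90]                   # center crop to 84x84
```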

4.2 Visualizations On Atari Games

In Figure 1, we show qualitative results from various Atari games using different target functions, as described in Section 2.3. From these images we can verify that the visualizations obtained with our method are of good quality and can be interpreted by a human. $T^{+}$ generates generally high value states independent of a specific action (first row of Figure 1), while $T^{-}$ generates low reward situations, such as shortly before losing the game in Seaquest (Figure 1.e) or when there are no points left to score (Figure 1.i). Critical situations can be found by maximizing the difference between the lowest and highest estimated Q-value with $T^{\pm}$. In those cases, there is clearly a right and a wrong action to take. In Name This Game (Figure 1.d) this occurs when close to the air refill pipe, which prevents suffocating under water; in Kung Fu Master, when there are enemies coming from both sides (Figure 1.g), the order of attack is critical, especially since the health of the agent is low (yellow/blue bar on top). An example of maximizing the value of a single action (similar to maximizing the confidence of a class when visualizing image classification CNNs) can be seen in Figure 1.f, where the agent sees moving left and avoiding the enemy as the best choice of action.

Figure 3: Weight Visualization. We visualize the weighting (second row) of the reconstruction loss from Equation 2 for eight randomly drawn samples (first row) of the dataset. Most weight lies on the player’s submarine and close enemies, supporting their importance for the decision making.

4.3 ACKTR

To show that this visualization technique generalizes over different RL algorithms, we also visualize ACKTR [43]. We use the code and pretrained models from a public repository [21] and train our generative model with the same hyperparameters as above, without any modifications to the agent. Figure 2 shows the $T^{\pm}$ objective for the ACKTR agent, visualizing states with both high and low possible rewards, for example low oxygen (surviving vs. suffocating) or close proximity to enemies (earning points vs. dying).

Compared to the DQN visualizations, the ACKTR visualizations are almost identical in terms of image quality and interpretability. This supports the notion that our proposed approach is independent of the specific RL algorithm.

4.4 Interpretation of Visualizations

Analyzing the visualizations on Seaquest, we make an interesting observation. When maximizing the Q-value of individual actions, many samples show a low or very low oxygen meter. In these cases the submarine would need to ascend to the surface to avoid suffocation. Although the up action is the only sensible choice in this case, we also obtain low oxygen states when visualizing all other actions. This implies that the agent has not understood the importance of resurfacing when the oxygen is low. We then run several rollouts of the agent and see that the major cause of death is indeed suffocation and not collision with enemies. This shows the impact of visualization, as we are able to understand a flaw of the agent. Although it would be possible to identify this flawed behavior directly by analyzing the frames of training data for our generator, it is significantly easier to review a handful of samples from our method. Further, since the generator is a generative model, it can synthesize states that are not part of its training set.

4.5 Ablation Studies (Loss Terms)

In this section we analyze the three loss terms of our generative model. The attentive reconstruction loss is weighted by the (guided) gradient magnitude of the agent (Equation 2). In Figure 3 we visualize this mask for a DQN agent on random frames from the dataset. The masks are blurred with an averaging filter. We observe that guided backpropagation results in precise saliency maps focusing on the player and enemies, which in turn focuses the reconstructions on what is important for the agent.
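The guided-backpropagation step itself can be sketched as below: the gradient flowing back through every ReLU module is clamped to be non-negative, and the resulting map is blurred with an averaging filter. This assumes the agent's ReLUs are non-inplace module instances; the blur kernel size is an arbitrary choice for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def guided_saliency(q_net, state, blur_kernel=5):
    """Guided-backprop saliency of the agent's greedy Q-value, blurred with an
    averaging filter. Assumes non-inplace nn.ReLU modules in q_net."""
    hooks = []
    for module in q_net.modules():
        if isinstance(module, nn.ReLU):
            # Guided backprop: only propagate positive gradients through ReLUs.
            hooks.append(module.register_full_backward_hook(
                lambda m, grad_in, grad_out: (grad_in[0].clamp(min=0.0),)))

    x = state.detach().requires_grad_(True)
    q_net(x).max(dim=1).values.sum().backward()
    saliency = x.grad.abs().sum(dim=1, keepdim=True)   # collapse channels

    for h in hooks:
        h.remove()

    # Blur with an averaging filter to obtain a smoother weighting mask.
    kernel = torch.ones(1, 1, blur_kernel, blur_kernel,
                        device=saliency.device) / blur_kernel ** 2
    return F.conv2d(saliency, kernel, padding=blur_kernel // 2)
```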

To study the influence of the loss terms we perform an experiment in which we evaluate the agent not on the real frames but on their reconstructions. If the reconstructed frames were perfect, the agent wearing these "generator goggles" would achieve the same score as the original agent. We can use this metric to understand the quantitative influence of the loss terms. In Pong, the ball is the most important visual aspect of the game for decision making.

Game        Agent    VAE only             Ours (full)
Pong           14          -8         4            14
Atlantis      108          95        98           109
Q*bert         64          26        28            31
Table 1: Loss study. We compare the performance of the original agent with the same agent operating on reconstructed frames instead; the original performance represents an upper bound. Shown are average scores over 20 runs.

In Table 1 we see that the VAE baseline scores much lower than our model. This can be explained as follows. Since the ball is very small, it is mostly ignored by the reconstruction loss of a standard VAE: the contribution of a single pixel to the overall loss is negligible, so the VAE never focuses on reconstructing the important part of the image. Our formulation is built to regain the original performance of the agent by reweighting the loss on regions that are perceptually salient to the agent. Overall, we see that our method always improves over the baseline but does not always match the original performance.
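The comparison above can be reproduced with an evaluation loop of the following form, in which the agent only ever sees the reconstruction of each frame. The classic Gym-style environment interface, the greedy action choice and the `preprocess` helper are assumptions of this sketch.

```python
import torch


def evaluate_with_goggles(env, q_net, encoder, decoder, preprocess,
                          episodes=20, device="cpu"):
    """Average episode score of the agent acting on reconstructed frames.

    preprocess(obs) -> tensor of shape (1, C, H, W); env follows the classic
    Gym API (reset() -> obs, step(a) -> obs, reward, done, info).
    """
    scores = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            state = preprocess(obs).to(device)
            with torch.no_grad():
                mu, _ = encoder(state)      # use the mean latent code
                recon = decoder(mu)         # the agent sees only this
                action = q_net(recon).argmax(dim=1).item()
            obs, reward, done, _ = env.step(action)
            total += reward
        scores.append(total)
    return sum(scores) / len(scores)
```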

4.6 Comparison with Activation Maximization

(a) Activation Maximization
(b) Ours
Figure 4: Comparison with activation maximization. The visual features learned by the agents are not complex enough to reconstruct typical frames from the game via activation maximization. Our method mitigates this problem by first learning a low-dimensional embedding of game states.

For image classification tasks, activation maximization works well when optimizing the pre-image directly [27, 3]. However, we find that for reinforcement learning, the features learned by the network are not complex enough to reconstruct meaningful pre-images, even with sophisticated regularization techniques. The pre-image converges to a fooling example that maximizes the objective but lies far away from the manifold of states of the environment.

(a) Seaquest visualization.
(b) Closest training frame (L2).
(c) Closest training frame (objective score).
Figure 5: Generating novel states. We show a frame generated by our method for a target objective and retrieve the closest frame from the training set using L2 distance as well as the objective function. Both retrieved frames differ clearly from the generated one, showing that the method is able to generate novel states. For a quantitative evaluation, please see Tab. 2.

In Figure 4.a we compare our results with the reconstructions generated using the method of [3] for a DQN agent. We obtain similarly poor pre-images with TV regularization [28], Gaussian blurring [32] and other regularization tricks such as random jitter, rotation, scaling and cropping [34]. This shows that common visualization techniques cannot be applied directly to RL agents and explains why the learned regularization from our generator is needed to produce meaningful examples.
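For reference, the baseline we compare against can be approximated by direct pre-image optimization as sketched below, with total-variation regularization and periodic Gaussian blurring; the regularization weight, blur schedule and input shape are arbitrary choices for the sketch, not the settings of [3] or [32].

```python
import torch
import torchvision.transforms.functional as TF


def activation_maximization(q_net, action, shape=(1, 3, 84, 84),
                            steps=500, lr=0.05, tv_weight=1e-3, blur_every=20):
    """Directly optimize a pre-image that maximizes Q(s, action) (baseline sketch)."""
    x = torch.rand(shape, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)

    for step in range(steps):
        q = q_net(x)[:, action].mean()
        # Total-variation regularization penalizes high-frequency noise.
        tv = (x[..., 1:, :] - x[..., :-1, :]).abs().mean() + \
             (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
        loss = -q + tv_weight * tv
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Periodically blur the pre-image, in the spirit of [32].
        if step % blur_every == 0:
            with torch.no_grad():
                x.copy_(TF.gaussian_blur(x, kernel_size=3))

    return x.detach().clamp(0, 1)
```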

4.7 Experiments with a Driving Simulator

Driving a car is a continuous control task set within a much more complex environment than Atari games. To explore the behavior of our proposed technique in this setting, we created a 3D driving simulation environment and trained an A2C agent to maximize speed while avoiding pedestrians crossing the road.

Figure 6: Driving simulator. We show one input frame sample on the left and then two target function visualizations obtained by our method. For each objective we show three random samples. For simplicity we only show the first frame of segmentation instead of the whole synthesized state (8 frames).
Figure 7: Driving simulator. We show 16 samples for the $T^{-}$ objective of an agent trained in the reasonable pedestrians environment. From these samples one can infer that the agent is aware of traffic lights (red) and other cars (blue) but has very likely not understood the severity of hitting pedestrians (yellow). Deploying this agent in the distracted pedestrians environment shows that the agent indeed collides with people who cross the road in front of it.
#pixels different       %      %      %      %      %      %      %
samples                99%    73%    16%     4%     1%     1%     0%
Table 2: Synthesizing unseen states. We compare generated samples to their closest neighbor in the training set and compute, per sample, the percentage of pixels whose values differ by at least 25%; e.g. 73% of the synthesized samples differ from their closest training sample in more than 20% of their pixels.

In our first set of experiments we trained an A2C agent to maximize speed while avoiding swerving and pedestrians crossing the road. The input to the agent is a stack of eight temporal frames comprised of depth, semantic segmentation and a grayscale image (Figure 6). With this experiment we visualize three random samples for each of two target functions. The moving car and person categories appear most prominently when probing the agent for the brake action. However, we are also able to identify a flaw: unnecessary braking on empty roads, as shown in the leftmost image of the rightmost block of three frames. Inappropriate braking is a well-known issue in this problem domain.

In a second set of experiments, we use our simulator to build two custom environments and validate that we can identify problematic behavior in the agent. The agent is trained with four temporal semantic segmentation frames as input (Figure 7). We train the agent in a “reasonable pedestrians” environment, where pedestrians cross the road carefully, only when no car is coming or at traffic lights. With these choices, we model data collected in the real world, where it is unlikely that people unexpectedly run in front of the car. We visualize states in which the agent expects a low future return ($T^{-}$ objective) in Figure 7. The samples show that the agent is aware of other cars, traffic lights and intersections. However, there are no generated states in which the car is about to collide with a person, suggesting that the agent does not recognize the criticality of pedestrians. To verify our suspicion, we test this agent in a “distracted pedestrians” environment where people cross the road looking at their phones, without paying attention to approaching cars. We find that the agent does indeed run over humans. With this experiment, we show that our visualization technique can identify biases in the training data simply by critically analyzing the sampled frames.

4.8 Novel states

Being able to generate novel states is useful, since it allows the method to model new scenarios that were not accounted for during training of the agent. This lets the user identify potential problems without needing to include every possible permutation of situations in the simulator or real-world data collection.

While one could simply examine the experience replay buffer to find scenarios of interest, our approach allows unseen scenarios to be synthesized. To quantitatively evaluate the claim that our generator is capable of producing novel states, we sample states and compare each to its closest frame in the training set under an MSE metric. We count a pixel as different if the relative difference in a channel exceeds 25% and report the histogram in Table 2. The results show that very few samples are close to the training data. On average, a generated state differs in 25% of its pixels, which is substantial considering the largely shared layout of road, buildings and sky.
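One way to compute this statistic is sketched below; the 25% relative-difference rule follows the text, while the nearest-neighbor search over a tensor of training states and the handling of near-zero pixels are simplifying assumptions.

```python
import torch


def novelty_fraction(sample, train_states, rel_threshold=0.25):
    """Fraction of pixel channels in which a generated sample differs from its
    nearest training frame by more than `rel_threshold` relative difference.

    sample:        (C, H, W) generated state
    train_states:  (N, C, H, W) training set states
    """
    # Nearest neighbor in the training set under mean squared error.
    dists = ((train_states - sample.unsqueeze(0)) ** 2).flatten(1).mean(dim=1)
    nearest = train_states[dists.argmin()]

    # A pixel channel counts as different if its relative difference exceeds 25%.
    rel_diff = (sample - nearest).abs() / (nearest.abs() + 1e-8)
    return (rel_diff > rel_threshold).float().mean().item()
```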

We examine these results visually for Atari Seaquest in Fig. 5, where we show a generated frame together with the L2-closest frame from the training set as well as the closest frame under the objective function. Retrieval with L2 is, as usual, not very meaningful for images, since the interesting parts of the image are dominated by the background. We therefore also include a retrieval based on the objective score, which shows the submarine in a similar gameplay situation but with different enemy locations. The results in Tab. 2 and Fig. 5 confirm that the method is able to generate unseen states and does not overfit to the training set.

5 Discussion and Conclusions

We have presented a method to synthesize inputs to deep reinforcement learning agents based on generative modeling of the environment and user-defined objective functions. The agent perception loss helps the reconstructions to focus on regions of the current state that are important to the agent and avoid generating fooling examples. Training the generator to produce states that the agent perceives as those from the real environment enables optimizing its latent space to sample states of interest. Please consult the supplementary material included with this submission for more extensive visualization experiments.

We believe that understanding and visualizing agent behavior in safety critical situations is a crucial step towards creating safer and more robust agents using reinforcement learning. We have found that the methods explored here can indeed help accelerate the detection of problematic situations for a given learned agent. For our car simulation experiments we have focused upon the identification of weaknesses in constructed scenarios; however, we see great potential to apply these techniques to much more complex simulation environments where less obvious safety critical weaknesses may be lurking.

Acknowledgments

We would like to thank Iro Laina, Alexandre Piché, Simon Ramstedt, Evan Racah and Adrien Ali Taiga for helpful discussions and proofreading. We thank the Open Philanthropy project for supporting C.R. while he was an intern at Mila, where this work began.

References

Supplementary Material

To show an unbiased and wide variety of results, in the following we show four random samples generated by our method for a DQN agent on each of many Atari benchmark environments. For each game we show visualizations optimized for a meaningful objective (e.g. not optimizing for unused buttons). All examples were generated with the same hyperparameter settings.

Please note that better settings can be found for some games. Some generators for visually more complex games would benefit from longer training to generate sharper images. Our method is able to generate reasonable images even when the DQN was unable to learn a meaningful policy, such as for Montezuma's Revenge (Fig. 39). We show two additional objectives that maximize or minimize the expected reward of the state under a random action. Results are in alphabetical order and best viewed in color.
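Under the energy formulation of Section 2.2, these two objectives can be written as the (negated) mean Q-value over actions, as in the sketch below; the signs are chosen so that lower energy corresponds to the desired state, which is an assumption consistent with Section 2.3 rather than the authors' exact definition.

```python
import torch


def target_random_action_high(q):
    """Maximize the expected Q-value under a uniformly random action (energy to minimize)."""
    return -q.mean(dim=1).mean()


def target_random_action_low(q):
    """Minimize the expected Q-value under a uniformly random action (energy to minimize)."""
    return q.mean(dim=1).mean()
```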

Figure 8: Air Raid. Target function: .
Figure 9: Alien. Target function: right.
Figure 10: Amidar. Target function: up.
Figure 11: Assault. Target function: .
Figure 12: Asterix. Target function: .
Figure 13: Asteroids. Target function: up-fire.
Figure 14: Atlantis. Target function: .
Figure 15: Bank Heist. Target function: .
Figure 16: Battlezone. Target function: .
Figure 17: Beamrider. Target function: .
Figure 18: Berzerk. Target function: .
Figure 19: Bowling. Target function: .
Figure 20: Boxing. Target function: .
Figure 21: Breakout. Target function: .
Figure 22: Breakout. Target function: Left.
Figure 23: Carnival. Target function: right.
Figure 24: Centipede. Target function: .
Figure 25: Chopper Command. Target function: .
Figure 26: Crazy Climber. Target function: .
Figure 27: Demon Attack. Target function: .
Figure 28: Elevator Action. Target function: no-op.
Figure 29: Enduro. Target function: .
Figure 30: Freeway. Target function: .
Figure 31: Frostbite. Target function: no-op.
Figure 32: Gopher. Target function: .
Figure 33: Gravitar. Target function: .
Figure 34: Hero. Target function: .
Figure 35: JamesBond. Target function: .
Figure 36: Kangaroo. Target function: .
Figure 37: Krull. Target function: fire.
Figure 38: Kung Fu Master. Target function: up.
Figure 39: Montezuma’s Revenge. Target function: .
Figure 40: Ms. Pacman. Target function: no-op.
Figure 41: Name This Game. Target function: .
Figure 42: Phoenix. Target function: .
Figure 43: Pong. Target function: no-op.
Figure 44: Pooyan. Target function: .
Figure 45: Q-Bert. Target function: left.
Figure 46: River Raid. Target function: .
Figure 47: Space Invaders. Target function: left.
Figure 48: Star Gunner. Target function: .
Figure 49: Tutankham. Target function: no-op.
Figure 50: Venture. Target function: .
Figure 51: Video Pinball. Target function: .
Figure 52: Wizard Of Wor. Target function: left.