Experiments with S3N
With the aim of designing automated tools that assist in the video game quality assurance process, we frame the problem of identifying bugs in video games as an anomaly detection (AD) problem. We develop State-State Siamese Networks (S3N) as an efficient deep metric learning approach to AD in this context and explore how it may be used as part of an automated testing tool. Finally, we show by empirical evaluation on a series of Atari games that S3N is able to learn a meaningful embedding, and consequently is able to identify various common types of video game bugs.
Video game development companies take significant steps at all stages of development to reduce the likelihood of bugs appearing in release code. These steps range from the use of software development paradigms early in the process to heavy investment in Quality Assurance (QA) closer to release. As games become increasingly vast and complex, exploring and uncovering bugs manually is becoming less feasible. In contrast, the continuing advancements in Reinforcement Learning (RL) are allowing software agents to play and explore with greater proficiency in increasingly complex games. This has opened up an opportunity for the development of automated tools to assist developers and testers in the video game QA process. Previous attempts at developing these tools have focused on building frameworks, or require detailed descriptions of the environment and are heavily integrated with the game's internal implementation.
With the aim of developing automated testing tools that can be easily integrated with existing development practices, we frame the problem of identifying bugs as an Anomaly Detection (AD) problem, treating the manifestation of a bug in the raw observation space (as seen by a human player) as an anomaly. With this view, we explore deep metric learning as an approach to anomaly detection, and its potential to form the basis for such tools.
Specifically, we formalise the AD problem in this context and present State-State Siamese Networks (S3N) as a semi-supervised metric learning approach that learns from only raw observations (frames). S3N uses spatial and local temporal information to efficiently learn a latent representation of the state space that induces a meaningful measure of normality. We use Atari games from the Arcade Learning Environment (ALE) to create an open dataset of anomalies, the Atari Anomaly Dataset (AAD). The dataset consists of trajectories from 7 Atari games collected using model-free RL with common types of bugs introduced artificially. Finally, we evaluate S3N’s ability to construct meaningful representations and consequently its ability to detect anomalies on AAD, and discuss promising future directions.
We use the following formalism for the remainder of the paper. We refer to a single frame (image) of a video game at time $t$ as a state $s_t$. A player action $a_t$ leads to a (stochastic) transition from the state $s_t$ to $s_{t+1}$ according to a transition function $P$. To simplify our discussion we consider a Markovian transition function $P(s_{t+1} \mid s_t, a_t)$. We refer to a single play-through of a game (from an initial state to a final state) as a trajectory $\tau$. Under this formalism, a game is modelled as a labelled directed graph $G$, with nodes as states and edges as transitions with associated action labels and probabilities. The development process (including QA) can be thought of as the incremental improvement of successive graphs that are closer to some ideal graph, with the graph that is released to customers being the closest approximation to the ideal graph. We denote the ideal graph as $G^*$ and an approximate graph as $\hat{G}$. We assume no prior knowledge of the state space or transition probabilities, and a small constant frame-rate.
As it is common in video games for particular states or transitions to be rare, we cannot take the view of anomalies as outliers. It is also common for games to have a large branching factor; in the worst cases, states that occur later in time are exponentially more unlikely than their predecessors. With this in mind, we take the out-of-distribution view, and define two types of anomaly:
State anomaly - a state $s_t$ is anomalous iff it is not a node of $G^*$.
Transition anomaly - a transition $(s_t, s_{t+1})$ is anomalous iff it is not an edge of $G^*$.
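To make the two definitions concrete, the following is a minimal sketch that reads the out-of-distribution view as membership in an ideal game graph. This reading is an assumption made for illustration, as are the names `find_anomalies`, `ideal_states` and `ideal_edges` and the toy graph itself. In practice the ideal graph is not available, which is what motivates a learned approach.

```python
from typing import Hashable, List, Sequence, Set, Tuple

State = Hashable
Edge = Tuple[State, State]


def find_anomalies(
    trajectory: Sequence[State],
    ideal_states: Set[State],
    ideal_edges: Set[Edge],
) -> Tuple[List[State], List[Edge]]:
    """Return the anomalous states and transitions of one trajectory.

    Assumption: a state is anomalous if it is not a node of the ideal
    graph, and a transition is anomalous if it is not an edge of it.
    """
    state_anomalies = [s for s in trajectory if s not in ideal_states]
    transition_anomalies = [
        (s, s_next)
        for s, s_next in zip(trajectory, trajectory[1:])
        if (s, s_next) not in ideal_edges
    ]
    return state_anomalies, transition_anomalies


# Hypothetical toy game: three states visited in a cycle A -> B -> C -> A.
ideal_states = {"A", "B", "C"}
ideal_edges = {("A", "B"), ("B", "C"), ("C", "A")}

states, transitions = find_anomalies(
    ["A", "B", "X", "C", "B"], ideal_states, ideal_edges
)
print(states)       # the unknown state "X"
print(transitions)  # every transition that is not an ideal edge
```

Note that the transition from "C" to "B" is flagged even though both states are valid, which is the distinction between the two anomaly types.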
Siamese networks are a general approach to metric learning, and have been successfully applied to many areas, most notably for learning image similarities in facial recognition [10, 11]. Siamese networks learn an implicit distance by learning to represent examples in a low-dimensional space according to a distance-based objective. They are trained on pairs of examples, requiring some labelling that is indicative of the desired latent structure. One popular distance-based objective is triplet loss:
$$\mathcal{L}(a, p, n) = \max\big(d(f_\theta(a), f_\theta(p)) - d(f_\theta(a), f_\theta(n)) + \alpha,\; 0\big)$$
where $f_\theta$ is typically a neural network with parameters $\theta$, $a$ is an anchor example, $p$ is an example with the same label as $a$, and $n$ is an example with a label that differs from that of $a$. Triplet loss is derived from the desired property $d(f_\theta(a), f_\theta(p)) + \alpha < d(f_\theta(a), f_\theta(n))$. The margin parameter $\alpha$ prevents the network from learning trivial solutions. Many other objectives exist [3, 2]; triplet loss is the objective that is used in our experiments.
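As a small illustration of the objective, the sketch below evaluates the standard triplet loss on embeddings that are already computed. The squared Euclidean distance and the default margin of 1.0 are assumptions chosen for the example, and the function names are hypothetical.

```python
from typing import Sequence


def squared_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


# The negative is far from the anchor, so the desired property holds.
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 0.0])
# The negative is as close as the positive, so the margin is violated.
violated = triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(satisfied, violated)
```

The loss is zero once the negative is further from the anchor than the positive by at least the margin, so only violating triplets contribute gradient.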
More recently, metric learning, and specifically Siamese networks, have been applied to anomaly detection. The key idea is that instead of using a proxy anomaly score (e.g. reconstruction error), the score is learnt directly. The anomaly score is used to rank examples by their normality, with higher scores typically indicating abnormality.
Figure panels: a) Visual artefact, b) Flicker, c) Freeze skip, d) Split horizontal, e) Split vertical.
Illustrative plots of distance vectors for a 2D embedding of Breakout. Blue and red points correspond to normal and anomalous transitions respectively.
S3N is a data-efficient learning procedure that is able to construct meaningful embeddings without the use of action information or a direct labelling of normal/anomalous states or transitions. S3N consists of a dynamic labelling schema and a training procedure. The labelling schema is given below:
Under this labelling schema, states that have a temporal relationship are considered close according to the learned metric. That is, the network will attempt to embed the game graph, with connected nodes mapped to similar regions of the embedding space. We hope then, that the support of is in some sense captured by the neighbourhood of the particular node in the embedding. The desired property is given below:
In later discussion we refer to as the displacement with reference to a particular trajectory . We do not impose any additional constraints on the embedding structure, and have found in our experiments that the embedding is meaningful with respect to the AD problem, see Fig. 1. The learned metric evaluated on a particular query pair , can be used directly as an anomaly score, with anomalous transitions indicated by high , or low , see Fig. 2.
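Using the learned metric as an anomaly score can be sketched as follows, assuming the states of a trajectory have already been embedded. The Euclidean distance, the function names and the toy 2D embeddings are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def transition_scores(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Distance between each pair of consecutive embedded states."""
    return [math.dist(a, b) for a, b in zip(embeddings, embeddings[1:])]


def most_anomalous(embeddings: Sequence[Sequence[float]], top_k: int = 1) -> List[int]:
    """Indices t of the transitions (t, t + 1) with the highest scores."""
    scores = transition_scores(embeddings)
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return ranked[:top_k]


# Hypothetical 2D embeddings of a trajectory with one large jump.
embedded = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (3.0, 4.0), (3.1, 4.0)]
print(most_anomalous(embedded))
```

Ranking rather than thresholding mirrors how the score is used in the text: the highest-scoring transitions are the ones reported for inspection.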
Part of the difficulty with the approach is in its computational complexity. To avoid computing a distance matrix over an entire trajectory, which is unnecessarily costly, we take a mini-batching approach and employ stochastic gradient descent. Positive pairs are uniformly sampled in batches from trajectories that are collected using a trajectory collector. We then assume that the positive of each pair is a negative for all other anchors in the batch and construct a distance matrix accordingly. With a sufficiently large sample space it is unlikely that the assumption is broken, but care should be taken if the graph is dense. In our experiments the effect was negligible. It is also important to note that the embedding dimension should be sufficiently large, with dense graphs requiring larger dimensions. The S3N training algorithm is described in Alg. 1. Using this algorithm, a good embedding can be learned quickly (in the order of minutes rather than hours using an NVIDIA RTX 2070 GPU), requiring orders of magnitude less data than approaches that rely on prediction or that have a generative aspect.
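The in-batch negative assumption can be sketched as below. This is not Alg. 1, only an illustration of the distance-matrix construction: row i holds the distances from anchor i to every positive in the batch, the diagonal entry is the true positive distance, and all off-diagonal entries are treated as negatives. The squared Euclidean distance, averaging over all pairs, and the function names are assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(x: Vector, y: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def batch_triplet_loss(
    anchors: Sequence[Vector], positives: Sequence[Vector], margin: float = 1.0
) -> float:
    """Mean triplet loss where positive j acts as a negative for anchor i != j."""
    n = len(anchors)
    if n < 2 or n != len(positives):
        raise ValueError("need at least two anchor/positive pairs of equal count")
    # dist[i][j] = distance from anchor i to positive j.
    dist: List[List[float]] = [
        [squared_distance(a, p) for p in positives] for a in anchors
    ]
    losses = [
        max(dist[i][i] - dist[i][j] + margin, 0.0)
        for i in range(n)
        for j in range(n)
        if i != j
    ]
    return sum(losses) / len(losses)


# Two well-separated pairs: every in-batch negative is far away.
separated = batch_triplet_loss([[0.0, 0.0], [5.0, 5.0]], [[0.0, 1.0], [5.0, 6.0]])
# Two nearby pairs with a larger margin: each negative violates the margin.
crowded = batch_triplet_loss(
    [[0.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]], margin=2.0
)
print(separated, crowded)
```

A batch of n pairs yields n(n - 1) triplets from a single n-by-n matrix, which is where the saving over a full trajectory-wide distance matrix comes from.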
In order to learn a meaningful embedding, S3N training is semi-supervised and uses only normal trajectories. In a practical setting, we may not have access to normal trajectories; more likely we have access to an in-progress approximate game that contains some bugs. To make S3N viable for use as part of a practical tool, we envisage an active learning procedure in which a developer continually adapts the training data by re-programming the game after receiving feedback on the most anomalous transitions. As this process continues, the game will approach the ideal game and S3N will improve and adapt its knowledge of normality. Realising active learning is left as future work; in our experiments we use the ideal game directly as an initial demonstration of the feasibility of S3N as an approach.
|Beam Rider (64)||0.0616||0.0517||0.0076||0.0032||0.0019||0.0004||0.9347||0.9997||0.0048||0.9878||0.9905||0.9927|
|Space Invaders (64)||0.0284||0.0938||0.0423||0.0245||0.0001||0.0000||0.9834||1.0000||0.0179||0.9750||0.9949||0.9951|
|VA = Visual Artefact, F-Skip = Freeze Skip, SH = Split Horizontal, SV = Split Vertical|
To test our approach, we use 7 Atari games (Beam Rider, Breakout, Enduro, Pong, Qbert, Seaquest, and Space Invaders) that have previously been made available as part of the Arcade Learning Environment (ALE) and OpenAI Gym. States (and actions) have been collected using model-free RL, specifically, with the OpenAI stable-baselines implementation of Advantage Actor-Critic (A2C), totalling approx. 200k states per game. Common types of anomalies have been artificially introduced into approximately half of the collected trajectories; these include freezing, flickering and visual artefacts (see Fig. 3) at a rate of . Each game was chosen with a specific motivation in mind, testing S3N’s ability to deal with large discontinuities including flashing and scene changes, embed (a)cyclic graphs, dense/sparse graphs, or to deal with a high inherent dimensionality. Data and further details can be found at https://www.kaggle.com/benedictwilkinsai/atari-anomaly-dataset-aad.
The neural network used in the experiments to follow has a three-layer convolutional architecture with leaky ReLU activation and a final linear embedding layer of dimension 64 or 256. The same network architecture was used for each game, with the following set of hyperparameters: batch size , margin , squared norm as the distance in triplet loss, and learning rate for the Adam optimiser. The network was trained for 12 epochs on as little as 60k states from the raw partition of AAD. All code and pre-trained models are available at https://github.com/BenedictWilkinsAI/S3N.
Before evaluating the performance of S3N on detecting anomalies, we make an attempt at evaluating the quality of the learned embedding. A poor embedding may be the result of an insufficient embedding dimension or high-entropy transitions, but there are other more subtle possibilities. For example, due to the lack of a hard restriction on the magnitude of .
As the learned metric is going to be used directly to determine a ranking for normal and anomalous transitions, in order to avoid false positives, we want to be sure that there are no large jumps in a normal embedding trajectory. At first glance, the standard deviation of displacement seems to give a good indication of uniformity; however, self-transitions are an issue. To make the statistic more robust, we look at the standard deviation of the residuals of the displacements relative to the margin parameter. This has the effect of ignoring any normal displacements that are already within an acceptable tolerance, and leads to a more intuitive ideal value of 0. We refer to the standard deviation of residual 1-step displacements as the Uniform Distance Statistic (UDS).
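One way to compute such a statistic is sketched below. The exact residual used by the authors is not recoverable from the text, so the form max(d - margin, 0) is an assumption, chosen because it ignores displacements within the margin and gives 0 for an ideal trajectory. The use of the population standard deviation and the function name are also assumptions.

```python
import statistics
from typing import Sequence


def uniform_distance_statistic(displacements: Sequence[float], margin: float) -> float:
    """Standard deviation of the part of each 1-step displacement above the margin."""
    # Assumed residual: displacements within the margin contribute zero.
    residuals = [max(d - margin, 0.0) for d in displacements]
    return statistics.pstdev(residuals)


# Every displacement is within the margin, including a self-transition (0.0).
smooth = uniform_distance_statistic([0.0, 0.5, 1.0, 0.8], margin=1.0)
# One large jump in an otherwise smooth trajectory.
jumpy = uniform_distance_statistic([0.0, 0.5, 1.0, 3.0], margin=1.0)
print(smooth, jumpy)
```

Under this assumed form, self-transitions no longer inflate the statistic, since a displacement of 0 and a displacement just under the margin both have a residual of 0.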
To ensure the embedding is consistent with the original objective, we treat each displacement as a random variable whose realisations correspond to $k$-step displacements and determine the corresponding probability using a rank-sum test. We show results for increasing values of $k$ in Table I and see that the probability quickly vanishes. When combined with the UDS, we can conclude that S3N is able to construct good embeddings, even in the face of scene changes and other large discontinuities. In the case of Breakout, UDS is comparatively high. We hypothesise that this is due to its high inherent (combinatorial) dimension, with some jumps occurring at the transitions between different block configurations.
To evaluate the performance of S3N on the detection of anomalies, as is common in the literature, we use the AUC score. As shown by the scores in Table I, S3N is able to correctly identify flickering, skips and various kinds of visual artefacts. Freezing is part of a particular class of self-transitioning anomaly that cannot be detected by our approach. In our experiments, S3N is learning a proper distance (induced by a norm), which in particular satisfies $d(x, y) = d(y, x)$ and $d(x, x) = 0$. The second axiom results in an anomaly score of $0$ being assigned to self-transitions, hence the poor performance in this case. We have given special consideration to labelling transitions for flickering and freeze skip anomalies, labelling only the non-self-transitions as anomalous. It should also be noted that S3N is invariant to the direction of time due to symmetry in the distance. We leave it as part of future work to explore alternative measures that might address these issues, perhaps by incorporating action information as a source of asymmetry.
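The AUC of a ranking can be computed directly as the probability that a randomly chosen anomalous transition scores higher than a randomly chosen normal one, with ties counted as one half. The sketch below uses this pairwise definition. The function name and the toy scores are assumptions, and the quadratic loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def auc(normal_scores: Sequence[float], anomalous_scores: Sequence[float]) -> float:
    """Fraction of (anomalous, normal) pairs ranked correctly, ties count 0.5."""
    if not normal_scores or not anomalous_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomalous_scores) * len(normal_scores))


# Hypothetical scores: most anomalies rank above the normal transitions.
detected = auc([0.1, 0.2, 0.3], [0.25, 0.9])
# Freezing: self-transitions score 0 and rank below every normal transition.
frozen = auc([0.1, 0.2], [0.0, 0.0])
print(detected, frozen)
```

The second call shows why a distance that assigns 0 to self-transitions gives an AUC near 0 on freezes: the anomalies are ranked as the most normal transitions, not as undetectable noise.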
S3N is an efficient learning algorithm for constructing video game embeddings for the purpose of anomaly detection, requiring orders of magnitude less data and training than similar generative or predictive approaches. We have given an initial demonstration of the feasibility of S3N on our dataset (AAD), making it available to support future work in this area. We have evaluated the ability of S3N to construct meaningful embeddings, and shown that it is able to successfully identify many common types of video game bugs. Future directions include exploring actions as part of alternative measures for use in the objective.
Journal of Artificial Intelligence Research 47, pp. 253–279.
Large Scale Online Learning of Image Similarity Through Ranking. Journal of Machine Learning Research 11 (Mar), pp. 1109–1135.