The success of AlphaGo has brought significant attention to artificial intelligence in games (game AI). Agents trained by deep reinforcement learning have demonstrated decisive victories over expert human players in classic games such as chess, Go, and Atari. With their more complex settings, real-time strategy (RTS) games serve as a means to evaluate state-of-the-art learning algorithms. Game AI today opens up new opportunities and challenges for machine learning. The benefits of developing game AI extend well beyond gaming applications. The exploration of intelligent agents in science (e.g., predicting protein folding in organic chemistry) and enterprise business services (e.g., chatbots) is ushering in a new era for game AI.
In this paper, we describe DefogGAN, which takes a generative approach to compensate for the imperfect information presented to a gamer due to the fog of war. We use StarCraft, an RTS game featuring three well-balanced races from which a gamer can choose and build substantially different playing styles and strategies. StarCraft remains a popular e-sport more than two decades after its original release. Aiming for our game AI to conquer highly skilled human players, we train DefogGAN with more than 30,000 episodes of expert and professional human replays. Such an aim has been notoriously difficult for StarCraft, whose long-standing popularity has compounded a broad range of adept game tactics in addition to the micro-control techniques widespread in e-sport scenes and on Battle.net.
The fog of war refers to the lack of vision and information on an area without a friendly unit nearby, including all regions that have been previously explored but are currently left unattended. A Partially Observable Markov Decision Process (POMDP) best describes the fog of war problem. In general, POMDP gives a practical formulation for most real-world problems characterized by having many unobserved variables. For game AI, solving the partial observation problem is essential to improving performance. In fact, many existing approaches to designing intelligent game AI suffer from the partial observation problem. Recently, generative models have been used to alleviate the uncertainty of partial observations. An agent's performance is enhanced by taking advantage of the (predictive) results obtained through a generative model [27, 8]. The generative approach, however, cannot fully match the highly skillful scouting techniques of a top-notch professional human player.
StarCraft provides a great platform to study complex POMDP problems related to game AI. We set up DefogGAN to accurately predict the state of an opponent hidden in the fog using realistically generated information, thanks to generative adversarial nets (GANs). We find empirically that GANs generate more realistic images than variational autoencoders (VAEs). To generate a defogged game state, we have modified the original GAN generator into an encoder-decoder network.
In principle, DefogGAN is a variant of conditional GAN. Utilizing skip connections, the DefogGAN generator is trained on residual learning from the encoder-decoder structure. In addition to the GAN adversarial loss, we set up a reconstruction loss between fogged and defogged game states to emphasize the regression of unit positions and quantities. This paper makes the following contributions.
We develop DefogGAN to resolve a fogged game state into useful information for winning. DefogGAN is one of the earliest GAN-based approaches to cope with the StarCraft fog of war;
Using skip connections for residual learning, we have set up DefogGAN to carry past information (sequence) in a feedforward manner without introducing any recurrent structure, making it suitable for real-time use;
We empirically validate DefogGAN in an ablation study and other settings, such as testing against extracted game intervals and the current state-of-the-art defog strategy.
Our dataset, source code, and pretrained networks are available online for public access: https://github.com/TeamSAIDA/DefogGAN
StarCraft is an immensely successful RTS game developed by Blizzard Entertainment. Since its original release in 1998, StarCraft has attracted professional e-sport leagues and millions of amateur enthusiasts worldwide. Featuring three fictional races, namely Terran, Protoss, and Zerg, StarCraft is considered one of the most well-balanced online games ever created. The combinatorial complexity of player actions is extremely high, although at a high level, winning conditions for StarCraft can be built upon the military power and economy accumulated by the player.
StarCraft AI has a long history, reflecting a number of different playing styles. Ontanon et al. ontanon point out that StarCraft playing essentially comprises two tasks. First, micro-management refers to the ability to control units individually. Good micro-management can keep a player’s worker and combat units alive for a long time. Secondly, macro-management is the ability to produce units and expand the production facilities into regions other than the start location.
Defogging can be crucial to both micro- and macro-aspects of the game. Better estimation of hidden areas in the map helps win combats while giving the player a higher chance of making the right decision for the future. A poor observation can in general hurt macro-management [28, 29]. Scouting is the most straightforward defogging technique [18, 23]. Interestingly, Justesen & Risi justesen2017learning propose a deep learning-based approach to learn the opponent status from unit and upgrade information. Generative models give a new class of prediction techniques in StarCraft AI. The convolutional encoder-decoder (CED) model [27, 8] can be used to recover information hidden in the fog. Synnaeve et al. synnaeve1 find it beneficial to use a convolutional encoder and a convolutional-LSTM encoder. Our approach of using GAN to generate hidden information as a predictive measure is new to the literature.
Generative Adversarial Nets (GAN)
Goodfellow et al. introduce GAN to generate data from probabilistic sampling. GAN constitutes two neural nets, a generator $G$ and a discriminator $D$, trained in the competition described by a minimax game:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$
Radford, Metz, and Chintala dcgan have proposed DCGAN, which uses deep convolutional neural nets as the generator and discriminator. Vanilla GAN is trained on the Jensen-Shannon divergence (JSD), which can cause the vanishing gradient and mode collapse problems. WGAN proposes the use of the Wasserstein-1 metric to mitigate these problems. Gulrajani et al. wgangp propose WGAN-GP with a gradient penalty that has a similar effect to weight clipping. Zhao et al. ebgan introduce Energy-based GAN (EBGAN) using an autoencoder. Berthelot, Schumm, and Metz began propose BEGAN, which combines the WGAN and EBGAN ideas. We will experimentally compare the GAN variants for defogging performance.
Generative Approaches for Defogging
The fog of war problem is similar to inpainting and denoising. However, there are three key differences. First, the enemy units may be hidden even in the presence of friendly units, so defogging must predict the location and the number of each enemy unit type in a 2D grid space up to $4096 \times 4096$. Second, defogging is a regression problem, which must infer the number of units in the entire area based on a partial observation. Lastly, the problem is not just to generate an image based on the masked (fogged) image. Defogging must indicate the grid cell where a unit of interest is likely to exist.
This section presents DefogGAN, explaining its architecture and objective functions. We also describe our implementation details.
DefogGAN generates a fully observed (defogged) state $\hat{y}_t$ from a partially observed (fogged) state $x_t$ at time $t$. For StarCraft, a fully observed state includes the exact locations of all friendly and enemy units at a given time. Figure 2 presents DefogGAN. Feature maps computed on the current partially observed state input are sum-pooled. Feature maps on the past observations are accumulated and concatenated to the current before entering the generator. The reconstruction loss between the predicted and the actual fully observed states and the discriminator adversarial loss are used to train the generator.
We denote by $y_t$ a ground-truth fully observed state at time $t$, consisting of the exact locations of all units in the game. It is represented as a three-dimensional array of width, height, and channels. Each unit type makes up a channel, and the size of a raw game image in StarCraft is $4096 \times 4096$ pixels. With 66 unit types, a 1-vs-1 StarCraft game state is $4096 \times 4096 \times 66$. We use $\hat{y}_t$ for a predicted fully observed state. A partially observed state at time $t$ is $x_t$. In StarCraft, friendly units are always visible, making half of the channels in the input fully observed. Ignoring the enemy buildings, which are static units, a partially observed state is an array of size $4096 \times 4096 \times 50$. Here, the 50 channels include 34 channels for friendly units and 16 channels for enemy combat units. Partially observed states accumulated until time $t$ are denoted by $x_{\leq t}$. Accumulated partial observations, however, include enemy buildings and exclude friendly units, which are already a part of the current partial observation. This results in an array of size $4096 \times 4096 \times 32$ for $x_{\leq t}$. Combining $x_t$ and $x_{\leq t}$, a concatenated total input is applied to DefogGAN.
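As an illustration of this channelized encoding, the sketch below bins hypothetical unit positions on the raw map into a count tensor on the downsampled grid described later in the paper; the helper name and the example placements are ours, not from the paper.

```python
import numpy as np

# Sketch of the state encoding described above. The helper name and the
# example unit placements are hypothetical, not from the paper.
MAP, GRID, CHANNELS = 4096, 32, 66   # raw map size, downsampled grid, unit types
CELL = MAP // GRID                   # 128 pixels per grid cell

def encode_state(units):
    """Encode [(x_pixel, y_pixel, channel), ...] as a 32x32x66 count tensor."""
    state = np.zeros((GRID, GRID, CHANNELS), dtype=np.float32)
    for x, y, c in units:
        state[y // CELL, x // CELL, c] += 1.0  # cells hold unit counts
    return state

# Two units of type 0 land in the same cell; one unit of type 5 in a far corner
y_t = encode_state([(100, 200, 0), (120, 210, 0), (4000, 4000, 5)])
```

Because cells hold counts rather than binary occupancy, the same encoding supports the regression view of defogging described earlier.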
Accumulating Partial Observations
Unlike vanilla GAN, which generates an image from a latent variable $z$, DefogGAN needs to generate a defogged observation $\hat{y}_t$ given a partial observation. DefogGAN therefore has an autoencoder generator instead of a deconvolutional net.
If a partially observed state $x_t$ lacks temporal information about moving units, it is insufficient to learn how to generate a fully observed state $y_t$. Accumulated partial observation $x_{\leq t}$ provides such temporal information. Later in the paper, we show that using accumulated partial observation as an input increases precision and recall. DefogGAN takes in the concatenated $x_t$ and $x_{\leq t}$:

$$x^c_t = x_t \oplus x_{\leq t},$$

where $\oplus$ denotes channel-wise concatenation.
Note that we use downsampled $x_t$ and $x_{\leq t}$. Since the size of a raw state is too large, we reduce it to $32 \times 32$. More specifically, as shown in Figure 2, a partially observed state is now an array of size $32 \times 32 \times 50$. The downsampling allows DefogGAN to efficiently learn how to generate a fully observed state while preserving the semantic information of a state [27, 8].
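The accumulation and concatenation above can be sketched in a few lines at the downsampled resolution. The element-wise maximum used to merge past frames is our assumption for illustration; the channel split (50 current, 32 accumulated) follows the text.

```python
import numpy as np

# Sketch of accumulating past enemy observations without recurrence.
# Merging frames with an element-wise maximum is our assumption.
def accumulate(past_frames):
    """A unit seen in any earlier frame stays present in the accumulated map."""
    return np.maximum.reduce(past_frames)

def concat_input(x_t, x_acc):
    """Channel-wise concatenation: 50 current + 32 accumulated = 82 channels."""
    return np.concatenate([x_t, x_acc], axis=-1)

frames = [np.zeros((32, 32, 32), dtype=np.float32) for _ in range(3)]
frames[0][5, 5, 2] = 1.0               # enemy seen only in the oldest frame
x_acc = accumulate(frames)
x_in = concat_input(np.zeros((32, 32, 50), dtype=np.float32), x_acc)
```

Because the merge is a simple reduction over stacked frames, no recurrent state needs to be carried between inference steps.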
To use temporal information, we could employ a recurrent neural net. Using a recurrent neural net, however, comes with disadvantages such as information dilution and gradient vanishing. Since StarCraft has a relatively long playing time, recurrent nets in general must consume too many frames (e.g., inferring game states for a 10-minute duration requires 14,400 frames). Our DefogGAN approach has opted for stacking past partial observations onto the current $x_t$. By incorporating accumulated partial observation $x_{\leq t}$, we derive the adversarial objectives

$$\mathcal{L}_D = -\,\mathbb{E}_{y_t}\big[\log D(y_t)\big] - \mathbb{E}_{x^c_t}\big[\log\big(1 - D(G(x^c_t))\big)\big], \qquad \mathcal{L}_{adv} = -\,\mathbb{E}_{x^c_t}\big[\log D(G(x^c_t))\big].$$
Pyramidal Reconstruction Loss
We train the generator by minimizing the reconstruction loss between a generated state $\hat{y}_t$ and the ground truth $y_t$. To further enhance the generator, we introduce pyramidal reconstruction loss as a sum of the MSE between multiple levels of pooling at different sizes (from $32 \times 32$ down to $1 \times 1$). Figure 3 illustrates pyramidal reconstruction loss.
Multiple predictions at different scales are generated by sum pooling. By adjusting filter and stride sizes, sum pooling can generate multiple predictions in a pyramidal shape. More specifically, for a given feature map $y$, the sum pooling function creates $P_s(y)$ with stride and filter size $s$. This can be formulated as follows:

$$P_s(y)_{i,j,c} = \sum_{p=0}^{s-1} \sum_{q=0}^{s-1} y_{s i + p,\, s j + q,\, c},$$

where $(i, j)$ is the coordinates of $P_s(y)$, and $(s i + p, s j + q)$ is the coordinates of $y$.
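The sum pooling $P_s$ above can be implemented with a reshape-and-sum, which keeps the total unit count invariant across scales; the example values are ours.

```python
import numpy as np

def sum_pool(y, s):
    """Non-overlapping sum pooling with stride = filter size = s.
    Reshaping into s x s blocks and summing keeps total counts invariant."""
    h, w, c = y.shape
    return y.reshape(h // s, s, w // s, s, c).sum(axis=(1, 3))

y = np.zeros((32, 32, 66))
y[0, 0, 0] = 3.0
y[1, 1, 0] = 2.0
p = sum_pool(y, 2)                       # 16x16x66; the 2x2 block collapses
assert p[0, 0, 0] == 5.0
assert sum_pool(y, 32)[0, 0, 0] == 5.0   # 1x1 resolution recovers the total
```

The count-invariance is what lets the coarsest level of the pyramid supervise the total number of units.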
At each scale $s$, the reconstruction error of a generated state $\hat{y}_t$ is evaluated as follows:

$$\mathcal{L}^{(s)}_{rec} = \big\| P_s(y_t) - P_s(\hat{y}_t) \big\|_2^2.$$

Summing over resolutions reduced by factors of two, pyramidal reconstruction loss is evaluated by

$$\mathcal{L}_{rec} = \sum_{k=0}^{5} \lambda_k \big\| P_{2^k}(y_t) - P_{2^k}(\hat{y}_t) \big\|_2^2.$$
A scaling factor $\lambda_k$ adjusts the loss values at different scales so that each resolution contributes comparably to the total loss.
The proposed pyramidal reconstruction loss allows DefogGAN to learn the total number of units with the lowest resolution of $1 \times 1$. Finally, by incorporating the reconstruction loss, the generator loss of DefogGAN is extended as

$$\mathcal{L}_G = \mathcal{L}_{adv} + \mathcal{L}_{rec}.$$
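A minimal sketch of the pyramidal reconstruction loss follows. The six power-of-two scales match the $32 \times 32$ grid; the uniform default weights are a placeholder for the paper's scaling factors.

```python
import numpy as np

def sum_pool(y, s):
    """Non-overlapping sum pooling with stride = filter size = s."""
    h, w, c = y.shape
    return y.reshape(h // s, s, w // s, s, c).sum(axis=(1, 3))

def pyramidal_loss(y_true, y_pred, scales=(1, 2, 4, 8, 16, 32), weights=None):
    """Weighted sum of MSEs between sum-pooled targets and predictions.
    Uniform default weights stand in for the paper's scaling factors."""
    if weights is None:
        weights = [1.0] * len(scales)
    total = 0.0
    for w, s in zip(weights, scales):
        diff = sum_pool(y_true, s) - sum_pool(y_pred, s)
        total += w * np.mean(diff ** 2)
    return total
```

The $s = 32$ term compares a single cell per channel, so it directly penalizes errors in the total unit count per type.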
Observation Preserving Connection
The DefogGAN encoder and decoder are connected in a symmetrical structure. We add residual connections between the encoder and decoder at each layer to preserve the parts of the state that have already been seen. By doing so, the generator concentrates on learning the parts hidden in the fog [5, 7]. Through the encoder network, the compressed feature is communicated to the decoder for efficient learning. In particular, the observation preserving connections that tie the beginning and the end convey the information that has already been observed. This allows DefogGAN to focus on the information of the units that need to be inferred. That is, the generator learns the hidden information as a residual on top of the observed information carried by the connections.
The total objective of DefogGAN is

$$\mathcal{L} = \lambda_{adv}\, \mathcal{L}_{adv} + \lambda_{rec}\, \mathcal{L}_{rec},$$

where the hyperparameters $\lambda_{adv}$ and $\lambda_{rec}$ balance the adversarial and reconstruction terms and are fixed across all experiments in this paper.
The DefogGAN generator follows the style of the VGG network. The filter size is fixed at $3 \times 3$. The number of filters doubles whenever the feature map size is reduced by half. DefogGAN does not use any spatial pooling or fully-connected layers but uses convolutional layers to preserve spatial information from input to output.
The DefogGAN generator consists of an encoder, a decoder, and a channel combination layer. The encoder takes the concatenated $32 \times 32 \times 82$ input and extracts semantic features hidden in the fog using convolutional neural networks (CNNs). Each convolutional layer uses batch normalization and a rectified linear unit (ReLU) for nonlinearity [6, 15].
The decoder generates predictive data using the semantically extracted encoder features. The decoding process reconstructs data into a high dimension, and the inference is done using the transposed convolution operation. The decoder produces the same output shape as the input shape. We do not use as many convolutional layers as ResNet, considering the speed of learning given the large initial channel size of 82.
The final channel combination layer consists of a single convolutional layer, which combines the 82 channels of the accumulated partial observations and $x_t$ into the 66 channels of information to predict. This infers $\hat{y}_t$.
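To make the observation preserving pathway concrete, here is a toy encoder-decoder in which fixed sum pooling and nearest-neighbor upsampling stand in for the learned (transposed) convolutions; only the skip additions are the point of this sketch, and the final $1 \times 1$ channel combination layer is omitted.

```python
import numpy as np

# Toy generator showing only the observation preserving (skip) pathway.
# Fixed pooling/upsampling replace the learned convolutions of DefogGAN,
# and the 82 -> 66 channel combination layer is omitted for brevity.
def down(x):
    """Stand-in for a strided conv: halve spatial size by 2x2 sum pooling."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).sum(axis=(1, 3))

def up(x):
    """Stand-in for a transposed conv: nearest-neighbor upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def generator(x):
    e1 = down(x)      # encoder level 1: 16x16
    e2 = down(e1)     # encoder level 2 (bottleneck): 8x8
    d2 = up(e2) + e1  # skip re-injects what the encoder already saw
    d1 = up(d2) + x   # outermost skip preserves the raw observation
    return d1
```

Because each decoder level adds back the corresponding encoder feature, the already-observed units flow straight to the output and the (learned) layers only need to model the residual hidden in the fog.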
We have collected a large dataset of more than 33,000 replays of professional StarCraft players. Our experiments utilize replay log files, which contain detailed unit information. For each frame, a concatenated partial observation ($x^c_t$) of the fogged map exists alongside the corresponding ground truth ($y_t$). From each episode, we use only the portion from the 7th to the 17th minute. This is because high-level units in a StarCraft game start to appear at about 7 minutes. Also, a game typically finishes in 10 to 20 minutes, although few replays exceed 17 minutes. Our dataset comprises 496,830 frames. We use 80% of the data for training, 10% for validation, and 10% for testing.
Table 1 summarizes the DefogGAN input-output statistics, including partially observed states $x_t$, accumulated partially observed states $x_{\leq t}$, and ground truth $y_t$. On average, 54% of the total number of units are seen in partial observation, and 83% are seen in accumulated partial observation. Note that accumulated partial observation causes type 1 errors (i.e., false positives) because accumulated states contain the previous locations of moving units that are obsolete at the current time. Given this output space, the defog problem is to select an average of 141 spaces out of the 67,584 ($32 \times 32 \times 66$) spaces possible.
For performance evaluation, we compute five metrics:
Mean Squared Error (MSE)
The MSE between $y_t$ and $\hat{y}_t$ is

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} \big( y_t^{(i)} - \hat{y}_t^{(i)} \big)^2,$$

where $N$ is the number of grid cells across all channels. Our MSE criterion measures: 1) correct prediction of the unit types present at each location; and 2) correct prediction of how many units are present.
Accuracy, Precision, Recall and F1 score
Accuracy indicates how well the existence of units is predicted. Recall reflects how much the false negative rate (type 2 error) is improved. From the DefogGAN perspective, type 2 error is the more practical indicator because the damage caused by an unexpected enemy (false negative) is greater than that of a nonexistent enemy (false positive). Precision represents type 1 error as a percentage of what is predicted to exist. The F1 score is the harmonic mean of recall and precision.
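These metrics can be computed directly from the count grids. Thresholding predictions at 0.5 to decide existence is our assumption for this sketch, as is the helper name.

```python
import numpy as np

def defog_metrics(y_true, y_pred, thresh=0.5):
    """Existence-based accuracy/precision/recall/F1 over grid cells, plus
    count-level MSE. The 0.5 existence threshold is an assumption."""
    t = y_true > 0            # ground-truth existence per cell/channel
    p = y_pred > thresh       # predicted existence
    tp = np.sum(t & p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    tn = np.sum(~t & ~p)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    accuracy = (tp + tn) / t.size
    mse = np.mean((y_true - y_pred) ** 2)
    return dict(mse=mse, accuracy=accuracy,
                precision=precision, recall=recall, f1=f1)
```

Separating the count-level MSE from the existence-level scores mirrors the distinction drawn above: MSE rewards getting the number of units right, while precision and recall only score where units exist.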
Determining Generator Training Interval
We have experimentally determined a reasonable amount of data needed for training the DefogGAN generator. Table 2 summarizes the generator performance measured in MSE, accuracy, and F1 score computed with a varying number of frames used in training. Due to the nature of the DefogGAN prediction, the MSE criterion is most valuable. The empirical results suggest that training with 10 seconds' worth of frames is best among our tested intervals.
In this section, we present a comparative performance analysis for DefogGAN. A rule-based StarCraft agent using accumulated partial observation is a reasonable baseline: a prediction model needs to do at least better than simply memorizing the partial observation history. For a comprehensive comparison, we select a diverse range of models, including an autoencoder-based model, CED [27, 8]; simple GAN-based models, DCGAN and BEGAN began; and WGAN-based models, WGAN-GP wgangp and cWGAN.
Comparison with baseline
As shown in Table 4, DefogGAN results in a 44% decrease in MSE compared to the baseline. DefogGAN predicts the number of units in a given cell more accurately than the baseline because it is able to predict enemy units hidden in the fog. On the other hand, DefogGAN provides similar prediction performance in terms of accuracy and F1 score. Note that accuracy and F1 score do not measure how accurately the number of units is predicted, only how accurately their existence is predicted. The result can thus be understood as DefogGAN predicting the number of units much more precisely while correctly predicting the overall distribution of units on the map.
Comparison with autoencoder model
Compared to CED, an autoencoder-based model, DefogGAN provides about a 33% decrease in MSE and about a 17-percentage-point increase in F1 score. Note that the recall of DefogGAN is very high compared to that of CED. This high recall means that DefogGAN successfully discovers enemy units hidden in the fog. High recall is very important in StarCraft, since undetected enemy units (i.e., low recall) can pose threats such as sudden attacks.
Comparison with GAN-based models
DefogGAN makes better predictions than the other GAN-based models. As shown in Table 4, unconditional GAN models such as DCGAN and BEGAN perform very poorly, mainly because they are trained without a reconstruction loss. WGAN-GP makes relatively good predictions without a reconstruction loss but does not exceed DefogGAN. We conjecture that the Wasserstein distance of WGAN-GP has an effect similar to a reconstruction loss in training. We therefore additionally compare with cWGAN, a WGAN variant that has a reconstruction loss. However, cWGAN does not outperform WGAN-GP.
Visualization of prediction results
The prediction performance of DefogGAN can be effectively explained with the visualization in Figure 4. We randomly select four replays and present the defogged fully observed states predicted by each model. For example, in replay 4, we cannot see the red enemy units in the lower right corner of the partially observed state $x_t$. Also, we can only see a subset of enemy units from the accumulated partially observed states $x_{\leq t}$. By using both the observation and the accumulated observation, DefogGAN generates a fully observed state that looks most similar to the ground truth. Since DCGAN and BEGAN do not use a reconstruction loss, they fail to generate a fully observed state with a pattern similar to the ground truth. CED generates fairly plausible full states, but DefogGAN generates more accurate results. WGAN-GP generates plausible full states without a reconstruction loss but tends to generate false positives (i.e., low precision). cWGAN (a WGAN-GP variant that additionally uses a reconstruction loss) reduces such false positives but still does not predict better than DefogGAN.
We evaluate the contribution of each proposed component: combining accumulated partial observation $x_{\leq t}$ with partial observation $x_t$, the joint (adversarial plus reconstruction) loss, the reduced-resolution pyramidal loss, and the observation preserving connections.
Table 5 shows that each of the proposed techniques contributes to DefogGAN's performance in the ablation study.
Effect of concatenated partial observation
With the concatenated partial observation, the MSE is 29% better than using only the accumulated partial observation and 51% better than using only the current partial observation. This indicates the importance of utilizing past information. Moreover, when the current and accumulated partial observations are combined, the total number of units observed in the past is identified, and information known to be free of type 1 errors is available for learning. In other words, the combination improves performance by exposing as many units as possible together with the units whose presence can be confirmed as correct.
Effect of adversarial learning
The third row of Table 5 shows the overall accuracy performance of DefogGAN when trained without adversarial loss. Without adversarial loss, the overall accuracy performance significantly decreases. MSE increases about 49% (i.e., from 0.00208 to 0.00310). F1 score decreases by 0.194 (i.e., from 0.856 to 0.662). In the area of image generation, learning with adversarial loss generates clearer images than learning with MSE loss [19, 7]. In DefogGAN, we see a similar effect. We conjecture that adversarial loss also helps accurately predict the fully observed states of a game.
Effect of reconstruction loss
Pyramidal reconstruction loss helps to learn fully observed states. Since it measures the difference between a predicted fully observed state and the ground truth at multiple scales, it helps DefogGAN accurately predict the total number of units hidden in the fog.
Effect of observation preserving connection
As shown in the 6th row of Table 5, when trained without the observation preserving connections, the overall accuracy performance of DefogGAN significantly decreases. More specifically, MSE nearly doubles, increasing about 97% (i.e., from 0.00208 to 0.00410). F1 score decreases by 0.340 (i.e., from 0.856 to 0.516). This is a similar effect to the skip connections of U-Net, which provide better results by allowing information to flow from input to output.
We have presented DefogGAN, a generative approach for game AI to predict crucial state information unavailable due to the fog of war. DefogGAN accurately generates defogged images of a game that can be used to improve win rates against expert human players. In our experiments with StarCraft, we have validated that DefogGAN achieves superior performance against state-of-the-art defoggers. Improving on imperfect information during RTS gameplay could bring substantially better macro-management overall, although this remains an ongoing research area for game AI. DefogGAN is one of the earliest applications of adversarial learning to the fog of war problem, and it can be applied to other real-world POMDP problems.
The authors would like to thank Dr. Wonpyo Hong, CEO and President of Samsung SDS, whose vision in AI has led to create SAIDA Lab for advanced AI research.
-  (2017) Wasserstein GAN. arXiv preprint arXiv:1701.07875. Cited by: Generative Adversarial Nets (GAN).
-  (2019) Single image haze removal using conditional Wasserstein generative adversarial networks. arXiv preprint arXiv:1903.00395. Cited by: Accuracy Comparison.
-  (2018) De novo structure prediction with deep learning-based scoring. Annu Rev Biochem. Cited by: Introduction.
-  (2014) Generative adversarial nets. NeurIPS. Cited by: Introduction, Generative Adversarial Nets (GAN).
-  (2016) Deep residual learning for image recognition. In Proc. of CVPR, pp. 770–778. Cited by: Observation Preserving Connection, Generator.
-  (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. Cited by: Generator.
-  (2017) Image-to-image translation with conditional adversarial networks. In Proc. of CVPR, pp. 1125–1134. Cited by: Observation Preserving Connection, Effect of adversarial learning.
-  (2018) Clear the fog: combat value assessment in incomplete information games with convolutional encoder-decoders. arXiv preprint arXiv:1811.12627. Cited by: Introduction, StarCraft AI, Accumulating Partial Observations, Accuracy Comparison.
-  (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: Training.
-  (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. Cited by: Introduction, Generative Approaches for Defogging.
-  (2017) STARDATA: a StarCraft AI research dataset. In Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference. Cited by: Dataset.
-  (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784. Cited by: Introduction.
-  (2013) Playing Atari with deep reinforcement learning. NeurIPS Deep Learning Workshop. Cited by: Introduction, Accumulating Partial Observations.
-  (1982) State of the art—a survey of partially observable Markov decision processes: theory, models, and algorithms. Management Science 28 (1), pp. 1–16. Cited by: Introduction.
-  (2010) Rectified linear units improve restricted Boltzmann machines. In Proc. of ICML. Cited by: Generator.
-  (2019) EdgeConnect: generative image inpainting with adversarial edge learning. arXiv preprint arXiv:1901.00212. Cited by: Generative Approaches for Defogging.
-  (2013) A survey of real-time strategy game AI research and competition in StarCraft. IEEE Trans. on Computational Intelligence and AI in Games. Cited by: Introduction, Baseline.
-  (2012) Prediction of early stage opponents' strategy for StarCraft AI using scouting and machine learning. Proc. of the Workshop at SIGGRAPH Asia. Cited by: StarCraft AI.
-  (2016) Context encoders: feature learning by inpainting. In Proc. of CVPR, pp. 2536–2544. Cited by: Effect of adversarial learning.
-  (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. Cited by: Discriminator, Accuracy Comparison.
-  (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: Effect of observation preserving connection.
-  (2015) Review of integrated applications with AIML-based chatbot. In 2015 International Conference on Computer and Information Engineering (ICCIE). Cited by: Introduction.
-  (2014) A scouting strategy for real-time strategy games. In Proc. of the 2014 Conference on Interactive Entertainment, pp. 1–8. Cited by: StarCraft AI.
-  (2016) Mastering the Game of Go with Deep Neural Networks and Tree Search. nature. Cited by: Introduction.
-  (2018) A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science. Cited by: Introduction.
-  (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: Generator.
-  (2018) Forward modeling for partial observation strategy games - a StarCraft defogger. NeurIPS. Cited by: Introduction, StarCraft AI, Accumulating Partial Observations, Baseline, Accuracy Comparison.
-  (2011) A particle model for state estimation in real-time strategy games. In Seventh Artificial Intelligence and Interactive Digital Entertainment Conference, Cited by: StarCraft AI.
-  (2018) Macro action selection with deep reinforcement learning in StarCraft. arXiv preprint arXiv:1812.00336. Cited by: Introduction, StarCraft AI.