1 Introduction
When we learn to solve a problem, we can learn a policy that directly maps the problem to the solution. This amounts to fast thinking, which underlies reflexive or impulsive behavior and muscle memory, and it can happen when one is emotional or under time constraint. We may also learn an objective function or value function that assigns values to candidate solutions, and optimize the objective function by an iterative algorithm to find the most valuable solution. This amounts to slow thinking, which underlies planning, searching, or optimal control, and it can happen when one is calm or has time to think things through.
While the fast thinking policy and the slow thinking planning are commonly studied for solving sequential decision problems such as reinforcement learning [25] and optimal control [4], they can also be used to solve non-sequential decision problems. For instance, a reviewer's decision on whether to accept a paper can be based on gut feeling (fast thinking) or careful deliberation (slow thinking), and this is a one-shot non-sequential decision problem. We shall study such a problem in this paper.
Specifically, we shall study the supervised learning of the conditional distribution of a high-dimensional output given an input, where the output and input belong to two different modalities. For instance, the output may be an image, while the input may be a class label, a text description, or a sketch. The input defines the problem, and the output is the solution. We also refer to the input as the source or condition, and the output as the target.
We solve this problem by learning two models cooperatively. One model is policy-like. It generates the output directly by a nonlinear transformation of the input and a noise vector, where the noise vector accounts for randomness or uncertainty in the output. This amounts to fast thinking because the conditional generation is accomplished by direct sampling. The other model is planner-like. It learns an objective function in the form of a conditional energy function, so that the output can be generated by optimizing the objective function, or more rigorously by sampling from the conditional energy-based model, where the sampling accounts for randomness and uncertainty. This amounts to slow thinking because the sampling is accomplished by an iterative algorithm such as Langevin dynamics, which is an example of Markov chain Monte Carlo (MCMC). We propose to learn the two models jointly, where the policy-like model serves to initialize the sampling of the planner-like model, and the planner-like model refines the initial solution by an iterative algorithm. The planner-like model learns from the difference between the refined solution and the observed solution, while the policy-like model learns from the difference between the initial solution and the refined solution.
Figure 1 conveys the basic idea. The algorithm iterates two steps, a solving step and a learning step. The solving step consists of two stages. Solve-fast: the policy-like model generates the initial solution. Solve-slow: the planner-like model refines the initial solution. The learning step also consists of two parts. Learn-policy: the policy-like model learns from how the planner-like model refines its initial solution. Learn-planner: the planner-like model updates its objective function by shifting its high value region from the refined solution to the observed solution.
Figure 2(a) illustrates the Learn-policy step. In the Solve-fast step, the policy-like model generates the latent noise vector, which, together with the input condition, is mapped to the initial solution. In the Learn-policy step, the policy-like model updates its parameters so that it maps the latent vector to the refined solution, in order to absorb the refinement made by the planner-like model. Because the latent vector is known, it does not need to be inferred, and the learning is easy.
Figure 2(b) illustrates the Learn-planner step. In the Solve-slow step, the planner-like model finds the refined solution in the high value region around a mode of the objective function. In the Learn-planner step, the planner-like model updates its parameters so that the objective function shifts its high value region around the mode toward the observed solution, so that in the next iteration, the refined solution will get closer to the observed solution.
The planner-like model shifts its mode toward the observed solution, while inducing the policy-like model to map the latent vector to its mode.
Learning a policy-like model is like mimicking "how", while learning a planner-like model is like trying to understand "why" in terms of the goal or value underlying the action.
Why planner? The reason we need a planner-like model in addition to a policy-like model is that it is often easier to learn the objective function than to learn to generate the solution directly, since it is always easier to demand or desire something than to actually produce it. Because of its relative simplicity, the learned objective function can be more generalizable than the learned policy. For instance, in an unfamiliar situation, we tend to be tentative, relying on slow thinking planning rather than fast thinking habit.
Efficiency. Even though we use the wording "slow thinking", it is only relative to "fast thinking". In fact, the slow thinking planning is usually fast enough, especially when it is jump-started by the fast thinking policy, and there is no problem scaling up our method to big datasets. Therefore the time efficiency of the slow thinking method is not a concern.
Student-teacher vs. actor-critic. We may consider the policy-like model as a student model, and the planner-like model as a teacher model. The teacher refines the initial solution of the student by a refinement process, and distills the refinement process into the student. This is different from the actor-critic relationship in (inverse) reinforcement learning [1, 37, 9], because the critic does not refine the actor's solution by a slow thinking process.
Associative memory. The two models may also be considered as associative memory [10]. While the policy-like model is like sudden recall, the planner-like model is like rumination, filling in and playing out details.
We apply our learning method to various conditional image generation tasks. Our experiments show that the proposed method is effective compared to other methods, such as those based on GAN [6].
2 Contributions and related work
This paper proposes a novel method for supervised learning of high-dimensional conditional distributions by learning a fast thinking policy-like model and a slow thinking planner-like model. We show the effectiveness of our method on conditional image generation and recovery tasks.
Perhaps more importantly, we propose a different method for conditional learning than GAN-based methods. Unlike GAN methods, our method has a learned objective function to guide a slow thinking process for sampling or optimization. The proposed strategy may be applied to a broad range of problems in AI. The interaction between the fast thinking policy and the slow thinking planner can be of interest to cognitive science.
The following are related themes of research.
Inverse reinforcement learning. Although we adopt the terminology of inverse reinforcement learning and inverse optimal control [1, 37] to explain our method, we are concerned with supervised learning instead of reinforcement learning. Unlike the action space in reinforcement learning, the output in our work is of a much higher dimension, a fact that also distinguishes our work from common supervised learning problems such as classification. As a result, the policy-like model needs to transform a latent noise vector to generate the initial solution, and this is different from the policy in reinforcement learning, where the policy is defined by the conditional distribution of action given state, without resorting to a latent vector.
Conditional random field. The objective function and the conditional energy-based model can also be considered a form of conditional random field [14]. Unlike a traditional conditional random field, our conditional energy function is defined by a deep network, and its sampling process is jump-started by a policy-like model.
Multimodal generative learning. Learning the joint probability distribution of signals of different modalities enables us to recover or generate one modality based on the others. For example, [33] learns a dual-wing harmonium model for image and text data. [20] learns a stacked multimodal autoencoder on video and audio data. [24] learns a multimodal deep Boltzmann machine for joint image and text modeling. Our work focuses on the conditional distribution of one modality given another modality, and our method involves the cooperation between two types of models.
Conditional adversarial learning. A popular method for multimodal learning is the conditional GAN, where both the generator and discriminator networks are conditioned on the source signal, such as discrete class labels, text, etc. For example, [18, 5] use conditional GANs for image synthesis based on class labels. [21, 35] study text-conditioned image synthesis. Other examples include multimodal image-to-image mapping [28, 11, 36, 17] and super-resolution [16]. Our work studies similar problems. The difference is that our method is based on a conditional energy function and an iterative algorithm guided by this objective function. Existing adversarial learning methods, including those in inverse reinforcement learning [9], do not involve this slow thinking planning process.

Cooperative learning. Just as the conditional GAN is inspired by the original GAN [6], our learning method is inspired by the recent work of [30], where the models are unconditioned. While unconditioned generation is interesting, conditional generation and recovery are much more useful in applications. The conditional distributions are also much better behaved than the unconditioned distributions, because the former can be much less multi-modal, so that the conditional sampling and learning can be easier and more stable.
3 Conditional learning
Let Y be the D-dimensional signal of the target modality, and let C be the signal of the source modality, where "C" stands for "condition". C defines the problem, and Y is the solution. Our goal is to learn the conditional distribution p(Y | C) of the target signal Y (solution) given the source signal C (problem) as the condition. We shall learn p(Y | C) from the training dataset of pairs {(Y_i, C_i), i = 1, ..., n} with the fast thinking policy-like model and the slow thinking planner-like model.
3.1 Slow thinking planner-like model
The planner-like model is based on an objective function or value function f(Y, C; θ) defined on (Y, C). f(Y, C; θ) can be defined by a bottom-up convolutional network (ConvNet), where θ collects all the weight and bias parameters. f(Y, C; θ) defines a joint energy-based model [31]:

p(Y, C; θ) = (1/Z(θ)) exp[ f(Y, C; θ) ],   (1)

where Z(θ) = ∫ exp[ f(Y, C; θ) ] dY dC is the normalizing constant.

Fixing the source signal C, f(Y, C; θ) defines the value of the solution Y for the problem defined by C, and −f(Y, C; θ) defines the conditional energy function. The conditional probability is given by

p(Y | C; θ) = (1/Z(C; θ)) exp[ f(Y, C; θ) ],   (2)

where Z(C; θ) = ∫ exp[ f(Y, C; θ) ] dY. The learning of this model seeks to maximize the conditional log-likelihood function
L(θ) = (1/n) Σ_{i=1}^{n} log p(Y_i | C_i; θ),   (3)

whose gradient is

L′(θ) = (1/n) Σ_{i=1}^{n} { ∂f(Y_i, C_i; θ)/∂θ − E_{p(Y|C_i;θ)}[ ∂f(Y, C_i; θ)/∂θ ] },   (4)

where E_{p(Y|C;θ)} denotes the expectation with respect to p(Y | C; θ). The identity underlying (4) is ∂ log Z(C; θ)/∂θ = E_{p(Y|C;θ)}[ ∂f(Y, C; θ)/∂θ ].
The expectation in (4) is analytically intractable and can be approximated by drawing samples from p(Y | C; θ) and then computing the Monte Carlo average. The sampling can be carried out by an iterative algorithm, which is a slow thinking process. One solver is the Langevin dynamics for sampling Y ~ p(Y | C; θ). It iterates the following step:
Y_{τ+1} = Y_τ + (s²/2) ∂f(Y_τ, C; θ)/∂Y + s U_τ,   (5)

where τ indexes the time steps of the Langevin dynamics, s is the step size, and U_τ ~ N(0, I_D) is Gaussian white noise, with D the dimensionality of Y. A Metropolis-Hastings acceptance-rejection step can be added to correct for the finite step size s. The Langevin dynamics is gradient descent on the energy function, plus noise for diffusion, so that it samples the distribution instead of being trapped in the local modes.

For each observed condition C_i, we run the Langevin dynamics according to (5) to obtain the corresponding synthesized example Ỹ_i as a sample from p(Y | C_i; θ). The Monte Carlo approximation to L′(θ) is
L′(θ) ≈ (1/n) Σ_{i=1}^{n} [ ∂f(Y_i, C_i; θ)/∂θ − ∂f(Ỹ_i, C_i; θ)/∂θ ].   (6)

We can then update θ by gradient ascent based on (6).
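As a numerical illustration of the refinement step, the sketch below iterates the Langevin update in (5) under a toy quadratic value function f(Y, C) = −‖Y − C‖²/(2σ²), chosen only so that its gradient is available in closed form; in the actual model the gradient would come from back-propagating through the ConvNet f(Y, C; θ). The dimensions, step size, and number of steps are illustrative assumptions.

```python
import numpy as np

def langevin_refine(y_init, c, grad_f, n_steps=200, step_size=0.05, rng=None):
    """Iterate the Langevin step of Eq. (5):
    Y_{tau+1} = Y_tau + (s^2 / 2) * df/dY + s * N(0, I)."""
    rng = np.random.default_rng(0) if rng is None else rng
    y = np.array(y_init, dtype=float)
    for _ in range(n_steps):
        y = y + 0.5 * step_size**2 * grad_f(y, c) + step_size * rng.standard_normal(y.shape)
    return y

# Toy value function f(Y, C) = -||Y - C||^2 / (2 * sigma^2), a stand-in for the ConvNet.
sigma = 0.1
grad_f = lambda y, c: (c - y) / sigma**2

c = np.array([1.0, -2.0])   # condition; it defines the mode of f
y0 = c + 5.0                # a poor initial solution, far from the mode
y_refined = langevin_refine(y0, c, grad_f)
print(np.linalg.norm(y_refined - c))   # small: the dynamics settles near the mode
```

Starting from a poor initial solution far from the mode, the dynamics settles into the high value region around C, which is exactly the role it plays in the Solve-slow step.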
Value shift: The above gradient ascent algorithm serves to increase the average value of the observed solutions relative to that of the refined solutions; i.e., on average, it shifts the high value region or mode of f(Y, C; θ) from the generated solution Ỹ toward the observed solution Y.
The convergence of such a stochastic gradient ascent algorithm has been studied by [34].
3.2 Fast thinking policy-like model
The policy-like model is of the following form:

X ~ N(0, I_d),  Y = g(X, C; α),   (7)

where X is the d-dimensional latent noise vector, and g is a top-down ConvNet defined by the parameters α. The ConvNet g maps the latent noise vector X and the observed condition C to the signal Y directly. If the source signal C is of high dimensionality, we can parametrize g by an encoder-decoder structure: we first encode C into a latent vector, and then we map it, together with X, to Y by a decoder. Given C, we can generate Y from the conditional generator model by direct sampling, i.e., first sampling X from its prior distribution, and then mapping (X, C) into Y directly. This is fast thinking without iteration.
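For contrast with the iterative refinement above, fast thinking sampling is a single forward pass. The sketch below uses a tiny fixed-weight tanh network as a stand-in for the top-down ConvNet g; the weights, layer sizes, and the simple concatenation of (X, C) are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

d, c_dim, out_dim = 8, 3, 16   # latent, condition, and output dimensions (illustrative)
W1 = rng.standard_normal((32, d + c_dim)) * 0.1   # stand-in weights for g(.; alpha)
W2 = rng.standard_normal((out_dim, 32)) * 0.1

def g(x, c):
    """Policy-like model: deterministic mapping (X, C) -> Y, as in Eq. (7)."""
    h = np.tanh(W1 @ np.concatenate([x, c]))
    return np.tanh(W2 @ h)

c = np.array([1.0, 0.0, 0.0])   # a one-hot condition
x = rng.standard_normal(d)      # X ~ N(0, I_d)
y = g(x, c)                     # one forward pass: no iteration
print(y.shape)                  # (16,)
```

Different draws of X with the same C give different solutions to the same problem, which is how the model represents uncertainty without any iterative computation.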
We can learn the policy-like model from the training pairs {(Y_i, C_i)} by maximizing the conditional log-likelihood Σ_{i=1}^{n} log p(Y_i | C_i; α), where p(Y | C; α) is obtained from (7) by integrating out X. The learning algorithm iterates the following two steps. (1) Sample X_i from p(X | Y_i, C_i; α) by Langevin dynamics. (2) Update α by gradient descent on Σ_i ‖Y_i − g(X_i, C_i; α)‖². See [7] for details.
3.3 Cooperative training
The policy-like model and the planner-like model cooperate with each other as follows.
(1) The policy-like model supplies initial samples for the MCMC of the planner-like model. For each observed condition input C_i, we first generate X_i ~ N(0, I_d), and then generate the initial solution Ŷ_i = g(X_i, C_i; α). If the current policy-like model is close to the current planner-like model, then the generated {Ŷ_i} should be a good initialization for sampling from the planner-like model p(Y | C_i; θ), i.e., starting from the initial solutions {Ŷ_i}, we run Langevin dynamics for a finite number of steps to get the refined solutions {Ỹ_i}. These {Ỹ_i} serve as the synthesized examples from the planner-like model and are used to update θ in the same way as we learn the planner-like model in equation (6) for value shifting.
(2) The policy-like model then learns from the MCMC. Specifically, the policy-like model treats the {Ỹ_i} produced by the MCMC as the training data. The key is that these {Ỹ_i} are obtained by the Langevin dynamics initialized from the {Ŷ_i}, which are generated by the policy-like model with known latent noise vectors {X_i}. Given {(X_i, C_i, Ỹ_i)}, we can learn α by minimizing Σ_i ‖Ỹ_i − g(X_i, C_i; α)‖², which is a nonlinear regression of Ỹ_i on (X_i, C_i). This can be accomplished by gradient descent,

Δα ∝ − ∂/∂α Σ_{i=1}^{n} ‖Ỹ_i − g(X_i, C_i; α)‖².   (8)

Mapping shift: Initially, g(·, ·; α) maps (X_i, C_i) to the initial solution Ŷ_i. After updating α, g maps (X_i, C_i) to the refined solution Ỹ_i. Thus the updating of α absorbs the MCMC transitions that change Ŷ_i to Ỹ_i. In other words, we distill the MCMC transitions of the refinement process into g(·, ·; α).
Algorithm 1 presents a description of the conditional learning with two models. See Figures 1 and 2 for illustrations.
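To make the cooperative scheme concrete, the toy run below executes the whole loop on a one-dimensional problem where everything is tractable: the data follow Y = 2C + noise, the planner's value function is f(Y, C; w) = −(Y − wC)²/(2σ²) with a single trainable parameter w, and the policy is Y = aC + bX with trainable a. All models, learning rates, and step counts are illustrative assumptions; the point is the ordering of the four steps (Solve-fast, Solve-slow, Learn-planner, Learn-policy).

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, s, lr = 0.1, 0.05, 0.05   # model std, Langevin step size, learning rate
w = 0.0                          # planner parameter: f(Y, C; w) = -(Y - w*C)^2 / (2*sigma^2)
a, b = 0.0, 0.1                  # policy parameters: Y = a*C + b*X (only a is trained here)

for _ in range(100):
    c = rng.uniform(1.0, 2.0, size=100)               # observed conditions
    y_obs = 2.0 * c + 0.1 * rng.standard_normal(100)  # observed solutions, Y ~ 2C + noise
    x = rng.standard_normal(100)                      # known latent noise vectors

    # Solve-fast: the policy-like model proposes initial solutions.
    y = a * c + b * x
    # Solve-slow: Langevin refinement under f, as in Eq. (5).
    for _ in range(100):
        y += 0.5 * s**2 * (w * c - y) / sigma**2 + s * rng.standard_normal(100)
    # Learn-planner (value shift), Eq. (6): observed-minus-refined gradient;
    # the constant 1/sigma^2 factor of df/dw is absorbed into the learning rate.
    w += lr * np.mean((y_obs - y) * c)
    # Learn-policy (mapping shift), Eq. (8): regress the refined y on (c, x).
    a += lr * np.mean((y - a * c - b * x) * c)

print(w, a)   # both approach 2, the true coefficient
```

Both the planner parameter w and the policy parameter a converge to the true coefficient: the planner learns it by value shift against the observed data, and the policy inherits it by regressing on the planner's refinements.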
See the supplementary materials for a theoretical understanding of our learning method.
4 Experiments
We test the proposed framework for multimodal conditional learning on a variety of tasks.
4.1 Experiment 1: Category → Image
4.1.1 Conditional image generation
We start from learning the conditional distribution of an image given a category or class label. We learn the two models jointly on 30,000 MNIST handwritten digit images conditioned on their class labels, which are encoded as one-hot vectors.
In the policy-like model, we concatenate the 10-dimensional one-hot vector c with the 100-dimensional latent noise vector X sampled from N(0, I) as the input of the top-down ConvNet, to build a conditional generator g(X, c; α). The generator maps the 110-dimensional input (i.e., (X, c)) into the 28 × 28 digit image by 4 layers of deconvolutions, with up-sampling from top to bottom. Batch normalization and ReLU layers are used between the deconvolution layers, and a tanh non-linearity is added at the bottom layer.
To build the planner-like model, we first use a decoder to decode the one-hot vector c into a "template image" and perform channel concatenation with the target image Y. The value function is defined by a bottom-up ConvNet that maps the class decoding and the target image to the value. The decoder has the same structure as the generator in the policy-like model, except that its input is the 10-dimensional one-hot vector alone. We parametrize the bottom-up ConvNet by 3 layers of convolutions with down-sampling from bottom to top, followed by a fully-connected layer. The numbers of channels at different layers are 64, 128, 256, and 100 from bottom to top.
We use Adam [12] for optimization. The joint models are trained with mini-batches of size 100. Figure 3 shows some of the generated samples conditioned on the class labels after training. Each row is conditioned on one label, and each column is a different generated sample.
To evaluate the learned conditional distribution, Table 1 shows Gaussian Parzen window log-likelihood estimates of the MNIST [15] test set. We sample 10,000 examples from the learned conditional distribution by first sampling the class label c from the uniform prior distribution and X from N(0, I); then the policy-like model and the planner-like model cooperatively generate the synthesized example from the sampled c and X. A Gaussian Parzen window is fitted to these synthesized examples, and the log-likelihood of the test set under the Parzen window distribution is estimated. The standard deviation of the Gaussians is chosen by cross-validation. We follow the same procedure as
[6] for computing the log-likelihood estimates for fair comparison.

Model                 log-likelihood
DBN [3]               138 ± 2.0
Stacked CAE [3]       121 ± 1.6
Deep GSN [2]          214 ± 1.1
GAN [6]               225 ± 2.0
Conditional GAN [18]  132 ± 1.8
ours                  226 ± 2.1
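For completeness, the Parzen window estimate used in Table 1 can be computed as follows. This is a generic sketch of the procedure described above (fit an isotropic Gaussian kernel at each generated sample, then score the test set with a stable log-mean-exp); the toy two-dimensional data stand in for generated digit images.

```python
import numpy as np

def parzen_log_likelihood(samples, test_points, sigma):
    """Mean log-likelihood of test_points under a Gaussian Parzen window:
    log (1/n) sum_i N(x; sample_i, sigma^2 I), averaged over test points."""
    n, d = samples.shape
    # Squared distances between every test point and every sample.
    sq = ((test_points[:, None, :] - samples[None, :, :]) ** 2).sum(-1)
    log_kernel = -sq / (2 * sigma**2) - 0.5 * d * np.log(2 * np.pi * sigma**2)
    # Log-mean-exp over samples, computed stably.
    m = log_kernel.max(axis=1, keepdims=True)
    ll = m[:, 0] + np.log(np.exp(log_kernel - m).mean(axis=1))
    return ll.mean()

rng = np.random.default_rng(0)
generated = rng.standard_normal((1000, 2))   # stand-in for generated examples
test_in = rng.standard_normal((100, 2))      # test data from the same distribution
test_out = test_in + 10.0                    # test data far from the samples
print(parzen_log_likelihood(generated, test_in, sigma=0.5) >
      parzen_log_likelihood(generated, test_out, sigma=0.5))   # True
```

The bandwidth sigma plays the role of the cross-validated standard deviation in the evaluation protocol; test data resembling the generated samples score a higher log-likelihood.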
We also test the proposed framework on the CIFAR-10 [13] object dataset, which contains 60k images of 32 × 32 pixels, with the same architecture mentioned above. Figure 4 shows the generated object patterns. Each row is conditioned on one category. The first two columns display some typical training examples, while the remaining columns show generated images conditioned on labels. We evaluate the learned conditional distribution by computing the Inception scores of the generated examples. Table 2 compares our framework against some baselines for conditional learning. It can be seen that in the proposed cooperative framework, the solution provided by the policy-like model can be further refined by the planner-like model.
4.1.2 Disentangling style and content
A realistic conditional generative model can be useful for exploring the underlying structure of the data by manipulating the latent variables and the condition variable. In this section, we investigate the disentanglement of content and style. The one-hot vector c in the policy-like model mainly accounts for content information, such as the label, but it does not account for style, e.g., shape, rotation, size, etc. Therefore, in order to generate realistic and diverse images, the policy-like model must learn to use the noise sample X (i.e., the latent variables) to capture style variations.
In this experiment, we train a planner-like model jointly with a policy-like model with a two-dimensional latent noise vector on the MNIST dataset. With the learned models, we first use the policy-like model to generate images by fixing the category label c while varying the latent vector X = (X₁, X₂) over a grid, where we discretize both X₁ and X₂ into 10 equally spaced values, and then use the planner-like model to refine each generated example with the corresponding category label c. Figure 5 displays two examples of visualization of handwriting styles, with category labels set to digit 4 and digit 9 respectively. In both examples, nearby regions of the latent space correspond to similar handwriting styles, which are independent of the category labels.
4.1.3 Style transfer
We demonstrate that the learned model can perform style transfer from an unseen testing image onto other categories. The models are trained on the SVHN [19] dataset, which contains 10 classes of digits collected from street-view house numbers. With the learned policy-like model, we first infer the latent variables corresponding to a testing image. We then fix the inferred latent vector, change the category label c, and generate different categories of images with the same style as the testing image by the learned model. Given a testing image Y with known category label c, the inference of the latent vector X can be performed by directly sampling from the posterior distribution p(X | Y, c; α) via Langevin dynamics, which iterates

X_{τ+1} = X_τ + (s²/2) ∂ log p(X_τ | Y, c; α)/∂X + s U_τ.   (9)
If the category label of the testing image is unknown, we need to infer both c and X from Y. Since c is a one-hot vector, in order to adopt a gradient-based method to infer c, we adopt a continuous approximation by reparametrizing c using a softmax transformation on auxiliary continuous variables a. Specifically, let a = (a_k, k = 1, ..., K) and c = (c_k, k = 1, ..., K); we reparametrize c_k = exp(a_k) / Σ_{k′=1}^{K} exp(a_{k′}) for k = 1, ..., K, and assume the prior for a to be N(0, I_K). Then the Langevin dynamics for sampling (X, a) ~ p(X, a | Y; α) iterates

(X_{τ+1}, a_{τ+1}) = (X_τ, a_τ) + (s²/2) ∂ log p(X_τ, a_τ | Y; α)/∂(X, a) + s U_τ.   (10)
Figure 6 shows 8 results of style transfer. For each testing image Y, we infer a and X by sampling (X, a) ~ p(X, a | Y; α), which iterates (1) updating X given the current a, and (2) updating a given the current X, where c is computed from a by the softmax transformation, with randomly initialized a and X. We then fix the inferred latent vector X, change the category label c, and generate images from the combination of X and c by the learned models. This again demonstrates the disentanglement of style from category.
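The softmax reparametrization of the one-hot label can be sketched as follows. For clarity, the noise term of the Langevin update in (10) is dropped, leaving plain gradient ascent on the log-probability of a target class with a Gaussian-prior pull on a; the target class and step sizes are illustrative assumptions.

```python
import numpy as np

def softmax(a):
    """Reparametrize the one-hot label: c_k = exp(a_k) / sum_k' exp(a_k')."""
    e = np.exp(a - a.max())
    return e / e.sum()

# Toy objective: make softmax(a) concentrate on a target class by gradient
# ascent on log c_target, a deterministic stand-in for the updates on a.
target = 3              # the class we want c to recover (illustrative)
a = np.zeros(10)        # auxiliary continuous variables, prior N(0, I)
for _ in range(200):
    c = softmax(a)
    grad = -c           # d/da_k log c_target = 1{k = target} - c_k
    grad[target] += 1.0
    a += 0.5 * grad - 0.01 * a   # ascent step plus the Gaussian-prior pull

c = softmax(a)
print(c.argmax())       # 3
```

Because the label lives on the softmax simplex, c always sums to one while remaining differentiable in a, which is what makes gradient-based inference of the label possible.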
4.2 Experiment 2: Image → Image
4.2.1 Semantic labels → Scene images
We study learning the conditional distribution for image-to-image synthesis with our framework. The experiments are conducted on the CMP Facades dataset [26], where each building facade image is associated with an image of architectural labels. In the policy-like model, we first sample X from the Gaussian noise prior N(0, I_d), and we encode the conditional image C into an embedding via an encoder. The image embedding is then concatenated with the latent noise vector X. After this, we generate the target image Y by a generator. We design the policy-like model by following the general shape of a "U-Net" [22] in this experiment. In the planner-like model, we first perform channel concatenation on the target image Y and the conditional image C, where both images are of the same size. The value function is then defined by a 4-layer bottom-up ConvNet, which maps the 6-channel "image" to the value score by 3 convolutional layers with down-sampling, followed by one fully-connected layer with 100 filters. Leaky ReLU layers are used between the convolutional layers.
Figure 7 shows some qualitative results of generating building facade images from the semantic labels. The first row displays 4 semantic label images that are unseen in the training data. The second row displays the corresponding ground truth images for reference. The results of a baseline method [11] are shown in the third row for comparison. The fourth and fifth rows show the results conditioned on the images in the first row, generated by the learned policy-like model and planner-like model respectively. Please see the supplementary materials for more high-resolution results.
4.2.2 Sketch images (or edge images) → Photo images
We next test the model on CUHK Face Sketch database (CUFS) [29], where for each face, there is a sketch drawn by an artist based on a photo of the face. We learn to recover the color face images from the sketch images by the proposed framework. Figure 8(a) displays the face image synthesis results conditioned on the sketch images. The 1st, 2nd, and 3rd columns show some sketch images, while the 4th, 5th, and 6th columns show the corresponding recovered images obtained by sampling from the conditional distribution.
Figure 8(b) demonstrates the learned sketch (condition) manifold by showing 3 examples of interpolation. For each row, the sketch images at the two ends are first encoded into embeddings by the encoder, and then each face image in the middle is obtained by first interpolating the sketch embedding, then generating the image using the policy-like model with fixed noise, and eventually refining the result by the planner-like model. Even though there are no ground-truth sketch images for the intervening points, the generated faces appear plausible. Since the noise is fixed, the only changing factor is the sketch embedding. We observe smooth changes in the outline of the generated faces.

We conduct another experiment on the UT Zappos50K dataset [26] for photo image recovery from edge images. The dataset contains 50k training images of shoes. Edge images are computed by the HED edge detector [32] with post-processing. We use the same model structure as in the previous experiment. Figure 9 shows some qualitative results of synthesizing shoe images from edge images.
4.2.3 Occluded images → Inpainted images
We also test our method on the task of image inpainting by learning a mapping from an occluded image (256 × 256 pixels), where a mask is centrally placed, to its original version. We use the CMP Facades dataset. Figure 10 shows a qualitative comparison between our method and a baseline method [11]. Table 3 shows quantitative results, where the recovery performance is measured by PSNR and SSIM, computed between the occluded regions of the generated example and the ground truth example. The batch size is one. Our method outperforms the baseline in this recovery task. Please see the supplementary materials for more results.

Method            PSNR     SSIM
pixel2pixel [11]  19.3411  0.739
ours              20.4678  0.767
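For reference, the PSNR figures in Table 3 follow the standard definition 10·log10(MAX²/MSE), computed here over the occluded region only, as described above; the image size matches the experiment, but the mask layout and error pattern are illustrative assumptions.

```python
import numpy as np

def psnr(recovered, ground_truth, mask, max_val=1.0):
    """PSNR over the masked (occluded) region: 10 * log10(max_val^2 / MSE)."""
    diff = (recovered - ground_truth)[mask]
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(max_val**2 / mse)

# A 256x256 image with a centrally placed square mask (mask size is illustrative).
gt = np.zeros((256, 256))
mask = np.zeros((256, 256), dtype=bool)
mask[96:160, 96:160] = True
recovered = gt.copy()
recovered[mask] += 0.1          # constant error of 0.1 inside the occlusion
print(psnr(recovered, gt, mask))   # ~20 dB, since MSE = 0.01
```

Restricting the computation to the mask is what makes the metric sensitive to the inpainted content rather than to the untouched surroundings.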
5 Conclusion
This paper addresses the problem of high-dimensional conditional learning and proposes a learning method that couples a fast thinking policy-like model and a slow thinking planner-like model. The policy-like model initializes the iterative optimization or sampling process of the planner-like model, while the planner-like model in return teaches the policy-like model by distilling its iterative algorithm into the policy-like model. We demonstrate the proposed method on a variety of image synthesis and recovery tasks.
Compared to GAN-based methods, our method is equipped with an extra iterative sampling and optimization algorithm that refines the solution, guided by a learned objective function. This may prove to be a powerful method for solving challenging conditional learning problems. Integrating fast thinking and slow thinking may also be of interest to cognitive science.
Appendix: Theoretical understanding
Kullback-Leibler divergence
The Kullback-Leibler divergence between two conditional distributions p(Y | C) and q(Y | C) is defined as

KL(p(Y | C) ∥ q(Y | C)) = E_{p(C, Y)}[ log ( p(Y | C) / q(Y | C) ) ]   (11)
= E_{p(C, Y)}[ log p(Y | C) ] − E_{p(C, Y)}[ log q(Y | C) ],   (12)

where the expectation is over the joint distribution p(C, Y) = p(C) p(Y | C).

Slow thinking planner-like model
The slow thinking planner-like model is

p(Y | C; θ) = (1/Z(C; θ)) exp[ f(Y, C; θ) ],   (13)

where

Z(C; θ) = ∫ exp[ f(Y, C; θ) ] dY   (14)

is the normalizing constant and is analytically intractable.
Suppose the training examples {(Y_i, C_i)} are generated by the true joint distribution p_data(C, Y), whose conditional distribution is p_data(Y | C).
For a large sample, the maximum likelihood estimation of θ is to minimize the Kullback-Leibler divergence

KL(p_data(Y | C) ∥ p(Y | C; θ)).   (15)

In practice, the expectation with respect to p_data is approximated by the sample average. The difficulty with (15) is that the log Z(C; θ) term is analytically intractable, and its derivative has to be approximated by MCMC sampling from the model p(Y | C; θ).
Fast thinking policy-like model

The fast thinking policy-like model is

X ~ N(0, I_d),  Y = g(X, C; α).   (16)

We use the notation p(Y | C; α) to denote the resulting conditional distribution. It is obtained by

p(Y | C; α) = ∫ p(X) p(Y | X, C; α) dX,   (17)

which is analytically intractable.
For a large sample, the maximum likelihood estimation of α is to minimize the Kullback-Leibler divergence

KL(p_data(Y | C) ∥ p(Y | C; α)).   (18)

Again, the expectation with respect to p_data is approximated by the sample average. The difficulty with (18) is that p(Y | C; α) is analytically intractable, and its derivative has to be approximated by MCMC sampling of the posterior p(X | Y, C; α).
Value shift: modified contrastive divergence

Let M_θ be the transition kernel of the finite-step MCMC that refines the initial solution Ŷ to the refined solution Ỹ. Let M_θ p_α(Y | C) be the distribution obtained by running the finite-step MCMC from p(Y | C; α).

Given the current policy-like model α_t, the value shift updates θ_t to θ_{t+1}, and the update approximately follows the gradient of the following modified contrastive divergence [8, 30]:

KL(p_data(Y | C) ∥ p(Y | C; θ)) − KL(M_{θ_t} p_{α_t}(Y | C) ∥ p(Y | C; θ)).   (19)

Compared with the MLE objective (15), (19) has the second divergence term to cancel the log Z(C; θ) term, so that its derivative is analytically tractable. The learning shifts f(Y, C; θ), or its high value region around the mode, from the refined solution provided by M_{θ_t} p_{α_t} toward the observed solution given by p_data. If M_{θ_t} p_{α_t} is close to p(Y | C; θ), then the second divergence is close to zero, and the learning is close to the MLE update.
Mapping shift: distilling MCMC

Given the current planner-like model θ_t, the mapping shift updates α_t to α_{t+1}, and the update approximately follows the gradient of

KL(M_{θ_t} p_{α_t}(Y | C) ∥ p(Y | C; α)).   (20)

This update distills the MCMC transition M_{θ_t} into the model p(Y | C; α). In the idealized case where the above divergence can be minimized to zero, we have p(Y | C; α_{t+1}) = M_{θ_t} p_{α_t}(Y | C). The limiting distribution of the MCMC transition M_θ is p(Y | C; θ); thus the cumulative effect of the above updates is to lead p(Y | C; α) close to p(Y | C; θ).

Compared with the MLE objective (18), the training data distribution becomes M_{θ_t} p_{α_t} instead of p_data. That is, the policy-like model learns from how the planner-like model refines it. The learning is accomplished by mapping shift, where the generated latent vector X is known and thus does not need to be inferred (or the Langevin inference algorithm can be initialized from the generated X). In contrast, if we were to learn from p_data, we would need to infer the unknown X by sampling from the posterior distribution.

In the limit, if the algorithm converges to a fixed point, then the resulting α minimizes KL(M_θ p_α ∥ p_α); that is, p(Y | C; α) seeks to be the stationary distribution of the MCMC transition M_θ, which is p(Y | C; θ).

If the learned p(Y | C; α) is close to p(Y | C; θ), then M_θ p_α is even closer to p(Y | C; θ). Then the learned θ is close to the MLE, because the second divergence term in (19) is close to zero.
References

[1]
P. Abbeel and A. Y. Ng.
Apprenticeship learning via inverse reinforcement learning.
In
Proceedings of the Twentyfirst International Conference on Machine Learning (ICML)
, pages 1–8, 2004.  [2] Y. Bengio, E. Laufer, G. Alain, and J. Yosinski. Deep generative stochastic networks trainable by backprop. In International Conference on Machine Learning, pages 226–234, 2014.
 [3] Y. Bengio, G. Mesnil, Y. Dauphin, and S. Rifai. Better mixing via deep representations. In International Conference on Machine Learning, pages 552–560, 2013.
 [4] D. P. Bertsekas, D. P. Bertsekas, D. P. Bertsekas, and D. P. Bertsekas. Dynamic programming and optimal control, volume 1. Athena scientific Belmont, MA, 2005.
 [5] E. L. Denton, S. Chintala, R. Fergus, et al. Deep generative image models using a laplacian pyramid of adversarial networks. In Advances in Neural Information Processing Systems, pages 1486–1494, 2015.
 [6] I. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, D. WardeFarley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
 [7] T. Han, Y. Lu, S.C. Zhu, and Y. N. Wu. Alternating backpropagation for generator network. 2017.
 [8] G. E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.

[9]
J. Ho and S. Ermon.
Generative adversarial imitation learning.
In Advances in Neural Information Processing Systems, pages 4565–4573, 2016.  [10] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982.

[11]
P. Isola, J.Y. Zhu, T. Zhou, and A. A. Efros.
Imagetoimage translation with conditional adversarial networks.
 [12] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [13] A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
 [14] J. Lafferty, A. McCallum, and F. C. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. 2001.
 [15] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradientbased learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[16] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4681–4690, 2017.
[17] M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. In Advances in Neural Information Processing Systems, pages 700–708, 2017.
 [18] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.

[19] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, volume 2011, page 5, 2011.
[20] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng. Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 689–696, 2011.
 [21] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396, 2016.
[22] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
[23] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2234–2242, 2016.
[24] N. Srivastava and R. R. Salakhutdinov. Multimodal learning with deep Boltzmann machines. In Advances in Neural Information Processing Systems, pages 2222–2230, 2012.
[25] R. S. Sutton and A. G. Barto. Introduction to Reinforcement Learning, volume 135. MIT Press, Cambridge, 1998.
 [26] R. Tyleček and R. Šára. Spatial pattern templates for recognition of objects with regular structure. In German Conference on Pattern Recognition, pages 364–374. Springer, 2013.
[27] D. Wang and Q. Liu. Learning to draw samples: With application to amortized MLE for generative adversarial learning. arXiv preprint arXiv:1611.01722, 2016.
 [28] X. Wang and A. Gupta. Generative image modeling using style and structure adversarial networks. In European Conference on Computer Vision, pages 318–335. Springer, 2016.
[29] X. Wang and X. Tang. Face photo-sketch synthesis and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11):1955–1967, 2009.
[30] J. Xie, Y. Lu, R. Gao, and Y. N. Wu. Cooperative learning of energy-based model and latent variable model via MCMC teaching. In AAAI, 2018.
[31] J. Xie, Y. Lu, S.-C. Zhu, and Y. N. Wu. A theory of generative ConvNet. In International Conference on Machine Learning, 2016.
[32] S. Xie and Z. Tu. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 1395–1403, 2015.
[33] E. P. Xing, R. Yan, and A. G. Hauptmann. Mining associated text and images with dual-wing harmoniums. arXiv preprint arXiv:1207.1423, 2012.
[34] L. Younes. On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates. Stochastics: An International Journal of Probability and Stochastic Processes, 65(3-4):177–228, 1999.
[35] H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang, and D. Metaxas. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In IEEE International Conference on Computer Vision (ICCV), pages 5907–5915, 2017.
[36] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision (ICCV), 2017.

[37] B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey. Maximum entropy inverse reinforcement learning. In Twenty-Third AAAI Conference on Artificial Intelligence, pages 1433–1438, 2008.