Tensorflow implementation for paper "Adversarial Text Generation via Feature-Mover’s Distance"
Generative adversarial networks (GANs) have achieved significant success in generating real-valued data. However, the discrete nature of text hinders the application of GAN to text-generation tasks. Instead of using the standard GAN objective, we propose to improve text-generation GAN via a novel approach inspired by optimal transport. Specifically, we consider matching the latent feature distributions of real and synthetic sentences using a novel metric, termed the feature-mover's distance (FMD). This formulation leads to a highly discriminative critic and easy-to-optimize objective, overcoming the mode-collapsing and brittle-training problems in existing methods. Extensive experiments are conducted on a variety of tasks to evaluate the proposed model empirically, including unconditional text generation, style transfer from non-parallel text, and unsupervised cipher cracking. The proposed model yields superior performance, demonstrating wide applicability and effectiveness.
Natural language generation is an important building block in many applications, such as machine translation, dialogue generation, and image captioning. While these applications demonstrate the practical value of generating coherent and meaningful sentences in a supervised setup, unsupervised text generation, which aims to estimate the distribution of real text from a corpus, is still challenging. Previous approaches, which often maximize the log-likelihood of each ground-truth word given the previously observed words, typically suffer from exposure bias [6, 47], i.e., a discrepancy between the training and inference stages: during inference, each word is generated in sequence based on previously generated words, while during training the ground-truth words are used at each timestep [27, 53, 58].
Recently, adversarial training has emerged as a powerful paradigm to address the aforementioned issues. The generative adversarial network (GAN) matches the distribution of synthetic and real data by introducing a two-player adversarial game between a generator and a discriminator. The generator is trained to learn a nonlinear function that maps samples from a given (simple) prior distribution to synthetic data that appear realistic, while the discriminator aims to distinguish the fake data from real samples. GAN can be trained efficiently via back-propagation through the nonlinear function of the generator, which typically requires the data to be continuous (e.g., images). However, the discrete nature of text renders the model non-differentiable, hindering use of GAN in natural language processing tasks.
Attempts have been made to overcome such difficulties, which can be roughly divided into two categories. The first includes models that combine ideas from GAN and reinforcement learning (RL), framing text generation as a sequential decision-making process. Specifically, the gradient of the generator is estimated via the policy-gradient algorithm. Prominent examples from this category include SeqGAN, MaliGAN, RankGAN, LeakGAN and MaskGAN. Despite the promising performance of these approaches, one major disadvantage of such RL-based strategies is that they typically yield high-variance gradient estimates, which are known to be challenging for optimization [40, 61].
Models from the second category adopt the original framework of GAN without incorporating the RL methods (i.e., RL-free). Distinct from RL-based approaches, TextGAN  and Gumbel-Softmax GAN (GSGAN)  apply a simple soft-argmax operator, and a similar Gumbel-softmax trick [28, 40], respectively, to provide a continuous approximation of the discrete distribution (i.e., multinomial) on text, so that the model is still end-to-end differentiable. What makes this approach appealing is that it feeds the optimizer with low-variance gradients, improving stability and speed of training. In this work, we aim to improve the training of GAN that resides in this category.
When training GAN to generate text samples, one practical challenge is that the gradient from the discriminator often vanishes after being trained for only a few iterations. That is, the discriminator can easily distinguish the fake sentences from the real ones. TextGAN  proposed a remedy based on feature matching , adding Maximum Mean Discrepancy (MMD) to the original objective of GAN . However, in practice, the model is still difficult to train. Specifically, (i) the bandwidth of the RBF kernel is difficult to choose; (ii) kernel methods often suffer from poor scaling; and (iii) empirically, TextGAN tends to generate short sentences.
In this work, we present feature mover GAN (FM-GAN), a novel adversarial approach that leverages optimal transport (OT) to construct a new model for text generation. Specifically, OT considers the problem of optimally transporting one set of data points to another, and is closely related to GAN. The earth-mover’s distance (EMD) is employed often as a metric for the OT problem. In our setting, a variant of the EMD between the feature distributions of real and synthetic sentences is proposed as the new objective, denoted as the feature-mover’s distance (FMD). In this adversarial game, the discriminator aims to maximize the dissimilarity of the feature distributions based on the FMD, while the generator is trained to minimize the FMD by synthesizing more-realistic text. In practice, the FMD is turned into a differentiable quantity and can be computed using the proximal point method .
The main contributions of this paper are as follows: (i) A new GAN model based on optimal transport is proposed for text generation. The proposed model is RL-free, and uses a so-called feature-mover’s distance as the objective. (ii) We evaluate our model comprehensively on unconditional text generation. When compared with previous methods, our model shows a substantial improvement in terms of generation quality based on the BLEU statistics  and human evaluation. Further, our model also achieves good generation diversity based on the self-BLEU statistics . (iii) In order to demonstrate the versatility of the proposed method, we also generalize our model to conditional-generation tasks, including non-parallel text style transfer , and unsupervised cipher cracking .
We review the basic idea of adversarial distribution matching (ADM), which avoids the specification of a likelihood function. Instead, this strategy defines draws from the synthetic data distribution $p_G(x)$ by drawing a latent code $z \sim p(z)$ from an easily sampled distribution $p(z)$, and learning a generator function $G(z)$ such that $x = G(z)$. The form of $p_G(x)$ is neither specified nor learned; rather, we learn to draw samples from $p_G(x)$. To match the ensemble of draws from $p_G(x)$ with an ensemble of draws from the real data distribution $p_d(x)$, ADM introduces a variational function $\mathcal{V}(p_d, p_G; D)$, where $D(x)$ is known as the critic function or discriminator. The goal of ADM is to obtain an equilibrium of the following objective:
$$\min_G \max_D \; \mathcal{V}(p_d, p_G; D), \tag{1}$$
where $\mathcal{V}(p_d, p_G; D)$ is computed using samples from $p_d$ and $p_G$ (not explicitly in terms of the distributions themselves), and $d(p_d, p_G) = \max_D \mathcal{V}(p_d, p_G; D)$ defines a discrepancy metric between two distributions [3, 42]. One popular example of ADM is the generative adversarial network (GAN), in which $\mathcal{V}_{\mathrm{JSD}} = \mathbb{E}_{x \sim p_d(x)} \log D(x) + \mathbb{E}_{z \sim p(z)} \log[1 - D(G(z))]$ recovers the Jensen-Shannon divergence (JSD) for $d(p_d, p_G)$; the expectations $\mathbb{E}_{x \sim p_d(x)}[\cdot]$ and $\mathbb{E}_{z \sim p(z)}[\cdot]$ are computed approximately with samples from the respective distributions. Most of the existing work in applying GAN for text generation also uses this standard form, by combining it with policy gradient. However, it has been shown that this standard GAN objective suffers from an unstably weak learning signal when the discriminator gets close to a local optimum, due to the gradient-vanishing effect. This is because the JSD implied by the original GAN loss is not continuous with respect to the generator parameters.
GAN models were originally developed for learning to draw from a continuous distribution. The discrete nature of text samples hinders the use of GANs, and thus a vectorization of a sequence of discrete tokens is considered. Let $s = \{w_1, \ldots, w_L\}$ be a sentence of length $L$, where $w_l \in \mathbb{R}^v$ denotes the one-hot representation for the $l$-th word. A word-level vector representation of each word in $s$ is achieved by learning a word embedding matrix $W_e \in \mathbb{R}^{k \times v}$, where $v$ is the size of the vocabulary. Each word is represented as a $k$-dimensional vector $\tilde{w}_l = W_e w_l \in \mathbb{R}^k$. The sentence $s$ is now represented as a matrix $W = [\tilde{w}_1, \ldots, \tilde{w}_L] \in \mathbb{R}^{k \times L}$. A neural network $F(\cdot)$, such as an RNN [5, 10], a CNN [29, 18, 52] or SWEM, can then be applied to extract the feature vector $f = F(W)$.
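The embedding step above can be illustrated with a small NumPy sketch. The sizes, the random embedding matrix and the token indices are toy assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
v, k, L = 6, 4, 3                  # toy vocabulary size, embedding dim, sentence length
W_e = rng.normal(size=(k, v))      # word embedding matrix, one column per vocabulary word

token_ids = np.array([2, 0, 5])    # hypothetical sentence given as vocabulary indices
one_hot = np.eye(v)[:, token_ids]  # shape (v, L); column l is the one-hot vector of word l
W = W_e @ one_hot                  # shape (k, L); the sentence as an embedding matrix

# Multiplying by a one-hot vector simply selects a column of W_e.
same_as_lookup = np.allclose(W, W_e[:, token_ids])
```

In practice the product with a one-hot matrix is replaced by a direct column lookup; the matrix form matters later because a soft (non-one-hot) distribution over words can be plugged into the same product.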
GAN can be interpreted in the framework of optimal transport theory, and it has been shown that the Earth-Mover’s Distance (EMD) is a good objective for generative modeling. Originally applied in content-based image retrieval tasks, EMD is well-known for comparing multidimensional distributions that are used to describe the different features of images (e.g., brightness, color, and texture content). It is defined as the ground distance (i.e., cost function) between every two perceptual features, extending the notion of a distance between single elements to a distance between sets of elements. Specifically, consider two probability distributions $x \sim \mu$ and $y \sim \nu$; EMD can then be defined as:
$$\mathcal{D}_{\mathrm{EMD}}(\mu, \nu) = \inf_{\gamma \in \Pi(\mu, \nu)} \mathbb{E}_{(x, y) \sim \gamma} \, c(x, y), \tag{2}$$
where $\Pi(\mu, \nu)$ denotes the set of all joint distributions $\gamma(x, y)$ with marginals $\mu(x)$ and $\nu(y)$, and $c(x, y)$ is the cost function (e.g., Euclidean or cosine distance). Intuitively, EMD is the minimum cost that $\gamma$ has to pay to transport mass from $\mu$ to $\nu$.
We propose a new GAN framework for discrete text data, called feature mover GAN (FM-GAN). The idea of optimal transport (OT) is integrated into adversarial distribution matching. Explicitly, the original critic function in GANs is replaced by the Earth-Mover’s Distance (EMD) between the sentence features of real and synthetic data. In addition, to handle the intractable issue when computing (2) [3, 49], we define the Feature-Mover’s Distance (FMD), a variant of EMD that can be solved tractably using the Inexact Proximal point method for OT (IPOT) algorithm . In the following sections, we discuss the main objective of our model, the detailed training process for text generation, as well as extensions. Illustration of the framework is shown in Figure 1.
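In practice, it is not tractable to calculate the minimization over $\gamma$ in (2) [3, 19, 50]. In this section, we propose the Feature-Mover’s Distance (FMD), which can be solved tractably. Consider two sets of sentence feature vectors $\mathbf{F} = \{f_i\}_{i=1}^{m} \in \mathbb{R}^{d \times m}$ and $\mathbf{F}' = \{f'_j\}_{j=1}^{n} \in \mathbb{R}^{d \times n}$ drawn from two different sentence feature distributions $P_f$ and $P_{f'}$; $m$ and $n$ are the total number of $d$-dimensional sentence features in $\mathbf{F}$ and $\mathbf{F}'$, respectively. Let $T \in \mathbb{R}^{m \times n}$ be a transport matrix in which $T_{ij} \geq 0$ defines how much of feature vector $f_i$ would be transformed to $f'_j$. The FMD between two sets of sentence features is then defined as:
$$\mathcal{D}_{\mathrm{FMD}}(P_f, P_{f'}) = \min_{T \geq 0} \sum_{i=1}^{m} \sum_{j=1}^{n} T_{ij} \, c(f_i, f'_j) = \min_{T \geq 0} \langle T, C \rangle, \tag{3}$$
where $\sum_{j=1}^{n} T_{ij} = \frac{1}{m}$ and $\sum_{i=1}^{m} T_{ij} = \frac{1}{n}$ are the constraints, and $\langle \cdot, \cdot \rangle$ represents the Frobenius dot-product. In this work, the transport cost is defined as the cosine distance: $c(f_i, f'_j) = 1 - \frac{f_i^{\top} f'_j}{\|f_i\|_2 \, \|f'_j\|_2}$, and $C$ is the cost matrix such that $C_{ij} = c(f_i, f'_j)$. Note that during training, we set $m = n$ as the mini-batch size.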
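To make the cost matrix and the Frobenius dot-product concrete, here is a minimal NumPy sketch. It stores features as rows (one row per sentence) rather than as columns, which is a convenience choice for this example; the function names and the small `eps` stabiliser are assumptions, not part of the paper.

```python
import numpy as np

def cosine_cost_matrix(F_real, F_fake, eps=1e-8):
    """C[i, j] = 1 - cos(f_i, f'_j) for two row-wise feature batches."""
    A = F_real / (np.linalg.norm(F_real, axis=1, keepdims=True) + eps)
    B = F_fake / (np.linalg.norm(F_fake, axis=1, keepdims=True) + eps)
    return 1.0 - A @ B.T

def transport_cost(T, C):
    """Frobenius dot-product <T, C> = sum_ij T_ij * C_ij."""
    return float(np.sum(T * C))

rng = np.random.default_rng(0)
n, d = 4, 8                            # toy mini-batch size and feature dimension
F_real = rng.normal(size=(n, d))

C_same = cosine_cost_matrix(F_real, F_real)
T_identity = np.eye(n) / n             # feasible plan: every row and column sums to 1/n
cost_same = transport_cost(T_identity, C_same)
```

When both batches are identical, sending each feature to itself is a feasible plan with (numerically) zero cost, so the minimum in the FMD definition is zero, as expected of a distance between identical sets.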
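We propose to use the Inexact Proximal point method for Optimal Transport (IPOT) algorithm to compute the optimal transport matrix $T^*$, which provides a solution to the original optimal transport problem (3). Specifically, IPOT iteratively solves the following optimization problem:
$$T^{(t+1)} = \arg\min_{T \in \Pi(f, f')} \left\{ \langle T, C \rangle + \beta \, \mathcal{D}_h(T, T^{(t)}) \right\}, \tag{4}$$
where $\mathcal{D}_h(T, T^{(t)}) = \sum_{i,j} T_{ij} \log \frac{T_{ij}}{T^{(t)}_{ij}} - \sum_{i,j} T_{ij} + \sum_{i,j} T^{(t)}_{ij}$ denotes the Bregman divergence with respect to the entropy functional $h(T) = \sum_{i,j} T_{ij} \log T_{ij}$.
Here the Bregman divergence $\mathcal{D}_h$ serves as a proximity metric and $\beta$ is the proximity penalty. This problem can be solved efficiently by Sinkhorn-style proximal point iterations [13, 59], as detailed in Algorithm 1.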
Notably, unlike the Sinkhorn algorithm , we do not need to back-propagate the gradient through the proximal point iterations, which is justified by the Envelope Theorem  (see the Supplementary Material (SM)). This accelerates the learning process significantly and improves training stability .
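The proximal point iterations can be sketched in NumPy as follows. This is a minimal illustration of IPOT-style updates under uniform marginals, not the authors' released code; the function name `ipot`, the default `beta`, and the iteration counts are assumptions made for the example.

```python
import numpy as np

def ipot(C, beta=0.5, n_outer=50, n_inner=1):
    """Approximate the optimal transport plan for cost matrix C
    with uniform marginals, using proximal point iterations."""
    n, m = C.shape
    mu = np.full(n, 1.0 / n)          # row marginals
    nu = np.full(m, 1.0 / m)          # column marginals
    b = np.full(m, 1.0 / m)           # column scaling vector
    G = np.exp(-C / beta)             # proximal kernel
    T = np.ones((n, m))               # initial transport plan
    for _ in range(n_outer):
        Q = G * T                     # element-wise product with the previous plan
        for _ in range(n_inner):      # Sinkhorn-style scaling steps
            a = mu / (Q @ b)
            b = nu / (Q.T @ a)
        T = a[:, None] * Q * b[None, :]
    return T

# Toy cost: matching points cost 0, mismatched points cost 1.
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])
T = ipot(C)
fmd = float(np.sum(T * C))            # <T, C>, treated as a constant plan T
```

On this toy problem the plan concentrates on the diagonal, so the transport cost approaches zero. In a training loop, `T` would be held fixed (no gradient) and only `C` would be differentiated, in line with the envelope-theorem argument above.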
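To integrate FMD into adversarial distribution matching, we propose to solve the following mini-max game:
$$\mathcal{L}_{\text{FM-GAN}} = \min_G \max_F \; \mathcal{D}_{\mathrm{FMD}}(P_f, P_{f'}) = \min_G \max_F \; \mathbb{E}_{x \sim p_d, \, z \sim p(z)} \left[ \mathcal{D}_{\mathrm{FMD}}\big(F(W_e x), F(G(z))\big) \right], \tag{5}$$
where $F$ is the sentence feature extractor, $G$ is the sentence generator, $W_e$ is the word embedding matrix, and $P_f$ and $P_{f'}$ are the feature distributions of real and synthetic sentences. We call this feature mover GAN (FM-GAN). The detailed training procedure is provided in Algorithm 2.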
Sentence generator. The Long Short-Term Memory (LSTM) recurrent neural network [25] is used as our sentence generator $G(\cdot; \theta)$ parameterized by $\theta$. Let $W_e \in \mathbb{R}^{k \times v}$ be our learned word embedding matrix, where $v$ is the vocabulary size, with each word in sentence $s$ embedded into $\tilde{w}_l$, a $k$-dimensional word vector. All words in the synthetic sentence are generated sequentially, i.e.,
$$w_l = \arg\max(a_l), \quad \text{where } a_l = V h_l \in \mathbb{R}^v, \tag{6}$$
where $h_l$ is the hidden unit updated recursively through the LSTM cell: $h_l = \mathrm{LSTM}(\tilde{w}_{l-1}, h_{l-1}, z)$, $V$ is a decoding matrix, and $\mathrm{softmax}(V h_l)$ defines the distribution over the vocabulary. Note that, distinct from a traditional sentence generator, here the argmax operation is used, rather than sampling from a multinomial distribution as in the standard LSTM. Therefore, all randomness during the generation is clamped into the noise vector $z$.
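The generator cannot be trained directly by back-propagation, because the argmax function is non-differentiable. Instead, a soft-argmax operator is used as a continuous approximation:
$$\tilde{w}_l = W_e \, \mathrm{softmax}(a_l / \tau), \tag{7}$$
where $\tau$ is the temperature parameter. Note that when $\tau \to 0$, this approximates (6). We denote $\hat{G}(z) = (\tilde{w}_1, \ldots, \tilde{w}_L) \in \mathbb{R}^{k \times L}$ as the approximated embedding matrix for the synthetic sentence.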
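The effect of the temperature can be checked numerically. The sketch below is an illustration only: the embedding matrix, the logits and the two temperature values are arbitrary assumptions chosen so the behaviour is easy to see.

```python
import numpy as np

def softmax(a):
    a = a - np.max(a)                 # stabilise the exponentials
    e = np.exp(a)
    return e / e.sum()

def soft_argmax_embedding(W_e, logits, tau):
    """Temperature-scaled softmax over the vocabulary, then the expected embedding."""
    return W_e @ softmax(logits / tau)

k, v = 4, 6
W_e = np.arange(k * v, dtype=float).reshape(k, v)   # toy embedding matrix
logits = np.array([0.1, 2.0, -1.0, 0.5, 1.2, 0.0])  # toy pre-softmax scores

hard = W_e[:, np.argmax(logits)]                    # embedding picked by argmax
soft_warm = soft_argmax_embedding(W_e, logits, tau=1.0)
soft_cold = soft_argmax_embedding(W_e, logits, tau=0.01)
```

With a small temperature the soft embedding is numerically indistinguishable from the argmax embedding, while with `tau=1.0` it is a blend of several word vectors. The soft version remains differentiable with respect to the logits, which is what allows gradients to reach the generator.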
We use the convolutional neural network proposed in [11, 29] as our sentence feature extractor $F(\cdot; \phi)$ parameterized by $\phi$, which contains a convolution layer and a max-pooling layer. Assuming a sentence of length $L$, the sentence is represented as a matrix $W \in \mathbb{R}^{k \times L}$, where $k$ is the word-embedding dimension and $L$ is the maximum sentence length. A convolution filter $W_{\mathrm{conv}} \in \mathbb{R}^{k \times h}$ is applied to a window of $h$ words to produce new features. After applying the nonlinear activation function, we apply the max-over-time pooling operation to the feature maps and extract the maximum values. While the convolution operator can extract features independent of their positions in the sentence, the max-pooling operator tries to capture the most salient features.
The above procedure describes how to extract features using one filter. Our model uses multiple filters with different window sizes, where each filter is considered as a linguistic feature detector. Assume $d_1$ different window sizes, and for each window size we have $d_2$ filters; then a sentence feature vector can be represented as $f \in \mathbb{R}^d$, where $d = d_1 \times d_2$.
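The convolution and max-over-time pooling steps can be sketched with plain NumPy loops. This is a toy illustration: the tanh activation, the window sizes and the number of filters are assumptions for the example, and a real implementation would use a deep learning framework's convolution layers.

```python
import numpy as np

def conv_max_features(W, filters):
    """W: (k, L) sentence matrix; filters: list of (k, h) arrays.
    Returns one max-over-time feature per filter."""
    k, L = W.shape
    feats = []
    for Wc in filters:
        h = Wc.shape[1]
        # Slide the filter over every window of h consecutive words.
        responses = [np.sum(Wc * W[:, i:i + h]) for i in range(L - h + 1)]
        activated = np.tanh(np.array(responses))   # nonlinearity (tanh is an assumption)
        feats.append(activated.max())              # max-over-time pooling
    return np.array(feats)

rng = np.random.default_rng(0)
k, L = 5, 10
window_sizes = [2, 3, 4]          # d1 = 3 window sizes (illustrative)
filters_per_size = 2              # d2 = 2 filters per window size (illustrative)
filters = [rng.normal(size=(k, h))
           for h in window_sizes for _ in range(filters_per_size)]

W = rng.normal(size=(k, L))
f = conv_max_features(W, filters) # feature vector of dimension d1 * d2 = 6
```

Each entry of `f` records the strongest response of one filter anywhere in the sentence, which is why the resulting feature does not depend on where in the sentence the matching window occurs.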
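Style transfer. Our FM-GAN model can be readily generalized to conditional generation tasks, such as text style transfer [26, 54, 44, 35]. The style transfer task is essentially learning the conditional distributions $p(x_1 | x_2; c_1, c_2)$ and $p(x_2 | x_1; c_1, c_2)$, where $c_1$ and $c_2$ represent the labels for different styles, with $x_1$ and $x_2$ sentences in different styles. Assuming $x_1$ and $x_2$ are conditionally independent given the latent code $z$, we have:
$$p(x_1 | x_2; c_1, c_2) = \int_z p(x_1 | z, c_1) \, p(z | x_2, c_2) \, dz = \mathbb{E}_{z \sim p(z | x_2, c_2)} \left[ p(x_1 | z, c_1) \right]. \tag{8}$$
Equation (8) suggests an autoencoder can be applied for this task. From this perspective, we can apply our optimal transport method in the cross-aligned autoencoder, by replacing the standard GAN loss with our FMD critic. We follow the same idea as the cross-aligned autoencoder to build the style transfer framework. $E$ is our encoder that infers the content $z$ from a given style $c$ and sentence $x$; $G$ is our decoder that generates the synthetic sentence $\hat{x}$, given content $z$ and style $c$. We add the following reconstruction loss for the autoencoder:
$$\mathcal{L}_{\mathrm{rec}} = \mathbb{E}_{x_1 \sim p_{x_1}} \left[ -\log p_G(x_1 | c_1, E(x_1, c_1)) \right] + \mathbb{E}_{x_2 \sim p_{x_2}} \left[ -\log p_G(x_2 | c_2, E(x_2, c_2)) \right], \tag{9}$$
where $p_{x_1}$ and $p_{x_2}$ are the empirical data distributions for each style. We also need to implement adversarial training on the generator with discrete data. First, we use the soft-argmax approximation discussed in Section 3.2; second, we also use the Professor-Forcing algorithm to match the sequence of LSTM hidden states. That is, the discriminator is designed to discriminate $G(E(x_1, c_1), c_2)$ from the real sentence $x_2$. Unlike the cross-aligned autoencoder, which uses two discriminators, our model only needs to apply the FMD critic twice to match the distributions for two different styles:
$$\mathcal{L}_{\mathrm{adv}} = \mathbb{E}_{x_1 \sim p_{x_1}, \, x_2 \sim p_{x_2}} \left[ \mathcal{D}_{\mathrm{FMD}}\big(F(G(E(x_1, c_1), c_2)), F(W_e x_2)\big) + \mathcal{D}_{\mathrm{FMD}}\big(F(G(E(x_2, c_2), c_1)), F(W_e x_1)\big) \right], \tag{10}$$
where $W_e$ is the learned word embedding matrix. The final objective function for this task is: $\mathcal{L} = \mathcal{L}_{\mathrm{rec}} + \lambda \, \mathcal{L}_{\mathrm{adv}}$, where $\lambda$ is a hyperparameter that balances these two terms.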
Unsupervised decipher Our model can also be used to tackle the task of unsupervised cipher cracking by using the framework of CycleGAN . In this task, we have two different corpora, i.e., denotes the original sentences, and denotes the encrypted corpus using some cipher code, which is unknown to our model. Our goal is to design two generators that can map one corpus to the other, i.e., , . Unlike the style-transfer task, we define and as two sentence feature extractors for the different corpora. Here we denote to be the empirical distribution of the original corpus, and to be the distribution of the encrypted corpus. Following , we design two losses: the cycle-consistency loss (reconstruction loss) and the adversarial feature matching loss. The cycle-consistency loss is defined on the feature space as:
where denotes the -norm, and is the word embedding matrix. The adversarial loss aims to help match the generated samples with the target:
The final objective function for the decipher task is: , where is a hyperparameter that balances the two terms.
GAN for text generation. SeqGAN, MaliGAN, RankGAN, and MaskGAN use reinforcement learning (RL) algorithms for text generation. The idea behind all these works is similar: they use the REINFORCE algorithm to get an unbiased gradient estimator for the generator, and apply the roll-out policy to obtain the reward from the discriminator. LeakGAN adopts a hierarchical RL framework to improve text generation. However, it is slow to train due to its complex design. For GANs in the RL-free category, GSGAN and TextGAN use the Gumbel-softmax and soft-argmax trick, respectively, to deal with discrete data. While the latter uses MMD to match the features of real and synthetic sentences, both models still keep the original GAN loss function, which may result in the gradient-vanishing issue of the discriminator.
GAN with OT. Wasserstein GAN (WGAN) [3, 23] applies the EMD by imposing the 1-Lipschitz constraint on the discriminator, which alleviates the gradient-vanishing issue when dealing with continuous data (e.g., images). However, for discrete data (e.g., text), the gradient still vanishes after a few iterations, even when weight-clipping or the gradient-penalty is applied on the discriminator. Instead, the Sinkhorn divergence generative model (Sinkhorn-GM) and Optimal transport GAN (OT-GAN) optimize the Sinkhorn divergence, defined as an entropy regularized EMD (2): $\mathcal{W}_{\epsilon}(\mu, \nu) = \inf_{\gamma \in \Pi(\mu, \nu)} \mathbb{E}_{(x, y) \sim \gamma} [c(x, y)] + \epsilon \, h(\gamma)$, where $h(\gamma) = \sum_{i,j} \gamma_{ij} \log \gamma_{ij}$ is the entropy term, and $\epsilon$ is the hyperparameter. While the Sinkhorn algorithm is proposed to solve this entropy regularized EMD, the solution is sensitive to the value of the hyperparameter $\epsilon$, leading to a trade-off between computational efficiency and training stability. Distinct from that, our method uses IPOT to tackle the original problem of OT. In practice, IPOT is more efficient than the Sinkhorn algorithm, and the hyperparameter $\beta$ in (4) only affects the convergence rate.
We apply the proposed model to three application scenarios: generic (unconditional) sentence generation, conditional sentence generation (with pre-specified sentiment), and unsupervised decipher. For the generic sentence generation task, we experiment with three standard benchmarks: CUB captions , MS COCO captions , and EMNLP2017 WMT News .
Since the sentences in the CUB dataset are typically short and have similar structure, it is employed as our toy evaluation. For the second dataset, we sample sentences from the original MS COCO captions. Note that we do not remove any low-frequency words for the first two datasets, in order to evaluate the models in the case with a relatively large vocabulary size. The third dataset is a large long-text collection from EMNLP2017 WMT News Dataset. To facilitate comparison with baseline methods, we follow the same data preprocessing procedures as in . The summary statistics of all the datasets are presented in Table 1.
|Dataset||Train||Test||Vocabulary||Average length|
|MS COCO captions||120,000||10,000||27,842||11|
|EMNLP2017 WMT News||278,686||10,000||5,728||28|
For conditional text generation, we consider the task of transferring an original sentence to the opposite sentiment, in the case where parallel (paired) data are not available. We use the same data as introduced in . For the unsupervised decipher task, we follow the experimental setup in CipherGAN  and evaluate the model improvement after replacing the critic with the proposed FMD objective.
We use the test-BLEU score, the self-BLEU score, and human evaluation as the evaluation metrics for the generic sentence generation task. To ensure fair comparison, we perform extensive comparisons with several strong baseline models using the benchmark tool in Texygen. For the non-parallel text style transfer experiment, following [26, 54], we use a pretrained classifier to calculate the sentiment accuracy of transferred sentences. We also leverage human evaluation to further measure the quality of the transferred results. For the deciphering experiment, we adopt the average proportion of correctly mapped words as the accuracy, following CipherGAN. Our code will be released to encourage future research.
In general, when evaluating the performance of different models, we desire a high test-BLEU score (good quality) and a low self-BLEU score (high diversity). Both scores should be considered: (i) a high test-BLEU score together with a high self-BLEU score means that the model might generate good sentences while suffering from mode collapse (i.e., low diversity); (ii) if a model generates sentences randomly, the diversity of the generated sentences could be high but the test-BLEU score would be low. Figure 2 is used to compare the performance of every model. For each subplot, the x-axis represents test-BLEU, and the y-axis represents self-BLEU (here we only show BLEU-3 and BLEU-4 figures; more quantitative results can be found in the SM). For the CUB and MS COCO datasets, our model achieves both high test-BLEU and low self-BLEU, providing realistic sentences with high diversity. For the EMNLP WMT dataset, the synthetic sentences from SeqGAN, RankGAN, GSGAN and TextGAN are less coherent and realistic (examples can be found in the SM) due to the long-text nature of the dataset. In comparison, our model is still capable of providing realistic results.
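The idea behind the self-BLEU diversity measure can be illustrated with a deliberately simplified stand-in: a clipped n-gram precision of each sample against all the other samples, for a single n-gram order and without BLEU's brevity penalty or geometric averaging. The function names and the toy sentences are assumptions for this sketch; it is not the Texygen implementation.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(hyp, refs, n):
    """Clipped n-gram precision of one hypothesis against several references."""
    hyp_counts = ngrams(hyp, n)
    if not hyp_counts:
        return 0.0
    max_ref = Counter()
    for ref in refs:
        for gram, c in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], c)
    clipped = sum(min(c, max_ref[gram]) for gram, c in hyp_counts.items())
    return clipped / sum(hyp_counts.values())

def self_overlap(samples, n):
    """Average precision of each sample against all the other samples."""
    scores = [modified_precision(s, samples[:i] + samples[i + 1:], n)
              for i, s in enumerate(samples)]
    return sum(scores) / len(scores)

collapsed = [["a", "man", "on", "a", "horse"]] * 3     # mode collapse: identical samples
diverse = [["a", "man", "on", "a", "horse"],
           ["two", "dogs", "in", "the", "park"],
           ["the", "bus", "is", "red"]]

score_collapsed = self_overlap(collapsed, 2)   # 1.0: every bigram reappears elsewhere
score_diverse = self_overlap(diverse, 2)       # 0.0: no bigram is shared
```

A generator that repeats the same sentence scores at the maximum, while samples that share no bigrams score zero, which is why a low value is read as high diversity.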
|Method||GSGAN||TextGAN||Our model||real sentences|
|Human score||2.54 ± 0.79||2.55 ± 0.83||2.86 ± 0.95||3.41 ± 0.82|
To further evaluate the generation quality based on the EMNLP WMT dataset, we conduct a human Turing test on Amazon Mechanical Turk; 10 judges are asked to rate over 100 randomly sampled sentences from each model on a scale from 0 to 5. The means and standard deviations of the rating scores are calculated and provided in Table 2. We also provide some examples of the generated sentences from LeakGAN and our model in Table 3. More generated sentences are provided in the SM.
|LeakGAN:||(1) " people , if aleppo recognised switzerland stability , " mr . trump has said that " " it has been filled before the courts .|
|(2) the russian military , meanwhile previously infected orders , but it has already been done on the lead of the attack .|
|Ours:||(1) this is why we will see the next few years , we ’ re looking forward to the top of the world , which is how we ’ re in the future .|
|(2) If you ’ re talking about the information about the public , which is not available , they have to see a new study .|
Table 4 presents the sentiment transfer results on the Yelp review dataset, evaluated by the accuracy of transferred sentences as determined by a pretrained CNN classifier. Note that with the same experimental setup, our model achieves significantly higher transfer accuracy compared with the cross-aligned autoencoder (CAE) model. Moreover, our model even outperforms the controllable text generation method and BST, where a sentiment classifier is directly pre-trained to guide the sentence generation process (in contrast, our model is trained in an end-to-end manner and requires no pre-training steps), and which should thus potentially have better control over the style (i.e., sentiment) of generated sentences. The superior performance of the proposed method highlights the ability of FMD to mitigate the vanishing-gradient issue caused by the discrete nature of text samples, and gives rise to better matching between the distributions of reviews belonging to two different sentiments.
|Method||Controllable ||CAE ||BST ||Our model|
Human evaluations are conducted to assess the quality of the transferred sentences. In this regard, we randomly sample 100 sentences from the test set, and 5 volunteers rate the outputs of different models in terms of their fluency, sentiment, and content preservation in a double blind fashion. The rating score is from 0 to 5. Detailed results are shown in Table 4. We also provide sentiment transfer examples in Table 5. More examples are provided in the SM.
|Original:||one of the best gourmet store shopping experiences i have ever had .|
|Controllable :||one of the best gourmet store shopping experiences i have ever had .|
|CAE:||one of the worst staff i would ever ever ever had ever had .|
|Ours:||one of the worst indian shopping store experiences i have ever had .|
|Original:||staff behind the deli counter were super nice and efficient !|
|Controllable:||staff behind the deli counter were super rude and efficient !|
|CAE:||the staff were the front desk and were extremely rude airport !|
|Ours:||staff behind the deli counter were super nice and inefficient !|
CipherGAN  uses GANs to tackle the task of unsupervised cipher cracking, utilizing the framework of CycleGAN  and adopting techniques such as Gumbel-softmax  that deal with discrete data. The implication of unsupervised deciphering could be understood as unsupervised machine translation, in which one language might be treated as an enciphering of the other. In this experiment, we adapt the idea of feature mover’s distance to the original framework of CipherGAN and test this modified model on the Brown English text dataset .
The Brown English-language corpus contains over one million words. In this experiment, only the top 200 most frequent words are considered, while the others are replaced by an “unknown” token. We denote this modified word-level dataset as Brown-W200. We use the Vigenère cipher to encipher the original plain text. This dataset can be downloaded from the CipherGAN repository (https://github.com/for-ai/CipherGAN).
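For readers unfamiliar with the cipher, a word-level Vigenère shift over token indices can be sketched as below. The key, the token ids and the helper names are hypothetical, and the exact enciphering pipeline used to build Brown-W200 may differ from this illustration.

```python
def vigenere_encipher(token_ids, key, vocab_size):
    """Shift each token index by the key entry at its position (cyclically)."""
    return [(t + key[i % len(key)]) % vocab_size for i, t in enumerate(token_ids)]

def vigenere_decipher(cipher_ids, key, vocab_size):
    """Undo the shift; only possible when the key is known."""
    return [(c - key[i % len(key)]) % vocab_size for i, c in enumerate(cipher_ids)]

def accuracy(predicted, truth):
    """Proportion of correctly mapped tokens in a sequence."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

vocab_size = 200                 # matches the 200-word vocabulary of Brown-W200
key = [3, 45, 110]               # hypothetical key
plain = [0, 199, 17, 42, 5]      # hypothetical sentence as token indices

cipher = vigenere_encipher(plain, key, vocab_size)      # [3, 44, 127, 45, 50]
recovered = vigenere_decipher(cipher, key, vocab_size)
acc = accuracy(recovered, plain)                        # 1.0 with the correct key
```

Because the shift depends on the position within the key, the same plain word maps to different cipher words at different positions, which is what makes simple frequency analysis ineffective without the key.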
For fair comparison, all the model architectures and parameters are kept the same as CipherGAN while the critic for the discriminator is replaced by the FMD objective as shown in (3). Table 6 shows the quantitative results in terms of average proportion of words mapped in a given sequence (i.e., deciphering accuracy). The baseline frequency analysis model only operates when the cipher key is known. Our model achieves higher accuracy compared to the original CipherGAN. Note that some other experimental setups from  are not evaluated, due to the extremely high accuracy (above ); the amount of improvement would not be apparent.
|Method||Freq. Analysis (with keys)||CipherGAN ||Our model|
|Accuracy(%)||< 0.1 (44.3)||75.7||77.2|
We introduce a novel approach for text generation using feature-mover’s distance (FMD), called feature mover GAN (FM-GAN). By applying our model to several tasks, we demonstrate that it delivers good performance compared to existing text generation approaches. For future work, FM-GAN has the potential to be applied on other tasks such as image captioning , joint distribution matching [17, 46, 9, 34, 55, 45], unsupervised sequence classification , and unsupervised machine translation [4, 12, 33].
This research was supported in part by DARPA, DOE, NIH, ONR and NSF.
Unsupervised neural machine translation. In ICLR, 2018.
The concrete distribution: A continuous relaxation of discrete random variables. In ICLR, 2017.
Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.
In this section, we use the Envelope Theorem to prove that the gradient with respect to the transport matrix is zero in our algorithm.
Envelope theorem  Let and be real-valued continuously functions, where , and are the parameters. We assume is the optimal solution of with fixed and constraint , i.e.,
Then assume function is also continuous and differentiable, defined as the derivative of over is:
Assume the parameters in is , and the parameters in is . Using Envelope theorem, the gradient respect to is:
|Original:||the new yorker was amazing .|
|CAE:||the new off was not funny .|
|Ours:||the new yorker was horrible either .|
|Original:||it was beautiful and lined with lady fingers to cover sides .|
|CAE:||it was impossible and it were impossible to get with _num_ days|
|Ours:||it was beautiful and lady with someone to fix table to deliver .|
|Original:||the subs are so delicious .|
|CAE:||the bathrooms are just so bland .|
|Ours:||the subs are so bland .|
|Original:||pasta , sandwiches , and desserts .|
|CAE:||_num_ , and salsa , and wings .|
|Ours:||pasta , salad , sandwiches , and desserts .|
|Original:||beautiful building and a memorable experience .|
|CAE:||clean and a disaster experience experience experience .|
|Ours:||beautiful building and a fluke experience memorable .|
|Original:||my experience was horrible .|
|CAE:||my experience was great !|
|Ours:||my experience was amazing .|
|Original:||an employee could not find it , and his manager could not find it .|
|CAE:||it ’s a new place and it , and it ’s my dog loves it !|
|Ours:||my employee could not find it , and his manager could not find it .|
|Original:||the place was dirty , way crowded with crap products .|
|CAE:||the place was clean , clean with clean .|
|Ours:||the place is clean , kind with crap products .|
|Original:||the worst place you can go to .|
|CAE:||the best place to go .|
|Ours:||the best place you can go .|
|Original:||this place is a shit hole the management is nonexistent after _num_ o’clock .|
|CAE:||this place is a clean place is the best number are occupied .|
|Ours:||this place is a cool hole the staff is after _num_ o’clock .|
|Original:||i love the food … however service here is horrible .|
|CAE:||i love the service here is awesome service .|
|Ours:||i love the food service is great here .|
|TextGAN:||- a display store with colorful chairs with no leash as they can walk|
|- a bathroom with many signs on it and many businesses|
|- a motorcycle in white shirt texts on reading of paper|
|- a family dressed looking toward the large mans reflection in the corner|
|- a cat that is standing on the corner of water|
|- two giraffes hang around with palm trees|
|- a woman sits at the phone while sitting holding onto her cell phone in an office|
|- a young girl stands outside with no parking umbrella|
|- a red bus drives around in front of an automobile station|
|- a crosswalk is on both sides|
|- a pair of walk sign with pink and flowers in it|
|- a living area with seating area for sale sign|
|- a small aircraft with landed at the water|
|- a closeup of a big boat being driven by the water|
|- a smiling group of travel down in london|
|- a motorcycle area with benches with many sides of|
|- a sea park bench being displayed on a sunny day|
|- a kitchen filled with cluttered pots and decorated clutter|
|- several young adults with skis and posing for a cone|
|- a number of workers cooking in the kitchen|
|LeakGAN:||- a number of signs hanging from a car|
|- a train near a building that is sitting on the road|
|- a number of people on a sidewalk with a street sign|
|- a train traveling down train tracks while families approaches the nurse appears|
|- a couple of elephants that are walking through a grassy field|
|- a couple of giraffes that are walking over the dirt road|
|- a group of people walking along side to cross the street|
|- a bird is flying over a domed chimney|
|- a couple of buses driving down a street|
|- a small bird sitting on a tree limb at the beach|
|- a group of people standing on top of a train next to slue of tracks|
|- a fire hydrant has growing off the side of it|
|- a bunch of people that are walking and other signs|
|- a man and a child standing outside a window of a store|
|- a man holding a stuffed bear levers leaves|
|- a train that is sitting on the train tracks|
|- a brown and orange train that is pouring out into the distance|
|- a man is sitting on a saddled horse|
|- a weird dozen bicycles are on a city street|
|- some people are on a fire hydrant on the street|
|FM-GAN:||- a man sitting around a laptop and holding something|
|- a large group of people walking on a sidewalk|
|- a white cat is placed on the hood of a bus stop|
|- trucks drive down a town area near a building|
|- two teddy bears and other items on top of a street|
|- a man is laying in a chair looking at a subway|
|- a motorcycle is sitting at a street with bags|
|- a plate of steak is cutting a sauce on it|
|- a woman sitting on top of a computer on a table|
|- a group of red sheep on a mountain range|
|- a man holding a cell phone and small black bowl|
|- a couple of people riding on a dirt horse next to a tree|
|- a living room is outside of the kitchen|
|- a tall building is under a blue plane|
|- a man standing on a white motorcycle with an umbrella stands in the snow|
|- a goat is standing next to the water|
|- a cat looking in a bathtub next to a sink|
|- a yellow bus on street curb near a sidewalk|
|- the giraffes are walking by their zoo enclosure in the distance|
|- there are many people in public office together|