Generative Adversarial Data Programming

04/30/2020 ∙ by Arghya Pal, et al.

The paucity of large curated hand-labeled training data forms a major bottleneck in the deployment of machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time. In this work, we present Adversarial Data Programming (ADP), an adversarial methodology to generate data as well as a curated aggregated label, given a set of weak labeling functions. More interestingly, such labeling functions are often easily generalizable, allowing our framework to be extended to different setups, including self-supervised labeled image generation, zero-shot text to labeled image generation, transfer learning, and multi-task learning.




1 Introduction

Curated labeled data is a key building block of modern machine learning algorithms and a driving force for deep neural network models that work in practice. However, the creation of large-scale hand-annotated datasets in every domain is a challenging task due to the requirement for extensive domain expertise and long hours of human labor, which collectively make the overall process expensive and time-consuming. Even when data annotation is carried out using crowdsourcing (e.g. Amazon Mechanical Turk), additional effort is required to measure the correctness (or goodness) of the obtained labels. We seek to address this problem in this work. In particular, we focus on automatically learning the parameters of a given joint image-label probability distribution (as provided in training image-label pairs) with a view to automatically creating labeled datasets.

To this end, we exploit distant supervision signals to generate labeled data. These distant supervision signals are provided to our framework as a set of weak labeling functions that represent domain knowledge or heuristics obtained from experts or crowd annotators. This approach has a few advantages: (i) labeling functions (which can even be just loosely defined) are cheaper to obtain than collecting labels for a large dataset; (ii) labeling functions act as an implicit regularizer in the label space, thus allowing good generalization; (iii) with a small amount of fine-tuning, labeling functions can easily be re-purposed for new domains (transfer learning) and multi-task learning (discussed in Section 5.1); and (iv) labeling functions can be generalized to semantic attributes, which aids adapting the approach to generalized zero-shot text to labeled image generation (Section 5.4). We note that writing a set of labeling functions is, as we found in our experiments, fairly easy and quick: we demonstrate three python-like functions that weakly label the SVHN SVHN digit “0” in Figure 2.2.a. Figure 1 shows a few examples of our results to illustrate the overall idea.
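To make the idea concrete, here is a minimal sketch of what three such weak, python-like labeling functions for the digit “0” might look like. The heuristics, thresholds, and function names below are illustrative assumptions for exposition, not the paper's actual functions:

```python
# Illustrative sketch only: three weak labeling functions that vote on
# whether a 32x32 grayscale digit image (a list of lists of floats in
# [0, 1]) looks like a "0". Heuristics and thresholds are invented.

ABSTAIN = -1  # a labeling function may decline to vote

def lf_has_central_hole(img):
    """Vote '0' if the image center is mostly dark (the hole of a zero)."""
    h, w = len(img), len(img[0])
    center = [img[r][c] for r in range(h // 3, 2 * h // 3)
                         for c in range(w // 3, 2 * w // 3)]
    return 0 if sum(center) / len(center) < 0.2 else ABSTAIN

def lf_has_bright_ring(img):
    """Vote '0' if a band around the glyph region is mostly bright."""
    h, w = len(img), len(img[0])
    rows = (h // 4, h - 1 - h // 4)
    cols = (w // 4, w - 1 - w // 4)
    ring = [img[r][c] for r in range(h) for c in range(w)
            if r in rows or c in cols]
    return 0 if sum(ring) / len(ring) > 0.5 else ABSTAIN

def lf_symmetric(img):
    """Vote '0' if the image is roughly left-right symmetric."""
    h, w = len(img), len(img[0])
    diff = sum(abs(img[r][c] - img[r][w - 1 - c])
               for r in range(h) for c in range(w)) / (h * w)
    return 0 if diff < 0.1 else ABSTAIN
```

Each function either votes for class 0 or abstains; the framework's job, described next, is to weigh and combine such noisy votes.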

In practice, labeling functions can be associated with two kinds of dependencies: (i) relative accuracies (shown as solid arrows in Figure 2.2.b), weights on the labeling functions measuring their correctness w.r.t. the true class label; and (ii) inter-function dependencies (shown as dotted lines in Figure 2.2.b) that capture the relationships between the labeling functions concerning the predicted class label. In this work, we propose a novel adversarial framework, Adversarial Data Programming (ADP), presented in Figure 2.1, that uses a Generative Adversarial Network (GAN) to learn these dependencies along with the data distribution via a min-max game.

Our broad idea of learning relative accuracies and inter-function dependencies of labeling functions is inspired by the recently proposed Data Programming (DP) framework ratner2016data (and hence, the name ADP), but our method differs in several ways: (i) DP learns the distribution over labels given the labeling function outputs, while ADP learns the joint image-label distribution; (ii) DP uses Maximum Likelihood Estimation (MLE) to estimate the relative accuracies of labeling functions, whereas we use an adversarial framework to estimate both relative accuracies and inter-function dependencies (danihelka2017comparison and theis2015note provide insights on the advantages of a GAN-based estimator over MLE); and (iii) our adversarial approach to learning inter-function dependencies replaces the computationally expensive factor graph modeling proposed in ratner2016data.

Figure 1: (A) Labeled image generation by ADP on different datasets: (a)-(d) sample image-label pairs generated using the proposed ADP framework on the SVHN SVHN, CIFAR 10 CIFAR10, Chest-Xray-14 CHEST, and LSUN LSUN datasets, respectively; (B) Cross-domain labeled image generation using ADP; (C) Transfer learning using ADP: ADP transfers its knowledge from a source domain to a target domain when distant supervision signals are shared. We demonstrate transfer learning from MNIST MNIST to SVHN; (D) Generalized zero-shot text to labeled image generation: ZS-ADP is a first-of-its-kind model that performs zero-shot text to labeled image generation on the Flower102 Nilsback08, UCSD Bird WahCUB_200_2011 and Chest-Xray-14 CHEST datasets.

Furthermore, we show the applicability of ADP to different tasks: (i) Self-supervised labeled image generation (SS-ADP), which generates labeled images from an unlabeled dataset. SS-ADP learns the dependencies of labeling functions using an image-rotation-based self-supervised loss (similar to the image rotation loss proposed in chen2019self). (ii) Generalized zero-shot text to labeled image synthesis (ZS-ADP), which generates labeled images from textual descriptions (see Section 5.4). We show that ZS-ADP infers both zero-shot and seen classes of generated images using labeling functions that are semantic attributes (similar to the semantic attributes proposed by Lampert et al. lampert2009learning). To the best of our knowledge, ZS-ADP is the first generalized zero-shot text to labeled image generator.

As the outcome of this work, we present a framework that integrates labeling functions within a generative adversarial framework to model the joint image-label distribution. To summarize:

  • We propose a novel adversarial framework, ADP, to generate robust data-label pairs that can be used as datasets in domains that have little data and thus save human labor and time.

  • The proposed framework can also be extended to incorporate generalized zero-shot text to labeled image generation, i.e. ZS-ADP; and self-supervised labeled image generation, i.e. SS-ADP in Section 5.

  • We demonstrate that the proposed framework can also be used in a transfer learning setting, and multi-task joint distribution learning where images from two domains are generated simultaneously by the model along with the labels, in Section 5.

2 Related Work

Distant Supervision: In this work, we explore the use of distant supervision signals (in the form of labeling functions) to generate labeled data points. Distant supervision signals such as labeling functions are cheaper than manual annotation of each data point, and have been successfully used in recent methods such as ratner2016data; fries2019weakly. MeTaL ratner2018snorkel extends ratner2017learning by identifying multiple sub-tasks that follow a hierarchy and then provides labeling functions for the sub-tasks. These methods require unlabeled test data to generate a labeled dataset and are computationally expensive due to the use of Gibbs sampling in the estimation step (also shown in our results).
Learning Joint Distribution using Adversarial Methods: In this work, we use an adversarial approach to learn the joint distribution by weighting a set of domain-specific label functions using a Generative Adversarial Network (GAN). We note efforts pu2018jointgan; lucic2019high which attempt to train GANs to sample from a joint distribution. In this work, we propose a novel idea to instead use distant supervision signals to accomplish learning the joint distribution of labeled images, and compare against these methods.
Generalized Zero-shot Learning: A typical generalized zero-shot model (such as zhang2018model; liu2018generalized; zhang2019co) learns to transfer learned knowledge from seen to unseen classes by learning correlations between the classes at training time, and recognizes both seen and unseen classes at test time. Efforts such as reed2016generative; zhang2016stackgan; zhang2017stackgan++; xu2018attngan; zhang2018photographic proposed text-to-image generation methodologies and demonstrated results on zero-shot text-to-image generation; however, no work has studied the problem of generalized zero-shot text-to-labeled-image generation so far. We integrate a set of semantic-attribute-based distant supervision signals, similar to those proposed by Lampert et al. lampert2009learning, such as color, shape, and part, to identify seen and zero-shot visual class categories.
Self-Supervised Labeled Image Generation: While self-supervised learning is an active area of research, we found only one work, lucic2019high, that performs self-supervised labeled image generation. In particular, lucic2019high uses a GAN framework that performs k-means clustering within the discriminator and does unsupervised image generation. In this work, we instead use a set of labeling functions to perform self-supervision. We defer the discussion to Section 5.3.


Figure 2: (1) Overall architecture of the Adversarial Data Programming (ADP) framework: the generator generates an image and parameters which are used by the labeling functions block (LFB) to generate labeled images; (2) Labeling Functions: crowd experts give distant supervision signals in the form of weak labeling functions, e.g. presence of a blob; (3) Dependencies of Labeling Functions: labeling functions exhibit “relative accuracies” (solid arrows) and “inter-function dependencies” (dotted lines). ADP encapsulates all labeling functions in a unified abstract container called the LFB, which helps learn the parameters corresponding to both kinds of dependencies so as to generate labeled images.

3 Adversarial Data Programming: Methodology

Our central aim in this work is to learn the parameters of a probabilistic model:

$$P(x, y \mid z) \qquad (1)$$

that captures the joint distribution over the image $x$ and the corresponding label $y$, conditioned on a latent variable $z$.

To this end, we encode distant supervision signals as a set of (weak) definitions, provided by annotators, using which unlabeled images can be labeled. Such distant supervision signals allow us to weakly supervise images where the collection of directly labeled images is expensive, time-consuming or static. We encapsulate all available distant supervision signals, henceforth called labeling functions, in a unified abstract container called the Labeling Functions Block (LFB, see Figure 2.1). Let the LFB comprise $m$ labeling functions $\{l_1, \ldots, l_m\}$, where each labeling function is a mapping $l_i: \mathcal{X} \rightarrow \mathcal{Y}$ that maps an unlabelled image $x \in \mathcal{X}$ to a class label vector $y_i \in [0, 1]^k$, where $k$ is the number of class labels. For example, as shown in Figure 2.2, $x$ could be an image from the SVHN single-digit dataset, and $y_i$ would be the corresponding label vector when the labeling function $l_i$ is applied to $x$. The label vector $y_i$, for instance, could be the one-hot 10-dimensional class vector, see Figure 2.2.

We characterize the set of labeling functions as having two kinds of dependencies: (i) relative accuracies, weights given to labeling functions based on whether their outputs agree with the true class label of an image $x$; and (ii) inter-function dependencies, which capture the relationships between the labeling functions with respect to the predicted class label. To obtain a final label for a given data point using the LFB, we use two different sets of parameters, $\alpha$ and $\phi$, to capture each of these dependencies between the labeling functions. We hence denote the Labeling Function Block (LFB) as:

$$\mathrm{LFB}(l, \alpha, \phi)(x) = \tilde{y} \qquad (2)$$

i.e., given a set of labeling functions $l = \{l_1, \ldots, l_m\}$, a set of parameters $\alpha$ capturing the relative-accuracy-based dependencies between the labeling functions, and a second set of parameters $\phi$ capturing the inter-function dependencies, the LFB provides a probabilistic label vector, $\tilde{y}$, for a given data input $x$.

The model we seek to learn (Equation 1) hence becomes:

$$P(x, \mathrm{LFB}(l, \alpha, \phi)(x) \mid z) \qquad (3)$$
In the rest of this section, we show how we can learn the parameters of the above distribution modeling image-label pairs using an adversarial framework with a high degree of label fidelity. We use Generative Adversarial Networks (GANs) to model the joint distribution in Equation 3. In particular, we provide a mechanism to integrate the LFB (Equation 2) into the GAN framework, and show how $\alpha$ and $\phi$ can be learned through the framework itself. Our adversarial loss function is given by:

$$\min_G \max_D V(G, D) = \mathbb{E}_{(x, y) \sim p_{data}(x, y)}[\log D(x, y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \qquad (4)$$

where $G$ is the generator module and $D$ is the discriminator module. The overall architecture of the proposed ADP framework is shown in Figure 2.1.

3.1 The Proposed Framework

A. ADP Generator $G$: Given a noise input $z$ and a set of $m$ labeling functions $l = \{l_1, \ldots, l_m\}$, the generator outputs an image $x$ and the parameters $\alpha$ and $\phi$, the dependencies between the labeling functions described earlier. In particular, $G$ consists of three blocks, as shown in Figure 2.1: a shared trunk and two branches. The trunk captures the common high-level semantic relationships between the data and the label space, and is comprised only of fully connected (FC) layers. Its output forks into two branches: one generates the image $x$, and the other generates the parameters $\alpha$ and $\phi$. The parameter branch uses FC layers, while the image branch uses fully convolutional (FCONV) layers to generate the image (more details in Section 6). Figure 2.1 also includes a block diagram for better understanding.
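The forked data flow of the generator can be sketched as follows. This is a structural illustration only: the function names and toy computations are our own stand-ins for the paper's FC trunk, FCONV image branch, and FC parameter branch:

```python
# Structural sketch of the ADP generator's forked design (names ours).
# Each "block" is stubbed as a pure function so the data flow -- shared
# trunk, then an image branch and a parameter branch -- is explicit.

def shared_trunk(z):
    # Stand-in for the FC trunk capturing shared image/label semantics.
    return [v * 2.0 for v in z]

def image_branch(h, height=8, width=8):
    # Stand-in for the fully convolutional branch producing an image.
    return [[h[(r * width + c) % len(h)] for c in range(width)]
            for r in range(height)]

def param_branch(h, n_lfs=3):
    # Stand-in for the branch producing relative accuracies (alpha) and
    # inter-function dependencies (phi) for n_lfs labeling functions.
    alpha = [abs(v) for v in h[:n_lfs]]
    s = sum(alpha) or 1.0
    alpha = [a / s for a in alpha]                     # normalized weights
    phi = [[min(a, b) for b in alpha] for a in alpha]  # toy symmetric matrix
    return alpha, phi

def generator(z):
    h = shared_trunk(z)
    return image_branch(h), param_branch(h)
```

The point of the fork is that one forward pass yields both the sample and the labeling-function parameters, so both can be trained by the same adversarial signal.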

B. ADP Discriminator $D$: The discriminator estimates the likelihood of an image-label input pair being drawn from the real distribution obtained from training data. $D$ takes a batch of either real or generated image and inferred label pairs as input and maps them to a probability score estimating the aforementioned likelihood. To accomplish this, $D$ has two branches (shown in the Discriminator part of Figure 2.1). These two branches are not coupled in the initial layers, but share weights in later layers and merge to extract joint semantic features that help classify correctly whether an image-label pair is fake or real.

Input: Labeling functions $\{l_1, \ldots, l_m\}$, relative accuracies $\alpha$, output probability vectors of the $m$ labeling functions on the images generated by $G$
Output: Inter-function dependency matrix $\phi_{real}$
Initialize $\phi_{real} = I$; /* I = Identity Matrix */
for $p = 1$ to $m$ do
       /* For each labeling function */
       for $q = p + 1$ to $m$ do
             /* For each other labeling function: if the one-hot encodings of the outputs of the two functions match on a generated image, increment the $(p, q)$-th entry in $\phi_{real}$ by 1 */
       end for
end for
for $p = 1$ to $m$ do
       normalize row $p$ of $\phi_{real}$;
end for
Set $\phi_{real}(q, p) = \phi_{real}(p, q)$ /* Complete using symmetry */
Algorithm 1: Procedure to compute $\phi_{real}$
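Algorithm 1 admits a compact Python reading (variable names are ours, and we normalize agreement counts by the batch size, which is one plausible reading of the normalization step):

```python
# A minimal Python reading of Algorithm 1: given the probability vectors
# each of m labeling functions outputs on a batch of b generated images,
# build the m x m agreement matrix phi by counting, for every pair of
# functions, how often their one-hot (argmax) outputs match, then
# completing the matrix by symmetry.

def compute_phi(outputs):
    """outputs[i][j] = probability vector of labeling function i on image j."""
    m = len(outputs)
    b = len(outputs[0])
    argmax = [[max(range(len(v)), key=v.__getitem__) for v in rows]
              for rows in outputs]
    # Start from the identity (every function agrees with itself).
    phi = [[1.0 if p == q else 0.0 for q in range(m)] for p in range(m)]
    for p in range(m):
        for q in range(p + 1, m):
            agree = sum(argmax[p][j] == argmax[q][j] for j in range(b))
            phi[p][q] = agree / b
            phi[q][p] = phi[p][q]   # complete using symmetry
    return phi
```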

C. Labeling Functions Block (LFB): This is a critical module of the proposed ADP framework. Our initial experiments revealed that a simple weighted (linear or non-linear) sum of the labeling functions does not perform well in generating out-of-sample image-label pairs. We hence use a separate adversarial methodology within this block to learn the dependencies. We describe the components of the LFB below.
C.1. Relative Accuracies, $\alpha$, of Labeling Functions: Here, we assume that all the labeling functions infer the label of an image independently (the independent decision assumption), and the parameter $\alpha$ gives a relative weight to each labeling function based on the correctness of its inferred label for the true class. The parameter branch of the ADP generator $G$ provides the relative accuracies $\alpha$ of the labeling functions. Given the image $x$ generated by $G$, the $m$ labeling functions $\{l_1, \ldots, l_m\}$, and the probabilistic label vectors $\{y_1, \ldots, y_m\}$ obtained using the labeling functions, we define the aggregated final label as:

Input: Number of iterations $N$, number of steps $n_d$ to train the discriminators per iteration
Output: Trained ADP model
for $n = 1$ to $N$ do
       for $n_d$ steps do
             Draw samples from $G$ and subsequently infer the corresponding labels using LFB(.) (Equation 7);
             Update the weights of the discriminators $D$ and $D_\phi$;
       end for
      Update the weights of the generator $G$;
end for
Algorithm 2: Training procedure of ADP

$$\tilde{y} = \sum_{i=1}^{m} \bar{\alpha}_i \, y_i$$

where $\bar{\alpha}$ is the normalized version of $\alpha$, i.e. $\bar{\alpha}_i = \alpha_i / \sum_j \alpha_j$. The aggregated label, $\tilde{y}$, is provided as an output of the LFB.
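The relative-accuracy aggregation can be sketched as follows (a minimal illustration; names are ours):

```python
# Sketch of the LFB's relative-accuracy aggregation: normalize the
# generator's relative-accuracy weights alpha, then take the weighted sum
# of the labeling functions' probability vectors as the aggregated label.

def aggregate_label(label_vectors, alpha):
    """label_vectors[i] = probability vector from labeling function i."""
    total = sum(alpha)
    alpha_bar = [a / total for a in alpha]      # normalized alpha
    k = len(label_vectors[0])
    return [sum(alpha_bar[i] * label_vectors[i][c]
                for i in range(len(label_vectors)))
            for c in range(k)]
```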
C.2. Inter-function Dependencies, $\phi$, of Labeling Functions: In practice, dependency among labeling functions is a common observation. Studies ratner2016data; fries2019weakly show that such dependencies increase proportionally with the number of labeling functions. Modeling such inter-function dependencies acts as an implicit regularizer in the label space, leading to an improvement in labeled image generation quality and in the generated image-to-label correspondence.

While recent studies utilized factor graphs ratner2016data; fries2019weakly to learn such dependencies among labeling functions, we instead use an adversarial mechanism inside the LFB to capture inter-function dependencies, which in turn influence the final relative accuracies $\alpha$. A discriminator inside the LFB, $D_\phi$, receives two inputs: $\phi$, which is output by the generator's parameter branch, and $\phi_{real}$, which is obtained using the procedure described in Algorithm 1. Algorithm 1 computes a matrix of interdependencies between the labeling functions, $\phi_{real}$, by looking at the one-hot encodings of their predicted label vectors: if the one-hot encodings match for a given data input, we increase the count of their correlation. The task of the discriminator is to recognize the computed interdependencies $\phi_{real}$ as real, and the $\phi$ generated through the network as fake. The objective function of our second adversarial module is hence:

$$\min_G \max_{D_\phi} \; \mathbb{E}_{\phi_{real}}[\log D_\phi(\phi_{real})] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D_\phi(\phi))]$$

where $\phi_{real}$ and $\phi$ are obtained as described above. More details of the LFB are provided in the implementation details in Section 6.
C.3. Final label prediction, $\tilde{y}$, using the LFB: We define the aggregated final label as:

$$\tilde{y} = \mathrm{LFB}(l, \alpha, \phi)(x) \qquad (7)$$

The samples generated using the $G$ and LFB modules thus provide samples from the desired joint distribution (Eqn 1) modeled using the framework.

3.2 Final Objective Function

We hence expand our objective function from Equation 4 to the following:

$$\min_{G} \max_{D, D_\phi} \; \mathbb{E}_{(x, y) \sim p_{data}(x, y)}[\log D(x, y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] + \mathbb{E}_{\phi_{real}}[\log D_\phi(\phi_{real})] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D_\phi(\phi))]$$
3.3 Training

Algorithm 2 presents the overall stepwise routine of the proposed ADP method. During the training phase, the algorithm updates weights of the model by estimating gradients for a batch of labeled data points.

4 Theoretical Analysis

Theorem 1: For any fixed generator $G$, the optimal discriminator $D$ of the game defined by the objective function $V(G, D)$ is

$$D^*(x, y) = \frac{p_{data}(x, y)}{p_{data}(x, y) + p_G(x, y)} \qquad (6)$$

Proof: The training criterion for the discriminator $D$, given any generator $G$, is to maximize the quantity $V(G, D)$. Following goodfellow2016nips, maximizing the objective function relies on the Radon-Nikodym theorem donahue2016adversarial, which lets us write the objective for ADP as an integral over the joint image-label space:

$$V(G, D) = \int_{x, y} \big[\, p_{data}(x, y) \log D(x, y) + p_G(x, y) \log(1 - D(x, y)) \,\big] \, dx \, dy$$

Following goodfellow2016nips, for any $(a, b) \in \mathbb{R}^2 \setminus \{(0, 0)\}$, the function $t \mapsto a \log t + b \log(1 - t)$ achieves its maximum in $[0, 1]$ at $t = \frac{a}{a + b}$, which proves the claim.

Theorem 2: The equilibrium of $V(G, D)$ is achieved if and only if $p_G = p_{data}$, at which point $V(G, D)$ attains the value $-\log 4$.

Proof: Considering the optimal discriminator $D^*$ for a fixed generator $G$ described in Eqn 6 in Theorem 1, the min-max game in Eqn 4 can be reformulated as:

$$C(G) = \mathbb{E}_{(x, y) \sim p_{data}}\left[\log \frac{p_{data}(x, y)}{p_{data}(x, y) + p_G(x, y)}\right] + \mathbb{E}_{(x, y) \sim p_G}\left[\log \frac{p_G(x, y)}{p_{data}(x, y) + p_G(x, y)}\right]$$

This can be rewritten as:

$$C(G) = -\log 4 + \mathrm{KL}\!\left(p_{data} \,\middle\|\, \frac{p_{data} + p_G}{2}\right) + \mathrm{KL}\!\left(p_G \,\middle\|\, \frac{p_{data} + p_G}{2}\right) = -\log 4 + 2 \cdot \mathrm{JSD}(p_{data} \,\|\, p_G)$$

where KL denotes the Kullback-Leibler divergence and JSD denotes the Jensen-Shannon divergence. Since the JSD between two distributions is non-negative, and zero if and only if the distributions are equal, the training criterion attains its global minimum exactly when $p_G = p_{data}$. At that point, $C(G)$ attains the value $-\log 4$ and the generator perfectly mimics the real joint data-label distribution.

5 Extensibility of ADP in different tasks

5.1 Transfer Learning using ADP

Figure 3: Multi-task joint distribution learning: (a) a modified ADP can generate images of two domains that have a label correspondence; (b) labeled images from MNIST-SVHN and LookBook-FMNIST; (c) training progress of MNIST-SVHN joint labeled image generation.

Distant supervision signals such as labeling functions (which can often be generic) allow us to extend the proposed ADP to a transfer learning setting. In this setup, we train ADP initially on a source dataset and then finetune the model on a target dataset with very limited training. In particular, we first trained ADP on the MNIST dataset, and subsequently finetuned the image branch alone on the SVHN dataset; the weights of the remaining blocks are left unaltered. The final finetuned model is then used to generate image-label pairs (which we hypothesize will look similar to SVHN). Figure 1C (“Transfer Learning using ADP”) shows encouraging results of our experiments in this regard.

5.2 Multi-task Joint Distribution Learning

Learning a cross-domain joint distribution over heterogeneous domains is a challenging task. We show that the proposed ADP method can achieve this by modifying its architecture, as shown in Figure 3(a) (top), to simultaneously generate data from two different domains. We study this architecture on (1) the MNIST and SVHN datasets, as well as (2) the LookBook and Fashion MNIST datasets, and show promising results of our experiments in Figure 3(b). The LFB acts as a regularizer and maintains the correlations between the domains in this case. We show the multi-task joint distribution training progress on the MNIST and SVHN datasets in Figure 3(c).

5.3 Self Supervised Labeled Image Generation

Figure 4: Block Diagram of SS-ADP:

Thus far, we have shown the labeled image generation process using the LFB integrated into the generator $G$. In this section, we show the labeled image generation process from unlabeled data, and a way to apply the LFB to real unlabeled images to obtain “distant supervision” image labels. In particular, the LFB receives a real unlabeled image $x$ as input and infers its label $\tilde{y}$. Similar to chen2019self, we provide rotated versions of the image $x$, i.e. $\mathrm{rot}(x, \theta)$ with $\theta \in \{0°, 90°, 180°, 270°\}$, see Figure 4. The relative accuracy parameter $\alpha$ and inter-function dependency parameter $\phi$ of the LFB are learned based on how close the inferred label of the unlabeled image $x$ is to the inferred label of each rotated version of $x$. Hence, the objective function of SS-ADP combines the adversarial objective of Equation 4 and the inter-function dependency objective with this rotation-based self-supervision term, and the final objective is the sum of these terms.
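The rotation-consistency signal can be sketched as follows. This is a minimal illustration, not the paper's exact loss: we use a squared-error penalty between label vectors, and `label_fn` stands in for the LFB's aggregated label on an image given as a list of lists:

```python
# Sketch of the rotation-based self-supervision signal (assumed form:
# a squared-error consistency penalty between the label of an image and
# the labels of its 90/180/270-degree rotations).

def rotate90(img, times=1):
    """Rotate a 2D list 90 degrees clockwise, `times` times."""
    for _ in range(times % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img

def rotation_consistency_loss(img, label_fn):
    """Mean squared distance between labels of img and its rotations."""
    base = label_fn(img)
    loss = 0.0
    for t in (1, 2, 3):                     # 90, 180, 270 degrees
        rotated = label_fn(rotate90(img, t))
        loss += sum((a - b) ** 2 for a, b in zip(base, rotated))
    return loss / 3.0
```

A labeling pipeline that is invariant to rotation incurs zero loss; disagreement across rotations is penalized, which is the signal used to fit the LFB parameters without ground-truth labels.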
5.4 Generalized Zero-shot Text to Labeled Image Synthesis using ADP

We go further and introduce a first-of-its-kind approach to generalized zero-shot text to labeled image generation, using a modified version of ADP, henceforth called ZS-ADP. Generalized zero-shot classification zhang2019co; liu2018generalized; zhang2018model and text-to-(zero-shot-)image synthesis reed2016generative; zhang2016stackgan; zhang2018photographic have been studied separately in the literature; we go beyond text-to-image synthesis and propose a novel framework, ZS-ADP, that performs text-to-labeled-image synthesis in a generalized zero-shot setup. To accomplish this objective, we assume a dataset with $k$ classes. Of these $k$ classes, the first $s$ seen classes have samples in the form of tuples $\{(x, y, t)\}$, where an image $x$ is associated with a seen class $y$, and $t$ denotes a textual description (such as a caption) of image $x$. Complementarily, we have no images for the remaining $k - s$ zero-shot classes; only textual descriptions are available for them. Our primary aim is to learn the parameters of the following probabilistic model at training time:

$$P(x, y \mid z, t), \quad y \in \{1, \ldots, s\}$$

such that, at inference time, it can generate labeled images of both seen and zero-shot classes:

$$P(x, y \mid z, t), \quad y \in \{1, \ldots, k\}$$

where $z$ is the noise vector and $t$ is the textual description.

To this end, ZS-ADP learns class-independent style information (background, illumination, object orientation), while it learns content information (visual appearance and attributes such as shape, color, and size) using the LFB module. We follow the semantic-attribute-based object class identification work of Lampert et al. lampert2009learning, and express the labeling functions in the LFB in terms of a set of semantic attributes. To make this exposition self-contained, we paraphrase the idea of “identifying a class based on semantic attributes” of lampert2009learning using an example: an object “zebra” can be classified by recognizing semantic attributes such as “four legs”, “has tail” and “white-black stripes on body”. Such semantic attributes can be integrated within the LFB as labeling functions without any architectural change to ADP (and hence ZS-ADP): formally, each semantic attribute acts as a labeling function (similar to the $l_i$ of ADP). Similar to lampert2009learning, the LFB produces the final class label of a generated image by ranking similarity scores between the ground-truth semantic attribute vectors of the seen and zero-shot classes and the semantic attribute vector inferred for the generated image:

$$\hat{y} = \arg\max_{c \in \{1, \ldots, k\}} \; \mathbf{1}^\top \big( a_c \odot a(\hat{x}) \big) \qquad (21)$$

where $\hat{x}$ is sampled from the generator of ADP, $a(\hat{x})$ is the semantic attribute vector the LFB produces for $\hat{x}$, and $\odot$ denotes the Hadamard product. Following lampert2009learning, we assume access to a deterministic semantic attribute vector $a_c$ (ground truth) for each seen and zero-shot class $c$.
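The attribute-based ranking can be sketched as follows (a simplified illustration: we score each class by the sum of the Hadamard product, i.e. a dot product, between attribute vectors; names are ours):

```python
# Sketch of semantic-attribute class ranking: the predicted attribute
# vector of a generated image is scored against each class's ground-truth
# attribute vector, and the top-ranked class becomes the label.

def predict_class(pred_attrs, class_attrs):
    """class_attrs: dict mapping class name -> ground-truth attribute
    vector (seen and zero-shot classes alike)."""
    def score(gt):
        # Hadamard product followed by a sum, i.e. a dot product.
        return sum(p * g for p, g in zip(pred_attrs, gt))
    return max(class_attrs, key=lambda c: score(class_attrs[c]))
```

Because zero-shot classes also have ground-truth attribute vectors, they can be ranked and predicted without any training image.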

The adversarial framework of ZS-ADP learns a non-linear mapping between the text and the labeled image space:

$$G : (z, \psi(t)) \mapsto (x, \alpha, \phi) \qquad (22)$$

Since raw text can be vague and contain redundant information, we encode the raw text $t$ using a text encoder and obtain a text encoding $\psi(t)$. Our encoder is influenced by the Joint Embedding of Text and Image (JETI) encoder proposed by Reed et al. in reed2016generative.

At inference time, the raw text from either a seen or a zero-shot class is first encoded by the text encoder. ZS-ADP takes the noise vector $z$ and the encoded text $\psi(t)$ as input, and produces an image and the dependency parameters as outputs. Following Equation 21, the LFB provides the class label by aggregating the decisions of the semantic attributes.

Input: Image, number of clusters $n$ (equal to the number of classes)
Output: Probabilistic label vector
/* Unsupervised deep-learning-based labeling function */
$m$ = number of kernels in the fifth layer of a pre-trained AlexNet trained using the DeepCluster method caron2018deep;
for $i = 1$ to $n$ do
       for $j = 1$ to $m$ do
             $\mu_{ij}$ = average value of the Frobenius norm of the activation map of kernel $j$ on a subset of training samples from cluster $i$;
       end for
end for
for $j = 1$ to $m$ do
       $f_j$ = value of the Frobenius norm of the activation map of kernel $j$ on the input image;
end for
return OneHot(index of the cluster whose mean vector $\mu_i$ is nearest to $f$)
Labeling Function 3: Labeling function based on deep features
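A runnable reading of Labeling Function 3 follows (names are ours; the real version uses kernel activation maps of a DeepCluster-pretrained AlexNet, which we replace here with a caller-supplied feature function):

```python
# Sketch of a DeepCluster-style labeling function: precomputed per-cluster
# mean feature (Frobenius-norm) vectors act as prototypes, and an image is
# one-hot labeled with its nearest cluster.

def make_deepcluster_lf(cluster_means, features):
    """cluster_means[i] = mean feature vector of cluster i;
    features(img) = the m-dim norm vector for an image.
    Returns a labeling function that one-hot votes for the nearest cluster."""
    def lf(img):
        f = features(img)
        dists = [sum((a - b) ** 2 for a, b in zip(f, mean))
                 for mean in cluster_means]
        i = min(range(len(dists)), key=dists.__getitem__)
        onehot = [0] * len(cluster_means)
        onehot[i] = 1
        return onehot
    return lf
```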

Adding Cycle-Consistency Loss to Semantic Attributes:

Since zero-shot classes are not present at training time and ZS-ADP is optimized only on seen classes, we often observed a strong bias towards seen classes at inference time. Such biases have also been reported in earlier efforts such as zhang2019co, and we therefore impose an additional cycle-consistency loss on ZS-ADP: in addition to the usual ZS-ADP objective, the generator receives the ground-truth semantic attribute vectors $a_c$ of seen and zero-shot classes, together with the noise vector, as input to learn the dependency parameters (as opposed to the usual text embedding described in Equation 22). The weight of this loss is a hyperparameter, which we vary and report results for.


6 Experiments and Results

6.1 Dataset

Labeling functions used include: long edges mandal2011handwritten; alfonseca2012pattern; PatchMatch barnes2009patchmatch; local dense features sahu2018reducing; holistic spatial envelope oliva2001modeling; borders from experts plourde2006semi; vogelsang1998detection; models toriwaki1973pattern; van2001computer; ougul2015unsupervised; heuristics from experts; SIFT lowe2004distinctive; k-means clustering lin2010power; Bags of Keypoints csurka2004visual; DeepCluster caron2018deep; deep features from xie2016unsupervised; yang2016joint; and deep representations from bautista2016cliquecnn; dosovitskiy2014discriminative; liao2016learning.
Table 1: Description of labeling functions used for the SVHN SVHN, CIFAR 10 CIFAR10, LSUN LSUN and Chest-Xray-14 CHEST datasets.

6.1.1. ADP and SS-ADP: We validated ADP and SS-ADP on the following datasets: (i) SVHN SVHN; (ii) CIFAR-10 CIFAR10; (iii) LSUN LSUN; and (iv) CHEST, i.e. Chest-Xray-14 CHEST. For cross-domain multi-task learning and transfer learning using ADP, we validated on: (i) the digit datasets MNIST MNIST and SVHN SVHN; (ii) the clothing datasets Fashion MNIST (FMNIST) Fashion-MNIST and LookBook yoo2016pixel. We grouped the LookBook dataset's 17 classes into 4 classes (coat, pullover, t-shirt, dress) to match the number of classes in the FMNIST dataset.
6.1.2. ZS-ADP: We evaluated ZS-ADP on: (i) CUB 200 WahCUB_200_2011; (ii) Flower-102 Nilsback08; and (iii) Chest-Xray-14 CHEST. We consider Nodule and Effusion (randomly selected) as zero-shot classes of Chest-Xray-14 dataset in our experiments.

6.2 Labeling Functions

6.2.1. ADP and SS-ADP (labeling functions shown in Table 1): We encode distant supervision signals as a set of (weak) definitions using which unlabeled data points can be labeled. We categorize the labeling functions in Table 1 into three categories: (i) Heuristic: labeling functions based on domain heuristics such as knowledge bases, ontologies, and rules of thumb; additionally, these definitions can be harvested from educated guesses by experts obtained using crowdsourcing. The experts were given a batch of images from a dataset and asked to provide a set of labeling functions. (ii) Image Processing: domain heuristics from image processing and computer vision. (iii) Deep Learning: activation maps collected from deep models pre-trained in an unsupervised manner. We show an example of a labeling function used for the SVHN dataset in Labeling Function 3.
6.2.2. ZS-ADP: The CUB 200 dataset WahCUB_200_2011 provides a set of semantic attributes to identify each class. For the Flower-102 and Chest-Xray-14 datasets, we follow al2017automatic to identify semantic attributes, and follow reed2016learning to obtain color-based semantic attributes.

6.3 Implementation Details

ADP, SS-ADP and ZS-ADP: We adopt the BigGAN architecture brock2018large to implement the generators and discriminators of ADP, SS-ADP and ZS-ADP. We slightly change the last layers of the BigGAN model to produce images of the size intended for each dataset. In particular, the image branch is the BigGAN generator, and the parameter branch (3 fully connected FC layers) is forked after the “Non-Local Block” of the BigGAN generator. Similarly, the discriminator follows the BigGAN discriminator, with its label branch added after the “Non-Local Block” of the BigGAN discriminator. We follow the official BigGAN hyperparameters: we train the generator for 250k iterations with 5 discriminator iterations before every generator iteration, using the optimizer and learning rates of the official BigGAN configuration.

Figure 5: (Best viewed in color and zoomed in) Qualitative results of different generative (both GAN and non-GAN) methods, given in columns: WINN lee2018wasserstein, self-supervised SGAN lucic2019high, semi-supervised SGAN lucic2019high, Plug and Play (PnP) nguyen2017plug, f-VAEGAN-D2 (FD2) xian2019f, JointGAN (Joint) pu2018jointgan, BigGAN (Big) brock2018large, and our proposed SS-ADP and ADP. (A) SVHN, (B) CIFAR 10: the first column gives class labels; we abbreviate Automobile as “Auto” and Aerospace as “Aero”. (C) LSUN: we abbreviate Tower “Twr”, Church “Chrh”, Bridge “BR”, Conference Room “Cnf”, Restaurant “Rst”. (D) CHEST: we abbreviate Effusion “Eff”, Nodule “N”, Cardiomegaly “Card”, etc. Wrong labels are shown in red and correct labels in black. While labels are provided as input in some methods (WINN, PnP, BigGAN), our proposed ADP and SS-ADP generate labeled images. On the generated CHEST images, we additionally obtained disease location marks (shown as red boxes) from experts.

6.4 Qualitative Results Comparison with Prior Methods

6.4.1. ADP and SS-ADP: We compared our proposed SS-ADP and ADP methods against other generative (both GAN and Non-GAN) methods, such as: WINN lee2018wasserstein, Self-Supervised SGAN lucic2019high, Semi-supervised SGAN lucic2019high, Plug and Play PnP nguyen2017plug, f-VAEGAN-D2 FD2 xian2019f, JointGAN Joint pu2018jointgan, and BigGAN Big brock2018large, and show results in Figure 5. We discuss our improved results in terms of: (i) Image Quality: The results of ADP and SS-ADP show a significant improvement over the baseline BigGAN method. We observe that both the image “style” (i.e. background, illumination, etc.) and “content” (i.e. object shape, orientation) are captured thoroughly by ADP and SS-ADP. The improvements are likely due in part to the label-generation term acting as a regularizer in the generator training objective of Algorithm 2, which encourages ADP and SS-ADP to capture more modes and thus results in improved “style” and “content”. For example, on CIFAR 10 we observe clear automobile structure and color variation in the generated automobile (“Auto”) images (see Figure 5).

Figure 6: (Best viewed in color; please zoom to see details) Generalized zero-shot labeled image generation on the Flower-102, CUB 200 and Chest-Xray-14 datasets. (a) Generated labeled images by GAN-E2E reed2016generative, StackGAN++ zhang2017stackgan++, AttnGAN xu2018attngan, Hierarchical zhang2018photographic, FD2 (f-VAEGAN-D2) xian2019f and our method ZS-ADP. Existing state-of-the-art methods fail to capture the image-to-label correspondence of zero-shot classes and converge only to seen class labels (shown in red). Only ZS-ADP produces good-quality images with good image-label correspondence. (b) Changing both text and noise vector: generated images not only have different backgrounds but also different object colors, and hence a change in label. (c) Changing text with a fixed noise vector: the style of one class (i.e. Sunflower) is transferred to another class (i.e. Windflower) by changing the content information.

We note good variation in background, orientation, and low-level object structure in the images generated by ADP and SS-ADP. Similarly, on the CHEST dataset, we observe low-level details such as exact lung location and disease marks (shown in red boxes) in the generated images, as shown in Figure 3. In contrast, all other methods fail to capture such disease marks and generate only the global chest X-ray image. (ii) Image to Label Correspondence: We observe good image-to-label correspondence (see Figure 5), supporting our claim that labeling functions can be used within the adversarial framework of a GAN architecture.

Each cell reports MIS (↑) / FID (↓) / C_{r→g} (↑) / C_{g→r} (↑):

Methods | SVHN | CIFAR 10 | LSUN | CHEST
WINN | 1.21 / 21.72 / 56 / 43 | 0.73 / 28.92 / 54 / 30 | 1.01 / 19.05 / 62 / 42 | 1.08 / 18.92 / 58 / 46
SGAN (self-sup.) | 1.33 / 18.03 / 51 / 48 | 1.94 / 15.91 / 38 / 30 | 1.13 / 18.72 / 68 / 60 | 2.01 / 11.84 / 71 / 64
SGAN (semi-sup.) | 2.03 / 10.07 / 62 / 58 | 1.37 / 17.93 / 54 / 43 | 3.18 / 7.71 / 75 / 64 | 3.49 / 6.29 / 72 / 71
PnP | 1.12 / 17.94 / 53 / 48 | 0.92 / 27.60 / 57 / 49 | 2.32 / 10.41 / 53 / 52 | 3.06 / 7.64 / 73 / 60
FD2 | 1.83 / 16.73 / 72 / 70 | 1.63 / 17.02 / 58 / 50 | 2.81 / 10.90 / 73 / 62 | 2.89 / 13.63 / 69 / 58
Joint | 1.84 / 13.71 / 74 / 71 | 1.17 / 18.81 / 66 / 61 | 1.91 / 14.19 / 61 / 51 | 2.31 / 9.90 / 74 / 73
Big | 3.04 / 8.01 / 75 / 67 | 2.44 / 13.47 / 67 / 49 | 3.97 / 7.72 / 73 / 63 | 3.31 / 6.17 / 74 / 63
SS-ADP | 3.51 / 8.32 / 79 / 74 | 1.61 / 12.91 / 69 / 67 | 4.02 / 6.41 / 74 / 71 | 3.62 / 6.01 / 72 / 71
ADP | 3.74 / 7.29 / 83 / 81 | 2.82 / 9.21 / 72 / 70 | 4.81 / 5.37 / 88 / 87 | 4.01 / 5.25 / 82 / 80

Table 2: Quantitative results of generative (both GAN and Non-GAN) methods: WINN lee2018wasserstein, Self-Supervised SGAN lucic2019high, Semi-supervised SGAN lucic2019high, Plug and Play PnP nguyen2017plug, f-VAEGAN-D2 FD2 xian2019f, JointGAN Joint pu2018jointgan, BigGAN Big brock2018large, SS-ADP and ADP: (i) MIS (↑), the Modified Inception Score santurkar2018classification (higher is better): though our basic setup is based on the Big framework, we observe almost a 1-unit performance boost for ADP; similarly, SS-ADP outperforms the Big method in most cases, showing the efficacy of our proposed method for labeled image generation; (ii) FID (↓), the Frechet Inception Distance heusel2017gans (lower is better): the lower FID values of SS-ADP and ADP w.r.t. other methods imply that the generated labeled images are both good in quality and versatile; (iii) C_{r→g} (↑) (higher is better): Top-1 classification accuracy (in %) of a ResNet-50 classifier trained on real labeled images and tested on generated images; (iv) C_{g→r} (↑) (higher is better): Top-1 classification accuracy (in %) of a ResNet-50 classifier trained on generated labeled images and tested on real images.
A. Image to Label Correspondence (HTT / C_{r→g})

Method | Flower 102 | CUB 200 | Chest Xray
GAN-E2E reed2016generative | 5.38 / 53 | 5.91 / 58 | 4.21 / 59
StackGAN++ zhang2017stackgan++ | 6.21 / 61 | 5.84 / 64 | 5.18 / 62
AttnGAN xu2018attngan | 7.91 / 65 | 7.42 / 62 | 7.27 / 67
Hierarchical zhang2018photographic | 8.41 / 68 | 8.46 / 69 | 8.01 / 68
FD2 xian2019f | 8.72 / 70 | 8.79 / 72 | 8.67 / 72
ZS-ADP | 9.28 / 78 | 9.11 / 76 | 9.16 / 79

B. Image Quality (FID Score, lower is better)

Method | Flower 102 | CUB 200 | Chest Xray
GAN-E2E reed2016generative | 12.88 | 12.71 | 14.82
StackGAN++ zhang2017stackgan++ | 7.67 | 7.23 | 6.18
AttnGAN xu2018attngan | 5.91 | 5.42 | 5.27
Hierarchical zhang2018photographic | 4.41 | 4.46 | 4.91
FD2 xian2019f | 4.32 | 4.02 | 3.67
ZS-ADP | 3.28 | 3.11 | 3.16
Table 3: (A) Image-to-label correspondence of zero-shot classes by ZS-ADP. We report results in the form HTT/C_{r→g}, where HTT is the Human Turing Test score of the labeled images and C_{r→g} is the Top-1 classification score of a ResNet-50 classifier trained on the real dataset and tested on the generated dataset; (B) Image Quality: the quality of images is evaluated using the FID score heusel2017gans.

6.4.2 ZS-ADP: We compared our proposed ZS-ADP against five state-of-the-art methods: Reed et al. (GAN-E2E) reed2016generative, StackGAN++ zhang2017stackgan++, AttnGAN (AttentionGAN) xu2018attngan, Hierarchical (hierarchical text-to-image synthesis) zhang2018photographic, and FD2 (f-VAEGAN-D2) xian2019f. Due to the unavailability of generalized zero-shot text to labeled image generation methods, we modified the generators of reed2016generative; zhang2017stackgan++; xu2018attngan; zhang2018photographic; xian2019f so that the last layers of those generators generate both the image and the class label.

Figure 7: (a) Test-time classification cross-entropy loss of a pre-trained ResNet model on image-label pairs generated by ADP, ADP (only its Image-GAN component) with majority voting, and ADP (only its Image-GAN component) with DP for labels; (b) Average running time (in Mins.) of ADP against other methods to estimate relative accuracies and inter-function dependencies.

In our experiment, we obtain labeled images generated from text by ZS-ADP (Equation 23). ZS-ADP generates good-quality images as well as good image-label correspondence; see Figure 6(a). We show latent space interpolation in Figure 6(b)-(c). Figure 6(b) shows labeled images where we changed both the noise vector and the textual description. In Figure 6(c), we fixed the noise vector but changed the text: we see minimal change in style (i.e. background or orientation) but a change in color and shape.

6.5 Quantitative Results Comparison with Prior Methods

No. of LFs | Test-time cross-entropy error
3 | 10.23% 21.02% 27.39% 23.82%
5 | 8.32% 8.53% 21.31% 18.30%
10 | 1.40% 4.81% 17.93% 11.62%
15 | 1.33% 4.92% 18.93% 13.05%
20 | 1.34% 4.80% 18.45% 12.83%
25 | 1.31% 4.73% 18.43% 12.82%
Table 4: Test-time cross-entropy error of a pretrained ResNet model on ADP-generated image-label pairs, as the number of labeling functions is varied.

6.5.1. ADP and SS-ADP: For quantitative comparison among ADP, SS-ADP and other generative methods, we adopted four evaluation metrics (studied in lucic2019high; pu2018jointgan): (i) MIS (↑), the Modified Inception Score proposed in santurkar2018classification, which computes MIS = exp( E_x [ KL( p(y|x) || p(y) ) ] ), where x is an image, p(y|x) is the softmax output of a ResNet-50 classifier trained on real labeled images, and p(y) is the label distribution of the generated samples; (ii) FID (↓), the Frechet Inception Distance proposed in heusel2017gans, which computes FID = ||μ_r − μ_g||² + Tr( Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2} ), where (μ_r, Σ_r) and (μ_g, Σ_g) are the mean and covariance of the feature embeddings of real and generated images, respectively; (iii) C_{r→g} (↑), the Top-1 classification accuracy (in %) of a ResNet-50 classifier trained on real labeled images and tested on generated images; and (iv) C_{g→r} (↑), the Top-1 classification accuracy (in %) of a ResNet-50 classifier trained on generated labeled images and tested on real images. As a baseline, the BigGAN labeled image generator obtains MIS scores of 3.04 on SVHN, 2.44 on CIFAR 10, 3.97 on LSUN and 3.31 on CHEST, and FID scores of 8.01 on SVHN, 13.47 on CIFAR 10, 7.72 on LSUN and 6.17 on CHEST.
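The two distribution-level metrics above can be sketched in a few lines. Here `fid` and `modified_is` are hypothetical helper names of our own; in practice the inputs would be deep feature statistics and classifier softmax outputs over thousands of samples, not the toy arrays shown.

```python
import numpy as np
from scipy import linalg

def fid(mu_r, sigma_r, mu_g, sigma_g):
    """Frechet Inception Distance between Gaussians fitted to the
    feature embeddings of real (r) and generated (g) images."""
    diff = mu_r - mu_g
    # matrix square root of the product of the two covariances
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerics
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))

def modified_is(probs):
    """Modified Inception Score: exp(E_x[KL(p(y|x) || p(y))]).
    probs: (N, K) softmax outputs of a classifier trained on real data,
    evaluated on N generated images."""
    p_y = probs.mean(axis=0, keepdims=True)  # marginal label distribution
    kl = np.sum(probs * (np.log(probs + 1e-12) - np.log(p_y + 1e-12)), axis=1)
    return float(np.exp(kl.mean()))
```

Two sanity checks follow directly from the formulas: identical Gaussians give FID 0, and a classifier that outputs the same distribution for every sample gives MIS 1 (its lowest possible value).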

For ADP, we observe that the generated images achieve consistent MIS and FID improvements over the baseline BigGAN method. We also observe a smaller gap between C_{r→g} and C_{g→r}, suggesting both that the C_{r→g} ResNet-50 classifier can classify the generated images well and that the C_{g→r} ResNet-50 classifier trained on ADP-generated labeled images can classify real images well (i.e., the generated images are good in quality and have high image-to-label correspondence). We note that although BigGAN secures a good C_{r→g} score, it fails to perform well on C_{g→r} across all datasets. This observation directly implies the advantage of using labeling functions to infer the labels of generated images within the adversarial framework of the baseline BigGAN.

In SS-ADP, we note that the alternative approach of obtaining “distant supervised” labels for unlabeled images (as discussed in Section 5) using labeling functions is complementary to the baseline BigGAN model trained on real labeled images. The “SS-ADP” row of Table 2 shows the experimental results: the improvements in MIS, FID, C_{r→g} and C_{g→r} over the baseline show the efficacy of the SS-ADP method.

Figure 8: Sample results of image-label pairs generated by combining a vanilla GAN (for image generation) and DP ratner2016data (for label generation) using the same labeling functions used in this work. Row labels represent the original class label (am = automobile) and column labels are provided by DP. Note the poor image-label correspondence, supporting the need for our work.

6.5.2. ZS-ADP: ZS-ADP is evaluated using two evaluation metrics: (i) image-label correspondence of zero-shot classes; and (ii) image quality. The results are shown in Table 3 (A) and (B). (i) Image-to-Label Correspondence of Zero-Shot Classes: We performed: (a) HTT, a Human Turing Test: 40 experts were asked to rate the image-to-label correspondence of 400 zero-shot labeled images. Experts gave a score on a scale of 1-10, and the aggregated result is shown in Table 3 (A). (b) C_{r→g}, a classification score: the Top-1 classification performance of a ResNet-50 classifier trained on real labeled images and tested on generated zero-shot labeled images.

(ii) Image Quality: Image quality is evaluated using the FID score heusel2017gans. The results are shown in Table 3 (B). Our method performs fairly well and secures a good FID for image quality.

7 Discussion and Analysis

7.1 Optimal Number of Labeling Functions

We trained ADP using different numbers of labeling functions. Table 4 suggests that 10-15 labeling functions provide the best performance. We report the test-time cross-entropy error of a pretrained ResNet model on image-label pairs generated by ADP.

7.2 Comparison against Vote Aggregation Methods

We compared ADP with both majority voting and Data Programming (DP, ratner2016data). We studied the test-time classification cross-entropy loss of a pre-trained ResNet model on image-label pairs generated by ADP, by ADP's Image-GAN component with majority voting, and by ADP's Image-GAN component with DP. The results are presented in Figure 7a.
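The majority-voting baseline aggregates labeling-function outputs per sample by taking the most common vote. A minimal sketch, assuming each function either votes a class index or abstains with -1 (the function name and abstain convention are ours):

```python
import numpy as np

def majority_vote(lf_outputs, n_classes, abstain=-1):
    """Aggregate weak labels by majority vote.
    lf_outputs: (n_samples, n_lfs) integer array of labeling-function
    votes; entries equal to `abstain` are ignored."""
    labels = np.empty(lf_outputs.shape[0], dtype=int)
    for i, votes in enumerate(lf_outputs):
        votes = votes[votes != abstain]          # drop abstentions
        counts = np.bincount(votes, minlength=n_classes)
        labels[i] = int(counts.argmax())         # most common vote wins
    return labels

# three samples, three labeling functions (second LF abstains twice)
votes = np.array([[0, 0, 1],
                  [1, -1, 1],
                  [2, 2, -1]])
agg = majority_vote(votes, n_classes=3)  # -> [0, 1, 2]
```

Unlike DP, this baseline ignores labeling-function accuracies and inter-function dependencies, which is precisely what the comparison in Figure 7a probes.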

7.3 Adversarial Data Programming vs MLE-based Data Programming

To further quantify the benefits of ADP, we also show how our method compares against Data Programming (DP) ratner2016data using different variants of MLE: MLE, Maximum Pseudo-likelihood, and Hamiltonian Monte Carlo. Figure 7b presents the results and shows that ADP is almost 100X faster than MLE-based estimation. Figure 8 also shows sample images generated by a vanilla GAN, along with the corresponding labels assigned by MLE-based DP using the same labeling functions as used in our work.

8 Conclusions

The paucity of large curated hand-labeled training data forms a major bottleneck in deploying machine learning methods in practice across varied application domains, and standard data augmentation techniques are often limited in their scope. Our proposed Adversarial Data Programming (ADP) framework learns the joint data-label distribution effectively using a set of weakly defined labeling functions. The method shows promise on standard datasets, as well as in transfer learning and multi-task learning. We also extended the methodology to a generalized zero-shot labeled image generation task and showed its promise there. Our future work will involve understanding the theoretical implications of this new framework from a game-theoretic perspective, as well as exploring the performance of the method on more complex datasets.