
Robustness of Conditional GANs to Noisy Labels
We study the problem of learning conditional generators from noisy labeled samples, where the labels are corrupted by random noise. A standard training of conditional GANs will not only produce samples with wrong labels, but also generate poor quality samples. We consider two scenarios, depending on whether the noise model is known or not. When the distribution of the noise is known, we introduce a novel architecture which we call Robust Conditional GAN (RCGAN). The main idea is to corrupt the label of the generated sample before feeding to the adversarial discriminator, forcing the generator to produce samples with clean labels. This approach of passing through a matching noisy channel is justified by corresponding multiplicative approximation bounds between the loss of the RCGAN and the distance between the clean real distribution and the generator distribution. This shows that the proposed approach is robust, when used with a carefully chosen discriminator architecture, known as projection discriminator. When the distribution of the noise is not known, we provide an extension of our architecture, which we call RCGANU, that learns the noise model simultaneously while training the generator. We show experimentally on MNIST and CIFAR10 datasets that both the approaches consistently improve upon baseline approaches, and RCGANU closely matches the performance of RCGAN.
11/08/2018 ∙ by Kiran Koshy Thekumparampil, et al. ∙ 18 ∙ shareread it

Efficient Algorithms for Smooth Minimax Optimization
This paper studies first order methods for solving smooth minimax optimization problems _x _y g(x,y) where g(·,·) is smooth and g(x,·) is concave for each x. In terms of g(·,y), we consider two settings  strongly convex and nonconvex  and improve upon the best known rates in both. For stronglyconvex g(·, y), ∀ y, we propose a new algorithm combining MirrorProx and Nesterov's AGD, and show that it can find global optimum in Õ(1/k^2) iterations, improving over current stateoftheart rate of O(1/k). We use this result along with an inexact proximal point method to provide Õ(1/k^1/3) rate for finding stationary points in the nonconvex setting where g(·, y) can be nonconvex. This improves over current bestknown rate of O(1/k^1/5). Finally, we instantiate our result for finite nonconvex minimax problems, i.e., _x _1≤ i≤ m f_i(x), with nonconvex f_i(·), to obtain convergence rate of O(m( m)^3/2/k^1/3) total gradient evaluations for finding a stationary point.
07/02/2019 ∙ by Kiran Koshy Thekumparampil, et al. ∙ 10 ∙ shareread it

Robust conditional GANs under missing or uncertain labels
Matching the performance of conditional Generative Adversarial Networks with little supervision is an important task, especially in venturing into new domains. We design a new training algorithm, which is robust to missing or ambiguous labels. The main idea is to intentionally corrupt the labels of generated examples to match the statistics of the real data, and have a discriminator process the real and generated examples with corrupted labels. We showcase the robustness of this proposed approach both theoretically and empirically. We show that minimizing the proposed loss is equivalent to minimizing true divergence between real and generated data up to a multiplicative factor, and characterize this multiplicative factor as a function of the statistics of the uncertain labels. Experiments on MNIST dataset demonstrates that proposed architecture is able to achieve high accuracy in generating examples faithful to the class even with only a few examples per class.
06/09/2019 ∙ by Kiran Koshy Thekumparampil, et al. ∙ 9 ∙ shareread it

Learning Onehiddenlayer Neural Networks under General Input Distributions
Significant advances have been made recently on training neural networks, where the main challenge is in solving an optimization problem with abundant critical points. However, existing approaches to address this issue crucially rely on a restrictive assumption: the training data is drawn from a Gaussian distribution. In this paper, we provide a novel unified framework to design loss functions with desirable landscape properties for a wide range of general input distributions. On these loss functions, remarkably, stochastic gradient descent theoretically recovers the true parameters with global initializations and empirically outperforms the existing approaches. Our loss function design bridges the notion of score functions with the topic of neural network optimization. Central to our approach is the task of estimating the score function from samples, which is of basic and independent interest to theoretical statistics. Traditional estimation methods (example: kernel based) fail right at the outset; we bring statistical methods of local likelihood to design a novel estimator of score functions, that provably adapts to the local geometry of the unknown density.
10/09/2018 ∙ by Weihao Gao, et al. ∙ 6 ∙ shareread it

InfoGANCR: Disentangling Generative Adversarial Networks with Contrastive Regularizers
Training disentangled representations with generative adversarial networks (GANs) remains challenging, with leading implementations failing to achieve comparable performance to Variational Autoencoder (VAE)based methods. After βVAE and FactorVAE discovered that regularizing the total correlation of the latent vectors promotes disentanglement, numerous VAEbased methods emerged. Such a discovery has yet to be made for GANs, and reported disentanglement scores of GANbased methods are significantly inferior to VAEbased methods on benchmark datasets. To this end, we propose a novel regularizer that achieves higher disentanglement scores than stateoftheart VAE and GANbased approaches. The proposed contrastive regularizer is inspired by a natural notion of disentanglement: latent traversal. Latent traversal refers to generating images by varying one latent code while fixing the rest. We turn this intuition into a regularizer by adding a discriminator that detects how the latent codes are coupled together, in paired examples. Numerical experiments show that this approach improves upon competing stateoftheart approaches on benchmark datasets.
06/14/2019 ∙ by Zinan Lin, et al. ∙ 3 ∙ shareread it

Communication Algorithms via Deep Learning
Coding theory is a central discipline underpinning wireline and wireless modems that are the workhorses of the information age. Progress in coding theory is largely driven by individual human ingenuity with sporadic breakthroughs over the past century. In this paper we study whether it is possible to automate the discovery of decoding algorithms via deep learning. We study a family of sequential codes parameterized by recurrent neural network (RNN) architectures. We show that creatively designed and trained RNN architectures can decode well known sequential codes such as the convolutional and turbo codes with close to optimal performance on the additive white Gaussian noise (AWGN) channel, which itself is achieved by breakthrough algorithms of our times (Viterbi and BCJR decoders, representing dynamic programing and forwardbackward algorithms). We show strong generalizations, i.e., we train at a specific signal to noise ratio and block length but test at a wide range of these quantities, as well as robustness and adaptivity to deviations from the AWGN setting.
05/23/2018 ∙ by Hyeji Kim, et al. ∙ 2 ∙ shareread it

LEARN Codes: Inventing Lowlatency Codes via Recurrent Neural Networks
Designing channel codes under low latency constraints is one of the most demanding requirements in 5G standards. However, sharp characterizations of the performances of traditional codes are only available in the large blocklength limit. Code designs are guided by those asymptotic analyses and require large block lengths and long latency to achieve the desired error rate. Furthermore, when the codes designed for one channel (e.g. Additive White Gaussian Noise (AWGN) channel) are used for another (e.g. nonAWGN channels), heuristics are necessary to achieve any nontrivial performance thereby severely lacking in robustness as well as adaptivity. Obtained by jointly designing Recurrent Neural Network (RNN) based encoder and decoder, we propose an endtoend learned neural code which outperforms canonical convolutional code under block settings. With this gained experience of designing a novel neural block code, we propose a new class of codes under low latency constraint  Lowlatency Efficient Adaptive Robust Neural (LEARN) codes, which outperforms the stateoftheart low latency codes as well as exhibits robustness and adaptivity properties. LEARN codes show the potential of designing new versatile and universal codes for future communications via tools of modern deep learning coupled with communication engineering insights.
11/30/2018 ∙ by Yihan Jiang, et al. ∙ 2 ∙ shareread it

Discovering Potential Correlations via Hypercontractivity
Discovering a correlation from one variable to another variable is of fundamental scientific and practical interest. While existing correlation measures are suitable for discovering average correlation, they fail to discover hidden or potential correlations. To bridge this gap, (i) we postulate a set of natural axioms that we expect a measure of potential correlation to satisfy; (ii) we show that the rate of information bottleneck, i.e., the hypercontractivity coefficient, satisfies all the proposed axioms; (iii) we provide a novel estimator to estimate the hypercontractivity coefficient from samples; and (iv) we provide numerical experiments demonstrating that this proposed estimator discovers potential correlations among various indicators of WHO datasets, is robust in discovering gene interactions from gene expression time series data, and is statistically more powerful than the estimators for other correlation measures in binary hypothesis testing of canonical examples of potential correlations.
09/12/2017 ∙ by Hyeji Kim, et al. ∙ 0 ∙ shareread it

Learning from Comparisons and Choices
When tracking userspecific online activities, each user's preference is revealed in the form of choices and comparisons. For example, a user's purchase history tracks her choices, i.e. which item was chosen among a subset of offerings. A user's comparisons are observed either explicitly as in movie ratings or implicitly as in viewing times of news articles. Given such individualized ordinal data, we address the problem of collaboratively learning representations of the users and the items. The learned features can be used to predict a user's preference of an unseen item to be used in recommendation systems. This also allows one to compute similarities among users and items to be used for categorization and search. Motivated by the empirical successes of the MultiNomial Logit (MNL) model in marketing and transportation, and also more recent successes in word embedding and crowdsourced image embedding, we pose this problem as learning the MNL model parameters that best explains the data. We propose a convex optimization for learning the MNL model, and show that it is minimax optimal up to a logarithmic factor by comparing its performance to a fundamental lower bound. This characterizes the minimax sample complexity of the problem, and proves that the proposed estimator cannot be improved upon other than by a logarithmic factor. Further, the analysis identifies how the accuracy depends on the topology of sampling via the spectrum of the sampling graph. This provides a guideline for designing surveys when one can choose which items are to be compared. This is accompanies by numerical simulations on synthetic and real datasets confirming our theoretical predictions.
04/24/2017 ∙ by Sahand Negahban, et al. ∙ 0 ∙ shareread it

Iterative Bayesian Learning for Crowdsourced Regression
Crowdsourcing platforms emerged as popular venues for purchasing human intelligence at low cost for large volumes of tasks. As many lowpaid workers are prone to give noisy answers, one of the fundamental questions is how to identify more reliable workers and exploit this heterogeneity to infer the true answers accurately. Despite significant research efforts for classification tasks with discrete answers, little attention has been paid to regression tasks with continuous answers. The popular DawidSkene model for discrete answers has the algorithmic and mathematical simplicity in relation to lowrank structures. But it does not generalize for continuous valued answers. To this end, we introduce a new probabilistic model for crowdsourced regression capturing the heterogeneity of the workers, generalizing the DawidSkene model to the continuous domain. We design a messagepassing algorithm for Bayesian inference inspired by the popular belief propagation algorithm. We showcase its performance first by proving that it achieves a near optimal mean squared error by comparing it to an oracle estimator. Asymptotically, we can provide a tighter analysis showing that the proposed algorithm achieves the exact optimal performance. We next show synthetic experiments confirming our theoretical predictions. As a practical application, we further emulate a crowdsourcing system reproducing PASCAL visual object classes datasets and show that denoising the crowdsourced data from the proposed scheme can significantly improve the performance for the vision task.
02/28/2017 ∙ by Jungseul Ok, et al. ∙ 0 ∙ shareread it

Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation
Estimators of information theoretic measures such as entropy and mutual information are a basic workhorse for many downstream applications in modern data science. State of the art approaches have been either geometric (nearest neighbor (NN) based) or kernel based (with a globally chosen bandwidth). In this paper, we combine both these approaches to design new estimators of entropy and mutual information that outperform state of the art methods. Our estimator uses local bandwidth choices of kNN distances with a finite k, independent of the sample size. Such a local and data dependent choice improves performance in practice, but the bandwidth is vanishing at a fast rate, leading to a nonvanishing bias. We show that the asymptotic bias of the proposed estimator is universal; it is independent of the underlying distribution. Hence, it can be precomputed and subtracted from the estimate. As a byproduct, we obtain a unified way of obtaining both kernel and NN estimators. The corresponding theoretical contribution relating the asymptotic geometry of nearest neighbors to order statistics is of independent mathematical interest.
09/07/2016 ∙ by Weihao Gao, et al. ∙ 0 ∙ shareread it
Sewoong Oh
is this you? claim profile
Assistant Professor at University of Illinois at UrbanaChampaign