
InverseNet: Solving Inverse Problems with Splitting Networks
We propose a new method that uses deep learning techniques to solve inverse problems. The inverse problem is cast in the form of learning an end-to-end mapping from observed data to the ground truth. Inspired by the splitting strategy widely used in regularized iterative algorithms to tackle inverse problems, the mapping is decomposed into two networks, with one handling the inversion of the physical forward model associated with the data term and one handling the denoising of the output from the former network, i.e., the inverted version, associated with the prior/regularization term. The two networks are trained jointly to learn the end-to-end mapping, avoiding two-step training. The training is annealed, as the intermediate variable between the two networks bridges the gap between the input (the degraded version of the output) and the output, and progressively approaches the ground truth. The proposed network, referred to as InverseNet, is flexible in the sense that most existing end-to-end network structures can be leveraged in the first network and most existing denoising network structures can be used in the second one. Extensive experiments on both synthetic data and real datasets on the tasks of motion deblurring, super-resolution, and colorization demonstrate the efficiency and accuracy of the proposed method compared with other image processing algorithms.
12/01/2017 ∙ by Kai Fan, et al.
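The splitting described above can be illustrated with a minimal sketch: a toy linear forward model, a stand-in "inversion network" (here just a regularized pseudo-inverse), and a stand-in "denoising network" (a mild shrinkage prior). The function names and both stand-ins are illustrative assumptions, not the paper's learned architectures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy forward model: y = A x (e.g. a blur); A is known when generating data.
A = rng.normal(size=(8, 8))
x_true = rng.normal(size=8)
y = A @ x_true

def inversion_net(y):
    """Stand-in for the first network: approximately inverts the physical
    forward model (here, a ridge-regularized pseudo-inverse)."""
    return np.linalg.solve(A.T @ A + 0.1 * np.eye(8), A.T @ y)

def denoising_net(x_hat):
    """Stand-in for the second network: refines the inverted estimate
    (here, a crude shrinkage toward zero standing in for a learned prior)."""
    return 0.95 * x_hat

def inversenet(y):
    # End-to-end mapping = denoiser composed with the model inversion.
    return denoising_net(inversion_net(y))

x_rec = inversenet(y)
```

In the actual method both components are trainable networks optimized jointly; the composition structure is the point of the sketch.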

Zero-Shot Learning via Class-Conditioned Deep Generative Models
We present a deep generative model for learning to predict classes not seen at training time. Unlike most existing methods for this problem, which represent each class as a point (via a semantic embedding), we represent each seen/unseen class using a class-specific latent-space distribution, conditioned on class attributes. We use these latent-space distributions as a prior for a supervised variational autoencoder (VAE), which also facilitates learning highly discriminative feature representations for the inputs. The entire framework is learned end-to-end using only the seen-class training data. The model infers corresponding attributes of a test image by maximizing the VAE lower bound; the inferred attributes may be linked to labels not seen during training. We further extend our model to (1) a semi-supervised/transductive setting, by leveraging unlabeled unseen-class data via an unsupervised learning module, and (2) few-shot learning, where we also have a small number of labeled inputs from the unseen classes. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of benchmark data sets.
11/15/2017 ∙ by Wenlin Wang, et al.
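The class-as-distribution idea can be sketched schematically: each class induces a latent-space Gaussian whose mean is a (hypothetical) linear function of its attributes, and a latent point is assigned to the class whose conditional prior gives it the highest density. This is only a schematic of the representation, with made-up dimensions, not the paper's VAE or inference procedure.

```python
import numpy as np

rng = np.random.default_rng(9)

# Each class is a latent-space *distribution* conditioned on its attributes,
# not a single point: p(z | a_c) = N(f(a_c), I) for an assumed linear f.
n_classes, attr_dim, latent_dim = 5, 3, 4
attrs = rng.normal(size=(n_classes, attr_dim))    # class attribute vectors
W = rng.normal(size=(attr_dim, latent_dim))       # attribute-to-latent map

def class_prior_mean(a):
    return a @ W

def classify(z):
    """Pick the class whose conditional prior scores z highest; with unit
    covariance this reduces to the nearest prior mean."""
    means = class_prior_mean(attrs)               # (n_classes, latent_dim)
    return int(np.argmin(np.linalg.norm(means - z, axis=1)))

# A latent point drawn near class 2's prior mean.
z = class_prior_mean(attrs[2]) + 0.01 * rng.normal(size=latent_dim)
```

Because unseen classes also have attributes, the same `class_prior_mean` map extends to labels never observed at training time, which is what makes the zero-shot prediction possible.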

A Convergence Analysis for A Class of Practical Variance-Reduction Stochastic Gradient MCMC
Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has been developed as a flexible family of scalable Bayesian sampling algorithms. However, there has been little theoretical analysis of the impact of minibatch size on the algorithm's convergence rate. In this paper, we prove that under a limited computational budget/time, a larger minibatch size leads to a faster decrease of the mean squared error bound (the fastest rate thus corresponds to using full gradients), which motivates the necessity of variance reduction in SG-MCMC. Consequently, by borrowing ideas from stochastic optimization, we propose a practical variance-reduction technique for SG-MCMC that is efficient in both computation and storage. We develop theory to prove that our algorithm induces a faster convergence rate than standard SG-MCMC. A number of large-scale experiments, ranging from Bayesian learning of logistic regression to deep neural networks, validate the theory and demonstrate the superiority of the proposed variance-reduction SG-MCMC framework.
09/04/2017 ∙ by Changyou Chen, et al.
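A minimal sketch of variance-reduced stochastic-gradient Langevin sampling on a toy Gaussian model, using an SVRG-style control variate as a stand-in for the paper's technique. The model, step size, and snapshot schedule are all illustrative assumptions (and for this linear-gradient toy the correction happens to remove the minibatch variance entirely).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: data x_i ~ N(theta, 1); the log-posterior gradient is a sum
# over data points (a flat prior on theta is assumed).
data = rng.normal(loc=2.0, scale=1.0, size=1000)
N, batch = len(data), 10

def grad_i(theta, idx):
    # Per-example gradient of the log-likelihood at theta.
    return data[idx] - theta

def svrg_sgld(theta, steps=2000, eta=1e-3, snapshot_every=100):
    """SGLD whose minibatch gradient is corrected by a periodically
    refreshed full-gradient snapshot (one common variance-reduction scheme)."""
    for t in range(steps):
        if t % snapshot_every == 0:
            theta_snap = theta
            full_grad = grad_i(theta_snap, np.arange(N)).sum()
        idx = rng.integers(0, N, size=batch)
        # Variance-reduced estimate of the full-data gradient.
        g = (N / batch) * (grad_i(theta, idx)
                           - grad_i(theta_snap, idx)).sum() + full_grad
        # Langevin update: half-step along the gradient plus injected noise.
        theta = theta + 0.5 * eta * g + np.sqrt(eta) * rng.normal()
    return theta

theta_final = svrg_sgld(theta=0.0)
```

The posterior mean here is the data average (about 2.0), so the chain should settle near it; replacing `g` with the raw minibatch estimate makes the iterates visibly noisier, which is the phenomenon the paper's analysis quantifies.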

Continuous-Time Flows for Deep Generative Models
Normalizing flows have been developed recently as a method for drawing samples from an arbitrary distribution. This method is attractive due to its intrinsic ability to approximate a target distribution arbitrarily well. In practice, however, normalizing flows only consist of a finite number of deterministic transformations, and thus there are no guarantees on the approximation accuracy. In this paper we study the problem of learning deep generative models with continuous-time flows (CTFs), a family of diffusion-based methods that are able to asymptotically approach a target distribution. We discretize the CTF to make training feasible, and develop theory on the approximation error. A framework is then adopted to distill knowledge from a CTF into an efficient inference network. We apply the technique to deep generative models, including a CTF-based variational autoencoder and an adversarial-network-like density estimator. Experiments on various tasks demonstrate the superiority of the proposed CTF framework over existing techniques.
09/04/2017 ∙ by Changyou Chen, et al.
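The discretization step can be sketched for the prototypical diffusion-based flow, the Langevin diffusion dX = ∇log p(X) dt + √2 dW: an Euler-Maruyama scheme drives particles toward the target asymptotically. The standard-normal target, step size, and particle count below are illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(2)

# Target: a standard normal, whose score is d/dx log p(x) = -x.
def score(x):
    return -x

def discretized_ctf(n_particles=5000, steps=500, h=0.05):
    """Euler-Maruyama discretization of the Langevin diffusion: each step
    moves particles along the score and adds Gaussian noise of variance 2h."""
    x = rng.normal(loc=5.0, size=n_particles)   # start far from the target
    for _ in range(steps):
        x = x + h * score(x) + np.sqrt(2 * h) * rng.normal(size=n_particles)
    return x

samples = discretized_ctf()
```

The discretization introduces an O(h) bias (the empirical variance lands slightly above 1 here), which is exactly the kind of approximation error the paper's theory bounds.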

Topic Compositional Neural Language Model
We propose a Topic Compositional Neural Language Model (TCNLM), a novel method designed to simultaneously capture both the global semantic meaning and the local word ordering structure in a document. The TCNLM learns the global semantic coherence of a document via a neural topic model, and the probability of each learned latent topic is further used to build a Mixture-of-Experts (MoE) language model, where each expert (corresponding to one topic) is a recurrent neural network (RNN) that accounts for learning the local structure of a word sequence. In order to train the MoE model efficiently, a matrix factorization method is applied, extending each weight matrix of the RNN to be an ensemble of topic-dependent weight matrices. The degree to which each member of the ensemble is used is tied to the document-dependent probability of the corresponding topic. Experimental results on several corpora show that the proposed approach outperforms both a pure RNN-based model and other topic-guided language models. Further, our model yields sensible topics, and also has the capacity to generate meaningful sentences conditioned on given topics.
12/28/2017 ∙ by Wenlin Wang, et al.
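The ensemble-of-weight-matrices idea can be sketched directly: the effective RNN weight matrix is a convex combination of per-topic matrices, weighted by the document's topic probabilities. Dimensions are made up, and the direct summation below is only the conceptual form; the paper factorizes further for efficiency.

```python
import numpy as np

rng = np.random.default_rng(3)

n_topics, hidden = 4, 6

# One recurrent weight matrix per topic: the "experts" of the MoE.
topic_weights = rng.normal(size=(n_topics, hidden, hidden))

def composed_weight(topic_probs):
    """Mix the topic-dependent matrices according to the document's
    topic distribution: W_doc = sum_k t_k * W_k."""
    return np.einsum("k,kij->ij", topic_probs, topic_weights)

# A document dominated by topic 0 gets a weight matrix close to W_0.
doc_topics = np.array([0.7, 0.1, 0.1, 0.1])
W_doc = composed_weight(doc_topics)
```

Every document thus gets its own recurrent dynamics at the cost of storing only `n_topics` matrices, rather than one RNN per topic trained separately.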

Wide Compression: Tensor Ring Nets
Deep neural networks have demonstrated state-of-the-art performance in a variety of real-world applications. In order to obtain performance gains, these networks have grown larger and deeper, containing millions or even billions of parameters and over a thousand layers. The trade-off is that these large architectures require an enormous amount of memory, storage, and computation, thus limiting their usability. Inspired by the recent tensor ring factorization, we introduce Tensor Ring Networks (TR-Nets), which significantly compress both the fully connected layers and the convolutional layers of deep neural networks. Our results show that our TR-Nets approach is able to compress LeNet-5 by 11× without losing accuracy, and can compress the state-of-the-art Wide ResNet by 243× with only 2.3% degradation in CIFAR-10 image classification. Overall, this compression scheme shows promise in scientific computing and deep learning, especially for emerging resource-constrained devices such as smartphones, wearables, and IoT devices.
02/25/2018 ∙ by Wenqi Wang, et al.
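The underlying tensor ring factorization can be sketched for a small 3-way tensor: each mode gets a 3-way core, and an element of the full tensor is the trace of the product of the corresponding core slices. The shapes and rank below are illustrative, but the contraction is the standard tensor-ring form.

```python
import numpy as np

rng = np.random.default_rng(4)

# Cores for a tensor of shape (8, 9, 10) with ring rank r = 3: each core
# G_k has shape (r, n_k, r), and T[i, j, k] = trace(G1[:,i,:] @ G2[:,j,:] @ G3[:,k,:]).
r = 3
G1, G2, G3 = (rng.normal(size=(r, n, r)) for n in (8, 9, 10))

# Contract the ring back into the full tensor; reusing index "a" at both
# ends of the chain implements the trace that closes the ring.
T = np.einsum("aib,bjc,cka->ijk", G1, G2, G3)

# Compression: the cores hold 3*8*3 + 3*9*3 + 3*10*3 = 243 numbers,
# versus 8*9*10 = 720 entries in the full tensor.
```

In TR-Nets the weight tensors of fully connected and convolutional layers are stored in this core form, so the compression ratio grows with the size of the layer being factorized.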

Joint Embedding of Words and Labels for Text Classification
Word embeddings are effective intermediate representations for capturing semantic regularities between words, when learning the representations of text sequences. We propose to view text classification as a label-word joint embedding problem: each label is embedded in the same space as the word vectors. We introduce an attention framework that measures the compatibility of embeddings between text sequences and labels. The attention is learned on a training set of labeled samples to ensure that, given a text sequence, the relevant words are weighted more highly than the irrelevant ones. Our method maintains the interpretability of word embeddings, and enjoys a built-in ability to leverage alternative sources of information, in addition to input text sequences. Extensive results on several large text datasets show that the proposed framework outperforms state-of-the-art methods by a large margin, in terms of both accuracy and speed.
05/10/2018 ∙ by Guoyin Wang, et al.
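A simplified sketch of label-word compatibility attention: word and label embeddings live in one space, their dot products give a compatibility score per (word, label) pair, and the per-word maximum over labels drives an attention over positions. Dimensions and the plain dot-product compatibility are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(5)

seq_len, n_labels, dim = 7, 3, 8
words = rng.normal(size=(seq_len, dim))    # word embeddings of one sequence
labels = rng.normal(size=(n_labels, dim))  # label embeddings, same space

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def label_attentive_pool(words, labels):
    """Compatibility between every word and every label yields an attention
    over positions; the sequence vector is the attended average of words."""
    compat = words @ labels.T              # (seq_len, n_labels) compatibility
    scores = compat.max(axis=1)            # most compatible label, per word
    attn = softmax(scores)                 # attention over positions
    return attn @ words                    # (dim,) sequence representation

z = label_attentive_pool(words, labels)
```

Because the weights come directly from word-label compatibility, inspecting `attn` shows which words drove the classification, which is the interpretability claim in the abstract.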

A Unified ParticleOptimization Framework for Scalable Bayesian Sampling
There has been recent interest in developing scalable Bayesian sampling methods for big-data analysis, such as stochastic gradient MCMC (SG-MCMC) and Stein variational gradient descent (SVGD). A standard SG-MCMC algorithm simulates samples from a discrete-time Markov chain to approximate a target distribution; the samples can thus be highly correlated, an undesired property for SG-MCMC. In contrast, SVGD directly optimizes a set of particles to approximate a target distribution, and is thus able to obtain good approximations with far fewer samples. In this paper, we propose a principled particle-optimization framework based on Wasserstein gradient flows to unify SG-MCMC and SVGD, and to allow new algorithms to be developed. Our framework interprets SG-MCMC as particle optimization, revealing strong connections between SG-MCMC and SVGD. A key component of our framework is a set of particle-approximation techniques to efficiently solve the original partial differential equations on the space of probability measures. Extensive experiments on both synthetic data and deep neural networks demonstrate the effectiveness and efficiency of our framework for scalable Bayesian sampling.
05/29/2018 ∙ by Changyou Chen, et al.
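One ingredient the framework unifies, the SVGD particle update, can be sketched on a 1-D Gaussian target: each particle moves along a kernel-weighted average of the other particles' score, plus a kernel-gradient term that keeps particles spread out. The RBF kernel with the median bandwidth heuristic and all step sizes are standard illustrative choices, not the paper's new algorithms.

```python
import numpy as np

rng = np.random.default_rng(6)

def grad_log_p(x):
    # Score of a 1-D standard-normal target.
    return -x

def svgd_step(x, eta=0.1):
    """One SVGD update: phi(x_i) = mean_j [ k(x_j, x_i) grad log p(x_j)
    + grad_{x_j} k(x_j, x_i) ], with an RBF kernel and median-heuristic bandwidth."""
    diff = x[:, None] - x[None, :]                 # diff[j, i] = x_j - x_i
    sq = diff ** 2
    h = np.median(sq) / np.log(len(x) + 1.0) + 1e-8
    k = np.exp(-sq / h)                            # kernel matrix k[j, i]
    grad_k = -2.0 * diff / h * k                   # d k[j, i] / d x_j
    # Attraction toward high density plus kernel-induced repulsion.
    phi = (k * grad_log_p(x)[:, None]).mean(axis=0) + grad_k.mean(axis=0)
    return x + eta * phi

particles = rng.normal(loc=4.0, scale=0.5, size=200)
for _ in range(500):
    particles = svgd_step(particles)
```

Unlike an SG-MCMC chain, these particles are deterministically optimized and end up spread over the target rather than serially correlated, which is the contrast the abstract draws.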

Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms
Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study of Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, relative to word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging. The source code and datasets can be obtained from https://github.com/dinghanshen/SWEM.
05/24/2018 ∙ by Dinghan Shen, et al.
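The two proposed pooling strategies are simple enough to sketch directly over a toy sequence of word embeddings (dimensions and window size below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# A toy sequence: 12 words, each a 5-dimensional embedding.
emb = rng.normal(size=(12, 5))

def swem_max(emb):
    """Max pooling: keep the largest value of each embedding dimension
    across the whole sequence (parameter-free)."""
    return emb.max(axis=0)

def swem_hier(emb, window=3):
    """Hierarchical pooling: average within local windows (preserving
    n-gram-like locality), then max-pool over the window averages."""
    n = emb.shape[0] - window + 1
    local_avgs = np.stack([emb[i:i + window].mean(axis=0) for i in range(n)])
    return local_avgs.max(axis=0)

v_max, v_hier = swem_max(emb), swem_hier(emb)
```

Both operations have zero learnable parameters, which is exactly why the comparison against RNN/CNN encoders is striking: any accuracy they reach comes from the embeddings alone.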

NASH: Toward End-to-End Neural Architecture for Generative Semantic Hashing
Semantic hashing has become a powerful paradigm for fast similarity search in many information retrieval systems. While fairly successful, previous techniques generally require two-stage training, and the binary constraints are handled ad hoc. In this paper, we present an end-to-end Neural Architecture for Semantic Hashing (NASH), where the binary hashing codes are treated as Bernoulli latent variables. A neural variational inference framework is proposed for training, where gradients are directly backpropagated through the discrete latent variables to optimize the hash function. We also draw connections between the proposed method and rate-distortion theory, which provides a theoretical foundation for the effectiveness of the proposed framework. Experimental results on three public datasets demonstrate that our method significantly outperforms several state-of-the-art models in both unsupervised and supervised scenarios.
05/14/2018 ∙ by Dinghan Shen, et al.
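The retrieval side of semantic hashing can be sketched with a toy forward pass: a (hypothetical) linear encoder produces Bernoulli probabilities, rounding them gives binary codes, and search is nearest-neighbor in Hamming distance. Training-time backpropagation through the discrete codes, the core of NASH, is not shown here.

```python
import numpy as np

rng = np.random.default_rng(10)

def hash_codes(x, W):
    """Forward pass only: sigmoid gives Bernoulli parameters, and rounding
    them yields the binary hash codes."""
    probs = 1.0 / (1.0 + np.exp(-(x @ W)))
    return (probs > 0.5).astype(np.uint8)

def hamming_search(query_code, db_codes):
    # Fast similarity search = nearest neighbor in Hamming distance.
    return int(np.argmin((db_codes != query_code).sum(axis=1)))

W = rng.normal(size=(8, 16))                  # hypothetical encoder weights
docs = rng.normal(size=(20, 8))               # 20 toy document features
codes = hash_codes(docs, W)                   # 16-bit code per document
idx = hamming_search(codes[7], codes)         # retrieves a matching code
```

Hamming distance over short binary codes is computable with bitwise operations, which is what makes this paradigm fast enough for large retrieval systems.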

Distilled Wasserstein Learning for Word Embedding and Topic Modeling
We propose a novel Wasserstein method with a distillation mechanism, yielding joint learning of word embeddings and topics. The proposed method is based on the fact that the Euclidean distance between word embeddings may be employed as the underlying distance in the Wasserstein topic model. The word distributions of topics, their optimal transports to the word distributions of documents, and the embeddings of words are learned in a unified framework. When learning the topic model, we leverage a distilled underlying-distance matrix to update the topic distributions and smoothly calculate the corresponding optimal transports. Such a strategy provides the updating of word embeddings with robust guidance, improving algorithmic convergence. As an application, we focus on patient admission records, in which the proposed method embeds the codes of diseases and procedures and learns the topics of admissions, obtaining superior performance on clinically meaningful disease-network construction, mortality prediction as a function of admission codes, and procedure recommendation.
09/12/2018 ∙ by Hongteng Xu, et al.
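The building block here, optimal transport between two word distributions under an embedding-based cost, can be sketched with a standard entropy-regularized Sinkhorn iteration. Vocabulary size, embedding dimension, and the regularization strength are illustrative, and the distillation/smoothing of the distance matrix during learning is not shown.

```python
import numpy as np

rng = np.random.default_rng(8)

V, dim = 6, 4
emb = rng.normal(size=(V, dim))               # word embeddings
# Underlying distance: Euclidean distances between word embeddings.
C = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)

def sinkhorn(p, q, C, eps=1.0, iters=2000):
    """Entropy-regularized optimal transport between distributions p and q
    under cost C, via alternating scaling of the Gibbs kernel."""
    K = np.exp(-C / eps)
    u = np.ones_like(p)
    for _ in range(iters):
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]        # transport plan

p = np.full(V, 1 / V)                          # topic word distribution
q = rng.dirichlet(np.ones(V))                  # document word distribution
plan = sinkhorn(p, q, C)
```

The resulting plan couples topic words to document words through embedding distances, which is what lets the topic model and the embeddings inform each other in the joint framework.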