
Unsupervised Text Style Transfer via Iterative Matching and Translation
Text style transfer seeks to learn how to automatically rewrite sentences from a source style to a target style while preserving their semantic content. A major challenge in this task stems from the lack of parallel data connecting the source and target styles. Existing approaches try to disentangle content and style, but this is quite difficult and often results in poor content preservation and grammaticality. In contrast, we propose a novel approach that first constructs a pseudo-parallel resource aligning a subset of sentences with similar content between the source and target corpora. A standard sequence-to-sequence model can then be applied to learn the style transfer. Subsequently, we iteratively refine the learned style transfer function while improving upon the imperfections in our original alignment. Our method is applied to the tasks of sentiment modification and formality transfer, where it outperforms state-of-the-art systems by a large margin. As an auxiliary contribution, we produced a publicly available test set with human-generated style transfers for future community use.
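The matching step described above can be sketched as follows. This is a toy illustration only, not the paper's actual method: the bag-of-words cosine similarity and the 0.5 threshold are assumptions made for the sketch.

```python
from collections import Counter
import math

def cosine(a, b):
    """Bag-of-words cosine similarity between two sentences."""
    ca, cb = Counter(a.split()), Counter(b.split())
    num = sum(ca[w] * cb[w] for w in ca)
    den = (math.sqrt(sum(v * v for v in ca.values()))
           * math.sqrt(sum(v * v for v in cb.values())))
    return num / den if den else 0.0

def align_pseudo_parallel(src, tgt, threshold=0.5):
    """Pair each source sentence with its most similar target sentence,
    keeping only pairs whose similarity clears the threshold."""
    pairs = []
    for s in src:
        best = max(tgt, key=lambda t: cosine(s, t))
        if cosine(s, best) >= threshold:
            pairs.append((s, best))
    return pairs
```

The surviving pairs then serve as pseudo-parallel training data for a sequence-to-sequence model, whose outputs can in turn refine the alignment.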
01/31/2019 ∙ by Zhijing Jin, et al.

What made you do this? Understanding black-box decisions with sufficient input subsets
Local explanation frameworks aim to rationalize particular decisions made by a black-box prediction model. Existing techniques are often restricted to a specific type of predictor or based on input saliency, which may be undesirably sensitive to factors unrelated to the model's decision-making process. We instead propose sufficient input subsets that identify minimal subsets of features whose observed values alone suffice for the same decision to be reached, even if all other input feature values are missing. General principles that globally govern a model's decision-making can also be revealed by searching for clusters of such input patterns across many data points. Our approach is conceptually straightforward, entirely model-agnostic, easily implemented using instance-wise backward selection, and able to produce more concise rationales than existing techniques. We demonstrate the utility of our interpretation method on various neural network models trained on text, image, and genomic data.
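A minimal sketch of instance-wise backward selection: a `predict` function that accepts partially masked inputs (`None` marks a missing feature) and a confidence threshold `tau` are assumptions of this sketch, and the paper's masking scheme and removal order may differ.

```python
def sufficient_input_subset(predict, x, tau):
    """Greedy backward selection: repeatedly drop the feature whose
    removal hurts the masked prediction least, as long as the
    prediction stays at or above the threshold tau."""
    kept = set(range(len(x)))

    def masked(keep):
        # Replace every feature not in `keep` with a missing-value marker.
        return [x[i] if i in keep else None for i in range(len(x))]

    improved = True
    while improved:
        improved = False
        # Try removals in order of least damage to the prediction.
        for i in sorted(kept, key=lambda j: predict(masked(kept - {j})), reverse=True):
            if predict(masked(kept - {i})) >= tau:
                kept.remove(i)
                improved = True
                break
    return sorted(kept)
```

The returned indices form one sufficient subset; clustering such subsets across many inputs gives the global view described above.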
10/09/2018 ∙ by Brandon Carter, et al.

Learning Optimal Interventions
Our goal is to identify beneficial interventions from observational data. We consider interventions that are narrowly focused (impacting few covariates) and may be tailored to each individual or globally enacted over a population. For applications where harmful intervention is drastically worse than proposing no change, we propose a conservative definition of the optimal intervention. Assuming the underlying relationship remains invariant under intervention, we develop efficient algorithms to identify the optimal intervention policy from limited data and provide theoretical guarantees for our approach in a Gaussian Process setting. Although our methods assume covariates can be precisely adjusted, they remain capable of improving outcomes in misspecified settings where interventions incur unintentional downstream effects. Empirically, our approach identifies good interventions in two practical applications: gene perturbation and writing improvement.
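The conservative criterion described above can be illustrated as a lower-confidence-bound rule. Here `mean`, `std`, and `kappa` are placeholders standing in for the Gaussian Process posterior and a risk-tolerance parameter, not the paper's exact formulation.

```python
def conservative_intervention(candidates, mean, std, kappa=2.0):
    """Choose the intervention maximizing a lower confidence bound on
    the predicted outcome, mean(c) - kappa * std(c), so that
    interventions with uncertain (potentially harmful) effects are
    penalized relative to proposing no change."""
    return max(candidates, key=lambda c: mean(c) - kappa * std(c))
```

With high `kappa`, an intervention is only preferred over the status quo when its predicted benefit clearly outweighs its uncertainty.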
06/16/2016 ∙ by Jonas Mueller, et al.

Principal Differences Analysis: Interpretable Characterization of Differences between Distributions
We introduce principal differences analysis (PDA) for analyzing differences between high-dimensional distributions. The method operates by finding the projection that maximizes the Wasserstein divergence between the resulting univariate populations. Relying on the Cramér-Wold device, it requires no assumptions about the form of the underlying distributions, nor the nature of their inter-class differences. A sparse variant of the method is introduced to identify features responsible for the differences. We provide algorithms for both the original minimax formulation as well as its semidefinite relaxation. In addition to deriving some convergence results, we illustrate how the approach may be applied to identify differences between cell populations in the somatosensory cortex and hippocampus as manifested by single-cell RNA-seq. Our broader framework extends beyond the specific choice of Wasserstein divergence.
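As a rough illustration of the projection search, the sketch below substitutes a random-direction search for the paper's minimax and semidefinite algorithms, using the empirical 1-Wasserstein distance between equal-size samples; the number of directions and the seed are arbitrary.

```python
import numpy as np

def wasserstein_1d(u, v):
    """Empirical 1-Wasserstein distance between equal-size 1-D samples."""
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

def principal_difference_direction(X, Y, n_dirs=500, seed=0):
    """Random-search surrogate for PDA: among random unit directions,
    return the one along which the projections of X and Y differ most."""
    rng = np.random.default_rng(seed)
    best_d, best_w = None, -1.0
    for _ in range(n_dirs):
        d = rng.normal(size=X.shape[1])
        d /= np.linalg.norm(d)
        w = wasserstein_1d(X @ d, Y @ d)
        if w > best_w:
            best_d, best_w = d, w
    return best_d, best_w
```

When the two populations differ only along one coordinate, the recovered direction concentrates on that coordinate.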
10/30/2015 ∙ by Jonas Mueller, et al.

Low-rank Bandit Methods for High-dimensional Dynamic Pricing
We consider high-dimensional dynamic multi-product pricing with an evolving but low-dimensional linear demand model. Assuming the temporal variation in cross-elasticities exhibits low-rank structure based on fixed (latent) features of the products, we show that the revenue maximization problem reduces to an online bandit convex optimization with side information given by the observed demands. We design dynamic pricing algorithms whose revenue approaches that of the best fixed price vector in hindsight, at a rate that only depends on the intrinsic rank of the demand model and not the number of products. Our approach applies a bandit convex optimization algorithm in a projected low-dimensional space spanned by the latent product features, while simultaneously learning this span via online singular value decomposition of a carefully-crafted matrix containing the observed demands.
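The span-learning step might be sketched as a truncated SVD of a matrix of observed demand vectors, followed by projecting candidate prices onto the recovered subspace. This batch sketch simplifies the paper's online construction, and the matrix here is just the raw demands rather than the carefully-crafted one the abstract refers to.

```python
import numpy as np

def latent_span(demands, rank):
    """Estimate the latent product-feature span via truncated SVD of a
    matrix of observed demand vectors (one row per pricing round)."""
    _, _, vt = np.linalg.svd(demands, full_matrices=False)
    return vt[:rank].T  # shape: (n_products, rank) orthonormal basis

def project_prices(p, basis):
    """Restrict a candidate price vector to the learned low-dimensional
    span, where the bandit convex optimization is actually run."""
    return basis @ (basis.T @ p)
```

Running the bandit in the `rank`-dimensional projected space is what lets the regret depend on the intrinsic rank rather than the number of products.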
01/30/2018 ∙ by Jonas Mueller, et al.

Latent Space Secrets of Denoising Text Autoencoders
While neural language models have recently demonstrated impressive performance in unconditional text generation, controllable generation and manipulation of text remain challenging. Latent variable generative models provide a natural approach for control, but their application to text has proven more difficult than to images. Models such as variational autoencoders may suffer from posterior collapse or learning an irregular latent geometry. We propose to instead employ adversarial autoencoders (AAEs) and add local perturbations by randomly replacing/removing words from input sentences during training. Within the prior enforced by the adversary, structured perturbations in the data space begin to carve and organize the latent space. Theoretically, we prove that perturbations encourage similar sentences to map to similar latent representations. Experimentally, we investigate the tradeoff between text-generation and autoencoder-reconstruction capabilities. Our straightforward approach significantly improves over regular AAEs as well as other autoencoders, and enables altering the tense/sentiment of sentences through simple addition of a fixed vector offset to their latent representation.
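The local input perturbations described above amount to randomly dropping or replacing words before encoding. A minimal sketch, where the drop/replace rates and the fixed seed are illustrative assumptions rather than the paper's settings:

```python
import random

def perturb(sentence, vocab, p_drop=0.1, p_replace=0.1, rng=None):
    """Locally perturb a sentence for denoising AAE training: each word
    is independently dropped with probability p_drop, or swapped for a
    random vocabulary word with probability p_replace."""
    rng = rng or random.Random(0)
    out = []
    for w in sentence.split():
        r = rng.random()
        if r < p_drop:
            continue  # drop the word entirely
        if r < p_drop + p_replace:
            out.append(rng.choice(vocab))  # replace with a random word
        else:
            out.append(w)
    return " ".join(out)
```

Training the autoencoder to reconstruct the clean sentence from such perturbed inputs is what pulls similar sentences toward nearby latent representations.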
05/29/2019 ∙ by Tianxiao Shen, et al.

Maximizing Overall Diversity for Improved Uncertainty Estimates in Deep Ensembles
The inaccuracy of neural network models on inputs that do not stem from the training data distribution is both problematic and at times unrecognized. Model uncertainty estimation can address this issue, where uncertainty estimates are often based on the variation in predictions produced by a diverse ensemble of models applied to the same input. Here we describe Maximize Overall Diversity (MOD), a straightforward approach to improve ensemble-based uncertainty estimates by encouraging larger overall diversity in ensemble predictions across all possible inputs that might be encountered in the future. When applied to various neural network ensembles, MOD significantly improves predictive performance for out-of-distribution test examples without sacrificing in-distribution performance on 38 protein-DNA binding regression datasets, 9 UCI datasets, and the IMDB-Wiki image dataset. Across many Bayesian optimization tasks, the performance of UCB acquisition is also greatly improved by leveraging MOD uncertainty estimates.
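The diversity-encouraging term can be caricatured as a penalty equal to the negative ensemble variance over a batch of auxiliary inputs: adding it to the usual training loss rewards ensembles whose members disagree where data is scarce. This sketch omits the model-fitting loop and how the auxiliary inputs are sampled.

```python
import numpy as np

def mod_penalty(ensemble_preds):
    """MOD-style regularizer sketch: the negative average variance of
    ensemble predictions over auxiliary inputs. Minimizing the total
    loss (task loss + this penalty) pushes member predictions apart on
    those inputs, widening uncertainty estimates there."""
    preds = np.asarray(ensemble_preds)  # shape: (n_models, n_aux_inputs)
    return -np.mean(np.var(preds, axis=0))
```

An ensemble whose members agree everywhere pays the maximum penalty (zero variance), while disagreement on auxiliary inputs lowers the loss.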
06/18/2019 ∙ by Siddhartha Jain, et al.