Simplifying Models with Unlabeled Output Data

06/29/2020
by Sang Michael Xie, et al.

We focus on prediction problems with high-dimensional outputs that are subject to output validity constraints, e.g. a pseudocode-to-code translation task where the code must compile. For these problems, labeled input-output pairs are expensive to obtain, but "unlabeled" outputs, i.e. outputs without corresponding inputs, are freely available and provide information about output validity (e.g. code on GitHub). In this paper, we present predict-and-denoise, a framework that can leverage unlabeled outputs. Specifically, we first train a denoiser to map possibly invalid outputs to valid outputs using synthetic perturbations of the unlabeled outputs. Second, we train a predictor composed with this fixed denoiser. We show theoretically that for a family of functions with a discrete valid output space, composing with a denoiser reduces the complexity of a 2-layer ReLU network needed to represent the function and that this complexity gap can be arbitrarily large. We evaluate the framework empirically on several datasets, including image generation from attributes and pseudocode-to-code translation. On the SPoC pseudocode-to-code dataset, our framework improves the proportion of code outputs that pass all test cases by 3-4
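The two-stage idea in the abstract can be illustrated with a minimal sketch on a toy problem. Everything here is an assumption made for illustration, not the authors' implementation: the valid outputs are integers, the "denoiser" is a nearest-valid-output lookup built from the unlabeled outputs (standing in for the learned denoiser trained on synthetic perturbations), and the base predictor is a linear least-squares fit trained on its own rather than through the denoiser. The names `make_denoiser` and `fit_linear_predictor` are hypothetical.

```python
import numpy as np

def make_denoiser(unlabeled_outputs):
    """Build a stand-in denoiser: map any vector to the nearest unlabeled output.

    Assumption: a nearest-neighbour projection replaces the learned denoiser
    described in the abstract.
    """
    valid = np.unique(np.asarray(unlabeled_outputs, dtype=float), axis=0)

    def denoise(y):
        y = np.atleast_2d(np.asarray(y, dtype=float))
        # Squared distance from every prediction to every valid output.
        dists = ((y[:, None, :] - valid[None, :, :]) ** 2).sum(axis=-1)
        return valid[dists.argmin(axis=1)]

    return denoise

def fit_linear_predictor(X, Y):
    """Fit a simple base predictor by least squares (with a bias term)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)

    def predict(X_new):
        X_new = np.atleast_2d(X_new)
        return np.hstack([X_new, np.ones((len(X_new), 1))]) @ W

    return predict

rng = np.random.default_rng(0)

# "Unlabeled" outputs: valid outputs observed without any paired inputs.
unlabeled = np.arange(-10, 11).reshape(-1, 1)

# A small labeled set; the toy target is discrete: y = round(2x).
X_train = rng.uniform(-2, 2, size=(20, 1))
Y_train = np.round(2 * X_train)

denoise = make_denoiser(unlabeled)            # step 1: denoiser from unlabeled outputs
base = fit_linear_predictor(X_train, Y_train) # step 2: simple base predictor

X_test = rng.uniform(-2, 2, size=(200, 1))
raw = base(X_test)        # smooth predictions, usually not valid integers
composed = denoise(raw)   # denoiser applied on top: always a valid output
```

In this toy setting the base predictor only has to learn a smooth (linear) function, while the denoiser accounts for the discrete structure of the output space, which is the intuition behind the complexity reduction the abstract describes.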

