Provable benefits of representation learning

06/14/2017
by Sanjeev Arora, et al.

There is general consensus that learning representations is useful for a variety of reasons, e.g. efficient use of labeled data (semi-supervised learning), transfer learning, and understanding the hidden structure of data. Popular techniques for representation learning include clustering, manifold learning, kernel learning, autoencoders, Boltzmann machines, etc. To study the relative merits of these techniques, it is essential to formalize the definition and goals of representation learning, so that they all become instances of the same definition. This paper introduces such a formal framework that also formalizes the utility of learning the representation. It is related to previous Bayesian notions, but with some new twists. We show the usefulness of our framework by exhibiting simple and natural settings -- linear mixture models and log-linear models -- where the power of representation learning can be formally shown. In these examples, representation learning can be performed provably and efficiently under plausible assumptions (despite being NP-hard in general), and furthermore: (i) it greatly reduces the need for labeled data (semi-supervised learning), (ii) it allows solving classification tasks when simpler approaches like nearest neighbors require too much data, and (iii) it is more powerful than manifold learning methods.
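To make claim (i) concrete, here is a minimal sketch in the spirit of the linear mixture setting, not the paper's actual algorithm: a mixture model is fit to unlabeled data, the posterior over mixture components serves as the learned representation, and a linear classifier on that representation then needs only a handful of labels. The synthetic data, the use of scikit-learn's GaussianMixture and LogisticRegression, and all hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic mixture data: 4 Gaussian components in R^20; the class label
# depends only on which component a point was drawn from.
k, d, n = 4, 20, 2000
means = rng.normal(scale=3.0, size=(k, d))
comp = rng.integers(0, k, size=n)
X = means[comp] + rng.normal(size=(n, d))
y = comp % 2

# Unsupervised step: learn the representation from all points, ignoring labels.
gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
Z = gmm.predict_proba(X)  # representation: posterior over mixture components

# Supervised step: with the representation in hand, 20 labeled examples
# suffice to fit an accurate linear classifier.
idx = rng.choice(n, size=20, replace=False)
clf = LogisticRegression().fit(Z[idx], y[idx])
print("accuracy with 20 labels:", clf.score(Z, y))
```

The point of the sketch is the division of labor: the expensive statistical work (recovering the mixture structure) uses only unlabeled data, so the labeled sample only needs to pin down a low-dimensional linear map, whereas nearest neighbors on the raw 20-dimensional inputs would need far more labels.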


