Evaluating representations by the complexity of learning low-loss predictors

09/15/2020
by   William F. Whitney, et al.

We consider the problem of evaluating representations of data for use in solving a downstream task. We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest, and introduce two methods, surplus description length (SDL) and ε sample complexity (εSC). In contrast to prior methods, which measure the amount of information about the optimal predictor that is present in a specific amount of data, our methods measure the amount of information needed from the data to recover an approximation of the optimal predictor up to a specified tolerance. We present a framework to compare these methods based on plotting the validation loss versus training set size (the "loss-data" curve). Existing measures, such as mutual information and minimum description length probes, correspond to slices and integrals along the data-axis of the loss-data curve, while ours correspond to slices and integrals along the loss-axis. We provide experiments on real data comparing the behavior of each of these methods over datasets of varying size, along with a high-performance open-source library for representation evaluation at https://github.com/willwhitney/reprieve.
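To make the two measures concrete, here is a minimal sketch of how they could be read off a loss-data curve. This is not the reprieve API: the function names, the NumPy interface, and the discrete treatment of the sum over sample sizes are assumptions made for illustration only. Given validation losses measured at increasing training-set sizes, εSC is the smallest size whose loss reaches the tolerance ε, while SDL accumulates the loss in excess of ε along the data axis.

```python
import numpy as np

def epsilon_sample_complexity(ns, losses, eps):
    """Smallest training-set size whose validation loss is <= eps.

    ns: sorted 1-D array of training-set sizes.
    losses: validation loss measured at each size (same length as ns).
    Returns np.inf if no measured size reaches the tolerance.
    """
    hits = np.flatnonzero(losses <= eps)
    return ns[hits[0]] if hits.size else np.inf

def surplus_description_length(ns, losses, eps):
    """Excess loss above eps, accumulated along the data axis.

    A discrete approximation: each measured loss is charged for the
    number of sample sizes it spans, standing in for a sum over
    every n between evaluation points.
    """
    spans = np.diff(ns, append=ns[-1] + 1)  # sizes covered by each point
    return float(np.sum(spans * np.maximum(losses - eps, 0.0)))

# Toy loss-data curve: validation loss shrinks as the training set grows.
ns = np.array([1, 2, 4, 8, 16, 32])
losses = np.array([2.0, 1.4, 0.9, 0.55, 0.4, 0.35])
print(epsilon_sample_complexity(ns, losses, eps=0.5))  # -> 16
print(surplus_description_length(ns, losses, eps=0.5))
```

By contrast, fixing a dataset size and reading off the loss, or integrating along the data axis, recovers the validation-loss and minimum-description-length-style probes that the abstract contrasts with.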


Related Research

Latent Representation Prediction Networks (09/20/2020)
Deeply-learned planning methods are often based on learning representati...

Low-loss connection of weight vectors: distribution-based approaches (08/03/2020)
Recent research shows that sublevel sets of the loss surfaces of overpar...

Information-Theoretic Probing with Minimum Description Length (03/27/2020)
To measure how well pretrained representations encode some linguistic pr...

Wasserstein Dependency Measure for Representation Learning (03/28/2019)
Mutual information maximization has emerged as a powerful learning objec...

ToPs: Ensemble Learning with Trees of Predictors (06/05/2017)
We present a new approach to ensemble learning. Our approach constructs ...

Learning Anonymized Representations with Adversarial Neural Networks (02/26/2018)
Statistical methods protecting sensitive information or the identity of ...

Prediction of concept lengths for fast concept learning in description logics (07/10/2021)
Concept learning approaches based on refinement operators explore partia...

Code Repositories

reprieve

A library for evaluating representations.

