Understanding Dynamics of Nonlinear Representation Learning and Its Application

06/28/2021
by Kenji Kawaguchi, et al.

Representations of the world environment play a crucial role in machine intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as pixel values of images. Representation learning allows us to automatically discover suitable representations from raw sensory data. For example, given raw sensory data, a multilayer perceptron learns nonlinear representations at its hidden layers, which are subsequently used for classification (or regression) at its output layer; this happens implicitly during training through the minimization of a supervised or unsupervised loss. In this paper, we study the dynamics of such implicit nonlinear representation learning. We identify a new assumption, the common model structure assumption, and a novel condition, the data-architecture alignment condition. Under the common model structure assumption, the data-architecture alignment condition is shown to be sufficient for global convergence and necessary for global optimality. Our results provide practical guidance for designing a model structure: for example, the common model structure assumption can serve as a justification for choosing one model structure over another. As an application, we then derive a new training framework that satisfies the data-architecture alignment condition without assuming it, by automatically modifying any given training algorithm based on the data and architecture. Given a standard training algorithm, the framework running its modified version is empirically shown to maintain competitive test performance while providing global convergence guarantees for ResNet-18 with convolutions, skip connections, and batch normalization on standard benchmark datasets, including MNIST, CIFAR-10, CIFAR-100, Semeion, KMNIST, and SVHN.
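To make the implicit representation learning described above concrete, here is a minimal sketch in PyTorch (an illustrative setup, not code from the paper): a two-layer MLP whose hidden activations serve as the learned nonlinear representation, shaped only as a by-product of minimizing a supervised loss at the output layer. All names and dimensions (MLP, d_hidden, the random stand-in data) are assumptions for illustration.

import torch
import torch.nn as nn

class MLP(nn.Module):
    # The hidden layer produces the nonlinear representation h; the output
    # layer uses h for classification. Nothing trains h directly.
    def __init__(self, d_in=784, d_hidden=128, d_out=10):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.output = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        h = self.hidden(x)        # learned representation (hidden layer)
        return self.output(h), h  # logits for the loss, plus h itself

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 784)           # stand-in for raw sensory inputs
y = torch.randint(0, 10, (32,))    # stand-in class labels

logits, h = model(x)
loss = loss_fn(logits, y)          # supervised loss at the output layer
loss.backward()                    # the same gradients also reshape the
optimizer.step()                   # hidden representation h, implicitly

Note that no term in the loss refers to h: the representation is learned implicitly because every gradient step on the output loss also updates the hidden-layer weights.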
