Learning in Gated Neural Networks

06/06/2019
by Ashok Vardhan Makkuva, et al.

Gating is a key feature in modern neural networks, including LSTMs, GRUs, and sparsely-gated deep neural networks. The backbone of such gated networks is a mixture-of-experts layer, in which several experts make regression decisions and a gate controls how those decisions are weighed in an input-dependent manner. Despite its prominent role in both modern and classical machine learning, very little is understood about parameter recovery in mixture-of-experts models, since gradient descent and EM are known to get stuck in local optima for them. In this paper, we perform a careful analysis of the optimization landscape and show that with appropriately designed loss functions, gradient descent can indeed learn the parameters accurately. A key idea underpinning our results is the design of two distinct loss functions: one for recovering the expert parameters and another for recovering the gating parameters. We provide the first sample complexity results for parameter recovery in this model for any algorithm, and we demonstrate significant performance gains over standard loss functions in numerical experiments.
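To make the setup concrete, here is a minimal sketch of a mixture-of-experts regression layer of the kind the abstract describes: each expert produces a linear prediction, and a softmax gate combines them with input-dependent weights. The function name `moe_forward` and the use of linear experts with softmax gating are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def moe_forward(x, W_experts, W_gate):
    """Forward pass of a (hypothetical) mixture-of-experts regression layer.

    x:         input vector, shape (d,)
    W_experts: expert regression weights, shape (k, d) -- one row per expert
    W_gate:    gating weights, shape (k, d)

    Returns sum_i softmax(W_gate @ x)_i * (W_experts[i] @ x), i.e. the
    experts' decisions weighed by input-dependent gating probabilities.
    """
    expert_outputs = W_experts @ x              # each expert's regression decision, shape (k,)
    logits = W_gate @ x                         # input-dependent gating scores, shape (k,)
    logits = logits - logits.max()              # shift for numerical stability
    gates = np.exp(logits) / np.exp(logits).sum()  # softmax gating probabilities
    return gates @ expert_outputs               # convex combination of expert outputs
```

With zero gating weights the gate is uniform and the layer reduces to averaging the experts, which is one easy sanity check; the recovery problem the paper studies is learning `W_experts` and `W_gate` from input-output samples of such a layer.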


