On the Provable Generalization of Recurrent Neural Networks

by Lifu Wang, et al.

The Recurrent Neural Network (RNN) is a fundamental architecture in deep learning. Recently, several works have studied the training process of over-parameterized neural networks and shown that over-parameterized networks can learn functions in some notable concept classes with a provable generalization error bound. In this paper, we analyze the training and generalization of RNNs with random initialization, and provide the following improvements over recent works: 1) For an RNN with input sequence x = (X_1, X_2, ..., X_L), previous works study learning functions that are sums of f(β_l^T X_l) and require normalization conditions ||X_l|| ≤ ε for some very small ε depending on the complexity of f. In this paper, using a detailed analysis of the neural tangent kernel matrix, we prove a generalization error bound for learning such functions without normalization conditions, and show that some notable concept classes are learnable with the number of iterations and samples scaling almost polynomially in the input length L. 2) Moreover, we prove a novel result on learning N-variable functions of the input sequence of the form f(β^T [X_{l_1}, ..., X_{l_N}]), which do not belong to the "additive" concept class, i.e., sums of functions f(X_l). We show that when either N or l_0 = max(l_1, ..., l_N) − min(l_1, ..., l_N) is small, f(β^T [X_{l_1}, ..., X_{l_N}]) is learnable with the number of iterations and samples scaling almost polynomially in the input length L.
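To make the two concept classes concrete, here is a minimal NumPy sketch constructing targets of both forms: the "additive" class, a sum over positions of f(β_l^T X_l), and the N-variable class, a single function f(β^T [X_{l_1}, ..., X_{l_N}]) of N selected positions. All dimensions, the link function f = tanh, and the chosen positions are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only.
d, L = 4, 10                      # input dimension, sequence length
X = rng.normal(size=(L, d))       # input sequence x = (X_1, ..., X_L)

# "Additive" concept class: sum over positions of f(beta_l^T X_l).
f = np.tanh                       # a smooth link function (assumed choice)
betas = rng.normal(size=(L, d))   # one beta_l per position
additive_target = sum(f(betas[l] @ X[l]) for l in range(L))

# N-variable concept class: f(beta^T [X_{l_1}, ..., X_{l_N}]),
# a single function of N selected positions (here N = 3, so
# l_0 = max(idx) - min(idx) = 4 is small relative to L).
idx = [2, 5, 6]                   # chosen positions l_1 < ... < l_N
beta = rng.normal(size=(len(idx) * d,))
concat = np.concatenate([X[l] for l in idx])  # [X_{l_1}, ..., X_{l_N}]
nvar_target = f(beta @ concat)

print(additive_target, nvar_target)
```

Note that the second target couples several positions inside one nonlinearity, so it cannot be written as a sum of per-position terms, which is exactly why it falls outside the additive class.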




