Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

01/06/2022
by Alethea Power et al.

In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and speed of learning can be studied in great detail. In some situations we show that neural networks learn through a process of "grokking" a pattern in the data, improving generalization performance from random chance level to perfect generalization, and that this improvement in generalization can happen well past the point of overfitting. We also study generalization as a function of dataset size and find that smaller datasets require increasing amounts of optimization for generalization. We argue that these datasets provide a fertile ground for studying a poorly understood aspect of deep learning: generalization of overparametrized neural networks beyond memorization of the finite training dataset.
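The sketch below illustrates the kind of experiment the abstract describes: enumerate the full table of a binary operation modulo a prime, train on a random fraction of it, and run optimization for far longer than it takes to fit the training set while monitoring validation accuracy. It is a minimal illustration, not the authors' code: the paper trains a small decoder-only transformer on equations of the form "a ∘ b = c", whereas here a simple embedding-plus-MLP classifier on modular addition stands in for it, and the modulus, split fraction, architecture sizes, and optimizer settings are all assumptions made for the example.

```python
# Minimal sketch of the grokking-style setup (assumed details, not the paper's exact code).
import itertools

import torch
import torch.nn as nn

P = 97                # prime modulus (illustrative choice)
TRAIN_FRACTION = 0.5  # fraction of the full operation table used for training

# Enumerate the entire table of (a, b) -> (a + b) mod P and split it at random.
pairs = torch.tensor(list(itertools.product(range(P), repeat=2)))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
n_train = int(TRAIN_FRACTION * len(pairs))
train_idx, val_idx = perm[:n_train], perm[n_train:]

model = nn.Sequential(      # stand-in for the paper's small transformer
    nn.Embedding(P, 128),   # one embedding per operand value
    nn.Flatten(),           # concatenate the two operand embeddings
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, P),      # predict the result of the operation
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

# Generalization can improve long after the training set is fit perfectly,
# so the run is deliberately much longer than needed to reach low train loss.
for step in range(100_000):
    model.train()
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1_000 == 0:
        model.eval()
        with torch.no_grad():
            preds = model(pairs[val_idx]).argmax(dim=-1)
            val_acc = (preds == labels[val_idx]).float().mean().item()
        print(f"step {step:6d}  train loss {loss.item():.4f}  val acc {val_acc:.3f}")
```

Plotting validation accuracy against optimization steps for several values of TRAIN_FRACTION is the natural way to reproduce the paper's observation that smaller training sets need more optimization before generalization appears.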

Related research

- 11/19/2015: Reducing Overfitting in Deep Networks by Decorrelating Representations
  One major challenge in training Deep Neural Networks is preventing overf...
- 10/03/2022: Omnigrok: Grokking Beyond Algorithmic Data
  Grokking, the unusual phenomenon for algorithmic datasets where generali...
- 02/02/2013: A New Constructive Method to Optimize Neural Network Architecture and Generalization
  In this paper, after analyzing the reasons of poor generalization and ov...
- 06/15/2020: Weighted Optimization: better generalization by smoother interpolation
  We provide a rigorous analysis of how implicit bias towards smooth inter...
- 08/13/2021: Datasets for Studying Generalization from Easy to Hard Examples
  We describe new datasets for studying generalization from easy to hard e...
- 04/16/2018: Compressibility and Generalization in Large-Scale Deep Learning
  Modern neural networks are highly overparameterized, with capacity to su...
- 07/30/2022: Delving into Effective Gradient Matching for Dataset Condensation
  As deep learning models and datasets rapidly scale up, network training ...
