Towards Understanding Grokking: An Effective Theory of Representation Learning

05/20/2022
by Ziming Liu, et al.

We aim to understand grokking, a phenomenon where models generalize long after overfitting their training set. We present both a microscopic analysis anchored by an effective theory and a macroscopic analysis of phase diagrams describing learning performance across hyperparameters. We find that generalization originates from structured representations whose training dynamics and dependence on training set size can be predicted by our effective theory in a toy setting. We observe empirically the presence of four learning phases: comprehension, grokking, memorization, and confusion. We find representation learning to occur only in a "Goldilocks zone" (including comprehension and grokking) between memorization and confusion. Compared to the comprehension phase, the grokking phase stays closer to the memorization phase, leading to delayed generalization. The Goldilocks phase is reminiscent of "intelligence from starvation" in Darwinian evolution, where resource limitations drive discovery of more efficient solutions. This study not only provides intuitive explanations of the origin of grokking, but also highlights the usefulness of physics-inspired tools, e.g., effective theories and phase diagrams, for understanding deep learning.
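
To make the setting concrete, below is a minimal sketch (not the authors' code) of the kind of toy experiment in which grokking is typically observed: a small network trained on modular addition (a + b mod p) with weight decay, where training accuracy saturates long before test accuracy. The architecture and all hyperparameters (p = 97, training fraction, learning rate, weight decay, step count) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a toy grokking setup: modular addition with weight decay.
# All hyperparameters are illustrative, not the paper's exact configuration.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97                # modulus of the algorithmic task (a + b) mod p
frac_train = 0.4      # fraction of the p*p input pairs used for training

# Build the full (a, b) -> (a + b) mod p dataset and split it.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
n_train = int(frac_train * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

class ToyModel(nn.Module):
    """Embeds the two operands, concatenates them, and decodes the sum mod p."""
    def __init__(self, p, d=128):
        super().__init__()
        self.embed = nn.Embedding(p, d)
        self.mlp = nn.Sequential(nn.Linear(2 * d, 256), nn.ReLU(), nn.Linear(256, p))

    def forward(self, x):
        e = self.embed(x)                      # (batch, 2, d)
        return self.mlp(e.flatten(start_dim=1))

model = ToyModel(p)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()

for step in range(20000):
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        # In the grokking regime, train accuracy reaches ~1.0 long before test accuracy rises.
        print(f"step {step:6d}  train acc {accuracy(train_idx):.3f}  test acc {accuracy(test_idx):.3f}")
```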
