
On The Power of Curriculum Learning in Training Deep Networks

by Guy Hacohen, et al.

Training neural networks is traditionally done by providing a sequence of random mini-batches sampled uniformly from the entire training data. In this work, we analyze the effects of curriculum learning, which involves the dynamic non-uniform sampling of mini-batches, on the training of deep networks, and specifically CNNs trained on image recognition. To employ curriculum learning, the training algorithm must resolve two challenges: (i) sort the training examples by difficulty; (ii) compute a series of mini-batches that exhibit an increasing level of difficulty. We address challenge (i) using two methods: transfer learning from some competitive "teacher" network, and bootstrapping. We find that both methods yield similar benefits in terms of increased learning speed and improved final performance on test data. We address challenge (ii) by investigating different pacing functions to guide the sampling. The empirical investigation includes a variety of network architectures, using images from CIFAR-10, CIFAR-100 and subsets of ImageNet. We conclude with a novel theoretical analysis of curriculum learning, where we show how it effectively modifies the optimization landscape. We then define the concept of an ideal curriculum, and show that under mild conditions it does not change the corresponding global minimum of the optimization function.
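
The two challenges named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `order_by_difficulty`, `staircase_pacing` and `curriculum_batches`, the staircase shape of the pacing function, and its default parameters are all assumptions chosen for illustration. The difficulty scores are assumed to be given, for example as the confidence a teacher network assigns to the true label.

```python
import random


def order_by_difficulty(scores):
    """Return example indices sorted from easiest to hardest.

    Assumption: a higher score means an easier example, e.g. the
    confidence a teacher model assigns to the true label.
    """
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def staircase_pacing(step, n_examples, starting_fraction=0.1,
                     increase=2.0, step_length=100):
    """Hypothetical pacing function: how many of the easiest examples
    may be sampled at a given training step."""
    # Cap the stage index so the power stays finite for very large steps
    # (assumes a moderate `increase`, such as the default).
    stage = min(step // step_length, 64)
    fraction = min(1.0, starting_fraction * increase ** stage)
    return int(fraction * n_examples)


def curriculum_batches(scores, n_steps, batch_size, rng):
    """Yield mini-batches of indices drawn uniformly from the easiest
    examples allowed by the pacing function at each step.

    Assumes batch_size <= len(scores).
    """
    order = order_by_difficulty(scores)
    for step in range(n_steps):
        # Never offer fewer examples than one batch needs.
        available = max(batch_size, staircase_pacing(step, len(order)))
        yield rng.sample(order[:available], batch_size)
```

Early batches are drawn only from the easiest examples, and the pool widens in stages until it covers the whole training set, at which point sampling is uniform again. Passing the random generator explicitly, for instance `random.Random(0)`, keeps the batch sequence reproducible.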


Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks

Our first contribution in this paper is a theoretical investigation of c...

Curriculum Learning Strategies for IR: An Empirical Study on Conversation Response Ranking

Neural ranking models are traditionally trained on a series of random ba...

Curriculum Based Multi-Task Learning for Parkinson's Disease Detection

There is great interest in developing radiological classifiers for diagn...

LeRaC: Learning Rate Curriculum

Most curriculum learning methods require an approach to sort the data sa...

A bandit approach to curriculum generation for automatic speech recognition

The Automated Speech Recognition (ASR) task has been a challenging domai...

Improving the Accuracy of Early Exits in Multi-Exit Architectures via Curriculum Learning

Deploying deep learning services for time-sensitive and resource-constra...

Skip Connections Eliminate Singularities

Skip connections made the training of very deep networks possible and ha...