Knowledge Matters: Importance of Prior Information for Optimization

01/17/2013
by Caglar Gulcehre, et al.

We explore the effect of introducing prior information into the intermediate level of neural networks for a learning task that none of the state-of-the-art machine learning algorithms we tested was able to learn. We motivate our work with the hypothesis that humans learn such intermediate concepts from other individuals via a form of supervision or guidance using a curriculum. The experiments we conducted provide positive evidence in favor of this hypothesis. In our experiments, a two-tiered MLP architecture is trained on a dataset of 64x64 binary input images, each containing three sprites. The final task is to decide whether all the sprites are the same or one of them is different. The sprites are pentomino (Tetris-like) shapes, placed at different locations in the image using scaling and rotation transformations. The first part of the two-tiered MLP is pre-trained with intermediate-level targets, namely the presence of sprites at each location, while the second part takes the output of the first part as input and predicts the binary target of the final task. The two-tiered MLP architecture, with a few tens of thousands of examples, was able to learn the task perfectly, whereas all other algorithms (including unsupervised pre-training, but also traditional algorithms such as SVMs, decision trees, and boosting) perform no better than chance. We hypothesize that the optimization difficulty encountered when the intermediate pre-training is not performed arises from the composition of two highly non-linear tasks. Our findings are also consistent with hypotheses on cultural learning, inspired by observations of optimization problems in deep learning, presumably caused by effective local minima.
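To make the two-tiered setup concrete, below is a minimal sketch in PyTorch. The patch size (8x8, giving an 8x8 grid of patches over the 64x64 image), the number of sprite classes, the layer widths, and the activations are illustrative assumptions, not the paper's exact architecture or hyperparameters; the point is only the structure: a first tier pre-trained on per-location intermediate targets, and a second tier that consumes its outputs to predict the final same-vs-different decision.

```python
import torch
import torch.nn as nn

PATCH = 8                    # assumed patch size; 64x64 image -> 8x8 grid of 64 patches
N_SPRITES = 10               # assumed number of pentomino shape classes
N_CLASSES = N_SPRITES + 1    # +1 for "no sprite in this patch"
N_PATCHES = 64

class PartOne(nn.Module):
    """First tier: predicts which sprite (if any) appears in each patch.
    Pre-trained with the intermediate-level targets described in the abstract."""
    def __init__(self, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PATCH * PATCH, hidden),
            nn.Tanh(),
            nn.Linear(hidden, N_CLASSES),
        )

    def forward(self, patches):      # patches: (batch, N_PATCHES, PATCH*PATCH)
        return self.net(patches)     # per-patch class logits: (batch, N_PATCHES, N_CLASSES)

class PartTwo(nn.Module):
    """Second tier: takes the first tier's per-patch outputs and predicts the
    final binary target (all sprites identical vs. one different)."""
    def __init__(self, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_PATCHES * N_CLASSES, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, patch_logits):          # (batch, N_PATCHES, N_CLASSES)
        return self.net(patch_logits.flatten(1))  # final-task logit: (batch, 1)

# Training sketch: PartOne is first trained against the intermediate per-patch
# targets (cross-entropy), then PartTwo is trained on the final binary task
# (binary cross-entropy) using PartOne's outputs as its input.
part1, part2 = PartOne(), PartTwo()
intermediate_loss = nn.CrossEntropyLoss()
final_loss = nn.BCEWithLogitsLoss()
```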

