Large Scale Learning of General Visual Representations for Transfer
Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the weights on the target task. We scale up pre-training and create a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes, from 10 to 1M labeled examples. BiT achieves 87.8 [...] the Visual Task Adaptation Benchmark (which includes 19 tasks). On small datasets, BiT attains 86.4 [...] 97.6 [...]. We conduct detailed analysis of the main components that lead to high transfer performance.