A Simple Cache Model for Image Recognition

05/21/2018
by A. Emin Orhan, et al.

Training large-scale image recognition models is computationally expensive. This raises the question of whether there might be simple ways to improve the test performance of an already trained model without having to re-train or even fine-tune it with new data. Here, we show that, surprisingly, this is indeed possible. The key observation we make is that the layers of a deep network close to the output layer contain independent, easily extractable class-relevant information that is not contained in the output layer itself. We propose to extract this extra class-relevant information using a simple key-value cache memory to improve the classification performance of the model at test time. Our cache memory is directly inspired by a similar cache model previously proposed for language modeling (Grave et al., 2017). This cache component does not require any training or fine-tuning; it can be applied to any pre-trained model and, by properly setting only two hyper-parameters, leads to significant improvements in its classification performance. Improvements are observed across several architectures and datasets. In the cache component, using features extracted from layers close to the output (but not from the output layer itself) as keys leads to the largest improvements. Concatenating features from multiple layers to form keys can further improve performance over using single-layer features as keys. The cache component also has a regularizing effect, a simple consequence of which is that it substantially increases the robustness of models against adversarial attacks.
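To make the mechanism concrete, below is a minimal sketch of a Grave-style key-value cache applied to an image classifier, in the spirit of the abstract: keys are late-layer features of stored training items, values are their labels, and the cache distribution is interpolated with the model's softmax. This is an illustrative reconstruction, not the authors' code; the feature extractor, the variable names, and the default values of the two hyper-parameters (a similarity scaling theta and an interpolation weight lam) are assumptions.

import torch
import torch.nn.functional as F

def build_cache(feature_fn, images, labels, num_classes):
    """Store (key, value) pairs: keys are normalized late-layer
    features of the stored items, values are one-hot labels."""
    with torch.no_grad():
        keys = F.normalize(feature_fn(images), dim=1)   # (N, D)
    values = F.one_hot(labels, num_classes).float()     # (N, C)
    return keys, values

def cache_predict(model, feature_fn, x, keys, values,
                  theta=40.0, lam=0.5):
    """Mix the model's softmax with a similarity-weighted vote over
    cached items. theta and lam are the two hyper-parameters
    (illustrative defaults)."""
    with torch.no_grad():
        p_model = F.softmax(model(x), dim=1)            # (B, C)
        query = F.normalize(feature_fn(x), dim=1)       # (B, D)
        sims = query @ keys.t()                         # (B, N)
        w = F.softmax(theta * sims, dim=1)              # kernel weights
        p_cache = w @ values                            # (B, C)
    return (1.0 - lam) * p_model + lam * p_cache

In this sketch, feature_fn would return activations from a layer near the output (e.g., the penultimate layer), since the abstract reports that such layers give the largest gains; concatenating features from several late layers to form longer keys is a straightforward extension.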


Related research

Improving Reliability of Fine-tuning with Block-wise Optimisation (01/15/2023)
Finetuning can be used to tackle domain-specific tasks by transferring k...

FACM: Correct the Output of Deep Neural Network with Middle Layers Features against Adversarial Samples (06/02/2022)
In the strong adversarial attacks against deep neural network (DNN), the...

Large Memory Layers with Product Keys (07/10/2019)
This paper introduces a structured memory which can be easily integrated...

CNN with large memory layers (01/27/2021)
This work is centred around the recently proposed product key memory str...

Focused Transformer: Contrastive Training for Context Scaling (07/06/2023)
Large language models have an exceptional capability to incorporate new ...

TransBoost: Improving the Best ImageNet Performance using Deep Transduction (05/26/2022)
This paper deals with deep transductive learning, and proposes TransBoos...

A mixture model for aggregation of multiple pre-trained weak classifiers (05/31/2018)
Deep networks have gained immense popularity in Computer Vision and othe...
