Enhancing Few-shot Image Classification with Cosine Transformer

by Quang Huy Nguyen, et al.

This paper addresses the few-shot image classification problem. One notable limitation of few-shot learning is the variation in how the same category is described, which can produce a significant difference between the small labeled support set and the large unlabeled query set. Our approach is to obtain a relation heatmap between the two sets in order to label the latter in a transductive manner. This can be done with cross-attention using the scaled dot-product mechanism. However, differences in magnitude between the two separate sets of embedding vectors can strongly affect the output attention map and degrade model performance. We tackle this problem by improving the attention mechanism with cosine similarity. Specifically, we develop FS-CT (Few-shot Cosine Transformer), a few-shot image classification method based on prototypical embedding and a transformer-based framework. The proposed Cosine attention improves FS-CT performance significantly, with accuracy gains starting from nearly 5% over the baseline scaled dot-product attention, in various scenarios on three few-shot datasets: mini-ImageNet, CUB-200, and CIFAR-FS. Additionally, we enhance the prototypical embedding for categorical representation with learnable weights before feeding it to the attention module. Our proposed method FS-CT, along with the Cosine attention, is simple to implement and can be applied to a wide range of applications. Our code is available at https://github.com/vinuni-vishc/Few-Shot-Cosine-Transformer
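To make the magnitude argument concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It compares scaled dot-product relation scores with cosine-similarity scores between query embeddings and class prototypes. The helper names (`weighted_prototype`, `scaled_dot_scores`, `cosine_scores`), the toy dimensions, and the use of fixed uniform weights in place of learnable ones are all assumptions made for illustration; softmax and the value projection of a full attention layer are omitted.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(u, u))


def weighted_prototype(shots, weights):
    """Weighted mean of one class's support embeddings.

    `weights` is a hypothetical stand-in for learnable per-shot weights;
    uniform weights reduce this to the plain prototype mean.
    """
    total = sum(weights)
    dim = len(shots[0])
    return [sum(w * s[i] for w, s in zip(weights, shots)) / total
            for i in range(dim)]


def scaled_dot_scores(queries, protos):
    """Relation map from scaled dot products: depends on vector magnitudes."""
    scale = math.sqrt(len(protos[0]))
    return [[dot(q, p) / scale for p in protos] for q in queries]


def cosine_scores(queries, protos, eps=1e-12):
    """Relation map from cosine similarity: depends only on direction."""
    return [[dot(q, p) / (norm(q) * norm(p) + eps) for p in protos]
            for q in queries]


# Toy episode (assumed sizes): 3-way, 5-shot, 4 query samples, 16-dim embeddings.
random.seed(0)
n_way, k_shot, n_query, dim = 3, 5, 4, 16
support = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(k_shot)]
           for _ in range(n_way)]
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_query)]
protos = [weighted_prototype(shots, [1.0] * k_shot) for shots in support]

# Simulate a magnitude mismatch between the two sets of embeddings.
scaled_queries = [[10.0 * x for x in q] for q in queries]

cos_a = cosine_scores(queries, protos)
cos_b = cosine_scores(scaled_queries, protos)
dot_a = scaled_dot_scores(queries, protos)
dot_b = scaled_dot_scores(scaled_queries, protos)
```

In this sketch, rescaling the query embeddings leaves `cos_a` and `cos_b` essentially identical, while `dot_b` is ten times `dot_a`, which would sharpen any softmax applied afterwards. This is the sensitivity to magnitude that the abstract attributes to scaled dot-product attention.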




Neural Attention Memory

We propose a novel perspective of the attention mechanism by reinventing...

The Document Vectors Using Cosine Similarity Revisited

The current state-of-the-art test accuracy (97.42%) on the IMDB movie re...

Self-Supervised Learning For Few-Shot Image Classification

Few-shot image classification aims to classify unseen classes with limit...

Few-Shot Learning Meets Transformer: Unified Query-Support Transformers for Few-Shot Classification

Few-shot classification which aims to recognize unseen classes using ver...

ATRM: Attention-based Task-level Relation Module for GNN-based Few-shot Learning

Recently, graph neural networks (GNNs) have shown powerful ability to ha...

A Universal Representation Transformer Layer for Few-Shot Image Classification

Few-shot classification aims to recognize unseen classes when presented ...

KS-DETR: Knowledge Sharing in Attention Learning for Detection Transformer

Scaled dot-product attention applies a softmax function on the scaled do...
