The Curious Layperson: Fine-Grained Image Recognition without Expert Labels

11/05/2021
by Subhabrata Choudhury, et al.

Most of us are not experts in specific fields, such as ornithology. Nonetheless, we do have general image and language understanding capabilities that we use to match what we see to expert resources. This allows us to expand our knowledge and perform novel tasks without ad-hoc external supervision. Machines, in contrast, have a much harder time consulting expert-curated knowledge bases unless they are trained specifically with that knowledge in mind. In this paper we therefore consider a new problem: fine-grained image recognition without expert annotations, which we address by leveraging the vast knowledge available in web encyclopedias. First, we learn a model that describes the visual appearance of objects using non-expert image descriptions. We then train a fine-grained textual similarity model that matches image descriptions with documents at the sentence level. We evaluate the method on two datasets and compare it with several strong baselines and with the state of the art in cross-modal retrieval. Code is available at: https://github.com/subhc/clever
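
To make the sentence-level matching idea concrete, here is a minimal sketch. It is not the authors' model: the learned textual similarity model is replaced by a plain bag-of-words cosine similarity, and the function names (`score_document`, `classify`), the stopword list, and the example texts are all hypothetical, chosen only for illustration. Each image description is compared against every sentence of a candidate document, the best-matching sentence is kept, and the document with the highest average score is selected.

```python
import math
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list, only for this toy example.
STOPWORDS = {"a", "an", "the", "is", "it", "and", "with", "has", "of"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_sentences(document):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def score_document(descriptions, document):
    """Average, over descriptions, of the best-matching sentence score."""
    sentences = [Counter(tokenize(s)) for s in split_sentences(document)]
    if not descriptions or not sentences:
        return 0.0
    total = 0.0
    for description in descriptions:
        vec = Counter(tokenize(description))
        total += max(cosine(vec, sentence) for sentence in sentences)
    return total / len(descriptions)


def classify(descriptions, documents):
    """Return the name of the document that best matches the descriptions."""
    return max(documents, key=lambda name: score_document(descriptions, documents[name]))
```

In the setting described above, the descriptions would come from the learned captioning model and the documents from a web encyclopedia; in this sketch both are supplied by hand, for example `classify(["a bright red bird with a black face"], {"cardinal": "...", "blue jay": "..."})`.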


