
What Is Considered Complete for Visual Recognition?

by Lingxi Xie, et al.

This is an opinion paper. We hope to deliver a key message: current visual recognition systems are far from complete, i.e., from recognizing everything that humans can recognize, yet it is very unlikely that this gap can be bridged by continuously increasing human annotations. Based on this observation, we advocate a new type of pre-training task named learning-by-compression. Computational models (e.g., deep networks) are optimized to represent visual data using compact features, and these features preserve the ability to recover the original data. Semantic annotations, when available, play the role of weak supervision. An important yet challenging issue is the evaluation of image recovery, for which we suggest some design principles and future research directions. We hope our proposal can inspire the community to pursue the compression-recovery tradeoff rather than the accuracy-complexity tradeoff.
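To make the "compression-recovery tradeoff" concrete, here is a toy sketch under loudly stated assumptions. It is not the authors' learning-by-compression method: a fixed orthonormal DCT stands in for a learned encoder, keeping the k largest coefficients plays the role of the "compact feature", the inverse DCT plays the decoder, and mean squared error is used as a naive recovery metric (the abstract itself notes that evaluating recovery is an open problem). Weak supervision from semantic annotations is not modeled. All function names (`dct`, `idct`, `compress`, `tradeoff_curve`) are hypothetical.

```python
import math
from typing import List, Tuple


def _scale(k: int, n: int) -> float:
    # Orthonormal DCT-II normalization: sqrt(1/n) for k = 0, sqrt(2/n) otherwise.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x: List[float]) -> List[float]:
    """Orthonormal DCT-II: a fixed, lossless change of basis (stand-in 'encoder')."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(c: List[float]) -> List[float]:
    """Inverse of the orthonormal DCT-II (a DCT-III), used as the stand-in 'decoder'."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]


def compress(x: List[float], k: int) -> List[float]:
    """Keep only the k largest-magnitude coefficients; zero out the rest."""
    c = dct(x)
    keep = set(sorted(range(len(c)), key=lambda j: abs(c[j]), reverse=True)[:k])
    return [c[j] if j in keep else 0.0 for j in range(len(c))]


def mse(a: List[float], b: List[float]) -> float:
    """Naive recovery metric: mean squared error between original and recovered data."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def tradeoff_curve(x: List[float]) -> List[Tuple[int, float]]:
    """Recovery error for every feature budget k = 1..len(x)."""
    return [(k, mse(x, idct(compress(x, k)))) for k in range(1, len(x) + 1)]


# Usage: a small synthetic 1-D "signal" (hypothetical data, not from the paper).
signal = [math.sin(2 * math.pi * i / 16) + 0.1 * i for i in range(16)]
for k, err in tradeoff_curve(signal)[:4]:
    print(f"k={k:2d}  recovery MSE={err:.5f}")
```

Because the transform is orthonormal, the recovery error for budget k equals the energy of the discarded coefficients divided by n, so the error never increases as k grows and reaches (numerically) zero when all coefficients are kept. A learned encoder would replace the fixed basis, and a better recovery metric than MSE would replace `mse`.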
