Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions

05/27/2022
by   Huaizu Jiang, et al.

A significant gap remains between today's visual pattern recognition models and human-level visual cognition, especially when it comes to few-shot learning and compositional reasoning about novel concepts. We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. It is inspired by two desirable characteristics of the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning. We carefully curate the few-shot instances with hard negatives, where positive and negative images disagree only on action labels, making mere recognition of object categories insufficient to complete our benchmark. We also design multiple test sets to systematically study the generalization of visual learning models, where we vary the overlap of the HOI concepts between the training and test sets of few-shot instances, from partial to no overlap. Bongard-HOI presents a substantial challenge to today's visual recognition models. The state-of-the-art HOI detection model achieves only 62% accuracy on few-shot binary prediction, while even amateur human testers on MTurk reach 91% accuracy. With the Bongard-HOI benchmark, we hope to further advance research efforts in visual reasoning, especially in holistic perception-reasoning systems and better representation learning.
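The hard-negative idea in the abstract can be illustrated with a small sketch. This is not the authors' curation pipeline; it only assumes that each image carries one (action, object) label and shows how a few-shot instance could be assembled so that negatives share the object category and differ only in the action. The names `Image`, `build_instance`, and `shots`, as well as the toy data, are hypothetical.

```python
import random
from typing import NamedTuple


class Image(NamedTuple):
    name: str    # hypothetical image identifier
    action: str  # action label, e.g. "ride"
    obj: str     # object category, e.g. "bike"


def build_instance(images, concept, shots, rng):
    """Sample one few-shot instance for concept = (action, obj).

    Assumption: `shots` support images per side plus one query per side.
    """
    action, obj = concept
    positives = [im for im in images if im.obj == obj and im.action == action]
    # Hard negatives: same object category, different action label,
    # so recognizing the object alone cannot separate the two sets.
    negatives = [im for im in images if im.obj == obj and im.action != action]
    if len(positives) < shots + 1 or len(negatives) < shots + 1:
        raise ValueError("not enough images for this concept")
    pos = rng.sample(positives, shots + 1)
    neg = rng.sample(negatives, shots + 1)
    return {
        "positive_support": pos[:shots],
        "negative_support": neg[:shots],
        # Binary prediction task: label 1 = shows the concept, 0 = does not.
        "queries": [(pos[shots], 1), (neg[shots], 0)],
    }


# Toy annotations, invented for illustration only.
images = [
    Image("a.jpg", "ride", "bike"),
    Image("b.jpg", "ride", "bike"),
    Image("c.jpg", "ride", "bike"),
    Image("d.jpg", "wash", "bike"),
    Image("e.jpg", "wash", "bike"),
    Image("f.jpg", "carry", "bike"),
    Image("g.jpg", "ride", "horse"),  # different object: never a hard negative
]

instance = build_instance(images, ("ride", "bike"), shots=2, rng=random.Random(0))
```

In this sketch a model sees the two support sets and must assign each query image to the positive or negative side, which is the binary prediction referred to in the abstract.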


research
10/02/2020

Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning

Humans have an inherent ability to learn novel concepts from only a few ...
research
10/06/2020

CURI: A Benchmark for Productive Concept Learning Under Uncertainty

Humans can learn and reason under substantial uncertainty in a space of ...
research
08/11/2023

Compositional Learning in Transformer-Based Human-Object Interaction Detection

Human-object interaction (HOI) detection is an important part of underst...
research
06/11/2022

A Benchmark for Compositional Visual Reasoning

A fundamental component of human vision is our ability to parse complex ...
research
11/09/2020

Closing the Generalization Gap in One-Shot Object Detection

Despite substantial progress in object detection and few-shot learning, ...
research
09/07/2023

Cross-Image Context Matters for Bongard Problems

Current machine learning methods struggle to solve Bongard problems, whi...
research
03/27/2022

Discovering Human-Object Interaction Concepts via Self-Compositional Learning

A comprehensive understanding of human-object interaction (HOI) requires...
