PQA: Perceptual Question Answering

04/08/2021
by Yonggang Qi, et al.

Perceptual organization remains one of the very few established theories on the human visual system. It underpinned many seminal pre-deep-learning works on segmentation and detection, yet research on it has declined rapidly since the field shifted to learning deep models. Of the limited attempts since, most aimed at interpreting complex visual scenes using perceptual organizational rules. This has, however, proven sub-optimal, since models were unable to effectively capture the visual complexity of real-world imagery. In this paper, we rejuvenate the study of perceptual organization by advocating two positional changes: (i) we examine purposefully generated synthetic data instead of complex real imagery, and (ii) we ask machines to synthesize novel perceptually valid patterns instead of explaining existing data. Our overall answer is a novel visual challenge: perceptual question answering (PQA). Upon observing example perceptual question-answer pairs, the goal for PQA is to solve similar questions by generating answers entirely from scratch (see Figure 1). Our first contribution is therefore the first dataset of perceptual question-answer pairs, each generated specifically for a particular Gestalt principle. We then borrow insights from human psychology to design an agent that casts perceptual organization as a self-attention problem, where a proposed grid-to-grid mapping network directly generates answer patterns from scratch. Experiments show that our agent outperforms a selection of naive and strong baselines. A human study, however, indicates that our agent uses astronomically more data to learn than an average human, necessitating future research (with or without our dataset).
