Learning Structured Representations of Visual Scenes

07/09/2022
by Meng-Jiun Chiou, et al.

As the intermediate-level representations bridging the two levels, structured representations of visual scenes, such as visual relationships between pairwise objects, have been shown not only to benefit compositional models in learning to reason along with the structures but also to provide higher interpretability for model decisions. Nevertheless, these representations receive much less attention than traditional recognition tasks, leaving numerous open challenges unsolved. In this thesis, we study how machines can describe the content of an individual image or video with visual relationships as the structured representations. Specifically, we explore how structured representations of visual scenes can be effectively constructed and learned in both the static-image and video settings, with improvements resulting from external knowledge incorporation, bias-reducing mechanisms, and enhanced representation models. At the end of the thesis, we also discuss open challenges and limitations to shed light on future directions of structured representation learning for visual scenes.
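To make the notion of "visual relationships between pairwise objects" concrete, the sketch below shows one common way such a structured representation is encoded: a scene graph whose edges are (subject, predicate, object) triplets over detected objects. This is an illustrative assumption for readers unfamiliar with the representation, not code or terminology taken from the thesis; the class names, labels, and boxes are placeholders.

```python
# Minimal sketch (illustrative, not from the thesis): a visual scene represented
# as a set of subject-predicate-object triplets over detected objects.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DetectedObject:
    label: str                               # object category, e.g. "person"
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in image coordinates


@dataclass
class Relationship:
    subject: DetectedObject
    predicate: str                           # e.g. "riding", "next to"
    obj: DetectedObject


def to_triplets(relationships: List[Relationship]) -> List[Tuple[str, str, str]]:
    """Flatten a scene graph into human-readable (subject, predicate, object) triplets."""
    return [(r.subject.label, r.predicate, r.obj.label) for r in relationships]


if __name__ == "__main__":
    person = DetectedObject("person", (10, 20, 110, 220))
    horse = DetectedObject("horse", (60, 80, 260, 240))
    scene_graph = [Relationship(person, "riding", horse)]
    print(to_triplets(scene_graph))  # [('person', 'riding', 'horse')]
```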


