Exploring Temporal Information for Improved Video Understanding

05/25/2019
by Yi Zhu, et al.

In this dissertation, I present my work on exploring temporal information for better video understanding. Specifically, I have worked on two problems: action recognition and semantic segmentation. For action recognition, I have proposed a framework, termed hidden two-stream networks, that learns an optimal motion representation without requiring the explicit computation of optical flow. My framework alleviates several challenges in video classification, such as learning motion representations, achieving real-time inference, handling multiple frame rates, and generalizing to unseen actions. For semantic segmentation, I have introduced a general framework that uses video prediction models to synthesize new training samples. By scaling up the training dataset in this way, my trained models are more accurate and robust than previous models, even without modifications to the network architectures or objective functions. I believe videos hold far more information than we currently exploit, and temporal information is one of the most important cues for machines to better perceive the visual world.
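
To make the first idea concrete, below is a minimal PyTorch sketch of a hidden two-stream design in the spirit of the abstract. It is an illustration under assumed layer sizes, input shapes, and module names (MotionNetSketch, HiddenTwoStreamSketch are made up here), not the dissertation's actual architecture: a small MotionNet-like CNN predicts flow-like motion maps directly from stacked RGB frames, and a temporal-stream classifier consumes those maps, so the whole pipeline runs end to end without precomputed optical flow.

```python
# Illustrative sketch only, not the authors' released code.
import torch
import torch.nn as nn

class MotionNetSketch(nn.Module):
    """Hypothetical stand-in for a MotionNet: maps N stacked RGB frames
    to (N-1) two-channel, flow-like motion fields."""
    def __init__(self, num_frames=11):
        super().__init__()
        in_ch = 3 * num_frames            # stacked RGB frames
        out_ch = 2 * (num_frames - 1)     # (u, v) per consecutive frame pair
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, out_ch, 3, padding=1),
        )

    def forward(self, frames):            # frames: (B, 3*N, H, W)
        return self.net(frames)           # motion: (B, 2*(N-1), H, W)

class HiddenTwoStreamSketch(nn.Module):
    """MotionNet-like generator plus a temporal-stream classifier, usable end to end."""
    def __init__(self, num_classes=101, num_frames=11):
        super().__init__()
        self.motion_net = MotionNetSketch(num_frames)
        self.temporal_stream = nn.Sequential(
            nn.Conv2d(2 * (num_frames - 1), 64, 7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, num_classes),
        )

    def forward(self, frames):
        motion = self.motion_net(frames)   # flow-like maps, no external optical flow step
        return self.temporal_stream(motion)

if __name__ == "__main__":
    model = HiddenTwoStreamSketch()
    clip = torch.randn(2, 3 * 11, 224, 224)   # toy batch of 11 stacked frames
    print(model(clip).shape)                   # torch.Size([2, 101])
```

Similarly, the segmentation part of the work scales up training data by synthesizing new samples with a video prediction model. The loop below only conveys that idea; predict_next_frame and propagate_label are hypothetical stand-ins for a learned video prediction model, not real APIs from the dissertation or any library.

```python
# Hypothetical sketch of joint frame/label synthesis; the two callables are
# assumed stand-ins for a learned video prediction model.
def synthesize_training_pairs(frames, labels, predict_next_frame, propagate_label):
    """Generate extra (frame, label) pairs from a labeled video clip."""
    extra = []
    for t in range(len(frames)):
        new_frame = predict_next_frame(frames[t])                     # synthesized future frame
        new_label = propagate_label(labels[t], frames[t], new_frame)  # label moved consistently
        extra.append((new_frame, new_label))
    return extra
```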

Related research:

11/29/2017 · Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
Motion representation plays a vital role in human action recognition in ...

04/02/2017 · Hidden Two-Stream Convolutional Networks for Action Recognition
Analyzing videos of human actions involves understanding the temporal re...

07/19/2019 · Only Time Can Tell: Discovering Temporal Data for Temporal Modeling
Understanding temporal information and how the visual world changes over...

04/14/2021 · Adaptive Intermediate Representations for Video Understanding
A common strategy to video understanding is to incorporate spatial and m...

11/29/2020 · UVid-Net: Enhanced Semantic Segmentation of UAV Aerial Videos by Embedding Temporal Information
Semantic segmentation of aerial videos has been extensively used for dec...

05/16/2018 · Fast Retinomorphic Event Stream for Video Recognition and Reinforcement Learning
Good temporal representations are crucial for video understanding, and t...

06/06/2023 · Semantic Segmentation on VSPW Dataset through Contrastive Loss and Multi-dataset Training Approach
Video scene parsing incorporates temporal information, which can enhance...
