Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video

01/18/2020
by Jie Wu, et al.

Temporal language grounding in untrimmed videos is a recently introduced task in video understanding. Most existing methods suffer from low efficiency, a lack of interpretability, and a departure from the human perception mechanism. Inspired by the human coarse-to-fine decision-making paradigm, we formulate a novel Tree-Structured Policy based Progressive Reinforcement Learning (TSP-PRL) framework that sequentially adjusts the temporal boundary through an iterative refinement process. Semantic concepts are explicitly represented as branches in the policy, which helps decompose complex policies efficiently into interpretable primitive actions. Progressive reinforcement learning provides correct credit assignment via two task-oriented rewards that encourage mutual promotion within the tree-structured policy. We extensively evaluate TSP-PRL on the Charades-STA and ActivityNet datasets, and the experimental results show that TSP-PRL achieves competitive performance compared with existing state-of-the-art methods.
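To make the coarse-to-fine idea concrete, the following is a minimal sketch of iterative boundary refinement driven by a two-level (root, then leaf) policy. It is not the authors' implementation: the branch names, step sizes, normalized time axis, and the `make_oracle_policy` helper are all assumptions introduced for illustration. In particular, the oracle peeks at a known target window and merely stands in for a learned policy, which would instead condition on video and query features and be trained with rewards.

```python
from typing import Callable, Dict, List, Tuple

Window = Tuple[float, float]  # (start, end) in normalized video time [0, 1]
Delta = Tuple[float, float]   # (change to start, change to end)

# Hypothetical action tree: each branch groups primitive actions that share
# one semantic concept. Branch names and step sizes are assumptions.
ACTION_TREE: Dict[str, List[Delta]] = {
    "coarse_shift": [(-0.10, -0.10), (0.10, 0.10)],
    "fine_shift": [(-0.02, -0.02), (0.02, 0.02)],
    "scale": [(-0.05, 0.05), (0.05, -0.05)],
}


def apply_action(window: Window, delta: Delta) -> Window:
    """Move the boundary, clamp to [0, 1], and reject degenerate windows."""
    start = min(max(window[0] + delta[0], 0.0), 1.0)
    end = min(max(window[1] + delta[1], 0.0), 1.0)
    return (start, end) if end > start else window


def temporal_iou(a: Window, b: Window) -> float:
    """Intersection over union of two temporal windows."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# A policy returns (branch, leaf index), or ("stop", 0) to end refinement.
Policy = Callable[[Window], Tuple[str, int]]


def make_oracle_policy(target: Window) -> Policy:
    """Toy stand-in for a learned tree policy (it uses the target window)."""

    def leaf_choice(window: Window, branch: str) -> Tuple[int, float]:
        # Leaf level: best primitive action inside one branch.
        scores = [temporal_iou(apply_action(window, d), target)
                  for d in ACTION_TREE[branch]]
        best = max(range(len(scores)), key=scores.__getitem__)
        return best, scores[best]

    def policy(window: Window) -> Tuple[str, int]:
        # Root level: pick the branch whose best leaf helps the most.
        current = temporal_iou(window, target)
        branch = max(ACTION_TREE, key=lambda b: leaf_choice(window, b)[1])
        leaf, score = leaf_choice(window, branch)
        return (branch, leaf) if score > current + 1e-9 else ("stop", 0)

    return policy


def refine(window: Window, policy: Policy, max_steps: int = 20) -> Window:
    """Sequentially adjust the boundary until the policy stops."""
    for _ in range(max_steps):
        branch, leaf = policy(window)
        if branch == "stop":
            break
        window = apply_action(window, ACTION_TREE[branch][leaf])
    return window
```

Calling `refine((0.0, 1.0), make_oracle_policy((0.4, 0.6)))` starts from the whole video and applies one primitive action per step. Because the toy oracle only accepts moves that raise the overlap, the result is never worse than the initial window; a learned policy carries no such guarantee.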


