Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning

We formulate tracking as an online decision-making process, where a tracking agent must follow an object despite ambiguous image frames and a limited computational budget. Crucially, the agent must decide where to look in the upcoming frames, when to reinitialize because it believes the target has been lost, and when to update its appearance model for the tracked object. Such decisions are typically made heuristically. Instead, we propose to learn an optimal decision-making policy by formulating tracking as a partially observable decision-making process (POMDP). We learn policies with deep reinforcement learning algorithms that need supervision (a reward signal) only when the track has gone awry. We demonstrate that sparse rewards allow us to quickly train on massive datasets, several orders of magnitude more than past work. Interestingly, by treating the data source of Internet videos as unlimited streams, we both learn and evaluate our trackers in a single, unified computational stream.

READ FULL TEXT

page 1

page 3

page 7

page 8

page 9

research
08/09/2017

Learning Policies for Adaptive Tracking with Deep Feature Cascades

Visual object tracking is a fundamental and time-critical vision task. R...
research
01/31/2017

Deep Reinforcement Learning for Visual Object Tracking in Videos

In this paper we introduce a fully end-to-end approach for visual tracki...
research
09/18/2019

Visual Tracking by means of Deep Reinforcement Learning and an Expert Demonstrator

In the last decade many different algorithms have been proposed to track...
research
06/04/2018

Playing Atari with Six Neurons

Deep reinforcement learning on Atari games maps pixel directly to action...
research
10/30/2014

Efficient Decision-Making by Volume-Conserving Physical Object

We demonstrate that any physical object, as long as its volume is conser...
research
05/19/2022

Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning-based Beam Search

Existing trackers usually select a location or proposal with the maximum...
research
11/22/2022

β-Multivariational Autoencoder for Entangled Representation Learning in Video Frames

It is crucial to choose actions from an appropriate distribution while l...

Please sign up or login with your details

Forgot password? Click here to reset