Belief Tree Search for Active Object Recognition

08/13/2017
by Mohsen Malmir, et al.

Active Object Recognition (AOR) has been approached as an unsupervised learning problem, in which optimal trajectories for object inspection are not known and must be discovered by reducing label uncertainty measures or by training with reinforcement learning. Such approaches offer no guarantee on the quality of their solutions. In this paper, we treat AOR as a Partially Observable Markov Decision Process (POMDP) and find near-optimal policies on training data using Belief Tree Search (BTS) on the corresponding belief Markov Decision Process (MDP). AOR then reduces to the problem of transferring knowledge from near-optimal policies on the training set to the test set. We train a Long Short Term Memory (LSTM) network to predict the best next action on the training set rollouts. We show that the proposed AOR method generalizes well to novel views of familiar objects and also to novel objects. We compare this supervised scheme against guided policy search and find that the LSTM network reaches higher recognition accuracy than the guided policy method. We further look into optimizing the observation function to increase the total collected reward of the optimal policy. In AOR, the observation function is known only approximately. We propose a gradient-based update to this approximate observation function that increases the total reward of any policy. We show that by optimizing the observation function and retraining the supervised LSTM network, the AOR performance on the test set improves significantly.
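To make the idea of searching a belief tree concrete, the following is a minimal sketch and not the authors' implementation. It assumes a static hidden object label, a small discrete observation model `obs_model[a, c, o] = p(o | class c, action a)`, and a terminal reward equal to the largest label probability in the belief; the names `belief_update` and `expand` are hypothetical. The tree is expanded exhaustively to a fixed depth, which is only feasible for toy problems.

```python
import numpy as np


def belief_update(belief, likelihood):
    """Bayes rule: posterior over object labels after one observation."""
    posterior = belief * likelihood
    total = posterior.sum()
    if total == 0.0:
        # Observation impossible under the current belief: keep the prior.
        return belief.copy()
    return posterior / total


def expand(belief, obs_model, depth):
    """Depth-limited search over the belief tree.

    Returns (value, best_action), where value is the expected terminal
    reward max_c b(c) after `depth` further actions (assumed reward).
    """
    if depth == 0:
        return float(belief.max()), None
    best_value, best_action = -np.inf, None
    n_actions, _, n_obs = obs_model.shape
    for a in range(n_actions):
        value = 0.0
        for o in range(n_obs):
            likelihood = obs_model[a, :, o]
            p_obs = float(belief @ likelihood)  # p(o | belief, action)
            if p_obs == 0.0:
                continue
            child = belief_update(belief, likelihood)
            child_value, _ = expand(child, obs_model, depth - 1)
            value += p_obs * child_value
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action


# Toy model: action 0 yields an uninformative view, action 1 an informative one.
obs_model = np.array([
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.9, 0.1], [0.1, 0.9]],
])
prior = np.array([0.5, 0.5])
value, action = expand(prior, obs_model, depth=1)
```

In this toy model the search prefers the informative action, since it is the only one that moves the belief away from the uniform prior. The action chosen at the root of each such search is the kind of label a supervised policy, such as the LSTM described above, could be trained to predict.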


research
07/09/2022

Optimal policies for Bayesian olfactory search in turbulent flows

In many practical scenarios, a flying insect must search for the source ...
research
12/17/2015

Deep Active Object Recognition by Joint Label and Action Prediction

An active object recognition system has the advantage of being able to a...
research
02/24/2021

Memory-based Deep Reinforcement Learning for POMDP

A promising characteristic of Deep Reinforcement Learning (DRL) is its c...
research
05/29/2018

The Actor Search Tree Critic (ASTC) for Off-Policy POMDP Learning in Medical Decision Making

Off-policy reinforcement learning enables near-optimal policy from subop...
research
10/31/2020

Pseudo Random Number Generation through Reinforcement Learning and Recurrent Neural Networks

A Pseudo-Random Number Generator (PRNG) is any algorithm generating a se...
research
10/02/2020

Reinforcement Learning of Simple Indirect Mechanisms

We introduce the use of reinforcement learning for indirect mechanisms, ...
research
04/11/2020

Optimal Learning for Sequential Decisions in Laboratory Experimentation

The process of discovery in the physical, biological and medical science...
