Optimal policies for Bayesian olfactory search in turbulent flows

07/09/2022
by Robin A. Heinonen, et al.

In many practical scenarios, a flying insect must search for the source of an emitted cue that is advected by the atmospheric wind. On the macroscopic scales of interest, turbulence tends to mix the cue into patches of relatively high concentration over a background of very low concentration, so the insect detects the cue only intermittently and cannot rely on chemotactic strategies that simply climb the concentration gradient. In this work, we cast this search problem in the language of a partially observable Markov decision process (POMDP) and use the Perseus algorithm to compute strategies that are near-optimal with respect to the arrival time. We test the computed strategies on a large two-dimensional grid, present the resulting trajectories and arrival time statistics, and compare these to the corresponding results for several heuristic strategies, including (space-aware) infotaxis, Thompson sampling, and QMDP. We find that the near-optimal policy found by our implementation of Perseus outperforms all heuristics we test by several measures. We use the near-optimal policy to study how the search difficulty depends on the starting location. We additionally discuss the choice of initial belief and the robustness of the policies to changes in the environment. Finally, we present a detailed and pedagogical discussion of the implementation of the Perseus algorithm, including both the benefits and the pitfalls of employing a reward shaping function.
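To make the "Bayesian" part of the search concrete, the sketch below shows one belief update over candidate source locations on a small grid, given a binary detection or non-detection at the agent's position. It is illustrative only: the function names, the grid size, and above all the toy likelihood in `detection_prob` (wind along +x, exponential decay with distance) are assumptions made for this example and are not the observation model used in the paper.

```python
import math


def detection_prob(dx, dy, rate=0.5, length=3.0):
    """Toy detection likelihood (hypothetical, not the paper's model).

    (dx, dy) is the agent position minus the candidate source position.
    The wind is assumed to blow along +x, so a detection is only possible
    when the agent is downwind of the source (dx > 0).
    """
    if dx <= 0:
        return 0.0
    return rate * math.exp(-math.hypot(dx, dy) / length)


def update_belief(belief, agent, detected):
    """Return the posterior over source cells after one observation.

    belief   : list of rows, belief[y][x] = prior probability of the source
    agent    : (x, y) grid position of the searcher
    detected : True if the cue was detected at this step
    """
    ax, ay = agent
    posterior = []
    for y, row in enumerate(belief):
        new_row = []
        for x, prior in enumerate(row):
            p_hit = detection_prob(ax - x, ay - y)
            likelihood = p_hit if detected else 1.0 - p_hit
            # The search continues, so the source is not in the agent's cell.
            if (x, y) == (ax, ay):
                likelihood = 0.0
            new_row.append(prior * likelihood)
        posterior.append(new_row)

    total = sum(sum(row) for row in posterior)
    if total == 0.0:
        raise ValueError("observation is impossible under the current belief")
    return [[value / total for value in row] for row in posterior]


# Usage: start from a uniform prior on a 5x5 grid and observe one detection.
prior = [[1.0 / 25.0] * 5 for _ in range(5)]
posterior = update_belief(prior, agent=(4, 2), detected=True)
```

Under these assumptions, a detection removes all probability from cells that are not upwind of the agent, while a non-detection only mildly lowers the belief in nearby upwind cells. A policy such as infotaxis, Thompson sampling, or QMDP would then choose the next move as a function of this updated belief.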

