A study of first-passage time minimization via Q-learning in heated gridworlds

10/05/2021
by M. A. Larchenko, et al.

Optimization of first-passage times is required in applications ranging from nanobot navigation to market trading. In such settings, one often encounters noise levels that are unevenly distributed across the environment. We extensively study how a learning agent fares in one- and two-dimensional heated gridworlds with an uneven temperature distribution. The results reveal bias effects in agents trained via simple tabular Q-learning, SARSA, Expected SARSA, and Double Q-learning: a high learning rate prevents exploration of regions with higher temperature, while a sufficiently low rate increases the presence of agents in such regions. These peculiarities and biases of temporal-difference-based reinforcement learning methods should be taken into account in real-world physical applications and agent design.
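The setting described above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation: a one-dimensional gridworld where the local temperature is the probability of a random thermal kick overriding the chosen action, and a per-step reward of -1 so that maximizing return minimizes the first-passage time to the rightmost (absorbing) cell. The names `step`, `q_learning`, and the `temperature` array are assumptions made for this sketch.

```python
import random

def step(state, action, n_states, temperature):
    # With probability temperature[state], a thermal kick replaces the
    # intended move with a random one; otherwise the action is executed.
    move = action if random.random() > temperature[state] else random.choice((-1, 1))
    next_state = min(max(state + move, 0), n_states - 1)
    done = next_state == n_states - 1      # absorbing target on the right edge
    reward = 0.0 if done else -1.0         # -1 per step => minimize passage time
    return next_state, reward, done

def q_learning(n_states=10, temperature=None, alpha=0.1, gamma=1.0,
               epsilon=0.1, episodes=2000, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    random.seed(seed)
    if temperature is None:
        temperature = [0.0] * n_states     # noiseless world by default
    Q = [[0.0, 0.0] for _ in range(n_states)]   # actions: 0 -> left, 1 -> right
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if random.random() < epsilon:
                a = random.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2, r, done = step(s, (-1, 1)[a], n_states, temperature)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])   # TD update
            s = s2
    return Q
```

Passing a non-uniform `temperature` list (e.g. a hot patch in the middle of the grid) reproduces the heated-gridworld setup, and varying `alpha` lets one probe the learning-rate-dependent avoidance of hot regions that the abstract describes.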


Related research

09/01/2022  Intrinsic fluctuations of reinforcement learning promote cooperation
In this work, we ask for and answer what makes classical reinforcement l...

06/28/2022  Applications of Reinforcement Learning in Finance – Trading with a Double Deep Q-Network
This paper presents a Double Deep Q-Network algorithm for trading single...

06/29/2020  Using Reinforcement Learning to Herd a Robotic Swarm to a Target Distribution
In this paper, we present a reinforcement learning approach to designing...

11/22/2021  Optimistic Temporal Difference Learning for 2048
Temporal difference (TD) learning and its variants, such as multistage T...

09/03/2018  Flatland: a Lightweight First-Person 2-D Environment for Reinforcement Learning
We propose Flatland, a simple, lightweight environment for fast prototyp...

03/01/2023  A Deep Reinforcement Learning Trader without Offline Training
In this paper we pursue the question of a fully online trading algorithm...

02/25/2022  Learning to Liquidate Forex: Optimal Stopping via Adaptive Top-K Regression
We consider learning a trading agent acting on behalf of the treasury of...
