Predicting Periodicity with Temporal Difference Learning

09/20/2018
by Kristopher De Asis, et al.

Temporal difference (TD) learning is an important approach in reinforcement learning, as it combines ideas from dynamic programming and Monte Carlo methods in a way that allows for online and incremental model-free learning. A key idea of TD learning is that it is learning predictive knowledge about the environment in the form of value functions, from which it can derive its behavior to address long-term sequential decision making problems. The agent's horizon of interest, that is, how immediate or long-term a TD learning agent predicts into the future, is adjusted through a discount rate parameter. In this paper, we introduce an alternative view on the discount rate, with insight from digital signal processing, to include complex-valued discounting. Our results show that setting the discount rate to appropriately chosen complex numbers allows for online and incremental estimation of the Discrete Fourier Transform (DFT) of a signal of interest with TD learning. We thereby extend the types of knowledge representable by value functions, which we show are particularly useful for identifying periodic effects in the reward sequence.
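
The idea of a complex-valued discount rate can be illustrated with a small sketch. The code below is not the paper's implementation; it is a minimal tabular TD(0) example under assumptions chosen here for illustration: a deterministic ring of states, a reward emitted on leaving each state, and a discount of the form rho * exp(-i * omega) with rho < 1. The names `td0_complex`, `closed_form`, `rho`, and the reward signal are all hypothetical. With this discount, the value of a state is an exponentially weighted Fourier-like sum of future rewards, so its magnitude should be largest when omega matches the reward's period.

```python
import numpy as np


def td0_complex(rewards, gamma, alpha=0.5, sweeps=2000):
    """Tabular TD(0) on a deterministic ring with a complex discount.

    Assumption for this sketch: state s always moves to (s + 1) % n and
    emits rewards[s] on the way out.
    """
    n = len(rewards)
    v = np.zeros(n, dtype=complex)  # complex-valued value estimates
    s = 0
    for _ in range(sweeps * n):
        s_next = (s + 1) % n
        # Standard TD(0) error; only the discount is complex.
        td_error = rewards[s] + gamma * v[s_next] - v[s]
        v[s] += alpha * td_error
        s = s_next
    return v


def closed_form(rewards, gamma):
    """Exact discounted return on the ring, used only as a reference."""
    n = len(rewards)
    powers = gamma ** np.arange(n)
    one_lap = np.array([np.dot(powers, np.roll(rewards, -s)) for s in range(n)])
    # The reward sequence repeats every n steps, giving a geometric series.
    return one_lap / (1 - gamma ** n)


n_states = 8
rho = 0.9  # magnitude below 1 keeps the return finite
rewards = np.cos(2 * np.pi * np.arange(n_states) / 4)  # period-4 reward signal

magnitudes = []
for k in range(n_states // 2 + 1):
    gamma = rho * np.exp(-2j * np.pi * k / n_states)
    v = td0_complex(rewards, gamma)
    magnitudes.append(abs(v[0]))
    print(f"k={k}  |V(0)|={abs(v[0]):.3f}")
```

In this sketch, the frequency index whose value estimate has the largest magnitude corresponds to the period of the reward signal, which is the kind of periodic effect the abstract refers to. The exponential weighting from `rho` means this is a windowed, decayed analogue rather than an exact DFT of a finite signal.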

research
10/27/2020

γ-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction

We introduce the γ-model, a predictive model of environment dynamics wit...
research
04/05/2022

Learning to Bid Long-Term: Multi-Agent Reinforcement Learning with Long-Term and Sparse Reward in Repeated Auction Games

We propose a multi-agent distributed reinforcement learning algorithm th...
research
05/23/2023

Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning

Auction-based recommender systems are prevalent in online advertising pl...
research
01/29/2022

Bellman Meets Hawkes: Model-Based Reinforcement Learning via Temporal Point Processes

We consider a sequential decision making problem where the agent faces t...
research
06/10/2020

Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints

In the optimization of dynamical systems, the variables typically have c...
research
08/15/2019

Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures

Predictions and predictive knowledge have seen recent success in improvi...
research
06/14/2023

FTIO: Detecting I/O Periodicity Using Frequency Techniques

Characterizing the temporal I/O behavior of an HPC application is a chal...