GLDQN: Explicitly Parameterized Quantile Reinforcement Learning for Waste Reduction

05/30/2022
by Sami Jullien, et al.

We study the problem of restocking a grocery store's inventory with perishable items over time, from a distributional point of view. The objective is to maximize sales while minimizing waste, under uncertainty about actual consumption by customers. This problem is highly relevant today, given the growing demand for food and the impact of food waste on the environment, the economy, and purchasing power. We frame inventory restocking as a new reinforcement learning task that exhibits stochastic behavior conditioned on the agent's actions, making the environment partially observable. We introduce a new reinforcement learning environment based on real grocery store data and expert knowledge. This environment is highly stochastic and presents a unique challenge for reinforcement learning practitioners. We show that classical supply chain algorithms do not handle uncertainty about the future behavior of the environment well, and that distributional approaches are an effective way to account for this uncertainty. We also present GLDQN, a new distributional reinforcement learning algorithm that learns a generalized lambda distribution over the reward space. We show that GLDQN outperforms other distributional reinforcement learning approaches in our partially observable environments, in terms of both overall reward and generated waste.
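The abstract says that GLDQN learns a generalized lambda distribution (GLD) but does not state which GLD parameterization it uses or how the network outputs it. The sketch below is a hypothetical illustration and not the authors' code. It rests on these assumptions:

- The FKML form of the GLD quantile function is used: Q(u) = λ1 + [(u^λ3 - 1)/λ3 - ((1 - u)^λ4 - 1)/λ4] / λ2.
- A network head outputs one (λ1, λ2, λ3, λ4) tuple per action.
- The 32 quantile midpoints (borrowed from QR-DQN) and greedy selection on the mean are illustrative choices.
- All function names are invented for this example.

The code evaluates the explicit quantile function at fixed levels, averages those quantiles to estimate the expected return, and picks the greedy action. How the λ parameters are trained is not shown.

```python
import math
from typing import List, Sequence


def gld_quantile(u: float, lam1: float, lam2: float, lam3: float, lam4: float) -> float:
    """Assumed FKML generalized lambda quantile function Q(u), u in (0, 1).

    lam1: location, lam2 > 0: scale, lam3 / lam4: left / right tail shape.
    """
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    if lam2 <= 0.0:
        raise ValueError("lam2 (scale) must be positive")

    def box_cox(x: float, lam: float) -> float:
        # (x**lam - 1) / lam, using its limit log(x) when lam == 0
        return math.log(x) if lam == 0.0 else (x ** lam - 1.0) / lam

    return lam1 + (box_cox(u, lam3) - box_cox(1.0 - u, lam4)) / lam2


def quantile_midpoints(n: int) -> List[float]:
    """Fixed quantile levels tau_i = (2i + 1) / (2n), as in QR-DQN (illustrative choice)."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def expected_return(params: Sequence[float], n_quantiles: int = 32) -> float:
    """Estimate the mean of the distribution by averaging its quantiles."""
    qs = [gld_quantile(t, *params) for t in quantile_midpoints(n_quantiles)]
    return sum(qs) / len(qs)


def greedy_action(params_per_action: Sequence[Sequence[float]]) -> int:
    """Hypothetical policy: pick the action whose GLD has the highest estimated mean."""
    means = [expected_return(p) for p in params_per_action]
    return max(range(len(means)), key=means.__getitem__)
```

For example, `greedy_action([(0.0, 1.0, 0.1, 0.1), (2.0, 1.0, 0.1, 0.1)])` returns 1. The two distributions have the same symmetric shape and differ only in location.

One appeal of an explicit parameterization is that, with λ2 > 0, the FKML quantile function is increasing in u for any λ3 and λ4. The estimated quantiles therefore cannot cross, a problem that independently regressed quantile outputs can exhibit.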

Related research
10/27/2017

Distributional Reinforcement Learning with Quantile Regression

In reinforcement learning an agent interacts with the environment by tak...
06/22/2019

A neurally plausible model learns successor representations in partially observable environments

Animals need to devise strategies to maximize returns while interacting ...
03/26/2023

Robotic Packaging Optimization with Reinforcement Learning

Intelligent manufacturing is becoming increasingly important due to the ...
02/11/2023

Distributional GFlowNets with Quantile Flows

Generative Flow Networks (GFlowNets) are a new family of probabilistic s...
03/16/2020

Value Variance Minimization for Learning Approximate Equilibrium in Aggregation Systems

For effective matching of resources (e.g., taxis, food, bikes, shopping ...
06/27/2012

Apprenticeship Learning for Model Parameters of Partially Observable Environments

We consider apprenticeship learning, i.e., having an agent learn a task ...
04/20/2022

Joint Learning of Reward Machines and Policies in Environments with Partially Known Semantics

We study the problem of reinforcement learning for a task encoded by a r...