Recurrent Sum-Product-Max Networks for Decision Making in Perfectly-Observed Environments

06/12/2020
by   Hari Teja Tatavarti, et al.
0

Recent investigations into sum-product-max networks (SPMN) that generalize sum-product networks (SPN) offer a data-driven alternative for decision making, which has predominantly relied on handcrafted models. SPMNs computationally represent a probabilistic decision-making problem whose solution scales linearly in the size of the network. However, SPMNs are not well suited for sequential decision making over multiple time steps. In this paper, we present recurrent SPMNs (RSPMN) that learn from and model decision-making data over time. RSPMNs utilize a template network that is unfolded as needed depending on the length of the data sequence. This is significant as RSPMNs not only inherit the benefits of SPMNs in being data driven and mostly tractable, they are also well suited for sequential problems. We establish conditions on the template network, which guarantee that the resulting SPMN is valid, and present a structure learning algorithm to learn a sound template network. We demonstrate that the RSPMNs learned on a testbed of sequential decision-making data sets generate MEUs and policies that are close to the optimal on perfectly-observed domains. They easily improve on a recent batch-constrained reinforcement learning method, which is important because RSPMNs offer a new model-based approach to offline reinforcement learning.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/28/2019

Meta-Learning surrogate models for sequential decision making

Meta-learning methods leverage past experience to learn data-driven indu...
research
11/13/2015

Dynamic Sum Product Networks for Tractable Inference on Sequence Data (Extended Version)

Sum-Product Networks (SPN) have recently emerged as a new class of tract...
research
10/05/2020

Learning to Generalize for Sequential Decision Making

We consider problems of making sequences of decisions to accomplish task...
research
10/01/2019

Reinforcement Learning for Multi-Objective Optimization of Online Decisions in High-Dimensional Systems

This paper describes a purely data-driven solution to a class of sequent...
research
11/14/2022

Treatment-RSPN: Recurrent Sum-Product Networks for Sequential Treatment Regimes

Sum-product networks (SPNs) have recently emerged as a novel deep learni...
research
06/03/2022

Robotic Planning under Uncertainty in Spatiotemporal Environments in Expeditionary Science

In the expeditionary sciences, spatiotemporally varying environments – h...
research
01/17/2022

Parametrized Convex Universal Approximators for Decision-Making Problems

Parametrized max-affine (PMA) and parametrized log-sum-exp (PLSE) networ...

Please sign up or login with your details

Forgot password? Click here to reset