Learning GFlowNets from partial episodes for improved convergence and stability

09/26/2022
by Kanika Madan, et al.

Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks. Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory. We argue that these alternatives represent opposite ends of a gradient bias-variance tradeoff and propose a way to exploit this tradeoff to mitigate its harmful effects. Inspired by the TD(λ) algorithm in reinforcement learning, we introduce subtrajectory balance or SubTB(λ), a GFlowNet training objective that can learn from partial action subsequences of varying lengths. We show that SubTB(λ) accelerates sampler convergence in previously studied and new environments and enables training GFlowNets in environments with longer action sequences and sparser reward landscapes than what was possible before. We also perform a comparative analysis of stochastic gradient dynamics, shedding light on the bias-variance tradeoff in GFlowNet training and the advantages of subtrajectory balance.
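As a rough illustration of learning from partial action subsequences, the sketch below computes a λ-weighted average of squared balance residuals over every subtrajectory of one sampled trajectory. The abstract does not state the objective's formula, so the exact form, the function name `subtb_lambda_loss`, and its arguments are assumptions made for illustration: per-state log flows (with the terminal entry standing in for the log reward), and per-step forward and backward log probabilities.

```python
from itertools import accumulate


def subtb_lambda_loss(log_flow, log_pf, log_pb, lam):
    """Hypothetical sketch of a lambda-weighted subtrajectory balance loss.

    log_flow: n+1 log state flows for s_0..s_n (assumed: last entry = log reward).
    log_pf:   n forward log-probs,  log P_F(s_{k+1} | s_k).
    log_pb:   n backward log-probs, log P_B(s_k | s_{k+1}).
    lam:      positive weight base; subtrajectory of length L gets weight lam**L.
    """
    n = len(log_pf)
    if len(log_pb) != n or len(log_flow) != n + 1:
        raise ValueError("expected n steps and n+1 states")

    # Prefix sums so that sum(x[i:j]) == cum[j] - cum[i].
    cum_pf = [0.0] + list(accumulate(log_pf))
    cum_pb = [0.0] + list(accumulate(log_pb))

    weighted_sq_sum = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            # Log-ratio of forward flow through s_i..s_j to backward flow.
            residual = (log_flow[i] + (cum_pf[j] - cum_pf[i])
                        - log_flow[j] - (cum_pb[j] - cum_pb[i]))
            weight = lam ** (j - i)
            weighted_sq_sum += weight * residual ** 2
            weight_sum += weight
    return weighted_sq_sum / weight_sum


if __name__ == "__main__":
    # Two-step toy trajectory whose flows satisfy every balance condition.
    print(subtb_lambda_loss([0.0, -1.0, -2.0], [-1.0, -1.0], [0.0, 0.0], lam=0.9))
```

In this sketch, a small `lam` concentrates the weight on single transitions (a local objective), while a large `lam` shifts weight toward the full trajectory, which is one way to read the bias-variance tradeoff described in the abstract.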


research
01/31/2022

Trajectory Balance: Improved Credit Assignment in GFlowNets

Generative Flow Networks (GFlowNets) are a method for learning a stochas...
research
05/11/2023

Towards Understanding and Improving GFlowNet Training

Generative flow networks (GFlowNets) are a family of algorithms that lea...
research
12/17/2019

On the Bias-Variance Tradeoff: Textbooks Need an Update

The main goal of this thesis is to point out that the bias-variance trad...
research
01/10/2013

The Optimal Reward Baseline for Gradient-Based Reinforcement Learning

There exist a number of reinforcement learning algorithms which learn by ...
research
11/24/2019

Merging Deterministic Policy Gradient Estimations with Varied Bias-Variance Tradeoff for Effective Deep Reinforcement Learning

Deep reinforcement learning (DRL) on Markov decision processes (MDPs) wi...
research
01/26/2023

Partial advantage estimator for proximal policy optimization

Estimation of value in policy gradient methods is a fundamental problem....
research
09/22/2017

On overfitting and asymptotic bias in batch reinforcement learning with partial observability

This paper stands in the context of reinforcement learning with partial ...
