Trajectory Balance: Improved Credit Assignment in GFlowNets

01/31/2022
by Nikolay Malkin, et al.

Generative Flow Networks (GFlowNets) are a method for learning a stochastic policy for generating compositional objects, such as graphs or strings, from a given unnormalized density by sequences of actions, where many possible action sequences may lead to the same object. Prior temporal difference-like learning objectives for training GFlowNets, such as flow matching and detailed balance, are prone to inefficient credit propagation across action sequences, particularly in the case of long sequences. We propose a new learning objective for GFlowNets, trajectory balance, as a more efficient alternative to previously used objectives. We prove that any global minimizer of the trajectory balance objective can define a policy that samples exactly from the target distribution. In experiments on four distinct domains, we empirically demonstrate the benefits of the trajectory balance objective for GFlowNet convergence, diversity of generated samples, and robustness to long action sequences and large action spaces.
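
As a rough illustration of the objective the abstract names, the sketch below computes a trajectory balance loss for one complete trajectory. It assumes the squared log-ratio form, in which a learned log-partition estimate plus the summed forward-policy log-probabilities is matched against the log-reward of the final object plus the summed backward-policy log-probabilities. The function name, argument names, and toy numbers are hypothetical and chosen only for this example; they are not taken from the authors' code.

```python
import math

def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, log_reward):
    """Squared log-ratio for a single complete trajectory (illustrative sketch).

    log_z         -- learned estimate of the log partition function (assumed scalar)
    log_pf_steps  -- forward-policy log-probabilities, one per action taken
    log_pb_steps  -- backward-policy log-probabilities, one per action taken
    log_reward    -- log of the unnormalized density at the final object
    """
    forward = log_z + sum(log_pf_steps)
    backward = log_reward + sum(log_pb_steps)
    return (forward - backward) ** 2

# Hypothetical toy case: two binary choices, four equally rewarded objects,
# each state has a single parent so the backward log-probabilities are 0.
log_pf = [math.log(0.5), math.log(0.5)]
log_pb = [0.0, 0.0]

# With Z = 4 and reward 1 per object the two sides agree and the loss vanishes.
loss_balanced = trajectory_balance_loss(math.log(4.0), log_pf, log_pb, math.log(1.0))

# With a wrong partition estimate (Z = 1) the loss is strictly positive.
loss_unbalanced = trajectory_balance_loss(0.0, log_pf, log_pb, math.log(1.0))
```

Because every term of the trajectory enters a single squared residual, the error signal from the reward reaches all actions in the sequence at once, which is the credit-assignment property the abstract contrasts with the step-local flow matching and detailed balance objectives.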

research
05/11/2023

Towards Understanding and Improving GFlowNet Training

Generative flow networks (GFlowNets) are a family of algorithms that lea...
research
09/26/2022

Learning GFlowNets from partial episodes for improved convergence and stability

Generative flow networks (GFlowNets) are a family of algorithms for trai...
research
10/14/2022

A Variational Perspective on Generative Flow Networks

Generative flow networks (GFNs) are a class of models for sequential sam...
research
06/08/2021

Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation

This paper is about the problem of learning a stochastic policy for gene...
research
02/03/2023

Better Training of GFlowNets with Local Credit and Incomplete Trajectories

Generative Flow Networks or GFlowNets are related to Monte-Carlo Markov ...
research
06/28/2021

Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment

Many transfer problems require re-using previously optimal decisions for...
research
09/18/2023

Generalizing Trajectory Retiming to Quadratic Objective Functions

Trajectory retiming is the task of computing a feasible time parameteriz...