Curiosity-driven Reinforcement Learning for Diverse Visual Paragraph Generation

08/01/2019
by Yadan Luo, et al.

Visual paragraph generation aims to automatically describe a given image from different perspectives and to organize the sentences coherently. In this paper, we address three critical challenges for this task in a reinforcement learning setting: mode collapse, delayed feedback, and the time-consuming warm-up of policy networks. We propose a novel Curiosity-driven Reinforcement Learning (CRL) framework that jointly enhances the diversity and accuracy of the generated paragraphs. First, by modeling paragraph captioning as a long-term decision-making process and measuring the prediction uncertainty of state transitions as intrinsic rewards, the model is incentivized to memorize precise but rarely spotted descriptions to context, rather than being biased towards frequent fragments and generic patterns. Second, since the extrinsic reward from evaluation is not available until the complete paragraph has been generated, we estimate its expected value at each time step with temporal-difference learning, by considering the correlations between successive actions. The estimated extrinsic rewards are then complemented by dense intrinsic rewards produced by the curiosity module, in order to encourage the policy to fully explore the action space and find a global optimum. Third, discounted imitation learning is integrated for learning from human demonstrations, without separately performing the time-consuming warm-up in advance. Extensive experiments conducted on the Stanford image-paragraph dataset demonstrate the effectiveness and efficiency of the proposed method, improving the performance by 38.4
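
The reward scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the half-squared-error form of the curiosity signal, the tabular TD(0) update, and the weights `eta`, `gamma`, and `lr` are all assumptions made for illustration. The sketch treats the forward-model prediction error as the intrinsic reward, propagates a terminal-only extrinsic reward back to earlier time steps with TD(0), and adds the two signals.

```python
def intrinsic_reward(predicted_next, actual_next):
    """Curiosity signal (assumed form): half the squared error between the
    predicted and the observed next-state features."""
    return 0.5 * sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))


def td0_update(values, t, reward, gamma=0.9, lr=0.5):
    """One TD(0) step: move values[t] toward reward + gamma * values[t + 1].
    The value after the final step is taken to be 0."""
    next_value = values[t + 1] if t + 1 < len(values) else 0.0
    td_error = reward + gamma * next_value - values[t]
    values[t] += lr * td_error
    return td_error


def shaped_rewards(values, intrinsic, eta=0.1):
    """Per-step reward: estimated extrinsic value plus weighted curiosity."""
    return [v + eta * r for v, r in zip(values, intrinsic)]


if __name__ == "__main__":
    # Toy 3-step episode: the extrinsic reward (e.g. a paragraph-level
    # evaluation score) arrives only at the last step.
    extrinsic = [0.0, 0.0, 1.0]
    values = [0.0, 0.0, 0.0]
    for _ in range(50):                      # repeated backward sweeps
        for t in reversed(range(len(values))):
            td0_update(values, t, extrinsic[t])
    curiosity = [intrinsic_reward([0.2, 0.1], [0.0, 0.3]) for _ in values]
    print(shaped_rewards(values, curiosity))
```

In this toy setting the value estimates for the earlier steps become non-zero after a few sweeps, which is the sense in which a delayed reward is turned into per-step feedback; the curiosity term then adds a dense bonus wherever the forward model predicts poorly.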

