Action-Constrained Reinforcement Learning for Frame-Level Bit Allocation in HEVC/H.265 through Frank-Wolfe Policy Optimization

03/10/2022
by Yung-Han Ho, et al.

This paper presents a reinforcement learning (RL) framework that leverages Frank-Wolfe policy optimization to address frame-level bit allocation for HEVC/H.265. Most previous RL-based approaches adopt a single-critic design, which weights the rewards for distortion minimization and rate regularization by an empirically chosen hyper-parameter. More recently, a dual-critic design has been proposed that updates the actor network by alternating between the rate and distortion critics. However, its training is not guaranteed to converge. To address this issue, we introduce Neural Frank-Wolfe Policy Optimization (NFWPO), which formulates frame-level bit allocation as an action-constrained RL problem. In this framework, the rate critic specifies a feasible action set, and the distortion critic updates the actor network towards maximizing reconstruction quality while conforming to the action constraint. Experimental results show that, when trained to optimize the video multi-method assessment fusion (VMAF) metric, our NFWPO-based model outperforms both the single-critic and the dual-critic methods. It also demonstrates rate-distortion performance comparable to the 2-pass average bit rate control of x265.
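To make the action-constrained update more concrete, here is a minimal, single-state sketch of one NFWPO-style actor step. The assumptions are ours, not the paper's. The action is taken to be a scalar normalized QP in [0, 1]. The critics are toy closed-form functions (`distortion_critic` and `rate_critic`, both hypothetical) standing in for neural networks. The rate critic is assumed to decrease monotonically with the QP, so the feasible set is an interval. In the sketch, the rate critic defines the feasible set. A Frank-Wolfe step then solves a linear program over that set using the gradient of the distortion critic and moves part of the way toward the maximizer. Finally, the actor is regressed toward the resulting target action.

```python
"""Toy, single-state sketch of an NFWPO-style actor update.

Assumptions (illustrative only, not taken from the paper): the action is a
scalar normalized QP in [0, 1], the critics are closed-form toy functions
instead of neural networks, and the rate critic decreases with the action.
"""
from typing import Callable, Tuple

Critic = Callable[[float], float]


def distortion_critic(a: float) -> float:
    """Hypothetical quality critic Q_D(s, a): predicted quality drops as QP rises."""
    return 1.0 - a


def rate_critic(a: float) -> float:
    """Hypothetical rate critic Q_R(s, a): predicted bits drop as QP rises."""
    return 2.0 * (1.0 - a) + 0.2


def feasible_interval(q_rate: Critic, budget: float, iters: int = 60) -> Tuple[float, float]:
    """Feasible set C(s) = {a in [0, 1] : q_rate(a) <= budget}, found by bisection.

    Because q_rate is assumed decreasing, C(s) is an interval [lo, 1].
    """
    if q_rate(1.0) > budget:
        raise ValueError("no action satisfies the rate budget")
    if q_rate(0.0) <= budget:
        return 0.0, 1.0
    infeasible, feasible = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (infeasible + feasible)
        if q_rate(mid) <= budget:
            feasible = mid
        else:
            infeasible = mid
    return feasible, 1.0


def grad(f: Critic, a: float, h: float = 1e-5) -> float:
    """Central finite difference; a neural critic would use autograd instead."""
    return (f(a + h) - f(a - h)) / (2.0 * h)


def frank_wolfe_target(a: float, q_dist: Critic,
                       interval: Tuple[float, float], alpha: float) -> float:
    """One Frank-Wolfe step on the distortion critic.

    Solves the linear program max_{c in C(s)} c * dQ_D/da (over an interval,
    the maximizer is an endpoint), then moves a fraction alpha from a toward c.
    """
    lo, hi = interval
    c = hi if grad(q_dist, a) >= 0.0 else lo
    return a + alpha * (c - a)


def actor_step(theta: float, target: float, lr: float) -> float:
    """Gradient step on 0.5 * (pi(s) - target)^2 for a one-parameter actor pi(s) = theta."""
    return theta - lr * (theta - target)


if __name__ == "__main__":
    interval = feasible_interval(rate_critic, budget=1.0)
    theta = 0.9  # current actor output (normalized QP)
    for step in range(5):
        target = frank_wolfe_target(theta, distortion_critic, interval, alpha=0.5)
        theta = actor_step(theta, target, lr=0.5)
        print(f"step {step}: action={theta:.4f}")
```

Here the distortion critic alone would push the QP as low as possible, but the Frank-Wolfe target never leaves the interval carved out by the rate critic. The actor therefore settles at the lowest QP that still meets the (toy) bit budget, rather than trading rate against distortion through a fixed weighting hyper-parameter.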

09/27/2022

Neural Frank-Wolfe Policy Optimization for Region-of-Interest Intra-Frame Coding with HEVC/H.265

This paper presents a reinforcement learning (RL) framework that utilize...
04/05/2021

A Dual-Critic Reinforcement Learning Framework for Frame-level Bit Allocation in HEVC/H.265

This paper introduces a dual-critic reinforcement learning (RL) framewor...
02/28/2017

Bridging the Gap Between Value and Policy Based Reinforcement Learning

We establish a new connection between value and policy based reinforceme...
10/31/2021

An Actor-Critic Method for Simulation-Based Optimization

We focus on a simulation-based optimization problem of choosing the best...
12/03/1998

Training Reinforcement Neurocontrollers Using the Polytope Algorithm

A new training algorithm is presented for delayed reinforcement learning...
07/14/2018

Convex Optimization Based Bit Allocation for Light Field Compression under Weighting and Consistency Constraints

Compared with conventional image and video, light field images introduce...
05/30/2022

Stock Trading Optimization through Model-based Reinforcement Learning with Resistance Support Relative Strength

Reinforcement learning (RL) is gaining attention by more and more resear...
