PPO-ABR: Proximal Policy Optimization based Deep Reinforcement Learning for Adaptive BitRate streaming

by Mandan Naresh, et al.

Providing a high Quality of Experience (QoE) for video streaming in 5G and beyond 5G (B5G) networks is challenging due to the dynamic nature of the underlying network conditions. Several Adaptive Bit Rate (ABR) algorithms have been developed to improve QoE, but most of them are designed around fixed rules and are unsuitable for a wide range of network conditions. Recently, Deep Reinforcement Learning (DRL) based Asynchronous Advantage Actor-Critic (A3C) methods have demonstrated promise in their ability to generalise to diverse network conditions, but they still have limitations. One specific issue with A3C methods is the lag between each actor's behavior policy and the central learner's target policy. Consequently, suboptimal updates emerge when the behavior and target policies fall out of synchronization. In this paper, we address the problems faced by vanilla A3C by integrating an on-policy-based multi-agent DRL method into the existing video streaming framework. Specifically, we propose a novel system for ABR generation - Proximal Policy Optimization-based DRL for Adaptive Bit Rate streaming (PPO-ABR). Our proposed method improves the overall video QoE by maximizing sample efficiency using a clipped probability ratio between the new and the old policies over multiple epochs of minibatch updates. Experiments on real network traces demonstrate that PPO-ABR outperforms state-of-the-art methods for different QoE variants.
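To make the "clipped probability ratio between the new and the old policies" concrete, the following is a minimal sketch of the standard PPO clipped surrogate objective, evaluated on one minibatch. It is not the authors' implementation: the function name `clipped_surrogate`, its argument names, and the clip range `clip_eps=0.2` are assumptions chosen for illustration and do not come from this abstract.

```python
import math

def clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over one minibatch.

    Hypothetical helper: the name, arguments, and clip_eps default are
    illustrative assumptions, not values taken from the PPO-ABR paper.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the new and the old policy for this action.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)

# Example minibatch: the second action became twice as likely under the
# new policy, so its contribution is capped at (1 + clip_eps) * advantage.
objective = clipped_surrogate(
    new_log_probs=[0.0, math.log(2.0)],
    old_log_probs=[0.0, 0.0],
    advantages=[1.0, 1.0],
)
```

Because the clip removes the incentive to move the new policy far from the old one, the same batch of collected samples can be reused for several epochs of minibatch updates, which is the sample-efficiency argument the abstract makes. In training, this quantity would be maximized with respect to the new policy's parameters.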


Deep Reinforcement Learning with Importance Weighted A3C for QoE enhancement in Video Delivery Services

Adaptive bitrate (ABR) algorithms are used to adapt the video bitrate ba...

Federated Deep Reinforcement Learning-based Bitrate Adaptation for Dynamic Adaptive Streaming over HTTP

In video streaming over HTTP, the bitrate adaptation selects the quality...

Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning

Deep reinforcement learning (DRL) has successfully solved various proble...

Prioritized Trace Selection: Towards High-Performance DRL-based Network Controllers

Deep Reinforcement Learning (DRL) based controllers offer high performan...

Cross Layer Optimization and Distributed Reinforcement Learning Approach for Tile-Based 360 Degree Wireless Video Streaming

Wirelessly streaming high quality 360 degree videos is still a challengi...

Proximal Policy Optimization with Adaptive Threshold for Symmetric Relative Density Ratio

Deep reinforcement learning (DRL) is one of the promising approaches for...

ANT: Learning Accurate Network Throughput for Better Adaptive Video Streaming

Adaptive Bit Rate (ABR) decision plays a crucial role for ensuring satis...
