Automatic Deduction Path Learning via Reinforcement Learning with Environmental Correction

06/16/2023
by Shuai Xiao, et al.

Automatic bill payment is an important part of business operations at fintech companies. Deduction has mainly been performed either on the total amount or through heuristic search that splits the bill into smaller parts in order to deduct as much as possible. This article proposes an end-to-end approach that automatically learns optimal deduction paths (ordered sequences of deduction amounts), which reduces the cost of designing paths by hand and maximizes the amount successfully deducted. Specifically, given the large search space of paths and the extreme sparsity of historical successful deduction records, we propose a deep hierarchical reinforcement learning approach that abstracts the action into a two-level hierarchical space: an upper agent that determines the number of deduction steps each day and a lower agent that decides the amount to deduct at each step. In this way, the action space is structured by prior knowledge and the exploration space is reduced. Moreover, the inherent incompleteness of the information available in this business makes the environment only partially observable: the deducted amounts indicate merely lower bounds of the available account balance. To address this, we formulate the problem as a partially observable Markov decision process (POMDP) and employ an environment correction algorithm based on the characteristics of the business. Within the world's largest electronic payment business, we have verified the effectiveness of this scheme offline and deployed it online, where it serves millions of users.
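The two-level action structure and the partial observability described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the authors' method: the `HiddenAccount` simulator, the success rule (a deduction goes through only if the full amount is available), and the placeholder policies `fixed_steps` and `halve_on_failure`, which stand in for the learned upper and lower agents. The upper level fixes how many attempts are made in a day, the lower level picks the amount of each attempt, and the agents only ever see the sum of successful deductions, which is a lower bound on the true balance.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class HiddenAccount:
    """Hypothetical account simulator: the true balance is NOT visible to the agents."""
    balance: float

    def try_deduct(self, amount: float) -> bool:
        # Assumption: a deduction succeeds only if the whole amount is available.
        if amount <= self.balance:
            self.balance -= amount
            return True
        return False


@dataclass
class Observation:
    """The partial view the agents actually receive within one day."""
    balance_lower_bound: float = 0.0  # sum of today's successful deductions
    history: List[Tuple[float, bool]] = field(default_factory=list)

    def record(self, amount: float, success: bool) -> None:
        self.history.append((amount, success))
        if success:
            # A success only proves the opening balance was at least this much more.
            self.balance_lower_bound += amount


# Hypothetical policy signatures (stand-ins for the learned upper/lower agents).
UpperPolicy = Callable[[float], int]                 # remaining bill -> steps today
LowerPolicy = Callable[[float, Observation], float]  # remaining bill, obs -> amount


def fixed_steps(remaining: float) -> int:
    """Placeholder upper agent: always plan three attempts per day."""
    return 3


def halve_on_failure(remaining: float, obs: Observation) -> float:
    """Placeholder lower agent: try the full remainder, halve the last amount after a failure."""
    if not obs.history or obs.history[-1][1]:
        return remaining
    return min(obs.history[-1][0] / 2, remaining)


def run_day(account: HiddenAccount, bill: float,
            upper: UpperPolicy, lower: LowerPolicy) -> Tuple[float, Observation]:
    """Execute one day's deduction path chosen by the two-level policy."""
    obs = Observation()
    remaining = bill
    for _ in range(upper(remaining)):   # upper level: number of steps today
        if remaining <= 0:
            break
        amount = lower(remaining, obs)  # lower level: amount at this step
        success = account.try_deduct(amount)
        obs.record(amount, success)
        if success:
            remaining -= amount
    return bill - remaining, obs


if __name__ == "__main__":
    account = HiddenAccount(balance=70.0)
    deducted, obs = run_day(account, bill=100.0, upper=fixed_steps, lower=halve_on_failure)
    print(deducted, obs.balance_lower_bound)  # prints: 50.0 50.0
```

In this run the agents observe a lower bound of 50 while the hidden balance was actually 70, which is the kind of gap an environment correction step would need to account for. Splitting the decision into "how many steps" and "how much per step" keeps each level's action space small, which is the motivation the abstract gives for the hierarchy.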

research
05/11/2018

Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes

In recent years, reinforcement learning has achieved many remarkable suc...
research
03/14/2023

Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring

We study Markov decision processes (MDPs), where agents have direct cont...
research
06/29/2023

End-to-end Reinforcement Learning for Online Coverage Path Planning in Unknown Environments

Coverage path planning is the problem of finding the shortest path that ...
research
12/09/2020

Interactive Search Based on Deep Reinforcement Learning

With the continuous development of machine learning technology, major e-...
research
08/27/2017

Novel Sensor Scheduling Scheme for Intruder Tracking in Energy Efficient Sensor Networks

We consider the problem of tracking an intruder using a network of wirel...
research
05/25/2023

Bayesian Reinforcement Learning for Automatic Voltage Control under Cyber-Induced Uncertainty

Voltage control is crucial to large-scale power system reliable operatio...
research
03/10/2017

Towards Wi-Fi AP-Assisted Content Prefetching for On-Demand TV Series: A Reinforcement Learning Approach

The emergence of smart Wi-Fi APs (Access Point), which are equipped with...