Ungeneralizable Contextual Logistic Bandit in Credit Scoring

12/15/2022
by Pojtanut Manopanjasiri, et al.

The application of reinforcement learning to credit scoring creates a distinctive setting for the contextual logistic bandit, one that does not conform to the usual exploration-exploitation tradeoff but instead favors exploration-free algorithms. Given sufficient randomness in the pool of observable contexts, the agent can exploit the action with the highest estimated reward while still learning about the structure governing the environment; as a result, greedy algorithms consistently outperform algorithms with efficient exploration, such as Thompson sampling. In a more pragmatic credit-scoring scenario, however, lenders can to some degree classify each borrower into a separate group, and learning about the characteristics of one group conveys no information about another. Through extensive simulations, we show that in this setting Thompson sampling dominates greedy algorithms given enough timesteps, and that the number of timesteps required grows with the complexity of the underlying features.
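The comparison described in the abstract can be sketched in a small simulation. This is a minimal illustration under our own assumptions, not the paper's code: the class and function names (`OnlineBayesLogit`, `run`) are hypothetical, and the posterior is kept tractable with a diagonal Gaussian Laplace-style approximation (in the spirit of Chapelle & Li's online Bayesian logistic regression). The greedy policy acts on the posterior mean, while Thompson sampling draws weights from the approximate posterior.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineBayesLogit:
    """Per-arm online Bayesian logistic regression with a diagonal
    Gaussian (Laplace-style) posterior approximation."""
    def __init__(self, d, lam=1.0):
        self.m = np.zeros(d)        # posterior mean of the weights
        self.q = lam * np.ones(d)   # diagonal posterior precision

    def sample(self, rng):
        # Thompson sampling: draw weights from the Gaussian approximation.
        return rng.normal(self.m, 1.0 / np.sqrt(self.q))

    def update(self, x, y, steps=5):
        # Diagonal-Newton MAP step on the new observation (x, y),
        # with the current posterior acting as a Gaussian prior.
        w = self.m.copy()
        for _ in range(steps):
            p = sigmoid(x @ w)
            grad = self.q * (w - self.m) + (p - y) * x
            hess = self.q + p * (1.0 - p) * x * x
            w -= grad / hess
        p = sigmoid(x @ w)
        self.q += p * (1.0 - p) * x * x  # sharpen the posterior
        self.m = w

def run(policy, T=3000, d=5, K=2, seed=0):
    """Cumulative pseudo-regret of a policy on a synthetic
    K-armed contextual logistic bandit with random contexts."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(K, d))           # true per-arm weights
    models = [OnlineBayesLogit(d) for _ in range(K)]
    regret = 0.0
    for _ in range(T):
        x = rng.normal(size=d) / np.sqrt(d)   # observable context
        if policy == "greedy":
            a = int(np.argmax([m.m @ x for m in models]))
        elif policy == "thompson":
            a = int(np.argmax([m.sample(rng) @ x for m in models]))
        else:                                 # uniform-random baseline
            a = int(rng.integers(K))
        p_true = sigmoid(theta @ x)           # success prob. of each arm
        y = float(rng.random() < p_true[a])   # Bernoulli reward
        models[a].update(x, y)
        regret += p_true.max() - p_true[a]
    return regret
```

Comparing `run("greedy")` and `run("thompson")` against `run("random")` reproduces the qualitative point: with i.i.d. random contexts both learning policies accumulate far less regret than the random baseline, and the greedy policy remains competitive without explicit exploration; the paper's grouped-borrower setting, where contexts do not transfer across groups, is what shifts the advantage to Thompson sampling.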


