Risk Aversion In Learning Algorithms and an Application To Recommendation Systems

05/10/2022
by   Andreas Haupt, et al.

Consider a bandit learning environment. We demonstrate that popular learning algorithms such as Upper Confidence Bound (UCB) and ε-Greedy exhibit risk aversion: when presented with two arms of the same expectation but different variance, the algorithms tend not to choose the riskier, i.e. higher-variance, arm. We prove that ε-Greedy chooses the risky arm with probability tending to 0 when faced with a deterministic and a Rademacher-distributed arm. We show experimentally that UCB also exhibits risk-averse behavior, and that risk aversion persists in early rounds of learning even if the riskier arm has a slightly higher expectation. We calibrate our model to a recommendation system and show that algorithmic risk aversion can decrease consumer surplus and increase homogeneity. We discuss several extensions to other bandit algorithms and to reinforcement learning, and investigate the implications of algorithmic risk aversion for decision theory.
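The ε-Greedy phenomenon described above can be illustrated with a small simulation. The sketch below is not the paper's implementation; it is a minimal, self-contained toy with assumed parameters (horizon, ε, tie-breaking toward the safe arm). It shows the intuition: once the Rademacher arm's empirical mean dips below zero, the greedy step avoids it, and only occasional exploration revisits it, so the risky arm receives well under half of the pulls despite having the same expectation.

```python
import random


def eps_greedy_risk_demo(horizon=10_000, eps=0.1, seed=0):
    """Run epsilon-greedy on two mean-zero arms and return the fraction
    of pulls given to the risky arm.

    Arm 0 is deterministic (always pays 0); arm 1 is Rademacher
    (+1 or -1 with equal probability). Ties in empirical means are
    broken toward the safe arm, a common convention and one of the
    assumptions of this sketch.
    """
    rng = random.Random(seed)
    sums = [0.0, 0.0]   # cumulative reward per arm
    pulls = [0, 0]      # pull count per arm

    for _ in range(horizon):
        if rng.random() < eps or 0 in pulls:
            # Explore uniformly (also covers initialization).
            arm = rng.randrange(2)
        else:
            # Exploit: pick the arm with the higher empirical mean.
            means = [sums[i] / pulls[i] for i in range(2)]
            arm = 0 if means[0] >= means[1] else 1
        reward = 0.0 if arm == 0 else rng.choice([-1.0, 1.0])
        sums[arm] += reward
        pulls[arm] += 1

    return pulls[1] / horizon


if __name__ == "__main__":
    # Average over several seeds; the risky arm's share stays far below 1/2.
    fracs = [eps_greedy_risk_demo(seed=s) for s in range(20)]
    print(sum(fracs) / len(fracs))
```

The asymmetry arises because a negative empirical mean on the risky arm is self-reinforcing: the greedy step stops sampling it, so the pessimistic estimate is corrected only at the slow exploration rate ε/2.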
