Learning Risk Preferences in Markov Decision Processes: an Application to the Fourth Down Decision in Football

09/01/2023
by Nathan Sandholtz, et al.

For decades, National Football League (NFL) coaches' observed fourth down decisions have been largely inconsistent with prescriptions based on statistical models. In this paper, we develop a framework to explain this discrepancy using a novel inverse optimization approach. We model the fourth down decision and the subsequent sequence of plays in a game as a Markov decision process (MDP), the dynamics of which we estimate from NFL play-by-play data from the 2014 through 2022 seasons. We assume that coaches' observed decisions are optimal but that the risk preferences governing their decisions are unknown. This yields a novel inverse decision problem for which the optimality criterion, or risk measure, of the MDP is the estimand. Using the quantile function to parameterize risk, we estimate which quantile-optimal policy yields the coaches' observed decisions as minimally suboptimal. In general, we find that coaches' fourth-down behavior is consistent with optimizing low quantiles of the next-state value distribution, which corresponds to conservative risk preferences. We also find that coaches exhibit higher risk tolerances when making decisions in the opponent's half of the field than in their own, and that league average fourth down risk tolerances have increased over the seasons in our data.
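
To make the quantile-based idea concrete, here is a minimal Python sketch for a single fourth-down state. It is not the authors' implementation: the three actions, their next-state value distributions, the helper names (`quantile`, `optimal_action`, `estimate_tau_range`), and the grid search are all assumptions made for illustration, and every number is invented rather than estimated from play-by-play data. The sketch picks the action that maximizes the tau-quantile of the next-state value, then searches for the quantile levels at which an observed decision is least suboptimal.

```python
import numpy as np

# Hypothetical next-state value distributions (values, probabilities) for one
# fourth-down state. All numbers are invented for illustration only.
ACTIONS = {
    "go":   (np.array([-2.0, 3.0]), np.array([0.45, 0.55])),
    "punt": (np.array([-0.5, 0.0]), np.array([0.50, 0.50])),
    "kick": (np.array([-1.0, 1.5]), np.array([0.40, 0.60])),
}


def quantile(values, probs, tau):
    """Lower tau-quantile of a discrete distribution: smallest v with CDF(v) >= tau."""
    order = np.argsort(values)
    sorted_values = values[order]
    cdf = np.cumsum(probs[order])
    # First index whose cumulative probability reaches tau (small float tolerance).
    idx = np.searchsorted(cdf, tau - 1e-12)
    return sorted_values[min(idx, len(sorted_values) - 1)]


def optimal_action(tau):
    """Action whose next-state value distribution has the largest tau-quantile."""
    return max(ACTIONS, key=lambda a: quantile(*ACTIONS[a], tau))


def suboptimality(observed, tau):
    """Gap between the best tau-quantile and the observed action's tau-quantile."""
    best = max(quantile(*ACTIONS[a], tau) for a in ACTIONS)
    return best - quantile(*ACTIONS[observed], tau)


def estimate_tau_range(observed, grid=None):
    """Range of grid quantile levels at which the observed action is least suboptimal."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    gaps = np.array([suboptimality(observed, tau) for tau in grid])
    best = grid[np.isclose(gaps, gaps.min())]
    return float(best.min()), float(best.max())


if __name__ == "__main__":
    for tau in (0.2, 0.42, 0.6):
        print(f"tau={tau}: quantile-optimal action = {optimal_action(tau)}")
    print("tau range consistent with an observed punt:", estimate_tau_range("punt"))
```

With these invented numbers, low quantile levels favor punting and higher levels favor going for it, which mirrors the abstract's reading of low quantiles as conservative risk preferences. The full approach described above works with value distributions from an MDP estimated over many states, not a single hand-built state.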


research
11/15/2017

Quantile Markov Decision Process

In this paper, we consider the problem of optimizing the quantiles of th...
research
07/04/2019

Markov Decision Processes under Ambiguity

We consider statistical Markov Decision Processes where the decision mak...
research
06/26/2022

Stackelberg Risk Preference Design

Risk measures are commonly used to capture the risk preferences of decis...
research
09/06/2015

Research: Analysis of Transport Model that Approximates Decision Taker's Preferences

Paper provides a method for solving the reverse Monge-Kantorovich transp...
research
10/04/2020

Learning Time Varying Risk Preferences from Investment Portfolios using Inverse Optimization with Applications on Mutual Funds

The fundamental principle in Modern Portfolio Theory (MPT) is based on t...
research
06/10/2022

Conformal Prediction Intervals for Markov Decision Process Trajectories

Before delegating a task to an autonomous system, a human operator may w...
research
10/17/2022

Risk-Sensitive Markov Decision Processes with Long-Run CVaR Criterion

CVaR (Conditional Value at Risk) is a risk metric widely used in finance...
