Markov Decision Process

What is the Markov Decision Process?

The Markov decision process is a model of predicting outcomes. Like a Markov chain, the model attempts to predict an outcome given only information provided by the current state. However, the Markov decision process incorporates the characteristics of actions and motivations. At each step during the process, the decision maker may choose to take an action available in the current state, resulting in the model moving to the next step and offering the decision maker a reward.

Image provided by Quora.com

Applying the Markov Decision Process to Machine Learning

A machine learning algorithm may be tasked with an optimization problem. Using reinforcement learning, the algorithm will attempt to optimize the actions taken within an environment, in order to maximize the potential reward. Where supervised learning techniques require correct input/output pairs to create a model, reinforcement learning uses Markov decision processes to determine an optimal balance of exploration and exploitation. Machine learning may use reinforcement learning by way of the Markov decision process when the probabilities and rewards of an outcome are unspecified or unknown.