What is Thompson Sampling?
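Thompson sampling is a heuristic learning algorithm that chooses the action maximizing the expected reward with respect to a randomly drawn belief, that is, a sample from the current posterior over each action's reward. The problem it addresses is known as the exploration-exploitation dilemma, a trade-off between two uses of limited resources and time: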
- Exploitation: spending resources and time on what is already known, to maximize immediate performance.
- Exploration: investing resources and time to gather new information that might improve future performance.
Thompson sampling balances the two by picking each action with the probability that it is the one yielding the maximum reward under current beliefs, like so:
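As a minimal sketch of this idea, assume each action is an "arm" that pays a reward of 1 or 0 with a hidden success rate, and that beliefs start from a uniform Beta(1, 1) prior. The function name `thompson_sampling`, the parameter `true_rates`, and the example rates are hypothetical and chosen only for illustration; the text above does not prescribe a particular reward model.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_rates: hidden success probability of each arm (assumed for the demo).
    Returns (pulls, successes) per arm after `rounds` rounds.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(rounds):
        # Randomly drawn belief: one plausible success rate per arm,
        # sampled from that arm's Beta posterior (uniform Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Observe a simulated Bernoulli reward and update the posterior counts.
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes


pulls, successes = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
print("pulls per arm:", pulls)
print("successes per arm:", successes)
```

Arms with little data have wide posteriors, so they occasionally produce the highest sample and get explored. As evidence accumulates, the posteriors narrow and the arm with the best observed record is chosen most of the time, which is the exploitation side of the trade-off.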