Sayer: Using Implicit Feedback to Optimize System Policies

10/28/2021
by Mathias Lecuyer et al.

We observe that many system policies that make threshold decisions involving a resource (e.g., time, memory, cores) naturally reveal additional, implicit feedback. For example, if a system waits X minutes for an event to occur, it automatically learns what would have happened had it waited less than X minutes, because time is cumulative. This feedback tells us about alternative decisions and can be used to improve the system policy. However, leveraging implicit feedback is difficult because it tends to be one-sided or incomplete, and may depend on the outcome of the event. As a result, existing practices for using feedback, such as simply incorporating it into a data-driven model, suffer from bias. We develop a methodology, called Sayer, that leverages implicit feedback to evaluate and train new system policies. Sayer builds on two ideas from reinforcement learning, randomized exploration and unbiased counterfactual estimators, to leverage data collected by an existing policy and estimate the performance of new candidate policies without actually deploying them. Sayer uses implicit exploration and implicit data augmentation to generate implicit feedback in an unbiased form, which an implicit counterfactual estimator then uses to evaluate and train new policies. The key idea underlying these techniques is to assign implicit probabilities to decisions that are not actually taken but whose feedback can be inferred; these probabilities are carefully calculated to ensure statistical unbiasedness. We apply Sayer to two production scenarios in Azure and show that it can evaluate arbitrary policies accurately and train new policies that outperform the production policies.
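To make the counterfactual idea concrete, below is a minimal Python sketch, not Sayer's actual implementation, of an inverse-propensity estimator for a timeout policy. The logging distribution `log_probs`, the reward function, and all names are illustrative assumptions. The sketch estimates the value of a fixed candidate timeout `a` from logs collected under a randomized logging policy: a record reveals `a`'s outcome whenever the logged timeout reaches min(event time, a), and each revealed record is weighted by the probability that the logging randomization would have revealed it.

```python
import numpy as np

# Sketch of an implicit counterfactual (IPS-style) estimator for a timeout
# policy. Names and the reward model are illustrative, not Sayer's API.
# Assumes the logging policy drew each timeout x from a known discrete
# distribution `log_probs` (randomized exploration).

def reveal_propensity(a, event_time, log_probs):
    """Probability, under the logging randomization, that a record reveals
    the outcome of candidate timeout `a`.

    If the event fired at time t, `a`'s outcome is revealed whenever the
    logged timeout x >= min(t, a); if the event never fired within x, it is
    revealed only when x >= a. Both cases reduce to P(x >= min(t, a)) with
    event_time = +inf when the event was unobserved.
    """
    threshold = min(event_time, a)
    return sum(p for x, p in log_probs.items() if x >= threshold)

def reward(a, event_time):
    """Illustrative reward: +1 if the event occurs before the timeout,
    minus a small cost proportional to the time spent waiting."""
    waited = min(event_time, a)
    return (1.0 if event_time <= a else 0.0) - 0.01 * waited

def evaluate_policy(a, records, log_probs):
    """Unbiased IPS estimate of the value of always choosing timeout `a`,
    using every record whose implicit feedback reveals a's outcome."""
    total = 0.0
    for rec in records:
        x, t = rec["timeout"], rec["event_time"]  # t = np.inf if unobserved
        if x >= min(t, a):  # implicit feedback reveals a's outcome
            total += reward(a, t) / reveal_propensity(a, t, log_probs)
    return total / len(records)

# Toy usage: logging policy randomizes over three timeouts (in minutes).
rng = np.random.default_rng(0)
log_probs = {5: 0.2, 10: 0.3, 15: 0.5}
records = []
for _ in range(10_000):
    x = rng.choice(list(log_probs), p=list(log_probs.values()))
    true_t = rng.exponential(8.0)             # latent event time
    t = true_t if true_t <= x else np.inf     # observed (censored) feedback
    records.append({"timeout": x, "event_time": t})

for a in (5, 10, 15):
    print(a, round(evaluate_policy(a, records, log_probs), 3))
```

The propensity P(x >= min(t, a)) is exactly the chance that the logging randomization reveals `a`'s outcome, which is what restores unbiasedness despite the one-sided feedback: censored records (event unobserved) can only vouch for timeouts no larger than the one logged, and the weights compensate for that asymmetry.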


