Re-evaluating evaluation

06/07/2018
by David Balduzzi, et al.

Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. However, the proliferation of benchmark suites and environments, adversarial attacks, and other complications has diluted the basic evaluation model by overwhelming researchers with choices. Deliberate or accidental cherry-picking is increasingly likely, and designing well-balanced evaluation suites requires increasing effort. In this paper we take a step back and propose Nash averaging. The approach builds on a detailed analysis of the algebraic structure of evaluation in two basic scenarios: agent-vs-agent and agent-vs-task. The key strength of Nash averaging is that it automatically adapts to redundancies in evaluation data, so that results are not biased by the incorporation of easy tasks or weak agents. Nash averaging thus encourages maximally inclusive evaluation, since there is no harm (computational cost aside) from including all available tasks and agents.
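
The redundancy claim can be illustrated with a small agent-vs-task sketch. This is not the paper's exact construction: it assumes a hypothetical score matrix `S` (rows are agents, columns are tasks), treats evaluation as a zero-sum game in which an agent player maximises and a task player minimises the expected score, and approximates an equilibrium with multiplicative weights. The function names, the toy matrices, and the step-size settings are all assumptions chosen for illustration.

```python
import math


def approx_nash(S, steps=20000, eta=0.05):
    """Approximate an equilibrium of the zero-sum game with payoff p^T S q.

    Rows (agents) maximise, columns (tasks) minimise. Both sides run
    multiplicative weights; the time-averaged strategies are returned.
    This is a generic solver, used here only as a stand-in.
    """
    m, n = len(S), len(S[0])
    p = [1.0 / m] * m
    q = [1.0 / n] * n
    p_sum = [0.0] * m
    q_sum = [0.0] * n
    for _ in range(steps):
        for i in range(m):
            p_sum[i] += p[i]
        for j in range(n):
            q_sum[j] += q[j]
        # Expected score of each agent against the current task mix,
        # and of each task against the current agent mix.
        row_pay = [sum(S[i][j] * q[j] for j in range(n)) for i in range(m)]
        col_pay = [sum(S[i][j] * p[i] for i in range(m)) for j in range(n)]
        # Agents shift weight towards high scores, tasks towards low scores.
        p = [p[i] * math.exp(eta * row_pay[i]) for i in range(m)]
        q = [q[j] * math.exp(-eta * col_pay[j]) for j in range(n)]
        zp, zq = sum(p), sum(q)
        p = [x / zp for x in p]
        q = [x / zq for x in q]
    return [x / steps for x in p_sum], [x / steps for x in q_sum]


def weighted_scores(S, q):
    """Score of each agent, averaged over tasks with weights q."""
    return [sum(s * w for s, w in zip(row, q)) for row in S]


# Hypothetical data: two agents, each solving exactly one of two tasks.
base = [[1.0, 0.0],
        [0.0, 1.0]]
# Same suite with the first task included three times (pure redundancy).
padded = [[1.0, 1.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]

for name, S in (("base", base), ("padded", padded)):
    n_tasks = len(S[0])
    uniform = weighted_scores(S, [1.0 / n_tasks] * n_tasks)
    _, q = approx_nash(S)
    print(name, "uniform:", uniform, "equilibrium-weighted:", weighted_scores(S, q))
```

Under uniform averaging, padding the suite with copies of the first task lifts the first agent to 0.75 and drops the second to 0.25, even though no new information was added. With equilibrium weights the task player spreads a fixed total mass across the copies, so both agents stay close to 0.5 in either suite, up to the approximation error of the solver.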

