
ModelFree Learning for TwoPlayer ZeroSum Partially Observable Markov Games with Perfect Recall
We study the problem of learning a Nash equilibrium (NE) in an imperfect...
read it

Taylor Expansion of Discount Factors
In practical reinforcement learning (RL), the discount factor used for e...
read it

Stochastic Shortest Path: Minimax, ParameterFree and Towards HorizonFree Regret
We study the problem of learning in the stochastic shortest path (SSP) s...
read it

Broaden Your Views for SelfSupervised Video Learning
Most successful selfsupervised learning methods are trained to align th...
read it

UCB Momentum Qlearning: Correcting the bias without forgetting
We propose UCBMQ, Upper Confidence Bound Momentum Qlearning, a new algo...
read it

Revisiting Peng's Q(λ) for Modern Reinforcement Learning
Offpolicy multistep reinforcement learning algorithms consist of conse...
read it

Mine Your Own vieW: SelfSupervised Learning Through AcrossSample Prediction
Stateoftheart methods for selfsupervised learning (SSL) build repres...
read it

Bootstrapped Representation Learning on Graphs
Current stateoftheart selfsupervised learning methods for graph neur...
read it

Geometric Entropic Exploration
Exploration is essential for solving complex Reinforcement Learning (RL)...
read it

On the Approximation Relationship between Optimizing Ratio of Submodular (RS) and Difference of Submodular (DS) Functions
We demonstrate that from an algorithm guaranteeing an approximation fact...
read it

Improved Sample Complexity for Incremental Autonomous Exploration in MDPs
We investigate the exploration of an unknown environment when no reward ...
read it

Game Plan: What AI can do for Football, and What Football can do for AI
The rapid progress in artificial intelligence (AI) and machine learning ...
read it

BYOL works even without batch statistics
Bootstrap Your Own Latent (BYOL) is a selfsupervised learning approach ...
read it

Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited
In this paper, we propose new problemindependent lower bounds on the sa...
read it

Fast active learning for pure exploration in reinforcement learning
Realistic environments often provide agents with very limited feedback. ...
read it

MonteCarlo Tree Search as Regularized Policy Optimization
The combination of MonteCarlo tree search (MCTS) with deep reinforcemen...
read it

A Provably Efficient Sample Collection Strategy for Reinforcement Learning
A common assumption in reinforcement learning (RL) is to have access to ...
read it

A KernelBased Approach to NonStationary Reinforcement Learning in Metric Spaces
In this work, we propose KeRNS: an algorithm for episodic reinforcement ...
read it

Gamification of Pure Exploration for Linear Bandits
We investigate an active pureexploration setting, that includes bestar...
read it

Sampling from a kDPP without looking at all items
Determinantal point processes (DPPs) are a useful probabilistic model fo...
read it

Stochastic bandits with armdependent delays
Significant work has been recently dedicated to the stochastic delayed b...
read it

Bootstrap Your Own Latent: A New Approach to SelfSupervised Learning
We introduce Bootstrap Your Own Latent (BYOL), a new approach to selfsu...
read it

Statistical Efficiency of Thompson Sampling for Combinatorial SemiBandits
We investigate stochastic combinatorial multiarmed bandit with semiban...
read it

Adaptive RewardFree Exploration
Rewardfree exploration is a reinforcement learning setting recently stu...
read it

Planning in Markov Decision Processes with GapDependent Sample Complexity
We propose MDPGapE, a new trajectorybased MonteCarlo Tree Search algo...
read it

Regret Bounds for KernelBased Reinforcement Learning
We consider the explorationexploitation dilemma in finitehorizon reinf...
read it

Taylor Expansion Policy Optimization
In this work, we investigate the application of Taylor expansions in rei...
read it

Fast sampling from βensembles
We study sampling algorithms for βensembles with time complexity less t...
read it

Nearlinear Time Gaussian Process Optimization with Adaptive Batching and Resparsification
Gaussian processes (GP) are one of the most successful frameworks to mod...
read it

NoRegret Exploration in GoalOriented Reinforcement Learning
Many popular reinforcement learning problems (e.g., navigation in a maze...
read it

FixedConfidence Guarantees for Bayesian BestArm Identification
We investigate and provide new insights on the sampling rule called Top...
read it

DerivativeFree OrderRobust Optimisation
In this paper, we formalise orderrobust optimisation as an instance of ...
read it

Multiagent Evaluation under Incomplete Information
This paper investigates the evaluation of learned multiagent strategies ...
read it

Exact sampling of determinantal point processes with sublinear time preprocessing
We study the complexity of sampling from a distribution over all index s...
read it

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret
Gaussian processes (GP) are a popular Bayesian approach for the optimiza...
read it

Exploiting Structure of Uncertainty for Efficient Combinatorial SemiBandits
We improve the efficiency of algorithms for stochastic combinatorial sem...
read it

Optimistic optimization of a Brownian
We address the problem of optimizing a Brownian motion. We consider a (r...
read it

Rotting bandits are no harder than stochastic ones
In bandits, arms' distributions are stationary. This is often violated i...
read it

A simple parameterfree and adaptive approach to optimization under a minimal local smoothness assumption
We study the problem of optimizing a function under a budgeted number of...
read it

Compressing the Input for CNNs with the FirstOrder Scattering Transform
We study the firstorder scattering transform as a candidate for reducin...
read it

DPPy: Sampling Determinantal Point Processes with Python
Determinantal point processes (DPPs) are specific probability distributi...
read it

Finding the Bandit in a Graph: Sequential SearchandStop
We consider the problem where an agent wants to find a hidden object tha...
read it

Distributed Adaptive Sampling for Kernel Matrix Approximation
Most kernelbased methods, such as kernel or Gaussian process regression...
read it

SecondOrder Kernel Online Convex Optimization with Adaptive Sketching
Kernel online convex optimization (KOCO) is a framework combining the ex...
read it

Zonotope hitandrun for efficient sampling from projection DPPs
Determinantal point processes (DPPs) are distributions over sets of item...
read it

Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting
We derive a new proof to show that the incremental resparsification algo...
read it

Online Influence Maximization under Independent Cascade Model with SemiBandit Feedback
We study the stochastic online problem of learning to influence in a soc...
read it

Incremental Spectral Sparsification for LargeScale GraphBased SemiSupervised Learning
While the harmonic function solution performs well in many semisupervis...
read it

Simple regret for infinitely many armed bandits
We consider a stochastic bandit problem with infinitely many arms. In th...
read it

Learning to Act Greedily: Polymatroid SemiBandits
Many important optimization problems, such as the minimum spanning tree ...
read it
Michal Valko
verfied profile
Michal is a machine learning scientist in DeepMind Paris on the leave from SequeL team at Inria Lille  Nord Europe, France, lead by Philippe Preux and Rémi Munos. He also teaches the master course Graphs in Machine Learning at l'ENS ParisSaclay. Michal is primarily interested in designing algorithms that would require as little human supervision as possible. This means 1) reducing the “intelligence” that humans need to input into the system and 2) minimizing the data that humans need to spend inspecting, classifying, or “tuning” the algorithms. Another important feature of machine learning algorithms should be the ability to adapt to changing environments. That is why he is working in domains that are able to deal with minimal feedback, such as online learning, bandit algorithms, semisupervised learning, and anomaly detection. Most recently he has worked on sequential algorithms with structured decisions where exploiting the structure leads to provably faster learning. Structured learning requires more time and space resources and therefore the most recent work of Michal includes efficient approximations such as graph and matrix sketching with learning guarantees. In past, the common thread of Michal's work has been adaptive graphbased learning and its application to realworld applications such as recommender systems, medical error detection, and face recognition. His industrial collaborators include Adobe, Intel, Technicolor, and Microsoft Research. He received his Ph.D. in 2011 from the University of Pittsburgh under the supervision of Miloš Hauskrecht and after was a postdoc of Rémi Munos before taking a permanent position at Inria in 2012.