
Bellman-consistent Pessimism for Offline Reinforcement Learning
The use of pessimism, when reasoning about datasets lacking exhaustive e...

Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
Policy optimization methods are popular reinforcement learning algorithm...

Provably Correct Optimization and Exploration with Nonlinear Policies
Policy optimization methods remain a powerful workhorse in empirical Rei...

Towards a Dimension-Free Understanding of Adaptive Linear Control
We study the problem of adaptive control of the linear quadratic regulat...

Model-free Representation Learning and Exploration in Low-rank MDPs
The low rank MDP has emerged as an important model for studying represen...

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning
Direct policy gradient methods for reinforcement learning are a successf...

Provably Good Batch Reinforcement Learning Without Great Exploration
Batch reinforcement learning (RL) is important to apply RL algorithms to...

Policy Improvement from Multiple Experts
Despite its promise, reinforcement learning's real-world adoption has be...

Optimizing Interactive Systems via Data-Driven Objectives
Effective optimization is essential for real-world interactive systems t...

FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
In order to deal with the curse of dimensionality in reinforcement learn...

Reparameterized Variational Divergence Minimization for Stable Imitation
While recent state-of-the-art results for adversarial imitation-learning...

Federated Residual Learning
We study a new form of federated learning where the clients train person...

Taking a hint: How to leverage loss predictors in contextual bandits?
We initiate the study of learning in contextual bandits with the help of...

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes
Policy gradient methods are among the most effective methods in challeng...

Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting
A learned generative model often produces biased statistics relative to ...

On the Optimality of Sparse Model-Based Planning for Markov Decision Processes
This work considers the sample complexity of obtaining an ϵ-optimal poli...

Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds
We design a new algorithm for batch active learning with deep neural net...

Fair Regression: Quantitative Definitions and Reduction-based Algorithms
In this paper, we study the prediction of a real-valued target, such as ...

Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations
Assemblies of modular subsystems are being pressed into service to perfo...

Off-Policy Policy Gradient with State Distribution Correction
We study the problem of off-policy policy optimization in Markov decisio...

Provably efficient RL with Rich Observations via Latent State Decoding
We study the exploration problem in episodic MDPs with rich observations...

Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
We investigate the feasibility of learning from both fully-labeled super...

Model-Based Reinforcement Learning in Contextual Decision Processes
We study the sample complexity of model-based reinforcement learning in ...

A Reductions Approach to Fair Classification
We present a systematic approach for achieving fairness in a binary clas...

Practical Contextual Bandits with Regression Oracles
A major challenge in contextual bandits is to design general-purpose alg...

On Polynomial Time PAC Reinforcement Learning with Rich Observations
We study the computational tractability of provably sample-efficient (PA...

Hierarchical Imitation and Reinforcement Learning
We study the problem of learning policies over long time horizons. We pr...

Practical Evaluation and Optimization of Contextual Bandit Algorithms
We study and empirically optimize contextual bandit learning, exploratio...

Efficient Contextual Bandits in Non-stationary Worlds
Most contextual bandit algorithms minimize regret to the best fixed poli...

Active Learning for Cost-Sensitive Classification
We design an active learning algorithm for cost-sensitive multiclass cla...

Corralling a Band of Bandit Algorithms
We study the problem of combining multiple bandit algorithms (that is, o...

Contextual Decision Processes with Low Bellman Rank are PAC-Learnable
This paper studies systematic exploration for reinforcement learning wit...

Off-policy evaluation for slate recommendation
This paper studies the evaluation of policies that recommend an ordered ...

Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains
High-dimensional observations and complex real-world dynamics present ma...

PAC Reinforcement Learning with Rich Observations
We propose and study a new model for reinforcement learning with rich ob...

Efficient and Parsimonious Agnostic Active Learning
We develop a new active learning algorithm for the streaming setting sat...

Contextual Semibandits via Supervised Learning Oracles
We study an online decision making problem where on each round a learner...

Learning to Search Better Than Your Teacher
Methods for learning to search for structured prediction typically imita...

A Lower Bound for the Optimization of Finite Sums
This paper presents a lower bound for optimizing a finite sum of n funct...

Scalable Nonlinear Learning with Adaptive Polynomial Expansions
Can we effectively learn a nonlinear representation in time comparable t...

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
We present a new algorithm for the contextual bandit learning problem, w...

Least Squares Revisited: Scalable Approaches for Multiclass Prediction
This work provides simple algorithms for multiclass (and multi-label) p...

A Clustering Approach to Learn Sparsely-Used Overcomplete Dictionaries
We consider the problem of learning overcomplete dictionaries in the con...

Oracle inequalities for computationally adaptive model selection
We analyze general model selection procedures using penalized empirical ...

Stochastic optimization and sparse statistical recovery: An optimal algorithm for high dimensions
We develop and analyze stochastic optimization algorithms for problems i...

A Reliable Effective Terascale Linear Learning System
We present a system and a set of techniques for learning linear predicto...

The Generalization Ability of Online Algorithms for Dependent Data
We study the generalization performance of online learning algorithms tr...

Ergodic Mirror Descent
We generalize stochastic subgradient descent methods to situations in wh...

Distributed Delayed Stochastic Optimization
We analyze the convergence of gradient-based optimization algorithms tha...

Fast global convergence of gradient methods for high-dimensional statistical recovery
Many statistical M-estimators are based on convex optimization problems ...
Alekh Agarwal
Researcher in the New York lab of Microsoft Research at Microsoft, Postdoctoral Researcher at Microsoft from 2012 to 2014, PhD in Computer Science from UC Berkeley