
Policy Improvement from Multiple Experts
Despite its promise, reinforcement learning's real-world adoption has be...
Optimizing Interactive Systems via Data-Driven Objectives
Effective optimization is essential for real-world interactive systems t...
FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
In order to deal with the curse of dimensionality in reinforcement learn...
Reparameterized Variational Divergence Minimization for Stable Imitation
While recent state-of-the-art results for adversarial imitation-learning...
Federated Residual Learning
We study a new form of federated learning where the clients train person...
Taking a hint: How to leverage loss predictors in contextual bandits?
We initiate the study of learning in contextual bandits with the help of...
Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes
Policy gradient methods are among the most effective methods in challeng...
Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting
A learned generative model often produces biased statistics relative to ...
On the Optimality of Sparse Model-Based Planning for Markov Decision Processes
This work considers the sample complexity of obtaining an ϵ-optimal poli...
Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds
We design a new algorithm for batch active learning with deep neural net...
Fair Regression: Quantitative Definitions and Reduction-based Algorithms
In this paper, we study the prediction of a real-valued target, such as ...
Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations
Assemblies of modular subsystems are being pressed into service to perfo...
Off-Policy Policy Gradient with State Distribution Correction
We study the problem of off-policy policy optimization in Markov decisio...
Provably efficient RL with Rich Observations via Latent State Decoding
We study the exploration problem in episodic MDPs with rich observations...
Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
We investigate the feasibility of learning from both fully-labeled super...
Model-Based Reinforcement Learning in Contextual Decision Processes
We study the sample complexity of model-based reinforcement learning in ...
A Reductions Approach to Fair Classification
We present a systematic approach for achieving fairness in a binary clas...
Practical Contextual Bandits with Regression Oracles
A major challenge in contextual bandits is to design general-purpose alg...
On Polynomial Time PAC Reinforcement Learning with Rich Observations
We study the computational tractability of provably sample-efficient (PA...
Hierarchical Imitation and Reinforcement Learning
We study the problem of learning policies over long time horizons. We pr...
Practical Evaluation and Optimization of Contextual Bandit Algorithms
We study and empirically optimize contextual bandit learning, exploratio...
Efficient Contextual Bandits in Non-stationary Worlds
Most contextual bandit algorithms minimize regret to the best fixed poli...
Active Learning for Cost-Sensitive Classification
We design an active learning algorithm for cost-sensitive multiclass cla...
Corralling a Band of Bandit Algorithms
We study the problem of combining multiple bandit algorithms (that is, o...
Contextual Decision Processes with Low Bellman Rank are PAC-Learnable
This paper studies systematic exploration for reinforcement learning wit...
Off-policy evaluation for slate recommendation
This paper studies the evaluation of policies that recommend an ordered ...
Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains
High-dimensional observations and complex real-world dynamics present ma...
PAC Reinforcement Learning with Rich Observations
We propose and study a new model for reinforcement learning with rich ob...
Efficient and Parsimonious Agnostic Active Learning
We develop a new active learning algorithm for the streaming setting sat...
Contextual Semibandits via Supervised Learning Oracles
We study an online decision making problem where on each round a learner...
Learning to Search Better Than Your Teacher
Methods for learning to search for structured prediction typically imita...
A Lower Bound for the Optimization of Finite Sums
This paper presents a lower bound for optimizing a finite sum of n funct...
Scalable Nonlinear Learning with Adaptive Polynomial Expansions
Can we effectively learn a nonlinear representation in time comparable t...
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
We present a new algorithm for the contextual bandit learning problem, w...
Least Squares Revisited: Scalable Approaches for Multiclass Prediction
This work provides simple algorithms for multiclass (and multi-label) p...
A Clustering Approach to Learn Sparsely-Used Overcomplete Dictionaries
We consider the problem of learning overcomplete dictionaries in the con...
Oracle inequalities for computationally adaptive model selection
We analyze general model selection procedures using penalized empirical ...
Stochastic optimization and sparse statistical recovery: An optimal algorithm for high dimensions
We develop and analyze stochastic optimization algorithms for problems i...
A Reliable Effective Terascale Linear Learning System
We present a system and a set of techniques for learning linear predicto...
The Generalization Ability of Online Algorithms for Dependent Data
We study the generalization performance of online learning algorithms tr...
Ergodic Mirror Descent
We generalize stochastic subgradient descent methods to situations in wh...
Distributed Delayed Stochastic Optimization
We analyze the convergence of gradient-based optimization algorithms tha...
Fast global convergence of gradient methods for high-dimensional statistical recovery
Many statistical M-estimators are based on convex optimization problems ...
Online and Batch Learning Algorithms for Data with Missing Features
We introduce new online and batch algorithms that are robust to data wit...
Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions
We analyze a class of estimators based on convex relaxation for solving ...
Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization
Relative to the large literature on upper bounds on complexity of convex...
Alekh Agarwal
Researcher in the New York lab of Microsoft Research; Postdoctoral Researcher at Microsoft from 2012 to 2014; PhD in Computer Science from UC Berkeley