
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Off-policy reinforcement learning (RL) holds the promise of sample-effic...

Off-Policy Evaluation via the Regularized Lagrangian
The recently proposed distribution correction estimation (DICE) family o...

Scalable Deep Generative Modeling for Sparse Graphs
Learning graph generative models is a challenging task for deep learning...

A maximum-entropy approach to off-policy evaluation in average-reward MDPs
This work focuses on off-policy evaluation (OPE) with function approxima...

On the Global Convergence Rates of Softmax Policy Gradient Methods
We make three contributions toward better understanding policy gradient ...

Energy-Based Processes for Exchangeable Data
Recently there has been growing interest in modeling sets with exchangea...

Variational Inference for Deep Probabilistic Canonical Correlation Analysis
In this paper, we propose a deep probabilistic multi-view model that is ...

Batch Stationary Distribution Estimation
We consider the problem of approximating the stationary distribution of ...

ConQUR: Mitigating Delusional Bias in Deep Q-learning
Delusional bias is a fundamental source of error in approximate Q-learni...

GenDICE: Generalized Offline Estimation of Stationary Values
An important problem that arises in reinforcement learning and Monte Car...

Learning to Combat Compounding-Error in Model-Based Reinforcement Learning
Despite its potential to improve sample complexity versus model-free app...

AlgaeDICE: Policy Gradient from Arbitrary Experience
In many real-world applications of reinforcement learning (RL), interact...

Domain Aggregation Networks for Multi-Source Domain Adaptation
In many real-world applications, we want to exploit multiple source data...

Striving for Simplicity in Off-policy Deep Reinforcement Learning
Reflecting on the advances of off-policy deep reinforcement learning (RL...

Advantage Amplification in Slowly Evolving Latent-State Environments
Latent-state environments with long horizons, such as those faced by rec...

Exponential Family Estimation via Adversarial Dynamics Embedding
We present an efficient algorithm for maximum likelihood estimation (MLE...

Learning to Generalize from Sparse and Underspecified Rewards
We consider the problem of learning from sparse and underspecified rewar...

A Geometric Perspective on Optimal Representations for Reinforcement Learning
This paper proposes a new approach to representation learning based on g...

The Value Function Polytope in Reinforcement Learning
We establish geometric and topological properties of the space of value ...

Understanding the impact of entropy on policy optimization
Entropy regularization is commonly used to improve policy optimization i...

Understanding the impact of entropy in policy learning
Entropy regularization is commonly used to improve policy optimization i...

Kernel Exponential Family Estimation via Doubly Dual Embedding
We investigate penalized maximum log-likelihood estimation for exponenti...

Planning and Learning with Stochastic Action Sets
In many practical uses of reinforcement learning (RL) the set of actions...

Variational Rejection Sampling
Learning latent variable models with stochastic variational inference is...

Smoothed Action Value Functions for Learning Gaussian Policies
State-action value functions (i.e., Q-values) are ubiquitous in reinforc...

Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
Trust region methods, such as TRPO, are often used to stabilize policy o...

Bridging the Gap Between Value and Policy Based Reinforcement Learning
We establish a new connection between value and policy based reinforceme...

Improving Policy Gradient by Exploring Under-appreciated Rewards
This paper presents a novel form of policy gradient for model-free reinf...

Stochastic Neural Networks with Monotonic Activation Functions
We propose a Laplace approximation that creates a stochastic unit from a...

Generalized Conditional Gradient for Sparse Estimation
Structured sparsity is an important modeling tool that expands the appli...

Adaptive Monte Carlo via Bandit Allocation
We consider the problem of sequentially choosing between a set of unbias...

Convex Relaxations of Bregman Divergence Clustering
Although many convex relaxations of clustering have been proposed in the...

Learning Bayesian Nets that Perform Well
A Bayesian net (BN) is more than a succinct way to encode a probabilisti...

Monte Carlo Inference via Greedy Importance Sampling
We present a new method for conducting Monte Carlo inference in graphica...

Boltzmann Machine Learning with the Latent Maximum Entropy Principle
We present a new statistical learning paradigm for Boltzmann machines ba...

Monte Carlo Matrix Inversion Policy Evaluation
In 1950, Forsythe and Leibler (1950) introduced a statistical technique ...

Maximum Margin Bayesian Networks
We consider the problem of learning Bayesian network classifiers that ma...

Regularizers versus Losses for Nonlinear Dimensionality Reduction: A Factored View with New Convex Relaxations
We demonstrate that almost all nonparametric dimensionality reduction m...

Convex Structure Learning for Bayesian Networks: Polynomial Feature Selection and Approximate Ordering
We present a new approach to learning the structure and parameters of a ...

Rank/Norm Regularization with Closed-Form Solutions: Application to Subspace Clustering
When data is sampled from an unknown subspace, principal component analy...
Dale Schuurmans
Professor in the Department of Computing Science at the University of Alberta; Research Scientist at Google Brain