Dale Schuurmans

research

∙ 06/02/2023

Probabilistic Adaptation of Text-to-Video Models

Large text-to-video models trained on internet-scale data have demonstra...

5 Mengjiao Yang, et al. ∙

research

∙ 03/07/2023

Gradient-Free Structured Pruning with Unlabeled Data

Large Language Models (LLMs) have achieved great success in solving diff...

0 Azade Nova, et al. ∙

research

∙ 03/07/2023

Foundation Models for Decision Making: Problems, Methods, and Opportunities

Foundation models pretrained on diverse data at scale have demonstrated ...

0 Sherry Yang, et al. ∙

research

∙ 01/31/2023

Learning Universal Policies via Text-Guided Video Generation

A goal of artificial intelligence is to construct an agent that can solv...

7 Yilun Du, et al. ∙

research

∙ 01/16/2023

The Role of Baselines in Policy Gradient Optimization

We study the effect of baselines in on-policy stochastic policy gradient...

12 Jincheng Mei, et al. ∙

research

∙ 01/10/2023

Memory Augmented Large Language Models are Computationally Universal

We show that transformer-based large language models are computationally...

0 Dale Schuurmans, et al. ∙

research

∙ 12/17/2022

Latent Variable Representation for Reinforcement Learning

Deep latent variable models have achieved significant empirical successe...

10 Tongzheng Ren, et al. ∙

research

∙ 11/30/2022

Score-based Continuous-time Discrete Diffusion Models

Score-based modeling through stochastic differential equations (SDEs) ha...

15 Haoran Sun, et al. ∙

research

∙ 11/28/2022

What learning algorithm is in-context learning? Investigations with linear models

Neural sequence models, especially transformers, exhibit a remarkable ca...

0 Ekin Akyürek, et al. ∙

research

∙ 11/21/2022

TEMPERA: Test-Time Prompting via Reinforcement Learning

Careful prompt design is critical to the use of large language models in...

0 Tianjun Zhang, et al. ∙

research

∙ 11/14/2022

Learning to Optimize with Stochastic Dominance Constraints

In real-world decision-making, uncertainty is important yet difficult to...

18 Hanjun Dai, et al. ∙

research

∙ 10/24/2022

Dichotomy of Control: Separating What You Can Control from What You Cannot

Future- or return-conditioned supervised learning is an emerging paradig...

0 Mengjiao Yang, et al. ∙

research

∙ 09/16/2022

Optimal Scaling for Locally Balanced Proposals in Discrete Spaces

Optimal scaling has been well studied for Metropolis-Hastings (M-H) algo...

0 Haoran Sun, et al. ∙

research

∙ 08/19/2022

Spectral Decomposition Representation for Reinforcement Learning

Representation learning often plays a critical role in reinforcement lea...

6 Tongzheng Ren, et al. ∙

research

∙ 07/14/2022

Making Linear MDPs Practical via Contrastive Representation Learning

It is common to address the curse of dimensionality in Markov decision p...

3 Tianjun Zhang, et al. ∙

research

∙ 07/02/2022

Rationale-Augmented Ensembles in Language Models

Recent research has shown that rationales, or step-by-step chains of tho...

1 Xuezhi Wang, et al. ∙

research

∙ 06/29/2022

Discrete Langevin Sampler via Wasserstein Gradient Flow

Recently, a family of locally balanced (LB) samplers has demonstrated ex...

2 Haoran Sun, et al. ∙

research

∙ 05/27/2022

Multimodal Masked Autoencoders Learn Transferable Representations

Building scalable models to learn from diverse, multimodal data remains ...

0 Xinyang Geng, et al. ∙

research

∙ 05/22/2022

Chain of Thought Imitation with Procedure Cloning

Imitation learning aims to extract high-performance policies from logged...

0 Mengjiao Yang, et al. ∙

research

∙ 05/21/2022

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

We propose a novel prompting strategy, least-to-most prompting, that ena...

1 Denny Zhou, et al. ∙

research

∙ 04/25/2022

Reinforcement Teaching

We propose Reinforcement Teaching: a framework for meta-learning in whic...

10 Alex Lewandowski, et al. ∙

research

∙ 03/21/2022

Self-Consistency Improves Chain of Thought Reasoning in Language Models

We explore a simple ensemble strategy, self-consistency, that significan...

0 Xuezhi Wang, et al. ∙

research

∙ 02/02/2022

On the Effect of Log-Barrier Regularization in Decentralized Softmax Gradient Play in Multiagent Systems

Softmax policy gradient is a popular algorithm for policy optimization i...

0 Runyu Zhang, et al. ∙

research

∙ 01/28/2022

Chain of Thought Prompting Elicits Reasoning in Large Language Models

Although scaling up language model size has reliably improved performanc...

9 Jason Wei, et al. ∙

research

∙ 12/01/2021

Neural Stochastic Dual Dynamic Programming

Stochastic dual dynamic programming (SDDP) is a state-of-the-art method ...

0 Hanjun Dai, et al. ∙

research

∙ 10/29/2021

Understanding the Effect of Stochasticity in Policy Optimization

We study the effect of stochasticity in on-policy policy optimization, a...

0 Jincheng Mei, et al. ∙

research

∙ 10/28/2021

SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs

Knowledge graphs (KGs) capture knowledge in the form of head–relation–ta...

6 Hongyu Ren, et al. ∙

research

∙ 07/12/2021

Combiner: Full Attention Transformer with Sparse Computation Cost

Transformers provide a class of expressive architectures that are extrem...

3 Hongyu Ren, et al. ∙

research

∙ 06/18/2021

On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data

We study the fundamental question of the sample complexity of learning a...

0 Chenjun Xiao, et al. ∙

research

∙ 06/13/2021

Characterizing the Gap Between Actor-Critic and Policy Gradient

Actor-critic (AC) methods are ubiquitous in reinforcement learning. Alth...

0 Junfeng Wen, et al. ∙

research

∙ 05/13/2021

Leveraging Non-uniformity in First-order Non-convex Optimization

Classical global convergence results for first-order methods rely on uni...

14 Jincheng Mei, et al. ∙

research

∙ 04/15/2021

Joint Attention for Multi-Agent Coordination and Social Learning

Joint attention - the ability to purposefully coordinate attention with ...

14 Dennis Lee, et al. ∙

research

∙ 04/06/2021

On the Optimality of Batch Policy Optimization Algorithms

Batch policy optimization considers leveraging existing data for policy ...

0 Chenjun Xiao, et al. ∙

research

∙ 02/11/2021

Optimization Issues in KL-Constrained Approximate Policy Iteration

Many reinforcement learning algorithms can be seen as versions of approx...

0 Nevena Lazic, et al. ∙

research

∙ 12/12/2020

Offline Policy Selection under Uncertainty

The presence of uncertainty in policy evaluation significantly complicat...

0 Mengjiao Yang, et al. ∙

research

∙ 11/10/2020

Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration

Discrete structures play an important role in applications like program ...

14 Hanjun Dai, et al. ∙

research

∙ 10/22/2020

CoinDICE: Off-Policy Confidence Interval Estimation

We study high-confidence behavior-agnostic off-policy evaluation in rein...

0 Bo Dai, et al. ∙

research

∙ 09/29/2020

Attention that does not Explain Away

Models based on the Transformer architecture have achieved better accura...

0 Nan Ding, et al. ∙

research

∙ 07/21/2020

EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL

Off-policy reinforcement learning (RL) holds the promise of sample-effic...

0 Seyed Kamyar Seyed Ghasemipour, et al. ∙

research

∙ 07/07/2020

Off-Policy Evaluation via the Regularized Lagrangian

The recently proposed distribution correction estimation (DICE) family o...

4 Mengjiao Yang, et al. ∙

research

∙ 06/28/2020

Scalable Deep Generative Modeling for Sparse Graphs

Learning graph generative models is a challenging task for deep learning...

8 Hanjun Dai, et al. ∙

research

∙ 06/17/2020

A maximum-entropy approach to off-policy evaluation in average-reward MDPs

This work focuses on off-policy evaluation (OPE) with function approxima...

0 Nevena Lazic, et al. ∙

research

∙ 05/13/2020

On the Global Convergence Rates of Softmax Policy Gradient Methods

We make three contributions toward better understanding policy gradient ...

4 Jincheng Mei, et al. ∙

research

∙ 03/17/2020

Energy-Based Processes for Exchangeable Data

Recently there has been growing interest in modeling sets with exchangea...

0 Mengjiao Yang, et al. ∙

research

∙ 03/09/2020

Variational Inference for Deep Probabilistic Canonical Correlation Analysis

In this paper, we propose a deep probabilistic multi-view model that is ...

0 Mahdi Karami, et al. ∙

research

∙ 03/02/2020

Batch Stationary Distribution Estimation

We consider the problem of approximating the stationary distribution of ...

7 Junfeng Wen, et al. ∙

research

∙ 02/27/2020

ConQUR: Mitigating Delusional Bias in Deep Q-learning

Delusional bias is a fundamental source of error in approximate Q-learni...

20 Andy Su, et al. ∙

research

∙ 02/21/2020

GenDICE: Generalized Offline Estimation of Stationary Values

An important problem that arises in reinforcement learning and Monte Car...

11 Ruiyi Zhang, et al. ∙

research

∙ 12/24/2019

Learning to Combat Compounding-Error in Model-Based Reinforcement Learning

Despite its potential to improve sample complexity versus model-free app...

0 Chenjun Xiao, et al. ∙

research

∙ 12/04/2019

AlgaeDICE: Policy Gradient from Arbitrary Experience

In many real-world applications of reinforcement learning (RL), interact...

0 Ofir Nachum, et al. ∙
