When writing programs, people have the ability to tackle a new complex t...
In the real world, some of the most complex settings for learned agents...
Many practical applications, such as recommender systems and learning to...
Inferring reward functions from human behavior is at the center of value...
Offline reinforcement learning (RL) promises the ability to learn effect...
Offline reinforcement learning (RL) algorithms can acquire effective pol...
When writing programs, people have the ability to tackle a new complex t...
Mean rewards of actions are often correlated. The form of these correlat...
Meta-, multi-task, and federated learning can all be viewed as solving s...
We study Thompson sampling (TS) in online decision-making problems where...
Users of recommender systems often behave in a non-stationary fashion, d...
In many sequence learning tasks, such as program synthesis and document...
A latent bandit problem is one in which the learning agent knows the arm...
Off-policy learning is a framework for evaluating and optimizing policie...
We focus on the problem of predicting future states of entities in compl...
The evolution of the internet has created an abundance of unstructured d...