
Policy Optimization as Online Learning with Mediator Feedback
Policy Optimization (PO) is a widely used approach to address continuous...
An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits
In the contextual linear bandit setting, algorithms built on the optimis...
Option Hedging with Risk Averse Reinforcement Learning
In this paper we show how risk-averse reinforcement learning can be used...
Inverse Reinforcement Learning from a Gradient-based Learner
Inverse Reinforcement Learning addresses the problem of inferring an exp...
Newton-based Policy Optimization for Games
Many learning problems involve multiple agents optimizing different inte...
A Policy Gradient Method for Task-Agnostic Exploration
In a reward-free environment, what is a suitable intrinsic objective for...
Sequential Transfer in Reinforcement Learning with a Generative Model
We are interested in how to design reinforcement learning agents that pr...
Time-Variant Variational Transfer for Value Functions
In most transfer learning approaches to reinforcement learning (RL) the ...
A Novel Confidence-Based Algorithm for Structured Bandits
We study finite-armed stochastic bandits where the rewards of each arm m...
Online Joint Bid/Daily Budget Optimization of Internet Advertising Campaigns
Pay-per-click advertising includes various formats (e.g., search, contex...
Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning
The choice of the control frequency of a system has a relevant impact on...
MushroomRL: Simplifying Reinforcement Learning Research
MushroomRL is an open-source Python library developed to simplify the pr...
Risk-Averse Trust Region Optimization for Reward-Volatility Reduction
In real-world decision-making problems, for instance in the fields of fi...
Gradient-Aware Model-based Policy Search
Traditional model-based reinforcement learning approaches learn a model ...
Policy Space Identification in Configurable Environments
We study the problem of identifying the policy space of a learning agent...
Feature Selection via Mutual Information: New Theoretical Insights
Mutual information has been successfully adopted in filter feature-selec...
An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies
What is a good exploration strategy for an agent that interacts with an ...
Smoothing Policies and Safe Policy Gradients
Policy gradient algorithms are among the best candidates for the much an...
Policy Optimization via Importance Sampling
Policy optimization is an effective reinforcement learning approach to s...
Stochastic Variance-Reduced Policy Gradient
In this paper, we propose a novel reinforcement learning algorithm cons...
Configurable Markov Decision Processes
In many real-world problems, there is the possibility to configure, to a...
Importance Weighted Transfer of Samples in Reinforcement Learning
We consider the transfer of experience samples (i.e., tuples < s, a, s',...
Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent
In this paper, we propose a novel approach to automatically determine th...
Unimodal Thompson Sampling for Graph-Structured Arms
We study, to the best of our knowledge, the first Bayesian algorithm for...
Multi-objective Reinforcement Learning with Continuous Pareto Frontier Approximation: Supplementary Material
This document contains supplementary material for the paper "Multi-objec...
Transfer from Multiple MDPs
Transfer reinforcement learning (RL) methods leverage on the experience ...
Marcello Restelli
