
Policy Optimization as Online Learning with Mediator Feedback
Policy Optimization (PO) is a widely used approach to address continuous...

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits
In the contextual linear bandit setting, algorithms built on the optimis...

Option Hedging with Risk Averse Reinforcement Learning
In this paper we show how risk-averse reinforcement learning can be used...

Inverse Reinforcement Learning from a Gradient-based Learner
Inverse Reinforcement Learning addresses the problem of inferring an exp...

Newton-based Policy Optimization for Games
Many learning problems involve multiple agents optimizing different inte...

A Policy Gradient Method for Task-Agnostic Exploration
In a reward-free environment, what is a suitable intrinsic objective for...

Sequential Transfer in Reinforcement Learning with a Generative Model
We are interested in how to design reinforcement learning agents that pr...

Time-Variant Variational Transfer for Value Functions
In most transfer learning approaches to reinforcement learning (RL) the ...

A Novel Confidence-Based Algorithm for Structured Bandits
We study finite-armed stochastic bandits where the rewards of each arm m...

Online Joint Bid/Daily Budget Optimization of Internet Advertising Campaigns
Pay-per-click advertising includes various formats (e.g., search, contex...

Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning
The choice of the control frequency of a system has a relevant impact on...

MushroomRL: Simplifying Reinforcement Learning Research
MushroomRL is an open-source Python library developed to simplify the pr...

Risk-Averse Trust Region Optimization for Reward-Volatility Reduction
In real-world decision-making problems, for instance in the fields of fi...

Gradient-Aware Model-based Policy Search
Traditional model-based reinforcement learning approaches learn a model ...

Policy Space Identification in Configurable Environments
We study the problem of identifying the policy space of a learning agent...

Feature Selection via Mutual Information: New Theoretical Insights
Mutual information has been successfully adopted in filter feature-selec...

An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies
What is a good exploration strategy for an agent that interacts with an ...

Smoothing Policies and Safe Policy Gradients
Policy gradient algorithms are among the best candidates for the much an...

Policy Optimization via Importance Sampling
Policy optimization is an effective reinforcement learning approach to s...

Stochastic Variance-Reduced Policy Gradient
In this paper, we propose a novel reinforcement learning algorithm cons...

Configurable Markov Decision Processes
In many real-world problems, there is the possibility to configure, to a...

Importance Weighted Transfer of Samples in Reinforcement Learning
We consider the transfer of experience samples (i.e., tuples < s, a, s',...

Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent
In this paper, we propose a novel approach to automatically determine th...

Unimodal Thompson Sampling for Graph-Structured Arms
We study, to the best of our knowledge, the first Bayesian algorithm for...

Multi-objective Reinforcement Learning with Continuous Pareto Frontier Approximation Supplementary Material
This document contains supplementary material for the paper "Multi-objec...

Transfer from Multiple MDPs
Transfer reinforcement learning (RL) methods leverage on the experience ...
Marcello Restelli