Despite the popularity of policy gradient methods, they are known to suf...
Policy-gradient methods are widely used for learning control policies. T...
Cloud datacenters are exponentially growing both in numbers and size. Th...
We present the problem of reinforcement learning with exogenous terminat...
The classical Policy Iteration (PI) algorithm alternates between greedy ...
We consider the problem of using expert data with unobserved confounders...
Tree Search (TS) is crucial to some of the most influential successes in...
We approach the task of network congestion control in datacenters using ...
The standard Markov Decision Process (MDP) formulation hinges on the ass...
With deep reinforcement learning (RL) methods achieving results that exc...
Policy evaluation in reinforcement learning is often conducted using two...
Finite-horizon lookahead policies are abundantly used in Reinforcement L...
Multiple-step lookahead policies have demonstrated high empirical compet...
The famous Policy Iteration algorithm alternates between policy improvem...
We address the problem of deploying a reinforcement learning (RL) agent ...
Outage scheduling aims at defining, over a horizon of several months to ...
TD(0) is one of the most commonly used algorithms in reinforcement learn...
We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be us...
The power grid is a complex and vital system that necessitates careful r...