
Attention Actor-Critic algorithm for Multi-Agent Constrained Cooperative Reinforcement Learning
In this work, we consider the problem of computing optimal actions for R...
Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach
In this paper, with a view toward fast deployment of locomotion gaits in...
Hindsight Experience Replay with Kronecker Product Approximate Curvature
Hindsight Experience Replay (HER) is one of the efficient algorithms to s...
A reinforcement learning approach to hybrid control design
In this paper we design hybrid control policies for hybrid systems whose...
Learning Stable Manoeuvres in Quadruped Robots from Expert Demonstrations
With the research into development of quadruped robots picking up pace, ...
A Stochastic Game Framework for Efficient Energy Management in Microgrid Networks
We consider the problem of energy management in microgrid networks. A mi...
Gait Library Synthesis for Quadruped Robots via Augmented Random Search
In this paper, with a view toward fast deployment of learned locomotion ...
Hierarchical Average Reward Policy Gradient Algorithms
Option-critic learning is a general-purpose reinforcement learning (RL) ...
A Convergent Off-Policy Temporal Difference Algorithm
Learning the value function of a given policy (target policy) from the d...
Generalized Speedy Q-learning
In this paper, we derive a generalization of the Speedy Q-learning (SQL)...
Solution of Two-Player Zero-Sum Game by Successive Relaxation
We consider the problem of two-player zero-sum game. In this setting, th...
Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots
In this work, we provide a simulation framework to perform systematic st...
Reinforcement Learning in Non-Stationary Environments
Reinforcement learning (RL) methods learn optimal decisions in the prese...
Second Order Value Iteration in Reinforcement Learning
Value iteration is a fixed point iteration technique utilized to obtain ...
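The snippet above describes value iteration as a fixed point iteration. As a generic illustration only (not the second order method the paper proposes), the following Python sketch repeatedly applies the standard Bellman optimality operator to a small, made-up two-state, two-action MDP until successive iterates differ by less than a tolerance. The MDP, the function name `value_iteration`, and the tolerance are illustrative assumptions.

```python
from typing import List


def value_iteration(
    P: List[List[List[float]]],  # P[a][s][t]: probability of moving s -> t under action a
    R: List[List[float]],        # R[s][a]: expected one-step reward
    gamma: float = 0.9,
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> List[float]:
    """Generic value iteration: V <- max_a (R + gamma * P V) until the sup-norm change < tol."""
    n_states = len(R)
    n_actions = len(R[0])
    V = [0.0] * n_states
    for _ in range(max_iter):
        # One application of the Bellman optimality operator (a gamma-contraction).
        V_new = [
            max(
                R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        # Stop once successive iterates are close in the sup norm.
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical MDP: action 0 = stay in the current state, action 1 = switch state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0 (stay)
    [[0.0, 1.0], [1.0, 0.0]],  # action 1 (switch)
]
R = [
    [0.0, 0.0],  # state 0: no reward for either action
    [1.0, 0.0],  # state 1: reward 1 for staying
]
V = value_iteration(P, R, gamma=0.9)
print([round(v, 6) for v in V])  # [9.0, 10.0]
```

With gamma = 0.9, staying in state 1 forever is worth 1 / (1 - 0.9) = 10, and the best move from state 0 is to switch, which costs one discount step and gives 0.9 * 10 = 9.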
Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning
In cooperative stochastic games multiple agents work towards learning jo...
Successive Over Relaxation Q-Learning
In a discounted reward Markov Decision Process (MDP) the objective is to...
An Online Sample Based Method for Mode Estimation using ODE Analysis of Stochastic Approximation Algorithms
One of the popular measures of central tendency that provides better rep...
Design, Development and Experimental Realization of a Quadrupedal Research Platform: Stoch
In this paper, we present a complete description of the hardware design ...
Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge
This paper presents our method for enabling a UAV quadrotor, equipped wi...
Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives
Humans and animals are believed to use a very minimal set of trajectorie...
Random directions stochastic approximation with deterministic perturbations
We introduce deterministic perturbation schemes for the recently propose...
An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method
In this paper, we provide two new stable online algorithms for the probl...
Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning
Asynchronous stochastic approximations are an important class of model-f...
A Cross Entropy based Optimization Algorithm with Global Convergence Guarantees
The cross entropy (CE) method is a model based search method to solve op...
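The snippet above describes the cross entropy (CE) method as a model based search method for optimization. For orientation only, here is a minimal sketch of the standard CE method (not the globally convergent variant the paper develops) for maximizing a one-dimensional function with a Gaussian sampling model. The function name `cross_entropy_maximize`, the elite fraction, the sample sizes and the small variance floor are illustrative assumptions.

```python
import random
import statistics
from typing import Callable


def cross_entropy_maximize(
    f: Callable[[float], float],
    mu: float = 0.0,
    sigma: float = 5.0,
    n_samples: int = 100,
    elite_frac: float = 0.1,
    n_iters: int = 50,
    seed: int = 0,
) -> float:
    """Generic CE method: sample, keep the elites, refit a Gaussian model, repeat."""
    rng = random.Random(seed)
    n_elite = max(2, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # Sample candidate solutions from the current model N(mu, sigma^2).
        xs = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # Keep the elite samples with the highest objective values.
        elites = sorted(xs, key=f, reverse=True)[:n_elite]
        # Refit the Gaussian to the elites (sample mean and population std).
        mu = statistics.fmean(elites)
        # Assumed small floor so sigma stays strictly positive.
        sigma = statistics.pstdev(elites) + 1e-6
    return mu


best = cross_entropy_maximize(lambda x: -(x - 3.0) ** 2)
print(round(best, 2))  # should be close to 3
```

Each iteration concentrates the sampling distribution on the best-performing region, so the model mean drifts toward a maximizer while the spread shrinks.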
An Incremental Off-policy Search in a Model-free Markov Decision Process Using a Single Sample Path
In this paper, we consider a modified version of the control problem in ...
RLWS: A Reinforcement Learning based GPU Warp Scheduler
The Streaming Multiprocessors (SMs) of a Graphics Processing Unit (GPU) ...
A unified decision making framework for supply and demand management in microgrid networks
This paper considers two important problems, on the supply side and dem...
Conditions for Stability and Convergence of Set-Valued Stochastic Approximations: Applications to Approximate Value and Fixed point Iterations
The main aim of this paper is the development of easily verifiable suffi...
Novel Sensor Scheduling Scheme for Intruder Tracking in Energy Efficient Sensor Networks
We consider the problem of tracking an intruder using a network of wirel...
Multi-Agent Q-Learning for Minimizing Demand-Supply Power Deficit in Microgrids
We consider the problem of minimizing the difference in the demand and t...
Analysis of gradient descent methods with non-diminishing, bounded errors
The main aim of this paper is to provide an analysis of gradient descent...
Shaping Proto-Value Functions via Rewards
In this paper, we combine task-dependent reward shaping and task-indepen...
Stability of Stochastic Approximations with 'Controlled Markov' Noise and Temporal Difference Learning
In this paper we present a 'stability theorem' for stochastic approximat...
Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning
We present for the first time an asymptotic convergence analysis of two ...
Stochastic recursive inclusion in two timescales with an application to the Lagrangian dual problem
In this paper we present a framework to analyze the asymptotic behavior ...
A Generalization of the Borkar-Meyn Theorem for Stochastic Recursive Inclusions
In this paper the stability theorem of Borkar and Meyn is extended to in...
Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games
We consider the problem of finding stationary Nash equilibria (NE) in a ...
Shalabh Bhatnagar
Professor and Chair, Department of Computer Science and Automation, Indian Institute of Science