Shalabh Bhatnagar

05/20/2023

Off-Policy Average Reward Actor-Critic with Deterministic Policy Search

The average reward criterion is relatively less studied as most existing...

Naman Saxena, et al.

05/20/2023

A Framework for Provably Stable and Consistent Training of Deep Feedforward Networks

We present a novel algorithm for training deep neural networks in superv...

Arunselvan Ramaswamy, et al.

04/21/2023

A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning

We consider the problem of control in the setting of reinforcement learn...

Mizhaan Prajit Maniyar, et al.

03/13/2023

n-Step Temporal Difference Learning with Optimal n

We consider the problem of finding the optimal value of n in the n-step ...

Lakshmi Mandal, et al.

12/20/2022

Generalized Simultaneous Perturbation Stochastic Approximation with Reduced Estimator Bias

We present in this paper a family of generalized simultaneous perturbati...

Shalabh Bhatnagar, et al.

10/14/2022

Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm

During initial iterations of training in most Reinforcement Learning (RL...

Ashish Kumar Jayant, et al.

10/10/2022

A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

The infinite horizon setting is widely adopted for problems of reinforce...

Soumyajit Guin, et al.

10/10/2022

Actor-Critic or Critic-Actor? A Tale of Two Time Scales

We revisit the standard formulation of tabular actor-critic algorithm as...

Shalabh Bhatnagar, et al.

07/30/2022

A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization

In this paper, we present a stochastic gradient algorithm for minimizing...

Akash Mondal, et al.

01/02/2022

Reinforcement Learning for Task Specifications with Action-Constraints

In this paper, we use concepts from supervisory control theory of discre...

Arun Raman, et al.

11/23/2021

Schedule Based Temporal Difference Algorithms

Learning the value function of a given policy from data samples is an im...

Rohan Deb, et al.

11/22/2021

Gradient Temporal Difference with Momentum: Stability and Convergence

Gradient temporal difference (Gradient TD) algorithms are a popular clas...

Rohan Deb, et al.

10/19/2021

Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

Learning optimal behavior from existing data is one of the most importan...

Raghuram Bharadwaj Diddigi, et al.

01/07/2021

Attention Actor-Critic algorithm for Multi-Agent Constrained Co-operative Reinforcement Learning

In this work, we consider the problem of computing optimal actions for R...

P. Parnika, et al.

10/30/2020

Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach

In this paper, with a view toward fast deployment of locomotion gaits in...

Kartik Paigwar, et al.

10/09/2020

Hindsight Experience Replay with Kronecker Product Approximate Curvature

Hindsight Experience Replay (HER) is one of the efficient algorithms to s...

Dhuruva Priyan G M, et al.

09/02/2020

A reinforcement learning approach to hybrid control design

In this paper we design hybrid control policies for hybrid systems whose...

Meet Gandhi, et al.

07/28/2020

Learning Stable Manoeuvres in Quadruped Robots from Expert Demonstrations

With the research into development of quadruped robots picking up pace, ...

Sashank Tirumala, et al.

02/06/2020

A Stochastic Game Framework for Efficient Energy Management in Microgrid Networks

We consider the problem of energy management in microgrid networks. A mi...

Shravan Nayak, et al.

12/30/2019

Gait Library Synthesis for Quadruped Robots via Augmented Random Search

In this paper, with a view toward fast deployment of learned locomotion ...

Sashank Tirumala, et al.

11/20/2019

Hierarchical Average Reward Policy Gradient Algorithms

Option-critic learning is a general-purpose reinforcement learning (RL) ...

Akshay Dharmavaram, et al.

11/13/2019

A Convergent Off-Policy Temporal Difference Algorithm

Learning the value function of a given policy (target policy) from the d...

Raghuram Bharadwaj Diddigi, et al.

11/01/2019

Generalized Speedy Q-learning

In this paper, we derive a generalization of the Speedy Q-learning (SQL)...

Indu John, et al.

06/16/2019

Solution of Two-Player Zero-Sum Game by Successive Relaxation

We consider the problem of two-player zero-sum game. In this setting, th...

Raghuram Bharadwaj Diddigi, et al.

05/15/2019

Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots

In this work, we provide a simulation framework to perform systematic st...

Shounak Bhattacharya, et al.

05/10/2019

Reinforcement Learning in Non-Stationary Environments

Reinforcement learning (RL) methods learn optimal decisions in the prese...

Sindhu Padakandla, et al.

05/10/2019

Second Order Value Iteration in Reinforcement Learning

Value iteration is a fixed point iteration technique utilized to obtain ...

Chandramouli Kamanchi, et al.

05/08/2019

Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning

In cooperative stochastic games multiple agents work towards learning jo...

Raghuram Bharadwaj Diddigi, et al.

03/09/2019

Successive Over Relaxation Q-Learning

In a discounted reward Markov Decision Process (MDP) the objective is to...

Chandramouli Kamanchi, et al.

02/11/2019

An Online Sample Based Method for Mode Estimation using ODE Analysis of Stochastic Approximation Algorithms

One of the popular measures of central tendency that provides better rep...

Chandramouli Kamanchi, et al.

01/03/2019

Design, Development and Experimental Realization of a Quadrupedal Research Platform: Stoch

In this paper, we present a complete description of the hardware design ...

Dhaivat Dholakiya, et al.

11/08/2018

Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge

This paper presents our method for enabling a UAV quadrotor, equipped wi...

Abhik Singla, et al.

10/09/2018

Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives

Humans and animals are believed to use a very minimal set of trajectorie...

Abhik Singla, et al.

08/08/2018

Random directions stochastic approximation with deterministic perturbations

We introduce deterministic perturbation schemes for the recently propose...

Prashanth L. A., et al.

06/15/2018

An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method

In this paper, we provide two new stable online algorithms for the probl...

Ajin George Joseph, et al.

02/22/2018

Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning

Asynchronous stochastic approximations are an important class of model-f...

Arunselvan Ramaswamy, et al.

01/31/2018

A Cross Entropy based Optimization Algorithm with Global Convergence Guarantees

The cross entropy (CE) method is a model based search method to solve op...

Ajin George Joseph, et al.

01/31/2018

An Incremental Off-policy Search in a Model-free Markov Decision Process Using a Single Sample Path

In this paper, we consider a modified version of the control problem in ...

Ajin George Joseph, et al.

11/17/2017

RLWS: A Reinforcement Learning based GPU Warp Scheduler

The Streaming Multiprocessors (SMs) of a Graphics Processing Unit (GPU) ...

Jayvant Anantpur, et al.

11/14/2017

A unified decision making framework for supply and demand management in microgrid networks

This paper considers two important problems - on the supply-side and dem...

Raghuram Bharadwaj Diddigi, et al.

09/14/2017

Conditions for Stability and Convergence of Set-Valued Stochastic Approximations: Applications to Approximate Value and Fixed point Iterations

The main aim of this paper is the development of easily verifiable suffi...

Arunselvan Ramaswamy, et al.

08/27/2017

Novel Sensor Scheduling Scheme for Intruder Tracking in Energy Efficient Sensor Networks

We consider the problem of tracking an intruder using a network of wirel...

Raghuram Bharadwaj Diddigi, et al.

08/25/2017

Multi-Agent Q-Learning for Minimizing Demand-Supply Power Deficit in Microgrids

We consider the problem of minimizing the difference in the demand and t...

Raghuram Bharadwaj Diddigi, et al.

04/01/2016

Analysis of gradient descent methods with non-diminishing, bounded errors

The main aim of this paper is to provide an analysis of gradient descent...

Arunselvan Ramaswamy, et al.

11/27/2015

Shaping Proto-Value Functions via Rewards

In this paper, we combine task-dependent reward shaping and task-indepen...

Chandrashekar Lakshmi Narayanan, et al.

04/23/2015

Stability of Stochastic Approximations with 'Controlled Markov' Noise and Temporal Difference Learning

In this paper we present a 'stability theorem' for stochastic approximat...

Arunselvan Ramaswamy, et al.

03/31/2015

Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

We present for the first time an asymptotic convergence analysis of two ...

Prasenjit Karmakar, et al.

02/06/2015

Stochastic recursive inclusion in two timescales with an application to the Lagrangian dual problem

In this paper we present a framework to analyze the asymptotic behavior ...

Arunselvan Ramaswamy, et al.

02/06/2015

A Generalization of the Borkar-Meyn Theorem for Stochastic Recursive Inclusions

In this paper the stability theorem of Borkar and Meyn is extended to in...

Arunselvan Ramaswamy, et al.

01/08/2014

Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games

We consider the problem of finding stationary Nash equilibria (NE) in a ...

H. L Prasad, et al.
