I Introduction
Legged locomotion has been well studied for more than six decades, starting from the inception of the GE walking truck [1]. A variety of approaches have been explored, from inverted pendulum models [2], zero moment points [3], passivity based control [4, 5], capture points [6], and hybrid zero dynamics [7], to machine learning based approaches [8, 9]. A more recent trend has been the use of deep reinforcement learning (DRL) methods [10, 11] to determine optimal policies for efficient and robust walking behaviors in both bipeds and quadrupeds. DRL was successfully used to achieve walking in the quadrupedal robot Minitaur [10], where the control policy was trained via a well-known policy gradient based approach, proximal policy optimization (PPO) [12].
While all the approaches described above are applicable to many kinds of legged robots, an independent group of researchers has focused entirely on capturing symmetries in quadrupedal walking, and then utilizing these symmetries to obtain a low-dimensional representation for walking [13]. In essence, the focus here is on trajectory reconstruction for each of the joints of the robot from a minimal number of basis functions, called the kinematic motion primitives (kMPs) [14]. It was shown in [15] that only four to five primitives are enough to represent various locomotion patterns like walking, trotting, and cantering. Comparisons were even made with kMPs derived from a horse [13], showing a similarity of the kMPs irrespective of the gait used. This type of minimal representation of walking points to the fact that the joint angle trajectories do not contain "high-frequency" modulations. Alternative representations of these motion primitives, using canonical walking functions [16] and fixed-order Bezier polynomials [7], also indicate this low-dimensional behavior of walking. In fact, [16] showed that mean human walking data from the joints also had a high correlation with the time responses of linear spring-mass-damper systems. All these observations strongly indicate that a finite number of basic trajectories (or basis functions) are sufficient to realize the periodic joint trajectories that result in walking.
The walking community is currently divided on which approaches are best for achieving legged locomotion. Some use formal methods, which are model based: they determine the underlying principles of walking and then develop controllers that yield stable locomotion [4, 5, 7]. Others use model-free methods like RL (and DRL) to derive robust locomotion behaviors in both bipeds and quadrupeds [8, 9, 10, 17]. The latter methodology not only eliminates the difficult and often unpleasant task of identifying and modeling the robot, but also enables the computer to continuously adapt to changing environments. However, it is important to note that while the learned controller is model-free, the training is normally done in simulation, which is often not realistic. This reality gap is a well-known problem also faced by traditional model based optimization tools for walking [18, 19, 20].
One of the main goals of this paper is to approach this reality gap problem by using the tools employed in formal methods. Formal methods usually employ optimization tools only for generating the reference gaits (trajectories); these trajectories are then tracked at the joint level by model-free methods (e.g., PD control) [18, 21, 22]. Therefore, we view DRL based training as a trajectory generator (an optimization tool) rather than as the onboard compute platform for walking. This is similar to the method followed in [17], where RL was used to find the optimal set of parameters of the foot trajectory. We obtain a gait from DRL, extract kMPs from this gait, and then realize a set of derived behaviors, namely trot, walk, gallop, and bound, experimentally in Stoch. Trot and gallop were achieved in [10] by separate instances of training, while we were able to realize more than four behaviors with one instance of training. A pictorial representation of our approach is given in Fig. 2. This hybrid approach allows for a mechanistic way of generating gaits in complex legged locomotion: it combines data driven DRL based gait synthesis in simulation with the extraction of kMPs, which are then used for gait tracking in hardware with minimum computational overhead.
The paper is organized as follows: Section II introduces the deep reinforcement learning (DRL) framework for walking. We also briefly discuss the simulator and training framework used. Section III gives a detailed description of kinematic motion primitives (kMPs) and the associated kMP analysis of the DRL based gaits. We also show that the kMPs thus obtained can be used to generate other types of gaits like walk, gallop, and bound. Finally, Section IV describes the experimental results on our custom quadruped Stoch.
II Reinforcement Learning Framework for Quadrupedal Walking
In this section, we will study the deep reinforcement learning (DRL) framework used for realizing quadrupedal walking in simulation. We will also discuss the associated challenges with the learning methods, and the transferability issue from simulation to real hardware.
II-A Definitions
We formulate the problem of locomotion as a Markov Decision Process (MDP). An MDP is represented by a tuple $(\mathcal{S}, \mathcal{A}, P, r, \gamma)$. Here $\mathcal{S}$ is the set of robot states referred to as the state space, and $\mathcal{A}$ is the set of feasible actions referred to as the action space. $P : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0, 1]$ is the transition probability function that models the evolution of states based on actions, and $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ is the reinforcement or the reward obtained for the given state and action. $\gamma$ is called the discount factor, defined in the range $(0, 1)$.
The goal in RL is to achieve a policy that maximizes the expected cumulative reward over time. A policy, denoted as $\pi(a|s)$, is the probability distribution of the actions over the states. When dealing with huge state and action spaces, a parametrized policy is used, denoted as $\pi_\theta(a|s)$, with the parameters $\theta$. The optimal parameters and/or policy yield the maximum sum of cumulative rewards. We use the following to obtain the optimal policy $\pi^*$:

$\pi^* = \arg\max_{\pi} \; \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \left[ \sum_t \gamma^t \, r(s_t, a_t) \right]$  (1)

where $\rho_\pi$ is the marginalization of the state-action pair given the set of all possible trajectories following policy $\pi$. The optimal parameters are evaluated iteratively by taking gradient ascent steps in the direction of increasing cumulative reward. More details on the algorithm and the associated optimization technique are provided in Section II-C.
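As a concrete illustration of the objective in (1), the following minimal sketch (our own, not code from the original implementation) estimates the expected discounted return of a fixed policy from sampled rollouts:

```python
import numpy as np

def discounted_return(rewards, gamma=0.99):
    """Compute sum_t gamma^t * r_t for one sampled rollout."""
    g, total = 1.0, 0.0
    for r in rewards:
        total += g * r
        g *= gamma
    return total

def estimate_objective(rollouts, gamma=0.99):
    """Monte-Carlo estimate of E[sum_t gamma^t r_t] over sampled rollouts,
    i.e., the quantity maximized in (1) for a given policy."""
    return np.mean([discounted_return(r, gamma) for r in rollouts])
```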
II-B State and Action Space
We will briefly describe the quadrupedal model of Stoch. There are four identical legs and eight actuators in the robot, which are distributed equally among its joints. Each leg has a hip and a knee joint, and the actuators provide flexion-extension in each joint. Fig. 1 shows both the experimental and the simulated robot model. Details on the experimental model are in Section IV.
II-B1 State Space
The state for the training is represented by the angles, velocities, and torques of the active joints. The body orientation (in quaternions) is also included in the state space. The combined representation yields a 28-dimensional state space. Note that motor torques are typically treated as states in the RL framework.
II-B2 Action Space
Reference [23] demonstrated the effectiveness of choosing actuator joint angles as the action space. The planar four-bar linkage configuration of the legs, however, leads to self-colliding regions in the motor space. This creates a non-convex search space and degrades the learning efficiency. Hence, we chose to learn the legs' end-point positions in polar coordinates, represented as $\{r_i, \alpha_i\}$, where $i \in \{1, 2, 3, 4\}$ indexes the legs. The polar coordinates for the four legs collectively provide an 8-dimensional action space. For each $\{r_i, \alpha_i\}$, we can obtain the joint angles (both hip and knee of leg $i$) via an inverse kinematics solver. We will denote the angles obtained from the inverse kinematics solver as $\{\theta_i^{hip}, \theta_i^{knee}\}$. The polar coordinates can be restricted to a bounding box, thereby indirectly imposing angle limits on the joint angles.
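For illustration, the sketch below solves this inverse kinematics for an idealized two-link serial leg; Stoch's actual legs form a four-bar linkage, so its solver differs, and the link lengths here are only placeholders:

```python
import numpy as np

L1, L2 = 0.12, 0.12  # hypothetical thigh/shank lengths in meters (Stoch's
                     # leg segments are 120 mm, but its real leg is a
                     # four-bar linkage, not this serial chain)

def polar_to_joint_angles(r, alpha):
    """Map a leg end-point in polar coordinates (r, alpha), expressed in
    the hip frame, to hip and knee angles of a two-link serial leg."""
    # Law of cosines gives the knee bend (0 = fully extended).
    c = (r**2 - L1**2 - L2**2) / (2 * L1 * L2)
    theta_knee = np.arccos(np.clip(c, -1.0, 1.0))
    # Hip angle: polar direction minus the offset introduced by the knee bend.
    theta_hip = alpha - np.arctan2(L2 * np.sin(theta_knee),
                                   L1 + L2 * np.cos(theta_knee))
    return theta_hip, theta_knee
```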
II-C Network and Learning Algorithm
Since the states and actions are continuous, we seek algorithms that enable us to construct optimal policies that yield action values in the continuous space. Proximal Policy Optimization (PPO) [12] has proved to be very successful for obtaining optimal continuous action values [24]. It is an on-policy, model-free algorithm based on the actor-critic learning framework. An actor (also called the policy) decides an action given a state, and a critic aims to improve the policy by evaluating the action value function. Both actor and critic networks are parameterized by neural networks with parameters $\theta$ and $\phi$ respectively.
The learning algorithm samples episodes of maximum length $T$ using the current policy. The experience tuples, consisting of $(s_t, a_t, r_t, s_{t+1})$ with $t \in \{0, 1, \dots, T\}$, are stored and used for on-policy updates of the parameters. The policy gradient method [25] gives the objective function to optimize the policy parameters, given as:
$L^{PG}(\theta) = \hat{\mathbb{E}}_t \left[ \log \pi_\theta(a_t | s_t) \, \hat{A}_t \right]$  (2)

where $\hat{A}_t$ is called the estimated advantage function. It is given by $\hat{A}_t = r_t + \gamma V(s_{t+1}) - V(s_t)$, where $V(s)$ is the value function at state $s$. Furthermore, in order to account for the disparity between the behavioral policy $\pi_{\theta_{old}}$ and the current updating policy $\pi_\theta$, the advantage estimate is multiplied by the importance sampling ratio

$r_t(\theta) = \dfrac{\pi_\theta(a_t | s_t)}{\pi_{\theta_{old}}(a_t | s_t)}$  (3)
Lastly, in order to avoid large and sudden policy variations during training, PPO penalizes high KL-divergence and solves an unconstrained optimization problem. Hence the final objective for the PPO algorithm is given by

$L^{KLPEN}(\theta) = \hat{\mathbb{E}}_t \left[ r_t(\theta) \, \hat{A}_t - \beta \, \mathrm{KL}\big[ \pi_{\theta_{old}}(\cdot | s_t), \pi_\theta(\cdot | s_t) \big] \right]$  (4)

where $\beta$ is an adaptive penalty coefficient. Given a target KL-divergence $d_{targ}$, the coefficient $\beta$ for the next policy update is calculated as:

$\beta \leftarrow \beta / 2$,  if $\mathrm{KL}\big[\pi_{\theta_{old}}, \pi_\theta\big] < d_{targ} / 1.5$

$\beta \leftarrow 2\beta$,  if $\mathrm{KL}\big[\pi_{\theta_{old}}, \pi_\theta\big] > 1.5 \, d_{targ}$
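The adaptive penalty rule is straightforward to implement; a minimal sketch (ours, operating on hypothetical batch inputs) is:

```python
import numpy as np

def kl_penalized_objective(ratio, advantage, kl, beta):
    """Sample estimate of the PPO KL-penalized objective (4):
    mean[ r_t(theta) * A_t ] - beta * mean[ KL ]."""
    return np.mean(ratio * advantage) - beta * np.mean(kl)

def update_beta(beta, mean_kl, d_targ):
    """Adaptive penalty coefficient update used by PPO [12]: shrink the
    penalty when the policy moved too little, grow it when it moved too
    much relative to the target KL-divergence."""
    if mean_kl < d_targ / 1.5:
        beta /= 2.0
    elif mean_kl > d_targ * 1.5:
        beta *= 2.0
    return beta
```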
We used the open-source implementation of PPO by TensorFlow Agents [26], which creates a symbolic representation of the computation graph. The implementation is highly parallelized and performs full-batch gradient ascent updates, using the Adam optimizer [27], on the batch of data collected from multiple environment instances.

II-D Simulation Framework
We used the Pybullet [28] simulator, built on top of the Bullet3 physics engine, for a realistic model simulation. A three-dimensional computer-aided-design (CAD) model was developed using SolidWorks [29] to capture the kinematics and inertial properties of the robot. This model is transferred to Pybullet by using a Universal Robot Description Format [30] (URDF) exporter. In addition, the actual masses of all the links, actuator force limits, joint limits, and kinematic-loop constraints of the flexural joints are measured and manually updated in the URDF file for a more realistic simulation (refer to Fig. 1).
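A minimal sketch of such a Pybullet setup is shown below; the file name stoch.urdf is a placeholder for the exported URDF:

```python
import pybullet as p
import pybullet_data

# Connect to the physics server (GUI for visualization, DIRECT for headless).
client = p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.8)
plane = p.loadURDF("plane.urdf")

# "stoch.urdf" is a placeholder for the URDF exported from SolidWorks;
# masses, joint limits, and force limits would be edited in this file.
robot = p.loadURDF("stoch.urdf", basePosition=[0, 0, 0.3])

# Step the simulation and read joint states for the RL state vector.
p.stepSimulation()
num_joints = p.getNumJoints(robot)
joint_states = p.getJointStates(robot, list(range(num_joints)))
```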
II-E Reward Function
We designed a reward function that gives positive reinforcement for increasing speed of the robot's base, and simultaneously penalizes high energy consumption. The agent receives a scalar reward after each action step according to the reward function

$r = w_{vel} \cdot \Delta x - w_E \cdot \Delta E$  (5)

Here $\Delta x$ is the difference between the current and the previous base position along the $x$ axis, $\Delta E$ is the energy spent by the actuators for the current step, and $w_{vel}$ and $w_E$ are weights corresponding to each term. $\Delta E$ is computed as

$\Delta E = \sum_{i=1}^{4} \left( |\tau_i^{hip} \cdot \omega_i^{hip}| + |\tau_i^{knee} \cdot \omega_i^{knee}| \right) \cdot \Delta t$  (6)

where $\tau_i^{hip}, \tau_i^{knee}$ are the motor torques, and $\omega_i^{hip}, \omega_i^{knee}$ are the motor velocities of leg $i$ respectively.
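A sketch of this reward computation is given below; the weight values and time step are illustrative placeholders, not the tuned values used in training:

```python
import numpy as np

W_VEL, W_E = 1.0, 0.05  # hypothetical weights; the paper's tuned values
                        # are not reproduced here
DT = 0.01               # assumed control time step in seconds

def step_reward(x_curr, x_prev, torques, velocities):
    """Scalar reward per (5)-(6): forward progress along x minus an
    energy penalty. torques/velocities cover all eight actuated joints."""
    delta_x = x_curr - x_prev
    delta_e = np.sum(np.abs(np.asarray(torques) *
                            np.asarray(velocities))) * DT
    return W_VEL * delta_x - W_E * delta_e
```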
II-F Gait Generation and Simulation Results
The actor and critic networks in the learning algorithm consist of two fully connected hidden layers, with the first and second layers consisting of 200 and 100 nodes respectively. Activation units in the actor and critic networks are {ReLU, ReLU, tanh} and {ReLU, ReLU, linear} respectively. Other hyperparameters are listed in Table I. The Pybullet simulator is configured to spawn multiple agents for faster collection of samples; in our case, 30 agents were used in parallel threads.
Quadrupedal animals typically have eight types of gaits [31, 32]. The work in [11] used reference trajectories, and [10] used an open-loop signal, to learn a specific gait. In this work, however, we learned everything from scratch. To learn a specific gait type, we leveraged gait symmetries and put hard constraints on the foot positions. For example, in trot, diagonally opposite legs are in sync; hence, we provide the same $\{r_i, \alpha_i\}$ to both legs of a diagonal pair, as shown in the sketch below. This also reduces the dimension of the action space from 8 to 4.
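A minimal sketch of this action-space reduction is given below, with a leg-indexing convention we assume purely for illustration:

```python
import numpy as np

# Assumed leg indexing (our convention, not necessarily the paper's):
# 0 = front-left, 1 = front-right, 2 = back-left, 3 = back-right.
# In trot, each diagonal pair of legs moves in sync.
DIAGONAL_PAIRS = [(0, 3), (1, 2)]

def expand_trot_action(action_4d):
    """Expand a 4D trot action [r_a, alpha_a, r_b, alpha_b] to the full
    8D action by copying each (r, alpha) to both legs of a diagonal pair."""
    r_a, alpha_a, r_b, alpha_b = action_4d
    full = np.zeros(8)  # layout: [r_0, alpha_0, ..., r_3, alpha_3]
    for (i, j), (r, alpha) in zip(DIAGONAL_PAIRS,
                                  [(r_a, alpha_a), (r_b, alpha_b)]):
        full[2 * i], full[2 * i + 1] = r, alpha
        full[2 * j], full[2 * j + 1] = r, alpha
    return full
```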
The proposed approach yields walking gaits that are efficient, fast, and robust in simulation. The learning algorithm runs for a maximum of a few million steps, and the observed training time is on the order of hours on an Intel Core i7 @ 3.5 GHz machine with 32 GB RAM.
TABLE I: Hyperparameters of the learning algorithm

Entity                             Value
Discount factor ($\gamma$)
Learning rate (actor, critic)
Training steps per epoch
KL target ($d_{targ}$)
End-point position limit ($\alpha$)
End-point position limit ($r$)     cm, cm
II-G Challenges
There are two main challenges in realizing DRL based gaits in quadrupeds:
II-G1 Training for model and environment updates
The training framework has a large set of hyperparameters (a small subset of them is shown in Table I) that require tuning in order to obtain desirable gaits. These gaits are indeed robust and efficient, but any update to the model or change in the gait constraints requires retuning of the hyperparameters. Retuning is not intuitive and requires a series of training iterations. This is a common problem even in nonlinear model based walking optimizations [33], where small variations in constraints often lead to infeasible solutions.
II-G2 Transferability on real hardware
Transferring these learned gaits to real hardware faces numerous challenges, mainly due to uncertainties from unmodeled compliance, imperfect sensors, and actuator saturation. In addition, the inertial and electrical properties of the actuators vary from one joint to another. We observed that imposing very conservative constraints, like tighter velocity and angle limits, resulted in longer training times, which often converged to very unrealistic gaits.
To put these challenges in context, it is necessary to review the methodology followed in [10]. The authors proposed to develop an accurate robot model, identify the actuator parameters, and study the latencies of the system. This procedure indeed resulted in more realistic gaits, but required multiple instances of training and is highly sensitive to any update in hardware. In addition, DNNs are computationally intensive (an NVIDIA Jetson TX2 processor was used). Therefore, as a deviation from this approach, we primarily focus on a methodology that is more robust to hardware changes and also computationally less intensive. As we add more tasks and objectives for a diverse set of environments, it is important to focus on using a compressed set of walking parameters with minimal retraining. We will discuss this in detail in the next section.
III Realizing Behaviors through kMPs
The kinematic motion primitives (kMPs) are invariant waveforms, sufficient to explain a wide variety of complex coordinated motions, including discrete (e.g., reaching for a target with one hand) and periodic behaviors like walking and running [15]. This concept of motion primitives arose from biological observations [15, 34, 13]. In essence, it was observed that biological motions can be described to a good extent with the combination of a small number of lower dimensional behaviors [14, 35].
The extraction of kMPs is primarily conducted via Principal Component Analysis (PCA) [36, 34, 13, 37, 38]. PCA is useful when there is extensive data from walking demonstrations, which can be processed to extract the kMPs. These kMPs can then be adapted/transformed to realize walking in robots. In essence, principal components yield the directions in the joint space with the highest variance for the given data. A linear transformation of the joint space onto these principal components results in dimensionality reduction. A more detailed exploration of this analysis is given in [36] and [39].
We will first describe the methods used for the data analysis, and then the associated principal components. kMPs for the different gaits (trot, pace, and bound) will be extracted and compared. Towards the end of this section, we will also provide a statistical comparison between the different kMPs.
III-A Extraction of kMPs
Having obtained walking in simulation based on a trained policy, we record the joint angles and body orientation of the robot over a large number of steps. The data contain a set of joint trajectories for each trial. The recorded data are then processed as follows. First, we remove a small percentage of the data set at the front and rear to avoid transient cases. Second, we divide the data into segments by peak detection. All these data segments are then shifted and scaled w.r.t. time in order to align the end points. By polynomial interpolation, we reconstruct trajectories in between the data points. With this reconstruction, we can directly obtain the mean data (the average of the data at every time instant), as sketched below.
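A minimal sketch of this preprocessing pipeline, with illustrative trimming and resampling choices, is:

```python
import numpy as np
from scipy.signal import find_peaks

def mean_gait_cycle(joint_traj, trim_frac=0.1, n_samples=200):
    """Average a long periodic joint trajectory into one mean gait cycle.
    trim_frac and n_samples are illustrative choices, not the paper's."""
    traj = np.asarray(joint_traj, dtype=float)
    # 1. Trim transients at both ends.
    n = len(traj)
    traj = traj[int(trim_frac * n): n - int(trim_frac * n)]
    # 2. Segment into cycles at the peaks.
    peaks, _ = find_peaks(traj)
    segments = [traj[a:b] for a, b in zip(peaks[:-1], peaks[1:])]
    # 3. Resample every segment onto a common phase grid (0..1) so the
    #    end points align, then average across segments.
    phase = np.linspace(0.0, 1.0, n_samples)
    resampled = [np.interp(phase, np.linspace(0, 1, len(s)), s)
                 for s in segments]
    return np.mean(resampled, axis=0)
```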
Given the mean data, let $N$ be the total number of points for each joint. Denote this data by $q_i \in \mathbb{R}^N$, $i = 1, 2, \dots, n$, where $n$ is the number of actuated joints ($n = 8$ for Stoch). We get a matrix of joint angle values as $Q = [q_1, q_2, \dots, q_n] \in \mathbb{R}^{N \times n}$. Let $\bar{Q}$ be the zero-mean data obtained from $Q$. We compute the covariance as

$\Sigma = \dfrac{1}{N - 1} \bar{Q}^T \bar{Q}$  (7)

The principal components are precisely the eigenvectors $e_1, e_2, \dots, e_n$ of $\Sigma$. Hence, for the eight joint angles, we have eight principal components. Given the principal components, we have the kMPs defined as

$kMP_i = \dfrac{\bar{Q} \, e_i}{\| \bar{Q} \, e_i \|_\infty}$  (8)

where $\| \cdot \|_\infty$ (infinity norm) is used to normalize the magnitude. We typically use fewer components ($k = 4$ for Stoch) for reconstructing the joint angle trajectories from the kMPs, since they account for over 99% of the cumulative percentage variance (see the table in Fig. 3).
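The extraction in (7)-(8) amounts to a few lines of linear algebra; a sketch following the notation above is:

```python
import numpy as np

def extract_kmps(Q, k=4):
    """Extract k kinematic motion primitives from an (N x n) matrix of
    joint angle data per (7)-(8). Returns the (N x k) kMP matrix, the
    per-joint means, and the retained eigenvectors."""
    Q = np.asarray(Q, dtype=float)
    mean = Q.mean(axis=0)
    Q_bar = Q - mean                       # zero-mean data
    cov = Q_bar.T @ Q_bar / (len(Q) - 1)   # covariance, eq. (7)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]      # sort by decreasing variance
    eigvecs = eigvecs[:, order[:k]]
    proj = Q_bar @ eigvecs                 # project onto principal axes
    kmps = proj / np.abs(proj).max(axis=0) # infinity-norm normalization
    return kmps, mean, eigvecs
```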
Having obtained the kMPs, we reconstruct the original joint trajectories as follows. Denote the matrix of kMPs as $K = [kMP_1, kMP_2, \dots, kMP_k] \in \mathbb{R}^{N \times k}$, where $k$ is typically less than $n$ (for Stoch $k = 4$). We have the final reconstruction of the joint angles as

$Q_d = K \cdot S + Q_o$  (9)

where $Q_d$ is the matrix of desired joint angle values, $S \in \mathbb{R}^{k \times n}$ is the coefficient matrix, also called the kMP synergy map, that maps the kMPs to the joint space, and $Q_o$ is the zero-mean offset matrix (derived from $Q$), which is added back to the joint angles.
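Reconstruction per (9) is then a single matrix product; a sketch consistent with the extraction code above is:

```python
import numpy as np

def reconstruct_joint_angles(kmps, synergy, mean):
    """Reconstruct desired joint trajectories per (9): Q_d = K*S + Q_o,
    where Q_o repeats the per-joint mean at every time step."""
    K = np.asarray(kmps)                   # (N x k) kMP matrix
    S = np.asarray(synergy)                # (k x n) synergy map
    Q_o = np.tile(mean, (K.shape[0], 1))   # (N x n) offset matrix
    return K @ S + Q_o
```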
III-B Discussion & Comparison
Fig. 3 shows the result of reconstruction from the kMPs. A table showing the cumulative percent variance of the original trajectory with reconstruction from each kMP is also provided. It can be verified that only four kMPs were sufficient for the final reconstruction¹. Fig. 4 shows the comparison between the end foot trajectories obtained from the simulation and those from the reconstruction. It must be noted that the kMPs yield trajectories that are as good as the time-averaged trajectories of the original data (which is eight dimensional). In addition, with kMPs, there is existing literature on obtaining derived behaviors like walk, trot, and gallop [15, 37], which will be explained further in the next section.

¹It is important to note that this observation is true for any continuous trajectory that does not have "high-frequency" oscillations or even large slopes. This property was heavily utilized to reconstruct walking trajectories by using ellipses in [17] and simple basis functions in [8].
It was shown in [13] that biological quadrupeds (like horses) inherently contain some basic patterns that are consistent across different gaits and models. Based on this result, we compared the trot data obtained from our simulation with the trot data obtained from the horse, and the results are in Fig. 5. We observed that there was a high correlation not only between the kMPs obtained from the learned gait and the horse gait (Fig. 5A), but also between the kMPs obtained from different learned gaits (Fig. 5B). Table III provides more statistical details, where it is shown that the cross-covariance w.r.t. the first two kMPs is at least 0.74. These observations point to the fact that even the kMPs derived from the RL based simulation follow very similar patterns, irrespective of the type of gait and the model used.
IV Experimental Results
In this section, we discuss the experiments conducted on the quadrupedal robot Stoch. We first describe the hardware, and then the methodologies used to realize walking, along with the results.
IV-A Hardware Description of Stoch
Stoch is a quadrupedal robot (see Fig. 1) designed and developed in-house at IISc, Bangalore. Model and actuator specifications are summarized in Table II. The legs are assembled with lightweight carbon fiber (CF) tubes for the linkages, and 3D-printed polylactic acid (PLA) parts for the joints. The servo motors (Kondo HV and Robokits Ultra Torque RKI series) are mounted in a PLA housing. For measuring the body orientation, we used an IMU (Sparkfun MPU series). For measuring the joint angles, we used magnetic rotary (MR) encoders (Bourns EMS22A). PWM signals to the servo motors are sent via an Adafruit 16-channel 12-bit PWM driver (PCA9685). The main computation platform is a Raspberry Pi 3B. Electric power is supplied from lithium polymer batteries (an 11.1 V (3S) pack for the hip motors, and a 7.4 V (2S) pack for the knee motors).
TABLE II: Hardware specifications of Stoch

total leg length        230 mm
leg segment length      120 mm
total mass              3.0 kg
min./max. hip joint     -45° / 45°
min./max. knee joint    -70° / 70°
max hip torque          29 kg-cm
max knee torque         32 kg-cm
max hip speed           461°/s
max knee speed          461°/s
motor power (servo)     16 W
TABLE III: Cross-covariance and delay between kMPs

                Cross-covariance              Delay
                1st    2nd    3rd    4th     1st    2nd    3rd    4th
A.  RT-ET       0.97   0.74   0.93   0.79    0      0.02   0.02   0.04
    RT-HT       0.99   0.89   0.86   0.82    0      0.02   0.05   0.04
    ET-HT       0.99   0.77   0.88   0.87    0      0.15   0.08   0.03
B.  RT-RP       0.87   0.94   0.91   0.70    0.03   0.07   0.08   0.16
    RT-RB       0.81   0.84   0.72   0.88    0      0.02   0.05   0.04
    RP-RB       0.96   0.92   0.85   0.72    0.01   0.00   0.08   0.18
IV-B Synergy Matrices
The general procedure to obtain the walking gaits for Stoch is as follows. We first reconstruct the trot gait from the kMPs by using the synergy matrix $S$. If $Q_t$ is the trot data obtained from the simulation, then we obtain $S$ as

$S = K^{+} (Q_t - Q_o)$  (10)

where $K^{+}$ is the pseudo-inverse of $K$. This is computed offline. Synergy matrices for other types of gaits are obtained by modifying $Q_t$. For example, to obtain the bound gait, we modify $Q_t$ (by time shifting) in such a way that the front legs are in phase (respectively, the hind legs). We have also reshaped the end-point trajectories by small amounts, and obtained the corresponding synergy matrices. A sketch of this procedure is given below. Since we are manually modifying the data, a better approach would be to use optimization, which will be explored in future work.
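A sketch of the synergy computation in (10), together with an illustrative time-shifting step for deriving a bound-like target from trot data, follows; the column indices are hypothetical:

```python
import numpy as np

def compute_synergy(kmps, Q_target, mean):
    """Synergy map per (10): S = K^+ (Q_t - Q_o), computed offline."""
    K = np.asarray(kmps)                   # (N x k)
    Q_o = np.tile(mean, (K.shape[0], 1))   # (N x n) offset
    return np.linalg.pinv(K) @ (np.asarray(Q_target) - Q_o)

def bound_from_trot(Q_trot, left_front_cols, left_hind_cols, half_period):
    """Illustrative derivation of a bound-like target from trot data: in
    trot the two front legs are half a cycle out of phase, so shifting
    one front leg (and one hind leg) by half a period puts each
    front/hind pair in phase. Column indices are hypothetical."""
    Q = np.asarray(Q_trot).copy()
    Q[:, left_front_cols] = np.roll(Q[:, left_front_cols],
                                    half_period, axis=0)
    Q[:, left_hind_cols] = np.roll(Q[:, left_hind_cols],
                                   half_period, axis=0)
    return Q
```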
IV-C Results
Having obtained the kMPs and the synergy matrices for the trot, walk, gallop, and bound gaits, we reconstruct the joint angle trajectories by using (9) in hardware. These joint angle trajectories are then used as command positions for each of the actuators. Video results are shown at https://youtu.be/swcStUm0Nuk.
Due to space constraints, we only show the experimental data for the trot and its derived gaits. We have also provided kMPs extracted directly from the experimental trajectories; the results are in Fig. 5A and Table III. It can be verified that the experimental kMPs also have a high correlation with the horse and RL kMPs. Table IV compares the speeds of the various gaits obtained.
TABLE IV: Speeds of the gaits

Gait type   Trot    Walk    Gallop  Bound   Modified Trot 1   Modified Trot 2
Speed       0.60    0.51    0.51    0.55    0.62              0.59
To put these observations in perspective, it is important to discuss the kMPs derived from horse data further. We cannot realize walking by using the horse kMPs alone: based on (9), we would need to determine the right synergy matrix $S$ that suits the given set of kMPs. It is also important to note that the kMPs derived from horses may not be the optimal choice for our quadruped (there are infinitely many valid kMPs). On the other hand, since we already have (locally) optimal trajectories obtained from training, we focus on computing the synergy matrix from the simulation kMPs. With these kMPs we were able to demonstrate not only trotting, but also other derived behaviors like walking, galloping, and bounding.
V Conclusion
We demonstrated walk, trot, gallop, and bound gaits in the quadruped robot Stoch experimentally. The key contribution was the methodology for transferring walking gaits, obtained via DRL based training, from simulation to real hardware. Our approach was mainly motivated by [13] and [15], which pointed to the fact that motion primitives have a very similar underlying structure that is independent of the type of gait and the model. Small variations in gaits and model do not require retraining via DRL methods from scratch; kMPs are sufficient for such scenarios. On the other hand, large variations in models and environments require a thorough search, thereby requiring learning based approaches. Since we mainly focused on data driven kMPs in this paper, future work will involve obtaining these motion primitives from continuous functions like polynomials, canonical walking functions, or other basis functions that are equally suitable for generating walking gaits.
Acknowledgment
We acknowledge Ashish Joglekar, Balachandra Hegde, Ajay G and Abhimanyu for the PCB design and software development.
References
 [1] “GE Walking Truck,” https://youtu.be/ZMGCFLEYakM.
 [2] M. H. Raibert, “Legged robots,” Communications of the ACM, vol. 29, no. 6, pp. 499–514, 1986.
 [3] M. Vukobratović and B. Borovac, “Zero-moment point—thirty five years of its life,” International Journal of Humanoid Robotics, vol. 1, no. 01, pp. 157–173, 2004.
 [4] T. McGeer, “Passive dynamic walking,” The International Journal of Robotics Research, vol. 9, no. 2, pp. 62–82, 1990. [Online]. Available: https://doi.org/10.1177/027836499000900206
 [5] S. H. Collins, M. Wisse, and A. Ruina, “A three-dimensional passive-dynamic walking robot with two legs and knees,” The International Journal of Robotics Research, vol. 20, no. 7, pp. 607–615, 2001.
 [6] J. Pratt, J. Carff, S. Drakunov, and A. Goswami, “Capture point: A step toward humanoid push recovery,” in 2006 6th IEEE-RAS International Conference on Humanoid Robots, Dec 2006, pp. 200–207.
 [7] E. R. Westervelt, J. W. Grizzle, and D. E. Koditschek, “Hybrid zero dynamics of planar biped walkers,” IEEE Transactions on Automatic Control, vol. 48, no. 1, pp. 42–56, Jan 2003.
 [8] R. Tedrake, T. W. Zhang, and H. S. Seung, “Stochastic policy gradient reinforcement learning on a simple 3d biped,” in Intelligent Robots and Systems, 2004.(IROS 2004). Proceedings. 2004 IEEE/RSJ International Conference on, vol. 3. IEEE, pp. 2849–2854.
 [9] S. Wang, J. Braaksma, R. Babuska, and D. Hobbelen, “Reinforcement learning control for biped robot walking on uneven surfaces,” in The 2006 IEEE International Joint Conference on Neural Network Proceedings, July 2006, pp. 4173–4178.
 [10] J. Tan, T. Zhang, E. Coumans, A. Iscen, Y. Bai, D. Hafner, S. Bohez, and V. Vanhoucke, “Simtoreal: Learning agile locomotion for quadruped robots,” CoRR, vol. abs/1804.10332, 2018. [Online]. Available: http://arxiv.org/abs/1804.10332
 [11] Z. Xie, G. Berseth, P. Clary, J. Hurst, and M. van de Panne, “Feedback control for cassie with deep reinforcement learning,” CoRR, vol. abs/1803.05580, 2018. [Online]. Available: http://arxiv.org/abs/1803.05580
 [12] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” CoRR, vol. abs/1707.06347, 2017. [Online]. Available: http://arxiv.org/abs/1707.06347
 [13] F. L. Moro, A. Spröwitz, A. Tuleu, M. Vespignani, N. G. Tsagarakis, A. J. Ijspeert, and D. G. Caldwell, “Horse-like walking, trotting, and galloping derived from kinematic motion primitives (kMPs) and their application to walk/trot transitions in a compliant quadruped robot,” Biological Cybernetics, vol. 107, no. 3, pp. 309–320, 2013.
 [14] T. D. Sanger, “Human arm movements described by a low-dimensional superposition of principal components,” Journal of Neuroscience, vol. 20, no. 3, pp. 1066–1072, 2000.
 [15] F. L. Moro, N. G. Tsagarakis, and D. G. Caldwell, “On the kinematic motion primitives (kMPs): theory and application,” Frontiers in Neurorobotics, vol. 6, p. 10, 2012.
 [16] A. D. Ames, “Human-inspired control of bipedal walking robots,” IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1115–1130, May 2014.
 [17] N. Kohl and P. Stone, “Policy gradient reinforcement learning for fast quadrupedal locomotion,” in Robotics and Automation, 2004. Proceedings. ICRA’04. 2004 IEEE International Conference on, vol. 3. IEEE, pp. 2619–2624.
 [18] J. Reher, E. A. Cousineau, A. Hereid, C. M. Hubicki, and A. D. Ames, “Realizing dynamic and efficient bipedal locomotion on the humanoid robot durus,” in 2016 IEEE International Conference on Robotics and Automation (ICRA), May 2016, pp. 1794–1801.
 [19] S. Kuindersma, R. Deits, M. Fallon, A. Valenzuela, H. Dai, F. Permenter, T. Koolen, P. Marion, and R. Tedrake, “Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot,” Autonomous Robots, vol. 40, no. 3, pp. 429–455, Mar 2016. [Online]. Available: https://doi.org/10.1007/s10514-015-9479-3
 [20] A. Hereid and A. D. Ames, “Frost: Fast robot optimization and simulation toolkit,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Vancouver, BC, Canada: IEEE/RSJ, Sep. 2017.
 [21] K. Sreenath, H.-W. Park, I. Poulakakis, and J. W. Grizzle, “A compliant hybrid zero dynamics controller for stable, efficient and fast bipedal walking on MABEL,” The International Journal of Robotics Research, vol. 30, no. 9, pp. 1170–1193, 2011. [Online]. Available: https://doi.org/10.1177/0278364910379882
 [22] S. N. Yadukumar, M. Pasupuleti, and A. D. Ames, “From formal methods to algorithmic implementation of human inspired control on bipedal robots,” in Algorithmic Foundations of Robotics X, E. Frazzoli, T. Lozano-Perez, N. Roy, and D. Rus, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 511–526.
 [23] X. B. Peng and M. van de Panne, “Learning locomotion skills using deeprl: Does the choice of action space matter?” CoRR, vol. abs/1611.01055, 2016. [Online]. Available: http://arxiv.org/abs/1611.01055

 [24] P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup, and D. Meger, “Deep reinforcement learning that matters,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2–7, 2018, 2018. [Online]. Available: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16669
 [25] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” in Advances in Neural Information Processing Systems 12. MIT Press, 2000, pp. 1057–1063.
 [26] D. Hafner, J. Davidson, and V. Vanhoucke, “Tensorflow agents: Efficient batched reinforcement learning in tensorflow,” arXiv preprint arXiv:1709.02878, 2017.
 [27] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014. [Online]. Available: http://arxiv.org/abs/1412.6980
 [28] “Pybullet,” https://pybullet.org/wordpress/.
 [29] “Solidworks,” https://www.solidworks.com.
 [30] “URDF,” https://wiki.ros.org/urdf.
 [31] D. Owaki and A. Ishiguro, “A quadruped robot exhibiting spontaneous gait transitions from walking to trotting to galloping,” Scientific reports, vol. 7, no. 1, p. 277, 2017.
 [32] Y. Fukuoka, Y. Habu, and T. Fukui, “A simple rule for quadrupedal gait generation determined by leg loading feedback: a modeling study,” Scientific Reports, vol. 5, p. 8169, Feb 2015, article. [Online]. Available: http://dx.doi.org/10.1038/srep08169
 [33] J. P. Reher, A. Hereid, S. Kolathaya, C. M. Hubicki, and A. D. Ames, “Algorithmic foundations of realizing multicontact locomotion on the humanoid robot DURUS,” in The International Workshop on the Algorithmic Foundations of Robotics (WAFR), 2016. [Online]. Available: http://ames.caltech.edu/WAFR_2016_multicontact.pdf
 [34] F. L. Moro, N. G. Tsagarakis, and D. G. Caldwell, “A human-like walking for the compliant humanoid COMAN based on CoM trajectory reconstruction from kinematic motion primitives,” in Humanoid Robots (Humanoids), 2011 11th IEEE-RAS International Conference on. IEEE, 2011, pp. 364–370.
 [35] G. J. Stephens, B. JohnsonKerner, W. Bialek, and W. S. Ryu, “Dimensionality and dynamics in the behavior of c. elegans,” PLoS computational biology, vol. 4, no. 4, p. e1000028, 2008.
 [36] B. Lim, S. Ra, and F. C. Park, “Movement primitives, principal component analysis, and the efficient generation of natural motions,” in Robotics and Automation, 2005. ICRA 2005. Proceedings of the 2005 IEEE International Conference on. IEEE, 2005, pp. 4630–4635.
 [37] A. T. Sprowitz, A. Tuleu, A. J. Ijspeert et al., “Kinematic primitives for walking and trotting gaits of a quadruped robot with compliant legs,” Frontiers in computational neuroscience, vol. 8, p. 27, 2014.
 [38] I. Jolliffe, “Principal component analysis,” in International encyclopedia of statistical science. Springer, 2011, pp. 1094–1096.
 [39] S. Kim and F. C. Park, “Fast robot motion generation using principal components: Framework and algorithms,” IEEE Trans. Industrial Electronics, vol. 55, no. 6, pp. 2506–2516, 2008.