More than three million older adults in the United States are treated for fall injuries every year. In 2015, the medical costs of falls amounted to more than $50 billion. Compounding the direct injuries, fall-related accidents have a long-lasting impact because falling once doubles one's chances of falling again. Even after a successful recovery, many older adults develop a fear of falling, which may lead them to reduce their everyday activities. When a person becomes less active, their health declines, which in turn increases the risk of another fall.
Robotic assistive walking devices or exoskeletons are designed to improve the user's ability to ambulate. Previous work has shown that these devices can increase gait stability and efficiency when worn by older adults or people with disabilities [7, 34, 28, 6]. In this work, we explore the possibility of augmenting existing assistive walking devices with the capability to prevent falls, while respecting the functional and ergonomic constraints of the device.
Designing a control policy to prevent falls on an existing wearable robotic device poses multiple challenges. First, the control policy must run in real time with the limited sensing and actuation capabilities dictated by the walking device. Second, a large dataset of human falling motions is difficult to acquire and, to date, unavailable to the public, which imposes fundamental obstacles for learning-based approaches. Lastly, and perhaps most importantly, the development and evaluation of a fall-prevention policy depend on intimate interaction with human users. The challenge of modeling realistic human behaviors in simulation is daunting, but the risk of testing on real humans is even greater.
We tackle these issues with model-free reinforcement learning (RL) in simulation, both to train a fall-prevention policy that operates on the walking device in real time and to model human locomotion under disturbances. Model-free RL is particularly appealing for learning a fall-prevention policy because the problem involves non-differentiable dynamics and lacks existing examples to imitate. In addition, as demonstrated by recent work on learning policies for human motor skills [21, 36], model-free RL provides a simple and automatic approach to solving under-actuated control problems with contacts, as is the case for human locomotion. To ensure the validity of these models, we compare the key characteristics of human gait under disturbances to those reported in the biomechanics literature [32, 31].
Specifically, we propose a framework to automate the process of developing a fall predictor and a recovery policy for an assistive walking device, utilizing only the onboard sensors and actuators. When the fall predictor determines that a fall is imminent based on the current state of the user, the recovery policy is activated to prevent the fall and deactivated once a stable gait cycle is recovered. The core component of this work is a robust human walking policy that can withstand a moderate level of perturbations. We use this walking policy to provide training data for the fall predictor, as well as to teach the recovery policy how best to modify the person's gait to prevent falling.
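The activation logic described above can be sketched as a simple gating function. This is a minimal sketch, not the paper's implementation: the names `fall_predictor` and `recovery_policy`, and the 0.5 threshold, are illustrative placeholders.

```python
def control_step(device_state, fall_predictor, recovery_policy, threshold=0.5):
    """Activate the recovery policy only when a fall is predicted.

    fall_predictor: maps onboard sensor readings to a fall probability.
    recovery_policy: maps the same readings to hip actuator torques.
    """
    p_fall = fall_predictor(device_state)
    if p_fall > threshold:
        return recovery_policy(device_state)  # fall imminent: assist
    return [0.0, 0.0]                         # steady gait: stay passive
```

In deployment, this gate would run at the device's control rate, so the recovery policy only consumes actuation effort when a fall is actually predicted.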
Our evaluation shows that the human policy generates walking sequences similar to real-world human walking data, both with and without perturbation. We also quantitatively evaluate the stability of the recovery policy against various perturbations. In addition, our method provides a quantitative way to evaluate design choices for assistive walking devices. We analyze and compare the performance of six different configurations of sensors and actuators, enabling engineers to make informed design decisions that account for control capability prior to the manufacturing process.
II Related Work
II-A Control of assistive devices
Many researchers have developed control algorithms for robotic assistive walking devices. As reported by Yan et al., existing methods can be broadly classified into trajectory tracking controllers [30, 11, 2] and model-based controllers [5, 13, 29]. Although trajectory tracking approaches can easily be applied to regular walking cycles, it is unclear how to generate trajectories for unexpected situations caused by perturbations. On the other hand, model-based controllers can capture adaptive behaviors by developing control laws with respect to external perturbations, but they often require an accurate dynamic model. Based on a neuromuscular model of the human lower limbs, Thatte et al. [28, 27] presented an optimization-based framework that converges better than other control strategies, such as quasi-stiffness control, minimum jerk swing control, virtual constraint control, and impedance control. However, all of these methods focus on controlling regular walking cycles rather than adjusting to external perturbations. More recently, Thatte et al. presented a real-time trip-avoidance algorithm that estimates future knee and hip positions using a Gaussian filter and updates the trajectory with a fast quadratic program solver.
II-B Deep RL for assistive devices
Many researchers have demonstrated learning effective policies for high-dimensional control problems using deep reinforcement learning (deep RL) techniques [23, 24]. However, there has been limited work at the intersection of deep RL and control of assistive devices. Hamaya et al. presented a model-based reinforcement learning algorithm to train a policy for a handheld device that takes muscular effort, measured by electromyography (EMG) signals, as input. Their work requires collecting user interaction data on the real-world device to build a model of the user's EMG patterns; however, collecting a large amount of data for lower-limb assistive devices is less practical due to safety concerns. Another recent work, by Bingjing et al., developed an adaptive-admittance model-based control algorithm for a hip-knee assistive device; the reinforcement learning aspect of that work focused on learning the parameters of the admittance model. In contrast, our method is agnostic to the assistive walking device and can be used to augment any device that allows for feedback control.
II-C Simulating human behaviors
Our recovery policy assumes that the person is able to walk under a moderate level of perturbation, similar to healthy people or people assisted by a walking device. The early work of Shiratori et al. developed a trip-recovery controller for walking from motion capture data; however, it requires careful analysis of human responses to design state machines. Recently, deep RL techniques have proven effective for developing control policies that reproduce natural human behaviors in simulated environments. The proposed techniques vary from formulating a reward function from motion capture data, to training a low-energy policy with curriculum learning, to developing a full-body controller for a musculoskeletal model, to modeling biologically realistic torque limits. In particular, our work is inspired by the DeepMimic technique proposed by Peng et al., which produces a visually pleasing walking gait by mimicking a reference motion and shows stable control near the target trajectory.
II-D Human responses to perturbations
Our framework relies on accurate modeling of human motions, particularly in response to external perturbations. Researchers in biomechanics have studied postural responses of the human body to falling and slipping incidents and identified the underlying principles [18, 19]. O'Connor et al. and Hof et al. also studied balancing strategies that humans use to reject medial-lateral perturbations during walking. In particular, we validate the learned human policy by comparing its footstep placements against the data collected by Wang et al., who report a strong correlation between pelvic states and the next footstep location.
We propose a framework to automate the process of augmenting an assistive walking device with the capability of fall prevention. Our method is built on three components: a human walking policy, a fall predictor, and a recovery policy. We formulate the problems of learning human walking and recovery policies as Markov Decision Processes (MDPs) $(\mathcal{S}, \mathcal{A}, \mathcal{T}, r, \rho_0, \gamma)$, where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $\mathcal{T}$ is the transition function, $r$ is the reward function, $\rho_0$ is the initial state distribution, and $\gamma$ is a discount factor. We take the approach of model-free reinforcement learning to find a policy $\pi$ that maximizes the accumulated reward:

$J(\pi) = \mathbb{E}\left[\sum_{t} \gamma^t\, r(s_t, a_t)\right],$

where $s_0 \sim \rho_0$, $a_t \sim \pi(a_t \mid s_t)$, and $s_{t+1} = \mathcal{T}(s_t, a_t)$.
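The accumulated reward above can be computed for a finite rollout as follows; this is a generic discounted-return helper for illustration, not code from the paper:

```python
def discounted_return(rewards, gamma=0.99):
    """J = sum_t gamma^t * r_t over one rollout's reward sequence."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total
```

For example, three unit rewards with gamma = 0.5 accumulate to 1 + 0.5 + 0.25 = 1.75.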
We denote the human walking policy as $\pi_h$ and the recovery policy as $\pi_r$, where $(s_h, a_h)$ and $(s_r, a_r)$ represent the corresponding states and actions, respectively. Our method can be applied to assistive walking devices with any sensors or actuators, though we assume that the observable state of the walking device is a subset of the full human state due to sensor limitations. Since our method is intended to augment an assistive walking device, we also assume that the user who wears the device is capable of walking. Under this assumption, our method only needs to model normal human gait rather than various pathological gaits.
Figure 1. Left: we model a 29-degree-of-freedom (DoF) humanoid and the 2-DoF exoskeleton in PyDart. Right: assistive device design used in our experiments.
III-A Human Walking Policy
We take the model-free reinforcement learning approach to developing a human locomotion policy $\pi_h$. To achieve natural walking behaviors, we train a policy that imitates a human walking reference motion, similar to Peng et al. The human 3D model (agent) consists of actuated joints and a floating base, as shown in Figure 1. This gives rise to a state space $s_h$ that includes joint positions, joint velocities, linear and angular velocities of the center of mass (COM), and a phase variable that indicates the target frame in the motion clip. We model the intrinsic sensing delay of the human musculoskeletal system by adding a fixed latency to the state vector before it is fed into the policy. The action $a_h$ determines the target joint angles of the proportional-derivative (PD) controllers as deviations from the joint angles in the reference motion:

$\hat{q} = \bar{q}(\phi) + a_h,$

where $\bar{q}(\phi)$ is the corresponding joint position in the reference motion at the given phase $\phi$. Our reward function is designed to imitate the reference motion:

$r_h = w_q \exp(-\|\bar{q} - q\|^2) + w_c \exp(-\|\bar{c} - c\|^2) + w_e \exp(-\|\bar{e} - e\|^2) - w_\tau \|\tau\|^2,$

where $\bar{q}$, $\bar{c}$, and $\bar{e}$ are the desired joint positions, COM positions, and end-effector positions from the reference motion data, respectively. The reward function also penalizes the magnitude of the torque $\tau$. We use the same weights $w_q$, $w_c$, $w_e$, and $w_\tau$ for all experiments. We also use early termination of the rollouts: if the agent's pelvis drops below a certain height, we end the rollout and re-initialize the state.
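The PD targeting and the imitation reward can be sketched as below. The gains and weights are illustrative placeholders (the paper does not report its values here), and the exponentiated tracking terms follow the DeepMimic-style form the paper cites:

```python
import math

def pd_torque(q, dq, q_ref, action, kp=300.0, kd=30.0):
    """PD torque toward target = reference joint angle + policy offset."""
    target = q_ref + action
    return kp * (target - q) - kd * dq

def imitation_reward(q, q_bar, c, c_bar, e, e_bar, tau,
                     w_q=0.65, w_c=0.15, w_e=0.15, w_tau=0.05):
    """Reward tracking of reference joints, COM, and end-effectors;
    penalize the squared torque magnitude."""
    sq_err = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return (w_q * math.exp(-sq_err(q, q_bar))
            + w_c * math.exp(-sq_err(c, c_bar))
            + w_e * math.exp(-sq_err(e, e_bar))
            - w_tau * sum(t * t for t in tau))
```

With perfect tracking and zero torque, the reward reaches its maximum of w_q + w_c + w_e.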
Although the above formulation can produce control policies that reject small disturbances near the target trajectory, such policies often fail to recover from perturbations of larger magnitude, such as those encountered during real-world locomotion. It is critical to ensure that our human walking policy can withstand the same level of perturbation as a capable real person, so that we can establish a fair baseline for measuring the increase in stability due to our recovery policy.
Therefore, we exert random forces on the agent during policy training. Each random force has a uniformly sampled magnitude and a direction uniformly sampled from $[-\pi, \pi]$, and is applied for a short duration on the agent's pelvis, parallel to the ground. The maximum force magnitude induces a substantial change in the agent's velocity. We also randomize the time at which the force is applied within a gait cycle. Training in such a stochastic environment is crucial for reproducing the human ability to recover from larger disturbances during locomotion. We represent the human policy as a multilayer perceptron (MLP) neural network with two hidden layers. The formulated MDP is solved with Proximal Policy Optimization (PPO).
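The perturbation sampling used during training can be sketched as follows; the force cap, push duration, and gait-cycle length below are placeholder values, not the paper's settings:

```python
import math, random

def sample_push(f_max=90.0, duration_ms=200.0, cycle_s=1.0):
    """Draw one random training push: planar force, duration, and onset time."""
    magnitude = random.uniform(0.0, f_max)
    angle = random.uniform(-math.pi, math.pi)   # direction parallel to the ground
    onset = random.uniform(0.0, cycle_s)        # randomized timing in the gait cycle
    force = (magnitude * math.cos(angle), magnitude * math.sin(angle))
    return force, duration_ms, onset
```

At each training rollout, one such push would be applied to the pelvis at the sampled onset time for the sampled duration.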
III-B Fall Predictor
Being able to predict a fall before it happens gives the recovery policy critical time to alter the outcome. We take a data-driven approach and train a classifier that predicts the probability of a fall in the near future. Collecting such training data in the real world is challenging because induced human falls can be unrealistic, as well as dangerous and tedious to instrument. Instead, we propose to train the classifier using only simulated human motion. Our key idea is to automatically label the future outcome of a state by leveraging the trained human policy $\pi_h$. We randomly sample a set of states and add random perturbations to them. By following the policy $\pi_h$ from each sampled state, we simulate a rollout to determine whether the state leads to a successful recovery or a fall. We then label the corresponding state observed by the walking device as positive if the rollout succeeds, or negative if it fails, and collect training samples in this manner. Note that the input of the training data corresponds to the state of the walking device, not the full human state, because the classifier will only have access to the information available from the onboard sensors.
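The automatic labeling procedure can be sketched as follows; `step_fn` and `fallen_fn` are stand-ins for the simulator's transition function and fall test (e.g., pelvis height dropping below a threshold), not the paper's API:

```python
def label_rollout(step_fn, policy, state, fallen_fn, horizon=300):
    """Roll out the walking policy from a (perturbed) state.

    Returns 1 if the agent survives the horizon (successful recovery),
    0 if it falls; the label is attached to the device-observable state.
    """
    for _ in range(horizon):
        state = step_fn(state, policy(state))
        if fallen_fn(state):
            return 0  # negative sample: this state leads to a fall
    return 1          # positive sample: the policy recovers
```

Running this over many sampled, perturbed states yields the labeled dataset for the fall classifier.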
III-C Recovery Policy
The recovery policy aims to utilize the onboard actuators of the assistive walking device to stabilize the gait so that the agent can continue to walk uninterrupted. It is trained to provide optimal assistance to the human walking policy when a fall is detected. The state $s_r$ comprises the global angular acceleration, angular velocity, and hip joint angles and velocities. The action space consists of torques at the two hip joints. The reward function maximizes the quality of the gait while minimizing the impact of the disturbance:

$r_r = r_{\text{walk}} - w_v \|\dot{c}\|^2 - w_\omega \|\omega\|^2,$

where $r_{\text{walk}}$ evaluates walking performance using Equation 2 except for the last term, and $\dot{c}$ and $\omega$ are the global linear and angular velocities. Note that the input to the reward function includes the full human state $s_h$. While the input to the recovery policy is restricted by the onboard sensing capability of the assistive walking device, the input to the reward function can take advantage of the full state of the simulated world, since the reward function is only needed at training time. The policy is represented as an MLP neural network with two hidden layers of 64 neurons each and trained with PPO.
IV Experiments and Results
We validate the proposed framework using the open-source physics engine DART. Our human agent is modeled as an articulated rigid body system with 29 degrees of freedom (DoFs), including six DoFs for the floating base. The body segments and mass distribution are determined based on an average adult male in North America. We select a prototype assistive walking device as our testbed; similar prototypes are described in [3, 4, 33]. It has two cable-driven actuators at the hip joints; we limit their torque capacity as a hard constraint. Sensors, such as an inertial measurement unit (IMU) and hip joint motor encoders, are added to the device. We also introduce a sensing delay. We model the interaction between the device and the human by adding positional constraints between the thighs and anchor points. For all experiments, the simulation time step is fixed.
We design experiments to systematically validate the learned human behaviors and the effectiveness of the recovery policy. In particular, our goal is to answer the following questions:
How does the motion generated by the learned human policy compare to data in the biomechanics literature?
Does the recovery policy improve the robustness of the gaits to external pushes?
How does the effectiveness of the assistive walking device change with design choices?
IV-A Comparison of Policy and Human Recovery Behaviors
We first validate the steady walking behavior of the human policy by comparing it to data collected from human-subject experiments. Figure 2 shows that the hip and knee joint angles generated by the walking policy closely match the data reported by Winter et al. We also compare the "torque loop" of the gait generated by our learned policy with a gait recorded in the real world. A torque loop is a plot of the relation between a joint degree of freedom and the torque it exerts, frequently used in the biomechanics literature as a metric to quantify human gait. Although the torque loops in Figure 2(a) are not identical, both trajectories form loops over a single gait cycle, indicating energy being added and removed during the cycle. We also note that the torque ranges and joint angle ranges are similar.
In addition, we compare the adjusted footstep locations under external perturbations with the study reported by Wang et al. Their findings strongly indicate that the COM dynamics are crucial in predicting the step placement after a disturbance that leads to a balanced state. They introduced a model that predicts the change in foot placement location of a normal gait as a function of the COM velocity. Figure 2(b) illustrates the foot placements of our model and the model of Wang et al. against four pushes of different magnitudes in the sagittal plane. For all scenarios, the displacement error remains small.
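The comparison model of Wang et al. is a linear map from COM velocity to the shift in the next foot placement; a toy version with made-up gains (not their fitted coefficients) looks like:

```python
def predict_foot_offset(com_velocity, gains=(0.3, 0.3)):
    """Foot-placement shift proportional to the COM velocity, per axis."""
    return tuple(k * v for k, v in zip(gains, com_velocity))
```

Evaluating this at the COM velocity just after a push gives a reference foot placement to compare against the learned policy's actual step.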
IV-B Effectiveness of Recovery Policy
We test the performance of the learned recovery policy in a simulated environment with external pushes. As a performance criterion, we define the stability region as the range of external pushes from which the policy can return to a steady gait without falling. For better 2D visualization, we constrain the pushes to be parallel to the ground plane, applied at the same location with the same timing and duration (40 milliseconds). All experiments in this section use the default sensors and actuators provided by the prototype walking device: an IMU, hip joint motor encoders, and hip actuators that control the flexion and extension of the hips.
Figure 5 compares the stability regions with and without the learned recovery policy. The area of the stability region is expanded by 35% when the recovery policy is used. Note that the stability region has very little coverage on the negative side of the y-axis, which corresponds to rightward forces. This is because we push the agent when the left leg is the swing leg, making it difficult to counteract rightward pushes. Figure 7 shows one example of a recovery motion.
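The stability region can be estimated by probing a grid of planar pushes; `survives` here stands in for a full simulation run that checks whether the agent returns to steady gait after the given push:

```python
def stability_region(survives, fx_range, fy_range, n=21):
    """Probe an n-by-n grid of planar pushes.

    Returns the list of surviving pushes and an area estimate
    (number of surviving cells times the area of one grid cell).
    """
    region = []
    for i in range(n):
        for j in range(n):
            fx = fx_range[0] + (fx_range[1] - fx_range[0]) * i / (n - 1)
            fy = fy_range[0] + (fy_range[1] - fy_range[0]) * j / (n - 1)
            if survives(fx, fy):
                region.append((fx, fy))
    cell = ((fx_range[1] - fx_range[0]) / (n - 1)) * \
           ((fy_range[1] - fy_range[0]) / (n - 1))
    return region, len(region) * cell
```

Comparing the area returned with and without the recovery policy gives the percentage expansion reported above.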
The timing of the push within a gait cycle has a great impact on fall prevention. We test our recovery policy with perturbations applied at four different points during the swing phase (Figure 4). We found that the stability region is largest when the push is applied at 30% of the swing phase and smallest at 90% (Figure 6, top). This indicates that a perturbation occurring right before heel strike is more difficult to recover from than one occurring in the early swing phase, possibly due to the lack of time to adjust the foot location. The areas of these stability regions differ considerably. The bottom of Figure 6 shows the impact of the perturbation timing on the COM velocity over four gait cycles. The results echo the previous finding: the agent fails to return to the steady state when the perturbation occurs late in the swing phase.
We also compare the generated torques with and without the recovery policy when a perturbation is applied. Figure 8 shows the torques at the hip joint over the entire gait cycle (not just the swing phase). We collect a trajectory for each scenario by applying random forces at a fixed timing in the gait cycle. The results show that the hip torques exerted by the human together with the recovery policy do not change the overall torque profile significantly, suggesting that the recovery policy makes minor modifications to the torque profile across the remaining gait cycle rather than generating a large impulse to stop the fall. We also show that the torque exerted by the recovery policy never exceeds the actuation limits of the device.
IV-C Evaluation of Different Design Choices
Our method can be used to inform the selection of sensors and actuators when designing a walking device with fall-prevention capability. We test two versions of the actuators: the 2D hip device can actuate the hip joints only in the sagittal plane, while the 3D device also allows actuation in the frontal plane. We also consider three different configurations of sensors: an inertial measurement unit (IMU) that provides the COM velocity and acceleration, motor encoders that provide the hip joint angles, and the combination of the IMU and motor encoders. In total, we train six different recovery policies with three sensory inputs and two actuation capabilities. For each sensor configuration, we train a fall predictor using only the sensors available to that configuration.
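Enumerating the six design configurations is straightforward; the string labels below are shorthand for the sensor and actuator options described above, not identifiers from the paper:

```python
from itertools import product

SENSORS = ["imu", "encoder", "imu+encoder"]
ACTUATORS = ["2d_hip", "3d_hip"]

def design_configs():
    """All sensor/actuator combinations to train a recovery policy for."""
    return [{"sensors": s, "actuator": a} for s, a in product(SENSORS, ACTUATORS)]
```

Each configuration gets its own fall predictor and recovery policy, trained only on the observations that configuration's sensors provide.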
Figure 9 shows the stability region for each of the six design configurations. The results indicate that 3D actuation expands the stability region significantly in all directions compared to 2D actuation, even when the external force lies in the sagittal plane. We also found that the IMU plays a more important role than the motor encoders, which suggests that COM information is more critical than the hip joint angles in informing the recovery action. As expected, the recovery policy performs best when combining the IMU and the joint encoders.
V Conclusion and Discussion
We presented an approach to automate the process of augmenting an assistive walking device with the ability to prevent falls. Our method has three key components: a human walking policy, a fall predictor, and a recovery policy. In a simulated environment, we showed that an assistive device can indeed help recover balance from a wider range of external perturbations. We introduced the stability region as a quantitative metric to show the benefit of using a recovery policy. In addition, the stability region can be used to analyze different design choices for an assistive device; we evaluated six different sensor and actuator configurations.
In this work, we only evaluated the effectiveness of the recovery policy against external pushes. It would be interesting to extend our work to other kinds of disturbances, such as tripping and slipping. Another future direction we would like to pursue is deploying our recovery policy on a real-world assistive device. This will require additional effort to ensure that the recovery policy can also adjust for differences in the body structure of users.
We thank Pawel Golyski and Seung-yong Hyung for their assistance with this work. This work was supported by the Global Research Outreach program of Samsung Advanced Institute of Technology.
- (2019) Human–robot interactive control based on reinforcement learning for gait rehabilitation training robot. International Journal of Advanced Robotic Systems 16 (2).
- (2004) Adaptive control of a variable-impedance ankle-foot orthosis to assist drop-foot gait. IEEE Transactions on Neural Systems and Rehabilitation Engineering 12 (1), pp. 24–31.
- (2014) A universal ankle–foot prosthesis emulator for human locomotion experiments. Journal of Biomechanical Engineering 136 (3), 035002.
- (2019) Design of lower-limb exoskeletons and emulator systems. In Wearable Robotics, Ferguson (Ed.).
- (2010) Path control: a method for patient-cooperative robot-aided gait rehabilitation. IEEE Transactions on Neural Systems and Rehabilitation Engineering 18 (1), pp. 38–48.
- (2017) Exoskeleton plantarflexion assistance for elderly. Gait & Posture 52, pp. 183–188.
- (2019) Real-time reactive trip avoidance for powered transfemoral prostheses. Robotics: Science and Systems (RSS).
- (2014) Virtual constraint control of a powered prosthetic leg: from simulation to experiments with transfemoral amputees. IEEE Transactions on Robotics 30 (6), pp. 1455–1471.
- (2017) Learning assistive strategies for exoskeleton robots from user-robot physical interaction. Pattern Recognition Letters 99, pp. 67–76.
- (2010) Balance responses to lateral perturbations in human treadmill walking. Journal of Experimental Biology 213 (15), pp. 2655–2664.
- (2004) Automatic gait-pattern adaptation algorithms for rehabilitation with a 4-DOF robotic orthosis. IEEE Transactions on Robotics and Automation 20 (3), pp. 574–582.
- (2019) Synthesis of biologically realistic human motion using joint torque actuation. ACM Transactions on Graphics 38 (4), pp. 72:1–72:12.
- (2006) Hybrid control of the Berkeley lower extremity exoskeleton (BLEEX). The International Journal of Robotics Research 25 (5–6), pp. 561–573.
- (2018) DART: dynamic animation and robotics toolkit. The Journal of Open Source Software 3 (22), pp. 500.
- (2019) Scalable muscle-actuated human simulation and control. ACM Transactions on Graphics 38 (4), pp. 73:1–73:13.
- (2014) Minimum jerk swing control allows variable cadence in powered transfemoral prostheses. In 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2492–2495.
- (2014) Speed-adaptation mechanism: robotic prostheses can actively regulate joint torque. IEEE Robotics & Automation Magazine 21 (4), pp. 94–107.
- (2012) Biomechanics of human gait - slip and fall analysis. 2, pp. 466–476.
- (2009) Biomechanics of trailing leg response to slipping - evidence of interlimb and intralimb coordination. 29 (4), pp. 565–570.
- (2009) Direction-dependent control of balance during walking and standing. Journal of Neurophysiology 102 (3), pp. 1411–1419.
- (2018) DeepMimic: example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics (Proc. SIGGRAPH 2018).
- (2006) Falls in older people: epidemiology, risk factors and strategies for prevention. Age and Ageing 35 (suppl_2), pp. ii37–ii41.
- (2015) Trust region policy optimization. International Conference on Machine Learning, pp. 1889–1897.
- (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- (2009) Simulating balance recovery responses to trips based on biomechanical principles. Proc. of ACM SIGGRAPH/Eurographics Symposium on Computer Animation 2009, pp. 37–46.
- (2009) Preliminary evaluations of a self-contained anthropomorphic transfemoral prosthesis. IEEE/ASME Transactions on Mechatronics 14 (6), pp. 667–676.
- (2017) A sample-efficient black-box optimizer to train policies for human-in-the-loop systems with user preferences. pp. 1–8.
- (2015) Toward balance recovery with leg prostheses using neuromuscular model control. PP (99), pp. 1.
- (2008) Compliant actuation of rehabilitation robots. IEEE Robotics & Automation Magazine 15 (3), pp. 60–69.
- (2015) Design and control of the MINDWALKER exoskeleton. IEEE Transactions on Neural Systems and Rehabilitation Engineering 23 (2), pp. 277–286.
- (2014) Stepping in the direction of the fall: the next foot placement can be predicted from current upper body state in steady-state walking. Biology Letters 10 (9).
- (1991) Biomechanics and motor control of human gait: normal, elderly and pathological. Waterloo Biomechanics.
- (2015) Design of two lightweight, high-bandwidth torque-controlled ankle exoskeletons. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 1223–1228.
- (2015) Design and control of a powered hip exoskeleton for walking assistance. International Journal of Advanced Robotic Systems 12.
- (2015) Review of assistive strategies in powered lower-limb orthoses and exoskeletons. Robotics and Autonomous Systems 64, pp. 120–136.
- (2018) Learning symmetric and low-energy locomotion. ACM Transactions on Graphics (Proc. SIGGRAPH 2018) 37 (4).