Collaborative tasks [ajoudani2018progress] are essential in areas like social robotics and human-robot interaction. Social-aware robot navigation [charalampous2017recent] is usually involved in these tasks in different forms, such as assistive robots [urdiales2011new], approaching tasks, collaborative searching [marc_searching], or side-by-side navigation [repiso2020people].
Typical robot navigation approaches use map knowledge to compute a global plan to a goal while avoiding obstacles, in combination with a local planner that reacts at short range. The Dynamic Window Approach (DWA) [fox1997dynamic] and the Social Force Model (SFM) [helbing1995] are reactive local planners, useful in the presence of unpredictable situations in dynamic environments. However, when there are moving obstacles (e.g., people, bicycles, scooters), reactive behaviors are not enough; an anticipative behavior is more useful to deal with them [ferrer2019anticipative].
Due to noise and incomplete environment knowledge, planners that require accurate data, such as Artificial Potential Fields (APF) [chiang2015path], can fail in real-life applications. In these cases, Deep Reinforcement Learning (DRL) can provide a feasible solution through algorithms such as Deep Deterministic Policy Gradient (DDPG) [lillicrap2015continuous] and Proximal Policy Optimization (PPO) [han_yiheng]. These approaches achieve good performance in robot navigation without complete map knowledge.
In this work, we present three different anticipative strategies that can be used in different motion planners. We have tested these strategies using our hybrid motion planning model for robot navigation, which combines a DDPG policy based on [chiang2019learning] with the SFM. The three models, named "Anticipative Turn (AT)", "Anticipative Robot and Pedestrian's Propagation (ARP)" and "Anticipative Pedestrian's Propagation (APP)", compute the best hybrid navigation decision taking into account the prediction of the moving obstacles in open and complex dynamic scenarios (refer to Fig. 1). We have validated the models through real-life experiments and through simulations in dynamic scenarios for different noise values and moving-obstacle velocities.
The remainder of this paper is organized as follows. In Section 2, the related work is introduced with a description of the hybrid model that combines DDPG and SFM. The anticipative strategies are described in Section 3. In Section 4, a quantitative and qualitative evaluation is presented through simulations and real-life experiments. Finally, in Section 5, the conclusions are provided.
2 Related Work
2.1 AutoRL for Robot Navigation
Automated Reinforcement Learning (AutoRL) [chiang2019learning, faust2019evolving] is a training algorithm that automatically tunes DDPG or other DRL approaches [mnih2015human, haarnoja2018soft].
In [chiang2019learning], the algorithm is applied to the point-to-point navigation task (P2P), which consists of navigating from an initial position to a goal using the velocities at each navigation step as policy actions. The environment information consists of the robot's Lidar observations and the goal polar coordinates. The reward function is:

R = θ^T [r_step, r_goalDist, r_collision, r_turning, r_clearance, r_goal]    (1)

where θ are the reward hyperparameters for the P2P task, r_step is a constant penalty with value 1 at each step of the episode, r_goalDist is the negative Euclidean distance to the goal, r_collision is 1 when there is a collision and zero otherwise, r_turning is the negative angular speed, r_clearance is the distance to the closest obstacle, and r_goal is 1 when the agent reaches the goal and zero in the remaining cases.
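Under these definitions, the per-step reward can be sketched in Python (an illustrative reimplementation; the function and argument names are ours, and the actual weights θ are tuned by AutoRL, with penalties encoded through negative weights):

```python
import numpy as np

def p2p_reward(theta, goal_dist, collided, angular_speed, clearance, at_goal):
    """Weighted sum of the P2P reward terms in (1).

    theta: the six reward hyperparameters (penalties get negative weights).
    """
    r = np.array([
        1.0,                        # r_step: constant value 1 at each step
        -goal_dist,                 # r_goalDist: negative distance to goal
        1.0 if collided else 0.0,   # r_collision: 1 on collision
        -abs(angular_speed),        # r_turning: negative angular speed
        clearance,                  # r_clearance: distance to closest obstacle
        1.0 if at_goal else 0.0,    # r_goal: 1 when the goal is reached
    ])
    return float(theta @ r)
```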
AutoRL has been implemented as a local planner that complements a global planner such as PRM [francis2020long] in large indoor environments. The results in real environments show high performance and robustness to noise. However, AutoRL requires a very long training time and has a high computational cost.
2.2 Hybrid Policy with SFM
This subsection explains the hybrid model proposed in [oscar_anais_sensors2021], which combines a DDPG-trained policy with the SFM with collision prediction (CP) [zanlungo2011].
First, the model is trained using the DDPG algorithm with the same noise model and the optimal hyperparameters obtained in [francis2020long] for a differential-drive robot model. The training is performed with only static obstacles.
This algorithm uses feed-forward networks for both the actor and the critic. The actor network is the policy π, with network parameters θ^π, which takes the environment observations o and provides an action a (linear and angular velocities): a = π(o; θ^π) = (v, ω).
Second, in the evaluation phase, the people velocities obtained using a tracker [vaquero2019robust, linder2016multi] are used in the SFM with CP to compute velocity changes, as explained in [oscar_anais_sensors2021]. These velocity changes are added to the DDPG actions to obtain the actions provided by the hybrid model: a_hybrid = a_DDPG + Δa_SFM-CP.
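This combination can be sketched as follows (a minimal illustration; the names and clipping bounds are ours, and the actual SFM-CP velocity changes are computed as in [oscar_anais_sensors2021]):

```python
import numpy as np

V_MAX, W_MAX = 1.0, 1.0  # illustrative linear/angular velocity bounds

def hybrid_action(ddpg_action, sfm_cp_delta):
    """Add the SFM-CP velocity change to the DDPG action, clipped to bounds.

    ddpg_action:  (v, w) produced by the trained policy.
    sfm_cp_delta: (dv, dw) derived from the social forces with collision
                  prediction, using the tracked people velocities.
    """
    v = np.clip(ddpg_action[0] + sfm_cp_delta[0], -V_MAX, V_MAX)
    w = np.clip(ddpg_action[1] + sfm_cp_delta[1], -W_MAX, W_MAX)
    return np.array([v, w])
```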
This hybrid model uses the agent velocities as additional inputs with respect to the pure DDPG model. Another option would be to exploit this information as an input for training a new DDPG model. However, this option has several disadvantages, such as searching for new optimal hyperparameters and a longer time to learn the policies in a more complex environment.
2.3 Anticipative Planning
Anticipation is the ability to react taking into account not only the current situation, but also a prediction over a future time window. It can be useful in human-aware motion planning to avoid dangerous future situations, in which finding a solution could be very difficult or even impossible.
Some works [cosgun2016anticipatory, ferrer2019anticipative] have implemented anticipation in path planning based on human trajectory prediction [rudenko2020human]. There are several methods to predict trajectories based on previous human positions or other environment cues. Despite the unpredictable and multimodal nature of the problem, the constant velocity model (CVM) [scholler2020constant] gives a good approximation in many cases.
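A CVM prediction over a short horizon is straightforward to write down (the function and argument names are ours):

```python
import numpy as np

def cvm_predict(position, velocity, dt, n_steps):
    """Constant velocity model: propagate a tracked agent n_steps ahead.

    position, velocity: 2-D vectors from the tracker.
    Returns an (n_steps, 2) array of predicted positions.
    """
    steps = np.arange(1, n_steps + 1).reshape(-1, 1)
    return np.asarray(position, dtype=float) + steps * dt * np.asarray(velocity, dtype=float)
```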
3 Anticipative Models
The hybrid model described in the previous section avoids collisions, but only when the robot is already close to the person. This section explains the three anticipative strategies proposed in this paper to avoid collisions in a human-aware way.
3.1 Problem Formulation
As in [chiang2019learning], this work tackles the robot P2P navigation task as a Partially Observable Markov Decision Process (POMDP). The action space, a = (v, ω), includes the linear and angular velocities for a unicycle robot model, bounded in m/s and rad/s, respectively. The robot and the agents or moving obstacles (e.g., pedestrians) are modelled in the environment as circles of radius r_r and r_a, respectively. The reward function is the one described in (1).
The observations, o, are divided in two parts: o_l, which includes the 64 1-D Lidar distance measurements, between 0 and 5 m, taken during the last simulation steps; and o_g, which includes the goal polar coordinates. The Lidar has a limited field of view.
3.2 Anticipative Turn (AT)
This approach propagates the robot during one time-step using the current velocity command, before moving the robot with that command, and checks for possible collisions in the next movement against the latest Lidar points.
If a future collision is detected, the robot does not execute the velocity command given by the hybrid model, but replaces it with a command sequence of 6 steps. During the first 4 steps, the robot only turns, using a random angular velocity. In the next 2 steps, the robot moves only with the linear velocity. If a collision is detected when the robot movement is propagated over the last 2 steps, the sequence is restarted until the robot can execute the complete sequence without collision.
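The AT logic can be sketched in Python (a simplified illustration; the kinematic model, the collision check against the last Lidar points, and the angular-velocity bounds are placeholder assumptions):

```python
import numpy as np

def propagate(pose, v, w, dt):
    """Unicycle kinematics for one time-step."""
    x, y, th = pose
    return np.array([x + v * np.cos(th) * dt,
                     y + v * np.sin(th) * dt,
                     th + w * dt])

def anticipative_turn(pose, action, dt, collides, rng):
    """AT sketch: if propagating the current command one step ahead collides,
    replace it with a 6-step sequence (4 random-turn steps + 2 straight
    steps), re-sampling the turn until the straight part is collision-free.

    collides(pose) -> bool is a hypothetical check against the latest
    Lidar points.
    """
    if not collides(propagate(pose, *action, dt)):
        return [action]                       # keep the hybrid-model command
    while True:
        w = rng.uniform(-1.0, 1.0)            # random angular velocity
        seq = [(0.0, w)] * 4 + [(action[0], 0.0)] * 2
        p, ok = pose, True
        for i, (vi, wi) in enumerate(seq):
            p = propagate(p, vi, wi, dt)
            if i >= 4 and collides(p):        # only the 2 straight steps move
                ok = False
                break
        if ok:
            return seq
```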
3.3 Anticipative Robot and Pedestrian’s Propagation (ARP)
The AT approach has been designed for static environments. For this reason, it only works well when the velocity of the moving agents is very low or when there are only static obstacles. Consequently, the AT model is not activated when a possible collision with moving obstacles is detected.
To avoid both static and moving obstacles, the ARP model is proposed: at each step, the robot and the moving agents are propagated during a number of time-steps in a local map. In this propagation, the local map is built from the static obstacles and moving agents of the environment using the information of the latest Lidar observations. The robot motion is propagated using the hybrid policy, and the moving agents are propagated using the CVM. The propagation ends early when there is a collision or the goal is reached.
After the propagation, the action taken by the robot at the present time is the same as the action taken when the robot was at the minimum distance from an obstacle during the propagation phase. This action increases the probability of avoiding a collision several steps before it occurs.
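A sketch of the ARP rollout (the callable signatures are illustrative placeholders; in the paper the policy is the hybrid model, the agents move with the CVM, and the distance is measured in the propagated local map):

```python
import numpy as np

def arp_action(robot_state, agents, policy, step_fn, cvm_step, dist_fn, n_steps):
    """ARP sketch: roll out the policy for up to n_steps while propagating
    the agents, then return the action that was applied when the robot was
    closest to any obstacle during the rollout.

    policy(state)     -> action          (hybrid model)
    step_fn(s, a)     -> next state      (robot kinematics)
    cvm_step(agents)  -> next agent set  (constant velocity propagation)
    dist_fn(s, agents)-> min distance from robot to obstacles/agents
    """
    state, best_action, best_dist = robot_state, None, np.inf
    for _ in range(n_steps):
        action = policy(state)
        d = dist_fn(state, agents)
        if d < best_dist:
            best_dist, best_action = d, action
        state = step_fn(state, action)
        agents = cvm_step(agents)
        if d <= 0.0:                  # collision during propagation: stop early
            break
    return best_action
```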
3.4 Anticipative Pedestrian’s Propagation (APP)
The ARP model, described in the previous section, requires the local map, with the static and dynamic objects, to be modified taking into account the propagated future Lidar measurements. In some cases, future Lidar measurements cannot detect objects because of partial occlusions at the present time, before the robot or the agents are propagated. For these reasons, the model presented in this subsection does not create a local map or propagate the whole environment.
The APP model considers a constant velocity prediction for all moving agents detected by the Lidar over the next time-steps. Each new predicted agent position has an associated uncertainty that can be modelled as a two-dimensional Gaussian distribution centered in that position. A circle centered in the predicted position, with radius σ, represents the area with the highest probability of localizing the agent. If we consider that the uncertainty of the agent's motion at each position is independent from the other motions, the uncertainty propagated during k steps is the sum of the per-step variances, σ_k^2 = k σ^2. The total radius of the anticipative circle after k steps is the agent radius r_a plus the uncertainty σ_k:

R_k = r_a + σ_k = r_a + σ √k
The APP model treats the anticipative circles as virtual obstacles detected using the Lidar. At each step, the robot detects all the anticipative circles of each agent for the next time-steps. If the robot collides with a circle, it is not considered a real collision: that circle and the subsequent ones disappear, and only the circles from the present time up to the one previous to the collision remain.
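The circle generation can be sketched as follows (names and the circle spacing are illustrative; the growth of the radius follows the sum-of-variances argument above):

```python
import numpy as np

def anticipative_circles(position, velocity, agent_radius, sigma, dt,
                         n_steps, every=5):
    """APP sketch: place a circle every `every` steps along the constant
    velocity prediction of an agent. With independent per-step Gaussian
    uncertainty of variance sigma**2, the variance after k steps is
    k * sigma**2, so the radius grows as agent_radius + sigma * sqrt(k).

    Returns a list of (center, radius) virtual obstacles.
    """
    circles = []
    for k in range(every, n_steps + 1, every):
        center = np.asarray(position, dtype=float) + k * dt * np.asarray(velocity, dtype=float)
        radius = agent_radius + sigma * np.sqrt(k)
        circles.append((center, radius))
    return circles
```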
A scheme of the approaches is shown in Fig. 2.
4 Simulations and Experiments
The training and the evaluation in simulation are executed in the maps shown in Fig. 3, which are the same floor maps used in [oscar_anais_sensors2021]. A new empty scenario has been added to evaluate the performance in the presence of only moving obstacles. In this evaluation, the OP-DDPG model is the DDPG-trained model without the velocity changes or the anticipative strategies in the evaluation phase.
4.1 Evaluation Metrics
To evaluate the models, 3 metrics have been used:
Success Rate (SR): percentage of episodes in which the robot reaches the goal without collision. This metric shows the overall performance.
Collision Rate (CR): percentage of episodes that end in a collision.
Timeout Rate (TR): percentage of episodes where the episode time limit is reached.
Other metrics, such as the number of steps or the distance to the goal, give a less global description of the performance.
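The three rates can be computed directly from the episode outcomes (a trivial helper; the names are ours):

```python
def episode_rates(outcomes):
    """Compute SR, CR and TR (in %) from a list of episode outcomes,
    where each outcome is 'success', 'collision' or 'timeout'."""
    n = len(outcomes)
    return tuple(100.0 * outcomes.count(k) / n
                 for k in ("success", "collision", "timeout"))
```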
4.2 Simulation Results
Several versions have been evaluated under different conditions to explore the performance and the limits of the anticipative models. In this evaluation, the trained OP-DDPG model is the same for all the simulations; the differences lie in the evaluation phase, where the hybrid model and the different anticipative strategies are used. It is important to remark that the AT, ARP and APP models are always used in combination with the hybrid model, rather than with the OP-DDPG model. All the evaluations have been performed over 100 episodes, as in [chiang2019learning].
Two evaluation settings have been considered: using the Euclidean distance (ED) between the goal and the initial position of the robot, and using the approximate path distance (PD) of an RRT plan between the initial position and the goal.
The path distance discards episodes with very long, complex navigation, for which the P2P task has not been designed, because the trained model does not use complete map knowledge.
The results shown in Table 1 demonstrate that the hybrid model increases the success rate in all the evaluation maps. The AT model increases the performance even further, avoiding most of the collisions but with some timeouts caused by large-scale local minima. The ARP and APP models are not included because there are no moving obstacles.
Table 1. SR - CR - TR (%) for each evaluation map.

| Map           | OP-DDPG     | Hybrid      | Hybrid+AT  |
| Building 1 ED | 60 - 34 - 6 | 66 - 11 - 23 | 78 - 0 - 22 |
| Building 2 ED | 50 - 46 - 4 | 74 - 15 - 11 | 87 - 1 - 12 |
| Building 3 ED | 65 - 35 - 0 | 81 - 11 - 8  | 92 - 0 - 8  |
| Building 1 PD | 75 - 11 - 14 | 88 - 3 - 9  | 94 - 0 - 6  |
| Building 2 PD | 87 - 13 - 0 | 96 - 1 - 3   | 97 - 0 - 3  |
| Building 3 PD | 87 - 12 - 1 | 97 - 1 - 2   | 98 - 0 - 2  |
The OP-DDPG results are worse than those reported in [chiang2019learning], because the optimal AutoRL hyperparameters [chiang2019learning] become suboptimal in a different implementation. In this work, the results focus on the advantages of applying the hybrid model and the anticipative strategies, regardless of the AutoRL optimization.
Table 2 shows that the anticipative models are robust to the number of agents. Only in environments with a very high density of moving obstacles, such as Building 2 with 100 obstacles, is the success rate significantly reduced. The AT+APP combination gives the best results among all the model combinations when there are static and moving obstacles.
Table 2. SR - CR - TR (%) for 20 and 100 moving agents per map.

| Map - agents     | Hybrid       | +AT         | +ARP        | +APP        | +AT+APP     |
| Building 1 - 20  | 74 - 11 - 15 | 84 - 4 - 12 | 84 - 6 - 10 | 85 - 5 - 10 | 96 - 1 - 3 |
| Building 2 - 20  | 80 - 20 - 0  | 89 - 3 - 8  | 88 - 4 - 8  | 89 - 2 - 9  | 93 - 4 - 3 |
| Building 3 - 20  | 84 - 16 - 0  | 88 - 2 - 10 | 88 - 3 - 9  | 87 - 3 - 11 | 95 - 1 - 4 |
| Building 1 - 100 | 67 - 16 - 17 | 84 - 3 - 13 | 84 - 5 - 11 | 86 - 4 - 10 | 97 - 1 - 2 |
| Building 2 - 100 | 61 - 39 - 0  | 76 - 16 - 8 | 75 - 16 - 9 | 76 - 14 - 10 | 81 - 17 - 2 |
| Building 3 - 100 | 83 - 17 - 0  | 92 - 0 - 8  | 93 - 1 - 6  | 93 - 1 - 6  | 94 - 3 - 3 |
The empty-map scenario with only moving obstacles is very useful to evaluate the anticipative strategies. Table 3 demonstrates the positive effect of the anticipative strategies on collision avoidance with moving obstacles. In very crowded scenes, the success rate improves by around 10 percentage points.
Table 3. SR - CR - TR (%) in the empty map.

| Map          | Hybrid      | +ARP        | +APP       |
| E. Map ED-35 | 95 - 5 - 0  | 95 - 5 - 0  | 98 - 2 - 0 |
| E. Map ED-50 | 89 - 11 - 0 | 90 - 10 - 0 | 96 - 4 - 0 |
| E. Map ED-70 | 81 - 19 - 0 | 86 - 14 - 0 | 93 - 6 - 1 |
The robustness to noise has been evaluated in Building 2 with 20 moving agents. As in the AutoRL work, Gaussian noise is considered, for 4 different standard deviations. The success rate in these cases decreases only slightly. In these evaluations, the moving agents have a bounded maximum velocity and the SFM is used to simulate pedestrian behavior, as in [chiang2019learning].
The last evaluation is performed in the empty-map scenario through intersections between the robot and moving agents for different maximum agent velocities. In this challenging scene, the intersections are forced, to check how the robot reacts to avoid collisions. The agents do not avoid obstacles because they use the CVM.
This evaluation has been performed using 12 cases with up to 3 agent intersections in different directions with respect to the robot orientation. The results in Fig. 4 show less performance decay for the APP model. Above a certain agent velocity, it is almost impossible to avoid collisions because, within the Lidar range, the robot has less than a second to react.
4.3 Real-Life Experiments
Real indoor experiments have been performed using the Helena robot. Helena is a transporter robot with a Pioneer P3-DX base and an RS-LiDAR-16. The Lidar observations have been adapted to serve as inputs to the trained policy.
Two experiments have been performed to check the feasibility of the APP model with people in real environments:
Frontal Encounter (FE): A person and the robot cross paths walking in opposite directions.
Perpendicular Encounter (PE): A person and the robot cross paths walking in perpendicular directions.
In Fig. 5, the FE experiments are shown. With the hybrid model, the robot only reacts when it is close to the person. Using the APP model, the robot reacts some time before the person arrives. In Fig. 6, the PE experiments show that the robot using the APP model not only reacts earlier, but also chooses to navigate behind the person to avoid collisions, which is a more efficient solution.
In Fig. 7, the PE experiment is performed using the ROS Navigation Stack with DWA (NS-DWA) and the APP model, with the anticipative circles inserted into the costmap. The NS-DWA reduces the robot velocity only when it is already too close to the person, whereas with the APP model the robot stops before reaching the person. Overall, the hybrid model's movements are more natural than those observed with the NS-DWA when a person is close to the robot, because the robot does not stop to replan its path, thus better avoiding the freezing robot problem. The minimum distance to the person increases and the navigation time is reduced (refer to Table 4).
|                     | NS-DWA | NS-DWA+APP | Hybrid Model | APP Model |
| Navigation Time (s) | 18.8   | 19.6       | 20.0         | 17.4      |
| Min. distance (m)   |        |            |              |           |
4.4 Implementation Details
In this subsection, the main features of our implementation are detailed. All the hyperparameters used in OP-DDPG have been taken from [francis2020long].
For the neural networks, the critic joint, critic observation, and actor layer widths are those reported in [francis2020long], as are the number of stacked observations per step and the noise parameters. All the results have been obtained with the same model, trained during 2 million steps with a maximum episode length of 500 steps.
The implementation has been performed in PyTorch using the Adam optimizer, the Gym library and the Shapely library.
The SFM force parameters are kept fixed across all experiments. For the ARP and APP models, the propagation horizon is a fixed number of steps. The anticipative circles in the APP model are created every 5 steps, with an uncertainty value that produces circles with a diameter shorter than the corridors where the robot and a moving agent can navigate.
For all simulations, each simulation step has a fixed duration. The agents and the robot are modelled as circles with a radius of 0.3 m. To perform the real experiments, the implementation has been transferred to the Robot Operating System (ROS) middleware, and the people velocities are obtained through the Spencer tracker [linder2016multi].
5 Conclusions
In this work, we have analyzed different methods to avoid collisions with people in a reactive and anticipative way, using a model trained in a static environment.
In scenarios with static obstacles, the hybrid model obtains better results than the OP-DDPG model. In scenarios with static and moving obstacles, the combination of the hybrid model with the 3 different anticipative strategies obtains better results than the hybrid model alone.
The AT model results show an important improvement in avoiding collisions with static obstacles. For dynamic environments, the ARP and APP models improve on the performance of the other models, because the robot anticipates the collision and reacts before it happens. Specifically, the APP model can, in some situations, avoid high-velocity agents. Furthermore, the APP model improves the robot's anticipation in real experiments with people, with respect to both the hybrid model and the NS-DWA. These anticipative models could be applied to greater agent velocities with a longer Lidar range.