IPMPlanner
Introduction to the paper "Efficient Speed Planning for Autonomous Driving in Dynamic Environment with Interaction Point Model"
view repo
Safely interacting with other traffic participants is one of the core requirements for autonomous driving, especially in intersections and occlusions. Most existing approaches are designed for particular scenarios and require significant human labor in parameter tuning to be applied to different situations. To solve this problem, we first propose a learningbased Interaction Point Model (IPM), which describes the interaction between agents with the protection time and interaction priority in a unified manner. We further integrate the proposed IPM into a novel planning framework, demonstrating its effectiveness and robustness through comprehensive simulations in highly dynamic environments.
READ FULL TEXT VIEW PDFIntroduction to the paper "Efficient Speed Planning for Autonomous Driving in Dynamic Environment with Interaction Point Model"
Planning safe trajectories for autonomous vehicles (AVs) in complex environments is a challenging task. As demonstrated in Fig. 1, it requires the planning system to have the ability to safely react in realtime under various scenarios, such as intersection, merging, and traversing areas with a limited field of view (FoV).
A typical strategy in solving these problems is to transform traffic rules and prediction results into constraints in an optimization problem [25, 5]. Although this method can generalize to different scenarios, it does not model the prediction and interaction uncertainties well. Another strategy applies the POMDPbased methods[8, 17], which searches for the best reaction trajectory by simulating the traffic interactions in the belief space. However, these methods introduce a high computational burden, and are hard to reflect the interaction patterns using manually designed reward functions.
To better capture the interaction among traffic participants, it is essential to focus on modeling driving patterns around an “interaction point” without traffic regulations. The interaction points are observable situations where at least two road users intend to occupy the same space in the near future[6]. Some works implicitly model the planning problem on these points. For instance, carfollowing models[21, 18] and lanemerging approaches[10] use different values of time gap ( in Fig. (a)a) to generate the proper reactive behavior for AVs. Prediction work[16] analyzes the rightofway of collision points in the intersection. However, these methods are tailored for specific scenarios. It brings a heavy load on parameter tuning when applying multiple methods in different situations.
When reacting to other road users at different speed and angles in various scenarios, human drivers generally show different degrees of caution that can be indicated by the time gap. Therefore, we focus on the following two questions. Question 1: When two agents intend to pass the same interaction point in various scenarios, what is the minimum time gap between them? Question 2: What determines the priority of agents to pass the interaction point in Question 1?
To answer these questions, we propose a novel Interaction Point Model (IPM), where Question 1 is solved by statistical analysis in real driving records, and Question 2
is explained using a multilayer perceptron (MLP) network. Then, based on IPM, we present a unified planning framework
^{2}^{2}2IPM model files and demonstration videos are available on the project website: https://github.com/ChenYingbing/IPMPlanner. to achieve safe and interactive motion planning, which guides the ego vehicle pass an interaction point only when it has a higher priority than other traffic participants.Overall, we propose a unified interactive planning framework for AVs. The major contributions are:
A novel and general interaction solution called IPM, which uniformly formulates the interaction in terms of the protection time and priority.
A unified planning framework for AV in complex dynamic environments, including an efficient st graph searching to calculate the trajectory distribution under traffic regulations, and a priority determination module based on the IPM to select a safe trajectory.
Evaluations in unsignalized urban driving simulations showing our method’s computational efficiency, robustness, and effectiveness.
Existing methods used different ways to represent traffic interactions in planning. One strategy was to represent prediction results and simulation outcomes[17, 14] in the spatiotemporal (st) graph. Then, the sampling methods[27, 24] and optimization methods[26] were used to extract the bestcost or lowestrisk trajectory. Another strategy applied reactive or interactive models[8] to generate action for AVs directly. Methods based on elaborated rules[21, 29]
[28], and control theory[4] were widely used in autonomous driving. Compared to these methods, our work applies both st graph representation and interactive model (IPM). This combination makes it avoid deterministic representation[27, 26, 13] and limited expression of interaction[21, 29, 28, 4]. In addition, our method is highly efficient as it does not require lowefficient simulation[17, 14] and accurate collision checking[27]. It also outperforms learningbased methods[12] on interpretability, as the IPM is simpler and more intuitive.In interactive planning, the motion of other traffic agents and the ego vehicle is modeled as interdependent. This interdependency makes it impracticable to separate the problems of prediction and planning[8]. There were several interactive planning approaches. For the discussed POMDPs[8, 17, 14], interactions were reflected by timeconsuming simulations but not modeled directly. In recent years, learningbased approaches[7, 11]
have been promising to learn trajectory generation and prediction in a unified architecture. However, machine learning technologies lacked traceability when an error occurred in practice. Another was gametheoretic plannings
[20, 23], aiming to compute the Nash equilibrium solutions for all players iteratively. The major shortage of these methods was their high computational cost when the number of players increased. In addition, the game rules were elaborately designed for specific situation, making them hard to generalize to different scenarios.Besides these, branch MPC[30] and failsafe planners[2, 32, 33] emphasized the worstcase performance of AVs. For branch MPC[30], it was an optimization method over risk measures to minimize the worstcase expectation, which did not model the traffic interactions well. Instead of accurate risk avoidance, failsafe plannings [32, 33]
were reactive planners that guaranteed AV’s safe reactions to all possible prediction uncertainties. It achieved this by keeping a collisionfree maneuver available at all times. Following this idea, our method shows a conservative maneuver only when the interaction priority is estimated as low, which prevents the AV from too conservative behaviors
[33].An overview of the proposed planning framework is presented in Fig.3. In this system, the IPM has two functions: 1) determining whether the AV has a higher priority at an interaction point and 2) extracting the speed limit if the AV intends to pass an occluded area. The framework functions as follows: Given the inputs of the path and speed limits, the spatiotemporal (st) graph search first outputs the valid trajectory distribution. Then, the interaction priority determination decides a safe and efficient trajectory for the AV. Finally, the trajectory is smoothed to meet the kinodynamic feasibility and is delivered to the controller for execution.
We extract pairs of interaction data from the INTERACTION dataset [31] to analyze the protection time and priority in the interaction points. Two examples are shown in Fig. 4, where car following is a special case of the line overlap [25]. It should be noted that the proposed IPM is also applicable in onetomany interactions.
The humanrobot interaction model[9] indicates that participants will exchange their intentions beyond a certain distance called the social distance. Within the social distance, the determined form of interaction is performed. Similarly, [6] points out that, before an interaction happens, traffic participants show their intentions via implicit (interactive behavior) or explicit (e.g., turn signal) communications. Inspired by these works, we make the following assumptions.
Assumption 1: The participants will be conservative when facing potential interactions until the maneuver (e.g., who goes first) is determined via communication [6].
Assumption 2: The maneuver for an interaction point can be determined only after all participants are observed. There is a location called an observation point at which the AV has no blind zone in FoV.
For Assumption 1, we believe it is rational because even in a carfollowing scenario [21], traffic agents act conservatively to keep a safe distance from the front vehicle. That is, the planned trajectory should guarantee the safety of the AV for the worstcase situations in the future [33]. As for Assumption 2, an example of the observation point is shown in Fig. (b)b, where the AV cannot determine its maneuver to interact with the until it reaches the observation point . How to obtain an observation point is not discussed in this work, it requires an analysis of the FoV and highdefinition map [19].
As illustrated in Fig. (a)a, denotes the th interaction point, and and
are the speeds of the moment that agents pass the
. We further define the properties of the as(1) 
where and are the relative interaction speed and angle, and is the time gap in Fig. (a)a. In the equation, similar to the definition of , is the yaw state of the AV at the location of , and is the arrival time. The subscript letter indicates the agent, and the superscript corresponds to the index of the interaction point. Moreover, is nonzero because agents cannot pass the same position simultaneously.
Following Question 1, the interaction protection time is the minimum value of the time gap . It has two cases: A negative corresponds to overtaking, and a positive one corresponds to yielding for others. As a result, we define
(2) 
where denotes the maximum time gap when overtaking, is the minimum time gap when giving way, and is the set of possible values of the time gap in the data set. Then, is modeled as functions of and respectively.
(3)  
where and are represented as fitting curves, and is the fitting coefficient. Given the values of and together, the boundaries become
(4) 
More illustrations will be given in Sect. VIIB.
Recalling Assumption 1 and 2, before all occluded vehicles are observed, the agent needs to prepare for the worstcase in the future, that is, to give way to any emergent vehicles. As the example shown in Fig. (b)b, before the AV reaches the observation point , the remaining time to reach should be greater than . This means the speed upper bound
(5) 
where is the distance between the AV and the , returns the Euler distance, and the upper bound meets its minimum . In this equation, can be quickly estimated by setting to zero, which means there is no need to predict agents’ speed in the occluded area.
The definition of the interaction priority is from Question 2. It is treated as a conditional distribution , where
is the feature vector. Before representing the input features, we introduce two metrics: the minimum arrival time
and the maximum arrival time . These two metrics are calculated by the constant acceleration model(6) 
(7)  
where denotes the initial speed of agent , is the possible average acceleration before reaching the interaction point, and and are the maximum and minimum values of acceleration. When the AV and intend to pass , two features named the overtaking ability and giving way ability are defined as
(8)  
where the expression of indicates that the overtaking needs to suffer the protection time , and is the time difference when the two agents slow down simultaneously. In this equation, when is less than zero, the AV can safely occupy the interaction point in advance because it can arrive much earlier than . When is positive, if the AV wants to overtake, the communications mentioned in Assumption 1 and the traffic regulations must be considered.
Our goal is to learn a binary classifier
. Since agents in the experiments are not as intelligent as in the dataset, the classifier is modified as(9) 
where MLP is the trained 3layer MLP model, is the time threshold, and are constants ( is positive). When the AV is far from the interaction point ( is large), it is assumed that the interaction follows patterns learned from the dataset ( is less than ). In contrast, when is small, it turns into a more conservative model in which plays a key role.
The minimum arrival time in the IPM uses a constant acceleration model, which is not competent when the speed limit changes. Hence, a spatiotemporal (st) graph search method is applied to sample speed profiles along the path. This module does not generate a collisionfree speed profile but obtains the future trajectory distribution under variational speed limits. As shown in Fig. 5, we present the graph node as , where is the node index, is the accumulated distance, the subscript is called the layer index, and and are the speed and timestamp values.
The first step is to select a sequence of path points to build the search space, where is the maximum layer. This process considers the curvatures and locations of the interaction points, which obeys: (i) The selected point should reflect the speed limit change between bends and straights. (ii) The path point should involve the location of the interaction point and observation point. (iii) The number of layers should stay a suitable size. Too many layers will increase the computation amount, while a toosmall number will reduce the search space.
During forward simulation, traffic elements, e.g., traffic lights, stop signs, and speed limits, can be transformed into spatiotemporal constraints in the search space [26]. This work involves two kinds of velocity constraints: one comes from the curvature, and the other from the observation point in (5). We denote as the maximum lateral acceleration and as the curvature at . The speed limit of the curvature [13] is
(10) 
We present the whole process in Algorithm 1 and a demonstration in Fig. (a)a. During the expansion, the best node popped from the open list (Algo. 1, Line 4) is forwarded by
(11) 
where , and is the child node. , are the speed bounds at layer . is the acceleration from a set of discretized control inputs . During the expansion, any child node which exceeds the layer bound or time limit , or has a speed that is lower than will not be expanded (Algo. 1, Line 6). The expanded node has cost
(12) 
where denotes the cost of the parent node, is the control cost, is the deviation from the speed limit, and is the weights.
As illustrated in Fig. (a)a, for nodes in a discretized grid with resolutions and , only the node with the lowest cost is allowed to undergo expansion (Algo. 1, Line 10). This operation significantly improves the search efficiency, and it has little impact on the search result since the abandoned nodes have similar values to the lowestcost one.
Once the graph search is finished, trajectories to graph nodes and the lowestcost speed profile are obtained. For node in , we use notation to represent both the node and layer index for convenience. The next step is to determine which trajectory to perform, and the IPM is applied for priority checking among the sampled trajectories.
The whole process is outlined in Algorithm 2. denotes the graph nodes from layer to , and the IPM is used to check the priority of in turn until it reaches the maximum layer or quits when the AV’s priority is lower than that of other participants (Algo. 2, Line 4):
(13) 
where is the set of agents who intend to pass the interaction point at layer ,
denotes the probability in (
9), and is a constant threshold. In the implementation, protection times and are calculated by setting to zero since it is hard to estimate accurately. Moreover, in uses the value in , and is calculated via the constant acceleration model. Following the illustration in Fig. (b)b, once has a lower priority, nodes at previous layers (e.g., ) will be checked to see if they meet the following condition(14) 
where is the invariable safe set [33] that at least one action exists for the AV to remain safe for an infinite time horizon. Our work involves two kinds of invariable safe sets. One is to promise the AV can stop before the intersection (TimetoBrake [15] greater than 0), and another is to obey the responsibility safety criterion [21] when following a car. Then, a onestep forward expansion will be conducted at , and the generated child should obey its parent’s constraint (14). For example, as the green edges show in Fig. (b)b, child nodes at the new layer are sampled if they allow the AV to stop before a safe distance to the intersection.
Finally, the best speed profile is picked up, and the above operations are executed iteratively (Algo. 2, Line 12) until all nodes of the speed profile have a higher interaction priority. Fig. (b)b shows three speed profiles in orange, purple and grey, respectively. The orange one is the best speed profile derived from the forward simulation while having a low priority at . As a result, the speed profile in purple is selected instead. Then, the grey speed profile will be applied if the purple speed profile has a low priority at .
The chosen speed profile can be smoothed further to meet the kinodynamic feasibility using a quintic piecewise Bézier curve in the spatiotemporal semantic corridor (SSC [26]). Based on the and speed limits in the search space, our method use as the initial seeds to build SSC, while dynamic obstacles are not mapped to the SSC because the initial seeds have guaranteed the AV’s safety.
We train and test the IPM in the INTERACTION dataset [31] because it contains motions of various traffic participants in various driving scenarios without traffic lights. The proposed planner is validated in the CARLA [3].
We use Pytorch
[1] to train the MLP network in IPM, and the whole planning framework is implemented with C++ backend. All experiments are conducted on a computer with an AMD5600X CPU (3.7 GHz). From the INTERACTION dataset [31], we first extract the data that have coincident trajectories between agents (as plotted in Fig. 4) and divide them into different pairs of interactions. Some examples are available on our project website, where agents are simplified as circles. At the same interaction point, agent overtaking is identical to the case where agent gives way to . As these two cases are essentially the same scenario viewed from different perspectives, we only add this type of data to our dataset once to avoid data duplication.In the simulation, the AV’s acceleration limits are , and its field of view is limited, where other traffic agents will not be observed if their distance to the AV is greater than m. In the IPM, the MLP network and parameters in (3
) are available on our project website. For each intersection with a limited field of view, we apply the heuristic function
to roughly estimate the speed limit in (5), where is the length of the lane in the intersection. During forward simulation, the maximal distance between two sampled path points is . In Algorithm 1, the expansion conditions , m/s, grid resolutions m/s, and s. In the classifier (9), , , , and .



Time  1ms  110ms  1020ms  2030ms  3040ms 
Percentage  10.97%  42.53%  39.31%  7.16%  0.03% 




Town02 (Seed 1)  Town02 (Seed 2)  
Methods  Completion (m)  Collision Num.  Avg. Speed^{a} (m/s)  Completion (m)  Collision Num.  Avg. Speed (m/s) 
SSC[26]  1232.00 187.71  0.0 0.00  4.10 0.61  1444.94 081.07  0.2 0.45  4.8 0.25 
MCTS[14]  1295.73 054.82  0.0 0.00  4.43 0.31  1401.86 104.27  0.0 0.00  4.7 0.36 
MMFNExpert[22]  1340.84 412.39  1.0 0.71  4.86 0.53  1620.00 059.71  0.8 0.84  5.4 0.18 
PD  1193.86 107.04  0.0 0.00  3.98 0.38  1303.22 035.57  0.0 0.00  4.3 0.11 
PDCVel  1303.08 161.48  0.3 0.48  4.41 0.52  1398.94 081.53  0.2 0.45  4.7 0.30 
PDIPM  1358.11 060.73  0.0 0.00  4.71 0.48  1457.96 042.54  0.0 0.00  4.9 0.13 

is the average velocity before the AV collides with other vehicles and cannot continue with the navigation task.
We analyze the extracted pairs of interactions (mentioned in Sect. VIIA) and plot and in Fig. (a)a. Inside a box, the green line represents the mean value, and the orange line denotes the median. The color of the box represents the sample size, where orange means the size is greater than 10000, yellow (than 1000), blue (than 100), light blue (than 10), and black is [1,10). In addition, the solid lines are fitting curves of and in (3), which are based on the Gaussian intervals.
As shown in Fig. (b)b, we label samples with negative as overtaking (red points) and positive as giving way (cyan points) in the dataset. The background color denotes in (9), and there are four areas. For B and C, the priority is clear that can be quickly classified. In A and D, plays a key role. Most agents in the dataset can give way to their competitors when the giving way ability is higher. We further classify as overtaking to evaluate the prediction accuracy. The trained MLP network achieves 99.79% accuracy in the training set with a data size of 5283 and 99.81% in the validation set (10071 data size). In the validation experiment, we test all time slots before reaching the interaction point, which makes the validation set’s size larger than the training set’s.
Besides the response time factor, the IPM explains why an advance deceleration is necessary for an emergency from another perspective. As shown in Fig. (c)c, the purple line represents the braking speed limit, which is the maximal speed when the braking distance is less than , and the dashed lines are the speed limits defined by (5). This figure suggests that when is s and the AV runs at m/s, the braking is hard to prevent a collision if a vehicle suddenly appears within m. That is, it reflects the speed limits of the grey area.
To verify that the proposed planning framework is competent in highly interactive scenarios, we test our method in Town02 and Town04 of CARLA. In the experiments, traffic agents in the simulation are set to ignore the traffic signals and signs, and two simulation results are discussed in Fig. 7. In addition, more qualitative results are available in the supplementary video on the project website.
1) Realtime Performance: A 30minute navigation test with dynamical agents is conducted in CARLA to verify the computational efficiency of our proposed planning framework. The test results are shown in Table I, where the computation time includes all planning modules in Fig. 3, and the average calculation time is ms. We observe that in 92.8% of cases, the runtime of our proposed method is less than ms, which shows the realtime performance.
2) Performance Comparisons: We select Town02 as the test scenario because the map is relatively small and more suitable for comparing the interaction performance. In addition, the position of interaction points in Town02 is fixed so that we do not need to predict other agents’ paths in advance. Like previous experiments, all junctions and intersections on the map are unprotected by the traffic rules. The following algorithms are compared with the same initial location and different random seeds. SSC [26]: constant velocity (CVel) model is applied to generate the SSC. MCTS [14]: Monte Carlo tree search algorithm, where other agents are predicted to slow down to stop or speed up to a certain speed when tracking the path, and their goal probabilities are uniform. MMFNExpert [22]: an elaborated expert agent. PDM: this method is based on the proposed planning framework, but the priority determination module decides to overtake when is less than zero. PDCVel: like PD, it overtakes when the estimated by CVel is less than zero. PDIPM: the proposed planning framework using (9) to determine the priority. The speed constraint (5) is enforced in all methods since most benchmarks [26, 14] did not model the occluded factors.
Each algorithm is tested with a series of goals in five minutes. The tests are repeated five times, and the performances are recorded in Table II
. In the table, the numbers are the mean values and standard deviations for three metrics: completion distance, number of collisions, and average speed. The experimental results demonstrate that our method is safe and efficient. In the seed 1 setting, PDIPM achieves no collision situations with the longest completion length. In the seed 2 setting, although the MMFNExpert has the longest length and highest average speed, it is unsafe and has a high collision number in red. By contrast, our PDIPM gets a zerocollision rate and outperforms other approaches
[26, 14]. Besides this, the performance of PD is conservative but safe, which shows our planning framework’s reliability. The comparison between PDCVel and PDIPM reflects that the IPM can make the planning system more effective.This paper presented a novel unified planning framework based on the proposed interaction point model (IPM), which enabled a uniform description of the interaction under various driving scenarios. First, the IPM was trained and tested in the real traffic data, showing the general interaction rule in different scenarios. Next, the planning framework was introduced, and the efficient graph search and interaction priority determination were elaborated. Experiments showed that our framework enabled the system to interact safely with other agents, and the IPM made it more efficient. Future work will include introducing more complicated scenarios into the planning framework, with interactions in open space, with pedestrians, or between other agents under more traffic regulations modeled.
Pytorch: an imperative style, highperformance deep learning library
. NeurIPS 32. Cited by: §VIIA.Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR)
, pp. 16107–16116. Cited by: §IIB.