I Introduction
Connected and Automated Vehicles (CAVs) are believed to play a key role in next-generation transportation systems [1]. With the aid of vehicle-to-vehicle (V2V) communication, CAVs can share their driving states (position, velocity, etc.) and intentions with adjacent vehicles [2, 3], so that they can better coordinate their motions to alleviate traffic congestion and improve traffic safety.
In the last decade, various strategies have been proposed to optimally coordinate CAVs in a typical driving scenario: the unsignalized intersection. It is pointed out in [4] and [5] that the key problem is to determine the optimal order in which CAVs pass the intersection. As summarized in [6], there are two kinds of cooperative driving strategies for determining the passing order: planning based and ad hoc negotiation based.
Planning based strategies aim to enumerate all possible passing orders to find the globally optimal solution [7]. There are two equivalent formulations of the problem. Most state-of-the-art studies formulate it as a mixed integer linear programming problem of scheduling the vehicles' passing times [1, 8]. The objective is usually to minimize the total delay of all CAVs. Li et al. showed that the problem can also be viewed as a tree search problem, in which each tree node indicates a specific (partial) passing order. The equivalent objective is to find the leaf node that corresponds to the minimum total delay of all CAVs [4, 5]. It was shown in [9] that some planning based strategies work well for ramp metering scenarios. However, the time needed to enumerate all the nodes increases sharply with the number of vehicles, especially for unsignalized intersection scenarios, which hinders their application in practice.
Ad hoc negotiation based strategies aim to find an acceptable passing order within a very short time by using heuristic rules. For example, Stone et al. proposed the autonomous intersection management (AIM) cooperative driving strategy, which divides the intersection into grids (resources) and assigns these grids to CAVs in a roughly First-In-First-Out (FIFO) manner [10, 11]. This strategy has several variations, including the reservation strategy [12]. However, as shown in [6], the passing orders found by ad hoc negotiation based strategies are not good enough in many situations.
To keep a good trade-off between performance and computation flexibility, we propose a new cooperative driving strategy based on the tree representation of the solution space for the passing order. Its key idea is to use the limited planning time to explore the nodes that have the potential to be the optimal solution. To this end, we combine Monte Carlo tree search (MCTS) with heuristic rules to accelerate the search, since the solution space of this problem has special structures that can be exploited. Testing results show that we can find a nearly globally optimal passing order within a short enough planning time.
To give a better presentation of our findings, the rest of this paper is arranged as follows. Section II formulates the problem and briefly reviews the existing strategies. Section III presents the new strategy. Section IV validates the effectiveness of the proposed strategy via numerical testing results. Finally, Section V gives concluding remarks.
II Problem Formulation
Fig. 1 shows a typical intersection scenario with multiple lanes in each leg. The area within the circle is called the control zone, and the shaded area is called the conflict zone, where lateral collisions might happen. According to the geometry of the intersection, the conflict zone can be further divided into several conflict subzones.
We assign each vehicle that enters the control zone a unique identity. For each vehicle, we also use an ordered set to denote the conflict subzones that the vehicle will pass through. For example, the set {4, 1} means that the vehicle will pass through Conflict Subzone 4 and Conflict Subzone 1 in sequence.
To simplify the problem, we adopt the following assumptions:

Each vehicle instantly and thoroughly shares its driving states (position, velocity, etc.) and intentions with other vehicles via vehicle-to-vehicle (V2V) communication.

Lane changing is prohibited in the control zone to ensure vehicle safety.
The cooperative driving strategy aims to minimize the total delay of vehicles by scheduling the velocity and acceleration profiles of all vehicles [14]. So, we can get the following optimization problem
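$$\min \; J = \sum_{i=1}^{n} \left( t_{i, z_i^1} - t^{\min}_{i, z_i^1} \right) \qquad (1)$$
where $t_{i,z}$ is the desired arrival time at conflict subzone $z$ for vehicle $i$, $t^{\min}_{i,z}$ is the minimum arrival time at conflict subzone $z$ when vehicle $i$ travels at the maximum velocity and the maximum acceleration, $z_i^1$ is the first element in the set of conflict subzones that vehicle $i$ will pass through, and $n$ is the number of vehicles in the control zone.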
Directly attacking Problem (1) often leads to a mixed integer linear programming (MILP) problem whose computation time increases exponentially with the number of vehicles [8, 9].
Noticing that the traffic efficiency mainly depends on the passing order of vehicles [6], we can formulate the whole problem as a tree search problem in the solution space that consists of all possible passing orders. Each leaf node represents a passing order of vehicles which can also be denoted as a string [5]. For example, string CAB means vehicle C, vehicle A, and vehicle B enter the conflict zone sequentially.
Let us take the intersection scenario shown in Fig. 2 as an example to explain how to build the tree representation of the solution space step by step. At first, we set the passing order in the root node to be empty. Then, each direct child node of the root node (in the second layer) refers to one index symbol that indicates the first vehicle in a specific passing order. Each node in the third layer refers to a string of two index symbols that indicate the first two vehicles in a specific passing order. Similarly, the child nodes expand their own child nodes, and all possible passing orders are generated as leaf nodes in the bottom layer of the solution tree, as shown in Fig. 3.
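The layer-by-layer construction described above can be illustrated with a minimal Python sketch. The function names `child_nodes` and `all_leaf_nodes` are hypothetical, and the sketch assumes that every permutation is a valid passing order (it ignores the same-lane ordering constraint), so it only illustrates the tree structure, not the strategy itself.

```python
from typing import List


def child_nodes(partial_order: str, vehicles: str) -> List[str]:
    """Direct children of a node: the partial order extended by one uncovered vehicle."""
    return [partial_order + v for v in vehicles if v not in partial_order]


def all_leaf_nodes(vehicles: str) -> List[str]:
    """Expand the tree layer by layer, starting from the empty root node."""
    layer = [""]  # root node: empty passing order
    for _ in vehicles:  # one new layer per vehicle
        layer = [child for node in layer for child in child_nodes(node, vehicles)]
    return layer


second_layer = child_nodes("", "ABC")   # first vehicle of each passing order
third_layer = child_nodes("C", "ABC")   # partial orders that start with vehicle C
leaves = all_leaf_nodes("ABC")          # complete passing orders
```

The number of leaf nodes grows factorially with the number of vehicles, which is why full enumeration quickly becomes impractical.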
If a (partial) passing order is given, the desired arrival times for all the vehicles that have been covered in this (partial) passing order can be directly derived by the following Passing Order to Trajectory Interpretation Algorithm. Our objective then becomes finding the leaf node that corresponds to the shortest total delay. Moreover, the total delay values of leaf nodes can be used to evaluate the potential of their parent nodes through backpropagation. This method gives us a chance to find a nearly globally optimal leaf node while searching only a small part of the whole tree.
Algorithm 1 uses the following quantities: the element at each position of the input (partial) passing order, the largest arrival time up to which each subzone has been occupied, and the minimum safety gap between two consecutive vehicles passing through the same subzone. Obviously, the time complexity of Algorithm 1 is . A detailed explanation of Algorithm 1 can be found in our previous report [7].
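To make the idea of interpreting a passing order concrete, here is a simplified Python sketch. It is not the authors' Algorithm 1. It assumes a hypothetical input format (each vehicle's subzones with its minimum arrival times) and assumes that a delayed vehicle shifts its whole trajectory by a single delay value. The name `total_delay` and the gap of 1.5 s are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: for each vehicle, the conflict subzones on its path
# (in travel order), each paired with the vehicle's minimum arrival time in seconds.
Route = List[Tuple[int, float]]


def total_delay(order: str, routes: Dict[str, Route], gap: float) -> Tuple[float, Dict[str, float]]:
    """Assign arrival times vehicle by vehicle, following the given passing order."""
    occupied_until: Dict[int, float] = {}  # latest assigned arrival time per subzone
    delays: Dict[str, float] = {}
    for vehicle in order:
        route = routes[vehicle]
        # Smallest non-negative shift that keeps the safety gap in every shared subzone.
        shift = 0.0
        for zone, t_min in route:
            if zone in occupied_until:
                shift = max(shift, occupied_until[zone] + gap - t_min)
        # Record the assigned arrival times so later vehicles queue behind this one.
        for zone, t_min in route:
            occupied_until[zone] = t_min + shift
        delays[vehicle] = shift
    return sum(delays.values()), delays


routes = {"A": [(4, 5.0), (1, 6.0)], "B": [(1, 5.5)], "C": [(2, 4.0)]}
delay_ab, _ = total_delay("AB", routes, gap=1.5)  # B waits behind A in subzone 1
delay_ba, _ = total_delay("BA", routes, gap=1.5)  # A waits behind B in subzone 1
```

In this toy example the two orders give different total delays, which is the quantity that the tree search compares across passing orders.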
III MCTS Based Cooperative Driving Strategies
It is usually impossible to expand all the nodes of the solution tree within the limited computation budget when there are many vehicles in the control zone. In this paper, we use MCTS + heuristic rules to select the nodes with the potential to be the optimal solution. The recent success of the MCTS method in the game of Go shows that it is an effective way to deal with such problems [15, 16].
III-A The Classical MCTS Based Strategy
In MCTS, each node in the formulated tree is assigned a score that evaluates its potential. The score of a leaf node is equal to the total delay of its corresponding passing order. MCTS uses these scores to determine which branch of the tree to explore.
Generally, MCTS builds a search tree gradually and iteratively. One iteration consists of four steps: selection, expansion, simulation, and backpropagation [17]; see Fig. 4.

Selection: Starting at the root node, we select the most urgent expandable node based on the following policy [18]
$$j^{*} = \arg\max_{j} \left( Q_j + C \sqrt{\frac{\ln N}{n_j}} \right) \qquad (2)$$
where $Q_j$ is the score of child node $j$ and the value of $Q_j$ is within $[0, 1]$, $N$ is the number of times the current node has been visited, $n_j$ is the number of times child node $j$ has been visited, and $C$ is a weighting parameter. The child node with the largest total score is selected. Here, an expandable node refers to a node that is not a leaf node and has unvisited child nodes.
This child node selection policy is widely used in the field of computer Go and is called UCB1 [18]. The first term in the equation encourages the selection of the child node that is currently believed to be optimal, while the second term encourages the exploration of more child nodes.
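As an illustration of how a UCB1-style rule trades off the two terms, the following Python sketch selects a child from per-child scores and visit counts. The function name `ucb1_select`, the dictionary layout, and the exact constant inside the square root are assumptions made for this sketch and may differ from the form used by the authors.

```python
import math
from typing import Dict, Tuple


def ucb1_select(children: Dict[str, Tuple[float, int]], parent_visits: int, c: float) -> str:
    """Pick the child with the largest total score.

    children maps a child's passing-order string to (score in [0, 1], visit count).
    """
    def total_score(item: Tuple[str, Tuple[float, int]]) -> float:
        _, (score, visits) = item
        if visits == 0:
            return math.inf  # unvisited children are tried first
        exploration = math.sqrt(math.log(parent_visits) / visits)
        return score + c * exploration

    return max(children.items(), key=total_score)[0]


children = {"CA": (0.8, 10), "CB": (0.5, 1)}
greedy_choice = ucb1_select(children, parent_visits=11, c=0.0)     # exploitation only
exploring_choice = ucb1_select(children, parent_visits=11, c=1.0)  # favors the rarely visited child
```

With a zero weight the rule is purely greedy. A larger weight shifts the choice toward children that have been visited less often.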

Expansion: We randomly select one unvisited child node of the most urgent expandable node to be a new node that is added to the tree.

Simulation: We run several rollout simulations to determine a complete passing order based on the partial passing order represented by the current new node to evaluate the potential of the new node.
The classical MCTS randomly samples the uncovered vehicles and adds them to the passing order string one by one, without branching, until a complete passing order string is found and the maximum depth of the tree is reached from the current new node [16]. For example, when we apply the random sampling policy to the node CB shown in Fig. 4, we randomly expand a direct child node in its next layer, say node CBA. The node CBA is then further expanded by repeating this process until a leaf node (e.g., node CBADE) is reached. Finally, the partial passing order is evaluated by all its simulated offspring leaf nodes (passing orders). Sometimes, the generated passing order is invalid because it violates the prohibition of lane changes; such solutions are discarded after a validity check.
After simulation, we update the scores of the current new node as follows:

Apply Algorithm 1 to calculate the total delay of the partial order that corresponds to the current new node.

Apply Algorithm 1 to calculate the total delay of the order that corresponds to the best offspring node of the current new node found via simulation.

Calculate the score of the current new node as
(3) where is a weighting parameter. Since , we normalize and into before updating .


Backpropagation: The simulation result is backpropagated through the selected nodes to update the scores of all its parent nodes.
While the search tree is being built, the best passing order found so far is continuously updated. As soon as the computation budget is reached, the search terminates and returns the best passing order found so far. The planned arrival times of vehicles can then be determined by using Algorithm 1. Finally, the velocity and acceleration profiles of each vehicle are calculated by using the motion planning method proposed in [14].
We can see that the performance of the proposed strategy is influenced by the choice of parameters, including the maximum search time and the two weighting parameters used in (2) and (3). We will discuss how to choose these parameters in Section IV.
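The four steps can be put together in a compact Python sketch. It is a generic MCTS loop over passing-order strings under several assumptions that are not taken from the paper: the delay function `delay_of` is supplied by the caller, the reward is the delay normalized by a caller-supplied `worst_delay`, node scores are running means of rewards rather than the score update in (3), and an iteration count replaces the time budget. All names are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Tuple


class Node:
    def __init__(self, order: str, parent: Optional["Node"] = None):
        self.order = order            # (partial) passing order of this node
        self.parent = parent
        self.children: List["Node"] = []
        self.visits = 0
        self.score = 0.0              # running mean of rewards in [0, 1]


def mcts(vehicles: str, delay_of: Callable[[str], float], worst_delay: float,
         iterations: int, c: float = 0.5, seed: int = 0) -> Tuple[str, float]:
    rng = random.Random(seed)
    root = Node("")
    best_order, best_delay = "", math.inf

    def untried(node: Node) -> List[str]:
        tried = {child.order for child in node.children}
        return [node.order + v for v in vehicles
                if v not in node.order and node.order + v not in tried]

    for _ in range(iterations):
        # Selection: descend through fully expanded nodes with a UCB1-style rule.
        node = root
        while len(node.order) < len(vehicles) and not untried(node):
            log_n = math.log(node.visits)
            node = max(node.children,
                       key=lambda ch: ch.score + c * math.sqrt(log_n / ch.visits))
        # Expansion: add one randomly chosen unvisited child.
        if len(node.order) < len(vehicles):
            child = Node(rng.choice(untried(node)), parent=node)
            node.children.append(child)
            node = child
        # Simulation: complete the partial order by random sampling.
        rest = [v for v in vehicles if v not in node.order]
        rng.shuffle(rest)
        full_order = node.order + "".join(rest)
        delay = delay_of(full_order)
        if delay < best_delay:
            best_order, best_delay = full_order, delay
        reward = 1.0 - min(delay / worst_delay, 1.0)
        # Backpropagation: update the running mean along the path to the root.
        current: Optional[Node] = node
        while current is not None:
            current.visits += 1
            current.score += (reward - current.score) / current.visits
            current = current.parent
    return best_order, best_delay


toy_delays = {"AB": 2.0, "BA": 1.0}
order, delay = mcts("AB", toy_delays.__getitem__, worst_delay=2.0, iterations=2)
```

In practice `delay_of` would be an interpretation routine in the spirit of Algorithm 1, and the loop would stop when the time budget is exhausted rather than after a fixed number of iterations.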
III-B The MCTS + Heuristic Rules
As aforementioned, the classical MCTS strategy uses random sampling to generate a leaf node (a passing order) in the simulation step. However, because of the huge number of possible passing orders, the passing orders generated by random sampling cannot help us quickly capture the real potential of a node during simulation.
Thus, we propose the following heuristic rules to help decide which nodes (vehicles) should be expanded (added to the candidate passing order string) during simulation. Heuristic rule 1 helps to quickly prune invalid passing orders [5]. Heuristic rule 2 determines which of the candidate vehicles is chosen.

Heuristic rule 1: For the vehicles on the same lane, the vehicle that is closest to the conflict zone should be added earlier than the other vehicles, since lane changing is prohibited.

Heuristic rule 2: For the vehicles passing through the same conflict subzone, the vehicle with the smaller desired arrival time should be added earlier.
The simulation step can be summarized as Algorithm 2. We can see that the classical MCTS applies random sampling in both the expansion and simulation steps, while our MCTS + heuristic rules applies random sampling only in the expansion step.
Ad hoc negotiation based strategies organize all the vehicles according to the FIFO principle. In contrast, the new simulation policy organizes only a part of the vehicles (the vehicles not yet covered by the current partial passing order) according to the FIFO principle. This trick helps to avoid convergence to an overly greedy solution.
IV Simulation Results
IV-A Simulation Settings
We design three experiments to determine the best parameter set for the new cooperative driving strategy and to compare it with some classical ones. These experiments are conducted for the intersection with three lanes in each leg shown in Fig. 1. The mandatory signs stipulate the permitted directions for each lane. According to the geometry of the intersection, the conflict zone is further divided into 36 subzones. Vehicle arrivals are assumed to follow a Poisson process. We vary the mean value of this Poisson process to test the performance of the proposed strategy under different traffic demands. The vehicle arrival rates at all lanes are the same unless otherwise specified. It should be pointed out that we have also tested other intersections with different road geometries and various vehicle arrival patterns, and the conclusions remain unchanged.
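Poisson arrivals on a single lane can be generated by sampling exponential inter-arrival times, as in the Python sketch below. The function name `poisson_arrivals` and the seed are illustrative assumptions, and the sketch ignores any minimum headway between consecutive vehicles.

```python
import random
from typing import List


def poisson_arrivals(rate_veh_per_hour: float, horizon_s: float,
                     rng: random.Random) -> List[float]:
    """Arrival times (in seconds) of a Poisson process over [0, horizon_s]."""
    rate_per_s = rate_veh_per_hour / 3600.0
    t = 0.0
    arrivals: List[float] = []
    while True:
        t += rng.expovariate(rate_per_s)  # exponential inter-arrival time
        if t > horizon_s:
            return arrivals
        arrivals.append(t)


# One lane at 300 veh/(lane*h) over a 20-minute horizon.
arrivals = poisson_arrivals(300.0, 20 * 60.0, random.Random(1))
```

Calling the function once per lane with the same rate reproduces the setting in which all lanes share one arrival rate.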
To accurately describe the total delays of vehicles, we adopt the point-queue model in the simulation [6, 19]. The model assumes that vehicles travel in the free flow state until they reach the boundary of the intersection under study. If the preceding vehicle leaves enough space, the first vehicle in the point-queue will dequeue and enter the intersection. Otherwise, it will stay in the virtual queue. Each lane has an independent point-queue.
In this paper, we reschedule the passing order of all the vehicles within the control zone every 2 seconds. As suggested in [6], we set the minimum safety gap between two consecutive vehicles passing through the same subzone to a slightly enlarged constant
(4) 
to avoid the collisions caused by position measurement errors and communication delay.
IV-B The Choice of Parameters
In this paper, we consider two performance indices to compare different cooperative driving strategies: the delay of the given vehicles and the traffic throughput (the number of vehicles that have passed the intersection control zone) within a given time interval. Specifically, we highlight the decreased ratio of the total delay relative to the baseline solution obtained by the FIFO strategy
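$$\eta = \frac{J_{\mathrm{FIFO}} - J_{\mathrm{MCTS}}}{J_{\mathrm{FIFO}}} \qquad (5)$$
where $J_{\mathrm{FIFO}}$ is the objective value of the FIFO passing order, and $J_{\mathrm{MCTS}}$ is the objective value of the best passing order from the MCTS based strategy.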
To determine the best parameter setting of the new MCTS + heuristic rules strategy, we first fix the time budget at 0.1 s and vary the two weighting parameters in (2) and (3) from 0 to 1. To better understand the performance of the strategy under different traffic conditions, we vary the vehicle arrival rate to generate a series of intersection scenarios with different numbers of vehicles.
Fig. 5 gives the improvement rates for the intersection shown in Fig. 1 with 30 vehicles. We can see that a significant improvement can be achieved even with the worst parameter setting. The two weighting parameters are not critical, although they may still influence the balance between exploitation and exploration, partly because we use heuristic rules in the simulation step to reduce the influence of random sampling. We further studied scenarios with other numbers of vehicles, and the results are all similar. Thus, in the rest of this paper, we keep the two weighting parameters fixed.
Then, to determine an appropriate time budget, we vary the time limit of the tree search. To eliminate the influence of the computing power of the device, we examine the improvement rates with respect to the number of nodes that have been searched.
It can be seen from Fig. 6 that the improvement rate increases significantly when the number of searched nodes increases from 10 to 1000. However, the improvement rate soon saturates after that. Thus, we believe that the proposed strategy can obtain a good enough passing order by searching 1000 nodes. For most intersection scenarios, 1000 nodes can be searched within 0.1 s on our personal computer, so we set the maximum search time to 0.1 s for the following experiments.
IV-C Comparisons of Different Cooperative Driving Strategies
To further clarify the difference between the FIFO strategy and our new strategy, we study a typical intersection scenario with a single lane in each leg and 20 vehicles. We calculate the objective values for all the valid solutions (passing orders) and plot them as a histogram; see Fig. 7.
It is clear that the solution found by the MCTS based strategy is nearly the same as the globally optimal solution found by the enumeration based strategy, while the computation time of the MCTS based strategy is much shorter. The FIFO based strategy has the shortest computation time, but its solution is far from the optimal solution. The solution found by the MCTS based strategy ranks 648th among the nearly 10 billion solutions, while the solution of the FIFO based strategy ranks 4,563,421,793rd.
We then carry out another comparison for the intersection shown in Fig. 1, where the average arrival rate is varied to explore the influence of different traffic demands. For each arrival rate, we simulate 20 minutes of traffic. The results show that our new strategy further reduces the average delay and improves the traffic throughput in all situations.
Arrival rate (veh/(lane*h))  Strategy  Average delay (s)  Traffic throughput (veh)
150                          FIFO      1.3053             589
150                          MCTS      0.4499             605
300                          FIFO      39.8313            1095
300                          MCTS      1.1407             1168
450                          FIFO      41.6996            1205
450                          MCTS      4.8743             1766

The computation time of the MCTS based strategy is 0.1 s.
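As a worked illustration, the relative reduction can be computed directly from the average delays reported above. This Python sketch applies the ratio to average delays, not to the total-delay objective used in (5), so the resulting numbers are only indicative. The function name `decreased_ratio` is hypothetical.

```python
def decreased_ratio(baseline: float, improved: float) -> float:
    """Relative reduction of a delay value with respect to the baseline."""
    return (baseline - improved) / baseline


# Average delays (s) from the comparison above: arrival rate -> (FIFO, MCTS).
average_delay = {150: (1.3053, 0.4499), 300: (39.8313, 1.1407), 450: (41.6996, 4.8743)}
ratios = {rate: decreased_ratio(fifo, mcts) for rate, (fifo, mcts) in average_delay.items()}

for rate, ratio in ratios.items():
    print(f"{rate} veh/(lane*h): average delay reduced by {ratio:.1%}")
```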
IV-D A Further Look into the Structure of the Obtained Search Tree
Fig. 8 shows the search tree built by our new strategy for an intersection scenario with 50 vehicles. Similar to the classical MCTS strategy, our new strategy tends to first find some promising branches (partial passing orders) of the tree and then spends most of the search time further exploring these branches. However, the search tree generated by the classical MCTS strategy contains many more unnecessary leaf nodes. In contrast, when the heuristic rules are introduced, only a very small number of leaf nodes are finally reached. In this case, although the number of possible passing orders is huge, only about two thousand passing orders are explored by our new strategy within 0.1 s. This difference explains why the classical MCTS needs much more time to find a good enough passing order.
V Conclusion
In this paper, we propose a cooperative driving strategy that combines Monte Carlo simulation and heuristic rule simulation to accelerate the search for the passing order. This new method can quickly learn the tree structure of the given scenario and find a nearly optimal solution within a short time. Although we only discuss the scheduling of vehicles at unsignalized intersections, this method can be easily adapted to other scenarios (e.g., ramp areas and work zones). We are currently building several automated vehicle prototypes so that we can test our new strategy in field studies in the near future.
References
[1] P. T. Li and X. Zhou, “Recasting and optimizing intersection automation as a connected-and-automated-vehicle (cav) scheduling problem: A sequential branch-and-bound search approach in phase-time-traffic hypernetwork,” Transportation Research Part B: Methodological, vol. 105, pp. 479–506, 2017.
[2] L. Li, D. Wen, and D. Yao, “A survey of traffic control with vehicular communications,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 1, pp. 425–432, 2014.
[3] T. Sukuvaara and P. Nurmi, “Wireless traffic service platform for combined vehicle-to-vehicle and vehicle-to-infrastructure communications,” IEEE Wireless Communications, vol. 16, no. 6, 2009.
[4] S. I. Guler, M. Menendez, and L. Meier, “Using connected vehicle technology to improve the efficiency of intersections,” Transportation Research Part C: Emerging Technologies, vol. 46, pp. 121–131, 2014.
[5] L. Li and F. Wang, “Cooperative driving at blind crossings using intervehicle communication,” IEEE Transactions on Vehicular Technology, vol. 55, no. 6, pp. 1712–1724, 2006.
[6] Y. Meng, L. Li, F. Wang, K. Li, and Z. Li, “Analysis of cooperative driving strategies for nonsignalized intersections,” IEEE Transactions on Vehicular Technology, vol. 67, no. 4, pp. 2900–2911, 2018.
[7] H. Xu, S. Feng, Y. Zhang, and L. Li, “A grouping based cooperative driving strategy for cavs merging problems,” arXiv preprint arXiv:1804.01250, 2018.
[8] E. R. Müller, R. C. Carlson, and W. K. Junior, “Intersection control for automated vehicles with milp,” IFAC-PapersOnLine, vol. 49, no. 3, pp. 37–42, 2016.
[9] L. Chen and C. Englund, “Cooperative intersection management: a survey,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 2, pp. 570–586, 2016.
[10] K. Dresner and P. Stone, “Multiagent traffic management: A reservation-based intersection control mechanism,” in Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2. IEEE Computer Society, 2004, pp. 530–537.
[11] K. Dresner and P. Stone, “A multiagent approach to autonomous intersection management,” Journal of Artificial Intelligence Research, vol. 31, pp. 591–656, 2008.
[12] M. Choi, A. Rubenecia, and H. H. Choi, “Reservation-based cooperative traffic management at an intersection of multi-lane roads,” in Information Networking (ICOIN), 2018 International Conference on. IEEE, 2018, pp. 456–460.
[13] Y. Zhang, A. A. Malikopoulos, and C. G. Cassandras, “Decentralized optimal control for connected automated vehicles at intersections including left and right turns,” arXiv preprint arXiv:1703.06956, 2017.
[14] A. A. Malikopoulos, C. G. Cassandras, and Y. J. Zhang, “A decentralized energy-optimal control framework for connected automated vehicles at signal-free intersections,” Automatica, vol. 93, pp. 244–256, 2018.
[15] M. Enzenberger, M. Muller, B. Arneson, and R. Segal, “Fuego - an open-source framework for board games and go engine based on monte carlo tree search,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 2, no. 4, pp. 259–270, 2010.
[16] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, 2017.
[17] C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, “A survey of monte carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1–43, 2012.
[18] L. Kocsis and C. Szepesvári, “Bandit based monte-carlo planning,” in European Conference on Machine Learning. Springer, 2006, pp. 282–293.
[19] X. J. Ban, J. Pang, H. X. Liu, and R. Ma, “Continuous-time point-queue models in dynamic network loading,” Transportation Research Part B: Methodological, vol. 46, no. 3, pp. 360–380, 2012.