Highway Traffic Control via Smart e-Mobility – Part I: Theory

02/18/2021 ∙ by Carlo Cenedese, et al. ∙ University of Groningen, University of Pavia, ETH Zurich, Delft University of Technology

In this paper, we study how to alleviate highway traffic congestion by encouraging plug-in hybrid and electric vehicles to stop at a charging station around peak congestion times. Specifically, we design a pricing policy to make the charging price dynamic and dependent on the traffic congestion, predicted via the cell transmission model, and the availability of charging spots. Furthermore, we develop a novel framework to model how this policy affects the drivers' decisions by formulating a mixed-integer potential game. Technically, we introduce the concept of "road-to-station" (r2s) and "station-to-road" (s2r) flows, and show that the selfish actions of the drivers converge to charging schedules that are individually optimal in the sense of Nash. In the second part of this work, submitted as a separate paper (Part II: Case Study), we validate the proposed strategy on a simulated highway stretch between The Hague and Rotterdam, in The Netherlands.


I Introduction

I-A Motivation

In recent years, urban mobility in highly populated cities has become a central issue in many countries. Some alarming statistics show a pressing need for change, as the cost of congestion to the EU society is no less than € billion per year [eu:2019:traffic]. In fact, an inefficient transportation system deteriorates not only the citizens’ well-being, but also the environment, since traffic jams heavily increase the emission of [barth:2009:traffic]. The classical solution to the TDM problem is to increase the roads’ capacity or to build alternative routes. Although this approach produces tangible benefits [ganine:2017:resilience], policymakers and researchers are exploring alternatives that may be considerably faster and cheaper to implement, and that provide dynamic solutions adapting to the traffic evolution.

I-B “Hard” and “soft” policies

In recent years, there has been a growing interest from the research community in the problem of ATDM, i.e., a dynamic or even real-time solution to the traffic control problem. The literature on the topic can be divided into the design of “hard” and “soft” policies to address the problem [goodwin:2008:traffic_soft_measures]. Hard measures try to enforce changes in the drivers’ behavior by imposing constraints or penalizing undesired actions. For example, several works studied the use of dynamic traffic signaling or traffic lights to influence the current traffic flow [como:2016:dynamic_signaling, guo:2019:signal_control_survey]. In [piacentini:ferrara:2018:moving_bottleneck], the authors impose an artificial bottleneck to decrease the flow in strategic parts of the road and thereby alleviate the congestion. Another approach is to increase the transit price of the most congested roads in order to boost the use of alternative routes [downs:2005:still_stuck_in_traffic].

On the other hand, the so-called soft measures are designed to incentivize virtuous driver behaviors, and have their roots in behavioural economics and psychology. The word soft refers to the fact that the drivers can ignore the incentives and stick to their regular conduct [goodwin:2008:traffic_soft_measures]. Usually, these policies do not imply any physical change of the infrastructure. Instead, they rely on economic incentives or leverage psychological phenomena to change the drivers’ habits. Most of the solutions based on this approach lack strong theoretical foundations, and an a posteriori analysis is performed to study their consequences. In [riggs:2017:painting_fence], the author explores the effectiveness of monetary incentives, gifts, and social nudges tapping into altruistic values. A personalized set of incentives (mostly monetary) is proposed in [hu:2015:incentive_based_ADM], where a platform is introduced that enables the commuters to receive incentives if they change their departure time to off-peak hours or use an alternative. Several other pilot studies have experimentally validated the benefits of soft policies, see [benalia:2011:changing_commuters_bahvior] among others. It is important to stress that these two classes of measures are not mutually exclusive: they can be combined to amplify the final effect of congestion alleviation, as we advocate in this paper.

I-C Smart charging of PEV

The continuous growth of the number of PEV is also due to the improvements in smart charging, allowing the vehicles to charge up to . This technology increases the appeal of short stops for the users, making the PEV more similar to fuel vehicles. This motivates several studies on how the PEV drivers may optimize their charging schedules and how they affect the distribution network. Some classic results [cenedese:2019:PEV_MIG, J_Hiskens_2013] tackle the problem of high peaks in the energy demand by proposing a dynamic energy price that leads to a change in the charging habits of the PEV owners, and consequently to the so-called “valley filling” effect [parise:colombino:2014:mean_field_charging_PEV]. Some recent works considered smart charging coupled with mobility. In [xu:2018:EV_urban_mobility_nature], the smart charging problem is enhanced by also considering the travel habits. However, the goal is solely to decrease the energy peak demand rather than the traffic congestion level. Other works focus on optimizing the charging of the PEV to decrease their travel time [wentao:2018:congestion_patterns_EV, razo:2016:smart_charging_for_highway_travel]. However, in these works the overall congestion level is not taken into account in the decision process. For this reason, we cannot consider these solutions as a form of ATDM. To the best of the authors’ knowledge, it is still an open, yet appealing, problem to develop ATDM strategies based on smart charging of PEV whose main goal is traffic congestion alleviation.

I-D Paper contribution

Inspired by the conventional ramp metering control and motivated by the rising number of PEV, we propose a novel ATDM strategy based on soft measures (via monetary incentives) that leverages smart fast charging of PEV on the road to alleviate traffic congestion during rush hours. Specifically, we propose a dynamic energy price discounted proportionally to the (predicted) congestion level. This approach encourages the PEV owners to stop for charging when the congestion level is (going to be) high, thus aligning the goal of the traffic control with the drivers’ self-interest. In the following, we emphasize our main contributions:

  • We use for the first time the electricity price as control input for the ATDM. While traffic control has historically suffered from a lack of control means, this additional control input may prove essential to achieve the desired results by acting in synergy with the classical TDM.

  • We enrich the CTM with the introduction of r2s and s2r flows, newly defined to model the PEV entering and leaving a CS.

  • We carry out a formal analysis of the effects of the presented soft policy by describing the decision process of the PEV drivers as a generalized exact potential game.

  • We propose a semi-decentralized control scheme ensuring that the PEV involved in the decision reach an optimal charging schedule that represents their individual best trade-off between monetary saving and travel time.

In the second part of this work [cenedese:2020:highway_control_pII] (Part II: Case Study), we validate this ATDM strategy on a simulated highway stretch between The Hague and Rotterdam, in The Netherlands.

II Cell Transmission Model with Charging Station

We consider a freeway stretch without ramps and only one CS where PEV may stop. In the literature, the most used model for traffic control is the CTM, see [daganzo:1995:CTM_part2].

Fig. 1: Partitioning of the highway in cells and CS for the PEV; compact graphical representation of the CTM and the notation for the first two cells.

Here, we explicitly introduce a revised version of the CTM described in [ferrara2018freeway, Sec. 3.3.1] adapted to our problem. We consider the discretized version of the model where each time interval of length is denoted by an integer . The highway stretch is modeled as a chain of subsequent cells (Figure 1). The vehicles in each cell are a mixture of PEV and non-PEV moving at a constant speed. Two subsequent cells are connected via an interface that models a certain flow of vehicles, whose value depends on the cells’ demand and supply capabilities. Without loss of generality, we assume that the CS is located between the first two cells. To formalize the CTM, for every cell and interval , we introduce the following set of variables:

  • : traffic density of cell during ;

  • (resp. ): total flow entering (exiting) the cell during ;

  • : flow entering cell from cell during ; (resp. ) is the flow entering (exiting) the highway during the same interval;

We enrich the conventional CTM introducing two flows:

  • : flow of PEV entering the CS during ;

  • : flow of PEV exiting the CS during .

Then, we associate a set of fixed parameters to each cell :

  • : cell length;

  • : free-flow speed;

  • : congestion wave speed;

  • : maximum cell capacity;

  • : maximum jam density.

Each cell can be seen as an input-output system where the inflow is the input and the outflow the output. The dynamics of the density of cell read as

(1)

where the inflow and outflow are defined as

(2a)
(2b)

Thus, the flows entering and exiting the CS modify only the definition of the in-flow of cell and out-flow from cell .
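To illustrate how the density update interacts with the station flows, the following is a minimal Python sketch. It assumes the standard conservation form of the CTM, in which the density of a cell changes by its net flow scaled by the ratio between the interval length and the cell length. It also assumes that the r2s flow is added to the outflow of the first cell and the s2r flow to the inflow of the second one. All function and argument names are hypothetical and are not taken from the paper.

```python
def cell_flows(phi, r2s, s2r):
    """Total inflow and outflow of each cell from the interface flows.

    phi[0] is the flow entering the highway, phi[i] the flow from cell i
    to cell i + 1 (cells numbered from 1), and phi[-1] the flow leaving
    the highway. Assumption: the CS sits between cells 1 and 2, so r2s
    leaves cell 1 and s2r joins cell 2.
    """
    n_cells = len(phi) - 1
    inflow = [phi[i] for i in range(n_cells)]
    outflow = [phi[i + 1] for i in range(n_cells)]
    outflow[0] += r2s  # vehicles diverted from cell 1 into the CS
    inflow[1] += s2r   # vehicles merging from the CS into cell 2
    return inflow, outflow


def update_densities(rho, inflow, outflow, lengths, T):
    """One step of the assumed conservation law:
    rho_i(k + 1) = rho_i(k) + T / L_i * (inflow_i(k) - outflow_i(k)).
    """
    return [r + T / L * (f_in - f_out)
            for r, f_in, f_out, L in zip(rho, inflow, outflow, lengths)]
```

With consistent units (for instance densities in vehicles per km, flows in vehicles per hour, lengths in km, and T in hours), calling `cell_flows` and then `update_densities` advances the whole stretch by one interval.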

Remark 1 (r2s and s2r flows)

The concepts of r2s and s2r flows are inspired by the “off-ramp” and “on-ramp” flows [daganzo:1995:CTM_part2], respectively, and are used to model the temporary stop of some PEV at the CS. Unlike the off- and on-ramp flows, this leads to a mutual dependency between r2s and s2r, which we investigate further in Sections III and V. In the literature, only the on-ramp flow can be controlled, e.g. via a toll, while our control action influences the off-ramp flow as well.

The demand of cell and the supply of cell directly influence the admissible flow between the two cells. The former is the flow that cell can send to cell in the time interval , while describes the flow that cell can receive in the same interval:

(3a)
(3b)

The relations in (3) directly define in (2). In fact, if , then the flow between the cells reads as . On the other hand, the flow between cell and is described by a more complex relation, due to the presence of the CS:

(4)

where the time dependency is omitted. The first case in (4) reflects the free-flow scenario, while the second reflects the presence of a congestion, as the supply of cell is saturated by and . Finally, and are the input and output flows of the CTM, respectively.
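The demand and supply logic above can be sketched in Python as follows. This is only an illustration under stated assumptions: the demand and supply use the usual triangular fundamental diagram of the CTM, and the interface next to the CS follows the two cases described in the text, with the vehicles leaving the CS served first when the downstream supply saturates. The function names and the exact form of the station interface are assumptions, not the paper's equations.

```python
def demand(rho, v, q_max):
    """Flow a cell can send downstream: free-flow speed times density,
    capped by the cell capacity (assumed triangular diagram)."""
    return min(v * rho, q_max)


def supply(rho, w, rho_max, q_max):
    """Flow a cell can receive: congestion wave speed times the residual
    density, capped by the cell capacity."""
    return min(w * (rho_max - rho), q_max)


def interface_flow(d_up, s_down):
    """Flow between two ordinary cells: the smaller of the upstream demand
    and the downstream supply."""
    return min(d_up, s_down)


def station_interface_flow(d1, s2, r2s, s2r):
    """Assumed shape of the flow from cell 1 to cell 2 across the CS."""
    free = d1 - r2s  # what cell 1 still sends on after the diversion to the CS
    room = s2 - s2r  # what cell 2 can still take once s2r has merged
    # Free-flow case if everything fits, congested case otherwise.
    return free if free <= room else max(room, 0.0)
```

Giving the s2r flow priority in the congested case is one possible reading of "saturated by" in the text. A different merging rule would change only `station_interface_flow`.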

Throughout this section, we have defined the whole CTM except for r2s and s2r. The remainder of the paper is devoted to designing the decision process that the PEV carry out to choose whether or not it is worth stopping at the CS. In turn, this determines r2s and s2r, as we show in Section V.

III Decision making process

We assume the presence of a HO that aims at minimizing the congestion. It does so by discounting the energy price if the level of congestion grows (or is expected to grow). In this setup, the HO has the role of the so-called choice architect, designing the price at all time intervals. Our solution leverages two main aspects. First, if the road is congested, the benefit of continuing to drive decreases, due to a longer travel time, and, at the same time, stopping at a CS to charge becomes more profitable due to an energy price discount. Second, we take advantage of range anxiety, a well-known cognitive bias that makes PEV drivers eager to stop at a CS even if they do not strictly need to [nuubauer:2014:range_anxiety].

We model the multi-agent decision process of the PEV exiting cell , at every time interval , by defining a set of interdependent local optimization problems. Each PEV (or agent/player) aims at minimizing its own local cost function subject to a set of constraints, where couplings between the agents arise in both the cost functions and the constraints. From a mathematical point of view, and specifically in our problem setup, the collection of all these optimization problems determines a mixed-integer potential game subject to best-response dynamics. The output of the decision making process (or game) is the set of all the choices (or strategies) of the PEV to stop or not at the CS, which affects the r2s flow. The s2r flow is instead a consequence of how long the agents decide to linger at the CS.

III-A Cost function

Next, we design the cost function of the PEV exiting cell that are involved in the decision making process. We postulate that the interest of each driver is twofold: minimizing the travel time and saving money when charging the PEV. Between the two, in most cases, the primary concern will be the travel time, especially in normal traffic conditions, when no discount is present. Nevertheless, the travel time aspect becomes less relevant if heavy congestion arises; in this situation, the relative impact of the time spent at the CS decreases and, at the same time, the discounted energy price may steer the decision of the agent towards stopping to charge the PEV.

III-A1 Number of vehicles

At each time interval , the number of vehicles involved in the decision process is , which may vary due to the traffic conditions. From (2), the total number of vehicles exiting cell during is . Among those, the fraction of PEV is denoted by . By relying on (2a) and (4), we obtain

(5)

where the time dependency is omitted and denotes the floor of .

To compute via (5), the value of r2s is necessary, even though it is itself the outcome of the decision process that we are defining. Thus, it is not possible to define exactly. At the same time, s2r does not affect the computation of , since it is due to PEV already at the CS, which are not involved in the decision process arising at time . To overcome this impasse, we compute the number of agents involved in the game under the assumption of maximum congestion, namely that no agent stops at the charging station (i.e., the r2s flow is zero). Then the number of agents taking part in the decision process is obtained as

This assumption not only implies that can be computed for every , but also that all the vehicles involved in the game manage to exit cell during the time interval , and are therefore able to implement their strategies.
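As a rough illustration of the counting argument above, the sketch below computes the number of agents as the floor of the PEV share of the vehicles leaving the first cell in one interval, with the outflow evaluated under the no-stop assumption. The function name, the argument names, and the way the three quantities are multiplied together are assumptions made for this example.

```python
import math


def agents_in_game(pev_fraction, outflow_no_stop, T):
    """Number of PEV taking part in the decision process.

    pev_fraction:    share of PEV among the vehicles leaving the cell
    outflow_no_stop: outflow of the cell (vehicles per unit time), computed
                     as if no vehicle entered the CS
    T:               length of the time interval, in the same time unit
    """
    vehicles_exiting = T * outflow_no_stop
    return math.floor(pev_fraction * vehicles_exiting)
```

Because the flow is computed with no vehicle entering the CS, the count does not depend on the outcome of the game it is used to set up.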

III-A2 Decision variables and the SoC dynamics

The time-varying set indexing the PEV taking part in the game during the -th time interval is denoted by . The decisions of the agents are performed over a prediction horizon of time intervals. The length of the time intervals in the decision making process may be longer than the one used in the CTM. Specifically, we assume intervals of length , with . Thus, the PEV should plan their behavior over the set of intervals , where each index denotes an interval of length , e.g., represents here and similarly denotes .

The SoC of the battery of every PEV at time is denoted by , where represents a fully charged battery, while a completely discharged one. The amount of energy purchased by agent during the time period is . For every interval , the SoC reads as

(6)

where is a coefficient associated to the battery efficiency and capacity . Let us introduce the variable as a binary decision variable, which takes value if the vehicle is actively charging at the CS and otherwise:

(7)

The logical implication above entails that the energy purchased by PEV is positive if and only if , therefore we define , with (respectively ) being the maximum (minimum) energy that the vehicle can receive from the CS in a single time interval. We define a first collection of decision variables associated with each PEV , over the whole horizon , as the collective vectors , and the evolution of the SoC, .
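The interplay between the binary charging variable, the purchased energy, and the SoC can be sketched as below. The example assumes a linear SoC update in which each unit of purchased energy raises the SoC by a fixed coefficient, and it enforces that energy is bought only while charging and within the per-interval bounds. The names `beta`, `u_min` and `u_max` are placeholders chosen for this sketch.

```python
def simulate_soc(x0, purchases, charging, beta, u_min, u_max):
    """Propagate the state of charge over a horizon.

    x0:        initial SoC in [0, 1]
    purchases: energy bought in each interval
    charging:  binary flags, 1 if the vehicle is actively charging
    beta:      assumed SoC gain per unit of purchased energy
    """
    trajectory = [x0]
    x = x0
    for u, d in zip(purchases, charging):
        if d == 0 and u != 0:
            raise ValueError("energy purchased while not charging")
        if d == 1 and not (u_min <= u <= u_max):
            raise ValueError("purchase outside the admissible range")
        x = x + beta * u  # assumed linear SoC dynamics
        if x > 1.0 + 1e-9:
            raise ValueError("SoC exceeds a fully charged battery")
        trajectory.append(x)
    return trajectory
```

A schedule that buys energy in an interval with a zero charging flag is rejected, which mirrors the logical implication linking the two variables.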

III-A3 Travel time and congestion

An important quantity influencing the PEV decisions is the additional time that a PEV would experience due to the presence of congestion. Specifically, denotes the difference between the travel time that a PEV experiences to actually travel throughout the cells and the one it would spend in conditions of free flow. It provides insightful information on the traffic evolution, allowing the PEV to discern whether or not they prefer to stop at the CS. If agent decides to stop for charging, the congestion it will experience, when it merges back in the mainstream during the time interval , depends also on those that were behind it at the time of the decision . Among all the vehicles exiting cell in the time interval , the PEV have the possibility to stop for charging, deciding via a process akin to the one that agent is currently carrying out. For this reason, the exact value of cannot be computed in advance by agent for the whole prediction horizon. We work around this difficulty by adopting a conservative approximation of , computed assuming that all the PEV in and the ones following them do not stop at the CS. This leads to a value of that over-estimates the actual experienced travel time. This approximation allows the agents to cope with the worst-case scenario, hence being able to meet possible time constraints. Moreover, it can be computed at every time interval and provides insightful information on the potential traffic evolution.

For a cell , the vehicles’ speed is attained as . If an agent enters cell during the time interval and it takes intervals to travel through it, then the velocity at which it will move when it enters the next cell is . This observation motivates the following recursive, but implementable, definition

(8)

where

Here, denotes the vehicles’ speed computed in the worst-case scenario described above. The value represents the travel time in the case of no congestion in the cell . Under the assumption above, can be always computed by letting the CTM evolve freely. It is worth noticing that, if the PEV will experience no congestion along the whole freeway, i.e., for all , then .
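The recursive idea described above, where the speed experienced in the next cell is read at the interval in which the vehicle actually enters it, can be sketched as follows. The sketch assumes that the speed stays constant while a vehicle crosses a cell and that the crossing time is rounded up to whole intervals. These simplifications and all names are assumptions for illustration, not the definition in (8).

```python
import math


def extra_travel_time(speeds, lengths, free_speeds, t_enter, T):
    """Extra travel time with respect to free flow along a chain of cells.

    speeds[l][t]:   predicted (worst-case) speed in cell l during interval t
    lengths[l]:     length of cell l
    free_speeds[l]: free-flow speed of cell l
    t_enter:        interval in which the vehicle enters the first cell
    T:              length of one interval
    """
    t = t_enter
    extra = 0.0
    for l, L in enumerate(lengths):
        # Hold the last prediction if the horizon of the forecast is exceeded.
        t_idx = min(t, len(speeds[l]) - 1)
        crossing = L / speeds[l][t_idx]
        extra += crossing - L / free_speeds[l]
        # The next cell is entered after the crossing, in whole intervals.
        t += math.ceil(crossing / T)
    return extra
```

If every predicted speed equals the free-flow speed, the function returns zero, consistent with the remark that no congestion along the freeway yields no extra time.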

Another important quantity related to the congestion is the number of vehicles that leave the CS at every time instant. In fact, if an agent leaves the CS when many others are also merging back into the mainstream, it may experience high levels of congestion. To model this phenomenon, we introduce a binary variable for every .

For every PEV entering cell at time , and for every , we define if , and otherwise. Thus, is a rectangular function of width at most intervals and at least (Figure 2). This variable is used to capture the influence of the PEV entering cell around the same time as agent . The value of depends on . In fact, if is large, then PEV will not experience the congestion due to the PEV that precede or follow it, so . On the other hand, if is small, the value of has to be high to model correctly the possible congestion due to those agents that enter cell around the same time as agent . We elaborate further on this in the next section. Also in this case, we denote the collective vector over the whole as . Therefore, for every and , approximates the extra time that agent would experience due to those PEV entering cell around time and it reads as

(9)

The first double summation, denoted by for all , represents the number of PEV, that already completed the decision process, entering cell during one of the intervals . The coefficient is proportional to the average amount of time agent spends for every PEV entering cell during the intervals . This coefficient may be estimated via historical data and engineering understanding or based on the worst-case scenario.

Fig. 2: Feasible choices of and for when and . The two illustrations show when: (a) the PEV does not stop at the CS; (b) the PEV decides to stop.

III-A4 Energy price

In our model the dynamic energy price is discounted by the HO in conjunction with a traffic congestion. On the other hand, it is also linked to the local energy demand required in the distribution network, i.e., , where is the total energy purchased by the PEV and denotes the base energy demand of the local network. During the time interval , we can study the congestion by looking at how much the travel time increases w.r.t. the free-flow case, for each cell . This quantity is defined by , and . Here we assume it is used by the HO to link the price to the congestion level, namely the higher the lower the price. Thus, the energy price that the HO imposes for every unit of energy purchased reads as

(10)

where are scaling parameters tuned by the HO.

We note that the exact energy price applied in the future time intervals cannot be computed in advance by the HO, since it depends on the traffic evolution, which is not completely known due to the arbitrary future choices of the drivers. Nevertheless, to allow the PEV to perform an informed choice, we let the HO compute an estimation of the real for the whole prediction horizon . Then, this value is broadcast to the PEV in and it is used by them to execute the decision process. If the congestion grows, then the price should drop, even though, intuitively, the discounted price leads to a larger number of PEV stopping, and consequently an increment of . We define as an approximate value of , which is computed by assuming that no agent exiting cell during the prediction horizon stops at the CS. This assumption translates into a zero r2s flow at every interval of the prediction horizon. The density and the flow during the time interval are attained by letting the CTM evolve freely. Therefore, the approximation of the additional time spent by the agent, due to the congestion in the cell , reads as

(11)

The value of overestimates the additional travel time spent due to the congestion level on the road during the interval . We introduce two time-varying vectors of offsets and coefficients, and respectively, and define the estimated price, for every time interval , by

(12)

At every time instant , the HO may use historical data on the traffic flow to compute the values of and that are supposed to minimize the error between the real and estimated price. This may be done with several techniques (e.g. linear regression or Bayesian estimation).
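As a hedged illustration of the linear-regression option mentioned above, the sketch below fits one offset and one coefficient so that an affine function of the estimated extra travel time approximates the recorded price. Treating the estimated price as affine in a single scalar delay is an assumption made for this example. The function name and the data layout are hypothetical.

```python
def fit_offset_and_coefficient(delays, prices):
    """Ordinary least squares fit of price ~ offset + coefficient * delay.

    delays: historical estimates of the extra travel time
    prices: prices actually applied in the same intervals
    """
    n = len(delays)
    if n != len(prices) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_d = sum(delays) / n
    mean_p = sum(prices) / n
    var_d = sum((d - mean_d) ** 2 for d in delays)
    if var_d == 0:
        raise ValueError("delays must not all be equal")
    cov = sum((d - mean_d) * (p - mean_p) for d, p in zip(delays, prices))
    coefficient = cov / var_d
    offset = mean_p - coefficient * mean_d
    return offset, coefficient
```

A negative fitted coefficient corresponds to the intended behavior, in which a longer predicted delay lowers the estimated price. One such pair would be fitted for each interval of the horizon.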

Remark 2

The definition in (12) implies that the estimated price is not affected by the strategies of the other agents in the game. Nevertheless, the strategies implemented by the PEV involved during , directly influence the estimated price used by the PEV that will play the game during . Therefore, the price dynamically changes over time and is assumed to be fixed only inside the single decision process.

III-A5 Cost function formulation

The goal of each PEV is to find the best trade-off between saving money and travel time. These two cost terms are described for every PEV by the functions and , respectively. The amount agent saves by charging at a discounted price depends on the total energy it purchases:

where represents the average cost agent would experience via standard fast charging, and it might vary between PEV. Next, we define the cost associated to the total travel time experienced by vehicle by

where the notation is used to denote . The quantity weights the time spent at the charging station, while the rest approximates the additional time spent if the agent enters cell during the time interval . The parameter weights the different perception that the agent has in spending time at the CS or in a congestion. The time-varying factor normalizes the cost function with respect to the width of the rectangular function . Note that the presence of creates a coupling between all the decisions of the PEV in the game. In , the presence of entails that only some elements of the summation are not zero. Furthermore, its rectangular shape implies that the decision of agent depends also on those agents that enter cell during an interval distant at most intervals from the one in which will enter cell , see Figure 2. As anticipated, this feature models the different speeds of the vehicles in a cell.

Each agent may weigh the two objectives differently; thus, we model the final cost as a convex combination of the two:

(13)

for some . We study the effects of this parameter on the performance in [cenedese:2020:highway_control_pII] (Part II: Case Study). Finally, we highlight that the nature of the approximation of and the estimation are intrinsically different. In fact, although both uncertainties are due to the presence of the human in the loop, the second is part of the policy designed by the HO to reduce the congestion, while the first is used to model the drivers’ local decision. Consequently, the complete policy includes the actual price applied and its estimation over the prediction horizon, which are broadcast by the HO to the PEV and used to influence their decision.

III-B Local and coupling constraints

We model the constraints on the drivers’ possible choices as a collection of logical implications. First, we impose that the intervals in which are consecutive. To do so, we require that, for all , changes its value from to and back to only once:

(14a)
(14b)

where , so the rising edge must precede the falling one. Then, we force the intervals in which to be consecutive, and hence for all and it must hold that

(15)

Clearly, if a PEV enters cell at time , it cannot charge in the remaining time intervals (Figure 2), and thus we obtain the following relation, for all and :

(16)

This condition models also the case in which a PEV is not stopping, so and (16) implies that the PEV is never charging, i.e., for all . Furthermore, (14) and (15) imply that the PEV disconnect at least time intervals before the end of the prediction horizon. Next, we impose that, when an agent is done with charging, it exits the CS, and hence, for all , agent has to satisfy

(17)

We impose that each PEV charges for at least consecutive time intervals. In fact, the value may be small and it is unreasonable to allow a PEV to stop for charging for only one time interval (e.g. minutes). This translates into

(18)

Similarly, if a PEV decides to stop, then we assume it remains at the CS for at least time intervals, and hence

(19)

For each PEV, the minimum level of SoC necessary to reach the final destination from cell is denoted by . Thus, we assume that a PEV can enter cell if , otherwise it must stop (or remain) at the CS for charging, so for all ,

(20)

where implies that PEV cannot leave the charging station during the time interval . The next constraint limits the maximum amount of energy that the CS can supply during each time interval by . Thus, for all , we have the following coupling constraint on the connected PEV:

(21)

where is the total energy that the agents, that already completed the decision process, planned to purchase during the time interval .

Finally, we consider that if several PEV stop at the CS simultaneously there can be a scarcity of charging plugs. Let denote the total number of plugs at the CS. Then, we have

(22)

where is defined analogously to .

The above constraints allow a PEV to stop at the CS without starting to charge immediately (for example due to a lack of free plugs), and this may lead to the formation of a queue. We model the queue as a First-In-First-Out (FIFO), i.e., the vehicles already waiting have priority over the PEV entering it afterwards. This aspect is important to realistically model the PEV behaviors, which would be hard to formalize without the use of mixed-integer variables.

In Figure 2, we qualitatively represent a feasible choice of and for PEV and how it is reflected in the driver’s behavior. In Figure 2a, agent does not stop at the CS. In comparison, in Figure 2b, the PEV enters the CS, but, since all the plugs are busy, it waits for the first two time intervals before connecting to the CS. Once it finishes the charging phase, i.e., , it merges back into the mainstream, according to (17).
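Two of the requirements above, namely that the charging intervals are consecutive and that a charging session lasts a minimum number of intervals, can be checked on a candidate binary schedule as sketched below. This is only a feasibility check on the charging variable written for illustration. It does not cover the other constraints, and the names are hypothetical.

```python
def charging_blocks(delta):
    """Return (start, end) pairs, end exclusive, of maximal runs of 1s."""
    blocks = []
    start = None
    for t, d in enumerate(delta):
        if d == 1 and start is None:
            start = t
        elif d == 0 and start is not None:
            blocks.append((start, t))
            start = None
    if start is not None:
        blocks.append((start, len(delta)))
    return blocks


def is_feasible_charging_schedule(delta, min_charge):
    """Check that charging happens in at most one block of consecutive
    intervals and that the block lasts at least min_charge intervals."""
    blocks = charging_blocks(delta)
    if not blocks:
        return True  # the vehicle never charges, e.g. it does not stop
    if len(blocks) > 1:
        return False  # charging intervals are not consecutive
    start, end = blocks[0]
    return end - start >= min_charge
```

For instance, a schedule with two leading zeros followed by a block of ones corresponds to the waiting phase of Figure 2b, where the vehicle queues before connecting to a plug.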

We conclude this section by introducing a preliminary formulation of the set of inter-dependent mixed-integer optimization problems that model the decision process performed by the PEV during every time interval :

()

Several constraints in () are expressed via logical implications, thus this problem should be mathematically reformulated to be solved. Specifically, in the Appendix we adopt a process akin to the one used in [cenedese:2019:PEV_MIG, fabiani2018mvad] to transform the logical implications into mixed-integer affine coupling constraints by introducing additional auxiliary variables.

IV Formulation of the mixed-integer game

As a result of translating the logical implications into affine constraints, we recast () as the following mixed-integer aggregative game, subject only to linear mixed-integer inequalities:

()

The vector of all the decision variables in () is defined as

and , we obtain a compact form of :

(23)

where is the set of strategies that satisfy the local constraints of , while and are of suitable dimensions and are used to describe all the coupling constraints between the agents. We denote the set of all feasible strategies of player as

(24)

where indicates the collective strategy vector, with being any feasible strategy of . Then, the set of all the feasible collective strategies is

where .

Perhaps the most popular notion of equilibrium for games like is the Generalized Nash Equilibrium (GNE), where no agent can reduce its cost by unilaterally changing its strategy to another feasible one [cenedese_et_al:2019:TAC:proximal_point, cenedese:2019:ECC]. Here, we are interested in an approximate solution for mixed-integer games, i.e., MINE.

Definition 1 (ε-Mixed-Integer Nash equilibrium)

A set of strategies is an MINE, with , of the game if, for all ,

with as in (24).

IV-A Potential game structure

In this subsection, we prove that the game is an exact potential game [monderer:shapley:1996:potential_games]. Potential games are characterized by the existence of a potential function that describes the variation of the cost when an agent changes strategy.

Definition 2 (Exact potential function)

A continuous function is an exact potential function for the game () if, for all , and , , it satisfies

To find the potential function , we first reorganize the local cost function as:

(25)

where depends on the local variables only, and incorporates the cross terms depending on the other players’ strategy . From (9), we derive that

(26)

Thus, meaning that the agents influence each other in a symmetric way. In the next statement, we introduce the exact potential function for the game in .

Theorem 1

For each , the game is an exact potential game with

as an exact potential function, where is as in (26).

Proof:

The proof is akin to the one in [fabiani2019nash].

The pivotal result that highlights the importance of the above theorem is that an -approximated minimum of the potential function is also an MINE of the game , see [sagratella2017algorithms, Th. 2]. Thus, it is sufficient to show that the proposed algorithm converges to a minimum of the potential function in order to achieve the sought convergence result.
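The exact potential property can be checked numerically on a toy game. The sketch below assumes binary strategies, a local cost term for each player, and symmetric pairwise cross terms. The candidate potential is the sum of the local terms plus half of the sum of the cross terms. This toy structure and all names are assumptions chosen to mirror the decomposition above, and the sketch is not the game of the paper.

```python
from itertools import product


def cost(i, x, f, a):
    """Cost of player i: local term plus pairwise cross terms."""
    cross = sum(a[i][j] * x[i] * x[j] for j in range(len(x)) if j != i)
    return f[i][x[i]] + cross


def potential(x, f, a):
    """Candidate potential: local terms plus half of all cross terms."""
    n = len(x)
    local = sum(f[i][x[i]] for i in range(n))
    cross = sum(a[i][j] * x[i] * x[j]
                for i in range(n) for j in range(n) if i != j)
    return local + 0.5 * cross


def is_exact_potential(f, a, tol=1e-9):
    """Check that every unilateral deviation changes the deviating
    player's cost and the potential by the same amount."""
    n = len(f)
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = list(x)
            y[i] = 1 - x[i]
            d_cost = cost(i, y, f, a) - cost(i, x, f, a)
            d_pot = potential(y, f, a) - potential(x, f, a)
            if abs(d_cost - d_pot) > tol:
                return False
    return True
```

With a symmetric matrix `a` the check succeeds, while an asymmetric one generally fails, which illustrates why the symmetry of the mutual influence matters for the result.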

V CTM traffic control scheme

We can now focus on the connection between the traffic dynamics and the decision process of the PEV. Then, we describe in detail our proposed algorithm that the agents can use to seek an equilibrium of the game.

V-A Iterative semi-decentralized algorithm

We propose here a semi-decentralized iterative algorithm (Algorithm 1) that the agents in can adopt to solve the MI-GPG . The notation denotes the strategy of agent at the -th iteration of the algorithm.

After the initialization step, where the players receive the information broadcast by the HO, each PEV decides to update its strategy independently from the others. If an agent wants to update, it sends a request to the HO. If no other player is currently updating, then agent $i$ starts its local update given the aggregate quantities received from the HO, which are used to compute the cost and the coupling constraints (21), (22). On the other hand, if another agent is performing the update, agent $i$ enters a FIFO queue from which the HO sequentially extracts the agents that are allowed to update next. At the moment of the update, agent $i$ computes a best-response strategy w.r.t. the strategies of the others. We define the mixed-integer best-response mapping for agent $i$ as

(27)

where may be a set, thus .

Agent $i$ updates its current strategy only if the best response reduces its cost by at least $\varepsilon$.
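In assumed generic notation ($x_i(t)$ for the strategy of agent $i$ at iteration $t$, $\mathrm{BR}_i$ for the best-response mapping, $\mathcal{X}_i(x_{-i})$ for the feasible set of agent $i$ given the others, $J_i$ for its cost), the best response and the $\varepsilon$-improvement rule described above can be sketched as:

```latex
x_i^\star(t) \in \mathrm{BR}_i\big(x_{-i}(t)\big)
  := \operatorname*{argmin}_{y_i \in \mathcal{X}_i(x_{-i}(t))} J_i\big(y_i, x_{-i}(t)\big),
\qquad
x_i(t+1) =
\begin{cases}
  x_i^\star(t) & \text{if } J_i\big(x_i(t), x_{-i}(t)\big) - J_i\big(x_i^\star(t), x_{-i}(t)\big) \ge \varepsilon, \\
  x_i(t)       & \text{otherwise.}
\end{cases}
```

The threshold $\varepsilon$ rules out arbitrarily small improvements, which is what allows the iteration to stop after finitely many strategy changes when an exact potential exists and is bounded below.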

The iteration is completed after the PEV communicates its (possibly) new strategy to the HO, and the HO uses it to revise all the quantities in the game that depend on it.

In the following result, we show that Algorithm 1 converges to an $\varepsilon$-MINE of the game under the assumption that all the players manage to update their strategies over a sufficiently large number of iterations.

Proposition 1

Let and , and assume that for every and there exists a such that . Then, Algorithm 1 computes an MINE of the game in (23).

Proof:

From Theorem 1, is an MI-GPG with an exact potential function for all . Therefore, the result in [sagratella2017algorithms, Th. 4] applies to show that the sequential best-response based algorithm proposed in Algorithm 1 converges to an MINE of the game.

Remark 3 (Privacy and scalability)

In Algorithm 1, the HO shares with each PEV only aggregate information on the choices of the others. This feature preserves the privacy of the agents in the game, since an agent cannot retrieve the local decision strategy of another PEV from the data received from the HO. Moreover, using aggregate information is important for the scalability of Algorithm 1: the amount of data exchanged between each PEV and the HO does not grow with the number of agents. This is crucial to obtain an implementable solution, due to the (possibly) large number of vehicles involved.

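Algorithm 1 Sequential best-response

Initialization: the HO sends to every agent the coefficients needed to initialize the game.
Update: choose an initial set of strategies and set the iteration counter to zero.
while the current set of strategies is not an $\varepsilon$-MINE do
      HO do
            Extracts the next agent $i$ from the waiting queue.
            Sends the current aggregate quantities to agent $i$.
      end
      Player $i$ do
            Computes a best response as in (27).
            if the best response improves the cost of agent $i$ by at least $\varepsilon$ then
                  adopts the best response as its new strategy
            else
                  keeps its current strategy
            end if
            Sends its (possibly) new strategy to the HO.
      end
      Set the strategies of all the other agents to their current values and increment the iteration counter.
end while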
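The mechanics of Algorithm 1 can be illustrated on a toy binary game. This is a sketch under stated assumptions, not the game of this paper: each agent chooses whether to stop (1) or not (0), the hypothetical cost `x_i * (crowding * others - benefit[i])` stands in for the real cost and constraints, and the variable `total` plays the role of the aggregate information kept by the HO.

```python
from collections import deque


def cost(i, xi, others, benefit, crowding):
    """Toy cost of agent i: stopping pays benefit[i] but suffers crowding."""
    return xi * (crowding * others - benefit[i])


def sequential_best_response(benefit, crowding, eps=1e-3, max_iter=10_000):
    """Sequential best response with a FIFO queue and an eps-improvement rule."""
    n = len(benefit)
    x = [0] * n                  # initial strategies: nobody stops
    total = 0                    # aggregate maintained by the coordinator
    queue = deque(range(n))      # FIFO queue of update requests
    idle = 0                     # consecutive agents that kept their strategy
    for _ in range(max_iter):
        if idle == n:            # no agent can improve by at least eps
            return x
        i = queue.popleft()
        others = total - x[i]    # aggregate of the other agents only
        best = min((0, 1), key=lambda v: cost(i, v, others, benefit, crowding))
        gain = (cost(i, x[i], others, benefit, crowding)
                - cost(i, best, others, benefit, crowding))
        if gain >= eps:          # adopt the best response
            total += best - x[i]
            x[i] = best
            idle = 0
        else:                    # keep the current strategy
            idle += 1
        queue.append(i)          # the agent may request another update later
    raise RuntimeError("no equilibrium found within max_iter iterations")


def is_eps_nash(x, benefit, crowding, eps):
    """Check that no agent can lower its toy cost by more than eps."""
    total = sum(x)
    for i, xi in enumerate(x):
        others = total - xi
        current = cost(i, xi, others, benefit, crowding)
        best = min(cost(i, v, others, benefit, crowding) for v in (0, 1))
        if current - best > eps:
            return False
    return True
```

With `benefit = [3.0, 1.0, 2.5, 0.5]` and `crowding = 1.0`, the iteration stops at `[1, 0, 1, 0]`: the two agents with the largest benefit stop, and the others find the station too crowded. Each agent only ever sees the aggregate `others`, never the individual choices behind it.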

V-B Complete CTM control loop

The HO, introduced in Section III, plays a crucial role in collecting and broadcasting information from and to the vehicles on the highway stretch. We propose the following decision process which takes place at the beginning of every time interval via the following four steps.

S.1) HO collects information

The HO collects, from the sensors on the highway (placed at the interfaces between cells), information on the density of every cell. The HO then computes the quantities defined in (8), (10), (11) and (12) by exploiting the CTM and the strategies of the PEVs that performed the process during the previous time intervals.

S.2) HO broadcasts information

Those PEV that have the possibility to stop during the time interval , i.e., the ones leaving cell , connect with the HO, forming the set of players involved in the game. The HO broadcasts to all of them the quantities they need to initialize the game , i.e., the initialization phase in Algorithm 1. Moreover, the HO applies the price in (10) to the energy purchased by the PEV currently charging at the CS.

S.3) Iterative solution of the decision process

After the initialization, the agents update their strategies as shown in Algorithm 1 and described in Section V-A. The PEVs keep updating until they converge to an $\varepsilon$-MINE of the game, hence a feasible set of strategies that is convenient for each PEV. We stress that the iterations to solve the game are unrelated to the intervals of the CTM or the intervals in , and in fact they are all completed within the interval .

S.4) Strategy implementation

The agents in implement their final strategies (i.e., stop at the CS or continue driving) and the process will start again from (S.1) at the beginning of the interval .

The presence of the human in the loop imposes a bi-level implementation of step (S.4). We envision that every PEV performs the computations in (S.3) via dedicated software; the final strategy is then translated into a simple message, prompted to the human user, advising whether or not it is convenient to stop at the CS. In the end, the driver implements the suggested behavior.
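The four steps above can be sketched as a single function executed at the beginning of every time interval. Everything in this skeleton is hypothetical: the function and argument names are illustrative, the callables stand in for the sensors, the game solver of step (S.3) and the driver interface, and `mean_density` is only a placeholder for the CTM-based quantities computed by the HO.

```python
from typing import Callable, Dict, List


def control_step(
    measure_densities: Callable[[], List[float]],
    select_players: Callable[[], List[int]],
    solve_game: Callable[[List[int], Dict[str, float]], Dict[int, bool]],
    notify_driver: Callable[[int, bool], None],
) -> Dict[int, bool]:
    """Run one pass of steps S.1 to S.4 and return the stop/continue advice."""
    # S.1: the HO reads the cell densities and derives the game data.
    densities = measure_densities()
    game_data = {"mean_density": sum(densities) / len(densities)}  # placeholder

    # S.2: the PEVs that can stop in this interval connect and receive the data.
    players = select_players()

    # S.3: the players iterate (e.g., by sequential best response) to an equilibrium.
    decisions = solve_game(players, game_data)

    # S.4: each final strategy becomes a simple message for the human driver.
    for pev, stop in decisions.items():
        notify_driver(pev, stop)
    return decisions
```

Keeping the game solver behind a callable reflects the point made in (S.3): the iterations of the algorithm are internal to one interval and do not interact with the CTM time steps.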

Finally, we elaborate on how to compute, starting from , the inflow and outflow of the CS, i.e., r2s and s2r, respectively. From the constraints in Sec. III-B, agent does not stop at the CS if and only if , thus the flow entering the CS is defined by

(28)

The flow exiting the CS is