Introduction
Smart navigation systems (e.g., GPS devices, navigation applications on mobile/smart devices) have transformed the transportation domain by reducing cognitive overload in travelers. However, such technological advancements have had little impact on several fundamental issues, such as mitigating congestion INRIX and reducing carbon emissions energy, which have only worsened over time. For instance, current state-of-the-art navigation systems employ traditional shortest-path algorithms, such as Dijkstra’s algorithm lanning2014dijkstra, Bellman–Ford or Warshall–Floyd, and algorithms 10.5555/1614191 to recommend routes and mitigate travelers’ cognitive overload. On the other hand, selfish travelers exhibit multi-attribute preferences, which are typically misaligned with the system’s interests. As a result, travelers often reject route recommendations that involve non-personal transport modalities, such as public transportation, ride-sharing services and other micro-mobility services USCensus; mckenzie2015drives. Although unintentionally, people have steered away from personal car usage during the ongoing COVID-19 pandemic in 2020 INRIX2020, which has resulted in significant cost reductions in terms of congestion, carbon emissions and collisions. Our goal in this paper is to steer selfish travelers away from personal car usage (even under non-pandemic conditions) by offering them alternative routing choices in a persuasive manner.
Selfish routing is a strategic framework where travelers selfishly employ their best-response routes according to their respective preferences to form an equilibrium. However, the central authority (e.g., a city transportation department) chooses a social-welfare objective that is not necessarily aligned with all travelers’ interests. This leads to system inefficiency, which can be quantified by the price of anarchy (PoA) roughgarden2002bad. Several techniques have been proposed to drive PoA towards unity, which happens when the equilibrium outcome is optimal in terms of the system’s objective. A seminal example is marginal cost pricing, where selfish travelers are taxed based on their marginal contribution to the system’s objective ruggles1949recent. Although the idea of marginal cost pricing has been around for several decades, the technique remains practically infeasible due to our inability to estimate marginal costs accurately.
In sharon2019marginal, the authors studied the effects of underestimating marginal costs on optimality in terms of system objectives, and showed that taxing based on underestimated marginal costs produces an outcome that is at least as good as having no taxes. Although attempts have been made to implement such solutions by authoritarian regimes yang2020marginal, the friction to adopt marginal cost pricing persists due to various political reasons in democratic nations. Another powerful idea to influence traveler behavior is Stackelberg routing, where a fraction of agents are routed centrally, while the remaining agents are allowed to choose their routes selfishly swamy2007effectiveness. A similar routing algorithm is proposed in samal2018towards based on multi-objective A* with the goal of designing routes that decrease the overall network congestion.
Meanwhile, information-revelation systems have also been proposed acemoglu2018informational; arnott1991does; mahmassani1991system, where the traffic state is revealed to travelers as opposed to recommending routes. Although such systems do not mitigate cognitive overload at the travelers, they have been found to have a positive impact on traffic congestion and other global objectives even in non-strategic settings. However, these systems still suffer from poor persuasive ability, in terms of inducing behavior modification among travelers. A natural and effective solution is to design information strategically at the city transportation department, and present it to the travelers to steer their routing decisions towards socially optimal outcomes.
Recently, strategic information design has been studied in the transportation domain for settings in which travelers are uncertain about the network congestion state. For example, in das2017reducing, the authors computed best-response signals under first-best, full-information, public-signal and optimal information structure scenarios in the context of a Wheatstone network; they demonstrated that optimal information structures reveal only partial information to mitigate network congestion. Similar results have been found in wu2019information in the case of Pigou networks (graphs with parallel routes between a single source and a single destination) in the presence of state uncertainty on one of the routes. Optimal information structures have been found using the Bayesian persuasion framework to reduce average traffic spillover on a specific route in a Pigou network.
Despite the above developments, existing works on strategic information design in transportation settings make several impractical assumptions. Since this is still a fledgling topic, almost all efforts assume that travelers are expected utility maximizers (EUM). However, there is strong evidence from real-world observations that travelers deviate from EUM behavior quite frequently. An effort to address such deviations was first made in nadendla2018effects, which studied strategic information design in a single-sender, single-receiver setting when both are prospect-theoretic agents. Nevertheless, this framework is not applicable to the transportation domain, where there are multiple receivers. Another impractical assumption is the consideration of single-attribute costs and unimodal transportation networks, both of which are far from reality. Therefore, in this paper, we consider a more realistic transportation framework and develop a novel strategic information design framework as stated below.
First, we assume that the travelers’ responses exhibit quantal response equilibrium (QRE), where deviations from EUM at each traveler are captured by the randomness within the stochastic utility maximization framework luce1959individual. We model the strategic interaction with the system as a novel Stackelberg-QRE game, where the system (leader) exhibits EUM behavior, while the travelers (followers) exhibit logit responses. Second, we assume that both the system and travelers exhibit non-identically weighted multi-attribute preferences. Specifically, we assume that the system’s motive is to reduce both network congestion (in terms of travel time) and carbon emissions on the entire transportation network, whereas the traveler wishes to minimize travel time and/or carbon emissions along his/her personal route.
Inspired by Bayesian persuasion kamenica2019bayesian as well as the methods in bergemann2016bayes; mathevet2020information for settings with a single sender and multiple receivers, we develop a novel, approximate strategic information design algorithm to steer Logit Response travelers towards social welfare using strategic Information design (in short, LoRI). Our proposed algorithm LoRI uses the predictor-corrector method to find quantal responses at the travelers, and uses interior-point algorithms to find a locally optimal state-information signal that minimizes a non-convex system cost. Simulation results demonstrate that LoRI outperforms single-source shortest path (SSSP) algorithms (e.g., Dijkstra’s algorithm) and improves social welfare in a Wheatstone network. We show that the system’s cost reduces by when the SSSP algorithm is designed with a misaligned objective function.
System Model and Problem Formulation
Let a multimodal transportation network consisting of travelers at time , be represented as a graph , where represents the set of physical locations (vertices), and represents the transport interconnections (edges) between various locations in . Let support a gamut of transport modalities . For the sake of convenience, we expand the network into a multilayered graph using unimodal subgraphs , and switch edge sets which interconnect modality to modality within each vertex. For example, consider a Wheatstone road network with four vertices and ten edges, as illustrated in Figure 1. Consider transport modalities on this network, and ={Private Car (colored black), Metro Train (colored blue) and Walking (colored green)}.
Using unimodal subgraphs and switch edges (depicted using dashed lines), we expand the example network into a multilayered graph , as shown in Figure 2.
We model the network state as , where is the number of travelers on edge at time .
Let there be a central entity (a.k.a. the system), which evaluates the network state in terms of the overall traffic congestion and carbon emissions using a weighted multi-attribute cost. Assuming that there are attributes, each edge has a multi-attribute cost vector. The system evaluates the cost of each edge at time as
(1) 
Since centralized systems typically have access to sensing infrastructure across the network to measure the network state in real time, we assume that the system has more information regarding the current state than the travelers.
In this paper, we assume that the system constructs a multidimensional signal to steer traveler’s decision, where
(2) 
is the state transition probability shared by the system with the traveler. The system constructs this signal with the goal of steering travelers’ decisions towards the system’s optimum (a.k.a. social welfare).
Note that the overall system cost after a finite time horizon depends on the decisions taken by all the active travelers and all the signals presented to them. It comprises both past and future costs, and is given by
(3) 
where is the signal profile sent to all the travelers in the network; is the path profile chosen by the travelers; and denotes the system’s a priori belief probability regarding the state of edge being at time . Then, we define the system’s rationality as follows:
Definition 1.
The system’s motive is to minimize its cost function that depends on all the travelers’ decisions and the signals presented by the system. The motive is given by:
(4) 
Although these signals can be revealed by the system at any time, the travelers can take advantage of this information and change their path only when they are present at some node. We label such agents as active travelers. In other words, we can define the state of the traveler at time as
(5) 
Specifically, an active traveler’s state is updated to inactive as soon as he/she chooses the next edge, and remains so until he/she traverses that edge completely and reaches the other vertex, as shown in Figure 3. That is, is equal to the total number of inactive travelers on edge at time .
Furthermore, we assume that the travelers cannot fully observe the true network state at any given time, but can construct a multidimensional belief about at time based on prior experiences, where
(6) 
is the traveler’s belief vector regarding the state of the edge at time , and .
Footnote 1: If some attribute is not applicable to a given edge , then we let . For example, the attribute ‘CO emissions’ is not applicable to the edges of the walking mode; for these edges, we let .
Assuming that the traveler’s multi-attribute cost (see Footnote 1) on edge at time is a weighted linear combination of all attribute-wise edge costs , as given by
(7) 
we model the traveler’s stochastic expected cost for choosing a path as
(8) 
where is a probability distribution over the set of all paths , denotes the nominal (known) expected cost of the traveler, and is the noise (random parameter) term that captures any uncertainty regarding the traveler’s rationality. The decision policy adopted by the traveler at time is denoted as the path , where represents the set of all paths available for the traveler.
Let denote the sequence of edges that the traveler has already taken (committed) until time . Then, the traveler’s expected cost comprises two terms: the incurred (deterministic) cost from the edges already traversed, and the future (unknown) cost from the remaining path to be traversed. In other words, we have
(9) 
where is the time at which the traveler is at the head of edge , and represents the sequence of edges that the traveler will travel in the future, if he/she continues to stay on the same decision policy . Then, the traveler’s rationality is defined as follows:
Definition 2.
The traveler’s motive is to minimize the random cost function that depends on the signals presented by the system and the path chosen by the traveler, which is given as:
(10) 
Given that both the system and travelers have non-identical utilities (i.e., mismatched motives), it is natural to model their interaction as a one-shot Stackelberg-Quantal-Response (SQR) game, where the system commits to its signaling strategy as defined in Definition 1, before travelers choose their stochastic policies as per Definition 2 fudenberg1991game.
Definition 3.
The equilibrium of an SQR game between the system and travelers is defined as the pair , where
(11) 
Similar to solving traditional Stackelberg-Nash games, we propose a novel solution approach named LoRI based on backward induction, which evaluates travelers’ quantal response equilibrium as a function of the system’s signal , and then evaluates the best-response signal at the system. We present the technical details of our approach in the following section, and later analyze its performance in simulation experiments.
Equilibrium Analysis
In order to carry out equilibrium analysis, it is necessary to evaluate the path costs at the traveler, which depend on the network state. However, given that the network state evolves over time with all the active travelers’ path choice updates, we first compute the network state based on travelers’ strategy profiles using Algorithm 1. Given the network state, we evaluate the cost of traversing a path at the traveler using Algorithm 2. Note that the term in Algorithm 2 represents the cost of traveling on edge at time at the traveler, when its state is given by . Given the cost matrix, we now proceed to evaluate the equilibrium of the proposed SQR game using backward induction, i.e., we evaluate travelers’ QRE as a function of the system’s signal, and then compute the best-response signal at the system.
Traveler’s Quantal Response Analysis
Given the system’s signal , the traveler updates his/her prior belief defined in Equation (6) using Bayes’ rule to obtain the following posterior belief regarding the network state:
(12) 
We assume that the denominator in Equation (12) always converges to some value in the region and every traveler’s belief regarding the future state of the network remains stationary until the system presents a signal.
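As an illustration of the belief update described above, the following is a minimal Python sketch of a generic Bayes' rule update over a finite set of network states. It does not reproduce the exact form of Equation (12): the function name `posterior_belief`, the representation of the signal as a column index, and the likelihood matrix are all assumptions made for illustration only.

```python
import numpy as np

def posterior_belief(prior, likelihood, signal):
    """Generic Bayes' rule update (illustrative, not the paper's Equation (12)).

    prior      : shape (S,), prior probability of each of S hypothetical states
    likelihood : shape (S, M), likelihood[s, m] = P(signal m | state s)
    signal     : index of the observed signal (hypothetical encoding)
    """
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    joint = prior * likelihood[:, signal]   # P(state, signal)
    evidence = joint.sum()                  # denominator of Bayes' rule
    if evidence <= 0.0:
        raise ValueError("observed signal has zero probability under the prior")
    return joint / evidence

# Hypothetical three-state prior and two-signal likelihood matrix.
prior = [0.5, 0.3, 0.2]
likelihood = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
posterior = posterior_belief(prior, likelihood, signal=0)
```

The explicit check on the evidence term mirrors the assumption in the text that the denominator is well behaved; a signal with zero prior probability leaves the posterior undefined.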
The cost that the traveler attains by choosing a path is decomposed into (i) a known (nominal) cost at the traveler, and (ii) an unknown random cost , as shown in Equation (8). In this paper, we assume that the noise term in the traveler’s expected cost is independent and identically distributed according to an extreme value distribution, also known as the Gumbel distribution.
Theorem 1 (luce1959individual).
The traveler’s logit choice probability for the path at time is given by:
(13) 
where is the parameter of the quantal response model.
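For intuition, the standard logit choice rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: costs are to be minimized, so each path is weighted by the exponential of its negated, scaled cost; the function name `logit_choice_probabilities` and the parameter name `lam` are hypothetical, and the exact scaling used in Equation (13) may differ.

```python
import numpy as np

def logit_choice_probabilities(costs, lam):
    """Logit (softmax) choice probabilities over paths for a cost-minimizing traveler.

    costs : nominal expected cost of each available path (hypothetical values)
    lam   : non-negative quantal response parameter; lam = 0 gives uniform
            random choice, and large lam approaches the best response
    """
    costs = np.asarray(costs, dtype=float)
    scores = -lam * costs
    scores -= scores.max()          # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

# Hypothetical costs of three candidate paths.
probabilities = logit_choice_probabilities([10.0, 12.0, 15.0], lam=0.5)
```

Subtracting the maximum score before exponentiating does not change the probabilities, but it avoids overflow when costs are large.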
To compute the quantal response equilibrium for the travelers, we use Gambit mckelvey2006gambit. Gambit is a library of game theory software and tools for the construction and analysis of finite extensive and strategic games. We build a strategic (normal-form) game between all the travelers and use Gambit’s tool to solve for QRE. Gambit computes the principal branch of the (logit) quantal response correspondence using the predictor-corrector method based on the procedure described in turocy2005dynamic. The predictor-corrector method first generates a prediction using differential equations describing the branch of the correspondence, followed by a corrector step which refines the prediction using Newton’s method for finding a zero of a function.
Approximate-Response Signaling
At any time , let there be a total of travelers on the network. Assuming that the traveler is on edge at time due to the decision , we can compute the total number of travelers on edge at the time as
(14) 
where represents the indicator function which takes the value 1 whenever the argument holds true. Given at time on every edge , we can now compute the state transition probability as follows:

(15) 
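The edge-occupancy count in Equation (14) is a sum of indicator functions over travelers. A minimal Python sketch, assuming a hypothetical dictionary that maps each traveler identifier to the edge he/she currently occupies (the identifiers and edge labels below are illustrative, not taken from the paper):

```python
def count_travelers_on_edge(edge, current_edge_of):
    """Number of travelers whose current edge equals `edge`.

    current_edge_of : dict mapping a traveler id to the edge that traveler
                      occupies at the time of interest (hypothetical structure)
    """
    # Each term plays the role of the indicator function: 1 if true, else 0.
    return sum(1 for occupied in current_edge_of.values() if occupied == edge)

# Hypothetical snapshot of four travelers on a small network.
snapshot = {
    "traveler_1": ("A", "B"),
    "traveler_2": ("A", "B"),
    "traveler_3": ("B", "D"),
    "traveler_4": ("A", "C"),
}
occupancy_ab = count_travelers_on_edge(("A", "B"), snapshot)
```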
Let denote the probability that the traveler is present on edge at time given that he is on edge at time . Then, the state transition probability can be evaluated using the following recursive relation:
(16) 
where
(17) 
The leader’s optimal strategy is to minimize its cost which can be computed as:
(P1) 
Using Equation (16), we write the term and expand as shown in Equation (18).
(18) 
We further expand this using Equation (17). For better representation, we write . By Definition 2, is a vector of for every edge in the network. is a vector of probabilities for all possible values of . We assume that the upper bound of is the capacity of the edge . Since the system has a cost-minimization rationality, it will send a signal at time to the traveler such that the path chosen by the traveler minimizes the system’s overall cost. Since we have a leader-follower game, we use backward induction to solve for the optimal leader strategy, i.e., the optimal signal at the system. The system cannot send signals to all travelers at the same time, since every traveler’s decision depends on the decisions of all the other travelers as well. Therefore, at every time step, the system sends signals to travelers in a round-robin fashion: it sends a signal to the traveler while all the other travelers are held fixed on their respective paths.
The search space in this optimization problem comprises all right stochastic matrices, which can be shown to form a convex set. However, it is analytically hard to verify whether or not the objective function stated in Equation (18) is convex in . Note that the term represents logit probabilities, which are known to be non-convex. Equation (18) comprises a convex combination of sums of logit probabilities, whose convexity properties are hard to verify. Therefore, we employ interior-point algorithms to compute the approximate signal. In our simulation experiments, we use the CVXPY diamond2016cvxpy package to implement the interior-point search in Algorithm 3.
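The convexity of the set of right stochastic matrices mentioned above can be checked numerically on an example: any convex combination of two row-stochastic matrices is again row-stochastic. The sketch below is only an illustration of that property, not part of Algorithm 3; the helper names and matrix sizes are assumptions.

```python
import numpy as np

def is_right_stochastic(matrix, tol=1e-9):
    """True if all entries are non-negative and every row sums to one."""
    m = np.asarray(matrix, dtype=float)
    return bool(np.all(m >= -tol) and np.allclose(m.sum(axis=1), 1.0, atol=tol))

def random_right_stochastic(n_rows, n_cols, rng):
    """Draw a random matrix and normalize each row to sum to one."""
    raw = rng.random((n_rows, n_cols)) + 1e-12  # keep row sums strictly positive
    return raw / raw.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
signal_a = random_right_stochastic(3, 4, rng)
signal_b = random_right_stochastic(3, 4, rng)

theta = 0.3  # any mixing weight in [0, 1]
mixture = theta * signal_a + (1.0 - theta) * signal_b
mixture_is_feasible = is_right_stochastic(mixture)
```

Because the feasible set is convex, the difficulty discussed in the text lies entirely in the objective, whose convexity is not established.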
Results and Discussions
In this section, we discuss our simulation experiments along with our findings on the performance of LoRI, in comparison to the single-source shortest path (SSSP) algorithms used by traditional navigation systems, in the context of the Wheatstone network shown in Figure 2. We assume that SSSP algorithms are constructed based on a single attribute, namely Travel Time, whereas our proposed algorithm (LoRI) relies on two attributes, namely Travel Time and CO Emissions. Depending on the transport mode, we employed well-known cost models found in the literature to carry out our simulation experiments. For example, travel time on edge can be calculated for transport modes serviced on a road network (e.g., car, taxi, bus) using the Bureau of Public Roads (BPR) formula manualbureau:
(19) 
where is the number of vehicles on edge at time , is the capacity of the edge, and denotes the free-flow travel time of edge . Here, and are constants in the BPR function (usually is 0.15 and is 4). Similarly, the rate of carbon emissions per vehicle can be calculated using a nonlinear, static emission model for network links proposed by Wallace et al. wallace1998transyt, as shown below:
(20) 
where is the link length (in kilometers), is the travel time (in minutes) for link , and is measured in grams per vehicle per hour.
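The BPR travel-time relation referenced in Equation (19) can be sketched as follows, assuming its standard form: the free-flow time is inflated by a factor that grows polynomially with the volume-to-capacity ratio. The function and parameter names are hypothetical; the defaults of 0.15 and 4 are the usual constants mentioned in the text.

```python
def bpr_travel_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """Standard BPR link travel time.

    free_flow_time : travel time on the uncongested edge
    volume         : number of vehicles on the edge
    capacity       : capacity of the edge (must be positive)
    alpha, beta    : BPR constants (commonly 0.15 and 4)
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)

# Hypothetical edge: 10-minute free-flow time, 100-vehicle capacity.
empty_edge_time = bpr_travel_time(10.0, volume=0, capacity=100)
at_capacity_time = bpr_travel_time(10.0, volume=100, capacity=100)
```

With the default constants, an edge loaded exactly to capacity is 15% slower than free flow, and the penalty rises steeply beyond that point because of the fourth-power term.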
The travel time for edges that support the subway mode can be extracted from their arrival and departure times. In our example network in Figure 2, we assume travel times as . For simplicity, we assume CO emissions per traveler on a subway to be . For the edge corresponding to the walking modality, we assume the travel time and CO emissions to be simply . In this example, we consider the total cost of traveling on a switch edge to be . We implement our simulation experiments in two different scenarios using the following Python packages: python-igraph v, gambit v, cvxpy v, numpy v, matplotlib v and all their dependencies.
Scenario 1
We simulate three different travelers with unique origin-destination pairs: , each of whom interacts with SSSP for a route recommendation and LoRI for network state information. In our first experiment, we assume LoRI’s weight for travel time to be , and the travelers’ weights for travel time to be . We compute the empirical average costs across different traveler motives at both the traveler and the system, and plot them in Figure 4. Although the travelers’ average cost remains the same for both SSSP and LoRI, the system cost reduces by about when the travelers interact with LoRI in lieu of SSSP. Specifically, LoRI reduces the congestion rate by about and the CO emissions rate by across the entire multimodal network.
In our second experiment, we assume that the three travelers’ weights for travel time are , and vary LoRI’s weights across . We evaluate average system costs across different origin-destination pairs and plot them in Figure 5. It is evident that LoRI’s costs are at least as good as those of SSSP. Specifically, the system obtains a tremendous gain by adopting LoRI when there is a motive mismatch between SSSP and the system. For example, when the system’s weight for travel time is , the adoption of LoRI reduces the overall network congestion by .
In our third experiment, we assume every traveler’s weight for travel time to be for all possible origin-destination pairs. In this experiment, we observe that LoRI successfully persuades the traveler with probability, i.e., LoRI’s strategically designed information was able to steer travelers’ routes towards socially optimal choices over of all origin-destination pairs.
Scenario 2
In this scenario, we distribute 30 travelers across the entire multimodal transportation network. We perform this experiment with number of travelers who interact with LoRI, while all the other travelers interact with SSSP. We assume the system’s weight for travel time to be . We calculate average system costs across different traveler paths and present them in Table 1. Note that the system’s social welfare consistently improves as the number of travelers interacting with LoRI increases. However, there is a significant trade-off in terms of runtime. Table 2 shows how the runtime of LoRI and SSSP varies with . Although SSSP’s runtime remains almost unchanged with an increasing number of travelers in our experiment, LoRI’s runtime increases exponentially with an increasing number of interacting travelers. This exponential increase in runtime happens because of a significant increase in the number of possible signaling strategies at the system, which in turn depends on all possible combinations of the paths available to every active traveler () interacting with LoRI.
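The combinatorial growth described above can be made concrete with a small Python sketch. It assumes, purely for illustration, that each interacting traveler has a fixed number of available paths; the number of joint path profiles is then the product of those counts, which grows exponentially in the number of interacting travelers. The path counts below are hypothetical and are not taken from the experiments.

```python
from math import prod

def joint_path_profiles(paths_per_traveler):
    """Number of joint path profiles, given each traveler's path count."""
    return prod(paths_per_traveler)

# Hypothetical setting: every interacting traveler has 3 available paths.
growth = {m: joint_path_profiles([3] * m) for m in (1, 5, 10)}
```

Under this assumption, going from 5 to 10 interacting travelers multiplies the number of joint profiles by 3 to the power 5, which is consistent with the qualitative runtime trend reported for LoRI.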
Conclusion
In summary, we proposed a novel Stackelberg signaling framework to reduce the inefficiency of selfish routing in the presence of behavioral agents. We modeled the interaction between the system and quantal response travelers as a Stackelberg game, and developed a novel approximate algorithm, LoRI, that constructs strategic, personalized information regarding the state of the network. The system presents this information as a private signal to each traveler to steer their route decisions towards socially optimal outcomes. We demonstrated the performance of LoRI and compared it with that of an SSSP algorithm on a Wheatstone network with multimodal routes. We presented the trade-off between the system’s costs and runtime within the strategic information design framework. In the future, we will design computationally efficient, approximate algorithms at the system with better runtime performance. We will also consider strategic information design for diverse agent rationalities.