
Maximizing Social Welfare in Selfish Multi-Modal Routing using Strategic Information Design for Quantal Response Travelers

by   Sainath Sanga, et al.

Traditional selfish routing literature quantifies inefficiency in transportation systems with single-attribute costs using price-of-anarchy (PoA), and provides various technical approaches (e.g. marginal cost pricing) to improve the PoA of the overall network. Unfortunately, practical transportation systems have dynamic, multi-attribute costs, and the state-of-the-art technical approaches proposed in the literature are infeasible for practical deployment. In this paper, we offer a paradigm shift in selfish routing by characterizing idiosyncratic, multi-attribute costs at boundedly-rational travelers, and by improving network efficiency using strategic information design. Specifically, we model the interaction between the system and travelers as a Stackelberg game, where travelers adopt multi-attribute logit responses. We model strategic information design as an optimization problem, and develop a novel approximate algorithm to steer Logit Response travelers towards social welfare using strategic Information design (in short, LoRI). We demonstrate the performance of LoRI on a Wheatstone network with multi-modal route choices at the travelers. In our simulation experiments, we find that LoRI outperforms single-source shortest path (SSSP) algorithms in terms of system utility and improves social welfare, especially when there is a motive mismatch between the two systems. For instance, we find that LoRI persuades a traveler towards a socially optimal route 66.66% of the time on average, when compared to SSSP, when the system has 0.3 weight on carbon emissions. However, our simulation results also reveal a tradeoff between system performance and runtime.




Smart navigation systems (e.g. GPS devices, navigation applications on mobile/smart devices) have transformed the transportation domain by reducing cognitive overload in travelers. However, such technological advancements have had little impact on several fundamental issues such as mitigating congestion INRIX and reducing carbon emissions energy, which have only worsened over time. For instance, current state-of-the-art navigation systems employ traditional shortest-path algorithms, such as Dijkstra’s algorithm lanning2014dijkstra, Bellman–Ford or Floyd–Warshall, and A* algorithms 10.5555/1614191, to recommend routes and mitigate travelers’ cognitive overload. On the other hand, selfish travelers exhibit multi-attribute preferences, which are typically misaligned with the system’s interests. As a result, travelers often reject route recommendations that involve non-personal transport modalities, such as public transportation, ridesharing services and other micro-mobility services USCensus; mckenzie2015drives. Although unintentional, people have steered away from personal car usage during the ongoing COVID-19 pandemic in 2020 INRIX2020, which has resulted in significant cost reductions in terms of congestion, carbon emissions and collisions. Our goal in this paper is to steer selfish travelers away from personal car usage (even under non-pandemic conditions) by offering them alternative routing choices in a persuasive manner.

Selfish routing is a strategic framework where travelers selfishly choose their best-response routes according to their respective preferences, thereby forming an equilibrium. However, the central authority (e.g., a city transportation department) chooses a social-welfare objective that is not necessarily aligned with all travelers’ interests. This leads to system inefficiency, which can be quantified by price-of-anarchy (PoA) roughgarden2002bad. Several techniques have been proposed to drive PoA towards unity, which happens when the equilibrium outcome is optimal in terms of the system’s objective. A seminal example is marginal cost pricing, where taxes are imposed on selfish travelers based on their marginal contribution to the system’s objective ruggles1949recent. Although the idea of marginal cost pricing has been floating around for several decades, the technique remains practically infeasible due to our inability to estimate marginal costs accurately.

In sharon2019marginal, the authors studied the effects of underestimating marginal costs on optimality in terms of system objectives, and showed that taxing based on underestimated marginal costs produces an outcome that is at least as good as having no taxes. Although attempts have been made to implement such solutions by authoritarian regimes yang2020marginal, resistance to adopting marginal cost pricing persists in democratic nations for various political reasons. Another powerful idea to influence traveler behavior is Stackelberg routing, where a fraction of agents are routed centrally, while the remaining agents are allowed to choose their routes selfishly swamy2007effectiveness. A similar routing algorithm based on multi-objective A* is proposed in samal2018towards, with the goal of designing routes that decrease the overall network congestion.

Meanwhile, information-revelation systems have also been proposed acemoglu2018informational; arnott1991does; mahmassani1991system, where the traffic state is revealed to travelers as opposed to recommending routes. Although such systems do not mitigate cognitive-overload at the travelers, they have been found to generate a positive impact on traffic congestion and other global objectives even in non-strategic settings. However, these systems still suffer from poor persuasive ability, in terms of inducing behavior modification among travelers. A natural and effective solution is to design information strategically at the city transportation department, and present it to the travelers to steer their routing decisions towards socially optimal outcomes.

Recently, strategic information design has been studied in the transportation domain for settings where the network congestion state is uncertain to the travelers. For example, in das2017reducing, the authors computed best-response signals under first-best, full-information, public-signal and optimal information structure scenarios in the context of a Wheatstone network; they demonstrated that optimal information structures reveal only partial information to mitigate network congestion. Similar results have been found in wu2019information for Pigou networks (graphs with parallel routes between a single source and a single destination) in the presence of state uncertainty on one of the routes. There, optimal information structures were found using the Bayesian persuasion framework to reduce average traffic spillover on a specific route in a Pigou network.

Despite the above developments, existing works on strategic information design in transportation settings make several impractical assumptions. Since this is still a fledgling topic, almost all efforts assume that travelers are expected utility maximizers (EUM). However, there is strong evidence from real-world observations that travelers deviate from EUM behavior quite frequently. An effort to relax the EUM assumption was first made in nadendla2018effects, which studied strategic information design in a single-sender, single-receiver setting where both are prospect-theoretic agents. Nevertheless, this framework is not applicable to the transportation domain, where there are multiple receivers. Another impractical assumption is the consideration of single-attribute costs and unimodal transportation networks, both of which are far from reality. Therefore, in this paper, we consider a more realistic transportation framework and develop a novel strategic information design framework as stated below.

First, we assume that the travelers’ responses exhibit quantal response equilibrium (QRE), where deviations from EUM at each traveler are captured by the randomness within the stochastic utility maximization framework luce1959individual. We model the strategic interaction with the system as a novel Stackelberg-QRE game, where the system (leader) exhibits EUM behavior, while the travelers (followers) exhibit logit responses. Second, we assume that both the system and travelers exhibit non-identically weighted multi-attribute preferences. Specifically, we assume that the system’s motive is to reduce both network congestion (in terms of travel time) and carbon emissions on the entire transportation network, whereas the traveler wishes to minimize travel time and/or carbon emissions along his/her personal route.

Inspired by Bayesian persuasion kamenica2019bayesian as well as the methods in bergemann2016bayes; mathevet2020information for settings with a single sender and multiple receivers, we develop a novel, approximate strategic information design algorithm to steer Logit Response travelers towards social welfare using strategic Information design (in short, LoRI). Our proposed algorithm LoRI uses the predictor-corrector method to find quantal responses at the travelers, and uses interior-point algorithms to find a locally optimal state-information signal that minimizes a non-convex system cost. Simulation results demonstrate that LoRI outperforms single-source shortest path (SSSP) algorithms (e.g., Dijkstra’s algorithm) and improves social welfare in a Wheatstone network. We show that the system’s cost reduces by when SSSP algorithm is designed with a misaligned objective function.

System Model and Problem Formulation

Let a multi-modal transportation network with $N$ travelers at time $t$ be represented as a graph $G = (V, E)$, where $V$ represents the set of physical locations (vertices), and $E$ represents the transport interconnections (edges) between various locations in $V$. Let $G$ support a gamut of transport modalities $\mathcal{M}$. For the sake of convenience, we expand the network into a multi-layered graph using one unimodal subgraph per modality, together with switch edge sets which interconnect one modality to another within each vertex. For example, consider a Wheatstone road network with four vertices and ten edges, as illustrated in Figure 1. Consider three transport modalities on this network, with $\mathcal{M}$ = {Private Car (colored black), Metro Train (colored blue), Walking (colored green)}.

Figure 1: An Example Multi-Modal Transportation Network

Using unimodal subgraphs and switch edges (depicted using dashed lines), we expand the example network into a multi-layered graph, as shown in Figure 2.

Figure 2: Multi-layered expansion of the Multimodal Transportation Network shown in Figure 1
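The multi-layered expansion can be illustrated with a short sketch. This is not the authors' implementation: the function name `expand_multilayer`, the vertex labels and the assignment of edges to modalities are hypothetical, and the edge set below does not reproduce Figure 1. Each node of the expanded graph is a (location, modality) pair, each unimodal subgraph stays inside one layer, and switch edges connect the layers at every vertex.

```python
from itertools import permutations


def expand_multilayer(vertices, modal_edges):
    """Expand a multi-modal network into a multi-layered graph.

    vertices: iterable of location names.
    modal_edges: dict mapping a modality name to the (u, v) edges it serves.
    Returns (nodes, travel_edges, switch_edges); every node is a
    (location, modality) pair.
    """
    vertices = list(vertices)
    modes = list(modal_edges)
    nodes = [(v, m) for v in vertices for m in modes]
    # Unimodal subgraphs: each edge lives entirely inside one modality layer.
    travel_edges = [((u, m), (v, m)) for m in modes for (u, v) in modal_edges[m]]
    # Switch edges: connect every ordered pair of distinct layers at a vertex.
    switch_edges = [
        ((v, a), (v, b)) for v in vertices for a, b in permutations(modes, 2)
    ]
    return nodes, travel_edges, switch_edges


# Hypothetical four-vertex example (illustrative edge assignment only).
vertices = ["A", "B", "C", "D"]
modal_edges = {
    "car": [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")],
    "metro": [("A", "B"), ("B", "D")],
    "walk": [("A", "C"), ("C", "D")],
}
nodes, travel_edges, switch_edges = expand_multilayer(vertices, modal_edges)
```

Representing a mode change as an ordinary edge lets a standard path search treat switching modality like any other move, with its own cost.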

We model the network state as , where is the number of travelers on edge at time .

Let there be a central entity (a.k.a. the system), which evaluates the network state in terms of the overall traffic congestion and carbon emissions using a weighted multi-attribute cost. Assuming that there are attributes, each edge has a multi-attribute cost vector . The system evaluates the cost of each edge at time as


Since centralized systems typically have access to sensing infrastructure across the network to measure the network state in real-time, we assume that the system has greater information regarding the current state than the travelers.

In this paper, we assume that the system constructs a multi-dimensional signal to steer traveler’s decision, where


is the state transition probability shared by the system with the traveler. The system constructs this signal with the goal of steering travelers’ decisions towards the system’s optimum (a.k.a. social welfare).

Note that the overall system cost after a finite time horizon depends on the decisions taken by all the active travelers and on all the signals presented to them. It comprises both past and future costs, and is given by


where is the signal profile sent to all the travelers in the network; is the path profile chosen by the travelers; and denotes the a priori system’s belief probability regarding the state of edge being at time . Then, we define the system’s rationality as follows:

Definition 1.

The system’s motive is to minimize its cost function that depends on all the travelers’ decisions and the signals presented by the system. The motive is given by:

Figure 3: State Transitions at the Traveler

Although these signals can be revealed by the system at any time, the travelers can take advantage of this information and change their path only when they are present at some node. We label such agents as active travelers. In other words, we can define the state of the traveler at time as


Equivalently, an active traveler becomes inactive as soon as he/she chooses the next edge, and remains so until he/she traverses that edge completely and reaches the other vertex, as shown in Figure 3. That is, is equal to the total number of inactive travelers on edge at time .

Furthermore, we assume that the travelers cannot fully observe the true network state at any given time, but can construct a multi-dimensional belief about at time based on prior experiences, where


is the traveler’s belief vector regarding the state of the edge at time , and . We assume that the traveler’s multi-attribute cost on edge at time is a weighted linear combination of all attribute-wise edge costs . (If some attribute is not applicable to a given edge , then we let . For example, the attribute ‘CO emissions’ is not applicable to the edges of mode “walking”; for these edges, we let .) This cost is given by


Given this edge cost, we model the traveler’s stochastic expected cost for choosing a path as



is a probability distribution over the set of all paths , denotes the nominal (known) expected cost of the traveler, and is the noise (random parameter) term that captures any uncertainty regarding the traveler’s rationality. The decision policy adopted by the traveler at time is denoted as the path , where represents the set of all paths available for the traveler.
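As a minimal sketch of the weighted linear edge cost described above, the snippet below combines attribute-wise costs with a traveler's weights and sums them along a path. The function names, the attribute ordering (travel time, emissions) and all numbers are assumptions made for illustration. Treating a non-applicable attribute as contributing zero is also an assumption of this sketch.

```python
def edge_cost(weights, attribute_costs):
    """Weighted linear combination of attribute-wise costs for one edge."""
    if len(weights) != len(attribute_costs):
        raise ValueError("one weight is required per attribute")
    return sum(w * c for w, c in zip(weights, attribute_costs))


def nominal_path_cost(weights, path_attribute_costs):
    """Sum of weighted edge costs along a path (no noise term)."""
    return sum(edge_cost(weights, costs) for costs in path_attribute_costs)


# Hypothetical traveler who weighs travel time and emissions equally.
weights = (0.5, 0.5)
# One (travel_time, emissions) pair per edge; the second edge is a walking
# edge whose emissions attribute is assumed to contribute nothing.
path = [(10.0, 20.0), (6.0, 0.0)]
total = nominal_path_cost(weights, path)
```

The noise term of the stochastic cost is deliberately left out here; it only enters through the choice probabilities in the quantal response analysis.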

Let denote the sequence of edges that the traveler has already taken (committed) until time . Then, the traveler’s expected cost comprises two terms: the incurred (deterministic) cost from the edges already traversed, and the future (unknown) cost from the remaining path to be traversed. In other words, we have


where is the time at which the traveler is at the head of edge , and represents the sequence of edges that the traveler will travel in the future, if he/she continues to stay on the same decision policy . Then, the traveler’s rationality is defined as follows:

Definition 2.

The traveler’s motive is to minimize the random cost function that depends on the signals presented by the system and the path chosen by the traveler, which is given as:


Given that both the system and travelers have non-identical utilities (i.e., mismatched motives), it is natural to model their interaction as a one-shot Stackelberg-Quantal-Response (SQR) game, where the system commits to its signaling strategy as defined in Definition 1, before travelers choose their stochastic policies as per Definition 2 fudenberg1991game.

Definition 3.

The equilibrium of an SQR game between the system and travellers is defined as the pair , where


Similar to solving traditional Stackelberg-Nash games, we propose a novel solution approach named LoRI based on backward induction, which evaluates travelers’ quantal response equilibrium as a function of system’s signal , and then evaluates the best-response signal at the system. We present the technical details of our approach in the following section, and later analyze its performance in simulation experiments.

Equilibrium Analysis

Data: Network State , Current time , Traveler path
Result: Network State
for  do
end for
for  in  do
end for
Algorithm 1 Network State Transition

In order to carry out equilibrium analysis, it is necessary to evaluate the path costs at the traveler, which depend on the network state. However, given that the network state evolves over time with all the active travelers’ path choice updates, we first compute the network state based on travelers’ strategy profiles using Algorithm 1. Given the network state, we evaluate the cost of traversing a path at the traveler using Algorithm 2. Note that the term in Algorithm 2 represents the cost of traveling on edge at time at the traveler, when its state is given by . Given the cost matrix, we now proceed to evaluate the equilibrium of the proposed SQR game using backward induction, i.e. we evaluate travelers’ QRE as a function of the system’s signal, and then compute the best-response signal at the system.
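The idea of deriving the network state from the travelers' current positions can be sketched as follows. This is an illustrative stand-in rather than a reconstruction of Algorithm 1: the function name `network_state` and the convention that `None` marks an active traveler waiting at a node are assumptions of this sketch.

```python
from collections import Counter


def network_state(current_edge_of):
    """Count the travelers currently traversing each edge.

    current_edge_of: dict mapping a traveler id to the edge (u, v) being
    traversed, or None when the traveler is active (waiting at a node).
    """
    return Counter(
        edge for edge in current_edge_of.values() if edge is not None
    )


# Hypothetical snapshot: two travelers share edge (A, B), one is at a node.
snapshot = {"t1": ("A", "B"), "t2": ("A", "B"), "t3": None, "t4": ("B", "D")}
state = network_state(snapshot)
```

Recomputing the counts from a snapshot keeps the state consistent with the travelers' positions, at the price of one pass over all travelers per time step.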

Data: Traveler
Result: cost matrix
for  do
       for  do
       end for
      for profile in strategy Profiles do
             for  in d do
                   for  in  do
                         for  in  do
                               if d[key] in range(j-10, j+10) then
                               end if
                         end for
                   end for
             end for
       end for
end for
Algorithm 2 Computing the Cost Matrix

Traveler’s Quantal Response Analysis

Given the system’s signal , the traveler updates his prior belief defined in Equation (6) using Bayes rule to obtain the following posterior belief regarding the network state:


We assume that the denominator in Equation (12) always converges to some value in the region and every traveler’s belief regarding the future state of the network remains stationary until the system presents a signal.

The cost that the traveler attains by choosing a path is decomposed into (i) known (nominal) cost at the traveler, and (ii) an unknown random cost , as shown in Equation (8). In this paper, we assume that the noise term in the traveler’s expected cost is independent and identically distributed according to the extreme value distribution, also known as the Gumbel distribution.

Theorem 1 (luce1959individual).

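The traveler’s logit choice probability for the path $p$ at time $t$ is given by:

$$\mathbb{P}_t(p) = \frac{\exp\left(-\lambda\, \bar{C}_t(p)\right)}{\sum_{p' \in \mathcal{P}} \exp\left(-\lambda\, \bar{C}_t(p')\right)},$$

where $\bar{C}_t(p)$ is the traveler’s nominal expected cost for path $p$ at time $t$, $\mathcal{P}$ is the set of all paths available to the traveler, and $\lambda$ is the parameter of the quantal response model.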
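A logit response over path costs can be sketched in a few lines. The function name, the path labels and the costs below are hypothetical. Because travelers minimize cost, the exponent carries a negative sign, and subtracting the smallest cost before exponentiating keeps the computation numerically stable without changing the result.

```python
import math


def logit_choice_probabilities(path_costs, lam):
    """Return logit probabilities proportional to exp(-lam * cost)."""
    lowest = min(path_costs.values())
    # Shifting by the lowest cost avoids overflow and cancels on normalizing.
    weights = {p: math.exp(-lam * (c - lowest)) for p, c in path_costs.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


# Hypothetical nominal costs of three candidate paths.
costs = {"car": 12.0, "metro": 10.0, "walk": 25.0}
uniform = logit_choice_probabilities(costs, lam=0.0)
noisy = logit_choice_probabilities(costs, lam=0.5)
```

With `lam = 0` every path is equally likely, and as `lam` grows the probability mass concentrates on the cheapest path, which recovers the deterministic best response in the limit.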

To compute the Quantal Response Equilibrium for the travelers, we use Gambit mckelvey2006gambit. Gambit is a library of game theory software and tools for the construction and analysis of finite extensive and strategic games. We build a strategic game (Normal-Form game) between all the travelers and use Gambit’s tool to solve for QRE. Gambit computes the principal branch of the (logit) quantal response correspondence using the predictor-corrector method based on the procedure described in turocy2005dynamic. The predictor-corrector method first generates a prediction using differential equations describing the branch of the correspondence, followed by a corrector step which refines the prediction using Newton’s method for finding a zero of a function.
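To make the notion of a logit QRE concrete, the sketch below finds one for a tiny two-traveler, two-route game using damped fixed-point iteration. This is a deliberately simpler technique than the predictor-corrector path tracing used by Gambit, it does not call Gambit, and it is only expected to converge for small games and moderate values of the response parameter. The cost matrix and all names are hypothetical.

```python
import math


def logit_response(expected_costs, lam):
    """Logit probabilities over actions, favoring lower expected cost."""
    lowest = min(expected_costs)
    weights = [math.exp(-lam * (c - lowest)) for c in expected_costs]
    total = sum(weights)
    return [w / total for w in weights]


def logit_qre_two_travelers(cost_a, cost_b, lam, damping=0.5,
                            max_iters=5000, tol=1e-12):
    """Damped fixed-point iteration for a two-player logit QRE.

    cost_a[i][j] and cost_b[i][j] are the costs to travelers A and B when
    A picks route i and B picks route j.
    """
    n_a, n_b = len(cost_a), len(cost_a[0])
    p = [1.0 / n_a] * n_a  # mixed strategy of traveler A
    q = [1.0 / n_b] * n_b  # mixed strategy of traveler B
    for _ in range(max_iters):
        exp_a = [sum(q[j] * cost_a[i][j] for j in range(n_b)) for i in range(n_a)]
        exp_b = [sum(p[i] * cost_b[i][j] for i in range(n_a)) for j in range(n_b)]
        target_p = logit_response(exp_a, lam)
        target_q = logit_response(exp_b, lam)
        gap = max(abs(x - y) for x, y in zip(target_p + target_q, p + q))
        if gap < tol:
            break
        # Move only part of the way towards the logit response (damping).
        p = [(1 - damping) * x + damping * y for x, y in zip(p, target_p)]
        q = [(1 - damping) * x + damping * y for x, y in zip(q, target_q)]
    return p, q


# Hypothetical congestion game: sharing a route is costlier than using it alone.
ROUTE_COSTS = [[2.0, 1.0], [1.0, 3.0]]
p, q = logit_qre_two_travelers(ROUTE_COSTS, ROUTE_COSTS, lam=1.0)
```

At the returned point, each traveler's mixed strategy is (approximately) the logit response to the other's, which is the defining property of a logit QRE.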

Approximate-Response Signaling

Data: Travelers , Network State
for time to infinity do
       forall  do
             if  then
                    /* If traveler is active */
             end if
       end forall
      forall  do

is a right stochastic matrix

             end while
       end forall
      forall  do
             if  then
             end if
       end forall
end for
Algorithm 3 LoRI

At any time , let there be a total of travelers on the network. Assuming that the traveler is on edge at time due to the decision , we can compute the total number of travelers on edge at the time as


where represents the indicator function which takes the value 1 whenever the argument holds true. Given at time on every edge , we can now compute the state transition probability as follows:


Let denote the probability that the traveler is present on edge at time given that he is on edge at time . Then, the state transition probability can be evaluated using the following recursive relation:




The leader’s optimal strategy is to minimize its cost which can be computed as:


Using Equation (16), we write the term and expand as shown in Equation (18).


We further expand this using Equation (17). For better representation, we write . By Definition 2, is a vector of for every edge in the network. is a vector of probabilities for all possible values of . We assume that the upper bound of is the capacity of the edge . Since the system has a cost minimization rationality, it will send a signal at time to the traveler such that the path chosen by the traveler minimizes the system’s overall cost. Since we have a leader-follower game, we use backward induction to solve for the optimal leader strategy, i.e., the optimal signal at the system. The system cannot send signals to every traveler at the same time, because every traveler’s decision depends on the decisions of all the other travelers as well. Therefore, at every time step, the system sends signals to travelers in a round-robin fashion. The system sends a signal to the traveler while all the other travelers are fixed on their respective paths.

The search space in this optimization problem comprises all right stochastic matrices, which can be shown to form a convex set. However, it is analytically hard to verify whether or not the objective function stated in Equation (18) is convex in . Note that the term represents logit probabilities, which are known to be non-convex. Equation (18) comprises a convex combination of sums of logit probabilities, whose convexity properties are hard to verify. Therefore, we employ interior-point algorithms to compute the approximate signal. In our simulation experiments, we use the CVXPY package diamond2016cvxpy to implement the interior-point search in Algorithm 3.
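The convexity of the search space can be checked numerically on examples: any convex combination of two right stochastic matrices is again right stochastic, because non-negativity and unit row sums are both preserved. The helper names and matrices below are hypothetical, and the sketch only illustrates the feasible set, not the objective in Equation (18).

```python
def is_right_stochastic(matrix, tol=1e-9):
    """True if all entries are non-negative and every row sums to one."""
    return all(
        all(x >= -tol for x in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def convex_combination(a, b, theta):
    """Entry-wise theta * a + (1 - theta) * b for 0 <= theta <= 1."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return [
        [theta * x + (1.0 - theta) * y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


# Two hypothetical signals, each a right stochastic matrix.
signal_a = [[1.0, 0.0], [0.5, 0.5]]
signal_b = [[0.2, 0.8], [0.0, 1.0]]
mixed = convex_combination(signal_a, signal_b, theta=0.3)
```

A convex feasible set does not make the problem convex by itself, which is why the text still treats the resulting signal as approximate.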

Results and Discussions

In this section, we discuss our simulation experiments along with our findings in terms of the performance of LoRI, in comparison to single-source shortest path (SSSP) algorithms used by traditional navigation systems in the context of a Wheatstone network shown in Figure 2. We assume that SSSP algorithms are constructed based on a single attribute, namely Travel Time, whereas our proposed algorithm (LoRI) relies on two attributes, namely Travel Time, and CO Emissions. Depending on the transport mode, we employed well-known cost models found in the literature, to carry out our simulation experiments. For example, travel time on edge can be calculated for transport modes serviced on a road network (e.g. car, taxi, bus) using Bureau of Public Roads (BPR) formula manualbureau:


$$t_e\big(n_e(t)\big) = t_e^{0}\left[1 + \alpha\left(\frac{n_e(t)}{c_e}\right)^{\beta}\right],$$

where $n_e(t)$ is the number of vehicles on edge $e$ at time $t$, $c_e$ is the capacity of edge $e$, and $t_e^{0}$ denotes the free-flow travel time of edge $e$. $\alpha$ and $\beta$ are constants in the BPR function (usually $\alpha = 0.15$ and $\beta = 4$). Similarly, the rate of carbon emissions per vehicle can be calculated using a non-linear, static emission model for network links proposed by Wallace et al. wallace1998transyt, as shown below:


where is the link length (in kilometers), is the travel time (in minutes) for link , and is measured in grams per vehicle per hour.
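The BPR travel-time model mentioned above is simple enough to sketch directly. The function and parameter names are assumptions of this sketch, the defaults follow the usual constants quoted in the text (0.15 and 4), and the example numbers are hypothetical.

```python
def bpr_travel_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """Bureau of Public Roads travel time for one road edge.

    free_flow_time: travel time on the empty edge.
    volume: number of vehicles currently on the edge.
    capacity: capacity of the edge (must be positive).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)


# Hypothetical edge with a 10-minute free-flow time and capacity 100.
empty = bpr_travel_time(10.0, 0, 100)
at_capacity = bpr_travel_time(10.0, 100, 100)
overloaded = bpr_travel_time(10.0, 200, 100)
```

Since the load enters with the fourth power, travel time stays close to free flow until the edge nears capacity and then rises steeply, which is what makes congestion sensitive to a few extra travelers.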

The travel time for edges that support subway mode can be extracted from their arrival and departure times. In our example network in Figure 2, we assume travel times as . For simplicity, we assume CO emissions per traveler on a subway to be . For the edge corresponding to walking modality, we assume the travel time and CO emissions to be simply . In this example, we consider the total cost of traveling on a switch edge to be . We implement our simulation experiments in two different scenarios using the following Python packages: python-igraph v, gambit v, cvxpy v, numpy v, matplotlib v and all their dependencies.

Figure 4: Comparison of agents costs due to LoRI and SSSP in the first experiment under Scenario 1

Scenario 1

We simulate three different travelers with unique origin-destination pairs: , each of whom interacts with SSSP for a route recommendation and with LoRI for network state information. In our first experiment, we assume LoRI’s weight for travel time to be , and the travelers’ weights for travel time to be . We compute the empirical average costs across different traveler motives at both traveler and system, and plot them in Figure 4. Although the travelers’ average cost remains the same for both SSSP and LoRI, the system cost reduces by about when the travelers interact with LoRI in lieu of SSSP. Specifically, LoRI reduces the congestion rate by about and CO emissions rate by across the entire multi-modal network.

Figure 5: Comparison of system costs across different motives due to LoRI and SSSP in second experiment under Scenario 1

In our second experiment, we assume that the three travelers’ weights for travel time are , and vary LoRI’s weights across . We evaluate average system costs across different origin-destination pairs and plot them in Figure 5. It is evident that LoRI’s costs are at least as good as those of SSSP. Specifically, the system obtains a tremendous gain by adopting LoRI when there is a motive mismatch between SSSP and the system. For example, when the system’s weight for travel time is , the adoption of LoRI reduces the overall network congestion by .

In our third experiment, we assume every traveler’s weight for travel time to be for all possible origin-destination pairs. In this experiment, we observe that LoRI successfully persuades the traveler with probability, i.e. LoRI’s strategically designed information was able to steer travelers’ routes towards socially optimal choices over of all origin-destination pairs.

Scenario 2

| Number of Travelers | LoRI | SSSP |
|---|---|---|
| 1 | 20.211234567901233 | 23.624074074074073 |
| 2 | 20.135617283950616 | 23.55925925925926 |
| 3 | 19.85057613168724 | 23.001058201058203 |

Table 1: System costs with travelers interacting with LoRI under Scenario 2

| Number of Travelers | LoRI | SSSP |
|---|---|---|
| 1 | 0.24670171737670898 | 0.0011279582977294922 |
| 2 | 0.41092681884765625 | 0.002007007598876953 |
| 3 | 2.2496159076690674 | 0.002805948257446289 |
| 4 | 909.9852938652039 | 0.0027680397033691406 |

Table 2: Run Time under Scenario 2

In this scenario, we distribute 30 travelers across the entire multi-modal transportation network. We perform this experiment with number of travelers who interact with LoRI, while all the other travelers interact with SSSP. We assume the system’s weight for travel time to be . We calculate average system costs across different traveler paths and present them in Table 1. Note that the system’s social welfare consistently improves as the number of travelers interacting with LoRI increases. However, there is a significant tradeoff in terms of run time. Table 2 shows how the runtime of LoRI and SSSP varies with . Although SSSP’s runtime remains almost unchanged with an increasing number of travelers in our experiment, LoRI’s runtime increases exponentially with the number of interacting travelers. This exponential increase in runtime arises from a significant increase in the number of possible signaling strategies at the system, which in turn depends on all possible combinations of the paths available to every active traveler () interacting with LoRI.
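The combinatorial growth behind this runtime can be illustrated with a small count of joint path profiles. The function name and the path labels are hypothetical, and the sketch assumes, purely for illustration, that every interacting traveler has the same five candidate paths.

```python
from itertools import product
from math import prod


def count_joint_path_profiles(paths_per_traveler):
    """Number of combinations with one path chosen per traveler."""
    return prod(len(paths) for paths in paths_per_traveler)


# Hypothetical setting: each interacting traveler has five candidate paths.
candidate_paths = ["p1", "p2", "p3", "p4", "p5"]
growth = {
    k: count_joint_path_profiles([candidate_paths] * k) for k in range(1, 5)
}

# Explicit enumeration agrees with the closed-form count for two travelers.
enumerated = list(product(candidate_paths, repeat=2))
```

Under this assumption the number of profiles is five raised to the number of interacting travelers, so each additional traveler multiplies the work of any method that examines joint profiles.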


In summary, we proposed a novel Stackelberg signaling framework to reduce the inefficiency of selfish routing in the presence of behavioral agents. We modeled the interaction between the system and quantal response travelers as a Stackelberg game, and developed a novel approximate algorithm, LoRI, that constructs strategic, personalized information regarding the state of the network. The system presents this information as a private signal to each traveler to steer their route decisions towards socially optimal outcomes. We demonstrated the performance of LoRI and compared it with that of an SSSP algorithm on a Wheatstone network with multi-modal routes. We also presented the tradeoff between the system’s costs and runtime within the strategic information design framework. In the future, we will design computationally efficient, approximate algorithms at the system with better run-time performance. We will also consider strategic information design for diverse agent rationalities.