1 Introduction
In many applications [38], a number of agents need to use one out of a number of resources, whose per-agent cost of use depends on the number of agents using the resource concurrently. In addition to the agents, there is often a central authority in charge of all the resources, with complete and up-to-date information about their use. The central authority may or may not provide information to the agents. If no information is provided, the agents may choose the resources randomly, or using some simple policies [25]. If the central authority provides one scalar for each resource at each time instant, all agents may compare the scalars across the resources available to them in the same way, and make the same choice. Thereby, the usage of resources with the lowest announced scalar may increase sharply, while the usage of resources with the highest announced scalar may drop sharply, ultimately leading to a cyclic outcome. As an alternative, Mareček et al. [38] studied the use of randomisation in information provision, including the use of randomisation in deriving an interval to be broadcast for each resource at each time instant. Many challenges remain, though. For one, the use of randomisation, or obfuscation, may be difficult to justify in practice. The challenges are intimately related to control, but have been little studied so far.
Let us motivate our study by illustrating the cyclic outcome on the example of roads during the rush hour. Travel times are influenced by the number of people on the roads. Congestion arises when too many people want to use a particular road at the same time. This is not necessarily due to the inherent capacity limits of the road, but often due to the “synchronised” manner of travel, and the lack of foresight into the choices of other people. Imagine that there are two roads of similar capacity from one section of a ring road to the city center, and a central authority announces the travel times on the radial roads as 10 and 20 minutes, respectively. This may cause congestion on the first radial road in the short term, and subsequently lead to the congestion alternating between the two radial roads. In Appendix A, we show that under simplistic assumptions, a similar limit-cycle behaviour could be observed for any approach that picks a scalar to broadcast for each resource, as long as the scalars are distinct across resources, and that it can lead to arbitrarily bad behavior. In practice, the differences due to signalling are bounded, but the example suggests why we aim to reduce the synchronisation.
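The alternation described above can be reproduced in a few lines. The following Python sketch is only an illustration under assumptions that are not part of the text: 100 drivers, a hypothetical linear travel time of `10 + 0.2 * n` minutes on either road, drivers who all pick the road with the smaller announced time (ties go to road 0), and an authority that announces the travel times just observed.

```python
def travel_time(n_cars: int) -> float:
    """Hypothetical linear travel time in minutes for a road carrying n_cars."""
    return 10.0 + 0.2 * n_cars


def simulate_scalar_broadcast(n_drivers: int = 100, periods: int = 6,
                              announced=(10.0, 20.0)):
    """Every driver greedily picks the road with the smaller announced time.

    Returns one (cars on road 0, cars on road 1) tuple per period.
    """
    history = []
    signal = list(announced)
    for _ in range(periods):
        # All drivers see the same two scalars, so they all make the same choice.
        choice = 0 if signal[0] <= signal[1] else 1
        loads = [0, 0]
        loads[choice] = n_drivers
        history.append(tuple(loads))
        # The authority announces the travel times that were just observed.
        signal = [travel_time(loads[0]), travel_time(loads[1])]
    return history


if __name__ == "__main__":
    for period, loads in enumerate(simulate_scalar_broadcast()):
        print(period, loads)
```

Under these assumptions the whole population moves from one road to the other in every period, which is the synchronisation that the signalling schemes studied below aim to reduce.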
In this paper, we hence study the problem of information provision, which is:

non-stationary, inasmuch as the costs associated with resources are not stationary, but rather influenced by the agents’ actions

populational, inasmuch as the agents come in a variety of types, with a population described by a distribution over the types

limited in terms of feedback, inasmuch as each agent has access only to aggregate information about the state of each resource, provided by the central authority

limited in terms of the agents’ memory, inasmuch as the agents pick the resource based on the most recently provided piece of information for each resource.
The paper is structured as follows. Section 2 formalises the problem of information provision and suggests how a central authority can “desynchronise” actions of people on the roads by providing them with signals. In particular, it suggests a signalling scheme, where one interval is broadcast for each route, with the additional constraint that each interval remains consistent with past observations. Our main theorem in Section 3 shows that if the population comprises agents who are risk-averse to varying degrees over time, we can improve the social outcome using interval signaling with the intervals formed by extremes of the values encountered so far. In Section 4, we demonstrate the considerable impact in simulations. We conclude with an overview of related and potential future work in Sections 5 and 6, respectively.
2 Model
We consider a dynamic discrete-time model of congestion, suggested above and illustrated in Figure 1. First, we describe the actions, then the signals, and finally the response of the population to the signals, which can be seen as a mapping from signals to actions.
2.1 Actions and their Costs
A finite population of $N$ agents is confronted with $M$ alternative choices at every time step. The alternative actions are denoted by $\{1, \dots, M\}$ and time is discretised into periods $t = 1, 2, \dots$. Let $a_t^i$ denote the choice of agent $i$ at time $t$ and $n_t^m$ be the number of agents choosing action $m$ at time $t$. Throughout the paper, we assume that each agent has to pick one of the $M$ actions at every time $t$.
The alternative actions are perfectly substitutable, i.e., each agent decides based only on the cost. The cost of action $m$ at time $t$ is a function of the number $n_t^m$ of agents that pick $m$ at time $t$. We let $n_t$ denote the vector $(n_t^1, \dots, n_t^M)$, where $M$ is the number of actions. Let $c_m$ denote the so-called cost function for action $m$. If $n_t^m$ agents choose action $m$ at time $t$, the cost of action $m$ at time $t$ to any single one of them is $c_m(n_t^m)$. We assume that all $c_m$ are continuous. Figure 2 gives an example of two cost functions.
The social cost weights the costs of the actions at time $t$ with the proportions of agents taking them, i.e.,
$$C(n_t) = \sum_{m=1}^{M} \frac{n_t^m}{N} \, c_m(n_t^m), \qquad (1)$$
where $N$ is the number of agents. The social cost corresponding to the example cost functions is shown in Figure 2. Of further interest is the time-averaged social cost over the first $T$ periods:
$$\frac{1}{T} \sum_{t=1}^{T} C(n_t). \qquad (2)$$
We study a number of signaling schemes and responses from agents.
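The social cost and its time average described in this subsection can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the representation of a congestion profile as a list of per-action loads, and the linear cost function in the usage example are all assumptions made here.

```python
from typing import Callable, Sequence

CostFn = Callable[[float], float]


def social_cost(loads: Sequence[float], cost_fns: Sequence[CostFn]) -> float:
    """Cost of each action, weighted by the proportion of agents taking it."""
    total = sum(loads)
    if total <= 0:
        raise ValueError("at least one agent is required")
    return sum((n / total) * cost(n) for n, cost in zip(loads, cost_fns))


def time_averaged_social_cost(load_history: Sequence[Sequence[float]],
                              cost_fns: Sequence[CostFn]) -> float:
    """Plain average of the per-period social cost over the recorded periods."""
    if not load_history:
        raise ValueError("empty history")
    costs = [social_cost(loads, cost_fns) for loads in load_history]
    return sum(costs) / len(costs)


if __name__ == "__main__":
    linear = lambda n: 10.0 + 0.2 * n  # hypothetical cost function
    print(social_cost([100, 0], [linear, linear]))
    print(social_cost([50, 50], [linear, linear]))
```

With two identical increasing cost functions, an even split yields a lower social cost than sending all agents to one action, which is the effect that desynchronisation is meant to exploit.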
2.2 Signaling Schemes
We introduce signaling schemes, which communicate information about the past cost of the actions. Let denote the history of congestion costs up to time :
Let denote the set of all possible histories at time . For a fixed integer , a signaling scheme is a set of mappings , where denotes the signal that the central authority broadcasts to all agents at time .
In scalar signaling schemes, we have , one scalar value for each action. In interval signaling schemes, , and , with . Notice that scalar signaling schemes are equivalent to interval signaling schemes with , but may perform worse, by an arbitrary amount, as per Appendix A. Notice that these signaling schemes summarise the history of observations .
In a signaling scheme that we call extreme, for any fixed positive integer , the central authority broadcasts the same signal to all agents at time , where
In a signaling scheme that we call subinterval, for any fixed positive integer , the central authority broadcasts a signal to all agents at time , such that
(3) 
for all . Notice that extreme signaling is a special case of subinterval signaling.
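To make the two schemes concrete, the following Python sketch computes an extreme signal as the minimum and maximum of the costs observed within a window of the most recent periods, and checks whether a candidate interval signal stays within those extremes. The window length `r`, the layout of `cost_history` (one row of per-action costs per period), and the reading of subinterval signaling as "each broadcast interval lies inside the corresponding extreme interval" are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def extreme_signal(cost_history: Sequence[Sequence[float]], r: int) -> List[Interval]:
    """Per-action (min, max) of the costs observed in the last r periods.

    cost_history[t][m] is the cost of action m observed in period t.
    """
    if r < 1:
        raise ValueError("r must be a positive integer")
    if not cost_history:
        raise ValueError("no observations yet")
    window = cost_history[-r:]
    n_actions = len(window[0])
    return [
        (min(row[m] for row in window), max(row[m] for row in window))
        for m in range(n_actions)
    ]


def is_subinterval_signal(signal: Sequence[Interval],
                          cost_history: Sequence[Sequence[float]], r: int) -> bool:
    """True if every broadcast interval lies within the extreme interval."""
    bounds = extreme_signal(cost_history, r)
    return all(lo <= u <= v <= hi for (u, v), (lo, hi) in zip(signal, bounds))


if __name__ == "__main__":
    history = [[30.0, 10.0], [10.0, 30.0], [20.0, 20.0]]
    print(extreme_signal(history, r=2))
```

By construction, the output of `extreme_signal` always passes `is_subinterval_signal`, mirroring the remark that extreme signaling is a special case of subinterval signaling.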
2.3 Agent Population and Policies
In response to the history of signals received prior to time and including it, every agent takes action . For example, this action can be a function of only the signal at a single time step . We assume that every agent acts based only on the signals, without considering the response of other agents to its own action. This is a reasonable assumption for three reasons. First, it is hard for the agent to obtain more information than the signal sent by the central authority. Second, the agents know that the signals received are consistent with past observations. Finally, when there is a large number of agents, each has a very limited effect on the population as a whole.
Formally, let denote the history of signals broadcast up to time :
Let denote the set of possible realisations of signal histories up to time . A mapping of a signal history to an action, , is called a policy. We assume that the number of agents is fixed over time. We let denote the set of all possible types of agents. Each type is associated with a policy, and every agent of type follows the policy .
We model the evolution of the number of agents of each type as follows. Let denote a finite set of probability measures over . For instance, for each subset , can be interpreted as a fraction of agents with policy , except that, for simplicity of analysis, the product does not have to be an integer. We let denote a probability measure over , i.e., a probability measure over a set of probability measures over . The distribution of agents among types over time steps is an i.i.d. sequence of random variables , where the distribution of is defined as for all . This allows us to model a population of agents that changes over time, e.g., one driver leaves the road network and is replaced by another driver, with another policy. For simplicity, we call the population profile at time .
2.4 policies
In the case of extreme signaling, we consider a set of agent types , which is a finite set of numbers, each of which is within . Recall that every agent receives the interval signal . In response, we assume that each agent of type follows the policy :
(4) 
with the minimiser chosen uniformly at random, if nonunique. This policy is a greedy heuristic, which seems natural, when one considers the following special cases:

Risk-seeking , i.e., acts based only on the best-case elements

Risk-averse , i.e., acts based only on the worst-case elements

Risk-neutral , i.e., acts based on the midpoints .
Notice that this policy (4) could also model convexifying “multi-objective” agents, e.g., 90% risk-seeking and 10% risk-averse.
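A greedy policy of this kind can be sketched in Python as below. The parameterisation is an assumption made for illustration: each agent type is a weight `risk_weight` in [0, 1] placed on the upper endpoint of each interval, so that 0 corresponds to a risk-seeking agent, 1 to a risk-averse agent, and 0.5 to a risk-neutral agent acting on midpoints. The paper's own convention for the type parameter may be oriented differently.

```python
import random
from typing import Optional, Sequence, Tuple

Interval = Tuple[float, float]


def choose_action(signal: Sequence[Interval], risk_weight: float,
                  rng: Optional[random.Random] = None) -> int:
    """Pick the action minimising a convex combination of interval endpoints.

    risk_weight is the (assumed) weight on the worst-case endpoint.
    Ties are broken uniformly at random.
    """
    if not 0.0 <= risk_weight <= 1.0:
        raise ValueError("risk_weight must lie in [0, 1]")
    if rng is None:
        rng = random.Random()
    scores = [(1.0 - risk_weight) * low + risk_weight * high for low, high in signal]
    best = min(scores)
    minimisers = [m for m, score in enumerate(scores) if score == best]
    return rng.choice(minimisers)


if __name__ == "__main__":
    signal = [(5.0, 40.0), (15.0, 20.0)]  # hypothetical intervals for two actions
    for weight in (0.0, 0.1, 0.5, 1.0):
        print(weight, choose_action(signal, weight))
```

In the usage example, agents with a small weight prefer the first action with the attractive best case, while risk-neutral and risk-averse agents prefer the second action with the narrow interval, so one and the same broadcast signal splits the population.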
3 A Stability Analysis
In this section, we analyze the impact of extreme signaling. We show that it is stable in the sense that the population profile converges in distribution under mild assumptions. In particular, we study the case where the parameter is a function of the time step .
Assumption 1 (i.i.d. , “Population Renewal”).
The distribution is an i.i.d. sequence of random variables with , for all and , .
Notice that the population renewal assumption does not entail the elements of the random vector being independent. We show that under the population renewal assumption, the congestion profile converges for Lipschitz continuous cost functions for . Observe that, for example, the function is Lipschitz.
Theorem 1 (“Asymptotic Stability”).
There exists a constant , such that under Assumption 1, if the functions are Lipschitz continuous for , there exists a unique limit, an dimensional random variable such that the congestion profile converges to in distribution as .
The proof relies on the following result from iterated function systems, only trivially adapted from Barnsley et al. [13, 14]:
Proposition 2.
Let us have an index set , a family of functions indexed by , and a family of real numbers indexed by , . Let us have another infinite family , where for all , be i.i.d. such that for all . If, for all ,
then the limit exists and is independent of .
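The flavour of this result can be illustrated numerically. The Python sketch below is not the setting of the proof: it assumes a hypothetical one-dimensional system of two affine maps, one contracting and one expanding, chosen i.i.d. with probabilities such that the expected logarithm of the slope is negative (contraction on average). Two trajectories driven by the same random sequence of maps but started from different points then become indistinguishable, which is the sense in which the limit does not depend on the initial condition.

```python
import math
import random

# Hypothetical affine maps x -> a * x + b and their selection probabilities.
MAPS = [(0.5, 1.0), (1.2, -0.3)]
PROBS = [0.7, 0.3]


def mean_log_slope() -> float:
    """Expected logarithm of the slope; negative means contraction on average."""
    return sum(p * math.log(abs(a)) for p, (a, _) in zip(PROBS, MAPS))


def sample_indices(steps: int, seed: int):
    """Draw an i.i.d. sequence of map indices."""
    rng = random.Random(seed)
    return rng.choices(range(len(MAPS)), weights=PROBS, k=steps)


def iterate(x0: float, indices) -> float:
    """Apply the selected maps one after another, starting from x0."""
    x = x0
    for i in indices:
        a, b = MAPS[i]
        x = a * x + b
    return x


if __name__ == "__main__":
    idx = sample_indices(500, seed=0)
    print("mean log slope:", mean_log_slope())
    print("gap after 500 steps:", abs(iterate(-10.0, idx) - iterate(10.0, idx)))
```

Note that the second map is expanding, so the system is not a contraction in every step; only the average behaviour matters in this illustration.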
Proof.
The proof proceeds in two steps. First, we show that the signal process is an iterated function system. Then, we show that it converges in distribution.
(Step 1)
In order to apply Proposition 2, we construct an iterated function system in .
First, let us recall the definitions introduced previously:
(5) 
where for all :
(6)  
Next, let us see that is a random variable:
(7)  
(8) 
Observe that by Assumption 1, the sequence is i.i.d. . Hence, is a random variable.
Recall that is finite and the support of the random variable is a finite set of probability measures, where each
(9) 
is such that
(10)  
Plugging (8) into (6), it follows that there exists a set of functions (as many as possible values of ) and a sequence of i.i.d. random variables such that
(11)  
where each corresponds to a realisation of the random variable . Hence, the process is generated by an iterated function system.
(Step 2)
In order to apply Proposition 2,
let us consider two signals , as defined in (5).
We want to show that for
all , we have
.
and for some , we have
.
The former is clear, whereas to show the latter, we need to establish that
there exists such that, for all , the event
has positive probability. The above event corresponds to the event
which has positive probability for all finite .
We have, by definition (6),
(12)  
where denotes the congestion profile at time when the signal is broadcast to all agents. We denote the two summands on the right-hand side by . First, we bound ; the other summand can be bounded by a similar argument. We have four cases:
We only need to consider Case 2, which occurs with probability bounded away from zero by the above argument.
Under Case 2, we have
Observe that
(13) 
where is an interval obtained by solving the system of inequalities:
(14)  
Hence, there exists a constant such that
In turn, we obtain
(15)  
4 Simulations
Although the case of infinite recall is amenable to analysis, the case of finite recall is more realistic. We hence simulate the finite-recall case on a benchmark [12] for the traffic assignment problem, where the cost function captures the travel time and each action corresponds to one path between two vertices. The travel time for a path $p$ of an agent at time $t$ is the sum of the travel times over the edges $e \in p$, where the travel time on each edge is the Bureau of Public Roads (BPR) function of the number $n_t^e$ of agents passing over $e$:
$$c_e(n_t^e) = f_e \left( 1 + \alpha_e \left( \frac{n_t^e}{k_e} \right)^{\beta_e} \right) \qquad (18)$$
where $k_e$ is the capacity of $e$, $f_e$ is the free-flow time of $e$, and $\alpha_e, \beta_e$ are constants, again particular to $e$, often $\alpha_e = 0.15$ and $\beta_e = 4$.
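For reference, the standard BPR function and the path travel time as a sum over edges can be written in Python as follows. The function and parameter names are chosen here for illustration, and the defaults 0.15 and 4 are the commonly used values of the two constants rather than values confirmed for every edge of the benchmark.

```python
from typing import Dict, Hashable, Sequence


def bpr_travel_time(flow: float, capacity: float, free_flow_time: float,
                    alpha: float = 0.15, beta: float = 4.0) -> float:
    """Standard BPR travel time on one edge carrying `flow` agents."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)


def path_travel_time(path: Sequence[Hashable],
                     flows: Dict[Hashable, float],
                     capacities: Dict[Hashable, float],
                     free_flow_times: Dict[Hashable, float]) -> float:
    """Travel time of a path as the sum of BPR travel times over its edges."""
    return sum(
        bpr_travel_time(flows[e], capacities[e], free_flow_times[e])
        for e in path
    )


if __name__ == "__main__":
    # Hypothetical edge with capacity 100 and free-flow time 6 minutes.
    for flow in (0, 100, 200):
        print(flow, bpr_travel_time(flow, 100.0, 6.0))
```

With these defaults, the travel time at capacity is 15% above the free-flow time, and it grows with the fourth power of the flow beyond that, which is why flows well above capacity produce very large values.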
For simplicity, we send out signals specific to each edge , rather than for each possible path. We also replace the social cost by the agent- and edge-wise sum . Although this setup may seem rather arbitrary, a similar setup has been used in hundreds of papers [43] on the traffic assignment problem in transportation science.
On two instances, we show that the interval signaling we propose results in a convergent regret, i.e., a convergent distance to the social cost at the stochastic user equilibrium. On an artificial instance, which we call Diamond, the regret goes to 0 after a small number of iterations. On the well-known Sioux Falls instance of LeBlanc et al. [36], we improve the social cost considerably, when compared to the best known stochastic user equilibrium, as reported by Hillel Bar-Gera [12].
4.1 The Procedural Details
Let us now clarify a number of procedural details. First, notice that without knowing the congestion profile at time , it is difficult to enforce the capacity constraint . Consequently, the term tends to produce outliers in terms of across all , which are just modeling artifacts. In most of our simulations, we hence apply a cap on the travel time:
(19) 
This eliminates the outliers, but makes it necessary to track the violation of capacity constraints by other means. To that end, we introduce the capacity excess:
(20) 
which captures the aggregate amount of violation of the constraint .
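One way to realise the cap and the excess in code is sketched below in Python. Two assumptions are made here and should be read as such: the travel time is capped at its value at capacity, which is how the cap is described later for the Diamond instance, and the excess is taken to be the sum over edges of the positive part of flow minus capacity. The standard BPR form with the common constants 0.15 and 4 is also assumed.

```python
from typing import Sequence


def capped_travel_time(flow: float, capacity: float, free_flow_time: float,
                       alpha: float = 0.15, beta: float = 4.0) -> float:
    """BPR travel time, capped at the travel time attained at capacity.

    Because the BPR function is non-decreasing in the flow, capping the
    travel time is the same as evaluating it at min(flow, capacity).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    effective_flow = min(flow, capacity)
    return free_flow_time * (1.0 + alpha * (effective_flow / capacity) ** beta)


def capacity_excess(flows: Sequence[float], capacities: Sequence[float]) -> float:
    """Assumed aggregate violation: sum of flow above capacity over all edges."""
    return sum(max(0.0, f - c) for f, c in zip(flows, capacities))


if __name__ == "__main__":
    print(capped_travel_time(200.0, 100.0, 6.0))
    print(capacity_excess([120.0, 80.0, 150.0], [100.0, 100.0, 100.0]))
```

Tracking the excess separately keeps the information that the cap discards: two flows above capacity have the same capped travel time but different excess.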
We have disregarded tolls and distances, discretised time, and proceeded as follows in each period:

Generate the population with size in types, where we assume the throughout. The proportion of each type in the population is sampled from the uniform distribution for the first types, with the remainder for the final one. Specifically, we use and .

Generate signals for each , depending on whether we cap the travel time, using the history of congestion cost up to . If the history contains or more per-link costs recorded, we use the minimum and maximum within the most recent travel times (possibly capped) for the edge. Otherwise, we use signal for each path to initialise the simulation.

Compute the number of agents passing over each edge . For each and each origin-destination pair we pick acyclic paths
(21)
where, with some abuse of notation, are all acyclic paths between origin and destination . If there are multiple such paths, , we subdivide the number of agents of the given type that travel between the given and into equal parts , which need not be a whole number. For each edge on each path , we then add the part to the traffic on .

Generate per-path costs. For each origin-destination pair , we again consider paths as in (21) and sum up the per-edge travel times.

Compute the social cost, by summing up across all origin-destination pairs and all paths as in (21), the product of the per-path cost, the proportion of the population corresponding to the path, and the cardinality of the population.

Move to the next period, .
This makes it possible to plot the evolution of the social cost over time and the evolution of the sum of the excesses (20) across all links over time for scalar signaling using the most recent travel time (NOW), means of values seen so far (MEAN), and extreme signaling . In plots of the social cost and excess, we also plot the corresponding value of the stochastic user equilibrium not considering information provision, either for the global optimum, where known, or for the best known equilibrium as reported by Hillel Bar-Gera [12].
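The per-period procedure above can be condensed into a toy Python simulation. Everything specific in this sketch is an assumption made for illustration and is not taken from the experiments: a single origin-destination pair joined by two parallel links with hypothetical capacities and free-flow times, three agent types with weights 0, 0.5, and 1 on the upper endpoint of each interval, proportions of the first two types drawn uniformly from [0, 0.5], uncapped BPR travel times, and free-flow times as the initial signal. Only the NOW and extreme schemes are included.

```python
import random

LINKS = [(50.0, 10.0), (100.0, 15.0)]  # hypothetical (capacity, free-flow time)
WEIGHTS = [0.0, 0.5, 1.0]              # hypothetical weights on the worst case


def bpr(flow, capacity, free_flow, alpha=0.15, beta=4.0):
    """Standard BPR travel time."""
    return free_flow * (1.0 + alpha * (flow / capacity) ** beta)


def simulate(scheme, r=3, n_agents=100.0, periods=60, seed=0):
    """Return the social cost per period under the 'NOW' or 'EXTREME' scheme."""
    rng = random.Random(seed)
    history = []       # history[t][e] is the travel time on link e in period t
    social_costs = []
    for _ in range(periods):
        # Step 1: sample the population profile for this period.
        p0, p1 = rng.uniform(0.0, 0.5), rng.uniform(0.0, 0.5)
        shares = [p0, p1, 1.0 - p0 - p1]
        # Step 2: generate one interval per link.
        if not history:
            signal = [(ff, ff) for _, ff in LINKS]
        elif scheme == "NOW":
            signal = [(c, c) for c in history[-1]]
        elif scheme == "EXTREME":
            window = history[-r:]
            signal = [(min(row[e] for row in window), max(row[e] for row in window))
                      for e in range(len(LINKS))]
        else:
            raise ValueError("unknown scheme: " + scheme)
        # Step 3: each type picks its minimisers; ties are split equally.
        flows = [0.0] * len(LINKS)
        for share, w in zip(shares, WEIGHTS):
            scores = [(1.0 - w) * low + w * high for low, high in signal]
            best = min(scores)
            winners = [e for e, s in enumerate(scores) if s == best]
            for e in winners:
                flows[e] += share * n_agents / len(winners)
        # Steps 4 and 5: per-link costs and the social cost.
        costs = [bpr(flows[e], cap, ff) for e, (cap, ff) in enumerate(LINKS)]
        social_costs.append(sum(f * c for f, c in zip(flows, costs)))
        history.append(costs)
    return social_costs


if __name__ == "__main__":
    for scheme, r in (("NOW", 1), ("EXTREME", 3), ("EXTREME", 60)):
        costs = simulate(scheme, r=r)
        print(scheme, r, sum(costs) / len(costs))
```

In this toy setting the NOW scheme sends the whole population back and forth between the two links, whereas extreme signaling with a long window lets the risk-seeking type and the other types settle on different links. The numbers it prints say nothing about the Diamond or Sioux Falls instances.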
4.2 The Diamond Instance
First, we present experiments on an instance on five nodes, 1, 2, …, 5, with five links 12, 23, 24, 35, and 45, which form a “diamond shape”. There, links 35 and 45 have very high capacity and identical cost functions. Each of the links 23 and 24 can carry half of the total traffic, but their cost functions differ markedly, as suggested by columns 3 and 5–7 in Table 1. The two files presented in Table 1 can be provided as input to a variety of tools developed in transportation engineering, and hence allow for cross-comparison and cross-validation of our results.
The Diamond instance illustrates the phenomenon of flapping well. The split of the traffic across links 23 and 24 is illustrated in the upper half of Figure 3, where for scalar signaling (NOW, MEAN), the traffic oscillates between paths 1235 and 1245, whereas the higher the , the smaller are the period-to-period changes for extreme signaling. This corresponds to much lower social cost and capacity excesses for extreme signaling, compared to scalar signaling using means or most recent values, as suggested in the bottom half of Figure 3.
Further, notice that the social cost approaches that of the best possible stochastic user equilibrium, highlighted by the red dashed line in Figure 3. The unique minimum of the uncapped cost at the stochastic user equilibrium, without considering information provision, of approx. 621.229 can be found by minimising over the interval . The corresponding capped cost is 322.307 and excess 15.985. Hence, the regret approaches 0, in this particular case.
In Figure 3, we have capped the travel time at its value at capacity and counted the excess separately. When we do not cap the travel time at capacity, the behavior in terms of the proportions of traffic going either way is similar, as can be seen by comparing Figures 3 and 4, while the absolute difference between the social costs of using the most recent time and extreme signaling increases with the number of agents on the road.
4.3 Sioux Falls
Next, we have tested the signaling on the well-known Sioux Falls instance of LeBlanc et al. [36], displayed in Figure 6. Since the 1970s, this instance has attracted much attention in the transportation engineering community [40, 36, 1, 24], serving as a benchmark for the traffic assignment problem. In particular, we have used the variant distributed by Bar-Gera [12], which corresponds to 360,600 agents moving through a network of 76 road segments with 24 junctions.
The best-known stochastic user equilibrium, as available from Bar-Gera [12], has a capped cost of 3853754.650 with an excess of 265068.520 and an uncapped cost of 7480225.345 with the same excess. (Notice that these numbers differ from those reported by Bar-Gera, because our objective functions differ.) With the cap on the travel time given by the capacity, as in (19) above, the use of extreme signaling leads to a lower social cost with a lower excess, as suggested in Figure 5. Without the cap on the travel time, the results are more varied, and heavily skewed by a small number of enormous values. Consequently, extreme signaling seems to perform the best for , although this surprising behavior merits further study.
5 Related Work
There is related work being done in applied probability, control, operations research, theoretical computer science, and traffic theory. There are a number of excellent surveys [52, 32, 55, 43, 25] available, although it may be difficult even for a book-length survey to be fully comprehensive.
Within applied probability, the rich history of work on the multi-action restless bandits problem, e.g., [57, 56, 17], has been summarised by Gittins et al. [25]. See the work of Glazebrook [27, 26] for some of the best current results. The replacement of a single scalar of feedback per arm played has been suggested [37, 4, 3] in the bandits literature, often in connection to revealing the outcome of further arms as well. We are not aware of any bilevel extensions, e.g., seeing the problem from the point of view of the owner of the bandit.
Within game theory, the social cost is the metric of a number of studies [48, 46, 22, 19], which show that, even when agents have full information, a natural equilibrium outcome can incur much higher total congestion than a socially optimal outcome. This is known as the price of anarchy. Particularly interesting are studies of the Nash equilibria in connection with ignorance [9, 5, 10], often concerning the number of players [7, 8, 6], failures of agents [39], failures of resources [44, 45], or stratified and risk-averse populations [31, 47]. Indeed, our work can be seen as showing the benefits of ignorance to a stratified and risk-averse population, albeit over the long run. We can hence describe the attractor, whose existence is often moot in the studies of Nash equilibria. See [21] for further well-developed arguments why considering the fixed points of a dynamical system is preferable to the study of the Nash equilibrium.
Within economics, our work is reminiscent of the equilibrium outcome of Sobel [11] in the context of signaling games. See [54] for an up-to-date survey. Our work is also reminiscent of large bodies of work on follow-the-perturbed-leader [28], trembling-hand equilibrium [51], and stochastic fictitious play [29], inasmuch as we also study repeated decisions and the decisions are random variables. However, our scheme separates the decision making of the central authority from the decision making of the agents, and uses non-trivial procedures for the former decisions.
In the transportation literature, Daganzo and Sheffi [23] have introduced the concept of stochastic user equilibrium, where users have considerable amounts of information, and perhaps surprising analytical powers. Subsequently, a number of variants have been proposed, e.g., [18] consider robust variants and [2] consider the stochastic user equilibrium with distributional uncertainty over the travel times. [35] consider a stratified and risk-averse population, but only as far as a link failure is concerned. There are a number of other notions of stability, perhaps closer to our notion, inasmuch as they capture the repeated nature of the problem. For example, [53] introduces the notion of equilibrium as the limit of the congestion distribution, if it exists. [30] considers a number of notions of noisy signals and studies greedy policies and equilibria. Our approach is different from those that assign actions to agents, instead of presenting them with information and letting them make the decisions.
On the interface of transportation and control theory, there has been a recent interest in load balancing [50, 49]. These schemes, however, rely on simple randomisation, without modelling heterogeneous agent behaviour and actions and without allowing for the same information to be provided to all agents. On the interface of transportation and behavioral science, Ben-Akiva et al. [16] have studied the effects of information on drivers. In a number of subsequent papers [34, 15] and the dissertation of Bottom [20], fixed points have been used to study deterministic scalar signaling with deterministic response of the population. See [42] for an extensive survey. In contrast, our analysis can be seen as a study of a probabilistic counterpart of fixed points, which allows for the uncertainty in the response of the population.
Throughout, we are not aware of any theoretical guarantees on the behavior of policies similar to ours, as described in this paper and [38]. Specifically, we are not aware of any other paper that would study the broadcasting of intervals, instead of scalars, show its superiority, or study the behaviour of systems where such signals are being provided. Compared to this paper, the setup of [38] is much simpler, and so are the proofs and simulations. Unlike [38], the approach presented in this paper does not employ randomisation, whose use may be unacceptable to the general public, allows for the same signal to be broadcast to all agents, such as at roadside displays, and considers a more elaborate model of the populational response, with risk-aware agents. Both this paper and [38] suggest the importance of the control-theoretic aspects of information provision.
6 Conclusion and Future Work
We have introduced a novel interval signaling scheme. As opposed to scalar signaling schemes, interval signaling schemes have tremendous potential in reducing the social cost of congestion and present a major step forward in a number of applications that allow for agent-based models. This includes transportation and congestion management more broadly. These applications also open a number of questions throughout the possible applications of the approach as well as within cognitive science and control theory.
Key questions in cognitive science include: To what extent do human populations react to any signals? How do human populations react to interval signals? What are the factors to consider in modelling the populational response, outside of risk aversion? What incentives would be most appropriate in improving the response? Answers to such questions should be of considerable interest to the optimisation and control communities.
Key questions in control theory include: Can our stability result be extended to other non-scalar signals, e.g., histograms? Can our stability result be extended to more general stochastic populations, e.g., evolving as a Markovian process? How to reason about policies, where the intervals are obtained by optimisation over the interval signals to send in the following period, subject to the signals being truthful in some sense? Perhaps most importantly: the model could be seen as a bilevel optimisation problem, with the information provision at the upper level and the choice of action at the lower level. For bilevel optimisation problems, even solving the first-order optimality conditions [33, 41] presents a major challenge, whereas our approach provides certain guarantees for a certain solution to a certain bilevel optimisation problem. Could this be generalised? We hope to answer some of these questions in due course.
Finally, one could consider further applications. What is the performance of interval signaling schemes beyond transport applications, e.g., in ad keyword auctions, electricity consumption time slots, and emergency evacuation routes? Some could, indeed, be of considerable independent interest.
References
[1] M. Abdulaal and L. J. LeBlanc. Continuous equilibrium network design models. Transport. Res. B: Meth., 13(1):19–32, 1979.
[2] S. D. Ahipasaoglu, R. Meskarian, T. L. Magnanti, and K. Natarajan. Beyond normality: A cross moment-stochastic user equilibrium model. Transport. Res. B: Meth., (0):–, 2015.
[3] N. Alon, N. Cesa-Bianchi, O. Dekel, and T. Koren. Online learning with feedback graphs: Beyond bandits. arXiv preprint arXiv:1502.07617, 2015.
[4] N. Alon, N. Cesa-Bianchi, C. Gentile, and Y. Mansour. From bandits to experts: A tale of domination and independence. In Advances in Neural Information Processing Systems, pages 1610–1618, 2013.
[5] N. Alon, Y. Emek, M. Feldman, and M. Tennenholtz. Bayesian ignorance. In Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing, pages 384–391. ACM, 2010.
[6] N. Alon, R. Meir, and M. Tennenholtz. The value of ignorance about the number of players. In AAAI (Late-Breaking Developments), volume WS-13-17 of AAAI Workshops. AAAI, 2013.
[7] I. Ashlagi, D. Monderer, and M. Tennenholtz. Routing games with an unknown set of active players. In Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, page 195. ACM, 2007.
[8] I. Ashlagi, D. Monderer, and M. Tennenholtz. Two-terminal routing games with unknown active players. Artificial Intelligence, 173(15):1441–1455, 2009.
[9] M.-F. Balcan, A. Blum, and Y. Mansour. The price of uncertainty. In Proceedings of the 10th ACM conference on Electronic commerce, pages 285–294. ACM, 2009.
[10] M.-F. Balcan, F. Constantin, and S. Ehrlich. The snowball effect of uncertainty in potential games. In N. Chen, E. Elkind, and E. Koutsoupias, editors, Internet and Network Economics, volume 7090 of Lecture Notes in Computer Science, pages 1–12. Springer Berlin Heidelberg, 2011.
[11] J. S. Banks and J. Sobel. Equilibrium selection in signaling games. Econometrica, pages 647–661, 1987.
[12] H. Bar-Gera. Transportation network test problems. http://www.bgu.ac.il/%7Ebargera/tntp/, June 1st, 2015.
[13] M. Barnsley, S. Demko, J. Elton, and J. Geronimo. Invariant measures for Markov processes arising from iterated function systems with place-dependent probabilities. Annales de l’institut Henri Poincaré (B) Probabilités et Statistiques, 24(3):367–394, 1988.
[14] M. F. Barnsley, J. H. Elton, and D. P. Hardin. Recurrent iterated function systems. Constructive Approximation, 5(1):3–31, 1989.
[15] M. Ben-Akiva, J. Bottom, and M. S. Ramming. Route guidance and information systems. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, 215(4):317–324, 2001.
[16] M. Ben-Akiva, A. De Palma, and K. Isam. Dynamic network models and driver information systems. Transportation Research Part A: General, 25(5):251–266, 1991.

[17] D. Bertsimas and J. Niño-Mora. Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res., 48(1):80–90, 2000.
[18] D. Bertsimas and M. Sim. Robust discrete optimization and network flows. Math. Program., 98(1-3):49–71, 2003.
[19] U. Bhaskar, L. Fleischer, and C.-C. Huang. The price of collusion in series-parallel networks. In F. Eisenbrand and F. Shepherd, editors, Integer Programming and Combinatorial Optimization, volume 6080 of Lecture Notes in Computer Science, pages 313–326. Springer Berlin Heidelberg, 2010.
[20] J. A. Bottom. Consistent anticipatory route guidance. PhD thesis, Massachusetts Institute of Technology, 2000.
 [21] G. E. Cantarella and E. Cascetta. Dynamic processes and equilibrium in transportation networks: Towards a unifying theory. Transportat. Sci., 29(4):305–329, 1995.
 [22] J. R. Correa, A. S. Schulz, and N. E. StierMoses. On the inefficiency of equilibria in congestion games. In M. Jünger and V. Kaibel, editors, Integer Programming and Combinatorial Optimization, volume 3509 of Lecture Notes in Computer Science, pages 167–181. Springer Berlin Heidelberg, 2005.
 [23] C. F. Daganzo and Y. Sheffi. On stochastic models of traffic assignment. Transportat. Sci., 11(3):253–274, 1977.
 [24] G. B. Dantzig, R. P. Harvey, Z. F. Lansdowne, D. W. Robinson, and S. F. Maier. Formulating and solving the network design problem by decomposition. Transport. Res. B: Meth., 13(1):5–17, 1979.
 [25] J. Gittins, K. Glazebrook, and R. Weber. Multi-armed bandit allocation indices. John Wiley & Sons, 2011.
 [26] K. Glazebrook and D. Hodge. On the asymptotic optimality of greedy index heuristics for multi-action restless bandits. Adv. Appl. Probab., 2015.
 [27] K. D. Glazebrook, D. J. Hodge, and C. Kirkbride. General notions of indexability for queueing control and asset management. Ann. Appl. Probab., 21(3):876–907, 2011.
 [28] J. Hannan. Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3:97–139, 1957.
 [29] J. C. Harsanyi. Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium points. Int. J. Game Theory, 2(1):1–23, 1973.
 [30] J. L. Horowitz. The stability of stochastic equilibrium in a two-link transportation network. Transport. Res. B: Meth., 18(1):13–28, 1984.
 [31] A. Hota, S. Garg, and S. Sundaram. Resource sharing games with failures and heterogeneous risk attitudes. In Communication, Control, and Computing (Allerton), 2013 51st Annual Allerton Conference on, pages 535–542, Oct 2013.
 [32] T. Ibaraki and N. Katoh. Resource Allocation Problems: Algorithmic Approaches. MIT Press series in the foundations of computing. MIT Press, 1988.
 [33] V. Jeyakumar, J. B. Lasserre, G. Li, and T. S. Pham. Convergent Semidefinite Programming Relaxations for Global Bilevel Polynomial Optimization Problems. ArXiv e-prints, June 2015.
 [34] D. E. Kaufman, R. L. Smith, and K. E. Wunderlich. User-equilibrium properties of fixed points in dynamic traffic assignment. Transportation Research Part C: Emerging Technologies, 6(1):1–16, 1998.
 [35] V. L. Knoop, M. G. Bell, and H. J. van Zuylen. Traffic assignment based on individual risk-attitude. In Infrastructure Systems and Services: Building Networks for a Brighter Future (INFRA), 2008 First International Conference on, pages 1–2. IEEE, 2008.
 [36] L. J. LeBlanc, E. K. Morlok, and W. P. Pierskalla. An efficient approach to solving the road network equilibrium traffic assignment problem. Transport. Res., 9(5):309–318, 1975.
 [37] S. Mannor and O. Shamir. From bandits to experts: On the value of side-observations. In Advances in Neural Information Processing Systems, pages 684–692, 2011.
 [38] J. Mareček, R. Shorten, and J. Y. Yu. Signaling and obfuscation for congestion control. Int. J. Control, 88(10):2086–2096, 2015.
 [39] R. Meir, M. Tennenholtz, Y. Bachrach, and P. Key. Congestion games with agent failures. In AAAI, 2012.
 [40] E. Morlok, J. Schofer, W. Pierskalla, R. Marsten, S. Agarwal, J. Stoner, J. Edwards, L. LeBlanc, and D. Spacek. Development and application of a highway network design model, volumes 1 and 2. Final Report: FHWA Contract Number DOT-PH-11, 1973.
 [41] J. Nie, L. Wang, and J. Ye. Bilevel Polynomial Programs and Semidefinite Relaxation Methods. ArXiv e-prints, Aug. 2015.
 [42] M. Papageorgiou, M. Ben-Akiva, J. Bottom, P. H. Bovy, S. Hoogendoorn, N. B. Hounsell, A. Kotsialos, and M. McDonald. ITS and traffic management. Handbooks in Operations Research and Management Science, 14:715–774, 2007.
 [43] M. Patriksson. A survey on the continuous nonlinear resource allocation problem. Eur. J. Oper. Res., 185(1):1–46, 2008.
 [44] M. Penn, M. Polukarov, and M. Tennenholtz. Congestion games with load-dependent failures: identical resources. In Proceedings of the 8th ACM conference on Electronic commerce, pages 210–217. ACM, 2007.
 [45] M. Penn, M. Polukarov, and M. Tennenholtz. Congestion games with failures. Discrete Appl. Math., 159(15):1508–1525, 2011.
 [46] G. Perakis. The price of anarchy when costs are nonseparable and asymmetric. In D. Bienstock and G. Nemhauser, editors, Integer Programming and Combinatorial Optimization, volume 3064 of Lecture Notes in Computer Science, pages 46–58. Springer Berlin Heidelberg, 2004.
 [47] G. Piliouras, E. Nikolova, and J. S. Shamma. Risk sensitivity of price of anarchy under uncertainty. In M. Kearns, R. P. McAfee, and É. Tardos, editors, ACM Conference on Electronic Commerce, pages 715–732, 2013.
 [48] T. Roughgarden and E. Tardos. How bad is selfish routing? J. ACM, 49(2):236–259, Mar. 2002.
 [49] A. Schlote, B. Chen, and R. Shorten. On closed-loop bicycle availability prediction. Intelligent Transportation Systems, IEEE Transactions on, PP(99):1–7, 2014.
 [50] A. Schlote, C. King, E. Crisostomi, and R. Shorten. Delay-tolerant stochastic algorithms for parking space assignment. Intelligent Transportation Systems, IEEE Transactions on, 15(5):1922–1935, Oct 2014.
 [51] R. Selten. Reexamination of the perfectness concept for equilibrium points in extensive games. Int. J. Game Theory, 4(1):25–55, 1975.
 [52] Y. Sheffi. Urban transportation networks: equilibrium analysis with mathematical programming methods. MIT Press, 1985.
 [53] M. Smith. The existence, uniqueness and stability of traffic equilibria. Transport. Res. B: Meth., 13(4):295–304, 1979.
 [54] J. Sobel. Signaling games. In Computational Complexity, pages 2830–2844. Springer, 2012.
 [55] S. M. Stefanov. Separable Programming: Theory and Methods. Applied Optimization. Springer, 2001.
 [56] R. R. Weber and G. Weiss. On an index policy for restless bandits. J. Appl. Probab., pages 637–648, 1990.
 [57] P. Whittle. Restless bandits: Activity allocation in a changing world. J. Appl. Probab., pages 287–298, 1988.
Appendix A An Analysis of Flapping
The following proposition motivates the introduction of interval signaling. Specifically, it shows that interval signaling schemes make it possible to all but eliminate a particularly bad cyclic outcome, sometimes known as “flapping” in the networking literature.
Proposition 3 (The Price of Flapping).
For every number , , and an odd integer , there exist functions , a set , a population profile , and an interval signaling scheme with social cost at at such that for every scalar signaling scheme with social cost , we have .
The example used in the proof of Proposition 3 may seem extreme, but our extensive simulations suggest that interval signaling substantially reduces the cyclic behavior encountered in scalar signaling.
Proof.
For an arbitrary constant , let us construct cost functions , for two actions, where the difference in the social cost of the resulting congestion profiles is . Consider
The optimum of the social cost is clearly achieved for congestion profiles such that .
Let be deterministic for all . For interval signaling, observe that:
which is possible to solve for such that, e.g., , even considering that the interval signaling is subinterval (3) and extreme interval signaling with . We can hence find a singleton and an initial signal such that by the argument above. This means we do observe cyclic behavior, but it is limited to elements of , i.e., the best possible congestion profile, up to rounding.
In contrast, recall that the scalar signaling scheme is equivalent to an interval signaling scheme with , when the agents follow the policies for any . For all , we have , thus becomes irrelevant, and hence we have:
For any scalar signaling scheme , the congestion profile will hence alternate between “all-or-nothing” elements of . We call this cyclic behavior “flapping”.
Hence, , and . ∎
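The flapping described in the proof can be illustrated with a minimal simulation sketch. It does not reproduce the construction used in the proof: the linear per-agent cost function `cost`, the initial signal, and all function names below are hypothetical choices made only for illustration. The sketch assumes two resources, agents who all pick the resource with the lower announced scalar, and a scalar signal equal to the cost observed in the previous period.

```python
def cost(n_on_resource, n_total):
    # Hypothetical per-agent cost (an assumption, not the paper's cost function):
    # it grows linearly with the share of agents using the resource.
    return 1.0 + n_on_resource / n_total


def simulate_scalar_signaling(n_agents, steps, initial_signal=(1.0, 2.0)):
    # Every agent sees the same pair of scalars and picks the smaller one,
    # so the whole population moves together in each period.
    signal = initial_signal
    profiles = []
    for _ in range(steps):
        choice = 0 if signal[0] < signal[1] else 1
        profile = (n_agents, 0) if choice == 0 else (0, n_agents)
        profiles.append(profile)
        # The next signal is the per-agent cost observed in this period.
        signal = (cost(profile[0], n_agents), cost(profile[1], n_agents))
    return profiles


def social_cost(profile):
    # Total cost summed over all agents for a given congestion profile.
    n_total = sum(profile)
    return sum(k * cost(k, n_total) for k in profile)


if __name__ == "__main__":
    for profile in simulate_scalar_signaling(n_agents=5, steps=6):
        print(profile, social_cost(profile))
    # Balanced split of an odd population, for comparison.
    print((3, 2), social_cost((3, 2)))
```

Under these assumptions, the congestion profile alternates between the two “all-or-nothing” profiles, and each of them has a higher social cost than the balanced split. With this particular cost function the gap is bounded; the proposition concerns cost functions for which it can be made arbitrarily large.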