Information and Memory in Dynamic Resource Allocation

04/17/2019 ∙ by Kuang Xu, et al. ∙ The University of Chicago Booth School of Business ∙ Stanford University

We propose a general framework, dubbed Stochastic Processing under Imperfect Information (SPII), to study the impact of information constraints and memories on dynamic resource allocation. The framework involves a Stochastic Processing Network (SPN) scheduling problem in which the decision maker may access the system state only through a noisy channel, and resource allocation decisions must be carried out through the interaction between an encoding policy (who observes the state) and allocation policy (who chooses the allocation). Applications in the management of large-scale data centers and human-in-the-loop service systems are among our chief motivations. We quantify the degree to which information constraints reduce the size of the capacity region in general SPNs, and how such reduction depends on the amount of memories available to the encoding and allocation policies. Using a novel metric, capacity factor, our main theorem characterizes the reduction in capacity region (under "optimal" policies) for all non-degenerate channels, and across almost all combinations of memory sizes. Notably, the theorem demonstrates, in substantial generality, that (1) the presence of a noisy channel always reduces capacity, (2) more memory for the allocation policy always improves capacity, and (3) more memory for the encoding policy has little to no effect on capacity. Finally, all of our positive (achievability) results are established through constructive, implementable policies.






1 Introduction

In many modern large-scale resource allocation systems, such as data centers, call centers and hospitals, getting reliable access to accurate system state information usually requires expensive investment in monitoring or machine learning infrastructures. Furthermore, such information can often be subject to noise, loss or misinterpretation. It is therefore crucial to understand how imperfect and noisy information affects system performance. Insights along this direction can provide crucial architectural guidelines on how to design efficient information-driven scheduling policies, and also assist with infrastructure planning by quantifying the performance benefits from better information access, and hence the tradeoffs between such benefits and the investment costs.

As a first step in this direction, we propose in this paper a new, general framework for quantifying the performance impact of information constraints on an underlying dynamic resource allocation problem. Specifically, we will focus on Stochastic Processing Networks (SPN) (Harrison, 2002, 2000, 2003; Dai and Lin, 2005) – a widely used paradigm for modeling resource allocation problems in diverse sectors, including information technology (Dai and Prabhakar, 2000; McKeown et al., 1996; Roberts and Massoulié, 2000; Tassiulas and Ephremides, 1992), manufacturing (Chen et al., 1988), call centers (Harrison and Zeevi, 2005; Gans et al., ; Mandelbaum and Stolyar, 2004), as well as other service industries – and address how information, or the lack thereof, alters the capacity region of an SPN.

We begin with an overview of the model; the formal description will be presented in Section 2. The framework, dubbed the Stochastic Processing under Imperfect Information (SPII) model, is illustrated in Figure 1, and consists of three main elements: an underlying dynamic resource allocation problem, a model of imperfect information, and memories.

I. Stochastic Processing Network Scheduling. The underlying dynamic resource allocation problem is that of scheduling in a discrete-time SPN, sometimes also known as a switched network (Stolyar, 2004; Shah et al., 2012), where a finite set of processing resources is employed to serve incoming tasks of different types. In each time slot $t$, new tasks arrive to the system in a stochastic manner, where unprocessed tasks of type $i$ are buffered in queue $i$, and the queue lengths are denoted by $\mathbf{Q}(t)$. The number of arrivals of type $i$ jobs at time $t$, $A_i(t)$, has an expected value $\lambda_i$, and we refer to the vector $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_N)$ as the arrival rate vector. The decision maker is to select from a finite schedule set, $\Pi$, an allocation vector, $\mathbf{D}(t)$, where $D_i(t)$ corresponds to the number of tasks in queue $i$ that can be processed during the present time slot. The queue lengths evolve according to the following dynamics (Footnote 2: for a vector $\mathbf{x}$, we use the notation $\mathbf{x}^+$ to mean the vector $(\max\{x_i, 0\})_i$):

$$\mathbf{Q}(t) = \left(\mathbf{Q}(t-1) - \mathbf{D}(t)\right)^+ + \mathbf{A}(t).$$

As an example, the scheduling problem involving one server and two parallel queues falls under this framework, as illustrated in Figure 2, where the schedule set contains two elements, $(1,0)$ and $(0,1)$, corresponding to the server processing a job from the first and second queue, respectively.
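The queue dynamics for the two-queue example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the update to be $\mathbf{Q}(t) = (\mathbf{Q}(t-1) - \mathbf{D}(t))^+ + \mathbf{A}(t)$ with Bernoulli arrivals, and the helper names `step`, `max_weight` and `simulate` are hypothetical. The scheduler shown is a full-information Max-Weight rule, used here purely as a baseline.

```python
import random

SCHEDULES = [(1, 0), (0, 1)]  # serve queue 1 or queue 2 (two-queue example)

def step(queues, allocation, arrivals):
    """One slot of the assumed dynamics Q(t) = (Q(t-1) - D(t))^+ + A(t)."""
    return [max(q - d, 0) + a for q, d, a in zip(queues, allocation, arrivals)]

def max_weight(queues, schedule_set):
    """Full-information baseline: pick the schedule maximizing sum_i Q_i * D_i."""
    return max(schedule_set, key=lambda d: sum(q * x for q, x in zip(queues, d)))

def simulate(rates, schedule_set, horizon, seed=0):
    """Run the queues for `horizon` slots with Bernoulli(rates[i]) arrivals."""
    rng = random.Random(seed)
    queues = [0] * len(rates)
    for _ in range(horizon):
        allocation = max_weight(queues, schedule_set)
        arrivals = [1 if rng.random() < lam else 0 for lam in rates]
        queues = step(queues, allocation, arrivals)
    return queues

# Arrival rates summing to less than 1 lie inside the full-information region.
final = simulate([0.3, 0.3], SCHEDULES, horizon=10_000)
```

Rerunning `simulate` with rates whose sum exceeds 1 is a quick way to see the queues drift upward instead of hovering near zero.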

Figure 1: System diagram. The encoder is able to fully observe the state of the queues, $\mathbf{Q}(t)$. The encoder and receiver are equipped with memories of size $k_\phi$ and $k_\psi$ bits, respectively.

II. Scheduling with Imperfect Information. As a major departure from conventional SPNs, the decision maker in our model does not have access to the full queue-length state information when making scheduling decisions. Instead, she obtains information concerning the queues only through a (noisy) channel. A channel consists of a pair of finite input and output alphabets, $\mathcal{X}$ and $\mathcal{Y}$, and a family of probability distributions $\{P_x\}_{x \in \mathcal{X}}$ over $\mathcal{Y}$. When an input signal, $x \in \mathcal{X}$, is sent through the channel, it results in a random output message, $Y$, drawn from the probability distribution $P_x$. For instance, one simple channel is that of an $\epsilon$-noisy binary symmetric channel with $\mathcal{X} = \mathcal{Y} = \{0,1\}$, where the input signal is correctly received with probability $1-\epsilon$, and is perturbed to the opposite symbol with probability $\epsilon$, i.e., $P_x(x) = 1-\epsilon$ and $P_x(y) = \epsilon$ for $x, y \in \{0,1\}$, $y \neq x$. By using different alphabets and distributions $\{P_x\}$, the channel is able to capture a variety of partial and/or lossy information models, such as controlling a data center over band-limited communication constraints, where only noisy and compressed signals of the full system state can be obtained. In this paper, we will impose little restriction on the form of the channel, in order to allow for a maximum degree of generality and modeling flexibility.
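To make the channel model concrete, the sketch below represents a channel as a list of rows, one probability distribution per input symbol, and draws an output message from the row selected by the input. The function names `bsc_matrix` and `send` are hypothetical, and symbols are identified with their integer indices as a simplifying assumption.

```python
import random

def bsc_matrix(eps):
    """Rows of an eps-noisy binary symmetric channel: row x is P_x over {0, 1}."""
    return [[1.0 - eps, eps], [eps, 1.0 - eps]]

def send(matrix, x, rng):
    """Draw one output symbol (as an index) from the distribution in row x."""
    row = matrix[x]
    return rng.choices(range(len(row)), weights=row)[0]

rng = random.Random(1)
channel = bsc_matrix(0.1)

# Empirical flip frequency when the input is always 0; it should be near eps.
trials = 10_000
flip_rate = sum(send(channel, 0, rng) != 0 for _ in range(trials)) / trials
```

Other alphabets fit the same representation: a channel with a larger input alphabet simply has more rows, and a lossy compression step can be folded into the rows themselves.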

In the presence of a channel, the allocation decisions in our model are carried out by a pair of encoding and allocation policies, situated on opposite ends of the channel (Figure 1). An encoder is co-located with the queues and has complete knowledge of the queue lengths at all times. In time slot $t$, the encoder employs an encoding policy, $\phi$, to send an input symbol $X(t)$ through the channel. The resulting random output message, $Y(t)$, arrives at the receiver, who then employs an allocation policy, $\psi$, to choose the allocation vector $\mathbf{D}(t)$. Note that the receiver does not observe the queue lengths, and in most settings considered in this paper, the information contained in $Y(t)$ is severely limited; for instance, the size of the output alphabet $\mathcal{Y}$ can be substantially smaller than the number of possible allocation vectors in $\Pi$.
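The encoder, channel and receiver interaction can be sketched end to end for the two-queue example. Everything specific here is an assumption made for illustration and is not the policy pair constructed in the paper: the memoryless encoder reports the index of the longer queue, the channel is a binary symmetric channel with flip probability `eps`, and the memoryless receiver serves whichever queue the received message names.

```python
import random

def run_spii(rates, eps, horizon, seed=0):
    """Simulate two queues controlled through a binary symmetric channel."""
    rng = random.Random(seed)
    q = [0, 0]
    for _ in range(horizon):
        # Encoder (hypothetical rule): signal the index of the longer queue.
        x = 0 if q[0] >= q[1] else 1
        # Channel: the signal is flipped with probability eps.
        y = x if rng.random() >= eps else 1 - x
        # Receiver (hypothetical rule): serve the queue named by the message.
        d = (1, 0) if y == 0 else (0, 1)
        # Bernoulli arrivals, then the assumed update Q = (Q - D)^+ + A.
        a = [1 if rng.random() < lam else 0 for lam in rates]
        q = [max(qi - di, 0) + ai for qi, di, ai in zip(q, d, a)]
    return q

# Same arrival rates under a mildly noisy channel and under a pure coin flip.
mild_noise = run_spii([0.45, 0.45], eps=0.05, horizon=20_000)
coin_flip = run_spii([0.45, 0.45], eps=0.5, horizon=20_000)
```

Note that the receiver never reads `q` directly; its only view of the system is the message `y`, which is the defining restriction of the model.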

Figure 2: An example of a Stochastic Processing Network with one server and two parallel queues. Two types of tasks arrive to the system at rates $\lambda_1$ and $\lambda_2$, respectively, and wait in two separate queues. The schedule set consists of two possible allocation vectors, $(1,0)$ and $(0,1)$, representing the server’s choice of processing one job from queue 1 or queue 2, respectively.

III. Memory. The last crucial element of the system is memory. In a classical SPN, external memory is rarely needed by a scheduling policy, because the queue lengths are fully observable and already contain all the relevant information for policies such as Max-Weight (e.g., Tassiulas and Ephremides, 1992; Dai and Lin, 2005; Stolyar, 2004; Shah et al., 2012) to make scheduling decisions. However, when the system state information becomes constrained and imperfect, memory plays a crucial role in determining the system performance (for details, see Theorem 3 and the subsequent remark). For instance, with memory, the allocation policy could aggregate past messages across multiple time slots to better estimate the system state and inform its scheduling decisions, and similarly, the encoding policy could benefit by remembering past transmissions in order to better tailor future input signals.

For this reason, we will allow for the possibility that the encoder and receiver have at their disposal a finite memory of $k_\phi$ and $k_\psi$ bits, respectively, in which information can be recorded and subsequently retrieved in the next time slot. The encoding policy may generate the input signal based on the queue lengths as well as the state of the encoder memory, and similarly, the allocation policy may choose the allocation vector based on the received output message along with the state of the receiver memory. If $k_\phi$ or $k_\psi$ is equal to 0, we say that the corresponding encoding or allocation policy is memoryless.

Finally, we say that the system has memory-feedback if the state of the receiver memory is accessible by the encoding policy. The existence of memory-feedback would correspond to applications in which the backward communication from the receiver to the encoder is lossless and relatively cheap. Arguably, this is not a very restrictive assumption within the SPII framework, considering that the receiver must, in any event, communicate the allocation decision, $\mathbf{D}(t)$, back to the queues, and hence sending along the state of the receiver memory at the same time should not incur too much additional overhead. A number of main results in the present paper will rely on this assumption, and we expect to relax it in future work.

Figure 3: Illustration of the reduction of capacity region due to lossy information, for the two-queue-one-server SPN in Figure 2. The triangle with vertices $(0,0)$, $(1,0)$ and $(0,1)$ represents the maximum capacity region, $\Lambda$, when the queue lengths are fully observable. The rectangle dominated by $\boldsymbol{\mu}$ represents the capacity region under an uninformative channel and an allocation policy that induces an average service rate of $\boldsymbol{\mu}$. The triangle with vertices $(0,0)$, $(\rho,0)$ and $(0,\rho)$ represents the scaled (“shrunk”) version of the maximal capacity region that is contained in the rectangle.

1.1 Performance Metric: Capacity Factor

The main quantity of interest is the capacity region of an SPII: the set of arrival rate vectors under which the queue lengths, $\mathbf{Q}(t)$, remain stable, for a given pair of encoding and allocation policies. (Footnote 3: roughly speaking, the process is stable if the queue lengths do not grow to infinity as time goes on. The precise notion of stability that we consider in this paper will be defined in Section 2.1.) As we shall see shortly, harsh information constraints tend to reduce the capacity region. Therefore, at a high level, the main question that we aim to address can be stated as follows:

How does the information constraint, in the form of a noisy channel, reduce the capacity region of an SPII architecture, and how does the degree of reduction vary with respect to the sizes of encoder and receiver memories?

In order to measure the magnitude of the reduction in capacity region, we introduce a key (scalar) metric, dubbed the capacity factor, which captures the fraction of the capacity region that is preserved under imperfect information. Informally, the notion of capacity factor can be described as follows (the formal definition will be given in Section 2). Fix a channel $C$. For a pair of encoding-allocation policies $(\phi, \psi)$, denote by $\Lambda(C, \phi, \psi)$ the set of all arrival rate vectors that the system is able to stabilize under the pair $(\phi, \psi)$ and channel $C$. Denote by $\Lambda$ the maximum capacity region of the SPII, defined by the set of arrival rate vectors that are dominated by the convex hull of the schedule set, $\Pi$. In particular, $\Lambda$ corresponds to the maximal capacity region of the system under full information and a maximally stable scheduling policy, such as Max-Weight.

[Capacity Factor (Informal)] Fix $k_\phi$ and $k_\psi$. The $(k_\phi, k_\psi)$-capacity factor of the channel $C$, denoted by $\rho^*_{k_\phi, k_\psi}(C)$, is defined by the supremum of $\rho \in [0,1]$, such that

$$\rho \Lambda \subseteq \Lambda(C, \phi, \psi)$$

for some pair of encoding and allocation policies $(\phi, \psi)$, whose memory sizes are $k_\phi$ and $k_\psi$ bits, respectively.

Intuitively, the capacity factor can be interpreted as the largest fraction of the full-information maximum capacity region that can be “preserved” under information constraints (using an “optimal” encoding-allocation policy pair). A capacity factor of $1$ means that the information constraints result in no loss of the maximum capacity region, and on the other extreme, a capacity factor close to $0$ indicates that most of the maximum capacity region has been lost due to lossy information.

[Scheduling with Parallel Queues] We now present a simple example to illustrate the drastic impact that lossy information can have on the capacity region, consequently resulting in a small capacity factor. Consider the SPN in Figure 2 with one server and two parallel queues. It is not difficult to see that the maximum capacity region under full information, $\Lambda$, consists of all arrival rates $(\lambda_1, \lambda_2)$ where $\lambda_1 + \lambda_2 < 1$, achieved by always serving a queue that is non-empty.

Now suppose the channel of the SPII is completely uninformative, so that the output message $Y(t)$ is independent of the input signal $X(t)$. Then, it can be shown that any allocation policy will induce a fixed average service rate vector, $\boldsymbol{\mu} = (\mu_1, \mu_2)$, independent of the actual arrival rates. The resulting capacity region is represented by the rectangle in Figure 3 (dominated by $\boldsymbol{\mu}$), which is always substantially smaller than the maximum capacity region, regardless of the choice of $\boldsymbol{\mu}$. It is not difficult to show that the capacity factor under such an uninformative channel is equal to $1/2$, which corresponds to the case where the allocation policy induces a service rate vector of $(1/2, 1/2)$ by choosing the two schedules with equal frequency. Therefore, depending on the quality of the channel and the memories available, the capacity factor of this SPII can range from $1/2$ (uninformative channel) to $1$ (full information).
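The value $1/2$ in this example can be checked numerically. The sketch below assumes that an uninformative channel lets the allocation policy realize any service rate vector $(\mu, 1-\mu)$, so the stabilizable rates form the rectangle below that point; the scaled triangle with vertices $(0,0)$, $(\rho,0)$, $(0,\rho)$ then fits inside the rectangle exactly when $\rho \le \min(\mu, 1-\mu)$. The function name and the grid search are illustrative choices, not part of the original analysis.

```python
def capacity_factor_for_rate(mu):
    """Largest rho such that the scaled triangle fits under the point (mu, 1 - mu)."""
    return min(mu, 1.0 - mu)

# Search over the fraction of time spent serving queue 1.
grid = [k / 1000 for k in range(1001)]
best_mu = max(grid, key=capacity_factor_for_rate)
best_rho = capacity_factor_for_rate(best_mu)
# best_mu == 0.5 and best_rho == 0.5: splitting service evenly is optimal.
```

Any asymmetric split shrinks the shorter side of the rectangle, which is why the even split maximizes the preserved fraction.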

Values of
Table 1: Summary of the results in Theorem 3, where we omit the dependence on for simplicity of notation. The system is assumed to have memory-feedback. The parameters and denote the size of the memory for the encoder and receiver, respectively. The symbol “” indicates that the result of the cell is readily implied by that of the cell to the left. The symbol “” means that the result holds if in addition to being informative, the channel is also “noisy” (more precisely, -majorizing for some ; see Definition 3). Finally, the symbol “” means that we do not yet know the exact value of when , beyond the fact that it’s not greater than . For instance, we do not know whether is in fact equal to for all .

1.2 Preview of Main Results: Characterization of Capacity Factor

We now give an informal preview of our main results (Theorem 3), which are also summarized in Table 1. Fix any channel that is informative, such that the input signal and output message are not independent. Suppose that the system has memory-feedback. We show the following:

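  1. When the receiver is memoryless ($k_\psi = 0$), the capacity factor is independent of $k_\phi$, the size of the encoder memory: $\rho^*_{k_\phi, 0}(C) = \rho^*_{0,0}(C)$ for all $k_\phi$.

  2. When the receiver has a finite memory ($0 < k_\psi < \infty$), the capacity factor may depend on $k_\phi$, but only up to a constant, $K$, whose value does not depend on $k_\psi$ or the channel $C$: $\rho^*_{k_\phi, k_\psi}(C) = \rho^*_{K, k_\psi}(C)$ for all $k_\phi \geq K$.

  3. In the limit where the size of the receiver memory tends to infinity ($k_\psi \to \infty$), the (limiting) capacity factor is always $1$, regardless of the magnitude of $k_\phi$:

    $$\lim_{k_\psi \to \infty} \rho^*_{k_\phi, k_\psi}(C) = 1.$$

    In particular, there will be no loss of capacity region in this limiting regime.

  4. Finally, if the channel is “noisy” (to be precisely defined), and the receiver does not have infinite memory ($k_\psi < \infty$), then the capacity factor is always strictly less than 1: $\rho^*_{k_\phi, k_\psi}(C) < 1$.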

Moreover, as a by-product of the proof, all of our achievability results (Items 1 through 3) are established through constructive and implementable encoding-allocation policy pairs. The capacity factor can also be explicitly computed in closed-form for restricted families of channels and schedule sets, which we will demonstrate in Section 7.

Key implications of the main results: Our results have a number of architectural implications; we highlight some of them below.

  1. Impact of imperfect information is non-trivial. The result in Item 4 shows that the presence of a noisy channel always reduces capacity (i.e., the capacity factor is strictly less than 1), regardless of the amount of memory available to the encoding or allocation policies, so long as the allocation policy does not have infinite memory.

  2. Memories play an intricate, asymmetric role. The results in Items 1 through 3 demonstrate that memories are crucial in mitigating the reduction in capacity region due to imperfect information. However, memories are substantially more important for the allocation policy than they are for the encoding policy: an increase in memory for the allocation policy always improves capacity, while the benefit of having additional memory for the encoding policy becomes zero beyond some finite threshold. This asymmetry suggests design principles that favor simple, low-memory communication modules for the encoding policy, while the allocation policy must aggregate a sufficient number of past observations in order to achieve a maximum capacity region.

1.2.1 Motivating Examples of SPII

While the SPII model is highly stylized, it is motivated by a range of dynamic resource allocation systems with information constraints. We discuss in this sub-section two motivating examples:

  1. Example 1 - Scheduling in Large-Scale Data Centers. It has long been recognized that obtaining reliable system-wide state information is challenging in modern-day data centers, where many tasks are processed in a massively parallel manner, servers may fail, and bandwidths may be limited (e.g., Armbrust et al., 2010; Chowdhury and Stoica, 2015; Zats et al., 2015; Ananthanarayanan et al., 2014). For a concrete illustration, consider a manager (the receiver that decides on the allocation policy) who is operating a large-scale data center via a limited communication channel, through which the data-center servers (the encoder) transmit information about their states. Here, the channel is broadly construed as incorporating all aspects of information obfuscations, such as those due to server failures, and limiting bandwidths on the network and server I/O. For example, due to possible server failures or malfunctions, incomplete information concerning the state of the data center may be transmitted to the manager across a network link. Furthermore, because of communication constraints on the link and server I/O, the information that the manager obtains may be condensed and corrupted by randomly dropped packets. In these scenarios, our model will speak to the design of the communication protocol from the data center to the manager, as well as how she should translate the received messages into resource allocation decisions.

  2. Example 2 - Human-in-the-Loop Resource Allocation. Our model can also be applied to resource allocation problems where communications amongst human operators may be subject to errors or misinterpretations (e.g., Manojlovich et al., 2015; Pothier et al., 2005; Carter et al., 2009). For instance, one may consider a hospital setting where nurses (“encoder”), who observe the state of the ward, must communicate the relevant information and instructions to the physician (“receiver”), who will carry out the actual interventions. Misunderstanding or loss could occur when transmitting information from the nurses to the physicians, leading to misplaced decisions. Because both the encoder and receiver in this system can be human agents, memory can be a highly constrained resource, and it is thus important to understand how to design simple and low-memory policies that may be just as effective at delivering optimal system capacity.

1.3 Organization

The remainder of the paper is organized as follows. We formally describe our model in Section 2, and present our main results in Section 3. Section 4 discusses the related literature. The remainder of the paper is devoted to the proof of our main results, with a proof overview in Section 5 that summarizes the key techniques. We conclude the paper with some discussion in Section 10.

1.4 Notation

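We reserve boldface letters for vectors and plain letters for scalars. For any scalar $x$, $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$, and $x^+ = \max\{x, 0\}$ denotes the non-negative part of $x$. For any $n \in \mathbb{N}$, and vector $\mathbf{x} \in \mathbb{R}^n$, we use $x_i$ to denote the $i$th coordinate of $\mathbf{x}$. We use $\mathbf{x}^+$ to denote the vector $(x_1^+, \ldots, x_n^+)$. For any $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, we write $\mathbf{x} \leq \mathbf{y}$ if $x_i \leq y_i$ for all $i$, and we write $\mathbf{x} < \mathbf{y}$ if $x_i < y_i$ for all $i$. For a set $S \subseteq \mathbb{R}^n$, we write $\mathbf{x} \leq S$ ($\mathbf{x} < S$, respectively) if there exists $\mathbf{s} \in S$ such that $\mathbf{x} \leq \mathbf{s}$ ($\mathbf{x} < \mathbf{s}$, respectively), and say that the vector $\mathbf{x}$ is dominated (strictly dominated, respectively) by the set $S$. For $n \in \mathbb{N}$, we will use the short-hand $[n]$ to denote the set of consecutive integers $\{1, \ldots, n\}$.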

For a set and function , we will use to denote the set . For any set , we use to denote the convex hull of , to denote the set of vectors strictly dominated by , and to denote the closure of the set .

For vectors , the inner product of and is denoted by . The -norm of vector is denoted by , and the -norm denoted by . The th standard unit vector in is denoted by , whose th coordinate equals and all other coordinates equal zero. The vector with all components being is denoted by .

We use the shorthand “w.p.1” to mean “with probability 1,” “i.i.d.” to mean “independent and identically distributed,” and “WLOG” to mean “without loss of generality.” We use to describe random variables that are uniformly distributed over the interval . The indicator function of an event is denoted by .

2 The Model

We formally present the model, Stochastic Processing with Imperfect Information (SPII), in this section.

Stochastic Processing Network model. We consider a dynamic Stochastic Processing Network (SPN) evolving in discrete time (Figure 1). The system consists of $N$ queues, whose lengths at the end of time slot $t$ are represented by $\mathbf{Q}(t) = (Q_1(t), \ldots, Q_N(t))$. The evolution of the queues is captured by the following equation:

$$\mathbf{Q}(t) = \left(\mathbf{Q}(t-1) - \mathbf{D}(t)\right)^+ + \mathbf{A}(t),$$

where $\mathbf{A}(t)$ and $\mathbf{D}(t)$ are the arrival and allocation vectors during slot $t$, respectively. We use $\Pi$ to denote the finite set of all allowable schedules. For each $i \in \{1, \ldots, N\}$, the arrivals $A_i(t)$ over time are i.i.d. Bernoulli random variables that are independent from everything else, where

$$\mathbb{E}[A_i(t)] = \lambda_i.$$

We will refer to $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_N)$ as the arrival rate vector.

WLOG we suppose that the schedule set satisfies the following three assumptions. is monotone: if , then for any with , as well. There exists some such that for each , . The schedule set has at least two distinct maximal elements. More specifically, there exist with , and for , with implies that . Assumption 2 allows for more flexible allocation decisions without impacting system performance. Assumption 2 guarantees that each queue can receive a positive service rate, and Assumption 2 rules out the possibility of having only one maximal element, in which case the single maximal element would dominate all other schedules (see Lemma A.3 of Appendix A.3 for details), and the trivial decision of always choosing that maximal element would be optimal for a wide range of performance objectives, such as maximizing throughput or minimizing queue lengths.

Signals, channels, and messages. A channel $C$ is a triplet $(\mathcal{X}, \mathcal{Y}, P)$, where $\mathcal{X}$ and $\mathcal{Y}$ are finite sets representing the input and output alphabets, with cardinalities $|\mathcal{X}|$ and $|\mathcal{Y}|$, respectively, and $P$ is an $|\mathcal{X}|$-by-$|\mathcal{Y}|$ row stochastic matrix, which we will refer to as the channel matrix. Since $P$ is row stochastic, each row corresponds to a probability distribution over $\mathcal{Y}$, the output alphabet, and we denote the probability distribution corresponding to the $x$th row as $P_x$. When an input signal, $X(t) = x$, is sent through the channel, it leads to a (possibly random) output message, $Y(t)$, drawn from the probability distribution $P_x$. Thus, the matrix $P$ captures the stochastic distortion introduced by the channel, where the entry $P_{x,y}$ represents the probability that the output message of the channel is $y$ when the input signal is $x$:

$$P_{x,y} = \mathbb{P}\left(Y(t) = y \mid X(t) = x\right).$$

We assume the channel is memoryless, so that each output message only depends on the input signal of the present time slot, and is independent from the system’s past history. We also assume that the channel is stationary, so that for any $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, the probability that the output is $y$ when the input is $x$ does not depend on time.

Encoder and encoding policies. During each time slot, an encoder situated at the queues sends a signal, , over the channel. The encoder is equipped with a finite-sized lossless memory represented by a -bit binary sequence, whose value at time is denoted by . The signal can depend on the most recent state of the queues , the most recent arrivals , the content of the memory , and possibly some idiosyncratic randomness. Formally, let be a deterministic encoding policy. (Footnote 4: in the sequel, we will see that our achievability results for the case of finite receiver memory (Items 1 and 2 of Theorem 3; also Sections 8 and 9) are established using encoding policies that do not depend on , the most recent arrivals. However, to prove the result on infinite receiver memory (Item 3 of Theorem 3; also Appendix B), the encoding policy that we constructed makes crucial use of , which is therefore included in Eq. (12).) Then,


where is an string of i.i.d.  random variables, which are all independent from the rest of the system. In each time slot , the content of the memory, , is also updated based on , , , and , and we can formally write


for some deterministic function . If we write , then


With a slight abuse of notation, we also call the encoding policy. We will denote by the set of all encoding policies with bits of memory. We use to denote the set of possible values for the encoder memory when bits are allowed. When the context is clear, we often suppress the dependence on and simply write .

Roughly speaking, the size of the memory, , serves as a measure of “complexity” of an encoding function. A special case is when , where the encoder is equipped with no memory and the signal depends only on the current state of the queues. We will refer to a policy as a memoryless encoding policy.

Receiver and allocation policies. The signal passes through the channel and results in a message, at the receiver. The responsibility of the receiver is to choose, at each time slot, the allocation vector . However, the receiver is not able to observe the state of the queues directly, so the allocation decisions can only rely on information provided by the encoder through the channel. Similar to the encoder, the receiver is equipped with a memory of bits, whose state in slot is denoted by . Let be a deterministic allocation policy, such that


where is an string of i.i.d.  random variables, which are all independent from the rest of the system. In each time slot , the content of the receiver memory, , is also updated. However, different from the allocation decisions, is updated with a time lag, and it depends on , and , the message from an earlier time slot, instead of , the most recent message. (Footnote 5: the one-step lag can be removed without substantially changing the results; it serves the purpose of simplifying the notation and proof.) Formally,


for some deterministic function . Similar to the encoder side, we also call the allocation policy.

We will denote by the set of all allocation policies with bits of memory. Analogous to the encoding policies, an allocation policy with no memory generates the allocation decision using only the current message, . We will refer to a policy as a memoryless allocation policy. The set of possible values for the receiver memory is denoted by when bits are allowed. Similar to , when the context is clear, we suppress dependence of on and simply write .

In this paper, we are primarily interested in the dynamics of the tuple


It is not difficult to verify that under any well-defined encoding-allocation policy pair , is a countable-state Markov chain. Therefore, for any time , we call the system state at time , and from now on, we restrict our attention to pairs of encoding and allocation policies under which the Markov chain is irreducible.

Memory-feedback. We say that the system has memory-feedback if the state of the receiver memory, , is accessible by the encoding policy in time slot for generating the message and updating the encoder memory state . That is, under this assumption, Eq. (12) would become:


As alluded to in the Introduction, note that the model already assumes a mode of feedback: the scheduling decision, , can be sent to the queues without obstruction, implying that the backward communication from the receiver to the encoder is lossless. Therefore, under the memory-feedback assumption, in addition to sending , the allocation policy also includes the state of its own memory in the backward communication.

2.1 Main Performance Metric: Capacity Factor

We define in this subsection the main performance metric of this paper, the capacity factor. Fix a pair of encoding and allocation policies, $\phi$ and $\psi$, respectively. We say the system is stable if the system-state Markov chain is positive recurrent. Define the maximum capacity region, $\Lambda$, to be the set of all vectors strictly dominated by the convex hull of the schedule set $\Pi$:

$$\Lambda = \left\{\boldsymbol{\lambda} : \boldsymbol{\lambda} < \mathrm{Conv}(\Pi)\right\}.$$

Note that because the schedule set satisfies Assumptions 2 and 2, it is not difficult to see that .

We now define our main performance metric.

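[Capacity Factor] Fix a channel $C$, and memory sizes $k_\phi$ and $k_\psi$.

  1. Consider an encoding policy $\phi$ with $k_\phi$ bits of memory and an allocation policy $\psi$ with $k_\psi$ bits of memory. Define $\Lambda(C, \phi, \psi)$ to be the capacity region under the policy pair $(\phi, \psi)$:

    $$\Lambda(C, \phi, \psi) = \left\{\boldsymbol{\lambda} : \text{the system is stable under } (\phi, \psi) \text{ and arrival rate vector } \boldsymbol{\lambda}\right\}.$$

    We also define the capacity factor of the channel $C$ under the policy pair $(\phi, \psi)$, denoted by $\rho(C, \phi, \psi)$, to be

    $$\rho(C, \phi, \psi) = \sup\left\{\rho \in [0,1] : \rho \Lambda \subseteq \Lambda(C, \phi, \psi)\right\}.$$

  2. The $(k_\phi, k_\psi)$-capacity factor of channel $C$, denoted by $\rho^*_{k_\phi, k_\psi}(C)$, is defined to be the supremum of $\rho(C, \phi, \psi)$ over all encoding policies $\phi$ with $k_\phi$ bits of memory and all allocation policies $\psi$ with $k_\psi$ bits of memory:

    $$\rho^*_{k_\phi, k_\psi}(C) = \sup_{(\phi, \psi)} \rho(C, \phi, \psi).$$

    When the context is clear, sometimes we just write the “capacity factor” to mean $\rho^*_{k_\phi, k_\psi}(C)$.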

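Some elementary properties of capacity factor. Before we proceed, we state some elementary properties of capacity factor. First, for any channel, $C$, $\rho^*_{k_\phi, k_\psi}(C)$ must be non-decreasing in both $k_\phi$ and $k_\psi$, because the capacity region can never decrease with more memory. Furthermore, since $\rho^*_{k_\phi, k_\psi}(C)$ is upper-bounded by $1$ by definition, by the Monotone Convergence Theorem, we have the following:

$$\lim_{k_\phi \to \infty} \rho^*_{k_\phi, k_\psi}(C) \quad \text{and} \quad \lim_{k_\psi \to \infty} \rho^*_{k_\phi, k_\psi}(C) \quad \text{exist.}$$

In particular, the limits in which we take $k_\phi$, $k_\psi$, or both to $\infty$ are well defined.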

3 Main Results

We formally state the main results in this section. We begin with two definitions. [Informative Channels] A channel is said to be informative if the corresponding channel matrix, , admits at least two distinct rows. A channel whose channel matrix has identical rows is called uninformative. The purpose of Definition 3 is to rule out degenerate channels: simply put, a channel is informative if and only if its output is not independent of the input. The next definition speaks to the other extreme by describing channels that are sufficiently noisy.
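The informativeness condition is easy to test directly on a channel matrix: a channel is informative exactly when some row differs from the first row. The sketch below assumes the matrix is given as a list of rows of floats; the function name `is_informative` and the tolerance parameter are hypothetical conveniences for floating-point input.

```python
def is_informative(matrix, tol=1e-12):
    """Return True if the channel matrix has at least two distinct rows."""
    first = matrix[0]
    return any(
        abs(a - b) > tol
        for row in matrix[1:]
        for a, b in zip(row, first)
    )

noisy_but_useful = [[0.9, 0.1], [0.1, 0.9]]   # binary symmetric channel, eps = 0.1
pure_coin_flip = [[0.5, 0.5], [0.5, 0.5]]     # output independent of the input

checks = (is_informative(noisy_but_useful), is_informative(pure_coin_flip))
# checks == (True, False)
```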

[$\epsilon$-Majorizing Channels] Fix $\epsilon \in [0,1]$. We say that a channel $C$ is $\epsilon$-majorizing if its corresponding channel matrix, $P$, can be written as

$$P = \epsilon P^{u} + (1-\epsilon) P^{r},$$

where $P^{u}$ and $P^{r}$ are two row-stochastic matrices, such that the rows of $P^{u}$ are identical, and every column of $P^{r}$ has at least one zero entry. Roughly speaking, an $\epsilon$-majorizing channel can be interpreted as having at most $\epsilon$-portion of the channel being “completely uninformative.” For technical reasons, we will also assume that the “uninformative portion,” $P^{u}$, of an $\epsilon$-majorizing channel, is everywhere positive. (Footnote 6: intuitively, since $P^{u}$ is completely uninformative, whether Assumption 3 is satisfied or not should have little impact on performance. Assumption 3 is used in some of the subsequent proofs to ensure that states of the chain are “easily reachable” from each other. For more details, see Appendix A.3.) [Assumption 3] Let $C$ be an $\epsilon$-majorizing channel, and let $P^{u}$ be as in (24). Then, for all $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, $P^{u}_{x,y} > 0$.
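As an illustration of this definition, the sketch below splits a channel matrix into an uninformative part and a remainder. It assumes the decomposition has the form $P = \epsilon P^{u} + (1-\epsilon) P^{r}$, with identical rows in $P^{u}$ and a zero in every column of $P^{r}$. Under that assumption the column minima of $P$ determine everything: their sum is $\epsilon$, and normalizing them gives the common row of $P^{u}$. The function name `split_uninformative` is hypothetical.

```python
def split_uninformative(matrix):
    """Return (eps, u, remainder) with matrix[x][y] == eps*u[y] + (1-eps)*remainder[x][y].

    Assumed form: the uninformative part has every row equal to u, and each
    column of the remainder contains at least one zero entry.
    """
    col_mins = [min(col) for col in zip(*matrix)]
    eps = sum(col_mins)
    if eps <= 0.0 or eps >= 1.0:
        raise ValueError("this sketch only covers the case 0 < eps < 1")
    u = [m / eps for m in col_mins]
    remainder = [
        [(p - m) / (1.0 - eps) for p, m in zip(row, col_mins)]
        for row in matrix
    ]
    return eps, u, remainder

# Binary symmetric channel with flip probability 0.1.
eps, u, remainder = split_uninformative([[0.9, 0.1], [0.1, 0.9]])
```

For this binary symmetric channel the split gives an uninformative portion of about 0.2 with a uniform common row, and a remainder that is (up to rounding) the noiseless identity channel. The common row is everywhere positive only when every column minimum of the matrix is positive.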

The following theorem is the main result of this paper. The same results are summarized in Table 1, where the rows of the table correspond to Items 1 through 3 in the theorem, respectively. We will assume that the SPII architecture has memory-feedback (see Appendix B.3 for a discussion of a scenario without memory-feedback).

[Characterization of Capacity Factor] Fix the number of queues, and a finite schedule set, . Let be an informative channel (Definition 3). Suppose the system has memory-feedback. The capacity factor, , satisfies the following. (Footnote 7: In this theorem, the notation should be interpreted as a short-hand for belonging to the extended non-negative integers: .)

  1. Memory-less receiver: when , we have that

  2. Finite-memory receiver: when , we have that


    where depends only on the structure of the schedule set, , and the channel’s input alphabet, .

  3. Infinite-memory receiver: when , we have that

  4. Suppose, in addition, that the channel is also -majorizing for some (Definition 3), and it satisfies Assumption 3. Then, for all , we have


Remark (The Importance of Memory).

At this point, let us provide some remarks regarding the fundamental importance of memory in our model. In this paper, we are primarily concerned with the stability analysis of dynamic control policies in the SPII. Similar to the stability analysis of conventional Stochastic Processing Networks, our problem can be viewed as a relaxed version of an infinite-horizon average-cost Markov Decision Process (MDP): instead of trying to minimize the long-run average total queue size, we are concerned with the a priori simpler question of whether the long-run average total queue size can be made finite. Since an average-cost MDP typically admits optimal stationary policies that do not require additional auxiliary memory, it may seem natural to expect the same of an optimal pair of encoding and allocation policies in SPII as well. There is, however, a caveat: while this may be true for any fixed instance of SPII (hence a fixed MDP), achieving a large capacity region requires us to identify a single policy pair that performs well across a set of different MDPs (parameterized by the arrival rate vector, in our case), which, in general, cannot be accomplished by a single stationary policy (cf. Definition 2.1). Indeed, in order for a single policy pair to perform well across a diverse set of problem instances, additional memory is necessary for the policy pair to keep track of relevant information and adapt to the specific instance over time. Viewed from this angle, the throughput optimality of the original Max-Weight policy for SPNs (which is stationary) is a rather remarkable and surprising result (Tassiulas and Ephremides, 1992). However, the Max-Weight policy crucially relies on being able to fully observe the queue lengths, and, unfortunately, under imperfect information, our results demonstrate that it is no longer possible to achieve the maximal capacity region without memory. In particular, Theorem 3 shows that receiver memory is necessary for a policy to obtain a large capacity region (Items 3 and 4), while encoder memory seems to be less crucial (Items 1 through 3). A major open problem is whether the encoder memory is needed at all when the receiver has finite but non-zero memory, i.e., whether the capacity factor is then unaffected by the size of the encoder memory.

4 Related Literature

The challenges in obtaining reliable and timely access to state information have long been recognized in large-scale dynamic resource allocation problems. One prominent example is the celebrated “power-of-two-choices” (PoT) routing algorithm (Vvedenskaya et al., 1996; Mitzenmacher, 2001) for load-balancing. Designed to address the lack of full queue-length information in a system with a large number of parallel queues, the PoT algorithm routes an incoming job to the shorter of two randomly sampled queues. The same design consideration underlies pull-based variants of PoT (Badonnel and Burgess, 2008; Lu et al., 2011; Stolyar, 2015, 2017), and the partially centralized scheduling policy by Tsitsiklis and Xu (2012) that has access to complete queue-length information only a small fraction of the time. Beyond the realm of computer networks, information constraints are also prominent in systems with humans in the loop. For instance, communication failures and misunderstanding between physicians and nurses have been cited as a leading cause of adverse events in healthcare, and specialized messaging and decision protocols have been developed to minimize the impact of errors (Manojlovich et al., 2015); see Pothier et al. (2005); Carter et al. (2009) for other examples of information loss among healthcare providers. While information constraints play a central role in the aforementioned models and applications, in contrast to our work, they often serve as an implicit motivation behind a chosen design, rather than an explicit constraint with respect to which an optimal policy is to be identified. As a result, there has been little understanding as to what policies are “optimal” for a given level of information availability, and what fundamental impact information has on system performance.
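To make the PoT routing rule mentioned above concrete, here is a minimal sketch. It assumes the two queues are sampled uniformly without replacement and that ties go to the first sampled queue; both are choices made for this illustration, and the function name `power_of_two_route` is hypothetical.

```python
import random
from typing import List, Optional


def power_of_two_route(queue_lengths: List[int], rng: Optional[random.Random] = None) -> int:
    """Return the index of the queue that receives the incoming job.

    Only two queue lengths are inspected, rather than the full state.
    """
    rng = rng or random.Random()
    # Sample two distinct queues uniformly at random (an assumption of this sketch).
    i, j = rng.sample(range(len(queue_lengths)), 2)
    # Route to the shorter of the two; ties go to the first sampled queue.
    return i if queue_lengths[i] <= queue_lengths[j] else j


queues = [3, 0, 5, 2]
target = power_of_two_route(queues, random.Random(0))
queues[target] += 1  # the job joins the chosen queue
```

In this example the queue of length 5 can never be selected, since whichever queue it is paired with is strictly shorter.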

Taking a more principled approach to policy design, several recent papers have aimed at rigorously quantifying the performance impact of information in dynamic resource allocation. Gamarnik et al. (2018) characterize how the average delay in a load-balancing system scales depending on the rate of messaging between the dispatcher and the servers, as well as the size of the dispatcher memory. In the context of queueing admission control, Spencer et al. (2014) and Xu (2015) quantify how the system’s optimal heavy-traffic delay scales as a function of the amount of future information available. In contrast to our approach, however, these papers largely focus on a specific model of information constraint, e.g., captured by the rate of messaging or length of the lookahead window, while our framework allows for a substantially more general family of information models, achieved by using different channels. To the best of our knowledge, the present paper is one of the first attempts at rigorously establishing the link between information and the performance of a resource allocation system at this level of generality.

At a high level, our framework is partly inspired by information theory, and more specifically, the research on feedback control under communication constraints (Tatikonda and Mitter, 2004; Sahai and Mitter, 2006); see Yüksel and Başar (2013) for a survey. This literature studies the problem of stabilizing (i.e., minimizing the magnitude of the state) a linear dynamical system of the form: , where is the state, a noise disturbance, the control action, and a gain matrix, and the decision maker has access to the state only through a rate-limited communication channel, similar to the scenario depicted in Figure 1. While our framework also admits a feedback loop over a communication channel, the dynamics in our problem differ fundamentally from those in a linear dynamical system, and consequently, so do the design approaches and analysis. The difference stems from the fact that the state process in a linear dynamical system is driven multiplicatively by the gain matrix, , whereas in our system, it is updated in an additive manner (see Eq. (7)). Consequently, in the control setting, even when all parameters are known (e.g., gain matrix, noise distribution, etc.), an informative channel is still necessary for stabilization (e.g., Tatikonda and Mitter, 2004). In sharp contrast, if all parameters are known in our SPII, it becomes trivial to stabilize the queues without any feedback: the decision maker can simply choose a stationary, randomized allocation policy that matches the arrival rate vector .
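The last observation can be illustrated with a small simulation. The sketch below assumes a single server that serves at most one job per slot, Bernoulli arrivals, and service probabilities that slightly exceed the known arrival rates; all of these modelling choices, and the name `simulate_oblivious_policy`, are assumptions made for illustration only. The policy never looks at the queue lengths.

```python
import random
from typing import List, Sequence


def simulate_oblivious_policy(arrival_rates: Sequence[float],
                              service_probs: Sequence[float],
                              steps: int,
                              seed: int = 0) -> List[int]:
    """Simulate a stationary randomized policy that ignores the queue state.

    service_probs[i] is the probability of serving queue i in a slot; the
    probabilities may sum to less than 1, in which case the server idles
    with the remaining probability.
    """
    rng = random.Random(seed)
    queues = [0] * len(arrival_rates)
    for _ in range(steps):
        # Draw the queue to serve without observing any queue length.
        u = rng.random()
        chosen = None
        cumulative = 0.0
        for i, p in enumerate(service_probs):
            cumulative += p
            if u < cumulative:
                chosen = i
                break
        if chosen is not None and queues[chosen] > 0:
            queues[chosen] -= 1
        # Bernoulli arrivals, independent across queues and time slots.
        for i, lam in enumerate(arrival_rates):
            if rng.random() < lam:
                queues[i] += 1
    return queues


final_queues = simulate_oblivious_policy([0.3, 0.4], [0.4, 0.55], steps=20000)
```

Because each service probability exceeds the corresponding arrival rate, every queue has negative drift whenever it is non-empty, so the queue lengths stay small even though no state information is used.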

A major theme on the dynamic control of SPNs concerns the setting where the decision maker does not have complete information about the underlying system. For example, the classical Max-Weight policy (Tassiulas and Ephremides, 1992; Dai and Lin, 2005; Mandelbaum and Stolyar, 2004) is oblivious to detailed statistics of the arrival process. There is also a literature that addresses the setting where the service rates (or the service time distributions) are not known completely, and the decision maker needs to either learn these parameters or develop scheduling policies that do not depend on service rates (Stolyar and Yudovina, 2012; Pedarsani et al., 2017; Dimakis and Walrand, 2006; Baharian and Tezcan, 2011; Krishnasamy et al., 2016, 2018). Let us note that in all this literature, even though the decision maker has partial or no information on system parameters, she has full information on system states. In contrast, our model only assumes (often severely) noisy observations of system states. This fundamental difference requires us to take a very different approach in designing policies, and to develop new tools for analyzing them.

On the methodological front, our program involves the development of a host of new techniques, largely from first principles, by combining ideas from areas such as information theory, learning and queueing theory. As a sub-module of one of the policies we propose, we also create a simple yet powerful generalization of the Max-Weight policy, in which individual Markov chains are selected dynamically, in a manner analogous to how schedules are used in a conventional Max-Weight policy.

5 Overview of Proof Techniques

The remainder of the paper is devoted to the proof of Theorem 3, and we provide in this section an overview of the key ideas.

Item 4 of Theorem 3 (Section 7) - Capacity factor is less than one for -majorizing channels. We show that any level of noise in the form of an -majorizing channel always reduces the capacity region. The intuition is that the channel noise causes any output symbol to appear with sufficiently positive probability, independently from the underlying arrival rate vector, and thus limits the allocation policy’s ability to adapt to different arrival rates with sufficient precision (Lemma A.3 of Appendix A.3). The proof employs a lifting argument, whereby we carry out the analysis in a higher-dimensional product space for the output alphabet, with the added dimension capturing the realization of the channel noise. A coupling argument is then used to show that noise reduces capacity region. The non-degeneracy Assumption 2 on the schedule set ensures that there are at least two distinct, “extremal” arrival rate vectors in the maximum capacity region , which, due to the limited adaptability of the allocation policy, cannot be both stabilized, implying that the capacity factor is less than one (see e.g., proof of Theorem 7 in Appendix A.3 and proof of Theorem 7 in Appendix A.4). Let us also note that for the special case of a memoryless receiver (), we were able to obtain a tight characterization of the corresponding capacity factor under any -majorizing channel, by providing an achievable upper bound (Theorems 7 and 7).

Items 1 and 2 of Theorem 3 (Sections 8 and 9) - Memoryless and Finite-Memory Receivers. This is the most technically challenging part of our program. We begin with the simple case of a memoryless receiver (), where the size of the encoder memory has no effect on the capacity factor (Theorem 8 of Section 8). The key intuition is a change of perspective: instead of viewing the receiver (allocation policy) as the one making scheduling decisions based on noisy information, it turns out that the correct way to design the system is to treat the encoder as the more “intelligent” of the two policies that conducts Max-Weight-like scheduling, where the encoding policy treats the set of input symbols, , as its “scheduling actions.” By formulating a transformed, but equivalent, scheduling problem from the perspective of the encoder, we then use a version of the Max-Weight policy to establish stability.

For the general case of , we obtain a slightly weaker result than that of , showing that the encoder’s memory size becomes irrelevant after a finite threshold, . The proof builds on the same intuition of viewing the encoding policy as the main decision maker. However, a non-trivial receiver memory will mandate a substantially more sophisticated argument. This is because in each time slot, the induced service action no longer depends solely on the input signal , as in the case of a memoryless receiver; it will now also depend on the state of the receiver memory, which is itself a stochastic process, and hence conventional Lyapunov arguments for Max-Weight cannot be applied. Instead, we will formulate a generalized version of the conventional Max-Weight policy, dubbed the Episodic Max-Weight (EMW) policy, where the encoding policy switches between a family of Markov chains, as opposed to input symbols, in a manner that is analogous to how schedules are used in conventional Max-Weight (Section 9.2). The stability proof heavily exploits certain conditional independence properties among different elements of the overall process (proof of Proposition 9.1 in Appendix A.2), which in turn are a result of the feedback structure of the SPII.

Item 3 of Theorem 3 (Appendix B) - Infinite-Memory Receiver. The last part of the proof shows that as the receiver memory size tends to infinity, the capacity factor always converges to 1, regardless of the size of the encoder memory. The argument is relatively straightforward compared to the other parts. Since the receiver has abundant memory in this regime, the main idea is to shift the burden of decision-making back to the receiver. We construct an Episodic Greedy Learning (EGL) policy, where the receiver first estimates the arrival rates from the noisy messages, and subsequently deploys a randomized schedule that dominates the estimated arrival rate vector in expectation. With more memory, the receiver is able to estimate arrival rates more accurately, leading to capacity factors that are arbitrarily close to 1.

6 Preliminaries

The main purpose of this section is to establish some results and conventions that will be used throughout the remainder of the paper. Section 6.1 introduces a generalized formulation of the Max-Weight policy, whose stability properties will be used as a sub-module in subsequent proofs. Since our primary focus is on stability, we will often be concerned with the question of whether the long-run average service rates dominate the arrival rates. Section 6.2 formalizes the notion of long-run average service rates for our model, which will be used extensively in later sections.

6.1 A Generic Max-Weight Stability Theorem

In this section, we present a simple generalization of the stability result of the celebrated Maximum Weight (Max-Weight) policy (Tassiulas and Ephremides, 1992) to a class of systems that is more general than those typically seen in prior literature (e.g., Stolyar, 2004; Dai and Lin, 2005; Shah et al., 2012). This result, Proposition 6.1, will be used as a basic building block in our subsequent proofs. We first describe the setup, and then present the stability result. The proof is a simple modification of the standard stability proof of the Max-Weight policy, which we include in Appendix A.1 for completeness.

Consider a discrete-time, irreducible Markov chain with two components and , so that for any time , . Here takes value in a finite set , and evolves according to the following dynamics:


Here, , and are all random vectors taking values in . It is useful to think of the system dynamics in the following way. During each time slot , upon observing the current queue size vector , the system makes the service allocation decision , which is used twice in the current slot. First, it is used as much as possible to reduce the queue sizes , as represented by the term . There may be residual services left from this first use of . Then, arrivals take place, and some of the residual services may be used to serve the arrivals ; this portion of the residual services is denoted by .

More formally, we require the random vectors

to be i.i.d. with finite second moment, and independent from the rest of the system, with


are also required to have finite second moments, which are uniformly bounded above by a constant that does not depend on the time index . Furthermore, given , is conditionally independent from the past history . are required to satisfy the following properties w.p. 1:

  • , and guarantees that ; and

  • given , is conditionally independent from the past history .

Let us provide some remarks about the differences between the process and those in prior literature, as well as how these differences are used in the present paper. Compared to other works on SPNs  (e.g., Shah et al., 2012), has the additional component , which is included to model the signals and the memory contents and . The “residual service” term , also absent from related prior literature, will be used in Section 9.3 to describe the dynamics of (recall (15)) under a so-called Episodic Max-Weight policy.

We are now ready to state the stability result. The proof is given in Appendix A.1. Let be a non-empty, finite subset of that satisfies Assumption 2. Define the sets


Suppose that , and for all ,


Then is positive recurrent. An immediate corollary of Proposition 6.1 is the following. Suppose that the residual services are zero w.p.1, so that follows the dynamics given by (7). Let , and be exactly as in Proposition 6.1. Then is positive recurrent.

6.2 Stationary Service Rates in SPII

In conventional SPN systems, it is a simple fact that a stabilizing scheduling policy induces stationary service rates that dominate the corresponding arrival rates. The presence of a channel, an encoder and a receiver in the SPII architecture slightly complicates the notion of stationary service rates. For this reason, we provide in this subsection some discussion on how stationary service rates will be defined.

Consider a channel , constants , and a policy pair with and . Suppose that when the arrival rate vector is , the process is positive recurrent under the policy pair . Then, has a unique stationary distribution, whose corresponding probability we denote by .

Next, we define some further notation. First, for each , define


to be the stationary probability of sending signal , and let . By the ergodicity of the Markov chain , we also know that w.p. 1,


where represents the long-run average fraction of time that the sent signal is . Second, for each and , the following conditional probability is well-defined:


We use to denote the -matrix , and call it the rate allocation matrix. Finally, we also represent the set of allowable schedules in a matrix form. Let be the schedule matrix where


Let us note that in general, and may depend on , whereas does not. With the preceding notation in mind, we define the stationary service rate vector to be


so that for each , represents the long-run average service rate offered to queue . The following lemma is a simple but useful fact. The proof follows simply from the fact that is positive recurrent. Let , , and be described as above, and be defined by (37). Then, .

7 Capacity Factors Are Non-Trivial For -Majorizing Channels

In this section, we will establish Item 4 of Theorem 3, that the capacity factor as a performance metric is non-trivial for -majorizing channels (recall Definition 3), i.e., that capacity factors are in general less than 1. Towards this end, we prove the following: Consider a system with schedule set that satisfies Assumptions 2, 2 and 2. Fix any finite , recall that means , and . Then, for any -majorizing channel ,


The main idea behind the proof of Theorem 7 is that because the channel is -majorizing, each combination of output symbol and receiver memory content appears with sufficiently positive probability. As a result, no allocation policy (with finite memory) is adaptive enough to stabilize all arrival rate vectors in . The proof is given in Appendix A.3.

The next theorem states a stronger result, for the special case of memoryless receiver, i.e., . It gives a tight characterization of the -capacity factor of any -majorizing channel, by providing an achievable upper bound, , on the capacity factors.

Consider a system with schedule set that satisfies Assumptions 2, 2 and 2. Then, for any , there exists a constant , which only depends on and , such that:

  1. For any -majorizing channel , and for any ,

  2. Conversely, there exists an -majorizing channel, , such that


Next, we state Theorem 7, a special case of Theorem 7, for systems of parallel queues with a single server, where we derive a simple, explicit expression for the bound . The proof of Theorem 7 is more abstract than that of Theorem 7 and involves using general properties of convex polyhedra, but it follows a similar argument and is provided in Appendix A.4.

While more restricted in scope, Theorem 7 best illustrates the key intuition present in the more general version of the result. Consider the following system that consists of one server and parallel queues, where the queues are indexed by . In each time slot, the server picks one queue, so that exactly one job departs from the chosen queue if it is non-empty, and no departure occurs in the system, otherwise. Thus, the schedule set is given by


where is the zero vector, and is the standard th unit vector, . It is easy to see that the capacity region of this system is given by


The following theorem provides a tight characterization on the capacity factors of -majorizing channels, when the allocation policy is memoryless. Fix . Consider a single-server system with parallel queues. Then, the following hold.

  1. For any -majorizing channel, , and any ,

  2. Conversely, there exists an -majorizing channel, , such that


The proof of Theorem 7 is given in Appendix A.6.
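For intuition about the single-server system with parallel queues used in this section, the following sketch checks whether an arrival rate vector lies in the capacity region and simulates the perfect-information benchmark. It assumes that the capacity region consists of non-negative rate vectors whose entries sum to strictly less than one, that arrivals are Bernoulli, and that service happens before arrivals within a slot; the function names are hypothetical. With schedules equal to the zero vector and the unit vectors, Max-Weight under full queue-length information reduces to serving a longest queue.

```python
import random
from typing import List, Sequence


def in_capacity_region(arrival_rates: Sequence[float]) -> bool:
    """Single-server system: rates are non-negative and sum to less than 1."""
    return all(r >= 0 for r in arrival_rates) and sum(arrival_rates) < 1


def simulate_longest_queue_first(arrival_rates: Sequence[float],
                                 steps: int,
                                 seed: int = 0) -> List[int]:
    """Serve a longest queue in every slot, using exact queue lengths."""
    rng = random.Random(seed)
    queues = [0] * len(arrival_rates)
    for _ in range(steps):
        # Max-Weight over {0, e_1, ..., e_N} picks a queue of maximal length.
        longest = max(range(len(queues)), key=lambda i: queues[i])
        if queues[longest] > 0:
            queues[longest] -= 1
        # Bernoulli arrivals (an assumption made for this sketch).
        for i, lam in enumerate(arrival_rates):
            if rng.random() < lam:
                queues[i] += 1
    return queues


rates = [0.3, 0.4]
print(in_capacity_region(rates), simulate_longest_queue_first(rates, steps=20000))
```

This is only the noiseless baseline against which the capacity factor is measured; it does not model the channel or the encoder.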

8 Memoryless Receiver

In this section, we show Item 1 of Theorem 3: if the allocation policy is memoryless (i.e., ), then there exists some memoryless encoding policy , which achieves the optimal capacity factor among all encoding policies, memoryless or not. In short, when the receiver does not have memory, then one will not benefit from any memory at the encoder, either.

Fix any channel . We have that


The proof of Theorem 8 advances a design philosophy that will form the core of the proofs in subsequent sections. The key is to view the encoding policy, rather than the allocation policy, as the true “decision maker,” even though the latter physically selects the schedule. Concretely, note that for any fixed allocation policy, the expected allocation is solely determined by the input symbol. As such, we may implement a Max-Weight-like policy at the encoder, where the “maximization” is performed across the input symbols, as opposed to physical schedules.

It is clear that more encoder memory cannot degrade performance, i.e., , so it suffices to show that . For the rest of the proof, we write for to simplify notation.

Fix . We will show that by constructing a policy pair with and , under which . By the definition of , there exist , encoding policy and allocation policy , such that , the capacity factor of under , satisfies . We will fix such an allocation policy, , in the remainder of the proof.

Let , and consider a system with arrival rate vector , operating under the policy pair . Let the stationary service rate vector be defined by (37), and , and defined by (33), (35) and (36) respectively. Then, by Lemma 6.2, . Furthermore, because the allocation policy is memoryless, the matrix depends only on the policy , but not on . Thus, the matrix is fixed, and , where denotes the set consisting of the rows of :


Here represents the th row of . Since is arbitrary except , this implies that


We are now ready to define the memoryless encoding policy and prove that . For all time , the sent signal is a deterministic function of , defined to be




with ties broken arbitrarily. Note that is the vector of expected allocation, conditional on input symbol being chosen to be the signal. As such, (48) can be viewed as a Max-Weight policy implemented by the encoder, where the “schedules” indeed correspond to the set of input symbols.

Let us summarize the encoding-allocation policy pair that we have constructed so far:

  1. encoding policy : send a signal according to Eq. (48). Note that the optimization in Eq. (48) only involves the current queue lengths, so is memoryless.

  2. allocation policy : one that corresponds to the allocation policy matrix defined by (35).
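The encoder-side rule summarized above can be sketched in a few lines. The sketch assumes the expected allocation induced by each input symbol is available as a row of a matrix (here the hypothetical argument `expected_alloc`, with one row per input symbol and one column per queue), and it breaks ties in favor of the lowest symbol index, which is one arbitrary choice among many.

```python
from typing import Sequence


def encoder_max_weight(queue_lengths: Sequence[float],
                       expected_alloc: Sequence[Sequence[float]]) -> int:
    """Return the input symbol whose expected allocation has maximal weight.

    The weight of a symbol is the inner product of the current queue lengths
    with the expected allocation vector that the symbol induces.
    """
    def weight(row: Sequence[float]) -> float:
        return sum(q * r for q, r in zip(queue_lengths, row))

    # max() returns the first maximizer, so ties go to the lowest index.
    return max(range(len(expected_alloc)), key=lambda x: weight(expected_alloc[x]))


# Two input symbols, two queues (illustrative numbers only).
expected_alloc = [[0.8, 0.2],
                  [0.3, 0.7]]
print(encoder_max_weight([5, 1], expected_alloc))
print(encoder_max_weight([1, 5], expected_alloc))
```

The rule uses only the current queue lengths, which is why the resulting encoding policy needs no memory; the allocation policy enters solely through the fixed matrix of expected allocations.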

Under the pair , the expected services offered at time is given by . The set satisfies Assumption 2 because of Eq. (47), and the fact that satisfies Assumption 2. Thus, we can apply Corollary 6.1 to the process , and conclude that for any , is positive recurrent under the pair . This implies that