# A Swiss Army Knife for Dynamic Caching in Small Cell Networks

We consider a dense cellular network, in which a limited-size cache is available at every base station (BS). Coordinating content allocation across the different caches can lead to significant performance gains, but is a difficult problem even when full information about the network and the request process is available. In this paper we present qLRU-Δ, a general-purpose dynamic caching policy that can be tailored to optimize different performance metrics, also in the presence of coordinated multipoint transmission techniques. The policy requires neither direct communication among BSs nor a priori knowledge of content popularity and, under stationary request processes, has provable performance guarantees.


## I Introduction

In recent years, we have witnessed a dramatic shift of traffic at the network edge, from the wired/fixed component to the wireless/mobile segment. This trend, mainly due to the huge success of mobile devices (smartphones, tablets) and their pervasive applications (WhatsApp, Instagram, Netflix, Spotify, YouTube, etc.), is expected to strengthen further in the next few years, as testified by several traffic forecasts. For example, according to Cisco [CISCO], in the five-year interval from 2017 to 2022 traffic demand on the cellular network will increase approximately by a factor of 9. As a consequence, the access (wireless and wired) infrastructure must be completely redesigned by densifying the cellular structure and moving content closer to users. To this end, the massive deployment of caches within base stations of the cellular network is essential to effectively reduce the load on the back-haul links, as well as to limit the latency perceived by the user.

This work considers a dense cellular network scenario, where caches are placed at every Base Station (BS) and a significant fraction of users is “covered” by several BSs (whose cells are said to “overlap”). The BSs in the transmission range of a given user can coordinate to offer a seamless optimized caching service to the user and possibly exploit coordinated multipoint (CoMP) techniques [lee12] on the radio access. We remark that, as soon as there are overlapping BSs, finding the optimal static content allocation becomes an NP-hard problem, even when the request process is known, the metric to optimize is the simple cache hit ratio, and coordinated transmissions are not supported [shanmugam13]. Realistic scenarios are more complex: popularities are dynamic and unknown a priori, and more sophisticated metrics (e.g., PHY-based ones) that further couple nearby BS caches are of interest. Moreover, centralized coordination of hundreds or thousands of caches per km² (e.g., in ultra-dense networks) is often infeasible or leads to excessive coordination overhead.

In such a context, our paper provides an answer to the open question about the existence of general (computationally efficient) distributed strategies for edge-cache coordination, which are able to provide some guarantees on global performance metrics (like hit ratio, retrieval time, load on the servers, etc.). In particular, we propose a new policy, qLRU-Δ, which provably achieves a locally optimal configuration for general performance metrics.

qLRU-Δ requires a simple modification to the basic behaviour of LRU [garetto16]. Upon a hit at a cache, qLRU-Δ moves the corresponding content to the front of the queue with a probability that is proportional to the marginal utility of storing this copy. Upon a miss, it introduces the new content with some probability, controlled by a parameter $q$. qLRU-Δ inherits from LRU $O(1)$ computation time per request and memory requirements proportional to the cache size. Its request-driven operation does not need a priori knowledge of content popularities, removing a limit of most previous work. Some information about the local neighborhood (e.g., how many additional copies of the content are stored at close-by caches also serving that user) may be needed to compute the marginal gain. Such information, however, is limited, and can be piggybacked on existing messages the user sends to query such caches, or even on the channel estimate messages that mobile devices regularly send to nearby BSs [LTE-book]. As an example, we show that qLRU-Δ is a practical solution to optimize hit ratio, retrieval time, load on the servers, etc., both when a single BS satisfies the user’s request and when multiple BSs coordinate their transmissions through CoMP techniques.

### I-A Related Work

We limit ourselves to describing work that specifically addresses the caching problem in dense cellular networks.

The idea of coordinating the placement of contents at caches, which are closely located at BSs, was first proposed in [Caire12] and its extension [shanmugam13] under the name of FemtoCaching. This work assumes that requests follow the Independent Reference Model (IRM) and that geographical popularity profiles are available, i.e., content requests are independent and request rates are known for all cell areas and their intersections. Finding the optimal content placement that maximizes the hit ratio is proved to be an NP-hard problem, but a greedy heuristic algorithm is shown to guarantee a $\frac{1}{2}$-approximation of the maximum hit ratio. In [Poularakis14], the authors have generalized the approach of [Caire12, shanmugam13], providing a formulation for the joint content-placement and user-association problem that maximizes the hit ratio. Efficient heuristic solutions have also been proposed. The authors of [Naveen15] have included the bandwidth costs in the formulation, and have proposed an on-line algorithm for the solution of the resulting problem. In [Chattopadhyay16], instead, the authors have designed a distributed algorithm based on Gibbs sampling, which is shown to asymptotically converge to the optimal allocation. [Anastasios2] revisits the optimal content placement problem within a stochastic geometry framework and derives an elegant analytical characterization of the optimal policy and its performance. In [avrachenkov17] the authors have developed a few asynchronous distributed content placement algorithms with polynomial complexity and limited communication overhead (communication takes place only between overlapping cells), whose performance has been shown to be very good in most of the tested scenarios. Still, they assume that content popularities are perfectly known by the system. Moreover, they focus on cache hit rates and do not consider CoMP.

One of the first papers that jointly considers caching and CoMP techniques was [ao15]: two BSs storing the same file can coordinate its transmission to the mobile user in order to reduce the delay or to increase the throughput. The authors consider two caching heuristics: a randomized caching policy combined with maximum ratio transmission precoding and a threshold policy combined with zero forcing beamforming. While they derive the optimal parameter setting of such heuristics, these policies are in general suboptimal with no theoretical performance guarantee. [tuholukova17] addresses this issue for joint transmissions techniques. The authors prove that delay minimization leads to a submodular maximization problem as long as the backhaul delay is larger than the transmission delay over the wireless channel. Under such condition, the greedy algorithm provides again a guaranteed approximation ratio. [chen17] considers two different CoMP techniques, i.e., joint transmission and parallel transmission, and derives formulas for the hit rate using tools from stochastic geometry.

Nevertheless, all the aforementioned works retain the limiting assumption in [Caire12] that geographical content popularity profiles are known by the system. Reliable popularity estimates over small geographical areas may be very hard to obtain [leconte16]. On the contrary, policies like LRU and its variants (qLRU, 2LRU, …) do not rely on popularity estimation and are known to behave well under time-varying popularities. For this reason they are a de-facto standard in most of the deployed caching systems. [giovanidis16] proposes a generalization of LRU to a dense cellular scenario. As above, a user at the intersection of multiple cells can check the availability of the content at every covering cell and then download from one of them. The difference with respect to standard LRU is how cache states are updated. In particular, the authors of [giovanidis16] consider two schemes: LRU-One and LRU-All. In LRU-One, each user is assigned to a reference cell/cache and only the state of her reference cache is updated upon a hit or a miss, independently of which cache the content has been retrieved from. In LRU-All, the state of all caches covering the user is updated.

Recently, [paschos19] has proposed a novel approach to design coordinated caching policies in the framework of online linear optimization. A projected gradient method is used to tune the fraction of each content to be stored in a cache, and regret guarantees are proved. Unfortunately, this solution requires storing pseudo-random linear combinations of original file chunks and, even ignoring the additional cost of coding/decoding, its computation time per request and its memory requirements both scale with $F$, where $F$ is the catalogue size. Also, coding excludes the possibility of exploiting CoMP techniques, because all chunks are different.

Lastly, reference [leonardi18jsac] proposes a novel approximate analytical approach to study systems of interacting caches, under different caching policies, whose predictions are surprisingly accurate. The framework builds upon the well-known characteristic time approximation [che02] for individual caches as well as an exponentialization approximation. We also rely on the same approximations, which are described in Sect. IV. [leonardi18jsac] also proposes the policy qLRU-Lazy, whose adoption in a dense cellular scenario is shown to achieve hit ratios very close to those offered by the greedy scheme proposed in [Caire12] even without information about popularity profiles. qLRU-Δ generalizes qLRU-Lazy to different metrics as well as CoMP transmissions.

### I-B Paper Contribution

The main contribution of this paper is the proposal of qLRU-Δ, a general-purpose caching policy that can be tailored to optimize different performance metrics. The policy implicitly coordinates caching decisions across different caches, also taking into account joint transmission opportunities. qLRU-Δ is presented in detail in Sect. III, after the introduction of our network model in Sect. II.

Sect. IV is devoted to proving that, under a stationary request process, qLRU-Δ achieves a locally optimal configuration as the parameter $q$ converges to 0. This means that it is not possible to replace a single content at one of the caches and still improve the performance metric of interest. The proof is technically sophisticated and relies on the characterization of stochastically stable states through an appropriate potential originally proposed in [young93].

In order to illustrate the flexibility of qLRU-Δ, we show in Sect. V how to particularize the policy for two specific performance metrics, i.e., the hit rate and the retrieval delay under CoMP. While our theoretical guarantees hold only asymptotically, numerical results show that qLRU-Δ with a finite value of $q$ already approaches the performance of the static allocation obtained through greedy, which, while not provably optimal, is the best baseline we can compare to. Note that the greedy algorithm requires complete knowledge of network topology, transmission characteristics, and request process, while qLRU-Δ is a reactive policy that relies only on a noisy estimation of the marginal benefit deriving from a local copy.

We remark that the goal of qLRU-Δ and of this paper is not to propose “the best” policy for any scenario with “coupled” caches, but rather a simple and easily customizable policy framework with provable theoretical properties. Currently, new caching policies designed for a particular scenario/metric are often compared with classic policies like LRU or LFU or the more recent LRU-One and LRU-All. This comparison appears to be quite unfair, given that these policies 1) ignore or only partially take into account the potential advantage of coordinated content allocations and 2) all target the hit rate as performance metric. qLRU-Δ may be a valid reference point, while being simple to implement. A Swiss-army knife is a very helpful object to carry around, even if each of its tools may not be the best one to accomplish its specific task.

## II Network Model

We consider a set of $B$ base stations (BSs) arbitrarily located in a given region $R$, each equipped with a local cache with size $C$. Users request contents from a finite catalogue of size $F$. A specific allocation of copies of content $f$ across the caches is specified by the vector $\mathbf{x}_f = (x_f^{(1)}, \dots, x_f^{(B)})$, where $x_f^{(b)} = 1$ (resp. $x_f^{(b)} = 0$) indicates that a copy of $f$ is present (resp. absent) at BS $b$. Let $\mathbf{e}^{(b)}$ be the vector with a 1 in position $b$ and all other components equal to 0. We write $\mathbf{x}_f \oplus \mathbf{e}^{(b)}$ to indicate a new cache configuration where a copy of content $f$ is added at base station $b$, if not already present. Similarly, $\mathbf{x}_f \ominus \mathbf{e}^{(b)}$ indicates a new allocation where there is no copy of content $f$ at $b$. Finally, we denote by $\mathbf{X}_f(t)$ the specific allocation at time $t$.

When user $u$ requests and receives content $f$, some network stakeholder achieves a gain that we assume to depend on user $u$, content $f$ and the current allocation of content $f$’s copies ($\mathbf{x}_f$). We denote the gain as $g_f(\mathbf{x}_f, u)$. For example, if the key actor is the content server, $g_f(\mathbf{x}_f, u)$ could be the indicator function denoting whether $u$ can retrieve the content from one of the local caches (reducing the load on the server). If it is the network service provider, $g_f(\mathbf{x}_f, u)$ could be the number of bytes that caching prevents from traversing bottleneck links. Finally, if it is the user, $g_f(\mathbf{x}_f, u)$ could be the delay reduction achieved through the local copies. We consider that $g_f(\mathbf{0}, u) = 0$, i.e., if there is no copy of content $f$, the gain is zero.

The gain $g_f(\mathbf{x}_f, u)$ may be a random variable. For example, it may depend on the instantaneous characteristics of the wireless channels, or on some random choice of the user, like the BS from which the file will be downloaded. We assume that, conditionally on the network status $\mathbf{x}_f$ and the user $u$, these random variables are independent from one request to the other and are identically distributed with expected value $\mathbb{E}[g_f(\mathbf{x}_f, u)]$.

Our theoretical results hold under a stationary request process. In particular, we consider two settings. In the first one, there is a finite set of $U$ users located at specific positions. Each user $u$ requests the different contents according to independent Poisson processes with rates $\lambda_{f,u}$ for $f = 1, \dots, F$. The total expected gain per time unit from a given placement $\mathbf{x}_f$ is

$$G_f(\mathbf{x}_f) = \sum_{u=1}^{U} \lambda_{f,u}\, \mathbb{E}[g_f(\mathbf{x}_f, u)]. \tag{1}$$

In the second setting, a potentially unbounded number of users are spread over the region $R$ according to a Poisson point process with density $\mu(r)$. Users are indistinguishable but for their position $r$. In particular, a user in $r$ generates a Poisson request process with rate $\lambda_f(r)$ and experiences a gain $g_f(\mathbf{x}_f, r)$. The total expected gain from a given placement of content $f$ copies is in this case

$$G_f(\mathbf{x}_f) = \int_R \lambda_f(r)\, \mathbb{E}[g_f(\mathbf{x}_f, r)]\, \mu(r)\, dr. \tag{2}$$

In what follows, we will refer to the marginal gain from a copy at base station $b$. When the set of users is finite, we define the following quantities, respectively for a given user and for the whole network:

$$\Delta g_f^{(b)}(\mathbf{x}_f, u) \triangleq g_f(\mathbf{x}_f, u) - g_f(\mathbf{x}_f \ominus \mathbf{e}^{(b)}, u), \tag{3}$$

$$\Delta G_f^{(b)}(\mathbf{x}_f) \triangleq G_f(\mathbf{x}_f) - G_f(\mathbf{x}_f \ominus \mathbf{e}^{(b)}). \tag{4}$$

It is possible to define $\Delta G_f^{(b)}(\mathbf{x}_f)$ similarly when users’ requests are characterized by a density over the region $R$. In what follows, we will usually refer to the case of a finite set of users, but all results hold in both scenarios.

Using (4) and the fact that $G_f(\mathbf{0}) = 0$, it is easy to check that the gain from a given cache configuration can be computed as the telescoping sum

$$G_f(\mathbf{x}_f) = \sum_{b=1}^{B} \Delta G_f^{(b)}\big(x_f^{(1)}, \dots, x_f^{(b)}, 0, \dots, 0\big). \tag{5}$$
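As a sanity check of (1), (4) and (5), the following minimal Python sketch evaluates the three quantities on a toy instance. The coverage map, the request rates and the choice of the hit indicator as gain are illustrative assumptions, not values from the paper, and BSs are indexed from 0.

```python
from itertools import product

# Hypothetical toy instance (not from the paper): 3 users, 2 base stations.
COVERAGE = {0: {0}, 1: {0, 1}, 2: {1}}  # user -> set of BSs in range
RATES = {0: 1.0, 1: 2.0, 2: 0.5}        # lambda_{f,u} for one content f


def gain(x, u):
    """g_f(x_f, u) for the hit-ratio metric: 1 if a covering BS stores f."""
    return 1.0 if any(x[b] for b in COVERAGE[u]) else 0.0


def total_gain(x):
    """G_f(x_f) as in (1): rate-weighted sum of per-user gains."""
    return sum(RATES[u] * gain(x, u) for u in RATES)


def remove(x, b):
    """Allocation x_f with the copy at BS b removed."""
    return tuple(0 if i == b else v for i, v in enumerate(x))


def marginal_total_gain(x, b):
    """Delta G_f^(b)(x_f) as in (4)."""
    return total_gain(x) - total_gain(remove(x, b))


def telescoped_gain(x):
    """Right-hand side of (5): marginal gains of progressively filled vectors."""
    n = len(x)
    return sum(
        marginal_total_gain(x[: b + 1] + (0,) * (n - b - 1), b)
        for b in range(n)
    )


for x in product([0, 1], repeat=2):
    print(x, total_gain(x), telescoped_gain(x))
```

In this toy instance the two columns printed for each allocation coincide, as (5) predicts; the second copy is worth less than the first because the user covered by both BSs is already served.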

We would like our dynamic policy to converge to a content placement that maximizes the total expected gain, i.e.,

$$\begin{aligned} \underset{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_F}{\text{maximize}} \quad & G(\mathbf{x}) \triangleq \sum_{f=1}^{F} G_f(\mathbf{x}_f) \\ \text{subject to} \quad & \sum_{f=1}^{F} x_f^{(b)} = C \quad \forall b = 1, \dots, B, \\ & x_f^{(b)} \in \{0, 1\} \quad \forall f = 1, \dots, F,\ \forall b = 1, \dots, B, \end{aligned} \tag{6}$$

even in the absence of a priori knowledge about the request process. In the three specific examples we have mentioned above, solving problem (6) respectively corresponds to 1) maximizing the hit ratio, 2) minimizing the network traffic, and 3) minimizing the retrieval time. This problem is in general NP-hard, even in the case of the simple hit ratio metric [shanmugam13].

## III qLRU-Δ

We describe here how our system operates and the specific caching policy we propose to approach the solution of Problem (6).

When user $u$ has a request for content $f$, it broadcasts an inquiry message to the set of BSs it can communicate with. The subset of those BSs that have the content $f$ stored locally declare their availability to user $u$. If no local copy is available, the user sends the request to one of the BSs in range, which will need to retrieve it from the content provider. (This two-step procedure introduces some additional delay, but this is inevitable in any femtocaching scheme where the BSs need to coordinate to serve the content.) If a local copy is available and only point-to-point transmissions are possible, the user sends an explicit request to download it to one of the BSs that declared availability. Different user criteria can be defined to select the BS to download from (e.g., SNR, or a pre-assigned priority list [LTE-book]); for the sake of simplicity, in this paper we assume that the user selects one of them uniformly at random. If CoMP techniques are supported, then all the BSs holding a copy coordinate to jointly transmit the content to the user.

Our policy qLRU-Δ works as follows. Each BS $b$ with a local copy ($x_f^{(b)} = 1$) moves the content to the front of the cache with probability proportional to the marginal gain due to the local copy, i.e.,

$$p_f^{(b)}(u) = \beta\, \Delta g_f^{(b)}(\mathbf{X}_f(t), u), \tag{7}$$

where the constant $\beta$ guarantees that $p_f^{(b)}(u)$ is indeed a probability, i.e., it is between 0 and 1 and dimensionless. At least one of the BSs in range without the content decides to store an additional copy of $f$ with probability

$$q_f^{(b)}(u) = q\, \delta\, \Delta g_f^{(b)}(\mathbf{X}_f(t) \oplus \mathbf{e}^{(b)}, u), \tag{8}$$

where $\delta$ plays the same role as $\beta$ above and $q$ is a dimensionless parameter in $(0, 1]$. We are going to prove that qLRU-Δ is asymptotically optimal when $q$ converges to 0. This result holds under different variants of the update rule at BSs without the content. For example, any positive number of these BSs can randomly decide whether or not to retrieve the copy, and the insertion probability could simply be equal to the constant $q$. We propose (8) because it is more likely to add copies that bring a large benefit $\Delta g_f^{(b)}$. This choice likely improves convergence speed, and hence the performance in non-stationary popularity environments. Moreover, as will be clear from the discussion in the following section, our optimality result depends on the expected move-to-front probability being proportional to the expected marginal gain $\mathbb{E}[\Delta g_f^{(b)}(\mathbf{x}_f, u)]$. It is then possible to replace $\Delta g_f^{(b)}$ in (7) with any other unbiased estimator of this expected marginal gain. We are going to show an example where this is useful in Sect. V.
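The two update rules (7) and (8) can be sketched for a single cache as follows. This is a minimal illustration, not the authors' implementation: the class name, method names and the convention that the caller supplies the marginal gain (already computed from neighbourhood information) are assumptions made here for the example.

```python
import random
from collections import OrderedDict


class QLruDeltaCache:
    """Single-cache sketch of the update rules (7) and (8).

    Hypothetical interface: the caller passes the marginal gain of the
    local copy; beta and delta must be chosen so that the products below
    stay within [0, 1].
    """

    def __init__(self, capacity, q, beta=1.0, delta=1.0, rng=None):
        self.capacity = capacity
        self.q = q
        self.beta = beta
        self.delta = delta
        self.rng = rng or random.Random()
        self._items = OrderedDict()  # last key = front of the queue

    def __contains__(self, content):
        return content in self._items

    def on_hit(self, content, marginal_gain):
        """Rule (7): move to front with probability beta * marginal gain."""
        if content not in self._items:
            return False
        if self.rng.random() < self.beta * marginal_gain:
            self._items.move_to_end(content)
            return True
        return False

    def on_miss(self, content, marginal_gain):
        """Rule (8): insert with probability q * delta * marginal gain."""
        if content in self._items:
            return False
        if self.rng.random() < self.q * self.delta * marginal_gain:
            if len(self._items) >= self.capacity:
                self._items.popitem(last=False)  # evict the tail of the queue
            self._items[content] = None
            return True
        return False

    def contents(self):
        """Stored contents, front of the queue first."""
        return list(reversed(self._items))
```

With `q = 1` and a marginal gain always equal to 1 the sketch degenerates to plain LRU; lowering `q` makes insertions rarer while leaving the move-to-front dynamics unchanged.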

## IV Optimality of qLRU-Δ

A caching configuration $\mathbf{x}$ is locally optimal if it provides the highest gain among all the caching configurations which can be obtained from $\mathbf{x}$ by replacing one content in one of the caches.

###### Definition IV.1.

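A caching configuration $\mathbf{x}$ is called locally optimal if $G(\mathbf{x}) \ge G(\mathbf{x}')$ for every configuration $\mathbf{x}'$ that can be obtained from $\mathbf{x}$ by replacing one content with another one at a single cache.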
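The definition can be checked by exhaustive single-swap enumeration. The sketch below assumes a hypothetical representation (one set of content identifiers per cache) and a caller-supplied function returning the total gain $G$; the toy gain used in the example, two fully overlapping BSs with hit-ratio gain, is an illustrative assumption.

```python
def is_locally_optimal(caches, catalogue, total_gain, tol=1e-12):
    """Return True if no single-content replacement at one cache improves G.

    caches: list with one set of stored contents per BS (assumed layout).
    total_gain: callable mapping such a list to the total expected gain.
    """
    base = total_gain(caches)
    for b, stored in enumerate(caches):
        for evicted in stored:
            for inserted in catalogue:
                if inserted in stored:
                    continue
                candidate = [set(c) for c in caches]
                candidate[b].remove(evicted)
                candidate[b].add(inserted)
                if total_gain(candidate) > base + tol:
                    return False
    return True


# Toy gain: every user is covered by both BSs, so a request is a hit
# whenever the content is stored anywhere (popularities are made up).
POPULARITY = {0: 0.6, 1: 0.4}


def hit_ratio(caches):
    return sum(POPULARITY[f] for f in set().union(*caches))


print(is_locally_optimal([{0}, {0}], POPULARITY, hit_ratio))
print(is_locally_optimal([{0}, {1}], POPULARITY, hit_ratio))
```

Storing the most popular content twice is not locally optimal here, because swapping one copy for the other content raises the hit ratio; storing one copy of each content is.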

We are going to prove that qLRU-Δ achieves a locally optimal configuration when $q$ vanishes. The result relies on two approximations: the usual characteristic time approximation (CTA) for caching policies (also known as Che’s approximation) [fagin77, che02] and the new exponentialization approximation (EA) for networks of interacting caches originally proposed in [leonardi18jsac]. The main result of this paper is the following:

###### Proposition IV.1.

[loose statement] Under characteristic time and exponentialization approximations, a spatial network of qLRU-Δ caches asymptotically achieves a locally optimal caching configuration when $q$ vanishes.

Before moving to the detailed proof, we provide some intuition about why this result holds. We observe that, as $q$ converges to 0, the cache exhibits two different dynamics with very different timescales: the insertion of new contents tends to happen more and more rarely ($q_f^{(b)}(u)$ converges to 0), while the frequency of position updates for files already in the cache is unchanged ($p_f^{(b)}(u)$ does not depend on $q$). A file $f$ at cache $b$ is moved to the front with a probability proportional to $\Delta g_f^{(b)}(\mathbf{x}_f, u)$, i.e., proportional to how much the file contributes to improving the performance metric of interest. This is a very noisy signal: upon a given request, the file is moved to the front or not. At the same time, as $q$ converges to 0, more and more moves-to-the-front occur between any two file evictions. The expected number of moves-to-the-front that file $f$ experiences is proportional to 1) how often it is requested ($\lambda_{f,u}$) and 2) how likely it is to be moved to the front upon a request ($\mathbb{E}[\Delta g_f^{(b)}(\mathbf{x}_f, u)]$). Overall, the expected number of moves is proportional to $\sum_u \lambda_{f,u}\, \mathbb{E}[\Delta g_f^{(b)}(\mathbf{x}_f, u)] = \Delta G_f^{(b)}(\mathbf{x}_f)$, i.e., its contribution to the expected gain. By the law of large numbers, the random number of moves-to-the-front will be close to its expected value, and it becomes likely that the least valuable file in the cache occupies the last position. We can then think that, when a new file is inserted in the cache, it will replace the file that contributes the least to the expected gain.

qLRU-Δ then behaves as a greedy algorithm that, driven by the request process, upon insertions progressively replaces the least useful file in the cache, until it reaches a local maximum.

### IV-A Characteristic Time Approximation

This is now a standard approximation for a cache in isolation, and one of the most effective approximate approaches for the analysis of caching systems. CTA was first introduced (and analytically justified) in [fagin77] and later rediscovered in [che02]. It was originally proposed for LRU under the IRM request process, and it has later been extended to different caching policies and different request processes [garetto16, garetto15]. The characteristic time $T_c$ is the time a given content spends in the cache from its insertion until its eviction in the absence of any request for it. In general, this time depends in a complex way on the dynamics of the requests for other contents. Instead, the CTA assumes that $T_c$ is a random variable independent of the dynamics of other contents and with an assigned distribution (the same for every content). This assumption makes it possible to decouple the dynamics of the different contents: upon a miss for content $f$, the content is retrieved and a timer with random value $T_c$ is generated. When the timer expires, the content is evicted from the cache. Cache policies differ in i) the distribution of $T_c$ and ii) what happens to the timer upon a hit. For example, $T_c$ is a constant under LRU, qLRU, 2LRU and FIFO, and exponentially distributed under RANDOM. Upon a hit, the timer is renewed under LRU, qLRU and 2LRU, but not under FIFO or RANDOM. Under CTA, the instantaneous cache occupancy can violate the hard buffer constraint. The value of $T_c$ is obtained by imposing that the expected occupancy equals the buffer size. Despite its simplicity, CTA was shown to provide asymptotically exact predictions for a single LRU cache under IRM as the cache size grows large [fagin77, Jele99, fricker2012].

Once inserted in the cache, a given content $f$ will sojourn in the cache for a random amount of time $T_{S,f}^{(b)}$, independently of the dynamics of other contents. $T_{S,f}^{(b)}$ can be characterized for the different policies. In particular, if the timer is renewed upon a hit, we have:

$$T_{S,f}^{(b)} = \sum_{k=1}^{M} Y_k + T_c^{(b)}, \tag{9}$$

where $M$ is the number of consecutive hits following a miss and $Y_k$ is the time interval between the $k$-th hit and the previous content request.

We want to compute the expected value of $T_{S,f}^{(b)}$, which we denote as $1/\nu_f^{(b)}$. When the number of users is finite, requests for content $f$ from user $u$ arrive according to a Poisson process with rate $\lambda_{f,u}$. The time instants at which content $f$ is moved to the front are generated by thinning this Poisson process with probability $\beta\, \mathbb{E}[\Delta g_f^{(b)}(u)]$. (Here we simply write $\Delta g_f^{(b)}(u)$ instead of $\Delta g_f^{(b)}(\mathbf{x}_f, u)$, because we are considering a single cache. Similarly, we write $\Delta G_f^{(b)}$ instead of $\Delta G_f^{(b)}(\mathbf{x}_f)$.) The resulting sequence is then also a Poisson process with rate $\beta \lambda_{f,u}\, \mathbb{E}[\Delta g_f^{(b)}(u)]$. Finally, as request processes from different users are independent, the aggregate cache updates due to all users form a Poisson process with rate

$$\beta \sum_{u=1}^{U} \lambda_{f,u}\, \mathbb{E}[\Delta g_f^{(b)}(u)] = \beta\, \Delta G_f^{(b)}.$$

The same result holds when we consider a density of requests over the region $R$.

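As the aggregate cache updates follow a Poisson process with rate $\beta \Delta G_f^{(b)}$, the $Y_k$ are i.i.d. truncated exponential random variables with rate $\beta \Delta G_f^{(b)}$ over the interval $[0, T_c^{(b)}]$ and their expected value is

$$\mathbb{E}[Y_k] = \frac{1}{\beta \Delta G_f^{(b)}} - \frac{T_c^{(b)}}{e^{\beta \Delta G_f^{(b)} T_c^{(b)}} - 1}.$$

Moreover, the probability that no update occurs during a time interval of length $T_c^{(b)}$ is $e^{-\beta \Delta G_f^{(b)} T_c^{(b)}}$. Then $M$ is distributed as a geometric random variable with values $\{0, 1, \dots\}$ and expected value

$$\mathbb{E}[M] = \frac{1 - e^{-\beta \Delta G_f^{(b)} T_c^{(b)}}}{e^{-\beta \Delta G_f^{(b)} T_c^{(b)}}} = e^{\beta \Delta G_f^{(b)} T_c^{(b)}} - 1.$$

We can then apply Wald's lemma to (9), obtaining:

$$\nu_f^{(b)} \triangleq \frac{1}{\mathbb{E}[T_{S,f}^{(b)}]} = \frac{1}{\mathbb{E}[Y_1]\, \mathbb{E}[M] + T_c^{(b)}} = \frac{\beta \Delta G_f^{(b)}}{e^{\beta \Delta G_f^{(b)} T_c^{(b)}} - 1}. \tag{10}$$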
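The algebra leading to (10) can be verified numerically. In the sketch below, `rate` stands for the aggregate update rate $\beta \Delta G_f^{(b)}$ and `t_c` for the characteristic time; the function names and the numerical values are chosen here purely for illustration.

```python
import math


def expected_inter_hit_time(rate, t_c):
    """E[Y_k]: mean of an exponential with the given rate truncated to [0, t_c]."""
    return 1.0 / rate - t_c / math.expm1(rate * t_c)


def expected_hits(rate, t_c):
    """E[M]: mean number of consecutive timer renewals before eviction."""
    return math.expm1(rate * t_c)


def eviction_rate_closed_form(rate, t_c):
    """Right-hand side of (10)."""
    return rate / math.expm1(rate * t_c)


def eviction_rate_via_wald(rate, t_c):
    """1 / (E[Y_1] E[M] + T_c), i.e. Wald's lemma applied to (9)."""
    mean_sojourn = expected_inter_hit_time(rate, t_c) * expected_hits(rate, t_c) + t_c
    return 1.0 / mean_sojourn


# Illustrative values only.
print(eviction_rate_via_wald(0.7, 3.0), eviction_rate_closed_form(0.7, 3.0))
```

The two expressions agree up to floating-point error, since $\mathbb{E}[Y_1]\,\mathbb{E}[M] + T_c^{(b)}$ simplifies to $(e^{\beta \Delta G_f^{(b)} T_c^{(b)}} - 1)/(\beta \Delta G_f^{(b)})$: the two terms containing $T_c^{(b)}$ cancel.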

### IV-B Exponentialization Approximation

We now consider the case when cells may overlap. The sojourn time of content $f$ inserted at time $t$ in cache $b$ will now depend on the whole state vector $\mathbf{X}_f(\tau)$ for $\tau \ge t$ (until the content is evicted), because the content is updated with probability (7), which depends on the marginal gain of the copy (and then on $\mathbf{X}_f(\tau)$). EA consists in assuming that the stochastic process $\mathbf{X}_f(t)$ is a continuous-time Markov chain (MC). For each $f$ and $b$, the transition rate from a state $\mathbf{x}_f$ with $x_f^{(b)} = 1$ to $\mathbf{x}_f \ominus \mathbf{e}^{(b)}$ is given by (10) with $\Delta G_f^{(b)}$ replaced by $\Delta G_f^{(b)}(\mathbf{x}_f)$. EA thus replaces the original stochastic process, whose analysis appears prohibitive, with a (simpler) MC. [leonardi18jsac] shows that this has no impact on any system metric that depends only on the stationary distribution in the following cases:

1. isolated caches,

2. caches using RANDOM policy,

3. caches using FIFO policy as long as the resulting Markov Chain is reversible.

Numerical results in [leonardi18jsac] show that the approximation is practically very accurate also outside these specific cases.

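### IV-C Transition rates of the continuous time Markov Chain as q vanishes

For a given content $f$, let $\mathbf{x}_f$ and $\mathbf{y}_f$ be two possible states of the MC. We write $\mathbf{x}_f \prec \mathbf{y}_f$ whenever $x_f^{(b)} \le y_f^{(b)}$ for each $b$ and there is at least one $b_0$ such that $x_f^{(b_0)} < y_f^{(b_0)}$, and we say that $\mathbf{y}_f$ is an ancestor of $\mathbf{x}_f$, and $\mathbf{x}_f$ is a descendant of $\mathbf{y}_f$. Furthermore, we denote by $|\mathbf{x}_f| = \sum_b x_f^{(b)}$ the number of copies of content $f$ stored in state $\mathbf{x}_f$, and we call it the weight of the state $\mathbf{x}_f$. If $\mathbf{x}_f \prec \mathbf{y}_f$ and $|\mathbf{x}_f| = |\mathbf{y}_f| - 1$, we say that $\mathbf{y}_f$ is a parent of $\mathbf{x}_f$ and $\mathbf{x}_f$ is a child of $\mathbf{y}_f$.

Now observe that, by construction, transition rates in the MC are different from 0 only between pairs of states $\mathbf{x}_f$ and $\mathbf{y}_f$ such that $\mathbf{x}_f \prec \mathbf{y}_f$ or $\mathbf{y}_f \prec \mathbf{x}_f$. For $\mathbf{x}_f \prec \mathbf{y}_f$, the transition $\mathbf{x}_f \to \mathbf{y}_f$ is called an upward transition, while $\mathbf{y}_f \to \mathbf{x}_f$ is called a downward transition. A downward transition can only occur from a parent to a child ($|\mathbf{x}_f| = |\mathbf{y}_f| - 1$). Let $b_0$ be the index such that $x_f^{(b_0)} < y_f^{(b_0)}$. The downward rate is

$$\rho[\mathbf{y}_f \to \mathbf{x}_f] = \nu_f^{(b_0)}(\mathbf{y}_f) = \frac{\beta \Delta G_f^{(b_0)}(\mathbf{y}_f)}{e^{\beta \Delta G_f^{(b_0)}(\mathbf{y}_f) T_c^{(b_0)}} - 1}. \tag{11}$$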

Upward transitions can occur to states that are ancestors. The exact transition rate between state $\mathbf{x}_f$ and an ancestor $\mathbf{y}_f$ can have a quite complex expression, because it depends on the joint decisions of the BSs in range that do not store the content. Luckily, for our analysis, we are only interested in how this rate depends on $q$ when $q$ converges to 0. We use the symbol $\propto$ to indicate that two quantities are asymptotically proportional for small $q$, i.e., $f(q) \propto g(q)$ if and only if there exists a strictly positive constant $a$ such that $\lim_{q \to 0} f(q)/g(q) = a$. If $\lim_{q \to 0} f(q)/g(q) = 0$, then we write $f(q) = o(g(q))$, following Bachmann-Landau notation.

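Upon a request for $f$, an upward transition $\mathbf{x}_f \to \mathbf{y}_f$ occurs if $|\mathbf{y}_f| - |\mathbf{x}_f|$ BSs independently store, each with probability proportional to $q$, an additional copy of the content in their local cache. It follows that:

$$\rho[\mathbf{x}_f \to \mathbf{y}_f] \propto q^{|\mathbf{y}_f| - |\mathbf{x}_f|}. \tag{12}$$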

Now, as $q \to 0$, every upward rate tends to 0 for every content $f$. Therefore, the characteristic time of every cell must diverge. In fact, if this were not the case for a cache $b$, none of the contents would be found in this cache asymptotically, because upward rates tend to zero while downward rates would not. This would contradict the set of constraints

$$\sum_f \sum_{\mathbf{x}_f} x_f^{(b)}\, \pi(\mathbf{x}_f) = C, \quad \forall b \tag{13}$$

imposed by the CTA, where $\pi$ is the MC stationary distribution. Therefore necessarily $T_c^{(b)} \to \infty$ for every cell $b$. More precisely, $T_c^{(b)}$ must grow as $\Theta(\log(1/q))$ at every cache, otherwise we fail to meet (13). There exist then positive constants $a_b$ and $A_b$ such that $T_c^{(b)}/\log(1/q)$ asymptotically belongs to $[a_b, A_b]$. We would like to characterize more precisely the growth rate of $T_c^{(b)}$, i.e., to show that $T_c^{(b)} \propto \log(1/q)$, but our attempts have been unfruitful. Nevertheless, we can prove that there exists a sequence $(q_n)$ converging to 0 and a set of positive constants $\gamma_b$ such that $\beta\, T_c^{(b)}(q_n)/\log(1/q_n) \to 1/\gamma_b$ for all $b$. In fact, given any sequence converging to zero, the corresponding sequence of vectors of normalized characteristic times belongs to the compact set $\prod_b [a_b, A_b]$ and then admits a converging subsequence by the Bolzano-Weierstrass theorem. The limit of such a subsequence defines the constants $\gamma_b$. From this result and (11), it follows that a downward transition from a parent $\mathbf{y}_f$ to a child $\mathbf{x}_f$ occurs with rate

$$\rho[\mathbf{y}_f \to \mathbf{x}_f] \propto q_n^{\Delta G_f^{(b_0)}(\mathbf{y}_f)/\gamma_{b_0}}.$$

The following lemma summarises the results of this section.

###### Lemma IV.2.

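Consider two neighbouring states $\mathbf{x}_f$ and $\mathbf{y}_f$ with $|\mathbf{x}_f| < |\mathbf{y}_f|$. There exists a sequence $(q_n)$ converging to zero and a set of positive constants $\gamma_b$, such that

$$\rho[\mathbf{x}_f \to \mathbf{y}_f] \propto q_n^{|\mathbf{y}_f| - |\mathbf{x}_f|},$$

and, if $|\mathbf{y}_f| - |\mathbf{x}_f| = 1$ and $b_0$ is the cache at which the two states differ, then

$$\rho[\mathbf{y}_f \to \mathbf{x}_f] \propto q_n^{\Delta G_f^{(b_0)}(\mathbf{y}_f)/\gamma_{b_0}}.$$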
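The exponent in the downward rate can be illustrated numerically by plugging a logarithmically growing characteristic time into the eviction rate (11). The sketch assumes, for illustration only, that the characteristic time is exactly $\log(1/q)/(\beta \gamma)$, which is an idealization of the asymptotic behaviour along the sequence in the lemma; function names and numerical values are hypothetical.

```python
import math


def downward_rate(q, marginal_gain, gamma, beta=1.0):
    """Eviction rate (11) under the assumed scaling T_c = log(1/q) / (beta * gamma)."""
    t_c = math.log(1.0 / q) / (beta * gamma)
    return beta * marginal_gain / math.expm1(beta * marginal_gain * t_c)


def resistance_estimate(q, marginal_gain, gamma, beta=1.0):
    """Empirical exponent of q: log(rate) / log(q), meaningful for small q."""
    return math.log(downward_rate(q, marginal_gain, gamma, beta)) / math.log(q)


# Illustrative values: marginal gain 2, gamma 1, so the exponent should approach 2.
for q in (1e-5, 1e-15, 1e-30):
    print(q, resistance_estimate(q, marginal_gain=2.0, gamma=1.0))
```

As `q` shrinks, the estimated exponent approaches the ratio of the marginal gain to `gamma`: copies with a larger marginal gain are evicted at a rate that vanishes faster.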

### IV-D Stochastically stable states

In this subsection, we first introduce the key concept of stochastically stable state(s), in which the system gets trapped as $q$ converges to 0. Then, we provide a preliminary characterization of stochastically stable states (Lemma IV.4), which will be useful in subsection IV-F to prove that they correspond necessarily to locally optimal configurations.

Let us consider the uniformization of the continuous-time MC with an arbitrarily high rate. We denote by $P_q$ the transition probability matrix of the discrete-time MC so obtained. For $q = 0$, the set of contents in the cache does not change, each state is an absorbing one and any probability distribution is a stationary probability distribution for $P_0$. We are rather interested in the asymptotic behaviour of the MC when $q$ converges to 0. (For simplicity, we still refer to $q$ converging to 0, but the results hold for the vanishing sequence $(q_n)$ introduced above.) For $q > 0$ the MC is finite, irreducible, and aperiodic, and then admits a unique stationary probability $\pi_q$. (Irreducibility is guaranteed if the insertion probabilities in (8) are positive. In some specific settings the marginal gain in (8) may be zero; the insertion rule can then be modified so that the probability stays strictly positive, or simply replaced by a constant probability.)

###### Definition IV.2.

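A state $\mathbf{x}$ is called stochastically stable if $\lim_{q \to 0} \pi_q(\mathbf{x}) > 0$, where $\pi_q$ denotes the stationary distribution of the MC for a given $q > 0$.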

We are going to characterize such states. For what we have said above, the probability to move from to an ancestor (if not zero) is . The probability to move from to the child without the copy in is . For each possible transition, we define its direct resistance to be the exponent of the parameter , then , and . Observe that the higher the resistance, the less likely the corresponding transition. We consider a weighted graph , whose nodes are the possible states and edges indicate possible direct transitions and have a weight equal to the corresponding resistance. Given a sequence of transitions from state to state (or equivalently a path in ), we define its resistance to be the sum of the resistances, i.e., .

We define the potential of a state as the resistance of the minimum-weight in-tree (or anti-arborescence) of this graph rooted at that state. Intuitively, the potential measures the overall difficulty of reaching the state from all other nodes.
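The definition of potential can be made concrete with a brute-force sketch. The code below enumerates every in-tree of a small weighted directed graph and returns the minimum total resistance for each root. The four states (`"00"`, `"01"`, `"10"`, `"11"`, read as copies of one content at two base stations) and all resistance values are hypothetical, chosen only for illustration; the function names are ours as well.

```python
from itertools import product

# Hypothetical resistances of direct transitions (u, v) for one content
# cached at two base stations. These numbers are illustrative only.
WEIGHT = {
    ("00", "01"): 1.0, ("00", "10"): 1.0, ("00", "11"): 2.0,
    ("01", "11"): 1.0, ("10", "11"): 1.0,
    ("11", "01"): 0.5, ("11", "10"): 0.5,
    ("01", "00"): 2.0, ("10", "00"): 1.5,
}
NODES = ["00", "01", "10", "11"]


def reaches(u, root, succ):
    """Follow the chosen outgoing edges from u; True if we end at root."""
    seen = set()
    while u != root:
        if u in seen:          # entered a cycle that avoids the root
            return False
        seen.add(u)
        u = succ[u]
    return True


def potentials(nodes, weight):
    """Map each node to the weight of a minimum-weight in-tree rooted at it."""
    out = {u: [v for (a, v) in weight if a == u] for u in nodes}
    result = {}
    for root in nodes:
        others = [u for u in nodes if u != root]
        best = float("inf")
        # An in-tree gives every non-root node exactly one outgoing edge.
        for choice in product(*(out[u] for u in others)):
            succ = dict(zip(others, choice))
            if all(reaches(u, root, succ) for u in others):
                best = min(best, sum(weight[(u, succ[u])] for u in others))
        result[root] = best
    return result


def stable_states(pot):
    """States whose potential is minimal."""
    m = min(pot.values())
    return sorted(s for s, p in pot.items() if p == m)


if __name__ == "__main__":
    pot = potentials(NODES, WEIGHT)
    print(pot)                 # {'00': 3.0, '01': 2.5, '10': 2.5, '11': 3.0}
    print(stable_states(pot))  # ['01', '10']
```

The enumeration is exponential in the number of states, so it is only meant for toy instances; a minimum-weight arborescence algorithm would be the natural replacement on larger graphs.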

The following lemma formalizes the intuition that the minimum weight in-tree will not include an edge, if it is possible to include the corresponding nodes as well as a third one at the same cost.

###### Lemma IV.3.

Consider a strongly connected weighted directed graph with weight for each edge . A minimum cost spanning tree rooted in a given node does not contain edges , such that there exists a node for which and belong to and .

###### Proof.

Let us denote by a minimum cost spanning tree rooted in . We prove the result by contradiction. Assume the edge with the property indicated above belongs to . Let us now consider the set of edges . Any node can reach node through the edges in . Note that cannot belong to , because any node can only have an outgoing edge and belongs to . If (i.e., if ), then is also an in-tree rooted in and its weight is strictly smaller than the weight of . If (i.e., if the path from to in does not use the edge , then is not an in-tree and the sum of the weights of the edge in and in is equal. It is then possible to remove an edge from to obtain an in-tree with strictly smaller weight. ∎

We can now characterize the set of stochastically stable states:

###### Lemma IV.4.

A state is stochastically stable if and only if its potential is minimal.

###### Proof.

The family of Markov chains obtained as $q$ varies is a regular perturbation [young93, properties (6-8)], and the result then follows from Theorem 4 of [young93], which provides an analogous characterization. Our analysis simplifies the one in [young93]. In fact, the analysis in [young93] focuses on a complete graph whose vertices are the recurrent communication classes of the unperturbed chain ($q = 0$), and in which the weight of the edge between two distinct classes is the minimum resistance over all possible paths from any state of the first class to any state of the second. The potential of a class is then the resistance of the minimum-weight in-tree rooted at that class in this complete graph. In our case, each communication class includes a single state, so the complete graph and our transition graph have the same set of nodes. The complete graph has more edges, but one can show that its minimum-weight in-trees include only edges that correspond to direct transitions between the two associated states, i.e., edges that also appear in our transition graph (Lemma IV.3). ∎
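To see the lemma at work on the smallest possible case, consider a hypothetical chain with only two states, a child $\mathbf{x}$ and its parent $\mathbf{y}$. The constants $c_1, c_2 > 0$, the downward resistance $r > 0$, and the notation $\pi_q$ for the stationary probability are illustrative assumptions, not quantities taken from the model above.

```latex
% Upward transition has resistance 1, downward transition has resistance r.
P_q(\mathbf{x} \to \mathbf{y}) = c_1\, q, \qquad
P_q(\mathbf{y} \to \mathbf{x}) = c_2\, q^{r}.

% Balance equation \pi_q(\mathbf{x})\, c_1 q = \pi_q(\mathbf{y})\, c_2 q^{r} gives
\pi_q(\mathbf{y}) = \frac{c_1 q}{c_1 q + c_2 q^{r}}
                  = \frac{1}{1 + (c_2/c_1)\, q^{\,r-1}} .

% Limit as q \to 0:
\lim_{q \to 0} \pi_q(\mathbf{y}) =
\begin{cases}
  1               & \text{if } r > 1,\\
  c_1/(c_1 + c_2) & \text{if } r = 1,\\
  0               & \text{if } r < 1.
\end{cases}
```

The only in-tree rooted at $\mathbf{y}$ is the edge $\mathbf{x} \to \mathbf{y}$, so the potential of $\mathbf{y}$ is $1$; likewise the potential of $\mathbf{x}$ is $r$. The parent $\mathbf{y}$ therefore has minimal potential exactly when $r \ge 1$, which is also exactly when its limiting stationary probability is positive.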

For each content $f$ we are then able to characterize which configurations are stochastically stable as $q$ converges to $0$. Moreover, this set of configurations must satisfy the constraint (13) at each base station $b$. We conclude this section by introducing the concept of jointly stochastically stable cache configurations.

###### Definition IV.3.

A cache configuration $\mathbf{x}$ is jointly stochastically stable if 1) for each content $f$, $\mathbf{x}_f$ is stochastically stable, and 2) $\mathbf{x}$ satisfies (13) for each base station $b$.

### IV-E Dominant transitions

In the proof of Proposition IV.1 the concept of dominant transitions will be used.

###### Definition IV.4.

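Given two neighboring states $\mathbf{x}_f$ and $\mathbf{y}_f$, we say that the transition $\mathbf{x}_f \to \mathbf{y}_f$ is dominant if its direct resistance is not larger than that of the reverse transition, i.e., $r(\mathbf{x}_f, \mathbf{y}_f) \le r(\mathbf{y}_f, \mathbf{x}_f)$.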

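Let $\mathbf{y}_f$ be a parent of $\mathbf{x}_f$ with $y_f^{(b_0)} = 1$ and $x_f^{(b_0)} = 0$. We observe that the upward transition $\mathbf{x}_f \to \mathbf{y}_f$ is dominant if and only if $\Delta G_f^{(b_0)}(\mathbf{y}_f) \ge \gamma_{b_0}$. Similarly, the downward transition $\mathbf{y}_f \to \mathbf{x}_f$ is dominant if and only if $\Delta G_f^{(b_0)}(\mathbf{y}_f) \le \gamma_{b_0}$. Note that both transitions are dominant if $\Delta G_f^{(b_0)}(\mathbf{y}_f) = \gamma_{b_0}$.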

Let us also consider the function of state

$$\phi(\mathbf{x}_f) \triangleq G_f(\mathbf{x}_f) - \sum_{b \,\mid\, x_f^{(b)} = 1} \gamma_b. \tag{14}$$

The following lemma guarantees that the function $\phi$ decreases along a non-dominant transition.

###### Lemma IV.5.

The transition $\mathbf{x}_f \to \mathbf{y}_f$ is dominant if and only if $\phi(\mathbf{y}_f) \ge \phi(\mathbf{x}_f)$.

###### Proof.

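Let $b'$ be the index at which $\mathbf{x}_f$ and $\mathbf{y}_f$ differ. By definition, the upward transition $\mathbf{x}_f \to \mathbf{y}_f$ is dominant if and only if $\Delta G_f^{(b')}(\mathbf{y}_f) \ge \gamma_{b'}$. This inequality is equivalent to: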

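$$\begin{aligned}
\phi(\mathbf{y}_f) &= G_f(\mathbf{y}_f) - \sum_{b \,\mid\, y_f^{(b)} = 1} \gamma_b \\
&= G_f(\mathbf{x}_f) - \sum_{b \,\mid\, x_f^{(b)} = 1} \gamma_b + \Delta G_f^{(b')}(\mathbf{y}_f) - \gamma_{b'} \\
&\ge \phi(\mathbf{x}_f).
\end{aligned}$$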

The proof when $\mathbf{x}_f \to \mathbf{y}_f$ is a downward dominant transition is similar. ∎
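The equivalence in Lemma IV.5 can be checked exhaustively on a small instance. The sketch below assumes a made-up gain function (the number of users covered by at least one base station storing the content) and made-up constants $\gamma_b$; neither is the paper's $G_f$ or its actual parameters. Dominance is tested through the marginal-gain condition used in the proof above.

```python
from itertools import product

# Hypothetical toy instance: three base stations and the users each covers.
COVERAGE = {0: {"u1", "u2"}, 1: {"u2", "u3"}, 2: {"u3"}}
GAMMA = {0: 2, 1: 1, 2: 1}  # assumed per-BS constants gamma_b


def gain(x):
    """Toy G_f(x): users covered by at least one BS that stores the content."""
    covered = set()
    for b, has_copy in enumerate(x):
        if has_copy:
            covered |= COVERAGE[b]
    return len(covered)


def phi(x):
    """phi(x) = G_f(x) minus the sum of gamma_b over BSs holding a copy (14)."""
    return gain(x) - sum(GAMMA[b] for b, has_copy in enumerate(x) if has_copy)


def remove_copy(y, b):
    """Child of y obtained by dropping the copy stored at BS b."""
    return y[:b] + (0,) + y[b + 1:]


def marginal_gain(y, b):
    """Delta G^{(b)}(y): gain lost when the copy at BS b is removed from y."""
    return gain(y) - gain(remove_copy(y, b))


def check_lemma():
    """Verify 'dominant iff phi does not decrease' on every parent/child pair."""
    for y in product((0, 1), repeat=len(COVERAGE)):
        for b in range(len(COVERAGE)):
            if y[b] == 1:
                x = remove_copy(y, b)
                up_dominant = marginal_gain(y, b) >= GAMMA[b]    # x -> y
                down_dominant = marginal_gain(y, b) <= GAMMA[b]  # y -> x
                assert up_dominant == (phi(y) >= phi(x))
                assert down_dominant == (phi(x) >= phi(y))
    return True


if __name__ == "__main__":
    print(check_lemma())
```

Integer gains and constants are used on purpose, so that the boundary case where both transitions are dominant is compared exactly rather than through floating-point arithmetic.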

### IV-F Optimality proof

Now we can state formally our result.

###### Proposition IV.1.

Under characteristic time and exponentialization approximations, let $(q_n)$ and $(\gamma_b)$ be the sequence and the constants in Lemma IV.2. Consider the spatial network of qLRU-Δ caches, where cache $b$ selects the parameter . As $n$ diverges, a jointly stochastically stable cache configuration is also locally optimal.

###### Proof.

Given a cache configuration $\mathbf{x}$, let $G(\mathbf{x})$ denote the global gain across all contents, i.e., $G(\mathbf{x}) = \sum_f G_f(\mathbf{x}_f)$.

A jointly stochastically stable cache configuration $\mathbf{x}$ is locally optimal if and only if changing one content at a given cache does not increase the global gain $G(\mathbf{x})$. Without loss of generality, we consider replacing content $f_1$, present at cache $B$, with content $f_2$. The cache allocation then changes from $\mathbf{x}$, with $x_{f_1}^{(B)} = 1$ and $x_{f_2}^{(B)} = 0$, to a new cache allocation $\mathbf{x}'$ such that $x_{f_1}'^{(B)} = 0$ and $x_{f_2}'^{(B)} = 1$. From (5), we obtain that

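$$\begin{aligned}
G(\mathbf{x}) \ge G(\mathbf{x}')
&\;\Leftrightarrow\; G_{f_1}(\mathbf{x}_{f_1}) + G_{f_2}(\mathbf{x}_{f_2}) \ge G_{f_1}(\mathbf{x}'_{f_1}) + G_{f_2}(\mathbf{x}'_{f_2}) \\
&\;\Leftrightarrow\; \Delta G_{f_1}^{(B)}(\mathbf{x}_{f_1}) \ge \Delta G_{f_2}^{(B)}(\mathbf{x}'_{f_2}).
\end{aligned} \tag{15}$$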

In order to prove $G(\mathbf{x}) \ge G(\mathbf{x}')$, we will show that

$$\Delta G_{f_1}^{(B)}(\mathbf{x}_{f_1}) \ge \gamma_B, \tag{16}$$

$$\Delta G_{f_2}^{(B)}(\mathbf{x}'_{f_2}) \le \gamma_B. \tag{17}$$

We start by observing that we can ignore all upward transitions with resistance larger than $1$, i.e., all transitions to ancestors that are not also parents. In fact, by applying Lemma IV.3 multiple times, we can conclude that none of them is used in the minimum-weight in-tree that defines the potential. Note that the flexibility in defining the qLRU-Δ update rule upon misses comes from the fact that 1) resistances only depend on the exponent of $q$ (so additional positive factors like those appearing in (8) do not play a role), and 2) upward edges with resistance larger than $1$ do not contribute to the potential (so it does not matter how many caches retrieve a copy of the content, as long as at least one of them does so with some probability that scales as $q$).

The state is a child of , then . Consider the in-tree rooted in with minimal resistance and let denote its resistance and be the sequence of transitions in from to . One of these transitions, say it corresponds to store the content in the cache and has resistance . Consider now the in-tree rooted in obtained from removing the edge and adding the edge . Its resistance is . From it follows (16). A sketch of this construction is in Fig. (a)a.

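The proof of (17) is slightly more complex. First we prove it under the assumption that for every cell , then we consider the general case. We prove (17) by contradiction. Let us assume that $\Delta G_{f_2}^{(B)}(\mathbf{x}'_{f_2}) > \gamma_B$. In such a case $\mathbf{x}'_{f_2} \to \mathbf{x}_{f_2}$ is not a dominant (downward) transition, and $\phi(\mathbf{x}'_{f_2}) > \phi(\mathbf{x}_{f_2})$.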

Let now denote the in-tree rooted in with minimal resistance and be the sequence of transitions in from to . At each transition only one state variable changes, we denote by the corresponding index, representing the base station at/from which a copy of content is added/removed. By construction we have:

 ∑1

If transition is upward:

 ϕ(xlf2) −ϕ(xl−1f2)=ΔG(bl)f2(xlf2)−γbl =γbl[r(xlf2,xl−1f2)−r(xl−1f2,x