Implicit Coordination of Caches in Small Cell Networks under Unknown Popularity Profiles

04/05/2018 ∙ by Emilio Leonardi, et al. ∙ Inria, Politecnico di Torino

We focus on a dense cellular network, in which a limited-size cache is available at every Base Station (BS). In order to optimize the overall performance of the system in such a scenario, where a significant fraction of the users is covered by several BSs, a tight coordination among nearby caches is needed. To this end, this paper introduces a class of simple and fully distributed caching policies, which require neither direct communication among BSs, nor a priori knowledge of content popularity. Furthermore, we propose a novel approximate analytical methodology to assess the performance of interacting caches under such policies. Our approach builds upon the well known characteristic time approximation and provides predictions that are surprisingly accurate (hardly distinguishable from the simulations) in most of the scenarios. Both synthetic and trace-driven results show that our caching policies achieve excellent performance (in some cases provably optimal). They outperform state-of-the-art dynamic policies for interacting caches, and, in some cases, also the greedy content placement, which is known to be the best performing polynomial algorithm under static and perfectly-known content popularity profiles.




I Introduction

In recent years, with the advent and proliferation of mobile devices (smartphones, tablets), along with a constant increase of the overall traffic flowing over the Internet, we have witnessed a radical shift of the traffic at the edge, from the wired/fixed segment of the network to the wireless/mobile segment. This trend is expected to continue and intensify in the next few years. According to Cisco forecasts [2], in the five years from 2016 to 2021 traffic demand on the cellular network will increase approximately by a factor of 8. Such a traffic increase may put tremendous stress on the wireless infrastructure and can be satisfied only by densifying the cellular network and redesigning its infrastructure. To this end, the integration of caches at the edge of the cellular network can be effective in reducing the load on the back-haul links. Caches, indeed, by moving contents closer to the user, can effectively contribute to “localize” the traffic, and to achieve: i) the reduction of the load on the core network and back-haul links; ii) the reduction of the latency perceived by the user.

In this paper we focus our attention on a dense cellular network, where caches are placed at every Base Station (BS) and a significant fraction of the users can be served (is “covered”) by two or more BSs (whose cells are said to “overlap”). In this context, an important open question is how to effectively coordinate different edge-caches, so as to optimize the global performance (typically the hit ratio, i.e. the fraction of users’ requests that are satisfied by local caches). Given the possibility of partial cell overlap, the cache coordination scheme should, indeed, reach an optimal trade-off between two somewhat conflicting targets: i) make top-popular contents available everywhere, so as to maximize the population of users who can retrieve them from local caches, ii) diversify the contents stored at overlapping cells, so as to maximize the number of contents available to users in the overlap. Optimal solutions are easy to identify for the two extreme cases: when cells do not overlap, every cache should be filled with the most popular contents; when cells overlap completely, every cache should be filled with a different set of contents. In the most general case, however, finding the optimal content allocation strategy requires the solution of an NP-hard problem [3].

In this paper, we propose a class of fully distributed schemes to coordinate caches in cellular systems with partially overlapping cells, so as to maximize the overall hit ratio. Our policies are very simple and, unlike most of the previous work, do not require any a priori knowledge of content popularity. In addition, we propose a novel analytical approach to accurately evaluate the performance of such caching systems with limited computational complexity.

I-A Related work

Due to space constraints, we limit our description to the work that specifically addresses the caching problem in dense cellular networks.

To the best of our knowledge, the idea to coordinate content placement at caches located at close-by BSs was first proposed in [4] and its extension [3] under the name of femto-caching. This work assumes that requests follow the Independent Reference Model (IRM) and geographical popularity profiles are available, i.e. content requests are independent and request rates are known for all the cell areas and their intersections. The optimal content placement to maximize the hit ratio has been formulated in terms of an NP-hard combinatorial problem. A greedy heuristic algorithm was then proposed and its performance analyzed. In particular the algorithm is shown to guarantee a $\frac{1}{2}$-approximation of the maximum hit ratio. In [5], the authors have generalized the approach of [4, 3], providing a formulation for the joint content-placement and user-association problem that maximizes the hit ratio. Efficient heuristic solutions have also been proposed. Authors of [6] have included the bandwidth costs in the formulation, and have proposed an on-line algorithm for the solution of the resulting problem. [7] considers the case when small cells can coordinate not just in terms of what to cache but also to perform Joint Transmission. In [8], instead, the authors have designed a distributed algorithm based on Gibbs sampling, which is shown to asymptotically converge to the optimal allocation. [9] revisits the optimal content placement problem within a stochastic geometry framework. Under the assumption that both base stations and users are points of two homogeneous spatial Poisson processes, it derives an elegant analytical characterization of the optimal policy and its performance. More recently, in [10] the authors have developed a few asynchronous distributed cooperative content placement algorithms with polynomial complexity and limited communication overhead (communication takes place only between overlapping cells), whose performance has been shown to be very good in most of the tested scenarios.

We would like to emphasize that all the previously proposed schemes, differently from ours, rely on the original assumption in [4] that geographical content popularity profiles are known by the system. Therefore we will refer to these policies as “informed” ones.

Reliable popularity estimates over small geographical areas may be very hard to obtain [11], because i) most of the contents are highly ephemeral, ii) few users are located in a small cell, and iii) users keep moving from one cell to another. On the contrary, policies like LRU and its variants (qLRU, 2LRU, …) do not rely on popularity estimation (we call them “uninformed”) and are known to behave well under time-varying popularities. For this reason they are a de-facto standard in most of the deployed caching systems. [12] proposes a generalization of LRU to a dense cellular scenario. As above, a user at the intersection of multiple cells can check the availability of the content at every covering cell and then download it from one of them. The difference with respect to standard LRU is how cache states are updated. In particular, the authors of [12] consider two schemes: LRU-One and LRU-All. In LRU-One each user is assigned to a reference cell/cache and only the state of her reference cache is updated upon a hit or a miss, independently from which cache the content has been retrieved from.¹ In LRU-All the state of all the caches covering the user is updated. These policies do not require communication among caches. Moreover, their analysis is relatively easy because each cache can be studied as an isolated one. Unfortunately, these policies typically perform significantly worse than informed schemes (see for example the experiments in [10]).

¹ The verbal description of the policy in [12] is a bit ambiguous, but the model equation shows that the state of a cache is updated by and only by the requests originated in the corresponding Voronoi cell, i.e. from the users closest to the cache.

I-B Paper Contribution

This paper has four main contributions.

First, we propose in Sec. III a novel approximate analytical approach to study systems of interacting caches, under different caching policies. Our framework builds upon the well known characteristic time approximation [1] for individual caches, and, in most of the scenarios, provides predictions that are surprisingly accurate (practically indistinguishable from the simulations, as shown in Sec. IV).

Second, we propose a class of simple and fully distributed “uninformed” schemes that effectively coordinate different caches in a dense cellular scenario in order to maximize the overall hit ratio of the caching system. Our schemes represent an enhancement of those proposed in [12]. Like LRU-One and LRU-All, our policies neither rely on popularity estimation, nor require communication exchange among the BSs. The schemes achieve implicit coordination among the caches through specific cache update rules, which are driven by the users’ requests. Unlike those in [12], our update rules tend to couple the states of different caches. Despite the additional complexity, we show that accurate analytical evaluation of the system is still possible through our approximate analytical approach.

Third, we rely on our analytical model to show that, under IRM, our policies can significantly outperform other uninformed policies like those in [12]. Moreover, the hit ratios are very close to those offered by the greedy scheme proposed in [4] under perfect knowledge of popularity profiles. More precisely, we can prove that, under some geometrical assumptions, a variant of qLRU asymptotically converges to the optimal static content allocation, while in more general scenarios, the same version of qLRU asymptotically achieves a locally optimal configuration.

Finally, in Sec. VI, we carry out simulations using a request trace from a major CDN provider and BS locations in Berlin, Germany. The simulations qualitatively confirm the model’s results. Moreover, under the real request trace, our dynamic policies can sometimes outperform the greedy static allocation that knows in advance the future request rate. This happens even more often in the realistic case when future popularity needs to be estimated from past statistics. Overall, our results suggest that it is better to rely on uninformed caching schemes with smart update rules than on informed ones fed by estimated popularity profiles, partially contradicting some of the conclusions of [13].

II Network Operation

We consider a set of $B$ base stations (BSs) arbitrarily located in a given region $R$, each equipped with a local cache. Our system operates as follows. When user $u$ has a request for content $f$, it broadcasts an inquiry message to the set of BSs ($I_u$) it can communicate with. The subset ($J_{u,f} \subseteq I_u$) of those BSs that have the content $f$ stored locally declare its availability to user $u$. If any local copy is available ($J_{u,f} \neq \emptyset$), the user sends an explicit request to download it to one of the BSs in $J_{u,f}$. Otherwise, the user sends the request to one of the BSs in $I_u$, which will need to retrieve it from the content provider.² Different user criteria can be defined to select the BS to download from; for the sake of simplicity, in this paper, we assume that the user selects one of them uniformly at random. However, the analysis developed in the next section extends naturally to general selection criteria. The selected BS serves the request. Furthermore, an opportunely defined subset of BSs in $I_u$ updates its cache according to a local caching policy, like LRU, qLRU,³ 2LRU,⁴ etc. The most natural update rule is that only the cache serving the content updates its state, independently from the identity of the user generating the specific request. We call this update rule blind. At the same time, it is possible to decouple content retrieval from cache state update as proposed in [12]. For example each user $u$ may be statically associated to a given BS, whose state is updated upon every request from $u$, independently from which BS has served the content. We refer to this update rule as one, because of the name of the corresponding policy proposed in [12] (LRU-One). Similarly, we indicate as all the update rule where all the BSs in $I_u$ update their state upon a request from user $u$ (as in LRU-All). These update rules can be freely combined with the usual single-cache policies, like qLRU, LRU, 2LRU, etc., leading then to schemes like LRU-One, qLRU-Blind, 2LRU-All, etc., with obvious interpretation of the names.

² This two-step procedure introduces some additional delay, but this is inevitable in any femtocaching scheme where the BSs need to coordinate to serve the content.

³ In the case of qLRU, the cache will move the content requested to the front of the queue upon a hit and will store the content at the front of the cache with probability $q$ upon a miss. LRU is a qLRU policy with $q = 1$.

⁴ 2LRU uses internally two LRU caches: one for the metadata, and the other for the actual contents (for this reason we say that 2LRU is a two-stage cache). Upon a miss, a content is stored in the second cache only if its metadata are already present in the first cache. See [14] for a more detailed description.
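The qLRU behaviour described in the footnote above (move to front on a hit, insert with probability $q$ on a miss) can be sketched as follows. This is only an illustrative sketch: the class name `QLRUCache`, its method names and the use of an `OrderedDict` are assumptions of this example, not something specified in the paper.

```python
import random
from collections import OrderedDict


class QLRUCache:
    """Minimal qLRU sketch (hypothetical helper, not the authors' simulator).

    On a hit the content is moved to the most-recently-used position; on a
    miss it is admitted with probability q, evicting the least recently used
    content if the cache is full. With q = 1 this is plain LRU.
    Assumes capacity >= 1.
    """

    def __init__(self, capacity, q=1.0, rng=None):
        self.capacity = capacity
        self.q = q
        self.rng = rng or random.Random()
        self._items = OrderedDict()  # last key = most recently used

    def __contains__(self, content):
        return content in self._items

    def request(self, content):
        """Process one request; return True on a hit, False on a miss."""
        if content in self._items:
            self._items.move_to_end(content)  # refresh position on a hit
            return True
        if self.rng.random() < self.q:  # probabilistic admission on a miss
            if len(self._items) >= self.capacity:
                self._items.popitem(last=False)  # evict least recently used
            self._items[content] = True
        return False
```

With `q=1.0` the object behaves as an LRU cache, while small values of `q` filter out contents that are requested only rarely.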

The analytical framework presented in the next section allows us to study a larger set of update rules, where the update can in general depend on the identity of the user $u$ as well as on the current set $J_{u,f}$ of caches from which $u$ can retrieve the content $f$. When coupled with local caching policies, these update rules do not require any explicit information exchange among the caches, but they can be implemented by simply piggybacking the required information ($J_{u,f}$) to user $u$’s request. In particular, in what follows, we will consider the lazy update rule, according to which

  1. only the cache serving the content may update its state,

  2. but it does so only if no other cache could have served the content to the user (i.e. only if $|J_{u,f}| \le 1$, where $J_{u,f}$ is the set of BSs able to provide the content to the user).

This rule requires only an additional bit to be transmitted from the user to the cache. We are going to show in Sec. V that such a bit is sufficient to achieve a high level of coordination among different caches and, therefore, a high hit ratio. Because no communication among BSs is required, we talk about implicit coordination. For the moment, the reader should not be worried if he/she finds the rationale behind lazy obscure and can regard lazy as a specific update rule among many other possible ones. Table I summarises the main notation used in this paper.
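The four update rules of this section (blind, one, all, lazy) differ only in which caches update their state after a request. A minimal sketch of that decision is given below; the function names and arguments (`covering` for the BSs reachable by the user, `holders` for those storing the content, `reference_bs` for the statically assigned BS of rule one) are hypothetical names chosen for this example.

```python
import random


def choose_serving_bs(covering, holders, rng):
    """Pick the serving BS uniformly at random among the BSs holding the
    content, or among all covering BSs if none of them holds it."""
    candidates = holders if holders else covering
    return rng.choice(sorted(candidates))


def caches_to_update(rule, serving_bs, covering, holders, reference_bs=None):
    """Return the set of BSs that update their cache state for one request.

    covering     -- set of BSs the user can communicate with
    holders      -- subset of `covering` currently storing the content
    serving_bs   -- BS selected to serve the request
    reference_bs -- BS statically associated with the user (rule 'one' only)
    """
    if rule == "blind":
        return {serving_bs}
    if rule == "one":
        return {reference_bs}
    if rule == "all":
        return set(covering)
    if rule == "lazy":
        # The serving cache updates only if no other cache could have served.
        others = set(holders) - {serving_bs}
        return set() if others else {serving_bs}
    raise ValueError(f"unknown update rule: {rule}")
```

Under lazy, the single bit piggybacked on the request corresponds to whether `others` is empty.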

III Model

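Symbol | Explanation
$u$ | generic user
$f$ | generic content
$b$ | generic cell
$F$ | number of files
$B$ | number of base stations
$C$ | cache size
$I_u$ | set of BSs communicating with $u$
$J_{u,f}$ | set of BSs able to provide $f$ to $u$
$S_b$ | surface of cell $b$
$\mu(A)$ | expected number of users in region $A$
$\Lambda_f(A)$ | content $f$ request rate from region $A$
$\mathbf{X}_f$ | configuration of content $f$ in caches
$X_f^{(b)}$ | component $b$ of $\mathbf{X}_f$: 1 if cache $b$ stores $f$, 0 otherwise
$\mathbf{X}_f^{(-b)}$ | configuration of content $f$ in all caches but $b$
$T_c^{(b)}$ | characteristic time at cache $b$
$T_{S,f}^{(b)}$ | sojourn time of content $f$ in cache $b$
$\nu_f^{(b)}(\mathbf{x}_f^{(-b)})$ | transition rate $1 \to 0$ for $X_f^{(b)}$ given $\mathbf{X}_f^{(-b)} = \mathbf{x}_f^{(-b)}$
$\lambda_f^{(b)}(\mathbf{x}_f^{(-b)})$ | transition rate $0 \to 1$ for $X_f^{(b)}$ given $\mathbf{X}_f^{(-b)} = \mathbf{x}_f^{(-b)}$
$d(x, A)$ | distance between point $x$ and region $A$
TABLE I: Summary of the main notation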

We assume that mobile users are spread over the region $R$ according to a Poisson point process with density $\mu$, so that $\mu(A)$ denotes the expected number of users in a given area $A \subseteq R$. Users generate independent requests for $F$ possible contents. In particular, a given user requests content $f$ according to a Poisson process with rate $\lambda_f$. It follows that the aggregate request process for content $f$ from all the users located in $A$ is also a Poisson process with rate $\Lambda_f(A) = \lambda_f \, \mu(A)$. For the sake of presentation, in what follows we will consider that users’ density is constant over the region, so that $\Lambda_f(A)$ is simply proportional to the surface of $A$, but our results can be easily generalized. Our analysis can also be extended to a more complex content request model that takes into account temporal locality [15], as we discuss in Sec. III-F. Contents (i.e. caching units) are assumed to have the same size. This assumption can be justified in light of the fact that contents often correspond to chunks into which larger files are broken. In any case, it is possible to extend the model, and most of the analytical results below, to the case of contents with heterogeneous sizes.⁵ For the sake of simplicity, we assume that each cache is able to store $C$ contents. Finally, let $S_b$ denote the coverage area of BS $b$.

⁵ Similar results for qLRU-Lazy in Sec. V-B hold if we let the parameter $q$ be inversely proportional to the content size as done in [16].

In what follows, we first present some key observations for an isolated cache, and then we extend our investigation to interacting caches when cells overlap. We will first consider the more natural blind update rule, according to which any request served by a BS triggers a corresponding cache state update. We will then discuss how to extend the model to other update rules in Sec. III-D.

III-A A single cell in isolation

We start by considering a single BS, say BS $b$, with cell size $|S_b|$. The request rate per content is then $\lambda_f^{(b)} = \Lambda_f(S_b)$. We omit in what follows the dependence on $b$.

Our analysis relies on the now standard cache characteristic time approximation (CTA) for a cache in isolation, which is known to be one of the most effective approximate approaches for the analysis of caching systems.⁶ CTA was first introduced (and analytically justified) in [18] and later rediscovered in [1]. It was originally proposed for LRU under the IRM request process, and it has been later extended to different caching policies and different request processes [14, 19]. The characteristic time $T_c$ is the time a given content spends in the cache from its insertion until its eviction in the absence of any request for it. In general, this time depends in a complex way on the dynamics of the requests for other contents. Instead, the CTA assumes that $T_c$ is a random variable independent from the dynamics of other contents and with an assigned distribution (the same for every content). This assumption makes it possible to decouple the dynamics of the different contents: upon a miss for content $f$, the content is retrieved and a timer with random value $T_c$ is generated. When the timer expires, the content is evicted from the cache. Cache policies differ in i) the distribution of $T_c$ and ii) what happens to the timer upon a hit. For example, $T_c$ is a constant under LRU, qLRU, 2LRU and FIFO and exponentially distributed under RANDOM. Upon a hit, the timer is renewed under LRU, qLRU and 2LRU, but not under FIFO or RANDOM. Despite its simplicity, CTA was shown to provide asymptotically exact predictions for a single LRU cache under IRM as the cache size grows large [18, 20, 21].

⁶ Unfortunately, the computational cost to exactly analyse even a single LRU (Least Recently Used) cache grows exponentially with both the cache size and the number of contents [17].

What is important for our purposes is that, once inserted in the cache, a given content will sojourn in the cache for a random amount of time $T_S$, that can be characterized for the different policies. In particular, if the timer is not renewed upon a hit (as for FIFO and RANDOM), it holds:

$$T_S = T_c,$$

while if the timer is renewed, it holds:

$$T_S = \sum_{i=1}^{M} Y_i + T_c,$$

where $M \in \{0, 1, \dots\}$ is the number of consecutive hits preceding a miss and $Y_i$ is the time interval between the $i$-th hit and the previous content request. For example, in the case of LRU and qLRU, $M$ is distributed as a geometric random variable with parameter $p = 1 - e^{-\lambda T_c}$, i.e. $P(M = k) = p^k (1 - p)$, and the $Y_i$ are i.i.d. truncated exponential random variables over the interval $[0, T_c]$.

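We denote by $1/\nu(\lambda)$ the expected value of $T_S$, which is a function of the request arrival rate $\lambda$. For example it holds:

  • FIFO, RANDOM: $\nu(\lambda) = \dfrac{1}{E[T_c]}$,

  • LRU, qLRU: $\nu(\lambda) = \dfrac{\lambda}{e^{\lambda T_c} - 1}$, (1)

where the last expression can be obtained through standard renewal arguments (see [22]).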
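For the reader's convenience, the renewal argument behind the mean sojourn time of LRU and qLRU can be spelled out. The derivation below is a sketch under CTA: $T_c$ is the (constant) characteristic time, $\lambda$ the request rate, $M$ the number of consecutive hits before a miss, with $P(M = k) = p^k (1 - p)$ and $p = 1 - e^{-\lambda T_c}$, and $Y_i$ an exponential inter-request time conditioned on being smaller than $T_c$, independent of $M$.

```latex
\begin{align*}
E[M] &= \frac{p}{1-p} = e^{\lambda T_c} - 1, \\
E[Y_i] &= \frac{1}{\lambda} - \frac{T_c \, e^{-\lambda T_c}}{1 - e^{-\lambda T_c}}, \\
E[T_S] &= E[M] \, E[Y_i] + T_c
        = \frac{e^{\lambda T_c} - 1}{\lambda}
          - T_c \, e^{-\lambda T_c} \, \frac{e^{\lambda T_c} - 1}{1 - e^{-\lambda T_c}}
          + T_c \\
       &= \frac{e^{\lambda T_c} - 1}{\lambda} - T_c + T_c
        = \frac{e^{\lambda T_c} - 1}{\lambda}.
\end{align*}
```

The third line uses $(e^{x} - 1)/(1 - e^{-x}) = e^{x}$ with $x = \lambda T_c$. The mean sojourn time is therefore $(e^{\lambda T_c} - 1)/\lambda$, and the corresponding eviction rate is its reciprocal.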

Let $X_f(t)$ be the process indicating whether content $f$ is in the cache at time $t$. For single-stage caching policies, such as FIFO, RANDOM, LRU and qLRU, $X_f(t)$ is an ON/OFF renewal process with ON period distributed as $T_S$ and OFF period distributed exponentially with mean value $1/\gamma_f$, where $\gamma_f = \lambda_f$ for FIFO and RANDOM, and $\gamma_f = q \lambda_f$ for qLRU. The process $X_f(t)$ can also be considered as the busy server indicator of an $M/G/1/0$ queue with service time distributed as $T_S$.⁷ This observation is important because the stationary distribution of $M/G/1/0$ queues depends on the service time only through its mean [23]. As a consequence, for any metric depending only on the stationary distribution, an $M/G/1/0$ queue is equivalent to an $M/M/1/0$ queue with service time exponentially distributed with the same service rate $\nu_f = \nu(\lambda_f)$. In particular, the stationary occupancy probability is simply $\pi_f = \gamma_f / (\gamma_f + \nu_f)$. Under CTA, the characteristic time can then be obtained by imposing that

$$\sum_{f=1}^{F} \pi_f = C, \qquad (2)$$

for example using the bisection method.

⁷ Under CTA a cache with capacity $C$ becomes indeed equivalent to a set of $F$ parallel independent $M/G/1/0$ queues, one for each content.
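As an illustration of this last step, the sketch below computes the characteristic time of an isolated LRU or qLRU cache by bisection and then the resulting hit ratio under IRM. It assumes the standard CTA expressions for these policies (insertion rate $q\lambda$ and eviction rate $\lambda/(e^{\lambda T_c}-1)$, which give an occupancy probability $q(1-e^{-\lambda T_c}) / (q(1-e^{-\lambda T_c}) + e^{-\lambda T_c})$); the function names and the Zipf parameters are choices of this example only.

```python
import math


def zipf_rates(num_contents, alpha, total_rate=1.0):
    """Per-content request rates following a Zipf law with exponent alpha."""
    weights = [rank ** -alpha for rank in range(1, num_contents + 1)]
    norm = sum(weights)
    return [total_rate * w / norm for w in weights]


def occupancy(lam, t_c, q=1.0):
    """Stationary probability that a content is cached (qLRU under CTA).

    Written with exp(-lam * t_c) to stay numerically stable for large t_c.
    """
    stay = -math.expm1(-lam * t_c)  # 1 - exp(-lam * t_c)
    leave = math.exp(-lam * t_c)
    return q * stay / (q * stay + leave)


def characteristic_time(rates, capacity, q=1.0, tol=1e-10):
    """Bisection on t_c so that the expected occupancy equals the capacity.

    Assumes 0 < capacity < len(rates) and q > 0.
    """
    hi = 1.0
    while sum(occupancy(lam, hi, q) for lam in rates) < capacity:
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if sum(occupancy(lam, mid, q) for lam in rates) < capacity:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hit_ratio(rates, t_c, q=1.0):
    """Under IRM a request is a hit with probability equal to the occupancy."""
    return sum(lam * occupancy(lam, t_c, q) for lam in rates) / sum(rates)
```

For instance, `characteristic_time(zipf_rates(1000, 0.8), 50)` returns the characteristic time of an LRU cache storing 50 out of 1000 contents, and passing `q=0.1` to both functions evaluates the corresponding qLRU cache.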

The possibility of representing a cache in isolation with an $M/M/1/0$ queue, i.e., as a simple continuous time Markov chain, does not provide particular advantages in this simple scenario, but it allows us to accurately study the more complex case when cells overlap, and users’ requests may be served by multiple caches.

Fig. 1: Two overlapping cells each with unit surface. The area of the overlapping region is $A_o$.

III-B Overlapping cells

We consider now the case when cells may overlap. Let $X_f^{(b)}(t)$ indicate whether the BS $b$ stores at time $t$ a copy of content $f$ and $\mathbf{X}_f(t) = (X_f^{(1)}(t), \dots, X_f^{(B)}(t))$ be the vector showing where the content $f$ is placed within the network. In this case the request rate seen by any BS, say BS $b$, depends on the availability of the content at the neighbouring BSs, i.e. $\lambda_f^{(b)} = \lambda_f^{(b)}(\mathbf{X}_f(t))$. For example, with reference to Fig. 1, where $A_o$ denotes the area of the overlap, and assuming unit user density, BS 1 experiences a request rate for content $f$ equal to i) $\lambda_f$ if it is the only BS to store the content, ii) $\lambda_f (1 - A_o/2)$ if both BSs store the content or none of them does, iii) $\lambda_f (1 - A_o)$ if only BS 2 stores the content.

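Our analysis of this system is based on the following approximation:⁸

  • (A) The stochastic process $\mathbf{X}_f(t)$ is a continuous-time Markov chain. For each $f$ and $b$ the transition rate from state $(x_f^{(b)} = 1, \mathbf{x}_f^{(-b)})$ to $(x_f^{(b)} = 0, \mathbf{x}_f^{(-b)})$ is given by (1) with $\lambda$ replaced by $\lambda_f^{(b)}(\mathbf{x}_f)$, the request rate seen by BS $b$ in that state.

⁸ For any vector $\mathbf{x}$, we denote by $\mathbf{x}^{(-b)}$ the subvector of $\mathbf{x}$ including all the components but the $b$-th one and we can write $\mathbf{x}$ as $(x^{(b)}, \mathbf{x}^{(-b)})$.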

Before discussing the quality of approximation A, let us first describe how it allows us to study the cache system. For a given initial guess of the characteristic times at all the caches, we determine the stationary distribution of the Markov Chains (MCs) $\mathbf{X}_f(t)$, one for each content $f$. We then compute the expected buffer occupancy at each cache and check if the set of constraints (2) is satisfied. We then iteratively modify the vector of characteristic times by reducing (/increasing) the value for those caches where the expected buffer occupancy is above (/below) the cache size $C$. Once the iterative procedure on the vector of characteristic times has reached convergence, we compute the hit ratios for each content at each cache.
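The procedure can be illustrated on the smallest non-trivial case: two identical cells of unit surface that overlap over an area `overlap`, LRU caches and the blind update rule. The sketch below rests on several assumptions made only for this example: unit user density, uniform random selection of the serving BS, and request rates equal to $\lambda$, $\lambda(1 - \text{overlap}/2)$ and $\lambda(1 - \text{overlap})$ depending on which BSs store the content. By symmetry both caches share the same characteristic time, and the chain for each content can be aggregated into a birth-and-death chain on the number of copies (0, 1 or 2). All function names are hypothetical.

```python
import math


def zipf_rates(num_contents, alpha):
    weights = [rank ** -alpha for rank in range(1, num_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def eviction_rate(lam, t_c):
    """LRU eviction rate under CTA: lam / (exp(lam * t_c) - 1)."""
    return lam / math.expm1(lam * t_c)


def copies_distribution(lam, t_c, overlap):
    """Stationary law of the number of copies (0, 1, 2) of one content."""
    shared = lam * (1 - overlap / 2)  # rate seen when both or neither BS stores it
    up_01 = 2 * shared                # either BS inserts the content
    up_12 = lam * (1 - overlap)       # the empty BS is only asked from its exclusive area
    down_10 = eviction_rate(lam, t_c)         # the only holder sees its whole cell
    down_21 = 2 * eviction_rate(shared, t_c)  # either holder may evict
    w1 = up_01 / down_10
    w2 = w1 * up_12 / down_21
    total = 1 + w1 + w2
    return 1 / total, w1 / total, w2 / total


def cache_occupancy(rates, t_c, overlap):
    """Expected number of contents stored at one of the two caches."""
    occ = 0.0
    for lam in rates:
        _, p1, p2 = copies_distribution(lam, t_c, overlap)
        occ += p1 / 2 + p2
    return occ


def solve_characteristic_time(rates, capacity, overlap, tol=1e-10):
    """Bisection on the common characteristic time (capacity < len(rates))."""
    hi = 1.0
    while cache_occupancy(rates, hi, overlap) < capacity:
        hi *= 2
    lo = 0.0
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if cache_occupancy(rates, mid, overlap) < capacity:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def global_hit_ratio(rates, t_c, overlap):
    hits = 0.0
    for lam in rates:
        _, p1, p2 = copies_distribution(lam, t_c, overlap)
        exclusive = 2 * (1 - overlap) * (p1 / 2 + p2)  # users covered by one BS
        both = overlap * (p1 + p2)                     # users covered by both BSs
        hits += lam * (exclusive + both)
    return hits / (sum(rates) * (2 - overlap))
```

With `overlap=0` the two caches decouple and the result coincides with the single-cache LRU hit ratio, which gives a quick sanity check of the sketch.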

Approximation A replaces the original stochastic process, whose analysis appears prohibitive, with a (simpler) MC. This has no impact on any system metric that depends only on the stationary distribution in the following cases:

  1. isolated caches (as we have shown in Sec. III-A),

  2. caches using RANDOM policy, because the corresponding sojourn times coincide with the characteristic times and are exponentially distributed, hence A is not an approximation,

  3. caches using FIFO policy under the additional condition in Proposition III.1 below.

In all these cases CTA is the only approximation having an impact on the accuracy of model results. In the most general case, A introduces an additional approximation. Nevertheless, our numerical evaluation shows that our approach provides very accurate results in all the scenarios we tested.

We end this section by detailing the insensitivity result for a system of FIFO caches.

Proposition III.1.

For FIFO, the probability of being in state $\mathbf{x}_f$ is insensitive to the distribution of the sojourn times as long as the Markov chain in approximation A is reversible.

The proof of proposition III.1 is in Appendix A and relies on some insensitivity results for Generalized Semi Markov Processes. The reversibility hypothesis is for example satisfied for the cell trefoil topology considered in Sec. IV when users’ density is constant.

III-C Model complexity

Note that, in general, the number of states of the Markov Chain describing the dynamics of $\mathbf{X}_f(t)$ grows exponentially with the number of cells $B$ (actually, it is equal to $2^B$), therefore modeling scenarios with a large number of cells becomes challenging and requires the adoption of efficient approximate techniques for the solution of probabilistic graphical models [24]. However, scenarios with up to 10-12 cells can be efficiently modeled. Furthermore, when the geometry exhibits some symmetry, some state aggregation becomes possible. For example, in the $B$-cell trefoil topology presented in the next section, the evolution of $\mathbf{X}_f(t)$ can be represented by a reversible birth-and-death Markov Chain with $B+1$ states ($0, 1, \dots, B$ copies of the content).

III-D Different Update rules

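In presenting the model above, we have referred to the simple blind update rule. Our modeling framework, however, can easily accommodate other update rules. For example for one, if the reference BS is the closest one, we should set $\lambda_f^{(b)} = \Lambda_f\big(\{x \in S_b : d(x, b) \le d(x, b') \ \forall b'\}\big)$, where $d(x, A)$ denotes the distance between the point $x$ and the set $A$. On the contrary, for all, any request that could be served by the base station is taken into account, i.e. $\lambda_f^{(b)} = \Lambda_f(S_b)$. Finally, for lazy we have:

$$\lambda_f^{(b)}\big(\mathbf{x}_f^{(-b)}\big) = \Lambda_f\Big(S_b \setminus \bigcup_{b' \neq b \,:\, x_f^{(b')} = 1} S_{b'}\Big),$$

i.e. only requests coming from areas that cannot be served by any other cache affect the cache state. For example, with reference to Fig. 1, assuming unit user density and an overlap of area $A_o$, the request rate that contributes to update the status of cache 1 is $\lambda_f (1 - A_o)$ when content $f$ is stored also at cache 2.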
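To make the differences among the update rules concrete, the sketch below computes the request rate that triggers a state update at a given BS for the rules all, blind and lazy. It relies on a hypothetical representation of the geometry as a dictionary `atoms` that maps each set of covering BSs to the area of the corresponding sub-region, and it assumes unit user density and uniform random selection of the serving BS. The rule one is omitted because it additionally requires the Voronoi geometry.

```python
import math


def update_rate(rule, bs, holders, atoms, lam=1.0):
    """Request rate that triggers a state update at `bs` for one content.

    atoms   -- dict: frozenset of covering BSs -> area of that sub-region
    holders -- set of BSs currently storing the content (may include `bs`)
    lam     -- per-unit-area request rate for the content
    """
    if rule not in ("all", "blind", "lazy"):
        raise ValueError(f"unsupported update rule: {rule}")
    holders = frozenset(holders)
    rate = 0.0
    for covering, area in atoms.items():
        if bs not in covering:
            continue
        local_holders = covering & holders
        # The user downloads from a holder if any, else from any covering BS.
        candidates = local_holders or covering
        if rule == "all":
            share = 1.0
        elif bs not in candidates:
            share = 0.0
        elif rule == "lazy" and local_holders - {bs}:
            share = 0.0  # another cache could have served the request
        else:
            share = 1.0 / len(candidates)
        rate += lam * area * share
    return rate


# Two unit-surface cells overlapping over an area a (as in Fig. 1).
a = 0.3
two_cells = {frozenset({1}): 1 - a, frozenset({1, 2}): a, frozenset({2}): 1 - a}
assert math.isclose(update_rate("lazy", 1, {1, 2}, two_cells), 1 - a)
```

In this two-cell example the blind and lazy rates coincide unless both caches store the content, in which case lazy ignores the requests coming from the overlap.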

As we are going to discuss in Sec. V, the update rules have a significant impact on the performance and in particular the lazy policies often outperform the others. Because our analysis will rely on the model described in this section, we first present in Sec. IV some validation results to convince the reader of its accuracy.

III-E Extension to multistage caching policies: kLRU

The previous model can be extended to 2LRU (and kLRU) by following the approach proposed in [14]. In particular the dynamics of the two stages can be represented by two separate continuous time MCs whose states $\mathbf{Y}_f(t)$ and $\mathbf{X}_f(t)$ correspond to the configuration of content $f$ at time $t$ in the system of virtual caches and physical caches, respectively. The dynamics of the system of virtual caches at the first stage are not impacted by the presence of the second stage and perfectly emulate the dynamics of LRU caches; therefore we model them by using the same MC as for LRU. On the contrary, the dynamics at the second stage depend on the first stage state. In particular content $f$ is inserted in the physical cache at the second stage upon a miss only if the incoming request finds the content metadata within the first stage cache. Therefore the transition rate from state $(x_f^{(b)} = 0, \mathbf{x}_f^{(-b)})$ to $(x_f^{(b)} = 1, \mathbf{x}_f^{(-b)})$ is given by $\lambda_f^{(b)}(\mathbf{x}_f) \, P(Y_f^{(b)} = 1)$, where $P(Y_f^{(b)} = 1)$ represents the probability that the content metadata is stored at the first stage. Along the same lines the model can be easily extended to kLRU for $k > 2$.

III-F How to account for temporal locality

Following the approach proposed in [15, 19], we model the request process of every content $f$ as a Markov Modulated Poisson Process (MMPP), whose modulating MC is a simple ON-OFF MC. Focusing first on a single cell scenario, we denote by $\lambda_f^{(b)}$ the aggregate arrival rate of content $f$ at BS $b$ during ON periods. The arrival rate of content $f$ is, instead, null during OFF periods. Let $T_{ON}$ and $T_{OFF}$ denote the average sojourn times in state ON and OFF, respectively.⁹ The idea behind this model is that each content has a finite lifetime with mean $T_{ON}$ and after a random time with mean $T_{OFF}$, a new content with the same popularity arrives in the system. For convenience this new content is denoted by the same label (see [15, 19] for a deeper discussion). We can model the dynamics of content $f$ in the cache as an MMPP/M/1/0 queue with state-dependent service rate. In particular service rates in state ON ($\nu_{ON}$) are computed according to (1). Service rates in state OFF are simply set to $1/T_c$, as a result of the application of (1) when the arrival rate of content-$f$ requests tends to 0.

⁹ Sojourn times in both states are exponentially distributed.
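The single-cell ON-OFF model can be sketched as a four-state Markov chain, with states given by the pair (ON/OFF, content out of/in the cache). The example below assumes an LRU cache with a given characteristic time `t_c`, so that the eviction rate is $\lambda/(e^{\lambda T_c}-1)$ while ON and $1/T_c$ while OFF; the small linear solver and all names are hypothetical helpers written for this illustration.

```python
import math


def stationary(Q):
    """Solve pi Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination.

    Intended for small irreducible chains only.
    """
    n = len(Q)
    # Transpose Q and replace the last balance equation by the normalisation.
    A = [[Q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    A[-1] = [1.0] * n + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def on_off_hit_probability(lam, t_c, t_on, t_off):
    """Hit probability of one content under the ON-OFF request model.

    lam is the request rate during ON periods; t_on and t_off are the mean
    durations of the ON and OFF periods.
    """
    a, b = 1.0 / t_on, 1.0 / t_off        # ON->OFF and OFF->ON switching rates
    nu_on = lam / math.expm1(lam * t_c)   # eviction rate while ON (LRU, CTA)
    nu_off = 1.0 / t_c                    # limit of nu_on as lam -> 0
    # States: 0 = (ON, out), 1 = (ON, in), 2 = (OFF, out), 3 = (OFF, in).
    Q = [
        [-(lam + a), lam, a, 0.0],
        [nu_on, -(nu_on + a), 0.0, a],
        [b, 0.0, -b, 0.0],
        [0.0, b, nu_off, -(nu_off + b)],
    ]
    pi = stationary(Q)
    # Requests arrive only while ON, so condition on the ON state.
    return pi[1] / (pi[0] + pi[1])
```

When the OFF periods become negligible the result approaches the IRM value $1 - e^{-\lambda T_c}$, whereas long OFF periods lower the hit probability for the same characteristic time.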

The extension to the case of multiple overlapping cells can be carried out along the same lines as in Section III-B (i.e. by applying approximation A). As in [19], the ON-OFF processes governing the content-$f$ request rate at different cells are assumed to be perfectly synchronized (i.e., a unique underlying ON-OFF Markov Chain determines the content-$f$ request rate at every cell). The resulting stochastic process is a continuous-time Markov Chain with $2^{B+1}$ states.

IV Model validation

Fig. 2: (a) a trefoil; (b) a two-by-two cell torus.
Fig. 3: Comparison between model predictions and simulations for the update rules (a) One, (b) Blind, (c) Lazy; trefoil topology; IRM traffic model.

In this section we validate our model by comparing its predictions against simulation results for two different topologies. Our trace-driven simulator developed in Python reproduces the exact dynamics of the caching system, and therefore can be used to test the impact of model assumptions (CTA and A) on the accuracy of results in different traffic scenarios. We start by introducing a topology which exhibits a complete cell symmetry (i.e. the hit rate of any allocation is invariant under cell-label permutations). In such a case, $\mathbf{X}_f(t)$ turns out to be a reversible Markov Chain. Fig. 2 (a) shows an example for $B = 3$. Generalizations for $B > 3$ can be defined in a $(B-1)$-dimensional Euclidean space by considering hyperspheres centered at the vertices of a regular simplex, but also on the plane if users’ density is not homogeneous. We refer to this topology as the trefoil. Then we consider a torus topology in which the base stations are arranged according to a regular grid on a torus as in Fig. 2 (b). For simplicity, in what follows, we assume that all the cells have the same size and a circular shape.

Users are uniformly distributed over the plane and they request contents from a catalogue of $F$ contents whose popularity is distributed according to a Zipf’s law with exponent $\alpha$. Each BS can store up to $C$ contents. We have also performed experiments with other parameter settings, but the conclusions are the same, so we omit them due to space constraints.
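A synthetic IRM workload of this kind can be generated along the following lines. This is only a sketch of one possible generator, not the authors' simulator: the function names, the seed handling and the parameter values used in the comment are assumptions of this example.

```python
import random


def zipf_popularity(num_contents, alpha):
    """Request probabilities proportional to rank ** (-alpha)."""
    weights = [rank ** -alpha for rank in range(1, num_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def irm_trace(popularity, num_requests, seed=0):
    """Draw independent requests (IRM): content 0 is the most popular one."""
    rng = random.Random(seed)
    contents = range(len(popularity))
    return rng.choices(contents, weights=popularity, k=num_requests)


# Hypothetical parameters, chosen only to illustrate the call.
popularity = zipf_popularity(num_contents=1000, alpha=0.8)
trace = irm_trace(popularity, num_requests=10_000)
```

The resulting list of content identifiers can then be replayed against any cache implementation to estimate its hit ratio empirically.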

In Fig. 3 we show the global hit ratio for different values of cell overlap in a trefoil topology. The overlap is expressed in terms of the expected number of caches a random user could download the content from. Subfigure (a) shows the corresponding curves for FIFO-One and qLRU-One with $q < 1$ and with $q = 1$, which coincides with LRU-One. The other subfigures refer to the update rules blind and lazy.¹⁰ FIFO-Blind and FIFO-Lazy coincide because in any case FIFO does not update the cache status upon a hit. The curves show an almost perfect match between the results of the model described in Sec. III and those of simulation. Figure 4 confirms the accuracy of the model also for the torus topology with 9 cells. Every model point required less than 3 seconds of CPU time on an Intel Pentium G3420 @ 3.2 GHz for the trefoil topology, and on the order of minutes for the torus.

¹⁰ We do not show results for RANDOM or the update rule all. RANDOM is practically indistinguishable from FIFO, and all was shown to have worse performance than one for IRM traffic already in [12].

Fig. 4: Comparison between model predictions and simulations for the update rules (a) One, (b) Blind, (c) Lazy; torus topology with 9 cells; IRM traffic model.
Fig. 5: Trefoil: results for the lazy update rule and an ON-OFF request process.

Finally, Fig. 5 shows that the model is also accurate when the request process differs from IRM. The curves have been obtained for a trefoil topology under the lazy update rule and the ON-OFF traffic model described and studied in Sec. III-F. In the figure we also show some results for 2LRU-Lazy. Like qLRU, 2LRU prefilters, upon a miss, the contents to be stored in the cache. qLRU does it probabilistically, while 2LRU exploits another LRU cache for the metadata. Under IRM, their performance is qualitatively similar, but 2LRU is known to be more reactive and thus to perform better when the request process exhibits significant temporal locality [14]. Our results in Fig. 5 confirm this finding. In particular, as q decreases, the performance of qLRU first improves (compare with ) because qLRU’s probabilistic admission rule filters out the unpopular contents, and then worsens (see the curve for ) when the timescale of qLRU’s dynamics becomes comparable to .
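The difference between the two admission filters can be made concrete with a small single-cache sketch. It only illustrates the prefiltering idea described above (probabilistic admission for qLRU, a metadata LRU for 2LRU) and ignores the update rules for interacting caches; class names and capacities are hypothetical.

```python
import random
from collections import OrderedDict

class LRU:
    """Plain LRU store of content ids with a fixed capacity."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def __contains__(self, key):
        return key in self.items

    def touch(self, key):
        """Insert key or move it to the most-recently-used position."""
        self.items[key] = True
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used

class QLRU:
    """On a hit refresh the content; on a miss admit it with probability q."""
    def __init__(self, capacity, q, seed=0):
        self.cache, self.q, self.rng = LRU(capacity), q, random.Random(seed)

    def request(self, key):
        hit = key in self.cache
        if hit or self.rng.random() < self.q:
            self.cache.touch(key)
        return hit

class TwoLRU:
    """On a miss admit the content only if its id is in a metadata LRU."""
    def __init__(self, capacity, meta_capacity):
        self.cache, self.meta = LRU(capacity), LRU(meta_capacity)

    def request(self, key):
        hit = key in self.cache
        if hit or key in self.meta:
            self.cache.touch(key)
        self.meta.touch(key)  # every request refreshes the metadata cache
        return hit
```

In this sketch a content enters a 2LRU cache only at its second request within the lifetime of its metadata entry, whereas qLRU needs on average 1/q missed requests, which gives an intuition for the faster reaction of 2LRU to popularity changes.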

V The Lazy Update Rule

Even if the focus of the previous section has mainly been on the validation of our model, the reader may have observed by looking at Figures 3 and 4 that the update rule lazy performs significantly better than one and (to a lesser extent) blind, especially when the cellular network is particularly dense. This improvement comes at the cost of a minimal communication overhead: an additional bit to be piggybacked into every user’s request to indicate whether the content is available at some of the other cells covering the user. In this section we use our model to further investigate the performance of the lazy update rule.

Fig. 6: Performance of qLRU coupled with different update rules in a trefoil network (model results).

First, we present in Fig. 6 some results for qLRU coupled with the different update rules. The curves show the hit ratio achieved by the different policies versus the parameter q. The topology is a trefoil with 10 cells. Results are reported for two values of cell overlap, corresponding to the cases where a user is covered on average by and BSs. As a reference, the optimal achievable hit ratio is also shown by the two horizontal green lines (in this particular scenario, the optimal allocation can be obtained by applying the greedy algorithm described below in Sec. V-A). qLRU-Lazy significantly outperforms qLRU-One for small values of q, with a relative gain that can be as high as % for the -coverage and % for the -coverage. The improvement with respect to blind is smaller, but what is remarkable is that qLRU-Lazy appears to asymptotically approach the performance of the optimal allocation. In the following we will prove that i) this is indeed the case for the trefoil topology and ii) qLRU-Lazy achieves a locally optimal allocation in a general scenario. For a single cache, it has already been proven that qLRU asymptotically maximizes the hit ratio when q converges to 0 (see [14] for the case of uniform-size contents and [25] for the case of heterogeneous sizes), but, to the best of our knowledge, no optimality results are available for a multi-cache scenario such as the one we are considering. Before proving optimality, we discuss what the optimal allocation is and we provide some intuitive explanation for the good performance of lazy.

V-A Optimal content allocation and a new point of view on lazy

If content popularities are known and stationary, one can allocate, once and for all, contents to caches in order to maximize the global hit ratio. Formally, the following integer maximization problem can be defined:

maximize (4)
subject to

Carrying out an analysis similar to that in [3], it is possible to show that this problem i) is NP-hard (e.g. through a reduction from the 2-Disjoint Set Cover Problem), ii) can be formulated as the maximization of a monotone sub-modular set function with matroid constraints. It follows that the associated greedy algorithm provides a 1/2-approximation for problem (4).

Let us consider how the greedy algorithm operates. Let describe the allocation at the -th step of the greedy algorithm, i.e. the matrix element indicates if at step the algorithm places content at cache . At step , the greedy algorithm computes for each content and each cache the marginal improvement for the global hit ratio to store a copy of at cache , given the current allocation , that is


The pair leading to the largest hit ratio increase is then selected and the allocation is updated by setting . The procedure is iterated until all the caches are full.
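The greedy procedure just described can be sketched in a few lines. The sketch assumes a hypothetical discretisation of the plane into regions, each with a set of covering caches and a per-content request rate; this representation and all names are illustrative assumptions, not the notation of problem (4).

```python
def greedy_allocation(regions, n_caches, capacity):
    """Greedy content placement.

    regions: list of (covering_caches, rates) pairs, where covering_caches is
    a set of cache ids and rates[f] is the request rate for content f
    generated in that region. Returns the set of contents stored at each cache.
    """
    n_contents = len(regions[0][1])
    stored = [set() for _ in range(n_caches)]

    def marginal_gain(f, b):
        # Rate of requests for f that only a new copy at b would turn into hits.
        return sum(rates[f] for caches, rates in regions
                   if b in caches and not any(f in stored[c] for c in caches))

    while any(len(s) < capacity for s in stored):
        candidates = [(marginal_gain(f, b), f, b)
                      for b in range(n_caches) if len(stored[b]) < capacity
                      for f in range(n_contents) if f not in stored[b]]
        if not candidates:
            break
        # Largest gain first; ties go to the lowest content and cache ids.
        _, f, b = max(candidates, key=lambda c: (c[0], -c[1], -c[2]))
        stored[b].add(f)
    return stored

# Two overlapping cells: one region seen only by cache 0, one by both, one only by cache 1.
regions = [({0}, [0.6, 0.4]), ({0, 1}, [0.6, 0.4]), ({1}, [0.6, 0.4])]
allocation = greedy_allocation(regions, n_caches=2, capacity=1)
```

In this toy instance the second cache stores the less popular content, because the copy of the most popular one at the first cache already serves the users in the overlap region.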

We observe that (5) is exactly the request rate that drives the dynamics of qLRU-Lazy in state , as indicated in (3). Upon a miss for content , qLRU-Lazy inserts it with a probability that is proportional to the marginal increase of the global hit ratio provided by the additional copy of content . This introduces a stochastic drift toward local maxima of the hit ratio. As we said above, when q vanishes, an isolated qLRU cache is known to store deterministically the most popular contents, so one can expect each qLRU-Lazy cache to store the contents with the largest marginal request rate given the current allocation at the other caches. It is therefore reasonable to conjecture that a system of qLRU-Lazy caches asymptotically converges at least to a local maximum of the hit ratio (the objective function in (4)). Section V-C shows that this is indeed the case. Before moving to that result, we show that, in particular, qLRU-Lazy achieves the maximum hit ratio in a trefoil topology.

V-B In a trefoil topology qLRU-Lazy achieves the global maximum hit ratio

Now we formalize the previous arguments, showing that, as q tends to 0, the qLRU-Lazy content allocation converges to an optimal configuration in which the set of contents maximizing the global hit ratio is stored at the caches. This result holds for the trefoil topology under our model.

We recall that the trefoil topology exhibits a complete cell symmetry and that the hit ratio of any allocation is invariant under cell-label permutations. As a consequence, the hit ratio depends only on the number of copies of each file stored in the network, and not on where they are stored, as long as we avoid placing multiple copies of the same file in the same cache, which is obviously unhelpful. It is then possible to describe a solution simply as an -dimensional vector , where denotes the number of copies of content . Under qLRU-Lazy we denote by the stationary probability that the system is in a state with allocation .

The optimality result follows from combining the two following propositions (whose complete proofs are in Appendix B):

Proposition V.1.

In a trefoil topology, an allocation of the greedy algorithm for Problem (4) is optimal.

The proof relies on mapping problem (4) to a knapsack problem with objects with unit size for which the greedy algorithm is optimal.

We observe that for generic values of the parameters, all the marginal improvements considered by the greedy algorithm are different and then the greedy algorithm admits a unique possible output (apart from BSs label permutations).
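The knapsack view behind Proposition V.1 can be illustrated with a short sketch: when all objects have unit size, filling the available slots with the largest marginal gains is optimal. The function below is a hedged illustration under the assumption that the marginal gain of each additional copy is known and decreasing in the number of copies; the gain function in the example is a made-up placeholder.

```python
import heapq

def trefoil_copies(marginal_gain, n_contents, n_cells, capacity):
    """Number of copies per content when the n_cells * capacity slots are
    filled with the largest marginal gains.

    marginal_gain(f, k) is the hit-ratio increase due to the k-th copy of
    content f (k = 1..n_cells), assumed decreasing in k.
    """
    copies = [0] * n_contents
    # Max-heap (via negated gains) holding the next candidate copy of each content.
    heap = [(-marginal_gain(f, 1), f) for f in range(n_contents)]
    heapq.heapify(heap)
    for _ in range(n_cells * capacity):
        if not heap:
            break
        _, f = heapq.heappop(heap)
        copies[f] += 1
        if copies[f] < n_cells:  # at most one copy per cache
            heapq.heappush(heap, (-marginal_gain(f, copies[f] + 1), f))
    return copies

# Hypothetical gains: each extra copy is worth half of the previous one.
popularity = [0.5, 0.3, 0.2]
result = trefoil_copies(lambda f, k: popularity[f] * 0.5 ** (k - 1),
                        n_contents=3, n_cells=3, capacity=1)
```

Because the gains decrease with the copy index, the k-th copy of a content is never selected before its (k-1)-th copy, so the selected objects always correspond to a feasible allocation.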

Proposition V.2.

Consider a trefoil topology and assume there is a unique possible output of the greedy algorithm, denoted as . Then, under the approximate model in Sec. III, a system of qLRU-Lazy caches asymptotically converges to when q vanishes, in the sense that

In order to prove this result, we write down the explicit stationary probability for the system, taking advantage of the fact that the MC is reversible, and we study its limit. In conclusion the greedy algorithm and qLRU-Lazy are equivalent in the case of trefoil topology.

V-C qLRU-Lazy achieves a local maximum hit ratio

We say that a caching configuration is locally optimal if it provides the highest aggregate hit rate among all the caching configurations which can be obtained from it by replacing one content in one of the caches.

Proposition V.3.

A spatial network of qLRU-Lazy caches asymptotically achieves a locally-optimal caching configuration when q vanishes. (In the most general case, the implementation of the qLRU policy requires different parameters at different cells.)

The proof is in Appendix B-C. In this general case the difficulty of proving the assertion stems from the fact that the MC representing content dynamics is no longer reversible, and it is then difficult to derive an analytical expression for its steady state distribution. Instead, our proof relies on results for regular perturbations of Markov chains [26].

The analytical results in this section explain why, for small but strictly positive values of q, qLRU-Lazy performs better than qLRU-Blind and qLRU-One. More generally, what seems fundamental to approach the maximum hit ratio is the coupling of the lazy update rule, which reacts to the “right marginal benefit” for problem (4), with a caching policy that is effective at storing the most popular contents. qLRU is one such policy; 2LRU is another. Moreover, 2LRU has been shown to react faster to popularity changes. For this reason, in the next section we also include results for 2LRU-Lazy. Finally, we remark that the (static) cache configuration selected by the greedy algorithm is in general not locally optimal, as a consequence of the greedy nature of the algorithm and of the fact that marginal gains at a cell change during the execution of the algorithm (since they depend on the configuration of neighbouring cells).

VI Performance in a Realistic Deployment

In this section we evaluate the performance of the lazy update rule in a more realistic scenario. To this purpose, we have extracted the positions of T-Mobile BSs in Berlin from the dataset in [27] and we use a real content request trace from Akamai Content Delivery Network [16]. The actual identity of the users and of the requested objects was obfuscated. The BS locations are indicated in Fig. 7. We refer to this topology simply as the Berlin topology. The trace includes 400 million requests issued over 5 days from users in the same geographical zone for a total of 13 million unique contents. In our simulations we randomly assign the requests to the users who are uniformly spread over the area.

Time span: 5 days
Number of requests received: 400 million
Number of distinct objects: 13 million
TABLE II: Trace: basic information
Fig. 7: T-Mobile BS configuration in Berlin.

Figure 8 compares the performance of different caching policies in this scenario, when the transmission range of the BSs varies from 25 to 250 meters and, correspondingly, a user is covered on average by 1.1 up to 9.4 BSs. We observe that the lazy update rule still outperforms one and blind when coupled with qLRU or 2LRU. Moreover, for the higher density scenarios, 2LRU-Lazy, 2LRU-Blind, qLRU-Lazy and (to a minor extent) qLRU-Blind outperform the static allocation obtained by the greedy algorithm assuming that the request rates of each content over the future days are known. While we recall that the greedy algorithm provides only a 1/2-approximation of the optimal allocation for problem (4), we highlight that this apparently surprising result is most likely due to the non-stationarity of the request process. In this case an uninformed dynamic policy (like qLRU or 2LRU) can outperform an informed static one, by dynamically adapting the content allocation in the caches to the short-term request rates of the contents.

Fig. 8: Berlin topology and real CDN request trace. qLRU employs .

In order to deepen the comparison between uninformed and informed policies, we have considered the operating scenario that is usually suggested by supporters of informed policies (see e.g. [3]): the optimal allocation is computed the day ahead and contents are pushed to the caches during the night, when the network is unloaded. Figure 9 then shows the performance for two coverage values (1.1 and 5.9) on a daily basis as well as for the whole 5 days. In this case, the oracle greedy algorithm computes a static content allocation each night knowing exactly the future request rates for the following day. Instead, the forecast greedy algorithm uses the request rates seen during the current day as an estimate for the following one. The oracle greedy benefits from the knowledge of the request rates over a single day: it can now correctly identify and store those contents that are going to be popular the following day, but are not so popular over the whole trace. For this reason, it outperforms the greedy scheme that receives as input the average rates over the whole 5-day period. When cells have limited overlap (Fig. 9 (a)), the oracle greedy algorithm still outperforms the dynamic policies, but 2LRU-Lazy is very close to it. Interestingly, in the higher density setting (Fig. 9 (b)), this advantage disappears: 2LRU-Lazy becomes preferable to both oracle greedy (with daily rates) and qLRU-Lazy. Temporal locality appears to have a larger impact in high density scenarios.
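The difference between the inputs of the oracle and forecast variants can be sketched as follows. The trace format (pairs of timestamp and content id) and the function names are assumptions made for illustration; the allocation step itself is left to whichever greedy implementation is used.

```python
from collections import Counter

def daily_rates(trace, day_length):
    """Split a trace of (timestamp, content_id) pairs into per-day request counts."""
    days = {}
    for timestamp, content in trace:
        days.setdefault(int(timestamp // day_length), Counter())[content] += 1
    return [days.get(d, Counter()) for d in range(max(days) + 1)] if days else []

def greedy_inputs(rates_per_day, day):
    """Rates handed to the greedy algorithm to serve `day` (day >= 1)."""
    oracle = rates_per_day[day]        # exact future rates, not implementable
    forecast = rates_per_day[day - 1]  # rates observed during the previous day
    return oracle, forecast

# Toy trace with a hypothetical day length of 10 time units.
trace = [(0, "a"), (1, "a"), (2, "b"), (10, "b"), (11, "c")]
oracle, forecast = greedy_inputs(daily_rates(trace, day_length=10), day=1)
```

In the toy trace the forecast still ranks content "a" first although it is never requested on the following day, which is the kind of mismatch that penalises the forecast variant when popularity changes from one day to the next.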

Finally, we remark that the allocation of the oracle greedy algorithm is an ideal one, because it assumes the future request rates to be known. A practical algorithm will necessarily be based on some estimates, such as the forecast greedy. Our results show a significant performance loss due to this incorrect input (as already observed in [28]). Both 2LRU-Lazy and qLRU-Lazy perform significantly better than forecast greedy (2LRU-Lazy guarantees between 10% and 20% improvement). Interestingly, our results contradict one of the conclusions in [13], i.e. that at the BS level reactive caching policies would not be efficient because the content request rate is too low, and content prefetching would perform better. We observe that [13] considers a single BS scenario and assumes that perfect popularity knowledge is available. We have performed some additional simulations considering the current typical request rate at a BS as identified in [13] and we still observe qualitatively the same behaviour illustrated in Fig. 9. These additional experiments are described in Appendix C. Moreover, the data traffic rate in cellular networks is constantly increasing and this improves the performance of reactive policies (but not of prefetching), as already observed in [13].

(a) Coverage 1.1
(b) Coverage 5.9
Fig. 9: Berlin topology and real CDN request trace. Comparison of the different policies over the whole trace and for each of the days. qLRU employs .

VII Conclusions

In this paper, we have shown that “uninformed” schemes can effectively and implicitly coordinate different caches in dense cellular systems when smart (but simple) update policies like lazy are used. Indeed, we show that they can achieve performance comparable to that of the best “informed” schemes in static traffic scenarios. Moreover, “uninformed” schemes adapt better to dynamic scenarios, often outperforming implementable “informed” schemes. For once, then, sloth is not the key to poverty, or at least not to poor performance.

We have also proposed a new approximate analytical framework to assess the performance of “uninformed” schemes. The predictions of our model are extremely accurate (hardly distinguishable from Monte Carlo simulations in most cases).

This work was partly funded by the French Government (National Research Agency, ANR) through the “Investments for the Future” Program reference #ANR-11-LABX-0031-01.


  • [1] H. Che, Y. Tung, and Z. Wang, “Hierarchical Web caching systems: modeling, design and experimental results,” Selected Areas in Communications, IEEE Journal on, vol. 20, no. 7, pp. 1305–1314, Sep 2002.
  • [2] “Cisco visual networking index: Global mobile data traffic forecast update, 2016–2021 white paper,” CISCO, Tech. Rep., February 2017.
  • [3] K. Shanmugam et al., “Femtocaching: Wireless video content delivery through distributed caching helpers,” IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8402–8413, 2013.
  • [4] N. Golrezaei et al., “Femtocaching: Wireless video content delivery through distributed caching helpers,” in IEEE INFOCOM 2012, March 2012, pp. 1107–1115.
  • [5] K. Poularakis, G. Iosifidis, and L. Tassiulas, “Approximation algorithms for mobile data caching in small cell networks,” IEEE Transactions on Communications, vol. 62, no. 10, pp. 3665–3677, Oct 2014.
  • [6] K. Naveen et al., “On the interaction between content caching and request assignment in cellular cache networks,” in 5th Workshop on All Things Cellular: Oper., Applic, and Challenges.   ACM, 2015.
  • [7] A. Tuholukova, G. Neglia, and T. Spyropoulos, “Optimal cache allocation for Femto helpers with joint transmission capabilities,” in ICC 2017, IEEE International Conference on Communications, Communications QoS, Reliability, and Modeling Symposium, 21-25 May 2017, Paris, France. [Online]. Available:
  • [8] A. Chattopadhyay and B. Blaszczyszyn, “Gibbsian on-line distributed content caching strategy for cellular networks,” IEEE Trans. on Wireless Communications, vol. 17, no. 2, 2018.
  • [9] B. Blaszczyszyn and A. Giovanidis, “Optimal geographic caching in cellular networks,” in IEEE ICC 2015, June 2015, pp. 3358–3363.
  • [10] K. Avrachenkov, J. Goseling, and B. Serbetci, “A low-complexity approach to distributed cooperative caching with geographic constraints,” Proc. ACM Meas. Anal. Comput. Syst., vol. 1, no. 1, pp. 27:1–27:25, Jun. 2017.
  • [11] M. Leconte et al., “Placing dynamic content in caches with small population,” in IEEE INFOCOM 2016, 2016.
  • [12] A. Giovanidis and A. Avranas, “Spatial multi-lru caching for wireless networks with coverage overlaps,” 2016, arXiv:1612.04363.
  • [13] S.-E. Elayoubi and J. Roberts, “Performance and cost effectiveness of caching in mobile access networks,” in Proceedings of the 2nd ACM Conference on Information-Centric Networking, ser. ACM-ICN ’15.   New York, NY, USA: ACM, 2015, pp. 79–88.
  • [14] M. Garetto, E. Leonardi, and V. Martina, “A unified approach to the performance analysis of caching systems,” ACM Trans. Model. Perform. Eval. Comput. Syst., vol. 1, no. 3, pp. 12:1–12:28, May 2016.
  • [15] S. Traverso et al., “Temporal Locality in Today’s Content Caching: Why It Matters and How to Model It,” SIGCOMM Comput. Commun. Rev., vol. 43, no. 5, pp. 5–12, Nov. 2013.
  • [16] G. Neglia, D. Carra, M. Feng, V. Janardhan, P. Michiardi, and D. Tsigkari, “Access-time-aware cache algorithms,” ACM Trans. Model. Perform. Eval. Comput. Syst., vol. 2, no. 4, pp. 21:1–21:29, Nov. 2017.
  • [17] A. Dan and D. Towsley, “An approximate analysis of the lru and fifo buffer replacement schemes,” in Proceedings of the 1990 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, ser. SIGMETRICS ’90.   New York, NY, USA: ACM, 1990, pp. 143–152.
  • [18] R. Fagin, “Asymptotic miss ratios over independent references,” Journal of Computer and System Sciences, vol. 14, no. 2, pp. 222 – 250, 1977.
  • [19] M. Garetto, E. Leonardi, and S. Traverso, “Efficient analysis of caching strategies under dynamic content popularity,” in IEEE INFOCOM 2015, April 2015, pp. 2263–2271.
  • [20] P. R. Jelenkovic, “Asymptotic approximation of the move-to-front search cost distribution and least-recently used caching fault probabilities,” The Annals of Applied Probability, vol. 9, no. 2, pp. 430–464, 1999.
  • [21] C. Fricker, P. Robert, and J. Roberts, “A versatile and accurate approximation for LRU cache performance,” in Proceedings of the 24th International Teletraffic Congress, 2012, p. 8.
  • [22] N. C. Fofack, P. Nain, G. Neglia, and D. Towsley, “Performance evaluation of hierarchical TTL-based cache networks,” Computer Networks, vol. 65, pp. 212 – 231, 2014.
  • [23] R. W. Wolff, Stochastic modeling and the theory of queues.   Pearson College Division, 1989.
  • [24] A. Pelizzola, “Cluster variation method in statistical physics and probabilistic graphical models,” Journal of Physics A: Mathematical and General, vol. 38, no. 33, p. R309, 2005.
  • [25] G. Neglia et al., “Access-time aware cache algorithms,” in ITC-28, September 2016.
  • [26] H. P. Young, “The Evolution of Conventions,” Econometrica, vol. 61, no. 1, pp. 57–84, January 1993.
  • [27] “Openmobilenetwork.” [Online]. Available:
  • [28] G. Neglia, D. Carra, and P. Michiardi, “Cache Policies for Linear Utility Maximization,” IEEE/ACM Transactions on Networking, vol. 26, no. 1, pp. 302–313, 2018. [Online]. Available:
  • [29] P. Konstantopoulos and J. Walrand, “A quasi-reversibility approach to the insensitivity of generalized semi-markov processes,” Probability in the Engineering and Informational Sciences, vol. 3, no. 3, pp. 405–415, 1989.

Appendix A Insensitivity for FIFO caches

The proof of Proposition III.1 follows.


The vector indicates where copies of content are stored at time . Under CTA for FIFO, when a copy is inserted at cache , a timer is set to the deterministic value and decreased over time. When the timer reaches zero, the content is erased from cache . We denote by the residual value of such timer at time . The system state at time , as regards content , is then characterized by , i.e. by the current allocation of content copies and their residual timers. The process is a Generalized Semi-Markov Process (GSMP) [29].

Let be the stationary distribution of . This distribution is in general a function of the timer distributions. If it depends on them only through their expected values, then the GSMP is said to be insensitive. In this case the stationary distribution remains unchanged if we replace all the timers with exponential random variables and then the GSMP simply becomes a continuous time Markov Chain with state . Then, Approximation A is correct whenever the GSMP is insensitive.

A GSMP is insensitive if and only if its stationary distribution satisfies some partial balance equations (a.k.a. Matthes’ conditions) [29, Eq. (2.2)], as well as the usual global balance equations. For our system, Matthes’ conditions can be written as


i.e. the rate at which the timer is activated is equal to the rate at which it expires. Conditions (6) are equivalent to the corresponding MC being reversible. This completes the proof.

Appendix B qLRU-Lazy’s optimality

In the trefoil topology, any cell is equivalent to any other, so it does not matter where the copies of a given content are located, but just how many of them there are. Let be the total request rate for content from users located in cells (it does not matter which ones, because of the symmetry), i.e.:

corresponds then to content hit ratio when copies of the content are stored at the caches. Moreover, we denote by the marginal increase of the hit ratio due to adding a -th copy of content . We observe that is decreasing in and that it holds:

B-A Proof of Proposition V.1


The proof is rather immediate. First observe that by exploiting the properties of the cell-trefoil topology, (4) can be rewritten as:

maximize (7)
subject to

Now, caching problem (7) can be easily mapped to a (trivial) knapsack problem with objects of unitary size, according to the following lines: for every content we define different virtual objects with , with associated weights:

i.e. the weight is equal to the marginal increase of the hit ratio, which is obtained by storing the -th copy of content into the caching system.

The objective of the knapsack problem is to find the set of objects, which maximizes the sum of all the associated weights. Indeed (7) can be rewritten as: . In particular, observe that since , virtual object only if . This implies that provides a feasible solution for the original caching problem, where is equal to the largest such that .

Finally, note that, by construction, is the set composed of the objects with the largest value; therefore by construction, corresponds to: i) the caching allocation that maximizes the global hit rate (i.e. the allocation that solves (7)); ii) moreover, it is the only solution of the greedy algorithm, under the assumption that object values are all different, (i.e. for generic values of the parameters).

B-B Proof of Proposition V.2


Under our approximate model, system dynamics are described by Markov chains, one for each content, coupled by the characteristic times. The symmetry of the trefoil topology implies that the characteristic time at each cache has the same value that we denote simply as .

Every MC is a birth-death process. In particular, under qLRU-Lazy, for content , the transition rate from state to is

and from state to it is

Let us define . The stationary probability to have copies of content is then


are values that do not depend on or and they will not play a role in the following study of the asymptotic behaviour.
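For a finite birth-death chain, the stationary distribution follows directly from detailed balance: the probability of each state is proportional to the product of the ratios between upward and downward rates along the path from state 0. The sketch below illustrates this generic computation; the rate lists are hypothetical inputs and do not reproduce the specific qLRU-Lazy rates given above.

```python
def birth_death_stationary(birth, death):
    """Stationary distribution of a finite birth-death chain.

    birth[k] is the rate from state k to k+1 and death[k] the rate from
    state k+1 to k (k = 0..n-1), all strictly positive.
    """
    weights = [1.0]
    for up, down in zip(birth, death):
        weights.append(weights[-1] * up / down)  # detailed balance
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical rates: three states, upward rate 1 and downward rate 2.
pi = birth_death_stationary([1.0, 1.0], [2.0, 2.0])
```

As the upward rates shrink relative to the downward ones, the probability mass concentrates on the states with few copies, which is why the characteristic time has to grow for the buffer constraint to remain satisfied.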

Under CTA, the buffer constraint is expressed imposing that the expected number of contents at a cache is equal to the buffer size. If the system is in state , any given BS has probability to be one of the storing it, then the buffer constraint is

or equivalently:


We focus now our attention on the stationary distribution when converges to . As changes, the characteristic time changes as well. We write to express such dependence. When converges to , diverges, otherwise all the probabilities would converge to and constraint (8) would not be satisfied. It follows that:


Let us consider a sequence that converges to such that there exists . The value is also said to be a cluster value for the function . It holds that for any marginal hit ratio

Let . For generic values of the parameters the values are all distinct by hypothesis, then there can be at most one content for which it holds . For any other content it follows that the dominant term in the denominator of (9) is and then for :

i.e. asymptotically exactly copies of content would be stored. For content , both the term and could be dominant. Then all the converge to for . The total expected number of copies stored in the system would then be:

Because of (8), this sum has to be equal to the integer , then one of the two following mutually exclusive possibilities must hold, or

and then , or

and then . In any case, the conclusion is that, when converges to , for each content a fixed number of copies is stored at the cache. is such that the marginal hit-ratio increase due to the -th copy is among the largest marginal hit-ratios (and the -th copy is not among them). This allocation coincides with the solution of the greedy algorithm.

B-C Proof of Proposition V.3


For a given content , let and be two possible states of the MC. We say that whenever for each ; furthermore we denote with the number of stored copies of the content in state , which we call weight of the state .

Now observe that, by construction, transition rates in the MC are different from 0 only between pairs of states and , such that: i) , ii) . In such a case we say that is a parent of and is a son of . Moreover we say that is an upward transition, while is a downward transition.

Let be parent of and let be the index such that , we have that the upward rate and the downward rate .

Now, as for every every upward rate tends to 0. Therefore the characteristic time of every cell must necessarily diverge. In fact, if this were not the case for a cache , none of the contents would be found in the cache asymptotically, because upward rates tend to zero, while downward rates would not. This would contradict the constraint:


imposed by the CTA. Therefore necessarily for every cell . More precisely we must have at every cache otherwise we fail to meet (10). Now we can always select a sequence such that .

Let us now consider the uniformization of the continuous time MC with an arbitrarily high rate and the corresponding discrete time MC with transition probability matrix . For , the set of contents in the cache does not change, each state is an absorbing one and any probability distribution is a stationary probability distribution for . We are rather interested in the asymptotic behaviour of the MC when converges to . For the MC is finite, irreducible and aperiodic and then admits a unique stationary probability . We call the states for which stochastically stable. We are going to characterize such states.

From what we have said above, it holds that the probability to move from to the parent is , while . For each possible transition, we define its direct resistance to be the exponent of the parameter , then , and . If a direct transition is not possible between two states, then we consider the corresponding direct resistance to be infinite. Observe that the higher the resistance, the less likely the corresponding transition. Given a sequence of transitions from state to state , we define its resistance to be the sum of the resistances, i.e. .

The family of Markov chains is a regular perturbation [26, properties (6-8)] and then it is possible to characterize the stochastically stable states as the minimizers of the potential function defined as follows. For each pair of states and let be the minimum resistance of all the possible sequences of transitions from to (then ). Consider then the fully meshed directed weighted graph whose nodes are the possible states of the MC and in which the weight of the edge is . The potential of state () is defined as the resistance of the minimum weight in-tree (or anti-arborescence) rooted at . Intuitively the potential is a measure of the general difficulty to reach state from all the other nodes. From Theorem 4 of [26] it follows that is stochastically stable if and only if its potential is minimal.
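The first ingredient of the potential, the minimum resistance between every pair of states, is an all-pairs shortest-path computation over the direct resistances. The sketch below uses the Floyd-Warshall algorithm on a hypothetical three-state example; it does not compute the minimum weight in-tree, which would require a minimum arborescence algorithm on top of these values.

```python
def min_resistances(direct):
    """All-pairs minimum path resistance (Floyd-Warshall).

    direct[x][y] is the direct resistance of the transition from x to y,
    or float('inf') when no direct transition exists.
    """
    n = len(direct)
    r = [[0 if x == y else direct[x][y] for y in range(n)] for x in range(n)]
    for k in range(n):
        for x in range(n):
            for y in range(n):
                if r[x][k] + r[k][y] < r[x][y]:
                    r[x][y] = r[x][k] + r[k][y]
    return r

# Hypothetical chain 0 <-> 1 <-> 2: upward moves cost 1, downward moves cost 0.
inf = float("inf")
direct = [[inf, 1, inf],
          [0, inf, 1],
          [inf, 0, inf]]
resistance = min_resistances(direct)
```

In the example, reaching state 2 from state 0 requires two upward transitions and has resistance 2, while the reverse path only uses downward transitions and has resistance 0.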

For each content we are then able to characterize which configurations are stochastically stable as converges to . Moreover, this set of configurations must satisfy the constraint (10) at each base station . We define then the cache configuration to be jointly stochastically stable if 1) for each content is stochastically stable, 2) satisfies (10) for each .

The last step in order to prove Proposition V.3 is to show that a jointly stochastically stable cache configuration is locally optimal, i.e. that changing one content at a given cache does not increase the hit ratio. Without loss of generality, we consider replacing content present at cache with content . Then, the cache allocation changes from and to a new cache allocation , such that and . Let denote the hit rate for content over the whole network under the allocation and the global hit rate across all the contents. Lemma B.1 below provides a formula for the hit rate , from which we obtain that