Caching systems based on the Least Recently Used (LRU) principle are already widely deployed but need to scale efficiently to support emerging data applications. Their stochastic dynamics [29, 73, 46, 45, 18, 60, 49, 79, 20, 40] differ substantially from those of well-studied queueing systems. One cannot apply the typical intuition of resource pooling for queueing, e.g., [61, 53, 15, 78, 24], to caching. When a cache serves multiple flows of data item requests, a fundamental question is whether the cache space should be pooled or divided (see Fig. 1) to minimize the miss probabilities.
A request is said to “miss” if the corresponding data item is not found in the cache; otherwise a “hit” occurs. For a web service, each miss often incurs subsequent work at a backend database, resulting in overhead as high as a few milliseconds or even seconds. A study of Facebook’s memcached workloads shows that a miss ratio of only a small percentage on one server can trigger millions of requests to the database per day [9, 85]. Thus, even a minor increase in the hit ratio can significantly improve system performance. To further motivate the problem, we examine cache space allocation for in-memory key-value storage systems.
I-A Background and current practice
In-memory caching can greatly expedite data retrieval, since data are kept in Random Access Memory (RAM). In a typical key-value cache system, e.g., Memcached [1, 39, 68], a data item is added to the cache after a client has requested it and the request has missed. When the cache is full, an old data item must be evicted to make room for the new one; which item is evicted is determined by the caching algorithm. Many caching algorithms have been proposed [59, 65]. However, due to the cost of tracking access history, often only LRU or its approximations are adopted. The LRU algorithm evicts the data item that has not been used for the longest period of time.
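The LRU rule just described can be sketched in a few lines of Python. This is a minimal sketch assuming unit-size items and a capacity counted in items; the class `LRUCache` is hypothetical and is not the implementation used by Memcached or any other system. An `OrderedDict` keeps items ordered from least to most recently used, so a hit moves the item to the end and a miss on a full cache evicts the item at the front.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache that holds at most `capacity` unit-size items."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def request(self, key):
        """Serve one request for `key`; return True on a hit, False on a miss."""
        if key in self.items:
            self.items.move_to_end(key)  # hit: mark as most recently used
            self.hits += 1
            return True
        self.misses += 1  # miss: the item would be fetched from the backend
        if self.capacity > 0:
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)  # evict the least recently used item
            self.items[key] = None  # insert the missed item as most recently used
        return False


cache = LRUCache(capacity=2)
print([cache.request(k) for k in ["a", "b", "a", "c", "b", "a"]])
# [False, False, True, False, False, False]
```

In the example, the request for "c" evicts "b", which then misses and evicts "a", illustrating how a cyclic pattern slightly larger than the cache defeats LRU.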
The current engineering practice is to organize servers into pools based on applications and data domains [9, 68, 28]. On a server, the cache space is divided into isolated slabs according to data item sizes [1, 85]. Note that different servers and slabs have separate LRU lists. These solutions have yielded good performance [1, 32, 85] through coarse-grained control of resource pooling and separation. However, it is not clear whether these rules of thumb yield optimal allocations, or whether simple solutions can further improve performance.
I-B The optimal strategy puzzle
These facts present a dilemma. On the one hand, multiple request flows benefit from resource pooling. For example, a shared cache space that provides sufficiently high hit ratios for two flows can improve the utilization of the limited RAM space, especially when the two flows have overlapping data items, so that a data item brought into the cache by one flow can be directly used by the other. On the other hand, resource separation facilitates capacity planning for different flows and ensures adequate quality of service for each. For example, a dedicated cache space can prevent one flow with a high request rate from evicting too many data items of another competing flow on the same cache.
This dilemma only scratches the surface of whether resource pooling or separation is better for caching. Four critical factors complicate the problem and jointly impact the cache miss probabilities (a.k.a. miss ratios): request rates, overlapped data items across different request flows, data item popularities, and data item sizes. Depending on the setting, they can lead to different conclusions. Below we demonstrate the complexity of the optimal strategy with three examples, showing that resource pooling can be asymptotically equal to, better than, or worse than separation, respectively. Consider two independent flows ( and ) of requests with Poisson arrivals of rates and , respectively. Unless explicitly specified otherwise, the data items of the two flows do not overlap and have unit sizes. Their popularities follow truncated Zipf’s distributions, and , where are the indices of the data items of flow and , respectively. For pooling, the two flows share the whole cache. For separation, the cache is partitioned into two parts using fractions and , to serve flow and separately, .
Case 1: Asymptotic equivalence
The optimal resource separation scheme has recently been shown to be better than pooling under certain assumptions based on the Che approximation. However, it is not clear whether the difference is significant, especially when the cache size is large (a typical scenario). The first example shows that the two can be quite close. Notably, resource pooling is adaptive and does not need to optimize separation fractions. For , we plot the overall miss probabilities under resource pooling and separation in Fig. 2. The optimal ratio for separation is obtained numerically by an exhaustive search.
When the cache size is small, the optimal separation strategy achieves a lower miss probability than resource pooling. However, for large cache sizes, the miss probabilities are indistinguishable. This is not a coincidence, as shown by Theorem 3. Note that the cache sizes take integer values, so the plotted curves fluctuate slightly.
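The comparison in Case 1 can be reproduced in spirit with a small simulation. This is a minimal sketch under hypothetical parameters (equal rates, Zipf exponent 1.2 for both flows, 500 items per flow, 50 unit-size cache slots), not the settings behind Fig. 2, and the function names are illustrative. Because LRU decisions depend only on the order of requests, the sketch samples the flow of each request with probability proportional to its rate instead of generating arrival times, and it finds the best static split by exhaustive search over integer cache sizes, as in the text.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def zipf_cum_weights(n_items, alpha):
    """Cumulative (unnormalized) truncated Zipf weights 1/i**alpha, i = 1..n_items."""
    return list(accumulate(1.0 / i ** alpha for i in range(1, n_items + 1)))


def lru_misses(trace, capacity):
    """Number of misses of an LRU cache with `capacity` unit-size slots."""
    cache, misses = OrderedDict(), 0
    for key in trace:
        if key in cache:
            cache.move_to_end(key)
            continue
        misses += 1
        if capacity > 0:
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[key] = None
    return misses


def pooling_vs_best_split(rates, alphas, n_items, capacity, n_requests, seed=0):
    """Two flows with disjoint items. Returns (pooled miss ratio,
    best separated miss ratio, cache slots given to flow 0 at the best split)."""
    rng = random.Random(seed)
    # Merged Poisson stream: each request comes from flow k w.p. rate_k / sum(rates).
    labels = rng.choices(range(2), weights=rates, k=n_requests)
    cums = [zipf_cum_weights(n_items, a) for a in alphas]
    # Keys are (flow, item index), so the two flows never share an item.
    trace = [(k, rng.choices(range(n_items), cum_weights=cums[k])[0]) for k in labels]
    pooled = lru_misses(trace, capacity) / n_requests
    per_flow = [[r for r in trace if r[0] == k] for k in range(2)]
    best_c0, best_sep = 0, float("inf")
    for c0 in range(capacity + 1):  # exhaustive search over integer splits
        misses = lru_misses(per_flow[0], c0) + lru_misses(per_flow[1], capacity - c0)
        if misses / n_requests < best_sep:
            best_c0, best_sep = c0, misses / n_requests
    return pooled, best_sep, best_c0


pooled, best_sep, best_c0 = pooling_vs_best_split(
    rates=[1.0, 1.0], alphas=[1.2, 1.2], n_items=500, capacity=50, n_requests=20000)
print(f"pooled: {pooled:.4f}  best split: {best_sep:.4f} with {best_c0} slots for flow 0")
```

With a finite trace both numbers carry sampling noise, so longer traces and several seeds are needed before comparing them closely.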
Case 2: Pooling is better
The previous example shows that resource pooling can adaptively achieve the best separation fraction when the cache space is large. Consider two flows with and time-varying Poisson request rates. For , let in the time interval and in , .
The simulation results in Fig. 3 show that resource pooling achieves a smaller miss probability, primarily owing to the self-organizing behavior of LRU. The optimal static separation ratio in this case is due to symmetry.
Case 3: Separation is better
Assume that the data items from flow and flow have different sizes and , respectively, with . The simulation results in Fig. 4 show that the optimal separation yields better performance due to the varying data item sizes, which is supported by Theorem 3. This may explain why in practice it is beneficial to separate cache space according to applications, e.g., text and image objects, which can have significantly different item sizes [68, 84].
What if the data item sizes are equal? Fig. 2 shows that separation can be better when the cache space is small, even with equal data item sizes. However, a small cache is not typical for caching systems. These examples motivate us to systematically study the miss probabilities for competing flows with different rates, distributions, and partially overlapped data items of varying sizes. Our analytical results can be used to explain the puzzling performance differences demonstrated in the previous three examples.
I-C Summary of contributions
(1) We propose an analytical framework under the independent reference model (IRM) to address four critical factors for LRU caching: request rates, popularity distributions, data item sizes, and overlapped data items across different flows. We generalize existing results [57, 58] on the asymptotic miss probability of LRU caching from Zipf’s law to a broad class of heavy-tailed distributions, including, e.g., regularly varying and heavy-tailed Weibull distributions. More importantly, our results characterize the miss probabilities of multiple competing flows with varying data item sizes when they share a common large cache space. These asymptotic results validate the Che approximation under certain conditions.
(2) Based on the miss probabilities of both the aggregated and the individual flows, we provide guidance on whether multiple competing flows should be served together or not. First, we show that when the flows have similar distributions and equal data item sizes, the self-organizing property of LRU can adaptively search for the optimal resource allocation for shared flows. As a result, the overall miss probability of the aggregated flows is asymptotically equal to the miss probability under the optimal static separation scheme. In addition, if the request rates of these flows are close, the miss probabilities of individual flows served jointly differ only by a small constant factor from those when they are served separately. Otherwise, either some of the request flows will be severely penalized or the total miss ratio will become worse; in that case, it is better to serve them separately. Second, we consider multiple flows with overlapped data. When the overlapped data items exceed a certain level, there exists a region in which every flow gets a better hit ratio. However, outside this region, e.g., when the arrival rates are very different, some flows will be negatively impacted by other competing flows. Based on the analysis, we discuss engineering implications.
(3) Extensive simulations are conducted to verify the theoretical results. We design a number of simulations, with different purposes and emphases, and show an accurate match with our theoretical results.
I-D Related work
LRU caching maintains a self-organizing list [3, 2, 22, 23, 38, 52, 70, 54, 5] and has been extensively studied. There are two basic approaches to its analysis: combinatorial and probabilistic. The first approach focuses on classic amortized [16, 25, 72, 77, 76] and competitive analysis [62, 17, 31, 8, 26, 56]. The second approach includes average-case analysis [75, 66, 4] and stochastic analysis [67, 41, 42, 43, 36, 14]. When cache sizes are small, the miss probabilities can be explicitly computed [10, 11, 12, 51]. For large cache sizes, a number of works (e.g., [19, 44, 48, 60, 71]) rely on the Che approximation, which has been extended to cache networks [63, 44, 74, 47, 48, 19]. For fluid limits as scaling factors go to infinity (large cache sizes), mean-field approximations of the miss probabilities have been developed [55, 81, 50]. For emerging data processing systems, e.g., Memcached, the cache sizes are usually large and the miss probabilities are controlled to be small, so it is natural to conduct an asymptotic analysis of the miss probabilities [57, 58]. Although the miss ratios are small, they still significantly impact caching system performance. Nevertheless, most existing works do not address multiple competing request flows on a shared cache space, which can affect each other in complicated ways.
Workload measurements for caching systems [6, 64, 34, 27, 37, 7, 9] are the basis for theoretical modeling and system optimization. Empirical trace studies show that many characteristics of Web caches can be modeled using power-law distributions [6, 86], including, e.g., the overall data item popularity rank, the document sizes, the distribution of user requests for documents [27, 64, 13, 7], and the write traffic. Similar phenomena have also been found for large-scale key-value stores. These facts motivate us to exploit the heavy-tailed workload characteristics.
Web and network caching is closely related to this study, with a large body of dedicated work; see the surveys [82, 69] and the references therein. Recently, a utility optimization approach [30, 35] based on the Che approximation [29, 18] has been used to study cache sharing and partitioning, concluding that under certain settings the optimal resource separation is better than pooling. However, it is not clear whether the difference is significant, especially when the cache size is large, as is typical. We show that simple LRU pooling is asymptotically equivalent to the optimal separation scheme in certain settings, which is significant since the former is adaptive and requires no configuration or tuning. We focus directly on the asymptotic miss probabilities of multiple competing flows, since the miss ratio is one of the most important metrics for caching systems with large cache sizes in practice.
II Model and intuitive results
Consider flows of i.i.d. random data item requests that are mutually independent. Assume that the arrivals of flow follow a Poisson process with rate . The arrivals of the mixed request flows occur at time points . Let be the index of the flow for the request at . The event represents that the request at originates from flow . Due to the Poisson assumption, we have .
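The Poisson assumption implies the standard superposition property: writing λ_k for the rate of flow k, the merged stream is Poisson with rate Σ_j λ_j, and each of its arrivals originates from flow k with probability λ_k/Σ_j λ_j, independently of the others. A quick simulation sketch with hypothetical rates 3 and 1 checks the flow fractions; the function name is illustrative.

```python
import random


def merged_flow_labels(rates, horizon, seed=0):
    """Simulate independent Poisson arrivals for each flow on [0, horizon),
    merge them in time order, and return the flow index of each arrival."""
    rng = random.Random(seed)
    arrivals = []
    for k, lam in enumerate(rates):
        t = rng.expovariate(lam)  # exponential inter-arrival times with rate lam
        while t < horizon:
            arrivals.append((t, k))
            t += rng.expovariate(lam)
    arrivals.sort()  # time order of the mixed stream
    return [k for _, k in arrivals]


rates = [3.0, 1.0]
labels = merged_flow_labels(rates, horizon=20000.0)
fractions = [labels.count(k) / len(labels) for k in range(len(rates))]
print(fractions)  # close to [rate / sum(rates) for rate in rates] = [0.75, 0.25]
```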
To model the typical scenario that the number of distinct data items far exceeds the cache capacity, we assume that each flow can access an infinite number of data items. Formally, flow accesses the set of data items , from which only a finite number can be stored in cache due to the limited capacity. Let denote the size of data item . Note that it is possible, and even common in practice, to observe for flows and , where “” means that the two involved data items are the same. Therefore, this model describes the situation when data items can overlap between different flows.
For example, in Fig. 5, we have , and . Let denote the requested data item at time . Thus, the event means that the request at time is from flow to fetch data item . We also slightly abuse notation and define to be the probability that the request at time fetches a data item with an index larger than in the ordered list of flow . The ordering is specified later in this section.
When the system reaches stationarity (Theorem 1 of ), the miss ratio of the system equals the probability that the request at time finds its requested data item absent from the cache. Therefore, we only need to consider in what follows. Because there are multiple request flows, we have two sets of probabilities for each flow. Flow experiences the unconditional probabilities
and the conditional probabilities
In particular, if there is only a single flow , i.e., , then for all . Overlapped data items couple the request flows, since a data item requested by flow is more likely to be found in the cache when it has recently been requested by other flows. In this case, the usual belief is to pool these flows together, so that one flow can help the others increase their hit ratios. However, if the fraction of overlapped data items is not significant enough, the help obtained from other flows on these common data items will intuitively be quite limited. There have been no analytical studies quantifying how much the overlapped data items help different flows.
When studying flow , assume that the data items are sorted such that the sequence is non-increasing with respect to . Given (3), the sequence is not necessarily non-increasing by this ordering. We investigate how the following functional relationship for flow , , in a neighborhood of infinity, impacts the miss ratio,
Note means . The values in (4) are defined using reciprocals, as both and take values in , in line with the condition that is defined in a neighborhood of infinity. We consider the following class of heavy-tailed distributions
It includes Zipf’s distribution , , and heavy-tailed Weibull distributions with .
It has been shown [41, 43, 57, 58] that the miss probability of LRU is equivalent to the tail of the search cost distribution under move-to-front (MTF). Under MTF, the data items are sorted in decreasing order of their last access times, so the most recently requested item is at the front. Each time a data item is requested, it is moved to the first position of the list, and all the data items that were in front of it move back by one position.
Define to be the sum of the sizes of all data items in the sorted MTF list that are in front of the position of the data item requested by at time .
If the cache capacity is , then a cache miss under MTF (equivalently, under LRU) can be denoted by . In the special case where the data item sizes satisfy for all , the event means that the position of the data item in the MTF list is larger than .
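For unit sizes, the equivalence between LRU misses and MTF positions can be checked directly. This minimal sketch (capacity C ≥ 1, hypothetical random trace over 20 items, illustrative function name) keeps an MTF list next to an LRU cache and confirms that each request hits in LRU exactly when its item is among the first C positions of the MTF list just before the request.

```python
import random
from collections import OrderedDict


def mtf_positions_and_lru_hits(trace, capacity):
    """For each request, return (MTF position, LRU hit). The position is the
    1-based place of the requested item in the move-to-front list just before
    the request (None if it was never requested); the hit flag comes from an
    LRU cache with `capacity` unit-size slots (capacity >= 1)."""
    mtf = []  # most recently requested item first
    lru = OrderedDict()  # least recently used item first
    results = []
    for key in trace:
        pos = mtf.index(key) + 1 if key in mtf else None
        if pos is not None:
            del mtf[pos - 1]
        mtf.insert(0, key)  # move (or insert) the item to the front
        hit = key in lru
        if hit:
            lru.move_to_end(key)
        else:
            if len(lru) >= capacity:
                lru.popitem(last=False)
            lru[key] = None
        results.append((pos, hit))
    return results


rng = random.Random(1)
trace = [rng.randrange(20) for _ in range(2000)]
results = mtf_positions_and_lru_hits(trace, capacity=5)
# LRU hits exactly when the item sits within the first 5 MTF positions.
print(all(hit == (pos is not None and pos <= 5) for pos, hit in results))
```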
For the flows mixed together, let denote the set of data items requested by the entirety of these flows, with . Let denote the size of data item and assume . In general, can take different values when data item sizes vary. Let
be an increasing function with an inverse , which is related to the Che approximation. We can analytically derive in some typical cases, as shown in Corollaries 2 and 3; unlike the Che approximation, these derivations directly exploit the properties of the popularity distributions.
One of our main results can be informally stated as follows, for the gamma function .
Main Result (Intuitive Description) For flows sharing a cache, if , , is approximately a polynomial function (), then, under mild conditions, we obtain, when the cache capacity is large enough,
Sketch of the proof: First, we derive a representation for the miss probability of the request . Similar arguments have been used in [46, 57], but we take a different approach. Among all the requests that occur before we find the last one that also requests data item . More formally, define to be the largest index of the request arrival before such that . Conditional on , the following requests are i.i.d., satisfying
Now we argue that the event is completely determined by the requests at the time points . Let denote the total size of all the distinct data items that have been requested on points . Define the inverse function of to be . We claim that
If the event happens, the total size of the distinct data items requested on the time interval is no smaller than and these data items are different from the one that is requested at time (or ). Due to the equivalence of LRU and MTF, when arrives at , all of the data items requested on will be listed in front of it under MTF. Combining these two facts we obtain . If occurs, then after when is listed in the first position of the list, there must be enough distinct data items that have been requested on so that their total size exceeds or reaches . This yields , which proves (8) and implies
In order to compute , we take two steps. The first step is to show
The second step is to relate to as .
Here, we provide an intuitive proof for . From (7), we have
which, in conjunction with (4), yields, by replacing by ,
For the second step, we have with high probability as by a concentration inequality. The monotonicity and continuity of imply with high probability under certain conditions. Applying (9) and (10), we complete the proof.
The rigorous proof is presented in Theorem 1. It also provides a numerical method to approximate the miss probabilities. In practice, once we have the information about the data sizes and the corresponding data popularities , e.g., from the trace, we can always explicitly express , since only takes a finite number of values in this case. Then, we can evaluate numerically; see Section V. Explicit expressions for are derived for some cases in Section III-A. Note that is tightly related to the Che approximation ; see Section III-C.
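The look-back representation in the proof sketch can be checked on a finite trace. A minimal sketch assuming unit sizes, so that the total size of distinct items is simply their count, and a capacity C ≥ 1: a request misses exactly when at least C distinct items were requested since the previous request for the same item. In a finite trace the first request for an item has no predecessor and always misses, whereas the stationary argument above always finds a previous request. The trace and the function names are hypothetical.

```python
import random
from collections import OrderedDict


def misses_by_lookback(trace, capacity):
    """Miss flags from the look-back rule (unit sizes, capacity >= 1): a request
    misses iff its item was never requested before, or at least `capacity`
    distinct items were requested since the previous request for that item."""
    last_index = {}
    flags = []
    for n, key in enumerate(trace):
        if key not in last_index:
            flags.append(True)  # no earlier request in this finite trace
        else:
            # None of these items equals `key`, by choice of the previous index.
            distinct = len(set(trace[last_index[key] + 1 : n]))
            flags.append(distinct >= capacity)
        last_index[key] = n
    return flags


def misses_by_lru(trace, capacity):
    """Miss flags from a direct LRU simulation with `capacity` unit-size slots."""
    cache, flags = OrderedDict(), []
    for key in trace:
        if key in cache:
            cache.move_to_end(key)
            flags.append(False)
        else:
            flags.append(True)
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[key] = None
    return flags


rng = random.Random(7)
weights = [1.0 / i for i in range(1, 51)]  # Zipf-like popularity over 50 items
trace = rng.choices(range(50), weights=weights, k=3000)
for c in (1, 5, 10):
    print(c, misses_by_lookback(trace, c) == misses_by_lru(trace, c))
```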
III Multiple competing flows
In Section III-A, we rigorously characterize the miss probability of a given request flow, say flow , when it is mixed with other competing flows that share the same cache. In Section III-B, we provide a method to calculate for multiple flows based on a decomposition property.
III-A Asymptotic miss ratios
The miss probability of flow , for a cache size , is represented by a conditional probability . Recall and that is defined for the mixed flow. Note as . By the theory of regularly varying functions, a function $l(\cdot)$ is slowly varying if $\lim_{x\to\infty} l(\eta x)/l(x) = 1$ for any $\eta > 0$; and $x^{\alpha} l(x)$ is called regularly varying of index $\alpha$.
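The definition can be illustrated numerically with two example functions chosen for illustration: l(x) = log x is slowly varying, since log(ηx)/log x → 1, while x^{-1.5} log x is regularly varying of index -1.5, so its ratio tends to η^{-1.5}. The first ratio converges only logarithmically fast, which is worth keeping in mind when slowly varying factors appear in asymptotic miss probabilities.

```python
import math


def ratio(f, eta, x):
    """f(eta * x) / f(x); its limit as x -> infinity determines the index."""
    return f(eta * x) / f(x)


def slowly_varying(x):
    return math.log(x)  # l(x) = log x, index 0


def regularly_varying(x):
    return x ** -1.5 * math.log(x)  # x**-1.5 * l(x), index -1.5


eta = 2.0
for x in (1e3, 1e6, 1e100):
    print(f"x={x:.0e}: {ratio(slowly_varying, eta, x):.4f} (-> 1), "
          f"{ratio(regularly_varying, eta, x):.4f} (-> {eta ** -1.5:.4f})")
```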
Assume that, for a function and ,
The function characterizes how fast grows, and thus should be selected to be as large as possible while still satisfying (11). For example, when is regularly varying, e.g., , we can let , which yields . When , we can pick , since , implying . Both satisfy . Note that in these examples satisfies the following condition: there exist and , for ,
Theorem 1 is the rigorous version of the main result described in (6). The proof is presented in Section VII-B. Based on Theorem 1, we can easily derive some corollaries. We begin with the special case when there is only a single flow in service and all data items are of the same size . For a single flow , we simplify the notation by and . Theorem 1 recovers the results in [57, 58] for Zipf’s distribution
Our result extends (14) in three aspects. First, we study multiple flows () that can have overlapped data items, and the requested data items can have different sizes. Second, we address the case (then needs to be replaced by as in (15)), while the results in [57, 58] assume . For , we need to assume that only a finite number of data items can be requested (whereas this paper assumes an infinite number); otherwise the popularity distribution does not exist. This special case needs to be handled differently and is not presented in this paper. Due to this difference, the asymptotic result in (13) is only accurate for large when . Third, our result yields the asymptotic miss probability for a large class of popularity distributions, e.g., Weibull, with varying data item sizes. Corollary 1 extends Theorem 3 in , which is proved under condition (14), to regularly varying probabilities
with being a slowly varying function, e.g., .
Consider a single flow with and , . Let and , . If as for some , then
For a single flow with requests following a heavy-tailed Weibull distribution , and , we have, with Euler’s constant ,
Since is a decreasing function in , we have
Changing the variable and using the property of incomplete gamma function, we obtain
which implies, for ,
Using Lemma 6 in , we obtain
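For orientation, we assume that the Zipf result recovered in (14) has the classical form P[miss] ~ (1 - 1/α)Γ(1 - 1/α)^α P[R > C] as C → ∞ for α > 1. The sketch below evaluates this constant with `math.gamma`; for α = 2 it equals π/2, and as α → ∞ it tends to e^γ, where γ is Euler’s constant, the constant that also enters Corollary 2. The limit follows from log Γ(1 - ε) = γε + O(ε²) with ε = 1/α. The function name is illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def zipf_lru_constant(alpha):
    """(1 - 1/alpha) * Gamma(1 - 1/alpha) ** alpha, defined for alpha > 1."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1")
    return (1.0 - 1.0 / alpha) * math.gamma(1.0 - 1.0 / alpha) ** alpha


for alpha in (1.5, 2.0, 4.0, 100.0, 1e4):
    print(f"alpha={alpha:g}: {zipf_lru_constant(alpha):.5f}")
print(f"e^gamma = {math.exp(EULER_GAMMA):.5f}")  # limit as alpha -> infinity
```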
III-B Decomposition property
For multiple request flows without overlapped data items, we have a decomposition property. Let be constructed from a set of distributions according to probabilities , . Specifically, a random data item following the distribution is generated by sampling from the distribution with probability . Since the flows have no overlapped data items, we have . Therefore, according to (3), can be represented by an unordered list,
Let . Lemma 1 shows a decomposition property for and under certain conditions. Let and . It is often easier to compute than .
Without overlapped data items, if, for either or , we have , with , then, as ,
The proof of Lemma 1 is presented in Section VII. It can be used to compute for multiple flows sharing the same cache. Furthermore, applying Theorem 1, we can derive the miss probability for each flow.
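The mixture construction at the start of this subsection can be made concrete for flows with disjoint items. A minimal sketch with hypothetical rates and truncated Zipf exponents: an item that flow k requests with conditional probability p appears in the mixed list with probability (λ_k/Σ_j λ_j)p, the list is sorted in non-increasing order, and tail sums give the mixed-flow analogue of P[R > x]. The helper names are illustrative.

```python
def zipf(n, alpha):
    """Normalized truncated Zipf probabilities proportional to i**-alpha."""
    w = [i ** -alpha for i in range(1, n + 1)]
    total = sum(w)
    return [x / total for x in w]


def mixed_popularities(rates, flow_probs):
    """Popularity list of the mixed stream for flows with disjoint items.

    An item that flow k requests with conditional probability p is requested
    by the mixed stream with probability (rate_k / sum(rates)) * p. The result
    is sorted in non-increasing order, which fixes the mixed-flow ordering."""
    total = sum(rates)
    q = [lam / total * p for lam, probs in zip(rates, flow_probs) for p in probs]
    return sorted(q, reverse=True)


def tail(q_sorted, x):
    """P[R > x]: total probability of the items ranked after position x."""
    return sum(q_sorted[x:])


q = mixed_popularities([2.0, 1.0], [zipf(1000, 1.5), zipf(1000, 2.5)])
print(len(q), tail(q, 10), tail(q, 100))
```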
Consider flows without overlapped data, satisfying , and , . Assume that the data items of flow have identical sizes, i.e., . For and , we have, for ,
and for ,
Corollary 3 approximates the miss probabilities for multiple flows with different when the cache capacity . When the cache capacity is small, this approximation is not accurate. In order to improve the accuracy, denote by the second smallest value among all ’s. Defining , we consider all flows in the set , and derive
The inverse function of can be better approximated by
where and is defined in (28). We obtain more accurate numerical results for miss probabilities using (30) instead of (29), especially when the cache capacity is small, though the expressions in (29) and (30) are asymptotically equivalent. Experiment 6 in Section V validates this approximation. Alternatively, one can resort to numerical methods to directly evaluate for more complex cases.
III-C Connection to the Che approximation
The miss probability of the LRU algorithm has been extensively studied using the Che approximation. We now show that the Che approximation is asymptotically accurate under certain conditions; see a related validity argument in . For multiple flows, the overall miss probability computed by the Che approximation is
where is the cache characteristic time, defined as the unique solution to .
Under the conditions of Theorem 1, we have, as ,
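The Che approximation can be evaluated numerically for any finite popularity list. A minimal sketch, assuming the standard form in which the characteristic time T solves Σ_i s_i(1 - e^{-q_i T}) = C and the overall miss probability is Σ_i q_i e^{-q_i T}; weighting the occupancy by item sizes s_i is our assumption for unequal sizes. For a single Zipf flow with α = 2 and hypothetical parameters, the ratio of the Che miss probability to P[R > C] lands close to (1 - 1/α)Γ(1 - 1/α)^α = π/2, in line with the asymptotic accuracy stated above. The function names are illustrative.

```python
import math


def che_characteristic_time(q, sizes, capacity, rel_tol=1e-10):
    """Solve sum_i s_i (1 - exp(-q_i T)) = capacity for T by bisection."""
    if not 0 < capacity < sum(sizes):
        raise ValueError("need 0 < capacity < total size of all items")

    def occupancy(t):  # expected cache occupancy; increasing in t
        return sum(s * (1.0 - math.exp(-qi * t)) for qi, s in zip(q, sizes))

    lo, hi = 0.0, 1.0
    while occupancy(hi) < capacity:  # grow the bracket until it contains T
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if occupancy(mid) < capacity:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def che_miss_probability(q, sizes, capacity):
    """Overall miss probability sum_i q_i exp(-q_i T) at the characteristic time."""
    t = che_characteristic_time(q, sizes, capacity)
    return sum(qi * math.exp(-qi * t) for qi in q)


# Hypothetical example: one flow, Zipf(alpha = 2) truncated at n items, unit sizes.
alpha, n, capacity = 2.0, 20000, 100
weights = [i ** -alpha for i in range(1, n + 1)]
total = sum(weights)
q = [x / total for x in weights]
miss = che_miss_probability(q, [1] * n, capacity)
tail = sum(q[capacity:])  # P[R > C]: mass of the items ranked after position C
ratio = miss / tail
print(f"miss={miss:.5f}  P[R > C]={tail:.5f}  ratio={ratio:.4f}  pi/2={math.pi / 2:.4f}")
```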
IV Pooling and separation
We first characterize the self-organizing behavior of LRU caching for multiple flows in Section IV-A. Then, we study how the interactions of competing flows impact the individual ones in Section IV-B. The consequences of overlapped data items across different flows are investigated in Section IV-C. Based on the insights, we discuss engineering implications in Section IV-D.
A pooling scheme serves the request flows jointly using the cache space of size . A separation scheme divides the cache space into parts according to fractions , , and allocates to flow .
IV-A Self-organizing behavior of pooling
Based on the asymptotic miss ratios derived in Theorem 1, we show that, when multiple flows have similar distributions and identical data item sizes, resource pooling asymptotically gives the best overall hit ratio achieved by the optimal separation scheme. Otherwise, the optimal separation scheme results in a better overall miss ratio. Note that the optimal separation scheme is static while the pooling scheme is adaptive without any parameter tuning or optimization. This explains why pooling is better in Fig. 3. Denote by and the overall miss probabilities under the optimal separation and under resource pooling, respectively.
For flows without overlapped data, following , , and the data items of flow having the same size , we have
and the equality holds if and only if .
This result explains the simulation in Fig. 4: when data item sizes vary significantly, resource pooling can be worse than separation. In practice, data item sizes vary, but they can be considered approximately equal if they fall within the same range, as in the slabs of Memcached [9, 68]. Note that Theorem 3 only characterizes an asymptotic result. When the cache size is not large enough and the ’s are different, resource pooling can be worse than the optimal separation, as studied in . As commented after Corollary 3, a better approximation for small cache sizes is to use Theorem 1 and numerically evaluate .
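Theorem 3 can be explored numerically with the Che approximation instead of the asymptotic formulas. A minimal sketch for two non-overlapping flows that share one Zipf exponent, with hypothetical rates, exponent, item counts, sizes, and cache size: it computes the pooled miss probability and the best static split, found by a ternary search over the capacity given to flow 0. Under the Che approximation each flow's miss probability is convex in its own capacity, so the separated miss probability is convex in the split and a ternary search is sufficient. The function names are illustrative.

```python
import math


def zipf(n, alpha):
    """Normalized truncated Zipf probabilities proportional to i**-alpha."""
    w = [i ** -alpha for i in range(1, n + 1)]
    total = sum(w)
    return [x / total for x in w]


def che_miss(q, sizes, capacity, rel_tol=1e-7):
    """Che approximation: solve sum_i s_i (1 - exp(-q_i T)) = capacity for T
    by bisection, then return the miss probability sum_i q_i exp(-q_i T)."""
    if capacity <= 0:
        return sum(q)  # empty cache: every request misses
    if capacity >= sum(sizes):
        return 0.0  # every item fits

    def occupancy(t):
        return sum(s * (1.0 - math.exp(-qi * t)) for qi, s in zip(q, sizes))

    lo, hi = 0.0, 1.0
    while occupancy(hi) < capacity:
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if occupancy(mid) < capacity:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return sum(qi * math.exp(-qi * t) for qi in q)


def pooling_vs_separation(rates, alpha, sizes, n_items, capacity):
    """Two non-overlapping Zipf(alpha) flows, items of flow k all of size sizes[k].
    Returns (pooled miss, best separated miss, capacity share of flow 0)."""
    w = [r / sum(rates) for r in rates]
    p = zipf(n_items, alpha)
    pooled = che_miss([w[k] * pi for k in range(2) for pi in p],
                      [sizes[k] for k in range(2) for _ in p], capacity)

    def separated(c0):
        return (w[0] * che_miss(p, [sizes[0]] * n_items, c0)
                + w[1] * che_miss(p, [sizes[1]] * n_items, capacity - c0))

    lo, hi = 0.0, float(capacity)
    for _ in range(30):  # ternary search on a convex function of c0
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if separated(m1) < separated(m2):
            hi = m2
        else:
            lo = m1
    c0 = 0.5 * (lo + hi)
    return pooled, separated(c0), c0 / capacity


equal_sizes = pooling_vs_separation([2.0, 1.0], 2.0, [1, 1], 500, 100)
mixed_sizes = pooling_vs_separation([2.0, 1.0], 2.0, [1, 4], 500, 100)
print("sizes 1 and 1 (pooled, separated, share):", equal_sizes)
print("sizes 1 and 4 (pooled, separated, share):", mixed_sizes)
```

Under the Che approximation, pooling behaves exactly like the static split that gives each flow its pooled occupancy, so the best split is never worse than pooling; what the sketch exposes is the size of the gap, which should be negligible with equal sizes and visible with sizes 1 and 4, in line with Theorem 3.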
First, we assume . To characterize resource separation, by Theorem 1, we obtain
Since the optimal separation method minimizes the overall asymptotic miss probability, we have