Content distribution has become a dominant application in today’s Internet. Much of this content is delivered by Content Distribution Networks (CDNs), such as those operated by Akamai and Amazon (Jiang et al., 2012). These applications usually impose stringent latency requirements between service providers and end users. CDNs use a large network of caches to deliver content from locations close to the end users. If a user’s request is served by a cache (i.e., a cache hit), the user experiences a faster response time than if it were served by the backend server. Cache hits also reduce bandwidth requirements at the central content repository.
With the aggressive increase in Internet traffic over the past years (Cisco, 2015), CDNs need to host content from thousands of regions belonging to the web sites of thousands of content providers. Furthermore, each content provider may host a large variety of content, including videos, music, images and web pages. This increasing diversity of content services requires CDNs to provide different qualities of service to content classes and applications with different access characteristics and performance requirements. Significant economic benefits and important technical gains have been observed with the deployment of service differentiation (Feldman and Chuang, 2002). While a rich literature has studied the design of fair and efficient caching algorithms for content distribution, little work has paid attention to the provision of multi-level services in cache networks.
Managing cache networks requires policies to route end-user requests to the local distributed caches, as well as caching algorithms to ensure the availability of requested content at the caches. In general, there are two classes of policies for studying the performance of caching algorithms: conventional eviction policies and timer-based, i.e., Time-To-Live (TTL), policies (Fagin, 1977; Che et al., 2002; Fofack et al., 2014). On the one hand, since the cache size is usually much smaller than the total amount of content, some content needs to be evicted when the requested content is not in the cache (i.e., a cache miss). Well-known eviction policies include Least-Recently-Used (LRU) (Gast and Houdt, 2016; Li et al., 2018b), Least-Frequently-Used (LFU) (Coffman and Denning, 1973), First-In-First-Out (FIFO), and RANDOM (Coffman and Denning, 1973; Li et al., 2018b). Exact analysis of these algorithms has proven to be difficult, even under the simple Independence Reference Model (IRM) (Coffman and Denning, 1973), where requests for different contents are independent of each other. The strongly coupled nature of these eviction algorithms makes the implementation of differentiated services challenging.
On the other hand, a TTL cache associates each content with a timer upon request, and the content is evicted from the cache when its timer expires, independently of other contents. Analysis of these policies is simple since the evictions of different contents are decoupled from each other.
Most studies have focused on the analysis of a single cache. When a cache network is considered, independence across different caches is usually assumed (Rosensweig et al., 2010). Again, it is hard to analyze most conventional caching algorithms, such as LRU, FIFO and RANDOM, but some accurate results for TTL caches are available (Berger et al., 2014; Fofack et al., 2014). However, it has been observed (Laoutaris et al., 2004) that performance gains can be obtained if decision-making is coupled at different caches.
In this paper, we consider a TTL cache network. Any node in the network can generate a request for a content, which is forwarded along a fixed path towards the server. The forwarding stops upon a cache hit, i.e., when the requested content is found in a cache on the path. When such a cache hit occurs, the content is sent over the reverse path to the node that initiated the request. This raises two questions: where should the requested content be cached on the reverse path, and what should the value of its timer be? Answering these questions can provide new insights into cache network design; however, it may also increase the complexity and difficulty of the analysis.
Our goal is to provide thorough and rigorous answers to these questions. To that end, we consider moving the content one cache up in the cache hierarchy upon a cache hit and pushing it one cache down once its timer expires, since recently evicted content may still be in demand. This leads to the “Move Copy Down with Push” (MCDP) policy. While pushing a copy down may improve system performance, it incurs greater operational cost. We also consider the “Move Copy Down” (MCD) policy, under which content is evicted upon timer expiry. Both policies are described in detail in Section 3.
We first focus on a linear cache network, in which all requests are made at one end node and, if the content is not present in the network, served at the other end. We first consider a utility-driven caching framework, where each content is associated with a utility and is managed with a timer whose duration is set to maximize the aggregate utility of all contents over the cache network. Building on the MCDP and MCD models, we formulate the optimal TTL policy as a non-convex optimization problem in Section 4.2. A first contribution of this paper is to show that this non-convex problem can be transformed into a convex one by a change of variables. We further develop online algorithms for content management over linear cache networks and show, using Lyapunov functions, that they converge to the optimal solution.
Utilities characterize user satisfaction and provide an implicit notion of fairness. However, since we consider a cache network, content management also induces costs, such as the search cost for finding the requested content on the path, the fetch cost for serving the content to the user that requested it, and the transfer cost for moving content upon a cache hit or miss under the caching policy. We characterize these costs and formulate a cost minimization problem in Section 4.4.
Next, informed by our results for linear cache networks, we consider a general cache network. In a general cache network, requests for content can be made at any node and content servers reside at any node. Requests are propagated along a fixed path from end user to a server. In Section 5.2, we assume that contents requested along different paths are distinct. In this case, we simply extend our results from Section 4.3, since the general network can be treated as a union of different line networks. The more interesting case where common content is requested along different paths is considered in Section 5.3. This introduces non-convex constraints so that the utility maximization problem is non-convex. We show that although the original problem is non-convex, the duality gap is zero. Based on this, we design a distributed iterative primal-dual algorithm for content management in the general cache network. We show through numerical evaluations that our algorithm significantly outperforms path replication with traditional caching algorithms over a broad array of network topologies.
Finally, we discuss some generalizations in Section 6, including how our framework can be directly mapped to content distribution in CDNs, ICNs/CCNs, etc., and present numerical results on how to optimize performance. Conclusions are given in Section 7. Additional discussions and proofs are provided in Appendix 8.
2. Related Work
There is a rich literature on the design, modeling and analysis of cache networks, including TTL caches (Rodríguez et al., 2016; Fofack et al., 2012, 2014; Berger et al., 2014), optimal caching (Ioannidis and Yeh, 2016; Li et al., 2018a) and routing policies (Ioannidis and Yeh, 2017). In particular, Rodriguez et al. (Rodríguez et al., 2016) analyzed the advantage of pushing content upstream, and Berger et al. (Berger et al., 2014) characterized the exactness of the TTL policy in a hierarchical topology. A unified approach to study and compare different caching policies is given in (Garetto et al., 2016), and an optimal placement problem under heavy-tailed demand has been explored in (Ferragut et al., 2016).
studied joint routing and content placement with a focus on a bipartite, single-hop setting. Both showed that minimizing single-hop routing cost can be reduced to solving a linear program. Ioannidis and Yeh (Ioannidis and Yeh, 2017) studied the same problem in a more general setting with arbitrary topologies.
An adaptive caching policy for cache networks was proposed in (Ioannidis and Yeh, 2016), where each node decides which items to cache and evict. An integer programming problem was formulated by characterizing content transfer costs, and both centralized and complex distributed algorithms with performance guarantees were designed. That work is complementary to ours: we consider TTL caches and control the cache parameters through timers to maximize the sum of utilities over all contents across the network. Moreover, (Ioannidis and Yeh, 2016) proposed only approximation algorithms, while our timer-based models enable us to design optimal solutions, since content occupancy can be modeled as a real variable (e.g., a probability).
Closer to our work, a utility maximization problem for a single cache was considered under IRM (Dehghan et al., 2016; Panigrahy et al., 2017a) and stationary requests (Panigrahy et al., 2017b), while (Ferragut et al., 2016) maximized the hit probabilities under heavy-tailed demands over a single cache. None of these approaches generalizes to cache networks, which leads to non-convex formulations (See Section 4.2 and Section 5.3); addressing this lack of convexity in its full generality, for arbitrary network topologies, overlapping paths and request arrival rates, is one of our technical contributions.
We consider a cache network represented by a graph $G=(V,E)$. We assume a library of $n$ unique contents, denoted as $\mathcal{D}$ with $|\mathcal{D}|=n$. Each node $v\in V$ can store a finite number of contents; $B_v$ is the cache capacity at node $v$. The network serves content requests routed over the graph $G$. A request is determined by the item requested by the user and the path that the request follows; this is described in detail in Section 3.2.1. We assume that the request processes for distinct contents are independent Poisson processes, with arrival rate $\lambda_i$ for content $i\in\mathcal{D}$. Denote $\Lambda=\sum_{i\in\mathcal{D}}\lambda_i$. Then the popularity (request probability) of content $i$ satisfies (Baccelli and Brémaud, 2013)

$$\rho_i=\frac{\lambda_i}{\Lambda},\quad i\in\mathcal{D}.$$
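The popularity formula follows from the superposition of independent Poisson processes. A minimal Python sketch, with illustrative rates that are not taken from the text, simulates one Poisson stream per content and compares each content's share of all requests with $\lambda_i/\Lambda$:

```python
import random

def empirical_popularity(rates, horizon, seed=0):
    """Simulate one independent Poisson request process per content over
    [0, horizon] and return the fraction of all requests for each content."""
    rng = random.Random(seed)
    counts = []
    for lam in rates:
        t, k = rng.expovariate(lam), 0   # time of the first request
        while t <= horizon:
            k += 1
            t += rng.expovariate(lam)    # exponential inter-request times
        counts.append(k)
    total = sum(counts)
    return [c / total for c in counts]

rates = [3.0, 2.0, 1.0]                  # illustrative lambda_i (assumption)
Lambda = sum(rates)                      # aggregate request rate
emp = empirical_popularity(rates, horizon=20_000.0)
theory = [lam / Lambda for lam in rates] # rho_i = lambda_i / Lambda
```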
3.1. TTL Policy for Individual Caches
Consider the cache at node $v$. Each content $i$ is associated with a timer $T_{iv}$ under the TTL cache policy. Since we focus on node $v$, we omit the subscript $v$ and write $T_i$. Consider the event when content $i$ is requested. There are two cases: (i) if content $i$ is not in the cache, content $i$ is inserted into the cache and its timer is set to $T_i$; (ii) if content $i$ is in the cache, its timer is reset to $T_i$. The timer decreases at a constant rate and the content is evicted once its timer expires.
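The two cases above can be sketched as a small TTL cache in Python. This is an illustrative sketch, not the authors' implementation: the class and function names are hypothetical, and expired contents are evicted lazily (an expired timer simply means the content is treated as absent). Because the timer is reset on every request, a request hits exactly when the time since the previous request for that content is below $T_i$, so under Poisson arrivals the simulated hit ratio should approach $1-e^{-\lambda_i T_i}$.

```python
import math
import random

class TTLCache:
    """A single TTL cache in which every content has its own timer."""

    def __init__(self, timers):
        self.timers = dict(timers)   # content -> timer duration T_i
        self.expiry = {}             # content -> absolute time its timer expires

    def contains(self, content, now):
        # Lazy eviction: a content whose timer has expired counts as absent.
        return self.expiry.get(content, -math.inf) > now

    def request(self, content, now):
        """Serve a request at time `now` and return True on a cache hit.

        Case (i), miss: the content is inserted and its timer set to T_i.
        Case (ii), hit: the content's timer is reset to T_i.
        """
        hit = self.contains(content, now)
        self.expiry[content] = now + self.timers[content]
        return hit


def simulated_hit_ratio(lam, ttl, num_requests, seed=1):
    """Fraction of Poisson(lam) requests for one content that hit the cache."""
    rng = random.Random(seed)
    cache = TTLCache({"c": ttl})
    now, hits = 0.0, 0
    for _ in range(num_requests):
        now += rng.expovariate(lam)
        hits += cache.request("c", now)
    return hits / num_requests
```

For example, with $\lambda_i=1$ and $T_i=1$, `simulated_hit_ratio(1.0, 1.0, 200_000)` should be close to $1-e^{-1}\approx 0.632$.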
3.2. Replication Strategy for Cache Networks
In a cache network, upon a cache hit, we need to specify how content is replicated along the reverse path towards the user that sent the request.
3.2.1. Content Request
The network serves requests for contents in $\mathcal{D}$ routed over the graph $G$. Any node in the network can generate a request for a content, which is forwarded along a fixed and unique path from the user towards a terminal node that is connected to a server that always contains the content. Note that the request need not reach the end of the path; it stops upon hitting a cache that stores the content. At that point, the requested content is propagated over the path in the reverse direction to the node that requested it.
To be more specific, a request is determined by the node, $v$, that generated the request, the requested content, $i$, and the path, $p$, over which the request is routed. We denote a path $p$ of length $|p|$ as a sequence of nodes $(p_1,p_2,\ldots,p_{|p|})$ such that $(p_l,p_{l+1})\in E$ for $l\in\{1,\ldots,|p|-1\}$, where $p_1=v$. We assume that path $p$ is loop-free and that the terminal node $p_{|p|}$ is the only node on path $p$ that accesses the server for content $i$.
3.2.2. Replication Strategy
We consider TTL cache policies at every node in the cache network, where each content has its own timer. Suppose content $i$ is requested and routed along path $p$. There are two cases: (i) content $i$ is not in any cache along path $p$, in which case content $i$ is fetched from the server and inserted into the first cache on the path (denoted by cache 1; since we consider a single path $p$, for simplicity we drop the path-related subscripts and denote its nodes directly as $1,\ldots,|p|$), and its timer is set to $T_{i1}$; (ii) content $i$ is in cache $l$ along path $p$, in which case we consider the following strategies (Rodríguez et al., 2016):
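Move Copy Down (MCD): content $i$ is moved to cache $l+1$, the cache preceding cache $l$ in which $i$ is found, and the timer at cache $l+1$ is set to $T_{i(l+1)}$ (if $l=|p|$, content $i$ stays in cache $|p|$ and its timer is reset to $T_{i|p|}$). Content $i$ is discarded once its timer expires;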
Move Copy Down with Push (MCDP): MCDP behaves the same as MCD upon a cache hit. However, if timer $T_{il}$ expires, content $i$ is pushed one cache back to cache $l-1$ and its timer is set to $T_{i(l-1)}$ (if $l=1$, content $i$ leaves the cache network).
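The two replication strategies differ only in what happens when a timer expires. A sketch of the index bookkeeping for one content on a path, assuming the Section 4 convention that caches are labeled $1,\ldots,|p|$, content enters at cache 1 and is promoted to higher indices on hits, and index 0 means the content is held only by the server; the function names are hypothetical, and restarting the timer of the new cache is left to the caller:

```python
def after_request(level, num_caches):
    """Cache index of content i right after a request.

    `level` is the cache holding i when the request arrives (0 = only the
    server has it). A miss inserts i into cache 1; a hit in cache l moves it
    to cache l + 1; a hit in the last cache keeps it there. MCD and MCDP
    behave identically on requests.
    """
    return min(level + 1, num_caches)


def after_expiry(level, policy):
    """Cache index of content i after its timer in cache `level` (>= 1) expires."""
    if level < 1:
        raise ValueError("only a cached content has a running timer")
    if policy == "MCD":
        return 0              # discarded from the network
    if policy == "MCDP":
        return level - 1      # pushed one cache back; 0 means it left the network
    raise ValueError(f"unknown policy: {policy}")
```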
3.3. Utility Function
Utility functions capture the satisfaction perceived by a user after being served a content. We associate each content $i\in\mathcal{D}$ with a utility function $U_i(h_i)$ of its hit probability $h_i$. $U_i(\cdot)$ is assumed to be increasing, continuously differentiable, and strictly concave. In particular, for our numerical studies, we focus on the widely used $\alpha$-fair utility functions (Srikant and Ying, 2013), given by

$$U_i(h)=\begin{cases}w_i\dfrac{h^{1-\alpha}}{1-\alpha}, & \alpha>0,\ \alpha\neq 1,\\[4pt] w_i\log h, & \alpha=1,\end{cases}$$

where $w_i>0$ denotes a weight associated with content $i$.
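For reference, a direct Python transcription of the $\alpha$-fair family; the function names are illustrative. The derivative $U_i'(h)=w_i h^{-\alpha}$ takes the same form for every $\alpha>0$, including the logarithmic case, which is convenient for gradient-based methods:

```python
import math

def alpha_fair_utility(h, w=1.0, alpha=1.0):
    """alpha-fair utility of a hit probability h > 0 with weight w > 0."""
    if h <= 0:
        raise ValueError("hit probability must be positive")
    if alpha == 1.0:
        return w * math.log(h)                        # logarithmic case
    return w * h ** (1.0 - alpha) / (1.0 - alpha)


def alpha_fair_marginal(h, w=1.0, alpha=1.0):
    """Derivative U'(h) = w * h**(-alpha), valid for every alpha > 0."""
    return w * h ** (-alpha)
```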
4. Linear Cache Network
We begin with a linear cache network, i.e., there is a single path between the user and the server, composed of caches labeled $1,\ldots,|p|$. A content enters the cache network via cache 1 and is promoted to a higher index cache whenever a cache hit occurs. In the following, we consider the MCDP and MCD replication strategies when each cache operates with a TTL policy.
4.1. Stationary Behavior
(Gast and Houdt, 2016) considered two caching policies, LRU($m$) and $h$-LRU. Although these policies differ from MCDP and MCD, respectively, their stationary analyses are similar. We present our results here for completeness; they are used in the remainder of the paper.
Requests for content $i$ arrive according to a Poisson process with rate $\lambda_i$. Under TTL, content $i$ spends a deterministic time in a cache if it is not requested, independent of all other contents. We denote the timer of content $i$ in cache $l$ on the path $p$ as $T_{il}$, where $l\in\{1,\ldots,|p|\}$.
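Denote by $t_k^i$ the $k$-th time that content $i$ is either requested or its timer expires. For simplicity, we assume that content $i$ is in cache 0 (i.e., the server) when it is not in the cache network. We can then define a discrete time Markov chain (DTMC) $\{X_k^i\}_{k\ge 0}$ with $|p|+1$ states, where $X_k^i$ is the index of the cache that content $i$ is in at time $t_k^i$. The event that the time between two requests for content $i$ exceeds $T_{il}$ occurs with probability $e^{-\lambda_i T_{il}}$; consequently, we obtain the transition probability matrix of $\{X_k^i\}$ and compute its stationary distribution. Details can be found in Appendix 8.1.1. The time-average probability that content $i$ is in cache $l\in\{1,\ldots,|p|\}$ is

$$h_{il}=\frac{\prod_{j=1}^{l}\left(e^{\lambda_i T_{ij}}-1\right)}{1+\sum_{k=1}^{|p|}\prod_{j=1}^{k}\left(e^{\lambda_i T_{ij}}-1\right)},\tag{3}$$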
where $h_{il}$ is also the hit probability of content $i$ at cache $l$.
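As a check on (3), a minimal sketch (hypothetical function names, illustrative parameters) evaluates the stationary probabilities of one content under MCDP and compares them with the fraction of time the content spends in each cache in a direct event-driven simulation. The simulation relies on memorylessness: after every request or timer expiry, the time to the next request is a fresh exponential draw.

```python
import math
import random

def mcdp_hit_probs(lam, timers):
    """Stationary probabilities [h_0, h_1, ..., h_L] for one content under MCDP.

    Implements (3): h_l is proportional to prod_{j<=l} (exp(lam*T_j) - 1),
    and h_0 (content not in the network) is proportional to 1.
    """
    weights, prod = [1.0], 1.0
    for t in timers:
        prod *= math.expm1(lam * t)      # exp(lam*t) - 1
        weights.append(prod)
    total = sum(weights)
    return [x / total for x in weights]


def mcdp_time_average(lam, timers, num_events, seed=2):
    """Fraction of time the content spends in each cache, by simulation."""
    rng = random.Random(seed)
    top = len(timers)
    time_in = [0.0] * (top + 1)
    level = 0                             # start outside the network
    for _ in range(num_events):
        gap = rng.expovariate(lam)        # time until the next request
        if level == 0:                    # miss: wait for a request, insert in cache 1
            time_in[0] += gap
            level = 1
        elif gap < timers[level - 1]:     # request before expiry: hit, move up
            time_in[level] += gap
            level = min(level + 1, top)
        else:                             # timer expires first: push one cache back
            time_in[level] += timers[level - 1]
            level -= 1
    total = sum(time_in)
    return [x / total for x in time_in]
```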
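Again, under TTL, content $i$ spends a deterministic time in cache $l$ if it is not requested, independent of all other contents. We define a DTMC $\{Y_k^i\}_{k\ge 0}$ by observing the system at the times that content $i$ is requested, where $Y_k^i$ is the index of the cache in which the $k$-th request finds content $i$. Similar to MCDP, if content $i$ is not in the cache network, it is in cache 0; thus we still have $|p|+1$ states. If $Y_k^i=l$, then content $i$ is placed in cache $l'=\min\{l+1,|p|\}$ with timer $T_{il'}$; the next request for content $i$ comes within time $T_{il'}$ with probability $1-e^{-\lambda_i T_{il'}}$, in which case $Y_{k+1}^i=l'$, and otherwise $Y_{k+1}^i=0$ due to the MCD policy. We can obtain the transition probability matrix of $\{Y_k^i\}$ and compute its stationary distribution; details are available in Appendix 8.1.2.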
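By the PASTA property (Meyn and Tweedie, 2012), it follows that the stationary probability that content $i$ is in cache $l$ is

$$h_{il}=\begin{cases}h_{i0}\prod_{j=1}^{l}\left(1-e^{-\lambda_i T_{ij}}\right), & l=1,\ldots,|p|-1,\\[4pt] h_{i0}\left(e^{\lambda_i T_{i|p|}}-1\right)\prod_{j=1}^{|p|-1}\left(1-e^{-\lambda_i T_{ij}}\right), & l=|p|,\end{cases}\tag{4}$$

where $h_{i0}=1-\sum_{l=1}^{|p|}h_{il}$ is the probability that content $i$ is not in the cache network.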
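Similarly, a sketch for (4) under MCD that observes the content at request epochs, as in the DTMC above; by PASTA, the fraction of requests that find the content in cache $l$ should match $h_{il}$. Names and parameters are again illustrative. With a single cache, (4) reduces to $1-e^{-\lambda_i T_{i1}}$, the hit probability of a single TTL cache.

```python
import math
import random

def mcd_hit_probs(lam, timers):
    """Stationary probabilities [h_0, h_1, ..., h_L] for one content under MCD, per (4)."""
    L = len(timers)
    q = [-math.expm1(-lam * t) for t in timers]   # 1 - exp(-lam*T_j)
    weights, prod = [1.0], 1.0
    for l in range(1, L):                         # caches 1, ..., L-1
        prod *= q[l - 1]
        weights.append(prod)
    weights.append(prod * math.expm1(lam * timers[-1]))   # cache L
    total = sum(weights)
    return [x / total for x in weights]


def mcd_request_frequencies(lam, timers, num_requests, seed=3):
    """Empirical fraction of requests that find the content in each cache."""
    rng = random.Random(seed)
    L = len(timers)
    counts = [0] * (L + 1)
    found = 0                                     # first request is a miss
    for _ in range(num_requests):
        counts[found] += 1
        placed = min(found + 1, L)                # miss -> cache 1, hit in l -> l+1
        gap = rng.expovariate(lam)                # time to the next request
        found = placed if gap < timers[placed - 1] else 0
    return [c / num_requests for c in counts]
```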
4.2. From Timer to Hit Probability
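We consider a TTL cache network where requests for different contents are independent of each other and each content $i$ is associated with a timer $T_{il}$ at each cache $l\in\{1,\ldots,|p|\}$ on the path. Denote $\mathbf{h}_i=(h_{i1},\ldots,h_{i|p|})$ and $\mathbf{T}_i=(T_{i1},\ldots,T_{i|p|})$. From (3) and (4), the overall utility in the linear network is given as

$$\sum_{i\in\mathcal{D}}\sum_{l=1}^{|p|}\psi^{|p|-l}\,U_i(h_{il}),$$

where $0<\psi\le 1$ is a discount factor capturing the utility degradation along the request’s routing direction. Since each cache is finite in size, we have the following capacity constraint:

$$\sum_{i\in\mathcal{D}}h_{il}\le B_l,\quad l=1,\ldots,|p|.$$

Therefore, the optimal TTL policy for content placement in the linear network is the solution of the following optimization problem:

$$\begin{aligned}\max_{\mathbf{T}_i,\ i\in\mathcal{D}}\quad&\sum_{i\in\mathcal{D}}\sum_{l=1}^{|p|}\psi^{|p|-l}\,U_i(h_{il})\\ \text{s.t.}\quad&\sum_{i\in\mathcal{D}}h_{il}\le B_l,\quad l=1,\ldots,|p|,\\ &T_{il}\ge 0,\quad i\in\mathcal{D},\ l=1,\ldots,|p|,\end{aligned}$$

where $h_{il}$ is given by (3) and (4) for MCDP and MCD, respectively. However, this is a non-convex optimization problem with a non-linear constraint. Our objective is to characterize the optimal timers for different contents across the network. To that end, it is helpful to express the problem in terms of hit probabilities. In the following, we discuss how to change the variables from timers to hit probabilities for MCDP and MCD, respectively.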
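The change of variables is possible for MCDP because (3) can be inverted: dividing consecutive terms gives $h_{il}/h_{i(l-1)}=e^{\lambda_i T_{il}}-1$ with $h_{i0}=1-\sum_{l}h_{il}$, so $T_{il}=\lambda_i^{-1}\log\big(1+h_{il}/h_{i(l-1)}\big)$ whenever all $h_{il}>0$ and $\sum_l h_{il}<1$. A sketch under those assumptions, with hypothetical helper names; it also evaluates the discounted objective $\sum_i\sum_l\psi^{|p|-l}U_i(h_{il})$ and the expected-occupancy capacity check:

```python
import math

def mcdp_hits_from_timers(lam, timers):
    """(h_1, ..., h_L) for one content from its timers, per (3)."""
    weights, prod = [], 1.0
    for t in timers:
        prod *= math.expm1(lam * t)          # exp(lam*t) - 1
        weights.append(prod)
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights]


def mcdp_timers_from_hits(lam, hits):
    """Invert (3): T_l = log(1 + h_l / h_{l-1}) / lam, with h_0 = 1 - sum(hits)."""
    h_prev = 1.0 - sum(hits)
    if h_prev <= 0 or min(hits) <= 0:
        raise ValueError("need every h_l > 0 and sum(h) < 1")
    timers = []
    for h in hits:
        timers.append(math.log1p(h / h_prev) / lam)
        h_prev = h
    return timers


def aggregate_utility(hits, psi, utility):
    """sum_i sum_l psi**(L - l) * U_i(h_il), where hits[i][l - 1] is h_il."""
    L = len(hits[0])
    return sum(psi ** (L - l) * utility(i, row[l - 1])
               for i, row in enumerate(hits) for l in range(1, L + 1))


def capacity_ok(hits, capacities):
    """Expected-occupancy constraint: sum_i h_il <= B_l for every cache l."""
    return all(sum(row[l] for row in hits) <= b for l, b in enumerate(capacities))
```

Optimizing directly over hit probabilities and mapping the result back to timers with `mcdp_timers_from_hits` is the idea behind the convex reformulation; the sketch does not cover MCD.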
4.3. Maximizing Aggregate Utility
Proposition 1.
The optimization problem defined in (13) under MCDP has a unique global optimum.
Proposition 2.
The optimization problem defined in (14) under MCD has a unique global optimum.
4.3.3. Online Algorithm
In Sections 4.3.1 and 4.3.2, we formulated convex utility maximization problems with a fixed cache size. However, system parameters (e.g. cache size and request processes) can change over time, so it is not feasible to solve the optimization offline and implement the optimal strategy. Thus, we need to design online algorithms to implement the optimal strategy and adapt to the changes in the presence of limited information. In the following, we develop such an algorithm for MCDP. A similar algorithm exists for MCD and is omitted due to space constraints.
Primal Algorithm: We aim to design an algorithm based on the optimization problem in (13), which is the primal formulation.
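Denote $\mathbf{h}=(h_{il})_{i\in\mathcal{D},\,l\in\{1,\ldots,|p|\}}$ and $\mathbf{B}=(B_1,\ldots,B_{|p|})$. We first define the following objective function

$$Z(\mathbf{h})=\sum_{i\in\mathcal{D}}\sum_{l=1}^{|p|}\psi^{|p|-l}U_i(h_{il})-\sum_{l=1}^{|p|}C_l\Big(\sum_{i\in\mathcal{D}}h_{il}-B_l\Big)-\sum_{i\in\mathcal{D}}\tilde{C}_i\Big(\sum_{l=1}^{|p|}h_{il}-1\Big),$$

where $C_l(\cdot)$ and $\tilde{C}_i(\cdot)$ are convex and non-decreasing penalty functions denoting the cost for violating constraints (13b) and (13c). Since $U_i(\cdot)$ is strictly concave and the penalty functions are convex, $Z(\mathbf{h})$ is strictly concave. Hence, a natural way to obtain the maximum of $Z(\mathbf{h})$ is to use the standard gradient ascent algorithm to move the variable $h_{il}$, for $i\in\mathcal{D}$ and $l\in\{1,\ldots,|p|\}$, in the direction of the gradient, given as

$$\frac{\partial Z(\mathbf{h})}{\partial h_{il}}=\psi^{|p|-l}U_i'(h_{il})-C_l'\Big(\sum_{j\in\mathcal{D}}h_{jl}-B_l\Big)-\tilde{C}_i'\Big(\sum_{m=1}^{|p|}h_{im}-1\Big),$$

where $U_i'(\cdot)$, $C_l'(\cdot)$ and $\tilde{C}_i'(\cdot)$ denote the derivatives of these functions with respect to their arguments.
Since $h_{il}$ indicates the probability that content $i$ is in cache $l$, $\sum_{i\in\mathcal{D}}h_{il}$ is the expected number of contents currently in cache $l$, denoted by $B_{\mathrm{curr},l}$.
Therefore, the primal algorithm for MCDP is given by

$$h_{il}[k+1]=h_{il}[k]+\delta_{il}\Big[\psi^{|p|-l}U_i'(h_{il}[k])-C_l'\big(B_{\mathrm{curr},l}-B_l\big)-\tilde{C}_i'\Big(\sum_{m=1}^{|p|}h_{im}[k]-1\Big)\Big],\tag{17}$$

where $\delta_{il}>0$ is the step-size parameter and $k$ is the iteration number, incremented upon each request arrival.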
Theorem 4.1.
The primal algorithm given in (17) converges to the optimal solution given a sufficiently small step-size parameter $\delta_{il}$.
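Proof. Since $U_i(\cdot)$ is strictly concave and $C_l(\cdot)$ and $\tilde{C}_i(\cdot)$ are convex, $Z(\mathbf{h})$ is strictly concave; hence there exists a unique maximizer, which we denote by $\mathbf{h}^*$. Define the following function

$$Y(\mathbf{h})=Z(\mathbf{h}^*)-Z(\mathbf{h}),$$

then it is clear that $Y(\mathbf{h})\ge 0$ for any feasible $\mathbf{h}$ that satisfies the constraints in the original optimization problem, and $Y(\mathbf{h})=0$ if and only if $\mathbf{h}=\mathbf{h}^*$.
We prove that $Y(\mathbf{h})$ is a Lyapunov function, and hence the above primal algorithm converges to the optimum. Details are available in Appendix 8.4. ∎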
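A deterministic sketch of the gradient ascent behind (17) on a small instance. It makes several assumptions that are not in the text: log utilities $U_i(h)=w_i\log h$, quadratic penalties $C(x)=\tfrac{\kappa}{2}\max\{0,x\}^2$ for both the capacity and the per-content constraints, full gradient steps instead of one update per request arrival, and a projection of each $h_{il}$ onto $[\epsilon,1]$. With a finite penalty weight $\kappa$ the constraints are only approximately enforced, so the expected occupancies settle slightly above $B_l$.

```python
def primal_gradient_ascent(weights, capacities, psi, step=1e-3, kappa=100.0,
                           iters=5000, h_init=0.1, floor=1e-6):
    """Gradient ascent on the penalized MCDP objective (illustrative sketch).

    Assumed: U_i(h) = w_i * log(h) and quadratic penalties
    C(x) = kappa/2 * max(0, x)**2 for sum_i h_il <= B_l and sum_l h_il <= 1.
    Index l is 0-based here, so cache l + 1 gets weight psi**(L - 1 - l).
    """
    n, L = len(weights), len(capacities)
    h = [[h_init] * L for _ in range(n)]
    for _ in range(iters):
        occupancy = [sum(h[i][l] for i in range(n)) for l in range(L)]  # B_curr,l
        content_mass = [sum(row) for row in h]                         # sum_l h_il
        new_h = []
        for i in range(n):
            row = []
            for l in range(L):
                grad = (psi ** (L - 1 - l) * weights[i] / h[i][l]        # psi^{|p|-l} U_i'
                        - kappa * max(0.0, occupancy[l] - capacities[l]) # C_l'
                        - kappa * max(0.0, content_mass[i] - 1.0))       # C~_i'
                row.append(min(1.0, max(floor, h[i][l] + step * grad)))
            new_h.append(row)
        h = new_h
    return h
```

For instance, `primal_gradient_ascent([4.0, 3.0, 2.0, 1.0], [1.0, 1.0], psi=0.5)` gives the more popular contents larger hit probabilities in the cache closest to the user.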
4.3.4. Model Validations and Insights
In this section, we validate our analytical results with simulations for MCDP. We consider a linear three-node cache network with cache capacities , The total number of unique contents considered in the system is We consider the Zipf popularity distribution with parameter . W.l.o.g., we consider a log utility function, and discount factor W.l.o.g., we assume that requests arrive according to a Poisson process with aggregate request rate
We first solve the optimization problem (13) using a Matlab routine fmincon. Then we implement our primal algorithm given in (17), where we take the following penalty functions (Srikant and Ying, 2013) and .
From Figure 3, we observe that our algorithm yields the exact optimal and empirical hit probabilities under MCDP. Figure 3 shows the probability density for the number of contents in the cache network. (The constraint (13b) in problem (13) is on the average cache occupancy. However, it can be shown that if $n\to\infty$ and $B_l$ grows in a sub-linear manner, the probability of violating the target cache size becomes negligible (Dehghan et al., 2016).) As expected, the densities are concentrated around the corresponding cache sizes.
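The intuition behind the occupancy remark can be made concrete. Since TTL evictions are decoupled and request processes are independent, at any instant each content is in cache $l$ independently with probability $h_{il}$, so the cache occupancy is Poisson-binomial with mean $\sum_i h_{il}$ and standard deviation $\sqrt{\sum_i h_{il}(1-h_{il})}\le\sqrt{\sum_i h_{il}}$. The sketch below (hypothetical names, illustrative probabilities) computes this distribution exactly by dynamic programming; relative fluctuations shrink as the mean occupancy grows.

```python
import math

def occupancy_distribution(probs):
    """Exact distribution of the number of contents in one cache.

    Assumes content i is present independently with probability probs[i],
    so the occupancy is Poisson-binomial; computed by dynamic programming.
    """
    dist = [1.0]                          # dist[k] = P(k contents present so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)    # this content absent
            nxt[k + 1] += mass * p        # this content present
        dist = nxt
    return dist


def occupancy_summary(probs):
    """Mean and standard deviation of the occupancy."""
    mean = sum(probs)
    std = math.sqrt(sum(p * (1.0 - p) for p in probs))
    return mean, std
```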
We further characterize the impact of the discount factor $\psi$ on performance by considering different values of $\psi$; Figure 3 shows the result for one such value. First, we observe that as $\psi$ decreases, i.e., as cache hits at lower index caches become less valuable, the most popular contents are likely to be cached in higher index caches (i.e., cache 3) and the least popular contents in lower index caches (cache 1). This provides insight into the design of hierarchical caches: in a linear cache network, a content enters the network via the first cache and only advances to a higher index cache upon a cache hit. Under a stationary request process (e.g., a Poisson process), only popular contents will be promoted to higher index caches, which is consistent with what we observe in Figure 3. A similar phenomenon has been observed in (Gast and Houdt, 2016; Li et al., 2018b) through numerical studies, while we characterize it through utility optimization. Second, we see that as $\psi$ increases, the performance difference between different caches decreases, and the caches become identical when $\psi=1$. This is because as $\psi$ increases, the utility degradation for cache hits at lower index caches decreases, and there is no difference between caches when $\psi=1$. Due to space constraints, the results for other values of $\psi$ are given in Appendix 8.2.
We also compare our proposed scheme to replication strategies with LRU, LFU, FIFO and Random (RR) eviction policies. In a cache network, upon a cache hit, the requested content is usually replicated back into the network. There are three mechanisms in the literature, leave-copy-everywhere (LCE), leave-copy-probabilistically (LCP) and leave-copy-down (LCD), which differ in how the requested content is replicated along the reverse path. Due to space constraints, we refer interested readers to (Garetto et al., 2016) for detailed explanations of these mechanisms. Furthermore, according to (Garetto et al., 2016), LCD significantly outperforms LCE and LCP. Hence, we only consider LCD here.
Figure 4 compares the performance of different eviction policies with the LCD replication strategy to our algorithm under MCDP for a three-node linear cache network. We plot the aggregate utilities achieved by these policies, normalized to the optimal aggregate utility under MCDP. We observe that MCDP significantly outperforms all the other eviction policies with LCD replication. Finally, we also considered a larger linear cache network, at a higher simulation cost, and again observed large gains of MCDP over the other eviction policies with LCD; these results are omitted due to space constraints.
4.4. Minimizing Overall Costs
In Section 4.3, we focused on maximizing the sum of utilities across all contents over the cache network, which captures user satisfaction. However, communication costs for content transfers across the network are also critical in many network applications. These costs include (i) the search cost for finding the requested content in the network; (ii) the fetch cost for serving the content to the user; and (iii) the transfer cost for internal cache management due to a cache hit or miss.
4.4.1. Search and Fetch Cost
A request is sent along a path until it hits a cache that stores the requested content. We define the search cost (fetch cost) as the cost of finding (serving) the requested content in the cache network (to the user). Consider the search cost as a function $c_s(\cdot)$ of the hit probabilities. Then the expected search cost across the network is given as
The fetch cost has a similar expression, with a fetch cost function $c_f(\cdot)$ replacing $c_s(\cdot)$.
4.4.2. Transfer Cost
Under TTL, upon a cache hit, the content either transfers to a higher index cache or stays in the current one, and upon a cache miss, the content either transfers to a lower index cache (MCDP) or is discarded from the network (MCD). We define the transfer cost as the cost due to cache management upon a cache hit or miss. Consider this cost as a function $c_m(\cdot)$ of the hit probabilities.
MCD: Under MCD, since content is discarded from the network once its timer expires, transfer costs are incurred only at cache hits. In that case, the requested content either transfers to a higher index cache, if it was in cache $l\in\{1,\ldots,|p|-1\}$, or stays in the same cache, if it was in cache $|p|$. Then the expected transfer cost across the network for MCD is given as
MCDP: Under MCDP, a transfer occurs upon every content request or timer expiry except in two cases: (i) content $i$ is in cache 1 and its timer expires; and (ii) content $i$ is in cache $|p|$ and a cache hit (request) occurs. Then the transfer cost for content $i$ at steady state is
where means there is a transfer cost for content at the -th request or timer expiry, is the average time content spends in cache
Therefore, the transfer cost for MCDP is
Remark 1.
The expected transfer cost (22) is a function of the timer values. Unlike in the problem of maximizing the sum of utilities, it is difficult to express it as a function of hit probabilities.
Our goal is to determine optimal timer values at each cache in a linear cache network so that the total costs are minimized. To that end, we formulate the following optimization problem for MCDP
A similar optimization problem can be formulated for MCD and is omitted here due to space constraints.
Remark 2.
As discussed in Remark 1, we cannot express the transfer cost of MCDP (22) in terms of hit probabilities; hence, we cannot transform the optimization problem (23) for MCDP into a convex one through a change of variables as we did in Section 4.3. Solving the non-convex optimization problem (23) is a subject of future work. However, we note that the transfer cost of MCD (20) is simply a function of hit probabilities, and the corresponding optimization problem is convex as long as the cost functions are convex.
5. General Cache Networks
In Section 4, we considered linear cache networks and characterized the optimal TTL policy for content when coupled with MCDP and MCD. In this section, we use these results and insights to extend our analysis to general cache networks.
5.1. Contents, Servers and Requests
Consider the general cache network described in Section 3. Denote by $\mathcal{R}$ the set of all requests, and by $\mathcal{R}^i$ the set of requests for content $i$. Suppose a cache in node $v$ serves two requests $(v_1,i_1,p_1)$ and $(v_2,i_2,p_2)$; then there are two cases: (i) non-common requested content, i.e., $i_1\neq i_2$; and (ii) common requested content, i.e., $i_1=i_2$.
5.2. Non-common Requested Content
We first consider the case in which the requests passing through each node are for different contents. Since there is then no coupling between different requests, we can directly generalize the results for linear cache networks in Section 4. Hence, given the utility maximization formulation in (13), we can directly formulate the optimization problem for MCDP as
Proposition 3.
Since the feasible sets are convex and the objective function is strictly concave and continuous, the optimization problem defined in (24) under MCDP has a unique global optimum.
We can similarly formulate a utility maximization optimization problem for MCD for a general cache network. This can be found in Appendix 8.5.1.
5.2.1. Model Validations and Insights
We consider a seven-node binary tree network, shown in Figure 7 with node set . There exist four paths and Each leaf node serves requests for distinct contents, and cache size is for Assume that the content follows a Zipf distribution with parameter and respectively. We consider the log utility function where is the request arrival rate for content on path and requests are described by a Poisson process with for The discount factor .
Figures 7 and 7 show results for path From Figure 7, we observe that our algorithm yields the exact optimal and empirical hit probabilities under MCDP. Figure 7 shows the probability density for the number of contents in the cache network. As expected, the density is concentrated around their corresponding cache sizes. Similar trends exist for paths , and , hence are omitted here.
5.3. Common Requested Contents
Now consider the case where different users share the same content, e.g., there are two requests $(v_1,i,p_1)$ and $(v_2,i,p_2)$. Suppose that cache $v$ is on both paths $p_1$ and $p_2$, where $v_1$ and $v_2$ request the same content $i$. If we cache separate copies on each path, results from the previous section apply. However, maintaining redundant copies in the same cache decreases efficiency. A simple way to deal with this is to cache only one copy of content $i$ at $v$ to serve both requests from $v_1$ and $v_2$. Though this reduces redundancy, it complicates the optimization problem.
In the following, we formulate a utility maximization problem for MCDP with TTL caches, where all users share the same requested contents
where (25b) ensures that only one copy of content $i\in\mathcal{D}$ is cached at node $v$ for all paths that pass through node $v$. This is because the term inside the outer summation of (25b) is the overall hit probability of content $i$ at node $v$ over all paths. (25c) is the cache capacity constraint, and (25d) is the constraint from the MCDP TTL cache policy, as discussed in Section 4.2.
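Example 5.1.
Consider two requests $(v_1,i,p_1)$ and $(v_2,i,p_2)$ whose paths $p_1$ and $p_2$ intersect at node $v$. Denote the corresponding path-perspective hit probabilities as $h_{iv}^{(p_1)}$ and $h_{iv}^{(p_2)}$, respectively. Then the term inside the outer summation of (25b) is $1-\big(1-h_{iv}^{(p_1)}\big)\big(1-h_{iv}^{(p_2)}\big)$, i.e., the hit probability of content $i$ at node $v$.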
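Under the independence assumption stated in Remark 3, the per-node hit probability of a shared content is one minus the probability that no path holds it there. A short sketch with hypothetical names, which also compares the expected occupancy of node $v$ with one shared copy per content against separate per-path copies:

```python
import math

def shared_hit_probability(path_hit_probs):
    """Probability that node v holds content i when each path keeps it there
    independently with probability h_iv^(p): 1 - prod_p (1 - h_iv^(p))."""
    return 1.0 - math.prod(1.0 - h for h in path_hit_probs)


def node_occupancy(per_content_path_probs, shared=True):
    """Expected number of contents stored at node v.

    per_content_path_probs[i] lists h_iv^(p) over the paths p through v.
    With shared=False, every path keeps its own (redundant) copy.
    """
    if shared:
        return sum(shared_hit_probability(hs) for hs in per_content_path_probs)
    return sum(sum(hs) for hs in per_content_path_probs)
```

For example, two paths that each hold content $i$ at $v$ with probability 0.5 give a shared hit probability of 0.75, whereas separate copies would occupy one slot in expectation.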
Remark 3 ().
Note that we assume independence between different requests in (25); e.g., in Example 5.1, if the insertion of content $i$ at node $v$ is caused by request $(v_1,i,p_1)$, then a subsequent request $(v_2,i,p_2)$ that finds content $i$ there is not counted as a cache hit from its perspective. Our framework still holds if we follow the logical TTL MCDP on linear cache networks; however, in that case, the utilities will be larger than those we consider here.
Similarly, we can formulate a utility maximization optimization problem for MCD. This can be found in Appendix 8.5.2.
Proposition 4.
Since the feasible sets are non-convex, the optimization problem defined in (25) under MCDP is a non-convex optimization problem.
In the following, we develop an optimization framework that handles the non-convexity issue in this optimization problem and provides a distributed solution. To this end, we first introduce the Lagrangian function
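where the Lagrangian multipliers (price vector and price matrix) are $\boldsymbol{\eta}=(\eta_v)_{v\in V}$ and $\boldsymbol{\mu}=(\mu_{iv})_{i\in\mathcal{D},\,v\in V}$.
The dual function can be defined as

$$D(\boldsymbol{\eta},\boldsymbol{\mu})=\sup_{\mathbf{h}}\,L(\mathbf{h},\boldsymbol{\eta},\boldsymbol{\mu}),$$

and the dual problem is given as

$$\min_{\boldsymbol{\eta}\ge\mathbf{0},\,\boldsymbol{\mu}\ge\mathbf{0}}\ D(\boldsymbol{\eta},\boldsymbol{\mu}),$$

where the constraint is defined pointwise for $\eta_v$ and $\mu_{iv}$, and $\mathbf{h}^*(\boldsymbol{\eta},\boldsymbol{\mu})$ is a function that maximizes the Lagrangian function for given $(\boldsymbol{\eta},\boldsymbol{\mu})$, i.e.,

$$\mathbf{h}^*(\boldsymbol{\eta},\boldsymbol{\mu})=\arg\max_{\mathbf{h}}\,L(\mathbf{h},\boldsymbol{\eta},\boldsymbol{\mu}).\tag{29}$$

The dual function $D(\boldsymbol{\eta},\boldsymbol{\mu})$ is always convex in $(\boldsymbol{\eta},\boldsymbol{\mu})$, regardless of the concavity of the optimization problem (25) (Boyd and Vandenberghe, 2004). Therefore, it is always possible to solve the dual problem iteratively using

$$\eta_v[k+1]=\Big[\eta_v[k]-\gamma_\eta\frac{\partial D}{\partial\eta_v}\Big]^+,\qquad \mu_{iv}[k+1]=\Big[\mu_{iv}[k]-\gamma_\mu\frac{\partial D}{\partial\mu_{iv}}\Big]^+,\tag{30}$$

where $\gamma_\eta$ and $\gamma_\mu$ are the step sizes, $[x]^+=\max\{0,x\}$, and $\partial D/\partial\eta_v$ and $\partial D/\partial\mu_{iv}$ are the partial derivatives of $D(\boldsymbol{\eta},\boldsymbol{\mu})$ with respect to $\eta_v$ and $\mu_{iv}$, respectively.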
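The mechanics of the primal step (maximizing the Lagrangian for fixed prices) and the projected dual step can be seen on a much simpler, convex toy problem than (25): a single cache with log utilities, $\max\sum_i w_i\log h_i$ subject to $\sum_i h_i\le B$ and $0<h_i\le 1$. This toy instance and the step size are assumptions for illustration only. For a fixed price $\eta$, the Lagrangian is maximized by $h_i=\min\{1,w_i/\eta\}$, and the derivative of the dual function is $B-\sum_i h_i$:

```python
def dual_gradient_toy(weights, capacity, step=1.0, iters=200, eta=1.0):
    """Projected dual gradient descent for the toy single-cache problem

        maximize sum_i w_i * log(h_i)  s.t.  sum_i h_i <= capacity, 0 < h_i <= 1.
    """
    for _ in range(iters):
        h = [min(1.0, w / eta) for w in weights]   # primal step: argmax of the Lagrangian
        grad = capacity - sum(h)                   # dD/d(eta)
        eta = max(1e-9, eta - step * grad)         # dual step, kept (strictly) positive
    h = [min(1.0, w / eta) for w in weights]
    return eta, h
```

For this toy problem the optimum is $h_i=w_iB/\sum_j w_j$ when no $h_i$ reaches 1, i.e., $\eta^*=\sum_j w_j/B$; with weights $(3,2,1)$ and $B=1.5$ the iteration should settle at $\eta^*=4$ and $h=(0.75,0.5,0.25)$.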
Sufficient and necessary conditions for the uniqueness of $\mathbf{h}^*(\boldsymbol{\eta},\boldsymbol{\mu})$ are given in (Kyparisis, 1985). The convergence of the primal-dual algorithm consisting of (29) and (30) is guaranteed if the original optimization problem is convex. However, our problem is not convex. Nevertheless, in the following we show that the duality gap is zero; hence (29) and (30) converge to the globally optimal solution. To begin with, we introduce the following results.
Theorem 5.2.
Theorem 5.3.
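Taking the derivative of the Lagrangian $L(\mathbf{h},\boldsymbol{\eta},\boldsymbol{\mu})$ with respect to $h_{iv}^{(p)}$ for $i\in\mathcal{D}$ and $v\in p$, we have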
Setting (32) equal to zero, we obtain
Consider the log utility function $U_i(h)=w_i\log h$; then $U_i'(h)=w_i/h$. Hence, from (33), we have
Lemma 5.4.
From Lemma 5.4, we know that the feasible region for the Lagrangian multipliers satisfies
Theorem 5.5.
The hit probability given in (34) is continuous in $\eta_v$ and $\mu_{iv}$ for all $i\in\mathcal{D}$ and $v\in V$ in the feasible region of the Lagrangian multipliers given by Lemma 5.4.
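Proof. From Lemma 5.4, we know that at least one of $\eta_v$ and $\mu_{iv}$ is non-zero for all $i\in\mathcal{D}$ and $v\in V$. Hence there are three cases: (i) $\eta_v=0$ and $\mu_{iv}\neq 0$; (ii) $\eta_v\neq 0$ and $\mu_{iv}=0$; and (iii) $\eta_v\neq 0$ and $\mu_{iv}\neq 0$.
For case (i), we have
which is clearly continuous in $\mu_{iv}$ for all $i$ and $v$.
Similarly, for case (ii), we have
which is also clearly continuous in $\eta_v$ for all $i$ and $v$.
For case (iii), it follows directly from (34) that the hit probability is continuous in $\eta_v$ and $\mu_{iv}$ for all $i$ and $v$.
Therefore, the hit probability is continuous in $\eta_v$ and $\mu_{iv}$ for all $i$ and $v$. ∎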