I Introduction
Caching has emerged as a promising technology for future wireless networks [1]. By storing data in distributed network storage resources near users, cache-aided systems alleviate the increasingly intensive traffic in wireless networks and help meet low-latency requirements. Conventional uncoded caching [2] can improve the hit rate but is not efficient when there are multiple caches [3]. Coded caching was recently introduced in [4], where a Coded Caching Scheme (CCS) is proposed that combines a carefully designed cache placement of uncoded contents with a coded multicasting delivery strategy to exploit the caching gain. Since then, coded caching has drawn considerable attention, with extensions to the decentralized CCS [5], transmitter caching in mobile edge networks [6], user caching in device-to-device networks [7], and both transmitter and receiver caching in wireless interference networks [8].
The above works all assume uniform file popularity, for which symmetric cache placement strategy (i.e., the same placement for all files) is optimal [9, 10]. With nonuniform file popularities, the cache placement may be different among files, complicating both design and analysis. There is a fundamental question on whether to distinguish files of different popularities and to what extent. On the one hand, different cache placements for files with distinct popularities may help improve caching efficiency to reduce the traffic load. On the other hand, ignoring file popularity differences and simply using the symmetric cache placement may be a good trade-off in complexity vs. performance.
Existing works have studied the fundamental limits of coded caching under heterogeneous file demands [3, 11, 12, 10, 9]. File grouping was first proposed in [3], in which files are divided into groups with chunks of cache allocated to different groups, and symmetric decentralized CCS is used for all files in each group. File grouping has since been considered an effective and tractable method for files with nonuniform popularities. Existing works have proposed different methods to partition files into (typically two) groups [11, 12], with different achievable (upper) bounds provided. In [11], a simple RLFU-GCC scheme using two file groups was proposed. The first group contains popular files, and the entire cache is allocated to them, of which the decentralized CCS is used for cache placement. As an extension, a mixed file grouping scheme was proposed [12] by adding a choice of three file groups (using uncoded delivery) to the two-file-group scheme. Based on the existing studies, file grouping has become a promising method to handle nonuniform file popularities for cache placement. However, most existing schemes to form file groups are heuristic, and the optimal cache placement and its relation to file grouping remain unknown. In [9] and [10], the cache placement problem is formulated as an optimization problem. A property of the optimal cache placement that more cache is allocated to a file with higher popularity is shown [9]. However, both works focused on devising numerical methods to solve the problem. There are no insights into the optimal cache placement.
In this paper, we aim to characterize the optimal cache placement for nonuniform file popularities. We obtain the structure of the optimal cache placement for the CCS under any file popularity distribution and cache size, connecting it to the file grouping strategies. We use the optimization framework to formulate the cache placement problem to minimize the average rate (load) in the coded delivery phase. Exploring several properties of the optimization problem, we reformulate the problem into a specific linear programming (LP) problem. By analyzing the structure of the reformulated problem, we obtain the structural property in file grouping for the optimal cache placement. We show that there are at most three file groups formed by the optimal cache placement. Unlike the existing works which adopt the decentralized cache placement for each file group, we further derive the complete structure of the optimal cache placement and, in turn, obtain the closed-form solution in each possible file group case. Based on these, the optimal file grouping and cache placement can be obtained efficiently. Simulation verifies the optimal cache placement structure and solution obtained for different file popularity distributions.
II System Model
Consider a cache-aided transmission system with a server connecting to $K$ users over a shared error-free link, where each user is equipped with a local cache, as shown in Fig. 1. The server has a database of $N$ files, $\{W_1, \ldots, W_N\}$, each of size $F$ bits. Let $\mathbf{p} = [p_1, \ldots, p_N]$ be the popularity distribution of these files with $\sum_{n=1}^{N} p_n = 1$, where $p_n$ represents the probability of file $W_n$ being requested. Without loss of generality, we index files according to their popularities in decreasing order: $p_1 \ge p_2 \ge \cdots \ge p_N$. Let $\mathcal{N} = \{1, \ldots, N\}$ and $\mathcal{K} = \{1, \ldots, K\}$. Each user has a local cache of capacity $MF$ bits, and we refer to $M$ as the cache size (normalized by the file size), where $M$ is any value within $[0, N]$.
The coded caching operates in two phases: the cache placement phase and the content delivery phase. In the cache placement phase, a portion of uncoded file contents from the $N$ files is stored in each user's local cache, according to a cache placement scheme. Each user is assumed to request one file from the server independently. Let $d_k$ be the index of the file requested by user $k$, and let $\mathbf{d} = [d_1, \ldots, d_K]$ denote the demand vector of all $K$ users. In the content delivery phase, based on the demand vector $\mathbf{d}$ and the cached content at users, the server generates coded messages and transmits them to the users. Upon receiving the coded messages, each user obtains the requested file from the received coded messages and its cached content. Note that for a valid coded caching scheme, each user should be able to reconstruct its requested file $W_{d_k}$, $k \in \mathcal{K}$, for any demand vector $\mathbf{d}$, over an error-free link.
III The Coded Caching Problem Setup
III-A Cache Placement
A key design issue in a coded caching scheme is the cache placement. For uniform file popularities, the optimal cache placement is a symmetric placement, i.e., the same for all files. However, for files of distinct popularities, the cache placement becomes file dependent. A common approach in the existing works for the CCS is to propose a cache placement scheme and evaluate its performance. Different from these works, in this paper, we formulate a cache placement optimization problem for the CCS, aiming to minimize the average rate over the shared link.
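In the CCS, each file is partitioned into non-overlapping subfiles. For $K$ users, there are $2^K$ user subsets in $\mathcal{K}$. Among these subsets, there are $\binom{K}{l}$ different user subsets with the same size $l$, for $l = 0, \ldots, K$, where $l = 0$ corresponds to the empty set $\emptyset$. Define a cache subgroup by $\mathcal{A}^l$, which contains all user subsets of size $l$, for $l = 0, \ldots, K$. Partition each file $W_n$ into $2^K$ non-overlapping subfiles, one for each unique user subset $\mathcal{S} \subseteq \mathcal{K}$, denoted by $W_{n,\mathcal{S}}$. Each user in user subset $\mathcal{S}$ stores subfile $W_{n,\mathcal{S}}$ in its local cache. For a given caching scheme, each file should be able to be reconstructed by combining all its subfiles, and we have the file partitioning constraint $\sum_{\mathcal{S} \subseteq \mathcal{K}} |W_{n,\mathcal{S}}| = F$, for $n \in \mathcal{N}$. In [9], it is shown that for a given file $W_n$, the size of its subfile $W_{n,\mathcal{S}}$ only depends on the size of user subset $\mathcal{S}$. In other words, $|W_{n,\mathcal{S}}|$ is the same for any $\mathcal{S}$ of the same size. Based on this property, for each file $W_n$, we partition the subfiles into $K+1$ subgroups, each containing all subfiles of the same size. We denote each subgroup by $\mathcal{W}_n^l = \{W_{n,\mathcal{S}} : \mathcal{S} \in \mathcal{A}^l\}$, for $l = 0, \ldots, K$. There are $\binom{K}{l}$ subfiles in $\mathcal{W}_n^l$ (intended for user subsets in cache subgroup $\mathcal{A}^l$).
Let $a_{n,l}$ denote the size of subfiles in $\mathcal{W}_n^l$, as a fraction of the file size $F$ bits, i.e., $a_{n,l} = |W_{n,\mathcal{S}}|/F$, for all $\mathcal{S} \in \mathcal{A}^l$, $l = 0, \ldots, K$, $n \in \mathcal{N}$. Note that $a_{n,0}$ represents the fraction of file $W_n$ that is not stored in any user cache and only remains at the server. Using $a_{n,l}$, we simplify the file partitioning constraint to
$$\sum_{l=0}^{K} \binom{K}{l} a_{n,l} = 1, \quad n \in \mathcal{N}. \tag{1}$$
By file partitioning, each subfile is intended for a unique user subset. In the cache placement phase, user $k$ caches all the subfiles in $\mathcal{W}_n^l$ that are for the user subsets containing it, i.e., $\{W_{n,\mathcal{S}} : k \in \mathcal{S}, \mathcal{S} \in \mathcal{A}^l\}$, for $l = 1, \ldots, K$, $n \in \mathcal{N}$. Note that in each $\mathcal{A}^l$, $l = 1, \ldots, K$, there are in total $\binom{K-1}{l-1}$ different user subsets containing the same user $k$. Thus, there are $\sum_{l=1}^{K} \binom{K-1}{l-1}$ subfiles in each file that a user can possibly cache. With subfile size $a_{n,l}$ for each subgroup $\mathcal{W}_n^l$, this means that a user caches a total of $\sum_{l=1}^{K} \binom{K-1}{l-1} a_{n,l} F$ bits from file $W_n$. Given cache size $M$, we have the local cache constraint
$$\sum_{n=1}^{N} \sum_{l=1}^{K} \binom{K-1}{l-1} a_{n,l} \le M. \tag{2}$$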
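The two constraints above can be checked mechanically for any candidate placement. The following is a minimal sketch, not the authors' code: the list-of-lists layout `a[n][l]` (file index `n`, subgroup index `l` from 0 to the number of users), and the function names `cache_usage` and `check_placement`, are assumptions made here for illustration.

```python
from math import comb, isclose

def cache_usage(a, K):
    """Cache (in units of files) that one user needs for placement a.

    a[n][l] is the size of each subfile of file n in subgroup l, as a
    fraction of the file size; a user stores comb(K-1, l-1) of them.
    """
    return sum(comb(K - 1, l - 1) * row[l] for row in a for l in range(1, K + 1))

def check_placement(a, K, M, tol=1e-9):
    """True if a satisfies file partitioning (1) and the cache limit (2)."""
    for row in a:
        if len(row) != K + 1 or any(x < -tol for x in row):
            return False
        # (1): the comb(K, l) subfiles of each subgroup add up to one file.
        total = sum(comb(K, l) * row[l] for l in range(K + 1))
        if not isclose(total, 1.0, abs_tol=tol):
            return False
    # (2): the per-user share of all files must fit in the cache.
    return cache_usage(a, K) <= M + tol
```

For example, with three users and three files, placing every file entirely in subgroup 1 (each subfile one third of a file) uses exactly one file's worth of cache per user.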
III-B Content Delivery via Coded Multicasting
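The delivery scheme in the CCS is by multicasting a unique coded message $C_{\mathcal{S}}$ to each non-empty user subset $\mathcal{S} \subseteq \mathcal{K}$, formed by bitwise XOR operation of subfiles as $C_{\mathcal{S}} = \bigoplus_{k \in \mathcal{S}} W_{d_k, \mathcal{S} \setminus \{k\}}$. Each user in $\mathcal{S}$ can retrieve the subfile of its requested file from $C_{\mathcal{S}}$.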
The original CCS with the proposed cache placement scheme in [4] is shown to be a valid caching scheme for cache size $M \in \{0, N/K, 2N/K, \ldots, N\}$. We can straightforwardly extend the argument in [5] and conclude that the above described coded caching scheme is also valid for any value of $M \in [0, N]$ as long as all the coded messages formed by all the non-empty user subsets, $\{C_{\mathcal{S}} : \mathcal{S} \subseteq \mathcal{K}, \mathcal{S} \neq \emptyset\}$, are delivered.
With nonuniform file popularities, the main challenge in designing the cache placement is that it may be file dependent, i.e., files may be partitioned differently, and the subfile size $a_{n,l}$ is a function of $n$. Note that when the sizes of subfiles are not equal, zero-padding is needed to form the coded multicasting message $C_{\mathcal{S}}$. As a result, the size of coded message $C_{\mathcal{S}}$ (normalized by the file size) is determined by the largest subfile among subfiles in the delivery group (user subset) $\mathcal{S}$: $|C_{\mathcal{S}}| = \max_{k \in \mathcal{S}} a_{d_k,l}$, for $\mathcal{S} \in \mathcal{A}^{l+1}$, $l = 0, \ldots, K-1$.
IV Cache Placement Optimization Formulation
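From Section III-B, the average rate in the delivery phase is given by
$$\bar{R} = \mathbb{E}_{\mathbf{d}}\Bigg[\sum_{l=0}^{K-1} \sum_{\mathcal{S} \in \mathcal{A}^{l+1}} \max_{k \in \mathcal{S}} a_{d_k,l}\Bigg]. \tag{3}$$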
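For small systems, the average delivery rate can be evaluated by brute force directly from the description of the delivery scheme: one zero-padded coded message per non-empty user subset, whose size is the largest subfile it carries. The sketch below is illustrative only; the names `delivery_rate` and `average_rate`, the 0-based file indices, and the layout `a[n][l]` are assumptions, and demands are taken to be independent across users as stated in the system model.

```python
from itertools import combinations, product
from math import prod

def delivery_rate(a, demand):
    """Rate (in files) for one demand vector.

    a[n][l]: subfile size of file n in subgroup l (fraction of a file).
    demand[k]: index of the file requested by user k.
    """
    K = len(demand)
    rate = 0.0
    for size in range(1, K + 1):
        l = size - 1  # the subfile user k needs is cached by the other l users
        for S in combinations(range(K), size):
            # Zero-padding: the coded message is as long as its largest subfile.
            rate += max(a[demand[k]][l] for k in S)
    return rate

def average_rate(a, p):
    """Exact expectation over all demand vectors (feasible for small K, N only)."""
    K = len(a[0]) - 1
    return sum(prod(p[n] for n in d) * delivery_rate(a, d)
               for d in product(range(len(p)), repeat=K))
```

With two users, two files, and each file split into two halves stored one per user, every demand costs half a file, so the average is 0.5 regardless of the popularity vector.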
For nonuniform file popularities, it is shown in [9, Theorem 2] that the optimal cache placement under the CCS has a popularity-first property. The result shows that, for files with their popularities $p_1 \ge p_2 \ge \cdots \ge p_N$, under the optimal cache placement, more cache is allocated to the file with higher popularity, and the following condition holds for the cached subfile contents
$$a_{n,l} \ge a_{n+1,l}, \quad l = 1, \ldots, K, \; n = 1, \ldots, N-1. \tag{4}$$
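Let $\mathbf{a}_n = [a_{n,0}, \ldots, a_{n,K}]^T$ denote the cache placement vector for file $W_n$, $n \in \mathcal{N}$. Our goal is to obtain the optimal $\{\mathbf{a}_n\}$ to minimize the average rate $\bar{R}$. Without loss of optimality, we explicitly impose constraint (4) and formulate the cache placement optimization problem as follows
$$\textbf{P0:} \quad \min_{\{\mathbf{a}_n\}} \; \bar{R} \quad \text{s.t. } (1), (2), (4), \text{ and}$$
$$a_{n,l} \ge 0, \quad l = 0, \ldots, K, \; n \in \mathcal{N}. \tag{5}$$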
At the optimality of P0, it is easy to show that the local cache constraint (2) is attained with equality, i.e., the cache memory is always fully utilized. Therefore, constraint (2) can be replaced by the following equality constraint
$$\sum_{n=1}^{N} \sum_{l=1}^{K} \binom{K-1}{l-1} a_{n,l} = M. \tag{6}$$
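With the popularity-first constraint (4), constraint (5) is equivalent to having the following two constraints
$$a_{1,0} \ge 0, \qquad a_{N,l} \ge 0, \; l = 1, \ldots, K. \tag{7}$$
This is because if $a_{N,l} \ge 0$, $l = 1, \ldots, K$, then by (4), we have $a_{n,l} \ge 0$, $l = 1, \ldots, K$, $n \in \mathcal{N}$. Recall that $a_{n,0}$ represents the fraction of file $W_n$ that is not stored at any user's cache. From (1), we have
$$a_{n,0} = 1 - \sum_{l=1}^{K} \binom{K}{l} a_{n,l}, \quad n \in \mathcal{N}. \tag{8}$$
Combining (4) and (8), we have $a_{1,0} \le a_{2,0} \le \cdots \le a_{N,0}$. Thus, if $a_{1,0} \ge 0$ in (7) holds, then $a_{n,0} \ge 0$, $n \in \mathcal{N}$. As a result, we have $a_{n,l} \ge 0$, $l = 0, \ldots, K$, $n \in \mathcal{N}$. Therefore, given constraint (4), constraint (5) can be equivalently replaced by constraint (7).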
Finally, let denote the -th smallest file index in demand vector , . With the popularity-first property of the optimal cache placement, it is shown in [10] that the average rate in (3) can be expressed by
(9) |
where is independent of (its expression can be found in [10]). The expression in (9) indicates that is a weighted sum of for each cache subgroup .
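The reason a weighted-sum form is possible can be illustrated numerically. Under a popularity-first placement, the largest subfile in any coded message for a subset of two or more users belongs to the most popular requested file in that subset, so sorting the demand vector and counting subsets by their smallest sorted position replaces the maximum. The sketch below is an assumption-laden illustration, not the expression from [10]: the weight `comb(K - 1 - i, l)` is a plain subset count for one demand vector, and the function names and 0-based indexing are hypothetical.

```python
from itertools import combinations, product
from math import comb

def rate_bruteforce(a, demand):
    """Sum over non-empty user subsets of the largest subfile in the message."""
    K = len(demand)
    return sum(max(a[demand[k]][size - 1] for k in S)
               for size in range(1, K + 1)
               for S in combinations(range(K), size))

def rate_sorted(a, demand):
    """Same rate via sorted demands; valid only for popularity-first placements.

    comb(K - 1 - i, l) counts the subsets of size l + 1 whose most popular
    requested file sits at sorted position i.
    """
    K = len(demand)
    delta = sorted(demand)
    return sum(comb(K - 1 - i, l) * a[delta[i]][l]
               for l in range(K) for i in range(K))

def max_gap(a, K):
    """Largest disagreement between the two evaluations over all demands."""
    return max(abs(rate_bruteforce(a, d) - rate_sorted(a, d))
               for d in product(range(len(a)), repeat=K))
```

For a placement that violates the popularity-first ordering, the two evaluations generally disagree, which is why the ordering constraint is imposed before the rate is rewritten.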
V Structure of The Optimal Cache Placement
We now derive the structure of the optimal cache placement solution for P1. We first define the term file group: A file group contains all files that have the same cache placement vector. In other words, any two files $W_n$ and $W_{n'}$ belong to the same file group if their placement vectors are identical, $\mathbf{a}_n = \mathbf{a}_{n'}$. The idea of file grouping was first considered in [3] and is shown to be an efficient tool for the cache placement design under nonuniform file popularities. In general, there could be potentially as many as $N$ file groups, which makes the design of the optimal cache placement a major challenge. Our main result in Theorem 1 below describes the structural property of the optimal cache placement in terms of file groups for the CCS.
Theorem 1
For any file popularity distribution , the optimal cache placement for P1 partitions the files into at most three file groups.
Proof:
We only briefly outline the proof here. Since P1 is an LP problem, we use the Karush-Kuhn-Tucker (KKT) conditions for P1 and explore the properties in these conditions to obtain the file group structure. (Footnote 2: The proofs of Theorem 1 and Propositions 1-4 are omitted due to space limitations; please refer to [13] for detailed proofs.)
Theorem 1 implies that the optimal cache placement vectors for all files can at most have three unique values among each other, leading to only three possibilities: one, two, or three file groups. This conclusion drastically reduces the complexity of the cache placement problem, and in turn, it allows us to derive the optimal cache placement solutions analytically.
Remark 1
Given there are as many as file groups could be considered, the result of at most three file groups, regardless of , is somewhat surprising. Among existing works, several different file grouping strategies have been proposed [3] [11] [12]. However, they are heuristic, suboptimal, or designed for a specific file popularity distribution. Furthermore, for a file grouping strategy, the specific cache placement for each file is needed. Existing works heuristically use the symmetric decentralized cache placement for each group. In the following, by Theorem 1, we will discuss each of the three file grouping cases to obtain the corresponding optimal placement.
Denote as the sub-placement vector of . It refers to the subfiles stored in the local cache. The subfile solely kept at the server is specified by . We use to denote that there is at least one non-zero element in ; otherwise, . Also, for , denote that there is at least one element in greater than that in , and all the other elements are equal. For any two files and , it is easy to verify that , and by (4) and (8), we have the following equivalence
(11) |
In the following, we describe the structural properties of the optimal cache placement solution in each file group case.
V-A One File Group
For a single file group, the cache placement vectors are the same for all files. Let . The expressions in P1 can be simplified in this case. Denote with , . Then, P1 reduces to the following equivalent problem
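Note that P2 is the same as the cache placement optimization problem for files with uniform file popularity. The optimal solution has been shown in [10] in closed form. Specifically, with $l_o = \lfloor KM/N \rfloor$, the optimal solution $\mathbf{a}$ of P2 has at most two non-zero elements, which are given by
$$a_{l_o} = \frac{l_o + 1 - KM/N}{\binom{K}{l_o}}, \qquad a_{l_o+1} = \frac{KM/N - l_o}{\binom{K}{l_o+1}}. \tag{12}$$
Note that when $KM/N$ is an integer, we have $a_{l_o+1} = 0$, and only the $l_o$-th element is non-zero in $\mathbf{a}$.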
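A minimal sketch of the one-file-group solution, assuming it coincides with the well-known memory-sharing placement for uniform popularity: the file is split between the two subgroups whose indices bracket the ratio (number of users times cache size divided by number of files). The function name `one_group_placement` and the 0-based vector layout are hypothetical.

```python
from math import comb, floor

def one_group_placement(K, N, M):
    """Placement vector [a_0, ..., a_K] shared by all N files, for 0 <= M <= N."""
    t = K * M / N                 # average number of users caching each bit
    lo = min(floor(t), K)
    a = [0.0] * (K + 1)
    if lo == K:
        a[K] = 1.0                # M = N: every user stores every file
    else:
        # Split each file between subgroups lo and lo + 1 so that both the
        # file partitioning constraint and the full-cache constraint hold.
        a[lo] = (lo + 1 - t) / comb(K, lo)
        a[lo + 1] = (t - lo) / comb(K, lo + 1)
    return a
```

As a quick sanity check, `one_group_placement(7, 7, 4)` puts everything in subgroup 4 with subfile size 1/35, about 0.0286.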
V-B Two File Groups
With two file groups, the placement vectors ’s have the following structure: , for some , . By (11), this is equivalent to the following: and . We use and to represent the placement vectors for the first file group and the second file group, respectively. Below, we first explore the possible cache placement strategy for the second file group.
Proposition 1
For the optimal cache placement resulting in two file groups, the optimal sub-placement vector of the second file group has at most one non-zero element.
By Proposition 1, the sub-placement vector of the second file group has either zero or one non-zero element. Note that, although two-group strategies that allocate no cache to the second file group have been considered in the existing works, the case of allocating some cache to the second file group has never been considered in the literature.
We now discuss the optimal cache placement in each of the two cases for below:
V-B1 If
No cache is allocated to the second file group. The entire cache is assigned to the first file group. For the second file group, this means and . We only have the cache placement problem w.r.t. for the first file group, which reduces to that of the previous one-file-group case. To obtain the optimal placement for the first file group, we can simply treat the first group as a new database with the number of files being instead of . The cache placement problem is then equivalent to P2. Let , then we have
The solution for P3 is the same as that for P2, given in (12), except that is replaced by .
Remark 2
For two file groups, the above result shows the first possible structure of the optimal placement: Cache memory is all allocated to the first group, and the cache placement for all files in this group is the same, regardless of the different file popularities among files. Note that two-file-group strategies with have been proposed via heuristics in [11] and [12], where the location of is designed in heuristic ways. In [11], for the files with Zipf distribution, the selection of results in the performance of the (decentralized) cache placement scheme considered there being a constant away from that of the optimal placement. In [12], the choice of results in a suboptimal cache placement strategy for any file popularities.
V-B2 If
In this case, by Proposition 1, contains only one non-zero element. Assume the index of this non-zero element is , then and , , . We have the following two propositions. Proposition 2 characterizes the placement for the first group, and Proposition 3 specifies the differences of the placements and between the two file groups.
Proposition 2
For the optimal cache placement resulting in two file groups and , we have , i.e., the files in the first file group are cached entirely among users, and no subfile solely remains in the server.
Proposition 3
For the optimal cache placement resulting in two file groups and , only one element is different between and .
As stated earlier, in this case, contains only a single non-zero element. Assume that the index of this non-zero element is . Then, we have and , . By Proposition 3 and constraint (4), the index of the different element between and can be either with , or some , , with . Fig. 2 illustrates an example of the case where . Note that and are not necessarily adjacent to each other. Following the above discussion, we have the following two possible cases:
Case 2.i) : In this case, and are different at the -th element which is non-zero for . It follows that , for . By Proposition 2, we conclude that is the only non-zero element in . Following the equality constraints in (10), we have
(13) |
where and are defined below (9).
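The structure of Case 2.i leaves only one unknown per group, so the two equality constraints (file partitioning and full cache use) can be solved by hand. The sketch below follows that reasoning under stated assumptions: the first group has `N1` files fully cached in a single subgroup `lo`, the second group has `N - N1` files cached only in the same subgroup with the rest kept at the server. The names `two_group_case_2i`, `N1`, and `lo` are hypothetical, and the closed form printed in the paper may be written in different notation.

```python
from math import comb

def two_group_case_2i(K, N, M, N1, lo):
    """Return (a1, a2) for the Case 2.i structure, or None if infeasible.

    Requires 1 <= lo <= K and 1 <= N1 < N.
    """
    a1 = [0.0] * (K + 1)
    a1[lo] = 1.0 / comb(K, lo)            # first group: whole file in subgroup lo
    per_user = comb(K - 1, lo - 1)        # subfiles of subgroup lo held by one user
    used_by_first = N1 * per_user * a1[lo]
    # The second group receives whatever cache the first group leaves over.
    a2_lo = (M - used_by_first) / ((N - N1) * per_user)
    if not (0.0 < a2_lo < a1[lo]):
        return None                       # would collapse into another case
    a2 = [0.0] * (K + 1)
    a2[lo] = a2_lo
    a2[0] = 1.0 - comb(K, lo) * a2_lo     # part of the file kept only at the server
    return a1, a2
```

For instance, with four users, five files, a cache of two files, three files in the first group and subgroup 2, half of each second-group file stays at the server.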
Case 2.ii) , for : The non-equal element is not the non-zero element in . Since , it follows that has two non-zero elements and (as shown in Fig. 2). Along with Proposition 2, we have , (shown in Fig. 2 as uncolored subgroups in a file). With two non-zero elements in and , we can solve the two equality constraints in (10), w.r.t. these elements. The expressions of these non-zero elements are given as follows
(14) |
which are indicated by the colored subgroups in Fig. 2.
V-C Three File Groups
In this case, the relation among ’s is given by , for . We use , , and to represent the cache placement vectors for the 1st, 2nd, and 3rd file group, respectively. We first focus on the cache allocation in the 3rd file group.
Proposition 4
For the optimal cache placement having three file groups, the optimal placement vector for the third file group is , and .
Proposition 4 shows that when there are three file groups, all the cache will be allocated to the first two groups, and the files in the 3rd file group remain in the server. Following this, we only need to optimize the cache placement in the first two groups, which is reduced to that in the previous two-file-group case discussed in Section V-B. The only exception is that, for the 2nd group, we only have the second case in Section V-B2. This is because, if , this would contradict the assumption of three file groups. Consequently, the optimal cache placement vectors and are described in Section V-B2 (), and we omit the details to avoid repetition. An example of the placement solution structure for three file groups is shown in Fig. 3, where only one element in and is different between the 1st and 2nd file groups, and no cache is allocated to the 3rd file group.
Remark 3
No existing works have considered the cache placement strategy for the CCS using three file groups. The only caching method that considers three file groups in literature is [12], in which a mixed caching strategy of two or three file groups is proposed. For the case of three file groups, the second file group consists of a single file, and conventional uncoded delivery is used, instead of coded delivery in the CCS, and this case is only used for very rare occasions. In simulation, we will show that, in some cases, the three-file-group cache placement strategy outperforms the two-file-group strategy for different file popularity distributions, including Zipf distribution.
Remark 4 (The optimal cache placement)
Using the structure of the optimal cache placement derived from Sections V-A to V-C, we are able to obtain the optimal cache placement solution for P1 using the following simple algorithm. Recall that P1 is separated into three subproblems for one to three file groups, respectively, and the closed-form solution for each subproblem is derived for given . In general, the values of to determine the optimal in each subproblem depend on , and except for one file group, they cannot be obtained analytically. However, since closed-form solutions are available, our algorithm can determine the values of efficiently in polynomial time through an exhaustive search. Then, we choose the one that gives the minimum among the solutions for the three subproblems, as the optimal for P1.
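The search described in this remark can be sketched for a subset of the candidate structures. The example below covers only the one-file-group case and the two-file-group case in which the second group receives no cache; the remaining cases (cache shared with a second group, and three groups) would add further candidates in the same loop. It assumes files are indexed in decreasing popularity, uses the memory-sharing placement inside the cached group, and evaluates the average rate by exhaustive enumeration, which is practical only for small systems. All function names are hypothetical.

```python
from itertools import product
from math import comb, floor, prod

def shared_placement(K, n_cached, M):
    """Memory-sharing placement when cache M is spread evenly over n_cached files."""
    t = K * M / n_cached
    lo = min(floor(t), K)
    a = [0.0] * (K + 1)
    if lo == K:
        a[K] = 1.0
    else:
        a[lo] = (lo + 1 - t) / comb(K, lo)
        a[lo + 1] = (t - lo) / comb(K, lo + 1)
    return a

def average_rate(a, p):
    """Exact average rate for a popularity-first placement a[n][l]."""
    K = len(a[0]) - 1
    total = 0.0
    for d in product(range(len(p)), repeat=K):
        delta = sorted(d)
        rate = sum(comb(K - 1 - i, l) * a[delta[i]][l]
                   for l in range(K) for i in range(K))
        total += prod(p[n] for n in d) * rate
    return total

def search_first_group_size(K, p, M):
    """Try every first-group size n1; the other files stay at the server."""
    N = len(p)
    uncached = [1.0] + [0.0] * K
    best = None
    for n1 in range(1, N + 1):
        if M > n1:
            continue              # the cache cannot hold more than n1 files
        a = [shared_placement(K, n1, M)] * n1 + [uncached] * (N - n1)
        rate = average_rate(a, p)
        if best is None or rate < best[0]:
            best = (rate, n1, a)
    return best
```

Because the one-file-group placement is among the candidates, the returned rate is never worse than that of the symmetric placement.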
VI Simulation
We verify the optimal cache placement structure via simulation. The file popularities are generated by Zipf distribution with , where we set the Zipf parameter . We solve P1 numerically to obtain the optimal cache placement solution. For , , Tables I and II show the optimal for cache size and , respectively. From Table I, we see that the files are partitioned into two file groups, and the optimal is the same as in the case discussed in Section V-B1: the cache is entirely allocated to the first group of the seven most popular files with the same placement, where has only one non-zero element (corresponding to the file being evenly partitioned into subfiles). Files in the second group are stored only in the server. Table II shows a different optimal cache placement strategy, where the files are divided into three file groups, resembling the example shown in Fig. 3, as discussed in Section V-C.
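The two tables can be cross-checked against the file partitioning and cache constraints. The sketch below makes several assumptions that are inferred from the tables rather than stated in the text: columns are the nine files in decreasing popularity, rows are subgroup indices 0 to 7 (hence seven users), and the rounded entries 0.0286, 0.0214 and 0.0071 stand for the exact fractions 1/35, 3/140 and 1/140.

```python
from fractions import Fraction
from math import comb

K, N = 7, 9  # inferred from the table shape (8 rows, 9 columns); an assumption

def placement(nonzero):
    """Build a placement vector [a_0, ..., a_K] from {subgroup index: size}."""
    vec = [Fraction(0)] * (K + 1)
    for l, value in nonzero.items():
        vec[l] = value
    return vec

# First table: seven cached files, two files kept only at the server.
table_1 = [placement({4: Fraction(1, 35)})] * 7 + [placement({0: Fraction(1)})] * 2
# Second table: groups of four, two, and three files.
table_2 = ([placement({3: Fraction(3, 140), 4: Fraction(1, 140)})] * 4
           + [placement({0: Fraction(1, 4), 3: Fraction(3, 140)})] * 2
           + [placement({0: Fraction(1)})] * 3)

def file_total(vec):
    """Left-hand side of the file partitioning constraint for one file."""
    return sum(comb(K, l) * vec[l] for l in range(K + 1))

def cache_used(a):
    """Cache (in files) one user needs for the whole placement."""
    return sum(comb(K - 1, l - 1) * vec[l] for vec in a for l in range(1, K + 1))

def num_groups(a):
    """Number of distinct placement vectors, i.e. file groups."""
    return len({tuple(vec) for vec in a})
```

Under these assumptions every column adds up to a whole file, the first table forms two file groups and the second forms three, and the per-user cache use works out to 4 and 2.5 files, respectively.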
Figs. 4 and 5 show the average rate vs. under the optimal cache placement scheme for Zipf distribution and a step function file popularity, respectively. We also consider centralized [4] and decentralized [5] uniform caching (one-file-group) strategies, the RLFU-GCC scheme (two-file-group) [11], and the mixed grouping strategy in [12]. Fig. 4 verifies that the optimal strategy gives the lowest among all the strategies, regardless of cache size . In Fig. 5, we consider a case studied in [12] with , , and a (non-Zipf) step distribution for file popularities: , , , and , . For , the optimal cache placement is strictly better than all alternative schemes. In particular, for , the optimal solution results in three file groups for the CCS that no existing work has considered.
VII Conclusion
In this work, we provided the structure of the optimal cache placement for the CCS under arbitrary nonuniform file popularities and arbitrary cache size. We showed that the optimal cache placement strategy results in at most three different file groups. Through analysis, we presented the closed-form solution of the optimal cache placement for each file group case, which leads to a simple efficient algorithm to compute the optimal cache placement for the CCS for any file popularity distribution.
Table I: Cache placement vectors of the files (columns: files 1 to 9 in decreasing popularity; rows: subgroup index l = 0, ..., 7)
l | File 1 | File 2 | File 3 | File 4 | File 5 | File 6 | File 7 | File 8 | File 9
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.0000 | 1.0000
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
4 | 0.0286 | 0.0286 | 0.0286 | 0.0286 | 0.0286 | 0.0286 | 0.0286 | 0 | 0
5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Table II: Cache placement vectors of the files (columns: files 1 to 9 in decreasing popularity; rows: subgroup index l = 0, ..., 7)
l | File 1 | File 2 | File 3 | File 4 | File 5 | File 6 | File 7 | File 8 | File 9
0 | 0 | 0 | 0 | 0 | 0.2500 | 0.2500 | 1.0000 | 1.0000 | 1.0000
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
3 | 0.0214 | 0.0214 | 0.0214 | 0.0214 | 0.0214 | 0.0214 | 0 | 0 | 0
4 | 0.0071 | 0.0071 | 0.0071 | 0.0071 | 0 | 0 | 0 | 0 | 0
5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
References
- [1] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role of proactive caching in 5G wireless networks,” IEEE Commun. Mag., vol. 52, no. 8, pp. 82–89, 2014.
- [2] S. Borst, V. Gupta, and A. Walid, “Distributed caching algorithms for content distribution networks,” in Proc. IEEE Conf. on Computer Communications (INFOCOM), 2010, pp. 1–9.
- [3] U. Niesen and M. A. Maddah-Ali, “Coded caching with nonuniform demands,” IEEE Trans. Inf. Theory, vol. 63, no. 2, pp. 1146–1158, 2017.
- [4] M. A. Maddah-Ali and U. Niesen, “Fundamental limits of caching,” IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 2856–2867, 2014.
- [5] ——, “Decentralized coded caching attains order-optimal memory-rate tradeoff,” IEEE/ACM Trans. Netw., vol. 23, no. 4, pp. 1029–1040, 2015.
- [6] A. Sengupta, R. Tandon, and O. Simeone, “Fog-aided wireless networks for content delivery: Fundamental latency tradeoffs,” IEEE Trans. Inf. Theory, vol. 63, no. 10, pp. 6650–6678, 2017.
- [7] M. Ji, G. Caire, and A. F. Molisch, “Fundamental limits of caching in wireless D2D networks,” IEEE Trans. Inf. Theory, vol. 62, no. 2, pp. 849–869, 2016.
- [8] F. Xu, M. Tao, and K. Liu, “Fundamental tradeoff between storage and latency in cache-aided wireless interference networks,” IEEE Trans. Inf. Theory, vol. 63, no. 11, pp. 7464–7491, 2017.
- [9] S. Jin, Y. Cui, H. Liu, and G. Caire, “Structural properties of uncoded placement optimization for coded delivery,” arXiv preprint arXiv:1707.07146, 2017.
- [10] A. M. Daniel and W. Yu, “Optimization of heterogeneous coded caching,” arXiv preprint arXiv:1708.04322, 2017.
- [11] M. Ji, A. M. Tulino, J. Llorca, and G. Caire, “Order-optimal rate of caching and coded multicasting with random demands,” IEEE Trans. Inf. Theory, vol. 63, no. 6, pp. 3923–3949, 2017.
- [12] J. Zhang, X. Lin, and X. Wang, “Coded caching under arbitrary popularity distributions,” IEEE Trans. Inf. Theory, vol. 64, no. 1, pp. 349–366, 2018.
- [13] Y. Deng and M. Dong, “Structure of optimal cache placement for coded caching with heterogeneous demands,” Tech. Rep., Oct. 2019, https://faculty.uoit.ca/dong/techreport19/TechReport102019.pdf.