From Centralized to Decentralized Coded Caching

01/23/2018 ∙ by Yitao Chen, et al. ∙ The University of Texas at Austin

We consider the problem of designing decentralized schemes for coded caching. In this problem there are K users, each caching M files out of a library of N total files. The goal is to minimize R, the number of broadcast transmissions needed to satisfy all the user demands. Decentralized schemes create each cache independently, allowing users to join or leave without dependencies. Previous work showed that to achieve a coding gain g, i.e. R ≤ K (1-M/N)/g transmissions, each file has to be divided into a number of subpackets that is exponential in g. In this work we propose a simple translation scheme that converts any constant rate centralized scheme into a random decentralized placement scheme that guarantees a target coding gain of g. If the file size in the original constant rate centralized scheme is subexponential in K, then the file size for the resulting scheme is subexponential in g. When new users join, the rest of the system remains the same. However, we require an additional communication overhead of O(log K) bits to determine the new user's cache state. We also show that the worst-case rate guarantee degrades only by a constant factor due to the dynamics of user arrival and departure.

I Introduction

Demand for wireless bandwidth has increased dramatically owing to the rise in mobile video traffic [1, 2]. One of the most promising approaches for the design of next generation networks (5G) is to densify the deployment of small/micro/femto cell stations. One main issue is that the backhaul network required for such a dense deployment is a severe bottleneck. To alleviate this, a vast number of recent works have proposed caching highly popular content at users and/or at femto cell stations near users [3, 4]. These caches could be populated during off-peak time periods by predictive analytics. This caching at the ‘wireless edge’ is being seen as a fundamental component of 5G networks [2, 5].

Upon a cache hit, users obtain the files using near-field communication from nearby femto stations or retrieve them directly from their local caches. Another non-trivial benefit is the possibility of coded transmissions leveraging cache content. One or more packets can be XORed by a macro base station and sent. Users can decode the required packets by using local cache content. Potentially, the benefit over and above that obtained only through cache hits can be enormous. A stylized abstract problem that explores this dimension is called the coded caching problem, introduced by Maddah-Ali and Niesen in their pioneering work [6].

In the coded caching problem, $K$ users are managed by a single server through a noiseless broadcast link. Each user demand arises from a library of $N$ files. Each user has a cache memory of $M$ files. Each file consists of $F$ subpackets. There are two phases: a placement phase and a delivery phase. In the placement phase, every user cache is populated by packets of different files from the library. In the delivery phase, user demands are revealed (the choice could be adversarial). The broadcast agent sends a set of coded packets such that each user can decode its desired file using its cache content designed from the placement phase. The objective is to jointly design both phases such that the worst-case number of file transmissions (often called the rate) is at most $R$. The most surprising result is that $R = O(N/M)$, which is independent of the number of users $K$, can be achieved. This was shown to be information theoretically optimal up to constant factors. There has been a lot of work [7, 8, 9, 10, 11, 12] extending this order optimal result to various settings, such as demands arising from a popularity distribution and caching happening at various levels.
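The two phases can be illustrated on the smallest non-trivial instance. The sketch below assumes two users, two files and a cache of one file per user (so each user stores half of every file); the file contents and all function names are hypothetical and chosen only for illustration. A single XORed half-file serves both users, whereas uncoded delivery would need two half-files.

```python
# Toy instance (illustrative assumption): K = 2 users, N = 2 files, M = 1.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

files = {"A": b"AAAAaaaa", "B": b"BBBBbbbb"}
halves = {name: (data[:4], data[4:]) for name, data in files.items()}

# Placement: user 1 stores the first half of every file, user 2 the second half.
cache = {
    1: {name: h[0] for name, h in halves.items()},
    2: {name: h[1] for name, h in halves.items()},
}

def deliver(d1: str, d2: str) -> bytes:
    """Server XORs the half user 1 is missing with the half user 2 is missing."""
    return xor_bytes(halves[d1][1], halves[d2][0])

def decode_user1(msg: bytes, d1: str, d2: str) -> bytes:
    # User 1 holds the first half of d2, so it can strip it from the XOR.
    missing = xor_bytes(msg, cache[1][d2])
    return cache[1][d1] + missing

def decode_user2(msg: bytes, d1: str, d2: str) -> bytes:
    # User 2 holds the second half of d1, so it can strip it from the XOR.
    missing = xor_bytes(msg, cache[2][d1])
    return missing + cache[2][d2]

msg = deliver("A", "B")
print(len(msg), decode_user1(msg, "A", "B"), decode_user2(msg, "A", "B"))
```

The transmitted message is half a file long for every demand pair, including the case where both users request the same file.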

There is another line of work that focuses on minimizing file size $F$ - the number of subpackets required - for a given worst-case rate. There have been two types of coded caching schemes: a) centralized and b) decentralized schemes. Centralized schemes have deterministic and coordinated placement and delivery phases. Specifically, if an additional user arrives, all caches have to be reconfigured. Decentralized schemes have a random placement phase and the objective is to optimize the worst-case rate with high probability over the randomization in the placement phase. For all known decentralized schemes, the random cache content of a new user is independent of the rest of the system. This removes the need for system-wide changes when users join or leave the system.

Initial centralized schemes required file sizes exponential in $K$ to obtain constant worst case rate (we always assume the ratio $M/N$ is a constant in this work that does not scale with $K$). Subsequent works [13, 14, 15, 16, 17, 18] have explored centralized schemes that attain sub-exponential file size and constant worst-case rate. Even linear file size for near-constant rates is feasible in theory although this requires impractically large values of $K$ [19]. The original decentralized schemes required exponential file size in $K$ even for a constant coding gain of $g$, i.e. $R \le K(1-M/N)/g$ w.h.p. [20]. This was the price required for decentralization in the initial scheme. Subsequent works have reduced the file size to depend exponentially only on $g$ (the target coding gain), independent of the number of users $K$ [20, 21, 22, 23]. However, there are no decentralized schemes known (as far as the authors are aware) that have file size $F$ scaling subexponentially in the target coding gain $g$.
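To give a feel for the exponential growth mentioned above, the following sketch evaluates the subpacketization of the original centralized scheme of [6], which splits each file into $\binom{K}{KM/N}$ subpackets. That count is a known property of the scheme rather than something stated in this text, and the function name and the chosen ratio $M/N = 1/4$ are illustrative assumptions.

```python
from math import comb

def mn_subpackets(K: int, ratio: float) -> int:
    """Subpackets per file in the centralized scheme of [6]: C(K, K*M/N).

    `ratio` is M/N; K * ratio is rounded to the nearest integer (assumption
    made here so that the binomial coefficient is well defined).
    """
    t = round(K * ratio)
    return comb(K, t)

# File size grows exponentially in K for a fixed cache ratio M/N = 1/4.
for K in (8, 16, 32, 64):
    print(K, mn_subpackets(K, 0.25))
```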

Our Contributions: In this work, inspired by results in [23] and leveraging ideas from balls and bins literature with power of two choices, we show the following:

  1. We provide a simple translation scheme that takes any centralized scheme with constant rate and subexponential file size scaling with the number of users and turns it into a decentralized scheme with target coding gain $g$ and file size that is subexponential in $g$. This generic translation scheme, when applied to a known centralized scheme, gives a feasible decentralized scheme whose file size is subexponential in $g$.

  2. Our decentralized scheme does not require any change in the rest of the system when a new user joins. However, it requires an additional $O(\log K)$ bits of communication between the server and a newly joining user. We also show that the worst case rate degrades by at most a constant factor when there are not too many adversarial arrivals and departures.

  3. Finally, we show that centralized schemes with near constant rates and polynomial file size requirements can also be translated into decentralized schemes that provide a polynomial scaling in the target gain $g$.

In summary, we show that good centralized schemes can be mapped to decentralized schemes with similar performance. We emphasize that our decentralized schemes are not fully independent (as opposed to all previous decentralized methods), but still allow users to easily join or leave the system.

II Problem Setting

II-A Coded Caching Problem

In this part, we formally define the coded caching problem. Consider $K$ users that request files from a library of size $N$. We are mostly interested in the case when $N \ge K$. The files are denoted by $W_1, \dots, W_N$, each consisting of $F$ data packets. Each file packet belongs to a finite alphabet $\mathcal{X}$. Let $\mathcal{N} = \{1, \dots, N\}$ and $\mathcal{K} = \{1, \dots, K\}$ denote the set of files and the set of users, respectively. Each user has a cache that can store $MF$ packets from the library, $M \in [0, N]$. In the placement phase, user caches are populated without knowledge of the user demands. Let $\phi_k$ denote the caching function for user $k$, which maps the files $W_1, \dots, W_N$ into the cache content $Z_k = \phi_k(W_1, \dots, W_N)$ for user $k$. Let $Z = (Z_1, \dots, Z_K)$ denote the cache contents of all the users. In the delivery phase, where users reveal their individual demands $d = (d_1, \dots, d_K) \in \mathcal{N}^K$, let $\psi$ denote the encoding function for the server, which maps the files $W_1, \dots, W_N$, the cache contents $Z$, and the request $d$ into the multicast message $Y = \psi(W_1, \dots, W_N, Z, d)$ sent by the server over the shared link. Let $\mu_k$ denote the decoding function at user $k$, which maps the multicast message $Y$, the cache content $Z_k$ and the request $d_k$ to an estimate $\hat{W}_{d_k} = \mu_k(Y, Z_k, d_k)$ of the requested file $W_{d_k}$ of user $k$. Each user should be able to recover its requested file from the message received over the shared link and its cache content. Thus, we impose the successful content delivery condition

$$\hat{W}_{d_k} = W_{d_k}, \quad \forall k \in \mathcal{K}. \tag{1}$$

Given the cache size ratio $M/N$, the cache contents $Z$ and the requests $d$ of all the users, let $R(M/N, K, Z, d)\,F$ be the length of the multicast message $Y$. Let

$$R(M/N, K, Z) = \max_{d \in \mathcal{N}^K} R(M/N, K, Z, d)$$

denote the worst-case (normalized) file transmissions over the shared link. The objective of the coded caching problem is to minimize the worst-case file transmissions $R(M/N, K, Z)$. The minimization is with respect to the caching functions $\{\phi_k\}$, the encoding function $\psi$, and the decoding functions $\{\mu_k\}$, subject to the successful content delivery condition in (1). A set of feasible placement and delivery strategies constitutes a coded caching scheme.

II-B Two types of schemes

As stated in the introduction, there are two types of coded caching schemes: a) centralized schemes and b) decentralized schemes. In this work we further divide decentralized schemes into two kinds, in order to contrast our results with existing ones.

  1. Decentralized Type A: The random set of file packets placed in any user $k$'s cache is independent of the rest of the system, requiring no coordination in the placement phase when users join and leave the system. Most currently known decentralized schemes (as far as the authors are aware) are of this kind.

  2. Decentralized Type B: When a new user $k$ joins, the random set of file packets placed in user $k$'s cache depends on the rest of the system. However, it does not require any change in the rest of the system. We also seek to minimize the number of bits communicated when the new user’s cache state is determined.

II-C Objective

The prime focus in this work is to design Decentralized Schemes of type B such that for a given worst-case rate (with high probability¹ with respect to the random placement scheme) of at most $K(1-M/N)/g$, for constant $M/N$, the file size $F$, as a function of the coding gain $g$, is kept as small as possible. The number of bits communicated when users join and leave the system also needs to be minimized.

¹ We say an event occurs with high probability (w.h.p.) if for a constant .

III Preliminary

III-A Centralized Schemes - Ruzsa-Szemerédi constructions

In this section, we introduce a class of centralized coded caching schemes called Ruzsa-Szemerédi schemes. We describe a specific family of bipartite graphs called Ruzsa-Szemerédi bipartite graphs. Then, we review an existing connection between these bipartite graphs and centralized coded caching schemes.

Definition III.1.

Consider an undirected graph $G(V, E)$. An induced matching $\mathcal{M} \subseteq E$ is a set of edges such that a) no two edges in $\mathcal{M}$ share a common vertex and b) the subgraph induced by the vertices in the matching contains only the edges in $\mathcal{M}$ and no other edge in the original graph $G$.

Definition III.2.

A bipartite graph $G([F], [K], E)$ is an $(r, t)$-Ruzsa-Szemerédi graph if the edge set can be partitioned into $t$ induced matchings and the average size of these induced matchings is $r$.
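The two definitions above can be checked mechanically on small graphs. The following sketch is an illustration under assumed conventions: edges are pairs `(f, k)` with `f` a left vertex and `k` a right vertex, and the function names are hypothetical.

```python
def is_induced_matching(edges, matching):
    """Check conditions a) and b) of the induced matching definition."""
    edges, matching = set(edges), set(matching)
    if not matching <= edges:
        return False
    left = [f for f, _ in matching]
    right = [k for _, k in matching]
    if len(set(left)) != len(left) or len(set(right)) != len(right):
        return False  # two matching edges share a vertex
    # Induced: every graph edge between matched vertices is in the matching.
    return all((f, k) in matching
               for f in left for k in right if (f, k) in edges)

def rs_parameters(edges, matchings):
    """Return (r, t) if `matchings` partitions `edges` into induced matchings."""
    edges = set(edges)
    if not matchings:
        return None
    covered = [e for m in matchings for e in m]
    is_partition = len(covered) == len(set(covered)) and set(covered) == edges
    if not is_partition:
        return None
    if not all(is_induced_matching(edges, m) for m in matchings):
        return None
    t = len(matchings)
    return len(edges) / t, t  # r is the average matching size

# Toy graph: left and right vertex sets {0, 1, 2}, edge (f, k) whenever f != k.
edges = {(f, k) for f in range(3) for k in range(3) if f != k}
matchings = [[(0, 1), (1, 0)], [(0, 2), (2, 0)], [(1, 2), (2, 1)]]
print(rs_parameters(edges, matchings))                  # (2.0, 3)
print(is_induced_matching(edges, [(0, 1), (1, 2)]))     # False: (0, 2) is an edge
```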

Now, we describe a coded caching scheme, with its placement and delivery phases, from the construction of a Ruzsa-Szemerédi bipartite graph.

Theorem III.3.

[19] Consider an $(r, t)$-Ruzsa-Szemerédi bipartite graph on vertex sets $[F]$ and $[K]$ such that the minimum right-degree is $\delta$. Then, for any $M/N \ge 1 - \delta/F$, we have a centralized coded caching scheme with worst case rate $R = t/F$ with system parameters $(K, M, N, F)$.

With a given $(r, t)$-Ruzsa-Szemerédi bipartite graph $G([F], [K], E)$, an $F$-packet coded caching scheme can be realized by Algorithm 1. In the placement phase, non-edges represent storage actions. An edge between $f \in [F]$ and $k \in [K]$ is denoted by $(f, k)$. If $(f, k) \notin E$, then file packet $f$ of all files is stored in user $k$’s cache. In the delivery phase, an XOR of all the packets involved in an induced matching is sent. We repeat this XORing process for every induced matching. This policy yields a feasible delivery scheme that satisfies any demand set $d = (d_1, \dots, d_K)$.

Almost all (as far as the authors are aware) known centralized coded caching schemes belong to the class of Ruzsa-Szemerédi schemes. They have been introduced in the literature through several other equivalent formulations (like placement delivery array etc.)[13, 14, 15, 16, 17, 18].

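procedure  Placement($G([F], [K], E)$)
     Split each file $W_n$ into $F$ packets, i.e., $W_n = (W_{n,f} : f \in [F])$
     for  $k \in [K]$ do
         $Z_k \leftarrow \{W_{n,f} : (f, k) \notin E,\ n \in [N]\}$
     end for
end procedure
procedure  Delivery($G([F], [K], E)$, $d$)
     for $s = 1, \dots, t$ do
         Suppose $\{(f_1, k_1), \dots, (f_{r_s}, k_{r_s})\}$ represents the $s$-th induced matching, of size $r_s$.
         Server sends $\bigoplus_{i=1}^{r_s} W_{d_{k_i}, f_i}$
     end for
end procedure
Algorithm 1 Ruzsa-Szemerédi based caching scheme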
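The placement and delivery phases described for Algorithm 1 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy graph (three packets, three users, packet $f$ cached only by user $f$), the byte-valued packets, and all function names are assumptions made for the example.

```python
from functools import reduce
from itertools import combinations

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def placement(files, F, K, edges):
    """User k caches packet f of every file whenever (f, k) is a non-edge."""
    N = len(files)
    return [{(n, f): files[n][f] for n in range(N) for f in range(F)
             if (f, k) not in edges} for k in range(K)]

def delivery(files, matchings, demand):
    """One XOR per induced matching: packet f of user k's demand per edge (f, k)."""
    return [reduce(xor_bytes, [files[demand[k]][f] for f, k in m])
            for m in matchings]

def decode(k, cache, messages, matchings, demand, F):
    """User k strips cached packets from every XOR that involves it."""
    got = {f: cache[(demand[k], f)] for f in range(F) if (demand[k], f) in cache}
    for msg, m in zip(messages, matchings):
        wanted = [f for f, u in m if u == k]
        if not wanted:
            continue
        for f, u in m:
            if u != k:  # (f, k) is a non-edge because the matching is induced
                msg = xor_bytes(msg, cache[(demand[u], f)])
        got[wanted[0]] = msg
    return [got[f] for f in range(F)]

# Toy instance (hypothetical): F = K = N = 3, edge (f, k) whenever f != k.
F = K = N = 3
edges = {(f, k) for f in range(F) for k in range(K) if f != k}
matchings = [[(a, b), (b, a)] for a, b in combinations(range(K), 2)]
files = [[bytes([16 * n + f]) * 4 for f in range(F)] for n in range(N)]
caches = placement(files, F, K, edges)

def all_users_decode(demand):
    messages = delivery(files, matchings, demand)
    return all(decode(k, caches[k], messages, matchings, demand, F)
               == files[demand[k]] for k in range(K))

print(all_users_decode((0, 1, 2)), len(matchings) / F)  # rate = t / F
```

In this toy instance there are three induced matchings and three packets per file, so the server sends three packets, i.e. one file's worth of data, for any demand vector.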

In the next section, we define a new ‘translation’ mechanism that generates a decentralized scheme of type B out of an existing class of Ruzsa-Szemerédi schemes of constant rate that preserves the efficiency of file size requirements.

IV Our Decentralized Scheme

IV-A Translation using Balls and Bins Argument

Our objective is to specify a decentralized scheme for $K$ users, system parameters $M$ and $N$, and a worst-case rate of at most $K(1-M/N)/g$ w.h.p. First, given the target coding gain $g$, the size of cache memory $M$, the number of files $N$ and the number of users $K$, we decide an appropriate number of virtual users $K'$. We assume that we can construct Ruzsa-Szemerédi centralized schemes for $K' = f(g)$ virtual users (this function will be specified later), for constant $M/N$, with worst-case rate $R_c$ which is dependent only on $M$ and $N$, and file size requirement $F$. Consider the cache content of every virtual user according to this centralized scheme. Let us denote the virtual users' cache contents by $C_1, \dots, C_{K'}$. Please note that the cache contents of the centralized scheme are only virtual. We specify the random placement scheme for real users as follows.

Placement Scheme: For each real user in sequence, we pick two virtual cache contents $C_i$ and $C_j$ uniformly at random. We assign to the real user whichever of $C_i$ and $C_j$ has been used least so far. Let us denote by $n_i$ the number of real users which store $C_i$.

Balls and Bins: We specify a one-to-one correspondence to a balls and bins system. The distinct virtual user cache contents are the bins in the system. There are $K'$ of them. A real user corresponds to a ball. When a ball is placed in a bin, the real user (ball) is assigned the cache content of the virtual user (the bin it is placed in). We can easily see that the random placement exactly corresponds to the power of two choices in a standard balls and bins process [24, 25].

Delivery Scheme: Note that, in a system with $K'' \le K'$ users with distinct cache contents drawn from $\{C_1, \dots, C_{K'}\}$, by using the Ruzsa-Szemerédi delivery scheme with the files demanded by the $K' - K''$ absent virtual users substituted by a dummy file, it is possible to still guarantee a worst-case rate of $R_c$ in the delivery phase.

We now repeatedly perform the following until all $n_i = 0$: Find a set of at most $K'$ real users with the maximum number of distinct virtual cache contents. Decrease the $n_i$ corresponding to those virtual cache contents by $1$. Use the Ruzsa-Szemerédi delivery scheme for these real users and their real demands. Clearly, the total number of worst case file transmissions is at most $R_c \max_i n_i$. We summarize the decentralized scheme in Algorithm 2.

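Given $K$, $M$, $N$ and $g$, let $K' = f(g)$ (depends on constructions).
Get the cache contents $C_1, \dots, C_{K'}$ corresponding to the Ruzsa-Szemerédi placement scheme (in Algorithm 1) with parameters $(K', M, N, F)$.
procedure Sampling($k$)
     Uniformly sample a cache content from $\{C_1, \dots, C_{K'}\}$ for the cache of user $k$ twice with replacement, i.e., obtain $C_i$ and $C_j$.
     if $n_i \le n_j$ then
         $Z_k \leftarrow C_i$, $n_i \leftarrow n_i + 1$.
     else
         $Z_k \leftarrow C_j$, $n_j \leftarrow n_j + 1$.
     end if
end procedure
procedure  Placement($K$)
     Initialize $n_i = 0$ for all $i \in [K']$.
     for $k \in [K]$ do
         Sampling($k$)
     end for
end procedure
procedure Delivery($d$)
     Let $\mathcal{U} = [K]$ be the set of real users not yet served. Let $d = (d_1, \dots, d_K)$ be their demands.
     while $\mathcal{U} \neq \emptyset$ do
         Find a maximal subset $S \subseteq \mathcal{U}$ such that cache contents of all real users assigned in $S$ are distinct.
         if $|S| < K'$ then
              Use Delivery subroutine of Algorithm 1 to satisfy demands of users in $S$ using a Ruzsa-Szemerédi Scheme. This can be done by substituting packets belonging to file demands of virtual users outside set $S$ (since $|S| < K'$) by packets from a dummy file known to all users.
         else
              Use Delivery subroutine of Algorithm 1 to satisfy demands of users in $S$ ($|S| = K'$) using a Ruzsa-Szemerédi Scheme.
         end if
         $\mathcal{U} \leftarrow \mathcal{U} \setminus S$.
     end while
end procedure
Algorithm 2 Decentralized Scheme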
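The combinatorial core of Algorithm 2 (which virtual cache each real user receives, and how users are grouped into delivery rounds) can be sketched independently of the underlying Ruzsa-Szemerédi scheme. In the sketch below, `K_virtual` stands for the number of virtual users, ties are broken in favour of the first sample, and all names are illustrative assumptions; the actual XOR delivery inside each round is omitted.

```python
import random

def two_choice_placement(K, K_virtual, rng):
    """Assign each real user the less-used of two randomly sampled virtual caches."""
    load = [0] * K_virtual      # load[i]: number of real users storing virtual cache i
    assignment = []             # assignment[k]: virtual cache index of real user k
    for _ in range(K):
        i, j = rng.randrange(K_virtual), rng.randrange(K_virtual)
        pick = i if load[i] <= load[j] else j
        load[pick] += 1
        assignment.append(pick)
    return assignment, load

def delivery_rounds(assignment, K_virtual):
    """Group users into rounds in which all virtual caches are distinct.

    Each round maps virtual cache index -> real user; virtual caches missing
    from a round would be served a dummy demand.
    """
    bins = [[] for _ in range(K_virtual)]
    for user, v in enumerate(assignment):
        bins[v].append(user)
    rounds = []
    while any(bins):
        rounds.append({v: b.pop() for v, b in enumerate(bins) if b})
    return rounds

rng = random.Random(0)
assignment, load = two_choice_placement(50, 10, rng)
rounds = delivery_rounds(assignment, 10)
print(len(rounds), max(load))   # the number of rounds equals the maximum load
```

Each round takes one user from every non-empty bin, so the number of calls to the centralized delivery routine equals the largest bin, which is the quantity the balls and bins analysis controls.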

IV-B Analysis of the decentralized algorithm

Lemma IV.1.

The total number of worst-case file transmissions $R$ of the delivery scheme in Algorithm 2 is bounded by:

$$R \le R_c \cdot \max_{i \in [K']} n_i,$$

where $n_i$ is the number of real users assigned the $i$-th virtual cache content and $R_c$ is the worst-case rate of the Ruzsa-Szemerédi scheme used in Algorithm 2.

Proof.

The delivery scheme of Algorithm 1 is called, with possibly dummy user demands, at most $\max_i n_i$ times. Each call produces at most $R_c$ file transmissions. The proof follows from this. ∎

As we stated before, the placement has a direct correspondence to a choice of two balls and bins process. There are $K$ balls (real users) and $K'$ bins (virtual users). In sequence, for every ball, two bins are chosen uniformly at random with replacement and the ball is placed in the bin with the least number of balls. From [26], we have the following lemma:

Lemma IV.2 ([26]).

The maximum number of balls in any bin, achieved by the choice of two policy for balls and bins problem, with balls and bins, is less than with probability at least , where is a suitable constant.
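The benefit of choosing the better of two bins can be observed empirically. The sketch below is a small simulation with illustrative parameter values and a hypothetical function name; it compares the maximum load under one random choice per ball with the maximum load under two choices, and makes no claim about the exact constants in the lemma.

```python
import random

def max_load(m, n, choices, rng):
    """Throw m balls into n bins; each ball goes to the least loaded of
    `choices` bins sampled uniformly at random with replacement."""
    load = [0] * n
    for _ in range(m):
        candidates = [rng.randrange(n) for _ in range(choices)]
        best = min(candidates, key=load.__getitem__)
        load[best] += 1
    return max(load)

rng = random.Random(0)
m, n = 10_000, 100              # average load m / n = 100
one = max_load(m, n, 1, rng)
two = max_load(m, n, 2, rng)
print("one choice:", one, "two choices:", two)
```

With two choices the maximum load typically stays within a few balls of the average $m/n$, whereas with a single choice the gap grows with the number of balls per bin.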

From Lemma IV.2, we know that probability at least for some constant . Therefore, have the following theorem:

Theorem IV.3.

Suppose there exists a Ruzsa-Szemerédi centralized scheme, with constant (independent of ) worst-case rate , constant cache size ratio , and subpacketization level . Then the scheme in Algorithm 2 that uses this centralized scheme has target gain , i.e. the number of file transmissions in the worst case is rate w.h.p. The subpacketization level required is where . To obtain the scheme, we set in Algorithm 2.

Proof.

The proof can be found in the full version. ∎

From [17], we have the following lemma,

Lemma IV.4 ([17]).

There exists an -Ruzsa Szemerédi graph with for some and for some constant . Then, and and by Stirling’s formula we have:

where for is the binary entropy function. It is easy to see that under such choice of parameters, and are both constants independent of and grows sub-exponentially with .

Applying Theorem IV.3 to the scheme of Lemma IV.4, we have the following corollary:

Corollary IV.5.

For the Ruzsa Szemerédi centralized scheme in Lemma IV.4, we have a corresponding decentralized scheme with and subpacketization level , where means , terms are omitted.

For non-constant rate and constant cache size ratio , we have the following theorem:

Theorem IV.6.

Suppose there exists a Ruzsa-Szemerédi centralized scheme, with rate , and constant cache size ratio , and subpacketization level where is a polynomial in . Then the scheme in Algorithm 2 that uses this centralized scheme has target gain , i.e., the number of file transmissions in the worst case is rate w.h.p. The subpacketization level required is . To obtain the scheme, we set .

Proof.

The proof can be found in the full version. ∎

IV-C Overhead analysis for the dynamic version of the decentralized scheme

We consider the dynamics of user arrival and departure. When a user leaves the system, the user’s cache content is deleted and, if the user had cached $C_i$ (recall from Section IV-A that this is the cache content of the $i$-th virtual user), the counter $n_i$ of real users storing $C_i$ is decreased by $1$. When a new user $k$ joins the system, the Sampling subroutine from Algorithm 2 is executed to determine the cache content of user $k$ (i.e. $Z_k$). The comparison between $n_i$ and $n_j$ in the procedure involves an additional $O(\log K)$ bits of communication overhead between user $k$ and the central server. Note that the dynamics of user arrivals and departures do not change the cache contents of users already in the system.

The worst-case rate during delivery is directly proportional to $\max_i n_i$ according to Lemma IV.1. We show that, despite the dynamics, $\max_i n_i$ remains the same up to constant factors w.h.p., provided the number of adversarial departures and arrivals is bounded. We recall that the real users represent the balls and the virtual users or their distinct cache contents represent the bins; $\max_i n_i$ is the size of the maximum bin.

For the analysis, let us first define the balls and bins process with adversarial deletions/additions. Consider the polynomial time process where in the first $K$ steps, a new ball is inserted into the system (the system is initiated with $K$ users). At each subsequent time step, either a ball is removed or a new ball is inserted in the system, provided that the number of balls present in the system never exceeds $K$. Suppose that an adversary specifies the full sequence of insertions and deletions of balls in advance, without knowledge of the random choices of the new balls that will be inserted in the system (i.e., suppose we have an oblivious adversary).
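The process with arrivals and departures can be simulated directly. The sketch below assumes one particular oblivious schedule, chosen only for illustration: after the initial insertions, each step removes the oldest user and then inserts a new one, so the schedule is fixed in advance and does not depend on the random bin choices. The function name and parameter values are hypothetical.

```python
import random
from collections import deque

def churn_peak_load(K, K_virtual, steps, rng):
    """Two-choice placement under a fixed leave/join schedule.

    Returns the largest bin size observed at any time and the final loads.
    """
    load = [0] * K_virtual
    users = deque()             # bin index of each present user, oldest first

    def join():
        i, j = rng.randrange(K_virtual), rng.randrange(K_virtual)
        pick = i if load[i] <= load[j] else j
        load[pick] += 1
        users.append(pick)

    for _ in range(K):          # initial insertions
        join()
    peak = max(load)
    for _ in range(steps):
        load[users.popleft()] -= 1   # departure: only a counter changes
        join()                       # arrival: existing caches are untouched
        peak = max(peak, max(load))
    return peak, load

peak, load = churn_peak_load(200, 20, 2000, random.Random(1))
print("average load:", 200 / 20, "peak load over time:", peak)
```

Only the per-bin counters are updated on each arrival or departure, which mirrors the observation that users already in the system keep their cache contents.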

Our proof uses Theorem 1 from [25] and Theorem 3.7 from [24] to obtain performance guarantees as dynamic users join and leave the system.

Theorem IV.7.

For any fixed constant and such that , if the balls and bins process with adversarial deletions runs for at most times steps, then the maximum load of a bin during the process is at most , with probability at least .

Proof.

Please refer to the full version for a self-contained proof that extends results from previous work. ∎

V Conclusion

In this work we show a simple translation scheme that converts any constant rate centralized scheme into a random decentralized placement scheme that guarantees a target coding gain of $g$. We show that the worst-case rate degrades only by a constant factor due to the dynamics of user arrival and departure.

References