I Introduction
Wireless data traffic has grown tremendously in the last decade and is expected to keep growing, with video in particular expected to comprise more than 50% of the total traffic [1]. Caching is a popular means of offloading traffic from the network during peak times by storing part of the information demanded by users (clients) in local storage known as caches. During the peak hours, the server then needs to transmit only the non-cached information, thus reducing the traffic. For instance, consider the setting taken in [2], consisting of a single-server error-free broadcast channel with $K$ clients (users), $N$ files at the server (each file comprised of $F$ subfiles, where $F$ is known as the subpacketization parameter), with each client capable of caching $MF$ subfiles. By populating the caches during the caching phase, when the demand requests are not present, traditional caching can achieve a rate equal to $K\left(1-\frac{M}{N}\right)$ during the delivery phase, when the network is required to satisfy the demands (the rate is defined as the ratio of the number of transmitted packets, each of size equal to a subfile, to $F$).
The paradigm of coded caching, introduced in [2], is based on the idea of transmitting coded packets during peak times to further reduce the network usage. In contrast to the uncoded caching scenario, which requires a rate $K\left(1-\frac{M}{N}\right)$, the coded caching scheme of [2] achieves a rate $\frac{K\left(1-\frac{M}{N}\right)}{1+\frac{MK}{N}}$, which is upper bounded by a constant as $K\to\infty$ for constant $\frac{M}{N}$. This is achieved by designing both the caching phase and the delivery phase carefully. The rate advantage shown in [2] was also demonstrated in other settings, for instance [3, 4]. The scheme in [2] was also shown to be order optimal, i.e., within a constant multiple of the optimal rate for the same set of parameters. The scheme of [2] is achieved by dividing each file into $F=\binom{K}{MK/N}$ subfiles, and caching them appropriately in the client caches. It was noticed in [6] that the subpacketization required is exponential in $K$ for constant $\frac{M}{N}$ as $K$ grows large (as $\binom{K}{Kp}\approx 2^{KH(p)}$ for constant $0<p<1$, where $H(p)$ is the binary entropy). Since then a number of papers [6, 7, 8, 9, 10, 11, 13, 14, 15] have presented new schemes for coded caching which use smaller subpacketization at the cost of an increased rate or cache requirement compared to [2]. Table I lists the relevant known results in this context (the references and the techniques used are shown in the first column). The second column lists the uncached fraction $1-\frac{M}{N}$ of any file (a fraction $\frac{M}{N}$ of each file is cached by a user). Many of the schemes presented in Table I require exponential subpacketization (in $K$, for large $K$), as shown in the fourth column of Table I, to achieve a constant rate (shown in the last column). The subpacketization of particular schemes of [9, 14] has been shown to be subexponential, while some schemes of [13] have subpacketization that is linear or polynomial (in $K$) at the cost of requiring either a larger cache or a larger rate compared to [2]. Interestingly, a linear subpacketization scheme ($F=K$) was shown in [10] using a graph theoretic construction with near constant rate and small memory requirement. However, the construction in [10] holds only for very large values of $K$. In [9], it was shown that subpacketization linear in $K$ is impossible if we require constant rate. In [15], the authors consider caching schemes without file splitting, i.e., the scenario when $F=1$.
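The trade-off described above can be made concrete with a small numerical sketch. The snippet below evaluates the uncoded rate $K(1-M/N)$, the coded rate $K(1-M/N)/(1+KM/N)$ and the subpacketization $\binom{K}{KM/N}$ of the scheme of [2]. The function names and the choice $M/N = 1/4$ are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction
from math import comb

def uncoded_rate(K, M, N):
    """Rate of traditional (uncoded) caching: K(1 - M/N)."""
    return K * (1 - Fraction(M, N))

def coded_rate(K, M, N):
    """Rate of the coded caching scheme of [2]: K(1 - M/N) / (1 + KM/N)."""
    return K * (1 - Fraction(M, N)) / (1 + Fraction(K * M, N))

def subpacketization(K, M, N):
    """Subpacketization of the scheme of [2]: binom(K, KM/N)."""
    t = Fraction(K * M, N)
    if t.denominator != 1:
        raise ValueError("KM/N must be an integer")
    return comb(K, int(t))

# Illustrative choice: cached fraction M/N = 16/64 = 1/4, growing number of users.
M, N = 16, 64
for K in (8, 16, 32, 64):
    print(K, uncoded_rate(K, M, N), coded_rate(K, M, N), subpacketization(K, M, N))
```

For a fixed cached fraction, the coded rate stays bounded while the uncoded rate grows linearly in $K$, and the subpacketization grows exponentially in $K$.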
TABLE I: Known coded caching schemes. The columns include the scheme (with the technique used), the number of users, the uncached fraction, the subpacketization (for a large number of users and a constant cached fraction), and the rate. The schemes listed are:
Ali-Niesen [2]
Ali-Niesen scheme with grouping [6]
Yan et al. [8] (PDAs)
Shangguan et al. [9] (PDAs based on hypergraphs)
Yan et al. [7] (based on strong edge coloring of bipartite graphs)
Tang et al. [11] (based on resolvable designs); subpacketization exponent similar to [7] and [9] for some schemes, but less than that of some schemes of [9]
Scheme from [10] (based on induced matchings of a Ruzsa-Szemerédi graph); the number of users is necessarily large
Two PDA schemes from Cheng et al. [13]
Two PDA schemes from [14]
The contributions and organization of this work are as follows. After reviewing the work of [7] in Section II, we prove a lower bound on the peak delivery rates of coded caching schemes using the properties of the associated bipartite graph (Section III). We then map the problem of finding a valid transmission scheme for a bipartite caching scheme to a clique cover problem on a graph derived from the line graph of the bipartite graph (Section IV). In Section IV, we also show that the existence of a certain class of such line graphs implies the existence of coded caching schemes whose rate, uncached fraction, and subpacketization are easily characterized. We then give a coded caching scheme using a construction of such caching line graphs based on projective geometries over finite fields (Section V). Analyzing this scheme in Section VI adds new results to Table I, as shown in Table II. The first row of Table II lists the actual parameters of our scheme. The other two rows give asymptotic results (with constant field size $q$). We note that the last row shows a constant rate with subexponential subpacketization achieved by the scheme in this paper. We conclude the paper with a short discussion in Section VII.
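TABLE II: Parameters of the scheme presented in this paper. The first row gives the actual parameters, for nonnegative integer parameters of the construction. The second and third rows give the limiting behaviors as the parameters of the construction grow, with the remaining quantities held constant.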
Notations and Terminology: For a positive integer $n$, we denote by $[n]$ the set $\{1,\dots,n\}$. We recall only minimal facts regarding graph theory. For other standard definitions, the reader is referred to [16]. A graph $G$ consists of a set $V(G)$ of vertices and a set $E(G)$ of edges. For a subset $A$ of vertices of a graph $G$, we denote by $N(A)$ the set of vertices adjacent to the vertices of $A$. A bipartite graph is one whose vertex set can be partitioned into two subsets (called the left and right vertices) such that every edge joins a left vertex and a right vertex. A subset $C\subseteq V(G)$ is called a clique of $G$ if all vertices in $C$ are adjacent to each other (we consider single vertices to be cliques of size $1$). A clique cover of $G$ is a collection of cliques such that every vertex of $G$ belongs to at least one of the cliques.
II Bipartite Graph based Coded Caching and Delivery based on [7]
Let $\mathcal{K}$ be the set of users (clients), with $|\mathcal{K}|=K$, in a system consisting of one server having $N$ files, connected to the clients via an error-free broadcast channel. Let $F$ be the subpacketization level, i.e., each file is composed of $F$ subfiles, each taking values according to a uniform distribution from some finite abelian group. The subfiles of file $W_i$ are denoted as $W_{i,f}$, $f\in\mathcal{F}$, for some set $\mathcal{F}$ of size $F$. Let $MF$ denote the number of subfiles that can be stored in the cache of any user. A coded caching scheme consists of two subschemes (as in [2]): a caching scheme, according to which subfiles of the files are placed in the user caches during periods when the traffic is low, and a transmission scheme, which consists of broadcast transmissions from the server satisfying the demands of the clients appearing during the demand phase. We assume symmetric caching throughout the paper, i.e., the caches at the users are populated in such a way that if user $k$ caches the subfile $f$ of any file, then it caches the subfile $f$ of each file. All the schemes presented in [6, 7, 8, 9, 10, 11, 13, 14] employ symmetric caching. We also assume throughout this work that $\frac{MF}{N}$ is an integer, which is the number of subfiles of any particular file stored in a user's cache. In the delivery scheme, the transmissions (each of size equal to that of a subfile) must be done so that the demands of the clients are all satisfied. As in [2], the rate $R$ of the coded caching scheme is defined as the ratio of the number of transmissions to the number $F$ of subfiles in a file.
We can visualize the symmetric caching scheme (with fully populated caches) using a bipartite graph, following [7]. Consider a bipartite graph $B$ with $\mathcal{K}$ being the left (user) vertices and the right (subfile) vertices being $\mathcal{F}$. We then define the edges of the bipartite graph to denote the uncached subfiles of the files, i.e., for $k\in\mathcal{K}$ and $f\in\mathcal{F}$, an edge $\{k,f\}$ exists if and only if user $k$ does not contain in its cache the subfile $f$ of each file. Clearly, this bipartite graph is left-regular, with $D=F\left(1-\frac{M}{N}\right)$ being the degree of any user vertex. Indeed, any left-regular bipartite graph defines a caching scheme, which we formalize below.
Definition 1 (Bipartite Caching Scheme).
Given a bipartite $D$-left-regular graph with $K$ left vertices and $F$ right vertices denoted by $B(K,D,F)$ (or in short, $B$), the symmetric caching scheme defined on $K$ users with subpacketization $F$, with the edges of $B$ indicating the uncached subfiles at the users, is called the bipartite caching scheme associated with the bipartite graph $B$.
Remark 1.
We observe that the bipartite caching scheme associated with the graph $B(K,D,F)$ has the uncached fraction $1-\frac{M}{N}=\frac{D}{F}$.
Fig. 1 shows a graph describing a bipartite caching scheme. Note that during the caching phase the user demands are not available. We now look at the transmission phase, during which each user $k\in\mathcal{K}$ demands one file $W_{d_k}$ (for some $d_k\in[N]$), as given in [7]. An induced matching $\mathcal{M}$ of a graph $G$ is a matching such that the subgraph of $G$ induced by the vertices of $\mathcal{M}$ is $\mathcal{M}$ itself. For an induced matching of $B$ consisting of edges $\{k_j,f_j\}$, $j\in[l]$, consider the associated transmission
$$\sum_{j=1}^{l} W_{d_{k_j},f_j}. \qquad (1)$$
As $\{k_j,f_j\}$ is an edge of the induced matching, $W_{d_{k_j},f_j}$ is a subfile unavailable but demanded at user $k_j$. For the same reason, each user $k_j$ has all the subfiles in (1) in its cache except for $W_{d_{k_j},f_j}$, hence user $k_j$ can decode $W_{d_{k_j},f_j}$.
A strong edge coloring of a graph is an assignment of a label (called a color) from a finite set to each of its edges such that the set of all edges of any color (called a color class) forms an induced matching. Consider the set of all induced matchings (color classes) arising from a strong edge coloring of $B$. It is not difficult to see that the transmissions (constructed as in (1)) corresponding to these color classes satisfy the demands of all the users. The rate of this transmission scheme is then the number of color classes divided by $F$.
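The delivery rule above can be illustrated with a small simulation. The sketch below makes several assumptions that are not in the original text: a toy bipartite graph with three users and three subfiles (user $k$ misses every subfile $f\neq k$), byte-valued subfiles with XOR as the group operation, and hypothetical helper names. It checks that each color class is an induced matching and that every user recovers its missing subfiles.

```python
import random
from fractions import Fraction
from itertools import combinations

def is_induced_matching(B, M):
    """B: set of (user, subfile) edges; M: iterable of edges of B."""
    M = list(M)
    if not all(e in B for e in M):
        return False
    for (k1, f1), (k2, f2) in combinations(M, 2):
        if k1 == k2 or f1 == f2:            # not a matching
            return False
        if (k1, f2) in B or (k2, f1) in B:  # extra edge in the induced subgraph
            return False
    return True

# Toy instance (assumption): user k does not cache subfile f whenever k != f.
K = F = 3
B = {(k, f) for k in range(K) for f in range(F) if k != f}
color_classes = [[(a, b), (b, a)] for a, b in combinations(range(K), 2)]

rng = random.Random(0)
N = 3
W = [[rng.randrange(256) for _ in range(F)] for _ in range(N)]  # W[i][f]
demand = [2, 0, 1]                                              # user k wants file demand[k]

def transmit(M):
    """XOR of the demanded, uncached subfiles along the matching, as in (1)."""
    x = 0
    for k, f in M:
        x ^= W[demand[k]][f]
    return x

def decode(k, M, x):
    """User k removes the subfiles it has cached and keeps its own."""
    for k2, f2 in M:
        if k2 != k:
            assert (k, f2) not in B  # subfile f2 is cached at user k
            x ^= W[demand[k2]][f2]
    return x

decoded = {}
for M in color_classes:
    assert is_induced_matching(B, M)
    x = transmit(M)
    for k, f in M:
        decoded[(k, f)] = decode(k, M, x)

rate = Fraction(len(color_classes), F)
```

In this toy instance the three color classes cover all six edges, so three transmissions serve all users and the rate is the number of color classes divided by $F$.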
III Lower bound on rate of delivery scheme for symmetric caching
In this section, we show a bound on the rate of the transmission scheme associated with a bipartite caching scheme associated with . As takes values from with uniform distribution, taking the base of logarithm as , we have the Shannon entropy of as . Thus . For a given bipartite caching scheme, a rate is said to be achievable if there exists some transmission scheme with rate that satisfies all client demands. We now prove a lower bound on the infimum of all achievable rates for a given bipartite caching scheme.
Theorem 1.
Let be any leftvertex (user vertex) of and let be the subgraph of induced by the vertices Let Let be a subset of vertices of taken in some order such that . For , let be the set of right vertices (subfiles) in which are adjacent to Let be the infimum of all achievable rates for the bipartite caching scheme defined by . Then .
In particular, we must have
(2) 
Proof:
We are given a valid coded caching scheme with the caching scheme associated with . Let denote the set of all transmissions in a valid transmission scheme . As , we can assume a demand scenario in which the users all demand different files. Let be the demand of and be the cache content of user . Let denote the set of subfiles of in the subgraph missing from users . This corresponds to subfile vertices adjacent to users in . In our notation, Since s are distinct, thus each subfile in is distinct. We then follow an idea similar to [5]. We construct a virtual receiver which contains an empty cache at first. In the step, the cache of this virtual user is populated with all the cache contents of user except those pertaining to the files demanded by Let Then is the final cache content of this virtual user. By the given transmission scheme, the receivers can decode their demands. Hence, we must have
(3) 
as the virtual user must be successively able to decode all the demands of the users. Since denotes the number of transmissions, we must have the following inequalities.
where denotes the mutual information, and the last inequality is obtained by noting the missing subfiles in .
We finally prove (2). By a pigeonholing argument, it is easy to see that there is a subfile vertex having at least adjacent user vertices. Let be any user vertex adjacent to such a subfile vertex with adjacent vertices. Consider the subgraph induced by vertices . Note that Let . If , consider some subset of containing . Then for any ordering of the user vertices starting from , we have . If , then by a similar ordering starting with , we have . Invoking the result in the first part completes the proof. ∎
IV Line Graphs of Bipartite Graphs and Caching
In this section, we map the coded caching problem to the line graph of the bipartite caching graph described in Section II. The line graph $\mathcal{L}(G)$ of a graph $G$ is a graph whose vertex set is the edge set of $G$, in which two vertices are adjacent if and only if the corresponding edges share a common vertex in $G$. The square of a graph $G$ is a graph $G^2$ having $V(G^2)=V(G)$, with an edge $\{u,v\}\in E(G^2)$ if and only if $\{u,v\}\in E(G)$ or there exists some $w\in V(G)$ such that $\{u,w\},\{w,v\}\in E(G)$. The following result is folklore and easy to prove.
Lemma 1.
Let $\mathcal{L}$ be the line graph of $B$. There exists a clique cover for $\overline{\mathcal{L}^2}$ (the complement of the square of $\mathcal{L}$) if and only if there exists a strong edge coloring for $B$, with the cliques in the clique cover of $\overline{\mathcal{L}^2}$ corresponding to the color classes (induced matchings) arising from the strong edge coloring of $B$.
By Lemma 1 and Section II, a valid transmission scheme corresponding to the caching scheme associated with can be obtained by obtaining a clique cover for . From the arguments in Section II, such a transmission scheme will involve one transmission per each clique in a clique cover of . Fig. 2 shows the graph for the line graph of the bipartite graph shown in Fig. 1. A clique cover consisting of cliques is also shown, each containing vertices. Thus the number of transmissions is , and the rate is , which is optimal for this graph as shown in Example 1.
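The correspondence in Lemma 1 can be checked by brute force on a small graph. The sketch below is only an illustration under stated assumptions: the bipartite graph is a hypothetical toy instance (three users, three subfiles, user $k$ missing every subfile $f\neq k$, not the graph of Fig. 1), and all function names are ours. It builds the line graph, its square and the complement, and verifies that the cliques of the resulting graph are exactly the induced matchings of the bipartite graph.

```python
from itertools import combinations

def line_graph(edges):
    """Adjacency of the line graph of a bipartite graph given as (user, subfile) edges."""
    V = sorted(edges)
    adj = {v: set() for v in V}
    for e1, e2 in combinations(V, 2):
        if e1[0] == e2[0] or e1[1] == e2[1]:  # common user or common subfile
            adj[e1].add(e2)
            adj[e2].add(e1)
    return adj

def square(adj):
    """u, v adjacent in the square iff adjacent or sharing a common neighbour."""
    sq = {v: set(nb) for v, nb in adj.items()}
    for v, nb in adj.items():
        for w in nb:
            sq[v] |= adj[w]
        sq[v].discard(v)
    return sq

def complement(adj):
    V = set(adj)
    return {v: V - adj[v] - {v} for v in adj}

def is_clique(adj, S):
    return all(v in adj[u] for u, v in combinations(list(S), 2))

def is_induced_matching(B, M):
    for (k1, f1), (k2, f2) in combinations(list(M), 2):
        if k1 == k2 or f1 == f2 or (k1, f2) in B or (k2, f1) in B:
            return False
    return all(e in B for e in M)

# Toy instance (assumption): user k misses subfile f whenever k != f.
K = 3
B = {(k, f) for k in range(K) for f in range(K) if k != f}
H = complement(square(line_graph(B)))

# Cliques of H coincide with induced matchings of B (checked on all 2- and 3-subsets).
agree = all(is_clique(H, S) == is_induced_matching(B, S)
            for r in (2, 3) for S in combinations(sorted(B), r))

# A clique cover of H: one transmission per clique.
cover = [[(a, b), (b, a)] for a, b in combinations(range(K), 2)]
```

Each clique of the cover yields one coded transmission, so the number of cliques in the cover is the number of transmissions.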
It turns out that the line graph of a left-regular bipartite graph is highly structured, and any graph with this structure is the line graph of such a bipartite graph.
Proposition 1.
A graph $\mathcal{L}$ containing $KD$ vertices is the line graph of a $D$-left-regular bipartite graph $B(K,D,F)$ if and only if the following conditions are satisfied.
(C1) The vertices of $\mathcal{L}$ can be partitioned into $K$ disjoint cliques containing $D$ vertices each. We call these cliques the user-cliques, and label their vertices accordingly.
(C2) Consider two distinct user-cliques. For any vertex in the first, there exists at most one vertex in the second that is adjacent to it.
(C3) For any user-clique and any vertex $v$ in it, the set containing $v$ and all adjacent vertices of $v$, except those in the user-clique of $v$, forms a clique. We refer to these cliques as the subfile-cliques. Let $r$ be the number of subfile-cliques in $\mathcal{L}$.
(C4) The number of right vertices of $B$ is
$$F = r. \qquad (4)$$
Proof:
We prove the If part. The Only If part can be inferred easily. We are given a graph satisfying properties (C1)(C4). To prove the If part, we first create a bipartite graph with left vertices and right vertices. Partitioning the right vertices into subsets of size each, we initialize the edge set of by assuming that the subset of right vertices in the partition are all adjacent to the left vertex. We also label the adjacent right vertices of as Note that the line graph contains cliques of size each and no other edges. Thus is a subgraph of as (C1) holds. Furthermore, we note that by conditions (C1)(C3), , where is the set of all edges in all the subfile cliques .
The proof proceeds by updating the graph by identifying rightvertices according to the subfilecliques of the given graph so that at any step , the updated graph is such that will be a subgraph of . Finally we will have , where is the number of subfile cliques.
We proceed by induction. Assume that for , is a subgraph of . We now explain how the graph is obtained from , and show that is a subgraph of . Consider the subfile clique of , given as . To obtain , we identify the right vertices in as a single right vertex. Note that these vertices in are welldefined, as the subfilecliques partition the vertices of the graph by Condition (C3). Furthermore, as (C2) holds, the vertices are all distinct. Hence continues to be leftregular, moreover with the same number of edges as . With all these facts, we can see that where is the set of edges in the clique . It follows that is a subgraph of . After the step, after going through all the subfile cliques, we have and hence .
We claim that the required bipartite graph is then . It remains to check that the number of right vertices of is . This is seen by noting that the number of right vertices of is , and the number of right vertices of is less than that of , . This completes the proof. ∎
By Proposition 1, if we construct a graph $\mathcal{L}$ satisfying conditions (C1)-(C3), then we have constructed a caching scheme based on a bipartite graph $B$ whose line graph is $\mathcal{L}$, with subpacketization as in (4). We therefore give the following definition.
Definition 2.
A graph $\mathcal{L}$ is called a caching line graph if it satisfies conditions (C1)-(C3) of Proposition 1 for some parameters $K$ and $D$.
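Conditions (C1)-(C3) are mechanical to verify for a given graph and a candidate partition into user-cliques. The following sketch shows one way to do so. The function names and the toy graph (the line graph of a bipartite graph on three users and three subfiles in which user $k$ misses every subfile $f\neq k$) are assumptions made for illustration only.

```python
from itertools import combinations

def is_clique(adj, S):
    return all(v in adj[u] for u, v in combinations(list(S), 2))

def check_caching_line_graph(adj, user_cliques):
    """Return True if the graph with the given user-cliques satisfies (C1)-(C3)."""
    V = set(adj)
    # (C1): the user-cliques partition V into cliques of equal size.
    if sum(len(U) for U in user_cliques) != len(V) or set().union(*user_cliques) != V:
        return False
    if len({len(U) for U in user_cliques}) != 1:
        return False
    if not all(is_clique(adj, U) for U in user_cliques):
        return False
    for U in user_cliques:
        for v in U:
            # (C2): at most one neighbour of v inside any other user-clique.
            if any(len(adj[v] & U2) > 1 for U2 in user_cliques if U2 is not U):
                return False
            # (C3): v and its neighbours outside its own user-clique form a clique.
            if not is_clique(adj, {v} | (adj[v] - U)):
                return False
    return True

def subfile_cliques(adj, user_cliques):
    """The distinct subfile-cliques defined by (C3)."""
    return {frozenset({v} | (adj[v] - U)) for U in user_cliques for v in U}

# Toy instance (assumption): vertex (k, f) is the edge "user k misses subfile f".
K = 3
vertices = [(k, f) for k in range(K) for f in range(K) if k != f]
adj = {v: {w for w in vertices if w != v and (w[0] == v[0] or w[1] == v[1])}
       for v in vertices}
user_cliques = [{v for v in vertices if v[0] == k} for k in range(K)]

ok = check_caching_line_graph(adj, user_cliques)
F = len(subfile_cliques(adj, user_cliques))  # number of subfile-cliques

# Joining (0, 1) to both vertices of user-clique 1 violates (C2).
bad = {v: set(nb) for v, nb in adj.items()}
for w in [(1, 0), (1, 2)]:
    bad[(0, 1)].add(w)
    bad[w].add((0, 1))
broken = check_caching_line_graph(bad, user_cliques)
```

In the toy instance the number of subfile-cliques equals the number of right vertices of the underlying bipartite graph.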
Henceforth all our line graphs are caching line graphs. By Lemma 1, any clique cover of $\overline{\mathcal{L}^2}$ (the complement of the square of $\mathcal{L}$) gives us a transmission scheme (one transmission per clique) that satisfies all receiver demands. In order to obtain a clique cover of $\overline{\mathcal{L}^2}$, we have to understand the behaviour of the cliques of $\overline{\mathcal{L}^2}$.
Lemma 2.
A subset of vertices is a clique of if and only if the following condition is true.

For any two vertices , there exists no vertex in adjacent to in .
Furthermore, any clique of contains at most one vertex from each of the usercliques of .
Proof:
Recall that We prove the only if part of the first claim. Let be a clique of . Now, suppose there are vertices such that there is some vertex in adjacent to in . Then by definition of , and will be adjacent in , and hence nonadjacent in . This contradicts our assumption that is a clique of .
Now the if part. Suppose the condition of the lemma is satisfied for some subset , but there are two vertices which are nonadjacent in , and hence adjacent in . By definition of , this means that either , or there exists some vertex such that and . In the former case, there is a clear contradiction of the condition of the lemma statement. In the latter case, we have . The case of is already handled by the former case. If no two of are equal to each other, this means must be in a clique of by (C3) of Proposition 1, which violates the condition of lemma. Thus, WLOG, we can assume , in which case we see that the vertex contradicts the assumed condition. This completes the proof of the first claim.
The last claim of the lemma follows from the first claim, since if some clique of contains two vertices from the same userclique of then the condition in the lemma is violated. ∎
We now define a specific class of caching line graphs called $(c,d)$-caching line graphs. We consider $(c,d)$-caching line graphs because their rate and subpacketization are easy to compute, as Theorem 2 will show.
Definition 3.
A caching line graph $\mathcal{L}$ such that $\mathcal{L}$ has a clique cover consisting of $c$-sized disjoint subfile cliques and $\overline{\mathcal{L}^2}$ has a clique cover consisting of $d$-sized disjoint cliques, for some positive integers $c,d$, is called a $(c,d)$-caching line graph.
Theorem 2.
Consider a $(c,d)$-caching line graph $\mathcal{L}$. Then there is a coded caching scheme consisting of the caching scheme given by $\mathcal{L}$ with $F=\frac{KD}{c}$ (and thus $1-\frac{M}{N}=\frac{c}{K}$), and the associated transmission scheme based on a clique cover of $\overline{\mathcal{L}^2}$ having rate $R=\frac{c}{d}$. Furthermore, if the number of files , the rate of this scheme satisfies
(5) 
where is the infimum of all achievable rates for with subpacketization .
Proof:
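Since there is a clique cover of $\mathcal{L}$ (which satisfies (C1)-(C3)) with $c$-sized disjoint subfile cliques, by Proposition 1, there exists a caching scheme with $F=\frac{KD}{c}$, as the $KD$ vertices of $\mathcal{L}$ are partitioned into subfile cliques of size $c$.
Clearly, we also have $R=\frac{KD/d}{F}=\frac{c}{d}$, as the clique cover of $\overline{\mathcal{L}^2}$ consists of $\frac{KD}{d}$ cliques, each giving one transmission.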
Remark 2.
We observe that (5) indicates that if the subpacketization is large compared to in a bipartite caching scheme, then a clique cover of with cliques of size makes the rate of the transmission scheme based on the clique cover of close to the optimal rate . Similarly if is much larger than , a clique cover of with size being brings close to optimal.
In the rest of this section, we reinterpret some priorly known coded caching schemes as schemes based on caching line graphs. In both these examples, it may be observed that the situation is similar to that of Remark 2; we have growing exponentially in as , but and hence we can keep the rate close to optimal.
Example 2.
For given parameters , let . We will now construct a caching line graph, which corresponds to the coded caching scheme of [2]. The caching line graph is initialized with cliques of size , indexed using . For each user , denote the vertices of the userclique as .
For each such that , we create a clique of size in consisting of the vertices by defining edges between all these vertices. It is easy to see that
For some sized , consider the set of vertices of given by
consisting of vertices of . It is not difficult to see that for any distinct , there exists no edge in from the vertex to any vertex in the userclique. Thus, by Lemma 2, forms a clique in of size . Also note that
Thus, the caching line graph is a caching line graph. Hence by Theorem 2, the subpacketization for this graph is
And the rate corresponding to the clique cover scheme on is
We have thus recovered the coded caching scheme of [2] using .
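The construction of this example can be reproduced programmatically. The sketch below is a reconstruction under stated assumptions rather than the author's exact notation: with $t = MK/N$, each vertex is written as a pair $(k, S)$, meaning that user $k$ does not cache the subfile indexed by the $t$-subset $S$ of users (so $k\notin S$); subfile cliques are indexed by $t$-subsets and the cliques used for delivery by $(t+1)$-subsets. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def ali_niesen_cliques(K, t):
    """User-cliques, subfile cliques and delivery cliques for the scheme of [2]."""
    users = range(K)
    subsets = list(combinations(users, t))
    user_cliques = {k: [(k, S) for S in subsets if k not in S] for k in users}
    subfile_cliques = {S: [(k, S) for k in users if k not in S] for S in subsets}
    delivery_cliques = {
        T: [(k, tuple(x for x in T if x != k)) for k in T]
        for T in combinations(users, t + 1)
    }
    return user_cliques, subfile_cliques, delivery_cliques

def is_partition(cliques, vertices):
    flat = [v for C in cliques for v in C]
    return len(flat) == len(set(flat)) and set(flat) == set(vertices)

K, t = 4, 2
user_cliques, subfile_cliques, delivery_cliques = ali_niesen_cliques(K, t)
vertices = [v for U in user_cliques.values() for v in U]

D = comb(K - 1, t)   # size of each user-clique
c, d = K - t, t + 1  # sizes of the subfile cliques and of the delivery cliques

sizes_ok = (all(len(U) == D for U in user_cliques.values())
            and all(len(C) == c for C in subfile_cliques.values())
            and all(len(C) == d for C in delivery_cliques.values()))
covers_ok = (is_partition(subfile_cliques.values(), vertices)
             and is_partition(delivery_cliques.values(), vertices))

# Within a delivery clique, vertex (k1, S1) must have no neighbour in the
# user-clique of k2, i.e. (k2, S1) must not be a vertex, i.e. k2 must lie in S1.
delivery_ok = all(k2 in S1 and k1 in S2
                  for C in delivery_cliques.values()
                  for (k1, S1), (k2, S2) in combinations(C, 2))

F = Fraction(K * D, c)  # subpacketization KD/c
R = Fraction(c, d)      # rate c/d
```

For $K=4$ and $t=2$ this gives $F=\binom{4}{2}=6$ and $R=2/3$, which agrees with the rate $K(1-M/N)/(1+KM/N)$ of [2] at $M/N=1/2$.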
Example 3.
We now recover a special case of the coded caching scheme based on resolvable designs from [11] which first appeared in [12]. Let be a dimensional linear single parity check code of length over a finite field . We initialize the caching line graph with usercliques, each consisting of vertices. We index the usercliques as , where , and . The vertices of the userclique are indexed as follows.
It is not difficult to see that
since we can think of as a coset of the subcode within . For a formal proof, we refer the reader to [12]. Thus, .
We now construct the subfile cliques as follows. For each vector
, we construct the cliqueby creating the edges between all the vertices in . Again, it is not difficult to see that Thus the cliques form a disjoint clique cover of . Furthermore , since by definition, an userclique does not contain if and only if and thus .
From the above construction of the subfileclique cover for we have from Theorem 2 that . We now construct a clique cover of . For , let be the codeword in such that is equal to at the coordinates but not at the coordinate. Note that a unique such codeword does exist in by definition of and . Now consider the set of vertices of given by
Note that . Also, for , there exists no edge from to any vertex in because in , the coordinate is precisely . Thus forms a clique of size in . Furthermore, it is not hard to see that
where the above union is a disjoint union. Thus the sized disjoint cliques s cover the vertices of . We have thus got a caching line graph . By Theorem 2, the coded caching scheme on has rate . We have hence recovered the scheme from [12].
Remark 3.
In the examples given so far in this section, we have essentially reverse engineered the schemes given in prior works and demonstrated how they can be interpreted according to the line graph framework presented in this work. We also remark that the caching schemes based on PDAs (placement delivery arrays) discussed in [8] and subsequent works can be seen in the framework of caching line graphs as well. However, the special class of $(c,d)$-caching line graphs seems to offer some advantages in terms of tracking the subpacketization and rate using the graph characteristics. In the forthcoming section, we present a new explicit construction of a caching scheme based on $(c,d)$-caching line graphs.
V A Line Graph based Coded Caching Scheme based on Projective Geometry
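We recollect some basic ideas of projective geometries over finite fields. The reader is referred to [17] for more details. For a positive integer $k$ and a prime power $q$, let $PG_q(k-1)$ denote the $(k-1)$-dimensional projective space over $\mathbb{F}_q$. The elements of $PG_q(k-1)$ are called the points of $PG_q(k-1)$. The points of $PG_q(k-1)$ can be considered as the representative vectors of one-dimensional subspaces of $\mathbb{F}_q^k$. For $m\in[k]$, consider the set of $m$-dimensional subspaces of $\mathbb{F}_q^k$. It is known that the size of this set is equal to the Gaussian binomial coefficient $\begin{bmatrix}k\\m\end{bmatrix}_q$, which is given by
$$\begin{bmatrix}k\\m\end{bmatrix}_q=\frac{(q^k-1)(q^{k-1}-1)\cdots(q^{k-m+1}-1)}{(q^m-1)(q^{m-1}-1)\cdots(q-1)}.$$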
In any Gaussian binomial coefficient given in this paper we assume that The following is known about the Gaussian binomial coefficients (see [17], for example).
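As a quick sanity check on Gaussian binomial coefficients, the sketch below evaluates them from the standard product formula and compares the result with a brute-force count of subspaces over the binary field. The function names are ours, and the brute-force count is restricted to $q=2$ (vectors encoded as integers, addition as XOR) purely for simplicity.

```python
from itertools import combinations

def gaussian_binomial(k, m, q):
    """Number of m-dimensional subspaces of a k-dimensional space over F_q."""
    if m < 0 or m > k:
        return 0
    num = den = 1
    for i in range(m):
        num *= q ** (k - i) - 1  # (q^k - 1)(q^(k-1) - 1)...(q^(k-m+1) - 1)
        den *= q ** (m - i) - 1  # (q^m - 1)(q^(m-1) - 1)...(q - 1)
    return num // den            # the quotient is always an integer

def span_gf2(vectors):
    """Span over F_2 of vectors encoded as integers (addition is XOR)."""
    S = {0}
    for v in vectors:
        S |= {s ^ v for s in S}
    return frozenset(S)

def count_subspaces_gf2(k, m):
    """Brute-force count of m-dimensional subspaces of F_2^k."""
    spans = set()
    for combo in combinations(range(1, 2 ** k), m):
        S = span_gf2(combo)
        if len(S) == 2 ** m:     # the m chosen vectors are linearly independent
            spans.add(S)
    return len(spans)

print(gaussian_binomial(4, 2, 2), count_subspaces_gf2(4, 2))
```

The brute-force count is feasible only for very small $k$, but it confirms the formula and the symmetry between dimensions $m$ and $k-m$ in small cases.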
Lemma 3.

The Gaussian binomial coefficient is the number of subspaces of dimension of any dimensional subspace over . Also, .

The number of elements of