 # Linear Programming Approximations for Index Coding

Index coding, a source coding problem over broadcast channels, has been a subject of both theoretical and practical interest since its introduction by Birk and Kol (1998). In short, the problem can be defined as follows: there is an input x = (x_1, ..., x_n), a set of n clients who each desire a single symbol x_i of the input, and a broadcaster whose goal is to send as few messages as possible to all clients so that each one can recover its desired symbol. Additionally, each client has some predetermined "side information," corresponding to certain symbols of the input x, which we represent as the "side information graph" G. The graph G has a vertex v_i for each client and a directed edge (v_i, v_j) indicating that client i knows the jth symbol of the input. Given a fixed side information graph G, we are interested in determining or approximating the "broadcast rate" of index coding on the graph, i.e. the smallest number of messages the broadcaster can transmit so that every client gets its desired information. Using index coding schemes based on linear programs (LPs), we take a two-pronged approach to approximating the broadcast rate. First, extending earlier work on planar graphs, we focus on approximating the broadcast rate for special graph families such as graphs with small chromatic number and disk graphs. In certain cases, we are able to show that simple LP-based schemes give constant-factor approximations of the broadcast rate, which seem extremely difficult to obtain in the general case. Second, we provide several LP-based schemes for the general case which are not constant-factor approximations, but which strictly improve on the prior best-known schemes.


## I Introduction

Index coding is a particular form of network coding that was first introduced by Birk and Kol, and has since been shown to be in some sense as difficult as any other network coding problem. It is a multiuser communication problem in which a broadcaster aims to transmit data to many users. While the users are unable to communicate amongst themselves, some of them already possess data desired by other users, which we call the “side information.” The goal is then to design transmission schemes for the broadcaster and corresponding decoding schemes for the users that exploit this side information in order to get each user their desired data in a minimum number of broadcaster transmissions.

More formally, we have a set of $n$ clients, which we refer to simply by number, and each client $i$ desires the corresponding message $x_i$ from the set $\{x_1, \dots, x_n\}$, where each message belongs to an alphabet of size $q$. Additionally, each client has some side information, consisting of a subset of the other clients' messages. We define the (directed) side information graph $G$ of the index coding instance to be the graph with vertices $v_1, \dots, v_n$ corresponding to clients, and an edge $(v_i, v_j)$ whenever client $i$ knows $x_j$. Then the goal is for a broadcaster to transmit messages, each belonging to the same alphabet, simultaneously to all clients so that every client $i$ can reconstruct $x_i$ as a function of its side information and the messages sent by the broadcaster.

Specifically, if there exist an encoding function, which maps the $n$ messages to a fixed number of transmitted symbols, and a decoding function for each client $i$, which outputs $x_i$ given the transmitted symbols and that client's side information, then we say this is a solution to the index coding problem on $G$ in that number of rounds. The minimal number of rounds needed to obtain a solution also depends on $q$, the size of the alphabet. We define $\mathrm{Ind}_q(G)$ to be the minimum number of rounds such that a solution exists on $G$ over an alphabet of size $q$. We then define the index coding rate or the broadcast rate of the graph as

$$\mathrm{Ind}(G) = \inf_{q \ge 2} \mathrm{Ind}_q(G). \tag{1}$$

Some special types of index coding scheme require attention before we continue further. Suppose the alphabet is a vector space over some finite field and the encoding function is linear over that field. If each message is a single field element and the broadcaster sends only linear combinations of the messages, the message-sending scheme is called scalar linear. If instead each message is a vector of several field elements, and the broadcaster is allowed to break up the messages into smaller packets (individual field elements) and transmit linear combinations of the packets, the scheme is called vector linear. To be more precise, in both cases the encoding function consists of several different functions, one per transmitted symbol: for scalar linear schemes each function is a linear combination, over the field, of the messages, and for vector linear schemes each function is a linear combination, over the field, of the packets. All scalar linear schemes are also vector linear schemes. If a scheme is not vector linear, it is called nonlinear. In this paper we will focus on the quality of solutions relative to the best possible nonlinear scheme, although all schemes we provide are vector linear.
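To make the definition of a scalar linear scheme concrete, here is a minimal sketch on a toy instance of our own choosing (it does not appear in the results below): three clients on a directed 3-cycle, where client $i$ knows the message of client $i+1$. The dictionary `knows` and the function names are illustrative assumptions. Two XOR transmissions suffice where three uncoded transmissions would otherwise be needed.

```python
from itertools import product

# Hypothetical toy instance: a directed 3-cycle over one-bit messages.
# knows[i] lists the message indices that client i holds as side information.
knows = {0: [1], 1: [2], 2: [0]}

def encode(x):
    """Broadcast two GF(2) linear combinations instead of three raw bits."""
    return (x[0] ^ x[1], x[1] ^ x[2])

def decode(i, tx, side):
    """Recover x[i] from the broadcast tx and client i's side information."""
    if i == 0:
        return tx[0] ^ side[1]          # (x0 ^ x1) ^ x1 = x0
    if i == 1:
        return tx[1] ^ side[2]          # (x1 ^ x2) ^ x2 = x1
    return tx[0] ^ tx[1] ^ side[0]      # (x0 ^ x2) ^ x0 = x2

def scheme_is_valid():
    """Exhaustively check all inputs in {0, 1}^3."""
    for x in product((0, 1), repeat=3):
        tx = encode(x)
        for i in range(3):
            side = {j: x[j] for j in knows[i]}
            if decode(i, tx, side) != x[i]:
                return False
    return True
```

Calling `scheme_is_valid()` checks every one of the eight possible inputs, so a `True` result certifies that this particular two-round scheme is a solution for the toy instance.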

### I-A Related Work

Without any restriction on the graph or the encoding function, no bounded time algorithm is known for finding $\mathrm{Ind}(G)$ exactly, as little is understood about the speed at which the rates $\mathrm{Ind}_q(G)$ converge (therefore even an exponential-time algorithm to estimate $\mathrm{Ind}(G)$ is of interest). This is in contrast to the scalar linear case with fixed alphabet size, in which the broadcast rate is known to be equal to another graph parameter called “minrank,” and finding this quantity exactly is known to be in NP. The best known approximation factor in general (i.e. the multiplicative factor by which the rate of the scheme returned by the algorithm may exceed $\mathrm{Ind}(G)$) barely improves on the trivial factor-$n$ approximation obtained by broadcasting each client’s message individually. For graphs with small minrank, a scalar linear index coding scheme has also been provided whose approximation factor is nontrivial when the minrank is constant. In the negative direction, it has been shown that finding any constant-factor approximation of $\mathrm{Ind}(G)$ in general is at least as hard as some well-known open problems in graph coloring. In this paper, we explore two different approaches to make progress despite this difficulty. The first approach is to restrict the side information structure to some specific type of graph, and attempt to exploit its properties to attain better approximations than are possible in general. The second is to find ways of strictly improving the existing schemes for the general case, though we cannot quantify the improvement asymptotically.

For perfect graphs (a class including all bipartite graphs, which will be defined in section II), it has been known for some time that the index coding rate can be computed exactly, as it is sandwiched between two graph parameters that are equal. For more general classes than this, exactly computing the broadcast rate seems too much to ask, and we seek instead to approximate it as well as possible. There has been some work already in the area of approximating $\mathrm{Ind}(G)$ for restricted graph classes: Arbabjolfaei and Kim show a simple 4-approximation of $\mathrm{Ind}(G)$ (meaning the returned solution has rate at most $4\,\mathrm{Ind}(G)$) for undirected planar graphs, and Mazumdar et al. improve this to obtain a 2-approximation of $\mathrm{Ind}(G)$ for undirected planar graphs. In the (even more restricted) outerplanar case, while the scalar linear index coding rate with a fixed-size alphabet has been studied (it is in fact shown to be equal to the size of the minimum clique cover of $G$), the nonlinear rate has not been studied beyond the known results for planar graphs. In general, it has been shown that the linear and nonlinear index coding rates can be extremely far apart, so the nonlinear case merits study even when the linear case is solved [18, 7]. The main technique used to approximate $\mathrm{Ind}(G)$ for planar graphs is to exploit the “dual” relationship between $\mathrm{Ind}(G)$ and another, easier to approximate quantity called the storage capacity, or $\mathrm{Cap}(G)$. The relationship between these quantities has also been used to show some lower bounds on $\mathrm{Ind}(G)$ for very restricted graph classes such as odd cycles. We will make use of this general technique as well, and will define $\mathrm{Cap}(G)$ and explore its relationship with $\mathrm{Ind}(G)$ further in section II.

In the general case (recall this includes directed graphs), there has been a series of works providing increasingly better schemes. Birk and Kol provided the first such scheme when introducing the problem, the “clique cover” scheme, in which the side information graph is covered by as few vertex-disjoint cliques as possible. In this scheme the broadcaster transmits a single message for each clique, which is the sum (as vectors with entries in the underlying field) of the vectors desired by each node in the clique. Such a clique covering is equivalent to a proper coloring of the complementary graph. This idea was later extended to show that in fact a weaker notion of coloring called a “local coloring” of the complementary graph yields an index coding scheme as well. Another early generalization of the clique cover scheme is to instead cover by “partial cliques,” which are nearly-complete subgraphs.
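The clique cover scheme described above can be sketched in a few lines. This is an illustrative sketch under our own assumptions: messages are integers combined by bitwise XOR (addition of bit vectors over GF(2)), the cover is found greedily and so is not necessarily minimum, and the names `greedy_clique_cover`, `broadcast`, and `decode` are hypothetical.

```python
from functools import reduce

def greedy_clique_cover(knows):
    """Partition the clients into cliques greedily (not necessarily minimum).

    knows[i] is the set of message indices client i holds; a group of clients
    is a clique when every member knows every other member's message.
    """
    cover = []
    for v in sorted(knows):
        for clique in cover:
            if all(u in knows[v] and v in knows[u] for u in clique):
                clique.append(v)
                break
        else:
            cover.append([v])
    return cover

def broadcast(cover, x):
    """One transmission per clique: the XOR of the messages its members want."""
    return [reduce(lambda a, b: a ^ b, (x[v] for v in clique)) for clique in cover]

def decode(v, cover, tx, x):
    """Client v cancels the other clique members' messages, all of which it knows."""
    idx = next(i for i, clique in enumerate(cover) if v in clique)
    out = tx[idx]
    for u in cover[idx]:
        if u != v:
            out ^= x[u]  # u is in knows[v] because cover[idx] is a clique
    return out
```

For example, with `knows = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}` the greedy cover is a triangle plus an edge, so two transmissions serve all five clients.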

More recently, ideas from both the local coloring and partial clique cover schemes were merged into a linear program (LP)-based scheme which outperforms both schemes individually. We continue in this line of work, showing a novel LP-based index coding scheme which combines ideas from previous schemes in order to obtain strictly better performance. Our scheme can also be extended to generalize an earlier scheme which covers the side information graph by a type of generalized cycle, rather than by cliques or partial cliques.

### I-B Contributions

All our contributions consist of (vector linear) index coding schemes, in various settings, as opposed to lower bounds on $\mathrm{Ind}(G)$. Additionally, all our schemes correspond to solutions of particular linear programs, which will be described in more detail in sections II and III. For special graph families, we have chosen to focus specifically on undirected graphs, both for the sake of simplicity and for parity, as one family we consider (disk graphs) has no directed analogue. In the general case, we consider directed graphs as well.

#### I-B1 Approximations for Special Graph Families

Continuing this line of work, we generalize beyond the case of undirected planar graphs to any undirected graph with small chromatic number. We prove new bounds on $\mathrm{Cap}(G)$ and $\mathrm{Ind}(G)$ that recover the known results for planar graphs, give superior results for 3-colorable graphs, and also give constant-factor approximations for graphs with constant chromatic number. The techniques used for these types of graph and the barriers to progress that seem to arise give insight about other cases as well; as evidence of this, we use some of the same bounds used to prove results about $k$-colorable graphs in order to improve the best known approximation of $\mathrm{Ind}(G)$ for undirected sparse graphs.

The other main graph class we consider is more practically motivated. If our graph arises from thresholding the latencies between pairs of servers to 0 or 1, and these latencies roughly correspond to physical distances between servers in the real world, then we should expect two servers that are physically close to have an edge between them, and two servers that are far apart to not have an edge between them. This is very close to the notion of a “unit disk graph,” which is a graph formed by placing points in the plane that correspond to the vertices, and having an edge between two vertices whenever the corresponding points are less than some distance apart (we define this more formally in the next section). These graphs are thought to be good approximations of certain kinds of real-world networks, and in particular have seen widespread use in the area of scheduling problems for broadcast networks [12, 13]. In this setting there are many broadcasters which each have some radius in which they broadcast, and we may wish to, for instance, assign frequencies to each broadcaster so that no two broadcasters in the same area are broadcasting on the same frequency. This can be viewed as a coloring problem on a disk graph, where colors correspond to frequencies, and broadcasters correspond to vertices.

There are also prior examples of hard problems which are very difficult to approximate for general graphs, but for which good approximations exist when restricting to unit disk graphs; for example, it is well-known that maximum independent set cannot be approximated within any constant factor (in polynomial time) in general, but when restricting to unit disk graphs there is a polynomial time approximation scheme. We show improvements over the general approximation of $\mathrm{Cap}(G)$ for a superclass of unit disk graphs, as well as constant-factor but potentially inefficient approximations of $\mathrm{Ind}(G)$ for unit disk graphs, which can be made efficient in some special cases.

#### I-B2 Improved Schemes for the General Case

One of the earliest index coding schemes for the general case is the simple “clique cover” scheme, and since its introduction various different generalizations have been provided, such as “local graph coloring” and the “partial clique cover” scheme. Our work, expanding on a previously introduced interference alignment approach, gives a method that combines many of these “orthogonal” generalizations together. We give an example of a side information graph which shows that our new method can provide strict improvement over previous approaches. Furthermore, using ideas and tools from the previous scheme we further generalize another scheme which exploits what are called “Generalized Interlinked Cycles” in the side information graph.

#### I-B3 Paper Overview

The remainder of the paper is organized as follows:

• In section II we introduce some definitions and notation that are needed to state and prove our main results.

• In section III we summarize our main results, including several constant-factor approximations for special graph families, and improved schemes for the general case. Proofs are postponed until the next section.

• In section IV we state and prove bounds from which the quality of our approximations follows for special graph families, and prove the correctness of the schemes for the general case. For the special graph families, many of the bounds proved here actually imply good approximations for more general classes of graph than those focused on in the previous section, but we have chosen to highlight the results for those specific types of graph for greater clarity of exposition.

• In section V we provide detailed constructions of the improved schemes for general graphs presented in section III.

• In section VI we explore some difficulties in improving certain results further, including examples that demonstrate barriers to the success of some current proof techniques. We also discuss several interesting open questions and potential improvements to our results.

## II Prerequisites

Let us define our notation for sets, graphs, vectors, and matrices at the outset.

• For any , .

• For any , .

• The complement of a set is denoted by .

• For a graph $G$, $\overline{G}$ denotes the directed complement of $G$.

• For any set and set of vectors , denotes the set and denotes the matrix . For a matrix , denotes the sub-matrix of constructed from the columns of corresponding to .

• For a graph $G$, $N(v, G)$ denotes the set of out-neighbors of $v$. When the graph is clear from context, we shorten this to $N(v)$.

• An -MDS matrix is a matrix in , any field, with the property that any column vectors of the matrix are linearly independent.

Given a graph $G$ and a subset of its vertices, we consider the subgraph of $G$ induced on that subset. We write $\alpha(G)$ for the size of the maximum independent set of $G$, i.e., the size of the largest set of vertices whose induced subgraph is edgeless. Many of our results give approximations with quality depending on the chromatic number $\chi(G)$, the minimum number of colors needed to color the vertices of $G$ such that no two adjacent vertices have the same color (such a coloring is called a “proper coloring”). Some results also make use of a related quantity, called the local chromatic number, which is the maximum number of colors in any out-neighborhood of a vertex, minimized over all proper colorings of $G$. A few results depend also on the size of the largest clique (complete subgraph) in $G$, the clique number, written $\omega(G)$.

A planar graph is a graph with an embedding into the plane such that no two edges cross. An outerplanar graph is a planar graph, with the additional restriction that it has an embedding into the plane such that all vertices lie on the exterior face of the graph (i.e. a drawing exists with no vertex enclosed by edges). A perfect graph is a graph $G$ with the property that for every induced subgraph $H$ of $G$, $\omega(H) = \chi(H)$. This class includes all bipartite graphs, and it is also known that the complement of every perfect graph is perfect.

Another type of graph we consider here is the “disk graph,” often thought to be a good model of real-world networks where connections between nodes are based on their proximity in some metric. Disk graphs are a special case of geometric intersection graphs; these are the graphs which can be formed by placing shapes (usually of some restricted form) in the plane (or sometimes a higher dimensional space), then associating each shape with a vertex, and defining two vertices to have an edge whenever their corresponding shapes overlap (or touch at a single point). Any layout of shapes in the plane which corresponds to a specific graph $G$ in this way is called a geometric representation of $G$. Whenever a graph has such a geometric representation, we say it is an intersection graph. In a disk graph, we require that the graph has a geometric representation where all shapes are circles, but of possibly varying sizes. In a unit disk graph, or UDG, we further require that all such circles have unit radius, i.e. radius 1. We will even consider a special case of unit disk graphs, introduced by Hunt et al., called $\lambda$-precision unit disk graphs, which are those unit disk graphs for which there exists a geometric representation where every pair of disk centers is distance at least $\lambda$ from one another.
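The definitions of unit disk graphs and $\lambda$-precision can be illustrated with a short sketch. It assumes the convention stated above that two unit-radius disks are adjacent when they overlap or touch, i.e. when their centers are at distance at most 2; the function names and the sample coordinates are hypothetical.

```python
from itertools import combinations
from math import dist

def unit_disk_graph(centers):
    """Edges between unit-radius disks that overlap or touch (distance <= 2)."""
    return {(i, j)
            for (i, p), (j, q) in combinations(enumerate(centers), 2)
            if dist(p, q) <= 2.0}

def is_lambda_precision(centers, lam):
    """True when every pair of disk centers is at least lam apart."""
    return all(dist(p, q) >= lam for p, q in combinations(centers, 2))
```

For the centers `[(0, 0), (1.5, 0), (4, 0)]`, only the first two disks intersect, and the representation is 1-precision but not 2-precision because the closest pair of centers is 1.5 apart.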

We say a subset of vertices is a vertex cover of $G$ if every edge of the graph includes some vertex in the subset, and we are interested in the minimum size of all such covers. We can relax the notion of a vertex cover to the following LP, whose optimal solution we refer to as the minimum fractional vertex cover:

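$$\begin{aligned} \min \quad & \sum_{v \in V(G)} x_v \\ \text{s.t.} \quad & x_u + x_v \ge 1 && \text{for every edge } (u,v) \in E(G) \\ & 0 \le x_v \le 1 && \forall v \in V(G). \end{aligned}$$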

A matching in a graph $G$ is a subset of edges with the property that no vertex of $G$ is incident to more than one edge of the subset, and we are interested in the size of the maximum matching of $G$. Similar to vertex cover, we can relax this notion to the following LP for fractional maximum matching:

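$$\begin{aligned} \max \quad & \sum_{e \in E(G)} y_e \\ \text{s.t.} \quad & \sum_{e \in E(G) : v \in e} y_e \le 1 && \forall v \in V(G) \\ & 0 \le y_e \le 1 && \forall e \in E(G). \end{aligned}$$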

It is well-known that this is the dual LP to that for fractional vertex cover, and thus by duality the optimal values of the two LPs are equal for any graph.
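The duality statement can be checked by hand on a small example. The sketch below is our own illustration on the 5-cycle: it exhibits a feasible fractional vertex cover and a feasible fractional matching of equal value, which by weak duality certifies that both are optimal. The variable and function names are assumptions made for this example.

```python
from fractions import Fraction

# 5-cycle: vertices 0..4, with an edge between consecutive vertices.
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]
half = Fraction(1, 2)

x = {v: half for v in range(n)}   # candidate fractional vertex cover
y = {e: half for e in edges}      # candidate fractional matching

def cover_feasible(x):
    """Check 0 <= x_v <= 1 and x_u + x_v >= 1 on every edge."""
    return (all(0 <= x[v] <= 1 for v in x)
            and all(x[u] + x[v] >= 1 for u, v in edges))

def matching_feasible(y):
    """Check 0 <= y_e <= 1 and total weight at most 1 at every vertex."""
    load = {v: sum(y[e] for e in edges if v in e) for v in range(n)}
    return (all(0 <= y[e] <= 1 for e in y)
            and all(load[v] <= 1 for v in load))

cover_value = sum(x.values())
matching_value = sum(y.values())
```

Both values equal 5/2 here, whereas the integral quantities on the 5-cycle differ: the maximum matching has size 2 and the minimum vertex cover has size 3.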

As mentioned briefly in section I, it will be useful for us to consider another graph parameter which turns out to be closely related to the index coding rate, called the storage capacity of the graph, or $\mathrm{Cap}(G)$. Intuitively, the storage capacity corresponds to the maximum size of an error-correcting code in which each vertex of the graph stores a symbol from an alphabet of size $q$, and we require that if any single vertex fails (in a detectable way) and its data becomes inaccessible, the $q$-ary symbol stored at it can be recovered as a function of only that vertex’s neighbors in the graph. Thus if the graph is complete, this reduces to the notion of a single-erasure correcting code, as then there are no restrictions on which locations can be accessed to recover.

Formally, we say a set of codewords $C$ is a recoverable distributed storage system code for the graph $G$ over an alphabet of size $q$ if there exist decoding functions, one for each vertex, such that for any codeword in $C$, the symbol stored at each vertex equals that vertex's decoding function applied to the symbols stored at its neighbors. We are primarily interested in the question of how large any such code can be over some particular network; to this end we define the storage capacity

$$\mathrm{Cap}_q(G) = \max_{C} \log_q |C| \tag{2}$$

where the maximum is taken over all recoverable distributed storage system codes $C$ over an alphabet of size $q$, and we then define the overall capacity to be

$$\mathrm{Cap}(G) = \sup_{q \ge 2} \mathrm{Cap}_q(G). \tag{3}$$

A main result of earlier work proves the following somewhat unexpected dual relationship between the storage capacity and the index coding rate for a graph $G$ with $n$ vertices:

$$\mathrm{Cap}(G) = n - \mathrm{Ind}(G). \tag{4}$$

Thus finding either quantity exactly is equally hard, though there is no reason to expect the two to be equally hard to approximate, and indeed it seems generally to be the case that $\mathrm{Ind}(G)$ is much harder to approximate than $\mathrm{Cap}(G)$. We will see later on that we are sometimes able to exploit the relationship between these two quantities to give guarantees about the quality of certain approximations, in particular leveraging bounds on $\mathrm{Cap}(G)$ to get at the otherwise difficult to approximate $\mathrm{Ind}(G)$.

It is also known that $\mathrm{Cap}(G)$ is sandwiched between the size of the maximum matching of $G$ and the minimum vertex cover of $G$, which is used in proving the results for planar graphs. The fact that taking both endpoints of each edge in a maximum matching yields a feasible vertex cover implies these two quantities are at most a factor of 2 apart, so this yields a simple 2-approximation of $\mathrm{Cap}(G)$ for any graph. Thus when we try to approximate $\mathrm{Cap}(G)$ for restricted $G$, we are primarily interested in improving on the 2-approximation, whereas for $\mathrm{Ind}(G)$, almost any nontrivial approximation is of interest.
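The factor-2 gap between matchings and vertex covers rests on a standard argument that is easy to test: the endpoints of any maximal matching form a vertex cover of size twice the matching. The sketch below uses a greedy maximal matching rather than a maximum one (the argument works for either); the function names are hypothetical.

```python
def maximal_matching(edges):
    """Greedy maximal matching: keep an edge when both endpoints are still free."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

def cover_from_matching(matching):
    """Both endpoints of every matched edge; a vertex cover if the matching is maximal."""
    return {v for edge in matching for v in edge}
```

On the path with edges `[(0, 1), (1, 2), (2, 3)]`, the matching `[(0, 1), (2, 3)]` gives the cover `{0, 1, 2, 3}`. Taking only one endpoint per matched edge (say 0 and 3) would leave the edge `(1, 2)` uncovered, which is why both endpoints are needed.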

The primary quantity we will use to approximate the storage capacity of a graph $G$ is the maximum fractional clique packing of $G$, an LP relaxation of clique packing in which we try to pack as many large cliques within $G$ as possible. Specifically, we write $\mathrm{FCP}(G)$ for the optimal value of the following LP, where $\mathcal{K}$ denotes the set of all cliques in $G$:

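$$\begin{aligned} \max \quad & \sum_{C \in \mathcal{K}} x_C \,(|C| - 1) \\ \text{s.t.} \quad & \sum_{C \in \mathcal{K} : v \in C} x_C \le 1 && \forall v \in V(G) \\ & 0 \le x_C \le 1 && \forall C \in \mathcal{K}. \end{aligned}$$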

Note that in general we may not be able to compute the solution to this LP efficiently without a bound on the size of the largest clique in $G$. The main reason $\mathrm{FCP}(G)$ proves useful as an approximation of $\mathrm{Cap}(G)$ is the bound

$$\mathrm{FCP}(G) \le \mathrm{Cap}(G)$$

shown in prior work. For approximating the index coding rate of $G$ rather than the capacity, we will use the complementary quantity $\mathrm{FCC}(G)$, the size of the minimum fractional clique cover of $G$, where we instead seek to use as few cliques as possible in order to cover every vertex of $G$ by some clique. This quantity is equal to the optimal value of the following LP:

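$$\begin{aligned} \min \quad & \sum_{C \in \mathcal{K}} y_C \\ \text{s.t.} \quad & \sum_{C \in \mathcal{K} : v \in C} y_C \ge 1 && \forall v \in V(G) \\ & 0 \le y_C \le 1 && \forall C \in \mathcal{K}. \end{aligned}$$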

It is a simple exercise to see that $\mathrm{FCC}(G) = n - \mathrm{FCP}(G)$, so we will sometimes use these two quantities interchangeably depending on what is most convenient. The above relationship between $\mathrm{Cap}(G)$ and $\mathrm{Ind}(G)$ also immediately yields the upper bound

$$\mathrm{Ind}(G) \le \mathrm{FCC}(G), \tag{5}$$

which has been known for some time in the index coding literature.
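To spell out the step behind eq. 5, the following one-line derivation combines eq. 4 with the bound $\mathrm{FCP}(G) \le \mathrm{Cap}(G)$. It is a sketch that assumes the identity relating the two LP values is $\mathrm{FCC}(G) = n - \mathrm{FCP}(G)$, where $n$ is the number of vertices of $G$:

```latex
\mathrm{Ind}(G) \;=\; n - \mathrm{Cap}(G) \;\le\; n - \mathrm{FCP}(G) \;=\; \mathrm{FCC}(G).
```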

Another bound on $\mathrm{Ind}(G)$ on which we will rely heavily in our approximations is that $\mathrm{Ind}(G)$ is lower bounded by the size of the maximum acyclic induced subgraph of $G$, or $\mathrm{MAIS}(G)$. For undirected $G$, $\mathrm{MAIS}(G) = \alpha(G)$, but in general for directed $G$ we have only $\alpha(G) \le \mathrm{MAIS}(G)$, as every independent set clearly induces an acyclic subgraph. So it is always true that

$$\alpha(G) \le \mathrm{MAIS}(G) \le \mathrm{Ind}(G) \le \mathrm{FCC}(G). \tag{6}$$

From this we can see why it is easy to find $\mathrm{Ind}(G)$ exactly if $G$ is perfect, as then $\overline{G}$ is perfect also, so if we write $\mathrm{CC}(G)$ for the minimum integral clique cover of $G$, we have

$$\omega(\overline{G}) = \alpha(G) \le \mathrm{Ind}(G) \le \mathrm{FCC}(G) \le \mathrm{CC}(G) = \chi(\overline{G}), \tag{7}$$

and the leftmost and rightmost terms are equal as $\overline{G}$ is perfect. While both $\alpha(G)$ and $\chi(\overline{G})$ are NP-hard to compute in general, we can instead compute any more nicely-behaved quantity sandwiched between them, such as the Lovász theta function.
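The gap between $\alpha(G)$ and $\mathrm{MAIS}(G)$ for directed graphs in eq. 6 can be seen on a tiny example. The brute-force sketch below is exponential-time and only meant for very small graphs; the function names are hypothetical, and the directed 3-cycle is our own choice of instance.

```python
from itertools import combinations

def is_acyclic(vertices, edges):
    """Repeatedly strip vertices that have no incoming edge from the remainder."""
    remaining = set(vertices)
    while remaining:
        sources = {v for v in remaining
                   if not any(u in remaining and w == v for u, w in edges)}
        if not sources:
            return False  # every remaining vertex has an in-edge, so a cycle exists
        remaining -= sources
    return True

def largest_subset(vertices, predicate):
    """Size of the largest vertex subset satisfying predicate (brute force)."""
    for size in range(len(vertices), 0, -1):
        if any(predicate(set(s)) for s in combinations(vertices, size)):
            return size
    return 0

def alpha(vertices, edges):
    """Largest set of vertices with no edge between any two of them."""
    return largest_subset(
        vertices, lambda s: not any(u in s and w in s for u, w in edges))

def mais(vertices, edges):
    """Largest set of vertices whose induced subgraph has no directed cycle."""
    def induced(s):
        return [(u, w) for u, w in edges if u in s and w in s]
    return largest_subset(vertices, lambda s: is_acyclic(s, induced(s)))
```

On the directed 3-cycle with edges `[(0, 1), (1, 2), (2, 0)]`, every pair of vertices is joined by an edge, so the independence number is 1, while deleting any single vertex leaves an acyclic graph on 2 vertices.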

Finally, we will in certain cases wish to cover the graph instead by a generalization of a clique, called a $k$-partial clique. A $k$-partial clique on $s$ vertices is a subgraph in which every vertex has at least $s - 1 - k$ out-neighbors, and at least one vertex has exactly $s - 1 - k$ out-neighbors. Thus, a complete subgraph on $s$ vertices is a $0$-partial clique.
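As an illustration of the partial clique definition, the sketch below computes the smallest $k$ for which a vertex subset induces a $k$-partial clique. It assumes the convention that a $k$-partial clique on $s$ vertices has minimum out-degree exactly $s - 1 - k$ within the subgraph and that the graph has no self-loops; the function name and the `out_neighbors` mapping are hypothetical.

```python
def partial_clique_parameter(subset, out_neighbors):
    """Smallest k such that the subgraph induced on `subset` is a k-partial clique.

    Assumed convention: a k-partial clique on s vertices has minimum
    out-degree exactly s - 1 - k inside the induced subgraph.
    """
    s = len(subset)
    min_out = min(len(out_neighbors[v] & subset) for v in subset)
    return s - 1 - min_out
```

Under this convention a complete (bidirected) triangle gives $k = 0$, while a directed 3-cycle, in which each vertex has a single out-neighbor, gives $k = 1$.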

## III Main Results

In this paper, we present primarily two types of results for approximating the index coding rate of a graph: those which apply only to graphs in specific families, and those which apply to general graphs. When working with a special family, we can often provide good approximations of the index coding rate by using simple schemes but leveraging properties of the graph to prove these simple schemes are effective. In contrast, as it is known to be difficult to approximate the index coding rate in the general case, most of our results in the general (directed) setting do not provide provably good approximations; instead, they can be viewed as methods of strengthening the simple schemes to ones that perform strictly better, although we are not always able to rigorously quantify how much better they perform.

### III-A Approximation Results for Special Graph Families

Most of the results in this paper relating to specific graph families do not depend fundamentally on the graph family itself, but rather on certain nice properties of the graph family such as small chromatic number. In this section we do not state our results in full generality or prove them, but instead give instantiations of the general results with respect to the graph families we are most interested in. The most general versions of these results are stated and proven in section IV.

At a high level, the common technique used in these results is to employ the (relatively) easy-to-compute quantity $\mathrm{FCP}(G)$ as an approximation of $\mathrm{Cap}(G)$, and similarly to use $\mathrm{FCC}(G)$ as an approximation of $\mathrm{Ind}(G)$. The main challenge comes in proving the quality of these approximations. The table below summarizes the state-of-the-art bounds for the main graph families considered in this paper. We reiterate that in this subsection, all results assume the graph is undirected.

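Best-Known Approximations of $\mathrm{Cap}(G)$ and $\mathrm{Ind}(G)$

| Graph Type | UB for $\mathrm{Cap}(G)/\mathrm{FCP}(G)$ | UB for $\mathrm{FCC}(G)/\mathrm{Ind}(G)$ |
|---|---|---|
| Unrestricted | $2$ | |
| Sparse Graph ($e$ edges) | | $\max\left(\frac{e(n-2)}{n(n-1)}+1,\ \frac{2e}{3n}+\frac{4}{3}\right)$ (Theorem 3) |
| Small Chromatic Number ($\chi(G) \le k$) | $2 - \frac{2}{k}$ (Theorem 1) | $\frac{k}{2}$ (Theorem 2) |
| General Disk Graph | $\frac{3}{2}$ (Theorem 4) | |
| Unit Disk Graph | | $3$ (Theorem 5) |
| $\lambda$-precision UDG | | $\frac{64}{\lambda^2}+1$ (Theorem 7) |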

• Bound proved in this work.

• Bound in this work improves previous best bound by a constant factor.

#### III-A1 Results for Graphs with Small Chromatic Number

Many earlier results are aimed at approximating $\mathrm{Cap}(G)$ and $\mathrm{Ind}(G)$ in the case that $G$ is planar, often by exploiting the 4-colorability of planar graphs. Here we generalize these ideas further to the case that $G$ is $k$-colorable for some $k$. Our first result generalizes the $\frac{3}{2}$-approximation of $\mathrm{Cap}(G)$ for planar $G$ to a $\left(2 - \frac{2}{k}\right)$-approximation when $G$ is $k$-colorable.

###### Theorem 1.

If $G$ has $\chi(G) \le k$, then

$$\frac{\mathrm{Cap}(G)}{\mathrm{FCP}(G)} \le 2 - \frac{2}{k}. \tag{8}$$

Similarly, earlier work presents a 2-approximation of the index coding rate for planar graphs. By generalizing that bound to exploit $k$-colorability instead of 4-colorability we immediately obtain an approximation for $k$-colorable graphs, but the quality of this bound scales poorly with $k$. However, we can use a different technique to show $\mathrm{FCC}(G)$ is a $\frac{k}{2}$-approximation for $k$-colorable $G$.

###### Theorem 2.

If $G$ has $\chi(G) \le k$, then

$$\frac{\mathrm{FCC}(G)}{\mathrm{Ind}(G)} \le \frac{k}{2}. \tag{9}$$
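As a quick sanity check of how theorems 1 and 2 specialize, the values below instantiate the bounds $2 - \frac{2}{k}$ and $\frac{k}{2}$ at $k = 4$ (which covers planar graphs, by 4-colorability) and at $k = 3$. The arithmetic is only meant as an illustration:

```latex
k = 4:\quad \frac{\mathrm{Cap}(G)}{\mathrm{FCP}(G)} \le 2 - \frac{2}{4} = \frac{3}{2},
\qquad \frac{\mathrm{FCC}(G)}{\mathrm{Ind}(G)} \le \frac{4}{2} = 2; \\
k = 3:\quad \frac{\mathrm{Cap}(G)}{\mathrm{FCP}(G)} \le 2 - \frac{2}{3} = \frac{4}{3},
\qquad \frac{\mathrm{FCC}(G)}{\mathrm{Ind}(G)} \le \frac{3}{2}.
```

The $k = 4$ row agrees with the 2-approximation of the index coding rate for planar graphs mentioned above, and the $k = 3$ row is strictly better.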

#### III-A2 Results for Sparse Graphs

Many of our results, especially for approximating $\mathrm{Ind}(G)$, rely on the fact that graph families with small chromatic number always contain a relatively large independent set. This fact combined with the chain of inequalities $\alpha(G) \le \mathrm{Ind}(G) \le \mathrm{FCC}(G)$ and bounds on $\mathrm{FCC}(G)$ is often enough to give good results in the special cases we consider. The following theorem attempts to generalize this idea as much as possible, by using Turán’s theorem to guarantee the existence of a large independent set in any sufficiently sparse graph. If we restrict back to the planar or outerplanar case, this result is weaker than the other more specialized results.

###### Theorem 3.

Let $G$ be a graph with $n$ vertices and $e$ edges. Then

$$\frac{\mathrm{FCC}(G)}{\mathrm{Ind}(G)} \le \max\left(\frac{e(n-2)}{n(n-1)} + 1,\ \frac{2e}{3n} + \frac{4}{3}\right). \tag{10}$$

#### III-A3 Results for Disk Graphs

As mentioned previously, the other main graph family we will consider is the disk graphs, and in particular unit disk graphs. The primary difficulty with this graph family which does not occur in the case of planar or outerplanar graphs is that these graphs may be very dense and contain cliques of arbitrarily large size, which means that in general they do not have linear-sized independent sets. If $\alpha(G)$ is very small, then the lower bound $\alpha(G) \le \mathrm{Ind}(G)$ becomes very weak, and approximating $\mathrm{Ind}(G)$ becomes difficult. The situation is better for approximating the storage capacity, since the corresponding inequality is $\mathrm{Cap}(G) \le n - \alpha(G)$, meaning when $\alpha(G)$ is very small $\mathrm{Cap}(G)$ is easy to approximate. We use this idea along with some facts about disk graphs to get the following approximation guarantee.

###### Theorem 4.

If $G$ is a disk graph, then

$$\frac{\mathrm{Cap}(G)}{\mathrm{FCP}(G)} \le \frac{3}{2}. \tag{11}$$

When $G$ is a disk graph or even a unit disk graph, it becomes increasingly difficult to approximate $\mathrm{Ind}(G)$ using preexisting methods as $G$ contains larger and larger cliques. If we are willing to tolerate superpolynomial running time (which may be reasonable, as finding $\mathrm{Ind}(G)$ exactly is not even known to be in NP), we can use a known result along with some results from the disk graph literature to obtain the following approximation.

###### Theorem 5.

If $G$ is a unit disk graph, then

$$\frac{\mathrm{FCC}(G)}{\mathrm{Ind}(G)} \le 3. \tag{12}$$

If instead we insist on polynomial running time, we cannot prove a constant-factor approximation for all UDGs (the LP which has $\mathrm{FCC}(G)$ as its solution may have a superpolynomial number of constraints), but we can recover good approximations in some special cases.

###### Theorem 6.

If is a unit disk graph with clique number , then

 (13)

and furthermore we can obtain an approximation of $\mathrm{Ind}(G)$ with this approximation factor in polynomial time.

Hunt et al. introduced the notion of “$\lambda$-precision unit disk graphs.” These are unit disk graphs with the additional constraint that the centers of every pair of disks are at distance at least $\lambda$ from each other, which may be a reasonable constraint in some real-world scenarios. This allows us to prove a bound on the clique number in terms of $\lambda$, which we can translate into a bound on $\mathrm{FCC}(G)/\mathrm{Ind}(G)$ using theorem 6.

###### Theorem 7.

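If $G$ is a $\lambda$-precision unit disk graph, then

$$\frac{\mathrm{FCC}(G)}{\mathrm{Ind}(G)} \le \frac{64}{\lambda^2} + 1 = O(\lambda^{-2}) + 1, \tag{14}$$

and furthermore we can obtain an approximation of $\mathrm{Ind}(G)$ with this approximation factor in polynomial time.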

### III-B Algorithms for General Graphs

As seen above, almost all our results approximating the index coding rate of graphs from special families use the fractional clique cover as the achievability scheme. In this section we instead describe more complex vector linear achievability schemes which strictly improve upon the fractional clique cover, and thus can be viewed as a further strengthening of the approximations described previously for special graph families. Although we know of specific examples where these new schemes are superior, we leave as an open question whether they can yield better constant-factor approximations for certain graph families than those attained by $\mathrm{FCC}(G)$. In this subsection we consider directed as well as undirected graphs. The detailed proofs of the results in this subsection are postponed to section V.

Let us first look at the index coding problem from an interference alignment perspective. Suppose that the data requested by user $i$ (vertex $v_i$) is $x_i$. We assign a vector $\mathbf{v}_i$ to each vertex $v_i$ such that the vectors satisfy the following condition,

$$\mathbf{v}_i \notin \operatorname{span}\bigl(\mathbf{v}_{N(v_i, \overline{G})}\bigr). \tag{15}$$

From the interference alignment perspective, $N(v_i, \overline{G})$ is the interfering set of indices for user $i$. Recall that $\mathbf{v}_S$ denotes the set of vectors indexed by $S$. The index code (broadcaster transmission) is given by $\sum_{i} x_i \mathbf{v}_i$. It can be seen that each node $v_i$ can recover $x_i$ from the index code because of eq. 15.
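The condition in eq. 15 can be checked mechanically for a candidate vector assignment. The sketch below works over GF(2) and enumerates spans by brute force, so it is only suitable for tiny instances; it assumes the broadcast is the sum of $x_i$ times the vector of vertex $v_i$, and the function names and the example assignment are our own.

```python
from itertools import product

def span_gf2(vectors, dim):
    """All GF(2) linear combinations of the given vectors (tuples of 0/1)."""
    combos = set()
    for coeffs in product((0, 1), repeat=len(vectors)):
        combo = tuple(sum(c * v[d] for c, v in zip(coeffs, vectors)) % 2
                      for d in range(dim))
        combos.add(combo)
    return combos

def satisfies_alignment(vecs, knows):
    """Check that each vector lies outside the span of its interferers.

    The interferers of client i are its out-neighbors in the directed
    complement: every j != i whose message client i does NOT know.
    """
    n, dim = len(vecs), len(vecs[0])
    for i in range(n):
        interferers = [vecs[j] for j in range(n) if j != i and j not in knows[i]]
        if vecs[i] in span_gf2(interferers, dim):
            return False
    return True

def index_code(vecs, x):
    """Broadcast sum_i x_i * v_i over GF(2), one bit per coordinate."""
    dim = len(vecs[0])
    return tuple(sum(x[i] * vecs[i][d] for i in range(len(vecs))) % 2
                 for d in range(dim))
```

On the directed 3-cycle with `knows = {0: {1}, 1: {2}, 2: {0}}`, the two-dimensional assignment `[(1, 0), (1, 1), (0, 1)]` passes the check, and the resulting broadcast consists of the two bits $x_0 + x_1$ and $x_1 + x_2$. Assigning the same one-dimensional vector to all three vertices fails, as expected.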

In this section, we utilize the interference alignment perspective to find algorithms that improve beyond $\mathrm{FCC}(G)$. We begin by combining two orthogonal generalizations of the clique cover.

#### III-B1 Local Chromatic Number and Partial Clique Cover

It is certainly possible to satisfy the requirements in eq. 15 by using $n$ linearly independent vectors; however, our goal is to minimize the dimension of the vectors. One solution to this problem is to find a proper coloring of the graph $\overline{G}$ and assign orthonormal vectors to each color class (the same vector is assigned to all vertices with the same color). Thus, an achievable broadcast rate is given by the chromatic number of $\overline{G}$. Note that the size of a minimum (integral) clique cover of a graph $G$ is the same as the chromatic number of the complementary graph $\overline{G}$, and similarly $\mathrm{FCC}(G) = \chi_f(\overline{G})$, the fractional chromatic number of $\overline{G}$.

One way to improve beyond the fractional clique cover scheme is the local chromatic number. The local chromatic number of is always less than (or equal to) . Using the interference alignment perspective it is easy to see that we can assign the column vectors from an -MDS matrix to attain an index coding rate equal to the local chromatic number as shown in . A linear relaxation of the integer program corresponding to the local chromatic number gives a vector linear index coding scheme better than .

Another approach to improving the clique cover is to instead find a partial clique cover of . Whereas a clique cover is a cover of the vertices of the graph by complete subgraphs, a -partial clique cover is instead a cover of the vertices of the graph by -partial cliques, which were defined in section II. Let be the smallest such that is a -partial clique. In each of the -partial cliques , one can use a -MDS matrix to assign vectors to the nodes to satisfy eq. 15.

We can in fact go further, and combine the partial clique cover and the local chromatic number schemes to obtain an index code which generalizes both these schemes, as shown in theorem 8. In some cases eq. 16 provides strictly better solutions than either the partial clique cover or the local chromatic number of .

###### Theorem 8.

The minimum broadcast rate of an index coding problem on the side information graph is upper bounded by the optimum value of the following linear program:

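$$
\begin{aligned}
\min\ & t \\
\text{s.t.}\ & \sum_{S \in K} \min\{|S \cap N(v,\overline{G})|,\ k_S + 1\}\,\rho_S \le t, && v \in V(G) && \text{(16a)} \\
& \sum_{S \in K:\, v \in S} \rho_S \ge 1, && v \in V(G) && \text{(16b)} \\
& \rho_S \in [0,1], && S \in K. && \text{(16c)}
\end{aligned}
$$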

Let us explain the term

$$\sum_{S \in K} \min\{|S \cap N(v,\overline{G})|,\ k_S + 1\}\,\rho_S$$

in eq. 16a, for the integer version of the above linear program. Let be the set of selected partial cliques. Then, for each vertex compute the sum . Thus each selected partial clique only contributes . Now, the number of broadcast bits corresponds to the maximum sum for any vertex , i.e.

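$$t = \max_{v \in V(G)} \sum_{S \in K} \min\{|S \cap N(v,\overline{G})|,\ k_S + 1\}\,\rho_S = \max_{v \in V(G)} \sum_{i=1}^{\tau} \min\{|S_i \cap N(v,\overline{G})|,\ k_{S_i} + 1\}.$$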

A solution to the integral version of the above linear program corresponds to a scalar linear index code. From the linear program in eq. 16, we instead obtain a vector linear index code, the details of which are covered in section V.

There is one more way we can generalize the solution of the linear program in eq. 16, which is to recursively apply the linear program to subgraphs. The recursive linear program is given in the following theorem.

###### Theorem 9 (Recursive LP).

Let denote the value of an optimal solution to the linear program below for graph :

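$$
\begin{aligned}
\min\ & t \\
\text{s.t.}\ & \sum_{S \in K} \min\{|S \cap \overline{N(v,G)}|,\ \mathrm{ICFLP}(G|_S)\}\,\rho_S \le t, && v \in V(G) \\
& \sum_{S \in K:\, v \in S} \rho_S \ge 1, && v \in V(G) \\
& \rho_S \in [0,1], && S \in K.
\end{aligned}
\tag{17}
$$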

where is defined to be for single vertex graphs . Then the minimum broadcast rate of an index coding problem on the side information graph is bounded from above by .

The index code corresponding to the linear program in theorem 9 can be easily obtained from the index coding solution for theorem 8 as shown in section IV. Let us now give an explicit example of a graph where our index coding scheme is a strict improvement over the existing schemes. Of course, since our scheme is more general, it is clear that its performance must be at least as good for every graph .

Consider the index coding problem described by the graph in fig. 1. For this graph, the index code based on the fractional local chromatic number has broadcast rate , the index code based on just the fractional partial clique cover has broadcast rate and the proposed scheme combining the local chromatic number and partial clique cover in eq. 16 has broadcast rate . Similarly, fig. 2 shows an example for which the recursive version of the proposed scheme in theorem 9 is a strict improvement over the corresponding recursive scheme proposed in [2, theorem 4], with broadcast rates and , respectively.

Fig. 1: Side information graph for which the broadcast rate of the proposed scheme in theorem 8 is a strict improvement over the existing schemes (fractional local chromatic number and fractional partial clique cover).

Fig. 2: Side information graph for which the broadcast rate of the proposed recursive scheme in theorem 9 is a strict improvement over the existing recursive schemes in [2, theorem 4].

#### III-B2 Generalized Interlinked Cycle Cover

We now generalize the fractional clique cover scheme in another direction. Since cycle and clique covers yield natural solutions to the index coding problem it makes sense to combine these structures to obtain a more general solution. The -GIC (Generalized Interlinked Cycle) graph structure presented in  provides such a solution. Our contribution is to show that this scheme can be further generalized by combining it with the partial clique cover technique presented above. We will call the relevant graph structure used to cover the side-information graph a -GIC; here we simply define this structure, and the details of the scheme will be postponed to section V.

We say a graph with vertices is a -GIC if it has the following properties:

1. contains a set of vertices, denoted by , such that for any vertex there are at least vertices with the property that there is a path from to which does not include any other vertex of . We call the inner vertex set, and let . The vertices of are referred to as inner vertices.

2. Due to the above property, we can always find a directed rooted tree (denoted by ) with maximum number of leaves in and root vertex , having at least other vertices in as leaves. The trees may not be unique. Denote the union of all such trees by . Then the digraph must satisfy the following two conditions:

###### Properties 1.

1. Every cycle in the digraph contains at least two vertices in the vertex set .

2. For all ordered pairs of inner vertices ( ), , there is only one path in from to that does not include any other vertices in .

#### III-B3 Example

Fig. 3: Side information graph for which our proposed (k,n1)-GIC scheme outperforms the n-GIC scheme of . All edges are present except those indicated by dashed lines.

We provide an example where the proposed GIC scheme performs strictly better than the GIC scheme in  in fig. 3. The graph in fig. 3 has an index coding rate of using a partial clique cover scheme. Since the proposed GIC scheme is a generalization of partial clique covers it performs at least as well.

A vector linear scheme using a fractional cover with the GIC scheme proposed in  gives an index coding rate of . Note that for the graph proposed in fig. 3, there is no GIC (as proposed in ) with inner vertex set of size , since this violates condition in properties 1.

## IV Proofs for Index Coding Rate Approximations

In this section we prove the results of section III-A. Typically we will do so by establishing a more general result, from which we just need to plug in certain parameters of the graph family in question to obtain the more specific statement. To begin we consider bounds which exploit the graph having small chromatic number.

### IV-A Bounds Using Chromatic Number

In , several results showing constant-factor approximations for both storage capacity and index coding rate in planar graphs are given. For the most part, these results depend not specifically on the planarity, but on the small chromatic number of the graph in question, as well as the chromatic number of the subgraph induced by removing a maximal set of triangles. In particular, the techniques used to show a constant-factor approximation of for planar graphs depend not only on the 4-colorability of planar graphs, but also on the 3-colorability of triangle-free planar graphs. Here we generalize and extend these techniques to give approximations in terms of the chromatic number of the graph.

To begin, the same argument used in  to show clique packing is a -approximation of for planar graphs easily extends to show theorem 10; we reproduce essentially the same proof as that of  for completeness, as some of the intermediate steps will be useful in subsequent results. We will also make use of the fact, noted in , that , the size of the minimum vertex cover.

###### Theorem 10.

Let be a graph, be the vertices of a maximal set of vertex-disjoint triangles in , and . Suppose the minimum vertex cover of has size , and . Then

$$\frac{\mathrm{Cap}(G)}{\mathrm{FCP}(G)} \le \frac{3t + k}{2t + kl/(2l-2)}. \tag{18}$$
###### Proof.

To start, we have the upper bound , assuming perfectly efficient storage on all triangles, and using the bound on the remainder of the graph. We have also a lower bound , by including each triangle in in the fractional clique packing, then using the optimal packing on .

Then as is triangle-free, the maximum fractional clique packing is just a maximum fractional matching, which is equal to the minimum fractional vertex cover by duality. So to conclude, we need only bound the integrality gap of vertex cover on . Suppose we have a fractional vertex cover with variables . Vertex cover is -integral, so assume all , and as it is a fractional vertex cover, if is an edge, then . is -colorable by assumption, so let be a partition of corresponding to an -coloring of , such that

$$\sum_{v \in I_1} x_v \ge \sum_{v \in I_2} x_v \ge \cdots \ge \sum_{v \in I_l} x_v.$$

First note that if , there are no edges, so the integrality gap of vertex cover is 1. Otherwise, we construct an integral vertex cover as follows: if is integral, then . Otherwise, if and , we set , and if but , we set . This is a vertex cover, because the only rounded-down variables were those with , and the other endpoint of any edge with must be in , as the partition corresponds to a coloring. comprises at least a -fraction of the rounded variables, so we rounded at most an -fraction of variables up from to 1, thus

$$\sum_{v \in V(G')} y_v \le \frac{2(l-1)}{l} \sum_{v \in V(G')} x_v.$$

This shows the integrality gap of vertex cover is at most , so

$$\mathrm{FCP}(G) \ge 2t + \mathrm{FCP}(G') \ge 2t + \frac{l}{2l-2} \cdot k.$$

Combining these two bounds, we have

$$\frac{\mathrm{Cap}(G)}{\mathrm{FCP}(G)} \le \frac{3t + k}{2t + (l/(2l-2)) \cdot k}.$$

This bound itself will be useful for proving further bounds, but also immediately provides a guarantee on the approximation quality of for graphs with small chromatic number, as if is a subgraph of , then .
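The rounding step used in the proof of theorem 10 can be sketched as below. This is only an illustration under assumptions that go beyond the text: the fractional cover is given as a hypothetical dictionary `x` of half-integral values, `color` is any proper coloring of the triangle-free remainder, and the class that is rounded down is taken to be the color class containing the most half-valued vertices, which is what the counting argument needs.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def round_half_integral_cover(x, color):
    """Round a half-integral fractional vertex cover to an integral one.

    x[v] is in {0, 1/2, 1}; color[v] is a proper coloring. Half-valued
    vertices in one chosen color class go to 0, all others go to 1.
    """
    counts = {}
    for v, xv in x.items():
        if xv == HALF:
            counts[color[v]] = counts.get(color[v], 0) + 1
    if not counts:
        return dict(x)  # already integral
    # Assumption: round down the class with the most half-valued vertices.
    down = max(counts, key=counts.get)
    y = {}
    for v, xv in x.items():
        if xv != HALF:
            y[v] = xv
        elif color[v] == down:
            y[v] = Fraction(0)
        else:
            y[v] = Fraction(1)
    return y

# 5-cycle (triangle-free, 3-colorable) with the all-1/2 fractional cover.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
x = {v: HALF for v in range(5)}
color = {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}
y = round_half_integral_cover(x, color)
l = 3
```

Because the rounded-down vertices form an independent set, the other endpoint of each of their edges has value at least 1/2 and lies in a different class, so it ends at 1 and the result is still a cover.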

###### Corollary 11.

Let be a graph with . Then

$$\frac{\mathrm{Cap}(G)}{\mathrm{FCP}(G)} \le \max\left(\frac{3}{2},\ 2 - \frac{2}{l}\right).$$
###### Proof.

If or , then , so

$$\frac{3t + k}{2t + (l/(2l-2)) \cdot k} \le \frac{3t + k}{2t + (2/3) \cdot k} = \frac{3}{2} \cdot \frac{t + k/3}{t + k/3} = \frac{3}{2}.$$

Otherwise , so . Then we have

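$$\frac{3t + k}{2t + (l/(2l-2)) \cdot k} \le \frac{((4l-4)/l) \cdot t + k}{2t + (l/(2l-2)) \cdot k} = \frac{2l-2}{l} \cdot \frac{((4l-4)/l) \cdot t + k}{((4l-4)/l) \cdot t + k} = \frac{2l-2}{l} = 2 - \frac{2}{l},$$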

so

$$\frac{\mathrm{Cap}(G)}{\mathrm{FCP}(G)} \le \max\left(\frac{3}{2},\ 2 - \frac{2}{l}\right),$$

as desired. ∎
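As a sanity check on the algebra, the ratio from theorem 10 can be compared numerically against the bound of corollary 11 over a small grid of parameters. The sketch below uses exact rational arithmetic; the function names are hypothetical, and it assumes l ≥ 2 (so that 2l − 2 is nonzero) and that t and k are not both zero.

```python
from fractions import Fraction

def theorem10_bound(t, k, l):
    """(3t + k) / (2t + (l / (2l - 2)) * k), with t, k, l as in theorem 10."""
    return Fraction(3 * t + k) / (2 * t + Fraction(l, 2 * l - 2) * k)

def corollary11_bound(l):
    """max(3/2, 2 - 2/l)."""
    return max(Fraction(3, 2), 2 - Fraction(2, l))

# Exhaustive comparison on a small grid of (t, k, l).
violations = [
    (t, k, l)
    for l in range(2, 9)
    for t in range(0, 11)
    for k in range(0, 11)
    if (t, k) != (0, 0) and theorem10_bound(t, k, l) > corollary11_bound(l)
]
```

An empty `violations` list on this grid is consistent with the corollary; it is a check of the arithmetic, not a substitute for the proof.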

In the specific case that is 3-colorable (such as when is outerplanar), we can use this additional information along with an idea from the above proof to improve further.

###### Theorem 12.

Let be a graph with . Then

$$\frac{\mathrm{Cap}(G)}{\mathrm{FCP}(G)} \le \frac{4}{3}.$$
###### Proof.

Recall that fractional minimum vertex cover and fractional maximum matching are dual, so for all . We showed in the above proof that when , the integrality gap of vertex cover is at most , so we have . As the maximum fractional matching is a feasible fractional clique packing with cliques of size at most , we have . In  it is observed that . Combining this, we have

$$\frac{3}{4}\,\mathrm{VC}(G) \le \mathrm{FVC}(G) = \mathrm{FMM}(G) \le \mathrm{FCP}(G) \le \mathrm{Cap}(G) \le \mathrm{VC}(G),$$

thus is within a factor of . ∎

###### Corollary 13.

Let be a graph with . Then

$$\frac{\mathrm{Cap}(G)}{\mathrm{FCP}(G)} \le 2 - \frac{2}{k}.$$

Now we move our attention to index coding. In the next two theorems, we provide two more general bounds on , each of which is a good approximation for certain special cases.

###### Theorem 14.

Let be a graph with , be the vertices of a maximal set of vertex-disjoint triangles in , , and be the size of a minimum vertex cover of . Suppose further that . Then

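$$\frac{\mathrm{FCC}(G)}{\mathrm{Ind}(G)} = \frac{n - \mathrm{FCP}(G)}{\mathrm{Ind}(G)} \le j \cdot \frac{l-2}{2l-2} - j \cdot \frac{l-4}{2l-2} \cdot \frac{t}{n} + \frac{l}{2l-2}.$$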
###### Proof.

As seen in the proof of theorem 10, when . The size of the minimum vertex cover of is equal to the number of vertices of minus the size of the maximum independent set, so , thus

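$$n - \mathrm{FCP}(G) \le n - 2t - \left(\frac{l}{2l-2}\right) \cdot \bigl(n - 3t - \alpha(G')\bigr) = \frac{l-2}{2l-2} \cdot n - \frac{l-4}{2l-2} \cdot t + \frac{l}{2l-2} \cdot \alpha(G').$$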

For bounding , we have . Then we simply combine the two bounds, using the fact that (as any independent set in an induced subgraph is also an independent set in the full graph):

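$$
\begin{aligned}
\frac{n - \mathrm{FCP}(G)}{\mathrm{Ind}(G)}
&\le \frac{\frac{l-2}{2l-2} \cdot n - \frac{l-4}{2l-2} \cdot t + \frac{l}{2l-2} \cdot \alpha(G')}{\alpha(G)} \\
&= \frac{\frac{l-2}{2l-2} \cdot n - \frac{l-4}{2l-2} \cdot t}{\alpha(G)} + \frac{l}{2l-2} \cdot \frac{\alpha(G')}{\alpha(G)} \\
&\le \frac{\frac{l-2}{2l-2} \cdot n - \frac{l-4}{2l-2} \cdot t}{\alpha(G)} + \frac{l}{2l-2} \\
&\le \frac{\frac{l-2}{2l-2} \cdot n - \frac{l-4}{2l-2} \cdot t}{n/j} + \frac{l}{2l-2} \\
&= j \cdot \frac{l-2}{2l-2} - j \cdot \frac{l-4}{2l-2} \cdot \frac{t}{n} + \frac{l}{2l-2}.
\end{aligned}
$$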

If instead or , we have , so , and thus using the notation above. One interesting feature of this bound is that the second term is negative for , but positive for , meaning that if or , then the bound is better when has fewer triangles, but for the bound becomes better as has more triangles.

As an example of when this bound might be useful, consider the case where is triangle-free outerplanar, so , and . Then we have

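$$\frac{n - \mathrm{FCP}(G)}{\mathrm{Ind}(G)} \le 3 \cdot \frac{1}{4} - 3 \cdot \frac{-1}{4} \cdot \frac{0}{n} + \frac{3}{4} = \frac{3}{2},$$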

so for this graph family the bound gives a -approximation of . We will see later a result which attains approximation factor for general outerplanar (not necessarily triangle-free), but there may be other graph families where this bound is the best available, in particular if and are both larger than 4 and is known to contain a large set of triangles. We will use this bound later to prove a result about unit disk graphs as well.
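To make the dependence of the theorem 14 bound on its parameters concrete, the right-hand side can be evaluated directly. The helper below is a hypothetical convenience, not part of the original analysis; it only evaluates the closed-form expression with exact rationals, assuming n > 0 and l ≥ 2, and does not check the hypotheses of the theorem.

```python
from fractions import Fraction

def theorem14_bound(n, t, j, l):
    """j*(l-2)/(2l-2) - j*(l-4)/(2l-2)*(t/n) + l/(2l-2)."""
    a = Fraction(l - 2, 2 * l - 2)
    b = Fraction(l - 4, 2 * l - 2)
    c = Fraction(l, 2 * l - 2)
    return j * a - j * b * Fraction(t, n) + c

# Triangle-free outerplanar example from the text: j = 3, l = 3, t = 0.
outerplanar = theorem14_bound(n=10, t=0, j=3, l=3)

# Effect of the number of triangles t on either side of l = 4.
small_l = [theorem14_bound(12, t, 3, 3) for t in (0, 2)]
large_l = [theorem14_bound(12, t, 5, 5) for t in (0, 2)]
```

With l = 3 the value grows as t increases, while with l = 5 it shrinks, matching the sign change of the second term discussed above.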

Next, we show how to bound slightly differently in order to get a bound that does not depend on the chromatic number of , only on the number of triangles in and the chromatic number of .

###### Theorem 15.

Let be a graph, be the vertices of a maximal set of vertex-disjoint triangles in , , and . Then

 FCC(G)Ind(G)=n−% FCP(G)Ind(G)≤l2n−2ln−l2t+