Cache-aided delivery protocols represent a promising solution to counteract the dramatic increase in demand for multimedia content in wireless networks. Caching techniques have been widely studied in the literature with the aim of reducing backhaul congestion, energy consumption and latency. In cache-enabled networks, content is pre-fetched close to the user during network off-peak periods in order to serve the users directly when the network is congested. In [1], Maddah-Ali et al. aim at reducing the transmission rate in a network where each user has an individual cache memory. In that work, the idea of coded caching is introduced, so that the cache memory not only provides direct local access to the content but also generates coded multicasting opportunities among users requesting different files.
In [2] and [3], maximum distance separable (MDS) codes are proposed for minimizing the use of the backhaul link during the delivery phase in networks with caches at the transmitter side only. In [4], a delayed offloading scheme based on MDS codes is proposed to spare backhaul link resources in a network with mobile users. Caching schemes leveraging MDS codes have also been proposed for device-to-device communication in order to reduce latency [5].
MDS codes, such as Reed-Solomon codes, are optimal in the sense that they achieve the Singleton bound. The drawback of such codes is their limited flexibility in the choice of the code parameters (e.g., the block length) once the finite field is fixed, and the fact that the rate of the code is set before the encoding takes place. Unlike MDS codes, fountain codes [6] are rateless, i.e., their rate can be adapted on the fly. This adds flexibility to the network and allows dynamic resource management.
Extensive studies regarding caching for terrestrial applications can be found in the literature, while limited work is available in the context of heterogeneous satellite networks [7, 8, 9, 10]. In  an off-line caching approach over a hybrid satellite-terrestrial network is proposed for reducing the traffic of the terrestrial network. However, sparing backhaul resources is of particular importance not only for terrestrial networks but also for satellite systems.
To the best of the authors' knowledge, the application of linear random fountain codes (LRFCs) for caching content in satellite networks has not been proposed yet. In this paper we study and optimize the performance of fountain codes for caching-enabled networks with satellite backhauling. We derive the average backhaul rate for such a system and optimize the cache placement, where the average backhaul rate is defined as the average number of coded packets (output symbols) that the GEO satellite needs to send through the backhaul link during the delivery phase to serve the request of a user. Our results show that the performance of the caching system using linear random fountain codes is close to that of a system based on MDS codes already for a moderate field size.
The rest of the paper is organized as follows. Section II introduces the system model, while Section III presents some preliminaries on LRFCs. The achievable backhaul rate is derived in Section IV, and the optimization of the number of coded symbols to be cached at each hub is presented in Section V. Section VI presents the numerical results. Finally, Section VII contains the conclusions.
II System Model
We consider a heterogeneous network composed of a single geostationary Earth orbit (GEO) satellite and a number of hubs (e.g., terrestrial repeaters or high altitude platform stations (HAPS)) with cache capabilities, as shown in Fig. 1. Each hub is connected to the GEO satellite through a backhaul link. Users are assumed to be fixed and to have a limited antenna gain, so that a direct connection to the GEO satellite is not possible. Depending on their location, users may be connected to one or multiple hubs. We denote by $\gamma_h$ the probability that a user is connected to $h$ hubs.
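The GEO has access to a library of $n$ equally long files $\mathcal{F} = \{f^{(1)}, \dots, f^{(n)}\}$. We assume that users request files from the library independently at random. Furthermore, we assume that the probability of file $f^{(j)}$ being requested, $\theta_j$, follows a Zipf distribution with parameter $\alpha$, leading to
$$\theta_j = \frac{1/j^{\alpha}}{\sum_{i=1}^{n} 1/i^{\alpha}}.$$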
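As an illustration of the popularity model, the following Python sketch computes a Zipf distribution over a library. The function name `zipf_popularity` and the argument names `n` (library size) and `alpha` (Zipf parameter) are naming assumptions made for this example.

```python
def zipf_popularity(n, alpha):
    """Return request probabilities proportional to 1 / j**alpha for j = 1..n."""
    weights = [1.0 / (j ** alpha) for j in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: a library of 5 files with Zipf parameter 0.8.
theta = zipf_popularity(5, 0.8)
print(theta)
```

Setting `alpha` to zero yields a uniform popularity profile, while larger values concentrate the requests on the first few files.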
We assume that all files are fragmented into $k$ input symbols (packets). The GEO satellite encodes each file independently, using a fountain code. Each hub has storage capability for $M$ files, i.e., for $kM$ packets.
We shall assume that the coded caching scheme is split into two phases. During the placement phase, the GEO sends a number $w_j$ of output symbols from file $f^{(j)}$ to each hub, which are cached by each hub. Note that for the same file each hub caches the same number of output symbols (encoded packets); however, the sets of output symbols cached at different hubs are different. The placement phase is assumed to be carried out offline. During the delivery phase, a user requests a file $f^{(j)}$ at random. In a first stage, the user downloads the output symbols of $f^{(j)}$ cached in the hubs the user is connected to. If the number of symbols is not enough for successful decoding, the GEO satellite generates additional output symbols and sends them to the user via one of the hubs the user is connected to. For simplicity we assume that all transmissions are error-free.
III Linear Random Fountain Codes
In this work we consider the use of LRFCs for the delivery of the different files in the library. Each file is fragmented into $k$ input symbols, $u_1, u_2, \dots, u_k$, each consisting of $m$ elements of the finite field $\mathbb{F}_q$. For simplicity, we assume in the following $m = 1$. The case of $m > 1$, i.e., the case in which packets are $m$ symbols long, can be addressed as a straightforward extension. The LRFC encoder generates a sequence of output symbols $c_1, c_2, \dots, c_\ell$, where the number of output symbols $\ell$ can grow indefinitely. In particular, the $i$-th output symbol is generated as
$$c_i = \sum_{j=1}^{k} g_{j,i}\, u_j$$
where the coefficients $g_{j,i}$ are picked independently at random with uniform probability in $\mathbb{F}_q$. For fixed $\ell$, LRFC encoding can be expressed as a vector-matrix multiplication
$$\mathbf{c} = \mathbf{u}\,\mathbf{G}$$
where $\mathbf{u} = (u_1, \dots, u_k)$ is the vector of input symbols and $\mathbf{G}$ is a $k \times \ell$ matrix, whose elements are selected independently and uniformly at random in $\mathbb{F}_q$.
In order to download a file, a user must collect a set of $r \geq k$ output symbols $\mathbf{y} = (y_1, y_2, \dots, y_r)$. If we denote by $\mathcal{I} = \{i_1, i_2, \dots, i_r\}$ the set of indices corresponding to the output symbols collected by the receiver, we have
$$y_s = c_{i_s}, \qquad s = 1, \dots, r.$$
The user attempts decoding by solving the system of equations
$$\mathbf{y} = \mathbf{u}\,\tilde{\mathbf{G}}$$
where $\tilde{\mathbf{G}}$ is the $k \times r$ matrix corresponding to the columns of $\mathbf{G}$ associated with the collected output symbols, i.e., the columns of $\mathbf{G}$ with indices in $\mathcal{I}$. If the system of equations admits a unique solution (i.e., if $\tilde{\mathbf{G}}$ is full rank), decoding is declared successful after recovering $\mathbf{u}$, for example by means of Gaussian elimination. If $\tilde{\mathbf{G}}$ is rank deficient, a decoding failure is declared. In the latter case the receiver reattempts decoding after collecting one or more additional output symbols.
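The encoding and decoding steps described above can be sketched in Python for the binary field. This is a minimal illustration that assumes a field of size 2 (so that field addition is XOR) and one-symbol packets; the function names `lrfc_encode` and `lrfc_decode` are hypothetical and are not taken from any library.

```python
import random


def lrfc_encode(u, num_outputs, rng):
    """Generate output symbols as random binary combinations of the inputs.

    Returns (columns, outputs), where columns[i] holds the coefficients
    used for the i-th output symbol (the i-th column of the generator matrix).
    """
    k = len(u)
    columns, outputs = [], []
    for _ in range(num_outputs):
        g = [rng.randrange(2) for _ in range(k)]
        columns.append(g)
        outputs.append(sum(gj & uj for gj, uj in zip(g, u)) % 2)
    return columns, outputs


def lrfc_decode(columns, outputs, k):
    """Solve the linear system over GF(2) by Gauss-Jordan elimination.

    Returns the input symbols, or None if the system is rank deficient.
    """
    # One row per collected symbol: k coefficients followed by the symbol value.
    rows = [col[:] + [y] for col, y in zip(columns, outputs)]
    for c in range(k):
        pivot = next((r for r in range(c, len(rows)) if rows[r][c]), None)
        if pivot is None:
            return None  # no pivot for this unknown: decoding failure
        rows[c], rows[pivot] = rows[pivot], rows[c]
        for r in range(len(rows)):
            if r != c and rows[r][c]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[c])]
    return [rows[i][k] for i in range(k)]


rng = random.Random(1)
u = [rng.randrange(2) for _ in range(8)]          # 8 input symbols
columns, outputs = lrfc_encode(u, 8 + 20, rng)    # overhead of 20 symbols
decoded = lrfc_decode(columns, outputs, 8)
print(decoded == u)
```

With too few collected symbols the decoder returns `None`, which corresponds to the decoding failure case in which the receiver has to collect additional output symbols.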
Let us define $\delta$ as the receiver overhead, that is, the number of output symbols in excess of $k$ that the receiver has collected. Given $k$, $\delta$ and $q$, the probability of decoding failure of an LRFC is given by
$$P_f(k,\delta,q) = 1 - \prod_{i=1}^{k}\left(1 - \frac{q^{i-1}}{q^{k+\delta}}\right)$$
and can be tightly lower and upper bounded as [11]
$$q^{-\delta-1} \leq P_f(k,\delta,q) < \frac{1}{q-1}\, q^{-\delta}.$$
Note that the bounds are independent of the number of input symbols $k$ and become tighter for increasing $q$.
For notational convenience, in the remainder of the paper we shall use the probability of decoding success rather than the probability of decoding failure, which is simply defined as
$$P_s(k,\delta,q) = 1 - P_f(k,\delta,q).$$
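The failure probability and its bounds can be checked numerically. The sketch below assumes the standard expression for the probability that a random matrix with `k` rows and `k + delta` columns over a field of size `q` is rank deficient, together with the bounds `q**(-delta-1)` and `q**(-delta)/(q-1)`; the function names are hypothetical.

```python
def failure_probability(k, delta, q):
    """Probability that a random k x (k + delta) matrix over F_q is rank deficient."""
    p_full_rank = 1.0
    for i in range(1, k + 1):
        p_full_rank *= 1.0 - float(q) ** (i - 1 - k - delta)
    return 1.0 - p_full_rank


def failure_bounds(delta, q):
    """Lower and upper bounds on the failure probability; neither depends on k."""
    return float(q) ** (-delta - 1), float(q) ** (-delta) / (q - 1)


for q in (2, 4, 16):
    exact = failure_probability(k=50, delta=2, q=q)
    lower, upper = failure_bounds(delta=2, q=q)
    print(f"q={q}: {lower:.3e} <= {exact:.3e} < {upper:.3e}")
```

The ratio between the upper and the lower bound is `q / (q - 1)`, which illustrates why the bounds tighten as the field size grows.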
IV Average Backhaul Rate
We define the average backhaul rate as the average number of coded output symbols that the GEO has to send during the delivery phase in order to fulfill a user request.
In this section, we derive the overhead decoding probability $P_\Delta(\delta)$, that is, the probability that a user needs exactly $k + \delta$ coded symbols to successfully decode the requested file, where $k$ is the number of input symbols. Then, we calculate the average backhaul transmission rate for an LRFC coded caching scheme.
IV-A Overhead Decoding Probability
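Let us denote by $S_\delta$ the event that the matrix $\tilde{\mathbf{G}}$ is full rank when $k+\delta$ output symbols have been collected, where $\delta \geq 0$. Let us denote the complementary event, i.e., that the rank of $\tilde{\mathbf{G}}$ is smaller than $k$, by $\bar{S}_\delta$. We are interested in deriving
$$P_\Delta(\delta) = \Pr\{S_\delta \cap \bar{S}_{\delta-1}\}.$$
Since collecting an additional output symbol cannot reduce the rank of $\tilde{\mathbf{G}}$, the event $S_{\delta-1}$ implies $S_\delta$, and we have that
$$P_\Delta(\delta) = \Pr\{S_\delta\} - \Pr\{S_{\delta-1}\}$$
which can be rewritten as
$$P_\Delta(\delta) = P_f(k,\delta-1,q) - P_f(k,\delta,q),$$
where $P_f(k,\delta,q)$ denotes the probability of decoding failure at overhead $\delta$ over a field of size $q$, with the convention $P_f(k,-1,q) = 1$. Using the bounds on the probability of decoding failure, we obtain
$$\frac{q-2}{q-1}\, q^{-\delta} < P_\Delta(\delta) < \left(\frac{q}{q-1} - \frac{1}{q}\right) q^{-\delta}.$$
Note that the bounds are independent of the number of input symbols $k$ (for non-negative $\delta$) and become tight as $q$ grows. Note also that for $q = 2$ the lower bound becomes $0$ and, hence, it loses significance.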
IV-B Overhead Average
Let us denote by $\Delta$ the random variable associated with the number of symbols in excess of $k$ that a user needs to recover the requested content, and let us denote by $\delta$ its realization. We can calculate the average overhead as follows
$$E[\Delta] = \sum_{\delta=0}^{\infty} \delta\, P_\Delta(\delta).$$
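The average overhead can be evaluated numerically by truncating the sum. The sketch below assumes that the overhead decoding probability is the difference between the failure probabilities at two consecutive overheads, and compares the result with the simple bound `q / (q - 1)**2`, which follows from summing the upper bound on the failure probability over all overheads. This simple bound is an assumption of the example and is not necessarily identical to the bound (14). All function names are hypothetical.

```python
def failure_probability(k, delta, q):
    """Probability of decoding failure with k + delta collected symbols."""
    if delta < 0:
        return 1.0  # fewer than k symbols: decoding cannot succeed
    p_full_rank = 1.0
    for i in range(1, k + 1):
        p_full_rank *= 1.0 - float(q) ** (i - 1 - k - delta)
    return 1.0 - p_full_rank


def overhead_pmf(k, delta, q):
    """Probability that decoding first succeeds with exactly k + delta symbols."""
    return failure_probability(k, delta - 1, q) - failure_probability(k, delta, q)


def average_overhead(k, q, max_delta=200):
    """Truncated sum of delta * P(overhead = delta)."""
    return sum(d * overhead_pmf(k, d, q) for d in range(max_delta + 1))


for q in (2, 4, 16):
    print(q, average_overhead(k=30, q=q), q / (q - 1) ** 2)
```

For growing field sizes both the average overhead and the bound approach zero, in line with the behaviour of an MDS code, which needs no overhead at all.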
IV-C Backhaul Rate
Let $Z$ be the random variable associated with the number of output symbols of the file requested by a user that are available in the hubs the user is connected to, and let $z$ be its realization. Let $H$ be the random variable associated with the number of hubs a user is connected to, $h$ being its realization. Finally, let $J$ be the random variable associated with the index of the file requested by a user, $j$ being its realization. We have
$$Z = w_J\, H$$
where we recall that $w_j$ stands for the number of coded symbols from file $j$ stored in every hub. The probability mass function of $Z$ is
$$P_Z(z) = \sum_{j}\sum_{h} \theta_j\, \gamma_h\, \mathbb{1}\{h\, w_j = z\}$$
where $\theta_j$ is the probability that file $j$ is requested and $\gamma_h$ is the probability that a user is connected to $h$ hubs.
We are interested in deriving the distribution of the backhaul rate, i.e., the number of output symbols which have to be sent over the backhaul link to serve the request of a user, which we denote by the random variable $T$. If we condition on $Z = z$, it is easy to see that the probability of $T = t$ corresponds to the probability that decoding succeeds when the user has received exactly $t$ output symbols from the backhaul link in excess of the $z$ output symbols received from the hubs through local links, that is, when $\delta = z + t - k$.
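In order to derive $P_{T|Z}(t|z)$ we shall distinguish two cases, where $P_\Delta(\delta)$ denotes the overhead decoding probability, with $P_\Delta(\delta) = 0$ for $\delta < 0$.
If $z < k$, then
$$P_{T|Z}(t|z) = P_\Delta(z + t - k).$$
If $z \geq k$, then
$$P_{T|Z}(t|z) = \begin{cases} \sum_{\delta=0}^{z-k} P_\Delta(\delta) & \text{if } t = 0 \\ P_\Delta(z + t - k) & \text{if } t > 0. \end{cases}$$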
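We are after the expectation of $T$, which is obtained as
$$E[T] = \sum_{z} P_Z(z) \sum_{t} t\, P_{T|Z}(t|z) = \sum_{z<k} P_Z(z) \sum_{t} t\, P_{T|Z}(t|z) + \sum_{z \geq k} P_Z(z) \sum_{t} t\, P_{T|Z}(t|z)$$
where in the last equality we distinguished two different cases, $z < k$ and $z \geq k$. Let us define $\bar{T}_1$ and $\bar{T}_2$ as
$$\bar{T}_1 = \sum_{z<k} P_Z(z) \sum_{t} t\, P_{T|Z}(t|z), \qquad \bar{T}_2 = \sum_{z \geq k} P_Z(z) \sum_{t} t\, P_{T|Z}(t|z).$$
If we introduce the variable change $\delta = z + t - k$ in the expression of $\bar{T}_1$, we obtain
$$\bar{T}_1 = \sum_{z<k} P_Z(z) \sum_{\delta=z-k}^{\infty} (\delta + k - z)\, P_\Delta(\delta) \overset{(a)}{=} \sum_{z<k} P_Z(z) \sum_{\delta=0}^{\infty} (\delta + k - z)\, P_\Delta(\delta) \overset{(b)}{=} \sum_{z<k} P_Z(z) \left( E[\Delta] + k - z \right)$$
where equality (a) is due to $P_\Delta(\delta) = 0$ for $\delta < 0$ and equality (b) is due to
$$\sum_{\delta=0}^{\infty} P_\Delta(\delta) = 1.$$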
Introducing the same variable change in the expression of we have
where the inequality is due to
The expression of the average backhaul rate is given by
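The average backhaul rate can also be evaluated numerically, directly from the model described in this section. The following Python sketch is an illustration under stated assumptions, not the closed-form expression of the paper: a user connected to `h` hubs finds `h * w[j]` distinct cached symbols of file `j`, and the backhaul delivers symbols one at a time until decoding succeeds. The function and argument names (`w`, `theta`, `gamma`, `max_t`) are hypothetical.

```python
def failure_probability(k, delta, q):
    """Probability of decoding failure with k + delta collected symbols."""
    if delta < 0:
        return 1.0
    p_full_rank = 1.0
    for i in range(1, k + 1):
        p_full_rank *= 1.0 - float(q) ** (i - 1 - k - delta)
    return 1.0 - p_full_rank


def overhead_pmf(k, delta, q):
    """Probability that decoding first succeeds with exactly k + delta symbols."""
    return failure_probability(k, delta - 1, q) - failure_probability(k, delta, q)


def average_backhaul_rate(w, theta, gamma, k, q, max_t=60):
    """Expected number of symbols sent over the backhaul per request.

    w[j]: cached symbols of file j per hub; theta[j]: request probability;
    gamma: dict mapping a number of hubs h to its probability.
    The sum over the overhead is truncated max_t symbols beyond k.
    """
    expected = 0.0
    for w_j, theta_j in zip(w, theta):
        for h, gamma_h in gamma.items():
            z = w_j * h  # symbols available through local links
            t_max = max(k - z, 0) + max_t
            # T = t means decoding first succeeds with z + t symbols in total.
            e_t = sum(t * overhead_pmf(k, z + t - k, q) for t in range(1, t_max + 1))
            expected += theta_j * gamma_h * e_t
    return expected


theta = [0.5, 0.3, 0.2]   # hypothetical popularity profile
gamma = {1: 1.0}          # every user sees exactly one hub
k, q = 10, 2
no_cache = average_backhaul_rate([0, 0, 0], theta, gamma, k, q)
half = average_backhaul_rate([10, 0, 0], theta, gamma, k, q)
full = average_backhaul_rate([10, 10, 10], theta, gamma, k, q)
print(no_cache, half, full)
```

Without caching the rate equals the file length plus the average overhead, whereas with every file fully cached only the average overhead has to be sent over the backhaul.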
V LRFC Placement Optimization
The LRFC placement problem calls for minimizing the average backhaul rate during the delivery phase. To this end, we want to optimize the number of coded symbols per file that each hub has to cache. In this section we present the placement optimization problem for an LRFC caching scheme, adapted from the optimization problem proposed for MDS codes in [2].
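The optimization problem can be written as
$$\min_{w_1, \dots, w_n} E[T] \quad \text{subject to} \quad \sum_{j=1}^{n} w_j = Mk, \qquad w_j \in \mathbb{N}, \; j = 1, \dots, n$$
where $w_j$ is the number of coded symbols of file $j$ cached at each hub, $n$ is the library size, $M$ is the cache size in files and $k$ is the number of input symbols per file.
The first constraint specifies that the total number of stored coded symbols should be equal to the cache size. The second constraint accounts for the discrete nature of the optimization variables.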
Solving exactly the optimization problem requires evaluating (20), which can be computationally complex when and are large. Hence, as an alternative to minimizing the average backhaul rate, we propose minimizing its upper bound in (40), which leads to the following optimization problem
Since the upper bound on in (40) relies on the upper and lower bounds in (1), which are tight, we expect the result of the optimization problem in (44) to be close to the result of the optimization problem in (41).
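To give a feeling for bound-based placement, the sketch below minimizes a simplified surrogate: the expected number of symbols missing at the hubs, i.e. the average of `max(k - h * w[j], 0)` over files and connectivity. Adding the average overhead, which does not depend on the placement, to this quantity gives an upper bound on the average backhaul rate. This surrogate is an assumption of the example and may be looser than the bound in (40). Because the surrogate is separable and convex in each variable, a greedy allocation of one symbol at a time is a natural choice. All names are hypothetical.

```python
def missing_symbols(w_j, gamma, k):
    """Expected number of symbols still missing for a file with w_j symbols per hub."""
    return sum(g * max(k - h * w_j, 0) for h, g in gamma.items())


def greedy_placement(theta, gamma, k, cache_files):
    """Allocate cache_files * k symbols, always picking the largest marginal gain."""
    n = len(theta)
    w = [0] * n
    for _ in range(cache_files * k):
        gains = [
            theta[j] * (missing_symbols(w[j], gamma, k) - missing_symbols(w[j] + 1, gamma, k))
            for j in range(n)
        ]
        best = max(range(n), key=lambda j: gains[j])
        w[best] += 1
    return w


theta = [6 / 11, 3 / 11, 2 / 11]   # hypothetical popularity for 3 files
single = greedy_placement(theta, {1: 1.0}, k=4, cache_files=1)
multi = greedy_placement(theta, {1: 0.5, 2: 0.5}, k=4, cache_files=1)
print(single, multi)
```

When users may reach two hubs, the allocation spreads the cache over more files, since a partially cached file can be completed from a second hub.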
VI Numerical Results
In this section, we numerically evaluate the normalized average backhaul rate, which we define as $E[T]/k$.
In all the setups, we consider that users are uniformly distributed within the coverage area of the satellite and that border effects are neglected. We consider that each hub covers a circular area of radius $r$ centered around the hub. For simplicity, we assume that the hubs are arranged according to a uniform two-dimensional grid, with spacing $d$. Unless otherwise specified, we assume km and km. Thus, the coverage areas of different hubs partially overlap, as can be observed in Fig. 1. After some simple geometrical calculations we obtain the following connectivity distribution
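The connectivity distribution for a given geometry can also be approximated by simulation. The Monte Carlo sketch below drops users uniformly in one cell of a square grid of hubs (which is equivalent to neglecting border effects) and counts the hubs within the coverage radius. The radius and spacing used in the example are hypothetical values chosen for illustration, not the parameters of the paper, and the function name is an assumption.

```python
import math
import random


def connectivity_distribution(radius, spacing, samples=20000, seed=0):
    """Estimate the probability that a user is covered by h hubs, for each h."""
    rng = random.Random(seed)
    reach = math.ceil(radius / spacing)  # how many grid steps a hub can reach
    counts = {}
    for _ in range(samples):
        x, y = rng.uniform(0, spacing), rng.uniform(0, spacing)
        h = sum(
            1
            for i in range(-reach, reach + 2)
            for j in range(-reach, reach + 2)
            if math.hypot(x - i * spacing, y - j * spacing) <= radius
        )
        counts[h] = counts.get(h, 0) + 1
    return {h: c / samples for h, c in sorted(counts.items())}


# Hypothetical geometry: radius equal to 0.75 times the grid spacing.
dist = connectivity_distribution(radius=0.75, spacing=1.0, samples=2000, seed=1)
print(dist)
```

With a radius larger than the spacing divided by the square root of two, every user is covered by at least one hub, so the estimated distribution has no mass at zero.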
We first evaluate the tightness of the upper bound (14) on the average overhead. Table I shows the average overhead for different values of $q$. The values in the second column were numerically derived from equation (12), while the values in the third column were derived from the bound in equation (14). We can see that the bound becomes tighter for increasing $q$.
Table I: Average overhead for different values of $q$: exact value from (12) and upper bound (14).
In the first scenario, we study how the cache size impacts the average backhaul rate. In this setup, we assume all users are connected to exactly one hub, i.e., $\gamma_1 = 1$. Furthermore, we consider that file popularity is uniformly distributed (i.e., Zipf distributed with parameter $\alpha = 0$). Moreover, we assume a library size and we assume that each file is fragmented into input symbols. We optimized the number of LRFC coded symbols cached at each hub by solving the problem (44) for , and and we calculated the average backhaul rate of the fountain coding caching scheme. As a benchmark we used the MDS caching scheme from [2].
In Fig. 2 the normalized average backhaul rate is shown as a function of the memory size $M$. We can observe how the penalty on the average rate for using LRFC with respect to an MDS code becomes smaller for increasing $q$ and already for is almost negligible. We remark that for $M = n$ the cache size coincides with the library size; hence, the backhaul rate for the MDS scheme becomes zero, whereas for the LRFC schemes the average backhaul rate coincides with the average overhead.
In our second setup, we assume that users can be connected to multiple hubs. We consider the connectivity distribution given in (45) and file popularity distribution with parameter . The library size is set to and the number of input symbols is , the same as in the previous scenario. The optimal cache placement is computed for each LRFC scheme as well as for the MDS scheme.
In Fig. 3 we show the impact of memory size on the normalized average backhaul rate for different coded caching schemes when users can be connected to multiple hubs. Similarly to the previous scenario, we see that for sufficiently large $q$ the performance of the LRFC caching scheme approaches that of the MDS code. Note that since an MDS code achieves the best possible performance, this result shows implicitly that solving the optimization problem in (44) yields a solution that is close to that of solving the optimization problem in (41). We further observe that LRFC caching with storage capabilities equal to 10% of the library size can reduce the average backhaul rate by at least 40% with respect to a system with no caching ($M = 0$).
For the same connectivity distribution, a fixed memory size and library size , we investigate how the file distribution impacts the average backhaul rate. In Fig. 4 the normalized average backhaul rate is shown as a function of the file distribution parameter $\alpha$. As expected, when $\alpha$ increases caching schemes become more efficient, since the majority of the requests is concentrated on a small number of files. Looking at the figure we can observe how for , an LRFC in requires roughly 12% more transmissions over the backhaul link than an LRFC in . For the LRFC of order requires only 4.7% more than in .
In our last setup we consider , , and the distribution given in (45). We evaluate the average backhaul rate for different cardinalities of the library. In Fig. 5 the normalized average backhaul rate is shown as a function of the library size. For a fixed memory size the average backhaul rate increases as the library size increases. As can be observed, also in this case the proposed LRFC caching scheme performs similarly to an MDS scheme.
VII Conclusions
In this work we analyzed fountain code schemes for caching content at the edge. We considered a heterogeneous satellite network, composed of a GEO satellite and a number of hubs which can cache content. We focused on reducing the average rate in the backhaul link which connects the GEO to the hubs. For this setting, we derived the analytical expression of the average backhaul rate as well as an upper bound on it. Making use of this upper bound, we formulated the cache placement optimization problem for a fountain coding caching scheme using linear random fountain codes (LRFCs). Finally, we presented simulation results in which we compared the performance of the LRFC scheme with an MDS scheme. Our simulation results indicate that the performance of the LRFC caching scheme built over a finite field of moderate order approaches that of the MDS caching scheme.
- [1] M. A. Maddah-Ali and U. Niesen, “Cache-aided interference channels,” in IEEE Int. Symposium on Info. Theory (ISIT), Hong Kong, Jun. 2015.
- [2] V. Bioglio, F. Gabry, and I. Land, “Optimizing MDS codes for caching at the edge,” in Proc. IEEE GLOBECOM, San Diego, CA, USA, Dec. 2015.
- [3] J. Liao, K. K. Wong, M. R. A. Khandaker, and Z. Zheng, “Optimizing cache placement for heterogeneous small cell networks,” IEEE Commun. Lett., vol. 21, no. 1, pp. 120–123, Jan. 2017.
- [4] E. Ozfatura and D. Gündüz, “Mobility and popularity-aware coded small-cell caching,” IEEE Commun. Lett., vol. 22, no. 2, pp. 288–291, Feb. 2018.
- [5] A. Piemontese and A. Graell i Amat, “MDS-coded distributed storage for low delay wireless content delivery,” in Int. Symposium on Turbo Codes and Iterative Information Processing (ISTC), Brest, France, Sep. 2016.
- [6] D. J. C. MacKay, “Fountain codes,” IEE Proc.-Commun., vol. 152, pp. 1062–1068, 2005.
- [7] T. de Cola, G. Gonzalez, and V. E. V. Mujica, “Applicability of ICN-based network architectures to satellite-assisted emergency communications,” in Proc. IEEE GLOBECOM, Washington, DC, USA, Dec. 2016.
- [8] H. Wu, J. Li, H. Lu, and P. Hong, “A two-layer caching model for content delivery services in satellite-terrestrial networks,” in Proc. IEEE GLOBECOM, Washington, DC, USA, Dec. 2016.
- [9] S. Liu, X. Hu, Y. Wang, G. Cui, and W. Wang, “Distributed caching based on matching game in LEO satellite constellation networks,” IEEE Commun. Lett., vol. 22, no. 2, pp. 300–303, Feb. 2018.
- [10] H. Kalantari, M. Fittipaldi, S. Chatzinotas, T. X. Vu, and B. Ottersten, “Cache-assisted hybrid satellite-terrestrial backhauling for 5G cellular networks,” in Proc. IEEE GLOBECOM, Singapore, Dec. 2017.
- [11] G. Liva, E. Paolini, and M. Chiani, “Performance versus overhead for fountain codes over $\mathbb{F}_q$,” IEEE Commun. Lett., vol. 14, no. 2, pp. 178–180, 2010.