I Introduction
Global mobile data traffic continues to grow at an unprecedented pace and is expected to reach 49 exabytes per month by 2021, 78 percent of which will be video content [1]. To meet the high capacity requirements of future mobile networks, one promising solution is network densification, i.e., deploying dense small cell base stations (SBSs) in the existing macrocell cellular networks. Although a large number of small cells shortens the communication distance, the major challenge is to transfer the huge amount of mobile data from the core network to the small cells, which imposes stringent demands on the backhaul links. To address this problem, caching popular contents at small cells has been proposed as one of the most effective solutions, since most mobile data are contents such as videos, weather forecasts, news and maps that are repeatedly requested and cacheable [2]. The combination of small cells and caching brings content closer to users, decreases backhaul traffic and reduces transmission delays, thus alleviating many bottlenecks in wireless content delivery networks. This paper focuses on the caching design at both sub-6 GHz (μWave) and millimeter-wave (mmWave) SBSs in dense small cell networks, where mmWave refers to frequencies from 30 GHz to 300 GHz.
I-A Related Works
I-A1 Caching in μWave and mmWave networks
MmWave communication has received much interest for providing high capacity, because a vast amount of inexpensive spectrum is available in the 30 GHz to 300 GHz range. However, compared with μWave frequencies, the mmWave channel experiences excessive attenuation due to rainfall and atmospheric or gaseous absorption, and is susceptible to blockage. To compensate for these drawbacks, mmWave small cells need to adopt narrow beamforming and be densely deployed to provide seamless coverage [3, 4, 5]. The study of content caching in mmWave networks is of great importance, because mmWave will be a key component of future wireless access and content caching at the network edge is one of the 5G service requirements [5]. Cache assignment for video streaming at mmWave SBSs along a highway is discussed in [6] and is shown to significantly reduce the connection and retrieval delays. Combining the advantages of μWave and mmWave technologies is expected to bring further benefits [7]. Caching in dual-mode SBSs that integrate both μWave and mmWave frequencies is studied in [8], where a dynamic matching game-theoretic approach is applied to maximize the handovers to SBSs in mobility management scenarios. The proposed methods can minimize handover failures and reduce energy consumption in highly mobile heterogeneous networks. Dynamic traffic in cache-enabled networks was studied in [9].
I-A2 Optimization of content placement
Content placement with a finite cache size is the key issue in caching design, since unplanned caching at nearby SBSs results in more interference. The traditional method of caching the most popular contents (MPC) in wired networks is no longer optimal once wireless transmission is taken into account. A strategy that combines MPC with largest content diversity caching is proposed in [13], together with cooperative transmission in cluster-centric small cell networks. This strategy is extended in [14] to distributed relay networks with relay clustering to combat the half-duplex constraint, and it significantly improves the outage performance. A multi-threshold caching scheme that allows BSs to store different numbers of copies of contents according to their popularity is proposed in [15]; it allows a finer partitioning of the cache space than a binary threshold, but its complexity is exponential in the number of thresholds.
Probabilistic content placement under random network topologies has also been investigated. In [16], the optimal content caching probability that maximizes the hit probability is derived. The results are extended to heterogeneous cellular networks in [17], which shows that caching the most popular contents at the macro BSs is almost optimal, whereas it is in general not optimal for SBSs.
I-A3 Caching in Heterogeneous Networks
Extensive works have been carried out to understand the performance gain of caching in heterogeneous networks (HetNets), and stochastic geometry is the commonly used approach. In [18], the optimal probabilistic caching that maximizes the successful delivery probability is considered in a multi-tier HetNet. Cache-enabled heterogeneous single-antenna cellular networks are investigated in [19]. The optimal probabilistic content placement for the interference-limited case is derived, and the result shows that the optimal placement probability is linearly proportional to the square root of the content popularity, with an offset depending on the BS caching capabilities. Caching policies for maximizing the success probability and the area spectral efficiency of cache-enabled HetNets are studied in [20], and the results show that the optimal caching probability is less skewed when maximizing the success probability but more skewed when maximizing the area spectral efficiency. The work in [21] proposes joint BS caching and cooperation for maximizing the successful transmission probability in a multi-tier HetNet. A local optimum is obtained in the general case, and globally optimal solutions are achieved in some special cases. Cache-based channel selection diversity and network interference are studied in [22] for stochastic wireless caching helper networks, and solutions for noise-limited and interference-limited networks are derived, respectively.
I-B Contributions and Organization
Existing caching designs for SBSs are restricted to the single-antenna case and mainly to the μWave band. Little is known about the impact of multiple antennas at densely deployed SBSs, or of adopting the mmWave band, on successful content delivery and optimal content placement. Analyzing multi-antenna networks using stochastic geometry is known to be difficult, as acknowledged in [23]. In contrast to existing works, in this paper we analyze the performance of caching in multi-antenna SBSs in μWave and mmWave networks, and propose probabilistic content placement schemes to maximize the performance of content delivery. The main contributions of this paper are summarised as follows:

Derivation of the successful content delivery probability (SCDP) of multi-antenna SBSs. We use stochastic geometry to model wireless caching in multi-antenna dense small cell networks in both the μWave and mmWave bands, and derive the SCDPs for both types of cache-enabled SBSs. The results characterize how the SCDPs depend on parameters such as channel effects, caching placement probability, SBS density, transmission power and number of antennas.

Development of a near-optimal cross-entropy optimization (CEO) method for a general distribution of content requests. The derived SCDPs do not admit a closed form and are highly complex to optimize. To tackle this difficulty, we first propose a constrained CEO (CCEO) based algorithm that optimizes the SCDPs. The original unconstrained CEO algorithm is a stochastic optimization method based on adaptive importance sampling that can achieve a near-optimal solution with moderate complexity and guaranteed convergence [24]. We adapt this method to handle the caching capacity constraint and the probability constraints in our problem.

Design of a simple heuristic content placement algorithm. To further reduce the complexity, we propose a heuristic two-stage algorithm that maximizes the SCDP via probabilistic content placement when the content request probability follows the Zipf distribution [25]. The algorithm combines the MPC and caching diversity (CD) schemes while taking the content popularity into account. It achieves near-optimal performance in single-antenna systems and offers various advantages in multi-antenna scenarios.

Numerical results show that, in contrast to the traditional approaches of deploying SBSs at a much higher density or installing many more antennas, increasing the caching capacity at mmWave SBSs provides a low-cost way to achieve SCDP performance comparable to that of μWave systems.
The rest of this paper is organized as follows. The system model is presented in Section II. The SCDPs of the μWave and mmWave systems are analyzed in Section III. Two probabilistic content placement schemes are described in Section IV. Simulation and numerical results, together with discussions, are given in Section V, followed by concluding remarks in Section VI.
II System Model
We consider a cache-enabled dense small cell network consisting of a μWave SBS tier and an mmWave SBS tier. In such a network, each user equipment (UE) in a tier is associated with the nearest SBS that has cached the desired content; optimizing content placement under this association assumption addresses the requirement that operators place content caches close to UEs [26]. We assume a finite content library whose contents are indexed by popularity, the n-th content being the n-th most popular one. Each content has a normalized size of 1, and each BS can store at most a fixed number of contents, referred to as its caching capacity [15, 19, 22]. The analysis and optimization can also be applied to contents of unequal sizes. It is assumed that the caching capacity is smaller than the library size. Each content has a request probability, and the request probabilities of all contents sum to one. Without loss of generality, the contents are sorted in descending order of request probability.
II-A Probabilistic Content Placement
We consider a probabilistic caching model in which a content is stored independently, with the same probability, at all SBSs of the same tier (either μWave or mmWave) [16]. The caching placement probability of the n-th content is the probability that it is cached at an SBS. Fig. 1 shows an example of probabilistic caching, in which the contents cached at an SBS are determined by drawing a uniform random number, equal to 0.9 in this example. In the probabilistic caching strategy, the caching probabilities must satisfy the following conditions:
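0 \le p_n \le 1, \; n = 1, \ldots, N, \qquad \sum_{n=1}^{N} p_n \le L, \qquad (1)
where p_n is the caching placement probability of the n-th content, N is the number of contents in the library and L is the caching capacity.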
Note that although the caching strategy is probabilistic, its implementation allows each SBS to always fill its cache up to its caching capacity.
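The placement rule illustrated in Fig. 1 can be sketched in a few lines. We assume the common line-segment construction for probabilistic caching: the caching probabilities are laid end to end on an interval whose length equals the caching capacity, and one uniform random number in [0, 1), shifted by 0, 1, 2, ..., selects one content per cache slot. We have not confirmed that this is exactly the construction of [16]; the function name and the example probabilities are hypothetical and are not the values used in Fig. 1.

```python
import numpy as np


def sample_cache(p, capacity, u):
    """Return the 0-based indices of the contents stored at one SBS.

    Line-segment placement (assumed): p[0], p[1], ... are laid end to end on
    [0, sum(p)); the points u, u + 1, ..., u + capacity - 1, with u in [0, 1),
    each select the content whose half-open interval they fall into.
    """
    p = np.asarray(p, dtype=float)
    if np.any(p < 0) or np.any(p > 1) or p.sum() > capacity + 1e-9:
        raise ValueError("need 0 <= p_n <= 1 and sum(p) <= capacity")
    edges = np.concatenate(([0.0], np.cumsum(p)))    # interval boundaries
    points = u + np.arange(capacity)                 # one point per cache slot
    # Interval index of each point; side="right" skips zero-length intervals.
    idx = np.searchsorted(edges, points, side="right") - 1
    idx = idx[idx < len(p)]                          # points past sum(p): empty slot
    return [int(i) for i in np.unique(idx)]


# Illustrative probabilities with sum(p) == capacity == 3.
print(sample_cache([1.0, 0.6, 0.4, 0.5, 0.5], capacity=3, u=0.9))  # [0, 2, 4]
```

Because each interval is at most one unit long and the points are exactly one unit apart, the n-th content is cached with probability p_n when u is uniform, no content is stored twice, and the cache is full whenever the probabilities sum to the capacity.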
II-B Downlink Transmission
In the considered downlink network, each μWave SBS is equipped with multiple antennas, and each mmWave SBS has multiple directional mmWave antennas. All UEs are single-antenna nodes, and in both the μWave and mmWave tiers only one UE is allowed to communicate with an SBS in each time slot. (In dense small cell networks, we assume that the density of users is much higher than that of the μWave or mmWave SBSs, which can be handled by multiple access techniques [27].) The positions of the μWave SBSs are modeled by a homogeneous Poisson point process (HPPP), and the positions of the mmWave SBSs by an independent HPPP, each with its own density. Within each tier, the SBSs that cache a given content form a point process whose density, by independent thinning, is the tier density multiplied by the caching probability of that content.
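The independent-thinning statement above can be checked with a short simulation. The sketch below (hypothetical function names; density in SBSs per unit area, distances in the matching length unit) drops an HPPP in a large disk centred on the typical UE, keeps each SBS with the caching probability p, and compares the empirical mean distance to the nearest caching SBS with 1/(2√(pλ)), which is the mean of the textbook nearest-point distribution 2πpλr·exp(−πpλr²) of an HPPP of density pλ (λ being the tier density). It illustrates the thinning property only and is not offered as a restatement of (9).

```python
import numpy as np


def nearest_caching_distance(lam, p_cache, radius, rng):
    """Distance from the typical UE (origin) to the nearest SBS caching a content.

    SBSs: HPPP of density `lam` in a disk of radius `radius`; each SBS keeps the
    content independently with probability `p_cache` (independent thinning).
    """
    n_bs = rng.poisson(lam * np.pi * radius**2)
    dist = radius * np.sqrt(rng.uniform(size=n_bs))   # uniform points in the disk
    keep = rng.uniform(size=n_bs) < p_cache
    return dist[keep].min() if keep.any() else np.inf


def empirical_mean_distance(lam, p_cache, radius=10.0, n_drops=5000, seed=1):
    rng = np.random.default_rng(seed)
    d = [nearest_caching_distance(lam, p_cache, radius, rng) for _ in range(n_drops)]
    return float(np.mean(d))


lam, p_cache = 1.0, 0.3
# The thinned HPPP has density p*lam, so the mean nearest distance is 1/(2*sqrt(p*lam)).
print(empirical_mean_distance(lam, p_cache), 1 / (2 * np.sqrt(p_cache * lam)))
```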
II-B1 μWave Tier
In the μWave tier, maximum-ratio transmission (MRT) beamforming is adopted at each SBS. All channels undergo independent and identically distributed (i.i.d.) quasi-static Rayleigh block fading. Without loss of generality, when a typical μWave UE located at the origin requests a content from its associated μWave SBS that has cached this content, its received signal-to-interference-plus-noise ratio (SINR) is given by
(2) 
In (2), the SINR depends on the transmit power of the μWave SBS and on the equivalent small-scale fading channel power gain between the typical μWave UE and its serving μWave SBS, which follows a Gamma distribution with a shape parameter equal to the number of antennas and a unit scale parameter. The path loss is a frequency-dependent constant multiplied by the communication distance raised to the power of minus the path loss exponent. The SINR also includes the noise power at a μWave UE. The intercell interference terms are given by
(3)
In (3), the first term is generated by the point process of interfering SBSs that cache the requested content, and the second by the point process of interfering SBSs that do not store it; their densities are the tier density multiplied by the caching probability of the content and by its complement, respectively. The interfering channel power gains follow the exponential distribution, and the path losses are determined by the distances between the interfering SBSs and the typical UE.
II-B2 mmWave Tier
In the mmWave tier, we assume that directional beamforming is adopted at each mmWave SBS and that small-scale fading is neglected, since small-scale fading causes little change in the received power, as verified by the practical mmWave channel measurements in [28]. Note that the traditional small-scale fading distributions are not valid for mmWave modeling because of the sparse scattering environment of mmWave [29]. Unlike its conventional μWave counterpart, mmWave transmission is highly sensitive to blockage. According to the average line-of-sight (LOS) model in [30, 31], we consider an mmWave link to be LOS if the communication distance is less than a fixed LOS radius, and non-line-of-sight (NLOS) otherwise. Moreover, the existing literature has confirmed that mmWave transmissions tend to be noise-limited and that interference is weak [30, 32]. Therefore, when a typical mmWave UE requests a content from its associated mmWave SBS that has cached this content, its received SINR is given by
(4) 
In (4), the received signal depends on the transmit power of the mmWave SBS and on the main-lobe gain of directional beamforming, which equals the number of antenna elements [33]. The path loss is a frequency-dependent constant multiplied by the distance raised to the power of minus the path loss exponent, which takes one value for a LOS link and another for an NLOS link (see Table I). The SINR also includes the combined power of noise and weak interference. (Dense mmWave networks operate in the noise-limited regime because the high path loss attenuates the interference, while directional beamforming improves the signal directivity [32]. In contrast to the sub-6 GHz counterpart, which is usually interference-limited, mmWave networks tend to be noise-limited when the BS density is not extremely high, owing to narrow beams and blocking effects [34]. For completeness, we also incorporate weak interference here.)
III Successful Content Delivery Probability
In this paper, the SCDP is used as the performance indicator; it represents the probability that a content requested by a typical UE is both cached in the network and successfully transmitted to the UE. We assume that each content has a given size in bits and that its delivery time must not exceed a given deadline.
By using the law of total probability, the SCDP in the μWave tier is calculated as
(5)
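Here, the relevant bandwidth is the μWave bandwidth allocated to a typical user (frequency-division multiple access (FDMA) is employed in this paper when multiple users are served by an SBS), and the SINR threshold is determined by the content size, the delivery deadline and this bandwidth. Likewise, in the mmWave tier, the SCDP is calculated as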
(6) 
where the mmWave bandwidth allocated to a typical user and the corresponding mmWave SINR threshold play the same roles. The rest of this section is devoted to deriving the SCDPs in (5) and (6).
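The success event in (5) and (6) can be read as a rate requirement. As a minimal sketch, assuming a Shannon-rate criterion (delivery succeeds when the allocated bandwidth times log2(1 + SINR) is at least the content size divided by the delivery time), the SINR threshold and the law-of-total-probability sum look as follows. The function names and the 100 Mbit/s example rate are hypothetical; the bandwidths are those of Table I.

```python
import numpy as np


def sinr_threshold(bits, deadline, bandwidth):
    """SINR needed to send `bits` within `deadline` seconds over `bandwidth` Hz,
    assuming a Shannon-rate criterion: bandwidth * log2(1 + SINR) >= bits / deadline."""
    return 2.0 ** (bits / (deadline * bandwidth)) - 1.0


def scdp(request_probs, delivery_probs):
    """Law of total probability: sum over contents of P(request n) * P(success | n)."""
    return float(np.dot(request_probs, delivery_probs))


# Hypothetical 100 Mbit/s requirement with the Table I bandwidths.
print(sinr_threshold(1e8, 1.0, 10e6))   # muWave, 10 MHz: 1023.0 (about 30 dB)
print(sinr_threshold(1e8, 1.0, 1e9))    # mmWave, 1 GHz: about 0.072
```

The exponential dependence on rate over bandwidth is what lets the 1 GHz mmWave link tolerate a far lower SINR than the 10 MHz μWave link for the same content size.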
III-A μWave Tier
Theorem 1
In the cache-enabled μWave tier, the SCDP is given by
(7) 
where the probability that the n-th requested content is successfully delivered to the μWave UE by its serving SBS is expressed as
(8) 
where the conditional coverage probability, i.e., the probability that the received SINR exceeds the threshold given a typical communication distance, is given by (11).
The probability density function (PDF) of the distance between a typical μWave UE and its nearest serving SBS that stores the content is given by [35]
(9)
(10) 
where csc(·) denotes the cosecant function, and
(11) 
(12) 
Proof 1
Please see Appendix A.
Note that, when the caching placement probability equals 1, the per-content success probability reduces to the probability of successful transmission from the serving SBS to the typical user in traditional μWave networks without caching. The SCDP expression for multi-antenna systems is much more complicated than the closed-form expression for single-antenna systems in [19].
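Since the multi-antenna SCDP has no closed form, a Monte Carlo estimate of the per-content success probability is a useful cross-check. The sketch below assumes the μWave model of Section II-B1 as stated (the nearest caching SBS serves; the desired gain is Gamma-distributed with shape equal to the number of antennas under MRT; all other SBSs, caching or not, interfere with exponential gains) and normalizes the transmit power and path loss constant to one. With one antenna, full caching, no noise and path loss exponent 4, it should approach the well-known single-antenna closed form 1/(1 + √θ(π/2 − arctan(1/√θ))) for SINR threshold θ, i.e., the no-caching special case noted above. Function and parameter names are hypothetical.

```python
import numpy as np


def mu_wave_coverage_mc(lam, p_cache, n_ant, alpha, theta, noise,
                        n_drops=10000, radius=15.0, seed=0):
    """Monte Carlo estimate of P(SINR > theta) for the typical muWave UE.

    Assumptions (sketch): SBSs form an HPPP of density `lam` in a disk of
    radius `radius`; each caches the requested content independently with
    probability `p_cache`; transmit power and path loss constant are 1, so
    `noise` is a normalised noise power.
    """
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(n_drops):
        n_bs = rng.poisson(lam * np.pi * radius**2)
        dist = radius * np.sqrt(rng.uniform(size=n_bs))       # distances to the UE
        cached = rng.uniform(size=n_bs) < p_cache
        if not cached.any():
            continue                                          # no caching SBS: failure
        serving = np.argmin(np.where(cached, dist, np.inf))   # nearest caching SBS
        signal = rng.gamma(n_ant, 1.0) * dist[serving] ** (-alpha)  # MRT gain ~ Gamma(M, 1)
        others = np.arange(n_bs) != serving                   # all remaining SBSs interfere
        gains = rng.exponential(1.0, size=n_bs)               # Exp(1) interfering gains
        interference = np.sum(gains[others] * dist[others] ** (-alpha))
        if signal / (interference + noise) > theta:
            successes += 1
    return successes / n_drops


theta = 1.0
rho = np.sqrt(theta) * (np.pi / 2 - np.arctan(1 / np.sqrt(theta)))
print(mu_wave_coverage_mc(1.0, 1.0, 1, 4.0, theta, 0.0), 1 / (1 + rho))
print(mu_wave_coverage_mc(1.0, 0.5, 4, 4.0, theta, 0.0))  # 4 antennas, p = 0.5
```

Truncating the network to a finite disk slightly underestimates the interference; with path loss exponent 4 and a disk much larger than the typical serving distance, the effect is negligible for this check.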
III-B mmWave Tier
Theorem 2
In the cache-enabled mmWave tier, the SCDP is given by
(13) 
where the two terms denote the probabilities that the content is successfully delivered when the mmWave UE is connected to its serving mmWave SBS via a LOS link and via an NLOS link, and are given by
(14) 
and
(15) 
respectively.
Proof 2
Please see Appendix B.
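Under the mmWave model of Section II-B2 taken literally (no small-scale fading, a fixed noise-plus-weak-interference power, and a link that is LOS if and only if the serving distance is below the LOS radius), a link succeeds exactly when the serving distance is below a maximum range set by the path loss exponent. The sketch below combines this observation with the nearest-distance distribution of the thinned HPPP to compute LOS and NLOS success probabilities for one content. It is our own simplified derivation for illustration, not a transcription of (14) and (15); `snr_gain`, which lumps transmit power, main-lobe gain and path loss constant over the noise-plus-interference power, and all other names are hypothetical. The path loss exponents default to the Table I values.

```python
import numpy as np


def mmwave_success(lam_cache, theta, snr_gain, d_los,
                   alpha_los=2.25, alpha_nlos=3.76):
    """LOS and NLOS success probabilities for one content (simplified sketch).

    lam_cache : density of mmWave SBSs caching the content
    snr_gain  : hypothetical lumped constant, SNR(r) = snr_gain * r**(-alpha)
    d_los     : LOS radius; links shorter than d_los are LOS, others NLOS
    """
    def cdf(r):  # nearest-distance CDF of the thinned HPPP
        return 1.0 - np.exp(-np.pi * lam_cache * r**2)

    # Without fading, the link meets theta iff r <= (snr_gain / theta)**(1/alpha).
    r_los = (snr_gain / theta) ** (1.0 / alpha_los)
    r_nlos = (snr_gain / theta) ** (1.0 / alpha_nlos)
    p_los = cdf(min(d_los, r_los))                    # r <= d_los and r <= r_los
    p_nlos = max(0.0, cdf(r_nlos) - cdf(d_los))       # d_los < r <= r_nlos
    return p_los, p_nlos


lam_c = 0.5 * 600 / 1e6          # 600 SBSs/km^2 in m^-2, caching probability 0.5
for theta in np.logspace(-3, 6, 10):
    p_los, p_nlos = mmwave_success(lam_c, theta, snr_gain=1e6, d_los=15.0)
    print(f"{theta:9.3g}  {p_los + p_nlos:.3f}")
```

With these illustrative numbers the total drops in two steps: the NLOS contribution vanishes first because of its larger path loss exponent, leaving a plateau equal to the probability that the nearest caching SBS lies within the LOS radius, which is qualitatively consistent with the "ladder drop" discussed in Section V.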
IV Optimization of Probabilistic Content Placement
In this section, we aim to maximize the SCDP by optimizing the probabilistic content placement. The main difficulty is that the SCDP expressions (7) and (13) do not have a closed form in the multi-antenna case, and it is unknown whether they are concave in the caching probabilities; this makes the problem much more challenging than the single-antenna case studied in [19]. Therefore, optimal content placement for the multi-antenna case is a distinct problem. To tackle it, we propose two algorithms: the first is based on the CEO method and achieves near-optimal performance, and the second, a two-stair scheme, combines the MPC and CD content placement schemes with reduced complexity.
IV-A The Near-Optimal CCEO Algorithm
The optimal caching placement probability in the multi-antenna case is hard to obtain, so we introduce CEO to maximize the SCDP over the probabilistic content placement. CEO originated as an adaptive variance-minimization algorithm for estimating probabilities of rare events. Its rationale is to first associate each optimization problem with a rare-event estimation problem, and then to tackle this estimation problem efficiently with an adaptive algorithm. The outcome is a random sequence of solutions that converges probabilistically to the optimal or a near-optimal solution [24, 36]. The CEO method iterates two steps: the first generates samples of random data according to a specified random distribution (normally Gaussian), and the second updates the parameters of this distribution based on the sample data so as to produce better samples in the next iteration.
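The two-step loop just described can be sketched for a generic maximization problem. This is the textbook Gaussian cross-entropy method with a fixed smoothing weight (parameter choices and names are ours and hypothetical), not the CCEO algorithm itself; Algorithm 1 adds projection, a penalty and dynamic smoothing on top of it.

```python
import numpy as np


def cross_entropy_maximize(f, mean0, std0, n_samples=200, n_elite=20,
                           n_iter=60, alpha=0.7, seed=0):
    """Gaussian cross-entropy method for maximizing f over R^d (sketch)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mean0, dtype=float)
    sigma = np.full(mu.shape, float(std0))
    for _ in range(n_iter):
        # Step 1: draw candidate solutions from the current Gaussian.
        x = rng.normal(mu, sigma, size=(n_samples, mu.size))
        scores = np.array([f(xi) for xi in x])
        # Step 2: refit mean and std to the elite (best-scoring) samples, smoothed.
        elite = x[np.argsort(scores)[-n_elite:]]
        mu = alpha * elite.mean(axis=0) + (1 - alpha) * mu
        sigma = alpha * elite.std(axis=0) + (1 - alpha) * sigma
    return mu


target = np.array([1.0, -2.0])
best = cross_entropy_maximize(lambda x: -np.sum((x - target) ** 2), [0.0, 0.0], 3.0)
print(best)   # close to [1, -2]
```

Only function evaluations are needed, which is why the method suits objectives such as the SCDPs that have no closed form or usable gradient.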
The CEO algorithm has been successfully applied to a wide range of difficult optimization tasks, such as the traveling salesman problem and antenna selection in multi-antenna communications [37]. It has shown superior performance in solving complex optimization problems compared with commonly used random-search methods such as simulated annealing (SA) and the genetic algorithm (GA) [38].
The CEO algorithm was originally proposed for unconstrained optimization. To deal with the probability constraints and the caching capacity constraint, we propose the CCEO algorithm shown in Algorithm 1. In the proposed CCEO algorithm, the randomly generated samples are forced into the feasible set in the Project step. To satisfy the capacity constraint, a penalty function, weighted by a large positive penalty parameter, is added to the original objective function in the Modification step. The dynamic Smoothing step prevents the result from converging to a suboptimal solution. At each iteration, the main computation is one evaluation of the objective function per generated sample, and no gradient needs to be calculated, so the complexity is moderate and can be further controlled to achieve a complexity-convergence trade-off.
(16) 
(17) 
(18) 
(19) 
(20) 
(21) 
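A compact sketch of the constrained variant follows. It mirrors the steps named above: samples are clipped to [0, 1] (projection onto the box), violations of the capacity constraint are penalized (Modification), and the standard deviation uses the dynamic smoothing rule β_t = β − β(1 − 1/t)^q known from the cross-entropy literature (Smoothing). The exact update rules of Algorithm 1 are those in (16)-(21); here, for illustration only, the per-content SCDP is replaced by a hypothetical concave, increasing function p/(0.2 + 0.8p), and all parameter values and names are assumptions.

```python
import numpy as np


def ccem_placement(a, scdp_n, capacity, n_samples=200, n_elite=20, n_iter=100,
                   alpha=0.7, beta=0.9, q=6, penalty=1e3, seed=0):
    """Constrained cross-entropy search for caching probabilities (sketch).

    a      : request probabilities of the contents
    scdp_n : vectorised per-content success probability as a function of p
    """
    rng = np.random.default_rng(seed)
    n = len(a)
    mu = np.full(n, capacity / n)      # feasible start: sum(mu) == capacity
    sigma = np.full(n, 0.3)

    def objective(p):                  # Modification: penalise sum(p) > capacity
        return np.dot(a, scdp_n(p)) - penalty * max(0.0, p.sum() - capacity)

    for t in range(1, n_iter + 1):
        # Projection: clip every sample into the box [0, 1]^n.
        x = np.clip(rng.normal(mu, sigma, size=(n_samples, n)), 0.0, 1.0)
        scores = np.array([objective(xi) for xi in x])
        elite = x[np.argsort(scores)[-n_elite:]]
        # Smoothing: fixed weight for the mean, dynamic weight for the std.
        beta_t = beta - beta * (1.0 - 1.0 / t) ** q
        mu = alpha * elite.mean(axis=0) + (1 - alpha) * mu
        sigma = beta_t * elite.std(axis=0) + (1 - beta_t) * sigma
    return mu


w = np.arange(1, 21, dtype=float) ** -0.6
a = w / w.sum()                        # Zipf popularity, 20 contents


def scdp_n(p):                         # hypothetical concave per-content SCDP
    return p / (0.2 + 0.8 * p)


p_ce = ccem_placement(a, scdp_n, capacity=5)
p_mpc = np.r_[np.ones(5), np.zeros(15)]   # most popular content (MPC) baseline
print(np.dot(a, scdp_n(p_ce)), np.dot(a, scdp_n(p_mpc)))
```

Because the mean is always a convex combination of points in [0, 1], it stays in the box; the penalty keeps the elite samples, and hence the mean, essentially within the capacity constraint.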
In Fig. 2, we provide an example of the content placement probabilities obtained at several iterations of the algorithm until it converges. Each subfigure presents the mean vector of the sampling distribution at the end of the corresponding iteration, which is used to generate random samples in the next iteration. We observe that, already at an intermediate iteration, the caching placement probability is quite close to the converged solution, so early termination could significantly reduce the complexity. Overall, the CEO algorithm converges fast and is an efficient method for finding a near-optimal SCDP, and its complexity is characterized in [39]. It is also noted that the top-ranked contents are cached with probability one, while caching diversity is more important for making effective use of the remaining caching space. Based on this observation, we design a low-complexity heuristic scheme in the following subsections.
IV-B Two-Stair Scheme for the μWave Tier
To further reduce the complexity of the optimization, we devise a simple two-stair (TS) scheme for the case in which the content popularity follows the Zipf distribution, as supported by empirical studies [13, 25, 16]:
(22) 
where the Zipf exponent controls the popularity skewness.
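For reference, we assume (22) is the standard normalized Zipf law, in which the request probability of the n-th content is proportional to n raised to the power of minus the Zipf exponent. A short sketch (hypothetical function name) shows how the exponent shifts requests toward the head of the library, using the library size of Table I:

```python
import numpy as np


def zipf_popularity(n_contents, gamma):
    """Request probabilities of a Zipf law: a_n proportional to n**(-gamma)."""
    ranks = np.arange(1, n_contents + 1, dtype=float)
    weights = ranks ** (-gamma)
    return weights / weights.sum()


# Share of requests for the 10 most popular of 100 contents.
for gamma in (0.0, 0.8, 1.5):
    print(gamma, zipf_popularity(100, gamma)[:10].sum())
```

A zero exponent gives uniform popularity, where caching diversity matters most; larger exponents concentrate requests on a few contents, where MPC caching becomes more attractive.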
In the TS scheme, a fraction of the caching space at an SBS is allocated to store the most popular contents; this is called the MPC region. The remaining cache space is allocated to randomly store contents with certain probabilities and is called the CD region. As illustrated in Fig. 3, in the TS caching scheme, the contents in the CD region are cached with a common probability. The remaining contents are not cached and must be fetched through the backhaul links. These content placement schemes are studied in detail in the rest of this section.
In this scheme, the content placement probabilities need to satisfy the following conditions:
(23) 
which are characterized by two variables, one of which is the common probability that a content in the CD region is stored at an SBS.
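The two-stair structure can be made concrete with a small sketch. As an assumption, we parameterize it by the number of contents in the MPC region and the number of contents in the CD region, so that the common probability equals the leftover cache space divided by the CD-region size; the two variables in (23) may be defined differently. Because all contents within a region share one success probability, the objective reduces to two weighted terms, and for a small library it can be maximized by brute force, which is useful as a reference for the approximate solution developed below. The per-content SCDP function is hypothetical.

```python
import numpy as np


def two_stair(n_contents, capacity, k_mpc, m_cd):
    """Two-stair placement vector (hypothetical parameterisation).

    The k_mpc most popular contents are cached with probability 1 (MPC region);
    the next m_cd contents share the remaining capacity - k_mpc cache slots
    with a common probability (CD region); all other contents are not cached.
    """
    p = np.zeros(n_contents)
    p[:k_mpc] = 1.0
    if m_cd > 0:
        p[k_mpc:k_mpc + m_cd] = (capacity - k_mpc) / m_cd
    return p


def best_two_stair(a, scdp_n, capacity):
    """Brute-force search over (k_mpc, m_cd); scdp_n(0) is assumed to be 0."""
    n = len(a)
    best_val, best_p = -np.inf, None
    for k in range(capacity + 1):
        # m_cd >= capacity - k keeps the common probability <= 1.
        for m in range(capacity - k, n - k + 1):
            p = two_stair(n, capacity, k, m)
            val = float(np.dot(a, scdp_n(p)))   # MPC term + CD term; others give 0
            if val > best_val:
                best_val, best_p = val, p
    return best_val, best_p


w = np.arange(1, 21, dtype=float) ** -0.6
a = w / w.sum()                               # Zipf popularity, 20 contents


def scdp_n(p):                                # hypothetical concave per-content SCDP
    return p / (0.2 + 0.8 * p)


val_ts, p_ts = best_two_stair(a, scdp_n, capacity=5)
val_mpc = float(np.dot(a, scdp_n(two_stair(20, 5, 5, 0))))
print(val_ts, val_mpc, p_ts)
```

Pure MPC is the special case with the whole cache in the MPC region, so the search can never do worse than MPC.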
As such, the μWave SCDP (7) can be expressed as
(24) 
It is seen in (24) that all contents in the MPC region have the same SCDP, and all contents in the CD region have the same SCDP. Our aim is to maximize the overall SCDP, and the problem is formulated as
(25) 
where the indicator function returns one if its condition is satisfied and zero otherwise. The convexity of problem (25) is unknown, and finding its global optimal solution is challenging. To obtain an efficient caching placement solution, we first use the following approximations [40]
(26) 
(27) 
respectively, based on a known asymptotic property of Zipf popularity [40]. Therefore, the objective function of (25) can be approximated as
(28) 
Note that the above approximation also covers the special case of MPC caching, for which it simplifies further.
Then problem (25) can be approximated as
(29) 
Because the two variables are coupled in the objective function of (29), we use a decomposition approach to solve this problem. Since a common factor of the objective is always positive, for a given MPC-region allocation, the optimal common probability is obtained by solving the following equivalent subproblem:
(30) 
where the remaining term is independent of the common probability. Thus, we have the following theorem:
Theorem 3
Proof 3
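Please see Appendix C. For the common probability in (31) to lie within its feasible range, the MPC-region allocation must satisfy a corresponding range constraint.
Consequently, problem (29) reduces to the following optimization problem in a single variable: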
(32) 
Since problem (32) is nonconvex, we solve it using Newton's method, as shown in Appendix D. Note that Newton's method converges faster than the Karush-Kuhn-Tucker (KKT) method and the gradient-based method [41]. The optimal MPC-region allocation then follows from the obtained solution, and the optimal common probability follows from (31).
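Appendix D gives the specific Newton iteration for (32). As a generic illustration only, the sketch below runs projected Newton iterations on a one-dimensional objective over an interval, with derivatives from central finite differences (an assumption; an implementation of (32) would more likely use analytical derivatives) and a fixed ascent step as a hypothetical safeguard where the curvature is not negative. The objective log(x) − x/2 in the usage example is hypothetical and has its maximum at x = 2.

```python
import math


def newton_maximize_1d(f, x0, lo, hi, h=1e-4, tol=1e-10, max_iter=100):
    """Projected Newton iterations for a smooth 1-D objective on [lo, hi]."""
    x = min(max(x0, lo), hi)
    for _ in range(max_iter):
        f0, fp, fm = f(x), f(x + h), f(x - h)
        d1 = (fp - fm) / (2 * h)             # central first derivative
        d2 = (fp - 2 * f0 + fm) / h**2       # central second derivative
        if d2 < 0:
            step = -d1 / d2                  # Newton step toward the stationary point
        else:
            step = 0.1 * (hi - lo) * (1 if d1 > 0 else -1)  # fixed ascent step
        x_new = min(max(x + step, lo), hi)   # project back onto [lo, hi]
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


print(newton_maximize_1d(lambda x: math.log(x) - 0.5 * x, 1.0, 0.1, 10.0))  # about 2
```

Near a well-conditioned maximum the iteration converges quadratically, which is the property that makes Newton's method attractive for the one-dimensional problem (32).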
IV-C Two-Stair Scheme for the mmWave Tier
Similar to the μWave case, the SCDP of the mmWave tier can be approximated by
(33) 
Then the optimal two-stair content placement can be obtained by solving the following problem:
(34) 
Problem (34) can be efficiently solved by the same decomposition approach. For a given MPC-region allocation, the optimal common probability is obtained by solving the following equivalent subproblem:
(35) 
The remaining procedure is the same as in Section IV-B, except for the derivation of the search direction used to find the optimal solution, which is provided in Appendix E.
V Results and Discussions
In this section, the performance of the proposed caching schemes is evaluated through numerical results, and the cache-enabled μWave and mmWave systems are compared. The system parameters are listed in Table I unless otherwise specified. The μWave and mmWave systems operate at 1 GHz and 60 GHz, respectively.
Parameters  Values
Number of antennas at each μWave SBS  2
Main-lobe array gain of each mmWave SBS  2
LOS region radius  15 m
Transmit power of each μWave SBS  20 dBm
Transmit power of each mmWave SBS  20 dBm
SBS density (μWave and mmWave)  600/km² each
Path loss exponent at 1 GHz  2.5
Path loss exponent at 60 GHz [42]  2.25 (LOS), 3.76 (NLOS)
Bit rate of each content (bit/s)  
Available bandwidth in μWave  10 MHz
Available bandwidth in mmWave  1 GHz
SBS cache capacity  10
Content library size  100
Zipf exponent ()  0 2 
Fig. 4 verifies the per-content SCDPs derived in Theorem 1 and Theorem 2 against the content placement probability. The analytical results are obtained from (8), (14) and (15). The SCDP of an arbitrary content is observed to be a monotonically increasing and concave function of the caching placement probability in both the μWave and mmWave systems. All the analytical results match very well with Monte Carlo simulations averaged over 2,000 random user drops, which are shown as markers.
In Fig. 5, we compare the successful transmission probabilities of μWave from (8) and mmWave from (14) and (15) as the bit rate of each content varies, which corresponds to the case with caching placement probability equal to 1. When the content size is small, the μWave system performs better than mmWave, but as the content size increases, the mmWave system outperforms the μWave system owing to its ability to provide high capacity. The successful mmWave transmission probability shows a "ladder drop" effect because the mmWave system combines a LOS part and an NLOS part. The LOS effect is limited to the region within the LOS radius, while NLOS has a much wider coverage, so when the required content size is small, the performance is dominated by the NLOS part. However, the NLOS part cannot provide high capacity due to its much larger path loss exponent, so its performance drops steeply as the bit rate of each content increases.
Next, in Figs. 6 and 7, we compare the performance of the two proposed content placement schemes with the closed-form optimal solution [19] and the intuitive MPC scheme [20] in the μWave single-antenna case. Note that in the general multi-antenna setting, the closed-form optimal content placement is still unknown. The SCDP for different caching capacities is shown in Fig. 6. The CCEO algorithm achieves exactly the same performance as the known optimal solution in [19], and the proposed TS scheme provides close-to-optimal performance that is significantly better than the MPC solution, especially when the caching capacity is small. The MPC solution is the worst caching scheme because it ignores content diversity, which is particularly important when the content popularity is more uniform. Fig. 7 shows the SCDP for different content sizes. In some regimes, the SCDP of the TS scheme is very close to the optimum. Moreover, as the bit rate of each content increases, both the TS and MPC schemes become very close to the optimal solution.
Fig. 8 compares the SCDP of various systems for different caching capacities. Both proposed content placement schemes consistently perform better than MPC, and for the 60 GHz mmWave system in particular, the SCDP of the TS scheme is close to that of the CCEO algorithm. The results also indicate that μWave always outperforms 60 GHz mmWave at the same SBS density.
Fig. 9 compares the SCDP of various systems versus the caching capacity with different content sizes. We generate a random set of content sizes, one for each content in the library. For simplicity, each content size is chosen from two values with equal probability in our simulation. The caching probabilities satisfy a size-weighted capacity constraint, i.e., the sum of the caching probabilities weighted by the content sizes must not exceed the caching capacity. In the unequal-size case, CEO still greatly outperforms MPC, following a trend similar to the equal-size case.
Fig. 10 studies the impact of the content library size on the SCDPs of different systems. As the library size increases, the SCDP drops rapidly, while the gap between the proposed content placement schemes and the MPC scheme remains stable.
Fig. 11 compares the SCDPs of the two proposed content placement schemes against the Zipf exponent. The SCDP increases with the Zipf exponent because caching is more effective when content reuse is high. In the high-exponent regime of both the μWave and mmWave systems, the request probabilities of the first few most popular contents are large, and the SCDPs of both proposed placement schemes almost coincide. Notably, the proposed TS placement scheme achieves performance close to that of the CCEO algorithm, especially in the mmWave system and in the low and high Zipf-exponent regimes.
Finally, we investigate the cache-density trade-off and its implications for the comparison of μWave and mmWave systems, using the CEO placement scheme. Fig. 12 shows the SCDPs for different caching capacities and SBS densities. The μWave channel is usually better than the mmWave channel, so at the same SBS density μWave achieves a higher SCDP. To achieve performance comparable to that of the μWave system, the mmWave system needs to deploy SBSs at a much higher density, and the required extra density of 400/km² is too costly. Fortunately, by increasing the caching capacity from 10 to 20, the mmWave system can achieve the same SCDP of 91% as the μWave system while keeping the same SBS density. This result shows the great promise of cache-enabled small cell systems, because relatively cheap storage can be traded for expensive infrastructure.
VI Conclusion
In this paper, we have investigated the performance of caching in μWave and mmWave multi-antenna dense networks to improve the efficiency of content delivery. Using stochastic geometry, we have analyzed the successful content delivery probabilities and demonstrated the impact of various system parameters. We designed two novel caching schemes that maximize the successful content delivery probability with moderate to low complexity. The proposed CCEO algorithm achieves near-optimal performance, while the proposed TS scheme performs close to CCEO with further reduced complexity. An important implication of this work is that, to reduce the performance gap between the μWave and mmWave systems, increasing the caching capacity is a low-cost and effective solution compared with traditional measures such as using more antennas or increasing the SBS density. A promising future direction is to study cooperative caching in a multi-band μWave and mmWave system, which could further reap the benefits of both systems.
Appendix A: Proof of Theorem 1
Based on (5), the successful delivery probability of the n-th content is calculated as
(A.1) 
where the integrand is the product of the conditional coverage probability and the PDF of the serving distance. Then, we derive the conditional coverage probability as
(A.2) 
where . Note that