
Content Popularity Prediction Towards Location-Aware Mobile Edge Caching

by   Peng Yang, et al.
University of Waterloo
Arizona State University
Huazhong University of Science & Technology
Texas A&M University Corpus Christi
Beihang University

Mobile edge caching enables content delivery within the radio access network, which effectively alleviates the backhaul burden and reduces response time. To fully exploit edge storage resources, the most popular contents should be identified and cached. Observing that user demands on certain contents vary greatly at different locations, this paper devises location-customized caching schemes to maximize the total content hit rate. Specifically, a linear model is used to estimate the future content hit rate. For the case where the model noise is zero-mean, a ridge regression based online algorithm with positive perturbation is proposed. Regret analysis indicates that the proposed algorithm asymptotically approaches the optimal caching strategy in the long run. When the noise structure is unknown, an $H_\infty$ filter based online algorithm is further proposed by taking a prescribed threshold as input, which guarantees prediction accuracy even under the worst-case noise process. Both online algorithms require no training phases, and hence are robust to time-varying user demands. The underlying causes of the estimation errors of both algorithms are numerically analyzed. Moreover, extensive experiments on a real-world dataset are conducted to validate the applicability of the proposed algorithms. It is demonstrated that the algorithms can be applied to scenarios with different noise features and are able to make adaptive caching decisions, achieving a content hit rate comparable to that of the hindsight optimal strategy.



I Introduction

The past decade has witnessed a significant growth of mobile traffic. Such growth puts tremendous pressure on the paradigm of Cloud-based service provisioning, since moving a large volume of data into and out of the cloud wirelessly requires substantial spectrum resources, and meanwhile may incur large latency. Mobile Edge Computing (MEC) emerges as a new paradigm to alleviate the capacity concern of mobile networks [1]. Residing on the network edge, MEC makes abundant storage and computing resources available to mobile users through low-latency wireless connections, facilitating a number of mobile services like local content caching, augmented reality, and cognitive assistance [2].

Among these services, content caching at the network edge is garnering much attention [3]-[17]. In particular, with the prevalence of social media, multimedia contents are spreading among mobile users in a viral fashion, putting high pressure on the network backhaul [9, 11]. It is pointed out that, by caching contents on network edge, up to traffic on the backhaul can be reduced [2]. Unfortunately, compared with the increasing content volume, the storage size at edge node (EN) is always limited. It is impossible to cache all the contents locally. Hence, identifying the optimal set of contents that maximizes cache utilization becomes crucial.

Content popularity is an effective measure for making caching decisions. Extensive works have been devoted to popularity-based content caching. According to the features of content popularity profile, those works on content caching can be classified into three categories: 1) known popularity profile [11]-[13]; 2) fixed but unknown popularity profile [16, 17]; and 3) time-varying and unknown popularity profile [19, 20]. In case of fixed and unknown popularity profile, learning algorithms have been proposed under different network settings. In case of time-varying and unknown popularity profile, context information of the request, including system states and user characteristics, is exploited to make content hit rate predictions. To improve the accuracy of popularity prediction, the context space needs to be subtly designed since there is endless context information that could be taken into consideration. It is often difficult to directly identify the factors that influence content popularity. More importantly, using user information for context differentiation is subject to privacy regulations and may not be applicable in practice.

In this paper, we investigate mobile edge caching with time-varying and unknown popularity profile. Instead of relying on user information for context differentiation, we explore location features of each EN to improve the accuracy of popularity prediction, with the rationale outlined as follows. First, locations can be divided into categories with distinct social functions, such as residential areas and business districts. Meanwhile, users in different places have diverse interests [22]. As indicated by real-world measurement studies [35], the distributions of content popularity at even adjacent Wi-Fi APs and cellular base stations are different, and existing content caching schemes do not take such fine-grained popularity difference into consideration [36]. To further improve content distribution in the mobile context, it is crucial to investigate content popularity with location awareness. Given that there is no established model to characterize location features and user demands, we take some initial steps to devise a model where user demand for a certain content is treated as a linear combination of content features and location characteristics with unknown noise. It follows that the popularity prediction problem boils down to the estimation of the location feature vector of each EN in the presence of noise. In practice, the noise process is affected by various factors. Firstly, it is affected by location-dependent factors, such as user interests, the number of users and the social function of the coverage area of each EN. Secondly, it is also affected by content-dependent factors, which include genre, length, and frame quality for video contents. Unfortunately, it is often difficult for content providers and edge servers to understand the statistical nature of the underlying noise process in such a complicated context space. To solve the location feature estimation problem, two online prediction algorithms are proposed for different scenarios.

To start with, we consider the tractable zero-mean noise scenario as the first step. A ridge regression based prediction algorithm (RPUC) is proposed to estimate the location feature vector. To account for the impact of noise, a positive perturbation is added to the result as the correction of the prediction. By comparing to the hindsight optimal caching policy, theoretical analysis shows that the RPUC algorithm achieves sublinear regret, i.e., it asymptotically approaches the optimal strategy in the long-term.

Furthermore, we consider practical cases where the noise structure is unknown a priori. To ensure robust prediction, we resort to the $H_\infty$ filter technique, which enables us to obtain guaranteed accuracy even in the worst-case scenario. In particular, taking a prescribed accuracy threshold as an input, we propose an $H_\infty$ filter based prediction algorithm (HPDT), which is robust as long as the noise amplitude is finite.

Both RPUC and HPDT require no training phases, and hence are adaptive to the time-varying user demand. Numerical analysis indicates that, the regret of RPUC originates from the bias and variance of ridge regression, as well as the artificial perturbation. Note that the HPDT algorithm is conservative in that it makes no assumption on the noise. Yet, it is still able to make unbiased estimation on the location feature vector. Extensive simulations on real world traces demonstrate that those two algorithms can be applied to scenarios with different noise features, and both of them are able to make adaptive caching decisions, achieving content hit rate that is comparable to that using the hindsight optimal strategy. The contributions of this work on mobile edge caching are three-fold:

  • We propose to exploit the diversity of content popularity over different locations. We establish a linear model for content popularity prediction, taking into account both content and location features.

  • We develop two popularity prediction algorithms that deal with different noise models. Both algorithms are able to make location-aware caching decisions. Moreover, they require no training phases, and hence can adapt to dynamic user demand.

  • We demonstrate the effectiveness of the proposed algorithms through theoretical analysis. It is proved that performance of the RPUC algorithm asymptotically approaches that using the hindsight optimal strategy, while performance of the HPDT algorithm hinges upon noises. Experiments on real dataset crawled from YouTube show that, the long-term content hit rates of the proposed algorithms are comparable to that via the hindsight optimal strategy.

The remainder of the paper is organized as follows. Section II reviews related works on content caching in wireless networks. Section III describes the system model, including the mobile edge caching architecture and the formal problem formulation. In Section IV, we propose the RPUC caching algorithm for the case of zero-mean noise, and give the detailed performance analysis. For the case of unknown noise model, we present the HPDT algorithm as well as detailed regret analysis in Section V. Numerical analysis and experimental results of the two algorithms are provided in Section VI, followed by concluding remarks in Section VII.

II Related Work

Mobile user’s capacity is greatly augmented in the era of MEC. As a result, mobile service provisioning is expected to have further improved quality of experience (QoE) [2]. To this end, various mobile edge architectures have been proposed. Tandon et al. proposed to deploy edge resources within radio access networks. They characterized the relationship between latency and caching size, as well as latency and fronthaul capacity, from an information-theoretic perspective [3]. Yang et al. introduced an edge resource provisioning architecture based on cloud radio access network (C-RAN), and devised a cloud-edge interoperation scheme via software defined networking techniques [4]. Tong et al. designed a hierarchical edge architecture, aiming at making efficient use of edge resources when serving the peak loads from mobile users [5]. As the 5G wireless network is expected to incorporate diverse access technologies, in this paper, we consider edge caching in the context of heterogeneous networks. Potential ENs include capacity-augmented base stations, WiFi access points and other devices with excess resources.

As an effective approach to improving QoE in 5G systems, edge caching has received extensive attention [6]. Specifically, various works have been done on video content caching, since video contents are forecast to be dominant in 5G systems [7]-[10]. A vast amount of other works simply focus on generalized content caching. Zhang et al. investigated the cache-enabled vehicular networks with energy harvesting, aiming at minimizing network deployment costs with QoE guarantees [12]. Ao et al. explored distributed content caching and small cell cooperation to accelerate content delivery [13]. Device-to-device (D2D) communication is another promising solution to improve the QoE of mobile content dissemination [14]. Different from conventional content unicast from cellular base stations, D2D communication has the potential to significantly boost system throughput by multicasting. Ji et al. provided a comprehensive summary on D2D caching networks, incorporating throughput scaling law and coded caching in D2D networks [15]. In the above works, content popularity profile was assumed to be completely known. However, in practice, content popularity may be unknown a priori. To address this issue, various learning-based approaches have been proposed to predict content popularity. Bharath et al. proposed a learning method that achieves desired popularity accuracy in finite training time [16]. Blasco et al. modeled content caching with unknown popularity as a multi-armed bandit problem [17]. By carefully balancing exploration and exploitation in the learning phase, they proposed three algorithms that quickly learn content popularity under various system settings.

Unfortunately, often times content popularity profile can not only be unknown a priori, but also time-varying. This is because user’s interests change constantly, and meanwhile new contents are being created [22]. As a result, learning-based caching algorithms should be designed in an online fashion, i.e., requiring no training phase, and adaptive to popularity fluctuations. To this end, Roy et al. proposed to predict video popularity by utilizing knowledge from the social streams [18]. Müller et al. introduced context-aware proactive caching [19]. By constructing context space based on user information, they proposed an online algorithm that first learns context-specific user demands, and then updates cached contents accordingly. Other information has also been used for context differentiation, such as content features and system states [20]. The prediction accuracy of those solutions is highly dependent on the information used for context differentiation. To content service providers, however, user information is extremely sensitive and often unavailable. In addition, it is also impossible for them to get detailed system or network information when making caching decisions.

In this paper, we exploit locational features for context differentiation. Locational information can be easily obtained; for example, users attached to different ENs are naturally divided into geographical groups. On this basis, we investigate the location-aware caching problem with unknown and time-varying content popularity profile. By modeling user demand as a linear combination of location features and content attributes, our previous work has addressed the content popularity prediction problem under the assumption that the model noise is zero-mean [23]. As an extension, this paper additionally considers the practical scenario where the noise structure is unknown a priori. Specifically, a robust prediction algorithm is proposed with detailed theoretical caching performance analysis. The proposed algorithm is robust and practical as it guarantees prediction accuracy regardless of the noise statistics. Additionally, numerical analysis and comparison of the root causes of estimation errors of both algorithms are presented. More extensive experiments are conducted to validate the performance of the proposed algorithms.

It is worth noting that, in the mobile context, fetching content from the edge cache significantly reduces the delay, compared with that from conventional content distribution network (CDN). Moreover, existing content pushing strategies in CDN do not consider the fine-grained popularity differentiation in neighbouring Wi-Fi APs and cellular base stations [35]. With the consideration of location awareness, this paper further models and predicts the dynamics of content popularity, which is constantly varying with time.

III System Model and Problem Formulation

In this section, we present the system model and formulate the caching problem in mobile edge networks.

III-A Network Model

Fig. 1: Network model of mobile edge caching.

Mobile edge computing can enhance mobile user’s capacity by provisioning storage, computing and networking resources in their proximity. Capacity-augmented base stations, WiFi access points and other devices with excess capacity can be exploited for edge node deployment [1]. In this paper, the storage resources at edge nodes are harnessed for content caching services. Specifically, as shown in Fig. 1, a set of edge nodes is deployed with separated backhaul links connecting to the mobile core network. Online contents are dynamically pushed to edge nodes so that user’s content requests can be processed with reduced latency. Each edge node serves a disjoint set of mobile users.

III-B Content Popularity and Location Diversity

Fig. 2: The daily view amount and popularity trends of a YouTube video since it was uploaded. The popularity score equals the ratio of the video’s daily view amount to the total daily view amount of all the videos. Note that the statistics are based on a set of randomly crawled videos.

A simple yet effective caching strategy is to push the most popular contents to the network edge. Hence, local content hit rate is maximized and user’s requests are served with reduced latency and improved QoE. Extensive works have been done on the popularity of contents, especially video files [9, 22, 25]. According to the statistics we crawled from YouTube, as illustrated in Fig. 2, the popularity profile of a video file varies in two ways: 1) the daily view amount is time-varying; and 2) as other videos’ daily view amounts are also varying and new videos are uploaded, the popularity of a video file is constantly fluctuating [21]. Moreover, location diversity also affects the content popularity. As a result, general caching strategies based on a fixed popularity profile are not optimal in practice.

Let $x_{f,n} \in \mathbb{R}^{d}$ be a $d$-dimensional attribute vector of file $f$ associated with EN $n$. For example, the attributes of video contents may include video quality, genre, length, and historical view statistics. Then, the hit rate of file $f$ at EN $n$ (defined here as the number of content requests rather than a ratio), denoted by $y_{f,n}$, can be expressed as the following noisy linear combination

$$y_{f,n} = \theta_n^{\top} x_{f,n} + \epsilon_n, \tag{1}$$

where $\theta_n \in \mathbb{R}^{d}$ is the unknown location feature vector associated with EN $n$. It represents the location characteristics of EN $n$ and is time-invariant. $\epsilon_n$ is the random noise associated with EN $n$, which may be affected by various locational features, including the social function of the area around EN $n$, the number of users served by EN $n$, and the frequency of content update (e.g., hourly or daily). As a result, contents with the same attribute vector are expected to have different view amounts at different ENs. This linear prediction model is widely used in other areas, such as signal processing and financial engineering [26]. It provides a method to predict the future hit rate, and it is essential when exploiting location diversity for popularity-unknown content caching. Without loss of generality, let $\|x_{f,n}\| \le C_x$, $\|\theta_n\| \le C_\theta$ and $y_{f,n} \le C_y$ for all $f$ and $n$, where $\|x\|$ denotes the Euclidean norm of $x$, and $C_x$, $C_\theta$ and $C_y$ are positive constants. Also, for notational simplicity, define $\|x\|_A = \sqrt{x^{\top} A x}$ as the weighted (by a matrix $A$) Euclidean norm of $x$.
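As a rough illustration of the noisy linear model in Eq. (1), the sketch below evaluates the same attribute vector against two different location feature vectors. The function name `hit_rate`, all numeric values, and the choice of Gaussian noise are assumptions made for illustration only; the model itself does not prescribe a noise distribution.

```python
import random

def hit_rate(theta, x, noise_std, rng):
    """Noisy linear model: inner product of theta and x plus additive noise."""
    mean = sum(t * a for t, a in zip(theta, x))
    return mean + rng.gauss(0.0, noise_std)  # Gaussian noise is an assumption

rng = random.Random(0)
x = [0.8, 0.3, 0.5]                     # hypothetical attributes of one file
theta_residential = [10.0, 2.0, 40.0]   # hypothetical location features, EN 1
theta_business = [3.0, 8.0, 15.0]       # hypothetical location features, EN 2

# With the noise switched off, the same file has different expected hit rates.
mean_residential = hit_rate(theta_residential, x, 0.0, rng)
mean_business = hit_rate(theta_business, x, 0.0, rng)
noisy_sample = hit_rate(theta_residential, x, 2.0, rng)
print(mean_residential, mean_business, noisy_sample)
```

The two noise-free values differ although the file attributes are identical, which is the location diversity the model is meant to capture.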

III-C Problem Formulation

Consider a set of files $\mathcal{F}$ that can be cached at ENs, and let $M$ be the caching size of each EN. We assume that all the contents are of equal size and that the size is normalized to 1, i.e., each EN can cache up to $M$ contents. (In case contents are of different sizes, they are split into smaller ones of equal size. For example, the widely used DASH (Dynamic Adaptive Streaming over HTTP) protocol breaks contents into small segments before transmission. This assumption is used to simplify the theoretical analysis, and a similar assumption has been made in [20, 24]. Location-aware edge caching with different content sizes deserves further investigation.) As indicated by Fig. 2, content popularity is time-varying. Therefore, contents with higher popularity should be proactively identified and cached at the ENs, and the less popular ones should be evicted so as to improve the local content hit rate. Consider a sequence of time slots $t = 1, 2, \ldots, T$, and let $\mathcal{C}_{n,t}$ denote the set of contents cached at EN $n$ during time slot $t$, and $y_{f,n,t}$ be the amount of user demand on file $f$ at EN $n$ during time slot $t$. The objective of a caching policy is to maximize the time-averaged hit rate. Assuming, without loss of generality, that the underlying process is ergodic, it can be formulated as the following time-averaged hit rate maximization (THRM) problem:

$$\max_{\{\mathcal{C}_{n,t}\}} \ \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{C}_{n,t}} y_{f,n,t} \quad \text{s.t.} \quad \mathcal{C}_{n,t} \subseteq \mathcal{F}, \ |\mathcal{C}_{n,t}| \le M, \ \forall n, t, \tag{2}$$

where $\mathcal{N}$ denotes the set of ENs. As the amount of user demand, i.e., the hit rate of contents at each EN, is unknown a priori, problem (2) cannot be solved directly. For convenience, denoting by $\mathcal{C}^{*}_{n,t}$ the optimal caching strategy for EN $n$ at time $t$, we have

$$\mathcal{C}^{*}_{n,t} = \arg\max_{\mathcal{C} \subseteq \mathcal{F},\ |\mathcal{C}| \le M} \ \sum_{f \in \mathcal{C}} y_{f,n,t}. \tag{3}$$

Define the time-averaged caching regret of a solution $\{\mathcal{C}_{n,t}\}$ with respect to the optimal caching strategy as

$$R(T) = \frac{1}{T} \sum_{t=1}^{T} \sum_{n \in \mathcal{N}} \left( \sum_{f \in \mathcal{C}^{*}_{n,t}} y_{f,n,t} - \sum_{f \in \mathcal{C}_{n,t}} y_{f,n,t} \right). \tag{4}$$

Then, the THRM problem can be reformulated as a time-averaged regret minimization (TRM) problem:

$$\min_{\{\mathcal{C}_{n,t}\}} \ \lim_{T \to \infty} R(T) \quad \text{s.t.} \quad \mathcal{C}_{n,t} \subseteq \mathcal{F}, \ |\mathcal{C}_{n,t}| \le M, \ \forall n, t. \tag{5}$$

Given that the optimal set $\mathcal{C}^{*}_{n,t}$ is unknown a priori, our goal is to develop a caching policy that constantly makes a good estimation of the optimal set, and therefore minimizes the time-averaged caching regret. As indicated by Eq. (3), the TRM problem boils down to estimating user demands of different contents at each EN. Given the linear model in Eq. (1), if the location feature vector can be found in the presence of noise, we can make an accurate prediction on user demand. Unfortunately, there is no established statistical model of the noise process that affects the prediction of user demand. In what follows, we propose two online content popularity prediction algorithms by making dynamic estimations on the location feature vectors for different noise processes. In particular, the first algorithm achieves near-optimal performance with the assumption that the model noise is zero-mean, while the second algorithm is designed to provide robust performance guarantees in the case of unknown noise statistics.
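To make the regret notion concrete, the following minimal sketch computes the time-averaged caching regret of a policy at a single EN against the hindsight-optimal choice, i.e., caching the files with the largest realized demand in each slot. The helper names `top_m` and `time_averaged_regret` and the demand numbers are hypothetical.

```python
def top_m(demands, m):
    """Hindsight-optimal cache: the m files with the largest realized demand."""
    return set(sorted(demands, key=demands.get, reverse=True)[:m])

def time_averaged_regret(demand_trace, cached_sets, m):
    """Average per-slot gap between the hindsight-optimal and the achieved hit count."""
    total_gap = 0.0
    for demands, cached in zip(demand_trace, cached_sets):
        best = sum(demands[f] for f in top_m(demands, m))
        achieved = sum(demands[f] for f in cached)
        total_gap += best - achieved
    return total_gap / len(demand_trace)

# Made-up demand of three files at one EN over two time slots.
trace = [{"a": 5, "b": 3, "c": 1}, {"a": 1, "b": 4, "c": 6}]
static_policy = [{"a", "b"}, {"a", "b"}]        # never adapts to the demand shift
oracle_policy = [top_m(d, 2) for d in trace]    # knows the demand in hindsight

static_regret = time_averaged_regret(trace, static_policy, m=2)
oracle_regret = time_averaged_regret(trace, oracle_policy, m=2)
print(static_regret, oracle_regret)
```

A policy that keeps caching the initially popular files accumulates regret once demand shifts, whereas the hindsight-optimal choice has zero regret by construction.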

IV Ridge Regression based Content Popularity Prediction and Edge Caching

In this section, as the first attack on the TRM problem, we present a caching algorithm when noise is zero-mean.

IV-A Location Feature Vector Estimation

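When the noise is zero-mean, according to Eq. (1), we have

$$\mathbb{E}\left[ y_{f,n,t} \mid x_{f,n,t} \right] = \theta_n^{\top} x_{f,n,t}, \tag{6}$$

where $x_{f,n,t}$ is the attribute vector of file $f$ at EN $n$ in time slot $t$, $y_{f,n,t}$ is the corresponding hit rate, and $\theta_n$ is the location feature vector of EN $n$. It can be interpreted that, at time slot $t$, given the attribute vector $x_{f,n,t}$, the hit rate of file $f$ at EN $n$ is predicted to be a linear combination of its attributes, which provides a feasible way to predict the content hit rate. Since the location feature vector $\theta_n$ of EN $n$ is time-invariant, a good estimation of $\theta_n$ will lead to accurate prediction of the content hit rate.

Let the attribute matrix $X_{f,n,t} \in \mathbb{R}^{m \times d}$ be the historical data up to time slot $t$, where $m$ is the frequency of file $f$ being cached at EN $n$ up to time slot $t$, and the $i$-th row of $X_{f,n,t}$ is the corresponding attribute vector. Denote by $Y_{f,n,t} \in \mathbb{R}^{m}$ the $m$-time empirical hit rate of file $f$ at EN $n$. By applying the standard ordinary least squares linear regression, i.e., $\min_{\theta} \|Y_{f,n,t} - X_{f,n,t}\theta\|^2$, we can obtain the unique solution $\hat{\theta} = (X_{f,n,t}^{\top} X_{f,n,t})^{-1} X_{f,n,t}^{\top} Y_{f,n,t}$, which is unbiased. However, when there are correlated variables in the attribute vector, the matrix $X_{f,n,t}^{\top} X_{f,n,t}$ may not be invertible. As a result, the estimated $\hat{\theta}$ can be poorly determined and will exhibit high variance.

In contrast to the unbiased estimation, ridge regression makes a biased estimation by adding a control parameter that “penalizes” the magnitude of the estimated $\theta$, which helps to improve estimation stability. Specifically, ridge regression aims at minimizing a penalized sum

$$\|Y_{f,n,t} - X_{f,n,t}\theta\|^2 + \lambda \|\theta\|^2, \tag{7}$$

where $\lambda > 0$ controls the size of $\theta$: the larger the value of $\lambda$, the greater the shrinkage of the magnitude of $\theta$ [27]. Consequently, the estimation of $\theta_n$ can be explicitly given as

$$\hat{\theta}_{n,t} = \left( X_{f,n,t}^{\top} X_{f,n,t} + \lambda I_d \right)^{-1} X_{f,n,t}^{\top} Y_{f,n,t}, \tag{8}$$

where $I_d$ is the $d \times d$ identity matrix. The accuracy of the estimation depends on the amount of data and the selection of $\lambda$. For convenience, let $A_{f,n,t} = X_{f,n,t}^{\top} X_{f,n,t} + \lambda I_d$ for all $f$ and $n$. The following lemma, which is slightly manipulated from [31], gives an upper bound on the estimation error of ridge regression.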

Lemma 1.

If for all , then , the estimation error of ridge regression can be upper bounded as


with probability at least


Please refer to Appendix A for the proof. The probabilistic upper bound of the estimation error provided in Lemma 1 indicates that the true hit rate falls into the confidence interval around the estimation with high probability. The right-hand side of Eq. (9) gives the length of the confidence interval, which is crucial to the following content popularity prediction and caching algorithm.
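The closed-form ridge estimate can also be maintained one observation at a time. The sketch below is one possible way to do so, assuming a rank-one Sherman-Morrison update of the inverse of the regularized Gram matrix so that no explicit matrix inversion is needed. The class name `OnlineRidge`, the test regressors, and the noise-free observations are illustrative assumptions rather than part of the algorithm described here.

```python
class OnlineRidge:
    """Ridge estimate theta = (X^T X + lam*I)^{-1} X^T y, updated one sample at a time."""

    def __init__(self, dim, lam=1.0):
        # Inverse of A = lam * I, kept explicitly so later updates need no inversion.
        self.a_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of y * x, i.e. X^T y

    def update(self, x, y):
        # Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - (A^{-1}x)(A^{-1}x)^T / (1 + x^T A^{-1} x)
        ax = [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]
        denom = 1.0 + sum(xi * axi for xi, axi in zip(x, ax))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        self.b = [bi + y * xi for bi, xi in zip(self.b, x)]

    def theta(self):
        return [sum(row[j] * self.b[j] for j in range(len(self.b))) for row in self.a_inv]


true_theta = [2.0, -1.0]                 # hypothetical location feature vector
model = OnlineRidge(dim=2, lam=1.0)
for _ in range(200):
    for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]):
        y = sum(t * xi for t, xi in zip(true_theta, x))  # noise-free hit rate
        model.update(x, y)
est = model.theta()
print(est)
```

Even with noise-free observations the estimate is pulled slightly toward zero, which is the intended bias introduced by the penalty term; the effect shrinks as more data accumulates.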

IV-B RPUC Caching Algorithm

0:  .
0:  Set of files to be cached in each EN.
1:  Initialization: Cache files in every EN and get the initial attribute vectors of all file-EN pairs.
2:  , ,
3:  for  do
4:     for each EN  do
6:        for each file  do
7:           Obtain attribute vector
9:           Compute the perturbation in Eq. (11)
11:        end for
13:        Cache all the files in set on EN
14:        Observe empirical demands of cached files
15:        Update and based on and of all cached files:
16:     end for
17:  end for
Algorithm 1 RPUC: Ridge Regression Prediction with Upper Confidence for Location-Aware Edge Caching

The location-aware edge caching algorithm is sketched in Algorithm 1. After initialization, the algorithm iteratively performs the following three phases.

  1. Predict: During each time slot , the location feature vector is firstly updated according to the demand information observed in time slot . Then, based on the linear prediction model, the estimated demand is obtained. Considering the impact of random noises, a perturbation is added to the estimation, i.e., the ultimate hit rate is predicted to be


    where the perturbation is given by


    and .

  2. Optimize and cache: Based on the predicted hit rate of each content, a set of contents that maximizes the content hit rate at EN during time slot is identified and cached respectively. Note that, certain contents may be cached in multiple ENs simultaneously.

  3. Observe and update: At the end of time slot , the empirical hit rate information of cached files on each EN is recorded, which is then used to update the parameter matrices for subsequent estimation and prediction.

The rationale of the perturbation is that Eq. (6) only gives a mean value of the hit rate which omits the potential random fluctuation, while Lemma 1 provides a probabilistic upper bound of the demand estimation error. The perturbation given in Eq. (11) is inline with the righthand side of Eq. (9) and can be regarded as the optimism in face of uncertainty, or equivalently, the upper confidence of the demand estimation. By adding a perturbation according to Eq. (11), we have . According to Lemma 1, the upper bound holds with probability at least , which approximates to rapidly as increases.
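The three phases above can be sketched for a single EN as follows. This is only an illustration under stated assumptions: the perturbation is taken to be `alpha * sqrt(x^T A^{-1} x)` with a constant `alpha`, in the style of upper-confidence-bound linear bandits, which stands in for Eq. (11) and is not claimed to match it; the function name `rpuc_like_slot`, the file attributes, and the noise-free demand are hypothetical.

```python
import math

def mat_vec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rpuc_like_slot(a_inv, b, attrs, cache_size, alpha, observe):
    """One time slot for a single EN: predict, cache the top files, observe, update."""
    theta = mat_vec(a_inv, b)                        # ridge estimate A^{-1} b
    scores = {}
    for f, x in attrs.items():
        width = math.sqrt(max(dot(x, mat_vec(a_inv, x)), 0.0))
        scores[f] = dot(theta, x) + alpha * width    # estimate plus assumed perturbation
    cached = sorted(scores, key=scores.get, reverse=True)[:cache_size]
    for f in cached:                                 # only cached files yield observations
        x, y = attrs[f], observe(f)
        ax = mat_vec(a_inv, x)
        denom = 1.0 + dot(x, ax)
        for i in range(len(x)):
            for j in range(len(x)):
                a_inv[i][j] -= ax[i] * ax[j] / denom  # Sherman-Morrison rank-one update
            b[i] += y * x[i]
    return cached

# Hypothetical setup: three files, two attributes, noise-free demand.
attrs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
true_theta = [4.0, 1.0]
demand = lambda f: dot(true_theta, attrs[f])
a_inv = [[1.0, 0.0], [0.0, 1.0]]    # inverse of A = lambda * I with lambda = 1
b = [0.0, 0.0]
history = [rpuc_like_slot(a_inv, b, attrs, 2, 1.0, demand) for _ in range(20)]
print(history[0], history[-1])
```

In this toy run the true demands are 4, 1 and 2.5 for files a, b and c, so the best cache of size two holds a and c. The perturbation makes the first slot exploratory, after which the estimate is accurate enough to settle on the better pair.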

IV-C Regret Analysis

The content hit rate of the RPUC algorithm highly depends on the accuracy of prediction. This subsection gives a theoretical upper bound on its time-averaged caching regret .

In mobile edge caching, let be the caching size of each EN, and be the size of ground file set. Note that content hit rate satisfies the linear model, and content attribute vectors and user demands are bounded by and for all , and , we have the following theorem.

Theorem 1.

If the noise is zero-mean, the RPUC algorithm achieves near-optimal performance, i.e., the time-averaged regret is of order , and when .

The proof has been relegated to Appendix B. Basically, the root-cause of regret is two-fold: the estimation error and the perturbation. Particularly, the estimation error consists of the linear model error and the intended bias incurred by in ridge regression. The perturbation term is well managed by the time-varying control parameter . Theorem 1 indicates that under RPUC algorithm, the content hit rate asymptotically approaches the optimal caching policy in the long term.

V Robust Content Popularity Prediction and Edge Caching

In the previous section, we proposed the online caching algorithm RPUC based on the linear prediction model given in Eq. (6). A biased estimation of the location feature vector for each EN is obtained by ridge regression. Further, a perturbation is added to the estimation of content hit rate to account for uncertainty. However, this caching algorithm would not work well when the noise is not zero-mean. Even worse, it is likely that we are unable to get detailed noise statistics, as the noise is affected by various location features, such as population, social function or even weather condition. Therefore, a robust prediction algorithm that can handle noise uncertainty is desirable. In this section, by resorting to the $H_\infty$ filter technique, we propose a popularity prediction algorithm that provides guaranteed accuracy in the case of unknown noise structures.

V-A Noisy Model for Content Popularity

Denoted by the additive noise added to the linear model. The linear model is rewritten as


If the noise process follows a white Gaussian distribution and its mean and correlation are always known, the Kalman filtering technique can be applied to estimate the location feature vector, which achieves the smallest possible standard deviation of the estimation error [28]-[30]. Since there is no established model on the statistics of the noise structure, robust estimators of the location feature vector that can tolerate noise uncertainty are needed. Next, we introduce the $H_\infty$ filtering technique for location feature vector estimation, which requires no a priori information on the noise process. The only assumption is that the magnitude of the noise process is finite, which is true since the total demand is always finite in reality.

V-B An $H_\infty$ Filter Approach

When locational features are time-invariant, the true location feature vector remains the same across the time span. For notational simplicity, in this subsection, we focus on a specific content on a certain EN and neglect indices and . Based on Eq. (12), the location feature vector estimation problem is reformulated as


Different from Kalman filter, we aim at providing a uniformly small estimation error, , for any form of noise process. Notice that estimation of the location feature vector is crucial to minimizing the estimation error . Then, we define the following cost function of estimation [28]:


where is a symmetric positive definite matrix reflecting the confidence of the a priori knowledge of the initial state. A ’smaller’ choice of indicates larger uncertainty of the initial condition and vice versa. The objective is to make a sequence of estimation on such that the above cost is minimized. The denominator of the cost function can be regarded as a combined norm of all possible initial states and noises affecting the system. Given that there is no established stochastic model for , the cost function in Eq. (14) allows us to make robust estimation of content popularity from game-theoretical perspective. Suppose that there is an unrestricted adversary, who can control the initial state and the magnitude of to maximize the error of our estimation. While we focus on minimizing the numerator of Eq. (14), the adversary may incur infinite magnitude of disturbances. The form of prevents the adversary from using brute force to maximize . Instead, the adversary needs to carefully choose and as it tries to maximize . Formally, this game can be generalized as a minmax problem, and the optimal estimation on achieves the minimal cost as


Given the cost function in Eq. (14), directly minimizing is challenging. In practice, a better approach for H_∞ filtering is to seek a sub-optimal estimation that meets a given threshold. Specifically, one can try to find such that the optimal estimate of among all possible (even including the worst-case performance measure) satisfies


where is the prescribed performance bound. Eq. (16) indicates that the H_∞ filter guarantees the smallest estimation error over all possible finite disturbances of the noise magnitude [28]. Rearranging Eq. (16) gives the following equivalent minmax problem


This minmax problem can be interpreted as a zero-sum game against the adversary. With a given , our goal is to find an estimation that wins the game (i.e., achieves a negative cost ). By resorting to the Lagrange multiplier method for the dynamic constrained optimization problem (17), the H_∞ filter approach results in the following iterative algorithm to find the optimal estimates for all :


where is initialized as , is the filter gain, given by




with initialized by . The detailed proof of this solution can be found in [30]. With the aid of the H_∞ filter, we are able to make performance-guaranteed estimation of the location feature vector , regardless of the detailed noise structure. Based on the estimation of the location feature vector, a content popularity prediction and caching algorithm can be further devised.
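The recursion can be illustrated with a minimal sketch. It does not reproduce Eqs. (18)-(20); it assumes the common textbook form of the discrete-time H_∞ filter specialized to a constant state, a scalar observation, and unit weighting, with the prescribed bound written as `gamma`. The function name `hinf_step`, the variable names, and the toy data are all hypothetical.

```python
import random


def hinf_step(theta, P, x, y, gamma):
    """One H-infinity update for y = x . theta + noise with a constant theta.

    Assumed textbook form (identity transition, unit weighting):
        P_new^{-1} = P^{-1} + (1 - gamma**-2) * x x^T
        theta_new  = theta + P_new x (y - x . theta)
    """
    n = len(x)
    c = 1.0 - gamma ** -2
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + c * sum(x[i] * Px[i] for i in range(n))
    if denom <= 0.0:
        # The updated matrix would lose positive definiteness.
        raise ValueError("gamma is too small for this attribute vector")
    # Sherman-Morrison form of the rank-one update (P is symmetric).
    P_new = [[P[i][j] - c * Px[i] * Px[j] / denom for j in range(n)]
             for i in range(n)]
    gain = [sum(P_new[i][j] * x[j] for j in range(n)) for i in range(n)]
    residual = y - sum(x[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * residual for i in range(n)]
    return theta_new, P_new


# Toy run on noise-free synthetic data (all values are illustrative).
rng = random.Random(0)
true_theta = [0.5, -1.0, 2.0]
theta_hat = [0.0, 0.0, 0.0]
P = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
for _ in range(2000):
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]
    y = sum(a * b for a, b in zip(x, true_theta))
    theta_hat, P = hinf_step(theta_hat, P, x, y, gamma=10.0)
max_error = max(abs(a - b) for a, b in zip(theta_hat, true_theta))
```

In this assumed form, a very large `gamma` makes the update coincide with recursive least squares, while a `gamma` that is too small triggers the feasibility check, which mirrors the singularity issue discussed next.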

Input: A constant close to but larger than one.
Output: Set of files to be cached in each EN.
1:  Initialization: Cache files in every EN and get the initial attribute vectors of all file-EN pairs; choose symmetric positive definite for all ; choose the smallest possible value of , so that is nonsingular; , for all .
2:  for  do
3:     for each EN  do
4:        for each file  do
5:           Obtain the attribute vector
6:           Compute the estimated user demand:
7:        end for
9:        Cache the set of files in EN
10:        Observe user demand of cached files
11:        Update based on and of all cached files: , where
12:     end for
13:  end for
Algorithm 2 HPDT: H_∞ filter Prediction with Dynamic Threshold for Location-aware Edge Caching

V-C From Determining to the HPDT Caching Algorithm

The performance of the H_∞ filter highly depends on the prescribed threshold . A smaller results in a smaller estimation error. However, if is too small, in Eq. (20) may be singular, which renders the iterative solution infeasible. Hence, the value of should be carefully selected. An adaptive scheme for threshold selection was proposed in [28], which makes online iterative prediction possible. Denote by the threshold at the -th iteration. It should be chosen to guarantee that is positive definite, i.e.,


Denote by the eigenvalues of matrix , and is the -th largest eigenvalue. According to the min-max theorem on matrix eigenvalues, the adaptive threshold should satisfy


for all . Since holds for all , equivalently, we have . We may let


where is a constant very close to but larger than one, so that is guaranteed to be positive and hence is nonsingular. Meanwhile, the magnitude of is also suppressed.
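As an illustration of choosing the threshold just above its feasibility boundary, the sketch below derives the boundary for one specific, assumed update form, namely P_new^{-1} = P^{-1} + (1 - gamma^{-2}) x x^T with a rank-one attribute term. Under that assumption the matrix determinant lemma gives the condition gamma^2 > q / (1 + q), with q = x^T P x. This is not necessarily identical to Eqs. (21)-(23); the names `adaptive_gamma`, `quad_form`, and `alpha` are hypothetical.

```python
def quad_form(P, x):
    """Return x^T P x for a square matrix P given as nested lists."""
    n = len(x)
    return sum(x[i] * P[i][j] * x[j] for i in range(n) for j in range(n))


def adaptive_gamma(P, x, alpha=1.01):
    """Pick gamma slightly above the feasibility boundary.

    Assumed update: P_new^{-1} = P^{-1} + (1 - gamma**-2) * x x^T, which stays
    positive definite exactly when gamma**2 > q / (1 + q), with q = x^T P x.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must be larger than one")
    q = quad_form(P, x)
    if q == 0.0:
        return 1.0  # x is the zero vector; any positive gamma is feasible
    return (alpha * q / (1.0 + q)) ** 0.5


# Illustrative values only.
P = [[2.0, 0.0], [0.0, 0.5]]
x = [1.0, 2.0]
gamma = adaptive_gamma(P, x)
# Feasibility margin 1 + (1 - gamma^-2) * q; it must be positive.
margin = 1.0 + (1.0 - gamma ** -2) * quad_form(P, x)
```

With `alpha` close to one, the margin equals (1 + q)(1 - 1/alpha), which is small but strictly positive, so the update remains invertible while the threshold is kept as small as this form allows.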

With the aid of the H_∞ filter technique, we are able to make performance-guaranteed estimation of the location feature vectors. Therefore, more precise prediction of content popularity can be made, and hence differentiated caching policies can be devised on each EN. The corresponding prediction and caching algorithm is sketched in Algorithm 2. Similar to Algorithm 1, each iteration of the HPDT algorithm can be summarized in the following three steps after initialization.

  1. Predict: During time slot , estimation of the location feature vector of each EN is predicted based on the updated location feature vector by the end of time slot .

  2. Optimize and cache: Based on the predicted content hit rate profile, the set of contents with maximized predicted content hit rate are cached on each EN respectively. Certain contents may be cached in multiple ENs simultaneously.

  3. Observe and update: At the end of time slot , the empirical hit rate of the cached files on each EN is observed, which is then used to update the input of the filtering process, yielding the updated location feature vector.

The adaptive adjustment of in Eq. (23) is crucial to the online HPDT algorithm. It is tuned to its minimum at each iteration, so that is guaranteed to be positive definite, and meanwhile the upper bound of the cost function is minimized.
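The three steps above can be sketched as a small online loop for a single EN. The estimator update used here is a normalized gradient step chosen only as a stand-in to keep the example short; it is not the H_∞ update of Algorithm 2. The function name `run_online_caching`, the `step` parameter, and the toy data are hypothetical.

```python
import heapq


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def run_online_caching(attributes, demands, cache_size, step=0.5):
    """Predict, cache, observe, and update for a single EN.

    attributes[t][f] is the attribute vector of file f in slot t and
    demands[t][f] its realized demand. The update below is a normalized
    gradient step used only as a stand-in for the filter update.
    """
    theta = [0.0] * len(attributes[0][0])
    hits_per_slot = []
    for x_t, d_t in zip(attributes, demands):
        # 1) Predict the demand of every file from the current estimate.
        predicted = [dot(x, theta) for x in x_t]
        # 2) Cache the files with the highest predicted demand.
        cached = heapq.nlargest(cache_size, range(len(x_t)),
                                key=predicted.__getitem__)
        # 3) Observe the demand of cached files only, then update.
        hits_per_slot.append(sum(d_t[f] for f in cached))
        for f in cached:
            err = d_t[f] - dot(x_t[f], theta)
            scale = step * err / (1.0 + dot(x_t[f], x_t[f]))
            theta = [th + scale * xi for th, xi in zip(theta, x_t[f])]
    return theta, hits_per_slot


slots = 5
file_attributes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
file_demands = [1.0, 2.0, 3.0, 1.5]  # generated by theta = [1, 2], no noise
theta_hat, hits = run_online_caching([file_attributes] * slots,
                                     [file_demands] * slots, cache_size=2)
```

Only cached files produce observations, so the estimate is refined solely from the contents the EN chose to store, which is the feedback structure the algorithm has to cope with.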

V-D Regret Analysis

The H_∞ filter technique provides a robust estimation of the location feature vector regardless of the statistical model of the noise process. However, this approach is also conservative since it needs to accommodate the disturbances of all kinds of noise processes. In this subsection, the performance bound of the H_∞ filter based prediction and caching algorithm is given.

Since the prescribed performance threshold is crucial to the prediction accuracy, the adaptive threshold in the HPDT algorithm is first characterized in this subsection. Note that is initialized as a symmetric and positive definite matrix, and is also guaranteed to be symmetric and positive definite with the help of . According to Weyl’s monotonicity theorem [33], the smallest eigenvalue of can be bounded as:


where is the largest eigenvalue of , and can be simply bounded by the matrix trace as . As is positive definite, we have


where is very close to but smaller than one. According to Eq. (20), the smallest eigenvalue of equals , which is suppressed to be small but positive. Hence, is guaranteed to be nonsingular. Iteratively, is positive definite for all .

In the following, a theorem is given to provide a bound on the caching regret of the HPDT algorithm. Let be the caching size of each EN, and be the cardinality of the ground file set. Suppose content hit rate satisfies the linear model given by Eq. (13), and note that the attribute vectors are bounded as for all , and . Let and be the upper bound of and for all and , respectively. Then, we have the following theorem.

Theorem 2.

The time-averaged regret of the HPDT algorithm is of order .

Please refer to Appendix C for the proof. Theorem 2 indicates that, if the linear model is free of noise, the time-averaged regret of the HPDT algorithm tends to zero as grows to infinity. Otherwise, the HPDT algorithm may not approach the optimal solution, and its performance depends on the noise magnitude. This is due to the characteristics inherited from the H_∞ filter. Since the H_∞ filter makes no assumption on the noise features, to minimize the worst-case estimation error, it needs to accommodate all possible noise processes, which turns out to be over-conservative. However, when the noise is zero-mean, the regret of the HPDT algorithm reduces to the order of , which is smaller than that of the RPUC algorithm. In the next section, we further evaluate the two proposed algorithms by numerically decomposing the estimation errors, and examine the algorithms through experiments on a real dataset.

VI Numerical Analysis and Experimental Results

To validate the performance of the proposed caching algorithms, numerical analysis is first performed in this section. Afterwards, an experiment based on a real-world dataset from YouTube is conducted to further illustrate the performance of the proposed algorithms in practical scenarios.

VI-A Numerical Analysis

The two proposed algorithms can be used for content caching with different user demand features. In essence, they estimate the location feature vector , which specifies the location characteristics and user preferences on each EN. Given the linear model of user demand, the performance of the proposed algorithms highly depends on the accuracy of the estimation. We use the mean square error (MSE) to evaluate the accuracy of the proposed algorithms. For notational simplicity, the indices and are omitted in this section. Let be the underlying feature vector, and be the estimation of by using the proposed algorithms. Denote , then, the MSE can be defined in Euclidean norm as


The two terms in Eq. (26) turn out to be the variance and bias of , respectively.
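For reference, the standard decomposition of the mean square error is written out below. The symbols are assumed notation and may differ from those of Eq. (26): \theta is the underlying feature vector, \hat{\theta} its estimate, and \bar{\theta} the expectation of the estimate. In this standard form the second term is the squared norm of the bias.

```latex
\mathbb{E}\big[\|\hat{\theta}-\theta\|_2^2\big]
= \underbrace{\mathbb{E}\big[\|\hat{\theta}-\bar{\theta}\|_2^2\big]}_{\text{variance}}
+ \underbrace{\|\bar{\theta}-\theta\|_2^2}_{\text{squared bias}},
\qquad \bar{\theta} = \mathbb{E}[\hat{\theta}].
```

The cross term vanishes because \mathbb{E}[\hat{\theta}-\bar{\theta}] = 0, so the two contributions can be measured separately in simulation.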

Note that the RPUC algorithm is based on ridge regression. Unlike ordinary least squares regression, which yields an unbiased estimate of the feature vector, ridge regression intentionally introduces bias so as to reduce the variance of the estimation. Moreover, the RPUC algorithm adds a perturbation to the ridge regression estimate to account for noise uncertainty, which further increases the bias.

In contrast, the HPDT algorithm makes no assumption on the statistical model of the underlying noise process. It is able to meet the prescribed performance threshold even if the noise process leads to the worst case. Moreover, the HPDT algorithm makes unbiased estimation on the feature vector. This can be observed from the definition of the cost function in Eq. (14). By decomposing the numerator of , we have


As the H_∞ filter makes robust estimation over all kinds of noise structures and initial conditions, given any , there exists a combination of initial condition and noise such that . Since this is true for all , according to Eq. (27), the cost function , which will grow linearly if . Consequently, any algorithm that bounds the cost function must be unbiased, i.e., .

On the other hand, the performance of the HPDT algorithm also depends on the a priori confidence on the estimation of the initial state, i.e., the selection of . A smaller matrix (eigenvalue) should be chosen if the estimation of initial condition is made with larger uncertainty, and vice versa.

Fig. 3: Comparison of estimation errors of ridge regression and HPDT algorithm under varying sample rate and noise structure. a) Zero-mean noise; b) Non-zero-mean noise of uniform distribution (with smaller mean value); c) Non-zero-mean noise of normal distribution (with larger mean value).

            Ridge Regression            HPDT ( = 5)                 HPDT ( = 10)                HPDT ( = 30)
Time Span   Variance  Bias     MSE      Variance  Bias     MSE      Variance  Bias     MSE      Variance  Bias     MSE
20          0.1659    0.4284   0.5943   0.2881    0.1094   0.3975   0.2687    0.1045   0.3732   0.2647    0.1036   0.3683
50          0.1364    0.1364   0.2728   0.1315    0.0126   0.1441   0.1244    0.0116   0.1360   0.1228    0.0115   0.1343
200         0.0545    0.0927   0.1472   0.0335    0.0131   0.0466   0.0318    0.0129   0.0447   0.0314    0.0129   0.0444
500         0.0347    0.0381   0.0728   0.0134    0.0155   0.0289   0.0127    0.0154   0.0282   0.0126    0.0154   0.0280
1000        0.0211    0.0207   0.0418   0.0067    0.0164   0.0231   0.0064    0.0164   0.0227   0.0063    0.0164   0.0227
2000        0.0157    0.0070   0.0227   0.0034    0.0174   0.0207   0.0032    0.0174   0.0206   0.0032    0.0174   0.0205
TABLE I: Comparison of Estimation Variance and Bias

We conduct simulations based on synthesized time sequences, which are generated according to a prescribed linear model with zero-mean noise and non-zero-mean noise, respectively. Since the proposed RPUC algorithm is perturbed intentionally, we only present the comparison between ridge regression and the HPDT algorithm. Fig. 3 shows that, under all scenarios, ridge regression provides a more stable estimation, achieving smaller variance than the HPDT algorithm when the amount of historical data is small. This stability advantage of ridge regression stems from its intentional penalty. However, the HPDT algorithm performs much better in terms of bias and MSE under varying sampling rates in all scenarios, which demonstrates the robustness of the HPDT algorithm. Moreover, as the noise magnitude increases, the gain of HPDT over ridge regression also increases.

To demonstrate the impact of on the performance of HPDT, estimation results of the HPDT algorithm with different initialization matrices are presented. is initialized as a diagonal matrix with positive elements on the main diagonal. When the confidence in the initial state is high, a larger with bigger eigenvalues is used. As shown in Table I, with larger eigenvalues achieves better performance than the others, which means that the initial guess is close to the prescribed vector. In practice, the matrix can be selected according to the prior information regarding the initial condition.

VI-B Real Dataset Experiment

Fig. 4: Popularity skewness of the video set in our experiment. Note that the videos are randomly crawled from YouTube, which may not reflect the overall skewness of video popularity.

Fig. 5: The content hit rate comparison between the proposed algorithm and other benchmarks with varying caching size, where the total number of files is 800 and the caching sizes of six figures are 5, 20, and 100 respectively.

VI-B1 Experiment Setup

To further demonstrate the advantages of the proposed algorithms in practical scenarios, we conduct an experiment on a dataset crawled from YouTube. On YouTube, some video owners made their video view statistics open to the public. Among other information, the view count is recorded on a daily basis. To obtain this information, a Python-based crawling program was written, and the request record of each video was crawled into a .json file, on which the remaining Python-based experiments were conducted. In total, videos are randomly crawled, which were uploaded before January 2013, with full view statistics till May 2017. The most popular video has been watched over billion times by the end of the time span, while the least popular one has been rarely viewed across the time span. Fig. 4 shows the statistics of video popularity skewness. The popularity of the most popular videos is highly skewed, and the most popular 50 videos account for almost of the total view amount.

Note that the dataset only contains the global statistics of each video record (with the recent update of the YouTube webpage, even the global view statistics are inaccessible), while the view statistics of most online video content providers in a local area are unavailable. To emulate the video request processes at different locations, the original statistics of each video are shifted and scaled randomly over the time span. By shifting, the original global request record of each content is moved backward or forward on the time span. By scaling, each content request record is then randomly scaled up or down. After shifting and scaling, the original global request statistics are transformed and treated as the requests from different locations. The key point is that the pattern of the record remains valid after the above transformation. In this way, we are able to characterize the location diversity based on the emulated view statistics.
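The shifting and scaling described above can be sketched as follows. The text does not specify the shift range, the scaling range, or how the ends of the time span are handled, so the zero-filling at the boundaries and every parameter below are assumptions; `shift_and_scale` and `emulate_locations` are hypothetical names.

```python
import random


def shift_and_scale(views, shift, scale):
    """Shift a daily view series by `shift` days and multiply it by `scale`.

    Days shifted outside the time span are dropped and vacated days are set
    to zero; this boundary handling is an assumption.
    """
    n = len(views)
    out = [0.0] * n
    for day, count in enumerate(views):
        target = day + shift
        if 0 <= target < n:
            out[target] = count * scale
    return out


def emulate_locations(views, num_locations, max_shift, scale_range, seed=0):
    """Create one randomly shifted and scaled trace per emulated location."""
    rng = random.Random(seed)
    traces = []
    for _ in range(num_locations):
        shift = rng.randint(-max_shift, max_shift)
        scale = rng.uniform(scale_range[0], scale_range[1])
        traces.append(shift_and_scale(views, shift, scale))
    return traces


global_views = [1, 2, 3, 4]  # illustrative daily view counts
later = shift_and_scale(global_views, 1, 2.0)
earlier = shift_and_scale(global_views, -1, 2.0)
local_traces = emulate_locations(global_views, num_locations=3,
                                 max_shift=1, scale_range=(0.5, 1.5))
```

Because every value is multiplied by the same factor and moved by the same offset, the shape of each record is preserved, which is the property the emulation relies on.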

Specifically, consider the content library containing those videos. Each video can be cached on ENs, each with caching size . Content refreshing is scheduled according to the network traffic pattern. For example, wireless traffic presents regular peaks and valleys every day. Hence, content refreshing can be performed during the off-peak period with minimal impact on normal network activity. A video can be characterized from several aspects, including video quality, genre, length, and historical view statistics. In this experiment, we use the view amount information in the past days as the attribute vector of each content, i.e., . Based on the attribute vectors and an initial guess on the content popularity, the algorithms gradually select contents that are predicted to be more popular than the others, and cache them on each EN accordingly. The long-term content hit rates of the proposed algorithms are compared with the following benchmark algorithms.

  1. Hindsight optimal: By analyzing the full view record over the time span at each EN, the most popular videos are selected and cached respectively. Note that this benchmark requires future information and hence cannot be implemented in practice.

  2. Location oblivious (denoted by LO): During each time slot, the historical demands of all the contents from all ENs are analyzed, and the ones that are predicted (by ridge regression) to have the highest demands in the next time slot are identified. Then, all ENs cache the same set of contents without location differentiation.

  3. Random: A random set of videos is selected to update the ENs during each time slot.
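To make the difference between location-aware and location-oblivious selection concrete, the sketch below computes the hit rate of both on a toy demand table. It is a simplification: both selections use the actual total demand instead of the ridge-regression forecasts used by the LO benchmark, so that only the effect of location differentiation is visible. The demand numbers and the helper names `top_k` and `hit_rate` are hypothetical.

```python
import heapq


def top_k(scores, k):
    """Indices of the k largest entries of scores."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


def hit_rate(demand_per_en, cache_per_en):
    """Fraction of all requests served from the EN caches."""
    served = sum(demand[f]
                 for demand, cache in zip(demand_per_en, cache_per_en)
                 for f in cache)
    total = sum(sum(demand) for demand in demand_per_en)
    return served / total


# Hypothetical total requests per file at two ENs over the whole time span.
demand_per_en = [[10, 1, 1, 8],
                 [1, 10, 8, 1]]
k = 2

# Location-aware hindsight choice: each EN caches its own top-k files.
aware = [top_k(demand, k) for demand in demand_per_en]

# Location-oblivious choice: every EN caches the top-k of the summed demand.
summed = [sum(col) for col in zip(*demand_per_en)]
oblivious = [top_k(summed, k)] * len(demand_per_en)

aware_rate = hit_rate(demand_per_en, aware)
oblivious_rate = hit_rate(demand_per_en, oblivious)
```

In this toy table the two ENs prefer disjoint sets of files, so caching the globally most requested files at both ENs serves far fewer requests than letting each EN cache its own favorites.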

VI-B2 Experimental Results

As shown in Fig. 4, the popularity of YouTube videos is highly skewed, and the most popular videos have attracted almost of user requests. The skewness of video popularity has also been validated in [25]. The popularity of this dataset can be roughly divided into three levels: highly skewed (popularity of the top videos); medium skewed (popularity of videos ranking from to ); and less skewed (the remaining ones).

Figure 5 shows the comparison of long-term content hit rates of different algorithms with varying EN caching sizes. The performance of the caching algorithms is affected by the skewness of the popularity profile. However, the proposed location-based approaches always outperform the location-oblivious scheme across caching sizes. Specifically, when the caching size falls into the highly skewed area or the less skewed area, the proposed caching algorithms RPUC and HPDT outperform other benchmarks considerably. In particular, the HPDT algorithm performs better than RPUC. For the highly skewed area, the top videos present much higher variance than the rest. As a result, the noise mean of those records is also significant. Since the RPUC algorithm is designed for zero-mean noise, its performance is limited when the noise amplitude is significant. In contrast, when the caching size falls into the less skewed area, the algorithms need to accommodate various noise types of different videos, which may not always be zero-mean. RPUC and HPDT perform equally well when the caching size falls into the medium skewed area (Fig. 5). The H_∞ filter is utilized to provide guaranteed performance even when the noise type leads to the worst case for estimation. As a result, the HPDT algorithm is conservative yet robust.

Figure 4 also indicates that content popularity is long-tailed, i.e., the less popular contents attract almost vanishing requests compared to the popular ones. As a result, the total hit rate of different caching schemes in Fig. 5 does not increase linearly with the caching size. Note that both algorithms run iteratively in an online fashion. During each iteration, the most computationally intensive execution is the times sorting of the estimated demands of contents, which has a typical computational complexity of . The complexity of the value assignments and matrix updates is negligible compared with sorting. As a result, both algorithms are of low time complexity.

VI-C Discussions

VI-C1 Another dimension of the prediction

This work focuses on the estimation of the location feature vector. The selection of the video attribute vector also influences the prediction accuracy. As mentioned before, other factors, such as video quality, length and genre, can also be used to characterize video contents. If such labeling information is available, by reducing the dimension as well as training on the dataset, we can identify influential features that affect content popularity. On the other hand, for the location feature vector , both RPUC and HPDT algorithms are designed with the precondition that is time-invariant, as indicated in both Eq. (6) and (13). In fact, the HPDT algorithm can be directly extended to the time-variant scenario if the state equation is also linear, i.e., , where is the transition matrix, and is the state noise vector. By resorting to the H_∞ technique, the adaptive estimation on can be made with guaranteed accuracy [28].

VI-C2 The applicability in practical video streaming

The proposed popularity prediction approach can be applied to the delivery of various types of contents. As video content consumes the most bandwidth, it deserves in-depth investigation. Practical video streaming protocols (such as HTTP-based Adaptive Streaming, HAS) divide a video content into several chunks/segments, each with multiple bitrates and quality versions [38]. Those pull-based streaming protocols dynamically change the quality of the streamed video according to the observed network conditions on a per-fragment basis. Most of the research works on adaptive video streaming (both server-side bitrate switching [37] and client-side switching [39]) strive to predict the network condition when transmitting the next video segment. In contrast, with the aid of edge storage resources, our work focuses on push-based content distribution. In other words, estimating the available bandwidth is out of the scope of this work, and content updates are scheduled to off-peak periods, where streaming bandwidth is sufficient.

When streaming video contents based on HAS, whether to cache individual segments or the whole quality representation depends on both the available bandwidth and the content popularity. In particular, for the popular contents, users tend to keep requesting them regardless of the received video quality. Hence, it is more appropriate to store the whole representation of the video so as to fulfill users’ requirements via dynamic bandwidth. As the popularity profile of contents is highly skewed (shown in Fig. 4), the streaming provider only needs to store the full quality representation of the most popular videos (the amount of which depends on the caching size budget). For the less popular ones, the provider may choose to cache the individual segments that are of moderate quality, so as to save bandwidth and meanwhile be responsive to user requests. The rationale behind such a decision is the content popularity profile and the available caching resources, which is the merit of our work. In this sense, the proposed location-based popularity prediction approaches are crucial in HAS-based streaming systems, and the prediction of content popularity and network condition will collectively contribute to improved video streaming.

VII Conclusion

In this paper, we investigate popularity prediction for mobile edge caching, with special focus on location awareness. We model the content popularity profile by a linear model and propose online algorithms to deal with different statistical models of the noise process. The proposed RPUC algorithm achieves a content hit rate that asymptotically approaches the optimal solution when the noise is zero-mean. Noticing that the noise may not necessarily be zero-mean, we resort to the H_∞ filter technique and propose the HPDT algorithm for popularity prediction. This algorithm can achieve guaranteed prediction accuracy even when the worst-case noise occurs. Both algorithms can be implemented without training phases. Numerical analysis shows how the performance of the proposed algorithms is affected by different types of noises, the amount of historical data, and the initial state. Extensive experiments on a real dataset demonstrate the advantage of the proposed algorithms, which help to make customized caching decisions in practical scenarios. In future work, we will exploit locational features of neighboring ENs to make better caching decisions.

Appendix A Proof of Lemma 1

Let , based on Eq. (8), the estimation error can be rewritten as

Since , Hölder’s inequality indicates that . Then, the estimation error is bounded as


The right-hand side of the above inequality decomposes the estimation error into two parts, where the first (variance term) specifies the error caused by the linear model, and the second (bias term) is the bias incurred by the ridge regression parameter . According to Eq. (6), we have . Azuma’s inequality gives a probabilistic upper bound on the variance term of Eq. (28):


where the last inequality is due to the fact that


Hence, the variance term of Eq. (28) can be bounded by with probability at least . Further, the bias term of Eq. (28) can be bounded as


By substituting Eq. (29) and (31) into Eq. (28), the probabilistic bound in Eq. (