1. Introduction and Related work
The classical caching problem, which seeks to make popular contents quickly accessible by prefetching them in low-latency storage, has been extensively studied in the literature. The core idea of caching has been used in many diverse domains, including improving CPU paging performance via caches (silberschatz2006operating, ), web caching by Content Distribution Networks (aggarwal1999caching, ; nygren2010akamai, ; amazon2015amazon, ), and low-latency wireless video delivery through Femtocaching (shanmugam2013femtocaching, ). With the exponential growth of internet video traffic and the advent of new services consuming high bandwidth, such as augmented and virtual reality (AR/VR), the importance of caching for ensuring quality of service (QoS) is on the rise (chakareski2017vr, ). Top CDN providers, such as Amazon AWS and Microsoft Azure, now offer caching as a service (varia2014overview, ; chappell2010introducing, ).
Several caching algorithms have been proposed in the literature. The MIN algorithm (van2007short, ) is an optimal offline caching policy which assumes that the entire file request sequence is known non-causally in advance. MIN is often used as a benchmark for comparing the performance of online caching policies. Among the online policies, the Least Recently Used policy (LRU), the Least Frequently Used policy (LFU) (lee1999existence, ), the FIFO policy (dan1990approximate, ), and the online coded caching policy (pedarsani2016online, ) have been studied extensively. However, the performance guarantees available for most online caching policies are highly contingent upon a priori assumptions on the generative model of the file request sequence (breslau1999web, ; Zipf, ). The paper (jelenkovic2008characterizing, ) analyzed the performance of the LRU caching policy with an i.i.d. file request sequence, also known as the Independent Reference Model (vanichpun2004output, ). Under a Markovian assumption, the paper (pedarsani2016online, ) shows that a coded caching policy outperforms the LRU policy. The paper (flajolet1992birthday, ) develops a unified framework for analyzing a number of popular caching policies, again with a stationary file request model. In a different line of work, the papers (maddah2014fundamental, ; caching_rate1, ; caching_rate2, ; caching_rate3, ) derive information-theoretic lower bounds and efficient caching, computing, and coding schemes to facilitate bandwidth-efficient delivery of the cached files to the users.
With the frequent addition of new content to the library, mobility of the users, Femtocaching with small caches, and changes in the popularity distribution over time, the assumption of stationary file popularity barely holds in practice (traverso2015unravelling, ). This prompts us to consider the problem of caching from an online learning point of view, with no a priori statistical assumption on the file request sequence. Our work is inspired by the recent paper (paschos2019learning, ), which describes an online gradient-based coded caching policy (OGA) and proves a sublinear regret upper bound for the same. Interestingly, they also show that popular uncoded caching policies, such as LRU, LFU, and FIFO, suffer from linear regrets in the worst case. In fact, no uncoded caching policy with sublinear regret was previously known in the literature. More seriously, no regret lower bound is known for the network-caching problem.
In contrast to the multi-armed bandits setting (DBLP:conf/colt/MagureanuCP14, ; combes2015learning, ; kveton2015tight, ; grunewalder2010regret, ; agrawal2013further, ), relatively few results are known on regret lower bounds for online convex optimization problems. Technically, the network-caching problem is an instance of an online convex optimization problem with a piecewise linear reward function and polytope constraints. The paper (Abernethy08optimalstrategies, ) establishes a minimax regret lower bound for linear cost functions with hyperball constraints. A regret lower bound for unconstrained linear cost functions has been obtained in (hazan2006efficient, ). The papers (hazan2007logarithmic, ; hazan2014beyond, ) prove logarithmic regret bounds for online stochastic strongly convex problems. However, to the best of our knowledge, with the exception of (paschos2019learning, ), the regret for a linear cost function with a simplex and several box constraints, which arises in the context of the single cache problem, has not been studied before. Moreover, the problem of lower bounding the regret for a piecewise linear cost function with polytope constraints, which arises in the context of network caching, is completely open.
The above considerations inspire us to ask the following two questions in this paper:
Question 1. What is the fundamental performance limit of all online caching policies regardless of their operational constraints or computational complexity?
Question 2. Can a simple, distributed network-caching strategy be designed which meets the above fundamental limit?
In answering Question 1, we derive universal regret lower bounds that also apply to computationally intensive caching policies, which may completely change the profile of the cached contents at every time slot. Surprisingly enough, we answer Question 2 in the affirmative. In particular, our matching upper and lower bounds reveal that a simple gradient-based incremental coded caching policy is regret-optimal. Moreover, we propose a new Follow-the-Perturbed-Leader-based uncoded caching policy that has near-optimal regret. Hence, one of the key takeaway points of this paper is that there exist computationally cheap caching policies that perform excellently in an online setting, even with adversarial request sequences.
Our contributions:
In the process of answering the above questions, we make the following key technical contributions:
(1) Lower bound for a single cache:
In Theorem 2.3, we prove a tight non-asymptotic regret lower bound for the single-cache problem. This result improves upon the previously known asymptotic regret lower bound in (paschos2019learning, ), which can be arbitrarily loose for a sufficiently large library size.
(2) Lower bounds for caching networks:
In Theorems 3.3 and 3.4, we derive non-asymptotic sublinear regret lower bounds for bipartite caching networks. We also show that the lower bounds are tight within constant factors. To the best of our knowledge, these are the first known regret lower bounds for piecewise linear reward functions with polytope constraints. Hence, our results also contribute to the growing literature on online convex optimization.
(3) Nearoptimal uncoded caching policy:
Although the above lower bounds certify the optimality of the gradient-based coded caching policy of (paschos2019learning, ), it was an open problem whether one could achieve the optimal regret without coding. In Algorithm 1, we propose a simple uncoded caching policy based on the Follow-the-Perturbed-Leader (FTPL) paradigm. Theorem 2.4 shows that the FTPL policy has near-optimal regret.
(4) New proof techniques:
Technically, the regret lower bounds are established by relating the online caching problem to the classic probabilistic setup of balls into bins via Lemma 2.2. In this lemma, we derive a non-asymptotic lower bound on the expected total load in the most populated bins when a number of balls are thrown uniformly at random into a number of bins. For our lower bound proofs, we are particularly interested in the regime where the number of balls far exceeds the number of bins. Classical results on randomized load balancing, such as (mitzenmacher1996power, ; mitzenmacher2017probability, ; gonnet1981expected, ), do not apply here because they hold in a different regime of the ratio of balls to bins. The paper (ballsinbins, ) derives asymptotic high-probability bounds on the maximum load in various asymptotic regimes of the number of balls and bins. However, since we are primarily interested in the non-asymptotic expected value of the load, the results of (ballsinbins, ) do not suffice for our purpose. Consequently, we tackle this problem from first principles, culminating in Lemma 2.2. To the best of our knowledge, this is the first paper in which a connection between online learning and the framework of balls into bins has been explicitly brought out and exploited in proving regret lower bounds.
(5) Numerical experiments:
In Section 4, we compare the performance of different caching policies using the popular MovieLens 1M dataset (movieLens, ). Our experiments reveal that the proposed FTPL policy beats other competitive caching policies in terms of long-term average regret.
2. Single Cache
In this section, we begin our investigation with the single cache problem. We establish a key technical lemma on the balls-into-bins problem, which is used in all of our lower bound proofs. Our analysis in this section improves upon the best-known regret lower bound for a single cache (paschos2019learning, ) and paves the way for analyzing the bipartite caching problem in Section 3.
2.1. System Model
In the classical caching problem with a single cache, there is a library of $N$ distinct files. A cache with a limited storage capacity can store at most $C$ files at any time slot. In practice, the cache capacity is significantly smaller than the entire library size (i.e., $C \ll N$). As an example, we may think of caching movie files in a Netflix data center, where the library size increases every day as new movies are released, but the physical cache size in the data center remains constant on the relevant timescale. Time is slotted, and a user may request at most one file at a time. The file request at time slot $t$ is represented by an $N$-dimensional binary vector $x_t$, where $x_{tf} = 1$ if the $f$-th file is requested by the user at time $t$, and is zero otherwise (one-hot encoding). Following an online caching policy $\pi$, files are cached at every time slot before the request for that slot arrives. We do not make any statistical assumption on the file request sequence $\{x_t\}_{t \ge 1}$. Thus, we may as well assume that the requests are made by an omniscient adversary who has complete knowledge of the cached contents and the caching policy in use. See Figure 1 for a schematic.
Coded Caching:
In this paper, we consider both coded and uncoded caching. In classical uncoded caching, complete files are cached. On the other hand, in coded caching, the original files are first encoded using fountain codes (a class of rateless erasure codes), e.g., Raptor codes (shokrollahi2006raptor, ; luby2011raptorq, ), and then some of the resulting coded symbols are cached. These codes have the property that an original file consisting of $k$ source symbols can be recovered (with high probability) by combining any subset of $k'$ coded symbols, where $k'$ needs to be only slightly larger than $k$. Hence, for decoding, it does not matter which encoded symbols are combined, as long as the decoder has access to sufficiently many of them. We will see that coding offers distinct advantages for network caching. These codes also admit highly efficient linear-time encoding and decoding operations. Rateless codes are routinely used in P2P data streaming, large-scale data centers, and CDNs (wu2007rstream, ).
Caching configuration:
The cache configuration at time $t$ is represented by an $N$-dimensional vector $y_t$, where $y_{tf}$ denotes the fraction of the $f$-th file cached at time $t$ under the policy $\pi$ (whenever the caching policy is clear from the context, we will drop the superscript to simplify the notation). Naturally, in the uncoded case $y_{tf} \in \{0, 1\}$. The set of all admissible caching configurations is denoted by $\mathcal{Y}$, where

(1) $\mathcal{Y} = \Big\{ y \in [0, 1]^{N} : \sum_{f=1}^{N} y_f \le C \Big\},$

and $C$ is the capacity of the cache. The caching decision $y_t$ may be randomized and may depend on the file request sequence and caching decisions up to time $t-1$. Any requested file not present in the cache is routed to a central server and accrues zero reward.
2.2. Reward
A popular performance metric for any online caching policy is its average hit rate, i.e., the average number of requested files already present in the cache, so that the files can be quickly retrieved. In this connection, let the user-generated file request sequence be denoted by $\{x_t\}_{t \ge 1}$. The total reward up to time $T$ accrued by a caching policy, which responds to the file requests by setting the cache configuration to $y_t$ at time $t$, is denoted by $Q(T)$. We assume that the cumulative reward over a time horizon $T$ has an additive structure, obtained by summing up the rewards earned at every slot up to time $T$, i.e.,

(2) $Q(T) = \sum_{t=1}^{T} q(x_t, y_t).$

The one-slot reward function $q(\cdot, \cdot)$ captures the reward obtained per slot. Intuitively, $q(x_t, y_t)$ denotes the extent of cache hits for the request vector $x_t$ against the cache configuration vector $y_t$. The function $q$ takes different functional forms depending on whether we consider (a) a single cache, or (b) a network of caches. For the case of a single cache, we define the one-slot reward to be the amount of the requests successfully served by the cache, i.e.,

(3) $q(x_t, y_t) = \langle x_t, y_t \rangle.$
A generalized definition of the above one-slot reward function, applicable to caching networks, will be given in Section 3.
2.3. Regret
Since we do not make any assumption on the user-generated file request sequence $\{x_t\}_{t \ge 1}$, it is futile to attempt to optimize the total reward (i.e., hit rate). This is because, at any slot, an omniscient adversary can always request a file which is not present in the cache, thus yielding a total of zero hits. To obtain a non-trivial performance measure, we cast the caching problem into the framework of online learning. This prompts us to compare the performance of any policy $\pi$ with that of the best offline stationary policy (shalev2012online, ). Let the vector $y^* \in \mathcal{Y}$ denote any fixed stationary cache configuration vector. The vector $y^*$ may be selected offline after seeing the entire request sequence $\{x_t\}_{t=1}^{T}$. Following the usual convention in the online learning literature, we define the regret for a request sequence $\{x_t\}$ to be the difference between the reward obtained by the best stationary caching configuration and that of the online policy $\pi$. Mathematically (recall that the cache configuration sequence $\{y_t\}$ is determined by the policy $\pi$),

(4) $R_T^{\pi}(\{x_t\}) = \max_{y^* \in \mathcal{Y}} \sum_{t=1}^{T} q(x_t, y^*) - \sum_{t=1}^{T} q(x_t, y_t).$

The regret of any caching policy $\pi$ up to time $T$ is defined to be the maximum regret over all admissible request sequences, i.e.,

(5) $R_T^{\pi} = \sup_{\{x_t\}_{t=1}^{T}} R_T^{\pi}(\{x_t\}).$
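To make the regret definition concrete in the uncoded case, the following sketch (Python; the helper names are ours, purely for illustration) computes the regret of an online policy against the best cache configuration fixed in hindsight, which simply stores the $C$ most frequently requested files:

```python
import numpy as np

def best_static_reward(requests, C):
    """Reward of the best cache configuration fixed in hindsight:
    cache the C most frequently requested files (uncoded case)."""
    counts = np.bincount(requests)
    return int(np.sort(counts)[-C:].sum())

def regret(requests, cached_sets, C):
    """Regret of an online policy whose cache content at slot t is
    cached_sets[t], a set of at most C file ids chosen before x_t."""
    hits = sum(1 for x, S in zip(requests, cached_sets) if x in S)
    return best_static_reward(requests, C) - hits
```

For example, on the request sequence 0, 1, 0, 1 with capacity one, a policy that always caches the previously requested file scores zero hits, while the best fixed cache scores two, giving regret two.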
2.4. Regret lower bounds: preliminaries
Using the simple observation that the maximum of a set of real numbers is at least equal to their average, for any joint probability distribution $p$ on the file request sequence $\{x_t\}_{t=1}^{T}$, the regret in Eqn. (5) may be lower bounded as follows:

(6) $R_T^{\pi} \ \ge\ \mathbb{E}_{\{x_t\} \sim p}\Big[\max_{y^* \in \mathcal{Y}} \sum_{t=1}^{T} q(x_t, y^*) - \sum_{t=1}^{T} q(x_t, y_t)\Big].$
2.5. Regret lower bound for unitsized cache
We first consider a single cache of unit capacity and derive a universal non-asymptotic regret lower bound. As we will see in the sequel, understanding this special case lays the foundation for our subsequent analysis of caching networks serving multiple users.
Theorem 2.1 (Lower bound for a singlecache of unit capacity).
The regret of any online caching policy $\pi$, for a library size $N = 2$ and cache capacity $C = 1$, is lower bounded as:

$R_T^{\pi} \ \ge\ \sqrt{\frac{T}{2\pi}} - \Theta\Big(\frac{1}{\sqrt{T}}\Big).$
Proof.
Denote the cache configuration selected by the caching policy $\pi$ at slot $t$ by the vector $y_t = (y_{t1}, y_{t2})$, where $y_{tf}$ denotes the fraction of file $f$ cached by the policy ($f \in \{1, 2\}$). Recall the definition of regret in this context:

(7) $R_T^{\pi} = \sup_{\{x_t\}} \Big( \max_{y^* \in \mathcal{Y}} \sum_{t=1}^{T} \langle x_t, y^* \rangle - \sum_{t=1}^{T} \langle x_t, y_t \rangle \Big),$

where the set of all admissible configurations $\mathcal{Y}$ is given by Eqn. (1). Denote the file request vector at slot $t$ by $x_t = (x_{t1}, x_{t2})$, where $x_{t1} + x_{t2} = 1$. It can be easily seen that, for a given file request sequence $\{x_t\}_{t=1}^{T}$, an optimal choice of the fixed cache configuration vector $y^*$ in Eqn. (7) is to fully cache whichever file is requested more often, i.e., $y^* = (1, 0)$ if $\sum_t x_{t1} \ge \sum_t x_{t2}$, and $y^* = (0, 1)$ otherwise.
To prove a universal regret lower bound, we need to show the existence of an adversarial file request sequence under which the caching policy performs poorly. Towards this, let $\{S_t\}_{t \ge 1}$ be a sequence of i.i.d. uniform Bernoulli random variables, i.e., $\mathbb{P}(S_t = 1) = \mathbb{P}(S_t = 0) = 1/2$. Construct a random file request sequence $\{x_t\}$ with $x_t = (S_t, 1 - S_t)$. The expected regret incurred for this sequence may be obtained from Eqn. (7) as

$\mathbb{E}\big[R_T^{\pi}(\{x_t\})\big] = \mathbb{E}\big[\max(B, T - B)\big] - \mathbb{E}\Big[\sum_{t=1}^{T} \langle x_t, y_t \rangle\Big],$

where the r.v. $B = \sum_{t=1}^{T} S_t$, being the summation of $T$ i.i.d. uniform Bernoulli variables, is Binomially distributed with parameters $(T, 1/2)$. Using the linearity of expectation, we can write

(8) $\mathbb{E}\Big[\sum_{t=1}^{T} \langle x_t, y_t \rangle\Big] = \sum_{t=1}^{T} \big\langle \mathbb{E}[x_t], \mathbb{E}[y_t] \big\rangle = \frac{1}{2} \sum_{t=1}^{T} \mathbb{E}\big[y_{t1} + y_{t2}\big] \ \le\ \frac{T}{2},$

where we have used the fact that $\mathbb{E}[x_t] = (\tfrac{1}{2}, \tfrac{1}{2})$ and the caching decision $y_t$ is independent of the incoming request $x_t$. Observe that we can write

(9) $\max(B, T - B) = \frac{T}{2} + \Big| B - \frac{T}{2} \Big|.$

Thus, combining Eqns. (8) and (9), we have

(10) $\mathbb{E}\big[R_T^{\pi}(\{x_t\})\big] \ \ge\ \mathbb{E}\Big[\Big| B - \frac{T}{2} \Big|\Big].$

The mean absolute deviation of a symmetric binomial random variable may be computed in closed form using De Moivre's formula ((berend2013sharp, ), Eqn. (1)); for even $T$,

(11) $\mathbb{E}\Big[\Big| B - \frac{T}{2} \Big|\Big] = \frac{T}{2^{T+1}} \binom{T}{T/2}.$

Eqn. (11), in combination with the non-asymptotic form of Stirling's formula (robbins1955remark, ), yields the following non-asymptotic lower bound:

(12) $\mathbb{E}\Big[\Big| B - \frac{T}{2} \Big|\Big] \ \ge\ \sqrt{\frac{T}{2\pi}} - \Theta\Big(\frac{1}{\sqrt{T}}\Big).$

For details of the above calculations, please refer to Appendix 7.1. Equation (12), coupled with Eqn. (10), shows the existence of a file request sequence $\{x_t\}$ such that $R_T^{\pi}(\{x_t\}) \ge \sqrt{T/(2\pi)} - \Theta(1/\sqrt{T})$.
∎
Remarks: It can be seen that the above proof and the lower bound in Theorem 2.1 continue to hold even if we let the library size $N$ exceed two. This observation follows by constructing a randomized file request sequence in which each of the first two files is requested with probability $1/2$, and the other files are requested with zero probability.
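De Moivre's closed form in Eqn. (11) and its $\sqrt{T/(2\pi)}$ asymptotics in Eqn. (12) are easy to sanity-check numerically. The sketch below (Python, even $T$ only; an illustration of the bound, not part of the proof) computes the exact mean absolute deviation:

```python
import math

def mad_binomial(T):
    """Exact mean absolute deviation E|B - T/2| for B ~ Bin(T, 1/2)
    and even T, via De Moivre's formula: T * C(T, T/2) / 2^(T+1)."""
    assert T % 2 == 0
    return T * math.comb(T, T // 2) / 2 ** (T + 1)

# The ratio mad_binomial(T) / sqrt(T / (2*pi)) approaches 1 from
# below as T grows, matching the lower bound in Eqn. (12).
```

For instance, `mad_binomial(2)` returns exactly 0.5, the mean absolute deviation of a fair two-coin toss around its mean.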
2.6. Lower bound for caches of arbitrary size
We now extend the previous result to caches of arbitrary size $C$. This extension is non-trivial. We will see in the sequel that our analysis naturally leads us to investigate a random variable arising in connection with the classic probabilistic framework of balls into bins, where a number of balls are thrown uniformly and independently at random into a number of bins (ballsinbins, ). The following lemma, which might be of independent interest, gives a non-asymptotic lower bound on the total number of balls in the most populated half of the bins.
Lemma 2.2 (Total occupancy in the most popular half).
Suppose that $T$ balls are thrown independently and uniformly at random into $2C$ bins. Let the random variable $M$ denote the number of balls in the most populated $C$ bins. Then

$\mathbb{E}[M] \ \ge\ \frac{T}{2} + \sqrt{\frac{CT}{2\pi}} - \Theta\Big(\sqrt{\frac{C^{3}}{T}}\Big).$
Proof outline:
The proof proceeds by pairing up the bins to form $C$ super bins (see Fig. 4), and then selecting the most occupied bin in each super bin to obtain a lower bound on $M$. Finally, we conclude the proof by appealing to the mean-deviation bound in Eqn. (12). Please refer to Section 6.1 for the complete proof of Lemma 2.2.
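The lemma can be checked empirically with a small Monte Carlo sketch (Python; the function name and parameters are ours, not the paper's):

```python
import random

def top_half_load(T_balls, C, trials=500, seed=1):
    """Monte Carlo estimate of E[M]: throw T_balls balls uniformly at
    random into 2C bins and count the balls in the C fullest bins."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        loads = [0] * (2 * C)
        for _ in range(T_balls):
            loads[rng.randrange(2 * C)] += 1
        loads.sort(reverse=True)
        total += sum(loads[:C])   # occupancy of the most popular half
    return total / trials
```

For $T = 1000$ balls and $C = 4$, the estimate comfortably exceeds the trivial value $T/2 = 500$, consistent with the $\sqrt{CT/(2\pi)}$ excess predicted by the lemma.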
To improve readability, in the rest of the paper we will rephrase the above bound as

$\mathbb{E}[M] \ \ge\ \frac{T}{2} + \sqrt{\frac{CT}{2\pi}},$

with the understanding that an explicit form of the lower-order terms may be obtained by using Lemma 2.2, if required. The following theorem is the main result of this section.
Theorem 2.3 (Lower bound for a single cache of arbitrary capacity).
The regret of any online caching policy $\pi$, for a library size $N$ and cache capacity $C$ with $N \ge 2C$, is lower bounded as

(13) $R_T^{\pi} \ \ge\ \sqrt{\frac{CT}{2\pi}}.$
Proof outline:
The proof proceeds along the lines of that of Theorem 2.1, where the $t$-th requested file is chosen independently and uniformly at random from the first $2C$ files of the library. Then, we show that the reward accrued by the best fixed cache configuration corresponds (in distribution) to the total number of balls in the most populated half of $2C$ bins. We then conclude the proof by appealing to Lemma 2.2. Please refer to Section 6.2 for the complete proof of Theorem 2.3.
Comparison with Theorem 1 of (paschos2019learning, ):
In the single cache setting, the paper (paschos2019learning, ) establishes a rather loose asymptotic regret lower bound, which could be arbitrarily smaller than the lower bound given in Eqn. (13) when the library size $N$ is sufficiently large. Theorem 2.3 improves the result in (paschos2019learning, ) in two ways. First, it proves a regret lower bound that is independent of the library size $N$. As a consequence, we will soon see that it implies that the gradient-based coded policy of (paschos2019learning, ) is regret-optimal up to a constant factor. Theorem 2.3 also implies the near-optimality of the uncoded FTPL policy described in the next section. Second, unlike the regret lower bound in (paschos2019learning, ), the bound in Eqn. (13) is non-asymptotic, thus giving a valid lower bound for any finite horizon $T$.
2.7. Achievability
We note that many popular classical uncoded caching policies, such as LRU, LFU, and FIFO, have linear regrets (Proposition 1 of (paschos2019learning, )). This can be understood from the following example: consider the single cache setting with $N = 2$, $C = 1$, and the alternating file request sequence $\{1, 2, 1, 2, \ldots\}$. Since any missed content is always immediately loaded into the cache, each of the above policies gets zero cumulative hits. On the other hand, caching either one of the files forever achieves $T/2$ cumulative hits over a horizon of length $T$. Thus, all of the above policies have $\Omega(T)$ regret. This is surprising, as the LRU and FIFO policies are known to have finite competitive ratios (albers, ). To the best of our knowledge, no uncoded caching policy with sublinear regret is known in the literature.
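The alternating-request example can be verified in a few lines; this sketch (Python, our own toy implementation of LRU) reproduces the linear-regret behavior:

```python
def lru_hits(requests, C):
    """Cache hits under LRU with capacity C: a hit refreshes the file's
    recency; a miss loads the file, evicting the least recently used."""
    cache = []          # least recently used file at the front
    hits = 0
    for x in requests:
        if x in cache:
            hits += 1
            cache.remove(x)
        elif len(cache) == C:
            cache.pop(0)
        cache.append(x)  # x becomes the most recently used file
    return hits

T = 1000
alternating = [1, 2] * (T // 2)
# LRU with C = 1 scores zero hits on this sequence, while statically
# caching either file scores T/2 = 500 hits, i.e., linear regret.
```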
Achievability with uncoded caching:
Making use of the theory of Online Structured Learning (cohen2015following, ), we now propose a simple Follow-the-Perturbed-Leader (FTPL)-based uncoded caching policy, which achieves near-optimal expected regret against an oblivious adversary. The FTPL policy maintains a cumulative running count of the number of times each file has been requested so far. This count is then perturbed by adding i.i.d. Gaussian noise of zero mean and an appropriate variance to each of the count values. Finally, at each time slot, the top $C$ files with the highest perturbed counts are loaded into the cache. The FTPL policy is formally described below in Algorithm 1. We prove the following achievability bound for the FTPL policy.
Theorem 2.4 (Achievability with uncoded caching).
In the single cache setting, the FTPL uncoded caching policy achieves the following upper bound on expected regret (the expectation is taken over the randomness of the algorithm):

$\mathbb{E}\big[R_T^{\mathrm{FTPL}}\big] \ \le\ O\big(\sqrt{CT \log N}\big).$

The proof of Theorem 2.4 follows a similar line of argument as the proof of Theorem 1 of (cohen2015following, ). However, in this paper, we tighten the regret upper bound of (cohen2015following, ) further. This improvement follows by taking into account the constraints that only one file is requested per slot and that any feasible cache configuration respects a natural box constraint. This tightening is essential in order to match the FTPL upper bound with the lower bound given in Theorem 2.3. The details of the proof are given in Section 6.3.
The lower bound of Theorem 2.3 shows that FTPL is regret-optimal up to polylogarithmic factors in the library size $N$. In the following, we show that this extra polylog factor may be removed if we allow coded caching.
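For concreteness, a minimal FTPL sketch follows (Python; the one-shot Gaussian perturbation and the parameter names are our simplifying assumptions — Algorithm 1 specifies the exact noise schedule):

```python
import numpy as np

def ftpl_caches(requests, N, C, eta=1.0, seed=0):
    """FTPL sketch: perturb the cumulative request counts by Gaussian
    noise and cache the C files with the highest perturbed counts."""
    rng = np.random.default_rng(seed)
    gamma = rng.normal(0.0, eta, size=N)   # perturbation, sampled once
    counts = np.zeros(N)
    caches = []
    for x in requests:
        perturbed = counts + gamma
        # load the C files with the largest perturbed counts
        caches.append(set(np.argsort(perturbed)[-C:].tolist()))
        counts[x] += 1                     # count updated after caching
    return caches
```

The perturbation makes the cache contents stable yet unpredictable, which is what defeats the adversarial sequences that break LRU, LFU, and FIFO.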
Achievability with coded caching:
In Corollary 2 of (paschos2019learning, ), the authors showed that an Online Gradient Ascent (OGA)-based single-server caching policy, serving one file request per slot, achieves a regret of at most $O(\sqrt{CT})$. To avoid repetition, a description of the general version of the OGA policy, in the context of network caching, is deferred to Section 3.1; it subsumes the single cache case.
3. Caching in a Content Distribution Network
We now begin investigating the problem of optimal caching in a Content Distribution Network (CDN). In this problem, a set of geographically distributed users periodically requests files from a content provider (e.g., Netflix). The content provider maintains a global network of data centers, each caching some files up to its capacity. A user's file request may be served by its neighboring data centers if the requested file exists in any of the neighboring caches. This client-server architecture gives rise to a bipartite content distribution network, with the set of users and the set of caches constituting its two parts (shanmugam2013femtocaching, ). For a detailed case study of the above caching architecture, including the global distribution of the data centers and performance measurements in the context of the Netflix CDN, please refer to (bottger2018open, ). In the following section, we show how the tools and techniques developed in the previous section for a single cache may be generalized to address this more challenging problem.
3.1. System Model
We now formalize the above system model for caching in a CDN. A set of $n$ users is connected to a set of $m$ caches in the form of a bipartite network.
For simplicity, we assume that the caches are homogeneous in the sense that each cache has the same storage capacity $C$. As before, the library size $N$ (i.e., the number of all possible files) is assumed to be sufficiently large. The connection between the users and the caches is represented by a bipartite graph $G$. The set of caches connected to a user $i$ is denoted by $\partial(i)$. Similarly, the set of users connected to a cache $j$ is denoted by $\partial(j)$. The in-degree of cache $j$ is defined as $d(j) = |\partial(j)|$. For the sake of simplicity, we assume the network to be right $d$-regular, i.e., $d(j) = d$ for every cache $j$. See Figure 2 for a schematic.
Each user requests one file per time slot. Each file request may be served by any (one or more) of the neighbouring caches. As before, the file request generated by user $i$ at time $t$ is one-hot encoded by an $N$-dimensional binary vector $x_t^{i}$, with the interpretation that $x_{tf}^{i} = 1$ if and only if the $f$-th file is requested by user $i$ at time $t$. The cache configuration of the $j$-th cache at time $t$ is represented by the $N$-dimensional vector $y_t^{j}$, with each component denoting the fraction of the corresponding coded file cached. The cache configurations must always satisfy the cache capacity constraints. Thus, the set of all feasible cache configurations is given by:

(14) $\mathcal{Y} = \Big\{ (y^{j})_{j=1}^{m} : y^{j} \in [0, 1]^{N}, \ \sum_{f=1}^{N} y_f^{j} \le C, \ \forall j \Big\}.$

As before, for uncoded caching, we additionally require $y_{tf}^{j} \in \{0, 1\}$.
3.2. Reward and Regret
For content distribution networks, it will be useful to distinguish between elastic and inelastic contents, as defined below. Making this distinction is essential because, in a caching network, the same content may be cached in, and retrieved from, more than one cache at a time. Recall that, for rateless codes, it is only the total number of received encoded symbols that determines the decoding quality.
Elastic contents:
We call a content elastic if receiving multiple layers (i.e., resolutions) of the same content improves its overall utility for the users. Examples of elastic contents include multi-bitrate video files for adaptive streaming (gu2013multiple, ), (adaptive_video, ), multi-resolution HD videos (holcomb2008multi, ), and erasure-coded files in fault-tolerant distributed file systems, such as Hadoop (dimakis2010network, ), (shvachko2010hadoop, ). In this setting, an incoming file request from the $i$-th user can be satisfied by fetching and combining parts of the cached layers of the content from the different neighboring caches $j \in \partial(i)$. Accordingly, for elastic contents, we define the one-slot reward to be the aggregate of the cache hits at that slot, i.e.,

(15) $q^{\mathrm{el}}(x_t, y_t) = \sum_{i=1}^{n} \sum_{j \in \partial(i)} \langle x_t^{i}, y_t^{j} \rangle.$
Hence, a user’s utility increases linearly as she receives more layers of the requested content from different neighboring caches.
Inelastic contents:
We call a content inelastic if the content has only a single layer (resolution) and a user is fully satisfied if she is able to retrieve the original content. Popular examples of inelastic contents include traditional webpages, databases, documents, and single-resolution images. Similar to the elastic case, if the files are encoded by rateless MDS codes, a request from a user may be satisfied by combining different fractional parts of the content (from different neighboring caches), with the fractions adding up to unity. Consequently, we define the one-slot reward for inelastic contents to be

(16) $q^{\mathrm{in}}(x_t, y_t) = \sum_{i=1}^{n} \Big\langle x_t^{i}, \ \min\Big(\mathbf{1}, \sum_{j \in \partial(i)} y_t^{j}\Big) \Big\rangle,$

where $\mathbf{1}$ is the all-one vector, and the $\min$ operator outputs a vector whose components are the pointwise minimum of the corresponding components of the input argument vectors. In comparison with the reward definition in Eqn. (15), the $\min$ operator in Eqn. (16) takes into account the fact that receiving multiple fractions of an inelastic content, summing up to more than one, does not add additional utility. Hence, unlike elastic contents, inelastic contents have bounded rewards. We note that the reward in Eqn. (16) coincides with the utility definition in Eqn. (13) of (paschos2019learning, ). The regret of any caching policy is defined exactly in the same way as in the single cache case via Eqns. (4) and (5).
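A toy numerical example contrasting the two reward definitions (Python/NumPy; the numbers are illustrative only): a single user connected to two caches requests file 1, of which the caches hold coded fractions 0.7 and 0.6.

```python
import numpy as np

x = np.array([1.0, 0.0, 0.0])     # one-hot request for file 1 (N = 3)
y1 = np.array([0.7, 0.3, 0.0])    # coded fractions held by cache 1
y2 = np.array([0.6, 0.0, 0.4])    # coded fractions held by cache 2

elastic = float(x @ (y1 + y2))                    # layers add up
inelastic = float(x @ np.minimum(1.0, y1 + y2))   # capped at one file
# elastic = 1.3: extra layers keep adding utility
# inelastic = 1.0: fractions beyond one full file add nothing
```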
3.3. Achievability for Caching Networks
In the following, we describe a simple and distributed gradient-based coded caching policy that achieves $O(\sqrt{T})$ regret in both the elastic and inelastic settings. We then propose an extension of the Follow-the-Perturbed-Leader-based uncoded caching policy given in Algorithm 1, which also achieves near-optimal regret in the elastic setting.
Achievability with coded caching (paschos2019learning, ):
Let $q(x, y)$ be a generic one-slot reward function which is concave in the cache configuration vector $y$ (e.g., $q$ could be chosen to be either $q^{\mathrm{el}}$ or $q^{\mathrm{in}}$). Let $g_t$ be a supergradient of $q(x_t, \cdot)$ at $y_t$. The paper (paschos2019learning, ) describes the following Online Gradient Ascent (OGA)-based caching policy: starting from any initial feasible configuration $y_1 \in \mathcal{Y}$, iterate as follows:

(17) $y_{t+1} = \Pi_{\mathcal{Y}}\big(y_t + \eta \, g_t\big),$

where $\mathcal{Y}$ is the set of all feasible cache configurations given in Eqn. (14), $\Pi_{\mathcal{Y}}(\cdot)$ is the Euclidean projection operator on the set $\mathcal{Y}$, and $\eta$ is an appropriate step-size parameter. For the single-user single-cache setting of Section 2, we simply set $g_t = x_t$.
Distributed implementation:
The OGAbased caching policy can be implemented at each cache in a distributed fashion with locally available information only. This can be seen from the following two observations:
(i) For both the elastic and inelastic reward functions, the supergradient with respect to the configuration $y^{j}$ depends only on the requests of the users in $\partial(j)$, and is hence locally available at cache $j$.
(ii) Since the cache capacity constraints are separable across the caches, the projection operation may be carried out separately for each cache as $\Pi_{\mathcal{Y}}(y) = \big(\Pi_{\mathcal{Y}_1}(y^{1}), \ldots, \Pi_{\mathcal{Y}_m}(y^{m})\big)$, where $\mathcal{Y}_j$ is the single-cache constraint set given by Eqn. (1).
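The per-cache projection step admits a simple implementation. The sketch below (Python; `project` bisects on the Lagrange multiplier of the capacity constraint, a standard approach we adopt for illustration, not necessarily the one used in (paschos2019learning, )) performs one OGA iterate of Eqn. (17) for a single cache:

```python
import numpy as np

def project(z, C, tol=1e-9):
    """Euclidean projection of z onto {y in [0,1]^N : sum(y) <= C}.
    KKT gives y_i = clip(z_i - mu, 0, 1); bisect on mu >= 0."""
    y = np.clip(z, 0.0, 1.0)
    if y.sum() <= C:
        return y                       # capacity constraint inactive
    lo, hi = 0.0, float(z.max())       # the multiplier lies in [lo, hi]
    while hi - lo > tol:
        mu = (lo + hi) / 2
        if np.clip(z - mu, 0.0, 1.0).sum() > C:
            lo = mu
        else:
            hi = mu
    return np.clip(z - hi, 0.0, 1.0)

def oga_step(y, g, eta, C):
    """One iterate of the OGA policy: ascend along the supergradient g
    of the one-slot reward, then project onto the feasible set."""
    return project(y + eta * g, C)
```

Because the projection touches only one cache's configuration and the supergradient only that cache's local requests, each cache can run `oga_step` independently.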
We have the following achievability result for OGA:
Theorem 3.1 (Achievability with coded caching (paschos2019learning, )).
For both elastic and inelastic contents, the OGA-based caching policy (17), with an appropriately chosen step size $\eta$, achieves the following upper bound on regret for a right $d$-regular bipartite network:

$R_T^{\mathrm{OGA}} \ \le\ O\big(md\sqrt{CT}\big).$
Proof outline:
In this proof, we appeal to Theorem 2 of (paschos2019learning, ), which gives a generic regret upper bound for the OGA-based caching policy. We conclude the proof of Theorem 3.1 by computing the diameter of the feasible set subject to the cache capacity constraints. Note that we cannot directly use the regret upper bound given in (paschos2019learning, ), because there the authors make the assumption that only one user may request a content at each slot. In our model, there is no such restriction, and all $n$ users may simultaneously request contents at a slot. Please refer to Appendix 6.4 for a proof of Theorem 3.1.
Achievability with uncoded caching:
For uncoded caching in a bipartite network, we propose a simple extension of the FTPL policy given in Algorithm 1 for a single cache. In this extension, each of the $m$ caches independently implements the FTPL policy, irrespective of whether the content is elastic or inelastic. We have the following achievability result for elastic contents:
Theorem 3.2 (Achievability with uncoded caching for elastic contents).
For elastic contents, the FTPL caching policy with an appropriately chosen noise parameter yields the following upper bound on expected regret for a right $d$-regular bipartite network:

$\mathbb{E}\big[R_T^{\mathrm{FTPL}}\big] \ \le\ O\big(md\sqrt{CT \log N}\big).$
3.4. Converse for Caching Networks
The question of the regret-optimality of the OGA policy (17) for caching networks was left open in (paschos2019learning, ) due to the lack of known lower bounds. In the following, we prove tight universal lower bounds on the regret, which apply to both coded and uncoded caching.
Theorem 3.3 (Lower bound for elastic contents).
For caching elastic contents in a bipartite network in the above setup with $N \ge 2C$, the regret of any online caching policy $\pi$ is lower bounded as:

$R_T^{\pi} \ \ge\ md\sqrt{\frac{CT}{2\pi}}.$
Proof outline:
In this proof, we construct a common randomized file request sequence that is identical for each user; in other words, all users request the same random file at each slot. Thus, unlike most other applications of the probabilistic method, which usually proceed with i.i.d. random variables, we consider a set of mutually dependent file request sequences. The expected reward accrued by any caching policy is then obtained by using the statistical symmetry of the file requests and the linearity of expectation. Finally, the static optimal caching configuration is identified, and the reward accrued by the optimal stationary policy is lower bounded by appealing to Lemma 2.2. Combining the above two results yields the regret lower bound. Please refer to Section 6.5 for the complete proof of Theorem 3.3.
The following theorem gives a regret lower bound for caching inelastic contents in a bipartite network.
Theorem 3.4 (Lower bound for inelastic contents).
For caching inelastic contents in a bipartite network in the above setup with , the regret of any online caching policy is lower bounded as:
Proof outline:
The principal difficulty in extending the argument from the proof of Theorem 3.3 to the inelastic case is the presence of nonlinearity in the reward function (16) in the form of the $\min(\cdot)$ operator. As a result, it becomes difficult to analyze the expected reward accrued by the optimal stationary caching configuration. To get around this obstacle, we lower bound the reward of the optimal caching configuration with the help of a carefully constructed suboptimal caching configuration . Interestingly, under the caching configuration , the nonlinearity of the reward function vanishes, which leads to a tractable analysis. As in the proof of Theorem 3.3, this proof also uses an identical (i.e., dependent) file request sequence across all users. Please refer to Section 6.6 for the complete proof of Theorem 3.4.
Tightness:
Comparing the regret upper bounds in Theorems 3.1 and 3.2 with the lower bounds in Theorems 3.3 and 3.4, we see that, for elastic contents, the OGA and FTPL network caching policies are regret-optimal up to constant and polylogarithmic factors, respectively. For inelastic contents, the OGA policy is regret-optimal up to a factor of .
4. Experiments
Dataset description: In this section, we compare the performance of the existing caching policies with the proposed FTPL policy using a popular and stable benchmark, the MovieLens M dataset (movieLens, ; harper2015movielens, ). This dataset contains M ratings for movies, along with the timestamps of the ratings. The ratings were given by unique users. For our experiments, we assume that the users request movies from a CDN (such as Netflix) in the same chronological order as the recorded timestamps of the ratings. A histogram of the movie request frequencies of all users together is shown in Fig. 5 of the Appendix.
Experimental Setup: Following standard industry practice, the capacity of each cache is set to a fixed fraction of the total library size , where we take . For the bipartite caching scenario, we assume that a total of users are connected to caches. Each cache has in-degree . The first cache is connected to users and , the second cache to users and , the third cache to users and , and the fourth cache to users and . The entire dataset with entries is divided uniformly into disjoint blocks, and each user is allocated one block. It is assumed that each user makes requests serially from its allocated block in chronological order.
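The setup above can be sketched in code. The exact user and cache counts are elided in the text, so the sizes below are illustrative placeholders only, and the cyclic wiring pattern is our own stand-in for the connectivity described above:

```python
import numpy as np

# Placeholder sizes: the experiment's exact user/cache counts are elided in
# the text above, so these numbers are illustrative only.
num_users, num_caches, degree = 8, 4, 2   # each user is wired to `degree` caches

def build_bipartite(num_users, num_caches, degree):
    """Return adjacency matrix A with A[u, c] = 1 iff user u is connected
    to cache c, using a simple cyclic wrap-around pattern."""
    A = np.zeros((num_users, num_caches), dtype=int)
    for u in range(num_users):
        for d in range(degree):
            A[u, (u + d) % num_caches] = 1
    return A

A = build_bipartite(num_users, num_caches, degree)

# Partition a request trace into one disjoint block per user; each user
# replays its block in chronological order, as in the setup above.
trace = np.arange(4000) % 100             # stand-in for the MovieLens trace
blocks = np.array_split(trace, num_users)
```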
Results: The time-averaged regrets of the different caching policies in the single-cache setting and the bipartite network setting, for both elastic and inelastic contents, are plotted in Figure 3 (see the following page). From the plots in Figures 3 (a), 3 (b), and 3 (c), we conclude that the FTPL and LFU policies have the best performance in terms of average regret uniformly in all cases, and there is hardly any noticeable difference between their performance for large enough . These two policies also perform very close to the theoretical lower bounds (corresponding to the worst-case request sequence). On the other hand, we find that the LRU policy has the worst performance, followed by OGA, which performs only marginally better. It is quite surprising that the uncoded caching policy FTPL outperforms the coded caching policy OGA in all scenarios. Figure 6 in the Appendix shows the variation of the average regret as the cache capacity is increased from percent to percent of the library size for a fixed time . These plots confirm that the regret increases with the cache size. However, we find that the increase in regret is smallest for the LFU and FTPL policies, followed by the OGA and LRU policies.
5. Conclusion and future work
In this paper, we obtain tight sublinear regret lower bounds for the online caching problem for single caches and bipartite caching networks. In the process, we derive a key technical result on the balls-into-bins problem and utilize it in deriving all our lower bounds. We also propose a new randomized caching policy, called FTPL, which is shown to be both sound in theory and superior in practice. We envision the following future research directions stemming from this work: (1) As an immediate follow-up, it will be interesting to narrow down the gap between the lower and upper regret bounds for inelastic contents in a bipartite caching network. Obtaining a regret guarantee for the FTPL policy for bipartite networks with inelastic contents would also be valuable. (2) We defined the reward functions primarily with the online performance of the caching policies (i.e., hit rates) in mind. In particular, our reward definitions do not take into account the system cost associated with cache replacements at every slot. Hence, it would be interesting to design an uncoded caching policy that makes incremental changes to the caching configuration at every slot, yet matches the sublinear regret lower bounds. (3) A variation of the caching problem arises in the context of inventory management where, instead of digital files, physical commodities are stored in the caches (e.g., retail stores). Requests for the commodities arrive sequentially. The requested commodities currently present in the cache are immediately removed from it (e.g., sold). Hence, unlike in our setting, there is no scope for “coding”, and it would make sense to cache multiple copies of the same commodity at the same slot, subject to the cache capacity constraints (cf. Eqn. (1)). The performance of FTPL-like randomized caching policies will be exciting to investigate in this setup.
(4) Finally, it would also be interesting to go beyond the single-hop setting of bipartite caching networks and design a regret-optimal joint routing and caching policy for multi-hop CDNs (liu2019joint, ).
6. Proof of the results
6.1. Proof of Lemma 2.2
We index the bins sequentially as . Next, we logically combine every two consecutive bins to obtain super bins (see Figure 4). Let us denote the (random) number of balls in the $i^{\text{th}}$ super bin by . Conditioned on this r.v., the numbers of balls in the two corresponding bins and are jointly distributed as , where is a binomial random variable with parameter . Let denote the maximum of the numbers of balls in the corresponding bins and . Then, as shown in the proof of Theorem 2.1, when :
(18) 
Since we have
(19)  
where the equality (a) follows from the fact that the random variables have identical distributions. For equation (b), we write
Now, observe that if , then a.s. Hence, almost surely, we have . The equality (c) follows from the tower property of conditional expectation, the inequality (d) follows from the bound (18), and the equality (e) follows from the facts that
The lemma now follows by using the bounds on the moments of the binomial distribution computed next.
Bounding the Expectations in Eqn. (19):
For bounding the middle term in Eqn. (19), consider the factorization (stack_of, ):
The RHS is nonnegative for any . Thus, we have the following algebraic inequality:
Replacing the variable with the random variable pointwise, we have, almost surely,
Taking expectation of both sides, the above yields
(20) 
Finally, recall that . Hence, and . Using this, Eqn. (20) yields the following lower bound
(21) 
For bounding the last term in Eqn. (19), we write
Recall that . Taking expectations on both sides of the above inequality, we have
The probability term in the above expression may be bounded using Chebyshev’s inequality as follows:
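For concreteness, recall the form of Chebyshev's inequality invoked here. For a binomial random variable $B \sim \mathrm{Bin}(m, 1/2)$ with mean $m/2$ and variance $m/4$ (the symbols $B$, $m$, $t$ are generic placeholders, since the original parameters are elided above),

```latex
\Pr\!\left(\left|B - \frac{m}{2}\right| \ge t\right)
\;\le\; \frac{\operatorname{Var}(B)}{t^{2}}
\;=\; \frac{m}{4t^{2}}, \qquad t > 0.
```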
Taking the above bounds together, we obtain
(22) 
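The super-bin pairing used in the proof above can be checked numerically. The sketch below (our own illustration, not part of the proof) throws $m$ balls into $2n$ bins, pairs consecutive bins into super bins, and records the maximum load within each pair; conditioned on a super-bin total $S$, the split between its two bins is $\mathrm{Bin}(S, 1/2)$, which is exactly the structure the proof exploits.

```python
import numpy as np

rng = np.random.default_rng(1)

def throw_balls(m, num_bins, rng):
    """Throw m balls uniformly at random into num_bins bins; return per-bin counts."""
    return np.bincount(rng.integers(num_bins, size=m), minlength=num_bins)

m, n = 10_000, 50                         # m balls into 2n bins
counts = throw_balls(m, 2 * n, rng)
super_bins = counts[0::2] + counts[1::2]  # pair bins (2i, 2i+1) into super bin i
max_in_pair = np.maximum(counts[0::2], counts[1::2])
```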
6.2. Proof of Theorem 2.3
To lower bound the regret in equation (7), as in Section 2.4, we consider a random file request sequence , each sampled independently and uniformly at random from the set of the first unit vectors of dimension (recall that the unit vector has a one at its $i^{\text{th}}$ coordinate and zeros everywhere else). In other words, at every time slot, the user independently requests a random file from the set of the first files, uniformly at random.
The expected reward obtained by any caching policy, given by the second term in Eqn. (4), is now easy to evaluate:
(23) 
where, in (a), we have used the fact that the caching decision made at time is independent of the incoming file request , and in (b), we have made use of the cache-capacity constraint (1):
in addition to the fact that
Note that, for any given file request sequence , the optimal offline stationary cache configuration vector is obtained by caching the most popular files. Hence, the optimal vector , corresponding to the first term in Eqn. (4), is obtained by simply setting to unity the coordinates corresponding to the largest coordinates of the -dimensional vector . Given the distribution of the file request sequence, it immediately follows that the reward accrued by the optimal stationary policy is identically distributed to the total number of balls in the most heavily loaded bins when a total of balls are randomly thrown into bins. Finally, invoking Lemma 2.2, we have
(24) 
Combining equations (23) and (24), we obtain the following lower bound for regret in the single cache setting:
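Under the proof's request model, the gap between the best stationary configuration in hindsight and the expected reward of any causal policy can also be estimated empirically. The sketch below is our own illustration; `N`, `C`, `T`, and `trials` are arbitrary placeholder values, and the `causal` term mirrors the hit-rate computation of Eqn. (23) under uniform requests:

```python
import numpy as np

rng = np.random.default_rng(2)
N, C, T, trials = 20, 5, 2000, 200        # illustrative values only

gaps = []
for _ in range(trials):
    reqs = rng.integers(N, size=T)        # one uniform request per slot
    counts = np.bincount(reqs, minlength=N)
    hindsight = np.sort(counts)[-C:].sum()  # best static cache: C most-requested files
    causal = T * C / N                      # expected hits of any causal policy, cf. Eqn. (23)
    gaps.append(hindsight - causal)

mean_gap = np.mean(gaps)                  # by Lemma 2.2, this gap grows like sqrt(T)
```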
6.3. Proof of Theorem 2.4
Our proof follows a similar line of argument to the proof of Theorem 1 of (cohen2015following, ). However, we improve the regret upper bound by a factor of . This improvement results from exploiting the constraint that only one file is requested at every slot. The notation of (cohen2015following, ) is slightly altered in order to remain consistent throughout the paper.
Let the set denote the set of all possible uncoded caching configurations in the single-cache setting. Clearly, . Define the potential function
Also, denote the cumulative file request arrivals to the cache up to time by , with . Then, as shown in Eqn. (3) of (cohen2015following, ), the expected regret of the FTPL policy in Algorithm 1 with noise parameter is upper bounded as follows (the signs are flipped, as we are in the reward-maximization setting, as opposed to the loss-minimization setting of (cohen2015following, )):
(25) 
for some point on the line segment connecting and .
Next, we bound each of the above two terms separately. The first term may be bounded in the same way as in (cohen2015following, ):
where the last inequality follows from the fact that . Since only one file is requested at every slot, the quadratic form above may be upper bounded as
(26) 
Moreover, following Lemma 7 of (abernethy2014online, ), we have that
where . Hence, using Jensen's inequality, we have that
(27) 
where the inequality (a) follows from the fact that for all we have , and the equality (b) follows from the fact that . Hence, substituting the above bounds in Eqn. (25), we obtain the following upper bound on the expected regret under the FTPL caching policy:
Finally, choosing yields the following regret upper bound for the FTPL policy:
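The choice of the noise parameter here follows the standard trade-off for FTPL-type bounds: the two terms of the bound scale oppositely in the noise level. Writing the bound in the generic form $R_T \le a\eta + bT/\eta$ (with $a$, $b$, and $\eta$ as our own placeholders absorbing the problem-dependent constants from Eqns. (26) and (27)), it is minimized at

```latex
\eta^{\star} \;=\; \sqrt{\frac{bT}{a}},
\qquad\text{giving}\qquad
R_T \;\le\; 2\sqrt{ab\,T} \;=\; O\!\big(\sqrt{T}\big).
```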
6.4. Proof of Theorem 3.1
From Eqn. (15), we know that is linear (and hence concave) in the cache-configuration vector . Moreover, since the pointwise minimum of linear functions is concave (bertsekas, ), it follows from Eqn. (16) that the reward function is also concave in the cache-configuration vector . To obtain a regret upper bound for the OGA algorithm (17), we appeal to Theorem 2 of (paschos2019learning, ), which states that, with an appropriate choice of the step-size parameter ,
(28) 
where denotes the Euclidean diameter (rudin1964principles, ) of the feasible set defined in (14), and is an upper bound on the norm of the (super-)gradient of the reward function.
For bounding the diameter, consider any two vectors and from the set . We have
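A sketch of one standard way this computation can proceed, under the convention (ours, since the paper's notation for the set in Eqn. (14) is not reproduced here) that $y, y' \in [0,1]^{NJ}$ stack the configurations of all $J$ caches and each cache obeys the capacity constraint $\sum_{f} y^{(j)}_{f} \le C$: since each coordinate lies in $[0,1]$, $(y^{(j)}_f - y'^{(j)}_f)^2 \le |y^{(j)}_f - y'^{(j)}_f| \le y^{(j)}_f + y'^{(j)}_f$, so

```latex
\|y - y'\|_2^2
\;=\; \sum_{j=1}^{J}\sum_{f} \big(y^{(j)}_f - y'^{(j)}_f\big)^2
\;\le\; \sum_{j=1}^{J}\sum_{f} \big(y^{(j)}_f + y'^{(j)}_f\big)
\;\le\; 2JC,
```

which bounds the Euclidean diameter of the feasible set by $\sqrt{2JC}$.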