# Fundamental Limits of Online Network-Caching

Optimal caching of files in a content distribution network (CDN) is a problem of fundamental and growing commercial interest. Although many different caching algorithms are in use today, the fundamental performance limits of network caching algorithms from an online learning point-of-view remain poorly understood to date. In this paper, we resolve this question in the following two settings: (1) a single user connected to a single cache, and (2) a set of users and a set of caches interconnected through a bipartite network. Recently, an online gradient-based coded caching policy was shown to enjoy sub-linear regret. However, due to the lack of known regret lower bounds, the question of the optimality of the proposed policy was left open. In this paper, we settle this question by deriving tight non-asymptotic regret lower bounds in both of the above settings. In addition to that, we propose a new Follow-the-Perturbed-Leader-based uncoded caching policy with near-optimal regret. Technically, the lower-bounds are obtained by relating the online caching problem to the classic probabilistic paradigm of balls-into-bins. Our proofs make extensive use of a new result on the expected load in the most populated half of the bins, which might also be of independent interest. We evaluate the performance of the caching policies by experimenting with the popular MovieLens dataset and conclude the paper with design recommendations and a list of open problems.

## 1. Introduction and Related work

The classical caching problem, which seeks to make popular contents quickly accessible by prefetching them in low-latency storage, has been extensively studied in the literature. The core idea of caching has been used in many diverse domains, including improving the CPU paging performance via caches (silberschatz2006operating, ), web-caching by Content Distribution Networks (aggarwal1999caching, ; nygren2010akamai, ; amazon2015amazon, ), and low-latency wireless video delivery through Femtocaching (shanmugam2013femtocaching, ). With the exponential growth of internet video traffic and the advent of new services consuming high bandwidth, such as augmented and virtual reality (AR/VR), the importance of caching for ensuring the quality of service (QoS) is on the rise (chakareski2017vr, ). Top CDN providers, such as Amazon AWS and Microsoft Azure, now offer caching as a service (varia2014overview, ; chappell2010introducing, ).

Several caching algorithms have been proposed in the literature. The MIN algorithm (van2007short, ) is an optimal offline caching policy which assumes that the entire file request sequence is known non-causally in advance. MIN is often used as a benchmark for comparing the performance of the online caching policies. Among the online policies, the Least Recently Used policy (LRU), the Least Frequently Used policy (LFU) (lee1999existence, ), the FIFO policy (dan1990approximate, ), and the online coded caching policy (pedarsani2016online, ) have been studied extensively. However, the performance guarantees available for most of the online caching policies are highly contingent upon some a priori assumptions on the generative model of the file request sequence (breslau1999web, ; Zipf, ). The paper (jelenkovic2008characterizing, ) analyzed the performance of the LRU caching policy with an i.i.d. file request sequence, also known as the Independent Reference Model (vanichpun2004output, ). Under a Markovian assumption, the paper (pedarsani2016online, ) shows that a coded caching policy out-performs the LRU policy. The paper (flajolet1992birthday, ) develops a unified framework for analyzing a number of popular caching policies, again with a stationary file request model. On a different line of work, the papers (maddah2014fundamental, ; caching_rate1, ; caching_rate2, ; caching_rate3, ) derive information-theoretic lower bounds and efficient caching, computing, and coding schemes to facilitate bandwidth-efficient delivery of the cached files to the users.

With frequent addition of new content to the library, mobility of the users, Femtocaching with small caches, and change in the popularity distribution with time, the assumption of stationary file popularity barely holds in practice (traverso2015unravelling, ). This prompts us to consider the problem of caching from an online learning point-of-view with no a priori statistical assumption on the file request sequence. Our work is inspired by the recent paper (paschos2019learning, ), which describes an online gradient-based coded caching policy (OGA) and proves a sub-linear regret upper bound for it. Interestingly, the authors also show that popular uncoded caching policies, such as LRU, LFU, and FIFO, suffer linear regret in the worst case. In fact, no uncoded caching policy with sub-linear regret was previously known in the literature. More seriously, no regret lower bound is known for the network-caching problem.

In contrast to the multi-armed bandits setting (DBLP:conf/colt/MagureanuCP14, ; combes2015learning, ; kveton2015tight, ; grunewalder2010regret, ; agrawal2013further, ), relatively few regret lower bounds are known for online convex optimization problems. Technically, the network-caching problem is an instance of an online convex optimization problem with a piecewise linear reward function and polytope constraints. The paper (Abernethy08optimalstrategies, ) establishes a minimax regret lower bound for linear cost functions with hyperball constraints. A regret lower bound for unconstrained linear cost functions has been obtained in (hazan2006efficient, ). The papers (hazan2007logarithmic, ; hazan2014beyond, ) prove logarithmic regret bounds for online stochastic strongly convex problems. However, to the best of our knowledge, with the exception of (paschos2019learning, ), the regret for a linear cost function with a simplex and several box constraints, which arise in the context of the single cache problem, has not been studied before. Moreover, the problem of lower bounding the regret for a piecewise linear cost function with polytope constraints, which arise in the context of network caching, is completely open.

The above considerations inspire us to ask the following two questions in this paper:
Question 1. What is the fundamental performance limit of all online caching policies regardless of their operational constraints or computational complexity?
Question 2. Can a simple, distributed network-caching strategy be designed which meets the above fundamental limit?

In answering Question 1, we derive universal regret lower bounds that also apply to computationally intensive caching policies, which can completely change the profile of the cached contents at every time slot. Surprisingly enough, we answer Question 2 in the affirmative. In particular, our matching upper and lower bounds reveal that a simple gradient-based incremental coded caching policy is regret-optimal. Moreover, we propose a new Follow-the-Perturbed-Leader-based uncoded caching policy that has near-optimal regret. Hence, one of the key take-away points from this paper is that there exist computationally cheap caching policies that perform excellently in an online setting even with adversarial request sequences.

#### Our contributions:

In the process of answering the above questions, we make the following key technical contributions:

#### (1) Lower bound for a single cache:

In Theorem (2.3), we prove a tight non-asymptotic regret lower bound for the single-cache problem. This result improves upon the previously known asymptotic regret lower bound in (paschos2019learning, ), which can be arbitrarily loose for a sufficiently large library size.

#### (2) Lower bounds for caching networks:

In Theorems (3.3) and (3.4), we derive non-asymptotic sub-linear regret lower bounds for bipartite caching networks. We also show that the lower bounds are tight within constant factors. To the best of our knowledge, this is the first known regret lower bound for piecewise linear functions with polytope constraints. Hence, our results also contribute to the growing literature on online convex optimization.

#### (3) Near-optimal uncoded caching policy:

Although the above lower bounds certify the optimality of the gradient-based coded caching policy of (paschos2019learning, ), it was an open problem whether one could achieve the optimal regret without coding. In Algorithm 1, we propose a simple uncoded caching policy based on the Follow-the-Perturbed-Leader (FTPL) paradigm. Theorem 2.4 shows that the FTPL policy has near-optimal regret.

#### (4) New proof techniques:

Technically, the regret lower bounds are established by relating the online caching problem to the classic probabilistic setup of balls-into-bins via Lemma 2.2. In this Lemma, we derive a non-asymptotic lower bound on the expected total load in the most populated half of the bins when balls are thrown uniformly at random into the bins. For our lower bound proofs, we are particularly interested in the regime where the number of balls is much larger than the number of bins. Classical results on randomized load balancing, such as (mitzenmacher1996power, ; mitzenmacher2017probability, ; gonnet1981expected, ), do not apply to this setting because they cover only the regime where the two quantities are comparable. The paper (ballsinbins, ) derives asymptotic high-probability bounds on the maximum load for various asymptotic regimes of the number of balls and the number of bins. However, since we are primarily interested in the non-asymptotic expected value of Max-Load, the results of (ballsinbins, ) do not suffice for our purpose. Consequently, we tackle this problem from first principles, culminating in Lemma 2.2. To the best of our knowledge, this is the first paper where a connection between online learning and the framework of balls-into-bins has been explicitly brought out and exploited in proving regret lower bounds.

#### (5) Numerical experiments

In Section 4, we compare the performance of different caching policies using the popular MovieLens 1M dataset (movieLens, ). Our experiments reveal that the proposed FTPL policy beats other competitive caching policies in terms of long-term average regret.

## 2. Single Cache

In this section, we begin our investigation with the single cache problem. We establish a key technical lemma on the balls-into-bins problem, which is used in all of our lower bound proofs. Our analysis in this section improves upon the best-known regret lower bound for a single cache (paschos2019learning, ) and paves the way for analyzing the bipartite caching problem in Section 3.

### 2.1. System Model

In the classical caching problem with a single cache, there is a library of $N$ distinct files. A cache with a limited storage capacity can store at most $C$ files at any time slot. In practice, the cache capacity is significantly smaller than the library size (i.e., $C \ll N$). As an example, we may think of caching movie files in the Netflix data center, where the library size increases every day as new movies are released, but the physical cache-size in the Netflix data centers remains constant on the relevant time-scale. Time is slotted, and a user may request at most one file at a time. The file request at time slot $t$ is represented by an $N$-dimensional binary vector $x_t$, whose $f$th component is one if the $f$th file is requested by the user at time $t$, and is zero otherwise (one-hot encoding). Following an online caching policy $\pi$, files are cached at every time slot before the request for that slot arrives. We do not make any statistical assumption on the file request sequence $\{x_t\}_{t \geq 1}$. Thus, we may as well assume that the requests are made by an omniscient adversary who has complete knowledge of the cached contents and the caching policy in use. See Figure 1 for a schematic.

#### Coded Caching:

In this paper, we consider both coded and uncoded caching. In the classical uncoded caching, complete files are cached. On the other hand, in coded caching, the original files are first encoded using fountain codes (a class of rateless erasure codes), e.g., Raptor codes (shokrollahi2006raptor, ; luby2011raptorq, ), and then some of the resulting coded symbols are cached. These codes have the property that an original file can be recovered (with high probability) by combining any subset of coded symbols whose size is only slightly larger than the number of source symbols in the file. Hence, for decoding, it does not matter which encoding symbols are combined, as long as the decoder has access to sufficiently many encoded symbols. We will see that coding offers distinct advantages for network caching. These codes also admit highly efficient linear-time encoding and decoding operations. Rateless codes are routinely used in P2P data streaming, large-scale data centers, and CDNs (wu2007rstream, ).

#### Caching configuration:

The cache configuration at time $t$ is represented by an $N$-dimensional vector $y_t^\pi$, whose $f$th component denotes the fraction of the $f$th file cached at time $t$ under the policy $\pi$. (Whenever the caching policy is clear from the context, we drop the superscript $\pi$ to simplify the notation.) Naturally, in the uncoded case, $y_t \in \{0,1\}^N$. The set of all admissible caching configurations is denoted by $\mathcal{Y}$ and is given by

$$\mathcal{Y} = \Big\{ y \in [0,1]^N : \sum_{f=1}^{N} y_f \leq C \Big\}, \tag{1}$$

where $C$ is the capacity of the cache. The caching decision $y_t$ may be randomized and may depend on the file request sequence and caching decisions up to time $t-1$. Any requested file not present in the cache is routed to a central server and accrues zero reward.

### 2.2. Reward

A popular performance metric for any online caching policy is its average hit rate, i.e., the average number of requested files already present in the cache so that the files can be quickly retrieved. In this connection, let the user-generated file request sequence be denoted by $\{x_t\}_{t \geq 1}$. The total reward up to time $T$ accrued by a caching policy, which responds to the file requests by setting the cache configuration to $y_t$ at time $t$, is denoted by $Q(\{x_t\}_1^T, \{y_t\}_1^T)$. We assume that the cumulative reward over a time horizon $T$ has an additive structure, which may be obtained by summing up the rewards obtained at every slot up to time $T$, i.e.,

$$Q(\{x_t\}_1^T, \{y_t\}_1^T) = \sum_{t=1}^{T} q(x_t, y_t). \tag{2}$$

The one-slot reward function $q(\cdot, \cdot)$ captures the reward obtained per slot. Intuitively, $q(x_t, y_t)$ denotes the extent of cache-hits for the request vector $x_t$ against the cache configuration vector $y_t$. The function $q$ takes different functional forms depending on whether we consider (a) a single cache, or (b) a network of caches. For the case of a single cache, we define the one-slot reward to be the amount of the requests successfully served by the cache, i.e.,

$$q(x_t, y_t) \equiv x_t \cdot y_t. \tag{3}$$

A generalized definition of the above one-slot reward function applicable to caching networks will be given in Section 3.
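The definitions in Eqns. (1) to (3) can be made concrete with a small Python sketch. The function names (`is_admissible`, `one_slot_reward`, `cumulative_reward`, `one_hot`) and the toy numbers are illustrative assumptions, not part of the paper.

```python
from typing import List, Sequence


def is_admissible(y: Sequence[float], capacity: float, tol: float = 1e-9) -> bool:
    """Check membership in the set of Eqn. (1): y in [0,1]^N and sum(y) <= C."""
    in_box = all(-tol <= v <= 1.0 + tol for v in y)
    return in_box and sum(y) <= capacity + tol


def one_slot_reward(x: Sequence[int], y: Sequence[float]) -> float:
    """Single-cache one-slot reward of Eqn. (3): the inner product x . y."""
    return sum(xf * yf for xf, yf in zip(x, y))


def cumulative_reward(xs: Sequence[Sequence[int]], ys: Sequence[Sequence[float]]) -> float:
    """Additive reward of Eqn. (2): sum of the one-slot rewards."""
    return sum(one_slot_reward(x, y) for x, y in zip(xs, ys))


def one_hot(f: int, n: int) -> List[int]:
    """One-hot request vector for file index f in a library of n files."""
    return [1 if k == f else 0 for k in range(n)]


# Toy example (hypothetical numbers): N = 4 files, capacity C = 2, T = 3 slots.
xs = [one_hot(0, 4), one_hot(2, 4), one_hot(0, 4)]
ys = [[1, 1, 0, 0],            # uncoded: files 0 and 1 cached -> hit
      [0.5, 0.5, 0.5, 0.5],    # coded: half of every file cached -> reward 0.5
      [0, 0, 1, 1]]            # uncoded: files 2 and 3 cached -> miss
total = cumulative_reward(xs, ys)
```

In this toy run the three slots contribute 1, 0.5, and 0, so `total` is 1.5.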

### 2.3. Regret

Since we do not make any assumption on the user-generated file request sequence $\{x_t\}_{t \geq 1}$, it is futile to attempt to optimize the total reward (i.e., hit rate). This is because, at any slot, an omniscient adversary can always request a file which is not present in the cache, thus yielding a total of zero hits. To obtain a non-trivial performance measure, we cast the caching problem into the framework of online learning. This prompts us to compare the performance of any policy $\pi$ with the best offline stationary policy (shalev2012online, ). Let the vector $y^*$ denote any fixed stationary cache configuration vector. The vector $y^*$ may be selected offline after seeing the entire request sequence $\{x_t\}_1^T$. Following the usual convention in the online learning literature, we define the regret $R_T^\pi(\{x_t\}_1^T)$ for a request sequence $\{x_t\}_1^T$ to be the difference between the reward obtained by the best stationary caching configuration $y^*$ and that of the online policy $\pi$. Mathematically (recall that the cache configuration sequence $\{y_t\}_1^T$ is determined by the policy $\pi$),

$$R_T^\pi(\{x_t\}_1^T) := \sup_{y^* \in \mathcal{Y}} Q(\{x_t\}_1^T, \{y^*\}_1^T) - Q(\{x_t\}_1^T, \{y_t^\pi\}_1^T). \tag{4}$$

The regret of any caching policy $\pi$ up to time $T$ is defined to be the maximum regret over all admissible request sequences, i.e.,

$$R_T^\pi := \sup_{\{x_t\}_1^T} R_T^\pi(\{x_t\}_1^T). \tag{5}$$
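For a fixed request sequence, the per-sequence regret can be computed directly. The sketch below assumes an integer cache capacity, in which case the best stationary configuration for the linear single-cache reward simply stores the most frequently requested files. The helper names and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def best_static_reward(requests: Sequence[int], capacity: int) -> int:
    """Reward of the best fixed configuration: total count of the top-C files.

    Assumes an integer capacity; with the linear reward x . y, an optimal
    stationary configuration then caches the C most requested files in full.
    """
    counts = Counter(requests)
    return sum(c for _, c in counts.most_common(capacity))


def policy_reward(requests: Sequence[int], configs: Sequence[Dict[int, float]]) -> float:
    """Reward of an online policy; configs[t] maps file -> cached fraction at slot t."""
    return sum(cfg.get(f, 0.0) for f, cfg in zip(requests, configs))


def regret(requests: Sequence[int], configs: Sequence[Dict[int, float]], capacity: int) -> float:
    """Per-sequence regret: best stationary reward minus the policy's reward."""
    return best_static_reward(requests, capacity) - policy_reward(requests, configs)


# Toy example (hypothetical): the policy always caches files 1 and 2, with C = 2.
requests = [0, 1, 0, 2, 0, 1]
configs = [{1: 1.0, 2: 1.0}] * len(requests)
r = regret(requests, configs, capacity=2)
```

Here the best stationary choice caches files 0 and 1 (five hits), while the policy collects three hits, so `r` equals 2. The worst-case regret of Eqn. (5) would additionally take a supremum over all request sequences, which this sketch does not attempt.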

### 2.4. Regret lower bounds - preliminaries

Using the simple observation that the maximum of a set of real numbers is at least equal to their average, for any joint probability distribution on the file request sequence $\{X_t\}_1^T$, the regret in Eqn. (5) may be lower bounded as follows:

$$R_T^\pi \geq \mathbb{E}_{\{X_t\}_1^T}\, R_T^\pi(\{X_t\}_1^T). \tag{6}$$

The distribution needs to be chosen carefully to ensure the tractability of evaluating the expectation in (6), as well as the tightness of the resulting bound. The above technique is known as the probabilistic method, popularized by Erdős (alon2004probabilistic, ). In all of our proofs, we lower bound the reward of the best stationary configuration in Eqn. (4) by the reward of a suitable binary cache configuration vector (which thus corresponds to uncoded caching). As a consequence, all of our lower bounds remain valid in both uncoded and coded caching.

### 2.5. Regret lower bound for unit-sized cache

We first consider a single cache of unit capacity and derive a universal non-asymptotic regret bound. As we will see in the sequel, understanding this special case lays the foundation to our subsequent analysis of caching networks serving multiple users.

###### Theorem 2.1 (Lower bound for a single-cache of unit capacity).

The regret of any online caching policy $\pi$, for a library size $N = 2$ and cache capacity $C = 1$, is lower bounded as:

$$R_T^\pi \geq \sqrt{\frac{T}{2\pi}} - \frac{1}{2\sqrt{2\pi T}}, \quad \forall T \geq 1.$$
###### Proof.

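Denote the cache configuration selected by the caching policy at slot $t$ by the vector $y_t = (\gamma_t, 1-\gamma_t)'$, where $\gamma_t$ denotes the fraction of the first file cached by the policy ($0 \leq \gamma_t \leq 1$). Recall the definition of regret in this context:

$$R_T^\pi = \sup_{\{x_t\}_1^T} \sup_{y^* \in \mathcal{Y}} \Big( y^* \cdot \sum_{t=1}^{T} x_t - \sum_{t=1}^{T} y_t \cdot x_t \Big), \tag{7}$$

where the set $\mathcal{Y}$ of all admissible configurations is given by Eqn. (1). Denote the file request vector at slot $t$ by $x_t = (w_t, 1-w_t)'$, where $w_t \in \{0, 1\}$. It can be easily seen that for a given file request sequence $\{x_t\}_1^T$, an optimal choice of the fixed cache configuration vector $y^*$ in Eqn. (7) is given as follows:

$$y^* = \begin{cases} (1, 0)', & \text{if } \sum_{t=1}^{T} w_t \geq T/2, \\ (0, 1)', & \text{if } \sum_{t=1}^{T} w_t < T/2. \end{cases}$$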

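To prove a universal regret lower bound, we need to show the existence of an adversarial file request sequence under which the caching policy performs poorly. Towards this, let $\{W_t\}_{t=1}^{T}$ be a sequence of i.i.d. uniform Bernoulli random variables such that $\mathbb{P}(W_t = 0) = \mathbb{P}(W_t = 1) = 1/2$. Construct a random file request sequence $\{X_t\}_1^T$ with $X_t = (W_t, 1-W_t)'$. The regret incurred for the sequence $\{X_t\}_1^T$ may be obtained from Eqn. (7) as:

$$\begin{aligned} R_T^\pi(\{X_t\}_1^T) &= \max\Big\{ \sum_{t=1}^{T} W_t,\; T - \sum_{t=1}^{T} W_t \Big\} - \sum_{t=1}^{T} \big( \gamma_t W_t + (1-\gamma_t)(1-W_t) \big) \\ &= \max\{Z, T-Z\} - \sum_{t=1}^{T} \big( 2\gamma_t W_t + 1 - \gamma_t - W_t \big), \end{aligned}$$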

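where the r.v. $Z \equiv \sum_{t=1}^{T} W_t$, being the summation of $T$ i.i.d. uniform Bernoulli variables, is Binomially distributed with parameters $(T, 1/2)$. Using linearity of expectation, we can write

$$\mathbb{E}_{\{X_t\}_1^T}\big( R_T^\pi(\{X_t\}_1^T) \big) = \mathbb{E}\big( \max\{Z, T-Z\} \big) - T/2, \tag{8}$$

where we have used the fact that $\mathbb{E}(W_t) = 1/2$ and that the caching decision $\gamma_t$ is independent of the incoming request $W_t$, so that each summand $2\gamma_t W_t + 1 - \gamma_t - W_t$ has expectation $1/2$. Observe that we can write

$$\max\{Z, T-Z\} = \frac{T}{2} + \Big| Z - \frac{T}{2} \Big|. \tag{9}$$

Thus, combining Eqns. (8) and (9), we have

$$\mathbb{E}_{\{X_t\}_1^T}\big( R_T^\pi(\{X_t\}_1^T) \big) = \mathbb{E}\Big| Z - \frac{T}{2} \Big|. \tag{10}$$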

The mean absolute deviation for a symmetric binomial random variable may be computed in closed form by using De Moivre’s formula ((berend2013sharp, ), Eqn. (1)) as follows:

$$\mathbb{E}\Big| Z - \frac{T}{2} \Big| = \frac{1}{2^T} \Big( \Big\lfloor \frac{T}{2} \Big\rfloor + 1 \Big) \binom{T}{\lfloor T/2 \rfloor + 1}. \tag{11}$$

Eqn. (11), in combination with the non-asymptotic form of Stirling’s formula (robbins1955remark, ), yields the following non-asymptotic lower bound

$$\mathbb{E}\Big| Z - \frac{T}{2} \Big| \geq \sqrt{\frac{T}{2\pi}} - \frac{1}{2\sqrt{2\pi T}}, \quad \forall T \geq 1. \tag{12}$$

For details of the above calculations, please refer to Appendix 7.1. Equation (12), coupled with Eqn. (10), shows the existence of a file request sequence $\{x_t\}_1^T$ such that

$$R_T^\pi(\{x_t\}_1^T) \geq \sqrt{\frac{T}{2\pi}} - \frac{1}{2\sqrt{2\pi T}}, \quad \forall T \geq 1.$$

Remarks: It can be seen that the above proof and the lower bound in Theorem 2.1 continue to hold even if we let the library size be $N \geq 2$. This observation follows by constructing a randomized file request sequence where the first two files are requested with probability $1/2$ each and the other files are requested with zero probability.
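As a numerical sanity check of the two steps used in the proof, the following sketch compares the mean absolute deviation of a symmetric binomial, computed by brute-force enumeration, with De Moivre's closed form and with the lower bound of Theorem 2.1. The function names are illustrative, and the closed form is written as read from Eqn. (11).

```python
from math import comb, pi, sqrt


def mad_exact(T: int) -> float:
    """E|Z - T/2| for Z ~ Binomial(T, 1/2), by direct enumeration of the pmf."""
    return sum(comb(T, k) * abs(k - T / 2) for k in range(T + 1)) / 2 ** T


def mad_de_moivre(T: int) -> float:
    """Closed form: (floor(T/2) + 1) * binom(T, floor(T/2) + 1) / 2^T."""
    nu = T // 2 + 1
    return nu * comb(T, nu) / 2 ** T


def lower_bound(T: int) -> float:
    """Right-hand side of the bound: sqrt(T / (2 pi)) - 1 / (2 sqrt(2 pi T))."""
    return sqrt(T / (2 * pi)) - 1 / (2 * sqrt(2 * pi * T))


for T in (1, 2, 5, 10, 50, 200):
    print(T, mad_exact(T), mad_de_moivre(T), lower_bound(T))
```

For every horizon printed, the first two columns should agree up to floating-point error, and both should be at least as large as the third.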

### 2.6. Lower bound for caches of arbitrary size

We now extend the previous result to caches of arbitrary size $C$. This extension is non-trivial. We will see in the sequel that our analysis naturally leads us to investigate a random variable arising in connection with the classic probabilistic framework of balls-into-bins, where a number of balls are thrown uniformly and independently at random into some bins (ballsinbins, ). The following lemma, which might be of independent interest, gives a non-asymptotic lower bound on the total number of balls in the most populated half of the bins.

###### Lemma 2.2 (Total occupancy in the most popular half).

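Suppose that $T$ balls are thrown independently and uniformly at random into $2C$ bins. Let the random variable $M_C(T)$ denote the number of balls in the $C$ most populated bins. Then

$$\mathbb{E}(M_C(T)) \geq \frac{T}{2} + \sqrt{\frac{CT}{2\pi}} - \frac{(\sqrt{2}+1)\, C^{3/2}}{2\sqrt{2\pi T}} - \sqrt{\frac{2}{\pi}}\, \frac{C^2}{T}.$$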

#### Proof outline:

The proof proceeds by pairing up the bins to form super bins (see Fig. 4), and then selects the most-occupied bin in each super bin to obtain a lower bound on $M_C(T)$. Finally, we conclude the proof by appealing to the mean-deviation bound in Eqn. (12). Please refer to Section 6.1 for the complete proof of Lemma 2.2.
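A quick Monte Carlo experiment can illustrate the lemma. The sketch below assumes the setup of $T$ balls thrown into $2C$ bins and reads the bound as $T/2 + \sqrt{CT/(2\pi)} - (\sqrt{2}+1)C^{3/2}/(2\sqrt{2\pi T}) - \sqrt{2/\pi}\,C^2/T$. The function names, the seed, and the parameter values are arbitrary illustrative choices.

```python
import random
from math import pi, sqrt


def lemma_bound(T: int, C: int) -> float:
    """Lower bound on the expected load of the C most populated bins (as read above)."""
    return (T / 2
            + sqrt(C * T / (2 * pi))
            - (sqrt(2) + 1) * C ** 1.5 / (2 * sqrt(2 * pi * T))
            - sqrt(2 / pi) * C ** 2 / T)


def top_half_load(T: int, C: int, rng: random.Random) -> int:
    """Throw T balls uniformly into 2C bins; return the load of the C fullest bins."""
    bins = [0] * (2 * C)
    for _ in range(T):
        bins[rng.randrange(2 * C)] += 1
    return sum(sorted(bins, reverse=True)[:C])


def estimate_mean(T: int, C: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected top-half load."""
    rng = random.Random(seed)
    return sum(top_half_load(T, C, rng) for _ in range(trials)) / trials


T, C = 200, 2
estimate = estimate_mean(T, C)
bound = lemma_bound(T, C)
print(f"simulated mean = {estimate:.2f}, lower bound = {bound:.2f}")
```

The simulated mean should sit above the bound. For $C = 1$ the two bins form a single super bin and the quantity reduces to $\max\{Z, T-Z\}$ from the proof of Theorem 2.1, so the bound is then nearly tight and a simulation of this size cannot resolve the gap.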

To improve readability, in the rest of the paper we will rephrase the above bound as

$$\mathbb{E}(M_C(T)) \geq \frac{T}{2} + \sqrt{\frac{CT}{2\pi}} - \Theta\Big( \frac{1}{\sqrt{T}} \Big),$$

with the understanding that an explicit form of the lower order terms may be obtained by using Lemma 2.2, if required. The following Theorem is the main result of this section.

###### Theorem 2.3 (Lower bound for a single cache of arbitrary capacity).

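The regret of any online caching policy $\pi$, for a library size $N$ and cache capacity $C$ with $N \geq 2C$, is lower bounded as

$$R_T^\pi \geq \sqrt{\frac{CT}{2\pi}} - \Theta\Big( \frac{1}{\sqrt{T}} \Big), \quad \forall\, T \geq 1. \tag{13}$$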

#### Proof outline:

The proof proceeds along the lines of Theorem 2.1, where the file requested at each slot is chosen independently and uniformly at random from the first $2C$ files of the library. Then, we show that the reward accrued by the best fixed cache configuration corresponds (in distribution) to the total number of balls in the most populated half of the bins. We then conclude the proof by appealing to Lemma 2.2. Please refer to Section 6.2 for the complete proof of Theorem 2.3.

#### Comparison with Theorem 1 of (paschos2019learning, ):

In the single cache setting, the paper (paschos2019learning, ) establishes a rather loose asymptotic regret lower bound, which could be arbitrarily smaller than the lower bound given in Eqn. (13) when the library size is sufficiently large. Theorem 2.3 improves the result in (paschos2019learning, ) in two ways. First, it proves a regret lower bound that is independent of the library size $N$. As a consequence, we will soon see that it implies that the gradient-based coded policy of (paschos2019learning, ) is regret-optimal up to a constant factor. Theorem 2.3 also implies near optimality of the uncoded FTPL policy described in the next section. Second, unlike the regret lower bound in (paschos2019learning, ), the bound in Eqn. (13) is non-asymptotic, thus giving a valid lower bound for any $T \geq 1$.

### 2.7. Achievability

We note that many popular classical uncoded caching policies, such as LRU, LFU, and FIFO, have linear regret (Proposition 1 of (paschos2019learning, )). This can be simply understood from the following example: consider the single cache setting with a cache of unit capacity ($C = 1$) and a request sequence that alternates between two files. Since any missed content is always immediately loaded to the cache, each of the above policies gets zero cumulative hits. On the other hand, caching either one of the two files forever achieves a total of $T/2$ cumulative hits for a horizon of length $T$. Thus, all of the above policies have regret that grows linearly in $T$. This is surprising as the LRU and FIFO policies are known to have a finite competitive ratio (albers, ). To the best of our knowledge, no uncoded caching policy with sub-linear regret is known in the literature.
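The alternating-request example is easy to reproduce. The sketch below is a minimal LRU simulator written for illustration (the function `lru_hits` is not from the paper), run with a unit-sized cache on a sequence that alternates between two files.

```python
from collections import OrderedDict
from typing import Sequence


def lru_hits(requests: Sequence[int], capacity: int) -> int:
    """Count cache hits of a plain LRU policy that loads every missed file."""
    cache: "OrderedDict[int, bool]" = OrderedDict()
    hits = 0
    for f in requests:
        if f in cache:
            hits += 1
            cache.move_to_end(f)           # mark f as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used file
            cache[f] = True
    return hits


T = 1000
requests = [t % 2 for t in range(T)]       # alternating sequence 0, 1, 0, 1, ...
online_hits = lru_hits(requests, capacity=1)
static_hits = max(requests.count(0), requests.count(1))
print(online_hits, static_hits, static_hits - online_hits)
```

With these inputs LRU never scores a hit, while keeping either file permanently in the cache scores 500 hits, so the regret on this sequence is $T/2$.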

#### Achievability with uncoded caching:

Making use of the theory of Online Structured Learning (cohen2015following, ), we now propose a simple Follow the Perturbed Leader (FTPL)-based uncoded caching policy, which achieves near-optimal expected regret against an oblivious adversary. The FTPL policy maintains a cumulative running count of the number of times each file has been requested so far. These counts are then perturbed by adding i.i.d. Gaussian noise of zero mean and an appropriate variance to each of the count values. Finally, at each time slot, the top $C$ files with the highest perturbed counts are loaded to the cache. The FTPL policy is formally described below in Algorithm 1.
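The verbal description above can be sketched in a few lines of Python. This is an illustration written from that description, not a reproduction of the paper's Algorithm 1: the names `ftpl_cache` and `run_ftpl` are hypothetical, the noise scale `eta` is left as a free parameter because the appropriate variance is not stated in this section, and fresh noise is drawn at every slot as an assumption.

```python
import random
from typing import List, Sequence, Set


def ftpl_cache(counts: Sequence[float], capacity: int, eta: float,
               rng: random.Random) -> Set[int]:
    """Return the C files with the largest Gaussian-perturbed request counts."""
    perturbed = [c + eta * rng.gauss(0.0, 1.0) for c in counts]
    ranked = sorted(range(len(counts)), key=lambda f: perturbed[f], reverse=True)
    return set(ranked[:capacity])


def run_ftpl(requests: Sequence[int], n_files: int, capacity: int,
             eta: float, seed: int = 0) -> int:
    """Simulate the policy on a request sequence and return the number of hits."""
    rng = random.Random(seed)
    counts: List[float] = [0.0] * n_files
    hits = 0
    for f in requests:
        cache = ftpl_cache(counts, capacity, eta, rng)  # chosen before seeing f
        hits += f in cache
        counts[f] += 1                                  # update the running count
    return hits


# Toy run (hypothetical numbers): alternating requests, unit-sized cache.
T = 1000
requests = [t % 2 for t in range(T)]
print(run_ftpl(requests, n_files=2, capacity=1, eta=T ** 0.5))
```

Setting `eta = 0` recovers the unperturbed Follow-the-Leader rule, which caches the files with the largest raw counts. The random perturbation is what prevents an adversary from predicting the cache contents exactly.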

We prove the following achievability bound for the FTPL policy.

###### Theorem 2.4 (Achievability with uncoded caching).

In the single cache setting, the FTPL uncoded caching policy achieves the following upper bound for expected regret (the expectation is taken over the randomness of the algorithm)

$$\mathbb{E}_{\{\gamma_t\}_{t \geq 1}}\big( R_T^{\mathrm{FTPL}} \big) \leq 1.51\, (\log N)^{1/4} \sqrt{CT}.$$

The proof of Theorem 2.4 follows a similar line of arguments as the proof of Theorem 1 of (cohen2015following, ). However, in this paper, we further tighten the regret upper bound of (cohen2015following, ). This improvement follows by taking into account the constraint that only one file is requested per slot and that any feasible cache configuration respects a natural box constraint. This tightening is essential in order to match the FTPL upper bound with the lower bound given in Theorem 2.3. The details of the proof are given in Section 6.3.

The lower bound of Theorem 2.3 shows that FTPL is regret-optimal up to poly-logarithmic factors in the library size $N$. In the following, we show that this extra poly-log factor may be removed if we allow coded caching.
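To make the size of this gap concrete, one may divide the FTPL upper bound of Theorem 2.4 by the leading term of the lower bound of Theorem 2.3. The calculation below ignores the lower-order $\Theta(1/\sqrt{T})$ term; the numerical example assumes a natural logarithm and a hypothetical library size of $N = 10^6$.

```latex
\frac{1.51\,(\log N)^{1/4}\sqrt{CT}}{\sqrt{CT/(2\pi)}}
  \;=\; 1.51\,\sqrt{2\pi}\,(\log N)^{1/4}
  \;\approx\; 3.79\,(\log N)^{1/4},
\qquad
\text{e.g. } N = 10^{6}:\quad 3.79 \times (13.8)^{1/4} \approx 7.3 .
```

The ratio does not depend on $C$ or $T$ and grows only as the fourth root of $\log N$.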

#### Achievability with coded caching:

In (paschos2019learning, ), Corollary 2, the authors showed that an Online Gradient Ascent (OGA)-based single-server caching policy, serving one file request per slot, achieves $O(\sqrt{CT})$ regret. To avoid repetition, a description of the general version of the OGA policy in the context of network caching, which subsumes the single cache case, will be given in Section 3.3.

## 3. Caching in a Content Distribution Network

We now begin investigating the problem of optimal caching in a Content Distribution Network (CDN). In this problem, there is a set of geographically distributed users who periodically request files from a content provider (e.g., Netflix). The content provider maintains a global network of data centers, each caching some files up to its capacity. A user’s file request may be served by its neighboring data centers if the requested file exists in any of the neighboring caches. This client-server architecture gives rise to a bipartite content distribution network with the set of users and the set of caches constituting its two parts (shanmugam2013femtocaching, ). For a detailed case study on the above caching architecture, including the global distribution of the data centers and performance measurement in the context of the Netflix CDN, please refer to (bottger2018open, ). In this section, we show how the tools and techniques developed in the previous section for a single cache may be generalized to address this more challenging problem.

### 3.1. System Model

We now formalize the above system model for caching in a CDN. A set $\mathcal{I}$ of users is connected to a set $\mathcal{J}$ of caches in the form of a bipartite network. For simplicity, we assume that the caches are homogeneous in the sense that each cache has the same storage capacity $C$. As before, the library size $N$ (i.e., the number of all possible files) is assumed to be sufficiently large. The connection between the users and the caches is represented by a bipartite graph. The set of caches connected to a user $i \in \mathcal{I}$ is denoted by $\partial^+(i)$. The set of users connected to a cache is defined similarly, and its cardinality is the in-degree of that cache. For the sake of simplicity, we assume the network to be right-regular, i.e., all caches have the same in-degree. See Figure 2 for a schematic.

Each user requests one file per time slot. Each file request may be served by any (one or more) neighbouring caches. As before, the file request generated by a user $i$ at time $t$ is one-hot encoded by an $N$-dimensional binary vector $x_t^i$, with the interpretation that its $f$th component equals one if and only if the $f$th file is requested by the user $i$ at time $t$. The cache configuration of the $j$th cache at time $t$ is represented by the $N$-dimensional vector $y_t^j$, with each component denoting the fraction of the corresponding coded file cached. The cache configuration must always satisfy the cache-capacity constraints. Thus, the set of all feasible cache configurations is given by:

$$\mathcal{Y}_{\mathcal{J}} = \Big\{ (y^j, j \in \mathcal{J}) : \sum_{f=1}^{N} y^j_f \leq C,\ \forall j \in \mathcal{J},\ 0 \leq y \leq 1 \Big\}. \tag{14}$$

As before, for uncoded caching, each vector $y_t^j$ is binary, i.e., $y_t^j \in \{0,1\}^N$.

### 3.2. Reward and Regret

For content distribution networks, it will be useful to distinguish between elastic and inelastic contents, as defined below. Making this distinction is essential due to the possibility that, in a caching network, the same content may be cached and retrieved from more than one cache at a time. Recall that, for rateless codes, it is only the total amount of received encoded symbols that determines the decoding quality.

#### Elastic contents:

We call a content elastic if receiving multiple layers (i.e., resolutions) of the same content improves its overall utility for the users. Examples of elastic contents include multi-bitrate video files for adaptive streaming (gu2013multiple, ), (adaptive_video, ), multi-resolution HD videos (holcomb2008multi, ), and erasure-coded files in fault-tolerant distributed file systems, such as Hadoop (dimakis2010network, ), (shvachko2010hadoop, ). In this setting, an incoming file request from the $i$th user can be satisfied by fetching and combining parts of the cached layers of contents from different neighboring caches $j \in \partial^+(i)$. Accordingly, for elastic contents, we define the one-slot reward to be the aggregate of the cache-hits at that slot, i.e.,

$$q_{\text{elastic}}(x_t, y_t) \equiv \sum_{i \in \mathcal{I}} x_t^i \cdot \Big( \sum_{j \in \partial^+(i)} y_t^j \Big). \tag{15}$$

Hence, a user’s utility increases linearly as she receives more layers of the requested content from different neighboring caches.

#### Inelastic contents:

We call a content inelastic if the content has only a single layer (resolution) and a user is fully satisfied if she is able to retrieve the original content. Popular examples of inelastic contents include traditional webpages, databases, documents, and single-resolution images. Similar to the elastic case, if the files are encoded by rateless MDS codes, a request from a user may be satisfied by combining different fractional parts of the content (from different neighboring caches), with the fractions adding up to unity. Consequently, we define the one-slot reward for inelastic content to be

$$q_{\text{inelastic}}(x_t, y_t) \equiv \sum_{i \in \mathcal{I}} x_t^i \cdot \min\Big\{ \mathbf{1},\ \sum_{j \in \partial^+(i)} y_t^j \Big\}, \tag{16}$$

where $\mathbf{1}$ is the all-one vector, and the $\min$ operator outputs a vector whose components are the pointwise minimum of the corresponding components of the input argument vectors. In comparison with the reward definition in Eqn. (15), the $\min$ operator in Eqn. (16) takes into account the fact that receiving multiple fractions of an inelastic content, summing up to more than one, does not add additional utility. Hence, unlike the elastic contents, inelastic contents have bounded rewards. We note that the reward in Eqn. (16) coincides with the utility definition in Eqn. (13) of (paschos2019learning, ). The regret of any caching policy is defined exactly in the same way as in the single cache case via Eqns. (4) and (5).
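The two reward definitions differ only in the cap at one, which the following sketch makes explicit. Since each request vector is one-hot, the reward of a user reduces to the total cached fraction of the requested file over that user's neighboring caches. The data layout (`requests` as a map from user to requested file, `neighbors` as a map from user to adjacent caches, `y` as a list of per-cache configurations) is an assumption made for illustration.

```python
from typing import Dict, List, Sequence


def cached_amount(file: int, caches: Sequence[int], y: Sequence[Sequence[float]]) -> float:
    """Total fraction of `file` stored across the given caches."""
    return sum(y[j][file] for j in caches)


def q_elastic(requests: Dict[int, int], neighbors: Dict[int, List[int]],
              y: Sequence[Sequence[float]]) -> float:
    """Elastic one-slot reward, Eqn. (15): every cached fraction adds utility."""
    return sum(cached_amount(f, neighbors[i], y) for i, f in requests.items())


def q_inelastic(requests: Dict[int, int], neighbors: Dict[int, List[int]],
                y: Sequence[Sequence[float]]) -> float:
    """Inelastic one-slot reward, Eqn. (16): utility per user is capped at one."""
    return sum(min(1.0, cached_amount(f, neighbors[i], y)) for i, f in requests.items())


# Toy network (hypothetical): 2 users, 2 caches, N = 3 files.
neighbors = {0: [0, 1], 1: [1]}      # user 0 sees both caches, user 1 only cache 1
y = [[1.0, 0.0, 0.0],                # cache 0 stores file 0 completely
     [0.5, 0.5, 0.0]]                # cache 1 stores half of files 0 and 1
requests = {0: 0, 1: 1}              # user 0 requests file 0, user 1 requests file 1
print(q_elastic(requests, neighbors, y), q_inelastic(requests, neighbors, y))
```

In this example user 0 can collect 1.5 units of file 0, which counts fully in the elastic reward (total 2.0) but is capped at one in the inelastic reward (total 1.5).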

### 3.3. Achievability for Caching Networks

In the following, we describe a simple and distributed gradient-based coded caching policy that achieves $O(\sqrt{T})$ regret in both elastic and inelastic settings. We then propose an extension of the Follow-the-Perturbed-Leader-based uncoded caching policy given in Algorithm 1, which also achieves near-optimal regret in the elastic setting.

#### Achievability with coded caching (paschos2019learning, ):

Let $q(\mathbf{x}_t, \mathbf{y}_t)$ be a generic one-slot reward function, which is concave in the cache-configuration vector $\mathbf{y}_t$ (e.g., $q$ could be chosen to be either $q_{\text{elastic}}$ or $q_{\text{inelastic}}$). Let $\mathbf{g}_t$ be a supergradient of $q(\mathbf{x}_t, \cdot)$ at $\mathbf{y}_t$. The paper (paschos2019learning, ) describes the following Online Gradient Ascent (OGA)-based caching policy: starting from any initial feasible configuration, iterate as follows:

$$\mathbf{y}_{t+1} = \Pi_{\mathcal{Y}^J}\big(\mathbf{y}_t + \eta\,\mathbf{g}_t\big), \tag{17}$$

where $\mathcal{Y}^J$ is the set of all feasible cache configurations given in Eqn. (14), $\Pi_{\mathcal{Y}^J}(\cdot)$ is the Euclidean projection operator onto the set $\mathcal{Y}^J$, and $\eta > 0$ is an appropriate step-size parameter. For the single-user, single-cache setting of Section 2, we simply set

#### Distributed implementation:

The OGA-based caching policy can be implemented at each cache in a distributed fashion with locally available information only. This can be seen from the following two observations:

1. A separable supergradient for both the objective functions $q_{\text{elastic}}$ and $q_{\text{inelastic}}$ may be obtained, such that the gradient ascent steps in Eqn. (17) can be carried out locally. For an expression of such a supergradient, see Eqn. (30).

2. Since the cache capacity constraints are separable for different caches, the projection operation may be carried out separately for each cache $j$ as $\mathbf{y}^j_{t+1} = \Pi_{\mathcal{Y}}\big(\mathbf{y}^j_t + \eta\,\mathbf{g}^j_t\big)$, where $\mathcal{Y}$ is the single cache constraint set given by Eqn. (1).
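The per-cache update described in the two observations above can be sketched as follows. This is a minimal illustration for the elastic reward only, under two assumptions that are not spelled out in this section: the single-cache set of Eqn. (1) is taken to be $\{\mathbf{y} \in [0,1]^N : \sum_k y_k \le C\}$, and the gradient of Eqn. (15) with respect to $\mathbf{y}^j$ is the sum of the requests of the users adjacent to cache $j$. The function names and the bisection-based projection are illustrative choices, not the authors' implementation.

```python
import numpy as np

def project_capped_simplex(v, C, iters=60):
    """Euclidean projection of v onto {y in [0,1]^N : sum(y) <= C} (assumed form of Eqn. (1))."""
    y = np.clip(v, 0.0, 1.0)
    if y.sum() <= C:
        return y                      # capacity constraint inactive: clipping suffices
    # Otherwise the projection is clip(v - tau, 0, 1) with tau > 0 chosen so the sum equals C.
    lo, hi = 0.0, float(v.max())
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, 1.0).sum() > C:
            lo = tau
        else:
            hi = tau
    return np.clip(v - hi, 0.0, 1.0)  # hi is always feasible

def oga_step(y, x, neighbors, C, eta):
    """One OGA iteration (17) for the elastic reward, carried out cache by cache."""
    g = np.zeros_like(y)
    for i, nbrs in enumerate(neighbors):
        g[nbrs] += x[i]               # cache j accumulates the requests of its adjacent users
    return np.stack([project_capped_simplex(y[j] + eta * g[j], C) for j in range(len(y))])
```

Each row of `y` is updated using only the requests that reach that cache, which is the locality property the two observations rely on.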

We have the following achievability result for OGA:

###### Theorem 3.1 (Achievability with coded caching (paschos2019learning, )).

For both elastic and inelastic contents, the OGA-based caching policy (17) with an appropriately chosen step size $\eta$ achieves the following upper bound on regret for a right $d$-regular bipartite network:

$$R^{\mathrm{OGA}}_T \le d\,|\mathcal{J}|\sqrt{2CT}.$$

#### Proof outline:

In this proof, we appeal to Theorem 2 of (paschos2019learning, ), which gives a generic regret upper bound for the OGA-based caching policy. We conclude the proof of Theorem 3.1 by computing the diameter of the feasible set subject to the cache-capacity constraints. Note that we cannot directly use the regret upper bound given in Theorem 2 of (paschos2019learning, ) because there the authors assume that only one of the users may request contents at a slot. In our model, there is no such restriction, so all users may simultaneously request contents at a slot. Please refer to Appendix 6.4 for a proof of Theorem 3.1.

#### Achievability with uncoded caching:

For uncoded caching in a bipartite network, we propose a simple extension of the FTPL policy given in Algorithm 1 for a single cache. In this extension, each of the caches independently implements the FTPL policy irrespective of whether the content is elastic or inelastic. We have the following achievability result for elastic contents:

###### Theorem 3.2 (Achievability with uncoded caching for elastic contents).

For elastic contents, the FTPL caching policy with an appropriately chosen noise parameter $\eta$ yields the following upper bound on expected regret for a right $d$-regular bipartite network:

$$\mathbb{E}_{\{\gamma_t\}_t}\big(R^{\mathrm{FTPL}}_T\big) \le 1.51\,(\log N)^{1/4}\, d\,|\mathcal{J}|\sqrt{CT}.$$

See Appendix 7.2 for a proof of Theorem 3.2.
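A single FTPL decision at one cache can be sketched as below. This is a hedged illustration rather than a transcription of Algorithm 1: it assumes the policy perturbs the cumulative request counts with i.i.d. standard Gaussian noise scaled by $\eta$ and caches the $C$ files with the largest perturbed counts, consistent with the potential function used in Section 6.3. How each cache forms its `counts` vector in the network (for example, from the requests of its adjacent users) is an assumption left to the caller.

```python
import numpy as np

def ftpl_cache(counts, C, eta, rng):
    """Return a 0/1 cache configuration holding the C files with the largest perturbed counts."""
    perturbed = counts + eta * rng.standard_normal(counts.shape)  # fresh Gaussian noise each call
    top = np.argpartition(perturbed, -C)[-C:]                     # indices of the C largest entries
    y = np.zeros(counts.shape, dtype=float)
    y[top] = 1.0
    return y

# Illustrative usage: one cache, 6 files, capacity 2 (all values assumed).
rng = np.random.default_rng(0)
counts = np.array([4.0, 0.0, 7.0, 1.0, 0.0, 2.0])
print(ftpl_cache(counts, C=2, eta=1.0, rng=rng))
```

With `eta = 0` the rule reduces to caching the most frequently requested files so far; larger values of `eta` randomize the choice more strongly.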

### 3.4. Converse for Caching Networks

The question of regret-optimality of the OGA policy (17) for caching networks was left open in (paschos2019learning, ) due to the lack of known lower bounds. In the following, we prove tight universal lower bounds for the regret, which apply to both coded and uncoded caching.

###### Theorem 3.3 (Lower bound for elastic contents).

For caching elastic contents in a bipartite network in the above set up with , the regret of any online caching policy is lower bounded as:

$$R^{\pi}_T \ge d\,|\mathcal{J}|\sqrt{\frac{CT}{2\pi}} - \Theta\Big(\frac{1}{\sqrt{T}}\Big), \quad \forall\, T \ge 1.$$

#### Proof outline:

In this proof, we construct a common randomized file request sequence, which is identical for each user. In other words, all users request the same random file at each slot. Thus, unlike most other applications of the probabilistic method, which usually proceed with i.i.d. random variables, we consider a set of mutually dependent file request sequences. The expected reward accrued by any caching policy is then obtained by using the statistical symmetry of the file requests and the linearity of expectation. Finally, the static optimal caching configuration is identified, and the reward accrued by the optimal stationary policy is lower bounded by appealing to Lemma 2.2. Combining the above two results yields the regret lower bound. Please refer to Section 6.5 for the complete proof of Theorem 3.3.

The following theorem gives a regret lower bound for caching inelastic contents in a bipartite network.

###### Theorem 3.4 (Lower bound for inelastic contents).

For caching inelastic contents in a bipartite network in the above set up with , the regret of any online caching policy is lower bounded as:

$$R^{\pi}_T \ge d\sqrt{\frac{|\mathcal{J}|\,CT}{2\pi}} - \Theta\Big(\frac{1}{\sqrt{T}}\Big), \quad \forall\, T \ge 1.$$

#### Proof outline:

The principal difficulty in extending the argument from the proof of Theorem 3.3 to the inelastic case is the non-linearity of the reward function (16), which arises from the $\min$ operator. As a result, it becomes difficult to analyze the expected reward accrued by the optimal stationary caching configuration. To get around this obstacle, we lower bound the reward of the optimal caching configuration with the help of a carefully constructed sub-optimal caching configuration. Interestingly, under this sub-optimal configuration, the non-linearity of the reward function vanishes, which leads to a tractable analysis. Similar to the proof of Theorem 3.3, this proof also uses an identical (i.e., dependent) file request sequence across all users. Please refer to Section 6.6 for the complete proof of Theorem 3.4.

#### Tightness:

Comparing the regret upper bounds in Theorems 3.1 and 3.2 with the lower bounds in Theorems 3.3 and 3.4, we see that, for elastic contents, the OGA and FTPL network caching policies are regret-optimal up to a constant and poly-log factors, respectively. For inelastic contents, the OGA policy is regret-optimal up to a factor of $O(\sqrt{|\mathcal{J}|})$.
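As a worked check of this comparison, the ratios of the leading terms of the upper bounds (Theorems 3.1 and 3.2) to those of the lower bounds (Theorems 3.3 and 3.4) can be computed directly. The lower-order $\Theta(1/\sqrt{T})$ terms are ignored here, and the numerical values are rounded:

$$
\begin{aligned}
\text{Elastic, OGA:}\quad & \frac{d\,|\mathcal{J}|\sqrt{2CT}}{d\,|\mathcal{J}|\sqrt{CT/(2\pi)}} = 2\sqrt{\pi} \approx 3.54, \\
\text{Elastic, FTPL:}\quad & \frac{1.51\,(\log N)^{1/4}\,d\,|\mathcal{J}|\sqrt{CT}}{d\,|\mathcal{J}|\sqrt{CT/(2\pi)}} = 1.51\sqrt{2\pi}\,(\log N)^{1/4} \approx 3.79\,(\log N)^{1/4}, \\
\text{Inelastic, OGA:}\quad & \frac{d\,|\mathcal{J}|\sqrt{2CT}}{d\sqrt{|\mathcal{J}|\,CT/(2\pi)}} = 2\sqrt{\pi\,|\mathcal{J}|}.
\end{aligned}
$$

The first ratio is a constant, the second grows only as $(\log N)^{1/4}$, and the third grows as the square root of the number of caches.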

## 4. Experiments

Dataset description: In this section, we compare the performance of the existing caching policies with the proposed FTPL policy using a popular and stable benchmark - MovieLens M dataset (movieLens, ; harper2015movielens, ). This dataset contains M ratings for movies, along with the timestamps of the ratings. The ratings were given by unique users. For our experiments, we assume that the users request movies from a CDN (such as Netflix) in the same chronological order as the recorded timestamps of the ratings. A histogram of the movie request frequencies by all users together is shown in Fig. 5 of the Appendix.

Experimental Setup: Following the standard industry practice, the cache capacity of each cache is set to be a fixed fraction of the total library size , where we take . For the bipartite caching scenario, we assume that a total of users are connected to caches. Each cache has in-degree . The first cache is connected to the users and , the second cache is connected to the users and , the third cache is connected to the users and , and the fourth cache is connected to the users and . The entire dataset with entries is uniformly divided into disjoint blocks. Each user is allocated one block of the dataset. It is assumed that each user makes requests serially from its allocated block in the chronological order.

Results: The time-averaged regrets for different caching policies in the single cache setting and the bipartite network setting for both elastic and inelastic contents are plotted in Figure 3 (see the following page). From the plots in Figures 3 (a), 3 (b), and 3 (c), we conclude that the FTPL and LFU policies have the best performance in terms of average regret uniformly in all cases, and there is hardly any noticeable difference between their performance for large enough . These two policies also perform very close to the theoretical lower bounds (corresponding to the worst-case request sequence). On the other hand, we find that the LRU policy has the worst performance, followed by OGA, which performs only marginally better. It is quite surprising to find that the uncoded caching policy FTPL outperforms the coded caching policy OGA in all scenarios. Figure 6 in the Appendix shows the variation of the average regret as the cache capacity is increased from percent to percent of the library size for a fixed time . These plots confirm that the regret increases with the cache size. However, we find that the increase in the regret is smallest for the LFU and FTPL policies followed by the OGA and the LRU policy.
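For concreteness, the time-averaged regret of a simple baseline can be computed as sketched below. This is an illustrative single-cache example with a textbook LRU policy, not the experimental code behind the figures: the static benchmark is the number of hits obtained by permanently caching the $C$ most requested files in hindsight, and the helper names are hypothetical.

```python
from collections import Counter, OrderedDict

def lru_hits(requests, C):
    """Count cache hits of an LRU cache of capacity C on a request sequence."""
    cache = OrderedDict()
    hits = 0
    for f in requests:
        if f in cache:
            hits += 1
            cache.move_to_end(f)           # mark f as most recently used
        else:
            if len(cache) >= C:
                cache.popitem(last=False)  # evict the least recently used file
            cache[f] = True
    return hits

def time_averaged_regret(requests, hits, C):
    """(Hits of the best static cache in hindsight minus policy hits) divided by T."""
    best_static = sum(count for _, count in Counter(requests).most_common(C))
    return (best_static - hits) / len(requests)

requests = [1, 2, 1, 3, 1, 2]              # toy request sequence (assumed)
hits = lru_hits(requests, C=2)
print(hits, time_averaged_regret(requests, hits, C=2))
```

On this toy sequence LRU obtains 2 hits, the best static cache (files 1 and 2) obtains 5, and the time-averaged regret is 0.5.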

## 5. Conclusion and future work

In this paper, we obtain tight sub-linear regret lower bounds for the online caching problem for single caches and bipartite caching networks. In the process, we derive a key technical result on the balls-into-bins problem and utilize the result in deriving all our lower bounds. We also propose a new randomized caching policy, called FTPL, which is shown to be both sound in theory and superior in practice. We envision the following future research directions stemming from this work: (1) As an immediate follow-up, it will be interesting to narrow down the gap between the lower and upper regret bounds for inelastic contents in a bipartite caching network. Moreover, obtaining a regret guarantee for the FTPL policy for bipartite networks with inelastic contents remains open. (2) We defined the reward functions primarily with the online performance of the caching policies (i.e., hit rates) in mind. In particular, our reward definitions do not take into account the system cost associated with cache replacements at every slot. Hence, it would be interesting to design an uncoded caching policy that makes incremental changes to the caching configuration at every slot, yet matches the sub-linear regret lower bounds. (3) A variation of the caching problem arises in the context of inventory management where, instead of digital files, physical commodities are stored in the caches (e.g., retail stores). Requests for the commodities arrive sequentially. The requested commodities that are currently present in the cache are immediately removed from the cache (e.g., sold). Hence, unlike in our setting, there is no scope for “coding”, and it would make sense to cache multiple copies of the same commodity at the same slot, subject to the cache capacity constraints (cf. Eqn. (1)). The performance of FTPL-like randomized caching policies will be exciting to investigate in this setup. (4) Finally, it would also be interesting to go beyond the single-hop setting of bipartite caching networks and design a regret-optimal joint routing and caching policy for multi-hop CDNs (liu2019joint, ).

## 6. Proof of the results

### 6.1. Proof of Lemma 2.2

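We index the bins sequentially as $1, 2, \ldots, 2C$. Next, we logically combine every two consecutive bins to obtain $C$ super bins (see Figure 4). Let us denote the (random) number of balls in the $i$th super bin by $X_i$. Conditioned on the r.v. $X_i$, the numbers of balls in the corresponding bins $2i-1$ and $2i$ are jointly distributed as $(Z_i, X_i - Z_i)$, where $Z_i$ is a binomial random variable with parameters $(X_i, \tfrac{1}{2})$. Let $H_i$ denote the maximum number of balls between the corresponding bins $2i-1$ and $2i$. Then, as shown in the proof of Theorem 2.1, when $X_i > 0$:

$$\mathbb{E}(H_i \mid X_i) \ge \frac{X_i}{2} + \sqrt{\frac{X_i}{2\pi}} - \frac{1}{2\sqrt{2\pi X_i}}, \quad \forall\, 1 \le i \le C. \tag{18}$$

Since the $C$ most populated bins together hold at least as many balls as the heavier bins of the $C$ super bins, i.e., $M_C \ge \sum_{i=1}^{C} H_i$, we have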

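$$
\begin{aligned}
\mathbb{E}(M_C) &\ge \mathbb{E}\Big(\sum_{i=1}^{C} H_i\Big) = \sum_{i=1}^{C} \mathbb{E}(H_i) \overset{(a)}{=} C\,\mathbb{E}(H_1) \overset{(b)}{=} C\,\mathbb{E}\big(H_1 \mathbb{1}(X_1>0)\big) \\
&\overset{(c)}{=} C\,\mathbb{E}\,\mathbb{E}\big(H_1 \mathbb{1}(X_1>0) \mid X_1\big) \\
&\overset{(d)}{\ge} C\,\mathbb{E}\Big(\frac{X_1 \mathbb{1}(X_1>0)}{2} + \sqrt{\frac{X_1}{2\pi}}\,\mathbb{1}(X_1>0) - \frac{\mathbb{1}(X_1>0)}{2\sqrt{2\pi X_1}}\Big) \\
&\overset{(e)}{=} \frac{C}{2}\,\mathbb{E}(X_1) + \frac{C}{\sqrt{2\pi}}\,\mathbb{E}\big(\sqrt{X_1}\big) - \frac{C}{2\sqrt{2\pi}}\,\mathbb{E}\Big(\frac{\mathbb{1}(X_1>0)}{\sqrt{X_1}}\Big),
\end{aligned}
\tag{19}
$$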

where the equality (a) follows from the fact that the random variables $H_1, \ldots, H_C$ have identical distributions. For equation (b), we write

$$H_1 = H_1 \mathbb{1}(X_1 = 0) + H_1 \mathbb{1}(X_1 > 0).$$

Now, observe that if $X_1 = 0$ then $H_1 = 0$ a.s. Hence, almost surely, we have $H_1 = H_1 \mathbb{1}(X_1 > 0)$. The equation (c) follows from the tower property of conditional expectation, the inequality (d) follows from the bound (18), and the equality (e) follows from the facts that $X_1 \mathbb{1}(X_1>0) = X_1$ and $\sqrt{X_1}\,\mathbb{1}(X_1>0) = \sqrt{X_1}$.

The Lemma now follows by using the bounds on moments of the binomial distribution as computed next.

#### Bounding the Expectations in Eqn. (19):

For bounding the middle term in Eqn. (19), consider the factorization (stack_of, ):

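$$\sqrt{x} - \Big(1 + \frac{x-1}{2} - \frac{(x-1)^2}{2}\Big) = \frac{\sqrt{x}}{2}\big(\sqrt{x}-1\big)^2\big(\sqrt{x}+2\big).$$

The RHS is non-negative for any $x \ge 0$. Thus, we have the following algebraic inequality:

$$\sqrt{x} \ge 1 + \frac{x-1}{2} - \frac{(x-1)^2}{2}, \quad \forall\, x \ge 0.$$

Replacing the variable $x$ with the random variable $X_1/\mathbb{E}(X_1)$ pointwise, we have almost surely,

$$\sqrt{\frac{X_1}{\mathbb{E}(X_1)}} \ge 1 + \frac{\frac{X_1}{\mathbb{E}(X_1)} - 1}{2} - \frac{\Big(\frac{X_1}{\mathbb{E}(X_1)} - 1\Big)^2}{2}.$$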

Taking expectation of both sides, the above yields

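$$\mathbb{E}\big(\sqrt{X_1}\big) \ge \sqrt{\mathbb{E}(X_1)}\,\Big(1 - \frac{\mathrm{Var}(X_1)}{2\,(\mathbb{E}(X_1))^2}\Big). \tag{20}$$

Finally, recall that $X_1 \sim \mathrm{Binomial}\big(T, \tfrac{1}{C}\big)$. Hence, $\mathbb{E}(X_1) = \frac{T}{C}$ and $\mathrm{Var}(X_1) = \frac{T}{C}\big(1 - \frac{1}{C}\big)$. Using this, Eqn. (20) yields the following lower bound

$$\mathbb{E}\big(\sqrt{X_1}\big) \ge \sqrt{\frac{T}{C}} - \frac{1}{2}\sqrt{\frac{C}{T}}. \tag{21}$$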

For bounding the last term in Eqn. (19), we write

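$$\frac{\mathbb{1}(X_1>0)}{\sqrt{X_1}} \le \mathbb{1}\Big(X_1 \le \frac{T}{2C}\Big) + \sqrt{\frac{2C}{T}}.$$

Recall that $\mathbb{E}(X_1) = \frac{T}{C}$. Taking expectation of both sides of the above inequality, we have

$$\mathbb{E}\Big(\frac{\mathbb{1}(X_1>0)}{\sqrt{X_1}}\Big) \le \mathbb{P}\Big(X_1 \le \frac{1}{2}\mathbb{E}(X_1)\Big) + \sqrt{\frac{2C}{T}}.$$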

The probability term in the above expression may be bounded using Chebyshev’s inequality as follows:

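$$
\begin{aligned}
\mathbb{P}\Big(X_1 \le \frac{1}{2}\mathbb{E}(X_1)\Big) &= \mathbb{P}\Big(X_1 - \mathbb{E}(X_1) \le -\frac{1}{2}\mathbb{E}(X_1)\Big) \\
&\le \mathbb{P}\Big(\big|X_1 - \mathbb{E}(X_1)\big| \ge \frac{1}{2}\mathbb{E}(X_1)\Big) \\
&\le \frac{4\,\mathrm{Var}(X_1)}{(\mathbb{E}(X_1))^2} = \frac{4(C-1)}{T}.
\end{aligned}
$$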

Taking the above bounds together, we obtain

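$$\mathbb{E}\Big(\frac{\mathbb{1}(X_1>0)}{\sqrt{X_1}}\Big) \le \frac{4(C-1)}{T} + \sqrt{\frac{2C}{T}} \le \frac{4C}{T} + \sqrt{\frac{2C}{T}}. \tag{22}$$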

Finally, combining Eqns. (19), (21), and (22) together, we obtain

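$$\mathbb{E}(M_C) \ge \frac{T}{2} + \sqrt{\frac{CT}{2\pi}} - \frac{(\sqrt{2}+1)\,C^{3/2}}{2\sqrt{2\pi T}} - \sqrt{\frac{2}{\pi}}\,\frac{C^2}{T}. \qquad \blacksquare$$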
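The bound just derived can be sanity-checked numerically. The sketch below is an illustration under the setup of this proof ($T$ balls thrown uniformly at random into $2C$ bins, with $M_C$ the total load of the $C$ fullest bins); the function names and the parameter values in the usage lines are arbitrary choices, not values from the paper.

```python
import numpy as np

def lemma_bound(C, T):
    """Right-hand side of the final lower bound on E(M_C) derived above."""
    return (T / 2
            + np.sqrt(C * T / (2 * np.pi))
            - (np.sqrt(2) + 1) * C ** 1.5 / (2 * np.sqrt(2 * np.pi * T))
            - np.sqrt(2 / np.pi) * C ** 2 / T)

def mean_top_half_load(C, T, trials, rng):
    """Monte Carlo estimate of E(M_C): average total load of the C fullest of 2C bins."""
    loads = rng.multinomial(T, np.full(2 * C, 1.0 / (2 * C)), size=trials)
    loads.sort(axis=1)                       # ascending within each trial
    return loads[:, C:].sum(axis=1).mean()   # sum over the C largest loads

rng = np.random.default_rng(0)
C, T = 5, 1000
print(mean_top_half_load(C, T, trials=2000, rng=rng), lemma_bound(C, T))
```

The empirical mean should sit above the bound, since the pairing argument only counts the heavier bin of each super bin rather than the true top half.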

### 6.2. Proof of Theorem 2.3

To lower bound the regret in equation (7), as in Section 2.4, we consider a random file request sequence $\{\boldsymbol{X}_t\}_{t=1}^{T}$, each sampled independently and uniformly at random from the set of the first $2C$ unit vectors of dimension $N$ (recall that the $k$th unit vector has a one at its $k$th coordinate and zeros everywhere else). In other words, at every time slot, the user independently requests a file chosen uniformly at random from the set of the first $2C$ files.

The expected reward obtained by any caching policy, given by the second term in Eqn. (4), is now easy to evaluate:

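$$\mathbb{E}\Big(\sum_{t=1}^{T} \boldsymbol{Y}_t \cdot \boldsymbol{X}_t\Big) \overset{(a)}{=} \sum_{t=1}^{T} \boldsymbol{Y}_t \cdot (\mathbb{E}\boldsymbol{X}_t) = \frac{1}{2C}\sum_{t=1}^{T}\sum_{k=1}^{2C} Y_{tk} \overset{(b)}{\le} \frac{T}{2}, \tag{23}$$

where, in (a), we have used the fact that the caching decision $\boldsymbol{Y}_t$ made at time $t$ is independent of the incoming file request $\boldsymbol{X}_t$, and in (b), we have made use of the cache-capacity constraint (1):

$$\sum_{k=1}^{2C} Y_{tk} \le \sum_{k=1}^{N} Y_{tk} \le C,$$

in addition to the fact that

$$\mathbb{E}(X_{tk}) = \begin{cases} \dfrac{1}{2C}, & \forall\, 1 \le k \le 2C, \\ 0, & \text{otherwise.} \end{cases}$$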

Note that, for any given file request sequence $\{\boldsymbol{X}_t\}_{t=1}^{T}$, the optimal offline stationary cache configuration vector is obtained by caching the $C$ most popular files. Hence, the optimal vector $\boldsymbol{Y}^*$, corresponding to the first term in Eqn. (4), is obtained by simply setting the coordinates corresponding to the $C$ largest coordinates of the $N$-dimensional vector $\sum_{t=1}^{T}\boldsymbol{X}_t$ to unity. Given the distribution of the file request sequence, it immediately follows that the reward accrued by the optimal stationary policy is identically distributed to the total number of balls in the $C$ most heavily loaded bins when a total of $T$ balls are randomly thrown into $2C$ bins. Finally, invoking Lemma 2.2, we have

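$$\mathbb{E}\Big(\boldsymbol{Y}^* \cdot \sum_{t=1}^{T} \boldsymbol{X}_t\Big) \ge \frac{T}{2} + \sqrt{\frac{CT}{2\pi}} - \Theta\Big(\frac{1}{\sqrt{T}}\Big). \tag{24}$$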

Combining equations (23) and (24), we obtain the following lower bound for regret in the single cache setting:

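$$R_T \ge \mathbb{E}_{\{\boldsymbol{X}_t\}_{t=1}^{T}}\Big(\boldsymbol{Y}^* \cdot \sum_{t=1}^{T}\boldsymbol{X}_t - \sum_{t=1}^{T}\boldsymbol{Y}_t \cdot \boldsymbol{X}_t\Big) \ge \sqrt{\frac{CT}{2\pi}} - \Theta\Big(\frac{1}{\sqrt{T}}\Big). \qquad \blacksquare$$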
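The two quantities compared in this proof, the hindsight reward of the best static cache and the reward of a given policy, are easy to compute for a concrete request sequence. The sketch below is illustrative only: requests are represented by integer file indices rather than unit vectors, and the policy is a fixed cache for simplicity.

```python
import numpy as np

def hindsight_reward(requests, N, C):
    """Hits of the best static cache: the sum of the C largest request counts."""
    counts = np.bincount(requests, minlength=N)
    return int(np.sort(counts)[-C:].sum())

def static_policy_reward(requests, cached_files):
    """Hits of a policy that keeps the same set of files cached at every slot."""
    return int(np.isin(requests, cached_files).sum())

requests = np.array([0, 1, 1, 2, 2, 2])     # toy sequence over N = 4 files (assumed)
best = hindsight_reward(requests, N=4, C=2)
got = static_policy_reward(requests, cached_files=[0, 1])
print(best, got, best - got)                # regret of caching files {0, 1}
```

Here the best static cache holds files 1 and 2 (5 hits), while caching files 0 and 1 yields 3 hits, so the regret on this sequence is 2.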

### 6.3. Proof of Theorem 2.4

Our proof follows a similar line of arguments as the proof of Theorem 1 of (cohen2015following, ). However, we improve the regret upper bound by a factor of . This additional improvement results from making use of the constraint that only one file is requested at every slot. The notations of (cohen2015following, ) are slightly altered in order to remain consistent throughout the paper.

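Let the set $\mathcal{Y}$ denote the set of all possible uncoded caching configurations in the single cache setting. Clearly, $|\mathcal{Y}| = \binom{N}{C}$. Define the potential function

$$\Phi_\eta(\mathbf{x}) = \mathbb{E}_{\boldsymbol{\gamma} \sim \mathcal{N}(0, I)}\Big[\max_{\mathbf{y} \in \mathcal{Y}} \langle \mathbf{y}, \mathbf{x} + \eta\boldsymbol{\gamma} \rangle\Big].$$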

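Also, denote the cumulative file request arrivals to the cache up to time $t$ by $\mathbf{X}_t$, with $\mathbf{X}_1 = \mathbf{0}$. Then, as shown in Eqn. (3) of (cohen2015following, ), the expected regret of the FTPL policy in Algorithm 1 with noise parameter $\eta$ is upper bounded as follows (the signs are flipped as we are in the rewards maximization setting, as opposed to the loss minimization setting of (cohen2015following, )):

$$\mathbb{E}(R_T) \le \Phi_\eta(\mathbf{X}_1) + \frac{1}{2}\sum_{t=1}^{T} \big\langle \mathbf{x}_t, \nabla^2\Phi_\eta(\tilde{\mathbf{x}}_t)\,\mathbf{x}_t \big\rangle, \tag{25}$$

for some $\tilde{\mathbf{x}}_t$ on the line segment connecting $\mathbf{X}_t$ and $\mathbf{X}_{t+1}$.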
Next, we bound each of the above two terms separately. The first term may be bounded in the same way as in (cohen2015following, ):

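$$\Phi_\eta(\mathbf{X}_1) \le \eta\sqrt{2C\log\binom{N}{C}} \le C\eta\sqrt{2\log N},$$

where the last inequality follows from the fact that $\binom{N}{C} \le N^C$. Since only one file is requested at every slot, the quadratic form above may be upper bounded as

$$\big\langle \mathbf{x}_t, \nabla^2\Phi_\eta(\tilde{\mathbf{x}}_t)\,\mathbf{x}_t \big\rangle \le \max_{i,\,\mathbf{x}} \big(\big|\nabla^2\Phi_\eta(\mathbf{x})\big|\big)_{ii}. \tag{26}$$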

Moreover, following Lemma 7 of (abernethy2014online, ), we have that

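$$\big(\nabla^2\Phi_\eta(\mathbf{x})\big)_{ij} = \frac{1}{\eta}\,\mathbb{E}\big[\hat{\mathbf{y}}(\tilde{\mathbf{x}}_t + \eta\boldsymbol{\gamma})_i\,\gamma_j\big],$$

where $\hat{\mathbf{y}}(\mathbf{x}) \in \arg\max_{\mathbf{y} \in \mathcal{Y}} \langle \mathbf{y}, \mathbf{x} \rangle$. Hence, using Jensen’s inequality we have that

$$\big(\big|\nabla^2\Phi_\eta(\mathbf{x})\big|\big)_{ii} \le \frac{1}{\eta}\,\mathbb{E}\big[\big|\hat{\mathbf{y}}(\tilde{\mathbf{x}}_t + \eta\boldsymbol{\gamma})_i\big|\,|\gamma_i|\big] \overset{(a)}{\le} \frac{1}{\eta}\,\mathbb{E}\big[|\gamma_i|\big] \overset{(b)}{=} \frac{1}{\eta}\sqrt{\frac{2}{\pi}}, \tag{27}$$

where the inequality (a) follows from the fact that for all $i$, we have $|\hat{\mathbf{y}}(\cdot)_i| \le 1$, and the equality (b) follows from the fact that $\gamma_i \sim \mathcal{N}(0,1)$, so that $\mathbb{E}|\gamma_i| = \sqrt{2/\pi}$. Hence, substituting the above bounds in Eqn. (25), we have the following upper bound on expected regret under the FTPL caching policy:

$$\mathbb{E}(R_T) \le C\eta\sqrt{2\log N} + \frac{T}{\eta\sqrt{2\pi}}.$$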

Finally, choosing $\eta = (4\pi\log N)^{-1/4}\sqrt{T/C}$, which minimizes the right-hand side, yields the following regret upper bound for the FTPL policy:

$$\mathbb{E}_{\{\gamma_t\}_t}\big(R^{\mathrm{FTPL}}_T\big) \le 1.51\,(\log N)^{1/4}\sqrt{CT}. \qquad \blacksquare$$
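The constant $1.51$ can be recovered by optimizing the preceding bound over $\eta$. The following worked computation is a sketch of that step; the shorthand $a$ and $b$ is introduced here only for illustration. Writing the bound as $f(\eta) = a\eta + b/\eta$ with $a = C\sqrt{2\log N}$ and $b = T/\sqrt{2\pi}$, the minimizer is $\eta^* = \sqrt{b/a}$ and the minimum value is $2\sqrt{ab}$:

$$
\begin{aligned}
\eta^* &= \sqrt{\frac{T}{C\sqrt{2\pi}\sqrt{2\log N}}} = (4\pi\log N)^{-1/4}\sqrt{\frac{T}{C}}, \\
f(\eta^*) &= 2\sqrt{CT\,\sqrt{\frac{\log N}{\pi}}} = \frac{2}{\pi^{1/4}}\,(\log N)^{1/4}\sqrt{CT} \approx 1.502\,(\log N)^{1/4}\sqrt{CT} \le 1.51\,(\log N)^{1/4}\sqrt{CT}.
\end{aligned}
$$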

### 6.4. Proof of Theorem 3.1

From Eqn. (15), we know that $q_{\text{elastic}}$ is linear (and hence, concave) in the cache-configuration vector $\mathbf{y}$. Moreover, since the pointwise minimum of linear functions is concave (bertsekas, ), it follows from Eqn. (16) that the reward function $q_{\text{inelastic}}$ is also concave in the cache-configuration vector $\mathbf{y}$. To obtain a regret upper bound for the OGA algorithm (17), we appeal to Theorem 2 of (paschos2019learning, ), which states that with an appropriate choice of the step-size parameter $\eta$,

$$R^{\mathrm{OGA}}_T \le \mathrm{diam}(\mathcal{Y}^J)\,L\,\sqrt{T}, \tag{28}$$

where $\mathrm{diam}(\mathcal{Y}^J)$ denotes the Euclidean diameter (rudin1964principles, ) of the set $\mathcal{Y}^J$ defined in (14), and $L$ is an upper bound for the $\ell_2$-norm of the (super)gradient of the reward function.
For bounding the diameter, consider any two vectors $\mathbf{y}$ and $\mathbf{z}$ from the set $\mathcal{Y}^J$. We have

 ||y−z||