I Introduction
The prevalence of ubiquitously connected smart devices and the Internet of Things is driving the development of intelligent applications, turning data and information into actions that create new capabilities, richer experiences, and unprecedented opportunities. As these applications become increasingly powerful, they are also becoming more computation-demanding, making it difficult for resource-constrained mobile devices to fully realize their functionalities on their own. Although mobile cloud computing [1] provides mobile users with convenient access to a centralized pool of configurable and powerful computing resources, it is not a “one-size-fits-all” solution due to the stringent latency requirements of emerging applications and often unpredictable network conditions. In addition, as mobile applications (e.g., mobile gaming and virtual/augmented reality) become more data-hungry, it would be laborious to transmit all these data over today’s already congested backbone network to the remote cloud.
As a remedy, Mobile Edge Computing (MEC) [2] has recently been proposed as a new computing paradigm that enables service provisioning at the network edge, in close proximity to user devices, thereby allowing analytics and knowledge generation to occur closer to the data source and providing low-latency responses. Such an edge service provisioning scenario is no longer a mere vision but is becoming a reality. For example, Vapor IO’s Kinetic Edge [3] places edge data centers at the base of cell towers and nearby aggregation hubs, thereby bringing cloud-like services to the edge of the wireless network. Kinetic Edge has started in Chicago and is rapidly expanding to other US cities. It is anticipated that cloud providers, web-scale companies, and other enterprises will soon be able to rent computation resources at these shared edge computing platforms to deliver edge applications in a flexible and economical way without building their own data centers or trenching their own fiber.
However, delivering edge services effectively and efficiently in such a shared edge system raises several special issues. Firstly, the benefit of deploying an application service at a certain edge server mainly depends on the number of edge task requests received from users, yet this service demand is unknown to the Application Service Provider (ASP) before it deploys applications at edge servers. To complicate matters, the service demand is uncertain in both the temporal and spatial domains, i.e., the demand pattern of an edge site varies over time, and the demand patterns at geographically distributed edge sites may not replicate a global demand pattern. Learning the service demand pattern of each edge site precisely with a cold start (i.e., no prior knowledge available) is the very first step toward efficient edge service provisioning. Secondly, to deploy services at the edge sites, the ASP needs to rent a certain amount of computation resource to host its applications. While renting a sufficient amount of computation resource at every possible edge site can deliver the best Quality of Service (QoS), it is practically infeasible, especially for small and starting ASPs, due to the prohibitive budget requirement. In common business practice, an ASP has a budget for operating expenses and desires the best performance within that budget [4]. This means that the ASP can only rent limited computation resource at a limited number of edge sites; hence, at which edge sites to deploy applications and how much computation resource to rent at these sites must be judiciously decided to optimize QoS under the limited budget. Thirdly, service demand estimation and edge resource rental are not two independent problems but are closely intertwined during online decision making. On the one hand, renting computing resource and deploying the application service at an edge site allows the ASP to collect historical data on the received service demand for better demand estimation. On the other hand, accurate demand estimations help the ASP optimize its computation resource rental and achieve a higher QoS. Therefore, an appropriate balance must be struck between these two purposes to maximize the utility of the ASP in the long run.
The main contributions of this paper are summarized as follows:
1) We formulate an edge resource rental (ERR) problem in which the ASP rents computation resource at edge servers to host its applications for edge service provisioning. ERR is a three-fold problem in which the ASP needs to (i) estimate the service demand received by edge sites with a cold start, (ii) decide whether edge service should be provided at a certain edge site, and (iii) optimize how much resource to rent at edge sites to maximize the ASP’s utility under a limited budget.
2) We propose an online decision-making algorithm called Context-aware Online Edge Resource Rental (COERR) to solve the ERR problem. COERR is designed in the framework of Contextual Combinatorial Multi-armed Bandit (CCMAB). The “contextual” nature of COERR allows the ASP to observe the side information (context) of edge sites for service demand estimation, and the “combinatorial” nature of COERR enables the ASP to rent computation resource at multiple edge servers for utility maximization.
3) We analytically bound the performance loss, termed regret, of COERR compared to an Oracle benchmark that knows the expected service demand of each edge site. The regret bound is first given in a general form that holds for arbitrary estimators and algorithm parameters. A specific sublinear regret upper bound is then derived in a concrete setting by specifying the applied service demand estimators and algorithm parameters, which not only implies that COERR produces asymptotically optimal rental decisions but also provides a finite-time performance guarantee.
4) We carry out extensive simulations using the real-world service demand trace from the Grid Workloads Archive (GWA) [5]. The results show that the proposed COERR algorithm significantly outperforms other benchmark algorithms.
The rest of this paper is organized as follows. Section II reviews related work. Section III presents the system model for the edge resource rental problem. Section IV designs the Context-aware Online Edge Resource Rental (COERR) algorithm and provides analytical performance guarantees. Section V discusses the extension of COERR when applied with approximate solutions for per-slot utility maximization. Section VI shows the experimental results of the proposed algorithm on a real-world service demand trace, followed by the conclusion in Section VII.
II Related Work
Driven by the promising properties and tempting business opportunities, Mobile Edge Computing (MEC) [2, 6] is attracting more and more attention from both academia and industry. Various works have studied different aspects of MEC, including edge platform design [7] for integrating edge computing platforms into edge facilities (e.g., Radio Access Network [7]), computation offloading [8] for deciding what/when/how to offload tasks from users’ mobile devices to edge servers, and edge orchestration [9] for coordinating the distributed edge servers. However, these edge computing topics all rest on the assumption that the computing resources and capabilities have already been provisioned to the ASP at the edge sites. By contrast, this paper focuses on the problem of how an ASP should rent computation resource and place edge applications among many possible edge sites so that users can better enjoy edge service access.
Service placement in shared edge systems has been studied in many contexts in the past. Considering content delivery as a service, many prior works study caching content replicas in traditional content delivery networks (CDNs) [10] and, more recently, in wireless caching systems such as small cell networks [11]. However, content caching concerns only the caching of data content under storage constraints at edge facilities, while placing edge applications needs to take into account the computation resources at the edge servers. Service placement for MEC is studied in [12], where the authors consider a hierarchical edge-cloud system and design an online replacement policy to minimize the cost of forwarding requests to the cloud and downloading new services to the edge server. While [12] uses a placing-upon-request model, our work is proactive: the ASP deploys applications at the beginning of each decision cycle based on service demand estimation. The authors in [13, 14] investigate service placement/caching to improve the efficiency of edge resource utilization by enabling cooperation among edge servers. However, these works assume that the service demand is known a priori, whereas the service demand pattern in our problem has to be learned over time. A learning-based edge service placement is proposed in [15], which uses bandit learning similar to the framework in this paper. However, it only addresses the problem of where the service should be placed and does not optimize how much computation resource to rent at edge servers, whereas in practice an ASP has to decide the amount of computing resource to rent when placing the service application. This paper helps the ASP determine the amount of computing resource to rent at edge servers when placing the edge service. In addition, [15] uses a simple sample mean to estimate the service demand, whereas we generalize our algorithm to work with an arbitrary estimator.
MAB algorithms have been studied to address the tradeoff between exploration and exploitation in sequential decision making under uncertainty [16]. Classic MAB algorithms, e.g., UCB1, are concerned with learning the single optimal action among a set of candidate actions with unknown rewards by sequentially trying one action each time and observing its realized noisy reward [17]. Combinatorial bandits extend the basic MAB by allowing multiple plays each time (i.e., renting computation resources at multiple edge servers under a budget in our problem) [18], and contextual bandits extend the basic MAB by considering context-dependent reward functions [19, 20]. While both combinatorial bandit and contextual bandit problems are already much more difficult than the basic MAB problem, this paper tackles the even more difficult CCMAB problem. Recently, a few other works [21, 22, 23] have also started to study CCMAB problems. However, these works make assumptions that are not suitable for our problem. The work in [21] assumes that the reward of an individual action is a linear function of the contexts, which is less likely to be true in practice. In [22], the exact solution to the per-slot problem can be easily derived; in our problem, however, the per-slot problem is a Knapsack problem with conflict graph (KCG) whose optimal solution cannot be efficiently derived, and hence we also investigate the impact of approximate solutions on the CCMAB framework. Though [23] also considers approximate solutions, its analysis is given for a special case (a greedy algorithm for submodular function maximization). The key difference of our CCMAB is that it does not simply decide which arms to pull (i.e., at which edge sites to place applications); it also chooses the configuration of the arms (i.e., how much resource to rent at each edge site).
III System Model
III-A Network Structure and Resource Rental
We consider a typical scenario where edge computing is enabled in a heterogeneous small-cell network as illustrated in Fig.1. The small-cell network consists of a set of Small Cells (SCs), indexed by , and a macro base station (MBS), indexed by . Each SC has a small-cell base station (SBS) equipped with a shared edge computing platform. The edge platforms use virtualization technology for flexible allocation of computation resource, e.g., CPU frequency and RAM. ASPs sign contracts with SBSs to rent computation resource at co-located edge servers in order to host their applications and provide service access to subscribed users. SBSs provide Software-as-a-Service (SaaS) to ASPs, managing the computation resources requested by ASPs using virtualization, while the ASPs maintain their own user data, serving as middlemen between end users and SaaS providers. As such, SBSs charge ASPs for the amount of requested computation resource. Besides the SBSs and edge servers, there also exists an MBS that provides ubiquitous radio coverage and is connected to the cloud server in case edge service is not available for users.
The contract of edge resource rental is signed for a fixed length of time (e.g., 3 hours or half a day). Therefore, we discretize the operational timeline into time slots. At the beginning of each time slot, the ASP determines the amount of computation resource to rent from SBSs for application service deployment. In particular, we consider the processor capacity (i.e., CPU frequency) as the key component of computing resource since it determines the processing delay of tasks at edge servers, as considered in most existing works [24, 25]. The other resource components, e.g., RAM, storage, and I/O, are matched to the rented processor capacity. Let denote the processor capacity rented by the ASP at SBS in time slot , where is the minimal rental contract (i.e., the ASP has to rent at least to set up a virtualized computing platform at SBS ) and is the maximum computation resource that can be rented by the ASP at SBS . With state-of-the-art virtualization technologies, resource allocation at an edge server is often realized using containers or virtual machines (VMs), and hence we assume that each SBS discretizes its computation resource into containers or VMs. In this case, the feasible rental decisions at SBS can be collected in a rental decision set . Let
be the computation resource rented by the ASP at all SBSs. The vector
is referred to as the ASP’s rental decision in .
Each SBS sets a price for its computation resource. Let denote the price charged by SBS if the ASP rents processor capacity at SBS , where is a non-decreasing mapping function that determines the price for computation resource (the price mapping could be a non-decreasing linear or nonlinear function or a table, and each SBS may have its own mapping). Due to the limited budget, the resource rental decision of the ASP must satisfy the budget constraint , where is the ASP’s budget. Note that , , , and may possibly vary across time slots due to certain auction strategies or stochastic resource scheduling policies carried out by SBSs. To keep the system model simple, we assume that these parameters are constants. However, the proposed method is also compatible with time-varying system parameters. In addition, we consider the edge resource rental problem for one ASP in this paper; to coordinate multiple ASPs, the edge system may need additional strategies, e.g., first-come-first-served or matching algorithms [26].
Besides the edge servers, the ASP also possesses an enterprise cloud or a configured platform at a commercial cloud to provide ubiquitous application service. The processor capacity of the cloud service is denoted by . Usually, we will have .
III-B Service Delay for Edge Computing
During a time slot, users in the edge system have computation tasks to be offloaded to edge/cloud servers for processing. We assume that the input data size of one task is in bits and the number of required CPU cycles to process one task is . If a user device is covered by an SBS, it can offload computational tasks to the edge server co-located with the SBS. The service delay incurred in completing a task using edge computing consists of two main parts: transmission delay and processing delay.
III-B1 Transmission delay
Users’ tasks are sent to SBSs via a one-hop wireless connection. Since the time scale of an edge resource rental cycle (e.g., half a day) is much larger than that of a task offloading cycle (a few seconds), an SBS may receive a large number of tasks, indexed by , in time slot . For each task , the uplink transmission rate can be calculated by the Shannon capacity:
(1) 
where is the allocated bandwidth, is the transmission power of the user, is the channel gain, and are the inter-/intra-cell interferences, and is the noise power. It is difficult to know the exact data rate for transmitting each task during the planning stage due to unpredictable interference, fading, etc. Instead of considering the transmission rate of each task, we let SBSs operate at an expected transmission rate in each time slot , i.e., we expect equals . Such a requirement on the expected transmission rate can be satisfied by state-of-the-art spectrum allocation methods [27]. We denote by the expected uplink transmission rate of SBS in time slot . Then, the expected transmission delay for a user to transmit one task to SBS is . To keep the system model simple, we assume that the data size of the task result is small and therefore its downloading time can be neglected. However, adding the result downloading time does not make a big difference, and our algorithm can still be applied.
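As a rough illustration of the rate and delay computation described above, the following Python sketch evaluates a Shannon-type uplink rate and the resulting per-task transmission delay. All function and parameter names are assumptions introduced here, and lumping the inter- and intra-cell interference into a single term added to the noise power is the textbook form rather than a detail confirmed by the text.

```python
import math


def shannon_rate(bandwidth_hz, tx_power, channel_gain, interference, noise_power):
    """Uplink rate in bits/s: W * log2(1 + P*H / (I + N)).

    `interference` is the sum of inter- and intra-cell interference (assumed form).
    """
    sinr = tx_power * channel_gain / (interference + noise_power)
    return bandwidth_hz * math.log2(1.0 + sinr)


def transmission_delay(task_bits, expected_rate_bps):
    """Expected time to upload one task of `task_bits` bits at the expected rate."""
    return task_bits / expected_rate_bps


# Hypothetical numbers: 1 MHz of bandwidth and an SINR of 1 give 1 Mbit/s,
# so a 2 Mbit task takes 2 seconds to upload.
rate = shannon_rate(1e6, tx_power=1.0, channel_gain=1.0, interference=0.0, noise_power=1.0)
delay = transmission_delay(2e6, rate)
```

In the planning model only the expected rate per SBS and time slot matters, so `transmission_delay` would be called once per SBS rather than once per task.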
III-B2 Processing delay
The processing delay of edge computing is determined by the processor capacities rented by the ASP at SBSs. We assume that the edge server admits at most tasks in a time slot to avoid overloading and queuing delays at edge servers. Given the processor capacity , the processing delay for one task at SBS can be obtained as: . Therefore, the service delay for one task at SBS is:
(2) 
III-C Service Delay for Cloud Computing
If a user has no accessible SBS or its task request is rejected by an edge server due to overloading, it has to offload its tasks to the cloud via the MBS. Similarly, the service delay for cloud computing also consists of transmission delay and processing delay.
III-C1 Transmission delay
Besides the wireless transmission delay incurred by sending the tasks from users to the MBS, the offloaded tasks have to travel through the congested backbone Internet to reach the remote cloud server, which incurs a large backbone transmission delay. We assume, similar to SBSs, that the MBS applies state-of-the-art channel/power/interference management strategies to guarantee an expected wireless transmission rate in time slot . Therefore, the expected wireless transmission delay for one task is
. The backbone transmission delay is mainly determined by the backbone transmission rate, which is a random variable that depends on the network condition. Let
be the expected backbone transmission rate and be the round-trip time in time slot ; then the expected backbone transmission delay for one task can be obtained as . Taking into account all the above components, the expected transmission delay for one task using cloud computing can be obtained by: .
III-C2 Processing delay
Since the cloud server has unlimited computation resources, we assume that the cloud server has no admission constraints. Recall that the processor capacity allocated for each task at the ASP cloud is ; the processing delay for one task using cloud computing can therefore be expressed as .
The expected service delay for one task using cloud computing is therefore:
(3) 
We assume that the maximum service delay for one task is bounded, i.e., . This is a practical assumption in edge computing since if the service delay of edge/cloud computing is too large the mobile devices can always choose to process the tasks locally, which guarantees a service delay .
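The two delay models can be compared side by side with a small Python sketch. The parameter names are hypothetical, and the backbone term is written as data size over backbone rate plus a round-trip time, which is an assumption about how the two quantities mentioned in the text combine.

```python
def edge_service_delay(task_bits, task_cycles, sbs_rate_bps, rented_capacity_hz):
    """Edge delay for one task: wireless upload plus processing at the rented capacity."""
    return task_bits / sbs_rate_bps + task_cycles / rented_capacity_hz


def cloud_service_delay(task_bits, task_cycles, mbs_rate_bps,
                        backbone_rate_bps, round_trip_s, cloud_capacity_hz):
    """Cloud delay for one task: wireless upload to the MBS, backbone transfer, processing."""
    wireless = task_bits / mbs_rate_bps
    backbone = task_bits / backbone_rate_bps + round_trip_s  # assumed form
    processing = task_cycles / cloud_capacity_hz
    return wireless + backbone + processing


# Hypothetical numbers: a 1 Mbit task that needs 1e9 CPU cycles.
d_edge = edge_service_delay(1e6, 1e9, sbs_rate_bps=1e7, rented_capacity_hz=2e9)
d_cloud = cloud_service_delay(1e6, 1e9, mbs_rate_bps=5e6, backbone_rate_bps=2e6,
                              round_trip_s=0.1, cloud_capacity_hz=1e10)
delay_reduction = d_cloud - d_edge
```

With these illustrative values the cloud's faster processor does not compensate for the backbone transfer, so the edge yields a positive delay reduction; with a small rented capacity the sign could flip.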
III-D ASP Utility Function
The applications deployed at the network edge improve QoS for users by providing low-latency responses. The ASP derives utility from the improved QoS, which is defined as the delay reduction achieved by deploying services at edge servers. Let
(4) 
be the delay reduction of a task processed by SBS instead of Cloud and let be the service demand within the coverage of SBS . Note that does not equal the service demand received by SBS since task requests will be offloaded to the cloud server if the ASP rents no computation resource at SBS . Therefore, the total utility achieved by SBS is:
(5) 
where is the maximum service demand that can be processed by an SBS, depending on the amount of rented computing resource . Intuitively, more tasks can be processed at an SBS when more computing resource is rented. Therefore, the function should be non-decreasing in . Since the service delay for a task is bounded by , we have . The total ASP utility is
(6) 
where collects the service demands within the coverage of all SBSs.
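The utility described above can be sketched in a few lines of Python. The sketch assumes that the number of tasks served at an SBS is the smaller of the demand in its coverage and the admission limit implied by the rented capacity, and that an SBS with no rented resource contributes zero utility; the helper names and the callable arguments are hypothetical.

```python
def sbs_utility(capacity, demand, cloud_delay, edge_delay_fn, max_demand_fn):
    """Utility at one SBS: per-task delay reduction times tasks served at the edge."""
    if capacity <= 0:
        return 0.0  # nothing rented: all tasks go to the cloud, no delay reduction
    delay_reduction = cloud_delay - edge_delay_fn(capacity)
    served = min(demand, max_demand_fn(capacity))  # assumed admission model
    return delay_reduction * served


def total_utility(rental, demands, cloud_delay, edge_delay_fn, max_demand_fn):
    """Sum of per-SBS utilities for a rental vector and a demand vector."""
    return sum(
        sbs_utility(c, d, cloud_delay, edge_delay_fn, max_demand_fn)
        for c, d in zip(rental, demands)
    )


# Hypothetical models: edge delay 1/c seconds, at most 10*c tasks admitted.
u = total_utility(
    rental=(2, 0), demands=(30, 5), cloud_delay=2.0,
    edge_delay_fn=lambda c: 1.0 / c, max_demand_fn=lambda c: 10 * c,
)
```

Here the first SBS serves 20 of its 30 requested tasks with a 1.5 s reduction each, and the second SBS, where nothing is rented, adds nothing.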
III-E Problem Formulation
The edge resource rental (ERR) problem for the ASP is a sequential decision-making problem. The goal of the ASP is to make rental decisions that maximize the expected utility up to time horizon . Since the service demand of SBSs is not known to the ASP when it makes its rental decision, we write it as , which needs to be estimated at the beginning of each time slot. Therefore, the edge resource rental problem can be written as:
(7a)  
s.t.  (7b)  
(7c)  
(7d) 
Several challenges must be addressed simultaneously to solve the ERR problem: (i) One of the key challenges of ERR is to make precise service demand estimations, such that the derived rental decision is able to produce the expected utility when implemented. Since the algorithm runs with a cold start, it must also collect the historical data needed for making estimations. Note that the service demand received by an SBS is revealed to the ASP only when the application is deployed () at the SBS. Though the service demand received by the cloud server can also be observed, it does not help much in learning the service demand of a specific SBS because the location information of users is usually hidden from the ASP due to privacy concerns. Therefore, rental decision making should take into account the data collection for demand estimation. (ii) Given the service demand estimations, how to optimally determine the rental decision at each SBS under the limited budget should be carefully considered. (iii) Since the rental decisions are made based on the estimated service demand, the accuracy of demand estimation has a decisive impact on the ASP’s utility. The ASP needs to decide when the estimation is accurate enough to guide the computation resource rental and when more data should be collected to produce a better demand estimation. In the next section, we propose an algorithm based on the multi-armed bandit framework to address these issues at the same time.
IV Edge Resource Rental as Contextual Combinatorial Multi-Armed Bandits
In this section, we formulate our ERR problem as a Contextual Combinatorial Multi-Armed Bandit (CCMAB). The problem is “combinatorial” because the ASP rents computation resource at multiple SBSs under a budget constraint. The problem is “contextual” because we utilize the context associated with SBSs to infer their service demand. In general, the contextual bandit is more applicable than non-contextual variants, as it is rare that no context is available [28]. In our problem, the service demand received by an SBS depends on many factors, which are collectively referred to as context. For example, the relevant factors can be user factors (e.g., user population, user type), temporal factors (e.g., time of day, season), and external environment factors (e.g., events such as concerts). This categorization is clearly not exhaustive, and the impact of each single context dimension on the service demand is unknown. Our algorithm learns to discover the underlying connection between context and service demand pattern over time.
In CCMAB, the ASP observes the context of SBSs at the beginning of each time slot before making the rental decision. Let be the context of SBS observed in time slot , where is the context space. Without loss of generality, we assume that the context space is bounded and hence can be denoted as , where is the number of context dimensions. The contexts of all SBSs are collected in . The service demand received by SBS is a random variable parameterized by the context . Let be the mapping that maps a context to SBS ’s service demand distribution . We rewrite the service demand vector in a context-aware form: . In addition, we let be the expected value of the service demand distribution . The vector collects the expected service demands for all SBSs given context .
IV-A Oracle Solution and Regret
Before proceeding with the algorithm design, we first give an Oracle benchmark solution to the ERR problem by assuming that the ASP knows exactly the context-aware service demand . In such a case, the ERR problem can be decoupled into independent subproblems, one for each time slot , as below:
(8a)  
s.t.  (8b)  
(8c)  
(8d) 
where the service demand estimation is replaced by
. The above subproblem is a combinatorial optimization problem with Knapsack constraints. The optimal solution to each subproblem can be derived by brute-force search if the size of the action space is moderate. For larger problems, the ASP may use commercial optimizers, e.g., LINDO [29] or CPLEX [30], to obtain optimal solutions. For coherence, we skip the details of solving the subproblems here and denote the optimal Oracle solution for each subproblem in time slot as . The collection is the Oracle solution to the ERR problem. Later, in Section V, both exact and approximate solutions to the optimization problem in (8) will be discussed using the framework of the Knapsack problem with Conflict Graphs (KCG). In addition, the impact of the approximation error on the performance of the proposed algorithm will be analyzed.
However, in practice, the ASP does not have a priori knowledge of the users’ service demand, and therefore the ASP has to make rental decisions based on the service demand estimation in each time slot. An online decision-making policy designs certain strategies to choose a rental decision based on the estimation . The performance of the designed policy is measured by its utility loss, termed regret, compared to the utility achieved by the Oracle solution. The expected regret of a policy is defined by:
(9) 
Here, the expectation is taken with respect to the decisions made by the decision-making policy and the service demand distribution over contexts.
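In simulation, the regret is usually tracked as a running sum of per-slot utility gaps. The Python sketch below computes this empirical counterpart for a single run; the function name and the example numbers are hypothetical, and a single run only approximates the expectation in the definition.

```python
from itertools import accumulate


def regret_curve(oracle_utilities, policy_utilities):
    """Cumulative utility gap between the Oracle and the policy after each time slot."""
    gaps = [o - p for o, p in zip(oracle_utilities, policy_utilities)]
    return list(accumulate(gaps))


# Hypothetical per-slot utilities over three time slots.
curve = regret_curve([5, 5, 5], [3, 5, 4])
```

A sublinear regret shows up in such a curve as a time-averaged gap, `curve[t] / (t + 1)`, that shrinks toward zero as the horizon grows.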
IV-B Context-aware Online Edge Resource Rental Algorithm
We are now ready to present our online decision-making algorithm, called Context-aware Online Edge Resource Rental (COERR). The COERR algorithm is designed in the framework of CCMAB. In each time slot , the ASP operates sequentially as follows: (i) The ASP observes the contexts of all SBSs . (ii) The ASP determines its rental decision based on the context information observed in the current time slot and the knowledge (i.e., the connection between SBS context and service demand) learned from previous time slots. (iii) The rental decision is applied. If , the users within the coverage of SBS can offload computation tasks to SBS for edge processing. (iv) At the end of the time slot, the number of tasks received by each rented SBS (i.e., ) is observed, which is then used to update the service demand estimation for the observed context of SBS . The users who cannot access the edge service offload their tasks to the cloud server.
The context of an SBS comes from a continuous space, and hence there can be infinitely many contexts for an SBS. It would be extremely laborious, if not impossible, to collect historical demand records and learn a service demand distribution for each possible context. To make the context-aware demand estimation tractable, COERR groups similar contexts and learns the demand pattern for a group of contexts instead of learning the service demand pattern for each context . The rationale behind this strategy is the following intuition: an SBS will have similar service demand when its contexts are similar. This is a natural assumption in practice and is used in many existing MAB algorithms [22, 23] to facilitate the learning of context-aware service demand. To be specific, COERR groups contexts by partitioning the context space into small hypercubes. The context space is split into hypercubes given the time horizon , where each hypercube is dimensional with identical size . Here, is an important input parameter to be designed to guarantee algorithm performance. These hypercubes are collected in the context partition . Since the edge system is geographically distributed, different SBSs may exhibit distinct service demand patterns for the same context because of their locations (e.g., considering the time factor, an SBS located in a school zone may have higher service demand during the daytime and lower service demand at night, while an SBS located in a residential area tends to have lower service demand during the daytime and higher service demand at night). Therefore, the ASP should learn the service demand for each SBS separately.
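The grouping step can be illustrated by a small Python helper that maps a context vector to the hypercube containing it. The sketch assumes that contexts are normalized to the unit hypercube and that every dimension is cut into the same number of equal intervals; `parts_per_dim` is a name introduced here for that number.

```python
def hypercube_index(context, parts_per_dim):
    """Return the index tuple of the hypercube that contains `context`.

    Each coordinate is assumed to lie in [0, 1]; a coordinate equal to 1.0
    is assigned to the last interval so that every context has a hypercube.
    """
    index = []
    for x in context:
        if not 0.0 <= x <= 1.0:
            raise ValueError("context coordinates must lie in [0, 1]")
        index.append(min(int(x * parts_per_dim), parts_per_dim - 1))
    return tuple(index)


# A 3-dimensional context with two intervals per dimension (8 hypercubes in total).
cube = hypercube_index((0.0, 0.49, 1.0), parts_per_dim=2)
```

Because the index is a tuple, it can serve directly as a dictionary key for per-SBS counters and demand records.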
A key issue now is estimating the service demand pattern for the context hypercubes at each SBS. Note that COERR runs with a cold start, and hence it needs to collect historical service demand data for the context hypercubes by renting computation resource at SBSs and observing the received service demand in order to produce accurate demand estimations. Specifically, (i) for each SBS , the ASP keeps counters , one for each hypercube , up to time slot , indicating the number of times that the ASP has rented computation resource at SBS (i.e., ) when the context of SBS belongs to hypercube , i.e., ; (ii) the ASP keeps an experience for hypercube at each SBS up to time slot , storing the context-demand pair whenever the rental decision is taken and the context of SBS satisfies . Fig.2 illustrates an example of the context space partition and the counter/experience update.
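The bookkeeping described above amounts to two dictionaries keyed by an (SBS, hypercube) pair. The following Python sketch shows one possible layout; the class and method names are hypothetical, and hypercubes are identified by index tuples as an assumption.

```python
from collections import defaultdict


class ExperienceStore:
    """Per-(SBS, hypercube) counters and context-demand records."""

    def __init__(self):
        self.counters = defaultdict(int)      # (sbs, cube) -> number of rentals observed
        self.experiences = defaultdict(list)  # (sbs, cube) -> [(context, demand), ...]

    def record(self, sbs, cube, context, demand):
        """Call only for an SBS where resource was rented in the slot,
        since demand is observed only at rented SBSs."""
        key = (sbs, cube)
        self.counters[key] += 1
        self.experiences[key].append((context, demand))


store = ExperienceStore()
store.record(sbs=0, cube=(0, 1), context=(0.2, 0.7), demand=42)
store.record(sbs=0, cube=(0, 1), context=(0.3, 0.6), demand=40)
```

Estimators that can be updated recursively would not need the full `experiences` list and could keep only a running summary per key.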
Given the experience , the service demand estimation for SBSs with context in hypercube is obtained by an estimator :
(10) 
We do not specify the estimator used in COERR since the proposed algorithm is compatible with a variety of estimators. Note that storing all the experience may be unnecessary for certain estimators that can be updated in a recursive manner, e.g., the recursive Bayesian estimator [31] and the recursive least squares estimator [32]. Usually, a certain amount of historical data is required for an estimator to produce a sufficiently accurate estimation, which is theoretically characterized by the Probably Approximately Correct (PAC) property [33] as follows:
Assumption 1 (PAC Property).
For an arbitrary hypercube at an SBS , the estimator satisfies the Probably Approximately Correct (PAC) property below:
(11) 
where (the expectation is taken on the distribution of context in hypercube ) and .
The term is assumed to decrease as increases, i.e. , which ensures that more historical data will produce a better estimation. The PAC property is critical in guaranteeing the performance of COERR.
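As one concrete and deliberately simple example of an estimator that can be updated recursively, the Python sketch below keeps a running sample mean of the observed demand for a single (SBS, hypercube) pair. It is only an illustration: the algorithm is stated for arbitrary estimators, and the class name is hypothetical.

```python
class RunningMeanEstimator:
    """Sample-mean demand estimate updated one observation at a time."""

    def __init__(self):
        self.count = 0   # number of demand observations seen so far
        self.mean = 0.0  # current demand estimate

    def update(self, demand):
        """Fold one observed demand into the estimate without storing history."""
        self.count += 1
        self.mean += (demand - self.mean) / self.count
        return self.mean


estimator = RunningMeanEstimator()
for observed in (2, 4, 6):
    estimator.update(observed)
```

With bounded demand, the error of such an average typically shrinks as more observations arrive, which is the kind of behavior the PAC property asks of an estimator.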
It is worth emphasizing that service demand estimation, though important, is not the major challenge, since we can always acquire enough data for each hypercube to produce an accurate estimation if the time horizon is large. A more challenging issue is to decide in each time slot whether the current demand estimation is good enough to guide the edge resource rental (referred to as exploitation) or whether more service demand data should be collected to improve the demand estimation for a certain hypercube (referred to as exploration). COERR balances the exploration and exploitation phases during online decision making in order to maximize the utility of the ASP up to a finite time horizon . In addition, COERR decides the amount of computation resource to rent differently in the two phases to serve their different purposes: in exploration, COERR uses the budget to collect as much service demand data as possible to improve the estimation, while in exploitation, COERR aims to maximize the ASP’s utility under the budget constraint.
Algorithm 1 presents the pseudocode of COERR. In each time slot , the ASP first observes the contexts of all SBSs in and determines for each SBS the hypercube to which belongs, i.e., holds. The hypercubes of all SBSs are collected in . The estimated service demand for SBS in time slot is obtained by . The estimations of all SBSs are collected in . COERR is in either an exploration phase or an exploitation phase. To determine the phase for the current time slot, the algorithm checks whether the current contexts of the SBSs have been sufficiently explored. To this end, we define the set of under-explored SBSs based on the contexts observed and the counters in time slot :
(12) 
where is a deterministic, monotonically increasing control function, which is an input of COERR used to determine whether the amount of collected historical data in hypercube is large enough to produce an accurate service demand estimation for exploitation in time slot. has to be designed appropriately based on the estimator property and the parameter to balance the trade-off between exploration and exploitation (discussed later in Section IV-D).
IV-B1 Exploration
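The phase test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes contexts are normalized to the unit cube, that the partition is uniform with `parts_per_dim` intervals per dimension, that one counter is kept per (SBS, hypercube) pair, and that an SBS counts as underexplored when its counter does not exceed the control function. The particular `control_function` below is a hypothetical choice that is merely deterministic and monotonically increasing; all function and parameter names are illustrative.

```python
import math

def hypercube_index(context, parts_per_dim):
    """Map a context in [0, 1]^D to the index tuple of its hypercube (uniform partition)."""
    return tuple(min(int(x * parts_per_dim), parts_per_dim - 1) for x in context)

def control_function(t, z=0.5):
    # Hypothetical control function: deterministic and monotonically increasing in t.
    return (t ** z) * math.log(t + 1)

def underexplored_sbss(contexts, counters, t, parts_per_dim, z=0.5):
    """Return the SBSs whose current hypercube has a counter no larger than the control function."""
    threshold = control_function(t, z)
    result = []
    for sbs, ctx in contexts.items():
        cube = hypercube_index(ctx, parts_per_dim)
        # Hypercubes that were never observed have an implicit counter of zero.
        if counters.get((sbs, cube), 0) <= threshold:
            result.append(sbs)
    return result
```

If the returned list is nonempty, the slot would be treated as an exploration slot; otherwise it would be an exploitation slot.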
If the underexplored set is nonempty, i.e., , COERR enters the exploration phase. We may have two cases in exploration: (i) If , COERR can explore only a subset of SBSs in . Intuitively, we want to collect service demand data for more underexplored SBSs. Therefore, COERR rents only at SBSs such that the edge service can be deployed at more underexplored SBSs. Specifically, COERR selects underexplored SBSs sequentially as follows:
(13) 
If the SBS defined in (13) is not unique, ties are broken arbitrarily. The selection ends if the iteration satisfies and . The rental decision of ASP at SBS is where . The selection in (13) ensures that the number of underexplored SBSs with is maximized. (ii) If , COERR rents computation resource at all underexplored SBSs in . Note that there is still budget left. The remaining budget is used to rent computation resources at explored SBSs based on the current estimation . The rental decision of ASP in this case can be obtained by:
(14a)  
s.t.  (14b)  
(14c) 
Constraint (14b) ensures that the computation resource is rented at underexplored SBSs.
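The goal of case (i), covering as many underexplored SBSs as the budget allows, can be illustrated with a cheapest-first selection. This sketch does not reproduce the rule in (13); it assumes that each underexplored SBS has a known cost for its smallest rental (the hypothetical `min_cost` mapping) and picks SBSs in increasing order of that cost, which maximizes the number of SBSs covered under a budget.

```python
def select_for_exploration(min_cost, budget):
    """Greedily pick underexplored SBSs in order of increasing minimum rental cost.

    min_cost: dict mapping an SBS id to the cost of its smallest rental (assumed known).
    Returns the chosen SBS ids and the budget spent.
    """
    chosen, spent = [], 0.0
    for sbs in sorted(min_cost, key=min_cost.get):
        if spent + min_cost[sbs] > budget:
            break  # every remaining SBS costs at least as much, so stop here
        chosen.append(sbs)
        spent += min_cost[sbs]
    return chosen, spent
```

In case (ii), where all underexplored SBSs fit within the budget, the leftover `budget - spent` would be passed to the optimization over explored SBSs.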
IV-B2 Exploitation
If the set of underexplored SBSs is empty, i.e., , then COERR enters the exploitation phase in which an optimal rental decision is determined based on the current service demand . The rental decision is obtained by solving:
(15) 
IV-C Performance Analysis
Next, we give an upper performance bound of COERR in terms of the regret. The regret upper bound is derived based on the natural assumption that the service demands received by an SBS are similar when its contexts are similar. This assumption is formalized by the following Hölder condition [22, 23] for each SBS .
Assumption 2 (Hölder Condition).
For an arbitrary SBS , there exists , such that for any , it holds that , where denotes the Euclidean norm in .
Note that this assumption is needed for the analysis of regret but the proposed algorithm can still be applied if it does not hold true. In that case, however, a regret bound might not be guaranteed. We aim to design the input parameters , in the proposed algorithm to achieve a sublinear with . A sublinear regret bound guarantees that the proposed algorithm has an asymptotically optimal performance since holds. This means that the online decision made by COERR converges to the Oracle solution.
Since any time slot is either in exploration or exploitation, we divide the regret into two parts , where , are the regrets due to exploration and exploitation, respectively. These two parts will be bounded separately to get the total regret bound. We first give an upper bound for the exploration regret.
Lemma 1.
(Bound of .) Given the input parameters and , the regret is bounded by:
where and .
Proof.
Suppose time slot is an exploration phase. Then, according to the algorithm design, the set of underexplored SBSs is nonempty. Therefore, there must exist and a hypercube satisfying . Clearly, there can be at most exploration phases in which computation resources at SBS are rented by the ASP when its context satisfies .
In each of these exploration phases, let be the maximum utility loss for one task due to a wrong rental decision at SBS . Recall that the per-task delay reduction is bounded by and therefore it holds that . Let , then the service demand received by SBS must be bounded by , and the maximum utility loss at an SBS is bounded by . Let , the maximum number of SBSs with the rental decision is bounded by . Therefore, the regret incurred in one time slot is bounded by . Since there are at most exploration phases in , the regret incurred by the exploration is bounded by:
The proof is completed. ∎
Lemma 1 shows that the order of is determined by the number of hypercubes in partition and the control function .
Lemma 2.
(Bound of .) Given the input parameters and , if the Hölder condition holds true and the additional condition is satisfied with some for all , then is bounded by:
(16) 
Lemma 2 indicates that, besides the input parameters and , the regret incurred in exploitation also depends on the estimator’s PAC property . Based on the above two lemmas, we obtain the following theorem for the upper bound of .
Theorem 1.
Given the input parameters and , if the Hölder condition holds true and the additional condition is satisfied with some for all , then is bounded by:
The regret upper bound in Theorem 1 holds for any input parameters , and any applied estimator. In addition, an additional condition must be satisfied when designing the algorithm parameters . However, we cannot give a specific design of here to guarantee a sublinear regret, since it depends on the PAC property of the applied estimator. In the next subsection, we will design the inputs and based on the PAC property of a Maximum Likelihood Estimator such that they satisfy the additional condition posed in Theorem 1 and guarantee a sublinear regret . The other parameters , , do not determine the order of the regret, as will be shown later in the parameter design.
IV-D Example: Maximum Likelihood Estimator
Note that the regret depends partially on the estimator property and hence we need to specify the estimators used by SBSs before designing the algorithm parameters and . Here, we take Maximum Likelihood Estimation (MLE) as an example. The purpose of an MLE estimator is to estimate the expected service demand for hypercube . We assume that the historical service demands collected in
follow a normal distribution denoted by
, where is the standard deviation. Then, an unbiased estimation for
using MLE is: (17) 
Note that the normal distribution of historical service demand in is only used for deriving the above MLE estimator. COERR can be applied to other historical data distributions, but the unbiased MLE estimator may differ accordingly. The MLE estimator in (17) guarantees the following PAC condition based on the Chernoff-Hoeffding bound [35]:
(18) 
and it holds that . Now, we can design and to ensure a sublinear regret of COERR.
Theorem 2 (Regret upper bound).
Let and . If the proposed algorithm runs with these parameters, SBSs use MLE for estimation, and the Hölder condition holds true, then the leading order of the regret is:
The leading order of the regret upper bound given in Theorem 2 is sublinear. In addition, the regret bound is valid for any and therefore provides a bound on the performance loss for any time horizon. It can also be used to characterize the convergence speed of COERR. However, we see that the order of the regret upper bound can be close to 1 when the dimension of the context space is large. In this case, the learner may need to apply dimension reduction techniques based on empirical experience to cut down the context dimension.
Though the algorithm parameters and the regret upper bound are given based on a known time horizon , COERR can be easily extended to work with an unknown time horizon with the assistance of the doubling trick [36, 37]. The key idea of the doubling trick is to partition the time into multiple phases with doubling length (), e.g., if the length of phase is , then the length of th phase is . In each phase, COERR is run from scratch without using any information from the previous phase. A salient property of the doubling trick is that it does not change the order of the upper regret bound.
IV-E Complexity and Scalability
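The phase schedule of the doubling trick can be sketched as below. The function name and the choice of a first phase of length one are illustrative assumptions; the only property taken from the text is that each phase is twice as long as the previous one and that the learner is restarted at the beginning of each phase.

```python
def doubling_phases(total_slots, first_length=1):
    """Split slots 0..total_slots-1 into consecutive phases whose lengths double.

    Returns a list of (start, end) pairs with end exclusive. The last phase is
    truncated if the run stops before that phase is complete.
    """
    phases, start, length = [], 0, first_length
    while start < total_slots:
        end = min(start + length, total_slots)
        phases.append((start, end))
        # The learner would be re-initialized here, using the phase length as its horizon.
        start, length = end, 2 * length
    return phases
```

For instance, `doubling_phases(10)` yields the phases `(0, 1)`, `(1, 3)`, `(3, 7)`, and `(7, 10)`, the last one being cut short by the end of the run.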
The memory requirement of COERR is mainly determined by the number of counters and experiences maintained for hypercubes. Since the counter is an integer for each hypercube, its memory requirement is determined by the number of created hypercubes. The experience is a set of observed service demand records up to time slot , which requires more memory. However, storing all historical data is actually unnecessary since most estimators, including the MLE in (17), can be updated in a recursive manner. Therefore, the ASP only needs to keep the current service demand estimation for a hypercube, which is a floating point number. If COERR is run with the parameters in Theorem 2, the number of hypercubes is . Hence, the required memory is sublinear in the time horizon . This means that when , COERR would require infinite memory. Fortunately, in practical implementations, the ASP only needs to keep the counters and experiences of the hypercubes to which at least one of the observed contexts belongs. Therefore, the number of counters and experiences to keep is actually much smaller than the analytical requirement.
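The recursive update mentioned above can be sketched as follows, assuming the estimator is the sample mean of the demands observed in a hypercube (the maximum likelihood estimate of the mean of a normal distribution). The class and method names are illustrative. Only one counter and one floating point estimate are stored per hypercube, and entries are created lazily, so hypercubes that are never visited consume no memory.

```python
from collections import defaultdict

class DemandEstimator:
    """Running-mean demand estimate per hypercube, without storing past records."""

    def __init__(self):
        self.count = defaultdict(int)    # number of observations per hypercube
        self.mean = defaultdict(float)   # current demand estimate per hypercube

    def update(self, cube, demand):
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.count[cube] += 1
        n = self.count[cube]
        self.mean[cube] += (demand - self.mean[cube]) / n

    def estimate(self, cube):
        # Unvisited hypercubes fall back to 0.0 without creating an entry.
        return self.mean.get(cube, 0.0)
```

A key such as `("site1", (0, 3))` could identify a hypercube of a given SBS; the key layout is a design choice of this sketch.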
V Extension: Solutions for Subproblems
V-A Exact and Approximate Solutions for Subproblems
In this section, we discuss in detail the solutions for the optimization problems in (8), (14), and (15). Since these optimization problems have the same form, we take the Oracle subproblem (8) as an example. Since the problem is solved for each time slot , the time index is dropped in this section for ease of notation. The subproblem is a combinatorial optimization which can be formulated as a Knapsack problem [38]. The Knapsack problem is a classic combinatorial optimization: given a set of items, each with a weight and value, determine the items to include in a collection such that the total weight is less than or equal to a given limit and the total value is as large as possible. In ERR subproblems, each rental decision at an SBS is an item in the Knapsack problem: for an “item” , its “weight” is the rental cost and its “value” is the utility gain ( is the context of SBS in a certain time slot), and the limit is the ASP budget . However, the standard formulation of the Knapsack problem cannot exactly capture the ERR problem since the ASP can take only one rental decision for one SBS, which means that items associated with one SBS cannot be included at the same time. Such an extension of the standard Knapsack problem with additional conflict restrictions, stating that from a certain set of items at most one item can be selected, is known as the Knapsack problem with conflict graph (KCG). In the following, we formulate the subproblem as a KCG problem and discuss its solutions.
These conflict constraints are represented by an undirected graph .

(Vertices): each rental decision at an SBS corresponds to a vertex in the undirected graph .

(Edges): for an arbitrary pair of vertices , add an edge between and if are rental decisions for the same SBS.
The vertices/items in are indexed by and for a vertex , we define its weight as and its value as . In addition, we introduce an indicator for each vertex indicating whether item is taken () or not (). Then, the KCG for subproblem can be written as:
(19a)  
s.t.  (19b)  
(19c)  
(19d) 
KCG is a well-investigated problem. Several existing algorithms, e.g., Branch-and-Bound [39], can be directly used to derive an exact solution for the KCG problem. If an exact solution for each KCG subproblem is obtained, then COERR provides the expected performance as analyzed in the previous section. However, these exact algorithms can be computationally expensive when the number of items is large, and therefore their runtime may become a bottleneck in certain applications (though the runtime is less likely to be an issue in our ERR problem since the time scale of the considered problem is relatively large, e.g., several hours). To facilitate the solution of KCG, approximation algorithms are studied to efficiently derive approximate solutions in polynomial runtime. Next, we will discuss the performance of the proposed algorithm when approximate solutions are derived for subproblems.
V-B Performance Analysis with Approximate Solutions
We assume that the approximation algorithm guarantees a performance bound (approximation) compared to the optimal solution as defined below:
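Because two items conflict exactly when they are rental decisions for the same SBS, the conflict graph is a disjoint union of cliques, one per SBS. Under the additional assumption that rental costs and the budget are non-negative integers, such an instance can be solved exactly by a group-wise dynamic program. The sketch below is this dynamic program, not the Branch-and-Bound algorithm of [39]; the data layout (`options` as a dict of cost/value pairs per SBS) and all names are illustrative.

```python
def solve_rental(options, budget):
    """Choose at most one rental option per SBS to maximize total value within the budget.

    options: dict mapping an SBS id to a list of (cost, value) pairs, with integer costs.
    budget: non-negative integer.
    Returns (best_value, choice) where choice maps an SBS id to the chosen option index.
    """
    # best[b] = (value, choice) of the best selection with total cost at most b
    best = [(0.0, {})] * (budget + 1)
    for sbs, opts in options.items():
        new = list(best)  # default: rent nothing at this SBS
        for b in range(budget + 1):
            for idx, (cost, value) in enumerate(opts):
                if cost <= b:
                    cand = best[b - cost][0] + value
                    if cand > new[b][0]:
                        # Build on selections over earlier SBSs only, so that
                        # at most one option per SBS is ever taken.
                        new[b] = (cand, {**best[b - cost][1], sbs: idx})
        best = new
    return best[budget]
```

The running time grows with the product of the budget and the total number of options, so this approach is practical only when the budget can be expressed as a moderately sized integer.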
Definition 1 (approximation).
An approximation algorithm is a approximation if the objective value achieved by the approximate solution satisfies where is the optimal objective value achieved by an optimal solution .
Definition 1 indicates that a approximation algorithm achieves no less than of the optimum. Many existing approximation algorithms can be directly applied, e.g., Fully Polynomial Time Approximation Schemes (FPTAS) [40], to solve the KCG problems. The assumption of approximation prevents the approximate solution from being arbitrarily bad and enables the performance analysis for COERR.
Now we are ready to analyze the performance of the proposed algorithm with approximate solutions. From Theorem 2, we see that the leading order of the regret upper bound is mainly determined by the exploration regret . A sublinear upper bound of the exploration regret is derived by limiting the number of time slots in which COERR enters the exploration phase to be sublinear, with properly designed and . Since COERR is either in exploration or exploitation, a sublinear number of exploration slots indicates that the number of exploitation slots is non-sublinear. In this case, it is difficult, if not impossible, to guarantee a sublinear regret with approximate solutions even if we have perfect estimation in each exploitation slot: due to the approximation, the worst performance loss of the approximate solution with perfect estimation in one time slot is . Let be the number of exploitation slots, which is non-sublinear; the upper bound of the exploitation regret (with approximate solutions) must be larger than , which is also non-sublinear. To address this problem, we slightly change the definition of regret by defining the regret below:
(20) 
The rental decision is still the optimal Oracle solution for the subproblems in (8). The rental decision is the online decision made by the proposed algorithm with an approximation algorithm, i.e., the solutions to the optimization problem in (14) during exploration and to the optimization problem in (15) are approximated by a approximation algorithm. In (20), the online decisions derived by COERR with an approximation algorithm are compared with the lower bound of the approximated Oracle solution (i.e., the Oracle also uses a approximation algorithm to solve the subproblem in (8)). Such a definition of regret is often used in the MAB framework where the optimal solution cannot be derived in each round [23].
Theorem 3 (regret upper bound).
If the proposed algorithm is run with parameters and conditions given in Theorem 2 and a approximation is applied for optimization, then the leading order of regret is:
Theorem 3 indicates that our algorithm is able to work well even if the subproblem in each time slot can only be approximately solved and a sublinear regret can be achieved based on the performance guarantee of approximation algorithms.
VI Experiments
In this section, we carry out systematic experiments on a real-world dataset to verify the efficacy of the proposed algorithm.
VI-A Experiment Setup
We use the real-world service demand trace collected by the Grid Workloads Archive (GWA) [5]. The GWA datasets record the task requests received by large-scale multi-site infrastructures (grids) that provide computational support for e-Science. The experiment is mainly run on the GWA dataset AuverGrid, which collects around 400,000 task requests of 5 grids. To fit the AuverGrid data to our ERR context, we assume that each grid corresponds to an SBS in the edge network. In some parts of the experiments, we combine other GWA datasets with AuverGrid to increase the number of sites and show the impact of the number of SBSs on the algorithm performance. Each task request record has a “SubmitTime” (in seconds) that indicates the time of task arrival and a “RunSiteID” that indicates the site for task execution. The rental decision cycle is set to 3 hours. With this information, we are able to analyze the service demand trace at each SBS. Fig. 3(a) depicts the service demand trace of three SBSs. It can be observed that the demand patterns are different at different SBSs and hence it is necessary to learn the service demand pattern for each SBS.
The context space of the SBSs has two dimensions: “time in a day” and “daily report demand”. The context “time in a day” indicates the time when a rental decision is made, and the context “daily report demand” is the total service demand received by an SBS in the previous day, which is provided by the site daily report. Fig. 3(b) shows the expected service demand of hypercubes in the context partition of Site 1. We see that the service demand is closely related to the considered contexts. The optimization problems in (14) and (15) are transformed into KCG and solved using the Branch-and-Bound algorithm [39]. The computing resource at an edge server is discretized as Virtual Machines (VMs) and the rental decision is the number of VMs to rent at SBSs:
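The aggregation of the trace into per-site, per-cycle service demand can be sketched as below. The sketch assumes that each record has already been parsed into a dictionary with the fields “SubmitTime” and “RunSiteID”; the parsing of the GWA files, the `t0` trace origin, and the function name are illustrative assumptions and are not taken from the dataset documentation.

```python
from collections import Counter

SLOT_SECONDS = 3 * 3600  # rental decision cycle of 3 hours

def demand_per_slot(records, t0=0):
    """Count task requests per (site, slot).

    records: iterable of dicts with 'SubmitTime' (seconds) and 'RunSiteID'.
    t0: timestamp (seconds) taken as the start of slot 0.
    """
    demand = Counter()
    for r in records:
        slot = (int(r["SubmitTime"]) - t0) // SLOT_SECONDS
        demand[(r["RunSiteID"], slot)] += 1
    return demand
```

The resulting counts per site and slot correspond to the kind of demand trace that the text describes for each SBS.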