I Introduction
Mobile cloud computing (MCC) supports mobile applications on resource-constrained mobile devices by offloading computation-demanding tasks to the resource-rich remote cloud. Intelligent personal assistant applications are perhaps the most popular applications that rely on MCC: the speech recognition engine, which uses advanced machine learning technologies, resides in the cloud server. Nowadays, mobile applications such as virtual/augmented reality and mobile gaming are becoming even more data-hungry, latency-sensitive and location-aware. For example, Google Lens is a real-time image recognition application that can pull up the right information (e.g. restaurant reviews and menus) in an interactive and responsive way as the user points his/her smartphone camera at objects (e.g. a restaurant) when passing by. However, as these applications become more prevalent, ensuring high quality of service (QoS) becomes very challenging due to backbone network congestion, delay and expensive bandwidth [1].
To address these challenges, mobile edge computing (MEC) [2] has recently been proposed. The key idea of MEC is to move computation resources to the logical edge of the Internet, thereby enabling analytics and knowledge generation to occur closer to the data sources. Such an edge service provisioning scenario is no longer a mere vision, but is becoming a reality. Vapor IO [3] has launched Project Volutus [4] to deliver shared edge computing services via a network of micro data centers deployed in cellular tower sites. In a recent white paper [5], Intel envisions that its smart cell platform will allow mobile operators to sell IT real estate at the radio access network and to monetize their most valuable assets without compromising any of the network features. It is anticipated that Application Service Providers (ASPs) will soon be able to rent computation resources in such shared edge computing platforms in a flexible and economical way. Fig. 1 illustrates how the Google Lens application can leverage the shared edge computing platform to improve QoS.
While provisioning edge service in every possible edge site (i.e. base station) would deliver the best QoS, it is practically infeasible, especially for small and starting ASPs, due to the prohibitive budget requirement. In common business practice, an ASP has an operating-expense budget in mind and desires the best performance within that budget [6]. This means that an ASP will only be able to place edge services in a limited number of edge sites; hence, where to rent edge computation resources must be judiciously decided to improve QoS under the limited budget.
Deciding the optimal edge service placement policy faces significant challenges due to information uncertainty in both the spatial and temporal domains. Firstly, the benefit of edge service provisioning primarily depends on the service demand of users, which can vary considerably across sites. However, the demand pattern is usually unknown to the ASP before it deploys the edge service in a particular site, and may also vary substantially after frequent application updates. Because the demand pattern can only be observed at sites where the edge service is deployed, how to make the optimal tradeoff between exploration (i.e. placing the edge service in unknown sites to learn the user demand pattern) and exploitation (i.e. placing the edge service at high-demand sites to maximize the ASP's utility) is a key challenge. Secondly, even in the same site, the service demand varies over time depending on who is currently in the mobile cell, what their preferences are, what mobile devices they use, the time, and other environmental variables. Collectively, this information is called the context information. Incorporating this valuable information into the edge service placement decision making, in addition to the plain number of devices, is likely to improve the overall system performance, but is challenging because the context space can be huge: learning for each specific context is nearly impossible due to the limited number of context occurrences. A promising approach is to group similar contexts so that learning can be carried out at the context-group level. However, how to group contexts in a way that enables both fast and effective learning demands a careful design.
In this paper, we study the spatio-temporal edge service placement problem of an ASP under a limited budget and propose an efficient learning algorithm, called SEEN (Spatio-temporal Edge sErvice placemeNt), to optimize the edge computing performance. SEEN does not assume a priori knowledge about users' service demand. Rather, it learns the demand pattern in an online fashion by observing the realized demand at sites where the edge service is provisioned, and uses this information to make future edge service placement decisions. In particular, SEEN is location-aware, as it uses only local-area information for each base station, and context-aware, as it utilizes user context information to make edge service placement decisions.
The spatio-temporal edge service placement problem is posed as a novel Contextual Combinatorial Multi-armed Bandit (CCMAB) problem [7] (see a more detailed literature review in Section II). We analytically bound the loss due to learning, termed the regret, of SEEN compared to an oracle benchmark that knows the user demand pattern precisely a priori. A sublinear regret bound is proved, which not only implies that SEEN produces an asymptotically optimal edge service placement policy, but also provides a finite-time performance guarantee. The proposed algorithm is further extended to scenarios with overlapping service coverage. In this case, a disjunctively constrained knapsack problem is incorporated into the framework of SEEN to deal with the service demand coupling caused by the coverage overlap among cells. We prove that the sublinear regret bound still holds. To evaluate the performance of SEEN, we carry out extensive simulations on a real-world dataset of mobile application user demand [8], whose results show that SEEN significantly outperforms benchmark algorithms.
The rest of this paper is organized as follows. Section II reviews related works. Section III presents the system model and formulates the problem. Section IV designs SEEN and analyzes its performance. Section V extends SEEN to the overlapping coverage scenario. Section VI presents the simulation results, followed by the conclusion in Section VII.
II Related Work
Mobile edge computing has attracted much attention in recent years [9, 10]. Many prior studies focus on computation offloading, concerning what/when/how to offload users’ workload from their devices to the edge servers or the cloud. Various works have studied different aspects of this problem, considering e.g. stochastic task arrivals [11, 12], energy efficiency [13, 14], collaborative offloading [15, 16], etc. However, these works focus on the optimization problem after certain edge services have been provisioned at the Internet edge. By contrast, this paper focuses on how to place edge service among many possible sites in an edge system.
Service placement in edge computing has been studied in many contexts in the past. Considering content delivery as a service, many prior works study placing content replicas in traditional content delivery networks (CDNs) [17] and, more recently, in wireless caching systems such as small-cell networks [18]. Early works addressed centralized cases where the demand profile is static or time-invariant; dynamic service placement in geographically distributed clouds is studied in [19] in the presence of demand and resource dynamics. Our prior works [20, 21] investigate collaborative service placement to improve the efficiency of edge resource utilization by enabling cooperation among edge servers. However, these works assume that the service demand is known a priori, whereas the service demand pattern in our problem has to be learned over time. A learning-based content caching algorithm for a wireless caching node was recently developed in [22], which also takes a contextual bandit learning approach similar to ours. However, it considers the caching policy (i.e. which content to cache) at a single caching site, whereas we aim to determine where to place the edge service among multiple edge sites, which may maintain distinct context spaces. Importantly, we also consider the coupled decisions among multiple sites due to possible overlapping coverage, while content files in [22] are treated independently.
MAB algorithms have been widely studied to address the critical tradeoff between exploration and exploitation in sequential decision making under uncertainty [7]. The basic MAB setting concerns learning the single optimal action among a set of candidate actions with a priori unknown rewards by sequentially trying one action each time and observing its realized noisy reward [23, 24]. Combinatorial bandits extend the basic MAB by allowing multiple plays each time (i.e. choosing multiple edge sites under a budget in our problem) [25, 26, 27], and contextual bandits extend the basic MAB by considering context-dependent reward functions [28, 29, 30]. While both combinatorial bandit and contextual bandit problems are already much more difficult than the basic MAB problem, this paper tackles the even more difficult CCMAB problem. Recently, a few other works [31, 32] also started to study CCMAB problems. However, these works make strong assumptions that are not suitable for our problem. For instance, [31, 32] assume that the reward of an individual action is a linear function of the contexts. [22]
is probably the most related work, investigating contextual and combinatorial MAB for proactive caching. However, our work has many key differences from [22]. First, [22] considers CCMAB for a single learner (a caching station) and maintains a common context space for all users. By contrast, our paper considers a multi-learner case, where each learner (i.e. SBS) learns the demand pattern of the users within its service range. More importantly, we allow each SBS to maintain a distinct location-specific context space and to collect different context information from connected users according to the users' privacy preferences. Second, while [22] considers a bandit learning problem for a fixed set of content items, our algorithm can deal with an infinitely large user set. Third, we further consider an edge network with overlapping coverage and address the decision coupling among edge sites that the overlap causes.

III System Model
III-A Edge System and Edge Service Provisioning
We consider a heterogeneous network consisting of $N$ small cells (SCs), indexed by $n \in \mathcal{N} = \{1, 2, \dots, N\}$, and a macro base station (MBS). Each SC has a small-cell base station (SBS) equipped with a shared edge computing platform and thus is an edge site that can host edge services for ASPs. The MBS provides ubiquitous radio coverage and access to the cloud server in case edge computing is not accessible. SBSs (edge sites) provide Software-as-a-Service (SaaS) to ASPs, managing computation/storage resources (e.g. CPU, scheduling, etc.) to ensure end-to-end QoS, while the ASP maintains its own user data, serving as a middleman between end users and SaaS providers. As such, SBSs charge the ASP for the amount of time the edge service is rented. Fig. 2 gives an illustration of the considered scenario.
Specifically, computation and storage resource allocation in SBSs can be realized by containerization techniques [33], e.g., Docker and Kubernetes [34]. The key advantage of containerization over virtual machine technology is that it incurs much lower system overhead and much shorter launch time. For example, each SBS can set up a Docker Registry to store Docker images (i.e. packages that encapsulate the running environment of an application) locally. When the SBS is chosen to host the ASP's application, it pulls up the Docker image for the corresponding application and configures the container in seconds [35]. Without loss of generality, this paper focuses on the service placement for one application. Due to the limited budget, the ASP can only rent up to $B$ SBSs, where we assume for simplicity that all SBSs charge the same price per unit time.
Variable | Description
$\mathcal{N}$ | the set of $N$ SBSs
$B$ | the budget of the ASP
$S^t$ | the set of SBSs selected in slot $t$
$S^{t,*}$ | the oracle solution in slot $t$
$\mathcal{U}^t$ | the user population in slot $t$
$\mathcal{U}_n^t$ | the users covered by SBS $n$
$d_u^t$ | the service demand of user $u$ in slot $t$
$s$ | the input data size of one task
$c$ | the required CPU cycles for one task
$\tau_{u,n}^{\mathrm{edge}}$ | the delay of completing one task for user $u$ at SBS $n$
$\Delta\tau_{u,n}$ | the delay reduction of one task
$\mathcal{X}_n$ | the context space maintained by SBS $n$
$D_n$ | the dimension of the context space monitored by SBS $n$
$\mathcal{P}_T^n$ | the partition created on context space $\mathcal{X}_n$
$x_{u,n}^t$ | user $u$'s context observed by SBS $n$, $x_{u,n}^t \in \mathcal{X}_n$
$\bm{x}^t$ | the contexts of all users in slot $t$
$\mu_n(x)$ | the expected service demand of a user with context $x$ at SBS $n$
$\hat{\mu}_n^t(C)$ | the demand estimate for users with context in hypercube $C$
The operational timeline is discretized into time slots. In each time slot $t$, the ASP chooses a set of SBSs $S^t \subseteq \mathcal{N}$, where $|S^t| \le B$, for application deployment. This decision is referred to as the (edge) service placement decision in the rest of this paper. Let $\mathcal{U}^t$ be the user population served by the entire network in time slot $t$ and let $\mathcal{U}_n^t$ be the user population covered by SBS $n$. The user population in the considered network can vary across time slots because of user mobility. Users can also move within a time slot, but we assume for simplicity that the user-SBS association remains the same within a time slot. We consider that the service placement decisions are made on the scale of minutes, so that frequent reconfiguration of edge services is avoided while the temporal variation of the user population is largely captured.
We first consider the case where the service areas of the SBSs are non-overlapping, and then consider the case with overlapping service areas in Section V. In the non-overlapping case, if SBS $n$ is chosen by the ASP to host the application in time slot $t$, i.e. $n \in S^t$, then each user $u$ in its coverage can offload data to SBS $n$ for edge computing. Otherwise, the users in SBS $n$'s coverage have to offload data to the cloud (via the MBS) for cloud computing.
III-B ASP Utility Model
The ASP derives utility by deploying edge computing services. On the one hand, the ASP obtains a larger utility if the edge computing service is deployed in areas where the service demand is larger, as more users can enjoy a higher QoS. Let $d_u^t$ (in terms of the number of tasks) be the service demand of user $u$ in time slot $t$, which is unknown a priori at the edge service placement decision time; the service demands of all users in the network are collected in $\bm{d}^t = (d_u^t)_{u \in \mathcal{U}^t}$. On the other hand, the ASP derives a larger utility if the edge computing service is deployed in areas where edge computing performs much better than cloud computing. In this paper, we use delay as the performance metric of edge/cloud computing. Since we focus on a single application, we assume that all tasks have the same input data size $s$ (in bits) and required CPU cycles $c$.
III-B1 Delay by Edge Computing
If the task of user $u$ is processed by SBS $n$ at the edge, then the delay consists of the wireless transmission delay and the edge computation delay. The achievable wireless uplink transmission rate between user $u$ and SBS $n$ can be calculated according to the Shannon capacity: $r_{u,n}^t = W \log_2\big(1 + \frac{P_u h_{u,n}^t}{\sigma^2 + I}\big)$, where $W$ is the channel bandwidth, $P_u$ is the transmission power of user $u$'s device, $h_{u,n}^t$ is the uplink channel gain between user $u$ and SBS $n$ in time slot $t$, $\sigma^2$ is the noise power and $I$ is the interference. Therefore, the transmission delay of user $u$ for sending a task (i.e. $s$ bits of input data) to SBS $n$ is $s / r_{u,n}^t$. We assume that the data size of the task result is small; hence the downlink transmission delay is neglected. The computation delay depends on the computation workload and the edge server's CPU frequency. To simplify our analysis, we assume that the edge server of SBS $n$ processes tasks at its maximum CPU speed $f_n$. Therefore, the computation delay for one task is $c / f_n$. Overall, the delay of processing one of user $u$'s tasks at the edge site of SBS $n$ is $\tau_{u,n}^{\mathrm{edge}} = s / r_{u,n}^t + c / f_n$.
III-B2 Delay by Cloud Computing
If the task of user $u$ is processed in the cloud, then the delay consists of the wireless transmission delay, the backbone Internet transmission delay and the cloud computation delay. The wireless transmission delay $s / r_{u,0}^t$ can be computed as in the edge computing case by first calculating the transmission rate $r_{u,0}^t$ between user $u$ and the MBS. The cloud computation delay $c / f^{\mathrm{cloud}}$ can also be calculated similarly using the cloud server's CPU frequency $f^{\mathrm{cloud}}$. However, compared to edge computing, cloud computing incurs an additional transmission delay since the data has to travel across the backbone Internet. Let $r^{\mathrm{bb}}$ be the backbone transmission rate and $\mathrm{RTT}^t$ be the round trip time in time slot $t$; then an additional transmission delay $s / r^{\mathrm{bb}} + \mathrm{RTT}^t$ is incurred. Overall, the delay of processing one of user $u$'s tasks in the cloud is $\tau_u^{\mathrm{cloud}} = s / r_{u,0}^t + s / r^{\mathrm{bb}} + \mathrm{RTT}^t + c / f^{\mathrm{cloud}}$.
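The per-task benefit of edge computing is simply the cloud delay minus the edge delay. The following sketch computes the two delay models above; all function names and numeric parameter values are our own illustrative assumptions, not values from the paper.

```python
import math

def uplink_rate(W, P, h, noise, interference):
    """Shannon-capacity uplink rate (bits/s): r = W * log2(1 + P*h / (sigma^2 + I))."""
    return W * math.log2(1 + P * h / (noise + interference))

def edge_delay(s, c, rate_sbs, f_edge):
    """Delay of one task at an SBS: wireless transmission plus edge computation."""
    return s / rate_sbs + c / f_edge

def cloud_delay(s, c, rate_mbs, r_backbone, rtt, f_cloud):
    """Delay of one task in the cloud: wireless transmission, backbone transmission,
    round-trip time, and cloud computation."""
    return s / rate_mbs + s / r_backbone + rtt + c / f_cloud
```

With equal wireless rates and equal CPU speeds, the cloud path is always slower by exactly the backbone delay plus the round-trip time, which is the delay reduction the ASP monetizes.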
Taking into account the service demand and the possible delay reduction provided by edge computing, the utility of the ASP under service placement decision $S^t$ in time slot $t$ is:

$$U(S^t, \bm{d}^t) = \sum_{n \in S^t} \sum_{u \in \mathcal{U}_n^t} d_u^t \, \Delta\tau_{u,n}, \qquad (1)$$
where $\Delta\tau_{u,n} = \tau_u^{\mathrm{cloud}} - \tau_{u,n}^{\mathrm{edge}}$ is the delay reduction for user $u$ if it is served by SBS $n$. The above utility function assumes that the tasks from a user are independent, i.e., the utility of a task is realized immediately upon receipt of its own result and does not wait until all tasks of the user are completed. Therefore, the ASP is concerned with the service delay of each individual task rather than the delay of completing all tasks of a user in time slot $t$. Similar utility functions are widely adopted in the existing literature [36]. The utility is essentially a weighted service popularity, where the weight is the delay reduced by deploying edge services instead of cloud computing. Clearly, other weights, such as task/user priority, can also be easily incorporated into our framework.
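Concretely, the utility in (1) is a demand-weighted sum of per-task delay reductions over the rented sites. A minimal sketch (the nested-dictionary data layout and names are our own, purely illustrative):

```python
def asp_utility(placement, demand, delay_reduction):
    """Utility (1): sum over rented SBSs and their covered users of
    (number of tasks) * (per-task delay reduction).

    placement: iterable of rented SBS ids
    demand[n][u]: tasks of user u covered by SBS n
    delay_reduction[n][u]: cloud delay minus edge delay for user u at SBS n
    """
    return sum(demand[n][u] * delay_reduction[n][u]
               for n in placement
               for u in demand[n])
```

Renting a site contributes nothing unless its users have both demand and a positive delay reduction, which is exactly why high-demand, well-connected sites are the valuable ones.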
Remarks on delay model: We use simple models to capture the service delay incurred by task transmission and processing. Note that other communication models (e.g., Massive MIMO) and computing models (e.g., queuing system) can also be applied depending on the practical system configuration. In these cases, the delay reduction should be recalculated accordingly. The algorithm proposed in this paper is compatible with other delay models as long as tasks’ delay reduction can be obtained.
III-C Context-Aware Edge Service Provisioning
A user's service demand depends on many factors, which are collectively referred to as the context. For example, relevant factors can be demographic factors (e.g., age and gender; young people are more interested in game apps, as shown in [8]), equipment type (e.g. smartphone, tablet), equipment status (e.g., battery level; a device with a low battery level tends to offload computation tasks to edge servers [14]), as well as external environment factors (e.g., location, time, and events). This categorization is clearly not exhaustive, and the impact of each single context dimension on the service demand is unknown a priori. This context information helps the ASP understand the demand pattern of connected users and provide the edge service efficiently. Our algorithm learns to discover the underlying connection between such context and users' service demand pattern (see an example of such a connection in Figure 5(c), based on a real-world dataset, in Section VI), which will be discussed in detail in Subsection III-D, thereby facilitating the service placement decision making.
At each SBS, a context monitor periodically gathers context information by accessing information about currently connected users and, optionally, by collecting additional information from external sources (e.g. social media platforms). However, collecting user context sometimes faces a concern known as privacy disclosure management [37], which decides when, where, and what personal information can be revealed. The central notion behind privacy disclosure management is that people disclose different versions of personal information to different entities under different conditions [37]. Therefore, the service area of an SBS (e.g. business building, apartment complex, or plaza) may influence users' privacy preferences [38, 39] and hence determine what context information an SBS can access. To capture this feature, we allow each SBS to maintain its own user context space depending on its local users' privacy preferences. This results in heterogeneity of the context spaces maintained by SBSs. Note that the context spaces of different SBSs may be completely different, partially overlapping or exactly the same; our model captures the most general case, and all SBSs having the same context space is a special case. Formally, let $D_n$ be the number of context dimensions monitored by SBS $n$ for its connected users. The monitored context space of SBS $n$ is denoted by $\mathcal{X}_n$, which is assumed to be bounded and hence can be written as $\mathcal{X}_n = [0, 1]^{D_n}$ without loss of generality. Let $x_{u,n}^t \in \mathcal{X}_n$ be the context vector of user $u$ monitored by SBS $n$ in time slot $t$. The context vectors of all users connected to SBS $n$ are collected in $\bm{x}_n^t = (x_{u,n}^t)_{u \in \mathcal{U}_n^t}$.

III-D Problem Formulation
Now, we formulate the edge service placement problem as a CCMAB learning problem. In each time slot $t$, the edge system operates sequentially as follows: (i) each SBS $n$ monitors the contexts of all connected users and collects the context information in $\bm{x}_n^t$. (ii) The ASP chooses a set of SBSs $S^t$ with $|S^t| \le B$, based on the context information collected by all SBSs in the current time slot and the knowledge learned in previous time slots. (iii) The users are informed about the current service placement decision $S^t$. Until the end of the current time slot, users connected to SBSs in $S^t$ can request the edge computing service from these SBSs. (iv) At the end of the current slot, the service demand $d_u^t$ of each user $u$ served by an SBS $n \in S^t$ is observed.
The service demand of a user with context $x \in \mathcal{X}_n$ is a random variable with an unknown distribution; we denote its expected value by $\mu_n(x)$. The random service demand is assumed to take values in $[0, d^{\max}]$, where $d^{\max}$ is the maximum possible number of tasks a user can have in one time slot. The service demands are assumed to be independent, i.e., the service demands of the users served by an SBS are independent of each other. Moreover, each demand realization is assumed to be independent of past service placement decisions and previously observed demands. The goal of the ASP is to rent at most $B$ SBSs for edge service hosting in each time slot in order to maximize the expected utility up to a finite time horizon $T$. Based on the system utility defined in (1), the edge service placement problem can be formally written as:

(P1)  $\displaystyle\max_{\{S^t\}_{t=1}^T} \ \sum_{t=1}^T \mathbb{E}\big[U(S^t, \bm{d}^t)\big]$  (2a)

s.t.  $|S^t| \le B, \quad \forall t \in \{1, \dots, T\}$.  (2b)
III-E Oracle Benchmark Solution
Before presenting our bandit learning algorithm, we first give an oracle benchmark solution to P1 by assuming that the ASP had a priori knowledge of the context-specific service demand, i.e., for an arbitrary user with context vector $x$, the ASP would know the expected demand $\mu_n(x)$. It is obvious that P1 can be decoupled into $T$ independent subproblems, one for each time slot $t$:

(P2)  $\displaystyle\max_{S^t \subseteq \mathcal{N}} \ \sum_{n \in S^t} \sum_{u \in \mathcal{U}_n^t} \mu_n(x_{u,n}^t) \, \Delta\tau_{u,n}$  (3a)

s.t.  $|S^t| \le B$.  (3b)
The optimal solution to the subproblem P2 in time slot $t$ can be easily derived in a running time of $O(N \log N)$ as follows. Given the contexts of the connected users in time slot $t$, define the expected utility of renting SBS $n$ as $U_n^t = \sum_{u \in \mathcal{U}_n^t} \mu_n(x_{u,n}^t) \, \Delta\tau_{u,n}$. The optimal solution is to select the $B$ highest-ranked SBSs (top-$B$ SBSs), i.e., a set $S^{t,*}$ with $|S^{t,*}| = B$ which satisfies:

$$U_n^t \ge U_m^t, \quad \forall n \in S^{t,*}, \ \forall m \in \mathcal{N} \setminus S^{t,*}. \qquad (4)$$

We denote by $S^{t,*}$ the optimal oracle solution in time slot $t$. Consequently, the collection $\{S^{t,*}\}_{t=1}^T$ is the optimal oracle solution to P1.
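The oracle rule amounts to scoring each SBS by its expected per-site utility and keeping the best $B$. A sketch of this top-$B$ selection, assuming the expected demands were known (data layout and names are illustrative):

```python
import heapq

def oracle_placement(expected_demand, delay_reduction, budget):
    """Top-B oracle: rank SBSs by expected per-SBS utility, keep the best B.

    expected_demand[n][u]: expected demand mu_n(x_u) of user u covered by SBS n
    delay_reduction[n][u]: per-task delay reduction of user u at SBS n
    """
    per_sbs_utility = {
        n: sum(expected_demand[n][u] * delay_reduction[n][u]
               for u in expected_demand[n])
        for n in expected_demand
    }
    # Ranking the N sites dominates the cost: O(N log N) per time slot.
    return set(heapq.nlargest(budget, per_sbs_utility, key=per_sbs_utility.get))
```

Because the per-slot objective is additive across sites, greedily keeping the $B$ largest scores is exactly optimal here; no combinatorial search is needed in the non-overlapping case.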
However, in practice, the ASP does not have a priori knowledge of the service demand. In this case, the ASP cannot simply solve P1 as described above, since the expected service demands are unknown. Hence, the ASP has to learn the expected service demands over time by observing the users' contexts and realized demands. For this purpose, the ASP has to make a tradeoff between deploying edge services at SBSs about which little information is available (exploration) and at SBSs which it believes to yield the highest demands (exploitation). In each time slot, the ASP's service placement decision depends on the history of past choices and observed user contexts. An algorithm which maps the decision history to the current service placement decision is called a learning algorithm. The oracle solution $\{S^{t,*}\}_{t=1}^T$ is used as a benchmark to evaluate the loss due to learning. The regret of learning with respect to the oracle solution is given by

$$R(T) = \mathbb{E}\left[\sum_{t=1}^T \big(U(S^{t,*}, \bm{d}^t) - U(S^t, \bm{d}^t)\big)\right]. \qquad (5)$$
Here, the expectation is taken with respect to the decisions made by the learning algorithm and the distributions of users’ service demand.
IV CCMAB for Edge Service Placement
In order to place edge services at the most beneficial SBSs given the context information of currently connected users, the ASP should learn the context-specific service demand of the connected users. According to the above formulation, this problem is a contextual combinatorial MAB (CCMAB) problem, and we propose an algorithm called SEEN (Spatio-temporal Edge sErvice placemeNt) for learning the context-specific service demand and solving P1.
IV-A Algorithm Structure
Our SEEN algorithm is based on the assumption that users with similar context information covered by the same SBS will have similar service demand. This is a natural assumption in practice, which can be exploited together with the users' context information to guide future service provisioning decisions. Our algorithm starts by partitioning the context space maintained by each SBS uniformly into small hypercubes, i.e. splitting the entire context space into parts of similar contexts. Then, an SBS learns the service demand independently in each hypercube of similar contexts. Based on the observed context information of all connected users and a certain control function, the algorithm is interspersed with exploration phases and exploitation phases. In the exploration phases, the ASP chooses a random set of SBSs for edge service placement. These phases are needed to learn the local users' service demand patterns at SBSs which have not been chosen often before. Otherwise, the algorithm is in an exploitation phase, in which it chooses the SBSs which, on average, gave the highest utility when rented in previous time slots with similar user contexts. After choosing the new set of SBSs, the algorithm observes the users' true service demand at the end of the time slot. In this way, the algorithm learns the context-specific service demand over time. The design challenge lies in how to partition the context space and how to determine when to explore/exploit.
The pseudocode of SEEN is presented in Algorithm 1. In the initialization phase, SEEN creates a partition $\mathcal{P}_T^n$ for each SBS $n$ given the time horizon $T$, which splits the context space $\mathcal{X}_n$ into $(h_T)^{D_n}$ sets given by $D_n$-dimensional hypercubes of identical side length $1 / h_T$. Here, $h_T$ is an input parameter which determines the number of hypercubes in the partition. Additionally, SBS $n$ keeps a counter $M_n^t(C)$ for each hypercube $C \in \mathcal{P}_T^n$ indicating the number of times that a user with context from hypercube $C$ was connected to SBS $n$ while it was rented to host the edge service, up to time slot $t$. Moreover, SEEN keeps an estimated demand $\hat{\mu}_n^t(C)$ for each hypercube $C$. Let $\mathcal{D}_n^t(C)$ be the set of observed service demands of users with context from set $C$ up to time slot $t$. Then, the estimated demand of users with context from set $C$ is given by the sample mean:

$$\hat{\mu}_n^t(C) = \frac{1}{|\mathcal{D}_n^t(C)|} \sum_{d \in \mathcal{D}_n^t(C)} d, \qquad (6)$$

where $|\mathcal{D}_n^t(C)|$ equals $M_n^t(C)$. Notice that the set $\mathcal{D}_n^t(C)$ does not need to be stored, since the estimated demand can be updated incrementally based on $\hat{\mu}_n^{t-1}(C)$, $M_n^{t-1}(C)$ and the demands observed in time slot $t$.
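This incremental update is a standard running mean, so each hypercube only needs its estimate and its counter, never the full observation set. A sketch with illustrative names:

```python
def update_estimate(mu_hat, counter, new_demands):
    """Fold newly observed demands into the per-hypercube sample mean.

    mu_hat: current estimate for the hypercube
    counter: number of demands observed so far for the hypercube
    new_demands: demands observed in the current time slot
    """
    for d in new_demands:
        counter += 1
        mu_hat += (d - mu_hat) / counter  # running-mean update
    return mu_hat, counter
```

After processing any sequence of demands this yields exactly the sample mean of (6), while storing only two numbers per hypercube.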
In each time slot $t$, each SBS $n$ first observes the currently connected users $\mathcal{U}_n^t$ and their contexts $\bm{x}_n^t$. For each context vector $x_{u,n}^t$, SEEN determines the hypercube $C_{u,n}^t \in \mathcal{P}_T^n$ to which $x_{u,n}^t$ belongs, i.e., $x_{u,n}^t \in C_{u,n}^t$ holds. The collection of these hypercubes is $\mathcal{C}_n^t = \{C_{u,n}^t\}_{u \in \mathcal{U}_n^t}$ for each SBS $n$, and $\mathcal{C}^t = \{\mathcal{C}_n^t\}_{n \in \mathcal{N}}$ for the whole network. Fig. 3 offers a simple illustration of the context hypercubes and the update of counters for a 2D context space, assuming three users are currently connected to SBS $n$.
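With the uniform partition of $[0,1]^{D_n}$ into $h_T$ cells per dimension, locating a user's hypercube is a simple floor operation. A sketch (the integer-tuple index is our own encoding of a hypercube):

```python
def hypercube_index(context, h_T):
    """Map a context vector in [0,1]^D to the index tuple of its hypercube
    in the uniform partition with h_T cells (side length 1/h_T) per dimension."""
    # min(..., h_T - 1) keeps the boundary value 1.0 inside the last cell.
    return tuple(min(int(x * h_T), h_T - 1) for x in context)
```

Using such tuples as dictionary keys lets an SBS store counters and estimates only for hypercubes that actually receive users, which matters for the memory discussion in Section IV-C.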
Then the algorithm is in either an exploration phase or an exploitation phase. To determine the correct phase in the current time slot, the algorithm checks if there are SBSs that have not been explored sufficiently often. For this purpose, the set of under-explored SBSs $\mathcal{N}_{\mathrm{ue}}^t$ is obtained in each time slot as follows:

$$\mathcal{N}_{\mathrm{ue}}^t = \big\{ n \in \mathcal{N} : \exists C \in \mathcal{C}_n^t \ \text{such that} \ M_n^t(C) \le K(t) \big\}, \qquad (7)$$
where $K(t)$ is a deterministic, monotonically increasing control function, which is an input to the algorithm and has to be set appropriately to balance the tradeoff between exploration and exploitation. In the next subsection, we design a control function that guarantees a good balance of this tradeoff.
If the set of under-explored SBSs is non-empty, SEEN enters the exploration phase. Let $N_{\mathrm{ue}}^t = |\mathcal{N}_{\mathrm{ue}}^t|$ be the number of under-explored SBSs. If the set of under-explored SBSs contains at least $B$ elements, i.e. $N_{\mathrm{ue}}^t \ge B$, SEEN randomly rents $B$ SBSs from $\mathcal{N}_{\mathrm{ue}}^t$. If the number of under-explored SBSs is less than $B$, i.e. $N_{\mathrm{ue}}^t < B$, it selects all $N_{\mathrm{ue}}^t$ SBSs from $\mathcal{N}_{\mathrm{ue}}^t$, and $B - N_{\mathrm{ue}}^t$ additional SBSs are selected. These additional SBSs are those with the highest estimated utilities $\hat{U}_n^t = \sum_{u \in \mathcal{U}_n^t} \hat{\mu}_n^t(C_{u,n}^t) \, \Delta\tau_{u,n}$ among the sufficiently explored SBSs:

$$A^t \in \underset{A \subseteq \mathcal{N} \setminus \mathcal{N}_{\mathrm{ue}}^t,\ |A| = B - N_{\mathrm{ue}}^t}{\arg\max} \ \sum_{n \in A} \hat{U}_n^t, \qquad (8)$$

and $S^t = \mathcal{N}_{\mathrm{ue}}^t \cup A^t$. If the set of SBSs defined by (8) is not unique, ties are broken arbitrarily. If the set of under-explored SBSs is empty, the algorithm enters the exploitation phase, in which it selects the $B$ SBSs with the highest estimated utilities:

$$S^t \in \underset{S \subseteq \mathcal{N},\ |S| = B}{\arg\max} \ \sum_{n \in S} \hat{U}_n^t. \qquad (9)$$
Finally, each chosen SBS observes the service demand received from its users at the end of time slot $t$ and then updates the estimated service demand and the counter for each hypercube.
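Putting the phases together, one decision round of SEEN can be sketched as follows. This is a simplified sketch under our own data layout; tie-breaking, the control function, and the counter/estimate bookkeeping are inputs rather than reproduced from Algorithm 1.

```python
import random

def seen_one_slot(t, sbs_ids, budget, counters, est_utility, contexts, K):
    """One decision round of SEEN: explore under-explored SBSs first,
    otherwise exploit the highest estimated utilities.

    counters[n][cube]: times SBS n observed a user in that hypercube while rented
    est_utility[n]: current estimated utility of renting SBS n
    contexts[n]: hypercubes of the users currently connected to SBS n
    K: control function K(t)
    """
    under_explored = [n for n in sbs_ids
                      if any(counters[n].get(cube, 0) <= K(t)
                             for cube in contexts[n])]
    if len(under_explored) >= budget:
        return set(random.sample(under_explored, budget))   # pure exploration
    chosen = set(under_explored)                            # partial exploration
    rest = sorted((n for n in sbs_ids if n not in chosen),
                  key=est_utility.get, reverse=True)
    chosen.update(rest[:budget - len(chosen)])              # fill by exploitation
    return chosen
```

Because $K(t)$ grows with $t$, every SBS that keeps seeing users in a hypercube is revisited infinitely often, which is what drives the estimates in (6) toward the true expected demands.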
IV-B Analysis of the Regret
Next, we give an upper bound on the performance loss of the proposed algorithm in terms of the regret. The regret bound is derived based on the natural assumption that the expected service demands of users are similar in similar contexts. Because users' service demand preferences differ based on their contexts, it is plausible for an SBS to divide its user population into groups with similar contexts and similar preferences. This assumption is formalized by the following Hölder condition for each SBS.
Assumption 1 (Hölder Condition).
For an arbitrary SBS $n$, there exist $L > 0$ and $\alpha > 0$ such that for any $x, y \in \mathcal{X}_n$, it holds that

$$|\mu_n(x) - \mu_n(y)| \le L \, \lVert x - y \rVert^{\alpha}, \qquad (10)$$

where $\lVert \cdot \rVert$ denotes the Euclidean norm in $\mathbb{R}^{D_n}$.
We note that this assumption is needed for the analysis of the regret, but SEEN can still be applied if it does not hold true; in that case, however, a regret bound might not be guaranteed. Under Assumption 1, the following theorem shows that the regret of SEEN is sublinear in the time horizon $T$, i.e. $R(T) = O(T^{\gamma})$ with $\gamma < 1$. This regret bound guarantees that SEEN has asymptotically optimal performance, since $\lim_{T \to \infty} R(T)/T = 0$ holds. This means that SEEN converges to the optimal edge service placement strategy used by the oracle solution. Specifically, the regret of SEEN can be bounded as follows for any finite time horizon $T$.
Theorem 1 (Bound for $R(T)$).
Let $K(t) = t^{\frac{2\alpha}{3\alpha + D}} \log t$ and $h_T = \lceil T^{\frac{1}{3\alpha + D}} \rceil$. If SEEN is run with these parameters and Assumption 1 holds true, the leading order of the regret is $O\big(T^{\frac{2\alpha + D}{3\alpha + D}} \log T\big)$, where $D = \max_{n \in \mathcal{N}} D_n$.
Theorem 1 indicates that the regret bound achieved by the proposed SEEN algorithm is sublinear in the time horizon $T$. Moreover, the bound is valid for any finite time horizon, thereby providing a bound on the performance loss for any finite number of service placement decision cycles; this can be used to characterize the convergence speed of the proposed algorithm. In the special case of $B = 1$ and $\alpha = 1$, the considered CCMAB problem reduces to the standard contextual MAB problem. In this case, the order of the regret is $O\big(T^{\frac{2 + D}{3 + D}} \log T\big)$. We note that the regret bound, although still sublinear in $T$, is loose when the budget $B$ is close to $N$. In the special case of $B = N$, SEEN is identical to the naive optimal service placement policy (i.e. choose all SBSs to deploy the edge service) and hence the actual regret is 0. Intuitively, when the budget is large, learning is not much needed; the more challenging regime is when the budget is small (but not 1).
IV-C Complexity and Scalability
The memory requirement of SEEN is mainly determined by the counters and estimated context-specific demands kept by the SBSs. Each SBS $n$ keeps a counter and an estimated demand for each hypercube in the partition $\mathcal{P}_T^n$. If SEEN is run with the parameters in Theorem 1, the number of hypercubes is $(h_T)^{D_n}$. Hence, the required memory is sublinear in the time horizon $T$; however, this means that as $T \to \infty$, the algorithm would require infinite memory. Fortunately, in practical implementations, an SBS only needs to keep the counters of those hypercubes to which at least one of its connected users' context vectors has belonged. Hence the number of counters that have to be kept is actually much smaller than the analytical requirement.
SEEN can easily be implemented in a large network without incurring a large overhead, since each SBS keeps counters and estimated user demands independently for its maintained context space. At the beginning of each time slot, the ASP queries the SBSs about their status (explored or under-explored) and estimated utilities, and then chooses SBSs according to SEEN. Therefore, the number of SBSs does not complicate the algorithm much.
V Edge Service Placement for SBSs with Coverage Overlapping
So far we have considered the edge service placement problem for a set of non-overlapping SBSs. However, SBSs may be densely deployed in areas with heavy mobile traffic and computation demand, which creates overlapping SBS coverage. In this case, a user can possibly be served by multiple SBSs, and therefore whether a user's service demand can be processed at the Internet edge is determined by the service availability at all reachable SBSs. This creates a spatial coupling of service demand among overlapped SBSs, i.e., users observed by one SBS may send service requests to other nearby SBSs. Therefore, it is difficult for the ASP to optimize the service placement policy by considering the service availability at each SBS separately. In this section, we propose SEEN-O, an algorithm that extends SEEN to small-cell networks with coverage overlapping.
V-A SBS Components and Component-wise Service Provisioning
We start by introducing SBS components and component-wise decisions. SEEN-O first constructs an undirected graph based on the small-cell network. Each SBS corresponds to a vertex in . For each pair of vertices , an edge is added between them if and only if the service areas of the two SBSs overlap. Based on the constructed graph , we define a component as follows:
Definition 1 (Component).
A component of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices.
By the definition of a component, a set of overlapped SBSs corresponds to a component in graph . Let collect all components in graph . For an arbitrary component , we define a set of component-wise decisions , which collects all possible service placement decisions for the SBSs in component . The component-wise decision set can also be written as , where the total number of decisions in is given by the Bell number . For an arbitrary component , if a component-wise decision is taken, then the ASP rents the SBSs from the set of overlapping SBSs in . Notice that the non-overlapping SBS network is a special case: for a component containing only one SBS (i.e., a non-overlapping SBS), its component-wise decision set contains only one element . Let be the component-wise decision sets for the whole network. Fig. 4 provides a simple illustration of SBS components and component-wise decisions.
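The graph construction and component extraction above can be sketched as follows. This is an illustrative sketch assuming circular coverage disks of a common radius (so two SBSs overlap iff their centers are closer than twice the radius); any other pairwise overlap test could be substituted.

```python
def overlap_components(sbs_positions, radius):
    """Group SBSs into components of the overlap graph.

    `sbs_positions` is a list of (x, y) coordinates; two SBSs are
    connected by an edge iff their coverage disks of radius `radius`
    intersect. Returns a list of components (sorted SBS index lists).
    """
    n = len(sbs_positions)
    # Build the undirected overlap graph as adjacency lists.
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = sbs_positions[i], sbs_positions[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 < (2 * radius) ** 2:
                adj[i].append(j)
                adj[j].append(i)
    # Extract connected components with a depth-first sweep.
    seen, components = set(), []
    for s in range(n):
        if s in seen:
            continue
        comp, stack = [], [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(comp))
    return components
```

An isolated SBS yields a singleton component, matching the special case noted above.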
Instead of picking service placement decisions for individual SBSs separately, SEEN-O chooses component-wise decisions for components, because the service demand received by an SBS is jointly decided by the service availability at the SBSs in the same component. Let denote the users collaboratively served by the SBSs in component . A user can request edge services from multiple SBSs in the component, depending on the chosen component-wise decision . Let be the uplink channel gain between user and SBS ; if SBS is not reachable for user , then . Since users' devices are usually energy-constrained, we assume that the service demand of user is offloaded to the SBS that has the best uplink channel condition among those that can provide the edge service, namely . In this way, users incur the least transmission energy consumption (our algorithm is also compatible with other user–SBS association strategies). The association decision of user can be formally written as:
(11) 
Note that the uplink channel conditions can easily be monitored by the users, and we assume that the users report the monitored channel conditions to all reachable SBSs. Therefore, the association decisions of user are known to the SBSs given the component-wise decision . Let be the users connected to SBS ; we have:
(12) 
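The best-channel association rule in (11) and the resulting connected-user sets in (12) can be sketched as follows. The function signature and data layout are illustrative assumptions; `channel_gain[u][s]` is 0 when SBS s is unreachable for user u, as in the text.

```python
def associate(users, channel_gain, rented):
    """Assign each user to the reachable rented SBS with the best
    uplink channel gain, and return the set of users connected to
    each rented SBS.

    `channel_gain[u][s]` gives the uplink gain between user u and
    SBS s (0 if unreachable); `rented` lists the SBSs for which the
    edge service is currently placed.
    """
    connected = {s: [] for s in rented}
    for u in users:
        # Pick the rented, reachable SBS with the largest gain.
        best, best_gain = None, 0.0
        for s in rented:
            if channel_gain[u][s] > best_gain:
                best, best_gain = s, channel_gain[u][s]
        if best is not None:   # otherwise the demand goes to the cloud
            connected[best].append(u)
    return connected
```

A user with no reachable rented SBS is simply not associated, in which case its demand is served by the remote cloud.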
In addition, for each SBS we define , where is the delay improvement of user as defined in (1). Let be the component-wise decisions chosen by the ASP. Notice that the ASP can draw at most one component-wise decision for each component . The edge service placement problem is then:
(13a)  
s.t.  (13b)  
(13c) 
where (13b) is the budget constraint for the ASP and (13c) indicates that only one componentwise decision can be selected for each component.
V-B Disjunctively Constrained Knapsack Problem
Now, we consider an oracle solution for P3. As before, P3 can be decoupled into per-slot subproblems. Yet, the solution to each subproblem cannot be derived as easily as in (4), due to the different costs incurred by different component-wise decisions and, more importantly, the conflicts among component-wise decisions in (13c). The per-slot subproblem of P3 can be formulated as a Knapsack problem with Conflict Graph (KCG), also referred to as the disjunctively constrained knapsack problem. The conflict graph is defined based on the component-wise decisions: each component-wise decision corresponds to a vertex in . For an arbitrary pair of vertices , an edge is added between and if there exists a component-wise decision set such that .
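The conflict-graph construction just described can be sketched as follows: every pair of decisions drawn from the same component's decision set conflicts, because at most one decision per component may be chosen. The mapping from component ids to decision ids is an illustrative assumption.

```python
from itertools import combinations

def conflict_edges(componentwise_decisions):
    """Build the conflict edges among component-wise decisions.

    `componentwise_decisions` maps a component id to the list of its
    decision ids; two decisions conflict iff they belong to the same
    component's decision set. Returns the edge set of the conflict
    graph as (smaller_id, larger_id) pairs.
    """
    edges = set()
    for decisions in componentwise_decisions.values():
        for a, b in combinations(decisions, 2):
            edges.add((min(a, b), max(a, b)))
    return edges
```

A singleton decision set (e.g. a non-overlapping SBS) contributes no conflict edges.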
In the following, we convert P3 to a standard formulation of the KCG problem. For each , we define a tuple , where is the profit of choosing component-wise decision , is the cost of decision (which equals ), and indicates whether the decision is taken or not. Then, a KCG problem equivalent to P3 can be written as:
(14a)  
s.t.  (14b)  
(14c)  
(14d) 
The above problem is an NP-hard combinatorial optimization problem. Existing works have proposed various algorithms, including heuristic solutions [41] and exact solutions [42], to solve KCG. In the simulation we employ the branch-and-bound algorithm [43] to solve P3-KCG.
V-C Algorithm Structure
Now, we present SEEN-O in Algorithm 2 for edge service placement with coverage overlapping. Like SEEN, SEEN-O has two phases: exploration and exploitation. We first obtain the set of under-explored SBSs as in (7), based on the users in their coverage (note that is the set of users within the coverage of SBS , which is different from , the users served by SBS depending on the component-wise decisions). If the set of under-explored SBSs is non-empty, namely , then SEEN-O enters the exploration phase. Let be the number of under-explored SBSs. If the set of under-explored SBSs contains at least elements, , SEEN-O randomly rents SBSs from . If the number of under-explored SBSs is less than , i.e. , SEEN-O first selects the SBSs in , and the remaining SBSs are selected by solving a KCG problem based on the following component-wise decision sets:
(15)  
(16)  
(17) 
 is the set of one-element component-wise decisions . The decisions in need to be removed from since they have already been chosen by the ASP; collects the component-wise decisions for that do not contain the under-explored SBSs in . These decisions are also removed, since the component-wise decision for must contain all the under-explored SBSs in . Then, the ASP solves a KCG problem with the constructed component-wise decision set , decision , decision profit , and the modified budget . If the set of under-explored SBSs is empty, the algorithm enters the exploitation phase: it solves P3-KCG based on the current context-specific demand estimates, with all component-wise decisions and budget .
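The two-phase control flow described above can be summarized in a short sketch. This is a high-level illustration, not the paper's Algorithm 2: `solve_kcg` and `explore_pick` are assumed interfaces standing in for the KCG solver of Section V-B and the random exploration choice.

```python
def seen_o_slot(underexplored, budget, solve_kcg, explore_pick):
    """One decision cycle of SEEN-O (high-level sketch).

    `underexplored` is the set of under-explored SBSs, `budget` the
    number of SBSs the ASP can rent; `solve_kcg(b, exclude)` returns
    the SBSs chosen by solving the (possibly reduced) KCG with budget
    b, and `explore_pick(s, k)` picks k SBSs from s at random.
    """
    underexplored = list(underexplored)
    if underexplored:
        if len(underexplored) >= budget:
            # Pure exploration: rent `budget` under-explored SBSs.
            return explore_pick(underexplored, budget)
        # Rent every under-explored SBS, then fill the remaining
        # budget by solving a KCG over the reduced decision sets.
        rest = solve_kcg(budget - len(underexplored), exclude=underexplored)
        return underexplored + rest
    # Exploitation: solve P3-KCG with the current demand estimates.
    return solve_kcg(budget, exclude=[])
```

The sketch makes explicit that exploitation is entered only when no SBS remains under-explored.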
At the end of each time slot, the SBSs observe the service demand received from their connected users. Then, each SBS updates the estimated service demand and the counters for each context hypercube. Notice that in the overlapping case, a user can be covered by multiple SBSs, and therefore an observed service demand can be used to update the estimated service demand at multiple SBSs. For example, if a user is in the coverage of both SBS and SBS , namely , then the observed service demand of this user can be used to update the context-specific service demand estimation at both SBS and SBS . This also means that SEEN-O can learn the rewards of multiple component-wise decisions in one time slot: e.g., if component-wise decision is taken, the utilities of component-wise decisions can be updated at the same time. Theorem 2 shows that SEEN-O has the same regret bound as SEEN.
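The shared update described in this paragraph can be sketched as follows. The data layout is an illustrative assumption: `stats[s]` maps a context hypercube to a (counter, demand sum) pair for SBS s.

```python
def update_overlapped(covering_sbs, cube, demand, stats):
    """Propagate one user's observed demand to every covering SBS.

    In the overlapping case a single observation updates the
    context-specific estimate at each SBS whose coverage contains
    the user; `cube` is the user's context hypercube and `stats[s]`
    keeps (counter, demand_sum) per hypercube for SBS s.
    """
    for s in covering_sbs:
        count, total = stats[s].get(cube, (0, 0.0))
        stats[s][cube] = (count + 1, total + demand)
```

Because each covering SBS is updated from the same observation, SEEN-O gathers more samples per slot than SEEN would on the same trace.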
Theorem 2 (Regret Bound for SEEN-O).
SEEN-O has the same regret bound as SEEN.
The regret upper bound for SEEN-O in Theorem 2 is valid for any edge network layout and does not require any assumption on the SBS deployment or the user population distribution. This facilitates deploying SEEN-O in practical applications since, in most cases, the SBS deployment is known to the ASP while the user distribution is unknown a priori.
VI Simulation
In this section, we carry out simulations on a real-world dataset to evaluate the performance of the proposed algorithms. We use the data collected in [8], which aims to reveal the underlying link between the demand for mobile applications and the user context, including age, gender, occupation, years of education, device type (e.g. phone, tablet, and laptop), and nationality. It collects the context information of a total of 10,208 end users and the users' demand for 23 types of mobile applications. We envision that these mobile applications can be deployed on edge servers at SBSs via containerization, and the UEs can send computing tasks to the SBSs for processing. In our simulation, we consider that the ASP aims to provide edge service for the Game-type application (the most popular of the 23 mobile applications investigated in [8]), which is also a major use case of edge computing. Fig. 5(a) and Fig. 5(b) depict the user distribution, and Fig. 5(c) depicts the context-specific service demand estimation on the two context dimensions Age and Years of education. We see clearly that the users' demand pattern is strongly related to the users' context information. Note that the Age and Years of education information is obtained from the dataset [8] and is used only as an example to illustrate the context–demand relationship. In practice, users may be willing to disclose such information in enterprise or campus internal networks; for more general scenarios, SBSs can use other, less sensitive context such as user device information.
For the small-cell network, we simulate a 1000 m × 1000 m area served by SBSs and one MBS. The SBSs are randomly scattered in this area. An SBS can serve users within a service range of 150 m, which tends to create coverage overlapping among SBSs. For the analysis of non-overlapping SBSs, we assume that users request edge service only from the nearest SBS; in the overlapping case, a user is allowed to decide its association based on the service availability and channel conditions of the reachable SBSs. To capture different compositions of the user population across SBSs, we randomly assign one of three area types (school zone, business area, and public) to each SBS: users with the student occupation context tend to show up in school zones with a higher probability, users with the full-time worker context tend to show up in business areas, and all types of users show up in public areas with the same probability. The default ASP budget is set as . Other key simulation parameters are: channel bandwidth MHz, transmission power of user equipment dBm, noise power W/Hz, CPU frequency at SBSs GHz, CPU frequency at the cloud GHz, Internet backhaul transmission rate Mbps, round-trip time ms.
The proposed algorithm is compared with the following benchmarks:
(1) Oracle algorithm: Oracle knows precisely the expected demand for any user context. In each time slot, Oracle selects SBSs that maximize the expected system utility as in (4) based on the observed user context.
(2) Combinatorial UCB (cUCB) [44]: cUCB is developed based on a classic MAB algorithm, UCB1. The key idea is to create super-arms, i.e., -element combinations of SBSs ( is the budget). There will be a total of super-arms, and cUCB learns the reward of each super-arm.
(3) Combinatorial-Contextual UCB (cc-UCB): cc-UCB takes users' context into account when running cUCB. Specifically, cc-UCB maintains a context space for each super-arm, and the utility estimates of the hypercubes in a context space are updated when the corresponding super-arm is selected.
(4) Greedy: With probability , Greedy rents a random set of SBSs; with probability , it selects the SBSs with the highest estimated demands, calculated from the previously observed demands of rented SBSs.
(5) Random algorithm: The algorithm simply rents SBSs randomly in each time slot.
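The Greedy benchmark above is an epsilon-greedy rule and can be sketched as follows. The function signature, the `epsilon` parameter name, and the injected `rng` object are illustrative assumptions, not details from the paper.

```python
def greedy_benchmark(n_sbs, budget, est_demand, epsilon, rng):
    """Epsilon-greedy SBS selection (sketch of the Greedy benchmark).

    With probability `epsilon`, rent a random set of `budget` SBSs;
    otherwise rent the `budget` SBSs with the highest estimated
    demand. `est_demand[s]` is the running demand estimate for SBS s,
    and `rng` provides `random()` and `sample()`.
    """
    if rng.random() < epsilon:
        return sorted(rng.sample(range(n_sbs), budget))
    ranked = sorted(range(n_sbs), key=lambda s: est_demand[s], reverse=True)
    return sorted(ranked[:budget])
```

Injecting `rng` keeps the sketch deterministic under test; in practice `random.Random(seed)` would be passed in.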
VI-A Performance Comparison
Fig. 7 shows the cumulative system utility achieved by SEEN and the other five benchmarks in a non-overlapping case. As expected, the Oracle algorithm attains the highest cumulative system utility and gives an upper bound for the other algorithms. Among the rest, SEEN and cc-UCB significantly outperform cUCB, Greedy, and Random, since they take the context information into account when estimating the users' service demand pattern. Moreover, SEEN achieves a higher system utility than cc-UCB; this is because cc-UCB creates a large set of super-arms and therefore is more likely to enter the exploration phase. The conventional algorithms, cUCB and Greedy, provide only slight improvements over Random. These methods fail because of the uncertainty of the user population in various aspects, e.g. user numbers and composition, which are difficult to estimate in each time slot without observing the user context information.
VI-B Demand Estimation Error
Fig. 7 shows the mean square error (MSE) of the service demand estimation achieved by SEEN and cc-UCB, where the MSE is measured across all context hypercubes against the oracle demand estimation. The MSE of SEEN converges quickly to 0.01 after the first 120 time slots, while the MSE of cc-UCB stays high and decreases slowly over the 500-slot runtime. This means that SEEN learns the user demand pattern quickly and provides more effective edge service placement decisions.
VI-C Demand Allocation
Fig. 8 shows the allocation of user demand in the network, i.e., whether the demand is processed at the edge or in the cloud. Note that the ASP desires to process more demand at the edge so that users incur lower delay costs. We can see from Fig. 8 that SEEN accommodates a large share of the user demand at the edge (62.2%), only slightly lower than that of Oracle (69.2%). The other four schemes rely heavily on the cloud server, thereby incurring large delay costs and diminishing the system utility.
VI-D Learning with More Context
Next, we evaluate the performance of SEEN under different context spaces. Fig. 9 shows the cumulative system utilities achieved by SEEN and the five other benchmarks when running with 2, 3, and 4 context dimensions. Comparing the three figures, we see that the cumulative system utilities achieved by cUCB, Greedy, and Random stay more or less the same, since these algorithms are independent of the context information. The context-aware algorithms, i.e., SEEN, Oracle, and cc-UCB, achieve higher cumulative utilities with more context information, since more context helps the ASP to learn the users' demand pattern and therefore make better service provisioning decisions. In addition, it is worth noticing that SEEN incurs larger regret when running with more context information, which is consistent with the analysis in Theorem 1.
VI-E Impact of ASP Budget
Fig. 11 depicts the cumulative system utility achieved by the six schemes over 500 slots under different budgets. As expected, the system utility grows with the ASP budget, since more user demand can be processed at the network edge when more SBSs provide edge services. Moreover, SEEN achieves close-to-oracle performance at all budget levels. By contrast, the cc-UCB algorithm suffers an obvious performance degradation when . This is because the number of super-arms created by cc-UCB becomes very large given and , which forces cc-UCB to enter exploration more frequently and leads to system utility loss.
VI-F Edge Service Placement with Overlapping Coverage
Fig. 11 compares the performance achieved by SEEN-O and the five other benchmarks in the overlapping case. As in the non-overlapping case, the context-aware schemes far outperform the conventional MAB algorithms, and SEEN-O achieves the highest cumulative system utility apart from Oracle. However, SEEN-O incurs a larger regret than in the non-overlapping case. This is because users in the overlapped areas are observed by multiple SBSs, and their contexts are duplicated when determining the under-explored SBSs; this increases the probability that an SBS is under-explored and pushes SEEN-O into the exploration phase more often. Nevertheless, this does not mean that considering coverage overlapping degrades performance: SEEN-O actually achieves a higher cumulative system utility than SEEN does in the non-overlapping case.
VI-G Impact of Overlapping Degree
The overlapping degree of the edge network is defined as , where is the service area co-covered by at least two SBSs and is the total service area. In the following, we show the impact of the overlapping degree on the performance of SEEN-O. Fig. 12 depicts the cumulative system utilities achieved by SEEN-O and Random over 500 time slots under different overlapping degrees; it also shows the cumulative system utility achieved by SEEN in the non-overlapping case for comparison. In general, a larger overlapping degree results in higher system utilities for both SEEN-O and Random. This is because more users can access multiple SBSs for the edge service given a larger overlapping degree, and therefore the ASP can further optimize the edge service placement decisions to accommodate more service demand at the Internet edge by exploiting the flexible association of users. By comparing SEEN-O and SEEN, we also see that taking SBS coverage overlapping into account improves the system utility, and the improvement grows with the overlapping degree.
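The overlapping degree can be estimated numerically, e.g. by Monte Carlo sampling over the deployment area. This sketch assumes circular coverage of a common radius and interprets the total service area as the union of all SBS coverage; an exact geometric computation could be used instead.

```python
import random

def overlap_degree(sbs_positions, radius, area_side, samples=20000, seed=0):
    """Monte-Carlo estimate of the overlapping degree.

    Samples points uniformly in the `area_side` x `area_side` square
    and returns the fraction of covered points (inside at least one
    SBS disk) that are co-covered by at least two SBSs.
    """
    rng = random.Random(seed)
    covered = multi = 0
    for _ in range(samples):
        x, y = rng.uniform(0, area_side), rng.uniform(0, area_side)
        hits = sum((x - sx) ** 2 + (y - sy) ** 2 <= radius ** 2
                   for sx, sy in sbs_positions)
        if hits >= 1:
            covered += 1
        if hits >= 2:
            multi += 1
    return multi / covered if covered else 0.0
```

Two fully coincident SBSs give a degree of 1, while disjoint coverage disks give 0, matching the definition above.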
VII Conclusion
In this paper, we investigated the edge service placement problem of an ASP in radio access networks integrated with shared edge computing platforms. To cope with the unknown and fluctuating service demand of changing user populations, we formulated a novel combinatorial contextual bandit learning problem and proposed an efficient learning algorithm that makes optimal spatial-temporal dynamic edge service placement decisions. The proposed algorithm is practical, easy to implement, and scalable to large networks while achieving provably asymptotically optimal performance. However, further efforts are needed to improve the existing CC-MAB framework. First, we currently use a simple static partition of the context space; a dynamic partition may further improve the algorithm's performance by generating more appropriate hypercubes. Second, our paper only provides a regret upper bound for SEEN; a meaningful complement would be to analyze the regret lower bound. Beyond the investigated edge service placement problem, CC-MAB can also be applied to many other sequential decision-making problems under uncertainty that involve multiple plays under a limited budget and context.
References
 [1] A. Li, X. Yang, S. Kandula, and M. Zhang, “Cloudcmp: comparing public cloud providers,” in Proceedings of the 10th ACM SIGCOMM conference on Internet measurement. ACM, 2010, pp. 1–14.
 [2] T. Taleb, S. Dutta, A. Ksentini, M. Iqbal, and H. Flinck, “Mobile edge computing potential in making cities smarter,” IEEE Communications Magazine, vol. 55, no. 3, pp. 38–43, March 2017.
 [3] Vapor IO, https://www.vapor.io/.
 [4] Project Volutus, https://www.vapor.io/project-volutus-extending-the-cloud-to-the-true-edge/.
 [5] Intel, “Smart cells revolutionize service delivery,” https://www.intel.com/content/dam/www/public/us/en/documents/whitepapers/smartcellsrevolutionizeservicedelivery.pdf.
 [6] M. Stansberry, "Uptime Institute data center industry survey 2013."
 [7] T. L. Lai and H. Robbins, “Asymptotically efficient adaptive allocation rules,” Advances in applied mathematics, vol. 6, no. 1, pp. 4–22, 1985.
 [8] S. L. Lim, P. J. Bentley, N. Kanakam, F. Ishikawa, and S. Honiden, “Investigating country differences in mobile app user behavior and challenges for software engineering,” IEEE Transactions on Software Engineering, vol. 41, no. 1, pp. 40–64, 2015.
 [9] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: The communication perspective,” IEEE Communications Surveys & Tutorials, 2017.

 [10] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, "Edge computing: Vision and challenges," IEEE Internet of Things Journal, vol. 3, no. 5, pp. 637–646, 2016.
 [11] D. Huang, P. Wang, and D. Niyato, "A dynamic offloading algorithm for mobile computing," IEEE Trans. on Wireless Communications, vol. 11, no. 6, pp. 1991–1995, 2012.
 [12] J. Liu, Y. Mao, J. Zhang, and K. B. Letaief, “Delayoptimal computation task scheduling for mobileedge computing systems,” in Information Theory (ISIT), 2016 IEEE International Symposium on. IEEE, 2016, pp. 1451–1455.
 [13] J. Xu, L. Chen, and S. Ren, “Online learning for offloading and autoscaling in energy harvesting mobile edge computing,” IEEE Trans. on Cognitive Communications and Networking, vol. PP, no. P, pp. 1–15, 2017.
 [14] Y. Mao, J. Zhang, and K. B. Letaief, “Dynamic computation offloading for mobileedge computing with energy harvesting devices,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 12, pp. 3590–3605, 2016.
 [15] L. Chen and J. Xu, “Socially trusted collaborative edge computing in ultra dense networks,” in Proceedings of the Second ACM/IEEE Symposium on Edge Computing. ACM, 2017, p. 9.
 [16] S. Tanzil, O. Gharehshiran, and V. Krishnamurthy, “A distributed coalition game approach to femtocloud formation,” IEEE Trans. on Cloud Computing, 2016.
 [17] Y. Chen, R. H. Katz, and J. D. Kubiatowicz, “Dynamic replica placement for scalable content delivery,” in International Workshop on PeertoPeer Systems. Springer, 2002, pp. 306–318.
 [18] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, “Femtocaching: Wireless content delivery through distributed caching helpers,” IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8402–8413, 2013.
 [19] Q. Zhang, Q. Zhu, M. F. Zhani, R. Boutaba, and J. L. Hellerstein, “Dynamic service placement in geographically distributed clouds,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 12, pp. 762–772, 2013.
 [20] J. Xu, L. Chen, and P. Zhou, “Joint service caching and task offloading for mobile edge computing in dense networks,” in International Conference on Computer Communications(INFOCOM). IEEE, 2018, pp. 1–9.
 [21] L. Chen and J. Xu, “Collaborative service caching for edge computing in dense small cell networks,” arXiv preprint arXiv:1709.08662, 2017.
 [22] S. Müller, O. Atan, M. van der Schaar, and A. Klein, “Contextaware proactive content caching with service differentiation in wireless networks,” IEEE Transactions on Wireless Communications, vol. 16, no. 2, pp. 1024–1036, 2017.
 [23] P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Machine Learning, vol. 47, no. 2-3, pp. 235–256, 2002.
 [24] R. Agrawal, “Sample mean based index policies by regret for the multiarmed bandit problem,” Advances in Applied Probability, vol. 27, no. 4, pp. 1054–1078, 1995.
 [25] V. Anantharam, P. Varaiya, and J. Walrand, “Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple playspart i: Iid rewards,” IEEE Transactions on Automatic Control, vol. 32, no. 11, pp. 968–976, 1987.
 [26] R. Agrawal, M. Hegde, and D. Teneketzis, “Multiarmed bandit problems with multiple plays and switching cost,” Stochastics and Stochastic reports, vol. 29, no. 4, pp. 437–459, 1990.
 [27] Y. Gai, B. Krishnamachari, and R. Jain, “Combinatorial network optimization with unknown variables: Multiarmed bandits with linear rewards and individual observations,” IEEE/ACM Transactions on Networking (TON), vol. 20, no. 5, pp. 1466–1478, 2012.
 [28] A. Slivkins, “Contextual bandits with similarity information,” in Proceedings of the 24th annual Conference On Learning Theory, 2011, pp. 679–702.
 [29] L. Li, W. Chu, J. Langford, and R. E. Schapire, “A contextualbandit approach to personalized news article recommendation,” in Proceedings of the 19th international conference on World wide web. ACM, 2010, pp. 661–670.
 [30] C. Tekin and M. van der Schaar, “Distributed online learning via cooperative contextual bandits,” IEEE Transactions on Signal Processing, vol. 63, no. 14, pp. 3700–3714, 2015.
 [31] L. Qin, S. Chen, and X. Zhu, “Contextual combinatorial bandit and its application on diversified online recommendation,” in Proceedings of the 2014 SIAM International Conference on Data Mining. SIAM, 2014, pp. 461–469.
 [32] S. Li, B. Wang, S. Zhang, and W. Chen, “Contextual combinatorial cascading bandits,” in International Conference on Machine Learning, 2016, pp. 1245–1253.
 [33] C. Pahl, “Containerization and the paas cloud,” IEEE Cloud Computing, vol. 2, no. 3, pp. 24–31, 2015.
 [34] D. Bernstein, “Containers and cloud: From lxc to docker to kubernetes,” IEEE Cloud Computing, vol. 1, no. 3, pp. 81–84, 2014.
 [35] B. Russell, “Kvm and docker lxc benchmarking with openstack,” 2014.
 [36] Q. Fan and N. Ansari, “Workload allocation in hierarchical cloudlet networks,” IEEE Communications Letters, vol. 22, no. 4, pp. 820–823, 2018.
 [37] S. Lederer, J. Mankoff, A. K. Dey, and C. Beckmann, “Managing personal information disclosure in ubiquitous computing environments,” Intel Research, IRBTR03015, 2003.
 [38] D. Anthony, T. Henderson, and D. Kotz, “Privacy in locationaware computing environments,” IEEE Pervasive Computing, vol. 6, no. 4, 2007.
 [39] I. Bilogrevic, K. Huguenin, B. Agir, M. Jadliwala, M. Gazaki, and J.P. Hubaux, “A machinelearning based approach to privacyaware informationsharing in mobile social networks,” Pervasive and Mobile Computing, vol. 25, pp. 125–142, 2016.
 [40] Online appendix: Spatiotemporal edge service placement: A bandit learning approach. [Online]. Available: https://www.dropbox.com/sh/hrzv46x78yjy3fl/AAAQ6YCjPcg8PvYIExjgwoyJa?dl=0
 [41] T. Yamada, S. Kataoka, and K. Watanabe, “Heuristic and exact algorithms for the disjunctively constrained knapsack problem,” Information Processing Society of Japan Journal, vol. 43, no. 9, 2002.
 [42] M. Hifi and N. Otmani, “An algorithm for the disjunctively constrained knapsack problem,” International Journal of Operational Research, vol. 13, no. 1, pp. 22–43, 2012.
 [43] A. Bettinelli, V. Cacchiani, and E. Malaguti, “A branchandbound algorithm for the knapsack problem with conflict graph,” INFORMS Journal on Computing, vol. 29, no. 3, pp. 457–473, 2017.
 [44] W. Chen, Y. Wang, and Y. Yuan, “Combinatorial multiarmed bandit: General framework and applications,” in International Conference on Machine Learning, 2013, pp. 151–159.
 [45] W. Hoeffding, “Probability inequalities for sums of bounded random variables,” Journal of the American statistical association, vol. 58, no. 301, pp. 13–30, 1963.
Appendix A Proof of Theorem 1
The regret bound of SEEN is derived under the natural assumption that users with similar contexts have similar expected demands, as captured by the Hölder condition. The Hölder condition allows us to derive a regret bound showing that the regret of SEEN is sublinear in the time horizon , i.e. with .
For each SBS and each hypercube , we define and as the best and worst expected demands over all contexts from hypercube , respectively. In some steps of the proofs, we have to compare the demands at different positions in a hypercube; as a point of reference, we define the context at the (geometric) center of a hypercube as . Also, we define the top SBSs for hypercubes in as the SBSs that satisfy