I Introduction
The Industrial Internet of Things (IIoT) integrates the physical industrial environment into computer-based systems, improving efficiency, accuracy, and economic benefit while reducing human intervention [1]. Traditionally, industrial applications are founded on a centralized model of data processing and analytics: data generated by industrial devices are transported over the Internet infrastructure to a central computing facility (typically a cloud), where intensive data processing is carried out [2]. However, the ever-growing volume of distributed industrial data makes it impractical to transport all data over today’s already-congested backbone Internet. Moreover, due to unpredictable network latency, data processing in the cloud often cannot meet the stringent latency requirements of monitoring and controlling critical industrial devices [3].
To overcome these limitations, Fog computing has recently been integrated into IIoT to support an operating environment featuring real-time response and high automation; it exploits the spare computing resources of edge devices (e.g., network gateways) to relieve the backbone traffic burden and enable ultra-low-latency response [4]. In a typical scenario, industrial devices are equipped with smart Sensor Nodes (SNs) which collect data and perform first-order operations (e.g., filtering, aggregation, and translation) on the raw data. A cohort of SNs can be logically clustered around and communicate with a Fog Node (FN) that provides richer computing resources. These FNs receive streamed data from SNs, perform more complex analysis on the received data, and derive actionable intelligence to maintain the devices within their “sphere of influence”. They also have the option of further offloading workload that exceeds their computing capacity to the cloud, resulting in a hierarchical Fog computing architecture [5] (see Fig. 1 for an illustration). Task offloading has been a central theme of many prior works [6, 7], which concern what, when, and how to offload the workload of end devices to FNs. This literature assumes that FNs can process any type of service demand they receive, without considering the availability of services. However, unlike the centralized cloud, which has huge and diverse resources, FNs have limited computing/storage resources that allow only a small set of services to be hosted at the same time [8]. Because different industrial devices differ in their functionality and require different services to analyze the sensed data, the services hosted by an FN determine which devices can be maintained at the network edge, thereby affecting the performance of Fog computing.
Optimally configuring the Fog system (i.e., deciding which services are hosted by which FNs) is a very challenging problem for IIoT. First, industrial devices are heterogeneous in terms of required service types and the corresponding service demand. While the former is often fixed, the latter changes over time since devices follow different operation schedules (regular maintenance or event-driven). Therefore, FNs must be adaptively configured to track the temporal variations of the demand for different services. Second, to accommodate more service demand at the Internet edge, FNs are usually densely deployed, and hence an SN may be in the coverage of multiple FNs. On the one hand, the overlapping coverage allows FNs to collaboratively serve an SN’s demand. On the other hand, it creates a complex multi-cell setting where demand and resources are highly coupled in the spatial domain. Effective Fog configuration requires careful coordination among all FNs, and distributed solutions are strongly preferred. Third, FNs are deployed in a “drop-and-play” fashion to enable Fog computing on the existing infrastructure. In this scenario, FNs may not be powered by the main electric grid but have to rely on batteries (or renewable energy sources) [9]. The battery energy constraints couple the Fog configuration decisions over time, yet decisions have to be made without foreseeing future system dynamics. To address these challenges, we develop a novel online framework for adaptive Fog configuration.
In the conventional cloud computing context, virtual machine placement problems have been studied [10], and service placement over multiple cloudlets was optimized in [8]. However, these works consider neither overlapping coverage areas nor the battery energy constraints of FNs, and their algorithms are centralized. Content caching in Fog systems is also related to the service hosting considered here. While content caching [11] mainly deals with storage capacity constraints, our Fog configuration strategy aims to improve computation delay performance under energy budget constraints. Our main contributions are as follows:
(1) We formalize the adaptive Fog configuration problem for IIoT as a mixed-integer nonlinear stochastic optimization problem with long-term constraints. We jointly optimize service hosting and task admission of a network of FNs in order to minimize the time-average computation delay cost while satisfying the long-term battery energy constraints of FNs.
(2) A novel algorithm, called AFC, is developed for adaptive Fog configuration under the Lyapunov optimization framework [12]. AFC executes in an online fashion by separating the long-term problem into a sequence of per-slot subproblems that are solvable with only currently available system information. We prove that AFC achieves close-to-optimal service delay while bounding the potential violation of the long-term energy consumption constraints.
(3) To enable distributed coordination among a network of FNs, we develop a distributed algorithm as a key subroutine of AFC to solve each per-slot subproblem. The algorithm is based on Gibbs Sampling and leverages Markov Random Fields and graph theory to enable parallel execution and speed up convergence.
The rest of the paper is organized as follows. Section II presents the system model and problem formulation. Section III develops the online algorithm AFC and proves its performance guarantee. Section IV designs a distributed algorithm as a subroutine of AFC. Section V presents simulation results, followed by the conclusion in Section VI.
II System Model
II-A Industrial Fog System
We consider an industrial environment comprising different types of devices. A hierarchical industrial Fog system is deployed to automate monitoring and control as well as to apply embedded intelligent agents that can adjust device behaviors in relation to ongoing performance variables. Specifically, each device is equipped with a sensor node (SN), which is a low-power wireless device with an embedded microcontroller and storage. Code for device monitoring and control is deployed at SNs. Denote the set of SNs by $\mathcal{N}$. Besides the SNs, there are $M$ Fog Nodes (FNs) deployed in the network, indexed by $\mathcal{M} = \{1, \dots, M\}$, acting as wireless network gateways that connect SNs and provide richer computing resources, which allow more complex analysis of the data streamed from SNs for event triggering, predictive modeling of critical events, and notification. We consider that both SNs and FNs are battery-powered to enable flexible deployment.
Each FN serves the demand from SNs within its “sphere of influence”. Let $\mathcal{N}_m \subseteq \mathcal{N}$ denote the set of SNs within the wireless transmission range of FN $m$. Due to the dense deployment of FNs, an SN can be served by multiple FNs. Let $\mathcal{M}_n = \{m \in \mathcal{M} : n \in \mathcal{N}_m\}$ be the set of FNs reachable by SN $n$. We say that two FNs $m$ and $m'$ are neighbors if there exists some SN $n$ such that $m, m' \in \mathcal{M}_n$. In other words, FNs $m$ and $m'$ can potentially collaborate to serve at least one common SN. Given this, the Fog network can be described by a graph $\mathcal{G} = (\mathcal{M}, \mathcal{E})$, where $\mathcal{E}$ is the edge set and there exists an edge between two FNs if they are neighbors. Let $\mathcal{B}_m$ denote the one-hop neighborhood of FN $m$, including FN $m$ itself.
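To make the neighbor relation concrete, the following Python sketch derives the edge set of the Fog network graph and each FN’s one-hop neighborhood (including the FN itself) from the FNs’ coverage sets. It is a minimal illustration under stated assumptions, not the authors’ implementation: the function name `build_fog_graph` and the dictionary-of-sets input format are hypothetical.

```python
from itertools import combinations


def build_fog_graph(coverage):
    """Build the FN neighbor graph from coverage sets (hypothetical helper).

    coverage: dict mapping FN id -> set of SN ids within its range.
    Returns (edges, one_hop): edges is a set of frozenset FN pairs, and
    one_hop[m] is the one-hop neighborhood of FN m, including m itself.
    """
    # Invert coverage: reachable[n] = set of FNs that can reach SN n.
    reachable = {}
    for m, sns in coverage.items():
        for n in sns:
            reachable.setdefault(n, set()).add(m)

    # Two FNs are neighbors if they can both reach at least one common SN.
    edges = set()
    for fns in reachable.values():
        for m1, m2 in combinations(sorted(fns), 2):
            edges.add(frozenset((m1, m2)))

    one_hop = {m: {m} for m in coverage}
    for edge in edges:
        m1, m2 = tuple(edge)
        one_hop[m1].add(m2)
        one_hop[m2].add(m1)
    return edges, one_hop
```

For example, with coverage `{0: {"a", "b"}, 1: {"b", "c"}, 2: {"d"}}`, FNs 0 and 1 share SN `"b"` and become neighbors, while FN 2 is isolated.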
II-B Service Hosting and Task Admission
Industrial SNs differ in functionality; therefore, different SNs require different services to analyze different types of data. We consider that there are $K$ types of SNs in the network and hence $K$ types of services, indexed by $\mathcal{K} = \{1, \dots, K\}$. For each type-$k$ service demand, the (expected) input data size (in bits) and the required number of CPU cycles for one task are $s_k$ and $c_k$, respectively. Let $k_n \in \mathcal{K}$ denote the type of SN $n$. Running a particular service requires allocating sufficient computing resources and caching the associated libraries and databases. However, compared to the powerful cloud, FNs are constrained in computing resources and storage; hence, only a limited number of services can be hosted by an FN at a time. We assume that each FN can host at most $\bar{K}$ types of services. Hosting service $k$ at an FN allows in-situ analysis of the data streamed from type-$k$ SNs, thereby enabling prompt response. The data from SNs whose required services are not hosted at FNs will be transmitted to the cloud for analysis.
Since industrial devices operate following different schedules, the service demand from SNs varies over time. To track such temporal variations, FNs adaptively reconfigure the hosted services to maximize the performance of the Fog network. We consider a slotted operational timeline, where each time slot matches the time scale at which FNs can be reconfigured. The configuration decisions are made at a much slower time scale than task arrivals. During each time slot, task arrivals from SNs are assumed to follow Poisson processes, and the arrival rates in the current time slot are predicted using state-of-the-art prediction algorithms [13]. Such a two-time-scale system is widely used in the existing literature [14]. The expected service demand of SN $n$ for type-$k$ service in time slot $t$ is denoted by $\lambda_{n,k}^t$. At the beginning of each time slot $t$, each FN configures itself by choosing which services to host. Let $\mathbf{x}_m^t = (x_{m,1}^t, \dots, x_{m,K}^t)$ be FN $m$’s (service) hosting decision in time slot $t$, where $x_{m,k}^t \in \{0, 1\}$ is a binary variable representing whether service $k$ is hosted or not. The hosting decision has to satisfy the capacity constraint, namely $\sum_{k \in \mathcal{K}} x_{m,k}^t \le \bar{K}$. Let $\mathcal{X}_m$ be the set of all feasible hosting decisions of FN $m$. The hosting profile of the whole Fog network is collected in $\mathbf{x}^t = (\mathbf{x}_1^t, \dots, \mathbf{x}_M^t)$. Given the profile $\mathbf{x}^t$, let $\mathcal{M}_{n,k}(\mathbf{x}^t) = \{m \in \mathcal{M}_n : x_{m,k}^t = 1\}$ be the set of FNs that host service $k$ and are reachable by SN $n$. We assume that the demand of SN $n$ is offloaded to the FN that has the best uplink channel condition, namely $\arg\max_{m \in \mathcal{M}_{n,k_n}(\mathbf{x}^t)} h_{n,m}$, where $h_{n,m}$ is the uplink channel condition between SN $n$ and FN $m$. In this way, SN $n$ incurs the least transmission energy consumption. Nevertheless, other SN-FN association rules can also be easily incorporated into our framework. It is possible that SN $n$ can reach none of the FNs hosting service $k_n$, namely $\mathcal{M}_{n,k_n}(\mathbf{x}^t) = \emptyset$. In this case, the service demand of SN $n$ is sent to the reachable FN with the best channel condition and then further offloaded to the remote Cloud. To facilitate the exposition, we write $\pi_n^t$ for the FN (or Cloud) that processes the service demand of SN $n$ in time slot $t$:

$$\pi_n^t = \begin{cases} \arg\max_{m \in \mathcal{M}_{n,k_n}(\mathbf{x}^t)} h_{n,m}, & \text{if } \mathcal{M}_{n,k_n}(\mathbf{x}^t) \neq \emptyset, \\ \text{Cloud}, & \text{otherwise}. \end{cases} \tag{1}$$
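The association rule (1) can be sketched in Python as follows. This is a minimal illustration that assumes each FN’s hosted services are given as a set and the uplink channel condition as a number where larger is better; the names `associate` and `CLOUD` are hypothetical, and the relay FN that forwards cloud-bound demand is not modeled.

```python
CLOUD = "cloud"  # hypothetical sentinel for demand processed at the Cloud


def associate(sn_type, reachable, hosting, channel):
    """Return the FN (or CLOUD) that processes an SN's demand, per rule (1).

    sn_type:   service type required by the SN
    reachable: iterable of FN ids the SN can reach
    hosting:   dict FN id -> set of hosted service types
    channel:   dict FN id -> uplink channel quality (larger is better)
    """
    # Reachable FNs that currently host the required service.
    candidates = [m for m in reachable if sn_type in hosting[m]]
    if not candidates:
        return CLOUD  # no reachable FN hosts the service
    # Best uplink channel among the hosting FNs (ties: first in order).
    return max(candidates, key=lambda m: channel[m])
```

Because the rule only reads the hosting decisions of reachable FNs, an FN’s choice affects the association of SNs in its own coverage area and nowhere else.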
Because FNs are battery-powered, in addition to deciding which services to host, they also decide the amount of workload to process themselves in order to extend battery lifetime. Let $a_m^t \in [0, 1]$ be the fraction of the received service demand admitted by FN $m$; the remaining fraction is forwarded to the Cloud. Note that the actual task admission will be decided during the time slot when the specific service demand is received, depending on its type and priority. Nevertheless, the task admission decisions can still be planned at a coarser granularity at the beginning of each time slot. We collect the task admission decisions of all FNs in time slot $t$ in $\mathbf{a}^t = (a_1^t, \dots, a_M^t)$.
II-C Energy Consumption and Service Delay
Different service hosting and task admission decisions shape the distribution of service demand among the FNs and the Cloud in different ways, resulting in different FN energy consumption and service delay. Let $\lambda_{m,k}^t$ be the type-$k$ demand received by FN $m$, which can be computed as:

$$\lambda_{m,k}^t = \sum_{n \in \mathcal{N}_m : k_n = k} \lambda_{n,k}^t \, \mathbb{1}\{\pi_n^t = m\}. \tag{2}$$
II-C1 Energy consumption
To simplify our analysis, we assume that an FN processes tasks at its maximum CPU speed and runs at its minimum CPU speed when idle. Then, based on the energy consumption model in [6], the computation energy consumption of FN $m$ in time slot $t$ can be expressed as:

$$E_m^t = e_m^{\mathrm{s}} + e_m \, a_m^t \sum_{k \in \mathcal{K}} \lambda_{m,k}^t c_k, \tag{3}$$

where $e_m^{\mathrm{s}}$ is the static energy consumption incurred regardless of the workload as long as FN $m$ is turned on; $e_m = \kappa_m f_m^2$ is the unit energy consumption for one CPU cycle, which depends on the CPU architecture parameter $\kappa_m$ and the CPU frequency $f_m$; and $a_m^t \sum_{k \in \mathcal{K}} \lambda_{m,k}^t c_k$ is the total number of CPU cycles required to process the service demand admitted by FN $m$.
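The aggregation of received demand and the per-slot energy can be sketched in Python as follows. The sketch assumes that the energy of an FN is its static energy plus the unit energy per CPU cycle times the CPU cycles of the admitted demand, and that cloud-bound demand places no CPU load on any FN. The function names, the `"cloud"` marker, and the dictionary layouts are hypothetical.

```python
def received_demand(assoc, sn_type, demand):
    """Aggregate per-FN, per-type received demand.

    assoc:   dict SN id -> processing FN id, or "cloud"
    sn_type: dict SN id -> required service type
    demand:  dict SN id -> expected demand of that SN in this slot
    Returns dict (FN id, type) -> aggregated demand.
    """
    lam = {}
    for n, m in assoc.items():
        if m == "cloud":
            continue  # cloud-bound demand does not load any FN's CPU
        key = (m, sn_type[n])
        lam[key] = lam.get(key, 0.0) + demand[n]
    return lam


def fn_energy(m, admit, lam, cycles, static_energy, unit_energy):
    """Per-slot energy of FN m: static part plus unit energy per cycle
    times the CPU cycles of the admitted fraction of received demand."""
    total_cycles = sum(rate * cycles[k] for (fn, k), rate in lam.items() if fn == m)
    return static_energy + unit_energy * admit * total_cycles
```

Keeping the two steps separate mirrors the model: association fixes where demand lands, and admission only scales the cycles an FN actually executes.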
II-C2 Service delay
The service delay consists of computation delay and communication delay. The computation delay is incurred by task processing at either the FNs or the Cloud. Following the Fog server computation model in [15], the computation delay for one type-$k$ task is $c_k / f_m$ if it is processed at FN $m$ and $c_k / f_{\mathrm{c}}$ if it is processed at the Cloud, where $f_m$ and $f_{\mathrm{c}}$ are the CPU frequencies at FN $m$ and the Cloud, respectively. Usually, we have $f_m < f_{\mathrm{c}}$.
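The communication delay is incurred during wireless transmission between SNs and FNs and during wired transmission between FNs and the Cloud via the backbone Internet. Since the wireless transmission delay of each SN-FN pair is similar and much smaller than the backbone transmission delay [16], its impact on the service delay can be neglected. Therefore, we focus on the backbone transmission delay. Let $r$ be the backbone transmission rate and $\tau$ be the round-trip time to the Cloud. The service delay cost of SN $n$ can then be obtained as:

$$D_n^t = \begin{cases} \lambda_{n,k_n}^t \Big[ a_m^t \dfrac{c_{k_n}}{f_m} + (1 - a_m^t) \Big( \dfrac{c_{k_n}}{f_{\mathrm{c}}} + \dfrac{s_{k_n}}{r} + \tau \Big) \Big], & \text{if } \pi_n^t = m \in \mathcal{M}, \\ \lambda_{n,k_n}^t \Big( \dfrac{c_{k_n}}{f_{\mathrm{c}}} + \dfrac{s_{k_n}}{r} + \tau \Big), & \text{if } \pi_n^t = \text{Cloud}, \end{cases} \tag{4}$$

where, in the first case, the first term captures the service delay of the demand processed at FN $m$ and the second term captures the service delay of the demand offloaded to the Cloud.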
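A minimal Python sketch of the per-SN delay cost follows. It assumes that the admitted fraction of an FN-assigned SN’s demand incurs only the FN computation delay, while cloud-bound demand incurs Cloud computation, backbone transfer of the input data, and the round-trip time; wireless delay is neglected as in the text. The function name `sn_delay` and its parameters are hypothetical.

```python
def sn_delay(rate, cycles, size, at_fog, admit, f_fog, f_cloud, backbone_rate, rtt):
    """Expected delay cost of one SN in a slot.

    rate:          expected demand (tasks) of the SN
    cycles, size:  CPU cycles and input bits per task of its service type
    at_fog:        True if the SN is associated with an FN hosting its service
    admit:         fraction admitted by that FN (ignored if at_fog is False)
    f_fog/f_cloud: CPU frequencies (cycles/s); backbone_rate in bits/s; rtt in s
    """
    # Delay of one task sent to the Cloud: compute + backbone transfer + RTT.
    cloud_delay = cycles / f_cloud + size / backbone_rate + rtt
    if not at_fog:
        return rate * cloud_delay
    # Admitted part is computed at the FN; the rest goes to the Cloud.
    return rate * (admit * cycles / f_fog + (1.0 - admit) * cloud_delay)
```

Even when $f_m < f_{\mathrm{c}}$, local processing can win because it avoids the backbone transfer and round-trip terms.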
II-D Offline Problem Formulation
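The goal of the Fog network is to minimize the total service delay cost of all SNs while satisfying the energy consumption constraints of FNs. Formally, the offline problem is

$$\textbf{P1:} \quad \min_{\{\mathbf{x}^t, \mathbf{a}^t\}_{t=0}^{T-1}} \ \frac{1}{T} \sum_{t=0}^{T-1} \sum_{n \in \mathcal{N}} D_n^t \tag{5a}$$
$$\text{s.t.} \quad \sum_{t=0}^{T-1} E_m^t \le B_m, \quad \forall m \in \mathcal{M}, \tag{5b}$$
$$\mathbf{x}_m^t \in \mathcal{X}_m, \quad \forall m \in \mathcal{M}, \ \forall t, \tag{5c}$$
$$E_m^t \le E_m^{\max}, \quad \forall m \in \mathcal{M}, \ \forall t, \tag{5d}$$
$$\sum_{n \in \mathcal{N}} D_n^t \le D^{\max}, \quad \forall t, \tag{5e}$$

where $a_m^t \in [0, 1]$ for all $m$ and $t$; the first constraint is the long-term energy consumption constraint of each FN, and $B_m$ is the available battery energy of FN $m$ for a period of $T$ time slots. Every $T$ time slots, the battery is replenished either manually or via an energy harvesting device. The second constraint is due to the FNs’ service hosting capacity. The third and fourth constraints impose per-slot limits on the energy consumption of each FN ($E_m^{\max}$) and on the total service delay cost ($D^{\max}$).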
There are several challenges that impede the derivation of the optimal solution to the offline problem P1. First, optimally solving P1 requires complete future information (e.g., the service demands $\lambda_{n,k}^t$ for all $t$), which is difficult, if not impossible, to predict in advance. Second, the long-term energy constraint couples the configuration decisions temporally: consuming more energy in the current slot reduces the energy available for future use. Third, P1 is a mixed-integer nonlinear program, which is very difficult to solve even if the future information is known a priori. These challenges call for an efficient online approach that can make Fog configuration decisions with only currently available information.
III Online Adaptive Fog Configuration
In this section, we develop an online algorithm, AFC (Adaptive Fog Configuration), based on Lyapunov optimization. A salient advantage of AFC is that it converts the offline problem P1 into a sequence of per-slot optimization problems that are solvable with only currently available information.
III-A Online Fog Configuration with Lyapunov Drift
A major challenge of directly solving P1 is that the long-term energy constraint of FNs couples the Fog configuration decisions and task admission decisions across different time slots. To address this challenge, we leverage the Lyapunov drift technique and construct a set of (virtual) energy deficit queues $\mathbf{q}^t = (q_1^t, \dots, q_M^t)$, one for each FN, to guide the Fog configuration and task admission to follow the long-term energy constraints (5b). Initializing $q_m^0 = 0$, the energy deficit queue of FN $m$ evolves as follows:

$$q_m^{t+1} = \max\Big\{ q_m^t + E_m^t - \frac{B_m}{T}, \ 0 \Big\}. \tag{6}$$
The length of $q_m^t$ indicates the deviation of the current energy consumption of FN $m$ from its long-term energy constraint $B_m / T$. Based on the energy deficit queues, we present the online algorithm AFC in Algorithm 1. AFC determines the optimal service hosting profile $\mathbf{x}^t$ and task admission profile $\mathbf{a}^t$ in each time slot by solving the following problem:

$$\textbf{P2:} \quad \min_{\mathbf{x}^t, \mathbf{a}^t} \ V \sum_{n \in \mathcal{N}} D_n^t + \sum_{m \in \mathcal{M}} q_m^t E_m^t \tag{7a}$$
$$\text{s.t.} \quad \text{(5c), (5d), (5e)}, \tag{7b}$$

where $V > 0$ is a control parameter used to adjust the tradeoff between service delay minimization and energy deficit minimization. Note that solving P2 requires only currently available information. By including the additional term $\sum_{m \in \mathcal{M}} q_m^t E_m^t$, AFC takes the energy deficit of FNs into account in decision making. When $q_m^t$ is larger, minimizing the energy consumption of FN $m$ is more critical. Thus, AFC follows the philosophy of “if the energy constraint is violated, then consume less energy”, thereby satisfying the long-term energy constraints without foreseeing the future.
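The per-slot logic of AFC can be sketched in Python as follows, assuming the deficit queue update $q \leftarrow \max\{q + E - B/T, 0\}$ and the drift-plus-penalty objective $V \cdot (\text{total delay}) + \sum_m q_m E_m$. The helper `afc_step` simply picks the best of a precomputed list of feasible (delays, energies) outcomes; it is a hypothetical stand-in for an actual P2 solver, such as the distributed one developed in Section IV.

```python
def update_queue(q, energy, budget, horizon):
    """Energy deficit queue: grows when a slot overspends its B/T share."""
    return max(q + energy - budget / horizon, 0.0)


def p2_objective(V, delays, queues, energies):
    """Drift-plus-penalty objective: V * total delay + sum_m q_m * E_m."""
    return V * sum(delays) + sum(q * e for q, e in zip(queues, energies))


def afc_step(V, queues, candidates):
    """Choose the candidate outcome minimizing the per-slot objective.

    candidates: list of (delays, energies) tuples, one per feasible
    (hosting, admission) choice; assumed already to satisfy the
    per-slot constraints.
    """
    return min(candidates, key=lambda c: p2_objective(V, c[0], queues, c[1]))
```

With an empty queue, AFC prefers the low-delay, energy-hungry option; once the queue builds up, the same call switches to the frugal option, which is the “consume less energy” behavior described above.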
III-B Performance Analysis of AFC
Next, we provide performance bounds for AFC in terms of the long-term service delay cost and long-term energy consumption, compared to the optimal solution of P1 obtained by an oracle with full future information.
Theorem 1.
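By following the Fog configuration profile and task admission decisions derived by AFC, the long-term service delay cost satisfies:

$$\limsup_{\Gamma \to \infty} \frac{1}{\Gamma} \sum_{t=0}^{\Gamma-1} \mathbb{E}\Big[ \sum_{n \in \mathcal{N}} D_n^t \Big] \le D^* + \frac{C}{V},$$

and the long-term energy deficit of FNs satisfies:

$$\limsup_{\Gamma \to \infty} \frac{1}{\Gamma} \sum_{t=0}^{\Gamma-1} \sum_{m \in \mathcal{M}} \mathbb{E}\big[ q_m^t \big] \le \frac{C + V (D^{\max} - D^*)}{\epsilon},$$

where $C$ is a finite constant, $D^*$ is the long-term service delay cost achieved by the optimal solution to P1, and $\epsilon > 0$ is a constant which represents the long-term energy surplus achieved by some stationary policy.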
Proof.
See online Appendix A [17]. ∎
Theorem 1 demonstrates an $[\mathcal{O}(1/V), \mathcal{O}(V)]$ delay-energy deficit tradeoff. Specifically, the asymptotic expected system delay cost achieved by AFC is no higher than the optimal delay performance of the offline problem P1 plus a term $C/V$, where $C$ is a constant. Therefore, by choosing a large $V$, AFC can approach the optimal system delay cost. However, a lower service delay is achieved at the price of higher energy consumption. As presented in Theorem 1, the expected energy deficit is bounded by $\mathcal{O}(V)$, and hence a large $V$ may incur large energy consumption. To complete the algorithm, it remains to solve the optimization problem P2, which is discussed in the next section.
IV Distributed Optimization with Gibbs Sampler
The problem P2 is a mixed-integer nonlinear program. While various techniques (such as Generalized Benders Decomposition) exist to solve it, these methods are usually centralized. When centralized control for information collection and coordination is absent, a distributed solution is desired so that each FN (or a small subset of FNs) can be configured in a distributed way. In the following, we develop a distributed algorithm based on Gibbs sampling techniques [18] for solving P2. Since P2 is solved in each time slot $t$, we drop the time index $t$ in the rest of this section.
IV-A Gibbs Sampling for Fog Configuration
In problem P2, the service hosting profile $\mathbf{x}$ and the task admission profile $\mathbf{a}$ have to be jointly optimized. Fortunately, the task admission decisions of FNs are fully decoupled once the Fog configuration profile $\mathbf{x}$ is determined. The optimal $\mathbf{a}$ can be easily derived because P2 is a linear program in $\mathbf{a}$ for a fixed $\mathbf{x}$. Since each $\mathbf{x}$ is associated with an optimal $\mathbf{a}$, we treat the optimal admission profile as a deterministic function of $\mathbf{x}$, denoted by $\mathbf{a}^*(\mathbf{x})$. For ease of exposition, we write the objective (7a) evaluated at $(\mathbf{x}, \mathbf{a}^*(\mathbf{x}))$ as $\Phi(\mathbf{x})$ in the rest of the paper. Now, we restate P2 with only $\mathbf{x}$ as a variable:

$$\textbf{P2-S:} \quad \min_{\mathbf{x}} \ \Phi(\mathbf{x}) \tag{8a}$$
$$\text{s.t.} \quad \mathbf{x}_m \in \mathcal{X}_m, \quad \forall m \in \mathcal{M}. \tag{8b}$$

P2-S is a nonconvex combinatorial optimization problem, and it cannot be solved by many distributed algorithms based on the Alternating Direction Method of Multipliers (ADMM) and Dual Decomposition [19], since these methods usually require the problem to be convex. While distributed algorithms for nonconvex combinatorial optimization exist, e.g., the Distributed Stochastic Algorithm [20], most of them only guarantee convergence to a local minimum. In the following, we leverage Gibbs Sampling (GS) to solve P2-S. The key advantage of GS is that it converges to the global optimum under a suitable cooling schedule [21]. GS is carried out iteratively: in each iteration, an FN is selected (randomly or according to a predetermined order) to sample a new hosting decision from the conditional distribution while the remaining FNs fix their decisions. The theory of Markov chain Monte Carlo guarantees that the probability of choosing a service hosting profile $\mathbf{x}$ is proportional to $\exp(-\Phi(\mathbf{x})/\beta)$, where $\beta > 0$ is the temperature; this is known as the Gibbs distribution. Moreover, running GS while gradually reducing the temperature parameter $\beta$ yields the optimal $\mathbf{x}$ that minimizes the objective [21]. However, conventional GS is performed sequentially (i.e., one FN updates its decision at a time), which has two main drawbacks: (1) it takes too long to complete one round of decision updates in large networks, and (2) it assumes that global communication for information exchange is available, which may not hold in distributed Fog systems. To address these problems, we propose Chromatic Parallel Gibbs Sampling (CPGS) to enable distributed decision making. It is worth noting that convergence results for GS are typically available only for sequential Gibbs samplers, and extreme parallelism often cannot guarantee ergodicity and convergence to the Gibbs distribution [22]. The proposed CPGS carefully transforms sequential GS into an equivalent parallel sampler by exploiting the special structure of the considered Fog network using Markov Random Fields and graph coloring, such that ergodicity and convergence are preserved.

IV-B Chromatic Parallel Gibbs Sampling
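To reach the Gibbs distribution, the sequential GS samples the hosting decision $\mathbf{x}_m^{(i)}$ at FN $m$ in iteration $i$ according to a posterior conditional distribution calculated from the hosting profile in iteration $i-1$ as follows:

$$\Pr\big(\mathbf{x}_m^{(i)} = \mathbf{x}_m \mid \mathbf{x}_{-m}^{(i-1)}\big) = \frac{\exp\big(-\Phi(\mathbf{x}_m, \mathbf{x}_{-m}^{(i-1)})/\beta\big)}{\sum_{\mathbf{x}_m' \in \mathcal{X}_m} \exp\big(-\Phi(\mathbf{x}_m', \mathbf{x}_{-m}^{(i-1)})/\beta\big)},$$

where $\mathbf{x}_{-m}$ refers to the hosting decisions of all FNs excluding FN $m$. Intuitively, if the decision update at FN $m$ does not affect the update of FN $m'$ and vice versa (i.e., $\mathbf{x}_m$ and $\mathbf{x}_{m'}$ are conditionally independent), then FNs $m$ and $m'$ can update their decisions independently and simultaneously. To formalize this, we resort to the Markov Random Field (MRF). An MRF is an undirected graph over the FNs’ hosting decisions. On this graph, the set of decisions adjacent to $\mathbf{x}_m$, denoted by $\mathbf{x}_{\mathcal{A}_m}$, is called the Markov Blanket of $\mathbf{x}_m$. Given $\mathbf{x}_{\mathcal{A}_m}$, the decision update of FN $m$ is conditionally independent of the FNs outside $\mathcal{A}_m$, namely $\Pr(\mathbf{x}_m \mid \mathbf{x}_{-m}) = \Pr(\mathbf{x}_m \mid \mathbf{x}_{\mathcal{A}_m})$. Therefore, any two FNs that are not in each other’s Markov Blanket can update their decisions simultaneously. The next proposition establishes a connection between the physical Fog network and the MRF.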
Proposition 1.
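For FN $m$, the Markov Blanket of $\mathbf{x}_m$ consists of the service hosting decisions of the FNs in the two-hop neighborhood $\mathcal{A}_m$ of FN $m$ (excluding FN $m$ itself) on the physical Fog network graph $\mathcal{G}$; and the probability distribution for updating $\mathbf{x}_m$ is:

$$\Pr(\mathbf{x}_m \mid \mathbf{x}_{\mathcal{A}_m}) = \frac{\exp\big(-\Phi_{\mathcal{B}_m}(\mathbf{x}_m, \mathbf{x}_{\mathcal{A}_m})/\beta\big)}{\sum_{\mathbf{x}_m' \in \mathcal{X}_m} \exp\big(-\Phi_{\mathcal{B}_m}(\mathbf{x}_m', \mathbf{x}_{\mathcal{A}_m})/\beta\big)}, \tag{9}$$

where $\Phi_{\mathcal{B}_m}(\cdot)$ collects the terms of the objective $\Phi$ associated with the FNs in $\mathcal{B}_m$, and $\mathcal{B}_m$ is the one-hop neighborhood of FN $m$, including FN $m$ itself.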
Proof.
See online Appendix B [17]. ∎
Proposition 1 implies that when FN $m$ changes its decision $\mathbf{x}_m$, the resulting change in $\Phi$ is the same no matter how an FN outside of $\mathcal{A}_m$ changes its configuration at the same time. This property is the key to enabling parallelism in GS. Moreover, we know from (9) that a decision $\mathbf{x}_m$ is selected with a higher probability if it leads to a lower cost of the FNs in $\mathcal{B}_m$.
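A single local update of the form in (9) can be sketched in Python as follows: an FN evaluates a local cost for each of its feasible hosting decisions, with its Markov Blanket held fixed, and samples from a softmax of the negative cost divided by the temperature. The callable `local_cost` is a hypothetical stand-in for the neighborhood cost; the sketch does not show how that cost is computed.

```python
import math
import random


def gibbs_local_update(feasible, local_cost, beta, rng=random):
    """Sample one FN's new hosting decision from a Gibbs conditional.

    feasible:   list of feasible hosting decisions of this FN
    local_cost: callable decision -> neighborhood cost, with the
                Markov-blanket decisions held fixed (hypothetical)
    beta:       temperature; smaller beta concentrates mass on low cost
    """
    costs = [local_cost(x) for x in feasible]
    c_min = min(costs)
    # Shifting by the minimum avoids overflow and leaves the normalized
    # probabilities unchanged; very large gaps underflow harmlessly to 0.
    weights = [math.exp(-(c - c_min) / beta) for c in costs]
    return rng.choices(feasible, weights=weights, k=1)[0]
```

At a very small temperature the update degenerates into a best response, while a large temperature keeps exploring; a cooling schedule moves between the two regimes.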
Given this, we divide all FNs into $\chi$ groups such that no two FNs within a group are in each other’s Markov Blanket; hence, FNs within the same group can update their configurations in parallel. Intuitively, we would like to minimize $\chi$ in order to achieve the maximal level of parallelization. Finding the minimum value of $\chi$ is equivalent to a graph coloring problem on the MRF. Suppose the MRF is colored with a $\chi$-coloring: each FN is assigned one of $\chi$ colors, and the FNs in its Markov Blanket are assigned colors different from its own. The colored MRF ensures that all FNs with the same color are conditionally independent of each other in the configuration update. Let $\mathcal{S}_c$ denote the set of FNs with color $c$.
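A possible way to obtain the color groups is sketched below in Python. It first forms each FN’s two-hop neighborhood (its Markov Blanket, per Proposition 1) from one-hop neighbor sets, then applies greedy largest-degree-first coloring. Greedy coloring is a swapped-in heuristic: it always produces a valid coloring but not necessarily one with the minimum number of colors, since computing the chromatic number is NP-hard in general. The function names are hypothetical.

```python
def markov_blankets(one_hop):
    """Two-hop neighborhoods, excluding the FN itself.

    one_hop: dict FN id -> set of one-hop neighbors (may include itself)
    """
    blanket = {}
    for m, nbrs in one_hop.items():
        two_hop = set(nbrs)
        for u in nbrs:
            two_hop |= one_hop[u]
        two_hop.discard(m)
        blanket[m] = two_hop
    return blanket


def greedy_coloring(blanket):
    """Color FNs so that no FN shares a color with its Markov Blanket.

    Largest-degree-first greedy: valid but not guaranteed to be minimal.
    Returns (color, groups) with groups[c] = list of FNs of color c.
    """
    color = {}
    for m in sorted(blanket, key=lambda v: -len(blanket[v])):
        used = {color[u] for u in blanket[m] if u in color}
        c = 0
        while c in used:
            c += 1
        color[m] = c
    groups = {}
    for m, c in color.items():
        groups.setdefault(c, []).append(m)
    return color, groups
```

For a chain of five FNs, the two-hop conflicts force three colors, so each CPGS round needs three parallel steps instead of five sequential ones.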
IV-C Distributed Algorithm based on CPGS
Now, we present the distributed algorithm to solve P2 based on CPGS (Algorithm 2). The algorithm works in an iterative manner, as illustrated in Fig. 2. In each iteration $i$, a color set $\mathcal{S}_c$ is chosen according to a prescribed order or randomly. Each FN $m \in \mathcal{S}_c$ goes through two steps: decision update and information exchange. To update its hosting decision, FN $m$ needs two pieces of information: (1) the service demand of its one-hop neighbors, which is exchanged at the beginning of each time slot $t$; and (2) the current hosting decisions $\mathbf{x}_{\mathcal{A}_m}$ of the FNs within its Markov Blanket, which were exchanged in the previous iteration. With this information, FN $m$ computes $\Phi_{\mathcal{B}_m}(\mathbf{x}_m, \mathbf{x}_{\mathcal{A}_m})$ locally for every $\mathbf{x}_m \in \mathcal{X}_m$ by fixing the decisions of the FNs in $\mathcal{A}_m$. Then, FN $m$ samples a new $\mathbf{x}_m$ according to the probability distribution in (9). After the hosting decisions are updated, the chosen FNs send their new decisions to the FNs in their Markov Blankets, which prepares for the next iteration. Note that during the iterations, FNs do not need to actually change their configurations; reconfiguration is only needed after CPGS completes.
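The iteration described above can be simulated on a single machine with the following Python sketch. It assumes the color groups are already computed and that a hypothetical callable `local_cost(m, x_m, profile)` returns the neighborhood cost of FN `m` choosing `x_m` given the current decisions of the other FNs. All FNs of the active color read the same snapshot, which emulates synchronous parallel updates; this is sound only because the coloring guarantees that they are not in each other’s Markov Blanket. The temperature list plays the role of a cooling schedule.

```python
import math
import random


def cpgs(groups, feasible, local_cost, init, betas, rng=random):
    """Chromatic parallel Gibbs sampling over color classes (sketch).

    groups:     list of lists of FN ids, one list per color
    feasible:   dict FN id -> list of feasible hosting decisions
    local_cost: callable (m, x_m, profile) -> neighborhood cost (hypothetical)
    init:       dict FN id -> initial hosting decision
    betas:      iterable of temperatures, one per iteration
    """
    profile = dict(init)
    for i, beta in enumerate(betas):
        color = groups[i % len(groups)]  # prescribed round-robin order
        snapshot = dict(profile)  # all FNs of this color see the same state
        updates = {}
        for m in color:  # independent given the snapshot; parallelizable
            costs = [local_cost(m, x, snapshot) for x in feasible[m]]
            c_min = min(costs)
            weights = [math.exp(-(c - c_min) / beta) for c in costs]
            updates[m] = rng.choices(feasible[m], weights=weights, k=1)[0]
        profile.update(updates)  # information exchange after the update step
    return profile
```

In a deployment, each FN would run only its own inner-loop body and exchange the new decision with its Markov Blanket; the loop over `color` here merely stands in for that parallel step.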
IV-D Performance Analysis of CPGS
Next, we prove the convergence and optimality of CPGS.
Proposition 2 (Convergence and Optimality).
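CPGS converges from any initial distribution to the Gibbs distribution

$$\Pr(\mathbf{x}) = \frac{\exp\big(-\Phi(\mathbf{x})/\beta\big)}{\sum_{\mathbf{x}' \in \mathcal{X}_1 \times \cdots \times \mathcal{X}_M} \exp\big(-\Phi(\mathbf{x}')/\beta\big)}, \tag{10}$$

and as $\beta \to 0$, CPGS converges to the global optimum with probability 1.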
Proof.
See online Appendix C [17]. ∎
Following the classic results on parallel computing in [23], we can analyze the time complexity of CPGS:
Proposition 3 (Time complexity).
Given a $\chi$-coloring of the MRF, CPGS generates a new configuration profile for the FN network with a runtime complexity of $\mathcal{O}(\chi)$.
Proof.
See online Appendix D [17]. ∎
Proposition 3 indicates that CPGS advances the sampling chain for a $\chi$-colored FN network in $\mathcal{O}(\chi)$ runtime rather than $\mathcal{O}(M)$. Typically, $\chi$ is much smaller than $M$, which accelerates the convergence of AFC.
V Simulation
In this section, we evaluate the performance of AFC with MATLAB simulations. We consider a 500 m × 500 m industrial plant served by 16 battery-powered FNs deployed in a mesh layout. The service hosting capacity of FNs is set as . Each FN has a long-term energy constraint Wh and the unit energy consumption of each FN is set as Wh. The serving radius of FNs is set as 120 m, which creates overlapping coverage in our problem. The SNs are randomly scattered in the network according to a homogeneous Poisson point process with density , which generates a total of 52 SNs. Each SN is randomly assigned one service type out of a total of services. For each type of service, the input data size and required CPU cycles for one task are randomly drawn from MB and M, respectively. The decision cycle for Fog configuration (the length of a time slot) is set as 1 min. At the beginning of each time slot, FNs decide their configurations with the proposed algorithm and broadcast them to SNs. Each SN observes the service availability at reachable FNs and determines its SN-FN association based on the Fog configurations and wireless channel conditions. The wireless channel condition is determined by the free-space path loss and the shadowing, which is randomly drawn from a normal distribution . Until the end of the time slot, SNs can send tasks to the associated FN. The task generation of SNs follows a Poisson process with arrival rate . The FNs process the received tasks locally or offload them to the cloud server via the backbone Internet, depending on the service availability and task admission decisions. The CPU frequency is 2 GHz at FNs and 4 GHz at the cloud. The backbone transmission rate is Mb/s and the round-trip time is ms. The proposed algorithm is compared with three benchmarks:
Delay-optimal Fog configuration (D-optimal): FNs are configured to minimize the service delay regardless of the long-term energy constraints. This is a combinatorial optimization problem and can be solved by Gibbs sampling with simulated annealing [21].

Non-cooperative Fog configuration (NCOP): Each FN works independently to serve a dedicated set of SNs. Therefore, FNs simply host the services with the largest expected demand. The long-term energy consumption constraints are enforced by Lyapunov optimization [12].

Single-slot constraint (SSC): Instead of following the long-term constraints, FNs impose an energy constraint in each time slot such that the long-term energy constraints are satisfied. SSC is fully decoupled temporally and therefore can be solved by constrained Gibbs sampling [24].
V-A Runtime Performance Comparison
Fig. 3(a) and Fig. 3(b) depict the time-average service delay and time-average energy deficit, respectively. It can be seen that AFC achieves close-to-optimal service delay while closely following the long-term energy constraints. Specifically, D-optimal achieves the lowest service delay since it is designed to minimize the service delay by fully exploiting the computation resources at FNs regardless of the energy constraints. As a result, D-optimal incurs a large energy deficit, as shown in Fig. 3(b). The main purpose of AFC is to follow the long-term energy constraint of each FN while minimizing the service delay. As can be observed in Fig. 3(b), the time-average energy deficits of AFC and NCOP converge to zero, which means that the long-term energy constraints are satisfied. Moreover, AFC achieves close-to-optimal delay performance. By contrast, NCOP incurs a large service delay because cooperation is removed. The SSC scheme imposes an energy constraint in each time slot, thereby satisfying the long-term energy constraints. However, this makes energy scheduling less flexible across time slots, so SSC cannot handle the temporal variation of service demand well and results in a large service delay.
V-B Service Demand Allocation
Next, we examine how AFC benefits the Fog system. The key idea of AFC is to accommodate more service demand at FNs, thereby avoiding the large service delay incurred by Cloud offloading. Fig. 4 depicts the allocation of service demand in the first 20 time slots when running AFC and NCOP. We see clearly that AFC allows more service demand to be processed within the Fog system by enabling collaborative service hosting. By contrast, NCOP relies more on the cloud server to process SNs’ service demand.
V-C Impact of control parameter $V$
Figure 5 shows the impact of the control parameter $V$ on the performance of AFC. The result presents a tradeoff between the long-term system delay cost and the long-term energy deficit, which is consistent with our theoretical analysis in Theorem 1. With a larger $V$, AFC places more emphasis on the system delay cost and is less concerned with the energy deficit. As $V$ grows to infinity, AFC approaches the optimal delay cost.
V-D Service Hosting Capacity
Fig. 6 depicts the long-term service delay achieved by AFC and NCOP with different service hosting capacities. It can be observed that the service delay decreases as the service hosting capacity increases for both AFC and NCOP. This is because more service demand can be satisfied by the FNs without being sent to the remote cloud. Moreover, comparing AFC and NCOP, we see that the delay reduction achieved by AFC decreases as the service hosting capacity grows. This is because most service demand can be satisfied by an individual FN with a large service hosting capacity, and therefore the role of collaboration diminishes.
V-E Convergence of CPGS
Fig. 7 compares the convergence processes of CPGS and sequential GS for one time slot. It can be observed that CPGS converges much faster than sequential GS. In this particular example, CPGS converges in 10 iterations, whereas sequential GS takes 20 iterations to converge. Moreover, CPGS and sequential GS converge to the same optimal value, which indicates that CPGS preserves the ergodicity and optimality of sequential GS.
VI Conclusion
In this paper, we studied adaptive Fog configuration under energy constraints for IIoT systems. We proposed AFC, an online distributed algorithm for Fog configuration that adapts to both temporal and spatial service demand patterns. The proposed algorithm is easy to implement and provides provable performance guarantees. This work is a valuable step towards optimizing the performance of Fog systems by considering service availability at Fog nodes. However, further efforts are required to put the proposed framework into real-world application. For example, real demand traces of industrial devices should be used for algorithm evaluation; practical issues such as communication failures should be considered during distributed optimization; and a real IIoT platform can be constructed to run the proposed framework and verify its efficacy.
References
 [1] T. Samad, “Control systems and the Internet of Things [technical activities],” IEEE Control Systems, vol. 36, no. 1, pp. 13–16, 2016.
 [2] O. Givehchi and J. Jasperneite, “Industrial automation services as part of the cloud: First experiences,” Proc. of the Jahreskolloquium Kommunikation in der Automation–KommA, Magdeburg, 2013.
 [3] K. Suto, H. Nishiyama, N. Kato, and C.-W. Huang, “An energy-efficient and delay-aware wireless computing system for industrial wireless sensor networks,” IEEE Access, vol. 3, pp. 1026–1035, 2015.
 [4] D. Loshin, “Intelligent industrial fog computing: An overview of the WINEXT architecture for industrial Internet of Things,” WINEXT: The industrial edgeware, Tech. Rep., Apr. 2017.
 [5] I. Azimi, A. Anzanpour, A. M. Rahmani, T. Pahikkala, M. Levorato, P. Liljeberg, and N. Dutt, “HiCH: Hierarchical fog-assisted computing architecture for healthcare IoT,” ACM Trans. on Embedded Computing Systems (TECS), vol. 16, no. 5s, p. 174, 2017.
 [6] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “Mobile edge computing: Survey and research outlook,” arXiv preprint arXiv:1701.01090, 2017.
 [7] J. Xu, L. Chen, and S. Ren, “Online learning for offloading and autoscaling in energy harvesting mobile edge computing,” IEEE Trans. on Cognitive Commun. and Networking, vol. PP, no. P, pp. 1–15, 2017.
 [8] L. Yang, J. Cao, G. Liang, and X. Han, “Cost aware service placement and load dispatching in mobile cloud systems,” IEEE Trans. on Computers, vol. 65, no. 5, pp. 1440–1452, 2016.
 [9] S. Conti, G. Faraci, R. Nicolosi, S. A. Rizzo, and G. Schembra, “Battery management in a green fog-computing node: a reinforcement-learning approach,” IEEE Access, vol. 5, pp. 21126–21138, 2017.
 [10] J. Tordsson, R. S. Montero, R. Moreno-Vozmediano, and I. M. Llorente, “Cloud brokering mechanisms for optimized placement of virtual machines across multiple providers,” Future Generation Computer Systems, vol. 28, no. 2, pp. 358–367, 2012.
 [11] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, “Femtocaching: Wireless content delivery through distributed caching helpers,” IEEE Trans. on Inform. Theory, vol. 59, no. 12, pp. 8402–8413, 2013.
 [12] M. J. Neely, “Stochastic network optimization with application to communication and queueing systems,” Synthesis Lectures on Commun. Networks, vol. 3, no. 1, pp. 1–211, 2010.
 [13] M. S. Yoon, A. E. Kamal, and Z. Zhu, “Requests prediction in cloud with a cyclic window learning algorithm,” in Globecom Workshops (GC Wkshps), 2016 IEEE. IEEE, 2016, pp. 1–6.
 [14] S. T. Maguluri, R. Srikant, and L. Ying, “Stochastic models of load balancing and scheduling in cloud computing clusters,” in INFOCOM, 2012 Proceedings IEEE. IEEE, 2012, pp. 702–710.
 [15] X. Lyu, H. Tian, C. Sengul, and P. Zhang, “Multiuser joint task offloading and resource optimization in proximate clouds,” IEEE Transactions on Vehicular Technology, vol. 66, no. 4, pp. 3435–3447, 2017.
 [16] W. Hu, Y. Gao, K. Ha, J. Wang, B. Amos, Z. Chen, P. Pillai, and M. Satyanarayanan, “Quantifying the impact of edge computing on mobile applications,” in Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems. ACM, 2016, p. 5.
 [17] L. Chen, P. Zhou, L. Gao, and J. Xu. Online appendix: adaptive fog configuration for the industrial internet of things. [Online]. Available: https://www.dropbox.com/s/ow87x6crd43wckk/Online_Appendix_AFC_R.pdf?dl=0
 [18] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Trans. on Pattern Analysis and Machine Intelligence, no. 6, pp. 721–741, 1984.
 [19] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
 [20] W. Zhang, G. Wang, and L. Wittenburg, “Distributed stochastic search for constraint satisfaction and optimization: Parallelism, phase transitions and performance,” in Proceedings of AAAI Workshop on Probabilistic Approaches in Search, 2002.
 [21] D. Welsh, “Simulated annealing: Theory and applications,” 1989.
 [22] D. Newman, P. Smyth, M. Welling, and A. U. Asuncion, “Distributed inference for latent Dirichlet allocation,” in Advances in Neural Information Processing Systems, 2008, pp. 1081–1088.
 [23] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and distributed computation: numerical methods. Prentice Hall, Englewood Cliffs, NJ, 1989, vol. 23.
 [24] S. Ermon, C. Gomes, and B. Selman, “Uniform solution sampling using a constraint solver as an oracle,” in Conf. on Uncertainty in Artificial Intelligence. AUAI Press, 2012, pp. 255–264.