I Introduction
Over the last few years, the demand for cloud computing has grown rapidly, and more and more data and computation services have migrated to geo-distributed data centers. Although this migration has brought significant benefits, power consumption has become an increasingly serious problem as data centers grow. According to [1], Google, Microsoft and Akamai operated almost 1 million, 200,000 and 70,000 servers, respectively, in their geo-distributed data centers, with corresponding power costs on the order of millions of dollars per year. In 2013, the electricity consumed by data centers across the US reached 91 million MWh, and some predictions suggest that data centers will account for 20% of the national annual power consumption by 2030 [2]. Furthermore, it is estimated that a single 1 MW data center powered by thermal power can cause over 10,000 metric tons of CO2 emissions annually [3]. Therefore, it is important both to increase the use of clean energy and to reduce the power cost of data centers.

The recent development of the energy Internet offers further opportunities for eco-friendly power cost optimization of data centers. The concept of the energy Internet was first proposed by Jeremy Rifkin in 2008 [4]. Since 2010, the energy Internet has gradually gained worldwide interest, and a series of national projects have been launched, such as the Future Renewable Electric Energy Delivery and Management (FREEDM) system in the US [5], the E-Energy program in Germany [6] and the Global Energy Internet project in China [7]. The effects of the energy Internet on the power use of geo-distributed data centers can be summarized in the following four aspects:

The implementation of smart electricity pricing, which varies across time and region, can encourage economical workload scheduling among geo-distributed data centers to reduce the total power cost.

The development of storage technology allows power to be generated and consumed at different times, which is important for supply-demand balancing and peak load shifting.

The development of renewable energy can increase the use of clean energy and reduce carbon emissions.

The emergence of electricity retail companies can break the monopoly of the traditional electricity market and allow consumers to buy power from multiple sources at lower prices.
In this paper, we consider the scenario of multi-source power supply in the energy Internet and aim at eco-friendly power cost minimization for geo-distributed data centers by combining three strategies: green and economical buying of power from different sources, geographical scheduling of workload, and temporal scheduling of battery charging and discharging (i.e., power storage).
I-A Related Work
In the literature, many studies have addressed power-cost-driven workload scheduling in geo-distributed data centers. For instance, [8] surveys energy-aware geographical workload balancing, and [9] and [10] proposed energy-aware scheduling methods for MapReduce jobs in data centers. References [11], [12], [13] and [14] proposed a series of linear-programming-based methods, while [15] and [16] proposed a stochastic-process-based method to minimize the power cost; in these methods, more workload is distributed to the data centers with lower local electricity prices, and more delay-tolerant requests are scheduled to time slots with cheaper electricity. Besides workload scheduling, the temporal scheduling of power storage was taken into consideration in [17], and the cooperation between the power use of data centers and the power storage of electric vehicles was studied in [18], [19] and [20]. These strategies can greatly cut down the electricity bills of data centers, but they consider neither the use of renewable energy nor the scenario of multi-source power.

Other works have studied the use of renewable energy and multi-source power for data centers. The opportunities and challenges of harnessing renewable energy for data centers are surveyed in [21]. In addition, [22] and [23] proposed a stochastic-process-based method for a single data center to buy power from multiple sources, while [24] formulated a machine-learning-based method and designed a multi-source power distribution architecture. These works assumed that data centers could buy power from the power grid and could also obtain power from their own storage devices and renewable energy generators. However, such studies rarely consider the cooperation of multiple data centers.
Several works have studied power cost minimization for geo-distributed data centers by jointly considering renewable energy, multi-source power and workload scheduling. [25] proposed a bidding mechanism to reduce the carbon footprint. [26] and [27] proposed geographical workload balancing strategies that dispatch more workload to areas with lower-carbon or more renewable power. These strategies mainly concentrate on workload scheduling and do not consider power buying or harvesting. Liang Yu et al. proposed a carbon-aware power cost minimization strategy in [28] and further considered uncertain power outages in [29]. They assumed that data centers could harvest power from private green microgrids, which also protect them from outages of the bulk power grid, so that more workload would be distributed to data centers equipped with larger microgrids to reduce power cost and carbon emissions. However, the cost of building a private microgrid is very high, and many operators cannot afford it. In contrast, [30] and [31] assumed that data centers could directly buy clean power generated from different kinds of renewable energy on the market, but they ignored the differences in pollution among these energy sources and forced operators to buy a fixed minimum percentage of clean power, which is inflexible. In this paper, we propose a green and economical strategy that dynamically buys power from multiple sources according to their pollution and real-time prices. The main contributions of this paper are introduced below.
I-B Our Contributions
The main objective of this paper is to achieve eco-friendly power cost minimization for geo-distributed data centers by combining clean power buying, temporal power-storage scheduling and geographical workload scheduling; we denote this problem as Problem P&W. In particular, we model the total power cost as the weighted sum of the pollution cost and the monetary cost, and propose a new Pollution Index Function (PIF) to model the pollution costs of different kinds of power, which can effectively increase the use of cleaner power. First, we formulate Problem P&W as a multi-objective programming problem with integer constraints, and then simplify it into a single-objective problem under the assumption that the transmission delay does not exceed an upper bound. Second, we propose a Sequential Convex Programming (SCP) algorithm to find the globally optimal non-integer solution of the simplified Problem P&W, and a low-complexity search method to find a quasi-optimal mixed-integer solution of the simplified problem. Finally, we give the condition under which Problem P&W is equivalent to its simplified version, so that Problem P&W is eventually solved.
Our main contributions are summarized in the following three aspects.

We formulate eco-friendly power cost minimization for geo-distributed data centers as Problem P&W, whose solution can increase clean energy usage by up to 50%–60%, achieve power cost savings of up to 10%–30%, and reduce the delay of requests.

We propose a new PIF to model the pollution costs of different kinds of power, with which our strategy encourages buying power from multiple sources and increases the use of clean energy.

We propose an efficient SCP algorithm and prove mathematically that it obtains the globally optimal non-integer solution of the simplified Problem P&W, even though that problem is nonconvex.
The rest of this paper is organized as follows. Section II introduces the system architecture and formulates the power cost minimization problem. Section III proposes algorithms to solve this problem. Section IV presents simulation results evaluating the performance of our model. Finally, Section V concludes the paper.
II Modeling and Formulation
In this section, we first describe the system model, in which we consider a geo-distributed data center system with multi-source power supply. Then, we model the costs of eco-friendly power buying, storage scheduling and workload scheduling. Finally, the eco-friendly power cost minimization problem is formulated as a multi-objective programming problem.
II-A Overview of the System Model
A cloud computing system usually contains several portal servers and a group of geo-distributed back-end data centers. Each portal server aggregates user requests originating from its service area and then dispatches the corresponding service tasks to the back-end data centers, which possess massive computing, memory and data resources. Our system model is as follows.
As shown in Fig. 1, the cloud computing system contains a set of portal servers and a set of geographically separate data centers. We regard this system as a discrete-time system and study the optimization problem within one time slot. We assume that workload arrives at each portal server at a given rate (requests per second), and the total request rate of the whole system is the sum of these per-portal rates. We further define the request rate dispatched from each portal server to each data center; the rates dispatched by a portal server sum to its arrival rate, and the rates dispatched to a data center sum to the total request rate of that data center.
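The two conservation relations can be checked mechanically. The sketch below is illustrative only: the matrix layout (rows are portal servers, columns are data centers), the function name `check_dispatch` and the numbers are our own assumptions.

```python
from typing import List


def check_dispatch(portal_rates: List[float],
                   dispatch: List[List[float]],
                   tol: float = 1e-9) -> List[float]:
    """Check request-rate conservation and return per-data-center rates.

    dispatch[i][j] is the rate (requests/s) sent from portal server i to
    data center j (hypothetical layout: rows = portals, columns = DCs).
    """
    for i, row in enumerate(dispatch):
        if any(r < 0 for r in row):
            raise ValueError(f"negative dispatch rate from portal {i}")
        # Everything arriving at a portal server must be dispatched somewhere.
        if abs(sum(row) - portal_rates[i]) > tol:
            raise ValueError(f"portal {i}: dispatched {sum(row)} != arrived {portal_rates[i]}")
    # Total request rate received by each data center (column sums).
    n_dc = len(dispatch[0])
    return [sum(row[j] for row in dispatch) for j in range(n_dc)]


# Two portal servers, three data centers (illustrative numbers only).
portal_rates = [100.0, 50.0]
dispatch = [[60.0, 30.0, 10.0],
            [0.0, 20.0, 30.0]]
dc_rates = check_dispatch(portal_rates, dispatch)
print(dc_rates)       # [60.0, 50.0, 40.0]
print(sum(dc_rates))  # 150.0, the total request rate of the system
```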
In addition, for each data center we define the total number of servers and the number of active servers. According to [32], the power consumption of a data center in the present time slot can be approximately calculated as
(1) 
where the parameters are the length of the present time slot (in hours), a parameter related to the CPU frequency, and the basic power consumption of idle servers, e.g., due to the air-conditioning system and communication devices. Note that the number of active servers must be an integer, and it is required to be at least one, i.e., at least one server is active in each data center. When the number of active servers equals the total number of servers, all servers in the data center are active and its maximum power consumption is reached.
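As a hedged illustration of how active servers drive power consumption, the sketch below assumes an affine model in which each active server draws a fixed power and the facility adds a constant overhead. The function `dc_energy_kwh` and its parameters `alpha_kw` and `beta_kw` are hypothetical stand-ins for the parameters of (1), whose exact form may differ.

```python
def dc_energy_kwh(active_servers: int, total_servers: int,
                  slot_hours: float, alpha_kw: float, beta_kw: float) -> float:
    """Energy (kWh) drawn by one data center in a time slot.

    Hypothetical affine model: each active server draws alpha_kw and
    beta_kw covers the basic consumption (cooling, networking). This is a
    stand-in for (1), not its exact form.
    """
    if not isinstance(active_servers, int):
        raise TypeError("the number of active servers must be an integer")
    if not 1 <= active_servers <= total_servers:
        raise ValueError("need 1 <= active servers <= total servers")
    return slot_hours * (alpha_kw * active_servers + beta_kw)


# Energy grows with the number of active servers and peaks when all are on.
print(dc_energy_kwh(400, 1000, 1.0, 0.25, 50.0))   # 150.0
print(dc_energy_kwh(1000, 1000, 1.0, 0.25, 50.0))  # 300.0
```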
Furthermore, we assume that each data center is equipped with a smart energy controller that jointly controls power buying, battery charging and discharging, and the power supply to loads. In a deregulated electricity market, data centers can buy power from more than one source according to price, pollution or other factors. We assume that each data center can buy several kinds of power, and we define the amount of each kind of power bought by the data center. In addition, we use a storage variable whose positive value denotes the power actually consumed when the batteries charge, whose negative value denotes the power actually obtained when the batteries discharge, and which is zero when no power is charged or discharged. Then the total power that a data center needs to buy can be calculated as
(2) 
Here, we assume that the power discharged by storage devices is always less than the power used by the servers in the data center, so that the total power to be bought is positive.
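A minimal sketch of the bookkeeping behind (2), assuming that the energy bought from all sources must equal the server energy plus the signed grid-side storage term. The helper names `power_to_buy` and `purchases_cover` are hypothetical.

```python
from typing import Sequence


def power_to_buy(server_energy: float, storage_grid_side: float) -> float:
    """Total energy a data center must buy in one slot (assumed form of (2)).

    storage_grid_side > 0: energy drawn from the grid to charge batteries;
    storage_grid_side < 0: energy delivered by discharging batteries;
    storage_grid_side == 0: batteries idle.
    """
    total = server_energy + storage_grid_side
    # Discharge never covers the whole server load, so the total stays positive.
    if total <= 0:
        raise ValueError("discharge must be smaller than the server load")
    return total


def purchases_cover(purchases: Sequence[float], required: float, tol: float = 1e-9) -> bool:
    """True if per-source purchases are nonnegative and sum to the requirement."""
    return all(p >= 0 for p in purchases) and abs(sum(purchases) - required) <= tol


print(power_to_buy(150.0, 20.0))   # 170.0 (batteries charging)
print(power_to_buy(150.0, -30.0))  # 120.0 (batteries discharging)
print(purchases_cover([100.0, 50.0, 20.0], 170.0))  # True
```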
II-B Power Cost Model with Pollution Index Function
In this subsection, we model the power cost of data centers that buy power from multiple sources in one time slot, and define the total power cost as the weighted sum of the pollution cost and the monetary cost.
First, we propose the Pollution Index Function (PIF) to model the pollution cost of different power sources (the pollution cost can be regarded as either a virtual or a real cost as required; in this paper, it is paid as a real cost). The PIF gives the pollution cost as a function of the amount of power bought from a given source. To encourage eco-friendly power consumption, the PIF must satisfy the following three assumptions.

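For a given amount of power, the PIF is larger for more polluting power sources.

The PIF is a strictly increasing function of the amount of power bought, and it is zero when no power is bought.

The PIF is a strictly convex function of the amount of power bought.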
According to the first assumption, cleaner power always leads to a lower pollution cost, which encourages the use of cleaner energy. According to the remaining two assumptions, the total cost, the unit cost and the marginal cost [33] of electricity all increase with the amount of power bought, which penalizes the waste of power and encourages users to save power. A typical example of a PIF that meets the above three assumptions is the quadratic function
(3) 
where the coefficient is positive. However, we emphasize that, although an increasing marginal cost contributes to power savings, it may be unfair to users who always need a huge amount of power. For example, a 1 MW data center generally needs much more power than a 1 kW data center, and would therefore pay a higher unit cost, which further increases its already large power bill. Thus, we define the coefficient as the pollution factor of the power supplied by the given supplier in the data center's area, divided by the maximum amount of power that the data center can use in one time slot. With this divisor, a 1 MW data center using 1 MWh and a 1 kW data center using 1 kWh are penalized to the same degree. We also introduce a corresponding power factor.
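The effect of the divisor can be checked numerically. Assuming the quadratic PIF takes the form `gamma * x**2 / p_max` (our reading of the definition above, with hypothetical names), the unit pollution cost of a 1 MW data center using 1 MWh equals that of a 1 kW data center using 1 kWh, whereas without the divisor the larger center would pay 1000 times more per kWh.

```python
def pif(x_kwh: float, gamma: float, p_max_kwh: float) -> float:
    """Assumed normalized quadratic PIF: gamma * x**2 / p_max.

    gamma: pollution factor of the source (hypothetical value below);
    p_max_kwh: most energy the data center can use in one slot.
    """
    return gamma * x_kwh ** 2 / p_max_kwh


def unit_pollution_cost(x_kwh: float, gamma: float, p_max_kwh: float) -> float:
    """Pollution cost per kWh bought."""
    return pif(x_kwh, gamma, p_max_kwh) / x_kwh


gamma = 0.5                                      # illustrative pollution factor
big = unit_pollution_cost(1000.0, gamma, 1000.0)   # 1 MW center using 1 MWh
small = unit_pollution_cost(1.0, gamma, 1.0)       # 1 kW center using 1 kWh
unnormalized = unit_pollution_cost(1000.0, gamma, 1.0)  # divisor dropped
print(big, small, unnormalized)  # 0.5 0.5 500.0
```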
Second, given the electricity price of each power source for each data center, the corresponding monetary cost is the price multiplied by the amount of power bought from that source.
Finally, we define the total power cost of a data center as the weighted sum of the pollution cost and the monetary cost:
(4)  
where the weighting coefficients balance the two costs. In addition, the marginal cost and the unit cost of power are given by
(5)  
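As a worked example (not the exact form of (4)–(5)), suppose the PIF takes the normalized quadratic form above, and write $x_k$ for the energy bought from source $k$, $\gamma_k$ for its pollution factor, $\bar P$ for the data center's maximum energy per slot, $p_k$ for its price, and $\omega_1,\omega_2$ for the two weights; all of these symbols are our own labels. The per-source cost and its marginal and unit costs are then

```latex
C_k(x_k) = \omega_1 \frac{\gamma_k}{\bar P} x_k^2 + \omega_2 p_k x_k, \qquad
\frac{\mathrm{d} C_k}{\mathrm{d} x_k} = 2\omega_1 \frac{\gamma_k}{\bar P} x_k + \omega_2 p_k, \qquad
\frac{C_k(x_k)}{x_k} = \omega_1 \frac{\gamma_k}{\bar P} x_k + \omega_2 p_k .
```

Both the marginal and the unit cost grow linearly in $x_k$ with slopes proportional to $\gamma_k$, so more polluting sources become expensive faster, consistent with the three PIF assumptions.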
II-C Model of Power-Storage Scheduling
In this subsection, we consider both the energy conversion efficiency and the potential cost to model the cost of power-storage scheduling. For each data center, we define the capacity of its batteries, the initial amount of electricity stored in them, and the storage quantity, i.e., the power actually charged into the batteries or actually drawn from them when they discharge. (We are not interested in the actual type of energy stored in the batteries, only in how much power it can convert to, so all three quantities are expressed as amounts of power.) When the storage quantity is positive, the batteries charge; otherwise, they discharge.
First, we consider the energy conversion efficiency of battery charging and discharging [34], which takes values between 0 and 1. The battery-side storage quantity defined above is mapped to the grid-side storage variable of Section II-A through a function defined as
(6) 
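That is, to charge the batteries with a given amount of energy (in kWh), we must actually supply that amount divided by the conversion efficiency; conversely, when the batteries discharge a given amount, we can actually obtain only that amount multiplied by the efficiency. Fig. 2(a) shows a real example of the conversion efficiency [35]. Since this efficiency is not monotonic, we redefine it as in (7) for simplicity.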
(7) 
Then we have
(8) 
The redefined efficiency, shown in Fig. 2(b), is monotonic and may be well fitted by a power function or an exponential function, which will be further analyzed in our simulations.
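The mapping between battery-side and grid-side energy described above can be sketched as follows. The efficiency is passed in as a function of the charge or discharge magnitude, since the text fits it with a power or exponential curve; the flat 0.8 efficiency in the example and the name `grid_side_energy` are hypothetical.

```python
from typing import Callable


def grid_side_energy(battery_side: float,
                     eta: Callable[[float], float]) -> float:
    """Map energy at the battery terminals to energy at the grid side.

    battery_side > 0: energy stored into the batteries (charging);
    battery_side < 0: energy drawn out of the batteries (discharging).
    eta(r) returns the conversion efficiency in (0, 1] at magnitude r;
    its shape is an assumption (e.g., a fitted power or exponential curve).
    """
    e = eta(abs(battery_side))
    if not 0.0 < e <= 1.0:
        raise ValueError("efficiency must lie in (0, 1]")
    if battery_side > 0:
        return battery_side / e   # must supply more than what gets stored
    return battery_side * e       # only part of the discharged energy is usable


def constant_eta(r: float) -> float:
    """Hypothetical flat efficiency, independent of the rate."""
    return 0.8


print(grid_side_energy(40.0, constant_eta))   # 50.0
print(grid_side_energy(-40.0, constant_eta))  # -32.0
```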
Second, present decisions on charging or discharging influence the future cost [34], which we call the potential cost. For instance, the more power the battery discharges at present, the more power it will have to charge in the future, which increases the future power cost. Accordingly, the potential cost can be defined in terms of the storage quantity and an evaluated parameter, which is commonly calculated as a weighted sum, over a number of future hours, of the predicted unit cost of power storage in each future hour, with weight coefficients subject to a normalization constraint. In addition, according to [36], the influence of the current decision on a future hour declines as that hour becomes more distant. In turn, the influence of a future hour on the current decision also declines with its distance, i.e., the weights decrease with the hour index. Furthermore, [36] showed that beyond a certain point further changes are negligible, so the weights of hours beyond that point can reasonably be set to zero.
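The evaluated parameter can be illustrated as a weighted average of predicted unit storage costs. The geometric decay, the cutoff rule and the name `evaluated_parameter` below are hypothetical choices that only mimic the stated properties (normalized, decreasing weights that vanish beyond a cutoff); they are not the weighting used in the paper.

```python
from typing import Optional, Sequence


def evaluated_parameter(predicted_unit_costs: Sequence[float],
                        decay: float = 0.5,
                        cutoff: Optional[int] = None) -> float:
    """Weighted average of predicted unit storage costs over future hours.

    Hypothetical weighting: w_h is proportional to decay**h (h = 0, 1, ...),
    is zero from hour `cutoff` on, and the weights are normalized to sum to one.
    """
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must lie in (0, 1) so the weights decrease")
    n = len(predicted_unit_costs)
    horizon = n if cutoff is None else min(cutoff, n)
    if horizon == 0:
        raise ValueError("need at least one future hour")
    raw = [decay ** h for h in range(horizon)]
    total = sum(raw)
    # zip stops at `horizon`, so hours beyond the cutoff get zero weight.
    return sum((w / total) * c for w, c in zip(raw, predicted_unit_costs))


costs = [0.10, 0.20, 0.40, 0.80]   # illustrative predicted $/kWh, next 4 hours
print(evaluated_parameter(costs, decay=0.5, cutoff=2))  # about 0.1333
```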
Finally, the power cost model of a data center that accounts for storage scheduling extends (4) to
(9)  
The first condition in (9) can be decoupled into a lower bound and an upper bound on the storage quantity. Because excessively fast charging and discharging cause severe damage to storage devices as well as excessive energy waste [37], we assume that these bounds are limited by maximum charging and discharging rates. In addition, the bounds must respect the battery capacity, because the batteries can discharge no more than the stored power and can charge no more than the remaining capacity.
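The bounds on the storage quantity combine rate limits with capacity limits. The sketch below assumes the lower bound is the larger of the negative discharge-rate limit and the negative stored energy, and the upper bound is the smaller of the charge-rate limit and the remaining capacity; `storage_bounds` and the rate parameters are hypothetical names.

```python
from typing import Tuple


def storage_bounds(capacity: float, stored: float,
                   max_charge: float, max_discharge: float) -> Tuple[float, float]:
    """Feasible [lower, upper] range of the battery-side storage quantity.

    Positive values charge, negative values discharge. The rate limits
    are hypothetical parameters protecting the batteries; the capacity
    terms keep the state of charge within [0, capacity].
    """
    if not 0 <= stored <= capacity:
        raise ValueError("stored energy must lie in [0, capacity]")
    lower = -min(max_discharge, stored)          # cannot draw more than is stored
    upper = min(max_charge, capacity - stored)   # cannot exceed the remaining room
    return lower, upper


print(storage_bounds(500.0, 450.0, 100.0, 100.0))  # (-100.0, 50.0)
print(storage_bounds(500.0, 30.0, 100.0, 100.0))   # (-30.0, 100.0)
```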
II-D Joint Optimization of Power Cost and Workload Scheduling
In this subsection, we describe the workload scheduling model with delay constraints, and then formulate the eco-friendly power cost minimization model with workload scheduling. We assume that requests arrive at each data center approximately according to a Poisson process. According to the M/M/n queuing model [11], the average delay of tasks in a data center is given by
(10)
where the two rates involved are the average service rate per server and the average arrival rate of requests to the data center, respectively.
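For reference, the standard M/M/n mean response time (queuing plus service) can be computed with the numerically stable Erlang-B recursion converted to Erlang C. This is the textbook formula shown as a sketch; (10) may be written in a different form, and `mmn_mean_delay` is a hypothetical name.

```python
def mmn_mean_delay(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean response time of an M/M/n queue (waiting time plus service time).

    Uses the Erlang-B recursion, which avoids overflow for large n, and
    converts Erlang B into the Erlang-C probability of waiting.
    """
    a = arrival_rate / service_rate              # offered load in Erlangs
    if servers < 1 or a >= servers:
        raise ValueError("unstable queue: need arrival_rate < servers * service_rate")
    b = 1.0                                      # Erlang B with zero servers
    for k in range(1, servers + 1):
        b = a * b / (k + a * b)
    c = servers * b / (servers - a * (1.0 - b))  # probability that a request waits
    wait = c / (servers * service_rate - arrival_rate)
    return wait + 1.0 / service_rate


print(mmn_mean_delay(0.5, 1.0, 1))   # about 2.0, the M/M/1 value 1 / (mu - lambda)
for n in (9, 10, 12):                # more active servers, lower delay
    print(n, mmn_mean_delay(8.0, 1.0, n))
```

The loop at the end shows the tradeoff discussed next: each extra active server lowers the delay but adds to the power bill.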
A lower queuing delay usually requires more active servers, which leads to a higher electricity bill. Thus, to study the tradeoff between the power cost and the request delay, the cost function of a data center can be defined as
(11)  
where the two coefficients are weight parameters that balance the power cost against the delay.
In addition, to meet the Service-Level Agreement (SLA) of the users, the transmission delay between each portal server and each data center must also be considered. Since the transmission delay usually depends on the routing of the requests, we define it as
(12) 
Here, the transmission delay is an implicit function of the routing-related quantities. For simplicity, following [12], we consider the maximum transmission delay to each data center. Then we have
(13) 
where the bound is the maximum delay that users can tolerate.
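A hedged sketch of the delay check, assuming (13) requires the queuing delay of a data center plus the worst transmission delay among the portal servers that actually route traffic to it to stay within the tolerable delay. The names `worst_transmission_delay` and `meets_sla` and the numbers are hypothetical. The example also hints at why shrinking the worst transmission term enlarges the feasible region.

```python
from typing import Sequence


def worst_transmission_delay(dispatch_col: Sequence[float],
                             delays: Sequence[float]) -> float:
    """Largest transmission delay over portal servers that send traffic to
    this data center; portals with zero dispatch are ignored (assumption)."""
    used = [d for rate, d in zip(dispatch_col, delays) if rate > 0]
    return max(used, default=0.0)


def meets_sla(queue_delay: float, dispatch_col: Sequence[float],
              delays: Sequence[float], max_delay: float) -> bool:
    """Assumed reading of (13): queuing delay + worst transmission <= tolerance."""
    return queue_delay + worst_transmission_delay(dispatch_col, delays) <= max_delay


# Portals 0 and 2 route to this data center; portal 1 (the farthest) does not.
col = [30.0, 0.0, 20.0]
delays = [0.020, 0.080, 0.035]   # seconds
print(worst_transmission_delay(col, delays))   # 0.035
print(meets_sla(0.050, col, delays, 0.100))    # True
```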
Based on (1), (8)–(11) and (13), the overall cost function of all back-end data centers is given by
(14a)  
(14b)  
(14c)  
(14d)  
(14e)  
(14f)  
(14g)  
(14h) 
We aim to minimize this overall cost in order to optimize both the eco-friendly use of power and the delay-satisfying dispatch of workload.
In addition, the term in (14c) should itself be minimized, as is done in (15). In fact, the feasible region defined by (14b)–(14h) expands as this term declines, which is likely to reduce the minimum of the overall cost.
(15a)  
(15b)  
(15c) 
Finally, the whole eco-friendly power cost minimization model with workload scheduling for geo-distributed data centers is defined as Problem P&W, which jointly considers the objectives of (14) and (15).
III Solution and Analysis
Since Problem P&W is a multi-objective programming problem with integer constraints, it is difficult to solve. However, if the term in (14c) is assumed to be a known constant, Problem P&W reduces to the simpler Problem P. Thus, in this section, we first treat this term as a known constant and simplify Problem P&W into Problem P. We propose a Sequential Convex Programming (SCP) algorithm, shown in Algorithm 2, to find the non-integer solution of Problem P. Then we propose a low-complexity search method, shown in Algorithm 3, to find a quasi-optimal mixed-integer solution of Problem P. This solution is very close to the optimal solution obtained by the Branch and Bound (B&B) method, while being computed much faster. Finally, we treat the term as a variable again and give the condition under which Problem P and Problem P&W are equivalent.
III-A Non-integer Solution of the Simplified Problem P&W
In this subsection, we treat the term in (14c) as a known constant and propose an SCP algorithm to find the non-integer solution of Problem P, the simplified version of Problem P&W. We refer to the continuity-relaxed version of Problem P as the relaxed Problem P. The relaxed Problem P is nonconvex because of the nonlinear equality constraint (14d) [38], which involves a nonlinear function of the decision variables rather than a constant. Constraint (14d) is mainly related to the second part of the cost, i.e., the power cost defined in (4). Therefore, we first solve (4) and then plug its solution into (14) to eliminate the per-source purchase variables. In this way, the relaxed Problem P is transformed into a sequence of convex problems, and its globally optimal solution can be obtained by the proposed SCP algorithm.
First, we ignore the nonnegativity constraint on the amount of power bought from each source and establish the Lagrange function of (4) for a data center as
(17) 
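where the additional variable is the Lagrange multiplier. According to the Lagrange multiplier method [38], we set the partial derivatives of the Lagrange function with respect to the multiplier and to each purchase amount to zero. Then we obtain the Lagrange solution of (4) without the nonnegativity constraint as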
(18)  
The corresponding optimal value of the cost in (4) is given by
(19) 
where
(20)  
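Here, the multiplier is the marginal cost of power for the data center when all purchase amounts are positive, and it is the same for every power source. The signs of the coefficients in (19) and (20) follow from the positivity of the model parameters.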
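For the quadratic PIF, the Lagrange step can be carried out in closed form. The derivation below is a hedged illustration in our own notation ($c_k$, $q_k$, $X$, $S$, $Q_1$, $Q_2$ are labels we introduce, with $X$ the total energy the data center must buy), not a transcription of (17)–(20):

```latex
\min_{x_k}\ \sum_k \left(c_k x_k^2 + q_k x_k\right)
\quad \text{s.t.} \quad \sum_k x_k = X,
\qquad c_k = \omega_1 \frac{\gamma_k}{\bar P} > 0,\quad q_k = \omega_2 p_k,

% stationarity of L = \sum_k (c_k x_k^2 + q_k x_k) - \lambda (\sum_k x_k - X):
2 c_k x_k + q_k - \lambda = 0
\;\Rightarrow\; x_k = \frac{\lambda - q_k}{2 c_k},
\qquad
\sum_k x_k = X \;\Rightarrow\; \lambda = \frac{2X + Q_1}{S},

S = \sum_k \frac{1}{c_k},\quad Q_1 = \sum_k \frac{q_k}{c_k},\quad Q_2 = \sum_k \frac{q_k^2}{c_k},

V(X) = \sum_k \frac{\lambda^2 - q_k^2}{4 c_k}
     = \frac{X^2}{S} + \frac{Q_1}{S}\,X + \frac{Q_1^2 - S\,Q_2}{4S}.
```

Here $\lambda$ is the common marginal cost of all sources. The quadratic coefficient $1/S$ is positive, the linear coefficient $Q_1/S$ is positive for positive prices, and the constant term is nonpositive by the Cauchy–Schwarz inequality. A source whose weighted price $q_k$ exceeds $\lambda$ receives a negative $x_k$, which is exactly the case the nonnegativity constraint must handle.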
Second, we reconsider the nonnegativity constraint. It is proved in Appendix A that, for any power source whose optimal Lagrange solution obtained by (18) is negative, the optimal purchase amount under the constraint in problem (4) is zero. Based on this, we propose Algorithm 1 to solve for the optimal purchase amounts in (4) for each data center. We maintain a subset of the power sources whose purchase amounts may be positive, and its complement, whose purchase amounts are fixed to zero. In Algorithm 1, the subset initially contains all sources, and the following two steps are repeated until all remaining purchase amounts are nonnegative: moving the indices of the sources with negative purchase amounts from the subset to its complement, and recalculating the remaining purchase amounts according to (18) and (20). Finally, the purchase amounts of the sources in the complement are set to zero. It is proved in Appendix A that Algorithm 1 obtains the optimal solution of problem (4). In addition, Fig. 3 shows that the optimal solutions obtained by Algorithm 1 and by the interior-point method are almost identical. According to [36], Algorithm 1 converges faster than the interior-point method for this problem.
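Under the same quadratic assumption, the loop of Algorithm 1 (compute the closed-form solution on the current subset, move sources with negative purchases to the zero set, repeat) can be sketched as follows. The coefficient lists `c` (weighted, normalized pollution coefficients) and `q` (weighted prices) and the function name `split_purchase` are our own assumptions, not the paper's notation.

```python
from typing import List, Sequence


def split_purchase(total: float, c: Sequence[float], q: Sequence[float]) -> List[float]:
    """Minimize sum(c_k x_k^2 + q_k x_k) s.t. sum(x_k) == total, x_k >= 0.

    Active-set loop in the spirit of Algorithm 1 for a quadratic
    per-source cost; c_k > 0 and q_k are hypothetical coefficients.
    """
    if total < 0 or any(ck <= 0 for ck in c):
        raise ValueError("need total >= 0 and positive quadratic coefficients")
    active = set(range(len(c)))          # sources whose purchases may be positive
    while True:
        s = sum(1.0 / c[k] for k in active)
        q1 = sum(q[k] / c[k] for k in active)
        lam = (2.0 * total + q1) / s     # common marginal cost (closed form)
        x = [0.0] * len(c)               # sources outside `active` stay at zero
        for k in active:
            x[k] = (lam - q[k]) / (2.0 * c[k])
        negative = {k for k in active if x[k] < 0}
        if not negative:
            return x
        active -= negative               # move them to the zero set and re-solve


# Three sources with equal pollution weight; the third is far too expensive.
print(split_purchase(2.0, [1.0, 1.0, 1.0], [1.0, 2.0, 10.0]))  # [1.25, 0.75, 0.0]
```

Because the purchases on the active set always sum to a nonnegative total, at least one source survives each round, so the loop terminates after at most as many rounds as there are sources.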
Third, we plug the optimal cost shown in (19) into (14) to eliminate the nonlinear constraint, which yields
(21a)  
(21b)  
(21c)  
(21d)  
(21e)  
(21f) 
We denote problem (21) as Problem P1 and its continuity-relaxed version as the relaxed Problem P1. The feasible region of the relaxed Problem P1 becomes a convex set once the nonlinear equality constraint is eliminated. According to [38], the relaxed Problem P1 is a convex problem if, in addition, the objective function (21a) is convex. In Appendix B, we prove that (21a) is convex when the fitted function of the conversion efficiency meets the condition given in Proposition 4, which our simulations indicate is easily satisfied. Therefore, we regard (21a) as a convex function in this paper, so that the relaxed Problem P1 is a convex programming problem.
Based on the above, we propose the SCP algorithm to solve the relaxed Problem P. Its details are given in Algorithm 2, which is proved in Appendix A to obtain the optimal non-integer solution of Problem P. In the SCP algorithm, every time a new version of (19) is obtained, we plug it into problem (21) to obtain a new relaxed Problem P1, which can be solved by standard convex programming methods such as Sequential Quadratic Programming (SQP). Algorithm 1 and Algorithm 2 share a similar structure; the main differences are Lines 3–4 of Algorithm 2, which obtain and solve a new version of the relaxed Problem P1, and Lines 10–12 of Algorithm 2, which are analyzed in Appendix A. We conclude that the proposed SCP algorithm obtains the globally optimal solution of the relaxed Problem P by solving several instances of the convex relaxed Problem P1.
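The outer loop of Algorithm 2, as described above, alternates between fixing the current closed form (19) and solving a convex instance of the relaxed Problem P1, stopping once the closed form no longer changes. The skeleton below shows only that control flow: `solve_p1` and `refresh_closed_form` are hypothetical callbacks standing in for an SQP solve of (21) and for an Algorithm-1-style update, and the toy callbacks in the usage part are ours.

```python
from typing import Any, Callable, Hashable, Tuple


def scp_outer_loop(state: Hashable,
                   solve_p1: Callable[[Hashable], Any],
                   refresh_closed_form: Callable[[Any, Hashable], Hashable],
                   max_iter: int = 100) -> Tuple[Any, int]:
    """Alternate convex solves and closed-form refreshes until a fixed point.

    `state` encodes which closed form (19) is plugged into (21), e.g. the
    sets of sources with positive purchases (an assumption of this sketch).
    """
    for iteration in range(1, max_iter + 1):
        solution = solve_p1(state)                        # one convex problem
        new_state = refresh_closed_form(solution, state)  # update the closed form
        if new_state == state:                            # unchanged: stop
            return solution, iteration
        state = new_state
    raise RuntimeError("no fixed point within max_iter iterations")


# Toy usage: one data center, source costs x^2 + q_k x, demand 2.0.
Q = (1.0, 2.0, 10.0)
DEMAND = 2.0


def toy_solve(active):
    lam = (2.0 * DEMAND + sum(Q[k] for k in active)) / len(active)
    return {k: ((lam - Q[k]) / 2.0 if k in active else 0.0) for k in range(len(Q))}


def toy_refresh(solution, active):
    return frozenset(k for k in active if solution[k] >= 0.0)


solution, iters = scp_outer_loop(frozenset(range(3)), toy_solve, toy_refresh)
print(solution, iters)  # {0: 1.25, 1: 0.75, 2: 0.0} 2
```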
III-B Mixed-integer Solution of Problem P&W
In this subsection, we consider the integer constraints (21f) and first propose Algorithm 3 to find a quasi-optimal mixed-integer solution of Problem P.
Since the B&B method is the most common integer programming algorithm and the foundation of many others [39], [40], [41], we first combine B&B with the proposed SCP algorithm, which we denote as BB-SCP, to find the optimal mixed-integer solution of Problem P. In BB-SCP, the relaxed Problem P must be solved by the SCP algorithm in each branch (iteration). Simulation results show that the number of branches of the B&B algorithm increases rapidly with the number of integer-constrained variables, i.e., the numbers of active servers. For example, when this number reaches about 8, the number of branches grows to more than 5000. Although many studies have tried to improve the convergence speed of the B&B algorithm [39], [42], this remains a severe problem. Therefore, we propose Algorithm 3 to replace BB-SCP and find a quasi-optimal mixed-integer solution of Problem P.
Algorithm 3 is mainly based on a rounding strategy for the integer-constrained numbers of active servers. Specifically, we first solve the relaxed Problem P with the proposed SCP algorithm and round each number of active servers to the closest integer. Then we adjust the obtained integer values and recompute the remaining variables with the SCP algorithm. Table I shows the quasi-optimal solutions obtained by Algorithm 3 and the gaps between these quasi-optimal solutions and the optimal solutions obtained by BB-SCP, which are very small. Based on our simulations, Algorithm 3 not only obtains a quasi-optimal mixed-integer solution of Problem P with a tiny loss, but also needs to call the SCP algorithm only twice, instead of the thousands of times required by BB-SCP. In addition, Table I shows that the gaps in total power cost are negligible, and that the average gap remains small even where individual gaps are not negligible.
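The rounding step can be sketched as follows. The adjustment rule (at least one server, strictly more servers than the offered load so the queue stays stable, never more than the data center's size) and the names `round_active_servers` and `offered_loads` are hypothetical choices made for illustration; the paper's adjustment may differ.

```python
import math
from typing import List, Sequence


def round_active_servers(relaxed: Sequence[float],
                         sizes: Sequence[int],
                         offered_loads: Sequence[float]) -> List[int]:
    """Round relaxed active-server counts to feasible integers.

    Rounds half up to the closest integer, then adjusts: at least one
    server, more servers than the offered load (arrival rate divided by
    per-server service rate), and no more than the data center's size.
    """
    rounded = []
    for m, size, load in zip(relaxed, sizes, offered_loads):
        r = math.floor(m + 0.5)                # closest integer, ties rounded up
        r = max(r, 1, math.floor(load) + 1)    # keep the M/M/n queue stable
        if r > size:
            raise ValueError("data center cannot serve its assigned load")
        rounded.append(r)
    return rounded


print(round_active_servers([3.4, 7.6, 0.2], [10, 10, 10], [3.5, 5.0, 0.0]))  # [4, 8, 1]
```

With the integer counts fixed, the continuous routing and power-buying variables would then be recomputed by the second SCP call, which this sketch does not show.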