
Eco-friendly Power Cost Minimization for Geo-distributed Data Centers Considering Workload Scheduling

The rapid development of renewable energy in the energy Internet is expected to alleviate the increasingly severe power problems of data centers, such as huge power costs and pollution. This paper focuses on eco-friendly power cost minimization for geo-distributed data centers supplied by multi-source power, where both the geographical scheduling of workload and the temporal scheduling of batteries' charging and discharging are considered. In particular, we propose the Pollution Index Function to model the pollution of different kinds of power, which can encourage the use of cleaner power and improve power savings. We first formulate the eco-friendly power cost minimization problem as a multi-objective and mixed-integer programming problem, and then simplify it as a single-objective problem with integer constraints. Secondly, we propose a Sequential Convex Programming (SCP) algorithm to find the globally optimal non-integer solution of the simplified problem, which is non-convex, and then propose a low-complexity searching method to find its quasi-optimal mixed-integer solution. Finally, simulation results reveal that our method can improve clean energy usage by up to 50%–60% and achieve power cost savings of up to 10%–30%, as well as reduce the delay of requests.


I Introduction

Over the last few years, the demand for cloud computing has grown rapidly, and more and more data and computation services are being migrated to geo-distributed data centers. Although this has brought significant benefits, the problem of power consumption has become increasingly serious with the growth of data centers. According to [1], the total numbers of servers in the geo-distributed data centers of Google, Microsoft and Akamai were almost 1 million, 200,000 and 70,000, respectively, and their corresponding power costs were on the order of millions of dollars per year. In 2013, the electricity consumed by data centers across the US was up to 91 million MW.h, and it will account for 20% of the national annual power consumption by 2030 according to some predictions [2]. Furthermore, it is estimated that just a 1-MW data center powered by thermal power can cause over 10,000 metric tons of CO2 emissions annually [3]. Therefore, it is important to improve the usage of clean energy as well as to reduce the power cost of data centers.

The recent development of the energy Internet offers further opportunities for eco-friendly power cost optimization of data centers. The concept of the energy Internet was first proposed by Jeremy Rifkin in 2008 [4]. Since 2010, the energy Internet has gradually gained worldwide interest, and a series of national projects have been launched, such as the Future Renewable Electric Energy Delivery and Management system (FREEDM) in the US [5], the E-Energy program in Germany [6] and the global energy Internet project in China [7]. The effects of the energy Internet on the power use of geo-distributed data centers can be summarized in the following four aspects:

  1. The implementation of smart electricity pricing, which varies across times and areas, can encourage economical workload scheduling among geo-distributed data centers to reduce the total power cost.

  2. The development of storage technology allows power to be generated and consumed at different times, which is important for the supply-demand balancing and peak load shifting.

  3. The development of renewable energy can improve the use of clean energy, and reduce carbon emissions.

  4. The development of electricity-sale companies can break the monopoly of the traditional electricity market, and can allow consumers to buy power from multiple sources at lower prices.

In this paper, we consider the scenario of multi-source power supply in the energy Internet, and aim at eco-friendly power cost minimization for geo-distributed data centers by combining green and economical strategies for buying power from different sources, geographical scheduling of workload, and temporal scheduling of batteries’ charging and discharging (also called power storage scheduling).

I-A Related Work

In the literature, many studies have addressed power-cost-driven workload scheduling in geo-distributed data centers. For instance, [8] is a survey of energy-aware geographical workload balancing, and [9] and [10] proposed energy-aware scheduling methods for MapReduce jobs in data centers. [11], [12], [13] and [14] proposed a series of linear programming based methods, while [15] and [16] proposed a stochastic process based method to minimize the power cost, where more workloads were distributed to the data centers with lower local electricity prices and more delay-tolerant requests were scheduled to the time slots with cheaper electricity. Besides workload scheduling, the temporal scheduling of power storage was taken into consideration in [17], and the cooperation between the power use of data centers and the power storage of electric vehicles was studied in [18], [19] and [20]. These strategies can greatly cut down the electricity bills of data centers, but do not consider the use of renewable energy or the scenario of multi-source power.

Other works have studied the use of renewable energy and multi-source power for data centers. The opportunities and challenges of harnessing renewable energy for data centers were well surveyed in [21]. Besides, [22] and [23] proposed a stochastic process based method for a single data center to buy power from multiple sources, while [24] formulated a machine learning based method and designed a multi-source power distribution architecture. These works assumed that data centers could buy power from the power grid, but could also obtain power from their own storage devices and renewable energy generators. However, such studies rarely consider the cooperation of multiple data centers.

Several works have studied power cost minimization for geo-distributed data centers by jointly considering renewable energy, multi-source power and workload scheduling. [25] proposed a bid mechanism to reduce the carbon footprint. [26] and [27] proposed strategies for geographical workload balancing, where more workloads were dispatched to the areas with lower-carbon power or more renewable power. These strategies mainly concentrated on workload scheduling and did not consider power buying or harvesting. [28] and [29] proposed a carbon-aware power cost minimization strategy, where it was assumed that data centers could harvest power from their private green micro-grids and more workload would be distributed to data centers equipped with larger micro-grids to reduce power cost and carbon emissions. However, the cost of building a private micro-grid is very high and many operators cannot afford it. In contrast, [30] and [31] assumed that data centers could directly buy clean power generated by different kinds of renewable energy from the market, but they did not consider the differences in pollution among the various kinds of renewable energy and forced operators to buy a fixed minimum percentage of clean power, which is not very flexible. In this paper, we aim to propose a green and economical strategy to buy power from multiple sources dynamically according to their pollution and real-time prices. The main contributions of this paper are introduced below.


I-B Our Contributions

The main objective of this paper is to achieve eco-friendly power cost minimization for geo-distributed data centers by combining clean power buying, temporal power-storage scheduling and geographical workload scheduling, which is denoted as Problem P&W. In particular, we model the total power cost as the weighted sum of the pollution cost and the monetary cost, and propose the new Pollution Index Function (PIF) to model the pollution costs of different kinds of power, which can efficiently increase the use of cleaner power. We first formulate Problem P&W as a multi-objective programming problem with integer constraints, and then simplify it into a single-objective problem under the assumption that the transmission delay will not exceed an upper bound. Secondly, we propose an SCP algorithm to find the globally optimal non-integer solution of the simplified version of Problem P&W, and propose a low-complexity searching method to find the quasi-optimal mixed-integer solution of the simplified Problem P&W. Finally, we give the condition under which Problem P&W is equivalent to its simplified version, so that Problem P&W is eventually solved.

Our main contributions can be summarized in three aspects as below.

  1. We formulate the eco-friendly power cost minimization for geo-distributed data centers as Problem P&W, which can improve the clean energy usage up to 50%–60% and achieve power cost savings up to 10%–30%, as well as reduce the delay of requests.

  2. We propose the new PIF to model the pollution costs of different kinds of power, by which our proposed strategies can encourage the power buying from multiple sources and improve the usage of clean energy.

  3. We propose an efficient SCP algorithm. Mathematical proofs show that it can obtain the globally optimal non-integer solution of the simplified Problem P&W, although it is non-convex.

The rest of this paper is organized as follows. Firstly, we introduce the system architecture and formulate the model of this power cost minimization problem in Section II. Secondly, we propose the related algorithm to solve this problem in Section III. Thirdly, we give simulation results about the performance of our model in Section IV. Finally, we conclude the paper in Section V.

II Modeling and Formulation

In this section, we firstly describe the system model, where we consider a geo-distributed data center system with multi-source power supply. Then, we model the costs of eco-friendly power buying, storage scheduling and workload scheduling. The eco-friendly power cost minimization problem is finally formulated as a multi-objective programming problem.

II-A Overview of the System Model

A cloud computing system usually contains several portal servers and a group of geo-distributed back-end data centers. Each portal server aggregates user requests originating from its service area and then dispatches the corresponding service tasks to back-end data centers, which possess massive computing, memory and data resources. Our system model is given as follows.

As shown in Fig.1, there are portal servers and geographically separate data centers in the cloud computing system, where and . We regard this cloud computing system as a discrete time system and discuss the optimization problem in one time slot. We assume that the workload arrival rate to the th portal server is per second, and is the total request rate of the whole system, where . Let define the request rate from the th portal server to the th data center, so that and , where is the total request rate to the th data center.

In addition, we define and as the number of servers and active servers in the th data center, respectively. According to [32], the power consumption of the th data center in the present time slot can be approximately calculated as

(1)

where is the length of the present time slot measured in hours, is a parameter related to the CPU frequency, and represents the basic power consumption of idle servers, e.g., due to the air-conditioning system, communication devices, etc. We note that must be an integer, and means that at least one server is active in each data center. When , all servers in the th data center are active and the maximum power consumption of will be reached.

Furthermore, we consider that each data center is equipped with a smart energy controller to realize the joint control of buying power, charging or discharging batteries, and supplying power for loads. In a deregulated electricity market, data centers can buy power from more than one source according to the price, pollution, or other factors. We assume that the th data center can buy kinds of power and denote as the amount of th power bought by the th data center, where . Besides, we let denote the power amount actually used when we charge batteries, let denote the power amount we actually get when batteries discharge, and let when no power is charged or discharged. Then the total power that the th data center needs to buy can be calculated as

(2)

Here, we assume that the amount of power discharged by storage devices is always less than that used by servers in the th data center, so that .

Fig. 1: System model of geo-distributed data centers

II-B Power Cost Model with Pollution Index Function

In this subsection, we will model the power cost of data centers when we buy power from multiple sources in one time slot, and define the total power cost as the weighted sum of the pollution cost and the monetary cost.

Firstly, the Pollution Index Function (PIF) is proposed to model the pollution cost of different power sources. (Footnote 1: The pollution cost can be regarded as either a virtual cost or a real cost, depending on the need. In this paper, it is paid as a real cost.) We denote as the PIF with respect to . In order to encourage eco-friendly power consumption, the PIF is required to satisfy the following three assumptions.

  1. is larger for more polluting power sources, for a given .

  2. is a strictly increasing function of , and .

  3. is a strictly convex function of .

According to the first assumption, cleaner power always leads to less pollution cost, which will encourage the use of cleaner energy. According to the remaining two assumptions, the total cost, the unit cost and the marginal cost [33] of electricity will all increase with the growth of , which will penalize the waste of power and encourage users to save power. A typical example of PIF that meets the above three assumptions is the quadratic function

(3)

where is a coefficient. However, we should emphasize that the increase of the marginal cost with the growth of may be unfair to users who always need a huge amount of power, although it contributes to power savings. For example, a 1-MW data center generally needs much more power than a 1-kW data center and thus has to pay a higher unit cost, which further increases its already huge power bill. Thus, we define where is the pollution factor of power supplied by the th supplier located in the area of , and is the maximum amount of power that can be used by in one time slot. The introduction of the divisor means that a 1-MW data center using 1 MW.h and a 1-kW data center using 1 kW.h are penalized to the same degree. We denote as the power factor.
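The scaling argument above can be checked numerically with a minimal sketch. The symbols of the original formula were lost in this copy, so the code assumes the quadratic PIF has the form (pollution factor / maximum energy) × energy², and the names `pollution_cost`, `unit_pollution_cost`, `pollution_factor` and `max_energy_kwh` are hypothetical, chosen only for illustration.

```python
def pollution_cost(energy_kwh, pollution_factor, max_energy_kwh):
    """Quadratic pollution index (assumed form): coeff * energy_kwh**2."""
    # Assumption: the coefficient is the pollution factor divided by the
    # maximum energy the data center can use in one time slot.
    coeff = pollution_factor / max_energy_kwh
    return coeff * energy_kwh ** 2


def unit_pollution_cost(energy_kwh, pollution_factor, max_energy_kwh):
    """Pollution cost per kW.h bought; defined as 0.0 when nothing is bought."""
    if energy_kwh == 0:
        return 0.0
    return pollution_cost(energy_kwh, pollution_factor, max_energy_kwh) / energy_kwh


# A 1-MW site using 1 MW.h and a 1-kW site using 1 kW.h in a one-hour slot.
big = unit_pollution_cost(1000.0, pollution_factor=0.5, max_energy_kwh=1000.0)
small = unit_pollution_cost(1.0, pollution_factor=0.5, max_energy_kwh=1.0)
print(big, small)
```

Under this assumed form, both sites pay the same pollution cost per kW.h when they run at the same fraction of their capacity, while the cost remains strictly increasing and strictly convex in the amount bought.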

Secondly, let denote the electricity price of the th power source for the th data center, so that the corresponding monetary cost is .

Finally, we define the total power cost for the th data center as the weighted sum of the pollution cost and the monetary cost as follows:

(4)

where . In addition, we denote and as the marginal cost and unit cost of power, which are given by

(5)

II-C Model of Power-Storage Scheduling

In this subsection, we consider both the energy conversion efficiency and the potential cost to model the cost of power-storage scheduling. We denote as the capacity of batteries in the th data center, as the initial amount of electricity stored in the batteries, as the power actually charged in the batteries or the power actually used when the batteries discharge, and . (Footnote 2: We are not interested in the actual type of energy stored in the batteries and are only interested in how much power it can be converted to, so that , and are all denoted in terms of the amount of power.) When , the batteries charge. Otherwise, the batteries discharge.

Firstly, we consider the energy conversion efficiency of battery charging and discharging [34], denoted as or for short, where . The relationship between and is given by , where is defined as

(6)

That is, if we want to charge batteries with kW.h, we need to actually supply kW.h. On the contrary, when batteries discharge kW.h, we can actually only obtain kW.h. Fig. 2(a) shows a real example of [35]. Given that is not monotonic, we redefine as in (7) for simplicity.

(7)

Then we have

(8)

The graph of is shown in Fig. 2(b), which is monotonic. It is observed that may be well fitted with a power function or an exponential function, which will be further analyzed in our simulations.

(a) An example of
(b) An example of
Fig. 2: Example of and
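The charge/discharge conversion described in this subsection can be illustrated with a short sketch. It assumes a single constant efficiency in (0, 1], whereas the paper fits an efficiency that depends on the charged amount; the function name `grid_side_energy` and its parameters are hypothetical.

```python
def grid_side_energy(stored_kwh, efficiency):
    """Energy seen on the supply side for a change of `stored_kwh` in the battery.

    stored_kwh > 0: charging, so more than stored_kwh must be supplied.
    stored_kwh < 0: discharging, so less than |stored_kwh| is delivered.
    Assumption: `efficiency` is a constant, not a function of stored_kwh.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if stored_kwh >= 0:
        return stored_kwh / efficiency
    return stored_kwh * efficiency


print(grid_side_energy(10.0, 0.8))   # energy to supply when storing 10 kW.h
print(grid_side_energy(-10.0, 0.8))  # (negative) energy obtained when releasing 10 kW.h
```

With a constant efficiency the mapping is monotonically increasing in `stored_kwh`, which is the property the redefinition in (7) is meant to secure for the fitted curve.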

Secondly, another issue of concern is that present decisions on charging or discharging will influence the future cost [34], which is referred to as the potential cost. For instance, the more power the battery discharges at present, the more power it will have to charge in the future, which will increase the future power cost. Accordingly, the potential cost can be defined as , where is an estimated parameter that is commonly calculated as , where represents the index of a future hour, represents the number of future hours, represents the predicted unit cost of power storage at the th hour and is a weight coefficient with the constraint of . In addition, according to [36], the influence of the current on the future will decline with the increase of . In turn, the influence of on the current will also decline with the increase of , i.e., , for . Furthermore, it was shown in [36] that after the point further changes are negligible. Thus we can reasonably make , where .
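As an illustration of the weighted-sum estimate described above, the following sketch computes a potential unit cost from hourly predictions. The geometric decay of the weights is an assumption made here for concreteness (the text only requires weights that sum to one and decline for hours further in the future); `decaying_weights`, `potential_unit_cost` and `decay` are hypothetical names.

```python
def decaying_weights(horizon_hours, decay=0.5):
    """Weights for future hours 1..horizon_hours that decline and sum to one."""
    if horizon_hours < 1 or not 0.0 < decay < 1.0:
        raise ValueError("need horizon_hours >= 1 and 0 < decay < 1")
    raw = [decay ** h for h in range(1, horizon_hours + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def potential_unit_cost(predicted_costs, decay=0.5):
    """Weighted average of predicted unit storage costs over the horizon."""
    weights = decaying_weights(len(predicted_costs), decay)
    return sum(w * c for w, c in zip(weights, predicted_costs))


# Three future hours; nearer hours count more than later ones.
print(potential_unit_cost([0.10, 0.20, 0.40]))
```

Truncating the horizon, as the text suggests, simply corresponds to passing a shorter list of predictions.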

Finally, the power cost model considering storage scheduling for the th data center can be extended from (4) to

(9)

The first condition in (9) can be decoupled into and , where and are denoted as the lower bound and upper bound of , respectively. Due to the fact that an excessively high speed of charge and discharge will cause severe damage to storage devices as well as exorbitant wastes of energy [37], we assume that and . In addition, we have , because batteries can discharge no more than the stored power and can charge no more than the remaining capacity.

II-D Joint Optimization of Power Cost and Workload Scheduling

In this subsection, we will describe the workload scheduling model considering delay constraints, and then formulate the eco-friendly power cost minimization model with workload scheduling. We assume that the arrival rate of requests to the th data center approximately obeys a Poisson distribution. According to the M/M/n queuing model [11], the average delay of tasks in the th data center is given by

(10)

where and are the average service rate per server and the average arrival rate of requests to the th data center, respectively.
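For readers who want to experiment with the delay model, the sketch below computes the mean time in system of a textbook M/M/n queue using the Erlang C formula. This is the standard queueing result, not necessarily the exact expression used in (10), which may be a simplified form; the names `erlang_c` and `mean_delay` are hypothetical.

```python
def erlang_c(servers, offered_load):
    """Probability that an arriving request has to wait in an M/M/n queue."""
    if servers < 1 or offered_load < 0 or offered_load >= servers:
        raise ValueError("need servers >= 1 and 0 <= offered_load < servers")
    blocking = 1.0  # Erlang B value for zero servers
    for k in range(1, servers + 1):
        # Standard Erlang B recursion, numerically stable for large k.
        blocking = offered_load * blocking / (k + offered_load * blocking)
    utilization = offered_load / servers
    return blocking / (1.0 - utilization * (1.0 - blocking))


def mean_delay(servers, service_rate, arrival_rate):
    """Mean time in system (waiting plus service), in the time unit of the rates."""
    wait_prob = erlang_c(servers, arrival_rate / service_rate)
    mean_wait = wait_prob / (servers * service_rate - arrival_rate)
    return mean_wait + 1.0 / service_rate


print(mean_delay(servers=2, service_rate=2.0, arrival_rate=1.0))
print(mean_delay(servers=4, service_rate=2.0, arrival_rate=1.0))
```

Activating more servers lowers the delay but raises power consumption, which is the trade-off that the cost function in (11) weighs.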

A lower queuing delay usually means that more servers are active, which will cause a higher electricity bill. Thus in order to study the trade-off between the power cost and the request delay, the cost function of the th data center can be defined as

(11)

where and are weight parameters.

In addition, in order to meet the Service-Level Agreement (SLA) of the users, the transmission delay between the th portal server and the th data center should also be considered. Since the transmission delay usually depends on the routing of the request, , we define as

(12)

Here, is an implicit function with respect to , and . For simplification, we denote , or for short, as the maximum of according to [12], where . Then we have

(13)

where is the maximum delay that users can tolerate.

Based on (1), (8)–(11) and (13), the integral cost function of all back-end data centers is given by

(14a)
(14b)
(14c)
(14d)
(14e)
(14f)
(14g)
(14h)

We aim to minimize , in order to optimize the eco-friendly use of power and delay-satisfied dispatch of workload.

In addition, the term in (14c) should be further minimized as is done in (15). In fact, the definition domain of defined by (14b)–(14h) will be expanded with the decline of , which is likely to make the minimum of smaller.

(15a)
(15b)
(15c)

Finally, the whole eco-friendly power cost minimization model with workload scheduling for geo-distributed data centers can be defined as

(16)

For convenience, we denote Problem (16) as Problem P&W, which is a multi-objective programming problem, and denote Problem (14) and Problem (15) as Problem P and Problem W, respectively.

III Solution and Analysis

As Problem P&W is a multi-objective programming problem with integer constraints, it is difficult to solve. But if we assume that in (14c) is a known constant, then Problem P&W can be simplified into the easier Problem P. Thus in this section, we first regard as a known constant to simplify Problem P&W as Problem P. We propose a Sequential Convex Programming (SCP) algorithm, shown in Algorithm 2, to find the non-integer solutions of Problem P. Then we propose a low-complexity searching method, shown in Algorithm 3, to find the quasi-optimal mixed-integer solutions of Problem P. This approach obtains a quasi-optimal solution that is very close to the optimal solution obtained by the Branch and Bound (B&B) method, while converging much faster. Finally, we reconsider as a variable and give the equivalence condition of Problem P and Problem P&W.

III-A Non-integer Solution of Simplified Problem P&W

In this subsection, we just regard in (14c) as a known constant and propose an SCP algorithm to find the non-integer solution of Problem P, the simplified version of Problem P&W. We denote the continuity-relaxed version of Problem P as Problem P. It is observed that Problem P is non-convex because of the non-linear equality constraint (14d) [38], in which is a non-linear function of rather than a constant. As is seen, (14d) is mainly related to the second part of , which is defined in (4). Therefore, we will first solve (4) and then plug the solution of into (14) to eliminate variables . By this, Problem P can be transformed to sequential convex problems and its globally optimal solution can be obtained by our proposed SCP algorithm.

First, we ignore the constraint and establish the Lagrange Dual Function of (4) for the th data center as

(17)

where is the Lagrange factor. According to the Lagrange Multiplier method [38], we set and for . Then we can obtain the Lagrange Dual Solution of (4) without as

(18)

The corresponding optimal function value of in (4) can be given by

(19)

where

(20)

Here, is the marginal cost of power for the th data center when for , which is independent of . Note that , , and since , and .

Secondly, we reconsider the inequality constraint . It is proved in Appendix A that for the whose optimal Lagrange solution obtained by Eq. (18) is less than zero, its optimal solution with the constraint in problem (4) is zero. Based on this, we propose Algorithm 1 to solve for the optimal in (4) for the th data center. We denote as a subset of , where for , and . In Algorithm 1, we make at first, and then repeat the following two steps until all are no less than zero: moving the indices of from the subset to , and recalculating the remaining according to (18) and (20). Finally, we make and . It is proved in Appendix A that Algorithm 1 can obtain the optimal solution of Problem (4). In addition, Fig. 3 shows that the optimal solved by Algorithm 1 and by the Interior Point method are almost identical. According to [36], Algorithm 1 converges faster than the Interior Point method when solving .

1:Initialize , , , , and for the th data center. Make and .
2:Repeat on the renewed subset
3:Build the Lagrange Dual Function as (17)
4:Calculate , and according to (20)
5:Calculate as (18)
6:If there is any then
7:Move all indices of from to
8:End If
9:Until No new index is moved into .
10:Make for and for .
Algorithm 1 Obtain Optimal in (4)
Fig. 3: Comparison of Algorithm 1 and Interior Point Method
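The iterative idea behind Algorithm 1 (solve the equality-constrained Lagrangian system, drop every source whose amount comes out negative, and repeat) can be sketched as follows. The sketch assumes a separable quadratic cost of the form a_k x_k² + p_k x_k per source, with the pollution and price weights already folded into `quad_coeffs` and `prices`; these names and the function `split_purchase` are hypothetical and do not reproduce the paper's exact notation in (17)–(20).

```python
def split_purchase(total_kwh, quad_coeffs, prices):
    """Minimize sum(a_k*x_k**2 + p_k*x_k) s.t. sum(x_k) = total_kwh, x_k >= 0.

    Assumption: every quadratic coefficient a_k is strictly positive.
    """
    if total_kwh < 0 or any(a <= 0 for a in quad_coeffs):
        raise ValueError("need total_kwh >= 0 and strictly positive coefficients")
    active = set(range(len(prices)))   # sources still allowed to be bought
    amounts = [0.0] * len(prices)      # dropped sources keep amount zero
    while active:
        # Stationarity: 2*a_k*x_k + p_k = multiplier for every active source.
        inv = sum(1.0 / (2.0 * quad_coeffs[k]) for k in active)
        shift = sum(prices[k] / (2.0 * quad_coeffs[k]) for k in active)
        multiplier = (total_kwh + shift) / inv
        trial = {k: (multiplier - prices[k]) / (2.0 * quad_coeffs[k]) for k in active}
        negative = {k for k, x in trial.items() if x < 0}
        if not negative:
            for k, x in trial.items():
                amounts[k] = x
            break
        active -= negative             # fix these sources at zero and re-solve
    return amounts


# The second source is too expensive and is dropped after the first pass.
print(split_purchase(2.0, [1.0, 1.0], [0.0, 10.0]))
```

Each pass either terminates or removes at least one source, so the loop runs at most as many times as there are sources. The multiplier plays the role of the common marginal cost across all sources that are actually bought.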

Thirdly, we plug the solution of , shown in (19), into (14) to eliminate the non-linear constraint. Then we have

(21a)
(21b)
(21c)
(21d)
(21e)
(21f)

where . We denote problem (21) as Problem P1 and denote the continuity-relaxed version of problem (21) as Problem P1. We find that the feasible region of Problem P1 turns into a convex set after eliminating the nonlinear equality constraint. According to [38], only if the objective function (21a) is also convex can Problem P1 be proved to be a convex problem. In Appendix B, we prove that (21a) is convex when the fitted function of meets the condition shown in Proposition 4, which is easily met according to our simulations. Therefore, we regard (21a) as a convex function in this paper, so that Problem P1 is a convex programming problem.

Based on all of the above, we propose the SCP algorithm to solve Problem P. The details of the SCP algorithm are shown in Algorithm 2, which is proved to obtain the optimal non-integer solution of Problem P in Appendix A. In the SCP algorithm, every time we solve a new version of , we plug into problem (21) and obtain a new Problem P1, which can be solved by standard convex programming methods, such as Sequential Quadratic Programming (SQP). Algorithm 1 and Algorithm 2 adopt similar architectures and the main differences between them include Lines 3–4 in Algorithm 2 to obtain and solve a new version of Problem P1, and Lines 10–12 in Algorithm 2 analyzed in Appendix A. We conclude that our proposed SCP algorithm can obtain the global optimal solution of Problem P by solving several instances of convex Problem P1.

1:Initialize relevant parameters. Make and for .
2:Repeat
3:Calculate , and on subset according to (20) for , and plug them into Problem P1.
4:Solve Problem P1 with convex programming method, such as SQP.
5:Calculate and as in (18) and (20), respectively, on subset for .
6:For
7:If there is any then
8:Move all indices of from to
9:End If
10:If there is any then
11:Move all indices of back to
12:End If
13:End For
14:Until No is newly moved into or for
15:Output the renewed , , , and for .
Algorithm 2 Sequential Convex Programming Algorithm

III-B Mixed-integer Solution of Problem P&W

In this subsection, we consider the integer constraints (21f) and propose Algorithm 3 to find a quasi-optimal mixed-integer solution of Problem P at first.

Considering that the B&B method is the most common integer programming algorithm and is the foundation of others [39] [40] [41], we first combine B&B with our proposed SCP algorithm, denoted as BB-SCP, to find the optimal mixed-integer solution of Problem P. In BB-SCP, we need to solve Problem P by the SCP algorithm in each branch (iteration). Simulation results show that the number of branches explored by the B&B algorithm increases rapidly with the growth of , the number of integer-limited variables . For example, when reaches about 8, the number of branches will grow to more than 5000. Although many studies have tried to improve the convergence speed of the B&B algorithm [39] [42], it remains a severe problem. Therefore, we propose Algorithm 3 to replace BB-SCP and find a quasi-optimal mixed-integer solution of Problem P.

Algorithm 3 mainly refers to the rounding strategy towards integer-limited . In detail, we first solve Problem P by our proposed SCP algorithm and make each equal to the closest integer. Then we adjust the obtained integer-values of and recalculate the remaining variables by the SCP algorithm again. Table I shows the quasi-optimal solutions obtained by Algorithm 3 and the gaps between the quasi-optimal solutions and the optimal solutions obtained by the BB-SCP method, where the gaps of can be seen to be very small. According to our simulation, we can conclude that Algorithm 3 can not only obtain a quasi-optimal mixed-integer solution of Problem P with a tiny loss, but also needs to call the SCP algorithm only twice instead of thousands of times as required by the BB-SCP algorithm. In addition, we also find from Table I that the gaps of total power cost are negligible, and the average remains small despite the non-ignorable gaps.
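The general flavor of a rounding strategy can be shown with a generic round-and-repair heuristic. This is not the authors' Algorithm 3, whose adjustment rules are not fully recoverable from this copy: the sketch only rounds each relaxed server count to the nearest integer, keeps it between one and the number of installed servers, and then adds servers while the queue would be unstable. All names (`round_active_servers`, `relaxed_counts`, `max_servers`) are hypothetical.

```python
import math


def round_active_servers(relaxed_counts, arrival_rates, service_rate, max_servers):
    """Round relaxed active-server counts and repair obvious infeasibilities."""
    rounded = []
    for n_relaxed, lam, n_max in zip(relaxed_counts, arrival_rates, max_servers):
        n = int(math.floor(n_relaxed + 0.5))  # nearest integer
        n = max(1, min(n, n_max))             # at least one, at most installed
        # Repair step (assumption): total service capacity must exceed arrivals.
        while n * service_rate <= lam and n < n_max:
            n += 1
        rounded.append(n)
    return rounded


print(round_active_servers([2.4, 0.2, 7.6], [3.0, 0.5, 5.0], 1.0, [10, 10, 7]))
```

After such a rounding step, the remaining continuous variables would be recomputed with the integer counts held fixed, which mirrors the second SCP call described above. If a data center cannot be made stable even with all servers active, the sketch leaves the count at its maximum and the workload split has to change instead.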

Initialize relevant parameters, solve Problem P with our proposed SCP algorithm, and then adjust the values of as follows.
round
, where and
Denote as the th smallest element of vector
Denote as the index of for
If then
=ceil{abs()}, where
For
If then
Add 1 to