Task scheduling is the key problem in computational grid research, and it is commonly studied as a task allocation problem. Task allocation aims to balance the load fairly across computational nodes while minimizing either the average cost of running a task or the average task execution time, which is generally regarded as achieving the best performance. Unlike a traditional distributed system, a computational grid has its own characteristics: the cost or execution time of tasks running on computational nodes is only one aspect of the whole system. Because a computational grid often covers a wide area, communication cost is also a considerable part of the system cost.
In this paper, we propose a game-theoretic solution to the grid load balancing problem. Unlike other research, our work focuses on minimizing the cost incurred by the system when executing tasks rather than the execution time of tasks. In this solution, the cost is the objective of the game, and the slicing strategy of each scheduler is the game strategy.
In general, job allocation algorithms in distributed systems can be classified as static or dynamic. In static algorithms, job allocation decisions are made at compile time and remain constant during runtime. For example, Kim and Kameda proposed a simplified load balancing algorithm that minimizes the overall mean job response time by adjusting each node's load in a distributed computer system consisting of heterogeneous hosts, based on the single-point algorithm originally presented by Tantawi and Towsley. Grosu and Leung formulated the static load balancing problem in single-class job distributed systems as a cooperative game among computers. There are also several studies on static load balancing in multi-class job systems [5, 6]. In contrast, dynamic job allocation algorithms use runtime state information to make better-informed job allocation decisions. Delavar introduced a scheduling algorithm based on a genetic algorithm for scheduling heterogeneous tasks on heterogeneous resources, which achieves a better makespan and higher efficiency. Fujimoto proposed an algorithm, RR, that uses a criterion called total processor cycle consumption, which is the total number of instructions the grid could compute until the completion time of the schedule. Regardless of how the speed of each processor varies over time, the consumed computing power is bounded by a factor, depending on the number of processors and the number n of independent coarse-grained tasks of equal length, times the optimal one.
For balanced task scheduling, [9, 10, 11] proposed models and task scheduling algorithms for distributed systems based on the market model and game theory, and [12, 13] introduced a balanced grid task scheduling model based on a non-cooperative game. The QoS-based grid job allocation problem has also been modeled as a cooperative game, with the structure of the Nash bargaining solution given. Wei and Vasilakos presented a game-theoretic method for scheduling dependent cloud computing services under time and cost constraints, in which tasks are divided into subtasks. These works generally take the scheduler or job manager as the player of the game and the total execution time of tasks as the optimization goal. They either prove the existence of a Nash equilibrium and give an algorithm for computing it, or model the task scheduling problem as a cooperative game and give the structure of its solution.
2 Multi-objective Non-cooperative Game Model
2.1 System Model
Fig. 1 illustrates a relatively complex computational grid, which consists of users, schedulers and computational nodes:
Users. Users submit tasks to schedulers. Each user generates tasks independently, and task arrivals follow a Poisson distribution.
Schedulers. Schedulers accept tasks from the users and, according to the number of computational nodes, divide each task into slices (one slice from scheduler i to each computational node j). The slices satisfy the constraint of Equation (1).
Computational Nodes. Computational nodes execute the task slices. Each node has an average processing rate, and its processing time may follow any distribution, so every computational node can be treated as an M/G/1 queuing system. Each scheduler has an average task arrival rate. Two constraints must be satisfied, as Equations (2) and (3) define.
2.2 Costs for Tasks in Computational Grid
Chandy proposed a cost model for tasks in a computational grid. It includes four kinds of costs: power cost, network cost, loss cost and utilization cost. Their computational methods are listed in Table 1. The total cost is the sum of the four kinds of costs.
|Cost Types|Computational Methods|Parameter Meanings|
|---|---|---|
|Power Cost| |Cost per Joule; capacity; utilization time of the resources|
|Network Cost| |Cost of bandwidth; average bandwidth utilized; utilization cost per unit time; utilization time of the network|
|Loss Cost| |Cost of the resources; failure time; utilization time of the related resources|
|Utilization Cost| |Fixed cost of amortization; related parts of the resources; utilization factor of the resources; utilization time of the resources|
2.3 Game Model
Given the average data length of a task, the transmission time of a task slice is defined by Equation (4) in terms of the transmission delay and the bandwidth between scheduler i and computational node j.
A computational node in a computational grid can be treated as an M/G/1 queuing system. The task processing time on a computational node includes servicing time and waiting time, which are computed as Equations (6) and (7) define.
These equations depend on the mean and the variance of the task servicing time on computational node j, and on the average arrival rate of scheduler i.
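As a hedged illustration of how the mean and variance of the servicing time enter an M/G/1 computation, the sketch below uses the standard Pollaczek-Khinchine formula for the mean waiting time. It is not necessarily identical in form to Equations (6) and (7); the function name and parameter names are assumptions made for this example only.

```python
def mg1_times(arrival_rate, mean_service, var_service):
    """Mean servicing, waiting and total time in a stable M/G/1 queue.

    arrival_rate  -- Poisson arrival rate at the node (assumed name)
    mean_service  -- mean of the servicing time (first moment)
    var_service   -- variance of the servicing time
    """
    utilization = arrival_rate * mean_service
    if not 0.0 <= utilization < 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    # Second moment of the servicing time: E[S^2] = Var[S] + E[S]^2.
    second_moment = var_service + mean_service ** 2
    # Pollaczek-Khinchine mean waiting time in the queue.
    waiting = arrival_rate * second_moment / (2.0 * (1.0 - utilization))
    return mean_service, waiting, mean_service + waiting
```

Only the first two moments of the servicing time are needed, which is why no particular servicing-time distribution has to be assumed.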
The power cost of processing a task slice on a computational node is composed of the power cost during waiting time and the power cost during servicing time. Substituting Equation (6) and Equation (7) into the power cost computation method in Table 1, we obtain the power cost of a task slice on the node, as Equation (8) defines.
Here, the capacity of the node during servicing time is distinguished from its capacity during waiting time.
The loss cost of processing a task slice on a node is determined by the processing time on that node. Substituting Equation (6) and Equation (7) into the loss cost computation method in Table 1, we obtain the loss cost of a task slice on the node, as Equation (9) defines.
The resource utilization cost of a task slice on a node is composed of two parts: the CPU utilization cost and the hard disk utilization cost. The CPU utilization percentage is determined by the average computation of tasks and the computation provided by the node, and the CPU utilization cost occurs during servicing time. The disk utilization percentage is determined by the average size of tasks in bits and the disk space of the node, and the hard disk utilization cost occurs during both servicing time and waiting time. The resulting utilization cost is as Equation (10) defines.
Thus, the total cost of a task slice is the sum of its power, network, loss and utilization costs.
The schedulers share the set of computational nodes in a computational grid. Each scheduler is independent and competes with the others to minimize its own task processing cost. Thus, a non-cooperative game exists among the schedulers, and every scheduler acts as a player of the game. The strategy of each scheduler is its task slicing strategy. In the game, every scheduler seeks to minimize its task processing cost; that is, the objective function is as Equation (11) defines.
The game using Equation (11) as objective function has a unique Nash equilibrium.
This is because Equation (11) is continuous, convex, and increasing.
The game using Equation (12) as objective function has a unique Nash equilibrium.
This is because Equation (12) is continuous, convex, and increasing.
We introduce a new variable, defined by Equation (14), which denotes the computational power of node j that is available to scheduler i.
Since all formulae in Equation (12) are convex, the first-order Karush-Kuhn-Tucker conditions are necessary and sufficient for optimality. The Lagrangian is given by
We get the following equations.
By constraint (1), is given by the following equation.
All computational nodes that make must be excluded, we set for these nodes.
2.4 Solving Algorithm
Based on the game model in the above subsection, we can design the solving algorithm as follows.
Order the computational nodes according to potential power cost such that , is defined by the following equation.
Then we have the following equation.
is given by the following equation.
is the maximum positive integer that satisfies (26).
In this section, we analyze how different factors affect the average task power cost of the schedulers given by (13), and compare the results with the average-allocated algorithm. In addition, using the same parameters for the game algorithm, we observe its impact on the other costs.
3.1 Convergence to Equilibrium of the Game Algorithm
In this experiment, we analyze the convergence of the game algorithm to equilibrium. Each scheduler chooses its own strategy independently, depending on the current system state. However, the system state keeps changing while the schedulers are running, so each scheduler must update its strategy as the state changes. To bring the system to a stable state, in which no player has an incentive to change its strategy unilaterally, the algorithm is run iteratively until it reaches the Nash equilibrium. The initial strategy of each scheduler is the zero vector; each scheduler then refines and updates its strategy at each iteration. When the result no longer changes, we consider the system to have reached a Nash equilibrium.
In this experiment, we set the average system load to 0.2. Table 2 gives the average processing rate of each computational node, and Table 3 gives the relative job arrival rate of each scheduler. As shown in Fig. 2, the algorithm converges to a Nash equilibrium in 4 iterations (in this paper, we assume that convergence has occurred when the overall percentage change is no more than 0.0001). In terms of the periodic scheduling done by each scheduler, an equilibrium is reached when the calculated strategy does not change from one iteration to the next.
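Table 2. Average processing rate of each computational node j.
|Node j|1|2|3|4|5|6|7|8|
|---|---|---|---|---|---|---|---|---|
|Average processing rate|35|46|37|28|29|30|41|32|
Table 3. Average task arrival rate at each scheduler i.
|Scheduler i|1|2|3|4|5|
|---|---|---|---|---|---|
|Average task arrival rate|6.672|2.78|3.336|6.672|5.004|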
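The iterate-until-stable procedure described above can be sketched as follows. This is a simplified stand-in, not the algorithm of this paper: each scheduler's objective is replaced by the expected M/M/1 response time, for which a closed-form best reply (a square-root rule known from the non-cooperative load balancing literature) exists. The stopping rule uses the total absolute change in strategies, which is also an assumption. The names `best_reply`, `nash_equilibrium`, `MU` and `PHI` are hypothetical; the numeric values are those of Tables 2 and 3.

```python
import math

MU = [35, 46, 37, 28, 29, 30, 41, 32]      # Table 2: node processing rates
PHI = [6.672, 2.78, 3.336, 6.672, 5.004]   # Table 3: scheduler arrival rates


def best_reply(avail, phi):
    """Fractions of one scheduler's tasks per node that minimize its expected
    M/M/1 response time, given the processing rate still available to it."""
    order = sorted(range(len(avail)), key=lambda j: avail[j], reverse=True)
    n = len(order)
    while True:
        used = order[:n]
        c = (sum(avail[j] for j in used) - phi) / sum(
            math.sqrt(avail[j]) for j in used)
        # Drop the slowest remaining node while its share would not be positive.
        if n > 1 and c >= math.sqrt(avail[order[n - 1]]):
            n -= 1
        else:
            break
    shares = [0.0] * len(avail)
    for j in order[:n]:
        shares[j] = (avail[j] - c * math.sqrt(avail[j])) / phi
    return shares


def nash_equilibrium(mu, phi, tol=1e-4, max_iter=500):
    """Round-robin best replies, starting from the zero strategy, until the
    total change in one full round is at most tol."""
    m, n = len(phi), len(mu)
    s = [[0.0] * n for _ in range(m)]
    for iteration in range(1, max_iter + 1):
        change = 0.0
        for i in range(m):
            # Processing rate of each node left over by the other schedulers.
            avail = [mu[j] - sum(s[k][j] * phi[k] for k in range(m) if k != i)
                     for j in range(n)]
            new = best_reply(avail, phi[i])
            change += sum(abs(a - b) for a, b in zip(new, s[i]))
            s[i] = new
        if change <= tol:
            return s, iteration
    return s, max_iter


strategies, iterations = nash_equilibrium(MU, PHI)
```

Each row of `strategies` is one scheduler's slicing strategy; the number of iterations needed under this simplified objective need not match the figure reported for the paper's own cost model.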
The average power cost per task for each scheduler, when the system is at equilibrium, is shown in Fig. 3. The power cost, given by (13), includes the cost of the execution time of the task itself and the cost of the waiting time in the queue. In the figure, the average power cost per task for each scheduler, for both the game algorithm and the average scheme, is normalized by dividing by the sum of the average task costs of all schedulers under the game algorithm. As can be seen from the figure, the game algorithm has a lower cost than the average scheme for every scheduler.
3.2 Effect of System Loads
In this set of experiments, we vary the average system load from 0.1 to 0.9. The same set of computational nodes and schedulers is used as in the previous set of experiments. The arrival rate of tasks for each scheduler is then adjusted to give the required average system load.
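A minimal sketch of this adjustment, assuming that the average system load is the total task arrival rate divided by the total processing rate of the nodes. Under that assumption, the Table 3 values at load 0.2 correspond to relative shares of 0.12, 0.05, 0.06, 0.12 and 0.09 of the total arrival rate; these shares are inferred from the table, not stated in the text, and the names below are hypothetical.

```python
MU = [35, 46, 37, 28, 29, 30, 41, 32]      # Table 2: node processing rates
SHARES = [0.12, 0.05, 0.06, 0.12, 0.09]    # inferred from Table 3 (assumption)


def arrival_rates(processing_rates, system_load, shares):
    """Scale each scheduler's arrival rate so that the total arrival rate
    equals system_load times the total processing rate."""
    total_arrival = system_load * sum(processing_rates)
    return [share * total_arrival for share in shares]


# Arrival rates for every load level used in this set of experiments.
rates_by_load = {load / 10: arrival_rates(MU, load / 10, SHARES)
                 for load in range(1, 10)}
```

With a system load of 0.2 this reproduces the arrival rates listed in Table 3.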
Fig. 4 shows the normalized average job costs as the system load is varied from 0.1 to 0.9. As before, the job cost is normalized by dividing each cost by the overall average cost of the game scheme. The figure shows that the system-wide average job cost increases as the system load increases. This trend is a consequence of (15). As the system load increases, the average queue length at each computational node grows, and as a result the average cost increases by the cost incurred while tasks wait in the queue. Both the game algorithm and the average scheme show the same trend, although the game algorithm gives lower expected costs.
3.3 Effect of System Size
In this part of the experiment, we vary the number of computational nodes and the number of schedulers in the system and investigate its effect on the average cost of the schedulers for both the game algorithm and average schemes.
First, we vary the number of computational nodes in the system from 5 to 16. The processing rate of each computational node is shown in Table 4. We keep the average system load at 20 percent. The effect of the number of computational nodes on the average cost of the schedulers is shown in Fig. 5. As before, the job cost is normalized by dividing each cost by the average cost of the game algorithm. As can be seen in the figure, for both the game algorithm and the average scheme, the average cost decreases as the number of computational nodes in the system increases. Fig. 5 also demonstrates that the game algorithm results in a lower overall average cost than the average scheme for system sizes ranging from 5 to 16 computational nodes. This shows that an efficient allocation of tasks to the computational nodes is important in grid systems with multiple computational nodes.
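Table 4. Average processing rate of each computational node j.
|Node j|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|Average processing rate|35|46|37|28|29|30|41|32|35|46|40|39|41|30|41|32|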
Then, we vary the number of schedulers in the system from 2 to 10. We keep the total number of tasks arriving at the schedulers constant as the number of schedulers increases, so that we can analyze the effect on the average cost of the schedulers for both the game algorithm and the average scheme. The result is shown in Fig. 6. For the average scheme, the average cost remains unchanged as the number of schedulers increases: because the total number of tasks is fixed, the number of tasks sent to each computational node stays the same, and so does the average cost. For the game algorithm, by contrast, the average cost of the schedulers varies with the number of schedulers, and it is lowest when there are 7 schedulers in the system.
3.4 Effect of Service Time
In the first set of experiments, we assume that the task service times follow an exponential distribution. However, it has been suggested that the service time of tasks for certain applications follows a heavy-tailed distribution, instead of an exponential distribution. One of the most common distributions used to model such a heavy-tailed distribution is the Bounded Pareto distribution.
The Bounded Pareto distribution is characterized by the following probability density function (pdf):
where is the minimum job execution time and
is the maximum job execution time; the parameter defines the shape of the hyperbolic curve of the distribution. The mean (first moment) of the distribution is given by
and the second moment is given by
As before, we use 8 computational nodes in this set of experiments. The minimum and maximum job execution times used for each of the computational nodes are shown in Table 5. The shape parameter of the Bounded Pareto distribution is set to 1.1. Table 6 summarizes the above values in terms of the expected task execution time at each of the computational nodes; Table 6 also shows the variance of the task execution time at each of the computational nodes.
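For reference, the mean and variance of a Bounded Pareto distribution can be computed from its standard raw-moment formula, as sketched below. The symbols `k` (minimum job execution time), `p` (maximum job execution time) and `alpha` (shape parameter) are assumed notation for this example, and the bounds in the usage line are placeholders rather than the values of Table 5.

```python
def bounded_pareto_pdf(x, k, p, alpha):
    """Density of the Bounded Pareto distribution on [k, p]."""
    if x < k or x > p:
        return 0.0
    return alpha * k ** alpha * x ** (-alpha - 1.0) / (1.0 - (k / p) ** alpha)


def bounded_pareto_moment(k, p, alpha, n):
    """n-th raw moment E[X^n]; the closed form below requires n != alpha."""
    if not 0.0 < k < p or alpha <= 0.0:
        raise ValueError("need 0 < k < p and alpha > 0")
    if n == alpha:
        raise ValueError("closed form is undefined for n == alpha")
    norm = alpha * k ** alpha / (1.0 - (k / p) ** alpha)
    return norm * (p ** (n - alpha) - k ** (n - alpha)) / (n - alpha)


def mean_and_variance(k, p, alpha):
    """Mean and variance derived from the first and second raw moments."""
    first = bounded_pareto_moment(k, p, alpha, 1)
    second = bounded_pareto_moment(k, p, alpha, 2)
    return first, second - first ** 2


# Placeholder bounds (not from Table 5) with the shape parameter 1.1.
mean, variance = mean_and_variance(1.0, 100.0, 1.1)
```

These two quantities are exactly what an M/G/1 model of a node needs, so the heavy-tailed case requires no change to the rest of the computation.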
The same set of schedulers and parameters is then used as in the previous set of experiments. We vary the system load from 0.1 to 0.9 and investigate the effect of Bounded Pareto task service times on the number of iterations required for the game algorithm to reach equilibrium. The effect of the Bounded Pareto service times on the overall average task power cost is shown in Fig. 7. As can be seen in the figure, the trend in the system-wide average job power cost is similar to the previous result shown in Fig. 4, where the service times follow an exponential distribution. In addition, the game algorithm gives a lower overall task power cost than the average-allocated algorithm.
In this part of the experiment, we investigate the fairness of each of the schemes. Fairness is achieved when the average task power cost is the same for every scheduler. If one scheduler has a lower average task power cost and another has a higher one, the scheduling scheme can be considered unfair, as it gives some schedulers an advantage and others a disadvantage.
Fairness is quantified by a fairness index computed from the average task power cost of each scheduler i. If a load-balancing scheme is 100 percent fair, the index is 1.0; a fairness index close to 1.0 indicates a relatively fair load-balancing scheme.
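A minimal sketch of such an index, assuming it takes the form of Jain's fairness index, which is widely used in the load balancing literature and equals 1.0 exactly when all schedulers have the same cost. The function name and the sample values are illustrative assumptions.

```python
def fairness_index(costs):
    """Jain's fairness index: (sum of costs)^2 / (n * sum of squared costs)."""
    if not costs:
        raise ValueError("at least one scheduler cost is required")
    sum_of_squares = sum(c * c for c in costs)
    if sum_of_squares == 0:
        raise ValueError("costs must not all be zero")
    total = sum(costs)
    return total * total / (len(costs) * sum_of_squares)


equal_costs = fairness_index([2.0, 2.0, 2.0])   # every scheduler pays the same
skewed_costs = fairness_index([1.0, 3.0])       # one scheduler pays three times more
```

The index falls towards 1/n as the cost concentrates on a single scheduler, so values just below 1.0 indicate only small differences between schedulers.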
In the first part of the experiment, we vary the average system load from 0.1 to 0.9. The results are shown in Fig. 8. As can be seen in the figure, the average-allocated algorithm has a fairness index of 1.0 across the entire utilization range from 0.1 to 0.9. This is the inherent advantage of the average-allocated algorithm: although it is a distributed, decentralized scheme and incurs a higher cost than the game algorithm, it guarantees the same average task power cost for each of the schedulers. The fairness of the game algorithm decreases as the system nears full capacity. However, its fairness index at 90 percent system load is still above 0.98, which, depending on the requirements of the application, may well be above the minimum acceptable level.
In the next set of experiments, we set the average system load to 20 percent and vary the number of computational nodes in the system from 2 to 8. The results are shown in Fig. 9. As in the previous experiment, the average-allocated algorithm gives a fairness index of 1.0 as the number of computational nodes is varied from 2 to 8. The game algorithm shows some variations as the number of computational nodes is varied. As before though, the fairness index in all of the cases is above 0.99, which, depending on the application, may be better than the minimum acceptable level.
In this part of the experiment, we change the set of schedulers from a highly heterogeneous set of schedulers shown in Table 2 to a less heterogeneous set of schedulers, as shown in Table 6. The results are shown in Figs. 10 and 11. As can be seen in the figures, using a less heterogeneous set of schedulers has improved the fairness of the game algorithm as compared to the previous set of experiments.
3.6 Impact on Other Costs
The experiments above show that the game algorithm has a strong advantage in the power cost of the schedulers. Following the analysis of power cost, and using the same set of parameters and the same objective equation, we observe its impact on the other objectives, namely network cost, loss cost, and utilization cost.
These results are shown in Fig. 12, Fig. 13, and Fig. 14 for network cost, loss cost, and utilization cost, respectively. As before, the job cost is normalized by dividing each cost by the average cost of the game scheme. As can be seen from the figures, the game algorithm has a lower cost than the average scheme for every scheduler, whether for network cost, loss cost, or utilization cost. This indicates that the multi-objective non-cooperative game defined in Equation (12) is reasonable.
In this paper, we propose a game-theoretic algorithm that solves the grid load balancing problem. It aims to minimize the average task cost for schedulers when tasks are executed in the grid system. The algorithm is semi-static and responds to changes in system state during runtime. It does not assume any particular distribution for service times; it requires only the first and second moments of the service times. The experimental results show that the algorithm achieves a lower cost when scheduling tasks in a computational grid.
-  I. FOSTER, Y. ZHAO, I. RAICU, et al. Cloud Computing and Grid Computing 360-degree Compared, In: Proc. of the 2008 Grid Computing Environments Workshop (GCE2008), 1–10.
-  R. SUBRATA, A. Y. ZOMAYA, B. LANDFELDT. A Cooperative Game Framework for QoS Guided Job Allocation Schemes in Grids, IEEE Transactions on Computers, 2008, 57(10):1413–1422.
-  C. KIM, H. KAMEDA. An Algorithm for Optimal Static Load Balancing in Distributed Computer Systems, IEEE Transactions on computers,1992,41(3):381–384.
-  D. GROSU, M. LEUNG. Load Balancing in Distributed Systems: An Approach Using Cooperative Games, In: Proc. of the International Parallel and Distributed Processing Symposium, 2002,1530–2075/02.
-  L. M. NI, K. HWANG. Optimal Load Balancing in a Multiple Processing System with Many Job Classes, IEEE Transactions on Software Engineering, 1985, SE-11(5):491–496.
-  J. LI, H. KAMEDA. Load Balancing Problems for Multiclass Jobs in Distributed/Parallel Computer Systems, IEEE Transactions on computers,1998,47(3):322–332.
-  A. G. DELAVAR, M. NEJADKHEIRALLAH, M. MOTALLEB. A New Scheduling Algorithm for Dynamic Task and Fault Tolerant in Heterogeneous Grid Systems Using Genetic Algorithm, In: Proc. of the IEEE International conference on computer science and information technology, 2010, 9:408–412.
-  N. FUJIMOTO, K. HAGIHARA. Near-optimal Dynamic Task Scheduling of Independent Coarse-grained Tasks onto a Computational Grid, In: Proc. of the 2003 international conference on parallel processing,2003:391–398.
-  R. MAHAJAN, M. RODRIG, D. WETHERALL, et al. Experiences Applying Game Theory to System Design, In: Proc. of the 2004 Annual Conference of The Special Interest Group on Data Communication [C], Portland, Oregon, USA:ACM Press, 2004:183–190.
-  K. RANGANATHAN, M. RIPEANU, A. SARIN, et al. Incentive Mechanisms for Large Collaborative Resource Sharing, In: Proc. of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid [C], Chicago, Illinois, USA: IEEE Computer Society, 2004:1–8.
-  D. GROSU, A. T. CHRONOPOULOS. Non-cooperative Load Balancing in Distributed Systems, Journal of Parallel and Distributed Computing, 2005, 65:1022–1034.
-  K. YI, R. WANG. Nash Equilibrium Based task Scheduling Algorithm of Multi-schedulers in Grid Computing, ACTA Electronica Sinica, 2009, 37(2):329–333.
-  R. SUBRATA, A. Y. ZOMAYA, B. LANDFELDT. Game Theoretic Approach for Load Balancing in Computational Grids, IEEE Transactions on Parallel and Distributed Systems, 2008, 19(1):66–76.
-  G. WEI, A. V. VASILAKOS, N. XIONG. Scheduling Parallel Cloud Computing Services: An Evolutional Game, In: Proc. of the 1st International Conference on Information Science and Engineering(ICISE2009), 2009:376–379.
-  O. M. ELZEKI, M. Z. RESHAD, M. A. ELSOUD. Improved Max-Min Algorithm in Cloud Computing, International Journal of Computer Applications, 2012, 50.
-  G. LIU, J. LI, J. XU. An Improved Min-min Algorithm in Cloud Computing, In: Proc. of the 2012 International Conference of Modern Computer Science and Applications, 2013, 191:47–52.
-  C. ZHAO, S. ZHANG, Q. LIU. Independent Tasks Scheduling Based on Genetic Algorithm in Cloud Computing, In: Proc. of the 5th International Conference on Wireless Communications, Networking and Mobile Computing (WiCom'09), IEEE, 2009:1–4.
-  K. LI, G. XU, G. ZHAO. Cloud Task Scheduling Based on Load Balancing Ant Colony Optimization, In: Proc. of the Sixth Annual Chinagrid Conference (ChinaGrid), IEEE, 2011:3–9.
-  S. S. CHANHAN, R. JOSHI. A Heuristic for QoS Based Independent Task Scheduling in Grid Environment, In: Proc. of the International Conference on Industrial and Information system, 2010:102–106.
-  M. XU, L. CUI, H. WANG, Y. BI. A Multiple QoS Constrained Scheduling Strategy of Multiple Workflows for Cloud Computing, In: Proc. of the 2009 IEEE International Symposium on Parallel and Distributed Processing with Application,2009:629–634.
-  E. DEELMAN, G. SINGH, M. LIVNY, et al. The Cost of Doing Science on the Cloud: the Montage Example, In: Proc. of the ACM/IEEE Conference on Supercomputing, Piscataway: IEEE Press, 2008:1–12.
-  M. ASSUNCAO, A. COSTANZO, R. BUYYA. Evaluating the Cost-benefit of Using Cloud Computing to Extend the Capacity of Clusters, In: Proc. of the 18th ACM International Symposium on High Performance Distributed Computing, New York: ACM Press, 2009:141–150.
-  G. TIAN, D. MENG, J. ZHAN. Reliable Resource Provision Policy for Cloud Computing, Chinese Journal of Computers. 2010, 33(10):1859–1872.
-  R. SUN, J. LI. The Basis of Queue Theory, Beijing, China: Science Publisher, 2002.
-  Y. CHOW, W. KOHLER. Models for Dynamic Load Balancing in a Heterogeneous Multiple Processor System, IEEE Transactions on Computers, 1979, 28:354-361.
-  J. CHANDY. An Analysis of Resource Costs in a Public Computing Grid, Proceedings of the 2009 IEEE International Symposium on Parallel and Distributed Processing, 2009, Rome Italy: IEEE Computer Society Press, 2009:1-8.
-  S. KHAN, I. AHMAD. A Cooperative Game Theoretical Technique for Joint Optimization of Energy Consumption and Response Time in Computational Grids, IEEE Transactions on Parallel and Distributed Systems, 2009, 20(3):346–360.