I Introduction
In the airline scheduling process (ASP), airline crew scheduling (CS) is considered one of the most important planning activities, for multiple reasons. First, crew cost is the second-largest operating cost (after fuel cost). Second, its optimization carries a huge potential for cost savings (millions of dollars annually, even with marginal improvements). Last, CS has to be performed in the presence of several complex constraints laid down by federations, labor unions, etc. in order to guarantee the safety of crew members. In the last three decades, airline CS has received unprecedented attention from the operations research (OR) community, leading to the development of numerous CS optimization systems. Over the past years, the expansion of airlines' flight operations, to match the exponentially increasing air-travel demand, has led to a tremendous increase in the number of flights, aircraft and crew members to be scheduled, leaving the state of the practice obsolete. Hence, it is imperative to improve upon the existing optimization systems by leveraging recent technological advancements, enhanced data-handling capacities and faster computation.
Airline crew scheduling is a combination of complex combinatorial optimization subproblems (NP-complete and NP-hard problems [1]). It is decomposed into two problems, namely, the crew pairing and crew assignment problems, which are solved sequentially. The former, the crew pairing optimization problem (CPOP), is aimed at generating a set of flight sequences (each called a crew pairing) that covers a finite set of flight legs from an airline's timetable at minimum cost, while satisfying several legality constraints linked to the federations' safety rules, airline-specific regulations, labor laws, etc. The aim of the latter problem is to assign crew members to these optimal crew pairings. The scope of this paper is limited to the former problem.
In CPOP, crew pairings have to satisfy multiple legality constraints (as mentioned above) in order to be classified as 'operational/legal'. To solve the CPOP, a legal crew pairing generation approach is required so that only legal crew pairings are supplied to the optimization phase. Depending upon the CPOP's scale, legal pairings are generated in one of two ways: before the optimization phase or during the optimization phase. For each of these approaches, several optimization-based solution methodologies have been proposed in the literature. These can be broadly categorized into two techniques: metaheuristics and mathematical-programming-based solution approaches. In the latter category, Column Generation [3] is the most widely adopted technique and has proven successful for solving large-scale integer programs. It is an efficient search-space exploration technique which exploits the idea that the majority of variables in a large integer program are non-basic in the optimal solution. Hence, it generates only those pairings which have a high potential of improving the objective function. It is an exact method, but its most successful heuristic implementations for solving CPOP can be found in [4] & [5].
Among metaheuristics, the most successful and widely adopted technique is the Genetic Algorithm (GA), a population-based probabilistic search method inspired by the theory of evolution [6]. GAs with customized operators are known to be successful in solving a variety of combinatorial optimization problems ([7, 8]). Several GA-based CPOP solution approaches have been proposed in the literature; these are summarized in Table I.
Table I: GA-based CPOP solution approaches proposed in the literature.
| Literature | Formulation | Timetable | Instances* | # Flights** | # Pairings** | Accessibility | Airline applicability |
|---|---|---|---|---|---|---|---|
| [7] | SCP | Did not solve CPOP | 11G | 1,000 | 10,000 | Public | |
| [9] | SPP | | 40R | 823 | 43,749 | Private | |
| [10] | SCP | Daily | 28R | 380 | 21,308 | Private | Multiple Airlines |
| [11] | SCP | Monthly | 1R | 2,100 | 11,981 | Private | Olympic Airways |
| [12] | SCP | Monthly | 1R | 710 | 3,308 | Private | Turkish Airlines |
| [13] | | | 4R | 506 | 11,116 | Private | Turkish Airlines |
| [14] | SCP | | 12R | 714 | 43,091 | Private | Turkish Airlines |
SCP stands for the Set-Covering Problem formulation and SPP stands for the Set-Partitioning Problem formulation. The columns from Instances to Accessibility describe the airline flight data. * Generated (#G) or real-world (#R) test cases, where # represents the number of test cases used for validation. ** The provided values are the maximum among all the test cases used for validation.
The research gap in this literature review is twofold. First, in some of these instances, the results are obtained using a subset of the original search space, i.e. not all possible legal pairings are used ([11, 12]). Second, the other instances have been validated on the flight datasets of smaller airlines (a handful of pairings), operating in low-demand regions such as Greece, Turkey, etc. These GA-based solution approaches become obsolete when scaled to the medium-scale flight networks of bigger airlines operating in the US. Hence, it is imperative to develop an efficient GA for optimizing such CPOPs.
In an attempt to address these limitations, the first contribution of this paper is a customized Genetic Algorithm, with enhanced search-space exploration, for solving a real-world airline CPOP. This is achieved by enhancing the initialization phase and the genetic operators (crossover and feasibility-repair heuristic) using the CPOP's domain knowledge. With these enhancements, the proposed GA is able to generate crew pairing solutions with varying characteristics, such as fewer deadhead flights, hotel nights, etc., which are among the key performance indicators (KPIs), apart from the crew pairing cost, used by airlines for evaluating the performance of their CS. The other contribution of this paper is the comparison of the proposed GA with a column-generation-based large-scale airline crew pairing optimizer, referred to as CGOptimizer, which has been developed by the authors and validated by GE Aviation. The utility of these contributions is demonstrated on a real-world medium-scale flight dataset (involving 839 flights and 430,873 legal pairings), extracted from the networks of large airlines operating in the US.
II Airline Crew Pairing Problem
In CPOP, the input data includes a finite set of scheduled flights from the airline's timetable, along with the pairings' costing and legality rules. A crew pairing is a flight sequence to be flown by a crew member, beginning and ending at the same crew base. Other associated terminologies are explained with the help of a crew pairing example, shown in Fig. 1.
In real-time operations, crew members sometimes miss their flight connections due to uncertain events. As a result, they are transported to their scheduled airports either by road (in the case of same-city airports) or by traveling as passengers on other flights (in the case of distant airports). Such flights are called deadheads (Dhds). Airlines aim to minimize deadheads in their crew operations (ideally to zero) in order to maximize their profits.
II-A Legal Crew Pairing Generation
As mentioned in Section I, a legal crew pairing generation approach is required in order to supply only legal pairings to the optimization phase. In small- and medium-scale CPOPs, all legal pairings are generated explicitly before the optimization phase. The same approach is adopted in this work, and a duty-network-based parallel legal pairing generation algorithm [15] is used for generating all legal pairings explicitly. Interested readers are referred to [15] for an extensive review of the pairing generation literature.
II-B Crew Pairing Optimization Problem
The goal of the optimization phase is to find a subset of the generated set of all legal pairings that covers the given flights at the minimum possible cost. In the literature, the CPOP is modeled either as a set-partitioning problem (SPP; each flight leg is allowed to be covered only once) or as a set-covering problem (SCP; over-coverage of flight legs, i.e. deadheads, is allowed). In this paper, the SCP formulation is adapted and modified to define the optimization problem for the proposed GA. Its mathematical model is presented in Section III-G.
III Genetic Algorithm
This section presents a customized GA for solving medium-scale CPOPs of large airlines. For such problems, the number of legal pairings is so large that it is intractable to consider all of them in the GA's population. Hence, the proposed GA solves the underlying CPOP by initializing the population from smaller pairing sets and improving the population repeatedly by bringing in new pairings from the rest of the search space with the help of customized genetic operators. The overall procedure of the proposed GA is given in lines 1–10 of Algorithm 1, and its components are explained in the following subsections. First, the GA's initial population is generated; afterwards, its main loop is performed, in which the selection, reproduction (crossover and mutation), and feasibility-repair operators are applied sequentially. In the presented work, these operators are either enhanced or replicated from the works presented in [7, 11] & [16]. For simplicity, the generated list of all pairings is referred to as AllPairs.
III-A Chromosome Representation
As mentioned above, the length of each chromosome cannot be equal to the number of pairings in the AllPairs list. Hence, in this work, a chromosome with a 2-bit gene encoding is adopted, as shown in Fig. 2. In this chromosome structure, the first bit of a gene represents a binary variable corresponding to the pairing held in the second bit (selected from AllPairs). Moreover, since this is a single-objective optimization problem, it is desirable to maintain diversity through additional means in order to prevent premature convergence. The chromosome structure used in [16] is adopted in this work. It is made up of two parts: an expressed part (pairings that participate in evaluating the quality of the solution) and an unexpressed part (pairings which are not part of the solution but are retained for maintaining diversity with respect to the expressed part). In this work, a fixed-length chromosome is used while the lengths of the expressed and unexpressed parts are allowed to vary dynamically, in contrast to the structure given in [16].
III-B Deadhead-Minimizing Initialization Heuristic
In most GAs, the initial population is generated by randomized initialization, i.e. random bits are assigned to each gene. It is known that in optimization algorithms, exploration is desired up front and exploitation is desired subsequently. Hence, to support exploration initially and to save some runtime, it is important to generate an initial population that is both diverse and of reasonably good quality. In this work, an effective initialization heuristic, referred to as the Dhd-minimizing Initialization Heuristic, is proposed, which randomly selects pairings that bring fewer deadheads to the solution. This procedure is given in lines 12–22 of Algorithm 1. Although the resulting initial population is composed of feasible solutions of reasonably good quality, it also exhibits a great extent of diversity.
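The idea of the initialization heuristic can be illustrated with a minimal Python sketch. It is not the authors' Algorithm 1: it assumes a simplified representation in which a solution is a list of pairing indices and each pairing is a `frozenset` of flight ids, and the function name `dhd_minimizing_init` is hypothetical. For a randomly chosen uncovered flight, it picks at random among the covering pairings that re-cover the fewest already-covered flights, i.e. that add the fewest deadheads.

```python
import random


def dhd_minimizing_init(flights, pairings, rng):
    """Build one feasible solution as a list of pairing indices.

    flights  : set of flight ids that must be covered
    pairings : list of frozensets; pairings[j] holds the flights of pairing j
    rng      : random.Random instance, so that runs are reproducible
    """
    covered = set()
    solution = []
    while not flights <= covered:
        # Pick a random flight that is still uncovered.
        flight = rng.choice(sorted(flights - covered))
        candidates = [j for j, p in enumerate(pairings) if flight in p]
        if not candidates:
            raise ValueError(f"no pairing covers flight {flight!r}")
        # A pairing adds one deadhead per flight it covers again.
        fewest = min(len(pairings[j] & covered) for j in candidates)
        ties = [j for j in candidates if len(pairings[j] & covered) == fewest]
        chosen = rng.choice(ties)  # random choice keeps the population diverse
        solution.append(chosen)
        covered |= pairings[chosen]
    return solution
```

Calling the function with different seeds yields different feasible solutions, which is how a diverse population could be assembled under these assumptions.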
III-C Selection
This operator selects the chromosomes that will become the parents for the reproduction of child chromosomes. In the proposed GA, a binary tournament selection operator is adopted, in which two sets of two chromosomes each are formed randomly. Out of each of these sets, the chromosome with the best fitness value is passed on to the crossover phase as a parent.
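Binary tournament selection as described above can be sketched in a few lines of Python. The function name and the list-based population are illustrative assumptions; fitness is treated as a cost, so the lower value wins.

```python
import random


def binary_tournament(population, fitness, rng):
    """Return two parents, each the winner of a random two-way tournament.

    population : list of chromosomes
    fitness    : list of fitness values (costs; lower is better)
    rng        : random.Random instance
    """
    parents = []
    for _ in range(2):
        # Draw two distinct chromosomes at random and keep the fitter one.
        a, b = rng.sample(range(len(population)), 2)
        winner = a if fitness[a] <= fitness[b] else b
        parents.append(population[winner])
    return parents
```

Note that the worst chromosome of the population can never win a tournament against a distinct opponent, which gives the operator its mild selection pressure.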
III-D Crossover
The crossover phase is the transition phase in which the genetic information from the parent chromosomes is passed on to the next generation, i.e. new child chromosomes are reproduced. In the literature, multiple crossover operators have been proposed, such as one-point crossover, two-point crossover, uniform crossover, fusion crossover [7], etc. In the presented work, the following crossover operators are studied and compared.
– Crossover1: The fusion crossover, proposed in [7], has been widely adopted in the CPOP literature and has been found to be the most effective. In this crossover, probabilities based on the parents' fitness are used to decide which genes are passed to the child chromosome.
– Crossover2: In order to improve the convergence rate, it is desirable to incorporate greediness in the reproduction operators with the help of domain knowledge. One such example is proposed in [16]. Inspired by that work, a new crossover operator is proposed here for solving airline CPOPs. In this crossover, the expressed part of a child chromosome constitutes a zero-deadhead solution, which is made up of randomly selected pairings/columns from the parent chromosomes. The procedure for this crossover is given in lines 24–33 of Algorithm 1.
Both of these crossover operators are modified to generate two child chromosomes from two parent chromosomes by repeating the same procedure for each child.
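The two operators can be sketched as follows. Both functions are simplified illustrations rather than the authors' implementation: `fusion_crossover` works on plain binary strings and follows the usual description of the fusion operator in [7] for a minimization problem (where the parents disagree, the gene of the first parent is taken with probability f2 / (f1 + f2)), while `zero_deadhead_crossover` is a hypothetical rendering of the Crossover2 idea on lists of pairing indices, with each pairing given as a `frozenset` of flights.

```python
import random


def fusion_crossover(p1, p2, f1, f2, rng):
    """Fitness-biased uniform crossover on binary strings (lower fitness = better)."""
    total = f1 + f2
    # Probability of inheriting from p1 grows as p1's cost shrinks.
    prob_p1 = 0.5 if total == 0 else f2 / total
    child = []
    for g1, g2 in zip(p1, p2):
        if g1 == g2:
            child.append(g1)  # agreeing genes are copied unchanged
        else:
            child.append(g1 if rng.random() < prob_p1 else g2)
    return child


def zero_deadhead_crossover(p1, p2, pairings, rng):
    """Pick random parent pairings while no flight is covered twice.

    The result may leave some flights uncovered; a repair step is
    expected to restore feasibility afterwards.
    """
    pool = sorted(set(p1) | set(p2))
    rng.shuffle(pool)
    covered, child = set(), []
    for j in pool:
        if covered.isdisjoint(pairings[j]):  # adding j creates no deadhead
            child.append(j)
            covered |= pairings[j]
    return child
```

A second child can be produced, as the text describes, by calling the same function again (for the fusion crossover, the random draws then differ; for the second operator, the shuffle order differs).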
III-E Mutation
After crossover, the mutation operator is applied to the resulting child chromosomes. The mutation operator is used to prevent premature convergence, i.e. to avoid getting stuck at local optima, by altering certain genes of the child chromosomes with some probability. In the presented work, two mutation operators are studied and compared: a bit-flip mutation operator, referred to as Mutation1, and the mutation operator proposed in [11], referred to as Mutation2, which depends on the density of the fittest solution in the population. In Mutation1, if a gene is selected for mutation, its bit is flipped from 0 to 1 or vice versa. In Mutation2, if a gene is selected for mutation, its bit is mutated from 0 to 1 with a probability equal to the percentage of 1s in the fittest individual, and vice versa.
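Both operators are easy to sketch on a plain list of bits. The code below is an illustration under one stated assumption: for Mutation2, the "vice versa" case is read as setting a selected gene to 1 with probability equal to the density of 1s in the fittest individual and to 0 otherwise. The function names are hypothetical.

```python
import random


def bit_flip_mutation(bits, rate, rng):
    """Mutation1: flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def density_mutation(bits, rate, fittest_bits, rng):
    """Mutation2 (one reading): redraw selected bits from the fittest density."""
    density = sum(fittest_bits) / len(fittest_bits)  # share of 1s in the fittest
    mutated = []
    for b in bits:
        if rng.random() < rate:  # gene selected for mutation
            b = 1 if rng.random() < density else 0
        mutated.append(b)
    return mutated
```

Under this reading, Mutation2 keeps the expected number of selected pairings close to that of the current best solution, whereas Mutation1 pushes sparse chromosomes towards more 1s.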
III-F Feasibility-Repair Heuristic
After the crossover and mutation processes, the feasibility of the resulting child chromosomes is not guaranteed, i.e. they may or may not cover all the given flights. Hence, a feasibility-repair heuristic is required to enforce feasibility in the child chromosomes, while preserving their fitness as far as possible during the repair. A repair heuristic proposed in [7] is adapted in this work and is modified to include a redundant-pairing removal step at the end. In this heuristic, a pairing with the minimum quality index (given in Eq. 1) is selected for each uncovered flight leg.
(1) 
The detailed procedure is given in [7]. A redundant-pairing removal step is added to this heuristic, which finds and removes those pairings whose flight legs all remain covered by the rest of the solution. This step is explained in lines 35–41 of Algorithm 1.
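A compact Python sketch of the repair and redundancy-removal steps is given below. Two points are assumptions rather than the authors' definitions: the quality index is taken here as pairing cost divided by the number of still-uncovered flights the pairing covers (in the spirit of the heuristic in [7]), and redundant pairings are examined from the most expensive to the cheapest. Solutions are lists of pairing indices and pairings are `frozenset`s of flights.

```python
def repair(solution, flights, pairings, costs):
    """Add pairings until every flight in `flights` is covered."""
    selected = list(dict.fromkeys(solution))  # drop duplicates, keep order
    covered = set().union(*(pairings[j] for j in selected))
    for flight in sorted(flights):
        if flight in covered:
            continue
        uncovered = flights - covered
        candidates = [j for j, p in enumerate(pairings) if flight in p]
        if not candidates:
            raise ValueError(f"no pairing covers flight {flight!r}")
        # Assumed quality index: cost per newly covered flight (lower is better).
        best = min(candidates, key=lambda j: costs[j] / len(pairings[j] & uncovered))
        selected.append(best)
        covered |= pairings[best]
    return selected


def remove_redundant(solution, flights, pairings, costs):
    """Drop pairings whose flights are all covered by the remaining pairings."""
    kept = list(solution)
    for j in sorted(solution, key=lambda k: costs[k], reverse=True):
        rest = [k for k in kept if k != j]
        if flights <= set().union(*(pairings[k] for k in rest)):
            kept = rest  # j adds no coverage, so it is removed
    return kept
```

For example, with pairings `{1,2}`, `{3,4}`, `{1,2,3,4}`, `{2,3}` and costs 10, 10, 30, 5, repairing the partial solution `[3]` adds pairings 0 and 1, after which pairing 3 becomes redundant and is removed.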
III-G Fitness Evaluation
The fitness function is the objective function of the problem and is used to evaluate the fitness value of a chromosome. In CPOP, the main objective is the minimization of the total crew pairing cost while covering all flights at least once. Different airlines utilize different costing rules, making each fitness function unique. In this work, the fitness function is given in Eq. 2, where $P$ and $F$ are the total number of pairings and flights to be covered, respectively.
$$\min Z = \sum_{j=1}^{P} c_j x_j + \text{DhdPenalty} \cdot \sum_{i=1}^{F} \left( \sum_{j=1}^{P} a_{ij} x_j - 1 \right) \qquad (2)$$
$$\sum_{j=1}^{P} a_{ij} x_j \geq 1, \quad \forall i \in \{1, \dots, F\} \qquad (3)$$
In order to be a feasible solution, the chromosome must satisfy the flight-coverage constraints given in Eq. 3. In these equations, $c_j$ is the total cost of pairing $j$; $x_j$ is the binary decision variable which represents whether pairing $j$ is selected in the solution ($x_j = 1$) or not ($x_j = 0$); $a_{ij}$ is an auxiliary binary variable which represents whether flight $i$ is covered by pairing $j$ ($a_{ij} = 1$) or not ($a_{ij} = 0$); and DhdPenalty is the deadhead penalty cost set by airlines.
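As a small worked illustration of the fitness evaluation, the following sketch computes the cost of a solution as the sum of its pairing costs plus a deadhead penalty for every extra coverage of a flight, and rejects solutions that leave a flight uncovered. The data layout (lists of pairing indices, `frozenset` pairings) and the function name are assumptions made for the example.

```python
def fitness(solution, flights, pairings, costs, dhd_penalty):
    """Penalised cost of a feasible solution (lower is better)."""
    coverage = {f: 0 for f in flights}
    for j in solution:
        for f in pairings[j]:
            if f in coverage:
                coverage[f] += 1
    if any(count == 0 for count in coverage.values()):
        raise ValueError("infeasible: at least one flight is uncovered")
    # Every coverage beyond the first means a crew flies as passengers.
    deadheads = sum(count - 1 for count in coverage.values())
    return sum(costs[j] for j in solution) + dhd_penalty * deadheads


# Two pairings overlap on flight 2, so one deadhead is charged.
example = fitness(
    solution=[0, 1],
    flights={1, 2, 3},
    pairings=[frozenset({1, 2}), frozenset({2, 3})],
    costs=[100, 150],
    dhd_penalty=20,
)
print(example)  # 100 + 150 + 20 * 1 = 270
```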
III-H Population Replacement
The last step of the GA is the population replacement step, in which the surviving population is selected from the parent and child chromosomes to become the parent population for the next GA iteration, termed a generation. There are two main population replacement approaches: generational and steady-state. In this work, the generational approach is adopted, in which the elitist population (the best n chromosomes out of the n parent and n child chromosomes) is passed to the next generation.
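The elitist generational replacement described above amounts to pooling parents and children and keeping the best n. A minimal sketch, assuming a hypothetical `fitness_of` callable that returns a cost to be minimized:

```python
def elitist_replacement(parents, children, fitness_of):
    """Keep the best n chromosomes out of n parents and n children."""
    n = len(parents)
    # sorted() is stable, so parents win ties against equally fit children.
    return sorted(parents + children, key=fitness_of)[:n]
```

In practice, fitness values would be cached rather than recomputed inside the sort key, since evaluating a chromosome is the expensive part.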
IV Computational Experiments
All the computational experiments in this work are performed with a real-world medium-scale test case which includes 839 flights and a single crew base, DAL (Dallas, US). This test case is provided by GE Aviation and has been carved out from the networks of big US-based airlines (operating up to 33,000 monthly flights and up to 15 crew bases). It is found that 430,873 legal crew pairings are possible for this test case, which is enormous in comparison to the number of pairings handled by GA-based approaches in the literature. In this work, all the algorithms are implemented using an alternative implementation of Python v3.6, called PyPy v3, which improves the computational speed to a great extent. All computations are performed on an HP Z640 workstation (2 × Intel Xeon Processor E5-2630 v3 @ 2.40 GHz with 8 cores/16 threads, enabled with multiprocessing capabilities).
The parameter settings of the proposed GA, used in these experiments, are given in Table II.
Parameters  Value 

Population Size  
Term_Criterion  
Chromosome Length  
Crossover Rate  
Mutation Rate 
It is seen that increasing the population size may decrease the number of GA generations but does not affect the overall runtime, because the time per generation also increases. Due to the different calculation times of the proposed operators, the overall runtime of the GA is selected as the termination criterion, instead of the number of generations, so as to carry out a fair comparison among multiple GA runs. In [17], $1/l$ (where $l$ is the chromosome length) is proposed as the lower bound for the optimal mutation rate. With experiments, it is observed that this lower bound should be increased by a factor (3 in this work) in order to check premature convergence. In this work, variants of GA operators are proposed which are either developed by the authors or adapted from the literature. To solve the above-mentioned test case and similar problems, it is imperative to find the most effective combination of these operators. For this, four configurations of the GA are implemented and compared; their structure is shown in Table III.
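Table III: Structure of the GA configurations.
| Operators | GA1 | GA2 | GA3 | GA4 |
|---|---|---|---|---|
| Proposed Initialization Heuristic | - | ✓ | ✓ | ✓ |
| Mutation1 | ✓ | ✓ | - | - |
| Mutation2 | - | - | ✓ | ✓ |
| Crossover1 | ✓ | ✓ | ✓ | - |
| Crossover2 | - | - | - | ✓ |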
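For each of these GA configurations, 10 runs with different random seeds (uniformly distributed between 0 and 1) are performed. The experimental results of these runs are summarized in Table IV and the comparative plots are shown in Fig. 3.
Table IV: Results of the GA configurations over 10 runs.
| Runtime (sec) | GA | Cost (USD): Mean ± SD | Cost (USD): Best | Cost (USD): Worst | # Deadheads: Mean ± SD | # Deadheads: Best | # Deadheads: Worst |
|---|---|---|---|---|---|---|---|
| 70 | GA1 | 2649823 ± 57559 | 2494649 | 2710084 | 1095 ± 45 | 977 | 1151 |
| 70 | GA2 | 1417223 ± 9380 | 1398427 | 1430115 | 156 ± 6 | 149 | 164 |
| 5000 | GA1 | 980226 ± 23091 | 964857 | 1037504 | 40 ± 4 | 35 | 49 |
| 5000 | GA2 | 1195229 ± 225555 | 957832 | 1430115 | 98 ± 61 | 35 | 164 |
| 5000 | GA3 | 1192104 ± 228745 | 949591 | 1430115 | 98 ± 61 | 30 | 164 |
| 5000 | GA4 | 993209 ± 5337 | 987638 | 1001487 | 9 ± 4 | 6 | 21 |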
First, the effect of the proposed deadhead-minimizing initialization heuristic is studied. For this, the best solutions among the initial populations of GA1 and GA2 are compared (first two rows of Table IV). It is observed that the characteristics (number of deadheads and total cost) of the best initial solution from the GA2 runs are of reasonably high quality in comparison to those of the GA1 runs. It is to be noted that the initialization runtimes of these GA configurations are almost equal, because the additional time consumed by the proposed heuristic is compensated by the time required to repair the infeasibility of the solutions from random initialization. Moreover, the GA2 runs lead to a better-cost crew pairing solution (best solution among all seeds) than the GA1 runs. Hence, the proposed initialization heuristic is highly effective in achieving a better initial population in the same runtime.
Second, the effects of the mutation operator are studied. For this, the GA configurations GA2 (using Mutation1) and GA3 (using Mutation2) are compared, with results given in Table IV. From these results, it is observed that the GA3 runs lead to a better crew pairing solution (with respect to both cost and deadheads) than the GA2 runs. However, the difference between them is marginal, so the effects of the two mutation operators are nearly equal. Mutation2 is used in the following experiments.
Third, the effects of the crossover operator are studied. For this, the GA configurations GA3 (using Crossover1) and GA4 (using Crossover2) are compared. From the plot of GA4 in Fig. 3, it is evident that Crossover2, proposed in this work, is highly effective in reducing the number of deadheads to a large extent, and in a very short runtime. However, the cost of the final crew pairing solution from the GA4 runs is marginally poorer than that from the GA3 runs.
On analyzing the crew pairings of the best solution from the GA4 runs, it is found that the majority of pairings cover only a small number of flight legs (referred to as short pairings), which increases the total number of pairings in the solution. With such a large number of short pairings, the solution becomes too rigid to allow the compact, yet large, pairings (covering a large number of flights in an efficient way) to enter the solution; hence, the search stops at local optima.
As mentioned in Section I, a large-scale column-generation-based airline crew pairing optimizer, referred to as CGOptimizer, is used to evaluate the performance of the proposed GA configurations. Developed by the authors, CGOptimizer is a research output of this project which has been validated by GE Aviation. Due to commercial restrictions imposed by the funding sponsors, the details of CGOptimizer cannot be revealed. CGOptimizer has been used to solve a large-scale CPOP, targeting a weekly flight schedule (containing 3202 flights, 15 crew bases, and billion legal pairings). The best-known solution of the test case used in this work (839 flights and 1 crew base) is carved out of the solution of this bigger test case (3202 flights and 15 crew bases) and is compared with the best solutions of the proposed GA configurations in Table V.
V Conclusions and Future Work
This paper proposes an efficient GA, with improved initialization and genetic operators, to solve a real-world medium-scale CPOP (839 flights, 1 crew base, and 430,873 pairings), belonging to the network of larger airlines from the US. In this GA, the Dhd-minimizing initialization heuristic is highly effective in achieving a better initial solution (in both cost and Dhds) than the randomized initialization in almost the same runtime. On studying the effects of two widely adopted mutation operators, it is seen that Mutation2 performs better than Mutation1, though marginally. In this GA, a Dhd-minimizing crossover operator, Crossover2, is also proposed, which is found to be highly effective in reducing the number of Dhds (to a large extent) in short runtimes. Another contribution of this paper is the comparison of the proposed GA with a column-generation-based large-scale optimizer (CGOptimizer), developed by the authors to solve a large-scale CPOP (3202 flights, 15 crew bases, billion legal pairings). For the given medium-scale CPOP, the cost gap between the results of CGOptimizer and every GA configuration is at least 11.68% (Table V), making column generation a superior method for solving medium-scale and large-scale CPOPs.
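Table V: Comparison of the best-known CGOptimizer solution with the best solutions of the GA configurations.
| Algorithms | Total Cost (USD) | # Deadheads | # Pairings | %age Gap (Cost) |
|---|---|---|---|---|
| CGOptimizer | 850303 | 2 | 142 | 0 |
| GA1 | 964858 | 39 | 169 | 13.47 |
| GA2 | 957833 | 35 | 172 | 12.65 |
| GA3 | 949592 | 30 | 171 | 11.68 |
| GA4 | 987639 | 9 | 242 | 16.15 |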
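The percentage-gap column can be reproduced from the total costs, assuming the gap is measured relative to the CGOptimizer cost, i.e. gap = 100 × (GA cost − CG cost) / CG cost. The short check below uses only the cost values listed in the table; the variable names are illustrative.

```python
cg_cost = 850303
ga_costs = {"GA1": 964858, "GA2": 957833, "GA3": 949592, "GA4": 987639}

# Relative cost gap of each GA configuration with respect to CGOptimizer.
gaps = {
    name: round(100 * (cost - cg_cost) / cg_cost, 2)
    for name, cost in ga_costs.items()
}
print(gaps)
```

Under this definition the computed values agree with the tabulated gaps of 13.47, 12.65, 11.68 and 16.15.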
In the proposed crossover, Crossover2, the greediness towards minimizing Dhds is built into its construction, making the GA biased towards selecting shorter pairings and driving the search towards local optima. Search-space expansion heuristics [18] and variable mutation rates [7] could be adapted for improving the performance of the proposed GA. Moreover, the independent computations in the GA (evaluation, etc.) could be parallelized by using the multiprocessing capabilities of the computational hardware.
Acknowledgment
The authors would like to acknowledge the invaluable support of GE Aviation team members Saaju Paulose (Senior Manager), Arioli Arumugam (Senior Director, Data & Analytics), and Alla Rajesh (Senior Staff Data & Analytics Scientist) for providing the problem definition and real-world test cases, and for sharing domain knowledge during numerous stimulating discussions which helped the authors in successfully completing this work.
References
[1] M.R. Garey and D.S. Johnson, “Computers and Intractability: A Guide to the Theory of NP-Completeness,” New York: W. H. Freeman & Company, vol. 44, 1979.
[2] C. Barnhart, A. Cohn, E. Johnson, D. Klabjan, G. Nemhauser, and P. Vance, “Airline crew scheduling,” In Handbook of Transportation Science, Kluwer's International Series, Dordrecht: Kluwer Academic Publishers, 2003.
[3] C. Barnhart, E.L. Johnson, G.L. Nemhauser, M.W. Savelsbergh, and P.H. Vance, “Branch-and-price: Column generation for solving huge integer programs,” Operations Research, 46(3), pp. 316–329, 1998.
[4] P.H. Vance, C. Barnhart, E. Gelman, E.L. Johnson, A. Krishna, D. Mahidhara, G.L. Nemhauser, and R. Rebello, “A heuristic branch-and-price approach for the airline crew pairing problem,” Georgia Institute of Technology, Atlanta, Technical Report LEC-97-06, 1997.
[5] B. Zeren and İ. Özkol, “A novel column generation strategy for large scale airline crew pairing problems,” Expert Systems with Applications, 55, pp. 133–144, 2016.
[6] J.H. Holland, “Adaptation in Natural and Artificial Systems,” MIT Press, Cambridge, 1975.
[7] J.E. Beasley and P.C. Chu, “A genetic algorithm for the set covering problem,” European Journal of Operational Research, 94(2), pp. 392–404, 1996.
[8] K. Deb and C. Myburgh, “Breaking the billion-variable barrier in real-world optimization using a customized evolutionary algorithm,” In Proceedings of the Genetic and Evolutionary Computation Conference 2016, pp. 653–660, ACM, 2016.
[9] D. Levine, “Application of a hybrid genetic algorithm to airline crew scheduling,” Computers & Operations Research, 23(6), pp. 547–558, 1996.
[10] H.T. Ozdemir and C.K. Mohan, “Flight graph based genetic algorithm for crew scheduling in airlines,” Information Sciences, 133(3–4), pp. 165–173, 2001.
[11] H. Kornilakis and P. Stamatopoulos, “Crew pairing optimization with genetic algorithms,” In Hellenic Conference on Artificial Intelligence, pp. 109–120, Springer, Berlin, Heidelberg, 2002.
[12] B. Zeren and İ. Özkol, “An Improved Genetic Algorithm for Crew Pairing Optimization,” Journal of Intelligent Learning Systems and Applications, 4(1), pp. 70–80, 2012.
[13] M. Deveci and N.Ç. Demirel, “A hybrid genetic algorithm for airline crew pairing optimization,” Economic and Social Development: Book of Proceedings, p. 118, 2016.
[14] M. Deveci and N.Ç. Demirel, “Evolutionary algorithms for solving the airline crew pairing problem,” Computers & Industrial Engineering, 115, pp. 389–406, 2018.
[15] D. Aggarwal, D.K. Saxena, M. Emmerich, and S. Paulose, “On large-scale airline crew pairing generation,” In 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 593–600, Nov. 2018.
[16] T. Park and K.R. Ryu, “Crew pairing optimization by a genetic algorithm with unexpressed genes,” Journal of Intelligent Manufacturing, 17(4), pp. 375–383, 2006.
[17] T. Bäck, “Optimal mutation rates in genetic search,” In Proceedings of the Fifth International Conference on Genetic Algorithms, pp. 2–8, Morgan Kaufmann, San Mateo, CA, 1993.
[18] N.Ç. Demirel and M. Deveci, “Novel search space updating heuristics-based genetic algorithm for optimizing medium-scale airline crew pairing problems,” International Journal of Computational Intelligence Systems, 10(1), pp. 1082–1101, 2017.