A Non-Dominated Sorting Based Customized Random-Key Genetic Algorithm for the Bi-Objective Traveling Thief Problem

02/11/2020 ∙ by Jonatas B. C. Chagas, et al. ∙ The University of Adelaide, Universidade Federal de Ouro Preto, Michigan State University

In this paper, we propose a method to solve a bi-objective variant of the well-studied Traveling Thief Problem (TTP). The TTP is a multi-component problem that combines two classic combinatorial problems: the Traveling Salesman Problem (TSP) and the Knapsack Problem (KP). In the TTP, a thief has to visit each city exactly once and can pick items throughout the journey. The thief begins the journey with an empty knapsack and travels with a speed inversely proportional to the knapsack weight. We address the BI-TTP, a bi-objective version of the TTP, where the goal is to minimize the overall traveling time and to maximize the profit of the collected items. Our method is based on a genetic algorithm with customizations addressing the characteristics of the problem. We incorporate domain knowledge through a combination of near-optimal solutions of each subproblem in the initial population and a custom repair operation to avoid the evaluation of infeasible solutions. Moreover, the independent variables of the TSP and KP components are unified into a real-valued representation by using a biased random-key approach. The bi-objective aspect of the problem is addressed through an elite population extracted based on the non-dominated rank and crowding distance of each solution. Furthermore, we provide a comprehensive study that shows the influence of hyperparameters on the performance of our method and investigate the performance of each hyperparameter combination over time. In addition to our experiments, we discuss the results of the BI-TTP competitions at the EMO-2019 and GECCO-2019 conferences, where our method won first and second place, respectively, thus proving its ability to find high-quality solutions consistently.







1 Introduction

In optimization research, problems with different characteristics are investigated. To find an appropriate algorithm for a practical problem, assumptions about its characteristics are often made, and then a suitable algorithm is chosen or designed. For instance, an optimization problem can have several components interacting with each other. Because of their interaction, they build an interwoven system (klamroth2017multiobjective) in which interdependencies in the design and the objective space exist. An optimal solution for each component independently will, in general, not be a good solution for the interwoven optimization problem. For instance, in multidisciplinary design optimization, various disciplines are linked with each other and influence the objective value(s). The optimization of an aircraft wing, for example, combines stress analysis, structural vibration, aerodynamics, and controls (mdo). Due to this interwovenness, modifying a single variable is likely to affect all objective values.

Such complex optimization problems usually require domain knowledge and a sufficient amount of computational resources to be investigated. For this reason, many researchers prefer solving academic test problems to show the performance of their algorithms. In order to provide an academic interwoven optimization test problem, the Traveling Thief Problem (TTP) (bonyadi2013travelling) was proposed in 2013, in which two well-known subproblems, the Traveling Salesman Problem (TSP) and the Knapsack Problem (KP), interact with each other. As in the TSP, a so-called thief has to visit each city exactly once. In addition to just traveling, the thief can make a profit during the tour by stealing items and putting them into a rented knapsack. However, the thief's traveling speed decreases depending on the current knapsack weight, which in turn increases the rent that the thief has to pay for the knapsack. Even though researchers have been investigating both subproblems for many years and a myriad of optimization algorithms has been proposed, the interaction of the two problems turned out to be challenging. The TTP seeks to optimize the overall traveling time and the profit made through stealing items. Most of the research has focused on the single-objective problem, where the objectives are combined using a weighted sum. To be more precise, the profit is reduced by the cost of renting the knapsack, which is calculated by multiplying the overall traveling time by a renting rate. However, because traveling time and profit represent solutions with different trade-offs, the problem is bi-objective in nature.

In this paper, we propose a Non-Dominated Sorting Biased Random-Key Genetic Algorithm (NDS-BRKGA) to obtain a non-dominated set of solutions for the BI-TTP, a bi-objective version of the TTP, where the goal is to minimize the overall traveling time and to maximize the profit of the collected items. The algorithm is based on the well-known evolutionary multi-objective optimization algorithm NSGA-II (deb2002fast), and the biased random-key encoding is used to deal with the mixed-variable nature of the problem. The customization makes use of domain knowledge incorporated through evolutionary operators. Our method uses existing solvers of the subproblems for the initial population and maps genotypes to phenotypes to deal with heterogeneous variables. Also, solutions detected as infeasible are repaired before they are evaluated. Moreover, we use a customized survival selection to ensure diversity preservation.

The remainder of this paper is structured as follows. In Section 2, we provide a brief review of the literature about the TTP. Afterwards, we present a detailed description of the BI-TTP, as well as a solution example to demonstrate the interwovenness characteristic of the problem in Section 3. In Section 4, we describe our methodology to address the problem and present results evaluated on different test problems in Section 5. Finally, conclusions of the study are presented in Section 6.

2 Related Work

Thus far, many approaches have been proposed for the TTP. Note that most research so far has considered the single-objective TTP formulation "TTP1" from bonyadi2013travelling, which is typically the variant referred to as the TTP. However, multi-objective considerations of problems with interconnected components are becoming increasingly popular.

For the single-objective TTP, a wide range of approaches has been considered, ranging from general-purpose iterative search heuristics (polyakovskiy2014comprehensive), co-evolutionary approaches (bonyadi2014socially), memetic approaches (mei2014interdependence), and swarm-intelligence-based approaches (wagner2016stealing), to approaches with problem-specific search operators (faulkner2015approximate). wagner2018casestudy provide a comparison of 21 algorithms for the purpose of portfolio creation.

Optimal approaches are rare but exist. neumann2017ttpPTAS showed that the TTP with fixed tours can be solved in pseudo-polynomial time via dynamic programming taking into account the fact that the weights are integer. wu2017ttpexact extended this to optimal approaches for the entire TTP, although their approaches are only practical for very small instances.

The number of works on multi-objective TTP formulations is significantly lower. Interestingly, the "TTP2" from bonyadi2013travelling is defined as a multi-objective problem right away. The objectives are the total time and the total value, because the value of items is assumed to drop over time. blank2017solvingBittp investigated a variant of it; however, they decided to omit the value-drop effect defined in the original TTP2.

When it comes to multi-objective approaches for the TTP1, only a few works exist as well. yafrani2017ttpemo created an approach that generates diverse sets of TTP1 solutions while being competitive with the state-of-the-art single-objective algorithms; the objectives were travel time and total profit of items. wu2018evolutionary considered a bi-objective version of the TTP1, where the objectives were the weight and the TTP objective score. Their hybrid approach makes use of the dynamic programming approach for fixed tours and then searches over the space of tours only.

A more general discussion of a multi-objective approach to interconnected problems can be found in klamroth2016interwoven, and a more general discussion on the opportunities of multi-component problems can be found in Bonyadi2019.

In the present paper, we consider the original TTP1, with the objectives being time and profit, hence using a setting comparable to that of blank2017solvingBittp. This setting was also used in the BI-TTP competitions at the EMO-2019 (https://www.egr.msu.edu/coinlab/blankjul/emo19-thief/) and GECCO-2019 (https://www.egr.msu.edu/coinlab/blankjul/gecco19-thief/) conferences. This definition is equivalent to considering the TTP2 without a value reduction of items over time. The proposed problem definition aims to build a bridge from TTP1 to TTP2 by having two objectives without adding further complexity to the problem.

3 Bi-objective Traveling Thief Problem

The TTP is a combinatorial optimization problem that consists of two interwoven problems, the TSP and the KP. In the following, the two components are first described independently, and then the interaction of the two subcomponents is shown.

In the TSP (tsp_study), a salesman has to visit n cities. The distances are given by a map represented as a distance matrix with entries d(i, j), i, j ∈ {1, ..., n}. The salesman has to visit each city once, and the result is a permutation vector π = (π_1, π_2, ..., π_n), where π_i is the i-th city visited by the salesman. The distance between two cities divided by a constant velocity v (usually v = 1) results in the traveling time for the salesman, denoted by f(π). The goal is to minimize the total traveling time of the tour:

min f(π) = Σ_{i=1}^{n−1} d(π_i, π_{i+1}) / v + d(π_n, π_1) / v

There are (n−1)!/2 different tours to consider, if we assume that the salesman has to start from the first city and travels on a symmetric map where d(i, j) = d(j, i).

For the KP (knp_survey), a knapsack has to be filled with items without violating the maximum weight constraint Q. Each item j has a value b_j ≥ 0 and a weight w_j ≥ 0, where j ∈ {1, ..., m}. The binary decision vector z = (z_1, ..., z_m) defines whether an item is picked or not. The aim is to maximize the profit g(z):

max g(z) = Σ_{j=1}^{m} b_j z_j   s.t.   Σ_{j=1}^{m} w_j z_j ≤ Q,  z_j ∈ {0, 1}

The search space of this problem is exponential with respect to m and contains 2^m possible combinations. However, the optimal solution can be obtained by dynamic programming with a running time of O(m · Q), which makes the complexity of the problem pseudo-polynomial.
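The pseudo-polynomial dynamic program can be sketched as follows (a minimal illustration with made-up item values, not the DP solver configuration used later in the paper):

```python
def knapsack_dp(values, weights, capacity):
    """Classic O(m * Q) dynamic program for the 0/1 knapsack.

    dp[q] holds the best achievable profit with total weight <= q.
    """
    dp = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Iterate weights downwards so that each item is used at most once.
        for q in range(capacity, weight - 1, -1):
            dp[q] = max(dp[q], dp[q - weight] + value)
    return dp[capacity]

print(knapsack_dp([34, 40, 25], [30, 40, 21], 80))  # 74
```

The downward iteration over the residual capacity is what distinguishes the 0/1 variant from the unbounded knapsack, where an upward iteration would allow reusing items.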

The traveling thief problem combines the two subproblems defined above and lets them interact with each other. The traveling thief can collect items from each city he visits. The items are stored in a rented knapsack carried by the thief. In more detail, each city provides one or multiple items, which can be picked by the thief. There is an interaction between the subproblems: the velocity of the traveling thief depends on the current knapsack weight w. It is calculated by considering all cities visited so far and summing up the weights of all picked items. The weight at city π_i, given π and z, is calculated by:

w(π_i, z) = Σ_{k=1}^{i} Σ_{j=1}^{m} a_j(π_k) · w_j · z_j

The function a_j(c) is defined for each item j and returns 1 if item j can be stolen at city c, and 0 otherwise. The current weight of the knapsack has an influence on the velocity: when the thief picks an item, the weight of the knapsack increases and therefore the velocity of the thief decreases.

The velocity v is always in the range [v_min, v_max] and cannot be negative for a feasible solution. Whenever the knapsack is heavier than the maximum weight Q, the capacity constraint is violated:

v(w) = v_max − (w / Q) · (v_max − v_min)

If the knapsack is empty, the velocity is equal to v_max. Contrarily, if the current knapsack weight is equal to Q, the velocity is v_min.

Furthermore, the traveling time of the thief is calculated by:

f(π, z) = Σ_{i=1}^{n−1} d(π_i, π_{i+1}) / v(w(π_i, z)) + d(π_n, π_1) / v(w(π_n, z))

The calculation is based on the TSP, but the velocity is defined by a function instead of a constant value. This function takes the current weight, which depends on the position in the tour. The current weight, and therefore also the velocity, changes along the tour according to the picked items defined by z. In order to calculate the total tour time, the velocity at each city needs to be known, and for calculating the velocity at each city, the current weight of the knapsack must be given. Since both calculations are based on z, and z is part of the knapsack subproblem, it is challenging to solve the problem to optimality. In fact, such problems are called interwoven systems, as the solution of one subproblem highly depends on the solutions of the other subproblems.

Regarding the total profit of the items, this remains unchanged in the TTP from the original formulation of the KP.

Finally, the TTP problem is defined by

min f(π, z),  max g(z)
s.t.  π is a permutation of the n cities with π_1 = 1,  Σ_{j=1}^{m} w_j z_j ≤ Q,  z ∈ {0, 1}^m
In order to illustrate the equations and the interdependence, we present an example scenario (see Figure 1). The thief starts at city 1, has to visit cities 2, 3, and 4 exactly once, and then returns to city 1. In this example, each city provides one item, and the thief must decide whether to steal it or not.

Figure 1: Exemplary traveling thief problem instance.

A permutation vector, which contains all cities exactly once, and a binary picking vector are needed to calculate the objectives. Even though this is a very small example with four cities and three items, the total solution space consists of 3! · 2^3 = 48 combinations.

In order to understand how the objectives are calculated, an example hand calculation for the tour [1,3,2,4] and the packing plan [1,0,1] is done as follows. The thief starts with the maximum velocity, because the knapsack is empty. He begins his tour at city 1 and picks no item there. For an empty knapsack the velocity is v = 1. The distance from city 1 to city 3 is 9, and the thief needs 9 time units. At city 3 the thief does not pick an item and continues traveling to city 2 over a distance of 5, and therefore in 5 additional time units. Here he picks the item with weight 30, and the current knapsack weight becomes w = 30, which means the velocity is reduced to v = 0.6625. For traveling from city 2 to city 4 the thief needs the distance 5 divided by the current velocity, that is, 5 / 0.6625 = 7.5472 time units. At city 4 he picks the item with weight 21, and the current knapsack weight increases to w = 51. For this reason the velocity decreases to v = 0.42625. For returning to city 1 the thief needs, according to this current speed, 3 / 0.42625 = 7.0381 time units. Finally, we sum up the times for traveling from each city to the next to calculate the whole traveling time.

Step  City  Weight  Velocity  Distance to next  Leg time  Arrival time
1     1     0       1         9                 9         -
2     3     0       1         5                 5         9
3     2     30      0.6625    5                 7.5472    14
4     4     51      0.42625   3                 7.0381    21.5472
5     1     -       -         -                 -         28.5853

Table 1: Hand calculation for the tour [1,3,2,4] and packing plan [1,0,1] on the example scenario.

The final profit is calculated by summing up the values of all picked items, which is 34 + 25 = 59. Consequently, the TTP solution [1,3,2,4] [1,0,1] is mapped to the point (28.59, 59) in the bi-objective space.
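The hand calculation can be reproduced programmatically. The sketch below uses instance parameters reconstructed from Tables 1 and 2 (the distances, item weights and values, and one consistent choice of Q = 80, v_min = 0.1, v_max = 1); the function and variable names are ours:

```python
# parameters consistent with the hand calculation (Q, v_min, v_max)
V_MIN, V_MAX, CAPACITY = 0.1, 1.0, 80

# symmetric distance matrix of the example instance
dist = {(1, 2): 4, (1, 3): 9, (1, 4): 3, (2, 3): 5, (2, 4): 5, (3, 4): 8}
dist.update({(b, a): d for (a, b), d in list(dist.items())})

# one item per city: city -> (weight, value); city 1 offers no item
items = {2: (30, 34), 3: (40, 40), 4: (21, 25)}

def evaluate(tour, picked):
    """Return (traveling time, profit) for a tour and the set of cities
    at which the thief steals the available item."""
    time, weight, profit = 0.0, 0, 0
    for i, city in enumerate(tour):
        if city in picked:                       # steal before leaving
            w, b = items[city]
            weight += w
            profit += b
        velocity = V_MAX - weight / CAPACITY * (V_MAX - V_MIN)
        nxt = tour[(i + 1) % len(tour)]          # last leg returns to city 1
        time += dist[(city, nxt)] / velocity
    return time, profit

t, p = evaluate([1, 3, 2, 4], picked={2, 4})     # packing plan [1, 0, 1]
print(round(t, 2), p)  # 28.59 59
```

Running the evaluator over all 48 tour/plan combinations of this instance reproduces the points of Table 2.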

Below, all Pareto-optimal solutions of this example are listed; the Pareto front contains 8 solutions, including the solution [1,3,2,4] [1,0,1] considered in the hand calculation.

Tour          Packing plan  Time   Profit
[1, 2, 3, 4]  [0, 0, 0]     20.00   0.0
[1, 4, 3, 2]  [0, 0, 0]     20.00   0.0
[1, 2, 3, 4]  [0, 0, 1]     20.93  25.0
[1, 4, 3, 2]  [1, 0, 0]     22.04  34.0
[1, 4, 3, 2]  [0, 1, 0]     27.36  40.0
[1, 3, 2, 4]  [1, 0, 1]     28.59  59.0
[1, 2, 3, 4]  [0, 1, 1]     33.11  65.0
[1, 4, 3, 2]  [1, 1, 0]     38.91  74.0
Table 2: Pareto front of the example scenario.
Figure 2: Objective space of the example scenario.

Figure 2 shows the objective space colored by tour, where triangles denote non-dominated solutions and circles denote dominated ones. We can observe that different Pareto-optimal solutions can have different underlying tours. In addition, we can see that for each solution in which no item is picked, there is another solution whose tour is symmetric to it; consequently, both solutions have the same traveling time, since the thief travels on a symmetric map.

4 A Customized Non-dominated Sorting Based Genetic Algorithm with Biased Random-Key Encoding

Genetic algorithms (GAs) provide a good starting point because almost no assumptions about the properties of the problem itself are made. GAs are highly customizable, and their performance can be improved by defining or redefining the evolutionary operators. For the BI-TTP, we propose a Non-Dominated Sorting Biased Random-Key Genetic Algorithm (NDS-BRKGA), which combines two classical evolutionary metaheuristics: the Biased Random-Key Genetic Algorithm (BRKGA) (gonccalves2011biased) and the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) (deb2002fast). Both concepts come together to address the following characteristics of the BI-TTP:

  1. Existing solvers for each subproblem: Both subproblems, TSP and KP, have been studied for decades, and good solvers for each problem exist. We incorporate this domain knowledge by using a heuristic-based initial population that combines near-optimal solutions of each subproblem. In our initial population, we seek to preserve a high diversity among individuals in order not to lead our algorithm to premature convergence.

  2. Maximum capacity constraint: Through a repair operation before any evaluation of an individual, domain knowledge can be incorporated to avoid the evaluation of infeasible solutions. An effective repair allows the algorithm to search only in the feasible space.

  3. Heterogeneous variable types: A tour (permutation) and a packing plan (binary decision vector) need to be provided to evaluate a solution, and both variables are linked with each other. Handling different types of variables can be challenging; therefore, we introduce a real-valued genotype by using the biased random-key principle. This allows applying traditional evolutionary recombination operators on continuous variables.

  4. Bi-objective nature: The traveling time of the thief is supposed to be minimized, and the profit to be maximized. We consider both conflicting objectives simultaneously by using non-dominated sorting and the crowding distance in the survival selection. This ensures the final population contains a set of non-dominated solutions with a good diversity in the objective space.

In the remainder of this section, we first explain the overall procedure and then the role of each criterion mentioned above.


Figure 3 illustrates the overall procedure of NDS-BRKGA. At first, we generate the initial population using efficient solvers for the subproblems independently. Afterward, we combine the optimal or near-optimal solutions of both subproblems and convert them to their genotype representation, which results in the initial population. For the purpose of mating, the population is split into an elite population and a non-elite population. The individuals of the next generation are a union of the elite population (copied directly), the offspring of a biased crossover, and mutant individuals. In case an individual violates the maximum capacity constraint, we execute a repair operation. Then, we convert each individual to its corresponding phenotype and evaluate it on the problem instance. In order to insert an explicit exploitation phase into our algorithm, at some evolutionary cycles we apply a local search procedure to some elite individuals. Finally, the survival selection is applied, and if the termination criterion is not met, we increase the generation counter by one and continue with the next generation. In the following, we describe the purpose of each of the design decisions we have made and explain what role it plays during a run of the algorithm.

Figure 3: NDS-BRKGA: A customized genetic algorithm.

Genotype to phenotype decoding

In order to facilitate the exploration of the BI-TTP solution space, we represent the genotype of each individual as a vector of random-keys, i.e., a vector of real numbers in the interval [0,1]. It is an indirect representation that allows us to navigate the feasible solution space of any optimization problem through simple genetic operators. This representation strategy has been successfully applied to several complex optimization problems (gonccalves2012parallel; resende2012biased; gonccalves2013biased; lalla2014biased; gonccalves2015biased; santos2018thief).

Because this representation is independent of the problem addressed, a deterministic procedure is necessary to decode each individual into a feasible solution of the problem at hand, i.e., an algorithm that decodes a genotype into its respective phenotype. In Figure 4, we illustrate the genotype and phenotype structure for the BI-TTP. The structure can be divided into two parts: the tour and the packing plan. The tour needs to be decoded into a permutation vector. Since the thief is known to start at city 1, only the order of the remaining n − 1 cities needs to be determined. To achieve this, sorting the random-key vector of length n − 1 forms a permutation from 1 to n − 1; then, each value is increased by 1 to shift the permutation to the range from 2 to n. By prepending the first city to this permutation, the tour is decoded into its phenotype. The packing plan needs to be decoded into a binary decision vector of length m. The decision whether to pick an item or not is made based on the value of the corresponding random-key: if the value is larger than 0.5, the item is picked; otherwise, it is not. An exemplary decoding of the random-key vector of Figure 4 proceeds as follows: first, the genotype is separated into a tour part of length n − 1 and a packing part of length m. Sorting the first part and increasing each value by 1 results in the permutation [3, 2, 4], and prepending the first city gives the tour [1, 3, 2, 4]. For the second part, we set each bit whose random-key is larger than 0.5, which results in [1, 0, 1]. Note that this example decodes into the variables used for our hand calculation in Section 3. Moreover, the decoding is a many-to-one mapping, which means the same phenotype can be represented by different genotypes.

Figure 4: Chromosome structure: genotype to phenotype.
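A minimal decoder following this scheme might look as follows (the random-key values below are illustrative, chosen so that the decoding yields the tour [1, 3, 2, 4] and packing plan [1, 0, 1] from the hand calculation):

```python
def decode(genotype, n_cities):
    """Decode a random-key genotype into a tour and a packing plan."""
    tour_keys = genotype[:n_cities - 1]
    item_keys = genotype[n_cities - 1:]
    # Argsort of the first part orders the cities 2..n; city 1 is always
    # the starting point and is prepended to the permutation.
    order = sorted(range(len(tour_keys)), key=lambda i: tour_keys[i])
    tour = [1] + [i + 2 for i in order]
    # An item is picked iff its random-key exceeds 0.5.
    plan = [1 if key > 0.5 else 0 for key in item_keys]
    return tour, plan

tour, plan = decode([0.4, 0.2, 0.7, 0.9, 0.3, 0.6], n_cities=4)
print(tour, plan)  # [1, 3, 2, 4] [1, 0, 1]
```

Because only the relative order of the tour keys matters, infinitely many genotypes decode to the same tour, which is the many-to-one property mentioned above.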

Repair operator

According to the decoding procedure previously described, a genotype can generate an infeasible phenotype with respect to the packing plan. This occurs when the total weight of the picked items is higher than the maximum capacity of the knapsack. In order to repair an infeasible genotype, we apply an operator that removes items from the packing plan until it becomes feasible. In this repair operator, we give preference to keeping items collected last, since this way the thief can travel faster at the beginning of the journey. Therefore, we first remove all items collected at the first city visited by the thief. If the removal of these items makes the packing plan feasible, the repair operator is finished; otherwise, we repeat the previous step considering all items of the next city visited by the thief. This process repeats until the weight of all remaining items in the packing plan no longer exceeds the capacity of the knapsack. We also repair the genotype itself to avoid propagating infeasibility throughout the evolutionary process. For this purpose, we assign a real number less than 0.5 to every random-key that references an item that is no longer collected.
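A sketch of this repair in phenotype space (representing the packing plan as the set of cities whose item is picked; names and data layout are ours):

```python
def repair(tour, picked, items, capacity):
    """Remove picked items in visiting order until the plan is feasible.

    Items stolen at early cities slow the thief down for the longest part
    of the journey, so they are dropped first; late pickups are preferred.
    """
    picked = set(picked)
    weight = sum(items[c][0] for c in picked)
    for city in tour:                 # earliest visited cities emptied first
        if weight <= capacity:
            break
        if city in picked:
            picked.discard(city)
            weight -= items[city][0]
    return picked

items = {2: (30, 34), 3: (40, 40), 4: (21, 25)}   # city -> (weight, value)
print(sorted(repair([1, 3, 2, 4], {2, 3, 4}, items, capacity=80)))  # [2, 4]
```

In the example, the plan picking all three items weighs 91 > 80; the item at city 3 (the first stop with a pickup) is dropped, after which the plan is feasible.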

Initial population

We use a biased initial population to incorporate domain knowledge into the genetic algorithm. Because both subproblems of the BI-TTP are well studied, we make use of existing algorithms to generate a good initial population. We maintain a population of fixed size throughout the evolutionary process. To create the initial population, we combine tours found by a TSP solver with packing plans found by a KP solver. In order not to be too biased towards near-optimal solutions of each subproblem, those combinations represent only a small fraction of the entire population. Because the corresponding solvers provide the phenotype representation, we convert their solutions to the genotype representation to be able to apply evolutionary operators later on. We complete the population by adding randomly created individuals, where each random-key is generated independently at random in the real interval [0, 1].

 1 solve the TSP component                 // LKH algorithm
 2 take the symmetric counterpart of the tour   // symmetric map
 3 solve the KP component                  // GH+DP algorithm
 4 create individuals from both tours with an empty packing plan
 5 repeat
 6       select the item with the highest profit/weight ratio not yet considered
 7       if adding it to the partial packing plan yields a non-dominated individual then
 8             keep the new individual
 9       else
10             discard it
11       end if
12 until all items of the KP solution have been considered
13 select randomly a subset of the created individuals using a uniform distribution
14 generate a set of random individuals to complete the population
Algorithm 1: Generate initial population

In Algorithm 1, the required steps to create the initial population are described in more detail. At first, we use the Lin-Kernighan Heuristic (LKH) (lin1973effective), version 2.0.9 (available at http://akira.ruc.dk/~keld/research/LKH/), for solving the TSP component. We also consider the symmetric counterpart of the tour found by LKH; since the thief travels on a symmetric map, both tours result in the same overall traveling time. Note that achieving near-optimal TSP tours is no guarantee for near-optimal TTP solutions, and it has been observed that slightly longer tours have the potential to yield overall better TTP solutions (wagner2016stealing; wu2018evolutionary). However, we observed that near-optimal TSP tours combined with KP packings of lighter items generate BI-TTP solutions very close to the Pareto front with regard to the traveling time objective.

Next, we apply a two-stage heuristic algorithm, which we developed for solving the KP component. We named this two-stage heuristic GH+DP because it combines a Greedy Heuristic (GH) with classical Dynamic Programming (DP) for solving the knapsack problem. The GH+DP algorithm starts by sorting all items according to their profit/weight ratio in non-increasing order. It then greedily inserts items in this order as long as a capacity threshold, controlled by a hyperparameter δ of our method, is not exceeded. Next, it uses the classic dynamic programming algorithm (toth1980dynamic) for solving the smaller KP over the remaining items and the remaining knapsack capacity. There is no guarantee that near-optimal KP packing plans lead to near-optimal TTP solutions. However, we observed that we can generate BI-TTP solutions very close to the Pareto front with regard to the profit objective by combining near-optimal KP packing plans with efficient TSP tours for collecting them.
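A rough sketch of how the two stages might fit together. The exact way the greedy and DP stages share the knapsack capacity is not specified above, so the split below (greedy prefix up to capacity − δ, exact DP on the leftover items and space) is our assumption, not the authors' implementation:

```python
def gh_dp(values, weights, capacity, delta):
    """Two-stage KP heuristic: greedy by profit/weight ratio, then DP.

    The greedy stage fills a prefix of the ratio-sorted items while the
    weight stays below capacity - delta (our assumed split); the DP stage
    then packs the remaining items into the remaining capacity exactly.
    """
    order = sorted(range(len(values)),
                   key=lambda j: values[j] / weights[j], reverse=True)
    packed, used, remaining = [], 0, []
    for j in order:
        if not remaining and used + weights[j] <= capacity - delta:
            packed.append(j)
            used += weights[j]
        else:
            remaining.append(j)       # first misfit ends the greedy prefix
    # exact DP (profit, item-list) table over the leftover capacity
    cap = capacity - used
    dp = [(0, [])] * (cap + 1)
    for j in remaining:
        for q in range(cap, weights[j] - 1, -1):
            cand = (dp[q - weights[j]][0] + values[j],
                    dp[q - weights[j]][1] + [j])
            if cand[0] > dp[q][0]:
                dp[q] = cand
    return sorted(packed + dp[cap][1])

print(gh_dp([34, 40, 25], [30, 40, 21], capacity=80, delta=30))  # [1, 2]
```

With δ = 0 the method degenerates to an almost pure greedy packing; larger δ hands more of the decision to the exact DP stage at a higher computational cost.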

Afterward, we combine the TSP and KP solutions to create new individuals. Note that we first create two individuals from the tour found by LKH (and its symmetric counterpart) together with the empty knapsack solution. Next, we iteratively create new non-dominated individuals, such that at each iteration a single individual is created from the TSP solutions previously considered together with a partial solution of the KP solution found by the GH+DP algorithm. After creating all individuals, we select only a subset of them to compose the initial population: we randomly select a fraction of the generated individuals, controlled by a parameter between 0 and 1, uniformly distributed over all individuals generated, and then we generate random individuals to complete the initial population, which is returned at the end of the algorithm.

Elite and non-elite population

It is a common strategy of multi-objective optimization algorithms to give more importance to non-dominated solutions in the population during recombination and environmental survival. We split the population into two groups: the elites and the non-elites. The number of elites is defined beforehand by a parameter. We use the survival selection of NSGA-II (deb2002fast) as the splitting criterion (see Figure 5). The current population and the offspring are merged together, and non-dominated sorting is applied. The outcome is a number of fronts, each of which is a set of individuals. Because the survival selection requires selecting only a fixed number of individuals from the merged population, it might be the case that a front needs to be split into surviving and non-surviving individuals. In the example of Figure 5, the first two fronts survive entirely because of the non-domination criterion; the third front, however, needs to be split. Therefore, a second criterion, the crowding distance, is introduced: based on the distance to neighboring individuals in the objective space, a crowding distance metric is calculated and assigned to each individual.

We use non-dominated sorting and the crowding distance to incorporate elitism. As is usually done, before calculating the crowding distance, we normalize the objectives in order to avoid a possibly higher influence of a single objective. The elite individuals are determined by executing the NSGA-II survival on the current population with the goal of letting the desired number of elite individuals survive. The resulting survivors are added to the group of elites, and the remaining individuals to the group of non-elites.

Figure 5: NSGA-II survival selection (Illustration based on deb2002fast).
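A simplified re-implementation of this survival step for illustration (both objectives are treated as minimization, so for the BI-TTP the profit would be negated; this is a didactic sketch, not the authors' implementation):

```python
def crowding(points, front, i):
    """Crowding distance of point i within its front; extremes get infinity."""
    dist = 0.0
    for m in range(len(points[0])):
        srt = sorted(front, key=lambda j: points[j][m])
        if i in (srt[0], srt[-1]):
            return float('inf')
        lo, hi = points[srt[0]][m], points[srt[-1]][m]
        if hi > lo:
            k = srt.index(i)
            dist += (points[srt[k + 1]][m] - points[srt[k - 1]][m]) / (hi - lo)
    return dist

def survive(points, n):
    """Select n indices by non-dominated rank, ties broken by crowding."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and a != b

    remaining = list(range(len(points)))
    survivors = []
    while remaining and len(survivors) < n:
        # peel off the current non-dominated front
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)]
        if len(survivors) + len(front) <= n:
            survivors.extend(front)          # whole front fits
        else:                                # split the front by crowding
            ranked = sorted(front, key=lambda i: -crowding(points, front, i))
            survivors.extend(ranked[:n - len(survivors)])
        remaining = [i for i in remaining if i not in front]
    return survivors

points = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]   # minimize both
print(survive(points, 2))  # [0, 3]
```

Note how truncating a front keeps the boundary points first (infinite crowding distance), which is exactly the diversity-preservation behavior described above.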

Biased crossover

For the purpose of diversity preservation, we apply a biased crossover to create new offspring individuals. This is common practice when random-keys are used as a genotype representation. The biased crossover operator involves two parents: the first is randomly selected from the elite population and the second randomly from the whole population. Moreover, the biased crossover operator has a parameter ρ which defines the probability of each random-key of the first parent (which always belongs to the elite population) being inherited by the offspring individual. More precisely, from an elite parent a and any other parent b, we generate an offspring c as follows: for each position i, c_i = a_i with probability ρ, and c_i = b_i otherwise, where a_i, b_i, and c_i are, respectively, the i-th random-keys of individuals a, b, and c.
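In code, the biased crossover is only a few lines (rho is the elite-inheritance probability; names are ours):

```python
import random

def biased_crossover(elite, other, rho, rng=random):
    """Each offspring key comes from the elite parent with probability rho."""
    return [e if rng.random() < rho else o for e, o in zip(elite, other)]

rng = random.Random(0)
child = biased_crossover([0.9, 0.9, 0.9], [0.1, 0.1, 0.1], rho=0.7, rng=rng)
# every key of `child` equals either 0.9 (elite parent) or 0.1 (other parent)
```

With rho > 0.5 the offspring is biased towards the elite parent, which is what distinguishes this operator from a uniform crossover.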

Mutant individuals

As in BRKGAs, we do not use a mutation operator, which is commonly used in most GAs (mitchell1998introduction). In order to maintain diversity in the population, we use so-called mutant individuals: simply randomly created individuals where each random-key is sampled from a uniform distribution in the range [0, 1].


The population of the next generation is formed based on the current population. It is the union of three different groups of individuals:

  1. Elite population: Part of the survival is based on elitism. Before the mating, a sub-population of individuals is selected as elites according to the NSGA-II criteria. That is, non-dominated sorting determines the rank of each solution, and for each non-dominated front the crowding distance is calculated. In case of a tie, two solutions are ordered randomly. Based on this absolute ordering, we pick the elite individuals and copy them directly to the next population.

  2. Mutant individuals: In order to maintain a high diversity, a population of mutant individuals is added to the next population. The strategy of keeping a separate set of mutants introduces diversity among the offspring during mating. Otherwise, the recombination would be biased towards the elites in the population, and premature convergence through loss of diversity would be likely.

  3. Offspring from biased crossover: To complete the number of individuals in the next population, the remaining individuals are generated through mating by the biased crossover operator and added to it. The biased crossover chooses one parent from the elites and another one from the non-elites. This mating is even more biased towards the elite population than the traditional binary tournament selection used in NSGA-II, because it forces a non-dominated solution to participate in every mating.

Finally, the surviving individuals are obtained by merging these three sets. The survival is partly based on elitism, through letting the elites survive for sure, but it also adds two more diverse groups through the evolutionary operators.
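Put together, one survival step can be sketched as follows (`split_elites` and `crossover` stand in for the NSGA-II split and the biased crossover described above; sizes and names are illustrative):

```python
import random

def next_population(population, n_elite, n_mutants, genotype_len,
                    split_elites, crossover, rng=random):
    """Assemble the next generation: elites + mutants + biased offspring.

    `split_elites(pop, k)` ranks the population (e.g. by NSGA-II sorting)
    and returns (elites, non_elites); `crossover(elite, other)` mates one
    elite parent with one non-elite parent.
    """
    elites, non_elites = split_elites(population, n_elite)
    mutants = [[rng.random() for _ in range(genotype_len)]
               for _ in range(n_mutants)]
    n_offspring = len(population) - n_elite - n_mutants
    offspring = [crossover(rng.choice(elites), rng.choice(non_elites))
                 for _ in range(n_offspring)]
    return elites + mutants + offspring

# stand-ins: a trivial split and a biased crossover with rho = 0.7
split = lambda pop, k: (pop[:k], pop[k:])
cross = lambda a, b: [x if random.random() < 0.7 else y for x, y in zip(a, b)]

pop = [[random.random() for _ in range(6)] for _ in range(10)]
new_pop = next_population(pop, n_elite=3, n_mutants=2, genotype_len=6,
                          split_elites=split, crossover=cross)
assert len(new_pop) == len(pop)
```

The three group sizes always sum to the population size, so the population stays constant across generations.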

Local search

At some evolutionary cycles, we apply an exploitation procedure that modifies the genotypes of some individuals to enhance the fitness of the current population. This methodology is commonly applied to traditional genetic algorithms in order to balance the concepts of exploitation and exploration, which are both aspects of effective search procedures (neri2012memetic). Genetic algorithms with such an exploitation procedure are known and widely referenced as Memetic Algorithms (MAs). According to krasnogor2005tutorial, MAs have been demonstrated to be more effective than traditional GAs for some problem domains, especially for combinatorial optimization problems.

In NDS-BRKGA, the local search is only applied to a percentage of the population and consists of two phases: first, the tour is considered separately and the permutation is modified; second, only the packing plan is considered, and items are either removed or added through bit-flips. Algorithm 2 describes the exploitation phase in more detail. In order to preserve a high diversity in the current population, we execute the local search only on 10% of the elite individuals, selected at random. Initially, we decode each selected individual into its phenotype, consisting of a tour and a packing plan. Then the two local optimization phases are applied. First, we apply a limited local search to the tour using the well-known 2-opt move on the permutation vector. This principle has been successfully incorporated to solve various combinatorial optimization problems, including the single-objective TTP (el2016population; el2018efficiently). We limit the number of 2-opt moves to a small value, since the exploitation phase may become computationally expensive on large instances. After all 2-opt moves have been executed, the resulting tour and packing plan are added to the elite population if the solution is not dominated by any solution in it. Second, we try to improve the packing plan by applying random bit-flip moves. The bit-flip is also a well-known operator widely used in combinatorial optimization problems, including the single-objective TTP (faulkner2015approximate; chand2016fast). Again, because of the computational expense, we apply only a limited number of random bit-flip moves. As before, the new BI-TTP solution is inserted into the elite population if it is not dominated by any solution in it. Finally, if the elite population contains more than the allowed number of solutions, we select the best according to their non-dominated rank and crowding distance.

1 select randomly individuals from current elite population
2 foreach  do
3       decode in a feasible BI-TTP solution
4       for  to  do
5             generate a random 2-opt move in
6             if  then 
7       end for
8       if  then  for  to  do
9             select randomly an
10             if  then else    if  then 
11       end for
13 end foreach
14if  then
15       select individuals according to the NSGA-II criteria
16 end if
Algorithm 2 Exploitation phase.

To balance the exploration and exploitation phases during a run of the algorithm, we apply the exploitation phase only every few evolutionary cycles; the frequency is another hyperparameter of our proposed method.
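The two neighborhood moves used in the exploitation phase can be sketched as follows (a minimal illustration only; the decoding, dominance checks, and feasibility handling of Algorithm 2 are omitted, and the function names are ours):

```python
import random

def random_2opt(tour: list) -> list:
    """Apply one random 2-opt move: reverse a randomly chosen
    sub-segment of the tour. Position 0 (the start city) is kept
    fixed, since every TTP tour must start at city 1."""
    i, j = sorted(random.sample(range(1, len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def random_bitflip(packing: list, k: int) -> list:
    """Flip k randomly chosen item bits of the packing plan,
    thereby removing or adding items."""
    plan = packing[:]
    for idx in random.sample(range(len(plan)), k):
        plan[idx] = not plan[idx]
    return plan
```

In the actual algorithm, each modified solution would be evaluated and inserted into the elite population only if it is not dominated there.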

5 Computational Experiments

In this section, we present the computational experiments we have employed to study the performance of our proposed method. We have chosen C/C++ as the programming language and have used the BRKGA framework developed by toso2015c and the Lin-Kernighan Heuristic (LKH), version 2.0.9 (available at http://akira.ruc.dk/~keld/research/LKH/). The experiments have been executed on a high-performance cluster where each node is equipped with Intel(R) Xeon(R) 2.30 GHz processors. Each run of our algorithm has been performed sequentially (non-parallel) on a single processor. Our source code, as well as all non-dominated solutions found for each test instance considered, is available online at https://github.com/jonatasbcchagas/nds-brkga_bi-ttp.

In the following, we evaluate the performance of our proposed method on a variety of test instances. To avoid a bias towards test instances with only a small or large number of cities and items, we have selected instances that cover different characteristics of the problem. Given the design decisions we have made during algorithm development, we provide a detailed hyperparameter study to show the effectiveness of each customization of the evolutionary algorithm. Moreover, we present the rankings of the competitions to which we submitted our implementation.

5.1 Test instances

In order to analyze the performance, we have considered nine medium/large instances from the comprehensive TTP benchmark developed by polyakovskiy2014comprehensive. These instances, which are described in Table 3, have been used in the BI-TTP competitions at the EMO-2019 (https://www.egr.msu.edu/coinlab/blankjul/emo19-thief/) and GECCO-2019 (https://www.egr.msu.edu/coinlab/blankjul/gecco19-thief/) conferences.

Instance          n      m       Q          Knapsack Type  R
a280_n279         280    279     25936      bsc            1
a280_n1395        280    1395    637010     usw            5
a280_n2790        280    2790    1262022    unc            10
fnl4461_n4460     4461   4460    387150     bsc            1
fnl4461_n22300    4461   22300   10182055   usw            5
fnl4461_n44600    4461   44600   20244159   unc            10
pla33810_n33809   33810  33809   2915215    bsc            1
pla33810_n169045  33810  169045  77184794   usw            5
pla33810_n338090  33810  338090  153960049  unc            10
Table 3: BI-TTP test instances.

From Table 3, we can observe the characteristics of the instances, which involve 280 to 33810 cities (column n), 279 to 338090 items (column m), and knapsack capacities from 25936 to 153960049 (column Q). Furthermore, the knapsack component of each instance has been built according to the profit/weight ratio of the items in three different ways: bounded strongly correlated (bsc), uncorrelated with similar weights (usw), and uncorrelated (unc). To diversify the size of the knapsack component, the number of items available per city has been defined for each instance (column R). For instance, when R = 5, five items are available at each city. For example, for a problem with 280 cities this results in 5 x 279 = 1395 items in total, since the first city never has any items.
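The relationship between the number of cities, the items-per-city factor R, and the total number of items can be checked with a one-liner (assuming, as stated, that the first city never holds items):

```python
def num_items(n_cities: int, items_per_city: int) -> int:
    """Total number of items: every city except the first
    (which never holds items) offers items_per_city items."""
    return items_per_city * (n_cities - 1)
```

Applied to the instances of Table 3, this reproduces the listed item counts, e.g. 5 x 279 = 1395 for a280_n1395.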

5.2 Hyperparameter Study

Customization often involves adding new hyperparameters to the algorithm. Therefore, it is crucial to ensure the hyperparameters are chosen well with respect to the performance of the algorithm on a variety of test problems. For this reason, we investigate the influence of the hyperparameters of our proposed method. In the experiment, we run a systematic setup of hyperparameters in order to draw conclusions regarding their performance on different types of test instances. Finally, we provide suggestions on how to choose hyperparameters for new, unknown problems.

Our proposed method has eight hyperparameters in total, which are shown in Table 4: the population size, the elite population size, the mutant population size, the elite allele inheritance probability, the fraction of the initial population created from TSP and KP solutions (see Algorithm 1), and the frequency, in terms of evolutionary cycles, at which the local search is applied (see Algorithm 2). Furthermore, two subproblem-dependent parameters have to be defined: the upper bound on the runtime of the TSP solver and the number of different KP capacities that should be considered. To evaluate the influence of each hyperparameter, we conduct several experiments.

Parameter Description
Population Size
Number of Elites
Number of Mutant Individuals
Probability of Elite during Biased Crossover
Fraction of near-optimal Solutions obtained by solving TSP or KP independently
Frequency of Local Search Procedures
TSP - t: Time in Seconds to solve the TSP
KP - Gap of Knapsack Capacity between different KP Optimizer Runs
Table 4: Hyperparameter overview.

In this hyperparameter study, we first investigate the effect of the TSP time limit and the KP capacity gap on the performance. Both parameters affect the initial population, which partly consists of solutions from TSP and KP solvers. For solving the TSP component independently, we have used the Lin-Kernighan Heuristic (LKH), one of the most efficient algorithms for generating optimal or near-optimal solutions for the symmetric traveling salesman problem. Naturally, the computational cost of LKH grows with the instance size. To balance the computational cost and the quality of the solution achieved, we limit the LKH execution time to different values and compare the obtained solutions with the optimal solution. As LKH has random components, we run it ten independent times and use the average over these runs. In Table 5, for each TSP instance and each runtime, we show the relative percentage difference between the solution obtained with limited time and the TSP optimal solution.

TSP t in seconds
comp. 60 180 300 420 600 1800 3600
a280 0.0000% 0.0000% 0.0000% 0.0000% 0.0000% 0.0000% 0.0000%
fnl4461 0.0180% 0.0078% 0.0035% 0.0028% 0.0028% 0.0000% 0.0000%
pla33810 0.6653% 0.5964% 0.4261% 0.3339% 0.2377% 0.1084% 0.0837%
Avg. 0.2278% 0.2014% 0.1432% 0.1122% 0.0802% 0.0361% 0.0279%
Table 5: Influence of execution time on the LKH heuristic algorithm.

Table 5 shows that LKH is efficient even for short computational times. On average, LKH has been able to find solutions with a gap of less than 0.23% to the optimal solutions, even with only 60 seconds of processing. Naturally, the quality of the solutions increases with longer computational time. As we do not pursue spending a significant amount of time solving the TSP independently, we have limited the LKH runtime accordingly in our implementation. The experiment indicates that this is sufficient to produce near-optimal TSP solutions to be used in the initial population. Note that we rotate the computed tours to conform with the requirement that all TTP tours start and finish in city number 1.
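The tour rotation mentioned above amounts to a simple cyclic shift; a minimal sketch:

```python
def rotate_to_start(tour: list, start: int = 1) -> list:
    """Rotate a cyclic tour so that it begins at the given city,
    preserving the visiting order."""
    i = tour.index(start)
    return tour[i:] + tour[:i]
```

Since a TSP tour is a cycle, rotating it changes neither its length nor the order in which cities are visited.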

Moreover, the capacity-gap parameter used in the KP solver has to be studied and the performance of the GH+DP algorithm evaluated. This algorithm has been developed for solving the KP component independently. For each KP instance, we have run GH+DP for different values of this parameter and measured the quality of the solutions obtained. Table 6 shows the difference between the solution obtained and the KP optimal solution for the different values, along with the computational time required in seconds. It can be observed that for almost all instances GH+DP was able to find the KP optimal solution even with a relatively small parameter value. Moreover, the larger the value, the larger the fraction of the knapsack solved by the dynamic programming algorithm, and hence the higher the solution quality and the longer the computation time. In order to find the best possible solutions for the KP component within a reasonable time, we have chosen a value that balances both criteria.
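The section does not spell GH+DP out in full; purely as an illustration, the following sketches its dynamic-programming component, a textbook 0/1 knapsack DP evaluated at several capacities spaced a fixed gap apart (the function names and the sweep scheme are our assumptions, not the paper's exact procedure):

```python
def knapsack_dp(profits: list, weights: list, capacity: int) -> int:
    """Classic 0/1 knapsack dynamic program, O(n * capacity)."""
    best = [0] * (capacity + 1)
    for p, w in zip(profits, weights):
        # iterate capacities downwards so each item is used at most once
        for c in range(capacity, w - 1, -1):
            if best[c - w] + p > best[c]:
                best[c] = best[c - w] + p
    return best[capacity]

def capacity_sweep(profits: list, weights: list,
                   capacity: int, gap: int) -> dict:
    """Solve the KP for several capacities spaced `gap` apart,
    yielding packing values at different trade-off levels."""
    return {c: knapsack_dp(profits, weights, c)
            for c in range(capacity, -1, -gap)}
```

Solving the knapsack at reduced capacities naturally produces seed solutions spread along the profit/weight trade-off, which matches the role of this parameter in the initialization.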

KP comp.
t(s) t(s) t(s) t(s) t(s) t(s)
n279 0 0 0 0 0 0 0 0 0 0 0 2
n1395 0 0 0 0 0 0 0 3 0 33 0 1364
n2790 0 0 0 0 0 0 0 2 0 18 0 1339
n4460 0 0 0 0 0 1 0 54 0 162 0 9007
n22300 27 0 12 1 12 2 0 105 0 265 0 10234
n44600 0 0 0 0 0 1 0 47 0 136 0 5193
n33809 0 1 0 3 0 6 0 261 0 1149 0 22945
n169045 298 3 298 7 295 14 228 437 11 3980 0 35711
n338090 0 1 0 2 0 5 0 226 0 928 0 19594
Avg. 36.1 0.6 34.4 1.4 34.1 3.2 32.0 126.1 1.2 741.2 0.0 11708.8
Table 6: Influence of on GH+DP algorithm.

This preliminary study on the subproblem solvers has indicated reasonable parameter choices for the TSP and KP solvers regarding the trade-off between running time and solution quality. So far, we have evaluated the quality of each subproblem in isolation, without considering their interwovenness. The remaining initialization parameter defines how many solutions from those subproblem solvers are used during the initialization. To draw conclusions about the remaining six parameters, we have conducted an experiment with predefined parameter settings. The considered values for each parameter are shown in Table 7. We have considered all 972 possible combinations that can be formed by combining these values. Because the proposed method contains components with underlying randomness, we have run each parameter configuration multiple times for five hours each. Altogether, the experiments have consumed an amount of CPU time equivalent to several CPU years.

Parameter Values
100, 200, 500, 1000
0.3, 0.4, 0.5
0.0, 0.1, 0.2
0.6, 0.7, 0.8
0.0, 0.1, 0.2
10, 50, 100
Table 7: Parameter values considered during the experiment.
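The grid of Table 7 yields 4 x 3 x 3 x 3 x 3 x 3 = 972 combinations, which can be enumerated directly (the parameter key names below are ours, since the original symbols are defined in Table 4):

```python
from itertools import product

# Hyperparameter grid from Table 7 (values as listed there).
grid = {
    "population_size":    [100, 200, 500, 1000],
    "elite_fraction":     [0.3, 0.4, 0.5],
    "mutant_fraction":    [0.0, 0.1, 0.2],
    "elite_inherit_prob": [0.6, 0.7, 0.8],
    "seeded_fraction":    [0.0, 0.1, 0.2],
    "local_search_freq":  [10, 50, 100],
}

# Cartesian product over all parameter value lists.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
```

Enumerating the full Cartesian product is what makes the study exhaustive over the chosen grid, at the cost of 972 configurations per instance.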

We use the hypervolume indicator (HV) (zitzler1998multiobjective) to compare and analyze the results obtained from the parameter configurations. It is one of the most used indicators for measuring the quality of a set of non-dominated solutions, calculating the volume of the dominated portion of the objective space bounded by a reference point. For the BI-TTP, it considers the dominated volume regarding the minimum time and the maximum profit. Note that maximizing the hypervolume indicator is equivalent to finding a good approximation of the Pareto front; thereby, the higher the hypervolume indicator, the better the solution set (in general terms). To make the hypervolume suitable for comparison, the objective values need to be normalized beforehand. Therefore, we have first normalized the values of both objectives between 0 and 1 according to their boundaries before computing the hypervolume.
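In the bi-objective case, the (normalized) hypervolume reduces to a sum of rectangle areas over the non-dominated front; a minimal sketch with reference point (1, 0), i.e. worst normalized time and worst normalized profit:

```python
def hypervolume_2d(points: list, ref=(1.0, 0.0)) -> float:
    """Hypervolume of a set of (time, profit) points, with time
    minimized and profit maximized, w.r.t. a reference point
    (worst time, worst profit). Assumes objectives are already
    normalized to [0, 1]."""
    # extract the non-dominated front, sorted by increasing time
    front, best_profit = [], float("-inf")
    for t, p in sorted(points):
        if p > best_profit:          # strictly more profit -> non-dominated
            front.append((t, p))
            best_profit = p
    # sum the area of one horizontal strip per front point
    hv, last_profit = 0.0, ref[1]
    for t, p in front:
        hv += (ref[0] - t) * (p - last_profit)
        last_profit = p
    return hv
```

Each front point contributes the strip of profit it newly dominates, extended to the reference time; dominated points contribute nothing, which is why HV rewards both convergence and spread.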

Figure 15 shows the convergence according to the hypervolume indicator for each instance throughout five hours, considering 10-minute intervals. For each interval, we have plotted the result of the best parameter configuration; each such configuration is listed in Table 8. It is important to note that the vertical axes (hypervolume values) of the plots are not on the same scale. The figure shows that our proposed method has been able to converge quickly for most of the instances, indicating that the algorithm does not need excessive processing time to achieve good solutions.

(a) a280_n279
(b) a280_n1395
(c) a280_n2790
(d) fnl4461_n4460
(e) fnl4461_n22300
(f) fnl4461_n44600
(g) pla33810_n33809
(h) pla33810_n169045
(i) pla33810_n338090
Figure 15: Convergence plots according to the hypervolume indicator.
Instance Runtime Best parameter configuration
a280_n279 600 500 0.4 0.1 0.7 0.1 10
1200 - 1800 500 0.4 0.1 0.7 0.2 10
2400 - 3000 500 0.3 0.1 0.6 0.2 10
3600 - 6000 1000 0.3 0.1 0.8 0.2 10
6600 - 7800 1000 0.3 0.2 0.8 0.1 10
8400 500 0.4 0.1 0.8 0.1 100
9000 - 9600 1000 0.3 0.2 0.8 0.1 10
10200 - 18000 500 0.3 0.1 0.8 0.2 50
a280_n1395 600 - 1200 1000 0.5 0.0 0.7 0.1 10
1800 1000 0.5 0.0 0.6 0.1 10
2400 - 4800 1000 0.5 0.0 0.7 0.1 50
5400 - 16200 1000 0.5 0.0 0.6 0.1 50
16800 - 18000 1000 0.5 0.0 0.6 0.1 100
a280_n2790 600 1000 0.5 0.0 0.7 0.1 100
1200 - 1800 1000 0.5 0.0 0.6 0.1 50
2400 - 3000 1000 0.5 0.0 0.7 0.1 50
3600 - 18000 1000 0.5 0.0 0.6 0.2 50
fnl4461_n4460 600 500 0.5 0.0 0.7 0.2 50
1200 - 18000 1000 0.5 0.0 0.6 0.2 50
fnl4461_n22300 600 1000 0.5 0.0 0.8 0.1 10
1200 - 4800 1000 0.5 0.0 0.6 0.2 100
5400 - 10800 1000 0.5 0.0 0.6 0.1 100
11400 - 18000 1000 0.5 0.0 0.6 0.2 50
fnl4461_n44600 600 1000 0.5 0.0 0.8 0.2 100
1200 1000 0.5 0.0 0.7 0.2 100
1800 - 3000 1000 0.5 0.1 0.7 0.2 100
3600 - 7800 1000 0.5 0.0 0.6 0.2 100
8400 - 15000 1000 0.5 0.1 0.6 0.2 100
15600 - 18000 1000 0.5 0.0 0.6 0.2 50
pla33810_n33809 600 500 0.4 0.2 0.7 0.1 10
1200 500 0.5 0.1 0.8 0.2 10
1800 1000 0.4 0.0 0.7 0.1 10
2400 - 18000 1000 0.5 0.0 0.6 0.2 10
pla33810_n169045 600 - - - - - -
1200 100 0.3 0.0 0.7 0.1 10
1800 500 0.3 0.0 0.7 0.1 10
2400 - 3000 500 0.5 0.2 0.8 0.2 10
3600 1000 0.4 0.0 0.7 0.2 10
4200 1000 0.4 0.0 0.8 0.1 10
4800 - 5400 1000 0.5 0.0 0.6 0.1 10
6000 - 10200 1000 0.5 0.0 0.8 0.1 10
10800 - 18000 1000 0.5 0.2 0.8 0.2 10
pla33810_n338090 600 1000 0.5 0.2 0.8 0.2 100
1200 1000 0.5 0.0 0.8 0.2 50
1800 - 4800 1000 0.5 0.0 0.8 0.2 100
5400 - 18000 1000 0.5 0.2 0.8 0.2 100
Table 8: Best parameter configurations as observed in the hyperparameter study.

Table 8 shows that the best parameter configuration for each instance changes over the runtime. However, the number of such changes decreases as the runtime increases, which means our method remains stable regarding its best parameter configuration for longer runtimes. Although no single parameter configuration extracts the best performance over all instances, we can observe some patterns and trends among the different configurations. A good parameter configuration tends to have a larger population, a higher survival rate for the best individuals, an insignificant contribution from mutant individuals, and a high contribution of the TSP and KP solvers to the initial population.

In the following, we analyze the behavior of the parameters considering all instances together. In Figure 22, we visualize the best parameter configurations at six different execution times. In each plot, the best obtained parameter configuration regarding hypervolume is highlighted in red, and parameter configurations up to 0.1% worse than the best are highlighted in blue. Since some parameter configurations share parameter values, the importance of each value among the best configurations is indicated by the intensity of the blue color. The following can be observed:


  1. More execution time, better results: The number of parameter configurations capable of generating large hypervolume values increases with the execution time of our algorithm. This means that, given enough time, the algorithm is still able to converge even in runs where the hyperparameters were not set appropriately.

  2. Importance of TSP and KP solvers: Whether TSP and KP solvers are used for initialization has an influence on the overall performance of the algorithm. The best results are obtained if 10 or 20 percent of the initial solutions are biased towards the solutions found by the TSP and KP solvers.

  3. Trends when execution time increases: We can see a trend as the execution time increases: our method performs better with a large population, a large survival rate, little or no explicit diversification through mutant individuals, a small influence of single-parent inheritance, a minor influence of a good initial population, and a significant influence of the local search procedure.

(a) 600 seconds
(b) 1200 seconds
(c) 1800 seconds
(d) 3600 seconds
(e) 7200 seconds
(f) 18000 seconds
Figure 22: Best parameter configurations over all instances with varying execution times.

5.3 Competition Results

In order to analyze the efficiency of NDS-BRKGA compared to other methods, we present the results of the BI-TTP competitions held at the EMO-2019 and GECCO-2019 conferences, to which our solutions have been submitted. Following the criteria of both competitions, the submissions are compared for each test instance according to the hypervolume indicator. For each instance, the reference point used to calculate the hypervolume has been defined by the maximum time and the minimum profit over the non-dominated set built from all submitted solutions.

In both competitions, the number of accepted solutions for each instance has been limited, with different limits for the test instances based on a280, fnl4461, and pla33810. Because NDS-BRKGA returns a non-dominated set of solutions whose size can exceed the maximum number of accepted solutions, we apply the dynamic programming algorithm developed by auger2009investigating, which finds a subset of the prescribed size such that the weighted hypervolume indicator of the subset is maximal. As stated by auger2009investigating, this dynamic program runs in polynomial time.
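The flavor of such a hypervolume-maximizing subset selection can be conveyed by a simple O(k n^2) dynamic program over a bi-objective front (an illustration in the spirit of, not a reimplementation of, the cited algorithm):

```python
def select_subset(front: list, k: int, ref: tuple):
    """Pick k points from a non-dominated (time, profit) front
    (time minimized, profit maximized) maximizing the 2D hypervolume
    w.r.t. the reference point ref = (worst_time, worst_profit)."""
    T, P0 = ref
    pts = sorted(front)                  # time ascending -> profit ascending
    n, NEG = len(pts), float("-inf")
    # dp[j][i]: best HV using j points, point i being the last selected
    dp = [[NEG] * n for _ in range(k + 1)]
    back = [[None] * n for _ in range(k + 1)]
    for i, (t, p) in enumerate(pts):
        dp[1][i] = (T - t) * (p - P0)
    for j in range(2, k + 1):
        for i, (t, p) in enumerate(pts):
            for h in range(i):
                # point i adds the profit strip above the previous point h
                cand = dp[j - 1][h] + (T - t) * (p - pts[h][1])
                if cand > dp[j][i]:
                    dp[j][i], back[j][i] = cand, h
    # backtrack from the best final point
    j, i = k, max(range(n), key=lambda x: dp[k][x])
    chosen = []
    while i is not None:
        chosen.append(pts[i])
        i = back[j][i]
        j -= 1
    return list(reversed(chosen)), max(dp[k])
```

The key observation making the DP work is that, on a sorted front, each point's hypervolume contribution depends only on the previously selected point's profit.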

For the EMO-2019 competition, we have used a preliminary version of the NDS-BRKGA described in Section 4. At that time, our method did not use the exploitation phase summarized in Algorithm 2. In addition, the initial population of that version has been created essentially at random. Only four individuals have been created from TSP and KP solutions, obtained by the same solvers previously described in this work. More precisely, those four individuals have been built from BI-TTP solutions combining the tour found by the LKH algorithm and its symmetric (reversed) counterpart, each paired with the packing plan obtained by the GH+DP algorithm or an empty knapsack.

In addition to our method, five other teams submitted their solutions to the EMO-2019 competition. Among all submissions, NDS-BRKGA has shown the best performance on seven of the nine test instances, resulting in the first-place award. All competition details, classification criteria, and results can be found at https://www.egr.msu.edu/coinlab/blankjul/emo19-thief/, where our submission is identified as "jomar" (a reference to the two authors, Jonatas and Marcone, who first worked on our algorithm). A brief summary of these results is also presented later in this section.

After the EMO-2019 competition, we realized that including more individuals created from TSP and KP solvers helps the evolutionary process of our algorithm by providing more individuals with higher fitness from the first evolutionary cycles on. Therefore, we initialize the population as shown in Algorithm 1. This new initialization has been used in the GECCO-2019 competition, another BI-TTP competition with the same criteria and classification rules as the EMO-2019 competition.

In the GECCO-2019 competition, several teams submitted their solutions, and NDS-BRKGA won second place in the final ranking. All detailed competition results can be found at https://www.egr.msu.edu/coinlab/blankjul/gecco19-thief/, where our submission is again identified as "jomar".

Table 9 shows a summary of the final results of both BI-TTP competitions. For each instance, we list the hypervolume achieved by the five best approaches that have been submitted to each competition. Our results are highlighted in bold.

Instance EMO-2019 GECCO-2019
Approach HV Approach HV



HPI 0.898433
ALLAOUI 0.835566



shisunzhang 0.823563 shisunzhang 0.886576
rrg 0.754498 NTGA 0.883706
CIRG_UP_KUNLE 0.000000 ALLAOUI 0.873484



HPI 0.825913
shisunzhang 0.756445



rrg 0.684549 shisunzhang 0.820893
ALLAOUI 0.581371 NTGA 0.811490
CIRG_UP_KUNLE 0.000000 ALLAOUI 0.808998





shisunzhang 0.861102 HPI 0.887571
rrg 0.704428 ALLAOUI 0.885144
ALLAOUI 0.621785 NTGA 0.882562
CIRG_UP_KUNLE 0.000000 shisunzhang 0.874371



HPI 0.933901
shisunzhang 0.719242



ALLAOUI 0.553804 NTGA 0.914043
CIRG_UP_KUNLE 0.000000 ALLAOUI 0.889219
OMEGA 0.000000 SSteam 0.854150
fnl4461_n22300 shisunzhang 0.670849 HPI 0.818938





ALLAOUI 0.139420 NTGA 0.803470
CIRG_UP_KUNLE 0.000000 SSteam 0.781462
OMEGA 0.000000 ALLAOUI 0.760480
fnl4461_n44600 shisunzhang 0.540072 HPI 0.882894





ALLAOUI 0.009693 SSteam 0.856863
CIRG_UP_KUNLE 0.000000 shisunzhang 0.850339
OMEGA 0.000000 NTGA 0.824830



HPI 0.927214
shisunzhang 0.496913 NTGA 0.888680
ALLAOUI 0.090569 ALLAOUI 0.873717
CIRG_UP_KUNLE 0.000000



OMEGA 0.000000 SSteam 0.832557



HPI 0.818259
shisunzhang 0.022390 SSteam 0.776638
ALLAOUI 0.007377 NTGA 0.773589
CIRG_UP_KUNLE 0.000000 ALLAOUI 0.769078
OMEGA 0.000000






HPI 0.876129
shisunzhang 0.049182 SSteam 0.853805
ALLAOUI 0.001853



CIRG_UP_KUNLE 0.000000 ALLAOUI 0.836965
OMEGA 0.000000 NTGA 0.781286
Table 9: Results of the BI-TTP competitions. The results obtained by NDS-BRKGA were submitted by the team jomar.

The results of the EMO-2019 competition show that the difference between the hypervolume achieved by NDS-BRKGA and by the other approaches has been less significant on smaller instances than on larger ones. It is also worth mentioning that for the instances pla33810_n169045 and pla33810_n338090, the difference between NDS-BRKGA and the second-best approach has been higher than 0.65 (65% of the total hypervolume). Comparing the results of NDS-BRKGA across both competitions, we can clearly see the improvement achieved, especially on larger instances, by the current form of generating the initial population. The results on the instances fnl4461_n22300 and fnl4461_n44600 have improved significantly compared to those obtained with the preliminary version of the algorithm submitted to the EMO-2019 competition.

The results of the GECCO-2019 competition show that NDS-BRKGA was able to win on the test instance a280_n2790 and reached second place five times. On the test instances based on pla33810, NDS-BRKGA achieved one of the top five ranks. After the GECCO-2019 competition, we incorporated the exploitation phase as the most recent enhancement to our method, thus completing the NDS-BRKGA described in Section 4. In order to compare the performance of all versions, we plot the hypervolume reached by each version on each instance according to the criteria of the competitions (see Figure 23). It can be observed that including more individuals from good solutions of the individual BI-TTP components brought a significant improvement. However, we did not observe a major improvement from the exploitation phase after five hours of running time, except for the test instance pla33810_n169045, where the hypervolume increases by around 4.2%. Nevertheless, we have noticed a significantly faster convergence after incorporating it.

Figure 23: Hypervolumes obtained by the different NDS-BRKGA versions.

Because we have improved our method after the GECCO-2019 competition, we have reevaluated the results based on the final version of NDS-BRKGA proposed in this paper. The results are shown in Table 10, which lists, for each instance, the hypervolume achieved by the four best approaches submitted to the two BI-TTP competitions. In the last column of the table, we list the difference between the hypervolume reached by each approach and that reached by the best one for each instance.

Instance Approach HV Diff.
a280_n279 HPI 0.898433 0.000000




shisunzhang 0.886576 0.011857
NTGA 0.883706 0.014727
ALLAOUI 0.873484 0.024949




HPI 0.825913 0.000964
shisunzhang 0.820893 0.005984
NTGA 0.811490 0.015387
ALLAOUI 0.808998 0.017879




HPI 0.887571 0.000374
ALLAOUI 0.885144 0.002801
NTGA 0.882562 0.005383
shisunzhang 0.874371 0.013574
fnl4461_n4460 HPI 0.933901 0.000000




NTGA 0.914043 0.019858
ALLAOUI 0.889219 0.044682
SSteam 0.854150 0.079751
fnl4461_n22300 HPI 0.818938 0.000000




NTGA 0.803470 0.015468
SSteam 0.781462 0.037476
ALLAOUI 0.760480 0.058458
fnl4461_n44600 HPI 0.882894 0.000000




SSteam 0.856863 0.026031
shisunzhang 0.850339 0.032555
NTGA 0.824830 0.058064
pla33810_n33809 HPI 0.927214 0.000000
NTGA 0.888680 0.038534
ALLAOUI 0.873717 0.053497




SSteam 0.832557 0.094657
pla33810_n169045 HPI 0.818259 0.000000




SSteam 0.776638 0.041621
NTGA 0.773589 0.044670
ALLAOUI 0.769078 0.049181
pla33810_n338090 HPI 0.876129 0.000000




SSteam 0.853805 0.022324
ALLAOUI 0.836965 0.039164
NTGA 0.781286 0.094843
Table 10: Best result of the BI-TTP competitions vs. our final NDS-BRKGA.

One can notice that NDS-BRKGA has outperformed all other approaches on two instances (a280_n1395 and a280_n2790). On the instances a280_n279, fnl4461_n4460, fnl4461_n22300, and fnl4461_n44600, NDS-BRKGA reached second place with a small difference to the first (an average hypervolume difference of 0.002970). On the three largest instances, NDS-BRKGA also finished second.

5.4 Comparison with single-objective TTP solutions

Lastly, we build a bridge to the single-objective TTP, which has been the main focus of research so far. We compare our results with the best known single-objective TTP objective scores, which come from comprehensive comparisons of a variety of algorithms; note that the computational budgets of the approaches that obtained the best known solutions may vary. Table 11 compares, for each instance, the best known score of the TTP with the best score found by our algorithm while optimizing the BI-TTP. NDS-BRKGA has found better scores for the three smallest instances (280 cities and up to 2790 items) and provides competitive results for the other test instances.

Instance BKS NDS-BRKGA
a280_n279 18470.000 18603.120
a280_n1395 110147.219 115445.521
a280_n2790 429081.783 429085.353
fnl4461_n4460 263040.254 256402.882
fnl4461_n22300 1705326.000 1567933.421
fnl4461_n44600 6744903.000 6272240.702
pla33810_n33809 1863667.592 1230174.003
pla33810_n169045 15634853.130 12913836.427
pla33810_n338090 58236645.120 55678603.771
Table 11: Single-objective comparison of the TTP objectives scores: best known solution (BKS) vs. NDS-BRKGA. BKS data is available at https://cs.adelaide.edu.au/~optlog/research/combinatorial.php. In the BI-TTP formulation, the TTP objective score is not an explicitly stated optimization goal.

6 Concluding Remarks

In this paper, we have investigated the bi-objective traveling thief problem where the Traveling Salesman and Knapsack Problem interact with each other. We have proposed an evolutionary optimization algorithm that uses the principles of customization to effectively solve this challenging combinatorial optimization problem. Each customization addresses one specific problem characteristic that needs either to be considered during the optimization or can be used to further improve the convergence of the algorithm.

Our proposed method incorporates problem knowledge by creating a biased initial population that contains individuals generated by existing efficient solvers for each subproblem independently. Moreover, the capacity constraint is handled through a customized repair operator during the evolution, and the heterogeneous variables are unified through a genotype-to-phenotype mapping. To address the existence of two objectives, we have used non-dominated sorting and crowding distance during the environmental survival. To further improve the convergence, a local search is applied selectively during the evolution.

Since those customizations come with various hyperparameters, we have conducted an extensive experiment to show the effect of each parameter on the overall performance of the algorithm. Our results indicate that the best-performing configurations are those with larger population size, a higher survival rate for the best individuals and a high contribution of the TSP and KP solvers for creating part of the initial population. The contribution of mutant individuals seems to be insignificant.

As a future study, a new way of initializing the population is worth investigating. So far, we have used solutions obtained by the subproblem solvers, but have not considered seeding the population with already known good TTP solutions. Moreover, it is worth investigating how the proposed concepts can be used for other optimization problems in which two problems interact with each other. Since the customization of genetic algorithms is a powerful tool for designing algorithms for a specific problem, the proposed concepts can be embedded into other algorithms in order to solve similar problems.

The authors thank Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Universidade Federal de Ouro Preto (UFOP), and Universidade Federal de Viçosa (UFV) for supporting this research. The authors would also like to thank the HPI Future SOC Lab and its team for enabling this research by providing access to their compute infrastructure.