Code available at https://github.com/lfsimoes/beam_paco__gtoc5.
The design of multi-rendezvous spacecraft trajectories poses a considerable challenge to aerospace engineers. This is due, in part, to the combinatorial nature of the problem that emerges with the increase in number of bodies to visit along a mission (e.g., planets, moons, asteroids). The complexity of the task stems from an interplay of multiple factors under optimization, including: the decision of which of the bodies of interest to visit, the order in which they are to be visited, and the design of the actual trajectory arcs to take the spacecraft between them. Maximization of such a mission’s scientific return may demand for as many bodies to be visited as possible, in the shortest possible amount of time, while consuming the lowest possible amount of propellant mass. The underlying optimization problem can be seen as a variant of the well known Traveling Salesman Problem (TSP), with nodes corresponding to the celestial bodies under consideration, and edge weights a function of the costs (time and mass) to propel the spacecraft between them. These costs vary as bodies move in space along their trajectories, but also as a function of the spacecraft’s state: a lighter spacecraft that has already shed some of its mass is simpler to maneuver.
As evidence into the complexity, and relevance, of the above described problems, consider the following: since 2005, the aerospace engineering community has periodically organized GTOC, the Global Trajectory Optimization Competition 
. In it, different groups have taken turns at creating “nearly-impossible” problems of interplanetary trajectory design to pose to the community. Of the 8 competitions organized to date, 4 were multiple-asteroid rendezvous problems of the kind considered here, and 3 others were multiple fly-by problems posing similar combinatorial optimization challenges.
, a multi-objective Beam Search algorithm is described, and applied to a low-thrust model of the GTOC7 problem. In this research, we propose a series of extensions to it. First, we provide an improved orbital phasing indicator, and a procedure to create from it a probability distribution over candidate bodies to extend missions with. Second, we hybridize the Beam Search procedure with the well known Ant Colony Optimization algorithm. We conclude by evaluating the resulting algorithms on a Lambert model of the GTOC5 problem . Two main research questions are investigated in this paper:
Can the randomization of Beam Search, via probabilistic branching choices, improve performance?
Does the pheromone-based positive reinforcement of sequences lead to improved performance?
This paper is organized as follows: in Sec. 2 we list related work in the combinatorial optimization of spacecraft trajectories. Sec. 3 describes the Beam Search algorithm, and the proposed randomized variants. In Sec. 4 we describe the GTOC5 problem used in our experiments, and in Sec. 5 the interfacing between it and the search algorithms. Sec. 6 reports on an experimental evaluation, and Sec. 7 discusses its results. Conclusions are drawn in Sec. 8.
2 Related work
Beam Search [2, 28] has emerged as the de facto standard approach to tackle the combinatorial optimization sub-problems present in most GTOC competitions. Though at times called by other names, it is common to find the general architecture of a tree search that has its computational cost bounded via the selection of a limited number of nodes to branch at each depth-level (non-selected nodes at that depth being immediately discarded). We can find examples of such algorithms in the winning solutions to GTOC4 , GTOC5 , and in the second ranked solution to GTOC7 , which the present research builds on. The Lazy Race Tree Search described in , which at the time presented the best known solution to the GTOC6 problem, can also be seen as a Beam Search variant. In it, the “beam” is composed of all nodes, possibly originating from different tree depths, that fall within a given mission time window. The most promising nodes in that sliding window are branched, and the remaining ones discarded.
used Genetic Algorithms with “hidden genes”, to evolve chromosomes encoding asteroid sequences. These approaches were however outperformed in the GTOC5 competition by tree-based approaches. In an evolutionary approach is described for designing debris removal missions. In this highly dynamic trajectory problem, the Inver-over Genetic Algorithm was found to provide competitive solutions to those constructed by different approaches.
Ant Colony Optimization (ACO)  was used by some teams in the GTOC competitions over the years. However, to our knowledge, no scientific publication has been produced to date with the details of such deployment. The most successful use of ACO in a GTOC competition was possibly that by the NASA/JPL team, winners of GTOC7. According to the GTOC portal , a “very competitive solution was found by JPL using an Ant Colony Optimization approach. Eventually a different solution turned out to be better and was thus submitted”111The NASA/JPL team’s GTOC7 submission report and workshop slides, containing details of their ACO deployment, can be found in the GTOC portal .. Other applications of ACO algorithms to optimize the sequences of bodies to visit along a spacecraft’s trajectory can be found in [5, 4, 26]. In them, the test problems over which algorithms are evaluated have few bodies to select from (), and the found sequences visit bodies. In contrast, sequences of up to 17 asteroids are assembled here, from a database of 7075 available asteroids.
A hybridization of Beam Search and ACO was previously presented in . A different hybridization is introduced here, “Beam P-ACO”, that differs mainly in the ACO variant under use, and in being a multi-objective algorithm.
3 Beam Search
is a tree search algorithm where computational cost is bounded by employing heuristics that allow for non-promising solutions under construction to be discarded. It can be executed as a variant of depth-first or breadth-first search. When operating as a variant of breadth-first search, as is done here, Beam Search traverses the tree one depth-level at a time. From all the solutions generated at one level, only a limited subset (the so called “beam”) will be selected for carrying over to the next level. An evaluation of solutions’ quality determines whether they are included in the beam, or instead permanently discarded from the search. The size of the beam is designated as the “beam width”, and is here represented as. The extension of partial solutions in the beam can be performed towards all possible successor nodes, or instead towards only a limited number, enabling further control over the search’s computational cost. A “branching factor” parameter, here represented as , indicates that each solution in the beam will be extended only towards
successor nodes, chosen as a function of how good the solutions they lead to are estimated to be.
In summary, at one level of the tree, each of the beam’s solutions will be expanded towards new nodes, resulting in new partial solutions. These are then evaluated, and the best of those will become the beam that is carried over to the next tree level. The search process is therefore driven by two heuristics: 1) , which evaluates partial solutions (a path down from the tree’s root node), and 2) , which evaluates candidate successor nodes for extending solutions with. The more accurately these heuristics point to the complete optimal solution, the more successful the search will be. Beam Search being an “incomplete” search algorithm however, the identification of the globally optimal solution is not guaranteed .
3.1 Multi-objective heuristics
The first extension we introduce to the conventional Beam Search framework is the addition of multi-objective heuristics. That is, either , or (or both) will evaluate partial solutions or candidate extensions, respectively, according to multiple objectives. In such a scenario, it is then necessary to employ techniques that will allow for the ranking of alternatives, or probabilistic selection among them, that take into account the multiple evaluations.
In our current implementation, evaluates partial solutions according to multiple objectives, while the evaluation of candidate extensions by remains single-objective. Beam Search must then be able to select at each tree level the best solutions from among the newly generated extensions. To that end, we employ a Pareto dominance approach. Specifically, we apply non-dominated sorting  to the pool of extensions, and include in the beam as many of the best Pareto fronts as needed to reach the beam size . That process will most likely lead to a final Pareto front whose full inclusion in the beam would exceed . A tie-breaker criterion must then be defined to determine which of those (equally good) solutions to keep, and which to discard (see Sec. 5.3).
3.2 Probabilistic branching
In the “Stochastic Beam” algorithm, the beam’s construction remains deterministic, but the branching stage will now be subjected to probabilistic decisions. Given a solution in the beam, with a probability it will be extended towards the nodes with best evaluation. With a probability of , the choice of nodes will instead be a biased sampling without replacement, proportional to . Note that through a parameter setting of the algorithm reverts to the previously described (deterministic) Beam Search.
Beam Search will always converge to the same solution, given the same root node. Stochastic Beam however, can be executed multiple times, and possibly converge to different solutions in each run. These solutions may outperform those found by Beam Search, by including links to nodes that Beam Search incorrectly prunes, due to ranking above the threshold. Non-determinism therefore provides a degree of robustness against imperfections in the heuristic.
3.3 Hybridization with Ant Colony Optimization
A hybridization of Beam Search with Ant Colony Optimization brings two mains changes to the algorithmic framework: 1) multiple tree searches are now performed, in consecutive runs designated as “generations”, and 2) positive feedback takes place, in the form of “pheromones” that change the heuristic evaluation of candidate successor nodes, therefore biasing the dynamics of tree searches in subsequent generations.
Of the many ACO variants in existence, we chose to hybridize with the Population-based Ant Colony Optimization algorithm (P-ACO), introduced in [13, 14, 15, 12]. A detailed analysis in  found it to be “competitive to the state-of-the-art ACO algorithms with the advantage of finding good solution quality in a shorter computation time”. Also, a recent thorough benchmarking of a high number of approaches to solve the Traveling Salesman Problem found P-ACO to be the best of the tested global optimization algorithms, as well as the best overall algorithm when seeded and hybridized with local search .
In “Beam P-ACO”, a partial solution at node , with available successors evaluates the quality of extending towards node by
where is the pheromone concentration along an edge, and (known as in the common ACO notation) is the problem-specific heuristic. The weighting factors and determine the relative contributions of pheromone and heuristic values to the branching decision. As in the previously described Stochastic Beam algorithm, with a probability the solution is extended towards the nodes with best evaluation, while with a probability of , that choice is instead a biased sampling without replacement proportional to . A setting of would result in all pheromone information being ignored, and Beam P-ACO would then revert to the Stochastic Beam algorithm.
The pheromone concentration along an edge takes in P-ACO discrete values in a given range . We define these parameters as and , where is the total number of nodes. This implements the convention from  of having the row/column sum of initial pheromone values be 1 (assuming a problem where revisits are disallowed, and the diagonal of the pheromone matrix is therefore 0). Given the pheromone range, and , the maximum size of a population of top solutions, we can define as the pheromone increment deposited in an edge by a solution in the population that follows it along its path. If solutions in the population include the edge , its pheromone concentration will then be .
3.4 Pareto elitism
To complete the definition of Beam P-ACO, a population update model must be defined, a model that handles multi-objective evaluations produced by the heuristic. We propose here a variation of the method defined in [15, Sec. 3.1]. An archive will collect the set of non-dominated solutions found so far. At the end of each tree search, the best found solutions are merged into it. After updating the archive, the population that defines the pheromone matrix is reset.
In standard P-ACO , the population is implemented as a FIFO-queue (each generation’s best solution enters the population, possibly displacing the oldest solution it contains so as not to exceed the population size ). We propose here instead to have one FIFO-queue per each of the problem’s nodes, each with a size limit of . When resetting the population, all FIFO-queues are emptied. Then, solutions in the archive are shuffled, and one by one, they are added to the population (of edges). Each edge in those solutions will result in being added to the th FIFO-queue. The th FIFO-queue then directly maps to the th row of the pheromone matrix, and its contents define which nodes do pheromones most bias a solution at node to branch towards.
If solutions visit all nodes, as is the case in many problems (e.g., TSP), this process will result in only the last solutions of the shuffled archive adding pheromones to the population. A random unbiased sampling of archive solutions would then be sufficient. However, in problems where solutions include only some of the nodes (such as here in the GTOC5 problem, where , but a solution will visit nodes), only small amounts of pheromone would be deposited, and plenty of information in the archive would be ignored. By following in such problems the process described above, the pheromone matrix may now receive contributions from more than solutions, while still only receiving contributions at the level of each individual node. This way, in the limit, all solutions in the archive may end up depositing pheromones.
4 The GTOC5 trajectory design problem
The trajectory design problem posed in the 5th edition of the Global Trajectory Optimization Competition (GTOC5) is used as benchmark in the current research. The full problem specification can be found in . The problem dataset, along with additional information related to this edition of the competition, can be found in the GTOC portal . Our current work makes use of the problem model developed by the competition’s 4th ranked team .
In the GTOC5 problem, a spacecraft leaves the Earth at some point along an 11-year time-window, to embark on a 15 year (max.) mission of asteroid exploration. The spacecraft starts with a mass of 4000 kg, of which 3500 are reserved for propellant mass, and the scientific equipment used at asteroids. A total of 7075 asteroids are available as possible targets to visit along the mission. The exploration of a single asteroid is carried out in two stages. First, the spacecraft must rendezvous (match position and velocity) with the asteroid, and leave there a 40 kg scientific payload. Later in the mission, the spacecraft performs a fly-by of the asteroid, and sends a 1 kg “penetrator” towards it. Upon impact, this penetrator would release a cloud of debris, that would be investigated by the payload left there. Partial scores are given for the rendezvous and fly-by maneuvers. An asteroid on which both are performed contributes 1 point to the full score. In the problem models developed by most teams, including the one used here, the problem is simplified by having an asteroid’s fly-by maneuver performed immediately after its rendezvous: the spacecraft departs the asteroid, moves some distance away, and then accelerates back towards it. Under this simplification, a trajectory that manages to complete all pairs of maneuvers on a given sequence of asteroids will score points.
GTOC5 was won by a team from NASA/JPL with a score 18 trajectory . In contrast, the model being used here, with the initial trajectory conditions listed in Sec. 6.1, has only ever allowed the discovery of trajectories with at most a score 16. Nevertheless, for the goal of evaluating the performance of algorithms that solve the combinatorial part of the problem (finding the asteroid sequence that enables the greatest possible score), the used model is perfectly suitable.
5 Bilevel optimization of GTOC5 trajectories
The design of GTOC5 trajectories is here tackled as a multi-objective bilevel optimization problem . At an upper level, optimization seeks to identify a good subset of asteroids, and the order in which they are to be visited. At a lower level, each chosen pair of asteroids triggers an optimization of the trajectory leg that takes the spacecraft between them. This section details how the different Beam Search variants are employed for solving the upper-level combinatorial problem, and how the lower-level process optimizes transfer legs.
5.1 Orbital phasing indicators as heuristic estimators
The problem of assembling an efficient heuristic able to help select possible asteroid targets is crucial and very difficult at the same time. The ground truth (i.e. the optimal cost obtained optimizing the transfer leg) is far too expensive to be be computed for all possible asteroid targets and for all states encountered along the search. A solution to this problem was recently proposed via the development of so-called orbital phasing indicators 
. These essentially allow to introduce for each epocha metric over the set of all possible asteroids, a metric that can, in turn, be used to detect asteroid neighborhoods efficiently. It was shown in  how the orbital indicator, defined as , where and and correspond to the asteroid ephemeris, positively correlates to some extent to the ground truth (i.e. the asteroids that are actually easy to reach via an orbital transfer). It essentially considers a snapshot at of the asteroid population and, using a zero order approximation for the dynamics, predicts what asteroids are the closest in terms of transfer . As such, it is bound to neglect the known state of the asteroid population at the arrival time , which seems as a loss of available information. A simple modification to the orbital indicator, though, allows to account for the final asteroid geometry and thus to improve the overall correlation to the ground truth. In essence, one can consider the orbital indicator backward in time, starting from the arrival asteroid, to get a new indicator (note that asteroid velocities will have to have their sign inverted). The average between the two, i.e. the orbital indicator and the backward orbital indicator, is what we use here and call improved orbital indicator, defined as , where
For every node being branched during the search, to which corresponds a trajectory presently at asteroid , we compute for all of the problem’s nodes, assuming a reference transfer time of days. From it we build a probability distribution over successor nodes as , where is the rank in of among all estimated costs. This results in a selection probability that decays exponentially with increasing rank, at a rate tuned through . A setting of is used in this problem. There is then a chance of branching towards an asteroid ranked among the 50 best, and among the 250 best. Finally, the problem disallows revisits to asteroids. So, for any node , if was already visited at any previous point.
5.2 Optimization of transfer legs
During the tree search, branching a partial solution into a given node triggers an optimization process. Its end result will be the definition of the transfer leg that allows the spacecraft to rendezvous with the corresponding asteroid. Adding the new leg extends the mission’s trajectory, which can then be reevaluated by the heuristic. In the approach followed here, only the trajectory’s rendezvous legs need to be optimized. The cost estimates for self-fly-by legs are instead approximated by a linear acceleration model [21, Sec. 3].
Each transfer will have a duration in a set time window of days. A grid of 50 evenly spaced values defines the candidate transfer times, ( days separation between grid points). For each , we employ PyKEP’s 222http://esa.github.io/pykep/ multiple revolution Lambert solver [17, 19] to design a trajectory arc having that exact duration. If multiple revolution solutions exist for a given , the one with lowest is chosen. Two constraints are imposed: 1) should exceed the parabolic time of flight given by Barker’s equation333Legs failing this check are immediately discarded, saving computation time that would otherwise be spent generating Lambert arcs with excessive ., and 2) the leg’s maximum acceleration should be of the maximum supported by the spacecraft. These constraints lead many transfers to have no feasible solution, for any (the targeted asteroid is simply unreachable). From all points in the grid that do have a feasible solution, the one with lowest is chosen to define the new rendezvous leg. In this approach, the optimization of one leg then equates to finding the solutions to at most 50 Lambert’s problems. Note that this final cost is the ground-truth to the indicator described in the previous section, from which is defined. Note also that we take here a greedy choice of , imposing in that way a transfer leg upon the combinatorial search problem that might in the long term be sub-optimal with respect to its own goals.
5.3 Trajectory evaluation, ranking, and selection
Trajectories are evaluated by with respect to three criteria: 1) the mission’s score, 2) the total mass required for propellant, and scientific equipment left at asteroids, and 3) the total time of flight. The ideal mission will have the greatest possible score, while requiring the lowest possible amounts of mass and time.
As mentioned in Sec. 3.1, a Pareto dominance approach  is used for handling the multiple objectives. However, in this problem mass and time evaluations are only fairly comparable among trajectories that share the same score. As such, ranking a set of trajectories involves 1) binning trajectories according to their score, and 2) applying non-dominated sorting  over the mass and time costs of trajectories within each bin. Identifying the top trajectories in a given set takes place by iterating through bins in descending order of score, gradually extracting their Pareto fronts. If only a subset of a Pareto front’s trajectories is required, those with lowest mass cost are favored.
In a tree search, this process is applied at each depth-level to construct the beam with the best of the newly extended solutions. Before that, however, is used for pruning nodes corresponding to missions that require kg, or years. Should that result in an empty pool of extended solutions, or the pool otherwise be empty because no feasible transfer legs were found, the tree search has then reached its final level and is terminated. The beam of solutions carried over from the previous tree level will then be its final output. In Beam P-ACO, this signals the end of a generation. The contents of that final beam are then merged with the archive of non-dominated solutions found so far. By applying the previously described ranking process, the archive will always be a Pareto front of trajectories that all share the maximum score reached to date. Beam P-ACO will at this point refresh pheromones. Note that the combinatorial problem is asymmetric: an edge present in a good solution is not predictive of an edge being likely to lead to good solutions. As such, an edge in an archived solution only results in pheromones along the direction.
6 Experimental evaluation
An experimental evaluation was carried out to assess the performance of the different Beam Search variants, using the GTOC5 problem as a test case. Performance is here evaluated in terms of multiple criteria. Primarily, we care about search algorithms that enable the consistent discovery of asteroid sequences having the greatest possible length (score). Among equally scored missions, we care for the best possible coverage of the Pareto front of mass and time costs required to achieve such score. Finally, these considerations must be traded-off against the computational cost to obtain such solutions.
We adopt as measure of computational cost the number of trajectory legs optimized throughout the search. This is in practice the main performance bottleneck, especially if instead of a Lambert model one were to use a low-thrust one. A threshold of 100000 optimized legs was used as stopping criterion.
The performance of deterministic Beam Search was evaluated over a dense grid of settings for the beam width () and branching factor () parameters. In total, 118 different setups were evaluated, all having upfront an estimated cost of optimized legs required in order to complete execution at tree depth 16, where missions reach and fully score the 16th asteroid (see Fig. 1). Being a deterministic algorithm, only a single run was executed per setup.
Stochastic Beam and Beam P-ACO were evaluated under the 5 different configurations of and highlighted in red in Fig. 1. Being stochastic algorithms, 100 independent runs were performed per setup. Deterministic branching decisions were taken with a probability of ( in Beam Search). In Beam P-ACO, pheromone and heuristic values contributed equally to the heuristic: (in the other Beam Search variants, implicitly ). Pheromone concentrations were limited to at most contributions to each node.
All tree searches reported here had the same root node. Its initial conditions were originally obtained during the GTOC5 competition through a time-optimal low-thrust optimization of the launch leg, as described in [21, Sec. 2] and exemplified in [16, Sec. 6.1]. After applying the linear model to define the self-fly-by leg, the initial state is then a trajectory that has already scored 1 point at asteroid 2001 GP2 (id: 1712), and is ready to depart from it at epoch 59325.360 MJD, having already expended 253.518 kg and 198.155 days from the total budgets.
Empirical Attainment Functions: probabilities of objective space vectors being dominated (or matched) in an algorithm’s run. The darker a region is, the likelier it is that in a run a solution will be found that dominates it. Considers the Pareto fronts of score 16 trajectories found up to the 100000 optimized legs threshold.
The results obtained in the experimental evaluation are shown in Figures 1–4. Analyses into the quality of solutions found in a run consider their evaluations, in particular, the extent to which they minimize mass and time costs. Though missions of score 17 were found in these experiments, they were only rarely found, and furthermore only a single distinct mission was ever found with that score. Therefore, analyses into the quality of solutions found consider exclusively the score 16 missions found in runs. Specifically, we evaluate the Pareto fronts of score 16 missions found, and measure their coverage of the objective space using the hypervolume indicator  – a measure of the total area of objective space dominated by points in the Pareto front. A reference point of 3500 kg and 15 years is used in all hypervolume calculations, corresponding to the limits set forth in the problem specification. Runs that do not find any score 16 mission, and therefore have an empty Pareto front, have a hypervolume of 0.0.
Parameter sweep of Beam Search configurations: the results from these experiments are shown in Fig. 1. Fig. 1(a) shows the highest score reached among the missions designed in each run. Fig. 1(b) shows the hypervolume of the Pareto front of mass and time costs in score 16 missions found in each run. Fig. 1(c) shows the prevalence of unfeasible solutions in the continuous search spaces of asteroid transfers. In the setups that reached score 17, for instance, on average only of the attempted asteroid transfers had a feasible solution.
Quantity of top solutions: Fig. 3 shows the results from the evaluation of the quantity of score 16 or 17 solutions found by each algorithm over time. Only distinct trajectories count here to a run’s totals – two trajectories are equal if their asteroid sequence is exactly the same. The three plots shown correspond, from left to right, to the results from Beam Search, Stochastic Beam, and Beam P-ACO. Fig. 3(a) shows one curve for each of Beam Search’s 118 evaluated setups. The algorithm being deterministic, each curve depicts also one single run. Overlaid in this plot is the Pareto front of cost-benefit trade-offs attainable through different parameter settings: it shows the minimum number of trajectory legs that need to be optimized to obtain different amounts of top scoring trajectories. This Pareto front is replicated in the other two plots to allow a performance comparison between the deterministic and randomized Beam Search variants. The “hooks” shape seen, especially in Fig. 3(a), results from the way Beam Search operates: most of its execution time is spent gradually descending through the tree, level by level. Eventually, the search reaches depth 15, at which point each branching event possibly leads to a score 16 solution being found. Hence, the sudden explosion in the total count. The plots for Stochastic Beam and Beam P-ACO also display this effect, but in them values are averaged over 100 runs, and consecutive tree searches (generations) are chained together.
Quality of top solutions: Figures 3 and 4 show the results from the evaluation of the quality of score 16 solutions found by each algorithm over time. Fig. 3 is structured in the same way as Fig. 3, so the description made above for it also applies here. Fig. 3 shows how the total dominated area of objective space (hypervolume) grows over time, as new score 16 trajectories are found and enter the Pareto front. Fig. 4 takes a closer look at the setup , over which both randomized Beam Search algorithms are seen in Fig. 3 reaching median performance (out of the five evaluated setups). The plots shown are Empirical Attainment Functions (EAFs) , which depict the likelihood of objective space vectors being dominated (or matched) in an algorithm’s run. It aggregates into one visualization the final Pareto fronts of score 16 trajectories found across all 100 runs. In Fig. 4(a–b), the boundaries of the shaded areas show the 0.25, 0.5 and 0.75 “attainment surfaces”. In other words, the areas having at least 25, 50 or 75 chance of being attained (dominated or matched) by points in a run’s final Pareto front. Fig. 4(c) takes the difference between the EAFs for Beam P-ACO and Stochastic Beam, and shows how they compare in terms of likelihood to attain different areas of objective space, making clear the extent to which Beam P-ACO outperforms Stochastic Beam. The Pareto front with greatest hypervolume found by deterministic Beam Search is shown for reference.
7 Analysis & discussion
Conceptually, we showed in this research a formal equivalence between four combinatorial optimization algorithms: Beam Search, Stochastic Beam, Beam P-ACO, and P-ACO. We demonstrated how they can all be implemented in the same algorithm, and made accessible through minor parameter changes.
A surprising result, that validates the introduced phasing indicator, as well as the baseline multi-objective Beam Search algorithm employed here, was the discovery of a score 17 mission (17 asteroids visited, and fully investigated). Searches over this model of the GTOC5 problem, using these same initial conditions, had previously only reached a maximum score of 16 . Furthermore, those searches, employing a Branch & Prune tree search algorithm, took days to complete during the GTOC5 competition. The Beam Search variants under consideration could all find that single score 17 mission in runs lasting 10 to 20 minutes. Pure P-ACO is the exception here, having never surpassed a score 15 in our experiments (with a , the high chance of an asteroid transfer’s optimization problem not having a feasible solution greatly limits performance).
The first research question in Sec. 1 called for a demonstration of examples where randomized Beam Search variants would outperform the deterministic approach. If we evaluate in terms of mean performance, then, as we can see from Figures 3–3
, the current experimental evaluation could not find any such cases. At all computational cost thresholds we can find deterministic setups outperforming both of the randomized Beam Search variants, both in the quantity of score 16 solutions found, and in their quality. The deterministic algorithm was tuned to a considerably greater extent than the randomized ones (118 setups, against only 5), so it is possible that we are presenting a skewed view of each algorithm’s capabilities. Alternatively, it may be that the specific problem we consider here is so resource-constrained, that the construction of long asteroid sequences actually demands for greedy branching decisions to be taken at every single step. In such a case, the search will not benefit from the greater tolerance for local sub-optimality present in the randomized Beam Search variants.
The second research question in Sec. 1 considers how the addition of feedback (pheromones) in Beam P-ACO changes performance, by comparison with Stochastic Beam, where search proceeds along multiple independent generations with no feedback between them. Figures 3–4 show a clear positive effect of feedback on performance, both in the quantity and quality of top solutions found. This effect is seen to be larger the smaller the branching factor is. In other words, the greater the branching factor, the likelier it is for good solutions to be identified in a single generation, and thus the lower the benefit from feedback.
An identical comparison between randomized Beam Search variants, with and without pheromone updates, can be found in [3, Sec. 6.2] (for a different hybrid algorithm, and different problems). There, pheromones are also found to improve performance, though in small amounts. The current research goes beyond the analysis in  by uncovering the inverse relationship between branching factor and benefit from pheromones, and in demonstrating the superior performance of deterministic Beam Search over the randomized variants, given proper tuning.
Overall, an apt description of the behaviours displayed by the randomized algorithms is that they approximate the performance of deterministic Beam Search setups that use larger beam widths and branching factors – probabilistic branching effectively picks a subset of the successor nodes that deterministic Beam Search with larger branching factors would follow. The eventual success or failure of a run will then depend on how well those decisions align with those a better informed algorithm would take. This can be seen on display in Fig. 4. There, we see that in the extreme a randomized Beam Search run can closely approximate the Pareto front of score 16 trajectories found by an “optimally” tuned deterministic Beam Search (with a , less than half of the needed in the deterministic setting). However, it can also fail to find a single score 16 trajectory: 7 of the 100 runs in Fig. 4(a), and 4 in Fig. 4(b) could only reach a score 15, thus ending in this analysis with a min. hypervolume of 0.0. Overall, the median hypervolume of 52.75 reached by Beam P-ACO in Fig. 4(b) is higher than that reached in of the 118 deterministic Beam Search setups, and also higher than that reached in of the 32 deterministic setups that had equal or larger beam width and branching factors.
In a real setting, in the preliminary design phase, missions would be constructed not from a single set of initial conditions, as done here, but from a great number of them, possibly numbering in the thousands. Being anytime algorithms, the randomized Beam Search variants investigated here can be employed in a racing approach , with multiple tree searches being executed in parallel. Computational effort would then be dynamically allocated across searches, as a function of the growing statistical evidence as to which searches lead to better missions. In such a setting, the randomized Beam search variants would be preferable choices, over the deterministic algorithm.
We considered here a hard real-world problem of spacecraft trajectory design, featuring an interplay of combinatorial and (constrained) continuous optimization sub-problems, dealing at different levels with uncertain and multi-objective quality functions. In this challenging domain, we investigated a number of extensions to Beam Search, the traditionally used approach to solve such problems. We provided an improved orbital phasing indicator, and its transformation into a probability distribution over candidate bodies to extend missions with. We then hybridized the search process with the well known Ant Colony Optimization algorithm, and investigated the behaviours of the resulting randomized Beam Search variants. We found them to have lower sensitivity to the beam width and branching factor parameter settings, while offering in each generation a partially-informed approximation to the behaviours of deterministic setups running at higher computational costs.
Luís F. Simões was supported by FCT (Ministério da Ciência e Tecnologia) Fellowship SFRH/BD/84381/2012.
-  Global Trajectory Optimization Competition portal, http://sophia.estec.esa.int/gtoc_portal/
Bisiani, R.: Beam search. In: Shapiro, S.C. (ed.) Encyclopedia of Artificial Intelligence, vol. 1, pp. 56–58. John Wiley & Sons, Inc. (1987)
-  Blum, C.: Beam-ACO–Hybridizing ant colony optimization with beam search: an application to open shop scheduling. Computers & Operations Research 32(6), 1565–1591 (2005)
-  Ceriotti, M., Vasile, M.: Automated multigravity assist trajectory planning with a modified ant colony algorithm. Journal of Aerospace Computing, Information, and Communication 7(9), 261–293 (2010)
-  Ceriotti, M., Vasile, M.: MGA trajectory planning with an ACO-inspired algorithm. Acta Astronautica 67(9–10), 1202–1217 (2010)
-  Deb, K.: Multi-Objective Optimization using Evolutionary Algorithms. John Wiley & Sons, Inc., Chichester, England (2001)
-  Dorigo, M., Stützle, T.: Ant Colony Optimization. A Bradford Book, MIT Press, Cambridge, MA (2004)
-  Grunert da Fonseca, V., Fonseca, C.M., Hall, A.O.: Inferential performance assessment of stochastic optimisers and the attainment function. In: Zitzler, E., Thiele, L., Deb, K., Coello Coello, C.A., Corne, D. (eds.) Evolutionary Multi-Criterion Optimization: First International Conference, EMO 2001, pp. 213–225. Springer, Berlin, Heidelberg (2001)
-  Gad, A.H.: Space trajectories optimization using variable-chromosome-length genetic algorithms. Ph.D. thesis, Michigan Technological University (2011)
-  Grigoriev, I.S., Zapletin, M.P.: Choosing promising sequences of asteroids. Automation and Remote Control 74(8), 1284–1296 (2013)
-  Grigoriev, I.S., Zapletin, M.P.: GTOC5: Problem statement and notes on solution verification. Acta Futura 8, 9–19 (2014)
-  Guntsch, M.: Ant algorithms in stochastic and multi-criteria environments. Ph.D. thesis, Karlsruher Institut für Technologie (2004), http://d-nb.info/1013929756
Guntsch, M., Middendorf, M.: A Population Based Approach for ACO. In: Cagnoni, S., Gottlieb, J., Hart, E., Middendorf, M., Raidl, G.R. (eds.) Applications of Evolutionary Computing: EvoWorkshops 2002: EvoCOP, EvoIASP, EvoSTIM/EvoPLAN. pp. 72–81. Springer, Berlin, Heidelberg (2002)
-  Guntsch, M., Middendorf, M.: Applying Population Based ACO to Dynamic Optimization Problems. In: Dorigo, M., Di Caro, G., Sampels, M. (eds.) Ant Algorithms: Third International Workshop, ANTS 2002. pp. 111–122. Springer, Berlin, Heidelberg (2002)
-  Guntsch, M., Middendorf, M.: Solving Multi-criteria Optimization Problems with Population-Based ACO. In: Fonseca, C.M., Fleming, P.J., Zitzler, E., Thiele, L., Deb, K. (eds.) Evolutionary Multi-Criterion Optimization: Second International Conference, EMO 2003. pp. 464–478. Springer, Berlin, Heidelberg (2003)
-  Izzo, D.: PyGMO and PyKEP: Open Source Tools for Massively Parallel Optimization in Astrodynamics (the case of interplanetary trajectory optimization). In: 5th International Conference on Astrodynamics Tools and Techniques (ICATT) (2012)
-  Izzo, D.: Revisiting Lambert’s problem. Celestial Mechanics and Dynamical Astronomy 121(1), 1–15 (2015)
-  Izzo, D., Getzner, I., Hennes, D., Simões, L.F.: Evolving Solutions to TSP Variants for Active Space Debris Removal. In: Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation. pp. 1207–1214. ACM (2015)
-  Izzo, D., Hennes, D., Simões, L.F., Märtens, M.: Designing complex interplanetary trajectories for the global trajectory optimization competitions. In: Fasano, G., Pintér, J.D. (eds.) Space Engineering: Modeling and Optimization with Case Studies, pp. 151–176. Springer (2016)
-  Izzo, D., Simões, L.F., Märtens, M., de Croon, G.C.H.E., Heritier, A., Yam, C.H.: Search for a Grand Tour of the Jupiter Galilean Moons. In: Genetic and Evolutionary Computation Conference (GECCO 2013). pp. 1301–1308. ACM (2013)
-  Izzo, D., Simões, L.F., Yam, C.H., Biscani, F., Di Lorenzo, D., Addis, B., Cassioli, A.: GTOC5: Results from the European Space Agency and University of Florence. Acta Futura 8, 45–55 (2014)
-  Maron, O., Moore, A.W.: The racing algorithm: Model selection for lazy learners. In: Aha, D.W. (ed.) Lazy Learning, pp. 193–225. Springer Netherlands (1997)
-  Oliveira, S., Hussin, M.S., Stützle, T., Roli, A., Dorigo, M.: A Detailed Analysis of the Population-Based Ant Colony Optimization Algorithm for the TSP and the QAP. Tech. Rep. TR/IRIDIA/2011-006, IRIDIA (Feb 2011)
-  Petropoulos, A.E., Bonfiglio, E.P., Grebow, D.J., Lam, T., Parker, J.S., Arrieta, J., Landau, D.F., Anderson, R.L., Gustafson, E.D., Whiffen, G.J., Finlayson, P.A., Sims, J.A.: GTOC5: Results from the Jet Propulsion Laboratory. Acta Futura 8, 21–27 (2014)
-  Sinha, A., Malo, P., Deb, K.: Evolutionary bilevel optimization: An introduction and recent advances. In: Bechikh, S., Datta, R., Gupta, A. (eds.) Recent Advances in Evolutionary Multi-objective Optimization, pp. 71–103. Springer (2017)
-  Stuart, J.R., Howell, K.C., Wilson, R.S.: Design of end-to-end trojan asteroid rendezvous tours incorporating scientific value. Journal of Spacecraft and Rockets 53(2), 278–288 (2016)
-  Weise, T., Chiong, R., Lässig, J., Tang, K., Tsutsui, S., Chen, W., Michalewicz, Z., Yao, X.: Benchmarking optimization algorithms: An open source framework for the traveling salesman problem. IEEE Computational Intelligence Magazine 9(3), 40–52 (2014)
-  Wilt, C.M., Thayer, J.T., Ruml, W.: A comparison of greedy search algorithms. In: Proceedings of the Third Annual Symposium on Combinatorial Search (SOCS-10). pp. 129–136 (2010)
-  Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C.M., Grunert da Fonseca, V.: Performance assessment of multiobjective optimizers: an analysis and review. IEEE Transactions on Evolutionary Computation 7(2), 117–132 (2003)