1 Introduction
The impact of Artificial Intelligence (AI) technologies on our modern world is now more apparent than ever. Once a niche research field in computer science and mathematics, the vision of machines displaying human levels of intelligence has spread and matured into solid and well-established industrial products. Powerful techniques from Machine Learning (ML) have gained increasing attention from scientists over the past few decades and are now transforming whole economies as a result.
The space sector is catching up to these developments, as more and more works are published that incorporate concepts related to AI, such as natural language processing, knowledge representation, automated reasoning, computer vision and robotics. Applications of interest range from preliminary spacecraft design to mission operations, and from guidance, control and navigation algorithms to the prediction of the dynamics of perturbed motion, the classification of astronomical objects and the refinement of remote sensing data, to name only a few.
The goal of this survey is to present a slice of this work to highlight the progress that has been made in the adoption of AI techniques. More precisely, our focus will be on the recent AI trends that have emerged in spacecraft guidance dynamics and control. Even when limited to this area, a comprehensive overview of all work would quickly grow out of proportion, which is why we can only present selected pointers to the reader. We decided to give importance to work published in the last few years, avoiding the historical perspective of older and well-established fundamental works. Additionally, we decided to avoid publications which are strongly speculative in nature: while visionary ideas are interesting to follow, an important requirement for this survey was a well-motivated applicability to a space-related challenge, ideally inspired by an already established or newly proposed mission concept. This narrow scope allows our survey to be concise while remaining relevant for the interested practitioner.
Many times, results obtained by one AI technology for a specific task appear stunning, but the same technology performs rather poorly when transferred to a different task, which often happens when its strengths and weaknesses are not thoroughly understood. However, thanks to the pioneering work of many researchers, combined with the results of large competitions acting as benchmarks, like the Global Trajectory Optimisation Competition (GTOC), a better understanding of the most promising approaches has been obtained. In particular, this survey will guide the reader through the intersection of Guidance and Control with evolutionary optimisation, tree searches and machine learning (including Deep Learning and Reinforcement Learning), for which we witnessed a stream of results of remarkable quality. Consequently, we devote one section to each of the aforementioned techniques in that order. Additionally, we contribute by highlighting the synergies and relations between these techniques, as it has been frequently demonstrated that they can benefit vastly from each other if combined, which is often the most viable strategy to achieve the best results for the complex challenges in space.
1.1 Related Surveys
We believe that our report on the state of the art in AI for Guidance and Control is timely and, to the best of our knowledge, has not been reported in this form yet by others. However, there exists work with a different scope in close proximity to our area which might be useful to deepen or broaden one's knowledge of selected topics. Girimonte and Izzo girimonte give a general overview focused on distributed AI for swarm autonomy and distributed computing, for enhanced situation self-awareness and for decision support in spacecraft system design. Some useful examples of applied machine learning for geoscience and remote sensing are given by Lary lary2010artificial . A more recent review by Zhu et al. Zhu2017Deep highlights recent advances in remote-sensing data analysis with a main focus on Deep Learning. For a stronger historical perspective on the development of evolutionary optimisation and machine learning in trajectory optimisation, we suggest consulting Izzo et al. izzo2018machine . A survey on the methods applied by the international community during the GTOC (and its Chinese equivalent, CTOC) was recently published by Li et al. gtocreview and contains major contributions from AI algorithms and methodologies. Lastly, the book by Russell and Norvig russell2010artificial provides a good starting point to obtain a comprehensive overview of modern AI detached from a particular application domain.
1.2 Glossary
While this survey explains its acronyms along the way, we provide the following list of recurring abbreviations as a reading aid:
AI  Artificial Intelligence
ACO  Ant Colony Optimisation
ANN  Artificial Neural Network
CMA-ES  Covariance Matrix Adaptation Evolution Strategy
CNN  Convolutional Neural Network 
CTOC  Chinese Trajectory Optimisation Competition 
DE  Differential Evolution 
DL  Deep Learning 
DNN  Deep Neural Network 
DQN  Deep Q-Learning Network
DRL  Deep Reinforcement Learning 
GA  Genetic Algorithm 
G&CNETs  Guidance and Control Networks
GTOC  Global Trajectory Optimisation Competition 
GTOP  Global Trajectory Optimisation Problem 
LSTM  Long Short-Term Memory
MCTS  Monte Carlo Tree Search 
ML  Machine Learning 
NSGA-II  Non-Dominated Sorting Genetic Algorithm (Mark 2)
PSO  Particle Swarm Optimisation 
RL  Reinforcement Learning 
SVM  Support Vector Machine 
2 Evolutionary Optimisation
Evolutionary algorithms are a class of global optimisation techniques that make use of heuristic rules, often inspired by, but not limited to, natural paradigms such as Darwinian evolution. For example, a standard Genetic Algorithm (GA) encodes a population of solutions, which undergoes mutation, crossover and selection to search for (close to) optimal solutions in often heavily discontinuous and rugged landscapes. As such, GAs have proven to be very useful for solving interplanetary trajectory optimisation problems, where the planetary constellations define a complex solution landscape that exhibits multiple close-by minima even in simple cases such as planet-to-planet transfers.
The ecosystem of genetic and evolutionary algorithms (sometimes referred to as metaheuristics), together with their variants and mixtures, is so vast that it seems almost futile to summarize. The biological inspirations for these algorithms range from insect swarming, foraging and ant behaviour to even more obscure examples including algae, the American buffalo, humpback whales and penguins. A rather exhaustive list of this evolutionary computing bestiary is maintained by Campelo et al. bestiary . Nevertheless, certain types of algorithms have been deployed consistently and with high degrees of success to solve challenges in aerospace. Most notably, trajectory optimisation problems, such as the numerous ones assembled in the Global Trajectory Optimisation Problem (GTOP) database vinko2008global (a sort of open trajectory optimisation gym), are frequently solved and studied with such methodologies stracquadanio2011design ; addis2011global ; schlueter2014midaco ; islam2012adaptive ; cassioli2012machine ; simoes2014self , building a domain-specific benchmark for the performance of those evolutionary techniques. Some of the interplanetary trajectory problems (e.g. Messenger and Cassini2) in the GTOP database were also used during the CEC2011 competition, attracting the attention of the larger community of scientists involved in evolutionary computation to work on space-related problems (see elsayed2011ga for the competition winner).
In the following, we discuss some of the algorithms and approaches ordered by their applicability to different forms of optimisation problems, beginning with single-objective unconstrained problems and leading to multi-objective and combinatorial constrained problems.
2.1 Single-objective, Unconstrained, Continuous Problems
Differential Evolution (DE) is a comparatively simple variant of a GA that is effective for nonlinear and non-differentiable continuous space functions, as encountered frequently, for example, in chemical propulsion spacecraft transfers where sequences of multiple impulsive velocity increments need to be decided. After Myatt et al. ari1 ; izzo2007search introduced its use in this context, Olds and Kluever olds2007interplanetary analysed DE's performance with respect to the Cassini and Galileo missions (among others), finding the expected sensitivity of the algorithm's performance to its parameters. Recent research in DE is thus concerned with self-adaptation, i.e. the incorporation of hyperparameters like the mutation rate into the chromosome such that they are evolved together with the optimal solution. Izzo et al. izzo2013search deploy a self-adaptive DE to design a grand tour between the Galilean moons of the Jovian system. Yao et al. yao2017improved propose a double self-adaptive DE with random mutations and evaluate its performance on a Lambert transfer problem. Theoretical advances on the DE algorithm itself have also been obtained by Vasile et al. vasile2011inflationary , who were able to use theoretical insights into DE's evolutionary mechanisms to design an algorithm able to outperform, on some trajectory optimisation problems, a canonical (i.e. not self-adapted) DE variant.
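To make the idea of self-adaptation concrete, the following is a minimal sketch of a DE/rand/1/bin loop in which every individual carries its own scale factor F and crossover rate CR, which are occasionally resampled and inherited only when the resulting trial vector survives. This particular rule follows the widely used jDE scheme and is an assumption made for illustration; it is not claimed to be the variant used in any of the works cited above. The objective `sphere`, the function name `self_adaptive_de` and all parameter values are hypothetical stand-ins for a real trajectory cost function.

```python
import random


def sphere(x):
    """Toy objective (hypothetical stand-in for a trajectory cost)."""
    return sum(v * v for v in x)


def self_adaptive_de(f, bounds, pop_size=20, generations=200, seed=0):
    """DE/rand/1/bin with jDE-style self-adaptation of F and CR (illustrative)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    # Each individual carries its own control parameters (F, CR).
    params = [(0.5, 0.9) for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            F, CR = params[i]
            # Assumed adaptation rule: resample each parameter with probability 0.1.
            if rng.random() < 0.1:
                F = 0.1 + 0.9 * rng.random()
            if rng.random() < 0.1:
                CR = rng.random()
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the box bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_cost = f(trial)
            # Greedy selection: parameters survive only together with a good trial.
            if trial_cost <= cost[i]:
                pop[i], cost[i], params[i] = trial, trial_cost, (F, CR)
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


best_x, best_f = self_adaptive_de(sphere, [(-5.0, 5.0)] * 3)
```

Because F and CR are inherited only through successful trial vectors, parameter values that suit the landscape tend to spread through the population without manual tuning.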
Similar in popularity to DE is Particle Swarm Optimisation (PSO), a bio-inspired search heuristic with clear links to the foraging behaviour of flocks of birds, schools of fish or similar types of intelligent swarms. Pontani and Conway pontani2010particle give a comprehensive overview of the adaptation of this technique to spacecraft trajectory optimisation. Benefits of PSO are its comparatively easy implementation and a generally high convergence speed to global optima with good accuracy. Vasile et al. vasile2010analysis benchmark PSO, DE and other evolutionary algorithms against each other for different setups, highlighting the problem dependency of metaheuristic performance (a general issue in metaheuristics known as the “no free lunch theorem” wolpert1997no ). It is thus advisable in practice not to rely too heavily on a single metaheuristic, but to explore the performance of several in parallel or even to combine several metaheuristics into one. Englander and Conway Englander2012Automated achieve their best performance with a combination of DE and PSO in an automated mission design for sequences of interplanetary transfers including multiple gravity assist maneuvers. Sentinella and Casalino sentinella2009hybrid deploy PSO, DE and other GAs together to obtain the global optimal solution consistently for Earth-Mars transfer scenarios.
Another, more sophisticated evolutionary metaheuristic that was recently investigated in the context of trajectory optimisation by Izzo et al. izzo2014constraint is the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). CMA-ES is an evolutionary algorithm that deploys an adaptive mutation scheme, exploiting the pairwise dependencies of the decision variables given by their covariance matrix. The basic idea is to increase the likelihood of selections that have proven beneficial before. While the implementation details of CMA-ES are much more involved than those of DE or PSO, it has been shown to be able to outperform these approaches on a large class of interplanetary transfer problems.
Last but not least, Radice et al. radice2006ant analyze Ant Colony Optimisation (ACO) for an Earth-Mars transfer inspired by the Mars Express mission. The ACO paradigm simulates the foraging behaviour of natural ant colonies, where ants deposit biomarkers (pheromones) along their paths to communicate and further reinforce exploration in comparatively large search environments. While ACO is traditionally deployed in discrete domains, the authors show how the algorithm can be modified for continuous optimisation problems (even though their proposal seems to be only a preliminary attempt with large margins for improvement). MIDACO, developed by Schlueter et al. schlueter2013midaco ; schlueter2014midaco , is an optimisation framework that deploys ACO on problems that can be single- or multi-objective, allowing for constrained and mixed-integer decision variables as well. Some of the best solutions to trajectory problems in the GTOP database, although single-objective (i.e. Cassini2, GTOC1, MessengerFull), were discovered by MIDACO.
2.2 Multi-objective Problems
The research mentioned in the previous section is mostly concerned with the optimisation of box-constrained continuous variables towards a single objective, most commonly fuel consumption or total travel time in the case of interplanetary trajectories. However, the true nature of most design problems is multi-objective, potentially also including integer decision variables or nonlinear constraints. In this multi-objective setting, the concept of the best design (i.e. the global optimum) is substituted with that of a Pareto front, a collection of non-dominated solutions expressing the trade-offs between different conflicting objectives. Consequently, a set of best possible solutions (the Pareto-optimal front) is required to guide engineering decisions (i.e. trajectories that cannot be improved in one objective without sacrificing another). Since in this context we are interested in a set of solutions, population-based algorithms such as evolutionary algorithms are a natural choice offering substantial advantages. Indeed, multi-objective problems are one of the prime examples of the success of evolutionary algorithms in general, for which they are the de facto standard approach (see Coello coello2006evolutionary ).
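The notion of dominance that defines a Pareto front can be illustrated with a short sketch. It assumes that all objectives are to be minimised; the candidate values (a velocity increment in km/s paired with a time of flight in days) are invented purely for illustration, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Invented (delta-v [km/s], time of flight [days]) pairs.
candidates = [(3.0, 900.0), (4.5, 600.0), (5.0, 650.0), (6.0, 400.0), (6.5, 450.0)]
front = pareto_front(candidates)
# front == [(3.0, 900.0), (4.5, 600.0), (6.0, 400.0)]
```

The two discarded candidates are each worse in both objectives than another candidate, whereas the three remaining ones can only be traded against each other. This quadratic-time filter is only meant to convey the concept; algorithms such as NSGA-II rely on more elaborate sorting procedures.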
One of the classic evolutionary approaches to multi-objective optimisation is the Non-Dominated Sorting Genetic Algorithm (NSGA-II), which has been studied, also by its inventor, for the optimisation of planetary flyby sequences including integer variables (see Deb et al. deb2007interplanetary ). Schutze schutze2009designing considers a bi-objective approach for the design of multiple low-thrust gravity assist trajectories (minimisation of flight time and fuel consumption) and deploys NSGA-II on that landscape. Later studies by Märtens and Izzo Martens2013Asynchronous show how the performance of NSGA-II can be improved by implementing a multi-objective migration scheme in an island model. The authors evaluate their approach for a transfer from Earth to Jupiter, including multiple approximate Pareto-optimal fronts for different flyby sequences. Similar trajectories are also under investigation by Zotes and Penas Zotes2012Particle , who deploy a multi-objective extension of PSO called MOPSO. Lee et al. lee2005multi benchmark several evolutionary multi-objective techniques, including NSGA-II, to find the non-dominated front depending on the free control parameters of a Q-law feedback controller.
Izzo et al. izzo2014constraint analyse a decomposition-based approach to multi-objective optimisation named MOEA/D (Multi-Objective Evolutionary Algorithm by Decomposition), which is shown to improve NSGA-II's approximation of the Pareto-optimal front significantly for transfer problems in the Jovian system. Furthermore, different constraint handling techniques like co-evolution and artificial immune systems are applied and benchmarked. A rather comprehensive survey on multi-objective methods for spacecraft design can be found in Montano et al. montano2014multi , where several evolutionary approaches are described and discussed.
Besides trajectory optimisation, other guidance and control problems have also been analysed in a multi-objective setup. Vasile and Ricciardi Vasile2016direct introduce a memetic multi-objective algorithm to solve the optimal control problem. The approach is evaluated on two tasks: a rocket launch trajectory for which minimum time and maximum horizontal velocity are optimized, and an orbit raising task that optimizes final energy and maneuver time. Chai et al. chai2018unified discuss an aero-assisted trajectory optimisation problem with mission priority constraints. They propose a gradient-based hybrid GA that avoids the need to perform non-dominated sorting (required by algorithms like NSGA-II) and is thus more efficient in contexts where the complexity of the non-dominated sorting operations becomes a practical issue.
2.3 Combinatorial Problems
With increasing complexity, interplanetary trajectories can no longer (effectively) be described by continuous and unconstrained decision variables alone. For example, if multiple gravity-assisted flybys become imperative, the sequence of planetary encounters is part of the optimisation problem and results in a combinatorial dimension that often turns out to be crucial for achieving the mission goals. A similar combinatorial part also appears in many other mission profiles where multiple bodies or rendezvous points have to be considered. The ninth edition of the Global Trajectory Optimisation Competition (GTOC9) asked for a debris removal mission in which a large number of pieces of space debris need to be visited (and deorbited) in quick succession while minimizing the number of launches necessary (see Izzo and Märtens Izzo2018Kessler for details on the development of the challenge). In this situation, the aforementioned ACO has proven to be particularly effective. In fact, the top scoring teams gtoc9JPL ; gtoc9NUDT ; gtoc9XSCC in GTOC9 made use of some ACO variant. Furthermore, a variant of ACO is also deployed by Ceriotti and Vasile ceriotti2010mga to optimize missions containing multiple flyby maneuvers like Cassini and Laplace.
The full automation of the interplanetary trajectory pipeline is also explicitly proposed by Englander et al. englander2016automated , who combine a number of established techniques into an automated solution strategy. In this work, the combinatorial part of the problem is solved by a GA, while an approach based on monotonic basin hopping takes care of the inner low-thrust problem, as suggested in the work of Yam et al. yammbh .
Other challenges worth mentioning include mission concepts inside the asteroid belt, where the number of astronomical bodies and the sequence of possible transfers lead to a combinatorial explosion as multiple categorical decision variables have to be considered. Izzo et al. izzo2014gtoc5 design a GA for a multiple asteroid rendezvous mission which allows certain genes in the solution representation to be deactivated (or hidden). The systematic use of hidden genes in spacecraft trajectory optimisation is further studied by Abdelkhalik and Darani hiddenOSSAMA , who design a Jupiter transfer including the optimisation of intermediate flybys using this technique.
Some particularly complex interplanetary trajectory optimisation problems, as is the case for active multiple debris removal missions, can also be interpreted as Travelling Salesman Problems and be dealt with accordingly. Izzo et al. tsp_izzo show that once the problem is framed as a Travelling Salesman Problem, specific heuristics like inver-over become effective operators for evolutionary algorithms.
3 Tree Searches
Although direct application of evolutionary algorithms to combinatorial problems is possible, it may lead to suboptimal results if the search space becomes too large to be sampled effectively. An alternative approach to deal with these problems is to deploy tree searches, which become an option whenever the problem domain allows a solution to be constructed incrementally from the solutions of smaller, separable subproblems. This is often the case for the combinatorial challenges mentioned before (complex rendezvous, flyby problems, etc.), which can be seen as bi-level optimisation problems in which the combinatorial selections at the outer level influence the resulting fitness landscape of the inner continuous level. Vice versa, the performance of the optimized trajectories at the inner level feeds back into guiding the selection on the outer level.
Tree search methods emerged within these contexts as one of the most successful approaches for this type of challenge. In a tree search, decision points (e.g. which orbital body should the spacecraft visit next?) are modeled as nodes which can be expanded for further evaluation. As an exhaustive enumeration of all possible node expansions quickly becomes intractable, each tree search deploys a strategy that only explores the most promising branches as needed and produces subproblems that can be handled by evolutionary algorithms more easily.
Wilt et al. Wilt2010comparison study such strategies on multiple benchmark problems (although none inspired by interplanetary mission designs). They conclude that the strategy of the Beam Search algorithm is the best choice, especially when confronted with massive search spaces. This might be the reason why the Beam Search algorithm emerged in many works for preliminary studies of complex missions.
Arguably one of the best features of Beam Search is its balance between the exploration and exploitation of the search tree, achieved by limiting its expansion to a subset of nodes determined by a ranking criterion. For example, Lazy Race Tree Search (see Izzo et al. izzo2013search ) is a Beam Search that ranks the (partial) trajectories by their total time of flight while each node is expanded to minimize mass consumption. This allows for an effective exploration on different levels of the tree in comparison to a greedy search strategy which would, typically, proceed level-wise through the tree selecting only the best few solutions and discarding paths that would have become promising to explore at deeper levels.
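A generic Beam Search can be sketched in a few lines. The example below is a simplified illustration, not the Lazy Race Tree Search itself: it ranks partial sequences by their accumulated cost (an assumed ranking criterion), and the targets, their one-dimensional positions and the cost function `leg_cost` are hypothetical placeholders for an inner trajectory optimisation.

```python
def beam_search(start, targets, leg_cost, beam_width=3):
    """Visit every target once, keeping only the `beam_width` best
    partial sequences per tree level. Returns (total_cost, sequence)."""
    beam = [(0.0, (start,))]
    for _ in range(len(targets)):
        children = []
        for cost, seq in beam:
            for t in targets:
                if t not in seq:  # expand the node by every unvisited target
                    children.append((cost + leg_cost(seq[-1], t), seq + (t,)))
        # Ranking criterion (assumed): accumulated cost of the partial sequence.
        children.sort(key=lambda child: child[0])
        beam = children[:beam_width]
    return beam[0]


# Hypothetical toy problem: bodies on a line, cost of a leg = distance.
position = {"S": 0.0, "A": -1.0, "B": 1.5, "C": -3.0}


def leg_cost(a, b):
    return abs(position[a] - position[b])


greedy = beam_search("S", ["A", "B", "C"], leg_cost, beam_width=1)
wider = beam_search("S", ["A", "B", "C"], leg_cost, beam_width=3)
# greedy == (7.5, ("S", "A", "C", "B")); wider == (6.0, ("S", "B", "A", "C"))
```

With a beam width of one the search degenerates into a greedy strategy and commits to the cheapest first leg, whereas the wider beam retains a partial sequence that looks worse early on but leads to the cheaper overall tour.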
While ranking criteria are typically deterministic, expansion of the search tree can also be guided by a stochastic process. Hennes and Izzo hennes2015interplanetary study Monte Carlo Tree Search (MCTS), a tree search variant frequently applied to develop agents for difficult games with large search spaces like the game of Go (compare Browne et al. Browne2012Survey ). By applying MCTS to the large design space of the Cassini mission, the authors were able to rediscover the correct sequence of planetary encounters within a timeline that was very close to the actual trajectory flown.
Another stochastic tree search that hybridizes Beam Search with ACO is Beam P-ACO, as introduced by Simões et al. simoes2017multi . Following this approach, the search paths of the tree are modified by pheromone markers to reinforce exploitation of promising paths. Furthermore, Beam P-ACO is implemented as an anytime algorithm and can thus provide a solution at every point during its execution, allowing for an easy trade-off between computational resources and solution quality. The winning team in the GTOC9 competition, from NASA's Jet Propulsion Laboratory, made use of a modified Beam P-ACO algorithm as part of their solution strategy gtoc9JPL .
4 Machine Learning
While evolutionary techniques are widely adopted to aid the design of spacecraft trajectories, the use of machine learning (ML) concepts like Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees, Random Forests, etc. is still in its early stages. Reasons for the slow adoption of ML include the lack of publicly available and suitable large-scale data sets for aerospace challenges and the sometimes less obvious applicability of these methods to the typical problems encountered. Nevertheless, the interest in ML is high and more research is produced every year that features, in one form or another, a machine learning model. In the following, we first highlight some important connections that exist between ML and evolutionary algorithms, before we focus on Deep Learning (DL) as one of the major technologies. To complement DL, we briefly mention some alternative approaches for supervised learning before concluding with important work that applies and explores the reinforcement learning paradigm.
4.1 Machine Learning and Evolutionary Optimisation
ML algorithms have many relations to the evolutionary techniques discussed before. The most common relation is the generation of training data: during the optimisation of an interplanetary transfer, as in any optimisation task, a large number of solutions are computed and assessed to inform the search for better candidates. These design points are used to guide the search into more promising directions, but are typically discarded due to their intermediary nature. However, these (partial) trajectories and solutions become extremely valuable when applied within a supervised learning setup. In particular, future evolutionary runs may improve significantly if initial guesses are provided by an ML model, as demonstrated by Cassioli et al. cassioli2012machine , who use SVMs on some of the trajectory problems in the GTOP database.
Similarly, Basu et al. Base2017Timeoptimal study the design of time-optimal slew maneuvers for a rigid space telescope in order to reorient some light-sensitive parts to observe quick events like gamma-ray bursts. While time-optimal solutions can be obtained with PSO, the convergence time of the swarm is considered too high to be practical for this task. As a solution, a neural network is deployed to predict advantageous initial conditions for the PSO, significantly reducing the convergence time and moving this approach closer to a real-time optimal control system.
Machine learning may also construct an inexpensive proxy to the objective function (Ampatzis et al. ampatzis2009machine ) if trained with the design points sampled during evolutionary runs. Building such a surrogate model is particularly relevant if the computation of some interplanetary mission fitness requires a high amount of computational resources and effort, such as in the case of optimal low-thrust transfers. In this case, a surrogate model approximating the final optimal transfer mass enables a quick search for ideal launch and arrival epochs, as well as favorable planetary body sequences, e.g. in the case of multiple asteroid or debris rendezvous missions, as shown in the works of Hennes et al. hennes2016fast and Mereta et al. mereta2017machine .
Unsupervised learning techniques such as clustering or nearest neighbours have also been deployed to select the target of transfers in multiple asteroid rendezvous missions, upon proper definition of a metric coping with the orbital nonlinearities (Izzo et al. izzo2016designing ), or to define new box bounds and hence focus successive evolutionary runs on promising areas of the search space by cluster pruning (Izzo et al. izzo2007search ; izzo2010global ).
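The surrogate idea discussed in this subsection can be sketched with a deliberately simple model. The example assumes a k-nearest-neighbour regressor built from an archive of already evaluated design points; the cited works use their own, more capable learners, and `expensive_cost`, `knn_surrogate` and `prescreen` are hypothetical names introduced only for this illustration.

```python
import math


def expensive_cost(x):
    """Hypothetical stand-in for a costly fitness, e.g. a low-thrust transfer."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


def knn_surrogate(archive, k=4):
    """Build a cheap predictor from (design point, cost) pairs by
    averaging the costs of the k nearest archived points."""
    def predict(x):
        nearest = sorted(archive, key=lambda rec: math.dist(rec[0], x))[:k]
        return sum(cost for _, cost in nearest) / len(nearest)
    return predict


def prescreen(candidates, predict, n_keep):
    """Rank candidates by predicted cost and keep the most promising ones."""
    return sorted(candidates, key=predict)[:n_keep]


# Archive: design points that an earlier optimisation run is assumed to have evaluated.
grid = [(float(i), float(j)) for i in range(-5, 6) for j in range(-5, 6)]
archive = [(x, expensive_cost(x)) for x in grid]

predict = knn_surrogate(archive, k=4)
shortlist = prescreen([(0.5, 0.5), (4.0, 4.0), (1.0, -2.0)], predict, n_keep=1)
# Only the shortlisted candidates would be passed on to the expensive evaluation.
```

The surrogate is not exact (it predicts 7.0 at the point (0.5, 0.5), where the true cost is 6.5), but it is accurate enough to rank candidates so that the expensive evaluation is spent only on the most promising ones.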
Lastly, GAs have been used to directly modify the weights of a single-layer neural network (neuroevolution) in the work of Dachwald dachwald2004low ; bernd:2018 for low-thrust transfers. However, most recently, deep neural networks have shown superior performance in this setup and provide a promise of future onboard real-time computation of guidance profiles. In the following subsection, we highlight recent work that follows this paradigm and can be classified as DL.
4.2 Deep Learning
Artificial Neural Networks underpin many of the most popular and successful applications of machine learning in the current decade. In particular, Deep Neural Networks (DNNs), i.e. networks with a large number of hidden layers, are frequently deployed to learn a model from a database of examples (see Schmidhuber schmidhuber2015deep ) using some form of gradient descent.
Sánchez-Sánchez et al. sanchez2016learning ; sanchez2016real provide a systematic study on how DNNs can be trained on the optimal state feedback of continuous-time, deterministic, nonlinear systems like inverted pendulum stabilization, pinpoint landing of a multicopter and landing of a spacecraft (imitation learning or supervised learning). These systems include cost functions for smooth continuous, but also discontinuous (bang-bang) optimal control, as is, for example, the mass-optimal case considered there. It is shown that a deep network is able to learn all tasks with remarkable accuracy and, for the multicopter and spacecraft scenarios, that the network generalizes well outside the bounds of the training set whenever deep network topologies are deployed. The authors also suggest that deep architectures do more than merely interpolate between data but may actually learn underlying dynamic principles of the models under investigation, like the Hamilton-Jacobi-Bellman equations. Izzo et al. izzo2018machine extend these results to interplanetary trajectories, showing how it is possible to train a DNN to guide a spacecraft optimally from an Earth orbit to a Mars orbit. The same authors izzo2018neurostability also propose the name Guidance and Control Networks (G&CNETs) to indicate a generic deep architecture trained to perform optimal manoeuvres using the imitation learning (or supervised learning) paradigm, and provide a new method based on differential algebra and automatic differentiation to study their stability margins and control performance. As such, G&CNETs are one of the most promising Deep Learning based technologies that can potentially simplify the on-board guidance and control software, replacing it with one relatively simple trained neural model. While G&CNETs are simple feedforward networks, more complex, recurrent topologies are explored as well, e.g. Recurrent Neural Networks based on Long Short-Term Memory (LSTM) by Furfaro et al. furfaro2018recurrent in a similar context.
Since accurate position information might not be readily available due to technical limitations and costs of sensory hardware, there is a recent trend to develop control networks solely based on simple visual cues and optical flow that can be obtained by comparatively inexpensive cameras (see Franceschini franceschini2014small ). The most popular architectures for image processing within the Deep Learning world are arguably Convolutional Neural Networks (CNNs) due to their superior performance in image classification benchmarks krizhevsky2012imagenet . In a different work by Furfaro et al. Furfaro2018Deep , a stack of DNNs is assembled, starting with a CNN for the visual processing of simulated moon images and followed by a Recurrent Neural Network to estimate landing controls in a 2D simulated environment. The Recurrent Neural Network is based on LSTM cells, which are known for their good performance on sequential or time-dependent data. The total network consists of almost 30M trainable parameters and performs classification on the thrust vector magnitude (bang-bang) and regression on the thrust vector angle, both with high degrees of accuracy.
4.3 Alternative Supervised Learning Approaches
Acquisition of reliable data from space remains one of the biggest roadblocks for DL in general. However, more parsimonious alternatives to DNNs are under study as well: Shang et al. Shang2018Parameter design fuel- and time-optimal trajectories for transfers between all asteroids in the inner belt (around 150K) using a Gaussian Process Regression model. The challenge is to accurately predict the orbital parameters of the corresponding pairwise transfers, which would take a considerable amount of time to compute by conventional methods. By grouping asteroids of similar orbits together, sophisticated feature engineering allows the Gaussian Process Regression model to be trained on merely 300 numerically derived training samples while reaching high levels of accuracy.
Shah and Beeson Shah2017Vishwa apply neural networks and random forests (alongside GAs) to approximate manifold structures emerging in three-body problems. The authors compare their trajectories with cubic convolution, the current state-of-the-art method for the approximation of such manifolds, and find that Random Forests perform reasonably well across multiple orbital energies.
Support Vector Machines (SVMs) provide an alternative supervised learning approach for both classification and regression problems. SVMs can efficiently handle linear and nonlinear problems, depending on the kernel function used for training. Another key property of SVMs is their universal approximation capability with various kernels, including Gaussian, several dot product, or polynomial kernels Hammer2003A .
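For reference, the role of the kernel can be made explicit with the standard textbook form of the SVM decision function for binary classification. The notation is an assumption of this note and is not taken from the cited works: $x_i$ are training samples with labels $y_i \in \{-1, +1\}$, $\alpha_i \ge 0$ are the learned dual coefficients, $b$ is the bias, and $\gamma$, $c$, $d$ are kernel hyperparameters.

```latex
f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i \, y_i \, k(x_i, x) + b \right),
\qquad
k_{\text{Gauss}}(x, x') = \exp\left( -\gamma \, \lVert x - x' \rVert^2 \right),
\qquad
k_{\text{poly}}(x, x') = \left( x^{\top} x' + c \right)^{d}
```

Only the samples with $\alpha_i > 0$ (the support vectors) contribute to the sum, and replacing $k$ by the plain dot product $x^{\top} x'$ recovers the linear case.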
Li et al. Li2015Trajectory use SVMs to classify trajectories in the circular restricted three-body problem. With the SVM approach, transit orbits, which can pass through the bottleneck region of the zero velocity curve and escape from the vicinity of the primary or the secondary, can be rapidly separated from the other types of orbits. Peng and Bai peng2018exploring explore the capabilities of SVMs for improving orbit prediction accuracy. The SVM model is designed and trained to learn the underlying pattern of the orbit prediction errors from historical data. The simulation results demonstrate that the trained SVM model is able to capture the underlying relationship between the learning variables and the desired orbit prediction error. However, its ability to generalize is limited if the predicted orbits lie too far in the future peng2017limits. The authors later also deploy ANNs on the same problem Peng2018Artificial , complementing the SVM approach.
4.4 Reinforcement Learning
Reinforcement Learning (RL) is a subdiscipline of machine learning in which agents learn how to solve a task (like navigation or planning) within a potentially changing environment. The system receives feedback by means of a reward function, which informs the agents how well or how badly they are currently performing. The overall goal is to maximize the cumulative reward by selecting the best action (or sequence of actions) given certain observations of the environment. An advantage of RL is its ability to adapt to unknown situations and circumstances that are difficult to foresee or cumbersome to account for manually. Consequently, challenges in autonomous navigation and control are natural fits for RL as soon as uncertainty arises and robustness is of importance.
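The notion of cumulative reward can be made concrete with a short Python sketch. The discount factor, which weights near-term rewards more heavily than distant ones, is an assumption of this illustration (the description above does not specify one), and the plan names and reward values are made up.

```python
def discounted_return(rewards, discount=0.99):
    """Cumulative reward of one episode, accumulated from the last step backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


# Hypothetical reward sequences for two candidate action sequences.
candidate_plans = {
    "hover": [1.0, 1.0, 1.0],   # small reward at every step
    "drift": [3.0, 0.0, 0.0],   # large reward immediately, nothing afterwards
}

# The agent prefers the action sequence with the highest cumulative reward.
best_plan = max(candidate_plans,
                key=lambda name: discounted_return(candidate_plans[name], discount=0.5))
```

In practice the agent does not know the rewards in advance; it has to estimate the expected return of its actions from interaction with the environment.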
Typical challenges involve controllers that enable spacecraft to hover near and orbit irregularly shaped bodies (like asteroids). First explorations by Gaudet and Furfaro Gaudet2012Robust show that RL is able to learn the nonuniform gravitational and rotational fields of simulated asteroids to develop a thrust profile for accurate and robust hovering. This task would otherwise be extremely time-consuming to solve with traditional methods given the plethora of differently shaped asteroids found just in our solar system.
Willis et al. Willis2016reinforcement improve upon the previous work by increasing the accuracy of the approach by an order of magnitude and transferring the task to a more general gravitational model. A further improvement is the transfer to optical flow as the single source of sensory information for the controller. Interestingly, the weights of the network deployed for this scenario are trained with neuroevolution by a variant of PSO.
There is an additional stream of work emerging outside the terminology of machine learning, but strongly related to RL. Most notably, Pellegrini et al. Pellegrini2017Multiple and Ozaki et al. Ozaki2015differential explore dynamic programming for low-thrust transfers. Here too, the main motivation is to provide robust optimal control in uncertain environments.
Looking at AI in general, arguably one of the most influential developments of this decade was the arrival of Deep Reinforcement Learning (DRL), spearheaded by the success of AlphaGo Silver2016mastering ; Silver2017mastering , the first system capable of beating human world champions at the game of Go. While the first iteration (AlphaGo) learned from a database of recorded human Go matches, later variants (like AlphaGo Zero) made heavy use of competitive self-play, i.e. training against previous iterations of itself, starting from random actions. This creates a large amount of data, on which the system gradually improves itself. One of the key ingredients of the overwhelming success of this approach is experience replay, a history of previously recorded action and reward pairs that is replayed during training.
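A minimal Python sketch of an experience replay buffer is given below. It is a generic illustration, not the implementation used in any of the cited systems: the class name, the fixed capacity and the layout of a stored transition (state, action, reward, next state, done flag) are assumptions of this example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size history of transitions, sampled uniformly at random for training."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently discards the oldest transition when full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return up to batch_size distinct stored transitions."""
        pool = list(self._storage)
        return self._rng.sample(pool, min(batch_size, len(pool)))

    def __len__(self):
        return len(self._storage)


# Usage: record five dummy transitions in a buffer that keeps only the last three.
buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.add(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = buffer.sample(batch_size=2)
```

Sampling at random from the stored history breaks the correlation between consecutive transitions, which is the usual argument for why replay helps stabilize training.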
So far, the adoption of DRL in the domain of Guidance and Control has been slow, but first studies are emerging: Chu et al. Chu2018Q deploy a Deep Q-learning Network (DQN) to design a rendezvous mission, where one satellite needs to meet a second satellite within a constellation while avoiding collisions. The deployed Q-learning algorithm allows the satellite controller to make sequential decisions on the path to take, safely navigating the environment. As this problem features extremely high-dimensional state and action spaces, the standard Q-learning algorithm had to be augmented by a DNN, resulting in a DQN. By deploying the DQN to approximate the high-dimensional reward function, the satellite controller is able to produce the two key behaviours necessary to complete the task: obstacle avoidance and target seeking. Most notably, experience replay is deployed in this setting as a mechanism to prevent divergence and stabilize the training of the DQN.
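The standard (tabular) Q-learning algorithm that a DQN extends can be sketched on a toy problem that mimics the two behaviours named above, obstacle avoidance and target seeking. This is not the setup of Chu et al.: the 4x4 grid, the obstacle positions, the reward values and all hyperparameters are assumptions chosen for illustration. A DQN would replace the lookup table `q` with a neural network.

```python
import random

GRID_SIZE = 4
START, TARGET = (0, 0), (3, 3)
OBSTACLES = {(1, 1), (2, 2)}                   # hypothetical collision cells
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # unit moves along the two grid axes


def step(state, action_index):
    """Apply a move; positions are clamped to the grid. Returns (next, reward, done)."""
    dx, dy = ACTIONS[action_index]
    nxt = (min(max(state[0] + dx, 0), GRID_SIZE - 1),
           min(max(state[1] + dy, 0), GRID_SIZE - 1))
    if nxt in OBSTACLES:
        return nxt, -10.0, True    # collision ends the episode
    if nxt == TARGET:
        return nxt, 10.0, True     # rendezvous ends the episode
    return nxt, -1.0, False        # small cost per move


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy exploration policy."""
    rng = random.Random(seed)
    q = {(x, y): [0.0] * len(ACTIONS)
         for x in range(GRID_SIZE) for y in range(GRID_SIZE)}
    for _ in range(episodes):
        state = START
        for _ in range(50):
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best action in the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the learned greedy policy from START."""
    state, path = START, [START]
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


q_table = train()
path = greedy_path(q_table)
```

The table has one row per state, which is only feasible for small discrete problems; this is the scaling limitation that motivates approximating the table with a DNN.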
Very recently, Gaudet et al. gaudet2018deep published a work on Deep Reinforcement Learning for a planetary descent and landing problem, which deploys Policy Gradient Optimization as the learning algorithm. The authors design and train a 6-degree-of-freedom integrated closed-loop controller capable of fuel-optimal pinpoint landing, satisfying varying flight and system constraints. The work highlights the importance of shaping the reward function towards the task and provides exhaustive technical details on how the DNN is trained for this setting.
5 Final Remarks
The use of results and ideas originating in the AI research community has already been fruitful in domains such as orbital prediction, planetary landing, spacecraft guidance, interplanetary trajectory optimisation and low-thrust propulsion, allowing the development of novel methods and architectures that are competitive with, and often superior to, the state-of-the-art methodologies currently used and taught to aerospace engineers. We are confident that a widespread use of these techniques, currently known and used only by academics and a few practitioners, will increase the level of automation and the performance of space systems within the next decade. This trend has already begun and is continuing, now taking advantage of the increased worldwide attention on AI that followed the success of paradigms such as Deep Learning in the IT industry.
We expect that more success stories in new domains such as formation flying, rendezvous and docking, in-orbit self-assembly, or autonomous detection on planet/asteroid surfaces will appear in the next decades, with Deep Learning and Deep Reinforcement Learning powering many developments in all areas owing to their novelty and increasing attractiveness. Work on the validation of AI-based systems is also expected to appear, providing methods for increasing trust in trained models (like DNNs), which are often regarded with skepticism when seen as a black box.
Bibliography
 (1) Girimonte D, Izzo D. Artificial intelligence for space applications. In Intelligent Computing Everywhere, Springer, 2007, 235–253.
 (2) Lary DJ. Artificial intelligence in Aerospace. In Aerospace Technologies Advancements, InTech, 2010.
 (3) Zhu XX, Tuia D, Mou L, Xia GS, Zhang L, Xu F, Fraundorfer F. Deep learning in remote sensing: a comprehensive review and list of resources. IEEE Geoscience and Remote Sensing Magazine, 2017, 5(4): 8–36.
 (4) Izzo D, Sprague C, Tailor D. Machine learning and evolutionary techniques in interplanetary trajectory design. arXiv preprint arXiv:1802.00180, 2018.
 (5) Li S, Huang X, Yang B. Review of optimization methodologies in global and China trajectory optimization competitions. Progress in Aerospace Sciences, 2018, 102: 60 – 75, doi:https://doi.org/10.1016/j.paerosci.2018.07.004.
 (6) Russell SJ, Norvig P. Artificial intelligence: a modern approach, Pearson Education, Inc., 2010.
 (7) Campelo F, Aranha C. EC Bestiary: A bestiary of evolutionary, swarm and other metaphor-based algorithms, 2018, doi:10.5281/zenodo.1293352.
 (8) Vinkó T, Izzo D. Global optimisation heuristics and test problems for preliminary spacecraft trajectory design. Eur. Space Agency, Adv. Concepts Team, ACT Tech. Rep., id: GOHTPPSTD, 2008.
 (9) Stracquadanio G, La Ferla A, De Felice M, Nicosia G. Design of Robust Space Trajectories. In SGAI Conf., Springer, 2011, 341–354.
 (10) Addis B, Cassioli A, Locatelli M, Schoen F. A global optimization method for the design of space trajectories. Computational Optimization and Applications, 2011, 48(3): 635–652.
 (11) Schlueter M. MIDACO software performance on interplanetary trajectory benchmarks. Advances in Space Research, 2014, 54(4): 744–754.
 (12) Islam SM, Das S, Ghosh S, Roy S, Suganthan PN. An adaptive differential evolution algorithm with novel mutation and crossover strategies for global numerical optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2012, 42(2): 482–500.
 (13) Cassioli A, Di Lorenzo D, Locatelli M, Schoen F, Sciandrone M. Machine learning for global optimization. Computational Optimization and Applications, 2012, 51(1): 279–303.
 (14) Simões LF, Izzo D, Haasdijk E, Eiben AE. Self-adaptive genotype-phenotype maps: neural networks as a meta-representation. In International Conference on Parallel Problem Solving from Nature, Springer, 2014, 110–119.
 (15) Elsayed SM, Sarker RA, Essam DL. GA with a new multi-parent crossover for solving IEEE-CEC2011 competition problems. In Evolutionary Computation (CEC), 2011 IEEE Congress on, IEEE, 2011, 1034–1040.
 (16) Myatt D, Becerra VM, Nasuto SJ, Bishop J. Advanced global optimisation for mission analysis and design. Final Report. Ariadna id 04/4101, 2004.
 (17) Izzo D, Becerra VM, Myatt DR, Nasuto SJ, Bishop JM. Search space pruning and global optimisation of multiple gravity assist spacecraft trajectories. Journal of Global Optimization, 2007, 38(2): 283–296.
 (18) Olds AD, Kluever CA, Cupples ML. Interplanetary mission design using differential evolution. Journal of Spacecraft and Rockets, 2007, 44(5): 1060–1070.
 (19) Izzo D, Simões LF, Märtens M, De Croon GC, Heritier A, Yam CH. Search for a grand tour of the Jupiter galilean moons. In Proceedings of the 15th annual conference on Genetic and evolutionary computation, ACM, 2013, 1301–1308.
 (20) Yao W, Luo J, Macdonald M, Wang M, Ma W. Improved Differential Evolution Algorithm and Its Applications to Orbit Design. Journal of Guidance, Control, and Dynamics, 2017.
 (21) Vasile M, Minisci E, Locatelli M. An inflationary differential evolution algorithm for space trajectory optimization. IEEE Transactions on Evolutionary Computation, 2011, 15(2): 267–281.

 (22) Pontani M, Conway BA. Particle swarm optimization applied to space trajectories. Journal of Guidance, Control and Dynamics, 2010, 33(5): 1429–1441.
 (23) Vasile M, Minisci E, Locatelli M. Analysis of some global optimization algorithms for space trajectory design. Journal of Spacecraft and Rockets, 2010, 47(2): 334.
 (24) Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE transactions on evolutionary computation, 1997, 1(1): 67–82.
 (25) Englander JA, Conway BA, Williams T. Automated mission planning via evolutionary algorithms. Journal of Guidance, Control, and Dynamics, 2012, 35(6): 1878–1887.
 (26) Sentinella MR, Casalino L. Hybrid evolutionary algorithm for the optimization of interplanetary trajectories. Journal of Spacecraft and Rockets, 2009, 46(2): 365.
 (27) Izzo D, Hennes D, Riccardi A. Constraint handling and multiobjective methods for the evolution of interplanetary trajectories. Journal of Guidance, Control, and Dynamics, 2014.
 (28) Radice G, Olmo G. Ant colony algorithms for twoimpulse interplanetary trajectory optimization. Journal of Guidance Control and Dynamics, 2006, 29(6): 1440.
 (29) Schlueter M, Erb SO, Gerdts M, Kemble S, Rückmann JJ. MIDACO on MINLP space applications. Advances in Space Research, 2013, 51(7): 1116–1131.
 (30) Coello CC. Evolutionary multiobjective optimization: a historical view of the field. IEEE computational intelligence magazine, 2006, 1(1): 28–36.
 (31) Deb K, Padhye N, Neema G. Interplanetary trajectory optimization with swing-bys using evolutionary multi-objective optimization. In International Symposium on Intelligence Computation and Applications, Springer, 2007, 26–35.
 (32) Schütze O, Vasile M, Junge O, Dellnitz M, Izzo D. Designing optimal low-thrust gravity-assist trajectories using space pruning and a multi-objective approach. Engineering Optimization, 2009, 41(2): 155–181.
 (33) Märtens M, Izzo D. The asynchronous island model and NSGA-II: study of a new migration operator and its performance. In Proceedings of the 15th annual conference on Genetic and evolutionary computation, ACM, 2013, 1173–1180.
 (34) Zotes FA, Peñas MS. Particle swarm optimisation of interplanetary trajectories from Earth to Jupiter and Saturn. Engineering Applications of Artificial Intelligence, 2012, 25(1): 189–199.
 (35) Lee S, von Allmen P, Fink W, Petropoulos A, Terrile R. Multi-objective evolutionary algorithms for low-thrust orbit transfer optimization. In Genetic and Evolutionary Computation Conference (GECCO 2005), 2005.
 (36) Montano AA, Coello CAC, Schütze O, No AT, Ticoman CSJ. Multiobjective Optimization for Space Mission Design Problems. Computational Intelligence in Aerospace Sciences, 2014: 1–46.
 (37) Vasile M, Ricciardi L. A direct memetic approach to the solution of Multi-Objective Optimal Control Problems. In Computational Intelligence (SSCI), 2016 IEEE Symposium Series on, IEEE, 2016, 1–8.
 (38) Chai R, Savvaris A, Tsourdos A, Chai S, Xia Y. Unified Multiobjective Optimization Scheme for Aeroassisted Vehicle Trajectory Planning. Journal of Guidance, Control, and Dynamics, 2018: 1–10.
 (39) Izzo D, Märtens M. The Kessler Run: On the Design of the GTOC9 Challenge. Acta Futura, 2018, 11: 11–24.
 (40) Petropoulos A, Grebow D, Jones D, Lantoine G, Nicholas A, Roa J, Senent J, Stuart J, Arora N, Pavlak T, et al. GTOC9: Methods and Results from the Jet Propulsion Laboratory Team. Acta Futura, 2018, 11: 25–36.
 (41) Luo Y, Zhu Y, Zhu H, Yang Z, Mou S, Zhang J, Sun Z, Liang J. GTOC9: Results from the National University of Defense Technology. Acta Futura, 2018, 11: 37–47.
 (42) HongXin S, TianJiao Z, AnYi H, Zhao L. GTOC 9: Results from the Xi’an Satellite Control Center (team XSCC). Acta Futura, 2018, 11: 49–55, doi:10.5281/zenodo.1139240.
 (43) Ceriotti M, Vasile M. MGA trajectory planning with an ACOinspired algorithm. Acta Astronautica, 2010, 67(9): 1202–1217.
 (44) Englander JA, Conway BA. Automated Solution of the Low-Thrust Interplanetary Trajectory Problem. Journal of Guidance, Control, and Dynamics, 2016: 15–27.
 (45) Yam CH, Lorenzo DD, Izzo D. Low-thrust trajectory design as a constrained global optimization problem. Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering, 2011, 225(11): 1243–1251, doi:10.1177/0954410011401686.
 (46) Izzo D, Simoes LF, Yam CH, Biscani F, Di Lorenzo D, Addis B, Cassioli A. GTOC5: results from the European Space Agency and University of Florence. Acta Futura, 2014, 8: 45–55.
 (47) Abdelkhalik O, Darani S. Hidden genes genetic algorithms for systems architecture optimization. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, ACM, 2016, 629–636.
 (48) Izzo D, Getzner I, Hennes D, Simões LF. Evolving solutions to TSP variants for active space debris removal. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, ACM, 2015, 1207–1214.
 (49) Wilt CM, Thayer JT, Ruml W. A comparison of greedy search algorithms. In third annual symposium on combinatorial search, 2010.
 (50) Hennes D, Izzo D. Interplanetary Trajectory Planning with Monte Carlo Tree Search. In IJCAI, 2015, 769–775.
 (51) Browne CB, Powley E, Whitehouse D, Lucas SM, Cowling PI, Rohlfshagen P, Tavener S, Perez D, Samothrakis S, Colton S. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in games, 2012, 4(1): 1–43.

 (52) Simões LF, Izzo D, Haasdijk E, Eiben A. Multi-rendezvous Spacecraft Trajectory Optimization with Beam P-ACO. In European Conference on Evolutionary Computation in Combinatorial Optimization, Springer, 2017, 141–156.
 (53) Basu K, Melton RG, AguasvivasManzano S. Time-optimal reorientation using neural network and particle swarm formulation. In 2017 AAS/AIAA Astrodynamics Specialist Conference, 2017, AAS 17–816.
 (54) Ampatzis C, Izzo D. Machine learning techniques for approximation of objective functions in trajectory optimisation. In Proceedings of the IJCAI09 workshop on artificial intelligence in space, 2009, 1–6.
 (55) Hennes D, Izzo D, Landau D. Fast approximators for optimal low-thrust hops between main belt asteroids. In Computational Intelligence (SSCI), 2016 IEEE Symposium Series on, IEEE, 2016, 1–7.
 (56) Mereta A, Izzo D, Wittig A. Machine Learning of Optimal Low-Thrust Transfers Between Near-Earth Objects. In International Conference on Hybrid Artificial Intelligence Systems, Springer, 2017, 543–553.
 (57) Izzo D, Hennes D, Simões LF, Märtens M. Designing complex interplanetary trajectories for the global trajectory optimization competitions. In Space Engineering, Springer, 2016, 151–176.
 (58) Izzo D. Global optimization and space pruning for spacecraft trajectory design. Spacecraft Trajectory Optimization, 2010, 1: 178–200.
 (59) Dachwald B. Low-thrust trajectory optimization and interplanetary mission analysis using evolutionary neurocontrol. Doctoral thesis, Universität der Bundeswehr München, Fakultät für Luft- und Raumfahrttechnik, 2004.
 (60) Dachwald B, Ohndorf A. Global Optimization of Continuous-Thrust Trajectories Using Evolutionary Neurocontrol. In G Fasano, J Pinter, editors, Modeling and Optimization in Space Engineering  2018, Springer International Publishing, Switzerland, 2019.
 (61) Schmidhuber J. Deep learning in neural networks: An overview. Neural networks, 2015, 61: 85–117.
 (62) Sánchez-Sánchez C, Izzo D, Hennes D. Learning the optimal state-feedback using deep networks. In Computational Intelligence (SSCI), 2016 IEEE Symposium Series on, IEEE, 2016, 1–8.
 (63) Sánchez-Sánchez C, Izzo D. Real-time optimal control via Deep Neural Networks: study on landing problems. arXiv preprint arXiv:1610.08668, 2016.
 (64) Izzo D, Tailor D, Vasileiou T. On the stability analysis of optimal state feedbacks as represented by deep neural models. arXiv preprint arXiv:1812.02532, 2018.
 (65) Furfaro R, Bloise I, Orlandelli M, Di Lizia P, Topputo F, Linares R, et al. A Recurrent Deep Architecture for Quasi-Optimal Feedback Guidance in Planetary Landing. In IAA SciTech Forum on Space Flight Mechanics and Space Structures and Materials, 2018, 1–24.
 (66) Franceschini N. Small brains, smart machines: from fly vision to robot vision and back again. Proceedings of the IEEE, 2014, 102(5): 751–781.

 (67) Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, 2012, 1097–1105.
 (68) Furfaro R, Bloise I, Orlandelli M, Di P. Deep Learning for Autonomous Lunar Landing. In 2018 AAS/AIAA Astrodynamics Specialist Conference, 2018, AAS 18–363.
 (69) Shang H, Wu X, Qiao D, Huang X. Parameter estimation for optimal asteroid transfer trajectories using supervised machine learning. Aerospace Science and Technology, 2018, 79: 570–579, doi:10.1016/j.ast.2018.06.002.
 (70) Shah V, Beeson R. Rapid approximation of invariant manifolds using machine learning methods. In 2017 AAS/AIAA Astrodynamics Specialist Conference, 2017, AAS 17–784.
 (71) Hammer B, Kai G. A Note on the Universal Approximation Capability of Support Vector Machines. Neural Processing Letters, 2003, 17(1): 43–53.
 (72) Li W, Huang H, Peng F. Trajectory classification in circular restricted three-body problem using support vector machine. Advances in Space Research, 2015, 56(2): 273–280.
 (73) Peng H, Bai X. Exploring Capability of Support Vector Machine for Improving Satellite Orbit Prediction Accuracy. Journal of Aerospace Information Systems, 2018, 15(6): 366–381.
 (74) Peng H, Bai X. Artificial Neural Network–Based Machine Learning Approach to Improve Orbit Prediction Accuracy. Journal of Spacecraft and Rockets, 2018: 1–13.
 (75) Gaudet B, Furfaro R. Robust spacecraft hovering near small bodies in environments with unknown dynamics using reinforcement learning. In AIAA/AAS Astrodynamics Specialist Conference, 2012, 5072.
 (76) Willis S, Izzo D, Hennes D. Reinforcement learning for spacecraft maneuvering near small bodies. American Institute of Aeronautics and Astronautics, Napa, 2016: 16–277.
 (77) Pellegrini E, Russell RP. A Multiple-Shooting Differential Dynamic Programming Algorithm. In AAS/AIAA Space Flight Mechanics Meeting, 2017.
 (78) Ozaki N, Campagnola S, Yam CH, Funase R. Differential Dynamic Programming Approach for Robust-Optimal Low-Thrust Trajectory Design Considering Uncertainty. In 25th International Symposium on Space Flight Dynamics, Munich, Germany, 2015.
 (79) Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587): 484.
 (80) Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, et al. Mastering the game of Go without human knowledge. Nature, 2017, 550(7676): 354.
 (81) Chu X, Alfriend KT, Zhang J, Zhang Y. Q-learning Algorithm for Path-planning to maneuver through a satellite cluster. In 2018 AAS/AIAA Astrodynamics Specialist Conference, 2018, AAS 18–268.
 (82) Gaudet B, Linares R, Furfaro R. Deep Reinforcement Learning for Six Degree-of-Freedom Planetary Powered Descent and Landing. arXiv preprint arXiv:1810.08719, 2018.