Multi-condition multi-objective optimization using deep reinforcement learning

by Sejin Kim, et al.

A multi-condition multi-objective optimization method that can find Pareto front over a defined condition space is developed for the first time using deep reinforcement learning. Unlike the conventional methods which perform optimization at a single condition, the present method learns the correlations between conditions and optimal solutions. The exclusive capability of the developed method is examined in the solutions of a novel modified Kursawe benchmark problem and an airfoil shape optimization problem which include nonlinear characteristics which are difficult to resolve using conventional optimization methods. Pareto front with high resolution over a defined condition space is successfully determined in each problem. Compared with multiple operations of a single-condition optimization method for multiple conditions, the present multi-condition optimization method based on deep reinforcement learning shows a greatly accelerated search of Pareto front by reducing the number of required function evaluations. An analysis of aerodynamics performance of airfoils with optimally designed shapes confirms that multi-condition optimization is indispensable to avoid significant degradation of target performance for varying flow conditions.




1 Introduction

Optimization is central to all decision-making problems, including economics, business administration, and engineering Chong and Zak (2004). In particular, in applied mechanics such as structural mechanics, electromagnetism, and biomechanics, designing an optimal shape that maximizes target performance, so-called shape optimization, has been actively studied to this day Semmler et al. (2015); Chu et al. (2021); Taylor and Dirks (2012); Park et al. (2018). Likewise, in fluid mechanics, studies on shape optimization for numerous real applications have been conducted Mohammadi and Pironneau (2004). For example, studies to improve the aerodynamic or hydrodynamic characteristics of airplanes, ships, and automobiles through shape optimization have continued Droandi and Gibertini (2015); Peri et al. (2001); Percival et al. (2001); Yun et al. (2008). In addition, there have been many efforts in designing wind turbine blades that maximize power efficiency Xudong et al. (2009) and marine propellers that reduce underwater radiated noise Bertetta et al. (2012). Since fluids have nonlinear and high-dimensional characteristics, the performance of fluid machines can vary greatly depending on their shapes. Therefore, shape optimization is essential for the efficient operation of fluid machines.

In practical applications, generally, the operating conditions and target performance of fluid machines vary depending on the situation. For example, in the case of wind turbines, varying wind conditions alter the aerodynamic and structural performance of the blades Lachenal et al. (2013). Also, in the case of aircraft, the aerodynamic requirements of the wing vary according to flight situations such as cruise, departure, and landing Secanell et al. (2006). Therefore, in order to maintain the optimal state during operation, it is necessary to know the optimal shape under changing conditions and objectives and to modify its shape accordingly. In many application fields, numerous efforts have been made to maximize the target performance by changing the shape according to the situation Vasista et al. (2019). For aircraft, many studies have been conducted to improve the performance by modifying the shape using smart adaptive devices and morphing materials Diaconu et al. (2008); Barbarino et al. (2011); Ajaj et al. (2016). In addition, morphing hydrofoils and morphing composite propellers have been actively studied in marine applications Garg et al. (2015); Sacher et al. (2018); Chen et al. (2017).

Despite these efforts, ironically, related studies from the perspective of optimization methodology are insufficient. In order to remain optimal, the optimal solution under changing conditions and objectives must first be known. Optimization for various objectives is possible through conventional multi-objective (MO) optimization methods Srinivas and Deb (1994); Deb et al. (2002); Coello Coello and Lechuga (2002); Miettinen and Mäkelä (2002). However, to the best of our knowledge, there is no optimization method that can find the optimal solution considering various conditions. Therefore, in addition to MO optimization, an optimization method that can handle both various conditions and objectives is needed.

MO optimization seeks to optimize multiple objectives that generally conflict with each other at a single condition. The goal of MO optimization is to find the Pareto front, which is a set of optimal trade-off solutions among the objectives. However, it can only be applied to a single prescribed condition. To find optimal solutions within a condition range, it is necessary to prescribe some conditions as representatives in advance and perform optimization at each of them separately. For example, Secanell et al. Secanell et al. (2006) performed optimization at seven prescribed flight conditions to design a morphing airfoil. In addition, Wang et al. Wang et al. (2020) conducted optimization considering three representative conditions to design a centrifugal pump. However, if optimization is performed only at some prescribed conditions, the obtained solutions are valid only at those predetermined conditions. Moreover, because optimization has to be repeated from scratch for each condition, it is very inefficient to perform optimization at sufficiently many conditions. Thus, in order to overcome these limitations, a multi-condition multi-objective (MCMO) optimization method that can efficiently find a set of optimal solutions (Pareto front) over a condition range is needed.

Recently, with the advancement of artificial intelligence, studies combining it with optimization have been actively conducted Yan et al. (2019); Li et al. (2020). In particular, deep reinforcement learning (DRL) is emerging as a new trend in the field of shape optimization Rabault et al. (2020). Viquerat et al. Viquerat et al. (2021b) showed the capability of DRL in shape optimization by successfully performing airfoil shape optimization at a single flow condition. Thereafter, DRL has started to be adopted for many shape optimization problems. Qin et al. Qin et al. (2021) conducted MO optimization of a cascade blade at a target flow condition using DRL. Also, Li et al. Li et al. (2021) conducted airfoil shape optimization to reduce drag using DRL, and they showed that the learned network can extract more improved shapes compared to the original shape at unlearned conditions. As these studies indicate, DRL has ample potential to be a key to the development of a MCMO optimization method. The basic concept of DRL is to find an optimal action for a given state. Thus, if the condition and objective of optimization are set as the state, DRL can take the role of a MCMO optimizer. Moreover, in contrast to conventional methods where each condition is treated independently, it is expected to be more efficient by learning the correlations between conditions and optimal solutions.

In the present study, a DRL-based MCMO optimization method that can efficiently find Pareto front over a condition space is developed. Then, two MCMO optimization problems are dealt with by the developed method. The first problem is a benchmark problem to validate the method. As a benchmark problem, the Kursawe test function Kursawe (1991), which is a representative MO optimization problem, is newly extended to be suitable for MCMO optimization. Next, it is applied to airfoil shape optimization to show its applicability in practical engineering applications. The airfoil shape optimization is a representative shape optimization problem involving fluid dynamics, where nonlinearity and high-dimensionality are combined. Despite these difficulties, it has been actively studied due to its direct applicability to numerous engineering fields Zhang et al. (2021); Wang et al. (2019); Gillebaart and De Breuker (2016). Finally, further analysis is conducted in each problem to identify the exclusive capability of the proposed method.

2 Background

2.1 Multi-objective optimization

2.1.1 Problem description

A MO optimization problem with m objective functions can be defined as follows:

minimize F(x) = (f_1(x), f_2(x), ..., f_m(x))
subject to x ∈ X,                                             (1)

where x is an element in the decision space X, F: X → R^m consists of m real-valued objective functions, and R^m is the objective space.

In this problem, generally, no single solution can optimize these objectives simultaneously as they conflict with each other. Instead, a set of optimal trade-off solutions among different objectives exists by the concepts of Pareto dominance and Pareto optimality, which are defined as follows:

  • Pareto dominance: x_1 is said to Pareto dominate x_2, denoted by x_1 ≺ x_2, if and only if f_i(x_1) ≤ f_i(x_2) for all i ∈ {1, ..., m}, and f_j(x_1) < f_j(x_2) for at least one index j ∈ {1, ..., m}.

  • Pareto optimality: A solution x* ∈ X is said to be Pareto optimal if and only if there is no x ∈ X such that x ≺ x*.

The goal of a MO optimization problem is to find the Pareto optimal set, the set of all Pareto optimal solutions, and the corresponding Pareto front, which is defined as the image of the Pareto optimal set in the objective space.
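For concreteness, the two definitions above can be sketched in a few lines of Python (an illustrative sketch for the minimization convention; the function names are ours, not the authors'):

```python
import numpy as np

def dominates(f1, f2):
    """Return True if objective vector f1 Pareto dominates f2 (minimization):
    f1 is no worse in every objective and strictly better in at least one."""
    f1, f2 = np.asarray(f1), np.asarray(f2)
    return bool(np.all(f1 <= f2) and np.any(f1 < f2))

def pareto_front(points):
    """Select the non-dominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if not np.array_equal(q, p))]
```

Applying `pareto_front` to a cloud of evaluated objective vectors yields the trade-off set described above.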

2.1.2 Weighted Chebyshev method

The weighted Chebyshev method is one of the decomposition-based methods for solving MO optimization problems. It scalarizes a MO optimization problem into multiple single-objective (SO) optimization problems by introducing a weight vector w = (w_1, ..., w_m) and the Chebyshev scalarizing function g. The weight vector w determines the weight between objectives, and g is the scalarized objective of each SO optimization problem. Then, the original MO optimization problem can be solved by performing a number of scalarized SO optimization processes with different w. The scalarized SO optimization problem can be written as follows:

minimize g(x | w) = max_{1 ≤ i ≤ m} w_i |f_i(x) − z_i*|
subject to x ∈ X,                                             (2)

where z_i* is a utopia value defined as z_i* = min_{x ∈ X} f_i(x) − ε_i, where ε_i is a relatively small positive value.

Unlike other decomposition-based methods, the weighted Chebyshev method guarantees that all Pareto optimal solutions can be obtained for both convex and nonconvex problems Miettinen (2012). Because of this advantage, it has been widely used in the literature Zhang and Li (2007); Tan et al. (2013b) and has also been successfully used with DRL Van Moffaert et al. (Conference Proceedings). One of the difficulties in adopting the weighted Chebyshev method is that the utopia point, z* = (z_1*, ..., z_m*), has to be known before optimization, which requires SO optimization for each objective in advance. In the present study, this difficulty is overcome by integrating these overall processes into a single process, which will be further discussed in Section 4.1.1.
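The way the weight vector steers the search can be illustrated with a small sketch (ours, not the authors' code; the toy front and utopia point are invented for illustration):

```python
import numpy as np

def chebyshev(f, w, z_utopia):
    """Weighted Chebyshev scalarizing function g = max_i w_i * |f_i - z_i*|.
    Minimizing g over candidate solutions steers the search toward the part
    of the Pareto front selected by the weight vector w."""
    f, w, z = map(np.asarray, (f, w, z_utopia))
    return float(np.max(w * np.abs(f - z)))

# A toy Pareto front on the line f1 + f2 = 1; different weight vectors
# select different trade-off solutions from the same front.
front = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]
z_star = (0.0, 0.0)
balanced = min(front, key=lambda f: chebyshev(f, (0.5, 0.5), z_star))
favor_f1 = min(front, key=lambda f: chebyshev(f, (0.9, 0.1), z_star))
```

With equal weights the middle trade-off is selected, while a weight vector emphasizing f_1 selects the solution with the smallest f_1, which is how sweeping w traces out the whole front.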

2.2 Deep reinforcement learning based optimization

2.2.1 Deep reinforcement learning

Reinforcement learning is a process of learning a policy to determine an optimal action for a given state Sutton and Barto (2018). At each discrete step n, it determines an action a_n according to its current policy π: a_n = π(s_n). Then, through the execution of the action, a reward r_n according to the decision and the next state s_{n+1} are given. As the steps progress, data is accumulated and learning proceeds. The goal of learning is to find the optimal policy that maximizes the value function V Bellman (1966), which is defined as the expected sum of an immediate reward, r_n, and discounted future rewards as follows:

V(s_n) = E[ r_n + Σ_{k=1}^{∞} γ^k r_{n+k} ],                  (3)

where γ is a discount factor that determines the weight between short-term and long-term future rewards. This process is repeated until the terminal state, and one such run is called an episode.

In particular, if deep learning is adopted for learning, it is called DRL. For example, deep neural networks can be used as the policy itself or for predicting the value function. By incorporating deep neural networks, DRL is known to be able to handle complex and high-dimensional problems Mnih et al. (2015, 2013). Especially, DRL has shown outstanding ability in optimal control and optimization Rabault et al. (2020); Buşoniu et al. (2018); Garnier et al. (2021).

2.2.2 Single-step deep reinforcement learning based optimization

Single-step DRL based optimization was recently introduced by Viquerat et al. Viquerat et al. (2021b), where one learning episode consists of a single step; if an action is determined with a given state, a reward is given accordingly and the episode ends without a next state. Since the future rewards in Eq. (3) do not exist, the discount factor γ does not have to be defined, and learning proceeds to maximize only the immediate reward. As a result, the optimal action that maximizes the reward itself can be directly determined. Therefore, if the reward is set as the objective function to be optimized, the optimal solution that maximizes the objective function can be directly obtained. By virtue of this characteristic, single-step DRL is known to be suitable as an optimization method Viquerat et al. (2021a).

3 Problem description of multi-condition multi-objective optimization

A MCMO optimization problem is extended from a MO optimization problem to include not only the decision variable x, but also the condition variable c. The problem with m objective functions is defined as follows:

minimize F(x, c) = (f_1(x, c), f_2(x, c), ..., f_m(x, c))
subject to x ∈ X, c ∈ C,                                      (4)

where x is an element in the decision space X, c is an element in the condition space C, F: X × C → R^m consists of m real-valued objective functions, and R^m is the objective space.

Likewise, the concepts of Pareto dominance and Pareto optimality are extended to cover the condition variable c, and are defined as follows:

  • Pareto dominance: x_1 is said to Pareto dominate x_2 at the condition variable c, denoted by x_1 ≺_c x_2, if and only if f_i(x_1, c) ≤ f_i(x_2, c) for all i ∈ {1, ..., m}, and f_j(x_1, c) < f_j(x_2, c) for at least one index j ∈ {1, ..., m}.

  • Pareto optimality: A solution x* ∈ X is said to be Pareto optimal at the condition variable c, if and only if there is no x ∈ X such that x ≺_c x*.

As in the MO optimization problem, solving a MCMO optimization problem is to find the Pareto optimal set at each condition and the corresponding Pareto front, which is defined as the image of the Pareto optimal set in the objective space. If c is fixed, the MCMO optimization problem reduces to a MO optimization problem.

4 Method

4.1 Deep reinforcement learning algorithm for multi-condition multi-objective optimization

4.1.1 State, action, and reward

In MCMO optimization, the optimal solution varies depending on the condition and objective. Therefore, the state of DRL is set to include the condition and objective, which is defined as follows:

s = (c, w, z*(c)),                                            (5)

where c is a condition variable, w is a weight vector, and z*(c) is a utopia point at that condition. In the present study, z*(c) is adaptively updated during optimization to a slightly lower value than the minimum observed value of each objective function. Since the Chebyshev scalarizing function g differs depending on z* as in Eq. (2), the changing utopia information is included in the state for stable learning.

The action of DRL determines x, an element in the decision space, according to its policy, which is defined as follows:

a = π(s) = x.                                                 (6)

In addition, all variables in the state and action are normalized to an absolute magnitude around 1 for scaling.

Lastly, the reward of DRL is a quantitative evaluation of an action, which is defined as follows:

r = −g(x | w) = −max_{1 ≤ i ≤ m} w_i |f_i(x, c) − z_i*(c)|,   (7)

where f_i(x, c) is the value of the objective function obtained by executing the action. Note that the minus sign is added because the aim of optimization is to find x minimizing the Chebyshev scalarizing function.

4.1.2 Data reproduction method

In the present study, a data reproduction method is applied to enlarge the number and diversity of data by exploiting the nature of the Chebyshev scalarizing function g. As in Eq. (7), the reward of DRL is a function of F, w, and z*. Since F is independent of w, different rewards can be determined for arbitrary w once the objective functions are evaluated. Therefore, it is possible to reproduce an original data pair by changing w with only a single function evaluation. It is expected that this method accelerates learning and is thus essential for optimization problems where the function evaluation is costly. In the present study, at each episode, 100 data are reproduced from a single original data by sampling w from a uniform distribution.
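The reproduction step can be sketched as follows (a minimal sketch of the idea, assuming two objectives and the Chebyshev reward of Eq. (7); the function and variable names are ours):

```python
import numpy as np

def reproduce(f_values, z_utopia, n_rep=100, rng=None):
    """Reproduce n_rep training tuples from a single function evaluation.

    Because the objective values f(x, c) do not depend on the weight vector w,
    new (weight, reward) pairs can be generated by resampling w and recomputing
    the Chebyshev reward r = -max_i w_i |f_i - z_i*| at no extra evaluation cost.
    Assumes two objectives; w is sampled uniformly as (w1, 1 - w1)."""
    rng = rng or np.random.default_rng(0)
    f = np.asarray(f_values, dtype=float)
    z = np.asarray(z_utopia, dtype=float)
    data = []
    for _ in range(n_rep):
        w1 = rng.uniform()
        w = np.array([w1, 1.0 - w1])
        reward = -float(np.max(w * np.abs(f - z)))  # reward of Eq. (7)
        data.append((w, reward))
    return data
```

Each expensive evaluation of the objectives thus yields many distinct training tuples for the replay data.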

4.1.3 Learning procedure

The learning procedure of the present study is summarized in Algorithm 1. For the DRL algorithm, the actor-critic algorithm Konda and Tsitsiklis (2000) is used, which is one of the representative DRL algorithms. In the algorithm, two types of neural networks are introduced. One is the actor network, the policy itself, which determines an action in continuous space. The other is the critic network, which predicts the value function depending on the state and action. As learning progresses, the critic network predicts the value function more and more accurately and, based on this, the probability that the actor network selects the optimal action increases.

Both networks are set as fully connected networks with four hidden layers, and the Leaky ReLU activation function Maas et al. (2013) is used for the hidden layers in both networks. At the output layer of the actor network, the Tanh activation function is added so that the action values range from −1 to 1. The learning rates of the two networks are set equally, and the Adam optimizer Kingma and Ba (2017) is used for updating the network parameters. In particular, the actor network is updated every two learning iterations for stable learning Fujimoto et al. (2018). The mini-batch size and the learning amount per one episode are also fixed; the latter is set to 100, the same as the number of reproduced data per one original data. The standard deviation of the exploration noise is set to a constant value in the initial warm-up episodes and follows a cosine function afterward. The use of the cosine function enables both exploration for avoiding local minima and exploitation for accurately finding optimal solutions.

4.2 Selection of Pareto front

Pareto dominance in a MCMO optimization problem is defined at each condition variable c. However, since the obtained data are scattered over C, no two data share exactly the same c at which dominance can be judged. Therefore, a concept of decomposition of the condition space is introduced to derive approximate solutions for a MCMO optimization problem. C is decomposed into N_d spaces as follows:

C = C_1 ∪ C_2 ∪ ... ∪ C_{N_d},  C_i ∩ C_j = ∅ for i ≠ j.      (8)

In each decomposed space, c is assumed to be the same, and a Pareto front is selected from the data in that space.

Note that the decomposition has no effect on the optimization process and can be modified during or after the optimization process. Therefore, N_d can be freely adjusted according to the desired quality. For example, the denser the decomposition, the higher the resolution of the selected Pareto front, but the more episodes are required for convergence. In the present study, C is decomposed into 100 spaces of the same size for selecting the Pareto front.

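The selection procedure can be sketched as follows (an illustrative sketch for a scalar condition variable and minimization; the function name and binning details are our assumptions):

```python
import numpy as np

def select_pareto_by_condition(conditions, objectives, c_min, c_max, n_dec=100):
    """Decompose the condition range [c_min, c_max] into n_dec equal bins and
    select the non-dominated set (minimization) independently in each bin.
    Returns {bin_index: [data indices on the local Pareto front]}."""
    conditions = np.asarray(conditions, dtype=float)
    objectives = np.asarray(objectives, dtype=float)
    edges = np.linspace(c_min, c_max, n_dec + 1)
    bins = np.clip(np.searchsorted(edges, conditions, side="right") - 1, 0, n_dec - 1)
    fronts = {}
    for j in range(n_dec):
        idx = np.where(bins == j)[0]
        front = []
        for i in idx:
            dominated = any(
                np.all(objectives[k] <= objectives[i]) and np.any(objectives[k] < objectives[i])
                for k in idx if k != i)
            if not dominated:
                front.append(int(i))
        if front:
            fronts[j] = front
    return fronts
```

Because the binning is applied only to already-collected data, refining or coarsening the decomposition never requires re-running the optimization, as noted above.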
4.3 Convergence judgment

In order to judge the convergence of an optimization process, the hypervolume indicator (HV) Zitzler (1999) is adopted. It refers to the volume in the objective space between the Pareto front and a fixed reference point, as shown in Fig. 1. Due to its monotonic characteristic, the larger the HV, the more accurate the Pareto front. Therefore, it is one of the most frequently used indicators for convergence and capability assessment of MO optimization methods. A general guideline for determining the reference point is to use a point slightly worse than the nadir point, which consists of the worst objective values over the Pareto front Auger et al. (2012).

In MCMO optimization, as described in Section 4.2, the condition space C is decomposed and a Pareto front is selected in each decomposed space. Likewise, the HV is defined in each decomposed space. Therefore, in this study, the convergence of an optimization process is judged by the average HV over all the decomposed spaces.
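For the bi-objective problems considered in this paper, the HV of one decomposed space can be computed exactly with a sweep over the sorted front (an illustrative sketch, ours; minimization convention):

```python
def hypervolume_2d(front, ref):
    """Exact 2-objective hypervolume (minimization) between a Pareto front
    and a fixed reference point: the area dominated by the front and bounded
    by ref, accumulated as a sum of disjoint rectangles."""
    pts = sorted(p for p in front if p[0] <= ref[0] and p[1] <= ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:          # dominated points are skipped after sorting by f1
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv
```

Averaging this quantity over all decomposed condition spaces gives the convergence indicator used in this study.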

5 Results and discussion

In this section, the proposed DRL-based MCMO optimization method is applied to two problems, and the results are analyzed. The first problem is the Kursawe test function newly modified into a MCMO optimization problem. The second problem is airfoil shape optimization, which is a representative shape optimization problem involving fluid dynamics.

5.1 Modified Kursawe test function

5.1.1 Problem setup

The modified Kursawe problem for MCMO optimization is defined as follows:

minimize F(x, θ) = R(θ) (f_1(x), f_2(x))^T,
with f_1(x) = Σ_{i=1}^{2} [−10 exp(−0.2 √(x_i² + x_{i+1}²))],
     f_2(x) = Σ_{i=1}^{3} [|x_i|^{0.8} + 5 sin(x_i³)],
subject to x_i ∈ [−5, 5],                                     (9)

where f_1 and f_2 are the objectives of the original Kursawe problem and R(θ) is the two-dimensional rotation matrix. For the modification, the condition variable θ is introduced for rotational transformation of the objective vector. If θ is set to 0, it reduces to the original Kursawe problem, which is a MO optimization problem. Extending the problem through rotational transformation has two advantages. First, the characteristics of the original problem can be preserved. As the original Kursawe problem has a discontinuous and nonconvex Pareto front, it has been actively adopted to evaluate the capability of MO optimization methods Lim et al. (2015); Tan et al. (2013a); Leung et al. (2014); Naranjani et al. (2017). Thus, with the preserved characteristics, the modified Kursawe problem can be a satisfactory benchmark problem for validating the developed MCMO optimization method. Second, real solutions can be readily obtained, which is crucial in designing a benchmark problem. The boundary shape of the feasible region in the objective space remains unchanged from the original Kursawe problem under the rotational transformation. Therefore, the real Pareto front corresponding to each θ can be easily obtained by judging dominance on the rotated boundary.
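The benchmark can be implemented in a few lines (the original Kursawe function is standard; the rotation convention applied to the objective vector is our illustrative reconstruction and may differ from the authors' exact definition):

```python
import numpy as np

def kursawe(x):
    """Original Kursawe test function: two objectives, three decision variables."""
    x = np.asarray(x, dtype=float)
    f1 = np.sum(-10.0 * np.exp(-0.2 * np.sqrt(x[:-1]**2 + x[1:]**2)))
    f2 = np.sum(np.abs(x)**0.8 + 5.0 * np.sin(x**3))
    return np.array([f1, f2])

def modified_kursawe(x, theta):
    """Condition-dependent variant: rotate the objective vector by theta.
    theta = 0 recovers the original problem; rotation preserves the shape
    of the feasible-region boundary in the objective space."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ kursawe(x)
```

Because rotation is an isometry, the feasible boundary is only turned, not deformed, which is what makes the real Pareto front recoverable at every θ.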

5.1.2 Optimization results

The modified Kursawe problem is solved as described in Algorithm 1. Fig. 2 shows the optimization process. As shown in Fig. 2(a), in the early episodes, data are widely scattered as the network is not developed enough. However, as the episodes progress, data are accumulated, and the network learns to find an optimal action for a given state. As a result, better solutions are obtained for newly given conditions and objectives, increasing the resolution of the Pareto front. Also, the clustered data near the Pareto front reinforce the learning again, forming a positive feedback loop. Finally, the Pareto front and the network converge, which can also be seen in terms of the HV as shown in Fig. 2(b).

Fig. 3 shows the optimization results at the converged episode. Through the optimization, a high-resolution Pareto front of solutions over the whole condition space is obtained. As shown in Fig. 3(a), it shows good agreement with the real Pareto front, including the highly nonlinear parts where the shape of the Pareto front drastically changes along θ. This shows the exclusive ability of the proposed MCMO optimization method. If several representative conditions are predetermined and optimization is performed at each condition, it is difficult to capture the nonlinear parts. On the other hand, the developed MCMO optimization is performed over the entire condition space, so that a high-resolution Pareto front can be found. Fig. 3(b) shows the optimization results in five decomposed condition spaces. Note that since the condition space C is equally decomposed into 100 spaces in the present study, each figure in Fig. 3(b) shows one decomposed condition space. Even in those decomposed spaces, the solutions match well with the real Pareto front.

5.1.3 Effectiveness of multi-condition optimization

In this section, a computational experiment based on the modified Kursawe problem is set up to analyze how effective multi-condition (MC) optimization is compared to single-condition (SC) optimization. The experiment is designed to compare the number of function evaluations required to reach the same quality of optimization. Since SC optimization cannot be performed over a condition space, equally distributed conditions are prescribed in the condition space for the comparison. To measure the quality of optimization, a reference HV is set at each prescribed condition.

Then, two cases are compared by the total number of function evaluations required to reach the same reference HV at all prescribed conditions. The first case is SC optimization performed independently at each prescribed condition. The SC optimization method can be easily derived by fixing the condition in the developed method. The second case is MC optimization modified to be conducted only at the prescribed conditions. Although the proposed method in this study is conducted over a whole condition space, it is modified in this experiment for a fair comparison.

The reference HV at each condition is determined as an average HV by performing SC optimization ten times, since the optimization process is stochastic due to the exploration of DRL. The HV of SC optimization at each condition shows convergence, and the obtained Pareto front matches well with the real Pareto front as shown in Fig. 4. This is quite comparable to other studies using the Kursawe test function for evaluating their optimization methods Lim et al. (2015); Tan et al. (2013a); Leung et al. (2014); Naranjani et al. (2017). In addition, when comparing the two cases, the average number of total function evaluations over ten runs is used for precise analysis.

Fig. 4 shows one example of the results with five prescribed conditions. As shown in Fig. 4(a), SC optimization is performed independently at each prescribed condition while MC optimization is performed simultaneously at the five prescribed conditions. As shown in the figure, the number of function evaluations at each condition is reduced in the MC optimization, resulting in a significant reduction of the total number of function evaluations: a total of 49811 function evaluations is required in the SC optimization, while only 28481 function evaluations are required to reach the same reference HV in the MC optimization. This reduction is attributed to the fact that the MC optimization learns the correlations between the conditions and the optimal solutions. By utilizing the correlations, it can effectively find the Pareto front with a small number of function evaluations. Fig. 4(b) shows the Pareto front obtained from the two cases. Because both cases satisfy the same reference HV, the Pareto front shows good agreement with the real Pareto front in both cases.

Fig. 5 shows the experiment results according to the number of prescribed conditions. In SC optimization, the number of function evaluations increases linearly with the number of conditions. This is a natural result because optimization is performed at each condition independently. However, in MC optimization, the increment gradually decreases, so that the difference between the two cases increases with the number of conditions. In particular, for the largest number of conditions tested, the number of function evaluations of MC optimization is only a fraction of that of SC optimization. Considering that the proposed method in the present study is conducted continuously over a whole condition space, it can be inferred that the reduction of the number of required function evaluations will be even greater than this result. Therefore, we can conclude that MC optimization is much more effective than SC optimization, and that this is enabled by learning the correlations between conditions and optimal solutions.

5.2 Airfoil shape optimization

5.2.1 Problem setup

In numerous engineering fields, the flow condition and aerodynamic requirement of an airfoil can vary depending on the situation. In this section, a MCMO airfoil shape optimization problem is defined reflecting practical applications. First, the lift coefficient, C_l, and the lift-to-drag ratio, C_l/C_d, are set as the objectives of optimization to be maximized. These objectives are crucial factors in designing an airfoil and are of great interest to many researchers Mukesh et al. (2014); Ribeiro et al. (2012); Zhang et al. (2019b); Huyse et al. (2002). Next, as a representative value of the flow condition, the chord Reynolds number, Re, is set as the condition variable of optimization.

In this study, an airfoil shape is parameterized using the Kármán-Trefftz transformation Milne-Thomson (1973). A Kármán-Trefftz airfoil is generated by transforming a circle in the ζ-plane to the physical z-plane. The circle in the ζ-plane, centered on ζ_0, is defined to pass through ζ = 1. Then, a complex variable ζ on the circle is transformed to z to generate an airfoil as follows:

z = n [(ζ + 1)^n + (ζ − 1)^n] / [(ζ + 1)^n − (ζ − 1)^n],  n = 2 − β/π,   (10)

where β is the trailing-edge angle of the generated airfoil. Since it can generate various and realistic airfoils, the transformation has been utilized in many studies Puorger et al. (2007); Berci et al. (2014). Along with the shape of the airfoil itself, the angle of attack, α, is an important factor that greatly influences the aerodynamic characteristics. Thus, when designing an airfoil, α relative to the flow direction has to be optimized to achieve optimal performance Huyse et al. (2002). In the present study, in addition to the circle center ζ_0 and the trailing-edge angle β, which determine a Kármán-Trefftz airfoil, α is included as a design variable of optimization.
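The airfoil generation step can be sketched as follows (an illustrative sketch, not the authors' exact parameterization; the normalization of the circle through ζ = 1 and the function names are our assumptions):

```python
import numpy as np

def kt_transform(zeta, beta_deg):
    """Karman-Trefftz transform with the circle passing through zeta = 1.
    The exponent n = 2 - beta/pi produces a finite trailing-edge angle beta."""
    n = 2.0 - np.deg2rad(beta_deg) / np.pi
    lam = ((zeta - 1.0) / (zeta + 1.0)) ** n   # ratio form avoids branch-cut issues
    return n * (1.0 + lam) / (1.0 - lam)

def karman_trefftz_airfoil(zeta0, beta_deg, n_pts=200):
    """Sample the circle centered at zeta0 passing through zeta = 1 and map it
    to the physical plane, returning the airfoil contour as complex points."""
    radius = abs(1.0 - zeta0)
    phi = np.linspace(0.0, 2.0 * np.pi, n_pts, endpoint=False)
    return kt_transform(zeta0 + radius * np.exp(1j * phi), beta_deg)
```

Setting β = 0 recovers the classical Joukowski map z = ζ + 1/ζ, and the trailing-edge point ζ = 1 always maps to z = n.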

The MCMO airfoil shape optimization problem is defined as follows:

minimize F(x, Re) = (−C_l, −k C_l/C_d)
subject to x = (ζ_x, ζ_y, β, α) ∈ X, Re ∈ C,                  (11)

where ζ_0 = ζ_x + iζ_y. Since the goal of optimization is to maximize C_l and C_l/C_d, minus signs are added, and a constant scale factor k is multiplied to C_l/C_d to match the scale between C_l and C_l/C_d. In order to evaluate C_l and C_d, XFOIL, which is an analysis tool for airfoils Drela (1989), is adopted in the present study. It is widely used for airfoil shape optimization due to its low computational cost Hansen (2018); Zhang et al. (2019a); Ram et al. (2019). The ranges of the design variables are set to generate various airfoil shapes excluding unrealistic shapes, and the range of Re is set to cover sufficiently wide applications Lissaman (1983).

5.2.2 Optimization results

The airfoil shape optimization problem is solved as described in Algorithm 1. Fig. 6 shows the results of the airfoil shape optimization. As shown in Fig. 6(a), the average HV is shown to converge. Fig. 6(b) shows the Pareto front at the converged episode. Overall, a Pareto front of sufficient resolution is successfully found within the defined condition space, indicating that the developed method can be applied to practical engineering applications. As shown in Fig. 6(b), the maximum C_l/C_d increases with Re while the maximum C_l remains relatively constant. In particular, along the line where C_l is maximized, its value does not change significantly over the condition space. In addition, two distinct features are observed in the Pareto front. When C_l is maximized, nonconvex parts are observed near certain Reynolds numbers. Next, when C_l/C_d is maximized, nonlinear parts are observed near other Reynolds numbers. These parts will be further discussed through analysis of the optimal solutions and the optimal airfoil shapes.

Fig. 7 shows the optimal solutions and the optimal airfoil shapes. As can be seen in Fig. 7(a), various values of the design parameters are obtained depending on the condition and the weight. Among the design parameters, the real part of the circle center determines the thickness of the airfoil: the smaller its absolute value, the thinner the generated airfoil. The imaginary part determines the camber of the airfoil: a value of zero indicates a symmetric airfoil, and the larger the value, the more upper-cambered the generated airfoil. The trailing-edge angle and the angle of attack are expressed in degrees.

As shown in Fig. 7(a), nonlinear features are observed where the optimal design parameters change dramatically with respect to the condition and the weight. These features are particularly noticeable for some of the parameters near specific conditions, and they correspond, respectively, to the aforementioned nonconvex and nonlinear parts observed in the Pareto front. Except for these nonlinear parts, clear overall trends are observed: as the weight of one objective increases, thin and less cambered airfoils with low angles of attack are generated, whereas as the weight of the other increases, thick and highly cambered airfoils with high angles of attack are generated. The trailing-edge angle shows relatively little variation and stays at its minimum value.

Fig. 7(b) shows the optimal airfoil shapes according to the condition and the weight. As mentioned above, as the weight approaches the lift-oriented extreme, airfoils with high camber and high angles of attack are generated to maximize the lift at all conditions. However, the angle of attack does not increase to its maximum, owing to the stall phenomenon caused by an excessively high angle of attack. On the contrary, as the weight approaches the opposite extreme, the reverse tendency is observed in order to limit the drag. Also, when the drag is considered, the thicknesses of the airfoils decrease, except where the aforementioned nonlinearity exists.

5.2.3 Aerodynamic performance analysis of optimal airfoil shapes

In this section, based on the previous optimization results, the need for MC optimization, which can be performed over a whole condition space, is confirmed. To show this need, an analysis is conducted on whether the optimal shapes at a few representative conditions can provide sufficient performance over the entire condition space. For the analysis, the optimal airfoil shapes that maximize one objective under a constraint on the other are selected. Many practical situations require optimizing one objective while keeping the others above a certain level; for example, in many aviation applications, aerodynamic efficiency is optimized while a certain level of lift is maintained to sustain the aircraft weight Huyse et al. (2002); Buckley et al. (2010); Nemec et al. (2004). The optimal solutions of the constrained optimization problem can be easily obtained from the Pareto front, as shown in Fig. 7(a).
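Selecting such constrained optima from an already-computed Pareto set reduces to a filter-and-maximize step, sketched below with generic objective indices (assuming maximization; names are illustrative):

```python
def constrained_best(front, min_value, constrained=0, target=1):
    """From a list of objective vectors, return the one that maximizes
    objective `target` subject to objective `constrained` >= min_value.
    Returns None when no point on the front is feasible."""
    feasible = [p for p in front if p[constrained] >= min_value]
    return max(feasible, key=lambda p: p[target]) if feasible else None
```

Because the Pareto front already contains every non-dominated trade-off, no new function evaluations are needed to answer such constrained queries.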

As shown by the black lines in Fig. 7(b), the optimal shapes satisfy the constraint for all conditions, and the maximized objective increases with the condition. Then, two optimal shapes obtained at different single conditions are used for the analysis. The red lines show the performance of the airfoil optimized at the first condition: compared to the optimal performance, both objectives decrease notably away from the optimized condition. In particular, the constrained quantity drops significantly at conditions slightly below the optimized one, so the constraint cannot be satisfied at all. In the same way, the blue lines show the performance of the airfoil optimized at the second condition: although it satisfies the constraint near the optimized condition, it also shows a substantial decrease at conditions slightly above the optimized one.

Fig. 7(c) shows the optimal airfoil shapes at various conditions. The optimal shape at one condition differs considerably from that at another, which results in the aforementioned difference in performance. However, some optimal shapes at different conditions are very similar even though, as mentioned before, there is a large difference in performance between them. Likewise, apart from one optimal shape, there is no noticeable difference among the other optimal shapes, while significant performance differences still exist. These results are attributed to the nonlinear characteristics of the flow: the optimal shape can change drastically with the condition, and even when no difference in shape is noticeable, a slight variation in shape can produce a large difference in performance.

This analysis shows that an optimal shape obtained at a specific condition may not be valid at nearby conditions, and that the effect can be more severe in problems with nonlinear characteristics. Therefore, it is inadequate to perform optimization by discretizing the condition space into several representative conditions. To overcome this problem and remain optimal under varying conditions, it is essential to consider the whole condition space through the MC optimization method proposed in the present study.

6 Concluding remarks

For the first time in the literature, an MCMO optimization method based on DRL has been developed to find the Pareto front over a prescribed condition space. The main idea is that DRL can learn a policy for finding optimal solutions according to varying conditions and objectives. The method has been applied to two MCMO optimization problems. First, as a benchmark problem, the Kursawe test function has been newly modified into an MCMO optimization problem. Second, an airfoil shape optimization problem has been addressed as a practical engineering application. The present MCMO optimization method shows an outstanding ability to find a high-resolution Pareto front within the entire condition space, including its nonlinear and nonconvex parts.
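For reference, the standard (unmodified) two-objective Kursawe test function is sketched below; the paper's condition-dependent modification is not reproduced here:

```python
import math

def kursawe(x):
    """Standard Kursawe test function (both objectives minimized),
    conventionally with three design variables in [-5, 5]."""
    f1 = sum(-10.0 * math.exp(-0.2 * math.hypot(x[i], x[i + 1]))
             for i in range(len(x) - 1))
    f2 = sum(abs(xi) ** 0.8 + 5.0 * math.sin(xi ** 3) for xi in x)
    return f1, f2
```

Its Pareto front is disconnected and partly nonconvex, which is why the function is a popular stress test for multi-objective optimizers.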

Two additional analyses have been conducted to demonstrate its exclusive capability. First, a computational experiment based on the modified Kursawe test function has been carried out to show the effectiveness of MC optimization: compared with multiple operations of SC optimization for multiple conditions, the number of function evaluations required to find the Pareto front is significantly reduced. This efficiency is enabled by learning the correlations between conditions and optimal solutions. Second, the necessity of MC optimization has been confirmed through an analysis of the aerodynamic performance of airfoils with optimally designed shapes: an optimal solution at a specific condition may not be valid at nearby conditions, resulting in significant deterioration of target performance. Thus, it is essential to cover the entire condition space, which is possible through the proposed MC optimization method.

The proposed method shows its strongest capability in optimization problems where conditions and objectives are not fixed. A representative example is shape optimization involving fluid mechanics, in which the operating conditions are generally given as a range and the objectives differ depending on the situation. However, the proposed method is not limited to shape optimization, and the dimensions of the conditions and objectives are not restricted; it can be applied to any MCMO optimization problem. Through the developed MCMO optimization method, it is expected that the range of fields to which optimization can be practically applied will be greatly expanded. Moreover, from a methodological point of view, this study paves the way to a new category of optimization as the first MCMO optimization method.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Acknowledgments

The work was supported by the National Research Foundation of Korea (NRF) under Grant Number NRF-2021R1A2C2092146 and by the Samsung Research Funding Center of Samsung Electronics under Project Number SRFC-TB1703-51.


  • R. M. Ajaj, C. S. Beaverstock, and M. I. Friswell (2016) Morphing aircraft: the need for a new design philosophy. Aerospace Science and Technology 49, pp. 154–166. External Links: ISSN 1270-9638 Cited by: §1.
  • A. Auger, J. Bader, D. Brockhoff, and E. Zitzler (2012) Hypervolume-based multiobjective optimization: Theoretical foundations and practical implications. Theoretical Computer Science 425, pp. 75–103. External Links: ISSN 0304-3975 Cited by: §4.3.
  • S. Barbarino, O. Bilgen, R. M. Ajaj, M. I. Friswell, and D. J. Inman (2011) A review of morphing aircraft. Journal of Intelligent Material Systems and Structures 22 (9), pp. 823–877. External Links: ISSN 1045-389X Cited by: §1.
  • R. Bellman (1966) Dynamic programming. Science 153 (3731), pp. 34–37. Cited by: §2.2.1.
  • M. Berci, V. V. Toropov, R. W. Hewson, and P. H. Gaskell (2014) Multidisciplinary multifidelity optimisation of a flexible wing aerofoil with reference to a small UAV. Structural Multidisciplinary Optimization 50 (4), pp. 683–699. External Links: ISSN 1615-147X Cited by: §5.2.1.
  • D. Bertetta, S. Brizzolara, S. Gaggero, M. Viviani, and L. Savio (2012) CPP propeller cavitation and noise optimization at different pitches with panel code and validation by cavitation tunnel measurements. Ocean Engineering 53, pp. 177–195. External Links: ISSN 0029-8018 Cited by: §1.
  • H. P. Buckley, B. Y. Zhou, and D. W. Zingg (2010) Airfoil optimization using practical aerodynamic design requirements. Journal of Aircraft 47 (5), pp. 1707–1719. Cited by: §5.2.3.
  • L. Buşoniu, T. de Bruin, D. Tolić, J. Kober, and I. Palunko (2018) Reinforcement learning for control: performance, stability, and deep approximators. Annual Reviews in Control 46, pp. 8–28. External Links: ISSN 1367-5788 Cited by: §2.2.1.
  • F. Chen, L. Liu, X. Lan, Q. Li, J. Leng, and Y. Liu (2017) The study on the morphing composite propeller for marine vehicle. part I: Design and numerical analysis. Composite Structures 168, pp. 746–757. External Links: ISSN 0263-8223 Cited by: §1.
  • E. K. Chong and S. H. Zak (2004) An introduction to optimization. John Wiley & Sons. Cited by: §1.
  • S. Chu, M. Xiao, L. Gao, Y. Zhang, and J. Zhang (2021) Robust topology optimization for fiber-reinforced composite structures under loading uncertainty. Computer Methods in Applied Mechanics and Engineering 384, pp. 113935. External Links: ISSN 0045-7825 Cited by: §1.
  • C.A. Coello Coello and M. Lechuga (2002) MOPSO: A proposal for multiple objective particle swarm optimization. Conference Proceedings In Proceedings of the 2002 Congress on Evolutionary Computation. CEC’02 (Cat. No.02TH8600), Vol. 2, pp. 1051–1056. Cited by: §1.
  • K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6 (2), pp. 182–197. External Links: ISSN 1941-0026 Cited by: §1.
  • C. G. Diaconu, P. M. Weaver, and F. Mattioni (2008) Concepts for morphing airfoil sections using bi-stable laminated composite structures. Thin-Walled Structures 46 (6), pp. 689–701. External Links: ISSN 0263-8231 Cited by: §1.
  • M. Drela (1989) XFOIL: An analysis and design system for low Reynolds number airfoils. Conference Proceedings In Low Reynolds Number Aerodynamics, pp. 1–12. External Links: ISBN 978-3-642-84010-4 Cited by: §5.2.1.
  • G. Droandi and G. Gibertini (2015) Aerodynamic blade design with multi-objective optimization for a tiltrotor aircraft. Aircraft Engineering and Aerospace Technology: An International Journal 87 (1), pp. 19–29. External Links: ISSN 0002-2667 Cited by: §1.
  • S. Fujimoto, H. van Hoof, and D. Meger (2018) Addressing function approximation error in actor-critic methods. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80, pp. 1587–1596. Cited by: §4.1.3.
  • N. Garg, G. K. W. Kenway, Z. Lyu, J. R. R. A. Martins, and Y. L. Young (2015) High-fidelity hydrodynamic shape optimization of a 3-D hydrofoil. Journal of Ship Research 59 (04), pp. 209–226. External Links: ISSN 0022-4502 Cited by: §1.
  • P. Garnier, J. Viquerat, J. Rabault, A. Larcher, A. Kuhnle, and E. Hachem (2021) A review on deep reinforcement learning for fluid mechanics. Computers & Fluids 225, pp. 104973. External Links: ISSN 0045-7930 Cited by: §2.2.1.
  • E. Gillebaart and R. De Breuker (2016) Low-fidelity 2D isogeometric aeroelastic analysis and optimization method with application to a morphing airfoil. Computer Methods in Applied Mechanics and Engineering 305, pp. 512–536. External Links: ISSN 0045-7825 Cited by: §1.
  • T. H. Hansen (2018) Airfoil optimization for wind turbine application. Wind Energy 21 (7), pp. 502–514. External Links: ISSN 1095-4244 Cited by: §5.2.1.
  • L. Huyse, S. L. Padula, R. M. Lewis, and W. Li (2002) Probabilistic approach to free-form airfoil shape optimization under uncertainty. AIAA Journal 40 (9), pp. 1764–1772. Cited by: §5.2.1, §5.2.3.
  • D. P. Kingma and J. Ba (2017) Adam: A method for stochastic optimization. External Links: 1412.6980 Cited by: §4.1.3.
  • V. R. Konda and J. N. Tsitsiklis (2000) Actor-critic algorithms. In Advances in Neural Information Processing Systems, pp. 1008–1014. Cited by: §4.1.3.
  • F. Kursawe (1991) A variant of evolution strategies for vector optimization. Conference Proceedings In Parallel Problem Solving from Nature, pp. 193–197. External Links: ISBN 978-3-540-70652-6 Cited by: §1.
  • X. Lachenal, S. Daynes, and P. M. Weaver (2013) Review of morphing concepts and materials for wind turbine blade applications. Wind Energy 16 (2), pp. 283–307. External Links: ISSN 1095-4244 Cited by: §1.
  • M. Leung, S. Ng, C. Cheung, and A. K. Lui (2014) A new strategy for finding good local guides in MOPSO. Conference Proceedings In 2014 IEEE Congress on Evolutionary Computation (CEC), pp. 1990–1997. External Links: ISBN 1941-0026 Cited by: §5.1.1, §5.1.3.
  • J. Li, M. Zhang, J. R. R. A. Martins, and C. Shu (2020) Efficient aerodynamic shape optimization with deep-learning-based geometric filtering. AIAA Journal 58 (10), pp. 4243–4259. External Links: ISSN 0001-1452 Cited by: §1.
  • R. Li, Y. Zhang, and H. Chen (2021) Learning the aerodynamic design of supercritical airfoils through deep reinforcement learning. AIAA Journal 59 (10), pp. 3988–4001. External Links: ISSN 0001-1452 Cited by: §1.
  • W. J. Lim, A. B. Jambek, and S. C. Neoh (2015) Kursawe and ZDT functions optimization using hybrid micro genetic algorithm (HMGA). Soft Computing 19 (12), pp. 3571–3580. External Links: ISSN 1433-7479 Cited by: §5.1.1, §5.1.3.
  • P. B. S. Lissaman (1983) Low-Reynolds-number airfoils. Annual Review of Fluid Mechanics 15 (1), pp. 223–239. External Links: ISSN 0066-4189 Cited by: §5.2.1.
  • A. L. Maas, A. Y. Hannun, and A. Y. Ng (2013) Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning, Vol. 30, pp. 3. Cited by: §4.1.3.
  • K. Miettinen and M. M. Mäkelä (2002) On scalarizing functions in multiobjective optimization. OR Spectrum 24 (2), pp. 193–213. External Links: ISSN 1436-6304 Cited by: §1.
  • K. Miettinen (2012) Nonlinear multiobjective optimization. Vol. 12, Springer Science & Business Media. Cited by: §2.1.2.
  • L. M. Milne-Thomson (1973) Theoretical aerodynamics. Courier Corporation. Cited by: §5.2.1.
  • V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller (2013) Playing atari with deep reinforcement learning. External Links: 1312.5602 Cited by: §2.2.1.
  • V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis (2015) Human-level control through deep reinforcement learning. Nature 518 (7540), pp. 529–533. External Links: ISSN 1476-4687 Cited by: §2.2.1.
  • B. Mohammadi and O. Pironneau (2004) Shape optimization in fluid mechanics. Annual Review of Fluid Mechanics 36 (1), pp. 255–279. External Links: ISSN 0066-4189 Cited by: §1.
  • R. Mukesh, K. Lingadurai, and U. Selvakumar (2014) Airfoil shape optimization using non-traditional optimization technique and its validation. Journal of King Saud University - Engineering Sciences 26 (2), pp. 191–197. External Links: ISSN 1018-3639 Cited by: §5.2.1.
  • Y. Naranjani, C. Hernández, F. Xiong, O. Schütze, and J. Sun (2017) A hybrid method of evolutionary algorithm and simple cell mapping for multi-objective optimization problems. International Journal of Dynamics and Control 5 (3), pp. 570–582. External Links: ISSN 2195-2698 Cited by: §5.1.1, §5.1.3.
  • M. Nemec, D. W. Zingg, and T. H. Pulliam (2004) Multipoint and multi-objective aerodynamic shape optimization. AIAA Journal 42 (6), pp. 1057–1065. External Links: ISSN 0001-1452 Cited by: §5.2.3.
  • J. Park, A. Sutradhar, J. J. Shah, and G. H. Paulino (2018) Design of complex bone internal structure using topology optimization with perimeter control. Computers in Biology and Medicine 94, pp. 74–84. External Links: ISSN 0010-4825 Cited by: §1.
  • S. Percival, D. Hendrix, and F. Noblesse (2001) Hydrodynamic optimization of ship hull forms. Applied Ocean Research 23 (6), pp. 337–355. External Links: ISSN 0141-1187 Cited by: §1.
  • D. Peri, M. Rossetti, and E. F. Campana (2001) Design optimization of ship hulls via CFD techniques. Journal of Ship Research 45 (02), pp. 140–149. External Links: ISSN 0022-4502 Cited by: §1.
  • P. Puorger, D. Dessi, and F. Mastroddi (2007) Preliminary design of an amphibious aircraft by the multidisciplinary design optimization approach. In 48th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, pp. 1924. Cited by: §5.2.1.
  • S. Qin, S. Wang, L. Wang, C. Wang, G. Sun, and Y. Zhong (2021) Multi-objective optimization of cascade blade profile based on reinforcement learning. Applied Sciences 11 (1), pp. 106. External Links: ISSN 2076-3417 Cited by: §1.
  • J. Rabault, F. Ren, W. Zhang, H. Tang, and H. Xu (2020) Deep reinforcement learning in fluid mechanics: a promising method for both active flow control and shape optimization. Journal of Hydrodynamics 32 (2), pp. 234–246. External Links: ISSN 1878-0342 Cited by: §1, §2.2.1.
  • K. R. Ram, S. P. Lal, and M. R. Ahmed (2019) Design and optimization of airfoils and a 20 kW wind turbine using multi-objective genetic algorithm and HARP_Opt code. Renewable Energy 144, pp. 56–67. External Links: ISSN 0960-1481 Cited by: §5.2.1.
  • A. F. P. Ribeiro, A. M. Awruch, and H. M. Gomes (2012) An airfoil optimization technique for wind turbines. Applied Mathematical Modelling 36 (10), pp. 4898–4907. External Links: ISSN 0307-904X Cited by: §5.2.1.
  • M. Sacher, M. Durand, É. Berrini, F. Hauville, R. Duvigneau, O. Le Maître, and J. Astolfi (2018) Flexible hydrofoil optimization for the 35th America’s Cup with constrained EGO method. Ocean Engineering 157, pp. 62–72. External Links: ISSN 0029-8018 Cited by: §1.
  • M. Secanell, A. Suleman, and P. Gamboa (2006) Design of a morphing airfoil using aerodynamic shape optimization. AIAA Journal 44 (7), pp. 1550–1562. External Links: ISSN 0001-1452 Cited by: §1, §1.
  • J. Semmler, L. Pflug, M. Stingl, and G. Leugering (2015) Shape optimization in electromagnetic applications. Book Section In New Trends in Shape Optimization, A. Pratelli and G. Leugering (Eds.), pp. 251–269. External Links: ISBN 978-3-319-17563-8 Cited by: §1.
  • N. Srinivas and K. Deb (1994) Muiltiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary Computation 2 (3), pp. 221–248. External Links: ISSN 1063-6560 Cited by: §1.
  • R. S. Sutton and A. G. Barto (2018) Reinforcement learning: An introduction. MIT Press. Cited by: §2.2.1.
  • C. J. Tan, C. P. Lim, and Y. N. Cheah (2013a) A modified micro genetic algorithm for undertaking multi-objective optimization problems. Journal of Intelligent & Fuzzy Systems 24, pp. 483–495. Cited by: §5.1.1, §5.1.3.
  • Y. Tan, Y. Jiao, H. Li, and X. Wang (2013b) MOEA/D+ uniform design: a new version of MOEA/D for optimization problems with many objectives. Computers & Operations Research 40 (6), pp. 1648–1660. External Links: ISSN 0305-0548 Cited by: §2.1.2.
  • D. Taylor and J. Dirks (2012) Shape optimization in exoskeletons and endoskeletons: a biomechanics analysis. Journal of The Royal Society Interface 9 (77), pp. 3480–3489. Cited by: §1.
  • K. Van Moffaert, M. M. Drugan, and A. Nowé (2013) Scalarized multi-objective reinforcement learning: Novel design techniques. Conference Proceedings In 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 191–199. External Links: ISBN 2325-1867 Cited by: §2.1.2.
  • S. Vasista, O. Mierheim, and M. Kintscher (2019) Morphing structures, applications of. Book Section In Encyclopedia of Continuum Mechanics, H. Altenbach and A. Öchsner (Eds.), pp. 1–13. External Links: ISBN 978-3-662-53605-6 Cited by: §1.
  • J. Viquerat, P. Meliga, and E. Hachem (2021a) A review on deep reinforcement learning for fluid mechanics: an update. External Links: 2107.12206 Cited by: §2.2.2.
  • J. Viquerat, J. Rabault, A. Kuhnle, H. Ghraieb, A. Larcher, and E. Hachem (2021b) Direct shape optimization through deep reinforcement learning. Journal of Computational Physics 428, pp. 110080. External Links: ISSN 0021-9991 Cited by: §1, §2.2.2.
  • K. Wang, S. Yu, Z. Wang, R. Feng, and T. Liu (2019) Adjoint-based airfoil optimization with adaptive isogeometric discontinuous Galerkin method. Computer Methods in Applied Mechanics and Engineering 344, pp. 602–625. External Links: ISSN 0045-7825 Cited by: §1.
  • W. Wang, Y. Li, M. K. Osman, S. Yuan, B. Zhang, and J. Liu (2020) Multi-condition optimization of cavitation performance on a double-suction centrifugal pump based on ANN and NSGA-II. Processes 8 (9), pp. 1124. External Links: ISSN 2227-9717 Cited by: §1.
  • W. Xudong, W. Z. Shen, W. J. Zhu, J. N. Sørensen, and C. Jin (2009) Shape optimization of wind turbine blades. Wind Energy 12 (8), pp. 781–803. External Links: ISSN 1095-4244 Cited by: §1.
  • X. Yan, J. Zhu, M. Kuang, and X. Wang (2019) Aerodynamic shape optimization using a novel optimizer based on machine learning techniques. Aerospace Science and Technology 86, pp. 826–835. External Links: ISSN 1270-9638 Cited by: §1.
  • S. Yun, Y. Ku, J. Rho, and D. Lee (2008) Application of function based design method to automobile aerodynamic shape optimization. Book Section In 12th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Multidisciplinary Analysis Optimization Conferences. Cited by: §1.
  • Q. Zhang and H. Li (2007) MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation 11 (6), pp. 712–731. External Links: ISSN 1941-0026 Cited by: §2.1.2.
  • S. Zhang, H. Li, and A. A. Abbasi (2019a) Design methodology using characteristic parameters control for low Reynolds number airfoils. Aerospace Science and Technology 86, pp. 143–152. External Links: ISSN 1270-9638 Cited by: §5.2.1.
  • S. Zhang, H. Li, W. Jia, and D. Xi (2019b) Multi-objective optimization design for airfoils with high lift-to-drag ratio based on geometric feature control. IOP Conference Series: Earth and Environmental Science 227, pp. 032014. External Links: ISSN 1755-1315 Cited by: §5.2.1.
  • X. Zhang, F. Xie, T. Ji, Z. Zhu, and Y. Zheng (2021) Multi-fidelity deep neural network surrogate model for aerodynamic shape optimization. Computer Methods in Applied Mechanics and Engineering 373, pp. 113485. External Links: ISSN 0045-7825 Cited by: §1.
  • E. Zitzler (1999) Evolutionary algorithms for multiobjective optimization: Methods and applications. Vol. 63, Citeseer. Cited by: §4.3.


  • A multi-condition multi-objective optimization method is developed based on deep reinforcement learning.

  • A novel benchmark problem for multi-condition multi-objective optimization is introduced.

  • The developed method is shown to efficiently find a high-resolution Pareto front over a condition space.

  • Learning the correlations between conditions and optimal solutions enables efficient optimization.

  • Critical degradation of target performance is confirmed when optimization is performed only at a specific condition.