I Introduction
Modern penetration missions against a target's air defense system rely heavily on the coordinated attack of multiple missiles; the rapid development of detection technologies and close-in weapon systems (CIWS) has reduced the chance of a successful impact by a single conventional missile [jeon2010homing]. Beyond increasing the difficulty of interception, the cooperative guidance strategy of multiple missiles is also crucial to the lethality of the final impact. Cooperative guidance of multiple missiles usually belongs to the terminal guidance phase, where accurate target information can be obtained with active radar systems or other detection devices. Existing cooperative guidance laws can be roughly divided into two categories. One is the analytical approach, which seeks closed-form solutions and is mainly based on sliding mode control, optimal control, and multi-agent consensus theory. The other is the intelligent approach, which generally adopts heuristic optimization algorithms and reinforcement learning (RL).
Analytical cooperative guidance methods have proved robust and efficient in practical applications [ma2013guidance, xiong2018hyperbolic, liang2020range, he2021computational, huang2019deep, ratnoo2008impact]. Building on fundamental proportional navigation (PN), Jeon et al. developed cooperative proportional navigation (CPN), in which each missile's onboard time-to-go is used as the navigation gain [jeon2010homing]; it is a simple but effective approach for achieving impact-time consensus. Ma developed a composite guidance law that can be decomposed into the direction along the line of sight (LOS) and the direction perpendicular to the LOS [ma2013guidance], corresponding to time and space cooperation, respectively. Furthermore, time-cooperative control is achieved under the guidance of a virtual leader in [chen2016impact], where an undirected topology is adopted to establish communication relationships. Based on the optimal control approach, a variant of the hyperbolic tangent function is proposed in [xiong2018hyperbolic] to enforce early control of velocity and impact angle.
However, with the increasing demand for high-precision weapon systems, intelligent cooperative guidance methods are increasingly regarded as a necessary complement. In recent years, reinforcement learning has attracted much attention because of its ability to learn online from environmental feedback [gaudet2020reinforcement, liang2019learning, hu2021application, kong2020maneuver]. According to their training structures, existing reinforcement learning algorithms for multi-agent systems can be roughly divided into four types: fully decentralized training with decentralized execution; fully centralized training with decentralized execution; centralized training with centralized execution; and value decomposition methods. Some of these algorithms have achieved satisfactory results on problems with low complexity and modest accuracy requirements. In [liang2020range], [he2021computational], and [Liangchen2021Metalearning], state-of-the-art reinforcement learning frameworks have demonstrated their effectiveness in guidance tasks. Zhang et al. proposed a gradient-descent-based reinforcement learning method in the actor-critic framework and achieved consensus control for multi-agent systems following a tracking leader [zhang2016data]. However, the twin challenges of non-stationarity and partial observability [nguyen2020deep] can lead to saturated outputs or loss of coordination in multi-agent systems, which greatly reduces the accuracy of the value function. In addition, value functions are poorly suited to continuous control tasks with large search spaces. These limitations impede the application of reinforcement learning to cooperative guidance.
A promising way to address these problems is to remove the value function of reinforcement learning and optimize in the solution space with an evolutionary strategy (ES), which is more robust and less sensitive to real-time rewards because it optimizes toward the objective function directly [brockhoff2010mirrored]. Moreover, as described in [salimans2017evolution], ES is tolerant of long horizons and implicit solutions, which is exactly what cooperative guidance requires. The natural evolutionary strategy (NES) is the latest branch of ES and shows good performance on high-dimensional continuous multimodal optimization problems by using natural gradient information estimated from the fitness expectation of the population [brockhoff2010mirrored, wierstra2014natural, salimans2017evolution]. Related algorithms, named coevolutionary algorithms, are discussed in [xu2017environment] and [qu2013improved]; they focus on solving multi-objective optimization problems by dividing the overall objective into sub-objectives that are optimized and evaluated together. Another idea is to evolve multiple populations toward the same goal while manually regulating the constraints of each population for faster convergence or fuller exploration [qu2013improved, yamanLimitedEvaluationCooperative2018]. As presented in [qu2013improved], the concept of coevolution there refers to multiple threads of training processes. Note that these methods neither use the natural gradient information of NES nor address the non-stationarity issue discussed above.

When optimizing in a continuous parameter (solution) space, adaptive techniques are important. Nomura presented a learning rate adaptation method based on the quality of the gradients, which is often not easy to estimate [nomura2022towards]. Instead, Fukushima leveraged the shifting distance of parameters to adapt the learning rate [fukushima2011proposal]. In [wang2022instance], the population size is adjusted according to novelty and quantity metrics that reflect the complexity of the dynamic environment. The estimation of distribution algorithm (EDA) has been applied to continuous control by searching for the optimal parameter distribution [larranaga2001estimation, karshenas2013regularized]. A variety of evolutionary methods with random walk strategies have been investigated for the optimal missile guidance handover problem [wang2019gaussian]. Maheswaranathan proposed a surrogate gradient to reduce evaluation costs [maheswaranathan2019guided]. These works reveal the enormous potential of searching in parameter space rather than directly in the action space.
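To make the NES idea above concrete, the following minimal sketch applies a plain NES update to a hypothetical smooth fitness (the quadratic objective, the hyperparameter values, and all names here are illustrative, not the paper's):

```python
import numpy as np

def nes_step(theta, fitness, rng, lr=0.05, sigma=0.1, m=100):
    """One natural-evolution-strategy step: sample Gaussian perturbations,
    estimate the search gradient from the population's fitness values,
    then ascend along that estimate."""
    eps = rng.standard_normal((m, theta.size))            # perturbations
    f = np.array([fitness(theta + sigma * e) for e in eps])
    f = (f - f.mean()) / (f.std() + 1e-8)                 # normalize fitness
    grad = (f[:, None] * eps).sum(axis=0) / (m * sigma)   # score-function estimate
    return theta + lr * grad

# Hypothetical fitness with its optimum at (3, 3)
fitness = lambda x: -np.sum((x - 3.0) ** 2)
rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(300):
    theta = nes_step(theta, fitness, rng)
```

Note that only fitness evaluations are needed, never analytic gradients of the objective, which is what makes the approach tolerant of long horizons and non-differentiable rewards.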
Therefore, an NES-based coevolutionary algorithm, named the natural coevolutionary strategy (NCES), is developed in this paper to address the dilemma faced by RL in the cooperative guidance task. Taking advantage of searching in parameter space, the coevolutionary algorithm is improved in this work by rescaling the gradient information to reduce the estimation bias introduced by neighboring populations. As noted in [del2019bio], most of today's bio-inspired algorithm innovations rest on experimental observation rather than meticulous theoretical support; in this work, we instead try to dig into the depths of the underlying optimization and provide as rigorous an argument as possible through graphical presentation and deduction. By integrating the NCES algorithm, a hybrid coevolutionary cooperative guidance law (HCCGL) is further developed to solve the challenging missile guidance problem. Extensive empirical results on various engagement scenarios verify the effectiveness of the proposed guidance law. The main contributions of this work are summarized as follows:


To address the issues of non-stationarity and continuous control faced by cooperative guidance, an NCES algorithm is formulated and incorporated into a novel guidance law as an alternative to RL in the cooperative guidance task.

The rigorous constraints of time and space consensus in cooperative guidance are integrated and designed as the fitness function for each missile. An MLP-based policy network is constructed and trained to optimize the fitness function.

The proposed HCCGL achieves high precision in cooperative guidance tasks, even with a dynamic target and random initial conditions.
The rest of the paper is organized as follows. The problem is formulated in sec:problemFormulation, and the proposed cooperative guidance law is discussed in sec:naturalCoevolutionaryStrategy. In sec:simulations, experiments under various configurations are presented. Finally, conclusions are drawn in sec:conclusions.
II Problem Formulation
II-A Engagement geometry
The two-dimensional engagement geometry between multiple missiles and one target is shown in fig:engagement1, where the inertial coordinate frame represents the horizontal plane. There are missiles in total. The index denotes the missile, and represents the target. , and represent the velocity, line-of-sight (LOS) angle, flight-path angle, and heading angle of the missile, respectively. and represent the lateral acceleration and the thrust acceleration to be designed for the missile, which are perpendicular to and aligned with the direction of , respectively. , and are the velocity, LOS angle, flight-path angle, and heading angle of the target, respectively. The lateral acceleration of the target is denoted by .
The dynamic equations of the missile and the target are as follows:
(1) 
where represents the relative range between the missile and the target. The time-to-go of the missile refers to the time remaining from the current time until interception:
(2) 
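The exact form of Eq. (2) is not reproduced here; a common estimate, shown below as a hedged sketch, is remaining range divided by closing speed (names and the closing-speed assumption are our own):

```python
def time_to_go(r, r_dot):
    """Hypothetical time-to-go estimate: remaining range over closing
    speed. Assumes the missile is closing on the target (r_dot < 0)."""
    if r_dot >= 0:
        raise ValueError("missile is not closing on the target")
    return r / (-r_dot)

# Example: 10 km range, closing at 800 m/s
t_go = time_to_go(10_000.0, -800.0)
```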
II-B Communication Topology
The communication relationship among the multiple missiles is depicted by a topology, where a set of nodes represents the missiles. The communications are represented by a set of edges with an adjacency matrix , where if missile is able to communicate directly with missile , and otherwise. is the set of neighboring missiles of the missile. In practical engineering, the communication topology is determined by jointly considering communication cost and actual demand. In this work, the undirected topology shown in fig:topologies is adopted, where neighboring missiles share information with each other.
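As a small illustration of the adjacency-matrix bookkeeping, the sketch below builds an assumed undirected chain topology for four missiles (the actual topology in fig:topologies may differ):

```python
import numpy as np

# Assumed undirected chain: each missile talks to its immediate neighbors.
edges = [(0, 1), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1          # symmetric: communication is two-way

def neighbors(A, i):
    """Index set N_i of the missiles that missile i talks to directly."""
    return set(np.flatnonzero(A[i]))
```

Symmetry of A encodes the undirected assumption; a directed topology would drop the mirrored assignment.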
II-C Observation
For the multi-missile system, the complete observation information of the entire system is not available to each agent. Thus, the cooperative guidance problem is a partially observable Markov decision process (POMDP) described by
(3) 
where and represent the observation and action of the missile, and is the observation of the missile at the next time step.
The full state information of each missile consists of three components: personal features, target features, and error features, shown in Table I. and represent the positions of the missile and the target in two-dimensional coordinates. The target features are estimated or detected through onboard equipment, and the estimation error is assumed to be negligible compared with the required guidance precision. Acquiring accurate absolute location information would require a powerful global positioning system; here, only relative error information is needed. is the consensus error of impact time of the missile :
(4) 
The consensus error of LOS angle of the missile is defined as:
(5) 
where, is the LOS angle error of the missile , and is the desired impact angle of missile :
(6) 
where is the desired relative impact angle between two missiles, and is the nominal desired impact angle of the first missile, which is determined online. To increase the flexibility and autonomy of the intelligent missile system, the desired value can be adjusted adaptively instead of being fixed.
Features  Symbols 
Personal Features  
Target Features  
Error Features  
II-D Fitness evaluation
The reward of each missile at one evaluation step consists of a terminal reward and a flight reward. The objective of the cooperative guidance task is to minimize the errors , , and . The terminal reward is defined as:
(7) 
where, , , , are constant coefficients. is the step function defined as
(8) 
Thus, the terminal reward only reflects the results at the terminal step, and if and only if and . The flight reward is defined as:
(9) 
where , , and are positive constant coefficients. It can be inferred that always holds, and if and only if and . The fitness function of missile for the cooperative guidance task can then be defined as
(10) 
Thus, the objective of the cooperative guidance task can be achieved by maximizing the fitness function for each missile.
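The terminal-plus-flight reward structure above can be sketched as follows; all coefficients, tolerances, and function names here are illustrative stand-ins, since the paper's symbols for Eqs. (7)-(10) are not reproduced:

```python
def step_fn(x, tol):
    """Indicator used by the terminal reward (stand-in for the step
    function of Eq. (8)): 1 when the error is within tolerance."""
    return 1.0 if abs(x) <= tol else 0.0

def terminal_reward(zem, e_time, e_angle,
                    k1=4000.0, k2=2000.0, k3=1.0, tol_t=0.1, tol_q=1.0):
    """Terminal reward sketch: large bonuses only when the consensus
    errors vanish at the final step, penalized by the miss distance."""
    return k1 * step_fn(e_time, tol_t) + k2 * step_fn(e_angle, tol_q) - k3 * zem

def flight_reward(e_time, e_angle, c1=1.0, c2=0.2):
    """Per-step shaping reward: always non-positive, zero only at consensus."""
    return -(c1 * abs(e_time) + c2 * abs(e_angle))

def fitness(errors, zem):
    """Fitness sketch (Eq. (10)): terminal reward plus accumulated flight
    reward. `errors` is a list of per-step (time, angle) error pairs."""
    e_t, e_q = errors[-1]
    return terminal_reward(zem, e_t, e_q) + sum(flight_reward(a, b) for a, b in errors)
```

The shaping term gives a dense learning signal during flight, while the sparse terminal bonuses dominate only when both consensus constraints are actually met.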
II-E Design of the cooperative guidance law
Based on the requirements of the cooperative guidance task, the guidance law proposed in this paper includes two parts: a tracking control part and a consensus control part. The tracking control part is obtained by proportional navigation guidance (PNG):
(11) 
where is the navigation constant. Note that the tracking control part contributes only to the lateral acceleration.
The consensus control part is modeled by a neural network expressed as
(12)  
where , , and denote the weight matrices of the network layers, and are the outputs of the first and second hidden layers, and and are the numbers of neurons in each layer. is the bounded output activation function with , and is the common hidden-layer activation function . The input state is chosen as:
(13) 
Thus, the guidance law of the missile is presented as:
(14) 
where is the guidance gain trading off the tracking control part against the consensus control part.
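The composition of the two parts can be sketched as below. The additive blend and the channel assignment are assumptions for illustration (the exact composition of Eq. (14) may differ); the constant values are taken from the hyperparameter table:

```python
N_GAIN = 4.0   # navigation constant (hyperparameter table)
K = 0.3        # guidance gain trading off tracking vs. consensus

def png_lateral(V, los_rate):
    """Proportional navigation: lateral command proportional to the
    velocity and the LOS rate."""
    return N_GAIN * V * los_rate

def guidance_command(V, los_rate, nn_lateral, nn_thrust):
    """Sketch of the composed law: PN supplies tracking in the lateral
    channel, the network's consensus command is blended in with gain K,
    and the thrust channel comes from the consensus controller alone."""
    a_lateral = png_lateral(V, los_rate) + K * nn_lateral
    a_thrust = K * nn_thrust
    return a_lateral, a_thrust
```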
III Natural Coevolutionary Strategy
III-A Natural evolutionary strategy in multi-agent POMDP
In the evolutionary strategy, each agent (or its policy) is expressed as a population; the group of populations and the environment constitute the ecosystem. The objective is to develop the optimal strategy for the group of populations that maximizes the fitness of the ecosystem. For cooperative tasks, the optimal strategy of the ecosystem is exactly the optimal policy for each population:
(15) 
where , defined in eqn:composedcommand, represents the policy of the th population, and is the joint matrix of individual optimal policies. is the corresponding fitness function and is the joint policy fitness function; more details can be found in [sonQTRANLearningFactorize2019]. However, the converse is not true:
(16) 
This is because the optimal fitness obtained by one population may rest on suboptimal fitness obtained by other populations; when the other populations evolve, the previous optimum is easily broken. To overcome this non-stationarity issue, it is best for all populations to evolve simultaneously, that is, to coevolve. In each generation, all populations update their parameters at the same time instead of sequentially, resulting in only slight variance in fitness values.
III-B Optimization in coevolutionary parameter space
The gradient information is obtained by measuring the contribution of each sample. The parameters of the population are denoted by , and represents those of the next generation. is the distribution function of under , where is the intrinsic parameter. The expected fitness of the next generation is then expressed as:
(17) 
The derivative of Eq. (17) with respect to is
(18) 
If we write as , a similar equation follows:
(19) 
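The step from (17) to (18) relies on the standard log-likelihood (score-function) trick. Under the notation above, with $p(\theta_i' \mid \theta_i)$ denoting the sampling distribution and $F$ the fitness, it reads:

```latex
\nabla_{\theta_i}\, \mathbb{E}_{\theta_i' \sim p(\cdot \mid \theta_i)}\!\left[ F(\theta_i') \right]
  = \mathbb{E}_{\theta_i' \sim p(\cdot \mid \theta_i)}\!\left[ F(\theta_i')\, \nabla_{\theta_i} \log p(\theta_i' \mid \theta_i) \right]
```

It holds because $\nabla p = p \,\nabla \log p$, so the gradient can be pushed inside the expectation and estimated purely from sampled fitness values.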
In an ecosystem with multiple populations, the populations interact and affect each other's evolutionary processes. Thus, the fitness function of the th population is represented by , where represents the parameter set of the th population and its neighboring populations. The expected joint fitness of the next generation is expressed as:
(20) 
where is the joint probability distribution of the next generation over . Assuming that and are sampled independently, we have . The gradient of the joint fitness with respect to is expressed as
(21)  
Note that this has the same form as the single-population version, so it may seem fine to simply keep the original equation: the influence of is counteracted when its expectation is taken. In practice, however, the expectation of the joint distribution is approximated by sampling with a limited size. Although individuals are sampled without bias (unbiased estimation), inadequate sampling introduces intrinsic bias, and this bias grows with the dimensionality of the distribution. It therefore becomes a serious issue when the expectation is taken over all neighboring parameters while the sample size stays relatively small.
In fact, it is not necessary to account for all parameters, since only the expectation of is actually needed. To alleviate this growing bias, we propose to approximate only the expectation over the parameters of the current population and ignore those of its neighbors, which is
(22) 
Though is available for independent distributions, it is infeasible to obtain directly, since all agents are sampled and evaluated together. However, the expectation of the individual fitness can be approximated by multiplying the original fitness by its confidence. The rectified expectation is expressed as
(23) 
where is the confidence and represents the samples that appear along with . In this way, the bias from estimating the expectation over the neighboring distributions is addressed. The modified gradient is
(24) 
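A sketch of this rescaled estimator in code (the density-product confidence follows the idea above, but the normalization, shapes, and names are our own assumptions):

```python
import numpy as np

def rescaled_gradient(eps_i, fitnesses, eps_neighbors, sigma=0.2):
    """Rescaled gradient sketch: each sample's fitness is weighted by the
    confidence that its neighbors' perturbations occur, taken here as the
    product of their Gaussian densities under N(0, sigma^2 I)."""
    m, _ = eps_i.shape
    log_conf = np.zeros(m)
    for eps_c in eps_neighbors:                       # one array per neighbor
        log_conf += -0.5 * np.sum(eps_c ** 2, axis=1) / sigma ** 2
    conf = np.exp(log_conf - log_conf.max())          # scaled for stability
    weighted = fitnesses * conf                       # rectified fitness
    weighted = (weighted - weighted.mean()) / (weighted.std() + 1e-8)
    return (weighted[:, None] * eps_i).sum(axis=0) / (m * sigma)
```

With no neighbor perturbations the confidence weights are uniform and the estimator reduces to the plain single-population NES gradient, which matches the claim that the rescaling keeps the bias at the single-population level.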
The core idea is that although the individual fitness is not directly observable, its expectation exists and is invariant to the parameter distributions of the neighboring agents, so the expectation of the individual agent's fitness should be computed instead of taking the expectation over the neighboring agents. Let us denote the expectation of the objective function over , which is , by and the expectation of the objective function over by , such that
(25) 
To visualize the sampling estimation process, we use a variant of the Eggholder function as the demonstration objective, defined in eqn:eggholder, since the real objective is too expensive to obtain.
(26)  
Assume there exists one neighboring population for , with a sample size of 400; the sampled individuals, shown in fig:estgrad3d, follow a bivariate normal distribution, with the parameter space confined to . Since and are sampled independently, the individuals can be considered to be sampled from only, represented by the sample points in fig:estgrad2d. In this objective graph with a single-dimensional parameter space, the real objective curve, drawn as a solid line, is obtained by eqn:estoversingleparameter. To standardize the scope, all sampled data, including the real objective values, are uniformly scaled to the range [0, 1]; such standardization does not affect the direction of the estimated gradient. The original objective value of each sample varies as the corresponding changes, which introduces additional estimation bias. As shown by the blue dots in fig:estgrad2d, the distribution of the objective values before rescaling differs significantly from the distribution of the true objective values. From fig:estgrad3d it can be seen that as sample points deviate from the distribution center, their probability of being sampled decreases, meaning that the accuracy, or confidence, of each sample's fitness decreases with . If the original objective is rescaled by its confidence , the probability of the appearance of the given the existence of , the reconstructed objective values, represented by the green square dots in fig:estgrad2d, are closer to the real , which clearly reduces the estimation bias. The above analysis indicates that with a limited population size and a large number of neighboring populations, applying the rescaled gradient keeps the approximation bias at the level of a single population, yielding a more accurate estimate of the gradient; empirical results also support this conclusion. However, when the population size is large enough (e.g., thousands), this approach may not yield additional accuracy improvements.
The modified expression is also well suited to parallel computing: only the perturbations of the neighboring populations are needed, which can easily be obtained through communication among processes, and the probabilities can be computed in a distributed manner.
III-C Elitist adaptation techniques
The performance of NES is sensitive to hyperparameters, and the learning rate is usually the most critical one. Thus, an elitist adaptation method for the learning rate is applied in this paper. First, a list of learning rates is linearly sampled in the neighborhood of the original learning rate:
(27) 
where . and are the minimum and maximum values of . is the size of the perturbation set, which is clipped by . To evaluate the quality of the candidate learning rates, the evaluation function is defined as:
(28) 
where is the th sampled learning rate in the candidate list. The gradient is kept fixed during evaluation. Therefore, by comparing the candidate learning rates with the original one, the next update can be made no worse than the previous one. To keep the populations evolving in step (peer pressure), each missile is assigned the same learning rate. The learning rate of the next generation is obtained by
(29) 
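The elitist selection over candidate rates can be sketched as follows; the candidate span, bounds, and function names are illustrative assumptions:

```python
import numpy as np

def adapt_learning_rate(theta, grad, evaluate, lr,
                        n_cand=5, lr_min=1e-3, lr_max=1.0):
    """Elitist learning-rate adaptation sketch: linearly sample candidate
    rates around the current one, apply each to the same kept gradient,
    and return the rate whose updated parameters evaluate best.
    `evaluate` maps a parameter vector to a scalar fitness."""
    span = 0.5 * lr
    cands = np.clip(np.linspace(lr - span, lr + span, n_cand), lr_min, lr_max)
    scores = [evaluate(theta + a * grad) for a in cands]
    return float(cands[int(np.argmax(scores))])
```

Because every candidate is scored against the same gradient, the extra cost is a handful of evaluations per adaptation cycle rather than a full re-sampling of the population.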
A similar approach is employed to obtain the optimal during the training process.
(30) 
where, is uniformly sampled from the region . is the fitness function of sampled LOS angle that is defined as
(31) 
where is the joint initial individual parameters. In this way, the desired impact angles are established automatically.
A rank-based fitness shaping method in the same spirit as the one proposed in [wierstra2014natural] is employed to shape the raw fitness. For convenience, we still let denote the fitness function after shaping. Another technique, mirrored sampling [brockhoff2010mirrored], is also applied when sampling parameter perturbations.
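Both techniques are small, self-contained utilities; the sketch below shows one common way to implement them (the [-0.5, 0.5] rank mapping is an assumption in the spirit of [wierstra2014natural], not necessarily the paper's exact shaping):

```python
import numpy as np

def mirrored_samples(m, d, rng):
    """Mirrored sampling: draw m/2 perturbations and append their
    negatives, cancelling estimator noise from asymmetric draws."""
    half = rng.standard_normal((m // 2, d))
    return np.concatenate([half, -half], axis=0)

def rank_shape(fitnesses):
    """Rank-based fitness shaping: replace raw fitness values with ranks
    mapped uniformly onto [-0.5, 0.5], making the gradient estimate
    invariant to monotone transformations of the fitness."""
    m = len(fitnesses)
    ranks = np.empty(m)
    ranks[np.argsort(fitnesses)] = np.arange(m)
    return ranks / (m - 1) - 0.5
```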
IV Hybrid coevolutionary cooperative guidance algorithm
To achieve coordinated attack, the natural coevolutionary strategy is applied to optimize the parameter matrices of the neural network controller.
A univariate Gaussian distribution with zero mean and standard deviation $\sigma$ is used to sample perturbations. According to eqn:gradient5, it can be obtained that:

$$g_{\theta_i} = \mathbb{E}_{\epsilon_i \sim N(0,\,\sigma^2)}\Big[\nabla_{\theta_i} \log p(\theta_i')\, F_i(\varsigma_i') \prod_{c \in N_i} p(\theta_c')\Big] = \frac{1}{m\sigma^2} \sum_{j=1}^{m} F_i(\varsigma_i^{j})\, \epsilon_i^{j} \prod_{c \in N_i} p(\epsilon_c^{j}).$$
The complete implementation of the proposed guidance law is shown in Algorithm 1. The conceptual diagram in fig:HCCGL illustrates the parallel simulation process. A master-slave (or fully distributed) model [gong2015distributed][mendiburu2005parallel] is used for large-scale parallel computation. In this scheme, each population is evaluated in a separate process, and the results of the ecosystem are aggregated to calculate the rescaled gradient eqn:thisgradient, which is then used to produce the guided generations.
V Simulations and analysis
To verify the validity of the proposed method, a variety of simulations based on the cooperative guidance framework are designed. Both cases with stationary target and maneuvering target are simulated. Further, comparison experiments are performed to fully demonstrate the superiority of the proposed guidance method.
V-A Parameter setup
The acceleration constraint and velocity constraint of the missiles are listed in tab:ExpSetup. The hyperparameters of the algorithm are listed in tab:HyperParameters.
Frame-skip has been extensively employed in continuous control problems [salimans2017evolution]. In this work, the frame-skip parameter is set to 12 for Cases 1 and 2, and 40 for Case 3. Appropriate adjustment of this parameter facilitates the training process without affecting the final results.
Parameter  Value 
maximum lateral overload (g) ,  50 
maximum thrust overload (g),  5 
Upper bound of velocity (m/s),  900 
Lower bound of velocity (m/s),  350 
Parameter  Value 
simulation step (ms),  5 
guidance gain,  0.3 
Initial learning rate,  0.015 
standard deviation for sampling population,  0.2 
size of learning rate adaptation,  20 
size of population, m  140 
adaptation cycle,  50 
navigation constant,  4 
1  
0.2  
10  
1  
4000  
2000  
10  
2 
V-B Case 1: Comparison Experiments
In this section, the proposed guidance law is compared with the time and space cooperative guidance law (TASCGL) proposed in [lyu2019multiple], which considers space and time cooperative guidance under a distributed communication topology. Unlike the method proposed in this work, however, the compared method is sensitive to the initial conditions. The initial conditions of the compared work are adopted here, as shown in tab:InitialConditionofCase1. Four missiles are engaged in the cooperative scenario with different desired relative impact angles of , and , respectively. The target is located at (9500, 9000) m.
Missile  Position (m)  Flightpath  Velocity 
Angle ()  (m/s)  
(1900, 17000)  25  700  
(1500, 13000)  0  650  
(1400, 4000)  5  700  
(3000, 1300)  10  680 
fig:trajectoryCase1 shows the trajectories under the two guidance laws. As depicted in the figure, the trajectories of TASCGL are twisted in the initial stage, as the missiles try to reach consensus on their LOS angles and velocities. In comparison, the trajectories of the proposed HCCGL show better damping performance, without sharp turns or oscillation.
It can be seen from tab:ResultForCase1 that the zero-effort miss (ZEM) and the consensus angle error of both guidance laws reach competitive final accuracy. However, the consensus time error of TASCGL reaches up to 5 seconds, compared with less than 0.1 seconds for the proposed method. Further analysis of the velocity curves shows that under TASCGL the velocities are prevented from reaching their ideal values by the velocity bounds, which its design does not consider, leading to desynchronization of the impact time. The profiles of the two methods are shown in fig:consensusAngleError and fig:timeToGo; it can be observed that the flight times of all missiles under HCCGL tend to become identical. For HCCGL, the decomposition of the acceleration commands is shown in fig:decoCase1. The left panel shows the decomposition of the lateral accelerations, in which the solid lines represent the commands from the tracking controller and the dashed lines represent the commands from the consensus controller before weighting. Since the tracking part is derived from proportional navigation, the vertical acceleration shown in the right panel comes entirely from the consensus controller. The two parts of the acceleration have similar trends but do not coincide, demonstrating the effectiveness of the consensus controller, which is trained with the improved coevolutionary strategy.
The results reveal that the proposed guidance law outperforms the compared method, with higher consensus precision and smoother trajectories. Moreover, whereas the traditional guidance law is usually constrained by boundary conditions and demands superb missile maneuverability, the proposed guidance law is more resilient to limited conditions and more aware of the time-varying states of the collaborating missiles.
Algorithm  Index  
TASCGL  (s)  5.54  3.23  4.02  4.75 
()  9.83E3  1.58E3  1.80E1  1.91E1  
ZEM(m)  2.24E7  5.06E7  7.00E2  3.00E4  
HCCGL  (s)  1.00E2  1.00E2  5.00E2  5.00E2 
()  1.79E2  9.67E2  9.20E2  4.69E3  
ZEM(m)  4.19E5  7.66E6  3.11E4  4.09E5 
V-C Case 2: Non-stationary target
In this part, an engagement scenario with a non-stationary target is designed and simulated to verify the effectiveness of the proposed method against an unknown dynamic target. The target maneuvers with lateral acceleration , its velocity fixed at , and its initial flight-path angle . The other initial conditions are the same as in Case 1. The simulated trajectories and results can be seen in fig:trajCase2 and tab:ResultForCase2.
From tab:ResultForCase2 we can see that the consensus angle error is within one degree, which satisfies the accuracy requirement, and a salvo attack is achieved with negligible consensus time error. The results demonstrate the effectiveness of the proposed guidance method in intercepting a dynamic target. To the best of the authors' knowledge, this is the first time cooperative guidance against a non-stationary target has been achieved with intelligent control, demonstrating extraordinary robustness against disturbances from non-stationary objectives.
Index  
()  3.43E1  6.53E2  1.23E1  1.49E1 
(s)  6.00E2  1.85E2  2.10E1  4.55E1 
ZEM(m)  1.83E3  6.38E3  2.49E2  9.28E1 
V-D Case 3: Monte Carlo simulation
Monte Carlo simulation is extensively employed to examine the robustness of an algorithm under varying initial conditions, and it is applied in this section. In the existing literature, the target is usually regarded as stationary, since the interception of a stationary target is freer of unpredictable disturbances. In this case, five missiles are engaged, and each missile's position is randomly sampled from a uniform distribution, denoted by . Specifically, for the missile, the x-coordinate of its position is and the y-coordinate is , which arranges the missiles in an orderly manner. The initial flight-path angles of all missiles are set to , with identical velocities of 600 m/s and the same desired relative impact angles of . The target's position is (10000 m, 9000 m). Simulations with randomly sampled conditions are conducted for 200 episodes. The diverse trajectories are depicted in fig:montecarloCase3, and the statistical results, after taking absolute values, are shown in tab:ResultForCase3. From the results, we can see that the mean errors of the impact angles are within , and the consensus error of the impact time stays within most of the time. The results show that for any initial state with bounded error, the proposed scheme can always find a near-optimal solution.
Index  

Mean  4.50E1  8.20E1  7.10E1  2.20E1  9.30E1  
Max  1.85E0  3.37E0  1.96E0  6.10E1  2.37E0  
Min  4.56E3  6.40E3  7.01E4  4.58E4  6.46E3  

Mean  6.10E1  5.50E1  5.30E1  4.50E1  5.50E1  
Max  1.78E0  1.57E0  1.63E0  1.54E0  1.44E0  
Min  1.50E2  1.00E2  1.78E15  5.00E3  1.78E15  

Mean  5.85E3  5.89E3  3.74E4  9.05E4  9.93E4  
Max  1.03E2  1.07E2  7.77E4  2.53E3  3.39E3  
Min  2.18E3  2.04E3  6.42E5  1.68E5  2.72E6 
V-E Optimization process analysis
fig:meanfitsAll shows the learning curves in the three cases. The mean fitness in Case 1 keeps rising, and the curves merge in the final phase. In the curves for Case 2, two of the missiles pull ahead by about 1000 points but eventually fall back to meet the other missiles; a similar phenomenon appears in Case 3. It can be inferred that the policies automatically evolve to an equilibrium state. One reason is that the rescaled gradient prevents an ever-increasing gap between individual groups, which is crucial for mutual improvement: if one group gets too far ahead, the other groups may never catch up due to their interrelationship. That is, improvement of the poorer-performing group is inhibited when it would cause more significant drops for the better-performing ones.

fig:lrAll presents the adaptation profiles of the learning rates under the aforementioned technique. For Cases 1 and 2, the learning rates start from high values and gradually converge to the minimum value, consistent with the quality of the estimated gradients. However, due to the random initial conditions in Case 3, the learning rates do not settle easily. Extensive empirical results show that without learning rate adaptation, the fitness profiles jitter at the end instead of converging to satisfactory ranges, regardless of the type of optimizer. This behavior is common when training neural networks and is presumably caused by overfitting, according to related research. Employing this simple adaptation technique helps alleviate the deficiency.
VI Conclusions
In this paper, an improved coevolutionary strategy, NCES, has been developed to address the non-stationarity issue in multi-agent dynamic environments. The hybrid coevolutionary cooperative guidance law (HCCGL) has been proposed by integrating the improved strategy with a neural-network consensus controller. To demonstrate its effectiveness in synchronizing impact time and angles, three experiments under different conditions have been carried out. The experiment with a maneuvering target achieved satisfactory precision. The proposed method is shown to be robust and to scale well to the cooperative guidance problem for multi-agent systems; to our knowledge, this is the first time in the existing studies that an intelligent cooperative guidance law has been applied to intercept a non-stationary target with time and angle constraints.
The proposed algorithm combines traditional control theory with intelligent algorithms, revealing enormous potential in this field. It is always meaningful to explore the limits of modern control tasks. Despite the satisfactory results acquired, this work leaves room for improvement. Future work may explore incremental guidance gains, or control strategies that handle actuator failure and system uncertainty.