Reinforcement learning-based adaptive spiral-diving manoeuver guidance method for reentry vehicles subject to unknown disturbances

T. Wu; Z. Wang

doi:10.1017/aer.2024.17

Published online by Cambridge University Press: 19 March 2024

T. Wu: Affiliation:
Research Center for Unmanned System Strategy Development, Northwestern Polytechnical University, Xi’an 710072, Shaanxi, China
Z. Wang*: Affiliation:
Research Center for Unmanned System Strategy Development, Northwestern Polytechnical University, Xi’an 710072, Shaanxi, China; Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an 710072, Shaanxi, China; National Key Laboratory of Aerospace Flight Dynamics, Northwestern Polytechnical University, Xi’an 710072, Shaanxi, China
*: Corresponding author: Z. Wang; Email: [email protected]

Abstract

This paper proposes a reinforcement learning-based adaptive guidance method for a class of spiral-diving manoeuver guidance problems of reentry vehicles subject to unknown disturbances. First, the desired proportional navigation guidance law is designed for the vehicle based on the initial conditions, terminal constraints and the curve involute principle. Then, a first-order multivariable nonlinear guidance command tracking model that accounts for unknown disturbances is established, and the controller design problem caused by the coupling of control variables is overcome by introducing a coordinate transformation technique. Moreover, actor-critic networks and the corresponding adaptive weight update laws are designed to cope with the unknown disturbances. The stability of the system is proved with the Lyapunov direct method. Subsequently, the admissible ranges of the guidance parameters are analysed. Finally, the validity and superiority of the proposed method are verified by numerical simulations.

Keywords

Actor-critic design; spiral-diving manoeuver; adaptive guidance; reentry vehicles; anti-disturbance

Type: Research Article
Information: The Aeronautical Journal, Volume 128, Issue 1328, October 2024, pp. 2218-2234

DOI: https://doi.org/10.1017/aer.2024.17
Copyright: © The Author(s), 2024. Published by Cambridge University Press on behalf of Royal Aeronautical Society

Nomenclature

RBF: radial basis function
RL: reinforcement learning
VST: virtual sliding target

1.0 Introduction

Reentry vehicles are a class of unpowered aircraft capable of flying in near space, characterised by an aerodynamic profile with a large lift-to-drag ratio, high manoeuverability and strong penetration ability. Unlike traditional inertial vehicles, reentry vehicles rely on aerodynamic control and can achieve large lateral manoeuvering flight, which can significantly improve survivability [Reference Bairstow1, Reference Zhang Mingang2]. However, where there is a spear, there is a shield: reconnaissance, surveillance and interception technologies for reentry vehicles have also made great progress throughout the world [Reference Rahmani-Nejad3, Reference Liu, Yan, Liu, Dai, Yan and Xin4]. The interference and interception measures faced by the vehicles in the dive flight phase are more diverse than those in the midcourse and reentry phases, and the probability of being successfully intercepted is greatly increased [Reference Yu5]. Therefore, providing reentry vehicles in the dive phase with a manoeuver that has high penetration ability and strong anti-disturbance ability is an active research topic [Reference Dwivedi, Bhale, Bhattacharyya and Padhi6, Reference He, Yan and Tang7]. When the reentry vehicle does not have any prior information about the interceptor weapon, it is essential to choose a manoeuver mode that is difficult to predict and has a high penetration probability. Typical modes are the snake manoeuver, the roller manoeuver and the spiral manoeuver [Reference Li, Zhang and Tang8]. Among them, the spiral manoeuver has become a promising research direction in recent years because of its large manoeuver range, time-varying manoeuver frequency and unpredictable trajectory [Reference Rusnak and Peled-Eitan9].

The main guidance techniques used for the dive phase are the optimal guidance law, predictor-corrector guidance law, sliding mode variable structure guidance law and proportional navigation guidance law [Reference Yibo, Xiaokui, Guangshan and Jiashun10, Reference Zhang and Wang11]. An optimal guidance law design method based on the block pulse function is proposed by Dou et al. [Reference Dou and Dou12], which is able to optimise the landing angle, miss distance and control energy consumption simultaneously. Wang et al. [Reference Qing13] propose an energy-based predictor-corrector guidance algorithm to design the longitudinal and lateral guidance laws separately. Li and Qian [Reference Li and Qian14] design a three-dimensional guidance law considering target manoeuver, impact angle constraint and input saturation by using integral sliding mode control and adaptive control. Dhananjay and Ghose [Reference Dhananjay and Ghose15] develop a proportional navigation guidance law incorporating a time-to-go estimation algorithm to strike stationary targets. Among these methods, proportional navigation guidance has been widely studied because of its simplicity, high efficiency and small miss distance. Nevertheless, proportional navigation guidance is not suited to flight missions that must perform specific manoeuvering modes. For this reason, some studies have introduced the concept of the virtual sliding target (VST) to extend the application scope of proportional navigation guidance. Intuitively, the real flight trajectory of the vehicle is mapped to the trajectory of the virtual sliding target, whose end point coincides with the real target point. As long as the vehicle flies in the direction pointing to the virtual sliding target, it will eventually hit the real target. In Mozaffari et al. [Reference Mozaffari, Safarinejadian and Binazadeh16] and Raju and Ghose [Reference Raju and Ghose17], specific manoeuvers of the vehicle are achieved by controlling the speed and direction of the virtual sliding target. In this case, the parameters of the virtual sliding target are selected empirically. By introducing a virtual sliding target, Hu et al. [Reference Hu, Han and Xin18] design a two-stage guidance method combining a non-singular terminal sliding mode guidance law and a proportional navigation guidance law, which can satisfy both impact angle and time constraints. For stationary targets, an adaptive proportional navigation guidance law based on the virtual sliding target is proposed by He and Yan [Reference He and Yan19] to guide the vehicle through the spiral dive manoeuver, and the scope of application of this method is extended to low-speed moving targets in He et al. [Reference He, Yan and Tang20]. Despite the substantial contributions of these works, none of them considers the effect of unknown disturbances on the guidance effectiveness.

Fortunately, many nonlinear control methods can deal with the adverse effects of unknown disturbances on the controlled system. Some of these are inherently robust to unknown disturbances, while others incorporate various types of disturbance observers [Reference Ning, Liu, Wang and Luo21, Reference Yin, Wang, Xiong, Xiang, Liu, Fan and Xue22]. Sliding mode control is frequently used to design controllers for vehicles because of its insensitivity to matched disturbances [Reference Mao, Dou, Yang, Tian and Zong23]. Shen et al. [Reference Shen, Xia, Zhang and Cui24] propose a continuous adaptive super-twisting sliding mode tracking control method, which combines a conventional super-twisting sliding mode controller with an adaptive gain technique to overcome bounded disturbances. By exploiting the constraint handling capability and enhanced anti-disturbance capability of model predictive control, Chai et al. [Reference Chai, Tsourdos, Gao, Chai and Xia25] present a robust model predictive attitude control algorithm, in which a nonlinear feedback law is designed and the system constraints are tightened to ensure that robust constraints are satisfied for all allowed uncertainties. A resilient attitude control method for spacecraft is proposed by Cao and Xiao [Reference Cao and Xiao26], which utilises a nonlinear disturbance observer to compensate for unknown disturbances. Xiang et al. [Reference Xiang, Yanli, Peng and Haibing27] propose an adaptive backstepping attitude control method for hypersonic vehicles, in which a nonlinear disturbance observer is also used to estimate unknown disturbances. In Refs [Reference Cheng, Wang and Gong28–Reference Zhang, Chen, Fu and Huang30], a single group or two groups of neural networks are utilised to fit the unmodeled dynamics and external disturbances of hypersonic vehicles. In addition, reinforcement learning (RL) has been introduced into controller design. Ouyang et al. [Reference Ouyang, Dong, Wei and Sun31] introduce actor-critic design into the tracking control problem of elastic joint robots to fit the system uncertainties. Shi et al. [Reference Shi, Wang and Cheng32] propose a robust adaptive safety control framework for hypersonic vehicles based on reinforcement learning, in which the actor-critic networks are used to approximate the optimal controller. Wang et al. [Reference Wang and Liu33] develop a reinforcement learning-based adaptive tracking control method for a class of semi-Markovian non-Lipschitz uncertain systems, in which actor-critic networks are used to handle unmatched disturbances. The actor-critic networks in reinforcement learning not only inherit the good nonlinear processing ability of neural networks, but also introduce an error-related cost function. In theory, therefore, they can be expected to perform better than neural networks alone.

Combined with the previous discussion, this paper further studies the problem of spiral-diving manoeuver guidance for reentry vehicles considering unknown disturbances based on the results achieved in He and Yan [Reference He and Yan19] and He et al. [Reference He, Yan and Tang20]. The main contributions of this paper are as follows:

Compared with He and Yan [Reference He and Yan19] and He et al. [Reference He, Yan and Tang20], this paper considers the adverse effects of unknown disturbances on the spiral-diving manoeuver of reentry vehicles. Specifically, the guidance command tracking control problem model considering unknown disturbances is established. This model is abstracted as a first-order coupled multivariable nonlinear system, which is more relevant to engineering practice.
The coordinate transformation technique is employed to overcome the controller design challenge caused by the coupling of control variables. Combined with the recursive design technique, the first-order time derivative of the control variables is finally obtained, and the control variables can be obtained by integrating it.
By designing the actor-critic networks and the corresponding adaptive weight update law, the unknown disturbances are compensated with high accuracy. Furthermore, by using the Lyapunov method, it is proved that the tracking errors are uniformly ultimately bounded. As a result, the assumption that the actual guidance commands are equivalent to the desired guidance commands made in the convergence analysis of the guidance parameters in He and Yan [Reference He and Yan19] and He et al. [Reference He, Yan and Tang20] is verified in this paper.

2.0 Problem formulation and preliminaries

2.1 Problem statement

The centroid dynamic model of an unpowered reentry vehicle subjected to unknown disturbances is as follows:

(1)

\begin{align}\left\{ {\begin{array}{l}{\dot x = V{\rm{cos}}\theta {\rm{cos}}{\psi _v}}\\[3pt]{\dot y = V{\rm{sin}}\theta }\\[3pt]{\dot z = - V{\rm{cos}}\theta {\rm{sin}}{\psi _v}}\\[3pt]{\dot V = - \dfrac{X}{m} - g{\rm{sin}}\theta }\\[11pt]{\dot \theta = \dfrac{{Y{\rm{cos}}{\gamma _v}}}{{mV}} - \dfrac{g}{V}{\rm{cos}}\theta + {d_1}\left( t \right)}\\[11pt]{{{\dot \psi }_v} = - \dfrac{{Y{\rm{sin}}{\gamma _v}}}{{mV{\rm{cos}}\theta }} + {d_2}\left( t \right)}\end{array}} \right.\end{align}

where $x,y,z$ are the positions of the vehicle in the inertial frame and $V$ is the velocity. $\theta $ is the path angle, i.e., the angle between the velocity vector and the horizontal plane, which is positive when the velocity vector is above the horizontal plane. ${\psi _v}$ is the deflection angle, i.e., the angle between the projection of the velocity vector onto the horizontal plane and the $x$ axis, measured counterclockwise in the horizontal plane. $\alpha $ and ${\gamma _v}$ are the angle of attack and the bank angle. $m$ and $g$ are the mass and the gravitational acceleration. ${d_1}\left( t \right)$ and ${d_2}\left( t \right)$ are the unknown disturbances acting on the vehicle. $X$ and $Y$ are the drag and lift forces, and the expression of $Y$ is

(2)

\begin{align}Y = \left( {C_y^0 + C_y^\alpha \alpha } \right)q{S_{ref}}\end{align}

where $C_y^0$ and $C_y^\alpha $ are the lift coefficients. ${S_{ref}}$ is the reference area. $q$ is the dynamic pressure, and its expression is

(3)

\begin{align}q = \frac{1}{2}\rho {V^2}\end{align}

where $\rho $ is the atmospheric density.
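As an illustration only, Equations 1 to 3 can be evaluated with a short Python function. The sketch below assumes a constant atmospheric density and takes the drag $X$ as an input, since no drag model is given here. The function name, the argument order and the keys of the `params` dictionary are hypothetical.

```python
import math


def dynamics_rhs(state, alpha, gamma_v, drag, params, d1=0.0, d2=0.0):
    """Right-hand side of Equation 1 for state = (x, y, z, V, theta, psi_v).

    drag is the force X, supplied by the caller (no drag model is assumed).
    params holds m, g, rho, S_ref, Cy0 and Cy_alpha (hypothetical key names).
    """
    x, y, z, V, theta, psi_v = state
    m, g = params["m"], params["g"]
    q = 0.5 * params["rho"] * V ** 2                      # Equation 3
    lift = (params["Cy0"] + params["Cy_alpha"] * alpha) * q * params["S_ref"]  # Equation 2
    return (
        V * math.cos(theta) * math.cos(psi_v),            # x_dot
        V * math.sin(theta),                              # y_dot
        -V * math.cos(theta) * math.sin(psi_v),           # z_dot
        -drag / m - g * math.sin(theta),                  # V_dot
        lift * math.cos(gamma_v) / (m * V) - g / V * math.cos(theta) + d1,   # theta_dot
        -lift * math.sin(gamma_v) / (m * V * math.cos(theta)) + d2,          # psi_v_dot
    )
```

The returned tuple can be passed to any fixed-step integrator. The disturbances `d1` and `d2` default to zero so that the nominal and disturbed models can be compared with the same function.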

Define $\xi = {\left[ {\theta, {\psi _v}} \right]^T}$ as the state variables. Then, when the reentry vehicle performs a spiral-diving manoeuver, the desired guidance commands can be expressed as ${\xi _d} = {\left[ {{\theta _d},{\psi _{vd}}} \right]^T}$ . The spiral-diving guidance problem for a reentry vehicle subject to unknown disturbances thus reduces to tracking the desired guidance commands, and it can be written as follows:

(4)

\begin{align}\left\{ {\begin{array}{c}{\dot \xi = {f_1}\left( \xi \right) + {g_1}\left( {\xi, u} \right) + d}\\[5pt]{{\chi _o} = \xi }\end{array}} \right.\end{align}

where $u = {\left[ {\alpha, {\gamma _v}} \right]^T}$ are the control variables. ${\chi _o}$ is the output vector. $d = {\left[ {{d_1},{d_2}} \right]^T}$ . ${f_1}$ and ${g_1}$ are smooth nonlinear functions that can be expressed as

(5)

\begin{align}{f_1}\left( \xi \right) = \left[ {\begin{array}{c}{ - \dfrac{g}{V}{\rm{cos}}\theta }\\[10pt]0\end{array}} \right]\end{align}

(6)

\begin{align}{g_1}\left( {\xi, u} \right) = \left[ {\begin{array}{c}{\dfrac{{\left( {C_y^0 + C_y^\alpha \alpha } \right)q{S_{ref}}{\rm{cos}}{\gamma _v}}}{{mV}}}\\[10pt]{ - \dfrac{{\left( {C_y^0 + C_y^\alpha \alpha } \right)q{S_{ref}}{\rm{sin}}{\gamma _v}}}{{mV{\rm{cos}}\theta }}}\end{array}} \right]\end{align}

The tracking error can be expressed as

(7)

\begin{align}{e_1} = \xi - {\xi _d}\end{align}

Assumption 1. The system represented by Equation 4 is controllable, which satisfies

(8)

\begin{align}\left| {\frac{{\partial {g_1}\left( {\xi, u} \right)}}{{\partial u}}} \right| \ne 0\end{align}
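Assumption 1 can be made concrete for the ${g_1}$ of Equation 6. The following worked computation is a sketch that writes $Y = \left( {C_y^0 + C_y^\alpha \alpha } \right)q{S_{ref}}$ as in Equation 2 and assumes that $q$, $V$ and $m$ do not depend on $u = {\left[ {\alpha, {\gamma _v}} \right]^T}$:

```latex
\begin{align*}
\frac{\partial g_1}{\partial u}
&= \begin{bmatrix}
\dfrac{C_y^\alpha q S_{ref}\cos\gamma_v}{mV} & -\dfrac{Y\sin\gamma_v}{mV} \\[10pt]
-\dfrac{C_y^\alpha q S_{ref}\sin\gamma_v}{mV\cos\theta} & -\dfrac{Y\cos\gamma_v}{mV\cos\theta}
\end{bmatrix}, \\[6pt]
\left| \frac{\partial g_1}{\partial u} \right|
&= -\frac{C_y^\alpha q S_{ref}\, Y\left( \cos^2\gamma_v + \sin^2\gamma_v \right)}{m^2V^2\cos\theta}
 = -\frac{C_y^\alpha q S_{ref}\, Y}{m^2V^2\cos\theta}.
\end{align*}
```

Under these assumptions, the determinant in Equation 8 is nonzero whenever $C_y^\alpha \ne 0$, the lift $Y$ is nonzero and ${\rm{cos}}\theta \ne 0$, independently of the value of ${\gamma _v}$.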

Lemma 1. [Reference He and Dong34] For a Lyapunov function $L\left( t \right)$ , if its initial value $L\left( 0 \right)$ is bounded and its time derivative satisfies

(9)

\begin{align}\dot L\left( t \right) \le - \kappa L\left( t \right) + \delta \end{align}

where $\kappa \gt 0$ and $\delta \gt 0$ are constants, then $L\left( t \right)$ is bounded.
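For reference, the bound behind Lemma 1 can be written out explicitly. The short derivation below is the standard comparison argument and is not quoted from the cited reference:

```latex
\begin{align*}
\frac{d}{dt}\left( e^{\kappa t} L(t) \right)
&= e^{\kappa t}\left( \dot L(t) + \kappa L(t) \right) \le \delta e^{\kappa t}, \\
e^{\kappa t} L(t) - L(0) &\le \frac{\delta}{\kappa}\left( e^{\kappa t} - 1 \right), \\
L(t) &\le \left( L(0) - \frac{\delta}{\kappa} \right) e^{-\kappa t} + \frac{\delta}{\kappa}.
\end{align*}
```

Hence $L\left( t \right)$ stays below ${\rm{max}}\left\{ {L\left( 0 \right),\delta /\kappa } \right\}$ for all $t \ge 0$, and its bound converges exponentially to $\delta /\kappa $.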

Lemma 2. [Reference Xing-Kai35] For vectors $A \in {\mathbb{R}^n}$ and $B \in {\mathbb{R}^n}$ , there always holds

(10)

\begin{align}2{A^T}B \le \| A \|^2 + \| B \|^2\end{align}

(11)

\begin{align}\|AB\| \le \| A \| \cdot \| B \|\end{align}

Remark 1. As shown in Equation 1, the impact of unknown disturbances is considered in this paper when studying the spiral-diving guidance problem, which is not addressed in He and Yan [Reference He and Yan19] and He et al. [Reference He, Yan and Tang20]. For this reason, this paper adds the design of the desired guidance command tracking system shown in Equation 4. Further, the objective of this paper is to design a reinforcement learning-based adaptive controller for system 4 such that $\|{e_1}\| \le {\tilde e_1}$ as $t \to \infty $ , where ${\tilde e_1}$ is a sufficiently small positive constant.

2.2 Spiral trajectory parameters solving

The logarithmic spiral trajectory can be defined as

(12)

\begin{align}{r_s} = r\left( \vartheta \right) = {r_0}{e^{\vartheta {\rm{cot\Lambda }}}}\end{align}

where ${r_0}$ is the initial polar diameter. $\vartheta $ is the polar angle. ${\rm{\Lambda }}$ is the angle between the component of the vehicle velocity vector in yaw plane and the polar diameter. It can be found from Equation 12 that once the values of ${r_0}$ and ${\rm{\Lambda }}$ are acquired, the shape of the spiral trajectory can be determined uniquely.
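To make Equation 12 concrete, the sketch below evaluates the polar diameter and the arc length between two polar angles. The arc-length formula is not stated in the text; it follows from the standard polar relation $ds = \sqrt {{r^2} + {{\left( {dr/d\vartheta } \right)}^2}} \,d\vartheta = r/\left| {{\rm{sin\Lambda }}} \right|\,d\vartheta $, assuming $0 \lt {\rm{\Lambda }} \lt \pi $. The function names are hypothetical.

```python
import math


def spiral_radius(r0, lam, vartheta):
    """Polar diameter of Equation 12: r = r0 * exp(vartheta * cot(lam))."""
    return r0 * math.exp(vartheta / math.tan(lam))


def spiral_arc_length(r0, lam, th_a, th_b):
    """Signed arc length of the spiral from polar angle th_a to th_b."""
    c = 1.0 / math.tan(lam)            # cot(lam)
    s = abs(math.sin(lam))
    if abs(c) < 1e-12:                 # lam close to pi/2: the spiral is a circle
        return r0 * (th_b - th_a) / s
    return r0 * (math.exp(th_b * c) - math.exp(th_a * c)) / (c * s)
```

For ${\rm{\Lambda }} \gt \pi /2$ the cotangent is negative, so the polar diameter shrinks as $\vartheta $ grows and the trajectory spirals inward toward the pole.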

Figure 3 in He et al. [Reference He, Yan and Tang20] shows the geometric representation of the spiral trajectory in the yaw plane, where ${M_0}$ and ${M_s}$ are the initial position and the current desired position of the vehicle. $T$ is the target position. $p{x_p}{z_p}$ denotes the polar frame, the pole $p$ coincides with the rotation centre of the spiral trajectory, the polar axis $p{z_p}$ points to ${M_0}$ , and the polar axis $p{x_p}$ is perpendicular to $p{z_p}$ . ${r_{s0}}$ , ${r_s}$ and ${r_{s1}}$ are the polar diameters at ${M_0}$ , ${M_s}$ and $T$ , and the corresponding polar angles and deflection angles are ${\vartheta _0}$ , $\vartheta $ , ${\vartheta _1}$ and ${\psi _{vs0}}$ , ${\psi _{vs}}$ , ${\psi _{vs1}}$ , respectively. $\eta $ is the rotation angle of the polar frame with respect to the inertial frame. Letting $\left( {{x_0},{z_0}} \right)$ and $\left( {{x_1},{z_1}} \right)$ represent the coordinates at ${M_0}$ and $T$ , respectively, the initial condition set ${H_0}$ and terminal constraint set ${H_1}$ of the vehicle can be defined as

\begin{align*}{H_0} = \left\{ {\left( {{x_0},{z_0}} \right),{\vartheta _0},{\psi _{vs0}}} \right\},{\rm{\;\;\;\;}}{H_1} = \left\{ {\left( {{x_1},{z_1}} \right),{\psi _{vs1}}} \right\}\end{align*}

It is worth noting that the definition of deflection angle direction in this paper is contrary to that in He et al. [Reference He, Yan and Tang20]. Therefore, in order to facilitate understanding, it is necessary to deduce new expressions related to the determination of spiral trajectory parameters. Referring to Fig. 3 in He et al. [Reference He, Yan and Tang20], the main geometrical relation can be expressed as follows:

(13)

\begin{align}\frac{\pi }{2} + {\psi _{vs}} = \vartheta + \eta + {\rm{\Lambda }}\end{align}

Substituting the ${\psi _{vs0}}$ and ${\psi _{vs1}}$ into Equation 13, we get

(14)

\begin{align}\eta = \frac{\pi }{2} + {\psi _{vs0}} - {\rm{\Lambda }}\end{align}

(15)

\begin{align}{\vartheta _1} = {\psi _{vs1}} - {\psi _{vs0}}\end{align}

The coordinates of the $p$ can be solved by

(16)

\begin{align}\left[ {\begin{array}{c}{{x_p}}\\[3pt]{{z_p}}\end{array}} \right] = \frac{1}{{{K_1} - {K_0}}}\left[ {\begin{array}{c}{{K_1}{x_0} - {K_0}{x_1} + {K_0}{K_1}\left( {{z_1} - {z_0}} \right)}\\[3pt]{{K_1}{z_1} - {K_0}{z_0} - \left( {{x_1} - {x_0}} \right)}\end{array}} \right]\end{align}

where ${K_0}$ and ${K_1}$ are slopes of the rays $p{M_0}$ and $pT$ , respectively, and their expressions are:

(17)

\begin{align}{K_i} = {\rm{tan}}\left( {\frac{\pi }{2} + {\psi _{vsi}} - {\rm{\Lambda }}} \right),{\rm{\;\;\;\;}}i = 0,1\end{align}

Refer to He et al. [Reference He, Yan and Tang20] for the calculation of ${x_p}$ and ${z_p}$ when ${K_0}$ and ${K_1}$ do not exist. Next, the lengths of the polar diameters ${r_{s0}}$ and ${r_{s1}}$ can be calculated:

(18)

\begin{align}\|{r_{s0}}\| = \left| {\frac{{{z_0} - {z_p}}}{{{\rm{sin}}\left( {{\psi _{vs0}} - {\rm{\Lambda }}} \right)}}} \right|\end{align}

(19)

\begin{align}\|{r_{s1}}\| = \left| {\frac{{{z_1} - {z_p}}}{{{\rm{sin}}\left( {{\psi _{vs1}} - {\rm{\Lambda }}} \right)}}} \right|\end{align}

Dividing Equation 19 by Equation 18 and combining Equations 12 and 17 yields

(20)

\begin{align}\ln\left| {\frac{{{\rm{cos}}\left( {{\psi _{vs0}} - {\rm{\Lambda }} - \mu } \right)}}{{{\rm{cos}}\left( {{\psi _{vs1}} - {\rm{\Lambda }} - \mu } \right)}}} \right| = {\vartheta _1}{\rm{cot\Lambda }}\end{align}

where $\mu = {\rm{arctan}}\left[ {\left( {{x_0} - {x_1}} \right)/\left( {{z_0} - {z_1}} \right)} \right]$ .

Once the values in the sets ${H_0}$ and ${H_1}$ are given, ${\rm{\Lambda }}$ can be acquired by solving Equation 20. By substituting ${\rm{\Lambda }}$ into Equations 16, 18, 14 and 15, the pole coordinates $\left( {{x_p},{z_p}} \right)$ , the initial polar diameter ${r_0}$ , the rotation angle $\eta $ of the polar frame with respect to the inertial frame and the terminal polar angle ${\vartheta _1}$ can be calculated, respectively.
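The substitution step can be sketched in Python as follows. The sketch takes ${\rm{\Lambda }}$ as already known (solving Equation 20 would need a numerical root finder, which is not shown) and does not cover the case where ${K_0}$ or ${K_1}$ does not exist. Algebraically, Equation 16 is the intersection of the two lines $x - {x_i} = {K_i}\left( {z - {z_i}} \right)$, which is what the accompanying check relies on. All function names are hypothetical.

```python
import math


def ray_slope(psi_vs, lam):
    """Equation 17: slope K_i of the ray from the pole, measured as dx/dz."""
    return math.tan(math.pi / 2 + psi_vs - lam)


def pole_position(x0, z0, x1, z1, psi_vs0, psi_vs1, lam):
    """Equation 16: pole p as the intersection of the rays pM0 and pT."""
    k0 = ray_slope(psi_vs0, lam)
    k1 = ray_slope(psi_vs1, lam)
    if abs(k1 - k0) < 1e-12:
        raise ValueError("K0 and K1 coincide; Equation 16 is not applicable")
    x_p = (k1 * x0 - k0 * x1 + k0 * k1 * (z1 - z0)) / (k1 - k0)
    z_p = (k1 * z1 - k0 * z0 - (x1 - x0)) / (k1 - k0)
    return x_p, z_p


def frame_rotation_and_terminal_angle(psi_vs0, psi_vs1, lam):
    """Equations 14 and 15: rotation angle eta and terminal polar angle."""
    eta = math.pi / 2 + psi_vs0 - lam
    vartheta_1 = psi_vs1 - psi_vs0
    return eta, vartheta_1
```

A quick consistency check is that the returned pole satisfies both line equations for the chosen ${\rm{\Lambda }}$.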

Remark 2. This subsection presents the procedure for calculating the spiral trajectory parameters in the yaw plane. Without loss of generality, the spiral trajectory in three-dimensional space can be obtained by stretching the yaw plane spiral trajectory along the vertical direction.

2.3 Neural networks in reinforcement learning

Neural networks are an important part of reinforcement learning and are powerful in coping with nonlinearities. With the help of neural networks, a nonlinear function $f$ can be expressed as

(21)

\begin{align}f = {W^T}{\rm{\Phi }}\left( {\bar Z} \right) + \varepsilon \left( {\bar Z} \right)\end{align}

where $W \in {\mathbb{R}^l}$ is the weight vector, $l$ is the number of nodes in the hidden layer. $\bar Z = {\left[ {{{\bar z}_1},{{\bar z}_2}, \ldots, {{\bar z}_m}} \right]^T} \in {\mathbb{R}^m}$ are the inputs of neural networks with dimension $m$ . ${\rm{\Phi }}\left( {\bar Z} \right) = {\left[ {{\varphi _{b1}},{\varphi _{b2}}, \ldots {\varphi _{bl}}} \right]^T} \in {\mathbb{R}^l}$ are the basis functions. $\varepsilon \left( {\bar Z} \right)$ is the function reconstruction error. The optimal approximation can be obtained by properly selecting the number of network nodes.

Radial basis function (RBF) neural networks are selected as the basic network structure in this paper, and their basis functions can be described as

(22)

\begin{align}{\varphi _j}\left( Z \right) = {\rm{exp}}\left( { - \frac{{\|Z - {\zeta _j}\|^2}}{{\varsigma _j^2}}} \right)\end{align}

where ${\zeta _j} = {\left[ {{\zeta _{j1}},{\zeta _{j2}}, \ldots, {\zeta _{jm}}} \right]^T}$ is the centre vector of the $j$ -th node in the hidden layer. ${\varsigma _j}$ is the width value.
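A minimal sketch of Equations 21 and 22 in plain Python is given below, for a scalar output and without the reconstruction error term. The centres, widths and weights are placeholders chosen by the user, and the function names are hypothetical.

```python
import math


def rbf_basis(z, centres, widths):
    """Equation 22: Gaussian basis vector for input z (one entry per hidden node)."""
    phi = []
    for centre, width in zip(centres, widths):
        dist_sq = sum((zi - ci) ** 2 for zi, ci in zip(z, centre))
        phi.append(math.exp(-dist_sq / width ** 2))
    return phi


def network_output(weights, phi):
    """Equation 21 without the reconstruction error: W^T Phi for a scalar output."""
    return sum(w * p for w, p in zip(weights, phi))
```

Each Gaussian entry lies in $(0,1]$, so the norm of the basis vector is at most $\sqrt l $ for $l$ hidden nodes.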

Lemma 3. [Reference Ouyang, Dong, Wei and Sun31] The basis function ${\rm{\Phi }}\left( {\bar Z} \right)$ of neural networks is bounded, which satisfies $\left\|{\rm{\Phi }}\left( {\bar Z} \right)\right\| \le {{\rm{\Phi }}_M}$ and $\left\|\dot{\Phi}\left( {\bar Z} \right)\right\| \le {{\rm{\Phi }}_{dM}}$ , where ${{\rm{\Phi }}_M}$ and ${{\rm{\Phi }}_{dM}}$ are positive constants.

Lemma 4. [Reference Ouyang, Dong, Wei and Sun31] If the ideal weight ${W^{\rm{*}}}$ is obtained, then there exists $\left| {\varepsilon \left( {\bar Z} \right)} \right| \le {\varepsilon _m}$ and $\left| {\dot \varepsilon \left( {\bar Z} \right)} \right| \le {\varepsilon _{dm}}$ , where ${\varepsilon _m}$ and ${\varepsilon _{dm}}$ are positive constants.

3.0 Main results

This section presents the reinforcement learning spiral-diving manoeuver guidance method proposed in this paper. The concept of the virtual sliding target is employed to design the desired proportional navigation guidance law. With the help of the coordinate transformation technique, the difficulty in designing the desired guidance command tracking controller that arises from the coupling of control variables is overcome. The actor-critic networks and the corresponding adaptive weight update law are then designed to approximate the unknown disturbances. After proving that the tracking errors are uniformly ultimately bounded by using the Lyapunov method, the range of the guidance parameters is derived. The diagram of the proposed reinforcement learning spiral-diving manoeuver guidance framework for the reentry vehicle is shown in Fig. 1.

3.1 Desired proportional navigation guidance law

Figure 4 in He et al. [Reference He, Yan and Tang20] displays the motion of the vehicle and the virtual sliding target in the yaw plane, where $M$ is the projection of the current position of the vehicle in the yaw plane, and ${M_s}$ is the point on the spiral trajectory closest to $M$ .

Assumption 2. The polar angle at $M$ is the same as the polar angle at ${M_s}$ . Moreover, in order to keep the shape of the spiral trajectory invariant, the pole $p$ is assumed to have the same dynamic properties as the target.

Under Assumption 2, the polar angle $\vartheta $ corresponding to the $M$ can be obtained by solving the following equation:

(23)

\begin{align}\left( {x - {x_p}} \right){\rm{sin}}\left( {\vartheta + \eta + {\rm{\Lambda }}} \right) + \left( {z - {z_p}} \right){\rm{cos}}\left( {\vartheta + \eta + {\rm{\Lambda }}} \right) = a{e^{\vartheta {\rm{cot\Lambda }}}}{\rm{cos\Lambda }}\end{align}

The time derivative of Equation 23 can be arranged to obtain the time derivative of $\vartheta $ :

(24)

\begin{align}\dot \vartheta = \frac{{V{\rm{cos}}\theta {\rm{sin}}\left( {\vartheta + \eta + {\rm{\Lambda }} - {\psi _v}} \right) - {V_t}{\rm{sin}}\left( {\vartheta + \eta + {\rm{\Lambda }} - {\psi _{vt}}} \right)}}{{\left[ {\|{r_s}\|{\rm{co}}{{\rm{s}}^2}{\rm{\Lambda }}/{\rm{sin\Lambda }} - \left( {x - {x_p}} \right){\rm{cos}}\left( {\vartheta + \eta + {\rm{\Lambda }}} \right) + \left( {z - {z_p}} \right){\rm{sin}}\left( {\vartheta + \eta + {\rm{\Lambda }}} \right)} \right]}}\end{align}

where ${V_t}$ and ${\psi _{vt}}$ are the magnitude and direction angle of the target velocity, respectively. Note that ${\psi _{vt}}$ has no meaning when the target is stationary, i.e., ${V_t} = 0$ .

Figure 1. Reinforcement learning adaptive spiral-diving manoeuver guidance framework.

The trajectory of the virtual sliding target $T{\rm{'}}$ is designed based on the curve involute principle and is denoted as

(25)

\begin{align}{r_{vt}} = \left[ {\begin{array}{c}{{x_p}}\\[3pt]{{z_p}}\end{array}} \right] + {r_s}\left[ {\begin{array}{c}{{\rm{sin}}\left( {\vartheta + \eta } \right)}\\[3pt]{{\rm{cos}}\left( {\vartheta + \eta } \right)}\end{array}} \right] + {l_{go}}\left[ {\begin{array}{c}{{\rm{sin}}\left( {\vartheta + \eta + {\rm{\Lambda }}} \right)}\\[3pt]{{\rm{cos}}\left( {\vartheta + \eta + {\rm{\Lambda }}} \right)}\end{array}} \right]\end{align}

where ${r_{vt}} = {\left[ {{x_{vt}},{z_{vt}}} \right]^T}$ represents the coordinate vector of the $T{\rm{'}}$ . ${l_{go}}$ is the remaining length of the spiral trajectory and its value can be obtained by integrating Equation 12. The time derivative of Equation 25 yields

(26)

\begin{align}{V_{vt}} = \left[ {\begin{array}{c}{{{\dot x}_{vt}}}\\[3pt]{{{\dot z}_{vt}}}\end{array}} \right] = {V_t}\left[ {\begin{array}{c}{{\rm{cos}}{\psi _{vt}}}\\[3pt]{ - {\rm{sin}}{\psi _{vt}}}\end{array}} \right] + {l_{go}}\dot \vartheta \left[ {\begin{array}{c}{{\rm{cos}}\left( {\vartheta + \eta + {\rm{\Lambda }}} \right)}\\[3pt]{ - {\rm{sin}}\left( {\vartheta + \eta + {\rm{\Lambda }}} \right)}\end{array}} \right]\end{align}

The virtual line-of-sight deflection angle from the current position of $T{\rm{'}}$ pointing to $M$ is defined as

(27)

\begin{align}\varphi = {\rm{arctan}}\frac{{x - {x_{vt}}}}{{z - {z_{vt}}}}\end{align}

Taking the time derivative of Equation 27, and combining with Equation 1 and Equation 26, the following equation can be obtained:

(28)

\begin{align}\dot \varphi = \frac{{V{\rm{cos}}\theta }}{s}{\rm{sin}}\left( {\delta {\psi _v}} \right) + \dot \vartheta {\rm{co}}{{\rm{s}}^2}\left( {{\rm{\Delta }}\varphi } \right) - \frac{{{V_t}}}{s}{\rm{cos}}\left( {{\psi _{vt}} - \varphi } \right)\end{align}

where

(29)

\begin{align}\delta {\psi _v} = \frac{\pi }{2} + \varphi - {\psi _v}\end{align}

(30)

\begin{align}{\rm{\Delta }}\varphi = - \frac{\pi }{2} + {\psi _{vs}} - \varphi \end{align}

$s = \sqrt {{{\left( {x - {x_{vt}}} \right)}^2} + {{\left( {z - {z_{vt}}} \right)}^2}} $ is the remaining flight distance.

Similarly, the virtual line-of-sight path angle from the current position of $T{\rm{'}}$ pointing to $M$ is defined as

(31)

\begin{align}\phi = {\rm{arctan}}\left( {\frac{y}{s}} \right)\end{align}

Combining Equations 1 and 26, the time derivative of Equation 31 can be derived:

(32)

\begin{align}\dot \phi = \frac{V}{r}\left( {{\rm{cos}}\phi {\rm{sin}}\theta + {\rm{sin}}\phi {\rm{cos}}\theta {\rm{cos}}\left( {\delta {\psi _v}} \right)} \right) + \frac{{{l_{go}}}}{r}\dot \vartheta {\rm{sin}}\phi {\rm{sin}}\left( {{\rm{\Delta }}\varphi } \right) + \frac{{{V_t}}}{r}{\rm{sin}}\phi {\rm{sin}}\left( {\varphi - {\psi _{vt}}} \right)\end{align}

where $r = \sqrt {{s^2} + {y^2}} $ is the distance of the vehicle from the virtual target.
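The geometric quantities of Equations 27 and 31 can be computed as in the sketch below. It swaps the plain arctangent for `math.atan2`, which is an assumption made here to keep the angle well defined when $z = {z_{vt}}$; the two agree whenever $z - {z_{vt}} \gt 0$. The function name is hypothetical.

```python
import math


def los_geometry(x, y, z, x_vt, z_vt):
    """Line-of-sight angles and distances relative to the virtual sliding target.

    Returns (varphi, phi, s, r): the deflection angle of Equation 27, the path
    angle of Equation 31, the remaining flight distance s and the distance r.
    """
    dx = x - x_vt
    dz = z - z_vt
    s = math.hypot(dx, dz)
    varphi = math.atan2(dx, dz)   # four-quadrant variant of arctan(dx / dz)
    phi = math.atan2(y, s)        # equals arctan(y / s) because s >= 0
    r = math.hypot(s, y)
    return varphi, phi, s, r
```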

Furthermore, taking the time derivative of Equation 13 yields

(33)

\begin{align}{\dot \psi _{vs}} = \dot \vartheta \end{align}

Based on Equations 28, 32 and 33, the desired proportional navigation guidance law of the vehicle with respect to the virtual sliding target as shown in Equations 34 and 35 can be designed:

(34)

\begin{align}{\dot \psi _{vd}} = - {\lambda _1}\dot \varphi + \left( {1 + {\lambda _1}} \right){\dot \psi _{vs}}\end{align}

(35)

\begin{align}{\dot \theta _d} = - {\lambda _2}\dot \phi \end{align}

where ${\lambda _1}$ and ${\lambda _2}$ are user-defined guidance parameters, and their value ranges will be determined later.
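As a small illustration, Equations 34 and 35 map the line-of-sight rates to the desired command rates, which can then be integrated to obtain ${\xi _d}$. The sketch below uses a forward Euler step, which is an assumption made here rather than an integration scheme prescribed by the text. The function names are hypothetical.

```python
def desired_command_rates(varphi_dot, phi_dot, psi_vs_dot, lam1, lam2):
    """Equations 34 and 35: returns (theta_d_dot, psi_vd_dot)."""
    psi_vd_dot = -lam1 * varphi_dot + (1.0 + lam1) * psi_vs_dot
    theta_d_dot = -lam2 * phi_dot
    return theta_d_dot, psi_vd_dot


def integrate_commands(xi_d, rates, dt):
    """One forward Euler step for xi_d = (theta_d, psi_vd)."""
    return (xi_d[0] + dt * rates[0], xi_d[1] + dt * rates[1])
```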

3.2 Reinforcement learning adaptive controller

For the first-order multivariate tightly coupled system 4 considering the effects of unknown disturbances, treat ${g_1}\left( {\xi, u} \right)$ as the virtual control variable and define

(36)

\begin{align}{e_2} = {g_1}\left( {\xi, u} \right) - \upsilon \left( {\xi, u} \right)\end{align}

where $\upsilon $ is the virtual control law.

Taking the time derivative of Equation 7 and combining it with Equation 36 yields

(37)

\begin{align}{\dot e_1} = {f_1} + {e_2} + \upsilon + d - {\dot \xi _d}\end{align}

so the virtual controller can be designed as

(38)

\begin{align}\upsilon = - {k_1}{e_1} - {f_1} - \hat d + {\dot \xi _d}\end{align}

where ${k_1} \gt 0$ is a user-defined control gain. $\hat d$ is the estimation of $d$ .

Taking the time derivative of Equation 36 leads to

(39)

\begin{align}{\dot e_2} = \frac{{\partial {g_1}}}{{\partial \xi }}\dot \xi + \frac{{\partial {g_1}}}{{\partial u}}\dot u - \dot \upsilon \end{align}

so the time derivative of the controller $u$ can be designed as

(40)

\begin{align}\dot u = {\left( {\frac{{\partial {g_1}}}{{\partial u}}} \right)^{ - 1}}\left( { - {k_2}{e_2} - {e_1} - \frac{{\partial {g_1}}}{{\partial \xi }}\left( {{f_1} + {g_1} + \hat d} \right) + \dot \upsilon } \right)\end{align}

where ${k_2} \gt 0$ is a user-defined control gain. By integrating Equation 40, $u$ can be obtained.
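The two design steps can be sketched as follows for the two-dimensional case. The Jacobians $\partial {g_1}/\partial \xi $ and $\partial {g_1}/\partial u$ are passed in as 2-by-2 nested lists, and ${\dot \upsilon }$ is assumed to be available from the caller. The function names and the singularity threshold are hypothetical.

```python
def virtual_control(e1, f1, d_hat, xi_d_dot, k1):
    """Equation 38: upsilon = -k1*e1 - f1 - d_hat + xi_d_dot (element-wise)."""
    return [-k1 * e - f - dh + xd for e, f, dh, xd in zip(e1, f1, d_hat, xi_d_dot)]


def solve_2x2(a, b):
    """Solve a @ x = b for a 2-by-2 matrix a by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("dg1/du is singular; Assumption 1 is violated")
    return [
        (a[1][1] * b[0] - a[0][1] * b[1]) / det,
        (a[0][0] * b[1] - a[1][0] * b[0]) / det,
    ]


def control_rate(e1, e2, f1, g1, d_hat, upsilon_dot, dg_dxi, dg_du, k2):
    """Equation 40: time derivative of the control vector u."""
    xi_dot_est = [f1[i] + g1[i] + d_hat[i] for i in range(2)]
    rhs = [
        -k2 * e2[i] - e1[i]
        - sum(dg_dxi[i][j] * xi_dot_est[j] for j in range(2))
        + upsilon_dot[i]
        for i in range(2)
    ]
    return solve_2x2(dg_du, rhs)
```

Substituting the returned ${\dot u}$ into Equation 39, with ${\dot \xi }$ replaced by ${f_1} + {g_1} + \hat d$, gives ${\dot e_2} = - {k_2}{e_2} - {e_1}$, which is the identity the accompanying check verifies numerically.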

From Equations 38 and 40, it is clear that obtaining an estimate of $d$ is a prerequisite for designing the controller $u$ . The actor-critic networks in reinforcement learning provide a suitable way to deal with this problem.

In the framework of the actor network, $d$ can theoretically be expressed by

(41)

\begin{align}d = W_a^{{\rm{*}}T}{{\rm{\Phi }}_a}\left( \xi \right) + {\varepsilon _a}\end{align}

where ${{\rm{\Phi }}_a} \in {\mathbb{R}^{{l_a}}}$ is the basis function of dimension ${l_a}$ , which satisfies $\|{{\rm{\Phi }}_a}\| \le {{\rm{\Phi }}_{aM}}$ . ${\varepsilon _a} \in {\mathbb{R}^2}$ is the actor reconstruction error and satisfies $\|{\varepsilon _a}\| \le {\varepsilon _{am}}$ . $W_a^{\rm{*}} \in {\mathbb{R}^{{l_a} \times 2}}$ is the ideal actor network weight. In practice, only an estimate of $d$ can be obtained:

(42)

\begin{align}\hat d = \hat W_a^T{{\rm{\Phi }}_a}\left( \xi \right)\end{align}

where ${\hat W_a} \in {\mathbb{R}^{{l_a} \times 2}}$ is the estimated weight of the actor network.
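The estimate in Equation 42 can be sketched in a few lines of Python. This excerpt does not state the form of ${{\rm{\Phi }}_a}$ , so the Gaussian basis, the centres, the width and the two-dimensional input $\xi$ below are assumptions made only for illustration.

```python
import numpy as np

def gaussian_basis(xi, centers, width):
    """Gaussian basis vector Phi_a(xi) of dimension l_a (assumed basis type)."""
    diff = centers - xi                                   # shape (l_a, n)
    return np.exp(-np.sum(diff**2, axis=1) / (2.0 * width**2))

def disturbance_estimate(W_hat, phi):
    """Equation 42: d_hat = W_hat^T Phi_a, with W_hat of shape (l_a, 2)."""
    return W_hat.T @ phi

rng = np.random.default_rng(0)
l_a = 5
centers = rng.uniform(-1.0, 1.0, size=(l_a, 2))   # hypothetical centres
W_hat = np.zeros((l_a, 2))                        # estimated weights, initialised at zero
xi = np.array([0.2, -0.1])                        # hypothetical state sample

phi = gaussian_basis(xi, centers, width=1.0)
d_hat = disturbance_estimate(W_hat, phi)
print(phi.shape, d_hat)
```

With a Gaussian basis every entry of ${{\rm{\Phi }}_a}$ lies in $(0,1]$, so the bound ${{\rm{\Phi }}_{aM}}$ can be taken as $\sqrt{l_a}$ under this assumption.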

In the framework of the critic network, the integral penalty function can be designed as

(43)

\begin{align}J\left( t \right) = \mathop \smallint \nolimits_t^\infty \,{\mathcal{L}}\left( \tau \right)d\tau \end{align}

where ${\mathcal{L}}\left( t \right) = e_1^TQ{e_1}$ and $Q \in {\mathbb{R}^{2 \times 2}}$ is a positive definite matrix. $J$ can theoretically be expressed by

(44)

\begin{align}J = W_c^{{\rm{*}}T}{{\rm{\Phi }}_c}\left( {{e_1}} \right) + {\varepsilon _c}\end{align}

where ${{\rm{\Phi }}_c} \in {\mathbb{R}^{{l_c}}}$ is the basis function of dimension ${l_c}$ , which satisfies $\|{\dot{\Phi}_c}\| \le {{\rm{\Phi }}_{cdM}}$ . ${\varepsilon _c}$ is the critic reconstruction error and satisfies $\left| {{{\dot \varepsilon }_c}} \right| \le {\varepsilon _{cdm}}$ . $W_c^{\rm{*}} \in {\mathbb{R}^{{l_c}}}$ is the ideal critic network weight. In practice, only an estimate of $J$ is available:

(45)

\begin{align}\hat J = \hat W_c^T{{\rm{\Phi }}_c}\left( {{e_1}} \right)\end{align}

where ${\hat W_c} \in {\mathbb{R}^{{l_c}}}$ is the estimated weight of the critic network.

Define the weight error of the critic network as ${\tilde W_c} = {\hat W_c} - W_c^{\rm{*}}$ . In addition, define the critic error as

(46)

\begin{align}{e_c} = {\mathcal{L}} + \dot{\hat{J}} = {\mathcal{L}} + \hat W_c^T{\dot{\Phi}_c}\end{align}

and the critic error function can be designed as

(47)

\begin{align}{E_c} = \frac{1}{2}e_c^2\end{align}

According to the gradient descent criterion, the adaptive update law of ${\hat W_c}$ can be deduced as follows:

(48)

\begin{align}{\dot{\hat{W_c}}} = - {\lambda _c}\left( {{\mathcal{L}} + \hat W_c^T{{\dot{\Phi}}_c}} \right){\dot{\Phi}_c} - {\lambda _c}{\hbar _c}{\hat W_c}\end{align}

where ${\lambda _c} \gt 0$ and ${\hbar _c} \gt 0$ are the user-defined learning rates of the critic network.
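A minimal Python sketch of the critic update law in Equation 48, advanced by one forward-Euler step, is given below. The numerical values, the step size and the assumption that ${\dot{\Phi}_c}$ is already available (for example from a finite difference of the basis output) are illustrative and not taken from the paper.

```python
import numpy as np

def critic_weight_rate(W_c_hat, phi_c_dot, e1, Q, lam_c, hbar_c):
    """Right-hand side of Equation 48."""
    cost = e1 @ Q @ e1                      # L = e1^T Q e1
    e_c = cost + W_c_hat @ phi_c_dot        # critic error, Equation 46
    return -lam_c * e_c * phi_c_dot - lam_c * hbar_c * W_c_hat

# Hypothetical values for a single integration step.
Q = np.eye(2)
e1 = np.array([0.1, -0.2])
W_c_hat = np.array([0.5, -0.3, 0.2])
phi_c_dot = np.array([0.4, 0.1, -0.2])
lam_c, hbar_c, dt = 5.0, 0.1, 1e-3

W_c_next = W_c_hat + dt * critic_weight_rate(W_c_hat, phi_c_dot, e1, Q, lam_c, hbar_c)
print(W_c_next)
```

The first term is the negative gradient of ${E_c}$ with respect to ${\hat W_c}$ scaled by ${\lambda _c}$ ; the second term, weighted by ${\hbar _c}$ , pulls the weights towards zero and provides the negative-definite weight term used later in the stability proof.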

Define the weight error of the actor network as ${\tilde W_a} = {\hat W_a} - W_a^{\rm{*}}$ , and define the approximation error ${H_a}$ as

(49)

\begin{align}{H_a} = \hat W_a^T{{\rm{\Phi }}_a} - W_a^{{\rm{*}}T}{{\rm{\Phi }}_a} = \tilde W_a^T{{\rm{\Phi }}_a}\end{align}

Then, the actor error can be defined as

(50)

\begin{align}{e_a} = {H_a} + {{\rm{\Omega }}_a}\hat J\end{align}

where ${{\rm{\Omega }}_a} \in {\mathbb{R}^{2 \times 1}}$ is the user-defined gain matrix satisfying $\|{{\rm{\Omega }}_a}\| \le {{\rm{\Omega }}_{aM}}$ , and ${{\rm{\Omega }}_{aM}}$ is a positive constant.

Furthermore, the actor error function can be designed as

(51)

\begin{align}{E_a} = \frac{1}{2}e_a^T{e_a}\end{align}

According to the gradient descent criterion, the adaptive update law of ${\hat W_a}$ can be deduced as follows:

(52)

\begin{align} {\dot{\hat{W_a}}} & = - {\lambda _a}{{\rm{\Phi }}_a}e_a^T - {\lambda _a}{\hbar _a}{\hat W_a}\nonumber\\& = - {\lambda _a}{{\rm{\Phi }}_a}\left( {{\rm{\Phi }}_a^T{{\hat W}_a} + \hat J{\rm{\Omega }}_a^T} \right) - {\lambda _a}{\hbar _a}{\hat W_a}\end{align}

where ${\lambda _a} \gt 0$ and ${\hbar _a} \gt 0$ are the user-defined learning rates of the actor network.
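The actor update can be sketched in the same way. The code below implements the second line of Equation 52 as printed, that is, evaluated with the estimated weights ${\hat W_a}$ ; the array shapes, gains and numerical values are assumptions for illustration only.

```python
import numpy as np

def actor_weight_rate(W_a_hat, phi_a, J_hat, omega_a, lam_a, hbar_a):
    """Second line of Equation 52; returns an array shaped like W_a_hat."""
    # Row vector Phi_a^T W_a_hat + J_hat * Omega_a^T, stored with shape (2,)
    row = phi_a @ W_a_hat + J_hat * omega_a
    return -lam_a * np.outer(phi_a, row) - lam_a * hbar_a * W_a_hat

# Hypothetical values for a single forward-Euler step.
phi_a = np.array([0.9, 0.5, 0.1])
W_a_hat = np.array([[0.2, -0.1],
                    [0.0, 0.3],
                    [0.1, 0.1]])
omega_a = np.array([0.5, 0.5])      # user-defined gain Omega_a, flattened
J_hat = 0.13                        # critic output from Equation 45
lam_a, hbar_a, dt = 5.0, 0.1, 1e-3

W_a_next = W_a_hat + dt * actor_weight_rate(W_a_hat, phi_a, J_hat, omega_a, lam_a, hbar_a)
print(W_a_next)
```

The critic output $\hat J$ enters the actor law through ${{\rm{\Omega }}_a}$ , which is how the accumulated tracking cost influences the disturbance estimate.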

3.3 Stability and convergence analysis

Theorem 1. Consider Assumptions 1–2 and Lemmas 1–4. If the control law 40 is designed for system 4, the actor-critic networks 42 and 45 and the corresponding adaptive weight update laws 48 and 52 are designed to cope with $d$ , and the Lyapunov candidate function shown in Equation 53 is constructed, then the tracking error ${e_1}$ is uniformly ultimately bounded. The weight errors ${\tilde W_a}$ and ${\tilde W_c}$ of the actor-critic networks are also uniformly ultimately bounded.

Proof. Construct the Lyapunov candidate function as follows:

(53)

\begin{align}L = {L_1} + {L_2} + {L_3}\end{align}

where

(54)

\begin{align}{L_1} = \frac{1}{2}e_1^T{e_1} + \frac{1}{2}e_2^T{e_2}\end{align}

(55)

\begin{align}{L_2} = \frac{1}{2}Tr\left( {\tilde W_a^T\lambda _a^{ - 1}{{\tilde W}_a}} \right)\end{align}

(56)

\begin{align}{L_3} = \frac{1}{2}Tr\left( {\tilde W_c^T\lambda _c^{ - 1}{{\tilde W}_c}} \right)\end{align}

Taking the time derivative of Equation 54 and combining Equations 37–42, it can be deduced that

(57)

\begin{align} {\dot L_1} & = e_1^T{\dot e_1} + e_2^T{\dot e_2}\nonumber\\&= - {k_1}e_1^T{e_1} + e_1^T{e_2} - e_1^T\tilde W_a^T{{\rm{\Phi }}_a} + e_1^T{\varepsilon _a} - {k_2}e_2^T{e_2} - e_2^T{e_1} - e_2^T\frac{{\partial g}}{{\partial \xi }}\tilde W_a^T{{\rm{\Phi }}_a} + e_2^T\frac{{\partial g}}{{\partial \xi }}{\varepsilon _a}\nonumber\\& \le - {k_1}e_1^T{e_1} + \frac{1}{2}e_1^T{e_1} + \frac{1}{2}{\rm{\Phi }}_a^T{\tilde W_a}\tilde W_a^T{{\rm{\Phi }}_a} + \frac{1}{2}e_1^T{e_1} + \frac{1}{2}\varepsilon _a^T{\varepsilon _a} \nonumber\\& \quad -{k_2}e_2^T{e_2} + \frac{1}{2}e_2^T\frac{{\partial g}}{{\partial \xi }}{\left( {\frac{{\partial g}}{{\partial \xi }}} \right)^T}{e_2} + \frac{1}{2}{\rm{\Phi }}_a^T{\tilde W_a}\tilde W_a^T{{\rm{\Phi }}_a} + \frac{1}{2}e_2^T\frac{{\partial g}}{{\partial \xi }}{\left( {\frac{{\partial g}}{{\partial \xi }}} \right)^T}{e_2} + \frac{1}{2}\varepsilon _a^T{\varepsilon _a}\nonumber\\& \le - \left( {{k_1} - 1} \right)e_1^T{e_1} - \left( {{k_2} - Tr\left( {\frac{{\partial g}}{{\partial \xi }}{{\left( {\frac{{\partial g}}{{\partial \xi }}} \right)}^T}} \right)} \right)e_2^T{e_2} + {\rm{\Phi }}_{aM}^2Tr\left( {\tilde W_a^T{{\tilde W}_a}} \right) + \varepsilon _{am}^2\end{align}

Taking the time derivative of Equation 55 and combining Equation 52, it can be deduced that

(58)

\begin{align}{\dot L_2} & = Tr\left( {\tilde W_a^T\lambda _a^{ - 1}{{\dot{\hat{W}}}_a}} \right)\nonumber\\& = Tr\left( {\tilde W_a^T\lambda _a^{ - 1}\left[ { - {\lambda _a}{{\rm{\Phi }}_a}\left( {{\rm{\Phi }}_a^T{{\hat W}_a} + \hat J{\rm{\Omega }}_a^T} \right) - {\lambda _a}{\hbar _a}{{\hat W}_a}} \right]} \right)\nonumber\\& \le - Tr\left( {\tilde W_a^T{{\rm{\Phi }}_a}{\rm{\Phi }}_a^T{{\tilde W}_a}} \right) + \frac{1}{2}Tr\left( {\tilde W_a^T{{\rm{\Phi }}_a}{\rm{\Phi }}_a^T{{\tilde W}_a}} \right) + \frac{1}{2}Tr\left( {W_a^T{{\rm{\Phi }}_a}{\rm{\Phi }}_a^T{W_a}} \right)\nonumber\\& \quad + \frac{1}{2}Tr\left( {\tilde W_a^T{{\rm{\Phi }}_a}{\rm{\Phi }}_a^T{{\tilde W}_a}} \right) + \frac{1}{2}Tr\left( {\tilde W_c^T{{\rm{\Phi }}_c}{\rm{\Omega }}_a^T{{\rm{\Omega }}_a}{\rm{\Phi }}_c^T{{\tilde W}_c}} \right)\nonumber\\& \quad + \frac{1}{2}Tr\left( {\tilde W_a^T{{\rm{\Phi }}_a}{\rm{\Phi }}_a^T{{\tilde W}_a}} \right) + \frac{1}{2}Tr\left( {W_c^T{{\rm{\Phi }}_c}{\rm{\Omega }}_a^T{{\rm{\Omega }}_a}{\rm{\Phi }}_c^T{W_c}} \right)\nonumber\\& \quad - {\hbar _a}Tr\left( {\tilde W_a^T{{\tilde W}_a}} \right) + \frac{1}{2}{\hbar _a}Tr\left( {\tilde W_a^T{{\tilde W}_a}} \right) + \frac{1}{2}{\hbar _a}Tr\left( {W_a^T{W_a}} \right)\nonumber\\& \le - \frac{1}{2}\left( {{\hbar _a} - {\rm{\Phi }}_{aM}^2} \right)Tr\left( {\tilde W_a^T{{\tilde W}_a}} \right) + \frac{1}{2}\left( {{\hbar _a} + {\rm{\Phi }}_{aM}^2} \right)Tr\left( {W_a^T{W_a}} \right)\nonumber\\& \quad + \frac{1}{2}{\rm{\Phi }}_{cM}^2{\rm{\Omega }}_{aM}^2Tr\left( {\tilde W_c^T{{\tilde W}_c}} \right) + \frac{1}{2}{\rm{\Phi }}_{cM}^2{\rm{\Omega }}_{aM}^2Tr\left( {W_c^T{W_c}} \right)\end{align}

Similarly, taking the time derivative of Equation 56 and combining Equation 48, it can be deduced that

(59)

\begin{align} {\dot L_3} & = Tr\left( {\tilde W_c^T\lambda _c^{ - 1}{{\dot{\hat{W}}}_c}} \right)\nonumber\\[3pt]& = Tr\left( {\tilde W_c^T\lambda _c^{ - 1}\left[ { - {\lambda _c}\left( {{\mathcal{L}} + \hat W_c^T{{\dot{\Phi}}_c}} \right){{\dot{\Phi}}_c} - {\lambda _c}{\hbar _c}{{\hat W}_c}} \right]} \right)\nonumber\\[3pt]& \le - Tr\left( {\tilde W_c^T{{\dot{\Phi}}_c}\dot{\Phi}_c^T{{\tilde W}_c}} \right) + Tr\left( {\tilde W_c^T{{\dot{\Phi}}_c}\dot{\Phi}_c^T{{\tilde W}_c}} \right) + Tr\left( {W_c^T{{\dot{\Phi}}_c}\dot{\Phi}_c^T{W_c}} \right) + \frac{1}{2}Tr\left( {\tilde W_c^T{{\dot{\Phi}}_c}\dot{\Phi}_c^T{{\tilde W}_c}} \right)\nonumber\\[3pt]& \quad + \frac{1}{2}\varepsilon _{cdm}^2 - {\hbar _c}Tr\left( {\tilde W_c^T{{\tilde W}_c}} \right) + \frac{1}{2}{\hbar _c}Tr\left( {\tilde W_c^T{{\tilde W}_c}} \right) + \frac{1}{2}{\hbar _c}Tr\left( {W_c^T{W_c}} \right)\nonumber\\[3pt]& \le - \frac{1}{2}\left( {{\hbar _c} - {\rm{\Phi }}_{cdM}^2} \right)Tr\left( {\tilde W_c^T{{\tilde W}_c}} \right) + \frac{1}{2}\left( {{\hbar _c} + 2{\rm{\Phi }}_{cdM}^2} \right)Tr\left( {W_c^T{W_c}} \right) + \frac{1}{2}\varepsilon _{cdm}^2\end{align}

Finally, taking the time derivative of Equation 53 and substituting Equations 57–59 gives

(60)

\begin{align} \dot L & = {\dot L_1} + {\dot L_2} + {\dot L_3}\nonumber\\[3pt]& \le - \left( {{k_1} - 1} \right)e_1^T{e_1} - \left( {{k_2} - Tr\left( {\frac{{\partial g}}{{\partial \xi }}{{\left( {\frac{{\partial g}}{{\partial \xi }}} \right)}^T}} \right)} \right)e_2^T{e_2} - \frac{1}{2}\left( {{\hbar _a} - 3{\rm{\Phi }}_{aM}^2} \right)Tr\left( {\tilde W_a^T{{\tilde W}_a}} \right)\nonumber\\[3pt]& \quad - \frac{1}{2}\left( {{\hbar _c} - {\rm{\Phi }}_{cdM}^2 - {\rm{\Phi }}_{cM}^2{\rm{\Omega }}_{aM}^2} \right)Tr\left( {\tilde W_c^T{{\tilde W}_c}} \right) + \varepsilon _{am}^2 + \frac{1}{2}\varepsilon _{cdm}^2\nonumber\\[3pt]& \quad + \frac{1}{2}\left( {{\hbar _a} + {\rm{\Phi }}_{aM}^2} \right)Tr\left( {W_a^T{W_a}} \right) + \frac{1}{2}\left( {{\hbar _c} + 2{\rm{\Phi }}_{cdM}^2 + {\rm{\Phi }}_{cM}^2{\rm{\Omega }}_{aM}^2} \right)Tr\left( {W_c^T{W_c}} \right)\end{align}

Equation 60 satisfies $\dot L \le - \kappa L\left( t \right) + \delta $ under the condition that $\left( {{k_1} - 1} \right) \gt 0$ , $\left( {{k_2} - Tr\left( {\frac{{\partial g}}{{\partial \xi }}{{\left( {\frac{{\partial g}}{{\partial \xi }}} \right)}^T}} \right)} \right) \gt 0$ , $\left( {{\hbar _a} - 3{\rm{\Phi }}_{aM}^2} \right) \gt 0$ and $\left( {{\hbar _c} - {\rm{\Phi }}_{cdM}^2 - {\rm{\Phi }}_{cM}^2{\rm{\Omega }}_{aM}^2} \right) \gt 0$ , where

\begin{align*}\kappa & = {\rm{min}}\left\{ {\begin{array}{c}{2\left( {{k_1} - 1} \right),2\left( {{k_2} - Tr\left( {\frac{{\partial g}}{{\partial \xi }}{{\left( {\frac{{\partial g}}{{\partial \xi }}} \right)}^T}} \right)} \right),}\\[3pt]{\left( {{\hbar _a} - 3{\rm{\Phi }}_{aM}^2} \right),\left( {{\hbar _c} - {\rm{\Phi }}_{cdM}^2 - {\rm{\Phi }}_{cM}^2{\rm{\Omega }}_{aM}^2} \right)}\end{array}} \right\}\\[5pt]\delta & = \frac{1}{2}\left( {{\hbar _a} + {\rm{\Phi }}_{aM}^2} \right)Tr\left( {W_a^T{W_a}} \right) + \frac{1}{2}\left( {{\hbar _c} + 2{\rm{\Phi }}_{cdM}^2 + {\rm{\Phi }}_{cM}^2{\rm{\Omega }}_{aM}^2} \right)Tr\left( {W_c^T{W_c}} \right) + \varepsilon _{am}^2 + \frac{1}{2}\varepsilon _{cdm}^2\end{align*}

Therefore, ${e_1}$ , ${e_2}$ , ${\tilde W_a}$ and ${\tilde W_c}$ are uniformly ultimately bounded.
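The step from the differential inequality to this boundedness claim can be made explicit with the standard comparison argument. The following worked step is an illustrative addition, assuming that $\kappa \gt 0$ and $\delta $ can be treated as constants. Multiplying $\dot L \le - \kappa L + \delta $ by ${e^{\kappa t}}$ and integrating over $\left[ {0,t} \right]$ gives

\begin{align*}L\left( t \right) \le \left( {L\left( 0 \right) - \frac{\delta }{\kappa }} \right){e^{ - \kappa t}} + \frac{\delta }{\kappa }\end{align*}

so $L\left( t \right)$ converges exponentially to the set $L \le \delta /\kappa $ . Since $\frac{1}{2}e_1^T{e_1} \le L$ , the tracking error ultimately satisfies $\|{e_1}\| \le \sqrt {2\delta /\kappa } $ , and the residual bound shrinks as the gains that determine $\kappa $ are increased.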

Remark 3. The boundedness of ${e_1}$ indicates that $\theta \to {\theta _d}$ , ${\psi _v} \to {\psi _{vd}}$ as $t \to \infty $ , that is, the objective of this paper highlighted in Remark 1 is satisfied. The boundedness of ${\tilde W_a},{\tilde W_c}$ indicates that ${\hat W_a} \to W_a^{\rm{*}}$ , ${\hat W_c} \to W_c^{\rm{*}}$ as $t \to \infty $ , that is, $\hat d \to d$ as $t \to \infty $ . In conclusion, the designed actor-critic networks and the corresponding adaptive weight update laws can cope with unknown disturbances well.

Theorem 2. For the spiral trajectory in the yaw plane, consider Theorem 1 and the geometric relationship shown in Fig. 4 of He et al. [20]. In addition, let the initial angle between the velocity vector of the vehicle and the virtual line-of-sight be such that $\left| {\delta {\psi _v}\left( 0 \right)} \right| \lt \frac{\pi }{2}$ . If $\left| \theta \right| \lt \frac{\pi }{2}$ , the guidance parameter ${\lambda _1} \lt - 1$ renders $s \to 0$ as $t \to \infty $ , regardless of the value of $V$ . The guidance parameter ${\lambda _1} \lt - 2$ not only makes the flight trajectory converge to the spiral trajectory, but also makes the velocity vector of the vehicle converge to the virtual line-of-sight, meaning that $\varphi - {\psi _v} \to - \frac{\pi }{2}$ and $\varphi - {\psi _{vs}} \to - \frac{\pi }{2}$ .

Proof. It follows from Theorem 1 that

(61)

\begin{align}{\psi _v} = {\psi _{vd}} + {e_{12}}\end{align}

where $\left| {{e_{12}}} \right|$ is an arbitrarily small constant. Taking the time derivative of Equation 61 gives

(62)

\begin{align}{\dot \psi _v} = {\dot \psi _{vd}}\end{align}

Taking the time derivative of Equation 29 and substituting Equations 28, 34 and 62, the following equation can be derived:

(63)

\begin{align} \delta {\dot \psi _v} & = \dot \varphi - {\dot \psi _v} = \left( {1 + {\lambda _1}} \right)\left( {\dot \varphi - {{\dot \psi }_{vs}}} \right)\nonumber\\[5pt]& = \left( {1 + {\lambda _1}} \right)\left( {\frac{{V{\rm{cos}}\theta }}{s}{\rm{sin}}\left( {\delta {\psi _v}} \right) - \frac{{{V_t}}}{s}{\rm{cos}}\left( {{\psi _{vt}} - \varphi } \right) - \dot \vartheta {\rm{si}}{{\rm{n}}^2}\left( {{\rm{\Delta }}\varphi } \right)} \right)\end{align}

Neglecting the second-order small quantity $\dot \vartheta {\rm{si}}{{\rm{n}}^2}\left( {{\rm{\Delta }}\varphi } \right)$ and the action term $\frac{{{V_t}}}{s}{\rm{cos}}\left( {{\psi _{vt}} - \varphi } \right)$ of the low-speed moving target in Equation 63, it can be rewritten as

(64)

\begin{align}\delta {\dot \psi _v} = \left( {1 + {\lambda _1}} \right)\frac{{V{\rm{cos}}\theta }}{s}{\rm{sin}}\left( {\delta {\psi _v}} \right)\end{align}

Treating the time derivative of $s$ in the same way gives

(65)

\begin{align}\dot s = - V{\rm{cos}}\theta {\rm{cos}}\left( {\delta {\psi _v}} \right)\end{align}

Dividing Equation 65 by Equation 64 gives

(66)

\begin{align}\frac{{ds}}{{d\left( {\delta {\psi _v}} \right)}} = - \frac{s}{{1 + {\lambda _1}}}\frac{{{\rm{cos}}\left( {\delta {\psi _v}} \right)}}{{{\rm{sin}}\left( {\delta {\psi _v}} \right)}}\end{align}

Integrating Equation 66 gives

(67)

\begin{align}s = \ell {\left| {{\rm{sin}}\left( {\delta {\psi _v}} \right)} \right|^{ - \frac{1}{{1 + {\lambda _1}}}}}\end{align}

where $\ell \gt 0$ is the bounded integration constant.

If the initial value satisfies $0 \lt \delta {\psi _v}\left( 0 \right) \lt \frac{\pi }{2}$ , then substituting Equation 67 into Equation 64 gives

(68)

\begin{align}\delta {\dot \psi _v} = \left( {1 + {\lambda _1}} \right)\frac{{V{\rm{cos}}\theta }}{\ell }{\left( {{\rm{sin}}\left( {\delta {\psi _v}} \right)} \right)^{\frac{{{\lambda _1} + 2}}{{{\lambda _1} + 1}}}}\end{align}

Note that $V{\rm{cos}}\theta \gt 0$ holds in every flight state. In the case $0 \lt \delta {\psi _v}\left( 0 \right) \lt \frac{\pi }{2}$ , when the guidance parameter satisfies ${\lambda _1} \lt - 1$ , we have $\delta {\dot \psi _v} \lt 0$ , which indicates that $\delta {\psi _v} \to 0$ as $t \to \infty $ . From Equation 67, the remaining flight distance between the vehicle and the virtual sliding target satisfies $s \to 0$ as $\delta {\psi _v} \to 0$ . Furthermore, from Equation 29, $\varphi - {\psi _v} \to - \frac{\pi }{2}$ as $\delta {\psi _v} \to 0$ . From Equation 68, when ${\lambda _1} \lt - 2$ , we have $\delta {\dot \psi _v} \to 0$ as $\delta {\psi _v} \to 0$ , so $\dot \varphi - {\dot \psi _v} \to 0$ , and combining Equations 34 and 62, it can be observed that ${\dot \psi _{vs}} \to {\dot \psi _v}$ . Therefore, the flight trajectory converges to the spiral trajectory. This means that ${\rm{\Delta }}\varphi \to 0$ , i.e. $\varphi - {\psi _{vs}} \to - \frac{\pi }{2}$ . Together, $\delta {\psi _v} \to 0$ and ${\rm{\Delta }}\varphi \to 0$ indicate that the velocity vector of the vehicle converges to the virtual line-of-sight. The same conclusion can be obtained when $ - \frac{\pi }{2} \lt \delta {\psi _v}\left( 0 \right) \lt 0$ .
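The yaw-plane argument can be checked numerically. The sketch below integrates Equations 64 and 65 with a classical fourth-order Runge-Kutta scheme and monitors the integration constant $\ell $ implied by Equation 67. It assumes a constant $V{\rm{cos}}\theta $ and normalised, hypothetical initial values; the function names are illustrative and none of this reproduces the simulation setup of the paper.

```python
import math

def yaw_plane_rates(dpsi, s, lam1, v_cos_theta):
    """Right-hand sides of Equations 64 and 65."""
    dpsi_dot = (1.0 + lam1) * v_cos_theta / s * math.sin(dpsi)
    s_dot = -v_cos_theta * math.cos(dpsi)
    return dpsi_dot, s_dot

def invariant(dpsi, s, lam1):
    """Integration constant ell implied by Equation 67."""
    return s * abs(math.sin(dpsi)) ** (1.0 / (1.0 + lam1))

def simulate(dpsi0, s0, lam1, v_cos_theta=1.0, dt=1e-3, s_stop=0.5, max_steps=1_000_000):
    """RK4 integration until the remaining distance s drops below s_stop."""
    dpsi, s = dpsi0, s0
    for _ in range(max_steps):
        if s <= s_stop:
            break
        a1, b1 = yaw_plane_rates(dpsi, s, lam1, v_cos_theta)
        a2, b2 = yaw_plane_rates(dpsi + 0.5 * dt * a1, s + 0.5 * dt * b1, lam1, v_cos_theta)
        a3, b3 = yaw_plane_rates(dpsi + 0.5 * dt * a2, s + 0.5 * dt * b2, lam1, v_cos_theta)
        a4, b4 = yaw_plane_rates(dpsi + dt * a3, s + dt * b3, lam1, v_cos_theta)
        dpsi += dt / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
        s += dt / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
    return dpsi, s

# Hypothetical normalised initial condition with lambda_1 < -2.
dpsi0, s0, lam1 = 1.0, 10.0, -3.0
dpsi_f, s_f = simulate(dpsi0, s0, lam1)
ell0 = invariant(dpsi0, s0, lam1)
ell_f = invariant(dpsi_f, s_f, lam1)
print(dpsi_f, s_f, ell0, ell_f)
```

Along the simulated trajectory both $\delta {\psi _v}$ and $s$ decrease while $\ell $ stays constant to within integration error, which is the behaviour that Equation 67 predicts for ${\lambda _1} \lt - 1$ .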

Theorem 3. Considering Theorem 1 and Theorem 2, the guidance parameter ${\lambda _2} \gt 1$ renders the distance from the vehicle to the virtual sliding target $r \to 0$ as $t \to \infty $ . The guidance parameter ${\lambda _2} \gt 2$ also renders $\phi + \theta \to 0$ .

Proof. It follows from Theorem 1 that

(69)

\begin{align}\theta = {\theta _d} + {e_{11}}\end{align}

where $\left| {{e_{11}}} \right|$ is an arbitrarily small constant. Taking the time derivative of Equation 69 gives

(70)

\begin{align}\dot \theta = {\dot \theta _d}\end{align}

Define $\sigma = \phi + \theta $ . Taking its time derivative and combining Equations 35 and 70 gives

(71)

\begin{align}\dot \sigma = \dot \phi + \dot \theta = \left( {1 - {\lambda _2}} \right)\dot \phi \end{align}

Taking the time derivative of $r$ , and then neglecting the low-speed moving target action term and noting that $\delta {\psi _v} \to 0$ , ${\rm{\Delta }}\varphi \to 0$ as $t \to \infty $ , yields

(72)

\begin{align}\dot r = - V{\rm{cos}}\sigma \end{align}

Similarly, we have

(73)

\begin{align}\dot \phi = \frac{V}{r}{\rm{sin}}\sigma \end{align}

Dividing Equation 72 by Equation 71 and substituting Equation 73 yields

(74)

\begin{align}\frac{{dr}}{{d\sigma }} = \frac{r}{{{\lambda _2} - 1}}\frac{{{\rm{cos}}\sigma }}{{{\rm{sin}}\sigma }}\end{align}

Integrating Equation 74 gives

(75)

\begin{align}r = \ell {\rm{'}}{\left| {{\rm{sin}}\sigma } \right|^{\frac{1}{{{\lambda _2} - 1}}}}\end{align}

where $\ell {\rm{'}} \gt 0$ is the bounded integration constant. Combining Equations 71, 73 and 75 gives

(76)

\begin{align}\dot \sigma = \left( {1 - {\lambda _2}} \right)\frac{V}{{\ell {\rm{'}}}}{\left( {{\rm{sin}}\sigma } \right)^{\frac{{{\lambda _2} - 2}}{{{\lambda _2} - 1}}}}\end{align}

When $0 \lt \sigma \left( 0 \right) \lt \pi $ and the guidance parameter satisfies ${\lambda _2} \gt 1$ , we have $\dot \sigma \lt 0$ , so $\sigma \to 0$ as $t \to \infty $ . From Equation 75, the spatial distance from the vehicle to the virtual sliding target satisfies $r \to 0$ as $\sigma \to 0$ . If ${\lambda _2} \gt 2$ , then $\dot \sigma \to 0$ as $\sigma \to 0$ . Since $\dot \sigma = \left( {1 - {\lambda _2}} \right)\dot \phi $ , it follows that $\dot \phi \to 0$ . Thus, $\phi $ approaches a constant and $\theta $ approaches the negative of the same constant. That is, in the pitch plane, $\phi + \theta \to 0$ . The same conclusion can be obtained when $ - \pi \lt \sigma \left( 0 \right) \lt 0$ .
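The integration step from Equation 74 to Equation 75 and the substitution that leads to Equation 76 can be written out explicitly. This worked step is an illustrative addition; the expression for $\ell {\rm{'}}$ in terms of the initial values follows from the separation of variables and is not stated in the paper.

\begin{align*}\frac{{dr}}{r} & = \frac{1}{{{\lambda _2} - 1}}\frac{{{\rm{cos}}\sigma }}{{{\rm{sin}}\sigma }}d\sigma \;\; \Rightarrow \;\;{\rm{ln}}\,r = \frac{1}{{{\lambda _2} - 1}}{\rm{ln}}\left| {{\rm{sin}}\sigma } \right| + c\;\; \Rightarrow \;\;r = \ell {\rm{'}}{\left| {{\rm{sin}}\sigma } \right|^{\frac{1}{{{\lambda _2} - 1}}}},\quad \ell {\rm{'}} = r\left( 0 \right){\left| {{\rm{sin}}\sigma \left( 0 \right)} \right|^{ - \frac{1}{{{\lambda _2} - 1}}}}\\[5pt]\dot \sigma & = \left( {1 - {\lambda _2}} \right)\frac{V}{r}{\rm{sin}}\sigma = \left( {1 - {\lambda _2}} \right)\frac{V}{{\ell {\rm{'}}}}{\left( {{\rm{sin}}\sigma } \right)^{1 - \frac{1}{{{\lambda _2} - 1}}}} = \left( {1 - {\lambda _2}} \right)\frac{V}{{\ell {\rm{'}}}}{\left( {{\rm{sin}}\sigma } \right)^{\frac{{{\lambda _2} - 2}}{{{\lambda _2} - 1}}}}\end{align*}

The exponent $\frac{1}{{{\lambda _2} - 1}}$ is positive for ${\lambda _2} \gt 1$ , which gives $r \to 0$ as $\sigma \to 0$ , and the exponent $\frac{{{\lambda _2} - 2}}{{{\lambda _2} - 1}}$ is positive only for ${\lambda _2} \gt 2$ , which is why the stronger condition is needed for $\dot \sigma \to 0$ .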

Remark 4. As can be seen from Equations 61, 62 and Equations 69, 70, the assumptions ${\dot \psi _v} = {\dot \psi _{vd}}$ and $\dot \theta = {\dot \theta _d}$ made in He and Yan [19] and He et al. [20] are verified.

4.0 Simulations

In this section, simulations are presented to demonstrate the validity and superiority of the proposed reinforcement learning-based adaptive spiral-diving guidance method. Specifically, the validity of the proposed method is verified by striking a stationary target (Case 1) and a low-speed moving target (Case 2). In addition, the superiority of the proposed method is demonstrated by comparing it with a method without RL and a method that uses RBF neural networks to handle the unknown disturbances.

The parameters of the vehicle are as follows: the vehicle mass is $m = 200$ kg and the reference area is ${S_{ref}} = 1.8$ m $^2$ . The initial position is ${\left( {{x_0},{y_0},{z_0}} \right)^T} = {\left( {0,32,60} \right)^T}$ km, and the initial velocity is ${V_0} = 1200$ m/s. The initial path angle is ${\theta _0} = - {2^ \circ }$ , and the initial deflection angle is ${\psi _{v0}} = {28^ \circ }$ . The terminal deflection angle is ${\psi _{vf}} = {363^ \circ }$ . Moreover, the gravitational acceleration is $g = 9.81$ m/s $^2$ . The position of the stationary target is ${\left( {0,0,0} \right)^T}$ km, which is also the starting point of the low-speed moving target. Note that the low-speed target moves only in the horizontal plane with velocity ${V_t} = 9$ m/s and directional angle ${\psi _{vt}} = - {90^ \circ }$ . Other parameters of the two cases are shown in Table 1.
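For readers who wish to set up a comparable scenario, the stated parameters can be collected in a small configuration object. The sketch below is only a convenience container: the class and field names are hypothetical, the reference area is assumed to be in square metres, and the additional parameters of Table 1 are not included because they are not listed in the text.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class ScenarioParameters:
    """Scenario values quoted in the text, converted to SI units and radians."""
    mass_kg: float = 200.0
    s_ref_m2: float = 1.8                                   # assumed to be an area in m^2
    position0_m: tuple = (0.0, 32_000.0, 60_000.0)          # (x0, y0, z0)
    speed0_mps: float = 1200.0
    theta0_rad: float = math.radians(-2.0)                  # initial path angle
    psi_v0_rad: float = math.radians(28.0)                  # initial deflection angle
    psi_vf_rad: float = math.radians(363.0)                 # terminal deflection angle
    gravity_mps2: float = 9.81
    target_position_m: tuple = (0.0, 0.0, 0.0)
    target_speed_mps: float = 0.0                           # 0 for Case 1, 9 for Case 2
    target_heading_rad: float = math.radians(-90.0)

case1 = ScenarioParameters()                        # stationary target
case2 = ScenarioParameters(target_speed_mps=9.0)    # low-speed moving target
print(case1)
print(case2)
```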

Table 1. The parameters of two cases

Figure 2. Simulation profiles for Case 1. (a) 3-D trajectory of the vehicle. (b) Trajectories of the vehicle, the target and the virtual sliding target in yaw plane. (c) Vehicle velocity. (d) Path angle tracking curve. (e) Deflection angle tracking curve. (f) Tracking errors. (g) Control inputs. (h) Adaptive weights. (i) Real and estimated values of disturbances.

The simulation results for Case 1 and Case 2 are shown in Figs. 2 and 3, respectively. The 3-D spiral trajectories of the vehicle in the two cases are shown in Figs. 2(a) and 3(a), and the corresponding trajectories of the vehicle, the target and the virtual sliding target in the yaw plane are shown in Figs. 2(b) and 3(b). They indicate that the vehicle is able to hit the target in both cases, with miss distances of 0.449 m and 0.6092 m, respectively. Figures 2(c) and 3(c) show the vehicle velocity response profiles in the two cases. Figures 2(d) and 3(d) show the real path angle versus the desired path angle, and Figs. 2(e) and 3(e) show the real deflection angle versus the desired deflection angle. At the moment of hitting the target, the desired path angle and the desired deflection angle exhibit a small jump. The reason is that the vehicle needs to slow its descent rate in the y-direction and adjust its motion in the x- and z-directions to reduce the miss distance. The tracking errors of the path angle and the deflection angle in the two cases are shown in Figs. 2(f) and 3(f). The errors remain bounded, which is consistent with the uniform ultimate boundedness established in Theorem 1. The control input profiles under the reinforcement learning-based adaptive law are depicted in Figs. 2(g) and 3(g). Figures 2(h) and 3(h) show the adaptive adjustment profiles of the weights in the two cases, and their effects are verified in Figs. 2(i) and 3(i), which show that the unknown disturbances are well compensated. In brief, the above simulation results demonstrate the validity of the proposed method.

Without loss of generality, a comparative simulation of the proposed method, the method without RL and the RBF method is carried out based on Case 2. The striking effects of the three methods are shown in Fig. 4. As shown in Table 2, the miss distances under the three methods are 0.6092 m, 0.8718 m and 0.7602 m, respectively. The miss distance of the proposed method is therefore 30.12% and 19.86% smaller than those of the method without RL and the RBF method, respectively.
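The quoted percentages follow from the relative reduction in miss distance with respect to each baseline. The short Python check below reproduces that arithmetic; the function name `improvement_percent` is illustrative.

```python
def improvement_percent(baseline_miss, proposed_miss):
    """Relative reduction of the miss distance with respect to a baseline, in percent."""
    return 100.0 * (baseline_miss - proposed_miss) / baseline_miss

# Miss distances in metres, as reported for Case 2.
proposed, without_rl, rbf = 0.6092, 0.8718, 0.7602

gain_vs_without_rl = improvement_percent(without_rl, proposed)
gain_vs_rbf = improvement_percent(rbf, proposed)
print(round(gain_vs_without_rl, 2), round(gain_vs_rbf, 2))
```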

Table 2. The miss distances under the three methods

Figure 3. Simulation profiles for Case 2. (a) 3-D trajectory of the vehicle. (b) Trajectories of the vehicle, the target and the virtual sliding target in yaw plane. (c) Vehicle velocity. (d) Path angle tracking curve. (e) Deflection angle tracking curve. (f) Tracking errors. (g) Control inputs. (h) Adaptive weights. (i) Real and estimated values of disturbances.

Figure 4. The striking effects under three methods.

5.0 Conclusion

In this paper, a reinforcement learning-based adaptive method has been developed for a class of spiral-diving manoeuver guidance problems of reentry vehicles subject to unknown disturbances. By designing the actor-critic networks and the corresponding adaptive weight update laws, the unknown disturbances are well compensated. In addition, by introducing the coordinate transformation technique, the controller design problem caused by the coupling of control variables is overcome. As a result, a novel reinforcement learning-based adaptive guidance framework has been constructed such that the desired guidance commands can be tracked stably. Numerical simulations have been provided to demonstrate the validity and superiority of the proposed method. Building on this work, future research will study the cooperative spiral-diving guidance of reentry vehicle formations.

Acknowledgements

This work was supported by the Foundation of National Key Laboratory of Science and Technology on Test Physics and Numerical Mathematics and The Foundation of Shanghai Astronautics Science and Technology Innovation, China.


References

[1] Bairstow, S.H. Reentry guidance with extended range capability for low l/d spacecraft, PhD thesis, Massachusetts Institute of Technology, 2006.

[2] Zhang Mingang, Y.D. A sinking trajectory planning and design method for reentry vehicle under multiple constraints. Missiles Space Veh., 2022, 4, pp 25–28. doi: 10.7654/j.issn.1004-7182.20220406

[3] Rahmani-Nejad, A. An anti-hypersonic missiles hybrid optical and enhanced Railgun system. In Counterterrorism, Crime Fighting, Forensics, and Surveillance Technologies V, Vol. 11869, SPIE, Washington State, USA, 2021, pp 107–116.

[4] Liu, S., Yan, B., Liu, R., Dai, P., Yan, J. and Xin, G. Cooperative guidance law for intercepting a hypersonic target with impact angle constraint. Aeronaut. J., 2022, 126, (1300), pp 1026–1044. doi: 10.1017/aer.2021.117

[5] Yu, L.L.X. Weaving maneuver trajectory design for hypersonic glide vehicles. Acta Aeronaut. Astronaut. Sin., 2011, 32, pp 2174–2181. doi: 10.11-1129/V.20110921.0830.001

[6] Dwivedi, P., Bhale, P., Bhattacharyya, A. and Padhi, R. Generalized state estimation and model predictive guidance for spiraling and ballistic targets. J. Guid. Control Dyn., 2014, 37, (1), pp 243–264. doi: 10.2514/1.60075

[7] He, L., Yan, X. and Tang, S. Spiral-diving trajectory optimization for hypersonic vehicles by second-order cone programming. Aerosp. Sci. Technol., 2019, 95, p 105427. doi: 10.1016/j.ast.2019.105427

[8] Li, G., Zhang, H. and Tang, G. Maneuver characteristics analysis for hypersonic glide vehicles. Aerosp. Sci. Technol., 2015, 43, pp 321–328. doi: 10.1016/j.ast.2015.03.016

[9] Rusnak, I. and Peled-Eitan, L. Guidance law against spiraling target. J. Guid. Control Dyn., 2016, 39, (7), pp 1694–1696. doi: 10.2514/1.G001646

[10] Yibo, D., Xiaokui, Y., Guangshan, C. and Jiashun, S. Review of control and guidance technology on hypersonic vehicle. Chin. J. Aeronaut., 2022, 35, (7), pp 1–18. doi: 10.1016/j.cja.2021.10.037

[11] Zhang, W. and Wang, B. A new guidance law for impact angle constraints with time-varying navigation gain. Aeronaut. J., 2022, 126, (1304), pp 1752–1770. doi: 10.1017/aer.2022.16

[12] Dou, L. and Dou, J. The design of optimal guidance law with multi-constraints using block pulse functions. Aerosp. Sci. Technol., 2012, 23, (1), pp 201–205. doi: 10.1016/j.ast.2011.02.009

[13] Qing, R.M.W. Reentry guidance for hypersonic vehicle based on predictor-corrector method. J. Beijing Univ. Aeronaut. Astronaut., 2013, 39, pp 1563–1567. doi: 10.13700/j.bh.1001-5965.2013.12.013

[14] Li, T. and Qian, H. Design of three-dimensional guidance law with impact angle constraints and input saturation. IEEE Access, 2020, 8, pp 211474–211481. doi: 10.1109/ACCESS.2020.3038830

[15] Dhananjay, N. and Ghose, D. Accurate time-to-go estimation for proportional navigation guidance. J. Guid. Control Dyn., 2014, 37, (4), pp 1378–1383. doi: 10.2514/1.G000082

[16] Mozaffari, M., Safarinejadian, B. and Binazadeh, T. Optimal guidance law based on virtual sliding target. J. Aerosp. Eng., 2017, 30, (3), p 04016097. doi: 10.1061/(ASCE)AS.1943-5525.0000692

[17] Raju, P. and Ghose, D. Empirical virtual sliding target guidance law design: an aerodynamic approach. IEEE Trans. Aerosp. Electron. Syst., 2003, 39, (4), pp 1179–1190. doi: 10.1109/TAES.2003.1261120

[18] Hu, Q., Han, T. and Xin, M. New impact time and angle guidance strategy via virtual target approach. J. Guid. Control Dyn., 2018, 41, (8), pp 1755–1765. doi: 10.2514/1.G003436

[19] He, L. and Yan, X. Adaptive terminal guidance law for spiral-diving maneuver based on virtual sliding targets. J. Guid. Control Dyn., 2018, 41, (7), pp 1591–1601. doi: 10.2514/1.G003424

[20] He, L., Yan, X. and Tang, S. Guidance law design for spiral-diving maneuver penetration. Acta Aeronaut. Astronaut. Sin., 2019, 40, pp 193–207. doi: 10.7527/S1000-6893.2019.22457

[21] Ning, X., Liu, J., Wang, Z. and Luo, C. Output constrained neural adaptive control for a class of kkvs with non-affine inputs and unmodeled dynamics. Aeronaut. J., 2024, 128, (1319), pp 134–151. doi: 10.1017/aer.2023.44

[22] Yin, Z., Wang, B., Xiong, R., Xiang, Z., Liu, L., Fan, H. and Xue, C. Attitude tracking control of hypersonic vehicle based on an improved prescribed performance dynamic surface control. Aeronaut. J., 2023, pp 1–21. doi: 10.1017/aer.2023.79

[23] Mao, Q., Dou, L., Yang, Z., Tian, B. and Zong, Q. Fuzzy disturbance observer-based adaptive sliding mode control for reusable launch vehicles with aeroservoelastic characteristic. IEEE Trans. Ind. Inf., 2019, 16, (2), pp 1214–1223. doi: 10.1109/TII.2019.2924731

[24] Shen, G., Xia, Y., Zhang, J. and Cui, B. Adaptive super-twisting sliding mode altitude trajectory tracking control for reentry vehicle. ISA Trans., 2023, 132, pp 329–337. doi: 10.1016/j.isatra.2022.06.023

[25] Chai, R., Tsourdos, A., Gao, H., Chai, S. and Xia, Y. Attitude tracking control for reentry vehicles using centralised robust model predictive control. Automatica, 2022, 145, p 110561. doi: 10.1016/j.automatica.2022.110561

[26] Cao, L. and Xiao, B. Exponential and resilient control for attitude tracking maneuvering of spacecraft with actuator uncertainties. IEEE/ASME Trans. Mechatron., 2019, 24, (6), pp 2531–2540. doi: 10.1109/TMECH.2019.2928703

[27] Xiang, K., Yanli, D., Peng, Z. and Haibing, L. Adaptive backstepping control for hypersonic vehicles with actuator amplitude and rate saturation. Trans. Nanjing Univ. Aeronaut. Astronaut., 2019, 36, (2), pp 242–252. doi: 10.16356/j.1005-1120.2019.02.007

[28] Cheng, L., Wang, Z. and Gong, S. Adaptive control of hypersonic vehicles with unknown dynamics based on dual network architecture. Acta Astronaut., 2022, 193, pp 197–208. doi: 10.1016/j.actaastro.2021.12.043

[29] Cheng, L., Wang, Z., Jiang, F. and Li, J. Fast generation of optimal asteroid landing trajectories using deep neural networks. IEEE Trans. Aerosp. Electron. Syst., 2019, 56, (4), pp 2642–2655. doi: 10.1109/TAES.2019.2952700

[30] Zhang, X., Chen, K., Fu, W. and Huang, H. Neural network-based stochastic adaptive attitude control for generic hypersonic vehicles with full state constraints. Neurocomputing, 2019, 351, pp 228–239. doi: 10.1016/j.neucom.2019.04.014

[31] Ouyang, Y., Dong, L., Wei, Y. and Sun, C. Neural network based tracking control for an elastic joint robot with input constraint via actor-critic design. Neurocomputing, 2020, 409, pp 286–295. doi: 10.1016/j.neucom.2020.05.067

[32] Shi, L., Wang, X. and Cheng, Y. Safe reinforcement learning-based robust approximate optimal control for hypersonic flight vehicles. IEEE Trans. Veh. Technol., 2023. doi: 10.1109/TVT.2023.3264243

[33] Wang, Z. and Liu, J. Reinforcement learning based-adaptive tracking control for a class of semi-markov non-lipschitz uncertain system with unmatched disturbances. Inf. Sci., 2023, 626, pp 407–427. doi: 10.1016/j.ins.2023.01.043

[34] He, W. and Dong, Y. Adaptive fuzzy neural network control for a constrained robot using impedance learning. IEEE Trans. Neural Netwk. Learn. Syst., 2017, 29, (4), pp 1174–1186. doi: 10.1109/TNNLS.2017.2665581

[35] Xing-Kai, H. Young type inequalities for matrices. J. East China Normal Univ. (Nat. Sci.), 2012, 2012, (4), p 12. doi: 10.3969/j.issn.1000-5641.2012.04.002

