
Online robust self-learning terminal sliding mode control for balancing control of reaction wheel bicycle robots

Published online by Cambridge University Press:  19 December 2024

Xianjin Zhu
Affiliation:
School of Mechatronics Engineering, Harbin Institute of Technology, Harbin, China
Wenfu Xu
Affiliation:
School of Mechatronics Engineering and Automation, Harbin Institute of Technology, Shenzhen, China
Zhang Chen
Affiliation:
Department of Automation, Tsinghua University, Beijing, China
Yang Deng
Affiliation:
Department of Automation, Tsinghua University, Beijing, China
Qingyuan Zheng
Affiliation:
Department of Automation, Tsinghua University, Beijing, China
Bin Liang
Affiliation:
Department of Automation, Tsinghua University, Beijing, China
Yu Liu*
Affiliation:
School of Mechatronics Engineering, Harbin Institute of Technology, Harbin, China
*Corresponding author: Yu Liu; Email: [email protected]

Abstract

This paper proposes an online robust self-learning terminal sliding mode control (RS-TSMC) with stability guarantee for balancing control of reaction wheel bicycle robots (RWBR) under model uncertainties and disturbances, which improves the balancing control performance of RWBR by optimising the constrained output of TSMC. The TSMC is designed for a second-order mathematical model of RWBR. Then robust adaptive dynamic programming based on an actor-critic algorithm is used to optimise the TSMC only by data sampled online. The system closed-loop stability and convergence of the neural network weights are guaranteed based on the Lyapunov analysis. The effectiveness of the proposed algorithm is demonstrated through simulations and experiments.

Type
Research Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press

1. Introduction

In recent years, there has been growing interest in agile, high-speed mobile robots designed for rugged or narrow terrain [Reference Rubio, Valero and Llopis-Albert1–Reference Huang, Zhang, Ri, Xiong, Li and Kang4]. Among these, bicycle robots have emerged as a promising platform due to their ability to achieve high-speed locomotion and agile manoeuvres on varied terrains. Reaction wheel bicycle robots (RWBR) are a type of bicycle robot that relies on reaction wheels as auxiliary balancing mechanisms. Compared to other auxiliary balancing mechanisms, such as control moment gyroscopes [Reference Beznos, Formal’sky, Gurfinkel, Jicharev, Lensky, Savitsky and Tchesalin5, Reference Chen, Chu and Zhang6] and mass pendulums [Reference Keo and Yamakita7, Reference He, Deng, Wang, Sun, Sun and Chen8], reaction wheels offer advantages such as simple mechanism design and rapid response [Reference Kanjanawanishkul9, Reference Wang, Cui, Lai, Yang, Chen, Zheng, Zhang and Jiang10].

Previous studies have investigated various strategies for RWBR balancing control. Proportional-integral-derivative (PID) control was designed to stabilise the roll angle [Reference Kim, An, Yoo and Lee11]. A linear quadratic regulator (LQR) was used to achieve balancing control by linearising the dynamics around the equilibrium point [Reference Xiong, Huang, Gu, Pan, Liu, Li and Wang12]. The control of RWBR presents significant challenges, particularly in dealing with inherent uncertainties and disturbances. Traditional control methods often struggle to address these complexities effectively, leading to suboptimal performance and limited adaptability. To address these problems, various robust control strategies have been proposed to balance the RWBR, such as robust LQR [Reference Owczarkowski, Horla and Zietkiewicz13] and disturbance observers [Reference Jeong and Chwa14]. Sliding mode control (SMC) has an excellent ability to deal with uncertainties [Reference Tuan and Ha15–Reference Behera, Bandyopadhyay, Cucuzzella, Ferrara and Yu17] and has been developed for balancing control of RWBR [Reference Guo, Liao and Wei18–Reference Chen, Yan, Wang, Shao, Kurniawan and Wang20]. However, the robustness of the sliding mode controller to uncertainties typically comes at the cost of conservative control performance. This trade-off between robustness and control performance remains an open problem.

Many researchers have been striving to combine SMC with other methods to tackle this challenge, such as fuzzy control [Reference Guo, Liao and Wei18], adaptive control [Reference Chen, Liu, Wang, Hu, Zheng, Ye and Zhang21] and reinforcement learning [Reference Zhu, Deng, Zheng, Zheng, Liang and Liu22–Reference Huo, Yu, Liu and Sha24]. A fuzzy sliding mode controller was designed to deal with impulse disturbance and system uncertainty in [Reference Guo, Liao and Wei18], but the determination of fuzzy rules was rather complicated. In [Reference Chen, Liu, Wang, Hu, Zheng, Ye and Zhang21], an adaptive sliding mode controller was proposed, which dynamically adjusts the parameters of the sliding mode controller to optimise the control performance. However, this approach only makes monotonic adjustments in certain scenarios, which may lead to excessively high system gain and more severe chattering. Our previous work has confirmed that reinforcement learning can improve the control performance of the SMC online [Reference Zhu, Deng, Zheng, Zheng, Liang and Liu22, Reference Zhu, Deng, Zheng, Zheng, Chen, Liang and Liu23], but this combination does not provide a sufficient theoretical stability guarantee.

The adaptive dynamic programming (ADP) algorithm, a kind of reinforcement learning technique, has been used to address various optimal control problems [Reference Guo, Lin, Jiang, Song and Gan25–Reference Liu, Xue, Zhao, Luo and Wei29]. It not only improves control performance while maintaining robustness but also provides a theoretical stability guarantee. A linear controller with an offline ADP algorithm was proposed to balance a bicycle robot in [Reference Guo, Lin, Jiang, Song and Gan25]. An online ADP algorithm was studied to deal with the optimal control problem with known dynamics in [Reference Vamvoudakis and Lewis28]. Ref. [Reference Ma, Zhang, Xu, Yang and Wu26] proposed a method to adjust the sliding mode controller online with ADP to optimise the trajectory tracking of mobile robots. However, its online optimisation was based on state predictions from the nominal model, which greatly limits its applicability under uncertainty. To use online data directly in ADP solutions, researchers have carried out a significant amount of work, which has led to the development of two main methods. One involves using the model obtained from online data fitting for online prediction [Reference Bhasin, Kamalapurkar, Johnson, Vamvoudakis, Lewis and Dixon30]. The other directly uses online data to optimise the controller, including integral reinforcement learning [Reference Vamvoudakis, Vrabie and Lewis31] and robust adaptive dynamic programming (RADP) [Reference Zhu and Zhao27].

To address the above problems, we introduce RADP to optimise the TSMC online for balancing control of RWBR. First, the nonlinear dynamics of RWBR with uncertainties and disturbances are established and the terminal sliding mode controller is designed. Then, the problem of optimising the TSMC with stability constraints is formulated. An online actor-critic-based RADP algorithm is proposed to solve the optimal control problem. The stability and convergence of the proposed control strategy are proven. A comparison of algorithms in simulation demonstrates the advantages of the proposed control strategy. Prototype experiments also validate the control strategy. The main contributions of this paper are summarised as follows.

  • An online robust actor-critic-based RADP algorithm with robust self-learning terminal sliding mode control (RS-TSMC) is proposed to optimise the control performance while maintaining the robustness of the balancing controller for RWBR. The optimisation process is directly based on data collected online without the need for system dynamics.

  • The controller optimisation problem is transformed into solving the Hamilton–Jacobi–Bellman (HJB) equation, and the system output generated by ADP is constrained according to the range of TSMC parameters. Compared to [Reference Ma, Zhang, Xu, Yang and Wu26], this mechanism relaxes the conditions for solving the constrained HJB equation, providing a more flexible and adaptable approach to controller design.

  • Experimental studies conducted on a simulation platform and on a prototype RWBR, compared with several recently proposed control strategies, show the effectiveness of the proposed algorithm.

The rest of the paper is organised as follows. The dynamics of the RWBR and the problem formulation are given in Section 2. The online self-learning sliding mode control strategy is proposed, with the stability and convergence proof, in Section 3. In Section 4, simulation experiments are performed, and the experimental results for an RWBR prototype are presented. The conclusion is given in Section 5. Videos of the simulation and of the experiments on the RWBR prototype are available at https://github.com/ZhuXianjinGitHub/RSTSMC (accessed on 30 August 2024).

Throughout the paper, $\left \| \cdot \right \|$ denotes the Euclidean norm, $ \mathrm{diag}\left \{ \cdot \right \}$ represents a diagonal matrix, and $ \otimes$ denotes the Kronecker product.

2. Problem formulation

In this section, the dynamic model of RWBR with uncertainty and disturbance is derived. We also introduce the feedback transformation. In addition, a TSMC is designed. Furthermore, the online optimisation problem for this controller is presented.

2.1. Dynamics model of RWBR

Figure 1 presents the prototype of RWBR, while Figure 2 shows notations. It can be seen that the RWBR consists of five parts, including a rear wheel, body frame, reaction wheel, handlebar and a front wheel (simplified as $R$ , $B$ , $W$ , $H$ and $F$ , respectively) in Figure 2. The details of the notation are shown in Table I.

Figure 1. Side view of the RWBR prototype.

Figure 2. Notations of the RWBR.

Table I. Diagram of bicycle structure.

Following [Reference Zhu, Deng, Zheng, Zheng, Chen, Liang and Liu23], the roll dynamics of the RWBR is presented as follows:

(1) \begin{equation} \begin{aligned} J\ddot{\varphi }+I_2\ddot{\theta }-Mg\sin\! \left ( \varphi \right ) = d_1\\[3pt] I_2\ddot{\varphi }+I_2\ddot{\theta }=\tau +d_2 \end{aligned} \end{equation}

where $J=m_1l_{1}^{2}+m_2l_{2}^{2}+I_1+I_2$ , $M=m_1l_1+m_2l_2$ , and $d_1$ and $d_2$ represent unmodelled dynamics and uncertainty.
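As a reading aid, the two rows of (1) can be solved for the accelerations in closed form: subtracting the second row from the first eliminates $\ddot{\theta}$. The sketch below assumes $g=9.81$ (the value is not stated in the text), uses the "true" parameter values quoted later in the simulation section, and the function name is illustrative only.

```python
import math

def roll_accelerations(phi, tau, J, I2, M, d1=0.0, d2=0.0, g=9.81):
    """Solve the two rows of equation (1) for (phi_ddot, theta_ddot).

    g = 9.81 is an assumed value; d1 and d2 default to zero disturbance.
    """
    # Row 1 minus row 2: (J - I2) * phi_ddot - M*g*sin(phi) = d1 - tau - d2
    phi_ddot = (M * g * math.sin(phi) + d1 - tau - d2) / (J - I2)
    # Row 2: I2 * (phi_ddot + theta_ddot) = tau + d2
    theta_ddot = (tau + d2) / I2 - phi_ddot
    return phi_ddot, theta_ddot

# "True" parameters quoted in the simulation section of the paper.
J, I2, M = 0.033, 0.0040, 0.2742
phi_dd, theta_dd = roll_accelerations(phi=0.05, tau=0.01, J=J, I2=I2, M=M)
```

With zero roll angle, zero torque and no disturbance, both accelerations are zero, which matches the upright equilibrium of (1).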

To make full use of the known dynamics of the system, the dynamics parameters are divided into a nominal part and an uncertainty part.

(2) \begin{equation} \begin{aligned} \varDelta J=\left | J-J_N \right |\lt \overline{\varDelta J}\\[3pt] \varDelta I_2=\left | I_2-{I_2}_N \right |\lt \overline{\varDelta I_2}\\[3pt] \varDelta M=\left | M-M_N \right |\lt \overline{\varDelta M} \end{aligned} \end{equation}

where $J_N$ , ${I_2}_N$ and $M_N$ are the nominal parameter values, $\overline{\varDelta J}$ , $\overline{\varDelta I_2}$ and $\overline{\varDelta M}$ are the upper bounds of the uncertainties $\varDelta J$ , $\varDelta I_2$ and $\varDelta M$ .

Further, equation (1) can be re-written as

(3) \begin{equation} \left ( J_N-I_{2N} \right ) \ddot{\varphi }-M_Ng\sin\!( \varphi) =-\tau +d_{1N}-d_{2N} \end{equation}

where $d_{1N}=d_1+\varDelta Mg\sin\!(\varphi) -\varDelta J\ddot{\varphi }-\varDelta I_2\ddot{\theta }$ and $d_{2N}=d_2-\varDelta I_2\ddot{\varphi }-\varDelta I_2\ddot{\theta }$ .

2.2. Design of TSMC controller

For the controller design, we first define $\varphi _d$ as the reference roll angle. The $\varphi _d$ , $\dot{\varphi }_d$ and $\ddot{\varphi }_d$ can be obtained as shown in our previous work [Reference Zhu, Deng, Zheng, Zheng, Chen, Liang and Liu23]. Based on the Olfati–Saber transformation mentioned in [Reference Spong, Corke and Lozano33], the following state variables and feedback transformation are defined.

(4) \begin{equation} \begin{aligned} \dot{x}_1=x_2\\[3pt] \dot{x}_2=u+d^* \end{aligned} \end{equation}

where $x_1=\varphi -\varphi _d$ , $x_2=\dot{\varphi }-\dot{\varphi }_d$ , $u=\frac{I_{2N}}{\left ( J_N-I_{2N} \right )}M_Ng\sin \left ( x_1 \right ) -\ddot{\varphi }_d-\frac{I_{2N}}{\left ( J_N-I_{2N} \right )}\tau$ and $d^*=\frac{I_{2N}}{\left ( J_N-I_{2N} \right )}\left ( d_{1N}-d_{2N} \right )$ .
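In practice the controller computes the virtual input $u$ and the motor needs the torque $\tau$, so the definition of $u$ above has to be inverted. The sketch below does only that algebraic inversion of the expression as written; $g=9.81$ and the function names are assumptions made for illustration.

```python
import math

def virtual_control_from_torque(tau, x1, phi_d_ddot, J_N, I2_N, M_N, g=9.81):
    """Evaluate u exactly as defined after equation (4)."""
    c = I2_N / (J_N - I2_N)
    return c * M_N * g * math.sin(x1) - phi_d_ddot - c * tau

def torque_from_virtual_control(u, x1, phi_d_ddot, J_N, I2_N, M_N, g=9.81):
    """Invert the definition of u to recover the reaction wheel torque tau."""
    c = I2_N / (J_N - I2_N)
    return M_N * g * math.sin(x1) - (u + phi_d_ddot) / c

# Nominal parameters quoted in the simulation section.
J_N, I2_N, M_N = 0.0368, 0.0035, 0.2544
tau = torque_from_virtual_control(u=-0.5, x1=0.02, phi_d_ddot=0.0,
                                  J_N=J_N, I2_N=I2_N, M_N=M_N)
```

Feeding the returned torque back through `virtual_control_from_torque` reproduces the original $u$, which is a quick consistency check on the sign conventions.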

Assumption 1. Assuming $d_1$ and $d_2$ are bounded, it can be shown that $d_{1N}$ and $d_{2N}$ are bounded, and hence that $d^*$ is bounded. Let $\left | d^* \right |\lt L$ , where $L$ is an unknown constant.

The sliding mode surface $s$ , the equivalent control $u_{eq}$ and the reaching control $u_r$ of TSMC are designed according to [Reference Yu, Yu and Zhihong32]. The fractional-order terminal attractor replaces the sign item in the classical sliding mode controller, which is beneficial to attenuate chattering.

(5) \begin{equation} \begin{aligned} s=x_2+\alpha _0x_1+\beta _0x_{1}^{q_0/p_0}\\[3pt] u_{eq}=-\left ( \alpha _0\dot{x}_1+\beta _0\frac{d}{dt}x_{1}^{q_0/p_0} \right ) \\[3pt] u_r=-\left ( \alpha _1s+\beta _1s^{q_1/p_1} \right ) \\[3pt] u_{tsmc}=u_{eq}+u_r \end{aligned} \end{equation}

where $\alpha _i\gt 0$ , $\beta _i\gt 0$ , $q_i$ and $p_i$ $\left ( q_i\lt p_i \right )$ $\left ( i=0,1 \right )$ are positive odd integers.
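A minimal sketch of how (5) might be evaluated numerically is given below. Two implementation details are assumptions rather than part of the text: the fractional power is computed as a sign-preserving real root (valid because $q_i$ and $p_i$ are odd), and $\left | x_1 \right |$ is floored at a small `eps` when differentiating $x_1^{q_0/p_0}$, since the exponent $q_0/p_0-1$ is negative. Default gains are those listed later in (25).

```python
import math

def odd_root_power(x, q, p):
    """x**(q/p) for odd positive integers q and p, kept real and sign-preserving."""
    return math.copysign(abs(x) ** (q / p), x)

def tsmc_control(x1, x2, a0=3.0, b0=1.0, a1=2.0, b1=1.0,
                 q0=17, p0=19, q1=17, p1=19, eps=1e-6):
    """Return (s, u_tsmc) following equation (5)."""
    s = x2 + a0 * x1 + b0 * odd_root_power(x1, q0, p0)
    # d/dt x1^(q0/p0) = (q0/p0) * |x1|^(q0/p0 - 1) * x2, using x1_dot = x2.
    # The eps floor is an added safeguard against division by zero at x1 = 0.
    dfrac = (q0 / p0) * max(abs(x1), eps) ** (q0 / p0 - 1.0) * x2
    u_eq = -(a0 * x2 + b0 * dfrac)
    u_r = -(a1 * s + b1 * odd_root_power(s, q1, p1))
    return s, u_eq + u_r

s, u = tsmc_control(x1=0.1, x2=0.0)
```

For a positive roll error at rest, `s` is positive and the resulting control is negative, i.e. the reaching term pushes the state back towards the sliding surface.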

By selecting appropriate gains, the system will converge to a sufficiently small neighbourhood of the system equilibrium in finite time. According to [Reference Yu, Yu and Zhihong32], with $\beta _1=\frac{L}{\left | s^{q_1/p_1} \right |}+\gamma$ and $\gamma \gt 0$ , the sliding mode variable will reach the neighbourhood $\left | s \right |\lt \left ( \frac{L}{\beta _1} \right ) ^{p_1/q_1}$ of the equilibrium in finite time $t_s$ .

(6) \begin{equation} t_s=\frac{p_1}{\alpha _1\left ( p_1-q_1 \right )}\ln \frac{\alpha _1s\left ( 0 \right ) ^{\left ( p_1-q_1 \right ) /p_1}+\gamma }{\gamma } \end{equation}

Then, define $\xi _s=\left | \left ( \frac{L}{\beta _1} \right ) ^{p_1/q_1} \right |\lt L\prime$ , so that

(7) \begin{equation} \dot{x}_1=-\alpha _0x_1-\beta _0x_{1}^{q_0/p_0}+L\prime \end{equation}

The system state $x_1$ will then converge to the sufficiently small neighbourhood $\left | x_1 \right |\lt \left ( \frac{L\prime }{\beta _0} \right ) ^{p_0/q_0}$ of the system equilibrium in finite time $t_{x_1}$ , with $\beta _0=\frac{L\prime }{\left | x_{1}^{q_0/p_0} \right |}+\gamma \prime$ and $\gamma \prime \gt 0$ .

(8) \begin{equation} t_{x_1}=\frac{p_0}{\alpha _0\left ( p_0-q_0 \right )}\ln \frac{\alpha _0x_1\left ( 0 \right ) ^{\left ( p_0-q_0 \right ) /p_0}+\gamma \prime }{\gamma \prime } \end{equation}
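Equations (6) and (8) share the same form, so a single helper can evaluate both. In the sketch below the value of $\gamma$ is hypothetical (the paper does not report one), and the absolute value of the initial condition is taken before raising it to the power $(p-q)/p$, which is an assumption made so that negative initial values give a real result.

```python
import math

def terminal_time(z0, alpha, gamma, q, p):
    """Finite-time bound of the form used in equations (6) and (8).

    z0 is s(0) for (6) or x1(0) for (8); gamma > 0 is the margin term.
    """
    expo = (p - q) / p
    return p / (alpha * (p - q)) * math.log((alpha * abs(z0) ** expo + gamma) / gamma)

# alpha_1, q_1 and p_1 from (25); gamma = 0.5 is a hypothetical value.
t_s = terminal_time(z0=1.0, alpha=2.0, gamma=0.5, q=17, p=19)
```

The bound is zero when the initial value is already zero and grows with the magnitude of the initial value, as expected from the logarithm in (6).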

Remark 1. The parameters $\alpha _1$ and $\beta _1$ influence the reaching process of the sliding mode variable. Larger values reduce the convergence time and improve the robustness of the controller to uncertainties, but they increase the burden on the actuator and make the controller more conservative. In this paper, RADP is introduced to tune the parameters $\alpha _1$ and $\beta _1$ of the TSMC controller (5) online, with the adjustments $\kappa =\left [ \varDelta \alpha _1,\varDelta \beta _1 \right ] ^T$ subject to constraints. The main motivation is to improve the control performance while maintaining stability and robustness.

Assumption 2. Assume $\kappa \in \mathcal{K} =\left \{ \kappa _{i\min }\leqslant \kappa _{i}\leqslant \kappa _{i\max } \right \}$ , $\left ( i=1,2 \right )$ , where $\mathcal{K}$ is set to guarantee the finite-time convergence. $\mathcal{K}$ and $L$ can generally be obtained through experiments. The stability proof is given in [26].

3. Online robust self-learning TSMC

In this section, an online robust self-learning TSMC for RWBR is proposed to improve the control performance and retain the robustness. First, the optimal control problems with stability constraints are formulated. Then, an online actor-critic-based RADP algorithm is designed to approximate the HJB solutions.

Define $u_{adp}$ as the self-learning part of the control. The output of the controller is as follows:

(9) \begin{equation} \begin{aligned} u=u_{tsmc}+u_{adp}\\[3pt] \left [ \kappa _{1\min },\kappa _{2\min } \right ] \zeta \lt \left | u_{adp} \right |\lt \left [ \kappa _{1\max },\kappa _{2\max } \right ] \zeta \\[3pt] \end{aligned} \end{equation}

where $\zeta =\left [ s,s^{q_1/p_1} \right ] ^T$ .
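One possible reading of the constraint in (9) is that the learned term is saturated to the envelope spanned by $\kappa_{\min}^{T}\zeta$ and $\kappa_{\max}^{T}\zeta$. The sketch below implements that reading only; it is an interpretation, not a statement of the authors' implementation, and the function names are illustrative. Bounds default to the values in (25).

```python
import math

def odd_root_power(x, q, p):
    """x**(q/p) for odd positive integers q and p, sign-preserving."""
    return math.copysign(abs(x) ** (q / p), x)

def clip_u_adp(u_adp, s, k_min=(-0.8, -0.8), k_max=(0.8, 0.8), q1=17, p1=19):
    """Saturate u_adp to the interval spanned by k_min.zeta and k_max.zeta."""
    zeta = (s, odd_root_power(s, q1, p1))
    b_min = k_min[0] * zeta[0] + k_min[1] * zeta[1]
    b_max = k_max[0] * zeta[0] + k_max[1] * zeta[1]
    # For negative s the two products swap order, so sort them before clipping.
    lo, hi = min(b_min, b_max), max(b_min, b_max)
    return min(max(u_adp, lo), hi)
```

Under this reading the admissible range shrinks to zero as $s \rightarrow 0$, so the learned correction vanishes on the sliding surface and the TSMC term alone acts there.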

Substituting (9) into (4), the system can be written as

(10) \begin{equation} \dot{X}=AX+Bu+D \end{equation}

where $X=\left [ \begin{array}{c}x_1\\[3pt] x_2\\[3pt] \end{array} \right ]$ , $A=\left [ \begin{matrix}0& 1\\[3pt] 0& 0\\[3pt] \end{matrix} \right ]$ , $B=\left [ \begin{array}{c}0\\[3pt] 1\\[3pt] \end{array} \right ]$ and $D=\left [ \begin{array}{c}0\\[3pt] d^*\\[3pt] \end{array} \right ]$ .

The optimal control problem is solved by minimising the value function $V_c$ to obtain the optimal policy $u$ . $V_c$ is defined as

(11) \begin{equation} V_c=\int _0^{\infty }{\left ( X^TQX+r\left ( u_{tsmc}+u_{adp} \right ) ^2 \right ) dt},X\left ( 0 \right ) =X_0 \end{equation}

where $Q$ is a symmetric positive definite matrix and $r$ is a positive constant. Taking the derivative of (11) along the trajectory of (10), the following Hamiltonian function can be obtained

(12) \begin{equation} H=V_{cX}^{T}\dot{X}+X^TQX+r\left ( u_{tsmc}+u_{adp} \right ) ^2 \end{equation}

where $V_{cX}=\frac{\partial V_c}{\partial X}$ . Define $V_{c}^{*}=\underset{U\prime }{\min }\left ( V_c \right )$ to denote the optimal value function, which satisfies

(13) \begin{equation} 0=H^*=\underset{u_{adp}}{\min }\left \{ H \right \} =V_{cX}^{*T}\dot{X}+X^TQX+r\left ( u_{tsmc}+u_{adp} \right ) ^2 \end{equation}

where $V_{cX}^{*}=\frac{\partial V_{c}^{*}}{\partial X}$ . Assuming the minimum of (13) exists and is unique, we can obtain the optimal control policy $u_{adp}^{*}=\underset{u_{adp}}{\arg \min }\left \{ H \right \}$ from $\frac{\partial H}{\partial u_{adp}}=0$ , which is described as

(14) \begin{equation} u_{adp}^{*}=-\frac{1}{2r}V_{cX}^{*T}B-u_{tsmc} \end{equation}
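For readers who want the intermediate step, (14) follows from the first-order condition on (12), taking $\dot{X}$ from (10) with $u=u_{tsmc}+u_{adp}$ and treating $V_{cX}$ as independent of $u_{adp}$ :

\begin{equation} \begin{aligned} \frac{\partial H}{\partial u_{adp}}=V_{cX}^{T}B+2r\left ( u_{tsmc}+u_{adp} \right ) =0\\[3pt] \Rightarrow u_{tsmc}+u_{adp}^{*}=-\frac{1}{2r}V_{cX}^{*T}B \end{aligned} \end{equation}

Since $\frac{\partial ^2H}{\partial u_{adp}^{2}}=2r\gt 0$ , this stationary point is the minimiser of $H$ .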

Substituting (14) into (13), the $u_{tsmc}$ terms cancel inside the quadratic penalty, giving

(15) \begin{equation} 0=V_{cX}^{*T}\dot{X}+X^TQX+r\left ( -\frac{1}{2r}V_{cX}^{*T}B \right ) ^2 \end{equation}

Equation (15) is difficult to solve directly. In traditional ADP, the policy iteration algorithm [Reference Sutton RS34] is adopted to solve it iteratively through the following two steps:

a) given $u^{\left ( i \right )}$ , solve for the $V_{c}^{\left ( i \right )}$ using

(16) \begin{equation} \begin{aligned} 0=V_{cX}^{\left ( i \right ) T}\dot{X}+X^TQX+r\left ( u_{tsmc}+{u_{adp}}^{\left ( i \right )} \right ) ^2\\[3pt] V_{c}^{\left ( i \right )}\left ( 0 \right ) =0 \end{aligned} \end{equation}

b) update the control policy using

(17) \begin{equation} u_{adp}^{\left ( i+1 \right )}=-\frac{1}{2r}V_{cX}^{\left ( i \right ) T}B-u_{tsmc} \end{equation}

where $i=1,2,\cdots$ denotes the iterations. When $i\rightarrow \infty$ , then $V_c\rightarrow V_{c}^{*}$ , $u_{adp}\rightarrow u_{adp}^{*}$ .
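The two steps above can be illustrated on a much simpler problem than the RWBR. The sketch below is not the paper's controller: it assumes a scalar linear system $\dot{x}=ax+bu$ with cost $qx^2+ru^2$ , no TSMC term and no disturbance, for which $V^{(i)}=p_ix^2$ and $u^{(i)}=-k_ix$ , so both steps reduce to one line each and the limit can be checked against the scalar Riccati solution. All names and numbers are illustrative.

```python
import math

def policy_iteration_scalar(a, b, q, r, k0, iterations=20):
    """Policy iteration for x_dot = a*x + b*u with cost q*x^2 + r*u^2."""
    k = k0
    history = []
    for _ in range(iterations):
        if b * k - a <= 0:
            raise ValueError("policy u = -k*x is not stabilising")
        # Step a) policy evaluation: 0 = 2*p*(a - b*k) + q + r*k^2
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Step b) policy improvement: u = -(1/(2r)) * V_x * b = -(b*p/r) * x
        k = b * p / r
        history.append(p)
    return p, k, history

def riccati_scalar(a, b, q, r):
    """Positive root of 2*a*p - (b^2/r)*p^2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)

p, k, history = policy_iteration_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
```

Starting from a stabilising gain, the sequence of value coefficients decreases monotonically towards the Riccati solution, which mirrors the convergence $V_c\rightarrow V_{c}^{*}$ stated above.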

It can be seen that the system dynamics are needed in (16) to obtain $\dot{X}$ . When the nominal model deviates from the actual system, optimisation based on the nominal model may be degraded. In this paper, RADP [Reference Zhu and Zhao27] is used to solve the optimal control problem using only data sampled online.

Consider an arbitrary control input $u=u_{tsmc}+u_s$ and differentiate the value function $V_{c}^{\left ( i \right )}$ .

(18) \begin{equation} \begin{aligned} \dot{V}_{c}^{\left ( i \right )}=V_{cX}^{\left ( i \right ) T}\left ( AX+B\left ( u_{tsmc}+{u_{adp}}^{\left ( i \right )} \right ) +B\left ( u_s-{u_{adp}}^{\left ( i \right )} \right ) \right ) \\[3pt] =-2r\left ( u_{tsmc}+u_{adp}^{\left ( i+1 \right )} \right ) \left ( u_s-{u_{adp}}^{\left ( i \right )} \right ) -X^TQX-r\left ( u_{tsmc}+{u_{adp}}^{\left ( i \right )} \right ) ^2 \end{aligned} \end{equation}
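As a reading aid, the second equality in (18) combines two rearrangements. Equation (16), with $\dot{X}$ evaluated under policy $i$ and the disturbance term omitted as in (18), gives the first line below, and (17) gives the second:

\begin{equation} \begin{aligned} V_{cX}^{\left ( i \right ) T}\left ( AX+B\left ( u_{tsmc}+{u_{adp}}^{\left ( i \right )} \right ) \right ) =-X^TQX-r\left ( u_{tsmc}+{u_{adp}}^{\left ( i \right )} \right ) ^2\\[3pt] V_{cX}^{\left ( i \right ) T}B=-2r\left ( u_{tsmc}+u_{adp}^{\left ( i+1 \right )} \right ) \end{aligned} \end{equation}

Multiplying the second identity by $\left ( u_s-{u_{adp}}^{\left ( i \right )} \right )$ and adding the first yields the right-hand side of (18), which no longer contains $A$ or $B$ explicitly.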

Integrating (18) over an arbitrary interval $\left [ t-T,t \right ]$ gives

(19) \begin{equation} \begin{aligned} V_{c}^{\left ( i \right )}\left ( X_t \right ) -V_{c}^{\left ( i \right )}\left ( X_{t-T} \right ) = \\[3pt] -\int _{t-T}^t{\left ( 2r\left ( u_{tsmc}+u_{adp}^{\left ( i+1 \right )} \right ) \left ( u_s-{u_{adp}}^{\left ( i \right )} \right ) +X^TQX+r\left ( u_{tsmc}+{u_{adp}}^{\left ( i \right )} \right ) ^2 \right ) d\tau } \end{aligned} \end{equation}

The closed-loop stability of the system is ensured by (9). $V_{c}^{\left ( i \right )}$ and the improved policy $u_{adp}^{\left ( i+1 \right )}$ can be obtained in one calculation, and it does not need knowledge of the system dynamics.

The value function and the policy function are approximated by neural networks (NNs):

(20) \begin{align} \hat{V}_{c}^{*}\left ( X \right ) =\hat{W}_{c}^{T}\phi \left ( X \right ) \end{align}
(21) \begin{equation} \hat{u}_{adp}^{*}\left ( X \right ) =\hat{W}_{a}^{T}\varphi \left ( X \right ) \end{equation}

Inserting (20) and (21) into (19) gives the residual error

(22) \begin{equation} \begin{aligned} \epsilon \left ( t \right ) =\hat{W}_{c}^{T}\left ( \phi \left ( x_t \right ) -\phi \left ( x_{t-T} \right ) \right ) + \\[3pt] \int _{t-T}^t{\left ( 2r\left ( u_{tsmc}+\hat{W}_{a}^{T}\varphi \left ( X \right ) \right ) \left ( u_s-\hat{W}_{a}^{T}\varphi \left ( X \right ) \right ) +X^TQX+r\left ( u_{tsmc}+\hat{W}_{a}^{T}\varphi \left ( X \right ) \right ) ^2 \right ) d\tau } \end{aligned} \end{equation}

Using the gradient descent method, the updating laws for the weights of the critic NN and the actor NN are as follows:

(23) \begin{equation} \dot{\hat{W}}_c=-\lambda _1\frac{\phi \left ( x_t \right ) -\phi \left ( x_{t-T} \right )}{m_{s}^{2}\left ( t \right )}\epsilon \left ( t \right ) \end{equation}
(24) \begin{equation} \dot{\hat{W}}_a=-\lambda _2\frac{\eta \left ( t \right )}{m_{s}^{2}\left ( t \right )}\epsilon \left ( t \right ) \end{equation}

where $\eta \left ( t \right ) =2\int _{t-T}^t{\left ( \left ( ru_s \right ) \otimes \varphi \left ( x \right ) \right ) d\tau }-\int _{t-T}^t{\left ( \varphi \left ( x \right ) \otimes r \right ) \otimes \varphi \left ( x \right ) d\tau }\mathbf{vec}\left ( \hat{W}_{a}^{T} \right )$ , $m_s = \left(\phi(x_t) -\phi(x_{t-T})\right)^T \left ( \phi \left ( x_t \right ) -\phi \left ( x_{t-T} \right ) \right ) +\eta ^T\eta +1$ and $m_s$ is used for normalization.
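A discretised sketch of one update of (22) to (24) is given below, under several assumptions that are not in the text: the integrals over $[t-T,t]$ are approximated by a left Riemann sum over logged samples, the weight dynamics are advanced with a single explicit Euler step of size `step`, and, because the input is scalar, $\eta$ is read as $\int (2ru_s-r\hat{W}_{a}^{T}\varphi )\varphi \,d\tau$ . The basis functions are those given later in (26); the function names are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def critic_features(x1, x2):
    # phi(X) from (26)
    return [x1**2, x2**2, x1 * x2, x1**4, x2**4, x1**2 * x2**2]

def actor_features(x1, x2):
    # varphi(X) from (26)
    return [x1, x2, x1**2, x2**2, x1 * x2, x1**4, x2**4, x1**2 * x2**2]

def radp_update(Wc, Wa, samples, dt, step, r=1.0, q_diag=(1.0, 1.0),
                lam1=0.2, lam2=0.1):
    """One Euler step of (23)-(24).

    samples: list of (x1, x2, u_tsmc, u_s) logged at spacing dt over [t-T, t].
    Returns (Wc_new, Wa_new, eps), where eps is the residual of (22).
    """
    first, last = samples[0], samples[-1]
    phi_old = critic_features(first[0], first[1])
    phi_new = critic_features(last[0], last[1])
    dphi = [n - o for n, o in zip(phi_new, phi_old)]
    integral = 0.0
    eta = [0.0] * len(Wa)
    for x1, x2, u_tsmc, u_s in samples[:-1]:  # left Riemann sum
        feat = actor_features(x1, x2)
        w = dot(Wa, feat)  # current actor output
        integral += dt * (2.0 * r * (u_tsmc + w) * (u_s - w)
                          + q_diag[0] * x1**2 + q_diag[1] * x2**2
                          + r * (u_tsmc + w) ** 2)
        for k, f in enumerate(feat):
            eta[k] += dt * (2.0 * r * u_s - r * w) * f
    eps = dot(Wc, dphi) + integral
    m_s = dot(dphi, dphi) + dot(eta, eta) + 1.0  # normalisation term
    Wc_new = [w - step * lam1 * d / m_s**2 * eps for w, d in zip(Wc, dphi)]
    Wa_new = [w - step * lam2 * e / m_s**2 * eps for w, e in zip(Wa, eta)]
    return Wc_new, Wa_new, eps

Wc0, Wa0 = [0.0] * 6, [0.0] * 8
window = [(0.2, 0.1, 0.0, 0.0), (0.1, 0.0, 0.0, 0.0)]
Wc1, Wa1, eps1 = radp_update(Wc0, Wa0, window, dt=0.01, step=0.01)
_, _, eps2 = radp_update(Wc1, Wa1, window, dt=0.01, step=0.01)
```

Re-evaluating the residual on the same data window after one step gives a slightly smaller magnitude, which is the expected behaviour of a normalised gradient step on the critic weights.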

Remark 2. The differences between the RS-TSMC proposed in this paper, the self-TSMC (S-TSMC) in [26] and the robust-TSMC (R-TSMC) in [27] are as follows. First, the optimisation process in S-TSMC is based on state prediction with the nominal model, which is not conducive to online application of the algorithm. To address this problem, this paper employs an iterative form of RADP to optimise TSMC using online data. Second, the optimisation in S-TSMC is performed directly on the sliding variable $s$ , which is not exactly equivalent to optimisation on the state $X$ , and the optimisation objective in R-TSMC considers only $u_{adp}$ rather than the overall output of the controller. In RS-TSMC, by contrast, the optimisation is based directly on the system state and the overall controller output. Third, the optimisation in S-TSMC requires that the constrained HJB equation has a solution, and R-TSMC does not consider the constraints, whereas RS-TSMC first solves the unconstrained HJB problem and subsequently constrains the controller output.

The proposed control strategy schemes are illustrated in Algorithm 1 and Figure 3. The stability and the convergence of the proposed control strategy are given in the Appendix.

Algorithm 1. Online robust self-learning TSMC for RWBR

Figure 3. Schematic of control system.

4. Simulations and experiments

4.1. Simulations

To demonstrate the effectiveness of the proposed RS-TSMC controller, two cases are built on the simulation platform shown in Figure 4, which was developed in our previous work [Reference Zhu, Deng, Zheng, Zheng, Chen, Liang and Liu23]. Two recently developed methods, S-TSMC [Reference Ma, Zhang, Xu, Yang and Wu26] and R-TSMC [Reference Zhu and Zhao27], are used for comparison. The other simulation settings are the same except for the differences mentioned in Remark 2. The RWBR is placed on a curved pavement with white noise. The nominal parameters of RWBR are $J_N=0.0368$ , ${I_2}_N=0.0035$ and $M_N=0.2544$ . The true parameters of RWBR used for simulation are $J=0.033$ , $I_2=0.0040$ and $M=0.2742$ . The control period of the controllers is 0.01 s. The other parameters are given as follows:

Figure 4. Simulation environment of RWBR in Matlab Simscape.

(25) \begin{equation} \begin{aligned} Q=\mathrm{diag}\left \{ 1,1 \right \}, r=1\\[3pt] \alpha _0=3, \alpha _1=2, \beta _0=1, \beta _1=1, q_0=q_1=17\ and\ p_0=p_1=19\\[3pt] \kappa _{1\max }=\kappa _{2\max }=0.8, \kappa _{1\min }=\kappa _{2\min }=-0.8\\[3pt] \lambda _1=0.2, \lambda _2=0.1 \end{aligned} \end{equation}

The activation functions of the critic NN and the actor NN are considered as

(26) \begin{equation} \begin{aligned} \phi \left ( X \right ) =\left [ \begin{array}{l}x_{1}^{2}, x_{2}^{2}, x_1x_2, x_{1}^{4}, x_{2}^{4}, x_{1}^{2}x_{2}^{2}\\[3pt] \end{array} \right ] ^T\\[3pt] \varphi \left ( X \right ) =\left [ \begin{array}{l}x_1, x_2, x_{1}^{2}, x_{2}^{2}, x_1x_2, x_{1}^{4}, x_{2}^{4}, x_{1}^{2}x_{2}^{2}\\[3pt] \end{array} \right ] ^T \end{aligned} \end{equation}

An overturning moment $d_2$ is added to the system of RWBR. In case 1,

(27) \begin{equation} d_2=0.02\sum _j{\sin \left ( jt \right )} \end{equation}

where $j=\left [ \begin{array}{l}1, 3, 7, 11, 13, 15\\[3pt] \end{array} \right ]$ . In case 2,

(28) \begin{equation} d_2=0.02\sum _j{\sin \left ( jt \right )}+\begin{cases} 0.2,t\in \left [ 10,12 \right ) \cup t\in \left [ 30,32 \right )\\[3pt] -0.2,t\in \left [ 20,22 \right ) \cup t\in \left [ 40,42 \right )\\[3pt] 0,else\\[3pt] \end{cases} \end{equation}
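The two disturbance profiles (27) and (28) translate directly into code. The sketch below is a plain transcription; only the function and constant names are assumptions.

```python
import math

FREQS = (1, 3, 7, 11, 13, 15)  # the set j in (27)

def d2_case1(t):
    """Multi-sine overturning moment of equation (27)."""
    return 0.02 * sum(math.sin(j * t) for j in FREQS)

def d2_case2(t):
    """Equation (28): the multi-sine of (27) plus +/-0.2 pulses lasting 2 s."""
    if 10 <= t < 12 or 30 <= t < 32:
        pulse = 0.2
    elif 20 <= t < 22 or 40 <= t < 42:
        pulse = -0.2
    else:
        pulse = 0.0
    return d2_case1(t) + pulse
```

Because the pulse windows are half-open, `d2_case2(12.0)` already equals `d2_case1(12.0)`. The pulse amplitude of 0.2 is well above the 0.12 upper bound on the multi-sine term.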

To clearly demonstrate the superiority of the proposed method, $V_c$ defined in (11) is used to quantify the performance; the results are shown in Table II. As seen in this table, RS-TSMC reduced the criterion by $39.79\%$ in case 1 and by $15.91\%$ in case 2 relative to TSMC. Its value is also lower than those of the other two recently developed methods (R-TSMC, S-TSMC), which implies that the proposed method can achieve better control performance with less control effort. Details of the simulations of the two cases are discussed next.
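The criterion can be evaluated from logged trajectories along the following lines. This is a sketch under stated assumptions: the infinite horizon in (11) is truncated to the logged interval, the integral is approximated with the trapezoidal rule, $Q$ is diagonal as in (25), and `us` holds the total control $u_{tsmc}+u_{adp}$ . The numbers in the usage lines are made up for illustration and are not results from the paper.

```python
def cost_index(ts, xs, us, q_diag=(1.0, 1.0), r=1.0):
    """Trapezoidal approximation of V_c in (11) over logged samples.

    ts: sample times; xs: list of (x1, x2); us: total control at each sample.
    """
    running = [q_diag[0] * x1 * x1 + q_diag[1] * x2 * x2 + r * u * u
               for (x1, x2), u in zip(xs, us)]
    total = 0.0
    for k in range(1, len(ts)):
        total += 0.5 * (running[k] + running[k - 1]) * (ts[k] - ts[k - 1])
    return total

def reduction_percent(v_baseline, v_new):
    """Relative reduction of the criterion with respect to a baseline, in percent."""
    return 100.0 * (v_baseline - v_new) / v_baseline

# Illustrative data only: a constant unit roll error held for one second.
v = cost_index([0.0, 0.5, 1.0], [(1.0, 0.0)] * 3, [0.0] * 3)
```

A percentage such as those quoted for Table II would then be `reduction_percent(v_tsmc, v_rs_tsmc)` for the two logged runs being compared.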

Table II. Assessment of control performance under different cases.

The simulation results of Case 1 are demonstrated in Figure 5 and Figure 6. Figure 5 gives the norms $\left \| \hat{W}_c \right \|$ and $\left \| \hat{W}_a \right \|$ with respect to the time under RS-TSMC. As shown in Figure 5, $\left \| \hat{W}_c \right \|$ converges after 12 s, and $\left \| \hat{W}_a \right \|$ converges after 20 s. Figure 6 gives the states, the control output, and $V_c$ of four methods. As can be seen, the proposed method has the smallest value of $V_c$ among the four controllers. In sum, it can be concluded that the control performance of the proposed method (RS-TSMC) outperforms the other three methods, which illustrates the superiority of the proposed method.

Figure 5. $\| \hat{W}_c\|$ and $\| \hat{W}_a \|$ with respect to the time under RS-TSMC in case 1.

Figure 6. The states, output and $V_c$ with respect to the time under four different algorithms in case 1.

Figure 7. $\| \hat{W}_c \|$ and $\| \hat{W}_a \|$ with respect to the time under RS-TSMC in case 2.

Figure 8. The states, output and $V_c$ with respect to the time under four different algorithms in case 2.

Figure 7 gives the norm $\left \| \hat{W}_c \right \|$ and $\left \| \hat{W}_a \right \|$ with respect to the time under RS-TSMC. The pulse perturbation has a significant effect on $\left \| \hat{W}_c \right \|$ at 10 s. The $\left \| \hat{W}_a \right \|$ shows regular changes with the pulse disturbance, indicating the regulation effect of the online learning algorithm on the controller output. Figure 8 illustrates the simulation results in Case 2. Similarly, we can conclude that the better control performance is reached and the less control effort is needed with the proposed method in this case.

4.2. Experiments

The RWBR prototype is used to verify the effectiveness of the proposed controller in this subsection. We present the experimental results of the proposed RS-TSMC controller, together with those of TSMC, R-TSMC and S-TSMC for performance comparison, in Figure 9. In the experimental studies, the TSMC algorithm runs on an ESP32 control board at 50 Hz and the optimising algorithm runs on a PC at 25 Hz. Wireless data transmission between the ESP32 and the PC is achieved via the UDP communication protocol. The swing of the handlebar is used to generate disturbances for the control of the roll angle. The other settings are the same as in the simulations.

Figure 9 demonstrates the experimental results. Within the first 10 s, the $V_c$ of the three optimisation algorithms is slightly higher than that of TSMC, which can also be seen from the curves of $x_1$ , $x_2$ and $\tau$ . The reasons may be as follows: 1) experimental factors such as the initial roll angle and initial roll angular velocity of the RWBR are not completely consistent across experiments; 2) the processing power of the RWBR and the PC is limited. With the iterative optimisation of the controller, it is only after 15 s that the three optimisation algorithms gradually outperform TSMC. The main reason is that the control frequency of the RWBR prototype is much lower than that in the simulation. In addition, RS-TSMC outperforms the other two optimisation algorithms throughout almost the entire experiment. The proposed controller (RS-TSMC) reduced the criterion by 21.79 $\%$ relative to TSMC, while R-TSMC and S-TSMC reduced it by about 10 $\%$ . The experimental results also validate the effectiveness and feasibility of the proposed control strategy.

Figure 9. The states, output and $V_c$ with respect to the time under four different algorithms of RWBR prototype.

5. Conclusions

This paper proposes an online RS-TSMC with stability guarantee for balancing control of RWBR under uncertainties, which improves the balancing control performance of RWBR by optimising the constrained output of TSMC. Robust adaptive dynamic programming (RADP) is used to optimise the TSMC based only on data sampled online, without knowledge of the system dynamics. The constraint on the parameters of the sliding mode controller is utilised to derive the constraint on the control output at each time step to maintain the stability of the closed-loop system. Experimental studies conducted on a simulation platform and on a prototype RWBR, compared with several recently proposed control strategies, show the effectiveness of the proposed algorithm.

Author contributions

Conceptualization and methodology, X.Z. (Xianjin Zhu); software, X.Z. (Xianjin Zhu); validation, W.X., Q.Z., Y.D.; writing – original draft preparation, X.Z. (Xianjin Zhu); writing – review and editing, W.X. and Z.C.; visualisation, Q.Z.; supervision, Y.L.; project administration, Y.L. and B.L.; funding acquisition, Z.C. and Y.D. All authors have read and agreed to the published version of the manuscript.

Financial support

This research was funded by the National Natural Science Foundation of China (62203252, 52205008).

Competing interests

The authors declare that no conflicts of interest exist.

Ethical approval

Not applicable.

Appendix

Define the errors $\tilde{W}_c=W_c-\hat{W}_c$ and $\tilde{W}_a=W_a-\hat{W}_a$, where $W_c$ and $W_a$ represent the ideal coefficients of $V_{c}^{*}$ and $u_{adp}^{*}$, and $\varepsilon _c$ and $\varepsilon _a$ are the approximation errors:

(29) \begin{equation} \begin{aligned} V_{c}^{*} ( X) =W_{c}^{T}\phi(X) +\varepsilon _c\\[3pt] u_{adp}^{*} ( X) =W_{a}^{T}\varphi ( X) +\varepsilon _a \end{aligned} \end{equation}
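The linear-in-parameter form of (29) can be illustrated with a minimal Python sketch. The basis functions `phi` and `varphi` below are hypothetical choices for a two-state system and are not the bases used in the paper; the weight values are arbitrary placeholders, and the approximation errors $\varepsilon _c$ and $\varepsilon _a$ are simply omitted.

```python
import numpy as np

def phi(X):
    """Hypothetical quadratic critic basis for a state X = [x1, x2]."""
    x1, x2 = X
    return np.array([x1 * x1, x1 * x2, x2 * x2])

def varphi(X):
    """Hypothetical actor basis (linear and cubic terms) for X = [x1, x2]."""
    x1, x2 = X
    return np.array([x1, x2, x1 ** 3, x2 ** 3])

def critic(W_c_hat, X):
    """Estimated value: W_c_hat^T phi(X), i.e. (29) without epsilon_c."""
    return float(W_c_hat @ phi(X))

def actor(W_a_hat, X):
    """Estimated ADP control: W_a_hat^T varphi(X), i.e. (29) without epsilon_a."""
    return float(W_a_hat @ varphi(X))

# Placeholder weights and state, chosen only for illustration.
W_c_hat = np.array([1.0, 0.5, 2.0])
W_a_hat = np.array([-1.0, -0.5, 0.0, 0.0])
X = np.array([0.1, -0.2])

V_hat = critic(W_c_hat, X)
u_hat = actor(W_a_hat, X)
```

Both estimates are inner products of a weight vector with a fixed feature vector, which is what allows the weight errors $\tilde{W}_c$ and $\tilde{W}_a$ to appear linearly in the derivation that follows.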

According to (19),

(30) \begin{equation} \begin{aligned} V_{c}^{*}\!\left ( X_t \right ) -V_{c}^{*}\!\left ( X_{t-T} \right ) = \\[3pt] -\int _{t-T}^t{\left ( 2r\left ( u_{tsmc}+u_{adp}^{*} \right ) \left ( u_s-u_{adp}^{*} \right ) +X^TQX+r\left ( u_{tsmc}+u_{adp}^{*} \right ) ^2 \right ) d\tau } \end{aligned} \end{equation}
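Because (30) relates the value difference over one window $[t-T, t]$ to an integral of measured signals, that integral can be approximated directly from online samples. The sketch below uses the trapezoidal rule; the function name `integral_cost`, the sampling layout, and the numerical values are illustrative assumptions, and $u_s$ is treated as a sampled signal with the meaning given where it is defined earlier in the paper.

```python
import numpy as np

def integral_cost(t, X, u_tsmc, u_adp, u_s, Q, r):
    """Trapezoidal estimate of the integral on the right-hand side of (30).

    t                   : (N,) sample times covering [t - T, t]
    X                   : (N, n) sampled states
    u_tsmc, u_adp, u_s  : (N,) sampled control signals
    Q                   : (n, n) state weighting matrix
    r                   : scalar control weight
    """
    u = u_tsmc + u_adp
    state_cost = np.einsum("ni,ij,nj->n", X, Q, X)  # X^T Q X at each sample
    integrand = 2.0 * r * u * (u_s - u_adp) + state_cost + r * u ** 2
    dt = np.diff(t)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dt))

# Constant placeholder signals over a window of length T = 0.1 s.
N = 11
t = np.linspace(0.0, 0.1, N)
X = np.tile([1.0, 0.0], (N, 1))
u_tsmc = np.full(N, 1.0)
u_adp = np.full(N, 0.5)
u_s = np.full(N, 0.5)

value = integral_cost(t, X, u_tsmc, u_adp, u_s, np.eye(2), 1.0)
delta_V = -value  # right-hand side of (30) for this window
```

With these constant inputs the integrand equals 3.25 at every sample, so the estimate is 3.25 × 0.1 = 0.325 and the value difference predicted by (30) is its negative.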

Substituting (29) into (30):

(31) \begin{equation} \begin{aligned} \left ( W_{c}^{T}\phi \left ( X_t \right ) +\varepsilon _c\left ( t \right ) \right ) -\left ( W_{c}^{T}\phi \left ( X_{t-T} \right ) +\varepsilon _c\left ( t-T \right ) \right ) = \\[3pt] -\int _{t-T}^t{\left ( 2r\left ( u_{tsmc}+W_{a}^{T}\varphi \left ( X \right ) +\varepsilon _a \right ) \left ( u_s-W_{a}^{T}\varphi \left ( X \right ) -\varepsilon _a \right ) +X^TQX+r\left ( u_{tsmc}+W_{a}^{T}\varphi \left ( X \right ) +\varepsilon _a \right ) ^2 \right ) d\tau } \end{aligned} \end{equation}

Then, substituting $\tilde{W}_c=W_c-\hat{W}_c$ and $\tilde{W}_a=W_a-\hat{W}_a$ into (22),

(32) \begin{equation} \begin{aligned} \epsilon \left ( t \right ) =\left ( W_c-\tilde{W}_c \right ) \left ( \phi \left ( x_t \right ) -\phi \left ( x_{t-T} \right ) \right ) + \\[3pt] \int _{t-T}^t \!{\left ( 2r\left ( u_{tsmc}+\left ( W_a-\tilde{W}_a \right ) \varphi\! \left ( X \right ) \right ) \left ( u_s-\!\left(W_a-\tilde{W}_a \right ) \varphi\! \left ( X \right ) \right ) +X^TQX+r\left ( u_{tsmc}+\left ( W_a-\tilde{W}_a \right ) \varphi\! \left ( X \right ) \right ) ^2 \right ) d\tau } \end{aligned} \end{equation}

Subtracting (32) from (31) gives

(33) \begin{equation} \epsilon \left ( t \right ) =-\left ( \tilde{W}_c\left ( \phi \left ( x_t \right ) -\phi \left ( x_{t-T} \right ) \right ) +\tilde{W}_a\eta \left ( t \right ) -\int _{t-T}^t{rW_a\varphi \left ( X \right ) \tilde{W}_a\varphi \left ( X \right ) d\tau }-\varepsilon _{HJB} \right ) \end{equation}

where $\varepsilon _{HJB}=-\left [ \varepsilon _c\left ( t \right ) -\varepsilon _c\left ( t-T \right ) \right ] -\int _{t-T}^t{\left ( 2r\varepsilon _a\left ( u_s-W_{a}^{T}\varphi \left ( X \right ) \right ) -r\varepsilon _{a}^{2} \right ) d\tau }$ .

Define the Lyapunov candidate $L_y=\frac{1}{2\lambda _1}\tilde{W}_{c}^{T}\tilde{W}_c+\frac{1}{2\lambda _2}\tilde{W}_{a}^{T}\tilde{W}_a$ . Its time derivative is

(34) \begin{equation} \begin{aligned} \dot{L_y}=\frac{1}{\lambda _1}\tilde{W}_{c}^{T}\dot{\tilde{W}}_c+\frac{1}{\lambda _2}\tilde{W}_{a}^{T}\dot{\tilde{W}}_a \\[3pt] =\frac{\epsilon \left ( t \right )}{m_{s}^{2}\left ( t \right )}\left ( \tilde{W}_c\left ( \phi \left ( x_t \right ) -\phi \left ( x_{t-T} \right ) \right ) +\tilde{W}_a\eta \left ( t \right ) \right ) \\[3pt] \leqslant -\left \| \frac{\rho \left ( t \right )}{m_s\left ( t \right )}\tilde{W} \right \| \left [ \left \| \frac{\rho \left ( t \right )}{m_s\left ( t \right )}\tilde{W} \right \| -\left \| \frac{\varepsilon _H}{m_s\left ( t \right )} \right \| \right ] \end{aligned} \end{equation}

where $\rho \left ( t \right ) =\left [ \phi ^T\left ( x_t \right ) -\phi ^T\left ( x_{t-T} \right ), \eta ^T\left ( t \right ) \right ] ^T$ and $\tilde{W}=\left [ \tilde{W}_{c}^{T},\tilde{W}_{a}^{T} \right ] ^T$ .

Therefore $\dot{L_y}\leqslant 0$ if $\left \| \frac{\rho \left ( t \right )}{m_s\left ( t \right )}\tilde{W} \right \| \gt \left \| \frac{\varepsilon _H}{m_s\left ( t \right )} \right \|$ , since $\left \| m_s\left ( t \right ) \right \| \gt 1$ . This provides an effective practical bound for $\left \| \rho \left ( t \right ) \tilde{W} \right \|$ , since $L_y$ decreases whenever this condition holds. According to Lemma 2 in [28], $\tilde{W}_c$ and $\tilde{W}_a$ are uniformly ultimately bounded.
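The mechanism behind (34) can be checked numerically with a simplified sketch. It assumes a normalised gradient law of the form implied by the second line of (34), $\dot{\tilde{W}}=\lambda \epsilon \rho /m_s^2$ with a single gain $\lambda$, a residual reduced to $\epsilon =-(\rho ^T\tilde{W}-\varepsilon _H)$ (the integral term of (33) is dropped), and a normaliser $m_s=1+\rho ^T\rho$. The random regressor, gain, step size and initial error are all hypothetical stand-ins, not values from the paper.

```python
import numpy as np

def simulate_weight_error(steps=2000, lam=1.0, dt=0.05, eps_h=0.0, seed=0):
    """Euler simulation of the weight-error dynamics under a normalised
    gradient law (simplified, assumed form; see the lead-in)."""
    rng = np.random.default_rng(seed)
    w_tilde = np.array([1.0, -2.0, 0.5, 1.5])   # stacked [W_c~; W_a~], placeholder
    norms = [np.linalg.norm(w_tilde)]
    for _ in range(steps):
        rho = rng.normal(size=w_tilde.size)     # stand-in for the regressor rho(t)
        m_s = 1.0 + rho @ rho                   # assumed normaliser, m_s >= 1
        eps = -(rho @ w_tilde - eps_h)          # simplified residual
        w_tilde = w_tilde + dt * lam * eps * rho / m_s ** 2
        norms.append(np.linalg.norm(w_tilde))
    return np.array(norms)

# With eps_h = 0 the error norm should never increase from step to step.
norms = simulate_weight_error()
```

With `eps_h=0.0` the norm of the stacked weight error is non-increasing at every step, mirroring $\dot{L_y}\leqslant 0$. Setting `eps_h` to a non-zero value lets one observe that the error stops shrinking once $\| \rho ^T\tilde{W}\|$ falls to the order of the residual bound, which is the behaviour described by uniform ultimate boundedness.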

References

Rubio, F., Valero, F. and Llopis-Albert, C., “A review of mobile robots: Concepts, methods, theoretical framework, and applications,” Int J Adv Robot Syst. 16(2), 1–22 (2019).
Fadini, G., Kumar, S., Kumar, R., Flayols, T., Del Prete, A., Carpentier, J. and Souères, P., “Co-designing versatile quadruped robots for dynamic and energy-efficient motions,” Robotica 42(6), 2004–2025 (2024).
Huang, Y., Liao, Q., Guo, L. and Wei, S., “Simple realization of balanced motions under different speeds for a mechanical regulator-free bicycle robot,” Robotica 33(9), 1958–1972 (2015).
Huang, J., Zhang, M., Ri, S., Xiong, C., Li, Z. and Kang, Y., “High-order disturbance-observer-based sliding mode control for mobile wheeled inverted pendulum systems,” IEEE T Ind Electron. 67(3), 2030–2041 (2020).
Beznos, A., Formal’sky, A., Gurfinkel, E., Jicharev, D., Lensky, A., Savitsky, K. and Tchesalin, L., “Control of autonomous motion of two-wheel bicycle with gyroscopic stabilisation,” In: IEEE International Conference on Robotics and Automation, Leuven, Belgium (1998) pp. 2670–2675.
Chen, C.-K., Chu, T.-D. and Zhang, X.-D., “Modeling and control of an active stabilizing assistant system for a bicycle,” Sensors 19(2), 248 (2019).
Keo, L. and Yamakita, M., “Controller design of an autonomous bicycle with both steering and balancer controls,” In: IEEE International Conference on Control Applications/International Symposium on Intelligent Control, St Petersburg, Russia (2009) pp. 1294–1299.
He, K., Deng, Y., Wang, G., Sun, X., Sun, Y. and Chen, Z., “Learning-based trajectory tracking and balance control for bicycle robots with a pendulum: A gaussian process approach,” IEEE-ASME T Mech. 27(2), 634–644 (2022).
Kanjanawanishkul, K., “LQR and MPC controller design and comparison for a stationary self-balancing bicycle robot with a reaction wheel,” Kybernetika 51(1), 173–191 (2015).
Wang, S., Cui, L., Lai, J., Yang, S., Chen, X., Zheng, Y., Zhang, Z. and Jiang, Z.-P., “Gain scheduled controller design for balancing an autonomous bicycle,” In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Electr Network (2020-2021) pp. 7595–7600.
Kim, H.-W., An, J.-W., Yoo, H.D and Lee, J.-M., “Balancing control of bicycle robot using pid control,” In: 13th International Conference on Control, Automation and Systems (ICCAS), Gwangju, South Korea (2013) pp. 145–147.
Xiong, C., Huang, Z., Gu, W., Pan, Q., Liu, Y., Li, X. and Wang, E. X., “Static balancing of robotic bicycle through nonlinear modeling and control,” In: 3rd International Conference on Robotics and Automation Engineering (ICRAE), Guangzhou, China (2018) pp. 24–28.
Owczarkowski, A., Horla, D. and Zietkiewicz, J., “Introduction of feedback linearization to robust lqr and lqi control - analysis of results from an unmanned bicycle robot with reaction wheel,” Asian J Control 21(2), 1028–1040 (2019).
Jeong, S. and Chwa, D., “Sliding-mode-disturbance-observer-based robust tracking control for omnidirectional mobile robots with kinematic and dynamic uncertainties,” IEEE-ASME T Mech 26(2), 741–752 (2021).
Tuan, L. A. and Ha, Q. P., “Adaptive fractional-order integral fast terminal sliding mode and fault-tolerant control of dual-arm robots,” Robotica 42(5), 1476–1499 (2024).
Song, J., Ho, D. W. C. and Niu, Y., “Model-based event-triggered sliding-mode control for multi-input systems: Performance analysis and optimisation,” IEEE T Cybernetics 52(5), 3902–3913 (2022).
Behera, A., Bandyopadhyay, B., Cucuzzella, M., Ferrara, A. and Yu, X., “A survey on event-triggered sliding mode control,” IEEE Journal of Emerging and Selected Topics in Industrial Electronics 2(3), 206–217 (2021).
Guo, L., Liao, Q. and Wei, S., “Design of fuzzy sliding-mode controller for bicycle robot nonlinear system,” In: IEEE International Conference on Robotics and Biomimetics (ROBIO 2006), Kunming, China (2006) pp. 176–180.
Alizadeh, M., Ramezani, A. and Saadatinezhad, H., “Fault tolerant control in an unmanned bicycle robot via sliding mode theory,” IET Cyber-syst Robot. 4(2), 139–152 (2022).
Chen, L., Yan, B., Wang, H., Shao, K., Kurniawan, E. and Wang, G., “Extreme-learning-machine-based robust integral terminal sliding mode control of bicycle robot,” Control Eng Pract. 121, 105064 (2022).
Chen, L., Liu, J., Wang, H., Hu, Y., Zheng, X., Ye, M. and Zhang, J., “Robust control of reaction wheel bicycle robot via adaptive integral terminal sliding mode,” Nonlinear Dynam. 104(3), 2291–2302 (2021).
Zhu, X., Deng, Y., Zheng, X., Zheng, Q., Liang, B. and Liu, Y., “Online reinforcement-learning-based adaptive terminal sliding mode control for disturbed bicycle robots on a curved pavement,” Electronics 11(21), 3495 (2022).
Zhu, X., Deng, Y., Zheng, X., Zheng, Q., Chen, Z., Liang, B. and Liu, Y., “Online series-parallel reinforcement-learning-based balancing control for reaction wheel bicycle robots on a curved pavement,” IEEE Access 11, 66756–66766 (2023).
Huo, B., Yu, L., Liu, Y. and Sha, S., “Reinforcement learning based path tracking control method for unmanned bicycle on complex terrain,” In: IECON 2023 - 49th Annual Conference of the IEEE Industrial Electronics Society, Singapore, Singapore (2023) pp. 1–6.
Guo, L., Lin, H., Jiang, J., Song, Y. and Gan, D., “Combined control algorithm based on synchronous reinforcement learning for a self-balancing bicycle robot,” ISA T. 145, 479–492 (2024).
Ma, Q., Zhang, X., Xu, X., Yang, Y. and Wu, E. Q., “Self-learning sliding mode control based on adaptive dynamic programming for nonholonomic mobile robots,” ISA T. 142, 136–147 (2023).
Zhu, Y. and Zhao, D., “Comprehensive comparison of online adp algorithms for continuous-time optimal control,” Artif Intell Rev. 49(4), 531–547 (2018).
Vamvoudakis, K. G. and Lewis, F. L., “Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem,” Automatica 46(5), 878–888 (2010).
Liu, D., Xue, S., Zhao, B., Luo, B. and Wei, Q., “Adaptive dynamic programming for control: A survey and recent advances,” IEEE T Syst Man Cy-S. 51(1), 142–160 (2021).
Bhasin, S., Kamalapurkar, R., Johnson, M., Vamvoudakis, K. G., Lewis, F. L. and Dixon, W. E., “A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems,” Automatica 49(1), 82–92 (2013).
Vamvoudakis, K. G., Vrabie, D. and Lewis, F. L., “Online adaptive algorithm for optimal control with integral reinforcement learning,” Int J Robust Nonlin. 24(17), 2686–2710 (2014).
Yu, S., Yu, X. and Zhihong, M., “Robust global terminal sliding mode control of SISO nonlinear uncertain systems,” In: Proceedings of the 39th IEEE Conference on Decision and Control (Cat 00CH37187), vol. 3 (2000) pp. 2198–2203.
Spong, M., Corke, P. and Lozano, R., “Nonlinear control of the reaction wheel pendulum,” Automatica 37(11), 1845–1851 (2001).
Sutton, R. S. and Barto, A. G., Reinforcement Learning: An Introduction (MIT Press, United States, 2018).