1. Introduction
Demand for electronics manufacturing capacity grows with the rising consumption of consumer electronics such as mobile phones and laptops. Connector assembly is one of the most important and challenging stages in electronic manufacturing. Flexible flat cable (FFC) assembly is common in modern electronics for connecting two electronic components [Reference Chapman, Gorjup, Dwivedi, Matsunaga, Mariyama, MacDonald and Liarokapis1]. The difficulties of assembling FFC are the tiny assembly tolerance and the uncertain disturbance that arises because the cable deforms easily under external force. Currently, this task is still carried out manually. Robot assembly plays a significant role in automated production. Nevertheless, conventional robot assembly adapts poorly to uncertain environments because it is based on position control: the robot performs the task by tracking a desired trajectory, and even a tiny position deviation may result in unsuccessful assembly. Relatively few works have addressed FFC assembly; however, a large body of research tackles similar assembly tasks.
A frequently used method is to model the contact state and analyze the geometric errors. Yao and Cheng [Reference Yao and Cheng2] derived the geometrical compatibility condition to address the issue of force overshoot in non-cylindrical part assembly tasks. Park et al. [Reference Park, Park, Lee, Park, Baeg and Bae3] analyzed the contact state between the peg and the hole and then presented unit motions based on this analysis to perform the peg-in-hole assembly task. A novel modeling method based on a Non-Uniform Rational B-Splines surface was proposed by Zhang et al. [Reference Zhang, Zhang, Jin and Zhang4] for precise assembly tasks. Tang et al. [Reference Tang, Lin, Zhao, Chen and Tomizuka5] built a three-point contact model and estimated the pose misalignment between the peg and hole through force and geometric analysis.
However, it is difficult to analyze the contact state in a complex environment. In such cases, a direct assembly strategy based on demonstration is commonly used. Duque et al. [Reference Duque, Prieto and Hoyos6] generated assembly plans based on demonstration data captured by the Kinect motion sensor for assembling construction toys. To shorten the setup times of automated assembly tasks, Kramberger et al. [Reference Kramberger, Piltaver, Nemec, Gams and Ude7] proposed a method for learning the constraints of desired tasks via demonstration and autonomous exploration. Su et al. [Reference Su, Meng, Wang and Yang8] introduced a strategy to teach robots assembly skills by combining adaptive impedance control (AIC) with dynamic motion primitives. To solve peg-in-hole tasks, Abu-Dakka et al. [Reference Abu-Dakka, Nemec, Kramberger, Buch, Krüger and Ude9] proposed an algorithm based on programming by demonstration. Roveda et al. [Reference Roveda, Magni, Cantoni, Piga and Bucca10] proposed an assembly method based on a Bayesian optimization algorithm, which enables robots to perform assembly tasks efficiently from a few human demonstrations while compensating for the task uncertainties.
Compared with rigid components, the assembly of FFC (Figure 1) entails more uncertain interference and presents greater challenges. FFC is a type of flexible electronics, so it is highly prone to deformation under external force. This causes the FFC to move out of sync with the robot, which is equivalent to the environment constantly changing. An assembly strategy based on the contact model is stable and efficient in a structured environment. Nevertheless, FFC assembly is a complex contact-rich manipulation task, and it is difficult to establish the contact model. Assembly methods that combine human demonstration data with iterative optimization algorithms show high learning efficiency and data utilization rates. However, high-precision FFC assembly is fraught with uncertainties, primarily stemming from deformation and friction between the FFC and connector. It is difficult to collect demonstration data that are rich and plentiful enough to take all uncertainties into account [Reference Hou, Fei, Deng and Xu11, Reference Ma, Xie, Zhu and Liu12]. Additionally, FFC is very fragile and easily damaged. Therefore, an advanced algorithm with remarkable adaptability is desirable to complete this task.
With the rapid development of machine learning technology, reinforcement learning (RL) has emerged as a direct optimization approach that enables robots to learn skills through trial-and-error without requiring prior knowledge. It shows impressive skill learning performance in the field of robotics. Roveda et al. [Reference Roveda, Maskani and Franceschi13, Reference Roveda, Testa, Shahid, Braghin and Piga14] combined RL and variable impedance control methods to recognize human intentions and ensure the robot moves safely in the intended motion direction, which significantly improves the performance of physical human–robot interaction. RL has also brought about new opportunities for robot assembly skill acquisition through trial-and-error. Ma et al. [Reference Ma, Xie, Zhu and Liu12] presented an efficient robot assembly skill learning framework for precise assembly tasks by combining offline pretraining based on a few demonstrations from experts and online self-learning using RL. Inoue et al. [Reference Inoue, De Magistris, Munawar, Yokoya and Tachibana15] proposed a strategy to perform the high-precision peg-in-hole fitting task by training a recurrent neural network with RL. Xu et al. [Reference Xu, Hou, Wang, Xu, Zhang and Chen16] introduced a model-driven deep RL algorithm to complete multiple peg-in-hole tasks. Luo et al. [Reference Luo, Solowjow, Wen, Ojea, Agogino, Tamar and Abbeel17] combined RL with a force controller to solve a gear assembly task. Few previous studies involve electronic product assembly tasks [Reference Shi, Chen, Liu, Riedel, Gao, Feng, Deng and Zhang18–Reference Schoettler, Nair, Luo, Bahl, Ojea, Solowjow and Levine20], and fewer still address the FFC. Data inefficiency restricts the application of RL in real complex systems [Reference Liu, Tian, Ai, Li, Cao and Wang21], and large-scale trial-and-error brings the risk of damage to the FFC.
Improving learning efficiency and reducing trial-and-error are the problems to be solved. In addition, the assembly of FFC is a complex contact-rich manipulation task, and small position deviation can result in a large contact force. During the assembly process, a force controller with excellent robustness and adaptability should be taken into account to deal with the uncertain disturbance. Impedance control (IC) is one of the popular compliant control methods proposed by Hogan [Reference Hogan22], and many efforts have been made to further improve the force-tracking performance within this control frame. Roveda et al. [Reference Roveda, Riva, Bucca and Piga23, Reference Roveda, Pallucca, Pedrocchi, Braghin and Tosatti24] combined fuzzy learning and iterative learning algorithms to enhance the force-tracking capability of robots in unknown environments, demonstrating strong robustness. The adaptive control algorithm is also applied to minimize the force-tracking error under uncertain disturbances [Reference Wang, Zhang and Lu25–Reference Xu, Wang, Yue, Wang, Peng, Liu, Chen and Shi29]. The purpose of this paper is to design a force controller that is easy to implement and highly robust in order to solve the force control issues in the FFC assembly process. To address the two main issues mentioned above, the main contributions are summarized as follows:
1. An efficient parallel assembly skill learning algorithm is proposed based on RL. The simulation system and the real robot system share their learning experiences in parallel. The environment information obtained from the real robot system is used to train the optimal policy, providing efficient guidance for performing FFC assembly tasks on the real robot system. At the same time, the optimal policy trained in the simulation system is refined by continuously updating it with real physical information.
2. An AIC algorithm is presented to track the desired force in the unstructured environment during the assembly process, and the stability is also analyzed based on the Lyapunov function.
3. The proposed parallel learning algorithm and AIC are combined to enable the robot to acquire the skill of FFC assembly efficiently.
The rest of the paper is organized as follows. In Section 2, an efficient parallel assembly skill learning algorithm is proposed. Section 3 presents the AIC algorithm. In Section 4, FFC assembly experiments are conducted to demonstrate the performance of the proposed assembly skill acquisition strategy. Finally, a conclusion is drawn in Section 5.
2. Efficient parallel assembly skill learning algorithm
The thickness of FFC is only 0.3 mm, which makes it highly susceptible to deformation under external force. Additionally, FFC is very fragile. These characteristics require assembly skill learning algorithms to learn efficiently, both to cope with the uncertainty caused by deformation and to reduce the risk of damaging the FFC. Sim-to-real is a fast approach for training robots to acquire skills [Reference Narang, Sundaralingam, Macklin, Mousavian and Fox30, Reference Shahid, Narang, Petrone, Ferrentino, Handa, Fox, Pavone and Roveda31]. Inspired by this, we present an efficient parallel assembly skill learning algorithm based on RL to accelerate the training process for high-precision FFC assembly tasks. The main idea of the algorithm is to share information, which includes learning experience, goal state, and reward information, between the simulation system and the real robot system in parallel. The simulation system efficiently trains the optimal policy based on the successful and failed assembly pose data collected from the real robot system. The agent receives rewards or penalties when the robot reaches the corresponding poses. The optimal policy trained in the simulation system is used to guide the real robot system in completing the assembly task. Meanwhile, the real robot system rectifies the optimal policy trained by the simulation system. The simulation system in this paper is built using V-REP; however, it does not take the interaction model into consideration. Because high-precision FFC assembly involves complex contact, friction has a significant impact on the assembly. In addition, FFC is made of flexible material that easily deforms under external force. Therefore, it is extremely challenging to establish a simulation environment consistent with the real robot system. The FFC assembly skill learning algorithm is shown in Figure 2 and Algorithm 1.
The parameter server stores the evaluation model, which is trained by a Softmax classifier using physical environment information from the real robot system. The simulation system also uses the parameter server to train the optimal policy, and the stored model is continuously rectified by the real robot system. The experience pool, consisting of two shared Q-tables, saves the learning experience of all agents so that each agent can learn from the best experiences of the others, which accelerates the training process. Each agent has its own Q-function and continuously updates the shared Q-tables. Here, experience refers to the recorded mapping from the state space to the action space.
The weighted double Q-learning algorithm (WDQL) is utilized for each agent to acquire assembly skills, which dilutes the influence of the overestimation in Q-learning (QL) [Reference Zhang, Pan and Kochenderfer32]. The assembly process can be described as a Markov decision process, which consists of agents, environment, action space $\mathcal{A}$ , state space $\mathcal{S}$ , and reward strategies. At each step $t$ , the agent chooses an action $a\in \mathcal{A}$ , interacts with the environment, and then gets the reward $r_{t}$ . The state space $\mathcal{S}$ is defined as:
where $f_{x}$ , $f_{y}$ , and $f_{z}$ denote the contact forces along the x-, y-, and z-axes and $\tau _{z}$ denotes the moment about the z-axis, all measured by the force sensor. $x$ , $y$ , and $z$ are the positions of the end-effector of the manipulator along the x-, y-, and z-axes, and $\theta _{z}$ is its rotation angle about the z-axis.
According to the actual assembly process of the FFC, the action space consisting of six components (Figure 3) can be defined as:
where ${\Delta} x$ and ${\Delta} y$ denote the step lengths for translation adjustment along the x- and y-axes, and ${\Delta} \theta$ denotes the step length for rotation adjustment about the z-axis.
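The state and action definitions above can be sketched as simple Python structures. This is a minimal illustration, not the authors' implementation: the names `State`, `make_actions`, and `apply_action` are hypothetical, and the sketch assumes that the six action components are the positive and negative steps of ${\Delta} x$ , ${\Delta} y$ , and ${\Delta} \theta$ .

```python
from typing import NamedTuple


class State(NamedTuple):
    """Eight-component state: contact force/moment terms and end-effector pose terms."""
    fx: float
    fy: float
    fz: float
    tau_z: float
    x: float
    y: float
    z: float
    theta_z: float


def make_actions(dx: float, dy: float, dtheta: float):
    """Return six discrete actions as (x, y, theta_z) increments.

    Assumption: the six components are +/- steps along x, y, and about z.
    """
    return [
        (+dx, 0.0, 0.0), (-dx, 0.0, 0.0),
        (0.0, +dy, 0.0), (0.0, -dy, 0.0),
        (0.0, 0.0, +dtheta), (0.0, 0.0, -dtheta),
    ]


def apply_action(pose, action):
    """Add an action increment to an (x, y, theta_z) pose."""
    return tuple(p + a for p, a in zip(pose, action))


# Step lengths in mm, mm, and deg (example values, chosen for illustration).
ACTIONS = make_actions(dx=0.3, dy=0.3, dtheta=1.0)
```

A discrete action set of this kind keeps the Q-tables small, which suits tabular methods such as QL and WDQL.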
The assembly skill acquisition process starts with a random exploration. The performance of exploration is improved by maximizing the cumulative reward:
where $\gamma$ is the discount rate, $r$ is the current reward, and $k$ stands for the step number.
In the skill learning process, the purpose is to complete the task as quickly as possible. The reward strategy is applied to evaluate the performance of the action $a$ at state $s\in \mathcal{S}$ in the real robot and simulation system. The parameter server is utilized to evaluate whether the assembly task is completed successfully. According to the strategy [Reference Xu, Hou, Wang, Xu, Zhang and Chen16] and production specifications [38], the reward strategy is designed as follows:
where $k_{max}$ denotes the maximum number of steps in an episode. If the robot fails to complete the assembly after $k$ $(\geq k_{max})$ steps, it receives a reward $r=-0.5$ . If a danger occurs, meaning that the absolute value of a contact force or moment exceeds the safe boundary $f_{b}=[f_{x}^{b},f_{y}^{b},f_{z}^{b},\tau _{z}^{b}]$ , the robot receives a reward $r=-1$ . $f_{x}^{b}$ , $f_{y}^{b}$ , and $f_{z}^{b}$ denote the safe boundaries of the contact force along the x-, y-, and z-axes, and $\tau _{z}^{b}$ denotes the safe boundary of the moment about the z-axis. When one of the three situations in Eq. (4) occurs, the robot returns to the initial position and starts the next episode.
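The reward logic described above can be sketched as follows. This is a hedged illustration only: the penalties $-1$ and $-0.5$ come from the text, whereas the value `success_reward`, the zero intermediate reward, and the order in which the conditions are checked are assumptions made for this sketch. The function names are hypothetical. `discounted_return` uses the standard discounted sum $\sum _{k}\gamma ^{k}r_{k}$ , which is assumed to match the cumulative reward defined earlier.

```python
def exceeds_boundary(wrench, boundary):
    """True if any |component| of (fx, fy, fz, tau_z) exceeds its safe boundary."""
    return any(abs(w) > b for w, b in zip(wrench, boundary))


def reward(success, wrench, step, k_max, boundary, success_reward=1.0):
    """Return (reward, episode_done) for one step.

    success_reward is a hypothetical placeholder; -1 (danger) and
    -0.5 (step limit reached) follow the text.
    """
    if exceeds_boundary(wrench, boundary):
        return -1.0, True          # danger: contact force or moment too large
    if success:
        return success_reward, True
    if step >= k_max:
        return -0.5, True          # failed to assemble within k_max steps
    return 0.0, False              # assumed: no intermediate reward


def discounted_return(rewards, gamma):
    """Standard discounted cumulative reward: sum over k of gamma**k * r_k."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total
```

Each of the three terminal outcomes sets `episode_done`, mirroring the rule that the robot returns to the initial position whenever one of them occurs.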
In order to obtain the optimal assembly strategy, the WDQL is applied to select the best possible action. The action selection policy is defined as follows:
where $\xi$ is a random number generated between 0 and 1 and $\varepsilon \in (0,1)$ is the greedy parameter. $Q_{1}(s,a)$ and $Q_{2}(s,a)$ are the Q-functions, one of which is selected at random at each step and updated recursively by the Bellman equation:
where $\psi _{1}$ and $\psi _{2}$ are the learning rates and $\vartheta$ is a random number generated between 0 and 1. In the WDQL, the Q-values are recorded in the two Q-tables. $\partial _{1}$ and $\partial _{2}$ are the weighting parameters given by:
where $c$ is a constant.
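As a rough illustration of how a weighted double Q-learning update works, the sketch below follows the general formulation of the cited work [Reference Zhang, Pan and Kochenderfer32] rather than the exact equations of this paper. All function names are hypothetical. It assumes that $\varepsilon$ is the probability of acting greedily on the sum of the two tables, that either table is updated with probability 0.5, and that $c>0$ .

```python
import random
from collections import defaultdict


def make_q_table(n_actions):
    """Tabular Q-function: maps a hashable state to a list of action values."""
    return defaultdict(lambda: [0.0] * n_actions)


def select_action(q1, q2, state, n_actions, epsilon, rng=random):
    """Greedy on Q1 + Q2 with probability epsilon, else random (assumed convention)."""
    if rng.random() < epsilon:
        combined = [a + b for a, b in zip(q1[state], q2[state])]
        return max(range(n_actions), key=combined.__getitem__)
    return rng.randrange(n_actions)


def wdql_update(q_u, q_v, s, a, r, s_next, done, lr, gamma, c):
    """Update q_u[s][a], using q_v as the second estimator."""
    if done:
        target = r
    else:
        values = q_u[s_next]
        a_star = max(range(len(values)), key=values.__getitem__)
        a_low = min(range(len(values)), key=values.__getitem__)
        # The weight grows with the gap the second table sees between the
        # best and worst actions of the first table; c > 0 avoids 0/0.
        gap = abs(q_v[s_next][a_star] - q_v[s_next][a_low])
        beta = gap / (c + gap)
        target = r + gamma * (
            beta * q_u[s_next][a_star] + (1.0 - beta) * q_v[s_next][a_star]
        )
    q_u[s][a] += lr * (target - q_u[s][a])


def wdql_step(q1, q2, s, a, r, s_next, done, lr, gamma, c, rng=random):
    """Randomly pick which of the two tables to update on this transition."""
    if rng.random() < 0.5:
        wdql_update(q1, q2, s, a, r, s_next, done, lr, gamma, c)
    else:
        wdql_update(q2, q1, s, a, r, s_next, done, lr, gamma, c)
```

Blending the two estimates in the target is what tempers the overestimation of single-table QL without swinging to the underestimation of plain double Q-learning.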
To preliminarily validate the feasibility of the proposed method, a simulation of rectangular peg-in-hole assembly is implemented in Python. The successful assembly position and orientation are set to [4 4 5], and a search is performed within the given range to simulate the process of peg-in-hole assembly. The search range is set to $x\in [{-}8,8]$ , $y\in [{-}8,8]$ , and $\theta \in [{-}6,6]$ . The performance of the commonly used QL, deep Q-learning (DQL), actor-critic (AC), policy gradient (PG), and proximal policy optimization (PPO) is compared with that of the proposed method. The parameters of the skill learning algorithm are set to $\psi _{1},\psi _{2}=0.01$ , $\varepsilon =0.85$ , ${\Delta} x=1\,\mathrm{mm}$ , ${\Delta} y=1\,\mathrm{mm}$ , ${\Delta} \theta =1\,\mathrm{deg}$ , and $\gamma =0.85$ . The learning rate of the neural network in DQL, AC, PG, and PPO is set to 0.01. The simulation results are shown in Figure 4. With QL, DQL, AC, PG, and PPO, the number of learning steps shows no notable downward trend. With the proposed method, the learning efficiency improves as the number of agents $n$ increases. However, in real physical experiments, it is impractical to increase the number of agents of the simulation system indefinitely, as this would affect the real-time performance of the real robot system and the force-tracking performance of the force controller. Therefore, a trade-off must be made between force-tracking performance and the number of agents in a real assembly process.
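A toy environment in the spirit of this simulation might look like the sketch below. It is not the authors' simulation code: the class name `PegInHoleGrid`, the start pose at the origin, and the clipping at the range limits are all assumptions. The goal [4 4 5], the search range, and the unit step sizes follow the text.

```python
class PegInHoleGrid:
    """Toy discrete search over (x, y, theta) with a single goal cell."""

    # Six unit moves: +/- x, +/- y, +/- theta (1 mm, 1 mm, 1 deg per step).
    MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    def __init__(self, goal=(4, 4, 5), xy_limit=8, theta_limit=6, start=(0, 0, 0)):
        self.goal = goal
        self.limits = (xy_limit, xy_limit, theta_limit)
        self.start = start      # assumed start pose
        self.state = start

    def reset(self):
        """Return to the start pose and report it."""
        self.state = self.start
        return self.state

    def step(self, action):
        """Apply one move; return (new_state, done)."""
        move = self.MOVES[action]
        # Clip each coordinate so the search stays inside the stated range.
        self.state = tuple(
            max(-lim, min(lim, s + m))
            for s, m, lim in zip(self.state, move, self.limits)
        )
        done = self.state == self.goal
        return self.state, done
```

Any tabular learner can drive this environment through `reset` and `step`, using the state tuple directly as a Q-table key.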
3. Adaptive impedance control
The assembly of the FFC is a complex contact-rich manipulation task. AIC is used to ensure stable contact force between the FFC and connector during the learning assembly process. Meanwhile, the contact force information is also utilized as a crucial element for assembly learning. The combination of the parallel assembly skill learning algorithm and AIC in this paper refers to using the action sequence output by the learning strategy as the desired trajectory input to the AIC. The AIC is responsible for maintaining the desired contact force along the input trajectory. Throughout the process, the parallel assembly skill learning algorithm does not adjust the parameters of AIC. In this section, the proposed AIC is derived, and the stability is analyzed based on the Lyapunov function. The general form of the Cartesian IC model composed of both the translational and rotational parts can be expressed as follows:
where $\boldsymbol{M}\in \mathbb{R}^{6\times 6}$ is the mass matrix, $\boldsymbol{B}\in \mathbb{R}^{6\times 6}$ is the damping matrix, and $\boldsymbol{K}\in \mathbb{R}^{6\times 6}$ is the stiffness matrix. $\boldsymbol{F}_{\boldsymbol{d}}\in \mathbb{R}^{6}$ denotes the prescribed force, and $\boldsymbol{F}_{\boldsymbol{m}}\in \mathbb{R}^{6}$ is the measured force between the end-effector and the environment. $\boldsymbol{X}_{\boldsymbol{c}}\in \mathbb{R}^{6}$ is the command trajectory of the end-effector sent to the robot, and $\boldsymbol{X}_{\boldsymbol{d}}\in \mathbb{R}^{6}$ is the desired trajectory of the end-effector. $\boldsymbol{M}$ , $\boldsymbol{B}$ , and $\boldsymbol{K}$ are all diagonal matrices [Reference Ferraguti, Landi, Sabattini, Bonfè, Fantuzzi and Secchi33], so each direction can be controlled independently [Reference Duan, Gan, Chen and Dai34–Reference Seraji and Colbaugh36].
Without loss of generality, a one-dimensional case is studied. Eq. (8) can be rewritten as:
where ${\Delta} x=x_{c}-x_{d}$ . By applying the Laplace transform to Eq. (9), it can be expressed as:
Assuming that the environment can be represented by a linear spring model with stiffness $k_{e}$ , it has $f_{m}=k_{e}(x_{e}-x_{c})$ , where $x_{e}$ is the environment location. Then, we can get
According to Eq. (11), the steady-state force-tracking error can be written as:
For ${\Delta} f_{ss}=0$ to hold, either $k=0$ or both $k_{e}$ and $x_{e}$ must be known exactly. Typically, the environment is complicated, and it is difficult to obtain accurate information on $x_{e}$ and $k_{e}$ ; hence, the target IC can be modified as:
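The reasoning behind these two conditions can be made explicit with a short steady-state calculation. The sketch below assumes that the one-dimensional target impedance has the form $m{\Delta} \ddot{x}+b{\Delta} \dot{x}+k{\Delta} x={\Delta} f$ with ${\Delta} f=f_{m}-f_{d}$ . This sign convention is an assumption; the opposite convention changes signs in the intermediate steps but not the conclusion. At steady state the derivatives vanish, and the spring model $f_{m}=k_{e}(x_{e}-x_{c})$ gives $x_{c}=x_{e}-f_{m}/k_{e}$ :

```latex
\begin{aligned}
k\,(x_c - x_d) &= f_m - f_d, \qquad x_c = x_e - \frac{f_m}{k_e} \\
\Rightarrow\quad f_m\Bigl(1 + \frac{k}{k_e}\Bigr) &= f_d + k\,(x_e - x_d) \\
\Rightarrow\quad \Delta f_{ss} = f_m - f_d &= \frac{k}{k + k_e}\,\bigl[\,k_e\,(x_e - x_d) - f_d\,\bigr]
\end{aligned}
```

Under this convention, the error vanishes when $k=0$ , or when the desired trajectory is chosen as $x_{d}=x_{e}-f_{d}/k_{e}$ , which requires exact knowledge of both $x_{e}$ and $k_{e}$ .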
Generally, the force-tracking performance of traditional IC is poor in complex environments. To overcome this issue, an AIC algorithm based on the Lyapunov function is presented to further improve the force-tracking capability against uncertain disturbance. It can be expressed as:
where $\lambda _{p}(t)$ and $\lambda _{d}(t)$ are the adaptive gains, and $\varphi (t)$ is the auxiliary compensation function.
Substituting Eq. (15) into Eq. (14), Eq. (14) becomes
Let us define force-tracking error $\boldsymbol{E}=[\begin{array}{c@{\quad}c} {\Delta} f & {\Delta} \dot{f} \end{array}]^{T}$ , then Eq. (16) can be represented by:
The prescribed performance of ${\Delta} f$ can be described by the reference model:
where $R=\left[\begin{array}{c@{\quad}c} 0 & 1\\ -\omega ^{2} & -2\eta \omega \end{array}\right]$ , $\boldsymbol{E}_{r}=[\begin{array}{cc} e_{r} & \dot{e}_{r} \end{array}]^{T}$ , $\omega$ is the natural frequency, and $\eta$ denotes the damping ratio. The reference model is stable, that is, $\boldsymbol{E}_{r}\equiv 0$ .
Defining $\boldsymbol{E}_{m}=\boldsymbol{E}_{r}-\boldsymbol{E}$ , and substituting Eq. (17) and Eq. (18) into it, the error differential equation can be obtained:
A Lyapunov function is defined as:
where $\beta _{i}\ (i=0,1,2)$ are positive parameters and $\varphi$ , $\lambda _{d}$ , and $\lambda _{p}$ are functions of time. $\boldsymbol{P}=\left[\begin{array}{c@{\quad}c} p_{1} & p_{2}\\ p_{2} & p_{3} \end{array}\right]$ is a symmetric positive-definite constant matrix.
Differentiating Eq. (20), we can get
where $\delta =p_{2}{\Delta} f+p_{3}{\Delta} \dot{f}$ and $\boldsymbol{Q}=-\boldsymbol{PR}-\boldsymbol{D}^{T}\boldsymbol{P}$ [Reference Seraji37].
For $\boldsymbol{E}_{m}$ to tend to zero asymptotically, $\dot{V}$ should be negative-definite. To achieve this, we set
Substituting Eq. (22) into Eq. (21) and simplifying, Eq. (21) can be rewritten as:
As $\dot{V}$ is negative-definite, it implies $\boldsymbol{E}_{m}\rightarrow 0$ and Eq. (17) is asymptotically stable. From Eq. (22), the adaptive control law can be obtained as:
In order to make the control laws independent of $m$ , $b$ , and $k_{e}$ , we set
Substituting Eq. (25) into Eq. (24) and then integrating Eq. (24), the adaptive control law can be expressed as:
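To give a feel for how such an adaptive law can be evaluated in discrete time, the sketch below assumes that each of $\varphi$ , $\lambda _{p}$ , and $\lambda _{d}$ follows a proportional-plus-integral form driven by $\delta =p_{2}{\Delta} f+p_{3}{\Delta} \dot{f}$ , as in Seraji-type adaptive schemes [Reference Seraji37]. The pairing of gains with signals, the forward-Euler integration, and the class name `AdaptiveGains` are assumptions for illustration and not necessarily the exact law derived in this paper.

```python
class AdaptiveGains:
    """PI-type adaptation of (phi, lambda_p, lambda_d), driven by delta."""

    def __init__(self, phi0, lam_p0, lam_d0, mu, rho, p2, p3):
        self.initial = (phi0, lam_p0, lam_d0)
        self.mu = mu      # integral gains (mu0, mu1, mu2)
        self.rho = rho    # proportional gains (rho0, rho1, rho2)
        self.p2, self.p3 = p2, p3
        self.integrals = [0.0, 0.0, 0.0]  # running integrals of the driving signals

    def update(self, df, df_dot, dt):
        """Advance one time step and return (phi, lambda_p, lambda_d)."""
        delta = self.p2 * df + self.p3 * df_dot
        # Assumed driving signals: delta for phi, delta*df for lambda_p,
        # and delta*df_dot for lambda_d.
        signals = (delta, delta * df, delta * df_dot)
        # Forward-Euler integration of each driving signal.
        self.integrals = [i + s * dt for i, s in zip(self.integrals, signals)]
        return tuple(
            g0 + m * i + r * s
            for g0, m, i, r, s in zip(
                self.initial, self.mu, self.integrals, self.rho, signals
            )
        )
```

With zero force error the gains stay at their initial values; a persistent error accumulates in the integral terms, which is what lets the controller compensate for unknown environment stiffness and location.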
To verify the performance of the proposed method, force-tracking experiments on a curved surface are conducted (Figure 5) using IC, the commonly used AIC [Reference Wang, Zhang and Lu25–Reference Xu, Wang, Yue, Wang, Peng, Liu, Chen and Shi29], and the proposed AIC. The parameters of the proposed method are set to $f_{d}=10\,\mathrm{N}$ , $m=1$ , $b=200$ , $\varphi (0)=0.09$ , $\lambda _{p}(0)=0.1$ , $\lambda _{d}(0)=0.4$ , $\mu _{0}=1.5$ , $\varrho _{0}=0.3$ , $\mu _{1}=0.4$ , $\varrho _{1}=0.1$ , $\mu _{2}=0.1$ , and $\varrho _{2}=0.1$ . The initial value of $x_{c}$ is set to 0.193 m. The robot moves from free space into contact space. The experiments are executed five times with each approach. Figure 6(a) and (b) illustrate the position and force-tracking performance. The solid line and the shaded area represent the mean and standard deviation over the five groups of experiments. The results demonstrate that the proposed AIC tracks the desired force better than IC and the commonly used AIC, without a large force overshoot.
4. Experiments
To evaluate the performance of the proposed method, FFC assembly experiments are conducted in this section. First, the experiment platform is presented in detail. Second, FFC assembly experiments are conducted in three phases: picking up and stable contacting, model training, and further verification. In the model training experiments, the robot picks up an FFC and trains for 48 episodes. In the further verification experiments, the robot picks up another FFC and performs the assembly task twice using the model trained in the model training experiments, to further verify the robustness of the proposed assembly skill learning method. Third, the performance of QL, DQL, AC, PG, PPO, and the proposed method in FFC assembly tasks is compared.
4.1. Experiment setup
An overview of the assembly experiment platform is shown in Figure 7. This platform consists of a 6-DOF robot (Universal Robots, maximum load 5 kg, communication frequency 125 Hz, repeated positioning accuracy 0.03 mm), a six-axis force/torque sensor (NRS-6050-D80, maximum force/torque ±500 N/±10 Nm, sampling rate 1000 Hz, force/torque resolution 0.015 N/0.312 × 10⁻³ Nm) mounted on the end-effector of the robot, an air pump (maximum pressure 8 kPa), a pneumatic gripper, and a computer (NUC Intel Core i7, Ubuntu 18.04, Python 3.10) communicating with the robot via TCP.
The FFC and connector produced by I-PEX Co., Ltd. are used to verify the effectiveness of the proposed assembly skill acquisition strategy. The length and width of the FFC are 18.5 mm and 0.3 mm, respectively, whereas the length and width of the connector slot are 18.56 mm and 0.37 mm, respectively. The resulting clearances between the FFC and the connector are approximately 0.06 mm and 0.07 mm. Before the assembly task is conducted, the connector is fixed on the fixture, and two FFCs are also placed on the fixture. The position for gripping the FFC is determined by demonstration.
4.2. Experiment implementation
The FFC assembly experiments consist of three phases. The procedure of the FFC assembly experiment is shown in Figure 8, and the details of the experiments can be found in the supplemental video.
1. Picking up and stable contacting: The robot picks up FFC1 and approaches the connector. Then, the proposed AIC is applied to bring FFC1 and the connector into stable contact. The initial position is set manually and arbitrarily, with a large positional error. The parameters of AIC are the same as those of the force-tracking experiment in Section 3. According to the production specifications [38] provided by I-PEX Co., Ltd., the maximum mating force of the FFC (Product model: EVAFLEX 5-SE-VT 30p) is 18 N. With reference to this value, a smaller desired contact force $f_{d}=5\,\mathrm{N}$ along the z-axis is chosen.
2. Model training: The parallel assembly skill learning algorithm and the proposed AIC are combined to enable the robot to acquire FFC assembly skills through training. The number of training episodes $E$ is 48. An episode terminates when the FFC reaches the goal depth successfully, when the contact force or moment exceeds the maximum safe boundary $f_{b}$ , or when the number of learning steps exceeds the maximum $k_{max}=500$ . Then, the robot moves to the initial position and executes the next learning episode. The parameters of the skill learning algorithm are set to $\psi _{1},\psi _{2}=0.01$ , $\varepsilon =0.85$ , ${\Delta} x=0.3\,\mathrm{mm}$ , ${\Delta} y=0.3\,\mathrm{mm}$ , ${\Delta} \theta =1\,\mathrm{deg}$ , and $\gamma =0.9$ . The number of agents in the simulation system is set to 6. The safe boundary $f_{b}=[1.5,1.5,10,1]$ is set according to experience accumulated over a large number of experiments.
3. Further verification: After 48 episodes, the model is stored, and the robot places FFC1 back in its initial placement position and then picks up FFC2 as a new assembly task. To ensure that the FFC can be gripped successfully, there is a clearance of approximately 0.8 mm between the fixture and the FFC. However, this clearance causes the position of the FFC within the fixture to vary each time it is gripped. The purpose of this phase is to further validate the robustness of the proposed algorithm under these uncertainties.
4.3. Experiment results
To comprehensively evaluate the performance of the proposed method, we apply the proposed method, QL, DQL, AC, PG, and PPO, respectively, to conduct FFC assembly experiments and analyze their performance from multiple perspectives.
Figure 9 compares the FFC assembly performance of QL, DQL, AC, PG, PPO, and the proposed method. Figure 9(a) shows that the proposed method achieves a success rate of 92%, higher than the 74%, 58%, 38%, 42%, and 52% achieved by QL, DQL, AC, PG, and PPO, respectively. With the proposed method, the rates of failed assembly caused by the contact force exceeding the safe boundary and by the learning steps exceeding the maximum step are 6% and 2%, respectively. These are lower than the corresponding 20% and 6% for QL, 32% and 10% for DQL, 18% and 44% for AC, 22% and 36% for PG, and 28% and 20% for PPO. Moreover, Figure 9(b) shows that with the proposed method, failed assemblies occur only in the early stages of training, whereas with QL, DQL, AC, PG, and PPO, they occur throughout the entire learning process.
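The reported rates can be cross-checked with a few lines of arithmetic. The sketch below uses only the percentages quoted above and assumes that the three outcomes (success, force-boundary failure, step-limit failure) are exhaustive, so the success rate is what remains after subtracting the two failure rates.

```python
# (force-boundary failures %, step-limit failures %) as quoted in the text.
failure_rates = {
    "Proposed": (6, 2),
    "QL": (20, 6),
    "DQL": (32, 10),
    "AC": (18, 44),
    "PG": (22, 36),
    "PPO": (28, 20),
}

# Success rate is the remainder, assuming the three outcomes are exhaustive.
success_rates = {
    method: 100 - force_fail - step_fail
    for method, (force_fail, step_fail) in failure_rates.items()
}

for method, rate in success_rates.items():
    print(f"{method}: {rate}% success")
```

The computed values agree with the quoted success rates of 92%, 74%, 58%, 38%, 42%, and 52% for the proposed method, QL, DQL, AC, PG, and PPO, respectively.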
Figure 10(a) shows that, with the proposed method, the number of assembly steps rapidly converges to about 20 after 15 training episodes, whereas with QL and DQL it decreases only slightly and fluctuates greatly. With AC, PG, and PPO, there is no noticeable decrease. The main reason is that the thickness of FFC is only 0.3 mm, which makes it extremely susceptible to deformation under external force. This can result in a lack of synchronization between the FFC and the robot end-effector, which is equivalent to the environment constantly changing. The assembly experience that the agent has learned may become outdated when the environment changes, so the agent needs to adapt its model to the new environment quickly. However, the data inefficiency of QL, DQL, AC, PG, and PPO impedes the learning efficiency of FFC assembly. The last two episodes of the proposed method require about 30 steps to perform the FFC2 assembly task, indicating that the proposed method is robust against the uncertain disturbance. The number of steps for assembling FFC2 is slightly higher than for FFC1 mainly because of the position differences when gripping FFC2 and the repeated positioning errors of the robot.
Figure 10(b) shows the execution time distribution of the different methods in the FFC assembly experiments. The execution times of the other methods are spread more widely and shifted further to the right than those of the proposed method, which indicates the high efficiency of the proposed method. Figure 11 shows the contact force between the FFC and connector during one episode of the assembly process. The abrupt drop in the external force along the z-axis marks the critical phase in which the position of the connector is found successfully. Then, the robot moves downwards until it reaches the goal depth or the contact force exceeds the safe boundary. Figure 12 is obtained by smoothly fitting the maximum Q-values under different states. To depict the result in three-dimensional space, the dimension corresponding to the rotational state about the z-axis is omitted. The figure demonstrates the changing trend of the assembly policy: the agent moves towards the area with bright colors.
5. Conclusion and future work
For challenging FFC assembly tasks, an efficient assembly skill acquisition strategy is presented by combining a parallel assembly skill learning algorithm with AIC. The force-tracking experiments on a curved surface show that the proposed AIC performs better than the IC and the commonly used AIC, which helps it handle the complex contact issues during the assembly process. The FFC assembly experiments illustrate that the proposed skill acquisition strategy enables the robot to perform the FFC assembly task in fewer steps after training. It is robust against the disturbance of uncertain factors and learns assembly skills more efficiently than other commonly used methods.
The current work still has some limitations. First, the position for gripping the FFC is determined by demonstration, which will increase the workload of workers in practical applications. Second, the proposed parallel assembly skill learning algorithm requires tuning a significant number of parameters. Our future work will focus on combining the proposed method with a vision algorithm to achieve autonomous object recognition, grasping, and skill learning for FFC assembly tasks. Additionally, inspired by reference [Reference Roveda, Magni, Cantoni, Piga and Bucca10], we will combine the Bayesian optimization algorithm with the assembly skill learning algorithm proposed in this paper to achieve autonomous parameter tuning and further enhance learning efficiency.
Author contributions
Xiaogang Song: Writing – original draft, Methodology, and Conclusion. Peng Xu: Supervision, Writing – review and editing, and Project administration. Wenfu Xu: Supervision and Writing – review and editing. Bing Li: Supervision, Writing – review and editing, and Project administration. Lei Qin: Supervision and Writing – review and editing.
Financial support
This work was supported in part by the National Natural Science Foundation of China under Grant U22A20176 and Grant 52305016, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2022B1515120078, in part by the Science and Technology Innovation Committee of Shenzhen under Grant JSGGZD20220822095401004, and in part by the Shenzhen Peacock Innovation Team Project under Grant KQTD20210811090146075.
Competing interests
The authors declare no conflict of interest.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0263574724001164.