1. Introduction
1.1. Pose correction for aerospace robotic drilling
Achieving high-precision drilling tasks using industrial robots requires careful consideration of both geometric and nongeometric sources of error in robots. One approach to tackle this challenge involves the offline calibration of robots by re-evaluating their kinematic model parameters and adjusting the nominal parameters within the robot controller [Reference Arthur and Khoshdarregi1]. However, accurate kinematic calibration is time-consuming and still may not provide the required accuracy at all robot poses [Reference Maghami, Imbert, Côté, Monsarrat, Birglen and Khoshdarregi2]. Additionally, the calibration process often needs to be repeated depending on the payload of the robot, which further disrupts production operations. While specific tolerances may vary depending on the specific application and component being manufactured, aerospace machining operations generally aim for tight tolerances in order to ensure the integrity and reliability of assembled structures. In many cases, these operations strive to achieve tolerances within the range of 0.025–0.120 mm [Reference Saha3]. To meet these tolerances, robotic pose correction solutions have emerged to enhance the accuracy of drilling operations in aerospace manufacturing. This research presents the development and implementation of a vision-based pose correction technique, aiming to significantly improve drilling accuracy and overall efficiency in the aerospace industry.
In the drilling process, the perpendicularity of the drill bit relative to the surface plays a crucial role. Any deviation between the tool axis and the surface normal can lead to issues such as inaccurate hole diameter and excessive burr. Consequently, this can negatively impact the quality of joints between aircraft components. To address this issue, various techniques have been developed to enhance the accuracy of normal adjustment in drilling processes. Laser displacement sensors are extensively used to measure the surface normal and ensure perpendicularity in drilling operations. These sensors are noncontact measurement devices known for their numerous advantages, including high precision, low power consumption, and high reliability. Usually, three or four laser displacement sensors are installed on the robot’s end-effector to perform surface normal measurements [Reference Gao, Wu, Nan and Chen4, Reference Gong, Yuan, Wang, Yu, Xing and Huang5]. Several studies [Reference Yuan, Wang, Shi, Wang, Wang, Chen and Shen6–Reference Wang, Qin, Bai, Tan and Li8] have proposed drilling end-effectors equipped with four laser displacement sensors to measure the surface normal during the drilling process. However, the potential assembly errors of these sensors were not taken into account in their approaches. Fabrication and assembly errors in the end-effector can result in small discrepancies between the nominal and actual zero points. Therefore, researchers [Reference Chen, Yuan, Wang, Cai and Tang9, Reference Han, Yu and Zhu10] have proposed calibration methods to compute the errors of the zero point and laser beam direction in laser displacement sensors. They reported a reduction in average angular deviation down to 0.1°, which satisfied the established tolerances. While these methods effectively correct the robot’s orientation with respect to the surface, they do not directly address the challenge of accurate robot positioning.
Laser trackers are an alternative solution that can be used as an external measurement system for enhancing the accuracy of drilling operations. A laser tracker essentially functions as a laser displacement sensor that follows a target attached to the robot and can offer a 3D positional accuracy of 0.025 mm [11]. Laser trackers also offer additional 6D probes, e.g., the Leica T-Mac, with an orientational accuracy of 0.03° [Reference Wang and Keogh12]. Liu et al. [Reference Liu, Liu, Liu, Xie, Xu and Chen13] used a laser tracker with several 3D targets on the end-effector to correct the robot’s errors in a peg-in-hole assembly robot. In a similar approach, Posada et al. [Reference Posada, Schneider, Pidan, Geravand, Stelzer and Verl14] proposed an error correction solution for robotic drilling using a laser tracker and three separate 3D targets attached to the drilling spindle. Although they reported positional errors smaller than 0.1 mm and rotational deviations of 0.2°, their solution cannot be used for real-time measurements, as the targets must be measured one at a time using the laser tracker. Droll [Reference Droll15] developed a real-time path correction with direct end-effector feedback from a Leica laser tracker paired with a 6D probe. He reported an RMS error of 0.11 mm while driving the robot at 100 mm/s. Moeller et al. [Reference Moeller, Schmidt, Koch, Boehlmann, Kothe, Wollnack and Hintze16] used two laser trackers with 3D position targets and a 6D probe attached to the robot end-effector. They used real-time measurements from the laser tracker in a secondary controller to correct the robot’s trajectory and achieved errors of less than 0.25 mm in a milling operation. Chen et al. [Reference Chen, Yuan, Wang, Cai and Xue17] combined a laser tracker positioning target with four laser displacement sensors to improve the positioning accuracy in robotic drilling based on co-kriging.
Their method utilizes error similarity based on the kinematics of the drilling robot and laser tracker measurements to estimate positional errors at various points in the workspace. They reported a maximum/average positional error of 0.26/0.12 mm. Fernandez et al. [Reference Fernandez, Olabi and Gibaru18] proposed the implementation of a laser tracker with a 6D probe along with a laser line scanner at the robot end-effector. With their suggested system, they achieved a positioning accuracy of 0.15 mm in the robot workspace. Wang et al. [Reference Wang, Zhang and Keogh19] proposed a solution for real-time path error compensation in robots using a single 3D laser tracker target. They tested their methodology in robotic drilling tasks and showed a reduction in hole positional errors from 0.83 mm to 0.17 mm. Cvitanic et al. [Reference Cvitanic, Melkote and Balakirsky20] investigated the effect of fusing the data from a laser tracker 6D probe with an inertial measurement unit (IMU) to improve the robot end-effector state estimation. They reported up to 95% improvement in velocity estimation accuracy and up to 45% improvement in angular acceleration estimation over the laser tracker measured data. As previous studies suggest, laser trackers are undeniably suitable for precise position and orientation measurements in industrial robots. However, their high cost and challenges in implementation, particularly due to the bulkiness of 6D probes and the limitations of 3D targets, position them as better suited for offline error compensation and robot calibration than for online error correction.
Optical coordinate measurement machines (CMMs) are another alternative solution that can be used as an external measurement system for robotic applications. Schneider et al. [Reference Schneider, Posada, Drust and Verl21] proposed an approach for position control of industrial robots using the pass-through between an industrial computer numerical control (CNC) and servomotors. They used a CNC-controlled robot with an external optical CMM system paired with LED targets to create closed-loop feedback control on the robot end-effector and improve the robot accuracy to better than 0.4 mm. Chen et al. [Reference Chen, Yang, Hofschulte, Jiang and Zhang22] proposed an optical/inertial data fusion system for motion tracking of a robot manipulator. They used a Kalman filter to fuse the data from an IMU with measurements from optical trackers to achieve submillimeter accuracy. Gharaaty et al. [Reference Gharaaty, Shu, Joubair, Xie and Bonev23] used an optical CMM system for pose correction of the robot end-effector. They used a root mean squared filtering approach coupled with a proportional-integral-derivative (PID) controller to correct the robot’s pose error in an online manner. Overall, compared to laser trackers, optical CMMs are easier to implement for robot error correction. However, they offer lower accuracy due to their large measurement volume designed for general measurement applications.
Vision-based systems offer a practical and accurate measurement solution owing to their tailored approach and reasonable cost [Reference Pérez, Rodríguez, Rodríguez, Usamentiaga and García24]. Several vision-based solutions have been proposed to enhance drilling positional accuracy. Zhu et al. [Reference Zhu, Mei, Yan and Ke25] presented a two-dimensional (2D) vision system designed to enhance the positional accuracy of drilling processes. They achieved this by localizing multiple reference holes in the part, which ultimately led to a positional accuracy of 0.1 mm. Frommknecht et al. [Reference Frommknecht, Kuehnle, Effenberger and Pidan26] proposed a combined system of 2D vision detection along with laser distance sensors to detect and localize reference holes and achieved an accuracy of 0.3 mm. In a similar approach, Mei and Zhu [Reference Mei and Zhu27] combined laser displacement sensors with a 2D vision camera to position the tool using a set of reference holes. They reported a positioning accuracy of 0.05 mm. Similar approaches [Reference Liu, Zhenyun Shi, Chen, Lin and Li28–Reference Filonov, Zmeu, Notkin and Baranchugov30] have been used for positioning the drilling tool using different detection and localization methods for reference holes. Several studies [Reference Mei, Zhu, Yan and Ke31–Reference Lou, Lu, Cui, Jiang, Tian and Huang33] concentrated solely on achieving robust detection of circular holes through contour refinement and model fitting, as it directly affects the precision of the drilling process.
Vision-based solutions based on part localization and feature detection have also been utilized in other manufacturing tasks [Reference Yu, Bi, Ji, Fan, Huang and Wang34]. Yang et al. [Reference Yang, Jiang and Wang35] proposed a robotic multi-view pose estimation method for the pick-and-place operation of large parts with an accuracy better than 1 mm. Jiang et al. [Reference Jiang, Cui, Cheng and Tian36] developed a vision-based guidance solution for robotic peg-in-hole operations. They used two sets of binocular cameras, one in an eye-to-hand configuration to localize the robot end-effector, and the other installed on the robot (eye-in-hand configuration) for aligning the tool with reference holes. Ayyad et al. [Reference Ayyad, Halwani, Swart, Muthusamy, Almaskari and Zweiri37] developed a neuromorphic vision-based solution for the positioning of robotic drilling systems. In their solution, they first performed a multi-view reconstruction to estimate the workpiece’s pose, and then refined the pose for a local region of the workpiece by detecting reference circular holes. A positional accuracy of 0.2 mm was reported for their solution. Although the above-mentioned vision-based solutions for robotic drilling can offer high accuracy in measurements, they mainly rely on pre-existing reference holes for correcting the robot pose. This restricts their implementation in scenarios where drilling operations are required on large aerospace panels, where reference holes either are not visible or do not exist.
This study focuses on the development of a high-precision vision-based measurement system that can deliver accurate measurements in the robot workspace. By simultaneously measuring the robot pose and the part pose, we combine part localization with robot pose error correction and propose an online relative error correction solution between the robot end-effector and the part. We compare different implementation methods for the proposed solution and test them experimentally on the robot-workpiece setup.
1.2. Target localization for robot pose measurement
Figure 1(a) shows a stereo optical tracker and the circular targets used for measuring the robot’s spindle pose. To achieve high accuracy in measurements, positioning targets are used to mark the points of interest. Accurate detection of these positioning targets in the images is one of the main steps in the measurement process. Flat circular targets are commonly used in these systems due to their detection accuracy and robustness [Reference Shen, Zhang, Cheng and Zhu38]. These targets can be active or passive. Active targets are normally made of infrared (IR) or near-IR light-emitting diodes (LEDs) [Reference Wang, Zhang, Yu, Chen, Si, He, Wang, Zhang, Eds. and dv39]. Due to the radiometric characteristics of IR light, and with the help of near-IR-bandpass imaging filters, the effect of environmental light can be reduced, which results in high-contrast images of the targets. Although active targets have the advantage of high visibility in measurements, they are difficult to implement due to wiring requirements and their large dimensions [Reference Schütze, Raab, Boochs, Wirth and Meier40]. Passive targets, on the other hand, are made of retro-reflective materials deposited on a thin plastic film and can be manufactured as circular stickers [Reference Huang, Huang, Zhou, Dou, Wang and Hu41, Reference Nubiola, Slamani, Joubair and Bonev42]. These targets reflect light in almost the same direction as the incident light. On-axis illumination of these targets, by installing IR LEDs close to the camera, results in high-contrast images of the targets. The main advantage of using passive targets is their light weight, which makes it possible to use many of them on a robot without limiting the robot’s motion.
In general, vision-based measurement systems use triangulation, based on the images of targets from different angles, to find the 3D coordinates of the targets. The projective transformation of a planar circular target to the image plane can be approximated as an ellipse. Ellipse center coordinates with sub-pixel accuracy are used for triangulating the center coordinates of the target in 3D space. The commonly used algorithms for center calculation can be divided into two categories: centroiding methods and ellipse fitting methods [Reference Wang, Zhang, Yu, Chen, Si, He, Wang, Zhang, Eds. and dv39, Reference Roig, Espinosa, Perez, Ferrer and Mas43]. Centroiding methods are relatively simple and efficient algorithms. Binary centroid and gray-weighted centroid are two of the commonly used methods in photogrammetry applications. These methods are usually applied to images binarized with a selected threshold. The gray-weighted centroid method uses the grayscale value of each pixel as the weight and can result in higher accuracy in center location than the binary centroid, which gives equal weight to all selected pixels.
In ellipse fitting methods, the full ellipse geometry that best fits the ellipse points is found, rather than only the center of the ellipse. The ellipse points in images are usually pixels extracted using edge detection algorithms such as Canny. Ellipse fitting algorithms are more computationally complex than centroiding methods. In measurement systems for robotic applications, we need to consider not only accuracy but also time efficiency. Based on these requirements, we narrow down our options among the different classes of ellipse fitting. We mainly focus on fully deterministic methods, where the data model and the operators are both deterministic, in contrast with statistical models and voting operators such as RANSAC [Reference Satriya, Wibirama and Ardiyanto44, Reference Mai, Hung, Zhong and Sze45]. Deterministic methods can be classified into two main categories: clustering (Hough transform) and least-squares methods.
Hough-based methods [Reference Rao, Zhou and Nie46, Reference Lu and Tan47] can provide robustness against outliers. However, their high computational and memory load makes them unsuitable for online measurement applications. Least-squares methods are mainly based on the definition of a distance (cost) function and finding the minimum value of this function, i.e., the best fit. The distance function represents the error between the fitted ellipse and the input data points. Based on the definition of the distance parameter used in these methods, they can be classified into geometric and algebraic approaches. In the geometric approach, the distance is a geometrical parameter, such as the Euclidean distance [Reference Ahn, Rauh and Warnecke48, Reference Sturm and Gargallo49]. In this case, the cost function with respect to the ellipse parameters becomes nonlinear and requires iterative methods for minimization, and hence, it is computationally expensive. In algebraic fitting of ellipses, on the other hand, an algebraic distance is used [Reference Fitzgibbon, Pilu and Fisher50]. This approach usually results in a linear cost function with respect to the ellipse parameters. These formulations can yield an efficient closed-form solution once a quadratic constraint function is selected for the minimization process [Reference Fitzgibbon, Pilu and Fisher50]. One of the main disadvantages of these methods is the lack of invariance in the cost function, which can result in fitted curves other than the desired ellipse, such as a hyperbola or parabola [Reference Ahn, Rauh and Warnecke48]. To solve this problem, Fitzgibbon et al. [Reference Fitzgibbon, Pilu and Fisher50] incorporated the ellipticity constraint $4ac-b^{2}\gt 0$ into the minimization constraint in a quadratic form and found an ellipse-specific closed-form solution.
Since IR-bandpass images of targets are bimodal images with minimal background noise and complexity, and due to the computational time limitations in robotic applications, in this study we focus on non-iterative algebraic least-squares methods and investigate different variations of them in the context of industrial optical tracking systems.
Many research studies have been conducted in the last decades on different formulations of the least-squares approach for ellipse fitting. Here, we investigate the performance of different variations and combinations of these methods in IR-bandpass images from a motion tracking system. From the viewpoint of input data, we divide the algebraic least-squares approach into two types: point and line ellipse fitting. In fitting an ellipse using points, i.e., a point ellipse, the edge pixels are used as sampled input data and are usually found by exploiting the local maxima and using edge detectors such as Canny. The accuracy of ellipse fitting in these methods depends on the accuracy of the sampled pixels at the edge. The accuracy of edge pixels directly depends on the resolution of the ellipse image, which is not high in the case of imaging a small circular target in a relatively large field of view, see Fig. 1(b). Therefore, preprocessing the input data and finding the edge points with sub-pixel accuracy can improve the accuracy of ellipse estimations. Heikkila [Reference Heikkila51] proposed a sub-pixel ellipse boundary detection algorithm that corrects the location of each edge pixel using the local gradient information in the neighboring pixels, see Fig. 1(c).
The addition of edge correction steps adds to the complexity of the solution and the computation time. However, directly exploiting the gradient information in an operator can increase the accuracy of estimations and at the same time reduce the solution time for each ellipse. Förstner and Gülch [Reference Förstner and Gülch52] proposed one of the first operators of this kind for the image of a circle and its surrounding edge pixels (Fig. 1(d)). Ouellet and Hébert [Reference Ouellet and Hébert53] proposed an operator that uses lines perpendicular to the gradient vectors for fitting a line (dual) ellipse, see Fig. 1(e). Ouellet’s implementation of the gradient field with perpendicular lines reduced the angular error caused by noise compared to Förstner’s method, which directly uses gradient vectors. Later, Pătrăucean et al. [Reference Pătrăucean, Gurdjos and von Gioi54] augmented Ouellet’s solution by the simultaneous usage of gradient and point data, here called the point-line method. They showed that when the data is sampled along incomplete (occluded) ellipses, especially when the ellipse is small, the simultaneous solution of positional and gradient data improves the accuracy of fitting results.
Accurate target center measurement is a key element of optical tracking in robotic applications. The subsequent steps of three-dimensional (3D) coordinate triangulation and six-dimensional (6D) pose estimation rely on the results of target center location in images. While many studies in the past have investigated the image processing aspects of target localization, the performance of the above-mentioned methods has not yet been compared in the context of robot tracking. This study explores aspects such as augmenting and sub-pixel processing of data points to enhance the precision of target center location. We test the impact of different methods on the accuracy and precision of measuring the robot’s motion in 3D distance and 6D pose estimation.
The rest of the article is organized as follows: Section 2 presents the target localization methods investigated in this study. Section 3 describes the experimental setup and presents a comprehensive comparison of these methods through experimental evaluation. Section 4 develops the methodology for relative error correction of the drilling spindle with respect to the part and presents the results of error correction in positional and orientational robot motions. Finally, Section 5 concludes the results and findings of this article.
2. Circular target localization methods
A conic section $\mathcal{C}$ is the set of all points satisfying the bivariate quadratic polynomial

$$Ax^{2}+Bxy+Cy^{2}+Dx+Ey+F=0 \qquad (1)$$

where $A, B, C, D, E, F$ are the conic section parameters, which from here on are referred to as the vector $\boldsymbol{\theta}_{\mathcal C} =[A, B, C, D, E, F]^{T}$, and $x,y$ are the Cartesian point coordinates. The following subsections briefly review the theories of the target localization methods investigated in this work.
2.1. Point ellipse
In homogeneous coordinates, where the point $\mathbf{p}=(x,y,1)^{\mathrm{T}}$, equation (1) can be written in the matrix form as

$$\mathbf{p}^{\mathrm{T}}\mathbf{C}_{\mathcal{C}}\,\mathbf{p}=0 \qquad (2)$$

where $\mathbf{C}_{\mathcal{C}}$ is the matrix of coefficients for the conic section $\mathcal{C}$ and can be found as

$$\mathbf{C}_{\mathcal{C}}=\begin{bmatrix}A & B/2 & D/2\\ B/2 & C & E/2\\ D/2 & E/2 & F\end{bmatrix}$$
Let $\mathcal{C}$ be an ellipse represented by equation (2), with $\mathbf{p}$ as the homogeneous coordinates of a data point with $(x,y)$ coordinates and $\mathbf{C}_{\mathcal{C}}$ as the matrix of coefficients. Now, having $n$ points to fit with ellipse $\mathcal{C}$, the problem is to minimize the distance

$$\min _{\boldsymbol{\theta}_{\mathcal{C}}}\sum _{i=1}^{n}F(\mathbf{p}_{i},\boldsymbol{\theta}_{\mathcal{C}})^{2}=\min _{\boldsymbol{\theta}_{\mathcal{C}}}\left\|\mathbf{D}\boldsymbol{\theta}_{\mathcal{C}}\right\|^{2}$$

where $F(\mathbf{p}_{i},\boldsymbol{\theta}_{\mathcal{C}})$, the cost function, is the algebraic distance of the point $\mathbf{p}_{i}$ to the fitted ellipse and

$$\mathbf{D}=\begin{bmatrix}x_{1}^{2} & x_{1}y_{1} & y_{1}^{2} & x_{1} & y_{1} & 1\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_{n}^{2} & x_{n}y_{n} & y_{n}^{2} & x_{n} & y_{n} & 1\end{bmatrix}$$

is the so-called design matrix with $n\times 6$ dimensions [Reference Fitzgibbon, Pilu and Fisher50]. To avoid trivial solutions, and recognizing that any scalar multiple of $\boldsymbol{\theta}_{\mathcal{C}}$ denotes the same conic, the minimization problem is subject to the constraint function $h(\boldsymbol{\theta}_{\mathcal{C}})=0$, which links the conic section parameters together for a desired output geometry, in this case ellipses.
The direct ellipse-specific constraint can ensure ellipticity by embedding $4AC-B^{2}\gt 0$ and can be used in the quadratic form of

$$h(\boldsymbol{\theta}_{\mathcal{C}})=\boldsymbol{\theta}_{\mathcal{C}}^{\mathrm{T}}\mathbf{N}\boldsymbol{\theta}_{\mathcal{C}}-1=0$$

where $\mathbf{N}$ is the $6\times 6$ constraint matrix defined as

$$\mathbf{N}=\begin{bmatrix}0 & 0 & 2 & 0 & 0 & 0\\ 0 & -1 & 0 & 0 & 0 & 0\\ 2 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\end{bmatrix}$$

such that $\boldsymbol{\theta}_{\mathcal{C}}^{\mathrm{T}}\mathbf{N}\boldsymbol{\theta}_{\mathcal{C}}=4AC-B^{2}$.
A closed-form solution exists for this problem and can be solved using the method of Lagrange multipliers [Reference Fitzgibbon, Pilu and Fisher50]. The same constraint can also be used for fitting a line (dual) ellipse [Reference Ouellet and Hébert53].
In point ellipse methods, where the input data are pixel points such as Fitzgibbon’s [Reference Fitzgibbon, Pilu and Fisher50], we can improve the fitting results by correcting the edge points using local curvature information. Although this process increases the computation and complexity of the estimator, we investigate the effect of this approach on the input data for point ellipse method by Fitzgibbon [Reference Fitzgibbon, Pilu and Fisher50] and refer to it as corrected point ellipse. The details about the edge correction algorithm can be found in ref. [Reference Heikkila51].
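As an illustration, the closed-form point-ellipse solution outlined above can be implemented in a few lines. The sketch below (NumPy, with function names of our own choosing) solves the generalized eigenvalue problem arising from the Lagrange formulation and recovers the target center from the conic parameters; a production system would likely use a numerically stabilized variant of this solver:

```python
import numpy as np

def fit_ellipse_direct(x, y):
    """Direct least-squares ellipse fit in the style of Fitzgibbon et al.

    Minimizes ||D @ theta||^2 over theta = [A, B, C, D, E, F] subject
    to the ellipticity constraint theta^T N theta = 1 (4AC - B^2 = 1).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    S = D.T @ D                          # 6x6 scatter matrix
    N = np.zeros((6, 6))
    N[0, 2] = N[2, 0] = 2.0              # theta^T N theta = 4AC - B^2
    N[1, 1] = -1.0
    # Stationarity condition S theta = lambda N theta <=> eig of S^-1 N
    evals, evecs = np.linalg.eig(np.linalg.solve(S, N))
    for k in np.argsort(-np.real(evals)):
        theta = np.real(evecs[:, k])
        if 4.0 * theta[0] * theta[2] - theta[1] ** 2 > 0:  # ellipse?
            return theta / np.linalg.norm(theta)
    raise ValueError("no elliptical solution found")

def ellipse_center(theta):
    """Center of the conic A x^2 + B xy + C y^2 + D x + E y + F = 0."""
    A, B, C, Dc, E, _ = theta
    den = 4.0 * A * C - B * B
    return np.array([(B * E - 2.0 * C * Dc) / den,
                     (B * Dc - 2.0 * A * E) / den])
```

In a target localization pipeline, `x` and `y` would be the (possibly sub-pixel corrected) edge coordinates of one target, and `ellipse_center` returns the point used for triangulation.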
2.2. Line ellipse
The dual of a conic section can also be represented using the line $\mathbf{l}=(a,b,c)^{\mathrm{T}}$ that is tangent to the dual conic, satisfying the equation [Reference Ouellet and Hébert53]

$$\mathbf{l}^{\mathrm{T}}\mathbf{C}_{\mathcal{C}}^{\mathrm{*}}\,\mathbf{l}=0$$

where $\mathbf{C}_{\mathcal{C}}^{\mathrm{*}}$ is the inverse of $\mathbf{C}_{\mathcal{C}}$, called the dual matrix of coefficients. Here, the resulting ellipse represented with lines is called a line ellipse, in contrast with the ellipse represented by points, i.e., the point ellipse. An example of a line ellipse can be seen in Fig. 1(e).
Given a point $\mathbf{p}_{i}=(x_{i},y_{i})^{\mathrm{T}}$ and image gradient vector $\mathbf{g}_{i}=\left(\frac{\partial I}{\partial x}\left(x_{i},y_{i}\right),\frac{\partial I}{\partial y}\left(x_{i},y_{i}\right)\right)^{\mathrm{T}}$ on the ellipse $\mathcal{C}$, based on pole-polar duality, when $\mathbf{p}_{i}$ is located on the ellipse $\mathcal{C}$, its polar line ($\mathbf{l}_{i}$) is tangent to $\mathcal{C}$ at the point $\mathbf{p}_{i}$. The line $\mathbf{l}_{i}$ is then orthogonal to the gradient vector $\mathbf{g}_{i}$, and a vector orthogonal to $\mathbf{g}_{i}$ that lies on $\mathbf{l}_{i}$ can be found in homogeneous coordinates as $\mathbf{g}_{i}^{\bot }=\left(g_{ix}^{\bot },g_{iy}^{\bot },0\right)^{\mathrm{T}}=\left(-\frac{\partial I}{\partial y}\left(x_{i},y_{i}\right),\frac{\partial I}{\partial x}\left(x_{i},y_{i}\right),0\right)^{\mathrm{T}}$ [Reference Ouellet and Hébert53]. In the case of the line ellipse, having $n$ lines to fit with ellipse $\mathcal{C}$, the problem is to minimize the distance

$$\min _{\boldsymbol{\theta}_{\mathcal{C}}}\sum _{i=1}^{n}G(\mathbf{p}_{i},\mathbf{g}_{i},\boldsymbol{\theta}_{\mathcal{C}})^{2}$$
where $G(\mathbf{p}_{i},{\mathbf{g}_{i}},\boldsymbol{\theta}_{\mathcal{C}})$ is the cost function and
is the design matrix with $n\times 6$ dimensions.
2.3. Point-line ellipse
The positional equations in the point ellipse can also be augmented with the gradient information in the line ellipse by stacking equations (2) and (8), as proposed in ref. [Reference Pătrăucean, Gurdjos and von Gioi54]. In this case, the minimization problem is defined by adding a second term of algebraic distance for the gradient contribution as

$$\min _{\boldsymbol{\theta}_{\mathcal{C}}}\left(\sum _{i=1}^{n}F(\mathbf{p}_{i},\boldsymbol{\theta}_{\mathcal{C}})^{2}+\sum _{i=1}^{n}G(\mathbf{p}_{i},\mathbf{g}_{i},\boldsymbol{\theta}_{\mathcal{C}})^{2}\right)$$
subject to $h(\boldsymbol{\theta}_{\mathcal{C}})=0$ , where
is the new design matrix with $2n\times 6$ dimensions. The constraint proposed by Fitzgibbon et al. [Reference Fitzgibbon, Pilu and Fisher50], $h(\boldsymbol{\theta}_{\mathcal{C}})=\boldsymbol{\theta}_{\mathcal{C}}^{\mathrm{T}}\mathbf{N}\boldsymbol{\theta}_{\mathcal{C}}-1$ , can now be applied to find a closed-form solution using the method of Lagrange multipliers.
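The stacked point-line system can be sketched as follows. Note that this is our own linearization of the tangency condition (the tangent direction $\mathbf{g}_{i}^{\bot}$ must be orthogonal to the gradient of the conic polynomial at $\mathbf{p}_{i}$); the exact formulation in ref. [Reference Pătrăucean, Gurdjos and von Gioi54] may differ in normalization and weighting:

```python
import numpy as np

def fit_ellipse_point_line(x, y, gx, gy):
    """Point-line ellipse fit: stacks positional rows with tangency
    (gradient) rows and solves the same constrained least-squares
    problem as the point-only method.

    (gx, gy) are the image-gradient components at each edge pixel;
    the tangent direction is their perpendicular (-gy, gx). Each
    tangency row encodes t . grad Q(p) = 0 for the conic
    Q(x, y) = A x^2 + B xy + C y^2 + D x + E y + F.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    tx, ty = -np.asarray(gy, float), np.asarray(gx, float)
    norm = np.hypot(tx, ty)
    tx, ty = tx / norm, ty / norm        # unit tangent directions
    ones, zeros = np.ones_like(x), np.zeros_like(x)
    Dp = np.column_stack([x * x, x * y, y * y, x, y, ones])
    Dl = np.column_stack([2 * x * tx, y * tx + x * ty, 2 * y * ty,
                          tx, ty, zeros])
    D = np.vstack([Dp, Dl])              # 2n x 6 design matrix
    S = D.T @ D
    N = np.zeros((6, 6))
    N[0, 2] = N[2, 0] = 2.0              # ellipticity: 4AC - B^2 = 1
    N[1, 1] = -1.0
    evals, evecs = np.linalg.eig(np.linalg.solve(S, N))
    for k in np.argsort(-np.real(evals)):
        theta = np.real(evecs[:, k])
        if 4.0 * theta[0] * theta[2] - theta[1] ** 2 > 0:
            return theta / np.linalg.norm(theta)
    raise ValueError("no elliptical solution found")
```

Dropping the `Dp` rows yields a gradient-only (line ellipse) fit, while dropping `Dl` recovers the point-only method, which makes the relative contribution of each data type easy to study.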
2.4. Centroiding
The gray-weighted centroid method is widely used in photogrammetry applications and commonly serves as the benchmark for other algorithms. In this method, the pixel intensity, $I(x_{i}, y_{j})$, is assigned to each pixel as the weight, and the target center coordinates, $x_{c}$ and $y_{c}$, are found as:

$$x_{c}=\frac{\sum _{i}\sum _{j}x_{i}\,I(x_{i},y_{j})}{\sum _{i}\sum _{j}I(x_{i},y_{j})},\qquad y_{c}=\frac{\sum _{i}\sum _{j}y_{j}\,I(x_{i},y_{j})}{\sum _{i}\sum _{j}I(x_{i},y_{j})}$$
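A minimal sketch of the gray-weighted centroid applied to a thresholded image patch (the helper name is ours) shows how simple this baseline is:

```python
import numpy as np

def gray_weighted_centroid(patch, thresh):
    """Gray-weighted centroid of a target image patch.

    Pixels above `thresh` contribute their (x, y) coordinates
    weighted by intensity; returns the sub-pixel center (xc, yc).
    """
    ys, xs = np.nonzero(patch > thresh)
    w = patch[ys, xs].astype(float)
    xc = float((xs * w).sum() / w.sum())
    yc = float((ys * w).sum() / w.sum())
    return xc, yc
```

Setting all weights `w` to 1 instead of the pixel intensities gives the binary centroid variant discussed above.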
Now that we have reviewed the theoretical concepts and methodologies for circular target localization, in the next sections we compare the performance of these methods through simulations and experiments.
3. Experimental evaluation of target localization methods
3.1. Experimental setup
An IR stereo vision setup has been developed as an optical tracking system to conduct experiments and compare different target localization methods in robot measurement scenarios. This system includes two 3.2-megapixel monochrome cameras paired with 12 mm lenses and 715 nm (near-IR) long-pass filters. The cameras are synchronized using the IEEE-1588 Precision Time Protocol (PTP) and capture images within 10 microseconds of each other. Each camera is also equipped with an 850 nm IR ring light. The targets are adhesive retroreflective stickers, 12 mm in diameter, with a black contour. A Leica AT960 short-range laser tracker has been used as the reference measurement system for calibration of the stereo vision setup and evaluation of measurements (see Fig. 2(a)). The laser tracker has a positional uncertainty of ±30 μm in the operation range of ∼2.5 m used in this study. In all measurements, a UR5e collaborative industrial robot from Universal Robots has been used along with the necessary attachments designed for holding the measurement targets (see Fig. 2(b)).
3.2. Camera calibration
To triangulate the targets with high accuracy, camera calibration must be performed to find the camera and lens parameters. A three-dimensional measurement volume of 400 mm × 400 mm × 400 mm in the robot workspace has been selected for calibrating the cameras and performing the measurement tests (see Fig. 3). A calibration end-effector with nine passive targets is designed to hold a laser tracker target (SMR) at the barycenter of the targets (see Fig. 3). The end-effector is then moved by the robot to a grid of 5 × 5 × 5 equally spaced points within the measurement volume. The position of the laser tracker target is measured at each point. Each camera captures an image at each end-effector pose, and the corresponding image point is found as the barycenter of the nine targets. Since the calibration end-effector is made by manually positioning the passive targets, the barycenter of the targets cannot accurately align with the SMR center. However, this error only results in a translated camera coordinate system and does not affect the accuracy of calibration, provided the end-effector is moved between the calibration points with purely translational motion and negligible rotations. The calibration is then performed using Bouguet’s calibration method implemented in the OpenCV library. To increase the accuracy of target center localization in the presence of lens distortion, the iterative calibration process introduced in ref. [Reference Datta, Kim and Kanade55] has been implemented. In this process, the camera parameters are recomputed and refined iteratively through successive undistortion and unprojection of images from the calibration artifact, which results in higher accuracy in center localization of each individual target.
As for the target detection method, the point-line ellipse method is used for finding the center of each target, and the barycenter of all nine targets is then calculated and used as the image point in the calibration process. For the camera model, a pinhole model is considered with 3 and 2 parameters for radial and tangential lens distortions, respectively.
3.3. Analysis of 3D accuracy
The calibrated stereo vision system is used to estimate positions of the robot end-effector in the measurement volume. The laser tracker is also used to measure the end-effector positions at the same time and evaluate accuracy. This evaluation is performed using the distances between two successive positions. Using the distance as the accuracy metric eliminates the need for transformation between the laser tracker and vision system which itself can introduce additional errors to the estimations. For this test, the robot is moved to 100 pair of points that are 100 mm apart from each other and positioned randomly inside the measurement volume. The robot end-effector includes an SMR and up to 4 passive targets around it. Except for the case that includes only one circular target, the targets are mounted in a way that their barycenter coincides with the SMR center, see Fig. 4(c). At each point, the vision system was used to measure the 3D positions of the targets, and their barycenter was calculated when more than one target was used. Then, the Euclidean distance between the subsequent positions were calculated and compared with the distance measured by the laser tracker. This experiment has been performed for different number of targets from 1 to 4 and using different target localization methods. The result of experiments is shown in Fig. 4(a) and (b) in terms of mean and standard deviation of errors, respectively. It can be seen that the 3D distance errors in line ellipse and point-line ellipse methods show higher accuracy and precision compared to other methods. Their mean error for a single target was found to be approximately 0.033 mm. As the number of targets increases, the error reduces gradually, reaching 0.025 mm for four targets. Other methods show a trend in accuracy which is somewhat close to each other, starting from around 0.075 mm for a single target to around 0.050 mm for four targets. 
This indicates that methods that use gradient information can achieve significantly higher distance accuracy with even a single target, compared to other methods such as centroiding.
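As a concrete illustration of this evaluation, the distance-based metric can be sketched as follows; the variable names are ours, and the vision measurements are assumed to be the target barycenters at successive robot positions:

```python
import numpy as np

def distance_errors(vision_pts, tracker_dists):
    """Distance-based accuracy metric: compare the Euclidean distance between
    successive barycenter positions from the vision system against the
    reference distances reported by the laser tracker.
    vision_pts: (n, 3) barycenter positions; tracker_dists: (n-1,) distances."""
    d_vision = np.linalg.norm(np.diff(vision_pts, axis=0), axis=1)
    err = np.abs(d_vision - np.asarray(tracker_dists))
    return err.mean(), err.std()
```

Because only relative distances are compared, no transformation between the two coordinate systems is required.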
3.4. Analysis of 6D accuracy
The 6D pose of the robot is measured using the stereo vision setup and the laser tracker’s 6D probe, Leica T-Mac. The T-Mac probe is equipped with an SMR and 10 additional IR LEDs, which enable precise and absolute 6D pose measurements. The measurements are compared to those of the laser tracker in terms of three positional and three orientational components. The T-Mac probe is rigidly attached to the robot flange, along with circular passive targets, as seen in Fig. 5(a). The passive targets are affixed to the flat panels located on the top and bottom of the T-Mac probe in a circular pattern with an average diameter of 160 mm. Different numbers of targets are used for pose estimation (4, 8, 18, and 30 targets), as shown in Fig. 5(b). The robot is moved to 100 randomly selected locations in the measurement volume, each separated by a distance of 100 mm in 3D space. To establish the three orientations of the robot at each point, the end-effector roll, pitch, and yaw rotations are randomly selected from the range of [–15, 15] degrees. This allows the experiment to be performed over a range of different orientations and positions, ensuring that the accuracy of the pose measurement is thoroughly investigated inside the measurement volume.
In this experiment, the measurements obtained from the laser tracker are the absolute poses of the T-Mac probe with respect to the laser tracker coordinate system. The measurements from the stereo vision system, on the other hand, are relative poses found by calculating rigid-body transformations, using singular-value decomposition (SVD), between two subsequent target configurations with respect to the stereo vision coordinate system. To generate absolute measurements with the stereo vision system, an initial pose of the robot at the center of the measurement volume is set as the reference, to which a frame parallel to the stereo vision coordinate system is assigned. This initial frame is shown as the absolute transformation $\mathbf{A}_{\mathbf{0}}$ in Fig. 6, where the same pose is measured from the T-Mac probe by the laser tracker as the absolute transformation $\mathbf{B}_{\mathbf{0}}$ . Since the stereo vision targets and the T-Mac probe are rigidly attached to the robot flange, the rigid-body transformation $\mathbf{X}$ between them can be found by solving the “ $\mathbf{AX}=\mathbf{XB}$ ” calibration problem. Here, the simultaneous solution proposed by Li et al. [Reference Li, Wang and Wu56] based on the Kronecker product is used to find $\mathbf{X}$ , which is the transformation between the frame attached to the T-Mac probe and the frame assigned to the passive targets. In the calibration process, 40 pairs of measured poses $(\mathbf{A}_{\mathbf{0}\mathbf{i}}, \mathbf{B}_{\mathbf{0}\mathbf{i}})$ are used. To compare the stereo vision system with the laser tracker, the measured 6D poses from the stereo vision system, $\mathbf{A}_{\mathbf{0}\mathbf{i}}$ , are compared with $\mathbf{X}{\mathbf{B}_{\mathbf{0}\mathbf{i}}}\mathbf{X}^{-\mathbf{1}}$ , which is the pose measured by the laser tracker expressed in the stereo vision coordinate system. For each measured pose, the position and orientation deviations from the laser tracker measurements are taken as the errors.
The root-mean-square (RMS) values obtained from the different target localization methods for different numbers of targets are presented in Fig. 7.
From Fig. 7, it is evident that the line ellipse and point-line ellipse methods exhibit the lowest level of errors in both positional and orientational elements. However, all the methods demonstrated a higher baseline error compared to the distance errors obtained in the previous section. This can be attributed to the inclusion of a calibration step between the laser tracker and vision system in the present analysis. Nevertheless, calibration errors are unavoidable in practical robotic applications, where multiple levels of calibration are necessary, such as calibration between the vision system and the robot base frame, and between the robot flange and tool center point (TCP). While the analysis of distance errors demonstrated a significant advantage of gradient-based methods over the centroiding method, from Fig. 7, it can be inferred that the advantage is reduced in a more practical setup that includes calibration steps between different coordinate frames. Therefore, in a practical scenario, the disadvantage of the centroiding method in terms of accuracy can be neglected, as it offers a higher computational efficiency.
Table I presents the numerical values depicted in Fig. 7, obtained with 30 targets, along with the robot errors measured using the laser tracker. To enable a comparison in terms of computation time, the solution time for each method is also included. The solution times were obtained on an AMD 5900X CPU, with the methods programmed in Python 3.7 using NumPy version 1.21.6.
The results presented in Table I demonstrate that all of the tested methods can achieve a measurement accuracy that exceeds that of the robot itself. This improved accuracy has potential applications in fields such as robot calibration and error correction, robot guidance, and fusing sensory data. Furthermore, the centroiding method, which is around 10 times more computationally efficient than the other methods, exhibits only a slightly higher error compared to the best-performing methods, i.e., the line ellipse and point-line ellipse methods. This difference in error may be acceptable in many practical applications where a higher measurement frequency is required or lower computational resources are available. Overall, the results presented in Table I provide insights into the trade-offs between accuracy and computation time for different target localization methods and can help guide the selection of an appropriate method for a given robotic application.
4. Drilling spindle pose correction
The pose measurement of the drilling spindle is an essential step for correcting robot errors and improving the accuracy of the drilling process. In the previous sections, we introduced different methods for localizing the targets attached to the drilling spindle. In this section, we present the procedure for measuring the drilling spindle pose using the developed system and applying the correction to the robot motion.
After localizing the targets, the 3D coordinates of each target are calculated using triangulation. Each triangulated target is measured as a 3D point, $(x, y, z)$ , in the optical tracker’s coordinate system, here called the world frame, $\mathcal{F}_{w}$ . Given the positions of at least three noncollinear targets, a coordinate frame, $\mathcal{F}_{s}$ , can be assigned to the drilling spindle. The pose of $\mathcal{F}_{s}$ with respect to the world frame $\mathcal{F}_{w}$ is measured as ${}^{w}{\mathbf{T}_{s}}{}$ . Since the initial arrangement of the targets on the drilling spindle is arbitrary, $\mathcal{F}_{s}$ is an arbitrary frame and the measurements are performed with respect to an initial pose of the drilling spindle. Here, the initial spindle frame is assigned such that the origin of $\mathcal{F}_{s}$ is positioned at the barycenter of the targets, with its axes parallel to the coordinate frame of the optical tracker $\mathcal{F}_{w}$ , see Fig. 8. As long as the optical tracker observes the targets, it can measure the rigid-body motion of the targets caused by the motion of the drilling spindle.
Here, the relative pose of two subsequent configurations of the targets in 3D space is found using the least-squares fitting method based on singular value decomposition (SVD) proposed in ref. [Reference Arun, Huang and Blostein57]. It should be noted that since the initial spindle frame is arbitrary, it can be assigned to any point in space and in any orientation.
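A minimal sketch of the SVD-based least-squares fitting of ref. [Reference Arun, Huang and Blostein57] is given below, assuming corresponding target positions in the two configurations; the function name is ours:

```python
import numpy as np

def fit_rigid_transform(P, Q):
    """Least-squares rigid-body transform (R, t) mapping point set P onto Q
    via SVD (Arun et al.). P, Q: (n, 3) arrays of corresponding 3D targets."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                 # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cQ - R @ cP
    return R, t
```

The returned pair satisfies $Q_i \approx R P_i + t$ in the least-squares sense, which is exactly the relative pose between two target configurations.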
4.1. Robot-camera calibration
Figure 9 shows the kinematic chain between the vision system and the robot. In this chain, the transformation between the robot base frame, $\mathcal{F}_{b}$ , and the optical tracker is shown as ${}^{w}{\mathbf{T}_{b}}{}$ . The transformation between the robot flange, $\mathcal{F}_{f}$ , and the frame assigned to the spindle targets, $\mathcal{F}_{s}$ , is represented as ${}^{s}{\mathbf{T}_{f}}{}$ . Both ${}^{w}{\mathbf{T}_{b}}{}$ and ${}^{s}{\mathbf{T}_{f}}{}$ are unknown and must be identified so that the measurements can be used for correcting the robot’s pose. This can be done by solving a calibration and pose estimation problem using the robot pose at different locations in the robot workspace and within the vision system’s field of view.
Here, there exist two sets of measurements from two separate systems: the vision system, which measures the pose of the frame attached to the spindle, and the robot controller, which provides the flange pose. The spindle targets are rigidly attached to the moving spindle, which is connected to the robot flange with a fixed unknown transformation, ${}^{s}{\mathbf{T}_{f}}{}$ . The targets are tracked at poses j = 1, 2, …, n by the vision system, with the data represented as ${}^{w}{\mathbf{T}_{s}^{j}}{}$ , while the robot controller simultaneously reports the flange pose at j = 1, 2, …, n as ${}^{b}{\mathbf{T}_{f}^{j}}{}$ . Both ${}^{w}{\mathbf{T}_{s}^{j}}{}$ and ${}^{b}{\mathbf{T}_{f}^{j}}{}$ are measured with respect to their own reference coordinate frames, i.e., the vision system and the robot base, respectively. Therefore, once the unknown fixed transformations ${}^{w}{\mathbf{T}_{b}}{}$ and ${}^{s}{\mathbf{T}_{f}}{}$ are calculated, one can transform the measured pose from one coordinate frame, i.e., the vision system, to the coordinate frame of the other system, i.e., the robot. Thus, the data from the vision system can be directly compared with the robot’s motion.
The unknowns ${}^{w}{\mathbf{T}_{b}}{}$ and ${}^{s}{\mathbf{T}_{f}}{}$ can be found by solving the calibration problem

$${}^{w}{\mathbf{T}_{s}^{j}}\;{}^{s}{\mathbf{T}_{f}} = {}^{w}{\mathbf{T}_{b}}\;{}^{b}{\mathbf{T}_{f}^{j}}$$

at poses j = 1, 2, …, n. This problem is commonly referred to as $\mathbf{AX}=\mathbf{YB}$ in the context of calibration between two systems of measurement. In the absence of sensor noise, only three unique poses are theoretically required to find a solution. In practice, however, there is always measurement noise, making it necessary to use a larger number of measurements to obtain a more robust solution. The closed-form solution based on the Kronecker product and singular value decomposition proposed in ref. [Reference Shah58] can be used to estimate both unknown transformations.
4.2. Relative pose correction
Here, the goal is to correct the robot pose errors using the developed measurement system. To do so, one can define a point (or path) in the measurement coordinate frame and guide the robot to the defined point. This requires finding the transformation between the robot and the measurement system, ${}^{w}{\mathbf{T}_{b}}{}$ , and the transformation between the spindle targets and the robot flange, ${}^{s}{\mathbf{T}_{f}}{}$ , to generate the robot commands that guide the robot spindle to the desired point. This solution, however, accumulates the errors from the calibration processes of both the robot and the spindle and does not fully exploit the benefits of the developed measurement system.
To increase the accuracy of robot guidance, we instead rely on the ability of the vision system to measure rigid-body frames relative to each other in the same reference frame, i.e., the world frame. To do so, targets mounted on a high-precision fixture are used to guide the robot to the drilling location in the robot workspace. These fixtures localize the part and act as a reference frame in which the drilling operation is performed. Figure 10 shows the schematics of the relative pose error correction between the drilling spindle, moved by the robot, and the fixture frame, $\mathcal{F}_{t}$ . In this case, ${}^{t}{\mathbf{T}_{s}^{des.}}{}$ is the desired relative pose between the spindle and the fixture, and the goal is to bring the measured relative pose between the spindle and the fixture, ${}^{t}{\mathbf{T}_{s}^{mea.}}{}$ , as close as possible to the desired one. Therefore, one can compute the error between the desired and measured poses as ${}^{des.}_{w}{\mathbf{T}_{mea.}}{}$ with respect to the measurement frame. This relative error can be corrected by sending a relative motion command to the robot flange, represented as ${}^{des.}_{f}{\mathbf{T}_{mea.}}{}$ in Fig. 10. The robot relative motion command, ${}^{des.}_{f}{\mathbf{T}_{mea.}}{}$ , is computed as

$${}^{des.}_{f}{\mathbf{T}_{mea.}} = \left({}^{w}{\mathbf{T}_{s}^{mea.}}\,{}^{s}{\mathbf{T}_{f}}\right)^{-1}\; {}^{des.}_{w}{\mathbf{T}_{mea.}}\;\left({}^{w}{\mathbf{T}_{s}^{mea.}}\,{}^{s}{\mathbf{T}_{f}}\right) \qquad (15)$$
Computation of ${}_{f}^{des.}{\mathbf{T}_{mea.}}{}$ requires the unknown transformation between the flange and spindle, ${}^{s}{\mathbf{T}_{f}}{}$ , that can be found by solving the relative calibration problem of form $\mathbf{AX}=\mathbf{XB}$ .
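A sketch of the relative correction of equation (15) is given below; the helper names are ours, and we assume the world-frame error is conjugated into the current flange frame as formulated above:

```python
import numpy as np

def inv(T):
    """Inverse of a 4x4 homogeneous transformation."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def flange_correction(T_w_s_des, T_w_s_mea, T_s_f):
    """Relative flange command that moves the measured spindle pose to the
    desired one: the world-frame error is conjugated into the current
    flange frame (a sketch of the formulation in Eq. (15))."""
    err_w = T_w_s_des @ inv(T_w_s_mea)        # error in the world frame
    T_w_f = T_w_s_mea @ T_s_f                 # current flange pose in world frame
    return inv(T_w_f) @ err_w @ T_w_f         # command in the flange frame
```

Applying the returned command to the current flange pose brings the spindle exactly to the desired pose when ${}^{s}{\mathbf{T}_{f}}{}$ is exact; residual errors in ${}^{s}{\mathbf{T}_{f}}{}$ motivate the iterative scheme described later.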
4.3. AX = XB calibration
Figure 11 shows the parameters involved in the computation of the transformation between the robot flange and the spindle frame, here shown as $\mathbf{X}$ , by solving the homogeneous transformation equation given by

$$\mathbf{A}\mathbf{X}=\mathbf{X}\mathbf{B} \qquad (16)$$
$\mathbf{A}$ and $\mathbf{B}$ are the homogeneous transformation matrices representing the relative motion of the spindle frame, $\mathcal{F}_{s}$ , with respect to the measurement frame and the relative motion of the robot flange, $\mathcal{F}_{f}$ , with respect to the robot base frame, respectively.
Equation (16) can be represented in terms of its rotation and translation parts as

$$\mathbf{R}_{A}\mathbf{R}_{X}=\mathbf{R}_{X}\mathbf{R}_{B}, \qquad \mathbf{R}_{A}\mathbf{t}_{X}+\mathbf{t}_{A}=\mathbf{R}_{X}\mathbf{t}_{B}+\mathbf{t}_{X} \qquad (17)$$
where $\mathbf{R}$ and $\mathbf{t}$ indicate rotation matrices and translation vectors, respectively. The subscripts $A$ , $B$ , and $X$ indicate to which homogeneous transformation they belong. The calibration process involves simultaneously measuring sets of poses from the robot and the measurement system.
Different methodologies have been proposed in the literature for solving the $\mathbf{AX}=\mathbf{XB}$ calibration problem. These methods can be classified into separable and simultaneous solutions. In separable solutions, the rotation part of equation (17) is first estimated, and the translation part is then computed based on the estimated rotation. The methods proposed by Chou and Kamel [Reference Chou and Kamel59] and Park and Martin [Reference Park and Martin60] can be categorized as separable solutions, using unit quaternions and Lie algebra, respectively, to represent the rotation parameters. Separable solutions suffer from the loss of the intrinsic relationship between the rotation and translation parameters, since the translation estimate depends on the previously estimated rotation [Reference Chen61]. To overcome this limitation, simultaneous solutions have been devised. Daniilidis and Bayro-Corrochano [Reference Daniilidis and Bayro-Corrochano62] introduced the use of dual quaternions for simultaneously estimating the rotation and translation components. Another simultaneous solution was developed by Lu and Chou [Reference Ying-Cherng Lu63], who formulated a linear system of equations using quaternions. In a different approach, Li et al. [Reference Li, Wang and Wu56] incorporated the Kronecker product in their solution. Here, we perform an experimental comparison between these solutions for the $\mathbf{AX}=\mathbf{XB}$ calibration problem.
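In the spirit of the simultaneous Kronecker-product formulation of Li et al. [Reference Li, Wang and Wu56], a minimal sketch (our own illustrative implementation, not the authors' code) stacks the rotation and translation constraints of equation (17) into one linear system in $(\mathrm{vec}(\mathbf{R}_{X}), \mathbf{t}_{X})$ and re-projects the result onto SO(3):

```python
import numpy as np

def solve_axxb(As, Bs):
    """Simultaneous AX = XB solution: stack R_A R_X = R_X R_B and
    R_A t_X + t_A = R_X t_B + t_X linearly in (vec(R_X), t_X) using
    Kronecker products, solve by least squares, project onto SO(3)."""
    I3 = np.eye(3)
    M, b = [], []
    for A, B in zip(As, Bs):
        RA, tA = A[:3, :3], A[:3, 3]
        RB, tB = B[:3, :3], B[:3, 3]
        # vec(RA @ RX) = (I kron RA) vec(RX); vec(RX @ RB) = (RB.T kron I) vec(RX)
        M.append(np.hstack([np.kron(I3, RA) - np.kron(RB.T, I3), np.zeros((9, 3))]))
        b.append(np.zeros(9))
        # RA tX + tA = RX tB + tX  ->  (tB.T kron I) vec(RX) + (I - RA) tX = tA
        M.append(np.hstack([np.kron(tB.reshape(1, 3), I3), I3 - RA]))
        b.append(tA)
    w, *_ = np.linalg.lstsq(np.vstack(M), np.hstack(b), rcond=None)
    RX = w[:9].reshape(3, 3, order="F")       # undo column-stacking vec()
    U, _, Vt = np.linalg.svd(RX)              # nearest rotation matrix
    RX = U @ np.diag([1, 1, np.linalg.det(U @ Vt)]) @ Vt
    X = np.eye(4)
    X[:3, :3], X[:3, 3] = RX, w[9:]
    return X
```

Because rotation and translation are estimated in one solve, the intrinsic coupling between them is preserved, which is the stated advantage of the simultaneous solutions.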
4.3.1. Experimental evaluation of calibration methods
The performance of different calibration solutions is evaluated in the context of robot calibration with respect to the vision measurement system. A set of 100 poses inside the robot workspace is randomly selected (see Fig. 12). To ensure the visibility of the targets from the cameras, each pose is constrained to rotation angles in the range of [–15°, 15°] for pitch, roll, and yaw. The performance of the different solution methods is evaluated using the residual transformation error defined as

$$\mathbf{E}_{i}=\mathbf{A}_{i}^{-1}\,\mathbf{X}\,\mathbf{B}_{i}\,\mathbf{X}^{-1} \qquad (18)$$
This formulation evaluates how accurately the transformation measured from the measurement system can be mapped to the robot flange transformation using the estimated $\mathbf{X}$ . The transformation error matrix at each location is then converted to a 6D pose vector of the form $e_{i}=[x_{e}, y_{e},z_{e}, {r_{x}}_{e},{r_{y}}_{e},{r_{z}}_{e}]^{T}$ , and the error for each solution is defined as

$$e_{m}=\frac{1}{N}\sum _{i=1}^{N}\left\| e_{i}\right\| \qquad (19)$$
where $e_{m}$ is the mean of the error vector norms, $e_{i}$ , over the $N$ pairs of measurements, $\mathbf{A}_{i}$ and $\mathbf{B}_{i}$ , used for each solution. $e_{m}$ is a unitless metric that indicates how close $\mathbf{A}_{i}$ and $\mathbf{X}\mathbf{B}_{i}\mathbf{X}^{-\mathbf{1}}$ are under the estimated transformation $\mathbf{X}$ .
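The residual metric above can be sketched as follows; expressing the rotational residual as an axis-angle vector is our assumption, since the paper does not specify the rotation parameterization of $e_{i}$ :

```python
import numpy as np

def pose_error_metric(As, Bs, X):
    """Mean norm of the 6D residual vectors e_i comparing A_i with X B_i X^-1.
    The rotation residual is expressed as an axis-angle (rx, ry, rz) vector."""
    Xi = np.linalg.inv(X)
    norms = []
    for A, B in zip(As, Bs):
        E = np.linalg.inv(A) @ X @ B @ Xi     # residual transformation matrix
        RE, tE = E[:3, :3], E[:3, 3]
        angle = np.arccos(np.clip((np.trace(RE) - 1) / 2, -1.0, 1.0))
        if np.isclose(angle, 0.0):
            rvec = np.zeros(3)
        else:
            axis = np.array([RE[2, 1] - RE[1, 2],
                             RE[0, 2] - RE[2, 0],
                             RE[1, 0] - RE[0, 1]]) / (2 * np.sin(angle))
            rvec = angle * axis
        norms.append(np.linalg.norm(np.hstack([tE, rvec])))
    return float(np.mean(norms))
```

A perfect estimate of $\mathbf{X}$ on noise-free data drives the metric to zero, so larger values directly reflect calibration residuals.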
Fig. 13 shows the mean error values, $e_{m}$ , for the different calibration methods obtained from 5 to 100 measurements. As can be seen, the simultaneous solution proposed by Li et al. [Reference Li, Wang and Wu56] based on the Kronecker product yields the lowest error values. This method also shows the fastest convergence rate, with the results converging after only 12 measurement points. Therefore, this method has been selected in this study for estimating the transformation between the flange frame and the spindle frame, ${}^{s}{\mathbf{T}_{f}}{}$ .
4.4. Iterative online relative pose correction
Now, using the relative pose error formulation introduced in equation (15), we can correct the difference between the desired and measured relative poses to perform online pose correction. Figure 14 shows the flowchart of the algorithm used in this study for correcting the relative error of the robot. First, the user sets the desired relative pose between the drilling spindle and the workpiece. The relative pose between the drilling spindle and the fixture is then measured, and the robot motion is calculated using the ${}^{s}{\mathbf{T}_{f}}{}$ transformation, i.e., the $\mathbf{X}$ found in the $\mathbf{AX}=\mathbf{XB}$ calibration. Next, the robot is relocated to correct the relative error, and the process continues until the accuracy requirement is met.
Since ${}^{s}{\mathbf{T}_{f}}{}$ has been estimated using relatively inaccurate robot poses, there will always be uncertainty in its estimation. As shown in Fig. 13, even the best-performing calibration method has residual errors. The inaccuracy in ${}^{s}{\mathbf{T}_{f}}{}$ , along with other sources of inaccuracy in the robot’s motion, results in a residual error in the corrective motion that the algorithm applies. Therefore, the spindle error with respect to the fixture cannot be corrected in a single corrective motion, and multiple iterations are required until the accuracy requirements are met.
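A toy one-axis model illustrates why several iterations are needed; the `gain` below is a hypothetical fraction of the commanded correction actually realized by the robot, standing in for the combined effect of calibration and motion errors:

```python
def iterative_correction(error0, gain=0.8, tol=0.1, max_iter=10):
    """Toy model of iterative relative pose correction: due to residual
    calibration errors, each corrective motion removes only a fraction
    ('gain') of the measured error, so several iterations are needed
    before the error falls below the tolerance 'tol'."""
    error, history = error0, [error0]
    for _ in range(max_iter):
        if abs(error) < tol:
            break
        error -= gain * error          # imperfectly realized correction
        history.append(error)
    return history
```

With an initial error of 1 mm and 80% of each correction realized, the error falls below a 0.1 mm tolerance after two corrective motions, mirroring the two-step convergence observed in the experiment below.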
4.4.1. Experimental evaluation
The developed vision system has been used to correct robot errors in an iterative approach until the positional and orientational component errors are less than 0.1 mm and 0.1°, respectively. Relative errors of about 1 mm and 0.1° are considered as the initial positional and orientational errors, respectively. The results of the error correction are shown in Fig. 15 for each individual motion component measured by the vision system. As can be seen, the robot errors are corrected within two steps, satisfying the accuracy requirements.
4.5. Closed-loop relative pose correction
The error correction can also be implemented in the form of a closed feedback loop for a smoother corrective motion. The error between the desired and measured relative poses is used as the feedback signal for the controller to perform online pose correction. As shown in Fig. 16, the feedback signal from the vision system provides the measured poses of the spindle and fixture, which are then used for computing the relative pose signal, ${}^{t}{\mathbf{T}_{s}^{mea.}}{}$ . The desired relative pose, ${}^{t}{\mathbf{T}_{s}^{des.}}{}$ , is set by the user as the reference input. The relative pose error signal, ${}^{des.}_{w}{\mathbf{T}_{mea.}}{}$ , is calculated in the measurement frame. The robot motion command, ${}^{des.}_{f}{\mathbf{T}_{mea.}}{}$ , is then estimated and used as the input signal for the controller. ${}^{des.}_{f}{\mathbf{T}_{mea.}}{}$ is converted to a 6D vector consisting of the translational and rotational rigid-body motion components that need to be applied to the robot flange frame relative to its current pose. The controller output is the motion command sent to the robot controller and, subsequently, to the robot.
The PID controller for the robot motion can be expressed as

$$u_{i}(t)=K_{p}e_{i}(t)+K_{i}\int _{0}^{t}e_{i}(\tau )\,\mathrm{d}\tau +K_{d}\frac{\mathrm{d}e_{i}(t)}{\mathrm{d}t} \qquad (20)$$
where $u_{i}(t)$ is the PID output to the robot controller, and $e_{i}(t)$ is the relative pose error with $i=x, y, z, r_{x},r_{y}, r_{z}.$ The coefficients $K_{p}, K_{i}, K_{d}$ are the proportional, integral, and derivative gain values, respectively, that require tuning for a desirable performance. The selected controller gains after tuning are listed in Table II.
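A minimal discrete PID sketch for one motion axis is given below; the gains used in the usage example are illustrative, not the tuned values of Table II:

```python
class AxisPID:
    """Discrete PID controller for one motion axis of the relative pose error.
    'dt' is the control cycle time; the integral and derivative terms use
    simple rectangular integration and backward differencing."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def update(self, err):
        """Return the corrective command for the current axis error."""
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

In a simple simulation where each cycle's output is applied directly as a relative motion, the axis error decays smoothly toward zero, which is the behavior sought from the tuned controller.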
4.5.1. Controller performance
To demonstrate the performance of the developed and tuned controller in online error correction, the robot error in each individual axis has been corrected. For this test, the drilling spindle was placed at a distance from the desired pose along each axis, and the controller was then used to correct the error. Figure 17 shows the recorded axis errors over the duration of the experiment for the positional and orientational components. As can be seen, the controller converges to errors below the tolerance requirements by gradually and smoothly approaching the desired reference point in each individual motion axis.
4.5.2. Online 6D pose correction
Finally, we test the developed system and controller in a general 6D error correction scenario. In this case, the robot is relocated with respect to the fixture to create positional and orientational errors of about 1 mm and 1°, respectively, in all axes of motion simultaneously. Figure 18 shows the result of the error correction measured by the vision system. As can be seen, the controller corrects the errors through gradual motion of the spindle with respect to the workpiece.
5. Conclusion
In this study, we have presented a comprehensive analysis of high-precision vision-based target localization methods for robot pose measurement and an online vision-based error correction approach. We evaluated the accuracy, precision, and computation time of five different target localization methods, including centroiding, ellipse fitting with point data and gradient information, and ellipse fitting methods with augmented and corrected input data. Our results show that gradient-based methods exhibit the lowest level of errors in both position and orientation estimations. However, all the tested methods can achieve a measurement accuracy that exceeds that of the robot itself. Furthermore, we showed that the inclusion of calibration steps between different coordinate frames can reduce the advantage of gradient-based target localization methods over more efficient methods such as centroiding in terms of accuracy. Therefore, when the highest levels of accuracy are required, proper calibration procedures must be performed for different stages of calibration. In the context of robot error correction, we implemented an iterative approach to correct the robot’s relative errors with respect to the workpiece and showed the performance of the solution using the developed vision system. To make the robot’s corrective motion smoother, we implemented and tuned a PID controller that uses the measurement data from the vision system as the feedback signal. The performance of the developed solution in correcting each individual motion axis and also general 6D errors has been demonstrated.
In conclusion, the results presented in this study and the insights gained from our experiments can help guide the selection of an appropriate method for a given robotic application and provide a robust solution for online vision-based error correction.
Author contributions
Ali Maghami contributed to research conceptualization, methodology, experimental testing, and writing the manuscript. Matt Khoshdarregi contributed to research conceptualization, planning, securing funding, equipment acquisition and setup, and editing the manuscript.
Financial support
This work was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada under grant number RGPIN-2019-05873; and the Mitacs Accelerate program under project number IT20382; and the Research Manitoba Innovation Proof of Concept program under project number 4763.
Competing interests
The authors declare no competing interests.
Data availability statement
Data sharing is not applicable to this article.