I Introduction
Over the last several decades, robots have become very precise at performing repetitive pick-and-place operations. However, complications arise when the positions of the parts involved in assembly vary between repetitions of the operation. A classical example of such a task is peg-in-hole (PiH) insertion, which has been studied extensively in assembly due to its relevance to manufacturing [22]; it is a major component of many assembly operations. Even though this task has long been studied in robotics and automation research, the problem remains open in multiple aspects. Pose uncertainty of the parts being assembled leads to complex contact configurations between the parts. Consequently, manipulation for successful assembly requires the design of force-feedback controllers that can interpret contact forces and correct the contact configuration so that the parts can be assembled. Since these contact configurations depend on the physical as well as the geometrical features of the objects being assembled, they are notoriously difficult to model precisely. As a result, learning-based approaches have become very popular for designing the required corrective controllers. However, learning-based approaches present their own challenges when it comes to designing efficient controllers that can reliably perform assembly in the presence of sustained contact interactions.
The design of learning-based controllers requires an initial exploration phase in which the robot explores different possible contact configurations so that a generalizable corrective policy can be learned. We believe that one of the key components of the design of such controllers is the design of compliant controllers which ensure safe interaction of the robot with the assembly components during the exploration phase. Although this is a key requirement for learning adaptive assembly controllers, the design of such safe controllers remains mostly unexplored. With this motivation, we present the design of a class of accommodation controllers which guarantee that the contact forces remain within safe bounds. This accommodation controller is used to collect contact force data to learn a relationship between misalignment and the expected contact forces. To interpret the data efficiently, we present an analysis of the contact forces for the purpose of selecting features which can be used to learn a predictive model between contact forces and the expected misalignment. Finally, this predictive model is used to design a corrective policy that allows assembly.
Contributions. This paper has the following key contributions:

We present the design and analysis of two different controllers for safe interaction between the robot and its environment in the presence of sustained contacts.

We present feature analysis for the design of an efficient force-feedback controller for the interpretation of different complex contact configurations.

We present the design and verification of a learning-based controller that makes use of the proposed safe accommodation controller and the proposed feature analysis for insertion-type assembly using a DoF manipulator system.
II Related Work
Automatic assembly is one of the most common robot applications, and what differentiates it from many other robotic applications is the need to carefully consider the effect of contact between assembled parts. Pure position control is usually inadequate: if the robot uses such a controller to follow a reference trajectory exactly, even small misalignments between where the assembled parts are and where they were expected to be would result in very large forces, possibly damaging the robot and/or the parts. A much more suitable type of control for this application is force control that adjusts the motion of the robot in response to experienced contact forces. Such methods for robotic assembly are commonly known as adaptive assembly strategies (AAS) [19, 8, 1, 15, 5, 14, 17]. Much of the research in this area has focused on the PiH insertion problem, as a prototypical operation for various assembly tasks.
The main challenge in an AAS is how to interpret the measured force/torque (F/T) signals in order to direct the motion of the robot so as to accomplish the insertion. As early as the 1970s, it was demonstrated that high-accuracy PiH insertion was possible by direct interpretation of F/T signals in a robot program [10]. However, the development of such robot programs is complex, laborious, expensive, and case-dependent, so this approach turned out to be impractical for wide industrial use. A more universally applicable approach is to follow a suitable position trajectory that would accomplish the task in the absence of collisions, and adjust the robot's motion in response to contact forces, thus forming a force feedback controller [19]. Many such controllers use a mapping from the F/T readings measured at the wrist of the robot (or the platform that the hole is mounted on) onto a correction to the trajectory. In some rare instances, this mapping can be computed analytically; for example, when the peg and hole are circular, have no angular misalignment, overlap at least to some extent, and the point around which the moments of the F/T sensor are computed lies on the axis of the peg, which is also the direction of insertion [7]. However, this kind of solution requires careful placement and alignment of the F/T sensor, and is not general enough for regular use. As this approach to designing force controllers reduces to finding a suitable mapping between F/T measurements and corrections to a nominal trajectory, a much more general method for obtaining this mapping, and thus a working controller, is to use machine learning methods for estimating the mapping from data. One early such method for
programmed compliance, proposed in [20], used linear least squares to estimate a linear mapping between F/T measurements and corrections to either position or velocity, effectively learning the admittance and accommodation matrices used in linear compliance controllers. The training examples needed for learning were constructed based on general considerations about what the corrections should be for prototypical situations, and what contact forces might be measured in them. However, it was later demonstrated that for the contact configurations usually experienced in PiH insertion tasks, the mapping between forces and corrections is not linear [2, 3], and it was suggested to use neural networks to represent a nonlinear mapping between the two. This advance significantly expanded the types of mappings that could be learned, but still left open the question of how a suitable data set of training examples could be compiled, as doing this manually is excessively difficult for all but the simplest geometries. A much more appealing solution is to measure contact forces directly on a real robot by putting the peg and hole in various contact situations. Gullapalli et al. [8] proposed a reinforcement learning (RL) solution based on trial and error, which learned to associate the contact forces with a correction that was advantageous in bringing the peg closer to its desired end position, while also minimizing contact forces. The desired outcome was encoded in the reward function of the RL problem formulation. Although this approach achieved remarkable results, learning to insert a peg into a hole with clearances significantly lower than the accuracy (repeatability) of the robot used, it still needed accurate knowledge of the goal position in order to use it in the reward function. This precludes its direct use in the version of the problem we consider, where the uncertainty is precisely in the position of the hole, and thus in the correct end position that the peg should reach.
Following this seminal application of RL to PiH insertion, a number of later works explored the use of machine learning models for the design of adaptive force controllers. The application of deep RL for learning end-to-end visuomotor policies was demonstrated in [15]. In addition to F/T sensors, tactile sensors have been employed, too ([6, 5]). As instantaneous F/T readings might not be sufficient to disambiguate the contact configuration, the use of recurrent neural nets has been proposed in [11, 14]. However, as is well known, RL often suffers from unfavorable sample complexity, making it less suitable for use on real mechanical systems. In contrast, we explore below a supervised learning approach to learning mappings between forces and corrections, thus significantly reducing the number of training samples needed.
The work proposed in this paper is closest to our previous work in [13]. Compared to that work, we present the design of an additional nonlinear accommodation controller, a proof of convergence for the controllers, and feature analysis for threshold detection. Furthermore, we show an improvement in the final insertion system, which uses a DL-based hole detection method along with a faster controller for insertion.
III Problem Statement
In this section, we state the problem that we are trying to solve in this paper. Loosely speaking, the objective is to control the contact state and the problem state (i.e., the pose of the peg in our case) during an insertion attempt. The schematic in Figure 2 shows the twofold objective of using force feedback in controller design for assembly. The force feedback is used to design a lower-level controller that limits interaction forces in the event of contact formation during an insertion attempt. As can be seen in the figure, any insertion attempt leads to a contact formation, and the goal is to use the corresponding force signature to correct the underlying misalignment. We require that the interaction forces obtained during any arbitrary contact formation remain bounded, irrespective of the reference trajectory provided to the robot. Furthermore, we would like to learn models from the quasi-steady behavior of the system, for ease of learning and prediction.
In all of these cases, we assume that the misalignment is only in a plane and that there is no angular misalignment between the peg and the hole. This corresponds to the frequently encountered case in practice where the hole base slides across a working surface, for example a workbench in a factory. In order to train a model, we solve the following problems in this paper, which are then used together to design a force-feedback controller for performing peg-in-hole assembly in the presence of significant positional inaccuracy.

Suppose that we have a reference trajectory for insertion, denoted $x_r[t]$, and that due to contact formation the robot experiences a sequence of contact forces denoted $f[t]$. The force control task is to design a force feedback controller that modifies $x_r[t]$ using a force feedback law so that there exists a time $T$ such that $\|f[t+1] - f[t]\| \leq \epsilon$ for all $t \geq T$, where $\epsilon$ is arbitrarily small.

The second task is to analyze and use the force signature data obtained from this force controller to design a force feedback controller that corrects the misalignment between the peg and the hole.
In summary, the goal is to use force feedback to design both the lower-level accommodation controller and the corrective policy that allows the robot to correct any contact formation for successful assembly (see also Figure 2).
IV Controller Design
In this section, we present the design and analysis of the compliance controllers that we use to ensure safe interaction during insertion. We believe this is a critical step for ensuring the safety of the learning process. Even though there has recently been much work on robot learning approaches for manipulation, ensuring the safety of the contact-rich interactions during these tasks has largely been overlooked. This is, however, a critical requirement for the adoption of learning-based approaches in assembly and many related operations. Based on this motivation, we present the design and analysis of two different kinds of force-feedback controllers with different convergence behaviors.
In both of these controllers, we use the force measured by a force-torque sensor mounted at the wrist of the robot (see Figure 1) to adapt a reference trajectory so as to regulate the interaction forces the robot experiences with its environment. The idea is to use force feedback to modify the reference trajectory so that the contact forces stay within allowable bounds. For clarity of presentation, we show block diagrams of both controllers in Figure 3.
IV-A Linear Accommodation Controller Design
The operation of the proposed linear accommodation controller is presented in Figure 3. As can be seen in the block diagram, the accommodation controller modifies the reference trajectory using force feedback. Let us denote the discrete-time reference trajectory by $x_r[t]$, the trajectory commanded to the low-level position controller by $x_c[t]$, the experienced forces by $f[t]$, and the measured position by $x[t]$, at any instant $t$. Note that $t$ here denotes the control time index and not the actual time in seconds. In this design, we employ a low-level compliant position controller that makes the robot behave like a spring-damper system with desired stiffness and damping coefficients. Most robot vendors provide such a stock controller with the robot; if not, one can be implemented relatively easily ([18]). Let us denote the stiffness constant of the compliant position controller by $K$ and the accommodation matrix for the force feedback by $A$. For simplicity, we consider a diagonal matrix $A$ with per-axis gain $a$, and present the force-feedback law for updating the commanded position along each individual axis. The commanded trajectory sent to the robot is computed using the following update rule (also see Figure 3):
(1)   $x_c[t+1] = x_c[t] + \Delta x_r[t] - a \sum_{i=0}^{t} \gamma^{t-i} f[i]$
where $\gamma \in (0,1)$ is a discounting parameter for computing the integral error, and $\Delta x_r[t] = x_r[t+1] - x_r[t]$ are the desired position increments computed from the reference trajectory. An actual force trajectory obtained for a reference trajectory that advances with constant velocity along an axis of the robot under the operation of the linear accommodation controller is shown in Figure 4a. Note that even though the reference trajectory keeps advancing, the experienced force stabilizes; this behavior is in contrast to that of the stock compliant controller, where contact forces grow proportionally to the advance of the reference position and can easily become dangerously large for the robot or the manipulated parts. (It is generally not feasible to limit these forces by making the stiffness of the stock compliant controller very low, because the robot does not know exactly where an obstacle will be encountered. In contrast, the proposed accommodation controller guarantees bounded forces even if the reference trajectory advances to infinity, as long as this happens at a constant velocity. The latter condition can be guaranteed easily by sampling any desired geometric reference trajectory accordingly.)
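A minimal per-axis sketch of this update rule, with the discounted force integral maintained recursively, is given below. The class name, gains, and units are illustrative assumptions reconstructed from the description above, not the authors' implementation.

```python
class LinearAccommodation:
    """Discounted-integral accommodation: the commanded position advances
    with the reference increments, minus a gain times a discounted sum of
    measured forces (per-axis sketch)."""

    def __init__(self, gain=5e-4, gamma=0.5):
        self.gain = gain      # accommodation gain a (m/N), illustrative
        self.gamma = gamma    # discount factor for the force integral
        self.force_sum = 0.0  # discounted integral S[t] of measured forces

    def step(self, x_cmd, dx_ref, force):
        # S[t] = gamma * S[t-1] + f[t]
        self.force_sum = self.gamma * self.force_sum + force
        # x_c[t+1] = x_c[t] + dx_r[t] - a * S[t]
        return x_cmd + dx_ref - self.gain * self.force_sum
```

Because the integral is discounted, a constant steady-state force can balance a constantly advancing reference increment, which is what keeps the contact force bounded.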
IV-B Nonlinear Accommodation Controller Design
Next, we present a nonlinear feedback law to design an accommodation controller. The corresponding block diagram for this controller is shown in Figure 3. Using the nomenclature from Section IV-A, the nonlinear force feedback law is given by the following equation (also see Figure 3):
(2)   $x_c[t+1] = x_c[t] + \Delta x_r[t]\,\big(1 - \sigma(\|f[t]\|)\big)$
where $\sigma(f) = 1/(1 + e^{-a(f - f_d)})$, and $f_d$ is specified by the user and approximately defines the force around which the controller converges. Note that the control law in (2) does not use an integration block. Rather, the idea here is to use the force feedback to cancel the increment of the commanded trajectory. The proposed feedback law ensures that the feedback does not interfere with the movement of the robot in free space (as the force feedback is close to zero there). However, since $\sigma$ quickly converges to $1$ as forces go beyond $f_d$, this leads to convergence of the commanded trajectory and hence of the contact forces. The convergence behavior can also be seen in the plot for the nonlinear accommodation controller in Figure 4b.
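The sigmoid-gated update can be sketched as follows; the exact sigmoid form, threshold `f_d`, and sharpness value are illustrative reconstructions of the description above, not the authors' constants.

```python
import math

def sigmoid_gate(force_mag, f_d, sharpness):
    """Sigmoid centered at the user-specified force level f_d; approaches 1
    once contact forces exceed f_d. The sharpness parameter plays the role
    of the accommodation term controlling the convergence rate."""
    return 1.0 / (1.0 + math.exp(-sharpness * (force_mag - f_d)))

def nonlinear_accommodation_step(x_cmd, dx_ref, force_mag, f_d=5.0, sharpness=2.0):
    # The reference increment is cancelled as forces exceed f_d:
    # x_c[t+1] = x_c[t] + dx_r[t] * (1 - sigma(|f[t]|))
    return x_cmd + dx_ref * (1.0 - sigmoid_gate(force_mag, f_d, sharpness))
```

In free space the gate is nearly zero and the robot tracks the reference; under large contact forces the gate saturates and the commanded trajectory stops advancing.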
IV-C Convergence Analysis
Next, we state a theorem which shows that, under the assumption of constant velocity of the reference trajectory, the interaction forces converge for the controllers presented in Sections IV-A and IV-B. To prove convergence of forces, we need an assumption relating the robot position and the commanded position to the contact forces.
Assumption 1
The robot is equipped with a stiffness controller with stiffness constant $K$ such that the forces observed during an interaction are given by $f[t] = K(x_c[t] - x[t]) + \nu[t]$, where $x[t]$, $x_c[t]$, and $\nu[t]$ are the actual robot state, the commanded robot state, and the observation noise, respectively.
We make another assumption regarding the velocity of the reference trajectory of the robot.
Assumption 2
The reference trajectory of the robot has a constant velocity, i.e., $\Delta x_r[t] = x_r[t+1] - x_r[t] = v$ for all $t$.
With these two assumptions, we can now state the following theorem.
Theorem IV.1
Suppose that a reference trajectory with constant velocity is modified with the force feedback specified in Equation (1) or (2). Suppose that the robot makes contact with a rigid environment at some time instant $t_c$. Then there exists a $T \geq t_c$ such that $\|f[t+1] - f[t]\| \leq \epsilon$ for all $t \geq T$, where $\epsilon$ can be made arbitrarily small.
Since the robot moves with constant velocity in free space, no force is experienced by the force sensor before contact (except for measurement noise); thus, we ignore the part of the trajectory before contact formation.
Upon contact formation with a rigid environment, the measured position remains at the contact surface, so using Assumption 1 (ignoring the noise term) and Equation (1), we get the following:
(3)   $f[t+1] = f[t] + K\Big(\Delta x_r[t] - a \sum_{i=0}^{t} \gamma^{t-i} f[i]\Big)$
For simplicity of notation, let us denote the summation term by $S[t] = \sum_{i=0}^{t} \gamma^{t-i} f[i]$. Thus, the above equation can be simplified as follows:
(4)   $f[t+1] = f[t] + K\big(\Delta x_r[t] - a\, S[t]\big)$
Using the above equation, we can write that $f[t+1] - f[t] = K(\Delta x_r[t] - a S[t])$. Note that $S[t]$ is a discounted sum of the sequence of observed forces, which is multiplied by the gain term $a$. To show convergence, we make the assumption that we can find at least one $\gamma$ and accommodation term $a$ such that $\|\Delta x_r[t] - a S[t]\| \leq \epsilon / K$, $\forall t \geq T$, where $\epsilon$ is arbitrarily small. Using this assumption, we then have that $\|f[t+1] - f[t]\| \leq \epsilon$, $\forall t \geq T$.
Convergence of the nonlinear controller given by Equation (2) is straightforward; it follows from the convergence properties of $\sigma$, as we show next. Equation (2) can be rearranged as follows:
(5)   $x_c[t+1] - x_c[t] = \Delta x_r[t]\,\big(1 - \sigma(\|f[t]\|)\big)$
The convergence rate of the sigmoid function in Equation (2) can be controlled by the accommodation term $a$. Using the asymptotic convergence of $\sigma$ to $1$, we have that there exists a $T$ such that $1 - \sigma(\|f[t]\|) \leq \epsilon$, $\forall t \geq T$. We can use this to rewrite Equation (5) as the following:
(6)   $\|x_c[t+1] - x_c[t]\| \leq \epsilon\, \|\Delta x_r[t]\|, \quad \forall t \geq T$
The assumption regarding the existence of $\gamma$ and $a$ is not very strict. In practice, we were able to find an interval of values of $\gamma$ for which the discounted sum converged; the plots shown in Figure 4a were obtained with such a choice. The above provides a solution to the first problem presented in Section III. In the next section, we analyze the data collected using the proposed force controllers and present the design of a learning-based controller for peg insertion.
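The convergence claims can be checked numerically under Assumption 1 by driving both feedback laws into a rigid wall modelled by the stiffness relation. All constants below (stiffness, gains, thresholds) are illustrative assumptions, not values from the experiments.

```python
import math

def simulate(controller_step, steps=800, K=1000.0, wall=0.0, dx=0.001):
    """Advance a constant-velocity reference into a rigid wall; the contact
    force follows the stiffness relation of Assumption 1, f = K*(x_c - wall),
    clipped at zero when the commanded position is short of the wall."""
    x_cmd, history, state = -0.01, [], {}
    for _ in range(steps):
        f = K * max(0.0, x_cmd - wall)
        x_cmd = controller_step(x_cmd, dx, f, state)
        history.append(f)
    return history

def linear_step(x_cmd, dx, f, state, gain=5e-4, gamma=0.5):
    state["S"] = gamma * state.get("S", 0.0) + f   # discounted force integral
    return x_cmd + dx - gain * state["S"]

def sigmoid_step(x_cmd, dx, f, state, f_d=5.0, sharpness=2.0):
    gate = 1.0 / (1.0 + math.exp(-sharpness * (f - f_d)))
    return x_cmd + dx * (1.0 - gate)

lin = simulate(linear_step)
sig = simulate(sigmoid_step)
```

In both runs the reference advances far beyond the wall, yet the measured force stays bounded: the linear law settles to a small constant force, while the sigmoid law settles in the vicinity of the user-specified level.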
V Learning a Predictive Model for Misalignment
In this section, we analyze contact wrench data to understand the dependence of the contact wrench on the misalignment. To provide a complete understanding of this relationship, we analyze the force signature data collected during an initial training phase. The purpose of the resulting model is to predict the misalignment based on the force signature, which is characteristic of a certain contact configuration.
To learn the predictive model, we collect training data consisting of the force signatures of the contact configurations induced by known amounts of misalignment. We use the accommodation controller presented earlier during data collection to ensure safe interaction during this exploration phase. Furthermore, we measure the force signature for a given misalignment at a quasi-steady state, when it has converged to an asymptotic value, which simplifies the learning problem.
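One episode of this collection procedure can be sketched as below. The `execute_guarded_insertion` callable is a hypothetical stand-in for the robot-side accommodation controller that attempts insertion and returns the converged, quasi-steady wrench; the offset range is illustrative.

```python
import random

def collect_episode(execute_guarded_insertion, max_offset_mm=1.0):
    """Sample a planar misalignment, run a guarded insertion attempt until
    the force signature reaches quasi-steady state, and store the
    (misalignment, wrench) training pair."""
    offset = (random.uniform(-max_offset_mm, max_offset_mm),
              random.uniform(-max_offset_mm, max_offset_mm))
    wrench = execute_guarded_insertion(offset)   # converged 6-D force/torque
    return {"misalignment_mm": offset, "wrench": wrench}
```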
V-A Data Collection
To learn a predictive model for correcting misalignment, we collect data by introducing misalignment in the position of the peg with respect to the hole. The work in this paper only considers planar misalignment between the peg and the hole. Consequently, we introduce misalignment in the $x$ and $y$ axes from the known hole location. The misalignment is sampled from a uniform distribution over a fixed interval (in mm); this interval was chosen because the deep learning-based hole detection method we use achieves similar accuracy in the estimated position of the hole. With the added misalignment in the position of the peg, any insertion attempt leads to a contact formation between the peg and the hole environment. The contact formation produces a force signature that is observed through the F/T sensor mounted at the wrist of the robot (see Figure 1). For every episode of data collection, the robot follows the insertion trajectory and records the force measurements from the F/T sensor for the resulting contact formation. Thus, we collect a data set storing the misalignment as well as the measured force signature corresponding to it. We use a Mitsubishi Electric Factory Automation (MELFA) RVASD Assista arm (see Figure 1) for the experiments. The robot has pose repeatability of mm. The robot is equipped with a Mitsubishi Electric F/T sensor FFSW (see Figure 1). In the initial set of experiments, we also verify that Assumption 1 is valid for our robotic setup.
V-B Numerical Analysis for Convergence
We now analyze the convergence properties of the proposed controllers. In Figures 5 and 6, the statistics of the force signature measured by the F/T sensor along the vertical direction are reported at regular time intervals over all the experiments described in Section V-A for the linear controller. Similarly, we report these quantities for the nonlinear controller in Figures 7 and 8.
In particular, we have computed the mean, $\mu$, and twice the standard deviation, $2\sigma$, of the vertical force for each time interval along the trajectory and over all the 1200 experiments. The mean and the confidence interval of these two statistics are reported in Figures 5 and 6 for the linear controller and in Figures 7 and 8 for the nonlinear controller, respectively.
The practical purpose of this analysis is to be able to decide online, as soon as possible, when the controller has converged to a stable value of the vertical forces. The criterion we selected to decide the convergence of the system is based on the changes we can observe in the four statistics described above: the mean, $\mu$, the standard deviation, $\sigma$, the mean of the standard deviation, $\mu_\sigma$, and the standard deviation of the standard deviation, $\sigma_\sigma$. We then take the difference of each of these statistics between consecutive time intervals, $\Delta s[k] = s[k] - s[k-1]$, where $s$ is one of the four statistics and $k$ is the time-interval index. We declare that the system has converged if
(7)   $\bigwedge_{s \in \{\mu,\, \sigma,\, \mu_\sigma,\, \sigma_\sigma\}} |\Delta s[k]| < \tau$
holds for 2 consecutive time intervals, where $\bigwedge$ is the and operator. Basically, we are looking for the first time interval where the changes of all the statistics are smaller than a predetermined threshold $\tau$.
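The stopping rule can be sketched as follows; the statistic names, threshold, and window structure are illustrative, and the reconstruction assumes the criterion of Equation (7) as described above.

```python
def has_converged(stat_windows, tau=0.05, consecutive=2):
    """Declare convergence once the absolute change of every tracked
    statistic stays below tau for `consecutive` successive time intervals.
    `stat_windows` is a list of dicts, one per interval, each mapping a
    statistic name (e.g. 'mean', 'std') to its value on that interval."""
    run = 0
    for prev, curr in zip(stat_windows, stat_windows[1:]):
        if all(abs(curr[k] - prev[k]) < tau for k in curr):
            run += 1
            if run >= consecutive:
                return True
        else:
            run = 0
    return False
```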
Criterion (7) applied to the linear controller (see the signals in Figures 5 and 6) indicates that the controller converged after a finite settling time; analogously, the criterion indicates convergence of the nonlinear controller (see the signals in Figures 7 and 8). Note that the confidence intervals never go to zero because of the measurement noise. This empirical analysis confirms the theoretical convergence results of Section IV. Therefore, the classifiers can be trained on values at convergence, without having to wait for the end of the experiment.
V-C Model Learning Performance
We train classification and regression models on the collected contact force data to learn predictive models for the direction and magnitude of the misalignment. We use the results from the previous section to decide on convergence, and we use the convergence criterion to decide when to stop collecting data during a contact formation; the settling time found in the last section is used for collecting the force signature. We then train a classification model and a regression model to learn the direction and the magnitude of the error, respectively, to understand the efficacy of the models. The results of classification and regression are shown in Tables I and II (see the results with full features). Note that we are able to achieve better results with the linear accommodation controller; however, the nonlinear controller can make these predictions faster than the linear controller. Another point to notice is that we achieve good RMSE scores with both controllers, with the linear controller again slightly better than the nonlinear one. However, this problem requires that we predict the directions accurately. The model trained with the linear accommodation controller predicts the direction with accuracies of 0.9964 and 0.939 in the $x$ and $y$ axes, respectively; with the nonlinear controller, we achieve accuracies of 0.9928 and 0.92. Overall, the linear controller achieves higher accuracy. This might be due to the higher interaction forces it allows, which lead to less noise in the force signatures.
TABLE I: Classification accuracy (higher is better)
Axis | Linear Controller (Full Feat) | Linear Controller (Reduced Feat) | Nonlinear Controller (Full Feat) | Nonlinear Controller (Reduced Feat)
X    | 0.9964 | 0.9916 | 0.9928 | 0.9916
Y    | 0.939  | 0.949  | 0.92   | 0.920
TABLE II: RMSE [mm] (lower is better)
Axis | Linear Controller (Full Feat) | Linear Controller (Reduced Feat) | Nonlinear Controller (Full Feat) | Nonlinear Controller (Reduced Feat)
X    | 0.59 | 0.61 | 0.59 | 0.55
Y    | 0.83 | 0.67 | 0.92 | 0.72
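The training step can be sketched with scikit-learn as below. The feature layout and hyperparameters are illustrative, and the converged force signatures are replaced by synthetic data, so the numbers produced here are not those of the tables.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
misalign = rng.uniform(-1.0, 1.0, size=(n, 2))       # planar misalignment (mm)
# Synthetic stand-in for converged wrench features: x/y components roughly
# proportional to the misalignment, remaining channels pure sensor noise.
wrench = np.hstack([misalign * 3.0 + rng.normal(0, 0.2, (n, 2)),
                    rng.normal(0, 0.1, (n, 4))])

X_tr, X_te, y_tr, y_te = train_test_split(wrench, misalign, random_state=0)

# Direction of the x-misalignment as a binary label; magnitude as regression.
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr[:, 0] > 0)
reg = RandomForestRegressor(random_state=0).fit(X_tr, y_tr[:, 0])

acc = clf.score(X_te, y_te[:, 0] > 0)                # direction accuracy
rmse = float(np.sqrt(np.mean((reg.predict(X_te) - y_te[:, 0]) ** 2)))
```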
V-D Feature Importance
We use feature importance analysis to determine which features are relevant for learning the predictive model for misalignment from force observations; this analysis also helps with a better understanding of the problem. In particular, we use a forest of trees to evaluate the importance of the force features on the classification task [16]. We consider the Cartesian force signals and the corresponding moment signals from the F/T sensor, which together form the wrench signal we use as features for identifying the hole misalignment. The fitted forest provides feature importances, computed as the mean and standard deviation of the accumulated impurity decrease within each tree. We observe that the $x$ and $y$ Cartesian force signals and moments are important for the classification task, whereas the $z$ Cartesian force signal and moment are unimportant. In Figure 9, the bars show the feature importances of the forest, with the inter-tree variability represented by the error bars. This agrees with physical intuition about the insertion: since the forces in the $z$ direction are constant for all trials, they do not provide discriminating information for class separation. Similarly, the contact formation during insertion attempts should not lead to any moment about the $z$ axis, so this information is also not useful for making misalignment decisions. We repeat the classification and regression modeling with the reduced feature sets; the results are listed in Tables I and II (see the reduced-feature results). It can be observed that we achieve performance comparable to or better than with the full force signature. This shows the effectiveness of feature selection: we can do as well as or better than using the entire six-dimensional wrench vector.
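A forest-of-trees importance analysis of this kind can be sketched as below. The data are synthetic: the informative columns (fx, fy, mx, my) mimic the finding that the x/y forces and moments matter while the z components carry no signal; column order and parameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 500
label = rng.integers(0, 2, n)                  # direction of misalignment
# Informative channels: their sign encodes the class.
signal = (label * 2 - 1)[:, None] * rng.uniform(0.5, 1.0, (n, 4))
noise = rng.normal(0.0, 1.0, (n, 2))           # fz, mz carry no signal
# Column order: [fx, fy, fz, mx, my, mz]
X = np.hstack([signal[:, :2], noise[:, :1], signal[:, 2:], noise[:, 1:]])

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, label)
importances = forest.feature_importances_      # mean impurity decrease
```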
VI Results for Insertion
In this section, we present the design of an insertion policy that uses the predictive model learned from the force signature data described in the previous section. For completeness of presentation, we present brief details of the deep learning-based hole detection framework used for testing the performance of the force controller proposed in the paper. This vision module is used in the proposed work to perform hole-detection-based insertion, and is based on our previous work presented in [12]. We first present details of the vision module that we use for hole detection, and then present results for insertion using the learned predictive models.
VI-A Vision System for Hole Detection
We choose a supervised learning approach to detect the hole location from visual sensory data obtained from an RGBD camera (Intel RealSense D435). Traditional computer vision approaches to detecting the hole location with unknown object pose, such as template matching [4] or the Hough circle transform [23], can lead to false positives. We instead use the Mask R-CNN [9] deep learning architecture for instance-level segmentation to detect hole locations. Our classification setup has two classes, one for the background and one for the hole location; the network predicts the segmentation masks for hole locations. We performed transfer learning from MS COCO pretrained weights in a supervised manner. For the learning dataset, we captured 300 images of size 640×480 at different distances and annotated the hole pixels with the labelme [21] annotation tool. At inference time, we use the detected segmentation mask of the hole location to look up the corresponding registered point cloud data points. The output of the approach is an estimate of the 3D hole location from the visual sensory data. Figure 10 shows some qualitative samples of the hole detection approach on the point cloud of the test object.
VI-B Insertion Using Force Feedback Models
In this section, we present results from performing insertion with the trained force feedback models in the presence of error in the detected hole location. We use the vision module to detect the approximate location of the hole in the environment of the robot. Compared to our previous work in [12, 13], we experiment with parts with tighter tolerances to test the performance of the force controller. In particular, we use parts with a tolerance of approximately mm. To test the performance of the force controller integrated with the vision-based hole detection, we move the object in the field of view of the RGBD sensor (see Figure 1). The robot is then asked to perform insertion based on the DL estimate of the hole location. We find that the DL-based method is fairly accurate for reaching the vicinity of the hole. We then use the learned force controller to perform the insertion, overcoming any misalignment: the classification prediction of the trained classifiers is used to move by a unit step in the predicted direction while maintaining contact with the object surface. This is repeated until the robot either succeeds in the insertion or diverges too far from the starting estimate; the robot is given a fixed maximum number of correction attempts. We measure the number of corrections made with the linear and nonlinear controllers. We move the object to random locations in the view of the camera and attempt insertion. We observe that the ML model with the linear controller achieves successful insertions with a modest average number of corrections per attempt, while the ML model with the nonlinear controller achieves a 95% success rate (1 failure case out of 20 attempts). A more thorough analysis of the controller performance is left to an extended version of the paper.
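The correction policy described above can be sketched as follows. The callables `predict_direction` and `try_insertion` are hypothetical placeholders for the trained classifier and the robot-side insertion primitive; the step size and attempt budget are illustrative.

```python
def correction_loop(start_xy, predict_direction, try_insertion,
                    step_mm=1.0, max_attempts=10):
    """Repeatedly attempt insertion; on failure, step a fixed amount in the
    classifier-predicted direction while (on the real robot) maintaining
    contact with the surface. Returns (success, attempts_used, final_xy)."""
    x, y = start_xy
    for attempt in range(max_attempts):
        if try_insertion(x, y):
            return True, attempt, (x, y)
        dx, dy = predict_direction(x, y)   # each component in {-1, 0, +1}
        x += step_mm * dx
        y += step_mm * dy
    return False, max_attempts, (x, y)
```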
VII Conclusions and Future Work
In this paper, we presented the design and analysis of accommodation controllers for contact interaction during assembly operations, and their use in adaptive assembly strategies based on machine learning. Most assembly operations with tight tolerances result in complex contact formations which might damage the parts being assembled; ensuring safe operation of robots therefore requires force feedback controllers that limit contact forces in the presence of sustained contacts. We presented two designs of generalized accommodation controllers that use force feedback during contact interaction to ensure limited contact forces, and we analyzed these controllers to show convergence of the contact forces under the assumption of constant velocity of the underlying reference trajectory. We presented results from machine learning models trained using different signal statistics, and compared them to find an effective set of signal features. Finally, we used the trained model to perform insertion using a DL-based vision algorithm for hole detection, and showed that the proposed controllers achieve a high success rate for insertion on parts with tight tolerances.
In the future, we will perform a more rigorous comparison between the linear and nonlinear controllers at different operating velocities of the robot, and find the operating conditions which lead to the fastest insertion times and the highest success rates.
References
 [1] (2015) Adaptation of manipulation skills in physical contact with the environment to reference force profiles. Autonomous Robots 39 (2), pp. 199–217. External Links: ISSN 1573-7527 Cited by: §II.
 [2] (1990) Teaching and learning of compliance using neural nets: Representation and generation of nonlinear compliance. In International Conference on Robotics and Automation, pp. 1237–1244. External Links: ISBN 0-8186-2061-7, Document Cited by: §II.
 [3] (1993) Representation and Learning of Nonlinear Compliance Using Neural Nets. IEEE Transactions on Robotics and Automation 9 (6), pp. 863–867. External Links: Document, ISSN 1042-296X Cited by: §II.
 [4] (2001) Template matching using fast normalized cross correlation. In Optical Pattern Recognition XII, Vol. 4387, pp. 95–102. Cited by: §VI-A.
 [5] (2021) Tactile-RL for insertion: generalization to objects of unknown geometry. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. . Cited by: §II, §II.
 [6] (2019-11) Tactile-Based Insertion for Dense Box-Packing. In IEEE International Conference on Intelligent Robots and Systems, pp. 7953–7960. External Links: ISBN 978-1-7281-4004-9, Document, ISSN 2153-0866 Cited by: §II.
 [7] (1989) A dynamic approach to highprecision parts mating. IEEE Transactions on Systems, Man, and Cybernetics 19, pp. 797–810. Cited by: §II.
 [8] (1994) Acquiring Robot Skills via Reinforcement Learning. IEEE Control Systems 14 (1), pp. 13–24. External Links: Document, ISSN 1066-033X Cited by: §II, §II.
 [9] (2017) Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Cited by: §VI-A.
 [10] (1974-08) Force Feedback in Precise Assembly Tasks. Massachusetts Institute of Technology. Cited by: §II.
 [11] (2017-12) Deep reinforcement learning for high precision assembly tasks. In IEEE International Conference on Intelligent Robots and Systems, pp. 819–825. External Links: ISBN 978-1-5386-2682-5, Document Cited by: §II.
 [12] (2022) Automated visual hole detection for robotic peg-in-hole assembly. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. under review. Cited by: §VI-B, §VI.
 [13] (2021) Imitation and supervised learning of compliance for robotic assembly. arXiv preprint arXiv:2111.10488. External Links: 2111.10488 Cited by: §II, §VI-B.
 [14] (2021-09) Learning Assembly Tasks in a Few Minutes by Combining Impedance Control and Residual Recurrent Reinforcement Learning. Advanced Intelligent Systems, pp. 2100095. External Links: Document, ISSN 2640-4567 Cited by: §II, §II.
 [15] (2016) End-to-end training of deep visuomotor policies. Journal of Machine Learning Research 17, pp. 1–40. External Links: Document Cited by: §II, §II.
 [16] (2002) Classification and regression by randomForest. R News 2 (3), pp. 18–22. Cited by: §V-D.
 [17] (2020) Understanding multimodal perception using behavioral cloning for peg-in-a-hole insertion tasks. CoRR abs/2007.11646. External Links: 2007.11646 Cited by: §II.
 [18] (2017) Modern Robotics. Cambridge University Press. Cited by: §IVA.
 [19] (1977) Research on Advanced Assembly Automation. IEEE Computer 10 (12), pp. 24–38. External Links: Document Cited by: §II, §II.
 [20] (1990) Programmed Compliance for Error Corrective Assembly. IEEE Transactions on Robotics and Automation 6 (4), pp. 473–482. External Links: Document, ISSN 1042-296X Cited by: §II.
 [21] (2008) LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision 77 (1), pp. 157–173. Cited by: §VI-A.
 [22] (2019) Compare contact model-based control and contact model-free learning: a survey of robotic peg-in-hole assembly strategies. arXiv preprint arXiv:1904.05240. Cited by: §I.
 [23] (1990) Comparative study of Hough transform methods for circle finding. Image and Vision Computing 8 (1), pp. 71–77. Cited by: §VI-A.