I Introduction
Robotic assembly, an essential component of industrial applications, has been studied for a long time. In this work, we look at the most common problem in robotic assembly: peg-in-hole assembly, which is the basis of a wide range of component assemblies[kuangenjamming][su2017sensorless]. Robotic peg-in-hole assembly has been extensively researched and applied in various fields, from large-scale object assembly, such as aviation components[wan2017optimal][qiao2016largescale], engines[su2012new] and windshields, to small-scale component assembly, such as mold casting manufacturing[visionformultiplepeghole], electronic components[su2012sensor] and even micro-products[chang2011visual].
I-A The development of robotic peg-in-hole assembly
Many academic and industrial researchers have focused on advancing robotic peg-in-hole assembly on the basis of classical conditioning learning with conventional compliant control strategies[lefebvre2005active], observational learning with learning from demonstrations[zhu2018robot], and operant conditioning learning with learning from environments[sutton2018reinforcement]. In this work, we divide the existing peg-in-hole assembly strategies into two categories, contact model-based and contact model-free, as illustrated in Fig. 1. The contact model-free strategies are further subdivided into learning from demonstrations and learning from environments.
I-A1 Conventional contact model-based strategies
Conventional contact model-based strategies rely on contact model analysis and decompose peg-in-hole assembly into two steps: contact-state recognition and compliant control. The compliant control strategies are preprogrammed by humans according to the recognized contact state, which is obtained by analyzing the underlying friction and contact model[lefebvre2005active]. Researchers have used contact model-based strategies to solve a wide range of peg-in-hole assembly problems with special requirements and high complexity in autonomous industrial manufacturing. For instance, methods for performing high-precision assembly[su2017sensorless][rlhighpercision][2018modelfreelearning] and large-scale component assembly[zhi2014largescale][qiao2016largescale][wan2017optimal] with limited sensors have been widely investigated. Additionally, methods for deriving an assembly control strategy that can cope flexibly with complicated multiple peg-in-hole assembly have attracted considerable attention from researchers[kuangenjamming][zhiminfeedback][hou2018learning].
To date, most published research on peg-in-hole assembly has focused only on optimizing the separate stages. On the one hand, contact-state recognition has been explored to improve the recognition success rate through theoretical analysis[whitney1982jamming] and statistical techniques[jasim2017contact], without regard to how the assembly is carried out. On the other hand, to enhance efficiency and stability while keeping contact forces small during assembly, optimization approaches have been applied to improve the performance of compliant control strategies directly[nn2002analysis][tang2016autonomous][hou2018learning], without using the contact state recognition results. Less research has focused on analyzing the relations between contact state recognition and control strategies and on integrating these two stages[lefebvre2005active].
I-A2 Learning from demonstrations (LFD)
As autonomous robotic peg-in-hole assembly techniques progress, compliant assembly control strategies are expected to perform more complicated assembly with higher degrees of compliance in unstructured and nonstationary environments. It is impossible for preprogrammed contact model-based compliant control strategies to take into account all possible assembly situations in advance. Motivated by the fact that human beings can handle various complicated assemblies flexibly, even in unpredictable situations, LFD methods[argall2009surveyforlearingdemonstration] are based on the idea that assembly behaviors can be learned by interpreting human demonstrations without preprogramming, and they have been developed to solve peg-in-hole assembly in recent years[wan2017optimal][tang2015learning][tang2016teach]. To imitate the compliant behaviors of human beings in peg-in-hole assembly, the force control strategies are taken into account in addition to the assembly motion path[tang2015learning][tang2016teach]. A comprehensive survey of LFD methods was presented in [argall2009surveyforlearingdemonstration], which contributed a structure with two phases: demonstration gathering and policy derivation. Zhu and Hu[zhu2018robot] surveyed the LFD techniques applied in general robotic assembly and introduced the complete demonstration-based assembly system. Kyrarini et al.[kyrarini2018robot] examined and compared several methods of modeling human demonstrations for industrial assembly tasks.
LFD is an effective learning approach for solving robotic assembly problems. However, few studies have analyzed the challenges of LFD methods in peg-in-hole assembly scenarios. Furthermore, little research has focused on improving adaptation to environmental changes and uncertainties or generalization to new assembly situations.
I-A3 Learning from environments (LFE)
Beyond improving the generalization of LFD methods, robots are now expected to actively perceive their surrounding environments and to learn assembly skills incrementally, as human beings do. Reinforcement learning (RL) based methods hold great promise for achieving such performance: they enable agents to learn behaviors through interaction with the surrounding environment and, ideally, to generalize to unseen scenarios or tasks[sutton2018reinforcement]. To address the inherent difficulties in behavior modeling and generalization, adaptable and robust control systems have been developed through not only learning from expert demonstrations but also incremental learning. With the development of artificial intelligence techniques, especially deep learning, typical RL-based approaches, in particular model-free learning algorithms, have been extensively applied to perform complex manipulations[jan2013reinforcement][levine2015learning], including robotic peg-in-hole assembly[rlhighpercision][rl2018peginhole].
It is widely accepted that model-free RL algorithms can be applied to real-world robotic assembly tasks, but at the expense of data efficiency. To make them more practical, many studies have incorporated prior knowledge or expert demonstrations into typical model-free RL approaches[zhiminfeedback][2018learningfromCAD][2018modelfreelearning]. Recently, there has been increasing interest in model-based RL in the robotics community. For example, transition dynamics models have been used to derive feedback rewards or optimal actions[levine2015learning][polydoros2017survey][2018modelfreelearning]. However, for robotic peg-in-hole assembly, it is not clear how to fuse existing knowledge into a model-free learning process naturally. Although some papers have compared and combined model-based and model-free RL algorithms for robotic applications[polydoros2017survey], no survey has yet explored the relation of model-based RL algorithms to the conventional theoretical contact model and to the implicit models learned from demonstrations.
I-B The motivation and purpose of this paper
Although numerous studies on robotic assembly have been published, no paper yet surveys the existing research across both the contact model-based strategies and the two kinds of contact model-free strategies for peg-in-hole assembly. To the best of our knowledge, few studies have compared conventional contact model-based strategies with contact model-free algorithms. Therefore, one motivation of this paper is to survey the existing assembly strategies and group them as shown in Fig. 1 for the first time. Another motivation is to examine the underlying relations between different assembly strategies and to explore promising solutions that combine the strengths of contact model-based and contact model-free control strategies. To make existing peg-in-hole research tractable, we attempt to give a fairly complete overview with the following goals:

This paper surveys the state-of-the-art research and ongoing developments in robotic peg-in-hole assembly and identifies promising approaches.

This paper provides a novel and clear grouping method for analyzing the existing research comprehensively.

This paper explores the underlying relationship between traditional contact model-based control strategies and contact model-free learning algorithms and proposes promising solutions.

This paper highlights the remaining challenges of the existing approaches and identifies open questions for future research.
The remainder of the paper is organized as follows: Section II introduces the overall robotic peg-in-hole assembly system. Section III analyzes the contact model-based control strategies in detail. Section IV surveys and compares the two contact model-free learning algorithms. Section V concludes with a discussion of the open questions and potential future research directions.
II Robotic peg-in-hole assembly system
In this section, we briefly introduce the structure of a robotic peg-in-hole control system, shown in Fig. 2, which consists of three components: the mating parts, the sensing system and the manipulators. Generally, the holes are fixed and the manipulators grasp the pegs to mate the parts according to feedback from the sensing system.
II-A Mating parts
The mating parts, as shown in Fig. 2, are the assembly components and include the pegs and holes. In terms of the geometrical features shown in Fig. 2, the cylindrical peg-in-hole system is the basic assembly problem and has been extensively studied. Pegs with more complex shapes are also used in some special cases, including square pegs[park2013intuitive][kim2014holedectect], pegs with key slots and pegs with other complex shapes[2014forceguide]. In addition, according to the number of peg-hole mating pairs, the research can be divided into single peg-in-hole[wan2017optimal] and multiple peg-in-hole assembly[fei2003assembly][kuangenjamming] scenarios, as shown in Fig. 2. The complexity of assembly increases as the contact states of multiple peg-in-hole scenarios become more complicated.
Table I
| Category | Types | Characteristics | Methods |
| --- | --- | --- | --- |
| Vision | Camera (2D, stereo), laser tracker | Contactless, low-resolution | Boundary detection, visual servo strategy |
| Force | F/T, torque | Monitors contact force | Blind search strategy, impedance control, force-based position control |
| Sensorless | Joint current, joint encoder | Low cost, no installation | ARIE-based inserting, human-like exploration searching |
The scale of the assembly components depends on the application and ranges from macro-assembly of large aviation parts to micro-assembly of electronic components on circuit boards. The clearance between peg and hole also varies with the requirements of the assembly scenario. In some high-precision scenarios[rlhighpercision][tang2015learning][kuangenjamming], the clearance may be below the resolution and accuracy of the robot, which is typically in the range of 0.02 to 0.2 mm. In addition to rigid pegs with high stiffness[zhang2017force], flexible peg-in-hole components made of plastic[kuangenjamming] or wood are also used. The clearance and hardness of the mating surfaces change the nature of the part mating task[2012contactsvm] and hence its complexity.
II-B Sensing system
In robotic peg-in-hole assembly, the sensing system is used to acquire feedback from the environment, similar to human sight and tactile sensing. Sensing systems based on two types of sensors (vision and force), as well as sensorless feedback, are surveyed below together with their characteristics.
II-B1 Vision sensors
2D cameras are widely used for coarse localization by extracting the boundaries of holes from images[miura1998vision]. Marker points captured by 2D cameras were used to calculate the pose (position and orientation) of pegs in [wan2017optimal]. Image-based visual servo systems were designed to track the accurate hole position based on the extracted features[pauli2001vision][wang2008microassembly]. For high-speed micro-scale peg-in-hole assembly, Chang et al.[chang2011visual] and Huang et al.[huang2013visualservoing] proposed position-based visual servo systems with fast convergence guarantees based on the image calibration method and limited calibration. In contrast to 2D images, stereo cameras (Kinect) have been applied to capture 3D point data and estimate accurate 3D poses of the mating parts[abu2014solving][park2017compliance]. Additionally, laser trackers, as high-precision and contactless tools, have been employed to enhance the position accuracy of large-scale peg-in-hole assembly systems[zhi2014largescale][qiao2016largescale].
II-B2 Force sensors
Position controllers based on vision sensors may produce large contact forces because of position errors. Therefore, force feedback can be used not only to monitor the assembly process but also to accommodate position uncertainty. The force feedback, referred to as wrench signals (forces and moments), can be acquired from an external force-torque (F/T) sensor mounted on the end-effector of the robot[zhiminfeedback] or from torque sensors integrated into the robot joints[lee2014active][ren2018learning], as shown in Fig. 2.
In general, force feedback is used for compliant control in passive and active ways. As a passive example, auxiliary mechanical devices such as the remote-center-compliance (RCC) device[whitney1982jamming][sturges1996RCC] and a variant[xu2015robust] (composed of springs and dampers) attached to the end-effector have been applied to accommodate the contact forces. Active compliant control strategies actively control the assembly motions of the robot based on force feedback[su2017sensorless] and have been studied for more than 30 years. Wrench signals from sensors have been used to recognize the contact state and to generate low-level commands for position control of robots[lefebvre2005active] through impedance control[hogan1984impedance] and force-based position control[raibert1981hybrid].
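For reference, the impedance control idea cited above is commonly written as a target second-order relation between the pose error and the external wrench. The form below is the usual textbook statement rather than a formula taken from this survey, and the notation is assumed here: $M_d$, $D_d$ and $K_d$ are the desired inertia, damping and stiffness, $x$ and $x_d$ are the actual and desired end-effector poses, and $F_{ext}$ is the measured wrench.

```latex
M_d\,(\ddot{x} - \ddot{x}_d) + D_d\,(\dot{x} - \dot{x}_d) + K_d\,(x - x_d) = F_{ext}
```

With $F_{ext} = 0$ the end-effector tracks $x_d$; in contact, it yields to the wrench with a compliance set by $K_d$ and $D_d$.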
II-B3 Sensorless systems
To eliminate the limitations of sensor frequency, sensor installation and measurement error, De Luca and Mattone[de2005sensorless] and Lee and Park[lee2014active] worked on efficient sensorless active compliant control systems without external F/T sensors, in which the wrench signals were approximated from the current of the joint motors. Additionally, the poses of pegs attached to the end-effector were obtained from the encoders in the robot. Based on the pose information, robotic peg-in-hole assembly strategies driven by environmental constraints, such as attractive regions in the environment (ARIE), have recently been investigated without force-sensor feedback[qiao2015concept][qiao2017iros].
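The idea of approximating the wrench from joint motor currents can be illustrated with a minimal static sketch. It is not the method of the cited works: it assumes a hypothetical 2-link planar arm, a linear torque constant per motor, and it ignores gear ratios, friction, gravity and inertial effects. Under those assumptions the joint torques satisfy tau = J^T f, so the tip force follows by solving a 2x2 system. All function and parameter names are illustrative.

```python
import math


def planar_jacobian(q1, q2, l1, l2):
    """Jacobian of a 2-link planar arm: tip (x, y) velocity w.r.t. joint rates."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def estimate_tip_force(currents, torque_constants, q, links):
    """Estimate the planar force exerted by the tip from joint motor currents.

    Assumption: tau_i = k_i * current_i (friction, gravity, dynamics ignored).
    """
    tau = [k * i for k, i in zip(torque_constants, currents)]
    jac = planar_jacobian(q[0], q[1], links[0], links[1])
    # Solve J^T f = tau, where J^T = [[a, b], [c, d]].
    a, b = jac[0][0], jac[1][0]
    c, d = jac[0][1], jac[1][1]
    det = a * d - b * c
    if abs(det) < 1e-9:
        raise ValueError("arm is near a singular configuration")
    fx = (d * tau[0] - b * tau[1]) / det
    fy = (-c * tau[0] + a * tau[1]) / det
    return fx, fy


if __name__ == "__main__":
    print(estimate_tip_force([2.0, -4.0], [0.5, 0.5], (0.0, math.pi / 2), (1.0, 1.0)))
```

A real sensorless scheme would also have to compensate for the neglected dynamic terms before the estimate becomes usable for compliant control.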
Both vision sensors and force sensors have strengths and shortcomings, as summarized in Table I. Most vision sensors are appropriate for peg-in-hole assemblies with larger clearances and weak contact forces[huang2013visualservoing]. In contrast, force-based assembly control strategies have been explored in high-precision assembly systems with small clearances[rlhighpercision][2018modelfreelearning], large-scale component assembly systems with large contact forces[hou2018learning][kuangenjamming] and complex-shaped part assembly systems with complex contact forces[song2016guidance][dietrich2010contact]. To combine the strengths of vision and force sensors, hybrid sensing systems have also been developed[xie2009hybridsensor][2004datafusion].
II-C Manipulators
The manipulators in peg-in-hole assembly can be industrial robots (such as those produced by ABB and KUKA) with 6 degrees of freedom (DOFs)[kuangenjamming][zhang2017force], which are used to perform assembly requiring large forces and moments. Furthermore, in recent years, high-compliance robot manipulators (Baxter, PR2) with 7 or 8 DOFs have been developed and can perform manipulation tasks more flexibly and safely[park2017compliance]. Generally, most robotic peg-in-hole assembly environments involve fixed holes, as shown in Fig. 2, and manipulators grasp the pegs and control their motion (translation and rotation) in Cartesian space or through joint angles in joint space.
Robotic peg-in-hole assembly, in which the pegs are inserted to the desired depth of the holes, generally consists of two main phases: searching and inserting. In the searching phase, the location of the holes must be identified, which is an essential step for the subsequent inserting phase in real-world scenarios. Image-based boundary extraction techniques[chang2011visual], visual servo tracking approaches[wang2008microassembly] and blind search strategies based on designed human-like search paths[chhatpar2001search] and force feedback[kim2014holedectect] have been applied to locate holes and track their position. In the inserting phase, the assembly actions involve not only motion but also the applied external forces. Because the inserting phase is more complicated than the searching phase, most researchers focus only on active control strategies for the inserting phase and neglect the searching phase[tang2016autonomous][zhang2017force][hou2018learning]. Therefore, in this work, all of the following analyses of robotic peg-in-hole assembly concern the inserting phase.
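Although the searching phase is not analyzed further, a blind search path of the kind mentioned above can be sketched briefly. The example assumes an Archimedean spiral around the nominal hole position, which is one common choice and is not claimed to be the path used in the cited works. The names `pitch` (radial growth per turn), `step` (approximate spacing between waypoints) and `max_radius` are hypothetical parameters.

```python
import math


def spiral_search_path(center, pitch, step, max_radius):
    """Waypoints on an Archimedean spiral r = pitch * theta / (2 * pi)."""
    if pitch <= 0 or step <= 0 or max_radius <= 0:
        raise ValueError("pitch, step and max_radius must be positive")
    cx, cy = center
    waypoints = [(cx, cy)]
    theta = 0.0
    while True:
        radius = pitch * theta / (2.0 * math.pi)
        # Advance the angle so that consecutive waypoints are roughly `step` apart.
        theta += step / max(radius, step)
        radius = pitch * theta / (2.0 * math.pi)
        if radius > max_radius:
            break
        waypoints.append((cx + radius * math.cos(theta),
                          cy + radius * math.sin(theta)))
    return waypoints


if __name__ == "__main__":
    path = spiral_search_path((0.0, 0.0), pitch=0.5, step=0.2, max_radius=2.0)
    print(len(path), path[:3])
```

In a force-guided search, the robot would move along these waypoints under a small pressing force and stop when the force signal indicates that the peg has dropped into the hole.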
III Contact model-based control strategies
The assembly process is a constrained motion subject to geometrical and environmental constraints. The contact constraints between mating parts can be represented as topological contact states[291962][xiao1998contact][10.1007/9783642836251_17]. Thus, the overall assembly process can be described as a sequence of transitions between contact states. For instance, as shown in Fig. 3, a single peg-in-hole insertion process is formulated as transitions between no contact, one-point contact and two-point contact. The contact model-based strategies for robotic assembly shown in Fig. 4 generally include two steps: contact state recognition and compliant control.
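The description of insertion as a sequence of contact-state transitions can be made concrete with a small sketch. The three states come from the text; the table of allowed transitions (only between neighboring states) is an assumption made for illustration and is not taken from the cited contact models.

```python
from enum import Enum


class ContactState(Enum):
    NO_CONTACT = 0
    ONE_POINT = 1
    TWO_POINT = 2


# Assumed transition structure: a state may persist or move to a neighboring state.
ALLOWED_TRANSITIONS = {
    ContactState.NO_CONTACT: {ContactState.NO_CONTACT, ContactState.ONE_POINT},
    ContactState.ONE_POINT: {ContactState.NO_CONTACT, ContactState.ONE_POINT,
                             ContactState.TWO_POINT},
    ContactState.TWO_POINT: {ContactState.ONE_POINT, ContactState.TWO_POINT},
}


def is_valid_sequence(states):
    """Check that every consecutive pair of states is an allowed transition."""
    return all(nxt in ALLOWED_TRANSITIONS[cur]
               for cur, nxt in zip(states, states[1:]))


if __name__ == "__main__":
    insertion = [ContactState.NO_CONTACT, ContactState.ONE_POINT,
                 ContactState.TWO_POINT, ContactState.TWO_POINT]
    print(is_valid_sequence(insertion))
```

Such a table can serve as a sanity check on the output of a contact state recognizer, flagging physically implausible jumps.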
The general idea of contact state recognition is to determine the contact constraints from observations such as wrench signals and pose information. In this work, we divide the existing research on contact state recognition into two categories: the analytical model[whitney1982jamming] and the statistical model[2012contactsvm]. The analytical model relies on analysis of the geometrical and environmental constraints. The statistical model, which has been extensively developed in recent years, estimates the contact state by learning patterns directly from collected samples, without requiring information about the task.
Table II
| Categories | Success rate (%) | Computational time | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| GMM, DSM-GMM | 94.4 | 18.795 | Fits the distribution of samples | Sensitive to the initial setting |
| SVM, SVM-FIM | 64.2 | 70.719 | Excellent generalization | Sensitive to missing samples |
| CFC, GSFCA | 27.3/65.9 | 0.002/237.307 | Fuses prior knowledge | Only solves simple cases |
| SGB | 60.7 | 92.083 | No parameters to define | Sensitive to unseen samples |
| HMM | - | - | Eliminates time-varying uncertainties | Sensitive to gain value |
| NNs | - | - | Easy implementation | Less data-efficient |
III-A Contact state recognition with analytical model
Contact state recognition with an analytical model[xiao1998contact] generally includes two steps: contact state modeling and contact state determination.
III-A1 Contact state modeling
Contact states are modeled according to the mating features, and the most commonly used features are the force constraints between mating parts. Whitney[whitney1982jamming] proposed a quasi-static model to clarify the relation between force and geometrical constraints for single peg-in-hole assembly. Two possible failure situations, wedging and jamming, were also analyzed. Jamming, which often leads to insertion failure, is the condition in which the forces and moments applied to the peg are in the wrong proportions. As shown in Fig. 3, a jamming diagram is drawn to analyze the jamming conditions of the overall assembly process[whitney1982jamming]. Based on this idea, Sathirakul and Sturges[sathirakul1998jamming] and Fei and Zhao[fei2003assembly] enumerated the contact states and presented a three-dimensional analysis of jamming conditions for multiple peg-in-hole assembly.
In contrast to the quasi-static model and the rigid body assumption, some researchers have focused on dynamic or flexible models, which are closer to real assembly situations. Hsin-Te et al.[liao1998analysis] derived a general form of impact equations for an industrial manipulator performing peg-in-hole assembly using Lagrange's impact model. Xia et al.[xia2006dynamicanalysis] established a three-dimensional jamming analysis based on a compliant elastic contact dynamic model and designed several no-jamming and no-wedging assembly strategies by analyzing the free contact conditions.
III-A2 Contact state determination
The contact states are determined by calculating the similarity between the observed and the modeled contact states. As contact state recognition has advanced, contact states have been modeled with uncertain parameters to accommodate errors in the contact model and to enhance robustness to environmental uncertainties. Kalman filters (KF)[lefebvre2005online] and particle filters[gadeyne2005bayesian] have been used to estimate the geometrical parameters for better recognition of contact states and state transitions. In addition, instead of determining the contact state by calculating similarity, classification algorithms such as the support vector machine (SVM) have been applied to determine the contact state[2012contactsvm].
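To illustrate how a Kalman filter refines an uncertain geometrical parameter, the sketch below estimates a single scalar (for instance, a hypothetical hole-position offset) from noisy measurements. It is a deliberately reduced case, a constant parameter observed directly, and is not the estimator of the cited work; `process_var` and `meas_var` are assumed noise variances.

```python
def kalman_estimate(measurements, x0, p0, process_var, meas_var):
    """Scalar Kalman filter for a (nearly) constant parameter.

    Returns the list of (estimate, variance) pairs after each measurement.
    """
    x, p = x0, p0
    history = []
    for z in measurements:
        # Prediction: the parameter is modeled as constant, so only uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement according to the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        history.append((x, p))
    return history


if __name__ == "__main__":
    estimates = kalman_estimate([1.0] * 50, x0=0.0, p0=1.0,
                                process_var=0.0, meas_var=1.0)
    print(estimates[-1])
```

As measurements accumulate, the variance shrinks and the estimate moves from the prior toward the measured value, which is what allows the contact model to tolerate initial parameter errors.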
In conclusion, the analytical model is sensitive to uncertainties (such as those in the position of parts, the rigidity or elasticity of the assembly system and the friction model), and no perfect model can adapt automatically to different environments. The aforementioned analytical methods, which rely on force constraint analysis, become more complicated in assembly systems with uncertain mating features and changing jamming conditions. Consequently, only a partial model can be obtained, and generalization to new assembly scenarios is limited. Another drawback is that the variables of an analytical model can only be determined from the observed contact states and past transitions.
III-B Contact state recognition with statistical model
In contrast to recognition based on an analytical model, contact state recognition with a statistical model is formulated as a classification problem over the possible contact states, without explicitly modeling the possible uncertainties. The contact state can be classified with statistical techniques such as fuzzy classifiers (FC), neural networks (NNs), SVMs, Gaussian mixture models (GMMs) and hidden Markov models (HMMs).
Conventional fuzzy classifiers (CFC) have been applied to contact state recognition by accommodating the uncertainties based on prior knowledge, without geometrical information on the pegs[xiao1998contact][1998conceptfuzzy][2000fuzzyanalysis]. In these scenarios, the output contact state is decided through the following fuzzy if-then rules
$$R_j:\ \text{IF } x_{p1} \text{ is } A_{j1} \text{ AND } \dots \text{ AND } x_{pn} \text{ is } A_{jn} \text{ THEN the contact state is } C_j \tag{1}$$
where $x_{pi}$ denotes the $i$th component of the $p$th input observation signal, and $A_{ji}$ is the antecedent membership function of the $i$th input component for the $j$th contact state $C_j$. To enhance the robustness of the fuzzy system, the gravitational search (GS) algorithm is employed to tune the fuzzy rules of each model[2013lmsfuzzy]. CFC can solve simple classification problems through the designed fuzzy logic controller with little computing time.
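A minimal sketch of such a rule-based fuzzy classifier is given below. It assumes triangular membership functions, the minimum operator for AND, and a winner-takes-all decision; the two input components (a lateral force and a moment) and all numerical breakpoints in `RULES` are hypothetical values chosen for illustration, not parameters from the cited works.

```python
def triangular(x, left, peak, right):
    """Triangular membership function with support (left, right) and maximum at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# One rule per contact state: a membership function for each input component.
RULES = {
    "no_contact": [(-2.0, 0.0, 2.0), (-0.2, 0.0, 0.2)],
    "one_point": [(1.0, 5.0, 9.0), (-0.2, 0.1, 0.5)],
    "two_point": [(4.0, 10.0, 16.0), (0.3, 1.0, 1.7)],
}


def classify_contact(observation, rules):
    """Return (best_state, firing_strengths); best_state is None if no rule fires."""
    strengths = {}
    for state, memberships in rules.items():
        degrees = [triangular(x, *params)
                   for x, params in zip(observation, memberships)]
        strengths[state] = min(degrees)  # fuzzy AND over the antecedents
    best = max(strengths, key=strengths.get)
    return (best if strengths[best] > 0.0 else None), strengths


if __name__ == "__main__":
    print(classify_contact([9.0, 0.9], RULES))
```

The rule base is where prior knowledge enters: an expert sets the breakpoints, and a tuning algorithm can later adjust them.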
NNs have long been used to map the nonlinear relationship between the input force information and the output contact states[nn2002analysis]. Compared to FC-based methods, NNs can be implemented without handcrafted feature extraction and fuzzy rules. NNs have shown competitive classification performance in recent years, given sufficient computing resources and samples. The main issue is that a trained classification model cannot be generalized to scenarios with different input dimensions because the network architecture is fixed. The performance of NNs and CFC was compared in [nn2002analysis], and each has advantages and disadvantages. Additionally, numerous studies have integrated the flexibility of fuzzy set theory with the approximation ability of NNs[son2002optimal][son2001neuralfuzzy]. To train NN classifiers, the input observations, including measured wrench signals and pose data, usually require preprocessing, such as normalization or uniform discretization.
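The preprocessing step mentioned above can be sketched as follows, assuming per-channel min-max normalization to [0, 1] followed by uniform binning. The function names and the choice of min-max scaling are illustrative assumptions; other normalizations (for example, z-scores) are equally plausible.

```python
def min_max_normalize(samples):
    """Scale each channel (column) of a list of samples to the range [0, 1]."""
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in samples:
        normalized.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant channel -> 0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


def discretize(value, n_bins):
    """Map a normalized value in [0, 1] to one of n_bins uniform bin indices."""
    index = int(value * n_bins)
    return min(max(index, 0), n_bins - 1)


if __name__ == "__main__":
    wrench_samples = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
    scaled = min_max_normalize(wrench_samples)
    print(scaled, [[discretize(v, 4) for v in row] for row in scaled])
```

Normalizing each channel separately keeps forces and moments, which differ in unit and magnitude, on a comparable scale before they reach the classifier.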
SVM techniques, which reduce the actual risk and the confidence interval for correct classification, have been demonstrated to be suitable for real-world recognition with generalization to unknown environments[2012contactsvm]. Previous work has proposed a practical contact state recognition framework in which the input observations are processed through the discrete wavelet transform (DWT) and the contact states are acquired through existing analytical models. A fuzzy inference mechanism (FIM) with an adaptive classifier boundary generated by an SVM was used to classify the contact states of the peg-in-hole assembly sequence[2014fuzzysvm].
GMMs have been employed to model the input observations, and Bayesian classification has been incorporated to estimate a binary classification from the given GMMs[jasim2014contact][jasim2017contact]. The expectation maximization (EM) algorithm has been demonstrated to be efficient in optimizing the parameters of the given GMMs. Building on this work, Jasim et al. used the distribution similarity measure (DSM) to determine the optimal number of GMM components[jasim2017contact], which significantly improved the modeling performance and computational cost of contact state modeling for flexible objects.
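The combination of GMMs with Bayesian classification can be sketched for a one-dimensional observation as follows. The mixture parameters are assumed to have been fitted beforehand (for example, by EM); the numbers in `MODELS` and `PRIORS` are hypothetical, and the sketch does not reproduce the cited DSM-based model selection.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_likelihood(x, components):
    """Likelihood under a mixture given as (weight, mean, variance) triples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def classify(x, class_models, priors):
    """Bayes rule over class-conditional GMMs; returns (best_class, posterior)."""
    joint = {c: priors[c] * gmm_likelihood(x, comps)
             for c, comps in class_models.items()}
    total = sum(joint.values())
    posterior = {c: value / total for c, value in joint.items()}
    return max(posterior, key=posterior.get), posterior


# Hypothetical pre-fitted models of a measured force magnitude (in newtons).
MODELS = {
    "contact": [(0.6, 5.0, 1.0), (0.4, 8.0, 2.0)],
    "no_contact": [(1.0, 0.0, 1.0)],
}
PRIORS = {"contact": 0.5, "no_contact": 0.5}

if __name__ == "__main__":
    print(classify(6.0, MODELS, PRIORS))
```

Using more than one component per class lets a single contact state cover several clusters of observations, which a single Gaussian cannot represent.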
HMMs have an advantage over the previous contact state classification approaches in that they recognize both the contact states and the state transitions[hovland1998hmm][lau2003hmm][hannaford1991hmm]. Contact state classification with HMMs can therefore take temporal information into account. A multiple contact model method incorporated into an HMM to estimate the contact sequence was proposed in [debus2004hmm]; this model requires only partial observations, such as kinematic data, without other object information. The aforementioned contact state recognition approaches are essentially supervised learning methods, which require a considerable number of labeled samples and extensive training first.
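How an HMM uses temporal information can be illustrated with Viterbi decoding, which recovers the most likely contact-state sequence from a sequence of observations. The sketch assumes discretized force levels ("low", "mid", "high") as observations, and all probabilities below are hypothetical values, not parameters reported in the cited works.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence of a discrete HMM (log domain)."""
    if not observations:
        return []
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            best_prev = max(states,
                            key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (scores[best_prev] + math.log(trans_p[best_prev][s])
                             + math.log(emit_p[s][obs]))
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


STATES = ["no_contact", "one_point", "two_point"]
START_P = {"no_contact": 0.8, "one_point": 0.15, "two_point": 0.05}
TRANS_P = {
    "no_contact": {"no_contact": 0.7, "one_point": 0.25, "two_point": 0.05},
    "one_point": {"no_contact": 0.1, "one_point": 0.6, "two_point": 0.3},
    "two_point": {"no_contact": 0.05, "one_point": 0.15, "two_point": 0.8},
}
EMIT_P = {
    "no_contact": {"low": 0.8, "mid": 0.15, "high": 0.05},
    "one_point": {"low": 0.2, "mid": 0.6, "high": 0.2},
    "two_point": {"low": 0.05, "mid": 0.25, "high": 0.7},
}

if __name__ == "__main__":
    print(viterbi(["low", "low", "mid", "mid", "high", "high"],
                  STATES, START_P, TRANS_P, EMIT_P))
```

Because the transition probabilities penalize implausible jumps, the decoded sequence is smoother than classifying each observation independently.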
In contrast to parametric learning techniques, the random forest technique[cabras2016random], which does not require parameters to be determined in advance, was explored for multi-class classification. In addition, the binary stochastic gradient boosting (SGB) classifier[cabras2010contact], whose strength is classifier diversity, can perform contact state recognition. Jasim et al.[jasim2017contact] compared the success rate and computational time of several of the most frequently used classification techniques in an assembly experiment with flexible rubber objects. Based on these results, Table II summarizes the success rate, computational time and pros and cons of the statistical techniques introduced above for contact state recognition.
In conclusion, for real-world robotic peg-in-hole assembly with a limited number of samples, NNs and FC-based methods cannot handle complicated contact model recognition well. In contrast, GMMs and SVMs have shown high efficiency and better generalization for contact state classification. Furthermore, HMMs can be combined with other classification algorithms to take the effect of state transitions into account.
III-C Compliant control
In contrast to general assembly[lefebvre2005active][2012contactsvm], the contact model-based control strategies for peg-in-hole assembly depicted in Fig. 4 can be simplified into two parts: a high-level planning module and a low-level controller. The high-level planning module derives the high-level commands for the low-level controller based on the geometrical and environmental constraints. Given these high-level commands, the low-level controller executes the assembly actions according to the observed wrench signals, the pose information and the current contact state.
III-C1 Low-level controller
At present, robots can easily meet point-to-point accuracy requirements, and position controllers have become quite mature[lopes2008force]. The low-level controllers of industrial robots generally fall into two categories: force-based impedance controllers and position-based force controllers. Force-based impedance controllers execute joint torque commands[ren2018learning], and the measured wrench signals are used to generate the desired torque value for the inner force loop. Position-based force controllers typically generate the desired position and orientation commands in an outer force loop; the commands are then executed by the inner position controller. Both types have strengths and weaknesses; however, direct access to actuator torque data is not available for most industrial robots. Position-based force controllers with a designed outer force controller are therefore widely used for industrial assembly control[kuangenjamming][hou2018learning][tang2016autonomous].
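The structure of a position-based force controller can be sketched in one dimension. The outer loop below converts the force error into a position correction, and the inner position controller is assumed to be ideal (it reaches the commanded position exactly). The spring-like contact model, the stiffness value and the loop gain are hypothetical; the loop converges only when the product of gain and stiffness is below 2.

```python
def outer_force_loop(f_desired, f_measured, x_command, gain):
    """Outer force loop: shift the position command in proportion to the force error."""
    return x_command + gain * (f_desired - f_measured)


def simulate_contact(f_desired=5.0, stiffness=1000.0, gain=0.0005, steps=50):
    """Press against a surface at x = 0 modeled as a linear spring (assumption)."""
    x = 0.0
    for _ in range(steps):
        force = stiffness * max(0.0, x)      # measured contact force
        x = outer_force_loop(f_desired, force, x, gain)
        # The inner position controller is assumed to track x perfectly.
    return x, stiffness * max(0.0, x)


if __name__ == "__main__":
    print(simulate_contact())
```

In practice the outer loop is often a PID law rather than the pure proportional correction used here, and its gains must be tuned to the stiffness of the contact.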
To accommodate the environmental uncertainties of the assembly process, some researchers have focused on optimizing the parameters of the position-based force controller. A network-based adaptive fuzzy model guided by the contact state estimator has been proposed to acquire adaptive parameters for a force impedance controller[1998conceptfuzzy]. To optimize the outer proportional-integral-derivative (PID) force controller within a few trials for real-world peg-in-hole assembly, Hou et al.[hou2018learning] combined evolutionary algorithms (EA) with support vector regression (SVR) to obtain the optimal PID parameters.
III-C2 High-level planning module
For peg-in-hole assembly, the high-level planning module generally generates the desired force and moment values for the low-level force controller according to the geometrical constraints of the mating parts[zhang2017force][kuangenjamming][hou2018learning]. In addition to the geometrical constraints, Qiao et al.[qiao2015concept] took environmental constraints into account based on the concept of ARIE, as shown in Fig. 4, and the position uncertainties were eliminated using coarse wrench signals. In [qiao2017iros] and [qiao2015arie], the constraint regions in configuration space and physical space were discussed, and a two-step insertion strategy and a human-inspired compliant strategy based on the ARIE concept were verified on a broader range of peg-in-hole tasks with sensorless systems. Environmental constraints based on ARIE can not only compensate for the limitations of force sensors in high-precision assembly but also provide guarantees of safety and reliability in real-world assembly. The generality and robustness of the compliant control system are thus improved with the assistance of environmental constraints.
Instead of optimizing the low-level controller directly, some researchers have made significant efforts to optimize the high-level planning module according to the contact model recognition results. Son [son2001neuralfuzzy] utilized fuzzy set theory to manage the uncertainties according to prior assembly knowledge. Additionally, a neural network was constructed to approximate the nonlinear relationship between the jamming analysis and the insertion control strategy. Xia et al.[xia2006dynamicanalysis] proposed a no-jamming and no-wedging assembly strategy that chooses the appropriate set of applied forces and moments based on the corresponding control law for each contact state. In [shirinzadeh2011hybrid], a hybrid methodology was proposed that selects the corresponding low-level controller according to the recognized contact state. Tang et al.[tang2016autonomous] proposed an autonomous alignment method that corrects the initial pose before the insertion phase based on the estimated contact state. For a more complicated flexible dual peg-in-hole assembly, Zhang et al.[kuangenjamming] established jamming diagrams based on contact state analysis. Then, jamming theory was applied to establish the parameters of the low-level PD force controller.
However, the high-level planning module of peg-in-hole assembly is sometimes neglected, and most studies focus on contact model recognition and low-level control strategies[zhang2017force]. Furthermore, it remains unclear how best to optimize the high-level planning module and the low-level controller according to the contact model recognition. A flexible and adaptable assembly strategy should match the real-time uncertainties via a smooth integration of all the separate modules.
Iv Contact model-free learning strategies
In contrast to the contact model-based control strategies, which deal with contact state recognition and compliant control separately, the contact model-free strategies combine these two steps. As shown in Fig. 1, contact model-free strategies consist of two categories: LFD and LFE.
Iva LFD
Compared to industrial robots, humans can perform peg-in-hole assembly under considerable pose uncertainty owing to the flexibility of their wrists, their sensing system and their intelligent decision-making ability. Instead of analyzing how humans accomplish assembly tasks, many researchers focus on imitating human assembly demonstrations directly and then transferring the skills into robot programs, which is referred to as LFD (also termed imitation learning or apprenticeship learning). For robotic peg-in-hole assembly, LFD methods consist of three principal phases: sensing, encoding and reproducing, which are depicted in Fig. 5[zhu2018robot][kyrarini2018robot].
Table III: Advantages and disadvantages of LFD encoding approaches for peg-in-hole assembly
LFD approach | Advantages | Disadvantages | References
DMP | Robust to spatial perturbation; solves multivariate data separately; learns from a single demonstration | Delay and pause of motion | [2010DMP][park2008movement][paxton2015incremental]
GMM | Handles different sources and missing data; offline learning and fast online regression | Models the joint probability density function | [kyrarini2018robot][tang2015learning]
HMM | Handles partial demonstrations; handles temporal variability; handles periodic and reaching movements together; encodes multivariate motion simultaneously | Stability sensitive to gains | [1996hmm][calinon2010learninghmm][calinon2010overview]
IvA1 Sensing phase
This stage aims to interpret the human motion trajectories, including the observed states and executed actions. At present, human demonstration data can be collected through kinesthetic demonstration, motion capture systems and teleoperated demonstration, which are shown in Fig. 5. In this paper, the states of the assembly process, including pose information and wrench signals, can be recorded through kinesthetic demonstration and motion capture systems, as shown in Fig. 5. The pose of the peg can be calculated from the robot joint encoders or determined by vision-based pose estimation approaches, such as extracted 2D boundary features, marker point matching[wan2017optimal] or 3D point cloud data processing[jasim2017contact]. The wrench signals are usually measured by external F/T sensors mounted on the end-effector or by joint torque sensors. The corresponding executed actions include translational and rotational offsets or velocities and the applied forces acquired through torque sensors or external F/T sensors.
Additionally, to enhance the performance of interpretation, data preprocessing techniques are applied to the raw data, such as principal component analysis (PCA) to reduce dimensionality and dynamic time warping (DTW)[song2016guidance] to temporally align the sample points from different demonstrations. To integrate different types of sensing information, a data fusion architecture[2004datafusion] based on artificial neural networks (ANNs) is used to combine the pose information and wrench signals, and a Kalman filter (KF) is used to minimize the effects of noise.
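The temporal alignment step mentioned above can be illustrated with a textbook DTW implementation. The sketch below assumes one-dimensional signals and an absolute-difference cost; the function name `dtw_align` is hypothetical, and a real pipeline would apply the same idea to multi-dimensional pose and wrench samples.

```python
def dtw_align(a, b):
    """Dynamic time warping between two 1-D sequences.

    Returns the accumulated alignment cost and the warping path as a
    list of index pairs (i, j) matching a[i] with b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances
                                 cost[i][j - 1],      # b advances
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = []
        if i > 1 and j > 1:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 1:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 1:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    slow = [0.0, 0.0, 1.0, 2.0, 3.0, 3.0]   # same motion, demonstrated slowly
    fast = [0.0, 1.0, 2.0, 3.0]
    print(dtw_align(fast, slow))
```

Two demonstrations of the same motion performed at different speeds align with zero cost, which is the property that makes DTW useful before encoding.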
IvA2 Encoding phase
The encoding phase involves mapping the relations among the observed states and the executed actions. The mapping approaches have been developed in three main methodologies: dynamic movement primitives (DMPs), Gaussian mixture regression (GMR) and HMMs.
DMPs, as a nonlinear dynamic system, are used to model the discrete movements of assembly trajectories as a sequence of specific goal positions. A second-order differential equation is employed to encode the desired movement primitives of the assembly trajectories (positions, velocities and accelerations)[2010DMP]. The motion of one component of the observed state, specified in joint or task space, is formulated as follows:
$$\left\{ y_d^m(t_j),\ \dot{y}_d^m(t_j),\ \ddot{y}_d^m(t_j) \;\middle|\; d = 1,\dots,D;\ m = 1,\dots,M;\ j = 1,\dots,T_m \right\} \tag{2}$$
where $D$ denotes the dimensions of the observed state; $M$ denotes the number of demonstration trajectories; $T_m$ denotes the length of the $m$-th demonstration, i.e., the number of samples on that trajectory; and $y$, $\dot{y}$ and $\ddot{y}$ are the real-time position, velocity and acceleration, respectively, of the trajectory at time step $t_j$. Generally, both discrete and periodic movements can be represented by a second-order differential equation, which can be rewritten as two first-order equations in a unified form as follows:
$$\tau \dot{z} = \alpha_z \left( \beta_z (g - y) - z \right) + f \tag{3}$$
$$\tau \dot{y} = z \tag{4}$$
For discrete movements, the term $f$ can be derived through
$$f(x) = \frac{\sum_{i=1}^{N} w_i \psi_i(x)}{\sum_{i=1}^{N} \psi_i(x)}\, x, \qquad \psi_i(x) = \exp\left( -h_i (x - c_i)^2 \right) \tag{5}$$
where $\psi_i$ is a radial-basis function with center $c_i$ and width $h_i$, and $x$ is a phase variable that guarantees that $f$ tends to 0 as time increases. For periodic movements, $f$ can be derived through
$$f(\phi) = \frac{\sum_{i=1}^{N} w_i \psi_i(\phi)}{\sum_{i=1}^{N} \psi_i(\phi)}\, r, \qquad \psi_i(\phi) = \exp\left( h_i \left( \cos(\phi - c_i) - 1 \right) \right) \tag{6}$$
where the phase variable $\phi$ moves with the constant speed $\Omega$; $r$ is the amplitude of the oscillator; $\tau$ denotes the period of periodic movements or the duration of the training movement; and $g$ denotes the target position. Furthermore, $f$ is a nonlinear function that shapes the convergence of the position towards the target value and takes the form (5) or (6), respectively. $\alpha_z$ and $\beta_z$ are constants set to ensure the convergence of the dynamic system represented by (3).
DMPs are defined by the parameters $w_i$, $\tau$ and $g$. $g$ can be set directly according to the samples; $\tau$ can be chosen as the duration of the training movement; and the weights $w_i$ can be calculated from the solution of (3) and (4) in a recursive least-squares manner. For better regression performance, the multiple variables of DMPs are estimated in separate processes synchronized by the phase variable. For instance, locally weighted regression (LWR) with lower computational complexity is applied to synthesize the parameter $w_i$, and nonparametric Gaussian process regression (GPR) with high accuracy is applied to estimate $\tau$ and $g$. DMP-based approaches have been applied to reach a target or follow a periodic path by means of a set of mass-spring-damper mechanisms. In [abu2014solving] and [kramberger2017generalization], a complete methodology is proposed to learn from human assembly demonstrations by combining DMPs that capture the trajectories of pegs with the force-torque profiles. Furthermore, the differential equation of DMPs has been improved to adapt to uncertainty in the desired position and to obstacle avoidance[park2008movement].
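As a concrete illustration of fitting and reproducing a discrete movement, the following minimal sketch fits the forcing-term weights of a one-dimensional discrete DMP to a single demonstration with locally weighted regression and then integrates the system with Euler steps. It follows the widely used textbook formulation of discrete DMPs; the function names, the gain values (`alpha_z`, `beta_z`, `alpha_x`) and the basis-function layout are assumptions made for this example and are not taken from the cited works.

```python
import math


def _gradient(seq, dt):
    """Central finite differences (one-sided at the two ends)."""
    n = len(seq)
    out = []
    for t in range(n):
        lo, hi = max(t - 1, 0), min(t + 1, n - 1)
        out.append((seq[hi] - seq[lo]) / ((hi - lo) * dt))
    return out


def _basis(x, centers, widths):
    """Gaussian radial-basis activations at phase x."""
    return [math.exp(-h * (x - c) ** 2) for c, h in zip(centers, widths)]


def fit_discrete_dmp(y_demo, dt, n_basis=20,
                     alpha_z=25.0, beta_z=6.25, alpha_x=3.0):
    """Fit forcing-term weights to one demonstration (1-D)."""
    tau = (len(y_demo) - 1) * dt          # duration of the demonstration
    g = y_demo[-1]                        # goal = last demonstrated sample
    yd = _gradient(y_demo, dt)
    ydd = _gradient(yd, dt)
    # Centers are spread along the decaying phase x(t) = exp(-alpha_x t / tau).
    centers = [math.exp(-alpha_x * i / (n_basis - 1)) for i in range(n_basis)]
    widths = [1.0 / (centers[i + 1] - centers[i]) ** 2
              for i in range(n_basis - 1)]
    widths.append(widths[-1])
    num = [0.0] * n_basis
    den = [1e-10] * n_basis               # small constant avoids 0/0
    for t, y in enumerate(y_demo):
        x = math.exp(-alpha_x * t * dt / tau)
        # Forcing term that would reproduce the demonstration exactly.
        f_target = tau ** 2 * ydd[t] - alpha_z * (beta_z * (g - y) - tau * yd[t])
        for i, psi in enumerate(_basis(x, centers, widths)):
            num[i] += psi * x * f_target  # locally weighted regression
            den[i] += psi * x * x
    return {"w": [a / b for a, b in zip(num, den)], "c": centers, "h": widths,
            "tau": tau, "y0": y_demo[0], "g": g,
            "alpha_z": alpha_z, "beta_z": beta_z, "alpha_x": alpha_x}


def rollout(dmp, dt, n_steps, goal=None):
    """Integrate the fitted DMP with Euler steps, optionally to a new goal."""
    g = dmp["g"] if goal is None else goal
    y, z, x = dmp["y0"], 0.0, 1.0
    traj = [y]
    for _ in range(n_steps):
        psi = _basis(x, dmp["c"], dmp["h"])
        f = x * sum(p * w for p, w in zip(psi, dmp["w"])) / (sum(psi) + 1e-10)
        z += dt * (dmp["alpha_z"] * (dmp["beta_z"] * (g - y) - z) + f) / dmp["tau"]
        y += dt * z / dmp["tau"]
        x += dt * (-dmp["alpha_x"] * x) / dmp["tau"]
        traj.append(y)
    return traj
```

Because the forcing term vanishes as the phase decays, the rollout converges to whatever goal is supplied, which is the property that lets one demonstration be reused for a shifted target position.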
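GMR is introduced to estimate the relation between the observed states and the control commands. GMR is a real-time regression method that reproduces the trajectories modeled by a GMM or a modified GMM, and the reproduced trajectories can be adapted to control robot assembly tasks. In [tang2015learning] and [tang2016teach], GMR is employed to predict velocities in response to wrench signals in a manner similar to a human; the output velocities are then executed through a low-level controller (impedance controller) to realize the peg-in-hole insertion phase. To construct a heavyweight component assembly process, Wan et al.[wan2017optimal] proposed a complete methodology that learns assembly skills from human demonstrations and compensates for the large deformation with GPR. The joint probability distribution $p(x, a)$ is calculated with a mixture of $K$ Gaussian components weighted by $\pi_k$ as follows:
$$p(x, a) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}\left( [x; a] \mid \mu_k, \Sigma_k \right) \tag{7}$$
where $x$ is the observed state as discussed above; $a \in \mathbb{R}^{D_a}$ denotes the assembly actions; and $D_a$ denotes the dimensions of the assembly actions. Each component $k$ features a Gaussian distribution with a mean of $\mu_k$ and covariance $\Sigma_k$:
$$\mu_k = \begin{bmatrix} \mu_k^{x} \\ \mu_k^{a} \end{bmatrix}, \qquad \Sigma_k = \begin{bmatrix} \Sigma_k^{xx} & \Sigma_k^{xa} \\ \Sigma_k^{ax} & \Sigma_k^{aa} \end{bmatrix} \tag{8}$$
The conditional probability $p(a \mid x)$ is derived by a weighted summation of each $p_k(a \mid x)$ as follows:
$$p(a \mid x) = \sum_{k=1}^{K} \frac{\pi_k\, p_k(x)}{p(x)}\, p_k(a \mid x) \tag{9}$$
where $p_k(x) = \mathcal{N}(x \mid \mu_k^{x}, \Sigma_k^{xx})$ and $p(x) = \sum_{k=1}^{K} \pi_k\, p_k(x)$ is the marginal probability of the input variable $x$. The parameters $\{\pi_k, \mu_k, \Sigma_k\}$ can be estimated iteratively from the collected demonstration training data by maximum likelihood estimation with the EM algorithm. Then, with the learned Gaussian parameters, the predicted output is calculated as the conditional expectation of $a$ under $p(a \mid x)$ as follows:
$$\hat{a} = \sum_{k=1}^{K} h_k(x) \left[ \mu_k^{a} + \Sigma_k^{ax} \left( \Sigma_k^{xx} \right)^{-1} \left( x - \mu_k^{x} \right) \right] \tag{10}$$
where the weights can be calculated by
$$h_k(x) = \frac{\pi_k\, \mathcal{N}\left( x \mid \mu_k^{x}, \Sigma_k^{xx} \right)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}\left( x \mid \mu_j^{x}, \Sigma_j^{xx} \right)} \tag{11}$$
Therefore, the GMR model can be learned offline, and the learned regression function calculates the expected actions rapidly online, which makes it suitable for performing assembly in real time.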
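The online regression step is cheap once the mixture parameters are known. The sketch below shows it for the simplest case of a scalar input (for example, one force component) and a scalar output (one velocity component), so that no matrix library is needed. The dictionary keys and the numerical values are hypothetical; in practice the parameters would come from EM fitting on demonstration data.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_predict(x, components):
    """Conditional expectation E[a | x] for a 2-D (x, a) Gaussian mixture.

    Each component is a dict with the prior `pi`, the means `mu_x`, `mu_a`
    and the (co)variances `s_xx`, `s_xa` (hypothetical field names).
    Returns the prediction and the per-component weights.
    """
    likelihoods = [c["pi"] * gaussian_pdf(x, c["mu_x"], c["s_xx"])
                   for c in components]
    total = sum(likelihoods)
    weights = [l / total for l in likelihoods]          # responsibility of each Gaussian
    cond_means = [c["mu_a"] + c["s_xa"] / c["s_xx"] * (x - c["mu_x"])
                  for c in components]                  # per-component linear model
    return sum(h * m for h, m in zip(weights, cond_means)), weights


if __name__ == "__main__":
    # Hypothetical mixture: two local linear force-to-velocity relations.
    mixture = [
        {"pi": 0.5, "mu_x": -1.0, "mu_a": 0.0, "s_xx": 1.0, "s_xa": 0.0},
        {"pi": 0.5, "mu_x": 1.0, "mu_a": 2.0, "s_xx": 1.0, "s_xa": 0.0},
    ]
    print(gmr_predict(0.0, mixture))
```

Each query costs one pass over the components, which is why the offline-fit, online-regress split suits real-time control.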
HMMs have been extensively used to encode and generalize the observed human assembly trajectories owing to their ability to handle spatial and temporal variability[1996hmm][calinon2010learninghmm][calinon2010overview]. HMMs are a type of dynamic Bayesian network and are employed to model the real state transitions in assembly processes (vision feedback and force-moments). An HMM generally includes five components: the hidden states $S = \{s_1, \dots, s_K\}$, the observable states $O$, the initial state probability matrix $\Pi$, the hidden state transition probability matrix $A = [a_{ij}]$ and the observation probability matrix $B = [b_j(o)]$. $a_{ij}$ represents the probability of the state transition from $s_i$ to $s_j$, and $b_j(o)$ represents the probability of acquiring the observation $o$ at the state $s_j$ (always for real-world peg-in-hole assembly). The joint probability $p(x, a)$ is encoded by an HMM with continuous observations, and each state $i$ is encoded by a Gaussian with mean $\mu_i$ and covariance $\Sigma_i$. Therefore, the HMM can be defined by the parameters $\{\Pi, A, \mu_i, \Sigma_i\}$, which can be learned by the EM algorithm[1996hmm]. In the original GMR approach in (10), the weight $h_k(x)$ representing the importance of each Gaussian does not depend on time; in [calinon2010learninghmm], it is extended to a time-dependent weight $h_i(x_t)$ by recursively calculating the likelihood under the HMM. The weight can be derived as
$$h_i(x_t) = \frac{\left( \sum_{j=1}^{K} h_j(x_{t-1})\, a_{ji} \right) \mathcal{N}\left( x_t \mid \mu_i^{x}, \Sigma_i^{xx} \right)}{\sum_{k=1}^{K} \left( \sum_{j=1}^{K} h_j(x_{t-1})\, a_{jk} \right) \mathcal{N}\left( x_t \mid \mu_k^{x}, \Sigma_k^{xx} \right)} \tag{12}$$
where $\mu_i^{x}$ and $\Sigma_i^{xx}$ are the input parts of $\mu_i$ and $\Sigma_i$, and the recursion is initialized with $h_i(x_1) \propto \Pi_i\, \mathcal{N}(x_1 \mid \mu_i^{x}, \Sigma_i^{xx})$. This weight takes the temporal influence of the dynamic assembly movements into account.
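The effect of the temporal weighting can be seen in a small numerical sketch. It implements the normalized forward recursion of a Gaussian-emission HMM for scalar observations; the function names and the two-state example are hypothetical and serve only to show that the weights change over time even when the observation itself is uninformative.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def hmm_weights(observations, init, trans, means, variances):
    """Normalized forward weights of each hidden state over time.

    init[i]     : initial probability of state i
    trans[j][i] : probability of moving from state j to state i
    means, variances : 1-D Gaussian emission parameters per state
    """
    n_states = len(init)
    weights = []
    prev = None
    for x in observations:
        raw = []
        for i in range(n_states):
            if prev is None:
                prior = init[i]
            else:
                # Probability mass flowing into state i from the last step.
                prior = sum(prev[j] * trans[j][i] for j in range(n_states))
            raw.append(prior * gaussian_pdf(x, means[i], variances[i]))
        total = sum(raw)
        prev = [r / total for r in raw]
        weights.append(prev)
    return weights


if __name__ == "__main__":
    # Two states with identical emissions: only the transition structure
    # (a left-to-right chain) tells them apart.
    w = hmm_weights([0.0, 0.0, 0.0],
                    init=[1.0, 0.0],
                    trans=[[0.5, 0.5], [0.0, 1.0]],
                    means=[0.0, 0.0],
                    variances=[1.0, 1.0])
    print(w)
```

With identical emissions, a time-independent mixture weight would stay fixed, whereas the recursion shifts probability mass towards the later state as the movement progresses.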
In conclusion, different LFD encoding methods for robotic assembly have been compared in [zhu2018robot], [kyrarini2018robot], [calinon2010overview] and [calinon2010learninghmm], and our conclusions regarding the pros and cons of these three LFD encoding methods for peg-in-hole assembly are shown in Table III. A significant strength of DMPs is their adaptability to perturbations through a second-order system. GMMs can model the mapping function well through clustering and probability density estimation, with high robustness to environmental noise. HMMs, which encapsulate the precedence information in a state transition matrix, can perform imitation learning with partial demonstrations[calinon2010learninghmm]. Additionally, in contrast to DMPs, which require two different equations for periodic and discrete problems, HMMs exploit a unified formulation. To enhance the adaptation of learned assembly strategies, many researchers have investigated variants of the above modeling methods or have combined them. Modified GMMs combined with optimal control algorithms were proposed in [kyrarini2018robot]. GMMs combined with HMMs have been explored and have shown competitive performance for robotic assembly[calinon2010learninghmm].
IvA3 Reproducing phase
After the demonstrations are encoded and the regression functions are optimized, the desired assembly actions are reproduced in the reproducing phase. The generalization of the learned assembly skills depends on the regression performance. Instead of generalizing the motions directly with statistical regression methods, such as LWR and locally weighted projection regression (LWPR), GMR derives the regression function from the joint probability density of the collected demonstration data. However, the existing LFD methods operate at the trajectory level, which makes them difficult to apply to more complicated assembly tasks with larger uncertainties. Additionally, the generalization to new circumstances and the robustness against perturbations when reproducing actions require further improvement.
IvB LFE
The development of highly intelligent control systems with the ability to learn skills autonomously has advanced considerably. RL is a promising direction that has been extensively used to solve challenges related to complicated contact-rich assembly tasks[rlhighpercision][zhiminfeedback][rl2018peginhole]. The core idea of RL-based strategies is that the robot actively explores and learns the assembly policy from a high-level specification of what to do, expressed through the reward mechanism, instead of being guided explicitly through specific actions. Furthermore, the robots can achieve incremental learning by interacting with the environment through the smooth combination of the contact model recognition and compliant control processes. Recent advances in RL have achieved great success in solving robotic manipulation problems, especially in conjunction with deep neural networks (DNNs) for parameterizing policies and value functions. As shown in Fig. 6, RL approaches are generally divided into two main classes, model-free and model-based, according to whether a model of the dynamic transitions between the robot and the environment is learned. Additionally, the integration of model-based and model-free techniques has also drawn considerable attention in recent years.
Table IV: Advantages and disadvantages of model-free and model-based RL methods for peg-in-hole assembly
Category | Advantages | Disadvantages | Methods
Model-free | No need for prior knowledge of environment; easy implementation | Less data efficiency; unstable and dangerous | DDPG; DQN (Q-learning)
Model-based | Fewer interactions with environments; fast convergence to optimal policy | Performance depends on transition model; divergence due to the bias of model | GPS; PILCO
IvB1 Model-free RL
Model-free RL methods aim to learn the optimal policy by exploring the state-action space directly, without estimating a dynamic model from the transitions[jan2013reinforcement]. The solutions to an RL problem can be divided into two method families: value-based methods and policy-based methods. A value-based method was first proposed in [gullapalli1992learning], in which the value function was learned through nonlinear function approximation (via NNs) and the discrete assembly action was chosen in a greedy manner. A value-based learning algorithm with a long short-term memory (LSTM) network, a variant of a recurrent neural network (RNN), was used to estimate the value function in order to achieve peg-in-hole assembly with a precision exceeding the resolution of the robots[rlhighpercision]. Additionally, a learning framework for solving the real-world robotic assembly problem was proposed, in which the output of the RL control system was used as the settings of the low-level position-based force controller instead of controlling the robots directly.
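The value-based idea with discrete, greedily selected actions can be illustrated with tabular Q-learning on a toy task. The environment below is entirely hypothetical: the peg moves over a one-dimensional grid of lateral offsets and may attempt an insertion at any cell, which succeeds only above the hole. It stands in for the NN and LSTM approximators of the cited works, which are not reproduced here.

```python
import random

ACTIONS = (-1, +1, 0)   # move left, move right, attempt insertion


def train_q_table(n_states=7, hole=3, episodes=3000,
                  alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy 1-D insertion task (hypothetical)."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)            # random initial offset
        for _ in range(50):
            # Epsilon-greedy selection over the discrete actions.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda k: q[s][k])
            move = ACTIONS[a]
            if move == 0:                      # insertion attempt
                done = (s == hole)
                reward = 1.0 if done else -1.0
                s_next = s
            else:                              # lateral move, clamped to the grid
                done = False
                reward = -0.01
                s_next = min(max(s + move, 0), n_states - 1)
            # Temporal-difference update towards the bootstrapped target.
            target = reward if done else reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


def greedy_policy(q):
    """Greedy action (as a move) for every state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda k: row[k])] for row in q]


if __name__ == "__main__":
    print(greedy_policy(train_q_table()))
```

The learned greedy policy moves towards the hole from either side and inserts once aligned; the action set is necessarily small and discrete, which is the limitation that motivates policy-based methods.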
The limitation of value-based methods is that the output actions of the RL system can only be discrete and low-dimensional. Policy-based methods have therefore been extensively explored for high-compliance robotic applications. An RL method with two components, an actor and a critic, was proposed in [nuttin1997learning], in which the actor derives the assembly policy and the critic evaluates the actions. As policy-based methods have advanced, deterministic policy gradient (DPG) theory was derived in [silver2014deterministic] to achieve differential policy learning with high stability. Subsequently, deep deterministic policy gradient (DDPG) approaches[duan2016benchmarking] were developed by combining DPG with DNNs, and these approaches have been widely applied to high-compliance continuous action control[zhiminfeedback][ren2018learning].
Policy-based methods are appropriate for real-world problems with continuous and high-dimensional actions. However, the learning of the parameterized policy usually converges slowly, with a high degree of variance and instability. To date, some studies have been published on improving the stability and efficiency of the DDPG framework for real-world robotic assembly, which allows learning from different sample distributions in an off-policy scheme. A model-driven DDPG algorithm was proposed to learn a general assembly policy for multiple peg-in-hole problems[zhiminfeedback]. As shown in Fig. 6, one contribution of the model-driven DDPG algorithm is that the learning of the actor network is driven by the basic actions from a simple but practical controller. Additionally, many studies have focused on incorporating prior knowledge to enhance the efficiency. A DDPG from demonstration (DDPGfD) method was proposed in [vecerik2017leveraging], in which human demonstrations are placed in the expert memory buffer and reused through a prioritized replay mechanism to enhance policy learning. In contrast to providing a baseline policy for the robots, prior knowledge about the geometric information of the assembly parts was used in [2018learningfromCAD] to plan the motion trajectory that guides the policy learning. Basically, those authors focused on assembly motion planning with geometrical information from computer-aided design (CAD) and utilized the RL algorithm to handle the dynamics of the environment.
IvB2 Model-based RL
In contrast to typical model-free RL methods, model-based methods aim to learn a dynamic model from the stored transitions, and the policies are optimized by deriving the rewards and next states from the learned model[polydoros2017survey]. For complicated manipulation tasks, policy search methods that derive the optimal policies by interacting directly with the learned dynamic model have shown faster convergence. Guided policy search (GPS) has been developed to learn several manipulation behaviors, as shown in Fig. 6; this method combines a trajectory optimization component and a neural network policy learning component[levine2015learning]. Luo et al. proposed mirror descent GPS (MDGPS) to tackle a complicated assembly task with rigid pegs and deformable holes for use with noncompliant robots and external F/T sensors[rl2018peginhole]. The Probabilistic Inference for Learning Control (PILCO)[deisenroth2011pilco] framework employs a Gaussian process to model the transition dynamics and a linear function to represent the policy, and this framework is a state-of-the-art model-based RL algorithm in terms of sample efficiency and time efficiency.
Consequently, model-based RL methods only need to explore a narrower space than model-free methods, resulting in faster convergence with fewer interactions with the environment. However, the performance of model-based RL methods depends heavily on the accuracy of the learned transition dynamic model. Polydoros and Nalpantidis gave an up-to-date overview of model-based RL algorithms and the related robotic applications in [polydoros2017survey]. We summarize the pros and cons of the model-free and model-based RL algorithms for robotic peg-in-hole assembly in Table IV.
IvB3 Integration of model-based and model-free methods
Both model-free and model-based RL methods have advantages and disadvantages, as summarized in [polydoros2017survey]. Model-free RL methods can solve complicated assembly problems with a general and easily implemented approach but are less sample-efficient. DDPG-based model-free algorithms can provide more stable policies and attain an asymptotic performance in some assembly tasks that exceeds the performance of nonsmooth dynamics models. Model-based RL methods are able to enhance policy learning by utilizing rich transition information. Additionally, model-based optimal controllers constrain the exploration space to a safe region but often cannot consistently achieve good convergence performance owing to a large model bias. In [2018modelfreelearning], Fan et al. analyzed a model-based RL method (GPS) and a model-free RL method (DDPG) and then proposed a more efficient framework that combines model-based optimal control strategies with a model-free actor-critic learning algorithm, as shown in Fig. 6.
The integration of the strengths of model-based and model-free RL methods has been a well-studied topic for decades. Most of the efforts have focused on smoothing the transition from model learning to policy learning and obtaining more useful information from sample transitions. In [pong2018temporal], the authors introduced a strategy called the temporal difference model (TDM), which trains a goal-conditional value function with a specific choice of reward and horizon prediction. This model makes the robot consider not only reaching a goal state as optimally as possible but also as easily as possible. The TDM was extended in a multistep model study[venkatraman2016multimodel] not only to predict a sequence of future states but also to reach a possible goal state, based on the general value functions idea in [sutton2011horde], by learning rich contextual value functions from a single experience dataset. Additionally, some researchers have focused on exploring how to make full use of the learned dynamic model beyond the commonly used Dyna architecture[sutton1991dyna] and GPS-based methods[levine2015learning], which simulate the entire trajectory in every iteration.
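The Dyna idea of mixing real and simulated experience can be sketched in tabular form. The example below is a minimal Dyna-Q variant on a hypothetical one-dimensional insertion task (the same kind of toy grid as a lateral peg offset with a single hole position): every real transition updates the value table directly and is also stored in a learned deterministic model, which is then replayed for a number of planning updates. The task, function names and hyperparameters are assumptions for illustration only.

```python
import random

ACTIONS = (-1, +1, 0)   # move left, move right, attempt insertion


def step(state, action, n_states=7, hole=3):
    """Toy 1-D insertion task (hypothetical): (next_state, reward, done)."""
    move = ACTIONS[action]
    if move == 0:
        done = (state == hole)
        return state, (1.0 if done else -1.0), done
    return min(max(state + move, 0), n_states - 1), -0.01, False


def dyna_q(episodes=200, planning_steps=20, n_states=7,
           alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Dyna-Q: model-free updates plus planning on a learned model."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(n_states)]
    model = {}          # learned transition model: (s, a) -> (s_next, r, done)

    def update(s, a, s_next, r, done):
        target = r if done else r + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(50):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda k: q[s][k])
            s_next, r, done = step(s, a, n_states)
            update(s, a, s_next, r, done)        # learn from real experience
            model[(s, a)] = (s_next, r, done)    # learn the transition model
            for _ in range(planning_steps):      # learn from simulated experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s_next
            if done:
                break
    return q


if __name__ == "__main__":
    table = dyna_q()
    print([ACTIONS[max(range(len(ACTIONS)), key=lambda k: row[k])] for row in table])
```

The `planning_steps` parameter is the knob that trades real interaction for model use: setting it to zero recovers plain Q-learning, while larger values rely more heavily on the learned model and therefore on its accuracy.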
V Discussion and conclusion
We have surveyed notable work on robotic peg-in-hole assembly and have provided a comparison of the different strategies, summarized in Table V. Both contact model-based and contact model-free strategies can achieve distinguished performance in some special scenarios. In summary, contact model-based conventional controllers and LFD methods guarantee safety and efficiency and are suitable for specific assembly scenarios after adjustment through preprogramming. LFE algorithms based on RL are promising for actively and flexibly performing a broad range of complicated assembly processes. Similar to human decision-making, which does not rely on tedious programming and rules, RL-based algorithms can remove the task-specific engineering of the feedback controller, and they can naturally solve assembly problems with large environmental uncertainties and generalize to new situations.
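Table V: Comparison of contact model-based and contact model-free strategies
Category | Contact model-based | Contact model-free (LFD) | Contact model-free (LFE)
Pre-programming | | |
Data efficiency | | |
Safety guarantee | | |
Generalization | | |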
It is clear that robots cannot perform peg-in-hole assembly as flexibly as human beings based solely on any single strategy. Although RL-based contact model-free algorithms have attracted more attention than contact model-based algorithms and LFD methods, RL alone is not sufficient for deriving an assembly strategy with enough robustness and flexibility to solve all robotic peg-in-hole assembly problems. Furthermore, typical model-free RL-based methods are still not well suited to real robotic problems. Consequently, we highlight several open questions in the field of robotic peg-in-hole assembly and propose some potential directions for future research.
Va Open questions in the field of robotic peg-in-hole assembly
VA1 How can the active compliant control strategies cooperating with passive compliant mechanisms be improved?
With the development of sensing hardware and robotic perception techniques, active compliant control strategies have been extensively explored for robotic peg-in-hole assembly. In addition, high-compliance robots have also been employed to develop complicated assembly systems with a simple compliant strategy. Improvements in both active compliant control strategies and passive compliant mechanisms can promote assembly research. For a peg-in-hole assembly, large position or force uncertainties can be accommodated by an active compliant control strategy, while smaller uncertainties can be eliminated by improving the compliance of the mechanism instead of optimizing the parameters of the active controller. Therefore, the combination of active compliant control strategies and passive devices still requires more attention, in particular to decide when to optimize the compliant control strategy and when to modify the passive mechanism.
VA2 How can effective and incremental demonstration learning be realized?
LFD methods provide a way to perform robotic peg-in-hole assembly without handcrafted preprogramming based on contact model recognition. Although it is challenging to collect demonstration experiences, doing so is essential for improving not only data efficiency but also the adaptation and generalization of the learned assembly policy. For instance, in an attempt to address this challenge, DMPs were used as fundamental building blocks together with RL to learn advanced skills[rldmp2011reinforcement]. RL is commonly used to obtain the adaptive parameters for robust results. Additionally, to improve the efficiency, better feature extraction methods are required to select better demonstrations and omit undesirable information.
VA3 How can model-based and model-free RL algorithms be combined?
It is clear that the integration of model-based and model-free RL algorithms is a promising way to advance RL-based strategies in robotic peg-in-hole assembly, but it raises two key questions: how can an accurate dynamic model be learned, and how can robots balance learning from the transition model and learning directly from the environment?
A good transition model representing the dynamics of the environment gives the robot an accurate understanding of the environment, which ensures that the optimal policy can be chosen accurately based on the model. In specific real-world robotic problems, the environment has been modeled both as a physics-based model and as a statistical model learned from experience data, including deterministic models and stochastic models. Learning the statistical model can be treated as a supervised learning problem. Deep learning has achieved major advances in function approximation, but its low sample efficiency still limits the performance in real-world scenarios. Therefore, one open point is how to account for the environmental uncertainties or the existing physics model in transition model learning. Additionally, the transition model can be extended by taking prior domain knowledge, such as expert experience, into account.
As shown in Fig. 6, the robots decide when to interact with the transition model, and the degree of confidence in the transition model greatly affects the quality of the learned assembly skills. Therefore, scalable methods for effectively planning based on the given transition model are still required, in addition to the Dyna architecture[sutton1991dyna] and GPS-based methods[levine2015learning].
VB Potential future work
To combine the strengths of contact model-based and contact model-free learning algorithms, we propose the following directions to explore possible solutions in the field of robotic peg-in-hole assembly.
VB1 Incorporate the knowledge representation method into contact model recognition and transition model learning.
Contact model recognition and transition model learning still require better feature extraction methods for the assembly environment. A promising solution is a knowledge representation method based on general value functions, which was proposed to represent the understanding of environments through learning some simple auxiliary tasks given some prior knowledge.
VB2 Incorporate prior knowledge into the learning process
Prior knowledge can be an existing control law, as in [zhiminfeedback], or can be obtained by learning from expert demonstrations. Additionally, predictions about the environment can be learned as general value functions (GVFs), which can also be considered prior knowledge to incorporate into the learning process. For instance, prior knowledge can be used to improve the balance between the model-based and model-free RL strategies.
VB3 Incorporate physical model into reward shaping for RL-based algorithms
The solution of robotic peg-in-hole assembly problems through RL-based algorithms holds great promise. However, many real-world problems are difficult to express with reward signals, and unpredictable exploration and dangerous actions need to be reduced. How well the designed reward mechanism shapes the assembly problem affects the quality and efficiency of learning. For instance, Xu et al.[zhiminfeedback] investigated a fuzzy reward system to take more prior knowledge into account. The inverse RL method was utilized to derive the rewards from observed expert behaviors, thereby exploiting human knowledge[abbeel2004apprenticeship]. Therefore, the physical model, including the geometric information on the parts and a mature friction model, can be considered an implicit constraint on the design of the reward mechanism. Additionally, instead of using constant reward signals, the reward mechanism can be updated by evaluating a high-level objective function according to the designer's final goal, which means that the robots receive different evaluation feedback at different stages.
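One possible reading of a stage-dependent reward constrained by the physical model is sketched below. It is an assumption-laden illustration rather than a reward used in any cited work: the function name, the hole depth, the force limit and the weighting factors are all hypothetical, and the geometric and contact knowledge enters only through the depth goal and the force threshold.

```python
def staged_reward(depth, prev_depth, contact_force,
                  hole_depth=0.02, force_limit=30.0):
    """Stage-dependent reward for one control step (hypothetical thresholds).

    depth, prev_depth : insertion depth [m] now and at the previous step
    contact_force     : magnitude of the measured contact force [N]
    """
    if contact_force > force_limit:
        return -1.0          # safety constraint: excessive contact force
    if depth >= hole_depth:
        return 1.0           # geometric goal: peg fully inserted
    if depth <= 0.0:
        return -0.01         # search stage: small time penalty only
    # Insertion stage: reward progress along the hole, penalize force.
    progress = (depth - prev_depth) / hole_depth
    return progress - 0.1 * contact_force / force_limit


if __name__ == "__main__":
    print(staged_reward(0.0, 0.0, 5.0))      # searching on the surface
    print(staged_reward(0.01, 0.005, 0.0))   # making progress in the hole
    print(staged_reward(0.01, 0.005, 40.0))  # force limit exceeded
```

Separating the stages keeps the feedback informative throughout the task, while the force threshold discourages the dangerous actions mentioned above.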
Acknowledgment
The authors would like to thank…