Endowing robots with the ability to autonomously assemble enormous products from their full sets of parts has the potential to profoundly change the manufacturing industry. Although the level of automation has increased over the last decades, autonomous robotic assembly remains a challenging task. A typical robotic assembly pipeline consists of action sequences that repeat the following stages: (1) picking up a particular part, (2) finding a feasible collision-free path to move it to an appropriate 6D pose, (3) mating it precisely with the other parts, and (4) placing the assembled parts on the ground and preparing for the next pickup movement. Robots need to know which part to attach at each step and in what order to assemble all the parts into a functionally and physically valid shape. To connect two parts, robots not only need to master multiple contact-rich manipulation primitives such as grasping, placing, and object re-orientation, but also need to manage complex multi-arm or multi-robot cooperation.
Many previous works in robotics and computer vision have attempted to tackle this challenging problem from different angles. One line of work ignores physical interaction and focuses on estimating the desired part poses [15, 9]. Other works teach robots to master complex object manipulation skills, such as object grasping [18, 25], object reorientation [2], peg insertion [29, 16], and tool manipulation [26]. Some works formulate assembly as a planning problem and propose various approaches [6, 13, 7]. Suárez-Ruiz et al. [27] demonstrate that, with state-of-the-art robotic capabilities, it is possible to physically assemble an IKEA chair with two robots, soliciting all manipulation skills. However, their assembly sequence was hard-coded through considerable engineering effort. Recently, the IKEA Furniture Assembly Environment [14], the work most similar to ours, simulates complex long-horizon robotic manipulation tasks with 80 furniture models and different robots based on MuJoCo [30]. The environment leverages the weld equality constraint provided by MuJoCo [30] to merge two selected parts as long as they are within a distance threshold of each other.
Our environment uses multiple robots to assemble furniture. Compared to a single-arm robot with a fixed base, multiple mobile robots greatly expand the working space and can perform more complex manipulation operations. Motion-planning modules are integrated into our simulation environment to enable the robots to perform complicated manipulation operations. Our environment also enforces strict collision constraints during robotic manipulation. We build a dataset of 220 chairs from the PartNet dataset [20], annotated with additional connection points and graspable regions, as our simulation assets, to investigate robotic assembly over realistic and complicated shapes. Moreover, we formulate the assembly process as a concrete and feasible reinforcement learning problem. We combine a structured representation with a model-free RL algorithm to assemble a diverse set of chairs, and we leverage imitation learning to train a unified policy network that can assemble various shapes. Experiments show that our approach achieves a 74.5% success rate under the object-centric setting and a 50% success rate under the full setting when tested on unseen chairs. As a baseline, we adopt the RRT-Connect motion planning algorithm [12], which achieves only an 18.8% success rate with significantly more planning steps.
In summary, our contributions are: 1) we develop a robotic simulation environment for furniture assembly, 2) we formulate the furniture assembly process as a concrete and feasible reinforcement learning problem, 3) we propose a novel policy learning pipeline to assemble a diverse set of chairs with multiple robots, and 4) we demonstrate that the learned policy also generalizes to unseen shapes.
II Related Work
II-A Simulation Environments for Robotic Manipulation
Most robot manipulation simulators concentrate on solving specific manipulation problems, such as grasping [19, 17] and in-hand manipulation [1, 24, 34]. The recent MetaWorld [33] and RLBench [10] contain a variety of manipulation tasks for robot learning, but these are all short-horizon object manipulation tasks. The work most related to ours is the IKEA Furniture Assembly Environment [14], which simulates complex long-horizon robotic manipulation tasks with 80 furniture models and different robots. Unlike the IKEA environment [14], our simulation environment supports a multi-robot assembly setting. We design robots that each consist of a 7-DoF robotic arm mounted on a 3-DoF mobile platform, as shown in Fig. 1. They significantly expand the working space during the assembly process. To enable them to navigate, manipulate, and cooperate to successfully assemble furniture, motion-planning modules are integrated into our simulation environment. Motion-planning modules are usually slow to compute; we speed them up so that popular deep reinforcement learning algorithms can run within a reasonable time. When merging two selected parts, the IKEA furniture environment leverages the weld equality constraint provided by MuJoCo [30] to merge them as long as they are within a distance threshold of each other. Our simulation environment imposes a stricter part attachment process: it simulates precise collision checking and rigid-body contact, and it enforces realistic collision constraints in robotic manipulation operations such as pick-and-place and the execution of motion trajectories.
II-B Assembly Planning
Assembly is a long-horizon manipulation task, and many works formulate it as a planning problem that focuses on finding valid sequences to assemble a product from its parts. Natarajan [21] shows that deciding the existence of an assembly sequence is PSPACE-hard in general. Kavraki and Kolountzakis [11] demonstrate that partitioning a planar assembly into two connected parts is NP-complete. Nevertheless, several assembly strategies have been investigated. Halperin et al. [6] present a general framework for finding assembly motions based on the concept of motion space, where each point represents a mating motion independent of the moving part set. Lee [13] develops a backward assembly planner that reduces the search space by merging and grouping parts based on interconnection feasibility. More recently, Hartmann et al. [7] integrate task and motion planning (TAMP) [5] into robotic assembly planning and propose a rearrangement pipeline for construction assembly. Unlike these planning approaches, we formulate the assembly process as a reinforcement learning problem and train a model-free RL policy that learns to assemble a diverse set of chairs and generalizes to unseen chairs.
II-C Learning to Assemble
Thanks to recently proposed large-scale object-part datasets [20, 32], recent works [15, 8] in computer vision have investigated 6-DoF pose prediction for 3D shape part assembly. Li et al. [15] learn to assemble 3D shapes using a single image as guidance. Huang et al. [8] remove the external image guidance and learn a graph-based generative model. However, these works tackle the part assembly problem from the perception side; robotic assembly in the real world is much more complex. Robots need to decide how to pick up the parts and find feasible collision-free paths to mate two parts together, which is the main focus of this work.
III Simulation Environment and Assets
Using PyBullet [3], we build a robotic assembly simulation environment in which two robots need to pick up two parts sequentially, plan collision-free motion trajectories, mate the two parts precisely, and prepare for the next pickup movement. We adopt 220 chairs from the PartNet dataset [20], adjust the relative part poses to satisfy collision and contact conditions, and annotate connection points and graspable regions. These serve as our simulation assets for studying the robotic assembly task.
III-A Simulation Environment
We design two settings: the object-centric setting and the full setting. In the object-centric setting, no robots are loaded, and users directly move each part to a pose that they specify. We describe the types of actions provided to merge two parts in Sec. IV-C. After the merging process, the merged parts are placed on the ground. We describe the penalty for collisions and incorrect merging actions in Sec. IV-C.
In the full setting, two robots are loaded into the simulation. Each is a 7-DoF Franka arm mounted on a 3-DoF mobile platform. The two robots significantly enlarge the working space for the assembly process. The robots pick up parts, merge them, and place the merged parts sequentially. Several fixed-base holders in the environment hold the parts while they are not grasped by the robots. We integrate motion-planning modules into our simulation environment to control the two robots as they navigate, cooperate, and conduct complicated manipulation operations. Specifically, we integrate the OMPL library [28] and run the RRT-Connect algorithm [12] to search for a feasible path from the initial configuration to the goal configuration. To speed up the simulation, we replace the default PyBullet collision checking function with FCL [22] and re-implement the checking modules in FCL to work with articulated rigid bodies. The resulting speed-up is sufficient for popular deep reinforcement learning algorithms to be applied in our simulation environment. We describe how two parts are merged and the corresponding reward function in Sec. IV-C.
Due to the length limitation, videos illustrating the simulation environments can be found on our project webpage: https://sites.google.com/view/roboticassembly
III-B Data Assets
We select 220 chairs from PartNet and randomly split them into a training set and a test set. Based on assembly difficulty, the training set is further divided into two subsets: Easy Train and Hard Train. Examples for each level can be found in Fig. 4.
We adjust these meshes to satisfy the collision-free constraints of our task. Leveraging the annotations from PartNet [20], we obtain a fine-grained segmentation of each chair into parts. In PartNet, each part’s local frame is placed at the center of the chair rather than at the part’s own center of mass. We move each part’s local frame to its center of mass. We then adjust the relative poses between parts so that no two parts penetrate each other.
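The re-centering step can be illustrated with a minimal sketch. It assumes each part is given as an array of vertex positions and approximates the center of mass by the vertex mean, which is a simplification; the function name `recenter` is hypothetical and not part of our released code.

```python
import numpy as np

def recenter(vertices):
    """Shift a part's vertices so that its local frame origin sits at its centroid.

    vertices: (n, 3) array expressed in the chair-centered frame.
    Returns the re-centered vertices and the offset that was removed.
    Assumption: the vertex mean stands in for the true center of mass.
    """
    center = vertices.mean(axis=0)
    return vertices - center, center
```

The returned offset is what would be stored as the part’s translation in the chair frame, so the assembled pose can still be recovered.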
Additional annotations include the connection points between two parts and the graspable regions, as shown in Fig. 3. We first calculate the minimum distance between every pair of part meshes of a chair. If the minimum distance between two parts is less than a threshold of 5 mm, the two parts are considered connected, and we annotate the two points that realize this minimum distance as the connection points. We then define a normal vector and a tangent vector at the contact surface, as shown in Fig. 3. A connection point on a part is described by a 9-dimensional vector: the position of the connection point relative to the part center, its normal vector, and its tangent vector. We also manually annotate the graspable regions for the robots. A feasible graspable region must allow a collision-free grasp, and each part is annotated with two graspable regions.
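The connection-point annotation described above can be sketched as follows. This is an illustrative approximation that works on points sampled from each part’s surface rather than on exact mesh-to-mesh distances, and it assumes coordinates in metres; all function names are hypothetical.

```python
import numpy as np

def closest_points(points_a, points_b):
    """Return (min_distance, point_on_a, point_on_b) for two (n, 3) point arrays."""
    diff = points_a[:, None, :] - points_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return dist[i, j], points_a[i], points_b[j]

def annotate_connection(points_a, points_b, threshold=0.005):
    """Return the pair of connection points if the parts are closer than
    the threshold (5 mm, expressed in metres here), otherwise None."""
    d, pa, pb = closest_points(points_a, points_b)
    if d < threshold:
        return pa, pb
    return None

def connection_descriptor(point, part_center, normal, tangent):
    """Build the 9-dim vector: relative position, unit normal, unit tangent."""
    normal = normal / np.linalg.norm(normal)
    tangent = tangent / np.linalg.norm(tangent)
    return np.concatenate([point - part_center, normal, tangent])
```

The brute-force pairwise distance is quadratic in the number of sampled points, which is acceptable for an offline annotation pass but would call for a spatial index on dense samples.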
IV Reinforcement Learning Formulation
We formulate the robotic assembly task as a reinforcement learning problem. In this section, we describe the detailed definitions of states , actions , rewards , and step function as follows.
We first introduce the notation used in the paper. Assume a chair has parts. Each part in the chair is assigned a part id where . We adopt the part id assignment order from the PartNet [20] description. For each part, our simulation provides its geometry as a mesh file.
Each part has a predefined connection id list where . Here is the total number of connection points on the part . As shown in Fig. 3, for two connected parts, we annotate the corresponding connection point information on each part’s mesh file. The connection information on each part contains the 3D position of the contact point in the object frame and the local xyz-coordinate frame with the contact point as its origin, denoted as . Note that all 6D poses and transformations in this work are composed of a 3D translation and a 3D orientation represented as Euler angles. Besides the connection information, we also annotate the graspable regions on each part. These graspable regions are used for robotic grasping.
At the beginning of each learning episode, we load a chair’s full set of parts into our simulation and distribute them randomly on the ground. At each time step, some parts are moved to new poses. To record the part connection status at each time step, we maintain a part connection status tensor with a shape of . If the part and part are not connected, the and are set to a six-dimensional zero vector. If the part and part are connected, the and are set to the relative transformations between these two parts, denoted as and , respectively.
The state in our setting is composed of each part’s status. The part status has two main aspects. One is the geometry description, which includes the shape of the part denoted as , the connection information on each part, and the grasping region on each part. The other is the spatial and connection relationship among the parts. The absolute pose of the part at the current time step, denoted as , is provided. The relative pose transformations between two parts and , denoted as and , can be calculated directly. The connection status between two parts is either “Connected” or “Not Connected”. If two parts, and , are recorded as “Connected”, they are rigidly fixed together and share the same rigid 6D transformation. If the connection status between two parts is “Not Connected”, the two parts can be moved independently. The part connection status tensor is provided. By looping over the items in the tensor , the agent can determine which parts are rigidly connected into a rigid group. Note that the number of groups varies during the assembly process. In the beginning, each part is its own rigid group. When the chair is successfully assembled, there is only one rigid group.
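Recovering the rigid groups from the connection status tensor can be sketched with a small union-find pass. The sketch assumes the tensor is stored as an array of shape (number of parts, number of parts, 6) in which an all-zero entry means “Not Connected”, as described above; the function name `rigid_groups` is hypothetical.

```python
import numpy as np

def rigid_groups(status):
    """Group part indices into rigid groups.

    status: (N, N, 6) array; status[i, j] is the relative 6D transformation
    between parts i and j, or all zeros when they are not connected.
    """
    n = status.shape[0]
    parent = list(range(n))

    def find(x):
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if np.any(status[i, j] != 0) or np.any(status[j, i] != 0):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

At the start of an episode this returns one singleton group per part, and a fully assembled chair yields a single group containing every part index.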
In summary, the states to assemble the chair with parts are
Here we provide two types of assembly settings: 1) the object-centric setting, which does not involve the robots and does not need to consider robotic grasping and robot’s motion collision; 2) the full setting, which contains two robots to grasp and connect parts.
At each time step, we need to select which two parts should be connected and through which two connection points. Denote the two selected part ids as and . The parts and have their own connection id lists and . Denote the two selected connection ids as and . The connection point represented by on part is connected to the point represented by on part .
Besides the above four parameters which are two selected part ids and two selected connection ids, we also need to decide how to move these two and to be connected in the simulation.
In the object-centric setting, we move the from its current pose to the target pose connecting to . As illustrated in Fig. 4, because of the ground, not all configurations of the group of leave enough space for to be connected. Thus, we provide an extra action denoted as to indicate how to rotate the part . . The group of can therefore reselect its orientation through action before path planning. With a correct selection of , there is a collision-free path to move the part from its current pose to the relative pose at which it is merged with part . Therefore, the action space in the object-centric setup has five dimensions , which are the two selected parts, the two selected connection ids, and the pose id of the second selected part during mating. Once the two parts are attached, the relative pose between them is fixed, and they are free to move as a group under gravity.
In the full setting, after we have selected the two parts and , we need to select the grasp approach directions, denoted as . All parts can be grasped by the two robots. After grasping a part, a robot can change the part’s absolute pose in the world frame. Therefore, the action space in the full setting has six dimensions , which are the two selected parts, the two selected connection ids, and the two grasp approach directions. Once the two parts are attached, the relative pose between them is fixed, and they are moved as a group by one robot. The merged group is then placed in a specific region.
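Because every action component is discrete, a value-based learner can treat the five-dimensional object-centric action as a single flat index. The sketch below shows one possible encoding; the sizes of 20 parts and 10 connection points follow the maxima stated in Sec. V-A, whereas `N_POSES` and both function names are hypothetical choices made only for illustration.

```python
import numpy as np

# Assumed component sizes: parts and connection ids use the stated maxima,
# N_POSES (number of candidate pose ids) is an illustrative value.
MAX_PARTS, MAX_CONN, N_POSES = 20, 10, 6
OC_SHAPE = (MAX_PARTS, MAX_PARTS, MAX_CONN, MAX_CONN, N_POSES)

def encode_action(part_a, part_b, conn_a, conn_b, pose_id):
    """Map the five discrete action components to one flat action index."""
    return int(np.ravel_multi_index((part_a, part_b, conn_a, conn_b, pose_id), OC_SHAPE))

def decode_action(index):
    """Invert encode_action: flat index back to the five components."""
    return tuple(int(v) for v in np.unravel_index(index, OC_SHAPE))
```

The full setting would use the same idea with a six-component shape in which the pose id is replaced by two grasp approach directions.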
IV-C Step Function and Rewards
Based on the current state and the selected action , the step function determines the next state and the reward that is returned. If two parts are successfully merged, the agent receives a reward of one point. Otherwise, the agent receives zero reward and the episode terminates. If the chair is fully assembled, the agent receives an extra bonus reward of four points. Next, we discuss the detailed processes in our simulation environment. One challenge in the step function is how to deal with symmetric parts, such as the four legs shown in Fig. 1. Note that for such symmetric parts, the local connection annotations and grasping region annotations are all the same. After selecting two parts and , we gather all parts that are geometrically equivalent to , denoted as . Note that the action in the object-centric setting is where two selected part ids and are two selected connection ids. We verify whether the two parts can be connected according to the two selected part ids and connection ids. We search the ground-truth assembled chair to find the part, denoted as , that is connected to the part through the selected connection id . Similarly, we find the part, denoted as , that is connected to the part through the selected connection id . If is also in the and the selected part is actually , the two selected parts and two connection points are valid, without yet considering physical interaction constraints.
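The reward scheme and the symmetry-aware validity check can be sketched as below. The data structures (`gt_neighbor`, `equivalents`) and function names are hypothetical, the check is one reading of the procedure described above, and ending the episode once the chair is complete is an assumption.

```python
def step_reward(merge_succeeded, fully_assembled):
    """Return (reward, done): 1 per successful merge, +4 bonus on completion,
    0 and termination on a failed merge."""
    if not merge_succeeded:
        return 0.0, True
    if fully_assembled:
        return 1.0 + 4.0, True  # assumed: the episode ends when the chair is complete
    return 1.0, False

def is_valid_selection(part_a, conn_a, part_b, conn_b, gt_neighbor, equivalents):
    """Check a selection against the ground-truth chair, up to part symmetry.

    gt_neighbor[(part, conn)] -> id of the part attached at that connection point
    in the ground-truth assembly.
    equivalents[part] -> set of part ids geometrically equivalent to `part`,
    including `part` itself.
    """
    b_expected = gt_neighbor.get((part_a, conn_a))
    a_expected = gt_neighbor.get((part_b, conn_b))
    if b_expected is None or a_expected is None:
        return False
    return b_expected in equivalents[part_b] and a_expected in equivalents[part_a]
```

With this check, choosing any one of four interchangeable legs for a given seat connection point is accepted, while the later motion-planning query still decides whether the merge is physically feasible.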
In the object-centric setting, the selected part would be set to the pose decided by the fifth action . We run the motion-planning module to check whether there is a collision-free path so that the part could be attached to the part .
In the full setting, denote our two robots as and . The remaining two actions represent the grasping directions for the robots and , respectively. We first use the motion-planning module to query whether there are feasible collision-free paths to grasp the two parts along the selected grasp approach directions. If there are no such paths, the step function returns false and a reward of zero. For example, a chair leg connected to a seat cannot be grasped from the seat side because the seat blocks every approach direction toward the leg. If there are feasible paths, the two robots grasp the two parts and in the simulation. Thereafter, the two robots move to two random collision-free positions by moving their mobile bases while keeping the configurations of their other joints unchanged. We then use the motion-planning module to find a feasible collision-free path that moves the robot to mate the part with the part grasped by the robot . If there is no feasible path, the step function returns false and a reward of zero. In the example illustrated in Fig. 4, the vertical bar on the back cannot be moved to its target pose without colliding with either the horizontal bar above or the seat. If there are feasible paths, the robots execute them in the simulation. Then the robot releases its gripper. We use the motion-planning module to find a path that moves the newly merged part grasped by the robot and places it on a stand, as shown in the video. After placing the merged part, the robots and move to new random positions. The step function then finishes, and a reward of one is returned.
V Technical Approach
In this section, we describe how to train a unified multi-task model that generates actions to assemble different chairs. We first introduce the process of learning a single-task policy to assemble one chair and then describe how we develop the multi-task model.
V-A Single-task Policy
Consider the process of assembling a chair with the total parts. At each time step, the state is . As described in Sec. IV-B, the actions in the object-centric setup and the full setup are and , respectively. In both setups, the actions are discrete. Thus, we adopt the Double DQN algorithm (DDQN) [31] to learn the policy.
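For reference, the Double DQN bootstrap target can be sketched as follows. This is the standard formulation in which the online network selects the next action and the target network evaluates it; the array-based interface and the discount value are illustrative assumptions, not our training code.

```python
import numpy as np

def ddqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Compute Double DQN targets for a batch of transitions.

    rewards, dones: (B,) arrays; dones is 1.0 where the episode terminated.
    q_online_next, q_target_next: (B, A) Q-values of the next state from the
    online and target networks.
    """
    best = np.argmax(q_online_next, axis=1)                 # action chosen by the online network
    next_value = q_target_next[np.arange(len(best)), best]  # evaluated by the target network
    return rewards + gamma * (1.0 - dones) * next_value
```

Decoupling action selection from evaluation in this way is what reduces the over-estimation bias of plain Q-learning.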
We first train a model that extracts geometry features from point clouds. The model is an auto-encoder based on PointNet [23], and we adopt the Chamfer distance loss [4] to optimize its weights. To train the auto-encoder, we create a large set of point clouds sampled from all chairs in PartNet [20]. We then freeze the model’s weights and use it as a feature extractor.
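A minimal NumPy sketch of a Chamfer distance between two point clouds is given below. Several variants exist (squared or unsquared distances, sums or means); this one uses mean squared nearest-neighbour distances in both directions and is not necessarily the exact form used for training.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (n, 3) and q (m, 3)."""
    # Pairwise squared distances, shape (n, m).
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    # Mean distance from each point to its nearest neighbour in the other cloud.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

In an auto-encoder, `p` would be the input point cloud and `q` its reconstruction; a differentiable version would replace NumPy with an autodiff library.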
We sample a point cloud of each part at its beginning pose denoted as . Then we feed the point cloud into the above model to extract a geometry feature for each part denoted as .
After gathering the vector , we directly concatenate these vectors with the remaining items and the part connection matrix . Note that all items are reshaped into 1-D vectors before the concatenation.
Note that different chairs may have different numbers of parts, and different parts may have different numbers of connection points. We make two assumptions: a chair has at most 20 parts, and a part has at most 10 connection points. Both assumptions hold within our dataset, and we consider them reasonable in general. We zero-pad the concatenated vector if a chair has fewer than 20 parts or a part has fewer than 10 connection points. The concatenated vector is then fed into multiple multilayer perceptrons (MLPs) to extract a global feature. The feature is decoded to generate the Q values associated with the different actions.
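The zero-padding and flattening step can be sketched as follows. For brevity the sketch includes only the per-part geometry features, the 9-dim connection descriptors, and the connection status tensor, and omits the absolute poses and graspable regions; the function name and argument layout are hypothetical.

```python
import numpy as np

MAX_PARTS, MAX_CONN = 20, 10  # maxima stated in the text

def pad_state(part_features, conn_info, status):
    """Build a fixed-length state vector by zero-padding.

    part_features: (N, F) geometry features, one row per part.
    conn_info: list of N arrays, each (C_i, 9) with C_i <= MAX_CONN.
    status: (N, N, 6) connection status tensor.
    """
    n, f = part_features.shape
    feats = np.zeros((MAX_PARTS, f))
    feats[:n] = part_features
    conns = np.zeros((MAX_PARTS, MAX_CONN, 9))
    for i, c in enumerate(conn_info):
        conns[i, :len(c)] = c
    stat = np.zeros((MAX_PARTS, MAX_PARTS, 6))
    stat[:n, :n] = status
    # Every item is flattened to 1-D before concatenation.
    return np.concatenate([feats.ravel(), conns.ravel(), stat.ravel()])
```

Because the output length depends only on the maxima and the feature size, the same MLP input layer can serve chairs with different numbers of parts.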
V-B Multi-task Policy
We denote these chairs in the training set as . After training single-task policies, we now have expert models for each chair . We then use these expert models to train a multi-task model through imitation learning. Denote the Q-function that guides the assembly of a chair as . Note that for discrete actions, the is a vector whose size equals the size of the action space.
We aim to learn a Q-function denoted as that generates Q values when various chairs are assembled. For each chair in the training set, we generate large amounts of training pairs with state as the inputs and the corresponding Q values as the ground truth annotation. We also apply data-augmentation methods such as adding noise to the point clouds of the data we collected from successful single-task policies.
We then adopt two loss functions to guide the weight updates. The first loss, shown as in Eqn. V-B, is a mean squared error between the predicted Q values and the ground-truth Q values. To successfully assemble the parts into a chair, our model selects at each time step the action associated with the highest Q value. We therefore adopt a second loss that updates the weights such that the ground-truth action index has the highest Q value predicted by our model.
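As an illustration of the two losses described above, one plausible instantiation is sketched here. The notation is assumed for this sketch only: $Q_{\theta}$ is the multi-task network, $Q^{E}$ the expert Q-vector for the chair being assembled, $a^{*} = \arg\max_{a} Q^{E}(s, a)$ the expert action, $\mathcal{A}$ the discrete action set, and $\lambda$ a weighting factor. The second term is written as a cross-entropy over the predicted Q values, which is one common way to push the ground-truth action to the top; the exact form used in training may differ.

```latex
\begin{align}
\mathcal{L}_{1} &= \frac{1}{|\mathcal{A}|} \sum_{a \in \mathcal{A}}
    \bigl( Q_{\theta}(s, a) - Q^{E}(s, a) \bigr)^{2}, \\
\mathcal{L}_{2} &= -\log
    \frac{\exp Q_{\theta}(s, a^{*})}{\sum_{a \in \mathcal{A}} \exp Q_{\theta}(s, a)}, \\
\mathcal{L} &= \mathcal{L}_{1} + \lambda \, \mathcal{L}_{2}.
\end{align}
```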
In this work, we develop a framework for robots to learn to assemble diverse chairs. Our experiments evaluate how effective our proposed learning-to-assemble approach is compared with other baselines.
VI-A Baseline: Motion Planning
For the baseline, we adopt a motion planning (MP) algorithm to find a solution path to assemble each chair. The dimension of the state is , where is the total number of the parts from the chair . Note that we have access to the ground-truth 6D pose of each part when the chair is successfully assembled. Together, these 6D poses form the goal state in our baseline. In the beginning, all parts are positioned at their initial poses as in our object-centric setting. We run the RRT-Connect [12] algorithm to find a collision-free path in the state space that moves all parts from their initial 6D poses to their goal poses. The maximum number of search operations in the state space is 100k for each chair. If no solution is found after checking 100k states in the search space, the assembly process is considered a failure. Otherwise, we record the total number of planning steps for the successful assembly process.
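To make the baseline concrete, the following is a from-scratch toy version of RRT-Connect with a budget on the number of checked states. It is not the OMPL implementation used in our experiments: it plans in a generic Euclidean space, checks only nodes rather than edges for collisions, and uses a linear-time nearest-neighbour search. The callbacks `is_free` and `sample` and all parameter names are assumptions of the sketch.

```python
import math
import random

def rrt_connect(start, goal, is_free, sample, step=0.05, max_states=100_000, seed=0):
    """Bidirectional RRT-Connect. Returns (path or None, number of states checked)."""
    rng = random.Random(seed)
    tree_a = {tuple(start): None}  # node -> parent; rooted at start initially
    tree_b = {tuple(goal): None}
    checked = 0

    def steer(src, dst):
        d = math.dist(src, dst)
        if d <= step:
            return tuple(dst)
        return tuple(s + step * (t - s) / d for s, t in zip(src, dst))

    def extend(tree, target):
        nonlocal checked
        near = min(tree, key=lambda node: math.dist(node, target))
        new = steer(near, target)
        checked += 1  # every attempted state counts as one planning step
        if new in tree or not is_free(new):
            return None
        tree[new] = near
        return new

    def branch(tree, node):
        out = []
        while node is not None:
            out.append(node)
            node = tree[node]
        return out

    a_is_start = True
    while checked < max_states:
        new = extend(tree_a, sample(rng))
        if new is not None:
            last = None
            # "Connect": keep growing the other tree toward the new node.
            while checked < max_states:
                last = extend(tree_b, new)
                if last is None or last == new:
                    break
            if last == new:
                path = branch(tree_a, new)[::-1] + branch(tree_b, new)[1:]
                return (path if a_is_start else path[::-1]), checked
        tree_a, tree_b = tree_b, tree_a
        a_is_start = not a_is_start
    return None, checked
```

In the baseline described above, a state would stack the 6D poses of all parts and `is_free` would call the simulator’s collision checker; the returned count corresponds to the planning-steps metric.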
VI-B Evaluation Metric
We adopt two different evaluation metrics, which are the success rate and planning steps.
Success Rate refers to the proportion of chairs that are completely assembled among all chairs.
Planning Steps counts the total number of attempted states in the search space during the entire planning process for assembling a chair. It reflects the computation speed of the planning process: the fewer the planning steps, the faster the module finds a feasible path.
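Both metrics can be computed from per-chair outcomes as in the short sketch below. Averaging the planning steps over successful assemblies only is an assumption of the sketch, and the function name `summarize` is hypothetical.

```python
def summarize(results):
    """Aggregate per-chair outcomes.

    results: list of (succeeded, planning_steps) tuples, one per chair.
    Returns (success rate in percent, mean planning steps over successes or None).
    """
    n = len(results)
    successes = [steps for ok, steps in results if ok]
    success_rate = 100.0 * len(successes) / n if n else 0.0
    mean_steps = sum(successes) / len(successes) if successes else None
    return success_rate, mean_steps
```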
VI-C Experiment Results and Analysis
We split the training dataset into the easy training set and the hard training set, as described in Sec. III-B. We train one single-task policy for each chair in the training set under both the object-centric and the full setting. The maximum number of training steps is set to 40 thousand for both the easy and hard training sets. We then evaluate each trained model on the chair it was trained on and report whether it can complete the assembly process. The results are summarized in Table I. Under the object-centric setting, our approach achieves a success rate of 79.4% on the easy training set and 66.4% on the hard training set. After training single-task policies, we train a multi-task policy under the object-centric setup. In this process, we adopt 640 successful single-task policies after data augmentation and test the multi-task policy on 192 unseen chairs. The agent achieves a 74.5% success rate under the object-centric setting, while the baseline succeeds on only 18.8% of the same set of chairs. We observe that our multi-task policy generalizes to unseen chairs and outperforms the baseline. For successfully assembled chairs, our method plans a series of paths within two minutes on average, while the baseline requires three times longer. We also compare the planning steps that the trained multi-task object-centric model and the baseline need to assemble an unseen chair. The agent under the object-centric setting assembles a chair within 37.8k steps on average, compared with more than 90k steps for the baseline. This indicates that applying our model to an unseen chair has a higher probability of success and requires far fewer planning steps.
For the full setting, our approach achieves a success rate of 59.3% on the easy training set and 25.8% on the hard training set. Thereafter, a multi-task policy is trained using 84 successful single-task policies and then tested on 16 unseen chairs. Our approach under the full setting achieves a success rate of 50.0%, which shows that our model can generalize to unseen chairs. Videos of the assembly processes are available on our project webpage: https://sites.google.com/view/roboticassembly.
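| Method | Dataset | Success rate (%) | Planning steps |
|---|---|---|---|
| Ours (OC) | Easy train | 79.4 | - |
| Ours (OC) | Hard train | 66.4 | - |
| Ours (F) | Easy train | 59.3 | - |
| Ours (F) | Hard train | 25.8 | - |
| Ours (OC) | Test set | 74.5 | 37.8k |
| Ours (F) | Test set | 50.0 | - |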
We develop a robotic assembly simulation environment that supports multi-robots. Based on our simulation environment, we formulate the assembly problem as a concrete reinforcement learning problem and develop a deep reinforcement learning model to successfully assemble a diverse set of chairs. Experiments indicate that when testing with unseen chairs, our approach achieves a success rate of 74.5% under our object-centric setting and 50% under our full setting. Our approach outperforms the RRT-Connect baseline by a large margin regarding the success rate and the computation speed.
References
[1] (2020) Learning dexterous in-hand manipulation. The International Journal of Robotics Research 39 (1), pp. 3–20.
[2] (2021) Learning to regrasp by learning to place. In 5th Annual Conference on Robot Learning.
[3] PyBullet, a Python module for physics simulation for games, robotics and machine learning. Note: http://pybullet.org
[4] A point set generation network for 3D object reconstruction from a single image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 605–613.
[5] (2021) Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems 4, pp. 265–293.
[6] (1998) A general framework for assembly planning: the motion space approach. In Proceedings of the Fourteenth Annual Symposium on Computational Geometry, SCG ’98, New York, NY, USA, pp. 9–18.
[7] (2021) Long-horizon multi-robot rearrangement planning for construction assembly. arXiv preprint arXiv:2106.02489.
[8] (2020) Generative 3D part assembly via dynamic graph learning. Neural Information Processing Systems.
[9] (2020) Generative 3D part assembly via dynamic graph learning. In The IEEE Conference on Neural Information Processing Systems (NeurIPS).
[10] (2020) RLBench: the robot learning benchmark & learning environment. IEEE Robotics and Automation Letters 5 (2), pp. 3019–3026.
[11] (1995) Partitioning a planar assembly into two connected parts is NP-complete. Information Processing Letters 55 (3), pp. 159–165.
[12] (2000) RRT-Connect: an efficient approach to single-query path planning. In Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065), Vol. 2, pp. 995–1001.
[13] (1992) Backward assembly planning with assembly cost analysis. Proceedings 1992 IEEE International Conference on Robotics and Automation, Vol. 3, pp. 2382–2391.
[14] (2021) IKEA furniture assembly environment for long-horizon complex manipulation tasks. In IEEE International Conference on Robotics and Automation (ICRA).
[15] (2020) Learning 3D part assembly from a single image. In European Conference on Computer Vision, pp. 664–682.
[16] (2019) Reinforcement learning on variable impedance controller for high-precision robotic assembly. In 2019 International Conference on Robotics and Automation (ICRA), pp. 3080–3087.
[17] (2017) Learning deep policies for robot bin picking by simulating robust grasping sequences. In Proceedings of the 1st Annual Conference on Robot Learning, S. Levine, V. Vanhoucke, and K. Goldberg (Eds.), Proceedings of Machine Learning Research, Vol. 78, pp. 515–524.
[18] Dex-Net 2.0: deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics.
[19] (2004) GraspIt! a versatile simulator for robotic grasping. IEEE Robotics & Automation Magazine 11 (4), pp. 110–122.
[20] (2019) PartNet: a large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 909–918.
[21] (1988) On planning assemblies. In Proceedings of the Fourth Annual Symposium on Computational Geometry, SCG ’88, New York, NY, USA, pp. 299–308.
[22] (2012) FCL: a general purpose library for collision and proximity queries. In 2012 IEEE International Conference on Robotics and Automation, pp. 3859–3866.
[23] (2017) PointNet: deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660.
[24] (2017) Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. arXiv preprint arXiv:1709.10087.
[25] (2020) UniGrasp: learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5 (2), pp. 2286–2293.
[26] (2020) Learning to scaffold the development of robotic manipulation skills. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 5671–5677.
[27] (2018) Can robots assemble an IKEA chair?. Science Robotics 3 (17), pp. eaat6385.
[28] (2012) The Open Motion Planning Library. IEEE Robotics & Automation Magazine 19 (4), pp. 72–82. Note: https://ompl.kavrakilab.org
[29] (2018) Learning robotic assembly from CAD. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 3524–3531.
[30] (2012) MuJoCo: a physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033.
[31] (2016) Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 30.
[32] (2016) A scalable active framework for region annotation in 3D shape collections. ACM Transactions on Graphics (ToG) 35 (6), pp. 1–12.
[33] (2020) Meta-World: a benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on Robot Learning, pp. 1094–1100.
[34] (2020) Design and control of roller grasper v2 for in-hand manipulation. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).