Assembly is an essential but highly challenging area of manufacturing. It includes a diverse range of operations, from peg insertion, electrical connector insertion, and threaded fastener mating (e.g., tightening nuts and bolts), to wire processing, cable routing, and soldering [9, 42]. These operations are ubiquitous across the automotive, aerospace, electronics, and medical industries. However, assembly has been exceptionally difficult to automate due to physical complexity, part variability, and strict reliability requirements.
In industry, robotic assembly methods may achieve high precision, accuracy, and reliability [9, 52, 98]. However, these methods can be highly restrictive. They often use expensive equipment, require custom fixtures, have high setup times (e.g., tooling design, waypoint definition, parameter tuning) and cycle times, and are sensitive to variation (e.g., part appearance, location). Custom tooling and part-specific engineering are also cost-prohibitive for high-mix, low-volume settings. In research, methods for robotic assembly often use less-expensive equipment, require fewer custom fixtures, achieve increased robustness to variation, and may recover from failure [35, 56, 90]. Nevertheless, these methods often have lower reliability, higher setup times (e.g., demo collection, real-world training, parameter tuning), and/or higher cycle times.
Meanwhile, physics simulation has become a powerful tool for robotics development. Simulators have primarily been used to verify and validate robot designs and algorithms. Recent research has demonstrated a host of other applications: creating training environments for virtual robots [20, 50, 91, 105], generating large-scale grasping datasets [21, 33], inferring real-world material parameters [70, 71], simulating tactile sensors [73, 84, 100], and training reinforcement learning (RL) agents for manipulation and locomotion [13, 77]. Compelling works have now shown that RL policies trained in simulation can be transferred to the real world [2, 4, 12, 27, 62, 81, 103].
Nevertheless, the power of physics simulation has not substantially impacted robotic assembly. For assembly, a simulator must accurately and efficiently simulate contact-rich interactions, a longstanding challenge in robotics [14, 31, 53, 109], particularly for geometrically-complex, tight-clearance bodies. For instance, consider the canonical nut-and-bolt assembly task. Real-world nuts and bolts have finite clearances between their threads, thus experiencing -DOF relative motion rather than pure helical motion. To simulate real-world motion phases (e.g., initial mating, rundown) and associated pathologies (e.g., cross-threading, jamming) , collisions between the threads must be simulated. However, high-quality surface meshes for a nut-and-bolt may consist of triangles; a naive collision scheme may easily exceed memory and compute limits. Moreover, for RL training, a numerical solver may need to satisfy non-penetration constraints for environments in real-time (i.e., at the same rate as the underlying physical dynamics). Despite the omnipresence of threaded fasteners in the world, no existing simulator achieves this performance.
In this work, we present Factory, a set of physics simulation methods and robot learning tools for such interactions (Fig. 1). Specifically, we contribute the following:
A physics simulation module for fast, accurate simulations of contact-rich interactions through a novel synthesis of signed distance function (SDF)-based collisions, contact reduction, and a Gauss-Seidel solver. The module is accessible within the PhysX physics engine and Isaac Gym. We demonstrate simulator performance on a wide range of challenging scenes. As an example, we simulate simultaneous nut-and-bolt assemblies in real-time on a single GPU, whereas the prior state-of-the-art was a single nut-and-bolt assembly at real-time.
A robot learning suite consisting of a Franka robot and all rigid-body assemblies from the NIST Assembly Task Board, the established benchmark for robotic assembly. The suite includes carefully-designed assets, robotic assembly environments, and classical robot controllers. The suite is accessible within Isaac Gym. User-defined assets, environments, and controllers can be added and simulated as desired.
Proof-of-concept RL policies for a simulated Franka robot to solve the most contact-rich task on the NIST board, nut-and-bolt assembly. The policies are trained and tested in Isaac Gym. The contact forces generated during policy execution are compared to literature values from the real world and show strong consistency.
We aim for Factory to greatly accelerate research and development in robotic assembly, as well as serve as a powerful tool for contact-rich simulation of any kind.
II Related Work
II-A Contact-Rich Simulation
A longstanding challenge in contact-rich simulation is fast, accurate, and robust contact generation, as well as solution of non-penetration constraints. We have found that achieving such performance requires careful consideration of 1) geometric representations, 2) contact reduction schemes, and 3) numerical solvers. Here we review primary options for each, as well as prior results on a challenging benchmark.
II-A1 Geometric Representations
There are 5 major geometric representations in physics simulation for graphics: convex hulls, convex decompositions, triangular meshes (trimeshes), tetrahedral meshes (tetmeshes), and SDFs (Fig. S10).
Convex hulls cannot accurately approximate complex object geometries, such as threaded fasteners with concavities. Convex decompositions address this issue by approximating the input shape using multiple convex hulls, generated with algorithms such as V-HACD . While an improvement on single convex hulls, these decompositions can produce spatial artifacts on complex geometries (Fig. S11). Even for perfect decompositions, the number of collision pairs to test during contact generation scales as (where is the number of convex shapes), impacting memory and performance. Since contacts are generated between convex shapes, undesirable contact normals can be generated, and snagging may occur.
Trimeshes can provide a near-exact approximation of complex geometries. However, the number of collision pairs for contact generation scales as (where is the number of triangles), again impacting memory and performance. In addition, since triangles have zero volume, penetrations can be difficult to resolve, motivating techniques such as boundary-layer expanded meshes . Tetmeshes can mitigate such penetration issues, but high quality tetrahedral meshing is challenging. Tetrahedra may have extreme aspect ratios in high-detail areas, leading to inaccurate collision checks.
SDFs, which map points to distance-to-a-surface, can provide accurate implicit representations of complex geometries. They enable fast lookups of distances, as well as efficient computation of gradients to define contact normals. However, using SDFs for collisions requires precomputing SDFs offline from a mesh, which can be time- and memory-intensive. Moreover, collision schemes typically test the vertices of a trimesh against the SDF to generate contacts. For sharp objects, simply sampling vertices can cause penetration to occur, motivating iterative per-triangle contact generation .
We use discrete, voxel-based SDFs as our geometric representation and demonstrate that they provide efficient, robust collision detection for challenging assets in robotic assembly.
II-A2 Contact Reduction
Contact reduction is a powerful technique from game physics for reducing the total number of contacts without compromising simulation accuracy. A naive contact generation scheme between a nut and bolt may generate contacts; a careful one may generate
. Excessive contacts can impact both memory and stability, and per-contact memory requirements cannot easily be reduced. Since the rigid-body mechanics principle of equivalent systems dictates that a set of distributed forces can be replaced by a smaller set of forces (constrained to produce the same net force and moment), accurate dynamics can be preserved.
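The principle of equivalent systems mentioned above can be checked numerically. The following is a minimal sketch, assuming a hypothetical square contact patch with four equal normal forces; the helper `net_wrench` and all values are illustrative and not taken from the paper.

```python
import numpy as np

def net_wrench(points, forces, origin=None):
    """Return the net force and the net moment (about `origin`) of point forces."""
    points = np.asarray(points, dtype=float)
    forces = np.asarray(forces, dtype=float)
    origin = np.zeros(3) if origin is None else np.asarray(origin, dtype=float)
    net_force = forces.sum(axis=0)
    net_moment = np.cross(points - origin, forces).sum(axis=0)
    return net_force, net_moment

# Hypothetical full contact set: four equal normal forces at the corners of a square.
full_points = np.array([[1, 1, 0], [1, -1, 0], [-1, 1, 0], [-1, -1, 0]], dtype=float)
full_forces = np.tile([0.0, 0.0, 2.5], (4, 1))

# Hypothetical reduced set: a single force at the centroid of the square.
reduced_points = np.array([[0.0, 0.0, 0.0]])
reduced_forces = np.array([[0.0, 0.0, 10.0]])

full_f, full_m = net_wrench(full_points, full_forces)
red_f, red_m = net_wrench(reduced_points, reduced_forces)
same_wrench = np.allclose(full_f, red_f) and np.allclose(full_m, red_m)
```

Because both sets produce the same net force and moment, a rigid body responds identically to either one, which is why culling contacts does not have to change the dynamics.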
Specifically, contact reduction consists of methods for preprocessing, clustering, and maintaining temporal persistence of contacts. We pay particular attention to contact clustering, which generates bins for contacts, reduces the bins, and reduces the contacts within each bin using heuristics. The two most common heuristics are normal similarity and penetration depth. Normal similarity assigns contacts with similar surface normals to the same bin, culls similar bins, and culls similar contacts; it is often implemented with k-means or cube map algorithms [76, 80, 26]. Penetration depth culls bins and contacts with negligible penetration, as these often have minimal impact on dynamics. Data-driven methods train networks to perform the reduction, but require a separate data collection and learning phase to be effective.
We combine normal similarity, penetration depth, and an area-based metric to reduce contacts and demonstrate the resulting dynamics across numerous evaluation scenes.
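To make the two common heuristics concrete, here is a minimal sketch of cube-map binning by normal direction combined with penetration-depth culling. It is a generic illustration and not the reduction scheme used in this work; the function names, the `min_depth` threshold, and the sample contacts are assumptions.

```python
import numpy as np

def cube_map_bin(normal):
    """Index (0..5) of the cube face a normal points toward: +x, -x, +y, -y, +z, -z."""
    axis = int(np.argmax(np.abs(normal)))
    return 2 * axis + (0 if normal[axis] >= 0 else 1)

def reduce_contacts(normals, depths, min_depth=1e-4):
    """Keep the deepest contact in each cube-map bin; drop negligible penetrations.

    `depths` are positive penetration depths (an assumed sign convention).
    """
    best = {}
    for i, (n, d) in enumerate(zip(normals, depths)):
        if d < min_depth:            # penetration-depth culling
            continue
        b = cube_map_bin(n)          # normal-similarity binning
        if b not in best or d > depths[best[b]]:
            best[b] = i
    return sorted(best.values())

# Hypothetical contacts: two nearly parallel normals, one sideways, one negligible.
normals = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 0.99], [1.0, 0.0, 0.0], [0.0, 0.0, -1.0]])
depths = np.array([2e-3, 5e-3, 1e-3, 1e-5])
kept = reduce_contacts(normals, depths)
```

Here the two upward-facing contacts share a bin and only the deeper one survives, while the last contact is discarded for negligible penetration.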
II-A3 Numerical Solvers
There are 4 major options for solvers in physics simulation: direct, conjugate gradient/conjugate residual (CG/CR), Jacobi, and Gauss-Seidel. We briefly review these solution methods below; see  for a detailed treatment.
Direct solvers, which may rely on matrix pivoting, do not scale well to large sets of contact constraints, such as the constraints between a nut and bolt mesh. CG/CR methods can handle larger sets of constraints, but are still unable to achieve real-time performance for complex scenarios. Gauss-Seidel solvers are robust and converge quickly for well-conditioned problems, but perform serial operations that do not scale well to large sets of constraints. Jacobi solvers perform parallel operations that can leverage GPU acceleration, but converge slower than the aforementioned techniques.
A naive implementation of Jacobi can outperform Gauss-Seidel for simulating numerous contact-rich interactions. However, we show that contact reduction can greatly accelerate Gauss-Seidel, achieving better performance than Jacobi.
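The convergence trade-off between the two iterative schemes can be seen on a toy problem. The sketch below applies textbook Jacobi and Gauss-Seidel iterations to a small, diagonally dominant linear system; it is an assumption-laden illustration only, since a contact solver must additionally handle inequality (non-penetration) constraints, and the matrix here is made up.

```python
import numpy as np

def jacobi(A, b, iters):
    """Jacobi: every unknown is updated from the previous iterate (parallel-friendly)."""
    x = np.zeros_like(b)
    D = np.diag(A)
    R = A - np.diagflat(D)
    for _ in range(iters):
        x = (b - R @ x) / D
    return x

def gauss_seidel(A, b, iters):
    """Gauss-Seidel: each unknown immediately uses the newest values (serial)."""
    x = np.zeros_like(b)
    for _ in range(iters):
        for i in range(len(b)):
            sigma = A[i] @ x - A[i, i] * x[i]
            x[i] = (b[i] - sigma) / A[i, i]
    return x

# Hypothetical well-conditioned system.
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
exact = np.linalg.solve(A, b)

err_jacobi = np.linalg.norm(jacobi(A, b, 5) - exact)
err_gauss_seidel = np.linalg.norm(gauss_seidel(A, b, 5) - exact)
```

For the same iteration count, Gauss-Seidel lands closer to the exact solution on this system, but its inner loop is inherently sequential, whereas each Jacobi update is a single vectorized operation.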
II-A4 Benchmark Problem
We pay special attention to the problem of accurately and efficiently simulating a nut-and-bolt assembly (Fig. 2). In the simulation community, this problem has emerged as a canonical challenge in contact-rich simulation. Moreover, in robotic assembly, tightening nuts onto bolts is critically important, as of all mechanical assembly operations involve screws and bolts .
The 3D finite element method (FEM) is the gold standard for accurate simulation of nut-and-bolt models, capturing deformation phenomena such as pretension (). However, FEM simulation of a single nut-bolt pair may take to execute on CPU. As our aim is real-time (or faster) simulation to enable learning methods such as RL, we focus on rigid-body simulation prevalent in graphics and robotics.
In , rigid-body contact was simulated using semi-implicit integration, symbolic Gaussian elimination, analytical contact gradients, and an SVD solver. A dynamic scene was simulated of screwing a hex bolt into a threaded hole at real-time ( for ). The proposed method was compared against Bullet (2012) , and a speed-up was measured for a stable quasi-static version of the scene. In , rigid-body contact was simulated using a smoothed particle hydrodynamics solver that samples rigid surfaces with particles, models contacts as density deviations, and computes pressure-based contact forces implicitly. A quasi-static nut-and-bolt scene was simulated at real-time ( for ). The proposed method was also compared against Bullet (2018), and a speed-up was measured for a stable scene.
In , rigid-body contact was simulated using an extension of incremental potential contact (IPC)  and continuous collision detection (CCD) for curved trajectories. A dynamic, frictionless nut-and-bolt scene was simulated at real-time ( for ). The proposed method was compared against Bullet (2019), MuJoCo , Chrono , and Houdini RBD . Instabilities were observed with Bullet and MuJoCo, and interpenetrations were observed with Chrono and Houdini. In , rigid-body contact was simulated using a configuration-space contact model. A single large M48 ( diameter) nut-and-bolt was simulated at real-time; however, data and evaluations were limited. Informally, an industry group simulated a dynamic nut-and-bolt scene at real-time with variational integrators [10, 45].
From the previous works, we define the established state-of-the-art for a general-purpose physics simulator that can simulate a nut-and-bolt assembly to be real-time for a single nut-bolt pair. Although fast, such speeds are not optimal for applications such as RL, which benefit from simulation of contact-rich interactions in real-time. In this work, we demonstrate real-time simulation of nuts-and-bolts, as well as exceptional speeds on other challenging scenes.
II-B Robotic Assembly Simulation
Here we review previous efforts to simulate robotic assembly, as well as efforts to use these simulators and/or real-world training for RL. For another recent review, see .
Research efforts such as [106, 25, 23] have demonstrated advances in contact-rich simulation, often in comparison to physics engines used in robotics. However, the majority of robotics studies use PyBullet, MuJoCo, or Isaac Gym, as they provide robot importers, simulated sensors, parallel simulation, and learning-friendly APIs. Consequently, existing work in simulation for robotic assembly is largely limited to the performance of these simulators.
Efforts using MuJoCo or the robosuite extension  include [19, 22, 29, 37, 38, 48, 82, 88, 96, 97, 99, 108, 110, 111]. Simulated rigid-body assembly tasks are limited to peg-in-hole insertion of large pegs with round, triangular, square, and prismatic cross-sections, lap-joint mating, and one non-convex insertion . Furthermore, clearances between rigid parts are typically , substantially greater than real-world clearances. Efforts using PyBullet include [5, 57, 83, 107]. Simulated tasks are again limited to large peg-in-hole insertion and lap-joint mating, with clearances of . A handful of research efforts have used other off-the-shelf simulators [7, 32, 47, 104, 111] and custom simulators [16, 58, 87]. Only  simulated a nut-and-bolt, which was exceptionally large ( diameter) with inherently-larger clearances.
From the previous works, we conclude that there have been few successful efforts to simulate assembly tasks with realistic scales, realistic clearances (e.g., diametral clearance for a peg), and complex geometries (e.g., nuts-and-bolts, electrical connectors) within a robotics simulator. In this work, we build a module for PhysX and Isaac Gym that can successfully simulate all rigid components on NIST Task Board  with accurate models and real-world clearances.
II-B2 Reinforcement Learning
The studies in the previous subsection, as well as a small number of studies that trained RL for robotic assembly purely in the real world, can be categorized according to their choice of RL algorithm.
Several earlier works used model-based RL algorithms or proposed variants, which explicitly predict environment response during policy learning and/or execution. These efforts leveraged guided policy search [55, 93] and iterative LQG . Such algorithms are sample-efficient, but difficult to apply to contact-rich tasks due to highly discontinuous and nonlinear dynamics and unknown material parameters .
Most recent works have used model-free, off-policy RL algorithms or variants, which do not predict environment response, and update an action-value function independently of the current policy (e.g., using a replay buffer). These studies have applied Q-learning , deep-Q networks , deep deterministic policy gradients (DDPG) [5, 58, 56, 97], soft-actor critic , probabilistic embeddings , and hierarchical RL . These algorithms are typically chosen for sample efficiency, but are often brittle and slow/unable to converge.
Several other studies use or develop off-policy RL algorithms that leverage human demonstrations, as well as motion planners and trajectory optimizers. These efforts have used residual learning from demonstration , guided DDPG , DDPG from demonstration (DDPGfD) [56, 57, 96], and inverse RL . Notably,  used DDPGfD to achieve state-of-the-art performance in the real world for insertion tasks from NIST Task Board . They used human demonstrations, real-world training, and human on-policy corrections, achieving a success rate over trials. Of course, these methods also require demonstration collection or effective planners/optimizers, and are typically limited to the performance of these structured priors.
Finally, a handful of research efforts have successfully used on-policy algorithms or variants. These studies have applied proximal policy optimization (PPO) [29, 87, 99], trust region policy optimization , asynchronous advantage actor-critic , and additional algorithms . These algorithms are typically stable, easy-to-use, and achieve high return, but are highly sample-inefficient and require long wall-clock time.
We take inspiration from the performance of , but aim to achieve such performance in a fundamentally different way. We build a module for contact-rich simulation that can help enable roboticists to perform more complicated tasks (e.g., nut-and-bolt assembly) with tight clearances; leverage performant and stable on-policy RL algorithms with high parallelization; avoid tedious human demonstrations and corrections; and mitigate the need for time-consuming (e.g., hours of data in ), costly, and dangerous real-world RL training.
III Contact-Rich Simulation Methods
In this work, we first build a module for PhysX for efficient and robust contact-rich simulation. Specifically, we uniquely combine SDF collisions, contact reduction, and a Gauss-Seidel solver, allowing us to simulate interactions of highly-detailed models substantially faster than previous efforts. We describe methods and results below.
III-A SDF Collisions
An SDF is a function $\phi(\mathbf{x}): \mathbb{R}^3 \rightarrow \mathbb{R}$ that maps a point $\mathbf{x}$ in Cartesian space to its Euclidean distance to a surface. The surface is implicitly defined by $\phi(\mathbf{x}) = 0$, and the sign indicates whether $\mathbf{x}$ is inside or outside the surface. The gradient $\nabla\phi(\mathbf{x})$ provides the normal at a point $\mathbf{x}$ on the surface. Collectively, the SDF value and gradient define a vector to push a colliding object out of the surface.
We generate the SDF for an object via sampling and compute gradients via finite-differencing. Specifically, given a triangle mesh representing the boundaries of the object, we generate an SDF for the mesh at initialization time and store it as a 3D texture, which enables GPU-accelerated texture fetching with trilinear filtering and extremely fast lookups. Since our shapes contain many small features, we typically use SDF resolutions of or greater.
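The lookup and gradient steps can be sketched on the CPU as follows. This is a minimal illustration, assuming an analytic sphere in place of a mesh-derived SDF and a NumPy array in place of a GPU 3D texture; the grid resolution, function names, and query point are hypothetical.

```python
import numpy as np

def make_sphere_sdf(resolution=64, radius=0.5, extent=1.0):
    """Sample a sphere SDF on a regular voxel grid (stand-in for a mesh-derived SDF)."""
    axis = np.linspace(-extent, extent, resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.sqrt(x**2 + y**2 + z**2) - radius, axis

def sample_sdf(grid, axis, p):
    """Trilinearly interpolate the voxel grid at point p (clamped to the grid)."""
    n = len(axis)
    h = axis[1] - axis[0]
    u = np.clip((np.asarray(p, dtype=float) - axis[0]) / h, 0.0, n - 1 - 1e-9)
    i0 = u.astype(int)
    t = u - i0
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[0] if dx else 1 - t[0]) *
                     (t[1] if dy else 1 - t[1]) *
                     (t[2] if dz else 1 - t[2]))
                value += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return value

def sdf_gradient(grid, axis, p):
    """Central finite differences of the interpolated SDF, one voxel wide."""
    eps = axis[1] - axis[0]
    p = np.asarray(p, dtype=float)
    g = np.zeros(3)
    for k in range(3):
        d = np.zeros(3)
        d[k] = eps
        g[k] = (sample_sdf(grid, axis, p + d) - sample_sdf(grid, axis, p - d)) / (2 * eps)
    return g

grid, axis = make_sphere_sdf()
query = np.array([0.3, 0.0, 0.0])        # a point inside the sphere
phi = sample_sdf(grid, axis, query)       # negative value means penetration
normal = sdf_gradient(grid, axis, query)
normal /= np.linalg.norm(normal)
push_out = -phi * normal                  # vector that moves the point to the surface
```

On a GPU, the eight-corner interpolation loop is what hardware texture filtering performs in a single fetch, which is one reason to store the SDF as a 3D texture.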
To generate an initial set of contacts, we use the method of , which generates one contact per triangle-mesh face. The contact position on each face is determined by performing iterative local minimization to find the closest point on the face to the opposing shape, using projected gradient descent with adaptive stepping. As an example, detailed M4 nut-and-bolt meshes generate contacts; with the above method, these contacts can be generated in , orders of magnitude faster than typical approaches for convex or mesh collision.
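One way to picture the per-face minimization is as a search over barycentric coordinates. The sketch below is an assumed simplification and not the cited method: it uses an analytic sphere SDF, a standard Euclidean projection onto the simplex to stay on the triangle, and simple step halving as a stand-in for adaptive stepping. All names and values are hypothetical.

```python
import numpy as np

def project_to_simplex(w):
    """Euclidean projection of w onto {w >= 0, sum(w) = 1} (valid barycentric coords)."""
    u = np.sort(w)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(w) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(w - css[rho] / (rho + 1), 0.0)

def sphere_sdf(p, radius=1.0):
    return np.linalg.norm(p) - radius

def sphere_sdf_grad(p):
    return p / np.linalg.norm(p)

def closest_point_on_triangle(verts, sdf, sdf_grad, iters=100, step=0.25):
    """Minimize the SDF over a triangle with projected gradient descent."""
    w = np.full(3, 1.0 / 3.0)                  # start at the centroid
    best = sdf(w @ verts)
    for _ in range(iters):
        g = verts @ sdf_grad(w @ verts)         # chain rule: d(phi)/d(w)
        w_new = project_to_simplex(w - step * g)
        val = sdf(w_new @ verts)
        if val < best:
            w, best = w_new, val                # accept the step
        else:
            step *= 0.5                         # shrink the step and retry
    return w @ verts, best

# Hypothetical triangle hovering above a unit sphere centered at the origin.
verts = np.array([[0.5, -1.0, 1.2], [0.5, 2.0, 1.2], [-2.0, 0.0, 1.2]])
point, distance = closest_point_on_triangle(verts, sphere_sdf, sphere_sdf_grad)
```

Working in barycentric coordinates keeps every iterate on the face, so the result is a single candidate contact per triangle regardless of how the face is oriented relative to the opposing shape.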
III-B Contact Reduction and Solver Selection
To motivate our contact reduction methods, we begin with a brief discussion and first-order analysis of the tight coupling between contact generation and solver execution.
In a typical contact generation scheme, each contact only requires approximately floats (point, normal vector, and distance) and integers (rigid body indices), for a total of bytes. However, the memory required to store constraints associated with these contacts is substantially greater; in our implementation, storing a contact and its constraints requires approximately bytes. Thus, simulating an M4 nut-bolt pair with contacts requires approx. per timestep.
For a Jacobi solver with , we require substeps and iterations for stable simulation. Thus, memory bandwidth requirements for contacts are approximately per frame and per second. Using a state-of-the-art GPU, we can only simulate nut-and-bolt assemblies in parallel (Table V). Unfortunately, although a Gauss-Seidel solver converges faster (i.e., requires fewer substeps and iterations), it would be unreasonably slow for so many contacts due to its inherently serial nature. As reducing per-contact memory can be challenging, contact reduction becomes a compelling strategy for reducing memory and increasing parallelization.
If we can reduce the number of contacts to (i.e., from to ), the simulation now requires approximately per timestep. The Jacobi solver now only requires per frame and per second. We can now hypothetically simulate a maximum of nut-and-bolt assemblies in real-time. Solving contact constraints is no longer a performance bottleneck, and we can achieve a level of parallelization suitable for training on-policy RL algorithms.
In addition, given the far fewer contacts, it is now feasible to use a Gauss-Seidel solver. With , we require only substep and iterations for stable simulation. Thus, we can now comfortably simulate nut-and-bolt assemblies in real-time. We demonstrate exactly such performance later.
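The first-order analysis above can be summarized symbolically. The notation here is our own shorthand and does not appear in the original text: $N_c$ is the number of contacts per shape pair, $b$ the bytes stored per contact and its constraints, $s$ the number of substeps, $k$ the number of solver iterations, $\Delta t$ the frame timestep, and $B$ the available GPU memory bandwidth.

```latex
\begin{align}
  M_{\text{step}}  &= N_c \, b                     && \text{(memory touched per solver pass)} \\
  M_{\text{frame}} &= s \, k \, M_{\text{step}}    && \text{(memory traffic per frame)} \\
  W                &= M_{\text{frame}} / \Delta t  && \text{(required bandwidth for real-time)} \\
  N_{\text{env}}   &\lesssim B / W                 && \text{(rough bound on parallel assemblies)}
\end{align}
```

Under this estimate, the number of assemblies that can run in parallel scales inversely with $N_c$, $s$, and $k$. Contact reduction lowers $N_c$ directly, and switching to Gauss-Seidel lowers $s$ and $k$, so the two changes compound.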
III-C Implementation of Contact Reduction
To implement contact reduction, we use the concept of contact patches, which are sets of contacts that are proximal and share a normal. We generate contact patches in phases (Algorithm 1). First, we Generate candidate contacts using SDF collisions for a batch of triangles (size ). Second, we Assign candidates to existing patches in the shape pair based on normal similarity. Third, for unassigned candidates, we FindDeepest (i.e., find the one with deepest penetration), create a new patch, BinReduce (i.e., assign remaining candidates to this patch based on normal similarity), and AddPatch to our list of patches. We repeat until no candidates remain. When performing AddPatch, we also check if a patch with a similar normal exists. We either add both patches or replace the existing patch, using a measure that maximizes patch surface area, prioritizes contacts with high penetration depth, and restricts the number of patches to (where ).
The preceding contact reduction process is performed exclusively in GPU shared memory. Notably, contact reduction does not make contact generation slower; to the contrary, it makes generation faster, as the contact generation kernel does not need to write extensive amounts of data to global memory.
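The core loop of the patch-generation procedure (FindDeepest, BinReduce, AddPatch) can be sketched serially as follows. This is a deliberately simplified CPU illustration under several assumptions: it processes all candidates in one batch, omits assignment to pre-existing patches, omits the area-based replacement rule and the patch cap, and uses a hypothetical cosine threshold for normal similarity.

```python
import numpy as np

def build_patches(normals, depths, cos_threshold=0.9):
    """Greedy patch construction; returns a list of (seed_index, member_indices)."""
    normals = np.asarray(normals, dtype=float)   # assumed to be unit length
    depths = np.asarray(depths, dtype=float)     # positive = deeper penetration
    unassigned = list(range(len(depths)))
    patches = []
    while unassigned:
        # FindDeepest: the remaining candidate with the largest penetration.
        seed = max(unassigned, key=lambda i: depths[i])
        # BinReduce: gather remaining candidates whose normals resemble the seed's.
        members = [i for i in unassigned if normals[i] @ normals[seed] >= cos_threshold]
        # AddPatch: record the patch and remove its members from the candidate pool.
        patches.append((seed, members))
        unassigned = [i for i in unassigned if i not in members]
    return patches

# Hypothetical candidates: three roughly upward normals and one sideways normal.
normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0995, 0.0, 0.995]])
depths = np.array([1e-3, 3e-3, 2e-3, 5e-4])
patches = build_patches(normals, depths)
```

Each patch is seeded by its deepest contact, so the contacts most relevant to resolving penetration are retained even when many similar candidates are merged.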
Applying the above procedure to the M4 nut-and-bolt interactions, we reduce the number of contacts from to (Fig. S13), allowing us to simulate assemblies in real-time on an NVIDIA A5000 GPU. Generating and reducing contacts takes , and solving contact constraints takes an additional . A number of additional evaluations follow.
III-D Performance Evaluations
We test our collision detection, contact generation, contact reduction, and solution pipeline on contact-rich scenes. These scenes were designed to represent a broad range of challenging real-world scenarios, including complex geometries, robot-object interactions, tight clearances, s of interacting bodies, and multi-part mechanisms. The scenes are as follows:
1024 parallel peg-in-hole assemblies from the NIST board with ISO-standard clearances ().
parallel M16 nut-and-bolt assemblies with ISO-standard clearances (Fig 3). For ease of contact profiling, the coefficient of friction is reduced to , allowing the nuts to rotate on the bolts under gravity.
parallel VGA-style D-subminiature (D-sub) connectors from the NIST board (Fig S14). For ease of contact profiling, a clearance is introduced, allowing the plug-and-socket to mate under gravity.
parallel -stage gear assemblies from the NIST board (Fig S14). An external torque is applied to the intermediate gear to rotate the adjacent gears.
M16 nuts, falling into a pile in one environment (Fig. S15).
bowls (akin to ), falling into a pile in one environment. To enable larger timesteps while maintaining accuracy, a tiny negative clearance is added.
toruses, falling into a pile in one environment.
Table VI describes the geometric representations used in each of the scenes, including SDF resolution and number of triangles. Table I provides statistics on contacts before and after contact reduction, as well as timing for reduction and solution. Finally, Table II provides simulation performance statistics, including comparisons to real-time. We also qualitatively demonstrate an additional test scene: a Franka robot + M16 nuts + flange assembly scene (App. -C2).
Table I: Contact statistics before and after contact reduction, with contact handling and solution times.
| Scene | Contacts (before) | Per pair, avg (before) | Per pair, max (before) | Contact handling time | Per pair, avg (after) | Patches (after) | Contact solution time |
| Peg-in-hole | 5.89e5 | 576 | 576 | 2 ms | 46 | 11 | 1 ms |
| Nut-and-bolt | 1.73e7 | 16930 | 16930 | 11 ms | 195 | 53 | 3 ms |
| D-sub connector | 1.20e7 | 11746 | 11746 | 12.5 ms | 175 | 36 | 1.5 ms |
| Gear assembly | 7.26e7 | 14172 | 31568 | 39 ms | 83 | 26 | 3 ms |
| Nuts | 1.81e6 | 2516 | 8304 | 10 ms | 99 | 46 | 2.1 ms |
| Bowls | 5.23e5 | 731 | 1160 | 2 ms | 66 | 18 | 4.3 ms |
| Toruses | 4.00e5 | 185 | 864 | 4.6 ms | 44 | 20 | 2.8 ms |
| Franka + nut-and-bolt | 4.64e5 | 9285 | 10031 | 1.7 ms | 147 | 40 | 2.7 ms |
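Table II: Simulation performance statistics, including comparisons to real-time.
| Scene | Substeps | Position iterations | Velocity iterations | Time | Real-time |
| D-sub connector | 4 | 4 | 1 | 14 ms | 305x |
| Gear assembly | 4 | 4 | 1 | 42 ms | 102x |
| Franka + nut-and-bolt | 4 | 16 | 1 | 4.4 ms | 121x |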
Although we defer to the tables for complete performance assessments, key observations include the following:
Contact reduction can reduce contact counts by over orders-of-magnitude compared to naive methods.
Contact handling time (i.e., pair finding, generation, reduction) is typically dominant compared to solution time.
Parallelization achieves a -orders-of-magnitude speed-up over real-time single-threaded computation.
IV Robot Learning Tools
We have thus far evaluated our physics simulation module over a diverse array of contact-rich scenes. However, the module is an extension of PhysX . For convenient use in robot learning, we have integrated our module into Isaac Gym , which can use PhysX as its physics engine. To use our contact methods for arbitrary assets, the user simply has to include an tag in URDF descriptions (Listing 1).
For applications to robotic assembly, assets and scenes related to NIST Task Board may be particularly useful. Thus, we provide 1) assets from the NIST board, 2) robotic assembly scenes for RL training in Isaac Gym, and 3) classical robot controllers in Isaac Gym to accelerate learning. Here we describe our assets, environments, and controllers.
The NIST Task Board consists of unique parts. However, the CAD models publicly provided for these parts are not suitable for high-accuracy physics simulation. In particular, the models for the nuts, bolts, pegs, and gear assembly do not conform to real-world tolerances and clearances; in assembly, mating parts together with tight clearances is precisely the most significant challenge. Furthermore, the models for the electrical connectors were sourced from public repositories rather than manufacturers (e.g., the D-sub plug and socket), were geometrically incompatible (e.g., the RJ45 plug and socket, which interpenetrate), were incomplete (e.g., the USB socket, which lacks mating features), and/or were designed using hand measurements (e.g., the Waterproof plug and socket). Regardless of simulator accuracy, inaccurate geometries (esp. interpenetration) will lead to unstable or inaccurate dynamics.
high-quality, simulation-ready part models, each with an Onshape CAD model, one or more OBJ meshes, a URDF description, and estimated material properties (Table VII and Table VIII). These models include all the parts on the NIST Task Board , as well as dimensional variations. The assets for the nuts, bolts, pegs, and gearshafts conform to ISO , ISO , and ISO standards and contain loose and tight configurations that correspond to the extremes of the tolerance band. The CAD models for the electrical connectors were sourced from manufacturer-provided models. Each part of each connector contains a visual mesh, directly exported from the CAD models, and a collision mesh, carefully redesigned to simplify external geometry while faithfully preserving mating features (e.g., pins and holes).
robotic assembly scenes for Isaac Gym that can be used for developing planning and control algorithms, collecting simulated sensor data for supervised learning, and training RL agents. Each scene contains a Franka robot and disassembled assemblies from NIST Task Board 1. All scenes have been tested with up tosimultaneous environments on an NVIDIA RTX 3090 GPU. The scenes are as follows:
FrankaNutBoltEnv, which contains a Franka robot and nut-and-bolt assemblies of the user’s choice (M4, M8, M12, M16, and/or M20). The nuts and bolts can be randomized in type and location across all environments. The default goal is to pick up a nut from a work surface and tighten it to the bottom of its corresponding bolt. Our own RL training results on this environment will be discussed in detail in the next section.
FrankaInsertionEnv, which contains a Franka robot and insertion assemblies of the user’s choice (round and/or rectangular pegs-and-holes; BNC, D-sub, and/or USB plugs-and-sockets) (Fig. 6). The assets can be randomized in type and location across all environments. The default goal is to pick up a peg or plug and insert it into its corresponding hole or socket.
FrankaGearsEnv, which contains a Franka robot and a -part gear assembly (Fig. 7). The assets can be randomized in location across all environments. The default goal is to pick up each gear, insert it onto its corresponding gear shaft, and align it with any other gears.
Research efforts in reinforcement learning for robotic manipulation have traditionally used an action space consisting of low-level position, velocity, or torque commands. On the other hand, classical PD- or PID-style robot controllers have been used to solve contact-rich tasks in robotic assembly for several decades [69, 101]. In recent years, there has been substantial interest in using an RL action space consisting of targets to such controllers, with promising results in both sample efficiency and asymptotic performance [67, 78].
Akin to  in MuJoCo, we provide a series of robot controllers based on those that researchers and engineers commonly use in the real world. The actions of the controllers are executed using an explicit integrator to avoid undesired damping. The controllers are as follows:
- Joint-space inverse differential kinematics (IK) motion controller, which converts task-space errors into joint-space errors and applies PD gains to generate joint torques. The IK controller can use either the geometric or analytic Jacobian and generate torques with the Jacobian pseudoinverse, Jacobian transpose, damped least-squares (Levenberg-Marquardt), or adaptive SVD.
- Joint-space inverse dynamics (ID) controller, which uses the joint-space inertia matrix and gravity compensation to generate joint torques, achieving desired spring-damper behavior in joint-space.
- Task-space impedance controller, which applies PD gains to task-space errors to generate joint torques. This controller is immediately available on the real-world Franka robot via the libfranka library [24].
- Operational-space (OSC) motion controller, which uses the task-space inertia matrix and gravity compensation to generate joint torques, achieving desired spring-damper behavior in task-space (akin to ).
- Open-loop force controller, which converts a task-space force target into joint torques.
- Closed-loop P force controller, which stacks an open-loop force controller with a closed-loop controller that applies P gains to task-space force errors.
- Hybrid force-motion controller, which stacks a task-space impedance or OSC motion controller with an open- or closed-loop force controller. Selection matrices can specify which axes use motion and/or force control.
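As an illustration of the damped least-squares option listed for the IK controller, the following minimal sketch converts a task-space position error into a joint-space update for a planar two-link arm. The arm model, link lengths, damping value, and step gain are assumptions chosen for illustration; they are not taken from the Factory controllers, and the PD torque stage is omitted.

```python
import math

def forward_kinematics(q1, q2, l1=1.0, l2=1.0):
    """End-effector position of a planar two-link arm (hypothetical example robot)."""
    return [l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2)]

def planar_jacobian(q1, q2, l1=1.0, l2=1.0):
    """2x2 geometric Jacobian of the end-effector position with respect to the joints."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def dls_step(J, err, damping=0.05):
    """Damped least-squares update: dq = J^T (J J^T + damping^2 I)^-1 err."""
    # A = J J^T + damping^2 I is symmetric positive definite for damping > 0.
    a = J[0][0] ** 2 + J[0][1] ** 2 + damping ** 2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2 + damping ** 2
    det = a * d - b * b
    # y = A^-1 err, using the closed-form inverse of a 2x2 matrix.
    y0 = (d * err[0] - b * err[1]) / det
    y1 = (-b * err[0] + a * err[1]) / det
    # dq = J^T y
    return [J[0][0] * y0 + J[1][0] * y1,
            J[0][1] * y0 + J[1][1] * y1]

def solve_ik(target, q=(0.3, 0.5), iters=200, gain=0.5):
    """Iterate damped least-squares steps until the task-space error is small."""
    q = list(q)
    for _ in range(iters):
        pos = forward_kinematics(*q)
        err = [target[0] - pos[0], target[1] - pos[1]]
        dq = dls_step(planar_jacobian(*q), err)
        q = [q[0] + gain * dq[0], q[1] + gain * dq[1]]
    return q
```

The damping term keeps the update finite near singular configurations (e.g., a fully extended arm), at the cost of slower convergence there.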
Mathematical formulations are provided in App. -D4.
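The force and hybrid controllers listed above can be sketched as follows, assuming a two-axis task space, diagonal 0/1 selection matrices, and a Jacobian-transpose mapping from task-space force to joint torques. All function names, gains, and the example Jacobian are hypothetical; inertia and gravity compensation are omitted.

```python
def impedance_force(pos_err, vel, kp, kd):
    """Task-space PD (spring-damper) force, computed per axis."""
    return [kp * e - kd * v for e, v in zip(pos_err, vel)]

def closed_loop_force(f_target, f_measured, kf):
    """Open-loop feedforward target plus a P correction on the force error."""
    return [ft + kf * (ft - fm) for ft, fm in zip(f_target, f_measured)]

def hybrid_force(f_motion, f_force, motion_axes, force_axes):
    """Combine motion and force commands using per-axis selection flags (0 or 1)."""
    return [sm * fm + sf * ff
            for fm, ff, sm, sf in zip(f_motion, f_force, motion_axes, force_axes)]

def jacobian_transpose_torques(J, force):
    """Map a task-space force to joint torques: tau = J^T F."""
    n_joints = len(J[0])
    return [sum(J[i][j] * force[i] for i in range(len(J))) for j in range(n_joints)]

# Example: motion control along the first axis, force control along the second.
J = [[-0.5, -0.2],
     [1.2, 0.6]]
f_motion = impedance_force(pos_err=[0.01, 0.0], vel=[0.0, 0.0], kp=100.0, kd=20.0)
f_force = closed_loop_force(f_target=[0.0, -5.0], f_measured=[0.0, -4.0], kf=0.5)
f_cmd = hybrid_force(f_motion, f_force, motion_axes=[1, 0], force_axes=[0, 1])
tau = jacobian_transpose_torques(J, f_cmd)
```

Setting a selection flag to 1 in both lists would superimpose motion and force control on the same axis, which corresponds to the "and/or" case mentioned above.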
V Reinforcement Learning
The robotics community has demonstrated that RL can effectively solve simulated or real-world assembly tasks. However, these efforts are often limited to off-policy algorithms, require extensive training time or human demonstrations/corrections, and/or only address simple tasks. With our contact simulation methods, we use on-policy RL to solve the most contact-rich task on NIST Task Board 1: assembling a nut onto a bolt. Like many assembly tasks, such a procedure is long-horizon and challenging to learn end-to-end. We divide the task into phases and learn a subpolicy for each:
- Pick: The robot grasps the nut with a parallel-jaw gripper from a random location on a work surface.
- Place: The robot transports the nut to the top of a bolt fixed to the surface.
- Screw: The robot brings the nut into contact with the bolt, engages the mating threads, and tightens the nut until it contacts the base of the bolt head.
RL is neither the only means to solve the phases of this task, nor the most efficient: Pick and Place can be solved with classical grasping and motion controllers, and although challenging, Screw may be solved using a nut-driver and a compliance and/or suction mechanism [43]. We investigate this task as a proof-of-concept that our simulation methods can enable efficient policy learning for tasks of such complexity. Moreover, it is a common experience of simulation developers that model-free RL agents reveal and exploit any inaccuracies or instabilities in the simulator to maximize their reward; we view successfully training RL agents in contact-rich tasks as important qualitative evidence of simulator robustness.
We describe each subpolicy below; detailed evaluations will focus on Screw, the most contact-rich of the phases. We then address sequential execution and examine contact forces.
V-A Shared Framework
The Pick, Place, and Screw subpolicies were all trained in Isaac Gym using our simulation methods and FrankaNutBoltEnv environment. The PPO implementation from rl-games [63] was used with a shared set of hyperparameters (Table IX). Typically, a batch of policies were trained simultaneously on a single NVIDIA RTX 3090 GPU, with each policy using parallel simulation environments. Each batch required a total of hours for policy updates.
We defined our action space as targets for our implemented controllers. Unless otherwise specified, the targets for the joint-space IK controller, joint-space ID controller, task-space impedance controller, and OSC motion controller were all -DOF transformations relative to the current state, with the rotation expressed as axis-angle. The targets for the open-loop and closed-loop force controller were -dimensional force vectors. The targets for the hybrid force-motion controller were the -dimensional union of the previous action spaces.
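To make the relative-target convention concrete, the sketch below maps one action to an absolute pose target. It assumes a 6-dimensional action (three translation components followed by a three-component axis-angle rotation), hypothetical scaling factors, and a rotation increment composed in the world frame; none of these details are specified by the text above.

```python
import math

def axis_angle_to_quat(rx, ry, rz):
    """Convert an axis-angle vector (axis times angle, radians) to a (w, x, y, z) quaternion."""
    angle = math.sqrt(rx * rx + ry * ry + rz * rz)
    if angle < 1e-9:
        return (1.0, 0.0, 0.0, 0.0)  # negligible rotation
    s = math.sin(angle / 2.0) / angle
    return (math.cos(angle / 2.0), rx * s, ry * s, rz * s)

def quat_mul(a, b):
    """Hamilton product a * b of two (w, x, y, z) quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def apply_relative_action(pos, quat, action, pos_scale=0.01, rot_scale=0.05):
    """Turn a relative 6-D action into an absolute target pose for a motion controller."""
    target_pos = tuple(p + pos_scale * a for p, a in zip(pos, action[:3]))
    delta_quat = axis_angle_to_quat(*(rot_scale * a for a in action[3:6]))
    target_quat = quat_mul(delta_quat, quat)  # increment applied in the world frame
    return target_pos, target_quat
```

Expressing targets relative to the current state keeps each commanded step bounded by the scaling factors, regardless of where the end-effector is in the workspace.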
We now discuss our randomization, observations, rewards, success criterion, and success rate for each subpolicy.
V-B Subpolicy: Pick
At the start of each Pick episode, the 6-DOF Franka hand pose and 2-DOF nut pose (constrained by the work surface) were randomized over a large spatial range (Table III).
|Pick parameter||Range||Place parameter||Range||Screw parameter||Range|
|Hand X-axis||[-0.2, 0.2] m||Hand XY-axes||[-0.2, 0.2] m||Hand angle||[-90, 90] deg|
|Hand Y-axis||[-0.4, 0.0] m||Hand Z-axis||[0.5, 0.7] m||Fingertip X-axis||[-3, 3] mm|
|Hand Z-axis||[0.5, 0.7] m||Hand roll, pitch||[-17, 17] deg||Fingertip Y-axis||[-3, 3] mm|
|Hand roll, pitch||[-17, 17] deg||Hand yaw||[-57, 57] deg||Fingertip Z-axis||[0, 3] mm|
|Hand yaw||[-57, 57] deg||Nut-in-gripper XY-axes||[-2, 2] mm||Nut-in-gripper X-axis||[-3.5, 3.5] mm|
|Nut X-axis||[-0.1, 0.1] m||Nut-in-gripper Z-axis||[-5, 5] mm||Nut-in-gripper Z-axis||[-6.5, 1.0] mm|
|Nut Y-axis||[-0.4, 0.2] m||Nut-in-gripper yaw||[-180, 180] deg||Nut-in-gripper yaw||[-15, 15] deg|
| || ||Bolt XY-axes||[-10, 10] mm|| || |
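A minimal sketch of how such ranges could be used to reset a Pick episode is given below. It assumes that the first parameter-range pair in each row of Table III belongs to Pick and that each quantity is drawn independently from a uniform distribution; the dictionary keys and function name are hypothetical.

```python
import random

# Pick ranges as read from Table III (positions in meters, angles in degrees).
PICK_RANGES = {
    "hand_x_m": (-0.2, 0.2),
    "hand_y_m": (-0.4, 0.0),
    "hand_z_m": (0.5, 0.7),
    "hand_roll_deg": (-17.0, 17.0),
    "hand_pitch_deg": (-17.0, 17.0),
    "hand_yaw_deg": (-57.0, 57.0),
    "nut_x_m": (-0.1, 0.1),
    "nut_y_m": (-0.4, 0.2),
}

def sample_initial_state(ranges, rng):
    """Draw one initial state, sampling each quantity uniformly within its range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}

# A seeded generator makes the sampled resets reproducible.
state = sample_initial_state(PICK_RANGES, random.Random(0))
```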
The observation space for Pick was the pose (position and quaternion) of the hand and nut, as well as the linear and angular velocity of the hand. In the real world, the pose and velocity of the hand can be determined to reasonable accuracy through a forward kinematic model and proprioception of joint positions and velocities, whereas the pre-grasp pose of the nut (a known model, as typical in industrial settings) can be accurately estimated through pose estimation frameworks [44]. The action space for Pick consisted of joint-space IK controller targets with damped least-squares.
A dense reward was formulated as the distance between the fingertips and the nut. Initial experiments defined this distance as $\lVert e_t \rVert + \alpha \lVert e_q \rVert$, where $e_t$ and $e_q$ are the translation and quaternion errors, and $\alpha$ is a scalar hyperparameter. However, this approach was sensitive to $\alpha$. Inspired by [2], we reformulated the distance as $\lVert k_n - k_f \rVert$, where $k_n$ and $k_f$ are both tensors of keypoints distributed along the nut central axis and end-effector approach axis, respectively. Intuitively, this method computes distance on a single manifold, obviating tuning. The collinearity of each keypoint set also allows equivariance to rotation of the hand (i.e., yaw) about the nut central axis.
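The keypoint-based distance can be illustrated with the short sketch below. The number of keypoints, their spacing, and the use of a mean over per-keypoint Euclidean distances are assumptions made for illustration; the function names are hypothetical.

```python
import math

def axis_keypoints(origin, axis, num_keypoints=4, spacing=0.01):
    """Place evenly spaced keypoints along an axis, centered on `origin`."""
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]
    offsets = [(i - (num_keypoints - 1) / 2.0) * spacing for i in range(num_keypoints)]
    return [[o + t * u for o, u in zip(origin, unit)] for t in offsets]

def keypoint_distance(kp_a, kp_b):
    """Mean Euclidean distance between corresponding keypoints of two sets."""
    total = 0.0
    for a, b in zip(kp_a, kp_b):
        total += math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return total / len(kp_a)

# Keypoints on the nut central axis and on the end-effector approach axis.
nut_kp = axis_keypoints(origin=[0.1, 0.0, 0.02], axis=[0.0, 0.0, 1.0])
hand_kp = axis_keypoints(origin=[0.15, 0.0, 0.02], axis=[0.0, 0.0, 1.0])
reward = -keypoint_distance(nut_kp, hand_kp)
```

Because every keypoint lies on the axis itself, rotating the hand about its approach axis leaves the keypoints, and therefore the distance, unchanged. Translation and tilt errors are both expressed in meters, so no relative weighting is needed.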
After executing the Pick subpolicy for a prescribed (constant) number of timesteps, a manually-specified grasp-and-lift action was executed. Policy success was defined as whether the nut remained in the grasp after lifting. If successful, a success bonus was added to the episodic return.
With the above approach, the Pick policy was able to achieve a success rate within the randomization bounds. Qualitatively, the agent learned to execute a fast straight-line path towards the nut, followed by a slow pose refinement. Due to the collinearity of the keypoints, the final pose distribution of the hand was highly multimodal in yaw.
V-C Subpolicy: Place
At the start of each Place episode, the Franka hand and nut were reset to a known stable grasp pose. The nut-in-gripper position/rotation and the bolt position were randomized. The hand-and-nut were moved to a random pose using the joint-space IK controller (Table III). Training was then initiated.
The observation space for Place was identical to that for Pick, but also included the pose (position and quaternion) of the bolt. When grasped, the nut pose may be challenging to determine in the real world; however, recent research has demonstrated that visuotactile sensing with known object models can enable high-accuracy pose estimates [6].
The action space was identical to that for Pick. A dense reward was again formulated as a keypoint distance, now between the bolt and nut central axes. The keypoints were defined such that, when perfectly aligned, the base of the nut was located above the top of the bolt. Success was defined as when the average keypoint distance was
With the above approach, the Place policy was able to achieve a 98.4% success rate within the randomization bounds. A common initial failure case during training was collision between the gripper and the bolt, dislodging the nut. The robot learned trajectories that remained above the top plane of the bolt, with a slow pose refinement phase when close.
Although it had a negligible effect on steady-state error, applying an action gradient penalty at each timestep was an effective strategy for smoothing the Place trajectory. The penalty was equal to $\beta \lVert a_t - a_{t-1} \rVert$, where $a_t$ is the action vector at timestep $t$ and $\beta$ is a hyperparameter.
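A minimal sketch of such a penalty is shown below, assuming it is the Euclidean norm of the difference between consecutive actions scaled by a weight. The weight value and the example reward terms are hypothetical.

```python
import math

def action_gradient_penalty(action, prev_action, beta):
    """Penalty proportional to the change in action between consecutive timesteps."""
    diff_norm = math.sqrt(sum((a - p) ** 2 for a, p in zip(action, prev_action)))
    return beta * diff_norm

# Example: subtract the penalty from a dense distance-based reward.
prev_action = [0.0, 0.0, 0.0]
action = [0.3, 0.4, 0.0]
dense_reward = -0.02  # hypothetical negative keypoint distance
reward = dense_reward - action_gradient_penalty(action, prev_action, beta=0.1)
```

The penalty vanishes when the policy repeats its previous action, so it discourages jitter without biasing the final pose.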
V-D Subpolicy: Screw
At the start of each Screw episode, the Franka hand and nut were reset to a stable grasp pose, randomized relative to the top of the bolt (Table III); these stable poses were generated using the FrankaCalibrate script described in App. -D3. The nut-in-gripper position was also randomized as before.
Among the subpolicies, Screw was by far the most contact-rich and, consequently, the most challenging to train. The robot was required to bring the nut into contact with the bolt, engage the respective threads, generate precise torques along the arm joints to allow the high-inertia robot links to admit the rigid bolt constraint, and maintain appropriate posture of the gripper with respect to the nut during tightening. As a simplifying assumption, the joint limit of the end-effector was removed, allowing the Franka to avoid regrasping (akin to the Kinova Gen3). Nevertheless, training was replete with a diverse range of pathologies, including high-energy collision with the bolt shank, roll-pitch misalignment of the nut when first engaging the bolt threads, jamming of the nut during tightening, and precession of the gripper around the bolt during tightening, which induced slip between the gripper and nut.
To overcome the preceding issues, a systematic exploration of controllers/gains, observation/action spaces, and baseline rewards was executed. First, policies for task-space controllers were evaluated over a wide range of gains, and the controller-gain configuration with the highest success rate was chosen (Table X). Then, observation spaces were evaluated, and the space with the highest success rate was selected (Table IV). The procedure continued with action spaces (Table XI) and baseline rewards (Table XII). Success was defined as when the nut was less than thread away from the base of the bolt.
To encourage stable robot posture, a dense reward was formulated that consisted of the sum of the keypoint distance [between nut and base of bolt] and [between end-effector and nut]. To prioritize both task completion and efficient training, early termination was applied on success and failure, and a maximum of gradient updates was allowed. Future work will investigate asymptotic performance with more updates.
Notably, collisions between the complex geometries of the nut and bolt remained stable during exploration by the RL agent. However, the majority of experimental groups failed due to the pathologies described earlier. The highest performing agents consistently used an OSC motion controller with low proportional gains, an observation space consisting of pose and velocity of the gripper and nut, a 2-DOF action space (Z-translation and yaw), and a linear baseline reward. As expected, the relatively low number of epochs biased towards lower-dimensional observations and actions.
Using the above configuration, a final Screw policy was trained over gradient updates and achieved an success rate over episodes.
|Observations||Success Rate||Env Steps to Success||Reward||Joint Torque (Nm)|
|Pose, velocity, force||0.5026||3849||-0.0784||1.2186|
|Pose, velocity, force, action||0.3307||3791||-0.0521||1.2257|
V-E Sequential Policy Execution
Although not our primary focus, a natural question arose on whether the subpolicies could be chained. Policy chaining can be challenging, as errors in each subpolicy can accumulate into poor overall performance; as a simple example, perfectly-coupled subpolicies with success rates can produce a combined policy with a success rate.
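The compounding effect can be sketched as follows, assuming each subpolicy succeeds independently given that the preceding one succeeded. The example success rates are hypothetical and are not the values reported in this work.

```python
def chained_success_rate(success_rates):
    """Overall success rate of sequentially executed subpolicies."""
    combined = 1.0
    for rate in success_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("success rates must lie in [0, 1]")
        combined *= rate
    return combined

# Three hypothetical subpolicies that each succeed 90% of the time.
combined = chained_success_rate([0.9, 0.9, 0.9])
```

Under this assumption, the combined rate can never exceed that of the weakest subpolicy, which is why small per-phase errors matter for long sequences.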
In this work, we used a simple strategy to connect the learned Pick, Place, and Screw subpolicies end-to-end. Specifically, when training a given subpolicy, the initial states were randomized to span the distribution of the final states of the preceding trained subpolicy. For example, we defined the initial states of the Screw subpolicy (Table III) to span the maximum observed error of the Place subpolicy. For a small number of subpolicies, this strategy may be effective; however, the approach does not scale to long sequences, as Policy $N$ must be trained and Sequence $0 \ldots N$ must be evaluated before training Policy $N+1$. To facilitate smoothness, an exponential moving average was applied on Place actions.
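The action smoothing mentioned above can be sketched as a standard exponential moving average filter. The class name and the smoothing factor are hypothetical; the text does not state the value used.

```python
class ActionEMA:
    """Exponential moving average filter applied to successive policy actions."""

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must lie in (0, 1]")
        self.alpha = alpha
        self.state = None

    def __call__(self, action):
        if self.state is None:
            # The first action initializes the filter unchanged.
            self.state = list(action)
        else:
            self.state = [self.alpha * a + (1.0 - self.alpha) * s
                          for a, s in zip(action, self.state)]
        return list(self.state)

# Example: a step change in the raw action is attenuated over several timesteps.
ema = ActionEMA(alpha=0.5)
smoothed = [ema(a) for a in ([1.0, 0.0], [0.0, 0.0], [0.0, 0.0])]
```

A smaller smoothing factor gives smoother motion but adds lag, which may matter when handing off from one subpolicy to the next.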
V-F Contact Forces
Quantitatively and through numerous visual comparisons, our physics simulation module enabled accurate, efficient, and robust simulation of contact-rich interactions of assets with real-world geometries and material properties. Furthermore, our module was built on PhysX, which has been evaluated under challenging sim-to-real conditions [2, 18, 81].
Nevertheless, it is also important to consider the contact forces generated during such interactions. We executed our Screw subpolicy and recorded joint torque norms, as well as contact force norms at the gripper fingers and bolt (Fig. S17). The joint torques are well within the range of lightweight collaborative robots (e.g., UR3). Furthermore, the contact force norms at the fingertips were compared to analogous real-world forces from the Daily Interactive Manipulation dataset [34], in which human subjects tightened or loosened nuts with a wrench outfitted with a force-torque sensor (Fig. 9).
Although the reward functions for the RL agents never involved contact forces, the robots learned policies that generated forces in the middle of human ranges; the much higher variance of human forces was likely due to more diverse strategies adopted by humans. Combined with our visual comparisons, these results do not guarantee sim-to-real policy transfer, but demonstrate that raw quantities computed by simulation are highly comparable to the real world.
We have presented Factory, a set of physics simulation methods and robot learning tools for contact-rich interactions in robotics. We provide a physics simulation module for PhysX and Isaac Gym that enables s to s of contact-rich interactions to be simulated in real-time on a single GPU, as tested on a diverse array of scenes. As one example, nuts-and-bolts were simulated in real-time, whereas the established benchmark was a single nut-and-bolt at real-time.
We also provide carefully-designed, ISO-standard or manufacturer-based assets from the NIST Assembly Task Board 1, suitable for high-accuracy simulation; robotic assembly scenes in Isaac Gym where a robot can interact with these assets across a diverse range of assembly operations (fastener tightening, insertion, gear meshing); and classical robot controllers that can achieve pose, force, or hybrid targets. We intend for our assets, environments, and controllers to grow over time with contributions from the community.
Finally, we train proof-of-concept RL policies in Isaac Gym for the most contact-rich interaction on the board, nut-and-bolt assembly. We show that we can achieve stable simulator behavior, efficient training ( hours to simultaneously train policies on GPU), high success rates, and realistic forces/torques. Although Factory was developed with robotic assembly as a motivating application, there are no limitations on using our methods for entirely different tasks within robotics, such as grasping of complex non-convex shapes in home environments, locomotion on uneven outdoor terrain, and non-prehensile manipulation of aggregates of objects.
We plan to address several limitations of this work. Within simulation, we plan to make improvements to our SDF collision scheme: 1) robust handling of collisions of thin-shell meshes (e.g., thin-walled bottles and boxes); 2) improved handling of low-tessellation meshes, as contact is currently generated per-triangle, allowing penetration on large flat surfaces; and 3) sparse SDF representations to reduce the SDF memory footprint (Table VI). Furthermore, we are adding support for FEM-based simulation of stiff deformable features, such as the flexible tab on an RJ45 connector.
Within our assets, environments, and controllers, we plan to add assets for additional industrial and home subassemblies (e.g., USB-C, power plugs, key-in-lock), scenes for additional assembly tasks (e.g., chain-and-sprocket assembly), and controllers found in industrial settings (e.g., admittance). Within policy training, we plan to extend our policy for FrankaNutBoltEnv to learn regrasp behavior. In addition, we aim to develop a unified proof-of-concept policy for all insertion tasks within FrankaInsertionEnv, as well as a policy for gear meshing within FrankaGearsEnv, further evaluating training efficiency and simulator robustness. However, we encourage the broader RL community to test and develop state-of-the-art RL algorithms around these complex tasks.
VII-A Future Work
Upon making the aforementioned improvements, our future work will focus primarily on sim-to-real transfer. As described earlier, there has been compelling evidence that sim-to-real transfer is possible for industrial insertion tasks; we aim to demonstrate this for more complex bolt-tightening and gear-meshing tasks, as well as full assembly operations in both industrial and home settings. For perception, we may train image-based policies using real-time ray-tracing and/or post-simulation path-tracing [8, 74] combined with domain randomization [94]. However, we find distillation approaches to be particularly compelling. Specifically, we can
- Train RL policies with the actor accessing images, but the critic accessing privileged information [4].
Adding noise on both low-dimensional and high-dimensional observations may be valuable. Furthermore, given that camera observations will be occluded during contact, we anticipate that integrating tactile sensing into our real-world system will be exceptionally critical for object-gripper pose estimation and slip detection.
We aim for Factory to establish the state-of-the-art in contact-rich simulation, as well as serve as an existence proof that highly efficient simulation and learning of contact-rich interactions is possible in robotics. Our experience has shown that high-quality assets and accurate, efficient simulation methods drastically reduce the inductive bias and algorithmic burden required to solve contact-rich tasks. We invite the community to establish benchmarks for solving the provided scenes, as well as extend and use Factory for their own contact-rich applications both within and outside of RL. We also hope that this work inspires researchers to execute contact-rich simulations of tasks beyond what we show in this paper to enable further solutions to complex problems.
YN led the research project.
MM initially developed SDF collisions for FleX.
MM and YN conducted proof-of-concept demonstrations of SDF collisions for robotic assembly in FleX.
KS, ML, PR, and AM developed SDF collisions and contact reduction for PhysX.
PR, KS, AM, and YN developed evaluation and demonstration scenes for PhysX.
LW, YG, and GS integrated the PhysX developments into Isaac Gym.
YN, IA, and PR developed the assets, environments, and controllers for Isaac Gym.
YN, IA, and AH developed the RL policies within Isaac Gym.
DF, AH, MM, ML, GS, and AM advised the project.
YN, MM, AH, and IA wrote the paper.
We thank Joe Falco for assistance with the original NIST assets, John Ratcliff for generating the convex decomposition for the bolt (Fig. S11), Karl Van Wyk and Lucas Manuelli for helpful discussions on controllers, Viktor Makoviychuk for assistance with the rl-games [63] library, and the RSS reviewers for their insightful comments and questions.
-  (2020) A study on the challenges of using robotics simulators for testing. arXiv:2004.07368 [cs]. Cited by: §I.
-  (2021) Transferring dexterous manipulation from GPU simulation to a remote real-world TriFinger. arXiv:2108.09779 [cs]. Cited by: §I, §V-B, §V-F.
-  (2021) Contact and friction simulation for computer graphics. In ACM SIGGRAPH Courses, Cited by: §II-A3.
-  (2020) Learning dexterous in-hand manipulation. Int. J. Rob. Res.. Cited by: §I, item 2.
-  (2021) Robotic assembly of timber joints using reinforcement learning. Autom. Constr.. Cited by: §II-B1, §II-B2.
-  (2020) Tactile object pose estimation from the first touch with geometric contact rendering. In Conference on Robot Learning (CoRL), Cited by: §V-C.
-  (2020) Variable compliance control for robotic peg-in-hole assembly: A deep-reinforcement-learning approach. Appl. Sci.. Cited by: §II-B1, §II-B2.
-  (2018) Blender: a 3d modelling and rendering package. Note: http://www.blender.org Cited by: §VII-A.
-  (2022) Assembly Magazine. Note: https://www.assemblymag.com/[Online; accessed 1-January-2022] Cited by: §I, §I.
-  (2021) Note: https://twitter.com/kennethbodin/status/1252973291217850368[Online; accessed 1-January-2022] Cited by: §II-A4.
-  (2009) Introduction to inverse kinematics with Jacobian transpose, pseudoinverse and damped least squares methods. Cited by: 1st item, §-D4, 1st item.
-  (2019) Closing the sim-to-real loop: Adapting simulation randomization with real world experience. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I.
-  (2021) A system for general in-hand object re-orientation. In Conference on Robot Learning (CoRL), Cited by: §I, item 1.
-  (2021) On the use of simulation in robotics: Opportunities, challenges, and suggestions for moving forward. Proc. Natl. Acad. Sci. USA. Cited by: §I.
-  (2018) Learning to dress: Synthesizing human dressing motion via deep reinforcement learning. ACM Transactions on Graphics (TOG). Cited by: §V-E.
-  (2021) Assistive Tele-op: Leveraging transformers to collect robotic task demonstrations. arXiv:2112.05129 [cs]. Cited by: §II-B1, §VII-A.
-  (2022) Bullet real-time physics simulation. Note: https://pybullet.org/wordpress/[Online; accessed 1-January-2022] Cited by: §II-A4.
-  (2020) Learning a contact-adaptive controller for robust, efficient legged locomotion. In Conference on Robot Learning (CoRL), Cited by: §V-F.
-  (2021) Residual learning from demonstration: Adapting DMPs for contact-rich manipulation. arXiv:2008.07682 [cs]. Cited by: §II-B1, §II-B2.
-  (2021) ManipulaTHOR: A framework for visual object manipulation. In , Cited by: §I.
-  (2021) ACRONYM: A large-scale grasp dataset based on simulation. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I.
-  (2019) A learning framework for high precision industrial assembly. In International Conference on Robotics and Automation (ICRA), Cited by: §II-B1, §II-B2.
-  (2021) Intersection-free rigid body dynamics. ACM Trans. Graph.. Cited by: §II-A4, §II-B1.
-  (2017) libfranka examples. Cited by: 3rd item.
-  (2019) Interlinked SPH pressure solvers for strong fluid-rigid coupling. ACM Trans. Graph.. Cited by: §II-A4, §II-B1.
-  (2012) Efficient collision detection for brittle fracture. In Proc. of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation, Cited by: §II-A2.
-  (2021) FlingBot: The unreasonable effectiveness of dynamic manipulation for cloth unfolding. In Conference on Robot Learning (CoRL), Cited by: §I.
-  (2016) Robust contact generation for robot simulation with unstructured meshes. In Robotics Research, Cited by: §-C1, §II-A1.
-  (2021) Towards real-world force-sensitive robotic assembly through deep reinforcement learning in simulations. In IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), Cited by: §II-B1, §II-B2.
-  (2004) Impedance and Interaction Control. In Robotics and Automation Handbook, Cited by: §-D4.
-  (2019) On the similarities and differences among contact models in robot simulation. IEEE Robot. Autom. Lett.. Cited by: §I.
-  (2021) Data-efficient hierarchical reinforcement learning for robotic assembly control applications. IEEE Trans. Ind. Electron.. Cited by: §II-B1, §II-B2.
-  (2022) DefGraspSim: Physics-based simulation of grasp outcomes for 3D deformable objects. IEEE Robotics and Automation Letters. Cited by: §I.
-  (2019) A dataset of daily interactive manipulation. The International Journal of Robotics Research. Cited by: Fig. 9, §V-F.
-  (2017) Deep reinforcement learning for high precision assembly tasks. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §I, §II-B2.
-  (2019) A survey of automated threaded fastening. IEEE Trans. Automat. Sci. Eng.. Cited by: §I, Fig. 2.
-  (2021) Trajectory optimization for manipulation of deformable objects: Assembly of belt drive units. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II-B1.
-  (2021) Stability-guaranteed reinforcement learning for contact-rich manipulation. IEEE Robot. Autom. Lett.. Cited by: §II-B1, §II-B2.
-  (1987) A unified approach for motion and force control of robot manipulators: The operational space formulation. IEEE J. Robot. Automat.. Cited by: 4th item.
-  (2007) Finite element analysis and modeling of structure with bolted joints. Applied Mathematical Modelling. Cited by: §II-A4.
-  (2019) Data-driven contact clustering for robot simulation. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II-A2.
-  (2020) Benchmarking protocols for evaluating small parts robotic assembly systems. IEEE Robot. Autom. Lett.. Cited by: 2nd item, §I, §I, §II-B1.
-  (2016) Fast & reliable micro screw fastening. Note: https://www.youtube.com/watch?v=BOY3oZ8SkZU[Online; accessed 1-January-2022] Cited by: §V.
-  (2020) CosyPose: Consistent multi-view multi-object 6D pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Cited by: §V-B.
-  (2007) Ghosts and machines: regularized variational methods for interactive simulations of multibodies with dry frictional contacts. Ph.D. Thesis, Umeå University. Cited by: §II-A4.
-  (2020) Learning quadrupedal locomotion over challenging terrain. Science Robotics. Cited by: item 1.
-  (2020) Making sense of vision and touch: Learning multimodal representations for contact-rich tasks. IEEE Trans. Robot.. Cited by: §II-B1, §II-B2.
-  (2021) IKEA furniture assembly environment for long-horizon complex manipulation tasks. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II-B1.
-  (2021) Adversarial skill chaining for long-horizon robot manipulation via terminal state regularization. In Conference on Robot Learning (CoRL), Cited by: §V-E.
-  (2021) iGibson 2.0: Object-centric simulation for robot learning of everyday household tasks. arXiv:2108.03272 [cs]. Cited by: §I.
-  (2020) Incremental potential contact: intersection-and inversion-free, large-deformation dynamics. ACM Trans. Graph.. Cited by: §II-A4.
-  (2021) Benchmarking off-the-shelf solutions to robotic assembly tasks. arXiv:2103.05140 [cs]. Cited by: §I.
-  (2021) The role of physics-based simulators in robotics. Annu. Rev. Control Robot. Auton. Syst.. Cited by: §I.
-  (2019) Reinforcement learning on variable impedance controller for high-precision robotic assembly. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II-B2.
-  (2018) Deep reinforcement learning for robotic assembly of mixed deformable and rigid objects. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §II-B2.
-  (2021) Robust multi-modal policies for industrial assembly via reinforcement learning and demonstrations: A large-scale study. arXiv:2103.11512 [cs]. Cited by: §I, §II-B2, §II-B2, §II-B2.
-  (2019) Dynamic experience replay. In Conference on Robot Learning (CoRL), Cited by: §II-B1, §II-B2.
-  (2021) A learning approach to robot-agnostic force-guided high precision assembly. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §II-B1, §II-B2.
-  (2017) Modern Robotics: Mechanics, Planning, and Control. Cambridge University Press. Cited by: §-D4.
-  (2020) Local optimization for robust signed distance field collision. Proc. ACM Comput. Graph. Interact. Tech.. Cited by: §II-A1, §III-A, §III.
-  (2019) Small steps in physics simulation. ACM SIGGRAPH/Eurographics Symposium on Computer Animation. Cited by: §III.
-  (2019) Learning ambidextrous robot grasping policies. Sci. Robot.. Cited by: §I.
-  (2021) Rl-games. External Links: Cited by: §V-A, Acknowledgments.
-  (2021) Isaac Gym: High performance GPU-based physics simulation for robot learning. In Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks, Cited by: 1st item, §IV.
-  (2016) Volumetric hierarchical approximate convex decomposition. In Game Engine Gems 3, Cited by: Fig. S10, §II-A1.
-  (2018) RoboTurk: A crowdsourcing platform for robotic skill learning through imitation. In Conference on Robot Learning (CoRL), Cited by: §VII-A.
-  (2019) Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §IV-C.
-  (1995) Industrial perspective on research needs and opportunities in manufacturing assembly. J. Manuf. Syst.. Cited by: §II-A4.
-  (2001) Mechanics of Robotic Manipulation. MIT Press. Cited by: §IV-C.
-  (2020) Inferring the material properties of granular media for robotic tasks. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I.
-  (2020) STReSSD: Sim-to-real from sound for stochastic dynamics. In Conference on Robot Learning (CoRL), Cited by: §I.
-  (2004) Fast contact reduction for dynamics simulation. In Game Programming Gems 4, A. Kirmse (Ed.), pp. 253–263. Cited by: §II-A2, §III.
-  (2021) Sim-to-real for robotic tactile sensing via physics-based simulation and learned latent projections. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I.
-  (2022) NVIDIA Omniverse Platform. Note: https://developer.nvidia.com/nvidia-omniverse-platform[Online; accessed 1-January-2022] Cited by: §-C1, §VII-A.
-  (2022) NVIDIA PhysX SDK. Note: https://github.com/NVIDIAGameWorks/PhysX[Online; accessed 1-January-2022] Cited by: 1st item, §III, §IV.
-  (2006) A modular haptic rendering algorithm for stable and transparent 6-DOF manipulation. IEEE Trans. Robot.. Cited by: §II-A2.
-  (2017) DeepLoco: Dynamic locomotion skills using hierarchical deep reinforcement learning. ACM Trans. Graph.. Cited by: §I.
-  (2017) Learning locomotion skills using DeepRL: does the choice of action space matter?. In Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation, Cited by: §IV-C.
-  (2018) Robot dynamics lecture notes. Cited by: 3rd item, §-D4, 2nd item.
-  (2016) Stable simulation of underactuated compliant hands. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II-A2.
-  (2021) Learning to walk in minutes using massively parallel deep reinforcement learning. In Conference on Robot Learning (CoRL), Cited by: §I, §V-F.
-  (2020) Meta-reinforcement learning for robotic industrial insertion tasks. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §II-B1, §II-B2.
-  (2020) Learning to scaffold the development of robotic manipulation skills. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II-B1, §II-B2.
-  (2021) Taxim: An example-based simulation model for GelSight tactile sensors. arXiv:2109.04027 [cs]. Cited by: §I.
-  (2009) Robotics. Springer. Cited by: §-D4, 1st item.
-  (2022) Houdini. Note: https://www.sidefx.com/products/houdini/[Online; accessed 1-January-2022] Cited by: §II-A4.
-  (2020) Sim-to-real transfer of bolting tasks with tight tolerance. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §II-A4, §II-B1, §II-B2.
-  (2020) Deep reinforcement learning for contact-rich skills using compliant movement primitives. arXiv:2008.13223 [cs]. Cited by: §II-B1, §II-B2.
-  (2020) Reinforcement learning for assembly robots: A review. Proc. Manuf. Syst.. Cited by: §II-B.
-  (2018) Can robots assemble an IKEA chair?. Sci. Robot.. Cited by: §I.
-  (2021) Habitat 2.0: Training home assistants to rearrange their habitat. In Conference on Neural Information Processing Systems (NeurIPS), Cited by: §I.
-  (2016) Chrono: An open source multi-physics dynamics engine. In High Performance Computing in Science and Engineering, Cited by: §II-A4.
-  (2018) Learning robotic assembly from CAD. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II-B2.
-  (2017) Domain randomization for transferring deep neural networks from simulation to the real world. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Cited by: §VII-A.
-  (2012) MuJoCo: A physics engine for model-based control. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Cited by: §II-A4.
-  (2018) Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. arXiv:1707.08817 [cs]. Cited by: §II-B1, §II-B2.
-  (2019) A practical approach to insertion with variable socket position using deep reinforcement learning. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II-B1, §II-B2.
-  (2019) Robots assembling machines: learning from the World Robot Summit 2018 Assembly Challenge. Adv. Robot. Cited by: 2nd item, §I.
-  (2021) Learning sequences of manipulation primitives for robotic assembly. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II-B1, §II-B2.
-  (2020) TACTO: A fast, flexible and open-source simulator for high-resolution vision-based tactile sensors. arXiv:2012.08456 [cs, stat]. Cited by: §I.
-  (2004) Mechanical Assemblies: Their Design, Manufacture, and Role in Product Development. Oxford University Press. Cited by: §IV-C.
-  (2021) OSCAR: Data-driven operational space control for adaptive and robust robot manipulation. arXiv:2110.00704 [cs]. Cited by: 4th item.
-  (2019) MAT: Multi-fingered adaptive tactile grasping via deep reinforcement learning. In Conference on Robot Learning (CoRL), Cited by: §I.
-  (2021) Learning dense rewards for contact-rich manipulation tasks. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II-B1, §II-B2.
-  (2020) SAPIEN: A simulated part-based interactive environment. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §I.
-  (2014) Implicit multibody penalty-based distributed contact. IEEE Trans. Visual. Comput. Graphics. Cited by: §II-A4, §II-B1, 6th item.
-  (2021) RoboAssembly: Learning generalizable furniture assembly policy in a novel multi-robot contact-rich simulation environment. arXiv:2112.10143 [cs]. Cited by: §II-B1.
-  (2021) Interpreting contact interactions to overcome failure in robot assembly tasks. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §II-B1.
-  (2020) Necessity for more realistic contact simulation. In Robotics: Science and Systems (RSS) Workshop on Visuotactile Sensors for Robust Manipulation, Cited by: §I.
-  (2021) Learning insertion primitives with discrete-continuous hybrid action space for robotic assembly tasks. arXiv:2110.12618 [cs]. Cited by: §II-B1, §II-B2.
-  (2020) Towards robotic assembly by predicting robust, precise and task-oriented grasps. arXiv:2011.02462 [cs]. Cited by: §II-B1.
-  (2020) robosuite: A modular simulation framework and benchmark for robot learning. arXiv:2009.12293 [cs]. Cited by: §II-B1, §IV-C.