I Introduction
Many tasks in robot manipulation require handling general tools in the wild; in the future, we believe that robots will be able to grab any tool and perform meaningful control in order to accomplish various tasks and exchange forces with the environment. To manipulate tools skillfully and robustly, we will need end effectors whose hardware allows controllable hand-tool interaction, while also sensing this interaction to enable closed-loop feedback.
Parallel-jaw grippers are sufficient for grasping [antipodal], but quickly meet limitations when it comes to forceful tool use. Even when sensing is provided via finger attachments [gelslim], the hardware often relies on friction to handle the forces that arise in hand-tool interaction, and may lack the ability to resist spatial forces about some of the axes. For example, torques applied perpendicular to the finger surface are hard to resist, yet arise in many tool-use scenarios [holladay]. Multi-fingered hands are much more versatile, but existing solutions (e.g. imposing a full-rank grasp matrix on the tool [liandsastry, cutkosky]) also rely on frictional contacts, which can limit the amount of force they can exert.
More fundamentally, the rigidity of most of our hardware requires non-smooth contact forces to be used to resist external forces in tool use. Such forces can be notoriously hard to control robustly, as their behavior changes instantaneously [suh2021bundled]. In the absence of the challenges brought by rigid contacts, custom tool changers that are rigidly attached to the robot have demonstrated impressive capability to achieve finely controlled force interaction with the environment [grinder]. However, such solutions require modifying tools with specialized handles compatible with the tool changer, which limits the robot's ability to use unmodified tools.
To alleviate the difficulties coming from rigidity and the non-smooth behavior that it brings, we ask the following question in this work: can we consider visuotactile hardware not only as a mechanism for sensing, but also as an opportunity to provide compliance for control? Indeed, similar ideas have been proposed in Series Elastic Actuators (SEAs) [sea]; by attaching a soft spring, whose deformation can be measured, at the output of the gearbox, SEAs have been successful in achieving smooth and stable force control by turning the problem into one of position control [sea, seathesis, hoganimpedancecontrol].
How can we generalize the benefits of SEAs to the setting of grasping and using arbitrary tools? We propose an answer that attaches soft, spatially compliant elements at the end effector, right where interaction with the tool occurs. Mechanically, such a solution can be attached to a low-cost, position-controlled robot, while still achieving the benefits of SEAs in the interaction between the end effector and the tool. Through our solution, we aim to achieve a 6D generalization of SEAs that is useful for spatial tool use.
Our characterization of spatial series-elastic actuators would not be complete unless we can measure the deformation of the spatial compliance in real time and use the feedback for force control. In order to achieve this, we leverage recent advances in visuotactile sensing that measure 6D deformation using vision [gelsight, gelslim, bubblegrippers, bigbubble]. In contrast to many works that utilize deep learning to directly process data from visuotactile sensors in an end-to-end manner, we propose to measure the pose of the grasped tool relative to the end effector, abstracting visuotactile sensing as a relative pose estimator.
Our proposed framework of Series Elastic End Effectors in 6D (SEED) consists of three elements: a manipulator capable of accurate position control, a 6D spatially compliant stiffness element, and visuotactile sensing that measures the deformation of the spatial compliance. With these three elements, we show that we can achieve spatial force control of tools with closed-loop feedback from visuotactile sensing.
II Literature Review
II-A Tool-Use in Manipulation
Tool-use has long been one of the hallmarks of intelligence [animaltooluse], as well as a practical problem to solve for robotic applications. As such, many existing works [toussainttooluse, holladay] center on how to give robots the ability to use tools. However, only a few works attempt to perform explicit force control with a tool that has not been rigidly attached to the robot, but rather must be grabbed before it can be used.
Most existing works in this setting focus on planning, where the grabbed tool must be used to manipulate the pose of another object [toussainttooluse, holladay, pushandpull]. Such plans can be very useful for reaching confined spaces [pushandpull] or beyond the workspace of the manipulator [toussainttooluse]. However, as the focus of these works lies more in planning, tasks that require force exchange among static objects, such as using a squeegee, surface grinding, wiping a table, or using torque drivers, are often not considered.
On the other extreme, classical works in robot force control excel at forceful manipulation with rigidly attached tools. Strategies such as impedance control [hoganimpedancecontrol] and hybrid force-velocity control [hybridpositionforce] have been extensively tested and applied to problems that require force exchange between the robot and the environment [grinder, albuschaffer2, forcecontrol]. However, customized tool changers are quite limited in terms of versatility in the wild.
Finally, works that attempt to explicitly apply forces with the grasped tool [toussaintforce] often run into hardware limitations, as typical parallel-jaw grippers with rigid, flat fingers are unable to resist forces and provide compliance in certain directions due to their relatively small contact patch.
II-B Manipulation using Visuotactile Sensing
Visuotactile sensors [gelsight, gelslim, bubblegrippers] consist of a deformable membrane which interacts with objects, and a camera (color, depth, or both) under the membrane to measure its deformation during interactions. As the measurements from visuotactile sensors are images, some works have leveraged deep learning approaches to learn the dynamics [swingbot], or to directly learn a map from the input image or optical flow to the policy [visionandtouch, tactilerl]. While such approaches can be effective, in this work we first focus on interpretable abstractions that are more conducive to understanding, and that may provide more inductive bias [inductivebias] for designing deep models in the future.
Other works have taken a more model-based route. In [tactiledexterity], visuotactile sensors are used to track geometric features of objects, such as lines and points. These features are utilized to track the pose of the object and the contact state, which is then used for feedback control. Similarly, [cablemanipulation] tracks the state of a deformable cable by estimating the contact patch ellipse, and fits a linear dynamics model which is stabilized by LQR. Although we use a similar model-based approach, our work is unique in that we generalize the estimator spatially, then explicitly perform force control.
II-C Tactile Force and Pose Estimation
Many of the existing works in tactile pose/force estimation attempt to deal with dense measurements. In [contactpatchposeestimation], ICP is applied to dense depth information in order to estimate the pose of the object. Similarly, [tactileposeestimation] uses geometric contact rendering, which is then compared with the dense tactile image. While such dense information is useful for classification [gelsight], it is unclear whether such dense information is necessary for control.
On the other hand, [tactiledexterity] estimates simpler features such as points and edges, and [cablemanipulation] estimates ellipsoidal contact patches that are sufficient for achieving the task. We use a representation similar to [cablemanipulation], but estimate the patch in 3D instead of localizing it on the plane. While such approaches are efficient to implement and more relevant to the task, we note that they lack geometric generalizability compared to dense information.
III Preliminary: 1D Series Elastic Actuators
In this section, we briefly review the concept of 1D SEAs, their proposed benefits, and the corresponding control strategies. Although the section will entirely be a review of previous work on SEAs, the ideas presented here will have direct correspondences with our generalization.
III-A 1D Series Elasticity
Closed-loop force control often requires a motor, a gearbox, and a force sensor in series. Typically, a relatively stiff sensor based on strain gauges is used. However, this force-feedback setting can result in instability due to high contact stiffness [sea], as well as non-collocation of sensors and actuators [hoganimpedancecontrol, flexiblevehicles, macromicromanipulator]. This prevents the use of the high gains necessary to overcome undesired effects of the gearbox.
SEAs, initially proposed in [sea], can be understood as a special case where the sensor stiffness is very low. Under this setting, force feedback enjoys better stability properties at the expense of controller bandwidth, as the spring acts like a mechanical low-pass filter [stableactuator, hoganimpedancecontrol]. For many household tool-use tasks, such as wiping with a squeegee, the loss of control bandwidth does not pose a big problem, as such tasks are usually quasi-static. Thus, one may use high-gain position control to overcome unwanted effects of the gearbox, while still maintaining stability of the system and achieving greater force accuracy [sea].
III-B Force Control of Series Elastic Actuators
We present a simple version of force control with series elastic actuators. In force control, the user supplies a desired force $f^d$, which can be turned into a desired relative position $x^d = f^d / k$ using the sensor stiffness $k$. Then, a high-gain position controller can be used to achieve this relative position. The detailed procedure is described in Algorithm 1. In practice, frequency-domain analysis can be done to carefully choose gains that stabilize the closed-loop system
[seathesis].

III-C Multi-DOF SEAs for Tool-Use
How can we utilize the benefits of SEAs in multiple degrees-of-freedom? One straightforward answer might be to connect SEAs serially at the joint level [albuschaffer2]. However, achieving accurate end-effector position and force tracking using joint-level SEAs requires fast and accurate joint-level torque sensing, which is not available on many position-controlled robots. Instead, we offer an alternative generalization of SEAs that concentrates the 6D elasticity into the end effector, while allowing the robot to remain stiff. Our generalization involves the following three components:
A 6D deformable element capable of being stiff in multiple directions simultaneously.

A mechanism to sense the spatial deformation of the above element.

A manipulator capable of controlling spatial pose of the deformable element.
IV SEED: Series Elastic End Effectors in 6D
In this section, we present Series Elastic End Effectors in 6D (SEED), a spatial generalization of 1D SEAs that satisfies the three requirements of Sec. III-C by using a soft deformable membrane, visuotactile sensing to sense the spatial deformation of the membrane, and a position-controlled manipulator to control the pose of the membrane base.
IV-A Defining 6D Series Elasticity
One of the challenges of generalizing the 1D SEA using a spatially compliant element comes from defining an appropriate notion of spatial stiffness [cartesianmatrix], especially for large rotations (rotations up to 30 degrees are common in our experiments). Rotational stiffness has traditionally been defined on the roll-pitch-yaw and axis-angle parameterizations of rotations [spatialimpedanceaxisangle, natale], which can be made to work for large rotations.
In this work we have chosen the bushing model, which was initially proposed as a coordinate-free parameterization of a bushing element in Drake [drake]. The bushing model also works for large rotations, and can be interpreted more intuitively due to its correspondence to a spring-loaded gimbal (Fig. 1). Based on the bushing model, we will develop a generalized stiffness map that relates the relative pose between two frames to a spatial force.
IV-B Frame Definition
We give the definition of the frames here in order to better ground our notion of 6D series elasticity in the setting of a soft and tactile hand grabbing a tool. At the moment of grasp between the soft hand and the tool, two frames are initialized: $C$, which is rigidly attached to the gripper at a predefined nominal location (e.g. the center of the gripper), and $T$, which is rigidly attached to the tool and initialized to be coincident with $C$ (i.e. identity relative transform).

IV-C The Generalized Stiffness Map
Given the definition of these frames, our goal is to characterize the relation between the relative pose of $T$ with respect to $C$ (denoted as ${}^{C}X^{T} \in SE(3)$) and the spatial force (written in frame $C$) applied on $T$, which we denote as $F$. We abstractly denote this as a generalized stiffness map $\mathcal{K}$ such that Eq. 1 holds:

$F = \mathcal{K}({}^{C}X^{T})$   (1)
We expect $\mathcal{K}$ to be a generalized notion of stiffness with smoothness and monotonicity properties under the following assumption of no slip.
Assumption 1.
No slip occurs between the contact patch of the gripper and the tool, such that $\mathcal{K}$ smoothly maps relative transforms to spatial forces.
We now concretely describe the bushing model $\mathcal{K}$. We denote $q = [r, p, y]^\top$ as the roll-pitch-yaw parametrization (which lives on a gimbal) of the rotational part of ${}^{C}X^{T}$, and $d$ as the position component of ${}^{C}X^{T}$. Similarly, the spatial force $F$ is divided into torques $\tau$ and forces $f$. Then, the bushing model gives the spatial force for a pose using the following relation:
$\tau = N(q)^{-\top} K_q\, q, \qquad f = K_d\, d$   (2)
where $K_q$ is the gimbal stiffness matrix, and $K_d$ is the standard translational stiffness matrix. $N(q)$ is the coordinate transformation matrix necessary to convert gimbal torques to spatial torques, and is given by

$N(q) = \begin{bmatrix} \cos y \cos p & -\sin y & 0 \\ \sin y \cos p & \cos y & 0 \\ -\sin p & 0 & 1 \end{bmatrix}$   (3)
The necessity of the matrix $N(q)^{-\top}$ becomes apparent by visualizing $K_q q$ as torques that are being exerted on each axis of the gimbal, while $\tau$ is defined spatially in frame $C$. We obtain the matrix by equating the power of the spatial representation, $\tau^\top \omega$, to the power of the gimbal representation, $(K_q q)^\top \dot{q}$, and using the standard conversion $\omega = N(q)\dot{q}$ between angular velocities and gimbal rates. Throughout our work, we make the following assumption on the structure of $K_q$ and $K_d$.
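As a concrete sketch of the bushing model, the deformation-to-wrench map can be written in a few lines. Note that the roll-pitch-yaw convention $R = R_z(y)R_y(p)R_x(r)$, the world-frame relation $\omega = N(q)\dot{q}$, and all symbol names here are assumptions of this sketch, not taken from the original:

```python
import numpy as np

def gimbal_matrix(q):
    """N(q) relating roll-pitch-yaw rates to spatial angular velocity,
    w = N(q) @ qdot, for the convention R = Rz(y) @ Ry(p) @ Rx(r)."""
    r, p, y = q
    return np.array([
        [np.cos(y) * np.cos(p), -np.sin(y), 0.0],
        [np.sin(y) * np.cos(p),  np.cos(y), 0.0],
        [-np.sin(p),             0.0,       1.0],
    ])

def bushing_wrench(q, d, Kq, Kd):
    """Spatial torque and force from rpy deformation q and translation d.
    Kq, Kd are length-3 arrays of diagonal stiffnesses (Assumption 2)."""
    N = gimbal_matrix(q)
    tau = np.linalg.solve(N.T, Kq * q)  # N^{-T} (Kq q): gimbal -> spatial torque
    f = Kd * d                          # translational part is linear
    return tau, f
```

Note that $N(q)$ is singular at $|p| = \pi/2$ (its determinant is $\cos p$), which is the gimbal singularity discussed in the proof of Theorem 1 below.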
Assumption 2.
The gimbal stiffness matrix $K_q$ and the translational stiffness matrix $K_d$ are positive definite diagonal matrices.
Under Assumption 2, we present the following theorem, which gives a more rigorous notion of the smoothness mentioned in Assumption 1.
Theorem 1.
The bushing model stiffness map $\mathcal{K}$ is a diffeomorphism under Assumption 2 everywhere for $|p| < \pi/2$.
Proof.
Since there is no coupling between the orientation and translational maps, it suffices to show separately that each is a diffeomorphism. The translational map, given by $f = K_d d$, is trivially a diffeomorphism under Assumption 2. We use the Inverse Function Theorem to prove the inverse differentiability of the orientation map. The determinant of the Jacobian of the orientation map is given by
(4) 
where $k_r, k_p, k_y$ are the diagonal elements of $K_q$; the determinant is well defined everywhere in $|p| < \pi/2$. We complete the proof by noting that the orientation map is bijective, which we show by providing a well-defined nonlinear inverse:
(5) 
Note that the equation is written in semi-implicit form to save space: one can easily make it explicit by substituting values starting from the bottom row. ∎
Theorem 1 tells us that our model of $\mathcal{K}$ has the desirable property of smoothly mapping back and forth between relative pose deformation and spatial force, which we can use to perform force and impedance control in a manner akin to SEAs. We also note that in hardware, we expect the deformation to be confined to a bounded range before Assumption 1 is broken and slip occurs.
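The invertibility guaranteed by Theorem 1 can also be checked numerically. The sketch below recovers the gimbal deformation from a spatial torque with Newton's method and a finite-difference Jacobian, standing in for the closed-form inverse of Eq. 5; the roll-pitch-yaw convention and stiffness values are assumed for illustration:

```python
import numpy as np

def gimbal_matrix(q):
    """N(q) with w = N(q) @ qdot, assuming R = Rz(y) @ Ry(p) @ Rx(r)."""
    r, p, y = q
    return np.array([
        [np.cos(y) * np.cos(p), -np.sin(y), 0.0],
        [np.sin(y) * np.cos(p),  np.cos(y), 0.0],
        [-np.sin(p),             0.0,       1.0],
    ])

def spatial_torque(q, Kq):
    """Orientation part of the bushing model: tau = N(q)^{-T} Kq q."""
    return np.linalg.solve(gimbal_matrix(q).T, Kq * q)

def invert_torque(tau, Kq, iters=50, eps=1e-7):
    """Recover the rpy deformation q from a spatial torque tau by Newton's
    method; a numeric stand-in for the closed-form inverse of Eq. 5."""
    q = tau / Kq  # small-angle initial guess
    for _ in range(iters):
        res = spatial_torque(q, Kq) - tau
        # Central-difference Jacobian, column j = d(res)/d(q_j).
        J = np.column_stack([
            (spatial_torque(q + eps * e, Kq) - spatial_torque(q - eps * e, Kq))
            / (2 * eps)
            for e in np.eye(3)
        ])
        q = q - np.linalg.solve(J, res)
    return q
```

For deformations well inside $|p| < \pi/2$, the iteration converges to the unique preimage, consistent with the diffeomorphism property.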
V Force Control with SEED
Now we present our main algorithms for control with SEED, which follow the general philosophy of controllers using SEAs: a force control problem is turned into a position control problem [sea]. Thus, we assume access to a manipulator that can achieve reliable position commands with high gains and rates (akin to how SEAs use high gains to quickly overcome gearbox effects and achieve accurate positions), which describes most position-controlled manipulators with high mechanical repeatability.
V-A Problem Setup: Feedback and Action
To set up the control problem, we note that the position-controlled manipulator can command the end-effector pose at high rates using direct inverse kinematics or integration of differential inverse kinematics. Our feedback signal will come from the estimation of the relative pose ${}^{C}X^{T}$, which is measured by the visuotactile method given in Sec. VII. Then, the goal is to find a policy that achieves some desired specification of the user.
Throughout the section, we will assume we have some estimate of the parameters $K_q$ and $K_d$ that define the generalized stiffness map, and denote the estimated map as $\hat{\mathcal{K}}$.
V-B 6D Force Control
In force control, the user specifies some desired spatial force $F^d$, described in the world frame. SEED achieves this specified spatial force by converting it into a desired relative transform with the inverse of the estimated generalized stiffness map, $\hat{\mathcal{K}}^{-1}$. Then, a position command is sent to the manipulator to achieve this relative pose. We describe the detailed process in Algorithm 2.
The expression for the orientation part of $\hat{\mathcal{K}}^{-1}$ has been given in Eq. 5, while inverting the position simply requires $d = K_d^{-1} f$. We note that, like most force control strategies, the controller will not be well-behaved if there is no contact with an external environment. In particular, while position-only force control can move until contact and maintain some desired force, the orientation-torque controller must keep rotating until contact, which likely runs into workspace limitations of the manipulator quickly and makes the controller impractical to use in free space.
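The wrench-to-deformation conversion at the heart of Algorithm 2 can be sketched as follows. This is a minimal sketch with symbol names of our choosing: the small-angle approximation $q = K_q^{-1}\tau$ stands in for the closed-form orientation inverse of Eq. 5, which should be used for large deflections:

```python
import numpy as np

def seed_force_control_step(F_des, Kq_hat, Kd_hat):
    """One cycle of SEED force control: map a desired wrench [tau; f]
    to a desired relative deformation (q_des, d_des), which a high-gain
    position controller then realizes.
    Kq_hat, Kd_hat are the identified diagonal stiffnesses."""
    tau_des, f_des = F_des[:3], F_des[3:]
    q_des = tau_des / Kq_hat   # orientation: small-angle stand-in for Eq. 5
    d_des = f_des / Kd_hat     # translation: d = Kd^{-1} f
    return q_des, d_des
```

The returned deformation is then composed with the current tool pose and sent to the manipulator as a position command, turning force control into position control as in the 1D SEA case.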
V-C 6D Hybrid Force/Pose Control
In many tasks involving tools, the goal is to simultaneously control force and torque in certain directions, while controlling position and orientation in other directions. We naturally extend spatial force control with SEED to this setting by defining a partial inverse of the impedance map that attempts to construct the spatial deformation from a subset of specified forces.
V-C1 Hybrid Force/Position Control
Given a task-relevant decomposition matrix $S$, which selects the subspace for the desired force $f^d$, with the desired position $d^d$ specified on the complement, we can compute the position that achieves the specified positions and forces:

$d = K_d^{-1} S^\top f^d + \bar{S}^\top d^d$   (6)

where $\bar{S}$ is the matrix that represents the orthogonal complement of $S$.
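With diagonal stiffness (Assumption 2) and an axis-aligned decomposition, the split of Eq. 6 reduces to a per-axis selection. A minimal sketch, with a boolean mask standing in for $S$:

```python
import numpy as np

def hybrid_force_position(force_mask, f_des, d_des, Kd_hat):
    """Commanded translational deflection: axes where force_mask is True
    are force-controlled (deflection = f / k), the rest are
    position-controlled (deflection = d_des directly).
    Assumes axis-aligned decomposition and diagonal Kd_hat."""
    force_mask = np.asarray(force_mask, dtype=bool)
    return np.where(force_mask, f_des / Kd_hat, d_des)
```

A general (non-axis-aligned) $S$ would instead project the desired quantities through $S^\top$ and $\bar{S}^\top$ as in Eq. 6.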
V-C2 Hybrid Torque/Orientation Control
Unlike force/position control, coordinate transformations in rotational space are not linear. Thus, defining hybrid torque/orientation control for an arbitrary task-relevant coordinate representation is significantly more difficult. To deal with this problem, we make the following assumption:
Assumption 3.
The decomposition of the specified orientations and torques happens along the axes of the roll-pitch-yaw gimbal.
Such an assumption is not too restrictive for a large class of tools, as most tools require decomposition of torques and angles in a manner consistent with their natural task-relevant coordinate frames. Under this assumption, we can define partial maps from a subset of desired torques to the full orientation as follows:

2 torques, 1 angle: The following angles achieve the given two desired torques and one desired angle, given the stiffness map:
(7) 
1 torque, 2 angles: The following angles achieve the given torque and the two desired angles, given the stiffness map:
(8)
After recovering the full pose from a subset of desired forces and torques, we use position control to command this pose, as done in Alg. 2.
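The "2 torques, 1 angle" partial map can also be realized numerically by holding the specified angle fixed and solving for the remaining two angles; the root-finding sketch below is a numeric stand-in for the closed-form expression of Eq. 7, under the same assumed roll-pitch-yaw conventions as before:

```python
import numpy as np

def gimbal_matrix(q):
    """N(q) with w = N(q) @ qdot, assuming R = Rz(y) @ Ry(p) @ Rx(r)."""
    r, p, y = q
    return np.array([
        [np.cos(y) * np.cos(p), -np.sin(y), 0.0],
        [np.sin(y) * np.cos(p),  np.cos(y), 0.0],
        [-np.sin(p),             0.0,       1.0],
    ])

def spatial_torque(q, Kq):
    """Orientation part of the bushing model: tau = N(q)^{-T} Kq q."""
    return np.linalg.solve(gimbal_matrix(q).T, Kq * q)

def two_torques_one_angle(tau_xy, yaw, Kq, iters=50, eps=1e-7):
    """Solve for roll and pitch so the first two spatial torque components
    match tau_xy while yaw is held at its specified value."""
    rp = np.array([tau_xy[0] / Kq[0], tau_xy[1] / Kq[1]])  # linear guess

    def res(v):
        return spatial_torque(np.array([v[0], v[1], yaw]), Kq)[:2] - tau_xy

    for _ in range(iters):
        r0 = res(rp)
        J = np.column_stack([
            (res(rp + eps * e) - res(rp - eps * e)) / (2 * eps)
            for e in np.eye(2)
        ])
        rp = rp - np.linalg.solve(J, r0)
    return np.array([rp[0], rp[1], yaw])
```

The "1 torque, 2 angles" case follows the same pattern with one unknown instead of two.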
VI System Identification
In order to apply our framework, we need to identify the parameters of the stiffness map, which consist of the diagonal stiffness values of $K_q$ and $K_d$. The stiffness parameters can be identified by measuring the static sensitivity of the wrench with respect to the pose. We achieve this by having a dexterous manipulator grab a 6-axis force/torque sensor and perturbing the pose to observe responses in the wrench.
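Under Assumption 2 and a quasi-static setting, each diagonal stiffness can be fit independently by least squares from (deflection, wrench) pairs. A minimal sketch (symbol names are ours):

```python
import numpy as np

def identify_axis_stiffness(deflections, wrench_components):
    """Fit a single per-axis stiffness k from static measurement pairs
    by scalar least squares: k = argmin ||k * x - w||^2 = (x . w) / (x . x)."""
    x = np.asarray(deflections, dtype=float)
    w = np.asarray(wrench_components, dtype=float)
    return float(x @ w / (x @ x))
```

Repeating this per axis, for several internal pressures, yields stiffness-versus-pressure curves like those summarized below.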
In addition, to see if squeezing or pressurizing the gripper changes the stiffness parameters of the hand, we use the pressure sensor on board the soft-bubble hand [bubblegrippers] to characterize how the gripper distance affects the pressure, and in turn, how the pressure affects the identified stiffness values. The results of our experiments are presented in Fig. 2.
Along with the quantitative values of stiffness, we summarize our findings from the identification process:

The internal pressure of the hand depends linearly on the gripper distance.

For torque about one of the axes, and for all the forces, higher pressure corresponds near-linearly to higher stiffness. The identified stiffness values also have low standard deviations.

For torque about the remaining axes, the measurement is relatively unreliable and the identified stiffness values are subject to large standard deviations. In addition, higher pressure does not seem to lead to higher stiffness values along these directions.
The results of system identification, combined with the monotonicity of the stiffness map, lead to a very natural interpretation: if stiffer behavior is desired while controlling the tool, the hardware gives us the means to control the stiffness by grabbing the tool more firmly or by pressurizing further.
VII Tactile Relative Pose Estimation
In principle, our control framework can work well with any tactile end effector that is compliant enough, together with an estimation algorithm that produces a well-behaved estimate of the relative pose ${}^{C}X^{T}$. In our work, we show an example of such a relative pose estimator by utilizing the PicoFlexx IR-depth camera mounted within the bubble grippers [bubblegrippers].
VII-A Contact Patch Estimation
We estimate the position of the contact patch using a simple background subtraction algorithm. Denote $D_t$ as the depth image at time $t$. Then, we simply compare $D_t$ to the initial depth image $D_0$, taken when the bubble is not in contact. After performing a thresholding operation to obtain the difference, we perform a morphological transformation using an elliptical kernel to obtain a binary mask $M_t$. We then use the calibration matrix to transform the masked depth image $M_t \odot D_t$ into a set of points expressed in the left camera frame, where $\odot$ denotes elementwise multiplication. Finally, we take the mean to obtain the 3D coordinate of the left contact patch, and repeat this process for the right camera.
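The patch-extraction steps above can be sketched as follows; the threshold value is illustrative, and the morphological cleanup with an elliptical kernel is omitted here:

```python
import numpy as np

def contact_patch_centroid(depth0, depth_t, points, threshold=0.002):
    """Background subtraction sketch: pixels whose depth decreased by more
    than `threshold` (membrane pushed toward the camera) form the contact
    mask; return the mean 3D point over the masked pixels.
    `points` is an (H, W, 3) array of deprojected 3D coordinates obtained
    from the camera calibration."""
    mask = (depth0 - depth_t) > threshold
    if not mask.any():
        return None  # no contact detected
    return points[mask].mean(axis=0)
```

The same routine is run once per camera to obtain the left and right patch centroids used in the next subsection.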
VII-B Frame Estimation from Contact Patches
Given the locations of the contact patches on the left and right bubbles expressed in the gripper frame, $p_L$ and $p_R$, we average the positions of the two patches to obtain the position of the contact frame:

$p = \frac{1}{2}(p_L + p_R)$   (9)
To compute the rotation, we introduce an intermediate frame $B$ such that the $x$ axis of $B$ is aligned with $p_L - p_R$, and the pitch of $B$ is zero. Denote $b_x, b_y, b_z$ as the columns of the rotation matrix of $B$, which individually represent the unit vectors that define $B$. Then, we compute the columns using the following process:

Set $b_x$ to be the normalization of $p_L - p_R$.

Set $b_y$ to define the zero-pitch frame, orthogonal to $b_x$.

Compute $b_z$ by using $b_z = b_x \times b_y$, and normalize.

Set $R_B = [b_x \; b_y \; b_z]$.
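The frame construction above can be sketched as follows; the choice of the gripper $z$ axis as the "up" direction for the zero-pitch constraint, and the axis ordering, are assumptions of this sketch:

```python
import numpy as np

def contact_frame(p_left, p_right):
    """Build the contact frame from the two patch centroids (gripper frame):
    x axis along the patch-to-patch vector, y axis horizontal (zero pitch),
    z axis completing a right-handed frame."""
    p = 0.5 * (p_left + p_right)        # Eq. 9: midpoint of the two patches
    bx = p_left - p_right
    bx = bx / np.linalg.norm(bx)
    up = np.array([0.0, 0.0, 1.0])      # gripper z axis (assumed convention)
    by = np.cross(up, bx)               # horizontal, orthogonal to bx
    by = by / np.linalg.norm(by)
    bz = np.cross(bx, by)               # completes the right-handed frame
    R = np.column_stack([bx, by, bz])
    return p, R
```

The construction degenerates when the patch-to-patch vector is vertical, which does not occur for the grasps considered here.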
VII-C Pitch Estimation with Optical Flow
After computing $R_B$, we compute the pitch by estimating the rotation about the $y$ axis of $B$. We estimate this quantity by computing the optical flow of the IR image. We denote $v_t$ as the Eulerian flow of the image $I_t$ relative to $I_{t-1}$. Then, we compute the curl of $v_t$:

$\theta = c \sum \left( \frac{\partial v^{y}}{\partial x} - \frac{\partial v^{x}}{\partial y} \right)$   (10)

where the superscript denotes the component of the vector field, and $c$ is some normalization constant we calibrate for. The gradients are computed using a Sobel filter with the corresponding kernels.
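The curl computation of Eq. 10 can be sketched with numpy; here np.gradient is used in place of the Sobel filter from the text, and averaging replaces the calibrated sum:

```python
import numpy as np

def rotation_from_flow(vx, vy, c=1.0):
    """Estimate the in-plane rotation angle from an optical-flow field
    (vx, vy) by averaging its curl. `c` is a calibration constant;
    rows index y and columns index x."""
    dvy_dx = np.gradient(vy, axis=1)   # d v^y / d x
    dvx_dy = np.gradient(vx, axis=0)   # d v^x / d y
    curl = dvy_dx - dvx_dy
    return c * curl.mean()
```

As a sanity check, a pure rotational flow field $v = (-y, x)$ has curl 2 everywhere, so the estimator returns 2 up to the calibration constant.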
VII-D Validation Results
In order to validate the performance of the proposed relative pose estimator, we use the same setup that was used for system identification (Fig. 2). Through the forward kinematics of the manipulator, and the fact that the grasp transform is fixed in the system identification setup, we compare the measured values of the relative pose with the results of the relative pose estimator. Our results, illustrated in Fig. 4, show that the tracking performance of the relative pose depends on which axis is being tracked:

Position along one of the axes, which uses depth information from each camera, can be tracked reliably.

On the other hand, position tracking along the remaining axes is not very reliable due to the large contact patch caused by the cylindrical geometry of the tool.

The locations of the contact patches on both sides give a very good estimate of the roll angle. Optical flow is also successful in tracking pitch.

While yaw shows reasonable behavior, the estimate tends to underestimate the true yaw angle, as the contact patch lags behind the true rotation due to the softness of the membrane (i.e. perfect rolling does not occur).
VIII Experiment Methods & Results
VIII-A Simulation Methods & Results
To verify the performance of our proposed pipeline, we first set up a simulation in Drake [drake], where the compliance between the tool frame and the gripper frame is simulated using Drake's 6D compliance element, LinearBushingRollPitchYaw. By assuming a perfect measurement of the relative pose, we aim to decouple the validity of the proposed controller from the accuracy of the tactile relative pose estimator.
VIII-A1 The Squeegee Task
The squeegee is a tool that requires regulation of spatial forces along some principal axes, while requiring regulation of position along others. We illustrate the frame definition in Fig. 5, and decompose the spatial forces and positions along the following directions in order to set a task specification for the hybrid force-position controller:

The translations in the table plane are used for position control in order to specify the trajectory of the tool from a tabletop view.

The normal force and the associated torque are used to enforce the magnitude of pressure between the blade and the table.

The remaining torque is used to enforce an equal pressure distribution.
As a baseline, we include an open-loop trajectory that is tuned such that the squeegee barely contacts the table, within the mechanical repeatability of the manipulator (0.1 mm). In addition, we modify the controller for the case where the tool is rigidly fixed (welded) to the end effector in order to simulate the performance of a custom tool changer. The resulting contact forces are inspected based on how much force is exerted, and on how well the pressure distribution on the blade is balanced. The resulting trajectory is shown in Fig. 6.
We show that, compared to the case where the end effector is rigidly attached, the compliant hardware allows much better tracking of the pressure-balancing torque, such that equal pressure is applied on both sides of the squeegee. We mainly attribute this behavior to the built-in compliance, as this torque behaves well even in the open-loop setup. By commanding the desired force in closed loop, however, the 6D hybrid force-position controller adds the ability to exert the desired amount of force. Finally, we note that there exists an offset in the tracking error due to the unaccounted weight of the tool.
VIII-B Hardware Methods & Results
Though we have verified the behavior of the controller in simulation assuming perfect pose tracking, demonstrating the controller on hardware requires coupling the pose estimator and the controller in all six axes. However, the estimator is unreliable in certain directions, such as yaw or some of the position axes, which can destabilize closed-loop behavior.
In order to overcome these limitations of the estimator, we propose a simple yet effective strategy: we purposely align the axes that require force tracking with the axes in which our estimator performs well. As most tasks require at most two or three components of force tracking, we show that it is possible to estimate only a subset of the relative pose well, and still achieve the underlying task.
VIII-B1 Pen Writing Task
We first test the controller on a pen-writing task, where the robot is commanded to write some characters in the plane of the paper, while some force is commanded in the normal direction. Our setup is illustrated in Fig. 7.D. As the result in Fig. 7.B demonstrates, our controller achieves good tracking performance of the specified force, as observed by the differences in marker stroke width and darkness.
We also test our controller by writing letters in Fig. 7.C. While we are successful in tracking the characters, the inherent softness of the hardware sacrifices the bandwidth of the position controller, and frictional interactions between the marker and the paper (e.g. those caused by the Painlevé effect) can compromise the tracking performance of SEED.
VIII-B2 Squeegee Task
We test the proposed controller on the real-life task of using a squeegee to clean liquid from the top of a cutting board. The results of our hardware experiment are shown in Fig. 7.A. While the open-loop baseline fails to exert much force on the board, the closed-loop controller is successful in pressing down firmly and clearing all the liquid.
IX Conclusion
We have presented SEED, a control and hardware framework that combines the benefits of hardware compliance with visuotactile sensing. Throughout our work, we have demonstrated that we can measure the relative pose of a tool with respect to the gripper using visuotactile sensing. Combined with offline-identified parameters of our spatial stiffness model, we have shown that we can achieve closed-loop spatial force control that is useful for tool use. With this demonstration, we aim to alleviate some of the difficulties that rigid contacts and the associated non-smooth behavior bring to the setting of grasping and using tools in the wild.