I Introduction
Impedance control is a powerful way to let robots safely interact with unknown environments. However, it is challenging and time-consuming to tune the impedance gains to achieve desired robot behaviors. Furthermore, the adjusted gains may be specific to one environment and can hardly generalize to others. Therefore, a method that can obtain the optimal gains for different scenarios is vital for the practical use of impedance control.
To adapt the gains to different scenarios, adaptive control has been studied to adjust the controller online [14, 6]. The impedance gains then adapt following a predefined adaptive control law. However, obtaining a satisfactory result still requires tuning good initial impedance gains. Moreover, most adaptive laws introduce additional hyperparameters and complicate the overall tuning process.
Machine learning is a powerful way to learn the optimal impedance controller. Well-known methods include learning from demonstrations (LfD) [15, 1, 19] and reinforcement learning (RL) [10, 11, 16, 19]. In LfD approaches such as [15, 1], the robot collects data from expert demonstrations, and a variable impedance policy is obtained by fitting the collected data. However, the learned policy is specific to the demonstrated task and may not transfer robustly to different task settings. In RL, by encoding the robot's performance into a reward function and setting variable impedance as the action space, the robot can learn a variable impedance policy that achieves the desired performance in simulation [10, 11, 16]. However, as is common for machine learning-based methods, collecting data and training the policy is time-consuming, which limits real-time application.

Optimization is also a powerful tool for gain tuning. Mehdi [13] pioneered the use of Particle Swarm Optimization (PSO) to tune the impedance controller offline. Similarly, Lahr [8] proposed a multi-objective genetic algorithm to optimize the impedance gains and demonstrated its effectiveness on a real robot. Unfortunately, these methods usually require extensive computation, and the high computation time limits their use online. One possible reason for the low efficiency is that they lack an explicit relation between the impedance gains and the robot behavior, so they must rely on black-box optimization solvers to handle an overly complicated problem. To the authors' knowledge, there are no online optimization methods in the literature for tuning the impedance controller.
To obtain the optimal impedance gains in real time, we propose an online gain optimization framework (Safe OnGOVIC). In contrast with other methods that view the impedance gains as parameters, we express the dynamics under impedance control as a control-affine system and treat the impedance gains as a control input. This provides a new perspective on gain tuning and allows us to formulate an optimization problem that optimizes the impedance gains online. We design an objective function that regulates the robot toward smooth behaviors. Compared with previous optimization and learning methods, our method does not require offline data collection and can be optimized online for different tasks and environments efficiently. Furthermore, benefiting from this structure, safety constraints can be embedded into the framework to avoid unwanted collisions by adjusting the impedance gains.
In summary, the contributions of this paper are as follows:

- A new perspective on the relationship between the impedance gains and the robot states.

- An efficient online gain optimization framework for variable impedance control.

- Collision avoidance for variable impedance control.

- Comparative experiments demonstrating the effectiveness of the proposed algorithm.
II Proposed Method
II-A Preliminary: Cartesian Space Impedance Control
The dynamics model of a 6-DOF manipulator in Cartesian space is written as [18]

$$M_x\ddot{x} + C_x\dot{x} + G_x = J^{-\top}\tau + F_e \qquad (1)$$

where $\ddot{x}$, $\dot{x}$, $x$ are the Cartesian acceleration, velocity and position of the robot end-effector, $M_x$ stands for the mass-inertia matrix, $C_x$ denotes the Coriolis matrix, $G_x$ is the gravity vector, $J$ is the Jacobian matrix, $F_e$ represents the external force, and $\tau$ stands for the torque input of the joints. By utilizing the impedance control law [18]

$$\tau = J^{\top}\left[M_x\left(\ddot{x}_d + M_d^{-1}\left(F_e - B_d\dot{e} - K_d e\right)\right) + C_x\dot{x} + G_x - F_e\right] \qquad (2)$$

the robot system acts as a mass-spring-damper system

$$M_d\ddot{e} + B_d\dot{e} + K_d e = F_e \qquad (3)$$

where $e = x - x_d$, $\dot{e}$ and $\ddot{e}$ denote the tracking position, velocity and acceleration errors, and $M_d$, $B_d$ and $K_d$ are the desired mass, damping and stiffness matrices. By selecting the $M_d$, $B_d$, and $K_d$ matrices, we can change the interaction characteristics of the robot.
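As a quick numerical sketch (our own illustration, not from the paper; the gains, initial error, and horizon are arbitrary), the scalar version of (3), $m\ddot{e} + b\dot{e} + ke = F_e$, can be integrated to see how the gain choice shapes the closed-loop response:

```python
import numpy as np

def simulate_impedance_1dof(m, b, k, f_ext=0.0, e0=0.05, T=3.0, dt=1e-3):
    """Integrate the 1-DoF closed loop m*e'' + b*e' + k*e = f_ext
    (semi-implicit Euler) starting from position error e0."""
    e, de = e0, 0.0
    traj = []
    for _ in range(int(T / dt)):
        dde = (f_ext - b * de - k * e) / m
        de += dde * dt
        e += de * dt
        traj.append(e)
    return np.array(traj)

# A critically damped choice (b = 2*sqrt(k*m)) settles without ringing,
# while a lightly damped choice overshoots and oscillates.
crit = simulate_impedance_1dof(m=1.0, b=2 * np.sqrt(100.0), k=100.0)
under = simulate_impedance_1dof(m=1.0, b=2.0, k=100.0)
print(abs(crit[-1]) < 1e-3, under.min() < 0.0)  # True True
```

The damping ratio $b/(2\sqrt{km})$ is what manual tuning implicitly adjusts; the paper's point is to select such gains by online optimization instead.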
II-B Dynamics Formulation
For impedance control (3), a necessary condition to ensure stability is to guarantee that the impedance gains are positive-definite [7]. Therefore, we can left-multiply by the inverse of $M_d$ and obtain

$$\ddot{e} + \tilde{B}\dot{e} + \tilde{K}e = M_d^{-1}F_e \qquad (4)$$

where $\tilde{B}$ and $\tilde{K}$ are the transformed gain matrices $\tilde{B} = M_d^{-1}B_d$ and $\tilde{K} = M_d^{-1}K_d$.

We assume that the gain matrices $M_d$, $B_d$, and $K_d$ are diagonal [9, 17], i.e., $M_d = \mathrm{diag}(m_1,\dots,m_6)$, $B_d = \mathrm{diag}(b_1,\dots,b_6)$, $K_d = \mathrm{diag}(k_1,\dots,k_6)$, where $m_i$, $b_i$, $k_i$ are the diagonal elements of the $M_d$, $B_d$, and $K_d$ matrices, and $F_e$ is the 6-dimensional external wrench given by the environment. We can further reformulate (4) into a control-affine form by collecting terms together:

$$\dot{s} = f(s) + g(s)u, \quad f(s) = \begin{bmatrix}\dot{e}\\ 0\end{bmatrix}, \quad g(s) = \begin{bmatrix} 0 & 0 & 0 \\ \mathrm{diag}(F_e) & -\mathrm{diag}(\dot{e}) & -\mathrm{diag}(e)\end{bmatrix} \qquad (5)$$

where the state $s = [e^\top, \dot{e}^\top]^\top$ and the input $u = [\tilde{m}^\top, \tilde{b}^\top, \tilde{k}^\top]^\top$ collects the transformed gains $\tilde{m}_i = 1/m_i$, $\tilde{b}_i = b_i/m_i$, and $\tilde{k}_i = k_i/m_i$.
Equation (5) reveals the relation between impedance gains and the robot states. With this specific formulation, we can optimize the gain based on these dynamics in order to regulate future robot trajectories.
In traditional methods, the impedance gain is treated as a parameter rather than a control input when tuning the controller. To the authors' knowledge, this is the first method to formulate the impedance gain as an input and construct a dynamical system for gain tuning. This view offers a new way to understand the tuning process and simplifies the optimization that follows.
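To make this concrete, the sketch below (our own illustration; the input ordering $u = [\tilde{m}^\top, \tilde{b}^\top, \tilde{k}^\top]^\top$ with $\tilde{m}_i = 1/m_i$, $\tilde{b}_i = b_i/m_i$, $\tilde{k}_i = k_i/m_i$ is an assumption of this sketch) assembles $f(s)$ and $g(s)$ for the state $s = [e^\top, \dot{e}^\top]^\top$ and checks that $\dot{s} = f(s) + g(s)u$ reproduces the diagonal-gain impedance dynamics:

```python
import numpy as np

def control_affine(s, F_e):
    """f(s), g(s) with s = [e; e_dot] (6-D each) and an 18-D gain input u,
    so that e_ddot = diag(F_e) @ m_til - diag(e_dot) @ b_til - diag(e) @ k_til."""
    e, de = s[:6], s[6:]
    f = np.concatenate([de, np.zeros(6)])   # drift term [e_dot; 0]
    g = np.zeros((12, 18))
    g[6:, :6] = np.diag(F_e)                # columns multiplying m_til
    g[6:, 6:12] = -np.diag(de)              # columns multiplying b_til
    g[6:, 12:] = -np.diag(e)                # columns multiplying k_til
    return f, g

rng = np.random.default_rng(0)
e, de, F_e = rng.normal(size=6), rng.normal(size=6), rng.normal(size=6)
m = rng.uniform(1, 2, 6)                    # diagonal desired inertia
b = rng.uniform(5, 10, 6)                   # diagonal damping
k = rng.uniform(50, 100, 6)                 # diagonal stiffness
u = np.concatenate([1 / m, b / m, k / m])   # transformed gains as the input
f, g = control_affine(np.concatenate([e, de]), F_e)
s_dot = f + g @ u
print(np.allclose(s_dot[6:], (F_e - b * de - k * e) / m))   # True
```

Because $u$ enters linearly, any cost or constraint on future states becomes an optimization over $u$ rather than a black-box search over $(M_d, B_d, K_d)$.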
II-C Objective Function
Minimizing the integral of time-weighted absolute error (ITAE) [12] is a commonly used objective for tuning PID parameters offline:

$$J_{\mathrm{ITAE}} = \int_0^{\infty} t\,\lvert e(t)\rvert\,dt \qquad (6)$$

where $e(t)$ is the robot position error at time $t$.
For contact-rich manipulation tasks, such as grinding and polishing, we prefer smoother trajectories. Therefore, inspired by the ITAE, we propose a new cost function, the finite-time integral of time-weighted absolute velocity error (FITAVE) in (7), which minimizes the velocity error to obtain a smooth trajectory:

$$\min_{u}\ \int_0^{T} t\,\lvert \dot{e}(t)\rvert\,dt \quad \text{s.t.}\ \dot{s} = f(s) + g(s)u \qquad (7)$$

where $s$ is the robot state, $f$ and $g$ define the robot dynamics in (5), and $u$ is the optimization variable for the impedance gains.
Fig. 1 shows an example of FITAVE. A 1-DoF robot controlled by an impedance controller makes contact with a surface whose dynamics are modeled as a spring with large stiffness. The objective is to select impedance gains that make the contact as smooth as possible. We compare the gains obtained from FITAVE with ITAE and with manually tuned gains. The simulation results corroborate that FITAVE generates smooth robot behaviors as expected. Our experimental results in Section III-C further demonstrate the effectiveness of FITAVE.
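A sketch in the spirit of that example follows (our own reconstruction; the surface stiffness, gain values, and horizon are made-up numbers). The stiff environment is modeled as a one-sided spring, and the time-weighted velocity cost is compared for a well-damped and a lightly damped gain set:

```python
import numpy as np

def contact_rollout(b_til, k_til, m_til=1.0, x_d=0.02, k_env=1e4,
                    T=2.0, dt=1e-3):
    """1-DoF impedance-controlled robot approaching a surface at x = 0;
    the environment pushes back like a stiff spring during penetration."""
    x, dx = -0.05, 0.0
    ts, es, des = [], [], []
    for i in range(int(T / dt)):
        F_e = -k_env * x if x > 0.0 else 0.0         # one-sided contact spring
        e, de = x - x_d, dx
        ddx = m_til * F_e - b_til * de - k_til * e   # 1-DoF case of the dynamics
        dx += ddx * dt
        x += dx * dt
        ts.append(i * dt); es.append(e); des.append(de)
    return np.array(ts), np.array(es), np.array(des)

def fitave(ts, des, dt=1e-3):
    """Finite-time integral of time-weighted absolute velocity (Riemann sum)."""
    return float(np.sum(ts * np.abs(des)) * dt)

t, e1, de_damped = contact_rollout(b_til=30.0, k_til=100.0)
t, e2, de_ringing = contact_rollout(b_til=5.0, k_til=900.0)
# The well-damped gains touch down gently; the stiff, lightly damped gains
# keep ringing against the surface and pay heavily in the t-weighted cost.
print(fitave(t, de_damped) < fitave(t, de_ringing))   # True
```

The time weight $t$ penalizes oscillations that persist, which is exactly the behavior a position-error cost such as ITAE can fail to discourage at a stiff contact.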
II-D Collision Constraints for Safe Interaction
Consider a safe set $\mathcal{C}$ defined as $\mathcal{C} = \{x : h(x) \ge 0\}$, where $h$ is continuously differentiable. The robot should not leave this safe set throughout the entire task execution. More formally, with the robot dynamics in (5), we want to ensure that, with proper gain selection, the state remains in $\mathcal{C}$.
According to Nagumo's theorem [3] and Zeroing Barrier Functions (Proposition 1 in [2]), a sufficient condition to ensure forward invariance of $\mathcal{C}$ is given as follows,

$$\dot{h} + \alpha(h) \ge 0 \qquad (8)$$

where $\dot{h}$ is the time derivative of $h$ given in (9) and $\alpha$ is an extended class-$\mathcal{K}$ function as suggested by Remark 6 in [2].

$$\dot{h} = \frac{\partial h}{\partial s}\dot{s} = \frac{\partial h}{\partial s}\big(f(s) + g(s)u\big) \qquad (9)$$
Therefore, ensuring (8) is sufficient to guarantee that the robot stays in the safe set, and we can construct the constraint on $u$ accordingly. In this paper, we consider a robot collision constraint $h(e)$ that depends only on the robot position. Since the first block of $g(s)$ in the robot dynamics (5) is zero (relative degree greater than 1), the impedance gains do not appear in $\dot{h}$, as shown in Appendix A, so we cannot directly obtain constraints on $u$ to regulate the robot. Instead, we note that (8) is ensured if the same condition is applied once more to $\hat{h} = \dot{h} + \alpha_1(h)$. This results in (10) as the final collision constraint; the derivation is provided in Appendix A.
$$\frac{d}{dt}\!\left(\frac{\partial h}{\partial e}\dot{e}\right) + \frac{\partial \alpha_1(h)}{\partial h}\frac{\partial h}{\partial e}\dot{e} + \alpha_2\!\left(\frac{\partial h}{\partial e}\dot{e} + \alpha_1(h)\right) \ge 0 \qquad (10)$$

where $\alpha_2$ is a second extended class-$\mathcal{K}$ function and $\ddot{e}$ is affine in the input $u$ through (5).
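For intuition, consider a worked example (our own, not from the paper) with a half-space barrier $h(e) = d - e_1$ and linear extended class-$\mathcal{K}$ functions $\alpha_1(r) = \gamma_1 r$, $\alpha_2(r) = \gamma_2 r$, so $\partial h/\partial e$ is constant. The second-order barrier condition then reduces to a single linear inequality $Au + c \ge 0$ in the gain input $u$:

```python
import numpy as np

def collision_constraint_row(e, de, F_e, d=0.1, g1=5.0, g2=5.0):
    """Affine-in-u form A @ u + c >= 0 of the second-order barrier condition
    for h(e) = d - e[0], assuming linear class-K functions (this sketch only)."""
    grad_h = np.zeros(6); grad_h[0] = -1.0       # dh/de, constant here
    h = d - e[0]
    dh = grad_h @ de
    # h_ddot = grad_h @ e_ddot, and e_ddot is affine in u = [m_til; b_til; k_til]
    A = np.concatenate([grad_h * F_e, -grad_h * de, -grad_h * e])
    c = (g1 + g2) * dh + g1 * g2 * h
    return A, c

e = np.array([0.05, 0, 0, 0, 0, 0.0])
de = np.array([0.2, 0, 0, 0, 0, 0.0])            # moving toward the boundary
A, c = collision_constraint_row(e, de, np.zeros(6))
u = np.concatenate([np.ones(6), 10 * np.ones(6), 50 * np.ones(6)])
print(A @ u + c)   # 3.75: these gains currently satisfy the barrier condition
```

As the state nears the boundary the inequality tightens, and a solver can trade gains off (e.g., raise the damping entries of $u$) to keep it satisfied.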
II-E Safe OnGOVIC Algorithm
The goal of the proposed Safe OnGOVIC is to find optimal impedance gains while ensuring safety. Since safety has the highest priority, we propose a framework consisting of two optimization levels: a high-frequency safety optimization and a low-frequency gain optimization.
For the low-frequency gain optimization in (11), the goal is to obtain the optimal impedance gains based on the online collected force data. The objective is to minimize the FITAVE cost (7) to achieve smooth behaviors:

$$u^{*} = \arg\min_{u}\ \int_0^{T} t\,\lvert \dot{e}(t)\rvert\,dt \quad \text{s.t.}\ \dot{s} = f(s) + g(s)u \qquad (11)$$
For the high-frequency safety optimization, the goal is to keep the robot satisfying the safety constraints by changing the impedance gains. Thus, we include the collision constraint (10) in (12). The objective minimizes the change of the gains, where $u^{*}$ is the optimal gain value obtained from the previous low-frequency optimization step:

$$\min_{u}\ \lVert u - u^{*}\rVert^{2} \quad \text{s.t.}\ \text{(10)} \qquad (12)$$
The high-frequency safety optimization (12) is convex, while the low-frequency gain optimization (11) is not, because of the environment force $F_e$ entering $g(s)$. In practice, both problems can be tackled by a Sequential Quadratic Programming (SQP) solver [4]. After solving the optimization, the impedance gains are recovered by

$$m_i = \frac{1}{\tilde{m}_i}, \quad b_i = \frac{\tilde{b}_i}{\tilde{m}_i}, \quad k_i = \frac{\tilde{k}_i}{\tilde{m}_i} \qquad (13)$$
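As a minimal sketch of the high-frequency step (our own illustration; with a single active linear constraint the problem has a closed-form projection, whereas the paper uses a general SQP solver, and all numbers are hypothetical), followed by the gain recovery of (13):

```python
import numpy as np

def high_freq_step(u_star, A, c):
    """min ||u - u_star||^2  s.t.  A @ u + c >= 0 (one linear inequality).
    This is the Euclidean projection of u_star onto the feasible half-space."""
    viol = A @ u_star + c
    if viol >= 0:
        return u_star.copy()                 # previous optimum is already safe
    return u_star - (viol / (A @ A)) * A     # project onto the boundary

def recover_gains(u):
    """Map u = [m_til; b_til; k_til] back to (m, b, k) diagonals, cf. (13)."""
    m_til, b_til, k_til = np.split(np.asarray(u, float), 3)
    m = 1.0 / m_til
    return m, b_til * m, k_til * m

u_star = np.array([1.0, 10.0, 50.0])         # previous low-frequency optimum
A = np.array([0.0, 1.0, 0.0])                # safety requires b_til >= 20
c = -20.0
u = high_freq_step(u_star, A, c)
m, b, k = recover_gains(u)
print(u, m, b, k)
```

Minimizing the deviation from $u^{*}$ means the safety layer perturbs the smooth-behavior gains only as much as the constraint demands.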
The whole algorithm is illustrated in Alg. 1. At runtime, the gains are initialized with random positive values. At each sampling interval, the high-frequency optimization is solved based on the current robot state, and the impedance gains are recovered by (13). At the same time, force sensor data is collected. Every second, the low-frequency gain optimization computes the optimal impedance gains and sends them to the impedance controller.
III Experiments
We test the proposed Safe OnGOVIC in three tasks: collision avoidance, surface contact, and board wiping. The proposed method is benchmarked against two baselines: 1) conventional constant-gain impedance control (CGIC) and 2) an advanced adaptive VIC (AVIC) proposed in [5].
III-A Experiment Setup
As shown in Fig. 2, we use a 6-DOF FANUC LR Mate 200iD robot to validate the proposed method; its coordinate system is also shown in Fig. 2. A Microsoft Kinect 2.0 is utilized to monitor the environment. The robot is equipped with an ATI Mini 45 F/T sensor to measure the external wrench. The Cartesian space impedance control law (2) is implemented in Simulink Real-Time on a target PC to control the robot. The proposed Safe OnGOVIC is programmed on a host PC with a communication frequency of Hz.
III-B Exp 1: Safety Guarantee in Constrained Environments
First, we evaluate the effectiveness of the high-frequency safety optimization in a constrained environment, where manipulation tasks should be accomplished within a safe region.
As shown in Fig. 2, the robot picks an object from the shelf while a human, acting as a random disturbance, pushes the robot in different directions. We want to show that Safe OnGOVIC can optimize the impedance gains so as to stay safe. The constraints in (14) keep the robot within a 'safety zone' and are enforced throughout the task,

$$h(x) = \begin{bmatrix} x - \underline{x} \\ \overline{x} - x \end{bmatrix} \ge 0 \qquad (14)$$

where $\underline{x}$ and $\overline{x}$ are the safety boundaries.
Fig. 3(a) shows snapshots of the experiment. In the beginning, the robot is away from the safety boundary, and the impedance gains are chosen to achieve a smooth behavior under external force. After the human drags the end-effector toward the boundary, the robot becomes aware of the danger and quickly adapts the gains to resist the disturbance and stay inside the safe region.
As a comparison, the results of the constant-gain impedance control law are provided on our website. They show that it is not able to stay within the safety region and avoid potential collisions with obstacles.
III-C Exp 2: Online Gain Adaptation in Unknown Environments
TABLE I: Comparison of CGIC, AVIC, and the proposed method on the plastic, metal, and wood boards: approaching time (s), settling time of the position error (s), and variance of the external force at steady state (N/A where the position error did not settle).
This experiment lets the proposed algorithm obtain the optimal impedance gains (without human tuning) so that the robot end-effector smoothly contacts different surfaces: plastic, metal, and wood boards (as shown in Fig. 2).
We first test the performance of the constant-gain baseline (CGIC). As shown in Fig. 4, its parameters are tuned to the best performance on a foam surface and then transferred to the three test surfaces. However, the constant-gain baseline cannot achieve satisfactory results in the other scenarios due to its lack of generalizability. Furthermore, the gain values do not reflect the change in environment characteristics, which results in oscillation of the end-effector.
For the adaptive gain baseline (AVIC), the initial impedance gains are the same as for CGIC. During task execution, the control law adapts the damping term according to the current force value. As shown in Fig. 5, the adaptive baseline performs better than the constant-gain one, but it takes a long time to reach steady state and still exhibits some oscillation.
The proposed Safe OnGOVIC algorithm does not require any offline tuning; it is randomly initialized with some gain values. As shown in Fig. 6 and the videos, the robot initially makes hard contact with the board, resulting in clear oscillations. As the robot measures the contact, the gain optimization becomes aware of the change in environment characteristics and updates the impedance values immediately to make the contact smoother. From the figures and the videos, we observe that after one update step the robot stops oscillating and the force profile immediately becomes stable. The low-frequency optimization collects the force sensor data every second, and the optimization takes about seconds to solve. Moreover, after the robot has successfully adapted to the surface, a human lifts it to some random positions (at s, s, s, and s in the video on our website), and the robot recovers to the desired position smoothly using the obtained optimal impedance gains. The results on the metal and wood boards are illustrated in Fig. 6(b) and (c), respectively. The force curves also reveal that our method has the smallest force ripple at steady state compared with CGIC and AVIC in all scenarios.
Table I summarizes the performance of each algorithm and compares the results quantitatively. It indicates that the proposed Safe OnGOVIC algorithm can optimize the performance after the robot contacts the surfaces, achieving the shortest settling time and the smallest force variance at steady state. Moreover, the proposed algorithm approaches the surface rapidly, with shorter approaching times than AVIC. As for the constant-gain method, the robot's position error does not converge to steady state on the plastic and wood boards for a long time. These numerical results again demonstrate the advantage of the proposed method over CGIC and AVIC.
III-D Exp 3: Contact-Rich Manipulation Task: Surface Wiping
The goal of this experiment is to provide an application of the proposed method (low-frequency and high-frequency) and further verify its effectiveness and robustness. As shown in Fig. 2, the robot contacts the board and wipes the "stain" without entering a predefined unsafe area (denoted by the purple line on the board). A Microsoft Kinect 2.0 monitors the board and sends the "stain" location to the controller as the reference position.
Fig. 7 illustrates the evolution of the robot position and impedance gains. Similar to the results of Exp 2, the robot finds the optimal impedance gains after a few adaptation steps. When the human draws some "markers" on the board, the camera detects their position and sends it to the controller as a new reference point. At the same time, the gain optimization constantly collects the force data and optimizes the impedance gains. The success of this task proves that our method can obtain the optimal impedance gains for different tasks online while guaranteeing safety.
IV Conclusion
We proposed Safe OnGOVIC, an efficient online gain optimization algorithm for variable impedance control. The relation between the impedance gains and the robot behavior is explicitly constructed as a control-affine system. Based on that, an optimization problem is formulated to optimize the impedance gains in real time from online collected force data. Moreover, a collision constraint is embedded into the framework to ensure safety. A series of experiments demonstrates the performance of the proposed method, and comparisons with a constant-gain baseline and an adaptive gain baseline indicate that the proposed method is more effective and robust in different unknown scenarios.
For future work, we plan to incorporate the regulation of contact force into the objective function so that the robot can apply a constant force to the environment. Moreover, we will test the proposed algorithm on other contact-rich manipulation tasks, such as polishing and grinding.
V Appendix
V-A Derivation of the Collision Constraint
For a collision constraint $h(e)$ that depends only on the robot position, we have

$$\dot{h} = \frac{\partial h}{\partial e}\dot{e} \qquad (15)$$

Substituting the above equation and the robot dynamics (5) into (8), we obtain

$$\frac{\partial h}{\partial e}\dot{e} + \alpha_1\big(h(e)\big) \ge 0 \qquad (16)$$

Therefore, ensuring (16) is sufficient to guarantee that the robot stays in the safe set. However, the impedance gains do not appear in the above equation (relative degree greater than 1), so we cannot directly obtain constraints on $u$ to regulate the robot. Therefore, we apply the condition (8) once more, on $\hat{h} = \dot{h} + \alpha_1(h)$, to obtain the collision constraint (10).
References
 [1] (2018) Force-based learning of variable impedance skills for robotic manipulation. In 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), pp. 1–9.
 [2] (2016) Control barrier function based quadratic programs for safety critical systems. IEEE Transactions on Automatic Control 62 (8), pp. 3861–3876.
 [3] (1999) Set invariance in control. Automatica 35 (11), pp. 1747–1767.
 [4] (1995) Sequential quadratic programming. Acta Numerica 4, pp. 1–51.
 [5] (2018) Adaptive variable impedance control for dynamic contact force tracking in uncertain environment. Robotics and Autonomous Systems 102, pp. 54–65.
 [6] (2019) A framework for robot manipulation: skill formalism, meta learning and adaptive control. In 2019 International Conference on Robotics and Automation (ICRA), pp. 5844–5850.
 [7] (2016) Stability considerations for variable impedance control. IEEE Transactions on Robotics 32 (5), pp. 1298–1305.
 [8] (2017) Adjustable interaction control using genetic algorithm for enhanced coupled dynamics in tool-part contact. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1630–1635.
 [9] (1988) Impedance control stability properties in common implementations. In Proceedings, 1988 IEEE International Conference on Robotics and Automation, pp. 1185–1190.
 [10] (2017) Adaptive impedance control of human–robot cooperation using reinforcement learning. IEEE Transactions on Industrial Electronics 64 (10), pp. 8013–8022.
 [11] (2019) Variable impedance control in end-effector space: an action space for reinforcement learning in contact-rich tasks. arXiv preprint arXiv:1906.08880.
 [12] (2005) Tuning PID controllers using the ITAE criterion. International Journal of Engineering Education 21 (5), pp. 867.
 [13] (2011) Impedance controller tuned by particle swarm optimization for robotic arms. International Journal of Advanced Robotic Systems 8 (5), pp. 57.
 [14] (2019) A self-modulated impedance multimodal interaction framework for human-robot collaboration. In 2019 International Conference on Robotics and Automation (ICRA), pp. 4998–5004.
 [15] (2015) Human-in-the-loop approach for teaching robot assembly tasks using impedance control interface. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 1497–1502.
 [16] (2018) Learning motions from demonstrations and rewards with time-invariant dynamical systems based policies. Autonomous Robots 42 (1), pp. 45–64.
 [17] (1997) Force tracking in impedance control. The International Journal of Robotics Research 16 (1), pp. 97–117.
 [18] (2017) Impedance control of robots: an overview. In 2017 2nd International Conference on Cybernetics, Robotics and Control (CRC), pp. 51–55.
 [19] (2021) Learning variable impedance control via inverse reinforcement learning for force-related tasks. arXiv preprint arXiv:2102.06838.