Adversarial Body Shape Search for Legged Robots

05/20/2022
by   Takaaki Azakami, et al.

We propose an evolutionary computation method for an adversarial attack on the length and thickness of body parts of legged robots trained by deep reinforcement learning. The attack changes the robot body shape and interferes with walking; we call the attacked body shape the adversarial body shape. The evolutionary computation method searches for adversarial body shapes by minimizing the expected cumulative reward earned through walking simulation. To evaluate the effectiveness of the proposed method, we perform experiments with three legged robots, Walker2d-v2, Ant-v2, and Humanoid-v2, in OpenAI Gym. The experimental results reveal that Walker2d-v2 and Ant-v2 are more vulnerable to the attack on the length than the thickness of their body parts, whereas Humanoid-v2 is vulnerable to the attack on both the length and thickness. We further identify that the adversarial body shapes break left-right symmetry or shift the center of gravity of the legged robots. Finding adversarial body shapes can be used to proactively diagnose the vulnerability of legged robot walking.



I Introduction

Deep reinforcement learning in robotics has been widely studied [6]; its vulnerability and robustness have also attracted attention. In particular, the vulnerability to adversarial attacks is inherent to deep reinforcement learning, and improving robustness is a considerable challenge [13, 4]. Among adversarial attacks in robotics, the attack on state observations is a main topic, because observations directly affect the control of robots.

In this study, we consider an adversarial attack on the body shape of legged robots, which can be seen as an attack on the environment rather than on the state observation. The attack adds adversarial perturbations to the length and thickness of the body parts of legged robots; we call the attacked body shape the adversarial body shape. Figure 1 illustrates the adversarial body shape search, where the adversarial attack makes the bipedal robot's leg shorter and throws the robot off balance. Robot body shape changes can occur owing to various factors, such as oxidation of the metal material, adhesion of foreign matter to the surface, and formation of defects and dents caused by collision. If such changes are too small to detect, this vulnerability is a potential risk. The adversarial attack can search for small perturbations that cause walking instability of legged robots. Thus, the adversarial body shape search can be used to proactively diagnose the vulnerability of robot walking.

Fig. 1: Adversarial body shape search
(a) Walker2d-v2
(b) Ant-v2
(c) Humanoid-v2
Fig. 2: Legged robots

We propose a differential evolution method [9] for searching for adversarial body shapes of legged robots. The proposed search method is designed to reduce the cumulative reward earned through walking simulation of the legged robots. The rewards can be obtained only through walking simulation; therefore, we employ evolutionary computation with walking simulation on a physics engine to search for adversarial body shapes. We perform experiments with three legged robots: the bipedal robot Walker2d-v2, the quadruped robot Ant-v2, and the bipedal robot Humanoid-v2 [1], as shown in Fig. 2. These robot environments run on the physics engine MuJoCo [10]. We evaluate the effectiveness of the proposed search method in terms of the average cumulative rewards over 1000 walking simulations. The experimental results reveal that Walker2d-v2 and Ant-v2 are more vulnerable to the attack on the length than the thickness of their body parts, whereas Humanoid-v2 is vulnerable to the attack on both the length and thickness. In addition, we investigate the adversarial perturbations that interfere with the walking task for each body part and discover that the perturbations break left-right symmetry or shift the center of gravity of the legged robots.

The main contributions of this study are as follows:

  • We propose an evolutionary computation method with physical walking simulations for searching for adversarial body shapes of legged robots. This method can be used to proactively diagnose the vulnerability of legged robots.

  • We discover, for the first time, adversarial body shapes that interfere with the walking task.

II Related work

Robot design optimization for legged robots has been studied extensively. Several studies [3, 7, 5] jointly optimize robot design and reinforcement learning. Using the REINFORCE algorithm, Ha [3] improves policies in environments such as BipedalWalker-2d and Ant-v1 and designs more task-appropriate bodies. Schaff et al. [7] maintain a distribution over the length and thickness of the body parts of the legged robots Hopper, Walker2d, and Ant. They use reinforcement learning to optimize the control policy by maximizing the expected reward over the design distribution. Luck et al. [5] combine design optimization and reinforcement learning to minimize the number of prototypes for optimal robot foot design, improving data efficiency. These studies optimize the robot design by maximizing the rewards. However, they do not consider the vulnerability inherent to legged robots trained by reinforcement learning. Wang et al. [12] use evolutionary computation to design an optimal size and position of fish fins. For this design, neural graph evolution [11] is used, which iteratively evolves the graph structure using mutations. Desai et al. [2] propose an interactive computational design system that enables users to design legged robots with desired morphologies and behaviors. The interactive system automatically suggests candidate robot designs that achieve a specified behavior or task performance via an optimization algorithm. Although these studies do not use reinforcement learning, they still do not consider the vulnerability of legged robots. In this study, we propose a method for detecting the vulnerability inherent to the body shapes of legged robots trained by deep reinforcement learning.

III Adversarial body shape search

This section describes the adversarial body shape perturbation and the adversarial body shape generation algorithm.

0:  Population size $N$, attack strength $\epsilon$, number of generations $G$
1:  Generate initial population of body shape perturbations $\{\delta_1, \dots, \delta_N\}$ with elements drawn from the uniform distribution $U(-\epsilon, \epsilon)$
2:  Evaluate the average cumulative reward $\hat{R}(\delta_i)$ in Eq. (6) for each individual and initialize the best individual $\delta_{\text{best}}$
3:  for each generation $g$ do
4:     for each individual $\delta_i$ do
5:        Compute adversarial shape $\hat{\beta} = (\mathbf{1} + \delta_i) \odot \beta$
6:        Generate trial individual $u_i$ by mutation and crossover in Eqs. (7) and (8)
7:        Compute trial shape $\hat{\beta}' = (\mathbf{1} + u_i) \odot \beta$
8:        Compute average cumulative rewards $\hat{R}(u_i)$ and $\hat{R}(\delta_i)$ in Eq. (6) for trial and target
9:        if $\hat{R}(u_i) \le \hat{R}(\delta_i)$ then
10:           Accept trial individual: $\delta_i \leftarrow u_i$
11:           if $\hat{R}(\delta_i) \le \hat{R}(\delta_{\text{best}})$ then
12:              $\delta_{\text{best}} \leftarrow \delta_i$
13:              Update best individual
14:           end if
15:        else
16:           Retain target individual $\delta_i$
17:        end if
18:     end for
19:  end for
20:  return $\delta_{\text{best}}$
Algorithm 1 Adversarial Body Shape Generation Algorithm

III-A Adversarial Body Shape Perturbation

We denote $\beta = (\beta_1, \dots, \beta_D)^\top$ as a body shape vector consisting of the lengths or thicknesses of the body parts of the legged robots in Fig. 2. We assume that the body shape is constant over time. The body shape affects the reward through the reward function $r$, i.e.,

$$r_t = r(s_t, a_t; \beta), \tag{1}$$

where $a_t$ is the action vector of a legged robot at time $t$, $s_t$ is the corresponding state, and the reward $r_t$ is computed by running a physical simulation of robot walking.

In this study, we perturb the body shape $\beta$ as

$$\hat{\beta} = (\mathbf{1} + \delta) \odot \beta, \tag{2}$$

where $\delta$ is an adversarial perturbation of body shape $\beta$, and $\odot$ is the Hadamard product. Equation (2) perturbs the ratio of the length or thickness of the parts of legged robots. For example, the length of the $i$-th body part, denoted by $\beta_i$, is perturbed by

$$\hat{\beta}_i = (1 + \delta_i)\,\beta_i. \tag{3}$$
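As a minimal illustration of Eqs. (2) and (3), the following Python sketch applies the ratio perturbation to a vector of part sizes; the function and variable names are ours, not those of the original implementation.

import numpy as np

def perturb_body_shape(beta, delta, eps):
    # Eq. (2): beta_hat = (1 + delta) * beta element-wise (Hadamard product),
    # subject to the attack-strength constraint ||delta||_inf <= eps.
    assert np.max(np.abs(delta)) <= eps, "perturbation exceeds attack strength"
    return (1.0 + delta) * beta

# Example: delta[i] = -0.05 shortens the i-th part by 5% (cf. Eq. (3)).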

The adversarial perturbation is computed by minimizing the expected cumulative reward as

$$\delta^{\ast} = \operatorname*{arg\,min}_{\|\delta\|_{\infty} \le \epsilon} \mathbb{E}\left[R(\delta)\right], \tag{4}$$

where $\|\cdot\|_{\infty}$ indicates the maximum norm, $\epsilon$ is a small positive constant, and $R(\delta)$ is the cumulative reward attacked by adversarial perturbation $\delta$, defined as

$$R(\delta) = \sum_{t=1}^{T} r\bigl(s_t, a_t; (\mathbf{1} + \delta) \odot \beta\bigr). \tag{5}$$

The expectation in Eq. (4) is taken with respect to a trajectory of states and actions $(s_1, a_1, \dots, s_T, a_T)$, and the expected cumulative reward can be empirically estimated by running $K$ walking simulations as

$$\hat{R}(\delta) = \frac{1}{K} \sum_{k=1}^{K} R_k(\delta), \tag{6}$$

where $R_k(\delta)$ is the reward obtained by the $k$-th walking simulation. In the experiments, $K$ is fixed.
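The following Python sketch illustrates the Monte Carlo estimate in Eq. (6), assuming an OpenAI Gym-style environment and a trained policy; make_perturbed_env is a hypothetical helper that builds an environment with the body shape rescaled by Eq. (2), and the number of rollouts K is a placeholder.

import numpy as np

def estimate_reward(make_perturbed_env, policy, delta, K=10):
    # Empirical estimate of E[R(delta)] by K walking simulations (Eq. (6)).
    totals = []
    for _ in range(K):
        env = make_perturbed_env(delta)  # hypothetical: applies Eq. (2) to the model
        obs = env.reset()
        done, total = False, 0.0
        while not done:
            action = policy(obs)         # stochastic PPO policy
            obs, reward, done, _ = env.step(action)
            total += reward
        totals.append(total)
    return float(np.mean(totals))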

(a) length
(b) thickness
(c) Walking animations: clean (top), length attack (middle), thickness attack (bottom)
Fig. 3: Adversarial body shapes with length and thickness perturbations for Walker2d-v2.
Fig. 4: Averages of the average cumulative rewards with attack strength for Walker2d-v2

III-B Evolutionary Computation for Adversarial Body Shape

We employ a differential evolution method [9] to find the best perturbation in Eq. (4). Note that it is difficult to compute the gradient of the cumulative reward with respect to $\delta$, because the computation of $R(\delta)$ requires physical simulations. Therefore, we cannot use efficient methods such as gradient descent. Our simulation-based adversarial attack is a black-box attack [4].

We show the differential evolution method for the adversarial body shape attack in Algorithm 1. For differential evolution, we first generate an initial population of body shape perturbations $\{\delta_1^{(0)}, \dots, \delta_N^{(0)}\}$, where $N$ is the population size. Each element of an initial individual $\delta_i^{(0)}$ is generated according to the uniform distribution $U(-\epsilon, \epsilon)$. In differential evolution, mutation and crossover of the population are performed. The mutant individual $v_i^{(g)}$ at the $g$-th generation is generated according to

$$v_i^{(g)} = \delta_{\text{best}}^{(g)} + F\bigl(\delta_{r_1}^{(g)} - \delta_{r_2}^{(g)}\bigr), \tag{7}$$

where $F$ is the parameter that adjusts the rate of mutation, and $r_1$ and $r_2$ are randomly selected from the individual index set $\{1, \dots, N\}$. The best individual $\delta_{\text{best}}^{(g)}$ in Eq. (7) is the intermediate best until the $g$-th generation. The trial individual $u_i^{(g)}$ is generated by crossover according to

$$u_{i,j}^{(g)} = \begin{cases} v_{i,j}^{(g)} & \text{if } \mathrm{rand}_j \le C_r \ \text{or} \ j = j_{\text{rand}}, \\ \delta_{i,j}^{(g)} & \text{otherwise}, \end{cases} \tag{8}$$

where subscript $j$ indicates the $j$-th element of the vector, $\mathrm{rand}_j$ is a uniform random number in $[0, 1]$, $C_r$ is a crossover constant, and $j_{\text{rand}}$ is a random element index.
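The following sketch implements one generation of this procedure in Python; the mutation rate F, crossover constant CR, the clipping used to enforce the norm constraint, and the function names are illustrative assumptions rather than the paper's settings.

import numpy as np

rng = np.random.default_rng(0)

def de_step(pop, fitness, best, best_fit, eps, evaluate, F=0.5, CR=0.9):
    # One generation of DE/best/1 minimizing the average cumulative reward.
    N, D = pop.shape
    for i in range(N):
        r1, r2 = rng.choice(N, size=2, replace=False)
        mutant = np.clip(best + F * (pop[r1] - pop[r2]), -eps, eps)  # Eq. (7)
        cross = rng.random(D) <= CR
        cross[rng.integers(D)] = True      # ensure one mutant element (j_rand)
        trial = np.where(cross, mutant, pop[i])  # Eq. (8): binomial crossover
        f_trial = evaluate(trial)          # average cumulative reward, Eq. (6)
        if f_trial <= fitness[i]:          # accept: reward did not increase
            pop[i], fitness[i] = trial, f_trial
            if f_trial <= best_fit:
                best, best_fit = trial.copy(), f_trial
    return pop, fitness, best, best_fit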

body part length (%) thickness (%)
torso +4.73 +3.93
right thigh +4.82 +4.91
right leg +4.59 +4.19
right foot +0.08 +4.62
left thigh -4.74 +4.15
left leg -3.80 +4.54
left foot -0.36 +2.08
TABLE I: Adversarial body shape perturbation for thickness and length of Walker2d-v2

IV Experiment

To demonstrate the performance of the adversarial body shape search in Algorithm 1, we conducted experiments in the physics simulation environment MuJoCo [10] with three legged robots from OpenAI Gym [1]: Walker2d-v2, Ant-v2, and Humanoid-v2, as shown in Fig. 2.

(a) length
(b) thickness
(c) Walking animations: clean (top), length attack (middle), thickness attack (bottom)
Fig. 5: Adversarial body shapes with length and thickness perturbations for Ant-v2.
Fig. 6: Averages of the average cumulative rewards with attack strength for Ant-v2

IV-A Experimental setup

Before the experiments, we first trained the three robots until they could walk well using proximal policy optimization (PPO) [8]. PPO is a policy-gradient-based reinforcement learning algorithm suitable for continuous control. The PPO hyperparameters for training (learning rate, number of epochs, clip range, batch size, and horizon) were fixed in advance. We designed reward functions to ensure that the robots do not fall and walk along the $x$-axis as fast as possible. The respective reward functions for Walker2d-v2, Ant-v2, and Humanoid-v2 are given as follows:

$$r_t^{\text{Walker2d}} = v_t + c_{\text{alive}} - c_a \|a_t\|^2, \tag{9}$$
$$r_t^{\text{Ant}} = v_t + c_{\text{alive}} - c_a \|a_t\|^2 - c_f \|f_t\|^2, \tag{10}$$
$$r_t^{\text{Humanoid}} = c_v\, v_t + c_{\text{alive}} - c_a \|a_t\|^2 - c_f \|f_t\|^2, \tag{11}$$

where $v_t$ is the forward velocity, $a_t$ is the vector of joint torques, $f_t$ is the impact force vector, and the $c$'s are nonnegative weighting constants (an alive bonus $c_{\text{alive}}$, a control cost weight $c_a$, an impact cost weight $c_f$, and a velocity weight $c_v$).
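As a rough illustration of this training setup (the paper does not name its PPO implementation), the following sketch uses Stable-Baselines3; all hyperparameter values shown are placeholders rather than the paper's.

import gym
from stable_baselines3 import PPO

env = gym.make("Walker2d-v2")
model = PPO(
    "MlpPolicy", env,
    learning_rate=3e-4,  # placeholder values; the paper's exact
    n_epochs=10,         # hyperparameters are not reproduced here
    clip_range=0.2,
    batch_size=64,
    n_steps=2048,        # horizon
)
model.learn(total_timesteps=2_000_000)  # train until the robot walks stably
model.save("walker2d_ppo")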

For performance evaluation, we compute the average cumulative reward $\hat{R}(\delta)$ in Eq. (6) $M = 1000$ times and then average the results as

$$\tilde{R}(\delta) = \frac{1}{M} \sum_{m=1}^{M} \hat{R}_m(\delta), \tag{12}$$

where $\hat{R}_m(\delta)$ is the average cumulative reward at the $m$-th simulation run. Note that $\hat{R}_m(\delta)$ can vary with each simulation run even if $\delta$ is fixed, because the action is stochastically generated according to the policy network. To compute $\tilde{R}(\delta)$, we vary the attack strength $\epsilon$, where $\epsilon = 0$ means no attack. Following the literature on adversarial attacks, we call the body shape with $\epsilon = 0$ the clean body and use it as the baseline in this study. In differential evolution, we set the number of individuals separately for Walker2d-v2, Ant-v2, and Humanoid-v2, and set the same number of generations for all the robots.
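The following sketch illustrates this evaluation protocol; est stands for any function implementing the Eq. (6) estimate, and delta_best is a hypothetical name for the perturbation returned by Algorithm 1.

import numpy as np

def evaluate_attack(est, delta, M=1000):
    # Eq. (12): average the Eq. (6) estimate over M independent runs.
    return float(np.mean([est(delta) for _ in range(M)]))

# Hypothetical usage: compare the found perturbation with the clean
# baseline (eps = 0, i.e., delta = 0):
#   reward_clean = evaluate_attack(est, np.zeros_like(delta_best))
#   reward_attacked = evaluate_attack(est, delta_best)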

IV-B Adversarial body shape search

We evaluated the performance of the attack for each legged robot: Walker2d-v2, Ant-v2, and Humanoid-v2.

IV-B1 Walker2d-v2

Figure 3 visualizes the adversarial perturbations of the length and thickness. The blue body parts represent those that are longer or thicker than the clean ones, i.e., positive perturbations are added, whereas the red ones are those that are shorter or thinner, with negative perturbations. The darker the color, the greater the magnitude of the perturbation. Table I shows the perturbation values for each body part in Fig. 3. From the table, we find that the right and left sides of the whole body become longer and shorter, respectively. This result suggests that the attack on the length breaks the left-right symmetry of the body shape and throws the robot off balance. On the other hand, the table indicates that all the thicknesses increase, i.e., the attack on the thickness makes the robot heavier and more difficult to move than the clean one.

body part length (%) thickness (%)
torso - -4.51
left front thigh -1.88 +2.95
left front leg +0.30 -1.83
left front foot -4.68 -1.28
right front thigh +3.13 +4.29
right front leg +4.70 -1.41
right front foot +4.85 +4.38
left back thigh +2.05 -1.06
left back leg +4.17 -1.76
left back foot +4.90 +4.12
right back thigh -2.67 +1.94
right back leg -0.54 +0.67
right back foot -2.16 +0.36
TABLE II: Adversarial body shape perturbation for thickness and length of Ant-v2
(a) length
(b) thickness
(c) Walking animations: clean (top), length attack (middle), thickness attack (bottom)
Fig. 7: Adversarial body shapes with length and thickness perturbations for Humanoid-v2.
Fig. 8: Averages of the average cumulative rewards with attack strength for Humanoid-v2

Figure 3(c) shows the animations of the walking simulations for the clean (top), length-attacked (middle), and thickness-attacked (bottom) robots. We see that the length-attacked robot (middle) loses its balance and falls. This result comes from the left-right asymmetry of the body, as shown in Fig. 3 and Table I. We also see that the thickness-attacked robot (bottom) falls. As mentioned above, the thickness attack weighs down the robot; thus, it becomes difficult for the robot to move its legs for stable walking.

Figure 4 shows the average of the 1000 average cumulative rewards in Eq. (12) for Walker2d-v2. In Fig. 4, the blue and orange bars show the average rewards of the attacks on the length and thickness, respectively. The results demonstrate that the average rewards decrease as the attack strength $\epsilon$ increases; e.g., the length perturbation reduces the reward by 82% relative to the baseline ($\epsilon = 0$). In particular, we find that Walker2d-v2 is more vulnerable to the attack on the length than the thickness of its body parts.

IV-B2 Ant-v2

Figure 5 visualizes the adversarial perturbations of the length and thickness. As for Walker2d-v2 in Fig. 3, blue and red denote positive and negative perturbations, respectively. Table II shows the perturbation values for each body part in Fig. 5. From the table, we find that the right front and left back parts become longer and the other two legs become shorter than the clean ones. As shown in Fig. 5 (top), the clean Ant-v2 uses its right front and left back legs to move forward and uses the other legs to support its body. Hence, the adversarial perturbations in Table II cause a pitch oscillation of Ant-v2, as shown in Fig. 5 (middle). Unlike the length perturbation, the thickness perturbation has no significant effect on walking, as shown in Fig. 5 (bottom).

body part length (%) thickness (%)
head - +2.21
torso +0.69 -3.11
upper waist +3.71 -0.39
lower waist -2.89 -3.78
pelvis +3.36 -3.38
right thigh +2.96 -4.91
right leg +4.48 -4.09
right right foot +4.65 -3.43
right left foot +3.46 -3.85
left thigh -4.54 +1.66
left leg +3.84 -4.91
left right foot +2.16 -0.93
left left foot +4.41 -4.86
right upper arm +3.69 -3.29
right lower arm +3.35 +0.05
right hand - -1.86
left upper arm +4.09 -3.09
left lower arm +3.24 -2.90
left hand - -1.27
TABLE III: Adversarial body shape perturbation for thickness and length of Humanoid-v2

Figure 6 shows the average of the 1000 average cumulative rewards in Eq. (12) for Ant-v2. The results demonstrate that the length attack is successful, whereas the thickness attack is not. For example, the length attack reduces the average reward by 60% relative to that of the clean one, whereas the thickness attack does not. These results indicate that Ant-v2 is vulnerable to the length attack and robust to the thickness attack.

IV-B3 Humanoid-v2

Figure 7 visualizes the adversarial perturbations of the length and thickness. Table III shows the perturbation values for each body part in Fig. 7. From the data in the table, we infer that the attack on the lengths makes the right leg parts (thigh, leg, right foot, and left foot) longer and the left thigh shorter than those of the clean one. As observed for Walker2d-v2, these adversarial length perturbations break the left-right symmetry of the body shape and throw the robot off balance, as shown in Fig. 7 (middle). On the other hand, Table III lists thickness perturbations that make the head larger and the other parts thinner. As shown in Fig. 7 (bottom), the head of Humanoid-v2 becomes so heavy that the robot loses its balance and is pulled backward.

Figure 8 shows the average of the 1000 average cumulative rewards in Eq. (12) for Humanoid-v2. The results demonstrate that Humanoid-v2 is vulnerable to the adversarial attack on both the length and thickness, in contrast to Walker2d-v2 and Ant-v2. Both attacks significantly reduce the average rewards even at small attack strengths, and the average rewards eventually become almost zero as $\epsilon$ increases.

V Conclusion

We proposed an evolutionary computation method for searching for adversarial body shapes of legged robots. The vulnerability to small body changes can be a potentially significant risk, because such changes cause the robots to fall. Because deep reinforcement learning has been widely used in robotics, finding such vulnerabilities is very important for safety and robustness in robotics. This study demonstrated that the legged robots Walker2d-v2, Ant-v2, and Humanoid-v2 are vulnerable to attacks on the body shape and can be forced to fall. Through the experiments, we revealed that the adversarial attacks perturb the length or thickness of the body parts such that the left-right symmetry is broken or the center of gravity is shifted. In future work, we will develop a method to design body shapes that are robust against adversarial attacks, to improve the safety and robustness of legged robots.

References

  • [1] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba (2016) OpenAI Gym. arXiv preprint arXiv:1606.01540. Cited by: §I, §IV.
  • [2] R. Desai, B. Li, Y. Yuan, and S. Coros (2018) Interactive co-design of form and function for legged robots using the adjoint method. CoRR abs/1801.00385. Cited by: §II.
  • [3] D. Ha (2019) Reinforcement learning for improving agent design. Artificial Life 25, pp. 352–365. Cited by: §II.
  • [4] I. Ilahi, M. Usama, J. Qadir, M. U. Janjua, A. Al-Fuqaha, D. T. Hoang, and D. Niyato (2022) Challenges and countermeasures for adversarial attacks on deep reinforcement learning. IEEE Transactions on Artificial Intelligence 3 (2), pp. 90–109. Cited by: §I, §III-B.
  • [5] K. S. Luck, H. B. Amor, and R. Calandra (2020) Data-efficient co-adaptation of morphology and behaviour with deep reinforcement learning. In Proceedings of the Conference on Robot Learning, pp. 854–869. Cited by: §II.
  • [6] E. F. Morales, R. Murrieta-Cid, I. Becerra, and M. A. Esquivel-Basaldua (2021) A survey on deep learning and deep reinforcement learning in robotics with a tutorial on deep reinforcement learning. Intelligent Service Robotics 14 (5), pp. 773–805. Cited by: §I.
  • [7] C. B. Schaff, D. Yunis, A. Chakrabarti, and M. R. Walter (2019) Jointly learning to construct and control agents using deep reinforcement learning. In 2019 International Conference on Robotics and Automation (ICRA), pp. 9798–9805. Cited by: §II.
  • [8] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. Cited by: §IV-A.
  • [9] R. Storn and K. Price (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11 (4), pp. 341–359. Cited by: §I, §III-B.
  • [10] E. Todorov, T. Erez, and Y. Tassa (2012) MuJoCo: a physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. Cited by: §I, §IV.
  • [11] T. Wang, R. Liao, J. Ba, and S. Fidler (2018) NerveNet: learning structured policy with graph neural networks. In International Conference on Learning Representations. Cited by: §II.
  • [12] T. Wang, Y. Zhou, S. Fidler, and J. Ba (2019) Neural graph evolution: automatic robot design. In International Conference on Learning Representations. Cited by: §II.
  • [13] C. Xiao, X. Pan, W. He, J. Peng, M. Sun, J. Yi, M. Liu, B. Li, and D. Song (2019) Characterizing attacks on deep reinforcement learning. arXiv preprint arXiv:1907.09470. Cited by: §I.