Learning-based Intelligent Attack against Mobile Robots with Obstacle-avoidance

10/14/2019 ∙ by Yushan Li, et al. ∙ 0

The security issue of mobile robots has attracted considerable attention in recent years. Most existing works focus on detection and countermeasures for some classic attacks from cyberspace. Nevertheless, those works are generally based on some prior assumptions about the attacker (e.g., the system dynamics is known, or internal access is compromised). A few works are dedicated to physical attacks; however, they still lack a certain intelligence and advanced control design. In this paper, we propose a physical-based and intelligent attack framework against the obstacle avoidance of mobile robots. The novelty of our work lies in the following: i) Without any prior information about the system dynamics, the attacker can learn the detection area and goal position of a mobile robot by trial and observation, and the obstacle-avoidance mechanism is learned by the support vector regression (SVR) method; ii) Considering different attack requirements, different attack strategies are proposed to implement the attack efficiently; iii) The framework is suitable for both holonomic and non-holonomic mobile robots, and algorithm performance analysis concerning time complexity and optimality is provided. Furthermore, the condition that guarantees the success of the attack is obtained. Simulations illustrate the effectiveness of the proposed framework.




I Introduction

With network communication, integrated computation, and control supporting its operations in the physical world, a mobile robot can be seen as a typical Cyber-Physical System (CPS). Due to their excellent flexibility and scalability, mobile robots have become a research hotspot in the fields of control and robotics and receive considerable attention. From unmanned aerial vehicles (UAVs) to unmanned ground vehicles (UGVs), either single or multiple coordinated, mobile robots are becoming more and more pervasive in both industrial and military fields, e.g., logistics transportation, environment exploration, and military reconnaissance.

Due to the increasing usage of mobile robots in a wide range of application domains, security has become an essential requirement and imperative challenge [1]. Security refers to the ability of a system to withstand malicious behaviors or unanticipated events [2]. Attacks against mobile robots mainly come from cyberspace, and they can be roughly divided into three categories: DoS, replay, and deception attacks [3]. In these attacks, communication channels are maliciously jammed or disrupted [4], or the control data and measurements are compromised or altered [5], thus degrading the mission effectiveness of mobile robots in critical and adversarial scenarios. Many efforts have been devoted to designing corresponding countermeasures. For instance, [6] addresses the problem of ensuring trustworthy computation in a linear consensus network with misbehaving agents. In [7], undetectable and unidentifiable attacks are characterized and detection filters are designed. And [8] considers secure estimation when the set of attacked nodes can change with time.

However, most existing research relies on a baseline premise that the attacker has some knowledge of, or access to, the formation system. For example, malicious agents have knowledge of the system structure or nodes' states [6], or the packets transmitted over the network are corrupted [9, 10]. These assumptions neglect the capacity limitations of the attacker in real scenarios; in particular, most of them are too hard for the attacker to actually implement. Therefore, there remain potential gaps between theory and practice concerning how to implement these kinds of attacks, or under what conditions an attack can be launched.

Moreover, there are also a few works focusing on physical attacks, where physical components are taken as the attack target to make the attacks stealthy. For instance, GPS sensor readings can be disturbed by GPS spoofing attacks [11, 12, 13]. Designed acoustic noises can alter gyroscopic sensor data, leading to drone crashes [14]. Even important values stored in memory (e.g., EEPROM, Flash memory) can be corrupted by heating up a memory cell, while the device remains without any visible damage [15]. Compared with cyber attacks, these physical attacks are straightforward to implement, and traditional detection techniques from the computer security community are usually not effective or powerful enough to handle them [1]. Nevertheless, this does not mean these physical attacks are impeccable. In fact, they are still not smart and advanced enough. On the one hand, the attacks generally target a specific kind of transducer by exploiting its sensing mechanism, so the attack methods are not generalizable. On the other hand, those physical attacks are designed with an “open-loop”-like idea, only aiming to disturb the system performance more or less, without any specific attack purpose or sophisticated control design.

Motivated by the above observations, we design a physical-based and intelligent attack scheme against the obstacle avoidance of mobile robots. We describe it as “intelligent” because it reflects an intellectual growth of learning knowledge and mastering skills, like a child who knows nothing and gradually acquires every ability it needs by observing and trying. The novelty lies in that we do not aim to design an attack against a single type of sensor, but against the intrinsic obstacle-avoidance mechanism of mobile robots.

Note that a reliable obstacle-avoidance methodology is extremely crucial for the effectiveness of navigation, a universally needed and vital technology in almost all applications of mobile robots [16]. Normally, mobile robots are equipped with transducers such as sonar, laser radar, or cameras to detect the surrounding environment. After the environment information is gathered, the transducer transmits it to the controller, which makes decisions by pre-programmed algorithms to tackle different situations. However, no matter what sensor and what obstacle-avoidance approach the mobile robot uses, it makes no difference to our proposed attack method, for only the changes of the robot's posture are needed.

The proposed framework works as follows. First, we propose a learning scheme for the attacker to learn the obstacle-avoidance mechanism of mobile robots. It seeks to determine what information is needed, and how to obtain this information in a feasible way, to launch an attack. In fact, assuming the attacker is quite powerful (as most existing works do) may lead to a robust defence for the system, but it also sacrifices the normal control performance to some degree (e.g., hardware burdens and computation complexity). The tradeoff between them is not easy to balance. From our point of view, if this gap between attack and defence is filled, the design of countermeasures will be more well-directed. Simply speaking, for a kind of attack that is almost impossible to launch, there is no need to design a sophisticated defence strategy at the cost of degrading normal performance. The proposed learning scheme in this work fills this gap. Specifically, it leverages basic sampling methods to obtain the real-time motion information of a mobile robot. Based on that, the attacker stays still and disguises itself as an obstacle. When the mobile robot encounters the disguised attacker, it will adjust its trajectory to avoid collision. This process is observed and exploited by the attacker, and the collected data are used to regress the obstacle-avoidance model by learning methods.

Next, the proposed attack involves designing sophisticated attack strategies to achieve a specific purpose, where the attacker disguises itself as an obstacle to fool the mobile robot into a preset trap (the trap could be a pothole, a cage, or an area where communication is invalid). Note that path distance and transition time are two commonly used optimization objectives in robot navigation (path planning) [17]. Finding the exact trajectory in an obstacle field is a calculus-of-variations problem, and the analytic solution can be obtained only for the simplest cases [18]. Furthermore, the model learned in the last step is basically impossible to use for reversely computing inputs given outputs. To tackle this issue, we first formulate the attack design as a control optimization problem, where the objective is to minimize the path cost or time cost. Drawing on ideas similar to sampling-based approaches [19, 20], we propose near-shortest-path and hands-off attack algorithms to solve the problems under different requirements, respectively. The key point is to exploit the nature of obstacle avoidance: once an obstacle is detected, the mobile robot deviates its trajectory in the opposite direction. The attack is deemed successful when the mobile robot is close enough to the trap.

It should be mentioned that there are generally two kinds of mobile robots: holonomic and non-holonomic. The latter appears more commonly in daily life; however, its instantaneous movement is restricted [21], making the control more challenging than that of holonomic mobile robots. The proposed framework applies to both kinds. We mainly illustrate our work on non-holonomic robots, which are more difficult, and the work can easily be transferred to holonomic robots due to their simpler motion characteristics. Thus, the attack against holonomic robots is only briefly introduced.

This paper is an extension of the preliminary work presented in [22], providing a detailed and rigorous treatment of model learning, performance guarantees, and significant novel simulation results. Our study provides new insights into the security issues of mobile robots. The main contributions of this paper are summarized as follows:

  • To the best of our knowledge, this is the first work to consider a physical-based and sophisticated attack against mobile robots in which the attacker has no prior information of the system dynamics.

  • We propose an intelligent attack framework for the attacker. It can learn the obstacle detection area of a robot through trial and observation. Using the collected data and learning methods, the obstacle-avoidance mechanism is regressed. Both holonomic and non-holonomic mobile robots are considered.

  • We design two kinds of attack strategies meeting different purposes, such that the victim robot moves into the preset trap area. The algorithm performance is analyzed in terms of the optimality and time complexity of the solution. Moreover, the condition for a successful attack is obtained. Extensive simulations are conducted to illustrate the effectiveness of the proposed approach.

The rest of this paper is organized as follows. In Section II, the basics of the kinematics and obstacle avoidance of mobile robots are introduced; the control methods for both holonomic and non-holonomic robots are considered. The learning scheme for the obstacle-avoidance mechanism is proposed in Section III. Section IV presents the attack strategies with performance analysis. Related simulation results are shown in Section V. Finally, Section VI concludes this paper.

Fig. 1: The architecture of the learning-based intelligent attack

II Preliminary and Problem Formulation

In this section, we first introduce some basics about the kinematics of a mobile robot. The motion dynamics of both holonomic and non-holonomic mobile robots are presented. Following this, we show what an important role obstacle avoidance plays for a mobile robot, and briefly introduce two classical algorithms, the artificial potential approach and the dynamic window approach. Finally, our problem of interest is formulated.

II-A Motion Control for Mobile Robot

In the 2-D plane, the posture of a mobile robot is usually represented by its position and orientation. Considering the constraints imposed on mobile robots, they are generally divided into two categories: non-holonomic and holonomic. Non-holonomic mobile robots (car-like wheeled mobile agents, unicycles, etc.) are subject to pure rolling constraints without sliding between the wheel and the ground, which means the robot cannot move laterally and its motion direction coincides with its instantaneous orientation at all times. The motion of these robots is controlled directly by the linear velocity v and angular velocity ω, or by the velocities of the two driving wheels, which are equivalent to each other. The kinematics is modeled by a group of non-linear ordinary differential equations (ODEs), whose discrete forms are given by

x_{k+1} = x_k + v_k T cos θ_k,
y_{k+1} = y_k + v_k T sin θ_k,    (1)
θ_{k+1} = θ_k + ω_k T,

where T is the motion control period, k (k = 0, 1, 2, …) denotes the motion control time instant, θ_k is the orientation with respect to the X axis, and v_k and ω_k are the linear and angular velocity, respectively. As for holonomic mobile robots, their kinematics in the two directions is independent and is formulated as the following discrete first-order dynamics:

x_{k+1} = x_k + v_{x,k} T,
y_{k+1} = y_k + v_{y,k} T,    (2)

where v_{x,k} and v_{y,k} are the velocities along the X and Y axis directions, respectively. Since the motion of a holonomic robot is the composition of the motions in the two directions, the orientation is usually neglected.
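The two discrete update rules can be sketched in a few lines of Python; the control period T used here is an assumed value, not one from the paper:

```python
import math

T = 0.1  # assumed motion control period in seconds

def step_nonholonomic(x, y, theta, v, w):
    """One discrete update of the non-holonomic (unicycle) model in (1)."""
    return (x + v * T * math.cos(theta),
            y + v * T * math.sin(theta),
            theta + w * T)

def step_holonomic(x, y, vx, vy):
    """One discrete update of the holonomic model in (2)."""
    return (x + vx * T, y + vy * T)
```

Iterating either function over a sequence of velocity inputs reproduces the discrete trajectory of the corresponding robot type.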

The aim of motion control is to design the velocities in (1) and (2) according to different task requirements. As in [23], a “hand” position of a non-holonomic robot is defined as h = [x_h, y_h]^T, which lies a distance L from the center along the robot's axis of orientation. By a simple transformation, we have

x_h = x + L cos θ,
y_h = y + L sin θ.    (3)

The kinematics of the hand position is holonomic for L ≠ 0. In this way, the control problem is simplified, which is sufficient for the purpose of this paper, and we obtain

ẋ_h = u_x,  ẏ_h = u_y,    (4)

where u_x and u_y are the velocity control inputs in the two directions. For holonomic robots, the motion control is directly formulated in the same velocity-input form.
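The hand-position construction can be sketched as follows. The offset L is a design parameter (an assumed value here), and the mapping from desired hand velocities back to (v, ω) is the standard inverse of the rotation-based transformation for this construction:

```python
import math

L = 0.2  # assumed hand offset from the robot center, in meters

def hand_position(x, y, theta):
    """Hand position h lying a distance L ahead of the center, as in (3)."""
    return (x + L * math.cos(theta), y + L * math.sin(theta))

def hand_to_wheel_inputs(theta, ux, uy):
    """Map a desired hand velocity (ux, uy) to unicycle inputs (v, w).

    Follows from differentiating (3): h_dot = R(theta) @ [v, L*w],
    so [v, L*w] = R(-theta) @ (ux, uy).
    """
    v = ux * math.cos(theta) + uy * math.sin(theta)
    w = (-ux * math.sin(theta) + uy * math.cos(theta)) / L
    return v, w
```

With this mapping, any holonomic velocity command for the hand point can be executed by the non-holonomic robot, which is why the attack design can treat both robot types uniformly.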

For simplicity of expression, we denote the motion dynamics of all robots as

p_{k+1} = p_k + T u_k,    (5)

where u_k is the generalized velocity control vector at time k.

In the following sections, we will propose an intelligent attack scheme that applies to both kinds of robots, making our attack generic in practice.

II-B Obstacle-avoidance Algorithm

Numerous obstacle-avoidance algorithms have been developed in the literature, for example, potential-field based approaches [24, 25, 26], genetic-algorithm based approaches [27, 28, 29], fuzzy-logic based approaches [30, 31, 32], neural-network based approaches [33, 34, 35], etc. According to the characteristics of commonly used obstacle-avoidance mechanisms, similar to [36], we divide these algorithms into two types: instantly-deterministic (e.g., the artificial potential method and learning-based methods) and long-horizon exploring (e.g., the dynamic window approach, genetic approaches, and evolutionary algorithms). The former can be seen as determined-model driven, i.e., the solution of the current obstacle avoidance is unique (although learning-based methods are commonly said to be data-driven, the model is generally an injective mapping once the training process is complete). The key idea of the latter is to search for feasible solutions in the solution space, and an evaluation function is usually used to select the best one, due to the multiplicity of solutions. In this paper, a representative method of each kind is used, i.e., the classic artificial potential method (APM) and the dynamic window approach (DWA).

The APM was first presented in [24]. The basic idea is that when a robot detects an obstacle, the obstacle produces a repulsive potential field, with the artificial force acting in the negative direction of the potential gradient. Denote ρ(p, p_o) as the distance between two points. The repulsive force is given by

F_rep = η (1/ρ(p, p_o) − 1/ρ_0) ∇ρ(p, p_o) / ρ²(p, p_o)  if ρ(p, p_o) ≤ ρ_0,  and  F_rep = 0 otherwise,    (6)

where p and p_o are the coordinates of the robot and the obstacle, respectively, η is a repulsive gain, and ρ_0 is the influence radius of the obstacle.

Regarding the goal-directed term as the attraction force F_att, we combine it with F_rep to achieve robot motion control with obstacle avoidance. Then, we have

u = F_att + F_rep,    (7)

where u represents the final input of the robot.
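A minimal numpy sketch of this combined control law follows; the attraction gain, repulsive gain, and influence radius are assumed values, not parameters from the paper:

```python
import numpy as np

K_ATT = 1.0   # assumed attraction gain
K_REP = 0.5   # assumed repulsive gain (eta in (6))
RHO0 = 2.0    # assumed influence radius of an obstacle (rho_0 in (6))

def apf_input(p, goal, obstacle):
    """Combined attractive + repulsive velocity input, as in (7)."""
    f_att = K_ATT * (goal - p)                 # pull towards the goal
    diff = p - obstacle
    rho = np.linalg.norm(diff)
    if 0.0 < rho <= RHO0:
        # Classic repulsive term: grows as the robot nears the obstacle.
        f_rep = K_REP * (1.0 / rho - 1.0 / RHO0) / rho**2 * (diff / rho)
    else:
        f_rep = np.zeros(2)                    # outside the influence radius
    return f_att + f_rep
```

When the obstacle is far away, the input reduces to a pure pull towards the goal; a nearby obstacle adds a push along the robot-minus-obstacle direction, which is exactly the deviation the attacker later exploits.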

Remark 1

Since the independent motion in the two directions matches well with the control design of the APM, the APM is quite convenient to apply to holonomic mobile robots. A major drawback of the APM is the local minima problem, and an easy solution is to add a random disturbance. However, there is no need to deal with this issue in this paper, because the obstacle disguised by the attacker is not always static.

DWA is a velocity-space based approach and is commonly used in the Robot Operating System (ROS). The robot first samples multiple groups of feasible velocity inputs from the velocity space, which are used to simulate the following trajectories. Then the robot evaluates these trajectories and chooses the first-step velocity input of the best trajectory to actuate the movement. The key point of this approach is to design the sampling intervals of the velocity space and the evaluation function. Specifically, the evaluation function selects a heading and velocity that drive the robot to the goal with the maximum clearance from obstacles, given by

G(v, ω) = σ( α · heading(v, ω) + β · dist(v, ω) + γ · vel(v, ω) ),    (8)

where heading(v, ω) is the angular deviation between the robot's orientation and the goal, dist(v, ω) is the distance between the robot and its closest obstacle, and vel(v, ω) is exactly the velocity input of the current trajectory. All three variables are normalized for unified evaluation, and the constants α, β, and γ determine the contribution of each factor. For more details, readers are referred to [37] and [38].

Remark 2

Due to the direct sampling from the velocity space, DWA is easy and straightforward to apply to non-holonomic mobile robots. A drawback of this approach is that the computation may be cumbersome, due to the multiple velocity samplings and trajectory evaluations.
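To make the sample-simulate-evaluate loop concrete, the sketch below rolls each candidate (v, ω) pair forward and scores it with a heading, clearance, and velocity term in the spirit of the evaluation function above. All weights, the horizon, the clearance cap, and the simplified angle handling (no wrap-around) are our assumptions, not the paper's:

```python
import math

ALPHA, BETA, GAMMA = 0.8, 0.1, 0.1  # assumed weights of the three terms
T, HORIZON = 0.1, 10                # assumed control period and lookahead steps

def rollout(x, y, th, v, w):
    """Simulate a short unicycle trajectory for one candidate (v, w)."""
    pts = []
    for _ in range(HORIZON):
        x += v * T * math.cos(th)
        y += v * T * math.sin(th)
        th += w * T
        pts.append((x, y, th))
    return pts

def dwa_choose(x, y, th, goal, obstacle, v_samples, w_samples):
    """Pick the best first-step (v, w) among sampled velocity pairs."""
    best, best_score = None, float("-inf")
    for v in v_samples:
        for w in w_samples:
            traj = rollout(x, y, th, v, w)
            gx, gy, gth = traj[-1]
            # Larger when the final heading points at the goal (simplified).
            heading = math.pi - abs(math.atan2(goal[1] - gy, goal[0] - gx) - gth)
            # Minimum clearance along the trajectory, capped at 2 m.
            dist = min(math.hypot(px - obstacle[0], py - obstacle[1])
                       for px, py, _ in traj)
            score = ALPHA * heading + BETA * min(dist, 2.0) + GAMMA * v
            if score > best_score:
                best, best_score = (v, w), score
    return best
```

The loop structure also makes the remark's drawback visible: the cost grows with the product of the two sample-grid sizes and the rollout horizon.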

II-C Problem of Interest

Our research is based on a simple yet quite representative scenario in most applications of mobile robots: a robot (or robots) performs a go-to-goal task based on a certain motion control algorithm. During this process, the robot is able to avoid obstacles occurring in its surroundings. Meanwhile, there is an attacker nearby, whose attack purpose is two-fold: learn the obstacle-avoidance mechanism of the victim robot, and design an efficient attack strategy to fool it into moving into a preset trap. To achieve that, there are mainly three challenges to be tackled: i) without any prior information on the motion dynamics of a mobile robot, what information is necessary for the attacker and how can it be obtained; ii) what kind of feasible attack can be launched based on this information; iii) how can the attack performance be evaluated and the attack strategy optimized. The whole framework of this paper is shown in Fig. 1. Hereafter, we denote the attacker as A and the victim mobile robot as R. The following assumptions hold throughout this paper.

Assumption 1

A can move faster than R, and has a strong ability to sense objects. The observations made by A are noise-free.

Assumption 2

R's movement is regular, i.e., it can be modeled by an explicit function of time that is continuous everywhere and non-differentiable at only finitely many points.

III Learning Scheme for Obstacle-avoidance Mechanism

When R encounters an obstacle within its detection area, it evaluates the obstacle's influence and takes a corresponding action, deviating from its desired trajectory. Inspired by this, a learning scheme for the obstacle-avoidance mechanism is proposed. This scheme consists of three parts:

  • Pre-sampling. Ideally, the instantaneous motion information of R (such as orientation, linear and angular velocities, and acceleration) can be obtained from three consecutive position samplings. This constitutes the cornerstone of the following steps.

  • Intentional learning. With the ability to master R's motion information, A seeks to exert a certain influence by approaching R and then observing its reaction. Through a sequence of trials, R's detection area and goal position can be inferred.

  • Model regression. Through multiple observations of R's reactions when avoiding obstacles, such as position and bearing variations, A obtains a collection of data. This data is then used to regress a model that captures the obstacle avoidance via a learning-based method (e.g., SVR).

III-A Pre-sampling: Data Acquisition

Recalling that the posture of a mobile robot is updated every control period, its trajectory during one period can be approximated as a straight line. Equipped with advanced sensors, A is able to measure its relative displacement with respect to a moving object. Supposing the sampling period of A is t_s, for simplicity of notation, we use the subscript k to represent the k-th sampling time.

After three consecutive sampling moments, the instantaneous variables of R's motion are estimated by

θ_k = atan2(y_k − y_{k−1}, x_k − x_{k−1}),
v_k = ‖p_k − p_{k−1}‖ / t_s,
ω_k = (θ_k − θ_{k−1}) / t_s,    (9)
a_k = (v_k − v_{k−1}) / t_s,

where v_k, ω_k, and a_k represent the linear, angular, and accelerated velocities, respectively. To make (9) completely available, the sampling period needs to be small enough (e.g., 10 ms), and A always keeps the last three groups of sampled data stored to calculate (9). For holonomic mobile robots, this process is much easier with higher precision, since the two axis velocities can be utilized directly. For a unified statement, let the estimated input be [v_k, ω_k]^T if the robot is non-holonomic, or [v_{x,k}, v_{y,k}]^T if holonomic.
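The finite-difference estimates of (9) can be sketched as follows; the sampling period is an assumed value, and the positions are treated as noise-free in line with Assumption 1:

```python
import math

TS = 0.01  # assumed sampling period of the attacker (10 ms)

def estimate_motion(p0, p1, p2):
    """Estimate (v, theta, w, a) of R from three consecutive sampled
    positions p0, p1, p2, following the finite-difference idea of (9)."""
    th1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
    th2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    v1 = math.hypot(p1[0] - p0[0], p1[1] - p0[1]) / TS
    v2 = math.hypot(p2[0] - p1[0], p2[1] - p1[1]) / TS
    w = (th2 - th1) / TS   # angular velocity from heading change
    a = (v2 - v1) / TS     # acceleration from speed change
    return v2, th2, w, a
```

In an implementation, the attacker would keep a rolling buffer of the last three position samples and recompute these quantities at every sampling instant.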

Remark 3

Generally, the motion control period is very small (e.g., 0.5 s or 0.1 s), and the sampling period is determined by the information sensing and processing ability of A. Concerning how to choose an appropriate sampling period, we assume A keeps the minimum sampling time it can afford. Ideally, the sampling period equals the control period.

Based on (9), A can master R's instantaneous motion information at any time, which constitutes the foundation of the following steps.

III-B Intentional-learning: Trajectory Trial

The obstacle detection area of R can mostly be generalized to a circular region. If obstacles behind R during its forward motion are not considered, the detection area is directly modeled as a sector. At this stage, the primary objective of A is therefore to infer the radius and angle range of the sector, which together specify the detection area.

When A moves close to R, A makes a record of R's relative position, heading, and bearing with respect to itself. After a period, A takes the measurements again. With the two readings, A is able to calculate R's position variation and heading variation over that period, given by

(10)

Note that in normal situations, R goes straight forward to the goal, i.e., its heading variation is zero. Therefore, we consider A to be detected as an obstacle within the detection sector if the measured heading variation is non-zero. This whole process is illustrated in Fig. 2.

Fig. 2: R's reaction after detecting A as an obstacle.
(a) Learning R's detection radius.
(b) Learning R's detection angle.
Fig. 3: Illustration of the learning process for the detection area.

During the learning process of the detection area, A also records the trajectories of R moving towards the goal, as shown in Fig. 3. Denote these trajectories as T_1, …, T_N, which are segments of straight lines leading to the goal. Let dist(p, T_i) be the minimal distance from a position p to T_i, and p_g be the goal estimated by A. Then, the estimation error of p_g is given by

e(p_g) = Σ_{i=1}^{N} dist²(p_g, T_i),    (11)

where N is the number of recorded trajectories. Then, the problem is formulated as

min_{p_g} e(p_g),    (12)

where p_g = [x_g, y_g]^T are the coordinates in the X-Y plane. Apparently (12) is an overdetermined problem, since each recorded line imposes one equation on the two unknowns, and it has an exact solution only when the measurements are totally accurate. Thus, we can only obtain the least-squares solution of (12).

Assuming the recorded trajectories are not all parallel (so that the stacked line equations have full column rank), we have the closed-form least-squares solution

(13)

Since the real goal is generally a region specified by a maximum radius, we take the estimated goal as acceptable if its estimation error is within that radius, which a small number of recorded trajectories basically satisfies (if not, we only need to record more groups of sampling trajectories). The whole procedure of intentional learning is summarized in Algorithm 1.
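As a concrete sketch of this least-squares goal estimation: each recorded straight-line trajectory can be represented by a point on the line and a unit direction, and the goal estimate is the point minimizing the summed squared distances to all lines. The numpy formulation below (function name and normal-equation layout are ours, not the paper's) solves this directly:

```python
import numpy as np

def estimate_goal(points, directions):
    """Least-squares intersection of the recorded trajectory lines.

    Each line i is given by a point on it (points[i]) and a direction
    vector (directions[i]); the returned point minimizes the sum of
    squared perpendicular distances to all lines, as in (11)-(12).
    """
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for a, d in zip(points, directions):
        d = np.asarray(d, float)
        d /= np.linalg.norm(d)
        P = np.eye(2) - np.outer(d, d)  # projector onto the line's normal
        A += P
        b += P @ np.asarray(a, float)
    return np.linalg.solve(A, b)        # normal equations of the LS problem
```

With exact measurements the lines all pass through the true goal and the solve recovers it exactly; with noisy measurements it returns the least-squares compromise, matching the acceptability check above.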

0:    ’s posture , posture regulating variables , for every trial
0:    detection area and goal position
1:  Initialize: moves to remote posture such that is directly ahead of , i.e. =0;
2:  while  do
3:     ;
4:     ;
5:     Calculate ;
6:     if  then
7:         Record , and the following trajectory as ;
8:     end if
9:  end while
10:  Reset: moves to a posture such that is in ’s direction with , i.e., ;
11:  while  do
12:     ;
13:     Calculate ;
14:     if  then
15:         Record , and the following trajectory as ;
16:     end if
17:  end while
18:  Reset: moves to another posture such that is in ’s direction with , i.e., . Then does the same process again to obtain a new and ;
19:  , compute goal using the trajectories;
20:  Return and goal position
Algorithm 1 Learning ’s Detection Area and Goal

III-C Obstacle-avoidance Mechanism Regression

Next, A can move inside R's detection area. In this step, A also records its relative distance and bearing with respect to R, and R's heading deviation from the goal point. Once A is detected, it stores two groups of data during the next period. The data groups are defined as


Furthermore, R's reaction velocity input is obtained by (9). Next, we define the sampled feasible set as

(15)

Then, we propose Algorithm 2, by which A collects the data set and uses it as training data to learn the obstacle-avoidance mechanism of R. Note that a preset trial limit bounds the number of trials by A. Specifically, the classic support vector regression (SVR) method is used.

Remark 4

The SVR method performs well on non-linear regression and has strong generalization ability when the amount of data is not vast. It is insensitive to the model of the learning object and has a certain tolerance for data noise, due to its error-tube design.
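As an illustration of this regression step, the sketch below fits a multi-output SVR on a synthetic data set standing in for the collected samples. It assumes scikit-learn is available; the features (relative distance, bearing, heading deviation) and the two-component reaction inputs here are randomly generated placeholders, not data from the paper:

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Placeholder features: relative distance, bearing, heading deviation.
D = rng.uniform(-1.0, 1.0, size=(200, 3))
# Placeholder reaction inputs (two velocity components), a smooth map of D.
U = np.column_stack([np.sin(D[:, 0]) + 0.5 * D[:, 1], np.cos(D[:, 2])])

# One RBF-kernel SVR per output dimension, wrapped for multi-output targets.
model = MultiOutputRegressor(SVR(kernel="rbf", C=10.0, epsilon=0.01))
model.fit(D, U)
pred = model.predict(D[:5])  # predicted reactions for five recorded states
```

Once fitted, the model plays the role of the learned obstacle-avoidance mechanism: given a relative configuration, it predicts the robot's avoidance input, which the attack strategies in the next section consume.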


Now we give a detailed analysis of this algorithm. Considering the beginning of a time slot, we have


where and are determined by (4).

As mentioned in Remark 3, the sampling period does not necessarily equal the control period, due to the limited sampling and computation ability of A. Under this circumstance, only one group of real control velocities during each sampling period can be used. Then, the accurate model (16) is changed into an approximate form, given by

(17)

Simplifying (17), we obtain

(18)

where the two terms represent the displacement and the bearing variation in every sampling period, respectively.

0:    ’s detection area , intentional-learning’s ,
0:    Obstacle-avoidance mechanism
1:  Initialize: moves to a relatively far position from
2:  for  to  do
3:      moves into a random position in ;
4:     Compute , , , , , ] at ;
5:     Wait for a time slot
6:     Compute at ;
7:     ;
8:  end for
9:  Use the collected data set and a learning-based method (e.g., SVR) to regress the obstacle-avoidance mechanism;
10:  Return
Algorithm 2 Regress Obstacle-avoidance Mechanism

Essentially, the learning method is applied to regress the mapping relationship between the recorded motion states and R's reaction inputs. There exist inevitable model errors when using (18). Ignoring the subscripts, the following probability is presented to describe the learning effect:

(19)

where the error term is the difference between the real and estimated values, and the probability represents the confidence we have in the learned model. It is a monotonically increasing function of the allowed error bound, determined by the specific learning model and ranging from 0 to 1: the smaller the error bound under a given confidence, the more reliable the learned model.

Remark 5

Given a group of normal inputs, the output of the learned model inevitably carries certain noise, which is determined by the nature of learning methods. Besides, it is easy to understand that the learned model is a mapping from input to output; however, the model is not necessarily a surjective or injective mapping. In other words, in most cases the model cannot be used inversely to obtain the input given a group of outputs.

Now that the obstacle-avoidance mechanism of R is known to A, the gap is filled between the powerful assumptions of prior work (where the attacker knows the information of the target system from the very beginning) and the real implementation (where the attacker first needs to acquire that necessary information).

IV Intelligent Attack Strategy

IV-A Attack Feasibility Analysis

Through the above learning scheme, A has found a way to master the obstacle-avoidance mechanism. Based on this, we design an intelligent attack strategy, which aims to fool the mobile robot into the preset trap. The feasibility of the attack lies in two parts. First, the proposed attack is launched from the physical world by A disguising itself as an obstacle, so there is no way for R to evade its influence. Second, A's presence is taken into consideration by R's obstacle-avoidance mechanism, and the learning scheme proposed in the last section provides a solid information foundation for A to implement a smarter and stealthier attack.

However, these points do not indicate that the attack design is simple. In fact, unlike traditional control design, it is a quite knotty problem because it involves the two independent motion dynamics of A and R, and the mutual influence of the two dynamics must be taken into consideration when designing the strategies. This implies that many powerful analytical tools in the control field may not work well in this scenario. To better tackle this problem, we propose two kinds of attack strategies, considering the path cost of R and the attack cost of A, respectively. In most situations, the solutions of such optimized strategies are greedy and rarely provide performance guarantees. In this paper, under bounded noise assumptions, we provide a kind of optimal attack strategy that is deterministic, and the attack performance is further analyzed.

Next, we give some notations that are commonly used in the following sections. Let the sector detection area of R at time period k be S_k, and the ε-domain of a position p be

B_ε(p) = { q : ‖q − p‖ ≤ ε },    (20)

where ε is a user-specified small constant. Regardless of the type of mobile robots, the motion updates of R and A are denoted as

p_{k+1} = p_k + T u_k,    (21)
p^a_{k+1} = p^a_k + T u^a_k,    (22)

where u_k is the velocity control input of R and u^a_k is the attack input of A. With the learned model, A can predict R's velocity input, given by

û_k = f(p_k, p^a_k),    (23)

where f denotes the learned obstacle-avoidance mechanism. Note that the prediction cannot be guaranteed to be totally accurate, and in this paper we consider the prediction to be associated with certain noise, formulated as

u_k = û_k + ξ_k,    (24)

where ξ_k denotes the motion-state-dependent prediction noise at time k, and is assumed to be normally distributed. And (24) is further used to infer R's posture at the next time instant, i.e.,

p̂_{k+1} = p_k + T (û_k + ξ_k).    (25)
IV-B Shortest-path Attack

From the perspective of path cost, we define the optimal trajectory in the sense that the trajectory length, from the position where R is attacked for the first time to the preset trap, is the shortest. We formulate it as the following optimal control problem.

Problem 1

Given the initial configuration of 1) the preset trap; 2) the goal position of R; and 3) the initial attack position, our goal is to select a horizon K and find control inputs for all time instants k = 0, 1, …, K − 1 that solve the following control problem


where the constraints hold for .

In (27a), the decision variable stands for the sequence of control inputs applied from time 0 to K − 1. Essentially, the objective is to minimize the accumulated uncertainty brought by the learned regression model. The first constraint (27b) requires the attack inputs to be bounded by a constant. The second constraint (27c) guarantees that R will move into the trap when the attack stops. The constraint (27d) is designed to keep a safe distance between A and R during the attack, avoiding a possible collision at the next-step movement. And (27e) makes sure A stays in R's detection area during the attack. The last four constraints capture the state dynamics of A and R.

Ideally, we can directly obtain the ideal trajectory by connecting the initial attack position and the trap position. However, this ideal trajectory is almost impossible to realize exactly, because constraints from the mechanical structure and control dynamics will not allow R to be in the desired position at every step. Even so, we are able to utilize it as an evaluation criterion: the closer the real trajectory is to the ideal one, the better the attack strategy is. Then comes an interesting problem: how to choose the best position at which to begin attacking, which we call the entry point.

Let be an indicative function of a trajectory , satisfying


Then, the definition of entry point is given as follows.

Fig. 4: Illustration of the choice of the entry point. To save page space, we rotate the X’-Y’ coordinate frame clockwise. The X’-Y’ plane is divided into 4 parts by dashed lines, and each region determines how the corresponding entry point is obtained.
Definition 1

(Entry point) Given the initial position and goal position of the robot, and the preset trap, denote the trajectory from the initial position to the goal without being attacked as the unattacked trajectory. Based on it, rebuild a coordinate frame X’-Y’ whose positive Y’ axis makes a fixed angle with the trajectory. Divide the X’-Y’ plane into four parts by dashed lines (as shown in Fig. 4). Then, the entry point is defined as follows

  1. If the trap lies in the first region, the entry point is the vertical projection of the trap onto the unattacked trajectory, i.e.,


    where .

  2. If the trap lies in the second region, there is no exact entry point. Under the given constraints, the easier the attack is to launch, the better.

  3. If the trap lies in the third region, there is no exact entry point. Under the given constraints, the later the attack begins, the better.

  4. If the trap lies in the remaining regions, the entry point is the projection of the trap along the X’ or Y’ direction onto the unattacked trajectory.

During the robot's movement towards its goal, once the entry-point condition holds for the first time, the attacker begins attacking by moving to its initial attack position, which satisfies


Note that the second condition in (30) requires the initial attack position and the trap to lie on opposite sides of the robot.
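The perpendicular-projection case of Definition 1 can be sketched in a few lines. The function name, the clamping to the segment, and the parameter names are our own assumptions; the paper's (29) defines the projection formally.

```python
import numpy as np

def entry_point(p0, goal, trap):
    """Entry point as the perpendicular projection of the trap onto the
    unattacked straight-line trajectory from p0 to goal, clamped to the
    segment (a sketch of case 1 of Definition 1; names are ours)."""
    p0, goal, trap = (np.asarray(v, dtype=float) for v in (p0, goal, trap))
    seg = goal - p0
    t = np.dot(trap - p0, seg) / np.dot(seg, seg)  # scalar projection
    t = min(max(t, 0.0), 1.0)                      # keep the point on the path
    return p0 + t * seg
```

For example, with the robot moving from (0, 0) to (4, 0) and a trap at (2, 3), the entry point is (2, 0), the foot of the perpendicular from the trap.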

The intentional learning and the attack process are separated, i.e., there is no direct feedback between the two parts. Due to this and the black-box characteristic of the learned model, obtaining a globally optimal analytical solution for the problem is intractable. Even if a globally optimal solution could be found off-line by exhaustive search (e.g., depth-first or breadth-first search), its optimality cannot be guaranteed: the regressed outputs carry noise, and the noise at every moment during the attack accumulates if left unaccounted for, creating an uncontrollable gap between the real and desired attack results. Therefore, we propose a sampling-based approach that finds sub-optimal solutions quickly; the complete process is summarized in Algorithm 3.

Input:  (i) maximum number of iterations;     (ii) dynamics and learned obstacle-avoidance mechanism;     (iii) termination error bound
Output:  Terminal horizon and attack input vector;
1:  Initialize , , , and set the attack signal ;
2:  Compute entry point by (29);
3:  Randomly select a feasible position from as waiting position before attacking;
4:  The robot starts to run towards its goal;
5:  for  to  do
6:     if  then
7:         Sample a feasible subset from ;
8:         for  do
9:            Compute by (26);
10:            Compute the distance from the robot to the trap;
11:            if  and  then
12:               Update ;
13:            end if
14:         end for
15:         Select the current attack input from the candidates for which (27b) and (27d) hold;
16:     end if
17:     if  and  then
18:         The attacker moves into the pre-attack position;
19:     else
20:         The attacker stays still;
21:     end if
22:     if  then
23:         break;
24:     end if
25:  end for
26:  , and construct attack input vector ;
Algorithm 3 Shortest-path Attack Strategy

Algorithm 3 is composed of three parts: i) first, before coming near the robot, the attacker waits for the best attack time [Lines 17-21]; ii) then, after the attacker launches its initial attack, every subsequent attack iteration samples the motion space of both the attacker and the robot and selects the best attack input from a feasible attack set [Lines 6-16]; iii) in the end, when the robot is close enough to the trap, the attacker stops and we call the attack successful. The bottleneck of Algorithm 3 is that the sampling set can be so large that the total computation is time-consuming [Lines 8-14]. There is also a tedious need to check whether the constraints are satisfied at every iteration [Lines 7 and 15].
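The sampling step at the core of Algorithm 3 [Lines 7-15] can be sketched as follows: draw bounded candidate attacker moves, predict the robot's reaction with the learned model, and keep the move that drives the robot closest to the trap. The signature of the learned model `f`, the uniform sampling, and all names and constants are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def best_attack_input(f, x_r, x_a, trap, u_max, d_safe, n_samples=200):
    """One sampling iteration in the spirit of Lines 7-15 of Algorithm 3.
    f(x_r, x_a) stands in for the learned obstacle-avoidance regressor
    predicting the robot's next position; names are illustrative."""
    best_u, best_cost = None, np.inf
    for _ in range(n_samples):
        u = rng.uniform(-u_max, u_max, size=2)        # bounded input, cf. (27b)
        x_a_next = x_a + u                            # attacker's candidate move
        if np.linalg.norm(x_a_next - x_r) < d_safe:   # safety distance, cf. (27d)
            continue
        x_r_next = f(x_r, x_a_next)                   # robot's predicted reaction
        cost = np.linalg.norm(x_r_next - trap)        # distance left to the trap
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u, best_cost
```

This mirrors the stated bottleneck: the cost of each iteration grows linearly with the sample count, and every sample pays one feasibility check plus one model evaluation.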

Iv-C Performance Guarantees

In this section, we prove the existence of an optimal solution of Problem 1 and then provide performance guarantees for the solution found by Algorithm 3 with respect to the optimal one. First, we introduce the following definition:

Definition 2

(ε-Equivalent solutions) Given a sufficiently small constant ε > 0, if two solutions of Problem 1 have costs whose difference is smaller than ε, then we call them ε-equivalent solutions.

This definition provides a criterion for judging the similarity of two solutions. In the following, if two solutions are ε-equivalent, we treat them directly as equal.
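As a minimal illustration of Definition 2, the check reduces to comparing costs up to a small tolerance; the function name and the default tolerance value are our own.

```python
def eps_equivalent(cost_a, cost_b, eps=1e-3):
    """Treat two solutions of Problem 1 as equal when their costs differ
    by less than a small eps (Definition 2); the eps value is illustrative."""
    return abs(cost_a - cost_b) < eps
```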

Fig. 5: Shortest-path attack against a non-holonomic robot with DWA. (a) Illustration of four special attack positions. (b) Attack pattern 1: a circular arc plus a line segment. (c) Attack pattern 2: two circular arcs plus a line segment.
Theorem 1

For Problem 1, there must exist a choice of attack inputs such that the associated cost in (27a) is minimal.

Proof 1

In essence, the motion control of a mobile robot is in discrete form; therefore we formulate the problem using breadth-first traversal analysis. All solutions of the problem are represented by a tree, where the initial attack position is the root node and a node at a given depth denotes the attack input at the corresponding attack iteration.

Based on (27b), multiple attack inputs are sampled at each attack step. Suppose the number of sampling groups is fixed; then for every two adjacent nodes the deviation between their inputs is the same, and we define the sub-node set of each node accordingly. Due to constraint (27d), the robot cannot stay unmoved all the time. Besides, at the final iteration, the distance to the trap must be smaller than the termination bound. As a consequence of these two factors, the depth of the tree is finite. Denote the maximum attack depth and construct a solution as

By the constraints (27d) and (27e), many sub-nodes of each node are excluded in Algorithm 3. Therefore, the total number of feasible solutions is far smaller than the total number of paths in the tree.

Given a sampling size, let the best solution among all feasible solutions be selected; it is then intuitive to have

By the ε-equivalence of solutions, when the sampling size is large enough, the cost of the best solutions (under different sampling sizes) no longer changes. Thus, the optimal solution is obtained (not necessarily unique). This completes the proof.

Next, we investigate how good the solution of Algorithm 3 is. As mentioned before, for a non-holonomic robot the outputs are the linear velocity and the angular velocity. Since the relationship between the two velocities and the radius of curvature is deterministic (the radius equals the linear velocity divided by the angular velocity), for simplicity we use a 2-tuple to illustrate the visual effect.
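The deterministic relation between the velocity pair and the curvature radius can be sketched as below; the function name and the infinite-radius convention for straight-line motion are our own choices.

```python
import math

def turning_radius(v, omega):
    """Radius of curvature implied by the linear velocity v and angular
    velocity omega of a unicycle-type robot, r = v / omega; straight
    motion (omega == 0) is treated as an infinite radius."""
    return math.inf if omega == 0 else v / omega
```

For instance, a robot moving at 1 m/s while turning at 0.5 rad/s follows a circle of radius 2 m.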

As shown in Fig. 5(a), suppose the trap is at the left side of the robot, and denote four extreme positions where the attacker could be for the attack to take effect, together with the corresponding outputs. Note that the robot's reactions against the attacker at the four positions can be sorted, i.e., one position is the most threatening for the robot while another is the least threatening.

Assumption 3

In this section, Algorithm 3 does not consider traps whose distance to the unattacked trajectory is smaller than the given bound. The reason is that there is no need to design specific attack strategies for such traps; the attacker only needs to begin attacking a little earlier, before reaching the entry point.

Lemma 1

Suppose the reaction radius is given, and consider an attack pattern in which the attacker's relative position to the robot is kept unchanged. Denoting the length of the robot's trajectory while under attack, we have


where .
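For the fixed-relative-position pattern, the attacked trajectory decomposes into an arc (while the robot is turned toward the trap) plus a straight segment, so its length is easy to evaluate. The function and variable names below are our own illustrative choices for this arc-plus-segment geometry.

```python
import math

def pattern1_length(r, theta, d):
    """Path length of attack Pattern 1: a circular arc of radius r swept
    through angle theta (until the robot heads toward the trap), then a
    straight segment of length d to the trap (variable names are ours)."""
    return r * theta + d
```

For example, an arc of radius 2 swept through a quarter turn followed by a 3-unit segment gives a total length of pi + 3.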

Proof 2

First, consider that the attacker attacks by keeping the same relative position with the robot. This process continues until the orientation of the robot heads towards the trap; denote the path length of this first phase. Then the attacker moves along with the robot so that the robot goes straight towards the trap; denote the path length of this second phase. This pattern is denoted Pattern 1, shown in Fig. 5(b), and the angle between the second line segment and the unattacked trajectory is