Adversarial Attacks On Multi-Agent Communication

by James Tu, et al.

Growing at a very fast pace, modern autonomous systems will soon be deployed at scale, opening up the possibility for cooperative multi-agent systems. By sharing information and distributing workloads, autonomous agents can better perform their tasks and enjoy improved computation efficiency. However, such advantages rely heavily on communication channels, which have been shown to be vulnerable to security breaches. Thus, communication can be compromised to execute adversarial attacks on the deep learning models that are widely employed in modern systems. In this paper, we explore such adversarial attacks in a novel multi-agent setting where agents communicate by sharing learned intermediate representations. We observe that an indistinguishable adversarial message can severely degrade performance, but becomes weaker as the number of benign agents increases. Furthermore, we show that transfer attacks are more difficult in this setting when compared to directly perturbing the inputs, as it is necessary to align the distribution of communication messages with domain adaptation. Finally, we show that low-budget online attacks can be achieved by exploiting the temporal consistency of streaming sensory inputs.



1 Introduction

With rapid improvements of modern autonomous systems, it is only a matter of time until they are deployed at scale, opening up the possibility of cooperative multi-agent systems. By sharing information, individual agents can perform their tasks better [KonecnyMYRSB16, multidrone]. For example, by aggregating sensory information from multiple viewpoints, a fleet of vehicles can perceive the world more clearly and provide significant safety benefits [v2vnet]. Moreover, in a network of connected devices, distributed processing across multiple agents can improve computation efficiency [MobileCloudInference]. While cooperative multi-agent systems are promising, they rely on communication protocols between agents, introducing new security threats as shared information can be malicious or unreliable [WongS00, borselius2002mobile, NovakRHV03].

Although modern cyber security algorithms can protect against modifying communication messages and compromising autonomous systems, they are not perfect, as showcased by modern security breaches where vehicles have been compromised remotely [compromised_bmw, compromised_tesla] and encryption algorithms have been broken [adobe_breach]. In the event of such failures, attackers may send malicious messages to sabotage the victim autonomous system.

On the other hand, modern autonomous systems typically rely on deep neural networks known to be vulnerable to adversarial attacks. Such attacks craft small and imperceptible perturbations to drastically change a neural network’s behaviour and induce false outputs [szegedy2014intriguing, goodfellow2015explaining, carlini2017towards, madry2017towards]. Even with freedom to send any message, such small constrained perturbations may be the most dangerous as they are indistinguishable from their benign counterparts, making corrupted messages more difficult to detect while still being highly malicious.

Figure 1: Overview of a multi-agent setting with one malicious agent (red). Here the malicious agent attempts to sabotage a victim agent by sending an adversarial message. The adversarial message is indistinguishable from the original, making the attack difficult to detect.

Although adversarial attacks and defenses have been studied extensively, existing approaches have mostly considered attacks on input domains like images [szegedy2014intriguing, goodfellow2015explaining], point clouds [cao2019sensor, advmeshhat], and text [SatoSS018, seq2sick]. On the other hand, multi-agent systems often distribute computation across different devices and transmit intermediate representations instead of input sensory information [v2vnet, MobileCloudInference]. Specifically, when deep learning inference is distributed across different devices, agents will communicate by transmitting feature maps, which are activations of intermediate neural network layers.

In this paper, we investigate this novel multi-agent setting for adversarial attacks where perturbations are applied to learned intermediate representations. An illustration is shown in Figure 1. We conduct experiments and showcase vulnerabilities in two highly practical settings: multi-view perception from images in a fleet of drones and multi-view perception from LiDAR in a fleet of self-driving vehicles (SDVs). By leveraging information from multiple viewpoints, these multi-agent systems are able to significantly outperform those that do not exploit communication.

We show, however, that adversarially perturbed transmissions, which are indistinguishable from the original, can severely degrade the performance of the receiver, particularly as the ratio of malicious to benign agents increases. With only a single attacker, as the number of benign agents increases, attacks become significantly weaker because aggregating more messages decreases the influence of the adversarial message. When multiple attackers are present, they can coordinate and jointly optimize their perturbations to strengthen the attack. In terms of defense, when the threat model is known, adversarial training is highly effective, as adversarially trained models can defend against perturbations almost perfectly and can slightly enhance performance on natural examples as well. Without knowledge of the threat model, one can still achieve robustness by designing more robust message aggregation modules.

We then move on to more practical attacks in a black box setting where the model is unknown to the adversary. Since query-based black box attacks need to excessively query a target model that is often inaccessible, we focus on query-free transfer attacks that are more feasible in practice. However, transfer attacks are much more difficult to execute at the feature level than on input domains. In particular, since perturbation domains are model dependent, vanilla transfer attacks are ineffective because two neural networks with the same functionality can have very different intermediate representations. Here, we find that training the surrogate model with domain adaptation is key to aligning the distribution of intermediate feature maps and achieving some degree of transferability. To further enhance the practicality of attacks, we propose to exploit the temporal consistency of sensory information processed by modern autonomous systems. When frames of sensory information are collected milliseconds apart, we can exploit the redundancy in adjacent frames to create efficient, low-budget attacks in an online manner.

2 Related Work

Figure 2: Attacking object detection proposals: False positives are created by changing the class of background proposals and false negatives are created by changing the class of the original proposals.

Multi-Agent Deep Learning Systems: Multi-agent and distributed systems are widely employed in real-world applications to improve computational efficiency [federated1, DillonWC10, federated2], collaboration [v2vnet, multidrone, MobileCloudInference, rauch2012car2x, rockl2008v2v], and safety [obst2014multi, nakamoto2019bitcoin]. Recently, autonomous systems have improved greatly with the help of neural networks. New directions have opened up in cooperative multi-agent deep learning systems, e.g., federated learning [federated1, federated2]. Although multi-agent communication introduces a multitude of benefits, communication channels are vulnerable to security breaches, as communication channels can be attacked [comm_attacks_survey, intrusion_det], encryption algorithms can be broken [adobe_breach], and agents can be compromised [compromised_tesla, compromised_bmw]. Thus, imperfect communication channels may be used to execute adversarial attacks, which are especially effective against deep learning systems. However, to the best of our knowledge, few works study adversarial robustness in multi-agent deep learning systems.

Adversarial Attacks: Adversarial attacks were first discovered in the context of image classification [szegedy2014intriguing], where a small imperceptible perturbation can drastically change a neural network’s behaviour and induce false outputs. Such attacks were then extended to various applications such as semantic segmentation [xie2017adversarialdetect] and reinforcement learning [huang2017adversarial]. There are two main settings for adversarial attacks: white box and black box. In a white box setting [szegedy2014intriguing, goodfellow2015explaining, madry2017towards], the attacker has full access to the target neural network weights, and adversarial examples can be generated using gradient-based optimization to maximize the network’s error. In contrast, black box attacks are conducted without knowledge of the target neural network weights and therefore without any gradient computation. In this case, if the attacker is able to query the target model, the literature proposes several different strategies to perform query-based attacks [boundary_attack, ChenZSYH17, biased_sampling, chen2019hopskipjumpattack]. However, query-based attacks are highly impractical as they typically require an extremely large number of queries and a large amount of computation. Apart from query-based attacks, a more practical but more challenging alternative is the transfer attack [PapernotMGJCS17, XieZZBWRY19, ChengDPSZ19], which does not require querying the target model. In this setting, the attacker trains a surrogate model that imitates the target model, in the hope that perturbations generated for the surrogate model will transfer to the target model. In this work, we explore white box attacks to demonstrate adversarial vulnerability and consider black box transfer attacks to study vulnerability in the most practical scenario.

Adversarial manipulations on feature space: While most works in the literature focus on input domains like images, some prior works have considered perturbations on intermediate representations within neural networks. Specifically, [JiangMC0J19] estimates the projection of adversarial gradients on a selected subspace to reduce the number of queries to a target model. [PapernotMSH16, SatoSS018, seq2sick] proposed generating adversarial perturbations in word embeddings to find adversarial but semantically close substitution words. [WuBR17, Zhu2020FreeLB] showed that training on adversarial embeddings could improve the robustness of transformer-based models for NLP tasks.

3 Attacks On Multi-Agent Communication

In this section, we first introduce the multi-agent framework in which agents leverage information from multiple viewpoints by transmitting intermediate feature maps. We then present our method for generating adversarial perturbations in this setting. Moving on to a more practical setting, we consider black box transfer attacks and find that it is necessary to align the distribution of intermediate representations. To this end, we propose training a surrogate model with domain adaptation to create transferable perturbations. Finally, efficient online adversarial attacks can be achieved by exploiting the temporal consistency of sensory inputs collected at high frequency.

3.1 Multi-Agent Communication

We consider a setting where multiple agents cooperate to better perform their task by sharing observations from different viewpoints encoded via a learned intermediate representation. Adopting prior work [v2vnet], we assume a homogeneous set of agents using the same neural network, which we denote by an encoder $F$ followed by an aggregation and output module $G$. Each agent $i$ processes its input sensory information $x_i$ to obtain an intermediate representation $m_i = F(x_i)$. The intermediate feature map is then broadcast to other agents in the scene. Upon receiving messages, agent $j$ aggregates and processes all incoming messages to generate the output $Z_j = G(m_1, \dots, m_N)$, where $N$ is the number of agents. Suppose that an attacker agent $i$ targets a victim agent $j$. Here, the attacker attempts to send an indistinguishable adversarial message $m_i' = m_i + \delta$ to maximize the error in $Z_j' = G(m_1, \dots, m_i', \dots, m_N)$. The perturbation is constrained by $\|\delta\|_p \le \epsilon$ to ensure that the malicious message is subtle and difficult to detect. An overview of the multi-agent setting is shown in Figure 1.
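A minimal sketch of this communication setting, in plain Python. The function names (`encode`, `aggregate`, `perturb_message`), the toy encoder, and the use of mean aggregation are illustrative assumptions rather than the models used in the paper; the sketch only shows where the bounded perturbation enters.

```python
from typing import List

Vector = List[float]

def encode(observation: Vector) -> Vector:
    """Hypothetical stand-in for an agent's encoder: scales the raw observation."""
    return [0.5 * v for v in observation]

def aggregate(messages: List[Vector]) -> Vector:
    """Hypothetical aggregation: element-wise mean over all received messages."""
    n = len(messages)
    return [sum(vals) / n for vals in zip(*messages)]

def perturb_message(message: Vector, delta: Vector, epsilon: float) -> Vector:
    """Add a perturbation after clipping each entry to [-epsilon, epsilon]."""
    clipped = [max(-epsilon, min(epsilon, d)) for d in delta]
    return [m + d for m, d in zip(message, clipped)]

# One observation per agent; every agent broadcasts its encoded message.
observations = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
messages = [encode(x) for x in observations]
clean_output = aggregate(messages)

# Agent 0 is the attacker. Its requested perturbation exceeds the budget and is clipped.
epsilon = 0.1
attacked = list(messages)
attacked[0] = perturb_message(messages[0], delta=[0.5, -0.5], epsilon=epsilon)
attacked_output = aggregate(attacked)
```

In this toy setup the attacker's message never moves by more than `epsilon` per entry, so the perturbed message stays close to the original while still shifting the aggregated output.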

Figure 3: Our proposed transfer attack, which incorporates domain adaptation when training the surrogate model. During training, the discriminator forces the surrogate encoder $F'$ to produce intermediate representations similar to those of the victim encoder $F$. As a result, the surrogate $G'$ can generate perturbations that transfer to the victim $G$.

In this paper, we specifically focus on object detection as it is a challenging task where aggregating information from multiple viewpoints is particularly helpful. In addition, many downstream robotics tasks depend on detection, and thus a strong attack can jeopardize the performance of the full system. In this case, the output is a set of bounding box proposals at different spatial locations. Each proposal consists of class scores and bounding box parameters describing the spatial location and dimensions of the bounding box. The classes comprise the object classes and a background class, which indicates that no object is detected.

When performing detection, models try to output the correct object class and maximize the intersection over union (IoU) of the proposed and ground truth bounding boxes. In a post-processing step, proposals with high confidence are selected and overlapping bounding boxes are filtered with non-maximum suppression (NMS) to ideally produce a single estimate per ground truth object.
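The post-processing step can be sketched as follows. This is a simplified illustration that assumes 2D axis-aligned boxes and hypothetical threshold values; the detectors in the paper produce 3D or bird's eye view boxes, so the geometry here is only a stand-in.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned (assumption)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes: Sequence[Box], scores: Sequence[float],
        score_threshold: float, iou_threshold: float) -> List[int]:
    """Keep confident boxes greedily, dropping any box that overlaps a kept one."""
    confident = [i for i, s in enumerate(scores) if s >= score_threshold]
    order = sorted(confident, key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0.0, 0.0, 2.0, 2.0), (0.1, 0.0, 2.1, 2.0), (5.0, 5.0, 7.0, 7.0), (0.0, 0.0, 1.0, 1.0)]
scores = [0.9, 0.8, 0.7, 0.2]
kept = nms(boxes, scores, score_threshold=0.5, iou_threshold=0.5)
```

Because NMS keeps only the most confident box in each local region, an attack that changes which proposal has the highest score in a region changes the final detection.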

3.2 Adversarial Perturbation Generation

We first introduce our loss objective for generating adversarial perturbations against object detection. To generate false outputs, we aim to confuse the proposal class. For detected objects, we suppress the score of the correct class to generate false negatives. For background classes, false positives are created by pushing up the score of an object class. In addition, we also aim to minimize the intersection-over-union (IoU) of the bounding box proposals to further degrade performance by producing poorly localized objects. We define the adversarial loss of the perturbed output with respect to an unperturbed output instead of the ground truth, as the ground truth may not always be available to the attacker. For each proposal $z$, let $u$ be its highest confidence class. Given the original object proposal $z$ and the proposal after perturbation $z'$, our loss function tries to push $z'$ away from $z$.

An illustration of the attack objective is shown in Figure 2. When $u$ is not the background class, we apply an untargeted loss to reduce the likelihood of the intended class. When the intended prediction is the background class, we specifically target a non-background class to generate a false positive. We simply choose the target to be the class with the highest confidence that is not the background class. The loss uses the intersection over union of the two proposals, a weighting coefficient, and confidence thresholds that filter out proposals that are not confident enough.
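One per-proposal loss that matches this description can be sketched as follows. The functional form, the parameter names (`lam`, `gamma`, `tau_pos`, `tau_neg`) and their default values, the convention that lower values are better for the attacker, and the use of index 0 for the background class are all assumptions made for illustration; this is not presented as the paper's exact equation.

```python
import math
from typing import Sequence

BACKGROUND = 0  # assumed index of the background class

def adversarial_loss(orig_scores: Sequence[float], pert_scores: Sequence[float],
                     iou_value: float, lam: float = 1.0, gamma: float = 1.0,
                     tau_pos: float = 0.5, tau_neg: float = 0.5,
                     eps: float = 1e-6) -> float:
    """Loss for one proposal; the attacker minimizes it."""
    u = max(range(len(orig_scores)), key=lambda c: orig_scores[c])
    if u != BACKGROUND and orig_scores[u] > tau_pos:
        # Confident object: the loss shrinks as the original class score
        # and the overlap with the original box both shrink (false negative).
        p = min(pert_scores[u], 1.0 - eps)
        return -math.log(1.0 - p) * iou_value
    if u == BACKGROUND and orig_scores[u] > tau_neg:
        # Confident background: target the most confident object class and
        # reward raising its score (false positive), with a focal-style weight.
        target = max((c for c in range(len(orig_scores)) if c != BACKGROUND),
                     key=lambda c: orig_scores[c])
        p = min(pert_scores[target], 1.0 - eps)
        return lam * (p ** gamma) * math.log(1.0 - p)
    return 0.0  # proposals that are not confident enough are ignored

# Object proposal: suppressing the class score and the IoU lowers the loss.
object_before = adversarial_loss([0.1, 0.8, 0.1], [0.1, 0.8, 0.1], iou_value=1.0)
object_after = adversarial_loss([0.1, 0.8, 0.1], [0.7, 0.2, 0.1], iou_value=0.5)

# Background proposal: raising the target object score lowers the loss.
background_before = adversarial_loss([0.9, 0.06, 0.04], [0.9, 0.06, 0.04], iou_value=0.0)
background_after = adversarial_loss([0.9, 0.06, 0.04], [0.3, 0.6, 0.1], iou_value=0.0)

# Unconfident proposal: contributes nothing.
ignored = adversarial_loss([0.4, 0.35, 0.25], [0.4, 0.35, 0.25], iou_value=1.0)
```

Under these assumptions, minimizing the sum of this loss over all proposals pushes detected objects toward false negatives and confident background proposals toward false positives.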

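Following prior work [advmeshhat], we find it necessary to minimize the adversarial loss over all proposals when generating perturbations. Writing $\ell_{adv}(z', z)$ for the adversarial loss between a perturbed proposal $z'$ and the corresponding original proposal $z$, the optimal perturbation under an $\epsilon$-$\ell_\infty$ bound is

$$\delta^\star = \arg\min_{\|\delta\|_\infty \le \epsilon} \sum_{z' \in Z'} \ell_{adv}(z', z),$$

where $Z'$ is the set of proposals output by the victim after perturbation. In a white box setting, we minimize this loss across all proposals using projected gradient descent (PGD) [pgd], clipping $\delta$ to be within $[-\epsilon, \epsilon]$.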
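The projected update can be sketched on a toy problem. This sketch uses signed gradient steps instead of the Adam optimizer used in the experiments, and the linear "victim" with an analytic gradient is a hypothetical stand-in for backpropagating through a real detector.

```python
from typing import Callable, List

def sign(value: float) -> float:
    """Return -1.0, 0.0, or 1.0 depending on the sign of value."""
    return float((value > 0) - (value < 0))

def pgd_minimize(grad_fn: Callable[[List[float]], List[float]], dim: int,
                 epsilon: float, step_size: float, num_steps: int) -> List[float]:
    """Minimize a loss over delta while keeping every entry in [-epsilon, epsilon]."""
    delta = [0.0] * dim
    for _ in range(num_steps):
        grad = grad_fn(delta)
        # Descent step on the adversarial loss.
        delta = [d - step_size * sign(g) for d, g in zip(delta, grad)]
        # Projection: clip back into the allowed perturbation range.
        delta = [max(-epsilon, min(epsilon, d)) for d in delta]
    return delta

# Toy victim: the loss is a weighted sum of the mean of N messages, so its
# gradient with respect to the attacker's perturbation is weights / N.
weights = [1.0, -2.0]
num_agents = 4

def toy_grad(delta: List[float]) -> List[float]:
    return [w / num_agents for w in weights]

delta_star = pgd_minimize(toy_grad, dim=2, epsilon=0.1, step_size=0.05, num_steps=5)
```

For this linear toy loss the perturbation saturates at the boundary of the budget, which is the expected behaviour of a projected method on a linear objective.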

Figure 4: The two multi-agent datasets we use. On the left are images of ShapeNet objects taken from different viewpoints. On the right are LiDAR sweeps by different vehicles in the same scene.

3.3 Transfer Attack

We also consider transfer attacks as they are the most practical when compared to white box and query-based alternatives. White box attacks assume access to the victim model’s weights, while query-based optimization is highly inefficient, requiring hundreds of thousands of queries even for simple attacks on CIFAR10 [efficientquerybb]. Instead, when we do not have access to the weights of the victim model $G$, we try to imitate it with a surrogate model $G'$ such that perturbations generated by the surrogate model can transfer to the target model.

However, our experiments show that for perturbations on intermediate features, vanilla transfer attacks do not work out of the box, as two networks with the same functionality do not necessarily have the same intermediate representations. When training the victim’s encoder $F$ and output module $G$, there is no direct supervision on the intermediate features $m = F(x)$. Therefore, even with the same architecture, dataset, and training schedule, a surrogate encoder $F'$ may produce messages $m'$ with a very different distribution from $m$. As an example, a permutation of feature channels carries the same information but results in an entirely different distribution. In general, different random seeds, network initializations, or non-deterministic GPU backpropagation can result in drastically different intermediate representations. It follows that if $m'$ does not faithfully replicate $m$, we cannot expect the surrogate $G'$ to imitate $G$.
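The channel permutation example can be made concrete with a toy two-channel network. The weights, the linear encoder, and the readout below are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from typing import List

def encode(x: List[float], weights: List[List[float]]) -> List[float]:
    """Linear encoder: one output channel per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def readout(features: List[float], head: List[float]) -> float:
    """Linear head that maps features to a scalar output."""
    return sum(h * f for h, f in zip(head, features))

x = [3.0, 4.0]
victim_weights = [[1.0, 0.0], [0.0, 2.0]]
victim_head = [1.0, -1.0]

# The surrogate computes the same function but with its two channels swapped.
surrogate_weights = [victim_weights[1], victim_weights[0]]
surrogate_head = [victim_head[1], victim_head[0]]

victim_features = encode(x, victim_weights)        # [3.0, 8.0]
surrogate_features = encode(x, surrogate_weights)  # [8.0, 3.0]
victim_output = readout(victim_features, victim_head)
surrogate_output = readout(surrogate_features, surrogate_head)

# A perturbation crafted on the surrogate's channel 0 hits a different
# channel of the victim and moves its output in the opposite direction.
delta = [0.1, 0.0]
effect_on_surrogate = readout(
    [f + d for f, d in zip(surrogate_features, delta)], surrogate_head) - surrogate_output
effect_on_victim = readout(
    [f + d for f, d in zip(victim_features, delta)], victim_head) - victim_output
```

Both networks agree on the output for every input, yet a feature-level perturbation tuned on one has the wrong effect on the other, which is why aligning the feature distributions matters for transfer.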

Thus, to execute transfer attacks, we must have access to samples of the intermediate feature maps. Specifically, we consider a scenario where the attacker can spy on the victim’s communication channel to obtain transmitted messages. However, since sensory information is not transmitted, the attacker does not have access to pairs of input and intermediate representation to directly supervise the surrogate via distillation. Thus, we propose to use Adversarial Discriminative Domain Adaptation (ADDA) [tzeng2017adversarial] to align the distributions of the victim’s messages $m$ and the surrogate’s messages $m'$ without explicit input-feature pairs. An overview of this method is shown in Figure 3.

In the original training pipeline, the surrogate modules $F'$ and $G'$ would be trained to minimize the task loss, which is defined with respect to each ground truth bounding box and its class. To incorporate domain adaptation, we introduce a discriminator $D$ to distinguish between real messages $m$ and surrogate messages $m'$. The three modules $F'$, $G'$, and $D$ are optimized with a min-max criterion that combines the task loss with a discriminator loss scaled by a weighting coefficient. The discriminator loss is supervised with a binary label that marks whether a message is a real message from $F$ or a surrogate message from $F'$. During training, we adopt spectral normalization [miyato2018spectral] in the discriminator and the two-time update rule [heusel2017gans] for stability.
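The two sides of such an adversarial objective can be sketched as follows. Binary cross-entropy is a common choice for this kind of discriminator, and the surrogate term below uses the inverted-label form from the ADDA literature; whether the paper uses exactly these forms is an assumption, as are the function names and the `weight` parameter.

```python
import math
from typing import Sequence

def discriminator_loss(prob_real: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one message (label 1: real, label 0: surrogate)."""
    p = min(max(prob_real, eps), 1.0 - eps)
    return -math.log(p) if label == 1 else -math.log(1.0 - p)

def discriminator_objective(d_on_real: Sequence[float],
                            d_on_surrogate: Sequence[float]) -> float:
    """Loss minimized by the discriminator: tell real and surrogate messages apart."""
    real = sum(discriminator_loss(p, 1) for p in d_on_real) / len(d_on_real)
    fake = sum(discriminator_loss(p, 0) for p in d_on_surrogate) / len(d_on_surrogate)
    return real + fake

def surrogate_objective(task_loss: float, d_on_surrogate: Sequence[float],
                        weight: float) -> float:
    """Loss minimized by the surrogate: solve the task and fool the discriminator."""
    fool = sum(discriminator_loss(p, 1) for p in d_on_surrogate) / len(d_on_surrogate)
    return task_loss + weight * fool

# Discriminator outputs are probabilities that a message is real.
sharp_discriminator = discriminator_objective([0.9], [0.1])
confused_discriminator = discriminator_objective([0.5], [0.5])
aligned_surrogate = surrogate_objective(1.0, [0.9], weight=0.1)
misaligned_surrogate = surrogate_objective(1.0, [0.1], weight=0.1)
```

The surrogate is rewarded when the discriminator mistakes its messages for real ones, which is what drives its feature distribution toward the victim's.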

3.4 Online Attack

In modern applications of autonomous systems, consecutive frames of sensory information are typically collected only milliseconds apart. Thus, there is a large amount of redundancy between consecutive frames which can be exploited to achieve more efficient adversarial attacks. Following previous work [wei19video] in images, we propose to exploit this redundancy by using the perturbation from the previous time step as initialization for the current time step.

Furthermore, we note that intermediate feature maps capture the spatial context of sensory observations, which changes due to the agent’s egomotion. Therefore, by applying a rigid transformation to the perturbation at every time step to account for egomotion, we can generate stronger perturbations that are synchronized with the movement of sensory observations relative to the agent. In this case, the perturbation at each time step is initialized by warping the previous perturbation with a rigid transformation $H_{t \to t+1}$, which maps the attacker’s pose at time $t$ to its pose at time $t+1$, and is then refined with a gradient update and clipped back to the $\epsilon$ bound. By leveraging temporal consistency, we can generate strong perturbations with only one gradient update per time step, making online attacks much more feasible.
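The warm-started update can be sketched on a one-dimensional feature grid. The integer shift used as the warp is only a special case of a rigid transformation, and the signed gradient step and the function names are illustrative assumptions.

```python
from typing import List

def sign(value: float) -> float:
    """Return -1.0, 0.0, or 1.0 depending on the sign of value."""
    return float((value > 0) - (value < 0))

def warp_shift(delta: List[float], shift: int) -> List[float]:
    """Shift a 1-D perturbation by `shift` cells, zero-filling newly visible cells."""
    warped = [0.0] * len(delta)
    for i, value in enumerate(delta):
        j = i + shift
        if 0 <= j < len(delta):
            warped[j] = value
    return warped

def online_step(prev_delta: List[float], shift: int, grad: List[float],
                epsilon: float, step_size: float) -> List[float]:
    """One online update: warp the previous perturbation, take one step, clip."""
    warm_start = warp_shift(prev_delta, shift)
    updated = [d - step_size * sign(g) for d, g in zip(warm_start, grad)]
    return [max(-epsilon, min(epsilon, d)) for d in updated]

# The perturbation from the previous frame is carried over and moved by one
# cell to follow the egomotion; only the last cell receives a new gradient.
prev_delta = [0.1, -0.1, 0.0, 0.0]
next_delta = online_step(prev_delta, shift=1, grad=[0.0, 0.0, 0.0, 1.0],
                         epsilon=0.1, step_size=0.05)
```

Starting from the warped perturbation means most of the optimization effort from earlier frames is reused, so a single step per frame can be enough.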

4 Experiments

4.1 Multi-Agent Settings

Figure 5: Qualitative attack examples in ShapeNet and V2V. Top: Messages sent by another agent visualized in bird’s eye view. Bottom: outputs. Perturbations are very subtle but severely degrade performance.

Multi-View ShapeNet: We conduct our attacks on multi-view detection from images, which is a common task for a fleet of drones. Following prior work [ricson], we generate a synthetic dataset by placing 10 classes of ShapeNet [shapenet] objects on a table. A visualization is shown in Figure 4. From each class we subsample 50 meshes and use a 40 / 10 split for training and validation. In every scene, we place 4 to 8 objects and perform collision checking to ensure objects do not overlap. Then, we capture 128x128 RGB-D images from 2 to 7 viewpoints sampled from the upper half of a sphere centered at the table center with a radius of 2.0 units. This dataset consists of 50,000 training scenes and 10,000 validation scenes. When conducting attacks, we randomly sample one of the agents to be the adversary. Our detection model uses an architecture similar to the one introduced in [ricson]. Specifically, we process input RGB-D images using a U-Net [unet] and then unproject the features into 3D using the depth measurements. Features from all agents are then warped into the same coordinate frame and aggregated with mean pooling. Finally, aggregated features are processed by a 3D U-Net and a detection header to generate 3D bounding box proposals.

Vehicle To Vehicle Communication: We also conduct experiments in a self-driving setting with vehicle-to-vehicle (V2V) communication. Here, we adopt the dataset used in [v2vnet], where 3D reconstructions of logs of real world LiDAR scans are simulated from the perspectives of other vehicles in the scene using a high-fidelity LiDAR simulator [lidarsim]. These logs are collected by self-driving vehicles equipped with LiDAR sensors capturing 10 frames per second. A visualization is shown in Figure 4. The training set consists of 46,796 subsampled frames from the logs and we do not subsample the validation set, resulting in 96,862 frames. In every log we select one attacker vehicle and sample others to be cooperative agents, with up to 7 agents in each frame unless otherwise specified. This results in a consistent assignment of attackers and V2V agents throughout the frames. In this setting, we use the state-of-the-art perception and motion forecasting model V2VNet [v2vnet]. Here, LiDAR inputs are first encoded into bird’s eye view (BEV) feature maps. Feature maps from all agents are then warped into the ego coordinate frame and aggregated with a GNN to produce BEV bounding box proposals. We provide detailed descriptions of the ShapeNet model and V2VNet in the supplementary materials.

Figure 6: Evaluation under no perturbation, uniform noise, transfer attack, and white box attack. Results are grouped by the number of agents in the scene where one agent is the attacker.

Implementation Details: When conducting attacks, we use a fixed perturbation bound $\epsilon$ and fixed values for the weighting coefficient and confidence thresholds of the proposed loss function. For projected gradient descent, we use Adam, applying multiple PGD steps for ShapeNet and only one PGD step for low-budget online attacks in the V2V setting. The surrogate models use the same architecture and dataset as the victim models. When training the surrogate model, we use separate learning rates for the model and the discriminator. For evaluation, we compute the area under the precision-recall curve of bounding boxes, where bounding boxes are correct if they have an IoU greater than 0.7 with a ground truth box of the same class. We refer to this metric as AP at 0.7 in the following.
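The evaluation metric can be sketched as follows. The sketch assumes that each detection has already been matched against the ground truth at the IoU threshold of 0.7, and it computes the uninterpolated area under the precision-recall curve; the exact AP protocol used in the experiments (for example, any interpolation) is not specified here, so this is one plausible reading.

```python
from typing import List, Tuple

def average_precision(detections: List[Tuple[float, bool]], num_ground_truth: int) -> float:
    """Area under the precision-recall curve.

    detections: (confidence, is_true_positive) pairs, where a true positive
    is a box that matched an unclaimed ground truth box of the same class
    with IoU above the threshold.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    ap = 0.0
    for rank, (_, is_true_positive) in enumerate(ranked, start=1):
        if is_true_positive:
            true_positives += 1
            precision = true_positives / rank
            # Each true positive advances recall by 1 / num_ground_truth.
            ap += precision / num_ground_truth
    return ap

detections = [(0.9, True), (0.8, False), (0.7, True)]
ap = average_precision(detections, num_ground_truth=2)
```

False positives with high confidence lower the precision at every later recall step, and missed objects cap the recall, so both failure modes induced by the attack reduce this metric.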

4.2 Results

Attack Results: Visualizations of our attack are shown in Figure 5 and we present quantitative results of our attack and baselines in Figure 6. We split up the evaluation by the number of agents in the scene, and one of the agents is always an attacker. As a baseline, we sample the perturbation uniformly at random within the same $\epsilon$ bound to demonstrate that bounded uniform noise of the same magnitude does not have any impact on detection performance. The white box attack is especially strong when few agents are in the scene, but becomes weaker as the number of benign agents increases, causing the relative weight of the adversarial features in mean pooling layers to decrease. Finally, a transfer attack with domain adaptation achieves moderate success with few agents in the scene, but is significantly weaker than a white box attack.
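The dilution effect can be made concrete for plain mean pooling. The following is a simplified worked bound, assuming a single attacker $i$, element-wise mean aggregation over $N$ messages $m_1, \dots, m_N$, and an $\ell_\infty$ budget $\epsilon$ on the perturbation $\delta$; it does not describe the GNN fusion used in V2VNet.

```latex
\bar{m}' - \bar{m}
  = \frac{1}{N}\Big( m_i + \delta + \sum_{k \neq i} m_k \Big)
    - \frac{1}{N} \sum_{k=1}^{N} m_k
  = \frac{\delta}{N},
\qquad
\|\bar{m}' - \bar{m}\|_\infty \le \frac{\epsilon}{N}.
```

Under these assumptions, the aggregated feature can move by up to $\epsilon / 2$ with two agents but by at most $\epsilon / 6$ with six agents, which matches the trend that attacks weaken as benign agents are added.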

| Model | ShapeNet (Clean) | ShapeNet (Perturbed) | V2V (Clean) | V2V (Perturbed) |
|---|---|---|---|---|
| Original | 66.33 | 0.62 | 82.19 | 7.55 |
| Adv Trained | 67.29 | 66.00 | 82.60 | 83.44 |

Table 1: Results of adversarial training. Robustness increases significantly, matching clean inference. Furthermore, performance on clean data also improves slightly.

Robustifying Models: To defend against our proposed attack, we conduct adversarial training against the white box adversary and show the results in Table 1. Here, we follow the standard adversarial training setup, except perturbations are applied to intermediate features instead of inputs. This objective can be formulated as a min-max problem: the model parameters are trained to minimize the expected task loss over the natural training distribution, where the task loss of each sample is evaluated under the worst-case perturbation of the transmitted message within the $\epsilon$ bound. During training, we generate a new perturbation for each training sample. In the multi-agent setting, we find it much easier to recover from adversarial perturbations when compared to traditional attacks performed on a single input. Moreover, adversarial training is able to slightly improve performance on clean data as well, whereas adversarial training has been known to hurt natural performance in previous settings [advmixup, tsipras2018robustness].

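| Fusion | Clean (2 agents) | Clean (4 agents) | Clean (6 agents) | Perturbed (2 agents) | Perturbed (4 agents) | Perturbed (6 agents) |
|---|---|---|---|---|---|---|
| Mean Pool | 82.09 | 89.25 | 92.43 | 0.90 | 12.93 | 41.77 |
| GNN (Mean) | 82.19 | 89.93 | 92.94 | 7.55 | 52.31 | 76.18 |
| GNN (Median) | 82.11 | 87.12 | 90.75 | 12.8 | 67.70 | 86.30 |
| GNN (Soft Med) | 82.19 | 89.67 | 92.49 | 21.53 | 61.37 | 84.99 |

Table 2: Choice of fusion in V2VNet affects performance and robustness. We investigate using mean pooling and using a GNN with various aggregation methods.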

While adversarial training is effective in this setting, it requires knowledge of the threat model. When the threat model is unknown, we can still naturally boost the robustness of multi-agent models through the design of the aggregation module. Specifically, we consider several alternatives to V2VNet’s GNN fusion and present the performance on attacked and clean data in Table 2. First, replacing the entire GNN with an adaptive mean pooling layer significantly decreases robustness. On the other hand, when we swap out the mean pooling in GNN nodes for median pooling, we find that it increases robustness at the cost of performance on clean data with more agents, since more information is discarded. We refer readers to the supplementary materials for more details on the implementation of the soft median pooling.
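The difference between mean and median aggregation can be illustrated on a single feature value. The numbers below are hypothetical and the corrupted value is deliberately exaggerated for clarity; the sketch shows the general robustness property of the median, not the behaviour of the GNN fusion in V2VNet.

```python
from statistics import mean, median

# One feature value as reported by three benign agents.
benign = [0.9, 1.0, 1.1]

# A fourth agent sends a corrupted value for the same feature.
with_attacker = benign + [5.0]

# How far each aggregate moves once the corrupted message is included.
mean_shift = abs(mean(with_attacker) - mean(benign))
median_shift = abs(median(with_attacker) - median(benign))
```

The median moves far less than the mean because it ignores the magnitude of the outlying message. The cost is that it also ignores part of the benign messages, which is in line with the lower clean performance of median pooling with more agents in Table 2.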

Multiple Attackers: We previously focused on settings with one attacker, and now conduct experiments with multiple attackers in the V2V setting. In each case, we also consider whether attackers are able to cooperate. With cooperation, attackers jointly optimize their perturbations. Without cooperation, attackers are blind to each other and optimize their perturbations assuming other messages have not been perturbed. Results with up to 3 attackers are shown in Table 3. As expected, more attackers can increase the strength of the attack significantly. Furthermore, if multiple attackers can coordinate, a stronger attack can be generated.

Figure 7: Visualization of how domain adaptation (DA) affects 4 channels of the intermediate feature map. Observe that the surrogate trained with DA closely imitates the victim model, while the surrogate trained without DA produces different features.

| Attackers | Cooperative (4 agents) | Cooperative (5 agents) | Cooperative (6 agents) | Non-Cooperative (4 agents) | Non-Cooperative (5 agents) | Non-Cooperative (6 agents) |
|---|---|---|---|---|---|---|
| 1 Attacker | 52.31 | 65.00 | 76.18 | 52.31 | 65.00 | 76.18 |
| 2 Attackers | 28.31 | 41.34 | 54.50 | 39.02 | 51.96 | 64.02 |
| 3 Attackers | 12.07 | 22.84 | 35.13 | 24.27 | 38.17 | 51.58 |

Table 3: White box attack with multiple attackers in the V2V setting. Cooperative attackers jointly optimize their perturbations and non-cooperative attackers optimize without knowledge of each other.

Next, we apply adversarial training to the multi-attacker setting and present results in Table 4. Here, all attacks are done in the cooperative setting and we show results with 4 total agents. Similar to the single attacker setting, adversarial training is highly effective. However, while adversarial training against one attacker improves performance on natural examples, being robust to stronger attacks sacrifices performance on natural examples. This suggests that adversarial training has the potential to improve general performance when an appropriate threat model is selected. Furthermore, we can see that training against fewer attackers does not generalize perfectly to more attackers, whereas training against more attackers does generalize to fewer attackers. Thus, it is necessary to train explicitly against an equal or greater threat model to fully defend against such attacks.

| Training setting | 0 attackers | 1 attacker | 2 attackers | 3 attackers |
|---|---|---|---|---|
| Train On 0 | 89.93 | 52.31 | 28.31 | 12.07 |
| Train On 1 | 90.09 | 90.00 | 81.95 | 75.28 |
| Train On 2 | 89.71 | 89.68 | 88.91 | 88.33 |
| Train On 3 | 89.55 | 89.51 | 88.94 | 88.51 |

Table 4: Adversarial training with multiple attackers in the V2V setting. We train on settings with various numbers of attackers and evaluate the models across the settings.

Domain Adaptation: More results of the transfer attack are included in Table 5. First, we conduct an ablation and show that a transfer attack without domain adaptation (DA) is ineffective. On the other hand, a surrogate trained with DA can achieve some success. A visual demonstration of feature map alignment with DA is shown in Figure 7, visualizing 4 channels of the intermediate feature maps. Features from a surrogate trained with DA are visually very similar to those of the victim, while a surrogate trained without DA does not produce features with much resemblance.

Since our proposed DA improves the transferability of the surrogate model, we can further improve our transfer attack by also adopting methods from the literature which enhance the transferability of a given perturbation. We find that generating perturbations from diversified inputs (DI) [xie2019improving] is ineffective, as resizing input feature maps distorts spatial information which is important for localizing objects. On the other hand, using an intermediate level attack projection (ILAP) [Huang2019] yields a small improvement. Overall, we find transfer attacks much more difficult at the feature level when compared to directly perturbing the inputs. In standard attacks on sensory inputs, perturbations are transferred into the same input domain. However, at the feature level the perturbation domains are model-dependent, making transfer attacks between different models much more difficult.

| Attack | ShapeNet | V2V |
|---|---|---|
| Clean | 66.28 | 82.19 |
| Transfer | 66.21 | 81.31 |
| Transfer + DA | 42.59 | 72.45 |
| Transfer + DA + ILAP | 35.69 | 71.76 |
| Transfer + DA + DI | 49.38 | 75.18 |

Table 5: Transfer attacks evaluated with 2 agents. Training the surrogate with domain adaptation (DA) significantly improves transferability. In addition, we attempt to enhance transferability with ILAP [Huang2019] and DI [xie2019improving].

Online Attacks: We conduct an ablation on the proposed methods for exploiting temporal redundancy in an online V2V setting, shown in Table 6. First, if we ignore temporal redundancy and do not reuse the previous perturbation, attacks are much weaker. In this evaluation, we switch from PGD [pgd] to FGSM [goodfellow2015explaining] to obtain a stronger perturbation in one update for a fair comparison. We also show that applying a rigid transformation to match egomotion provides a modest improvement to the attack when compared to the No Warping ablation.

Loss Function Design: We conduct an ablation of our proposed loss against using the negative task loss and present results in Table 7. This ablation validates our loss function and showcases that for structured outputs, a well-designed adversarial loss may be more effective than simply flipping the sign of the task loss. Our choice of loss function design is motivated by our knowledge of the post-processing step, non-maximum suppression (NMS). Since NMS selects bounding boxes with the highest confidence in a local region, proposals with higher scores should receive stronger gradients. More specifically, an appropriate loss function for proposal likelihood should produce gradients whose magnitude increases with the proposal score. The standard log likelihood does not satisfy this criterion, which is why the alternative in our loss formulation is more effective. In addition, we found that adding a focal loss term [focalloss] helped generate false positives, as aggressively focusing on one proposal in a local region is more effective.
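The gradient argument can be checked numerically. The sketch below is illustrative only: it assumes that the sign-flipped task loss for a detected object with confidence sigma is log(sigma), and that the alternative term has the form -log(1 - sigma). Both forms are assumptions made here for illustration, and the function names are hypothetical.

```python
import numpy as np


def grad_mag_negative_task_loss(sigma):
    """|d/d sigma| of log(sigma), the sign-flipped cross entropy of a positive."""
    return 1.0 / sigma


def grad_mag_alternative(sigma):
    """|d/d sigma| of -log(1 - sigma), the assumed alternative term."""
    return 1.0 / (1.0 - sigma)


# Three proposals in the same local region, ordered by increasing confidence.
scores = np.array([0.6, 0.9, 0.99])

# The sign-flipped task loss pushes hardest on the LEAST confident proposal,
# which NMS would discard anyway.
print(grad_mag_negative_task_loss(scores))

# The alternative pushes hardest on the MOST confident proposal,
# which is the one NMS would keep.
print(grad_mag_alternative(scores))
```

The same ordering holds if gradients are taken with respect to the logit rather than the probability, since the corresponding magnitudes are 1 - sigma and sigma.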

Method        2 Agents   4 Agents   6 Agents
Our Attack    7.55       52.31      76.18
No Warping    7.17       52.35      77.37
Independent   56.98      80.21      87.05
Table 6: Ablation on online attacks in the V2V setting. Independent refers to treating each frame independently and not reusing previous perturbations. No Warping refers to omitting the rigid transformation to account for egomotion.

Dataset    Loss        2 Agents   4 Agents   6 Agents
ShapeNet   Task Loss   6.10       20.07      29.00
ShapeNet   Our Loss    0.37       4.45       13.77
V2V        Task Loss   20.8       63.82      79.11
V2V        Our Loss    7.55       52.31      76.18
Table 7: Ablation on our loss function, which produces stronger adversarial attacks than simply using the negative of the training task loss.

5 Conclusion

In this paper, we investigated adversarial attacks on communication in multi-agent deep learning systems. Our experiments in two practical multi-view perception tasks demonstrate that while communication is vulnerable to adversarial attacks, robustness increases as the ratio of benign to malicious actors increases. In a feature-level communication setting, adversarial training is very effective and can defend almost perfectly without sacrificing performance on natural examples. Even against unknown threat models, one can achieve greater robustness through the design of the message aggregation module. Furthermore, we found that more practical transfer attacks are more difficult in this setting and require aligning the distributions of intermediate representations. Finally, we proposed a method to achieve efficient and practical online attacks by exploiting the temporal consistency of sensory inputs. By studying adversarial attacks, our work is a step towards safer multi-agent systems.


Appendix A Additional Implementation Details

A.1 V2VNet

LiDAR Feature Extraction

First, raw LiDAR point clouds are preprocessed to filter out points outside the region of interest (ROI), namely for the and coordinates. Point clouds are then voxelized into density voxels using bilinear interpolation to calculate the weighting of each point in nearby voxels. The 3D voxel volume is then processed as a bird’s eye view image with a 2D CNN which produces an intermediate representation to be shared.
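To illustrate the interpolation-based weighting, the following sketch accumulates a 2D bird's eye view density grid in which every point spreads a unit weight over its four nearest cell centres. It is a simplified illustration under stated assumptions: the function name `splat_bilinear`, the 2D (rather than 3D) grid, and the cell size are all hypothetical and not taken from the model description.

```python
import numpy as np


def splat_bilinear(points_xy, grid_shape, cell_size):
    """Accumulate an (H, W) density grid from (N, 2) points.

    Each point contributes a total weight of 1, split bilinearly between the
    four nearest cell centres. Contributions that fall outside the grid are
    dropped, mirroring the ROI filtering step.
    """
    h, w = grid_shape
    grid = np.zeros((h, w))
    # Continuous coordinates in units of cells, measured from cell centres.
    u = points_xy[:, 0] / cell_size - 0.5
    v = points_xy[:, 1] / cell_size - 0.5
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    fu, fv = u - u0, v - v0
    corners = (
        (0, 0, (1 - fu) * (1 - fv)),
        (1, 0, fu * (1 - fv)),
        (0, 1, (1 - fu) * fv),
        (1, 1, fu * fv),
    )
    for du, dv, weight in corners:
        iu, iv = u0 + du, v0 + dv
        inside = (iu >= 0) & (iu < h) & (iv >= 0) & (iv < w)
        # np.add.at accumulates correctly when several points hit the same cell.
        np.add.at(grid, (iu[inside], iv[inside]), weight[inside])
    return grid
```

For example, with `cell_size=1.0` a point at a cell centre puts all of its weight into that cell, while a point on the shared corner of four cells gives each of them a weight of 0.25.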

BEV Feature Aggregation

Upon receiving intermediate BEV features from other vehicles, the receiver first warps each image into its own coordinate frame such that messages are spatially aligned and features outside the receiver’s ROI are discarded. The messages are then fused using a graph neural network (GNN). The GNN performs 3 rounds of message passing where GNN node states are updated with a convolutional gated recurrent unit at each step. After the final iteration, an MLP outputs a post-aggregation BEV representation.

BEV Detection: Following aggregation, the BEV image is processed with 4 multi-scale convolutional blocks similar to InceptionNet [inception] to capture different levels of contextual information. Finally, a detection header outputs bounding box proposals which are then processed with non-maximum suppression.

Learning: To train the model, we use cross entropy loss for proposal classification and smooth L1 loss for bounding box regression. During training, hard negative mining is used to select 20 hard negatives for each sample. We first pretrain the detection model without any fusion and then freeze the weights of the LiDAR feature extraction network to train the network with fusion. Both stages are trained with the same annotations, using Adam [adam] with learning rate 0.001.
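The training objective described above can be sketched in NumPy as follows. This is an illustrative sketch rather than the training code: it assumes binary (object versus background) classification, a smooth L1 transition point `beta` of 1.0, and normalization by the number of positives, none of which is specified in the text. All function names are hypothetical.

```python
import numpy as np


def binary_cross_entropy(prob, label, eps=1e-7):
    """Element-wise cross entropy between predicted probabilities and 0/1 labels."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return -(label * np.log(prob) + (1 - label) * np.log(1.0 - prob))


def smooth_l1(pred, target, beta=1.0):
    """Element-wise smooth L1: quadratic below beta, linear above."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)


def detection_loss(scores, labels, box_pred, box_target, num_hard_neg=20):
    """Classification + regression loss for one sample with hard negative mining.

    scores, labels       : (N,) proposal probabilities and 0/1 labels.
    box_pred, box_target : (N, D) box parameters; only positives are regressed.
    num_hard_neg         : number of highest-loss negatives kept.
    """
    cls = binary_cross_entropy(scores, labels)
    pos = labels == 1
    # Hard negative mining: keep only the negatives with the largest loss.
    hard_neg = np.sort(cls[~pos])[::-1][:num_hard_neg]
    cls_loss = cls[pos].sum() + hard_neg.sum()
    reg_loss = smooth_l1(box_pred[pos], box_target[pos]).sum()
    return (cls_loss + reg_loss) / max(int(pos.sum()), 1)
```

Here `num_hard_neg=20` follows the value stated for V2VNet; everything else is an assumption of the sketch.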

A.2 ShapeNet Dataset

Objects are placed onto the same table, which is also taken from the ShapeNet dataset, and we use the same background with RGB (200, 200, 200). All images in this dataset are rendered using Habitat-sim [habitat] and we use the same lighting setup in every picture, with 5 light sources around the center of the table. When subsampling 50 meshes from each class, we take the ones with the highest number of vertices to sample high quality meshes. Each agent uses a pinhole camera with a focal length of 1.0 units. Objects are placed onto a 5x3 grid on the table, and we designate regularly spaced points for object placement. During placement, we sample one of these locations and apply a uniformly random offset bounded by in each direction and perform collision checking to ensure object placements are valid.
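The placement procedure can be sketched with rejection sampling. This is a minimal sketch under stated assumptions: the grid spacing, the offset bound `max_offset`, and the circular collision model with a common `radius` are hypothetical stand-ins, since the actual values and the collision test are not given here.

```python
import random


def place_objects(num_objects, radius, spacing=1.0, max_offset=0.2,
                  rows=3, cols=5, max_tries=100, seed=None):
    """Place objects on a cols x rows grid with random offsets and collision checks.

    Each object is modelled as a circle of the given radius (an assumption).
    Returns a list of (x, y) centres.
    """
    rng = random.Random(seed)
    slots = [(c * spacing, r * spacing) for r in range(rows) for c in range(cols)]
    placed = []
    for _ in range(num_objects):
        for _ in range(max_tries):
            # Sample a designated location, then jitter it in each direction.
            sx, sy = rng.choice(slots)
            x = sx + rng.uniform(-max_offset, max_offset)
            y = sy + rng.uniform(-max_offset, max_offset)
            # Collision check: centres must be at least two radii apart.
            if all((x - px) ** 2 + (y - py) ** 2 >= (2 * radius) ** 2
                   for px, py in placed):
                placed.append((x, y))
                break
        else:
            raise RuntimeError("could not find a collision-free placement")
    return placed
```

A call such as `place_objects(5, radius=0.2, seed=0)` returns five collision-free centres; resampling on failure keeps every accepted placement valid without having to reserve grid locations in advance.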

A.3 ShapeNet Model

Image Feature Extraction: Our ShapeNet detection model closely follows prior work on active vision for drones [ricson]. The model starts with a 2D U-Net [unet] consisting of 4 blocks in the encoder and decoder. Each block consists of two groups of convolution, group normalization [groupnorm], and ReLU activation. In the encoder, we downsample inputs using stride 2 convolution in the first convolution of each block. In the decoder, we upsample features using bilinear interpolation before each block. We set the initial number of channels to be 48.

3D Feature Aggregation: Following the U-Net, we use the pose information of the camera and the depth values of the pixels to unproject the pixel features into 3D voxels. For voxelization we use a grid where height is the second dimension. If multiple pixels map to the same voxel, we apply mean pooling to aggregate the features. After unprojecting into a common 3D coordinate frame, each agent broadcasts its features, which are then aggregated with mean pooling.
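The unprojection and pooling steps can be sketched as below. This is an illustration, not the model code: `unproject_to_voxels` and `fuse_mean` are hypothetical names, depth is assumed to be measured along the camera z-axis, and the grid origin, voxel size, and axis ordering are placeholder assumptions.

```python
import numpy as np


def unproject_to_voxels(feats, depth, K, cam_to_world, grid_min, voxel_size, grid_shape):
    """Lift (C, H, W) pixel features into a (C, X, Y, Z) voxel grid with mean pooling.

    depth        : (H, W) depth along the camera z-axis (assumption).
    K            : (3, 3) pinhole intrinsics.
    cam_to_world : (4, 4) camera pose.
    """
    c, h, w = feats.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    # Back-project each pixel to a 3D point in the camera frame, then to world.
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    world = cam_to_world[:3, :3] @ cam + cam_to_world[:3, 3:4]
    idx = np.floor((world - np.asarray(grid_min, float).reshape(3, 1)) / voxel_size)
    idx = idx.astype(int)
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape).reshape(3, 1)), axis=0)
    flat = np.ravel_multi_index(tuple(idx[:, inside]), grid_shape)
    n_vox = int(np.prod(grid_shape))
    sums = np.zeros((n_vox, c))
    counts = np.zeros(n_vox)
    # Accumulate features and hit counts per voxel, then divide for the mean.
    np.add.at(sums, flat, feats.reshape(c, -1).T[inside])
    np.add.at(counts, flat, 1.0)
    mean = sums / np.maximum(counts, 1.0)[:, None]
    return mean.T.reshape((c,) + tuple(grid_shape))


def fuse_mean(volumes):
    """Aggregate the voxel features broadcast by all agents with mean pooling."""
    return np.mean(np.stack(volumes), axis=0)
```

Because every agent unprojects into the same world-aligned grid, `fuse_mean` can average the received volumes element-wise without any further alignment.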

3D Detection: Following aggregation, a 3D U-Net is used to process the voxel features. The 3D U-Net is similar to the 2D U-Net, with 4 blocks in the encoder and decoder. Finally, a detection header processes the features to generate proposals for each voxel.

Learning: To train the model, we use the Adam optimizer with learning rate , cross entropy loss for classification, and smooth L1 loss for regression. During training, we employ hard negative mining and mine 10 hard negatives for each sample.

Appendix B Additional Results

Other Fusion Methods: Aside from sharing learned intermediate representations, agents can alternatively share raw sensory inputs or predicted outputs. We follow prior implementations [v2vnet] and perform input fusion by directly overlaying LiDAR sweeps from other agents before running inference. For output fusion, we overlay output bounding boxes from other agents and then perform non-maximum suppression to select a single box for an instance.
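Output fusion as described above amounts to pooling all agents' boxes and running greedy non-maximum suppression. The sketch below is illustrative: it uses axis-aligned boxes in `(x1, y1, x2, y2)` format and an IoU threshold of 0.5, both of which are simplifying assumptions, and the function names are hypothetical.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one (4,) box and an (N, 4) array of axis-aligned boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop its overlaps, repeat."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_thresh]
    return keep


def output_fusion(per_agent_boxes, per_agent_scores, iou_thresh=0.5):
    """Overlay the boxes from all agents and keep a single box per instance."""
    boxes = np.concatenate(per_agent_boxes)
    scores = np.concatenate(per_agent_scores)
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep]
```

Since NMS always keeps the highest-confidence box in a local region, a single agent that reports a confident but wrong box can override the correct boxes of other agents, which is what makes output fusion attackable.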

                  Clean (agents)            Perturbed (agents)
Fusion method     2       4       6         2       4       6
Feature fusion    82.19   89.93   92.94     7.55    52.31   76.18
Input fusion      81.03   88.18   91.28     42.71   69.95   80.29
Output fusion     80.32   86.69   89.76     45.27   58.71   64.82
Table 8: Performance on clean and perturbed data for input fusion, output fusion, and intermediate feature fusion. Attacks become weaker with more benign agents in all fusion methods.

We conduct additional experiments of attacks on these fusion methods in Table 8 and designate one attacker amongst a variable number of agents in the communication network. For input fusion, we apply perturbations to each LiDAR point with and for output fusion we perturb bounding box parameters with and also add fake bounding boxes where is the size of the unperturbed bounding box set. We find that the trend of increased robustness with more agents still holds. However, it is difficult to compare across different fusion methods fairly, as it is unclear how to set fair constraints for all settings.

Attacker Distance: In the V2V setting, agents perceive the world with a limited viewing range. Therefore, an attacker can only influence a victim SDV where their viewing ranges overlap. Thus, we expect stronger attacks when the attacker is closer to the victim and the overlap is maximized. We verify this intuition in Figure 8, where we plot the detection performance after attack versus the distance between the attacker and the victim. Observe that attacks become stronger when the attacker is close to the victim.

Cross Inference: Another way to showcase the effectiveness of our proposed domain adaptation is to use the surrogate model to process features from the victim model and evaluate the performance, which we call Cross Inference. Specifically, this is evaluating the outputs of and we present results with a single attacker and victim in Table 9. Without domain adaptation, the surrogate model is not able to interpret features generated by the victim model to produce accurate detection outputs. However, with domain adaptation, the cross inference results are significantly better.

Figure 8: Attacks become stronger as attacker gets closer to victim.
AP @ 0.7, 2 Agents
                    ShapeNet                     V2V
Model               Cross Inf   Transfer Atk     Cross Inf   Transfer Atk
Original Model      66.28       0.37             82.19       7.55
Surrogate w/o DA    0.51        66.21            2.47        81.34
Surrogate w/ DA     48.08       42.59            72.02       72.45
Table 9: Domain adaptation (DA) ablation. Cross inference refers to the surrogate model doing inference with the original model’s intermediate feature maps. Note that a transfer attack with the original model is equivalent to a white box attack. Without domain adaptation, the surrogate cannot use the original model’s features for inference and thus cannot produce transferable perturbations.

Qualitative Examples: We provide more qualitative examples in the ShapeNet setting in Figure 10. The feature volumes are visualized from bird’s eye view. After an imperceptible perturbation to the transmitted feature map, the output detections are severely degraded. For these visualizations, we projected the 3D bounding boxes onto the images. Similarly, examples in the V2V setting are shown in Figure 9.

Panels: Clean Message, Clean Output, Adversarial Message, Adversarial Output
Figure 9: Qualitative examples of perturbing the transmitted feature map to attack bird’s eye view vehicle detection. With imperceptible perturbations on the messages, the detection output can be severely degraded.
Panels: Clean Message, Clean Output, Adversarial Message, Adversarial Output
Figure 10: Qualitative examples of perturbing the transmitted feature map to attack 3D object detection on ShapeNet objects. With imperceptible perturbations on the messages, the detection output can be severely degraded.