Lightning: Striking the Secure Isolation on GPU Clouds with Transient Hardware Faults

12/07/2021
by Rihui Sun, et al.

GPU clouds have become a popular computing platform because owning and maintaining high-performance computing clusters is costly. Many cloud architectures have also been proposed to ensure a secure execution environment for guest applications by enforcing strong security policies that isolate the untrusted hypervisor from the guest virtual machines (VMs). In this paper, we study the impact of GPU hardware faults on the security of the cloud's "trusted" execution environment, using a Deep Neural Network (DNN) as the underlying application. We show that transient hardware faults can be induced in GPUs by exploiting Dynamic Voltage and Frequency Scaling (DVFS). These faults may cause computation errors, but on their own they have limited impact on DNN inference accuracy because well-developed DNN models are robust and fault-tolerant. To take full advantage of these transient hardware faults, we propose the Lightning attack, which locates the fault injection targets within a DNN and controls the precision of fault injection in terms of timing and position. We conduct experiments on three commodity GPUs to attack four widely used DNNs. Experimental results show that the proposed attack can reduce the inference accuracy of the models by as much as 78.3%, and by 64.5% on average. More importantly, 67.9% of the targeted attacks successfully misled the models into producing our desired incorrect inference result. This demonstrates that secure isolation on GPU clouds is vulnerable to transient hardware faults and that the computation results may not be trustworthy.
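
As a rough illustration of why the position of a fault matters, the sketch below flips a single bit in the IEEE 754 float32 encoding of a value. It is not the paper's method: the single-bit-flip fault model, the helper name `flip_bit`, and the example weight are all assumptions made here for illustration, and the abstract does not say how DVFS-induced faults actually manifest in the data.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value`, stored as float32, with one bit of its encoding flipped.

    Hypothetical helper: bit 0 is the least significant mantissa bit,
    bits 23-30 are the exponent, and bit 31 is the sign.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the chosen bit and reinterpret the result as float32 again.
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


weight = 0.5  # assumed example value, e.g. one DNN weight

mantissa_fault = flip_bit(weight, 0)   # barely changes the value
exponent_fault = flip_bit(weight, 30)  # changes the value by many orders of magnitude
sign_fault = flip_bit(weight, 31)      # negates the value

print(mantissa_fault, exponent_fault, sign_fault)
```

Under this assumed model, a flip in a low mantissa bit perturbs the value only slightly, which is consistent with the limited accuracy impact of uncontrolled faults, while a flip in a high exponent bit is far more disruptive.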

research
03/02/2021

Representing Gate-Level SET Faults by Multiple SEU Faults at RTL

The advanced complex electronic systems increasingly demand safer and mo...
research
05/20/2021

DeepStrike: Remotely-Guided Fault Injection Attacks on DNN Accelerator in Cloud-FPGA

As Field-programmable gate arrays (FPGAs) are widely adopted in clouds t...
research
10/12/2021

MoRS: An Approximate Fault Modelling Framework for Reduced-Voltage SRAMs

On-chip memory (usually based on Static RAMs-SRAMs) are crucial componen...
research
12/05/2022

Thales: Formulating and Estimating Architectural Vulnerability Factors for DNN Accelerators

As Deep Neural Networks (DNNs) are increasingly deployed in safety criti...
research
06/19/2023

Understanding the Effects of Permanent Faults in GPU's Parallelism Management and Control Units

Graphics Processing Units (GPUs) are over-stressed to accelerate High-Pe...
research
06/13/2021

Single Event Transient Fault Analysis of ELEPHANT cipher

In this paper, we propose a novel fault attack termed as Single Event Tr...
research
10/15/2019

Alleviating Bottlenecks for DNN Execution on GPUs via Opportunistic Computing

Edge computing and IoT applications are severely constrained by limited ...
