Characterizing and Detecting CUDA Program Bugs
While CUDA has become a major parallel computing platform and programming model for general-purpose GPU computing, CUDA-induced bug patterns have not yet been well explored. In this paper, we conduct the first empirical study to reveal important categories of CUDA program bug patterns based on 319 bugs identified within 5 popular CUDA projects in GitHub. Our findings demonstrate that CUDA-specific characteristics may cause program bugs such as synchronization bugs that are rather difficult to detect. To efficiently detect such synchronization bugs, we establish the first lightweight general CUDA bug detection framework, namely Simulee, to simulate CUDA program execution by interpreting the corresponding llvm bytecode and collecting the memory-access information to automatically detect CUDA synchronization bugs. To evaluate the effectiveness and efficiency of Simulee, we conduct a set of experiments and the experimental results suggest that Simulee can detect 20 out of the 27 studied synchronization bugs and successfully detects 26 previously unknown synchronization bugs, 10 of which have been confirmed by the developers.
READ FULL TEXT