Detection of Adversarial Attacks and Characterization of Adversarial Subspace

10/26/2019 · by Mohammad Esmaeilpour, et al.

Adversarial attacks have always been a serious threat to any data-driven model. In this paper, we explore subspaces of adversarial examples in the unitary vector domain, and we propose a novel detector for defending our models trained for environmental sound classification. We measure the chordal distance between legitimate and malicious representations of sounds in the unitary space of the generalized Schur decomposition and show that their manifolds lie far from each other. Our front-end detector is a regularized logistic regression that discriminates the eigenvalues of legitimate spectrograms from those of adversarial ones. Experimental results on three benchmarking datasets of environmental sounds represented as spectrograms reveal a high detection rate of the proposed detector for eight types of adversarial attacks, outperforming other detection approaches.


1 Introduction

In the field of sound and speech processing, it is very common to use 2D representations of audio signals for training data-driven algorithms for both regression and classification tasks. Such 2D representations have lower dimensionality than audio waveforms and they fit easily into advanced deep learning architectures mainly developed for computer vision applications. Mel-frequency cepstral coefficients (MFCC), the short-time Fourier transform (STFT), and the discrete wavelet transform (DWT) are among the most pervasive 2D signal representations, which essentially visualize the frequency-magnitude distribution of a given signal over time. Thus far, the best sound classification accuracy has been achieved by deep learning algorithms trained on 2D signal representations [1, 2]. However, it has been shown that, despite achieving high performance, approaches based on 2D representations are very vulnerable to adversarial attacks [3]. Unfortunately, this poses a strict security issue because crafted adversarial examples not only mislead the target model toward a wrong label, but are also transferable to other models, including conventional algorithms such as support vector machines (SVM) [3].

There are some discussions about the existence, origin, and behavior of adversarial examples, notably their linear characteristics [4], but there is no reliable approach to discriminate their underlying subspace(s) from that of legitimate examples. In an effort to characterize possible adversarial subspaces, some detectors have been introduced. They are mainly based on statistical comparisons of the victim model's predictions. Feinman et al. [5] have proposed to estimate the kernel density (KD) and Bayesian uncertainty (BU) of a trained deep neural network (DNN) for triplets of legitimate, noisy, and adversarial examples. All these measurements are carried out under the assumption that the DNN approximates a deep Gaussian process, and they result in higher ratios of KD and BU for adversarial examples compared to legitimate and noisy samples. Maximum mean discrepancy and energy distance are two other statistical metrics for investigating adversarial manifolds using the divergence of model predictions for clusters of datapoints [6]. In addition to these output-level statistical measurements, adversariality logits have been assessed by subnetworks placed on top of some hidden units of the victim model [7], and the instability of particular layers to perturbations has been measured [8]. Ma et al. [9] presented a comprehensive study for characterizing adversarial manifolds and introduced the local intrinsic dimensionality (LID) score, which measures the distance of the network prediction for a given example to the prediction logits of its neighbours at each hidden unit. The actual detector is a logistic regression binary classifier trained on one class made up of LID vectors of legitimate and noisy examples, since they lie in very close subspaces, and another class made up of LID vectors of adversarial examples generated by strong attacks. Experimental results on several datasets have shown the competitive performance of the LID detector compared to KD and BU [9]. Unfortunately, it has been shown that these detectors of adversarial examples might fail to detect strong adversarial attacks in adverse scenarios [10, 11], due to the difficulty of tuning the detectors or even due to particular characteristics of the datasets.

In this paper, we show that adversarial manifolds lie far from legitimate and noisy examples, using a chordal distance metric defined in a unitary space. We also provide an algorithm for proactively detecting potential malicious examples using the generalized Schur decomposition (a.k.a. QZ decomposition) [12]. This paper is organized as follows. Section 2 presents a brief explanation of the unitary space of the QZ decomposition as well as our adversarial detection algorithm. Experimental results on DWT representations of three environmental sound datasets are discussed in Section 3. Conclusions and perspectives of future work are presented in the last section.

2 Adversarial Detection

Computing norm metrics is a common approach for measuring the similarity between crafted adversarial examples and their legitimate counterparts. In addition to basic $\ell_p$ norms, metrics oriented to human visual inference have also been embedded in general optimization problems [13]. These similarity constraints are probably the most valuable clues for studying the possible subspaces of crafted examples.

It has been shown that, regardless of the category or type of adversarial attack, the generated examples, subject to a similarity constraint, lie in a sub-Cartesian space farther from the legitimate ones [9]. However, this is tricky and may not hold for strong attacks [11]. Our detailed study of the failure cases of such detectors uncovered the limitations of Cartesian (distance-based) metric spaces for exploring adversarial subspaces. Therefore, vector spaces that can discriminate between adversarial and legitimate manifolds can be very useful for building robust adversarial example detectors.

In this paper, we investigate mapping input samples to the vector space of the generalized Schur decomposition and using the chordal distance to identify their underlying subspaces.

2.1 Schur Decomposition and Chordal Distance

To compute the generalized Schur decomposition of two spectrograms denoted as $A$ and $B$ in the complex set $\mathbb{C}^{n \times n}$, there should exist unitary matrices $Q$ and $Z$ such that:

$Q^{H} A Z = T_{A}, \quad Q^{H} B Z = T_{B}$   (1)

where $T_{A}$ and $T_{B}$ are upper (quasi-)triangular and $Q^{H}$ denotes the conjugate transpose of $Q$. The eigenvalues ($\lambda_i$) of the pencil of $A$ and $B$ can be approximated as:

$\lambda_i \approx \dfrac{t_{A,ii}}{t_{B,ii}}, \quad i = 1, \dots, n$   (2)

where $t_{A,ii}$ and $t_{B,ii}$ are the diagonal elements of $T_{A}$ and $T_{B}$, respectively, and $\lambda_i$ becomes infinite or undefined for zero-valued diagonal entries of $T_{A}$ and $T_{B}$. In other words, the super-resolution similarity between two spectrograms can be calculated from the spectrum of the pencil:

$\lambda(A, B) = \{ z \in \mathbb{C} : \det(A - z B) = 0 \}$   (3)

As implied by the Bolzano-Weierstrass theorem [12], the bounded sequence of basis matrices supporting $(A, B)$ admits a convergent unitary subsequence, which leads to the following relation:

$Q^{H} (A B^{-1}) Q = T_{A} T_{B}^{-1}$   (4)

which asymptotically becomes equivalent to the generic Schur decomposition of $A B^{-1}$ for nonsingular basis matrices $B$.

Perturbing the spectrograms $A$ and $B$, even slightly, considerably increases the chance of noticeable variations in the resulting eigenvalues/eigenvectors [12]. Theoretically, we can measure this effect using the chordal metric, where the pencil $(A+E, B+F)$ is the point of interest for $(A, B)$ perturbed by $(E, F)$, as conditioned in Eq. 5:

$\max(\|E\|_{2}, \|F\|_{2}) \leq \epsilon$   (5)

where $\epsilon$ is a very small perturbation. The chordal distance between the vectors of eigenvalues $\lambda_i$ and $\tilde{\lambda}_i$ associated with the pencils $(A, B)$ and $(A+E, B+F)$ can be measured by Eq. 6 [12]:

$\chi(\lambda_i, \tilde{\lambda}_i) = \dfrac{|\lambda_i - \tilde{\lambda}_i|}{\sqrt{1 + |\lambda_i|^{2}}\,\sqrt{1 + |\tilde{\lambda}_i|^{2}}}$   (6)

where the pencils are neither necessarily bound to be normalized nor differentiable. For any adversarial attack that perturbs a legitimate spectrogram by a small perturbation, we compute the chordal distance as in Eq. 6 and compare the distances obtained in order to find separable manifolds for legitimate and adversarial examples.
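The following is a minimal sketch of Eqs. 1, 2, 5, and 6 using SciPy's QZ routine. The matrix sizes, perturbation level, and variable names are illustrative only, and the eigenvalue ordering across the two decompositions is assumed to be consistent, which is a simplification rather than the authors' implementation.

import numpy as np
from scipy.linalg import qz

def pencil_eigenvalues(A, B):
    # Generalized Schur (QZ) decomposition: Q^H A Z = TA, Q^H B Z = TB (Eq. 1).
    TA, TB, Q, Z = qz(A, B, output='complex')
    dA, dB = np.diag(TA), np.diag(TB)
    # Eigenvalues are the ratios of diagonal entries (Eq. 2);
    # zero entries of TB correspond to infinite eigenvalues.
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(np.abs(dB) > 0, dA / dB, np.inf)

def chordal_distance(lam, mu):
    # Chordal distance between two eigenvalue vectors (Eq. 6).
    return np.abs(lam - mu) / (np.sqrt(1 + np.abs(lam) ** 2) * np.sqrt(1 + np.abs(mu) ** 2))

# Compare the pencil (A, B) with a perturbed pencil (A + E, B + F) satisfying Eq. 5.
rng = np.random.default_rng(0)
A, B = rng.standard_normal((64, 64)), rng.standard_normal((64, 64))
eps = 1e-3
E, F = eps * rng.standard_normal((64, 64)), eps * rng.standard_normal((64, 64))
print(chordal_distance(pencil_eigenvalues(A, B), pencil_eigenvalues(A + E, B + F)).mean())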

2.2 Adversarial Subspace

We can explore the properties of adversarial examples using the chordal distance in the unitary space of eigenvectors, where each pair of spectrograms is represented by the unitary bases $Q$ and $Z$. For any legitimate and adversarial spectrograms, the chordal distance between their associated eigenvalues ($\lambda_i$, $\tilde{\lambda}_i$) must satisfy the constraint defined in Eq. 7 [12]:

$\chi(\lambda_i, \tilde{\lambda}_i) \leq \dfrac{\epsilon}{\sqrt{|y_i^{H} A x_i|^{2} + |y_i^{H} B x_i|^{2}}}$   (7)

where the unit 2-norm vectors $x_i$ and $y_i$ satisfy $A x_i = \lambda_i B x_i$ and $y_i^{H} A = \lambda_i y_i^{H} B$, which gives $A$ and $B$ symmetric roles in the upper bound. The extreme case for the defined pencil may happen when both $y_i^{H} A x_i$ and $y_i^{H} B x_i$ are zero. Therefore, we can replace their ratio with a small random value close to that of their neighbours.

Not only is satisfying Eq. 5 required for properly computing the chordal distance of eigenvalues, but it must also be part of the optimization procedure of any adversarial attack, because the perturbation should not be perceivable. For adversarial perturbations, an adjustment of the chordal distance by an additional factor is also required. The value of such a hyperparameter should be very small and associated with the mean eigenvalue, otherwise it might cause ill-conditioned cases. We examine the effects of different pencil perturbations, from random noise to carefully optimized adversarial examples, on the chordal distance and on the inequality of Eq. 7 in Section 3.
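Below is a hedged sketch of how the right-hand side of Eq. 7 could be estimated from the per-eigenvalue quantities $s_i = |y_i^{H} A x_i|$ and $t_i = |y_i^{H} B x_i|$ computed from the left and right eigenvectors of the pencil. The guard value and function name are assumptions for illustration, not the authors' code.

import numpy as np
from scipy.linalg import eig

def chordal_upper_bound(A, B, eps):
    # Generalized eigenproblem A x = lambda B x with left (Y) and right (X) eigenvectors.
    _, Y, X = eig(A, B, left=True, right=True)
    X = X / np.linalg.norm(X, axis=0)          # unit 2-norm right eigenvectors x_i
    Y = Y / np.linalg.norm(Y, axis=0)          # unit 2-norm left eigenvectors y_i
    s = np.abs(np.einsum('ij,jk,ki->i', Y.conj().T, A, X))   # s_i = |y_i^H A x_i|
    t = np.abs(np.einsum('ij,jk,ki->i', Y.conj().T, B, X))   # t_i = |y_i^H B x_i|
    denom = np.sqrt(s ** 2 + t ** 2)
    denom = np.where(denom > 1e-12, denom, 1e-12)  # guard the extreme case s_i = t_i = 0
    return eps / denom                              # per-eigenvalue bound on chi(lambda_i, lambda_i')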

2.3 Adversarial Discrimination

In practice, detecting adversarial examples using the chordal distance for a given test input requires access to its reference spectrogram as well as to the perturbation. However, this is not feasible in real-life applications. To rectify this issue, we propose to compare the eigenvalues of legitimate and adversarial examples in order to draw a decision boundary between them. To this end, we train a logistic regression on the eigenvalues of legitimate and adversarial examples, as shown in Algorithm 1.

For every pair of spectrograms randomly picked from the same class, we compute the associated eigenvalues using the QZ decomposition. We assume that the spectrograms have been generated from short audio signals and that they share significant similarities, especially when they are split into smaller batches.

Input: class of legitimate spectrograms
Output: Detector, a binary classifier for a test spectrogram
Initialize two empty lists of eigenvalue vectors, one for legitimate and one for adversarial pencils
for each pair of spectrograms randomly drawn from the legitimate batch do
    craft an adversarial counterpart of the second spectrogram (adversarial batch)
    compute the eigenvalues of the legitimate pencil via QZ decomposition and append them to the legitimate list
    compute the eigenvalues of the pencil formed with the adversarial spectrogram and append them to the adversarial list
Detector = train a classifier on the two lists of eigenvalues
Algorithm 1: Discriminating adversarial examples from legitimate ones using their associated eigenvalues.

For a test input spectrogram, the eigenvalues generated by its Schur decomposition are used as the input to the final front-end classifier (the detector), following the relation between the two decompositions explained in Section 2.1. Generalizing this algorithm to a multiclass classification problem requires computing eigenvalues for inter-class samples that share no significant similarity, which causes ill-conditioned decompositions of the pencils.
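The following is a minimal sketch of Algorithm 1, assuming a list of same-class square spectrograms and a craft_adversarial routine standing in for any of the eight attacks. Both names, the eigenvalue-magnitude features, and the regularization settings are assumptions for illustration.

import numpy as np
from scipy.linalg import qz
from sklearn.linear_model import LogisticRegression

def pencil_eigs(A, B):
    # Eigenvalue magnitudes of the pencil (A, B) used as real-valued features.
    TA, TB, _, _ = qz(A, B, output='complex')
    dB = np.diag(TB)
    lam = np.diag(TA) / np.where(np.abs(dB) > 1e-12, dB, 1e-12)
    return np.abs(lam)

def train_detector(legit_specs, craft_adversarial, rng):
    feats, labels = [], []
    for _ in range(len(legit_specs)):
        i, j = rng.choice(len(legit_specs), size=2, replace=False)
        A, B = legit_specs[i], legit_specs[j]
        B_adv = craft_adversarial(B)                     # adversarial counterpart of B
        feats.append(pencil_eigs(A, B)); labels.append(0)      # legitimate pencil
        feats.append(pencil_eigs(A, B_adv)); labels.append(1)  # adversarial pencil
    detector = LogisticRegression(penalty='l2', max_iter=1000)  # regularized logistic regression
    return detector.fit(np.array(feats), np.array(labels))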

3 Experimental Results

[Table 1; columns: FGSM, BIM-a, BIM-b, JSMA, CWA, Opt, EA, LFA; rows: mean adjustment, mean perturbation, and victim-model accuracy (%); NA: not applicable.]
Table 1: The mean values justifying the chordal distances of adversarial examples, the corresponding mean perturbation, and the recognition accuracy of the victim models (ConvNet and SVM) on the adversarial sets.
[Table 2; detectors: KD, BU, KD+BU, LID, and Proposed; attacks on the ConvNet: FGSM, BIM-a, BIM-b, JSMA, CWA, Opt; attacks on the SVM: EA, LFA.]
Table 2: Mean class-wise comparison of the AUC (%) achieved by the adversarial detectors for spectrograms attacked with eight adversarial attacks. The best results are highlighted in bold.

In this section, we study the impact of computing the chordal distance on adversarial detection and we evaluate the performance of the proposed detector in adverse scenarios on three environmental sound datasets: ESC-10 [14], ESC-50 [14], and UrbanSound8k [15]. The first dataset includes 400 five-second audio recordings from 10 classes. It is actually a simplified version of ESC-50, which has 2000 samples from 50 classes with the same length. The UrbanSound8k dataset contains 8732 samples from 10 classes and, compared to the first two datasets, it provides more sample diversity both in terms of quality and quantity.

We apply a pitch-shifting operation as part of the 1D signal augmentation proposed in [2]. This low-level data augmentation increases the chance of learning more discriminant features by the classifier, especially for ESC-10 and ESC-50 compared to UrbanSound8k. Four pitch-shifting scales are applied to each sample in order to add four new samples to the legitimate sets. These hyperparameters are reported to be the most effective scales for the benchmarking datasets [2]. The mother wavelet function that we use for producing the DWT spectrogram representations is the complex Morlet. The sampling frequency and frame length are set to 8 kHz and 50 ms for ESC-10 and UrbanSound8k, and to 16 kHz and 30 ms for ESC-50, with a fixed overlapping ratio of 0.5 for all datasets [1]. The convolution of the Morlet function with the signal produces a complex function with considerable overlap between real and imaginary parts. Therefore, for producing real spectrograms we use linear, logarithmic, and logarithmic-real visualizations. The first visualization scheme highlights high-frequency magnitudes, which denote high-variation areas. Low-frequency information is characterized by a logarithmic operation, which expands the distances between its values. The energy of the signal, which is associated with the signal's mean, is obtained by applying a logarithmic filter to the real part.
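A hedged sketch of the augmentation and wavelet-spectrogram step is shown below, assuming librosa for pitch shifting and PyWavelets' continuous wavelet transform with a complex Morlet wavelet as a stand-in for the paper's spectrogram pipeline. The file name, shift steps, scale grid, and wavelet parameters are placeholders, not values reported in the paper.

import numpy as np
import librosa
import pywt

y, sr = librosa.load('sample.wav', sr=8000)   # 8 kHz for ESC-10 / UrbanSound8k
# Four pitch-shifted copies as low-level 1D augmentation (placeholder shift steps).
shifted = [librosa.effects.pitch_shift(y, sr=sr, n_steps=s) for s in (-2, -1, 1, 2)]

scales = np.arange(1, 129)                                   # placeholder frequency-scale grid
coeffs, _ = pywt.cwt(y, scales, 'cmor1.5-1.0', sampling_period=1.0 / sr)
spec_linear = np.abs(coeffs)                                 # linear view (high-frequency detail)
spec_log = np.log1p(np.abs(coeffs))                          # logarithmic view (low-frequency detail)
spec_log_real = np.log1p(np.abs(np.real(coeffs)))            # logarithmic-real view (signal energy)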

Since the frequency-magnitude distribution of a signal over time has varying dimensions, none of the three aforementioned visualizations produces square spectrograms. Hence, we bilinearly interpolate each spectrogram to a square size, as required by the QZ decomposition. The actual size of the spectrograms is 1536×768 for ESC-10 and ESC-50 and 1168×864 for UrbanSound8k, because the latter has shorter audio recordings of at least one second. The final size of the spectrograms after downsampling and interpolation is 768×768. This lossy operation may remove some pivotal frequency information and consequently decrease the performance of the classifier. However, the focus of this paper is not on obtaining the highest recognition accuracy, but on studying adversarial subspaces.
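A one-function sketch of the square-resizing step, assuming scikit-image is available; bilinear interpolation corresponds to order=1 and the target size follows the 768×768 setting above.

import numpy as np
from skimage.transform import resize

def to_square(spec, size=768):
    # Bilinear interpolation of a 2D spectrogram to a square shape for QZ decomposition.
    return resize(spec, (size, size), order=1, anti_aliasing=True)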

As victim classifiers, we use an SVM and a convolutional neural network (ConvNet) to compare the detection rate of the proposed detector for a variety of adversarial attacks. For the SVM configuration, we use scikit-learn [16] with a grid search. Linear, polynomial, and RBF kernels have been tested on 2/3 of the shuffled datasets (training and development). The best recognition accuracy on the test set was achieved with the RBF kernel, at about 72.056%, 71.257%, and 72.362% for the ESC-10, ESC-50, and UrbanSound8k datasets, respectively. The proposed ConvNet has four convolutional layers with 3×3 receptive fields, 1×1 stride, and 128, 256, 512, and 128 filters, respectively. On top of the last convolutional layer there are two fully connected layers of sizes 256 and 128. All layers use the ReLU activation function, except the output layer, for which softmax is used. Batch and weight normalization have been applied to all convolutional layers. This ConvNet achieves recognition accuracies of 73.415%, 73.674%, and 75.376% for the ESC-10, ESC-50, and UrbanSound8k datasets, respectively, on the 1/3 test set.
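A hedged PyTorch sketch of the described victim ConvNet is given below (four 3×3, stride-1 convolutions with 128, 256, 512, and 128 filters, batch normalization, and 256- and 128-unit fully connected layers). The pooling layers, input size, and use of LazyLinear are assumptions, and weight normalization is omitted.

import torch
import torch.nn as nn

class VictimConvNet(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        chans, layers = [1, 128, 256, 512, 128], []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
                       nn.BatchNorm2d(c_out), nn.ReLU(), nn.MaxPool2d(4)]
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Sequential(nn.Flatten(),
                                        nn.LazyLinear(256), nn.ReLU(),
                                        nn.Linear(256, 128), nn.ReLU(),
                                        nn.Linear(128, n_classes))  # softmax applied in the loss

    def forward(self, x):  # x: (batch, 1, 768, 768) spectrograms
        return self.classifier(self.features(x))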

We attack the ConvNet with the fast gradient sign method (FGSM) [4], the basic iterative methods (BIM-a and BIM-b) [17], the Jacobian-based saliency map attack (JSMA) [18], an optimization-based attack (Opt) [19], and the Carlini & Wagner attack (CWA) [20]. For the SVM model, we use the label-flipping attack (LFA) [21] and the evasion attack (EA) [22]. Overall, eight adversarial examples are crafted for each legitimate DWT spectrogram. For each legitimate-adversarial pencil, we measure the chordal distance using Eq. 6; then, for random unit 2-norm perturbation matrices, we check the inequality stated in Eq. 7 and the required adjustments. Similarly, we add zero-mean random Gaussian noise to each legitimate spectrogram and build pencils with the resulting noisy spectrograms, which also satisfy Eq. 5. Table 1 summarizes the adjustment required for the crafted adversarial examples to satisfy Eq. 7. For the generated noisy samples, the required adjustment, averaged over different noise variances, lies in a different range. The considerable displacement between the chordal-distance adjustments required for the adversarial and noisy spectrogram sets indicates their non-identical and dissimilar subspaces.

To test the performance of Algorithm 1 in discriminating adversarial from legitimate examples, we use all the attacks mentioned above for crafting adversarial spectrograms. A regularized logistic regression has been used as the front-end classifier for discriminating adversarial eigenvalues from legitimate ones. We compare the performance of the proposed detector with LID, KD, BU, and the combination KD+BU. Table 2 shows that the proposed detector outperforms the other detectors for the majority of the attacks. The proposed detector can also be used for MFCC and STFT representations of sounds, or even for other datasets commonly used in computer vision applications. The key challenge of this detector is its sensitivity to intra-class sample similarity; without sufficient similarity, Eq. 7 may not be satisfied, especially for black-box multiclass discrimination.
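A short sketch of the AUC comparison, assuming the trained logistic-regression detector from the earlier sketch and ground-truth adversarial labels for held-out pencils; names and the percentage scaling are illustrative.

import numpy as np
from sklearn.metrics import roc_auc_score

def detector_auc(detector, eig_features, is_adversarial):
    # Score each eigenvalue feature vector with P(adversarial) and compute AUC in percent.
    scores = detector.predict_proba(np.array(eig_features))[:, 1]
    return 100.0 * roc_auc_score(np.array(is_adversarial), scores)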

4 Conclusion

Since adversarial examples are visually very similar to legitimate samples, differentiating their underlying subspaces is very challenging in a Cartesian metric space. In this paper, we showed that the offset between the subspace of legitimate spectrograms and that of their associated adversarial examples can be measured by the chordal distance, defined in the unitary vector space of the generalized Schur decomposition. Using this metric, we demonstrated that the manifold of adversarial examples lies far from legitimate samples and from noisy samples that have been slightly perturbed by Gaussian noise.

To detect adversarial attacks when there is access neither to the reference spectrogram nor to the adversarial perturbation, we proposed a detector based on a regularized logistic regression model that discriminates the eigenvalues of malicious spectrograms from legitimate ones. Experimental results on three benchmarking environmental sound datasets showed that the proposed detector outperforms other detectors for six out of eight different adversarial attacks. In future work, we would like to improve the chordal distance to better characterize adversarial manifolds and to study the possibility of encoding this metric directly into the adversarial detector.

References