Adversarial sampling is an emerging field in Information Forensics and Security, addressing the vulnerabilities of Machine Learning algorithms. This paper casts this topic to the application of Computer Vision, and in particular, image classification. A Deep Neural Network is trained to classify images depending on the type of object represented in the picture. This is for instance the well-known ImageNet challenge encompassing a thousand classes. The state of the art reports impressive results: classifiers do a better job than humans, with fewer classification errors and much faster timings. This may deserve the wording ‘Artificial Intelligence’, as a computer now competes with humans on a difficult task.
The literature on adversarial samples reveals that these classifiers are vulnerable to specific image modifications. For a given image, an attacker can craft a perturbation that triggers a wrong classification. This perturbation is often a weak signal barely visible to the human eye. Almost surely, no human would incorrectly classify these adversarial images. This topic is extremely interesting as it challenges the ‘Artificial Intelligence’ qualification too hastily attributed to Deep Learning.
We can find in the literature four types of defenses or counter-attacks to deal with adversarial contents:
To detect: Being barely visible does not mean that the perturbation is not statistically detectable. This defense analyses the image and bypasses the classifier if the image is detected as adversarial .
To reform: The perturbation looks like random noise that may be filtered out. This defense is usually a front-end projecting the image back onto the manifold of natural images .
To robustify: At learning time, adversarial images are included in the training set with their original class labels. Adversarial re-training usually robustifies a ‘naturally’ trained network .
To randomize: At test time, the classifier depends on a secret key or an alea. This blocks pure white-box attacks [4, 5].
I-B Connections with Information Hiding
Paper  makes the connection between Adversarial Samples and Information Hiding (be it watermarking or steganography). Both fields modify images (or any other type of media) in the spatial domain so that the content is moved to a targeted region of the feature space. That region is associated with a secret message in Information Hiding, or with a wrong class in Adversarial Sampling. Indeed, paper  shows that Adversarial Sampling benefits from ideas proven efficient in Watermarking, and vice versa.
This paper contributes to the same spirit by investigating what both Steganography and Steganalysis bring to Adversarial Sampling.
There are two natural ideas:
Steganalysis is the art of detecting weak perturbations in images. This field is certainly useful for the defender.
Steganography is the art of modifying an image while being non-detectable. This field is certainly useful for the attacker.
These two sides of the same coin allow us to mount a defense and to challenge it in return. This paper aims at revealing the status of the game between the attacker and the defender at the time of writing, i.e. when both players use up-to-date tools: state-of-the-art image classifiers with premium steganalyzers, and best-in-class steganography embedders. As far as we know, this paper proposes three first-time contributions:
Apply the best steganalyzer, SRNet , to detect adversarial images;
Use steganographic costs (HILL, MiPod, GINA) in a post-processing that makes adversarial perturbations less detectable;
Benchmark the accuracy and robustness of recent classifiers (EfficientNet-b0, natural and AdvProp versions).
II State of the Art
II-A Steganalysis is Versatile
Steganalysis has always been bound to steganography, obviously. Yet, a recent trend is to resort to this tool for purposes other than detecting whether an image conceals a secret message. For instance, paper  claims the universality of SRM and LBP steganalyzers to detect image processing (like Gaussian blurring or gamma correction) and splicing. The authors of  used this approach during the IEEE IFS-TC image forensics challenge. The same trend holds in audio forensics . As for camera model identification, the inspiration from steganalysis (co-occurrences, color dependencies, conditional probabilities) is clearly apparent in .
This reveals a certain versatility in steganalysis. It is not surprising since its main goal is to model and detect weak signals. Modern steganalyzers are no longer based on hand-crafted features like SRM ; they are nothing more, nothing less than Deep Neural Networks, like XU-Net  or SRNet . The frontier between steganalysis and any two-class image classification problem (such as image manipulation detection) is blurred. Yet, these networks have a specific structure able to focus on weak-signal detection. For example, they avoid pooling operations in order to preserve high-frequency signals, and they need large databases combined with augmentation techniques and curriculum learning to converge .
This general-purpose, steganalysis-based method has some drawbacks. It lacks fine-grained tampering localization, which is an issue in forensics . Paper  goes a step further in the cat-and-mouse game with an anti-forensic method: knowing that the defender uses a steganalyzer, the attacker modifies the perturbation (accounting for a median filtering or a contrast enhancement) to become less detectable.
As for adversarial image detection, this approach is not new either. The authors of  wisely see steganalysis as a perfect companion to adversarial re-training. This last mechanism fights well against small perturbations; it however struggles to correctly classify coarser and more detectable attacks. Unfortunately, this idea is supported only by a proof of concept (as acknowledged by the authors): the steganalyzer is rudimentary, and the dataset is composed of tiny images (MNIST). In contrast, the authors of  outline that steganalysis works better on larger images like ImageNet (ILSVRC-2016). They however use a deprecated classifier (VGG-16) with outdated steganalyzers based on hand-crafted features (SPAM and SRM).
Conversely, adversarial samples recently became a source of inspiration for steganography: paper  proposes the concept of steganography with an adversarial embedding fooling a DNN-based steganalyzer.
II-B Adversarial Images
This paper focuses on white-box attacks where the attacker knows all implementation details of the classifier.
To make things clearer, the classifier has the following structure: a pre-processing $\mathsf{T}$ maps an image $\mathbf{x}_o \in \mathcal{X} := \{0,\ldots,255\}^n$ (with $n = 3 \times R \times C$: 3 color channels, $R$ lines and $C$ columns of pixels) to $\mathbf{x} = \mathsf{T}(\mathbf{x}_o) \in [0,1]^n$ (some networks also use $[-1,1]^n$ or $[-3,3]^n$). This pre-processing is heuristic; sometimes it just divides the pixel values by 255. Then $\mathbf{x}$ is fed to the trained neural network to produce the estimated probabilities $\hat{p}_k(\mathbf{x})$ of being from class $k$. The predicted class is given by:
$$\hat{y}(\mathbf{x}) = \arg\max_k \hat{p}_k(\mathbf{x}). \qquad (1)$$
The classification is correct if $\hat{y}(\mathbf{x}) = y$, the ground truth label of image $\mathbf{x}_o$.
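This pipeline can be illustrated in a few lines. The pre-processing and the probability vector below are toy placeholders, not an actual network:

```python
import numpy as np

def preprocess(x_pixels, scale=255.0):
    # Hypothetical pre-processing: map integer pixels in {0,...,255} to [0, 1].
    return x_pixels.astype(np.float64) / scale

def predict(probabilities):
    # Predicted class: index of the largest estimated probability.
    return int(np.argmax(probabilities))

# Toy example: a 2x2 grayscale "image" and made-up class probabilities.
x = np.array([[0, 128], [255, 64]])
x_in = preprocess(x)               # values now in [0, 1]
p_hat = np.array([0.1, 0.7, 0.2])  # placeholder network output over 3 classes
y_hat = predict(p_hat)             # class with the highest probability
```

The key point for the rest of the paper is the domain mismatch: the attack operates on `x_in`, while a legitimate image must have integer pixel values.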
An untargeted adversarial attack aims at finding the optimal point:
$$\mathbf{x}_a^\star = \arg\min_{\mathbf{x} : \hat{y}(\mathbf{x}) \neq y} d(\mathbf{x}, \mathsf{T}(\mathbf{x}_o)), \qquad (2)$$
where $d$ is usually the Euclidean distance.
Discovering this optimal point is difficult because the space dimension is large. In a white-box scenario, all attacks are sub-optimal iterative processes. They use the gradient of the network function, efficiently computed thanks to the back-propagation mechanism, to find a solution close to $\mathbf{x}_a^\star$. They are compared in terms of probability of success, average distortion, and complexity (number of gradient computations). This paper considers well-known attacks: FGSM , PGD (Euclidean version) , DDN , and CW  (ranked from low to high complexity).
As outlined in , definition (2) is very common in the literature, yet it is incorrect. The goal of the attacker is to create an adversarial image in the pixel domain. Applying the inverse mapping $\mathsf{T}^{-1}(\mathbf{x}_a^\star)$ does not solve the issue because it a priori yields non-integer pixel values. Rounding to the nearest integer, $\lfloor \mathsf{T}^{-1}(\mathbf{x}_a^\star) \rceil$, is simple but not effective. Some networks are so vulnerable (like ResNet-18) that the adversarial perturbation is a weak signal partially destroyed by rounding. The impact is that, after rounding, the image is often no longer adversarial. Note that DDN is a rare example of a powerful attack natively offering quantized pixel values.
Paper  proposes a post-processing on top of any attack that makes sure the output $\mathbf{x}_a$ i) is an image (integrality constraint), ii) remains adversarial, and iii) has a low Euclidean distortion. This paper follows the same approach but adds another constraint: iv) be non-detectable.
II-C Steganographic Embeddings
Undetectability is usually tackled through the concept of costs in the steganographic literature: each pixel location $i$ of the cover image is assigned a set of costs $(\rho_i(q))_q$ reflecting the detectability of modifying the $i$-th pixel by $q$ quanta. Usually, $\rho_i(0) = 0$, $\rho_i(-q) = \rho_i(q)$, and $q \mapsto \rho_i(q)$ is increasing for $q \geq 0$. The goal of the steganographer is to embed a message while minimizing the empirical steganographic distortion:
$$D(\mathbf{q}) = \sum_{i=1}^{n} \rho_i(q_i). \qquad (3)$$
This is practically achieved using Syndrome Trellis Codes . Note that this distortion is additive, which is equivalent to considering that each modification yields a detectability independent of the others.
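The additive steganographic distortion above is straightforward to evaluate. A minimal sketch, assuming an illustrative quadratic cost $\rho_i(q) = w_i q^2$ (the weights and modifications below are made up for the example):

```python
import numpy as np

def stego_distortion(weights, q):
    """Additive steganographic distortion D(q) = sum_i rho_i(q_i),
    here with the quadratic choice rho_i(q) = w_i * q**2."""
    w = np.asarray(weights, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return float(np.sum(w * q**2))

w = np.array([1.0, 0.5, 2.0])  # per-pixel detectability weights (illustrative)
q = np.array([1, -1, 0])       # modification applied to each pixel
D = stego_distortion(w, q)     # 1.0*1 + 0.5*1 + 2.0*0 = 1.5
```

Additivity is what later makes the per-pixel minimization of Sect. III-C possible.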
We propose to use the steganographic distortion (3) (instead of the $\ell_1$, $\ell_2$, or $\ell_\infty$ norms of the adversarial literature) in order to decrease detectability. There are strategies to take into account potential interactions between neighboring modifications: the image can first be decomposed into disjoint lattices to be sequentially embedded, and the costs can then be updated after the embedding of every lattice . This work uses three different families of steganographic costs.
The first one, HILL , is empirical and naive, but has nevertheless been widely used in steganography and is easy to implement. The cost map associated to the cover $\mathbf{x}_o$ is computed using two low-pass averaging filters $L_1$ and $L_2$, of respective sizes $3 \times 3$ and $15 \times 15$, and one high-pass filter $H$:
$$\rho_i^{\mathrm{HILL}} = \frac{1}{\left(\left|\mathbf{x}_o \circledast H\right| \circledast L_1 \circledast L_2\right)_i}. \qquad (4)$$
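As a rough illustration, this cost map amounts to a few 2-D convolutions. The sketch below assumes the KB kernel for $H$ and the filter sizes stated above; the boundary handling of the original implementation may differ:

```python
import numpy as np
from scipy.signal import convolve2d

def hill_cost(x, eps=1e-10):
    """Sketch of a HILL-style cost map: high-pass residual smoothed by two
    low-pass averaging filters; the cost is the inverse of the result."""
    H = np.array([[-1,  2, -1],
                  [ 2, -4,  2],
                  [-1,  2, -1]], dtype=np.float64)  # KB high-pass filter (assumed)
    L1 = np.ones((3, 3)) / 9.0        # 3x3 averaging filter
    L2 = np.ones((15, 15)) / 225.0    # 15x15 averaging filter
    r = np.abs(convolve2d(x, H, mode="same", boundary="symm"))
    r = convolve2d(r, L1, mode="same", boundary="symm")
    r = convolve2d(r, L2, mode="same", boundary="symm")
    return 1.0 / (r + eps)  # smooth areas (small residual) get a high cost

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
rho = hill_cost(x)
```

Note the behavior this encodes: a perfectly flat region has a near-zero residual, hence a near-infinite cost, pushing modifications toward textured areas.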
The second one, derived from MiPod , assumes that the residual signal at pixel $i$ is distributed as $\mathcal{N}(0, \sigma_i^2)$ for the original image, and as $\mathcal{N}(q_i, \sigma_i^2)$ for the stego image. The variance $\sigma_i^2$ is estimated on each pixel using Wiener filtering and a least-squares approximation on a basis of cosine functions. The cost is the log-likelihood ratio between the two distributions evaluated at 0, i.e.:
$$\rho_i(q_i) = \frac{q_i^2}{2\sigma_i^2}. \qquad (5)$$
Unlike the previous one, this model can handle modifications other than $\pm 1$.
The last one is a cost-updating strategy favoring coherent modifications between pixels within a spatial or color neighborhood. It is called GINA ; it is derived from CMD . It splits the color image into 4 disjoint lattices per channel, i.e. 12 lattices. The embedding is performed sequentially, starting with the green channel lattices. The costs on one lattice are updated according to the modifications done on the previous ones: the cost of a modification coherent with its neighborhood is divided by a constant factor $\alpha > 1$,
$$\rho_i(q) \leftarrow \rho_i(q)/\alpha \quad \text{if } q\,\bar{m}_i > 0, \qquad (6)$$
with $\bar{m}_i$ the average of the modifications already performed in the neighborhood of location $i$.
III Steganographic Post-Processing
This section presents the use of steganography in our post-processing mounted on top of any adversarial attack.
III-A About Steganography and Adversarial Examples
Paper  stresses a fundamental difference: steganalysis has two classes, where the ‘cover’ class distribution is given by Nature, whereas the ‘stego’ class distribution is a consequence of designed embedding schemes. On the other hand, a perfect adversarial example and an original image are distributed according to the class-conditional distributions $p(\mathbf{x}|\hat{y})$ and $p(\mathbf{x}|y)$, which are both given by Nature.
We stress another major difference: steganographic embedding is essentially a stochastic process. Two stego-contents derived from the same cover are different almost surely. This is a means to encompass the randomness of the messages to be embedded. This is also the reason why steganographic embedders turn the costs $\rho_i(q)$ into probabilities $\pi_i(q)$ of modifying the $i$-th pixel by $q$ quanta. These probabilities are derived to minimize the detectability under the constraint of an embedding rate given by the source coding theorem: the message length equals the total entropy of the modification probabilities. In contrast, an attack is a deterministic process always giving the same adversarial version of one original image. Adversarial imaging does not need these probabilities.
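For completeness, here is a sketch of how the costs mentioned above can be turned into modification probabilities via a Gibbs distribution, whose total entropy gives the embedding rate. The candidate modifications $\{-1, 0, +1\}$, the cost values, and the multiplier are illustrative:

```python
import numpy as np

def modification_probabilities(costs, lam):
    """Gibbs distribution over candidate modifications {-1, 0, +1}.

    costs: array of shape (n, 3) with rho_i(q) for q in (-1, 0, +1);
    lam:   multiplier trading detectability against payload.
    """
    p = np.exp(-lam * np.asarray(costs, dtype=np.float64))
    return p / p.sum(axis=1, keepdims=True)

def total_entropy_bits(pi):
    # Embedding rate (in bits) = sum of per-pixel entropies.
    pi = np.clip(pi, 1e-12, 1.0)
    return float(-(pi * np.log2(pi)).sum())

costs = np.array([[2.0, 0.0, 2.0],   # rho_i(-1), rho_i(0), rho_i(+1)
                  [0.5, 0.0, 0.5]])
pi = modification_probabilities(costs, lam=1.0)
bits = total_entropy_bits(pi)  # payload this choice of lam can carry
```

Increasing `lam` concentrates the probability mass on the zero-cost modification, lowering both detectability and payload; the embedder tunes it so the entropy matches the message length.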
III-B Optimal post-processing
Starting from an original image $\mathbf{x}_o$, we assume that an attack has produced $\mathbf{x}_a^\star$, mapped back to the pixel domain as $\mathsf{T}^{-1}(\mathbf{x}_a^\star)$. The problem is that $\mathsf{T}^{-1}(\mathbf{x}_a^\star) \notin \mathcal{X}$, i.e. its pixel values are a priori not quantized. Our post-processing specifically deals with that matter, outputting $\mathbf{x}_a \in \mathcal{X}$. We introduce the perturbation after the attack and the perturbation after our post-processing:
$$\mathbf{p} := \mathsf{T}^{-1}(\mathbf{x}_a^\star) - \mathbf{x}_o, \qquad \mathbf{q} := \mathbf{x}_a - \mathbf{x}_o.$$
The design of the post-processing amounts to finding a good $\mathbf{q}$. This is more complex than just rounding the perturbation $\mathbf{p}$.
We first restrict the range of $\mathbf{q}$. We define the degree of freedom $d$ as the number of possible values for each $q_i$, $1 \leq i \leq n$. It is an even integer greater than or equal to 2. The range of $q_i$ is centered around $p_i$. For instance, when $d = 2$, $q_i \in \{\lfloor p_i \rfloor, \lceil p_i \rceil\}$. In general, the range is given by
$$Q_i = \{\lceil p_i \rceil - d/2, \ldots, \lfloor p_i \rfloor + d/2\}. \qquad (10)$$
Over the whole image, there are $d^n$ possible sequences for $\mathbf{q}$.
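The admissible range can be sketched as follows (assuming a non-integer $p_i$, so that the set contains exactly $d$ values):

```python
import math

def admissible_range(p_i, d=2):
    """Candidate integer modifications q_i centered around p_i:
    Q_i = {ceil(p_i) - d/2, ..., floor(p_i) + d/2}, with d even, d >= 2."""
    assert d >= 2 and d % 2 == 0
    lo = math.ceil(p_i) - d // 2
    hi = math.floor(p_i) + d // 2
    return list(range(lo, hi + 1))

# For a non-integer perturbation p_i = 0.3 and d = 2, the candidates
# are the two surrounding integers:
q_candidates = admissible_range(0.3, d=2)   # [0, 1]
```

With `d=2` the post-processing can only choose between the floor and ceiling of the attack perturbation; larger `d` widens the search at the price of a potentially larger distortion.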
We now define two quantities depending on $\mathbf{q}$. The first one is the classifier loss at $\mathbf{x}_o + \mathbf{q}$:
$$L(\mathbf{x}_o + \mathbf{q}) := \hat{p}_y(\mathbf{x}_o + \mathbf{q}) - \hat{p}_{\hat{y}_a}(\mathbf{x}_o + \mathbf{q}),$$
where $y$ is the ground truth class of $\mathbf{x}_o$ and $\hat{y}_a$ is the predicted class after the attack. When the attack succeeds, $\mathsf{T}^{-1}(\mathbf{x}_a^\star)$ is classified as $\hat{y}_a \neq y$ because $\hat{p}_{\hat{y}_a} > \hat{p}_y$, so that $L(\mathbf{x}_o + \mathbf{p}) < 0$. Our post-processing cares about maintaining this adversariality. This constrains $\mathbf{q}$ s.t. $L(\mathbf{x}_o + \mathbf{q}) < 0$.
The second quantity is the detectability. We assume that a black-box algorithm gives the stego-costs $(\rho_i)_i$ for a given original image. The overall detectability of $\mathbf{q}$ is gauged by (3). In the end, the optimal post-processing minimizes detectability while maintaining adversariality:
$$\mathbf{q}^\star = \arg\min_{\mathbf{q} : L(\mathbf{x}_o + \mathbf{q}) < 0} D(\mathbf{q}). \qquad (12)$$
III-C Our proposal
The complexity of finding the solution of (12) a priori scales as $O(d^n)$. Two ideas from the adversarial examples literature help reduce this. First, the problem is restated as a Lagrangian formulation, as in :
$$\mathbf{q}_\lambda = \arg\min_{\mathbf{q}} D(\mathbf{q}) + \lambda L(\mathbf{x}_o + \mathbf{q}), \qquad (13)$$
where $\lambda$ is the Lagrangian multiplier. This means that we must solve this problem for any $\lambda \geq 0$ and then find the smallest value of $\lambda$ s.t. $L(\mathbf{x}_o + \mathbf{q}_\lambda) < 0$.
Second, the classifier loss is linearized around $\mathbf{x}_o + \mathbf{p}$, i.e. for $\mathbf{q}$ around $\mathbf{p}$: $L(\mathbf{x}_o + \mathbf{q}) \approx L(\mathbf{x}_o + \mathbf{p}) + (\mathbf{q} - \mathbf{p})^\top \mathbf{g}$, where $\mathbf{g} := \nabla L(\mathbf{x}_o + \mathbf{p})$. This transforms problem (13) into
$$\mathbf{q}_\lambda = \arg\min_{\mathbf{q}} \sum_{i=1}^{n} \rho_i(q_i) + \lambda g_i q_i. \qquad (14)$$
The solution is now tractable because the functional is separable: we can solve the problem pixel-wise. The algorithm stores in a $d \times n$ matrix the costs $\rho_i(q)$, and in another matrix the candidate values $q \in Q_i$ of (10). For a given $\lambda$, it computes the objective $\rho_i(q) + \lambda g_i q$ and looks for the minimum of each column $i$. In other words, it amounts to $n$ minimum findings, each over $d$ values, which scales as $O(nd)$.
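A minimal sketch of this pixel-wise minimization, with $d = 2$ candidates per pixel (all numbers illustrative):

```python
import numpy as np

def solve_pixelwise(costs, grads, candidates, lam):
    """Minimize sum_i rho_i(q_i) + lam * g_i * q_i independently per pixel.

    costs:      (d, n) matrix, costs[j, i] = rho_i(candidates[j, i])
    grads:      (n,) gradient g of the linearized classifier loss
    candidates: (d, n) matrix of admissible integer values for each q_i
    """
    objective = costs + lam * grads[None, :] * candidates
    best = np.argmin(objective, axis=0)  # n minimum findings over d values
    return candidates[best, np.arange(len(grads))]

# Tiny example with n = 3 pixels and d = 2 candidates each:
candidates = np.array([[0, 0, -1],
                       [1, 1,  0]])
costs = np.array([[0.0, 0.0, 3.0],
                  [2.0, 2.0, 0.0]])
g = np.array([-5.0, 0.1, 1.0])
q = solve_pixelwise(costs, g, candidates, lam=1.0)  # [1, 0, 0]
```

The first pixel is modified despite its cost because its gradient strongly decreases the loss; the other two stay at the cheaper choice.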
Note that for $\lambda = 0$, (14) quantizes $\mathbf{p}$ ‘towards’ $\mathbf{0}$ to minimize detectability. Indeed, if $q_i = 0$ is admissible ($0 \in Q_i$, which holds if $|p_i| \leq d/2$), then the minimum of $\rho_i$ is attained at $q_i = 0$.
On top of solving (14), a line search over $\lambda$ is required. The linearization of the loss being a crude approximation, we make calls to the network to check that the result is adversarial: when testing a given value of $\lambda$, (14) is computed to produce $\mathbf{q}_\lambda$, and $\mathbf{x}_o + \mathbf{q}_\lambda$ feeds the classifier. If it is adversarial, then we test a lower value of $\lambda$ (giving more importance to the detectability); otherwise we increase it. We use a binary search with a stopping criterion to control the complexity of the post-processing: the search stops when two successive values of $\lambda$ differ by less than 1,000. The optimal $\lambda$ varies widely between images. This criterion was empirically set to give both a near-optimal value and a short search time.
III-D Simplification for quadratic stego-costs
We now assume that the stego-costs obey the following expression: $\rho_i(q) = w_i q^2$. This makes the functional of (14) (restricted to the $i$-th pixel) equal to $w_i q_i^2 + \lambda g_i q_i$, whose minimizer is $\bar{q}_i = -\lambda g_i / (2 w_i)$.
Yet, this value a priori does not belong to the range (10). This is easily solved: a quadratic function being symmetric around its minimum, the minimum over (10) is attained at the value closest to $\bar{q}_i$, as shown in Fig. 1. The range being nothing more than a set of consecutive integers, we obtain a closed-form expression:
$$q_{\lambda,i} = \min\!\left(\lfloor p_i \rfloor + \frac{d}{2},\; \max\!\left(\lceil p_i \rceil - \frac{d}{2},\; \lfloor \bar{q}_i \rceil\right)\right), \qquad (15)$$
where $\lfloor \cdot \rceil$ is the rounding to the nearest integer. The post-processing now has a linear complexity.
In this equation, the min and max operate a clipping so that $q_{\lambda,i}$ belongs to $Q_i$. This clipping is active when $\bar{q}_i$ falls outside $Q_i$, which happens if $\lambda > \lambda_i$ with
$$\lambda_i := \frac{2 w_i}{|g_i|} B_i, \qquad B_i = \lfloor p_i \rfloor + \frac{d}{2} \text{ if } g_i < 0, \quad B_i = \frac{d}{2} - \lceil p_i \rceil \text{ otherwise.}$$
This remark is important because it shows that for any $\lambda > \max_i \lambda_i$, the solution of (15) remains the same due to clipping. Therefore, we can narrow down the line search to $\lambda \in [0, \max_i \lambda_i]$.
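Under this quadratic-cost assumption, the whole post-processing for a fixed $\lambda$ reduces to a round-then-clip per pixel, sketched below (perturbations, gradients, and weights are illustrative):

```python
import numpy as np

def quantize_quadratic(p, grads, weights, lam, d=2):
    """Closed-form solution for quadratic costs rho_i(q) = w_i * q**2:
    round the unconstrained minimizer -lam*g_i/(2*w_i) to the nearest
    integer, then clip it to the admissible range around p_i."""
    p = np.asarray(p, dtype=np.float64)
    q_bar = -lam * np.asarray(grads) / (2.0 * np.asarray(weights))
    lo = np.ceil(p) - d // 2     # lower bound of Q_i
    hi = np.floor(p) + d // 2    # upper bound of Q_i
    return np.clip(np.rint(q_bar), lo, hi).astype(int)

p = np.array([0.4, -0.6, 0.2])   # attack perturbation (non-integer)
g = np.array([-3.0, 2.0, 0.1])   # gradient of the linearized loss
w = np.array([1.0, 1.0, 1.0])    # quadratic cost weights
q = quantize_quadratic(p, g, w, lam=1.0, d=2)  # [1, -1, 0]
```

The first pixel is clipped: its unconstrained minimizer (1.5, rounded to 2) lies outside the admissible range {0, 1}, which illustrates why the line search over lambda can be bounded.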
IV Experimental Investigation
IV-A Experimental setup
Our experimental work uses 18,000 images from ImageNet of dimension $224 \times 224 \times 3$. This subset is split into 1,000 images for testing and comparing, and 17,000 for training. An image is attacked only if the classifier predicts its correct label beforehand, which happens with a probability equal to the accuracy of the network. We measure the average Euclidean distance of the perturbation and the probability of a successful attack only over correctly labeled images.
We attack the networks with 4 different attacks: FGSM , PGD , CW , and DDN . All of the attacks are run in a best-effort fashion with a complexity limited to 200 iterations. For FGSM and PGD, the distortion is gradually increased until the image is adversarial. For the more complex CW and DDN, different sets of parameters are used within a total maximum of 200 iterations. The final attacked version is the adversarial image with the smallest distortion. DDN is the only attack that creates integer images. The other 3 are post-processed either by the enhanced quantization , which is our baseline, or by our method explained in Sect. III-C.
The adversarial image detectors are evaluated by their true positive rate at a fixed false positive rate.
IV-B Robustness of recent classifiers: there is a free lunch
Our first experiment compares the robustness of the famous ResNet-50 network to recent classifiers: the natural version of EfficientNet-b0  (Nat) and its robust version trained with AdvProp  (Rob). Note that the authors of  apply adversarial re-training for improving accuracy. As far as we know, the robustness of this version has not been established yet.
Table I confirms that modern classifiers are more accurate and more robust (lower attack success rate and/or bigger attack distortion). This is indeed a surprise: it pulls down the myth of ‘No Free Lunch’ in the adversarial machine learning literature [33, 34] (the price to pay for robustifying a network is supposedly a lower accuracy).
IV-C Detection with a Steganalyzer
We use three steganalyzers to detect adversarial images. Their training set is composed of 15,651 pairs of original and adversarial images. The latter are crafted with best-effort FGSM against natural EfficientNet-b0.
The first detector is trained on SRM feature vectors , of dimension 34,671. SRM is a model that applies to a single channel; in our experimental work, it is computed on the luminance of the image. The classifier used to fit these high-dimensional vectors into two classes is the regularized linear classifier . The second detector is based on the color version of SRM, SCRMQ1 , with dimension 18,157; the classifier is the same. The third detector is SRNet , one of the best detectors in steganalysis. Training is performed over 180 epochs: the first 100 with a larger learning rate, the remaining 80 with a smaller one. Data augmentation is also performed during training: a pair of images may be mirrored with some probability, and then rotated by 90 degrees with another probability.
The attacks: Table II shows that the probabilities of success are similar, except for DDN (a larger complexity budget would increase its success rate, but this is not the aim of this study). Note that PGD and CW, whose samples are quantized with , are as reliable as FGSM but with a third of its distortion.
The detectors: Table II also gives the true positive rates associated to the detectors. Although the authors of  achieve good performance with SRM, we were not able to reproduce their results. This could be due either to finer attacks or to the effect of quantization. Our results show that the detectors generalize well: although trained to detect images highly distorted by FGSM, they detect more subtle attacks like CW as well, and sometimes even better. Moreover, SRNet always outperforms SCRMQ1 and delivers an impressive accuracy. Table II shows that PGD+ is the worst-case scenario for the defense: the probability of fooling both the classifier EfficientNet-b0 and the detector SRNet is only 5.9%.
IV-D Post-processing with a Steganographic Embedder
We now play the role of the attacker. We use best-effort PGD as the base attack and compare the detectability of four post-processings: the non-steganographic quantization  as a baseline, HILL (4), MiPod (5), and GINA (6). GINA uses the quadratic method explained in Sect. III-D sequentially over the 12 lattices. Quadratic stego-costs are updated with the CMD strategy (6). Each lattice contributes 1/12 of the initial classification loss.
Distortion increases with each method and with the degree of freedom $d$. Steganographic embedding therefore reduces detectability at the cost of an increased distortion. From the attacker’s perspective, the best-case scenario with PGD is GINA at $d = 2$, as seen in Table III. This scenario has a 69.9% chance of fooling both the classifier and the detector on EfficientNet-b0. Fig. 2 shows the two examples with the highest distortion on EfficientNet-b0 that still fool SRNet. Even in these cases, the added distortion remains imperceptible to the human eye.
This paper explores both sides of adversarial image detection with steganographic glasses.
On the Defense side, we use SRNet , the state of the art in steganalysis, to detect adversarial images. Training it on images attacked with the basic FGSM shows impressive performance. Detection also generalizes well, even on the finest attacks such as PGD  and CW .
On the Attack side, our work on steganographic embedding dramatically reduces the detection rates. The steganographic embedding targets specific regions and pixels of an image to quantize the attack. The distortion increases w.r.t. the original attack but remains imperceptible to the human eye (Fig. 2). The main conclusion is that the field of steganography benefits the attacker more than the defender.
Our future work will explore the effect of retraining detectors on adversarial images crafted with steganographic embedding, towards an even more universal detector.
-  S. Ma, Y. Liu, G. Tao, W. Lee, and X. Zhang, “NIC: detecting adversarial samples with neural network invariant checking,” in NDSS 2019, San Diego, California, USA., 2019.
-  D. Meng and H. Chen, “Magnet: a two-pronged defense against adversarial examples,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017, pp. 135–147.
-  A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in ICLR 2018, Vancouver, BC, Canada., 2018.
-  O. Taran, S. Rezaeifar, T. Holotyak, and S. Voloshynovskiy, “Defending against adversarial attacks by randomized diversification,” in IEEE CVPR, Long Beach, USA, June 2019.
-  ——, “Machine learning through cryptographic glasses: combating adversarial attacks by key based diversified aggregation,” in EURASIP Journal on Information Security, January 2020.
-  A. Athalye, N. Carlini, and D. A. Wagner, “Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples,” in ICML, 2018, pp. 274–283.
-  N. Carlini and D. Wagner, “Adversarial examples are not easily detected: Bypassing ten detection methods,” arXiv:1705.07263, 2017.
-  E. Quiring, D. Arp, and K. Rieck, “Forgotten siblings: Unifying attacks on machine learning and digital watermarking,” in IEEE European Symp. on Security and Privacy, 2018.
-  M. Tan and Q. V. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” arXiv, 2019.
-  C. Xie, M. Tan, B. Gong, J. Wang, A. Yuille, and Q. V. Le, “Adversarial examples improve image recognition,” arXiv, 2019.
-  M. Boroumand, M. Chen, and J. Fridrich, “Deep residual network for steganalysis of digital images,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 5, pp. 1181–1193, 2018.
-  B. Li, M. Wang, J. Huang, and X. Li, “A new cost function for spatial image steganography,” in Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014, pp. 4206–4210.
-  V. Sedighi, R. Cogranne, and J. Fridrich, “Content-adaptive steganography by minimizing statistical detectability,” Information Forensics and Security, IEEE Transactions on, vol. 11, no. 2, pp. 221–234, 2016.
-  B. Li, M. Wang, X. Li, S. Tan, and J. Huang, “A strategy of clustering modification directions in spatial image steganography,” Information Forensics and Security, IEEE Trans. on, vol. 10, no. 9, 2015.
-  Y. Wang, W. Zhang, W. Li, X. Yu, and N. Yu, “Non-additive cost functions for color image steganography based on inter-channel correlations and differences,” IEEE Trans. on Information Forensics and Security, 2019.
-  X. Qiu, H. Li, W. Luo, and J. Huang, “A universal image forensic strategy based on steganalytic model,” in Proc. of ACM IH&MMSec ’14, New York, NY, USA, 2014, pp. 165–170.
-  S. Farooq, M. H. Yousaf, and F. Hussain, “A generic passive image forgery detection scheme using local binary pattern with rich models,” Computers & Electrical Engineering, vol. 62, pp. 459 – 472, 2017.
-  W. Luo, H. Li, Q. Yan, R. Yang, and J. Huang, “Improved audio steganalytic feature and its applications in audio forensics,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 14, no. 2, Apr. 2018.
-  A. Tuama, F. Comby, and M. Chaumont, “Camera model identification based machine learning approach with high order statistics features,” in EUSIPCO, 2016, pp. 1183–1187.
-  J. Fridrich and J. Kodovsky, “Rich models for steganalysis of digital images,” Information Forensics and Security, IEEE Transactions on, vol. 7, no. 3, pp. 868–882, 2012.
-  G. Xu, H.-Z. Wu, and Y.-Q. Shi, “Structural design of convolutional neural networks for steganalysis,” IEEE Signal Processing Letters, vol. 23, no. 5, pp. 708–712, 2016.
-  Y. Yousfi, J. Butora, J. Fridrich, and Q. Giboulot, “Breaking ALASKA: Color separation for steganalysis in jpeg domain,” in Proc. of ACM IH&MMSec ’19, 2019, pp. 138–149.
-  W. Fan, K. Wang, and F. Cayre, “General-purpose image forensics using patch likelihood under image statistical models,” in IEEE Int.Workshop on Information Forensics and Security (WIFS), 2015, pp. 1–6.
-  Z. Chen, B. Tondi, X. Li, R. Ni, Y. Zhao, and M. Barni, “A gradient-based pixel-domain attack against svm detection of global image manipulations,” in IEEE WIFS, 2017, pp. 1–6.
-  P. Schöttle, A. Schlögl, C. Pasquini, and R. Böhme, “Detecting adversarial examples - a lesson from multimedia security,” in European Signal Processing Conference (EUSIPCO), 2018, pp. 947–951.
-  J. Liu, W. Zhang, Y. Zhang, D. Hou, Y. Liu, H. Zha, and N. Yu, “Detection based defense against adversarial examples from the steganalysis point of view,” in IEEE/CVF CVPR, 2019, pp. 4820–4829.
-  W. Tang, B. Li, S. Tan, M. Barni, and J. Huang, “CNN-based adversarial embedding for image steganography,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 8, pp. 2074–2087, 2019.
-  I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in ICLR 2015, San Diego, CA, USA,, 2015.
-  J. Rony, L. G. Hafemann, L. S. Oliveira, I. B. Ayed, R. Sabourin, and E. Granger, “Decoupling direction and norm for efficient gradient-based l2 adversarial attacks and defenses,” in Proc. of the IEEE CVPR, 2019.
-  N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in IEEE Symp. on Security and Privacy, 2017.
-  B. Bonnet, T. Furon, and P. Bas, “What if adversarial samples were digital images?” in Proc. of ACM IH&MMSec ’20, 2020, pp. 55–66.
-  T. Filler, J. Judas, and J. Fridrich, “Minimizing additive distortion in steganography using syndrome-trellis codes,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 920–935, 2011.
-  D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry, “Robustness may be at odds with accuracy,” 2018.
-  E. Dohmatob, “Generalized no free lunch theorem for adversarial robustness,” in Proc. of Int. Conf. on Machine Learning, Long Beach, California, USA, 2019.
-  R. Cogranne, V. Sedighi, J. Fridrich, and T. Pevnỳ, “Is ensemble classifier needed for steganalysis in high-dimensional feature spaces?” in Information Forensics and Security (WIFS), 2015 IEEE International Workshop on. IEEE, 2015, pp. 1–6.
-  M. Goljan, J. Fridrich, and R. Cogranne, “Rich model for steganalysis of color images,” 2014 IEEE International Workshop on Information Forensics and Security, WIFS 2014, pp. 185–190, 04 2015.