Machine Learning (ML) is used to tackle various types of complex problems. For instance, ML can predict events based on the patterns and features learnt from previous examples (the training set) and classify given inputs into specific categories (classes) learnt during training.
Although ML and, in particular, Deep Neural Networks (DNNs) have provided impressive results, they are vulnerable to so-called adversarial examples. Adversarial examples are inputs that fool ML models, i.e., cause misclassifications. The interesting aspect of adversarial examples is that the misclassifications are triggered by a systematic procedure. This procedure is tailored to alter the input data (images in particular) in a way that is not noticeable to the human eye.
The elusive property of adversarial data has always been perceived by the research community as a major weakness that should be avoided or mitigated. While this is important for specific application domains, we argue that such a property can be useful, e.g., to support watermarking and steganography applications. Therefore, our paper takes a different perspective on adversarial attacks by showing that the main properties of targeted attacks (invisibility, non-transferability, resilience and adaptation against input tampering) can be used to form strong watermarking and steganography techniques, which we call Adversarial Embedding.
Digital watermarking and steganography are methods aiming at hiding secret information in images (or other digital media), by slightly altering original images (named the cover images) to embed the information within them. Watermarking techniques require the embedding to be robust (i.e. resilient to malicious tampering) while steganography focuses on maximizing the amount of embedded data (that is, achieve high-density embedding). Of course, undetectability and un-recoverability by a non-authorized third party are of foremost importance in both cases. Thus, the image embedding the secret information (named the stego image) should not be detected by steganalysis techniques.
Our objective is to pave the way for a new generation of watermarking and steganography techniques relying on adversarial ML.
Existing steganography and watermarking techniques either are easily detected, manage to embed only a limited amount of information, or are easily recoverable. This means that there are multiple dimensions on which the techniques need to succeed. Notably, previous research achieves relatively good results in a single dimension but not in all of them. In contrast, our technique dominates on multiple dimensions together. For instance, we can embed much more information without the risk of it being recovered by a third party. We also outperform existing techniques when considering single dimensions alone. All in all, our paper offers a novel and effective watermarking and steganography technique.
Applying our technique requires a DNN classification model with multiple output categories and a targeted adversarial attack. In the case of steganography, we assume that the model has been 'safely' shared between the people that exchange messages. In the case of watermarking, the person who embeds and extracts the messages is the same, and thus there is no need for such an assumption.
The classification model is used to extract the hidden messages by mapping the output classes to bits. The adversarial attack (non-transferable targeted attack on the shared model) is used to embed information by creating adversarial images (the stego images) from original images (the cover images) in a way that the model is forced to classify the adversarial images in the desired classes (corresponding to the message to encode). Since the attack is non-transferable, only the shared model can identify the embedding and give the sought outputs.
The contributions made by this paper can be summarised in the following points:
We propose a pipeline that uses adversarial attacks to embed information in images. It can be used both for image watermarking and steganography and is founded on the fast-paced research on adversarial attacks. We call this approach Adversarial Embedding. Our pipeline relies on a targeted adversarial attack called Sorted Adversarial Targeted Attack (SATA) that we specifically develop for this work. Our adversarial attack increases the amount of data that can be hidden by using multi-class embedding. SATA is capable of embedding seven times more data than existing adversarial attacks with small models (with 10 output classes, like the Cifar-10 dataset), and orders of magnitude more with bigger models (with 100 output classes, for example).
The steganography literature contains few approaches that use Deep Learning to embed data. Baluja proposed a system with two deep neural networks, an encoder and a decoder. Zhu et al. expanded that idea to a system with three convolutional neural networks playing the roles of encoder, decoder and adversary. Volkhonskiy et al., on the other hand, proposed to use a custom Generative Adversarial Network to embed information.
The main advantage of our pipeline over the previous techniques relying on deep learning is that it does not require a specific model to be designed and trained for this task. It can indeed be used with any image classification model and adversarial attack to enforce higher security.
We demonstrate that our pipeline has competitive steganography properties. The fact that SATA (the new attack we propose) builds upon a state-of-the-art adversarial attack algorithm allows us to generate minimal perturbation on the cover images. This places our approach among the steganography techniques with the least footprint.
We also show that Adversarial Embedding with SATA can achieve almost 2 times the density of the densest steganography technique, with an embedding density up to 10 bits per pixel.
We analyze the resilience of our system to tampering and show that, because our system allows the freedom to choose any image classification model in the pipeline, we can find a combination of classification models and adversarial attacks resilient to image tampering with more than 90% recovery rate under perturbation.
Finally, we assess the secrecy of our system and demonstrate that available steganalysis tools poorly perform in detecting or recovering data hidden with adversarial embedding.
II Background and Related Work
Steganography is the process of hiding important information in a trivial medium. It can be used to transmit a message between a sender A and a receiver B in a way that a malicious third party M cannot detect that the medium contains a hidden message; even if M detects the message, M should not be able to extract it from the medium (Fig 1).
The term Steganographia was introduced at the end of the 15th century by Trithemius, who hid relevant messages in his books. Since then, steganography has expanded and has been used in various media, such as images and audio.
Steganography techniques use either the spatial domain or the frequency domain to hide information.
When operating in the spatial domain, the steganography algorithm adaptively changes some pixels of the image to embed data. Basic techniques to embed messages in the spatial domain include LSB (Least Significant Bit) and PVD (Pixel-Value Differencing). Among the latest generation of spatial steganography techniques, the most popular are:
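To make the basic idea concrete, here is a minimal sketch of LSB embedding and extraction in numpy (function names are illustrative, assuming 8-bit pixel values and a message given as a list of bits):

```python
import numpy as np

def lsb_embed(cover, bits):
    """Write one message bit into the least significant bit of each pixel value."""
    flat = cover.flatten().copy()
    for i, b in enumerate(bits):
        flat[i] = (flat[i] & 0xFE) | b  # clear the LSB, then set it to the message bit
    return flat.reshape(cover.shape)

def lsb_extract(stego, nbits):
    """Read the message back from the least significant bits of the first pixels."""
    return [int(p) & 1 for p in stego.flatten()[:nbits]]
```

Since each pixel value changes by at most 1, the perturbation is invisible to the eye, but the statistical footprint it leaves is exactly what steganalysis techniques exploit.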
HUGO - Highly Undetectable steGO:
HUGO was designed to hide messages seven times longer than LSB matching can, at the same level of detectability.
WOW - Wavelet Obtained Weights:
WOW uses syndrome-trellis codes to minimize the expected distortion for a given payload.
HILL - High-pass, Low-pass, and Low-pass:
HILL proposed a new cost function that can be used to improve existing steganography. It uses a high-pass filter to identify the areas of the image that are best to target (the less predictable parts).
S-UNIWARD - Spatial UNIversal WAvelet Relative Distortion:
S-UNIWARD is another cost function; it uses the sum of relative changes between the stego images and the cover images. It has the same detectability properties as WOW but is faster to compute and optimize, and it can also be used in the frequency domain.
Frequency domain steganography, on the other hand, relies on frequency transforms to generate the perturbation, for instance during the JPEG conversion of the picture. Such transforms include the Discrete Cosine Transform (DCT), the Discrete Wavelet Transform (DWT) and the Singular Value Decomposition (SVD). In a nutshell, frequency domain techniques change some coefficients of the transforms in a way that can only be detected and decoded by the recipient.
The technique we propose, adversarial embedding, uses images as its medium. Its novelty lies in the use of adversarial attack algorithms that embed the sought messages in the form of classification results (of adversarial examples) of a given ML model. The information is not embedded in the pixels themselves but in the latent representation of the image formed by the classification (ML) model that processes the image. Some of these latent representations can be tailored using adversarial attacks and extracted in the shape of classification classes.
Digital Watermarking aims at hiding messages inside a medium. Unlike steganography, however, the recipient is not supposed to extract the message. That is, when someone, say A, watermarks a medium and shares it with a receiver, say B, it is expected that neither B nor any third party M can detect the watermark and decode it (only A should be able to do that). Watermarking has multiple applications, such as copyright protection or tracking and authenticating the sources of the media.
Thus, while digital watermarking and steganography use the same embedding techniques, they favour different qualities of the embedding. For instance, steganography aims to maximize the quantity of data that can be embedded within the medium. Watermarking focuses more on the integrity of the embedded message, in particular when the medium is subject to perturbations during its life cycle.
Steganalysis is the field opposite to steganography. It is dedicated to the detection of messages hidden using steganography techniques. Traditionally, steganalysis had to manually extract the features relevant to the detection of every steganography technique.
Steganalysis techniques use various approaches to detect hidden messages. Hereafter, we list some of the most notable approaches against spatial-domain steganography:
Visual Steganalysis: It analyzes the pixel values and their distribution. It is sufficient to detect basic inconsistencies, for instance, an unbalanced distribution of zeroes and ones that indicates LSB steganography. Given the original cover images, visual steganalysis techniques can identify the differences between the cover and stego images and assess whether this noise is an artefact or meaningful.
Signature Steganalysis: Many steganography solutions append remarkable patterns at the end of the embedded message. For instance, the Hiderman steganography software adds CDN at the end, while Masker, another tool, dedicates the last 77 bytes of the stego-image to its signature. A steganalysis tool would scan the files for such signatures.
Statistical steganalysis: Detectors of this family focus on some statistics that are commonly modified as a result of the embedding process. SPA (Sample Pair Analysis) method, RS (Regular and Singular groups) method and DIH (Difference Image Histogram) method are among the most popular statistical steganalysis techniques.
Deep Learning steganalysis: Deep learning models are becoming more popular as a steganalysis approach. They can be trained to learn the features of existing steganography approaches and to detect them with high accuracy. However, they require the interception of a large amount of cover and stego-images and the ability to label these images.
II-D Adversarial examples
Adversarial examples result from applying small, intentional perturbations to original inputs to alter the prediction of an ML model. In classification tasks, the effect of the perturbation ranges from reducing the confidence of the model to making it misclassify the adversarial examples. Seminal papers on adversarial examples [4, 34, 14] consider adversarial examples as a security threat for ML models and provide algorithms (commonly named "adversarial attacks") to produce such examples.
Since these findings, researchers have played a cat-and-mouse game. On the one hand, they design defence mechanisms to make ML models robust against adversarial attacks (avoiding misclassifications), such as distillation, adversarial training, generative adversarial networks, etc. On the other hand, they elaborate stronger attack algorithms to circumvent these defences (e.g., PGD, CW).
One can categorise adversarial attack algorithms into three general categories: black-box, grey-box and white-box. Black-box algorithms assume no prior knowledge about the model, its training set or its defence mechanisms. Grey-box algorithms hold partial knowledge about the model and the training set but have no information regarding the defence mechanisms. Finally, white-box algorithms have full knowledge about the model, its training set and its defence mechanisms.
The literature on applications of adversarial examples is scarcer and mainly focuses on their ability to fool ML-based systems used for, e.g., image recognition, malware detection or porn filtering. In this work, we instead consider adversarial examples as a means of embedding and hiding secret messages.
Our embedding approach relies on three attributes of adversarial examples:
Universal existence of adversarial examples: Given a sufficient amount of noise, we can always craft an adversarial image for a given model, starting from any cover image, to be classified into any target class.
Non-transferability of targeted adversarial examples: A targeted adversarial image crafted for a given model to be classified as a chosen class has a low probability of being classified as that class if any other model is used to decode it.
Resilience to tampering of adversarial examples: A targeted adversarial image crafted for a given model to be classified as a chosen class will still be classified as that class by the same model even if the image has suffered low to average tampering and perturbation.
These attributes have been studied by previous research on adversarial examples [29, 26, 6, 2] and we will show in the following sections how they make our Adversarial Embedding approach among the best watermarking and steganography techniques.
Another active field of research about adversarial examples is the detection of these perturbations. It can be used as a preemptive defence by flagging suspicious inputs to be handled by a specific pipeline (human intervention, complex analysis, etc.). In 2017, Carlini et al. surveyed 10 popular detection techniques and demonstrated that they can all be defeated by adjusting the attack algorithm with a custom loss function. Their work showed that the properties that researchers believed were inherent to the adversarial attack process, and which were used to detect adversarial examples, are not unavoidable and can be bypassed by simple optimizations of the attack algorithm.
Since this work, new detection mechanisms have been proposed, borrowing techniques from statistical testing, information theory and even steganography detection. However, no technique has demonstrated its ability to generalize to every model, dataset and attack available in the literature without being easy to bypass through the optimization of existing attack algorithms.
III Adversarial Embedding
Our objective is to provide an integrated pipeline for image watermarking and steganography that relies on the known properties of adversarial examples. We name our approach adversarial embedding. Its principle is to encode a message as a sequence of adversarial images. Each of those images results from altering a given original image (the cover image) to force a given C-class ML model to classify it into a targeted class. To decode the message, the recipient uses the same model to classify the adversarial images and retrieve their class numbers, which together form the decoded message.
III-A Inputs and Parameters
More precisely, our pipeline comprises the following components:
An image classification dataset: The dataset defines the number of classes C (of the related classification problem) which, in turn, determines the size of the alphabet used for the encoding. We can provide our system with any number of images. Still, adversarial attack algorithms may fail to work on some images. The more images in the dataset, the larger and more diverse the space of adversarial images that can be used for the encoding.
A pair of encoder/decoder: Having defined C through the choice of the dataset, we have to transform a (e.g. binary) secret message into Base C. Conversely, at the decoding stage, we transform the retrieved Base-C message back into the original binary message. We use the encoder/decoder pair in a black-box way and make no assumption regarding its properties. For instance, the encoding can be improved to support redundancy or sanity-check techniques, but this is not a stringent requirement for our approach to work.
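A minimal sketch of such an encoder/decoder pair (hypothetical helper names, treating the secret message as a bit string) could look like:

```python
def encode(bits, C):
    """Turn a binary message (string of '0'/'1') into a list of Base-C digits."""
    n = int(bits, 2)
    digits = []
    while n:
        digits.append(n % C)
        n //= C
    return digits[::-1] or [0]

def decode(digits, C, nbits):
    """Turn Base-C digits back into the binary message of known length nbits."""
    n = 0
    for d in digits:
        n = n * C + d
    return format(n, "0{}b".format(nbits))
```

For example, with C = 10, the bit string "101101" (45 in decimal) becomes the digit sequence [4, 5], i.e. two adversarial images targeting classes 4 and 5.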
A classification model: Although our method is compatible with any C-class classification model that can deal with the chosen dataset, we focus more particularly on Deep Neural Networks (DNNs) as those are arguably the established solution for image classification.
The model is defined by a set of parameters. In DNN classifiers, the parameters include the architecture (number, types and inner parameters of the layers), the weights of the layers learned during training, hyperparameters, etc. The model acts as a secret key for both encoding and decoding. Our approach assumes that the model can be transmitted from the sender to the intended recipient of the message without being intercepted or altered (e.g. through a secure physical device). The choice of the model also impacts the success rate of the adversarial attack algorithms and, thus, the images that can be used for the encoding.
An adversarial attack algorithm: We can use any targeted adversarial attack algorithm that can force a model to classify a given image into a chosen class. The hyperparameters of the attack include the maximum amount of perturbation allowed on images as well as attack-specific parameters. The choice of the attack algorithm and its parameters impacts the success rate of the attack and the detectability of the perturbation. In this work, we focus more particularly on the Projected Gradient Descent (PGD) attack because we found it provides a good balance between success rate and perturbation minimization. Moreover, PGD randomly selects starting points around the original images, which makes the embedding non-deterministic (i.e. the same message and the same original image can lead to different steganography images) and, thus, harder to detect (see Section VI-B). Finally, PGD is known to have a low transferability from one model to another, which increases the resilience of our approach to illegitimate decoding (see Section VI-C).
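For intuition, a toy targeted PGD-style loop on a linear softmax classifier (a deliberate simplification: the real attack operates on a DNN, and all names here are illustrative) can be sketched as:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def targeted_pgd(x, W, b, target, eps=0.5, step=0.1, iters=100):
    """Toy targeted PGD sketch on a linear model (logits = W @ x + b).
    Each step descends the cross-entropy loss toward the target class,
    then projects back onto the L2 ball of radius eps around x."""
    x_adv = x.copy()
    for _ in range(iters):
        p = softmax(W @ x_adv + b)
        onehot = np.zeros_like(p)
        onehot[target] = 1.0
        # gradient of -log p_target wrt the input for a linear softmax model
        grad = W.T @ (p - onehot)
        x_adv = x_adv - step * grad / (np.linalg.norm(grad) + 1e-12)
        # keep the perturbation within the allowed budget
        delta = x_adv - x
        n = np.linalg.norm(delta)
        if n > eps:
            x_adv = x + delta * (eps / n)
    return x_adv
```

The real PGD additionally restarts from random points inside the eps-ball, which is the source of the non-determinism mentioned above.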
III-B Embedding Pipeline
Figure 2 illustrates an instantiation of our approach to embed a binary message of length L into an image dataset with 10 classes. First, we use the encoder to translate the message into Base 10, resulting in a new message of length N, which is also the number of adversarial images needed to encode the message. In the second step, we apply the adversarial attack to insert targeted adversarial perturbations into N original images (picked with or without replacement), resulting in N adversarial images such that the model classifies the i-th adversarial image into the class given by the i-th digit of the encoded message. These adversarial images form the sequence of steganography images that are sent to the recipient through a (potentially insecure) channel. While a malicious third party can intercept the sent images, we assume that either the channel is reliable enough to preserve the image ordering or that some consistency mechanism allows this ordering to be retrieved reliably.
III-C Decoding Pipeline
Once the recipient receives the adversarial images, she can input them sequentially into the classification model (which was previously transmitted to her securely) to retrieve their associated classes. The resulting class numbers form the Base-10 message, which can then go through the decoder to retrieve the original binary message.
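The decoding step above reduces to a few lines; here is a sketch in which toy_model stands in for the shared secret classifier (all names are illustrative):

```python
def decode_message(images, model_predict, C=10):
    """Classify each received adversarial image with the shared secret model;
    the predicted class indices are the Base-C digits of the message."""
    digits = [model_predict(img) for img in images]
    n = 0
    for d in digits:
        n = n * C + d  # fold the digits back into a single number
    return digits, n

# toy stand-in for the secret model: maps an image id to its forced class
toy_model = {"img1": 2, "img2": 9}.get
```

With a real pipeline, model_predict would be the argmax over the DNN's output probabilities, and the folded number would then be converted back to the original binary message by the decoder.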
External disruptions may alter the images, due to the natural processing of the carrier (e.g. websites and mobile applications often show only compressed versions of the images) or to malicious intent from a third party. For instance, copyright enforcement is a popular application of watermarking. To circumvent this protection while permitting the illegal use of the protected material, malicious people can degrade the media with local transformations (e.g. rotation, cropping …). In such cases, the recipient needs to correctly classify the altered images resulting from applying the aforementioned transformations to the adversarial images. It is desirable for adversarial embedding to be resilient to such transformations, so that the classification of the images remains preserved in spite of them.
IV Sorted Adversarial Targeted Attack (SATA)
A drawback of adversarial embedding as described previously is that it can only encode ⌊log2 C⌋ bits per adversarial image (where C is the number of classes). In the particular case of Cifar-10, with its 32x32 images and 3 channels, this yields a density of 9.77e-4 bits per pixel (BPP). By contrast, alternative solutions achieve a BPP between 0.1 and 0.2. Comparing embedding density is not always fair, though, as the density of adversarial embedding depends on the number of classes and not on the number of pixels. Established benchmark datasets like MNIST, Cifar-10 and ImageNet exhibit some correlation between their number of classes and image resolution. However, others like NIH Chest X-Ray have large images categorized in few classes.
Thus, its BPP density depends exclusively on the used dataset.
Nevertheless, this limited density originates from the fact that existing adversarial attack algorithms maximize the classification probability of one class, without considering the other classes. This restricts the embedding to using only one value per image: the most probable class.
To improve density without changing the dataset, we propose a new targeted attack that forces the top-k classes suggested by the model for a given image. Thus, the encoding of a message chunk is not only the top class suggested by the model but the top-k ones. We name this type of adversarial attack algorithm Sorted Adversarial Targeted Attack (SATA) and we instantiate its principle by extending the PGD algorithm.
Existing targeted adversarial attack algorithms that rely on gradient back-propagation measure the gradient toward the target class. This gradient is iteratively used to drive the perturbation added to the input image until the image is misclassified as expected. In SATA, we consider all classes and measure the gradient toward each class. Then, we add a perturbation to the input with a weighted combination of these gradients. The weight is 1 for the top-1 class we want to force, and it decreases gradually for each subsequent class. The more classes we force, the harder it is to ensure that the classes are sufficiently distinguishable and the harder it is to build an appropriate perturbation.
Apart from that, the remaining parts of our SATA algorithm are similar to PGD: we apply small perturbation steps iteratively until we reach a maximum perturbation value (based on L2 distance) and we repeat the whole process multiple times to randomly explore the space around the original input.
Algorithm 1 describes our SATA algorithm. It calls several internal procedures. SplitIntoChunks (Line 1) takes as input the full message and splits it into chunks, each of which contains up to k different Base-C digits. Each of these digits represents a class that we want to force into the top-k (the i-th digit corresponding to the i-th top class). The digits must be different because a class cannot occur at two ranks at the same time. This implies that a chunk cannot contain two identical digits. To avoid this, we cut a chunk as soon as a digit occurs in it for the second time or when reaching k digits. Then, the number of chunks (into which the full message was decomposed) corresponds to the number of images required to encode the message. For instance, splitting 29234652 into chunks of maximum size k = 3 using a dataset of C = 100 classes leads to the two chunks (29, 23, 46) and (52). This means that encoding this message requires two adversarial images. The model should classify the first image into 29 as the most probable class, 23 as the second most and 46 as the third most. If we set k = 4 instead, we can encode the message using one adversarial image, such that the model also predicts 52 as the fourth most probable class.
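The chunking rule can be sketched as follows (split_into_chunks is our reading of the SplitIntoChunks procedure; the message is given as a list of Base-C digits):

```python
def split_into_chunks(digits, k):
    """Split a sequence of Base-C digits into chunks of at most k digits.
    A chunk is cut early as soon as a digit would repeat inside it, because a
    class cannot occupy two ranks of the same image at once."""
    chunks, current = [], []
    for d in digits:
        if len(current) == k or d in current:
            chunks.append(current)
            current = []
        current.append(d)
    if current:
        chunks.append(current)
    return chunks
```

On the example above, the Base-100 message 29234652 is the digit sequence [29, 23, 46, 52]; with k = 3 it splits into two chunks, and with k = 4 into one.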
Next, the buildWeightedLogits procedure (Line 2) computes, for each chunk, the weights applied to the gradient of each class when building the perturbation. We start with weight 1 for the top-1 class and decrease the subsequent weights following an exponential law, whose decay rate is a hyperparameter of SATA that has to be set empirically.
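As one plausible reading of this weighting scheme (alpha is the empirical decay hyperparameter; the exact law in the original is not given):

```python
def build_weighted_logits(k, alpha=0.5):
    """Gradient weights for the k target classes: the top-1 class gets weight 1,
    and each subsequent rank decays exponentially (alpha is a SATA hyperparameter)."""
    return [alpha ** i for i in range(k)]
```

The decay ensures the rank ordering: the class pushed hardest ends up most probable, the next one second most probable, and so on.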
Having our chunks and weights, we enter a loop where we perform a number of trials to generate a successful perturbation (Lines 5–16). At each trial, the randomPick procedure (Line 6) randomly picks, from the whole set of original images, as many images as there are chunks to encode.
The procedure computeAdv (Line 9) computes the weighted gradients with respect to all the target classes and uses them to drive the adversarial images with a small perturbation step, without exceeding the maximum amount of perturbation allowed.
This procedure is elaborated in Algorithm 2. It uses similar sub-procedures as PGD:
RandomSphere(x, epsilon): builds a random matrix of the same shape as 'x', within a sphere of radius epsilon centered around 0.
Project(v, eps): projects the values in 'v' onto the norm ball of size 'eps'.
LossGradient(x, y): measures the gradient of the classification model's loss on every input x with respect to its associated class y.
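As an illustration, here is one plausible numpy reading of the first two helpers (hedged: the paper does not give their exact definitions; we assume L2 geometry, consistent with the L2 distance used above):

```python
import numpy as np

def random_sphere(x, eps, rng):
    """Random start of the same shape as x, drawn inside the L2 ball of radius eps."""
    d = rng.standard_normal(x.shape)
    return d / (np.linalg.norm(d) + 1e-12) * rng.uniform(0.0, eps)

def project(v, eps):
    """Project v onto the L2 norm ball of radius eps (no-op if already inside)."""
    n = np.linalg.norm(v)
    return v if n <= eps else v * (eps / n)
```

LossGradient, in contrast, depends on the model's internals and would typically be obtained through the framework's automatic differentiation (e.g. a gradient tape in TensorFlow).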
Finally, the computeSuccess procedure (Line 10) checks whether the most probable classes are correctly predicted. Any chunk that has one or more classes out of order is considered a failure (returns 0). If a message requires multiple chunks to be embedded (i.e. multiple cover images), computeSuccess returns the average success rate over all chunks.
IV-B Embedding Capacity and Density
For a dataset with C classes, the worst-case embedding capacity (in bits) of the standard adversarial embedding (using PGD to encode one digit per image) is ⌊log2 C⌋. Thus, with Cifar-10, each image can encode 3 bits. With SATA, each image encodes k classes.
Let us assume k = 2. In this case, each image can encode one of the 90 ordered pairs of different 0–9 digits. Pairs with two identical digits cannot be encoded since a class cannot be both the first and the second most probable class at the same time. Thus, in 90% of the cases, the capacity of an image becomes 6 bits. However, when two successive digits to encode are identical, we have to use two images (each of which has, therefore, a capacity of 3 bits). On average, the embedding capacity of SATA with k = 2 and a 10-class dataset is, thus, 0.9 × 6 + 0.1 × 3 = 5.7 bits. Given that Cifar-10 images contain 32x32 pixels and 3 channels, this yields an embedding density of 1.86e-3 BPP.
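The arithmetic behind these figures (our reconstruction, since the original values were stripped) can be checked directly:

```python
import math

C = 10                                 # Cifar-10 classes
base_bits = math.floor(math.log2(C))   # plain PGD embedding: 3 bits per image
pair_bits = math.floor(math.log2(90))  # 90 ordered pairs of distinct digits -> 6 bits
# when two successive digits are identical (1 case in 10), fall back to
# two single-digit images of 3 bits each
avg_bits = 0.9 * pair_bits + 0.1 * base_bits  # 5.7 bits per image on average
density = avg_bits / (32 * 32 * 3)            # bits per pixel for Cifar-10
```

This is consistent with the "almost 2 times" claim above: 5.7 / 3 = 1.9 times the density of the standard one-class embedding.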
We extend this study empirically to classifiers with 100 classes, 1,000 classes and 10,000 classes in Section VI-E1.
Further raising k can significantly increase the capacity (although the marginal increase lowers as k goes higher). However, as the adversarial attack has to force the model to rank more classes, its success rate decreases. We investigate this trade-off in our empirical evaluation.
V Evaluation Setup
Our evaluation aims at determining whether adversarial embedding is a viable technique for steganography and watermarking. For such techniques to work, a first critical requirement is that a third party should not detect the embedding in the images. Thus, we focus on the ability of adversarial embedding to avoid detection, either by manual or automated inspection. In the first case (manual inspection), we want to ensure that humans cannot identify the adversarial embedding. To validate this, we check whether the images with the embedding can be distinguished from the original ones. If not, then we can be sure that our embedding passes unnoticed. In steganography, this is important for the confidentiality of the message; in watermarking, this ensures that the watermark does not alter the perception of the image by the end-user. Thus, we ask:
Does adversarial embedding produce elusive (wrt human perception) images?
As for automated methods, steganalysis is the field of research that studies the detection of hidden messages and watermarks implemented using steganography techniques. Having intercepted a load of exchanged data, steganalysis methods analyze the data and conclude whether or not they embed hidden messages. We confront adversarial embedding to those methods and ask:
Can adversarial embedding remain undetected by state-of-the-art steganalysis methods?
A second security requirement is that, in the case where a third party detects an embedded message in the cover, it is unable to extract it. In our method, the classification model from which the adversarial embedding is crafted is the key required to decode the message. If we assume that the third party has no access to this model, the only way to decode is to build a surrogate model that classifies the adversarial examples exactly as the key model does. Thus, we ask:
Can adversarial embedding be extracted by different models?
After studying the confidentiality of the embedded information, we turn our attention towards its integrity when reaching the recipient (in steganography) or when checking the authenticity of the source (in watermarking). Integrity is threatened by image tampering. Under the assumption that the model used to craft the adversarial embedding is unaltered, we want to ensure that decoding tampered images still yields the original message. We consider spatial domain tampering resulting from basic image transformations (rotation, upscaling and cropping) as well as frequency domain tampering like JPEG compression and color depth reduction. We ask:
Is adversarial embedding resilient to spatial domain and frequency domain tampering?
The last part of our study focuses on the steganography use case and considers the benefits of our SATA algorithm to increase the density (in bits per pixel) achieved by the adversarial embedding (by targeting k classes). We study this because SATA (which considers multiple classes) may achieve a smaller success rate than PGD (which targets a single class). We study this trade-off and ask:
What are the embedding density and the success rate achieved by the Sorted Adversarial Targeted Attack?
Density and success rate depend on the number of classes k that are targeted by the attack algorithm. Thus, we study this question for different values of k. In particular, we are interested in the maximum value of k, which suggests the maximum capacity of SATA to successfully embed information.
V-B Experiment subjects
Messages. Most of our experiments consider two messages to encode. Message1 is a hello message, encoded in Base 10 as 29234652. Message2 is a randomly-generated message of 100 alpha-numerical characters encoded in Base 10.
In our RQ5 experiments, we assess the embedding density of 3 classifiers with 1,000 randomly generated messages each (of length 6.64 Kbits for the first 2 classifiers and 33 Kbits for the third classifier). We then use Message3, a randomly-generated message of 6.64 Kbits, to assess the trade-off between density and success rate.
Image dataset. We use the Cifar-10 dataset as the source of original images to craft the adversarial embedding. Cifar-10 comprises 60,000 labelled images scattered across 10 classes, with a size of 32x32 pixels and 3 color channels (which makes it suitable for watermarking). With 10 classes and one-class embedding (e.g., with PGD), every image can embed 3 bits (floor(log2(10)) = 3). With the 32x32 pixel size and the 3 color channels, this yields an embedding density of 9.77e-4 BPP. By comparison, ImageNet usually uses images of 256x256 pixels and supports 21K categories, which allows us to embed up to 14 bits per image; however, due to the size of the images, the density is only 7.12e-5 BPP. Moreover, classification models for Cifar-10 require reasonable computation resources (compared to ImageNet) to be trained.
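The density figures above follow from a simple calculation, sketched below in Python. The `embedding_density` helper is our own illustration (not the paper's code); it assumes the floor-of-log2 encoding described in the text and counts every color channel as a subpixel.

```python
import math

def embedding_density(num_classes, width, height, channels):
    """Bits per pixel (BPP) when each image embeds a single target class.

    An N-class model can encode floor(log2(N)) bits per image; the density
    divides this by the total number of subpixels (every color channel).
    """
    bits_per_image = math.floor(math.log2(num_classes))
    subpixels = width * height * channels
    return bits_per_image / subpixels

# Cifar-10: 10 classes, 32x32x3 images -> 3 bits per image, ~9.77e-4 BPP
cifar_bpp = embedding_density(10, 32, 32, 3)
# ImageNet: ~21K classes, 256x256x3 images -> 14 bits per image, ~7.12e-5 BPP
imagenet_bpp = embedding_density(21000, 256, 256, 3)
```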
Models. Our experiments involve two pre-trained models taken from the literature and 100 generated models. The two pre-trained models are (i) the default Keras architecture for Cifar-10 (https://keras.io/examples/cifar10_cnn/), named KerasNet, and (ii) a ResNet 20 (V1) architecture. Both achieve comparable accuracy when trained with the Adam optimizer, data augmentation, and 50 training epochs. The remaining 100 models were produced by FeatureNet, a neural architecture search tool that can generate a predefined number of models while maximizing diversity.
V-C Implementation and Infrastructure
All our experiments run on Python 3.6. We use the popular Keras framework on top of TensorFlow and various third-party libraries. The GitHub repository of the project (https://github.com/yamizi/Adversarial-Embedding) defines the requirements and versions of each library.
To craft the adversarial examples, we use the PGD algorithm with its default perturbation step and maximum perturbation parameters. To make sure this algorithm and these parameters are relevant, we measure the empirical success rate of the algorithm in applying adversarial embedding on Cifar-10 with the Keras model. We encode Message2 into 154 adversarial images produced by applying PGD on 154 original images selected randomly (without replacement). We repeat the process 100 times (resulting in 15,400 runs of PGD) and measure the percentage of times that PGD successfully crafted an adversarial example. We obtain a 99% success rate, which tends to confirm the relevance of PGD for adversarial embedding.
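For illustration, a minimal NumPy sketch of targeted PGD is given below. The toy linear model, the `grad_fn` interface, and the step/epsilon values are our own assumptions for the sketch and do not reproduce the exact settings used in our experiments.

```python
import numpy as np

def pgd_targeted(x, grad_fn, eps=0.03, step=0.007, iters=40):
    """Targeted PGD sketch: ascend the gradient of the target-class score,
    projecting back into an L-infinity ball of radius eps around x.

    grad_fn(x_adv) must return d(target score)/d(x_adv). The eps/step values
    here are illustrative only."""
    x_adv = x.copy()
    for _ in range(iters):
        x_adv = x_adv + step * np.sign(grad_fn(x_adv))  # gradient-sign step
        x_adv = np.clip(x_adv, x - eps, x + eps)        # project into the ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                # keep valid pixel range
    return x_adv

# Toy "model": the target-class score is w . x, so its gradient is simply w.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.5, 0.5, 0.5])
x_adv = pgd_targeted(x, lambda z: w, eps=0.03)
```

The projection step is what bounds the perturbation, which is why the embedding stays visually close to the cover image.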
Model generation and training (using FeatureNet) were performed on a Tesla V100-SXM2-16GB GPU on an HPC node. All remaining experiments were performed on a Quadro P3200-6GB GPU on a I7-8750, 16Gb RAM laptop.
VI Evaluation Results
VI-A RQ1: Visual Perception of Perturbations
We rely on two metrics to quantify the perception of the perturbation on adversarial images. The first is the Structural Similarity Index Metric (SSIM), which roughly measures how close two images are. It is known to be a better metric than alternatives such as the peak signal-to-noise ratio (PSNR) and the mean squared error (MSE). Prior work on just-noticeable differences showed that humans cannot perceive perturbations with an SSIM loss smaller than 4%. Depending on the case, humans might start perceiving small perturbations from 4% to 8% of SSIM loss.
The second metric is LPIPS. It relies on a surrogate model (in our case, an AlexNet) to assess perceptual similarity. A preliminary study showed that it outperforms SSIM in terms of correlation with human perception and that it can discriminate more finely between perturbations with close SSIM values.
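As an illustration of the first metric, the following is a simplified, single-window SSIM in NumPy. The standard metric averages this statistic over local windows; this global variant and the helper names are ours, for illustration only.

```python
import numpy as np

def ssim_global(a, b, data_range=1.0):
    """Simplified single-window SSIM over two images in [0, data_range].

    The standard SSIM averages this statistic over local windows; computing it
    globally is a deliberate simplification for this sketch."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def ssim_loss_percent(original, perturbed):
    """SSIM loss as reported in the text: (1 - SSIM) in percent."""
    return (1.0 - ssim_global(original, perturbed)) * 100.0
```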
To evaluate whether humans can perceive the perturbations incurred by adversarial embedding, we embed Message1 using 8 cover images and Message2 using 154 cover images. In both cases, we applied PGD on the KerasNet model to generate the adversarial images. Then, we measure the SSIM and LPIPS losses between the original images and the perturbed images. To complement our analysis, we compare the degree of perceived perturbation (as computed by the metrics) with the one resulting from applying a 75% quality JPEG compression (resulting in a loss of information of 25%) on the original images.
| Image set | Perturbation | SSIM loss (%) | LPIPS loss (%) |
|-----------|--------------|----------------|-------------------|
| Message1  | Adv. Emb.    | 8.98 +/- 0.75  | 0.29 +/- 3.71e-04 |
| Message1  | JPEG (75%)   | 8.17 +/- 0.21  | 1.07 +/- 3.44e-03 |
| Message2  | Adv. Emb.    | 6.02 +/- 0.58  | 0.33 +/- 9.68e-04 |
| Message2  | JPEG (75%)   | 6.65 +/- 0.20  | 1.05 +/- 8.44e-03 |
Table I shows the results. The embedding of Message1 results in images with a mean SSIM loss of 8.98%, while the mean LPIPS loss is 0.29%. As for Message2, the mean SSIM loss is 6.02% and the mean LPIPS loss is 0.33%. The SSIM loss indicates that some human eyes could observe minor effects on the images, but this effect remains small. Moreover, the LPIPS metric reveals that the perturbation due to adversarial embedding is 3 times less noticeable than the ones incurred by JPEG compression. Overall, our results tend to show that the produced adversarial images remain within an acceptable threshold of human perception.
It is to be noted that the degree of perturbation depends on the choice of the adversarial attack algorithm and its parameters. We selected PGD as a relevant baseline. Still, we can further reduce this impact by lowering the maximum perturbation of PGD or by using alternative algorithms that are known to apply smaller perturbations, e.g., CW. In the end, this choice boils down to a compromise between perturbation, efficiency, and the rate of success (in creating the adversarial images).
Similarly, using a different classifier leads to different perceptual loss values. As Fig 3 shows, some models cause a lower LPIPS loss when embedding the message, while others achieve better performance in terms of SSIM loss. Some models even achieve an SSIM loss lower than the threshold a human eye can notice.
VI-B RQ2: Detection by Steganalysis
A basic approach in image steganalysis would be to compare the intercepted data with some original reference. This, however, only works if the difference between the two is caused solely by the embedding and not by any alteration/noise/tampering during transit. Steganalysis therefore focuses on identifying the noise profile of different steganography techniques. A basic technique is LSB embedding: it uses the least significant bits to embed the hidden message. Other techniques rely either on a statistical analysis of the noise to identify patterns (SPA) or on deep neural networks to learn the patterns. The latter technique is the most effective; however, it requires the steganalyst to have a large number of labelled images to train the model to recognize the embedding technique.
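The LSB embedding baseline mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration; the `lsb_embed`/`lsb_extract` helpers are hypothetical names, and a real scheme would also permute or spread the payload.

```python
import numpy as np

def lsb_embed(cover, bits):
    """Classic LSB embedding: overwrite the least significant bit of each
    (flattened) 8-bit subpixel with one message bit."""
    stego = cover.flatten().copy()
    stego[:len(bits)] = (stego[:len(bits)] & 0xFE) | np.array(bits, dtype=np.uint8)
    return stego.reshape(cover.shape)

def lsb_extract(stego, n_bits):
    """Read the message back from the least significant bits."""
    return list((stego.flatten()[:n_bits] & 1).astype(int))

cover = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)
msg = [1, 0, 1, 1, 0, 0, 1, 0]
stego = lsb_embed(cover, msg)
```

Because each subpixel changes by at most 1, the embedding is invisible, but it leaves exactly the statistical LSB fingerprint that SPA-style detectors look for.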
We test two different detectors on a set of 154 stego-images (generated by encoding Message2) and 154 clean images, and we measure the AUC score of the detectors:
LSB matching detector: We measure an AUC-ROC score of 0.5, which was expected, as this detector is tailored for steganography techniques based on LSB embedding.
SPA detector: We also measure an AUC-ROC score of 0.5. This demonstrates that our embedded images do not exhibit common statistical features that could be used to identify them.
To sum up, common detectors perform no better than random chance at distinguishing stego-images produced by adversarial embedding from clean pictures.
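The AUC-ROC scores above can be computed with the rank-based (Mann-Whitney) formulation, sketched below. This is a generic implementation of the metric, not the detectors' own code.

```python
def auc_roc(labels, scores):
    """AUC via the Mann-Whitney formulation: the probability that a randomly
    chosen stego image scores higher than a randomly chosen clean one.

    labels are 1 for stego and 0 for clean; ties count as half a win.
    An uninformative detector scores 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```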
VI-C RQ3: Decoding by Alternative Models
We study whether a different model can classify the adversarial examples (forming the embedding) exactly as the model used to craft them does. We consider Message2 and the 154 adversarial images used to encode it, which were produced by applying PGD on the KerasNet. Then we make different models classify the adversarial images: the KerasNet with different parameters, the ResNet model, and the 100 models generated by FeatureNet. All models were trained using the whole Cifar-10 training set. The capability of those models to decode the message is measured as the percentage of images that they classify like the original KerasNet, which we name decoding rate.
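The decoding rate defined above reduces to a simple agreement metric between the labels assigned by the embedding model and a candidate model. The helper below is our own minimal sketch of that measurement.

```python
def decoding_rate(reference_labels, candidate_labels):
    """Percentage of adversarial images that the candidate model classifies
    exactly as the embedding (reference) model does."""
    matches = sum(r == c for r, c in zip(reference_labels, candidate_labels))
    return 100.0 * matches / len(reference_labels)
```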
KerasNet with different parameters achieves a decoding rate of 23.72%, while the ResNet achieves 12.18%. This indicates that adversarial embedding is highly sensitive to both the model parameters and the architecture. To confirm this, we show in Figure 4 the decoding rate achieved by the generated models. The results show that no model can retrieve the class (as labelled by the original KerasNet) for more than 37% of the adversarial images, with more than half of the models failing to surpass a 26% decoding rate.
All these low decoding rates increase our confidence that neither randomly-picked models nor handcrafted, state-of-the-art models can break the confidentiality of adversarial-embedded messages. Even if the malicious third party knows the model architecture used for the embedding, differences in parameters also result in a low capability to decode the message illicitly.
VI-D RQ4: Resilience to Image Tampering
VI-D1 Spatial Domain Tampering
We focus first on soft image tampering and three local image alterations:
Rotation: We rotate the images by 15°.
Upscaling: We use bilinear interpolation to resize the images to 64x64 pixels.
Cropping: We remove 12.5% of the images by cropping, keeping only the central part.
These transformations are common when copyrighted images are shared illegally.
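Two of these transformations can be approximated in plain NumPy as follows. This is a dependency-free sketch: the crop keeps ~87.5% of the pixels, the upscale uses nearest-neighbor instead of the bilinear interpolation we actually apply, and rotation is omitted because it typically requires an imaging library.

```python
import numpy as np

def central_crop(img, area_keep=0.875):
    """Keep the central portion of the image; the per-axis scale is
    sqrt(area_keep), so that ~12.5% of the pixels are removed overall."""
    h, w = img.shape[:2]
    s = area_keep ** 0.5
    nh, nw = int(round(h * s)), int(round(w * s))
    top, left = (h - nh) // 2, (w - nw) // 2
    return img[top:top + nh, left:left + nw]

def upscale_nearest(img, factor=2):
    """Nearest-neighbor upscale (a simplification of bilinear resizing)."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)
```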
To measure the resilience of adversarial embedding to those transformations, we consider the 100 generated models and the adversarial images. We create three altered versions of each image, using the three transformations above independently. Then, for each model m, original image I, and altered image t(I), we check whether m assigns the same class to I and t(I). If that is not the case, the alteration changes the classification of the image, thereby threatening the integrity of messages encoded by adversarial embedding. We measure the resilience of m to each transformation t by computing the recovery rate of m against t, i.e., the percentage of images resulting from applying t that m classifies in the same class as their original counterparts.
Figure 5 shows, for each transformation, the recovery rates achieved by the 100 models. We observe that, for all transformations, we can always find a classification model that achieves a high recovery rate: 99% for upscaling, 94.9% for rotation, and 94.2% for cropping. These results indicate that adversarial embedding is resilient to spatial-domain tampering of the images and that we can select models that are robust against such transformations.
It is worth noticing that our default model, KerasNet, which was crafted and trained to achieve high classification accuracy, is not the best model to ensure the most robust embedding.
VI-D2 Frequency Domain Perturbations
Next, we study the impact of two aggressive image tampering techniques: JPEG compression (a frequency domain transformation) and Color Depth Reduction (CDR). JPEG compression relies on several steps (color transformation, DCT, quantization) that cause information loss. CDR reduces the number of bits used to encode the different colors. For instance, Cifar-10 images use a 32-bit color depth, meaning that every channel is encoded on 32 bits. Reducing the color depth to 8 bits makes the picture contain fewer tone variations and fewer details. We apply JPEG compression with 90%, 75%, and 50% quality rates (resulting in information losses of 10%, 25%, and 50%, respectively) and CDR (to 8 bits, leaving pictures with only 1/12 of the original information) independently, and we again measure the recovery rate achieved by the 100 models.
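Color depth reduction amounts to quantizing each channel, as in the following sketch for float images in [0, 1]. This is an illustration only; real CDR pipelines may differ, e.g., by applying dithering.

```python
import numpy as np

def reduce_color_depth(img, bits=8):
    """Quantize each channel of a float image in [0, 1] to 2**bits levels,
    discarding the low-order tonal information."""
    levels = 2 ** bits
    return np.round(img * (levels - 1)) / (levels - 1)
```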
As Fig 6(a) shows, our models achieve up to a 100% recovery rate under JPEG compression (Q=90). When the compression rate increases, we can still achieve up to a 95% recovery rate against JPEG (Q=75) (Fig 6(b)), and one of our models achieves more than a 72% recovery rate under JPEG (Q=50) compression (Fig 6(c)).
Our models also show varied robustness to color depth reduction (Fig 6(d)) and reach up to an 88% recovery rate.
VI-E RQ5: SATA Embedding
VI-E1 Embedding Capacity
To determine how much data our adversarial embedding with SATA can carry, we randomly pick 1,000 messages of 6,643 bits and use our technique to determine how many pictures are needed, depending on how many classes we embed per picture. Once we know the number of pictures needed, we measure the embedding density in the case of color pictures of 32x32 pixels.
The results confirm our previous estimate: the density of a 10-class classifier increases with the number of classes embedded per image, from 2 classes up to 9 classes per image.
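Assuming SATA encodes an ordered sequence of k distinct classes per image (so that an N-class model offers P(N, k) distinguishable encodings), the density can be estimated as follows. This combinatorial model is our reading of SATA, not verbatim from the implementation.

```python
import math

def sata_bits_per_image(num_classes, k):
    """Bits embeddable when each image encodes an ordered sequence of k
    distinct classes: floor(log2(P(num_classes, k))), where P is the number
    of permutations (assumed model: ordered, no repeated classes)."""
    return math.floor(math.log2(math.perm(num_classes, k)))

def sata_density(num_classes, k, width=32, height=32, channels=3):
    """Resulting embedding density in bits per pixel (counting channels)."""
    return sata_bits_per_image(num_classes, k) / (width * height * channels)
```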
VI-E2 Embedding Capacity and Success Rate Trade-off
To embed many classes per image, we have to tune the hyperparameters of our embedding to ensure that we recover all the encoded classes in the right order (i.e., to maximize the success rate of our embedding).
To maximize the success rate, we used a grid search to find the best combination of the following two SATA hyperparameters: the class-weighting factor, which controls the relative weights of the forced classes, and the maximum amount of perturbation (measured as L2 distance) that SATA can apply to an image. The best class-weighting factor was always 0.5, while the ideal maximum perturbation changed depending on the number of classes embedded per image.
The SATA attack also has other hyperparameters, which we kept unchanged from the PGD implementation: for instance, the perturbation step and the number of random initialisations within the epsilon ball.
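The grid search over the two tuned hyperparameters can be sketched as below. The evaluator `rate` is a purely hypothetical stand-in for measuring SATA's embedding success rate on a batch of images.

```python
import itertools

def grid_search(success_rate_fn, weight_factors, max_perturbations):
    """Exhaustive grid search over the two tuned SATA hyperparameters: the
    class-weighting factor and the maximum L2 perturbation.

    success_rate_fn(weight, eps) is assumed to return the embedding success
    rate for that hyperparameter pair."""
    return max(itertools.product(weight_factors, max_perturbations),
               key=lambda pair: success_rate_fn(*pair))

# Hypothetical evaluator for illustration only (peaks at weight=0.5, eps=2.0).
rate = lambda w, e: 1.0 - abs(w - 0.5) - abs(e - 2.0) * 0.1
best_weight, best_eps = grid_search(rate, [0.1, 0.5, 0.9], [1.0, 2.0, 4.0])
```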
We ran our experiment using the KerasNet model and picked our cover images randomly from the Cifar-10 dataset.
We associate each image with a binary result, set to success if SATA managed to find a perturbation leading to the intended encoding, and to failed otherwise. Then, the success rate of SATA is measured as the percentage of images for which it was successful.
Table 9 presents the success rate for different numbers of embedded classes when SATA embeds a randomly-generated message of 6,643 bits. We see that SATA can always find at least one image in which to embed up to seven classes per image. Eight classes or more are not achievable using our current model. For seven classes per image, the success rate over 1,000 images was 0.8%, meaning that SATA was successful for 8 images.
VII Conclusion
In this paper, we proposed an embedding pipeline that hides secret messages in images by using image classification (DNN) models and adversarial attack algorithms. We have shown that such an approach can form a highly elusive (detection-bypassing), flexible, and customizable steganography and watermarking technique. We have also demonstrated that the combined use of a 1,000-class model with our new sorted adversarial attack algorithm can achieve high-density embeddings (higher than existing steganography research). We also showed that our embeddings are resilient to image tampering, e.g., JPEG compression.
An inherent benefit of our approach is that it leverages adversarial attack algorithms in a black-box way. Therefore, our technique can take advantage of any current or future state-of-the-art targeted adversarial attack coming out of this highly active research area and can (theoretically) be at least as effective as existing adversarial attacks (our technique will remain effective as long as adversarial attacks manage to remain undetected).
Future work should attempt to expand our study to larger models and datasets, including other media where adversarial examples have shown mature results (such as audio, video, and text).
References
- (2017) Evading machine learning malware detection.
- (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples.
- (2017) Hiding images in plain sight: deep steganography. In NIPS.
- (2013) Evasion attacks against machine learning at test time. pp. 387–402.
- (2017) Adversarial examples are not easily detected: bypassing ten detection methods.
- (2017) Towards evaluating the robustness of neural networks. Proceedings - IEEE Symposium on Security and Privacy, pp. 39–57.
- (2004) Hiding data in images by simple LSB substitution. Pattern Recognition 37, pp. 469–474.
- (2010) Digital image steganography: survey and analysis of current methods. Signal Processing 90 (3), pp. 727–752.
- (2011) A view on latest audio steganography techniques. In 2011 International Conference on Innovations in Information Technology, pp. 409–414.
- (2003) Detection of LSB steganography via sample pair analysis. Vol. 2578, pp. 355–372.
- (2013) Image quality assessment using the SSIM and the just noticeable difference paradigm. In Engineering Psychology and Cognitive Ergonomics: Understanding Human Cognition, Berlin, Heidelberg, pp. 23–30.
- (2001) Reliable detection of LSB steganography in color and grayscale images. In Proceedings of the 2001 Workshop on Multimedia and Security: New Challenges, MM&Sec '01, pp. 27–30.
- (2019) Automated search for configurations of convolutional neural network architectures. pp. 1–12.
- (2014) Explaining and harnessing adversarial examples. pp. 1–11.
- (2017) On the (statistical) detection of adversarial examples.
- (2014) Universal distortion function for steganography in an arbitrary domain. EURASIP Journal on Information Security 2014, pp. 1–13.
- (2012) Designing steganographic distortion using directional filters. 2012 IEEE International Workshop on Information Forensics and Security (WIFS), pp. 234–239.
- (2018) Image steganography in spatial domain: a survey. Signal Processing: Image Communication 65, pp. 46–66.
- (2018) A review of image steganalysis techniques for digital forensics. Journal of Information Security and Applications 40, pp. 217–235.
- (2012) ImageNet classification with deep convolutional neural networks. Neural Information Processing Systems 25.
- (2012) Learning multiple layers of features from tiny images. University of Toronto.
- (2016) Adversarial machine learning at scale. arXiv abs/1611.01236.
- (2014) A new cost function for spatial image steganography. In 2014 IEEE International Conference on Image Processing (ICIP), pp. 4206–4210.
- (2011) A survey on image steganography and steganalysis. Journal of Information Hiding and Multimedia Signal Processing 2.
- (2018) Detection based defense against adversarial examples from the steganalysis point of view. pp. 4825–4834.
- (2016) Delving into transferable adversarial examples and black-box attacks.
- (2006) Equivalence analysis among DIH, SPA, and RS steganalysis methods. In Communications and Multimedia Security, pp. 161–172.
- (2017) Towards deep learning models resistant to adversarial attacks. pp. 1–27.
- (2016) The limitations of deep learning in adversarial settings. Proceedings - 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387.
- (2010) Using high-dimensional image models to perform highly undetectable steganography. In Information Hiding, pp. 161–177.
- (2001) Digital watermarking: algorithms and applications. IEEE Signal Processing Magazine 18 (4), pp. 33–46.
- (2018) Defense-GAN: protecting classifiers against adversarial attacks using generative models. arXiv abs/1805.06605.
- Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. pp. 1528–1540.
- (2013) Intriguing properties of neural networks. CoRR abs/1312.6199.
- (2019) Deep learning applied to steganalysis of digital images: a systematic review. IEEE Access 7, pp. 68970–68990.
- (2017) Steganographic generative adversarial networks. arXiv abs/1703.05502.
- (2001) F5—a steganographic algorithm. In Information Hiding, pp. 289–302.
- (2003) A steganographic method for images by pixel-value differencing. Pattern Recognition Letters 24 (9), pp. 1613–1626.
- (2019) Stealthy porn: understanding real-world adversarial images for illicit online promotion. 2019 IEEE Symposium on Security and Privacy, pp. 547–561.
- (2018) The unreasonable effectiveness of deep features as a perceptual metric. CVPR 2018.
- (2018) The adversarial attack and detection under the Fisher information metric.
- (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612.
- (2018) HiDDeN: hiding data with deep networks.