Multitask Identity-Aware Image Steganography via Minimax Optimization

07/13/2021 ∙ by Jiabao Cui, et al.

High-capacity image steganography, aimed at concealing a secret image in a cover image, is a technique to protect sensitive data, e.g., faces and fingerprints. Previous methods focus on security during transmission and consequently run a risk of privacy leakage after the restoration of secret images at the receiving end. To address this issue, we propose a framework, called Multitask Identity-Aware Image Steganography (MIAIS), to achieve direct recognition on container images without restoring secret images. The key issue of direct recognition is to preserve the identity information of secret images in container images while simultaneously making container images look similar to cover images. Thus, we introduce a simple content loss to preserve the identity information, and design a minimax optimization to deal with the contradictory aspects. We demonstrate that the robustness results can be transferred across different cover datasets. To allow flexible restoration of secret images when needed, we incorporate an optional restoration network into our method, yielding a multitask framework. The experiments under the multitask scenario show the effectiveness of our framework compared with other visual information hiding methods and state-of-the-art high-capacity image steganography methods.


I Introduction

Visual security authentication, e.g., face recognition [galbally2013image, huang2015benchmark, xie2017robust] and fingerprint identification [valdes2019review], has achieved considerable advances in recent years. Its wide application also poses a challenge: sensitive data such as faces and fingerprints need to be protected. High-capacity image steganography, which generates a container image to conceal a secret image in a cover image, is an elegant and widespread technique to address this issue [hussain2018image]. Previous methods [wu2015steganography, baluja2017hiding] focus on the protection of secret images during transmission. Consequently, they run a risk of visual privacy leakage after restoration: once the receiver is under attack, secret images can be stolen. To this end, we propose a framework, called Multitask Identity-Aware Image Steganography (MIAIS), to perform recognition directly on container images without restoring secret images, as shown in Fig. 1.

To perform direct recognition on container images, two intuitively contradictory issues need to be addressed. On the one hand, concealment allows only small perturbations of cover images, which means the container images should look similar to their cover images. On the other hand, direct recognition may require large perturbations of cover images to preserve the discriminative features of secret images. The former aspect is the focus of prevalent image steganography; we focus on the latter and propose a strategy to resolve the contradiction.

Fig. 1: Comparison of previous image steganography methods and ours. An image steganography method generates a container image by hiding a secret image in a cover image. The container image is then transmitted from a sender to a receiver. Our difference from previous methods lies in the processing of container images. Top: previous methods restore secret images from container images for recognition, which raises a risk of privacy leakage. Bottom: our framework performs recognition directly on container images.

For direct recognition, we introduce a content loss that adds a similarity constraint between a secret image and its container image. In general, identity information lies in the high-level features of a deep network [he2016deep]. Thus, we adopt the feature extractor part of a classifier to impose a similarity constraint on high-level features. The content loss preserves the consistency of discriminative high-level features between the container and secret images, even though the container is visually similar to the cover image. In contrast, previous steganography methods, such as HIiPS [baluja2017hiding] and SteganoGAN [zhang2019steganogan], are not suitable for direct recognition because their features are indiscriminative, as shown in Fig. 2 (a), (b), and (c).


Fig. 2: T-SNE [dermaaten2008visualizing] visualization of the high-level features of container images from five classes on the AFD dataset [xiong2018an] for different methods. (a) HIiPS [baluja2017hiding] and (b) SteganoGAN [zhang2019steganogan] are two mainstream image steganography methods, whose features are not discriminative. (c) Our content loss preserves the identity information to produce discriminative features on container images. (d) The minimax optimization can also produce discriminative features on container images. (e) When both the content loss and the minimax optimization are used, the features are even more discriminative. Best viewed in color.

To deal with the aforementioned contradiction, we design a minimax optimization including a network for hiding secret images, called a steg-generator, and two container image classifiers, called steg-classifiers, as shown in Fig. 3. The steg-generator and the steg-classifiers are alternately trained to compete with each other. Specifically, at Stage A, the two steg-classifiers are trained to maximize the discrepancy between them with the steg-generator fixed. At Stage B, with the two steg-classifiers fixed, the steg-generator is trained to minimize the discrepancy between the two steg-classifiers. Stage A and Stage B are performed iteratively. As a result, the process decreases the intra-class distance and increases the inter-class distance, producing more discriminative container images for recognition, as shown in Fig. 2 (d) and (e).

Although our method can perform recognition without restoration, we incorporate an optional restoration network called a steg-restorer into our pipeline to keep secret image restoration flexible when needed, providing a multitask learning scenario in our framework, as shown in Fig. 3. It is straightforward to implement: we add a visual similarity constraint between secret images and the secret images restored from container images. As a result, our MIAIS framework can generate a container image in a single pass through an end-to-end learning pipeline, while maintaining a balance among the visual concealment, identity recognition, and secret restoration tasks.

The main contributions are summarized as follows:

  • We propose a novel framework to perform direct recognition on container images, preventing privacy leakage while simplifying the recognition workflow. To the best of our knowledge, it is the first work to propose and study the direct recognition task on container images.

  • We introduce a content loss to preserve identity information for secret images.

  • We design a minimax optimization handling the intuitive contradiction between preserving identity information and making container images similar to the corresponding cover images.

  • We conduct extensive experiments to show 1) the effectiveness of our framework compared with other visual information hiding methods, 2) the robustness of the recognition results across different cover sets, and 3) the better performance compared to state-of-the-art high-capacity image steganography methods.

II Related Work

In this section, we first review traditional visual information hiding methods used for privacy protection. Then, we focus on image steganography. Finally, we review some adversarial learning strategies, including generative adversarial networks and minimax optimization.

II-A Visual Information Hiding

With the introduction of recognition, many works have considered secure computation on sensitive images or features [ergun2014privacy, madono2020block, hu2016securing], in which visual information is hidden from humans throughout the whole process.

A group of visual information hiding works focused on cryptography [gilad2016cryptonets, hesamifard2017cryptodl, xu2019cryptonn]. To protect the data at the receiving end, homomorphic encryption (HE) was introduced as the underlying mechanism. Several privacy-preserving approaches [gilad2016cryptonets, yonetani2017privacy, wang2018efficient, sadeghi2009efficient, gentry2009fully] hid the visual information in encrypted data and applied neural networks to recognize the encrypted data. However, this was not feasible due to the computational complexity and memory costs [madono2020block]. Furthermore, the mathematical operations (addition and multiplication) in HE are limited and cannot accommodate the complex transformations in state-of-the-art deep neural networks.

Since the cryptography-based methods were not practicable, another line of research turned to combined approaches of cryptography, machine learning, and image processing. Ergun et al. [ergun2014privacy] proposed a Privacy Preserved Face Recognition (PPFR) framework, which encrypted the plain text with random cryptographic keys based on a continuous chaotic system [ergun2011high] and directly recognized the encrypted data. Block-based encryption methods were popular in this area, such as Combined Cat Map (CCM) [wang2015novel], based on hybrid chaotic maps and a dynamic random growth technique; encryption of histograms of oriented gradients (HOG) features [kitayama2019hog]; and the Encryption-then-Compression (EtC) framework [kawamura2020privacy]. Moreover, Chuman et al. [chuman2018encryption] presented a block scrambling-based encryption scheme to enhance the security of the EtC framework with JPEG compression.

Several studies have also used Deep Neural Networks (DNNs). Tanaka et al. [tanaka2018learnable] applied learnable encryption (LE) images to DNNs, reducing the influence of image encryption by adding an adaptation network before the classifier. Extended Learnable Encryption (ELE) [madono2020block] hid perceptual information by block-wise image scrambling. Sirichotedumrong et al. [sirichotedumrong2019privacy] extended their conference version [sirichotedumrong2019pixel] and proposed a pixel-based image encryption method that maintains the important features of original images for privacy-preserving DNN classification. McPherson et al. [mcpherson2016defeating] empirically showed how to train artificial neural networks to successfully identify faces and recognize handwritten digits even when the images are protected by various obfuscation techniques, such as mosaicing, blurring [hill2016effectiveness], and Privacy Preserving Photo Sharing (P3) [ra2013p3].

Our approach builds on high-capacity image steganography to generate more realistic images for transmission. We compare our method with these methods in Section IV-B.

II-B Image Steganography

Steganography is the art and science of hiding secret information in a payload carrier (cover) to obtain a container [4655281]. It can be categorized by the cover form into image [hussain2018image], audio [djebbar2012comparative], video [sadek2015video], text [liu2015text], DNA [santoso2015information], etc. The most popular medium is the image, since images are the most abundant carrier in daily life. In this paper, we focus on high-capacity image steganography, where the secret information is also an image.

II-B1 Traditional Image Steganography

The secret information in traditional image steganography is usually a message (e.g., text or a binary string). From the perspective of the embedding domain, traditional steganography algorithms can be divided into spatial-domain algorithms [lie1999data, pevny2010using, pevny2010steganalysis, holub2014universal, holub2012designing, tamimi2013hiding] and transform-domain algorithms [zhang2009high, ramkumar1999robust, quan2009high]. Spatial-domain algorithms embed the secret message through modification of the pixel brightness, color, texture, edges, and contours of the cover image. Transform-domain algorithms embed secret messages in the transform domain of the cover image under different transforms, such as the Discrete Wavelet Transform (DWT) [zhang2009high], the Discrete Fourier Transform (DFT) [ramkumar1999robust], and the Discrete Cosine Transform (DCT) [quan2009high].

II-B2 High-Capacity Image Steganography

With the advent and development of deep learning, high-capacity image steganography has emerged, based on encoding (concealment) and decoding (restoration) networks [weng2019high]. To leverage the high sensitivity of deep neural networks to tiny input perturbations, Zhu et al. [zhu2018hidden] hid and restored secret data with the help of cover images. Baluja et al. [baluja2017hiding] attempted to place a full-size color image (secret) within another image of the same size (cover) using deep neural networks. Following this direction, Duan et al. [duan2019reversible] proposed a new image steganography scheme based on a U-Net structure. Similarly, Wu et al. [wu2018stegnet] and Rahim et al. [rahim2018end] combined recent deep convolutional neural network methods with image-into-image steganography to successfully hide images of the same size.

These steganographic methods focus on security during transmission but run a risk of leaking the secret images after restoration. Our proposed MIAIS framework prevents privacy leakage by performing recognition directly on container images without restoring the secret images.

II-C Adversarial Learning

To enhance anti-analysis ability, many steganography methods have adopted GAN-based adversarial learning. The pipelines of these GAN-based methods are similar, but the implementations differ [tang2017automatic, Volkhonskiy2017SGAN, shi2017ssgan]. To achieve better invisible steganography, Zhang et al. [zhang2019invisible] implemented a steganalyzer based on the divergence of the empirical probability distributions. To optimize the perceptual quality of the images, Zhang et al. [zhang2019steganogan] proposed a novel technique named SteganoGAN for hiding arbitrary binary data in images. Hayes et al. [hayes2017generating] showed that adversarial training can produce robust steganographic techniques under an unsupervised training scheme. A similar conclusion was demonstrated in Shi et al.'s work [shi2019synchronized].

In a word, the aforementioned GAN-style methods essentially aim to approximate the distribution of container images to that of the cover image domain via a true/false discriminator. Our task, in contrast, is to perform recognition directly on container images, which means the discriminator is a multi-class classifier rather than a true/false discriminator. Inspired by Maximum Classifier Discrepancy (MCD) [saito2018maximum] in Unsupervised Domain Adaptation (UDA), we design a minimax optimization. Unlike GAN-style methods, it attends to the class distributions of the secret image domain and the container image domain, not just the domain gap between cover images and container images.

III Multitask Identity-Aware Image Steganography

$\mathcal{S}$: a secret image dataset
$\mathcal{C}$: a set of cover images
$s$: a secret image
$c$: a cover image
$c'$: a container image
$G$: the steg-generator
$F$: the steg-classifier without minimax optimization
$F_1$, $F_2$: the two steg-classifiers in the minimax optimization
$R$: the steg-restorer
$\theta_G$: the parameters of $G$
$\theta_F$: the parameters of $F$
$\theta_{F_1}$: the parameters of $F_1$
$\theta_{F_2}$: the parameters of $F_2$
$\theta_R$: the parameters of $R$
$\mathcal{L}_{con}$: the content loss
$\mathcal{L}_{vis}$: the visual similarity loss
$\mathcal{L}_{rec}$: the recognition loss
$\mathcal{L}_{ce}$: the standard cross-entropy loss
$\mathcal{L}_{tri}$: the standard triplet loss
$\mathcal{L}_{res}$: the restoration loss
$\mathcal{L}_{dis}$: the discrepancy loss

TABLE I: Notations

Multitask Identity-Aware Image Steganography (MIAIS) consists of a recognition branch and an optional restoration branch, both of which share a concealment part, as shown in Fig. 3. Thus, the receiver can perform recognition directly on the container image with the steg-classifier for privacy protection (the single-task scenario) or, alternatively, perform both recognition and restoration (the multitask scenario) to keep secret image restoration flexible when needed.

In this section, we first specify the overall architecture of MIAIS. Second, we illustrate the objective functions and the optimization process. Finally, we describe the minimax optimization that deals with the contradictory issues during optimization. For convenience, Table I summarizes the notation.

Fig. 3: Overview of Multitask Identity-Aware Image Steganography (MIAIS). The framework is composed of a network for hiding secret images called a steg-generator, two classifiers to recognize container images called steg-classifiers, and a network to restore secret images called a steg-restorer.

III-A Architecture

In this part, we walk through the architecture of MIAIS and introduce the data flow.

III-A1 The Steg-Generator

The steg-generator generates a container image to hide a secret image in a cover image. Let $\mathcal{S}$ be a secret image dataset and $\mathcal{C}$ be a set of cover images. Given $s \in \mathcal{S}$ and $c \in \mathcal{C}$, the container image for $s$ is

$$c' = G(s, c;\, \theta_G), \tag{1}$$

where $G$ denotes the steg-generator and $\theta_G$ denotes its parameters.
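The interface of Eq. 1 can be sketched as follows. This is a minimal stand-in, not the authors' U-Net: all layer shapes are illustrative assumptions, and only the input/output contract (secret plus cover in, container out) is taken from the paper.

```python
import torch
import torch.nn as nn

class StegGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # Secret and cover images are concatenated along the channel axis,
        # so the first convolution sees 3 + 3 = 6 input channels.
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
            nn.Sigmoid(),  # container pixels in [0, 1]
        )

    def forward(self, secret, cover):
        # c' = G(s, c; theta_G)
        return self.net(torch.cat([secret, cover], dim=1))

# Usage: container = StegGenerator()(secret_batch, cover_batch)
```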

III-A2 The Steg-Classifiers

To prevent privacy leakage after restoration, we propose a steg-classifier to perform recognition directly on container images. Let $F$ be a steg-classifier and $\theta_F$ be its parameters. The predicted probability $p$ for the container image $c'$ is

$$p = \sigma\big(F(c';\, \theta_F)\big), \tag{2}$$

where $\sigma$ is the softmax function and $p \in \mathbb{R}^{K}$, with $K$ the number of classes of secret images. We will see how we use two competing steg-classifiers to improve the recognition performance in Section III-C.

III-A3 The Steg-Restorer

To compare our method with traditional steganography, we embed a restoration branch in our framework. The steg-restorer, denoted by $R$, takes the container image $c'$ as input and outputs a restored secret image $\hat{s}$. That is,

$$\hat{s} = R(c';\, \theta_R), \tag{3}$$

where $\theta_R$ denotes the parameters of $R$.

III-B Loss Functions

In this part, we describe the loss functions of MIAIS for each task: concealment, recognition, and restoration.

III-B1 Concealment

Firstly, our framework aims to hide a given secret image $s$ in a cover image $c$, producing a container image $c'$. To make $c'$ look similar to $c$, a visual similarity loss is used to evaluate the consistency, i.e.,

$$\mathcal{L}_{vis} = \frac{1}{n} \sum_{i=1}^{n} \big(1 - \text{MS-SSIM}(c'_i, c_i)\big), \tag{4}$$

where $n$ is the batch size, $c'_i$ and $c_i$ are the $i$-th container and cover image in the batch respectively, and MS-SSIM is the Multi-Scale Structural Similarity [wang2003multiscale], widely used in image steganography [zhang2019invisible].
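A minimal sketch of Eq. 4, assuming the third-party pytorch_msssim package (the paper does not specify its MS-SSIM implementation) and pixel values in [0, 1]:

```python
import torch
from pytorch_msssim import ms_ssim  # pip install pytorch-msssim

def visual_similarity_loss(container: torch.Tensor, cover: torch.Tensor) -> torch.Tensor:
    # L_vis = batch mean of 1 - MS-SSIM(c'_i, c_i); inputs must be large
    # enough (roughly > 160 px per side) for the default 5-scale pyramid.
    return 1.0 - ms_ssim(container, cover, data_range=1.0)
```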

During concealment, our framework must also preserve the identity information of $s$ in order to perform recognition directly on container images. To preserve the identity information, we propose a content loss, inspired by the perceptual loss [johnson2016perceptual] in the area of style transfer [huang2017arbitrary]. That is,

$$\mathcal{L}_{con} = \frac{1}{n} \sum_{i=1}^{n} \big\| \phi(s_i;\, \theta_\phi) - \phi(c'_i;\, \theta_\phi) \big\|_2^2, \tag{5}$$

where $n$ is the batch size, $s_i$ and $c'_i$ are the $i$-th secret and container image in the batch respectively, $\phi$ is the feature extractor of $F$ pre-trained on the secret image dataset $\mathcal{S}$, and $\theta_\phi$ denotes its parameters, which are fixed in our framework. The feature extractor is essentially the deep neural network of our steg-classifier without the last fully connected layer.
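A sketch of the content loss in Eq. 5. The squared L2 distance mirrors the perceptual loss the paper cites and is our assumption; `feature_extractor` stands for the frozen steg-classifier backbone without its final fully connected layer.

```python
import torch
import torch.nn as nn

def content_loss(feature_extractor: nn.Module, secret: torch.Tensor,
                 container: torch.Tensor) -> torch.Tensor:
    # theta_phi is fixed: freezing the parameters still lets gradients
    # flow back into the container image (and hence into the steg-generator).
    for p in feature_extractor.parameters():
        p.requires_grad_(False)
    f_s = feature_extractor(secret)
    f_c = feature_extractor(container)
    # Batch mean of || phi(s_i) - phi(c'_i) ||^2
    return (f_s - f_c).pow(2).flatten(1).sum(dim=1).mean()
```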

III-B2 Recognition

To measure the recognition performance, we denote the recognition loss as $\mathcal{L}_{rec}$. For the normal image classification task, we adopt the standard cross-entropy loss as the recognition loss, that is, $\mathcal{L}_{rec} = \mathcal{L}_{ce}$, with

$$\mathcal{L}_{ce} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} y_{i,k} \log p_{i,k}, \tag{6}$$

where $K$ is the number of classes of secret images, $p_{i,k}$ is the predicted probability of $c'_i$ for class $k$, and $y_i$ is the one-hot ground-truth vector of $s_i$.

For the face verification task, we add an extra triplet loss term to the recognition loss. That is, $\mathcal{L}_{rec} = \mathcal{L}_{ce} + \mathcal{L}_{tri}$, with

$$\mathcal{L}_{tri} = \frac{1}{n} \sum_{i=1}^{n} \max\big( \| f(a_i) - f(p_i) \|_2^2 - \| f(a_i) - f(n_i) \|_2^2 + \alpha,\ 0 \big), \tag{7}$$

where $a_i$, $p_i$, $n_i$ represent anchor, positive, and negative examples respectively, $f$ denotes the feature embedding, and $\alpha$ is a margin enforced between positive and negative pairs.
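A sketch of the recognition loss of Eqs. 6 and 7. The margin value and the equal weighting of the two terms are illustrative assumptions; the paper's actual coefficients are in its supplemental material.

```python
import torch
import torch.nn.functional as F

def recognition_loss(logits, labels, emb_a=None, emb_p=None, emb_n=None,
                     margin=0.2):
    loss = F.cross_entropy(logits, labels)  # L_ce (Eq. 6)
    if emb_a is not None:
        # L_tri (Eq. 7): hinge on squared distances of (anchor, positive, negative)
        d_pos = (emb_a - emb_p).pow(2).sum(dim=1)
        d_neg = (emb_a - emb_n).pow(2).sum(dim=1)
        loss = loss + F.relu(d_pos - d_neg + margin).mean()
    return loss
```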

III-B3 Restoration

An optional objective of MIAIS is to restore secret images. Akin to $\mathcal{L}_{vis}$, the restoration loss is

$$\mathcal{L}_{res} = \frac{1}{n} \sum_{i=1}^{n} \big(1 - \text{MS-SSIM}(\hat{s}_i, s_i)\big). \tag{8}$$

III-C Minimax Optimization

Fig. 4: Illustration of the minimax optimization. The optimization consists of two stages. At Stage A, with the steg-generator (container images) fixed, the two decision boundaries of $F_1$ and $F_2$ are pushed farther apart to maximize the discrepancy, resulting in a narrower intersection region. At Stage B, with the two steg-classifiers fixed, minimizing the discrepancy moves the features of the container images towards the intersection region of the two steg-classifier decision boundaries by training the steg-generator. After repeating the above two stages, the intra-class distance is reduced and the inter-class distance is increased, as shown in the final distribution. Best viewed in color. Note that in the single-task scenario, only $\theta_G$ is optimized in the minimize-discrepancy step; in the multitask scenario, $\theta_G$ and $\theta_R$ are jointly optimized.

III-C1 Objective

In the single-task scenario (recognition), the overall objective is to optimize the steg-generator and the steg-classifier by minimizing the sum of the visual similarity loss, the recognition loss, and the content loss, i.e.,

$$\min_{\theta_G,\, \theta_F}\ \mathcal{L}_{vis} + \mathcal{L}_{rec} + \mathcal{L}_{con}. \tag{9}$$

In the multitask scenario (joint recognition and restoration), the overall objective is to optimize the steg-generator, the steg-classifier, and the steg-restorer by minimizing the sum of the visual similarity loss, the recognition loss, the content loss, and the restoration loss. That is,

$$\min_{\theta_G,\, \theta_F,\, \theta_R}\ \mathcal{L}_{vis} + \mathcal{L}_{rec} + \mathcal{L}_{con} + \mathcal{L}_{res}. \tag{10}$$

During the optimization, the parameters $\theta_F$ of the steg-classifier are updated by the gradients of $\mathcal{L}_{rec}$. But the input of the steg-classifier, the container image $c'$, depends on $\theta_G$, so the calculation of $\mathcal{L}_{rec}$ depends not only on $\theta_F$ but also on $\theta_G$. The performance of concealment depends on $\theta_G$, while that of recognition depends on both $\theta_G$ and $\theta_F$. Therefore, it is hard to find a trade-off between concealment and recognition when optimizing the steg-generator and steg-classifier simultaneously. To address this issue, we adopt a minimax optimization that iteratively updates each of the two parts with the other one fixed, as shown in Fig. 4.

To design the minimax optimization, we introduce two steg-classifiers, denoted by $F_1$ and $F_2$. After initialization, we train our framework in two alternating stages. At Stage A, $F_1$ and $F_2$ are trained to maximize the discrepancy between them with $G$ fixed. At Stage B, the steg-generator is trained to minimize the discrepancy between the fixed $F_1$ and $F_2$. The process is shown in Algorithm 1. The details of the discrepancy loss and the two stages are described as follows.

Input: Secret image dataset $\mathcal{S}$;
           Cover image set $\mathcal{C}$;

1:  Initialize $G$, $F_1$, $F_2$, $R$ (parameters $\theta_G$, $\theta_{F_1}$, $\theta_{F_2}$, $\theta_R$);
2:  for epoch = 1, ..., #epochs do
3:     for every batch do
4:        Load and normalize $n$ (the batch size) secret images from $\mathcal{S}$;
5:        Randomly sample $n$ cover images from $\mathcal{C}$;
6:        Generate container images and restored images;
7:        if epoch % 2 == 1 then
8:           Stage A:
9:           Update $\theta_{F_1}$ and $\theta_{F_2}$ (Eq. 12);
10:        else
11:           Stage B:
12:           Update $\theta_G$ for recognition (and update $\theta_R$ in the multitask scenario) (Eq. 13 or Eq. 14);
13:        end if
14:     end for
15:  end for

Output: $\theta_G$, $\theta_{F_1}$, $\theta_{F_2}$, $\theta_R$

Algorithm 1: The minimax optimization

Fig. 5: T-SNE visualization of the feature distribution of four classes at different stages on the AFD dataset. Each class has 30 samples. We extract the initial container features before training, and then extract the container features after maximizing and minimizing the discrepancy in the 25th epoch and the 120th epoch, respectively. To clearly show the samples with large discrepancies, we highlight the samples with a discrepancy greater than 3e-4 with a special red marker (downward triangle).

III-C2 Discrepancy Loss

Given an arbitrary container image $c'$, let $F_1(c')$ and $F_2(c')$ denote the logit outputs of the two steg-classifiers. Similar to many minimax optimization methods [saito2017maximum, lee2019drop], to measure the difference between the predictions of the two steg-classifiers, we define the discrepancy loss based on the absolute values of the difference between the two logits:

$$\mathcal{L}_{dis}(c') = \frac{1}{K} \sum_{k=1}^{K} \big| F_1(c')_k - F_2(c')_k \big|. \tag{11}$$
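Eq. 11 reduces to a one-liner; averaging over classes and the batch at once is an implementation convenience:

```python
import torch

def discrepancy_loss(logits_f1: torch.Tensor, logits_f2: torch.Tensor) -> torch.Tensor:
    # Mean absolute difference between the logits of the two steg-classifiers.
    return (logits_f1 - logits_f2).abs().mean()
```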

III-C3 Training

We train our framework with $\mathcal{L}_{dis}$ in a two-stage manner.

Stage A (Maximize Discrepancy): At this stage, we aim to maximize the discrepancy between the two steg-classifiers. With the steg-generator fixed, we train the two steg-classifiers $F_1$ and $F_2$ to minimize the container classification loss and maximize the discrepancy loss $\mathcal{L}_{dis}$, i.e.,

$$\min_{\theta_{F_1},\, \theta_{F_2}}\ \mathcal{L}_{rec}(F_1) + \mathcal{L}_{rec}(F_2) - \mathcal{L}_{dis}. \tag{12}$$

Maximizing the discrepancy between the two steg-classifiers pushes the two class decision boundaries derived from them far apart. Features within the intersection region of these two boundaries are effective for classification, because they are discriminative to any classifier whose decision boundary lies between the two far-apart boundaries. Then, at the next Stage B, with the two steg-classifiers fixed, minimizing the discrepancy moves the features of the container images towards the intersection region of these two decision boundaries by training the steg-generator, as shown at the bottom of Fig. 4.

Stage B (Minimize Discrepancy): At this stage, we aim to minimize the discrepancy between the two steg-classifiers. We train the steg-generator to jointly minimize the discrepancy loss $\mathcal{L}_{dis}$ of the two fixed steg-classifiers $F_1$ and $F_2$, the visual similarity loss $\mathcal{L}_{vis}$, and the content loss $\mathcal{L}_{con}$. The optimization objective in Eq. 9 for the single-task scenario is then rewritten as

$$\min_{\theta_G}\ \mathcal{L}_{vis} + \mathcal{L}_{con} + \mathcal{L}_{dis}. \tag{13}$$

Note that since the two steg-classifiers are fixed, $\theta_{F_1}$ and $\theta_{F_2}$ do not need to be optimized. In the single-task scenario, only $\theta_G$ is optimized in the minimize-discrepancy stage.

For the multitask scenario, we simply add the restoration loss $\mathcal{L}_{res}$ to Eq. 13 and replace the optimization objective of Stage B as follows:

$$\min_{\theta_G,\, \theta_R}\ \mathcal{L}_{vis} + \mathcal{L}_{con} + \mathcal{L}_{dis} + \mathcal{L}_{res}. \tag{14}$$

In the multitask scenario, $\theta_G$ and $\theta_R$ are jointly optimized in the minimize-discrepancy stage.
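A condensed sketch of one minimax round (Algorithm 1, Eqs. 12-14). Optimizer setup and equal loss weighting are illustrative assumptions, and the loss callables (e.g., `vis_loss`, `con_loss`) are assumed to be already bound to their fixed networks, as in the sketches above.

```python
import torch

def stage_a(G, F1, F2, opt_F, secret, cover, labels, ce):
    # Stage A (Eq. 12): update both steg-classifiers with G fixed,
    # minimizing classification loss while maximizing the discrepancy.
    with torch.no_grad():
        container = G(secret, cover)
    l1, l2 = F1(container), F2(container)
    loss = ce(l1, labels) + ce(l2, labels) - (l1 - l2).abs().mean()
    opt_F.zero_grad()
    loss.backward()
    opt_F.step()

def stage_b(G, F1, F2, R, opt_G, secret, cover,
            vis_loss, con_loss, res_loss, multitask=False):
    # Stage B (Eq. 13 / Eq. 14): update the steg-generator (and steg-restorer)
    # with both steg-classifiers fixed; their optimizer is simply not stepped.
    container = G(secret, cover)
    l1, l2 = F1(container), F2(container)
    loss = vis_loss(container, cover) + con_loss(secret, container) \
           + (l1 - l2).abs().mean()
    if multitask:
        loss = loss + res_loss(R(container), secret)
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
```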

We further visualize intermediate snapshots of the container image features during the minimax process using T-SNE. As shown in Fig. 5, after maximizing the discrepancy, the number of samples with large discrepancy (red markers) increases, which means the discrepancy between the two steg-classifiers enlarges. After minimizing the discrepancy in Fig. 5, the red markers decrease, which indicates that the classification difference on these samples is significantly reduced. After this minimax process, the intra-class distance is reduced and the inter-class distance is increased, resulting in compact and discriminative container image features for recognition. The coefficients balancing the loss terms in Eqs. 12, 13, and 14 are clarified in Section I of the supplemental material for reproduction.
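The T-SNE visualization itself is standard; a sketch assuming scikit-learn and matplotlib, where `features` and `labels` are random stand-ins for the container features and class ids (not the paper's data):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.random.randn(150, 256)   # stand-in container features (N, D)
labels = np.repeat(np.arange(5), 30)   # five classes, 30 samples each

embedded = TSNE(n_components=2, perplexity=30).fit_transform(features)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab10", s=10)
plt.show()
```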

IV Experiments

In this section, we first describe the experimental setup. Then, we perform extensive experiments on the direct recognition task. Finally, we evaluate the multitask scenario (recognition and restoration) to compare our method with traditional image steganography methods.

IV-A Experimental Setup

IV-A1 Datasets

We conduct experiments on two recognition tasks: face recognition and image classification. For face recognition, we use the AFD (Asian Face Dataset) [xiong2018asian]: 310,969 images of 1,662 identities for training and 35,780 images of 356 identities for validation. For image classification, we use the Tiny-ImageNet dataset, consisting of 200 classes, each with 500 training, 50 validation, and 50 test images.

For the cover images in steganography, we use 100 images from style transfer and satellite datasets, which have rich texture for concealing the secret image information. Note that all cover images are unrelated to the training datasets (AFD and Tiny-ImageNet). To show the cover-carrier robustness of our proposed method, we use LFW, Pascal VOC 2012, and ImageNet as the cover image set respectively in the single-task scenario. Pascal VOC 2012 is a dataset designed for object detection and semantic segmentation and contains 33,260 images. The LFW (Labeled Faces in the Wild) dataset contains 13,233 images of 5,749 identities. ImageNet is a large visual database for object recognition.

IV-A2 Network Architecture

Similar to many image steganography methods [weng2019high, duan2019reversible], the steg-generator adopts a U-Net structure [ronneberger2015u_net]. Both steg-classifiers use the Inception network structure [szegedy2016rethinking]. We design a simple steg-restorer inspired by a similar architecture [weng2019high]. The channels of its inputs and outputs are shown in Table II. Each layer of the steg-restorer is a convolution with padding of one and a stride of 1. A batch normalization (BN) layer and a rectified linear unit (ReLU) follow each convolution layer except the last one. The output of the last convolution layer is activated by a Sigmoid function and produced as the restored secret image. We provide the network architecture details for reproduction; please see Section IX of the supplemental material.

Index    1    2    3    4    5    6
Input    3    64   128  256  128  64
Output   64   128  256  128  64   3

TABLE II: The channels of the steg-restorer.
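A sketch of the steg-restorer following Table II: six convolution layers with the listed channel counts, each with padding of one and stride 1, BN + ReLU after every convolution except the last, and a final Sigmoid. The 3x3 kernel size is an assumption, as the kernel size is not stated here.

```python
import torch.nn as nn

def make_steg_restorer() -> nn.Sequential:
    channels = [3, 64, 128, 256, 128, 64, 3]  # Table II: inputs, then final output
    layers = []
    for i in range(6):
        layers.append(nn.Conv2d(channels[i], channels[i + 1],
                                kernel_size=3, stride=1, padding=1))
        if i < 5:
            layers += [nn.BatchNorm2d(channels[i + 1]), nn.ReLU(inplace=True)]
        else:
            layers.append(nn.Sigmoid())  # restored secret in [0, 1]
    return nn.Sequential(*layers)
```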

IV-A3 Evaluation Metrics

From the perspective of privacy protection, we measure the image perturbation between the cover and container image using the Mean Square Error (MSE), Multi-Scale Structural Similarity (MS-SSIM) [wang2004image, ma2016group], and Peak Signal-to-Noise Ratio (PSNR). The Detection Rate (DR) is the output of StegExpose [boehm2014stegexpose], a popular statistical steganalysis toolkit. Our main focus, the recognition accuracy (Acc) on container images, denotes classification accuracy in the image classification task or identification accuracy in the face recognition task. To quantitatively evaluate the MIAIS framework embedded with the restoration module, we use MSE and MS-SSIM to measure the distortion between secret and restored secret images.
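The pixel-level metrics reduce to a few lines; a sketch assuming float tensors in [0, 1]:

```python
import torch

def mse(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return (a - b).pow(2).mean()

def psnr(a: torch.Tensor, b: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    # PSNR = 10 * log10(MAX^2 / MSE)
    return 10.0 * torch.log10(max_val ** 2 / mse(a, b))
```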

IV-A4 Implementation Details

All experiments are performed on a workstation with two Titan X GPU cards. The input images are rescaled to a fixed resolution. In all of our experiments, we use an SGD optimizer to update the parameters of our networks. The batch size is set to 16 for face recognition and 32 for image classification in the single-task scenario, and to 4 for face restoration in the multitask scenario. The learning rate is initialized to 0.001 in the single-task scenario and 0.0001 in the multitask scenario, and reduced by a factor of 0.2 when the overall loss stagnates. To evaluate the effectiveness of identity preservation at a given level of visual information hiding, we set a hiding threshold to control the visual quality of steganography. We dynamically adjust the weights of the losses in Eqs. 13 and 14 according to the magnitude relation between $\mathcal{L}_{vis}$ and the hiding threshold. To determine the hiding threshold, we train our baseline for a large number of iterations and observe that the visual loss (MS-SSIM) converges to about 0.02 on the test dataset (please see Section II of the supplemental material for more details). Therefore, we set the hiding threshold to 0.02 in our experiments. As described in Section III-B, we adopt the triplet loss [schroff2015facenet] for face recognition to obtain better performance.
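The dynamic weighting is described only qualitatively; the following is a hypothetical scheme consistent with that description, with invented scale factors: strengthen the visual term while $\mathcal{L}_{vis}$ exceeds the hiding threshold, and relax it otherwise.

```python
def adjust_visual_weight(weight: float, vis_loss_value: float,
                         threshold: float = 0.02,
                         up: float = 1.1, down: float = 0.9) -> float:
    # Hypothetical rule: emphasize concealment while its quality is
    # below the target, relax the constraint once the target is met.
    return weight * up if vis_loss_value > threshold else weight * down
```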

IV-B Experiments in Direct Recognition

Method                                    | AFD                                       | Tiny-ImageNet
                                          | MSE↓   PSNR↑  MS-SSIM↑  DR     Acc↑       | MSE↓   PSNR↑  MS-SSIM↑  DR     Acc↑
Baseline                                  | 0.0047 23.32  0.9808    0.521  0.8517     | 0.0032 24.94  0.9797    0.486  0.7335
Baseline + $\mathcal{L}_{con}$            | 0.0070 21.52  0.9805    0.538  0.8648     | 0.0034 24.68  0.9820    0.491  0.7353
Baseline + Minimax                        | 0.0058 22.37  0.9798    0.531  0.8732     | 0.0032 24.98  0.9818    0.487  0.7512
Baseline + $\mathcal{L}_{con}$ + Minimax  | 0.0052 22.85  0.9799    0.586  0.8752     | 0.0038 24.22  0.9819    0.497  0.7535

TABLE III: Ablation Study of Our Framework. ↑ means the higher the better; ↓ means the lower the better.
Fig. 6: Visual quality of ablative models on the AFD dataset. The first two columns show secret images and cover images. The remaining columns show the container images produced by the ablative models. The ablative method Baseline + $\mathcal{L}_{con}$ is more colorful and vivid compared with the Baseline method. The methods without the content loss introduce more mosaics and lattices. However, the content loss brings about some artifacts, which affects the visual quality. Best viewed in color.

IV-B1 Ablation Study

To perform a series of ablation experiments, we choose the images in AFD or Tiny-ImageNet as the secret images. We calculate the MSE, PSNR, and MS-SSIM between the cover and container images for quantitative evaluation. To measure the secrecy of the steganography, we report the detection rate (DR) of StegExpose on the container image domain. The compared ablative methods are as follows:

  • Baseline. We train one steg-generator paired with one steg-classifier using the sum of $\mathcal{L}_{vis}$ and $\mathcal{L}_{rec}$ as our baseline.

  • Baseline + $\mathcal{L}_{con}$. We add $\mathcal{L}_{con}$ to the baseline and train the steg-generator and the steg-classifier jointly (Eq. 9).

  • Baseline + Minimax. We train one steg-generator paired with two steg-classifiers via the minimax optimization.

  • Baseline + $\mathcal{L}_{con}$ + Minimax. We train one steg-generator paired with two steg-classifiers with $\mathcal{L}_{con}$ via the minimax optimization.

Quantitative Evaluation. As shown in Table III, in both the face recognition and image classification tasks, the methods with $\mathcal{L}_{con}$ achieve better Acc than the methods without it, which indicates that the similarity constraint between secret and container images preserves the identity (class) information. Most importantly, the method using the minimax optimization outperforms the baseline (joint training with a single steg-classifier) by over 2%, which shows that our minimax optimization obtains more discriminative container image features. As defined in Eq. 4, $\mathcal{L}_{vis}$ is based on MS-SSIM; therefore, in Table III, the MS-SSIM values are similar due to the hiding threshold control. Within a similar visual loss range, the detection rates of all methods are not far from random guessing. Under a similar visual quality of hiding, measured by MS-SSIM between the container image and the cover image, the methods with the minimax optimization are better than those without it. From Table III, the Baseline method has higher PSNR and MS-SSIM than the Baseline + $\mathcal{L}_{con}$ method, which means the method without the content loss preserves more appearance information and has better visual quality. In fact, the content loss constrains the global discriminative identity features, which is indeed detrimental to the visual quality of images.

Visual Quality. Through the control of the hiding threshold on the visual loss, all the generated images have imperceptible perturbations compared with the original cover images. Interestingly, although every method has an almost identical MS-SSIM, their generated images have different styles. As shown in Fig. 6, the ablative method Baseline + $\mathcal{L}_{con}$ is more colorful and vivid compared with the Baseline method. The methods without the content loss introduce more mosaics and lattices. However, the content loss brings about some artifacts, which affects the visual quality. For example, as highlighted in the red box of the first example (a), the methods without the content loss show unrealistic grids compared with the methods with it. In the second example (b), the color of the green leaves highlighted in the red boxes is preserved well by the ablative models with the content loss. Please see Section III of the supplemental material for the visual quality and color histograms of more examples.

Cover Set     | AFD                                       | Tiny-ImageNet
              | MSE↓   PSNR↑  MS-SSIM↑  DR     Acc↑       | MSE↓   PSNR↑  MS-SSIM↑  DR     Acc↑
LFW           | 0.0049 23.11  0.9801    0.557  0.8730     | 0.0037 24.32  0.9812    0.505  0.7545
PASCAL-VOC12  | 0.0057 22.45  0.9806    0.622  0.8724     | 0.0042 23.77  0.9800    0.501  0.7519
ImageNet      | 0.0052 22.83  0.9813    0.588  0.8729     | 0.0041 23.91  0.9806    0.502  0.7515

TABLE IV: Robustness on Different Cover Sets. ↑ means the higher the better; ↓ means the lower the better.

IV-B2 Robustness for the Cover Sets

In practical applications, cover sets vary across communication scenarios. Therefore, it is necessary to evaluate the robustness of the results across different datasets. We choose three datasets from different tasks (LFW, Pascal VOC 2012, and ImageNet) as the cover set separately, and use the vanilla transfer learning pipeline, fine-tuning, to show the carrier robustness of our MIAIS framework. The model is pre-trained with 100 cover images from the style transfer and satellite datasets. From Table IV, the recognition accuracy fluctuates within 0.05%-0.28% on the AFD dataset and 0.04%-0.30% on the Tiny-ImageNet dataset. The MSE, MS-SSIM, and PSNR fluctuate within 0.06%, 0.16%, and 0.60 respectively on average, which demonstrates that the results transfer across datasets.

The scope of our work is to achieve a trade-off between direct recognition, measured by accuracy, and visual concealment, measured by PSNR. In the experiments, recognition accuracy was our first priority, which possibly sacrificed PSNR to a certain extent. The visual quality is controlled by the hiding threshold (default 0.02). To raise the PSNR, we adjust the hiding threshold to 0.001 and report our results with high PSNR. Please see Section V of the supplemental material for more details.

IV-B3 Comparison with Others

To evaluate the effectiveness, we compare the proposed MIAIS framework with other visual information hiding methods on the AFD dataset. Taking the face recognition result on secret images as the upper bound, the competing methods include Combined Cat Map (CCM) [wang2015novel], Learnable Encryption (LE) [tanaka2018learnable], PBIE [sirichotedumrong2019pixel], Encryption-Then-Compression (EtC) [kawamura2020privacy], Extended Learnable Encryption (ELE) [madono2020block], EM-HOG [kitayama2019hog], Defeating Image Obfuscation (DIO) [mcpherson2016defeating], Privacy Preserved Face Recognition (PPFR) [ergun2014privacy], and a naive Block-Wise pixel Shuffle (BWS). BWS naively shuffles pixel blocks without consideration for privacy protection. In Table V, we show the MSE, MS-SSIM, and PSNR between the secret images and the container images, and the accuracy (Acc) of the same steg-classifier model learned on their hidden images. As shown in Table V, our proposed MIAIS framework gives the best face recognition performance, dropping only 0.28% from the upper bound. Please see Section IV of the supplemental material for the visual quality of these methods.

Method                                | MSE    | PSNR    | MS-SSIM | Acc↑
Secret Image                          | 0      | inf     | 1       | 0.8780
CCM [wang2015novel]                   | 0.1337 | 8.9840  | 0.4179  | 0.5575
LE [tanaka2018learnable]              | 0.1437 | 8.5156  | 0.4998  | 0.6154
PBIE [sirichotedumrong2019pixel]      | 0.1628 | 8.0266  | 0.4106  | 0.5737
EtC [kawamura2020privacy]             | 0.1597 | 8.1376  | 0.3811  | 0.5575
ELE [madono2020block]                 | 0.2951 | 5.5851  | 0.4132  | 0.5830
EM-HOG [kitayama2019hog]              | 0.1735 | 8.9596  | 0.2279  | 0.5728
DIO [mcpherson2016defeating]          | 0.0965 | 10.2998 | 0.4646  | 0.8120
PPFR [ergun2014privacy]               | 0.1260 | 8.9964  | 0.4718  | 0.8540
BWS                                   | 0.0223 | 17.2696 | 0.9120  | 0.8606
Ours                                  | 0.1521 | 8.1795  | 0.4191  | 0.8752

TABLE V: Comparison with Other Visual Information Hiding Methods on the AFD Dataset. ↑ means the higher the better; ↓ means the lower the better.
Fig. 7: The receiver operating characteristic (ROC) curves produced by StegExpose for a set of 1000 cover-container pairs.

IV-B4 Security Analysis

The security of steganography is crucial for privacy protection and determines its practicability. We use the popular statistical steganalysis toolkit StegExpose [boehm2014stegexpose] to analyze our MIAIS framework and the ablation method (Baseline + $\mathcal{L}_{con}$). StegExpose combines several popular steganalysis techniques, such as Sample Pair Analysis [dumitrescu2002detection], LSB Steganography Analysis [fridrich2001reliable, dumitrescu2002steganalysis], and Visual & Statistical Attacks [westfeld1999attacks]. The source code of StegExpose is available on GitHub (https://github.com/b3dk7/StegExpose). Digging into the code, StegExpose returns the length of the hidden message through the function "Fuse.se(stegExposeInput)"; this length divided by the total length gives a modification score ranging from 0 to 1. We follow the experimental setting of HIiPS [baluja2017hiding] to calculate the receiver operating characteristic (ROC) curve.

In the experiment, we randomly select 1000 images from the AFD dataset as secret images and produce the corresponding 1000 container images with each method. As shown in Fig. 7, we plot the ROC curve of the steganalysis and calculate the area under the curve (AUC). Compared with Baseline + $\mathcal{L}_{con}$, our MIAIS framework shows a greater ability to fool the steganalyser (StegExpose), which indicates that the container images cannot be distinguished from the cover images by steganalysers.
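The ROC computation itself can be sketched with scikit-learn; the scores below are random stand-ins for the StegExpose modification scores, not the paper's data.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
labels = np.concatenate([np.zeros(1000), np.ones(1000)])  # 0: cover, 1: container
scores = np.concatenate([rng.beta(2.0, 5.0, 1000),        # stand-in cover scores
                         rng.beta(2.2, 5.0, 1000)])       # stand-in container scores
fpr, tpr, _ = roc_curve(labels, scores)
print(f"AUC = {auc(fpr, tpr):.3f}")
```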

IV-C Experiments in a Multitask Scenario

Model        | Cover-Container             | Secret-Restored             | DR     | Acc↑
             | MSE↓   PSNR↑  MS-SSIM↑      | MSE↓   PSNR↑  MS-SSIM↑      |        |
HIiPS*       | 0.0162 17.88  0.9096        | 0.0019 27.08  0.9646        | 0.480  | 0.8177
SteganoGAN*  | 0.0206 16.84  0.8909        | 0.0089 20.46  0.9085        | 0.466  | 0.8473
HiDDeN*      | 0.0123 19.11  0.8957        | 0.0078 21.05  0.9727        | 0.490  | 0.8373
StegNet*     | 0.0085 20.70  0.9363        | 0.0056 22.47  0.9767        | 0.472  | 0.8493
Ours         | 0.0063 22.00  0.9864        | 0.0014 28.53  0.9962        | 0.479  | 0.8690

TABLE VI: Comparison with State-of-the-Art High-Capacity Image Steganography Methods on the AFD Dataset. ↑ means the higher the better; ↓ means the lower the better.

In this subsection, we evaluate our MIAIS framework, embedded with the restoration module, against other competitive high-capacity image steganography methods under the multitask scenario, jointly restoring the secret images and recognizing the container images. Similar to the recognition experiments in Section IV-B, we calculate MSE, PSNR, and MS-SSIM for cover-container and secret-restored pairs, and the DR and accuracy (Acc) for container images. Different from the recognition experiments, we restore the secret image from the container image and preserve the identity information in container images at the same time. We first introduce the modified steganography methods, then compare them to our method in quantitative and qualitative evaluations. Finally, we perform a security analysis with learning-based steganalysers and visualize the spatial and channel modifications within an image.

Fig. 8: Visual quality of different image steganography methods on the AFD dataset. To show the difference between the cover images and the container images, we enlarge the residual gray images by a factor of 5 or 10. The container images and the restored images produced by our MIAIS look most similar to the cover images and the secret images visually in terms of tones, colors, and sharpness, which can also be seen in the magnified residuals.

IV-C1 Competing Methods

We choose state-of-the-art image steganography methods for comparison, including HIiPS [baluja2017hiding], SteganoGAN [zhang2019steganogan], HiDDeN [zhu2018hidden], and StegNet [wu2018stegnet]. As HiDDeN [zhu2018hidden] is not specifically designed for high-capacity image steganography, we re-implement its input and output layers to ensure consistent experiment settings. The competing methods do not perform direct recognition on container images. For a fair comparison, we re-implement the aforementioned image steganography methods under the multitask scenario. Specifically, we add the cross-entropy and triplet losses to the other methods and adopt joint training. The modified methods are marked with an asterisk (*) in Table VI.

IV-C2 Quantitative Evaluation

Regarding the performance of concealment and restoration, we focus on the metrics for the cover-container and secret-restored pairs. From Table VI, our proposed MIAIS framework performs better than the other image steganography methods. In particular, our framework outperforms the other image steganography methods by about 7% MS-SSIM on average in concealment. Regarding recognition, our framework achieves the best Acc, which shows that our method preserves the identity information more effectively and obtains more discriminative container images for recognition. To explore the real-world biometric recognition scenario, we reduce the number of training images by sampling 10 faces per identity; we evaluate the performance of our method and report the results in Section VI of the supplemental material.

As discussed in Section IV-B2, the scope of our work is to achieve a trade-off between direct recognition, measured by accuracy, and visual concealment, measured by PSNR. Accordingly, we also report experimental results with high PSNR under the multitask scenario; please see Section V of the supplemental material for more details.

IV-C3 Visual Quality

Fig. 8 shows the visual quality of different image steganography methods and their corresponding enlarged residual images. The restored secret images produced by our MIAIS are the most similar to the original secret images, while those of the other methods show discordant color areas and blurred regions (faces). As shown by the residuals in the even rows, our MIAIS hides the profile and details of the face very well. These experiments show that our MIAIS framework performs well in the multitask scenario in both quantitative and qualitative evaluations. Please see Section VIII of the supplemental material for more examples.

IV-C4 Deep Steganalysis

Beyond the statistical steganalysis of StegExpose, we further examine our method with learning-based steganalysers [ye2017deep, yedroudj2018yedroudj]. In the experiment, we consider two cases, depending on whether the steganography model is available to the steganalyser. As shown in Table VII, in the first case, the learning-based steganalysers achieve a 100% detection rate on all the steganography methods, as other related work has shown [baluja2019hiding, zhang2020udh]. In the second case, our method achieves a detection rate comparable to the most secure method.

Steganalyser                         | Publication  | Case         | HiDDeN | HIiPS | SteganoGAN | StegNet | Ours
DLHR [ye2017deep]                    | TIFS 2017    | First case   | 1.000  | 1.000 | 1.000      | 1.000   | 1.000
                                     |              | Second case  | 0.889  | 0.782 | 0.949      | 0.929   | 0.891
Yedroudj-net [yedroudj2018yedroudj]  | ICASSP 2018  | First case   | 1.000  | 1.000 | 1.000      | 1.000   | 1.000
                                     |              | Second case  | 0.538  | 0.613 | 0.955      | 0.690   | 0.533

TABLE VII: The neural steganalysis results. Both cases are trained with 4000 image pairs and tested on 1000 image pairs. In the first case, training and testing data come from the same steganography method. In the second case, the testing data come from the specific method, while the training data consist of the other four methods evenly.

Similar to other steganography methods, our primary objective is secret communication without revealing the transported information to a third party. In our scenario, the transported information is the content/identity of the secret image. Even if a third party could detect the container image, it cannot recover the content/identity of the secret image. In fact, in real-world applications, the steganography model of our method is confidential and unavailable, which ensures security against detection by learning-based steganalysis.

IV-C5 Modification in an Image

Our framework directly generates the whole container image with the deep learning-based steg-generator instead of modifying the least significant bit (LSB). To further show the modifications made by our steganography network, we compute statistics on the distribution of perturbations for each bit and the distribution of pixel modification values. Please see Section VII of the supplemental material for more examples.

V Conclusion

In this paper, we point out the threat in conventional high-capacity image steganography and propose a novel Multitask Identity-Aware Image Steganography (MIAIS) framework. We introduce a content loss to preserve the identity information of secret images and design a minimax optimization that focuses on the discriminability of container images by decreasing intra-class distances and increasing inter-class distances. Furthermore, we incorporate a restoration module into our MIAIS framework and derive a flexible multitask version adapted to both restoration and recognition. The experiments show the effectiveness of MIAIS compared with other visual information hiding methods, the robustness of the recognition results across different cover datasets, and the better performance compared to state-of-the-art high-capacity image steganography methods on face recognition. We believe this work opens up new avenues for privacy preservation based on visual information hiding by achieving direct recognition on container images.

Acknowledgment

This work is supported in part by the National Key Research and Development Program of China under Grant 2020AAA0107400, the Zhejiang Provincial Natural Science Foundation of China under Grant LR19F020004, a key scientific technological innovation research project of the Ministry of Education, and the National Natural Science Foundation of China under Grant U20A20222. This work was also supported in part by the Ant-Zhejiang University Research Institute of FinTech. The authors would like to thank Guijie Zhao, Jiaming Ji, Boyu Dong, and Xiaoyang Wang for their valuable suggestions. The authors are deeply indebted to the anonymous reviewers for their constructive suggestions and helpful comments.

References