Determining the authenticity of an image is becoming increasingly important for legal proceedings, criminal investigations, and the verification of identity-supporting documents. In recent years, convolutional neural networks (CNNs) have been employed to detect image manipulations. These range from splicing and copy-move forgeries to global operations such as contrast enhancement [32, 38], resampling, JPEG compression, Gaussian blurring [19, 6], median filtering, and additive white Gaussian noise. Some of these global operations may be applied innocuously, but they are often applied maliciously as post-processing to disguise other manipulations.
More recently, research in image forensics has begun to consider the presence of an adversary, a setting in which the vulnerability of CNNs has been well studied. In image forensics, the study of adversarial attacks has been largely limited to pixel-based adversarial examples and JPEG double compression [1, 31]. In pixel-based attacks, an adversary with knowledge of the deployed CNN model can craft an "attacked" image that appears visually identical to the original image but is mislabeled by the CNN. This problem is well known in computer vision and has been at the forefront of recent work in the field. However, this type of attack demands a certain level of expertise from the adversary and is unlikely to be employed in most image forensics cases. Even for skilled adversaries, constructing pixel-based adversarial attacks is often labor-intensive, and recent work has cast doubt on the transferability of such attacks in image forensics applications. While pixel-based adversarial attacks require at least some knowledge of the model, a low-level adversarial manipulation such as double JPEG compression requires none. In this type of attack, the images are simply JPEG compressed after the manipulation has been applied, which hampers the model's ability to correctly identify post-processing operations such as additive white Gaussian noise or median filtering. Several manipulation detection models have been found to be vulnerable to such simple, low-level manipulations [1, 31], so building models that are robust to them is particularly important. The goal of this paper is to investigate the vulnerability of state-of-the-art models to another kind of low-level adversarial manipulation: printing and scanning. To our knowledge, this is the first investigation of printing and scanning as an adversarial attack on digital image manipulation detectors.
In physical forgery, repeated printing and scanning can be used to obscure manipulations or watermarks. A document may be modified, usually non-digitally, and then repeatedly printed and scanned to disguise the manipulation artifacts. While scanning a printed document is not always related to forgery, it is reasonable to expect state-of-the-art models to be robust to this type of post-processing, as has been argued in related work on double JPEG compression [1, 14]. In addition, unlike complex pixel-based adversarial attacks, simply printing and scanning an image is low-cost and requires little expertise, much like JPEG compression.
In this paper, we limit our investigation to globally applied manipulations, such as Gaussian blurring, additive white Gaussian noise, and median filtering, rather than local manipulations such as copy-move or splicing, as in related work. We construct printed and scanned datasets from three different printers and experiment with two state-of-the-art models, as well as our own novel model. Related to our work is research on camera model identification; we additionally report results for printer identification. Our main contributions include the following:
We conduct the first analysis into the vulnerability of image manipulation detectors to printing and scanning, demonstrating that at least two state-of-the-art models are vulnerable to this type of highly plausible and inexpensive attack
We propose a novel model architecture that performs comparably to or better than the state-of-the-art models when trained and evaluated on printed and scanned images, including performing roughly 5 percentage points better when trained on images from a single printer
We conduct an in-depth analysis of CNN-based image manipulation detectors on printed and scanned images, including training on composite datasets, and plan to share our dataset of over 6,000 printed and scanned image blocks with the community to facilitate further investigation
The rest of the paper is organized as follows. In Section 2, we give context and background through related work. In Section 3, we describe our novel model architecture, as well as those of the models we used for comparison. In Section 4, we describe the datasets used for training and validation. In Section 5, we explain the experiments conducted, and in Section 6, we discuss the results of these experiments. The paper ends in Section 7, where we summarize our conclusions and suggest areas of future research.
2 Related Work
As the scope of this paper is limited to CNN-based global image manipulation detectors, we provide background on convolutional neural networks and CNN-based detectors. We also provide context on adversarial attacks on CNNs in general and on CNN-based image manipulation detectors in particular. Related to this work is work on detecting manipulations through inconsistencies in lighting and under varying compression qualities.
contributes significantly to this problem area, though without examining models that leverage deep learning. similarly advances the area, but without addressing specifically the problem of printing and scanning in relation to CNN-based detectors, and is thus complementary to this work.
2.1 Convolutional Neural Networks
Convolutional neural networks, and deep learning approaches more generally, are an extension of neural networks. The architecture is defined by a set of hyperparameters used to design the network and generally consists of stacked layers. This approach has proven effective in extracting higher-order features, particularly for object identification, but recent work has also demonstrated that such networks, with some modifications, are useful for extracting the low-level features needed to detect global image manipulations. In a convolutional neural network (CNN), the first layer is a set of convolutional feature extractors that convolve across the image in parallel using a set of learnable filters. The filters are initialized randomly, and their values are updated iteratively through back-propagation [23, 22]. The filters use a sliding-window approach to reach all regions of the input image, producing feature maps. The subsequent hidden layers function similarly, convolving over the resulting feature maps to produce higher-level feature maps. Each convolutional layer is followed by an activation function to provide non-linearity. Convolutional layers are also often followed by pooling layers (max, average, stochastic, etc.) that reduce the dimensionality of the resulting features and help reduce over-fitting. The output of the final convolutional layer is fed into a fully connected neural network for classification.
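To make the sliding-window description concrete, the following is a minimal, framework-free sketch of a single CNN stage: a stride-1 "valid" convolution, a ReLU activation, and 2x2 max pooling. The 2x2 edge filter is hand-set purely for illustration (a CNN would learn its filter values through back-propagation), and the function names are our own.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with stride 1 and no padding.

    Like most deep-learning libraries, this computes cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Element-wise max(0, x), providing the non-linearity."""
    return [[max(0.0, x) for x in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Non-overlapping 2x2 max pooling; a trailing odd row/column is dropped."""
    h = len(feature_map) // 2 * 2
    w = len(feature_map[0]) // 2 * 2
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


# Toy 6x6 image with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 9, 9, 9] for _ in range(6)]
edge_kernel = [[-1, 1], [-1, 1]]   # hand-set vertical-edge detector
pooled = max_pool2x2(relu(conv2d_valid(image, edge_kernel)))
print(pooled)  # [[0.0, 18.0], [0.0, 18.0]]
```

Each stage shrinks the spatial size (6x6 to 5x5 after the convolution, then 2x2 after pooling) while keeping the strong response at the edge location, which is the dimensionality reduction described above.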
2.2 Deep Learning for Image Forensics
Image forensics methods leveraging deep learning have recently reached impressive performance. In 2015, a CNN-based classifier was proposed for detecting median filtering in images. Building on this work, a CNN-based model was proposed with the addition of a "constrained convolutional layer", a layer constrained to learn the high-pass features of an image by predicting each central pixel from its neighbors. This suppresses the image content while learning the manipulation fingerprint, drawing inspiration from the spatial rich model (SRM) filters used in steganalysis.
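The content-suppression idea can be illustrated with a fixed predictor. The sketch below predicts each interior pixel as the mean of its eight neighbors and outputs the prediction error. The fixed 3x3 averaging predictor is our illustrative assumption, not the learned filter of the cited work: a smooth region (here a linear ramp) produces zero residual, while a local disturbance stands out.

```python
def prediction_residual(image):
    """Residual = (mean of the 8 neighbors) - center pixel, for interior pixels.

    Equivalent to a fixed 3x3 high-pass filter whose center weight is -1 and
    whose eight off-center weights (1/8 each) sum to 1.
    """
    h, w = len(image), len(image[0])
    out = []
    for i in range(1, h - 1):
        row = []
        for j in range(1, w - 1):
            neighbors = [image[i + di][j + dj]
                         for di in (-1, 0, 1) for dj in (-1, 0, 1)
                         if (di, dj) != (0, 0)]
            row.append(sum(neighbors) / 8 - image[i][j])
        out.append(row)
    return out


# Smooth "content": a linear ramp is perfectly predicted, so the residual is zero.
ramp = [[2 * i + 3 * j for j in range(6)] for i in range(6)]
print(prediction_residual(ramp)[0])   # [0.0, 0.0, 0.0, 0.0]

# A local, manipulation-like disturbance survives in the residual.
spiked = [row[:] for row in ramp]
spiked[2][2] += 8
res = prediction_residual(spiked)
print(res[1][1])   # -8.0 (the center deviates from its neighbors)
print(res[0][0])   # 1.0  (a neighboring position sees the spike as 8/8)
```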
More recently, it has been shown that third-order subtractive pixel adjacency matrix (S3SPAM) features can be learned by a simple, shallow CNN, and that such a network can use transfer learning to achieve good performance with little training data. In addition to directly detecting manipulations, a deep learning method has been proposed for analyzing image processing history as an important component of image forensics, since the processing pipeline can affect the accuracy of other forensic tools.
| Dataset | Description | Size | |
|---|---|---|---|
| Original | IFSTC dataset after six manipulations | 198,624 | |
| Composite Printers | Combined set of images from each printer (balanced) | 6,426 | |
| Printer Identification | Pristine images after being printed and scanned by all three printers | 3,213 | Dell, X1, X2 |
| JPEG Compression | IFSTC dataset with JPEG compression (QF=80) on all images | 198,624 | |
2.3 Adversarial Attacks on CNNs
The vulnerability of CNNs to adversarial attacks has been well documented [16, 26]. Adversarial noise can be designed so that, when added to an image, it preserves visual quality while misleading the classifier. For example, the Fast Gradient Sign Method (FGSM) leverages the differentiability of the loss function, which is assumed to be known to the adversary. It alters each pixel according to the sign of the gradient of the loss with respect to that pixel of the input image. These changes are small enough that the resulting image is visually nearly identical to the original, but cumulatively large enough to increase the loss and impair classification. Similarly, projected gradient descent (PGD) seeks a perturbation that maximizes the loss on a specific input while keeping the perturbation size below a given epsilon. DeepFool uses a local linearization of the classifier to approximate the decision boundary and alters the images accordingly. The Jacobian-based Saliency Map Attack (JSMA) uses a greedy iterative procedure, altering only the pixels that contribute most to the correct classification as identified by a saliency map. Each of these pixel-based adversarial attacks, while effective, requires at least partial knowledge of the network used for image manipulation detection. In contrast, low-level adversarial attacks such as JPEG compression or printing and scanning, the subject of this paper, require no such knowledge.
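The difference between FGSM and PGD can be seen on a toy model. The sketch below attacks a logistic-regression classifier (our stand-in for a CNN, chosen because its input gradient has the closed form (p - y) * w). FGSM takes one signed-gradient step of size eps. PGD takes several smaller steps, each followed by projection back onto the L-infinity ball of radius eps. Clipping to a valid pixel range, which a real attack on images would also apply, is omitted for brevity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_input_grad(x, y, w, b):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]   # d loss / d x_i = (p - y) * w_i
    return loss, grad_x


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, eps):
    """Single step: x_adv = x + eps * sign(grad_x loss)."""
    _, g = loss_and_input_grad(x, y, w, b)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def pgd(x, y, w, b, eps, alpha, steps):
    """Repeated signed-gradient steps of size alpha, each projected back onto
    the L-infinity ball of radius eps around the original input x."""
    x_adv = list(x)
    for _ in range(steps):
        _, g = loss_and_input_grad(x_adv, y, w, b)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


# Hypothetical toy model and input, correctly classified as class 1.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.3, 0.1, 0.4], 1
eps = 0.05

clean_loss, _ = loss_and_input_grad(x, y, w, b)
x_fgsm = fgsm(x, y, w, b, eps)
x_pgd = pgd(x, y, w, b, eps, alpha=0.02, steps=5)
print(loss_and_input_grad(x_fgsm, y, w, b)[0] > clean_loss)  # True
print(loss_and_input_grad(x_pgd, y, w, b)[0] > clean_loss)   # True
```

Both attacks need `w` and `b`, i.e. knowledge of the model, which is exactly the requirement that printing and scanning avoids.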
2.4 Adversarial Attacks in Image Forensics
While CNN-based classifiers have achieved high performance on benchmark image forensic tasks, recent research has demonstrated that CNN-based detectors, like CNNs more broadly, are highly vulnerable to adversarial attacks. For example, it has been shown that a GAN-based architecture can conceal 3x3 median filtering, causing detectors, including two previously proposed CNN-based detectors, to label the image as non-manipulated. Additionally, an adversarial attack based on small pixel-based distortions has been proposed for fooling global image manipulation detectors. However, recent work notes that, unlike in most pattern recognition tasks, pixel-based adversarial attacks such as the Fast Gradient Sign Method (FGSM) and the Jacobian-based Saliency Map Attack (JSMA) are, for the most part, not transferable between models.
Here we describe our proposed model architecture for improved performance on printed and scanned images. We compare our model to the model that originally proposed the constrained convolutional layer, as well as to XceptionNet (Xception). Xception and our proposed model have a nearly identical number of parameters and similar architectures, so differences in performance between them cannot be attributed to increased network capacity.
3.1 Proposed Model
Our proposed architecture consists of one constrained convolutional layer, one convolutional layer, 34 separable convolutional layers, five pooling layers (four max pooling and one global average pooling), and a final fully connected layer (see Figure 3).
In the constrained convolutional layer, 5x5 filters are employed in which the sum of all the weights is constrained to be zero. The center pixel is predicted from the remaining pixels in the field, so the output of the filter can be interpreted as the prediction error, as suggested by research in steganalysis. Specifically, the weights of each filter are constrained such that:

$$ w(0,0) = -1, \qquad \sum_{(l,m) \neq (0,0)} w(l,m) = 1, $$

where $w$ refers to the filter weights, $l$ and $m$ refer to the coordinates in the filter, and $w(0,0)$ is the central weight.
The purpose of the constrained convolutional layer is to force the model to learn image manipulation fingerprints rather than image content and higher-order features, such as those useful for object recognition and classification tasks. The prediction-error fields are then used by the rest of the network as low-level forensic trace features for classifying global image manipulations.
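One common way to enforce the constraint during training is to project the filter back onto the constraint set after every weight update. The sketch below follows the projection as we understand it from the original constrained-convolution work: zero the center, divide the remaining weights by their sum, then set the center to -1. The function name and the random initialization are illustrative assumptions.

```python
import random


def project_constrained_filter(kernel):
    """Project a square, odd-sized filter (lists of lists) onto the constraint
    w(0,0) = -1 and sum of off-center weights = 1 (so all weights sum to 0)."""
    k = len(kernel)
    c = k // 2
    w = [row[:] for row in kernel]
    w[c][c] = 0.0                                  # exclude the center from the sum
    total = sum(sum(row) for row in w)
    if total == 0:
        raise ValueError("off-center weights sum to zero; cannot normalize")
    w = [[v / total for v in row] for row in w]    # off-center weights now sum to 1
    w[c][c] = -1.0                                 # center weight fixed to -1
    return w


# Usage: project a randomly initialized 5x5 filter, as would be done after
# each gradient step in training.
rng = random.Random(0)
kernel = [[rng.uniform(0.0, 1.0) for _ in range(5)] for _ in range(5)]
w = project_constrained_filter(kernel)
print(w[2][2])                                    # -1.0
print(round(sum(sum(row) for row in w), 9))       # approximately 0 (all weights)
```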
For the separable convolutional layers, a spatial convolution is performed independently for each channel and is followed by a point-wise (1x1) convolution, as proposed for Xception. These components decrease the number of free parameters, allowing the deep network to learn effectively even with a small training set, which makes them particularly appropriate for our investigation.
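The parameter savings can be checked with simple arithmetic. The sketch below counts weights (biases ignored) for a standard k x k convolution versus a depthwise separable one; the 3x3, 256-to-256-channel layer is an illustrative choice, not a layer taken from our architecture.

```python
def standard_conv_params(k, c_in, c_out):
    # One k x k x c_in kernel per output channel.
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k spatial filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise


print(standard_conv_params(3, 256, 256))    # 589824
print(separable_conv_params(3, 256, 256))   # 67840
```

In this example the separable layer uses about 8.7 times fewer weights, which is the reduction in free parameters referred to above.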
With this approach, we aim to leverage both the SRM-like features produced by the constrained convolutional layer and the improved generalization ability provided by the added depth and separable layers.
Proposed in 2016, the constrained convolution method of image manipulation detection, hereafter referred to as Bayar2016, uses a CNN with three convolutional layers (including the initial constrained convolutional layer), two max-pooling layers, and three fully connected layers. This model demonstrated impressive results in discerning between the six manipulations investigated in this paper using the dataset described in the next section, achieving 99.9% validation accuracy.
In addition to the shallow Bayar2016 network, we consider a deeper model, as recent work has demonstrated that increasing network depth can dramatically improve model generalization. To compare with a model of similar depth that also uses separable convolutional layers, we experiment with XceptionNet, a deep network comprising 42 layers, including separable convolutional layers. The network design builds on the Inception architecture, with the innovation of depthwise separable convolutions.
| Manipulation | Parameter |
|---|---|
| Gaussian Blurring (GB) | |
| JPEG compression (JPEG) | quality = 70 |
| Median Filtering (MF) | kernel size = (5,5) |
| Bilinear Resampling (RS) | ratio = 1.5 |
Following the procedure of Bayar2016, we use images from the first IEEE IFSTC Image Forensics Challenge, obtaining 3,334 images of size 1024x768, which were further split into training, validation, and testing data. The dataset comprises images of both indoor and outdoor scenes captured with several different digital cameras.
4.1 Printing and Scanning
We used three different printers and one scanner to create a dataset of printed and scanned images: a Dell S3845CDN Laser Multifunction Printer, a Xerox Altalink C8070 Multifunction Printer, and a Xerox WorkCentre 7970 Multifunction Printer, which we hereafter refer to as Dell, Xerox1, and Xerox2, respectively. We printed 50 images of each manipulation type on each printer and used the Dell scanner to scan each image (see Figure 1). After scanning and extracting the images from the resulting PDFs, the images were 1700 x 2200 pixels; they were then center-cropped to 1536 x 1792 to remove the white border added by the scanning process. Each image was then split into 42 256x256 blocks (or 299x299 blocks for our proposed model and for Xception), resulting in 2,142 image blocks of each class from each printer (see Figure 2). We limited our data creation to 900 full-page color images for both budget and environmental reasons, leaving synthetic simulation of printing and scanning as an avenue for future work.
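The crop-and-tile arithmetic for the 256x256 case can be sketched as follows. We assume the scans are 1700 pixels wide and 2200 pixels tall (a portrait page) and that the 1536 x 1792 crop is width x height; the boxes use a (left, top, right, bottom) convention, and the helper names are hypothetical.

```python
def center_crop_box(width, height, crop_w, crop_h):
    """Return the (left, top, right, bottom) box of a centered crop."""
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


def tile_boxes(left, top, right, bottom, block):
    """Non-overlapping block x block tiles that fit entirely inside the box."""
    boxes = []
    for y in range(top, bottom - block + 1, block):
        for x in range(left, right - block + 1, block):
            boxes.append((x, y, x + block, y + block))
    return boxes


crop = center_crop_box(1700, 2200, 1536, 1792)
print(crop)                 # (82, 204, 1618, 1996)
blocks = tile_boxes(*crop, block=256)
print(len(blocks))          # 42 (6 columns x 7 rows)
```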
We post-processed each image with additive white Gaussian noise (AWGN), Gaussian blurring (GB), JPEG compression (JPEG), median filtering (MF), and resampling (RS), and also retained the pristine image (PR), yielding six classes, following the procedure of Bayar2016.
Additive white Gaussian noise constructs a noise matrix of the same shape as the image, drawn from a normal distribution with a given sigma value, and adds this matrix to the original image. The result is then normalized to values between 0 and 255. Gaussian blurring blurs the image by convolving it with a Gaussian kernel of a given size. JPEG compression is a lossy compression method that converts the color space, down-samples the chroma channels, applies the Discrete Cosine Transform (DCT), and quantizes the resulting coefficients. Median filtering replaces each pixel with the median value of the neighboring pixels within a given kernel area. Bilinear resampling likewise operates on neighborhoods, resizing the image by estimating each new pixel value as the distance-weighted average of the neighboring pixels. See Table 2 for parameter details.
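Two of these manipulations are simple enough to sketch directly. In the following example, AWGN keeps values in range by clipping to [0, 255] (our assumption for how "normalized" is realized), and the median filter uses the 5x5 kernel from Table 2 while shrinking the window at image borders (another assumption; many implementations pad by reflection instead). The sigma value used here is hypothetical, since the AWGN parameter is not listed above.

```python
import random
import statistics


def add_awgn(image, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `sigma`, then
    clip every value to [0, 255]."""
    rng = random.Random(seed)
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]


def median_filter(image, k=5):
    """k x k median filter; at borders only the in-bounds part of the window is used."""
    h, w, r = len(image), len(image[0]), k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [image[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(statistics.median(window))
        out.append(row)
    return out


# A flat image with one impulse: the 5x5 median filter removes the impulse.
flat = [[100] * 7 for _ in range(7)]
flat[3][3] = 255
filtered = median_filter(flat, k=5)
print(filtered[3][3])   # 100

noisy = add_awgn([[128] * 8 for _ in range(8)], sigma=10.0, seed=1)
```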
We trained each model (our proposed model, Bayar2016, and Xception) on a variety of training sets and evaluated each trained model on a variety of validation datasets (See Table 1).
We first evaluated the extent to which our selected models can correctly classify the validation images after printing and scanning. We trained each model on the original dataset (before printing and scanning) with all six classes: additive white Gaussian noise (AWGN), Gaussian blurring (GB), JPEG compression (JPEG), median filtering (MF), bilinear resampling (RS), and pristine or no manipulation (PR). For a more complete analysis, we removed the bilinear resampling (RS) and JPEG compression (JPEG) classes from the training and validation sets and retrained the models, as these two classes could intuitively be considered similar to the changes introduced during the printing and scanning process (see Table 3).
Second, we explored countering this vulnerability by training on the printed and scanned image blocks from a single printer. The dataset (see Table 1, Xerox1) consists of 50 full images (1700 x 2200 pixels), which were then divided into 256x256 blocks for the Bayar2016 model or 299x299 blocks for Xception and our proposed model. This resulted in 2,142 image blocks for each dataset, which were divided into training and validation sets of 1,722 and 420 blocks, respectively, using only the central regions of the images to avoid including border artifacts from the scanning process.
Third, we created two composite datasets. The first consists of all printed and scanned image blocks from all three printers. The second consists of all printed and scanned image blocks plus a number of image blocks from the original dataset (before printing and scanning) equal to the number contributed by one of the three printers. The first composite dataset, which we refer to as Composite Printers, consists of 6,426 image blocks (printed and scanned only), while the second, Composite Full, consists of 8,568 image blocks (75% printed and scanned, 25% original). The goal of this experiment was to evaluate whether the poor accuracy on the printed and scanned data could be mitigated by substantially increasing the size of the training set.
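The composite sizes follow directly from the per-printer dataset size. The following quick consistency check assumes 2,142 blocks per printer, as in the single-printer experiment.

```python
blocks_per_printer = 2142                 # per-printer dataset size (Xerox1 experiment)
printers = ["Dell", "Xerox1", "Xerox2"]

composite_printers = blocks_per_printer * len(printers)
# Composite Full adds one printer's worth of original (not printed) blocks.
composite_full = composite_printers + blocks_per_printer

print(composite_printers)                    # 6426
print(composite_full)                        # 8568
print(composite_printers / composite_full)   # 0.75 (printed and scanned share)
```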
Finally, we evaluated the performance of each of the models on identifying the printer of printed and scanned images (see Table 1, Printer Identification).
For Bayar2016, we used a batch size of 64, an initial learning rate of 0.01, Stochastic Gradient Descent (SGD) with momentum 0.95, weight decay 0.0005, gamma 0.7, and step size 6.
We used similar hyperparameters for Xception and our proposed model. Specifically, for both models, we used weights pre-trained on ImageNet. We again used SGD, and set the batch size and learning rate at training time based on the number of GPUs, using  for the batch size and 0.01 for the initial learning rate. We used momentum 0.9 and weight decay 0.0005. For learning rate decay, we used polynomial decay as described in . For each model, we trained until the validation accuracy plateaued or began to fall.
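The two learning-rate schedules can be written as short functions. Two points are assumptions: the text does not say whether the Bayar2016 step size of 6 is counted in epochs or iterations, and the polynomial exponent of 0.9 is the value commonly used with "poly" decay, not one stated above.

```python
def step_decay_lr(base_lr, step, gamma=0.7, step_size=6):
    """Step schedule (Bayar2016 settings): multiply by gamma every step_size steps."""
    return base_lr * gamma ** (step // step_size)


def poly_decay_lr(base_lr, it, max_iter, power=0.9):
    """Polynomial ("poly") decay: base_lr * (1 - it / max_iter) ** power.
    power=0.9 is an assumed, commonly used value."""
    return base_lr * (1 - it / max_iter) ** power


print(step_decay_lr(0.01, 5))         # 0.01 (no decay yet)
print(step_decay_lr(0.01, 12))        # about 0.0049 (0.01 * 0.7 ** 2)
print(poly_decay_lr(0.01, 0, 100))    # 0.01 at the start of training
print(poly_decay_lr(0.01, 100, 100))  # 0.0 at the final iteration
```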
Following the original methodology for Bayar2016, we retain only the green color channel of each image, divide it into 256x256 non-overlapping blocks, and retain the nine central blocks. For our proposed model and for Xception, we retain all three color channels and split the images into 299x299 non-overlapping blocks, matching the input size of the original architecture.
6 Results and Discussion
6.1 Print-Scan Manipulation
To evaluate the general vulnerability of each of the models to printed and scanned images, we trained on the original IFSTC dataset (before printing and scanning) and evaluated each model on validation sets from each of the three printers. When we evaluated the models on the printed and scanned validation sets, we found that each model performed only slightly better than random.
We also removed the bilinear resampling and JPEG compression classes, and found that the resulting models are similarly unable to correctly classify the remaining four manipulations, still performing at or below chance. We further note that the models perform worse on the printed and scanned validation images than on the validation images after JPEG compression, a known vulnerability of these types of models, suggesting that printing and scanning may be more effective at masking the manipulations (see Table 3).
We trained each model only on images printed and scanned on the first Xerox printer (see Table 1, Xerox1). We find that while no model is able to perfectly fit the printed and scanned dataset, our proposed model outperforms the current state-of-the-art models. We also note that transferability to other printers remains weak, indicating significant variance between the printers. Here, 4c indicates that we used the restricted set of manipulations (AWGN, GB, MF, and PR) (see Section 5).
6.2 Cross-Training on Printed and Scanned Examples
We additionally trained each model on printed and scanned images from an individual printer (Xerox1) (See Section 4.1).
We note that Bayar2016 and Xception achieve accuracies of 66.6% and 70.4%, respectively, while our proposed model achieves an accuracy of 75.3%. Training on one printer also does not appear to yield similar validation accuracy on examples from another printer, even one of the same make (see Table 4).
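The size of the improvement follows from these single-printer accuracies. The gains below are differences in percentage points; the dictionary keys are just labels for this sketch.

```python
accuracy = {"Bayar2016": 66.6, "Xception": 70.4, "Proposed": 75.3}   # Xerox1 results

gains = {b: accuracy["Proposed"] - accuracy[b] for b in ("Bayar2016", "Xception")}
for baseline, gain in gains.items():
    print(f"{baseline}: +{gain:.1f} percentage points")
# Bayar2016: +8.7 percentage points
# Xception: +4.9 percentage points
```

The "roughly 5%" improvement therefore refers to the margin over the stronger baseline, Xception.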
6.3 Composite Training
To compensate for the small size of each printer's dataset, we created a composite dataset consisting of all the printed and scanned examples (6,426 blocks in total), which we refer to as Composite Printers. However, training on this composite dataset did not improve validation performance on any single printer compared with training on images from that printer alone. While this may be due to a training set that is still too small, it likely also provides further evidence that the differences between printers are large enough to preclude fitting a general printed and scanned dataset (see Table 5).
For completeness, we also created another composite dataset, which we refer to as Composite Full, consisting of Composite Printers plus a number of examples from the original dataset equal to one printer's share (8,568 blocks in total), and found similar results (see Table 6).
6.4 Printer Identification
For comparison with work on camera model identification, we also experimented with printer identification across the three printers using the models discussed above, and found that the models could distinguish between images from the different printers with up to 95% accuracy. This is notable given the relatively small training set (2,410 image blocks) and the absence of any additional metadata (see Table 7), and it indicates significant variance between the artifacts introduced by each printer.
We investigated the robustness of current state-of-the-art image manipulation detection models to printing and scanning, and found that these models perform poorly on printed and scanned image data. We proposed a novel model architecture that performs roughly 5 percentage points better than the best state-of-the-art model (75.3% vs. 70.4%) when trained and evaluated on images from a single printer. We constructed a dataset of over 6,000 printed and scanned image blocks, which we plan to release to the community for further investigation.
That current state-of-the-art models are vulnerable to printing and scanning is an important finding given the availability and ease of printing and scanning images versus constructing complex adversarial examples.
Further analysis suggests that the variability between images produced by each printer is large enough for the models to easily distinguish between printers, and for models trained on a single printer to generalize poorly to images from another printer. This finding may create additional challenges in designing models robust to printing and scanning, and sets this problem apart from work on models robust to the more uniform and predictable JPEG compression. Future work may include developing methods to simulate printing and scanning in order to create larger training datasets.
- [1] (2018-10) CNN-based detection of generic contrast adjustment with JPEG post-processing. 2018 25th IEEE International Conference on Image Processing (ICIP).
- [2] (2016) A deep learning approach to universal image manipulation detection using a new convolutional layer. ACM Workshop on Information Hiding and Multimedia Security, pp. 5–10.
- [3] (2018) Deep learning for detecting processing history of images. In Media Watermarking, Security, and Forensics.
- [4] (2010) Forensic detection of median filtering in digital images. In 2010 IEEE International Conference on Multimedia and Expo, pp. 89–94.
- [5] (2010) Forensic detection of median filtering in digital images. In 2010 IEEE International Conference on Multimedia and Expo, pp. 89–94.
- [6] (2010) Edge-based blur metric for tamper detection. Journal of Information Hiding and Multimedia Signal Processing 1 (1), pp. 20–27.
- [7] (2017) Adversarial examples are not easily detected: bypassing ten detection methods. Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14.
- [8] (2015-11) Median filtering forensics based on convolutional neural networks. IEEE Signal Processing Letters 22, pp. 1849–1853.
- [9] (2015-11) Median filtering forensics based on convolutional neural networks. IEEE Signal Processing Letters 22, pp. 1849–1853.
- [10] (2019-03) A multi-purpose image forensic method using densely connected convolutional neural networks. Journal of Real-Time Image Processing 16.
- [11] (2017-07) Xception: deep learning with depthwise separable convolutions. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- [12] (2017) Recasting residual-based local descriptors as convolutional neural networks. Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security (IHMMSec '17).
- [13] (2017) Study of the impact of standard image compression techniques on performance of image classification with a convolutional neural network.
- [14] (2009) Exposing digital forgeries from JPEG ghosts. IEEE Transactions on Information Forensics and Security 4 (1), pp. 154–160.
- [15] (2012-06) Rich models for steganalysis of digital images. IEEE Transactions on Information Forensics and Security 7, pp. 868–882.
- [16] (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
- [17] (2014) Explaining and harnessing adversarial examples.
- [18] (2018-09) Analysis of adversarial attacks against CNN-based image forgery detectors. 2018 26th European Signal Processing Conference (EUSIPCO).
- [19] (2005) Detecting digital tampering by blur estimation. In First International Workshop on Systematic Approaches to Digital Forensic Engineering (SADFE'05), pp. 264–278.
- [20] (2007) Exposing digital forgeries in complex lighting environments. IEEE Transactions on Information Forensics and Security 2 (3), pp. 450–461.
- [21] (2018) Median filtered image restoration and anti-forensics using adversarial networks. IEEE Signal Processing Letters 25, pp. 278–282.
- [22] (2012) Efficient backprop. In Neural Networks: Tricks of the Trade, pp. 9–48.
- [23] (1989) Backpropagation applied to handwritten zip code recognition. Neural Computation 1 (4), pp. 541–551.
- [24] (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
- [25] (2016) DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582.
- [26] (2016) The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387.
- [27] (2016-03) The limitations of deep learning in adversarial settings. 2016 IEEE European Symposium on Security and Privacy (EuroS&P).
- [28] (2005-02) Exposing digital forgeries by detecting traces of resampling. IEEE Transactions on Signal Processing 53 (2), pp. 758–767.
- [29] (2005-02) Exposing digital forgeries by detecting traces of re-sampling. IEEE Transactions on Signal Processing 53, pp. 758–767.
- [30] (2016) A deep learning approach to detection of splicing and copy-move forgeries in images. In 2016 IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1–6.
- [31] (2019-01) Robust median filtering forensics using image deblocking and filtered residual fusion. IEEE Access.
- [32] (2008) Blind forensics of contrast enhancement in digital images. In 2008 15th IEEE International Conference on Image Processing, pp. 3112–3115.
- [33] (2014) Going deeper with convolutions.
- [34] (2018) Median filtering detection of small-size image based on CNN. Journal of Visual Communication and Image Representation 51, pp. 162–168.
- [35] (2018-08) Pixel-domain adversarial examples against CNN-based manipulation detectors. Electronics Letters 54.
- [36] (2020) On the transferability of adversarial examples against CNN-based image forensics.
- [37] (2016) Camera model identification with the use of deep convolutional neural networks. In 2016 IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1–6.
- [38] (2018) Robust contrast enhancement forensics using convolutional neural networks. CoRR abs/1803.04749.
- [39] (2017-07) Pyramid scene parsing network. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).