There is active research targeting local image manipulations that can fool deep neural networks (DNNs) into producing incorrect results. This paper examines a type of global image manipulation that can produce similar adverse effects. Specifically, we explore how strong color casts caused by incorrectly applied computational color constancy - referred to as white balance (WB) in photography - negatively impact the performance of DNNs targeting image segmentation and classification. In addition, we discuss how existing image augmentation methods used to improve the robustness of DNNs are not well suited to modeling WB errors. To address this problem, we propose a novel augmentation method that accurately emulates realistic color constancy degradation. We also explore pre-processing training and testing images with a recent WB correction algorithm to reduce the effects of incorrectly white-balanced images. We examine both the augmentation and pre-processing strategies on different datasets and demonstrate notable improvements on the CIFAR-10, CIFAR-100, and ADE20K datasets.
There is active interest in local image manipulations that can be used to fool deep neural networks (DNNs) into producing erroneous results. Such “adversarial attacks” often result in drastic misclassifications. We examine a less explored problem of global image manipulations that can result in similar adverse effects on DNNs’ performance. In particular, we are interested in the role of computational color constancy, which makes up the white-balance (WB) routine on digital cameras.
We focus on computational color constancy because it represents a common source of global image errors found in real images. When WB is applied incorrectly on a camera, it results in an undesirable color cast in the captured image. Images with such strong color casts are often discarded by users. As a result, online image databases and repositories are biased towards containing mostly correctly white-balanced images. This bias constitutes an implicit, and rarely acknowledged, assumption for datasets composed of images crawled from the web. In real-world applications, however, it is unavoidable that images will, at some point, be captured with incorrect WB applied. Such images can produce unpredictable results from DNNs trained on white-balance-biased training images, as demonstrated in Fig. 1.
We examine how errors related to computational color constancy can adversely affect DNNs focused on image classification and semantic segmentation. In addition, we show that image augmentation strategies used to expand the variation of training images are not well suited to mimic the type of image degradation caused by color constancy errors. To address these problems, we introduce a novel color augmentation method that can accurately emulate realistic color constancy degradation. We also examine a newly proposed WB correction method [2] to pre-process testing and training images. Experiments on CIFAR-10, CIFAR-100, and the ADE20K datasets using the proposed augmentation and pre-processing correction demonstrate notable improvements to test image inputs with color constancy errors. Code for our proposed color augmenter is available at: https://github.com/mahmoudnafifi/WB_color_augmenter.
Cameras have onboard image signal processors (ISPs) that convert the raw-RGB sensor values to a standard RGB output image (denoted as an sRGB image) [47, 33]. Computational color constancy, often referred to as WB in photography, is applied to mimic the human visual system's ability to perceive objects as having the same color under any type of illumination. WB requires identifying the color temperature of the scene's illumination, either manually or automatically, by estimating it from the input image (e.g., [9, 25, 6, 17, 51, 7, 30, 1]). After WB is applied to the raw-RGB image, a number of additional nonlinear photo-finishing color manipulations are applied by the ISP to render the final sRGB image [2]. These photo-finishing operations include, but are not limited to, hue/saturation manipulation, general color manipulation, and local/global tone mapping [47, 33, 27, 44, 8]. Cameras generally offer multiple photo-finishing styles from which the user can select [34, 33, 2].

When WB is applied incorrectly, it results in sRGB images with strong color casts. Because of the nonlinear photo-finishing operations applied by the ISP after WB, correcting such mistakes in the sRGB image is non-trivial [45, 2]. Current solutions require meta-data, estimated via radiometric calibration or raw-image reconstruction methods (e.g., [34, 14, 45]), that contains the information necessary to undo the particular nonlinear photo-finishing processes applied by the ISP. Once the image is converted back to a raw-RGB space, the correct WB can be applied using a diagonal correction matrix and the image can be re-rendered by the ISP. Unfortunately, the meta-data needed to invert the camera pipeline and re-render the image is rarely available, especially for sRGB images gleaned from the web, as is the case with existing computer vision datasets. Recently, it was shown that white balancing sRGB images can be achieved by estimating a high-degree polynomial correction matrix [2]. The work in [2], referred to as WB for sRGB images (WB-sRGB), introduces a data-driven framework to estimate such a polynomial matrix for a given testing image. We build on the WB-sRGB framework [2] by extending it to emulate WB errors on the final sRGB images, instead of correcting them. We also use the WB-sRGB method [2] to examine applying a pre-processing WB correction to training and testing images in order to improve the performance of DNN models on incorrectly white-balanced images.

DNN models are susceptible to adversarial attacks in the form of local image manipulations (e.g., see [54, 26, 37, 18]). These images are created by adding a carefully crafted, imperceptible perturbation layer to the original image [54, 26]. Such perturbation layers are usually represented by local non-random adversarial noise [54, 26, 41, 58, 3] or local spatial transformations [57]. Adversarial examples are able to misguide pre-trained models into predicting either a specific wrong response (i.e., a targeted attack) or any wrong response (i.e., an untargeted attack) [40, 12, 3]. While incorrect color constancy is not an explicit adversarial attack, the failures produced by this global modification act much like an untargeted attack and can adversely affect DNNs' performance.
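To make the role of the nonlinear photo-finishing concrete, the toy sketch below simulates a heavily simplified pipeline (diagonal WB followed by a gamma curve only); the image and gain values are made-up placeholders. It illustrates why a diagonal correction applied directly in sRGB cannot undo a WB error, while undoing the rendering first can.

```python
import numpy as np

def render(raw, wb_gains, gamma=2.2):
    # Toy ISP: diagonal (von Kries) WB in raw-RGB, then a nonlinear
    # photo-finishing step (here just a gamma curve).
    balanced = np.clip(raw * wb_gains, 0, 1)
    return balanced ** (1.0 / gamma)

rng = np.random.default_rng(0)
raw = rng.uniform(0.02, 0.4, size=(4, 4, 3))     # made-up raw-RGB patch

correct_gains = np.array([1.8, 1.0, 1.4])        # assumed "correct" WB gains
wrong_gains   = np.array([1.1, 1.0, 2.3])        # assumed "incorrect" WB gains

srgb_correct = render(raw, correct_gains)
srgb_wrong   = render(raw, wrong_gains)

# Naive fix: re-scale the channels directly in sRGB. Because the tone curve
# is nonlinear, this does NOT reproduce the correctly rendered image.
diag_fix = np.clip(srgb_wrong * (correct_gains / wrong_gains), 0, 1)
print("mean error, diagonal fix in sRGB:", np.abs(diag_fix - srgb_correct).mean())

# Undoing the rendering first (requires knowing the pipeline / meta-data),
# re-applying the correct WB, and re-rendering does recover the image.
raw_back = (srgb_wrong ** 2.2) / wrong_gains
print("mean error, corrected via raw   :",
      np.abs(render(raw_back, correct_gains) - srgb_correct).mean())
```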
To overcome limited training data and to increase visual variation, image augmentation techniques are applied to training images. Existing image augmentation techniques include: geometric transformations (e.g., rotation, translation, shearing) [28, 46, 19], synthetic occlusions [60], pixel intensity processing (e.g., equalization, contrast adjustment, brightness, noise) [56, 19], and color processing (e.g., RGB color jittering and PCA-based shifting, HSV jittering, color channel dropping, color channel swapping) [19, 15, 49, 36, 48, 23, 42, 38, 32]. Traditional color augmentation techniques randomly change the original colors of training images, aiming for better generalization and robustness of the trained model at inference time. However, existing color augmentation methods often generate unrealistic colors that rarely occur in reality (e.g., green skin or purple grass). More importantly, the visual appearance produced by existing color augmentation techniques does not accurately represent the color casts introduced by incorrect WB applied onboard cameras, as shown in Fig. 2. As demonstrated in [4, 22, 13], image formation has an important effect on the accuracy of different computer vision tasks. Recently, a simplified version of the camera imaging pipeline was used for data augmentation [13]. The augmentation method in [13], however, explicitly did not consider the effects of incorrect WB, due to the nonlinear operations applied after WB in the pipeline. To address this issue, we propose a camera-based augmentation technique that synthetically generates images with realistic WB settings.
Normalization layers are commonly used to improve the efficiency of the training process. Such layers apply simple statistics-based shifting and scaling operations to the activations of network layers. The shift and scale factors can be computed either from the entire mini-batch (i.e., batch normalization [31]) or from each training instance (i.e., instance normalization [55]). Recently, batch-instance normalization (BIN) [43] was introduced to ameliorate problems related to styles/textures in training images by balancing between batch and instance normalization based on the current task. Although BIN is designed to learn the trade-off between keeping and reducing the original style variations of the training data using simple statistics-based operations, the work in [43] does not study incorrect WB settings. The augmentation and pre-processing methods proposed in our work directly target the training and testing images and do not require any change to a DNN's architecture or training regime.

We begin by studying the effect of incorrectly white-balanced images on pre-trained DNN models for image classification and semantic segmentation. As a motivation, Fig. 3 shows two different WB settings applied to the same image; the DNN's attention for the same scene is considerably altered by changing the WB setting.
For quantitative evaluations, we adopted several DNN models trained for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 [21] and the ADE20K Scene Parsing Challenge 2016 [61]. Generating an entirely new labeled testing set composed of images with incorrect WB is an enormous task: ImageNet classification includes 1,000 classes, and pixel-accurate semantic annotation requires 60 minutes per image [50]. In lieu of a new testing set, we apply our method, detailed in Sec. 4, to emulate WB errors on the validation images of each dataset.

We applied our method to ImageNet's validation set to generate images with five different color temperatures and two different photo-finishing styles, for a total of ten WB variations per validation image; 899 grayscale images were excluded from this process. In total, we generated 491,010 images. We examined the following six well-known DNN models, trained on the original ImageNet training images: AlexNet [36], VGG-16 & VGG-19 [52], GoogLeNet [53], and ResNet-50 & ResNet-101 [29]. Table 1 shows the accuracy drop for each model when tested on our generated validation set (i.e., with different WB and photo-finishing settings) compared to the original validation set. In most cases, there is a drop of roughly 10% in top-1 accuracy. Fig. 4 shows an example of the impact of incorrect WB.
We used the 2,000 images of the ADE20K validation set and generated ten images with different WB/photo-finishing settings for each image, for a total of 20,000 new images. We tested the following two DNN models trained on the original ADE20K training set: DilatedNet [16, 59] and RefineNet [39]. Table 2 shows the effect of improperly white-balanced images on the intersection-over-union (IoU) and pixel-wise accuracy (pxl-acc) relative to the same models' results on the original validation set. While DNNs for segmentation fare better than those for classification, we still observe a drop of over 2% in performance.
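The generated test-set sizes quoted above can be checked quickly, assuming the standard 50,000-image ILSVRC 2012 validation set and the 2,000-image ADE20K validation set:

```python
# Sanity check of the generated test-set sizes (validation-set sizes are
# assumptions stated in the lead-in, not taken from the text).
imagenet_val, grayscale_excluded, variations = 50_000, 899, 10
ade20k_val = 2_000
print((imagenet_val - grayscale_excluded) * variations)  # -> 491010
print(ade20k_val * variations)                           # -> 20000
```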
Model | Effect on top-1 accuracy |
---|---|
AlexNet [36] | -0.112 |
VGG-16 [52] | -0.104 |
VGG-19 [52] | -0.102 |
GoogLeNet [53] | -0.107 |
ResNet-50 [29] | -0.111 |
ResNet-101 [29] | -0.109 |
Model | Effect on IoU | Effect on pxl-acc |
---|---|---|
DilatedNet [16, 59] | -0.023 | -0.024 |
RefineNet [39] | -0.031 | -0.026 |
Given an sRGB image, denoted as $\mathbf{I}$, that is assumed to be white-balanced with the correct color temperature, our goal is to modify $\mathbf{I}$'s colors to mimic its appearance as if it had been rendered by a camera with different (incorrect) color temperatures $t$ and different photo-finishing styles. Since we do not have access to $\mathbf{I}$'s original raw-RGB values, we cannot re-render the image from raw-RGB to sRGB using a standard camera pipeline. Instead, we adopt a data-driven method that mimics this manipulation directly in the sRGB color space. Our framework draws heavily on the WB-sRGB data-driven framework [2], which was proposed to correct improperly white-balanced sRGB images; our framework, in contrast, “emulates” WB errors on rendered sRGB images. Fig. 5 provides an overview of our method.
Our method relies on a large dataset of sRGB images generated by [2]. This dataset contains images rendered with different WB settings and photo-finishing styles. There is a ground truth sRGB image (i.e., rendered with the “correct” color temperature) associated with each training image. The training sRGB images were rendered using five different color temperatures: 2850 Kelvin (K), 3800K, 5500K, 6500K, and 7500K. In addition, each image was rendered using different camera photo-finishing styles. In our WB emulation framework, we used 17,970 images from this dataset (1,797 correct sRGB images each with ten corresponding images rendered with five different color temperatures and two different photo-finishing styles, Camera Standard and Adobe Standard).
Next, we compute a mapping from the correctly white-balanced sRGB image to each of its ten corresponding images. We follow the same procedure as the WB-sRGB method [2] and use a kernel function, $\varphi$, to project RGB colors into a high-dimensional space, and then perform polynomial data fitting on these projected values. Specifically, we use the polynomial kernel $\varphi: [R, G, B]^{\mathsf{T}} \rightarrow [R, G, B, RG, RB, GB, R^2, G^2, B^2, RGB, 1]^{\mathsf{T}}$ [24]. The data fitting can be represented by a color transformation matrix $\mathbf{M}_t$ computed by the following minimization:

$$\mathbf{M}_t = \operatorname*{argmin}_{\mathbf{M}} \left\| \mathbf{M}\,\varphi(\mathbf{I}_{\mathrm{wb}}) - \mathbf{I}_t \right\|_F \quad (1)$$

where $\mathbf{I}_{\mathrm{wb}}$ and $\mathbf{I}_t$ are the color matrices of the white-balanced image rendered with the correct color temperature and of the same image rendered with the target color temperature $t$, respectively, $N$ is the total number of pixels in each image, $\|\cdot\|_F$ is the Frobenius norm, and $\mathbf{M}_t$ is a full matrix that, combined with $\varphi$, represents a nonlinear color mapping.
We compute a color transformation matrix for each pair consisting of a correctly white-balanced image and its corresponding target image rendered with a specific color temperature and photo-finishing style. In the end, we have ten matrices associated with each image in our training data.
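A minimal sketch of the fitting in Eq. (1) is shown below. It uses numpy's least-squares solver and a row-vector convention (images reshaped to N x 3, so the fitted matrix is the transpose of the one in Eq. (1)); the function and variable names are ours, not those of [2].

```python
import numpy as np

def kernel(rgb):
    """Polynomial kernel phi of [24]: (N, 3) RGB values -> (N, 11) features."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.stack([r, g, b, r * g, r * b, g * b,
                     r ** 2, g ** 2, b ** 2, r * g * b,
                     np.ones_like(r)], axis=1)

def fit_color_transform(img_wb, img_target):
    """Fit Eq. (1) in a row-vector convention: find the 11 x 3 matrix M that
    minimizes ||phi(I_wb) M - I_t||_F, where images are reshaped to N x 3."""
    src = kernel(img_wb.reshape(-1, 3))
    dst = img_target.reshape(-1, 3)
    M, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return M

def apply_color_transform(img, M):
    """Re-render an image with a fitted transform (cf. Eq. (4))."""
    out = kernel(img.reshape(-1, 3)) @ M
    return np.clip(out, 0, 1).reshape(img.shape)
```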
As shown in Fig. 5, when augmenting an input sRGB image to have different WB settings, we search our dataset for sRGB images similar to the input image. This search is not based on scene content but on the color distribution of the image. Accordingly, we represent each image in the training set with the RGB-uv projected color histogram feature used in [2]. Each histogram feature is represented as a three-layer tensor. To further reduce the size of the histogram feature, we apply principal component analysis (PCA) to the vectorized three-layer histogram; this transformation maps the zero-centered vectorized histogram to a lower-dimensional space. Our implementation uses a 55-dimensional PCA vector. Our final training data therefore consists of the compacted feature vector of each white-balanced training image, the associated color transformation matrices, and the PCA coefficient matrix and bias vector.
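The sketch below illustrates this feature-extraction step. The histogram construction is only an approximation of the RGB-uv feature of [2] (the bin count, range, and normalization are illustrative assumptions), and scikit-learn's PCA stands in for the learned projection.

```python
import numpy as np
from sklearn.decomposition import PCA

def rgb_uv_histogram(img, bins=60):
    """Log-chrominance (u, v) histogram per channel, approximating the RGB-uv
    feature of [2]; bin count and range are illustrative choices."""
    rgb = img.reshape(-1, 3).astype(np.float64) + 1e-6
    hist = np.zeros((bins, bins, 3))
    for c in range(3):
        others = [i for i in range(3) if i != c]
        u = np.log(rgb[:, c] / rgb[:, others[0]])
        v = np.log(rgb[:, c] / rgb[:, others[1]])
        h, _, _ = np.histogram2d(u, v, bins=bins, range=[[-3, 3], [-3, 3]])
        hist[:, :, c] = np.sqrt(h / h.sum())
    return hist

# Compact all training histograms with PCA (55 dimensions, as in the text).
train_imgs = [np.random.rand(64, 64, 3) for _ in range(200)]   # placeholder data
feats = np.stack([rgb_uv_histogram(im).ravel() for im in train_imgs])
pca = PCA(n_components=55)
compact_feats = pca.fit_transform(feats)   # one 55-D vector per training image
```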
Given a new input image $\mathbf{I}$, we extract its compacted color feature $\mathbf{v}$ and then search for the $k$ training examples whose color distributions are most similar to that of the input image. The $L_2$ distance is adopted as the similarity metric between $\mathbf{v}$ and the training compacted color features. Afterwards, we retrieve the color transformation matrices associated with the $k$ nearest training images. The retrieved set of matrices is represented by $\{\mathbf{M}_t^{(j)}\}_{j=1}^{k}$, where $\mathbf{M}_t^{(j)}$ is the color transformation matrix that maps the $j$-th retrieved white-balanced training image's colors to its corresponding colors rendered with color temperature $t$.
After computing the distance vector $\mathbf{d} = [d_1, \ldots, d_k]$ between $\mathbf{v}$ and the $k$ nearest training features, we compute a weighting vector $\boldsymbol{\alpha} = [\alpha_1, \ldots, \alpha_k]$ as follows [2]:

$$\alpha_j = \frac{\exp\left(-d_j^{2}/2\sigma^{2}\right)}{\sum_{i=1}^{k}\exp\left(-d_i^{2}/2\sigma^{2}\right)} \quad (2)$$

where $\sigma$ is the radial basis function parameter, fixed to a constant value in our experiments. We then construct the final color transformation matrix $\mathbf{M}_t$ as a linear weighted combination of the retrieved color transformation matrices $\mathbf{M}_t^{(j)}$ [2]:

$$\mathbf{M}_t = \sum_{j=1}^{k} \alpha_j\, \mathbf{M}_t^{(j)} \quad (3)$$
Lastly, the “re-rendered” image with color temperature $t$ is computed as:

$$\mathbf{I}_t = \mathbf{M}_t\, \varphi(\mathbf{I}) \quad (4)$$
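Putting Eqs. (2)-(4) together, the WB-error emulation at augmentation time can be sketched as follows. It reuses the illustrative helpers from the earlier sketches (`rgb_uv_histogram`, `apply_color_transform`), and the values of `k` and `sigma` are placeholders rather than the settings used in our experiments.

```python
import numpy as np

def emulate_wb_error(img, train_feats, train_mats, pca, t, k=5, sigma=0.25):
    """Emulate rendering `img` with (incorrect) color temperature `t`.

    train_feats : (n, 55) compact features of the white-balanced training set
    train_mats  : dict mapping temperature t -> (n, 11, 3) fitted transforms
    pca         : fitted PCA used to compact the RGB-uv histograms
    k, sigma    : illustrative hyperparameter values (not from the paper)
    """
    # Compact color feature of the input image (Sec. 4).
    v = pca.transform(rgb_uv_histogram(img).ravel()[None, :])[0]

    # k nearest training images by L2 distance in feature space.
    d = np.linalg.norm(train_feats - v, axis=1)
    idx = np.argsort(d)[:k]

    # Eq. (2): RBF weights over the retrieved neighbors.
    w = np.exp(-d[idx] ** 2 / (2 * sigma ** 2))
    w /= w.sum()

    # Eq. (3): blend the retrieved transforms; Eq. (4): apply the result.
    M_t = np.tensordot(w, train_mats[t][idx], axes=1)   # (11, 3)
    return apply_color_transform(img, M_t)
```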
Our goal is to improve the performance of DNN methods in the face of test images that may have strong global color casts due to computational color constancy errors. Based on the WB-sRGB framework [2] and the modified framework discussed in Sec. 4, we examine three strategies to improve the robustness of the DNN models.
(1) The first strategy is to apply a WB correction to each testing image in order to remove any unexpected color cast at inference time. Note that this approach implicitly assumes that the training images are correctly white-balanced. In our experiments, we used the WB-sRGB method [2] to correct the test images, because it currently achieves state-of-the-art results for white balancing sRGB-rendered images. We also examined the simple diagonal-based correction applied by traditional WB methods intended for raw-RGB images (e.g., gray-world [10]), but found that it gives inadequate results when applied to sRGB images, as also demonstrated in [2]. In fact, applying a diagonal-based correction directly to an sRGB image is similar to multiplicative color jittering. This is why a nonlinear color manipulation (e.g., the polynomial correction estimated by [2]) is needed for more accurate WB correction of sRGB images. An example of the difference is shown in Fig. 6.
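For reference, the diagonal correction mentioned above (gray-world [10]) amounts to a per-channel scaling, as the minimal sketch below shows; this is why it behaves like multiplicative color jittering when applied to an sRGB image and cannot model the nonlinear photo-finishing.

```python
import numpy as np

def gray_world_correction(srgb):
    """Diagonal WB via the gray-world assumption [10]: scale each channel so
    that the channel means are equal. This works reasonably in raw-RGB, but on
    an sRGB image it is only a per-channel (multiplicative) adjustment and
    cannot undo the nonlinear photo-finishing applied after WB."""
    means = srgb.reshape(-1, 3).mean(axis=0)
    gains = means.mean() / np.maximum(means, 1e-6)   # diagonal correction
    return np.clip(srgb * gains, 0, 1)
```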
It is worth mentioning that the training data used by the WB-sRGB method has five fixed color temperatures (2850K, 3800K, 5500K, 6500K, 7500K), all with color correction matrices mapping to their corresponding correct WB. In most cases, one of these five fixed color temperatures will be visually similar to the correct WB. Thus, if the WB-sRGB method is applied to an input image that is already correctly white-balanced, the computed transformation will act as an identity.
(2) The second strategy considers the case that some of the training images may include some incorrectly white-balanced images. We, therefore, also apply the WB correction step to all the training images as well as testing images. This again uses the WB-sRGB method [2] on both testing and training images.
(3) The final strategy is to augment the training dataset based on our method described in Sec. 4. Like other augmentation approaches, there is no pre-processing correction required. The assumption behind this augmentation process is that the robustness of DNN models can be improved by training on augmented images that serve as exemplars for color constancy errors.
Testing images are grouped into two categories. In Category 1 (Cat-1), we expand the original testing images in the CIFAR-10, CIFAR-100, and ADE20K datasets by applying our method to emulate camera WB errors (described in Sec. 4). Each test image now has ten variations that share the same ground truth labels. We acknowledge this is less than optimal, given that the same method used to modify the testing images is also used to augment the training images. However, we are confident in the proposed method's ability to emulate WB errors and believe that the Cat-1 images represent realistic examples. With that said, we do not apply strategies 1 and 2 to Cat-1, as the WB-sRGB method is based on a framework similar to the one used to generate these testing images. For the sake of completeness, we also include Category 2 (Cat-2), which consists of new datasets generated directly from raw-RGB images. Specifically, raw-RGB images are rendered to sRGB using the full in-camera pipeline with in-camera color constancy errors. As a result, Cat-2's testing images exhibit accurate color constancy errors but comprise fewer images, for which we provide ground truth labels.
We compare the three above strategies with two existing and widely adopted color augmentation processes: RGB color jittering and HSV jittering.
The nearest neighbor search was performed with a fixed number of neighbors $k$. The proposed WB augmentation model runs in 7.3 sec (CPU) and 1.0 sec (GPU) to generate ten 12-megapixel images. The reported runtimes were measured on an Intel Xeon E5-1607 @ 3.10 GHz CPU and an NVIDIA™ Titan X GPU.
To the best of our knowledge, there is no standardized approach for existing color augmentation methods. Accordingly, we tested different settings and selected the settings that produce the best results.
For RGB color jittering, we generated ten images with new colors by applying a random shift to each color channel of the image. For HSV jittering, we generated ten images with new colors by applying a random shift to the hue channel and multiplying each of the saturation and value channels by a random scaling factor. The shift and scaling ranges were chosen to give the best compromise between color diversity and low color artifacts during the augmentation process; a sketch of both baselines is given below.
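The two jittering baselines can be sketched as follows; the shift and scale ranges shown are illustrative placeholders, not the tuned values used in our experiments.

```python
import colorsys
import numpy as np

rng = np.random.default_rng(0)

def rgb_jitter(img, max_shift=0.1):
    """RGB jittering: add an independent random shift to each color channel.
    `max_shift` is an illustrative placeholder value."""
    shift = rng.uniform(-max_shift, max_shift, size=3)
    return np.clip(img + shift, 0, 1)

def hsv_jitter(img, max_hue_shift=0.05, max_scale=0.2):
    """HSV jittering: shift the hue channel and scale saturation/value.
    The ranges are illustrative placeholder values."""
    hsv = np.apply_along_axis(lambda p: colorsys.rgb_to_hsv(*p), 2, img)
    hsv[..., 0] = (hsv[..., 0] + rng.uniform(-max_hue_shift, max_hue_shift)) % 1.0
    hsv[..., 1] = np.clip(hsv[..., 1] * rng.uniform(1 - max_scale, 1 + max_scale), 0, 1)
    hsv[..., 2] = np.clip(hsv[..., 2] * rng.uniform(1 - max_scale, 1 + max_scale), 0, 1)
    return np.apply_along_axis(lambda p: colorsys.hsv_to_rgb(*p), 2, hsv)
```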
For image classification, training new models on the ImageNet dataset requires prohibitive effort; for instance, ILSVRC 2012 consists of roughly one million training images, which would grow to ten million images after applying any of the color augmentation techniques. For that reason, we perform our experiments on the CIFAR-10 and CIFAR-100 datasets [35], which contain a more manageable number of images.
We trained SmallNet [46] from scratch on CIFAR-10. We also fine-tuned AlexNet [36] to recognize the new classes in CIFAR-10 and CIFAR-100 datasets. For semantic segmentation, we fine-tuned SegNet [5] on the training set of the ADE20K dataset [61].
We train each model on: (i) the original training images, (ii) the WB-sRGB method [2] applied to the original training images, and (iii) original training images with the additional images produced by color augmentation methods. For color augmentation, we examined RGB color jittering, HSV jittering, and our WB augmentation. Thus, we trained five models for each CNN architecture, each of which was trained on one of the mentioned training settings.
For fair comparisons, we trained each model for the same number of iterations. Specifically, training was run for 29,000 and 550,000 iterations for the image classification and semantic segmentation tasks, respectively. We adjusted the number of epochs to ensure that each model was trained on the same number of mini-batches, allowing a fair comparison between training on the augmented and original sets. Note that by using a fixed number of iterations for models trained with both the original and the augmented data, we did not fully exploit the potential of the additional training images when training with augmented data.
The training was performed using NVIDIA™ Titan X GPU. The details of training parameters are given in supplemental materials.
Cat-1 tests each model using test images that have been generated by our method described in Sec. 4.
We used the CIFAR-10 testing set (10,000 images) to test SmallNet and AlexNet models trained on the training set of the same dataset. We also used the CIFAR-100 testing set (10,000 images) to evaluate the AlexNet model trained on CIFAR-100. After applying our WB emulation to the testing sets, we have 100,000 images for each testing set of CIFAR-10 and CIFAR-100. The top-1 accuracies obtained by each trained model are shown in Table 3. The best results on our expanded testing images, which include strong color casts, were obtained using models trained on our proposed WB augmented data.
Interestingly, the experiments show that applying the WB correction [2] to the training data, in most cases, improves accuracy on both the original and the expanded test sets. Among the color augmentation methods, DNNs trained on WB-augmented training images also achieve the best results on the original testing images.
Training set (Cat-1, SmallNet [46] on CIFAR-10 [35]) | Original | Diff. WB |
---|---|---|
Original training set | 0.799 | 0.655 |
“White-balanced” set | 0.801 (+0.002) | 0.683 (+0.028) |
HSV augmented set | 0.801 (+0.002) | 0.747 (+0.092) |
RGB augmented set | 0.780 (-0.019) | 0.765 (+0.110) |
WB augmented set (ours) | 0.809 (+0.010) | 0.786 (+0.131) |

Training set (Cat-1, AlexNet [36] on CIFAR-10 [35]) | Original | Diff. WB |
---|---|---|
Original training set | 0.933 | 0.797 |
“White-balanced” set | 0.932 (-0.001) | 0.811 (+0.014) |
HSV augmented set | 0.923 (-0.010) | 0.864 (+0.067) |
RGB augmented set | 0.922 (-0.011) | 0.872 (+0.075) |
WB augmented set (ours) | 0.926 (-0.007) | 0.889 (+0.092) |

Training set (Cat-1, AlexNet [36] on CIFAR-100 [35]) | Original | Diff. WB |
---|---|---|
Original training set | 0.768 | 0.526 |
“White-balanced” set | 0.757 (-0.011) | 0.543 (+0.017) |
HSV augmented set | 0.722 (-0.044) | 0.613 (+0.087) |
RGB augmented set | 0.723 (-0.045) | 0.645 (+0.119) |
WB augmented set (ours) | 0.735 (-0.033) | 0.670 (+0.144) |
We evaluated on the ADE20K validation set using the same setup explained in Sec. 3. Table 4 shows the pxl-acc and IoU obtained by the trained SegNet models. The best results were obtained with our WB augmentation; Fig. 7 shows qualitative examples, and additional examples are given in the supplemental materials.
Training set (Cat-1, IoU) | Original | Diff. WB |
---|---|---|
Original training set | 0.208 | 0.180 |
“White-balanced” set | 0.210 (+0.002) | 0.197 (+0.017) |
HSV augmented set | 0.192 (-0.016) | 0.185 (+0.005) |
RGB augmented set | 0.195 (-0.013) | 0.190 (+0.010) |
WB augmented set (ours) | 0.202 (-0.006) | 0.199 (+0.019) |

Training set (Cat-1, pxl-acc) | Original | Diff. WB |
---|---|---|
Original training set | 0.603 | 0.557 |
“White-balanced” set | 0.605 (+0.002) | 0.579 (+0.022) |
HSV augmented set | 0.583 (-0.020) | 0.536 (-0.021) |
RGB augmented set | 0.544 (-0.059) | 0.534 (-0.023) |
WB augmented set (ours) | 0.597 (-0.006) | 0.581 (+0.024) |
Training set (Cat-2, SmallNet) | In-cam AWB | In-cam Diff. WB | WB pre-processing |
---|---|---|---|
Original training set | 0.467 | 0.404 | 0.461 |
“White-balanced” set | 0.496 (+0.029) | 0.471 (+0.067) | 0.492 (+0.031) |
HSV augmented set | 0.477 (+0.001) | 0.462 (+0.058) | 0.481 (+0.020) |
RGB augmented set | 0.474 (+0.007) | 0.475 (+0.071) | 0.470 (+0.009) |
WB augmented set (ours) | 0.494 (+0.027) | 0.496 (+0.092) | 0.484 (+0.023) |

Training set (Cat-2, AlexNet) | In-cam AWB | In-cam Diff. WB | WB pre-processing |
---|---|---|---|
Original training set | 0.792 | 0.734 | 0.772 |
“White-balanced” set | 0.784 (-0.008) | 0.757 (+0.023) | 0.784 (+0.012) |
HSV augmented set | 0.790 (+0.002) | 0.771 (+0.037) | 0.779 (+0.007) |
RGB augmented set | 0.791 (-0.001) | 0.779 (+0.045) | 0.783 (+0.011) |
WB augmented set (ours) | 0.799 (+0.007) | 0.788 (+0.054) | 0.787 (+0.015) |
Cat-2 requires us to generate and label our own testing image dataset from raw-RGB images. To this end, we collected 518 raw-RGB images containing CIFAR-10 object classes from the following datasets: the HDR+ Burst Photography dataset [27], the MIT-Adobe FiveK dataset [11], and the RAISE dataset [20]. We rendered all raw-RGB images with different color temperatures and two photo-finishing styles using the Adobe Camera Raw module. Adobe Camera Raw accurately emulates the ISP onboard a camera and produces results virtually identical to what the in-camera processing would produce [2]. Images that contain multiple objects were manually cropped to include only the objects of interest, namely the CIFAR-10 classes. In total, we generated 15,098 rendered testing images that reflect real in-camera WB settings. We used the following testing sets in our experiments:
(i) In-camera auto WB contains images rendered with the auto WB (AWB) correction setting in Adobe Camera Raw, which mimics the camera’s AWB functionality. AWB does fail from time to time; we manually removed images that had a noticeable color cast. This set of images is intended to be equivalent to testing images on existing image classification datasets.
(ii) In-camera WB settings contains images rendered with the different color temperatures and photo-finishing styles. This set represents testing images that contain WB color cast errors.
(iii) WB pre-processing correction applied to set (ii) contains images of set (ii) after applying the WB-sRGB correction [2]. This set is used to study the potential improvement of applying a pre-processing WB correction in the inference phase.
Table 5 shows the top-1 accuracies obtained by SmallNet and AlexNet on these external testing sets. The experiments show that accuracy drops by roughly 6% when the testing images have been rendered with incorrect WB settings, compared with the accuracies obtained on “properly” white-balanced images rendered with the in-camera AWB. We also note that the best accuracies are obtained by applying either a pre-processing WB correction to both training and testing images or our WB augmentation in an end-to-end manner. Examples of misclassified images are shown in Fig. 8; additional examples are given in the supplemental materials.
Fig. 8: (A) Correctly classified images rendered with the in-camera auto WB. (B) Misclassified images rendered with different in-camera WB settings. Note that all images in (B) are correctly classified by the same model (AlexNet [36]) trained on WB-augmented data.

This work has examined the impact of computational color constancy (WB) errors on DNNs for image classification and semantic segmentation. A new augmentation method that accurately mimics WB errors was introduced. We showed that both pre-processing WB correction and training DNNs on our WB-augmented images improve results on the CIFAR-10, CIFAR-100, and ADE20K datasets. We believe our WB augmentation method will be useful for other DNN-based tasks where image augmentation is sought.
This study was funded in part by the Canada First Research Excellence Fund for the Vision: Science to Applications (VISTA) programme and an NSERC Discovery Grant. Dr. Brown contributed to this article in his personal capacity as a professor at York University. The views expressed are his own and do not necessarily represent the views of Samsung Research.
[36] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.