Test-Time Adaptation for Out-of-distributed Image Inpainting

02/02/2021 ∙ by Chajin Shin, et al. ∙ Yonsei University

Deep learning-based image inpainting algorithms have shown great performance thanks to powerful priors learned from numerous external natural images. However, they produce unpleasant results on test images whose distribution is far from that of the training images, because their models are biased toward the training data. In this paper, we propose a simple image inpainting algorithm with test-time adaptation, named AdaFill. Given a single out-of-distribution test image, our goal is to complete the hole region more naturally than pre-trained inpainting models. To achieve this, we treat the remaining valid regions of the test image as additional training cues, since natural images have strong internal similarities. Through this test-time adaptation, our network can explicitly exploit the internal prior of the test image as well as the externally learned image priors of the pre-trained features. Experimental results show that AdaFill outperforms other models on various out-of-distribution test images. Furthermore, a variant named ZeroFill, which is not pre-trained at all, sometimes also outperforms the pre-trained models.


1 Introduction

Image inpainting is the task of completing the missing region of an image from the information in its valid regions. With inpainting techniques, we can remove unwanted objects, text, or scratches. Following the massive development of deep learning-based imaging algorithms, most image inpainting models are trained on extensive sets of training images. Some methods have been proposed to handle arbitrary mask shapes [21, 9] or to capture semantic structure using novel architectures or loss functions [11, 20]. These models can learn good natural image priors from huge datasets, and they process textures and structures well when those patterns appear frequently in the training data.

(a) Input
(b) GatedConv [21]
(c) EdgeConnect [11]
(d) AdaFill(Ours)
Figure 1: GatedConv [21] and EdgeConnect [11] show splashing or diffusing artifacts and cannot grasp internal similarity. In contrast, our method recovers the hole with fewer artifacts.
Figure 2: Left: overall flow. In the training phase, we begin from the pre-trained inpainting network (or from random initialization for ZeroFill). Next, for test-time adaptation, we degrade the test image with random child masks and feed the result into the network. The network output has to match the test image in the valid regions. After test-time training, we pass the test image with its parent mask through the network to obtain the final inpainted image. Right: structure of the inpainting network $G$.

However, these models have difficulty recovering images whose patterns are totally different from those of the training images. This domain gap causes severe color artifacts, and the models cannot exploit the internally similar patches that appear in the test image, as depicted in Fig. 1: the recovered color patterns are not consistent with the surrounding regions and artifacts occur.

To cope with this problem, inspired by recent internal learning algorithms [19, 14], we propose a test-time adaptation algorithm for image inpainting named AdaFill. We first modify an existing inpainting network to fit a single test image and pre-train it on a large-scale dataset to acquire external image priors. Next, we train on the test image only, so that the network can focus on the internal pixel distribution by explicitly exploiting the valid regions. With this simple scheme, our model can handle the color artifacts caused by the domain gap and exploit the internal similarity of the test image. We also propose a non-pre-trained version of AdaFill, called ZeroFill, that shows performance comparable to the pre-trained models. To the best of our knowledge, this is the first work that tackles the distributional shift problem in image inpainting. Compared to other restoration tasks, inpainting performance depends more heavily on the training dataset, so generalization to out-of-distribution images is an important issue for practical usage.

2 Related Works

Natural images have a high degree of internal similarity: similar structures and textures recur across various scales within an image. Several studies have verified that this internal similarity can be utilized for single image super-resolution [14, 24, 3]. They show that the internal statistical prior of a single image is powerful and often better than generalized statistics learned from large-scale training.

Our work is closely related to ZSSR [14], which performs image super-resolution on a single image via internal learning, artificially generating training samples from the low-resolution test image by re-downsampling it. Compared to ZSSR, which exploits the whole degraded image, our method utilizes the valid regions as strong training cues. Similarly, DIP [19] proposes a method that implicitly learns the prior of a single image and shows that this internal prior can recover from various types of degradation, including for image inpainting. However, this implicit internal prior has difficulty recovering from extreme degradation such as large holes or holes over strongly non-local patterns. In contrast, our method explicitly learns the internal prior from artificial training samples, as well as the external prior through large-scale pre-training.

Dataset        | GatedConv [21]         | EdgeConnect [11]       | DIP [19]               | AdaFill (Ours)
               | PSNR / SSIM / LPIPS    | PSNR / SSIM / LPIPS    | PSNR / SSIM / LPIPS    | PSNR / SSIM / LPIPS
T91 [8]        | 27.15 / 0.889 / 0.0755 | 27.85 / 0.901 / 0.0692 | 25.46 / 0.851 / 0.1047 | 27.26 / 0.870 / 0.0843
Urban100 [4]   | 23.14 / 0.854 / 0.0722 | 24.06 / 0.866 / 0.0721 | 22.23 / 0.805 / 0.1082 | 24.52 / 0.845 / 0.0699
Google Map [6] | 24.65 / 0.846 / 0.0939 | 26.25 / 0.848 / 0.0888 | 24.47 / 0.836 / 0.1164 | 26.73 / 0.858 / 0.0910
Facade [17]    | 26.25 / 0.900 / 0.0570 | 26.02 / 0.886 / 0.0688 | 25.79 / 0.890 / 0.0691 | 28.55 / 0.921 / 0.0451
BCCD [2]       | 34.25 / 0.956 / 0.0595 | 34.43 / 0.954 / 0.0662 | 30.47 / 0.948 / 0.0890 | 34.26 / 0.962 / 0.0631
KLH [1]        | 33.25 / 0.823 / 0.1162 | 20.91 / 0.791 / 0.3419 | 30.33 / 0.751 / 0.1579 | 33.48 / 0.781 / 0.1919
Document [12]  | 19.67 / 0.910 / 0.0762 | 18.84 / 0.876 / 0.1316 | 17.49 / 0.865 / 0.1122 | 20.72 / 0.919 / 0.0585
Table 1: Quantitative comparison with GatedConv [21], EdgeConnect [11], and DIP [19].

3 Method

Our overall framework is described in Fig. 2. In the training phase, we begin from the pre-trained inpainting network $G$ and fine-tune it on a single degraded image $I$. For ZeroFill, we skip the pre-training step. We assume that $I$ is produced from the clean image $I_{gt}$ by a distortion function $f_p$ with a parent mask $M_p$, i.e., $I = f_p(I_{gt})$. The parent mask marks invalid pixels of $I$ with value 1 and valid pixels with value 0, so the distorted image can be written as $I = I_{gt} \odot (1 - M_p)$, where $\odot$ denotes element-wise multiplication. To enable our network to learn internal similarities and exploit them for inpainting, we define a similar distortion function $f_c$ with a child mask $M_c$. As illustrated in Fig. 2, we degrade the given image with this child mask; child masks are randomly generated during training. We denote the resulting double-distorted image by $I_c = f_c(I) = I \odot (1 - M_c)$. To handle cases where the shape of the parent mask is quite different from an irregular or box mask, we also use the parent mask itself as a child mask, with random rotation and scaling, at a certain rate.
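A minimal sketch of the parent/child degradation described above is given below. The helper names (`apply_mask`, `random_child_mask`) and the simple box/shift sampler are our own illustrative assumptions; the paper's actual mask generator (irregular masks and rotated/scaled copies of the parent mask) is only approximated here.

```python
import torch

def apply_mask(img, mask):
    """Zero out hole pixels. img: (B, 3, H, W); mask: (B, 1, H, W) with 1 = hole."""
    return img * (1.0 - mask)

def random_child_mask(parent_mask, box_prob=0.5):
    """Sample child masks M_c with the same spatial size as the parent mask.
    With probability box_prob we draw a random box hole; otherwise we reuse a
    randomly shifted copy of the parent mask as a crude stand-in for the
    rotated/scaled parent mask described in the paper."""
    B, _, H, W = parent_mask.shape
    if torch.rand(1).item() < box_prob:
        mask = torch.zeros(B, 1, H, W)
        for b in range(B):
            h, w = H // 4, W // 4
            top = torch.randint(0, H - h, (1,)).item()
            left = torch.randint(0, W - w, (1,)).item()
            mask[b, :, top:top + h, left:left + w] = 1.0
        return mask
    shift = (int(torch.randint(0, H, (1,))), int(torch.randint(0, W, (1,))))
    return torch.roll(parent_mask, shifts=shift, dims=(2, 3))

# Usage: I = apply_mask(I_gt, M_p);  M_c = random_child_mask(M_p);  I_c = apply_mask(I, M_c)
```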

Our goal is to make the network learn the mapping from the double-distorted image $I_c$ to the given single-distorted image $I$:

$\hat{I} = G(I_c \oplus M_c)$     (1)

where $\hat{I}$ is a preliminary prediction, $G$ is our inpainting network, and $\oplus$ denotes channel-wise concatenation. This prediction has to match the given image $I$ at the valid pixels. Since we do not know the ground truth of the invalid pixels in $I$, we degrade $\hat{I}$ with the same distortion function $f_p$ and then use the following loss to train the inpainting network $G$:

$\mathcal{L} = \lVert f_p(\hat{I}) - f_p(I) \rVert_1 = \lVert (\hat{I} - I) \odot (1 - M_p) \rVert_1$     (2)

Through this training step, the network learns restoration patterns of the degraded image from its valid regions while ignoring the parent distortions.

At the inference phase, we run a single forward pass with the same test image that was used for training:

$I_{out} = G(I \oplus M_p)$     (3)

where $I_{out}$ is our final result image.
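To make the adaptation procedure concrete, the following sketch implements Eqs. (1)-(3) in PyTorch under a few assumptions of ours: the generator `G`, the `apply_mask` and `random_child_mask` helpers from the earlier sketch, and a plain masked L1 objective as our reading of Eq. (2). Defaults follow the test-time settings reported in Sec. 4 (batch size 8, learning rate 1e-4, 1,000 iterations).

```python
import torch

def adapt_and_inpaint(G, I, M_p, iters=1000, lr=1e-4, batch=8):
    """Fine-tune the (pre-trained) inpainting network G on a single holed
    image I of shape (1, 3, H, W) with parent mask M_p (1 = hole), then
    inpaint the parent holes. Illustrative sketch, not the authors' code."""
    opt = torch.optim.Adam(G.parameters(), lr=lr)
    I_rep = I.repeat(batch, 1, 1, 1)
    M_p_rep = M_p.repeat(batch, 1, 1, 1)
    for _ in range(iters):
        M_c = random_child_mask(M_p_rep)                       # random child masks (Fig. 2)
        I_c = apply_mask(I_rep, M_c)                           # double-distorted input
        pred = G(torch.cat([I_c, M_c], dim=1))                 # Eq. (1)
        # Eq. (2): compare prediction and test image on parent-valid pixels only
        loss = ((pred - I_rep).abs() * (1.0 - M_p_rep)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        I_out = G(torch.cat([I, M_p], dim=1))                  # Eq. (3): final forward pass
    return I_out
```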

The structure of our inpainting network $G$ is shown on the right side of Fig. 2. We start from the second stage of EdgeConnect [11] and modify several components; a detailed description is given in Sec. 4.3.

Figure 3: Values above the images represent the internal similarity score, and values below the images are PSNR / SSIM / LPIPS [22]. A higher internal similarity score leads to better results with our model.

4 Experiments

Settings. We use the PyTorch [13] framework for our experiments. We pre-train the network on the Places365 [23] dataset for 1 epoch with the settings of [11]. For test-time adaptation, we use the following hyper-parameters: batch size 8, learning rate 0.0001, the Adam [7] optimizer, and 1,000 training iterations. For ZeroFill, we use 5,000 iterations without pre-training. We evaluate our model with LPIPS [22], SSIM, and PSNR, and compare it with models that learn only the external prior (the pre-trained models GatedConv [21] and EdgeConnect [11]) and only the internal prior (DIP [19]).

Dataset. We use various datasets to evaluate our model: T91 [8], Urban100 [4], Google Map [6], Facade [17], BCCD [2], KLH [1], BSD200 [10], and Document [12]. These are out-of-distribution with respect to the Places365 [23] dataset, whose distribution is focused on images of various places; in contrast, these images contain small objects, natural scenes, artificial structures, medical images, satellite images, or text. We subsample and pre-process each dataset and use two types of holes: box masks and irregular masks. For a detailed description, please refer to the supplementary materials.

4.1 Experimental Results

Our quantitative results are reported in Table 1. Our model outperforms DIP on all datasets and metrics. On almost all datasets and metrics, our model is also superior to the pre-trained models, even though it is a one-stage network. This reveals that exploiting the internal statistics of a test image is critical for image inpainting. On datasets with strong internal similarities, such as Urban100, Google Map, and Facade, our model consistently performs better. In addition, when the distribution of a dataset is far from the training dataset, as for KLH and Document, the pre-trained models cannot recover well.

Our qualitative results are compared in Fig. 4, where similar trends are observed. When there is large internal similarity within an image, our model recovers the hole regions almost perfectly, while the other models show severe artifacts.

Figure 4: Qualitative comparison with the pre-trained GatedConv [21] and EdgeConnect [11] models and with DIP [19] on the Google Map [6], Facade [17], T91 [8], Urban100 [4], Document [12], BSD200 [10], and KLH [1] datasets. The pre-trained models show color artifacts and a lower ability to capture internal similarity within a single image.

4.2 Internal Similarity

As shown in Fig. 3, the higher the internal similarity score, the better the restoration performance achieved by our method. Our method recovers the hole region almost perfectly when the internal similarity is extremely large (first column of Fig. 3). To compute the internal similarity, we first extract features from the relu5_1 layer of a pre-trained VGG19 [15]. Next, we compute pixel-wise cosine similarity between feature vectors, which yields a similarity map of size $HW \times HW$, where $H$ and $W$ are the height and width of the feature map, respectively. Finally, we average the similarity map to obtain the internal similarity score.
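The score can be computed in a few lines; the sketch below follows the description above (VGG19 relu5_1 features, pixel-wise cosine similarity, then averaging), with the torchvision layer indexing and normalization details being our own choices rather than the authors' code.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

def internal_similarity_score(img):
    """img: (1, 3, H, W), ImageNet-normalized. Returns the mean pairwise
    cosine similarity between relu5_1 feature vectors of VGG19."""
    vgg = models.vgg19(pretrained=True).features[:30].eval()  # layers up to relu5_1
    with torch.no_grad():
        feat = vgg(img)                        # (1, C, h, w)
    f = feat.flatten(2).squeeze(0).t()         # (h*w, C): one feature vector per pixel
    f = F.normalize(f, dim=1)
    sim = f @ f.t()                            # (h*w, h*w) cosine-similarity map
    return sim.mean().item()                   # average -> internal similarity score
```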

Model    | PT | TTA | One St. | BN, NN | PSNR  | SSIM  | LPIPS
EC       | ✓  |     |         |        | 25.63 | 0.847 | 0.1283
EC-TTA   | ✓  | ✓   |         |        | 28.52 | 0.884 | 0.1095
EC-TTA   | ✓  | ✓   | ✓       |        | 28.57 | 0.883 | 0.0882
AdaFill  | ✓  | ✓   | ✓       | ✓      | 28.57 | 0.882 | 0.0837
ZeroFill |    | ✓   | ✓       | ✓      | 27.47 | 0.878 | 0.1108
Table 2: Ablation study. PT: pre-training, TTA: test-time adaptation, One St.: one-stage network, BN, NN: batch normalization and nearest-neighbor upsampling with convolution.

4.3 Ablation Study

We conduct ablation studies to find the optimal structure for test-time adaptation on a single image; the results are compared in Table 2. For the ablation experiments, we use the first 10 images from each dataset in Table 1. We modify two things in the EdgeConnect [11] baseline structure. The first is using only the second stage of EdgeConnect to reduce the number of parameters. The second is replacing instance normalization [18] and transposed convolution with batch normalization [5] and nearest-neighbor upsampling followed by convolution, as sketched below. These modifications increase the perceptual restoration quality and greatly reduce color and other annoying artifacts. The results also show that our non-pre-trained model, ZeroFill, even performs slightly better than the pre-trained model.
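A sketch of the decoder-block swap described above; the channel and kernel sizes are illustrative and not the exact EdgeConnect configuration.

```python
import torch.nn as nn

# Modified block (Sec. 4.3): nearest-neighbor upsampling + convolution,
# with batch normalization instead of instance normalization.
def up_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# Original EdgeConnect-style block, shown for comparison:
def up_block_baseline(in_ch, out_ch):
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```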

5 Conclusion

We propose a simple test-time adaptation scheme for image inpainting called AdaFill, along with ZeroFill, a version that requires no pre-training. Results show that previous pre-trained models cannot generalize well to out-of-distribution images. In contrast, our methods can overcome this domain gap and fully exploit the internal similarity of a test image. As future work, meta-learning [16] could be adopted to reduce the test time for practical usage.

References

  • [1] (2004) Automatic particle selection: results of a comparative study. Vol. 145, pp. 3–14.
  • [2] cosmicad. Blood cell count and detection dataset.
  • [3] D. Glasner, S. Bagon, and M. Irani (2009) Super-resolution from a single image. In 2009 IEEE 12th International Conference on Computer Vision, pp. 349–356.
  • [4] J. Huang, A. Singh, and N. Ahuja (2015) Single image super-resolution from transformed self-exemplars. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5197–5206.
  • [5] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
  • [6] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In CVPR.
  • [7] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [8] W. Lai, J. Huang, N. Ahuja, and M. Yang (2017) Deep Laplacian pyramid networks for fast and accurate super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition.
  • [9] G. Liu, F. A. Reda, K. J. Shih, T. Wang, A. Tao, and B. Catanzaro (2018) Image inpainting for irregular holes using partial convolutions. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 85–100.
  • [10] D. Martin, C. Fowlkes, D. Tal, and J. Malik (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int'l Conf. Computer Vision, Vol. 2, pp. 416–423.
  • [11] K. Nazeri, E. Ng, T. Joseph, F. Z. Qureshi, and M. Ebrahimi (2019) EdgeConnect: generative image inpainting with adversarial edge learning. arXiv preprint arXiv:1901.00212.
  • [12] National Institute of Standards and Technology (2009) NIST Special Database 2.
  • [13] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in PyTorch. In NIPS-W.
  • [14] A. Shocher, N. Cohen, and M. Irani (2018) "Zero-shot" super-resolution using deep internal learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3118–3126.
  • [15] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • [16] J. W. Soh, S. Cho, and N. I. Cho (2020) Meta-transfer learning for zero-shot super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3516–3525.
  • [17] R. Tyleček and R. Šára (2013) Spatial pattern templates for recognition of objects with regular structure. In Proc. GCPR, Saarbrücken, Germany.
  • [18] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022.
  • [19] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2018) Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9446–9454.
  • [20] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang (2018) Generative image inpainting with contextual attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5505–5514.
  • [21] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang (2019) Free-form image inpainting with gated convolution. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4471–4480.
  • [22] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang (2018) The unreasonable effectiveness of deep features as a perceptual metric. In CVPR.
  • [23] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba (2017) Places: a 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • [24] M. Zontak and M. Irani (2011) Internal statistics of a single natural image. In CVPR 2011, pp. 977–984.

6 Supplementary Materials

6.1 Experimental Configuration

Dataset: For images larger than 256×256, we resize the shorter side to 256 pixels and take a random crop to obtain a final resolution of 256×256. For datasets with more than 100 images, we use only the first 100. For the Document [12] dataset, we downsample by a factor of 4 and take a random crop to obtain the 256×256 resolution.

Mask: We use irregular masks [21] with hole rates of 10 to 30%. For random box masks, we use rates of 5 to 15%. We average the results from the irregular and random box masks to obtain the final reported values.
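For reference, a minimal sketch of a random box mask at the 5-15% rate used above; the square hole shape and sampling details are our own assumptions.

```python
import torch

def random_box_mask(h, w, min_rate=0.05, max_rate=0.15):
    """Return a (1, 1, h, w) mask with a square hole covering 5-15% of the image (1 = hole)."""
    rate = torch.empty(1).uniform_(min_rate, max_rate).item()
    side = max(1, int(round((rate * h * w) ** 0.5)))        # square hole with the target area
    top = torch.randint(0, max(1, h - side), (1,)).item()
    left = torch.randint(0, max(1, w - side), (1,)).item()
    mask = torch.zeros(1, 1, h, w)
    mask[..., top:top + side, left:left + side] = 1.0
    return mask
```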

6.2 Results and Comparisons

6.2.1 BSD200 [10] Dataset

Figure 5: Qualitative comparison on the BSD200 [10] dataset (columns: GT, Masked, GC [21], EC [11], DIP [19], Ours).

6.2.2 General100 [8] Dataset

Figure 6: Qualitative comparison on the General100 [8] dataset.

6.2.3 T91 [8] Dataset

Figure 7: Qualitative comparison on the T91 [8] dataset.
Figure 8: Qualitative comparison on the T91 [8] dataset.

6.2.4 Urban100 [4] Dataset

Figure 9: Qualitative comparison on the Urban100 [4] dataset.
Figure 10: Qualitative comparison on the Urban100 [4] dataset.

6.2.5 Google Map [6] Dataset

Figure 11: Qualitative comparison on the Google Map [6] dataset.
Figure 12: Qualitative comparison on the Google Map [6] dataset.

6.2.6 Facade [17] Dataset

Figure 13: Qualitative comparison on the Facade [17] dataset.
Figure 14: Qualitative comparison on the Facade [17] dataset.

6.2.7 BCCD [2] Dataset

Figure 15: Qualitative comparison on the BCCD [2] dataset.

6.2.8 KLH [1] Dataset

Figure 16: Qualitative comparison on the KLH [1] dataset.

6.2.9 Document [12] Dataset

Figure 17: Qualitative comparison on the Document [12] dataset.
