Time-Travel Rephotography

12/22/2020 ∙ by Xuan Luo, et al.

Many historical people are captured only in old, faded, black and white photos, that have been distorted by the limitations of early cameras and the passage of time. This paper simulates traveling back in time with a modern camera to rephotograph famous subjects. Unlike conventional image restoration filters which apply independent operations like denoising, colorization, and superresolution, we leverage the StyleGAN2 framework to project old photos into the space of modern high-resolution photos, achieving all of these effects in a unified framework. A unique challenge with this approach is capturing the identity and pose of the photo's subject and not the many artifacts in low-quality antique photos. Our comparisons to current state-of-the-art restoration filters show significant improvements and compelling results for a variety of important historical people.


1 Introduction

Abraham Lincoln’s face is iconic – we recognize him instantly. But what did he really look like? Our understanding of his appearance is based on grainy, black and white photos from well over a century ago. Antique photos provide a fascinating glimpse of the distant past. However, they also depict a faded, monochromatic world very different from what people at the time experienced. Old photos distort appearance in other less obvious ways. For example, the film of Lincoln’s era was sensitive only to blue and UV light, causing cheeks to appear dark, and overly emphasizing wrinkles by filtering out skin subsurface scatter which occurs mostly in the red channel. Hence, the deep lines and sharp creases that we associate with Lincoln’s face (Figure 1) are likely exaggerated by the photographic process of the time.

To see what Lincoln really looked like, one could travel back in time to take a photo of him with a modern camera, and share that photo with the (modern) world. Lacking a time machine, we instead seek to simulate the result, by projecting an old photo into the space of modern images, a process that we call time-travel rephotography (Figure 1).

Specifically, we start with an antique photo as reference, and wish to generate the high-resolution, high-quality image that a modern camera would have produced of the same subject. This problem is challenging, as antique photos have a wide range of defects due both to the aging process (fading and dust) and to limitations of early cameras, film, and development processes (low resolution, noise and grain, limited color sensitivity, development artifacts). One approach is to try to restore the image by applying digital filters that attempt to undo these defects, e.g., noise removal, image deblurring, contrast adjustment, super-resolution, and colorization. A challenge with this approach is that the properties of old film and the aging process have not been fully characterized; hence, undoing them is an ill-posed problem. Instead, we propose to project the antique photo into the space of modern images, using generative tools like StyleGAN2 [33, 34]. We compare both approaches and show that the projection method generally yields superior results.

Our main contribution is a unified framework that achieves deblurring, super-resolution, noise removal, contrast adjustment, and colorization in one step. Our approach blends physics with statistics, employing physically motivated models of image formation to capture known processes like camera response, complemented with generative, statistically based methods to complete details, like skin pores, where they are missing in the input. Specifically, we model film sensitivity, addressing for the first time different antique photographic emulsions (blue-sensitive, orthochromatic, and panchromatic). We also show that sharp photos with good exposure and contrast can be restored by simulating the image degradation process, including defocus blur and camera response function (CRF) fitting. A key contribution is to show how the StyleGAN2 framework [33, 34] can be generalized to include these physically inspired models of image formation, which significantly improve results. We demonstrate compelling time-travel portraits of many well-known historical figures from the last two centuries, including presidents (e.g., Abraham Lincoln), authors (e.g., Franz Kafka), artists (e.g., Frida Kahlo), and inventors (e.g., Thomas Edison).

2 Related Work

Image Restoration.

To restore degraded images, most prior work focuses on a single type of degradation, e.g., denoising [9, 18, 14, 41, 66, 72, 73, 74, 77], deblurring [38, 49, 68, 67, 60], JPEG image deblocking [16, 22, 64], super-resolution [6, 17, 36, 40, 48, 62, 69], etc. Face-specific methods include deblurring [24, 51] and super-resolution [10, 21, 48, 54, 56]. To address restoration in the presence of multiple artifacts, researchers have proposed using reinforcement learning [71] or attention-based mechanisms [59] to select the best combination of restoration operations to apply to each image. The concurrent work by Wan et al. [63] also restores portraits suffering from multiple artifacts; however, it does not perform super-resolution, and its restoration quality degrades at high resolution, as evaluated in Sec. 5. None of the aforementioned techniques address colorization.

Colorization research can be categorized into scribble-based, exemplar-based, and learning-based methods. Early work [28, 42, 46, 53, 61, 70] employed manual specification of target colors for parts of the image via sparse scribbles drawn on the image. To reduce user-intensive work, an alternative is to transfer color statistics from a (manually specified) reference image [11, 13, 23, 26, 30, 45, 65]. Identifying a suitable reference, however, is a research topic in itself. Most related are fully automated colorization methods that use machine learning on a large dataset [12, 15, 29, 31, 39, 75, 76, 78]. We compare with many of these methods in our results section.

Despite tremendous progress made in these individual steps, no prior work addresses restoration, colorization and super-resolution in one framework. We demonstrate that addressing all of these together produces better results compared to a sequence of individual best-of-breed restoration operators.

Face latent space embedding.

Embedding faces into the latent space of GANs has been an active field of research. In a sequence of impressive results, Karras et al. [33, 34] demonstrate the ability to synthesize high-resolution human faces and perform a variety of image editing operations. Their StyleGAN2 approach is capable of reproducing real photos, but the reconstructions often fail to preserve identity, due to the limited flexibility of the underlying W latent space. To address this identity-shift problem, several researchers [3, 4, 8] project images to an extended W+ latent space with different style codes for each scale. While these methods better capture identity, they do not address the restoration problem; given a monochrome, blurred, low-contrast target photo, they optimize for a monochrome, blurred, low-contrast result. Pixel2style2pixel [55] proposes a feed-forward network for projecting to W+; however, it still predicts a monochrome output given a monochrome input and exhibits identity shift.

3 Problem Statement

Figure 2: Given an input antique image, we use a feed-forward encoder to compute an exemplar (sibling) in the latent space of StyleGAN2. We use the sibling as a starting point in the latent code optimization step, which seeks an image that resembles the input. We guide the optimization using a degradation module that simulates the unique properties of antique images while maintaining the color and skin texture details present in the sibling.

Our goal is to simulate traveling back in time and rephotographing historical figures with a modern camera. We call this time-travel rephotography, adapting the term rephotography, which traditionally means “the act of repeat photography of the same site, with a time lag between the two images” [1, 7]. We focus on portraits recorded roughly a century ago, from the late 1800s through the early 1900s, not long after cameras were invented. These are some of the most challenging photos to restore, due both to loss of quality through the aging process and to the limitations of early film.

Photographic film has evolved significantly since its invention. The first glass-plate emulsions were sensitive only to blue and ultraviolet light [50]. Orthochromatic emulsions [50], introduced in 1873, provided sensitivity to both green and blue light. Photographic portraits of that era rendered skin poorly, artificially darkening lips and exaggerating wrinkles, creases, and freckles, due to the lack of red sensitivity. In particular, they underestimate the effect of subsurface scattering, which gives skin its characteristic smooth appearance and is more prevalent in the longer wavelengths [32]. Panchromatic film, sensitive to red, green, and blue, first appeared in 1907, yet orthochromatic films remained popular through the first half of the 20th century [2].

To simulate rephotographing historical people with a modern camera, we must account for these differences in the color sensitivity of antique film, in addition to blur, fading (poor exposure and contrast), noise, low resolution, and other artifacts of antique photos.

4 Method

Figure 3: Impact of each component of our color and detail transfer losses (columns: input, ours, three ablations, and the sibling). Skin and eye details are poorly reconstructed without the contextual loss; low-frequency color artifacts appear without the color transfer loss (e.g., in the eye and forehead regions); despite having pleasing colors overall, both artifacts become more pronounced when both losses are removed. Subject: Georg Cantor, circa 1910.

We seek to synthesize a modern photo of a historical person, using an antique black-and-white photo as reference. Our approach is based on the idea of projecting the antique photo into the space of modern high-resolution color images represented by the StyleGAN2 generative model [35].

Our approach resembles previous techniques that optimize the latent representation of StyleGAN2 to synthesize an image [8]. However, in our case, we do not want to exactly fit the antique image, as that would produce a black-and-white result with many of the same artifacts. Instead, we want to find a StyleGAN2-synthesized result depicting a subject who, if captured by an antique camera, would match the input.

A first step is to convert the StyleGAN2 output to grayscale before comparing it with the antique input image. This naive approach is poorly constrained, however, and leads to unrealistic colorized results. Instead, we employ an additional exemplar image as a reference, one that has facial features similar to the input's yet contains high-frequency details and natural color and lighting. Section 4.1 explains how we compute such an exemplar automatically. We call it the sibling image, as it resembles characteristics of the input while having a different identity, and Section 4.2 introduces losses that constrain the optimization and preserve the color and details present in the sibling image.

To further reduce the perceived identity gap between the input image and the modern portrait, we design reconstruction losses specifically suited for antique images (Section 4.3). A key contribution is a degradation module that simulates the image formation model of antique photos and is applied to the StyleGAN2 result before comparing it with the input antique photo. The degradation module accounts for different types of film substrate, scanning processes, and camera response curves, which together allow for improved rephotography results. We provide details on the latent code optimization in Section 4.4 and a system overview in Figure 2.

4.1 Sibling Encoder

Given a low-resolution grayscale reference image as input, we seek to generate a high-resolution color sibling image with realistic color and similar facial features. To this end, we follow previous methods [8, 55, 79] and train a feed-forward encoder that takes the antique input image and outputs a 512-dimensional StyleGAN2 latent code. This code is then converted to the sibling image using the pre-trained StyleGAN2 synthesis network. The sibling encoder network is trained using random samples of StyleGAN2 latent codes and their corresponding images, downsampled to the encoder's input resolution and converted to grayscale based on the emulsion type (see details in Sec. 4.3).
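For concreteness, the sketch below shows how a sibling could be produced at inference time. It is a minimal illustration, not the authors' code: the encoder E, the pretrained generator G, and its synthesis(...) interface are assumed placeholders, and the single 512-dimensional code is simply broadcast to all style layers.

```python
import torch

@torch.no_grad()
def predict_sibling(antique_img, E, G):
    """Map a grayscale antique photo to a high-resolution color sibling.

    antique_img: (1, 1, H, W) tensor in [0, 1]  -- low-res grayscale input
    E: feed-forward sibling encoder predicting a StyleGAN2 latent code
    G: pretrained StyleGAN2 generator with a .synthesis(...) network
    (E, G, and their interfaces are assumptions for illustration only.)
    """
    w = E(antique_img)                  # (1, 512) predicted latent code
    # Broadcast the single code to all 18 style layers (a W -> W+ copy).
    w_plus = w.unsqueeze(1).repeat(1, 18, 1)
    sibling = G.synthesis(w_plus)       # high-resolution color image
    return sibling, w_plus
```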

4.2 Sibling Color And Detail Transfer

To further constrain the colors and skin details to match the sibling (Fig. 3), we introduce a color transfer loss that encourages the distribution of the outputs of StyleGAN2's ToRGB layers, which compose the final image, to be similar to that of the sibling. We use a formulation inspired by style loss [19], applied to the ToRGB layer outputs. In our implementation, we use a robust Huber loss when comparing elements of the covariance matrices.
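A minimal sketch of this style-loss-inspired formulation is shown below, assuming the ToRGB layer outputs of the optimized image and the sibling are available as lists of tensors. The per-layer weighting and PyTorch's smooth L1 (a Huber-style loss) stand in for the paper's exact choices.

```python
import torch
import torch.nn.functional as F

def channel_covariance(feat):
    """Covariance matrix of per-pixel channel values of one ToRGB output.

    feat: (B, C, H, W) tensor (C = 3 for a ToRGB layer output).
    Returns a (B, C, C) covariance matrix.
    """
    b, c, h, w = feat.shape
    x = feat.reshape(b, c, h * w)
    x = x - x.mean(dim=2, keepdim=True)
    return x @ x.transpose(1, 2) / (h * w - 1)

def color_transfer_loss(torgb_outputs, sibling_torgb_outputs):
    """Huber-style loss between covariance statistics of matching ToRGB outputs.

    A sketch of the color transfer loss of Sec. 4.2; equal per-layer weights
    are assumed here rather than taken from the paper.
    """
    loss = 0.0
    for f_out, f_sib in zip(torgb_outputs, sibling_torgb_outputs):
        loss = loss + F.smooth_l1_loss(channel_covariance(f_out),
                                       channel_covariance(f_sib))
    return loss
```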

Although the color transfer loss encourages matching the style of the image, we found that details like skin texture were not transferred properly (Fig. 3). To further encourage detail synthesis, one could add a reconstruction loss between the sibling and the generated image, but such a loss would be very sensitive to misalignments between the sibling and the StyleGAN2 result and would encourage identity shift. We thus introduce a contextual loss [47] between the VGG features of the generated image and the sibling. The loss matches each input image feature with features in the target and minimizes the best-match distance.
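The sketch below keeps only the "best match" idea described here: each generated-image feature is paired with its most similar sibling feature under cosine distance, and the mean distance is minimized. The published contextual loss [47] uses a normalized, softmax-weighted similarity, so treat this as a simplified stand-in rather than the authors' exact loss.

```python
import torch

def simplified_contextual_loss(feat_gen, feat_sib, eps=1e-8):
    """Simplified contextual loss between two VGG feature maps.

    feat_gen, feat_sib: (C, H, W) feature maps of the output and the sibling.
    """
    c = feat_gen.shape[0]
    x = feat_gen.reshape(c, -1).t()             # (N, C) generated features
    y = feat_sib.reshape(c, -1).t()             # (M, C) sibling features
    x = x / (x.norm(dim=1, keepdim=True) + eps)
    y = y / (y.norm(dim=1, keepdim=True) + eps)
    cos_dist = 1.0 - x @ y.t()                  # (N, M) cosine distances
    best = cos_dist.min(dim=1).values           # best sibling match per feature
    return best.mean()
```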

4.3 Reconstruction Losses for Antique Images

Figure 4: Effect of the film spectral-sensitivity model (input vs. reconstructions under different film models). Top: Abraham Lincoln circa 1863, when the negatives were sensitive only to blue light; the panchromatic reconstruction has exaggerated wrinkles and unnatural colors. Bottom: Emmeline Pankhurst circa 1913, when both orthochromatic and panchromatic film existed; the orthochromatic reconstruction yields a more realistic image.
Figure 5: Camera response function fitting (CRF-F) improves contrast and image exposure (e.g., forehead highlights); panels show the input, results without and with CRF-F, and the sibling. Subject: Louis Pasteur (1822–1895).
Figure 6: Impact of including image blur in our reconstruction loss (input vs. results without and with blur simulation). Best viewed full screen. Top to bottom: Rosalind Franklin (1945) and Ruth Bader Ginsburg (1953).

Rather than fit the antique input image exactly, we seek a loss function that is robust to the defects of antique photos. We approach this by introducing a reconstruction loss that applies a series of modifications to the StyleGAN2 result before comparing it to the input antique image, with the purpose of providing better guidance for the latent code optimization step. Concretely, the loss compares the input image with a degraded version of the StyleGAN2 result, where the degradation process attempts to simulate how the generated modern image would appear if it were taken with an antique camera. In the following, we design the degradation process to account for potential blur introduced in the scanning process or the negative's limitation in resolving details, the spectral sensitivity of early negatives, different camera response functions, and the low input resolution.

Blur We apply a Gaussian blur with a user-provided standard deviation to obtain the degraded result; values between 0 and 4 work well in our experiments. This blur approximates loss of detail due to aging and the scanning process, in addition to blur in the original exposure. Fig. 6 illustrates the benefit of simulating blur during optimization.

Antique Film Spectral Sensitivity We convert StyleGAN2's output to grayscale. The grayscale conversion must account for the unique sensitivity of early film, which was far more sensitive to blue light than to red. In particular, we extract the blue channel for blue-sensitive photos, average the blue and green intensities to approximate orthochromatic photos [20], and use a standard grayscale conversion for panchromatic photos. As shown in Fig. 4, choosing the right film model can make a significant difference. When the exact spectral sensitivity of the negative is unknown (based on the photo's description, capture time, etc.), the user can simply choose whichever of the three models produces the best result.
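A minimal sketch of these three conversions follows. The BT.601 luma weights used for the panchromatic case are an assumption for the "standard grayscale conversion"; the paper does not spell out the exact weights in this text.

```python
import torch

def film_grayscale(rgb, film_type):
    """Convert an RGB StyleGAN2 output to grayscale under an antique film model.

    rgb: (B, 3, H, W) tensor in [0, 1], channels ordered R, G, B.
    """
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    if film_type == "blue-sensitive":
        return b                                   # blue channel only
    if film_type == "orthochromatic":
        return 0.5 * (b + g)                       # average of blue and green
    if film_type == "panchromatic":
        return 0.299 * r + 0.587 * g + 0.114 * b   # assumed standard luma weights
    raise ValueError(f"unknown film type: {film_type}")
```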

Camera Response Function We model the unknown camera response function (CRF) of the input old photo with bias, gain, and gamma parameters that are optimized over, initialized to fixed values at the start of optimization. We observe in Fig. 5 that CRF fitting helps the output avoid the poor exposure and contrast of the input and more closely resemble that of the sibling.
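Putting the pieces together, the sketch below composes blur, film-specific grayscale conversion (reusing the film_grayscale helper above), and a learnable bias/gain/gamma response. The gain·x^gamma + bias parametrization is one plausible reading of "bias, gain and gamma parameters", not the authors' exact formula, and downsampling to the input resolution is handled separately in the reconstruction loss.

```python
import torch
import torchvision.transforms.functional as TF

class Degradation(torch.nn.Module):
    """Sketch of the degradation module of Sec. 4.3 (assumed parametrization)."""

    def __init__(self, film_type="blue-sensitive", blur_sigma=2.0):
        super().__init__()
        self.film_type = film_type
        self.blur_sigma = blur_sigma            # user-provided; 0-4 works well
        self.gain = torch.nn.Parameter(torch.tensor(1.0))
        self.gamma = torch.nn.Parameter(torch.tensor(1.0))
        self.bias = torch.nn.Parameter(torch.tensor(0.0))

    def forward(self, rgb):
        gray = film_grayscale(rgb, self.film_type)      # see sketch above
        if self.blur_sigma > 0:
            k = int(2 * round(3 * self.blur_sigma) + 1)  # odd kernel size
            gray = TF.gaussian_blur(gray, kernel_size=k, sigma=self.blur_sigma)
        # Simple camera response curve with optimizable parameters.
        return self.gain * gray.clamp(min=1e-6) ** self.gamma + self.bias
```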

Reconstruction Loss Using the degradation process outlined above, we now define our reconstruction losses. To maintain the overall face structure, we downsample both the degraded StyleGAN2 result and the input image to a common low resolution and compute a perceptual loss between them using a combination of VGG [57] and VGG-Face [52] features. To ensure that the eyes are properly reconstructed, we add an additional loss: we downsample the degraded result and the input to the original input resolution, crop the eye regions, and enforce a VGG-based perceptual loss between these eye crops. Writing the degradation process as φ, the downsampling operator as D, and the eye-crop operator as E, the complete reconstruction loss is:

L_rec = L_perc( D(φ(I_out)), D(I_in) ) + L_eye( E(φ(I_out)), E(I_in) ),    (1)

where I_out is the StyleGAN2 result and I_in is the input antique image.
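The sketch below instantiates Eq. (1) with a plain VGG16 perceptual loss as a stand-in for the VGG + VGG-Face combination; the eye_box coordinates and the lam_eye weight are hypothetical inputs, and the layer indices correspond to relu1_2, relu2_2, relu3_3, and relu4_3.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

class PerceptualLoss(torch.nn.Module):
    """L1 distance between VGG16 feature maps (stand-in for VGG + VGG-Face)."""

    def __init__(self, layers=(3, 8, 15, 22)):   # relu1_2 .. relu4_3
        super().__init__()
        self.vgg = vgg16(weights="IMAGENET1K_V1").features.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.layers = set(layers)

    def forward(self, x, y):
        loss = 0.0
        for i, block in enumerate(self.vgg):
            x, y = block(x), block(y)
            if i in self.layers:
                loss = loss + F.l1_loss(x, y)
        return loss

def reconstruction_loss(degraded, antique, eye_box, perc, lam_eye=1.0):
    """Eq. (1): perceptual loss on downsampled images plus an eye-crop term.

    degraded, antique: (B, 1, ...) grayscale tensors; eye_box = (top, left, h, w)
    and lam_eye are illustration-only assumptions.
    """
    h, w = antique.shape[-2:]
    d = F.interpolate(degraded, size=(h, w), mode="bilinear", align_corners=False)
    loss = perc(d.repeat(1, 3, 1, 1), antique.repeat(1, 3, 1, 1))
    t, l, bh, bw = eye_box
    loss = loss + lam_eye * perc(d[..., t:t + bh, l:l + bw].repeat(1, 3, 1, 1),
                                 antique[..., t:t + bh, l:l + bw].repeat(1, 3, 1, 1))
    return loss
```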

4.4 Latent Code Optimization

We recover our final modern portrait result by optimizing the latent code of StyleGAN2 to minimize all of the losses presented above. Similar to previous hybrid methods [8, 79], we initialize the latent code using the sibling code, and follow Abdal et al. [3] in optimizing over an extended W+ code that contains 18 different copies of the 512-dimensional latent code, one for each layer from the coarsest to the finest resolution. As noted in StyleGAN2 [35], these codes correspond roughly to different perceptual aspects of the image: the coarser spatial codes determine the overall structure of the face (identity, pose, expression, etc.), whereas the finer layers decide aspects like skin tone, skin texture, and lighting. We exploit this property and optimize only the codes for layers up to 64 × 64 in resolution, as they are sufficient to capture the identity and characteristic facial features. The finer spatial codes are copied from the sibling.
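One way to realize this coarse/fine split is sketched below. The default num_coarse value assumes the usual two-styles-per-resolution mapping of an 18-layer W+ code, so "layers up to 64 × 64" is an assumption about that mapping rather than a figure taken from the paper.

```python
import torch

def make_optimizable_wplus(sibling_wplus, num_coarse=10):
    """Split an (B, 18, 512) W+ code into optimizable coarse layers and
    fine layers frozen to the sibling's values (Sec. 4.4).

    num_coarse = 10 assumes two style layers per resolution, covering
    4x4 through 64x64; adjust if the generator uses a different mapping.
    """
    coarse = sibling_wplus[:, :num_coarse].clone().requires_grad_(True)
    fine = sibling_wplus[:, num_coarse:].detach()
    return coarse, fine

def assemble_wplus(coarse, fine):
    # Concatenate back into the full 18-layer W+ code fed to the generator.
    return torch.cat([coarse, fine], dim=1)
```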

4.5 Implementation Details

Our method, including sibling computation and latent code optimization, takes about 10 minutes on a single NVIDIA TITAN X GPU to compute a result.

Latent Code Optimization Rather than optimizing all latent code layers (4–64) at once, we achieve better results by first optimizing the coarse codes (4–32) for 250 iterations to obtain an intermediate result. We then set this intermediate result as our new sibling, transfer color and details from it using the color transfer and contextual losses, and optimize the latent codes of resolutions 4 to 64 together for 750 iterations, producing the final output. Note that the color transfer loss is only enforced on the ToRGB layers corresponding to the latent codes being optimized. To explore the latent space more comprehensively, we also add ramped-down noise to the latent code, as in StyleGAN2's projection method [35]. We use the RAdam optimizer [44] with default parameters and separate learning rates for the style codes and the camera response function parameters. Each loss is combined with a fixed weight; see the supplementary for details on the specific layers used for all the losses.
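The loop below sketches this schedule under simplifying assumptions: the learning rates and noise scale are placeholders (the exact values are not reproduced in this text), all optimizable codes are updated throughout, and a user-supplied loss_fn uses the stage flag to switch which terms are active (e.g., enabling the color transfer and contextual losses only in the second stage). It reuses assemble_wplus from the previous sketch.

```python
import torch

def optimize_latent(coarse, fine, crf_params, loss_fn,
                    coarse_iters=250, total_iters=1000,
                    lr_style=0.1, lr_crf=1e-3):
    """Two-stage latent code optimization with ramped-down latent noise."""
    opt = torch.optim.RAdam([
        {"params": [coarse], "lr": lr_style},
        {"params": list(crf_params), "lr": lr_crf},
    ])
    for it in range(total_iters):
        # Noise strength ramps linearly to zero over the first 75% of iterations.
        noise_strength = 0.05 * max(0.0, 1.0 - it / (0.75 * total_iters))
        w_noisy = coarse + noise_strength * torch.randn_like(coarse)
        stage = 1 if it < coarse_iters else 2
        loss = loss_fn(assemble_wplus(w_noisy, fine), stage)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return coarse.detach()
```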

Sibling Encoder For each film type described in Section 4.3, we train a sibling ResNet18 encoder [25] using StyleGAN2-generated samples that are converted to grayscale using the corresponding method. We use an L1 loss between the predicted and ground-truth latent codes and apply color jitter, contrast, and exposure augmentations during training. Please see the supplementary for more details.
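A minimal training-step sketch is given below. The generator's mapping/synthesis interfaces, the single 512-dimensional target code, the jitter ranges, and the 256 × 256 encoder input size are all assumptions for illustration; film_grayscale is the helper sketched in Sec. 4.3.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18
from torchvision.transforms import ColorJitter

def build_sibling_encoder(latent_dim=512):
    """ResNet18 [25] with its classifier head replaced by a latent-code regressor."""
    net = resnet18(weights=None)
    net.fc = torch.nn.Linear(net.fc.in_features, latent_dim)
    return net

def training_step(encoder, G, film_type, optimizer,
                  jitter=ColorJitter(brightness=0.2, contrast=0.2, hue=0.05),
                  batch_size=4):
    """One sibling-encoder training step on StyleGAN2-generated data (Sec. 4.5)."""
    with torch.no_grad():
        z = torch.randn(batch_size, 512)
        w = G.mapping(z)                                      # target latent codes
        imgs = G.synthesis(w.unsqueeze(1).repeat(1, 18, 1))   # rendered faces
        imgs = jitter((imgs + 1) / 2)                         # to [0, 1], then augment
        gray = film_grayscale(imgs, film_type)                # emulsion-specific grayscale
        gray = F.interpolate(gray, size=256, mode="bilinear", align_corners=False)
    pred = encoder(gray.repeat(1, 3, 1, 1))                   # ResNet expects 3 channels
    loss = F.l1_loss(pred, w)                                 # L1 loss on latent codes
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```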

5 Evaluation

Method               Zhang [76]   InstC [58]   DeOldify [5]   Zhang (FFHQ)   Ours
FID                  162.5        158.7        160.2          160.1          154.3
Users prefer ours    –            91.3%        75.9%          68.6%          –

Table 1: Two quantitative assessments of our approach compared to state-of-the-art methods. Top: FID with respect to the FFHQ dataset for each method (lower is better). Bottom: human preferences from our user study.
Figure 7: Visual comparison with the top-performing baseline, Zhang (FFHQ), including zoomed-in crops. The input and sibling images are shown as insets in the first and second columns, respectively. From top to bottom: Thomas Edison (1877), Frida Kahlo (1926), Franz Kafka (1923).
Figure 8: Comparisons of our approach to a pipeline built from published techniques for restoring, colorizing, and enlarging antique images (DeOldify, InstColorization, Zhang, and Zhang (FFHQ)). We evaluated four prior colorization algorithms, detailed in the paper, all of which fail to achieve the same realistic skin appearance and overall image quality as our approach. Top to bottom: Dorothy Hodgkin (1947), Henry Ford (1928), and Niels Bohr (1922).

We collected 54 photos of 44 unique historical figures dating from the late 19th century to the early 20th century, including people like Abraham Lincoln, Thomas Edison, Marie Curie, and Franz Kafka (e.g., Fig. 1). We cropped out the head region using the face alignment method of Karras et al. [33] and resized the crops to a fixed maximum resolution. For the film model, we use the blue-sensitive model for photos taken before 1873, manually select between blue-sensitive and orthochromatic for images from 1873 to 1907, and select among all three models for photos taken afterwards. In our experiments we manually select the input blur kernel. Figures 1, 3, 4, 5, 6, and 7 all show results of our approach. It is important to zoom into these images and evaluate them at the full target resolution. Our supplemental data includes results for all 54 photos.

There exists no published baseline method that performs the full complement of image restoration operators needed for time-travel rephotography, i.e., noise and blur removal, contrast adjustment, colorization, and super-resolution. We therefore compare our approach to sequentially applying state-of-the-art methods for each of these tasks separately. As a first step, we use Wan et al. [63] (research concurrent with our own), which was specifically designed for restoring antique portraits, to remove noise and artifacts in the input image. Note that we experimented with restoring directly at the full target resolution at this stage, but found that [63] produced slightly blurrier outputs compared to using a separate super-resolution technique. As a second step, we colorize the image. We evaluated several colorization techniques, including DeOldify [5], InstColorization [58], and Zhang et al. [76]. All of these methods are designed for generic scenes and can struggle with antique portraits. We therefore retrained Zhang's colorization network on the FFHQ dataset of face images [33], denoted Zhang (FFHQ). We also augmented this training dataset by applying random Gaussian blur and noise to make their method more robust to antique imagery. As a final step, we use SRFBN [43] (BI model) to super-resolve the colorized result to our target resolution.

Figure 8 shows a comparison between our method and the pipeline described above, for each of the four colorization methods we evaluated. Zhang (FFHQ) outperforms the other colorization techniques. Figure 7 shows a higher-resolution comparison between our method and Zhang (FFHQ). None of these other methods reproduces skin appearance as realistic, or image details as sharp and coherent, as ours.

Figure 9: Our approach can struggle to render uncommon image features such as certain (facial) hairstyles, accessories, and clothing. Extremely poor image quality or severely compressed intensity gamuts may also limit the quality of the result. From left to right: Alexander Graham Bell (1904), Empress Dowager Cixi (1903), Alexandre Dumas (1855), and Grace Hopper (1906 - 1992).
Figure 10: Wrong gender or ethnicity in sibling prediction can sometimes affect the result. Changing to a different sibling with the correct gender and ethnicity helps to alleviate this issue. Left: Martin Luther King Jr. (1964). Right: Edith Cowan (1900).

We also performed a quantitative assessment using the Fréchet Inception Distance (FID) [27], a measure of realism and image quality. Table 1 reports the FID computed over the set of 54 output images for each technique as compared to the FFHQ dataset [33]. Our method achieves a lower (better) FID score compared to these other methods.
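For reference, FID reduces to the Fréchet distance between Gaussians fitted to Inception-network activation statistics of the two image sets. The sketch below computes that distance from precomputed means and covariances; extracting the activations themselves (e.g., with an Inception-v3 network) is omitted, so this is a minimal illustration of the metric rather than the evaluation code used in the paper.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    """FID [27]: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).

    mu*, sigma*: mean vectors and covariance matrices of Inception activations.
    """
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if not np.isfinite(covmean).all():
        # Numerical fallback: add a small offset to the covariance diagonals.
        offset = np.eye(sigma1.shape[0]) * eps
        covmean, _ = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset), disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2 * covmean))
```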

Finally, we performed a user study involving 21 participants not directly involved in this project. Each participant was presented with a random set of 50 pairwise comparisons and was asked to blindly choose between our result and one of the three baselines (Zhang (FFHQ), DeOldify, and InstColorization) in response to the question “Which of these two images is a higher-quality portrait?”. We obtained answers for all 54 images in our dataset and for all three baseline methods. The results of this study are presented in Table 1. Our approach was consistently and significantly preferred.

6 Limitations and Future Work

As illustrated in Figure 9, our method does not work equally well across all inputs. Historical hairstyles, accessories, and clothing that are not in the StyleGAN2 training set pose a particular challenge. Its performance degrades for images that are severely distorted or have compressed intensity gamuts. It can also introduce slight perceptual changes in appearance, such as subtly altering the precise shape of a facial feature (e.g., Kafka’s left eye in Fig. 7) and changing the subject’s hairstyle (e.g., Ford’s hair in Fig. 8). In some cases the predicted sibling presents a different gender or ethnicity, which can influence the synthesized result (Fig. 10), and changing to a better sibling helps to alleviate this issue. Addressing these gender/ethnicity shift problems is an important topic of future work, to make sure these historical people are accurately represented. We believe these shortcomings can be addressed by further expanding the expressive power of image generative models like StyleGAN2, better sibling prediction, and generalizing our reconstruction loss (Sec. 4.3).

Figure 11: Effect of the color transfer loss (rows: sibling, ours, and ours without the color transfer loss). The top of each row visualizes the ToRGB layer outputs of different resolutions before adding the constant bias; a gray image implies zero values, and the values are scaled up in the bottom of each row for better visualization. With the color transfer loss, the ToRGB outputs of the coarse layers have negligible values, similar to the sibling. Without it, the ToRGB outputs are too large, causing low-frequency unnatural color variation in the output; the mouth and forehead regions show too much red, corresponding to the extreme values in layers 16 and 32. Erwin Schrödinger (1933).

7 Conclusion

We introduced time-travel rephotography, an image synthesis technique that simulates rephotographing famous subjects from the past with a modern high-resolution camera, based on a black-and-white reference photo. Our basic approach is to project this reference image into the space of modern high-resolution images represented by the StyleGAN2 generative model [33]. This is accomplished through a constrained optimization over latent style codes, guided by a novel reconstruction loss that simulates the unique properties of old film and cameras. We also introduce a sibling network that generates an image used for recovering colors and local details in the result. Compared to applying a sequence of state-of-the-art techniques for image restoration, colorization, and super-resolution, our unified approach renders strikingly realistic and immediately recognizable images of historical figures.

Acknowledgements

We thank Roy Or-El, Aleksander Holynski and Keunhong Park for insightful advice. This work was supported by the UW Reality Lab, Amazon, Facebook, Futurewei, and Google.

Appendix A Effect of the Color Transfer Loss

StyleGAN2 [35] produces an image by summing up multiple ToRGB layer outputs. Fig. 11 visualizes the ToRGB output of each layer before adding its constant bias term. The coarse-layer ToRGB outputs of the in-domain sibling image have negligible values, and hence contribute little to the final color output. Without the color transfer loss, the coarse ToRGB outputs go outside the typical range for an in-domain face and thus produce color artifacts, as shown in Fig. 11 (bottom) and Fig. 3. Adding the color transfer loss makes their distribution similar to the sibling's and brings them back into range.

Appendix B Implementation Details

Losses We compute the contextual loss on the relu1_2, relu2_2, and relu3_4 layers of the pretrained VGG19 network [57]. For the reconstruction loss, we use layers conv1_1, conv2_1, conv3_1, and conv4_1 of the pretrained VGG-Face network [52], and layers relu1_2, relu2_2, relu3_3, and relu4_3 of the pretrained VGG16 network [57].

Sibling Encoder We train three sibling encoders for the blue-sensitive, orthochromatic, and panchromatic negative types. The exact grayscale conversion for blue-sensitive and orthochromatic negatives is unknown and can differ from one photo to another. In addition, we want our sibling prediction to be insensitive to the exact brightness of the photo, since the blue and green channels are typically darker than red and the photo could have been manipulated differently during the original exposure and digitization process. We therefore augment the data with color jittering using random brightness, contrast, and hue factors. We train the three encoders with the Adam optimizer [37] and a batch size of 4, training the blue-sensitive and orthochromatic models for 100 epochs and the panchromatic model for 70 epochs.

References

  • [1] https://en.wikipedia.org/wiki/Rephotography.
  • [2] https://en.wikipedia.org/wiki/Photographic_film.
  • [3] Rameen Abdal, Yipeng Qin, and Peter Wonka. Image2stylegan: How to embed images into the stylegan latent space? In Proceedings of the IEEE International Conference on Computer Vision, pages 4432–4441, 2019.
  • [4] Rameen Abdal, Yipeng Qin, and Peter Wonka. Image2stylegan++: How to edit the embedded images? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8296–8305, 2020.
  • [5] Jason Antic. jantic/deoldify: A deep learning based project for colorizing and restoring old images (and video!), 2019.
  • [6] S Derin Babacan, Rafael Molina, and Aggelos K Katsaggelos. Total variation super resolution using a variational approach. In IEEE International Conference on Image Processing, pages 641–644. IEEE, 2008.
  • [7] Soonmin Bae, Aseem Agarwala, and Frédo Durand. Computational rephotography. ACM Trans. Graph., 29(3):24:1–24:15, 2010.
  • [8] Peter Baylies. Stylegan encoder - converts real images to latent space, 2019.
  • [9] Antoni Buades, Bartomeu Coll, and J-M Morel. A non-local algorithm for image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 60–65. IEEE, 2005.
  • [10] Adrian Bulat and Georgios Tzimiropoulos. Super-fan: Integrated facial landmark localization and super-resolution of real-world low resolution faces in arbitrary poses with gans. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 109–117, 2018.
  • [11] Guillaume Charpiat, Matthias Hofmann, and Bernhard Schölkopf. Automatic image colorization via multimodal predictions. In Proceedings of the European Conference on Computer Vision, pages 126–139. Springer, 2008.
  • [12] Zezhou Cheng, Qingxiong Yang, and Bin Sheng. Deep colorization. In Proceedings of the IEEE International Conference on Computer Vision, pages 415–423, 2015.
  • [13] Alex Yong-Sang Chia, Shaojie Zhuo, Raj Kumar Gupta, Yu-Wing Tai, Siu-Yeung Cho, Ping Tan, and Stephen Lin. Semantic colorization with internet images. ACM Transactions on Graphics (TOG), 30(6):1–8, 2011.
  • [14] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Transactions on image Processing, 16(8):2080–2095, 2007.
  • [15] Aditya Deshpande, Jason Rock, and David Forsyth. Learning large-scale automatic image colorization. In Proceedings of the IEEE International Conference on Computer Vision, pages 567–575, 2015.
  • [16] Chao Dong, Yubin Deng, Chen Change Loy, and Xiaoou Tang. Compression artifacts reduction by a deep convolutional network. In Proceedings of the IEEE International Conference on Computer Vision, pages 576–584, 2015.
  • [17] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks. IEEE transactions on pattern analysis and machine intelligence, 38(2):295–307, 2015.
  • [18] Michael Elad and Michal Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.
  • [19] Leon A Gatys, Alexander S Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2414–2423, 2016.
  • [20] Joe Geigel and F Kenton Musgrave. A model for simulating the photographic development process on digital images. In Proceedings of the annual Conference on Computer graphics and interactive techniques, pages 135–142, 1997.
  • [21] Klemen Grm, Walter J Scheirer, and Vitomir Štruc. Face hallucination using cascaded super-resolution and identity priors. IEEE Transactions on Image Processing, 29(1):2150–2165, 2019.
  • [22] Jun Guo and Hongyang Chao. Building dual-domain representations for compression artifacts reduction. In Proceedings of the European Conference on Computer Vision, pages 628–644. Springer, 2016.
  • [23] Raj Kumar Gupta, Alex Yong-Sang Chia, Deepu Rajan, Ee Sin Ng, and Huang Zhiyong. Image colorization using similar images. In Proceedings of the ACM International Conference on Multimedia, pages 369–378, 2012.
  • [24] Yoav Hacohen, Eli Shechtman, and Dani Lischinski. Deblurring by example using dense correspondence. In Proceedings of the IEEE International Conference on Computer Vision, pages 2384–2391, 2013.
  • [25] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • [26] Mingming He, Dongdong Chen, Jing Liao, Pedro V Sander, and Lu Yuan. Deep exemplar-based colorization. ACM Transactions on Graphics (TOG), 37(4):1–16, 2018.
  • [27] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in neural information processing systems, pages 6626–6637, 2017.
  • [28] Yi-Chin Huang, Yi-Shin Tung, Jun-Cheng Chen, Sung-Wen Wang, and Ja-Ling Wu. An adaptive edge detection based colorization algorithm and its applications. In Proceedings of the annual ACM International Conference on Multimedia, pages 351–354, 2005.
  • [29] Satoshi Iizuka, Edgar Simo-Serra, and Hiroshi Ishikawa. Let there be color! joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM Transactions on Graphics (ToG), 35(4):1–11, 2016.
  • [30] Revital Ironi, Daniel Cohen-Or, and Dani Lischinski. Colorization by example. In Rendering Techniques, pages 201–210. Citeseer, 2005.
  • [31] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1125–1134, 2017.
  • [32] Henrik Wann Jensen, Stephen R Marschner, Marc Levoy, and Pat Hanrahan. A practical model for subsurface light transport. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 511–518, 2001.
  • [33] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019.
  • [34] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8110–8119, 2020.
  • [35] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.
  • [36] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1646–1654, 2016.
  • [37] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [38] Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiří Matas. Deblurgan: Blind motion deblurring using conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8183–8192, 2018.
  • [39] Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. Learning representations for automatic colorization. In Proceedings of the European Conference on Computer Vision, pages 577–593. Springer, 2016.
  • [40] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4681–4690, 2017.
  • [41] Stamatios Lefkimmiatis. Non-local color image denoising with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3587–3596, 2017.
  • [42] Anat Levin, Dani Lischinski, and Yair Weiss. Colorization using optimization. In ACM SIGGRAPH 2004 Papers, pages 689–694. 2004.
  • [43] Zhen Li, Jinglei Yang, Zheng Liu, Xiaomin Yang, Gwanggil Jeon, and Wei Wu. Feedback network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3867–3876, 2019.
  • [44] Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. On the variance of the adaptive learning rate and beyond. In Proceedings of the Eighth International Conference on Learning Representations (ICLR 2020), April 2020.
  • [45] Xiaopei Liu, Liang Wan, Yingge Qu, Tien-Tsin Wong, Stephen Lin, Chi-Sing Leung, and Pheng-Ann Heng. Intrinsic colorization. In ACM SIGGRAPH Asia 2008 papers, pages 1–9. 2008.
  • [46] Qing Luan, Fang Wen, Daniel Cohen-Or, Lin Liang, Ying-Qing Xu, and Heung-Yeung Shum. Natural image colorization. In Proceedings of the Eurographics Conference on Rendering Techniques, pages 309–320, 2007.
  • [47] Roey Mechrez, Itamar Talmi, and Lihi Zelnik-Manor. The contextual loss for image transformation with non-aligned data. In Proceedings of the European Conference on Computer Vision, pages 768–783, 2018.
  • [48] Sachit Menon, Alexandru Damian, Shijia Hu, Nikhil Ravi, and Cynthia Rudin. Pulse: Self-supervised photo upsampling via latent space exploration of generative models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2437–2445, 2020.
  • [49] Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3883–3891, 2017.
  • [50] Beaumont Newhall. The History of Photography: From 1839 to the Present. The Museum of Modern Art, 5 edition, 1982.
  • [51] Jinshan Pan, Zhe Hu, Zhixun Su, and Ming-Hsuan Yang. Deblurring face images with exemplars. In Proceedings of the European Conference on Computer Vision, pages 47–62. Springer, 2014.
  • [52] Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. Deep face recognition. In British Machine Vision Conference, 2015.
  • [53] Yingge Qu, Tien-Tsin Wong, and Pheng-Ann Heng. Manga colorization. ACM Transactions on Graphics (TOG), 25(3):1214–1220, 2006.
  • [54] Wenqi Ren, Jiaolong Yang, Senyou Deng, David Wipf, Xiaochun Cao, and Xin Tong. Face video deblurring using 3d facial priors. In Proceedings of the IEEE International Conference on Computer Vision, pages 9388–9397, 2019.
  • [55] Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, and Daniel Cohen-Or. Encoding in style: a stylegan encoder for image-to-image translation. arXiv preprint arXiv:2008.00951, 2020.
  • [56] Ziyi Shen, Wei-Sheng Lai, Tingfa Xu, Jan Kautz, and Ming-Hsuan Yang. Deep semantic face deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8260–8269, 2018.
  • [57] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [58] Jheng-Wei Su, Hung-Kuo Chu, and Jia-Bin Huang. Instance-aware image colorization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.
  • [59] Masanori Suganuma, Xing Liu, and Takayuki Okatani. Attention-based adaptive selection of operations for image restoration in the presence of unknown combined distortions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9039–9048, 2019.
  • [60] Jian Sun, Wenfei Cao, Zongben Xu, and Jean Ponce. Learning a convolutional neural network for non-uniform motion blur removal. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 769–777, 2015.
  • [61] Daniel Sỳkora, John Dingliana, and Steven Collins. Lazybrush: Flexible painting tool for hand-drawn cartoons. Computer Graphics Forum, 28(2):599–608, 2009.
  • [62] Ying Tai, Jian Yang, Xiaoming Liu, and Chunyan Xu. Memnet: A persistent memory network for image restoration. In Proceedings of the IEEE International Conference on Computer Vision, pages 4539–4547, 2017.
  • [63] Ziyu Wan, Bo Zhang, Dongdong Chen, Pan Zhang, Dong Chen, Jing Liao, and Fang Wen. Old photo restoration via deep latent space translation, 2020.
  • [64] Zhangyang Wang, Ding Liu, Shiyu Chang, Qing Ling, Yingzhen Yang, and Thomas S Huang. D3: Deep dual-domain based fast restoration of jpeg-compressed images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2764–2772, 2016.
  • [65] Tomihisa Welsh, Michael Ashikhmin, and Klaus Mueller. Transferring color to greyscale images. In Proceedings of the Conference on Computer graphics and interactive techniques, pages 277–280, 2002.
  • [66] Junyuan Xie, Linli Xu, and Enhong Chen. Image denoising and inpainting with deep neural networks. In Advances in neural information Processing systems, pages 341–349, 2012.
  • [67] Li Xu, Jimmy SJ Ren, Ce Liu, and Jiaya Jia. Deep convolutional neural network for image deconvolution. In Advances in Neural Information Processing Systems, pages 1790–1798, 2014.
  • [68] Li Xu, Xin Tao, and Jiaya Jia. Inverse kernels for fast spatial deconvolution. In Proceedings of the European Conference on Computer Vision, pages 33–48. Springer, 2014.
  • [69] Jianchao Yang, John Wright, Thomas S Huang, and Yi Ma. Image super-resolution via sparse representation. IEEE transactions on image Processing, 19(11):2861–2873, 2010.
  • [70] Liron Yatziv and Guillermo Sapiro. Fast image and video colorization using chrominance blending. IEEE transactions on Image Processing, 15(5):1120–1129, 2006.
  • [71] Ke Yu, Chao Dong, Liang Lin, and Chen Change Loy. Crafting a toolchain for image restoration by deep reinforcement learning. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 2443–2452, 2018.
  • [72] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.
  • [73] Kai Zhang, Wangmeng Zuo, Shuhang Gu, and Lei Zhang. Learning deep cnn denoiser prior for image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3929–3938, 2017.
  • [74] Kai Zhang, Wangmeng Zuo, and Lei Zhang. Ffdnet: Toward a fast and flexible solution for cnn-based image denoising. IEEE Transactions on Image Processing, 27(9):4608–4622, 2018.
  • [75] Richard Zhang, Phillip Isola, and Alexei A Efros. Colorful image colorization. In Proceedings of the European Conference on Computer Vision, pages 649–666. Springer, 2016.
  • [76] Richard Zhang, Jun-Yan Zhu, Phillip Isola, Xinyang Geng, Angela S Lin, Tianhe Yu, and Alexei A Efros. Real-time user-guided image colorization with learned deep priors. ACM Transactions on Graphics (TOG), 36(4):1–11, 2017.
  • [77] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2472–2481, 2018.
  • [78] Jiaojiao Zhao, Li Liu, Cees GM Snoek, Jungong Han, and Ling Shao. Pixel-level semantics guided image colorization. arXiv preprint arXiv:1808.01597, 2018.
  • [79] Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A Efros. Generative visual manipulation on the natural image manifold. In European conference on computer vision, pages 597–613. Springer, 2016.