Self-Adaptively Learning to Demoire from Focused and Defocused Image Pairs

11/03/2020 ∙ by Lin Liu, et al. ∙ HUAWEI Technologies Co., Ltd.

Moiré artifacts are common in digital photography, resulting from the interference between high-frequency scene content and the color filter array of the camera. Existing deep learning-based demoiréing methods, trained on large-scale datasets, are limited in handling various complex moiré patterns and mainly focus on demoiréing photos taken of digital displays. Moreover, obtaining moiré-free ground truth in natural scenes is difficult but needed for training. In this paper, we propose a self-adaptive learning method for demoiréing a high-frequency image with the help of an additional defocused, moiré-free blur image. Given an image degraded with moiré artifacts and a moiré-free blur image, our network predicts a moiré-free clean image and a blur kernel with a self-adaptive strategy that does not require an explicit training stage, instead performing test-time adaptation. Our model has two sub-networks and works iteratively. During each iteration, one sub-network takes the moiré image as input, removing moiré patterns and restoring image details, while the other sub-network estimates the blur kernel from the blur image. The two sub-networks are jointly optimized. Extensive experiments demonstrate that our method outperforms state-of-the-art methods and produces high-quality demoiréd results. It generalizes well to the task of removing moiré artifacts caused by display screens. In addition, we build a new moiré dataset that includes images with screen and texture moiré artifacts. To the best of our knowledge, this is the first dataset with real texture moiré patterns.


1 Introduction

Image demoiréing is the task of removing moiré patterns from images taken by digital cameras, whether of screens or of natural scenes with high-frequency patterns. Moiré artifacts are caused by the interference between the color filter array (CFA) of a camera and high-frequency repetitive signals, which can result from an LCD screen’s subpixel layout or a natural scene’s high-frequency repetitive patterns (e.g., textures on clothes). Image demoiréing is challenging as moiré patterns vary in shape, color, and frequency. Existing deep learning-based demoiréing models [39, 20, 12, 50, 24] rely heavily on training with large amounts of paired clean and moiré images in order to obtain good performance. Even so, these models remain limited in handling various complex moiré patterns. Moreover, they are restricted to demoiréing images captured from screens and have difficulty removing moiré artifacts from natural images.

Lack of high-quality training data also limits the performance of supervised methods. There are two public datasets for screen image demoiréing: the TIP2018 dataset [39] and the LCDMoire dataset [46, 47]. TIP2018 is a real dataset with slight misalignment within each image pair, while LCDMoire is synthetic. Because both datasets were developed for screen image demoiréing, they are unsuitable for training a model to remove moiré patterns from images of high-frequency textures.

To reduce moiré artifacts, some digital-camera manufacturers design special hardware, including special CFAs (Fuji’s X-Trans, Sigma SD Quattro) and variable low-pass filters (Sony RX1RM2). Special CFAs and variable low-pass filters require special hardware design and thus cannot be widely used on smartphones.

In this paper, we propose a self-adaptive learning method for image demoiréing. Our method removes moiré patterns from a moiré image with the help of a moiré-free blur image. We design a defocusing procedure that plays the role of a low-pass filter, yielding a blur image without moiré patterns. We use this defocused, moiré-free image as an additional input to help remove moiré patterns from the focused moiré image, and treat the problem as one of joint filtering. Our method can be easily applied to any digital camera: during the focusing process, a defocused blur image can be stored and combined with the focused image to perform image demoiréing. Deep image prior [17] shows that the structure of a generator network can capture a great deal of low-level image statistics without any training. In our model, we use a 3-layer fully connected sub-network to generate a blur kernel, and adopt a U-Net-like encoder/decoder architecture to perform image demoiréing. Neither network is learned in an explicit training stage; rather, both are learned at test time through an iterative, self-adaptive optimization.

In summary, our main contributions are:

  1. We propose a self-adaptive learning method for image demoiréing, which uses an additional input (defocused moiré-free image) to help remove moiré patterns from the focused moiré image.

  2. We create a new dataset with pairs of focused moiré and defocused moiré-free images, containing both screen moiré images and high-frequency texture moiré images. (In this paper, the terms ‘moiré image’ and ‘moiré-free image’ denote an image with and without moiré patterns, respectively; ‘screen moiré image’ means an image whose moiré patterns are caused by a digital screen, and ‘texture moiré image’ denotes an image whose moiré patterns are caused by high-frequency textures.)

  3. Quantitative and qualitative experimental results on both public and our datasets show that our model outperforms state-of-the-art methods.

  4. Our method, without a training stage, can be easily applied to any digital camera or smartphone.

2 Related Work

In this section, we review the most relevant work, including image demoiréing, joint filtering, self-adaptive learning, and blind deblurring.

Image Demoiréing. There are two common scenarios: screen image demoiréing and texture image demoiréing. Screen image demoiréing focuses on removing moiré patterns from photos taken of screens, where moiré patterns are mainly caused by the interference between the screen’s subpixel layout and the camera’s color filter array. Texture image demoiréing deals with moiré patterns produced by photographing high-frequency scene content (e.g., fabric and long-distance buildings), which interferes with the CFA. Early work [37, 33, 36] on screen image demoiréing focuses on certain specific moiré patterns (striped, dotted, or monotonous moiré patterns). Recently, some deep learning models [39, 20, 12, 24, 50] cast screen demoiréing as an image restoration problem and can handle more types of moiré patterns. Liu et al. [20] built a coarse-to-fine convolutional neural network to remove moiré patterns from photos taken of screens. Sun et al. [39] proposed a multi-resolution convolutional neural network for demoiréing and released an associated dataset. He et al. [12] labeled the data in [39] with three attribute labels of moiré patterns, which is beneficial for learning diverse patterns. These methods are all supervised and need training on a large-scale dataset. Moreover, after training, they do not generalize well to texture image demoiréing.

Unlike screen image demoiréing, removing moiré patterns from texture images is more challenging as the moiré patterns appear only in high-frequency areas and are always mixed with the underlying textures. Recently, some researchers [44, 21, 45] have attempted to handle texture image demoiréing. Yang et al. [44] and Liu et al. [21] tried to remove moiré artifacts using low-rank and sparse matrix decomposition. Moiré patterns are also common artifacts of the image signal processing pipeline in a camera, especially of image demosaicing [7, 23]. Gharbi et al. [7] proposed to alleviate moiré artifacts by fine-tuning their demosaicing model on a moiré-prone dataset, collected by measuring the frequency change between the ground-truth image and the demosaiced image. Our network uses an additional input (a defocused moiré-free image) to help remove moiré patterns from the moiré image and does not need training.

Joint Filtering. Joint filtering has been applied to many low-level vision tasks [31, 8], with the aim of leveraging a guidance image as a prior and transferring its structural details to the target image. It handles images from different domains well and has been applied to depth/RGB image restoration [8], flash/no-flash image denoising [31], texture removal [10, 49], etc. Local joint filtering methods [40, 4, 13, 49] make use of a locally linear model to explore the relationship among neighboring pixels; representative methods include bilateral filtering [40, 4] and guided filtering [13]. However, these methods often introduce erroneous structures into the target image because they only explore the local structures of the guidance image. Global joint filtering methods [9, 5, 42] optimize a global objective function, with different hand-crafted priors proposed to enforce that the target and guidance images have similar structures; but hand-crafted priors may not reflect the inherent structural details of the target image. Recently, some deep learning-based joint filtering algorithms [19, 41, 28] have been proposed and have shown better results. Pan et al. [28] presented spatially variant linear representation coefficients, determined by both the guidance image and the input image, to decide whether structural details should be transferred to the output image.

Our method can also be viewed as a joint filtering method, taking two inputs (a focused moiré image and a defocused blur image). The defocused blur image provides important structural information to guide the demoiréing network. The moiré patterns, especially low-frequency patterns, can be treated as a new structure overlaid on the structures of the moiré-free images. With the defocused image as a guide, the demoiréing network can enhance the original structural information and suppress the moiré structures. Unlike these deep learning based joint filtering methods, ours does not need training and produces a moiré-free image only from a defocused and focused image pair.

Self-Adaptive Learning. Self-adaptive learning has been used in specific low-level vision tasks such as super-resolution [35, 14, 3], deblurring [25, 1, 32], inpainting [48] and dehazing [2]. These methods exploit the internal recurrence of information within an image, without training. Recently, some researchers have proposed frameworks that can deal with multiple low-level vision tasks. Lempitsky et al. [17] showed that the structure of the deep image prior (DIP) neural network is well suited to capturing the low-level statistics of a single natural image. Gandelsman et al. [6] proposed a Double-DIP framework for decomposing a single image into two layers. However, these two networks are unable to generate good results for the demoiréing problem, partly because moiré patterns are widely distributed in both the spatial and frequency domains.

Blind Deblurring. Blind image deblurring is a very challenging problem because both the blur kernel and the clean image must be estimated from a single blur image. Blind image deblurring methods can be divided into optimization-based and deep-learning-based. Optimization-based methods use different priors for modeling clean images, such as gradient-based priors [30, 51], patch-based priors [25, 38] and the dark channel prior [43, 29]. For modeling accurate blur kernels, a gradient sparsity prior [18, 29] and a spectral prior [22] are usually adopted. Deblurring a defocused image from an uncalibrated camera is, in most cases, a blind deblurring problem. Unlike these methods, we can obtain a sharp image with moiré patterns through focusing; this image contains rich details and helps restore a clean moiré-free image. We also design a generative network to estimate the blur kernel.

3 The Proposed Method

We first introduce the formulation of the problem, then design the network structure and finally present our algorithm.

3.1 Problem Formulation

When the image is defocused, the image blur is spatially invariant within the same depth, and the blur image can be formulated as

$b = k \otimes x + n$,  (1)

where $x$ is the underlying clean image, $k$ is the blur kernel, $n$ is the additive Gaussian noise with noise level $\sigma$, and $\otimes$ denotes 2D convolution. In most cases where the camera is not calibrated, we need to estimate both $k$ and $x$ from a blur image $b$, which is an ill-posed problem.

When the image is focused, moiré artifacts may appear in high-frequency areas. To remove the moiré patterns, we suppose an image demoiréing network $G_I$ can obtain the clean image such that

$x = G_I(m)$,  (2)

where $m$ is a focused image contaminated with moiré patterns. Putting Eqns. 1 and 2 together, we have

$b = k \otimes G_I(m) + n$.  (3)

Thus, the blur image $b$ is obtained by first removing the moiré patterns from $m$ and then convolving with the blur kernel $k$. In this combined formulation, Eqn. 3, the demoiréing network can be learned without using the underlying clean image $x$ as the ground truth.

Inspired by the DIP framework [17], we propose to use a generative network $G_K$ to capture the blur kernel as a prior. Finally, the demoiréing problem is formulated as

$\min_{G_I, G_K} \; \| G_K(z) \otimes G_I(m) - b \|^2$,  (4)

where $z$ is a fixed vector sampled from the uniform distribution $[0, 1]$, and $G_K(z)$ is the estimated blur kernel produced by the generator $G_K$. However, only optimizing Eqn. 4 cannot guarantee that $G_I$ will give good demoiréing results. Following [18, 29], we add the following constraints to the blur kernel:

$k_i \geq 0$,  (5)
$\sum_i k_i = 1$,  (6)

where $k_i$ denotes the $i$-th element in the blur kernel. Note that in Section 4 we will describe how to obtain the image pair $(m, b)$.
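For concreteness, the data term of Eqn. 4 could be computed as in the minimal PyTorch sketch below. This is not the authors' released code; the kernel size h is a hypothetical value, sharing one kernel across the RGB channels is an assumption, and the mean squared error stands in for the squared L2 norm.

```python
import torch
import torch.nn.functional as F

def data_term(G_I, G_K, m, b, z, h=15):
    """||G_K(z) (*) G_I(m) - b||^2 of Eqn. 4, with (*) denoting 2D convolution.

    m, b: focused moire image and defocused blur image, each a 1x3xHxW tensor.
    """
    x_hat = G_I(m)                                          # estimated moire-free image
    k_hat = G_K(z).view(1, 1, h, h).repeat(3, 1, 1, 1)      # one kernel shared by the 3 channels
    b_hat = F.conv2d(x_hat, k_hat, padding=h // 2, groups=3)  # re-blurred estimate of b
    return F.mse_loss(b_hat, b)                             # averaged squared error against b
```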

Figure 1: Illustration of our model. The generative network $G_K$ is utilized to obtain a prior of the blur kernel. The network $G_I$ outputs a moiré-free image after multiple iterations. The whole model is self-learned using only the focused image with moiré patterns and the defocused image without moiré patterns.

3.2 Network Structure

Fig. 1 shows the structure of our model. The input of $G_I$ is a focused image with moiré patterns. $G_I$ is a U-Net-like network where the first 5 layers of the encoder are connected via skip connections to the 5 layers of the decoder. A convolutional output layer with the sigmoid function is used to generate the moiré-free image. U-Net-like structures have been shown to work well in many low-level computer vision tasks [15, 39, 17]. For the network $G_K$, a blur kernel usually contains much less information than an image and can be well estimated by a simpler generative network. Thus, we adopt a 3-layer fully-connected network (FCN) to serve as $G_K$. It takes a 200-dimensional noise vector $z$ as input. The hidden layer has 1,000 nodes, and the output layer has one node per element of the blur kernel. A softmax layer is applied to the output of $G_K$ to ensure the constraints in Eqns. 5 and 6.
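The following is a minimal sketch of such a kernel generator. The 200-dimensional input and the 1,000-node hidden layer follow the text; the kernel size (15) and the ReLU activation are assumptions, since these details are not stated above.

```python
import torch
import torch.nn as nn

class KernelGenerator(nn.Module):
    """3-layer FCN G_K: fixed 200-d noise vector -> softmax-normalized blur kernel."""
    def __init__(self, z_dim=200, hidden=1000, kernel_size=15):
        super().__init__()
        self.kernel_size = kernel_size
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden),
            nn.ReLU(inplace=True),                        # activation choice is an assumption
            nn.Linear(hidden, kernel_size * kernel_size),
            nn.Softmax(dim=-1),                           # k_i >= 0 and sum_i k_i = 1 (Eqns. 5, 6)
        )

    def forward(self, z):
        k = self.net(z)
        return k.view(1, 1, self.kernel_size, self.kernel_size)

z = torch.rand(1, 200)      # fixed input, sampled once from U[0, 1] and kept frozen
G_K = KernelGenerator()
kernel = G_K(z)             # 1x1x15x15 kernel estimate
```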

3.3 Optimization Algorithm

The optimization process of Eqn. 4 can be viewed as a self-adaptive learning method. With only the defocused moiré-free image $b$ and the focused moiré image $m$, the networks $G_I$ and $G_K$ iteratively find better weights, making $G_I$ produce clear images with fewer and fewer moiré patterns. The parameters of $G_I$ and $G_K$ are simultaneously updated by back-propagation. Based on extensive experiments, we find that the joint optimization of $G_I$ and $G_K$ is better than optimizing them alternately (see the ablation study in Section 5.1); the detailed processes of the joint and alternating optimization, and their difference, are explained in the supplementary material. A minimal sketch of the joint optimization loop is shown below.
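This sketch assumes G_I is any U-Net-like PyTorch module, data_term is the loss sketch from Section 3.1, and the learning-rate schedule and iteration count follow the settings reported later in Section 5; it illustrates the idea rather than reproducing the authors' implementation.

```python
import torch

def demoire(G_I, G_K, m, b, z, iters=3000):
    """Self-adaptive joint optimization of G_I and G_K on a single image pair (m, b)."""
    params = list(G_I.parameters()) + list(G_K.parameters())
    optimizer = torch.optim.Adam(params, lr=0.01)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.5)
    for _ in range(iters):
        optimizer.zero_grad()
        loss = data_term(G_I, G_K, m, b, z)   # ||G_K(z) (*) G_I(m) - b||^2
        loss.backward()
        optimizer.step()                      # both networks updated simultaneously
        scheduler.step()
    with torch.no_grad():
        return G_I(m)                         # final moire-free estimate
```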

4 A New Dataset

We create a new dataset to evaluate our method quantitatively and qualitatively, as there is no public dataset available for the specific setting in this paper.

Synthetic Data: For the screen moiré patterns, the data are sampled from the TIP2018 dataset [39], from which we randomly choose 130 image pairs (moiré images and moiré-free images). The moiré images serve as the focused moiré images, and the moiré-free images serve as the ground truth. To synthesize a defocused moiré-free image, we apply a Gaussian smoothing kernel (with $\sigma$ from 0.8 to 1.6) and additive Gaussian noise (with a noise level from 0 to 0.2) to the moiré-free image using Eqn. 1. We assume a simplified case where the whole image has the same blurriness. We call this subset SynScreenMoire.
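A rough sketch of this synthesis step is given below. It uses scipy rather than the authors' pipeline; only the sigma and noise-level ranges are taken from the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synthesize_defocused(clean, rng=np.random.default_rng(0)):
    """clean: HxWx3 float image in [0, 1]; returns a blurred, noisy copy per Eqn. 1."""
    sigma = rng.uniform(0.8, 1.6)                              # Gaussian blur width
    noise_level = rng.uniform(0.0, 0.2)                        # additive noise level
    blurred = gaussian_filter(clean, sigma=(sigma, sigma, 0))  # blur only the spatial axes
    noisy = blurred + rng.normal(0.0, noise_level, size=clean.shape)
    return np.clip(noisy, 0.0, 1.0)
```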

For texture moiré patterns, we collect 30 high-quality images (with dense and regular textures) from the Internet and treat them as ground truth. The method in [44] is adopted to synthesize the corresponding moiré images. Finally, we use the same method as for the screen moiré images to synthesize the defocused images from the ground truth. This subset is called SynTextureMoire.

Figure 2: Illustration of image acquisition.

Real Data: We build a real dataset with 100 pairs, each with a focused moiré image and a defocused moiré-free image. It includes 50 pairs where the moiré patterns are caused by the interference between the camera CFAs and the screen pixel layouts (this subset is called RealScreenMoire), and another 50 pairs where the moiré patterns are caused by the interference between the camera CFAs and the high-frequency textures of the images (this subset is called RealTextureMoire). As shown in Fig. 2, to capture a pair of images, we design an image acquisition pipeline, which mainly consists of two steps: image capture and image alignment.

(1) Image Capture. Each image is displayed at the centre of a computer screen (Fig. 2(a) and (b)), and the background color of the screen is set to white for better alignment. To produce a wide variety of moiré patterns, we use three types of smartphones (OPPO R9, HONOR 9 and HUAWEI P30 PRO). For different image pairs, we randomly change the distance and angle between the camera and the computer screen. The cameras are placed on a tripod. It is worth noting that when capturing texture moiré images, the camera is farther away from the screen than when acquiring screen moiré images, to avoid screen moiré patterns. By adjusting the distance between the camera and the screen, texture moiré patterns appear when the displayed high-frequency image textures (as perceived by the camera) have frequencies similar to that of the camera’s CFA. Since the frequencies of the image textures are much lower than the frequencies of the computer screen’s subpixel layout, screen moiré patterns are minimized. To avoid camera shake when changing the focus and defocus settings, we use a laptop to remotely control the zooming and shooting of the mobile phones. The capture process is shown in Fig. 2(f). We use a screen with 4K resolution to display the images when making the RealTextureMoire subset. Using different phone models as capture devices ensures that the moiré patterns span different optical sensors, while the diversity of display screens for RealScreenMoire reflects differences in screen resolution (detailed information about the camera models and the screens is given in the supplementary material).

(2) Image Alignment. As the focal length increases, the objects in the image appear larger, so we need to register each defocused and focused image pair. With the help of the white background, we first binarize the captured image to locate the displayed region and then crop it. We then align the image pair using a homography [11]; a minimal alignment sketch is given below.
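The sketch below illustrates these two steps with OpenCV. The binarization threshold and the feature-matching settings are assumptions; the paper only specifies white-background cropping followed by homography-based registration [11].

```python
import cv2
import numpy as np

def crop_by_white_background(img, thresh=200):
    """Crop the displayed region by treating near-white pixels as background."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    mask = gray < thresh                              # non-white pixels = displayed image
    ys, xs = np.where(mask)
    return img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def align_pair(focused, defocused):
    """Warp the focused image onto the defocused image via a RANSAC homography."""
    orb = cv2.ORB_create(5000)
    k1, d1 = orb.detectAndCompute(focused, None)
    k2, d2 = orb.detectAndCompute(defocused, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)
    h, w = defocused.shape[:2]
    return cv2.warpPerspective(focused, H, (w, h))
```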

5 Experiments

In this section, we present an ablation study and comparisons with state-of-the-art methods. Our algorithm is implemented in PyTorch, and the experiments are conducted on an NVIDIA RTX 2080Ti GPU. In the kernel generation network, $z$ is sampled from the uniform distribution $[0, 1]$ with a fixed random seed of 0. The initial learning rate is set to 0.01 and halved every 500 iterations. The algorithm runs for 3,000 iterations for each image pair.

5.1 Ablation Study

Figure 3: One example of the ablation study.
Network M+B Z B Alter K M
PSNR/SSIM 27.43/0.852 28.42/0.869 27.17/0.842 29.57/0.846 27.31/0.855 26.10/0.734
Table 1: Ablation study on SynScreenMoire.

The SynScreenMoire subset is used to verify that the network can extract sufficient information from the focused image. The results are shown in Table 1, where ‘M+B’ means that the input to the network is the concatenation of the focused and defocused images; ‘B’ denotes that the input is only the defocused image; ‘Z’ means that the input is a 2D noise image (sampled from the uniform distribution in [0, 1]) of the same size as the focused image; and ‘M’ stands for our original input (the focused moiré image). Comparing ‘M’ with ‘M+B’ shows that adding the defocused moiré-free image as an additional input decreases the performance. We speculate that the whole network is confused by this blur image in the input, which also verifies the necessity of learning a blur kernel. Comparing ‘M’ with ‘Z’ and ‘B’ shows that the network does extract useful information from the focused moiré image. Adding the moiré image provides a good local minimum for the network to converge to. In fact, although the moiré image is corrupted by moiré patterns, it still retains many high-frequency details, and the U-Net can preserve these details over the iterations to reconstruct a clean image that the blur image, lacking high-frequency details, cannot provide. As shown in Fig. 3, the result from ‘M’ is better than those from ‘Z’ and ‘B’. Another baseline treats the final result as a spatially variant linear representation [28] of the defocused moiré-free image; the result shows that our end-to-end optimization (‘M’) is more powerful than predicting the coefficients of such a linear representation. We also compare our joint optimization method with the alternating optimization method (‘Alter’ in Table 1), which optimizes $G_I$ and $G_K$ alternately (see the supplementary material for more details). Moreover, we test the necessity of using a network (FCN) to generate the blur kernel by replacing the FCN with a directly learnable kernel, denoted as ‘K’ in Table 1 (a sketch of this baseline is given below). The result shows that the PSNR/SSIM of ‘K’ is lower than that of ‘M’; adding the FCN to learn the blur kernel helps the result satisfy the total variation prior and smooths the noise. In addition, we also tried other architectures for $G_I$, an encoder-decoder and a ResNet; their PSNRs are 0.11 dB and 1.05 dB lower than U-Net’s, respectively. Many other image restoration methods have also shown that U-Net has advantages over these two structures.
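For reference, the ‘K’ baseline could be sketched as a directly learnable kernel tensor (our own illustration, not the authors' code); a softmax over the logits keeps the kernel non-negative and normalized, matching Eqns. 5 and 6.

```python
import torch
import torch.nn.functional as F

class LearnableKernel(torch.nn.Module):
    """'K' baseline: the FCN G_K is replaced by a plain learnable kernel tensor."""
    def __init__(self, kernel_size=15):
        super().__init__()
        self.kernel_size = kernel_size
        self.logits = torch.nn.Parameter(torch.zeros(kernel_size * kernel_size))

    def forward(self, z=None):                      # z is ignored; kept for interface parity
        k = F.softmax(self.logits, dim=0)
        return k.view(1, 1, self.kernel_size, self.kernel_size)
```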

Method DMCNN [39] CFNet [20] MopNet [12] DIP [17] GF [13] DJF [19] MSJF [34] FDNet
S or U? S S S U U S U U
SynScreenMoire PSNR/SSIM 26.15/0.869 25.62/0.820 26.45/0.856 22.57/0.757 27.23/0.808 31.06/0.898 22.82/0.785
SynTextureMoire PSNR/SSIM 22.79/0.714 22.08/0.702 23.44/0.789 22.51/0.720 21.99/0.525 22.40/0.752 24.70/0.687
Table 2: Quantitative comparison on SynScreenMoire and SynTextureMoire. S and U in the second row refer to Supervised and Unsupervised, respectively. The best results are highlighted in bold.
Method MopNet [12] DIP [17] DoubleDIP [6] GF [13] DJF [19] MSJF [34] SVLRM [28] FDNet
S or U? S U U U S U S U
RealScreenMoire NIQE/BRISQUE 5.57/30.53 5.35/33.89 5.69/45.92 6.42/45.91 5.75/30.96 5.34/29.34 9.42/31.04 /29.87
RealTextureMoire NIQE/BRISQUE 17.85/42.81 18.73/42.53 13.64/48.44 11.99/51.84 21.65/42.13 11.67/45.21 12.63/42.74
Table 3: Quantitative comparison of image demoiréing on RealScreenMoire and RealTextureMoire. The best results are highlighted in bold and the second best are underlined.

5.2 Comparison with State-of-the-Art

We compare our method with state-of-the-art demoiréing methods (DMCNN [39], CFNet [20] and MopNet [12]), joint filtering methods (GF [13], MSJF [34], DJF [19] and SVLRM [28]) and unsupervised image restoration methods (DIP [17] and DoubleDIP [6]). We also compare with some state-of-the-art blind deblurring methods to show that our model obtains useful information from the focused moiré images. For all blind deblurring methods and our method, the blur kernel and noise level are unknown. In what follows, we term our model FDNet since it uses both the Focused and Defocused image pair for demoiréing.

Evaluation on SynScreenMoire and SynTextureMoire.

On the synthetic data with ground truth, we use the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM) to compare the restored images. The supervised methods DMCNN, CFNet, MopNet and DJF are trained on TIP2018 for testing on SynScreenMoire, and on MITMoire for testing on SynTextureMoire. As shown in Table 2, FDNet outperforms all the state-of-the-art methods. We also perform experiments on SynTextureMoire to compare with state-of-the-art blind deblurring methods, showing that FDNet extracts useful information from the focused moiré images: for DCP [29] and DeblurGANv2 [16], the PSNRs/SSIMs are 28.53/0.875 and 28.58/0.864, respectively, which are lower than FDNet’s 31.77/0.926.

Evaluation on RealScreenMoire and RealTextureMoire. On the real data without ground truth, we evaluate all generated images using no-reference quality metrics, which estimate absolute image quality scores. For objective quality measurement, we use the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) [26] and the Naturalness Image Quality Evaluator (NIQE) [27]. BRISQUE extracts point-wise statistics of locally normalized luminance signals and measures image naturalness; NIQE builds a quality-aware collection of statistical features from a simple spatial-domain natural scene statistics model. Note that a lower NIQE or BRISQUE score indicates better image quality. Off-the-shelf implementations (MATLAB 2018b) are used to obtain the NIQE and BRISQUE scores. To compare with the existing methods, we randomly choose 25 images and 38 images from our real subsets RealScreenMoire and RealTextureMoire, respectively. The supervised methods MopNet, DJF and SVLRM are trained on TIP2018 for testing on RealScreenMoire, and on MITMoire for testing on RealTextureMoire. As shown in Table 3, our method overall outperforms both the supervised and unsupervised methods.

Figure 4: Visual comparison among our FDNet and other models, evaluated on images from SynScreenMoire and SynTextureMoire.

Qualitative Results. As shown in Fig. 4, the result of DIP leaves obvious moiré artifacts, and DoubleDIP exhibits a global color shift from the original input and over-smoothed details. These two deep image prior methods cannot effectively remove moiré patterns, perhaps because they struggle to learn the low-frequency characteristics and the color diversity of moiré patterns. Moiré patterns differ from noise: the former are more prevalent in the low and mid frequencies. DIP relies on the spectral bias of the CNN to learn lower frequencies first, so moiré patterns appear in its results before it learns the high-frequency details of the image. The demoiréing-only methods (DMCNN, CFNet and MopNet) cannot effectively remove the moiré patterns either. In addition, the joint filtering methods (GF and MSJF) tend to blur the high-frequency regions and cannot remove the moiré patterns well even with the guidance of the blur image. In contrast, our method FDNet eliminates the moiré patterns more effectively, benefiting from the accurate prediction of the blur kernel. Moreover, FDNet retains the original textures in the demoiréd images instead of over-smoothing the high-frequency regions. More results are provided in the supplementary material.

5.3 Practical Applications of our Method

Our method has the potential to be applied to smartphones without modifying the hardware. In the typical capture mode, the camera is usually equipped with an auto-focus algorithm, which can be modified to save an additional defocused image. Unlike the variable hardware low-pass filters in some DSLR cameras, which require user control, our method is invisible to the user. The defocused image and the focused image can then be used to perform image demoiréing.

6 Conclusion

We have proposed a self-adaptive learning method for moiré pattern removal. Our network predicts a moiré-free clear image from a focused image with moiré patterns, with the help of a corresponding defocused moiré-free blur image. It substantially outperforms state-of-the-art demoiréing methods and joint filtering methods. The moiré-free blur image is easy to obtain through software or hardware. In addition, we have built the first dataset with pairs of focused moiré images and defocused moiré-free images. Future work includes more accurate blur kernel estimation and more efficient restoration.

Broader Impact

Our method improves the quality of photographs taken with a digital camera by removing moiré patterns to restore an underlying clean, moiré-free image. By design, the algorithm produces restored images that are more faithful to the true scene, making the photograph’s information more apparent and representative. It is envisioned that better-quality images will have a positive societal benefit, making visually recorded information more detailed, informative, and useful.

With any approach that improves image quality also comes the risk of negative uses, such as privacy issues. For images of natural scenes, further downstream applications such as surveillance and tracking may become more effective particularly in high-frequency regions of an image where moiré patterns are more likely.

Another consideration is that by removing moiré patterns from pictures taken of digital displays, it may become more difficult to determine, from the image alone, whether it was taken of a natural scene or of a display such as a computer screen. Potentially this could make it easier for one to take a photograph of a digital screen and claim that the photo is an authentic capture of a real scene. However, there may be other indicators that a photo was taken of a screen, particularly if the layout of the LCD elements is visible. Potential future research could explore the difficulty of classifying screen and natural images, with and without demoiréd results produced by the proposed method.

We note that, although our method substantially improves the state of the art, it is not perfect, and its failures may leave moiré patterns in an image or replace them with blurry outputs. As the method is self-adaptive, learning at test time from two images, we believe the implications of learning from biased data to be minimal.

Funding Disclosure

The work was supported by the Computer Vision Research Project of Huawei Noah’s Ark Lab.

References

  • [1] Yuval Bahat, Netalee Efrat, and Michal Irani. Non-uniform blind deblurring by reblurring. In ICCV, 2017.
  • [2] Yuval Bahat and Michal Irani. Blind dehazing using internal patch recurrence. In ICCP, 2016.
  • [3] Xi Cheng, Zhenyong Fu, and Jian Yang. Zero-shot image super-resolution with depth guided internal degradation learning.
  • [4] Fredo Durand and Julie Dorsey. Fast bilateral filtering for the display of high-dynamic-range images. SIGGRAPH, 2002.
  • [5] David Ferstl, Christian Reinbacher, Rene Ranftl, Matthias Ruther, and Horst Bischof. Image guided depth upsampling using anisotropic total generalized variation. In ICCV, 2013.
  • [6] Yossi Gandelsman, Assaf Shocher, and Michal Irani. “double-dip”: Unsupervised image decomposition via coupled deep-image-priors. CVPR, 2019.
  • [7] Michaël Gharbi, Gaurav Chaurasia, Sylvain Paris, and Frédo Durand. Deep joint demosaicking and denoising. TOG, 2016.
  • [8] Shuhang Gu, Wangmeng Zuo, Shi Guo, Yunjin Chen, Chongyu Chen, and Lei Zhang. Learning dynamic guidance for depth image enhancement. In CVPR, 2017.
  • [9] Xiaojie Guo, Yu Li, Jiayi Ma, and Haibin Ling. Mutually guided image filtering. TPAMI, 2018.
  • [10] Bumsub Ham, Minsu Cho, and Jean Ponce. Robust image filtering using joint static and dynamic guidance. In CVPR, 2015.
  • [11] Richard Hartley and Andrew Zisserman. Multiple view geometry in computer vision. Cambridge University Press, 2003.
  • [12] Bin He, Ce Wang, Boxin Shi, and Ling-Yu Duan. Mop moire patterns using mopnet. In ICCV, 2019.
  • [13] Kaiming He, Jian Sun, and Xiaoou Tang. Guided image filtering. TPAMI, 2013.
  • [14] Jia-Bin Huang, Abhishek Singh, and Narendra Ahuja. Single image super-resolution from transformed self-exemplars. In CVPR, 2015.
  • [15] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In CVPR, 2017.
  • [16] Orest Kupyn, Tetiana Martyniuk, Junru Wu, and Zhangyang Wang. Deblurgan-v2: Deblurring (orders-of-magnitude) faster and better. ICCV, 2019.
  • [17] Victor Lempitsky, Andrea Vedaldi, and Dmitry Ulyanov. Deep image prior. CVPR, 2018.
  • [18] Anat Levin, Yair Weiss, Fredo Durand, and William T Freeman. Understanding and evaluating blind deconvolution algorithms. In CVPR, 2009.
  • [19] Yijun Li, Jiabin Huang, Narendra Ahuja, and Minghsuan Yang. Deep joint image filtering. ECCV, 2016.
  • [20] Bolin Liu, Xiao Shu, and Xiaolin Wu. Demoiréing of camera-captured screen images using deep convolutional neural network. arXiv preprint arXiv:1804.03809, 2018.
  • [21] Fanglei Liu, Jingyu Yang, and Huanjing Yue. Moiré pattern removal from texture images via low-rank and sparse matrix decomposition. In VCIP, 2015.
  • [22] Guangcan Liu, Shiyu Chang, and Yi Ma. Blind image deblurring using spectral properties of convolution operators. TIP, 2014.
  • [23] Lin Liu, Xu Jia, Jianzhuang Liu, and Qi Tian. Joint demosaicing and denoising with self guidance. In CVPR, 2020.
  • [24] Lin Liu, Jianzhuang Liu, Shanxin Yuan, Gregory Slabaugh, Ales Leonardis, Wengang Zhou, and Qi Tian. Wavelet-based dual-branch network for image demoiréing. In ECCV, 2020.
  • [25] Tomer Michaeli and Michal Irani. Blind deblurring using internal patch recurrence. In ECCV, 2014.
  • [26] Anish Mittal, Anush K Moorthy, and Alan C Bovik. Blind/referenceless image spatial quality evaluator. ASILOMAR, 2011.
  • [27] Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a “completely blind” image quality analyzer. IEEE Signal Processing Letters, 2012.
  • [28] Jinshan Pan, Jiangxin Dong, Jimmy Ren, Liang Lin, Jinhui Tang, and Minghsuan Yang. Spatially variant linear representation models for joint filtering. CVPR, 2019.
  • [29] Jinshan Pan, Deqing Sun, Hanspeter Pfister, and Minghsuan Yang. Deblurring images via dark channel prior. TPAMI, 2018.
  • [30] Jinshan Pan, Hu Zhe, Zhixun Su, and Ming Hsuan Yang. L0 -regularized intensity and gradient prior for deblurring text images and beyond. TPAMI, 2017.
  • [31] Georg Petschnigg, Richard Szeliski, Maneesh Agrawala, Michael Cohen, Hugues Hoppe, and Kentaro Toyama. Digital photography with flash and no-flash image pairs. TOG, 2004.
  • [32] Dongwei Ren, Kai Zhang, Qilong Wang, Qinghua Hu, and Wangmeng Zuo. Neural blind deconvolution using deep priors. In CVPR, 2020.
  • [33] Ryoji Sasada, Masahiko Yamada, Shoji Hara, Hideya Takeo, and Kazuo Shimura. Stationary grid pattern removal using 2d technique for moiré-free radiographic image display. In Medical Imaging, 2003.
  • [34] Xiaoyong Shen, Chao Zhou, Li Xu, and Jiaya Jia. Mutual-structure for joint filtering. ICCV, 2015.
  • [35] Assaf Shocher, Nadav Cohen, and Michal Irani. “zero-shot” super-resolution using deep internal learning. In CVPR, 2018.
  • [36] Hasib Siddiqui, Mireille Boutin, and Charles A Bouman. Hardware-friendly descreening. TIP, 2009.
  • [37] Denis N Sidorov and Anil Christopher Kokaram. Suppression of moire patterns via spectral analysis. In VCIP, 2002.
  • [38] Libin Sun, Sunghyun Cho, Jue Wang, and James Hays. Edge-based blur kernel estimation using patch priors. ICCP, 2013.
  • [39] Yujing Sun, Yizhou Yu, and Wenping Wang. Moiré photo restoration using multiresolution convolutional neural networks. TIP, 2018.
  • [40] Carlo Tomasi and Roberto Manduchi. Bilateral filtering for gray and color images. ICCV, 1998.
  • [41] Huikai Wu, Shuai Zheng, Junge Zhang, and Kaiqi Huang. Fast end-to-end trainable guided filter. CVPR, 2018.
  • [42] Qiong Yan, Xiaoyong Shen, Li Xu, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, and Jiaya Jia. Cross-field joint image restoration via scale map. In ICCV, 2013.
  • [43] Yanyang Yan, Wenqi Ren, Yuanfang Guo, Rui Wang, and Xiaochun Cao. Image deblurring via extreme channels prior. CVPR, 2017.
  • [44] Jingyu Yang, Fanglei Liu, Huanjing Yue, Xiaomei Fu, Chunping Hou, and Feng Wu. Textured image demoiréing via signal decomposition and guided filtering. TIP, 2017.
  • [45] Shanxin Yuan, Radu Timofte, Ales Leonardis, Gregory Slabaugh, et al. Ntire 2020 challenge on image demoireing: Methods and results. In CVPRW, 2020.
  • [46] Shanxin Yuan, Radu Timofte, Gregory Slabaugh, and Ales Leonardis. Aim 2019 challenge on image demoireing: Dataset and study. ICCVW, 2019.
  • [47] Shanxin Yuan, Radu Timofte, Gregory Slabaugh, Ales Leonardis, Bolun Zheng, Xin Ye, Xiang Tian, Yaowu Chen, Xi Cheng, Zhenyong Fu, et al. Aim 2019 challenge on image demoireing: Methods and results. In ICCVW, 2019.
  • [48] Haotian Zhang, Long Mai, Ning Xu, Zhaowen Wang, John Collomosse, and Hailin Jin. An internal learning approach to video inpainting. In ICCV, 2019.
  • [49] Qi Zhang, Xiaoyong Shen, Li Xu, and Jiaya Jia. Rolling guidance filter. In ECCV, 2014.
  • [50] Bolun Zheng, Shanxin Yuan, Gregory Slabaugh, and Ales Leonardis. Image demoireing with learnable bandpass filters. In CVPR, 2020.
  • [51] Wangmeng Zuo, Dongwei Ren, David Zhang, Shuhang Gu, and Lei Zhang. Learning iteration-wise generalized shrinkage–thresholding operators for blind deconvolution. TIP, 2016.

Appendix A Cameras and Screens

We use three cameras and three screens to capture our dataset; please see Table 4 for the specifications. Note that the HONOR Intelligence Screen is a screen with 4K resolution and is used to display images for the RealTextureMoire subset. A high-resolution screen helps avoid screen moiré patterns when acquiring texture moiré images.

Capture device                           Display device
Manufacturer   Model      Image Res.     Manufacturer   Model                  Resolution
OPPO           R9         —              SAMSUNG        S22F350H               —
HONOR          9          —              HONOR          Intelligence Screen    4K
HUAWEI         P30 PRO    —              HP             E243                   —
Table 4: Camera specifications and screen specifications.

Appendix B The Alternating Optimization Method

The following algorithms (Procedure 1 and Procedure 2) show the joint optimization method and the baseline alternating optimization method compared in the ablation study in Section 5.1 of the main paper. The difference between the joint optimization used for our FDNet and the alternating optimization is shown in lines 6–10 of Procedure 2, where $G_I$ and $G_K$ are updated in an alternating fashion.

Input:  Focused image $m$ with moiré patterns and defocused blur image $b$ without moiré patterns;
Output:  Estimated moiré-free image $G_I(m)$;
1:  Initialize $G_I$ and $G_K$ with Gaussian random weights;
2:  Sample $z$ from the uniform distribution $[0, 1]$;
3:  for $t$ = 1 to $T$ do
4:     $\hat{x} = G_I(m)$; $\hat{k} = G_K(z)$; $\hat{b} = \hat{k} \otimes \hat{x}$;
5:     $L = \|\hat{b} - b\|^2$;
6:     Update $G_I$ and $G_K$ simultaneously using the ADAM algorithm;
7:  end for
8:  return $G_I(m)$.
Procedure 1 The joint optimization algorithm.
Input:  Focused image $m$ with moiré patterns and defocused blur image $b$ without moiré patterns;
Output:  Estimated moiré-free image $G_I(m)$;
1:  Initialize $G_I$ and $G_K$ with Gaussian random weights;
2:  Sample $z$ from the uniform distribution $[0, 1]$;
3:  for $t$ = 1 to $T$ do
4:     $\hat{x} = G_I(m)$; $\hat{k} = G_K(z)$; $\hat{b} = \hat{k} \otimes \hat{x}$;
5:     $L = \|\hat{b} - b\|^2$;
6:     if $t$ is even then
7:        Update $G_I$ using the ADAM algorithm, with $G_K$ fixed;
8:     else
9:        Update $G_K$ using the ADAM algorithm, with $G_I$ fixed;
10:     end if
11:  end for
12:  return $G_I(m)$.
Procedure 2 The alternating optimization algorithm.

Appendix C Results from Real Natural Scenes

We also test our model on a smartphone, the HUAWEI P30 PRO. We collect focused and defocused image pairs from natural scenes, where the focused images contain texture moiré patterns, as shown in Figure 5. To test on these real-world examples, we perform some preprocessing, e.g., alignment, and keep the areas where the moiré is produced at the same depth. FDNet generalizes well to images taken from natural scenes (not from screens): the results are moiré-free, and the details of the focused moiré image are retained.

Figure 5: Examples captured from natural scenes.

Appendix D Efficiency Comparison among our Model, DIP and Double-DIP

We evaluate the efficiency of FDNet and the deep-image-prior methods (DIP and DoubleDIP) on an NVIDIA RTX 2080Ti GPU. The number of iterations DIP needs to reach an optimal result varies from image to image and has to be manually adjusted; in the demoiréing task, DIP takes 1000 iterations, and Double-DIP also requires 1000 iterations to converge. Our model does not suffer from results degrading once the number of iterations exceeds some threshold, thanks to the constraint imposed by the blur image. FDNet converges in about 500 iterations, after which its PSNR increases only slightly as the iterations continue (see Figure 6 for one example). As shown in Table 5, our FDNet has a faster runtime.

Algorithm DoubleDIP DIP FDNet
Time (s) 280 43 30
Parameters (MB) 3.08 2.22 2.64
Table 5: Efficiency comparison.

Appendix E Visualization of Intermediate Results and Blur Kernels

We visualize some intermediate results (see Figure 6), which show that as the number of iterations increases, the moiré patterns gradually disappear. Figure 7 shows the learnt blur kernels for different image pairs. Note that the blur kernels are learned from scratch and adapt to each image pair.

Figure 6: Intermediate results of one example in SynScreenMoire. The numbers to the left of the PSNR are the numbers of iterations.
Figure 7: Visualization of estimated blur kernels.

Appendix F Examples of our New Dataset

Figures 8 and 9 show some examples of the RealScreenMoire subset and the RealTextureMoire subset, respectively. Note that the focused images have more details overlaid with moiré patterns, while the corresponding defocused images have no moiré patterns but appear blurry.

Figure 8: Examples of RealScreenMoire.
Figure 9: Examples of RealTextureMoire.

Appendix G Additional Visual Comparisons on SynScreenMoire, SynTextureMoire, RealScreenMoire and RealTextureMoire

Figure 10, Figure 11, Figure 12 and Figure 13 show more results on SynScreenMoire, SynTextureMoire, RealScreenMoire, and RealTextureMoire, respectively. The main paper presents an analysis of the results in Figures 10 and 11 on SynScreenMoire and SynTextureMoire.

As shown in Figures 12 and 13, the deep-learning based methods (SVLRM, MopNet and DJF) produce some artifacts near the edges. DJF also tends to over-sharpen the images and exhibits ringing artifacts. The results of DIP have obvious moiré artifacts left, and DoubleDIP has a global color shift from the original input. In addition, the joint filtering methods (GF and MSJF) tend to smooth the high-frequency regions. Our FDNet outperforms all of them.

Figure 10: Visual comparisons on SynScreenMoire.
Figure 11: Visual comparisons on SynTextureMoire.
Figure 12: Visual comparisons on RealScreenMoire (without ground truth).
Figure 13: Visual comparisons on RealTextureMoire (without ground truth).