
MIPI 2022 Challenge on RGBW Sensor Re-mosaic: Dataset and Report

Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). To bridge the gap, we introduce the first MIPI challenge including five tracks focusing on novel image sensors and imaging algorithms. In this paper, RGBW Joint Remosaic and Denoise, one of the five tracks, working on the interpolation of RGBW CFA to Bayer at full resolution, is introduced. The participants were provided with a new dataset including 70 (training) and 15 (validation) scenes of high-quality RGBW and Bayer pairs. In addition, for each scene, RGBW of different noise levels was provided at 0dB, 24dB, and 42dB. All the data were captured using an RGBW sensor in both outdoor and indoor conditions. The final results are evaluated using objective metrics including PSNR, SSIM, LPIPS, and KLD. A detailed description of all models developed in this challenge is provided in this paper. More details of this challenge and the link to the dataset can be found at https://github.com/mipi-challenge/MIPI2022.


1 Introduction

RGBW is a new type of CFA pattern (Fig. 1) designed for image quality enhancement under low light conditions. Thanks to the higher optical transmittance of white pixels over conventional red, green, and blue pixels, the signal-to-noise ratio (SNR) of the sensor output becomes significantly improved, thus boosting the image quality especially under low light conditions. Recently several phone OEMs, including Transsion, Vivo, and Oppo, have adopted RGBW sensors in their flagship smartphones to improve the camera image quality [4, 5, 1].

On the other hand, conventional camera ISPs can only work with Bayer patterns, so an interpolation procedure is required to convert RGBW to a Bayer pattern. This interpolation process is usually referred to as remosaic, and a good remosaic algorithm should (1) produce a Bayer output from RGBW with the fewest artifacts, and (2) take full advantage of the SNR and resolution benefits of white pixels.

The remosaic problem becomes more challenging when the input RGBW becomes noisy, especially under low light conditions. A joint remosaic and denoise task is thus in demand for real world applications.

Figure 1: The RGBW remosaic task.

In this challenge, we intend to remosaic the RGBW input to obtain a Bayer at the same spatial resolution. The solution is not necessarily deep-learning. However, to facilitate the deep learning training, we provide a dataset of high-quality RGBW and Bayer pairs, including 100 scenes (70 scenes for training, 15 for validation, and 15 for testing). We provide a Data Loader to read these files and show a simple ISP in Fig. 2 to visualize the RGB output from the Bayer and to calculate loss functions. The participants are also allowed to use other public-domain datasets. The algorithm performance is evaluated and ranked using objective metrics: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM) [6], Learned Perceptual Image Patch Similarity (LPIPS) [8], and KL-divergence (KLD). The objective metrics of a baseline method are available as well to provide a benchmark.

Figure 2: An ISP to visualize the output Bayer and to calculate the loss function.

This challenge is a part of the Mobile Intelligent Photography and Imaging (MIPI) 2022 workshop and challenges emphasizing the integration of novel image sensors and imaging algorithms, which is held in conjunction with ECCV 2022. It consists of five competition tracks:

  1. RGB+ToF Depth Completion uses sparse and noisy ToF depth measurements with RGB images to obtain a complete depth map.

  2. Quad-Bayer Re-mosaic converts Quad-Bayer RAW data into Bayer format so that it can be processed by standard ISPs.

  3. RGBW Sensor Re-mosaic converts RGBW RAW data into Bayer format so that it can be processed by standard ISPs.

  4. RGBW Sensor Fusion fuses Bayer data and a monochrome channel data into Bayer format to increase SNR and spatial resolution.

  5. Under-display Camera Image Restoration improves the visual quality of the image captured by a new imaging system equipped with an under-display camera.

2 Challenge

To support the development of high-quality RGBW remosaic solutions, we provide the following resources for participants:

  • A high-quality RGBW and Bayer dataset: As far as we know, this is the first and only dataset consisting of aligned RGBW and Bayer pairs, relieving the pain of data collection to develop learning-based remosaic algorithms;

  • A data processing code with Data Loader to help participants get familiar with the provided dataset;

  • A simple ISP including basic ISP blocks to visualize the algorithm output and to calculate the loss function on RGB results;

  • A set of objective image quality metrics to measure the performance of a developed solution.

2.1 Problem Definition

RGBW remosaic aims to interpolate the input RGBW CFA pattern to obtain a Bayer of the same resolution. The remosaic task is needed mainly because current camera ISPs usually cannot process CFAs other than the Bayer pattern. In addition, the remosaic task becomes more challenging when the noise level gets higher, thus requiring more advanced algorithms to avoid image quality artifacts. In addition to the image quality requirement, RGBW sensors are widely used in smartphones with limited computational budget and battery life, thus requiring the remosaic algorithm to be lightweight at the same time. While we do not rank solutions based on running time or memory footprint, computational cost is one of the most important criteria in real applications.

2.2 Dataset: Tetras-RGBW-RMSC

The training data contains 70 scenes of aligned RGBW (input) and Bayer (ground truth) pairs. For each scene, noise is synthesized on the 0dB RGBW input to provide the noisy RGBW input at 24dB and 42dB respectively. The synthesized noise consists of read noise and shot noise, and the noise models are measured on an RGBW sensor. The data generation steps are shown in Fig. 3. The testing data contains RGBW input of 15 scenes at 0dB, 24dB, and 42dB, but the ground truth Bayer results are not available to participants.
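The noise synthesis described above can be illustrated with a minimal sketch. It assumes the dB levels denote sensor gain under the 20·log10 convention, and the `full_well` and `read_sigma` values are hypothetical placeholders; the challenge used noise models measured on a real RGBW sensor, which are not reproduced here.

```python
import numpy as np

def db_to_gain(gain_db):
    """Convert a gain in dB to a linear factor (assumes the 20*log10 convention)."""
    return 10.0 ** (gain_db / 20.0)

def add_read_shot_noise(clean, gain_db, full_well=1000.0, read_sigma=2.0, rng=None):
    """Add Poisson shot noise and Gaussian read noise to a clean raw frame.

    clean      : float array in [0, 1], e.g. the 0dB raw image
    gain_db    : sensor gain in dB
    full_well  : hypothetical electron count at white level for 0dB
    read_sigma : hypothetical read-noise standard deviation in electrons
    """
    rng = np.random.default_rng() if rng is None else rng
    gain = db_to_gain(gain_db)
    # At higher gain, the same digital value corresponds to fewer electrons.
    electrons = clean * full_well / gain
    shot = rng.poisson(electrons).astype(np.float64)      # signal-dependent
    read = rng.normal(0.0, read_sigma, size=clean.shape)  # signal-independent
    noisy = (shot + read) * gain / full_well               # back to [0, 1] scale
    return np.clip(noisy, 0.0, 1.0)
```

Under these assumptions, a flat gray patch becomes visibly noisier at 42dB than at 24dB, which matches the intent of providing three noise levels per scene.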

Figure 3: Data generation of the RGBW remosaic task. The RGBW raw data is captured using an RGBW sensor and cropped to a fixed size. A Bayer (DBinB) and a white (DBinC) image are obtained by averaging pixels of the same color in the diagonal direction within a block. We demosaic the Bayer (DBinB) using DemosaicNet [3] to get an RGB image. The white (DBinC) image is concatenated to the RGB image to give RGBW values at each pixel, which are in turn mosaiced to get the input RGBW and the aligned ground truth Bayer.
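The diagonal binning step in the caption can be sketched as follows. This is an illustration under two assumptions that the text does not confirm: the block is 2×2, and within each block the color pixels sit on the main diagonal while the white pixels sit on the anti-diagonal. The function name `diagonal_bin` is hypothetical.

```python
import numpy as np

def diagonal_bin(raw):
    """Split an RGBW mosaic into half-resolution color and white planes.

    Assumes (hypothetically) that every 2x2 block holds one color on its
    main diagonal and white on its anti-diagonal.
    """
    h, w = raw.shape
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    top_left = raw[0::2, 0::2]
    top_right = raw[0::2, 1::2]
    bottom_left = raw[1::2, 0::2]
    bottom_right = raw[1::2, 1::2]
    color = 0.5 * (top_left + bottom_right)   # plays the role of DBinB
    white = 0.5 * (top_right + bottom_left)   # plays the role of DBinC
    return color, white
```

Each output plane has half the height and half the width of the raw input; the color plane is then itself a mosaic that a demosaicing network can turn into RGB.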

2.3 Challenge Phases

The challenge consisted of the following phases:

  1. Development: The registered participants get access to the data and baseline code, and are able to train the models and evaluate their run-time locally.

  2. Validation: The participants can upload their models to the remote server to check the fidelity scores on the validation dataset, and to compare their results on the validation leaderboard.

  3. Testing: The participants submit their final results, codes, models, and factsheets.

2.4 Scoring System

2.4.1 Objective Evaluation

The evaluation consists of (1) the comparison of the remosaic output (Bayer) with the reference ground truth Bayer, and (2) the comparison of RGB from the predicted and ground truth Bayer using a simple ISP (the code of the simple ISP is provided). We use

  1. Peak Signal-to-Noise Ratio (PSNR)

  2. Structural Similarity Index Measure (SSIM) [6]

  3. KL-divergence (KLD)

  4. Learned Perceptual Image Patch Similarity (LPIPS) [8]

to evaluate the remosaic performance. PSNR, SSIM, and LPIPS are applied to the RGB images obtained from the Bayer using the provided simple ISP code, while KLD is evaluated on the predicted Bayer directly.
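As a rough illustration of two of these metrics, the sketch below computes PSNR from the mean squared error and a KL divergence between intensity histograms. The histogram-based KLD (bin count, value range, and direction of the divergence) is an assumption made for this example; the official metric code provided to participants may differ.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for images scaled to [0, max_val]."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def kld(pred, target, bins=256, value_range=(0.0, 1.0), eps=1e-10):
    """KL divergence D(target || pred) between intensity histograms.

    The binning and the smoothing constant eps are illustrative choices.
    """
    p, _ = np.histogram(target, bins=bins, range=value_range)
    q, _ = np.histogram(pred, bins=bins, range=value_range)
    p = p.astype(np.float64) + eps   # avoid log(0) and division by zero
    q = q.astype(np.float64) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

A histogram-based divergence ignores where errors occur spatially, which is one reason it complements rather than replaces PSNR and SSIM.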

A metric weighting PSNR, SSIM, KLD, and LPIPS is used to give the final ranking of each method, and we will report each metric separately as well. The code to calculate the metrics is provided. The weighted metric is shown below. The M4 score is between 0 and 100, and the higher score indicates the better overall image quality.

$$ M4 = \mathrm{PSNR} \cdot \mathrm{SSIM} \cdot 2^{\,1 - \mathrm{LPIPS} - \mathrm{KLD}} \tag{1} $$

For each dataset we report the average results over all the processed images belonging to it.
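For intuition, the weighted score can be sketched in code, assuming Eq. (1) takes the form M4 = PSNR · SSIM · 2^(1 − LPIPS − KLD). That form is consistent with the stated 0 to 100 range (a PSNR of 50 with perfect SSIM, LPIPS, and KLD gives 100), but it should be checked against the official metric code. The function name `m4_score` is hypothetical.

```python
def m4_score(psnr, ssim, lpips, kld):
    """Weighted quality score, assuming M4 = PSNR * SSIM * 2**(1 - LPIPS - KLD).

    Higher PSNR and SSIM raise the score; higher LPIPS and KLD lower it.
    """
    return psnr * ssim * 2.0 ** (1.0 - lpips - kld)

# Example with averaged metric values (illustrative inputs).
example = m4_score(psnr=36.83, ssim=0.957, lpips=0.115, kld=0.018)
```

Applying this function to averaged metrics gives a value close to, but not necessarily equal to, an average of per-image scores, since the formula is nonlinear in its inputs.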

3 Challenge Results

The results of the top three teams are shown in Table 1. In the final test phase, we verified their submissions by running their code. op-summer-po, HIT-IIL, and Eating, Drinking, and Playing are the top three teams by M4 as presented in Eq. (1), and op-summer-po shows the best overall performance. The proposed methods are described in Section 4 and the team members and affiliations are listed in Appendix 0.A.

| Team name | PSNR | SSIM | LPIPS | KLD | M4 |
|---|---|---|---|---|---|
| op-summer-po | 36.83 | 0.957 | 0.115 | 0.018 | 64.89 |
| HIT-IIL | 36.34 | 0.95 | 0.129 | 0.02 | 63.12 |
| Eating, Drinking, and Playing | 36.77 | 0.957 | 0.132 | 0.019 | 63.98 |

Table 1: MIPI 2022 Joint RGBW Remosaic and Denoise challenge results and final rankings. PSNR, SSIM, LPIPS, and KLD are calculated between the submitted results from each team and the ground truth data. A weighted metric, M4, by Eq. (1) is used to rank the algorithm performance, and the top three teams with the highest M4 are included in the table.

To learn more about the algorithm performance, we evaluated the qualitative image quality in addition to the objective IQ metrics, as shown in Fig. 4 and Fig. 5. While all teams in Table 1 achieved high PSNR and SSIM, detail and texture loss can be seen on the book cover in Fig. 4 and on the mesh in Fig. 5. When the input has a large amount of noise, oversmoothing tends to yield higher PSNR at the cost of perceptual detail loss.

Figure 4: Qualitative image quality (IQ) comparison. The results of one of the test scenes (42dB) are shown. While the top three remosaic methods achieve high objective IQ metrics in Table 1, detail and texture loss are noticeable on the book cover. The text on the book is barely legible in (b), (c), and (d). The RGB images are obtained by using the ISP in Fig. 2, and its code is provided to participants.

Figure 5: Qualitative image quality (IQ) comparison. The results of one of the test scenes (42dB) are shown. Oversmoothing in the top three methods in Table 1 can be found when compared with the ground truth. The text under the mesh can barely be recognized, and the mesh texture becomes distorted in (b), (c), and (d). The RGB images are obtained by using the ISP in Fig. 2, and its code is provided to participants.

In addition to benchmarking the image quality of remosaic algorithms, computational efficiency is evaluated because of the wide adoption of RGBW sensors on smartphones. We measured the running time of the remosaic solutions of the top three teams (based on M4 by Eq. (1)) in Table 2. While running time is not employed in the challenge to rank remosaic algorithms, the computational cost is of critical importance when developing algorithms for smartphones. HIT-IIL achieved the shortest running time among the top three solutions on a workstation GPU (NVIDIA Tesla V100-SXM2-32GB). With the sensor resolution of mainstream smartphones reaching 64M or even higher, power-efficient remosaic algorithms are highly desirable.

| Team name | 1200×1800 (measured) | 64M (estimated) |
|---|---|---|
| op-summer-po | 6.2s | 184s |
| HIT-IIL | 4.1s | 121.5s |
| Eating, Drinking and Playing | 10.4s | 308s |

Table 2: Running time of the top three solutions ranked by Eq. (1) in the 2022 Joint RGBW Remosaic and Denoise challenge. The running time for a 1200×1800 input was measured, while the running time for a 64M RGBW input was estimated. The measurement was taken on an NVIDIA Tesla V100-SXM2-32GB GPU.
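The estimated column is consistent with scaling the measured time linearly by pixel count. The sketch below reproduces that arithmetic, assuming the measured input is 1200×1800 pixels and that 64M means 64×10^6 pixels; linear scaling is itself a simplification, since real throughput also depends on memory limits and tiling.

```python
def estimate_runtime(measured_s, measured_pixels=1200 * 1800, target_pixels=64_000_000):
    """Extrapolate running time to a larger input, assuming time scales
    linearly with the number of pixels."""
    return measured_s * target_pixels / measured_pixels

# Measured times (seconds) from Table 2.
for team, seconds in [("op-summer-po", 6.2), ("HIT-IIL", 4.1),
                      ("Eating, Drinking and Playing", 10.4)]:
    print(f"{team}: {estimate_runtime(seconds):.1f}s")
```

The scale factor is 64,000,000 / 2,160,000 ≈ 29.6, so even the fastest of the three solutions would need about two minutes per 64M frame on this GPU.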

4 Challenge Methods

In this section, we describe the solutions submitted by all teams participating in the final stage of MIPI 2022 Joint RGBW Remosaic and Denoise Challenge.

4.1 op-summer-po

Figure 6: The model architecture of op-summer-po.

op-summer-po proposed a framework based on DRUNet [7] as shown in Fig. 6, which has four scales with 64, 128, 256, and 512 channels respectively. Each scale has an identity skip connection between the strided convolution (stride = 2) and the transposed convolution. This connection concatenates encoder and decoder features. Each encoder or decoder includes four residual blocks.
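To make the four-scale layout concrete, the following sketch lists the feature-map shape at each scale. It assumes that each of the three transitions between scales halves the height and width with a stride-2 convolution and that the input dimensions are divisible by 8; the helper name `unet_feature_shapes` is hypothetical and this is not the team's code.

```python
def unet_feature_shapes(height, width, channels=(64, 128, 256, 512)):
    """Return (channels, height, width) of the features at each encoder scale.

    Assumes every step to a deeper scale halves the spatial size
    (stride-2 convolution) while the channel count follows `channels`.
    """
    shapes = []
    h, w = height, width
    for level, c in enumerate(channels):
        if level > 0:          # the first scale keeps the input resolution
            h, w = h // 2, w // 2
        shapes.append((c, h, w))
    return shapes

shapes = unet_feature_shapes(1200, 1800)
```

The decoder mirrors these shapes in reverse order, which is what allows the skip connections to combine encoder and decoder features at matching resolutions.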

The input of the framework is the raw RGBW image, and the output is the estimated raw Bayer image. Then the estimated raw Bayer image is sent to DemosaicNet [3] and Gamma Transform to get the full-resolution RGB image. Moreover, two LPIPS [8] functions in both raw and RGB domains are used to get a better perception quality.

4.2 HIT-IIL

Figure 7: The model architecture of HIT-IIL.

HIT-IIL proposed an end-to-end method to jointly learn RGBW re-mosaicing and denoising. They split each RGBW image into four channels while maintaining the image size, as shown in Fig. 7. Specifically, they replicated the RGBW image into 4 channels and made each channel represent one color type (white, green, blue, or red). When a pixel's color type differs from the color represented by the channel, its value is set to 0. This input format gives the network positional information about the colors and significantly improved performance in their experiments.
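The input formatting described above can be sketched with NumPy. The 4×4 tile in `PATTERN` is a hypothetical RGBW layout chosen only for illustration; the actual sensor pattern, and the team's implementation, may differ.

```python
import numpy as np

# Hypothetical 4x4 RGBW tile, for illustration only.
PATTERN = np.array([
    ["W", "G", "W", "R"],
    ["G", "W", "R", "W"],
    ["W", "B", "W", "G"],
    ["B", "W", "G", "W"],
])
CHANNELS = ("W", "G", "B", "R")  # one output channel per color type

def split_to_masked_channels(raw, pattern=PATTERN, channels=CHANNELS):
    """Expand a single-plane RGBW mosaic into a full-size 4-channel tensor.

    Channel i keeps the raw value where the pixel's color matches
    channels[i] and holds 0 everywhere else.
    """
    h, w = raw.shape
    ph, pw = pattern.shape
    reps = (-(-h // ph), -(-w // pw))               # ceiling division
    color_map = np.tile(pattern, reps)[:h, :w]      # color label per pixel
    out = np.zeros((len(channels), h, w), dtype=raw.dtype)
    for i, color in enumerate(channels):
        mask = color_map == color
        out[i][mask] = raw[mask]
    return out
```

Because every pixel is nonzero in at most one channel, summing over the channel axis recovers the original mosaic, so no information is lost while the color positions become explicit.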

For the end-to-end network, they adopted NAFNet [2] to re-mosaic and denoise RGBW images. NAFNet consists of a 4-level encoder-decoder and a bottleneck. For the encoder, the numbers of NAFNet blocks at the four levels are 2, 4, 8, and 24. For the decoder, the number of NAFNet blocks is 2 at each of the 4 levels. In addition, the bottleneck contains 12 NAFNet blocks.

4.3 Eating, Drinking and Playing

Figure 8: The model architecture of Eating, Drinking and Playing.

Eating, Drinking and Playing proposed a UNet-Transformer for RGBW joint remosaic and denoise; its overall architecture is shown in Fig. 8. Specifically, they aimed to estimate a raw Bayer image from the corresponding raw RGBW image. To achieve this, they trained the network to minimize a loss between the output and the ground truth raw Bayer image.

Similar to DRUNet [7], the UNet-Transformer adopts an encoder-decoder structure and has four levels with 64, 128, 256, and 512 channels respectively. The main difference between UNet-Transformer and DRUNet is that UNet-Transformer adopts Multi-ResTransformer (MResT) blocks rather than residual convolution blocks in each level of the encoder and decoder. As the main component of UNet-Transformer, the MResT cascades multiple Res-Transformer (ResT) blocks, which can learn local and global information simultaneously. The structure of ResT is illustrated in Fig. 9. In the ResT, the first two convolution layers learn the local information, and a subsequent self-attention layer learns the global information.

Figure 9: The structure of the Res-Transformer (ResT) block. R is the reshape and the blue block is the convolution layer.
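The global half of such a block, reshape, self-attention, and reshape back with a residual connection, can be sketched in NumPy. This is a generic single-head attention written for illustration; the projection matrices `wq`, `wk`, and `wv` are hypothetical stand-ins for learned weights, the convolution layers are omitted, and the team's actual ResT design may differ in detail.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feat, wq, wk, wv):
    """Single-head self-attention over all spatial positions.

    feat       : (C, H, W) feature map
    wq, wk, wv : (C, C) projection matrices (hypothetical, normally learned)
    """
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T            # reshape to (H*W, C) tokens
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(c))         # (H*W, H*W), rows sum to 1
    out = attn @ v                               # each position mixes all others
    return out.T.reshape(c, h, w) + feat         # reshape back, residual add
```

The attention matrix has (H·W)² entries, so its cost grows quadratically with the number of pixels, which is relevant to the running times reported for high-resolution inputs.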

5 Conclusions

In this paper, we summarized the Joint RGBW Remosaic and Denoise challenge in the first Mobile Intelligent Photography and Imaging workshop (MIPI 2022) held in conjunction with ECCV 2022. The participants were provided with a high-quality training/testing dataset for RGBW remosaic and denoise, which is now available for researchers to download for future research. We are excited to see so many submissions within such a short period, and we look forward to more research in this area.

6 Acknowledgements

We thank Shanghai Artificial Intelligence Laboratory, Sony, and Nanyang Technological University for sponsoring this MIPI 2022 challenge. We thank all the organizers and participants for their great work.

Appendix 0.A Teams and Affiliations

op-summer-po
Title: Two LPIPS Functions in Raw and RGB domains for RGBW Joint Remosaic and Denoise
Members: Lingchen Sun (slcbbd111@sina.com), Rongyuan Wu, Qiaosi Yi
Affiliations: OPPO Research Institute, East China Normal University


HIT-IIL
Title: Data Input and Augmentation Strategies for RGBW Image Re-Mosaicing
Members: Rongjian Xu (ronjon.xu@gmail.com), Xiaohui Liu, Zhilu Zhang, Xiaohe Wu, Ruohao Wang, Junyi Li, Wangmeng Zuo
Affiliations: Harbin Institute of Technology


Eating, Drinking, and Playing
Title: UNet-Transformer for RGBW Joint Remosaic and Denoise
Members: Qiaosi Yi (51205901027@stu.ecnu.edu.cn), Rongyuan Wu, Lingchen Sun, Faming Fang
Affiliations: OPPO Research Institute, East China Normal University


References

  • [1] (Website).
  • [2] L. Chen, X. Chu, X. Zhang, and J. Sun (2022) Simple baselines for image restoration. arXiv preprint arXiv:2204.04676.
  • [3] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand (2016) Deep joint demosaicking and denoising. ACM Transactions on Graphics (ToG) 35 (6), pp. 1–12.
  • [4] (Website).
  • [5] (Website).
  • [6] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612.
  • [7] K. Zhang, Y. Li, W. Zuo, L. Zhang, L. Van Gool, and R. Timofte (2021) Plug-and-play image restoration with deep denoiser prior. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • [8] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang (2018) The unreasonable effectiveness of deep features as a perceptual metric. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).