
MIPI 2022 Challenge on RGBW Sensor Fusion: Dataset and Report

Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). To bridge the gap, we introduce the first MIPI challenge, including five tracks focusing on novel image sensors and imaging algorithms. In this paper, RGBW Joint Fusion and Denoise, one of the five tracks, working on the fusion of binning-mode RGBW to Bayer, is introduced. The participants were provided with a new dataset including 70 (training) and 15 (validation) scenes of high-quality RGBW and Bayer pairs. In addition, for each scene, RGBW of different noise levels was provided at 24dB and 42dB. All the data were captured using an RGBW sensor in both outdoor and indoor conditions. The final results are evaluated using objective metrics, including PSNR, SSIM, LPIPS, and KLD. A detailed description of all models developed in this challenge is provided in this paper. More details of this challenge and the link to the dataset can be found at



1 Introduction

RGBW is a new type of color filter array (CFA) pattern (Fig. 1 (a)) designed for image quality enhancement under low light conditions. Thanks to the higher optical transmittance of white pixels compared with conventional red, green, and blue pixels, the signal-to-noise ratio (SNR) of the sensor output is significantly improved, boosting the image quality, especially under low light conditions. Recently, several phone OEMs, including Transsion, Vivo, and Oppo, have adopted RGBW sensors in their flagship smartphones to improve the camera image quality [2, 3, 1].

The binning mode of RGBW is mainly used in the camera preview mode and video mode, in which the pixels of the same color are averaged in the diagonal direction within a 2×2 window in RGBW to further improve the image quality and to reduce the noise. A fusion algorithm is thereby needed to take a diagonal-binning-Bayer (DBinB) and a diagonal-binning-white (DBinC) as input and produce a Bayer with a better signal-to-noise ratio (SNR), as shown in Fig. 1 (b). A good fusion algorithm should be able (1) to get a Bayer output from RGBW with the fewest artifacts, and (2) to take full advantage of the SNR and resolution benefits of white pixels.

The RGBW fusion problem becomes more challenging when the input DBinB and DBinC become noisy, especially under low light conditions. A joint fusion and denoise task is thus in demand for real-world applications.

Figure 1: The RGBW Fusion task: (a) the RGBW CFA. (b) In the binning mode, DBinB and DBinC are obtained by diagonal averaging of pixels of the same color within a 2×2 window. The joint fusion and denoise algorithm takes DBinB and DBinC as input to get a high-quality Bayer.
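The diagonal binning in Fig. 1 (b) can be illustrated with a minimal NumPy sketch. The exact pixel layout of the RGBW CFA is not spelled out in the text, so the sketch assumes that in every 2×2 cell the two same-color pixels sit on the main diagonal and the two white pixels sit on the anti-diagonal; the function name `diagonal_binning` is hypothetical.

```python
import numpy as np


def diagonal_binning(rgbw):
    """Split an RGBW mosaic into DBinB and DBinC by 2x2 diagonal averaging.

    Assumption (not from the paper): in each 2x2 cell, the same-color pixels
    are at (0, 0) and (1, 1), and the white pixels are at (0, 1) and (1, 0).
    """
    h, w = rgbw.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    x = rgbw.astype(np.float32)
    # Average the two color pixels on the main diagonal of each cell.
    dbinb = 0.5 * (x[0::2, 0::2] + x[1::2, 1::2])
    # Average the two white pixels on the anti-diagonal of each cell.
    dbinc = 0.5 * (x[0::2, 1::2] + x[1::2, 0::2])
    return dbinb, dbinc


# Toy 4x4 mosaic: both outputs are half the size in each dimension.
raw = np.arange(16, dtype=np.float32).reshape(4, 4)
dbinb, dbinc = diagonal_binning(raw)
```

Under this assumed layout, each output has one quarter of the input pixel count, which is consistent with a 64M RGBW sensor producing 16M binning-mode images.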

In this challenge, we intend to fuse the RGBW inputs (DBinB and DBinC in Fig. 1 (b)) to denoise and improve the Bayer. The solution is not required to be based on deep learning. However, to facilitate deep learning training, we provide a dataset of high-quality binning-mode RGBW (DBinB and DBinC) and output Bayer pairs, including 100 scenes (70 scenes for training, 15 for validation, and 15 for testing). We provide a Data Loader to read these files and a simple ISP, shown in Fig. 2, to visualize the RGB output from the Bayer and to calculate loss functions. The participants are also allowed to use other public-domain datasets. The algorithm performance is evaluated and ranked using objective metrics: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM) [11], Learned Perceptual Image Patch Similarity (LPIPS) [15], and KL-divergence (KLD). The objective metrics of a baseline method are available as well to provide a benchmark.

Figure 2: An ISP to visualize the output Bayer and to calculate the loss function.
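The blocks of the provided ISP are not listed in the text, so the following is only a rough stand-in showing how a Bayer output could be turned into an RGB image for visualization or an RGB-domain loss. Everything here is an assumption: the GBRG layout and the 64 to 1023 value range are borrowed from the method descriptions later in this report, the white-balance gains and gamma are arbitrary, and the function name `simple_isp` is hypothetical.

```python
import numpy as np


def simple_isp(bayer, black=64.0, white=1023.0, wb=(2.0, 1.0, 1.5), gamma=2.2):
    """Render a half-resolution RGB preview from a GBRG Bayer mosaic.

    Assumed steps (not the challenge ISP): black-level subtraction,
    normalization, 2x2 collapse demosaic, white balance, gamma.
    """
    x = np.clip((bayer.astype(np.float32) - black) / (white - black), 0.0, 1.0)
    g1 = x[0::2, 0::2]  # G on even rows, even columns
    b = x[0::2, 1::2]   # B on even rows, odd columns
    r = x[1::2, 0::2]   # R on odd rows, even columns
    g2 = x[1::2, 1::2]  # G on odd rows, odd columns
    rgb = np.stack([r * wb[0], 0.5 * (g1 + g2) * wb[1], b * wb[2]], axis=-1)
    return np.clip(rgb, 0.0, 1.0) ** (1.0 / gamma)


# Mid-gray mosaic rendered to a 2x2 RGB preview.
preview = simple_isp(np.full((4, 4), 512.0))
```

Collapsing each 2×2 cell into one RGB pixel avoids interpolation entirely, which keeps the sketch short at the cost of halving the resolution.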

This challenge is a part of the Mobile Intelligent Photography and Imaging (MIPI) 2022 workshop and challenges emphasizing the integration of novel image sensors and imaging algorithms, which is held in conjunction with ECCV 2022. It consists of five competition tracks:

  1. RGB+ToF Depth Completion uses sparse and noisy ToF depth measurements with RGB images to obtain a complete depth map.

  2. Quad-Bayer Re-mosaic converts Quad-Bayer RAW data into Bayer format so that it can be processed by standard ISPs.

  3. RGBW Sensor Re-mosaic converts RGBW RAW data into Bayer format so that it can be processed by standard ISPs.

  4. RGBW Sensor Fusion fuses Bayer data and a monochrome channel data into Bayer format to increase SNR and spatial resolution.

  5. Under-display Camera Image Restoration improves the visual quality of the image captured by a new imaging system equipped with an under-display camera.

2 Challenge

To facilitate the development of high-quality RGBW fusion solutions, we provide the following resources for participants:

  • A high-quality RGBW (DBinB and DBinC in Fig. 1 (b)) and Bayer dataset. As far as we know, this is the first and only dataset consisting of aligned RGBW and Bayer pairs, relieving the pain of data collection when developing learning-based fusion algorithms;

  • A data processing code with Data Loader to help participants get familiar with the provided dataset;

  • A simple ISP including basic ISP blocks to visualize the algorithm output and to calculate the loss function on RGB results;

  • A set of objective image quality metrics to measure the performance of a developed solution.

2.1 Problem Definition

The RGBW fusion task aims to fuse the DBinB and DBinC of RGBW (Fig. 1 (b)) to improve the image quality of the Bayer output. By incorporating the white pixels (DBinC) of higher spatial resolution and higher SNR, the output Bayer potentially would have better image quality. In addition, the binning mode of RGBW is mainly used for the preview and video modes in smartphones, thus requiring the fusion algorithms to be lightweight and power-efficient. While we do not rank solutions based on the running time or memory footprint, the computational cost is one of the most important criteria in real applications.

2.2 Dataset: Tetras-RGBW-Fusion

The training data contains 70 scenes of aligned RGBW (DBinB and DBinC input) and Bayer (ground-truth) pairs. For each scene, DBinB at 0dB is used as the ground truth. Noise is synthesized on the 0dB DBinB and DBinC data to provide the noisy input at 24dB and 42dB respectively. The synthesized noise consists of read noise and shot noise, and the noise models are measured on an RGBW sensor. The data generation steps are shown in Fig. 3. The testing data contains DBinB and DBinC inputs of 15 scenes at 24dB and 42dB, but the ground truth Bayer results are not available to participants.

Figure 3: Data generation of the RGBW fusion task. The RGBW raw data is captured using an RGBW sensor and cropped to a fixed size. A Bayer (DBinB) and a white (DBinC) image are obtained by averaging pixels of the same color in the diagonal direction within a 2×2 block.
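The noise synthesis step can be sketched as follows. The measured noise model of the sensor is not given in the text, so this uses a common Gaussian approximation of shot plus read noise with placeholder parameters. The assumptions are: the dB levels are treated as a gain converted with 10^(dB/20), `shot_k` and `read_sigma` are made-up constants, and the 64 to 1023 range is borrowed from the normalization described in Section 4.4. The function name `add_shot_read_noise` is hypothetical.

```python
import numpy as np


def add_shot_read_noise(clean, gain_db, shot_k=0.01, read_sigma=0.5,
                        black=64.0, white=1023.0, rng=None):
    """Add signal-dependent (shot) and signal-independent (read) noise.

    All noise parameters are placeholders, not the measured sensor model.
    """
    rng = np.random.default_rng() if rng is None else rng
    gain = 10.0 ** (gain_db / 20.0)  # assumed dB-to-linear conversion
    signal = np.clip(clean.astype(np.float64) - black, 0.0, None)
    # Shot-noise variance grows with the signal; read-noise variance does not.
    variance = shot_k * gain * signal + (read_sigma * gain) ** 2
    noisy = signal + rng.standard_normal(signal.shape) * np.sqrt(variance)
    return np.clip(noisy + black, 0.0, white)


# Synthesize the two noise levels from the same clean (0dB) image.
rng = np.random.default_rng(0)
clean = np.full((64, 64), 300.0)
noisy_24 = add_shot_read_noise(clean, 24.0, rng=rng)
noisy_42 = add_shot_read_noise(clean, 42.0, rng=rng)
```

In the challenge data, the same procedure would be applied to both DBinB and DBinC, while the 0dB DBinB is kept as the ground truth.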

2.3 Challenge Phases

The challenge consisted of the following phases:

  1. Development: The registered participants get access to the data and baseline code, and are able to train the models and evaluate their running time locally.

  2. Validation: The participants can upload their models to the remote server to check the fidelity scores on the validation dataset, and to compare their results on the validation leaderboard.

  3. Testing: The participants submit their final results, code, models, and factsheets.

2.4 Scoring System

2.4.1 Objective Evaluation

The evaluation consists of (1) the comparison of the fused output (Bayer) with the reference ground truth Bayer, and (2) the comparison of RGB from the predicted and ground truth Bayer using a simple ISP (the code of the simple ISP is provided). We use

  1. Peak Signal-to-Noise Ratio (PSNR)

  2. Structural Similarity Index Measure (SSIM) [11]

  3. Learned Perceptual Image Patch Similarity (LPIPS) [15]

  4. KL-divergence (KLD)

to evaluate the fusion performance. PSNR, SSIM, and LPIPS are applied to the RGB images rendered from the Bayer using the provided simple ISP code, while KLD is evaluated on the predicted Bayer directly.

A metric weighting PSNR, SSIM, KLD, and LPIPS is used to give the final ranking of each method, and we report each metric separately as well. The code to calculate the metrics is provided. The weighted metric, M4, is given in Eq. (1). The M4 score is between 0 and 100, and a higher score indicates better overall image quality.

$$M4 = \mathrm{PSNR} \cdot \mathrm{SSIM} \cdot 2^{1 - \mathrm{LPIPS} - \mathrm{KLD}} \qquad (1)$$

For each dataset we report the average results over all the processed images belonging to it.
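As a small illustration of the scoring, the sketch below computes the weighted metric under the assumption that it takes the form M4 = PSNR · SSIM · 2^(1 − LPIPS − KLD), which is how the MIPI 2022 reports define it. The helper names `m4_score` and `mean_m4` are hypothetical, and the official metric code may differ in details such as how per-image scores are aggregated.

```python
def m4_score(psnr, ssim, lpips, kld):
    """Weighted score: higher PSNR/SSIM and lower LPIPS/KLD give a higher M4."""
    return psnr * ssim * 2.0 ** (1.0 - lpips - kld)


def mean_m4(per_image_metrics):
    """Average M4 over images given (psnr, ssim, lpips, kld) tuples."""
    scores = [m4_score(*metrics) for metrics in per_image_metrics]
    return sum(scores) / len(scores)


# Plugging in the dataset-averaged metrics reported for LLCKP in Table 1.
llckp = m4_score(36.89, 0.952, 0.054, 0.017)
```

With these averaged inputs the result is about 66.9, slightly below the 67.07 listed in Table 1. One plausible explanation is that the reported M4 is averaged per image rather than computed from the averaged metrics, but the text does not confirm this.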

3 Challenge Results

Six teams submitted their results in the final phase, and their results have been verified using their submitted code. Table 1 summarizes the results in the final test phase. LLCKP, MegNR, and jzsherlock are the top three teams ranked by M4, as defined in Eq. (1), and LLCKP shows the best overall performance. The proposed methods are described in Section 4, and the team members and affiliations are listed in Appendix 0.A.

Team name    PSNR   SSIM   LPIPS  KLD      M4
BITSpectral  36.53  0.958  0.126  0.027    63.27
BIVLab       35.09  0.94   0.174  0.0255   57.98
HIT-IIL      36.66  0.958  0.128  0.02196  63.62
jzsherlock   37.05  0.958  0.132  0.29     63.84
LLCKP        36.89  0.952  0.054  0.017    67.07
MegNR        36.98  0.96   0.098  0.0156   65.55
Table 1: MIPI 2022 Joint RGBW Fusion and Denoise challenge results and final rankings. PSNR, SSIM, LPIPS, and KLD are calculated between the submitted results from each team and the ground truth data. A weighted metric, M4, by Eq. (1) is used to rank the algorithm performance, and the top three teams with the highest M4 are highlighted.

To learn more about the algorithm performance, we evaluated the qualitative image quality in addition to the objective IQ metrics, as shown in Fig. 4 and Fig. 5. While all teams in Table 1 have achieved high PSNR and SSIM, detail and texture loss can be found on the yellow box in Fig. 4 and on the test chart in Fig. 5. When the input has a large amount of noise and the scene is under low light conditions, oversmoothing tends to yield higher PSNR at the cost of perceptual detail loss.

Figure 4: Qualitative image quality (IQ) comparison. The results of one of the test scenes (42dB) are shown. While the top three fusion methods achieve high objective IQ metrics in Table 1, detail and texture loss are noticeable on the yellow box. The text on the box is barely legible in (b), (c), and (d). The RGB images are obtained by using the ISP in Fig. 2, and its code is provided to participants.
Figure 5: Qualitative image quality (IQ) comparison. The results of one of the test scenes (42dB) are shown. Oversmoothing in the top three methods in Table 1 can be found when compared with the ground truth. The test chart becomes distorted in (b), (c), and (d). The RGB images are obtained by using the ISP in Fig. 2, and its code is provided to participants.

In addition to benchmarking the image quality of fusion algorithms, computational efficiency is evaluated because of the wide adoption of RGBW sensors in smartphones. The running times of the RGBW fusion solutions of the top three teams are reported in Table 2. While running time is not used in the challenge to rank fusion algorithms, computational cost is critical when developing algorithms for smartphones. jzsherlock achieved the shortest running time among the top three solutions on a workstation GPU (NVIDIA Tesla V100-SXM2-32GB). With the sensor resolution of mainstream smartphones reaching 64M or even higher, power-efficient fusion algorithms are highly desirable.

Team name   1200×1800 (measured)  16M (estimated)
jzsherlock  3.7s                  27.4s
LLCKP       7.1s                  52.6s
MegNR       12.4s                 91.9s
Table 2: Running time of the top three solutions ranked by Eq. (1) in the MIPI 2022 Joint RGBW Fusion and Denoise challenge. The running time for a 1200×1800 input was measured, while the running time for a 64M RGBW sensor was estimated (the binning-mode resolution of a 64M RGBW sensor is 16M). The measurement was taken on an NVIDIA Tesla V100-SXM2-32GB GPU.
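The estimated column of Table 2 can be reproduced by scaling the measured time linearly with the pixel count, from 1200×1800 to 16M pixels. The text does not state how the estimate was derived, so linear scaling is an assumption here, although it matches all three reported values; the function name `estimate_runtime` is hypothetical.

```python
def estimate_runtime(measured_s, measured_px=1200 * 1800, target_px=16_000_000):
    """Extrapolate running time assuming cost is proportional to pixel count."""
    return measured_s * target_px / measured_px


measured = {"jzsherlock": 3.7, "LLCKP": 7.1, "MegNR": 12.4}
estimated = {team: round(estimate_runtime(t), 1) for team, t in measured.items()}
print(estimated)
```

Linear scaling is optimistic for attention-based models whose cost can grow faster than the pixel count, so the figures are best read as rough lower bounds.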

4 Challenge Methods

In this section, we describe the solutions submitted by all teams participating in the final stage of the MIPI 2022 RGBW Joint Fusion and Denoise Challenge.

4.1 BITSpectral

Figure 6: The model architecture of BITSpectral.

BITSpectral developed a transformer-based network, Fusion Cross-Patch Attention Network (FCPAN), for this joint fusion and denoising task. The FCPAN, presented in Fig. 6 (a), consists of a Deep Feature Fusion Module (DFFM) and several Cross-Patch Attention Modules (CPAM). The input of the DFFM contains an RGGB Bayer pattern and a W channel. The output of the DFFM is the fused RGBW features, which are fed to the CPAMs for deep feature extraction. The CPAM is a U-shaped network with spatial downsampling to reduce computational complexity. They proposed to use 4 CPAMs in the network.

Fig. 6 also includes the details of Swin Transformer Layer [6] (STL), the Cross-Patch Attention Block (CPAB), and Cross-Patch Attention Multi-Head Self-Attention (CPA-MSA). They used STL to extract the attention within feature patches in each stage and CPAB to directly obtain the global attention among patches for the innermost stage. Compared with STL, CPAB has an extended range of perception due to the cross-patch attention.

4.2 BIVLab

Figure 7: The model architecture of BIVLab.

BIVLab proposed a Self-Guided Spatial-Frequency Complement Network (SG-SFCN) for the RGBW joint fusion and denoise task. As shown in Fig. 7, the Swin Transformer Layer (STL) [6] is adopted to extract rich features from DBinB and DBinC separately. SpaFre blocks (SFB) [12] then fuse the DBinB and DBinC features in complementary spatial and frequency domains. In order to handle the different noise levels, the features extracted by the STL, which contain the noise-level information, are applied to each SFB as guidance. Finally, the denoised Bayer is obtained by adding the predicted Bayer residual to the original DBinB Bayer. During training, all images are cropped into patches, sized to retain essential global information.

4.3 HIT-IIL

Figure 8: The model architecture of HIT-IIL.

HIT-IIL proposed a NAFNet-based [4] model for the RGBW Joint Fusion and Denoise task. As shown in Fig. 8, the framework consists of a 4-level encoder-decoder and a bottleneck module. For the encoder, the numbers of NAFNet blocks for the four levels are 2, 2, 4, and 8. For the decoder, the number of NAFNet blocks is set to 2 for all 4 levels. In addition, the bottleneck module contains 24 NAFNet blocks. Unlike the original NAFNet design, the skip connection between the input and the output is removed in their method.

During training, they also used two data augmentation strategies. The first one is mixup, which generates the synthesized images as:

$$\hat{x} = \lambda \, x_{24} + (1 - \lambda) \, x_{42}$$

Here, $x_{24}$ and $x_{42}$ denote the images of the same scene with noise levels of 24dB and 42dB. A random variable $\lambda$ is selected between 0 and 1 to generate the synthesized image $\hat{x}$. Their second augmentation strategy is the image flip proposed in [5].
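The mixup step described above can be sketched as a convex combination of the two noisy versions of one scene. The sketch assumes the mixing weight is drawn uniformly from [0, 1] and that the same weight is applied to the whole image; the function name `noise_level_mixup` is hypothetical, and the team's actual sampling scheme is not stated in the text.

```python
import numpy as np


def noise_level_mixup(x24, x42, rng=None):
    """Blend the 24dB and 42dB inputs of the same scene with a random weight."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.uniform(0.0, 1.0)  # assumed uniform sampling of the weight
    mixed = lam * x24 + (1.0 - lam) * x42
    return mixed, lam


# Toy inputs make the effect of the weight easy to read off.
x24 = np.zeros((4, 4), dtype=np.float32)
x42 = np.ones((4, 4), dtype=np.float32)
mixed, lam = noise_level_mixup(x24, x42, rng=np.random.default_rng(0))
```

Because both inputs share the same ground truth, the target stays unchanged while the effective noise level of the input varies between the two provided levels.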

4.4 jzsherlock

Figure 9: The model architecture of jzsherlock.

jzsherlock proposed a dual-branch network for the RGBW joint fusion and denoise task. The entire architecture, consisting of a Bayer branch and a white branch, is shown in Fig. 9. The Bayer branch takes a normalized noisy Bayer image as input and outputs the denoised result. After a pixel unshuffle operation with scale=2, the Bayer image is converted to GBRG channels. They use stacked ResBlocks without BatchNorm (BN) layers to extract the feature maps of the noisy Bayer image. The white branch likewise extracts features from the corresponding white image using stacked ResBlocks. An average pooling layer rescales the white image features to the same size as those of the Bayer branch for feature fusion. Several Residual-in-Residual Dense Blocks (RRDB) [9] are applied to the fused feature maps for restoration. After the RRDB blocks, a Conv+LeakyReLU+Conv structure is applied to enlarge the feature map channels by a factor of 4. Then pixel shuffle with scale=2 is applied to upscale the feature maps to the input size. A Conv layer is used to convert the output to the 4 GBRG channels. Finally, a skip connection adds the input Bayer to form the final denoised result.

The network is trained with an L1 loss in the normalized domain. Normalization uses min=64 and max=1023, and values outside this range are clipped.
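The pre- and post-processing around the network can be sketched in NumPy: normalization with min=64 and max=1023, pixel unshuffle of the Bayer mosaic into four planes, and the inverse pixel shuffle. This is only an illustration of the operations named above, not the team's code; the function names are hypothetical, and the channel order follows the mosaic's own 2×2 layout (G, B, R, G for a GBRG mosaic).

```python
import numpy as np


def normalize(raw, lo=64.0, hi=1023.0):
    """Map raw values to [0, 1] and clip anything outside the range."""
    return np.clip((raw.astype(np.float32) - lo) / (hi - lo), 0.0, 1.0)


def pixel_unshuffle(x, s=2):
    """Rearrange an (H, W) mosaic into (s*s, H/s, W/s) color planes."""
    h, w = x.shape
    x = x.reshape(h // s, s, w // s, s)  # (row, dy, col, dx)
    return x.transpose(1, 3, 0, 2).reshape(s * s, h // s, w // s)


def pixel_shuffle(x, s=2):
    """Inverse of pixel_unshuffle: (s*s, H, W) planes back to (H*s, W*s)."""
    _, h, w = x.shape
    x = x.reshape(s, s, h, w)  # (dy, dx, row, col)
    return x.transpose(2, 0, 3, 1).reshape(h * s, w * s)


# Round trip on a toy mosaic.
bayer = np.arange(16, dtype=np.float32).reshape(4, 4)
planes = pixel_unshuffle(bayer)
restored = pixel_shuffle(planes)
```

Plane 0 holds the pixels at even rows and even columns, plane 1 those at even rows and odd columns, and so on, so each plane contains a single color of the mosaic.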

4.5 LLCKP

Figure 10: The model architecture of LLCKP.

LLCKP proposed a denoising method based on existing image restoration models [14, 4]. As shown in Fig. 10, they synthesized RGBW images from GT GBRG images with additional synthetic noise, and also used the real-noise pairs (noisy images provided by the challenge). They also used 20,000 pairs of RAW images from SIDD with normal exposure and synthesized RGBW images as extra data. The weights of the Restormer model [14] are pre-trained on SIDD RGB images. Data augmentation [5] and cutmix [13] are applied during the training phase.

4.6 MegNR

Figure 11: The model architecture of MegNR.

MegNR proposed a pipeline for the RGBW Joint Fusion and Denoise task. The overall diagram is shown in Fig. 11. Pixel-unshuffle (PU) [8] is first applied to the RGBW images to split them into independent channels. Inspired by Uformer [10], they developed their RGBW fusion and reconstruction network, HAUformer. They replaced the LeWin Blocks [10] in Uformer's original design with two modules, the Hybrid Attention Local-Enhanced Block (HALEB) and the Overlapping Cross-Attention Block (OCAB), to capture more long-range dependencies and useful local context. Finally, the pixel-shuffle (PS) [7] module restores the output to the standard Bayer format.

5 Conclusions

In this paper, we summarized the Joint RGBW Fusion and Denoise challenge in the first Mobile Intelligent Photography and Imaging workshop (MIPI 2022) held in conjunction with ECCV 2022. The participants were provided with a high-quality training/testing dataset for RGBW fusion and denoise, which is now available for researchers to download for future research. We are excited to see so many submissions within such a short period, and we look forward to more research in this area.

6 Acknowledgements

We thank Shanghai Artificial Intelligence Laboratory, Sony, and Nanyang Technological University for sponsoring this MIPI 2022 challenge. We thank all the organizers and participants for their great work.

Appendix 0.A Teams and Affiliations

BITSpectral:
Title: Fusion Cross-Patch Attention Network for RGBW Joint Fusion and Denoise
Members: Zhen Wang, Daoyu Li, Yuzhe Zhang, Lintao Peng, Xuyang Chang, Yinuo Zhang, Liheng Bian
Affiliations: Beijing Institute of Technology

BIVLab:
Title: Self-Guided Spatial-Frequency Complement Network for RGBW Joint Fusion and Denoise
Members: Bing Li, Jie Huang, Mingde Yao, Ruikang Xu, Feng Zhao
Affiliations: University of Science and Technology of China

HIT-IIL:
Title: NAFNet for RGBW Image Fusion
Members: Xiaohui Liu, Xiaohui Liu, Rongjian Xu, Zhilu Zhang, Xiaohe Wu, Ruohao Wang, Junyi Li, Wangmeng Zuo
Affiliations: Harbin Institute of Technology

jzsherlock:
Title: Dual Branch Network for Bayer Image Denoising Using White Pixel Guidance
Members: Zhuang Jia
Affiliations: Xiaomi

LLCKP:
Title: Synthetic RGBW image and noise
Members: DongJae Lee
Affiliations: KAIST

MegNR:
Title: HAUformer: Hybrid Attention-guided U-shaped Transformer for RGBW Fusion Image Restoration
Members: Ting Jiang, Qi Wu, Chengzhi Jiang, Mingyan Han, Xinpeng Li, Wenjie Lin, Youwei Li, Haoqiang Fan, Shuaicheng Liu
Affiliations: Megvii Technology