
Two-Step Color-Polarization Demosaicking Network

by Vy Nguyen, et al.

Polarization information of light in a scene is valuable for various image processing and computer vision tasks. A division-of-focal-plane polarimeter is a promising approach to capture the polarization images of different orientations in one shot, although it requires color-polarization demosaicking. In this paper, we propose a two-step color-polarization demosaicking network (TCPDNet), which consists of two sub-tasks of color demosaicking and polarization demosaicking. We also introduce a reconstruction loss in the YCbCr color space to improve the performance of TCPDNet. Experimental comparisons demonstrate that TCPDNet outperforms existing methods in terms of the image quality of polarization images and the accuracy of Stokes parameters.



1 Introduction

Polarization information of light in a scene can be helpful in various image processing and computer vision tasks such as transparent object segmentation [5], 3D reconstruction [19], and specular removal [4]. A polarization image can be obtained by placing a polarizer in front of the camera lens, where the images of different polarization orientations of the same scene can be obtained by rotating the polarizer. Four polarization images of 0°, 45°, 90°, and 135° are typically captured to robustly infer necessary polarization information, e.g., Stokes parameters, the angle of polarization (AoP), and the degree of polarization (DoP) [2].
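The standard relations for computing the Stokes parameters, AoP, and DoP from the four measurements can be sketched in a few lines of NumPy (a minimal illustration; the function names are ours, not from the paper):

```python
import numpy as np

def stokes_from_polarization(i0, i45, i90, i135):
    """Linear Stokes parameters from intensity images captured at
    0, 45, 90, and 135 degrees (arrays of the same shape)."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # 0/90-degree difference
    s2 = i45 - i135                     # 45/135-degree difference
    return s0, s1, s2

def aop_dop(s0, s1, s2, eps=1e-8):
    """Angle and degree of linear polarization from Stokes parameters."""
    aop = 0.5 * np.arctan2(s2, s1)             # in [-pi/2, pi/2]
    dop = np.sqrt(s1**2 + s2**2) / (s0 + eps)  # in [0, 1]
    return aop, dop
```

For fully polarized light at 0°, for instance, the relations yield AoP = 0 and DoP = 1.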

There are two popular polarization image acquisition approaches [16]: a division-of-time polarimeter and a division-of-focal-plane polarimeter. In the division-of-time polarimeter, a linear polarizer in front of the camera lens is sequentially rotated in time to obtain different polarization states of a pixel. However, the division-of-time polarimeter is only applicable to static scenes where no camera or object movement exists. In the division-of-focal-plane polarimeter, a micro-polarizer array is used to capture different polarization orientations as a mosaic pattern at once. Each pixel of the captured mosaic pattern only contains the information of one polarization orientation. The division-of-focal-plane polarimeter can be applied to dynamic scenes, but it requires a demosaicking process, which is an interpolation process of missing pixel values. The demosaicking process strongly affects the image quality.

(a) Existing single-step approach
(b) Our two-step approach
Figure 1: Two network-based approaches for color-polarization demosaicking: (a) Existing single-step approach and (b) our two-step approach, which consists of color demosaicking and polarization demosaicking.

Figure 2: The overall pipeline of our Two-step Color-Polarization Demosaicking Network (TCPDNet).

In this paper, we focus on color-polarization filter array (CPFA) demosaicking for a color division-of-focal-plane polarimeter, such as a Sony sensor [8]. Typically, the input of the CPFA demosaicking is 1-channel CPFA raw data and the output is 12-channel full-color-polarization images, as shown in Fig. 1. Each pixel of the input 1-channel CPFA raw data contains the intensity of one polarization orientation and one color channel. The output 12-channel full-color-polarization images are a set of RGB and four polarization images with the angles of 0°, 45°, 90°, and 135°. The CPFA demosaicking is more challenging than color filter array (CFA) demosaicking and monochrome polarization filter array (MPFA) demosaicking, because the sampling density of each color-polarization component in the CPFA raw data is very low.
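To make the data layout concrete, the following sketch re-arranges 1-channel CPFA raw data into four half-size Bayer-mosaicked images, one per polarization orientation. The specific pixel offsets below are illustrative assumptions about a Sony-style CPFA layout, not a specification from the paper:

```python
import numpy as np

# Assumed layout: within every 2x2 cell the four polarization orientations
# repeat, and the 2x2 cells themselves tile a Bayer color pattern.
# The (row, col) offsets per angle are our assumption for illustration.
POL_OFFSETS = {0: (0, 0), 45: (0, 1), 90: (1, 1), 135: (1, 0)}

def split_cpfa(raw):
    """Re-arrange 1-channel CPFA raw data (H, W) into four half-size
    Bayer CFA mosaicked images, keyed by polarization angle."""
    return {angle: raw[dy::2, dx::2] for angle, (dy, dx) in POL_OFFSETS.items()}
```

Each returned image has half the height and width of the raw data and carries a plain Bayer color pattern, which is what allows the subsequent color demosaicking step to be an ordinary Bayer demosaicking problem.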

There are two typical approaches to the CPFA demosaicking problem: an interpolation-based approach [11, 12] and a learning-based approach [18, 17, 15]. The interpolation-based approach is simple, but its performance is usually limited. Because a learning-based approach with a deep neural network has great potential to achieve high performance, we adopt a network-based approach in this paper.

In the literature, there are two representative learning-based CPFA demosaicking methods: CPDNet [17] and CPDCNN [15]. CPDNet is an end-to-end network which directly processes the CPFA raw data [17]. CPDCNN is a two-branch network which includes sub-networks to exploit inter-channel correlations and high-frequency information [15]. Both of them process the color and the polarization information in an intertwined manner.

In this work, we propose a two-step color-polarization demosaicking network (TCPDNet). The network architecture of TCPDNet is inspired by Morimatsu's interpolation-based CPFA demosaicking method [11, 12]. Our method consists of two well-studied sub-tasks: color demosaicking [6, 10] and polarization demosaicking [9, 1, 3], as shown in Fig. 1(b). In each sub-task, the demosaicking network parameters are shared across the different kinds of input mosaicked data. We consider that even if the input mosaicked data differ in polarization orientations (for the color demosaicking task) or color channels (for the polarization demosaicking task), their inter-channel correlations should be the same. For example, the color demosaicking networks for the 0° Bayer CFA mosaicked data and the 45° Bayer CFA mosaicked data share the same parameters because their RGB correlations should be the same regardless of the polarization orientations. We further improve the performance of TCPDNet by introducing a reconstruction loss in the YCbCr color space. Using the loss function in the YCbCr color space, we expect the demosaicking networks to learn the inter-channel correlations effectively. Experimental results show that our TCPDNet outperforms existing methods by a large margin both quantitatively and qualitatively on the Tokyo Tech dataset [11, 12] and the CPDNet dataset [17].

2 Proposed method

2.1 Network architecture

The proposed TCPDNet consists of two sub-networks: a color demosaicking network and a polarization demosaicking network. This design is inspired by the work of Morimatsu et al. [11, 12], which proposes a two-step interpolation-based color-polarization demosaicking method. Figure 2 shows the overall pipeline of our proposed method. The input CPFA raw data are first re-arranged into four sub-sampled Bayer mosaicked data. The sub-sampled Bayer mosaicked data are half the size of the input raw data in both height and width. The color demosaicking networks then demosaick these four sub-sampled Bayer mosaicked data to generate four sub-sampled RGB images, where the weights of the color demosaicking networks are shared. Then, three mosaicked polarization data are generated in the same manner as the pixel shuffle operation [14]. Finally, the polarization demosaicking networks demosaick those three mosaicked polarization data to generate full-RGB-polarization images which form the output 12-channel data, where the weights of the polarization demosaicking networks are also shared.
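The pixel-shuffle-style re-arrangement between the two steps can be sketched as follows: one color channel is taken from each of the four half-size RGB images and interleaved back into a full-size, polarization-mosaicked image. The offset convention is an assumption carried over from our earlier sketch, and the function name is ours:

```python
import numpy as np

def polarization_mosaic(rgb_by_angle, channel, offsets):
    """Interleave one color channel of four half-size RGB images into a
    full-size polarization-mosaicked image (pixel-shuffle style).
    rgb_by_angle: dict angle -> (H/2, W/2, 3); offsets: angle -> (dy, dx)."""
    h, w, _ = next(iter(rgb_by_angle.values())).shape
    mosaic = np.zeros((2 * h, 2 * w), dtype=np.float32)
    for angle, (dy, dx) in offsets.items():
        # place the chosen color channel of this orientation on its sub-lattice
        mosaic[dy::2, dx::2] = rgb_by_angle[angle][:, :, channel]
    return mosaic
```

Running this once per color channel yields the three mosaicked polarization data that the second-step networks take as input.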

The color demosaicking network and the polarization demosaicking network share the same demosaicking strategy. Given the input 1-channel mosaicked data, we first interpolate the mosaicked data by bilinear interpolation and then refine the interpolated image with a CNN, for which we adopt the high-performance U-Net [13] with skip connections, though any kind of CNN architecture can be applied.

2.2 Loss function

We evaluate reconstruction losses with two kinds of images: the sub-sampled RGB images and the full-color-polarization images, in comparison with the ground-truth data. We refer to the reconstruction losses of the sub-sampled RGB images and the full-color-polarization images as $\mathcal{L}_{\mathrm{sub}}$ and $\mathcal{L}_{\mathrm{RGB}}$, respectively, as described in Fig. 2.

In this paper, we use the L1-norm for the reconstruction losses. We also introduce $\mathcal{L}_{\mathrm{YCbCr}}$, which is the reconstruction loss of the full-color-polarization images in the YCbCr color space, as shown in Fig. 2. The backpropagation training with the $\mathcal{L}_{\mathrm{RGB}}$ loss in the RGB color space updates the weights of the polarization demosaicking network in a channel-by-channel manner, while the training with the $\mathcal{L}_{\mathrm{YCbCr}}$ loss in the YCbCr color space is expected to take account of inter-channel correlations.

Let $X$ be the input CPFA raw data and $Z$ be the ground-truth 12-channel full-color-polarization data. Let $H$ be the input height and $W$ be the input width. Let $f_c$ be the color demosaicking network with the parameters $\theta_c$ and $f_p$ be the polarization demosaicking network with the parameters $\theta_p$. Let $\Omega = \{0°, 45°, 90°, 135°\}$ be the set of polarization orientations.

For the color demosaicking network, let $S_\alpha$ ($\alpha \in \Omega$) be the process which sub-samples the pixels of polarization orientation $\alpha$ from $X$ to form 1-channel sub-sampled Bayer CFA mosaicked data, and let $G_\alpha$ be the process which extracts the sub-sampled RGB image for the polarization angle $\alpha$ from the full-color-polarization images $Z$. The reconstruction loss of the sub-sampled RGB images can be expressed as

$$\mathcal{L}_{\mathrm{sub}} = \frac{1}{|\Omega|} \sum_{\alpha \in \Omega} \left\| f_c(S_\alpha(X); \theta_c) - G_\alpha(Z) \right\|_1 . \tag{1}$$

For the polarization demosaicking network, let $Y = \{ f_c(S_\alpha(X); \theta_c) \}_{\alpha \in \Omega}$ be the set of the outputs of the color demosaicking network. Let $P_c$ ($c \in \{R, G, B\}$) be the pixel shuffle operation for the $c$-channel of $Y$, and let $E_c$ be the process to extract the $c$-channel from the full-color-polarization images $Z$. The reconstruction loss of the full-color-polarization images can be expressed as

$$\mathcal{L}_{\mathrm{RGB}} = \frac{1}{3} \sum_{c \in \{R, G, B\}} \left\| f_p(P_c(Y); \theta_p) - E_c(Z) \right\|_1 . \tag{2}$$

The reconstruction loss in Eq. 2 is evaluated in the RGB color space. We then introduce the reconstruction loss in the YCbCr color space. Let $\Phi$ be the function which converts 12-channel full-color-polarization images from the RGB color space to the YCbCr color space, and let $C$ be the operation that concatenates the result of each channel to form the full-color-polarization images. The reconstruction loss in the YCbCr color space can be expressed as

$$\mathcal{L}_{\mathrm{YCbCr}} = \left\| \Phi\left( C\left( \{ f_p(P_c(Y); \theta_p) \}_{c \in \{R, G, B\}} \right) \right) - \Phi(Z) \right\|_1 . \tag{3}$$

Then, our proposed loss function can be expressed by combining the two reconstruction losses as

$$\mathcal{L} = \mathcal{L}_{\mathrm{sub}} + \lambda \, \mathcal{L}_{\mathrm{YCbCr}}, \tag{4}$$

where $\lambda$ is a hyperparameter.

In this paper, we empirically set $\lambda = 4$. The experimental results later show that the loss combination $\mathcal{L}_{\mathrm{sub}} + \mathcal{L}_{\mathrm{YCbCr}}$ significantly outperforms the loss combination $\mathcal{L}_{\mathrm{sub}} + \mathcal{L}_{\mathrm{RGB}}$, which proves the effectiveness of $\mathcal{L}_{\mathrm{YCbCr}}$ over $\mathcal{L}_{\mathrm{RGB}}$.
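As a concrete sketch, the YCbCr loss and the combined loss can be written in a few lines of NumPy. The BT.601 conversion matrix below is one common RGB-to-YCbCr convention; the paper does not specify which convention it uses, and the function names are ours:

```python
import numpy as np

# BT.601 RGB -> YCbCr matrix (an assumed convention). Constant chroma/luma
# offsets are omitted because they cancel inside an L1 difference.
RGB2YCBCR = np.array([[ 0.299,     0.587,     0.114    ],
                      [-0.168736, -0.331264,  0.5      ],
                      [ 0.5,      -0.418688, -0.081312]])

def rgb_to_ycbcr(img):
    """img: (..., 3) RGB in [0, 1] -> (..., 3) YCbCr (offset-free)."""
    return img @ RGB2YCBCR.T

def l1(a, b):
    """Mean absolute error between two arrays."""
    return np.mean(np.abs(a - b))

def total_loss(pred_rgb, gt_rgb, pred_sub, gt_sub, lam=4.0):
    """L = L_sub + lam * L_YCbCr, with lam = 4 as in the paper's setting."""
    l_sub = l1(pred_sub, gt_sub)
    l_ycbcr = l1(rgb_to_ycbcr(pred_rgb), rgb_to_ycbcr(gt_rgb))
    return l_sub + lam * l_ycbcr
```

Because the conversion mixes R, G, and B into each output channel, the gradient of the YCbCr loss couples the color channels, which is the intended inter-channel effect.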

3 Experimental Results

Method | Loss | CPSNR (S0) | CPSNR (DoP) | Angle error of AoP (deg.)
Single-step | L_sub + L_RGB | 44.72 | 37.03 | 14.30
Single-step | L_sub + L_YCbCr | 44.58 | 36.71 | 14.63
TCPDNet | L_sub + L_RGB | 44.59 | 38.74 | 12.70
TCPDNet | L_sub + L_YCbCr | 44.91 | 38.74 | 12.65
Table 1: Ablation study on different loss combinations of TCPDNet and the single-step network on the Tokyo Tech dataset.
Method | I0 | I45 | I90 | I135 | S0 | S1 | S2 | DoP | Angle error of AoP (deg.)
Bilinear interpolation | 34.64 | 34.27 | 35.19 | 34.46 | 36.01 | 42.05 | 39.93 | 30.33 | 23.70
EARI [11] | 38.33 | 37.58 | 39.00 | 37.77 | 39.81 | 45.47 | 42.82 | 32.95 | 20.54
IGRI2 [12] | 38.40 | 37.59 | 39.07 | 37.78 | 39.60 | 46.38 | 43.05 | 33.17 | 20.05
CPDNet (original) [17] | 23.02 | 24.26 | 24.33 | 24.43 | 24.64 | 32.35 | 38.96 | 24.85 | 50.42
CPDNet (re-trained) [17] | 28.01 | 27.81 | 28.10 | 27.81 | 28.23 | 45.23 | 41.84 | 31.24 | 32.32
TCPDNet (Ours) | 43.73 | 43.16 | 44.46 | 43.31 | 44.91 | 52.82 | 48.86 | 38.74 | 12.65
Table 2: Performance comparison on the Tokyo Tech dataset (CPSNR in dB for all columns except the angle error of AoP).

Figure 3: Qualitative comparison between our proposed TCPDNet and existing methods. The scene is from Tokyo Tech dataset.

3.1 Datasets and metrics

We evaluated the proposed network with two publicly available datasets: the Tokyo Tech dataset [11] and the CPDNet dataset [17]. Due to the page limitation, we only show the results on the Tokyo Tech dataset in this paper. Additional results on the Tokyo Tech and CPDNet datasets are available as supplemental material on our project page. In those two datasets, the ground-truth 12-channel full-color-polarization images were taken by the division-of-time polarimeter approach. Then, the CPFA raw data were synthesized with the corresponding CPFA pattern. We quantitatively evaluated with the Color Peak Signal-to-Noise Ratio (CPSNR) and the angle error of AoP, following the related works [11, 12].

The Tokyo Tech dataset includes 40 scenes of 1024 × 768 resolution. The evaluations in this work were conducted on our splits: 30 scenes for the training set, two scenes for the validation set, and eight scenes for the testing set.

3.2 Training details

We trained the networks with fixed-size patches cropped from the full-size training images, while we tested with the full-size test images. For training batch generation, we first randomly sampled six different images. Then, four image patches were cropped from each sampled image. Therefore, one training batch for updating the network weights consists of 24 image patches. We used the Adam optimizer [7] with a fixed learning rate throughout the training. Each model was trained for 200,000 iterations. Our code is available on our project page.

For data augmentation, we applied random rotations of 90°, 180°, and 270° on the 12-channel ground-truth full-color-polarization images before the CPFA raw data synthesis. Applying a clockwise rotation of angle θ on the polarization image is equivalent to rotating the camera by θ clockwise around its Z-axis, which results in a change of the AoP: every pixel in the AoP image should be subtracted by θ [5]. We took this AoP change into account by cyclically re-arranging the order of the polarization channels of the 12-channel ground-truth full-color-polarization images for each rotation angle.
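A rotation-augmentation step of this kind can be sketched as follows for the 90° case. Since the four orientations are spaced 45° apart, a 90° rotation corresponds to a shift of two positions in the channel order; the exact direction of the shift depends on the AoP sign convention, which is an assumption here, as is the function name:

```python
import numpy as np

def rotate90_polarization(stack, angles=(0, 45, 90, 135)):
    """Rotate a 12-channel (4 angles x RGB) stack by 90 degrees clockwise.
    A 90-degree camera rotation shifts the AoP by 90 degrees (mod 180), so
    the spatial rotation must be paired with a cyclic re-ordering of the
    polarization channels. stack: dict angle -> (H, W, 3) image."""
    rotated = {a: np.rot90(img, k=-1) for a, img in stack.items()}  # clockwise
    shift = 2  # 90 / 45 = 2 positions in the 45-degree-spaced angle list
    return {angles[i]: rotated[angles[(i + shift) % 4]] for i in range(4)}
```

The 180° and 270° rotations follow the same pattern with shifts of 0 (mod 4 after two steps of 2) and 2 applied repeatedly.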

3.3 Ablation study

We evaluated two network architectures and two losses in different color spaces. We compared the proposed TCPDNet with the single-step color-polarization network. We also compared the loss in the RGB color space and the loss in the YCbCr color space. Our ablation study was conducted on the Tokyo Tech dataset.

From Table 1, we find that the YCbCr loss does not contribute to the improvement of the single-step network, but it greatly improves the performance of TCPDNet. This is expected because the single-step network jointly generates the full-color-polarization images from the start, while the proposed TCPDNet generates them on a per-channel (monochrome) basis. Thus, the YCbCr loss plays an important role in exploiting the inter-channel correlations in our TCPDNet to boost the performance.

3.4 Comparison with existing methods

We compared our proposed TCPDNet, trained with the combination of the sub-sampled and YCbCr losses, against existing methods on the Tokyo Tech dataset. We compared with five algorithms: bilinear interpolation, EARI [11], IGRI2 [12], CPDNet (original) [17], and CPDNet (re-trained) [17]. The weights of CPDNet (original) are provided by the authors of [17], while we re-trained CPDNet (re-trained) on the Tokyo Tech dataset. For the learning-based methods, each model was trained five times and the averaged metric values were evaluated.

Table 2 shows the quantitative comparisons on the Tokyo Tech dataset, where a higher CPSNR is better and a lower angle error is better. We evaluated four color-polarization images (I0, I45, I90, and I135), three Stokes parameters (S0, S1, and S2), the DoP, and the AoP. From Table 2, we find that the proposed TCPDNet clearly outperforms the other existing methods by a large margin.
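The CPSNR metric used in the tables can be sketched as PSNR with the mean squared error averaged jointly over all pixels and color channels (a standard definition; the peak value depends on the image range):

```python
import numpy as np

def cpsnr(pred, gt, peak=1.0):
    """Color PSNR in dB: PSNR computed with the MSE averaged over all
    pixels and color channels jointly, for images in [0, peak]."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, a uniform error of 0.1 on images in [0, 1] gives a CPSNR of 20 dB.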

We conducted a qualitative comparison on the reconstructed images and the visualizations of the DoP and AoP. Figure 3 visualizes the results of different methods for a scene from the Tokyo Tech dataset, where we visualize the AoP-DoP in the same manner as [11, 12]. Our proposed TCPDNet produces better results with clearer edges and fewer artifacts. On the other hand, we can observe obvious color artifacts in the images estimated by the existing methods, especially on the character "S". Regarding the results of the AoP-DoP visualization, EARI and IGRI2 hardly preserve the edge information. The re-trained CPDNet generally provides better results, but not as close to the ground truth as TCPDNet.

4 Conclusion

In this work, we have proposed a two-step color-polarization demosaicking network, referred to as TCPDNet. The network comprises two sub-networks: a color demosaicking network and a polarization demosaicking network. We have also introduced a reconstruction loss in the YCbCr color space to improve TCPDNet. Our proposed TCPDNet quantitatively and qualitatively outperforms existing methods by a significant margin. Compared to the existing methods, our TCPDNet best preserves the edge information with the fewest color artifacts.


  • [1] A. Ahmed, X. Zhao, J. Chang, H. Ma, V. Gruev, and A. Bermak (2018) Four-directional adaptive residual interpolation technique for DoFP polarimeters with different micro-polarizer patterns. IEEE Sensors Journal 18 (19), pp. 7990–7997. Cited by: §1.
  • [2] C. P. Huynh, A. Robles-Kelly, and E. R. Hancock (2013) Shape and refractive index from single-view spectro-polarimetric images. Int. Journal of Computer Vision 101 (1), pp. 64–94. Cited by: §1.
  • [3] T. Jiang, D. Wen, Z. Song, W. Zhang, Z. Li, X. Wei, and G. Liu (2019) Minimized laplacian residual interpolation for DoFP polarization image demosaicking. Applied Optics 58 (27), pp. 7367–7374. Cited by: §1.
  • [4] L. V. Jospin, G. Baechler, and A. Scholefield (2018) Embedded polarizing filters to separate diffuse and specular reflection. Proc. of Asian Conf. on Computer Vision (ACCV), pp. 3–18. Cited by: §1.
  • [5] A. Kalra, V. Taamazyan, S. K. Rao, K. Venkataraman, R. Raskar, and A. Kadambi (2020) Deep polarization cues for transparent object segmentation. Proc. of IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 8602–8611. Cited by: §1, §3.2.
  • [6] D. Kiku, Y. Monno, M. Tanaka, and M. Okutomi (2016) Beyond color difference: residual interpolation for color image demosaicking. IEEE Trans. on Image Processing 25 (3), pp. 1288–1300. Cited by: §1.
  • [7] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §3.2.
  • [8] Y. Maruyama, T. Terada, T. Yamazaki, Y. Uesaka, M. Nakamura, Y. Matoba, K. Komori, Y. Ohba, S. Arakawa, Y. Hirasawa, Y. Kondo, J. Murayama, K. Akiyama, Y. Oike, S. Sato, and T. Ezaki (2018) 3.2-MP back-illuminated polarization image sensor with four-directional air-gap wire grid and 2.5-μm pixels. IEEE Trans. on Electron Devices 65 (6), pp. 2544–2551. Cited by: §1.
  • [9] S. Mihoubi, P. Lapray, and L. Bigué (2018) Survey of demosaicking methods for polarization filter array images. Sensors 18 (11), pp. 3688. Cited by: §1.
  • [10] Y. Monno, D. Kiku, M. Tanaka, and M. Okutomi (2017) Adaptive residual interpolation for color and multispectral image demosaicking. Sensors 17 (12), pp. 2787. Cited by: §1.
  • [11] M. Morimatsu, Y. Monno, M. Tanaka, and M. Okutomi (2020) Monochrome and color polarization demosaicking using edge-aware residual interpolation. Proc. of IEEE Int. Conf. on Image Processing (ICIP), pp. 2571–2575. Cited by: §1, §1, §2.1, §3.1, §3.4, §3.4, Table 2.
  • [12] M. Morimatsu, Y. Monno, M. Tanaka, and M. Okutomi (2021) Monochrome and color polarization demosaicking based on intensity-guided residual interpolation. IEEE Sensors Journal 21 (23), pp. 26985–26996. External Links: Document Cited by: §1, §1, §2.1, §3.1, §3.4, §3.4, Table 2.
  • [13] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. Proc. of Int. Conf. on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241. Cited by: §2.1.
  • [14] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1874–1883. Cited by: §2.1.
  • [15] Y. Sun, J. Zhang, and R. Liang (2021) Color polarization demosaicking by a convolutional neural network. Optics Letters 46 (17), pp. 4338–4341. Cited by: §1, §1.
  • [16] J. S. Tyo, D. L. Goldstein, D. B. Chenault, and J. A. Shaw (2006) Review of passive imaging polarimetry for remote sensing applications. Applied Optics 45 (22), pp. 5453–5469. Cited by: §1.
  • [17] S. Wen, Y. Zheng, F. Lu, and Q. Zhao (2019) Convolutional demosaicing network for joint chromatic and polarimetric imagery. Optics Letters 44 (22), pp. 5646–5649. Cited by: §1, §1, §1, §3.1, §3.4, Table 2.
  • [18] S. Wen, Y. Zheng, and F. Lu (2021) A sparse representation based joint demosaicing method for single-chip polarized color sensor. IEEE Trans. on Image Processing 30, pp. 4171–4182. Cited by: §1.
  • [19] J. Zhao, Y. Monno, and M. Okutomi (2020) Polarimetric multi-view inverse rendering. Proc. of European Conf. on Computer Vision (ECCV), pp. 85–102. Cited by: §1.