Multispectral Compressive Imaging Strategies using Fabry-Pérot Filtered Sensors

02/06/2018 ∙ by Kevin Degraux, et al. ∙ Université Libre de Bruxelles imec Université catholique de Louvain 0

This paper introduces two acquisition device architectures for multispectral compressive imaging. Unlike most existing methods, the proposed computational imaging techniques do not include any dispersive element, as they use a dedicated sensor which integrates narrowband Fabry-Pérot spectral filters at the pixel level. The first scheme leverages joint inpainting and super-resolution to fill in those voxels that are missing due to the device's limited pixel count. The second scheme, in link with compressed sensing, introduces spatial random convolutions, but is more complex and may be affected by diffraction. In both cases we solve the associated inverse problems by using the same signal prior. Specifically, we propose a redundant analysis signal prior in a convex formulation. Through numerical simulations, we explore different realistic setups. Our objective is also to highlight some practical guidelines and discuss their complexity trade-offs to integrate these schemes into actual computational imaging systems. Our conclusion is that the second technique performs best at high compression levels, in a properly sized and calibrated setup. Otherwise, the first, simpler technique should be favored.



There are no comments yet.


page 13

page 15

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Multispectral (MS) imaging consists in capturing the light intensity, , of an object or scene as it varies along its 2-D spatial coordinates and over different wavelengths , i.e., the light spectrum as measured into a few intervals or bands. This information is sampled in a 3-D data volume, which allows for accurate classification or segmentation of constituents in an object or scene from their spectral profile. Hence, multispectral (MS) imaging finds diverse applications in remote sensing [1], optical sorting [2], astronomy [3], food science [4], medical imaging [5] and precision agriculture [6].

A classic approach is to spatially or spectrally multiplex the MS cube over a 2-D Focal Plane Array (FPA). This is done by scanning the cube, so that specific slices are sequentially acquired by the sensor in several snapshots (for a review see, e.g., [7]). Such systems require either tunable spectral filters or dispersive elements with mechanical parts to scan the object or scene. These approaches entail trade-offs between complexity and cost, spectral and spatial resolution, and acquisition time.

Recently, single-snapshot MS imagers were developed to rapidly acquire a MS cube, thus avoiding motion artifacts and enabling video acquisition rate [8]. Among such imagers, we focus on those using Fabry-Pérot (FP) filtered sensors [9, 10], i.e., standard CMOS imaging sensors on top of which an array of spectral filters is deposited. This technique generalizes RGB filter arrays [11] to filter banks using an arbitrary number of narrowband profiles [9], e.g., a few tens. Thus the array imposes a reduction in spatial resolution as the sensor’s pixels are partitioned between bands.

This paper investigates MS imaging strategies based on Compressed Sensing (CS) (see, e.g., [12, 13]), an established signal processing paradigm that has inspired several computational imaging frameworks [14, 15, 16, 17]. After acquisition by a compressive device, the measurements are fed into a recovery algorithm along with the sensing operator and signal prior. Under broad theoretical conditions [18, 19], this method recovers a high-resolution approximation of the target scene, even if the sensing was performed below the scene’s Nyquist rate. The complexity of the sensing operation (e.g., resolution, time) is therefore balanced to the complexity of the signal with respect to a given prior.

1.1 Main Contributions

Our work contributes to advancing the field of MS compressive imaging in the following senses:

  1. [label=()]

  2. We propose two MS snapshot imaging strategies: Multispectral Volume Inpainting (MSVI) and Multispectral Random Convolution (MSRC). Both maintain a relatively low system-level complexity without any dispersive element. Using CS principles, they are designed with a low-pixel-count FP sensor.

  3. MSVI leverages a generalized inpainting procedure, as discussed in [20], to provide a simple integration of the FP sensor in a computational imaging scheme. This architecture performs a spatio-spectral subsampling of the MS cube and relies on their redundancy to obtain a high-resolution recovery. It is fairly simple and works best at lower compression levels.

  4. MSRC leverages random convolution, as discussed in [21], to provide spatial-domain CS by means of an out-of-focus random Coded Aperture (CA), i.e., an array of square apertures randomly placed on an opaque screen. It preserves the spectral resolution, fixed by the low number of narrowband FP filters (e.g., ) on the FPA. In an ideally sized, low-noise setup, this more complex architecture clearly improves the recovered quality, especially at higher compression levels. However, it entails some optical design challenges, as discussed in Section 4.2.

  5. Our analysis is paired with a discussion on the analysis-sparse signal prior, the associated convex optimization formulation and fine-tuned ADMM algorithm [22, 23] for the large-scale recovery of MS cubes.

  6. Both architectures are numerically compared in terms of achievable recovery performances. We also discuss their complexity trade-offs and design guidelines, by identifying unavoidable adverse optical effects, to integrate these schemes into realistic imaging systems.

Our findings and numerical results corroborate how a conspicuous reduction in the number of measurements w.r.t. the Nyquist-rate representation of is made possible by both architectures while preserving high Peak Signal-to-Noise Ratio (PSNR). Table 1 summarizes some pros and cons of each strategy, which are detailed and clarified throughout the paper.

The paper is organized as follows. The rest of Section 1 places our contribution in perspective with respect to related work in the literature and introduces useful notations. Section 2 presents the FP filters technology, the proposed MS analysis sparsity prior and the inverse problem formulation, common to both strategies, as well as the associated reconstruction algorithm. Section 3 details the specifics of the MSVI strategy, i.e., its image formation model and sensing matrix. Section 4 similarly provides the details of the MSRC strategy and discusses some associated non-idealities and practical considerations. Section 5 presents numerical reconstruction results. We demonstrate the MSVI performances with experimental data and compare MSVI and MSRC, using simulated acquisition. The final section gives a brief conclusion.

Optical system complexity Very simple Simple (simpler than [16])
Optical calibration Simple Complex
Robustness to noise Good Good
Robustness to miscalibration Acceptable Low (see, e.g.,  [44, 45])
PSNR at 1:16 compression 33dB 37dB (with or without PSF)
PSNR at 1:2 compression 48dB 50dB (41dB with PSF)
Acquisition speed Fast Fast
Initialization quality (7) Acceptable Low
Table 1: Comparison of the proposed imaging strategies. Peak Signal-to-Noise Ratio (PSNR) based on Fig. 6. See MSVI initialization on Fig. 7.

1.2 Related Work

1.2.1 Compressive Spectral Imaging

The use of CS for MS imaging schemes dates back to [16, 24]. The most popular application of CS to spectral imaging is the Coded Aperture Snapshot Spectral Imaging (CASSI) framework [16, 25, 26, 27, 15], with its many variants summarized hereafter. Single-disperser CASSI uses a random CA to partially occlude the focused scene. A refractive prism or grating then shears the spatio-spectral information, and the processed light is recorded by a standard imaging sensor. The system introduced by [26] shows high spectral accuracy after image recovery, at the expense of lower spatial accuracy. Double-disperser CASSI [16] achieves opposite performances in terms of spatial versus spectral accuracy, but requires non-trivial calibration and geometric alignment of its optical components. A close line of work in [28, 29] proposes a snapshot spectral imaging architecture. It is based on CASSI but features wide-band spectral filters, which provide random spatio-spectral subsampling after the shearing element. Non-snapshot spectral imaging architectures based on CS were also recently proposed [24, 30, 31].

CASSI and its variants target a relatively large number of bands and are intrinsically capable of achieving high spectral resolution thanks to dispersive optics. However, when the spectrum is well represented by fewer bands, spectral mixing is less effective, for CS purposes, than spatial mixing, especially for FP-filtered sensors with only a few tens of narrowband filters (e.g., [32]) whose high selectivity excludes spectral super-resolution. In this work, we focus on those FP filtered schemes that target a few bands without using any dispersive element.

1.2.2 Compressive Imaging by Random Convolution

Since its introduction, CS has been envisioned to provide image acquisition at reduced sensor resolution [33] or shorter acquisition times [34] (see also the tutorial in [35] and references therein). In particular, the second strategy proposed in this paper is related to CS by random convolution: The sensing operation acts as a spatial-domain convolution with a random filter, e.g., a random CA as in CA imaging [36, 37]. More recently, the subsampled random convolution operation was shown, in [38, 19, 39], to comply with theoretical results of CS. This operation was also featured in recent imaging architectures [40, 41, 42]. Convolution-based schemes are appealing because they allow for a fast sensing operation. Indeed, the compressive measurements can be formed in one frame of a full imaging sensor, as opposed to single-pixel camera designs [33, 24, 43], where the compressive measurements are multiplexed in time. Moreover,

Fast Fourier Transform

(FFT) implementations of the convolutions drastically reduce the computational cost of reconstruction compared to unstructured sensing operations. However, the snapshot capability and numerical efficiency of random convolution architectures are paid by a higher correlation between adjacent compressive measurements, because of their spatial adjacency and considering the optical-level non-idealities such as the Point Spread Function (PSF) of the optical elements. In this work we propose a MS extension of the low-complexity snapshot imaging architecture introduced by Björklund and Magli [42]. In particular, their architecture uses a CA placed out-of-focus to provide random convolution.

1.3 Notations

Vectors are noted (bold), matrices, (upper-case bold), 3-D arrays, (caligraphic bold), is the vectorization of in row-major ordering,

is the identity matrix and

is the adjoint of . We note , is the -norm of and is the operator -norm. The proximal operator associated to is defined by . The full (2-D discrete) convolution between and is and its valid part (Matlab and NumPy terminology, i.e.

, fully overlapping inputs without zero-padded edges) is

. The indicator if is inside the set and otherwise. We note , the diagonal matrix with diagonal ; , the block-diagonal matrix with blocks arranged without overlap and , repeating times .

2 Preliminaries

In this section, we introduce the FP filtered sensors used in both strategies. We then propose an MS image analysis sparsity prior. This prior is used to regularize an inverse problem through a convex optimization program, again common to both strategies. This section then describes the convex formulation and the associated recovery algorithm used in Section 5.

2.1 Fabry-Pérot Filtered Sensors

(a) (b) (c) (d)
Figure 1: FP filtered sensors: (a) transmittance profiles (arbitrary units) of a -band FP filter bank in the VIS range; depiction of (b) mosaic, (c) random, and (d) tiled filters as deposited on a CMOS sensor array (gray).

The class of imaging sensors at the core of this work are comprised of a standard CMOS sensor designed to operate in the visible light (VIS) range of wavelengths (i.e., 400–700 nm), on which a layer of Fabry-Pérot interferometers [46] is deposited. The latter, whose physics is well described in [47], act on the spectrum of incoming light as band-pass filters whose center wavelength and width are designed to yield narrowband profiles (about ).

Once the filter profiles are designed, the FP filters can be manufactured to cover the area of a single pixel, either in a mosaic layout [10], where a group of different filters is repeated in a or mosaic pattern (Fig. 0(b)), or by partitioning the sensor in a tiled layout, where the sensor is partitioned in large areas with the spectral filter for a specific wavelength deposited on top of them [48] (Fig. 0(d)). While it is possible to envision architectures that use tiled layouts [21, 10] we here focus on mosaic designs, as they will allow a reduction of the correlation between measurements taken on adjacent sensor pixels. Such a sensor, [10], was designed and prototyped at imec and will be referred to as imec’s sensor in the following. The use of an external spectral cutoff filter, removing anything outside the VIS range, allows one to obtain a filter bank such as the one depicted in Fig. 0(a). These profiles were generated for illustration purposes based on calibration measurements taken at imec. The raw data was post-processed to only keep the main lobe of each filter response. In particular, smaller secondary modes, which can appear at harmonic wavelengths (see [47] and [10] for another example), were removed for clarity. In this work, we consider an idealized situation, ignoring secondary modes. Furthermore, in a real situation, we must compensate for the attenuation coefficients, either before, during or after reconstruction.

Hereafter, for the sake of simplicity, we approximate the spectral responses by a Dirac delta, , at each filter’s centre wavelength , with equal gain. We consider a sensor featuring a -band filter bank with uniformly spaced center wavelengths between , either placed in a mosaic pattern (Fig. 0(b)), or with a randomly-assigned arrangement called random pattern (Fig. 0(c)). The latter has not been manufactured in practice but should not pose any major difficulty compared to the mosaic pattern. In simulations (Section 5.2.1), the random pattern is generated by permuting the assigned locations of the filters over the entire FPA.

2.2 Forward model and analysis sparsity prior

Let represent a discretized MS cube in its 2-D spatial and 1-D spectral domains, equivalently represented by its vectorization . Both studied architectures entail a noisy linear acquisition process, summarized by the following generic forward model,


In this model, the linear sensing operator is represented in matrix form by where . It yields a set of compressive measurements that are captured by the sensor array, or in vectorized form . The noise vector is bounded in -norm by .

As any computational imaging system based on regularized inverse problems, our schemes must leverage a prior model for the signal being acquired. We here choose to use an analysis sparsity prior (see, e.g., [49, 50, 51]

). Specifically, we separately apply linear transforms to the spatial and spectral domains, denoted by

and . This amounts to constructing a separable transform by the Kronecker product . For the spatial domain transform, , we chose a 2-D Daubechies-4 Undecimated Discrete Wavelet Transform (UDWT) which forms a shift-invariant, separable, and overcomplete wavelet transform [52, 53]. The approximation level (scaling coefficients) is inherently not sparse as it contains the low-pass approximation of the image. We found, however, that the slowly varying spatial information helps in leveraging the redundancy between bands. We thus use a 2-D Discrete Cosine Transform (DCT), which concentrates the low-pass information in a few coefficients, making it consistent with our sparsity prior. The wavelet filters are chosen with length 8 and in 3 levels, resulting in an analysis transform (3 levels 3 directions 1 approximation level). The DCT is chosen for the 1-D spectral domain transform, , given that we focus on MS cubes from natural scenes with smooth spectral profiles.

2.3 Recovery Method

The recovery method consists in inverting (1

) to find an estimate

of the MS cube, using the analysis-sparsity prior. We use the -analysis formulation from [50], with an additional range constraint , which reads


A good noise estimate can be used for setting the parameter . We solve the non-smooth convex optimization program (2) using the Alternating Directions Method of Multipliers (ADMM) introduced in [54, 55]. Specifically, we use the version from [23, Algorithm 2], recasted to solve problems of the form


where are convex lower semicontinuous functions and are linear operators such that has full column rank. A practical implementation requires efficient computation of the proximal operators [56, 57] associated to the functions , as well as the matrix-vector products and for arbitrary and . A crucial step of the algorithm is the matrix inversion, , for some . Any property of the matrices, , that can simplify that step should be exploited. In particular, the tight frame property, i.e., ; Fourier diagonalization, i.e., , where is the Discrete Fourier Transform (DFT) and is diagonal; or the sparsity and separability of , are used in the following.

For their blind deconvolution problem, [23] proposes to handle the boundary conditions by adding and then masking the missing rows of the block-circulant sensing operator. This stabilizes the estimation, while recovering the block-circulant structure of the convolution operator, allowing Fourier diagonalization. Building over these ideas, as detailed for both architectures in Sections 3.2 and 4.3, we can arbitrarily add rows and columns to in order to exploit one of the properties cited above. We define an extended sensing matrix (with and ) and restriction operators and , i.e., that restrict input vectors (of length and ) to some arbitrarily chosen index sets (of length and ), such that


Note that the adjoint, , of the restriction, , is the corresponding zero-padding operator. In addition to that factorization of , it happens that the analysis transform introduced above is actually a scaled tight frame, i.e., there exists a diagonal weighting matrix such that and . In order to make use of the tight frame property of and the factorization (4) of , we define and where is the complementary restriction of , such that , and

is the zero matrix. The tight frame property,

, is thus preserved. Let , i.e., a zero-padded version of , and let and . Note that and . By imposing , we get the equivalent problem to (2),


Let , and be the indicators of the sets , and . In order to match the required form, (3), the problem (5) is then split in functions,


The corresponding proximal operators all admit a very simple closed form expression that can be efficiently evaluated (see, e.g., [56, 57] and references therein). Moreover, we have


which, as we will see, is easily invertible for both architectures.

For initializing the algorithm, we use a 3-D linear interpolation of

in the 3-D space to get an estimate . Let so that (but ). We then use Tikhonov regularization,


that we practically solve using the conjugate gradients algorithm, i.e., without matrix inversion.

3 Multispectral Compressive Imaging by Generalized Inpainting

The first architecture, coined Multispectral Volume Inpainting (MSVI), is presented in this section. We describe the formation and recording of measurements on the snapshot FP sensor and the corresponding sensing matrix implementation. The description below is aligned with Fig. 2.

3.1 Image Formation Model

Figure 2: MSVI forward model.

Our scheme allows us to choose, as a free parameter, the target spatial resolution, , of the target MS volume, (see Fig. 2). We choose to target a smaller resolution than the sensor resolution, , i.e., and , even though . We assume that a spatial low-pass filter at appropriate frequency has been placed before the device so that the chosen resolution, , achieves the Nyquist rate of the resulting low-pass scene, . This practice is common for stabilizing demosaicking [58], e.g., using birefringent filters [59] or by slightly defocusing the objective lens.

Let be an upscaled version of the scene , i.e., matching the FPA pixel count . We can obtain this by using a smooth and separable interpolation function, e.g., Lanczos, represented here by the linear operator, , applied to each band. Since achieves the Nyquist rate, both and are lossless representations of . This de-couples the number of FPA pixels, , from the target scene resolution, ; we may choose the subsampling rate by changing one or the other. In order to model the sensing operation and its relation with the upsampling , we introduce the diagonal mask operator, , masking all FPA pixels but the ones corresponding to the FP filters of index . Since every FPA pixel is sampling exactly one band, we have and for so that the concatenation, is a restriction operator in such that . The sensing matrix in (1) finally reads with . This forward model is schematized on Fig. 2.

Set aside the clear affiliation with the inpainting problem in computer vision 

[60, 61, 62], we can make links with the random basis ensembles (see, e.g., [63, Chapter 12] and references therein) in CS. In the spectral direction, the sparsity basis is the DCT, which is maximally incoherent with the canonical basis and therefore an optimal choice. On the other hand, in the spatial dimension, loosely speaking and ignoring upsampling, the sparsity “basis” is a wavelet transform which is not maximally incoherent with the canonical basis. This intuitively justifies the study of the second method in Section 4. Rigorously extending the analogy to redundant wavelet analysis with the upsampling would require further work.

3.2 Sensing matrix implementation

As explained in Section 2.3, we can ease the computations by adding rows and columns to (see (4)). One natural choice is and , so that Therefore, is a separable sparse matrix which is easily pre-computed and fast to invert, for example with the conjugate gradients algorithm. Even though the gain is less obvious than in the MSRC case discussed in Section 4.3, we found, empirically, that using this trick speeds up ADMM’s convergence compared to the direct use of . Let it be noted that in the case and , i.e., a subsampling rate of , we have , which makes the inversion step as trivial as a scalar multiplication by .

4 Multispectral Compressive Imaging by Out-of-Focus Random Convolution

This section describes the Multispectral Random Convolution (MSRC) device. First, we discuss the image formation model, then some implementation aspects linked to important non-idealities, such as attenuation and diffraction. Finally, we discuss the numerical implementation of the sensing matrix, to be used in the recovery method of Section 2.3.

4.1 Image Formation Model

Figure 3: Geometry of the MSRC optical path.

We give here a description of the MSRC device, based on geometrical optics, depicted on Fig. 3. This follows the ideas originally introduced by [42]. The difference, here, is that we use the FP filtered sensor instead of a panchromatic sensor.

4.1.1 Continuous model

For a precise description, it is easier to use the continuous domain representation, , of the object of interest. In order to lighten the notations, we will consider only one spatial dimension and use a simplified instead of . Since everything is separable, the two dimensional extension is straightforward. from which we consider that a virtual flipped source radiates in all directions allowed by the aperture of the objective lens. It illuminates a random CA with elements, at a distance along the optical axis. The CA, with aperture pitch , is modeled by its transmittance function111In order to lighten the notations, we introduce the two following sampling functions, for any grid length, , and sampling rate (or pitch), ,

where is Dirac’s delta function and is the boxcar function, i.e., if and elsewhere. They are defined so that for sampling indices , the sampling grid is always centered around .,


Its known symbols are

with equal probability, modeling either transparent (

) or opaque () pixels. We assume that the CA has negligible effect in the spectral domain. The CA is illuminated by replicas of the source, shifted by as bundles of parallel rays that propagate in the same directions, defined by the angle w.r.t. the optical axis. An ideal thin lens with focal length , placed in front of the CA, then focuses the modulated light on the sensor. All the rays with direction converge on the focal plane at ,


with denoting, here, a continuous convolution. This defines the relationship with between , , and the pixel pitch of the imaging sensor, . For , we choose to model the sampling function corresponding to the detector by . In this notation, we highlight the fact that the sensor is spectrally-filtered, i.e., we assign a different wavelength depending on the pixel index (see Section 2.1). The measurement, is obtained as


forming the discrete measurements vector, . Note that this spectral filtering step is the continuous equivalent of the one described in Section 3 and represented on the right part of Fig. 2, but where replaces . There are sensor pixels, i.e., shifts of the target on the CA, which covers CA elements. The CA must therefore have elements to cover all recorded angles.

As explained in [42], we can alter the CA pattern and measurements vector so that the symbols of effectively become instead of . We propose to either use two complementary patterns and , where transparent pixels () become opaque () and vice versa, and subtract the corresponding measurements vectors, , or to subtract measurement made with a fully transparent aperture, (i.e., ), from , i.e., . This implies the use of a programmable CA or a fixed mask that can easily be removed for a full, non-coded acquisition (see Section 4.2). In the rest, we consider that the equivalent pattern is used.

4.1.2 Discrete model

The discrete linear forward model of the optical processing chain stems from a particular discretization of the target volume. The most natural choice, in this instance, is to replace , in (10), by its Dirac-sampled version,


where is a sample of the discrete target volume. Note that the sampling functions of and are nicely aligned with each other, so that (10) directly translates to the discrete model. Coming back to two discrete spatial dimensions, the discrete forward model thus reads


where is the array of recorded measurements; are the mask linear operators modeling the effect of the FP filters (they correspond to the matrices introduced in Section 3); the filter represents the discrete, 2-D version of ; and is the band of index of the full cube, . The size, , of the CA is chosen so that the valid convolution (noted , see Section 1.3) matches the size of the sensor, i.e., .

4.1.3 Multi-snapshot mode

Since a total of measurements is recorded by the sensor, the latter produces measurements per band. We consider the possibility of partitioning the acquisition of by taking multiple snapshots, , with different aperture patterns, i.e., . Therefore, the total number of measurements becomes . Taking multiple snapshots with different aperture patterns is expected to reduce the correlation between measurements. As the size of the FPA decreases, while keeping constant, the multi-snapshot device resembles more and more the single-pixel camera [33], equivalent to setting .

4.2 Non-idealities and practical considerations

The parallel compressive MSRC scheme entails some additional concerns for its actual implementation. Hereafter, we explain the effect of diffraction and a few other non-idealities.

4.2.1 Diffraction and Point Spread Function

As anticipated by [42], the main optical-level limitation of this scheme is the impact of diffraction that occurs at the CA. A single small square aperture, followed by a lens and illuminated by a plane wave, forms a diffraction pattern at the focal plane [64, Chapter 4]. The effect of diffraction at the CA is modeled as an optical filter whose Point Spread Function (PSF) is that pattern. The 2-D, wavelength-dependent, diffraction kernel has the following expression at the focal plane,

where is an unimportant energy conservation constant (normalized afterwards). This PSF has a low-pass effect that limits the spatial bandwidth of the system, causing more correlation between measurements and a decrease of performance. We again simplify the discussion to one spatial dimension.

Right before being sampled by the sensor, as in (10), the ideal function is spatially convolved with as


with , the kernel as equivalently viewed at the CA scale. Note that this rescaled does not physically appear at the CA (since the diffraction pattern is only observed in the focal plane) but is only a notational trick allowing mathematical simplification. The expression for the discrete measurements (10) becomes


In order to define the discrete sensing model, we inject (11) in (14) by replacing by . After expanding the expression of (the details are omitted for brevity), one can verify that the result is completely equivalent to replacing the continuous kernel in (14) by a discretized version defined by


where the PSF samples are given by


The number, , of kernel samples is determined by the size of the window in which they are significantly bigger than zero, and is a normalization factor such that . Note that sampling with steps is equivalent to sampling with steps . Also note that the sampling function, , in (16) comes from the expression of (see (8)) but also depends on the chosen discretization and sampling functions , modeling the sensor pixels.

Let be the 2-D discrete PSF, by evaluating the 2-D generalization of (16) at wavelengths . Using (15), we can now adapt the 2-D multi-snapshot discrete model as,


where we can pre-compute the diffracted, wavelength dependent aperture patterns, . Since the size, , of the focal plane matches the valid convolution with a CA of size , we can safely truncate the diffracted aperture pattern, , to an effective size of .

Keep in mind that the modeled diffraction kernel is an approximation based on assumptions such as the use of a perfect thin lens, the fact that the object is an incoherent plane wave, etc. The actual PSF of the system could be measured, for instance, using a pre-defined CA along with a point-like target light source to estimate the PSF with a regularized inverse problem (see, e.g., [65, 66]). Spatially-dependent PSFs could also be estimated with similar techniques. We leave this subject open to future investigation.

4.2.2 Sizing example

We can compute the size of the diffraction kernel as a function of the pixel pitches and and the focal length, , which is constrained by the size of the lens and thus the size of the CA. Specifically, the diameter of the focusing lens must be bigger than the CA, i.e., . Moreover, practical lenses should have a sufficiently high F-number to avoid aberrations, i.e., . We can characterize the width of the diffraction kernel on the sensor by the location of its first zeros, where the argument of the is , i.e., in pixels,

so that . We thus apply this to the simulations parameters of Section 5.2.2, i.e., , , so that . Notice that the largest PSF width corresponds to the longest wavelength, 620nm. Let 80m so that the CA is about 41mm wide and we must choose a lens of at least 58mm in diameter, with focal length 29mm, e.g., we can arbitrarily choose 40mm. All these parameters being fixed, the PSF width on the focal plane (at 620nm) is 620m and the number of pixels is determined by the pixel pitch of the sensor, which also determines the distance .

Figure 4: 11 pixels wide diffraction PSF ( 55).

These parameters are incompatible with standard CMOS technology of 5.5m used in [10]. In that case, we get an impractical width of 112 pixels. The workaround, proposed in [42], of binning pixels together in macro-pixels is wasteful and defeats the purpose of the compressive architecture. Another way of modifying the equivalent , requiring further investigation, is to magnify the sensor as viewed from the focusing lens. For the simulations, intended as a proof of concept, we assume an effective magnification of 10 or 20 times the 5.5m CMOS sensor. This leads to a width of respectively 11 and 5 pixels ( 55 and 110). This PSF is illustrated on Fig. 4.

4.2.3 Other practical considerations

We mention here a few other important challenges of the MSRC design. Beside the spectral differences mentioned in Section 2.1, manufacturing variability may introduce unknown gains in the sensor. Because each measurement provides information about the entire scene, this can highly limit the quality of the MSRC reconstruction. In comparison, the effect of a corrupted measurement in the MSVI design would be localized. Blind calibration techniques [44, 45] may help when direct calibration is not possible.

Optical alignment is another important issue. For instance, the distances and and the roll angle between the sensor and the CA must be precisely set. The choice of the lenses must also minimize chromatic and spherical aberrations.

Narrowband filtering considerably decreases the system’s light throughput. Therefore, the implementation must limit further light attenuation. For instance, several possible choices exist for the CA. A manufactured mask with physical holes provides the best light throughput but is not programmable. Pixels of a semi-transparent LCD Spatial Light Modulator (SLM) have imperfect opacity or transparency. The same goes with reflective Liquid Crystal on Silicon (LCoS) devices [67], paired with a polarizing beam splitter [68] that further dims the light. Despite their excellent light transmittance, Digital Micro-mirror Devices (DMD), such as the one used in the single-pixel camera [33], are not suitable for being used out of focus, as uncertainty in the deflection angles (see, e.g., [69]) would translate into systematic error..

4.3 Sensing matrix implementation

Based on the discrete model (17), we can write the sensing matrix corresponding to the vectorized forward model (1). Let be the partial block circulant matrix which defines the valid convolution operator with , and let be the matrix equivalent to (i.e., the same as in Section 3.1). First, notice that every band and every snapshot can be processed separately by the submatrices such that the vectorized form of (17) is . The sensing matrix, is thus the block matrix, . This follows from the natural order in which the and elements are stacked in the vectorized and . Note that each is a masked (some rows are zeroed by ) random convolution which enjoys good CS properties as explained in Section 1.2. Let be the restriction operator that selects the valid part, of size , of a circular convolution of size . Similarly, let be the zero-padding operator (adjoint of the restriction) whose output matches the size of the circular convolution. Let be the 2-D DFT and let , i.e., the diagonal matrix of the DFT of . With all these ingredients, we can factorize,

The factors composing the full matrix thus read , , with

and, denoting , and the diagonal matrices ,

With this factorization, is easily invertible. Indeed, since and are unitary and is diagonal, noting , we have . Therefore, inverting is just equivalent to inverting the diagonal matrix, . Note that computing for some input vector requires computing DFTs and inverse DFTs of size . Similarly, computing for some input vector requires DFTs and inverse DFTs. Comparatively, computing the inverse of is cheaper since it only requires DFTs and inverse DFTs.

5 Numerical Experiments

In this section, we present numerical results with experimental data for the MSVI and with simulated acquisition to compare both MSVI and MSRC strategies. Experiments were performed in Matlab with the code provided in supplementary material 222This paper has supplementary downloadable material available at .. In all the recoveries, we used , , with . The dynamic range is normalized to and . The tolerance of for the relative -distance between two iterations was always reached in less than iterations. The parameters were manually tuned in order to reach a reasonably fast convergence, but they do not critically affect the recovery quality. The factor in

is a heuristic normalization of

compared to in (6). For the initialization, we use a tolerance of and a maximum of ten iterations.

5.1 Msvi Experiment

Figure 5: Result of an experiments on real data acquired with imec’s sensor. Qualitative comparison between naive demosaicking (nearest neighbor), the initialization (7), and the proposed method. The top image is a false color nearest neighbor preview. The points (a), (b) and (c) are pixels whose spectra are represented in the column.

On Fig. 5, we present the result based on experimental measurements of a test scene, observed with imec’s mosaic sensor. This imager has a resolution of pixels organized in a mosaic of identical macro-pixels; each with different FP filters at wavelengths of visible light (as in Fig. 0(b)). For this experiment we restricted the measurements to a region, depicted by the bigger white square on the false color image. The subsampling rate , here is , i.e., we recover a volume with voxels. For setting , we target a minimum measurements to residual ratio of dB. The naive, super-pixel based, demosaicking method (top row), used in [10], is clearly the worst. The middle row shows the result of the linear interpolation and Tikhonov regularization initialization method (7). Though we observe a clear improvement, a grid artifact, already observed in [20], appears and is particularly visible on the 551 nm band. This artifact is removed with the proposed method (bottom row). Without the exact filter calibration profiles and the ground truth spectra, we cannot, here, evaluate the spectral accuracy.

5.2 Comparison of Msvi and Msrc on Synthetic Simulations

Figure 6: Results of the synthetic experiment. (Left) Ground truth in false color of the CAVE [70] sample, chart and stuffed toy. The bigger white square region indicates the part of the cube (ROI) that was used in the experiment. The smaller white squares labelled (a), (b) and (c) correspond to zooms on features depicted on Fig. 7. (Middle and right) Average PSNR over the whole dataset for two levels of input noise: 40dB (middle) and 20dB (right). The curves correspond to the MSVI system (dotted lines) with mosaic (square purple) and random (diamond orange) layouts, and to the MSRC (plain lines) when diffraction is not modeled (red dots) and when is 5 (black triangles) or 11 pixels (green circles).

In the following, we use a MS dataset to compare both strategies, quantitatively and qualitatively, on a series of controlled, synthetic simulations. It comprises eight ROI selected from the 32 multispectral volumes of the CAVE dataset [70]. The spectral ROI is 470 nm through 620 nm, matching imec’s sensor. The spatial ROI was manually chosen in each image to capture the most interesting features. The chart and stuffed toys sample (ROI centered at ), used for qualitative comparisons, is shown on Fig. 6 (left). The other samples, chosen to produce average PSNR curves, were balloons , feathers , jelly beans , glass tiles , stuffed toys , superballs , and beads . The middle and right plots on Fig. 6 show the average (over the eight dataset samples) reconstruction Peak Signal-to-Noise Ratio (, where MSE stands for Mean Squared Error) in function of the subsampling rate for five different sensor configurations; two MSVI and three MSRC setups, each with two levels, 40 and 20dB ( Signal-to-Noise Ratio (SNR)), of additive white gaussian noise on the measurements. The value of is determined by an oracle. Each simulation uses the corresponding sensing operator, , for both generating the measurements and for reconstruction, i.e., there is no model mismatch. Fig. 7 shows the qualitative result of the chosen sample at , 40dB SNR. We chose this extreme low sampling rate under low input noise to expose the most obvious differences between sensing strategies and configurations.

5.2.1 Msvi

Since is fixed by the dataset, we explore four MSVI sensor sizes: . We test two different FP filters configurations: the mosaic pattern (Fig. 0(b)) and the random pattern (Fig. 0(c)). At lower subsampling ratios, the random arrangement outperforms the mosaic sensor, particularly at high input SNR. This indicates that randomness mitigates aliasing caused by extreme subsampling. At Nyquist rate, mosaic beats random sampling, but both results are above 50dB and look visually perfect (not shown).

Figure 7: Qualitative results for the five sensor setups at , 40dB input SNR. The patches correspond to zooms on regions of the cube (Fig. 6). The points (a), (b) and (c) are pixels whose spectra are represented in the column. The light gray lines remind the ground truth and the red areas represent the error. The and rows show the results of linear interpolation of the mosaic and random MSVI measurements. The last five rows are the recovery results corresponding to each tested setup.

On Fig. 7, we first show the initialization point as defined by (7), i.e., since , . Despite being much faster than our iterative method, it gives visually scrambled results, particularly bad on the outer 470 nm and 620 nm bands where less data-points are available. The spectral error is particularly large for pixel (A) where the highly textured region destabilizes linear interpolation. The results obtained by the proposed method, denoted , are more accurate: edges and textures, e.g., the horizontal bars of the chart and the stripes on the toy’s sleeve are well resolved. The mosaic arrangement leads to a grid artifact as observed in Section 5.1, whereas the random arrangement leads to seemingly unstructured artifacts and smaller spectral error areas, which concurs with the PSNR curves. In practice, a higher sampling rate, e.g., , is preferable to mitigate those effects.

5.2.2 Msrc

For testing the MSRC strategy, the size of the FPA is fixed to , so that . To vary the sampling rate , the number of snapshots is increased as . Simulations with a unique snapshot and increasing FPA size, omitted for the sake of conciseness, gave very close but slightly inferior results. We compare the performances of a diffraction-free case with two cases where the diffraction kernel was respectively 5 and 11 pixels wide. As expected, the global trend indicates that increasing the size of the PSF decreases quality. Interestingly, at , the reconstruction PSNR of the diffraction-free case is on par with the 5 pixels case.

Fig. 7 suggests that the MSRC method is suitable for extreme subsampling (compression). For example, the digits on the chart (first column) of the diffraction-free reconstruction are legible and artifacts are barely noticeable. The spectral error is also impressively small. The performances rapidly decrease with diffraction and its spatial low-pass effect. As expected, the redundant wavelet prior is not able to recover the lost high-pass information. However, where the 11 pixels case gives pretty bad spectral accuracy, especially near edges, the 5 pixels case remains reasonably good at spectral reconstruction.

5.2.3 Comparison

In the ideal diffraction-free case under low noise, the MSRC device provides a performance improvement of up to 4dB (for the subsampling rate) over the MSVI. This justifies the present study on the feasibility of the MSRC design. However, at higher sampling rates, diffraction decreases quality, even with a 5 pixels PSF. Note that the gap between MSVI and the ideal MSRC falls to zero at 20dB of input SNR. For the qualitative comparison, we focus on the case, where MSRC outperforms MSVI on average, even with diffraction. MSRC gives better spectral accuracy than MSVI on the selected pixels, in particular pixels (a) and (b). Regarding spatial accuracy, noisy patterns appear between the stripes on the toy’s sleeve with MSVI reconstruction. However, the spatial high-frequency content, particularly visible on the chart patterns and digits, is affected by the diffraction kernel.

6 Conclusion

Both strategies proposed in this paper use a MS sensor with integrated FP filters. Despite using the principles of CS, they do not involve dispersive elements. Along with the conceptual optical design, for each device, we proposed an accurate forward model and a unified reconstruction procedure, formulated as a regularized inverse problem with an original spatio-spectral prior. The particularity of MSRC, compared to MSVI, lies in the spatial mixing provided by an out-of-focus CA, which allows higher compression ratios but, if not properly sized, entails adverse effects such as diffraction.

Through extensive numerical simulations, we explored different setups. We devised practical guidelines and highlighted limitations for both methods allowing to proceed towards an informed implementation. In an ideally sized, low-noise, calibrated setup, MSRC gives better performances with high compression. In other situations, factoring the cost of implementation and calibration, MSVI should be preferred.


The authors thank the Integrated Imagers group of imec (Leuven, Belgium) for supporting part of this design exploration during the second author’s visit between February and August 2014.


  • [1] P. Shippert, “Why Use Hyperspectral Imagery?” Photogrammetric engineering and remote sensing, vol. 70, pp. 377–379, 2004.
  • [2] P. Tatzer, T. Panner, M. Wolf, and G. Traxler, “Inline sorting with hyperspectral imaging in an industrial environment,” N. Kehtarnavaz and P. A. Laplante, Eds., vol. 5671.   International Society for Optics and Photonics, feb 2005, p. 162.
  • [3] E. K. Hege, D. O’Connell, W. Johnson, S. Basty, and E. L. Dereniak, “Hyperspectral imaging for astronomy and space surviellance,” in Optical Science and Technology, SPIE’s 48th Annual Meeting, S. S. Shen and P. E. Lewis, Eds., vol. 5159.   International Society for Optics and Photonics, jan 2004, p. 380.
  • [4] A. A. Gowen, C. P. O’Donnell, P. J. Cullen, G. Downey, and J. M. Frias, “Hyperspectral imaging - an emerging process analytical tool for food quality and safety control,” Trends in Food Science & Technology, vol. 18, pp. 590–598, 2007.
  • [5] G. Lu and B. Fei, “Medical hyperspectral imaging: a review,” Journal of Biomedical Optics, vol. 19, p. 010901, 2014.
  • [6] M. L. Whiting, S. L. Ustin, P. Zarco-Tejada, A. Palacios-Orueta, and V. C. Vanderbilt, “Hyperspectral mapping of crop and soils for precision agriculture,” W. Gao and S. L. Ustin, Eds., vol. 6298.   International Society for Optics and Photonics, aug 2006, p. 62980B.
  • [7] R. G. Sellar and G. D. Boreman, “Comparison of relative signal-to-noise ratios of different classes of imaging spectrometer,” Applied optics, vol. 44, pp. 1614–1624, 2005.
  • [8] N. Hagen and M. W. Kudenov, “Review of snapshot spectral imaging technologies,” Optical Engineering, vol. 52, pp. 090 901–090 901, 2013.
  • [9] A. Lambrechts, P. Gonzalez, B. Geelen, P. Soussan, K. Tack, and M. Jayapala, “A CMOS-compatible, integrated approach to hyper- and multispectral imaging,” 2014 IEEE International Electron Devices Meeting, pp. 10.5.1–10.5.4, 2014.
  • [10] B. Geelen, N. Tack, and A. Lambrechts, “A compact snapshot multispectral imager with a monolithically integrated per-pixel filter mosaic,” in SPIE MOEMS-MEMS.   International Society for Optics and Photonics, 2014.
  • [11] B. E. Bayer, “Color Imaging Array. U. S. patent 3971 065,” p. 10, 1976.
  • [12] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, pp. 1289–1306, 2006.
  • [13] E. J. Candès and M. B. Wakin, “An introduction to compressive sampling,” Signal Processing Magazine, IEEE, vol. 25, pp. 21–30, 2008.
  • [14] R. M. Willett, R. F. Marcia, and J. M. Nichols, “Compressed sensing for practical optical imaging systems: a tutorial,” Optical Engineering, vol. 50, 2011.
  • [15] G. Arce, D. Brady, L. Carin, H. Arguello, and D. Kittle, “Compressive coded aperture spectral imaging: An introduction,” IEEE Signal Processing Magazine, vol. 31, pp. 105–115, 2014.
  • <