1. Introduction
Hyperspectral images (HSIs) capture the light intensity of a scene as a function of space and wavelength and have been used in numerous vision [Pan et al., 2003; Tarabalka et al., 2010; Kim et al., 2012], geoscience, and remote sensing applications [Cloutis, 1996; Harsanyi and Chang, 1994]. Traditional approaches to hyperspectral imaging, including tunable spectral filters and pushbroom cameras, rely on sampling the HSI, i.e., measuring the amount of light in each spatiospectral voxel. When imaging at high spatial and spectral resolutions, the amount of light in a voxel can be quite small, requiring long exposures to mitigate the effect of noise.
HSIs are often endowed with rich structure that can be used to alleviate the challenges faced by traditional imagers. For example, natural scenes are often composed of a few materials with distinct spectra and, further, illumination of limited spectral complexity [Chakrabarti and Zickler, 2011; Finlayson et al., 1994]. This implies that the collection of spectral signatures observed at various locations in a scene lies close to a low-dimensional subspace. Instead of sampling the HSI of the scene one spatiospectral voxel at a time, we can dramatically speed up acquisition and increase light throughput by measuring only projections onto this low-dimensional subspace. However, such a measurement scheme requires a priori knowledge of the scene, since this subspace is entirely scene-dependent. This paper introduces an optical computing technique that identifies this subspace via an iterative and adaptive sensing strategy and constructs a low-rank approximation to the scene’s HSI.
We propose the design of an imager that estimates a low-rank approximation of an HSI by optically implementing the so-called Krylov subspace method. At its core, the proposed imager optically implements two operators: a spatially-coded spectrometer and a spectrally-coded spatial imager; when we interpret the HSI as a 2D matrix, these two operators correspond to left and right multiplication of the matrix with a vector. The two operators are then used in an iterative and adaptive imaging procedure whose eventual output is a low-rank approximation to the HSI. The proposed imager is adaptive, i.e., the measurement operator used to probe the scene’s HSI at a given iteration depends on previously made measurements. This is a marked departure from current hyperspectral imaging strategies, where the signal model is merely used as a prior for recovery from nonadaptive measurements.
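To make the low-rank model concrete, the following sketch builds a synthetic HSI matrix from a few hypothetical material spectra and verifies that a truncated SVD captures it; all dimensions and data are illustrative choices, not values from our experiments.

```python
import numpy as np

# A synthetic low-rank HSI matrix: rows are pixels, columns are spectral
# bands; the scene mixes a few hypothetical material spectra.
rng = np.random.default_rng(0)
n_pixels, n_bands, n_materials = 1024, 64, 3
abundances = rng.random((n_pixels, n_materials))   # per-pixel mixing weights
spectra = rng.random((n_materials, n_bands))       # material spectra
X = abundances @ spectra                           # rank-3 HSI matrix

# A rank-3 truncated SVD captures this scene exactly (up to numerics).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
assert np.allclose(X, X_k)
```

Real scenes are only approximately low-rank, so in practice the truncated SVD yields a close, rather than exact, approximation.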
Contributions.
We propose an optical architecture that we refer to as KRylov subspace-based Imaging and SpectroMetry (KRISM) and make the following three contributions:


Optical computation of HSIs. We show that optically computing the dominant singular vectors of an HSI provides significant advantages in terms of increased light throughput and reduced measurement time.

Coded apertures for resolving space and spectrum. While high-resolution imaging and spectrometry have been studied extensively, architectures suitable for one are often undesirable for the other. In particular, we show that the slits used in spectrometry and the large open apertures used in conventional imaging are each ill-suited for the alternate task. To mitigate this, we study the effect of aperture-plane coding on the HSI and propose a coded aperture design that is simultaneously capable of high spatial and spectral resolutions.

Optical setup. We design and validate a novel and versatile optical implementation of KRISM that uses a single camera and a single spatial light modulator to efficiently implement spatially-coded spectral and spectrally-coded spatial measurements.
The contributions above are supported via an extensive set of simulations as well as real experiments performed using a lab prototype.
Limitations.
The benefits and contributions described above come with a key limitation:


Limited effectiveness when sensing few spectral bands. Our method is advantageous only if there is a sufficient number of spectral bands and the hyperspectral image is sufficiently low-rank. If we seek to image with very few spectral bands, or if the scene is not well approximated by a low-rank model, then the proposed method performs poorly compared to traditional sensing methods.
2. Prior work
Nyquist sampling of HSIs.
Classical designs for hyperspectral imaging based on Nyquist sampling include the tunable filter — which scans the scene one narrow spectral band at a time, measuring the image associated with a single band at each instant — and the pushbroom camera — which scans the scene one row at a time, measuring the entire spectrum associated with the pixels of that row at each instant. Both approaches are time-consuming as well as light-inefficient, since each captured image wastes a large percentage of the light incident on the camera.
Multiplexed sensing.
The problem of reduced light throughput can be mitigated by the use of multiplexing. One of the seminal results in computational imaging is that the use of multiplexing codes including the Hadamard transform can often lead to significant efficiencies either in terms of increased SNR or faster acquisition [Harwit and Sloane, 1979]. This can either be spectral multiplexing [Mohan et al., 2008] or spatial multiplexing [Sun and Kelly, 2009]. While multiplexing mitigates light throughput issues, it does not reduce the number of measurements required. Sensing at high spatial and/or spectral resolution still requires long acquisition times to maintain a high SNR. Fortunately, HSIs have concise signal models that can be exploited to reduce the number of measurements.
Low-rank models for HSIs.
There are many approaches for approximating HSIs using low-dimensional models; these include group sparsity in a transform domain [Rasti et al., 2013], low-rank models [Li et al., 2012; Golbabaee and Vandergheynst, 2012], as well as low-rank and sparse models [Waters et al., 2011; Saragadam et al., 2017]. Of particular interest to this paper is the low-rank modeling of HSIs when they are represented as a 2D matrix (see Figure 2). Low-rank models of HSIs and their variants have found numerous uses in vision and graphics, including color constancy [Finlayson et al., 1994], color displays [Kauvar et al., 2015], endmember detection [Winter, 1999], source separation [Hui et al., 2018; Saragadam et al., 2017], compressive imaging [Golbabaee and Vandergheynst, 2012], and denoising [Zhao and Yang, 2015]. Chakrabarti and Zickler [2011] also provide empirical justification that HSIs of natural scenes are well represented by low-rank models.
Compressive hyperspectral imaging.
The low-rank model has also been used for compressive sensing (CS) of HSIs. CS aims to recover a signal from a set of linear measurements that are fewer than its dimensionality [Baraniuk, 2007]. CS achieves this by modeling the sensed signal using lower-dimensional representations — low-rank matrices being one such example. The technique that is most relevant to this paper is that of row/column projection [Fazel et al., 2008]. Here, the measurement model is restricted to obtaining row and column projections of a matrix. Given a matrix $X \in \mathbb{R}^{N \times S}$ and measurement operators $A$ and $B$, the measurements acquired are of the form
\[ Y_1 = A X, \qquad Y_2 = X B. \]
When the matrix $X$ has rank $k$, it can be shown that it is sufficient to acquire a number of images and spectral profiles that exceeds the rank $k$. In contrast, the method proposed in this paper requires only a number of measurements proportional to the rank of the matrix; however, these measurements are adaptive to the scene. At the cost of increased optical complexity, adaptive sensing promises accurate results with far fewer measurements than CS.
Hyperspectral imaging architectures.
Several architectures have been proposed for CS acquisition of HSIs. The Dual-Disperser Coded Aperture Snapshot Spectral Imager (DD-CASSI) [Gehm et al., 2007] obtains a single image, multiplexed in both the spatial and spectral domains, by dispersing the image using a prism, passing it through a coded aperture, and then recombining it using a second prism. In contrast, the Single Disperser CASSI (SD-CASSI) [Wagadarikar et al., 2008] relies on spatial coding using a binary mask followed by spectral dispersion with a single prism. Baek et al. [2017] disperse the image by placing a prism right before an SLR camera; the HSI is then reconstructed by studying the dispersion of color at the edges of the captured RGB image. Takatani et al. [2017] instead propose a snapshot imager that relies on a combination of faceted reflectors with filters. Various other snapshot techniques have been proposed that rely on the general idea of space-spectrum multiplexing [Cao et al., 2016; Lin et al., 2014a; Jeon et al., 2016]. While snapshot imagers require only a single image, they often produce HSIs with reduced spatial or spectral resolutions.
Significant improvements can be obtained by acquiring multiple measurements instead of a single snapshot image. Kittle et al. [2010] obtain multiple SD-CASSI-like measurements by moving the coded aperture. Li et al. [2012] rely on spatially-multiplexed spectral measurements of the scene to reconstruct the HSI. Lin et al. [2014b] improve upon spatially-multiplexed CS by separately coding the spatial and spectral domains. In sum, all existing methods trade off spatial resolution or spectral resolution, or take long durations to capture. A key insight is that most of these methods are nonadaptive — in sharp contrast to the proposed approach. Table 1 provides an overview of various HS imaging strategies and their relative merits. We next discuss the concept of Krylov subspaces for low-rank approximation of matrices, which motivates iterative and adaptive techniques and paves the way to the proposed method.
Krylov subspaces.
Central to the proposed method is a class of techniques, collectively referred to as Krylov subspace methods, for estimating singular vectors of matrices. Recall that the singular value decomposition (SVD) of a matrix $X \in \mathbb{R}^{N \times S}$ is given as $X = U \Sigma V^\top$, where $U$ and $V$ are orthonormal matrices whose columns are referred to as the singular vectors, and $\Sigma$ is a diagonal matrix of singular values. Krylov subspace methods allow for efficient estimation of the singular values and vectors of a matrix and enjoy two key properties. First, the singular values and vectors are computed using operators that probe the matrix via left and right multiplications with vectors, i.e., we do not need direct access to the elements of the matrix $X$. Second, the top singular values and vectors of a low-rank matrix can be estimated using a small set of matrix-vector multiplications. These two properties are invaluable when the matrix is very large, when it is implicitly represented using operators, or, as is the case in this paper, when the matrix is the scene’s HSI and we only have access to optical implementations of the underlying matrix-vector multiplications. There are many variants of Krylov subspace techniques, which differ mainly in their robustness to noise and model mismatches. The techniques in this paper are based on an implementation called Lanczos bidiagonalization with full orthogonalization [Golub and Kahan, 1965; Hernandez et al., 2007]. Algorithm 1 summarizes this technique.
Optical computing of lowrank signals.
Matrix-vector and matrix-matrix multiplications can often be implemented as optical systems. Such systems have been used for matrix-matrix multiplication [Athale and Collins, 1982], matrix inversion [Rajbenbach et al., 1987], as well as computing eigenvectors [Kumar and Casasent, 1981]. Of particular interest to our paper is the optical computing of the light transport operator using Krylov subspace methods [O’Toole and Kutulakos, 2010]. The light transport matrix $T$ represents the linear mapping between scene illumination and a camera observing the scene. Each column of the matrix is the image of the scene when only a single illuminant is turned on. Hence, given a vector $\ell$ that encodes the scene illumination, the image captured by the camera is given as $T \ell$. By Helmholtz reciprocity, if we replaced every pixel of the camera by a light source and every illuminant with a camera pixel, then the light transport associated with the reversed illumination/sensing setup is given as $T^\top$. Hence, by colocating a projector with the camera and a camera with the scene’s illuminants, we have access to both left- and right-multiplication of the light transport operator with vectors; we can now apply Krylov subspace techniques for optically estimating a low-rank approximation to the light transport matrix. This delightful insight is one of the key results of O’Toole and Kutulakos [2010]. This paper proposes a translation of their ideas to hyperspectral imaging. However, as we will see next, this translation is not straightforward and requires novel imaging architectures.
3. Optical Krylov Subspaces for Hyperspectral Imaging
In this section, we provide a high-level description of optical computing of HSIs using Krylov subspace methods.
Notation.
We represent HSIs in three different ways:


$h(x, y, \lambda)$ — a real-valued function over 2D space $(x, y)$ and 1D spectrum $\lambda$,

$\mathcal{H}$ — a three-dimensional tensor with $N$ spatial “pixels” and $S$ spectral bins; this is a sampling of the three-dimensional function, and

$X$ — a matrix with $N$ rows and $S$ columns, such that each column corresponds to the vectorized image at a specific spectral band.
The goal is to optically build the following two operators:


Spectrally-coded imager $\mathcal{I}(\cdot)$ — Given a spectral code $w \in \mathbb{R}^S$, we seek to measure the image given as
\[ \mathcal{I}(w) = X w. \qquad (1) \]
The image $\mathcal{I}(w)$ corresponds to a grayscale image of the scene captured with a camera whose spectral response is $w$.

Spatially-coded spectrometer $\mathcal{S}(\cdot)$ — Given a spatial code $q \in \mathbb{R}^N$, we seek to measure a spectral measurement given as
\[ \mathcal{S}(q) = X^\top q. \qquad (2) \]
The measurement $\mathcal{S}(q)$ corresponds to a spectral measurement of the scene, wherein the spectral profile of each pixel is weighted by the corresponding entry in the spatial code $q$.
Note that these two operators correspond to left and right multiplications in Algorithm 1; hence, we can optically implement Algorithm 1 to obtain the top singular vectors and values of the HSI matrix $X$.
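As a numerical sanity check, the two operators can be simulated on a synthetic HSI cube; the dimensions and codes below are arbitrary illustrative choices.

```python
import numpy as np

# A synthetic HSI cube (H, W, S) and its matrix form X of shape (H*W, S).
rng = np.random.default_rng(0)
H, W, S = 8, 8, 16
cube = rng.random((H, W, S))
X = cube.reshape(H * W, S)

# Spectrally coded imager: a grayscale image under spectral response w.
w = rng.random(S)
image = (cube * w).sum(axis=2)                      # shape (H, W)
assert np.allclose(image.ravel(), X @ w)            # right multiplication by X

# Spatially coded spectrometer: spectrum of the spatially weighted scene.
q = rng.random((H, W))
spectrum = (cube * q[:, :, None]).sum(axis=(0, 1))  # shape (S,)
assert np.allclose(spectrum, X.T @ q.ravel())       # left multiplication by X
```

The optical setup described later realizes these two sums physically, with light integration playing the role of the inner products.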
Number of measurements required.
To obtain a rank-$k$ approximation of the matrix $X$, we require at least $k$ spatially-coded spectral measurements — each of dimensionality $S$ — and $k$ spectrally-coded images — each of dimensionality $N$. Hence, the number of measurements required by the approach is proportional to $k(N + S)$ and, over traditional Nyquist sampling, it represents a reduction in measurements by a factor of
\[ \frac{N S}{k (N + S)}. \qquad (3) \]
Given that the complexity of the scene is encoded in its low rank, we can envision dramatic reductions in the measurements required over Nyquist-sampling techniques, especially when sensing at high spatial and spectral resolutions (see Table 1).
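For a sense of scale, the reduction factor can be evaluated for illustrative (hypothetical) dimensions; the numbers below are not those of our prototype.

```python
# Reduction in measurements for a rank-k Krylov acquisition versus Nyquist
# sampling, using illustrative, hypothetical dimensions.
N = 512 * 512          # number of spatial pixels
S = 500                # number of spectral bands
k = 5                  # rank of the approximation
nyquist = N * S                 # one sample per spatio-spectral voxel
krylov = k * (N + S)            # k coded images + k coded spectra
reduction = nyquist / krylov
assert 90 < reduction < 120     # roughly a 100x reduction for these sizes
```

The factor grows with both resolutions, which is precisely the regime where traditional scanning is slowest.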
Challenges in implementing operators $\mathcal{I}$ and $\mathcal{S}$.
Spatially-coded spectral measurements have been implemented in the context of compressive hyperspectral imaging [Sun and Kelly, 2009]. Here, light from a scene is first focused onto a spatial light modulator, which performs spatial coding, and then directed into a spectrometer. For spectral coding at a high resolution, we could replace the sensor in a spectrometer with a spatial light modulator; subsequently, we can form and measure an image of the coded light using a lens. However, high-resolution spectrometers invariably use a slit aperture that produces a large one-dimensional blur in the spatial image due to diffraction. We show in Section 4 that simultaneous spatiospectral localization is not possible with either a slit or an open aperture. This leads our discussion to the design of optimal binary coded apertures, which enable high spectral and spatial resolutions. Subsequently, in Section 6, we present the design of KRISM — a novel and versatile imaging system — and validate its performance in Section 7.
4. Coded apertures for simultaneous sensing of space and spectrum
In this section, we introduce an optical system capable of simultaneously resolving space and spectrum at high resolutions. Along the way, we explain why traditional systems for measuring images and spectrum are mutually incompatible.
4.1. Optical setup
The ideas proposed in this paper rely on the optical setup shown in Figure 15, which is a slight modification of a traditional spectrometer. An objective lens focuses the scene onto its image plane, which we denote as P1. This is followed by two 4$f$ relays with a coded aperture placed on the first pupil plane, P2, and a diffraction grating placed at the plane marked as P3. We are interested in the intensity images formed at the planes marked as the “rainbow plane” P4 and the “spatial plane” P5, and their relationship to the image formed on P1, the coded aperture, and the grating parameters.
We assume that the field formed on the plane P1 is incoherent and, hence, we only need to consider its intensity and how it propagates, and can largely ignore its phase. Let $h(x, y, \lambda)$ be the intensity of the field as a function of spatial coordinates $(x, y)$ and wavelength $\lambda$. Let $a(x, y)$ be the aperture code placed at the plane P2, $\nu_0$ be the groove density (measured in grooves per unit length) of the diffraction grating in P3, and $f$ be the focal length of the lenses that form the 4$f$ relays. The hyperspectral field intensity at the plane P4 is given as
\[ h_{P4}(x, y, \lambda) = s(\lambda)\, a(x - f \nu_0 \lambda,\, y), \qquad (4) \]
where $s(\lambda)$ is the scene’s overall spectral content defined as
\[ s(\lambda) = \iint h(x, y, \lambda)\, dx\, dy. \]
The intensity field at the spatial plane P5 is given as
\[ h_{P5}(x, y, \lambda) = h(x, y, \lambda) \ast \left| A\!\left( \frac{x}{\lambda f}, \frac{y}{\lambda f} \right) \right|^2, \qquad (5) \]
where $A(u, v)$ is the 2D spatial Fourier transform of the aperture code $a(x, y)$, and $\ast$ denotes two-dimensional spatial convolution along the $x$ and $y$ axes. These expressions arise from Fourier optics [Goodman, 2005] and their derivations are provided in the supplemental material.

Image formed at the rainbow plane P4.
A camera with spectral response $c(\lambda)$ placed at the rainbow plane would measure
\[ i_{P4}(x, y) = \int c(\lambda)\, s(\lambda)\, a(x - f \nu_0 \lambda,\, y)\, d\lambda = (\widetilde{s} \ast a)(x, y), \qquad (6) \]
where $\widetilde{s}(x) = c(x / (f \nu_0))\, s(x / (f \nu_0))$ is the camera-weighted spectrum mapped to spatial coordinates. Here, the dimensionless term $f \nu_0$ provides a scaling of the spectrum of the scene and indicates the resolving power of the diffraction grating. For the prototype discussed in Section 6, the focal length and groove density were such that 1 nm of the spectrum maps to 30 µm, which is about 6–7 pixel widths on the cameras that we used. The key insight this expression provides is that the image $i_{P4}$ is the convolution of the scene’s spectrum — interpreted as a 1D image — with the aperture code $a$. This implies that we can measure the spectrum of the scene on this plane, albeit convolved with the aperture code; this motivates our naming of this plane as the rainbow plane.
Image at the spatial plane P5.
A camera with the spectral response $c(\lambda)$ placed at the spatial plane P5 would measure
\[ i_{P5}(x, y) = \int c(\lambda)\, h_{P5}(x, y, \lambda)\, d\lambda. \qquad (7) \]
This is a “spatial image” in that the spectral components of the HSI have been integrated out. Hence, we refer to P5 as the spatial plane.
Implementing KRISM operations.
The derivation above suggests that we get a spatial image of the scene formed at the spatial plane P5 and a spectral profile at the rainbow plane P4. We can therefore build the two operators central to KRISM by coding the light at one of the planes while measuring it at the other. For the spectrally-coded imager $\mathcal{I}$, we place an SLM at the rainbow plane P4 while measuring the image, with a camera, at P5. For the spatially-coded spectrometer $\mathcal{S}$, we place an SLM at P3 — which is optically identical to P5 — while measuring the image formed at P4.
Effect of the aperture code on the scene’s HSI.
Introducing an aperture code on the plane P2 can be interpreted as distorting the scene’s HSI in two distinct ways. First, a spectral blur is introduced whose point spread function (PSF) is a scaled copy of the aperture code. Second, a spatial blur is introduced for each spectral band whose PSF is the power spectral density (PSD) of the aperture code, suitably scaled. With this interpretation, the images formed on planes P4 and P5 are a spectral and spatial projection, respectively, of this blurred HSI. Our proposed technique measures a low-rank approximation to this blurred HSI and we can, in principle, deblur it to obtain the true HSI of the scene. However, the spatial and spectral blur kernels may not always be invertible. As we show next, the choice of the aperture is critical: traditional apertures, such as a slit in spectrometry and an open aperture in imaging, do not lead to invertible blur kernels.
4.2. Failure of slits and open apertures
We now consider the effect of the traditional apertures used in imaging and spectrometry — namely, an open aperture and a slit, respectively — on the images formed at the rainbow and spatial planes. Suppose that the aperture code is a box function of width $w_x$ and height $w_y$, i.e.,
\[ a(x, y) = \mathrm{rect}\!\left(\frac{x}{w_x}\right) \mathrm{rect}\!\left(\frac{y}{w_y}\right). \]
Its Fourier transform is the product of two sincs:
\[ A(u, v) = w_x w_y\, \mathrm{sinc}(w_x u)\, \mathrm{sinc}(w_y v). \]
The spatial image is convolved with the PSD scaled by $\lambda f$ and so the blur observed on it has a spatial extent on the order of $\lambda f / w_x \times \lambda f / w_y$. The rainbow plane image, on the other hand, simply observes a box blur whose spatial extent is $w_x \times w_y$. Armed with these expressions, we can study the effect of open and slit apertures on the spatial and rainbow images.



Scenario #1 — An open aperture.
Suppose that the aperture is wide open, with $w_x$ and $w_y$ on the order of millimeters. The spatial blur $\lambda f / w$ is then on the order of micrometers in both height and width, and hence we can expect a very sharp spatial image of the scene. The blur on the rainbow image, however, has a spread of $w_x$ — millimeters in extent — which, given that 1 nm of the spectrum maps to 30 µm on this plane, is equivalent to a spectral blur of tens of nanometers. Hence, we cannot hope to achieve high spectral resolution with an open aperture.
Scenario #2 — A slit.
A slit is commonly used in spectrometers; suppose that we use a slit whose width is tens of micrometers and whose height is a few millimeters. Then we expect a spectral blur on the order of a nanometer. The spatial image, however, is blurred only slightly along the y-axis but severely along the x-axis, since the blur extent $\lambda f / w_x$ is inversely proportional to the slit width; effectively, with the pixel pitch of our cameras, this corresponds to a 1D blur of about 100 pixels. In essence, the use of a slit leads to a severe loss in spatial resolution.
Figure 4 shows images formed at the rainbow and spatial planes for various aperture codes. This validates our claim that conventional imagers are unable to simultaneously achieve high spatial and spectral resolutions due to the nature of the apertures used. We next design apertures with carefully engineered spectral and spatial blurs, which can be deblurred in postprocessing.
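The failure of box-like apertures can also be verified numerically: the zero-padded DFT of a contiguous (slit- or open-aperture-like) code is a Dirichlet kernel with exact nulls, so some frequencies of the induced blur are unrecoverable. The code length and box width below are illustrative choices.

```python
import numpy as np

# Frequency response of a box-shaped aperture code (an idealized slit or
# open aperture, depending on scale): a contiguous run of ones.
n = 32
box = np.zeros(n)
box[:8] = 1.0                           # contiguous 8-bit "box" code

# Zero-padded DFT captures the linear-convolution frequency response.
H_box = np.abs(np.fft.fft(box, 2 * n))

# The Dirichlet kernel has exact nulls (here at every 8th bin), so the
# corresponding frequencies of the blurred signal are destroyed.
assert H_box.min() < 1e-9
assert abs(H_box.max() - 8.0) < 1e-9    # DC value equals the throughput
```

Any frequency at a null cannot be restored by deconvolution, regardless of SNR, which is why the next subsection optimizes codes whose responses have no nulls.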
4.3. Design of aperture codes
We now design an aperture code that is capable of resolving both space and spectrum at highresolutions. Our use of coded apertures is inspired by seminal works in coded photography for motion and defocus deblurring [Raskar et al., 2006; Veeraraghavan et al., 2007; Levin et al., 2007].
Observation.
Recall that the rainbow plane image is a convolution between a 1D spectral profile and the 2D aperture code. This convolution is one-dimensional, i.e., along the $x$-axis; hence, we can significantly simplify the code design problem by choosing an aperture of the form
\[ a(x, y) = c(x)\, \mathrm{rect}\!\left(\frac{y}{w_y}\right), \]
with $w_y$ being as large as possible. The choice of the rect function along the $y$-axis leads to a high light throughput. In addition, from the separability of Fourier transforms as well as convolutions, we can show that the resulting spatial blur along the $y$ direction is compact.
For ease of fabrication, we further restrict the aperture code to be binary and of the form
\[ c(x) = \sum_{i=0}^{n-1} c[i]\, \mathrm{rect}\!\left( \frac{x - i\Delta}{\Delta} \right), \qquad (8) \]
where $\mathrm{rect}(x) = 1$ when $|x| \le 1/2$ and zero otherwise, and $c[i] \in \{0, 1\}$. Hence, the mask design reduces to finding an $n$-bit codeword $c[0], \ldots, c[n-1]$. The term $\Delta$, with units of length, specifies the physical dimension of each bit in the code. We fix its value based on the desired spectral resolution; for the scaling of our prototype, where 1 nm of spectrum maps to 30 µm, a desired spectral resolution of 1 nm would require $\Delta = 30$ µm.
Our goal is to design masks that enable the following:


High light throughput. For a given code length $n$, we seek codes with large light throughput, which is equal to the number of ones in the codeword.

Invertibility of the spatial and spectral blurs. The code is designed such that the resulting spatial and spectral blurs are both invertible.
An invertible blur can be achieved by engineering its PSD to be flat. Given that the spectrum is linearly convolved with $c(x)$, a zero-padded DFT of the codeword (of length at least $2n - 1$) captures all the relevant components of the PSD of $c$. Denoting this DFT as $C[k]$, we aim to maximize its minimum magnitude, $\min_k |C[k]|$. Recall from (7) that the spatial PSF is the power spectral density (PSD) of $c(x)$, with suitable scaling; specifically, the Fourier transform of the spatial blur is given by the transform of the linear autocorrelation of $c(x)$ as a function of spatial frequency. From (8), we get
\[ R[k] = |C[k]|^2, \qquad (9) \]
where $R[k]$ is the DFT of $r[m]$, the discrete linear autocorrelation of the codeword. Thus, it is sufficient to maximize $\min_k |C[k]|$ to obtain an invertible spatial blur.
We select an aperture code that leads to invertible blurs for both space and spectrum by solving the following optimization problem:
\[ \max_{c[i] \in \{0, 1\}} \ \min_k |C[k]| \quad \text{subject to} \quad \sum_i c[i] \ge \tau, \qquad (10) \]
under the constraint that the elements of the codeword are binary-valued; $\tau$ is a constant specifying the minimum light throughput. For a code length $n$ that is sufficiently small, we can simply solve for the optimal code via an exhaustive search over all codewords. For our optical implementation, we used $n = 32$, and the exhaustive search for the optimal code took over a day. The resulting code and its performance in delivering high spatial and spectral resolutions are shown in Figures 5 and 6. However, such brute-force optimization does not scale to larger codes. Instead of searching for optimal codes, we can search for approximately optimal codes by iterating over a few candidate solutions. This strategy has previously been explored in [Raskar et al., 2006], where 6 million candidate solutions were searched for a 52-dimensional code. Another alternative would be to start with codes that are optimal for other applications, such as M-sequences and MURA codes, and perturb them by flipping bits, keeping the best solution found.
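A toy version of this exhaustive search, using a hypothetical 12-bit code (far shorter than our 32-bit code) so that it runs in seconds, could look as follows; the throughput threshold is an arbitrary illustrative choice.

```python
import numpy as np

# Exhaustive search for a short binary aperture code whose zero-padded DFT
# (and hence, by Eq. (9), the DFT of its autocorrelation) has no nulls.
n = 12                                  # hypothetical, small for speed
tau = n // 3                            # illustrative throughput threshold
best_code, best_score = None, -np.inf
for bits in range(1, 2 ** n):
    c = np.array([(bits >> i) & 1 for i in range(n)], dtype=float)
    if c.sum() < tau:                   # enforce minimum light throughput
        continue
    C = np.fft.fft(c, 2 * n - 1)        # zero-padded spectral response
    score = np.abs(C).min()             # worst-case frequency response
    if score > best_score:
        best_score, best_code = score, c

# The optimized code has no nulls, so both blurs are invertible.
assert best_score > 0
assert best_code.sum() >= tau
```

Scaling the same loop to $n = 32$ means $2^{32}$ candidates, which is consistent with the day-long search reported above and motivates the approximate strategies mentioned.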
Figure 6 shows the frequency response of both the spectral and spatial blurs for the 32-dimensional optimized code. The advantages of the optimized code are immediately evident — an open aperture has several nulls in the spectral domain, while a slit attenuates all high spatial frequencies. The optimized code retains all frequencies in both domains while increasing light throughput.
5. Synthetic experiments
We tested KRISM via simulations on four different datasets and compared it against alternate approaches for hyperspectral imaging.
Datasets.
We used the hyperspectral dataset of [Arad and Ben-Shahar, 2016], which consists of several high spatial and spectral resolution hyperspectral images covering 519 bands in visible and near-IR wavelengths. We downsampled the HSIs to keep computation with CASSI-type simulations tractable. We also used datasets from [Choi et al., 2017] with 31 spectral bands to compare against learning-based techniques. Finally, we present one example from [SpecTIR, [n. d.]] to compare KRISM against the row/column CS approach proposed in [Fazel et al., 2008]. Several more simulation results are shown in the supplementary material.
Competing methods.
We compared KRISM against four competing CS hyperspectral imaging techniques. All methods were simulated with 60 dB readout and photon noise, and 12-bit quantization. Specifics of each simulation are given below:


KRISM: We performed a rank-4 approximation of the HSI with 6 spatial and 6 spectral measurements. Diffraction blur due to the coded aperture was introduced in both spectral and spatial profiles; deconvolution was then performed using Wiener deconvolution in both domains.

Row/Col CS [Fazel et al., 2008]: As with KRISM, we performed a rank-4 approximation of the HSI by computing random Gaussian projections with 6 spatial and 6 spectral measurements. Diffraction blur due to the coded aperture was introduced as well.

CASSI [Choi et al., 2017]: We used the Single Disperser CASSI (SD-CASSI) architecture from [Wagadarikar et al., 2008] to obtain a single coded image and recovered the HSI using the spectral prior from [Choi et al., 2017]. For data from [Arad and Ben-Shahar, 2016], we reduced the number of spectral bands to 31, as recovery with more than 31 bands led to highly inaccurate results.

CASSI++ [Kittle et al., 2010]: We used the multiframe CASSI architecture to obtain coded images, and recovered the HSI with a sparsity prior in the wavelet domain. As with CASSI, we reduced the number of spectral bands to 31.

SPC++ [Sun and Kelly, 2009]: We obtained spatially-multiplexed spectral measurements with a randomly permuted Hadamard matrix and recovered the HSI with a sparsity prior in the wavelet domain.
Simulation results.
Figure 7 shows simulation results with the various CS techniques for two datasets. We define the reconstruction SNR (RSNR) as
\[ \mathrm{RSNR} = 20 \log_{10} \frac{\| X \|_F}{\| X - \widehat{X} \|_F}, \]
where $\| \cdot \|_F$ is the Frobenius norm and $\widehat{X}$ is the recovered version of $X$. While SD-CASSI with a spectral prior shows promising results with a small number of spectral bands, performance quickly degrades as the number of bands increases. CASSI++ and SPC++ perform better under varying conditions, but the gains are only visible at low compression ratios. For a large number of spectral bands, only KRISM and Row/Col CS have good RSNR scores. We hence compared the two methods on the SpecTIR dataset [SpecTIR, [n. d.]], which has 178 spectral bands. As seen in Figure 8, KRISM outperforms Row/Col CS by about 8 dB, establishing the superiority of the proposed adaptive imaging technique. Based on these simulations, we conclude that KRISM is indeed a compelling methodology when the spatial or spectral resolution is very high — a key requirement for several hyperspectral image processing algorithms. When the number of spectral bands is smaller, the gains are modest but nonetheless present. In the next section, we provide an optical schematic for implementing KRISM.
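The reconstruction SNR used here is the standard ratio of signal to error energy in dB; a minimal sketch, with a toy matrix as input:

```python
import numpy as np

def rsnr(X, X_hat):
    """Reconstruction SNR in dB: 20 * log10(||X||_F / ||X - X_hat||_F)."""
    return 20 * np.log10(np.linalg.norm(X) / np.linalg.norm(X - X_hat))

# A reconstruction with 10% relative Frobenius error scores 20 dB.
X = np.arange(12.0).reshape(3, 4)
assert abs(rsnr(X, 0.9 * X) - 20.0) < 1e-9
```

Each additional 20 dB of RSNR corresponds to a 10x reduction in relative reconstruction error.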

6. The KRISM Optical setup
We now present an optical design for implementing the two operators presented in Section 3 and analyzed in Section 4. For efficiencies in implementation, we propose a novel design that combines both operators into one compact setup.
Figure 9 shows a schematic that uses polarization to achieve both operators with a single SLM and a single camera. First, in Figure 9(a), an SLM is placed 2$f$ away from the grating, and an image sensor 2$f$ away from the SLM, implementing the spectrally-coded spatial measurement operator $\mathcal{I}$. In Figure 9(b), light follows an alternate path wherein the SLM is 4$f$ away from the grating; the camera is still 2$f$ away from the SLM. This light path allows us to achieve the spatially-coded spectral measurement operator $\mathcal{S}$. The two light pathways are combined using a combination of polarizing beam splitters (PBS) and liquid crystal (LC) rotators. The input light is pre-polarized to be either S-polarized or P-polarized. When the light is P-polarized, the SLM is effectively 2$f$ away from the grating, leading to an implementation of $\mathcal{I}$, the spectrally-coded imager. When the light is S-polarized, the SLM would be 4$f$ away, provided the polarizing beam splitter PBS 3 were absent. To counter this, an LC rotator is placed before PBS 3 that rotates S-polarization to P-polarization when switched on. Hence, when S-polarized light is input, in conjunction with the rotator being switched on, we achieve the operator $\mathcal{S}$, the spatially-coded spectrometer. By simultaneously controlling the polarization of the input light and the LC rotator, we can implement both the $\mathcal{I}$ and $\mathcal{S}$ operators with a single camera and SLM.
Figure 10 shows our lab prototype with the entire light pathway including the coded aperture placed in the relay system between the objective lens and diffraction grating. The input polarization is controlled by using a second LC rotator with a polarizer, placed before the diffraction grating. Finally, an auxiliary camera is used to image the pattern displayed on the SLM. This camera is used purely for alignment of the pattern displayed on the SLM. A detailed list of components can be found in the supplemental material.
Calibration.
Our optical setup requires three broad calibration steps. The first is camera-to-SLM calibration; we used an auxiliary camera (component 12 in Figure 10) that is directly focused on the SLM for this purpose. The second is wavelength calibration; we used several narrowband filters to locate known wavelengths on the sensor. The third is radiometric calibration; we used a calibrated tungsten-halogen light source to estimate the spectral response of the setup. A detailed description of the calibration procedure can be found in the supplementary material.
System characterization.
The spectral resolution (FWHM) of the setup was computed using several 1 nm narrowband filters across visible wavelengths; the optical setup provided an FWHM of 2.9 nm. The spatial resolution was computed by capturing a photo of a Siemens star and then deconvolving it with a point-spread function obtained by imaging a pinhole. The frequency at 30% of the modulation transfer function (MTF30) was found to be 0.4 line pairs/pixel on average. All computational details, as well as relevant figures, can be found in the supplementary material.
Diffraction due to LCoS pattern.
Since the SLM is placed at the conjugate plane of either the spectral or the spatial measurements, the displayed pattern introduces diffraction blur, which potentially makes the measurement model nonlinear. To counter this, we add a constant offset to both the positive and negative patterns, which makes the diffraction blur compact enough that the nonlinearities can be neglected. Since the model is linear, the added offset cancels when the two measurements are differenced and hence does not change the final answer.
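The differencing scheme is easy to sanity-check numerically. In the sketch below (NumPy; the array sizes, offset value, and variable names are ours, for illustration only), a signed pattern is displayed as two offset non-negative masks, and the offset contributes equally to both measurements, so it cancels on subtraction:

```python
import numpy as np

rng = np.random.default_rng(0)
scene = rng.random((4, 5))             # toy intensity image standing in for one HSI band
pattern = rng.standard_normal((4, 5))  # signed pattern requested by the Krylov iteration

offset = 0.25                          # constant offset keeping the displayed masks non-negative
pos = np.clip(pattern, 0, None) + offset    # first displayed (physically realizable) mask
neg = np.clip(-pattern, 0, None) + offset   # second displayed mask

m_pos = np.sum(scene * pos)            # two intensity measurements made by the camera
m_neg = np.sum(scene * neg)

# linearity: the offsets contribute equally to both measurements and cancel,
# so the difference equals the measurement under the signed pattern
assert np.isclose(m_pos - m_neg, np.sum(scene * pattern))
```

Because the measurement is linear in the displayed mask, any constant offset adds the same quantity to both measurements and drops out of the difference.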
Spectral deconvolution.
Measurements by our optical system return the spectrum at each point convolved with the aperture code. To recover the true spectrum, we deconvolved the measured singular vectors using a smoothness prior. Specifically, we used the objective function
(11)   minimize over s:  ||y − c ∗ s||² + λ ||Ds||²
where s is the true spectrum, y is the measured spectrum, c is the aperture code, Ds is the first-order difference of s, and λ is the weight of the penalty term. The solution to (11) was computed using conjugate gradient descent. A higher λ favors smoother spectra and hence is preferred for illuminants with smooth spectra, such as a tungsten-halogen bulb or a white LED. For peaky spectra, such as that of a compact fluorescent lamp (CFL), a lower value of λ is preferred. In our experiments, we found a small value of λ to be appropriate for peaky spectra, whereas a larger value was appropriate for experiments with tungsten-halogen illumination. We compare against some other deconvolution algorithms in the supplementary material.
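This deconvolution can be prototyped in a few lines. The NumPy sketch below (variable names are ours; the paper's implementation is in Matlab) solves the normal equations of the smoothness-regularized objective with a hand-rolled conjugate gradient loop, using linear convolution and its adjoint (correlation):

```python
import numpy as np

def cg(apply_A, b, iters=500, tol=1e-10):
    """Conjugate gradients for the SPD system A x = b, with A given as a function."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = apply_A(p)
        a = rs / (p @ Ap)
        x += a * p
        r -= a * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def deconv_smooth(y, c, lam):
    """Solve min_s ||y - c * s||^2 + lam ||D s||^2 (linear convolution, first difference D)."""
    n = len(y) - len(c) + 1
    def Dt(d):  # adjoint of np.diff
        return np.concatenate(([-d[0]], d[:-1] - d[1:], [d[-1]]))
    def normal_op(s):
        Cs = np.convolve(c, s)                                           # C s
        return np.correlate(Cs, c, mode='valid') + lam * Dt(np.diff(s))  # (C^T C + lam D^T D) s
    b = np.correlate(y, c, mode='valid')                                 # C^T y
    return cg(normal_op, b)

# toy check: a smooth spectrum blurred by a short code is recovered
rng = np.random.default_rng(1)
s_true = np.sin(np.linspace(0, 3, 40)) ** 2
code = rng.random(7)
y = np.convolve(code, s_true)          # noiseless measurement
s_rec = deconv_smooth(y, code, 1e-8)
assert np.max(np.abs(s_rec - s_true)) < 1e-4
```

The adjoint of linear convolution with the code is correlation with the same code, which is why `np.correlate(..., mode='valid')` appears wherever the transpose operator is needed.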
Spatial deconvolution.
Equation (7) suggests that the spatial blur kernel varies across spectral bands; more specifically, the blur kernels at two different wavelengths are scaled versions of each other. However, we observed that the variations in the blur kernels were not significant when imaging over a small waveband, for example, the visible waveband of 400 nm to 700 nm. Given this, we approximate the spatial blur as being spectrally independent, which leads to the expression
I_measured(x, y, λ) ≈ (h ∗ I)(x, y, λ),
where h is the spatial blur kernel. We estimated h by imaging a pinhole and subsequently deconvolved the spatial singular vectors with a TV prior, using the technique of [Bioucas-Dias and Figueiredo, 2007] with the image of the pinhole as the PSF. Details of the deconvolution procedure are in the supplementary section.
7. Real Experiments
We present several results from real experiments that show the effectiveness of our method. Specifically, we evaluate the ability to measure singular vectors with high accuracy, and the ability to obtain high spatial and high spectral resolution. Unless specified otherwise, experiments involved capture of a rank-4 approximation of the HSI, with 6 spectral and 6 spatial measurements. Apart from reconstruction SNR, we present the Spectral Angle Mapper (SAM) [Yuhas et al., 1992] similarity between spectra measured by our optical setup and those measured with a spectrometer. The SAM between two vectors u and v is defined as cos⁻¹(⟨u, v⟩ / (‖u‖‖v‖)). Since SAM can be treated as a high-dimensional generalization of the angle between two vectors, we use it to evaluate the closeness of spatial singular vectors as well. We computed the singular vectors with our own implementation of Algorithm 1 in Matlab, with the two measurement operators implemented as function handles. The routine was initialized with an all-ones spatial image to speed up convergence. The spatial resolution was pixels and the spectral resolution was bands between 400 nm and 700 nm, with 3 nm FWHM. The PSF for image blur was estimated by placing a pinhole in front of the camera. Deconvolution was then done with a TV penalty on the spatial singular vectors. For verifying spectroradiometric capabilities, we obtained spectral measurements at a small set of spatial points using an Ocean Optics FLAME spectrometer.
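For reference, the recovery used here follows standard Golub-Kahan (Lanczos) bidiagonalization. Below is a NumPy sketch of that algorithm with the measurement operators passed as function handles (names and sizes are ours; in the instrument, the two handles would be the optical operators rather than matrix products):

```python
import numpy as np

def krylov_lowrank(mulA, mulAt, m, n, k, iters):
    """Rank-k approximation of an implicit m x n matrix A, accessed only through
    the handles mulA (v -> A v) and mulAt (u -> A^T u), via Golub-Kahan
    (Lanczos) bidiagonalization with full reorthogonalization."""
    U = np.zeros((m, iters))
    V = np.zeros((n, iters))
    alpha = np.zeros(iters)
    beta = np.zeros(iters)
    v = np.ones(n) / np.sqrt(n)            # all-ones initialization, as in the text
    for j in range(iters):
        V[:, j] = v
        u = mulA(v) - (beta[j - 1] * U[:, j - 1] if j > 0 else 0.0)
        u -= U[:, :j] @ (U[:, :j].T @ u)   # reorthogonalize against previous u's
        alpha[j] = np.linalg.norm(u)
        U[:, j] = u / alpha[j]
        w = mulAt(U[:, j]) - alpha[j] * v
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)  # reorthogonalize against previous v's
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-12:                # Krylov subspace exhausted
            iters = j + 1
            break
        v = w / beta[j]
    # B is upper bidiagonal: alpha on the diagonal, beta on the superdiagonal
    B = np.diag(alpha[:iters]) + np.diag(beta[:iters - 1], 1)
    P, s, Qt = np.linalg.svd(B)
    return U[:, :iters] @ P[:, :k], s[:k], V[:, :iters] @ Qt[:k, :].T

# sanity check against a dense SVD
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
Uk, s, Vk = krylov_lowrank(lambda x: A @ x, lambda y: A.T @ y, 8, 6, 3, 6)
s_true = np.linalg.svd(A, compute_uv=False)
assert np.allclose(s, s_true[:3], atol=1e-8)
err = np.linalg.norm(A - Uk @ np.diag(s) @ Vk.T)
assert abs(err - np.sqrt(np.sum(s_true[3:] ** 2))) < 1e-8
```

Only matrix-vector products with the matrix and its transpose are required, which is exactly what the two coded measurement operators provide optically.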
Visualization of Lanczos iterations.
Figure 11 shows iterations for the “Color checker” scene of Figure 13. The algorithm starts by capturing the brightest parts of the image, corresponding to the spectralon and the white and yellow patches. Subsequently, by iteration 5, the blue and red parts of the image are isolated. The iterations are representative of the signal energy at various wavelengths: maximum energy is concentrated in the yellow wavelengths, due to the tungsten-halogen illuminant and the spectral response of the camera, followed by the red wavelengths, and finally the blue wavelengths.



Comparison of measured singular vectors.
For comparison with ground-truth singular vectors, we obtained the complete hyperspectral image through permuted Hadamard multiplexed sampling in the spectral domain. We chose a scene with four colored dice for this purpose, shown in Figure 12(a). We then computed 4 singular vectors of the spectrally Hadamard-multiplexed data. Figure 12 shows a comparison of the spatial and spectral singular vectors. The singular vectors obtained via the Krylov subspace technique are close to those obtained through Nyquist sampling. On average, the reconstruction accuracy between KRISM and Hadamard multiplexing was found to be greater than 30 dB, while the angle between the singular vectors was no worse than , with the top three singular vectors having an error of or smaller. While the Hadamard sampling method took 49 minutes for 256 measurements, KRISM took under 2 minutes for 6 spatial and 6 spectral measurements, thus offering a speedup of . Note that Hadamard multiplexing itself gives a increase in speed, and hence KRISM offers a speedup of over Nyquist methods.
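As a reference for the comparison scheme, Hadamard multiplexing can be sketched as follows (here without the permutation, and with a toy 8-sample spectrum; the masks are the non-negative rows of (1 + H)/2, and demultiplexing is a linear inverse):

```python
import numpy as np

def hadamard(n):
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

n = 8
H = hadamard(n)
spectrum = np.arange(1.0, n + 1)       # toy spectrum to be multiplexed

masks = (1 + H) / 2                    # non-negative 0/1 patterns the SLM can display
y = masks @ spectrum                   # multiplexed measurements

# row 0 of H is all ones, so y[0] measures the total flux; subtracting it
# recovers H @ spectrum, and H / n is the inverse of the symmetric H
z = 2 * y - y[0]
recovered = (H @ z) / n
assert np.allclose(recovered, spectrum)
```

Each mask passes roughly half of the light, which is the source of the multiplexing advantage over scanning one band at a time.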
Color checker.
Since our setup is optimized for the 400 nm to 700 nm band, we evaluated our system for color reconstruction of the 24-color Macbeth color chart. The Macbeth color chart consists of a wide gamut of colors in the visible spectrum that are spectrally well separated, and hence forms a good test bench for visible spectrometry. We placed the “Color passport” and a spectralon plug in front of our camera and illuminated them with a tungsten-halogen DC light source. The spectralon has a spectrally flat response and hence helps estimate the spectral response of the illuminant-plus-spectrometer system; this enables measurement of the true radiance of the color swatches. Since the spectra are smooth, we used least-squares recovery of the spectrum, with a penalty on the first difference of the spectral singular vectors. The captured data was then normalized by dividing the spectrum at every point by the spectrum of the spectralon. Figure 13 shows the captured image against the reference color chart, along with spectra at select locations plotted against ground-truth spectra. On average, the RSNR between spectra measured by KRISM and those measured by the spectrometer is greater than 19 dB, while the SAM is less than .
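The spectralon normalization amounts to a per-wavelength division of each measured spectrum by the spectralon measurement. A toy sketch with made-up spectra (all names and curves below are ours, for illustration):

```python
import numpy as np

wavelengths = np.linspace(400, 700, 100)             # nm
illum_system = 1.0 + 0.5 * np.sin(wavelengths / 50)  # unknown illuminant x system response
reflectance = 0.2 + 0.6 * np.exp(-((wavelengths - 550) / 40) ** 2)  # a color swatch

measured_patch = reflectance * illum_system          # what the imager records at the patch
measured_spectralon = 1.0 * illum_system             # spectralon: spectrally flat reflector

# dividing out the spectralon measurement removes the illuminant and system response
recovered = measured_patch / measured_spectralon
assert np.allclose(recovered, reflectance)
```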
Peaky spectrum illumination.
To test high spatiospectral resolving capability, we imaged a small toy figurine of “Chopper” under compact fluorescent lamp (CFL) illumination, which has a very peaky spectrum. Figure 1 shows the rendered RGB image, along with spectra at a representative location as measured by both KRISM and the FLAME spectrometer. The SAM between the spectrum measured by KRISM and that measured by the spectrometer was found to be . Notice that the locations of the peaks, as well as their heights, match very well. Indeed, the Chopper example establishes the high spatiospectral resolution capabilities of KRISM.
Other real experiments.
Figure 14 shows several real-world examples captured with our optical setup, spanning a diverse set of objects. For verification against ground truth, we captured spectral data at select spatial locations. The “Dice” and “Objects” scenes capture several colorful objects with high texture; the zoomed-in pictures show the spatial resolution, while the comparison of spectra highlights the fidelity of our system as a spectral measurement tool. The “Ace” scene was captured by placing a toy figurine under the peaky CFL illuminant. We could not obtain ground truth with a spectrometer, as the toy was too small to reliably probe; however, the peaks are located within 2 nm of the ground-truth peaks, and the relative heights of the peaks match the underlying color. The “Crayons” scene consists of numerous colorful wax crayons illuminated with a tungsten-halogen lamp; the closeness of the spectra to spectrometer readings demonstrates the spectral performance of our setup. Finally, “Feathers” consists of several colorful feathers illuminated by a tungsten-halogen lamp; their fine structure is well captured by our setup. Across the board, our setup delivers high spatial as well as spectral resolution. Most importantly, all of this data was obtained with only 6 spatial and 6 spectral measurements, a compression over Nyquist sampling.





8. Discussion and conclusion
We presented a novel hyperspectral imaging methodology called KRISM, along with a novel optical system that optically computes the top few singular vectors of a hyperspectral scene in a fast and efficient manner. Through several real experiments, we established the strength of KRISM in three important aspects: 1) the ability to capture singular vectors of the hyperspectral image with high fidelity; 2) the ability to capture an approximation of the hyperspectral image at a far faster acquisition rate than Nyquist sampling; and 3) the ability to measure simultaneously at high spatial and spectral resolution. We believe that our setup will trigger new efforts in adaptive imaging for fast and high-resolution hyperspectral imaging.
Added advantages.
There are two more advantages to KRISM. One, since we capture the top few singular vectors directly, the acquisition itself provides data compression. Two, the only recovery step is deconvolution of the spectra, which takes far less time than recovering a hyperspectral image from CS measurements.
Effect of photon noise.
Although Krylov subspace methods are robust to noise [Simoncini and Szyld, 2003], the quality of the singular vectors degrades as the rank of the acquisition is increased. This is primarily due to photon noise, as the measurements for higher singular vectors progressively block most of the energy contained in the initial singular vectors. This can be mitigated by increasing the exposure time when measuring higher singular vectors. That said, the problem of noisy higher singular vectors exists with any sampling scheme and hence needs separate attention via a good noise model.
9. Acknowledgement
The authors would like to acknowledge Prof. Ioannis Gkioulekas (Robotics Institute, Carnegie Mellon University) for the valuable feedback. The authors acknowledge support via the NSF CAREER grant CCF1652569, the NGIA grant HM04761712000, and the Intel ISRA on compressive sensing.
References
 Arad and Ben-Shahar [2016] Boaz Arad and Ohad Ben-Shahar. 2016. Sparse Recovery of Hyperspectral Signal from Natural RGB Images. In European Conf. Computer Vision.
 Athale and Collins [1982] Ravindra A Athale and William C Collins. 1982. Optical matrix–matrix multiplier based on outer product decomposition. Appl. Optics 21, 12 (1982), 2089–2090.
 Baek et al. [2017] Seung-Hwan Baek, Incheol Kim, Diego Gutierrez, and Min H Kim. 2017. Compact single-shot hyperspectral imaging using a prism. ACM Trans. Graphics 36, 6 (2017), 217.
 Baraniuk [2007] Richard G Baraniuk. 2007. Compressive sensing. IEEE Signal Processing Magazine 24, 4 (2007), 118–121.
 Bioucas-Dias and Figueiredo [2007] José M Bioucas-Dias and Mário AT Figueiredo. 2007. A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Trans. Image Processing 16, 12 (2007), 2992–3004.
 Cao et al. [2016] Xun Cao, Tao Yue, Xing Lin, Stephen Lin, Xin Yuan, Qionghai Dai, Lawrence Carin, and David J Brady. 2016. Computational snapshot multispectral cameras: toward dynamic capture of the spectral world. IEEE Signal Processing Magazine 33, 5 (2016), 95–108.

 Chakrabarti and Zickler [2011] A. Chakrabarti and T. Zickler. 2011. Statistics of Real-World Hyperspectral Images. In IEEE Conf. Computer Vision and Pattern Recognition.
 Choi et al. [2017] Inchang Choi, Daniel S. Jeon, Giljoo Nam, Diego Gutierrez, and Min H. Kim. 2017. High-Quality Hyperspectral Reconstruction Using a Spectral Prior. ACM Trans. Graphics 36, 6 (2017), 218:1–13.
 Cloutis [1996] EA Cloutis. 1996. Review Article Hyperspectral geological remote sensing: evaluation of analytical techniques. International J. Remote Sensing 17, 12 (1996), 2215–2242.
 Fazel et al. [2008] M Fazel, E Candes, B Recht, and P Parrilo. 2008. Compressed sensing and robust recovery of low rank matrices. In Asilomar Conf. Signals, Systems and Computers.
 Finlayson et al. [1994] Graham D Finlayson, Mark S Drew, and Brian V Funt. 1994. Color constancy: generalized diagonal transforms suffice. JOSA A 11, 11 (1994), 3011–3019.
 Gehm et al. [2007] ME Gehm, R John, DJ Brady, RM Willett, and TJ Schulz. 2007. Single-shot compressive spectral imaging with a dual-disperser architecture. Optics Express 15, 21 (2007), 14013–14027.
 Golbabaee and Vandergheynst [2012] Mohammad Golbabaee and Pierre Vandergheynst. 2012. Hyperspectral image compressed sensing via low-rank and joint-sparse matrix recovery. In ICASSP.
 Golub and Kahan [1965] Gene Golub and William Kahan. 1965. Calculating the singular values and pseudoinverse of a matrix. J. of the Society for Industrial and Appl. Mathematics, Series B: Numerical Analysis 2, 2 (1965), 205–224.
 Goodman [2005] Joseph W Goodman. 2005. Introduction to Fourier optics. Roberts and Company Publishers.
 Harsanyi and Chang [1994] Joseph C Harsanyi and CI Chang. 1994. Hyperspectral image classification and dimensionality reduction: An orthogonal subspace projection approach. IEEE Trans. Geoscience and Remote Sensing 32, 4 (1994), 779–785.
 Harwit and Sloane [1979] Martin Harwit and Neil J Sloane. 1979. Hadamard transform optics.
 Hernandez et al. [2007] V Hernandez, JE Roman, A Tomas, and V Vidal. 2007. Restarted Lanczos bidiagonalization for the SVD in SLEPc. STR8, Tech. Rep. (2007).
 Hui et al. [2018] Zhuo Hui, Kalyan Sunkavalli, Sunil Hadap, and Aswin C. Sankaranarayanan. 2018. Illuminant Spectra-based Source Separation using Flash Photography. In IEEE Intl. Conf. Computer Vision and Pattern Recognition (CVPR).
 Jeon et al. [2016] Daniel S Jeon, Inchang Choi, and Min H Kim. 2016. Multisampling compressive video spectroscopy. In Computer Graphics Forum.
 Kauvar et al. [2015] Isaac Kauvar, Samuel J Yang, Liang Shi, Ian McDowall, and Gordon Wetzstein. 2015. Adaptive color display via perceptually-driven factored spectral projection. ACM Trans. Graphics (TOG) 34, 6 (2015), 165.
 Kim et al. [2012] Min H Kim, Todd Alan Harvey, David S Kittle, Holly Rushmeier, Julie Dorsey, Richard O Prum, and David J Brady. 2012. 3D imaging spectroscopy for measuring hyperspectral patterns on solid objects. ACM Trans. Graphics (TOG) 31, 4 (2012), 38.
 Kittle et al. [2010] David Kittle, Kerkil Choi, Ashwin Wagadarikar, and David J Brady. 2010. Multiframe image estimation for coded aperture snapshot spectral imagers. Appl. optics 49, 36 (2010), 6824–6833.
 Kumar and Casasent [1981] BVK Vijaya Kumar and David Casasent. 1981. Eigenvector determination by iterative optical methods. Applied optics 20, 21 (1981), 3707–3710.
 Levin et al. [2007] Anat Levin, Rob Fergus, Frédo Durand, and William T Freeman. 2007. Image and depth from a conventional camera with a coded aperture. ACM transactions on graphics (TOG) 26, 3 (2007), 70.
 Li et al. [2012] Chengbo Li, Ting Sun, Kevin F Kelly, and Yin Zhang. 2012. A compressive sensing and unmixing scheme for hyperspectral data processing. IEEE Trans. Image Processing 21, 3 (2012), 1200–1210.
 Lin et al. [2014a] Xing Lin, Yebin Liu, Jiamin Wu, and Qionghai Dai. 2014a. Spatial-spectral encoded compressive hyperspectral imaging. ACM Trans. Graphics 33, 6 (2014), 233.
 Lin et al. [2014b] Xing Lin, Gordon Wetzstein, Yebin Liu, and Qionghai Dai. 2014b. Dual-coded compressive hyperspectral imaging. Optics Letters 39, 7 (2014), 2044–2047.
 Mohan et al. [2008] Ankit Mohan, Ramesh Raskar, and Jack Tumblin. 2008. Agile spectrum imaging: Programmable wavelength modulation for cameras and projectors. In Computer Graphics Forum.
 O’Toole and Kutulakos [2010] Matthew O’Toole and Kiriakos N Kutulakos. 2010. Optical computing for fast light transport analysis. ACM Trans. Graph. 29, 6 (2010), 164.
 Pan et al. [2003] Zhihong Pan, Glenn Healey, Manish Prasad, and Bruce Tromberg. 2003. Face recognition in hyperspectral images. IEEE Trans. Pattern Analysis and Machine Intelligence 25, 12 (2003), 1552–1560.
 Rajbenbach et al. [1987] Henri Rajbenbach, Yeshayahu Fainman, and Sing H Lee. 1987. Optical implementation of an iterative algorithm for matrix inversion. Appl. optics 26, 6 (1987), 1024–1031.
 Raskar et al. [2006] Ramesh Raskar, Amit Agrawal, and Jack Tumblin. 2006. Coded exposure photography: motion deblurring using fluttered shutter. In ACM Transactions on Graphics (TOG), Vol. 25. 795–804.
 Rasti et al. [2013] Behnood Rasti, Johannes R Sveinsson, Magnus O Ulfarsson, and Jon Atli Benediktsson. 2013. Hyperspectral image denoising using a new linear model and sparse regularization. In Geoscience and Remote Sensing Symposium, IEEE Intl.
 Saragadam et al. [2017] Vishwanath Saragadam, Jian Wang, Xin Li, and Aswin Sankaranarayanan. 2017. Compressive spectral anomaly detection. In Intl. Conf. Comp. Photography.
 Simoncini and Szyld [2003] Valeria Simoncini and Daniel B Szyld. 2003. Theory of inexact Krylov subspace methods and applications to scientific computing. SIAM J. Scientific Computing 25, 2 (2003), 454–477.
 SpecTIR [n. d.] SpecTIR. Advanced Hyperspectral and Geospatial Solutions. http://www.spectir.com/freedatasamples/.
 Sun and Kelly [2009] Ting Sun and Kevin Kelly. 2009. Compressive sensing hyperspectral imager. In Computational Optical Sensing and Imaging.
 Takatani et al. [2017] Tsuyoshi Takatani, Takahito Aoto, and Yasuhiro Mukaigawa. 2017. Oneshot Hyperspectral Imaging using Faced Reflectors. In CVPR.
 Tarabalka et al. [2010] Yuliya Tarabalka, Jocelyn Chanussot, and Jon Atli Benediktsson. 2010. Segmentation and classification of hyperspectral images using watershed transformation. Pattern Recognition 43, 7 (2010), 2367–2379.
 Veeraraghavan et al. [2007] Ashok Veeraraghavan, Ramesh Raskar, Amit Agrawal, Ankit Mohan, and Jack Tumblin. 2007. Dappled photography: Mask enhanced cameras for heterodyned light fields and coded aperture refocusing. ACM Trans. Graph. 26, 3 (2007), 69.
 Wagadarikar et al. [2008] Ashwin Wagadarikar, Renu John, Rebecca Willett, and David Brady. 2008. Single disperser design for coded aperture snapshot spectral imaging. Appl. Optics 47, 10 (2008), B44–B51.
 Waters et al. [2011] Andrew E Waters, Aswin C Sankaranarayanan, and Richard Baraniuk. 2011. SpaRCS: Recovering low-rank and sparse matrices from compressive measurements. In Adv. Neural Info. Processing Systems.
 Winter [1999] Michael E. Winter. 1999. N-FINDR: An algorithm for fast autonomous spectral endmember determination in hyperspectral data.
 Yasuma et al. [2010] Fumihito Yasuma, Tomoo Mitsunaga, Daisuke Iso, and Shree K Nayar. 2010. Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum. IEEE trans. on image processing 19, 9 (2010), 2241–2253.
 Yuhas et al. [1992] Roberta H Yuhas, Alexander FH Goetz, and Joe W Boardman. 1992. Discrimination among semiarid landscape endmembers using the spectral angle mapper (SAM) algorithm. (1992).
 Zhao and Yang [2015] Yong-Qiang Zhao and Jingxiang Yang. 2015. Hyperspectral image denoising via sparse representation and low-rank constraint. IEEE Trans. Geoscience and Remote Sensing 53, 1 (2015), 296–308.
Appendix A Appendix
The appendix is organized as follows.


Section C — Code design. We provide details on the design of the coded aperture, comparisons with alternate codes such as M-sequences, and specifications of the deconvolution techniques used for both space and spectrum.

Section D — Details of optical implementation. We provide a comprehensive list of components used for building the setup.

Section E — Explanation of design choices. We discuss some of the design considerations for implementing our optical setup.

Section F — Calibration. This section serves as a guide to calibrate the proposed optical setup.

Section H — Additional simulation results. We present some more simulations with emphasis on performance across a diverse set of datasets.
Appendix B Coded apertures for simultaneous sensing of space and spectrum
Section 4 provided a brief derivation of the effect of the coded aperture on the spatial and spectral blur; here, we present a more rigorous derivation. Figure 15 shows the optical setup that we consider for all our derivations. In particular, we place a coded aperture at plane P2, which introduces diffraction blur in both spatial and spectral measurements. We obtain spectral measurements on plane P4 and spatial measurements on plane P5. The goal of this section is to relate the measurements on P4 and P5 to the coded aperture and the scene's HSI.
b.1. Assumptions
The first assumption is that the image is spatially incoherent, which is realistic for most real-world settings; it implies that spatial frequencies add up in intensity and not in amplitude. Next, the diffraction grating is assumed to disperse light along the x-axis, with no dispersion along the y-axis. Finally, we assume an ideal thin-lens model for all the relay lenses, so that the Fourier transform property of ideal thin lenses holds for all computations.
b.2. Basics
The derivation in the sequel relies on the so-called Fourier transform property of lenses [Goodman, 2005]. Suppose that the complex-valued phasor field at the plane is given as , where denotes spatial coordinates on the plane and denotes the wavelength of light. Let us place an ideal thin lens with focal length at , with its optical axis aligned to the axis. The Fourier transforming property says that the complex phasor field that is formed at the plane is given as
where is the 2D Fourier transform of along the first two dimensions.
We can compute the 2D Fourier transform of using the scaling property.
The negative signs in the argument of come from the Fourier transform being the Hermitian of the inverse Fourier transform. If we now place a second ideal thin lens of focal length at , then the field at can be computed as
The assembly above, with the two lenses at and , is referred to as a 4f system. As we see above, the 4f system replicates the input field at the output plane, barring a flip of the coordinate axes; this property is useful for the following discussion.
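This coordinate flip has a simple discrete analogue that is easy to verify: applying the 2D DFT twice returns the input with both axes reversed, modulo the usual index wrap (a NumPy sketch, not part of the derivation itself):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((4, 6))                 # stand-in for the field at the input plane

# two cascaded Fourier transforms (the two lenses of a 4f system, up to scale factors)
out = np.fft.fft2(np.fft.fft2(img)) / img.size

# DFT of a DFT: out[m] = img[(-m) mod N] along each axis, i.e., a coordinate flip
flipped = np.roll(np.roll(img[::-1, ::-1], 1, axis=0), 1, axis=1)
assert np.allclose(out, flipped)
```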
b.3. Propagation of signal
We will use Figure 15 as a guide for the derivation. An objective lens focuses the scene onto its image plane, denoted as P1. Assuming that all light is incoherent, let the complex phasor at P1 be denoted as ; note that the intensity of this complex field is the hyperspectral image that we seek to measure.
Since we assume an incoherent model, we analyze the system for a point light source and then extend it to a generic image by adding up only intensities.
Field at Plane P1.
Consider a point light source at with complex amplitude . The overall phasor field at P1 is given as
Field at Plane P2.
Using Fourier transform property of lens, we get the field on plane P2 to be,
(12) 
where is the continuous 2D Fourier transform of along the first two dimensions. The field just after the coded aperture is given by,
(13) 
Field at Plane P3.
Using Fourier transform property of lens a second time, the field just before the diffraction grating is
(14) 
where is the continuous 2D FT of . Since the diffraction grating is assumed to disperse along xaxis, we model it as a series of infinite slits, given by,
(15) 
where is the groove density of the diffraction grating, measured in grooves per unit length. The normalization factor ensures that light does not get amplified as it propagates through the setup. The field just after the diffraction grating is hence given by,
(16) 
Field at the Rainbow Plane P4.
To calculate the field at P4, we first need an expression for , the 2D Fourier transform of .
(17) 
The field on plane P4 is given as:
(18) 
where “∗” represents continuous 2D convolution. Since we are only interested in the first order of diffraction, we set k = 1, giving us,
(19) 
Field at the Spatial Plane P5.
Finally, the field on plane P5 is given by,
(20) 
b.4. Measurement by camera
Note that a camera can only measure intensity of the field. Assuming a camera with spectral response , the measurement on plane P4 is
(21) 
where is a 2D convolution and
Extending to a generic incoherent image case, we get the following expression,
(22) 
where
and
We observe from (22) that the image formed at the rainbow plane P4 is a convolution of the scene's spectrum, modified by the camera response, with the square of the aperture code.
Similarly, the intensity on plane P5 is given by,
(23) 
where can be seen as the spatial PSF. Extending to a generic incoherent image case, we get the following expression,
(24) 
The expression above suggests that the image associated with each spectral channel is convolved with a different blur kernel; further, the blur kernels for different wavelengths are simple scaled versions of each other. This implies that we would need to design codes, and deconvolve, for each spectral channel separately. However, this can be avoided for two reasons. First, since the kernels are scaled versions of each other, if one of them is invertible then so are the rest. Second, since our optical setup images within the narrow spectral band of 400 nm to 700 nm, the variation in the spatial blur is not significant; because the blur of the coded aperture is compact, pixellation makes the differences in blur size insignificant. Figure 16 shows a comparison of the spatial blur of the optimized code at two representative wavelengths as seen by the camera. As is evident, the blur size is largely invariant to wavelength; hence, we assume that the spatial blur is spectrally invariant, so that the image formed at plane P5 is a single spectrally-invariant convolution of the scene with this blur kernel.
For optimizing the spatial blur kernel, we chose a design wavelength of 580 nm. However, if we were to image over a larger span of wavelengths, the spectral bands would have to be deconvolved individually.
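The insensitivity of the pixellated blur to wavelength-dependent scaling can be checked numerically. The sketch below uses a made-up Gaussian kernel of width 2 pixels and a hypothetical scale factor between two nearby wavelengths (560 nm and 580 nm); both choices are ours, for illustration:

```python
import numpy as np

def pixel_kernel(sigma, half=10):
    """Compact blur kernel sampled at integer pixel centers and L2-normalized."""
    x = np.arange(-half, half + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / np.linalg.norm(k)

k1 = pixel_kernel(2.0)                # blur at a reference wavelength
k2 = pixel_kernel(2.0 * 580 / 560)    # the same kernel scaled by the wavelength ratio

# the two pixellated kernels are almost indistinguishable (cosine similarity ~ 1)
assert k1 @ k2 > 0.995
```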
Appendix C Code selection


[Figure 17: spectral and spatial deconvolution performance for the three candidate codes; spectrum SNRs of 21.5 dB, 15.3 dB, and 20.1 dB, with spatial RSNRs of 25.3 dB, 25.4 dB, and 25.3 dB, respectively.]
We first characterize the performance of several coded apertures for spatiospectral resolution, and then explain the deconvolution of spectrum and space for the real optical setup.
c.1. Choice of codes
We described a way of obtaining doubly invertible codes for high spatial and spectral resolution in Section 4.3. In this section, we present an alternate design and evaluate its performance for spatial and spectral deconvolution.
c.1.1. Spatially compact codes
Instead of pursuing spatially invertible codes, we can optimize for codes that introduce a compact spatial blur. In such a case, the goal is to suppress the side lobes of the spatial PSF. Let h1 and h2 be the first and second maximum peak heights of the spatial PSF created by the code; maximizing the ratio h1/h2 then leads to a spatially imperceptible blur. Combined with an invertible spectral blur, we formulate the overall objective function as:
(25) 
where is a constant that trades off the two terms. As with the doubly invertible codes, we obtained the optimal solution by brute-force search.
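A sketch of how candidate codes might be scored and searched (the padding factor, the weight gamma, and the main-lobe exclusion width below are our choices, not the paper's): spectral invertibility is measured by the smallest DFT magnitude of the code, and spatial compactness by the main-peak-to-sidelobe ratio of the PSF.

```python
import numpy as np

def code_score(code, gamma=0.1, pad=4, lobe=2):
    """Joint score: spectral invertibility + gamma * spatial peak-to-sidelobe ratio."""
    F = np.abs(np.fft.fft(code, pad * len(code)))
    spectral = F.min()                      # invertibility of the spectral blur
    psf = np.fft.fftshift(F ** 2)           # spatial PSF ~ scaled PSD of the code
    c = int(np.argmax(psf))
    h1 = psf[c]                             # main-lobe height
    h2 = np.delete(psf, range(c - lobe, c + lobe + 1)).max()  # largest side lobe
    return spectral + gamma * (h1 / h2)

# sanity checks: a single open slit has a perfectly flat spectrum, while a fully
# open aperture has spectral nulls (sinc zeros) and hence is a poor spectral code
delta = np.zeros(16); delta[0] = 1.0
assert np.isclose(np.abs(np.fft.fft(delta, 64)).min(), 1.0)
ones = np.ones(16)
assert np.abs(np.fft.fft(ones, 64)).min() < 1e-9

# brute-force search over random binary codes (first slit held open)
rng = np.random.default_rng(0)
candidates = [np.r_[1.0, rng.integers(0, 2, 15)] for _ in range(1000)]
best = max(candidates, key=code_score)
```

An exhaustive search over all binary codes of a given length, as used in the paper, replaces the random sampling above.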
c.1.2. Msequences
Maximal-length sequences, or M-sequences for short, are optimal codes under circular convolution: their PSD is flat, which is desirable in a blur function. However, since our convolution is linear, M-sequences are not necessarily the optimal choice.
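For instance, a length-7 M-sequence generated by a 3-bit LFSR (here using the primitive polynomial x^3 + x^2 + 1) exhibits the ideal two-valued circular autocorrelation and, equivalently, a flat PSD away from DC:

```python
import numpy as np

def lfsr_msequence(nbits=3, taps=(3, 2)):
    """Fibonacci LFSR; taps are 1-indexed register positions of a primitive polynomial."""
    state = [1] * nbits
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return np.array(out)

seq = lfsr_msequence()            # 1, 1, 1, 0, 0, 1, 0 (period 7)
b = 2 * seq - 1                   # map {0, 1} -> {-1, +1}

# circular autocorrelation: N at zero lag, -1 at every other lag
R = np.array([np.sum(b * np.roll(b, t)) for t in range(7)])
assert R[0] == 7 and np.all(R[1:] == -1)

# equivalently, |FFT|^2 is flat (value 8) at all non-DC frequencies
assert np.allclose(np.abs(np.fft.fft(b))[1:] ** 2, 8.0)
```

The flat PSD is exactly what makes these codes attractive for circular convolution; under linear convolution, this optimality no longer holds.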
c.2. Performance comparison
We compare doubly invertible codes, spatially compact codes, and M-sequences for their performance in spatial and spectral deconvolution. To test spectral deconvolution, we created a spectrum with two closely spaced narrowband peaks and a broadband peak, and blurred it with the various codes. Readout noise and shot noise were added to mimic real-world measurements. Finally, deconvolution was performed with a Wiener filter.
To test spatial deconvolution, we used an Air Force resolution target, blurred it with the scaled PSD of the pupil codes, and added noise. Deconvolution was done with a TV prior in all cases.
Figure 17 shows a comparison of performance for spectral and spatial deconvolution. As expected, doubly invertible codes perform the best for spectral deconvolution, while spatially compact codes perform the worst. For spatial deconvolution, spatially compact codes perform the best, while doubly invertible codes come close. Since hyperspectral imaging requires good spatial as well as spectral resolution, we chose doubly invertible codes.
c.3. Spectral deconvolution
Recall that our optical setup measures a blurred version of the true spectrum. Specifically, if the aperture code is c and the spectrum to be measured is s, our optical setup measures y = c ∗ s + n, where n is additive white Gaussian noise. The presence of noise prevents us from simply dividing by the code in the Fourier domain. Fortunately, since the aperture code was designed to be invertible, deconvolution is fairly robust to noise, and a simple solution such as Wiener deconvolution works very well. If the noise is too high, or the spectrum is known to be smooth, we can impose a penalty on the first difference and solve the following optimization problem:
(26)   minimize over s:  ||y − c ∗ s||² + λ ||Ds||²
where y is the measured spectrum, c is the aperture code, s is the spectrum to be recovered, Ds is the first difference of s, and λ weighs the smoothness penalty. Further priors, such as positivity constraints, give better results as well. Figure 19 shows a comparison of the spectra of various commonly available light sources against spectrometer measurements, for three forms of deconvolution: Wiener deconvolution, regularized deconvolution, and positivity-constrained deconvolution. Figure 18 shows results for some narrowband filters; we computed the central wavelength and full width at half maximum (FWHM) for each filter and compared them against the numbers provided by the manufacturer. As expected, the measured FWHM of the 1 nm filters is between 2 nm and 3 nm, since the FWHM of our optical setup is 3 nm. The FWHM for the 10 nm and 40 nm filters is close to the ground-truth values.
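A minimal frequency-domain Wiener deconvolution, assuming circular convolution and a scalar noise-to-signal weight (function and variable names below are ours):

```python
import numpy as np

def wiener_deconv(y, c, nsr=1e-3):
    """Wiener deconvolution of y = c (*) x (circular), with noise-to-signal ratio nsr."""
    n = len(y)
    C = np.fft.fft(c, n)
    X = np.conj(C) * np.fft.fft(y) / (np.abs(C) ** 2 + nsr)
    return np.real(np.fft.ifft(X))

# toy check with an invertible code (a dominant tap keeps |C| bounded away from zero)
rng = np.random.default_rng(0)
x = rng.random(32)
code = np.concatenate(([1.0], 0.05 * rng.random(7)))
y = np.real(np.fft.ifft(np.fft.fft(code, 32) * np.fft.fft(x)))   # circular blur
x_rec = wiener_deconv(y, code, nsr=1e-12)
assert np.max(np.abs(x_rec - x)) < 1e-6
```

In practice, the `nsr` parameter is set from the noise level; larger values trade sharpness for noise suppression.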
c.4. Spatial deconvolution
The presence of a coded aperture introduces a blur in the spatial domain equal to the scaled power spectral density of the coded aperture. Our optimization procedure accounts for an invertible spectral blur as well as an invertible spatial blur; hence, deconvolution is stable even in the presence of noise. While naive deconvolution procedures such as Wiener deconvolution or Richardson-Lucy work moderately well, we imposed a total variation (TV) penalty to obtain more accurate results. Figure 20 shows a blurred image of a Siemens star and its deconvolution with the TV prior, Wiener, and Richardson-Lucy algorithms. As with spectral deconvolution, spatial deconvolution is fairly robust to the choice of algorithm; we chose the TV prior as it returned the sharpest results. Figure 21 shows the MTF before and after deconvolution with the TV prior. Deconvolution significantly improves the MTF30 value, which jumps from 0.15 line pairs/pixel to 0.5 line pairs/pixel.
Appendix D List of components
Figure 22 shows an annotated image of the optical setup we built, along with a list of components, their manufacturers, and item numbers. The system was optimized for a central wavelength of 580 nm, and hence the relay arm up to the diffraction grating is tilted with respect to the grating to satisfy the Scheimpflug condition. Lenses in the relay arm are tilted with respect to the diffraction grating so that the objective can be aligned with the relay arm without any further tilt. The first beamsplitter (component 8) and the second turning mirror (component 10) are placed on kinematic platforms to correct for misalignments in the cage system. Importantly, we chose an LCoS instead of a DMD for spatial light modulation, for two reasons:


Since the light modulated by a DMD does not exit perpendicular to the DMD plane, it introduces a further Scheimpflug tilt, which is hard to correct.

A DMD acts as a diffraction grating in the Littrow configuration, since it is composed of extremely small mirror facets. This introduces nonlinear artifacts in the measurements.
Some more design considerations are enumerated below:


Lenses. We used 100mm achromats for all lenses except the last lens before the cameras. Achromats were the most compact and economical choice for our optical setup while offering low spatial and spectral distortion.

Polarizing beam splitters. We used wire-grid polarizing beamsplitters throughout to ensure that spectral distortion depends only weakly on the angle of incidence and to increase the contrast ratio.

Using an objective lens for the measurement camera. A lens placed between the LCoS and the measurement sensor converts the spatially-coded image to a spectrum and the coded spectrum to a spatial image. Instead of using another achromat, we used an objective lens set to focus at infinity. Since objective lenses are well corrected for distortions and optimized to focus at infinity, this significantly improves the resolution of the measurements.

Diffraction grating. We used an off-the-shelf transmissive diffraction grating with 300 grooves/mm, which offered the most compact spectral dispersion without any overlap with higher diffraction orders. This ensured that there would be no spectral vignetting at any point in the setup. Further discussion on the choice of groove density is provided in Appendix E.

Polarization rotators. We bought off-the-shelf liquid crystal (LC) shutters and peeled off the polarizers on either side to construct polarization shutters. This is the most economical option while offering contrast ratios as high as 400:1. However, the drawback is that the settling time is 330ms, which prevents their use at very high frame rates. A natural workaround is to use binary ferroelectric shutters, which have a low latency of 1ms. However, since ours was only a lab prototype, we went with the cheaper option.
Appendix E Design considerations
We outline some design choices we made and the rationale behind them in this section.
E.1. Choice of code size
The pupil code has two free parameters: the length of the code and the pitch size. These two parameters control the invertibility of the spectrum and the imperceptibility of the spatial images. To understand our design choices, we present the constraints and physical dimensions of the various measurements. Let each lens in the optical setup have focal length $f$ and aperture diameter $A$. Let the pixel pitch of the measurement camera be $\Delta$; the camera can then capture all spatial frequencies up to $1/(2\Delta)$. Let the size of the grating be $S$ in each dimension and its groove density be $\rho$.
We capture wavelengths from $\lambda_{\min}$ to $\lambda_{\max}$. The grating equation is given by $d\sin\theta = m\lambda$, where $d$ is the groove spacing and $m$ is the order of diffraction, 1 in our case. Solving for the angular spread of the spectrum, we get $\theta_{\max} - \theta_{\min} = \arcsin(m\lambda_{\max}/d) - \arcsin(m\lambda_{\min}/d)$. For a lens of focal length $f$ placed after the grating, the size of the spectrum is then $f(\tan\theta_{\max} - \tan\theta_{\min})$. The minimum resolvable wavelength is $\Delta\lambda = \lambda/(mN)$, where $N$ is the total number of illuminated grooves. To avoid vignetting in the system, we require that the pupil plane be no larger than the grating, which constrains the product of the code length and the pitch size.
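As a numerical sanity check of the grating geometry, the angular spread and spectrum size can be computed directly; the wavelength band and relay focal length below are illustrative assumptions, not the exact prototype values.

```python
import numpy as np

# Assumed parameters (illustrative, not the built prototype's exact values).
grooves_per_mm = 300.0
d = 1e-3 / grooves_per_mm             # groove spacing in meters
m = 1                                 # first diffraction order
lam_min, lam_max = 420e-9, 700e-9     # assumed visible band, in meters
f = 100e-3                            # 100mm relay achromat

# Grating equation d*sin(theta) = m*lambda -> diffraction angles per wavelength.
theta_min = np.arcsin(m * lam_min / d)
theta_max = np.arcsin(m * lam_max / d)
angular_spread = theta_max - theta_min                       # radians

# Extent of the dispersed spectrum on the focal plane of a lens after the grating.
spectrum_size = f * (np.tan(theta_max) - np.tan(theta_min))  # meters
```

With these numbers the spread is roughly 0.085 rad, giving a spectrum a little under 9mm wide on the sensor plane.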