
Image reconstruction from dense binary pixels

Recently, the dense binary pixel Gigavision camera was introduced, emulating a digital version of photographic film. While it seems to be a promising solution for HDR imaging, its output is not directly usable and requires an image reconstruction process. In this work, we formulate this problem as the minimization of a convex objective combining a maximum-likelihood term with a sparse synthesis prior. We present MLNet, a novel feed-forward neural network that produces acceptable output quality at a fixed complexity and is two orders of magnitude faster than iterative algorithms. We present state-of-the-art results in the abstract.



I Introduction

The pursuit of smaller pixel sizes at ever increasing resolution in digital image sensors is mainly driven by the stringent price and form-factor requirements of sensors and optics in the cellular phone market. Recently, Eric Fossum proposed a novel concept of an image sensor with dense sub-diffraction limit one-bit pixels (jots) [1], which can be considered a digital emulation of silver halide photographic film. This idea has been recently embodied as the EPFL Gigavision camera.

We denote by $x$ the radiant exposure at the camera aperture measured over a given time interval. This exposure is subsequently degraded by the optical point spread function, denoted by the operator $H$, producing the exposure of the sensor $\lambda = Hx$. The number of photoelectrons $e_{jk}$ generated at pixel $j$ in time frame $k$ follows the Poisson distribution with the rate $\lambda_j$. A binary pixel compares the accumulated charge against a pre-determined threshold $q_j$, outputting a one-bit measurement $b_{jk}$. Thus the probability of a single binary pixel $j$ to assume an "on" value in frame $k$ is $p_j = P(b_{jk} = 1 \mid q_j, \lambda_j) = P(e_{jk} \ge q_j \mid q_j, \lambda_j)$. Our goal is to estimate an intensity field vector $\hat{x}$ best predicting $x$ given the measurement matrix $B$.
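The measurement model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `on_probability`, `sample_poisson` and `simulate_bits` are hypothetical, `lam` stands for the Poisson rate at one pixel, `q` for its threshold, and the optical blur is ignored.

```python
import math
import random


def on_probability(lam, q):
    """P(e >= q) for e ~ Poisson(lam): one minus the Poisson CDF at q - 1."""
    term = math.exp(-lam)  # P(e = 0)
    cdf = 0.0
    for n in range(q):
        cdf += term
        term *= lam / (n + 1)  # P(e = n + 1) obtained from P(e = n)
    return 1.0 - cdf


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_bits(lam, q, frames, rng):
    """One-bit outputs of a single binary pixel over several time frames."""
    return [1 if sample_poisson(lam, rng) >= q else 0 for _ in range(frames)]


if __name__ == "__main__":
    rng = random.Random(0)
    bits = simulate_bits(lam=2.0, q=1, frames=1000, rng=rng)
    print("empirical on-rate:", sum(bits) / len(bits))
    print("model on-probability:", on_probability(2.0, 1))
```

With a threshold of one photoelectron the on-probability reduces to `1 - exp(-lam)`, which is why a single bit saturates quickly and many bits are needed to recover the intensity.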

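In [2], a maximum likelihood (ML) approach was proposed. Assuming independent measurements, the negative log-likelihood function can be expressed as

$$\ell(x \mid B) = -\sum_{j,k} \left[ b_{jk} \log p_j + (1 - b_{jk}) \log (1 - p_j) \right], \qquad (1)$$

where $b_{jk}$ is the bit output by pixel $j$ in frame $k$ and $p_j$ is the probability of an "on" value at that pixel. In [2] this objective is minimized w.r.t. $x$ via standard iterative optimization techniques.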
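As a rough illustration of the ML objective, the sketch below evaluates the negative log-likelihood of independent binary measurements directly at the sensor exposure, that is, assuming the blur operator is the identity. The names `on_probability` and `negative_log_likelihood`, and the clipping constant `eps`, are assumptions made for this example and do not come from [2].

```python
import math


def on_probability(lam, q):
    """P(e >= q) for e ~ Poisson(lam)."""
    term = math.exp(-lam)
    cdf = 0.0
    for n in range(q):
        cdf += term
        term *= lam / (n + 1)
    return 1.0 - cdf


def negative_log_likelihood(lams, thresholds, bits, eps=1e-12):
    """Bernoulli negative log-likelihood; bits[j][k] is pixel j in frame k."""
    total = 0.0
    for lam, q, row in zip(lams, thresholds, bits):
        # Clip the probability so that log() stays finite at 0 and 1.
        p = min(max(on_probability(lam, q), eps), 1.0 - eps)
        ones = sum(row)
        total -= ones * math.log(p) + (len(row) - ones) * math.log(1.0 - p)
    return total


if __name__ == "__main__":
    bits = [[1, 1, 0, 1]]  # one pixel, four frames, threshold 1
    for lam in (0.5, math.log(4.0), 3.0):
        print(lam, negative_log_likelihood([lam], [1], bits))
```

For a single pixel with threshold 1 the minimizer is the rate at which the on-probability equals the observed fraction of ones, here `log(4)` for three ones out of four frames.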

II Maximum Likelihood with Sparse Prior

Since the ML approach assumes no prior, it needs a large number of binary measurements in order to achieve good reconstruction. Sparsity priors have been shown to give state-of-the-art results in denoising tasks in general, and particularly in low-light Poisson noise [3, 4]. In this work, we show that by introducing a similar spatial sparsity prior, the number of measurements can be decreased significantly. Assuming the light intensity admits a non-linear sparse synthesis model $x = \rho(Dz)$, with the dictionary $D$ and an element-wise non-linear transformation $\rho$ such as the non-negativity enforcing function from [4], we may construct the estimator as $\hat{x} = \rho(D\hat{z})$, where

$$\hat{z} = \arg\min_{z} \; \ell(\rho(Dz) \mid B) + \mu \|z\|_1. \qquad (2)$$

The weight $\mu$ should be selected to best represent the tradeoff between the negative log-likelihood $\ell$ and the sparsity prior $\|z\|_1$; in all experiments its value was selected empirically. The likelihood data fitting term is convex with a Lipschitz-continuous gradient (details are omitted due to lack of space); thus problem (2) can be solved using proximal algorithms such as FISTA [5]. Figures 2 and 3 show the significant improvement in image quality when using the sparse prior.
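The proximal approach mentioned above can be sketched generically: FISTA alternates a gradient step on the smooth data term with soft thresholding, plus a momentum update. The code below is a minimal sketch, not the authors' implementation. The names `soft_threshold` and `fista` are hypothetical, and the demo swaps in a toy quadratic data term in place of the likelihood term so that the result can be checked against a closed-form answer.

```python
import math


def soft_threshold(v, t):
    """Two-sided shrinkage: the proximal operator of t * ||.||_1."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]


def fista(grad, z0, mu, step, iters):
    """Minimize f(z) + mu * ||z||_1, given grad f and a step size <= 1/L."""
    z, y, t = list(z0), list(z0), 1.0
    for _ in range(iters):
        g = grad(y)
        z_new = soft_threshold([yi - step * gi for yi, gi in zip(y, g)], step * mu)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Momentum: extrapolate along the direction of the last update.
        y = [zn + (t - 1.0) / t_new * (zn - zo) for zn, zo in zip(z_new, z)]
        z, t = z_new, t_new
    return z


if __name__ == "__main__":
    # Toy stand-in data term: f(z) = 0.5 * ||z - obs||^2, so grad f(z) = z - obs.
    obs = [3.0, -0.5, 1.2]
    z_hat = fista(lambda z: [a - b for a, b in zip(z, obs)],
                  z0=[0.0, 0.0, 0.0], mu=1.0, step=1.0, iters=20)
    print(z_hat)
```

For the toy problem the minimizer is simply the soft-thresholded observation, which makes it a convenient sanity check before plugging in the gradient of the likelihood term.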

III Fast Approximation

Iterative solutions of (2) typically require hundreds of iterations to converge. This results in prohibitive complexity and unpredictable input-dependent latency, both unacceptable in real-time applications. To overcome this limitation, we follow the approach advocated by [6] and [7], in which a small number of ISTA iterations are unrolled into a feed-forward neural network that subsequently undergoes supervised training on typical inputs. In our case, a single ISTA iteration can be written in the form

$$z_{t+1} = \sigma_\theta\!\left( z_t - W \, \mathrm{diag}\!\left( \rho'(Q z_t) \right) \nabla \ell\!\left( \rho(Q z_t) \mid B \right) \right), \qquad (3)$$

where $W = \eta D^{\mathrm{T}}$, $Q = D$, $\theta = \mu \eta \mathbf{1}$ ($\eta$ is the step size used by ISTA), $\nabla \ell$ is the gradient of the negative log-likelihood with respect to the intensity, and $\sigma_\theta$ is the two-sided shrinkage function. Each such operation may be viewed as a single layer of the network parametrized by $W$, $Q$ and $\theta$, receiving $z_t$ as the input and producing $z_{t+1}$ as the output. Figure 1 depicts the network architecture, henceforth referred to as MLNet.

Fig. 1: MLNet architecture. A small number of ISTA iterations is unrolled into a feed-forward network. Each layer applies a non-linear transformation to the current iterate $z_t$, parametrized by $W$, $Q$ and $\theta$. Training these parameters using standard backpropagation on a set of representative inputs allows the network to approximate the output of the underlying iterative algorithm with much lower complexity.

When the parameters are initialized as prescribed by the ISTA iteration and then adapted by training that minimizes the reconstruction error of the entire network, the number of layers required to achieve comparable output quality on typical inputs is smaller by about two orders of magnitude than the number of corresponding ISTA iterations (see Figure 4). To the best of our knowledge, this is the first time a similar strategy has been applied to reconstruction problems with a non-Euclidean data fitting term.
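The unrolling idea can be sketched as a forward pass in which every layer holds its own copy of the ISTA parameters. The code below is a hedged illustration, not the MLNet implementation: it assumes an exponential non-linearity for the non-negativity enforcing function, a threshold of one photoelectron at every pixel, and no optical blur. All function names are hypothetical, `ones[j]` counts the "on" bits of pixel `j` over `frames` frames, and training by backpropagation is omitted, so the untrained network simply reproduces ISTA.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def soft_threshold(v, theta):
    """Two-sided shrinkage with a per-coordinate threshold vector."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a, t in zip(v, theta)]


def grad_nll(x, ones, frames):
    """Gradient of the NLL w.r.t. x, assuming threshold 1 and no blur."""
    return [(frames - s) - s / math.expm1(xj) for xj, s in zip(x, ones)]


def mlnet_layer(z, W, Q, theta, ones, frames):
    x = [math.exp(u) for u in matvec(Q, z)]  # x = rho(Q z) with rho = exp
    g = grad_nll(x, ones, frames)
    # rho' = exp as well, so diag(rho'(Q z)) g is the element-wise product x * g.
    step = matvec(W, [xj * gj for xj, gj in zip(x, g)])
    return soft_threshold([a - b for a, b in zip(z, step)], theta)


def init_from_ista(D, eta, mu, num_layers):
    """Per-layer parameters (W, Q, theta) as prescribed by the ISTA iteration."""
    Dt = [list(col) for col in zip(*D)]
    W = [[eta * d for d in row] for row in Dt]
    theta = [mu * eta] * len(Dt)
    return [(W, D, theta) for _ in range(num_layers)]


def mlnet_forward(z0, layers, D, ones, frames):
    z = list(z0)
    for W, Q, theta in layers:
        z = mlnet_layer(z, W, Q, theta, ones, frames)
    return [math.exp(u) for u in matvec(D, z)]


if __name__ == "__main__":
    D = [[1.0, 0.0], [0.0, 1.0]]  # trivial dictionary, for illustration only
    layers = init_from_ista(D, eta=0.1, mu=0.0, num_layers=300)
    print(mlnet_forward([0.0, 0.0], layers, D, ones=[3, 1], frames=4))
```

With the sparsity weight set to zero and an identity dictionary, the many-layer network converges to the per-pixel ML estimate, which gives a simple correctness check. In a trained network each layer's parameters would be adapted separately, which is what would allow far fewer layers than ISTA iterations.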

IV Results

Figure 2 shows reconstruction results of an HDR image using ML with and without the sparse prior. FISTA was used to reconstruct overlapping patches that were subsequently averaged. An overcomplete dictionary was trained using K-SVD [8]. Figure 3 shows reconstruction results of an emulated low-light image. Figure 4 demonstrates the superiority of MLNet over iterative ML reconstruction on the same image. In all experiments, the initial estimate was set to the maximum dynamic range value.
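The patch-based reconstruction mentioned above ends with an averaging step, which can be sketched as follows. This is an illustrative sketch only: the function name `average_overlapping_patches`, the square patch shape and the explicit list of top-left `positions` are assumptions, not details given in the text.

```python
def average_overlapping_patches(patches, positions, height, width, size):
    """Average square patches into an image; positions are top-left corners."""
    acc = [[0.0] * width for _ in range(height)]
    cnt = [[0] * width for _ in range(height)]
    for patch, (r0, c0) in zip(patches, positions):
        for dr in range(size):
            for dc in range(size):
                acc[r0 + dr][c0 + dc] += patch[dr][dc]
                cnt[r0 + dr][c0 + dc] += 1
    # Pixels not covered by any patch are left at zero.
    return [[a / n if n else 0.0 for a, n in zip(row_a, row_n)]
            for row_a, row_n in zip(acc, cnt)]


if __name__ == "__main__":
    left = [[1.0, 1.0], [1.0, 1.0]]
    right = [[3.0, 3.0], [3.0, 3.0]]
    image = average_overlapping_patches([left, right], [(0, 0), (0, 1)],
                                        height=2, width=3, size=2)
    print(image)
```

Each pixel ends up as the mean of all patch estimates that cover it, so the middle column in the example is the average of the two patches.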

[Figure 2 panels: (a) binary image, (b) ground truth, (c) ML, (d) FISTA, (e) ML zoom in, (f) FISTA zoom in]

Fig. 2: High dynamic range image reconstruction. An HDR photoelectron count image spanning five orders of magnitude was assembled from multiple defocused raw images taken by a DSLR camera at different exposure times. (a) A single input binary image emulated by thresholding the photoelectron count image using a predetermined threshold pattern. (b) Ground truth image produced by averaging and downsampling photoelectron count images. (c) ML reconstruction without sparse prior (PSNR=). (d) ML reconstruction with a sparse prior (PSNR=). (e) and (f) are zoomed-in versions of images (c) and (d), respectively. Images (b)-(f) are shown on a logarithmic scale.
[Figure 3 panels: (a) binary image, (b) ground truth, (c) ML, (d) MLNet]

Fig. 3: Low light reconstruction. Lena’s image was normalized to the range of from which four input binary images were simulated using a uniform threshold pattern with values . Depicted is a zoomed in fragment of the image: (a) input binary image, (b) low-resolution ground truth, (c) ML reconstruction (PSNR=) and (d) MLNet reconstruction (PSNR=). MLNet was trained on a disjoint set of patches from generic images.
Fig. 4: Bounded reconstruction latency comparison. The plot shows the reconstruction quality for iterative algorithms (ISTA and FISTA) stopped after a given number of iterations, and for the proposed MLNet with an equivalent number of layers. As a reference, the performance of ML without the sparse prior is shown. Iteration represents the initialization for all algorithms. MLNet produces acceptable output quality and is about two orders of magnitude faster than ISTA and FISTA. The use of a sparse prior has a clear advantage over pure ML.