1 Introduction
The proliferation of digital cameras has created an explosion of photographs being shared online. Most of these photographs exist in narrow-gamut, low-dynamic-range formats—typically those defined in the sRGB or Adobe RGB standards—because they are intended primarily for display through devices with limited gamut and dynamic range. While this workflow is efficient for storage, transmission, and display processing, it is unfortunate for computer vision systems that seek to exploit online photo collections to learn object appearance models for recognition; reconstruct three-dimensional (3D) scene models for virtual tourism; enhance images through processes like denoising and deblurring; and so on. Indeed, many of the computer vision algorithms required for these tasks use radiometric reasoning and therefore assume that image color values are directly
proportional to spectral scene radiance (called RAW color hereafter). But when a consumer camera renders, or globally tonemaps, its digital linear color measurements to an output-referred, narrow-gamut color encoding (called JPEG color hereafter), this proportionality is almost always destroyed.¹

¹ Some comments on terminology. We use the colloquial phrases RAW color and JPEG color respectively for linear, scene-referred color and nonlinear, output-referred color. The latter does not include lossy compression, and should not be confused with JPEG compression. Also, we use (global) tonemap for any spatially-uniform, nonlinear map of each pixel's color, independent of the values of its surrounding pixels. It is nearly synonymous with the common phrase "radiometric response function" [1], but generalized to include cross-channel maps.
In computer vision, we try to undo the nonlinear effects of tonemapping so that radiometric reasoning about consumer photographs can be more effective. To this end, there are many methods for fitting parametric forms to the global tonemapping operators applied by color cameras—so-called "radiometric calibration" methods [2, 3, 1, 4, 5, 6, 7]—and it is now possible to fit many global tonemapping operators with high precision and accuracy [6]. However, once these maps are estimated, standard practice for undoing color distortion in observed nonlinear JPEG colors is to apply a simple inverse mapping in a one-to-one manner [2, 3, 1, 4, 5, 6, 7]. This ignores the critical fact that forward tonemapping leads to a loss of information that is highly structured.
Tonemapping is effective when it leads to narrow-gamut images that are nonetheless visually pleasing, and this necessarily involves nonlinear compression. Once the compressed colors are quantized, the reverse mapping becomes one-to-many as shown in Fig. 1, with each nonlinear JPEG color being associated with a distribution of linear RAW colors that can induce it. The amount of color compression in the forward tonemap, as well as the (hue/lightness) directions in which it occurs, change considerably across color space. As a result, the variances of reverse-mapped RAW color distributions unavoidably span a substantial range, with some predicted linear RAW colors being much more reliable than others.
How can we know which predicted RAW colors are unreliable? Intuitively, the forward compression (and thus the reverse uncertainty) should be greatest near the boundary of the output gamut, and practitioners often leverage this intuition by heuristically ignoring all JPEG pixels that have values above or below certain thresholds in one or more of their channels. However, as shown in Fig. 2, the variances of inverse RAW distributions tend to change continuously across color space, and this makes the choice of such thresholds arbitrary. Moreover, this heuristic approach relies on discarding information that would otherwise be useful, because even in high-variance regions, the RAW distributions tell us something about the true scene color. This is especially true where the RAW distributions are strongly oriented (Fig. 1 and bottom-left of Fig. 2): even though they have high total variance, most of their uncertainty is contained in one or two directions within RAW color space.

In this paper, we argue that vision systems can benefit substantially by incorporating a model of radiometric uncertainty when analyzing tonemapped, JPEG-color images. We introduce a probabilistic approach for visual inference, where (a) the calibrated estimate of a camera's forward tonemap is used to derive a probability distribution, for each tonemapped JPEG color, over the RAW linear scene colors that could have induced it; and (b) the uncertainty embedded in these distributions is propagated to subsequent visual analyses. Using a variety of cameras and new formulations of a representative set of classic inference problems (multi-image fusion, photometric stereo, and deblurring), we demonstrate that modeling radiometric uncertainty is important for achieving optimal performance in computer vision.
The paper is organized as follows. After discussing related work in Sec. 2, Sec. 3 reviews parametric forms for modeling the global tonemaps of consumer digital cameras and describes an algorithm for fitting model parameters to offline training data. In Sec. 4, we demonstrate how any forward tonemap model can be used to derive per-pixel inverse color distributions, that is, distributions over linear RAW colors conditioned on the JPEG color reported at each pixel. Section 5 shows how the uncertainty in these inverse distributions can be propagated to subsequent visual processes, by introducing new formulations of a representative set of classical inference tasks: image fusion (e.g., [3]); three-dimensional shape via Lambertian photometric stereo (e.g., [8]); and removing camera shake via image deblurring (e.g., [9]).
2 Related Work
The problem of radiometric calibration, where the goal is inverting nonlinear distortions of scene radiance that occur during image capture and rendering, has received considerable attention in computer vision. Until recently, this calibration was formulated only for grayscale images, or for color images on a per-channel basis by assuming that the "radiometric response function" in each channel acts independently [3, 1, 2, 4]. While early variants of this approach parametrized these response functions simply as an exponentiation (or "gamma correction") with the exponent as a single model parameter, later work sought to improve modeling accuracy by considering more general polynomial forms [4]. Since these models have a relatively small number of parameters, they have featured in several algorithms for "self-calibration" (parameter estimation from images captured in the wild, without calibration targets) through analysis of edge profiles [10, 11], image statistics [12, 13], or exposure stacks of images [3, 1, 2, 14, 15, 16].
However, per-channel models cannot accurately model the color processing pipelines of most consumer cameras, where the linear sensor measurements span a much wider gamut than the target output format. To be able to generate images that "look good" on limited-gamut displays, these cameras compress out-of-gamut and high-luminance colors in ways that are as pleasing as possible, for example by preserving hue. This means that two scene colors with the same raw sensor value in their red channels can have very different red values in their mapped JPEG output if one RAW color is significantly more saturated than the other.
Chakrabarti et al. [5] investigated the accuracy of more general, cross-channel parametric forms for global tonemapping in a number of consumer cameras, including multivariate polynomials and combinations of cross-channel linear transforms with per-channel polynomials. While they found reasonable fits for most cameras, the residual errors remained relatively high even though the calibration and evaluation were both limited to images of a single, relatively narrow-gamut chart. Kim et al. [6] improved on this by explicitly reasoning about the mapping of out-of-gamut colors. Their model consists of a cascade of a linear transform, a per-channel polynomial, and a cross-channel correction for out-of-gamut colors using radial basis functions. The forward tonemap model we use in this paper (Sec. 3) is strongly motivated by this work, although we find a need to augment the calibration training data so that it better covers the full space of measurable RAW values.

All of these approaches are focused on modeling the distortion introduced by global tonemapping. They do not, however, consider the associated loss of information, nor the structured uncertainty that exists when the distortion is undone as a preprocess for radiometric reasoning by vision systems. Indeed, while the benefit of undoing radiometric distortion has been discussed in the context of various vision applications (e.g., deblurring [17, 11], high-dynamic-range imaging [18], video segmentation [19]), previous methods have relied exclusively on deterministic inverse tonemaps that ignore the structured uncertainty evident in Figures 1 and 2. The main goal of this paper is to demonstrate that the benefits of undoing radiometric distortion can be made significantly greater by explicitly modeling the uncertainty inherent to inverse tonemapping, and by propagating this uncertainty to subsequent visual inference algorithms.
An earlier version of this work [20] presented a direct method to estimate inverse RAW distributions from calibration data. In contrast, we introduce a two-step approach, where (a) calibration images are used to fit the forward deterministic tonemap for a given camera, and (b) this model is inverted probabilistically. We find that this leads to better calibration and better inverse distributions with less calibration data.
Finally, we note that our proposed framework applies to stationary, global tonemapping processes, meaning those that operate on each pixel independently of its neighboring pixels, and are unchanging from scene to scene. This is applicable to many existing consumer cameras locked into fixed imaging modes ("portrait", "landscape", etc.), but not to the local tonemapping operators that are commonly used for HDR tonemapping.
3 Camera Rendering Model
Before introducing our radiometric uncertainty model in Secs. 4 and 5, we review and refine here a model for the forward tonemaps of consumer cameras, along with offline calibration procedures. We use a similar approach to Kim et al. [6], and employ a two-step model to account for a camera's processing pipeline: a linear transform and per-channel polynomial, followed by a corrective mapping step for out-of-gamut and saturated colors. The end result is a deterministic forward map from the RAW tri-color sensor measurements x at a pixel to the corresponding rendered JPEG color values y. Readers familiar with [6] may prefer to skip directly to Sec. 4, where we present how to invert the modeled tonemaps probabilistically.
3.1 Model
As shown in Fig. 3, we model the mapping from x to y as:

    ŷ_c = f(π_c^T x),  c ∈ {R, G, B},    (1)
    y_c = q( B[ ŷ_c + g_c(ŷ) ] ),        (2)

where the vectors {π_c} define a linear color-space transform, B[·] bounds its argument to the range [0, 1], and q(·) quantizes its argument to 8-bit integers.
Equation (1) above corresponds to the commonly used per-channel polynomial model (e.g., [4, 5]). Specifically, f is assumed to be a polynomial of degree d:

    f(s) = Σ_{i=0}^{d} α_i s^i,          (3)

where {α_i} are model parameters. We use seventh-order polynomials (i.e., d = 7) in our implementation.
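As a concrete illustration, a minimal sketch of the per-channel forward model in Eqs. (1)-(3) might look as follows; `Pi` and `alpha` are hypothetical stand-ins for a camera's calibrated parameters, not values from the paper:

```python
import numpy as np

def forward_tonemap(x, Pi, alpha):
    """Sketch of Eqs. (1)-(3): a 3x3 linear color-space transform Pi,
    a degree-d polynomial f applied per channel, then bounding to
    [0, 1] and 8-bit quantization. (The additive RBF correction of
    Eq. (2) is omitted here.)"""
    s = Pi @ x                                   # linear transform
    y = np.polyval(alpha[::-1], s)               # f(s) = sum_i alpha_i s^i
    y = np.clip(y, 0.0, 1.0)                     # B[.] bounds to [0, 1]
    return np.round(255.0 * y).astype(np.uint8)  # q(.) quantizes to 8 bits

# With an identity transform and f(s) = s, the map reduces to plain
# 8-bit quantization of the RAW values.
Pi = np.eye(3)
alpha = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```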
Motivated by the observations in [6], this polynomial model is augmented with an additive correction function g_c in (2) to account for deviations that result from camera processing aimed at improving the visual appearance of rendered colors. We use support-vector regression (SVR) with a Gaussian radial basis function (RBF) kernel to model these deviations, i.e., each g_c is of the form:

    g_c(ŷ) = Σ_n β_{c,n} exp( −γ ‖ŷ − ŷ_n‖² ),   (4)

where the kernel centers {ŷ_n}, the coefficients {β_{c,n}}, and γ are also model parameters.
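Evaluating the learned RBF correction of Eq. (4) amounts to a kernel-weighted sum over the fitted centers. A sketch, with `centers`, `beta`, and `gamma` as hypothetical calibrated parameters:

```python
import numpy as np

def rbf_correction(y_poly, centers, beta, gamma):
    """Sketch of the additive correction in Eq. (4): a sum of Gaussian
    RBF kernels (as fit by support-vector regression) added to the
    per-channel polynomial output y_poly. centers is N x 3, beta is
    N x 3 (one coefficient column per channel)."""
    d2 = np.sum((centers - y_poly) ** 2, axis=1)  # squared distances to centers
    k = np.exp(-gamma * d2)                       # RBF kernel evaluations
    return y_poly + k @ beta                      # additive per-channel deviation
```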
3.2 Parameter Estimation
Next, we describe an algorithm to estimate the various parameters of this model from a set of calibration images. Using pairs of corresponding RAW-JPEG pixel values {(x_t, y_t)} from the calibration set, we begin by estimating the parameters of the standard map in (1) as:

    { {α_i}, {π_c} } = arg min Σ_t w_t Σ_c ( f(π_c^T x_t) − y_{t,c} )²,   (5)

where {w_t} are scalar weights. Like [5], we also restrict {α_i} such that f is monotonically increasing.
The weights {w_t} are chosen with two objectives: (a) to promote a better fit for non-saturated colors, since we expect the corrective step in (2) to rectify rendering errors for saturated colors; and (b) to compensate for non-uniformity in the training set, i.e., more training samples in some regions than in others. Accordingly, we set these weights as:

    w_t = κ(y_t) × |{ t' : ‖x_t − x_{t'}‖ < τ }|^{−1},   (6)

where κ(y) is a scalar function that varies from 1 to 0 with increasing saturation in y, and the second term effectively resamples the training set uniformly over the RAW space. We set τ to correspond to the expected separation between uniformly sampled points in the [0, 1]³ cube.
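The two weighting terms above can be sketched as follows; the saturation term `sat` (playing the role of κ) is supplied by the caller, and the density term counts training neighbors within radius `tau`. All names are illustrative:

```python
import numpy as np

def training_weights(raw, sat, tau):
    """Sketch of Eq. (6): weight each RAW-JPEG training pair by (a) a
    per-sample saturation score sat in [0, 1] (1 for unsaturated
    colors, falling toward 0 with increasing saturation) and (b) the
    inverse of the local sample density in RAW space, so that densely
    sampled regions of the training set are down-weighted."""
    # pairwise distances between all RAW training samples
    d = np.linalg.norm(raw[:, None, :] - raw[None, :, :], axis=-1)
    counts = np.sum(d < tau, axis=1)  # neighbors within tau (includes self)
    return sat / counts
```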
Table I (RMSE values not recovered from the source layout):

| Camera Name | Uniform 8k Samples | 10 Exp., 1 Illum. | 10 Exp., 2 Illum. | 4 Exp., 4 Illum. | 8 Exp., 4 Illum. | 4 Exp., 6 Illum. | 8 Exp., 6 Illum. |
| Panasonic DMC LX3 | | | | | | | |
| Canon EOS 40D | | | | | | | |
| Canon PowerShot G9 | | | | | | | |
| Canon PowerShot S90 | | | | | | | |
| Nikon D7000 | | | | | | | |
Once we have set the weights, we use an approach similar to the one in [5] to minimize the cost in (5). We alternately optimize over only the linear or polynomial parameters, {π_c} and {α_i} respectively, while keeping the other fixed. For fixed {π_c}, the optimal {α_i} can be found with a standard quadratic-program solver, since the cost in (5) is quadratic in {α_i} and the monotonicity restriction translates to linear inequality constraints. For fixed {α_i}, we use gradient descent to find the optimal linear parameters {π_c}.

We begin the above alternating iterations by assuming f(s) = s and setting {π_c} directly using least-squares on training samples for which y_t is small; this is based on the assumption that f is nearly linear for small values of its argument. We then run the iterations until convergence, but since the cost in (5) is not convex, there is no guarantee that they will yield the global minimum. Therefore, we restart the optimization multiple times with estimates of {π_c} corresponding to random deviations around the current optimum.
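A heavily simplified sketch of this alternating scheme is given below. The paper's monotonicity constraints (a quadratic program) and random restarts are omitted, and the {π_c} update uses a crude numerical gradient; all parameter names are illustrative:

```python
import numpy as np

def fit_polynomial_stage(raw, jpeg, Pi0, degree=7, iters=2, step=1e-3):
    """Simplified sketch of the alternating minimization of Eq. (5):
    with Pi fixed, the polynomial coefficients alpha have a closed-form
    least-squares solution; with alpha fixed, Pi is refined by
    numerical gradient descent on the same cost."""
    Pi = Pi0.copy()
    for _ in range(iters):
        # Step 1: fit polynomial coefficients by least squares.
        s = (raw @ Pi.T).ravel()
        V = np.vander(s, degree + 1, increasing=True)
        alpha, *_ = np.linalg.lstsq(V, jpeg.ravel(), rcond=None)

        # Step 2: gradient step on the linear transform Pi.
        def cost(P):
            t = (raw @ P.T).ravel()
            r = np.vander(t, degree + 1, increasing=True) @ alpha - jpeg.ravel()
            return np.sum(r ** 2)

        g = np.zeros_like(Pi)
        eps = 1e-5
        for idx in np.ndindex(*Pi.shape):       # finite-difference gradient
            dP = Pi.copy()
            dP[idx] += eps
            g[idx] = (cost(dP) - cost(Pi)) / eps
        Pi -= step * g
    return Pi, alpha
```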
3.3 Datasets
Our database consists of images captured using a number of popular consumer cameras (see Tables I and II), using an X-Rite 140-patch color checker chart as the calibration target, as in [5] and [6]. However, although the chart contains a reasonably wide gamut of colors, these colors only span a part of the space of possible RAW values that can be measured by a camera sensor.
To be able to reliably fit the behavior of each camera's tonemapping function over the full space of measurable scene colors, and to accurately evaluate the quality of these fits, we captured images of the chart under sixteen different illuminants (a standard tungsten bulb paired with different commercially available gel-based color filters) to obtain a significantly wider gamut of colors. Moreover, for each illuminant, we captured images with different exposure values, ranging from one where almost all patches are underexposed to one where all are overexposed. We expect this collection of images to represent an exhaustive set that includes the full gamut of irradiances likely to be present in a scene.
Most of the cameras in our dataset allow access to the RAW sensor measurements, and therefore directly give us a set of RAW-JPEG pairs for training and evaluation. For JPEG-only cameras, we captured a corresponding set of images using a RAW-capable camera. To use the RAW values from the second camera as a valid proxy, we had to account for the fact that the exposure steps in the two cameras were scaled differently (the scales are available from the image metadata), and for the possibility that the RAW proxy values may in some cases be clipped while those recorded by the JPEG camera's sensor were not. Therefore, the exposure stack for each patch under each illuminant from the RAW camera was used to estimate the underlying scene color at a canonical exposure value, and these estimates were then mapped to the exposure values of the JPEG camera without clipping.
For a variety of reasons, we expect the quality of fit to be substantially lower when using a RAW proxy. Our model does not account for the fact that there may be different degrees of vignetting in the two cameras, and it implicitly assumes that the spectral sensitivity functions of the RAW and JPEG cameras are linearly related (i.e., that there is a bijective mapping between linear color sensor measurements in one camera and those in the other), which may not be the case [22, 23]. Moreover, despite the fact that the white-balance setting in each camera is kept constant (we usually use "daylight" or "tungsten"), we observe that some cameras exhibit variation in the white-balance multipliers they apply for different scenes (different illuminants and exposures). For RAW-capable cameras, these multipliers are included in the metadata and can be accounted for when constructing the calibration set. However, these values are not usually available for JPEG-only cameras, and they thus introduce more noise into the calibration set.
3.4 Evaluation
For each camera, we estimated the parameters of our rendering model using different subsets of the collected RAW-JPEG pairs, and measured the quality of this calibration in terms of root mean-squared error (RMSE) values (between the predicted and true JPEG values, in gray levels for an 8-bit image) on the entire dataset. The RMSE values for the RAW-capable cameras are reported in Table I.
The first of these subsets is simply constructed from 8000 random RAW-JPEG pairs sampled uniformly across all pairs, and as expected, this yields the best results. Since capturing such a large dataset to calibrate any given camera may be practically burdensome, we also consider subsets derived from a limited number of illuminants, and with a limited number of exposures per illuminant. The exposures are equally spaced from the lowest to the highest, and the subsets of illuminants are chosen so as to maximize the diversity of included chromaticities; specifically, we order the illuminants such that, for each k, the convex hull of the RAW R-G chromaticities of patches from the first k illuminants has the largest possible area. This order is determined using one of the cameras (the Panasonic DMC LX3), and used to construct subsets for all cameras.
We find that different cameras show different degrees of sensitivity to diversity in exposures and illuminants, but using four illuminants with eight exposures represents a reasonable acquisition burden while also providing enough diversity for reliable calibration in all cameras. On the other hand, images of the chart under only a single illuminant, even with a large number of exposures, do not provide a diverse enough sampling of the RAW sensor space to yield good estimates of the rendering function across the entire dataset.
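The greedy illuminant ordering described above can be sketched as follows, using SciPy's convex hull routine; the dictionary of per-illuminant chromaticities is an assumed input format:

```python
import numpy as np
from scipy.spatial import ConvexHull

def order_illuminants(chroma_by_illum):
    """Sketch of the greedy ordering: given the RAW R-G chromaticities
    of chart patches under each illuminant (each an M x 2 array),
    order the illuminants so that each prefix of the order maximizes
    the convex-hull area of the included chromaticities."""
    remaining = list(chroma_by_illum.keys())
    order, pts = [], np.empty((0, 2))
    while remaining:
        def area_with(name):
            p = np.vstack([pts, chroma_by_illum[name]])
            # In 2-D, ConvexHull.volume is the enclosed area.
            return ConvexHull(p).volume if len(p) >= 3 else 0.0
        best = max(remaining, key=area_with)
        order.append(best)
        pts = np.vstack([pts, chroma_by_illum[best]])
        remaining.remove(best)
    return order
```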
Table II shows RMSE values obtained from calibrating JPEG-only cameras, and as expected, these values are substantially (approximately 3 to 4 times) higher than those for RAW-capable cameras. Note that for this case, we only show results for the uniformly sampled training set, since we find parameter estimation to be unstable when using more limited subsets. This implies that calibrating JPEG-only cameras with a RAW proxy is likely to require the acquisition of larger sets of images, and perhaps more sophisticated fitting algorithms that explicitly infer and account for vignetting effects, scene-dependent variations in white balance, etc.
Fig. 4 illustrates the deviations due to the gamut correction step in our model, using the estimated rendering function for one of the calibrated cameras. We see that while this function is relatively smooth, its variations clearly cannot be decomposed into per-channel functions. This confirms the observations in [6] on the necessity of including a cross-channel correction function.
Table II:

| Camera Name | RAW Proxy | RMSE (w/ 8k Samples) |
| Fujifilm J10 | Panasonic DMC LX3 | 10.24 |
| Panasonic DMC LZ8 | Canon PowerShot G9 | 9.80 |
| Samsung Galaxy S3 | Nikon D7000 | 11.47 |
4 Probabilistic Inverse
Table III:

| Camera Name | Deterministic Inverse (Uniform 8k samples) | Prob. Inverse (Uniform 8k samples) | Prob. Inverse (10 Exp., 2 Illum.) | Prob. Inverse (4 Exp., 4 Illum.) | Prob. Inverse (8 Exp., 4 Illum.) |
| Panasonic DMC LX3 | 3.50 | 12.44 | 6.19 | 11.87 | 12.17 |
| Canon EOS 40D | 3.45 | 13.06 | 0.18 | 11.87 | 12.22 |
| Canon PowerShot G9 | 2.01 | 8.33 | 7.12 | 7.80 | 8.16 |
| Canon PowerShot S90 | 3.83 | 11.34 | 10.47 | 10.96 | 10.91 |
| Nikon D7000 | 1.59 | 8.52 | 6.20 | 3.45 | 8.28 |
The previous section dealt with computing an accurate estimate of the tonemapping function applied by a camera. However, the main motivation for calibrating a camera is to be able to invert this tonemap, mapping available JPEG values back to radiometrically meaningful RAW measurements that are useful for computer vision applications. But it is easy to see that this inverse is not uniquely defined, since multiple sensor measurements can be mapped to the same JPEG output as a result of the quantization that follows the compressive map in (2), with higher intensities and saturated colors experiencing greater compression, and therefore more uncertainty in their recovery from reported JPEG values.
Therefore, instead of using a deterministic inverse function, we define the inverse probabilistically as a distribution p(x | y) over the possible RAW measurements x that could have been tonemapped to a given JPEG output y. While formulating this distribution, we also account for errors in the estimate of the rendering function, treating them as Gaussian noise with variance σ², where σ is set to twice the in-training RMSE. Specifically, we define p(x | y) as:

    p(x | y) = Z(y)⁻¹ exp( −‖y − J(x)‖² / (2σ²) ) p₀(x),   (7)

where J(x) denotes the modeled forward tonemap, Z(y) is the normalization factor

    Z(y) = ∫ exp( −‖y − J(x)‖² / (2σ²) ) p₀(x) dx,          (8)

and p₀(x) is a prior on sensor measurements. This prior can range from per-pixel distributions that assert, for example, that broadband reflectances are more likely than saturated colors, to higher-order scene-level models that reason about the number of distinct chromaticities and materials in a scene; we expect that the choice of p₀ will be different for different applications and environments. In this paper, we simply choose a uniform prior over all possible sensor measurements whose chromaticities lie in the convex hull of the training data.
Note that these distributions are computed assuming that the white-balance multipliers are known (and incorporated in the linear transform). For some cameras, even with a fixed white-balance setting, the actual white-balance multipliers might vary from scene to scene. In these cases, the variable x in the distribution above will be a linear transform² (fixed for all pixels in a given image) away from a scene-independent RAW measurement. This may be sufficient for applications that only reason about colors in a single image, or in multiple images of the same scene where the white-balance multipliers can reasonably be expected to remain fixed; other applications will need to address this ambiguity when using these inverse distributions.

² Note that white-balance correction is typically a diagonal linear transform in the camera's sensor space. For cameras that are not RAW-capable and have been calibrated with a RAW proxy, this will be a general linear transform in the proxy's sensor space.
While the expression in (7) is the exact form of the inverse distribution (corresponding to a uniform distribution over all RAW values predicted by the camera model to map to a given JPEG value y, with added slack for calibration error), it has no convenient closed form. Practitioners therefore need to compute it explicitly over a grid of possible values of x for each JPEG value, or approximate it with a convenient parametric form for use in vision applications. In the remainder of this paper we use multivariate Gaussian distributions to approximate the exact form in (7), as an example to demonstrate the benefits of a probabilistic inverse; this is only one possible choice, and the optimal representation for these distributions will likely depend on the target application and platform.

Formally, we approximate p(x | y) as
    p(x | y) ≈ N( x ; μ(y), Σ(y) ),   (9)

where μ(y) and Σ(y) are the mean and covariance of the exact distribution in (7).

Note that μ(y), in addition to being the mean of the approximate Gaussian distribution, is also the single best estimate of x given y (in the minimum least-squares error sense) under the exact distribution in (7). And since (7) is derived using a camera model similar to that of [6], μ(y) can be interpreted as the deterministic RAW estimate that would be yielded by the algorithm in [6].
The integrations in (9) are performed numerically, storing precomputed values of the forward map J(x) on a densely-sampled grid to speed up distance computations. A MATLAB implementation, which computes the mean and covariance above for a single JPEG observation, is available on our project page [24].
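The grid-based numerical computation of the inverse distribution's moments can be sketched as follows; `forward` stands in for the calibrated tonemap J, and the uniform prior is implicit in the grid itself:

```python
import numpy as np

def inverse_moments(y, forward, sigma, grid):
    """Sketch of Eqs. (7)-(9): evaluate the unnormalized posterior
    exp(-||y - J(x)||^2 / (2 sigma^2)) at each candidate RAW value on
    a precomputed grid (uniform prior), then compute the mean mu(y)
    and covariance Sigma(y) of the Gaussian approximation."""
    pred = np.array([forward(x) for x in grid])  # J(x) for each grid point
    logp = -np.sum((pred - y) ** 2, axis=1) / (2.0 * sigma ** 2)
    w = np.exp(logp - logp.max())                # stable unnormalized weights
    w /= w.sum()                                 # normalize, cf. Eq. (8)
    mu = w @ grid                                # posterior mean
    d = grid - mu
    Sigma = (w[:, None] * d).T @ d               # posterior covariance
    return mu, Sigma
```

With an identity "camera" and a grid symmetric about the observation, the mean recovers the observation and the covariance has no cross-channel correlation, as expected.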
Tables III and IV report the mean empirical log-likelihoods, i.e., the mean value of log p(x | y) across all RAW-JPEG pairs in the validation set, for our set of calibrated cameras. For the RAW-capable cameras, we report these numbers for inverse distributions computed using estimates of the rendering function from the different calibration sets of Table I. As expected, better estimates of the rendering function usually lead to better estimates of p(x | y) with higher log-likelihoods, and we find that our choice of calibrating with 8 exposures and 4 illuminants for RAW cameras yields scores close to those achieved with uniformly random samples from the entire validation set.
Moreover, to demonstrate the benefits of using a probabilistic inverse, we also report log-likelihood scores for a deterministic inverse that outputs a single prediction (μ(y) from (9)) of the RAW value for a given JPEG. Note that, strictly speaking, the log-likelihood in this case would be negative infinity unless x were exactly equal to μ(y). The scores reported in Tables III and IV are therefore computed using a Gaussian distribution with variance equal to the mean prediction error (the choice that yields the maximum mean log-likelihood). We find that these scores are much lower than those from the full model, demonstrating the benefits of a probabilistic approach.
Finally, we show visualizations of the inverse distributions for four of the remaining RAW-capable cameras in our database. These plots represent the distributions using ellipsoids that depict the mean and covariance, and each ellipsoid can be interpreted as a set of RAW values that are likely to be mapped to the same JPEG color by the camera. We see that these distributions are qualitatively different for different cameras, since different manufacturers typically employ their own strategies for compressing wide-gamut sensor measurements to narrow-gamut images that are visually pleasing. Moreover, the sizes and orientations of the covariance matrices can also vary significantly for different JPEG values obtained from the same camera.
Table IV:

| Camera Name | Deterministic Inverse | Prob. Inverse |
| Fujifilm J10 | 1.97 | 8.69 |
| Panasonic DMC LZ8 | 1.60 | 11.83 |
| Samsung Galaxy S3 | 2.23 | 7.51 |
5 Visual Inference with Uncertainty
The probabilistic derendering model of (7)-(9) provides an opportunity for vision systems to exploit the structured uncertainty that is unavoidable when inverting global tonemapping processes. To demonstrate how vision systems can benefit from modeling this uncertainty, we introduce inference algorithms that incorporate it for a broad, representative set of visual tasks: image fusion, photometric stereo, and deblurring.
5.1 Image Fusion
We begin with the task of combining multiple color observations of the same scene to infer accurate estimates of scene color. This task is essential to high-dynamic-range (HDR) imaging from exposure stacks of JPEG images in the spirit of Debevec and Malik [2]; variations of it appear when stitching images together for harmonized, wide-view panoramas or other composites, and when inferring object color (intrinsic images and color constancy) or surface BRDFs from Internet images.
Formally, we consider the problem of estimating the linear color x of a scene point from multiple JPEG observations {y_i} captured at known exposures {e_i}. Each observation y_i is assumed to be the rendered version of the sensor value e_i x, and we assume the camera has been pre-calibrated as described previously. The naive extension of RAW HDR reconstruction is to use a deterministic approach to derender each JPEG value y_i, and then compute scene color using least-squares. This strategy considers every derendered JPEG value to be equally reliable and is implicit, for example, in traditional HDR algorithms based on self-calibration from nonlinear images [3, 1, 2, 14, 15, 16]. When the imaging system is pre-calibrated, the deterministic approach corresponds to ignoring variance information and computing a simple, exposure-corrected linear combination of the derendered means:

    x̂ = ( Σ_i e_i μ_i ) / ( Σ_i e_i² ),   (10)

where μ_i = μ(y_i).
In contrast to the deterministic approach, we propose using the probabilistic inverse from Sec. 4 to weight the contribution of each JPEG observation based on its reliability, thereby improving estimation. Estimation is also improved by the fact that inverse distributions from different exposures of the same scene color often carry complementary information, in the form of differently-oriented covariance matrices. Specifically, each observation y_i provides us with a Gaussian distribution over e_i x,

    e_i x ~ N( μ_i, Σ_i + σ_z² I ),        (11)

where σ_z² corresponds to the expected variance of photosensor noise, which is assumed to be constant and small relative to most Σ_i. The most-likely estimate of x from all observations is then given by

    x̂ = ( Σ_i e_i² (Σ_i + σ_z² I)⁻¹ )⁻¹ ( Σ_i e_i (Σ_i + σ_z² I)⁻¹ μ_i ).   (12)
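The covariance-weighted combination of Eq. (12) can be sketched directly; the per-observation means and covariances would come from the probabilistic inverse of Sec. 4, and `sigma_z` is the assumed photosensor noise level:

```python
import numpy as np

def fuse_probabilistic(mus, Sigmas, exposures, sigma_z=1e-3):
    """Sketch of Eq. (12): combine per-observation inverse
    distributions N(mu_i, Sigma_i) of e_i * x into the most-likely
    scene color x, weighting each exposure by the inverse of its
    noise-inflated covariance."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for mu, S, e in zip(mus, Sigmas, exposures):
        W = np.linalg.inv(S + sigma_z ** 2 * np.eye(3))  # reliability weight
        A += e ** 2 * W
        b += e * (W @ mu)
    return np.linalg.solve(A, b)
```

When the observations are mutually consistent, the estimate recovers the common scene color regardless of the relative covariances; when they disagree, the lower-variance observation dominates.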
An important effect that we need to account for in this probabilistic approach is clipping in the photosensor. To handle this, we insert a check on the derendered distributions: when the estimated mean μ_i in any channel is close to one, we update the corresponding elements of Σ_i to reflect a very high variance for that channel. The same strategy is also adopted for the baseline deterministic approach (10).
To experimentally compare the reconstruction quality of the deterministic and probabilistic approaches, we use all RAW-JPEG color pairs from the database of colors captured with the Panasonic DMC LX3, excluding those from the four training illuminants. We consider the color checker under a particular illuminant to be the target HDR scene, and the differently-exposed JPEG images under that illuminant to be the input images of this scene. The task is to estimate, for each target scene (each illuminant), the true linear patch color from only two differently-exposed JPEG images. The true linear patch color x for each illuminant is computed using RAW data from all exposures, and performance is measured using the relative RMSE:

    RMSE_rel = ‖x̂ − x‖ / ‖x‖.   (13)
Figure 6 shows a histogram of the reduction in RMSE values when using the probabilistic approach. This is the histogram of differences between evaluating (13) with probabilistic and deterministic estimates, across the distinct linear scene colors in the dataset and all possible unordered pairs of exposures³ as input, excluding the trivial pairs in which the two exposures are identical. In the vast majority of cases, incorporating derendering uncertainty leads to better performance.

We also show in the right of the figure, for both the deterministic and probabilistic approaches, two-dimensional visualizations of the error for each exposure pair. Each point in these visualizations corresponds to a pair of input exposure values, and the pseudo-color depicts the mean RMSE across all linear scene colors in the test dataset. (Diagonal entries correspond to estimates from a single exposure, and are thus identical for the probabilistic and deterministic approaches.) We see that the probabilistic approach yields acceptable estimates with low errors for a larger set of exposure pairs. Moreover, in many cases it leads to lower error than those from either exposure taken individually, demonstrating that the probabilistic modeling is not simply selecting the better exposure, but in fact combining complementary information from both observations.

³ These correspond to the different exposure time stops available on the camera, in relative time units.
5.2 Photometric Stereo
Another important class of vision algorithms include those that deal with recovering scene depth and geometry. These algorithms are especially dependent on having access to radiometrically accurate information and have therefore been applied traditionally to RAW data, but the ability to reliably use tonemapped JPEG images, say from the Internet, is useful for applications like weather recovery [25], geometric camera calibration [26], and 3D reconstruction via photometric stereo [27]. As an example, we consider the problem of recovering shape using photometric stereo from JPEG images.
Photometric stereo is a technique for estimating the surface normals of a Lambertian object by observing that object under different lighting conditions and a fixed viewpoint [8]. Formally, given images under different directional lighting conditions, with being the direction and strength of the source, let denote the linear intensity recorded in a single channel at a particular pixel under the light direction. If and are the normal direction and albedo of the surface patch at the backprojection of this pixel, then the Lambertian reflectance model provides the relation . The goal of photometric stereo is to infer the material and shape given the set .
Defining a pseudonormal , the relation between the observed intensity and the scene parameters becomes
(14) 
Given three or more pairs, the pseudonormal is estimated simply using leastsquares as:
(15) 
where and are formed by stacking the light directions and measurements respectively. The normal can then simply be recovered as .
When dealing with a linear color image, Barsky and Petrou [28] suggest constructing the observations as a linear combination of the different channels of the color vectors . The coefficients are chosen to maximize the magnitude of the intensity vector , and therefore the stability of the final normal estimate , as
(16) 
The optimal
is then simply the eigenvector associated with the largest eigenvalue of the matrix
. Intuitively, this corresponds to the normalized color of the material at that pixel location.In order to use photometric stereo to recover scene depth from JPEG images, we need to first obtain estimates of the linear scene color measurements from the available JPEG values . Rather than apply the above algorithm asis to deterministic inversemapped estimates of , we propose a new algorithm that uses the distributions derived in Sec. 4.
First, we modify the approach in [28] to estimate the coefficient vector by maximizing the signaltonoise ratio (SNR), rather than simply the magnitude, of :
(17) 
It is easy to show that the optimal value of for this case is given by the eigenvector associated with the largest eigenvalue of the matrix . This choice of essentially minimizes the relative uncertainty in the set of observations , which are now described by univariate Gaussian distributions:
(18) 
From this it follows (eg., [29]) that the maximum likelihood estimate of the pseudonormal is obtained through weighted leastsquares, with weights given by the reciprocal of the variance, i.e.,
(19) 
where is constructed by stacking the means , and .
We evaluate our algorithm on JPEG images of a figurine captured using the Canon EOS 40D from a fixed viewpoint under directional lighting from ten different known directions. At each pixel, we discard the brightest and darkest measurements to avoid possible specular highlights and shadows, and use the rest to estimate the surface normal. The camera takes RAW images simultaneously, which are used to recover surface normals that we treat as ground truth.
Figure 7 shows the angular error map for normal estimates using the proposed method, as well as the deterministic baseline. We also show the corresponding depth maps obtained from the normal estimates using [30]. The proposed probabilistic approach produces smaller normal estimate errors and fewer reconstruction artifacts than the deterministic algorithm—quantitatively, the mean angular error is for the probabilistic approach, and for the deterministic baseline. We also ran the reconstruction algorithm on inverse estimates computed by simple gammacorrection on the JPEG values (a gamma parameter of 2.2 is assumed). These estimates had a much higher mean error .
5.3 Deconvolution
Deblurring is a common image restoration application and has been an active area of research in computer vision [31, 32, 33, 34, 35]. Traditional deblurring algorithms are designed to work on linear RAW images as input, but in most practical settings, only camera rendered JPEG images are available. The standard practice in such cases has been to apply an inverse tonemap assuming a simple gamma correction of , but as has been recently demonstrated [36], this approach is inadequate and will often yield poor quality images with visible artifacts due to the fact that deblurring algorithms rely heavily on linearity of the input image values.
While Kim et al. [36] show that more accurate inverse maps can improve deblurring performance, their maps are still deterministic. In this section, we explore the benefits of using a probabilistic inverse, and introduce a modified deblurring algorithm that accounts for varying degrees of uncertainty in estimates of RAW values from pixel to pixel.
Formally, given a blurry JPEG image , we assume that the corresponding blurry RAW image is related to a latent sharp RAW image of the scene as
(20) 
where is the blur kernel and is additive white Gaussian noise. The operator denotes convolution of the 3channel image with a singlechannel kernel , implemented as the convolution of the kernel with each image channel separately. Although (20) assumes convolution with a spatiallyuniform blur kernel, the approach in this section can be easily generalized to account for nonuniform blur (eg., as in [35]).
Deblurring an image involves estimating the blur kernel acting on the image, and then inverting this blur to recover the sharp image . In this section, we will concentrate on this second step, i.e., deconvolution, assuming that the kernel has already been estimated— say by applying the deterministic inverse and using a standard kernel estimation algorithm such as [31]^{4}^{4}4Empirically, we find that using a deterministic inverse suffices for the kernel estimation step, as it involves pooling information from the entire image to estimate a relatively small number of parameters..
We begin with a modern linearimage deconvolution algorithm [9] and adapt it to exploit the inverse probability distributions from Sec. 4. Given an observed linear blurred image and known kernel , Krishnan and Fergus [9] provide a fast algorithm to estimate the latent sharp image by minimizing the cost function
(21) 
where are gradient filters (horizontal and vertical finite difference filters in both [9] and our implementation), and the exponent is . The first term measures the agreement of with the linear observation while the second term imposes a sparse prior on gradients in a sharp image. The scalar weight controls the relative contribution of the two.
Given the tonemapped version of the blurry linear image , the deterministic approach would be to simply replace with its expected value in the cost function above. However, to account for the structured uncertainty in our estimate of and the fact that some values of are more reliable than others, we modify the cost function to incorporate both the derendered means and covariances :
(22) 
where , , and is the expected variance of photosensor noise.
The algorithm in [9] minimizes the original cost function (21) using an optimization strategy known as halfquadratic splitting. It introduces a new set of auxiliary variables corresponding to the gradients of the latent image , and carries out the following minimizations successively:
(23)  
(24) 
where is a scalar parameter that is increased across iterations. To minimize our modified cost function in (5.3), we need only change (5.3) to
(25) 
A complication arises from this change: while the original expression (5.3) can be computed in closedform in the Fourier domain, the same is not true for the modified version (5.3) because the first term has spatiallyvarying weights. Thus, to compute (5.3) efficiently, we use a second level of iterations based on variablesplitting to compute (5.3) in every outer iteration of the original algorithm.
Specifically, we introduce a new costfunction with new auxiliary variables :
(26) 
where is a scalar variable whose value is increased across iterations. Note that for , the expressions in (5.3) and (5.3) are equivalent. The minimization algorithm proceeds as follows: we begin by setting and then consider increasing values of equally spaced in the domain (in our implementation, we consider eight values that go from times the minimum to the maximum of all diagonal entries of all ). For each value of , we perform the following updates to and in sequence:
(27)  
(28) 
Note that (5.3) has the same form as the original (5.3) and can be computed in closedform in the Fourier domain. The updates to in (5.3) can also be computed in closedform independently for each pixel location .
We evaluate the proposed algorithm using three RAW images from a public database [37, 38] and eight (spatiallyuniform) camerashake blur kernels from the database in [34]. We generate a set of twentyfour blurred JPEG observations by convolving each RAW image with each blur kernel, adding Gaussian noise, and then applying the estimated forward map of the Canon EOS40D camera.^{5}^{5}5Note that for generating the synthetically blurred and groundtruth sharp images, we use the forward map as estimated using the uniformly sampled 8k training set, which as seen in Table I nearly perfectly predicts the camera map. During deconvolution, we use inverse and forward maps estimated using only the smaller “8 exposures, 4 illuminants” set. We compare deconvolution performance of the proposed approach to a deterministic baseline consisting of the algorithm in [9] applied to the derendered means . The error is measured in terms of PSNR values between the true and estimated JPEG versions of the latent sharp image. Since these errors depend on the choice of regularization parameter (which in practice is often set by hand), we perform a grid search to choose separately for each of the deterministic and probabilistic approaches and for every observed image, selecting the value each time that gives the lowest RMSE using the known groundtruth sharp image. This measures the best performance possible with each approach. We set the exponent value to as suggested in [9].
Figure 8 shows a histogram of the improvement in PSNR across the different images when using the probabilistic approach over the deterministic one. The probabilistic approach leads to higher PSNR values for all images, with a median improvement of 1.24 dB. Figure 8 also includes an example of deconvolution results from the two approaches, and we see that in comparison to the probabilistic approach, the deterministic algorithm yields oversmoothed results in some regions while producing ringing artifacts in others. This is because the single scalar weight is unable to adapt to the varying levels of “noise” or radiometric uncertainty in the derendered estimates. The ringing artifacts in the deterministic algorithm output correspond to regions of high uncertainty, where the probabilistic approach correctly employs a lower weight (i.e., ) for the first term of (5.3) and smooths out the artifacts by relying more on the prior (i.e., the second term). At the same time, it yields sharper estimates in regions with more reliable observations by using a higher weight for the fidelity term, thus reducing the effect of the smoothness prior.
6 Conclusion
Traditionally, computer vision algorithms that require accurate linear measurements of spectral radiance have been limited to RAW input, and therefore to training and testing on small, specialized datasets. In this work, we present a framework that enables these methods to be extended to operate, as effectively as possible, on tonemapped input instead. Our framework is based on incorporating a precise model of the uncertainty associated with global tonemapping processes, and it makes it possible for computer vision systems to take better advantage of the vast number of tonemapped images produced by consumer cameras and shared online.
To a vision engineer, our model of tonemapping uncertainty is simply a form of signaldependent Gaussian noise, and this makes it conceptually appealing for inclusion in subsequent visual processing. To prove this point, we introduced new, probabilistic adaptations of three classical inference tasks: image fusion, photometric stereo and deblurring. In all of these cases, an explicit characterization of the ambiguity due to tonemapping allows the computer vision algorithm to surpass the performance possible with a purely deterministic approach.
In future work, the use of inverse RAW distributions should be incorporated in other vision algorithms, such as depth from stereo, structure from motion, and object recognition. This may require exploring approximations for the inverse distribution different from the Gaussian approximation in (4). While some applications might require more complex parametric forms, others may benefit from simpler weighting schemes that are derived based on the analysis in Sec. 4.
Also, it will be beneficial to find ways of improving calibration for JPEGonly cameras. Our hope is that eventually our framework will enable a common repository of tonemap models (or probabilistic inverse models) for each imaging mode of each camera, making the totality of Internet photos more usable by modern computer vision algorithms.
Acknowledgments
The authors would like to thank the associate editor and reviewers for their comments. This material is based on work supported by the National Science Foundation under Grants no. IIS0905243, IIS0905647, IIS1134072, IIS1212798, IIS1212928, IIS0413169, and IIS1320715; by DARPA under the Mind’s Eye and MSEE programs; and by Toyota.
References
 [1] T. Mitsunaga and S. Nayar, “Radiometric self calibration,” in Proc. CVPR, 1999.
 [2] P. Debevec and J. Malik, “Recovering high dynamic range radiance maps from photographs,” in SIGGRAPH, 1997.
 [3] S. Mann and R. Picard, “Being ‘undigital’ with digital cameras: Extending dynamic range by combining differently exposed pictures,” in Proc. IS&T Annual Conf., 1995.
 [4] M. D. Grossberg and S. K. Nayar, “Modeling the space of camera response functions,” PAMI, 2004.
 [5] A. Chakrabarti, D. Scharstein, and T. Zickler, “An empirical camera model for internet color vision,” in Proc. BMVC, 2009.
 [6] S. Kim, H. Lin, Z. Lu, S. Süsstrunk, S. Lin, and M. Brown, “A new incamera imaging model for color computer vision and its application,” PAMI, 2012.
 [7] J.Y. Lee, Y. Matsushita, B. Shi, I. S. Kweon, and K. Ikeuchi, “Radiometric calibration by rank minimization,” PAMI, 2013.
 [8] R. Woodham, “Photometric method for determining surface orientation from multiple images,” Optical engineering, 1980.
 [9] D. Krishnan and R. Fergus, “Fast image deconvolution using HyperLaplacian priors,” in NIPS, 2009.
 [10] S. Lin, J. Gu, S. Yamazaki, and H.Y. Shum, “Radiometric calibration from a single image,” in Proc. CVPR, 2004.
 [11] Y. Tai, X. Chen, S. Kim, F. Li, J. Yang, J. Yu, Y. Matsushita, and M. Brown, “Nonlinear camera response functions and image deblurring: Theoretical analysis and practice,” PAMI, 2013.
 [12] H. Farid, “Blind inverse gamma correction,” IEEE Trans. Imag. Proc., 2002.
 [13] S. Kuthirummal, A. Agarwala, D. Goldman, and S. Nayar, “Priors for large photo collections and what they reveal about cameras,” in Proc. ECCV, 2008.
 [14] M. Grossberg and S. Nayar, “Determining the camera response from images: what is knowable?” PAMI, 2003.
 [15] E. Reinhard, G. Ward, S. Pattanaik, and P. Debevec, High dynamic range imaging. Elsevier, 2006.
 [16] B. Shi, Y. Matsushita, Y. Wei, C. Xu, and P. Tan, “Selfcalibrating photometric stereo,” in Proc. CVPR, 2010.
 [17] X. Chen, F. Li, J. Yang, and J. Yu, “A theoretical analysis of camera response functions in image deblurring,” Proc. ECCV, 2012.
 [18] C. Pal, R. Szeliski, M. Uyttendaele, and N. Jojic, “Probability models for high dynamic range imaging,” in Proc. CVPR, 2004.
 [19] M. Grundmann, C. McClanahan, S. B. Kang, and I. Essa, “Postprocessing approach for radiometric selfcalibration of video,” in Proc. ICCP, 2013.
 [20] Y. Xiong, K. Saenko, T. Darrell, and T. Zickler, “From pixels to physics: Probabilistic color derendering,” in Proc. CVPR, 2012.

[21]
C.C. Chang and C.J. Lin, “LIBSVM: A library for support vector machines,”
ACM Trans. Intelligent Systems and Technology, 2011, (software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm).  [22] J. Holm, “Capture color analysis gamuts,” in IS&T/SID Color and Imaging Conference, 2006.
 [23] J. Jiang, D. Liu, J. Gu, and S. Süsstrunk, “What is the space of spectral sensitivity functions for digital color cameras?” in IEEE Workshop on the Applications of Computer Vision (WACV), 2013.
 [24] [Online]. Available: http://vision.seas.harvard.edu/derender/
 [25] L. Shen and P. Tan, “Photometric stereo and weather estimation using internet images,” in Proc. CVPR, 2009.
 [26] J. Lalonde, S. Narasimhan, and A. Efros, “What do the sun and the sky tell us about the camera?” IJCV, 2010.
 [27] J. Ackermann, F. Langguth, S. Fuhrmann, and M. Goesele, “Photometric stereo for outdoor webcams,” in Proc. CVPR, 2012.
 [28] S. Barsky and M. Petrou, “Colour photometric stereo: Simultaneous reconstruction of local gradient and colour of rough textured surfaces,” in Proc. ICCV, 2001.
 [29] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.
 [30] R. Frankot and R. Chellappa, “A method for enforcing integrability in shape from shading algorithms,” PAMI, 1988.
 [31] R. Fergus, B. Singh, A. Hertzmann, S. Roweis, and W. Freeman, “Removing camera shake from a single photograph,” in SIGGRAPH, 2006.
 [32] Q. Shan, J. Jia, and A. Agarwala, “Highquality motion deblurring from a single image,” in SIGGRAPH, 2008.
 [33] J. Cai, H. Ji, C. Liu, and Z. Shen, “Blind motion deblurring from a single image using sparse approximation,” in Proc. CVPR, 2009.
 [34] A. Levin, Y. Weiss, F. Durand, and W. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in Proc. CVPR, 2009.
 [35] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce, “Nonuniform deblurring for shaken images,” IJCV, 2012.
 [36] S. Kim, Y.W. Tai, S. J. Kim, M. S. Brown, and Y. Matsushita, “Nonlinear camera response functions and image deblurring,” in Proc. CVPR. IEEE, 2012.
 [37] P. V. Gehler, C. Rother, A. Blake, T. Minka, and T. Sharp, “Bayesian color constancy revisited,” in Proc. CVPR, 2008.
 [38] L. Shi and B. Funt, “Reprocessed version of the Gehler color constancy dataset of 568 images,” 2010, accessed from http://www.cs.sfu.ca/~colour/data/.