1 Introduction
Impulse noise in a grayscale image is characterized by the appearance of random single pixel errors on the image. This type of error occurs when there are malfunctioning pixel elements in the camera sensors, faulty memory locations, or timing errors in analogtodigital conversion [7, 13].
We shall model impulse noise in the following way. Let and be the pixel values at position in the original and noisy images, respectively. Let be the dynamic range of the image, i.e., the set of possible pixel values. Let be a number, called the noise ratio, such that , then
(1) 
where and . If , we say that the image has saltandpepper noise [2]; and if is a (feasible) value chosen at random from , we say that the image has randomvalued impulse noise [5].
In color images, impulse noise is modeled by considering its presence as defined above in any of its color components taken individually as grayscale images.
There are results to detect and correct impulse noise in images, see for example the various techniques described in [5, 13]. Current approaches divide the task in two steps, firstly by identifying the likely pixels that have been corrupted by impulse noise, and secondly by applying a local filter to restore their values, see for example [4, 6]. Here we will concern ourselves with the restoration of color images for which corrupted pixels have already been “detected” by an impulse noise detection algorithm, say like ROLD [5], a detection statistic for randomvalued impulse noise. In particular, for our study, we will use the TID2008 image database [14], which provides reference images and corresponding randomvalued impulse noise counterpart examples. Unlike in the works referenced above, our goal will be to restore the noisy images with a recommender system which will substitute the pixels labeled as noise with a suggested value for each of them, as opposed to applying a local filter in each case.
This work is divided in the following way. In section 2 we introduce recommender systems in general and the collaborative filtering system that we propose in particular. We then lay out the experimental methodology that we will use to test our collaborative filtering system in section 3 and report on the results obtained in section 4. We follow with a discussion in section 5, where we talk about the bias vs variance tradeoff, see section 5.1; how to choose the two main parameters of our recommender system, the number of features and the regularization factor that control that tradeoff, see section 5.2; and a brief note on the new family of image decompositions that our recommender system is based upon and the implications for the system performance, see section 5.3. Finally, we present our conclusions and future work directions in section 6.
2 A collaborative filtering recommender system
Recommender systems are algorithms and techniques that provide suggestions for their users on a given topic or class of objects. For example, a recommender system could suggest what items to buy, what music to hear, what movie to watch, or what news to read [15]. In our case, we are going to implement a recommender system that will suggest what value to give to a missing or corrupted pixel in an image, a novel approach.
The particular flavor of recommender system that we are going to use is a collaborative filtering algorithm [15]. In this modality, “systems require recommendation seekers to express preferences by rating a dozen or two items, thus merging the roles of recommendation seeker and preference provider. These systems focus on algorithms for matching people based on their preferences and weighting the interests of people with similar tastes to produce a recommendation for the information seeker” [3]. For example, consider a recommender system where the items are movies and the users rank them in a scale from 1 to 5. The recommender system would then compute a ranking for a movie that you have not yet ranked by taking into consideration the rankings given to that movie by other users and how similar your tastes are to the tastes of those users. Then, if the ranking is above certain threshold, the system would recommend that movie to you [12].
In our case, we are going to think of the rows in an image as the movies in the example above, and its columns as the users. Then our recommender collaborative filtering algorithm will provide a ranking for a missing pixel, which in this case will translate into its suggested value.
But why should a system like this work? Imagine that two people with very similar tastes in wine decide to rank 50 wines on a predetermined scale. Now suppose that the first person has scored all 50 wines and that the second one has done so for the first 49 wines in the list. If their rankings are very similar for the first 49 wines, we would expect that the second person would give a very similar ranking to the ranking that the first person gave to the 50th bottle of wine. That is intuitively why we expect the collaborative filtering algorithm to work.
2.1 Collaborative filtering cost function
Consider a grayscale image of size with integer entries , where is the image’s bit depth. Typically . In this case is what we called the dynamic range of the image in section 1. Consider a matrix also of size but with entries , where if we know the pixel value at position to be correct and 0 otherwise, i.e., the pixel value is missing or has been labeled as noise by our impulse noise detector. We will call such matrix a mask for .
Now imagine that for each row of
we associate a vector
, where is a fixed number of features that we are going to associate with that row. Similarly we associate a vector to column of . Hence we will have vectors and vectors associated with our image . The collaborative filtering algorithm then considers these two sets of vectors and will model the prediction of the pixel value at position by computing , i.e., the inner product of and . In other words, if we form the matrices(2) 
then our prediction (recommendation) matrix is
(3) 
Define the collaborative filtering cost function for as
(4) 
where implicitly we have provided a mask for as well.
To link matrices and meaningfully to , we require that they solve the minimization problem
(5) 
To see why this requirement is meaningful, let be a mask for whose entries are all ones and assume that , then it is easy to see that for all . That is, and offer a perfect reconstruction (factorization) of image . In other words .
2.2 Regularization and normalization
In section 2.1 we defined the collaborative cost function for an image that measures how good two matrices and approximate the known good (non noisy) values of the pixels of when the approximation is given by .
Two remarks are in order. Assume that exactly approximate the pixel value at position in image . In that case, . Also assume that and write the inner product of these two vectors as , where we focus our attention on the product of their last corresponding th vector components. Then,
(6) 
Without loss of generality, assume that . Then
(7) 
Observe that the real function of real variable has the following properties:

and .

has two minima. One at and another one at .
The first remark then is that without any further restrictions, we can find an infinite number of vector pairs for which their combined norm is arbitrarily large and for which ; and that there are at most two vector pairs with minimal combined norm satisfying . This prompts us to regularize the cost function defined in equation (4) and propose instead the regularized collaborative filtering cost function for as
(8) 
where is the regularization factor. Consequently, the minimization problem (5) is equivalently transformed into the regularized minimization problem for
(9) 
The introduction of regularization reduces the search space for a solution of problem (9), and also addresses the issue of model overfitting, of which we will talk in section 5.
The second remark comes from the following observation. Assume that image is not trivially zero and that a whole column of pixels is tagged as noise. If we set , a solution to minimization problem (9) will necessarily return a matrix for which , the zero vector. Then for any row , the recommender system will return a value of as the suggested value for the pixel at position , given that in this case. This would be a very poor recommendation since at least there is a row for which the average value of the noise free pixels in that row is not zero, and assigning that average to the missing pixel would be a more meaningful suggestion than zero. To have an intuition as to why that is the case, think back to the scenario where users rank movies. For a new user that hasn’t ranked any movies yet and wants to know what other users think of a particular movie, the recommender system would do a better job if it offered the average recommendation that other users have given to that movie than offering a ranking value of zero.
One way to address this problem would be to normalize the original image by subtracting from each row the average value of the noisefree pixel values in that row, and then solve problem (9) for this new matrix. For a noisy pixel at position we would then recommend the value obtained by computing the corresponding inner product and adding to it the average value aforementioned. We describe this procedure in detail in the following section.
3 Experimental setup
As mentioned in section 1, we use the photographs in the TID2008 image database [14] to conduct our experiments. This database consists of 25 reference images, 17 types of distortions for each reference image, 4 different levels of each type of distortion. From these, we work with the images affected with randomvalued impulse noise. The generative model for this distortion is described by equation (1). The 4 levels of distortion correspond to a noise ratio of about and for levels 1 through 4, respectively, with .
Each color image in our database is a 24bit depth color image of size pixels which we manipulate into a single matrix of size where each of the 8bit red, green, and blue color channels are concatenated column wise, see figure 1 for examples. From this point onward, when we mention a color image we will think of it this way, unless we say otherwise, i.e., as a single matrix where all three color channels have been combined column wise.
With this treatment for color images in place, given a noisy image in our dataset, like the one depicted in figure 1(b), the experiment consists in following the steps described in the last paragraph of section 2. In particular, we have to first obtain a mask for that identifies the pixels that have been corrupted with randomvalued impulse noise. Here is where we can use, for example, the ROLD statistic [5] to mark pixels that are to be considered impulse noise.
The mask , with and in this case, is a matrix such that if the pixel at position is considered noisefree, and zero otherwise, see figure 2.
In our case we count with the noisefree reference image , see figure 1(a), and this means that we can obtain a perfect mask for in the sense that it will identify correctly all pixels that are noise, and only those. This is not necessarily the case when we don’t have a reference image that represents the grown truth. The relationships between the noisefree image , its related noisy image , the mask for (with respect to ), and the noise matrix is given by the following equation
(10) 
where is the Hadamard product of matrices (entrywise multiplication), and is the negation operator that changes a 1 for a 0 and vice versa. Note that the noise matrix is assumed to satisfy the conditions of equation (1), and it can be made unique if we impose the extra condition that it have the least number of nonzero entries.
Next we compute , the normalized version of the noisy image , as described in the last paragraph of section 2, namely, let be the vector with entries , then define to be the matrix with entries if , and 0 otherwise. We now proceed to obtain a solution to problem (9) for the normalized color image . For this, we have to select a regularization factor as well. We chose and to compare nonregularized and regularized solutions.
Since this is a minimization problem, a method such as gradient descent or any flavor of conjugate gradient can be used. Either of these two approaches requires choosing an initial value for matrices and , see equation (2), from which the minimization algorithm then iterates to pursue a minima of the regularized collaborative filtering cost function , see equation (2.2
). We chose both matrices to have entries taken from the normal distribution with zero mean and variance one,
, as initial values. The choice of the number of features will be discussed in section 5. For our experiments we selected .With this experimental setup, let be a solution pair for the regularized minimization problem for . We then propose image
(11) 
as the reconstruction from the noisy image , from mask for , and regularization factor , where is the vector with all ones, and in this case. Recall that and hence . Basically, if is a perfect mask, as is the case in our experiments, we have that , and comparing equations (10) and (11) gives that we are substituting the noise matrix with the sum of the prediction matrix , see equation (3), with the normalization term , to restore image from its noisy version . Observe that, by design, is small, which justifies why we would make that substitution of . We can therefore call the matrix the denoising matrix.
4 Results
In this section we present the results that we obtain when we follow the experimental protocol, described in section 3, for all of the reference photographs in the TID2008 image database [14] that are affected with each of the four impulse noise levels available therein. The noise levels are controlled by the noise ratio parameter , see equation (1), which has in this case, on average, the values of 0.0085, 0.0169, 0.0339, and 0.0678.
To measure the quality of the reconstructions we obtain, we utilize the peak signaltonoise ratio (PSNR) [2, 17], measured in decibels (dB), and the mean structural similarity index (MSSIM), a number from 0 to 1, where 1 corresponds to two identical images. This image quality statistic is designed to correlate with the way humans perceive image quality [18].
The first set of results we present can be seen in figure 3. For each image in the TID2008 database affected with impulse noise we plot its PSNR and MSSIM image quality metrics. Then we process each image applying to it the experimental protocol described in section 3, and plot on the same figure both image quality metrics for the reconstruction. We can observe that for all images and all four levels of noise ratio the improvements in quality are very satisfactory, especially as measured by MSSIM. The choice of the number of features and regularization factor in our experiments is justified in section 5.
The next set of results that we present pertain to regularization, see sections 2.2 and 5 for definitions and justification of its use. Figures 4 and 5 show the effects of regularization on PSNR and MSSIM for number of features and a regularization factor of . The choices of these two numbers is discussed in section 5.
The results confirm that regularizing the solutions is a good idea that yields better reconstruction results, if marginally in some instances, but dramatic in others. The figures show the results for all 25 images contained in the TID2008 database.
Finally, we show in figure 6 a selection of images affected by randomvalued impulse noise for a noise ratio of , their reconstruction with our methodology, and the originals from which the noisy versions were generated for comparison purposes.
5 Discussion
5.1 Bias vs variance tradeoff
When designing or choosing a model to represent a data set, we are interested in quantifying to what degree the model appropriately captures its essence. To help us quantify the goodness of a model we will show how the concepts of bias and variance play a role.
Suppose that we conduct the following experiment. Given a set we associate to each a sample value , forming a set of observations , which we call the training set. Assume that , where
is a random variable with zero mean and variance
, which we call noise, and is an unknown deterministic function which we wish to determine. We typically try to do this by training a function over the training set that minimizes the mean square error of the approximation by to the sample values in it. Ideally, we would like to come up with a function that would also minimize the mean square error for sample values of other possible points for which we haven’t gotten any data yet, i.e., , so that we could make good predictions with our model . Essentially, we want to minimize the mean square error over , which we decompose as follows(12) 
where . Now suppose, as mentioned above, that we want to know what the expected test mean square error will be for a point . This expectation is the average of the square error for all possible functions derived from all possible training sets when we perform the experiment described above repeatedly. This identifies to a random variable.
In the finite case, let be the th training set that is obtained after performing the experiment times. Assume that we have training sets, then
(13) 
since because is deterministic, i.e., a regular function. From this,
(14) 
Similarly, we can prove that , and together with equations (13) and (14), equation (5.1) becomes
(15) 
with
(16) 
and
(17) 
The overall expected test mean square error can be computed by averaging over all possible values of in the test set [10].
Equation (15) says that in order to minimize the expected test mean squared error, we must find a function that simultaneously yields low variance and low bias. Also note that since and , this error will always be at least as large as the variance of the noise component in our model.
The variance speaks to how much would change if we changed our training data set, and bias refers to the error introduced by approximating a reallife problem by a much simpler model. Often these two quantities are mutually exclusive and therefore we speak of the biasvariance tradeoff [10].
In our recommender system model, the biasvariance tradeoff is controlled by the number of features , and the regularization factor . In figure 7 we show the error when reconstructing image I23 from noisy image I23_06_4 for values of equal to 1, 8, and 352 using a variety of values of between 0 and 80. How to determine a good value of ? And how about ?
5.2 How to choose and without a ground truth reference
For an grayscale image we need values to describe each pixel when no compression is used. For our recommender system model, we need to find two matrices and to propose a recommendation. If and , see equation (2), then we need numbers to represent those matrices. It is therefore reasonable to assume that we would need a value of such that is at least as big as to represent the image with fidelity. We consequently try to find the smallest such that . Let and , then
hence if is the smallest natural number such that , we would have at least as many numbers necessary to define our recommender system as there are numbers describing the image that we are trying to reconstruct. In our case, since all the images in our database are color images, we have that and , see section 3. This gives a minimum for a system that, with the criterion outlined above, will have enough variance to capture any of the pictures in our dataset.
On the other extreme, since when , if we set we should also have enough features in our recommender system to capture the images in the database with fidelity. In our case . Finally, through exploration, see figure 8, we find that is the number of features that maximizes the PSNR in our experiments with image I23 and noisy image I23_06_4 for a regularization factor of , and maximizes MSSIM under the same circumstances.
To explore the impact of the regularization factor we performed a numerical exploration for a range of values of between 0 and 80. When reconstructing image I23 from image I23_06_4, we found that, on average, for with we obtain the largest PSNR, and for with we get the smallest error as measured by . For both image quality measures are not as good for their respective optimal values of . The experiments are summarized in figure 9. They reflect the tradeoff between bias and variance that we talked about since the parameter controls, to a degree, for the variance in the model. For the experiments with all other images in our database we chose a regularization factor and we set for the number of features. We want to emphasize that we do not expect theses choices to produce optimal results for all images and all noise ratios , but our experiments show that they produced very satisfactory results, as discussed in section 4.
It is important to notice that for a given number of features we were able to find by exploration the optimal value for the regularization factor because we counted with both the original and distorted versions of the particular image we chose to fine tune these parameters. But what if we didn’t have that luxury, i.e., what if we only counted with the distorted version of the image to figure out a reasonable value for , as is the real world scenario? We propose in that case to use the Lcurve method [8, 9] to select the value of . In figure 10(a) we plot the norm of the solution as given by,
(18) 
versus the norm of the residue computed as,
(19) 
both averaged from ten different random seed reconstructions of image I23 from I23_06_4.
We observe that the value of the regularization factor at the “elbow” in the Lshaped curve is very close to the value of for which we experimentally obtain the best reconstruction results, see table 1. That is, in the case of number of features, the Lshaped curve has an elbow at ; whereas for , the elbow appears at . These values of are best seen when we plot the sum of the average residue and the average solution norms, and they correspond to the minima of what we call the curves in figure 10(b). What is remarkable of this approach is that we can find a very good approximation to the empirically obtained optimal value of for a given via the L or curves without knowledge of the ground truth.
Method  PSNR (dB)  

Empirical  352  11  45.69  
380  13  0.001928  
Lcurve  352  9  45.67  
380  10  0.002028  
curve  352  9  45.67  
380  10  0.002028 
5.3 Image decomposition and error: many choices
Another topic to address is the image decomposition that we have proposed. In particular, the additive term on the right hand side in the representation
(20) 
In our case we set , with and , where is the vector with entries , and is the vector with all ones, see equation (11). Matrix is the term that provides normalization in our recommender system, see section 2.2. This term averages the available pixel values in each row and proposes that average for any missing pixel in that row as a starting point, which is then further enhanced by the contribution coming from for that particular pixel. This approach has the caveat that, in images, the value of a pixel may be more likely to be correlated with the values of nearby pixels, than with those that are farther away. This can be corrected by proposing different ways to build the normalizing matrix .
One approach could be to assign a weight to the value of a contributing pixel that takes into account its distance to the reference pixel, for example; or to just average the pixel values of the pixels in the same class that it belongs to, which could be defined in a variety of ways, such as belonging in the neighborhood of a means partition of the image, as another example, see figure 11.
Finally, we want to mention how to address the slight asymmetry on how the rows and columns of the image are treated. We can transpose the image, process it in the same way the original image was processed, revert to the original orientation, and finally add and average both representations. We do this for our test image I23 and its noisy representation I23_06_4 and obtain that, over five different random initializations, we obtain an average improvement of dB in PSNR, and units in MSSIM.
6 Conclusions and future work
We have proposed a collaborative filtering recommender system to restore images with impulse noise. This system assumes that we know the location of the pixels that have been corrupted by impulse noise and it uses this knowledge to build a recommendation for the values that those pixels should have. In the process we have proposed an image decomposition that consists in the product of two matrices plus the sum of a third one.
To test and calibrate this collaborative recommender system, we have used a well known image database. There are two parameters—, the number of features; and , the regularization factor—which we have explored and optimized for our particular image data set to obtain the best image quality. We have measured the performance of our algorithm using two well known image quality metrics, PSNR and MSSIM. Our experiments show that this system performs very well at reconstructing the original images from their noisy counterparts provided we give a complete list of the distorted pixels in each of the originals.
We have provided a rationale on how to choose the parameters aforementioned, but this is an area of further exploration. Also, we have pointed out that the image decomposition proposed is not unique and we have hinted at other possible decompositions that fit our recommender system. Exploring these could lead to new insights, improvements, and applications of our algorithm.
As described above, the bulk of our work focused on reconstructing images with isolated pixels deemed to be noisy, or missing. What happens when we consider missing segments of an image? Figures 12(a), 12(b), and 12(c) show the performance of our system when presented with an image that has been excised of a square portion of its pixels. The reconstruction obtained compares remarkably well with the original and goes from a 29.21 dB mutilated image to a reconstruction at 41.81 dB. Figures 12(d) and 12(e) showcase an image whose original is not available that has been defaced with a grid of dots and for which our recommender system has provided a reconstruction that could be the original undisturbed image by visual inspection. These examples fall under the purview of the general techniques known as inpainting [1, 16], and although the reconstructions shown are far from the current stateoftheart in inpainting, the fact that our algorithm recovers the geometry and texture simultaneously without having been designed implicitly to solve this holy grail of inpainting makes exploring this application of our algorithm worthy of pursuit. One can envision as well applications of our work to superresolution.
I would like to thank the support of the leadership at the Institute for Physical Science and Technology, without which this work would not have been possible. I also want to personally thank John J. Benedetto, Director of the Norbert Wiener Center in the Mathematics Department at the University of Maryland, College Park; and Mike Dellomo, Associate Director and Advisor of the Masters in Telecommunications Program in the Department of Electrical and Computer Engineering also at the University of Maryland, College Park; for their feedback and valuable input in the writing of this manuscript. Finally, I want to acknowledge and thank the support of ARO Grants W911NF1510112 and W911NF1710014.
References
 [1] Marcelo Bertalmío, Vicent Caselles, Simon Masnou, and Guillermo Sapiro. Inpainting. In Computer Vision, A Reference Guide, pages 401–416. Springer, 2014.
 [2] Al Bovik, editor. Handbook of Image and Video Processing. Elsevier Academic Press, 2nd edition, 2005.
 [3] John M. Carroll. Humancomputer interaction in the new millennium. ACM Press, New York, N.Y., 2002.
 [4] R. H. Chan, C.W. Ho, and M. Nikolova. Saltandpepper noise removal by mediantype noise detectors and detailpreserving regularization. IEEE Transactions on Image Processing, 14(10):1479–1485, October 2005.
 [5] Yiqiu Dong, Raymond H. Chan, and Shufang Xu. A detection statistic for randomvalued impulse noise. IEEE Transactions on Image Processing, 16(4):1112–1120, April 2007.
 [6] R. Garnett, T. Huegerich, C. Chui, and W.J. He. A universal noise removal algorithm with an impulse detector. IEEE Transactions on Image Processing, 14(11):1747–1754, November 2005.
 [7] Rafael C. Gonzalez and Richard E. Woods. Digital image processing. Prentice Hall, Upper Saddle River, N.J., 2nd edition, 2002.
 [8] Per Christian Hansen. Analysis of discrete illposed problems by means of the Lcurve. SIAM Review, 34(4):561–580, December 1992.
 [9] Per Christian Hansen and Dianne Prost O’Leary. The use of the Lcurve in the regularization of discrete illposed problems. SIAM Journal on Scientific Computing, 14(6):1487–1503, November 1993.
 [10] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer, 6th (2015) edition, 2013. ISBN 9781461471370.
 [11] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pages 281–297, Berkeley, California, 1967. University of California Press.
 [12] Andrew Ng. Lecture 98  Collaborative Filtering. https://www.coursera.org/learn/machinelearning/lecture/2WoBV/collaborativefiltering, November 2016. Coursera.
 [13] Konstantinos N. Plataniotis and Anastasios N. Venetsanopoulos. Color Image Processing and Applications. Springer Verlag, 2000. ISBN 3540669531.
 [14] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, and F. Battisti. TID2008  A database for evaluation of fullreference visual quality assessment metrics. Advances of Modern Radioelectronics, 10:30–45, 2009.
 [15] Francesco Ricci, Lior Rokach, and Bracha Shapira. Introduction to Recommender Systems Handbook, pages 1–35. Springer US, Boston, MA, 2011.
 [16] Jianhong (Jackie) Shen. Inpainting and the fundamental problem of image processing. SIAM News, 36(5), June 2003.
 [17] D. S. Taubman and Michael W. Marcellin. JPEG 2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, Norwell, MA, second edition, 2002.
 [18] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: From error measurement to structural similarity. IEEE Transactions on Image Processing, 13(1):1–14, 2004.
 [19] Chad Wys. The New York Times Magazine, October 2016. Source image from the Getty’s Open Content Program. “Portrait of a Woman,” by Jacob Adriaensz Baker.