I Introduction
Scanning tunneling microscopy (STM) has been considered the most important technique to image, control and monitorize molecular systems with atomic resolution under a wide range of experimental conditions that enable molecule visualization. Typically, the STM image is composed of a bidimensional array of current values obtained when a metallic tip is very close to a conductive substrate creating a tunnel effect by applying a voltage between both conductive media or electrodes. The image resolution depends on several factors, such as the nature of electrodes, conductivity of the molecular system, packing order of molecules, and temperature. The quality of STM images is, however, limited by various degradation mechanisms including noise (thermal and Poissonian due to the electronic counting process), drifts, and dropouts. These degradation mechanisms reduce considerably the interpretability of STM images, calling for image denoising and restoration techniques to mitigate the effect of those degradations mechanisms.
There are several software suites that perform image denoise of STM images [19, 11]. Most techniques implement denoising in the twodimensional frequency domain by exploiting the fact that the STM images are quasiperiodicity and thus have quasidiscrete and quasiperiodic patterns in the frequency domain. However, the degradation mechanisms mentioned above defocus the frequency patterns, compromising the success of the frequency based denoising techniques.
Image denoising is one of the oldest image restoration problems, which continues to be object of intense research. As in many signal and image areas, the success of image denoising depends crucially on the existence of a good model to represent the original image. In this paper, we attack STM denoising by reformulating the true estimation problem as a sparse regression, often termed sparse coding. Following the standard procedure in patchbased image restoration, the image is partitioned into small overlapping square patches and the vector corresponding to each patch is modeled as a sparse linear combination of vectors, termed atoms, taken from a set called dictionary. Aiming at optimal sparse representations, and thus at optimal noise removing capabilities, the dictionary is learned from the data it represents via matrix factorization with sparsity constraints on the code (i.e., the regression coefficients) enforced by the
norm [6, 16].The sparsity of a representation on a dictionary is strongly linked with the level of selfsimilarity of the images under study. It happens that STM images have a high level of selfsimilarity and, therefore, they are natural candidates to be denoised via sparse regression on learned dictionaries. Compared with the standard approaches, STM images pose an additional challenge: they have plenty of artifacts, mainly dropouts, which appear in a structured way as consecutive line segments on the scanning direction. To cope we these artifacts, we modify the algorithm introduced in [6, 16]; the modifications consists in treating the artifacts as missing data in both the dictionary learning and denoising (i.e., coding) phases.
To assess the quality of the proposed approach, we applied the algorithm to STM images of porphyrins and bipyridines adsorbed on highly oriented pirolitic graphite. The Matlab code of the algorithm and a Gwyddion module can be obtained from http://www.lx.it.pt/jpaos/stm/.
Ii Sparse Image representation
Signals and images can be represented in different ways. Several methods include Fourier Transform methods
[24], wavelets methods [17], curvelets methods [23], just to mention a few. Despite the underlying differences, each representation has its own advantages when used in some of the classical, and still challenging, image restoration problems [12, 3].Recently, sparse linear decomposition of signals in learned dictionaries led to stateoftheart methods. The idea is to represent a signal, or fragments of the signal denoted patches, as a linear combination of vectors, termed atoms, taken from a set called dictionary. The dictionary can be fixed, for example based on wavelets [17], or can be learned [1].
The topic of sparse and redundant representations of real world images is currently one of the most active research fields. Recently, there has been an increased interest due to the fact that dictionaries yielding sparse representations may be learned directly from the data they represent [6, 16].
The fact that the real world images (and signals) admit sparse representations in suitable dictionaries is a consequence of the high level of selfsimilarity of real world images (and signals), i.e., given an image patch, there is a high likelihood of finding similar patches at different locations and scales. This is also true in the case of STM images: by nature, an STM image is almost a periodic image, and thus there exist similar patches with high probability.
The proposed algorithm works as follows: (i) the noisy STM image is decomposed in smaller patches; (ii) a dictionary is learned from the data; (iii) the patches are sparsely coding using just a few atoms from the learned dictionary; and (iv) the denoised/restored image is obtained by composition of the coded patches. In the following sections we introduce notation and describe in detail the constituent parts of the proposed sparseland framework to restore STM images.
Iia Patch decomposition and composition of an image
Sensor measurements are usually corrupted by noise. STM images are not an exception, and the observation mechanism is well approximated by the additive model
where is the noisy image, the true image, and
models noise and model errors. The noise is originated by different sources, such as the tip movements, thermal noise, Poissonian couting noise, or even from the fact that different molecules have different conductivities. Given the relatively number of independent causes, we assume that the noise is zeromean independent Gaussian with an unknown variance
. The goal is to obtain an estimate of given .Unlike other denoise algorithms, the proposed sparseland framework processes the image in small patches rather than the whole image. Consider the noisy image , where is the image size, and the patch containing the pixels located inside a square window of size centered at the th pixel. The total number of overlapping patches that exist in image is . Let and denote two vectors holding, respectively, the elements and of corresponding to the th patch. We have then
In the sparseland framework, patches are processed individually, and thus for each we obtain an estimate . Given that the reconstruction is not exact, we can write
where is the estimation error for the th patch. To produce the overall estimate of from the estimates , for , we combine the information of several patches as follows. We introduce selection matrices such that . Note that each row of contains just one nonnull element of value 1. Consider also the following matrix and vectors:
With the above notation, the estimate of each patch is given by
and the overall estimate can be obtained by
(1) 
where . Noting that places the path number at its position in the image and that is a diagonal matrix whose th diagonal element holds the number of times pixel appears in any patch, the estimate for the th pixel is the average of all its estimates, one per patch containing thereof. This condition implies that is invertible as far as any pixel belongs at least to one patch.
The estimate given by (1
) is the best linear unbiased estimator for
[13], as long as we assume thatis zeromean and independent and identicaly distributed (iid), and thus having a covariance matrix proportional to the identity matrix, i.e.,
. However, due to the overlapping structure of the patches it is very difficult to accurately compute . We opted to use the suboptimal estimate (1) that, nevertheless, produces stateoftheart results.IiB Dictionary Learning
Given the periodic characteristic of a STM image, it is very likely that it admits a sparse representation in fixed existing dictionaries (i.e., representations). Wavelets and Fourier bases are two paradigmatic examples. However, the use of learned dictionaries from the data they represent holds systematically the stateoftheart [6, 15]. The goal of dictionary learning is to find a dictionary such that the patches can be accurately represented with just a few number of elements. One way to formulate this idea is solving the following optimization problem:
(2) 
where , . The quadratic terms account for the representation errors and the norm is used to promote sparse codes. The relative weight between the two terms is established by the regularization parameter . The constraint is just a technical condition to prevent from being arbitrarily large.
Although the focus of this paper is not on the optimization of (2), it is worth mentioning just a few words about it. A common approach to handle these problems [14, 1, 20, 22] is to alternate the optimization with respect to and . The optimization with respect to is a quadratic problem with convex constraints and optimization with respect to is also convex and decoupled (we can solve individually for each variable). However, the optimization with respect to all variables is nonconvex. Although we are not guaranteed to converge to the global minimum, the obtained stationary points have shown to produce stateoftheart results.
IiC Sparse Coding
Once we have learned a dictionary with respect to which the patches of admit a sparse representation, we proceed and compute the estimate of a given patch , for . Following the standard formulation in synthesis based approaches to sparse regression, the patch estimate is the solution of the constrained optimization
(3) 
where is the observed patch corresponding to the true patch , and is the number of nonzero elements of vector , and is a parameter controlling the reconstruction error.
When and the dictionary is full column range, the solution of the optimization problem (3), if it exists, is easy to compute. However, in the majority of the situations, this is not the case because the dictionaries are often overcomplete and, due to noise, . Under these circumstances, the problem is NPhard [18]. Two different approaches have been followed. One consists in replacing by the norm, which results in the socalled least absolute shrinkage and selection operator (LASSO) [25], which is equivalent to the basis pursuit denoising (BPDN) [4]. The other consists in attacking directly the original problem using greedy algorithms such as the orthogonal basis pursuit (OMP) [21], iterative hard thresholding (IHT) [2], hard thresholding pursuit [9], or approximate message passing (MP) [26].
Iii Noise and Artifact removal
Assuming that the noise of each patch is zeromean and has covariance , it is possible to show [10] that the relative attenuation of noise is given by
(4) 
This means that the estimation error is proportional to the sparsity level of the signal. However, in practice, the ratio (4) is hardly achieved. One can also understand the noise reduction from a geometric point of view. The sparse coding of the patches, given by optimization problem (3), can be viewed as a projection of the noisy patch onto the subspace spanned by the active columns of in that patch, corresponding to the nonzero components of . Given that the number of active actoms is much samller than the , the noise is largely reduced.
The sparseland framework works well for STM images because the images are almost periodic. This makes it easier to learn the dictionary directly from the data. Besides the presence of noise, STM images usually have plenty of artifacts. However, they do not appear isolated, such as impulsive noise, but rather in a structured way. Typically, several consecutive pixels along a scanning line may be corrupted. Figure 1 shows an example of a STM image with artifacts that hampers the observation of darker spots. The image represents a polymorphic monolayer with zinc(II)octaethylporphyrin (ZnOEP) molecules organized as dimerlike units with different brightness aligned along parallel rows and absorbed on highly oriented pyrolitic graphite (HOPG) by stacking interaction [8]. The difference in brightness of the two ZnOEP units is due to their different coupling to the HOPG where the darker molecules have lower signal and for this reason is hard to distinguish with high detail the central part of porphyrines and their axial groups.
In order to proceed, special care must be taken to eliminate these artifacts, otherwise the learned dictionary ends up with patches that can represent well these outliers. To overcome these problems, we change the algorithm by introducing an extra vector that masks out these pixels. This approach has several advantages. It makes it easy to iteratively improve on current results, either by adding or removing pixels from the mask (considered outliers). It also allows us to easily fill in the outliers pixels (inpainting) with new values. As opposed to a classical local filtering method, the new inpainting values can take into account the full image statistics, and thus attain superior reconstruction quality.
Iiia Proposed algorithm
The proposed algorithm changes the previously introduced sparseland framework to accomplish two objectives: (i) deal with the structured outliers; and (ii) do inpainting of those pixels. We start by defining a binary vector of size , where 0 indicates that the corresponding pixel in is considered an outlier. The purpose of this vector is to mask out the outliers. Let for denote a patch of . Note that indicate the outlier’s positions in the patches . Figure 2 shows the STM image of Figure 1 with the mask overimposed. The pixels considered outliers are represented in red.
The dictionary learning algorithm is the same as described in section IIB. To avoid learning the outliers structures, instead of considering all the patches, we use to discard those that contain outliers. The sparse coding algorithm is also the same as described in section IIC. However, in this case, we do not discard any patch; we code the patches ignoring only the pixels that are considered outliers by solving the optimization
(5) 
where the symbol represents elementwise multiplication. Note that the effect of and is to project the columns of and onto a lower dimension subspace. The corresponding change in the OMP algorithm is simply an elementwise multiplication by .
Finally, once we have a sparse representation for all the patches, the image composition is the pixelwise average of all the patches covering a pixel, marked or not as an outlier. The inpainting values obtained outperform those that result from substituting the outliers by a local filtering. In section IV we compare both approaches. A Matlab version and a Gwyddion[19] module of the code are available at http://www.lx.it.pt/jpaos/stm/.
IiiB Preprocessing and artifact selection
Before we apply the proposed algorithm, raw STM images are preprocessed and artifacts are identified, i.e., matrix is defined. We start by applying the following preprocessing steps:

mean level subtraction: a plane is adjusted to all the data points and subtracted from the data;

outlier removal: pixel data that is below 1% or above 99% of the quantile are removed;

leveling by median subtraction: lines are adjusted to match height median.
Following this preprocessing steps, artifacts are identified by inspection and marked in matrix . Remark that common image processing software suites have tools designed to obtain automatically. As an example, Gwyddion [19] implements an algorithm denoted Mark Scars
, where several parameters for outlier detection can be specified such as minimum/maximum length and width.
For better results, the process can be iterated: start with an initial mask, apply the algorithm, improve on the mask from restored image and repeat the process.
(a) noisy  (b)  (c)  (d)  (e) 
(f)  (g)  (h)  (i)  (j) 
IiiC Parameter selection
There are several parameters that affect the behavior of the overall algorithm. In this section we discuss how these parameters can be tuned in order to obtain the best possible outcome. The algorithm is made up of two parts: dictionary learning and sparse coding.
The first parameter that we need to define is the patch size, that is common for both parts. Patch size influences the amount of noise that is possible to remove. Bigger patch sizes results in better noise reduction. However, the patch size influences how data can be represented by the dictionary. Smaller patch sizes means that the dictionary adapts better to data, including details and noise, while larger patch sizes capture the similarities and periodic components.
From the exhaustive experiments, we concluded that the patch size should be at least equal to the smallest interest object that can be represented by pixels. Figure 3 (a) shows an incomplete monolayer with 4,4’bipyridine (BP) absorbed on top of a full packed monolayer of ZnOEP through a ZnN bond. Each pair of ZnOEP on BP forms a vertical molecular wire absorbed on a HOPG substrate [7]. The brighter regions are attributed to the BP molecules bonded to Zn alternated with darker domains that correspond to unreacted ZnOEP molecules. The corresponding molecular schema is depicted in Figure 4.
Figures 3 (b)–(k) show the results obtained by the proposed algorithm by varying the patch size. As it can be seen, smaller patch sizes adapt better to the noise image, and thus attain smaller noise reduction. On the other hand, larger patch sizes result in an almost periodic image, where important details are lost. For this particular image — Figure 3 —, the best results are obtained when the patch size is around . In all experiments the regularization parameter was set at , and the dictionary consisted of 64 atoms.
For the sparse coding algorithm (5), two parameters need to be set: the maximum number of atoms (MNA) and the error parameter . These parameters determine the reconstruction error of patch : although 1 or 2 patches aren’t enough to correctly represent data, a larger number gives more flexibility and adaptability, and thus less noise reduction. The implementation of algorithm [10] searches for the sparsest number of atoms that attain an error less than , with at most the MNA. Note that the MNA is a hard threshold, i.e., each patch is not reconstructed with more atoms than that.
Iv Results
In this section we assess the merit of the proposed algorithm applied on the STM image of polimorphic phase of a ZnOEP monolayer absorbed on HOPG, previously depicted in Figure 1. We selected for the patch size , the maximum number of atoms was set at 4 and . The mask used is depicted in Figure 2.
To compare the inpainting results, we also run a version of the proposed algorithm without the mask. Close ups of the noisy image, mask and restored images are depicted in Figure 5.
(a)  (b) 
(c)  (d) 
(a) 
(b) 
In Figure 5 (c), the restored image (denoised) shows clearly the details of darker ZnOEP where it is possible to observe the ZnOEP central part surrounded by 4 protrusions that we attribute to the ethyl groups that are oriented towards the HOPG as is represented by molecular schema of Figure 6.
When the proposed algorithm is used without a mask, Figure 5 (d), the structured outliers are only partially removed. This happens because the dictionary learning algorithm actually learns these structures. Thus, there will be atoms available in
such that a patch with outliers still admits a sparse representation. However, when a mask is used, the algorithm removes these outliers and fill in the pixels with new values. Instead of a local median or a more sophisticated interpolation algorithm, by relying on the learned dictionary, the filled in values are in accordance to the global image statistics.
V Conclusion
We presented a denoising algorithm, formulated as sparse regression problem, to mitigate the effect of the degradation mechanisms observed in STM images. The proposed formulation was introduced based on the fact that STM images present a high level of selfsimilarity, so that a suitable dictionary could be learned from the noisy image. The algorithm also cope with the existence of artifacts, mainly dropouts, by treating them as missing data. The results obtained outperform those algorithms that substitute the outliers by a local filtering, as the filled in values are now in accordance to the global image statistics.
acknowledgement
The authors thank FCT–Portugal for financial support under the project PEst–OE/EEI/LA0008/2013 and PostDoc grant to QF.
References
 [1] M. Aharon, M. Elad, and A. Bruckstein, “Ksvd: An algorithm for designing overcomplete dictionaries for sparse representation,” Signal Processing, IEEE Transactions on, vol. 54, no. 11, pp. 4311–4322, 2006.
 [2] T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sensing,” Appl. Comp. Harm. Anal.
 [3] A. Bovik, Ed., Handbook of Image & Video Processing. Academic Press, 2000.
 [4] S. S. Chen, D. L. Donoho, Michael, and A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, pp. 33–61, 1998.
 [5] D. L. Donoho and J. M. Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika, vol. 81, no. 3, pp. 425–455, 1994.
 [6] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” Image Processing, IEEE Transactions on, vol. 15, no. 12, pp. 3736–3745, Dec 2006.
 [7] Q. Ferreira, A. Bragança, L. Alcácer, and J. Morgado, “Conductance of welldefined porphyrin selfassembled molecular wires up to 14 nm in length,” Journal of Physical Chemistry C, vol. 118, no. 3, pp. 7229–7234, March 2014.
 [8] Q. Ferreira, A. Bragança, O. N. O. Oliveira, N. M. Moura, M. Faustino, L. Alcácer, and J. Morgado, “Dynamics of porphyrin adsorption on highly oriented pyrolytic graphite monitored by scanning tunnelling microscopy at the liquid/solid interface,” Applied Surface Science, vol. 273, no. n.a., pp. 220–225, May 2013.
 [9] S. Foucart, “Hard thresholding pursuit: an algorithm for compressive sensing,” SIAM Journal on Numerical Analysis, vol. 49, no. 6, pp. 2543–2563, 2011.
 [10] H. Hongxing, J. M. BioucasDias, and V. Katkovnik, “Interferometric phase estimation via sparse coding in the complex domain,” IEEE Transactions on Geoscience and Remote Sensing, Submitted 2013.
 [11] I. Horcas, R. Fernández, J. M. GómezRodríguez, J. Colchero, J. GómezHerrero, and A. M. Baro, “Wsxm: A software for scanning probe microscopy and a tool for nanotechnology,” Review of Scientific Instruments, vol. 78, no. 1, pp. –, 2007. [Online]. Available: http://scitation.aip.org/content/aip/journal/rsi/78/1/10.1063/1.2432410
 [12] A. Jain, Fundamentals of Digital Image Processing. Prentice Hall, 1989.
 [13] S. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice Hall, 1993, vol. 1.
 [14] H. Lee, A. Battle, R. Raina, and A. Y. Ng, “Efficient sparse coding algorithms,” in In NIPS. NIPS, 2007, pp. 801–808.

[15]
J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online dictionary learning for
sparse coding,” in
International Conference on Machine Learning
, 2009.  [16] J. Mairal, M. Elad, and G. Sapiro, “Sparse representation for color image restoration,” Image Processing, IEEE Transactions on, vol. 17, no. 1, pp. 53–69, Jan 2008.
 [17] S. Mallat, A Wavelet Tour of Signal Processing. Academic Press Inc, 1999.
 [18] B. K. Natarajan, “Sparse approximate solutions to linear systems,” SIAM J. Comput., vol. 24, no. 2, pp. 227–234, Apr. 1995. [Online]. Available: http://dx.doi.org/10.1137/S0097539792240406
 [19] D. Nečas and P. Klapetek, “Gwyddion: an opensource software for SPM data analysis,” Central European Journal of Physics, vol. 10, pp. 181–188, 2012.
 [20] B. A. Olshausen and D. J. Fieldt, “Sparse coding with an overcomplete basis set: a strategy employed by v1,” Vision Research, vol. 37, pp. 3311–3325, 1997.
 [21] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, “Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition,” in in Conference Record of The TwentySeventh Asilomar Conference on Signals, Systems and Computers, 1993, pp. 1–3.
 [22] A. Rakotomamonjy, “Applying alternating direction method of multipliers for constrained dictionary learning,” Neurocomputing, vol. 106, no. 0, pp. 126 – 136, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0925231212008612
 [23] J.L. Starck, E. Candes, and D. Donoho, “The curvelet transform for image denoising,” Image Processing, IEEE Transactions on, vol. 11, no. 6, pp. 670–684, Jun 2002.
 [24] E. Stein and G. Weiss, Introduction to Fourier Analysis on Euclidean Spaces. Princeton University Press, 1971.
 [25] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society, Series B, vol. 58, pp. 267–288, 1994.

[26]
J. Vila and P. Schniter, “Expectationmaximization BernoulliGaussian approximate message passing,” in
in Proc. Asilomar Conf. Signals Syst. Comput, 2011.
Comments
There are no comments yet.