I Introduction
Image filtering is an effective way to improve the performance of many applications, such as rain removal [1], stereo matching [2, 3, 4], edge detection, and image editing [5, 6, 7, 8, 9, 10, 11, 12]. Since different types of images have different characteristics and different applications have different requirements, filtering algorithms should be designed properly for each case. For example, depth images are mainly determined by the scene's geometry and typically consist of smooth regions with sharp boundaries. The boundaries should be preserved with high quality, as they affect the quality of depth image-based rendering (DIBR), view synthesis, and the efficiency of 3D video coding [13, 14, 15]. On the other hand, for natural images, if we want to remove noise, we need to preserve both the image's structure and its textural information. If we want to apply image smoothing, we should remove the detailed textures but keep the major structural information.
The bilateral filter is an important image filtering technique [16], which can remove image noise while preserving sharp boundaries. A fast bilateral filtering method is developed in [17]. In [18], an optimally weighted bilateral filter is proposed, whose performance is competitive with the nonlocal means filter [19]. With self-learning based image decomposition for single image denoising, the undesirable patterns are automatically determined by the image components derived directly from the input image [20]. Anisotropic diffusion is another well-known image denoising algorithm [21]. The relationship between anisotropic diffusion and robust statistics is analyzed in [22]. In [23], a new class of fractional-order anisotropic diffusion equations is introduced for noise removal, where the discrete Fourier transform is used and an iterative scheme in the frequency domain is also given. A noise removal filter is built on an image activity detector based on the density of connected components [24]. Later, a set of textures and images was analyzed to determine the best measure of image activity, and it was shown that such a measure can capture activity levels and differentiate between various images [25]. To preserve edges and fine details while effectively removing noise, both the local gradient and the local variance are incorporated into the diffusion model [26, 27]. In [2, 3, 28, 29, 30], anisotropic diffusion is applied to 3D image processing. In [31], anisotropic diffusion is utilized as a preprocessing step of DIBR to improve its quality.
To remove severe artifacts in compressed depth images, many methods have been explored to filter depth images so as to improve the quality of the synthesized virtual images. In [32], a trilateral filtering method is used as an in-loop filter to prevent depth coding artifacts; it employs a spatial domain filter, a depth range domain filter, and a color range domain filter. In [33], an adaptive depth truncation filter (ADTF) is presented to restore the sharp object boundaries of depth images. In [34], candidate-value-based depth boundary filtering (CVBF) is developed by selecting an appropriate candidate value to replace each unreliable pixel according to both spatial correlation and statistical characteristics. Recently, a two-stage filtering (TSF) scheme is proposed in [35], using binary segmentation-based depth filtering and a Markov Random Field (MRF). These methods greatly reduce the coding artifacts in the synthesized virtual images, but they often change the depth images too much.
Image smoothing is another important technique for many applications. Generally, image smoothing methods can be classified into two classes: weighted filtering methods and optimization-based methods. Weighted filtering is usually achieved by a weighting scheme within a window. For example, the guided image filter in [8] is a fast and non-approximate linear-time algorithm whose complexity is independent of the kernel size and the intensity range. Another efficient method is the rolling guidance filter [11], a fast iterative method based on bilateral filtering. For real-time tasks, a high-quality edge-preserving filter is proposed in [9].
Different from these weighted filtering methods, optimization-based smoothing methods usually face a non-convex and complex problem. In [12], static guidance and dynamic guidance are jointly leveraged to achieve robust guided image filtering, which is formulated as a non-convex optimization problem. In [7], a multiscale image decomposition method is presented within a weighted least squares optimization framework to form an edge-preserving smoothing operator. In [6], an L0 gradient minimization framework is proposed, which globally controls how many nonzero gradients are kept in the filtered image. By taking advantage of the statistical diversity of gradient information between texture patches and structure patches, the Relative Total Variation (RTV) framework is proposed in [5]. In this method, the inherent variation and the total variation are combined to discriminate structure from texture, and an optimization problem is formulated to extract the main structure of the image. Later, another efficient image smoothing approach is proposed based on region covariance [10]. Although these methods achieve excellent performance for structure-preserving smoothing, some problems remain, such as inefficient texture removal and severe edge blurring after smoothing.
In this paper, the clipped and normalized local variance or standard deviation (std) is used as the local activity measurement. In addition, both the image gradient and the local activity are exploited for image smoothing and denoising. In particular, we show that the product of the gradient and the clipped local activity can better capture the change of the image around a pixel in the presence of noise, while the ratio between the gradient and the clipped local activity can locate the noise in the image and facilitate denoising. In our first framework, we develop a robust local activity-tuned anisotropic diffusion method and apply it to compression artifact removal for piecewise smooth images such as depth images and clipart images.
Our second framework uses a local activity-tuned relative total variation, which includes two schemes. The first scheme is a local activity-tuned RTV for image smoothing and image representation in different scale-spaces, where the RTV is divided by the clipped local activity to emphasize the contour information of the image. The second scheme is designed to remove additive white Gaussian noise; it uses the ratio between the gradient and the local activity, which can identify the locations of the noise. The performance of both schemes is demonstrated by experimental results.
The rest of this paper is organized as follows. In Section II, a robust local activity-tuned anisotropic diffusion scheme is described. In Section III, a local activity-tuned relative total variation framework is introduced. Experimental results are presented in Section IV, followed by the conclusion in Section V.
II Local activity-tuned anisotropic diffusion
II-A Perona-Malik anisotropic diffusion
Anisotropic diffusion is an image denoising technique based on the heat equation, which was originally used to describe the change of temperature in a given region over time. In image processing, it can be used to model the change of pixel values during denoising iterations. The heat equation is given by
\[ \frac{\partial I}{\partial t} = \operatorname{div}(\nabla I) = \Delta I \tag{1} \]
where \(\nabla I\) is the gradient of the image \(I\), and \(\operatorname{div}(\nabla I)\) denotes the divergence of the gradient \(\nabla I\), i.e., the Laplacian operator \(\Delta I\). Therefore, diffusion happens wherever the divergence is nonzero. This equation has the same diffusion strength in every direction, so it is called isotropic diffusion, which inevitably leads to blur.
Contrary to isotropic diffusion, the anisotropic diffusion proposed by Perona and Malik regularizes images so as to preserve significant edges [21]. The anisotropic diffusion model can be written as
\[ \frac{\partial I}{\partial t} = \operatorname{div}\big(g(\lVert \nabla I \rVert)\,\nabla I\big) \tag{2} \]
where \(g(\cdot)\) is an edge-stop function chosen so that little diffusion happens across the edges in the image. In [21], two gradient-based edge-stop functions are suggested, i.e.,
\[ g(x) = \exp\!\big(-(x/k)^2\big) \tag{3} \]
\[ g(x) = \frac{1}{1+(x/k)^2} \tag{4} \]
where \(k\) is a parameter that controls the strength of \(g\).
The discrete form of the anisotropic diffusion equation can be written as
\[ I_i^{t+1} = I_i^{t} + \lambda \sum_{j \in N_i} g\big(\nabla I_{i,j}^{t}\big)\,\nabla I_{i,j}^{t} \tag{5} \]
where the parameter \(\lambda\) adjusts the convergence speed, \(t\) is the iteration number, and \(\nabla I_{i,j} = I_j - I_i\) denotes the gradient between pixel \(i\) and pixel \(j\) in the neighborhood \(N_i\) around pixel \(i\).
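For concreteness, the following Python sketch implements the discrete update of Eq. (5) with the exponential edge-stop function of Eq. (3). It is a minimal sketch rather than the exact implementation used in this paper: it uses a 4-connected neighborhood with wrap-around borders for brevity, and the function and parameter names are illustrative.

```python
import numpy as np

def perona_malik(img, n_iter=20, k=30.0, lam=0.25):
    """Minimal sketch of discrete Perona-Malik diffusion, Eq. (5),
    with the exponential edge-stop function of Eq. (3)."""
    g = lambda d: np.exp(-(d / k) ** 2)           # Eq. (3)
    I = img.astype(np.float64).copy()
    for _ in range(n_iter):
        total = np.zeros_like(I)
        for axis, shift in ((0, -1), (0, 1), (1, -1), (1, 1)):
            d = np.roll(I, shift, axis=axis) - I  # gradient toward a neighbor
            total += g(d) * d                     # g(.) gates the flow at edges
        I += lam * total                          # Eq. (5) update
    return I
```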
II-B Modified anisotropic diffusion
In [36], the local intensity variance is utilized to adapt the diffusion function:
\[ g\big(\nabla I_{i,j}\big) = \exp\!\left(-\Big(\frac{\nabla I_{i,j}}{k_i}\Big)^{2}\right), \qquad k_i = k_{\min} + (k_{\max}-k_{\min})\,\frac{\sigma_i^2}{\sigma_{\max}^2} \tag{6} \]
where \(k_i\) is the diffusion parameter, \(\sigma_i^2\) is the local grayscale variance around pixel \(i\) in the initial image, and \(\sigma_{\max}^2\) is the maximal value of the variance. \(k_{\max}\) and \(k_{\min}\) are predefined maximal and minimal values of \(k_i\). This technique can remove noise and irrelevant details while preserving sharp boundaries. However, it only uses the variance of the initial image. This is not optimal, because the initial image's variance cannot keep up with the information in the updated diffused image.
Different from [36], another anisotropic diffusion model is suggested in [26, 27]:
\[ g\big(\nabla I_{i,j}^{t}\big) = \frac{1}{1+\Big(\dfrac{\nabla I_{i,j}^{t}\,\tilde{\sigma}_i^2(t)}{k}\Big)^{2}}, \qquad \tilde{\sigma}_i^2(t) = \frac{\sigma_i^2(t)-\sigma_{\min}^2(t)}{\sigma_{\max}^2(t)-\sigma_{\min}^2(t)} \tag{7} \]
where \(\sigma_{\max}^2(t)\) and \(\sigma_{\min}^2(t)\) are the maximal and minimal gray-level variances of the diffused image at the \(t\)-th iteration, and \(\sigma_i^2(t)\) is the gray-level variance of the \(i\)-th pixel. This method incorporates both the local gradient and the gray-level variance to preserve edges and fine details while effectively removing noise. Note that Eq. (6) uses the division or ratio of the gradient and the variance, whereas Eq. (7) uses their product.
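A small sketch can make this contrast concrete. The two functions below follow the forms of Eqs. (6) and (7) as reconstructed above; the function names and default values are assumptions for illustration only.

```python
import numpy as np

def g_ratio(grad, var, k_min=5.0, k_max=50.0):
    """Eq. (6) style [36]: the diffusion parameter k_i grows with the
    local variance, so the gradient is effectively divided by it."""
    k = k_min + (k_max - k_min) * var / (var.max() + 1e-12)
    return np.exp(-(grad / k) ** 2)

def g_product(grad, var, k=30.0):
    """Eq. (7) style [26, 27]: the gradient is multiplied by the
    min-max normalized local variance before entering Eq. (4)."""
    v = (var - var.min()) / (var.max() - var.min() + 1e-12)
    return 1.0 / (1.0 + (grad * v / k) ** 2)
```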
II-C Local activity-tuned anisotropic diffusion
In general, depth images are characterized by smooth regions with sharp edges. However, after compression, the edges usually suffer from various compression artifacts, which will affect the quality of view synthesis [37]. In this paper, we apply the modified anisotropic diffusion to mitigate the coding artifacts of depth images. We propose a local activity-tuned anisotropic diffusion (LATAD) method, which can be written as
\[ \frac{\partial I}{\partial t} = \operatorname{div}\big(g(\lVert \nabla I \rVert, A)\,\nabla I\big) \tag{8} \]
where \(A\) is obtained from the local activity of the image \(I\).
Similar to Eq. (5), the discrete version becomes
\[ I_i^{t+1} = I_i^{t} + \lambda \sum_{j \in N_i} g\big(\nabla I_{i,j}^{t}, A_i\big)\,\nabla I_{i,j}^{t} \tag{9} \]
where \(A_i\) is a clipped and normalized local activity, which will be defined later; in the first iteration, it is computed from the initial image. Motivated by [36], we define two new edge-stop functions as follows:
\[ g_1\big(\nabla I_{i,j}, A_i\big) = \exp\!\left(-\Big(\frac{\nabla I_{i,j}}{k_1 A_i}\Big)^{2}\right) \tag{10} \]
\[ g_2\big(\nabla I_{i,j}, A_i\big) = \exp\!\left(-\frac{\big(\nabla I_{i,j}\big)^{2}}{k_2 A_i}\right) \tag{11} \]
where \(k_1\) and \(k_2\) are diffusion parameters. Note that \(A_i\) is squared in Eq. (10), but not in Eq. (11).
Similar to Eq. (6), the ratio of the gradient and the local activity is used, which can capture where the coding artifacts exist in the compressed depth image. Moreover, the diffusion parameter is adaptively tuned according to the local activity, such that larger diffusion parameters are assigned to more severely distorted pixels. Therefore, under the control of the gradient, pixels with larger local activity receive more diffusion from neighboring pixels than pixels with smaller activity. This removes noisy pixels and prevents regions with low activity from being heavily diffused.
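The following sketch shows the two edge-stop functions of Eqs. (10) and (11) as reconstructed above. The names g_latad and g_latad_i and the defaults k1=30 and k2=300 (the values later used in Section IV) are illustrative; the small constant guards against division by zero.

```python
import numpy as np

def g_latad(grad, activity, k1=30.0):
    """Eq. (10): gradient divided by the clipped, normalized activity A;
    the effective diffusion parameter k1*A grows with the activity."""
    return np.exp(-(grad / (k1 * activity + 1e-12)) ** 2)

def g_latad_i(grad, activity, k2=300.0):
    """Eq. (11): the activity enters without being squared, which
    diffuses more gently and better preserves depth structures."""
    return np.exp(-grad ** 2 / (k2 * activity + 1e-12))
```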
We next describe how to calculate the clipped and normalized local activity measurement \(A_i\). First, we calculate the local mean and standard deviation of the 8-connected neighborhood around each pixel:
\[ \mu_i = \frac{1}{9} \sum_{j \in N_i \cup \{i\}} I_j \tag{12} \]
\[ \sigma_i = \sqrt{\frac{1}{9} \sum_{j \in N_i \cup \{i\}} \big(I_j - \mu_i\big)^{2}} \tag{13} \]
Next, a clipped version of \(\sigma_i\) is obtained, denoted as \(\hat{\sigma}_i\):
\[ \hat{\sigma}_i = \min(\sigma_i, T) \tag{14} \]
where \(T\) is a predefined clipping parameter.
After that, \(\hat{\sigma}_i\) is normalized in Eq. (15) by \(\hat{\sigma}_{\max}\), the maximal value of \(\hat{\sigma}_i\) across the image:
\[ A_i = \frac{\hat{\sigma}_i}{\hat{\sigma}_{\max}} \tag{15} \]
Finally, to make the iteration more stable, \(A_i\) is updated from the diffused image \(I^t\) only once every \(n\) iterations:
\[ A_i^{t} = \begin{cases} A_i\big(I^{t}\big), & \operatorname{mod}(t, n) = 0 \\ A_i^{t-1}, & \text{otherwise} \end{cases} \tag{16} \]
where \(\operatorname{mod}\) denotes the modulo operator. Let \(t_{\max}\) be the maximal number of iterations. The updating interval \(n\) is chosen in the range \(1 \le n \le t_{\max}\).
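A possible NumPy/SciPy sketch of Eqs. (12)-(15) is shown below, assuming a 3x3 (8-connected) window; the clip argument plays the role of the threshold \(T\), and all names are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_activity(img, clip=30.0):
    """Clipped, normalized local activity of Eqs. (12)-(15) over the
    3x3 (8-connected) window around each pixel."""
    I = img.astype(np.float64)
    mean = uniform_filter(I, size=3)                  # Eq. (12)
    var = uniform_filter(I ** 2, size=3) - mean ** 2
    std = np.sqrt(np.maximum(var, 0.0))               # Eq. (13)
    clipped = np.minimum(std, clip)                   # Eq. (14)
    return clipped / (clipped.max() + 1e-12)          # Eq. (15)
```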
In the following, the fixed local activity-tuned anisotropic diffusion using Eq. (10) as the edge-stop function is denoted as FLATAD, the time-updated local activity-tuned anisotropic diffusion with Eq. (10) is denoted as TLATAD, and the periodically updated local activity-tuned anisotropic diffusion based on the edge-stop function of Eq. (10) is denoted as PLATAD. Moreover, when Eq. (11) is used, the three corresponding methods are denoted as FLATAD (I), TLATAD (I), and PLATAD (I), respectively.
When \(n\) is set to 1, the method becomes TLATAD. If \(n\) is larger than 1 but smaller than \(t_{\max}\), it reduces to PLATAD. If \(n\) is set to \(t_{\max}\), it becomes FLATAD.
Since the differences between neighboring pixels' variances are often relatively greater than the differences between the corresponding standard deviations, we use the standard deviation instead of the variance in this paper when the 8-connected activity is larger than 1. To see this, let \(\sigma_1\) and \(\sigma_2\) denote two standard deviations, and assume that \(\sigma_1 > \sigma_2\) and \(\sigma_1 + \sigma_2 > 1\). We look at the difference \(\sigma_1^2 - \sigma_2^2 = (\sigma_1 + \sigma_2)(\sigma_1 - \sigma_2)\). Based on the above assumptions, \(\sigma_1 + \sigma_2 > 1\) and \(\sigma_1 - \sigma_2 > 0\). Therefore, \(\sigma_1^2 - \sigma_2^2 > \sigma_1 - \sigma_2\), i.e., the variance gap exceeds the standard deviation gap.
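The inequality can be checked numerically with arbitrary example values:

```python
# Numeric check: for s1 > s2 with s1 + s2 > 1, the variance gap
# (s1^2 - s2^2) exceeds the standard deviation gap (s1 - s2).
s1, s2 = 6.0, 4.0
assert s1 ** 2 - s2 ** 2 == (s1 + s2) * (s1 - s2)   # 20 == 10 * 2
assert s1 ** 2 - s2 ** 2 > s1 - s2                  # 20 > 2
```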
Three works [26, 27, 36] are related to the proposed method, so we next emphasize their differences. The differences between our LATAD and [26, 27] are as follows: we use a clipping function and the local activity; the activity is calculated in an interval-updated way; our method uses the ratio between the gradient and the local activity, whereas [26, 27] use their product; and our edge-stop function comes from Eq. (3), while the ones in [26, 27] come from Eq. (4). The differences between the proposed LATAD and [36] are listed as follows:

The local activity is leveraged in our paper, which makes the relative impacts more efficient. The detailed activity computation used in [36] is rather complex, and the window for their activity is often set to be larger than 3x3. In this paper, we aim at fast filtering of depth images distorted by HEVC compression [38], so we use only a 3x3 window centered at each pixel to obtain the 8-connected standard deviation instead of the variance. If the variance were used, a small variance could easily be dominated by a large one and would contribute little to the diffusion.

A clipping function is applied to the local activity to keep the diffusion stable, because pixels with very large local activity would otherwise render the local activity tuning ineffective for pixels with smaller activity measurements.

During the iterative diffusion, the updated activity is used to control the degree of diffusion. If the diffusion proceeds too quickly, a fixed local activity tends to blur the image discontinuities. The time-updated local activity can always preserve the sharp boundaries in the image, but it requires computing the local activity in every iteration. The interval-updated activity is a good compromise, especially when applications require fast filtering. A complete sketch of the resulting iteration is given below.
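Putting the pieces together, the sketch below combines Eqs. (9), (10), and (16). Setting interval = 1 behaves like TLATAD, 1 < interval < n_iter like PLATAD, and interval = n_iter like FLATAD. It reuses local_activity and g_latad from the earlier sketches and again simplifies to a 4-connected neighborhood; all values are illustrative.

```python
import numpy as np

def latad(img, n_iter=21, interval=5, lam=0.25, k1=30.0, clip=30.0):
    """Sketch of the LATAD iteration of Eq. (9) with the edge-stop
    function of Eq. (10) and the interval update of Eq. (16)."""
    I = img.astype(np.float64).copy()
    A = local_activity(I, clip)                  # Eqs. (12)-(15)
    for t in range(1, n_iter + 1):
        if t % interval == 0:                    # Eq. (16): periodic refresh
            A = local_activity(I, clip)
        total = np.zeros_like(I)
        for axis, shift in ((0, -1), (0, 1), (1, -1), (1, 1)):
            d = np.roll(I, shift, axis=axis) - I
            total += g_latad(d, A, k1) * d       # Eq. (10) gating
        I += lam * total                         # Eq. (9) update
    return I
```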
III Local activity-tuned relative total variation
The classic total variation (TV) model [39] can be written as:
\[ \min_{S} \sum_{p \in \Omega} \Big( \big(S_p - I_p\big)^2 + \lambda\,\lVert \nabla S_p \rVert \Big) \tag{17} \]
where \(\Omega\) is the domain of the image, \(I\) is the initial image, and \(S\) is the filtered output.
To compare with anisotropic diffusion, the Euler-Lagrange equation of the TV model can be used according to [40], which is given as follows:
\[ \frac{\partial I}{\partial t} = \operatorname{div}\!\left(\frac{\nabla I}{\lVert \nabla I \rVert}\right) \tag{18} \]
Comparing Eq. (2) and Eq. (18), it is clear that the total variation model can be viewed as a special case of anisotropic diffusion with the edge-stop function \(g(\lVert \nabla I \rVert) = 1 / \lVert \nabla I \rVert\).
In order to extract the main structure from the textured background, a relative total variation (RTV) model is proposed in [5], which is based on two variation measures. The first is the conventional windowed total variation (WTV) measure, which captures the visual saliency of the image:
\[ \mathcal{D}_x(p) = \sum_{q \in R(p)} g_{p,q}\,\big|(\partial_x S)_q\big| \tag{19} \]
where \(R(p)\) is the window centered at pixel \(p\), the measure \(\mathcal{D}_y(p)\) in the \(y\) direction is defined analogously, and \(g_{p,q}\) is a Gaussian weighting function with variance \(\sigma^2\):
\[ g_{p,q} \propto \exp\!\left(-\frac{(x_p - x_q)^2 + (y_p - y_q)^2}{2\sigma^2}\right) \tag{20} \]
In addition, a windowed inherent variation (WIV) measure is introduced in [5] as follows:
\[ \mathcal{L}_x(p) = \Big|\sum_{q \in R(p)} g_{p,q}\,(\partial_x S)_q\Big| \tag{21} \]
Note that it sums the signed variations rather than the absolute values of the gradients. Therefore, its response is much smaller in a window that contains only textures, where the gradients tend to cancel out.
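In code, the two measures differ only in where the absolute value is taken. The sketch below computes the x-direction WTV, WIV, and their ratio, using a Gaussian filter as the windowed weighting \(g_{p,q}\); the name wtv_wiv_x and the default sigma are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def wtv_wiv_x(S, sigma=3.0, eps=1e-3):
    """x-direction WTV (Eq. 19), WIV (Eq. 21), and their RTV ratio,
    with a Gaussian window standing in for g_{p,q}."""
    gx = np.diff(S.astype(np.float64), axis=1, append=S[:, -1:])
    D = gaussian_filter(np.abs(gx), sigma)   # sum of |gradients|: WTV
    L = np.abs(gaussian_filter(gx, sigma))   # |sum of gradients|: WIV
    return D, L, D / (L + eps)               # RTV ratio, small in texture
```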
To further enhance the contrast between texture and structure, the ratio of the WTV to the WIV, called the RTV regularizer, is used to remove textures from the image while keeping only the structure [5]. The overall objective function is
\[ \min_{S} \sum_{p} \Big( \big(S_p - I_p\big)^2 + \lambda\Big(\frac{\mathcal{D}_x(p)}{\mathcal{L}_x(p)+\varepsilon} + \frac{\mathcal{D}_y(p)}{\mathcal{L}_y(p)+\varepsilon}\Big) \Big) \tag{22} \]
where \(\varepsilon\) is a small positive number to avoid division by zero, and \(\lambda\) balances the data term and the regularizer.
Inspired by the RTV, we propose a local activity-tuned relative total variation for image smoothing (LATRTV), which is given by
\[ \min_{S} \sum_{p} \Big( \big(S_p - I_p\big)^2 + \lambda\Big(\frac{\mathcal{D}_x(p)}{A_p\big(\mathcal{L}_x(p)+\varepsilon\big)} + \frac{\mathcal{D}_y(p)}{A_p\big(\mathcal{L}_y(p)+\varepsilon\big)}\Big) \Big) \tag{23} \]
where the clipped and normalized local activity measurement \(A_p\) is obtained according to Eqs. (12)-(16).
Most pixels around edges have high activity. By dividing by \(A_p\) in Eq. (23), these pixels contribute less to the RTV term, so the edges are preserved. Thus, compared to RTV [5], the proposed LATRTV in Eq. (23) further smooths the details and textures in the image while preserving the structural information.
Due to the non-convexity of Eq. (23), its solution cannot be obtained directly. As described in [5, 41], an objective function with a quadratic penalty term can be optimized linearly. According to [5], the LATRTV term can be decomposed into a quadratic part and a nonlinear part. Substituting Eq. (19) and Eq. (21) into the LATRTV term in the \(x\) direction, it can be rewritten as:
\[ \sum_{p} \frac{\mathcal{D}_x(p)}{A_p\big(\mathcal{L}_x(p)+\varepsilon\big)} \approx \sum_{p} \sum_{q \in R(p)} \frac{g_{p,q}}{A_p\big(\mathcal{L}_x(p)+\varepsilon\big)\big(\big|(\partial_x S)_q\big|+\varepsilon_s\big)}\,(\partial_x S)_q^2 \tag{24} \]
This can be rewritten as
\[ \sum_{p} \frac{\mathcal{D}_x(p)}{A_p\big(\mathcal{L}_x(p)+\varepsilon\big)} \approx \sum_{q} u_{x,q}\, w_{x,q}\,(\partial_x S)_q^2 \tag{25} \]
where
\[ u_{x,q} = \sum_{p \in R(q)} \frac{g_{p,q}}{A_p\big(\mathcal{L}_x(p)+\varepsilon\big)} \tag{26} \]
\[ w_{x,q} = \frac{1}{\big|(\partial_x S)_q\big|+\varepsilon_s} \tag{27} \]
where \(\varepsilon_s\) is another small positive constant. Similarly, the LATRTV term in the \(y\) direction can be written as:
\[ \sum_{p} \frac{\mathcal{D}_y(p)}{A_p\big(\mathcal{L}_y(p)+\varepsilon\big)} \approx \sum_{q} u_{y,q}\, w_{y,q}\,(\partial_y S)_q^2 \tag{28} \]
where
\[ u_{y,q} = \sum_{p \in R(q)} \frac{g_{p,q}}{A_p\big(\mathcal{L}_y(p)+\varepsilon\big)}, \qquad w_{y,q} = \frac{1}{\big|(\partial_y S)_q\big|+\varepsilon_s} \tag{29} \]
For simplicity, we rewrite Eq. (23) in matrix form as follows:
\[ \min_{v_S}\ \big(v_S - v_I\big)^{\top}\big(v_S - v_I\big) + \lambda\Big( v_S^{\top} C_x^{\top} U_x W_x C_x\, v_S + v_S^{\top} C_y^{\top} U_y W_y C_y\, v_S \Big) \tag{30} \]
In Eq. (30), \(v_S\) and \(v_I\) are respectively the vector representations of \(S\) and \(I\); \(C_x\) and \(C_y\) are the Toeplitz matrices built from the discrete gradient operators using forward differences; and \(U_x\), \(U_y\), \(W_x\), and \(W_y\) are diagonal matrices whose diagonal values are \(u_{x,q}\), \(u_{y,q}\), \(w_{x,q}\), and \(w_{y,q}\), respectively. To minimize Eq. (30), we take the derivative with respect to \(v_S\) and set it to zero; the solution can be written as:
\[ v_S = \Big( \mathbf{1} + \lambda\big( C_x^{\top} U_x W_x C_x + C_y^{\top} U_y W_y C_y \big) \Big)^{-1} v_I \tag{31} \]
where \(\mathbf{1}\) is the identity matrix.
Finally, given the initial image \(I\), the detailed iterative optimization procedure of LATRTV is presented as follows:

1) In each iteration, use Eq. (26), Eq. (27), and Eq. (29) to calculate \(u_x\), \(w_x\), \(u_y\), and \(w_y\) in order to form the matrices \(U_x\), \(W_x\), \(U_y\), and \(W_y\). In the first iteration, they are obtained from \(I\); otherwise they are obtained from the current estimate \(S^t\), whose vector form is \(v_S^t\).

2) Given \(v_I\), \(U_x\), \(W_x\), \(U_y\), and \(W_y\), the vector result \(v_S^{t+1}\) is obtained in each iteration according to Eq. (32).

3) After \(t_{\max}\) iterations of steps (1)-(2), \(v_S\) is rearranged into a matrix of the original image size, which is the final output image.
\[ v_S^{t+1} = \Big( \mathbf{1} + \lambda\big( C_x^{\top} U_x^{t} W_x^{t} C_x + C_y^{\top} U_y^{t} W_y^{t} C_y \big) \Big)^{-1} v_I \tag{32} \]
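A sparse-matrix sketch of one iteration of Eq. (32) is given below. It is a simplification rather than the exact solver of [5]: the u-weights are formed pointwise instead of by the full windowed sum of Eq. (26), border handling of the difference operators is omitted, and all names and parameter values are illustrative. It reuses wtv_wiv_x from the earlier sketch.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def forward_diff_matrices(h, w):
    # Forward-difference operators on the row-major flattened image.
    n = h * w
    Cx = sp.diags([-np.ones(n), np.ones(n - 1)], [0, 1], format="csr")
    Cy = sp.diags([-np.ones(n), np.ones(n - w)], [0, w], format="csr")
    return Cx, Cy

def latrtv_step(vI, vS, A, h, w, lam=0.01, sigma=3.0, eps=1e-3):
    # One solve of Eq. (32); A is the flattened clipped/normalized activity.
    Cx, Cy = forward_diff_matrices(h, w)
    S = vS.reshape(h, w)
    _, Lx, _ = wtv_wiv_x(S, sigma, eps)        # WIV in x (Eq. 21)
    _, Ly, _ = wtv_wiv_x(S.T, sigma, eps)      # WIV in y via transpose
    ux = 1.0 / (A * (Lx.ravel() + eps))        # pointwise stand-in for Eq. (26)
    uy = 1.0 / (A * (Ly.T.ravel() + eps))
    wx = 1.0 / (np.abs(Cx @ vS) + eps)         # Eq. (27)
    wy = 1.0 / (np.abs(Cy @ vS) + eps)
    Lmat = Cx.T @ sp.diags(ux * wx) @ Cx + Cy.T @ sp.diags(uy * wy) @ Cy
    # Eq. (32): (1 + lam * Lmat) vS_new = vI
    return spsolve((sp.identity(h * w) + lam * Lmat).tocsc(), vI)
```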
In contrast to LATRTV, we further propose using the product between the local activity \(A_p\) and the RTV to achieve image denoising (denoted as LATRTVd):
\[ \min_{S} \sum_{p} \Big( \big(S_p - I_p\big)^2 + \lambda\,A_p\Big(\frac{\mathcal{D}_x(p)}{\mathcal{L}_x(p)+\varepsilon} + \frac{\mathcal{D}_y(p)}{\mathcal{L}_y(p)+\varepsilon}\Big) \Big) \tag{33} \]
The solution for LATRTVd of Eq. (33) can be obtained similarly to the derivation for LATRTV, and it is presented in Eq. (34). Here, \(D_A\) is the diagonal matrix whose \(p\)-th diagonal value is \(A_p\), and \(U_x^t\) and \(U_y^t\) are computed as in Eq. (26) but without the \(1/A_p\) factor.
\[ v_S^{t+1} = \Big( \mathbf{1} + \lambda\big( C_x^{\top} D_A U_x^{t} W_x^{t} C_x + C_y^{\top} D_A U_y^{t} W_y^{t} C_y \big) \Big)^{-1} v_I \tag{34} \]
Just as in the denoising with LATAD, because the product of the RTV and the normalized and clipped standard deviation can capture the locations of the noise in the contaminated image, LATRTVd can smooth the detected noisy pixels to achieve image denoising. This comes from the fact that the gradient responds to noise in addition to boundary changes, whereas the local variance or standard deviation is usually a stable statistical feature for an image without obvious noise.
In the RTV model, whether a pixel is judged as a texture pixel or a structural pixel depends on the gradient changes of the local information within a patch, through the WTV and WIV. Thus, the RTV model smooths all the textural pixels so as to extract structure from texture. In contrast, our LATRTVd judges whether, and to what degree, a pixel is a noisy pixel based on the local activity-tuned RTV, so LATRTVd prefers to smooth the noisy pixels detected by the local activity and the gradient rather than all the textural pixels. Therefore, our LATRTVd is able to retain more detailed textural information than RTV.
From Eqs. (25)-(27), it can be clearly seen that LATRTV divides the RTV term by the normalized clipped local activity, so high-activity (edge) pixels are penalized less and preserved. In contrast, LATRTVd multiplies the RTV term by the local activity, so high-activity (noisy) pixels are penalized more and smoothed.
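In code, the difference between the two schemes reduces to one line (a sketch following Eqs. (26) and (33), with A, Lx, and eps as in the earlier solver sketch):

```python
# LATRTV (smoothing) vs. LATRTVd (denoising): only the role of the
# activity A in the u-weights changes (pointwise sketch).
ux_smooth  = 1.0 / (A * (Lx.ravel() + eps))   # Eq. (26): divide by A
ux_denoise = A / (Lx.ravel() + eps)           # Eq. (33): multiply by A
```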
IV Experimental results and analysis
M/Seq  U1  S1  C37  B10  U5  S5  C39  B8  Ave. 
Coded41  44.23  39.56  40.60  37.85  44.19  39.46  40.58  37.79  40.53 
CVBF[34]  44.27  39.33  40.45  37.41  44.27  39.25  40.40  37.27  40.33 
ADTF[33]  44.18  39.38  40.45  37.35  44.16  39.30  40.43  37.31  40.32 
TSF[35]  44.34  38.87  40.49  37.50  44.32  38.78  40.41  37.36  40.26 
FLATAD  44.64  39.50  40.65  37.71  44.63  39.42  40.61  37.64  40.60 
TLATAD  44.58  39.60  40.71  37.65  44.56  39.52  40.70  37.59  40.61 
PLATAD  44.64  39.61  40.71  37.72  44.62  39.53  40.68  37.66  40.65 
FLATAD (I)  44.82  39.75  40.87  38.07  44.80  39.66  40.93  37.98  40.86 
PLATAD (I)  44.81  39.74  40.86  38.05  44.78  39.66  40.93  37.97  40.85 
Coded39  45.73  40.85  42.12  38.93  45.71  40.77  42.14  38.83  41.89 
CVBF[34]  45.88  40.86  42.09  38.66  45.86  40.78  42.16  38.52  41.85 
ADTF[33]  45.68  40.68  41.98  38.46  45.64  40.60  42.02  38.37  41.68 
TSF[35]  45.87  39.91  41.97  38.50  45.84  39.81  41.95  38.33  41.52 
FLATAD  46.21  40.71  42.22  37.79  46.20  40.64  42.31  38.71  41.85 
TLATAD  46.16  40.96  42.18  37.65  46.14  40.89  42.25  37.59  41.73 
PLATAD  46.21  40.93  42.24  38.82  46.20  40.86  42.32  38.74  42.04 
FLATAD (I)  46.42  41.10  42.46  39.16  46.42  41.03  42.59  39.06  42.28 
PLATAD (I)  46.41  41.10  42.45  39.15  46.41  41.02  42.58  39.05  42.27 
Coded37  47.30  42.22  43.77  40.09  47.28  42.15  43.89  40.10  43.35 
CVBF[34]  47.46  42.14  43.72  39.68  47.44  42.08  43.78  39.79  43.26 
ADTF[33]  47.23  41.95  43.57  39.60  47.20  41.89  43.65  39.63  43.09 
TSF[35]  47.47  40.86  43.52  39.40  47.46  40.74  43.54  39.54  42.82 
FLATAD  47.89  42.22  43.94  39.95  47.88  42.15  44.08  39.96  43.51 
TLATAD  47.85  42.35  43.93  39.96  47.84  42.29  44.05  39.96  43.53 
PLATAD  47.91  42.31  43.97  40.01  47.90  42.24  44.10  40.01  43.56 
FLATAD (I)  48.12  42.48  44.10  40.35  48.10  42.42  44.31  40.31  43.77 
PLATAD (I)  48.11  42.48  44.10  40.34  48.09  42.42  44.31  40.30  43.77 
M/Seq  U1  S1  C37  B10  U5  S5  C39  B8  Ave. 
Coded35  48.91  43.62  45.29  41.43  48.87  43.54  45.46  41.34  44.81 
CVBF[34]  49.08  43.39  45.16  41.03  49.05  43.30  45.12  40.81  44.62 
ADTF[33]  48.79  43.17  44.99  40.91  48.76  43.09  44.98  40.79  44.44 
TSF[35]  49.16  41.89  45.22  41.05  49.14  41.76  45.17  40.73  44.27 
FLATAD  49.68  43.83  45.71  41.62  49.65  43.76  45.79  41.53  45.20 
TLATAD  49.71  43.89  45.67  41.60  49.68  43.82  45.75  41.51  45.20 
PLATAD  49.69  43.89  45.69  41.61  49.66  43.81  45.77  41.52  45.21 
FLATAD (I)  49.87  43.97  45.81  41.85  49.85  43.90  45.95  41.74  45.37 
PLATAD (I)  49.86  43.98  45.81  41.85  49.84  43.90  45.94  41.73  45.36 
Coded33  50.33  44.97  46.73  42.60  50.29  44.90  46.81  42.72  46.17 
CVBF[34]  50.50  44.52  46.45  41.88  50.50  44.42  46.20  42.16  45.83 
ADTF[33]  50.19  44.14  46.22  41.89  50.13  44.22  46.00  42.06  45.61 
TSF[35]  50.62  42.49  46.56  41.78  50.70  42.35  46.22  42.19  45.36 
FLATAD  51.28  45.09  47.09  42.74  51.24  45.16  46.98  42.87  46.56 
TLATAD  51.32  45.24  47.05  42.72  51.27  45.17  46.92  42.85  46.57 
PLATAD  51.30  45.16  47.07  42.74  51.26  45.23  46.95  42.86  46.57 
FLATAD (I)  51.51  45.33  47.25  43.11  51.46  45.26  47.30  42.97  46.77 
PLATAD (I)  51.50  45.33  47.25  43.11  51.46  45.26  47.30  42.97  46.77 
Coded31  51.71  46.35  48.28  43.95  51.65  46.28  48.35  43.84  47.55 
CVBF[34]  51.92  45.56  47.76  43.16  51.88  45.47  47.34  42.90  47.00 
ADTF[33]  51.38  45.17  47.43  43.00  51.45  45.09  47.10  42.82  46.68 
TSF[35]  51.95  42.93  47.87  43.16  52.01  42.76  47.37  42.78  46.35 
FLATAD  52.82  46.53  48.48  44.01  52.77  46.47  48.30  43.86  47.91 
TLATAD  52.83  46.63  48.45  43.99  52.78  46.57  48.25  43.85  47.92 
PLATAD  52.84  46.62  48.47  44.01  52.79  46.56  48.28  43.86  47.93 
FLATAD (I)  53.01  46.70  48.62  44.22  52.96  46.64  48.62  44.07  48.11 
PLATAD (I)  53.02  46.71  48.63  44.22  52.96  46.65  48.63  44.07  48.11 
In this section, we present extensive results to demonstrate the performance of the proposed methods. First, we apply the proposed LATAD to artifact removal for piecewise smooth images, such as depth images and clipart images. Second, we validate the efficiency of the proposed LATRTV on image smoothing. Finally, our LATRTVd is compared with several denoising methods to demonstrate the novelty of the proposed method.
IV-A Compressed depth image filtering with LATAD
The depth maps are compressed by the HEVC reference software HM 16.8 [44] with the quantization parameter (QP) set to 31, 33, 35, 37, 39, and 41, respectively. We use four standard multiview-plus-depth sequences: Nokia's Undo_Dancer (U), NICT's Shark (S), Nagoya University's Champagne_Tower (C) (the first 250 frames of these three sequences are tested), and HHI's Book_Arrival (B) (the whole sequence of 100 frames is tested) [45]. In the simulations, the 1D-fast mode of 3D-HEVC (version HTM-DEV-2.0-dev3) [46] is used to synthesize the virtual middle view from two views of uncompressed texture images and compressed depth images (filtered or unfiltered). In our experiments, all the sequences use the same filtering parameters. For FLATAD, TLATAD, PLATAD, FLATAD (I), and PLATAD (I), \(\lambda\) is 0.25, the clipping threshold \(T\) is 30, and the number of iterations is 11 when the QP is lower than 37 and 21 otherwise; these are experimentally chosen values. The diffusion parameter is set to 30 for FLATAD, TLATAD, and PLATAD, and to 300 for FLATAD (I) and PLATAD (I). For PLATAD and PLATAD (I), the update interval \(n\) is 5 when QP = 31, 33, 35, and 10 when QP = 37, 39, 41.
The filtering results of the proposed methods are compared with those of ADTF [33], CVBF [34], and TSF [35]. The peak signal-to-noise ratio (PSNR) is taken as the objective quality measure for both the filtered depth images and the synthesized virtual view (the middle view between the two reference views). The average PSNRs of the different sequences are presented in TABLE I, TABLE II, and TABLE III, where U1 denotes view 1 of Undo_Dancer (U), the notations of the other sequences are defined similarly, and M/Seq denotes Method/Sequence.
From Fig. 1 (d-f), it can be observed that the performances of FLATAD, TLATAD, and PLATAD differ: TLATAD yields sharper results than PLATAD, but TLATAD requires updated activity information in every iteration, so it has a higher complexity than PLATAD. The diffusion of FLATAD blurs the discontinuities of the depth image, so it has the worst performance in boundary regions compared with the other methods. In contrast, the performances of FLATAD (I), TLATAD (I), and PLATAD (I) are very similar, as shown in Fig. 1 (g-i). The reason is that the form of Eq. (10) leads to more diffusion for some artifact pixels than the form of Eq. (11) during each iteration, as shown in Fig. 2 (e-f). The edge-stop function in Eq. (10) is more effective at smoothing the image than that of Eq. (11). However, the edge-stop function of Eq. (11), used in the proposed FLATAD (I), TLATAD (I), and PLATAD (I), does not change the depth structures too much, and most of the detailed geometry information is well preserved while severe coding artifacts are removed.
M/Seq  U  S  C  B  Ave.  M/Seq  U  S  C  B  Ave. 
Coded41  49.30  48.20  46.60  51.34  48.86  Coded39  49.89  49.08  47.77  52.14  49.72 
CVBF[34]  50.99  50.02  47.44  52.66  50.28  CVBF[34]  51.56  50.90  48.31  53.35  51.03 
ADTF[33]  50.82  50.16  47.29  52.29  50.14  ADTF[33]  51.61  51.03  48.29  53.06  51.00 
TSF[35]  50.86  49.87  47.47  52.71  50.23  TSF[35]  51.71  50.81  48.39  53.38  51.07 
FLATAD  50.37  49.54  47.38  52.64  49.98  FLATAD  51.20  50.46  48.37  53.39  50.86 
TLATAD  50.22  49.55  47.21  52.52  49.88  TLATAD  51.06  50.52  48.27  53.30  50.79 
PLATAD  50.32  49.55  47.31  52.55  49.93  PLATAD  51.11  50.55  48.32  53.30  50.82 
FLATAD (I)  50.42  49.58  49.17  52.41  50.40  FLATAD (I)  51.29  50.64  48.43  53.24  50.90 
PLATAD (I)  50.39  49.57  49.17  52.43  50.39  PLATAD (I)  51.25  50.64  48.43  53.25  50.89 
Coded37  50.71  50.00  48.53  53.04  50.57  Coded35  51.47  50.93  49.37  53.91  51.42 
CVBF[34]  52.57  51.67  48.96  54.08  51.82  CVBF[34]  53.47  52.45  49.66  54.76  52.59 
ADTF[33]  52.46  51.78  48.97  53.92  51.78  ADTF[33]  53.27  52.45  49.66  54.68  52.52 
TSF[35]  52.73  51.66  48.96  54.06  51.85  TSF[35]  53.20  52.29  49.75  54.79  52.51 
FLATAD  52.18  51.27  49.03  54.21  51.67  FLATAD  52.80  52.09  49.75  54.83  52.37 
TLATAD  52.03  51.41  48.96  54.12  51.63  TLATAD  52.71  52.11  49.71  54.78  52.33 
PLATAD  52.09  51.39  49.00  54.16  51.66  PLATAD  52.73  52.12  49.72  54.81  52.35 
FLATAD (I)  52.23  51.66  49.17  54.17  51.81  FLATAD (I)  52.81  52.23  49.84  54.86  52.44 
PLATAD (I)  52.22  51.65  49.17  54.16  51.80  PLATAD (I)  52.83  52.23  49.83  54.87  52.44 
Coded33  52.25  51.78  50.02  54.78  52.21  Coded31  53.13  52.64  50.75  55.67  53.05 
CVBF[34]  54.26  53.15  50.24  55.35  53.25  CVBF[34]  55.16  53.76  50.85  56.06  53.96 
ADTF[33]  54.07  53.04  50.29  55.32  53.18  ADTF[33]  54.80  53.48  50.94  55.96  53.80 
TSF[35]  54.12  52.98  50.40  55.43  53.23  TSF[35]  55.00  53.51  51.02  56.14  53.92 
FLATAD  53.75  52.83  50.39  55.56  53.13  FLATAD  54.72  53.57  51.02  56.41  53.93 
TLATAD  53.70  52.85  50.37  55.53  53.11  TLATAD  54.70  53.64  51.02  56.40  53.94 
PLATAD  53.73  52.88  50.37  55.54  53.13  PLATAD  54.75  53.65  51.01  56.42  53.96 
FLATAD (I)  53.74  53.10  50.48  55.61  53.23  FLATAD (I)  54.63  53.84  51.06  56.36  53.97 
PLATAD (I)  53.75  53.11  50.47  55.62  53.24  PLATAD (I)  54.63  53.84  51.06  56.36  53.97 
From TABLE IV, it can be seen that the overall quality of the depth sequences filtered with our proposed FLATAD (I) is the best, with a gain of up to 0.48 dB, while the quality of the synthesized images is better than with ADTF [33] but slightly lower than with CVBF [34] and TSF [35]. Meanwhile, the depth qualities of FLATAD, TLATAD, and PLATAD are better than those of ADTF, CVBF, and TSF, and TLATAD preserves boundary information better than PLATAD and FLATAD. The synthesized images rendered with the filtered depth maps are displayed in Fig. 3, from which we can see that the proposed method yields superior visual quality compared to the other methods.
The main advantage of the proposed method lies in its ability to greatly improve the quality of the depth images during filtering, as shown in TABLE I and TABLE II. One serious drawback of ADTF, CVBF, and TSF is that they smooth some small but significant objects too much and can even completely eliminate some small objects, as shown in Fig. 3 (c-e). The proposed method clearly avoids these drawbacks, as shown in Fig. 3 (f-j).
From Fig. 4, it can be seen that CVBF spends more filtering time than ADTF, TSF, and the proposed methods. The filtering time of the proposed FLATAD, PLATAD, FLATAD (I), and PLATAD (I) is slightly less than that of TSF but more than that of ADTF. However, TLATAD's filtering time is longer than that of FLATAD, PLATAD, FLATAD (I), and PLATAD (I), because TLATAD needs to calculate the local activity in each iteration.
M  Depth image (PSNR, dB)  Synthesized color image (PSNR, dB)
QP  41  39  37  35  33  31  Ave.  41  39  37  35  33  31  Ave.
Coded  40.53  41.89  43.35  44.81  46.17  47.55  44.05  48.86  49.72  50.57  51.42  52.21  53.05  50.97
CVBF[34]  40.33  41.85  43.26  44.62  45.83  47.00  43.82  50.28  51.03  51.82  52.59  53.25  53.96  52.16 
ADTF[33]  40.32  41.68  43.09  44.44  45.61  46.68  43.64  50.14  51.00  51.78  52.52  53.18  53.80  52.07 
TSF[35]  40.26  41.52  42.82  44.27  45.36  46.35  43.43  50.23  51.07  51.85  52.51  53.23  53.92  52.14 
FLATAD  40.60  41.85  43.51  45.20  46.56  47.91  44.27  49.98  50.86  51.67  52.37  53.13  53.93  51.99 
TLATAD  40.61  41.73  43.53  45.20  46.57  47.92  44.26  49.88  50.79  51.63  52.33  53.11  53.94  51.95 
PLATAD  40.65  42.04  43.56  45.21  46.57  47.93  44.33  49.93  50.82  51.66  52.35  53.13  53.96  51.98 
FLATAD (I)  40.86  42.28  43.77  45.37  46.77  48.11  44.53  50.40  50.90  51.81  52.44  53.23  53.97  52.13 
PLATAD (I)  40.85  42.27  43.77  45.36  46.77  48.11  44.52  50.39  50.89  51.80  52.44  53.24  53.97  52.12 
IV-B Clipart compression artifact removal with LATAD
LATAD can also be used for clipart compression artifact removal. We have tested several cartoon/clipart images with severe compression artifacts. For clipart compression artifact removal and image smoothing, we compare our method with TV [39], modified TV [26], and the L0 gradient minimization method [6]. Although TV can remove the noise well when the gradient along a boundary is large, weak edge information is not well preserved, as shown in Fig. 5. The modified TV [26] and the L0 gradient minimization method [6] can preserve some weak edges, but some noise and blur remain after filtering with these two methods. In Fig. 5, we can see that the proposed methods outperform the others. Our FLATAD (I) and FLATAD not only make boundaries sharper but also greatly reduce the compression artifacts, thanks to the clipped local activity tuning. FLATAD (I) keeps the filtered image closer to the unfiltered one than FLATAD, whereas FLATAD produces sharper edges than FLATAD (I), which maintains the piecewise smoothness of clipart images.
IV-C Denoising of contaminated depth images with LATRTV
Since depth images have the properties of piecewise smoothness and sharp discontinuities, we adopt LATRTV rather than LATRTVd for noise removal from depth images, because the proposed LATRTV smooths images more strongly than LATRTVd. To verify the efficiency of the proposed LATRTV, ten Middlebury depth maps are tested: Aloe, Art, Baby1, Baby2, Cloth3, Cones, Moebius, Reindeer, Teddy, and Barn1. Here, the noise is additive white Gaussian noise, whose standard deviation is set to 4, 6, 8, 10, 15, and 20, respectively. We compare our approach with three competing methods, which exploit local and nonlocal information for denoising: nonlocal graph-based transform (NLGBT) [43], block-matching 3D (BM3D) [42], and RTV [5].
It is well known that RTV can remove textures, but as far as we know, it has never been applied to noise removal. As a matter of fact, RTV can also remove Gaussian noise by simply treating the noise as texture on piecewise smooth images. Compared with RTV, the advantages of the proposed LATRTV come mainly from the local activity tuning, which makes the method more robust for Gaussian noise removal. It is worth noticing that the proposed method preserves the main structure of the disparity image without blurring the boundaries, as shown in Fig. 6 (e-f).
TABLE V shows the objective quality of the denoising results of these methods in terms of PSNR at different noise levels. The objective quality of LATRTV is better than that of BM3D and RTV, but slightly lower than that of NLGBT. As presented in Fig. 6 (c-f), our method preserves edges better than the others. The running times are also reported in TABLE V, from which we can see that NLGBT has by far the longest filtering time, while the proposed LATRTV, RTV, and BM3D need only several seconds.
IV-D Image smoothing and scale-space representation with LATRTV
To remove an image's textures while keeping its structures, the proposed LATRTV is tuned with the local activity, which smooths more weak edges in order to retain the main contour information. From Fig. 7, we can see that the proposed LATRTV removes more textures and retains strong edges, compared with the original RTV [5] and four other methods: Weighted Least Squares (WLS) [7], the region covariance based method (RC) [10], the Rolling Guidance Filter (RGF) [11], and Robust Guided Image Filtering (RGIF) [12]. Among these methods, RC [10] and RGF [11] tend to blur image edges, although they remove many details and textures. We have also tested our LATRTV at three scales in Fig. 8. From this figure, we can see that the proposed LATRTV preserves sharp edge information and locates the edges of the main object contours when images are represented in different scale-spaces. Compared to RGF [11], WLS [7], and RTV [5], the proposed LATRTV is more suitable for scale-space representation of images. Moreover, LATRTV has performance similar to RGIF [12] for scale-space representation. Although both are achieved by optimization, they use different smoothing mechanisms: LATRTV uses the features of texture and structure, while RGIF considers the joint effects of static and dynamic guidance for image smoothing. Hence the image representations in the various scale-spaces show some diversity in appearance, especially where pixels have similar color information.
The PSNR (dB) of filtered disparity images  Filtering time (s)  
Images/M  BM3D [42]  NLGBT [43]  RTV [5]  LATRTV  Images/M  BM3D [42]  NLGBT [43]  RTV [5]  LATRTV 
a  40.1  41.1  38.7  40.3  a  1.4  194.9  1.5  4.5 
b  41.1  42.8  39.8  42.0  b  1.5  210.9  1.5  4.5 
c  45.0  45.2  42.7  45.5  c  1.6  184.7  1.5  3.3 
d  44.7  45.1  42.7  45.0  d  1.6  196.0  1.5  3.9 
e  44.8  45.0  41.4  44.8  e  2.0  260.0  1.7  4.1 
f  42.7  43.8  39.1  42.3  f  1.9  262.7  1.6  5.0 
g  43.4  43.5  40.4  43.1  g  2.2  309.0  1.8  5.2 
h  43.3  44.1  40.5  43.1  h  2.0  272.5  1.9  5.6 
i  42.7  42.9  39.3  42.1  i  2.0  287.7  1.9  5.5 
j  47.1  46.9  45.5  47.7  j  2.1  259.8  1.7  3.7 
mean  43.5  44.0  41.0  43.6  mean  1.8  243.8  1.7  4.5 
Standard deviation=13  Standard deviation=26  
Image  Noisy  RBF [18]  WBF [18]  TV [39]  RTV [5]  LATRTVd  Image  Noisy  RBF [18]  WBF [18]  TV [39]  RTV [5]  LATRTVd 
(a)  26.04  31.75  32.87  29.15  31.55  33.51  (a)  20.14  30.36  30.45  27.66  28.31  30.19 
(b)  26.07  25.71  29.48  25.46  24.61  27.80  (b)  20.21  25.25  26.17  24.78  23.87  25.69 
(c)  26.17  30.27  31.71  29.39  30.19  32.02  (c)  20.31  29.15  29.27  27.74  28.45  29.09 
(d)  26.07  30.4  31.65  30.23  29.62  31.97  (d)  20.22  29.49  29.60  28.44  28.30  29.22 
(e)  26.36  26.67  29.54  26.55  26.68  29.66  (e)  20.49  25.92  26.53  25.50  25.31  26.94 
(f)  26.17  24.39  28.44  23.08  24.78  27.92  (f)  20.36  23.74  24.92  22.61  22.73  25.06 
(g)  26.26  27.79  29.27  26.62  26.77  30.65  (g)  20.31  26.82  27.13  25.48  24.93  27.26 
(h)  26.18  27.72  30.17  26.70  27.01  30.62  (h)  20.44  26.85  27.29  25.71  25.18  27.46 
(i)  26.62  30.85  31.52  31.07  29.31  33.10  (i)  20.86  28.72  28.14  28.44  26.41  28.78 
(j)  26.27  30.58  31.56  29.65  30.21  32.95  (j)  20.32  29.17  29.24  27.78  27.94  29.13 
Ave.  26.22  28.61  30.62  27.79  28.07  31.02  Ave.  20.37  27.55  27.87  26.41  26.14  27.88 
IV-E Image denoising with LATRTVd
Ten images are used to test image denoising: Monarch, Barbara, Pepper, Lena, Man, Comic, Zebra, Flowers, Bird, and Boats. The noise is zero-mean Gaussian noise with standard deviations of 13 and 26. We compare the proposed approach with four other methods. The nonlinear combination of the local activity and the gradient information in LATRTVd captures the locations of the noise, so Gaussian noise can be removed while fine details are retained, whereas RTV only tends to smooth the texture to preserve the image's structure. This is shown in Fig. 9, where three other methods, RBF [18], WBF [18], and TV [39], are also compared with the proposed LATRTVd. From Fig. 9 and TABLE VI, we can see that both the objective quality and the visual quality of the proposed denoising method are better than those of the other methods, and the PSNR gain over the noisy image can be up to 7.51 dB.
V Conclusion
In this paper, two local activity-tuned frameworks are introduced. First, a robust local activity-tuned anisotropic diffusion is proposed to control the diffusion for the removal of depth coding artifacts. Second, our local activity-tuned relative total variation framework achieves good performance for image smoothing, represents images in different scale-spaces, and has been used for depth image denoising. From these applications, we can see that the proposed LATAD, LATRTV, and LATRTVd perform well for image smoothing and noise removal. The local activity-tuned strategy can also be applied to other schemes, which we will explore in future work.
References
 [1] D. Hao, Q. Li, and C. Li, “Single-image-based rain streak removal using multidimensional variational mode decomposition and bilateral filter,” Journal of Electronic Imaging, vol. 26, no. 1, p. 013020, 2017.
 [2] D. Papadimitriou and T. Dennis, “Nonlinear smoothing of stereo disparity maps,” Electronics Letters, vol. 30, no. 5, pp. 391–393, 1994.

 [3] J. Yin and J. Cooperstock, “Improving depth maps by nonlinear diffusion,” in IEEE International Conference on Computer Graphics, Visualization and Computer Vision, 2004.
 [4] S. Zhu, R. Gao, and Z. Li, “Stereo matching algorithm with guided filter and modified dynamic programming,” Multimedia Tools and Applications, vol. 76, no. 1, pp. 199–216, 2017.
 [5] L. Xu, Q. Yan, Y. Xia, and J. Jia, “Structure extraction from texture via relative total variation,” ACM Transactions on Graphics, vol. 31, no. 6, 2012.
 [6] L. Xu, C. Lu., Y. Xu, and J. Jia, “Image smoothing via l0 gradient minimization,” ACM Transactions on Graphics, vol. 30, no. 6, 2011.
 [7] Z. Farbman, R. Fattal, and D. Lischinski, “Edgepreserving decompositions for multiscale tone and detail manipulation,” ACM Transactions on Graphics, vol. 27, no. 3, 2008.
 [8] K. He, J. Sun, and X. Tang, “Guided image filtering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 6, pp. 1397–1409, 2013.
 [9] E. Gastal and M. Oliveira, “Domain transform for edgeaware image and video processing,” ACM Transactions on Graphics, vol. 30, no. 4, pp. 1244–1259, 2011.
 [10] L. Karacan, E. Erdem, and A. Erdem, “Structure-preserving image smoothing via region covariances,” ACM Transactions on Graphics, vol. 32, no. 6, pp. 1–11, 2013.
 [11] Q. Zhang, L. Xu, and J. Jia, “Rolling guidance filter,” in European Conference on Computer Vision, Zurich, Sep. 2014.
 [12] B. Ham, M. Cho, and J. Ponce, “Robust guided image filtering using nonconvex potentials,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP, no. 99, pp. 1–1, 2017.
 [13] P. Ndjiki-Nya, M. Koppel, D. Doshkov, H. Lakshman, P. Merkle, K. Muller, and T. Wiegand, “Depth image-based rendering with advanced texture synthesis for 3D video,” IEEE Transactions on Multimedia, vol. 13, no. 3, pp. 453–465, 2011.
 [14] J. Lei, C. Zhang, Y. Fang, Z. Gu, N. Ling, and C. Hou, “Depth sensation enhancement for multiple virtual view rendering,” IEEE Transactions on Multimedia, vol. 17, no. 4, pp. 457–469, 2015.
 [15] C. Yao, J. Xiao, T. Tillo, Y. Zhao, C. Lin, and H. Bai, “Depth Map DownSampling and Coding Based on Synthesized View Distortion,” IEEE Transactions on Multimedia, vol. 18, no. 10, pp. 2015–2022, 2016.
 [16] K. Chaudhury and S. Dabhade, “Fast and provably accurate bilateral filtering,” IEEE Transactions on Image Processing, vol. 25, no. 6, pp. 2519–2528, 2016.
 [17] K. Chaudhury, D. Sage, and M. Unser, “Fast O(1) bilateral filtering using trigonometric range kernels,” IEEE Transactions on Image Processing, vol. 20, no. 12, pp. 3376–3382, 2011.
 [18] K. Chaudhury and K. Rithwik, “Image denoising using optimally weighted bilateral filters: A sure and fast approach,” in IEEE International Conference on Image Processing, Quebec, Canada, Sep. 2015.

 [19] A. Buades, B. Coll, and J. Morel, “A non-local algorithm for image denoising,” in IEEE Conference on Computer Vision and Pattern Recognition, 2005.
 [20] D. Huang, L. Kang, Y. Wang, and C. Lin, “Self-learning based image decomposition with applications to single image denoising,” IEEE Transactions on Multimedia, vol. 16, no. 1, pp. 83–93, 2014.
 [21] P. Perona and J. Malik, “Scale-space and edge detection using anisotropic diffusion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 7, pp. 629–639, 1990.
 [22] M. Black, G. Sapiro, D. Marimont, and D. Heeger, “Robust anisotropic diffusion,” IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 421–432, 1998.
 [23] J. Bai and X. Feng, “Fractionalorder anisotropic diffusion for image denoising,” IEEE Transactions on Image Processing, vol. 16, no. 10, pp. 2492–2502, 2007.
 [24] P. Simard and H. Malvar, “An efficient binary image activity detector based on connected components,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, Quebec, May 2004.
 [25] S. Saha and R. Vemuri, “An analysis on the effect of image activity on lossy coding performance,” in IEEE International Symposium on Circuits and Systems, Geneva, May 2000.
 [26] S. Chao and D. Tsai, “An improved anisotropic diffusion model for detail- and edge-preserving smoothing,” Pattern Recognition Letters, vol. 31, no. 13, pp. 2012–2023, 2010.
 [27] S. Chao and D. Tsai, “Anisotropic diffusion-based detail-preserving smoothing for image restoration,” in IEEE International Conference on Image Processing, Hong Kong, Sep. 2010.
 [28] S. Bhattacharya, K. Venkatesh, and G. Sumana, “Depth filtering using total variation based video decomposition,” in IEEE International Conference on Image Processing, Quebec, Sep. 2015.
 [29] D. Scharstein and S. Richard, “Stereo matching with nonlinear diffusion,” International Journal of Computer Vision, vol. 28, no. 2, pp. 155–174, 1998.

 [30] L. Alvarez, R. Deriche, J. Sanchez, and J. Weickert, “Dense disparity map estimation respecting image discontinuities: A PDE and scale-space based approach,” Journal of Visual Communication and Image Representation, vol. 13, no. 1, pp. 3–21, 2002.
 [31] N. Hur, J. Tam, F. Speranza, C. Ahn, and S. Lee, “Depth-image-based stereoscopic image rendering considering IDCT and anisotropic diffusion,” in IEEE International Conference on Consumer Electronics, Las Vegas, Jan. 2005.
 [32] S. Liu, P. Lai, D. Tian, and C. Chen, “New depth coding techniques with utilization of corresponding video,” IEEE Transactions on Broadcasting, vol. 57, no. 2, pp. 551–561, 2011.
 [33] X. Xu, L. Po, T. Cheung, K. Cheung, L. Feng, C. Ting, and K. Ng, “Adaptive depth truncation filter for MVC based compressed depth image,” Signal Processing: Image Communication.
 [34] L. Zhao, A. Wang, B. Zeng, and Y. Wu, “Candidate valuebased boundary filtering for compressed depth images,” Electronics Letters, vol. 51, no. 3, pp. 224–226, 2015.
 [35] L. Zhao, H. Bai, A. Wang, Y. Zhao, and B. Zeng, “Twostage filtering of compressed depth images with markov random field,” Signal Processing: Image Communication, vol. 54, pp. 11–22, 2017.
 [36] Y. Zaz, L. Masmoudi, K. Bouzouba, and L. Radouane, “A new adaptive anisotropic diffusion using the local intensity variance,” in International Conference: Sciences of Electronic, Technologies of Information and Telecommunications, Susa, Tunisia, Mar. 2005.
 [37] S. Shahriyar, M. Murshed, M. Ali, and M. Paul, “Lossless depth map coding using binary tree based decomposition and contextbased arithmetic coding,” in IEEE International Conference on Multimedia and Expo, Barcelona, 2016.
 [38] L. Shen, Z. Liu, X. Zhang, W. Zhao, and Z. Zhang, “ An effective CU size decision method for HEVC encoder,” IEEE Transactions on Multimedia, vol. 15, no. 2, pp. 465–470, 2013.
 [39] L. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1, pp. 259–268, 1992.
 [40] E. Weisstein. (2006) Euler-Lagrange differential equation. From MathWorld–A Wolfram Web Resource.
 [41] D. Krishnan and R. Szeliski, “Multigrid and multilevel preconditioners for computational photography,” ACM Transactions on Graphics, vol. 30, no. 6, p. 177, 2011.
 [42] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3D transformdomain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
 [43] W. Hu, X. Li, G. Cheung, and O. Au, “Depth map denoising using graph-based transform and group sparsity,” in IEEE International Workshop on Multimedia Signal Processing, Pula, Sep. 2012.
 [44] High efficiency video coding (HEVC) reference software HM 16.8, the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. [Online]. Available: http://hevc.kw.bbc.co.uk/svn/jctvca124/tags/HM16.8/
 [45] The ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. (2011) Call for proposals on 3D video coding technology. Geneva, Switzerland. [Online]. Available: ftp://vqeg.its.bldrdoc.gov/Documents/VQEG_Seoul_Jun11/MeetingFiles/3DTV/
 [46] JCT-3V. 3D-HEVC Test Software (HTM). [Online]. Available: http://hevc.kw.bbc.co.uk/git/w/jctvc3de.git/shortlog/refs/heads/HTMDEV2.0dev3Zhejiang