Infrared small target detection is a key technique for many applications, including early-warning systems, precision-guided weapons, missile tracking systems, and maritime surveillance systems [1, 2, 3]. Traditional sequential detection methods, such as the 3D matched filter, the improved 3D filter, and the multiscan adaptive matched filter, work well for static backgrounds because they exploit the spatial-temporal information of the target. Nevertheless, with the recent fast development of high-speed aircraft like anti-ship missiles, the imaging background generally changes quickly because of the rapid relative motion between the imaging sensor and the target, and the performance of spatial-temporal detection methods degrades rapidly. Therefore, single-frame infrared small target detection is of great importance and has attracted a lot of attention in recent years.
Unlike general object or saliency detection tasks, the main challenge of infrared small target detection is the lack of sufficient information. Due to the long imaging distance, the target is always small and has no texture or shape features. As the target type, imaging distance, and neighboring environment differ a lot in real scenes, the target brightness can vary from extremely dim to very bright (see fig. 5 for examples). In the absence of spatial-temporal information and of target features such as shape and size, the characteristics of the background and the relation between the background and the target are very important priors for single-frame infrared small target detection. Thus, how to design a model that incorporates and exploits these priors is vital for infrared small target detection in a single image.
I-A Prior work on single-frame infrared small target detection
The previously proposed single-frame infrared small target detection methods can be roughly classified into two categories. The first type exploits a local background consistency prior, assuming that the background changes slowly and that nearby pixels are highly correlated. As a result, the target is viewed as the element that breaks this local correlation. Under this assumption, classical methods such as the two-dimensional least mean square (TDLMS) filter and the Max-Median filter enhance the small target by subtracting the predicted background from the original image. Unfortunately, besides the targets, they also enhance the edges of the sky-sea surface or heavy cloud clutter, since these structures break the background consistency just as the target does. To differentiate the real target from high-frequency changes, some edge analysis approaches [11, 12] have been proposed that extend these methods by estimating the edge direction in advance and preserving the edges. Bai et al. designed a new Top-Hat transformation using two different but correlated structuring elements. Another class of local prior based methods exploits the local contrast, which is computed by comparing a pixel or a region only with its neighbors. The seminal Laplacian of Gaussian (LoG) filter based method has motivated a broad range of studies on the Human Visual System (HVS) and has led to a series of HVS based methods, e.g., Difference of Gaussians (DoG), the second-order directional derivative (SODD) filter, local contrast measure (LCM), improved local contrast measure (ILCM), multiscale patch-based contrast measure (MPCM), multiscale gray difference weighted image entropy, improved difference of Gabors (IDoGb), local saliency map (LSM), weighted local difference measure (WLDM), local difference measure (LDM), etc.
The second type of single-frame infrared small target detection methods, which has not been explored extensively, exploits the non-local self-correlation property of background patches, assuming that all background patches come from a single subspace or a mixture of low-rank subspace clusters. Target-background separation can then be realized with low-rank matrix recovery. Essentially, this type of method attempts to model the infrared small target as an outlier in the input data. To this end, Gao et al. generalized the traditional infrared image model to a new infrared patch-image (IPI) model via local patch construction. The target-background separation problem is then reformulated as a robust principal component analysis (RPCA) problem of recovering low-rank and sparse matrices, achieving state-of-the-art background suppression performance. To correctly detect a small target located in a highly heterogeneous background, He et al. proposed a low-rank and sparse representation model under the multi-subspace-cluster assumption.
Existing methods detect the infrared small target by either utilizing the local pixel correlation or exploiting the non-local patch self-correlation. From our observation, the unsatisfactory performance of local prior methods [20, 23] in detecting dim targets under complicated backgrounds mainly stems from their imperfect grayscale based center-difference measures. The saliency of a dim but true target is easily overwhelmed by the measured saliency of some rare structures. We call this phenomenon the rare structure effect. In contrast, the non-local prior methods [26, 29, 30] suffer from salient edge residuals. The intrinsic reason is that a strong edge, like the target, is also a sparse component because it lacks sufficient similar samples. Since the target might be dimmer than the edge, simply increasing the threshold would wipe out the target together with the edge.
Our key observation is that the non-local prior and the local prior are not equivalent; in fact, they are complementary for infrared small target detection, as illustrated in fig. 1. The false alarm components that often appear in local (non-local) prior methods can be well suppressed by non-local (local) prior methods. For example, the stubborn strong edges in non-local prior based methods can be easily identified by local edge analysis approaches. An intuitive way to solve the above-mentioned dilemma is therefore to extract the local structure information and merge it into the non-local prior based detection framework. How to simultaneously and appropriately utilize both the local and non-local priors is thus an important issue for improving the detection performance under very complex backgrounds. To the best of our knowledge, very few single-frame infrared small target detection methods address this problem.
To address this issue, we propose a single-frame small target detection framework with a reweighted infrared patch-tensor model (RIPT). Our main contributions are threefold:
To exploit more information from the non-local self-correlation in patch space, we generalize the patch-image model to a novel infrared patch-tensor model (IPT) and formulate the target-background separation task as an optimization problem of recovering low-rank and sparse tensors.
To incorporate the local structure prior into the IPT model, an element-wise weight is designed based on the structure tensor, which helps to suppress the remaining edges and preserve the dim target simultaneously.
To reduce the computing time, we adopt a reweighted scheme to enhance the sparsity of the target patch-tensor. Considering the particularity of infrared small target detection, an additional stopping criterion is used to avoid excessive computation.
The proposed RIPT model is validated on different real infrared image datasets. Compared with the state-of-the-art methods, our proposed model achieves better background suppression and target detection performance.
The remainder of this paper is organized as follows. We propose the non-local correlation based IPT model in Section II. The details of the local structure weight construction are described in Section III. In Section IV, we further propose the reweighted IPT model and provide its detailed optimization scheme. Section V presents detailed experimental results and discussion. Finally, we conclude this paper in Section VI.
II Non-local correlation driven infrared patch-tensor model
To exploit more spatial correlations, we develop in this section a novel target-background separation framework named the infrared patch-tensor model. Before describing the details, several mathematical notations are defined in table I.
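|Notation||Explanation|
|$\mathcal{X}$, $\mathbf{X}$, $\mathbf{x}$, $x$||tensor, matrix, vector, scalar.|
|$\mathbf{X}_{(n)}$||mode-$n$ matricization of tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, obtained by arranging the mode-$n$ fibers as the columns of the resulting matrix of size $I_n \times \prod_{k \neq n} I_k$.|
|$\mathrm{vec}(\mathcal{X})$||vectorization of tensor $\mathcal{X}$.|
|$\langle \mathcal{X}, \mathcal{Y} \rangle$||inner product of tensors $\mathcal{X}$ and $\mathcal{Y}$, which is defined as $\mathrm{vec}(\mathcal{X})^{\top} \mathrm{vec}(\mathcal{Y})$.|
|$\lVert \mathcal{X} \rVert_0$||$\ell_0$ norm of tensor $\mathcal{X}$, which counts the number of non-zero elements.|
|$\lVert \mathcal{X} \rVert_1$||$\ell_1$ norm of tensor $\mathcal{X}$.|
|$\lVert \mathcal{X} \rVert_F$||Frobenius norm of tensor $\mathcal{X}$, which is defined by $\sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}$.|
|$\mathrm{fold}_n(\mathbf{X}_{(n)})$||returns the tensor $\mathcal{X}$, i.e., $\mathrm{fold}_n(\mathbf{X}_{(n)}) = \mathcal{X}$.|
|$\lVert \mathbf{X} \rVert_*$||nuclear norm of matrix $\mathbf{X}$, which is defined by $\sum_i \sigma_i$, where $\mathbf{X} = \mathbf{U}\,\mathrm{diag}(\sigma_i)\,\mathbf{V}^{\top}$ is the SVD of $\mathbf{X}$.|
|$\mathcal{S}_{\mu}(\cdot)$||element-wise shrinkage operator, defined as $\mathcal{S}_{\mu}(x) = \mathrm{sign}(x) \max(\lvert x \rvert - \mu, 0)$. $\mathcal{S}_{\mu}(\mathcal{Y})$ is the closed-form solution of the problem $\min_{\mathcal{X}} \mu \lVert \mathcal{X} \rVert_1 + \frac{1}{2} \lVert \mathcal{X} - \mathcal{Y} \rVert_F^2$.|
|$\mathcal{D}_{\mu}(\cdot)$||matrix singular value thresholding operator: $\mathcal{D}_{\mu}(\mathbf{Y}) = \mathbf{U}\,\mathcal{S}_{\mu}(\boldsymbol{\Sigma})\,\mathbf{V}^{\top}$, where $\mathbf{Y} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^{\top}$ is the SVD of $\mathbf{Y}$ and $\mathcal{S}_{\mu}(\boldsymbol{\Sigma})$ shrinks every singular value by $\mu$. $\mathcal{D}_{\mu}(\mathbf{Y})$ is the closed-form solution of the problem $\min_{\mathbf{X}} \mu \lVert \mathbf{X} \rVert_* + \frac{1}{2} \lVert \mathbf{X} - \mathbf{Y} \rVert_F^2$.|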
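The two thresholding operators listed at the end of the notation table are the building blocks of the solver used later. The following is a minimal NumPy sketch of both; the function names `soft_threshold` and `singular_value_threshold` are illustrative choices and do not come from the released code.

```python
import numpy as np


def soft_threshold(x, tau):
    """Element-wise shrinkage: sign(x) * max(|x| - tau, 0).

    `tau` may be a scalar or an array that broadcasts against `x`,
    which allows entry-wise thresholds.
    """
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def singular_value_threshold(matrix, tau):
    """Shrink the singular values of `matrix` by `tau` and rebuild the matrix."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Scaling the columns of u by the shrunk singular values equals u @ diag(s) @ vt.
    return (u * soft_threshold(s, tau)) @ vt
```

Because `tau` broadcasts, the same shrinkage function can serve both a global threshold and an element-wise weighted one.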
Given an infrared image, it can be modeled as a linear superposition of a target image, a background image, and a noise image:
$$f_F(x, y) = f_T(x, y) + f_B(x, y) + f_N(x, y), \tag{1}$$
where $f_F$, $f_B$, $f_T$, and $f_N$ represent the input image, background image, target image, and noise image, respectively. Via a window sliding from the top left to the bottom right over an image, we stack the patches into a 3D cube (see the construction step in fig. 4). Then eq. 1 is transferred to the patch space with the spatial structure preserved:
$$\mathcal{F} = \mathcal{B} + \mathcal{T} + \mathcal{N}, \tag{2}$$
where $\mathcal{F}, \mathcal{B}, \mathcal{T}, \mathcal{N} \in \mathbb{R}^{I \times J \times P}$ are the input patch-tensor, background patch-tensor, target patch-tensor, and noise patch-tensor, respectively. $I$ and $J$ are the patch height and width, and $P$ is the patch number.
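The sliding-window construction can be illustrated with a short NumPy sketch. The function name `build_patch_tensor`, the row-major scan order, and the choice to skip windows that do not fit at the image border are assumptions made for illustration only.

```python
import numpy as np


def build_patch_tensor(image, patch_size, step):
    """Stack square sliding-window patches of `image` into an (I, J, P) tensor.

    The window moves from the top left to the bottom right, row by row.
    Windows that would cross the image border are skipped (an assumption).
    """
    rows, cols = image.shape
    tops = range(0, rows - patch_size + 1, step)
    lefts = range(0, cols - patch_size + 1, step)
    patches = [image[r:r + patch_size, c:c + patch_size]
               for r in tops for c in lefts]
    # Each patch keeps its 2D layout and becomes one frontal slice.
    return np.stack(patches, axis=2)
```

For a 6 x 6 image, a patch size of 4, and a step of 2, this yields a 4 x 4 x 4 tensor whose first slice is the top-left patch.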
Background patch-tensor $\mathcal{B}$. Generally, the background is considered to change slowly, which implies high correlations among both the local and non-local patches in the image, as illustrated in fig. 2(a). Although the patches are located in different regions of the image, they are equivalent. Based on this non-local correlation, the state-of-the-art IPI model imposes a low-rank constraint on the background patch-image. As a patch-image is the mode-3 unfolding matrix of a patch-tensor, the patch-image model can essentially be viewed as a special case of the proposed patch-tensor model. Since the main difficulty of detecting the infrared small target in a single image is the lack of sufficient information, considering the low-rank structure in only one unfolding is insufficient to deal with highly difficult scenes. This motivates us to ask whether the other two unfolding modes can be utilized. In fact, the mode-1 and mode-2 unfolding matrices of the infrared patch-tensor are also low-rank. In fig. 2(b) – (d), the singular values of all the unfolding matrices rapidly decrease to zero, which demonstrates that every unfolding mode of the background patch-tensor is intrinsically low-rank. Therefore, we can consider the background patch-tensor $\mathcal{B}$ as a low-rank tensor whose unfolding matrices are all low-rank, defined as:
$$\operatorname{rank}(\mathbf{B}_{(1)}) \le r_1, \quad \operatorname{rank}(\mathbf{B}_{(2)}) \le r_2, \quad \operatorname{rank}(\mathbf{B}_{(3)}) \le r_3, \tag{3}$$
where $r_1$, $r_2$, and $r_3$ are constants representing the complexity of the background image. The larger their values are, the more complex the background is.
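The low-rank claim for every unfolding can be checked numerically. The sketch below computes the normalized singular values of each mode unfolding; `unfold` and `unfolding_singular_values` are hypothetical helper names, modes are 0-based, and the column order of the unfolding differs from some textbook conventions, which changes neither the rank nor the singular values.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` unfolding (0-based): the mode fibers become matrix columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def unfolding_singular_values(tensor):
    """Singular values of every unfolding, each scaled by its largest value."""
    spectra = []
    for mode in range(tensor.ndim):
        s = np.linalg.svd(unfold(tensor, mode), compute_uv=False)
        spectra.append(s / s[0])
    return spectra
```

A fast decay of all three returned spectra for a background patch-tensor would correspond to the behavior reported in fig. 2(b) – (d).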
Target patch-tensor $\mathcal{T}$. Since the small target occupies only several pixels in the whole image, the target patch-tensor is in fact an extremely sparse tensor. That is:
$$\|\mathcal{T}\|_0 \le k, \tag{4}$$
where $k$ is a small integer determined by the number and size of the small targets.
Noise patch-tensor $\mathcal{N}$. In this paper, we simply assume that the noise is additive white Gaussian noise and $\|\mathcal{N}\|_F \le \delta$ for some $\delta > 0$. Thus we have
$$\|\mathcal{F} - \mathcal{B} - \mathcal{T}\|_F \le \delta. \tag{5}$$
It should be noted that although the values of the parameters $r_1$, $r_2$, $r_3$, $k$, and $\delta$ differ from image to image, we do not use them directly in the following sections.
Ideally, we would like to obtain a low-rank and sparse decomposition by solving the following problem:
$$\min_{\mathcal{B}, \mathcal{T}} \operatorname{rank}(\mathcal{B}) + \lambda \|\mathcal{T}\|_0 \quad \text{s.t.} \quad \mathcal{B} + \mathcal{T} = \mathcal{F}. \tag{6}$$
Unfortunately, computing the rank of a given tensor is an NP-hard problem in general. Goldfarb and Qin proposed Higher-order RPCA (robust tensor recovery), which replaces the rank with a convex surrogate of the Tucker rank, $\mathrm{CTrank}(\mathcal{B})$, and $\|\mathcal{T}\|_0$ with $\|\mathcal{T}\|_1$ to make the above problem tractable. In the singleton model, the tensor rank regularization term is defined as the sum of the nuclear norms of all the mode-$i$ unfoldings, i.e., $\mathrm{CTrank}(\mathcal{B}) = \sum_i \|\mathbf{B}_{(i)}\|_*$. With this relaxation, our proposed IPT model with the random noise assumption can be solved by minimizing the following convex problem:
$$\min_{\mathcal{B}, \mathcal{T}} \sum_{i=1}^{3} \|\mathbf{B}_{(i)}\|_* + \lambda \|\mathcal{T}\|_1 \quad \text{s.t.} \quad \mathcal{B} + \mathcal{T} = \mathcal{F}. \tag{7}$$
$\lambda$ is a weighting parameter that controls the global trade-off between the background patch-tensor and the target patch-tensor. A larger $\lambda$ shrinks the non-target but sparse components in the target patch-tensor to zero. Nevertheless, it also over-shrinks the dim target, which should be preserved. On the contrary, a smaller $\lambda$ does retain the dim target, but it keeps the strong cloud edges as well. Therefore, adopting a global constant weighting parameter is not a reasonable scheme for detecting the infrared small target in a complex scene. This motivates us to design an entry-wise weighting scheme.
III Incorporating local structure prior
In this section, we focus on combining the local structure prior and the non-local correlation prior in our model. We construct a local structure weight and interpret it as an edge salience measure. For simplicity, the local structure weight is designed on the basis of the image structure tensor. The structure tensor is widely used in many partial differential equation (PDE) based methods [35, 36] to estimate the local structure information in the image, including the edge orientation. To integrate the local information, the structure tensor is constructed based on a local regularization of a tensorial product, which is defined as
$$J_{\rho}(\nabla D_{\sigma}) = K_{\rho} * (\nabla D_{\sigma} \otimes \nabla D_{\sigma}) = \begin{pmatrix} J_{11} & J_{12} \\ J_{12} & J_{22} \end{pmatrix}, \tag{8}$$
where $K_{\rho}$ is a Gaussian kernel of scale $\rho$ that performs the local regularization, and $D_{\sigma}$ is a Gaussian-smoothed version of a given image $D$. $\sigma$ is the standard deviation of the Gaussian kernel, which denotes the noise scale and makes the edge detector ignore small details. $J_{\rho}$ is a symmetric and positive semi-definite matrix, which has two orthonormal eigenvectors denoted as $\mathbf{e}_1$ and $\mathbf{e}_2$. $\mathbf{e}_1$ points to the maximum contrast direction of the geometric structure, while $\mathbf{e}_2$ points to the minimum contrast direction. Their corresponding eigenvalues $\lambda_1$ and $\lambda_2$ can be calculated via
$$\lambda_{1,2} = \frac{1}{2} \left( (J_{11} + J_{22}) \pm \sqrt{(J_{22} - J_{11})^2 + 4 J_{12}^2} \right). \tag{9}$$
These two values can be used as two feature descriptors of the local geometric structure: in flat regions, $\lambda_1 \approx \lambda_2 \approx 0$; in edge regions, $\lambda_1 \gg \lambda_2 \approx 0$; in corner regions, $\lambda_1 \ge \lambda_2 \gg 0$. For low computational cost, we take $\lambda_1 - \lambda_2$ as the edge awareness feature, since its value at an edge pixel is much larger than at a pixel belonging to a flat region or a corner. By applying eq. 8 and eq. 9 to every pixel in the input image $f_F$, two matrices $\mathbf{L}_1$ and $\mathbf{L}_2$ can be obtained, which consist of the large and small structure tensor eigenvalues of all the pixels, respectively. Then we can transform $\mathbf{L}_1$ and $\mathbf{L}_2$ to their corresponding patch-tensors $\mathcal{L}_1$ and $\mathcal{L}_2$. Finally, we define the local structure weight patch-tensor as follows
$$\mathcal{W}_{LS} = \exp\left( h \cdot \frac{(\mathcal{L}_1 - \mathcal{L}_2) - d_{\min}}{d_{\max} - d_{\min}} \right), \tag{10}$$
where $h$ is a weight stretching parameter, and $d_{\max}$ and $d_{\min}$ are the maximum and minimum of $\mathcal{L}_1 - \mathcal{L}_2$, respectively. fig. 3 displays the edge awareness maps of fig. 5, which demonstrates that the structure tensor based local structure weight performs well in identifying the edges. It should be noted that, for display purposes, fig. 3 is created from a normalized version. In the proposed algorithm, we still calculate the local structure weight via eq. 10.
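A possible NumPy-only realization of the edge awareness feature and the exponential weight is sketched below. It assumes that the eigenvalue gap of the structure tensor is used as the edge feature and that both smoothing steps are Gaussian; the function names, the default scales `sigma` and `rho`, the kernel radius of three standard deviations, and the small constant guarding the division are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def gaussian_smooth(image, sigma):
    """Separable Gaussian blur with replicated borders (NumPy only)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(image, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def edge_feature(image, sigma=1.0, rho=1.0):
    """Eigenvalue gap of the structure tensor at every pixel."""
    smoothed = gaussian_smooth(np.asarray(image, dtype=float), sigma)
    gy, gx = np.gradient(smoothed)
    j11 = gaussian_smooth(gx * gx, rho)
    j12 = gaussian_smooth(gx * gy, rho)
    j22 = gaussian_smooth(gy * gy, rho)
    # For a symmetric 2x2 matrix the gap between the two eigenvalues is
    # sqrt((J22 - J11)^2 + 4 * J12^2).
    return np.sqrt((j22 - j11) ** 2 + 4.0 * j12 ** 2)


def local_structure_weight(feature, h=1.0):
    """Exponential stretch of the min-max normalized edge feature."""
    d_min, d_max = feature.min(), feature.max()
    # The tiny constant only guards against a perfectly flat feature map.
    return np.exp(h * (feature - d_min) / (d_max - d_min + 1e-12))
```

The weight ranges from 1 in flat regions to about $e^{h}$ on the strongest edge, so a larger stretching parameter penalizes edge entries more heavily. The same function can be applied to the feature after it has been rearranged into a patch-tensor.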
IV Reweighted infrared patch-tensor model and its solution
IV-A Reweighted infrared patch-tensor model
Computing time is also a major concern in infrared small target detection. Generally, the stopping criterion of an RPCA algorithm is that the reconstruction error falls below a certain small value. To meet this criterion, WIPT needs dozens of iterations, which is still time-consuming. An interesting phenomenon we find is that the number of non-zero entries in the target patch-tensor ceases to grow before the algorithm converges. In fact, the true target occupies only the few brightest entries of the target image, and in the second half of the iterations their values barely change. Therefore, considering the particularity of the infrared small target, it is reasonable to replace the reconstruction error with the sparsity of the target patch-tensor in our proposed model: the algorithm stops iterating once the number of non-zero entries ceases to grow. The sparsity of the target patch-tensor then becomes critical for reducing the computing time. We hope that the number of non-zero entries keeps decreasing as the iteration goes on, leaving the final target image the sparsest. Unfortunately, the real situation in IPT and WIPT is contrary to this expectation: the target image deteriorates as the algorithm converges. This motivates us to take a sparsity enhancing approach to solve this problem.
Candès et al. proposed a reweighted $\ell_1$ minimization for enhancing sparsity. By minimizing a sequence of weighted $\ell_1$ norms, a significant performance improvement is obtained in sparse recovery. Inspired by it, many improved RPCA models have been proposed [39, 40, 41, 42]. Motivated by these state-of-the-art models, we adopt a similar reweighted scheme for the values in the target patch-tensor. Large weights discourage non-zero entries, and small weights preserve non-zero entries. The sparsity enhancement weight is defined as follows
$$\mathcal{W}_{SW}^{k+1}(i, j, p) = \frac{1}{\lvert \mathcal{T}^{k}(i, j, p) \rvert + \epsilon},$$
where $\epsilon$ is a preset constant to avoid division by zero. Then, besides the relative error $\|\mathcal{F} - \mathcal{B}^{k+1} - \mathcal{T}^{k+1}\|_F / \|\mathcal{F}\|_F$, we can add a new stopping condition that counts the non-zero entries, namely $\|\mathcal{T}^{k+1}\|_0 = \|\mathcal{T}^{k}\|_0$. With the help of this empirical observation, the computing time can be largely decreased, as illustrated in fig. 10 and table IV.
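The reweighting step and the two stopping conditions can be written compactly. The sketch below assumes the standard reciprocal form of the reweighted $\ell_1$ scheme; the function names and the default values of `eps` and `tol` are hypothetical.

```python
import numpy as np


def sparsity_weight(target, eps=0.01):
    """Reciprocal reweighting: large entries get small weights and vice versa."""
    return 1.0 / (np.abs(target) + eps)


def stop_iteration(observed, background, target, prev_nonzeros, tol=1e-7):
    """Return (stop, nonzeros) using the relative error and the sparsity count."""
    residual = observed - background - target
    # np.linalg.norm on an N-D array without `ord` is the Frobenius norm.
    rel_err = np.linalg.norm(residual) / np.linalg.norm(observed)
    nonzeros = int(np.count_nonzero(target))
    stop = bool(rel_err < tol or nonzeros == prev_nonzeros)
    return stop, nonzeros
```

In a solver loop, `prev_nonzeros` would hold the count returned by the previous call, so the iteration ends as soon as the support of the target stops changing in size.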
Another intrinsic characteristic that both Ref.  and Ref.  neglect is the fact that the small target is always brighter than its neighborhood environment in infrared images due to the physical imaging mechanism . Therefore, besides the sparsity constraint [44, 45] of the target patch-tensor, it is reasonable to assume that all the entries in are non-negative. To this end, we incorporate this target non-negative prior into the reweighted IPT model via rewriting eq. 12 as follows
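where the non-negative prior is encoded by an indicator function. We combine the local structure weight and the sparsity enhancement weight to get the adaptive weight as follows
$$\mathcal{W} = \mathcal{W}_{LS} \odot \mathcal{W}_{SW},$$
where $\odot$ denotes the element-wise product. Finally, we generalize the proposed IPT model and WIPT model to a novel reweighted infrared patch-tensor model (RIPT) as follows
$$\min_{\mathcal{B}, \mathcal{T}} \sum_{i=1}^{3} \|\mathbf{B}_{(i)}\|_* + \lambda \|\mathcal{W} \odot \mathcal{T}\|_1 \quad \text{s.t.} \quad \mathcal{B} + \mathcal{T} = \mathcal{F}. \tag{15}$$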
IV-B Solution of RIPT model
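In this subsection, we show how to solve the proposed RIPT model as a reweighted robust tensor recovery problem via the Alternating Direction Method of Multipliers (ADMM). The augmented Lagrangian function of eq. 15 is defined as
$$\mathcal{L}_{\mu}(\mathcal{B}_i, \mathcal{T}, \mathcal{Y}_i) = \sum_{i=1}^{3} \left( \|\mathbf{B}_{i,(i)}\|_* + \frac{1}{2\mu} \|\mathcal{B}_i + \mathcal{T} - \mathcal{F}\|_F^2 - \langle \mathcal{Y}_i, \mathcal{B}_i + \mathcal{T} - \mathcal{F} \rangle \right) + \lambda \|\mathcal{W} \odot \mathcal{T}\|_1,$$
where $\mathcal{B}_i$ ($i = 1, 2, 3$) are auxiliary copies of the background patch-tensor, one for each unfolding mode, $\mathcal{Y}_i$ are the Lagrange multiplier tensors, and $\mu$ is a positive penalty scalar. ADMM decomposes the minimization of $\mathcal{L}_{\mu}$ into two subproblems that minimize over $\mathcal{B}_i$ and $\mathcal{T}$, respectively. More specifically, the iterations of ADMM go as follows:
Updating $\mathcal{B}_i$ with the other terms fixed:
$$\mathcal{B}_i^{k+1} = \mathrm{fold}_i\left( \mathcal{D}_{\mu}\left( (\mathcal{F} - \mathcal{T}^{k} + \mu \mathcal{Y}_i^{k})_{(i)} \right) \right).$$
Updating $\mathcal{T}$ with the other terms fixed:
$$\mathcal{T}^{k+1} = \mathcal{S}_{\mu \lambda \mathcal{W}^{k} / 3}\left( \frac{1}{3} \sum_{i=1}^{3} \left( \mathcal{F} - \mathcal{B}_i^{k+1} + \mu \mathcal{Y}_i^{k} \right) \right). \tag{21}$$
Updating the multiplier $\mathcal{Y}_i$ with the other terms fixed:
$$\mathcal{Y}_i^{k+1} = \mathcal{Y}_i^{k} - \frac{1}{\mu} \left( \mathcal{B}_i^{k+1} + \mathcal{T}^{k+1} - \mathcal{F} \right).$$
From eq. 21, it can be observed that the weighting parameter $\lambda$ determines the soft-threshold, controlling the trade-off between the target patch-tensor and the background patch-tensor. Therefore, our element-wise adaptive weight tensor $\mathcal{W}$ can simultaneously preserve the small target and suppress the strong edges. Finally, the solution of the proposed model is given in algorithm 1.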
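To make the iteration concrete, the following is a minimal NumPy sketch of one way such an ADMM loop could be organized. It is not the released implementation and rests on several assumptions: one background copy per unfolding mode, the non-negativity of the target enforced by clipping the shrinkage result at zero, the adaptive weight taken as the local structure weight divided by the magnitude of the current target plus `eps`, an all-zero initial target, and the mean of the three copies returned as the background. The name `ript_admm` and all default values are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` unfolding (0-based)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor of the given shape."""
    lead = (shape[mode],) + tuple(s for k, s in enumerate(shape) if k != mode)
    return np.moveaxis(matrix.reshape(lead), 0, mode)


def svt(matrix, tau):
    """Singular value thresholding."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def ript_admm(observed, w_ls, lam, mu, eps=0.01, tol=1e-7, max_iter=200):
    """Split `observed` into (background, target) patch-tensors."""
    observed = np.asarray(observed, dtype=float)
    n_modes = observed.ndim
    target = np.zeros_like(observed)
    backgrounds = [observed.copy() for _ in range(n_modes)]
    multipliers = [np.zeros_like(observed) for _ in range(n_modes)]
    weight = np.asarray(w_ls, dtype=float).copy()
    background = observed.copy()
    prev_nonzeros = -1
    for _ in range(max_iter):
        # Background copies: threshold the singular values of each unfolding.
        for i in range(n_modes):
            residual = observed - target + mu * multipliers[i]
            backgrounds[i] = fold(svt(unfold(residual, i), mu), i, observed.shape)
        # Target: weighted soft-threshold of the averaged residual, clipped at
        # zero (assumed way of imposing the non-negative prior).
        mean_res = sum(observed - b + mu * y
                       for b, y in zip(backgrounds, multipliers)) / n_modes
        target = np.maximum(mean_res - mu * lam * weight / n_modes, 0.0)
        # Multipliers: dual ascent on the constraint B_i + T = F.
        for i in range(n_modes):
            multipliers[i] = multipliers[i] - (backgrounds[i] + target - observed) / mu
        # Adaptive weight: local structure weight times sparsity weight.
        weight = w_ls / (np.abs(target) + eps)
        background = sum(backgrounds) / n_modes
        rel_err = (np.linalg.norm(observed - background - target)
                   / np.linalg.norm(observed))
        nonzeros = int(np.count_nonzero(target))
        if rel_err < tol or nonzeros == prev_nonzeros:
            break
        prev_nonzeros = nonzeros
    return background, target
```

With an all-ones `w_ls` the loop reduces to a reweighted IPT solver, which is a convenient way to isolate the effect of the local structure weight.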
IV-C Detection Procedure
In fig. 4, we present the whole procedure of detecting the infrared small target via the model proposed in this paper. The detailed steps are as follows:
Given an infrared image $f_F$, its local structure feature map is calculated via eq. 9.
The original patch-tensor $\mathcal{F}$ and the local structure weight patch-tensor $\mathcal{W}_{LS}$ are constructed from $f_F$ and its feature map.
algorithm 1 is performed to decompose the patch-tensor $\mathcal{F}$ into the background patch-tensor $\mathcal{B}$ and the target patch-tensor $\mathcal{T}$.
The background image $f_B$ and the target image $f_T$ are reconstructed from the background patch-tensor $\mathcal{B}$ and the target patch-tensor $\mathcal{T}$, respectively. For implementation convenience, we adopt the uniform average of estimators (UAE) reprojection scheme.
The target is segmented out in the same way as in the cited prior work. The adaptive threshold is determined by:
$$t_{up} = \max(v_{\min}, \mu_T + k \sigma_T),$$
where $\mu_T$ and $\sigma_T$ are the average and standard deviation of the target image $f_T$. $k$ and $v_{\min}$ are constants determined empirically.
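The last two steps of the procedure can be sketched as follows. The reconstruction assumes that overlapping patch estimates are simply averaged and that the patches were extracted in row-major order with the given step; the threshold assumes the form "maximum of a floor value and mean plus a multiple of the standard deviation". The function names and the default values of `k` and `v_min` are hypothetical.

```python
import numpy as np


def reconstruct_image(tensor, image_shape, step):
    """Average overlapping patches of an (I, J, P) tensor back into an image."""
    patch_h, patch_w, _ = tensor.shape
    total = np.zeros(image_shape)
    count = np.zeros(image_shape)
    index = 0
    for r in range(0, image_shape[0] - patch_h + 1, step):
        for c in range(0, image_shape[1] - patch_w + 1, step):
            total[r:r + patch_h, c:c + patch_w] += tensor[:, :, index]
            count[r:r + patch_h, c:c + patch_w] += 1
            index += 1
    # Pixels never covered by a patch keep the value zero.
    return total / np.maximum(count, 1)


def segment_target(target_image, k=10.0, v_min=0.0):
    """Binary mask of pixels above max(v_min, mean + k * std)."""
    threshold = max(v_min, target_image.mean() + k * target_image.std())
    return target_image > threshold
```

Applied to a target image that is zero except for one bright pixel, `segment_target` keeps only that pixel, since the mean and standard deviation of such an image are both small.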
V Experimental validation
To fully evaluate the proposed algorithm, we conduct a series of experiments using images of various scenarios and include ten state-of-the-art methods for comparison.
V-A Experimental setup
Datasets. We test the proposed model on a wide range of real infrared images covering different scenarios, as illustrated in fig. 5, from flat backgrounds with salient targets to complex backgrounds with heavy clutter and extremely dim targets. All targets are labeled with red boxes. Since some targets are so dim that they can hardly be observed directly by the human eye, we enlarge the demarcated area. Because the biggest difficulty of current infrared small target detection is detecting very dim targets under strong clutter, good detection performance on extremely complex images is more convincing than satisfactory results on relatively simple images. Therefore, in the following experiments, we focus mainly on the datasets with complex scenes, to which fig. 5(a) – (d) and (l) belong. The detailed characteristics of these five sequences are presented in table II.
|# Frame||Image Resolution||Target Shape||Target characteristics||Background characteristics|
Baselines and Parameter settings. The proposed algorithm is compared with ten state-of-the-art solutions, including three filtering based methods (Max-Median, Top-Hat, TDLMS), three HVS based methods (PFT, MPCM, WLDM), and four recently developed low-rank methods (IPI, PRPCA, WIPI, NIPPS). table III summarizes all the methods involved in the experiments and their detailed parameter settings. All the low-rank methods, i.e., IPI, PRPCA, WIPI, NIPPS, IPT, and RIPT, are solved via ADMM. All the algorithms are implemented in MATLAB 2016b on a PC with a 3.4 GHz CPU and 4 GB of RAM. The source code of our method is publicly available at https://github.com/YimianDai/DENTIST.
|No.||Methods||Acronyms||Parameter settings|
|1||Max-Median filter ||Max-Median||Support size:|
|2||Top-Hat method ||Top-Hat||Structure shape: square, structure size:|
|3||Phase spectrum of Fourier Transform ||PFT|
|4||Multiscale Patch-based Contrast Measure ||MPCM|
|5||Weighted Local Difference Measure ||WLDM|
|6||Two-Dimensional Least Mean Square filter ||TDLMS||Support size: , step size:|
|7||Infrared Patch-Image Model ||IPI||Patch size: , sliding step: , , ,|
|8||Patch-level RPCA method ||PRPCA||Patch size: , sliding step: , , ,|
|9||Weighted Infrared Patch-Image Model ||WIPI||Patch size: , sliding step: , smoothing parameter ,|
|10||Non-negative IPI model via Partial Sum minimization of singular values ||NIPPS||Patch size: , sliding step: , , ,|
|11||Infrared Patch-Tensor Model||IPT||Patch size: , sliding step: , ,|
|12||Reweighted Infrared Patch-Tensor Model||RIPT||Patch size: , sliding step: , , , , ,|
Evaluation Metrics. For a comprehensive evaluation, four metrics, namely the local signal to noise ratio gain (LSNRG), background suppression factor (BSF), signal to clutter ratio gain (SCRG), and receiver operating characteristic (ROC) curve, are adopted to compare the background suppression performance. LSNRG measures the local signal to noise ratio (LSNR) gain, which is defined as
$$\mathrm{LSNRG} = \frac{\mathrm{LSNR}_{out}}{\mathrm{LSNR}_{in}},$$
where $\mathrm{LSNR}_{in}$ and $\mathrm{LSNR}_{out}$ are the LSNR values before and after background suppression. LSNR is given as $\mathrm{LSNR} = P_T / P_B$, where $P_T$ and $P_B$ are the maximum grayscales of the target and neighborhood, respectively. BSF measures the background suppression quality using the standard deviation of the neighborhood region. It is defined as:
$$\mathrm{BSF} = \frac{\sigma_{in}}{\sigma_{out}},$$
where $\sigma_{in}$ and $\sigma_{out}$ are the standard deviations of the background neighborhood before and after background suppression. The most widely used SCRG is defined as the ratio of the signal-to-clutter ratio (SCR) after and before processing:
$$\mathrm{SCRG} = \frac{\mathrm{SCR}_{out}}{\mathrm{SCR}_{in}},$$
where SCR represents the difficulty of detecting the infrared small target and is defined by $\mathrm{SCR} = \lvert \mu_t - \mu_b \rvert / \sigma_b$. $\mu_t$ is the average target grayscale, and $\mu_b$ and $\sigma_b$ are the average grayscale and standard deviation of the neighborhood region. For all three metrics, the higher the value, the better the background suppression performance of the detection method. All three metrics are computed in a local region, as illustrated in fig. 6. The target size is $a \times b$, and $d$ is the neighborhood width, which is fixed in this paper.
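The three local metrics can be computed directly from the target box and its surrounding band. The sketch below assumes that each gain is the plain ratio of the value after processing to the value before processing (for BSF, the ratio of the standard deviation before to that after), and that the neighborhood is the band of width `d` around the target box, clipped at the image border. All function and argument names are hypothetical.

```python
import numpy as np


def local_pixels(image, top, left, a, b, d):
    """Return (target pixels, neighborhood pixels) of the local region."""
    r0, c0 = max(top - d, 0), max(left - d, 0)
    region = np.asarray(image, dtype=float)[r0:top + a + d, c0:left + b + d]
    mask = np.zeros(region.shape, dtype=bool)
    mask[top - r0:top - r0 + a, left - c0:left - c0 + b] = True
    return region[mask], region[~mask]


def scr(image, top, left, a, b, d):
    """Signal-to-clutter ratio of the target against its neighborhood."""
    target, neighborhood = local_pixels(image, top, left, a, b, d)
    return abs(target.mean() - neighborhood.mean()) / neighborhood.std()


def suppression_gains(original, processed, top, left, a, b, d):
    """Return (LSNRG, BSF, SCRG) for one target location."""
    t_in, n_in = local_pixels(original, top, left, a, b, d)
    t_out, n_out = local_pixels(processed, top, left, a, b, d)
    lsnrg = (t_out.max() / n_out.max()) / (t_in.max() / n_in.max())
    bsf = n_in.std() / n_out.std()
    scrg = scr(processed, top, left, a, b, d) / scr(original, top, left, a, b, d)
    return lsnrg, bsf, scrg
```

When the neighborhood is suppressed to exactly zero, the denominators vanish and the ratios become infinite, so an implementation used for reporting would need to handle that case explicitly.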
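Among all the existing metrics, the detection probability $P_d$ and the false-alarm rate $F_a$ are the key performance indicators, which are defined as follows
$$P_d = \frac{\text{number of true detections}}{\text{number of actual targets}}, \qquad F_a = \frac{\text{number of false detections}}{\text{number of images}}.$$
The ROC curve shows the trade-off between the true detections and false detections.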
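A ROC curve of this kind can be traced by sweeping the segmentation threshold over a set of target images. The sketch below assumes one labeled target per image, counts a detection as true when it lies within `tolerance` pixels of the labeled position, and counts every other above-threshold pixel as one false detection. These matching rules and the name `roc_points` are illustrative assumptions rather than the evaluation protocol of the paper.

```python
import numpy as np


def roc_points(target_images, target_positions, thresholds, tolerance=2):
    """Return one (false-alarm rate, detection probability) pair per threshold."""
    points = []
    for t in thresholds:
        true_det, false_det = 0, 0
        for image, (row, col) in zip(target_images, target_positions):
            rows, cols = np.nonzero(image > t)
            near = (np.abs(rows - row) <= tolerance) & (np.abs(cols - col) <= tolerance)
            # At most one true detection per labeled target.
            true_det += int(near.any())
            false_det += int((~near).sum())
        fa = false_det / len(target_images)
        pd = true_det / len(target_positions)
        points.append((fa, pd))
    return points
```

Because each threshold yields a single point, a coarse threshold grid produces a stepped curve.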
V-B Validation of the proposed method
In this subsection, we take a closer look at the proposed method by validating its robustness against various scenes and noisy cases. Finally, the roles of the patch-tensor, local structure weight, and sparsity enhancement weight are examined in depth to evaluate each prior individually.
V-B1 Robustness to various scenes
In fig. 7, we show the separated target images for the images of fig. 5. It can be clearly seen in fig. 7 that the background clutter is completely wiped out, leaving the target as the sole component in the target image. Since fig. 5 contains many different scenarios, it is fair to say that the proposed method is quite robust to various scenes.
V-B2 Robustness to noise
Noise is another key influence factor. In fig. 8, we evaluate the proposed method’s performance in the case of noise with different levels. When the noise standard deviation is , the proposed method could well enhance the targets and suppress the clutters and noise. As the noise standard deviation increases to , RIPT still detects the target in fig. 8(m) – (n) and (q) – (r), but fails in fig. 8(o) – (p). Nevertheless, this failure is acceptable, since the target is totally overwhelmed by the noise in fig. 8(o) – (p) (see the enlarged box). Therefore, the noise influence depends not only on the intensity of the noise itself but also on the original contrast of the target. As long as the polluted target can maintain a weak contrast like fig. 8(c) – (d), the proposed method is still able to detect it.
V-B3 Roles of components in the proposed model
To further understand the effects of the components in the proposed RIPT model, we evaluate each prior individually to investigate how it influences the final detection performance. The ROC curves of IPI, IPT, IPT with sparsity enhancement weight (SIPT), WIPT, and RIPT for Sequence 1 – 4 are given in fig. 9, leading to the following observations. (1) The four patch-tensor based methods outperform the state-of-the-art IPI method, which demonstrates that the patch-tensor model, involving the mode-1 and mode-2 unfolding matrices, does contribute to the final detection performance. (2) By comparing WIPT and RIPT with IPT and SIPT, we see that incorporating the local structure prior significantly improves the detection probability. (3) Although the sparsity enhancement weight does not contribute to the final detection performance, it significantly reduces the iteration number, as illustrated in fig. 10. These observations indicate that the introduced priors are effective and, when combined, lead to excellent performance, as reported in the next subsections.
V-C Algorithm Complexity and Computational Time
The proposed model is solved via ADMM, whose convergence has been proved [51, 52]. Therefore, our solving algorithm is guaranteed to converge. The algorithm complexity and computational time of the various methods for fig. 5(a) are given in table IV, where the complexities are expressed in terms of the image size and of the numbers of rows and columns of the patch-image or unfolding matrix. Although the computational complexities of these methods seem the same, their computing times differ a lot. For the filtering and HVS based methods, the difference in computing time lies in whether the code can be vectorized. For the low-rank methods, the dominant factor is the iteration number. It can be seen from the data in table IV that the low-rank methods are generally slower than the filtering and HVS based methods. Nevertheless, considering that low-rank methods can handle more difficult scenes, this trade-off is acceptable. Among the low-rank methods, RIPT costs the least time. The underlying reason is that both the local structure weight and the sparsity enhancement weight help to reduce the iteration number. In addition, unlike the weight in WIPI, the time for constructing the weight is negligible in RIPT.
V-D Parameters analysis
For the proposed model, the related parameters, such as the patch size, sliding step, weight stretching parameter $h$, weighting parameter $\lambda$, and penalty factor $\mu$, are all important factors that usually affect how well the model fits real data. Therefore, better performance can be obtained by finely tuning these parameters. Nevertheless, the optimal values are always related to the infrared image content. In fig. 11, we give the ROC curves for different model parameters on Sequence 1 – 4 to evaluate their influence. Each parameter is tuned to a locally optimal value with the other parameters fixed. The stepped shape of our ROC curves might seem a bit odd. It arises because we have adopted a much larger weighting parameter $\lambda$ than in normal RPCA-based foreground-background separation tasks, in order to better fit the actual situation of single-frame infrared small target detection.
V-D1 Patch size
The patch size not only has a large impact on the separation, but also influences the computational complexity. The matrix sizes of the mode-1 and mode-2 unfoldings of the patch-tensor are $I \times JP$ and $J \times IP$, respectively; the matrix size of the mode-3 unfolding is $P \times IJ$. Obviously, a smaller patch size leads to a smaller computational complexity. On the one hand, we want a larger patch size to ensure that the target is sparse enough. On the other hand, a larger patch size reduces the correlation between the non-local patches, which degrades the separation results. To seek a balance between a low computational burden, target sparsity, and background correlation, we vary the patch size from to with ten intervals and provide the ROC curves in the first row of fig. 11. From the ROC curves, we can draw the following conclusions. Firstly, a smaller patch size raises false alarms more easily, while a larger patch size may lead to a relatively lower detection probability, which confirms our above analysis of the patch size. Secondly, the proposed RIPT method is not very sensitive to the patch size. The detection result of the patch size among – is acceptable. Thirdly, seems a good choice for Sequence 1 – 4 since it achieves the best performance in ROC.
V-D2 Sliding step
The sliding step influences the patch-tensor size as well. To reduce the computational complexity, we prefer a larger sliding step, which means smaller matrices on which to perform SVD. Nevertheless, a larger sliding step also reduces the redundancy of the original patch-tensor and undermines the final detection results, since our proposed model is based on the non-local repetition of correlated patches. To investigate its influence, we vary the sliding step from to with two intervals. The results are displayed in the second row of fig. 11. It can be observed that the ROC curve of a small sliding step like tends to have a sharper shape, but its overall detection probability remains relatively low. The best value for the sliding step is among to ; here we pick . In addition, by comparing the first row with the second row of fig. 11, we can conclude that the algorithm is quite robust to the variation of the step length.
V-D3 Weight stretching parameter
It controls the local structure weight’s suppression degree to the clutter edges. We vary from to in the experiment and illustrate the ROC curves in the third row of fig. 11. Generally, we would like a larger which suppresses the undesirable non-target components thoroughly. Nevertheless, since the target-clutter distinguishing scheme is not always perfect, an overlarge would also wipe out some targets. A typical example is the different performance of or among four sequences. For Sequence 2 and 3, or achieves the best performance. But, they perform the worst for Sequence 1 and 4. It is because the target moves along the cloud edge in many frames of Sequence 1 and 4, and an overlarge would easily mistake the target as the edge and suppress it completely, resulting in a lower detection probability. On the contrary, a smaller might preserve the small target, but it also retains some non-target components, making the false-alarm ratio relatively high. For Sequence 2 and 3, when the detection probability is fixed, the false-alarm ratio of is the largest. In order to seek a balance, we set the optimal as in the following experiment.
V-D4 Weighting parameter
Despite the use of the local structure weight, fine tuning of is still of great importance. We show the effects of in the fourth row of fig. 11. Since is set as in our model, instead of directly varying , we vary from to . From the illustration, it can be observed that a large does keep the false-alarm ratio quite low. For example, the ROC curves of and for Sequence 2 are straight line segments. But their detection probabilities are also low, because many dim targets are suppressed by the overlarge threshold. On the contrary, when the detection probability is fixed, the false-alarm ratio of is always higher than the others, suggesting that a too small is not a good choice either.
V-D5 Penalty factor
It is precisely the shrinking threshold of eq. 21, which influences the low-rank property of the background patch-tensor. With a smaller , more details are preserved in the background patch-tensor. Thus fewer non-target components are left in the target patch-tensor. Nevertheless, the small target might be preserved in the background patch-tensor as well, resulting in no target in the target image. On the contrary, a larger would lead to more non-target components lying in the target patch-tensor. Thus, it is necessary to choose an appropriate value for to keep the balance between detection probability and false-alarm ratio. Since we set , instead of varying directly, we investigate the influence of the penalty factor on Sequence 1 – 4 by varying from to . The results are shown in the last row of fig. 11, from which we can observe that neither an overlarge nor a too small is an optimal choice, and the best value for our four sequences is about .
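The effect of a shrinking threshold on the retained rank can be sketched as follows. Eq. 21 is not reproduced in this section, so the code assumes the standard soft-thresholding of singular values (as in singular value thresholding) applied to a single matrix unfolding; the function name `singular_value_threshold` and the toy matrix are hypothetical, and how the threshold is derived from the penalty factor is left out.

```python
import numpy as np


def singular_value_threshold(matrix, tau):
    """Soft-threshold the singular values of `matrix` by `tau` (assumed form)."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # singular values below tau vanish
    return (U * s_shrunk) @ Vt


# Toy unfolding with singular values 5, 3 and 1 (illustrative only).
unfolding = np.diag([5.0, 3.0, 1.0])
for tau in (0.5, 2.0, 4.0):
    background = singular_value_threshold(unfolding, tau)
    print(tau, np.linalg.matrix_rank(background))
```

With a small threshold all three components survive in the low-rank part, whereas a large threshold keeps only the dominant one and pushes the rest into the residual, which mirrors the balance between background detail and target leakage discussed above.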
V-E Comparison with State-of-the-Art Methods
In this subsection, we first compare the proposed model with the other state-of-the-art methods on the ability of clutter suppression. The target images separated by the twelve tested methods for four representative frames of Sequence 1 – 4 in fig. 5 are shown in fig. 12 – fig. 15. It can be seen that the classical Max-Median filter does enhance the tiny targets in fig. 12(b) - fig. 15(b). Nevertheless, many non-target pixels are enhanced as well, especially in fig. 13(b) and fig. 14(b), which would raise many false alarms. In fig. 12(a) - fig. 15(a) produced by TDLMS, non-target isolated points are not enhanced, but the cloud edges are highlighted, making them much brighter than the small target. Since the given target size matches the real target size exactly, the Top-Hat operator succeeds in enhancing the target region. If the sizes do not match, the Top-Hat operator is likely to lose the target. Whether or not the given and real target sizes match, Top-Hat cannot suppress the background clutter well: many strong clutter components remain in the resulting images, as illustrated in fig. 12(c) - fig. 15(c). Although PFT can retain the target to a certain extent, the target is not necessarily the brightest and there are also many non-target salient residuals, as shown in fig. 12(d) - fig. 15(d). MPCM and WLDM fail to achieve good results because they suffer from a phenomenon we call the rare structure effect, which is caused by the inaccuracy of the local dissimilarity measure and often happens when the target is extremely dim. We discuss this phenomenon further in the next subsection.
In fact, the common and intrinsic reason behind the unsatisfactory performance of all these six methods lies in their pre-set assumption about the target shape, namely a hot spot brighter than its neighborhood. When the target is too dim to maintain a significant contrast over non-target components, as in fig. 5(a) – (d), they might not perform as well as they usually do.
The last six tested methods are all low-rank based methods. Compared with the above six methods, their results contain fewer background details. Relatively speaking, the results of PRPCA and WIPI are not as good as those of the other four methods. Different from the other low-rank based methods, which all build their low-rank assumptions on a data structure composed of patches, PRPCA supposes that the individual patch is low-rank. Thus in PRPCA, an individual RPCA process is applied to each patch, and all the separated target patches are then synthesized into a target image. By comparing fig. 13(g) and fig. 15(g) with fig. 13(h) and fig. 15(h), it can be seen that fewer edges are left by IPI than by PRPCA. This is because a structure that is rare within a patch is not necessarily rare in the patch-image, due to the redundancy of the whole image. Therefore, the results of IPI and IPT are much better than those of PRPCA. As to WIPI, considering that the targets in Sequence 1 – 4 are much dimmer than those in Ref. , it is fair to say that the steering kernel based patch-level weight is still not robust enough to handle all of the complex infrared backgrounds. From fig. 12(l) and fig. 15(l), we can see that with the help of the local structure weight, the non-target components are suppressed completely by our proposed model. For example, the cloud residuals in fig. 12(g) by IPI are brighter than the target, while in fig. 12(l) they are wiped out thoroughly. Based on the above comparisons, it is fair to conclude that the proposed RIPT model achieves the most satisfying performance in infrared background suppression among the twelve tested methods.
For infrared small target detection, the biggest difficulty is the interference of complex backgrounds. These undesirable background clutters raise the false alarm rates, and might even overwhelm the dim targets. Therefore, the ability to suppress the background clutter is a major concern in evaluating an infrared small target detection method. Quantitative evaluation indices are also widely used to assess this ability. The experimental data of all twelve tested methods for fig. 5(a) – (d) are listed in table V. The gray-scale of every separated target image is rescaled to the range . It can be observed that our proposed method gets the highest scores on all indices and all tested images. Different from the filtering based methods, for the low-rank based methods, Inf, namely infinity, is quite common, which just means that the target neighborhood completely shrinks to zero. In addition, it should be noted that high scores in these three quantitative indices merely reflect good suppression performance in a local region, and do not necessarily imply a good global suppression ability.
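How an index can become Inf is easy to see in a small sketch. The three indices are not restated in this section, so the code below assumes an index of the ratio type with the local background standard deviation of the output in the denominator, written here as a background suppression factor; the function name and the toy neighborhoods are hypothetical.

```python
import numpy as np


def background_suppression_factor(neighborhood_in, neighborhood_out):
    """Ratio of local standard deviations before and after separation.

    Assumed definition for illustration; returns infinity when the output
    neighborhood is perfectly flat (for example, all zeros).
    """
    sigma_in = float(np.std(neighborhood_in))
    sigma_out = float(np.std(neighborhood_out))
    if sigma_out == 0.0:
        return float("inf")
    return sigma_in / sigma_out


before = np.array([[1.0, 2.0], [3.0, 4.0]])   # toy local background
filtered = 0.5 * before                        # clutter attenuated, not removed
separated = np.zeros_like(before)              # neighborhood shrunk to zero

print(background_suppression_factor(before, filtered))
print(background_suppression_factor(before, separated))
```

A filtering method that only attenuates the clutter yields a finite ratio, while a low-rank method that sets the whole neighborhood to zero drives the denominator to zero and the index to infinity.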
To further reveal the advantage of the proposed model, we display the ROC curves of Sequence 1 and Sequence 3 – 5 for comparison in fig. 16. The most interesting point is that the performance of the state-of-the-art WLDM on Sequence 1, 3, and 4 differs greatly from its performance on Sequence 5. For Sequence 5, WLDM performs very well, but it fails on Sequence 1 – 4. We believe the reason lies in the rare structure effect, which is an inherent problem of local contrast methods. NIPPS performs slightly better than the IPI model. Finally, the proposed algorithm achieves the highest detection probability for the same false-alarm ratio, meaning that the proposed RIPT model performs better than the other models.
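For readers who want to reproduce curves of this kind, the following is a minimal sketch of how ROC points might be computed from separated target images. It assumes the definitions commonly used in this literature, namely detection probability as detected targets over actual targets and false-alarm ratio as falsely detected pixels over total pixels, and it assumes one target per frame; the function name `roc_points`, the `radius` tolerance, and the toy frames are hypothetical.

```python
import numpy as np


def roc_points(target_images, target_positions, thresholds, radius=2):
    """Return one (false_alarm_ratio, detection_probability) pair per threshold.

    Assumes exactly one ground-truth target per frame, given as (row, col).
    """
    total_pixels = sum(img.size for img in target_images)
    points = []
    for t in thresholds:
        detected, false_pixels = 0, 0
        for img, (r, c) in zip(target_images, target_positions):
            mask = img > t
            # Pixels within `radius` of the ground truth count as the target.
            window = np.zeros_like(mask)
            window[max(r - radius, 0):r + radius + 1,
                   max(c - radius, 0):c + radius + 1] = True
            detected += int(np.any(mask & window))
            false_pixels += int(np.count_nonzero(mask & ~window))
        points.append((false_pixels / total_pixels,
                       detected / len(target_images)))
    return points


# Two toy 10x10 target images: a bright target plus clutter, and a dim target.
frame_a = np.zeros((10, 10))
frame_a[5, 5] = 0.9   # target
frame_a[0, 0] = 0.5   # residual clutter
frame_b = np.zeros((10, 10))
frame_b[2, 7] = 0.3   # dim target

for fa, pd in roc_points([frame_a, frame_b], [(5, 5), (2, 7)], [0.2, 0.4, 0.8]):
    print(fa, pd)
```

Raising the threshold removes the clutter pixel only after the dim target has already been lost, which is the kind of behavior that keeps a curve's detection probability low at small false-alarm ratios.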
To further suppress the strong edges while preserving the spatial correlation, a reweighted infrared patch-tensor model for small target detection is developed in this paper, combining the non-local redundant prior and the local structure prior. A local structure weight is designed based on the structure tensor and serves as an edge indicator in the weighted model. In addition, a sparsity enhancement scheme is adopted to keep the target image from being contaminated. The target-background separation task is then modeled as a reweighted robust tensor recovery problem, which can be efficiently solved via ADMM. Detailed experimental results show that our proposed model is robust to various scenarios and obtains the clearest separated target images compared with the state-of-the-art target-background separation methods.
The authors would like to thank the editor and anonymous reviewers for their helpful comments and suggestions. This work was supported in part by the National Natural Science Foundation of China under Grant no. 61573183, and by the Open Research Fund of Key Laboratory of Spectral Imaging Technology, Chinese Academy of Sciences under Grant no. LSIT201401.
-  S. Kim and J. Lee, “Scale invariant small target detection by optimizing signal-to-clutter ratio in heterogeneous background for infrared search and track,” Pattern Recognition, vol. 45, no. 1, pp. 393 – 406, 2012.
-  Z. Liu, F. Zhou, X. Chen, X. Bai, and C. Sun, “Iterative infrared ship target segmentation based on multiple features,” Pattern Recognition, vol. 47, no. 9, pp. 2839 – 2852, 2014.
-  X. Bai, Z. Chen, Y. Zhang, Z. Liu, and Y. Lu, “Infrared ship target segmentation based on spatial information improved FCM,” IEEE Transactions on Cybernetics, vol. 46, no. 12, pp. 3259–3271, Dec 2016.
-  I. S. Reed, R. M. Gagliardi, and L. B. Stotts, “Optical moving target detection with 3-d matched filtering,” IEEE Transactions on Aerospace and Electronic Systems, vol. 24, no. 4, pp. 327–336, July 1988.
-  M. Li, T. Zhang, W. Yang, and X. Sun, “Moving weak point target detection and estimation with three-dimensional double directional filter in ir cluttered background,” Optical Engineering, vol. 44, no. 10, pp. 107 007–107 007–4, 2005.
-  K. A. Melendez and J. W. Modestino, “Spatiotemporal multiscan adaptive matched filtering,” Proceedings of SPIE, vol. 2561, pp. 51–65, 1995.
-  J. Zheng, T. Su, H. Liu, G. Liao, Z. Liu, and Q. H. Liu, “Radar high-speed target detection based on the frequency-domain deramp-keystone transform,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 1, pp. 285–294, Jan 2016.
-  X. Guo, J. Wu, Z. Wu, and B. Huang, “Parallel computation of aerial target reflection of background infrared radiation: Performance comparison of OpenMP, OpenACC, and CUDA implementations,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 4, pp. 1653–1662, April 2016.
-  M. Hadhoud and D. Thomas, “The two-dimensional adaptive LMS (TDLMS) algorithm,” IEEE Transactions on Circuits and Systems, vol. 35, no. 5, pp. 485–494, May 1988.
-  S. D. Deshpande, M. H. Er, R. Venkateswarlu, and P. Chan, “Max-mean and max-median filters for detection of small targets,” in SPIE’s International Symposium on Optical Science, Engineering, and Instrumentation, vol. 3809. International Society for Optics and Photonics, 1999, pp. 74–83.
-  T.-W. Bae, F. Zhang, and I.-S. Kweon, “Edge directional 2D LMS filter for infrared small target detection,” Infrared Physics & Technology, vol. 55, no. 1, pp. 137 – 145, 2012.
-  Y. Cao, R. Liu, and J. Yang, “Small target detection using two-dimensional least mean square (TDLMS) filter based on neighborhood analysis,” International Journal of Infrared and Millimeter Waves, vol. 29, no. 2, pp. 188–200, 2008.
-  X. Bai and F. Zhou, “Analysis of new top-hat transformation and the application for infrared dim small target detection,” Pattern Recognition, vol. 43, no. 6, pp. 2145 – 2156, 2010.
-  S. Kim, Y. Yang, J. Lee, and Y. Park, “Small target detection utilizing robust methods of the human visual system for IRST,” Journal of Infrared, Millimeter, and Terahertz Waves, vol. 30, no. 9, pp. 994–1011, 2009.
-  X. Wang, G. Lv, and L. Xu, “Infrared dim target detection based on visual attention,” Infrared Physics & Technology, vol. 55, no. 6, pp. 513 – 521, 2012.
-  S. Qi, J. Ma, C. Tao, C. Yang, and J. Tian, “A robust directional saliency-based method for infrared small-target detection under various complex backgrounds,” IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 3, pp. 495–499, May 2013.
-  C. L. P. Chen, H. Li, Y. Wei, T. Xia, and Y. Y. Tang, “A local contrast method for small infrared target detection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 1, pp. 574–581, Jan 2014.
-  J. Han, Y. Ma, B. Zhou, F. Fan, K. Liang, and Y. Fang, “A robust infrared small target detection algorithm based on human visual system,” IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 12, pp. 2168–2172, Dec 2014.
-  Y. Wei, X. You, and H. Li, “Multiscale patch-based contrast measure for small infrared target detection,” Pattern Recognition, vol. 58, pp. 216 – 226, 2016.
-  H. Deng, X. Sun, M. Liu, C. Ye, and X. Zhou, “Infrared small-target detection using multiscale gray difference weighted image entropy,” IEEE Transactions on Aerospace and Electronic Systems, vol. 52, no. 1, pp. 60–72, February 2016.
-  J. Han, Y. Ma, J. Huang, X. Mei, and J. Ma, “An infrared small target detecting algorithm based on human visual system,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 3, pp. 452–456, March 2016.
-  Y. Chen and Y. Xin, “An efficient infrared small target detection method based on visual contrast mechanism,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 7, pp. 962–966, July 2016.
-  H. Deng, X. Sun, M. Liu, C. Ye, and X. Zhou, “Small infrared target detection based on weighted local difference measure,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 7, pp. 4204–4214, July 2016.
-  H. Deng, X. Sun, M. Liu, and C. Ye, “Entropy-based window selection for detecting dim and small infrared targets,” Pattern Recognition, vol. 61, pp. 66 – 77, 2017.
-  M. Fornasier, H. Rauhut, and R. Ward, “Low-rank matrix recovery via iteratively reweighted least squares minimization,” SIAM Journal on Optimization, vol. 21, no. 4, pp. 1614–1640, 2011.
-  C. Gao, D. Meng, Y. Yang, Y. Wang, X. Zhou, and A. Hauptmann, “Infrared patch-image model for small target detection in a single image,” IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 4996–5009, Dec 2013.
-  E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM, vol. 58, no. 3, pp. 11:1–11:37, Jun. 2011.
-  Y. He, M. Li, J. Zhang, and Q. An, “Small infrared target detection based on low-rank and sparse representation,” Infrared Physics & Technology, vol. 68, pp. 98 – 109, 2015.
-  Y. Dai, Y. Wu, and Y. Song, “Infrared small target and background separation via column-wise weighted robust principal component analysis,” Infrared Physics & Technology, vol. 77, pp. 421 – 430, 2016.
-  Y. Dai, Y. Wu, Y. Song, and J. Guo, “Non-negative infrared patch-image model: Robust target-background separation via partial sum minimization of singular values,” Infrared Physics & Technology, vol. 81, pp. 182 – 194, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1350449516303723
-  A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
-  J.-F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
-  J. Håstad, “Tensor rank is NP-complete,” Journal of Algorithms, vol. 11, no. 4, pp. 644 – 654, 1990.
-  D. Goldfarb and Z. T. Qin, “Robust low-rank tensor recovery: Models and algorithms,” SIAM Journal on Matrix Analysis and Applications, vol. 35, no. 1, pp. 225–253, 2014.
-  J. Weickert, “Coherence-enhancing diffusion filtering,” International Journal of Computer Vision, vol. 31, no. 2-3, pp. 111–127, 1999.
-  Z. Wu, Q. Wang, J. Jin, and Y. Shen, “Structure tensor total variation-regularized weighted nuclear norm minimization for hyperspectral image mixed denoising,” Signal Processing, vol. 131, pp. 202 – 219, 2017.
-  W.-Z. Shao and Z.-H. Wei, “Research on filtering behavior of structure tensor based image modeling approaches,” Tien Tzu Hsueh Pao/Acta Electronica Sinica, vol. 39, no. 7, pp. 1556–1562, 2011.
-  E. J. Candès, M. B. Wakin, and S. P. Boyd, “Enhancing sparsity by reweighted ℓ1 minimization,” Journal of Fourier Analysis and Applications, vol. 14, no. 5, pp. 877–905, 2008.
-  Y. Peng, J. Suo, Q. Dai, and W. Xu, “Reweighted low-rank matrix recovery and its application in image restoration,” IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2418–2430, Dec 2014.
-  Y. Xie, S. Gu, Y. Liu, W. Zuo, W. Zhang, and L. Zhang, “Weighted Schatten p-norm minimization for image denoising and background subtraction,” IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4842–4857, Oct 2016.
-  C. Lu, J. Tang, S. Yan, and Z. Lin, “Nonconvex nonsmooth low rank minimization via iteratively reweighted nuclear norm,” IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 829–839, Feb 2016.
-  S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, and L. Zhang, “Weighted nuclear norm minimization and its applications to low level vision,” International Journal of Computer Vision, pp. 1–26, 2016.
-  C. Gao, Y. Du, J. Liu, J. Lv, L. Yang, D. Meng, and A. G. Hauptmann, “InfAR dataset: Infrared action recognition at different times,” Neurocomputing, vol. 212, pp. 36 – 47, 2016, Chinese Conference on Computer Vision 2015 (CCCV 2015).
-  J. Chen, L. Jiao, W. Ma, and H. Liu, “Unsupervised high-level feature extraction of SAR imagery with structured sparsity priors and incremental dictionary learning,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 10, pp. 1467–1471, Oct 2016.
-  J. Chen, L. Jiao, and Z. Wen, “High-level feature selection with dictionary learning for unsupervised SAR imagery terrain classification,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 10, no. 1, pp. 145–160, Jan 2017.
-  S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
-  J. Salmon and Y. Strozecki, “Patch reprojections for non-local methods,” Signal Processing, vol. 92, no. 2, pp. 477 – 489, 2012.
-  J. Rivest and R. Fortin, “Detection of dim targets in digital infrared imagery by morphological image processing,” Optical Engineering, vol. 35, no. 7, pp. 1886–1893, 1996.
-  C. Guo, Q. Ma, and L. Zhang, “Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition, June 2008, pp. 1–8.
-  C. Wang and S. Qin, “Adaptive detection method of infrared small target based on target-background separation via robust principal component analysis,” Infrared Physics & Technology, vol. 69, pp. 123 – 135, 2015.
-  B. He and X. Yuan, “On the $O(1/n)$ convergence rate of the Douglas–Rachford alternating direction method,” SIAM Journal on Numerical Analysis, vol. 50, no. 2, pp. 700–709, 2012.
-  Z. Wen, B. Hou, and L. Jiao, “Discriminative dictionary learning with two-level low rank and group sparse decomposition for image classification,” IEEE Transactions on Cybernetics, vol. PP, no. 99, pp. 1–14, 2016.