I Introduction
Infrared small target detection is a key technique for many applications, including early-warning systems, precision guided weapons, missile tracking systems, and maritime surveillance systems [1, 2, 3]. Traditional sequential detection methods, such as the 3D matched filter [4], the improved 3D filter [5], and the multiscan adaptive matched filter [6], work well against a static background by exploiting the spatial-temporal information of the target. Nevertheless, with the recent rapid development of high-speed aircraft [7] such as anti-ship missiles, the imaging background generally changes quickly because of the rapid relative motion between the imaging sensor and the target, and the performance of spatial-temporal detection methods degrades rapidly. Therefore, research on single-frame infrared small target detection is of great importance and has attracted much attention in recent years.
Unlike general object or saliency detection tasks, the main challenge of infrared small target detection is the lack of information. Because of the long imaging distance, the target is always small and has no texture or shape features. As the target type, imaging distance, and neighboring environment differ considerably in real scenes, the target brightness can vary from extremely dim to very bright (see fig. 5 for examples). In the absence of spatial-temporal information and of target features such as shape and size, the characteristics of the background [8] and the relation between the background and the target are very important priors for single-frame infrared small target detection. How to design a model that incorporates and exploits these priors is therefore vital for infrared small target detection in a single image.
I-A Prior work on single-frame infrared small target detection
Previously proposed single-frame infrared small target detection methods can be roughly classified into two categories. The first type exploits a local background consistency prior, assuming that the background changes slowly and that nearby pixels are highly correlated. The target is then viewed as the element that breaks this local correlation. Under this assumption, classical methods such as the two-dimensional least mean square (TDLMS) filter [9] and the Max-Median filter [10] enhance the small target by subtracting the predicted background from the original image. Unfortunately, besides the targets, they also enhance the edges of the sky-sea surface or heavy cloud clutter, since these structures break the background consistency just as the target does. To differentiate the real target from high-frequency changes, some edge analysis approaches [11, 12] extend these methods by estimating the edge direction in advance and preserving the edges. Bai et al. [13] designed a new Top-Hat transformation using two different but correlated structuring elements. Another class of local prior based methods exploits the local contrast, which is computed by comparing a pixel or a region only with its neighbors. The seminal Laplacian of Gaussian (LoG) filter based method [14] has motivated a broad range of studies on the Human Visual System (HVS) and has led to a series of HVS based methods, e.g., Difference of Gaussians (DoG) [15], second-order directional derivative (SODD) filter [16], local contrast measure (LCM) [17], improved local contrast measure (ILCM) [18], multiscale patch-based contrast measure (MPCM) [19], multiscale gray difference weighted image entropy [20], improved difference of Gabors (IDoGb) [21], local saliency map (LSM) [22], weighted local difference measure (WLDM) [23], local difference measure (LDM) [24], etc.
The second type of single-frame infrared small target detection methods, which has not been explored extensively, exploits the non-local self-correlation property of background patches, assuming that all background patches come from a single subspace or a mixture of low-rank subspace clusters. Target-background separation can then be realized with low-rank matrix recovery [25]. Essentially, this type of method attempts to model the infrared small target as an outlier in the input data. To this end, Gao et al. [26] generalized the traditional infrared image model to a new infrared patch-image model via local patch construction. The target-background separation problem is then reformulated as a robust principal component analysis (RPCA) [27] problem of recovering low-rank and sparse matrices, achieving state-of-the-art background suppression performance. To correctly detect a small target located in a highly heterogeneous background, He et al. [28] proposed a low-rank and sparse representation model under the multi-subspace-cluster assumption.
I-B Motivation
Existing methods detect the infrared small target by either utilizing the local pixel correlation or exploiting the non-local patch self-correlation. From our observation, the unsatisfying performance of local prior methods [20, 23] in detecting dim targets under complicated backgrounds mainly stems from their imperfect grayscale based center-difference measures. The saliency of a dim but true target is easily overwhelmed by the measured saliency of some rare structures. We call this phenomenon the rare structure effect. In contrast, the non-local prior methods [26, 29, 30] suffer from salient edge residuals. The intrinsic reason is that a strong edge, lacking sufficient similar samples, is also a sparse component just like the target. Since the target might be dimmer than the edge, simply increasing the threshold would wipe out both of them.
Our key observation is that the non-local prior and the local prior are not equivalent; in fact, they are complementary for the problem of infrared small target detection, as illustrated in fig. 1. The false alarm components that often appear in local (non-local) prior methods can be well suppressed by non-local (local) prior methods. For example, the stubborn strong edges in non-local prior based methods can be easily identified by local edge analysis approaches. An intuitive way to resolve the above-mentioned dilemma is therefore to extract the local structure information and merge it into the non-local prior based detection framework. How to utilize both the local and non-local priors simultaneously and appropriately is thus an important issue for improving the detection performance under very complex backgrounds. To the best of our knowledge, very few single-frame infrared small target detection methods address this problem.
To address this issue, we propose a single-frame small target detection framework based on a reweighted infrared patch-tensor model (RIPT). Our main contributions are threefold:

To dig out more information from the non-local self-correlation in patch space, we generalize the patch-image model to a novel infrared patch-tensor model (IPT) and formulate the target-background separation task as an optimization problem of recovering low-rank and sparse tensors.

To incorporate the local structure prior into the IPT model, an element-wise weight is designed based on the structure tensor, which helps to suppress the remaining edges and preserve the dim target simultaneously.

To reduce the computing time, we adopt a reweighted scheme to enhance the sparsity of the target patch-tensor. Considering the particularity of infrared small target detection, an additional stopping criterion is used to avoid excessive computation.
The proposed RIPT model is validated on different real infrared image datasets. Compared with state-of-the-art methods, our proposed model achieves better background suppression and target detection performance.
The remainder of this paper is organized as follows. We propose the non-local correlation based IPT model in Section II. The details of the local structure weight construction are described in Section III. In Section IV, we further propose the reweighted IPT model and provide its detailed optimization scheme. Section V presents detailed experimental results and discussion. Finally, we conclude this paper in Section VI.
II Non-local correlation driven infrared patch-tensor model
To dig out more spatial correlations, we develop in this section a novel target-background separation framework named the infrared patch-tensor model. Before describing the details, several mathematical notations are defined in table I.
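Notation | Explanation
$\mathcal{X}$, $\mathbf{X}$, $\mathbf{x}$, $x$ | tensor, matrix, vector, scalar.
$\mathbf{X}_{(n)}$ | mode-$n$ matricization of tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, obtained by arranging the mode-$n$ fibers as the columns of the resulting matrix of size $I_n \times \prod_{k \neq n} I_k$.
$\mathrm{vec}(\mathcal{X})$ | vectorization of tensor $\mathcal{X}$.
$\langle \mathcal{X}, \mathcal{Y} \rangle$ | inner product of tensors $\mathcal{X}$ and $\mathcal{Y}$, which is defined as $\mathrm{vec}(\mathcal{X})^{\top} \mathrm{vec}(\mathcal{Y})$.
$\|\mathcal{X}\|_0$ | $\ell_0$ norm of tensor $\mathcal{X}$, which counts the number of nonzero elements.
$\|\mathcal{X}\|_1$ | $\ell_1$ norm of tensor $\mathcal{X}$.
$\|\mathcal{X}\|_F$ | Frobenius norm of tensor $\mathcal{X}$, which is defined by $\sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}$.
$\mathrm{fold}_n(\mathbf{X}_{(n)})$ | returns the tensor $\mathcal{X}$ such that $\mathrm{fold}_n(\mathbf{X}_{(n)}) = \mathcal{X}$.
$\|\mathbf{X}\|_*$ | nuclear norm of matrix $\mathbf{X}$, which is defined by $\sum_i \sigma_i$, where $\mathbf{X} = \mathbf{U} \, \mathrm{diag}(\sigma_i) \, \mathbf{V}^{\top}$ is the SVD of $\mathbf{X}$.
$\mathcal{S}_\tau(\cdot)$ | element-wise shrinkage operator, defined as $\mathcal{S}_\tau(x) = \mathrm{sign}(x) \max(|x| - \tau, 0)$. $\mathcal{S}_\tau(\mathcal{Y})$ is the closed-form solution of the problem $\min_{\mathcal{X}} \tau \|\mathcal{X}\|_1 + \frac{1}{2} \|\mathcal{X} - \mathcal{Y}\|_F^2$ [31].
$\mathcal{D}_\tau(\cdot)$ | matrix singular value thresholding operator: $\mathcal{D}_\tau(\mathbf{X}) = \mathbf{U} \mathcal{S}_\tau(\mathbf{\Sigma}) \mathbf{V}^{\top}$, where $\mathbf{X} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^{\top}$ is the SVD of $\mathbf{X}$ and $\mathbf{\Sigma} = \mathrm{diag}(\sigma_i)$. $\mathcal{D}_\tau(\mathbf{Y})$ is the closed-form solution of the problem $\min_{\mathbf{X}} \tau \|\mathbf{X}\|_* + \frac{1}{2} \|\mathbf{X} - \mathbf{Y}\|_F^2$ [32].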
Given an infrared image, it can be modeled as a linear superposition of a target image, a background image, and a noise image:
$$f_F(x, y) = f_B(x, y) + f_T(x, y) + f_N(x, y) \tag{1}$$
where $f_F$, $f_B$, $f_T$, and $f_N$ represent the input image, background image, target image, and noise image, respectively. Via a window sliding from the top left to the bottom right of the image, we stack the patches into a 3D cube (see the construction step in fig. 4). Then eq. 1 is transferred to the patch space with the spatial structure preserved:
$$\mathcal{F} = \mathcal{B} + \mathcal{T} + \mathcal{N} \tag{2}$$
where $\mathcal{F}, \mathcal{B}, \mathcal{T} \in \mathbb{R}^{I \times J \times P}$ are the input patch-tensor, background patch-tensor, and target patch-tensor, respectively, and $\mathcal{N}$ is the noise patch-tensor. $I$ and $J$ are the patch height and width, and $P$ is the patch number.
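The patch-tensor construction described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `build_patch_tensor`, the nested-list storage (indexed patch first, then row, then column), and the toy image are all hypothetical and are not taken from the authors' MATLAB code.

```python
def build_patch_tensor(image, patch_h, patch_w, step):
    """Stack sliding-window patches of a 2-D image (list of rows).

    The window moves from the top left to the bottom right. The result is
    a list of patches indexed as patches[p][i][j], i.e. a
    patch_h x patch_w x P cube stored patch-major.
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows - patch_h + 1, step):
        for left in range(0, cols - patch_w + 1, step):
            # Copy the patch_h x patch_w window whose corner is (top, left).
            patch = [row[left:left + patch_w]
                     for row in image[top:top + patch_h]]
            patches.append(patch)
    return patches


# Toy 4 x 6 "image" whose pixel value encodes its position.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
tensor = build_patch_tensor(image, patch_h=2, patch_w=3, step=1)
```

With a step of 1, a 4 x 6 image and 2 x 3 patches give 3 x 4 = 12 patches; a larger step reduces the patch number and therefore the size of the cube.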
Background patch-tensor $\mathcal{B}$. Generally, the background is considered to change slowly, which implies high correlations among both the local and non-local patches in the image, as illustrated in fig. 2(a). Although the patches are located in different regions of the image, they are equivalent. Based on this non-local correlation, the state-of-the-art IPI model imposes a low-rank constraint on the background patch-image. As a patch-image is the mode-3 unfolding matrix of a patch-tensor, the patch-image model can essentially be viewed as a special case of the proposed patch-tensor model. Since the main difficulty of detecting the infrared small target in a single image is the lack of information, considering the low-rank structure in only one unfolding is insufficient to deal with highly difficult scenes. This naturally motivates us to ask whether we can utilize the other two unfolding modes. In fact, the mode-1 and mode-2 unfolding matrices of the infrared patch-tensor are also low-rank. In fig. 2(b) – (d), the singular values of all the unfolding matrices rapidly decrease to zero, which demonstrates that every unfolding mode of the background patch-tensor is intrinsically low-rank. Therefore, we can consider the background patch-tensor $\mathcal{B}$ as a low-rank tensor whose unfolding matrices are all low-rank:
$$\mathrm{rank}(\mathbf{B}_{(1)}) \le r_1, \quad \mathrm{rank}(\mathbf{B}_{(2)}) \le r_2, \quad \mathrm{rank}(\mathbf{B}_{(3)}) \le r_3 \tag{3}$$
where $r_1$, $r_2$, and $r_3$ are constants representing the complexity of the background image. The larger their values are, the more complex the background is.
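To make the three unfoldings concrete, the following Python sketch unfolds a small 3D array stored as nested lists. The function name `unfold`, the zero-based mode index, and the column ordering (a common textbook convention in which the earlier remaining index varies fastest) are assumptions made for illustration; the authors' implementation may order the columns differently, which does not affect the rank.

```python
def unfold(tensor, mode):
    """Mode-n unfolding of a 3-D nested list tensor[i][j][p].

    mode is 0, 1 or 2 (zero-based). Each mode-n fiber becomes one column,
    so the result has (size of mode n) rows and (product of the other two
    sizes) columns.
    """
    sizes = (len(tensor), len(tensor[0]), len(tensor[0][0]))
    others = [m for m in range(3) if m != mode]
    rows = []
    for n in range(sizes[mode]):
        row = []
        # The later remaining mode varies slowest across the columns.
        for b in range(sizes[others[1]]):
            for a in range(sizes[others[0]]):
                idx = [0, 0, 0]
                idx[mode], idx[others[0]], idx[others[1]] = n, a, b
                row.append(tensor[idx[0]][idx[1]][idx[2]])
        rows.append(row)
    return rows


# Toy 2 x 3 x 4 tensor whose entries encode their own indices.
toy = [[[100 * i + 10 * j + p for p in range(4)] for j in range(3)]
       for i in range(2)]
shapes = [(len(unfold(toy, m)), len(unfold(toy, m)[0])) for m in range(3)]
```

For a patch-tensor with patch height I, patch width J and P patches, the three unfoldings therefore have sizes I x JP, J x IP and P x IJ.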
Target patch-tensor $\mathcal{T}$. Since the small target occupies only several pixels in the whole image, the target patch-tensor is in fact an extremely sparse tensor. That is:
$$\|\mathcal{T}\|_0 \le k \tag{4}$$
where $k$ is a small integer determined by the number and size of the small targets.
Noise patch-tensor $\mathcal{N}$. In this paper, we simply assume that the noise is additive white Gaussian noise and that $\|\mathcal{N}\|_F \le \delta$ for some $\delta > 0$. Thus we have
$$\|\mathcal{F} - \mathcal{B} - \mathcal{T}\|_F \le \delta \tag{5}$$
It should be noted that although the values of these parameters differ from image to image, we do not use them directly in the following sections.
Ideally, we would like to obtain a low-rank and sparse decomposition by solving the following problem:
$$\min_{\mathcal{B}, \mathcal{T}} \ \mathrm{rank}(\mathcal{B}) + \lambda \|\mathcal{T}\|_0 \quad \text{s.t.} \quad \mathcal{B} + \mathcal{T} = \mathcal{F} \tag{6}$$
Unfortunately, computing the rank of a given tensor is NP-hard in general [33]. In Ref. [34], Goldfarb and Qin proposed Higher-order RPCA (robust tensor recovery), which replaces the rank by a convex surrogate of the Tucker rank, and $\|\mathcal{T}\|_0$ by $\|\mathcal{T}\|_1$, to make the above problem tractable. In the singleton model, the tensor rank regularization term is defined as the sum of the nuclear norms of all the mode-$i$ unfoldings, i.e., $\sum_{i=1}^{3} \|\mathbf{B}_{(i)}\|_*$. With this relaxation, our proposed IPT model under the random noise assumption can be solved by minimizing the following convex problem:
$$\min_{\mathcal{B}, \mathcal{T}} \ \sum_{i=1}^{3} \|\mathbf{B}_{(i)}\|_* + \lambda \|\mathcal{T}\|_1 \quad \text{s.t.} \quad \mathcal{B} + \mathcal{T} = \mathcal{F} \tag{7}$$
$\lambda$ is a weighting parameter that controls the global trade-off between the background patch-tensor and the target patch-tensor. A larger $\lambda$ shrinks the non-target but sparse components in the target patch-tensor to zero. Nevertheless, it also over-shrinks the dim target, which should be preserved. On the contrary, a smaller $\lambda$ does retain the dim target, but it keeps the strong cloud edges as well. Therefore, adopting a global constant weighting parameter is not a reasonable scheme for detecting the infrared small target in a complex scene. This naturally motivates us to design an entry-wise weighting scheme.
III Incorporating local structure prior
In this section, we focus on combining the local structure prior and the non-local correlation prior in our model. We construct a local structure weight and interpret it as an edge salience measure. For the sake of simplicity, the local structure weight is designed on the basis of the image structure tensor. The structure tensor is widely used in many partial differential equation (PDE) based methods [35, 36] to estimate the local structure information in the image, including the edge orientation. To integrate the local information, the structure tensor is constructed based on a local regularization of a tensorial product, which is defined as
(8) 
where is a Gaussian-smoothed version of a given image . is the standard deviation of the Gaussian kernel which denotes the noise scale, making the edge detector ignorant of small details. is a symmetric and positive semi-definite matrix, which has two orthonormal eigenvectors denoted as and , where . points to the maximum contrast direction of the geometry structure while points to the minimum direction [37]. Their corresponding eigenvalues and can be calculated via
(9) 
These two values can be used as two feature descriptors of the local geometry structure, where at the flat region, ; at the edge region, ; at the corner region, . For the sake of low computational cost, we take as the edge awareness feature, since its value at an edge pixel is much larger than that at a pixel in a flat region or at a corner. By applying eq. 8 and eq. 9 to every pixel in the input image , two matrices and can be obtained, which consist of the large and small structure tensor eigenvalues of all the pixels, respectively. Then we can transform and to their corresponding patch-tensors and . Finally, we define the local structure weight patch-tensor as follows
(10) 
where is a weight stretching parameter, and are the maximum and minimum of , respectively. Fig. 3 displays the edge awareness maps of fig. 5, which demonstrates that the structure tensor based local structure weight performs well in identifying the edges. It should be noted that, for display purposes, fig. 3 is created via a normalized version of . In the proposed algorithm, we still calculate the local structure weight via eq. 10.
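As a rough illustration of how structure tensor eigenvalues separate flat, edge, and corner pixels, the Python sketch below uses the standard closed form for the eigenvalues of a symmetric 2 x 2 matrix, which may differ from eq. 9 by a constant factor. Everything else is an assumption made for illustration: the function names, the flatness cut-off `tol`, and in particular the exponential min-max stretching in `local_structure_weight`, which is only one plausible reading of a weight controlled by a stretching parameter and the maximum and minimum of an edge feature.

```python
import math


def structure_tensor_eigenvalues(j11, j12, j22):
    """Return (larger, smaller) eigenvalues of [[j11, j12], [j12, j22]]."""
    half_trace = 0.5 * (j11 + j22)
    half_gap = 0.5 * math.sqrt((j11 - j22) ** 2 + 4.0 * j12 ** 2)
    return half_trace + half_gap, half_trace - half_gap


def classify(lam_large, lam_small, tol=1e-3):
    """Label a pixel from its eigenvalue pair; tol is a hypothetical cut-off."""
    if lam_large <= tol:
        return "flat"      # both eigenvalues are close to zero
    if lam_small <= tol:
        return "edge"      # one dominant contrast direction
    return "corner"        # contrast in both directions


def local_structure_weight(edge_feature, h):
    """Assumed form: exp(h * min-max normalized edge feature)."""
    lo, hi = min(edge_feature), max(edge_feature)
    span = (hi - lo) or 1.0  # avoid division by zero on constant input
    return [math.exp(h * (v - lo) / span) for v in edge_feature]


edge_pair = structure_tensor_eigenvalues(4.0, 0.0, 0.0)
weights = local_structure_weight([0.0, 1.0, 2.0], h=2.0)
```

Under this assumed form, a larger stretching parameter widens the gap between the weights of edge pixels and those of flat pixels, which is consistent with its described role.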
IV Reweighted infrared patch-tensor model and its solution
IV-A Reweighted infrared patch-tensor model
Computing time is also a major concern in infrared small target detection. Generally, the stopping criterion of an RPCA algorithm is that the reconstruction error falls below a certain small value. To meet this criterion, WIPT needs dozens of iterations, which is still time-consuming. An interesting phenomenon we find is that the number of nonzero entries in the target patch-tensor stops growing before the algorithm converges. In fact, in the target image, the true target occupies only the few brightest entries, and their values barely change in the second half of the iterations. Therefore, considering the particularity of the infrared small target, it is reasonable to replace the reconstruction error with the target patch-tensor sparsity as the stopping criterion in our proposed model: the algorithm stops iterating once the number of nonzero entries ceases to grow. The sparsity of the target patch-tensor then becomes critical for reducing the computing time. We hope that the nonzero entries keep decreasing as the iterations go on, leaving the final target image the sparsest. Unfortunately, the real situation in IPT and WIPT is contrary to this expectation: the target image deteriorates as the algorithm converges. This naturally motivates us to take a sparsity enhancing approach to solve this problem.
In Ref. [38], Candès et al. proposed a reweighted $\ell_1$ minimization for enhancing sparsity. By minimizing a sequence of weighted $\ell_1$ norms, a significant performance improvement is obtained in sparse recovery. Inspired by this work, many improved RPCA models have been proposed [39, 40, 41, 42]. Motivated by these state-of-the-art models, we adopt a similar reweighted scheme for the values in the target patch-tensor. Large weights discourage nonzero entries, and small weights preserve nonzero entries. The sparsity enhancement weight is defined as follows
(12) 
where is a preset constant to avoid division by zero. Then besides the relative error , we could add a new end condition that counts the nonzero entry element, namely . With the help of this empirical observation, the computing time could be largely decreased, as illustrated in fig. 10 and table IV.
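The reweighting idea and the additional stopping rule can be sketched as follows. This is a minimal sketch, assuming the usual reweighted-ℓ1 form in which each weight is the reciprocal of the previous magnitude plus a small constant; the function names, the default `eps`, the tolerance `rel_tol`, and the flat-list representation of the target patch-tensor are hypothetical.

```python
def sparsity_weights(target, eps=0.01):
    """Reweighted-l1 style weights: w = 1 / (|t| + eps).

    Entries that are already large receive small weights (and are kept);
    entries near zero receive large weights (and are pushed to zero).
    """
    return [1.0 / (abs(t) + eps) for t in target]


def count_nonzero(target):
    """Number of nonzero entries of a flattened target patch-tensor."""
    return sum(1 for t in target if t != 0)


def should_stop(prev_target, curr_target, rel_error, rel_tol=1e-7):
    """Stop when the relative error is small OR the nonzero count stalls."""
    stalled = count_nonzero(prev_target) == count_nonzero(curr_target)
    return rel_error < rel_tol or stalled


w = sparsity_weights([0.0, 1.0], eps=0.5)
```

The second condition is what shortens the run: once the number of nonzero entries stops changing between two iterations, further iterations are assumed to alter only values that do not affect detection.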
Another intrinsic characteristic that both Ref. [26] and Ref. [29] neglect is that the small target is always brighter than its neighborhood in infrared images because of the physical imaging mechanism [43]. Therefore, besides the sparsity constraint [44, 45] on the target patch-tensor, it is reasonable to assume that all the entries in are non-negative. To this end, we incorporate this target non-negativity prior into the reweighted IPT model by rewriting eq. 12 as follows
(13) 
where is an indicator function. We combine the local structure weight and sparsity enhancing weight to get the adaptive weight as follows
(14) 
Finally, we generalize the proposed IPT and WIPT models to a novel reweighted infrared patch-tensor model (RIPT) as follows
(15) 
IV-B Solution of RIPT model
In this subsection, we show how to solve the proposed RIPT model as a reweighted robust tensor recovery problem via the Alternating Direction Method of Multipliers (ADMM) [46]. The augmented Lagrangian function of eq. 15 is defined as
(16) 
where are the Lagrange multiplier tensors, and is a positive penalty scalar. ADMM decomposes the minimization of into two subproblems that minimize and , respectively. More specifically, the iterations of ADMM go as follows:
Updating with the other terms fixed.
(17) 
Updating with the other terms fixed.
(18) 
Updating the multiplier with the other terms fixed.
(19) 
The subproblems (17) and (18) can be solved via the following two operators, respectively.
(20)  
(21) 
From eq. 21, it can be observed that the weighting parameter determines the soft threshold, controlling the trade-off between the target patch-tensor and the background patch-tensor. Therefore, our element-wise adaptive weight tensor can simultaneously preserve the small target and suppress the strong edges. Finally, the solution of the proposed model is given in algorithm 1.
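The effect of an element-wise threshold can be illustrated with a small Python sketch of weighted soft thresholding. It assumes that each entry is shrunk with its own threshold, equal to the global weighting parameter times the entry's weight divided by the penalty scalar; this form, the function names, and the sample values are assumptions for illustration rather than a transcription of eq. 21.

```python
def soft_threshold(x, tau):
    """Element-wise shrinkage: sign(x) * max(|x| - tau, 0)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def weighted_shrink(values, weights, lam, mu):
    """Shrink each entry with its own threshold lam * w / mu (assumed form)."""
    return [soft_threshold(v, lam * w / mu) for v, w in zip(values, weights)]


# Two equally bright entries: the first has a small weight (target-like),
# the second a large weight (edge-like); the third is a faint residual.
shrunk = weighted_shrink([5.0, 5.0, 0.5], [0.5, 10.0, 1.0], lam=1.0, mu=1.0)
```

Here the two entries of equal brightness end up very differently: the one with the small weight survives almost intact, while the one with the large weight is removed. A single global threshold cannot separate them in this way.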
IV-C Detection procedure
In fig. 4, we present the whole procedure of detecting the infrared small target via the model proposed in this paper. The detailed steps are as follows:

Given an infrared image , its local structure feature map is calculated via eq. 9.

The original patch-tensor and local structure weight patch-tensor are constructed from and .

Algorithm 1 is performed to decompose the patch-tensor into the background patch-tensor and target patch-tensor .

The background image and target image are reconstructed from the background patch-tensor and target patch-tensor , respectively. For implementation convenience, we adopt the uniform average of estimators (UAE) reprojection scheme [47].

The target is segmented out in the same way as in Ref. [26]. The adaptive threshold is determined by:
(22) where and are the average and standard deviation of the target image . and are constants determined empirically.
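The last two steps of the procedure, reprojection and segmentation, can be sketched as follows. This is a hedged illustration: `reproject_uae` averages all overlapping patch estimates of a pixel with equal weight, which is one reading of the uniform average of estimators scheme, and `adaptive_threshold` assumes a threshold of the form max(v_min, mean + k * std). The names `k` and `v_min`, the use of the population standard deviation, and all function names are assumptions.

```python
from statistics import mean, pstdev


def reproject_uae(patches, rows, cols, step):
    """Rebuild an image from overlapping patches by averaging the overlaps.

    patches[p][i][j] must be ordered as a window sliding from the top left
    to the bottom right with the given step.
    """
    patch_h, patch_w = len(patches[0]), len(patches[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    index = 0
    for top in range(0, rows - patch_h + 1, step):
        for left in range(0, cols - patch_w + 1, step):
            patch = patches[index]
            index += 1
            for i in range(patch_h):
                for j in range(patch_w):
                    total[top + i][left + j] += patch[i][j]
                    count[top + i][left + j] += 1
    # Pixels never covered by a patch are set to zero.
    return [[total[r][c] / count[r][c] if count[r][c] else 0.0
             for c in range(cols)] for r in range(rows)]


def adaptive_threshold(image, k, v_min):
    """Assumed form: max(v_min, mean + k * standard deviation)."""
    pixels = [p for row in image for p in row]
    return max(v_min, mean(pixels) + k * pstdev(pixels))


def segment(image, k, v_min):
    """Binary mask of the pixels strictly above the adaptive threshold."""
    t = adaptive_threshold(image, k, v_min)
    return [[p > t for p in row] for row in image]


rebuilt = reproject_uae([[[1, 2], [4, 5]], [[2, 3], [5, 6]]], 2, 3, 1)
target_image = [[0, 0, 0], [0, 10, 0], [0, 0, 0]]
mask = segment(target_image, k=2.0, v_min=0.5)
```

The lower bound `v_min` keeps the threshold from collapsing to zero when the separated target image is almost empty, which would otherwise turn faint residuals into detections.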
V Experimental validation
To fully evaluate the proposed algorithm, we conduct a series of experiments using images of various scenarios and include ten state-of-the-art methods for comparison.
V-A Experimental setup
Datasets. We test the proposed model on extensive real infrared images covering different scenarios, as illustrated in fig. 5, which vary from flat backgrounds with salient targets to complex backgrounds with heavy clutter and extremely dim targets. All targets are labeled with red boxes. Since some targets are so dim that they can hardly be observed directly by the human eye, we enlarge the demarcated area. Considering that the biggest difficulty of current infrared small target detection is detecting very dim targets in strong clutter, a good detection performance on extremely complex images is more convincing than a satisfying result on relatively simple images. Therefore, in the following experiments, we focus mainly on the datasets with complex scenes, to which fig. 5(a) – (d) and (l) belong. The detailed characteristics of these five sequences are presented in table II.
Sequence | # Frame | Image Resolution | Target Shape | Target characteristics | Background characteristics
Sequence 1 – 4 | 400 | | Gaussian | |
Sequence 5 | 30 | | Rectangular | |
Baselines and parameter settings. The proposed algorithm is compared with ten state-of-the-art solutions, including three filtering based methods (Max-Median [10], Top-Hat [48], TDLMS [9]), three HVS based methods (PFT [49], MPCM [19], WLDM [23]), and four recently developed low-rank methods (IPI [26], PRPCA [50], WIPI [29], NIPPS [30]). Table III summarizes all the methods involved in the experiments and their detailed parameter settings. All the low-rank methods, i.e., IPI, PRPCA, WIPI, NIPPS, IPT, and RIPT, are solved via ADMM. All the algorithms are implemented in MATLAB 2016b on a PC with a 3.4 GHz CPU and 4 GB RAM. The source code of our method is publicly available at https://github.com/YimianDai/DENTIST.
No. | Methods | Acronyms | Parameter settings
1 | Max-Median filter [10] | Max-Median | Support size: 
2 | Top-Hat method [48] | Top-Hat | Structure shape: square, structure size: 
3 | Phase spectrum of Fourier Transform [49] | PFT | Disk radius: 
4 | Multiscale Patch-based Contrast Measure [19] | MPCM | 
5 | Weighted Local Difference Measure [23] | WLDM | 
6 | Two-Dimensional Least Mean Square filter [9] | TDLMS | Support size: , step size: 
7 | Infrared Patch-Image Model [26] | IPI | Patch size: , sliding step: , , , 
8 | Patch-level RPCA method [50] | PRPCA | Patch size: , sliding step: , , , 
9 | Weighted Infrared Patch-Image Model [29] | WIPI | Patch size: , sliding step: , smoothing parameter , 
10 | Non-negative IPI model via Partial Sum minimization of singular values [30] | NIPPS | Patch size: , sliding step: , , , 
11 | Infrared Patch-Tensor Model | IPT | Patch size: , sliding step: , , 
12 | Reweighted Infrared Patch-Tensor Model | RIPT | Patch size: , sliding step: , , , , , 
Evaluation metrics. For a comprehensive evaluation, four metrics, namely the local signal-to-noise ratio gain (LSNRG), background suppression factor (BSF), signal-to-clutter ratio gain (SCRG), and receiver operating characteristic (ROC) curve, are adopted to compare the background suppression performance. LSNRG measures the local signal-to-noise ratio (LSNR) gain, which is defined as
(23) 
where and are the LSNR values before and after background suppression. LSNR is given as . and are the maximum grayscales of the target and neighborhood, respectively. BSF measures the background suppression quality using the standard deviation of the neighborhood region. It is defined as:
(24) 
where and are the standard deviations of the background neighborhood before and after background suppression. The most widely used SCRG is defined as the ratio of the signal-to-clutter ratio (SCR) before and after processing:
(25) 
where SCR represents the difficulty of detecting the infrared small target, and it is defined by . is the average target grayscale. and are the average grayscale and standard deviation of the neighborhood region. For all three metrics, the higher the value, the better the background suppression performance of the detection method. All three metrics are computed in a local region, as illustrated in fig. 6. The target size is , and is the neighborhood width. We set in this paper.
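The three local metrics can be sketched in Python as below. The exact formulas are assumptions based on the verbal definitions and on common usage in the literature: LSNR as the ratio of the maximum target grayscale to the maximum neighborhood grayscale, LSNRG as a gain in decibels, BSF as the ratio of neighborhood standard deviations before and after suppression, SCR as the absolute mean difference over the neighborhood standard deviation, and SCRG as output SCR over input SCR. The function names and the use of the population standard deviation are also assumptions.

```python
import math
from statistics import mean, pstdev


def lsnr(target, neighborhood):
    """Local SNR: maximum target grayscale over maximum neighborhood grayscale."""
    return max(target) / max(neighborhood)


def lsnrg(lsnr_in, lsnr_out):
    """LSNR gain in decibels (assumed 10 * log10 form)."""
    return 10.0 * math.log10(lsnr_out / lsnr_in)


def bsf(neigh_in, neigh_out):
    """Background suppression factor: input std over output std."""
    return pstdev(neigh_in) / pstdev(neigh_out)


def scr(target, neighborhood):
    """Signal-to-clutter ratio: |mean_t - mean_b| / std_b."""
    return abs(mean(target) - mean(neighborhood)) / pstdev(neighborhood)


def scrg(scr_in, scr_out):
    """SCR gain: output SCR over input SCR."""
    return scr_out / scr_in


# Toy grayscale samples of the target and its neighborhood.
t_in, b_in = [10, 12], [2, 4, 6, 8]
t_out, b_out = [10, 12], [1, 2, 3, 4]
gain_db = lsnrg(lsnr(t_in, b_in), lsnr(t_out, b_out))
factor = bsf(b_in, b_out)
scr_gain = scrg(scr(t_in, b_in), scr(t_out, b_out))
```

In this sketch a perfectly clean output neighborhood has zero standard deviation, so `bsf` and `scr` would divide by zero; a practical implementation would need to handle that case explicitly.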
Among all the existing metrics, the detection probability and false-alarm rate are the key performance indicators, which are defined as follows
(26) 
(27) 
The ROC curve shows the trade-off between the true detections and false detections.
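A ROC curve of this kind can be traced by sweeping the segmentation threshold and recomputing both rates, as in the Python sketch below. The definitions used here are assumptions: detection probability as true detections over actual targets, and false-alarm rate as false detections over the total number of pixels. Other normalizations of the false-alarm rate exist, and the function name and the toy detections are hypothetical.

```python
def roc_curve(scored_detections, actual_targets, total_pixels, thresholds):
    """Return (false_alarm_rate, detection_probability) per threshold.

    scored_detections is a list of (score, is_true_target) pairs, one per
    candidate detection.
    """
    points = []
    for t in thresholds:
        kept = [is_target for score, is_target in scored_detections
                if score >= t]
        true_hits = sum(kept)
        false_hits = len(kept) - true_hits
        points.append((false_hits / total_pixels,
                       true_hits / actual_targets))
    return points


detections = [(0.9, True), (0.8, False), (0.4, True), (0.2, False)]
points = roc_curve(detections, actual_targets=2, total_pixels=100,
                   thresholds=[0.85, 0.5, 0.1])
```

Lowering the threshold moves along the curve toward higher detection probability at the cost of more false alarms.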
V-B Validation of the proposed method
In this subsection, we take a closer look at the proposed method by validating its robustness against various scenes and noise levels. Finally, the roles of the patch-tensor, local structure weight, and sparsity enhancement weight are examined in depth to evaluate each prior individually.
V-B1 Robustness to various scenes
In fig. 7, we show the separated target images for the images of fig. 5. It can be clearly seen from fig. 7 that the background clutter is completely wiped out, leaving the target as the sole component in the target image. Since fig. 5 contains many different scenarios, it is fair to say that the proposed method is quite robust to various scenes.
V-B2 Robustness to noise
Noise is another key influence factor. In fig. 8, we evaluate the proposed method’s performance under different noise levels. When the noise standard deviation is , the proposed method enhances the targets well and suppresses the clutter and noise. As the noise standard deviation increases to , RIPT still detects the target in fig. 8(m) – (n) and (q) – (r), but fails in fig. 8(o) – (p). Nevertheless, this failure is acceptable, since the target is totally overwhelmed by the noise in fig. 8(o) – (p) (see the enlarged box). Therefore, the influence of noise depends not only on the intensity of the noise itself but also on the original contrast of the target. As long as the polluted target maintains a weak contrast like that in fig. 8(c) – (d), the proposed method is still able to detect it.
V-B3 Roles of components in the proposed model
To further understand the effects of the components in the proposed RIPT model, we evaluate each prior individually to investigate how it influences the final detection performance. The ROC curves of IPI, IPT, IPT with sparsity enhancement weight (SIPT), WIPT, and RIPT for Sequence 1 – 4 are given in fig. 9, leading to the following observations. (1) The four patch-tensor based methods outperform the state-of-the-art IPI method, which demonstrates that the patch-tensor model, involving the mode-1 and mode-2 unfolding matrices, does contribute to the final detection performance. (2) By comparing WIPT and RIPT with IPT and SIPT, we see that incorporating the local structure prior significantly improves the detection probability. (3) Although the sparsity enhancement weight does not contribute to the final detection performance, it significantly reduces the iteration number, as illustrated in fig. 10. These observations indicate that the introduced priors are effective and, when combined, lead to excellent performance, as reported in the next subsections.
V-C Algorithm complexity and computational time
The proposed model is solved via ADMM, whose convergence has been proved [51, 52]. Therefore, our solving algorithm is guaranteed to converge. The algorithm complexity and computational time of the various methods for fig. 5(a) are given in table IV. The image size is , and are the rows and columns of the patch-image or mode unfolding. Although the computational complexities of these methods seem the same, their computing times differ considerably. For the filtering and HVS based methods, the difference in computing time lies in whether the code can be vectorized. For the low-rank methods, the dominant factor is the iteration number. It can be seen from the data in table IV that the low-rank methods are generally slower than the filtering and HVS based methods. Nevertheless, considering that the low-rank methods can handle more difficult scenes, this trade-off is acceptable. Among the low-rank methods, RIPT costs the least time. The underlying reason is that both the local structure weight and the sparsity enhancement weight help to reduce the iteration number. In addition, unlike the weight in WIPI, the time for constructing the weight in RIPT is negligible.
Method | TDLMS | PFT | MPCM | WLDM | IPI | WIPI | NIPPS | IPT | WIPT | RIPT
Complexity | | | | | | | | | |
Time (s) | 0.162 | 0.025 | 0.083 | 6.059 | 16.998 | 52.995 | 15.515 | 8.598 | 6.932 | 3.169
V-D Parameters analysis
For the proposed model, the related parameters, such as the patch size, sliding step, weight stretching parameter , weighting parameter , and penalty factor , are all important factors that usually affect the model fitness on real databases. Therefore, a better performance can be obtained by finely tuning these parameters. Nevertheless, the optimal values are always related to the infrared image content. In fig. 11, we give the ROC curves for different model parameters on Sequence 1 – 4 to evaluate their influence. Each parameter is validated to obtain a locally optimal value with the other parameters fixed. The stepped shape of our ROC curves might seem a bit odd. This is because we have adopted a much larger weighting parameter than normal RPCA-based foreground-background separation tasks in order to better fit the actual situation of single-frame infrared small target detection.
V-D1 Patch size
The patch size not only has a large impact on the separation, but also influences the computational complexity. The matrix size of the mode-1 and mode-2 unfoldings of the patch-tensor is ; the matrix size of the mode-3 unfolding is . Obviously, a smaller patch size leads to a lower computational complexity. On the one hand, we want a larger patch size to ensure that the target is sparse enough. On the other hand, a larger patch size reduces the correlation between the non-local patches, which degrades the separation results. To seek a balance between a low computational burden, target sparsity, and background correlation, we vary the patch size from to with ten intervals and provide the ROC curves in the first row of fig. 11. From the ROC curves, we can draw the following conclusions. Firstly, a smaller patch size more easily raises false alarms, while a larger patch size may lead to a relatively lower detection probability, which confirms our above analysis of the patch size. Secondly, the proposed RIPT method is not very sensitive to the patch size. The detection result for patch sizes among – is acceptable. Thirdly, seems a good choice for Sequence 1 – 4 since it achieves the best performance in ROC.
V-D2 Sliding step
The sliding step influences the patch-tensor size as well. To reduce the computational complexity, we prefer a larger sliding step, which means smaller matrices on which to perform SVD. Nevertheless, a larger sliding step also reduces the redundancy of the original patch-tensor and undermines the final detection results, since our proposed model is based on the non-local repetition of correlated patches. To investigate its influence, we vary the sliding step from to with two intervals. The results are displayed in the second row of fig. 11. It can be observed that the ROC curve of a small sliding step like tends to have a sharper shape, but its overall detection probability remains relatively low. The best value for the sliding step is among to ; here we pick . In addition, by comparing the first row with the second row of fig. 11, we can conclude that the algorithm is quite robust to the variation of the step length.
VD3 Weight stretching parameter
This parameter controls how strongly the local structure weight suppresses the clutter edges. We vary it over a range of values in the experiment and illustrate the ROC curves in the third row of fig. 11. Generally, we would like a larger value, which suppresses the undesirable non-target components thoroughly. Nevertheless, since the target-clutter distinguishing scheme is not always perfect, an overlarge value would also wipe out some targets. A typical example is the differing performance of two of the larger settings among the four sequences: for Sequence 2 and 3 they achieve the best performance, but for Sequence 1 and 4 they perform the worst. This is because the target moves along the cloud edge in many frames of Sequence 1 and 4, so an overlarge value easily mistakes the target for an edge and suppresses it completely, resulting in a lower detection probability. On the contrary, a smaller value might preserve the small target, but it also retains some non-target components, making the false-alarm ratio relatively high. For Sequence 2 and 3, when the detection probability is fixed, a small setting yields the largest false-alarm ratio. In order to seek a balance, we fix a single compromise value for the following experiments.
VD4 Weighting parameter
Despite the usage of the local structure weight, fine-tuning of the weighting parameter is still of great importance. We show its effects in the fourth row of fig. 11. Since the weighting parameter is set through a formula in our model, we vary the free factor of that formula rather than the weighting parameter itself. From the illustration, it can be observed that a large value does keep the false-alarm ratio quite low. For example, the ROC curves of two of the large settings for Sequence 2 are straight line segments. But their detection probabilities are also low, because many dim targets are suppressed by the overlarge threshold. On the contrary, when the detection probability is fixed, the false-alarm ratio of a small setting is always higher than the others, suggesting that too small a value is also not a good choice.
VD5 Penalty factor
The penalty factor is precisely the shrinking threshold of eq. 21, which influences the low-rank property of the background patch-tensor. With a smaller value, more details are preserved in the background patch-tensor, so fewer non-target components are left in the target patch-tensor. Nevertheless, the small target might be preserved in the background patch-tensor as well, resulting in no target in the target image. On the contrary, a larger value would lead to more non-target components lying in the target patch-tensor. Thus, it is necessary to choose an appropriate value to keep the balance between detection probability and false-alarm ratio. Since the penalty factor is set through a formula in our model, instead of varying it directly, we investigate its influence on Sequence 1 – 4 by varying the free factor of that formula over a range of values. The results are shown in the last row of fig. 11, from which we can observe that neither an overlarge nor a too small value is an optimal choice, and that the best value for our four sequences lies between these extremes.
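The role of a shrinking threshold can be illustrated with the soft-thresholding operator that underlies singular value thresholding [32]. The sketch below is only an illustration under stated assumptions, not the solver of the proposed model: the name tau stands for a generic threshold, and the singular values and entry values are made-up numbers. It shows that a larger threshold zeroes more singular values, which yields a lower-rank estimate, and that the same operator applied entrywise removes the dimmest entries first.

```python
def soft_threshold(value, tau):
    """Shrink value toward zero by tau; magnitudes not above tau become 0."""
    if value > tau:
        return value - tau
    if value < -tau:
        return value + tau
    return 0.0


def count_nonzero(values, tau):
    """Number of entries that survive soft-thresholding with threshold tau."""
    return sum(1 for v in values if soft_threshold(v, tau) != 0.0)


if __name__ == "__main__":
    singular_values = [12.0, 5.0, 1.5, 0.4]  # made-up spectrum
    target_entries = [0.9, 0.15, 0.05]       # bright target, dim target, residual
    for tau in (0.1, 0.5, 2.0):
        print(tau,
              count_nonzero(singular_values, tau),
              count_nonzero(target_entries, tau))
```

With these made-up numbers, raising the threshold from 0.5 to 2.0 reduces the number of surviving singular values from three to two, and a threshold of 0.5 already removes the dim entry of 0.15 while keeping the bright one.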
VE Comparison with State-of-the-Arts
In this subsection, we first compare the proposed model with the other state-of-the-art methods on the ability of clutter suppression. The target images separated by the twelve tested methods for the four representative frames of Sequence 1 – 4 in fig. 5 are shown in fig. 12 – fig. 15. It can be seen that the classical Max-Median filter does enhance the tiny targets in fig. 12(b) – fig. 15(b). Nevertheless, many non-target pixels are enhanced simultaneously, especially in fig. 13(b) and fig. 14(b), which would raise many false alarms. In fig. 12(a) – fig. 15(a), produced by TDLMS, isolated non-target points are not enhanced, but the cloud edges are highlighted, making them much brighter than the small target. Since the given target size matches the real target size exactly, the Top-Hat operator succeeds in enhancing the target region; if the sizes did not match, it would be likely to lose the target. Whether or not the given and real target sizes match, Top-Hat cannot suppress the background clutter well, and many strong clutters remain in the resulting images, as illustrated in fig. 12(c) – fig. 15(c). Although PFT can retain the target to a certain extent, the target is not necessarily the brightest and there are also many salient non-target residuals, as shown in fig. 12(d) – fig. 15(d). MPCM and WLDM fail to achieve good results because they suffer from a phenomenon we call the rare structure effect, which is caused by the inaccuracy of the local dissimilarity measure and often occurs when the target is extremely dim. We will discuss this phenomenon further in the next subsection.
In fact, the common and intrinsic reason behind the unsatisfactory performance of all these six methods lies in their preset assumption about the target shape, namely a hot spot brighter than its neighborhood. When the target is too dim to maintain a significant contrast over the non-target components, as in fig. 5(a) – (d), they might not perform as well as they usually do.
The last six tested methods are all low-rank based methods. Compared with the above six methods, their results contain fewer background details. Relatively speaking, the results of PRPCA and WIPI are not as good as those of the other four methods. Unlike the other low-rank based methods, which all build their low-rank assumptions on a data structure composed of patches, PRPCA supposes that each individual patch is low-rank. Thus, in PRPCA, each patch is processed by an individual RPCA, and all the separated target patches are then synthesized into a target image. By comparing fig. 13(g) and fig. 15(g) with fig. 13(h) and fig. 15(h), it can be seen that IPI leaves fewer edges than PRPCA. This is because a structure that is rare within a patch is not necessarily rare in the patch-image, owing to the redundancy of the whole image. Therefore, the results of IPI and IPT are much better than those of PRPCA. As to WIPI, considering that the targets in Sequence 1 – 4 are much dimmer than those in Ref. [29], it is fair to say that the steering kernel based patch-level weight is still not robust enough to handle all of the complex infrared backgrounds. From fig. 12(l) and fig. 15(l), we can see that, with the help of the local structure weight, the non-target components are suppressed completely by our proposed model. For example, the cloud residuals left by IPI in fig. 12(g) are brighter than the target, while in fig. 12(l) they are wiped out thoroughly. Based on the above comparisons, it is fair to conclude that the proposed RIPT model achieves the most satisfying performance in infrared background suppression among the twelve tested methods.
For infrared small target detection, the biggest difficulty is the interference of complex backgrounds. These undesirable background clutters raise the false-alarm rate and might even overwhelm dim targets. Therefore, the ability to suppress background clutter is a major concern in evaluating an infrared small target detection method, and quantitative indices are widely used to assess it. Table V lists the experimental data of all twelve tested methods for fig. 5(a) – (d). The grayscale of every separated target image is rescaled to a common range. It can be observed that our proposed method achieves the highest scores on all indices for all tested images. Different from the filtering based methods, for the low-rank based methods, Inf, namely infinity, is quite common, which just means that the target neighborhood completely shrinks to zero. In addition, it should be noted that high scores in these three quantitative indices merely reflect good suppression performance in a local region and do not necessarily imply a good global suppression ability.
                    Sequence 1                    Sequence 2                    Sequence 3                    Sequence 4
Method         LSNRG      SCRG       BSF     LSNRG      SCRG       BSF     LSNRG      SCRG       BSF     LSNRG      SCRG       BSF
Max-Median      5.49     10.69     12.10      1.87      5.46      7.37      2.96      6.21     11.27      7.54      9.81     16.66
Top-Hat         3.47     13.47     12.34      2.10     12.55      8.08      3.10      9.48     11.24      4.04     22.13     21.06
PFT             4.83     53.01      7.43      1.37     10.96      3.22      0.68      7.48      3.38      9.16    113.25     18.77
MPCM            1.48      7.69      3.46      1.62     11.84     15.98      0.38      1.91      2.15      1.88     14.17      4.68
WLDM            0.87      2.00      1.99      2.22      9.95     12.23      1.94      8.19      3.95      3.11     12.11      3.56
TDLMS           1.36      3.44      3.53      1.76      4.27      3.38      2.61      4.39      4.49      1.99      4.77      4.50
IPI           220.38   5215.82  19256.20     10.72    104.34    172.90       Inf       Inf       Inf       Inf   2788.19   4939.03
PRPCA           5.17    382.58  20179.12      1.30     26.88   2628.42      1.68     31.85   1982.80      2.83    267.31  20966.41
CWRPCA          2.67     36.77    602.45      4.62     40.59     58.23      7.69     98.57    201.89     52.92    441.65   2065.42
NIPPS          15.95    315.08    670.65      3.89     66.99     81.05     20.59    343.06    735.19     87.92   2280.13   3103.00
IPT             9.80   2096.70  87797.82      2.14    332.56  17488.27      3.22       Inf       Inf      2.86       Inf       Inf
RIPT             Inf       Inf       Inf       Inf       Inf       Inf       Inf       Inf       Inf       Inf       Inf       Inf
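The Inf entries in the table can be reproduced with a small numerical sketch. It assumes commonly used definitions of the three gains: LSNR as the ratio of the peak target value to the peak neighborhood value, SCR as the absolute difference of the target and neighborhood means divided by the neighborhood standard deviation, and BSF as the ratio of the input to the output neighborhood standard deviation, with each gain taken as output over input. These definitions, the helper names, and the pixel values are assumptions made for illustration only.

```python
import math
from statistics import mean, pstdev


def ratio(numerator, denominator):
    """Plain division, except that a zero denominator yields infinity.

    Simplification: 0 / 0 is also reported as infinity here.
    """
    return math.inf if denominator == 0 else numerator / denominator


def lsnr(target, neighborhood):
    """Assumed definition: peak target value over peak neighborhood value."""
    return ratio(max(target), max(neighborhood))


def scr(target, neighborhood):
    """Assumed definition: |mean difference| over neighborhood std."""
    return ratio(abs(mean(target) - mean(neighborhood)), pstdev(neighborhood))


def gains(target_in, nbhd_in, target_out, nbhd_out):
    """Return (LSNRG, SCRG, BSF) for one target and its local neighborhood."""
    lsnrg = ratio(lsnr(target_out, nbhd_out), lsnr(target_in, nbhd_in))
    scrg = ratio(scr(target_out, nbhd_out), scr(target_in, nbhd_in))
    bsf = ratio(pstdev(nbhd_in), pstdev(nbhd_out))
    return lsnrg, scrg, bsf


if __name__ == "__main__":
    target_in, nbhd_in = [40, 42], [30, 34, 28, 32]  # made-up input pixels
    target_out = [200, 210]                          # made-up output target
    print(gains(target_in, nbhd_in, target_out, [1, 3, 1, 3]))  # residual clutter
    print(gains(target_in, nbhd_in, target_out, [0, 0, 0, 0]))  # neighborhood is zero
```

When the output neighborhood is exactly zero, every denominator that depends on it vanishes and all three gains become infinite, whereas any residual clutter in the neighborhood keeps them finite.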

To further reveal the advantage of the proposed model, we display the ROC curves of Sequence 1 and Sequence 3 – 5 for comparison in fig. 16. The most interesting point is that the performance of the state-of-the-art WLDM differs greatly between Sequence 1, 3, 4 and Sequence 5: WLDM performs very well on Sequence 5 but fails on Sequence 1 – 4. We believe the reason lies in the rare structure effect, which is an inherent problem of local contrast methods. The performance of NIPPS is slightly better than that of the IPI model. Finally, the proposed algorithm achieves the highest detection probability for the same false-alarm ratio, meaning that the proposed RIPT model performs better than the other models.
VI Conclusion
To further suppress the strong edges while preserving the spatial correlation, a reweighted infrared patch-tensor model for small target detection is developed in this paper, which combines the nonlocal redundant prior and the local structure prior. A local structure weight is designed based on the structure tensor and serves as an edge indicator in the weighted model. In addition, a sparsity enhancement scheme is adopted to avoid contaminating the target image. The target-background separation task is then modeled as a reweighted robust tensor recovery problem, which can be efficiently solved via ADMM. Detailed experimental results show that the proposed model is robust to various scenarios and obtains the clearest separated target images compared with the state-of-the-art target-background separation methods.
Acknowledgments
The authors would like to thank the editor and anonymous reviewers for their helpful comments and suggestions. This work was supported in part by the National Natural Science Foundation of China under Grant no. 61573183, and by the Open Research Fund of the Key Laboratory of Spectral Imaging Technology, Chinese Academy of Sciences, under Grant no. LSIT201401.
References
 [1] S. Kim and J. Lee, “Scale invariant small target detection by optimizing signal-to-clutter ratio in heterogeneous background for infrared search and track,” Pattern Recognition, vol. 45, no. 1, pp. 393–406, 2012.
 [2] Z. Liu, F. Zhou, X. Chen, X. Bai, and C. Sun, “Iterative infrared ship target segmentation based on multiple features,” Pattern Recognition, vol. 47, no. 9, pp. 2839–2852, 2014.
 [3] X. Bai, Z. Chen, Y. Zhang, Z. Liu, and Y. Lu, “Infrared ship target segmentation based on spatial information improved fcm,” IEEE Transactions on Cybernetics, vol. 46, no. 12, pp. 3259–3271, Dec 2016.
 [4] I. S. Reed, R. M. Gagliardi, and L. B. Stotts, “Optical moving target detection with 3d matched filtering,” IEEE Transactions on Aerospace and Electronic Systems, vol. 24, no. 4, pp. 327–336, July 1988.
 [5] M. Li, T. Zhang, W. Yang, and X. Sun, “Moving weak point target detection and estimation with three-dimensional double directional filter in ir cluttered background,” Optical Engineering, vol. 44, no. 10, pp. 107 007–107 007–4, 2005.
 [6] K. A. Melendez and J. W. Modestino, “Spatiotemporal multiscan adaptive matched filtering,” Proceedings of SPIE, vol. 2561, pp. 51–65, 1995.
 [7] J. Zheng, T. Su, H. Liu, G. Liao, Z. Liu, and Q. H. Liu, “Radar high-speed target detection based on the frequency-domain deramp-keystone transform,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 1, pp. 285–294, Jan 2016.
 [8] X. Guo, J. Wu, Z. Wu, and B. Huang, “Parallel computation of aerial target reflection of background infrared radiation: Performance comparison of openmp, openacc, and cuda implementations,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 4, pp. 1653–1662, April 2016.
 [9] M. Hadhoud and D. Thomas, “The two-dimensional adaptive LMS (TDLMS) algorithm,” IEEE Transactions on Circuits and Systems, vol. 35, no. 5, pp. 485–494, May 1988.
 [10] S. D. Deshpande, M. H. Er, R. Venkateswarlu, and P. Chan, “Max-mean and max-median filters for detection of small targets,” in SPIE’s International Symposium on Optical Science, Engineering, and Instrumentation, vol. 3809. International Society for Optics and Photonics, 1999, pp. 74–83.
 [11] T.-W. Bae, F. Zhang, and I.-S. Kweon, “Edge directional 2D LMS filter for infrared small target detection,” Infrared Physics & Technology, vol. 55, no. 1, pp. 137–145, 2012.
 [12] Y. Cao, R. Liu, and J. Yang, “Small target detection using two-dimensional least mean square (TDLMS) filter based on neighborhood analysis,” International Journal of Infrared and Millimeter Waves, vol. 29, no. 2, pp. 188–200, 2008.
 [13] X. Bai and F. Zhou, “Analysis of new top-hat transformation and the application for infrared dim small target detection,” Pattern Recognition, vol. 43, no. 6, pp. 2145–2156, 2010.
 [14] S. Kim, Y. Yang, J. Lee, and Y. Park, “Small target detection utilizing robust methods of the human visual system for IRST,” Journal of Infrared, Millimeter, and Terahertz Waves, vol. 30, no. 9, pp. 994–1011, 2009.
 [15] X. Wang, G. Lv, and L. Xu, “Infrared dim target detection based on visual attention,” Infrared Physics & Technology, vol. 55, no. 6, pp. 513–521, 2012.
 [16] S. Qi, J. Ma, C. Tao, C. Yang, and J. Tian, “A robust directional saliency-based method for infrared small-target detection under various complex backgrounds,” IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 3, pp. 495–499, May 2013.
 [17] C. L. P. Chen, H. Li, Y. Wei, T. Xia, and Y. Y. Tang, “A local contrast method for small infrared target detection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 1, pp. 574–581, Jan 2014.
 [18] J. Han, Y. Ma, B. Zhou, F. Fan, K. Liang, and Y. Fang, “A robust infrared small target detection algorithm based on human visual system,” IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 12, pp. 2168–2172, Dec 2014.
 [19] Y. Wei, X. You, and H. Li, “Multiscale patch-based contrast measure for small infrared target detection,” Pattern Recognition, vol. 58, pp. 216–226, 2016.
 [20] H. Deng, X. Sun, M. Liu, C. Ye, and X. Zhou, “Infrared small-target detection using multiscale gray difference weighted image entropy,” IEEE Transactions on Aerospace and Electronic Systems, vol. 52, no. 1, pp. 60–72, February 2016.
 [21] J. Han, Y. Ma, J. Huang, X. Mei, and J. Ma, “An infrared small target detecting algorithm based on human visual system,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 3, pp. 452–456, March 2016.
 [22] Y. Chen and Y. Xin, “An efficient infrared small target detection method based on visual contrast mechanism,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 7, pp. 962–966, July 2016.
 [23] H. Deng, X. Sun, M. Liu, C. Ye, and X. Zhou, “Small infrared target detection based on weighted local difference measure,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 7, pp. 4204–4214, July 2016.
 [24] H. Deng, X. Sun, M. Liu, and C. Ye, “Entropy-based window selection for detecting dim and small infrared targets,” Pattern Recognition, vol. 61, pp. 66–77, 2017.
 [25] M. Fornasier, H. Rauhut, and R. Ward, “Low-rank matrix recovery via iteratively reweighted least squares minimization,” SIAM Journal on Optimization, vol. 21, no. 4, pp. 1614–1640, 2011.
 [26] C. Gao, D. Meng, Y. Yang, Y. Wang, X. Zhou, and A. Hauptmann, “Infrared patch-image model for small target detection in a single image,” IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 4996–5009, Dec 2013.
 [27] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM, vol. 58, no. 3, pp. 11:1–11:37, Jun. 2011.
 [28] Y. He, M. Li, J. Zhang, and Q. An, “Small infrared target detection based on low-rank and sparse representation,” Infrared Physics & Technology, vol. 68, pp. 98–109, 2015.
 [29] Y. Dai, Y. Wu, and Y. Song, “Infrared small target and background separation via column-wise weighted robust principal component analysis,” Infrared Physics & Technology, vol. 77, pp. 421–430, 2016.
 [30] Y. Dai, Y. Wu, Y. Song, and J. Guo, “Non-negative infrared patch-image model: Robust target-background separation via partial sum minimization of singular values,” Infrared Physics & Technology, vol. 81, pp. 182–194, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1350449516303723
 [31] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
 [32] J.-F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
 [33] J. Håstad, “Tensor rank is np-complete,” Journal of Algorithms, vol. 11, no. 4, pp. 644–654, 1990.
 [34] D. Goldfarb and Z. T. Qin, “Robust low-rank tensor recovery: Models and algorithms,” SIAM Journal on Matrix Analysis and Applications, vol. 35, no. 1, pp. 225–253, 2014.
 [35] J. Weickert, “Coherence-enhancing diffusion filtering,” International Journal of Computer Vision, vol. 31, no. 2-3, pp. 111–127, 1999.
 [36] Z. Wu, Q. Wang, J. Jin, and Y. Shen, “Structure tensor total variation-regularized weighted nuclear norm minimization for hyperspectral image mixed denoising,” Signal Processing, vol. 131, pp. 202–219, 2017.
 [37] W.-Z. Shao and Z.-H. Wei, “Research on filtering behavior of structure tensor based image modeling approaches,” Tien Tzu Hsueh Pao/Acta Electronica Sinica, vol. 39, no. 7, pp. 1556–1562, 2011.
 [38] E. J. Candès, M. B. Wakin, and S. P. Boyd, “Enhancing sparsity by reweighted ℓ1 minimization,” Journal of Fourier Analysis and Applications, vol. 14, no. 5, pp. 877–905, 2008.
 [39] Y. Peng, J. Suo, Q. Dai, and W. Xu, “Reweighted low-rank matrix recovery and its application in image restoration,” IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2418–2430, Dec 2014.
 [40] Y. Xie, S. Gu, Y. Liu, W. Zuo, W. Zhang, and L. Zhang, “Weighted schatten p-norm minimization for image denoising and background subtraction,” IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4842–4857, Oct 2016.
 [41] C. Lu, J. Tang, S. Yan, and Z. Lin, “Nonconvex nonsmooth low rank minimization via iteratively reweighted nuclear norm,” IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 829–839, Feb 2016.
 [42] S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, and L. Zhang, “Weighted nuclear norm minimization and its applications to low level vision,” International Journal of Computer Vision, pp. 1–26, 2016.
 [43] C. Gao, Y. Du, J. Liu, J. Lv, L. Yang, D. Meng, and A. G. Hauptmann, “Infar dataset: Infrared action recognition at different times,” Neurocomputing, vol. 212, pp. 36–47, 2016, Chinese Conference on Computer Vision 2015 (CCCV 2015).
 [44] J. Chen, L. Jiao, W. Ma, and H. Liu, “Unsupervised high-level feature extraction of sar imagery with structured sparsity priors and incremental dictionary learning,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 10, pp. 1467–1471, Oct 2016.
 [45] J. Chen, L. Jiao, and Z. Wen, “High-level feature selection with dictionary learning for unsupervised sar imagery terrain classification,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 10, no. 1, pp. 145–160, Jan 2017.
 [46] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
 [47] J. Salmon and Y. Strozecki, “Patch reprojections for nonlocal methods,” Signal Processing, vol. 92, no. 2, pp. 477–489, 2012.
 [48] J. Rivest and R. Fortin, “Detection of dim targets in digital infrared imagery by morphological image processing,” Optical Engineering, vol. 35, no. 7, pp. 1886–1893, 1996.
 [49] C. Guo, Q. Ma, and L. Zhang, “Spatiotemporal saliency detection using phase spectrum of quaternion fourier transform,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition, June 2008, pp. 1–8.
 [50] C. Wang and S. Qin, “Adaptive detection method of infrared small target based on target-background separation via robust principal component analysis,” Infrared Physics & Technology, vol. 69, pp. 123–135, 2015.
 [51] B. He and X. Yuan, “On the $o(1/n)$ convergence rate of the douglas–rachford alternating direction method,” SIAM Journal on Numerical Analysis, vol. 50, no. 2, pp. 700–709, 2012.
 [52] Z. Wen, B. Hou, and L. Jiao, “Discriminative dictionary learning with two-level low rank and group sparse decomposition for image classification,” IEEE Transactions on Cybernetics, vol. PP, no. 99, pp. 1–14, 2016.