## I Introduction

Infrared small target detection is a key technique for many military applications, such as early-warning systems, maritime surveillance systems, and precision-guided weapons and missiles [kim2012scale, liu2014iterative, bai2015infrared]. However, because of the long imaging distance of infrared detection systems, the targets usually lack fixed shape or texture features. In addition, the targets are usually immersed in strong clutter and complex noise, with a low signal-to-clutter ratio (SCR). Infrared small target detection is therefore both significant and challenging.

Infrared small target detection has been studied for decades, and various types of methods have been proposed. Among these methods, low-rank and sparse decomposition (LRSD) based methods are widely used because of their effectiveness. In the infrared patch-tensor (IPT) model, the sum of nuclear norms (SNN) [dai2017reweighted] and the tensor nuclear norm (TNN) [zhang2019infrared1] are usually introduced as convex surrogates for the tensor multi-rank. However, the SNN solution is suboptimal because it unfolds the tensor into three matrices, one along each dimension. Compared with SNN, TNN has better background estimation ability, but it assigns the same weight to all singular values, which makes the model prone to false alarms and over-shrinkage. The weighted nuclear norm minimization (WNNM) was then introduced [sun2018infrared, sun2019infrared2], but WNNM alleviates the over-shrinkage problem only to a certain extent. To further improve the accuracy of the recovered background component, Sun et al. [sun2019infrared, sun2020infrared] utilized weighted Schatten $p$-norm minimization (WSNM). More recently, Kong et al. [kong2021infrared] used a nonconvex tensor fibered rank approximation for more accurate background estimation. In summary, obtaining an accurate background estimate is a crucial problem for infrared small target detection.

In this paper, we propose an infrared small target detection method via non-convex tensor low-rank approximation. First, to approximate the rank function better than the traditional nuclear norm, we adopt a non-convex surrogate based on the Laplace function. The Laplace function approximates the $\ell_0$ norm more tightly than the $\ell_1$ norm does (see Fig. 1), and it adaptively assigns a weight to each singular value, which helps to obtain a more accurate background estimate. In addition, because the target is temporally consistent among successive frames and spatially smooth in a local area, we exploit asymmetric spatial-temporal total variation (ASTTV) to describe the background features thoroughly, which achieves good background estimation in non-uniform and non-smooth scenes. Unlike total variation (TV), which only considers spatial information, and spatial-temporal TV (STTV), which treats the spatial and temporal TV operators equally, ASTTV applies different smoothness strengths to the spatial and temporal TV terms, which makes better use of spatial-temporal information and is more flexible for target detection. Finally, we propose the ASTTV regularized non-convex tensor low-rank approximation model for infrared small target detection, named ASTTV-NTLA. The pipeline of the proposed framework is illustrated in Fig. 2. The main contributions of this paper are summarized as follows.

(1) We propose a non-convex tensor low-rank approximation (NTLA) method for infrared small target detection. Unlike existing low-rank methods, NTLA adaptively assigns different weights to different singular values through the Laplace function, which helps to obtain an accurate background estimate.

(2) To capture both spatial and temporal information, the ASTTV regularization is incorporated into the LRSD method. This regularization preserves detail and smoothness in both the spatial and temporal dimensions. Therefore, it achieves better performance in complex background scenarios.

(3) We develop an efficient algorithm based on the alternating direction method of multipliers (ADMM) to solve the ASTTV-NTLA model. With the help of the tensor singular value decomposition (t-SVD), the algorithm complexity and computation time are reduced, which makes it faster than similar methods.

The rest of this paper is organized as follows. Section II describes related work. Section III introduces some notations and preliminaries. Section IV proposes the ASTTV-NTLA model and describes its optimization procedure. Section V presents the experimental results and performance evaluation, along with discussion and analyses. Finally, Section VI concludes the paper.

## II Related Work

According to their underlying principles, infrared small target detection methods can be divided into four categories: background suppression (BS)-based, human visual system (HVS)-based, LRSD-based, and deep learning (DL)-based methods. This section describes each category in detail.

### II-A BS-based Infrared Small Target Detection Methods

The BS-based methods treat the small target as a singular point that breaks the continuity of the local background area. Representative examples include the two-dimensional least mean square (TDLMS) filter [hadhoud1988two], the Top-Hat filter [rivest1996detection], and the Max-Median filter [deshpande1999max], which first suppress the background clutter and noise with a filter and then extract the small target with an intensity threshold. These methods are computationally efficient. However, they lead to high false alarm rates and poor detection performance under clutter, noise, and other discontinuous backgrounds.

### II-B HVS-based Infrared Small Target Detection Methods

To better suppress background clutter and noise while enhancing the small target, HVS-based methods [kim2009small, chen2013local, han2014robust] have been proposed. These methods assume that the small target is more visually salient than its surrounding background. Based on this idea, Chen et al. [chen2013local] proposed a local contrast method (LCM) for infrared small target detection. Many improved LCM (ILCM) methods were subsequently proposed [han2014robust], such as the multiscale relative LCM [han2018infrared], the weighted strengthened local contrast measure (WSLCM) [han2020infrared], the multiscale tri-layer local contrast measure (TLLCM) [han2019local], and the Gaussian scale-space enhanced LCM [guan2019gaussian]. However, the performance of these methods degrades in complex backgrounds where the clutter is similar to the target in the saliency maps. Moreover, the effectiveness of the multi-scale filtering operation cannot be guaranteed.

### II-C LRSD-based Infrared Small Target Detection Methods

Another representative class of methods is LRSD, a branch of the low-rank representation (LRR) [liu2012robust] that has become popular in recent years. Gao et al. [gao2013infrared] first proposed the infrared patch-image (IPI) model via local patch construction, in which the target-background separation problem is reformulated as a robust principal component analysis (RPCA) [candes2011robust] problem. Subsequently, LRSD methods developed rapidly. However, the limitation of nuclear norm minimization (NNM) leads to the over-shrinkage problem. To handle this problem, Dai et al. [dai2016infrared] proposed the weighted infrared patch-image (WIPI) model by weighting each column in the patch image, but its computational complexity is relatively high. On this basis, Guo et al. [guo2017small] proposed a reweighted IPI (ReWIPI) model, in which WNNM was introduced to better suppress sparse non-target pixels. However, it alleviates the over-shrinkage problem only to a certain extent. Zhang et al. [zhang2018infrared] exploited the $\ell_{2,1}$ norm to constrain clutter and proposed a non-convex rank approximation minimization (NRAM) method. They further proposed non-convex optimization with an $\ell_p$-norm constraint (NOLC) [zhang2019infrared], which constrains sparse targets better through the $\ell_p$ norm. To improve the robustness of the IPI model, some methods based on multi-subspace structure were designed, such as low-rank and sparse representation (LRSR) [he2015small], stable multi-subspace learning (SMSL) [wang2017infrared], and self-regularized weighted sparse (SRWS) [zhang2021infrared]. Encouraged by the effectiveness of TV regularization, Wang et al. [wang2017infrared1] proposed a total variation regularization and principal component pursuit (TV-PCP) method.

Since the multiway tensor domain provides more views of the inner relationships in the data than the matrix domain, the reweighted infrared patch-tensor (RIPT) method [dai2017reweighted] was proposed, which extends the model from the matrix domain to the tensor domain. However, because of the limitation of SNN, RIPT is less competitive in complex scenes. To improve the performance of the RIPT model, Sun et al. [sun2018infrared, sun2019infrared] and Zhang et al. [zhang2019infrared1] exploited different tensor nuclear norms. Further, Kong et al. [kong2021infrared] applied a nonconvex tensor fibered rank approximation to infrared small target detection. In addition, to improve the robustness of the RIPT model, Sun et al. [sun2020infrared] proposed a multiple subspace learning and spatial-temporal IPT (MSLSTIPT) method. Considering the importance of TV regularization, Sun et al. [sun2019infrared2] proposed a spatial-temporal TV regularization and weighted IPT (STTV-WNIPT) model, which extends the traditional TV to explore both spatial and temporal information.

### II-D DL-based Infrared Small Target Detection Methods

Recently, deep learning based methods have attracted much attention because of their powerful feature learning ability, and they are widely used in infrared small target detection [fan2018dim, ryu2018small, zhao2020novel, dai2021asymmetric]. Although they achieve improved performance, the main challenge is that infrared small targets lack distinctive texture and shape features, which makes feature learning difficult. In addition, insufficient training data also limits their performance.

## III Notations and preliminaries

In this section, we introduce the basic notations and give detailed definitions related to the t-SVD scheme. In this paper, scalars, vectors, and matrices are denoted by lowercase letters (e.g., $x$), boldface lowercase letters (e.g., $\mathbf{x}$), and boldface capital letters (e.g., $\mathbf{X}$), respectively. Tensors are treated as multi-index arrays and are denoted by Euler script letters (e.g., $\mathcal{X}$). Readers can refer to [hu2016moving, zhang2014novel, yuan2016tensor, hu2016twist, chen2017iterative] for more details about TNN and t-SVD.

### III-A Adaptive Thresholding Using Laplace Function

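To explore the low-rank nature of the background component, existing methods generally use the TNN, which is built on the t-SVD. The t-SVD is computed from matrix SVDs of the frontal slices in the Fourier domain. For a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the t-SVD [chen2017iterative] is given by

$$\mathcal{X} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{T} \tag{1}$$

where $\mathcal{U}$ and $\mathcal{V}$ are orthogonal tensors of size $n_1 \times n_1 \times n_3$ and $n_2 \times n_2 \times n_3$, respectively, $*$ denotes the t-product, and $\mathcal{S}$ is the f-diagonal tensor of size $n_1 \times n_2 \times n_3$. Small singular values correspond to noise or other sparse disturbances, which can be removed by setting appropriate thresholds. The remaining larger singular values can then be used to reconstruct a low-rank tensor. The first step of the t-SVD is to compute the Fourier transform along the third dimension of $\mathcal{X}$. Suppose the result of this Fourier transform is $\bar{\mathcal{X}}$. The multirank is then a vector whose $i$-th component gives the rank of the $i$-th frontal slice of $\bar{\mathcal{X}}$, i.e., $\operatorname{rank}_m(\mathcal{X}) = \big(\operatorname{rank}(\bar{\mathbf{X}}^{(1)}), \ldots, \operatorname{rank}(\bar{\mathbf{X}}^{(n_3)})\big)$, where the $i$-th frontal slice is denoted as $\bar{\mathbf{X}}^{(i)}$ [martin2013order, kilmer2013third]. The sum of the singular values of all the frontal slices (i.e., the TNN) is defined as

$$\|\mathcal{X}\|_{\mathrm{TNN}} = \sum_{i=1}^{n_3} \big\|\bar{\mathbf{X}}^{(i)}\big\|_{*} \tag{2}$$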
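As a minimal sketch of the definitions above, the following NumPy code computes the multirank and the TNN of a small tensor from the singular values of its Fourier-domain frontal slices. The function names and the tolerance `tol` are illustrative assumptions, and the TNN is taken here as the plain sum of singular values without the $1/n_3$ normalization that some references include.

```python
import numpy as np

def frontal_slice_singular_values(X):
    """Singular values of every frontal slice after an FFT along the third axis."""
    Xf = np.fft.fft(X, axis=2)
    # Move the third axis first so that svd treats the input as a stack of matrices.
    slices = np.moveaxis(Xf, 2, 0)
    return np.linalg.svd(slices, compute_uv=False)  # shape (n3, min(n1, n2))

def multirank(X, tol=1e-8):
    """Rank of each Fourier-domain frontal slice (tol is an assumed relative cutoff)."""
    s = frontal_slice_singular_values(X)
    return (s > tol * s.max()).sum(axis=1)

def tnn(X):
    """Sum of the singular values of all Fourier-domain frontal slices."""
    return float(frontal_slice_singular_values(X).sum())

# Example: a rank-one matrix repeated along the third dimension.
u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 4.0])
X = np.repeat(np.outer(u, v)[:, :, None], 4, axis=2)
print(multirank(X), tnn(X))
```

Because the slices are identical, only the first Fourier-domain slice is non-zero in this example, so the multirank has a single non-zero entry.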

TNN is considered a surrogate of the tensor multirank [lu2016tensor]. Its disadvantage is that it assigns equal weight to all singular values of each frontal slice. However, for many natural images, the singular values have clear physical meanings and should be treated differently. Recently, the Laplace function was introduced into TNN to generate another non-convex approximation of the tensor multirank [xu2019laplace], which automatically assigns weights according to the importance of each singular value. It is defined as follows:

$$\|\mathcal{X}\|_{\phi} = \sum_{i=1}^{n_3} \sum_{j} \phi\big(\sigma_j(\bar{\mathbf{X}}^{(i)})\big), \qquad \phi(x) = 1 - e^{-x/\varepsilon} \tag{3}$$

where $\varepsilon$ is a positive constant and $\sigma_j(\bar{\mathbf{X}}^{(i)})$ represents the $j$-th singular value of the $i$-th frontal slice. $\phi(\cdot)$ represents the Laplace function, which approximates the $\ell_0$ norm better than the $\ell_1$ norm does (see Fig. 1). Therefore, the sum of the Laplace function is a better surrogate for the tensor multirank. For , let the globally optimal solution for the following problem

(4) |

be , where and are orthogonal tensors, denotes t-product, and is the f-diagonal singular value tensor. By performing the adaptive singular value thresholding to , it can be written as

(5) |

where is an f-diagonal tensor whose frontal slices in the Fourier domain are , is the gradient of at , and is the singular value of the frontal slice of at the previous iteration. Each iteration of the solution of the optimization problem in Eq. (4) is briefly described in Algorithm 1.
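The adaptive singular value thresholding described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the Laplace function is taken as $\phi(x) = 1 - e^{-x/\varepsilon}$, so its gradient is $e^{-x/\varepsilon}/\varepsilon$, the threshold of each singular value is `tau` times that gradient, and the names `laplace_tsvt`, `tau`, `eps`, and `sigma_prev` are hypothetical. When no previous-iteration singular values are supplied, the current ones are used as a stand-in.

```python
import numpy as np

def laplace_grad(sigma, eps):
    """Gradient of the assumed Laplace function 1 - exp(-x / eps)."""
    return np.exp(-sigma / eps) / eps

def laplace_tsvt(Y, tau, eps, sigma_prev=None):
    """Weighted singular value thresholding of each Fourier-domain frontal slice."""
    n3 = Y.shape[2]
    Yf = np.fft.fft(Y, axis=2)
    Xf = np.zeros_like(Yf)
    for i in range(n3):
        U, s, Vh = np.linalg.svd(Yf[:, :, i], full_matrices=False)
        # Weights come from the previous iterate if given, else from s itself.
        ref = s if sigma_prev is None else sigma_prev[i]
        weights = laplace_grad(ref, eps)
        s_thr = np.maximum(s - tau * weights, 0.0)
        Xf[:, :, i] = (U * s_thr) @ Vh
    return np.real(np.fft.ifft(Xf, axis=2))

# Example: a large singular value is almost untouched, a small one is removed.
Y = np.zeros((2, 2, 1))
Y[0, 0, 0], Y[1, 1, 0] = 10.0, 0.1
X = laplace_tsvt(Y, tau=1.0, eps=1.0)
print(X[:, :, 0])
```

The example shows the adaptive behaviour that motivates the surrogate: the weight decays exponentially with the singular value, so dominant components are shrunk far less than small ones.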

### III-B Asymmetric Spatial-Temporal Total Variation Regularization

Total variation (TV) regularization is widely used in infrared small target detection [wang2017infrared1, sun2019infrared3, fang2020infrared] because it preserves the spatial piecewise smoothness, edge structure, and spatial sparsity of images well. However, existing methods are based on the matrix framework and can only describe the spatial continuity of small targets while ignoring their temporal continuity. Besides, computing the SVD and the TV regularization is time-consuming. Because the target is temporally consistent among successive frames and spatially smooth in a local area, Sun et al. [sun2019infrared] extended the traditional TV to STTV, which explores both spatial and temporal information. The remarkable performance of STTV-WNIPT demonstrates the effectiveness of using spatial and temporal information simultaneously. To model spatial and temporal continuity, we propose an asymmetric spatial-temporal total variation (ASTTV) regularization approach in the tensor framework [sun2018novel]. It explores the spatial-temporal smoothness and temporal coherence of the small targets. There are two reasons for choosing the ASTTV regularization term. First, a smooth target boundary and trajectory can be enforced more efficiently by imposing the ASTTV constraint [tom2020simultaneous]. Second, STTV-WNIPT weights the spatial and temporal TV operators equally, whereas ASTTV applies different smoothness strengths to them. Therefore, ASTTV is more flexible for target detection. The ASTTV regularization can be expressed as follows:

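$$\|\mathcal{X}\|_{\mathrm{ASTTV}} = \|D_h \mathcal{X}\|_1 + \|D_v \mathcal{X}\|_1 + \delta \|D_z \mathcal{X}\|_1 \tag{6}$$

where $D_h$, $D_v$, and $D_z$ denote the horizontal, vertical, and temporal difference operators, respectively, and $\delta$ denotes a positive constant that controls the contribution of the temporal dimension. The ASTTV in Eq. (6) encourages both spatial and temporal smoothness. Its three operators are defined as

$$D_h \mathcal{X}(i,j,k) = \mathcal{X}(i+1,j,k) - \mathcal{X}(i,j,k) \tag{7}$$

$$D_v \mathcal{X}(i,j,k) = \mathcal{X}(i,j+1,k) - \mathcal{X}(i,j,k) \tag{8}$$

$$D_z \mathcal{X}(i,j,k) = \mathcal{X}(i,j,k+1) - \mathcal{X}(i,j,k) \tag{9}$$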
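To make the regularizer concrete, the sketch below evaluates an ASTTV value as the sum of the absolute forward differences along the two spatial axes plus a weighted sum along the temporal axis. Several choices here are assumptions made for illustration: the periodic boundary handling, the mapping of array axes 0, 1, and 2 to the horizontal, vertical, and temporal directions, and the names `forward_diff`, `asttv`, and `delta` (the temporal weight).

```python
import numpy as np

def forward_diff(X, axis):
    """Forward difference along one axis with periodic wrap-around (assumed)."""
    return np.roll(X, -1, axis=axis) - X

def asttv(X, delta):
    """Spatial differences weighted by 1, temporal differences weighted by delta."""
    spatial = np.abs(forward_diff(X, 0)).sum() + np.abs(forward_diff(X, 1)).sum()
    temporal = np.abs(forward_diff(X, 2)).sum()
    return spatial + delta * temporal

# Example: a 2 x 2 x 3 tensor that changes only along the temporal axis.
X = np.broadcast_to(np.arange(3.0), (2, 2, 3)).copy()
print(asttv(X, delta=0.5), asttv(X, delta=1.0))
```

Setting `delta` to 1 weights the spatial and temporal differences equally, and setting it to 0 leaves only the spatial terms, which is the asymmetry that the regularizer introduces.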

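Algorithm 1: ADMM for solving Eq. (4)

Step 1: Compute the Fourier transform of the input tensor along the third dimension.

Step 2: Compute each frontal slice of the result in the Fourier domain. For each frontal slice, do

1. compute the SVD of the slice;

2. compute the thresholded singular values;

3. reconstruct the slice from the thresholded singular values.

Step 3: Compute the inverse Fourier transform along the third dimension.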

## IV Proposed model

### IV-A Spatial-temporal Infrared Patch Tensor Model

An infrared image can be modeled as a linear superposition of a target image, a background image, and a noise image:

$$f_D = f_B + f_T + f_N \tag{10}$$

where $f_D$, $f_B$, $f_T$, and $f_N$ represent the input image, background image, target image, and noise image, respectively. We first slide a window from the top left to the bottom right of each image and stack all image patches from consecutive frames into a 3D tensor. Similar to Eq. (10), we can divide the original tensor into three parts:

$$\mathcal{D} = \mathcal{B} + \mathcal{T} + \mathcal{N} \tag{11}$$

where $\mathcal{D}, \mathcal{B}, \mathcal{T}, \mathcal{N} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ are the original patch-tensor, background patch-tensor, target patch-tensor, and noise patch-tensor, respectively. $n_1$ and $n_2$ denote the height and width of the sliding window, and $n_3$ represents the number of patches. Compared with the matrix-based methods, our data construction model has two advantages. First, the tensor domain provides more views for exploiting the inner relationships of the data. Second, the target detection performance is further improved by incorporating temporal information.
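The patch-tensor construction described above can be sketched as follows. The function name `build_patch_tensor` and the parameters `patch` (window size) and `step` (sliding step) are illustrative assumptions; the sketch uses square windows and simply stacks the patches of consecutive frames along the third axis.

```python
import numpy as np

def build_patch_tensor(frames, patch, step):
    """Stack sliding-window patches from consecutive frames into a 3D tensor."""
    patches = []
    for frame in frames:
        height, width = frame.shape
        # Slide the window from the top left to the bottom right of the frame.
        for r in range(0, height - patch + 1, step):
            for c in range(0, width - patch + 1, step):
                patches.append(frame[r:r + patch, c:c + patch])
    return np.stack(patches, axis=2)

# Example: two 8 x 8 frames, 4 x 4 windows, step 4 -> 4 patches per frame.
frames = [np.arange(64.0).reshape(8, 8), np.ones((8, 8))]
D = build_patch_tensor(frames, patch=4, step=4)
print(D.shape)
```

The third dimension of the result is the number of patches, which grows with both the number of frames and the density of the sliding step.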

### IV-B The proposed ASTTV-NTLA model

By integrating the non-convex tensor rank surrogate and ASTTV regularization into a unified framework, we propose a new model for infrared image small target detection:

(12) |

where is the Laplace function based TNN surrogate, and , , denote the positive regularization parameters for the ASTTV term, the target component, and the noise component, respectively. The first term separates the background from the whole infrared image. The second term characterizes the smooth structure of the spatial-temporal domain to remove noise and enhance local detail. The third term finds the sparse target. The last Frobenius norm term further removes heavy noise. We can then use Eq. (6) to rewrite Eq. (12) as below:

(13) |

It is worth noting that the proposed model can fully capture spatial and temporal information by incorporating the non-convex tensor low-rank surrogate and the ASTTV regularization term. The main reason is that the non-convex tensor rank surrogate assigns weights to the singular values automatically. Furthermore, ASTTV is more flexible than STTV for target detection.

### IV-C Optimization Procedure

The optimization problem in Eq. (13) can be solved effectively with the ADMM [boyd2011distributed] approach. By introducing four auxiliary variables , , , , we first rewrite the model in Eq. (13) as an equivalent problem

(14) |

The inexact augmented Lagrangian multiplier (IALM) [lin2010augmented] approach is used to solve Eq. (14), as described below:

(15) |

where , , , , represent the Lagrangian multipliers, and is a positive penalty scalar. Applying ADMM decomposes Eq. (15) into five optimization subproblems, including , , , , . Since it is hard to optimize all these variables concurrently, we solve the problem approximately by alternately minimizing over one variable while keeping the others fixed. The details are given as follows:

1) Updating with other variables being fixed:

(16) |

Let , then the optimal solution can be obtained by (5). Thus, the solution of (16) is

(17) |

The detailed solving process of Eq. (16) is shown in Algorithm 1.

2) Updating with other variables being fixed:

(18) |

The solution to Eq. (18) is equivalent to the following linear system of equations:

(19) |

where , , , , and T is the matrix transpose. By considering , , and as convolutions along two spatial directions and one temporal direction, this problem has a closed form solution via nFFT.

(20) |

where is the fast nFFT operator, is the inverse nFFT operator, and H is the complex conjugate.

3) Updating with other variables being fixed:

(21) |

The above problem (21) can be solved by performing element-wise shrinkage operation [beck2009fast]:

(22) |

where is the element-wise shrinkage operator.

4) Updating with other variables being fixed:

(23) |

The above problem can also be solved by element-wise shrinkage operator:

(24) |

5) Updating with other variables being fixed:

(25) |

The solution of the above problem can be obtained by:

(26) |

6) Updating multipliers with other variables being fixed:

(27) |

7) Updating by .
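Two of the closed-form updates above lend themselves to a short sketch: the FFT-based solution of the linear system in step 2) and the element-wise shrinkage used in steps 3) and 4). The code below is a minimal illustration under explicit assumptions, not the exact update of the model: the difference operators use periodic boundaries so that the n-dimensional FFT diagonalizes them, the system is taken as $(I + D_h^T D_h + D_v^T D_v + \delta^2 D_z^T D_z)X = R$ with unit penalty weights, the shrinkage is the usual soft-thresholding operator, and all function names are hypothetical.

```python
import numpy as np

def soft_threshold(X, tau):
    """Element-wise shrinkage: sign(x) * max(|x| - tau, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def dtd_eigenvalues(n):
    """Eigenvalues of D^T D for a periodic forward difference of length n."""
    return 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n) / n)

def solve_via_fft(rhs, delta):
    """Solve (I + Dh^T Dh + Dv^T Dv + delta^2 Dz^T Dz) X = rhs with the n-D FFT."""
    n1, n2, n3 = rhs.shape
    denom = (1.0
             + dtd_eigenvalues(n1)[:, None, None]
             + dtd_eigenvalues(n2)[None, :, None]
             + delta ** 2 * dtd_eigenvalues(n3)[None, None, :])
    return np.real(np.fft.ifftn(np.fft.fftn(rhs) / denom))

def apply_operator(X, delta):
    """Apply the same operator directly; used only to check the solver."""
    def dtd(A, axis):
        d = np.roll(A, -1, axis=axis) - A        # D A
        return np.roll(d, 1, axis=axis) - d      # D^T (D A)
    return X + dtd(X, 0) + dtd(X, 1) + delta ** 2 * dtd(X, 2)

# Example: solve a random system and verify it by applying the operator.
R = np.random.default_rng(0).standard_normal((5, 6, 4))
X = solve_via_fft(R, delta=0.5)
print(np.allclose(apply_operator(X, 0.5), R))
print(soft_threshold(np.array([-3.0, -0.5, 0.0, 0.5, 3.0]), 1.0))
```

The division in the Fourier domain replaces an explicit matrix inversion, which is why this subproblem stays cheap even for large patch-tensors.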

Finally, the proposed ASTTV-NTLA method is summarized in Algorithm 2.

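Algorithm 2: ASTTV-NTLA algorithm

Input: infrared image sequence, number of frames L, and the model parameters.

Initialize: transform the image sequence into the original patch-tensor and initialize the remaining variables.

While not converged, do

1. Update the variable of subproblem 1).

2. Update the variable of subproblem 2) by Eq. (20).

3. Update the variable of subproblem 3) by Eq. (22).

4. Update the variable of subproblem 4) by Eq. (24).

5. Update the variable of subproblem 5) by Eq. (26).

6. Update the multipliers by Eq. (27).

7. Update the quantity of step 7).

8. Check the convergence conditions.

9. Update the iteration index.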

### IV-D Target Detection Procedure

In Fig. 2, we show the specific implementation procedure of the target detection method based on the proposed ASTTV-NTLA model. Meanwhile, the detailed steps are explained in the following.

1) Patch-tensor construction. The original infrared image sequence is transformed into several patch-tensors by stacking adjacent frames in chronological order.

2) Background and target separation. According to Algorithm 2, the original patch-tensor is decomposed into a background patch-tensor, a target patch-tensor, and a noise patch-tensor.

3) Image reconstruction. The target image and background image can be reconstructed by simple inverse operation.

4) Target detection. Because the pixels of the true targets have higher values in the reconstructed target image, small targets can be extracted via a simple adaptive threshold segmentation algorithm [gao2013infrared].

(28) |

where and are the mean and standard deviation of the target image , respectively. is a constant determined experimentally. is an adaptive value. A pixel at can be segmented as target if .

### IV-E Complexity analyses

The computational complexity of the proposed method is briefly discussed here. For the input image sequence , we can obtain , and the dimension of each tensor is . In ASTTV-NTLA algorithm, the main cost is to update and , and the optimization of other variables can be solved by simple linear calculation. Updating requires performing FFT and SVDs of matrices in each iteration by t-SVT, which cost . Updating requires performing FFT operation, which cost . The denotes the iteration times. In summary, the computational cost at each iteration is .

## V Experimental results and analyses

In this section, we first introduce the evaluation metrics and baseline methods, then discuss the key parameters of our method. Finally, we compare the performance of our method with the baseline methods.

| Methods | Acronyms | Parameter settings |
| --- | --- | --- |
| Top-Hat method | Top-Hat | Structure shape: square, structure size: |
| Weighted strengthened local contrast measure | WSLCM | , Gaussian filter size: |
| Multiscale tri-layer local contrast measure | TLLCM | , Gaussian filter size: |
| Infrared Patch-Image Model | IPI | Patch size: , sliding step: 10, |
| Non-Convex Rank Approximation Minimization | NRAM | Patch size: , sliding step: 10, , , , , |
| Stable multisubspace learning | SMSL | Patch size: , sliding step: 30, , |
| Total Variation Regularization and Principal Component Pursuit | TV-PCP | , , , |
| Reweighted Infrared Patch-Tensor Model | RIPT | Patch size: , sliding step: 10, , , , |
| Partial Sum of the Tensor Nuclear Norm | PSTNN | Patch size: , sliding step: 40, , |
| Spatial-temporal Total Variation Regularization and Weighted Tensor Nuclear Norm | STTV-WNIPT | , , , , |
| Non-Convex Tensor Low-Rank Approximation for Infrared Small Target Detection | ASTTV-NTLA | , , , , |

### V-A Evaluation Metrics and Baseline Methods

For a comprehensive evaluation, four metrics, namely the local signal-to-noise ratio gain (LSNRG), background suppression factor (BSF), signal-to-clutter ratio gain (SCRG), and contrast gain (CG), are used to evaluate the background suppression ability and detection performance. LSNRG measures the gain in local signal-to-noise ratio (LSNR) and is defined as

$$\mathrm{LSNRG} = \frac{\mathrm{LSNR}_{out}}{\mathrm{LSNR}_{in}} \tag{29}$$

where $\mathrm{LSNR}_{in}$ and $\mathrm{LSNR}_{out}$ are the LSNR values before and after processing, and $\mathrm{LSNR} = P_T / P_B$. $P_T$ and $P_B$ are the maximum pixel values of the target and neighborhood, respectively. The BSF is used to compare the background suppression ability and is defined as

$$\mathrm{BSF} = \frac{\sigma_{in}}{\sigma_{out}} \tag{30}$$

where $\sigma_{in}$ and $\sigma_{out}$ are the standard deviations of the neighboring background region in the original image and the target image, respectively. The most widely used metric, SCRG, is defined as the ratio of the signal-to-clutter ratio (SCR) after processing to that before processing:

$$\mathrm{SCRG} = \frac{\mathrm{SCR}_{out}}{\mathrm{SCR}_{in}} \tag{31}$$

where SCR is defined as follows [gao2012small]:

$$\mathrm{SCR} = \frac{|\mu_t - \mu_b|}{\sigma_b} \tag{32}$$

where $\mu_t$ is the average value of the target area, and $\mu_b$ and $\sigma_b$ are the average pixel value and standard deviation of the surrounding local neighborhood region, respectively. The size of the neighboring background region is $(a + 2d) \times (b + 2d)$, where $a$ and $b$ represent the size of the target region and $d$ is the width of the neighboring area, as illustrated in Fig. 3. We use the contrast gain (CG) [gao2018infrared] to compare the ability to enlarge the gray-level difference between the target and the background, which is defined as

$$\mathrm{CG} = \frac{\mathrm{CON}_{out}}{\mathrm{CON}_{in}} \tag{33}$$

where $\mathrm{CON}_{in}$ and $\mathrm{CON}_{out}$ are the contrast (CON) of the original and target images, respectively, and CON is defined as

$$\mathrm{CON} = |\mu_t - \mu_b| \tag{34}$$

where $\mu_t$ and $\mu_b$ are the same as those in Eq. (32). In general, higher values of the above four metrics mean better background suppression ability. Note that LSNRG, BSF, and SCRG only evaluate the suppression ability in the local neighboring area, not globally. Among all the existing metrics, the detection probability $P_d$ and the false-alarm rate $F_a$ are the key performance indicators, which are defined as follows [gao2013infrared]:

(35) |

(36) |

The above two indicators range between 0 and 1.
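The local metrics above can be sketched in a few lines of NumPy. This is an illustration under assumptions rather than the evaluation code used in the experiments: SCR is taken as $|\mu_t - \mu_b| / \sigma_b$, CON as $|\mu_t - \mu_b|$, and BSF as the ratio of the background standard deviations before and after processing; the target is described by a hypothetical top-left corner (`top`, `left`), its size (`a`, `b`), and the neighborhood width `d`.

```python
import numpy as np

def local_stats(img, top, left, a, b, d):
    """Mean of the target area, and mean and std of its surrounding neighborhood."""
    target = img[top:top + a, left:left + b]
    r0, c0 = max(top - d, 0), max(left - d, 0)
    region = img[r0:top + a + d, c0:left + b + d].astype(float)
    mask = np.ones(region.shape, dtype=bool)
    mask[top - r0:top - r0 + a, left - c0:left - c0 + b] = False  # drop the target
    background = region[mask]
    return target.mean(), background.mean(), background.std()

def scr(img, top, left, a, b, d):
    mu_t, mu_b, sigma_b = local_stats(img, top, left, a, b, d)
    return abs(mu_t - mu_b) / sigma_b

def con(img, top, left, a, b, d):
    mu_t, mu_b, _ = local_stats(img, top, left, a, b, d)
    return abs(mu_t - mu_b)

def bsf(img_in, img_out, top, left, a, b, d):
    sigma_in = local_stats(img_in, top, left, a, b, d)[2]
    sigma_out = local_stats(img_out, top, left, a, b, d)[2]
    return sigma_in / sigma_out

# Example: a 3 x 3 target of value 10 on a checkerboard background of 0s and 2s.
img = (np.add.outer(np.arange(9), np.arange(9)) % 2) * 2.0
img[3:6, 3:6] = 10.0
print(scr(img, 3, 3, 3, 3, 3), con(img, 3, 3, 3, 3, 3))
```

When a method removes the local background completely, the background standard deviation after processing is zero and these ratios become infinite or undefined.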

To further evaluate the effectiveness of the proposed method, we compare its performance with ten state-of-the-art methods: a BS-based method (Top-Hat [rivest1996detection]), HVS-based methods (WSLCM [han2020infrared], TLLCM [han2019local]), and recently developed LRSD-based methods (IPI [gao2013infrared], NRAM [zhang2018infrared], SMSL [wang2017infrared], TV-PCP [wang2017infrared1], RIPT [dai2017reweighted], PSTNN [zhang2019infrared1], STTV-WNIPT [sun2019infrared]). Table I summarizes all the methods involved in the experiments and their detailed parameter settings. All the algorithms are implemented in MATLAB 2014a on a PC with a 4.4 GHz CPU and 16 GB of RAM.

| Sequence | Frames | Image Size | Target Size | Average SCR | Target Descriptions | Background Descriptions |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 120 | | | 4.79 | Fast-moving, tiny, regular shape | Scenario with multilayer cloud, heavy noise |
| 2 | 120 | | | 3.89 | Fast-moving, irregularly shaped aircraft | Fierce clouds and heavy noise |
| 3 | 120 | | | 3.51 | Small and dim, quick motion | A blurred sea-land background with a strong reflective artificial building |
| 4 | 120 | | | 2.33 | Dim and slow-moving airplane | Mountains with strong reflections |
| 5 | 120 | | | 1.83 | Small and slow-moving airplane | Village, reflective road and roof |
| 6 | 120 | | | 2.64 | Small and slow-moving airplane | Forest, reflective road and roof |

### V-B Parameter Setting and Datasets

The parameters in our model have a key influence on target detection performance. By testing on the synthetic data, we chose appropriate values for each parameter. The regularization parameter could balance the tradeoff between the non-convex tensor low-rank approximation and ASTTV regularization, and it is empirically set to 0.005 following [sun2018novel]. is set in the range of [0,1] for most of the cases, and we follow [tom2020simultaneous] to set . The parameter is used to control the relative contribution of the sparse term, and we follow [lu2019tensor] to set , where denotes a tuning parameter. The parameter is set to 100 following [wang2017hyperspectral]. In the following experiments, we set the number of frames . Moreover, we also conduct experiments to further analyze the influence of parameter , and . Please refer to Section V-D for more details.

To evaluate the detection ability and stability of the proposed method under diverse backgrounds, we simulate six image sequences from various scenes, as listed in Table II. The synthetic data are created from six real background sequences and simulated targets, and the synthetic targets are generated with the approach in [gao2013infrared]. In Sequence 1, the target is a fast-moving, regularly shaped aircraft against a heavily cluttered sky background. In Sequence 2, a small, irregularly shaped aircraft moves through fierce clouds and heavy noise. In Sequence 3, a small target moves against a blurred sea-land background that contains a brighter artificial building. In Sequence 4, the target is an airplane flying towards the mountains, and the bottom of the infrared image shows mountains with strong reflections. In Sequence 5, the target is a small, slow aircraft flying over a village with reflective roofs and roads in the background. In Sequence 6, the target is a small, slow aircraft flying over a forest with a reflective road in the background. The 2D and 3D gray distributions of representative images are also shown in Figs. 10 and 11 and Figs. 12 and 13.

### V-C Validation of the proposed ASTTV-NTLA method

In this subsection, we validate the robustness of the proposed method in various scenes.

1) Robustness to single targets scene: Firstly, the proposed method is tested on three real single target infrared image sequences. The representative images are given in the first row of Fig. 4, and the corresponding separated target images are shown in the second row. For better visualization, the targets are labeled with red boxes. It can be observed from Fig. 4 that the background clutters are suppressed perfectly and each target is detected successfully.

2) Robustness to multiple targets scene: In real scenes, the number of targets of interest varies, for example with a fleet or with multiple independently guided reentry vehicles (MIRVs). Therefore, we test the performance of the ASTTV-NTLA method in a multi-target scenario with three targets. It is worth noting that we adopt a method similar to [gao2013infrared] to synthesize the multi-target scenes. The synthetic images are given in the first row of Fig. 5, and the second row of Fig. 5 shows that the background clutter is clearly suppressed.

3) Robustness to noisy scene: In real scenes, noise is another important factor that affects the performance of target detection. Therefore, ASTTV-NTLA method is tested on several noisy scenarios. Firstly, we add Gaussian noise with to the original images, as shown in the first row of Fig. 6. The results in the second row of Fig. 6 show that ASTTV-NTLA method can suppress clutter and noise better when .

### V-D Parameter Analysis

In this subsection, we analyze the influence of the number of frames , tuning parameter , and parameter on the performance of the method.

1) Number of frames: We exploit ASTTV regularization to utilize the temporal information, and the number of frames is a key parameter. We vary from 2 to 6 with a step of 1. The corresponding ROC curves are shown in Fig. 7. It can be observed that the performance is the best when . It is worth noting that if the value of is too small, the detection probability will decrease. At the same time, the low-rank assumption could fail when is too large, and the performance of the proposed method would degrade. To achieve a balance between the performance and effectiveness, we set in the following experiments, and we find it works well for all the experiments.

2) Tuning parameter: Weighting parameter plays an important role in the optimization process of the model. We vary tuning parameter from 2 to 10 with a step of 2, and the corresponding ROC curves are given in Fig. 8. It can be observed from Figs. 8 (a) and (b) that the ROC curves of demonstrate that an overly large will decrease the detection probability. Meanwhile, from the results of and in Figs. 8 (a), (d) and (f), we can conclude that an overly small will increase the false alarm rate. So we set in the following experiments, and it should be noted that it is possible to further improve the performance by tuning more carefully.

3) Parameter : has an impact on the detection results. It means that the temporal difference in the ASTTV regularization contributes to improving the performance of the proposed method. We vary from 0 to 1 with a step of 0.2, and the corresponding ROC curves are given in Fig. 9. As shown in Fig. 9, the ROC curves of demonstrate that the detection probability will be reduced if there is no temporal information. is STTV regularization. As can be seen from Fig. 9, selecting an appropriate value yields better detection performance. So we set in the following experiments, and it should be noted that it is possible to further improve the performance by tuning more carefully.

Evaluated frames: 60th frame of Sequence 1, 100th frame of Sequence 2, 90th frame of Sequence 3.

| Method | Seq. 1 LSNRG | Seq. 1 BSF | Seq. 1 SCRG | Seq. 2 LSNRG | Seq. 2 BSF | Seq. 2 SCRG | Seq. 3 LSNRG | Seq. 3 BSF | Seq. 3 SCRG |
|---|---|---|---|---|---|---|---|---|---|
| Top-hat [rivest1996detection] | 0.59 | 0.77 | 0.33 | 0.85 | 1.27 | 0.13 | 0.37 | 0.71 | 0.26 |
| WSLCM [han2020infrared] | 1.15 | 2.18 | 41.69 | 1.02 | 3.06 | 4.26 | 1.44 | 2.46 | 4.36 |
| TLLCM [han2019local] | NaN | Inf | NaN | 1 | 5.64 | 7.93 | 0.98 | 1.13 | 4.51 |
| IPI [gao2013infrared] | 1.91 | 27.01 | 111.42 | 1.08 | 7.36 | 10.74 | 1.11 | 4.56 | 3.51 |
| NRAM [zhang2018infrared] | Inf | Inf | Inf | NaN | Inf | NaN | 1.28 | 4.74 | 4.36 |
| TV-PCP [wang2017infrared1] | 1.64 | 5.10 | 24.13 | 1.02 | 4.85 | 6.53 | 1.11 | 4.13 | 3.39 |
| SMSL [wang2017infrared] | 1.29 | 4.91 | 18.96 | 0.93 | 5.73 | 2.94 | 1.02 | 3.24 | 1.81 |
| RIPT [dai2017reweighted] | Inf | Inf | Inf | 1.05 | 5.73 | 8.05 | 1.12 | 3.60 | 3.00 |
| PSTNN [zhang2019infrared1] | Inf | Inf | Inf | 0.94 | 4.65 | 12.73 | 1.19 | 3.73 | 4.36 |
| STTV-WNIPT [sun2019infrared] | 1.48 | 1.88 | 42.50 | 2.03 | 3.85 | 11.14 | 1.82 | 3.86 | 22.90 |
| ASTTV-NTLA (ours) | Inf | Inf | Inf | 11.02 | 14.42 | 25.24 | Inf | Inf | Inf |
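The Inf and NaN entries in the table can be understood from how the three gains are formed. The sketch below is not the paper's evaluation code. It assumes the definitions commonly used in this literature, which are not restated in this section: SCR = |mean(target) − mean(background)| / std(background), LSNR = max(target) / max(background), BSF = std(input background) / std(output background), and each gain as the ratio of the output value to the input value. The helper names (`ieee_div`, `scr`, `lsnr`, `gains`) are hypothetical, and `ieee_div` only mimics floating-point division by zero for non-negative denominators, which is all these metrics need.

```python
import math
from statistics import mean, pstdev
from typing import Dict, Sequence


def ieee_div(num: float, den: float) -> float:
    """Divide like IEEE-754 arithmetic: x/0 gives Inf and 0/0 gives NaN."""
    if den != 0:
        return num / den
    if num == 0 or math.isnan(num):
        return math.nan
    return math.copysign(math.inf, num)


def scr(target: Sequence[float], background: Sequence[float]) -> float:
    """Signal-to-clutter ratio of a target region against its local background."""
    return ieee_div(abs(mean(target) - mean(background)), pstdev(background))


def lsnr(target: Sequence[float], background: Sequence[float]) -> float:
    """Local signal-to-noise ratio: peak target value over peak background value."""
    return ieee_div(max(target), max(background))


def gains(t_in: Sequence[float], b_in: Sequence[float],
          t_out: Sequence[float], b_out: Sequence[float]) -> Dict[str, float]:
    """LSNRG, BSF and SCRG from target/background pixels before and after processing."""
    return {
        "LSNRG": ieee_div(lsnr(t_out, b_out), lsnr(t_in, b_in)),
        "BSF": ieee_div(pstdev(b_in), pstdev(b_out)),
        "SCRG": ieee_div(scr(t_out, b_out), scr(t_in, b_in)),
    }


if __name__ == "__main__":
    target_in, background_in = [10.0, 12.0], [1.0, 3.0, 1.0, 3.0]
    # Local background fully suppressed, target preserved.
    print(gains(target_in, background_in, [8.0, 9.0], [0.0] * 4))
    # Local background fully suppressed, but the target is removed as well.
    print(gains(target_in, background_in, [0.0, 0.0], [0.0] * 4))
```

Under these assumed definitions, a local background that is suppressed to exactly zero drives all three gains to Inf when the target is preserved. If the target is also removed, the LSNR and SCR of the output become 0/0, which gives the pattern NaN, Inf, NaN. This is one plausible reading of the table entries rather than a statement from the paper.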

### V-E Comparison to state-of-the-art Methods

To compare the proposed method with ten state-of-the-art methods, we conduct extensive experiments on six real infrared image sequences. Figs. 10 and 11 show the comparative results on Sequences 1-6. The detection results of Top-hat are very rough: on Sequences 1-6 it enhances not only the target but also the clutter and noise. For example, residuals of the reflective mountains in Sequence 4 and of the road in Sequence 5 remain in the results. The main reason is that the filter size of Top-hat is not suitable for scenes with strong reflection clutter. As top-performing HVS methods, WSLCM and TLLCM detect the target more accurately in simple backgrounds, but clutter or missed detections still occur in complex backgrounds.

IPI is a classical LRSD method. Compared with the BS and HVS methods, IPI leaves less residual background and clutter in complex backgrounds, which motivated the development of the NRAM, TV-PCP and SMSL methods. On the highlight scene (Sequence 4) and the complex ground scenes (Sequences 5-6), IPI and NRAM still leave a little residual and clutter, while TV-PCP and SMSL perform poorly. To solve these problems, RIPT directly stacks the patches into a patch-tensor, which converts the low-rank matrix recovery problem into a tensor recovery problem. The experimental results show that RIPT suppresses clutter more cleanly than the matrix-based methods, and many improved methods, such as PSTNN and STTV-WNIPT, have since been proposed. As can be seen from Fig. 10, these tensor-based methods can suppress clutter in complex backgrounds, but some non-target pixels still remain in their target images.

In contrast, the proposed method detects targets accurately while suppressing background and noise better. The results validate the effectiveness of ASTTV and the non-convex tensor low-rank approximation. Because the dataset contains a variety of scenes, the experimental results demonstrate the robustness and superiority of the proposed method. Moreover, we show the 3D maps in Figs. 12 and 13 for an intuitive comparison; they show that the proposed method better enhances the target and suppresses the clutter.
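The patch-tensor construction mentioned for RIPT can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not the RIPT implementation: a square window of hypothetical size `patch_size` slides over the image with a hypothetical `stride`, and each window is stored as one slice of the tensor. The sketch returns the slices as a list, so the patch index is the first axis, whereas the literature usually places it on the third mode.

```python
from typing import List

Image = List[List[float]]


def build_patch_tensor(image: Image, patch_size: int, stride: int) -> List[Image]:
    """Stack sliding-window patches of an image into a 3-D patch-tensor.

    Result shape: (number of patches, patch_size, patch_size).
    Windows that would cross the image border are skipped.
    """
    rows, cols = len(image), len(image[0])
    tensor: List[Image] = []
    for top in range(0, rows - patch_size + 1, stride):
        for left in range(0, cols - patch_size + 1, stride):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            tensor.append(patch)
    return tensor


if __name__ == "__main__":
    # 4 x 4 toy image with pixel values 0..15.
    toy = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    patches = build_patch_tensor(toy, patch_size=2, stride=2)
    print(len(patches), patches[0])
```

Keeping each patch as an intact slice is what distinguishes this layout from the matrix-based IPI construction, where every patch is vectorized into a column and its local spatial structure is flattened.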

Evaluated frames: 50th frame of Sequence 4, 10th frame of Sequence 5, 90th frame of Sequence 6.

| Method | Seq. 4 LSNRG | Seq. 4 BSF | Seq. 4 SCRG | Seq. 5 LSNRG | Seq. 5 BSF | Seq. 5 SCRG | Seq. 6 LSNRG | Seq. 6 BSF | Seq. 6 SCRG |
|---|---|---|---|---|---|---|---|---|---|
| Top-hat [rivest1996detection] | 0.66 | 1.27 | 4.49 | 0.42 | 0.81 | 0.04 | 0.44 | 1.56 | 0.23 |
| WSLCM [han2020infrared] | NaN | Inf | NaN | 0.73 | 1.52 | 5.13 | 1.14 | 0.98 | 5.20 |
| TLLCM [han2019local] | NaN | Inf | NaN | 0.95 | 2.15 | 6.45 | | | |
