An important application of marine surveillance radar is the detection of small floating targets on the sea surface, such as buoys, human divers, and small boats [1]. During detection, the target signals received at the radar are buried in the strong returns reflected by the sea surface, referred to as sea clutters [2, 3]. It is known that better detection performance is achieved if prior knowledge of the sea clutter distribution is available, since it allows a proper detection threshold to be set at the detector [4, 5, 6]. Therefore, an important question is how to accurately model the distribution of sea clutters in a fluctuating sea state for small target detection.
In [7], the authors utilized the Gaussian distribution to model the amplitude of sea clutters for a low-resolution radar. As the spatial resolution of radars increased, the amplitude distribution of sea clutters was extended from the Gaussian to compound-Gaussian probability density functions (PDFs) in [8] for small target detection. Gao et al. in [9] adopted the generalized Gamma distribution to describe the statistical behavior of sea clutters and provided a parameter estimation scheme that takes both estimation precision and applicable conditions into consideration. In [10], the authors adopted the Weibull model in a constant false alarm rate (CFAR) detector for radar detection and evaluated the associated parameter optimization problem.
To summarize, the distributions of sea clutters adopted in [7, 8, 9, 10] for target detection are generally established as parametric models. However, because of the following two defects of parametric models, a considerable gap may exist between the fitted and the realistic distribution of sea clutters. Firstly, parametric models can hardly depict the spiky components in sea clutters, which are produced when a high-resolution radar works at a low grazing angle or under dynamic sea states [11]. Secondly, as the distribution of sea clutters usually varies across detection environments, assuming a fixed parametric model cannot guarantee satisfactory fitting performance in varying detection environments and thus degrades the detection performance.
Following these insights, the distribution of sea clutters should be characterized by thoroughly analyzing the collected data instead of assuming a parametric model. Inspired by this, kernel density estimation (KDE), a non-parametric approach for estimating the PDF of a random variable, can be adopted to reveal the distribution of the collected sea clutters. Unlike the methods in [7, 8, 9, 10], the KDE method uses smooth kernel functions to fit the realistic distribution of the observed data without making any assumption about it, so it can effectively capture the spiky components and flexibly adapt to varying detection environments.
When applying the KDE method, it is of utmost importance to determine two key parameters, namely the kernel function and the bandwidth. In previous work, the Gaussian kernel and traditional bandwidth selectors such as the plug-in technique were adopted in the KDE method and showed good fitting performance on random sequence samples. However, few works have studied how the KDE method performs in sea-surface target detection. In addition, it is still unclear whether other kernel functions can achieve better fitting performance than the Gaussian kernel. Furthermore, it is quite challenging to derive the optimal bandwidth for other specialized kernels with traditional, complicated bandwidth selection methods such as the plug-in technique. These challenges restrict the application of the KDE method to estimating the distribution of sea clutters.
In view of these, this paper first develops a KDE-based sea clutter modeling framework that is suitable for different kernel functions. In this framework, two fundamental problems need to be solved: the selection of a proper kernel density function and the determination of its corresponding optimal bandwidth. Considering three kinds of kernels, i.e., Gaussian, Gamma, and Weibull, we then derive their respective closed-form optimal bandwidth equations and design a fast iterative bandwidth selection algorithm to solve them.
The main contributions of this work are as follows:
- We propose a KDE-based framework that integrates kernel function selection and bandwidth optimization to precisely model the sea clutter distribution. Compared with traditional parametric methods, this framework not only takes the information of spiky components into account but also adapts to varying detection environments.
- Inspired by parametric sea clutter models, we select the Gaussian, Gamma, and Weibull distributions as the kernels in our proposed framework. In particular, we derive closed-form equations of the optimal bandwidth for these three kernels, which are unlikely to be obtained with traditional bandwidth selection methods such as the plug-in technique. Because these equations are highly complex to solve, we further design a fast iterative bandwidth selection algorithm to calculate the optimal bandwidth for each kernel.
- Experimental results show that our proposed approach outperforms the existing methods in terms of the modeling error (about two orders of magnitude reduction). Moreover, applying our modeled sea clutter distribution in the CFAR detector significantly improves the detection probability, especially in low false alarm rate cases (up to 36%).
II System Scenarios and Problem Formulation
In this section, we first introduce the realistic Intelligent PIxel processing X-band (IPIX) radar datasets, and then formulate the asymptotic mean integrated square error (AMISE) minimization problem.
II-A IPIX Datasets
In this paper, we adopt the IPIX database, an authoritative and widely used database collected on the east coast of Canada in November 1993, to model the distribution of sea clutters. According to the website maintained by Simon Haykin [14], the database contains a total of 14 datasets. Each dataset includes 14 separate spatial range cells, and each cell contains 131072 time samples. These cells can be divided into three categories. Specifically, the cell with the target signals is labeled as the primary cell, the adjacent cells affected by the target are labeled as the secondary cells, and the remaining cells are clutter-only cells. For notational simplicity, we refer to the samples from the primary cells and clutter-only cells as target signals and sea clutters, respectively.
II-B Statistical Sea Clutter Distributions
It is well known that the amplitude of sea clutters usually follows a Gaussian distribution for a low-resolution radar. However, it was soon found that the Gaussian distribution fits poorly as the spatial resolution of the radar increases. Moreover, the amplitude distribution of sea clutters at a high-resolution radar exhibits the characteristics of compound-Gaussian models. These models, e.g., the Gamma and Weibull models, have been widely used for sea clutter distribution modeling and have shown better fitting performance than the Gaussian distribution. In what follows, we present the PDFs and fitting performance of the above distributions.
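II-B1 Gaussian distribution
The PDF of the Gaussian distribution is given as
$$f_{\mathrm{G}}(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right), \tag{1}$$
where $\mu$ is the expectation of the distribution, $\sigma$ is the standard deviation, and $x$ denotes the amplitude of the clutter samples.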
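II-B2 Gamma distribution
The PDF of the Gamma distribution is presented as
$$f_{\Gamma}(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\beta x}, \quad x > 0, \tag{2}$$
where $\alpha$ and $\beta$ are the shape parameter and the rate parameter of the Gamma distribution, respectively. In addition, $\Gamma(\cdot)$ represents the complete gamma function.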
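II-B3 Weibull distribution
The PDF of the Weibull distribution can be described as
$$f_{\mathrm{W}}(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}\exp\!\left(-\left(\frac{x}{\lambda}\right)^{k}\right), \quad x \ge 0, \tag{3}$$
where $k$ and $\lambda$ are the shape parameter and the scale parameter of the Weibull distribution, respectively.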
To obtain the best combination of parameters for the aforementioned three functions, we first utilize (1), (2), and (3) to model the distribution of sea clutter data based on the mean squared error (MSE) criterion. Under the best parameter settings, we then calculate their corresponding PDFs, which are plotted in Fig. 1. The figure shows that the Gamma and Weibull distributions fit better than the Gaussian distribution when the normalized amplitude is greater than 0. However, considerable bias still exists between the curves of the statistical PDFs and the practical sea clutter PDF, especially at low normalized amplitudes. This phenomenon implies that the distribution of sea clutters should be estimated by tracking and characterizing the instantaneous changes instead of assuming a parametric model.
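The MSE criterion used for this comparison can be illustrated with a minimal Python sketch. It evaluates the three parametric PDFs against a histogram density and reports the MSE of each; it does not search for the best parameters. Everything specific here is an assumption for illustration: the function names, the symbols `mu`, `sigma`, `alpha`, `beta`, `k`, `lam`, the synthetic Weibull samples that stand in for clutter amplitudes, and the moment-matched parameter choices. None of it reproduces the IPIX data or the fitted parameters of this paper.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def gamma_pdf(x, alpha, beta):
    if x <= 0.0:
        return 0.0
    # Evaluate in the log domain to avoid overflow for large shape values.
    log_pdf = (alpha * math.log(beta) - math.lgamma(alpha)
               + (alpha - 1.0) * math.log(x) - beta * x)
    return math.exp(log_pdf)

def weibull_pdf(x, k, lam):
    if x <= 0.0:
        return 0.0
    z = x / lam
    return (k / lam) * z ** (k - 1.0) * math.exp(-(z ** k))

def histogram_density(samples, n_bins, lo, hi):
    """Return bin centers and the normalized histogram (an empirical PDF)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        if lo <= s < hi:
            counts[min(n_bins - 1, int((s - lo) / width))] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    density = [c / (len(samples) * width) for c in counts]
    return centers, density

def mse(pdf, centers, density):
    """Mean squared error between a model PDF and the empirical PDF."""
    return sum((pdf(c) - d) ** 2 for c, d in zip(centers, density)) / len(centers)

random.seed(0)
# Synthetic stand-in for clutter amplitudes: Weibull with scale 1.0 and shape 1.5.
samples = [random.weibullvariate(1.0, 1.5) for _ in range(20000)]
centers, density = histogram_density(samples, 50, 0.0, 4.0)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)

errors = {
    # Gaussian and Gamma use moment-matched parameters (an assumption, not an MSE fit).
    "gaussian": mse(lambda x: gaussian_pdf(x, mean, math.sqrt(var)), centers, density),
    "gamma": mse(lambda x: gamma_pdf(x, mean ** 2 / var, mean / var), centers, density),
    # Weibull uses the parameters that generated the synthetic data.
    "weibull": mse(lambda x: weibull_pdf(x, 1.5, 1.0), centers, density),
}
for name, err in errors.items():
    print(f"{name}: MSE = {err:.2e}")
```

Replacing the fixed parameters with a search that minimizes `mse` would give the "best combination of parameters" in the sense used above.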
II-C Problem Formulation
We first utilize kernel density estimation, a nonparametric approach for estimating the probability density function of a random variable, to model the distribution of sea clutters. Given a kernel function and a bandwidth, the KDE approach assigns a height curve to each observation point. Each curve is first normalized, and the curves are then summed by the kernel estimator to estimate the density of sea clutters. The KDE can be expressed as
$$\hat{f}_{h}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_{i}}{h}\right), \tag{4}$$
where $K(\cdot)$ is the kernel density function, $h$ is the bandwidth of the KDE method, $n$ is the number of sample points, and $x_{i}$ is the $i$-th sample point.
Generally, the kernel density function and the bandwidth are the two key factors that determine the estimation performance. For a given kernel density function, there exists an optimal bandwidth that achieves the best estimation accuracy, and a larger or smaller bandwidth leads to worse fitting performance. In what follows, we denote by $f(x)$ the density function of sea clutters and present the theoretical derivation of the optimal bandwidth.
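A minimal Python sketch of the kernel estimator, and of how the bandwidth affects the fit, is given below. It assumes a standard Gaussian kernel and synthetic standard normal samples so that the true density is known; the names `kde` and `integrated_squared_error` and the three trial bandwidths are illustrative choices, not values from this paper.

```python
import math
import random

def gaussian_kernel(u):
    """Standard Gaussian kernel (also the true density of the synthetic data)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples, h, kernel=gaussian_kernel):
    """Kernel density estimate at x: (1 / (n h)) * sum_i K((x - x_i) / h)."""
    n = len(samples)
    return sum(kernel((x - xi) / h) for xi in samples) / (n * h)

def integrated_squared_error(samples, h, true_pdf, grid, step):
    """Riemann-sum approximation of the integral of (estimate - truth)^2."""
    return sum((kde(x, samples, h) - true_pdf(x)) ** 2 for x in grid) * step

random.seed(1)
samples = [random.gauss(0.0, 1.0) for _ in range(1500)]
step = 0.05
grid = [-5.0 + step * i for i in range(201)]

# Too small, moderate, and too large bandwidths (illustrative values).
ise = {h: integrated_squared_error(samples, h, gaussian_kernel, grid, step)
       for h in (0.02, 0.3, 3.0)}
for h, err in ise.items():
    print(f"h = {h}: ISE = {err:.4f}")
```

In this sketch the moderate bandwidth gives the smallest error: the very small one produces a spiky, high-variance estimate and the very large one oversmooths, which is the trade-off that the optimal bandwidth balances.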
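To derive the expression of the optimal bandwidth, we introduce a useful criterion, referred to as the AMISE, to evaluate the fitting performance of the distribution estimation, given by
$$\mathrm{AMISE}(h) = \frac{R(K)}{nh} + \frac{1}{4}h^{4}\mu_{2}(K)^{2}R(f''), \tag{5}$$
where $R(g) = \int g^{2}(x)\,dx$ for a square-integrable function $g$, $\mu_{2}(K) = \int x^{2}K(x)\,dx$, and $f''$ is the second derivative of the density $f$. Here $K$, $h$, and $n$ denote the kernel, the bandwidth, and the number of samples, as in (4). Then the optimal bandwidth can be derived from the following optimization problem:
$$h^{*} = \arg\min_{h>0}\ \mathrm{AMISE}(h). \tag{6}$$
The solution of (6) is given in the following theorem.
Theorem 1: For each kernel density function $K$ used to estimate the unknown density $f$, there exists a general expression for the optimal bandwidth, given by
$$h^{*} = \left[\frac{R(K)}{n\,\mu_{2}(K)^{2}R(f'')}\right]^{1/5}. \tag{7}$$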
Proof of this theorem can be found in . ∎
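For readers who want the key step, the bandwidth in Theorem 1 follows from a first-order condition on the AMISE. The sketch below assumes the common notation $R(g)=\int g^{2}(x)\,dx$ for the roughness of a function, $\mu_{2}(K)=\int x^{2}K(x)\,dx$ for the second moment of the kernel, $n$ for the number of samples, and $h$ for the bandwidth; it is the standard textbook argument, not a reproduction of the cited proof.

```latex
\begin{aligned}
\mathrm{AMISE}(h) &= \frac{R(K)}{nh} + \frac{1}{4}h^{4}\mu_{2}(K)^{2}R(f''),\\[4pt]
\frac{d}{dh}\,\mathrm{AMISE}(h) &= -\frac{R(K)}{nh^{2}} + h^{3}\mu_{2}(K)^{2}R(f'') = 0
\;\Longrightarrow\;
h^{5} = \frac{R(K)}{n\,\mu_{2}(K)^{2}R(f'')},\\[4pt]
\frac{d^{2}}{dh^{2}}\,\mathrm{AMISE}(h) &= \frac{2R(K)}{nh^{3}} + 3h^{2}\mu_{2}(K)^{2}R(f'') > 0
\quad\text{for } h > 0,\\[4pt]
h^{*} &= \left[\frac{R(K)}{n\,\mu_{2}(K)^{2}R(f'')}\right]^{1/5}.
\end{aligned}
```

The positive second derivative confirms that the stationary point is the unique minimizer for $h>0$. The first term (variance) shrinks as $h$ grows while the second term (squared bias) grows, which is why both larger and smaller bandwidths degrade the fit.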
Theorem 1 indicates that three quantities must be computed in advance to determine the optimal bandwidth of the KDE method: the integral of the squared kernel, the second moment of the kernel, and the integral of the squared second derivative of the density. The first two are easily derived once the kernel function is given, but the third is difficult to obtain because the density is still unknown. To solve this problem, an estimate of the second derivative of the density is used to transform the third quantity into an easy-to-calculate expression, as illustrated in the next section.
III Derivation of Optimal Bandwidths for Different Kernels
In this section, we propose an analytical approach to solve problem (6) in Section II-C, where the derived optimal bandwidth varies with the kind of kernel. Interestingly, statistical models, e.g., the Gaussian, Gamma, and Weibull distributions, can usually reveal the physical nature of sea clutters. Inspired by this, we take the Gaussian, Gamma, and Weibull kernels as examples to evaluate the fitting performance of the KDE-based method.
III-A Gaussian Kernel Density Function
According to the Gaussian distribution (1), the kernel density function can be expressed as
To derive the optimal bandwidth in (7), the integral of the squared kernel and the second moment of the kernel must be derived first; they are quantified in the following lemma.
Lemma 1: For the Gaussian kernel, the integral of the squared kernel and the second moment of the kernel can be expressed as
Compared with these two kernel-dependent quantities, the integral of the squared second derivative of the density is more difficult to calculate, because no prior knowledge of the unknown density is available. Our idea is to first reshape this integral into an easy-to-calculate form and then estimate it via the estimator of the second derivative of the density. The following theorem quantifies the optimal bandwidth for the Gaussian kernel.
The AMISE gets its minimum for the Gaussian kernel density function when , where is a function of .
As concluded above, the AMISE gets its minimum when the optimal bandwidth is adopted in (4). Substituting into , the optimal estimation distribution of an unknown density is given by
where is the -th derivative of the kernel .
To transform the integral of the squared second derivative of , i.e., , into an easy-to-calculate expression, we utilize the function in (12) to estimate it, given by
For notational simplicity, we set
Rearranging in (13) yields
III-B Gamma Kernel Density Function
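For the Gamma kernel, we first deduce the integral of the squared kernel and the second moment of the kernel in the following lemma.
Lemma 2: For the Gamma kernel, the integral of the squared kernel and the second moment of the kernel can be expressed as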
Then, we derive the for the Gamma kernel based on Lemma 2. Similar to the proof of Theorem 2, we first utilize (12) to estimate for the Gamma kernel, and then obtain and (corresponding to and in (17) and (18), respectively). Related results are summarized in the following theorem and we omit its proof for brevity.
The AMISE gets its minimum for the Gamma kernel density function when , where , , and .
III-C Weibull Kernel Density Function
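For the Weibull kernel, we first deduce the integral of the squared kernel and the second moment of the kernel in the following lemma.
Lemma 3: For the Weibull kernel, the integral of the squared kernel and the second moment of the kernel can be expressed as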
The AMISE gets its minimum for the Weibull kernel density function when , where , , .
|Kernel functions||Optimal bandwidth|
Based on Theorems 2, 3, and 4, we have derived the closed-form equations for the optimal bandwidths of the three kernels, which are listed with their involved parameters in Table III-C. The table shows that the optimal bandwidths are all determined once the parameters of the selected kernel density functions are given. Moreover, the expressions of the optimal bandwidths all take the form of fixed-point equations, since the quantities on their right-hand sides are themselves functions of the bandwidth. This motivates us to apply a fixed-point iterative algorithm to obtain the optimal bandwidth.
III-D Algorithms for the Optimal Bandwidth Selection
In this subsection, we calculate the optimal bandwidths for the three kernels based on the equations in Table III-C. However, it is difficult to obtain an explicit expression for the optimal bandwidth, since the quantities on the right-hand sides of these equations are themselves functions of the bandwidth, so the equations cannot be solved directly. Furthermore, traditional numerical methods such as the Newton-Raphson method, which rely on the derivative of the target equation, increase the computational complexity and reduce the efficiency of the KDE.
In view of these, we adopt fixed-point theory to efficiently solve the equations in Table III-C, where the optimal bandwidths for the three kernels are obtained by finding the fixed points of the equations. To determine these fixed points efficiently, we further design a fast iterative bandwidth selection algorithm, which is described in Algorithm 1.
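The general idea of a fixed-point bandwidth search can be sketched in Python as follows. This is not Algorithm 1 of this paper, whose steps and kernel-specific equations are not reproduced here. It is a stand-in under stated assumptions: a standard Gaussian kernel, the integral of the squared second derivative estimated from pairwise sample differences with a pilot bandwidth `g`, and a Sheather-Jones-style coupling `g = c * h**(5/7)` whose constant `c` is taken from a normal reference. All function names, the constant 1.19, the tolerance, and the synthetic data are hypothetical choices.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)
R_GAUSS = 1.0 / (2.0 * math.sqrt(math.pi))  # integral of the squared Gaussian kernel

def phi4(z):
    """Fourth derivative of the standard Gaussian density."""
    return (z ** 4 - 6.0 * z ** 2 + 3.0) * math.exp(-0.5 * z * z) / SQRT_2PI

def roughness_of_second_derivative(samples, g):
    """Estimate the integral of f''(x)^2 from pairwise differences (pilot bandwidth g)."""
    n = len(samples)
    total = n * phi4(0.0)  # diagonal terms i == j
    for i in range(n):
        xi = samples[i]
        for j in range(i + 1, n):
            total += 2.0 * phi4((xi - samples[j]) / g)  # each symmetric pair once
    return total / (n * n * g ** 5)

def fixed_point_bandwidth(samples, tol=1e-5, max_iter=100):
    """Iterate h <- T(h) until the relative change falls below tol."""
    n = len(samples)
    mean = sum(samples) / n
    sigma = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    c = 1.19 * sigma ** (2.0 / 7.0)  # assumed normal-reference coupling constant
    h = 1.06 * sigma * n ** (-0.2)   # normal-reference starting point
    for iteration in range(1, max_iter + 1):
        g = c * h ** (5.0 / 7.0)
        psi = roughness_of_second_derivative(samples, g)
        if psi <= 0.0:
            raise ValueError("non-positive roughness estimate; adjust the pilot bandwidth")
        # Optimal-bandwidth formula with the second kernel moment equal to 1.
        h_new = (R_GAUSS / (n * psi)) ** 0.2
        if abs(h_new - h) <= tol * h:
            return h_new, iteration
        h = h_new
    raise RuntimeError("fixed-point iteration did not converge")

random.seed(2)
samples = [random.gauss(0.0, 1.0) for _ in range(300)]
h_star, n_iter = fixed_point_bandwidth(samples)
print(f"bandwidth = {h_star:.4f} after {n_iter} iterations")
```

The update only needs function evaluations of the right-hand side, with no derivative of the target equation, which is the practical advantage over Newton-Raphson noted above. The pairwise sum costs on the order of the squared sample size per iteration, so long records would need subsampling or binning.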
IV Experimental Results and Analysis
In this section, we present experimental results to exhibit the fitting and detection performance of our proposed method using the realistic IPIX radar datasets.
Since background sea clutter samples are needed to train the KDE model in Eq. (4), the clutter-only cells in the IPIX radar dataset, e.g., the 14th cell in dataset 17, are adopted as training sets. Based on the training sets, we obtain the best combination of parameters of the three kernel density functions under the MSE criterion, which are as follows: , , , , , , and . Then, we utilize Algorithm 1 to search for the optimal bandwidths for the different kernel functions. The experimental results show that our proposed algorithm converges very fast (within 5 iterations) and that the optimal bandwidths are equal to 0.08, 0.05, and 0.06 for the Gaussian, Gamma, and Weibull kernel functions, respectively.
Based on the derived optimal bandwidths, we first compare the PDF fitting performance of our proposed KDE methods under different kernels in Fig. 2. The figure shows that, compared with the Gaussian kernel, the curves of the Gamma and Weibull kernels are much closer to the distribution of sea clutters. This indicates that, once the optimal bandwidths are adopted for the three kernels, the kernel density function itself plays the dominant role in the estimation. Thus, it may be a better choice to select the Weibull and Gamma distributions rather than the Gaussian distribution as kernel functions.
Secondly, we compare the complementary cumulative distribution function (CCDF) fitting performance of our proposed KDE methods for different kernels. As depicted in Fig. 3, the CCDF curves of our proposed KDE methods are much closer to that of the sea clutter than those of the traditional parametric models, especially at high normalized amplitudes. Furthermore, the curves of the Gamma and Weibull kernels almost overlap the curve of the sea clutter, whereas the curve of the Gaussian kernel deviates from it to some extent. In particular, the obtained data show that, compared with traditional parametric models, the Gaussian, Gamma, and Weibull kernels can reduce the MSE from the magnitude of to the magnitudes of , , and , respectively. Therefore, the Gamma and Weibull kernels are preferred when applying the KDE methods to detect sea-surface targets.
Finally, we introduce the CFAR detector to test the detection performance of our proposed KDE method. The CFAR detector works in a two-step process, namely the training and testing steps. In the training step, the distribution of the clutter-only cell data is estimated and the background level of sea clutters is calculated. The threshold at a given false alarm rate is then determined by multiplying the background level by a given factor. In the testing step, the amplitude of each testing sample is compared with the obtained threshold to decide whether it is a target or not. The detailed detection mechanism is omitted for brevity.
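The two-step process can be sketched in Python as follows. The sketch makes a simplifying substitution that should be read as an assumption: instead of scaling a background level by a factor, it sets the threshold where the CCDF of a Gaussian-kernel KDE of the training clutter equals the target false alarm rate, found by bisection. The function names, the bandwidth of 0.05, the false alarm rate of 0.01, and the synthetic Weibull clutter are illustrative and are not taken from the experiments of this paper.

```python
import math
import random

def kde_ccdf(t, samples, h):
    """P(X > t) under a Gaussian-kernel KDE: the average of per-sample Gaussian tails."""
    scale = h * math.sqrt(2.0)
    return sum(0.5 * math.erfc((t - xi) / scale) for xi in samples) / len(samples)

def cfar_threshold(samples, h, p_fa, iters=60):
    """Training step: bisection for the amplitude whose estimated CCDF equals p_fa."""
    lo = min(samples) - 10.0 * h  # CCDF is close to 1 here
    hi = max(samples) + 10.0 * h  # CCDF is close to 0 here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kde_ccdf(mid, samples, h) > p_fa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def detect(test_samples, threshold):
    """Testing step: declare a target when the amplitude exceeds the threshold."""
    return [x > threshold for x in test_samples]

random.seed(3)
# Synthetic clutter-only amplitudes (Weibull, scale 1.0, shape 1.5) for training.
clutter_train = [random.weibullvariate(1.0, 1.5) for _ in range(2000)]
threshold = cfar_threshold(clutter_train, h=0.05, p_fa=0.01)

# Independent clutter-only data to check the realized false alarm rate.
clutter_test = [random.weibullvariate(1.0, 1.5) for _ in range(5000)]
empirical_pfa = sum(detect(clutter_test, threshold)) / len(clutter_test)
print(f"threshold = {threshold:.3f}, empirical false alarm rate = {empirical_pfa:.4f}")
```

Because the threshold is read from the tail of the estimated distribution, any modeling error in the tail translates directly into a false alarm rate that differs from the design value, which is why the accuracy of the clutter model matters most at low false alarm rates.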
Fig. 4 depicts how the detection probability of our proposed method and of the traditional parametric methods varies with the false alarm rate. The figure shows that, although the detection probabilities of all methods increase with the false alarm rate, our proposed KDE method always achieves better detection performance than the others at both high and low false alarm rates. For example, our proposed Weibull kernel KDE method improves the detection probability by , , and compared with the Weibull, Gamma, and Gaussian distribution based methods, respectively, when the false alarm rate is 0.001.
In this paper, we have put forward a KDE-based sea clutter modeling framework that is suitable for different kernel functions. In this framework, we have first derived the closed-form optimal bandwidth equations for the Gaussian, Gamma, and Weibull kernels and then designed a fast iterative bandwidth selection algorithm to solve them. Experimental results have shown that, compared with existing methods, our proposed approach can significantly decrease the error incurred by sea clutter modeling (about two orders of magnitude reduction) and improve the target detection probability (up to 36% in low false alarm rate cases).
[1] W. Zhou, J. Xie, G. Li, and Y. Du, “Robust CFAR detector with weighted amplitude iteration in nonhomogeneous sea clutter,” IEEE Trans. Aero. Elec. Sys., vol. 53, no. 3, pp. 1520–1535, Jun. 2017.
[2] G. Gao and G. Shi, “CFAR ship detection in nonhomogeneous sea clutter using polarimetric SAR data based on the notch filter,” IEEE Trans. Geosci. Remote, vol. 55, no. 8, pp. 4811–4824, Aug. 2017.
[3] Y. Li, Y. Zhang, W. Li, and T. Jiang, “Marine wireless Big Data: Efficient transmission, related applications, and challenges,” IEEE Wireless Commun., vol. 25, no. 1, pp. 19–25, Feb. 2018.
[4] S. Haykin and T. K. Bhattacharya, “Modular learning strategy for signal detection in a nonstationary environment,” IEEE Trans. Signal Process., vol. 45, no. 6, pp. 1619–1637, Jun. 1997.
[5] Y. Li, T. Jiang, M. Sheng, and Y. Zhu, “QoS-aware admission control and resource allocation in underlay device-to-device spectrum-sharing networks,” IEEE J. Sel. Areas Commun., vol. 34, no. 11, pp. 2874–2886, Nov. 2016.
[6] Y. Li, M. Sheng, Y. Sun, and Y. Shi, “Joint optimization of BS operation, user association, subcarrier assignment, and power allocation for energy-efficient HetNets,” IEEE J. Sel. Areas Commun., vol. 34, no. 12, pp. 3339–3353, Dec. 2016.
[7] F. T. Ulaby and M. C. Dobson, Handbook of Radar Scattering Statistics for Terrain. Norwood, MA: Artech House, 1989.
[8] K. Ward, “Compound representation of high resolution sea clutter,” Electron. Lett., vol. 17, no. 16, pp. 561–563, Aug. 1981.
[9] G. Gao, K. Ouyang, Y. Luo, S. Liang, and S. Zhou, “Scheme of parameter estimation for generalized Gamma distribution and its application to ship detection in SAR images,” IEEE Trans. Geosci. Remote, vol. 55, no. 3, pp. 1812–1832, Mar. 2017.
[10] D. Schleher, “Radar detection in Weibull clutter,” IEEE Trans. Aero. Elec. Sys., vol. AES-12, no. 6, pp. 736–743, Nov. 1976.
[11] Y. Wei, L. Guo, and J. Li, “Numerical simulation and analysis of the spiky sea clutter from the sea surface with breaking waves,” IEEE Trans. Antenn. Propag., vol. 63, no. 11, pp. 4983–4994, Nov. 2015.
[12] A. Qahtan, S. Wang, and X. Zhang, “KDE-Track: An efficient dynamic density estimator for data streams,” IEEE Trans. Knowl. Data En., vol. 29, no. 3, pp. 642–655, Mar. 2017.
[13] M. C. Jones, “A brief survey of bandwidth selection for density estimation,” J. Am. Stat. Assoc., vol. 91, no. 433, pp. 401–407, Jun. 1996.
[14] [Online]. Available: http://soma.ece.mcmaster.ca/ipix/.
[15] B. W. Silverman, Density Estimation for Statistics and Data Analysis. Boca Raton, FL, USA: Chapman & Hall/CRC, 1986.
[16] L. A. Alexandre, “A solve-the-equation approach for unidimensional data kernel bandwidth selection,” Technical Report, University of Beira Interior, Portugal, 2008.