Spatio-temporal data naturally arise in many fields, such as environmental sciences, geophysics, soil science, oceanography, econometrics, epidemiology, forestry and image processing, in which the data of interest are collected across space. The literature on spatio-temporal models is relatively abundant; see, for example, the monograph of Cressie & Wikle (2015).
Complex issues arise in spatial analysis, many of which are neither clearly defined nor completely resolved, but form the basis for current research. Among the practical considerations that influence the techniques available for spatial data modelling is data dependency. In fact, spatial data are often dependent, and a spatial model must be able to handle this aspect. Note that linear models for spatial data capture only global linear relationships between spatial locations. However, in many circumstances the spatial dependency is not linear. This is, for example, the classical case where one deals with the spatial pattern of extreme events, such as in the economic analysis of poverty.
In such situations, it is more appropriate to use a nonlinear measure of spatial dependence, for instance through the concept of strong mixing coefficients (see Tran, 1990). The literature on nonparametric estimation techniques that incorporate nonlinear spatial dependency is not extensive compared with that on linear dependence. For an overview of results and applications considering spatially dependent data for density and regression estimation, prediction and classification, we highlight the following works: Lu & Chen (2004), Hallin et al. (2004), Biau & Cadre (2004), Carbon et al. (2007), Dabo-Niang & Yao (2007), Menezes et al. (2010), El Machkouri & Stoica (2010), Wang & Wang (2009), Ternynck (2014). Other authors deal with spatial quantile regression estimation: Hallin et al. (2009), Abdi et al. (2010), Dabo-Niang et al. (2012), Younso (2017), among others.
The k-Nearest Neighbor (k-NN) kernel estimator (Biau & Devroye, 2015) is a weighted average of the response variables in a neighborhood of the value of the covariate. The k-NN kernel estimate has a significant advantage over the classical kernel estimate. The specificity of the k-NN estimator lies in the fact that it is flexible with respect to any sort of heterogeneity in the covariate, which allows it to account for the local structure of the data. This consists in using, through the choice of an appropriate number of neighbors, a random bandwidth adapted to the local structure of the data, permitting one to learn more about the local data dependency. Another advantage of the k-NN method is the easy implementation of its smoothing parameter: in the classical kernel method, the smoothing parameter is a positive real bandwidth, while in the k-NN method the smoothing parameter takes values in a discrete set.
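To fix ideas, here is a minimal sketch of such an estimator in Python (an illustration only, assuming an Epanechnikov-type kernel; the paper's estimator, defined in Section 2, additionally involves a spatial kernel):

```python
import numpy as np

def knn_kernel_regression(x0, X, Y, k):
    """k-NN kernel regression at x0: the bandwidth is the (random)
    distance from x0 to its k-th nearest neighbour, so it adapts to the
    local density of the covariates; the smoothing parameter k lives in
    the discrete set {1, ..., n}."""
    d = np.linalg.norm(X - x0, axis=1)       # distances to all covariates
    h = np.sort(d)[k - 1]                    # random k-NN bandwidth
    w = np.maximum(1.0 - (d / h) ** 2, 0.0)  # Epanechnikov-type weights
    return np.sum(w * Y) / np.sum(w)         # weighted average of responses
```

Note that the smoothing parameter k is discrete, which is what makes its selection (e.g. by cross-validation) easier than selecting a continuous bandwidth.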
The use of the k-NN method for spatial data is very recent. Li & Tran (2009) proposed a regression estimator for spatial data based on the nearest neighbors method and proved an asymptotic normality result for their estimator in the case of multivariate data. A nearest neighbors rule for classifying real-valued spatial data has recently been investigated by Younso (2017).
We are interested here in k-nearest neighbors prediction and classification of real-valued spatial data. The originality of the proposed predictor and classification rule lies in the fact that they depend on two kernels, one of which controls the distance between observations using a random bandwidth, while the other controls the spatial dependence structure. This idea was presented in Menezes et al. (2010), Dabo-Niang et al. (2016) and Ternynck (2014) in the context of kernel prediction for multivariate or functional spatial data. The outline of the present paper is as follows. In Section 2, we introduce the regression model and define the corresponding predictor. Section 3 is dedicated to the almost complete convergence of the predictor, whereas Section 4 applies the regression model to a supervised classification rule and adapts the previous asymptotic result. Sections 5 and 6 give some simulations and an application to real data, to illustrate the performance of the proposed method. Section 7 is devoted to some conclusions. Finally, the proofs of some lemmas and of the main results are postponed to the last section.
2 Model and construction of predictor
be a spatial process defined over some probability space, . We assume that the process is observable in , , and , we write if , . Let denote the Euclidean norm in or in and the indicator function. We assume that the relation between the two processes and is described by the following model:
is assumed to be independent of , the noise is centered, -mixing (see Section 3 for a description of this condition) and independent of . We are interested in predicting the spatial process at some unobserved locations, and particularly at an unobserved site , using the information that can be drawn from and the observations , where is the observed spatial set of finite cardinality tending to as and contained in , with . In the proposed predictor below, we integrate information that might be drawn from the structure of the spatial dependence between the considered site and all sites in . To achieve this objective, we do not make the usual strict stationarity assumption. We assume that the observations are locally identically distributed (as stated in assumption (H7) below; see Dabo-Niang et al. (2016) and Klemelä (2008) for more details).
Indeed, we say that a substantial number of observations has a distribution close to that of . In such a case, one may imagine that if there are enough sites close to , then the sequence may be used to predict . Assume that is integrable and that has the same distribution as that of some pair . We assume that and have unknown continuous densities with respect to the Lebesgue measure, and let and be the densities of and respectively.
A predictor of could be defined by combining the principle of the k-nearest neighbors (kNN) method, using a random bandwidth depending on the observations, and the kernel weights (see Dabo-Niang et al. (2016)), as follows:
if the denominator is not null; otherwise the predictor is equal to the empirical mean. Here,
and are two kernels from and to respectively, , and
where , are positive integer sequences.
The random bandwidth
is a positive random variable which depends on and on the observations .
The main advantage of using this predictor compared with the full kernel method proposed by Dabo-Niang et al. (2016) may lie in its easy implementation. In fact, it is easier to choose the smoothing parameters and , which take their values in a discrete subset, than the bandwidths used in the following kernel counterpart of (3) (Dabo-Niang et al., 2016):
where the bandwidths , are non-random. In addition, the fact that depends on allows the predictor to adapt to the local structure of the observations, particularly if these are heterogeneous (see Burba et al. (2009)).
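The double-kernel construction can be sketched as follows (a hypothetical implementation, with Epanechnikov-type choices assumed for both kernels; the function and parameter names are illustrative):

```python
import numpy as np

def double_kernel_knn_predict(s0, x0, sites, X, Y, k_x, k_s):
    """Predict Y at site s0 with covariate value x0.  One kernel acts on
    covariate distances with a random k_x-NN bandwidth; the other acts
    on site distances with a random k_s-NN bandwidth, encoding the
    spatial dependence structure.  Falls back to the empirical mean if
    all weights vanish, as in the definition of the predictor."""
    dx = np.linalg.norm(X - x0, axis=1)      # covariate distances
    ds = np.linalg.norm(sites - s0, axis=1)  # site distances
    hx = np.sort(dx)[k_x - 1]                # random bandwidth (covariates)
    hs = np.sort(ds)[k_s - 1]                # random bandwidth (locations)
    w = np.maximum(1 - (dx / hx) ** 2, 0) * np.maximum(1 - (ds / hs) ** 2, 0)
    return np.sum(w * Y) / np.sum(w) if w.sum() > 0 else float(Y.mean())
```

Observations that are close in covariate space but located at distant sites are thus down-weighted, unlike in a covariate-only k-NN estimate.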
3 Main results
To account for spatial dependency, we assume that the process satisfies a mixing condition defined as follows: there exists a function as , such that
where and are two finite sets of sites, denotes the cardinality of the set , and are -fields generated by the ’s, is the Euclidean distance between and , and is a positive symmetric function nondecreasing in each variable. We recall that the process is said to be strongly mixing if (see Doukhan (1994)). As usual, we will assume that satisfies:
for some (i.e. tends to zero at a polynomial rate).
Before stating the main results, let us introduce the following set of assumptions. Throughout the paper, we fix a compact subset of and, when no confusion is possible, we denote by a strictly positive generic constant.
and are continuous functions in . In addition, the density function is Lipschitz and .
and , where , and .
The kernel is bounded, of compact support and
is a bounded nonnegative function, and there exist constants , and such that
The density of is bounded in and for all and .
The densities and of and are such that
The conditional density of given and the conditional density of given exist and
for all .
Hypotheses (H4)-(H8) are usual in nonparametric estimation of spatial data, see for instance Dabo-Niang et al. (2016).
The following theorem gives an almost complete convergence of the predictor.
Under assumptions (H1)-(H5), (H8) and (H6) or (H7), we have
If is Lipschitz, we can obtain the rate of almost complete convergence stated in the following corollary.
Under assumptions (H1)-(H5), (H8) and (H6) or (H7), as ,
Under assumptions (H1)-(H5), (H8) and (H6) or (H7), we have
Under assumptions (H1)-(H5), (H8) and (H6) or (H7), as , we have
The proofs are postponed to the last section. Since the proofs of Theorem 3.1 and Corollary 3.1 follow directly from those of Lemmas 3.1 and 3.2, they are omitted. The main difficulty in the proofs of these lemmas comes from the randomness of the window . As a consequence, the numerator and denominator of are not sums of identically distributed variables. The idea is to sensibly bracket between two non-random bandwidths.
In the following, we apply the proposed prediction method to supervised classification.
4 Application to discrimination: kNN classification rule
Discrimination, or classification, is about predicting the unknown nature of an object: a discrete quantity, for example one or zero, sick or healthy, black or white. An object is a collection of numerical measurements, such as a vector of weather data. More generally, an observation of an object is a -dimensional vector . The unknown nature of the object is called a class and is denoted by , which takes values in a finite set . In classification, one constructs a function taking values in , which represents one's guess of given . The mapping is called a classifier.
Here we consider an observation of the object belonging to , with an unknown class . We want to predict this class from at some station, using a sample of this pair of variables at some stations. As in Section 2, we assume that the prediction site is , that has the same distribution as , and that the observations are locally identically distributed.
The mapping , which is called a classifier, is defined on and takes values in . An error occurs if , and the probability of error for a classifier is
It is well known that the Bayes classifier, defined by,
is the best possible classifier with respect to quadratic loss. The minimum probability of error is called the Bayes error and is denoted by . Note that depends upon the distribution of , which is unknown.
However, the classifier can be estimated from the observations , a sample of ; the resulting guess is denoted by . The performance of is measured by the conditional probability of error
A sequence is called a discrimination rule.
The kernel discrimination rule has been investigated extensively in the literature, particularly for independent or time-series data (Paredes & Vidal, 2006; Devroye et al., 1994; Devroye & Wagner, 1982; Hastie & Tibshirani, 1996); see the monograph of Biau & Devroye (2015) for more details. Recently, Younso (2017) addressed a kernel discrimination rule for multivariate strictly stationary spatial processes and binary spatial classes . To the best of our knowledge, that work is the first one dealing with spatial data.
In this section, we extend the previous kNN predictor (3) and its results to the setting where belongs to .
The Bayes classifier can be approximated by the kernel regression rule derived from the kNN regression estimate and chosen such that
Such a classifier (not necessarily uniquely determined) is called an approximate Bayes classifier.
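A minimal sketch of this plug-in rule (illustration only: plain covariate-based k-NN weights with an assumed Epanechnikov-type kernel, omitting the spatial kernel for brevity): estimate the posterior of each class by k-NN regression on the class indicator, then take the argmax.

```python
import numpy as np

def knn_plug_in_classifier(x0, X, labels, classes, k):
    """Approximate Bayes rule: k-NN kernel regression estimates of
    P(Y = g | X = x0) for each candidate class g (regression on the
    indicator 1{Y = g}), followed by an argmax over the classes."""
    d = np.linalg.norm(X - x0, axis=1)
    h = np.sort(d)[k - 1]                    # random k-NN bandwidth
    w = np.maximum(1.0 - (d / h) ** 2, 0.0)  # kernel weights
    scores = [np.sum(w * (labels == g)) for g in classes]
    return classes[int(np.argmax(scores))]   # guessed class
```

Since all class scores share the same denominator, the argmax can be taken over the unnormalised weighted sums.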
For us, a rule is good if it is consistent, that is if, in probability or almost surely as
The almost sure convergence of the proposed rule is established in the following theorem.
If assumptions (H1)-(H5), (H8) and (H6) or (H7) hold, then, as ,
The proof is derived easily from Lemma 3.1 and is therefore omitted.
Now that we have studied the theoretical behavior of our predictor and its extension to a classification rule, we investigate its practical features through some simulations, as well as an application to a multivariate dataset related to fisheries data from the West African coast.
5 Numerical experiments
5.1 Simulation dataset
In order to evaluate the efficiency of the k-NN prediction for a set of spatial data, we use the average of the mean absolute errors (MAE) to compare prediction by the k-NN method with prediction by the kernel method of Dabo-Niang et al. (2016), using simulated data based on observations such that:
Let the be independent Bernoulli random variables with parameter , , and , where denotes a stationary Gaussian random field with mean and covariance function defined by . The process allows one to control the local dependence between the sites: the greater is, the weaker the spatial dependency is. Accordingly, we provide simulation results obtained with different values of ( , , ), different grid sizes
and two variance parameters ( and ). The model is replicated times. We take kernels satisfying assumption (H4). The smoothing parameters are computed by the cross-validation procedure used in Dabo-Niang et al. (2016), based on the mean absolute error
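For illustration, such a stationary Gaussian random field can be simulated via a Cholesky factorisation of its covariance matrix. The exponential covariance sigma2 * exp(-a * d(s, t)) used below is an assumption, chosen so that a larger a yields weaker spatial dependence, consistent with the role of the dependence parameter above:

```python
import numpy as np

def simulate_grf(sites, a, sigma2=1.0, seed=0):
    """Draw one realisation of a centered stationary Gaussian random
    field at the given sites, with assumed covariance
    sigma2 * exp(-a * d(s, t)): the larger a is, the weaker the
    spatial dependence between sites."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    cov = sigma2 * np.exp(-a * d) + 1e-10 * np.eye(len(sites))  # jitter
    return np.linalg.cholesky(cov) @ rng.standard_normal(len(sites))

# a regular 10 x 10 grid of sites (illustrative grid size)
grid = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
field = simulate_grf(grid, a=0.5)
```

Cholesky simulation is exact but scales cubically with the number of sites; for large grids, spectral or circulant-embedding methods are preferable.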
Table 1: Average and standard deviation of the mean absolute errors associated with prediction by each method (kernel method and k-NN method). The p-values are very close to 0 and are therefore replaced by symbols: (***) means that the corresponding p-value is closer to 0 than that of (**).
Table 1 gives the averages and standard deviations of the mean absolute errors of both methods over the replications. The column entitled p-value gives, for each considered case, the p-value of a paired t-test performed in order to determine whether the mean absolute error of the kernel prediction is significantly greater than that of the k-NN prediction. We notice that the k-NN method performs better than the kernel method for all values of the spatial dependency parameter and of the standard deviation parameter . In particular, the k-NN method is more efficient than the kernel method, with a very small p-value, when the deviation is small, which highlights that the k-NN method is better adapted to a local data structure.
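Such a one-sided paired t-test over the replications can be reproduced as follows (the error vectors below are synthetic placeholders, not the simulation's values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_rep = 50  # number of replications (illustrative)
# placeholder MAE values: the kernel errors are built to exceed the
# k-NN errors, mimicking the situation reported in Table 1
mae_knn = rng.uniform(0.10, 0.15, size=n_rep)
mae_kernel = mae_knn + rng.uniform(0.0, 0.05, size=n_rep)

# H1: the kernel MAE is significantly greater than the k-NN MAE
t_stat, p_value = stats.ttest_rel(mae_kernel, mae_knn, alternative="greater")
```

A paired test is appropriate here because both methods are evaluated on the same replicated datasets, so their errors are correlated.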
6 Application to fisheries data from the West African coast
The data come from coastal demersal surveys off the Senegalese coast performed by the scientific team of the CRODT (Oceanographic Research Center of Dakar-Thiaroye) in the cold and hot seasons, in the North, Center and South areas, from 2001 to 2015. The fishing gear used is a standard fish trawl of length , with for the length of the bead, for the back rope and for the size of the meshes stretched at the level of the pocket. Fishing stations were visited from sunrise to sunset (diurnal hauls) at the rate of hour per station, according to the usual working methodology of the CRODT. They were essentially selected by stratified sampling, following a double stratification by area (North, Center and South) and bathymetry ( , , and ). The database includes stations described, among others, by identifying variables (campaign, number of station or trawl), temporal variables (date, year, season, starting and ending trawl times, duration and time stratum), spatial variables (starting and ending latitudes and longitudes, area, starting and ending depths, average depth and bathymetric strata), biological variables (species/group of species, family, zoological group and specific status) and environmental variables (sea bottom temperature (SBT), sea surface temperature (SST), sea bottom salinity (SBS) and sea surface salinity (SSS)). We note that the Senegalese and Mauritanian upwellings affect the spatial and seasonal distribution of coastal demersal fish. In this work, taking into account the environmental effect, we focus on classifying three species which are of economic interest in this West African region.
Dentex angolensis (Dentex) belongs to the Sparidae family, whose members are coastal marine fish present in tropical and temperate regions. Dentex angolensis is the deepest-dwelling species of the Sparidae family; it is present at depths up to .
Pagrus caeruleostictus (Pagrus) is an intertropical species belonging to the Sparidae family, very abundant south of Dakar (center zone) between and . It is present in cold waters between and .
Galeoides decadactylus (Thiekem) belongs to the Polynemidae family, which is part of the coastal community of Sciaenidae. It is found preferentially at depths between and , but is present down to . Migrations perpendicular to the coast are noted, to flee waters where the oxygen level is too low. Thiekem is a species of Guinean affinity, particularly abundant during the hot season in Senegal.
A preprocessing step has been applied to the dataset, which allowed us to identify the candidate covariates: SBT, SST, SBS and SSS. Figure 3 illustrates the spatial variation of each of these selected covariates. Figure 2 presents the spatial distribution of the three species. For example, one can observe that Thiekem is a very coastal species, mainly present at depths between 10 and 20 m. This means that Thiekem prefers high temperature and lower surface salinity; see Figure 3. We aim to compare the classification quality of our method (kernel kNN) with that of the kernel method proposed by Dabo-Niang et al. (2016) and with three other well-known classification methods, namely:
The basic k-NN classifier given by the cadet package, where the spatial dependency is ignored; the number of neighbors is chosen by cross-validation (CV). Note that, in addition to the previous covariates, the geographical coordinates (longitude and latitude) are considered as basic covariates.
Classification by logit models. We select the best model in terms of AIC (Akaike information criterion), that is, a logit model with an intercept, longitude, latitude, SBS and SBT as explanatory variables.
For each species, the dataset is split into two samples: a training and a testing sample, with respective sizes and of the whole sample size. Note that these two samples are selected randomly and have the same partition as that of the dataset. A cross-validation method based on the correct classification rate (CCR), using the training sample, is applied to each method in order to choose the optimal tuning parameters. Several kernels have been used, some of which are not compactly supported, unlike in the theoretical hypotheses.
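A sketch of this tuning step under simplifying assumptions (a plain majority-vote k-NN, spatial dependence ignored; the fold assignment and candidate grid are illustrative):

```python
import numpy as np

def ccr(y_true, y_pred):
    """Correct classification rate, used as the cross-validation score."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def majority_knn(x0, X, y, k):
    # plain k-NN majority vote (spatial dependence ignored in this sketch)
    idx = np.argsort(np.linalg.norm(X - x0, axis=1))[:k]
    vals, counts = np.unique(y[idx], return_counts=True)
    return vals[np.argmax(counts)]

def choose_k_by_cv(X, y, candidate_ks, n_folds=5, seed=0):
    """Return the k maximising the cross-validated CCR on the training
    sample; each point is predicted with its own fold left out."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(y)) % n_folds
    best_k, best_score = candidate_ks[0], -1.0
    for k in candidate_ks:
        preds = np.array([majority_knn(X[i], X[fold != fold[i]],
                                       y[fold != fold[i]], k)
                          for i in range(len(y))])
        score = ccr(y, preds)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

The same loop applies to any discrete tuning parameter, which is precisely the practical appeal of the k-NN approach noted in the introduction.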
For the three species, the results of classification over the testing sample show differences between the various methods. For the Dentex species (see Table 2), our k-NN kernel method gives the best CCR ( ), with for the CCR over the first cluster ( ) and for the second cluster ( ). Note that the other methods fail to balance the classification quality over the two classes together. For example, SVM gives a CCR equal to , with the best CCR over the second cluster, equal to , but it is less efficient for the first cluster, only . The kernel method accounts only for the second cluster and is inefficient for the first one, . The logit model seems not to be adapted to the dataset; this may be related to a possibly nonlinear relationship.
For the Pagrus species (see Table 3), the kernel method achieves a total CCR of , with and for the two clusters respectively. Note that all methods have difficulties in predicting well the elements of the second cluster.
For the Thiekem species (see Table 4), all methods give good results, with a CCR around . The proposed k-NN method seems very well adapted to this case: it gives a total CCR equal to , with and for the two respective clusters, when one uses biweight and Gaussian kernels.
Table 2: CCR of the kNN kernel method and of the kernel method (Dentex species).
Table 3: CCR of the kNN kernel method and of the kernel method (Pagrus species).
Table 4: CCR of the kNN kernel method and of the kernel method (Thiekem species).
In this work, we propose a nearest neighbors method to define a nonparametric spatial predictor and a discrimination rule for non-strictly stationary spatial processes. The originality of the proposed method is that it takes into account both the distance between sites and the distance between observations. We give an extension of the recent work of Dabo-Niang et al. (2016) on a spatial kernel predictor of a stationary multivariate process. We also extend the kernel discrimination rule of Younso (2017) for multivariate strictly stationary spatial processes and two clusters. We provide asymptotic results and simulation results on the predictor. The discrimination rule is applied to a prediction problem through an environmental fisheries dataset. The numerical results show that the proposed nearest neighbors method outperforms the kernel methods, particularly in the presence of a local spatial data structure; this is well known in the case of non-spatial data. One can then see the proposed methodology as a good alternative to the classical nearest neighbors approaches for spatial data of Li & Tran (2009) and Younso (2017), which do not take into account the proximity between locations.
We start by introducing the following technical lemmas, which will permit us to handle the difficulties induced by the random bandwidth in the expression of the function . These technical lemmas are adaptations of the results given in Collomb (1980) (for independent multivariate data) and of their generalized versions by Burba et al. (2009) and Kudraszow & Vieu (2013) (for independent functional data).
For , we define
For all and , let
where is the volume of the unit sphere in . It is clear that
If the following conditions are verified:
then we have
Under the following conditions:
Since the proof of Lemma 3.1 is based on the result of Lemma 8.1, it is sufficient to check conditions , and . For the proof of Lemma 3.2, it suffices to check conditions and . To check condition , we will need the following two lemmas.
Under assumptions of Theorem 3.1, we have
where denotes the closed ball of with center and radius .
Proof of Lemma 8.4
Let ; we can deduce that
by the following results. Under the Lipschitz condition on (assumption (H1)), we have
and for all , we deduce that
In addition, for , note that by (H5), for each
since by (18)
Using Lemma 8.3, we can write for
Let be a sequence of real numbers defined as , let be its complement in , and write
According to the definitions of and , and to equation (20), it follows that