1 Introduction
Spatiotemporal data naturally arise in many fields, such as environmental sciences, geophysics, soil science, oceanography, econometrics, epidemiology, forestry and image processing, in which the data of interest are collected across space. The literature on spatiotemporal models is relatively abundant; see, for example, the monograph of Cressie & Wikle (2015).
Complex issues arise in spatial analysis, many of which are neither clearly defined nor completely resolved, and they form the basis for current research. Among the practical considerations that influence the techniques available for spatial data modelling is data dependency. Indeed, spatial data are often dependent, and a spatial model must be able to handle this aspect. Notice that linear models for spatial data only capture global linear relationships between spatial locations. However, in many circumstances the spatial dependency is not linear; this is, for example, the classical case when one deals with the spatial pattern of extreme events, such as in the economic analysis of poverty.
In such situations, it is more appropriate to use a nonlinear spatial dependence measure, for instance through the concept of strong mixing coefficients (see Tran, 1990). The literature on nonparametric estimation techniques that incorporate nonlinear spatial dependence is not extensive compared to that on linear dependence. For an overview of results and applications on density and regression estimation, prediction and classification with spatially dependent data, we highlight the following works: Lu & Chen (2004), Hallin et al. (2004), Biau & Cadre (2004), Carbon et al. (2007), Dabo-Niang & Yao (2007), Menezes et al. (2010), El Machkouri & Stoica (2010), Wang & Wang (2009), Ternynck (2014). Other authors deal with spatial quantile regression estimation: Hallin et al. (2009), Abdi et al. (2010), Dabo-Niang et al. (2012), Younso (2017), among others.
The Nearest Neighbor (NN) kernel estimator (Biau & Devroye, 2015) is a weighted average of the response variables in a neighborhood of the value of the covariate. The
NN kernel estimate has a significant advantage over the classical kernel estimate. The specificity of the NN estimator lies in its flexibility with respect to any sort of heterogeneity in the covariate, which allows it to account for the local structure of the data. Choosing an appropriate number of neighbors amounts to using a random bandwidth adapted to the local structure of the data, which permits learning more about the local data dependency. Another advantage of the NN method is the easy implementation of its smoothing parameter: in the classical kernel method the smoothing parameter is a positive real bandwidth, whereas in the NN method the smoothing parameter takes values in a discrete set.
The use of the NN method for spatial data is very recent. Li & Tran (2009) proposed a regression estimator for spatial data based on the nearest neighbors method and proved an asymptotic normality result for their estimator in the case of multivariate data. The nearest neighbors rule for classifying spatial real-valued data has been investigated recently by Younso (2017).
We are interested here in nearest neighbors prediction and classification of spatial real-valued data. The originality of the suggested predictor and classification rule lies in the fact that they depend on two kernels, one of which controls the distance between observations using a random bandwidth, while the other controls the spatial dependence structure. This idea was introduced in Menezes et al. (2010), Dabo-Niang et al. (2016) and Ternynck (2014) in the context of kernel prediction for multivariate or functional spatial data. The outline of the present paper is as follows. In Section 2, we introduce the regression model and define the corresponding predictor. Section 3 is dedicated to the almost complete convergence of the predictor, whereas Section 4 applies the regression model to a supervised classification rule and adapts the previous asymptotic result. Section 5 gives some simulations, and Section 6 an application to real data, to illustrate the performance of the proposed method. Section 7 is devoted to some conclusions. Finally, the proofs of some lemmas and of the main results are postponed to the last section.
2 Model and construction of predictor
Let
be a spatial process defined over some probability space
, . We assume that the process is observable in , , and , we write if , . Let denote the Euclidean norm in or in and the indicator function. We assume that the relation between the two processes and is described by the following model:(1) 
where
(2) 
is assumed to be independent of , the noise is centered, mixing (see Section 3 for a description of this condition) and independent of . We are interested in predicting the spatial process at unobserved locations, and particularly at an unobserved site , given the information that can be drawn on and the observations , where is the observed set of sites, of finite cardinality tending to as , contained in , with . In the predictor proposed below, we integrate information that might be drawn from the structure of the spatial dependence between the considered site and all sites in . To achieve this objective, we do not make the usual strict stationarity assumption; instead, we assume that the observations are locally identically distributed (see assumption (H7) below, and Dabo-Niang et al. (2016) and Klemelä (2008) for more details).
Indeed, we say that a substantial number of observations has a distribution close to that of . In such a case, one may imagine that if there are enough sites close to , then the sequence may be used to predict . Assume that is integrable and that has the same distribution as some pair . We assume that and have unknown continuous densities with respect to Lebesgue measure, and let and be the densities of and respectively.
A predictor of can be defined by combining the principle of the nearest neighbors (NN) method, using a random bandwidth depending on the observations, with the kernel weights (see Dabo-Niang et al. (2016)), as follows:
(3) 
if the denominator is not null; otherwise the predictor is equal to the empirical mean. Here,
and are two kernels from and to respectively, , and
and
where , are positive integer sequences.
The random bandwidth
is a positive random variable which depends on
and the observations . The main advantage of using this predictor compared to the fully kernel method proposed by Dabo-Niang et al. (2016) may be its easy implementation. In fact, it is easier to choose the smoothing parameters and , which take their values in a discrete subset, than the bandwidths used in the following kernel counterpart of (3) (Dabo-Niang et al., 2016):
(4) 
where the bandwidths , are nonrandom. In addition, the fact that depends on allows the predictor to adapt to the local structure of the observations, particularly if these are heterogeneous (see Burba et al. (2009)).
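To make the construction concrete, here is a minimal numerical sketch of a two-kernel NN predictor in the spirit of (3), written with NumPy. The function name, the particular kernel choices (an Epanechnikov-type kernel for the covariates, a Gaussian kernel for the sites) and the spatial bandwidth `rho` are illustrative assumptions, not the authors' implementation; the random bandwidth is taken as the distance from the target covariate to its k-th nearest neighbor.

```python
import numpy as np

def knn_two_kernel_predict(x0, s0, X, S, Y, k, rho,
                           K1=lambda u: np.maximum(1.0 - u**2, 0.0),  # Epanechnikov-type
                           K2=lambda u: np.exp(-0.5 * u**2)):         # Gaussian
    """Two-kernel NN predictor (illustrative sketch): K1 weights covariate
    distances using a random bandwidth H (distance from x0 to its k-th
    nearest neighbour), K2 down-weights sites far from the target site s0."""
    dx = np.linalg.norm(X - x0, axis=1)   # distances in covariate space
    ds = np.linalg.norm(S - s0, axis=1)   # distances between sites
    H = max(np.sort(dx)[k - 1], np.finfo(float).tiny)  # random bandwidth
    w = K1(dx / H) * K2(ds / rho)
    if w.sum() <= 0:                      # null denominator: empirical mean
        return float(Y.mean())
    return float(np.sum(w * Y) / np.sum(w))
```

Note that, as in the text, the fallback when the denominator is null is the empirical mean of the responses.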
3 Main results
To account for spatial dependency, we assume that the process satisfies a mixing condition defined as follows: there exists a function as , such that
(5) 
where and are two finite sets of sites, denotes the cardinality of the set , and are the σ-fields generated by the ’s, is the Euclidean distance between and , and is a positive symmetric function, nondecreasing in each variable. We recall that the process is said to be strongly mixing if (see Doukhan (1994)). As usual, we will assume that verifies:
(6) 
for some (i.e. tends to zero at a polynomial rate).
Before giving the main results, let us give the following set of assumptions.
Throughout the paper, we fix a compact subset of , and when no confusion is possible, we will denote by a strictly positive generic constant.

and are continuous functions in . In addition, the density function is Lipschitz and .

and , where , and .

The kernel is bounded, of compact support and
(7) 
is a bounded nonnegative function, and there exist constants , and such that
(8) 
The density of is bounded in and for all and .

and
where 
, and
where 
The densities and of and are such that
with
The conditional density of given and the conditional density of given exist, and for all .
Remark 1

Hypotheses (H4)-(H8) are usual in nonparametric estimation for spatial data; see for instance Dabo-Niang et al. (2016).
The following theorem gives an almost complete convergence of the predictor.
Theorem 3.1
Under assumptions (H1)-(H5), (H8) and (H6) or (H7), we have
(9) 
If is Lipschitz, we can obtain the rate of almost complete convergence stated in the following corollary.
Corollary 3.1
Under assumptions (H1)-(H5), (H8) and (H6) or (H7), as ,
(10) 
The results of Theorem 3.1 and Corollary 3.1 can easily be deduced from the asymptotic results (stated respectively in Lemmas 3.1 and 3.2) for the following regression function estimate
with
Lemma 3.1
Under assumptions (H1)-(H5), (H8) and (H6) or (H7), we have
(11) 
Lemma 3.2
Under assumptions (H1)-(H5), (H8) and (H6) or (H7), as , we have
(12) 
The proofs are postponed to the last section. Since the proofs of Theorem 3.1 and Corollary 3.1 follow directly from those of Lemmas 3.1 and 3.2, they are omitted. The main difficulty in the proofs of these lemmas comes from the randomness of the window ; because of it, the numerator and denominator of are not sums of identically distributed variables. The idea is to sensibly bracket between two nonrandom bandwidths.
In the following, we apply the proposed prediction method to supervised classification.
4 Application to discrimination: kNN classification rule
Classical discrimination
or is about predicting the unknown nature of an object: a discrete quantity, for example one or zero, sick or healthy, black or white. An object is a collection of numerical measurements, such as a vector of weather data. More generally, an observation of an object is a -dimensional vector . The unknown nature of the object is called a class and is denoted by , which takes values in a finite set . In classification, one constructs a function , taking values in , which represents one’s guess of given ; this mapping is called a classifier.
Here we consider an observation of the object belonging to , with an unknown class . We want to predict this class from at some station , using a sample of this pair of variables at some stations. As in Section 2, we assume the prediction site is , that has the same distribution as , and that the observations are locally identically distributed.
The classifier errs on if , and the probability of error for a classifier is
It is well known that the Bayes classifier, defined by,
is the best possible classifier with respect to the probability of error. The minimum probability of error is called the Bayes error and is denoted by . Note that depends upon the distribution of , which is unknown.
In practice, is unknown and a classifier is estimated from the observations , a sample of ; the resulting guess of is denoted by . The performance of is measured by the conditional probability of error. A sequence is called a discrimination rule.
The kernel discrimination rule has been investigated extensively in the literature, particularly for independent or time-series data (Paredes & Vidal, 2006; Devroye et al., 1994; Devroye & Wagner, 1982; Hastie & Tibshirani, 1996); see the monograph of Biau & Devroye (2015) for more details. Recently, Younso (2017) addressed a kernel discrimination rule for strictly stationary multivariate spatial processes and binary spatial classes . To the best of our knowledge, that work is the first one dealing with spatial data.
In this section, we extend the results on the previous kNN predictor (3) to the setting where belongs to . The Bayes classifier can be approximated by the classification rule derived from the kNN regression estimate and chosen such that
(13) 
where
(14) 
Such a classifier (not necessarily uniquely determined) is called an approximate Bayes classifier.
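The plug-in construction above can be sketched numerically: for each class, estimate a score by applying the two-kernel NN regression to the class indicators, then return the class with the largest score. This is a hypothetical sketch with illustrative kernel choices and names, not the authors' code.

```python
import numpy as np

def knn_classify(x0, s0, X, S, labels, k, rho):
    """Approximate Bayes rule (sketch): score each class g by the
    two-kernel NN regression applied to the indicators 1{label_i == g},
    then return the class with the largest score."""
    dx = np.linalg.norm(X - x0, axis=1)
    ds = np.linalg.norm(S - s0, axis=1)
    H = max(np.sort(dx)[k - 1], np.finfo(float).tiny)   # k-th NN distance
    w = np.maximum(1.0 - (dx / H) ** 2, 0.0) * np.exp(-0.5 * (ds / rho) ** 2)
    classes = np.unique(labels)
    if w.sum() <= 0:                                    # fallback: majority class
        vals, counts = np.unique(labels, return_counts=True)
        return vals[int(np.argmax(counts))]
    scores = [np.sum(w * (labels == g)) for g in classes]
    return classes[int(np.argmax(scores))]
```

Since the indicators sum to one over the classes, maximizing the score is equivalent to maximizing the estimated posterior probability, as in the rule above.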
For us, a rule is good if it is consistent,
that is if, in probability or almost surely as
The almost sure convergence of the proposed rule is established in the
following theorem.
Theorem 4.1
If assumptions (H1)-(H5), (H8) and (H6) or (H7) hold, then, as ,
The proof is derived easily from Lemma 3.1 and is therefore omitted.
Now that we have established the theoretical behavior of our predictor and of its extension to a classification rule, we study their practical features through some simulations as well as an application to a multivariate dataset related to fisheries data from the West African coast.
5 Numerical experiments
5.1 Simulation dataset
In order to evaluate the efficiency of the NN prediction for a set of spatial data, we use the average of the mean absolute errors (MAE) to compare the prediction by the NN method with that by the kernel method of Dabo-Niang et al. (2016), using simulated data based on observations such that:
and
Let the be independent Bernoulli random variables with parameter , , and , where denotes a stationary Gaussian random field with mean and covariance function defined by . The process allows one to control the local dependence between the sites: the greater is, the weaker the spatial dependency. Accordingly, we provide simulation results obtained with different values of ( , , ), different grid sizes and
and two variance parameters
( and ). The model is replicated times. We take kernels and
satisfying assumption (H4). The smoothing parameters are computed using the cross-validation procedure used in Dabo-Niang et al. (2016), based on the mean absolute error
where
and
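A minimal way to generate such simulated fields and score a method over one replication is sketched below. The exponential covariance form `sigma**2 * exp(-a * ||s - t||)` is an assumption made here for illustration (it matches the statement that a larger `a` means weaker spatial dependence); the field is drawn via a Cholesky factorization of the covariance matrix.

```python
import numpy as np

def simulate_grf(sites, a, sigma, rng):
    """Stationary Gaussian random field on the given sites, under the
    assumed covariance sigma^2 * exp(-a * ||s - t||): larger `a` gives
    faster decay, i.e. weaker spatial dependence."""
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    cov = sigma**2 * np.exp(-a * d)
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(len(sites)))  # jitter for stability
    return L @ rng.standard_normal(len(sites))

def mean_absolute_error(y_true, y_pred):
    """MAE used to score a prediction method over one replication."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```

Averaging `mean_absolute_error` over replications gives the quantities reported in Table 1.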
Kernel method  NN method  

a  Mean  SD  Mean  SD  pvalue  
5  5  0.303  0.0047  0.241  0.001  **  
10  0.548  0.0308  0.396  0.017  **  
20  0.747  0.0471  0.579  0.024  **  
0.1  5  0.289  0.0045  0.149  0.0001  ***  
10  0.428  0.0052  0.198  0.0001  ***  
20  0.629  0.0006  0.289  0.0012  ***  
5  5  0.235  0.002  0.208  0.0002  **  
10  0.367  0.009  0.288  0.0023  **  
20  0.476  0.010  0.405  0.0065  **  
0.1  5  0.169  0.001  0.141  0.00001  ***  
10  0.271  0.002  0.178  0.00004  ***  
20  0.482  0.004  0.241  0.00023  *** 
Table 1: Average and standard deviation of the mean absolute errors associated with the prediction by each method. The p-values are very close to 0 and are therefore replaced by symbols: (***) means that the corresponding p-value is closer to 0 than that of (**).
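The paired comparison behind the p-value column of the table can be reproduced along these lines (illustrative data; SciPy's `ttest_rel` with a one-sided alternative is assumed available):

```python
import numpy as np
from scipy import stats

def paired_mae_test(mae_kernel, mae_nn):
    """One-sided paired t-test over replications: the alternative H1 is
    that the kernel method's MAE is greater on average than the NN one's."""
    t_stat, p_value = stats.ttest_rel(mae_kernel, mae_nn, alternative="greater")
    return float(t_stat), float(p_value)
```

A small p-value supports the claim that the kernel method's error is significantly larger than the NN method's.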
Table 1 gives the averages and standard deviations of the mean absolute errors of both methods over the replications. The column entitled p-value gives, for each considered case, the p-value of a paired t-test performed in order to determine whether the mean absolute error of the kernel prediction is significantly greater than that of the NN prediction. We notice that the NN method performs better than the kernel method for all values of the spatial dependency parameter and of the standard deviation parameter . In particular, the NN method is more efficient than the kernel method, with a very small p-value, when the deviation is small, which highlights that the NN method is better adapted to a local data structure.
6 Application to fisheries data from the West African coast
The data come from coastal demersal surveys off the Senegalese coasts performed by the scientific team of the CRODT (Oceanographic Research Center of Dakar-Thiaroye) in the cold and hot seasons, in the North, Center and South areas, from 2001 to 2015. The fishing gear used is a standard fish trawl of length , with for the length of the bead, for the back rope and for the size of the meshes stretched at the level of the pocket. Fishing stations were visited from sunrise to sunset (diurnal hauls) at the rate of hour per station, according to the usual working methodology of the CRODT. They were essentially selected by stratified sampling, following a double stratification by area (North, Center and South) and bathymetry (, , and ). The database includes stations described, among others, by identifying variables (campaign, number of station or trawl), temporal variables (date, year, season, starting and ending trawl times, duration and time stratum), spatial variables (starting and ending latitudes and longitudes, area, starting and ending depths, average depth and bathymetric strata), biological variables (species/group of species, family, zoological group and specific status) and environmental variables (sea bottom temperature (SBT), sea surface temperature (SST), sea bottom salinity (SBS) and sea surface salinity (SSS)). We note that the Senegalese and Mauritanian upwellings affect the spatial and seasonal distribution of coastal demersal fish. In this work, taking into account the environmental effect, we focus on classifying three species which are of economic interest in this West African region.

Dentex angolensis (Dentex) belongs to the Sparidae family, a family of coastal marine fish present in tropical and temperate regions. Dentex angolensis is the deepest-dwelling species of the Sparidae family. It is present at depths up to .

Pagrus caeruleostictus (Pagrus) is an intertropical species belonging to the Sparidae family, very abundant south of Dakar (center zone) between and . It is present in cold waters between and .

Galeoides decadactylus (Thiekem) belongs to the Polynemidae family, which is part of the coastal Sciaenidae community. It is found preferentially at depths between and , but is present up to . Migrations perpendicular to the coast are noted, fleeing waters where the oxygen level is too low. Thiekem is a species of Guinean affinity, particularly abundant in the hot season in Senegal.
A preprocessing step has been applied to the dataset, which allowed us to identify the relevant covariates: SBT, SST, SBS and SSS. Figure 3 illustrates the spatial variation of each of these selected covariates. Figure 2 presents the spatial distribution of the previous three species. For example, one can observe that Thiekem is a very coastal species, mainly present at depths between 10 and 20 m. This means that Thiekem prefers a high temperature and a lower surface salinity; see Figure 3. We aim to compare the classification quality of our method (kernel kNN) with that of the kernel method proposed by Dabo-Niang et al. (2016) and three other well-known classification methods, namely:

The basic kNN classifier given by the caret package, where the spatial dependency is ignored; the number of neighbors is chosen by the cross-validation (CV) method. Note that in addition to the previous covariates, the geographical coordinates (longitude and latitude) are considered as basic covariates.

SVM (support vector machines) with a radial basis function, as defined in the caret package. This model is trained with the same covariates as the previous classical kNN classifier.
Classification by logit models. We select the best model in terms of AIC (Akaike information criterion); that is, a logit model with an intercept, longitude, latitude, SBS and SBT as explanatory variables.
For each species, the dataset is split into two samples: a training sample and a testing sample, with respective sizes and of the whole sample size. Note that these two samples are selected randomly and preserve the class proportions of the dataset. A cross-validation method based on the correct classification rate (CCR), using the training sample, is applied to each method to choose the optimal tuning parameters. Several kernels have been used, some of which do not have compact support, unlike what the theoretical hypotheses require.
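The split-and-score protocol just described can be sketched as follows; the stratified split helper and its names are hypothetical, mirroring the random split that preserves class proportions, and the CCR is the fraction of correctly classified test points.

```python
import numpy as np

def correct_classification_rate(y_true, y_pred):
    """CCR: fraction of points whose predicted class is correct."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def stratified_split(labels, train_frac, rng):
    """Random train/test split preserving each class's proportion
    (hypothetical helper). Returns boolean masks (train, test)."""
    labels = np.asarray(labels)
    train = np.zeros(len(labels), dtype=bool)
    for g in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == g))
        train[idx[: int(round(train_frac * len(idx)))]] = True
    return train, ~train
```

The optimal tuning parameters are then those maximizing the CCR computed by cross-validation on the training mask only.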
For the three species, the results of classification over the testing sample show differences between the various methods. For the Dentex species (see Table 2), our kNN kernel method gives the best CCR () with for the CCR over the first cluster () and for the second cluster (). Note that the other methods fail to balance the classification quality over the two classes. For example, SVM gives a CCR equal to , with the best CCR over the second cluster, , but it is less efficient for the first cluster, only . The kernel method accounts only for the second cluster and is inefficient for the first one, . The logit model seems not adapted to the dataset; this could be related to a possible nonlinear relationship.
For the Pagrus species (see Table 3), the kernel method performs best, with of total CCR and and for the two clusters respectively. Note that all methods have difficulties in predicting well the elements of the second cluster.
For the Thiekem species (see Table 4), all methods give good results, with CCRs around . The proposed kNN method seems very well adapted to this case: it gives a total CCR equal to , with and for the two respective clusters, when one uses biweight and Gaussian kernels.
kNN kernel method  kernel method  
CCR for:  CCR for :  
All  All  
Biweight  Biweight  0.765  0.717  0.822  0.653  0.453  0.889 
Epanechnikov  0.735  0.679  0.800  0.653  0.453  0.889  
Gaussian  0.776  0.830  0.711  0.653  0.453  0.889  
Indicator  0.745  0.698  0.800  0.673  0.528  0.844  
Parzen  0.724  0.623  0.844  0.673  0.528  0.844  
Triangular  0.694  0.604  0.800  0.673  0.528  0.844  
Epanechnikov  Biweight  0.745  0.755  0.733  0.612  0.396  0.867 
Epanechnikov  0.694  0.566  0.844  0.612  0.396  0.867  
Gaussian  0.765  0.792  0.733  0.551  0.265  0.889  
Indicator  0.714  0.623  0.822  0.633  0.453  0.844  
Parzen  0.745  0.698  0.800  0.643  0.509  0.800  
Triangular  0.724  0.755  0.689  0.633  0.453  0.844  
Gaussian  Biweight  0.837  0.849  0.822  0.745  0.774  0.711 
Epanechnikov  0.847  0.868  0.822  0.745  0.736  0.756  
Gaussian  0.786  0.830  0.733  0.694  0.698  0.689  
Indicator  0.786  0.830  0.733  0.694  0.698  0.689  
Parzen  0.806  0.868  0.733  0.724  0.660  0.800  
Triangular  0.837  0.849  0.822  0.735  0.736  0.733  
Triangular  Biweight  0.684  0.547  0.844  0.653  0.509  0.822 
Epanechnikov  0.714  0.642  0.800  0.653  0.509  0.822  
Gaussian  0.745  0.755  0.733  0.643  0.453  0.867  
Indicator  0.776  0.717  0.844  0.643  0.453  0.867  
Parzen  0.735  0.660  0.822  0.653  0.509  0.822  
Triangular  0.724  0.698  0.756  0.653  0.472  0.867  
Other methods  
kNN classic  0.776  0.811  0.733  
SVM  0.816  0.887  0.733  
Logit model  0.714  0.774  0.644 
kNN kernel method  kernel method  
CCR for:  CCR for :  
All  All  
Biweight  Biweight  0.724  0.452  0.851  0.704  0.097  0.985 
Epanechnikov  0.694  0.355  0.851  0.704  0.097  0.985  
Gaussian  0.755  0.581  0.836  0.684  0.00  1.00  
Indicator  0.745  0.452  0.881  0.714  0.323  0.896  
Parzen  0.663  0.258  0.851  0.694  0.032  1.00  
Triangular  0.663  0.419  0.776  0.704  0.097  0.985  
Epanechnikov  Biweight  0.694  0.419  0.821  0.694  0.032  1.00 
Epanechnikov  0.714  0.323  0.896  0.694  0.032  1.00  
Gaussian  0.796  0.581  0.896  0.714  0.226  0.940  
Indicator  0.694  0.323  0.866  0.745  0.484  0.866  
Parzen  0.724  0.387  0.881  0.755  0.419  0.910  
Triangular  0.694  0.452  0.806  0.694  0.032  1.00  
Gaussian  Biweight  0.735  0.613  0.791  0.776  0.645  0.836 
Epanechnikov  0.745  0.581  0.821  0.786  0.452  0.940  
Gaussian  0.724  0.452  0.851  0.827  0.581  0.940  
Indicator  0.745  0.581  0.821  0.786  0.452  0.940  
Parzen  0.714  0.548  0.791  0.786  0.452  0.940  
Triangular  0.765  0.581  0.851  0.776  0.6452  0.836  
Triangular  Biweight  0.724  0.516  0.821  0.755  0.452  0.896 
Epanechnikov  0.673  0.387  0.806  0.714  0.129  0.985  
Gaussian  0.745  0.581  0.821  0.745  0.387  0.91  
Indicator  0.714  0.323  0.896  0.745  0.355  0.925  
Parzen  0.704  0.387  0.851  0.694  0.032  1.00  
Triangular  0.745  0.581  0.821  0.714  0.129  0.985  
Other methods  
kNN classic  0.735  0.516  0.836  
SVM  0.786  0.452  0.940  
Logit model  0.704  0.258  0.910 
kNN kernel method  kernel method  
CCR for:  CCR for :  
All  All  
Biweight  Biweight  0.837  0.45  0.936  0.847  0.25  1.00 
Epanechnikov  0.867  0.60  0.936  0.847  0.45  0.949  
Gaussian  0.959  0.85  0.987  0.827  0.35  0.949  
Indicator  0.867  0.45  0.974  0.857  0.45  0.962  
Parzen  0.857  0.50  0.949  0.827  0.25  0.974  
Triangular  0.939  0.75  0.987  0.827  0.25  0.974  
Epanechnikov  Biweight  0.898  0.70  0.949  0.868  0.50  0.962 
Epanechnikov  0.960  0.80  1.00  0.847  0.45  0.949  
Gaussian  0.929  0.70  0.987  0.827  0.25  0.974  
Indicator  0.949  0.80  0.987  0.827  0.15  1.00  
Parzen  0.949  0.80  0.987  0.837  0.20  1.00  
Triangular  0.939  0.75  0.987  0.837  0.35  0.962  
Gaussian  Biweight  0.878  0.70  0.923  0.878  0.65  0.936 
Epanechnikov  0.888  0.60  0.962  0.878  0.65  0.936  
Gaussian  0.929  0.70  0.987  0.888  0.55  0.974  
Indicator  0.908  0.65  0.974  0.898  0.70  0.949  
Parzen  0.888  0.75  0.923  0.878  0.65  0.936  
Triangular  0.908  0.65  0.974  0.878  0.65  0.936  
Triangular  Biweight  0.929  0.80  0.962  0.867  0.50  0.962 
Epanechnikov  0.918  0.70  0.974  0.796  0.00  1.00  
Gaussian  0.939  0.75  0.987  0.847  0.40  0.962  
Indicator  0.929  0.70  0.987  0.847  0.35  0.974  
Parzen  0.908  0.75  0.949  0.867  0.50  0.962  
Triangular  0.929  0.75  0.974  0.857  0.50  0.949  
Other methods  
kNN classic  0.857  0.4  0.974  
SVM  0.908  0.75  0.95  
Logit model  0.878  0.75  0.91 
7 Conclusion
In this work, we propose a nearest neighbors method to define a nonparametric spatial predictor and a discrimination rule for non-strictly stationary spatial processes. The originality of the proposed method is to take into account both the distance between sites and the distance between observations. We give an extension of the recent work of Dabo-Niang et al. (2016) on the spatial kernel predictor of a stationary multivariate process. We also extend the kernel discrimination rule of Younso (2017) for strictly stationary multivariate spatial processes and two clusters. We provide asymptotic results and simulation results for the predictor. The discrimination rule is applied to a prediction problem through an environmental fisheries dataset. The numerical results show that the proposed nearest neighbors method outperforms kernel methods, particularly in the presence of a local spatial data structure, as is well known in the case of nonspatial data. One can then see the proposed methodology as a good alternative to the classical nearest neighbors approaches for spatial data of Li & Tran (2009) and Younso (2017), which do not take into account the proximity between locations.
8 Appendix
We start by introducing the following technical lemmas, which will permit us to handle the difficulties induced by the random bandwidth in the expression of the function . These technical lemmas are adaptations of the results given in Collomb (1980) (for independent multivariate data) and of their generalized versions by Burba et al. (2009) and Kudraszow & Vieu (2013) (for independent functional data).
Technical Lemmas
For , we define
and
For all and , let
(15) 
where is the volume of the unit sphere in . It is clear that
Lemma 8.1
If the following conditions are verified:



, ,
then we have
Lemma 8.2
Under the following conditions:



,
we have,
Proofs of Lemma 3.1 and Lemma 3.2
Since the proof of Lemma 3.1 is based on the result of Lemma 8.1, it is sufficient to check conditions , and . For the proof of Lemma 3.2, it suffices to check conditions and . To check the condition we will need to use the following two lemmas.
Lemma 8.3
Lemma 8.4
Under assumptions of Theorem 3.1, we have
where
where denote the closed ball of with center and radius .
Proof of Lemma 8.4
Let , we can deduce that
by the following results. Under the Lipschitz condition on (assumption (H1)), we have
(18) 
and for all , we deduce that
(19) 
In addition, for , note that by (H5) and for each
(20)  
since by (18)
Using Lemma 8.3, we can write for
(21)  
Let be a sequence of real numbers defined as , let be its complement in , and write
According to the definitions of and , and equation (20), it follows that