1 Introduction
Modeling expressive structure in timeseries data is a vital requirement in several scientific fields including cyber security, insiderthreat detection, and highwaysafety planning for the purpose of scientific discovery. Chief risk officers (CRO) in large corporations and government agencies are tasked with forecasting potential risks to company assets by malicious employees or contractors. Departments of Transportation (DOTs) are mandated by the Federal Highway Administration (FHWA) to forecast highway crashes as part of their strategic highway safety plans (SHSPs) in order to obtain funding (Smith, 2016). However, these fields often lack sufficient amounts of training data to accurately capture the signal of the phenomenon under study and consequently discover hidden patterns. For instance Greitzer and Ferryman (2013) state that ”ground truth” data on actual insider behavior is typically either not available or is limited. In some cases, one might acquire real data, but for privacy reasons, there is no attribution of any individuals relating to abuses or offenses—i.e., there is no ground truth. The data may contain insider threats, but these are not identified or knowable to the researcher (Greitzer and Ferryman, 2013; Gheyas and Abdallah, 2016). In highwaysafety planning, Veeramisti (2016) mentions that Departments of Transportation (DOTs) only recently started collecting monthly highwaycrash data because of the high cost and extensive process of collecting the required data.
1.1 Insider Threat
Insider threats are malicious acts carried out by current or former employees or trusted partners of an organization who abuse their authorized access to an organization’s networks, systems, and/or data (Glasser and Lindauer, 2013; Lindauer et al., 2014). Insider threats include but not limited to, theft of intellectual property or national security information, fraud, and sabotage. Many government, academic, and industry groups seek to discover and develop solutions to detect and protect against these insider threats. However, the difficulty of obtaining suitable data for research, development, and testing remains a significant hinderance (Glasser and Lindauer, 2013). This is attributed to two major factors. First, confidentiality and privacy concerns create barriers to the collection and use of such data for research purposes. To collect real data, some organization must directly monitor and record the behavior and actions of its own employees. Second, insider threats are black swan events that are very rare and only collected after the fact (Gheyas and Abdallah, 2016). Nonetheless, some studies have proposed methods for timeseries analysis. For example, Stoffel et al. (2013) proposed a Fourier and wavelet model, to estimate a timeseries model for internettraffic data from a computer network of workstations and servers.
1.2 Highway Safety
The Federal Highway Administration (FHWA) requires state Departments of Transportation (DOTs) to develop Highway Safety Plans (SHSPs) to obtain funding as part of the two legislative acts (Smith, 2016). State Departments of Transportation (DOTs) in the United States are tasked with making reasonable predictions of highway crashes on roadways based on historical data. This is so as to set realistic targets for performancebased safety programs to reduce fatalities and serious injuries. For a traffic safety policy, a modelbased approach generally is avoided for forecasting and setting targets (Veeramisti, 2016). This is because the required data collection is extensive and expensive; in addition, establishing a relationship between the performance measures and influencing factors is difficult (Kweon and Lim, 2012). For example, in order to use a modelbased approach, the Average Annual Daily Traffic (AADT) is one of the exposure variables that is required for all public roads. Most DOTs only collect the corresponding data for a sample of their facilities (Veeramisti, 2016). Nonetheless, some studies have proposed modelbased methods for timeseries analysis. For example, Veeramisti (2016) proposed a stochastic time series model, a seasonal autoregressive integrated moving average (sARIMA) model, to forecast performance measures for performancebased safety programs for the Nevada DOT. Other studies have proposed different variations of ARIMA for crash forecasting (Yannis et al., 2011; Sukhai et al., 2011). These targets can be used to determine future statewide safety improvement programs and policies. From the perspective of state agencies, predicting the number of fatalities and serious injuries is significantly important to meet the requirements of MAP21 (Smith, 2016).
1.3 Limitations and Research Question
Current methods for timeseries modeling for insiderthreat detection and highwaysafety planning (described in sections 1.1 and 1.2) are associated with two major limitations. First, the methods involve visualizing the time series for noticeable structure and patterns such as periodicity, smoothness, growing/decreasing trends and then hard coding these patterns into the statistical models during formulation. This approach is suitable for large datasets where more data typically provides more information to learn expressive structure (Wilson, 2014). That is, an analyst is able to look at the data, recognize trends (i.e. periodicity, seasonal variation with possible decay away from periodicity, longterm rising/decreasing trends, mediumterm irregularities, short seasonal variations, and noise), and then hard code those patterns into statistical algorithms such as ARIMA to represent each of these patterns. The limitation with this approach is that all the pattern discovery is done by the analyst and the resulting algorithm is used simply as a smoothing tool. Given limited amounts of data, it is difficult to explicitly incorporate expressive prior information (such as periodicity, smoothness, growing tends) into the statistical learning algorithms.
Second, most of the current literature focus on parametric models that impose strong restrictive assumptions by prespecifying the functional form and number of parameters
(Veeramisti, 2016; Gheyas and Abdallah, 2016). Prespecifying a functional form for a timeseries model could lead to either overly complex model specifications or simplistic models. It is difficult to know a priori the most appropriate function to use for modeling sophisticated insiderthreat behavior and highwaycrash occurrence that involve complex hidden patterns and many other influencing factors. Due to a finite number of parameters, parametric models are not necessarily the best to capture the entire structure of complex and sophisticated processes (Ghahramani, 2015), such as insiderthreat behavior and highwaycrash occurrence.Given limited and noisy timeseries data for insiderthreat detection and highwaysafety planning, is it possible to perform: (1) pattern discovery without hardcording trends into statistical models during formulation, and (2) model estimation that precludes prespecifying a functional form? To answer these two research questions and address the above described limitations, this paper proposed a nonparametric Bayesian framework which is applied as an example to estimate two time series models: (1) the size of email attachments sent outside an organization by insiders and (2) the annual number of highway crashes in Nevada. The proposed framework combines the flexibility of nonparametric Bayesian approaches (Williams and Rasmussen, 2006; Hjort et al., 2010; Ghahramani, 2015) and kernels with more expressive properties (Wilson and Adams, 2013; Duvenaud, 2014) than standard kernels proposed by Williams and Rasmussen (2006). In particular, a flexible nonparametric model – i.e., the Gaussian process (GP) model – is proposed as a prior distribution for the unknown regression functions, thereby precluding the need to prespecify a functional form. Flexibility is achieved by making weaker assumptions about model structure, thereby capturing the underlying behavior from data by exploring an infinite dimensional space of possible functions. Moreover, an expressive covariance – i.e., the spectral mixture (SM) kernel (Wilson, 2014)
– is proposed as the covariance for the GP model thereby precluding the need to hard cord trends and consequently allows to automatically discover hidden structure from limited data. The SM kernel is particularly useful when standard kernels such as the radial basis function fail to capture functional structure
(Schulz et al., 2017) in limited noisy data commonly found in insiderthreat detection and highwaycrash analysis. Recent advances in computing and efficient sampling methods, such as the Hamiltonian Monte Carlo (HMC) (Hoffman and Gelman, 2014)enable fast inference in flexible models. Moreover, a Bayesian modeling approach was used for inference to quantify all uncertainty, using probability distributions. The proposed GPSM model maintained the attractive interpretability properties of a standard GP model; that is, the posterior mean and credible intervals of the predicted series could be analyzed to visualize how the response variable varies over time.
The rest of this paper is organized as follows. Section 2
describes the proposed methodology, including model formulation and estimation using two efficient Bayesian inference methods. Section
3 describes the experiments performed including a description of the data used, the sample formation processes, and empirical analyses. Finally, a conclusion that summarizes the important findings from this study are presented in Section 4, including recommendations for future research.2 Methodology
2.1 Model Formulation
To formulate the methodology, consider for each data point, , that represents a response variable (such as the attachment size in emails sent by an insider outside the organization or number of highway crashes) and is a temporal covariate such as year, month, week, or day. Regression modeling can involve estimating a latent function , which maps input data, , to output data for = 1, 2, …, , where is the total number of data points. Each of the input data is of a single dimension , and X is a x matrix with rows . The observations are assumed to satisfy:
(1) 
The noise term,
, is assumed to be normally distributed with a zero mean and variance,
. Latent function represents hidden underlying trends that produced the observed timeseries data.Given that it is difficult to know a priori the most appropriate functional form to use for , a prior distribution, , over an infinite number of possible functions of interest is formulated. A natural prior over an infinite space of functions is a Gaussian process prior (Williams and Rasmussen, 2006)
. Formally, a GP is defined as a collection of random variables,
, any finite subset of which,, has a joint Gaussian distribution,
. The parameter mdenotes the mean function (or mean vector) which specifies the expected output of the function
f given input X, and denotes the covariance function (or covariance matrix or kernel) which specifies the covariance between outputs . A GP is fully parameterized by a mean function and covariance function, denoted as:(2) 
where each element in the mean vector and covariance matrix is given by equation 3 and 4, respectively:
(3)  
(4) 
The kernel function, , is used to encode prior assumptions of smoothness of functions such as periodicity, rising or decreasing longterm and shortterm trends (Williams and Rasmussen, 2006). However, such properties may not be easily noticeable in timeseries data for insiderthreat detection and highwaycrash forecasting. To address this limitation, a spectral mixture (SM) kernel (Wilson, 2014; Wilson et al., 2015) was proposed for the covariance function. The SM kernel is able to implicitly capture structure by leveraging the idea that any standard kernel can be expressed as an integral using Bochner’s theorem, as illustrated below.
A standard stationary kernels is a functions of , then
(5) 
If has a density , then is the spectral density of ; and are thus Fourier duals (Williams and Rasmussen (2006), see Appendix). This means that a spectral density over the kernel space fully defines the kernel and that furthermore every stationary kernel can be expressed as a spectral density. The spectral density modeled with a single Gaussian can be expressed as:
(6) 
The density is called the spectral density of and is given by:
(7) 
The resulting kernel is given by:
(8) 
The spectral density can be approximated by a mixture of Gaussians (Wilson et al., 2015) as follows:
(9) 
where the th component has mean vector and a covariance matrix = diag
. The inverse mean represents the component periods and the inverse standard deviation the length scales. The result is a flexible and expressive parametrization of the kernel, in which complex kernels are approximated by mixtures of simpler ones. The reader is refereed to
Wilson et al. (2015) for a more detailed formulation of the SM kernel.The likelihood of the response variable is a noisy sample from the latent function expressed as:
(10) 
Given the GP prior in Equation (2) and the data likelihood in Equation (10), the posterior distribution over the unknown function evaluations f at all data points
, was estimated using Bayes theorem:
(11)  
where:
= the posterior distribution of functions that 
best explain the response variable, given the covariates 
= the likelihood of response variable, given 
the functions and covariates 
= the prior over all possible functions of the 
response variable 
= the data (constant) 
This posterior is a Gaussian process composed of a distribution of possible functions that best explain the timeseries pattern. Given that the data are fixed, equation (11) was reformulated as the unnormalized posterior distribution
(12) 
The posterior predictive distribution for a new input
conditional on observed data is Gaussian with mean and variance given by:(13)  
(14) 
where is the covariance between each observed input and the new input .
2.2 Model Estimation
Model estimation involved finding values for eighteen hyperparameters, including nine frequency parameters and nine lengthscale parameters. These hyperparameters describe the distribution over functions rather than the functions themselves. Bayesian inference, which was adopted to estimate these hyperparameters, involved expressing prior beliefs over all the hyperparameters and using probability theory to update those beliefs in light of the observed data. In this approach, in contrast to MLE, overfitting is of less concern because optimization or the minimization of an error is not used for estimation
(McHutchon et al., 2014; Ghahramani, 2015). Recent sampling methods considered to be efficient are used to explore the posterior distribution of the proposed model. Initial sampling was performed using an approximate method, known as automatic differentiation variational inference (ADVI) (Kucukelbir et al., 2015). Final sampling used an exact Markov chain Monte Carlo (MCMC) method, known as the Hamiltonian Monte Carlo (HMC); in particular, the NoUTurn Sampler (NUTS)
(Hoffman and Gelman, 2014) was used. Bayesian optimization (Brochu et al., 2010; Emaasit and Paz, 2018) was used to tune the hyperparameters.Code and data for the experiments described in the next section are available on GitHub at https://github.com/emaasit/longrangeextrapolation.
3 Experiments
3.1 Insider Threat
3.1.1 Raw data and sample formation
The insiderthreat data used for empirical analysis in this study was provided by the computer emergency response team (CERT) division of the software engineering institute (SEI) at Carnegie Mellon University (CERTDivision, ). The data, referred to as the ”Insider Threat Tools dataset”, is a synthetic test dataset generated by Glasser and Lindauer (2013) that represents a fictitious company with employees and features such as their computers, files, removable media and websites they visited. Version ”r6.2.tar.bz2” of the dataset was used. The particular insider threat focused on is the case where a known insider sent information as email attachments from their work email to their home email.
To obtain the sample for the current analysis, the dataset was processed as follows. First, the data was filtered for a known insider with userID ”CDE1846”. Second, only emails sent by the insider outside the company organization were filtered. Third, the emailattachment size in Gigabytes was summarized into a monthly sum resulting into a sample size of 14 months ranging from 20100131 to 20110228. Table 1
shows a comparison of descriptive statistics between emails sent to a personal account versus a nonpersonal account. The size of emails sent outside the company organization is shown to be smaller than amount sent within. Finally, emails sent to the insider’s personal email account (”EwingCarlos@comcast.net”) were filtered as shown in Figure
1. This was the final sample used for analysis. It is difficult to tell any noticeable trends in this small sample size of 14.STATISTIC  EMAIL DESTINATION  

Personal account  Nonpersonal  
Mean  3.84  37.52 
Std Dev  2.28  13.38 
Min  0.18  10.56 
Max  7.96  57.78 
Count  14  14 
3.1.2 Empirical Analysis
The first eleven data points shown in blue in Figure 1 were used for training and the last three data points in red for testing. An appropriate number of terms in the sum, , was set to 10 as proposed by Wilson et al. (2015) resulting in 10 frequency parameters and 10 lengthscale parameters. A maximum number of 30 iterations was chosen to tune the 20 hyperparameters for the optimized GPSM. An ARIMA model was estimated using the methodology proposed by Veeramisti (2016) for comparison.
Figure 6 shows the results of the three estimated models. Figure (a)a shows that the Gaussian process model with a spectral mixture kernel is able to capture the structure implicitly both in regions of the training and testing data. The 95% predicted credible interval (CI) contains the true size of email attachments for the duration of the measurements. Furthermore, the C.I are much narrower than the other two models. Figure (b)b shows that the Gaussian process model with the hyperparameters of the spectral mixture kernel tuned using Bayesian optimization is also able to capture structure implicitly. The predictive performance is especially better in the region of the training data where the predicted data points entirely overlap with the training data. Performance in the region of testing data is comparable to the standard model. This finding suggests that hyperparameter tuning for small data without easily noticeable structure does not produce any significant model improvement. Figure (c)c
shows that ARIMA model is not able to capture the structure within the region of training data. This finding suggests that ARIMA models have poor performance for small data without noticeable structure. The 95% confidence interval for ARIMA is much wider than both the GP models showing a high degree of uncertaininty about the ARIMA predictions.
Table 2 shows the performance measures for the three estimated models including the root mean square error (RMSE) and mean absolute percentage error (MAPE). The GPSM model had the best predictive performance of 1.25 for RMSE and 27.96% for MAPE. The GPSM optimized model had a fairly similar predictive performance of 1.29 for RMSE and 28.80% for the MAPE. The estimated ARIMA model had the worst predictive performance of 3.20 for RMSE and 93.44% for the MAPE. These findings suggest that hyperparameter tuning for small data produces models of comparable performance. If parameter tuning takes a considerable amount of time, then it may not be necessarily to use if it does not produce any significant model improvement.
MODEL  MEASURE  

RMSE  MAPE  
GPSM  1.25  27.96% 
GPSM Optimized  1.29  28.80% 
ARIMA  3.20  93.44% 
3.2 Highway Safety
3.2.1 Raw data and sample formation
The highwaycrash data used for empirical analysis in this study was provided by the Nevada Department of Transportation (NDOT). This data consists of two types of crash severity including the annual number of fatalities and serious injuries standardized over an exposure variable called vehicle miles traveled (VMT). Vehicle miles traveled represents the total annual miles of vehicle travel divided by the total population in a state. Data for this indicator is provided by the Highway Statistics division of the Federal Highway Administration (FHWA) (FHWA, ). Data for fatal crashes and serious injuries were available from 1996 to 2004 and from 1995 to 2003, respectively.
The final samples used for the current analysis include timeseries of annual fatalities per 100 million VMT and annual serious injuries per 100 million VMT resulting in small sample sizes of 20 and 21 respectively. Table 3 provides descriptive statistics of these timeseries. Given that NDOT only collects these crash counts annually (Veeramisti, 2016), they lack seasonality as shown in Figure 9. The first 17 and 16 data points shown in blue in Figure (a)a and Figure (b)b were used for training, respectively. The remaining four data points shown in red were used for testing.
STATISTIC  CRASH TYPE  

Fatalities  Serious injuries  
Mean  1.69  12.82 
Std Dev  0.43  3.99 
Min  1.02  6.63 
Max  2.24  20.97 
Count  21  20 
3.2.2 Empirical Analysis
An appropriate number of terms in the sum, , was set to 10 as proposed by Wilson et al. (2015) resulting in 10 frequency parameters and 10 lengthscale parameters. A maximum number of 30 iterations was chosen to tune the 20 hyperparameters for the optimized GPSM. An ARIMA model was estimated using the methodology proposed by Veeramisti (2016) for comparison.
Figure 14 shows the results of the three estimated models for annual fatal crashes per 100M VMT. Figure (a)a shows that the Gaussian process model with a spectral mixture kernel is able to capture the structure implicitly both in regions of the training and testing data. The 95% predicted credible interval (CI) contains the true number of annual fatalities per 100M VMT albeit that they are are much wider than the other two models. Figure (b)b shows that the Gaussian process model with the hyperparameters of the spectral mixture kernel tuned using Bayesian optimization is also able to capture structure implicitly. The predictive performance is especially better in the region of the training data where the predicted data points entirely overlap with the training data. Predictive performance on testing data is poorer than the standard model. This finding suggests that hyperparameter tuning for small data without easily noticeable structure does not produce any significant model improvement. Figure (c)c shows that ARIMA model is not able to capture the structure within the region of testing data. This finding suggests that ARIMA models have poor performance for small data without noticeable structure. The 95% confidence interval for ARIMA is much wider than both the GP models showing a high degree of uncertaininty about the ARIMA predictions.
Figure 19 shows the results of the three estimated models for annual serious injuries per 100M VMT. Figure (a)a shows that the Gaussian process model with a spectral mixture kernel is able to capture the structure implicitly both in regions of the training and testing data. The 95% predicted credible interval (CI) contains the true number of annual serious injuries per 100M VMT. The C.I are much wider than the other two models. Figure (b)b shows that the Gaussian process model with the hyperparameters of the spectral mixture kernel tuned using Bayesian optimization is also able to capture structure implicitly. The predictive performance is especially better in the region of the training data where the predicted data points entirely overlap with the training data. Predictive performance in the region of testing data is poorer compared to the standard model. This finding suggests that hyperparameter tuning for small data without easily noticeable structure does not produce any significant model improvement. Figure (c)c shows that ARIMA model was also able to capture the structure within the region of testing data. This finding suggests that ARIMA models have good performance for timeseries that have easily noticeable structure. The 95% confidence interval for ARIMA is comparable narrower than the standard GP model showing a lesser degree of uncertainty about the ARIMA predictions.
Table 2 shows the performance measures for the three estimated models including the root mean square error (RMSE) and mean absolute percentage error (MAPE). The GPSM model had the best predictive performance of 1.25 for RMSE and 27.96% for MAPE. The GPSM optimized model had a fairly similar predictive performance of 1.29 for RMSE and 28.80% for the MAPE. The estimated ARIMA model had the worst predictive performance of 3.20 for RMSE and 93.44% for the MAPE. These findings suggest that hyperparameter tunning for small data produces models of comparable performance. If parameter tuning takes a considerable amount of time, then it may not be necessarily to use if it does not produce any significant model improvement.
MODEL  FATALITIES  INJURIES  

RMSE  MAPE  RMSE  MAPE  
GPSM  0.06  3.31%  0.87  11.5% 
GPSM Opt  0.24  14.00%  2.69  34.37% 
ARIMA  0.34  27.07%  1.68  17.72% 
4 Conclusions and Future Work
This paper proposed a Bayesian nonparametric framework to capture implicitly hidden structure in timeseries having limited data. The proposed framework, a Gaussian process with a spectral mixture kernel, was applied to timeseries process for insiderthreat detection and highwaysafety planning. The proposed framework addresses two current challenges when analyzing quite noisy timeseries having limited data whereby the time series are visualized for noticeable structure such as periodicity, growing or decreasing trends and hard coding them into prespecified functional forms. Experiments demonstrated that results from this framework outperform traditional ARIMA when the time series does not have easily noticeable structure and is quite noisy. Future work will involve evaluating the proposed framework on other different types of insiderthreat behavior.
Acknowledgements
The authors would like to thank the CERTDivision and Dr. Naveen Veeramisti for providing the ”Insider Threat Tools dataset” and NDOT’s highwaycrash data, respectively, used in the empirical analyses.
References
 Brochu et al. (2010) Brochu, E., Cora, V. M., and De Freitas, N. (2010). A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599.
 (2) CERTDivision. Software Engineering Institute. https://www.sei.cmu.edu/about/divisions/cert/index.cfm. Accessed: January 30, 2018.
 Duvenaud (2014) Duvenaud, D. (2014). Automatic model construction with Gaussian processes. PhD thesis, University of Cambridge.
 Emaasit and Paz (2018) Emaasit, D. and Paz, A. (2018). Simultaneous estimation of flexible models and associated hyperparameters: An application to activityduration modeling. In Transportation Research Board 97th Annual Meeting. Washington DC: Transportation Research Board.
 (5) FHWA. Highway Statistics Series. https://www.fhwa.dot.gov/policyinformation/statistics.cfm. Accessed: January 30, 2018.

Ghahramani (2015)
Ghahramani, Z. (2015).
Probabilistic machine learning and artificial intelligence.
Nature, 521(7553):452–459.  Gheyas and Abdallah (2016) Gheyas, I. A. and Abdallah, A. E. (2016). Detection and prediction of insider threats to cyber security: a systematic literature review and metaanalysis. Big Data Analytics, 1(1):6.
 Glasser and Lindauer (2013) Glasser, J. and Lindauer, B. (2013). Bridging the gap: A pragmatic approach to generating insider threat data. In Security and Privacy Workshops (SPW), 2013 IEEE, pages 98–104. IEEE.
 Greitzer and Ferryman (2013) Greitzer, F. L. and Ferryman, T. A. (2013). Methods and metrics for evaluating analytic insider threat tools. In Security and Privacy Workshops (SPW), 2013 IEEE, pages 90–97. IEEE.
 Hjort et al. (2010) Hjort, N. L., Holmes, C., Müller, P., and Walker, S. G. (2010). Bayesian nonparametrics, volume 28. Cambridge University Press.
 Hoffman and Gelman (2014) Hoffman, M. D. and Gelman, A. (2014). The nouturn sampler: adaptively setting path lengths in hamiltonian monte carlo. Journal of Machine Learning Research, 15(1):1593–1623.
 Kucukelbir et al. (2015) Kucukelbir, A., Ranganath, R., Gelman, A., and Blei, D. (2015). Automatic variational inference in stan. In Advances in neural information processing systems, pages 568–576.
 Kweon and Lim (2012) Kweon, Y.J. and Lim, I.K. (2012). Appropriate regression model types for intersections in safetyanalyst. Journal of Transportation Engineering, 138(10):1250–1258.
 Lindauer et al. (2014) Lindauer, B., Glasser, J., Rosen, M., Wallnau, K. C., and ExactData, L. (2014). Generating test data for insider threat detectors. JoWUA, 5(2):80–94.
 McHutchon et al. (2014) McHutchon, A. et al. (2014). Nonlinear modelling and control using Gaussian processes. PhD thesis, Citeseer.
 Schulz et al. (2017) Schulz, E., Tenenbaum, J. B., Duvenaud, D., Speekenbrink, M., and Gershman, S. J. (2017). Compositional inductive biases in function learning. Cognitive psychology, 99:44–79.
 Smith (2016) Smith, S. (2016). HSIP 2016 National Summary Report. Number FHWASA17040.
 Stoffel et al. (2013) Stoffel, F., Fischer, F., and Keim, D. A. (2013). Finding anomalies in timeseries using visual correlation for interactive root cause analysis. In Proceedings of the Tenth Workshop on Visualization for Cyber Security, pages 65–72. ACM.
 Sukhai et al. (2011) Sukhai, A., Jones, A. P., Love, B. S., and Haynes, R. (2011). Temporal variations in road traffic fatalities in south africa. Accident Analysis & Prevention, 43(1):421–428.
 Veeramisti (2016) Veeramisti, N. K. (2016). A business intelligence framework for networklevel traffic safety analyses. PhD thesis, University of Nevada, Las Vegas.
 Williams and Rasmussen (2006) Williams, C. K. and Rasmussen, C. E. (2006). Gaussian processes for machine learning. the MIT Press, 2(3):4.
 Wilson and Adams (2013) Wilson, A. and Adams, R. (2013). Gaussian process kernels for pattern discovery and extrapolation. In International Conference on Machine Learning, pages 1067–1075.
 Wilson (2014) Wilson, A. G. (2014). Covariance kernels for fast automatic pattern discovery and extrapolation with gaussian processes. University of Cambridge.
 Wilson et al. (2015) Wilson, A. G., Dann, C., Lucas, C., and Xing, E. P. (2015). The human kernel. In Advances in neural information processing systems, pages 2854–2862.
 Yannis et al. (2011) Yannis, G., Antoniou, C., and Papadimitriou, E. (2011). Modeling traffic fatalities in europe. In 90th Annual Meeting of the Transportation Research Board, Washington, DC.