
Capturing Structure Implicitly from Time-Series having Limited Data

Scientific fields such as insider-threat detection and highway-safety planning often lack sufficient amounts of time-series data to estimate statistical models for the purpose of scientific discovery. Moreover, the available limited data are quite noisy. This presents a major challenge when estimating time-series models that are robust to overfitting and have well-calibrated uncertainty estimates. Most of the current literature in these fields involves visually inspecting the time series for noticeable structure and hard-coding it into pre-specified parametric functions. This approach is associated with two limitations. First, given that such trends may not be easily noticeable in small data, it is difficult to explicitly incorporate expressive structure into the models during formulation. Second, it is difficult to know a priori the most appropriate functional form to use. To address these limitations, a nonparametric Bayesian approach is proposed to implicitly capture hidden structure from time series having limited data. The proposed model, a Gaussian process with a spectral mixture kernel, precludes the need to pre-specify a functional form and hard-code trends, is robust to overfitting, and has well-calibrated uncertainty estimates.


1 Introduction

Modeling expressive structure in time-series data is a vital requirement in several scientific fields, including cyber security, insider-threat detection, and highway-safety planning, for the purpose of scientific discovery. Chief risk officers (CROs) in large corporations and government agencies are tasked with forecasting potential risks to company assets posed by malicious employees or contractors. Departments of Transportation (DOTs) are mandated by the Federal Highway Administration (FHWA) to forecast highway crashes as part of their strategic highway safety plans (SHSPs) in order to obtain funding (Smith, 2016). However, these fields often lack sufficient amounts of training data to accurately capture the signal of the phenomenon under study and, consequently, to discover hidden patterns. For instance, Greitzer and Ferryman (2013) state that "ground truth" data on actual insider behavior is typically either not available or limited. In some cases, one might acquire real data, but for privacy reasons there is no attribution of any individuals relating to abuses or offenses; that is, there is no ground truth. The data may contain insider threats, but these are not identified or knowable to the researcher (Greitzer and Ferryman, 2013; Gheyas and Abdallah, 2016). In highway-safety planning, Veeramisti (2016) notes that DOTs only recently started collecting monthly highway-crash data because of the high cost and extensive process of collecting the required data.

1.1 Insider Threat

Insider threats are malicious acts carried out by current or former employees or trusted partners of an organization who abuse their authorized access to the organization's networks, systems, and/or data (Glasser and Lindauer, 2013; Lindauer et al., 2014). Insider threats include, but are not limited to, theft of intellectual property or national-security information, fraud, and sabotage. Many government, academic, and industry groups seek to discover and develop solutions to detect and protect against these insider threats. However, the difficulty of obtaining suitable data for research, development, and testing remains a significant hindrance (Glasser and Lindauer, 2013). This is attributed to two major factors. First, confidentiality and privacy concerns create barriers to the collection and use of such data for research purposes; to collect real data, an organization must directly monitor and record the behavior and actions of its own employees. Second, insider threats are black-swan events that are very rare and only collected after the fact (Gheyas and Abdallah, 2016). Nonetheless, some studies have proposed methods for time-series analysis. For example, Stoffel et al. (2013) proposed a Fourier and wavelet model to estimate a time-series model for internet-traffic data from a computer network of workstations and servers.

1.2 Highway Safety

The Federal Highway Administration (FHWA) requires state Departments of Transportation (DOTs) to develop Strategic Highway Safety Plans (SHSPs) to obtain funding as part of two legislative acts (Smith, 2016). State DOTs in the United States are tasked with making reasonable predictions of highway crashes on roadways based on historical data, so as to set realistic targets for performance-based safety programs to reduce fatalities and serious injuries. For traffic-safety policy, a model-based approach is generally avoided for forecasting and setting targets (Veeramisti, 2016), because the required data collection is extensive and expensive; in addition, establishing a relationship between the performance measures and the influencing factors is difficult (Kweon and Lim, 2012). For example, a model-based approach requires the Average Annual Daily Traffic (AADT), one of the exposure variables, for all public roads, yet most DOTs collect the corresponding data for only a sample of their facilities (Veeramisti, 2016). Nonetheless, some studies have proposed model-based methods for time-series analysis. For example, Veeramisti (2016) proposed a stochastic time-series model, a seasonal auto-regressive integrated moving average (sARIMA) model, to forecast performance measures for performance-based safety programs for the Nevada DOT. Other studies have proposed different variations of ARIMA for crash forecasting (Yannis et al., 2011; Sukhai et al., 2011). These targets can be used to determine future statewide safety-improvement programs and policies. From the perspective of state agencies, predicting the number of fatalities and serious injuries is particularly important to meet the requirements of MAP-21 (Smith, 2016).

1.3 Limitations and Research Question

Current methods of time-series modeling for insider-threat detection and highway-safety planning (described in Sections 1.1 and 1.2) are associated with two major limitations. First, the methods involve visually inspecting the time series for noticeable structure and patterns, such as periodicity, smoothness, and growing or decreasing trends, and then hard-coding these patterns into the statistical models during formulation. This approach is suitable for large datasets, where more data typically provides more information to learn expressive structure (Wilson, 2014). That is, an analyst is able to look at the data, recognize trends (i.e., periodicity, seasonal variation with possible decay away from periodicity, long-term rising or decreasing trends, medium-term irregularities, short seasonal variations, and noise), and then hard-code those patterns into statistical algorithms such as ARIMA to represent each of these patterns. The limitation of this approach is that all the pattern discovery is done by the analyst, and the resulting algorithm is used simply as a smoothing tool. Given limited amounts of data, it is difficult to explicitly incorporate expressive prior information (such as periodicity, smoothness, and growing trends) into the statistical learning algorithms.

Second, most of the current literature focuses on parametric models that impose strong, restrictive assumptions by pre-specifying the functional form and number of parameters (Veeramisti, 2016; Gheyas and Abdallah, 2016). Pre-specifying a functional form for a time-series model could lead to either overly complex model specifications or simplistic models. It is difficult to know a priori the most appropriate function to use for modeling sophisticated insider-threat behavior and highway-crash occurrence, which involve complex hidden patterns and many other influencing factors. Because they have a finite number of parameters, parametric models are not necessarily the best choice for capturing the entire structure of complex and sophisticated processes (Ghahramani, 2015), such as insider-threat behavior and highway-crash occurrence.

Given limited and noisy time-series data for insider-threat detection and highway-safety planning, is it possible to perform: (1) pattern discovery without hard-coding trends into statistical models during formulation, and (2) model estimation that precludes pre-specifying a functional form? To answer these two research questions and address the limitations described above, this paper proposes a nonparametric Bayesian framework, which is applied as an example to estimate two time-series models: (1) the size of email attachments sent outside an organization by insiders, and (2) the annual number of highway crashes in Nevada. The proposed framework combines the flexibility of nonparametric Bayesian approaches (Williams and Rasmussen, 2006; Hjort et al., 2010; Ghahramani, 2015) with kernels that have more expressive properties (Wilson and Adams, 2013; Duvenaud, 2014) than the standard kernels described by Williams and Rasmussen (2006). In particular, a flexible nonparametric model, the Gaussian process (GP), is proposed as a prior distribution over the unknown regression functions, thereby precluding the need to pre-specify a functional form. Flexibility is achieved by making weaker assumptions about model structure and capturing the underlying behavior from data by exploring an infinite-dimensional space of possible functions. Moreover, an expressive covariance, the spectral mixture (SM) kernel (Wilson, 2014), is proposed as the covariance for the GP model, thereby precluding the need to hard-code trends and consequently allowing hidden structure to be discovered automatically from limited data. The SM kernel is particularly useful when standard kernels, such as the radial basis function, fail to capture functional structure (Schulz et al., 2017) in the limited, noisy data commonly found in insider-threat detection and highway-crash analysis. Recent advances in computing and efficient sampling methods, such as Hamiltonian Monte Carlo (HMC) (Hoffman and Gelman, 2014), enable fast inference in flexible models. Moreover, a Bayesian modeling approach was used for inference to quantify all uncertainty using probability distributions. The proposed GP-SM model maintains the attractive interpretability properties of a standard GP model; that is, the posterior mean and credible intervals of the predicted series can be analyzed to visualize how the response variable varies over time.

The rest of this paper is organized as follows. Section 2 describes the proposed methodology, including model formulation and estimation using two efficient Bayesian inference methods. Section 3 describes the experiments performed, including a description of the data used, the sample-formation processes, and the empirical analyses. Finally, Section 4 presents a conclusion that summarizes the important findings from this study, along with recommendations for future research.

2 Methodology

2.1 Model Formulation

To formulate the methodology, consider data points $(x_i, y_i)$, where $y_i$ represents a response variable (such as the attachment size of emails sent by an insider outside the organization, or the number of highway crashes) and $x_i$ is a temporal covariate such as year, month, week, or day. Regression modeling can involve estimating a latent function $f$, which maps input data $x_i$ to output data $y_i$ for $i = 1, 2, \ldots, n$, where $n$ is the total number of data points. Each input $x_i$ is of a single dimension $D$, and $X$ is an $n \times D$ matrix with rows $x_i$. The observations are assumed to satisfy:

$$y_i = f(x_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma_n^2) \qquad (1)$$

The noise term, $\varepsilon_i$, is assumed to be normally distributed with zero mean and variance $\sigma_n^2$. The latent function $f$ represents the hidden underlying trends that produced the observed time-series data.

Given that it is difficult to know a priori the most appropriate functional form to use for $f$, a prior distribution, $p(f)$, over an infinite number of possible functions of interest is formulated. A natural prior over an infinite space of functions is a Gaussian process prior (Williams and Rasmussen, 2006). Formally, a GP is defined as a collection of random variables, $\{f(x_1), f(x_2), \ldots\}$, any finite subset of which, $\mathbf{f} = (f(x_1), \ldots, f(x_n))$, has a joint Gaussian distribution, $\mathbf{f} \sim \mathcal{N}(\mathbf{m}, K)$. The parameter $\mathbf{m}$ denotes the mean function (or mean vector), which specifies the expected output of the function $f$ given input $X$, and $K$ denotes the covariance function (or covariance matrix, or kernel), which specifies the covariance between outputs. A GP is fully parameterized by a mean function and a covariance function, denoted as:

$$f(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big) \qquad (2)$$

where each element of the mean vector and covariance matrix is given by equations (3) and (4), respectively:

$$m(x_i) = \mathbb{E}[f(x_i)] \qquad (3)$$
$$k(x_i, x_j) = \mathbb{E}\big[(f(x_i) - m(x_i))(f(x_j) - m(x_j))\big] \qquad (4)$$
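For intuition, the short numpy sketch below draws sample functions from a zero-mean GP prior with a squared-exponential covariance, as in Equation (2). It is an illustrative sketch only; the grid and hyperparameter values are arbitrary and are not taken from the study.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Evaluate the zero-mean GP prior on a grid and draw three sample paths.
x = np.linspace(0.0, 10.0, 100)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))   # small jitter for numerical stability
samples = np.random.multivariate_normal(np.zeros(len(x)), K, size=3)
```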

The kernel function, $k(x_i, x_j)$, is used to encode prior assumptions about the functions, such as smoothness, periodicity, and rising or decreasing long-term and short-term trends (Williams and Rasmussen, 2006). However, such properties may not be easily noticeable in time-series data for insider-threat detection and highway-crash forecasting. To address this limitation, a spectral mixture (SM) kernel (Wilson, 2014; Wilson et al., 2015) was proposed for the covariance function. The SM kernel is able to implicitly capture structure by leveraging the fact that any stationary kernel can be expressed as an integral using Bochner's theorem, as illustrated below.

A stationary kernel is a function of $\tau = x - x'$. Bochner's theorem states that such a kernel can be written as

$$k(\tau) = \int_{\mathbb{R}^D} e^{2\pi i s^\top \tau}\, \psi(ds) \qquad (5)$$

If $\psi$ has a density $S(s)$, then $S$ is the spectral density of $k$; $S$ and $k$ are thus Fourier duals (Williams and Rasmussen (2006), see Appendix). This means that a spectral density fully defines the kernel and, furthermore, that every stationary kernel can be expressed in terms of a spectral density. A spectral density modeled with a single Gaussian, symmetrized about the origin, can be expressed as:

$$\phi(s; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(s - \mu)^2}{2\sigma^2}\right) \qquad (6)$$

$$S(s) = \frac{\phi(s) + \phi(-s)}{2} \qquad (7)$$

The resulting kernel is given by:

$$k(\tau) = \exp\!\left(-2\pi^2 \tau^2 \sigma^2\right) \cos(2\pi \tau \mu) \qquad (8)$$

More generally, the spectral density can be approximated by a mixture of $Q$ Gaussians (Wilson et al., 2015), giving the spectral mixture kernel:

$$k(\tau) = \sum_{q=1}^{Q} w_q \cos\!\left(2\pi \tau^\top \mu_q\right) \prod_{p=1}^{D} \exp\!\left(-2\pi^2 \tau_p^2 v_q^{(p)}\right) \qquad (9)$$

where the $q$th component has weight $w_q$, mean vector $\mu_q = \big(\mu_q^{(1)}, \ldots, \mu_q^{(D)}\big)$, and covariance matrix $M_q = \mathrm{diag}\big(v_q^{(1)}, \ldots, v_q^{(D)}\big)$. The inverse means represent the component periods, and the inverse standard deviations the length scales. The result is a flexible and expressive parameterization of the kernel, in which complex kernels are approximated by mixtures of simpler ones. The reader is referred to Wilson et al. (2015) for a more detailed formulation of the SM kernel.
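As a concrete illustration, the following numpy sketch evaluates the one-dimensional form of the SM kernel in Equation (9). The function and parameter names (weights, means, variances) are chosen here for readability; they are an assumption for illustration and are not taken from the authors' code.

```python
import numpy as np

def spectral_mixture_kernel(x1, x2, weights, means, variances):
    """1-D spectral mixture kernel: a weighted sum of Gaussian-envelope-times-cosine terms.

    weights[q]   : mixture weight w_q of component q
    means[q]     : spectral mean mu_q (inverse period) of component q
    variances[q] : spectral variance v_q (inverse squared length-scale) of component q
    """
    tau = x1[:, None] - x2[None, :]                  # pairwise differences tau = x - x'
    K = np.zeros_like(tau, dtype=float)
    for w, mu, v in zip(weights, means, variances):
        K += w * np.exp(-2.0 * np.pi**2 * tau**2 * v) * np.cos(2.0 * np.pi * tau * mu)
    return K
```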

The likelihood of the response variable is a noisy sample from the latent function, expressed as:

$$p(\mathbf{y} \mid \mathbf{f}, X) = \mathcal{N}\big(\mathbf{f},\, \sigma_n^2 I\big) \qquad (10)$$

Given the GP prior in Equation (2) and the data likelihood in Equation (10), the posterior distribution over the unknown function evaluations $\mathbf{f}$ at all data points was estimated using Bayes' theorem:

$$p(\mathbf{f} \mid \mathbf{y}, X) = \frac{p(\mathbf{y} \mid \mathbf{f}, X)\, p(\mathbf{f} \mid X)}{p(\mathbf{y} \mid X)} \qquad (11)$$

where:

$p(\mathbf{f} \mid \mathbf{y}, X)$ = the posterior distribution of functions that best explain the response variable, given the covariates
$p(\mathbf{y} \mid \mathbf{f}, X)$ = the likelihood of the response variable, given the functions and covariates
$p(\mathbf{f} \mid X)$ = the prior over all possible functions of the response variable
$p(\mathbf{y} \mid X)$ = the marginal likelihood of the data (a normalizing constant)

This posterior is a Gaussian process composed of a distribution of possible functions that best explain the time-series pattern. Given that the data are fixed, Equation (11) was reformulated as the unnormalized posterior distribution

$$p(\mathbf{f} \mid \mathbf{y}, X) \propto p(\mathbf{y} \mid \mathbf{f}, X)\, p(\mathbf{f} \mid X) \qquad (12)$$

The posterior predictive distribution for a new input $x_*$, conditional on the observed data, is Gaussian with mean and variance given by:

$$\mathbb{E}[f_*] = \mathbf{k}_*^\top \big(K + \sigma_n^2 I\big)^{-1} \mathbf{y} \qquad (13)$$
$$\mathbb{V}[f_*] = k(x_*, x_*) - \mathbf{k}_*^\top \big(K + \sigma_n^2 I\big)^{-1} \mathbf{k}_* \qquad (14)$$

where $\mathbf{k}_*$ is the vector of covariances between each observed input and the new input $x_*$.
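For reference, a minimal numpy sketch of Equations (13) and (14) for a zero-mean GP is given below. It is illustrative only; in practice a Cholesky-based solve is preferable to an explicit matrix inverse.

```python
import numpy as np

def gp_posterior_predictive(X_train, y_train, X_new, kernel, noise_var):
    """Posterior predictive mean and variance (Equations 13 and 14) for a zero-mean GP."""
    K = kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = kernel(X_train, X_new)          # covariances between observed and new inputs
    K_ss = kernel(X_new, X_new)
    K_inv = np.linalg.inv(K)              # explicit inverse for clarity only
    mean = K_s.T @ K_inv @ y_train
    cov = K_ss - K_s.T @ K_inv @ K_s
    return mean, np.diag(cov)

# Example usage with the spectral mixture kernel sketched earlier (hypothetical parameter values):
# mean, var = gp_posterior_predictive(
#     x_train, y_train, x_new,
#     lambda a, b: spectral_mixture_kernel(a, b, weights, means, variances),
#     noise_var=0.1)
```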

2.2 Model Estimation

Model estimation involved finding values for eighteen hyperparameters, including nine frequency parameters and nine length-scale parameters. These hyperparameters describe the distribution over functions rather than the functions themselves. Bayesian inference, which was adopted to estimate these hyperparameters, involved expressing prior beliefs over all the hyperparameters and using probability theory to update those beliefs in light of the observed data. In this approach, in contrast to maximum likelihood estimation, overfitting is of less concern because optimization or the minimization of an error is not used for estimation (McHutchon et al., 2014; Ghahramani, 2015). Recent sampling methods considered to be efficient were used to explore the posterior distribution of the proposed model. Initial sampling was performed using an approximate method known as automatic differentiation variational inference (ADVI) (Kucukelbir et al., 2015). Final sampling used an exact Markov chain Monte Carlo (MCMC) method, known as Hamiltonian Monte Carlo (HMC); in particular, the No-U-Turn Sampler (NUTS) (Hoffman and Gelman, 2014) was used. Bayesian optimization (Brochu et al., 2010; Emaasit and Paz, 2018) was used to tune the hyperparameters.

Code and data for the experiments described in the next section are available on GitHub at https://github.com/emaasit/long-range-extrapolation.
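For readers who want a starting point, the sketch below assembles a comparable workflow in PyMC3: a GP marginal likelihood with a covariance built as a weighted sum of Gaussian-envelope-times-cosine terms (one way to realize a spectral-mixture-style kernel from standard library covariances), initialized with ADVI and sampled with NUTS. It is a hedged sketch under assumed priors, toy data, and an assumed parameterization, not the implementation in the repository above; in particular, the interpretation of the ls argument of pm.gp.cov.Cosine as a period should be checked against the library documentation.

```python
import numpy as np
import pymc3 as pm

# Toy monthly series standing in for the email-attachment data (assumption, not the real data).
X = np.arange(14, dtype=float)[:, None]
y = np.sin(2 * np.pi * X[:, 0] / 6.0) + 0.1 * np.random.RandomState(0).randn(14)

Q = 10  # number of spectral mixture components, as in Section 3

with pm.Model() as model:
    # Weakly informative priors over component weights, length-scales, and periods (assumed).
    w = pm.HalfNormal("w", sigma=1.0, shape=Q)
    ls = pm.Gamma("ls", alpha=2.0, beta=1.0, shape=Q)
    period = pm.Gamma("period", alpha=2.0, beta=1.0, shape=Q)
    sigma_n = pm.HalfNormal("sigma_n", sigma=1.0)

    # Each component: a Gaussian (ExpQuad) envelope times a cosine; the kernel is their weighted sum.
    cov = pm.gp.cov.ExpQuad(1, ls=ls[0]) * pm.gp.cov.Cosine(1, ls=period[0]) * w[0]
    for q in range(1, Q):
        cov += pm.gp.cov.ExpQuad(1, ls=ls[q]) * pm.gp.cov.Cosine(1, ls=period[q]) * w[q]

    gp = pm.gp.Marginal(cov_func=cov)
    y_obs = gp.marginal_likelihood("y_obs", X=X, y=y, noise=sigma_n)

    # ADVI initializes the chains; NUTS draws the final posterior samples.
    trace = pm.sample(1000, tune=1000, init="advi", chains=2)
```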

3 Experiments

3.1 Insider Threat

3.1.1 Raw data and sample formation

The insider-threat data used for empirical analysis in this study was provided by the Computer Emergency Response Team (CERT) division of the Software Engineering Institute (SEI) at Carnegie Mellon University (CERT-Division). The data, referred to as the "Insider Threat Tools dataset", is a synthetic test dataset generated by Glasser and Lindauer (2013) that represents a fictitious company, with employees and features such as their computers, files, removable media, and the websites they visited. Version "r6.2.tar.bz2" of the dataset was used. The particular insider threat focused on is the case where a known insider sent information as email attachments from their work email to their home email.

To obtain the sample for the current analysis, the dataset was processed as follows. First, the data was filtered for a known insider with userID "CDE1846". Second, only emails sent by the insider outside the organization were retained. Third, the email-attachment size in gigabytes was summarized into a monthly sum, resulting in a sample size of 14 months ranging from 2010-01-31 to 2011-02-28. Table 1 shows a comparison of descriptive statistics between emails sent to a personal account versus a non-personal account. The attachment size of emails sent to the insider's personal account is shown to be smaller than that of emails sent to non-personal accounts. Finally, emails sent to the insider's personal email account ("Ewing-Carlos@comcast.net") were filtered, as shown in Figure 1. This was the final sample used for analysis. It is difficult to discern any noticeable trends in this small sample of 14 data points.
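For concreteness, a minimal pandas sketch of this filtering is shown below. The file name, column names, organization email domain, and byte units are assumptions for illustration, not the actual CERT r6.2 schema.

```python
import pandas as pd

# Hypothetical file and column names ("user", "to", "size", "date"); check the CERT r6.2 schema.
emails = pd.read_csv("email.csv", parse_dates=["date"])

insider = emails[emails["user"] == "CDE1846"]                      # known insider
outside = insider[~insider["to"].str.endswith("@dtaa.com")]        # sent outside the organization (assumed domain)
personal = outside[outside["to"] == "Ewing-Carlos@comcast.net"]    # insider's personal account

# Monthly sum of attachment sizes, converted to gigabytes (assuming sizes recorded in bytes).
monthly_gb = personal.set_index("date")["size"].resample("M").sum() / 1e9
```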

STATISTIC   PERSONAL ACCOUNT   NON-PERSONAL ACCOUNT
Mean        3.84               37.52
Std Dev     2.28               13.38
Min         0.18               10.56
Max         7.96               57.78
Count       14                 14
Table 1: Descriptive statistics for the monthly attachment size (GB) of emails sent by a known insider outside the organization.
Figure 1: Monthly attachment size (GB) of emails sent by a known insider to a personal account.
Figure 6: Comparison of Gaussian process models with spectral mixture kernels to ARIMA models for emails. Panels: (a) GP-SM, (b) GP-SM Optimized, (c) ARIMA, (d) number of iterations of Bayesian optimization.

3.1.2 Empirical Analysis

The first eleven data points, shown in blue in Figure 1, were used for training, and the last three data points, shown in red, for testing. The number of components in the spectral mixture, Q, was set to 10, as proposed by Wilson et al. (2015), resulting in 10 frequency parameters and 10 length-scale parameters. A maximum of 30 iterations was chosen to tune the 20 hyperparameters of the optimized GP-SM. An ARIMA model was estimated using the methodology proposed by Veeramisti (2016) for comparison.

Figure 6 shows the results of the three estimated models. Figure 6(a) shows that the Gaussian process model with a spectral mixture kernel is able to capture the structure implicitly in the regions of both the training and the testing data. The 95% predicted credible interval (CI) contains the true size of email attachments for the duration of the measurements. Furthermore, the CIs are much narrower than those of the other two models. Figure 6(b) shows that the Gaussian process model with the hyperparameters of the spectral mixture kernel tuned using Bayesian optimization is also able to capture structure implicitly. The predictive performance is especially good in the region of the training data, where the predicted data points overlap entirely with the training data. Performance in the region of the testing data is comparable to the standard model. This finding suggests that hyperparameter tuning for small data without easily noticeable structure does not produce any significant model improvement. Figure 6(c) shows that the ARIMA model is not able to capture the structure within the region of the training data. This finding suggests that ARIMA models perform poorly for small data without noticeable structure. The 95% confidence interval for ARIMA is much wider than those of both GP models, showing a high degree of uncertainty about the ARIMA predictions.

Table 2 shows the performance measures for the three estimated models, including the root mean square error (RMSE) and the mean absolute percentage error (MAPE). The GP-SM model had the best predictive performance, with an RMSE of 1.25 and a MAPE of 27.96%. The GP-SM optimized model had fairly similar predictive performance, with an RMSE of 1.29 and a MAPE of 28.80%. The estimated ARIMA model had the worst predictive performance, with an RMSE of 3.20 and a MAPE of 93.44%. These findings suggest that hyperparameter tuning for small data produces models of comparable performance. If parameter tuning takes a considerable amount of time, it may not be worthwhile unless it produces a significant model improvement.
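For reference, the two performance measures can be computed as in the short sketch below; this is a generic restatement of the standard formulas, not code from the study.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, expressed in percent."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```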

MODEL             RMSE   MAPE
GP-SM             1.25   27.96%
GP-SM Optimized   1.29   28.80%
ARIMA             3.20   93.44%
Table 2: Performance measures for the three different models on the email data.
Figure 9: Number of fatal and serious-injury crashes per 100 million vehicle miles traveled. Panels: (a) number of fatal crashes per 100 million VMT, (b) number of serious injuries per 100 million VMT.
Figure 14: Comparison of Gaussian process models with spectral mixture kernels to ARIMA for fatal crashes. Panels: (a) GP-SM, (b) GP-SM Optimized, (c) ARIMA, (d) number of iterations of Bayesian optimization.
Figure 19: Comparison of Gaussian process models with spectral mixture kernels to ARIMA for serious injuries. Panels: (a) GP-SM, (b) GP-SM Optimized, (c) ARIMA, (d) number of iterations of Bayesian optimization.

3.2 Highway Safety

3.2.1 Raw data and sample formation

The highway-crash data used for empirical analysis in this study was provided by the Nevada Department of Transportation (NDOT). The data consist of two crash-severity measures, the annual number of fatalities and the annual number of serious injuries, each standardized by an exposure variable called vehicle miles traveled (VMT). Vehicle miles traveled represents the total annual miles of vehicle travel divided by the total population in a state. Data for this indicator is provided by the Highway Statistics division of the Federal Highway Administration (FHWA). Data for fatal crashes and serious injuries were available from 1996 to 2004 and from 1995 to 2003, respectively.

The final samples used for the current analysis comprise time series of annual fatalities per 100 million VMT and annual serious injuries per 100 million VMT, resulting in small sample sizes of 21 and 20, respectively. Table 3 provides descriptive statistics of these time series. Given that NDOT only collects these crash counts annually (Veeramisti, 2016), they lack seasonality, as shown in Figure 9. The first 17 and 16 data points, shown in blue in Figure 9(a) and Figure 9(b), were used for training, respectively. The remaining four data points, shown in red, were used for testing.

STATISTIC   FATALITIES   SERIOUS INJURIES
Mean        1.69         12.82
Std Dev     0.43         3.99
Min         1.02         6.63
Max         2.24         20.97
Count       21           20
Table 3: Descriptive statistics for highway crashes per 100 million VMT in Nevada.

3.2.2 Empirical Analysis

The number of components in the spectral mixture, Q, was set to 10, as proposed by Wilson et al. (2015), resulting in 10 frequency parameters and 10 length-scale parameters. A maximum of 30 iterations was chosen to tune the 20 hyperparameters of the optimized GP-SM. An ARIMA model was estimated using the methodology proposed by Veeramisti (2016) for comparison.

Figure 14 shows the results of the three estimated models for annual fatal crashes per 100M VMT. Figure 14(a) shows that the Gaussian process model with a spectral mixture kernel is able to capture the structure implicitly in the regions of both the training and the testing data. The 95% predicted credible interval (CI) contains the true number of annual fatalities per 100M VMT, although the CIs are much wider than those of the other two models. Figure 14(b) shows that the Gaussian process model with the hyperparameters of the spectral mixture kernel tuned using Bayesian optimization is also able to capture structure implicitly. The predictive performance is especially good in the region of the training data, where the predicted data points overlap entirely with the training data. Predictive performance on the testing data is poorer than that of the standard model. This finding suggests that hyperparameter tuning for small data without easily noticeable structure does not produce any significant model improvement. Figure 14(c) shows that the ARIMA model is not able to capture the structure within the region of the testing data. This finding suggests that ARIMA models perform poorly for small data without noticeable structure. The 95% confidence interval for ARIMA is much wider than those of both GP models, showing a high degree of uncertainty about the ARIMA predictions.

Figure 19 shows the results of the three estimated models for annual serious injuries per 100M VMT. Figure 19(a) shows that the Gaussian process model with a spectral mixture kernel is able to capture the structure implicitly in the regions of both the training and the testing data. The 95% predicted credible interval (CI) contains the true number of annual serious injuries per 100M VMT, although the CIs are much wider than those of the other two models. Figure 19(b) shows that the Gaussian process model with the hyperparameters of the spectral mixture kernel tuned using Bayesian optimization is also able to capture structure implicitly. The predictive performance is especially good in the region of the training data, where the predicted data points overlap entirely with the training data. Predictive performance in the region of the testing data is poorer compared to the standard model. This finding suggests that hyperparameter tuning for small data without easily noticeable structure does not produce any significant model improvement. Figure 19(c) shows that the ARIMA model was also able to capture the structure within the region of the testing data. This finding suggests that ARIMA models perform well for time series that have easily noticeable structure. The 95% confidence interval for ARIMA is comparatively narrower than that of the standard GP model, showing a lesser degree of uncertainty about the ARIMA predictions.

Table 4 shows the performance measures for the three estimated models, including the root mean square error (RMSE) and the mean absolute percentage error (MAPE). For fatal crashes, the GP-SM model had the best predictive performance, with an RMSE of 0.06 and a MAPE of 3.31%, followed by the GP-SM optimized model (RMSE of 0.24 and MAPE of 14.00%) and the ARIMA model (RMSE of 0.34 and MAPE of 27.07%). For serious injuries, the GP-SM model again performed best, with an RMSE of 0.87 and a MAPE of 11.5%, while the ARIMA model (RMSE of 1.68 and MAPE of 17.72%) outperformed the GP-SM optimized model (RMSE of 2.69 and MAPE of 34.37%). These findings suggest that hyperparameter tuning for small data does not necessarily improve performance. If parameter tuning takes a considerable amount of time, it may not be worthwhile unless it produces a significant model improvement.

MODEL             FATALITIES          SERIOUS INJURIES
                  RMSE     MAPE       RMSE     MAPE
GP-SM             0.06     3.31%      0.87     11.5%
GP-SM Optimized   0.24     14.00%     2.69     34.37%
ARIMA             0.34     27.07%     1.68     17.72%
Table 4: Performance measures for the three different models on highway crashes per 100 million VMT.

4 Conclusions and Future Work

This paper proposed a Bayesian nonparametric framework to implicitly capture hidden structure in time series having limited data. The proposed framework, a Gaussian process with a spectral mixture kernel, was applied to time-series data for insider-threat detection and highway-safety planning. The framework addresses two current challenges in analyzing noisy time series with limited data, in which the series are visually inspected for noticeable structure, such as periodicity and growing or decreasing trends, which is then hard-coded into pre-specified functional forms. Experiments demonstrated that results from this framework outperform traditional ARIMA models when the time series is quite noisy and does not have easily noticeable structure. Future work will involve evaluating the proposed framework on other types of insider-threat behavior.

Acknowledgements

The authors would like to thank the CERT Division and Dr. Naveen Veeramisti for providing the "Insider Threat Tools dataset" and NDOT's highway-crash data, respectively, used in the empirical analyses.

References

  • Brochu et al. (2010) Brochu, E., Cora, V. M., and De Freitas, N. (2010). A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599.
  • CERT-Division. Software Engineering Institute. https://www.sei.cmu.edu/about/divisions/cert/index.cfm. Accessed: January 30, 2018.
  • Duvenaud (2014) Duvenaud, D. (2014). Automatic model construction with Gaussian processes. PhD thesis, University of Cambridge.
  • Emaasit and Paz (2018) Emaasit, D. and Paz, A. (2018). Simultaneous estimation of flexible models and associated hyperparameters: An application to activity-duration modeling. In Transportation Research Board 97th Annual Meeting. Washington DC: Transportation Research Board.
  • FHWA. Highway Statistics Series. https://www.fhwa.dot.gov/policyinformation/statistics.cfm. Accessed: January 30, 2018.
  • Ghahramani (2015) Ghahramani, Z. (2015). Probabilistic machine learning and artificial intelligence. Nature, 521(7553):452–459.
  • Gheyas and Abdallah (2016) Gheyas, I. A. and Abdallah, A. E. (2016). Detection and prediction of insider threats to cyber security: a systematic literature review and meta-analysis. Big Data Analytics, 1(1):6.
  • Glasser and Lindauer (2013) Glasser, J. and Lindauer, B. (2013). Bridging the gap: A pragmatic approach to generating insider threat data. In Security and Privacy Workshops (SPW), 2013 IEEE, pages 98–104. IEEE.
  • Greitzer and Ferryman (2013) Greitzer, F. L. and Ferryman, T. A. (2013). Methods and metrics for evaluating analytic insider threat tools. In Security and Privacy Workshops (SPW), 2013 IEEE, pages 90–97. IEEE.
  • Hjort et al. (2010) Hjort, N. L., Holmes, C., Müller, P., and Walker, S. G. (2010). Bayesian nonparametrics, volume 28. Cambridge University Press.
  • Hoffman and Gelman (2014) Hoffman, M. D. and Gelman, A. (2014). The no-u-turn sampler: adaptively setting path lengths in hamiltonian monte carlo. Journal of Machine Learning Research, 15(1):1593–1623.
  • Kucukelbir et al. (2015) Kucukelbir, A., Ranganath, R., Gelman, A., and Blei, D. (2015). Automatic variational inference in stan. In Advances in neural information processing systems, pages 568–576.
  • Kweon and Lim (2012) Kweon, Y.-J. and Lim, I.-K. (2012). Appropriate regression model types for intersections in safetyanalyst. Journal of Transportation Engineering, 138(10):1250–1258.
  • Lindauer et al. (2014) Lindauer, B., Glasser, J., Rosen, M., Wallnau, K. C., and ExactData, L. (2014). Generating test data for insider threat detectors. JoWUA, 5(2):80–94.
  • McHutchon et al. (2014) McHutchon, A. et al. (2014). Nonlinear modelling and control using Gaussian processes. PhD thesis, Citeseer.
  • Schulz et al. (2017) Schulz, E., Tenenbaum, J. B., Duvenaud, D., Speekenbrink, M., and Gershman, S. J. (2017). Compositional inductive biases in function learning. Cognitive psychology, 99:44–79.
  • Smith (2016) Smith, S. (2016). HSIP 2016 National Summary Report. Number FHWA-SA-17-040.
  • Stoffel et al. (2013) Stoffel, F., Fischer, F., and Keim, D. A. (2013). Finding anomalies in time-series using visual correlation for interactive root cause analysis. In Proceedings of the Tenth Workshop on Visualization for Cyber Security, pages 65–72. ACM.
  • Sukhai et al. (2011) Sukhai, A., Jones, A. P., Love, B. S., and Haynes, R. (2011). Temporal variations in road traffic fatalities in south africa. Accident Analysis & Prevention, 43(1):421–428.
  • Veeramisti (2016) Veeramisti, N. K. (2016). A business intelligence framework for network-level traffic safety analyses. PhD thesis, University of Nevada, Las Vegas.
  • Williams and Rasmussen (2006) Williams, C. K. and Rasmussen, C. E. (2006). Gaussian Processes for Machine Learning. The MIT Press.
  • Wilson and Adams (2013) Wilson, A. and Adams, R. (2013). Gaussian process kernels for pattern discovery and extrapolation. In International Conference on Machine Learning, pages 1067–1075.
  • Wilson (2014) Wilson, A. G. (2014). Covariance kernels for fast automatic pattern discovery and extrapolation with gaussian processes. University of Cambridge.
  • Wilson et al. (2015) Wilson, A. G., Dann, C., Lucas, C., and Xing, E. P. (2015). The human kernel. In Advances in neural information processing systems, pages 2854–2862.
  • Yannis et al. (2011) Yannis, G., Antoniou, C., and Papadimitriou, E. (2011). Modeling traffic fatalities in europe. In 90th Annual Meeting of the Transportation Research Board, Washington, DC.