In 2017, Jeffrey C. Hall, Michael Rosbash, and Michael W. Young were awarded the Nobel prize in physiology or medicine for their discoveries of molecular mechanisms controlling the circadian rhythm of plants. Indeed, one of the focus areas of modern scientific research is to reveal mysteries related to genes, their interactions, and their connection to observable phenotypes. Application areas of analysing interactions of genes are not limited to plants and their circadian clocks. In the field of biomedicine, for example, knowledge of genes and their interactions plays a crucial role in prevention and cure of diseases.
Interactions between genes are typically represented as a gene regulatory network (GRN) whose nodes correspond to different genes, and a directed edge denotes a direct causal effect of some gene on another gene. The usual problem statement is to infer the network topology from given gene expression data. Classically, GRN inference has been based on analysing steady state data corresponding to gene knockout experiments, based on silencing one gene and observing changes in the steady state expressions of other genes. However, carrying out knockout experiments on a high number of genes is costly and technically infeasible. Moreover, for example circadian clocks of plants are oscillating systems, and in practice it can be difficult to determine whether a particular measurement corresponds to a steady state. In contrast, methods that infer GRNs from time series data can infer the networks on a more global level using data from few experiments. Therefore GRN inference from time series data has gained more and more attention recently. This article presents BINGO (Bayesian Inference of Networks using Gaussian prOcess dynamical models). BINGO is designed for network inference mainly from time series data, but steady state measurements can be straightforwardly implemented as well.
We model the time series data as samples from a continuous trajectory,
where represents measurement noise. The continuous trajectory is assumed to satisfy a nonlinear stochastic differential equation
where is some driving process noise. Here is an -valued function, and thus also
If function depends on , in the corresponding GRN there is a link from gene to gene . The task in GRN inference is to discover this regulatory interconnection structure between variables. It is known that the dynamics of one gene can only be influenced by few other genes, that is, a component only depends on some components of .
What is characteristic to the GRN inference problem is that the data tends to be rather poor in terms of temporal resolution and the overall amount of data. Low temporal resolution has a deteriorating effect on network inference—in linear systems ( in (1.2)) this is illustrated by the fact that matrices and do not share even approximatively the same sparsity pattern when is big. In addition, many methods use derivatives estimated directly from the time series data using difference approximations or curve fitting techniques. Using different techniques can significantly effect the results, and therefore avoiding such derivative approximations is desirable.
In this article, we introduce a continuous-time version of the so-called Gaussian process dynamical model (GPDM) . Essentially, the dynamics function in (1.2) is modelled as a Gaussian process with some covariance function . This defines as a stochastic process. We prove existence and uniqueness of solutions of the corresponding differential equation, and the convergence of the Euler discretisation. We derive a probability distribution for the discretised trajectory, enabling direct MCMC sampling of realisations of this process. This way the derivative approximations are avoided. This trajectory sampling is illustrated in Figure 1, showing samples from , where represents hyperparameters, and comprises the time series measurements. Then is the measurement model arising from (1.1), and is the probability distribution for the trajectory , arising from (1.2). Network inference is then done by estimating the hyperparameters of the chosen covariance function. The technique of performing variable selection based on estimating variable-specific hyperparameters is known as automatic relevance determination (ARD) [24, 28]. In our approach, missing measurements as well as non-constant sampling frequency are easy to treat, and these functionalities are already implemented in the code, as well as a possibility to include prior information on the network. In the end, we are mainly interested in which of the variable-specific hyperparameters are non-zero. Therefore we introduce an indicator variable matrix for these hyperparameters, that can also be interpreted as the adjacency matrix of the GRN. The pipeline of BINGO is illustrated in Figure 2. We demonstrate that BINGO is superior in dealing with poor time resolution while still remaining computationally feasible.
Linear version of the method is presented in , where the MCMC samplers for the trajectory and the network topology are introduced. Other methods that model the dynamics function using a Gaussian process are presented in [3, 20, 31, 32]. The main difference between BINGO and these older methods is that they all treat the problem as a nonlinear regression problem with input-output pairs 
or using some other method to approximate the derivatives from the time series data. As mentioned above, we avoid this derivative estimation by fitting continuous-time trajectories to the time series data. It should be noted that the GP framework can handle combinatorial effects, meaning nonlinearities that cannot be decomposed into . This is an important property for modelling a chemical system—such as gene expression—where reactions can happen due to combined effects of reactant species. For example, the dynamics corresponding to a chemical reaction cannot be modelled by .
Several GRN inference problems from different types of data have been posed as competitive challenges by the Dialogue for Reverse Engineering Assessments and Methods (DREAM) project. Results and conclusions, as well as some top performers are introduced in [26, 18]. The inference problems are based mainly on two types of data, namely time series data, and steady state measurements corresponding to gene knockout or knockdown experiments. A review 
on methods based on time series data concluded that methods based on nonparametric nonlinear differential equations performed best. Such methods include Gaussian process models and random forest models. Other types of ODE models tend to make rather restrictive assumptions on the dynamics, such as linear dynamics [1, 21, 4], or a library of nonlinear functions [6, 8, 30, 25]. Mechanistic models [2, 29] try to fit the data using dynamical systems constructed from enzyme kinetics equations.
The rest of the article is organised as follows. In Section 2, we introduce the continuous-time Gaussian process dynamical models, and we prove the existence and uniqueness of solutions, and the convergence of the Euler discretisation. This is applied in Section 3 where we derive the probability distribution for the discretised trajectory. The full network inference method BINGO is introduced in Section 4 including incorporating several time series and knockout/knockdown experiments. Section 5 is devoted to benchmark data experiments. We apply BINGO to the DREAM4 In Silico Network Challenge data [26, 27, 35, 39] using either the time series data only, or including also the steady state experiments. The method is compared with the best performers in the DREAM4 challenge, as well as with more recent methods. Moreover, BINGO’s performance with low sampling frequency is demonstrated by performing network inference on simulated data of the circadian clock of Arabidopsis thaliana , using different sampling frequencies. Finally, to demonstrate BINGO’s performance using real data, it has been applied to the IRMA in vivo dataset , which is obtained from a synthetic network of five genes, and therefore the ground truth network is known. In Section 6, we provide a short summary and discuss future prospects.
2. Continuous-time Gaussian process dynamical models
Discrete-time GPDMs (a.k.a. Gaussian process state space models [12, 11]) were originally introduced in , whose treatment was based on the GP latent variable models . They are an effective tool for analysing time series data that is produced by a dynamical system that is unknown to us, or somehow too complicated to be presented using classical modelling techniques. In the original paper , the method was used for human motion tracking from video data. Motion tracking problems remain the primary use of GPDMs [9, 13], but other types of applications have emerged as well, such as speech analysis , traffic flow prediction , and electric load prediction .
In this section we study theoretical properties of the continuous time GPDM trajectory defined as the solution on for some fixed to the stochastic differential equation
where the initial statefor some covariance matrix , is a smooth deterministic input function, and is an -dimensional Brownian motion with diagonal covariance matrix . Finally, where each component conditioned on a trajectory is modelled as a Gaussian process. For simplicity, we assume that each is centred (see Remark 2.2) and has a covariance depending on (input variable) and (state variable). That is, and .
By Mercer’s theorem, each covariance can be represented as
Hence a Gaussian with covariance can be modelled by
where are mutually independent. From this it is clear that for given , is Gaussian, whereas for random , it is usually not.
Throughout the article we make the following assumption on the covariances .
For every , there exists a constant such that
uniformly in .
The GRN inference algorithm developed below is based on the squared exponential covariance functions
and estimating the hyperparameters . The hyperparameters satisfy and . If , it indicates that gene is a regulator of gene . The Assumption 2.1 is satisfied with the constant .
Before stating and proving existence and uniqueness result for (2.1) we need one technical lemma.
Suppose that Assumption 2.1 holds and let be arbitrary. Then for any there exists a constant depending on and the numbers such that
Since is a Gaussian vector, it suffices to prove the claim only for . Furthermore, by triangle inequality, it suffices to prove that for each component we have
and Assumption 2.1 implies
which concludes the proof. ∎
The claim follows from Lemma 2.1 together with the fact that conditioned on and is Gaussian with covariance . ∎
The following existence and uniqueness result for the stochastic differential equation (2.1) justifies the use of the continuous-time GPDM model.
We use Picard iteration and define
and for we set
Taking expectation, conditioning, and using Corollary 2.1 then gives
We now claim that
This follows by induction. For we have
which proves the claim for as the supremum of Gaussian process and the supremum of
have all moments finite. Suppose
Then (2.4) implies
In particular, this gives
On the other hand, from (2.3) we get
Consequently, taking expectation gives
and thus we also have
This implies that
We stress that while we assumed the Gaussian process to be centred for the sake of simplicity, the extension to a non-centred case is rather straightforward. Indeed, if for each component the mean function is Lipschitz continuous with respect to uniformly in , i.e.
then the existence and uniqueness follows from the above proof by centering first. We leave the details to the reader.
The following result studies the basic properties of the solution.
Clearly, each in the proof of Theorem 2.1 is continuous. Consequently, the solution is continuous as a uniform limit of continuous trajectories. The Hölder continuity then follows from (2.1) and the Hölder continuity of the Brownian motion . Indeed, since is continuous in and is bounded as a continuous function on a bounded interval , it follows that is also bounded. Finally, the existence of all moments follow from the fact that has all the moments finite as well as has all the moments finite. ∎
The method’s numerical implementation will be based on the Euler discretised equation (2.1). Define therefore a partition of the compact interval of interest . Denote the discretised trajectory corresponding to the partition by , and recall that its dynamics are given by
where , and . Later, we will obtain a probability distribution for the discrete trajectory , but first we show the pointwise (in ) convergence to the continuous solution of (2.1) as the temporal discretisation is refined.
We study the continuous version defined for by
Note that for all .
Suppose that Assumption 2.1 holds and consider arbitrary discretisation such that , where . Then for any
Moreover, for any we have
Let and denote
where denotes the -norm. Now
As in the proof of Theorem 2.1, this implies
Let now be large enough such that . We get
Iterating then gives
Note next that
for some other constant . Since
as and is bounded by assumption, it follows that
for some unimportant constant . But now
for from which it follows that proving the first claim. Finally, the second claim is a direct consequence of the Borel-Cantelli lemma. ∎
3. Probability distribution of the discretised trajectory
The probability distribution is now derived for the discrete trajectory , where denotes collectively all the hyperparameters. The discretisation level index is dropped now, since we only use one discretisation level from now on. It holds that
For given , the trajectory is a Markov process, and therefore its distribution satisfies
Let us introduce notation and . Same notation is also used for the different dimensions of the trajectory. Then it holds that
where is a diagonal matrix whose element is , and .
Now in the integral (3.1) depends only on the values of at points . By definition of a Gaussian process, the integral can equivalently be computed over a collection of finite-dimensional, normally distributed random variables where has mean zero, and covariance given elementwise by . The integral in (3.1) can be computed analytically (see Appendix B),
Applying the Woodbury identity to the exponent gives
and the determinant lemma gives (recall is a diagonal matrix with ’s on the diagonal)
Finally, the desired probability distribution is
Note that above it was implicitly assumed that the covariance is positive definite. This assumption is only violated if for some or if the covariance function is degenerate. In this case the integral should be computed over a lower-dimensional variable , but the end result would not change.
Note also that (3.2) corresponds to the finite dimensional distribution of the continuous Euler scheme (2.6) evaluated at discretisation points. Since (2.6) converges strongly to the solution of (2.1), the finite dimensional distributions converge as well. This means that (3.2) is a finite dimensional approximation of the distribution of .
4. Network inference method
Consider then the original problem, that is, estimating the hyperparameters from given time series data. Denote where is assumed to be a noisy sample from the continuous trajectory , that is, , and is a Gaussian noise with zero mean and covariance , and when . We intend to draw samples from the parameter posterior distribution using an MCMC scheme. Therefore, we only need the posterior distribution up to constant multiplication. Denoting the hyperparameters collectively by , the hyperparameter posterior distribution is
Here is the Gaussian measurement error distribution, will be approximated by (3.2) for the discretised trajectory , and is a prior for the hyperparameters. This prior consists of independent priors for each parameter. The integration with respect to the trajectory is done by MCMC sampling. In the network inference algorithm, we consider only the squared exponential covariance function (2.2). The function has mean
where and are regarded as nonnegative hyperparameters corresponding to basal transcription () and mRNA degradation ().
For sampling the hyperparameters in the squared exponential covariance function (2.2), we introduce an indicator variable as in . That is, each hyperparameter is a product , where and . The state of the sampler consists of the indicator variable , the hyperparameters () , , , , , and the discrete trajectory . They are sampled using a Gibbs sampler (or more precisely, Metropolis–Hastings within Gibbs sampler) as described below.
For the Gibbs sampler, notice that given in (3.2) is readily factorised in form
This factorisation makes it natural to sample , , , , and one dimension at a time. However, each factor still depends on the full trajectory , so the trajectory sampling is done separately. Also, when using the Crank–Nicolson sampling (see Section A.2), the sampling of is intertwined with the trajectory sampling, so they are sampled together. This two-phase sampling scheme is described in the following algorithm. Here the algorithm is presented in its basic form. Some ways to make the sampling more efficient are presented in Appendix A. We assume that the initial time coincides with the time of the first measurement , so that . In the algorithm, this is included in the data fit term .
Denote the samples by parenthesised superindex, e.g., is the trajectory of the sample. The proposal samples are denoted by a hat.
Indicator and hyperparameter sampling:
Sample the row of by drawing
from uniform distribution over. Then
Sample , , , and using random walk sampling, that is, add small changes to each component, drawn from zero-mean normal distribution. If the proposal sample is negative, take its absolute value.
Accept the proposal samples with probability
where is the hyperparameter prior, and the factors are defined in (4.1).
Sample with random walk sampling, with acceptance probability