1 Introduction
Detailed information about interactions between molecules in biological processes is an important ingredient for methods that aim to predict cellular behavior in response to perturbations and for deriving mechanistic models of disease processes Khatri et al. (2012). Typical network models in the biological literature are based on small-scale targeted molecular experiments and, while they have led to a significant corpus of molecular biology knowledge, they are often incomplete and/or not quantitatively predictive. New modeling opportunities arise from modern perturbation screening and profiling techniques, which have been used to generate rich biological datasets. These have greatly improved the potential for using machine learning technology to build much more comprehensive predictive interaction models, according to what one might call the perturbation-interaction-response paradigm.
Standard computational methods, however, are typically insufficient for accurate and efficient network inference. Static models, e.g., those using partial correlation, maximum entropy, or mutual information, do not use time-resolved information Locasale and Wolf-Yadlin (2009); Chan et al. (2017). Dynamic models, e.g., dynamic Bayesian networks and ordinary differential equation (ODE) models, typically require prior knowledge of kinetic parameters, which is often not available Zou and Conzen (2005); Hill et al. (2017). Deep learning models, although usually providing good predictive ability, often lack the interpretability needed to help understand the system Zhou and Troyanskaya (2015); Montavon et al. (2018). Recent work on interpretable neural differential equations by Rackauckas et al. (2020) combines the ideas of neural ODEs Chen et al. (2018) and physics-based models, providing a potential merger of deep learning and biological network inference for scientific machine learning.
We previously developed a hybrid approach, CellBox, that combines explicit ODE models of cellular interactions with an automatic differentiation framework implemented in TensorFlow Yuan et al. (2020). While successful for a specific cancer cell line, CellBox had some key limitations: (1) it was tested in only one specific system; (2) the model was trained only on steady-state data; (3) numerical ODE solving could incur truncation errors due to the use of non-stiff solvers on stiff biological systems; and (4) it used an arbitrarily fixed time step, because the computational graph is static in TensorFlow.
Here, we reimplemented the CellBox algorithm in Julia with the aim of moving beyond these limitations. We replaced backpropagation through time with adjoint sensitivity methods Fröhlich et al. (2017), and replaced Heun's ODE solver with fixed time steps with high-order ODE solvers with adaptive time steps. The new implementation therefore allows adjoint-based optimization over full trajectory simulations. Using simulated data from both abstract and biology-inspired networks, we demonstrate that the new implementation can accurately predict the dynamic time course of cell responses to unseen perturbations. Importantly, this method can efficiently infer molecular interaction parameters with high accuracy for various network structures. Taken together, these results strongly suggest the broad applicability of CellBox to a diversity of systems.
2 Method
2.1 CellBox model
Following Yuan et al. (2020), a set of ODEs was used to describe the time development of the system variables upon perturbation (Eq. 1).
$$\frac{d\mathbf{x}(t)}{dt} = \tanh\!\big(W\mathbf{x}(t) + \mathbf{u}\big) - \alpha \odot \mathbf{x}(t) \qquad (1)$$
where the $n$-dimensional vector $\mathbf{x}(t)$ represents the change in molecular or phenotypic measurements, e.g., log ratios of molecular concentrations after and before perturbation $\mathbf{u}$, and $n$ is the number of molecule types. The initial condition is $\mathbf{x}(0) = \mathbf{0}$. The matrix $W$ quantifies the directional interactions among molecules, where each $w_{ij}$ denotes the interaction from network node $j$ to node $i$. The scaling factor $\alpha$ quantifies the strength of the natural decay term $-\alpha \odot \mathbf{x}(t)$, which captures the tendency of an entity to return to its steady-state level before perturbation. A hyperbolic tangent envelope function introduces a saturation effect on the interaction term, and $\odot$ denotes element-wise multiplication. By integrating Eq. 1 with an ODE solver, we can predict the time evolution of $\mathbf{x}(t)$ after a given perturbation $\mathbf{u}$. CellBox learns these system parameters from experimental data: given input data, CellBox optimizes $W$ and $\alpha$ by minimizing the mismatch between model predictions and experimental measurements. Here, instead of experimental data, we provide CellBox with time-series pseudo-experimental data generated from a given network model assumed to be the ground truth. Multiple time points are sampled and noise is added for different perturbation conditions, which collectively form the training data. Once trained, the models can predict the responses of the system to unseen perturbations.
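As a concrete illustration of the dynamics just described, the following is a minimal sketch of Eq. 1 in Python using SciPy (the paper's actual implementation uses Julia and DifferentialEquations.jl); the network size, interaction weights, decay rates, and perturbation below are all illustrative, not values from the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

n = 5                                     # number of molecule types (illustrative)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(n, n))    # w_ij: interaction from node j to node i
alpha = np.full(n, 0.8)                   # decay toward pre-perturbation steady state
u = np.zeros(n)
u[2] = 1.0                                # perturb a single node

def cellbox_rhs(t, x):
    # Eq. 1: dx/dt = tanh(W x + u) - alpha ⊙ x
    return np.tanh(W @ x + u) - alpha * x

# Integrate from the unperturbed state x(0) = 0 and sample 6 time points
sol = solve_ivp(cellbox_rhs, (0.0, 10.0), y0=np.zeros(n),
                t_eval=np.linspace(0.0, 10.0, 6))
print(sol.y.shape)  # trajectory matrix: one row per node, one column per time point
```

Because the tanh envelope bounds the interaction term and the decay term pulls each node back toward zero, trajectories of this form remain bounded, which mirrors the saturation behavior the envelope function is meant to introduce.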
2.2 Adjoint sensitivity methods in dynamic system optimization
The dimension of $\mathbf{x}$ can be on the order of hundreds to thousands (human cells have about 20,000 genes in total), so models can have on the order of $n^2$ interaction parameters, leading to computational challenges for the parameter inference algorithm. Recently developed neural ODEs Chen et al. (2018); Rubanova et al. (2019) have shown promise in efficiently learning complex dynamic models from time-series data. One can view Eq. 1 as a neural network with a single hidden layer, where the dimensions of both the input and the output are equal to $n$. We have reimplemented CellBox to learn the model parameters in the framework of neural ODEs. Specifically, we denote the model predictions as
$$\hat{\mathbf{x}}(t) = \mathbf{x}(0) + \int_0^t \Big[\tanh\!\big(W\hat{\mathbf{x}}(\tau) + \mathbf{u}\big) - \alpha \odot \hat{\mathbf{x}}(\tau)\Big]\, d\tau \qquad (2)$$
We use the mean absolute error (MAE) as the loss function, i.e.,
$$L(W, \alpha) = \frac{1}{N}\sum_{i=1}^{N} \big|\hat{x}_i - x_i\big| \qquad (3)$$
where $\hat{x}$ and $x$ correspond to simulated and measured molecular profiles, and $N$ is the number of measurements. In this work, we use the Tsitouras 5/4 Runge-Kutta ODE solver, implemented as Tsit5() in the Julia package DifferentialEquations.jl developed by Rackauckas and Nie (2017). The first-order optimizer Adam, proposed in Kingma and Ba (2014) and implemented in Flux.jl by Innes (2018), is adopted with a default learning rate of 0.001, and learning rate annealing is employed. We use weight decay to encourage sparsity in the interaction matrix (which is expected given our understanding of interactions in biological networks); the value of the weight decay is treated as a hyperparameter. To accelerate training, we use a specific mini-batching scheme in which we sample time points from each perturbation condition as a single batch and iteratively loop over all of the perturbation conditions. Unlike in regular deep learning tasks, in addition to tracking the loss function, we also monitor the convergence of the model parameters by inspecting the Pearson correlation between the learned interaction weights and the ground-truth network connections, reported as inference accuracy.
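The two monitoring quantities described above can be sketched in a few lines of Python; the example matrices below are illustrative, not values from the paper.

```python
import numpy as np

def mae(x_hat, x):
    # Eq. 3: mean absolute error between simulated and measured profiles
    return np.mean(np.abs(x_hat - x))

def inference_accuracy(W_learned, W_true):
    # Pearson correlation between flattened learned and ground-truth
    # interaction matrices, reported as inference accuracy
    return np.corrcoef(W_learned.ravel(), W_true.ravel())[0, 1]

# Illustrative check: a learned matrix equal to the ground truth plus
# small noise should yield a correlation close to 1
W_true = np.array([[0.0, 1.0], [-0.5, 0.0]])
W_learned = W_true + 0.05 * np.random.default_rng(1).normal(size=(2, 2))
acc = inference_accuracy(W_learned, W_true)
```

Tracking inference accuracy alongside the loss is what distinguishes this setting from a standard regression task: a model can fit the trajectories well while still mis-assigning individual edge weights, and the correlation against the ground-truth network exposes that.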
3 Results
3.1 Abstract synthetic systems
We first conduct proof-of-concept experiments by applying the CellBox algorithm to several typical abstract networks presented in Pratapa et al. (2020), including linear (LI), cyclic (CY), and bifurcation (BI) networks. We artificially introduce perturbations into the systems and use the ODEs to simulate pseudo-experimental response trajectories, from which samples are drawn at multiple time points and used in training. To mimic realistic experimental setups, each synthetic perturbation condition is constrained such that no more than 2 nodes are perturbed simultaneously and at most 5 time points are sampled. We discuss the effect of the number of perturbed nodes and the number of time points on model performance in Section 4.
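The data-generation protocol above can be sketched as follows in Python (the paper's implementation is in Julia); the network, noise level, and sampling grid are illustrative assumptions, with the constraints from the text (at most 2 perturbed nodes and at most 5 time points per condition) made explicit.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
n, n_conditions, noise_sd = 5, 10, 0.01          # illustrative sizes
W = rng.normal(scale=0.5, size=(n, n))           # assumed ground-truth network
alpha = np.full(n, 0.8)

def simulate_condition(u, t_samples):
    # Simulate the ground-truth ODE under perturbation u, then add
    # measurement noise to the sampled time points
    rhs = lambda t, x: np.tanh(W @ x + u) - alpha * x
    sol = solve_ivp(rhs, (0.0, t_samples[-1]), np.zeros(n), t_eval=t_samples)
    return sol.y + rng.normal(scale=noise_sd, size=sol.y.shape)

dataset = []
for _ in range(n_conditions):
    u = np.zeros(n)
    targets = rng.choice(n, size=rng.integers(1, 3), replace=False)  # 1 or 2 nodes
    u[targets] = rng.choice([-1.0, 1.0], size=targets.size)
    t_samples = np.linspace(1.0, 5.0, 5)                             # 5 time points
    dataset.append((u, simulate_condition(u, t_samples)))
```

Each element of `dataset` pairs a perturbation vector with a noisy sampled trajectory, which together play the role of one pseudo-experimental condition in training.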
We evaluated both the accuracy of the predicted cell responses to these perturbations and the accuracy of network inference. The models were trained with as few as 10 perturbation conditions (Table 1). Our results indicate that, on these simulated biological systems, the CellBox method can train predictive models of system responses to external perturbations (average MAE) as well as capture the system structure by inferring the network interaction parameters (average Pearson's correlation); values are reported in Table 1. Overall, while different systems resulted in different parameter inference efficiency, CellBox could be effectively trained for all system types tested when provided with sufficient perturbation conditions. These results suggest the potential applicability of such methods to diverse biological systems.
3.2 Biology-inspired synthetic systems
We tested CellBox on synthetic analogs of four real biological systems adapted from Pratapa et al. (2020): mammalian cortical area development (mCAD), ventral spinal cord (VSC) development, hematopoietic stem cell (HSC) differentiation, and gonadal sex determination (GSD). These networks were human-curated from the research literature and have been repeatedly validated by experiments. We then used these biology-inspired synthetic system models for simulation and model training in the same way as for the abstract systems described above. Similar to the abstract systems, these models achieved the same level of prediction error (MAE) and inference accuracy (Pearson's correlation) (Table 2) and accurately predicted post-perturbational responses in each of the four systems described in Pratapa et al. (2020).
4 Conclusion and discussion
Constructing data-driven models of molecular interactions and cellular behavior in biological systems is useful but challenging. Here we adapted the CellBox method Yuan et al. (2020); Shen et al. (2020), a hybridization of explicit dynamic models and machine learning optimization, using adjoint sensitivity methods in Julia. We tested performance on a diverse range of synthetic network systems and demonstrated that, by simulating perturbation-response data and using such data to construct interaction models, CellBox can be effectively trained to predict responses and reconstruct the ground-truth interaction network. The overall performance across diverse systems, including several typical abstract networks and multiple biology-inspired networks, suggests that CellBox can potentially be applied to a wide variety of biological systems.
Another improvement over the first version of CellBox concerns the training data: the previous models used only a single endpoint measurement, assuming the system had reached equilibrium. Time-series data can clearly be more informative for understanding and simulating a system, as explored in Nyman et al. (2020). To explore this potential, we used simulated time-series data for training, and the models were then challenged to predict the full time series of the dynamic response to perturbations. This led us to redesign the optimization procedure, e.g., by incorporating neural ODE and adjoint sensitivity methods from Chen et al. (2018); Fröhlich et al. (2017), and to examine training efficiency and robustness, for which more careful fine-tuning was used to ensure stable simulation and modeling.
The data generation procedure used here does not systematically explore the minimal or optimal data requirements for successful training. Some of the models above can be effectively trained with even less data, e.g., measurements at only one time point, so it is not surprising that some biological datasets that are sparse in the time dimension are still informative. We did, however, test scenarios in which each synthetic condition has more than 2 nodes perturbed or more than 5 time points measured. As expected, the increased information benefits model training and results in better performance even with a smaller sample size. Taken together, these observations may help define the scope of the perturbation experiments needed to accurately simulate and predict the responses of a particular system. In future work, exploring how experiment size and quality associate with successful training would be of tremendous help for experimental design and validation.
References
Chan et al. (2017). Gene regulatory network inference from single-cell data using multivariate information measures. Cell Systems 5(3), pp. 251–267.e3.
Chen et al. (2018). Neural ordinary differential equations. NeurIPS. arXiv:1806.07366.
Fröhlich et al. (2017). Scalable parameter estimation for genome-scale biochemical reaction networks. PLOS Computational Biology 13(1), pp. 1–18.
Hill et al. (2017). Context specificity in causal signaling networks revealed by phosphoprotein profiling. Cell Systems 4(1), pp. 73–83.
Innes (2018). Flux: elegant machine learning with Julia. Journal of Open Source Software 3(25), pp. 602.
Khatri et al. (2012). Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Computational Biology 8(2), e1002375.
Kingma and Ba (2014). Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Locasale and Wolf-Yadlin (2009). Maximum entropy reconstructions of dynamic signaling networks from quantitative proteomics data. PLoS ONE 4(8), e6522.
Montavon et al. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Processing 73, pp. 1–15.
Nyman et al. (2020). Perturbation biology links temporal protein changes to drug responses in a melanoma cell line. PLOS Computational Biology 16(7), pp. 1–26.
Pratapa et al. (2020). Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nature Methods 17(2), pp. 147–154.
Rackauckas and Nie (2017). DifferentialEquations.jl – a performant and feature-rich ecosystem for solving differential equations in Julia. Journal of Open Research Software 5(1).
Rackauckas et al. (2020). Universal differential equations for scientific machine learning. arXiv:2001.04385.
Rubanova et al. (2019). Latent ODEs for irregularly-sampled time series. arXiv preprint arXiv:1907.03907.
Shen et al. (2020). Constructing data-driven network models of cell dynamics with perturbation experiments. Learning Meaningful Representations of Life workshop at NeurIPS 2020.
Yuan et al. (2020). CellBox: interpretable machine learning for perturbation biology with application to the design of cancer combination therapy. Cell Systems.
Zhou and Troyanskaya (2015). Predicting effects of noncoding variants with deep learning-based sequence model. Nature Methods 12(10), pp. 931–934.
Zou and Conzen (2005). A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics 21(1).