Identifying efficient controls of complex interaction networks using genetic algorithms

by Victor-Bogdan Popescu, et al.
Turun yliopisto

Control theory has recently seen impactful applications in network science, especially in connection with network medicine. A key research topic is finding minimal external interventions that offer control over the dynamics of a given network, a problem known as network controllability. We propose in this article a new solution to this problem based on genetic algorithms. We tailor our solution to applications in computational drug repurposing, seeking to maximise its use of FDA-approved drug targets in a given disease-specific protein-protein interaction network. We show how our algorithm identifies a number of potentially efficient drugs for breast, ovarian, and pancreatic cancer. We demonstrate our algorithm on several benchmark networks from cancer medicine, social networks, and electronic circuits, and on several random networks with their edges distributed according to the Erdős–Rényi, small-world, and scale-free models. Overall, we show that our new algorithm is more efficient in identifying relevant drug targets in a disease network, advancing the computational solutions needed for new therapeutic and drug repurposing approaches.




1 Introduction

Network modelling in systems medicine has emerged as a powerful analytics approach in the last couple of decades ([37, 62, 70]). Its aim is to analyse diseases and drug interventions as ways of acting, and re-acting, over bio-medical dynamical networks ([3, 22, 82]), such as the protein-protein interaction networks ([13, 36]), signalling networks ([57]), metabolic networks ([51]), and immunological responses ([15]). In this framework, a disease is seen as emerging from some of its modules being affected (directly or through cascading signals) and from critical nodes in the network being deregulated ([41]). Similarly, drug therapies are seen as outside controlled interventions within a deregulated network with the aim of either re-balancing the system or possibly isolating some specific components of the network ([7]). A particular advantage of this approach is reasoning about multiple-drug interventions, analysing and predicting multi-drug synergies, as well as aiming for personalised therapies. The current “state” of a patient can be reflected in its personalised network ([70]), by integrating elements specific to the disease, to treatment pathways, and to the patient herself, such as genetic mutations and current medical conditions and treatments.

Instead of acting on each individual deregulated component, one can try to influence several of these entities through a few well-chosen interventions, and have their effects spread in cascade through the network using the network’s own internal interconnections. Network controllability is a topic of high relevance in this area, with a rich theory to support it ([35]). In recent years it has found powerful applications in computational systems medicine and therapeutics ([13, 21, 26, 36, 42, 43, 79]).

The theory of network controllability aims at providing a sound and theoretically accurate description of what control means within a network, and how it can be achieved. Intuitively, achieving control over a system from a set of input nodes means being able to drive that system from any initial setup to any desired state. This is intrinsically an optimisation problem, with the objective of minimising the number of input nodes (e.g., drug targets) needed for the control. Additional constraints may be added depending on the application, such as requiring the control pathways from the input nodes to the controlled nodes to be short, or the input nodes to be primarily selected from a given set of preferred nodes (targets of standard therapy drugs). This leads to several problem variations, such as: structural controllability ([43]), i.e., identifying pathways that offer control over the system regardless of its numerical setup; target controllability ([21]), i.e., achieving control over a predefined set of target nodes; minimum dominating sets ([52]), i.e., finding a minimal set of nodes that are one step upstream of all other nodes in the network. Some of these optimisation problems are known to have efficient algorithmic solutions ([43]). Others, on the contrary, are known to be computationally difficult, yet efficient approximate solutions are still achievable ([13]).

Motivated by the applicability of network control in systems medicine, the problem we focus on in this paper is minimising the number of external interventions needed to achieve target control of a system. We are particularly interested in the case where the targets are disease-specific survivability-essential genes, key targets for synthetic lethality ([60]). We identify control interventions that are achievable through the delivery of FDA-approved drugs, by giving preference to FDA-approved drug targets when selecting input nodes. The target controllability problem is known to be NP-hard, meaning that finding the smallest set of inputs for controlling the target set is computationally prohibitive for large networks. As a solution, we give an approximation of the minimal solution based on genetic algorithms, a well-known heuristic choice for nonlinear optimisation problems ([73]). We demonstrate that this approach offers an efficient solution for applications in combinatorial drug therapy identification and drug repurposing.

2 Materials and methods

2.1 Network controllability

We briefly introduce the basic concepts of network controllability and the Kalman condition for the target controllability problem. For more details we refer to [13]. By convention, all vectors are considered to be column vectors so that the matrix-vector multiplications are well defined.

Let $A$ be an $n \times n$ real matrix, for some $n \ge 1$. The linear dynamical system defined by matrix $A$ is an $n$-dimensional vector of real functions, $x: \mathbb{R} \to \mathbb{R}^n$, defined as the solution of the system of ordinary differential equations

$$\frac{dx(t)}{dt} = A x(t), \qquad (1)$$

for some given $x(0) = x_0 \in \mathbb{R}^n$. The structure of a linear dynamical system can be thought of as an edge-labeled directed graph with $n$ vertices and adjacency matrix $A$: for any nodes $i$, $j$, $1 \le i, j \le n$, the (directed) edge $(j, i)$ documents node $i$ being influenced by node $j$, with the weight of the influence (documented as the label of edge $(j, i)$) given by the entry $a_{i,j}$ of matrix $A$.

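We consider a subset of input nodes $I = \{i_1, \dots, i_m\}$, $1 \le m \le n$, $1 \le i_1 < \dots < i_m \le n$, thought of as the nodes of the linear dynamical system on which an external contribution can be applied to influence the dynamics of the system. The subset of input nodes can also be described through its characteristic matrix $B_I \in \mathbb{R}^{n \times m}$, defined as follows: $B_I(r, s) = 1$ if $r = i_s$ and $B_I(r, s) = 0$ otherwise, for all $1 \le r \le n$ and $1 \le s \le m$. The external influence is exerted through an $m$-dimensional input vector of real functions, $u: \mathbb{R} \to \mathbb{R}^m$. The influence of the input vector $u$ on the linear dynamical system is described by the equation

$$\frac{dx(t)}{dt} = A x(t) + B_I u(t). \qquad (2)$$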

We also consider a subset of target nodes $T = \{t_1, \dots, t_l\}$, $1 \le l \le n$, $1 \le t_1 < \dots < t_l \le n$, thought of as a subset of the nodes of the linear dynamical system whose dynamics we aim to control (as defined below) through a suitable choice of input nodes and of an input vector. The subset of target nodes can also be defined through its characteristic matrix $C_T \in \mathbb{R}^{l \times n}$, defined as follows: $C_T(r, s) = 1$ if $t_r = s$ and $C_T(r, s) = 0$ otherwise, for all $1 \le r \le l$ and $1 \le s \le n$.

The triplet $(A, I, T)$ is called the targeted linear dynamical system with inputs, defined by matrix $A$, input set $I$ and target set $T$. We say that this system is target controllable if for any $x_0 \in \mathbb{R}^n$ and any $\alpha \in \mathbb{R}^l$, there is an input vector $u$ such that the solution $\tilde{x}$ of (2) eventually coincides with $\alpha$ on its $T$-components, i.e., $C_T \tilde{x}(\tau) = \alpha$, for some $\tau \ge 0$. Intuitively, the system being target controllable means that for any initial state $x_0$ and any desired final state $\alpha$ of the target nodes, there is a suitable input vector $u$ driving the target nodes to $\alpha$. Obviously, the input vector $u$ depends on $x_0$ and $\alpha$.

We illustrate in Figure 1 the structural setup of the target controllability problem.

Figure 1: The structural setup of the target controllability problem. In green: input nodes. In red: target nodes. The control paths are indicated with thicker arrows.

The question whether a given targeted linear dynamical system with inputs is controllable has an elegant algebraic answer known as the Kalman condition.

Theorem 2.1 ([35]).

A targeted linear dynamical system with inputs $(A, I, T)$ is controllable if and only if its controllability matrix $[C_T B_I, C_T A B_I, C_T A^2 B_I, \dots, C_T A^{n-1} B_I]$ is of full rank.

The controllability matrix of the targeted linear dynamical system with inputs $(A, I, T)$ is an $l \times nm$ matrix, meaning that being of full rank is equivalent to its rank being equal to $l$ (since $l \le nm$). Intuitively, this matrix describes all weighted paths from the input nodes to the target nodes in the directed graph associated with the linear dynamical system described by matrix $A$. This line of thought can be further developed into a structural formulation of the targeted controllability and into a graph-based solution for it, see [13].
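The Kalman condition can be checked numerically for a concrete weight assignment. The following is a minimal sketch, assuming NumPy, 0-based node indices, and a matrix `A` whose entry `A[i, j]` is the weight with which node `j` influences node `i`; the function names are illustrative and are not taken from the authors' software.

```python
import numpy as np

def target_controllability_matrix(A, inputs, targets):
    """Build [C_T B_I, C_T A B_I, ..., C_T A^(n-1) B_I] for 0-based node indices."""
    n = A.shape[0]
    B = np.zeros((n, len(inputs)))       # characteristic matrix of the input set
    for s, i in enumerate(inputs):
        B[i, s] = 1.0
    C = np.zeros((len(targets), n))      # characteristic matrix of the target set
    for r, t in enumerate(targets):
        C[r, t] = 1.0
    blocks = []
    power = B.copy()                     # holds A^k B_I at iteration k
    for _ in range(n):
        blocks.append(C @ power)
        power = A @ power
    return np.hstack(blocks)

def is_target_controllable(A, inputs, targets):
    """Kalman condition: the matrix has full rank, i.e. rank equal to |T|."""
    K = target_controllability_matrix(A, inputs, targets)
    return np.linalg.matrix_rank(K) == len(targets)

# Chain 0 -> 1 -> 2: node 0 alone controls targets {1, 2}.
chain = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
# Star 0 -> 1, 0 -> 2: node 0 alone cannot control both leaves.
star = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0]])
print(is_target_controllable(chain, [0], [1, 2]))
print(is_target_controllable(star, [0], [1, 2]))
```

For large networks the explicit matrix powers grow quickly and the numerical rank becomes sensitive to rounding, so this direct form is best read as an illustration of the condition rather than as a scalable implementation.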

The problem we focus on and solve in this paper is that of minimising the set of input nodes needed for the target controllability of a dynamical system. For the linear dynamical system defined by a matrix $A$ and a set of target nodes $T$ of size $l$, the problem is to find the smallest $m$, $m \le l$, such that for a suitable input set $I$ of size $m$, the targeted linear dynamical system with inputs $(A, I, T)$ is target controllable.

We add an extra layer of optimisation to the target controllability problem, motivated by medicine as our application domain. In medical applications, the input functions mimic the effect of drug delivery, with the input nodes being targets of commercially available drugs. Consequently, we introduce in our mathematical formulation an additional set of so-called preferred nodes $P \subseteq \{1, \dots, n\}$, with the aim of selecting in the input set as many preferred nodes as possible. The problem in this case becomes the following. For the linear dynamical system defined by a matrix $A$, a set of target nodes $T$ and a set of preferred nodes $P$, the problem is to find a smallest-sized input set $I$ whose intersection with $P$ is maximal, such that the targeted linear dynamical system with inputs $(A, I, T)$ is controllable, i.e., such that matrix $[C_T B_I, C_T A B_I, \dots, C_T A^{n-1} B_I]$ is of full rank.

The optimisation version of the above-defined target controllability problem has been shown to be NP-hard in [36], meaning that an exact solution may require a prohibitive amount of time, exponential in the number of nodes. We focus in this paper on the next most-feasible objective, an efficient and effective heuristic solution that comes, however, with no guarantee of being optimal.

2.2 The outline of the genetic algorithm

The algorithm takes as input a network given as a directed graph $G = (V, E)$ and a list of target nodes $T = \{t_1, \dots, t_l\}$, $T \subseteq V$. We denote the graph’s adjacency matrix by $A$. The algorithm gives as a result a set of input nodes $I$ controlling the set $T$, with the objective being to minimise the size of $I$. The algorithm can also take as an additional, optional input a set $P \subseteq V$ of so-called preferred nodes. In this case, the algorithm will aim for a double optimisation objective: minimise the set $I$, while maximising the number of elements from $P$ included in $I$. Our typical application scenario will be that of a network consisting of protein-protein interactions specific to a disease mechanism of interest, with the set of targets being a disease-specific set of essential genes, and the set of preferred nodes a set of proteins targetable by available drugs or by specially designed compounds (e.g., inhibitors, small silencing molecules, etc.). The terminology we use to describe the algorithm, e.g., population/chromosome/crossover/mutation/fitness, is standard in the genetic algorithm literature and refers to its conventions, rather than being suggestive of specifics in molecular biology.

Our algorithm will start by generating several solutions to the control problem, in the form of several control sets – we discuss in Section 2.3 how this is achieved. Each such solution is encoded as a “chromosome”, i.e., as a vector of (not necessarily distinct) “genes” $(g_1, \dots, g_l)$, where for all $1 \le i \le l$, $g_i$ controls the target node $t_i$. In particular, $g_i$ is an ancestor of $t_i$ in graph $G$, for all $1 \le i \le l$.

A set of chromosomes is called a population. Note that a chromosome will always encode a solution to our optimisation problem, throughout the iterative run of the algorithm. Any population maintained by the algorithm consists of several such chromosomes, some better than others from the point of view of our optimisation criteria, but all valid solutions to the target controllability problem to be solved.

The algorithm iteratively generates successive populations (sets of chromosomes) that get better at the optimisation it aims to solve: the size of the control set gets smaller and the proportion of preferred nodes in the control set gets higher. The algorithm stops after a maximum number of iterations, or after a number of iterations in which the quality of the solution does not get improved. This pre-defined stop is necessary since the target controllability problem is known to be NP-hard and so, finding the optimal solution can require a prohibitively high number of steps, potentially exponential in the number of nodes in the network. The end result consists of several solutions to the problem, represented by all the control sets in the final population obtained by the algorithm.

The initial population of solutions is randomly generated in such a way that each element selected for it is indeed a solution to the target controllability problem. To generate the next generation/population from the current one, we use three techniques.

  • Retain in the population the best solutions (from the point of view of the optimisation problem to be solved). “Elitism” will be used to conserve the best solutions (discussed in detail in the next sections).

  • Add random chromosomes (all being valid solutions to the optimisation problem, albeit potentially of lower fitness score than some of the others in the population).

  • Generate new solutions/chromosomes resulting from combinations of those in the current population. A selection operator is used to choose the chromosomes which will produce offspring for the following generation. New chromosomes are produced using crossover and mutation (discussed in detail in the next sections).

A list of all the parameters used by the genetic algorithm can be found in Table 1. The basic outline of the proposed genetic algorithm is described below. All operators will be detailed in the following subsections.

Parameter Meaning Type Default value
Total number of generations for which the algorithm will run
Total number of chromosomes in a generation
Probability of mutation for a chromosome
Maximum percentage of elites in a generation
Percentage of randomly generated chromosomes in a generation
Maximum number of interactions in a control path
Maximum number of randomly generated genes in a chromosome
Table 1: The parameters used by the genetic algorithm. N represents the maximum number of generations for which the algorithm will run. Additionally, the algorithm will stop after a set number of generations with no improvement in the fitness of the best chromosomes. Higher values improve the chances of getting better results, but increase the overall running time. n represents the total number of chromosomes in a generation. This includes randomly generated chromosomes, elite chromosomes of the previous generation, as well as offspring of the chromosomes in the previous generation. Higher values result in a larger population, thus increasing the gene pool; however, they might spread the search too widely, bringing the process closer to a random search. The mutation probability is defined for an entire chromosome, and not for a particular gene. Increasing its value helps with the exploration of the solution space, but too high a value brings the process closer to a random search. A value of 0 deactivates the mutation operator. The elitism percentage is the maximum percentage of elite chromosomes in a population. Higher values increase the number of chromosomes with high fitness score preserved over the generations, but narrow the explored solution space. A value of 0 deactivates elitism. The randomness percentage is the percentage of new randomly generated chromosomes in a population. Higher values increase the exploration of the solution space, but also bring the process closer to a random search. A value of 0 deactivates this feature.
  1. Generate the initial population. We set the generation index to 0 for the first generation. We initialise the first population with n randomly generated chromosomes.

  2. Preserve the fittest chromosomes. We evaluate the fitness of all chromosomes in the current population. We add to the next population the chromosomes of the current generation with the highest fitness scores, as many as allowed by the ‘elitism’ parameter. If several chromosomes of equal fitness are being considered, the ones to be added are chosen randomly.

  3. Add random chromosomes. We add new randomly generated chromosomes to the next population, as many as indicated by the ‘randomness’ parameter.

  4. Add the offspring of the current population. We apply the selection operator twice on the current population, obtaining two of its chromosomes, each selected randomly with a probability proportional to its fitness score. On the two chromosomes thus selected we apply the crossover operator, obtaining an offspring to be added to the next population. The offspring is added in a mutated form with the probability given by the mutation parameter. We continue applying this step until the number of chromosomes in the next population becomes n.

  5. Iterate. If the current generation index is smaller than the maximum number of generations N, then we increment it by one and continue with Step 2.

  6. Output. We return the fittest chromosomes in the current generation as solutions to the problem and we stop the algorithm.
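The six steps above can be sketched as a generic loop. This is a minimal illustration rather than the authors' C# implementation: the function name, the parameter names and their default values are all assumptions, and the four operators are passed in as callables so that the loop itself stays independent of how chromosomes are encoded or validated.

```python
import random

def run_ga(random_chromosome, fitness, crossover, mutate, n=20,
           max_generations=200, patience=50, elite_fraction=0.25,
           random_fraction=0.25, mutation_probability=0.25, seed=0):
    """Generic loop for Steps 1-6; every operator is supplied by the caller."""
    rng = random.Random(seed)
    # Step 1: initial population of n random (valid) chromosomes.
    population = [random_chromosome(rng) for _ in range(n)]
    best, stale = max(fitness(c) for c in population), 0
    for _ in range(max_generations):
        # Step 2: elitism. Shuffling before the stable sort breaks ties randomly.
        ranked = population[:]
        rng.shuffle(ranked)
        ranked.sort(key=fitness, reverse=True)
        next_population = ranked[:int(elite_fraction * n)]
        # Step 3: fresh random chromosomes.
        next_population += [random_chromosome(rng)
                            for _ in range(int(random_fraction * n))]
        # Step 4: offspring of parents selected proportionally to fitness.
        weights = [fitness(c) for c in population]
        while len(next_population) < n:
            parent_a, parent_b = rng.choices(population, weights=weights, k=2)
            child = crossover(parent_a, parent_b, rng)
            if child is None:          # invalid offspring: draw new parents
                continue
            if rng.random() < mutation_probability:
                child = mutate(child, rng)
            next_population.append(child)
        population = next_population
        # Step 5: iterate, stopping early after `patience` stale generations.
        current = max(fitness(c) for c in population)
        best, stale = (current, 0) if current > best else (best, stale + 1)
        if stale >= patience:
            break
    # Step 6: return the fittest chromosomes of the final generation.
    top = max(fitness(c) for c in population)
    return [c for c in population if fitness(c) == top]

# Toy problem (purely illustrative): 6 genes drawn from 3 symbols,
# scored higher when fewer distinct symbols are used.
toy = run_ga(
    random_chromosome=lambda rng: [rng.randrange(3) for _ in range(6)],
    fitness=lambda c: len(c) - len(set(c)) + 1,
    crossover=lambda a, b, rng: [rng.choice(pair) for pair in zip(a, b)],
    mutate=lambda c, rng: [rng.randrange(3) if rng.random() < 0.2 else g
                           for g in c],
)
print(toy[0])
```

The sketch assumes strictly positive fitness values (needed for the weighted parent selection) and a crossover that eventually returns a valid offspring; a production version would bound the number of retries in Step 4.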

2.3 Chromosome encoding and the fitness function

A chromosome consists of a vector of genes $(g_1, \dots, g_l)$, not necessarily of distinct value, where $l$ is the size of the target set $T = \{t_1, \dots, t_l\}$. As discussed before, for all $1 \le i \le l$, $g_i$ controls node $t_i$ and so, in particular, it is an ancestor of $t_i$ in graph $G$. Keeping with our focus on applications in medicine, where paths encode signalling networks, we aim for short paths between the input nodes and the target nodes. The maximum allowed length for such a path is encoded in the maximum path length parameter (Table 1). Any time we discuss an ancestor $g_i$ of node $t_i$ we mean implicitly that the shortest path from $g_i$ to $t_i$ is of length at most this maximum.

A chromosome is always checked to ensure that the nodes encoded by its genes are able to control the target set $T$. This is equivalent to the Kalman matrix corresponding to graph $G$, input set $\{g_1, \dots, g_l\}$ and target set $T$ having maximum rank $l$. All populations, throughout all steps of the algorithm, will consist only of such chromosomes.

The fitness score of a chromosome is defined as the complement of the number of distinct nodes encoded by its genes:


Thus, . Considering that we are interested in the smallest possible number of input nodes, the higher the fitness score of a chromosome, the better its encoded solution is.

To generate a random chromosome, we first initialise each of its genes $g_i$, $1 \le i \le l$, with its corresponding target node $t_i$. Then, for a number of randomly selected genes $g_i$ (as many as indicated by the parameter giving the maximum number of randomly generated genes), we replace gene $g_i$ with a randomly chosen ancestor of $t_i$ (at distance at most the maximum path length from $t_i$). Each vector of nodes generated in this way is checked for its Kalman condition: if satisfied, the vector encodes a set of nodes controlling the target set and it is accepted as a valid output.
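The encoding, the fitness score and the random generation procedure can be sketched as follows. Several things here are assumptions and not taken from the source: the graph is given as a dictionary `pred` mapping each node to the list of its direct predecessors, the validity check (the Kalman condition) is passed in as a callable `is_valid`, a node counts as its own ancestor at distance 0, the retry limit `max_tries` is invented for the sketch, and the fitness is taken to be `l - d + 1` for a chromosome of `l` genes with `d` distinct nodes. That last form is only one plausible reading of "the complement of the number of distinct nodes" that stays strictly positive; the paper's exact formula may differ.

```python
import random
from collections import deque

def ancestors_within(pred, node, max_len):
    """Nodes with a directed path of length 0..max_len to `node` (node included)."""
    distance = {node: 0}
    queue = deque([node])
    while queue:                      # breadth-first search along reversed edges
        v = queue.popleft()
        if distance[v] == max_len:
            continue
        for u in pred.get(v, ()):
            if u not in distance:
                distance[u] = distance[v] + 1
                queue.append(u)
    return sorted(distance)

def random_chromosome(pred, targets, max_len, max_random_genes, is_valid, rng,
                      max_tries=1000):
    """Start from gene i = target i, then randomise some genes; keep if valid."""
    for _ in range(max_tries):
        genes = list(targets)
        k = min(max_random_genes, len(genes))
        for i in rng.sample(range(len(genes)), k):
            genes[i] = rng.choice(ancestors_within(pred, targets[i], max_len))
        if is_valid(genes):
            return genes
    return list(targets)              # each target controlling itself is valid

def fitness(chromosome):
    """Assumed form: more repeated genes (fewer distinct inputs) scores higher."""
    return len(chromosome) - len(set(chromosome)) + 1

# Chain 0 -> 1 -> 2, described by direct predecessors.
pred = {1: [0], 2: [1]}
print(ancestors_within(pred, 2, 1))
print(fitness([0, 0, 1]))
```

The fallback to the target set itself relies on the fact that using every target as its own input always satisfies the Kalman condition.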

2.4 Selection

The selection operator is used to choose which chromosomes in the current generation will contribute offspring to the next generation. The selection of a chromosome depends only on its fitness in relation to the average fitness of the current generation: the better the fitness, the higher the chance it has to be selected. If the current population consists of chromosomes $C_1, \dots, C_n$, then the probability of selecting chromosome $C_i$, $1 \le i \le n$, is

$$p(C_i) = \frac{f(C_i)}{\sum_{j=1}^{n} f(C_j)},$$

where $f$ denotes the fitness score. Obviously, $p(C_i) > 0$ for any $1 \le i \le n$, so all chromosomes have a chance of being selected.
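Fitness-proportional ("roulette wheel") selection of this kind takes only a few lines with the Python standard library. The sketch below assumes strictly positive fitness values; the function names are illustrative.

```python
import random

def selection_probabilities(fitnesses):
    """Probability of each chromosome: its fitness divided by the total fitness."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def select(population, fitnesses, rng):
    """Draw one chromosome with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

rng = random.Random(0)
population = [["a", "a", "b"], ["a", "b", "c"]]
scores = [2, 1]
print(selection_probabilities(scores))
print(select(population, scores, rng))
```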

2.5 Crossover

The crossover operator is used to produce a new (valid) “offspring” chromosome from two “parent” chromosomes. Each of the offspring chromosome’s genes will be directly inherited from one of the two parent chromosomes. The actual parent that will contribute a certain gene is randomly chosen based on the number of occurrences of that gene in the two parent chromosomes: the more often it occurs (in other words, the more efficient its control over the target set), the higher the probability it will be selected for the offspring. For a chromosome $C = (g_1, \dots, g_l)$ and a gene $g$, we define the number of occurrences of $g$ in $C$ to be $|\{i \mid 1 \le i \le l,\ g_i = g\}|$. Also, for two chromosomes $C'$, $C''$, the number of occurrences of $g$ in $C'$, $C''$ is defined to be the sum of its numbers of occurrences in $C'$ and in $C''$.

Consider the two parent chromosomes to be $C' = (g'_1, \dots, g'_l)$ and $C'' = (g''_1, \dots, g''_l)$. For all $1 \le i \le l$, gene $g_i$ of the offspring will be either $g'_i$ or $g''_i$. If the genes $g'_i$ and $g''_i$ are either both preferred (i.e., elements of $P$), or they are both un-preferred, then the probability distribution is defined to be


On the other hand, if one of the two parent genes, say $g'_i$, is a preferred node, while the other is not, we reflect our preference for nodes in $P$ in the selection probability in the following way


If the set of nodes obtained as a result of the selection above does not satisfy the Kalman condition, then we do not accept it as a valid solution and we discard it, restarting the crossover operator by selecting two new parent chromosomes. In our numerical experiments, relatively few sets of nodes thus selected failed to satisfy the Kalman condition and this step did not become a bottleneck in our algorithm.
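A crossover of this shape can be sketched as below. The exact probability formulas are not reproduced here, so the weights are assumptions: each parent gene is weighted by its number of occurrences across both parents, and that weight is doubled when the gene is preferred and the competing gene is not (the factor 2 is borrowed from the weighting described for the mutation operator, not from the crossover definition). The validity check is passed in as a hypothetical callable `is_valid`.

```python
import random
from collections import Counter

def crossover(parent_a, parent_b, preferred, is_valid, rng):
    """Build one offspring gene by gene; return None if it fails validation."""
    occurrences = Counter(parent_a) + Counter(parent_b)
    child = []
    for a, b in zip(parent_a, parent_b):
        # Assumed weighting: occurrence count, doubled for a preferred gene
        # competing against a non-preferred one.
        weight_a = occurrences[a] * (2 if a in preferred and b not in preferred else 1)
        weight_b = occurrences[b] * (2 if b in preferred and a not in preferred else 1)
        child.append(rng.choices([a, b], weights=[weight_a, weight_b], k=1)[0])
    # Invalid offspring are discarded; the caller restarts with new parents.
    return child if is_valid(child) else None

rng = random.Random(0)
child = crossover([1, 1, 2], [3, 1, 4], preferred={1},
                  is_valid=lambda genes: True, rng=rng)
print(child)
```

Returning `None` for an invalid offspring leaves the restart policy (selecting two new parents) to the calling loop, matching the behaviour described in the text.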

2.6 Mutation

The mutation operator is used to change the values of a small number of genes in a chromosome. The probability for a gene of a chromosome to be selected for mutation is given by the mutation probability parameter. Thus, on average, each newly generated offspring chromosome will have a number of mutated genes. Each gene $g_i$, $1 \le i \le l$, represents an ancestor of its corresponding target node $t_i$, so getting mutated corresponds to replacing it with another ancestor of $t_i$; the option of getting $g_i$ again is allowed. The new ancestor is selected randomly from the set of predecessors of $t_i$, with those being preferred nodes having double the weight.

If the newly obtained chromosome is not valid according to the Kalman condition, then we repeat the process with the same genes selected for mutation.
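The mutation step can be sketched as follows. The names are illustrative: `candidates[i]` is assumed to be a precomputed list of admissible ancestors of target `i`, `is_valid` stands for the Kalman check, and the retry limit `max_tries` is an assumption added so that the sketch always terminates.

```python
import random

def mutate(chromosome, candidates, preferred, gene_probability, is_valid, rng,
           max_tries=100):
    """Resample selected genes among their target's ancestors until valid."""
    # The positions to mutate are chosen once and reused on every retry.
    positions = [i for i in range(len(chromosome))
                 if rng.random() < gene_probability]
    for _ in range(max_tries):
        mutant = list(chromosome)
        for i in positions:
            options = candidates[i]
            # Preferred nodes get double the weight of the other ancestors.
            weights = [2 if node in preferred else 1 for node in options]
            mutant[i] = rng.choices(options, weights=weights, k=1)[0]
        if is_valid(mutant):
            return mutant
    return list(chromosome)           # give up and keep the unmutated offspring

rng = random.Random(0)
mutant = mutate([5, 6], candidates=[[5, 7, 8], [6, 9]], preferred={7},
                gene_probability=0.5, is_valid=lambda genes: True, rng=rng)
print(mutant)
```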

2.7 Implementation

The proposed algorithm has been implemented as a cross-platform stand-alone desktop application written in C# / .NET Core and usable within a command-line interface or under a browser-based graphical interface. The source code and the latest release are available at [59].

For each run, the software requires several files – one containing the list of directed edges (by default, each edge on a separate row, containing semicolon-separated source and target nodes), the list of target nodes (by default, each node on a separate row) and the set of parameters in Table 1 (by default, as a JSON file). The required format for each of these files is presented in the supplementary information. In addition, a file containing the list of preferred nodes (by default, each node on a separate row) can also be given as an optional input. For the command-line interface, the input data is provided as paths to the corresponding input files. For the graphical user interface, the same options are available, with the added possibility to directly type in or edit the data within the corresponding text fields of the interface. Both cases return the same type of output data, a JSON file containing all of the relevant information of the algorithm run, such as details about the input data, the used parameters, the time elapsed and the control nodes corresponding to each of the identified solutions.
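As an illustration of the default edge-list format described above (one edge per row, with semicolon-separated source and target nodes), a reader could look like the sketch below. It is written in Python for illustration only and is not part of the released software; the function name and the choice of a predecessor dictionary as the in-memory representation are assumptions.

```python
def parse_edges(text, separator=";"):
    """Map every node to the list of its direct predecessors."""
    predecessors = {}
    for row in text.splitlines():
        row = row.strip()
        if not row:                   # skip empty rows
            continue
        source, target = (part.strip() for part in row.split(separator))
        predecessors.setdefault(source, [])
        predecessors.setdefault(target, []).append(source)
    return predecessors

example = "A;B\nB;C\n"
print(parse_edges(example))
```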

All matrix operations use the Math.NET Numerics library [47]. We parallelised the execution of the most used methods, such as chromosome initialisation or crossover. In turn, this required adapting the default pseudo-random number generator into a thread-safe version by using a thread-safe collection of seeds based on the initial random seed.

The graphical interface of the program can be seen in Figure 2. Further details on the implementation and usage can be found in the supplementary information and in the GitHub repository ([59]).

Figure 2: The graphical user interface of the program. Top left: The start page. Top right: The form to create a new analysis. Bottom left: The dashboard containing the list of analyses. Bottom right: The details of an analysis.

3 Results

3.1 Benchmark data

We applied the algorithm on several real-world and randomly generated complex networks of widely varying sizes. An overview of the data sets is presented in Table 2. We used the breast, pancreatic, and ovarian cancer cell line-specific protein-protein interaction networks documented in [36]. We also used the breast, pancreatic, and ovarian cancer networks of [38]. We also considered several social interaction networks and electronic circuit networks documented in [50] and [49]. Finally, we generated several random graphs with the number of nodes ranging from 100 to 3000 and the edges distributed according to the Erdős–Rényi, scale-free, and small-world graph models, all of them generated using the Python networkx library ([72]). All networks are available as supplementary information.

Type Network Reference Nodes Edges
Protein-protein interaction Breast DEF [36]
Protein-protein interaction Breast HCC1428 [38]
Protein-protein interaction Breast MDA-MB-361 [38]
Protein-protein interaction Ovarian DEF [36]
Protein-protein interaction Ovarian O1946 [38]
Protein-protein interaction Ovarian OVCA8 [38]
Protein-protein interaction Pancreatic AsPC-1 [38]
Protein-protein interaction Pancreatic DEF [36]
Protein-protein interaction Pancreatic KP-3 [38]
Protein-protein interaction SIGNOR BrOvPa DEF [36]
Social Social Interaction 1 [49]
Social Social Interaction 3 [49]
Electronic circuit Electronic circuit 208 [50]
Electronic circuit Electronic circuit 420 [50]
Electronic circuit Electronic circuit 838 [50]
Erdős–Rényi Erdos-Renyi 100 *
Erdős–Rényi Erdos-Renyi 500 *
Erdős–Rényi Erdos-Renyi 1000 *
Erdős–Rényi Erdos-Renyi 1500 *
Erdős–Rényi Erdos-Renyi 2000 *
Erdős–Rényi Erdos-Renyi 2500 *
Erdős–Rényi Erdos-Renyi 3000 *
Scale-free Scale Free 100 **
Scale-free Scale Free 500 **
Scale-free Scale Free 1000 **
Scale-free Scale Free 1500 **
Scale-free Scale Free 2000 **
Scale-free Scale Free 2500 **
Scale-free Scale Free 3000 **
Small world Small World 100 ***
Small world Small World 500 ***
Small world Small World 1000 ***
Small world Small World 1500 ***
Small world Small World 2000 ***
Small world Small World 2500 ***
Small world Small World 3000 ***
Table 2: The data sets used for testing the algorithm ([59]). * Generated in Python, using networkx.generators.random_graphs.fast_gnp_random_graph with . ** Generated in Python, using networkx.generators.directed.scale_free_graph with the default parameters. *** Generated in Python, using networkx.generators.random_graphs.watts_strogatz_graph with and . All isolated nodes were removed from the networks.

For the protein-protein interaction networks we used as target nodes the cancer essential genes specific to each cell line, based on [38]. For the other networks, as target nodes we chose the top nodes with highest degree.

3.2 The comparison setup

All runs were executed with the same default values for the parameters of the algorithm, as detailed in Table 1. The algorithm was stopped after 1,000 generations with no improvement in the fitness of the best solution, up to a maximum of 10,000 generations. The results are presented in Table 3 and in the supplementary data.

Network N E T Ige Igc Igr
Breast DEF
Breast HCC1428
Breast MDA-MB-361
Ovarian DEF
Ovarian O1946
Ovarian OVCA8
Pancreatic AsPC-1
Pancreatic DEF
Pancreatic KP-3
Social Interaction 1
Social Interaction 3
Electronic Circuit 208
Electronic Circuit 420
Electronic Circuit 838
Erdos-Renyi 100 *
Erdos-Renyi 500 *
Erdos-Renyi 1000
Erdos-Renyi 1500
Erdos-Renyi 2000
Erdos-Renyi 2500
Erdos-Renyi 3000
Scale Free 100
Scale Free 500
Scale Free 1000
Scale Free 1500
Scale Free 2000
Scale Free 2500
Scale Free 3000
Small World 100
Small World 500
Small World 1000
Small World 1500
Small World 2000
Small World 2500
Small World 3000
Table 3: The results of the algorithm. N: the number of nodes in the network; E: the number of edges in the network; T: the number of target nodes in the network; I: the smallest number of input nodes found for controlling the target set by the respective algorithm, i.e., Ige for the genetic algorithm, Igc for the constrained greedy algorithm, and Igr for the general (unconstrained) greedy algorithm. * The nodes without any inbound or outbound edge are not counted.

In addition, we applied the algorithm one more time on the protein-protein interaction networks, considering as preferred nodes the FDA approved drug-targets ([75]) existent in the network. The results are presented in Table 4 and in the supplementary data.

Network N E T P Ige IPge Igc IPgc Igr IPgr
Breast DEF (drug)
Breast HCC1428 (drug)
Breast MDA-MB-361 (drug)
Ovarian DEF (drug)
Ovarian O1946 (drug)
Ovarian OVCA8 (drug)
Pancreatic AsPC-1 (drug)
Pancreatic DEF (drug)
Pancreatic KP-3 (drug)
SIGNOR BrOvPa DEF (drug)
Table 4: The results of the algorithm. N: the number of nodes in the network; E: the number of edges in the network; T: the number of target nodes in the network; P: the number of preferred (i.e., drug-targetable) nodes in the network; I: the smallest number of input nodes found for controlling the target set; IP: the number of preferred input nodes in the best solution; ge: the results for the genetic algorithm; gc: the results for the constrained greedy algorithm; gr: the results for the general (unconstrained) greedy algorithm.

We compared the results of our genetic algorithm to the results of the greedy algorithm described in [13], applied on the same data sets. To make the comparison possible, we limited both algorithms to running for a maximum of 10,000 total iterations (translated to 10,000 generations for the genetic algorithm), stopping if there was no improvement in the best result over the past 1,000 iterations / generations. To investigate the effect of the limited length pathways in our algorithm, we ran the greedy algorithm in two different settings: with the control path’s length upper bounded by the same parameter as in the genetic algorithm, and with it unconstrained.

The three algorithms work quite differently, not only in their inner logic for optimising the objective, but also in their output. The genetic algorithm maintains in each step of its search a family of solutions, some better than others (from the point of view of the optimisation problem to be solved), but all valid solutions in terms of controlling the given set of targets. Thus, each run of the genetic algorithm offers as an output several different solutions. This is in contrast with the greedy algorithms, where only one solution is found in one run and multiple runs have to be done to collect multiple solutions (of variable optimisation quality).

3.3 The comparison results

The first benchmark objective we compared against was the size of the smallest set of input nodes found by each of the three algorithms, with the smallest being the best. The results are presented in Table 3.

We also compared the running time required by the algorithms to complete on each of the benchmark networks and the speed of convergence towards a good solution. The results, reported as running time per solution and convergence speed, are in Figures 3 and 4.

Figure 3: A comparison between the results of the three algorithms: the running time per solution. The data is displayed on logarithmic scale.
Figure 4: The average and best fitness of the chromosomes over the generations in the case of the largest benchmark protein-protein interaction network.

Finally, we compared the ability of the three algorithms to maximise the use of preferred nodes. We did this on the biological networks, with the preferred nodes being the set of FDA-approved drug targets. We also did a literature-based validation of the relevance of the results found by the genetic algorithm in each of the cancer networks. The results are in Tables 4, 5 and 6.

Breast: RET (5), EGFR (6), MTCP1 (5), RAF1 (5), SRC (5), PLRG1 (2)
Ovarian: TGFB1 (5), SRC (5), SET (5), LCK (5), MAPK3 (6), CSNK2A1 (5)
Pancreatic: IGF1R (6), ERBB4 (5), KRAS (6), ABL1 (6), PDPK1 (5), PPP2R1A (5), DUSP7 (5), ERBB2 (5), CSNK2A1 (5), CDK1 (6), MTOR (6)
Table 5: The proteins that control the most target proteins, for each cancer type. In bold letters: proteins that are known to be of significance in the corresponding cancer type. In italic letters: proteins that are known to be of significance in other cancer types. In brackets: number of target proteins controlled by the protein by itself (the sets of target proteins controlled by different proteins might overlap).
Breast Ovarian Pancreatic
ridaforolimus seliciclib
Table 6: The drugs that target the identified control proteins. In bold letters: drugs approved or under investigation for usage in the treatment of the corresponding cancer type. In italic letters: drugs under investigation for treatment of other or unspecified cancer or tumour types.

4 Discussion

We applied the algorithm to a series of networks ranging from a few tens of nodes to several thousand. For each network size, we analysed several network types (real-life, random, scale-free, small-world), each with a varying number of edges and target nodes. The running times ranged from a few seconds on the smaller networks up to several hours on the bigger ones.

Size of the best solutions

In the case of the cancer protein-protein interaction networks, the genetic algorithm returned smaller input sets than both the constrained greedy algorithm and the general greedy algorithm. The comparison can be seen in Figure 5.

Figure 5: A comparison between the results of the three algorithms: the number of input nodes controlling the set of disease-specific essential genes in each of the biological benchmark networks.

In the case of the non-biological networks, the size of the smallest input sets is virtually identical in the three algorithms, even though the sets themselves may be different.

An interesting aspect can be seen in the analyses of the well-connected Erdős-Rényi and small-world random networks, where the genetic algorithm succeeds in finding the optimal solution. To see this, let us consider the case of the largest small-world benchmark network. The maximum path length being set at means that a given input node can control at most nodes in the network (itself, and others, through consecutive edges). Thus, for a set of target nodes, there have to be at least input nodes. The genetic algorithm indeed identifies exactly input nodes, with a maximum length of the control path of . The general greedy algorithm, being allowed to use longer control paths, identified one input node that controls the entire target set, at the cost of an increased maximum path length of and a running time times longer.
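The counting argument in this paragraph amounts to a simple lower bound: if a single input node can control at most k nodes, then controlling T target nodes requires at least ⌈T / k⌉ input nodes. A minimal sketch follows; the function name and the parameter `max_nodes_per_input` are hypothetical, and how k follows from the maximum path length depends on how path length is counted, which this sketch does not assume.

```python
def min_input_nodes(num_targets: int, max_nodes_per_input: int) -> int:
    """Lower bound on the number of input nodes needed to control the targets.

    `max_nodes_per_input` is the (assumed) largest number of nodes that one
    input node can control under the path-length limit.
    """
    if num_targets < 0:
        raise ValueError("num_targets must be non-negative")
    if max_nodes_per_input < 1:
        raise ValueError("max_nodes_per_input must be at least 1")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-num_targets // max_nodes_per_input)
```

A solution whose size equals this bound is optimal under the path-length constraint, which is the sense in which the genetic algorithm's result on this network is described as optimal.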

Running times and convergence speed

The algorithms differed markedly in their running times. The genetic algorithm was the fastest in the random Erdős-Rényi networks, in the random small-world networks, and in some of the real-life non-biological networks. For the biological and the scale-free networks the fastest was the constrained greedy algorithm, with the genetic algorithm the slowest of the three, in general 2-4 times slower than the general greedy algorithm. However, the genetic algorithm provides several solutions simultaneously (in our benchmark tests we set the population size parameter to 80), a key advantage of this method. Because of this, the comparison of the running times per solution shows the genetic algorithm to be the fastest of the three in all except a handful of examples; see Figure 3. Compared to the general greedy algorithm, the genetic algorithm was up to 10,000 times faster per solution, except in the case of two networks (one biological, the other a random scale-free one) where the greedy algorithm seemed to stumble on a solution almost immediately. Compared to the constrained greedy algorithm, the genetic algorithm was 10-5,000 times faster per solution, except in the case of the biological networks (where it was about 2-4 times slower) and the same scale-free random network where the general greedy algorithm also performed unusually well.
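The per-solution normalisation can be illustrated with a short calculation. The timings below are hypothetical and chosen only to show the arithmetic; the sketch also assumes that every member of the final population of 80 counts as one solution, while a greedy run yields a single solution.

```python
def time_per_solution(total_seconds: float, num_solutions: int) -> float:
    """Running time of one run divided by the number of solutions it returns."""
    if num_solutions < 1:
        raise ValueError("num_solutions must be at least 1")
    return total_seconds / num_solutions


# Hypothetical timings, not measurements from the benchmarks.
genetic = time_per_solution(total_seconds=400.0, num_solutions=80)  # population of 80
greedy = time_per_solution(total_seconds=100.0, num_solutions=1)    # one solution per run
speedup = greedy / genetic  # how many times faster per solution the genetic run is
```

In this made-up example the genetic run takes four times longer in total, yet it is twenty times faster per solution, which is the kind of reversal the comparison in Figure 3 reports.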

The setup in which we ran the genetic algorithm was to search thoroughly for a good solution through 10,000 generations. We compared the evolution of the quality of the solutions throughout the generations and we noticed that a good solution (i.e., a solution with the fitness within of the solution obtained after 10,000 generations) is in fact achieved very quickly, typically within a few tens of generations from the start. Figure 4 illustrates the average and best fitness of the chromosomes in each generation of the algorithm on the largest protein-protein interaction network in the data set. This suggests that the genetic algorithm may be applied successfully with a much lower number of generations, perhaps as low as 100, adding a considerable speed-up to it.
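One way to measure this convergence speed from a recorded fitness history is sketched below. The function name and the relative `tolerance` parameter are assumptions made for illustration; the threshold used in the analysis above is not restated here.

```python
from typing import Sequence


def first_generation_within(best_fitness: Sequence[float], tolerance: float) -> int:
    """Index of the first generation whose best fitness is within a relative
    `tolerance` of the best fitness of the final generation.

    `best_fitness[g]` is the best fitness recorded in generation g.
    """
    if not best_fitness:
        raise ValueError("best_fitness must not be empty")
    final = best_fitness[-1]
    for generation, fitness in enumerate(best_fitness):
        if abs(final - fitness) <= tolerance * abs(final):
            return generation
    # Not reached in practice: the final generation always satisfies the test.
    return len(best_fitness) - 1
```

Applied to the per-generation best fitness of a 10,000-generation run, a small returned index would support running the algorithm with far fewer generations.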

Maximising the number of preferred nodes

We applied the three algorithms to the biological networks with the additional optimisation objective of maximising the selection of FDA-approved drug targets (preferred nodes) as input nodes. In all cases, the sets of input nodes returned by the genetic algorithm contained more preferred nodes than the ones returned by the other algorithms (Figure 6), with a running time per solution similar to that of the constrained greedy algorithm and better than that of the general greedy algorithm (Figure 7). Moreover, the percentage of preferred nodes relative to the number of input nodes in the best solution was in general two to four times higher in the case of the genetic algorithm (Figure 8). This led to more control target nodes being controlled by preferred nodes (Figure 9), i.e., to predictions of potentially more efficient drugs. This clearly improves the applicability of the algorithm in the biomedical domain for drug repurposing, an aspect that we discuss next.

Figure 6: The number of preferred (FDA-approved drug targets) nodes in the best solution found by each of the three algorithms.
Figure 7: The runtime for finding preferred (FDA-approved drug targets) nodes in the best solution found by each of the three algorithms. The data is displayed on logarithmic scale.
Figure 8: The percentage of preferred (FDA-approved drug targets) nodes in the best solution found by each of the three algorithms.
Figure 9: The number of essential genes controlled by the preferred (FDA-approved drug targets) nodes in the best solution found by each of the three algorithms.
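The three quantities plotted in Figures 6, 8 and 9 can be computed from a single solution with basic set operations. The sketch below uses hypothetical names and toy data; in particular, `controlled_by` (a mapping from each input node to the target nodes it controls) is an assumed representation of a solution, not the data structure of the actual implementation.

```python
from typing import Dict, Set, Tuple


def preferred_stats(
    input_nodes: Set[str],
    preferred: Set[str],
    controlled_by: Dict[str, Set[str]],
) -> Tuple[int, float, int]:
    """Return (number of preferred input nodes,
               percentage of input nodes that are preferred,
               number of targets controlled by preferred input nodes)."""
    preferred_inputs = input_nodes & preferred
    percentage = 100.0 * len(preferred_inputs) / len(input_nodes) if input_nodes else 0.0
    # Union of the target sets, so overlapping targets are counted once.
    targets = set().union(*(controlled_by.get(n, set()) for n in preferred_inputs))
    return len(preferred_inputs), percentage, len(targets)


# Toy example with made-up node names.
stats = preferred_stats(
    input_nodes={"A", "B", "C", "D"},
    preferred={"A", "C", "X"},
    controlled_by={"A": {"t1", "t2"}, "B": {"t3"}, "C": {"t2", "t4"}, "D": set()},
)
```

Here two of the four input nodes are preferred (50%), and together they control three distinct targets, since `t2` is shared.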

Therapeutically-relevant findings

We analysed in detail the FDA-approved drug targets predicted by our algorithm to control the most essential genes in each of our networks. The results are presented in Table 5. We used the public DrugBank database [75] to find drugs targeting the proteins in Table 5 and known to be used in cancer therapeutics. The results are presented in Table 6.

Out of the seven top controlling proteins for the analysed breast cancer networks, four proteins are known to be of significance in breast cancer proliferation: EGFR ([5, 54, 9]), AURKB ([25]), RAF1 ([40, 48, 6]) and SRC ([58, 53, 32]). These proteins are targeted by several FDA-approved and investigational drugs used in fighting breast cancer (lapatinib, vandetanib, canertinib, varlitinib, KX-01, dasatinib). Additionally, two other drugs targeting the same proteins are investigated for use in treating unspecified cancer types (LErafAON, XL281). Two of the other top controlling proteins, RET and MTCP1, are known to be significant in other types of cancer, such as ovarian cancer ([24]), pancreatic cancer ([16]), prostate cancer ([71]) or brain cancer ([27]). Moreover, there already exists an FDA-approved breast cancer drug targeting these proteins (selpercatinib), as well as several approved and investigational drugs used for treating other cancer types (ponatinib, sorafenib, sunitinib, cabozantinib, amuvatinib). In particular, the drug sorafenib, used for the treatment of kidney and liver cancers, targets two of our top four controlling proteins, which may indicate its potential use in breast cancer as well. Indeed, there are several completed clinical trials researching the drug in the treatment of breast cancer ([10, 55]). Additionally, the drug cabozantinib is also on trial for breast cancer treatment ([46, 14]). Among the top controlling proteins was also PLRG1, for which no drug exists, marking it as a potential drug target for future research.

For the analysed ovarian cancer networks, the algorithm identified six top controlling proteins. Three of these are documented as being of significance in ovarian cancer: TGFB1 ([66, 19]), SRC ([39, 28, 74]) and MAPK3 ([64, 77, 12]), with several drugs targeting these proteins already approved or being under investigation for various types of cancer treatment (dasatinib, XL228, seliciclib, ulixertinib, ponatinib, nintedanib). Furthermore, the other three top controlling proteins, SET, LCK and CSNK2A1, are significant in other cancer types, such as leukemia ([31, 33]), colorectal cancer ([11, 20]) and prostate cancer ([1]).

For the pancreatic cancer networks analysed by the algorithm, we identified eleven top controlling proteins. Five of them are known to be significant in pancreatic cancer: IGF1R ([18, 81, 78]), ERBB4 ([30]), KRAS ([8, 65, 17]), ERBB2 ([4, 69, 56]) and MTOR ([44, 80, 29]), and are targeted by several FDA-approved and investigational cancer drugs (XL228, rhIGFB-3, linsitinib, brigatinib, SF1126, XL765, ridaforolimus). Four of the other identified top controlling proteins, ABL1, PDPK1, CSNK2A1 and CDK1, are documented as significant in other types of cancer, for example breast cancer ([34, 68]) and prostate cancer ([1, 76]). They are also targeted by several drugs under investigation for treating cancer (AT-7519, seliciclib). In particular, the drug brigatinib could be especially powerful in this case, as it targets four of the eleven top controlling proteins (ABL1, IGF1R, ERBB4, ERBB2). Another interesting case is the non-steroidal anti-inflammatory drug celecoxib, which targets PDPK1 and is used to manage symptoms of arthritis pain and in familial adenomatous polyposis; it has also been under investigation as a potential cancer chemo-preventive and therapeutic drug [23]. Indeed, there are several ongoing and completed trials researching the effect of the drug in pancreatic cancer ([63, 45]). Among the top controlling proteins were also PPP2R1A and DUSP7, for which no drug exists, marking them too as potential drug targets for future research.
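The observation that one drug covers several top controlling proteins is a set-overlap count, sketched below for the pancreatic case. The function name is hypothetical, and the target sets contain only the proteins named in this section for each drug; they are not complete DrugBank target lists.

```python
from typing import Dict, List, Set, Tuple


def rank_drugs_by_coverage(
    drug_targets: Dict[str, Set[str]],
    top_proteins: Set[str],
) -> List[Tuple[str, int]]:
    """Rank drugs by how many top controlling proteins they target
    (ties broken alphabetically by drug name)."""
    counts = [(drug, len(targets & top_proteins)) for drug, targets in drug_targets.items()]
    return sorted(counts, key=lambda item: (-item[1], item[0]))


# The eleven top controlling proteins reported for the pancreatic cancer networks.
pancreatic_top = {
    "IGF1R", "ERBB4", "KRAS", "ABL1", "PDPK1", "PPP2R1A",
    "DUSP7", "ERBB2", "CSNK2A1", "CDK1", "MTOR",
}

# Only the targets mentioned in the text above; incomplete by construction.
drug_targets = {
    "brigatinib": {"ABL1", "IGF1R", "ERBB4", "ERBB2"},
    "celecoxib": {"PDPK1"},
}

ranking = rank_drugs_by_coverage(drug_targets, pancreatic_top)
```

With these inputs brigatinib ranks first with four of the eleven proteins, matching the count given in the paragraph above.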

Furthermore, we found that the drug fostamatinib, used for the treatment of rheumatoid arthritis and immune thrombocytopenic purpura, targets five out of the seven top controlling proteins for breast cancer, four out of the six for ovarian cancer, and seven out of the eleven for pancreatic cancer. Our algorithm thus suggests that the drug could potentially be used in cancer treatment. This idea is supported by several completed clinical trials for using fostamatinib in treating lymphoma ([2, 61]) and one ongoing trial for ovarian cancer ([67]).


In this paper we proposed a new solution for the target network controllability problem. Our search strategy is based on a genetic algorithm, where the population in each generation is a set of valid solutions to the network controllability problem. The algorithm turns out to be scalable, with its performance staying strong even for very large networks.

The number of edges in a network alone does not seem to influence the performance of the algorithm. The increase in the running time as the network size increases is mainly caused by the increasing number of nodes and target nodes. This makes the genetic algorithm well suited to very large networks, where many solutions need to be collected.

The genetic algorithm provides, at every step, a family of solutions, while the greedy algorithms offer only one. This is a key advantage of this algorithm, especially for drug repurposing applications, where multiple alternative solutions are important to collect and compare.

The genetic algorithm comes by design with a set limit on the maximum length of control paths from the input to the target nodes. This feature is of particular interest in applications in medicine, where the effects of a drug dissipate quickly over longer signalling paths. The focused search upstream of the target nodes led to the genetic algorithm drastically improving the percentage of FDA-approved drug targets selected in its solution, a clear step forward towards applications in combinatorial drug selection and drug repurposing. The drugs identified by our algorithm as potentially efficient for breast, ovarian, and pancreatic cancer correlate well with recent literature results, and some of our suggestions have already been subject to several clinical studies. This strengthens the potential of our approach for studies in synthetic lethality-driven drug repurposing.

Author Contributions

V.P., I.N., E.C. and I.P. conceived and designed the study. V.P. planned and coded the implementation, and performed the numerical calculations. K.K. collected the data for the protein-protein interaction networks and generated the networks. V.P. and I.P. analysed the results. All authors contributed to writing the article and approved its final form.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


This work was partially supported by the Academy of Finland (project 311371/2017) and by the Romanian National Authority for Scientific Research and Innovation (POC grant P_37_257 and PED grant 2391).


  • [1] K. A. Ahmad, G. Wang, and K. Ahmed (2006) Intracellular hydrogen peroxide production is an upstream event in apoptosis induced by down-regulation of casein kinase 2 in prostate cancer cells. Molecular cancer research 4 (5), pp. 331–338. Cited by: §4, §4.
  • [2] AstraZeneca (2013) Study to learn if 200mg test drug (fostamatinib) helps people with large b-cell lymphoma,a type of blood cancer. Note: Available at Cited by: §4.
  • [3] A. Barabási, N. Gulbahce, and J. Loscalzo (2011/01/) Network medicine: a network-based approach to human disease. Nature reviews. Genetics 12 (1), pp. 56–68. External Links: ISBN 1471-0064; 1471-0056, Link Cited by: §1.
  • [4] L. Booth, A. Poklepovic, and P. Dent (2020) Neratinib decreases pro-survival responses of [sorafenib + vorinostat] in pancreatic cancer. Biochemical pharmacology 178, pp. 114067. Cited by: §4.
  • [5] M. L. Burness, T. A. Grushko, and O. I. Olopade (2010) Epidermal growth factor receptor in triple-negative and basal-like breast cancer: promising clinical target or only a marker?. Cancer journal 16 (1), pp. 23–32. Cited by: §4.
  • [6] L.S. Callans, H. Naama, M. Khandelwal, R. Plotkin, and L. Jardines (1995) Raf-1 protein expression in human breast cancer cells. Annals of surgical oncology 2, pp. 38–42. Cited by: §4.
  • [7] F. Cheng, I. A. Kovács, and A. Barabási (2019) Network-based prediction of drug combinations. Nature Communications 10 (1), pp. 1197. External Links: ISBN 2041-1723, Link Cited by: §1.
  • [8] J. Cheng and J. R. Cashman (2020) PAWI-2 overcomes tumor stemness and drug resistance via cell cycle arrest in integrin b 3-kras-dependent pancreatic cancer stem cells. Scientific reports 10 (1), pp. 9162. Cited by: §4.
  • [9] S.A. Chrysogelos and R.B. Dickson (1994) EGF receptor expression, regulation, and function in breast cancer. Breast cancer research and treatment 29 (1), pp. 29–40. Cited by: §4.
  • [10] City of Hope Medical Center (2014) Sorafenib and vinorelbine in treating women with stage iv breast cancer. Note: Available at Cited by: §4.
  • [11] I. Cristóbal, B. Torrejón, J. Rubio, A. Santos, M. Pedregal, C. Caramés, S. Zazo, M. Luque, M. Sanz-Alvarez, J. Madoz-Gúrpide, F. Rojo, and J. García-Foncillas (2019) Deregulation of set is associated with tumor progression and predicts adverse outcome in patients with early-stage colorectal cancer. Journal of clinical medicine 8 (3). Cited by: §4.
  • [12] W. Cui, E. M. Yazlovitskaya, M. S. Mayo, J. C. Pelling, and D. L. Persons (2000) Cisplatin-induced response of c-jun n-terminal kinase 1 and extracellular signal–regulated protein kinases 1 and 2 in a series of cisplatin-resistant ovarian carcinoma cell lines. Molecular carcinogenesis 29 (4), pp. 219–228. Cited by: §4.
  • [13] E. Czeizler, C. Gratie, W. Kai Chiu, K. Kanhaiya, and I. Petre (2018) Structural target controllability of linear networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 15, pp. 1217–1228. Cited by: §1, §1, §1, §2.1, §2.1, §3.2.
  • [14] Dana-Farber Cancer Institute (2016) Cabozantinib for metastatic triple negative brca. Note: Available at Cited by: §4.
  • [15] M. M. Davis, C. M. Tato, and D. Furman (2017/06/20) Systems immunology: just getting started. Nature immunology 18 (7), pp. 725–732. External Links: ISBN 1529-2916; 1529-2908, Link Cited by: §1.
  • [16] T. R. Donahue and O. J. Hines (2009) CXCR2 and ret single nucleotide polymorphisms in pancreatic cancer. World journal of surgery 33 (4), pp. 710–715. Cited by: §4.
  • [17] U. Dreissigacker, M. S. Mueller, M. Unger, P. Siegert, F. Genze, P. Gierschik, and K. Giehl (2006) Oncogenic k-ras down-regulates rac1 and rhoa activity and enhances migration and invasion of pancreatic carcinoma cells through activation of p38. Cell signalling 18 (8), pp. 1156–1168. Cited by: §4.
  • [18] C. Du, A. da Silva, V. Morales-Oyarvide, A. Dias Costa, M. M. Kozak, R. F. Dunne, D. A. Rubinson, K. Perez, Y. Masugi, T. Hamada, L. K. Brais, C. Yuan, A. Babic, M. D. Ducar, A. R. Thorner, A. Aguirre, M. H. Kulke, K. Ng, T. E. Clancy, J. J. Findeis-Hosey, D. T. Chang, J. L. Hornick, C. S. Fuchs, S. Ogino, A. C. Koong, A. F. Hezel, B. M. Wolpin, and J. A. Nowak (2020) Insulin-like growth factor-1 receptor expression and disease recurrence and survival in patients with resected pancreatic ductal adenocarcinoma. Cancer Epidemiology and Prevention Biomarkers. Cited by: §4.
  • [19] A. Evangelou, S. K. Jindal, T. J. Brown, and M. Letarte (2000) Down-regulation of transforming growth factor beta receptors by androgen in ovarian cancer cells. Cancer research 60 (14), pp. 929–935. Cited by: §4.
  • [20] H. Fujiki, E. Sueoka, T. Watanabe, and M. Suganuma (2018) The concept of the okadaic acid class of tumor promoters is revived in endogenous protein inhibitors of protein phosphatase 2a, set and cip2a, in human cancers. Journal of cancer research and clinical oncology 144 (12), pp. 2339–2349. Cited by: §4.
  • [21] J. Gao, Y. Liu, R. M. D’Souza, and A. Barabási (2014) Target control of complex networks. Nature Communications 5 (1), pp. 5415. External Links: ISBN 2041-1723, Link Cited by: §1, §1.
  • [22] K. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal, and A. Barabási (2007/05/22) The human disease network. Proceedings of the National Academy of Sciences 104 (21), pp. 8685. External Links: Link Cited by: §1.
  • [23] L. Gong, C. F. Thorn, M. M. Bertagnolli, T. Grosser, R. B. Altman, and T. E. Klein (2012) Celecoxib pathways: pharmacokinetics and pharmacodynamics. Pharmacogenetics and genomics 22 (4), pp. 310–318. Cited by: §4.
  • [24] L. Guan, Z. Li, F. Xie, Y. Pang, C. Zhang, H. Tang, H. Zhang, C. Chen, Y. Zhan, T. Zhao, H. Jiang, X. Jia, Y. Wang, and Y. Lu (2020) Oncogenic and drug-sensitive ret mutations in human epithelial ovarian cancer. Journal of experimental and clinical cancer research 39 (53). Cited by: §4.
  • [25] C. P. Gully, F. Zhang, J. Chen, J. A. Yeung, G. Velazquez-Torres, E. Wang, S. J. Yeung, and M. Lee (2010) Antineoplastic effects of an aurora b kinase inhibitor in breast cancer. Molecular cancer 22, pp. 9–42. Cited by: §4.
  • [26] W. Guo, S. Zhang, Z. Wei, T. Zeng, F. Liu, J. Zhang, F. Wu, and L. Chen (2017) Constrained target controllability of complex networks. Journal of Statistical Mechanics: Theory and Experiment 2017 (6), pp. 063402. External Links: ISBN 1742-5468, Link Cited by: §1.
  • [27] L. Han, H. Liu, J. Wu, and J. Liu (2018) MiR-126 suppresses invasion and migration of malignant glioma by targeting mature t cell proliferation 1 (mtcp1). Medical science monitor 24, pp. 6630–6637. Cited by: §4.
  • [28] L. Y. Han, C. N. Landen, J. G. Trevino, J. Halder, Y. G. Lin, A. A. Kamat, T. Kim, W. M. Merritt, R. L. Coleman, D. M. Gershenson, W. C. Shakespeare, Y. Wang, R. Sundaramoorth, C. A. Metcalf 3rd, D. C. Dalgarno, T. K. Sawyer, G. E. Gallick, and A. K. Sood (2006) Antiangiogenic and antitumor effects of src inhibition in ovarian carcinoma. Cancer research 66 (17), pp. 8633–8639. Cited by: §4.
  • [29] A. R. He, A. P. Lindenberg, and J. L. Marshall (2008) Biologic therapies for advanced pancreatic cancer. Expert review of anticancer therapy 8 (8), pp. 1331–1338. Cited by: §4.
  • [30] K. Hedegger, H. Algül, M. Lesina, A. Blutke, R. M. Schmid, M. R. Schneider, and M. Dahlhoff (2020) Unraveling erbb network dynamics upon betacellulin signaling in pancreatic ductal adenocarcinoma in mice. Molecular oncology. Cited by: §4.
  • [31] K. Heyninck and R. Beyaert (2006) A novel link between lck, bak expression and chemosensitivity. Oncogene 25 (12), pp. 1693–1695. Cited by: §4.
  • [32] S. Hiscox and R. I. Nicholson (2008) Src inhibitors in breast cancer therapy. Expert opinion on therapeutic targets 12 (6), pp. 757–767. Cited by: §4.
  • [33] G. Jiang, T. Freywald, J. Webster, D. Kozan, R. Geyer, J. DeCoteau, A. Narendran, and A. Freywald (2008) In human leukemia cells ephrin-b-induced invasive activity is supported by lck and is associated with reassembling of lipid raft signaling complexes. Molecular cancer research 6 (2), pp. 291–305. Cited by: §4.
  • [34] N. Johnson, J. Bentley, L.-Z. Wang, D. R. Newell, C. N. Robson, G. I. Shapiro, and N. J. Curtin (2010) Pre-clinical evaluation of cyclin-dependent kinase 2 and 1 inhibition in anti-estrogen-sensitive and resistant breast cancer cells. British journal of cancer 102 (2), pp. 342–350. Cited by: §4.
  • [35] R. E. Kalman, Y. C. Ho, and K. S. Narendra (1963) Controllability of linear dynamical systems. Contributions to Differential Equations 1, pp. 189–213. Cited by: §1, Theorem 2.1.
  • [36] K. Kanhaiya, E. Czeizler, C. Gratie, and I. Petre (2017) Controlling directed protein interaction networks in cancer. Scientific Reports 7. Cited by: §1, §1, §2.1, §3.1, §3.1.
  • [37] H. Kitano (2002) Computational systems biology. Nature 420 (6912), pp. 206–210. External Links: ISBN 1476-4687, Link Cited by: §1.
  • [38] J. Koh, K. Brown, A. Sayad, D. Kasimer, T. Ketela, and J. Moffat (2011) COLT-cancer: functional genetic screening resource for essential genes in human cancer cell lines. Nucleic acids research 40 (D1), pp. D957–D963. Cited by: §3.1, §3.1, §3.1.
  • [39] E. L. Leung, J. C. Wong, M. G. Johlfs, B. K. Tsang, and R. R. Fiscus (2010) Protein kinase g type ialpha activity in human ovarian cancer cells significantly contributes to enhanced src activation and dna synthesis/cell proliferation. Molecular cancer research 8 (4), pp. 578–591. Cited by: §4.
  • [40] H. Z. Li, Y. Gao, X. L. Zhao, Y. X. Liu, B. C. Sun, J. Yang, and Z. Yao (2009) Effects of raf kinase inhibitor protein expression on metastasis and progression of human breast cancer. Molecular cancer research 7, pp. 832–840. Cited by: §4.
  • [41] X. Liu, Z. Hong, J. Liu, Y. Lin, A. Rodríguez-Patón, Q. Zou, and X. Zeng (2019-6/26/2020) Computational methods for identifying the critical nodes in biological networks. Briefings in Bioinformatics 21 (2), pp. 486–497. External Links: ISBN 1477-4054, Link Cited by: §1.
  • [42] X. Liu and L. Pan (2015 Mar-Apr) Identifying driver nodes in the human signaling network using structural controllability analysis.. IEEE/ACM Trans Comput Biol Bioinform 12 (2), (MEDLINE), pp. 467–472. External Links: ISSN 1557-9964 (Electronic); 1545-5963 (Linking) Cited by: §1.
  • [43] Y. Liu, J. Slotine, and A. Barabási (2011) Controllability of complex networks. Nature. Cited by: §1, §1.
  • [44] Y. Liu, M. Feng, H. Chen, G. Yang, J. Qiu, F. Zhao, Z. Cao, W. Luo, J. Xiao, L. You, L. Zheng, and T. Zhang (2020) Mechanistic target of rapamycin in the tumor microenvironment and its potential as a therapeutic target for pancreatic cancer. Cancer letters 485, pp. 1–13. Cited by: §4.
  • [45] M.D. Anderson Cancer Center (2005) Gemcitabine and celecoxib in treating patients with metastatic pancreatic cancer. Note: Available at Cited by: §4.
  • [46] Massachusetts General Hospital (2019) Cabozantinib in women with metastatic hormone-receptor-positive breast cancer. Note: Available at Cited by: §4.
  • [47] MathNET (2019) numerics. Note: Available at Cited by: §2.7.
  • [48] R. R. Mewani, S. Tian, B. Li, M. T. Danner, T. D. Carr, S. Lee, A. Rahman, U. N. Kasid, M. Jung, A. Dritschilo, and P. C. Gokhale (2006) Gene expression profile by inhibiting raf-1 protein kinase in breast cancer cells. International journal of molecular medicine 17 (3), pp. 457–463. Cited by: §4.
  • [49] R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, and U. Alon (2004) Superfamilies of evolved and designed networks. Science 303 (5663), pp. 1538–1542. Cited by: §3.1, §3.1.
  • [50] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon (2002) Network motifs: simple building blocks of complex networks. Science 298 (5594), pp. 824–827. Cited by: §3.1, §3.1.
  • [51] K. Misselbeck, S. Parolo, F. Lorenzini, V. Savoca, L. Leonardelli, P. Bora, M. J. Morine, M. C. Mione, E. Domenici, and C. Priami (2019) A network-based approach to identify deregulated pathways and drug effects in metabolic syndrome. Nature Communications 10 (1), pp. 5215. External Links: ISBN 2041-1723, Link Cited by: §1.
  • [52] F. Molnár, S. Sreenivasan, B. K. Szymanski, and G. Korniss (2013) Minimum dominating sets in scale-free network ensembles. Scientific Reports 3 (1), pp. 1736. External Links: ISBN 2045-2322, Link Cited by: §1.
  • [53] L. Morgan, R. I. Nicholson, and S. Hiscox (2008) SRC as a therapeutic target in breast cancer. Endocrine, metabolic and immune disorders drug targets 8 (4), pp. 273–278. Cited by: §4.
  • [54] C. Morris (2002) The role of egfr-directed therapy in the treatment of breast cancer. Breast cancer research and treatment 75, pp. S51–S59. Cited by: §4.
  • [55] National Cancer Institute (NCI) (2006) Sorafenib in treating patients with metastatic breast cancer. Note: Available at Cited by: §4.
  • [56] J. Novotný, L. Petruzelka, J. Vedralová, Z. Kleibl, B. Matous, and L. Juda (2001) Prognostic significance of c-erbb-2 gene expression in pancreatic cancer patients. Neoplasma 48 (3), pp. 188–191. Cited by: §4.
  • [57] S. A. Ochsner, D. Abraham, K. Martin, W. Ding, A. McOwiti, W. Kankanamge, Z. Wang, K. Andreano, R. A. Hamilton, Y. Chen, A. Hamilton, M. L. Gantner, M. Dehart, S. Qu, S. G. Hilsenbeck, L. B. Becnel, D. Bridges, A. Ma’ayan, J. M. Huss, F. Stossi, C. E. Foulds, A. Kralli, D. P. McDonnell, and N. J. McKenna (2019) The signaling pathways project, an integrated ‘omics knowledgebase for mammalian cellular signaling pathways. Scientific Data 6 (1), pp. 252. External Links: ISBN 2052-4463, Link Cited by: §1.
  • [58] S. K. Pal and J. Mortimer (2009) Triple-negative breast cancer: novel therapies and new directions. Maturitas 63 (4), pp. 269–274. Cited by: §4.
  • [59] V. Popescu (2020) GeneticAlgNetControl. Note: Available at, version 1.0 Cited by: §2.7, §2.7, Table 2.
  • [60] G. Rancati, J. Moffat, A. Typas, and N. Pavelka (2018) Emerging and evolving concepts in gene essentiality. Nature Reviews Genetics 19 (1), pp. 34–49. External Links: ISBN 1471-0064, Link Cited by: §1.
  • [61] Rigel Pharmaceuticals (2010) Efficacy and safety study of fostamatinib tablets to treat b-cell lymphoma. Note: Available at Cited by: §4.
  • [62] M. Saqi, J. Pellet, I. Roznovat, A. Mazein, S. Ballereau, B. De Meulder, and C. Auffray (2016) Systems medicine: the future of medical genomics, healthcare, and wellness.. Methods Mol Biol 1386, (MEDLINE), pp. 43–60. External Links: ISSN 1940-6029 (Electronic); 1064-3745 (Linking) Cited by: §1.
  • [63] Second Affiliated Hospital, School of Medicine, Zhejiang University (2023) Gemcitabine and celecoxib combination therapy in treating patients with r0 resection pancreatic cancer (gcrp). Note: Available at Cited by: §4.
  • [64] F. Shangguan, Y. Liu, L. Ma, G. Qu, Q. Lv, J. An, S. Yang, B. Lu, and Q. Cao (2020) Niclosamide inhibits ovarian carcinoma growth by interrupting cellular bioenergetics. Journal of Cancer 11 (12), pp. 3454–3466. Cited by: §4.
  • [65] S. Shankar, J. C. Tien, R. F. Siebenaler, S. Chugh, V. L. Dommeti, S. Zelenka-Wang, X. Wang, I. J. Apel, J. Waninger, S. Eyunni, A. Xu, M. Mody, A. Goodrum, Y. Zhang, J. J. Tesmer, R. Mannan, X. Cao, P. Vats, S. Pitchiaya, S. J. Ellison, J. Shi, C. Kumar-Sinha, H. C. Crawford, and A. M. Chinnaiyan (2020) An essential role for argonaute 2 in egfr-kras signaling in pancreatic cancer development. Nature communications 11 (1), pp. 2817. Cited by: §4.
  • [66] A. Sharma, J. Belna, J. Espat, G. Rodriguez, V. T. Cannon, and J. A. Hurteau (2009) Effects of omega-3 fatty acids on components of the transforming growth factor beta-1 pathway: implication for dietary modification and prevention in ovarian cancer. American journal of obstetrics and gynecology 200 (5), pp. 516.e1–516.36. Cited by: §4.
  • [67] Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins (2020) Clinical trial of combined fostamatinib and paclitaxel in ovarian cancer. Note: Available at Cited by: §4.
  • [68] D. Srinivasan, J. T. Sims, and R. Plattner (2008) Aggressive breast cancer cells are dependent on activated abl kinases for proliferation, anchorage-independent growth and survival. Oncogene 27 (8), pp. 1095–1105. Cited by: §4.
  • [69] J. Standop, M. Andrianifahanana, N. Moniaux, M. Schneider, A. Ulrich, R. E. Brand, J. L. Wisecarver, J. A. Bridge, M. W. Büchler, T. E. Adrian, S. K. Batra, and P. M. Pour (2005) ErbB2 growth factor receptor, a marker for neuroendocrine cells?. Pancreatology 5 (1), pp. 44–58. Cited by: §4.
  • [70] Q. Tian, N. D. Price, and L. Hood (2012 Feb) Systems cancer medicine: towards realization of predictive, preventive, personalized and participatory (p4) medicine.. J Intern Med 271 (2), (MEDLINE), pp. 111–121. External Links: ISSN 1365-2796 (Electronic); 0954-6820 (Print); 0954-6820 (Linking) Cited by: §1.
  • [71] H. R. VanDeusen, J. R. Ramroop, K. L. Morel, S. Bae, A. V. Sheahan, Z. Sychev, N. A. Lau, L. C. Cheng, V. M. Tan, Z. Li, A. Petersen, J. K. Lee, J. W. Park, R. Yang, J. H. Hwang, I. Coleman, O. N. Witte, C. Morrissey, E. Corey, P. S. Nelson, L. Ellis, and J. M. Drake (2020) Targeting RET kinase in neuroendocrine prostate cancer. Molecular Cancer Research. Cited by: §4.
  • [72] G. Varoquaux, T. Vaught, and J. Millman (Eds.) (2008) Exploring network structure, dynamics, and function using NetworkX. Proceedings of the 7th Python in Science Conference (SciPy2008). Cited by: §3.1.
  • [73] D. Whitley and A. M. Sutton (2012) Genetic algorithms — a survey of models and methods. In Handbook of Natural Computing, G. Rozenberg, T. Bäck, and J. N. Kok (Eds.), pp. 637–671. External Links: ISBN 978-3-540-92910-9 Cited by: §1.
  • [74] J. R. Wiener, T. C. Windham, V. C. Estrella, N. U. Parikh, P. F. Thall, M. T. Deavers, R. C. Bast, G. B. Mills, and G. E. Gallick (2003) Activated src protein tyrosine kinase is overexpressed in late-stage human ovarian cancers. Gynecologic oncology 88 (1), pp. 73–79. Cited by: §4.
  • [75] D. S. Wishart, Y. D. Feunang, A. C. Guo, E. J. Lo, A. Marcu, J. R. Grant, T. Sajed, D. Johnson, C. Li, Z. Sayeeda, N. Assempour, I. Iynkkaran, Y. Liu, A. Maciejewski, N. Gale, A. Wilson, L. Chin, R. Cummings, D. Le, A. Pon, C. Knox, and M. Wilson (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Research 46, pp. D1074–D1082. Cited by: §3.2, §4.
  • [76] Z. Xu, K. Nagashima, D. Sun, T. Rush, A. Northrup, J. N. Andersen, I. Kariv, and E. V. Bobkova (2009) Development of high-throughput TR-FRET and AlphaScreen assays for identification of potent inhibitors of PDK1. Journal of biomolecular screening 14 (10), pp. 1257–1262. Cited by: §4.
  • [77] T.-T. Yu, C.-Y. Wang, and R. Tong (2020) ERBB2 gene expression silencing involved in ovarian cancer cell migration and invasion through mediating MAPK1/MAPK3 signaling pathway. European review for medical and pharmacological sciences 24 (10), pp. 5267–5280. Cited by: §4.
  • [78] H. Zeng, K. Datta, M. Neid, J. Li, S. Parangi, and D. Mukhopadhyay (2003) Requirement of different signaling pathways mediated by insulin-like growth factor-I receptor for proliferation, invasion, and VPF/VEGF expression in a pancreatic carcinoma cell line. Biochemical and biophysical research communications 302 (1), pp. 46–55. Cited by: §4.
  • [79] W. Zhang, J. Chien, J. Yong, and R. Kuang (2017) Network-based machine learning and graph theory algorithms for precision oncology. npj Precision Oncology 1 (1), pp. 25. External Links: ISSN 2397-768X Cited by: §1.
  • [80] H.-W. Zhao, N. Zhou, F. Jin, R. Wang, and J.-Q. Zhao (2020) Metformin reduces pancreatic cancer cell proliferation and increases apoptosis through mTOR signaling pathway and its dose-effect relationship. European review for medical and pharmacological sciences 24 (10), pp. 5336–5344. Cited by: §4.
  • [81] Y. Zheng, C. Wu, J. Yang, Y. Zhao, H. Jia, M. Xue, D. Xu, F. Yang, D. Fu, C. Wang, B. Hu, Z. Zhang, T. Li, S. Yan, X. Wang, P. J. Nelson, C. Bruns, L. Qin, and Q. Dong (2020) Insulin-like growth factor 1-induced enolase 2 deacetylation by HDAC3 promotes metastasis of pancreatic cancer. Signal transduction and targeted therapy 5 (1), pp. 53. Cited by: §4.
  • [82] X. Zhou, J. Menche, A. Barabási, and A. Sharma (2014) Human symptoms–disease network. Nature Communications 5 (1), pp. 4212. External Links: ISSN 2041-1723 Cited by: §1.