Maximum-Likelihood Network Reconstruction for SIS Processes is NP-Hard

07/23/2018 ∙ by Bastian Prasse, et al. ∙ Delft University of Technology

Knowledge of the network topology is essential for precisely describing the viral dynamics of an SIS epidemic process. In scenarios for which the network topology is unknown, one resorts to reconstructing the network from observations of the viral state trace. This work focusses on the impact of the viral state observations on the computational complexity of the resulting network reconstruction problem. We propose a novel method of constructing a specific class of viral state traces from which the inference of the presence or absence of links is either easy or difficult. In particular, we use this construction to prove that the maximum-likelihood SIS network reconstruction is NP-hard. The NP-hardness holds for any adjacency matrix of a connected graph.


1 Introduction

We consider the network reconstruction of the sampled-time susceptible-infected-susceptible (SIS) process in a maximum-likelihood (ML) sense as introduced in [1]. We assume that the infection rate $\beta$ and the curing rate $\delta$ are known and that no self-infections occur; hence, the self-infection rate is $\varepsilon = 0$. We denote the number of nodes by $N$ and the viral state vector at discrete time $k = 0, 1, 2, \ldots$ by $x[k] = (x_1[k], \ldots, x_N[k])^T$. At any time $k$, a node $i$ is either infected or susceptible, which is denoted by $x_i[k] = 1$ and $x_i[k] = 0$, respectively. We confine ourselves to connected graphs and denote by $\mathcal{A}$ the set of all symmetric $N \times N$ adjacency matrices $A$ with elements $a_{ij} \in \{0, 1\}$ and $a_{ii} = 0$ whose graphs are connected. These adjacency matrices correspond to undirected, unweighted and connected graphs without self-loops.

The network reconstruction problem for the sampled-time SIS process is stated in the ML sense [1]. In contrast to the true adjacency matrix $A$, which generated the viral states $x[0], \ldots, x[n]$, the optimisation variable in the ML estimation problem is denoted by $\hat{A}$. The solution to the ML estimation problem, i.e. the adjacency matrix which maximises the likelihood, is denoted by $\hat{A}_{\mathrm{ML}}$.

Definition 1 (SIS Network Reconstruction).

Given the viral state observations $x[0], \ldots, x[n]$ from time $k = 0$ to $k = n$, which originate from a sampled-time SIS process on an unknown adjacency matrix $A \in \mathcal{A}$, find the adjacency matrix $\hat{A}$ which maximises the log-likelihood:

$$\max_{\hat{A} \in \mathcal{A}} \; \log P\big[x[0], \ldots, x[n] \,\big|\, \hat{A}\big] \tag{1}$$

An instance of the optimisation problem (1) is fully specified by the viral state observations $x[0], \ldots, x[n]$, where the observation length $n$ is usually large.

To stress the dependency of the ML estimate on a given viral state sequence $x[0], \ldots, x[n]$, we may also denote the ML estimate by $\hat{A}_{\mathrm{ML}}(x[0], \ldots, x[n])$. The SIS network reconstruction (1) gives rise to two fundamental problems:

  1. How many observations $n$ are required such that the ML estimate $\hat{A}_{\mathrm{ML}}$ achieves a given accuracy with high probability?

  2. How to design an algorithm that computes the ML estimate $\hat{A}_{\mathrm{ML}}$ for a given viral state sequence $x[0], \ldots, x[n]$? What is the computational complexity of the SIS network reconstruction (1)?

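The first problem translates to finding the minimal observation length $n_{\min}$ such that

$$P\Big[\big\lVert \hat{A}_{\mathrm{ML}}(x[0], \ldots, x[n_{\min}]) - A \big\rVert \le \alpha\Big] \ge 1 - \eta$$

for a given accuracy $\alpha > 0$ and probability $1 - \eta$, where $\lVert \cdot \rVert$ denotes some matrix norm. Based on a heuristic for solving the ML estimation (1), the results in [1] indicate that the minimum observation length $n_{\min}$ increases subexponentially with respect to the number of nodes $N$, according to a growth law with two constants (see [1]).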

The focus of this work is on the second question. We prove that the ML estimation (1) is NP-hard with respect to the number of nodes $N$ for any connected adjacency matrix $A$. The idea of the proof is as follows. We show that there is a polynomial-time reduction from the maximum cut problem to the ML estimation for the sampled-time SIS process (1). Since the maximum cut decision problem is NP-complete [2], this polynomial-time reduction proves that the ML estimation (1) is NP-hard. As shown in Section 3, the maximum cut problem can be stated as a zero-one unconstrained quadratic programme (UQP). We observe that the zero-one UQP which results from the maximum cut problem resembles the ML estimation (1). We show that for every graph $G$ of the maximum cut problem, there is an SIS viral state sequence $x[0], \ldots, x[n]$ such that solving the ML estimation (1) is equivalent to solving the maximum cut problem on the graph $G$. The polynomial-time reduction is presented in Section 4.

2 Sampled-Time SIS Process

We give a brief summary of the sampled-time SIS process and refer to [1] for a more detailed description. The sampled-time Markov chain of the SIS process with sampling time $T$ is a discrete-time Markov chain [3]. The probabilities of the viral state transitions depend on the adjacency matrix $A$. Three kinds of transitions are possible in the sampled-time Markov chain of the SIS process. They are listed below, and their probabilities are derived from the continuous-time SIS equations.

Curing of a node

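A single node $i$ changes from the infected state at discrete time $k$ to the susceptible state at discrete time $k+1$. The probability of this transition is

$$P\big[x[k+1] = x[k] - e_i \,\big|\, x[k], A\big] = \delta_T, \quad \text{if } x_i[k] = 1, \tag{2}$$

where $e_i$ denotes the $i$-th standard basis vector and the curing probability equals $\delta_T = \delta T$.

Infection of a node

A single node $i$ changes from the susceptible state at time instant $k$ to the infected state at time instant $k+1$ with the probability

$$P\big[x[k+1] = x[k] + e_i \,\big|\, x[k], A\big] = \beta_T\, m_i[k], \quad \text{if } x_i[k] = 0, \tag{3}$$

where $m_i[k]$ is the number of infected nodes adjacent to node $i$ in $A$ at time $k$ and the infection probability equals $\beta_T = \beta T$. The number of infected nodes adjacent to node $i$ equals

$$m_i[k] = \sum_{j=1}^{N} a_{ij}\, x_j[k].$$

No Change

No node changes its viral state from time $k$ to time $k+1$. This constant transition occurs when neither a curing nor an infection takes place, and hence

$$P\big[x[k+1] = x[k] \,\big|\, x[k], A\big] = 1 - \sum_{i:\, x_i[k] = 1} P\big[x[k+1] = x[k] - e_i \,\big|\, x[k], A\big] - \sum_{i:\, x_i[k] = 0} P\big[x[k+1] = x[k] + e_i \,\big|\, x[k], A\big], \tag{4}$$

where the probabilities on the right-hand side follow from (2) and (3).

To ensure that (2), (3) and (4) are valid probabilities, they have to lie in $[0, 1]$ for all adjacency matrices $A \in \mathcal{A}$ and all viral states $x[k]$. In [1], an upper bound on the sampling time $T$ was derived such that (2), (3) and (4) lie in $[0, 1]$, and we assume that the sampling time $T$ does not exceed this upper bound.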
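To make the transition model concrete, the following Python sketch evaluates the probabilities (2), (3) and (4) for a given adjacency matrix and sums their logarithms along a viral state trace, which by the Markov property gives the log-likelihood in (1) up to the initial-state term. It is our own illustration, not code from [1]; the function names, the list-of-lists matrix representation and the parameters `beta_T` and `delta_T` (standing for $\beta T$ and $\delta T$) are assumptions, and the sampling time is assumed small enough for all probabilities to lie in $[0, 1]$.

```python
import math

def transition_probability(A, x, x_next, beta_T, delta_T):
    """Probability of x -> x_next in the sampled-time SIS chain, following (2) to (4).

    A is a symmetric 0/1 adjacency matrix (list of lists); x and x_next are 0/1 tuples.
    """
    N = len(x)

    def m(i):
        # number of infected neighbours m_i[k] of node i
        return sum(A[i][j] * x[j] for j in range(N))

    changed = [i for i in range(N) if x[i] != x_next[i]]
    if len(changed) > 1:
        return 0.0  # the sampled-time chain changes at most one node per step
    if len(changed) == 1:
        i = changed[0]
        # curing (2) if node i was infected, infection (3) otherwise
        return delta_T if x[i] == 1 else beta_T * m(i)
    # constant transition (4): one minus all curing and infection probabilities
    p_change = sum(delta_T if x[i] == 1 else beta_T * m(i) for i in range(N))
    return 1.0 - p_change

def log_likelihood(A, states, beta_T, delta_T):
    """Sum of log transition probabilities along the trace (initial-state term omitted)."""
    total = 0.0
    for x, x_next in zip(states, states[1:]):
        p = transition_probability(A, x, x_next, beta_T, delta_T)
        if p <= 0.0:
            return -math.inf  # trace impossible under A
        total += math.log(p)
    return total
```

For a path of three nodes, `log_likelihood(A, [(1, 0, 0), (1, 1, 0), (1, 1, 0)], 0.1, 0.05)` equals $\log 0.1 + \log 0.8$: one infection along a single link, followed by one constant transition.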

3 Maximum Cut

We consider an undirected and unweighted graph $G = (\mathcal{N}, \mathcal{L})$, where $\mathcal{N}$ is the set of nodes and $\mathcal{L}$ is the set of links. A cut-set of the graph $G$ is defined as follows [4, 5].

Definition 2 (Cut-set).

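For a non-empty node subset $\mathcal{S} \subset \mathcal{N}$ of a graph $G$ and its complement $\mathcal{S}^c = \mathcal{N} \setminus \mathcal{S}$, the cut-set $C(\mathcal{S})$ is the set of all links that connect nodes in $\mathcal{S}$ to nodes in $\mathcal{S}^c$. In other words:

$$C(\mathcal{S}) = \big\{ (i, j) \in \mathcal{L} \;\big|\; i \in \mathcal{S},\; j \in \mathcal{S}^c \big\}.$$

The cut size of a cut-set $C(\mathcal{S})$ equals the number of links in the cut-set and is denoted by $|C(\mathcal{S})|$. The maximum cut problem and the corresponding decision problem are as follows.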

Definition 3 (Maximum Cut Problem).

Given a graph $G$, find a cut $C(\mathcal{S})$ of maximal cut size $|C(\mathcal{S})|$.

Definition 4 (Maximum Cut Decision Problem).

Given a natural number $K$ and a graph $G$, is there a cut $C(\mathcal{S})$ whose cut size $|C(\mathcal{S})|$ is at least $K$?

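The maximum cut decision problem is NP-complete, as shown by Garey et al. [6]. Hence, the maximum cut problem is NP-hard [7]. The maximum cut problem can be equivalently stated as a zero-one unconstrained quadratic programme (UQP) [8]

$$\max_{y \in \{0,1\}^N} \; \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij}\, y_i (1 - y_j). \tag{5}$$

The binary variable $y_i$ equals 1 if node $i$ is in the node set $\mathcal{S}$, and $y_i = 0$ if node $i$ is in the node set $\mathcal{S}^c$. The optimisation problem (5) is equivalent to

$$\max_{y \in \{0,1\}^N} \; \sum_{i=1}^{N} c_i\, y_i + \sum_{i=1}^{N} \sum_{j=i+1}^{N} q_{ij}\, y_i y_j. \tag{6}$$

The coefficients of the objective function of (6) are given by

$$q_{ij} = -2 a_{ij}, \qquad c_i = d_i, \tag{7}$$

where $d_i$ is the degree of node $i$:

$$d_i = \sum_{j=1}^{N} a_{ij}. \tag{8}$$

Since the elements of the adjacency matrix are either zero or one, the coefficients lie in the sets

$$q_{ij} \in \{-2, 0\} \tag{9}$$

and

$$c_i \in \{0, 1, \ldots, N-1\}. \tag{10}$$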
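A brute-force check of this reformulation can be written as a short Python sketch, under our assumption that (6) uses $c_i = d_i$ and $q_{ij} = -2 a_{ij}$ summed over $i < j$: for every binary vector $y$, the objective of (6) equals the cut size $|C(\mathcal{S})|$ with $\mathcal{S} = \{i : y_i = 1\}$. The helper names are hypothetical, and the enumeration of all $2^N$ vectors is only meant for small graphs.

```python
from itertools import product

def cut_size(A, y):
    """Number of links between S = {i : y_i = 1} and its complement."""
    N = len(A)
    return sum(A[i][j] for i in range(N) for j in range(i + 1, N) if y[i] != y[j])

def uqp_objective(A, y):
    """Objective of (6) with c_i = d_i (node degree) and q_ij = -2 a_ij for i < j."""
    N = len(A)
    c = [sum(row) for row in A]
    linear = sum(c[i] * y[i] for i in range(N))
    quadratic = sum(-2 * A[i][j] * y[i] * y[j] for i in range(N) for j in range(i + 1, N))
    return linear + quadratic

def max_cut_brute_force(A):
    """Maximise (6) by enumerating all 2^N binary vectors (small graphs only)."""
    best = max(product((0, 1), repeat=len(A)), key=lambda y: uqp_objective(A, y))
    return best, uqp_objective(A, best)
```

On the cycle with four nodes, every $y$ gives matching values and the maximum cut size is 4; on a triangle it is 2.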

The objective function of the optimisation problem (6) is a quadratic function which maps binary variables to a non-negative integer, i.e. $f: \{0,1\}^N \to \mathbb{N}_0$. Hence, the optimisation problem (6) is a special case of pseudo-Boolean optimisation [9], in which the objective function maps binary variables to a real number, i.e. $f: \{0,1\}^N \to \mathbb{R}$. Rosenberg [10] showed that the optimisation of any pseudo-Boolean function can always be reduced in polynomial time to the optimisation of a quadratic pseudo-Boolean function. The general optimisation of a quadratic pseudo-Boolean function is of the form (6), with the difference that the coefficients $q_{ij}$ and $c_i$ may attain any value in $\mathbb{R}$, not only the integer values in (9) and (10), and it is NP-hard [11]. If the coefficients $q_{ij}$ are non-negative real numbers, then the zero-one UQP (6) is solvable in polynomial time [12]. There are other special cases for the ranges of the coefficients $q_{ij}$ and $c_i$ for which the zero-one UQP (6) is solvable in polynomial time [13, 14].

4 Reduction of Maximum Cut to SIS Network Reconstruction

We will show that any instance of the zero-one UQP (6) with coefficients $q_{ij}$ and $c_i$ in the sets (9) and (10), and thus any instance of the maximum cut problem, can be translated to an SIS network reconstruction problem (1) in polynomial time. Hence, the SIS network reconstruction (1) is NP-hard. Since the zero-one UQP (6) is solvable in polynomial time for certain ranges of the coefficients $q_{ij}$ and $c_i$ [12, 13, 14], we emphasise that the conditions (9) and (10) are crucial (at least sufficient) for the NP-hardness of the zero-one UQP (6). Thus, our aim is to show that the SIS network reconstruction problem (1) can be translated to a zero-one UQP (6) with any¹ coefficients $q_{ij}$ and $c_i$ in the sets given by (9) and (10). Since the SIS network reconstruction problem (1) is fully specified by the viral state observations $x[0], \ldots, x[n]$, we aim to find viral state transitions such that solving the SIS network reconstruction problem (1) is equivalent to solving the zero-one UQP (6). The proof of the NP-hardness of the SIS network reconstruction problem (1) is based on four lemmas, which are stated below and whose proofs are given in the Appendix.

¹ More precisely, the coefficients $q_{ij}$ and $c_i$ do not attain values in the sets (9) and (10) independently. Due to (7) and (8), it holds that $c_i = -\frac{1}{2} \sum_{j \neq i} q_{ij}$, with $q_{ji} = q_{ij}$. We show the stronger statement that, independently of the coefficients $q_{ij}$, the coefficients $c_i$ may attain any value in $\{0, 1, \ldots, N-1\}$.

Since a graph given by an adjacency matrix $A \in \mathcal{A}$ is connected, there is a node whose removal leaves the graph connected. Indeed, every connected graph has a spanning tree that connects all the nodes. Every tree with at least two nodes has a node of degree one (a leaf node), whose removal disconnects neither the spanning tree nor, hence, the graph. Without loss of generality, we label this node as node 1.
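The leaf argument is constructive. The Python sketch below builds a breadth-first spanning tree and returns a node that is not the parent of any other tree node; removing it leaves the rest of the tree, and hence the graph, connected. The adjacency-list representation and the function names are our own assumptions.

```python
from collections import deque

def is_connected(adj, nodes):
    """Check whether the subgraph induced by `nodes` is connected (BFS)."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes

def removable_node(adj):
    """Return a leaf of a BFS spanning tree; its removal keeps the graph connected."""
    root = next(iter(adj))
    parent, queue = {root: None}, deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if len(parent) != len(adj):
        raise ValueError("graph is not connected")
    parents = {p for p in parent.values() if p is not None}
    # a tree node without children is a leaf of the spanning tree
    return next(v for v in parent if v not in parents)
```

For a path, the sketch returns the end node farthest from the BFS root; for a star, it returns one of the outer nodes, never the centre.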

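Our approach is based on stating a reduced-size version of the ML estimation (1), namely only with respect to the links $\hat{a}_{1j}$ which are incident to node 1. Since the graph given by an adjacency matrix $A \in \mathcal{A}$ is connected, node 1 has at least one neighbour. Without loss of generality, we label this neighbour as node 2. Furthermore, we consider that $a_{12} = 1$ is known. In the following, we denote by $P[x[0], \ldots, x[n] \mid \hat{a}_{13}, \ldots, \hat{a}_{1N}]$ the likelihood when the elements $\hat{a}_{12}$ and $\hat{a}_{ij}$ for $2 \le i, j \le N$ are fixed to their true values; formally,

$$P\big[x[0], \ldots, x[n] \,\big|\, \hat{a}_{13}, \ldots, \hat{a}_{1N}\big] = P\big[x[0], \ldots, x[n] \,\big|\, \hat{A}\big] \quad \text{with } \hat{a}_{12} = a_{12} \text{ and } \hat{a}_{ij} = a_{ij} \text{ for } 2 \le i, j \le N.$$

We introduce the following reduced-size SIS network reconstruction problem: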

Definition 5 (Reduced-Size SIS Network Reconstruction).

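Given the links $a_{12} = 1$ and $a_{ij}$, where $2 \le i \le N$ and $2 \le j \le N$, of the matrix $A \in \mathcal{A}$ and the viral state observations $x[0], \ldots, x[n]$ from time $k = 0$ to time $k = n$, which resulted from a sampled-time SIS process with the adjacency matrix $A$, find the links $\hat{a}_{13}, \ldots, \hat{a}_{1N}$ which maximise the log-likelihood:

$$\max_{\hat{a}_{13}, \ldots, \hat{a}_{1N} \in \{0, 1\}} \; \log P\big[x[0], \ldots, x[n] \,\big|\, \hat{a}_{13}, \ldots, \hat{a}_{1N}\big] \tag{11}$$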
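To make the search space of (11) concrete, the following Python sketch solves it by exhaustive search over all $2^{N-2}$ choices of $\hat{a}_{13}, \ldots, \hat{a}_{1N}$, with $\hat{a}_{12} = 1$ and all links among nodes $2, \ldots, N$ fixed to their true values. Index 0 plays node 1 and index 1 plays node 2; the transition model follows (2) to (4), and all names are hypothetical. The running time grows exponentially in $N$, so the sketch only illustrates the problem and is not a practical method.

```python
import math
from itertools import product

def step_probability(A, x, x_next, beta_T, delta_T):
    """Sampled-time SIS transition probability, following (2) to (4)."""
    N = len(x)
    m = [sum(A[i][j] * x[j] for j in range(N)) for i in range(N)]  # infected neighbours
    changed = [i for i in range(N) if x[i] != x_next[i]]
    if len(changed) > 1:
        return 0.0
    if len(changed) == 1:
        i = changed[0]
        return delta_T if x[i] == 1 else beta_T * m[i]
    return 1.0 - sum(delta_T if x[i] == 1 else beta_T * m[i] for i in range(N))

def reduced_size_ml(A, states, beta_T, delta_T):
    """Brute-force solution of (11); only links between index 0 and indices 2..N-1 vary."""
    N = len(A)
    best_links, best_value = None, -math.inf
    for links in product((0, 1), repeat=N - 2):
        A_hat = [row[:] for row in A]
        A_hat[0][1] = A_hat[1][0] = 1  # a_12 = 1 is known
        for j, a in zip(range(2, N), links):
            A_hat[0][j] = A_hat[j][0] = a
        value = 0.0
        for x, x_next in zip(states, states[1:]):
            p = step_probability(A_hat, x, x_next, beta_T, delta_T)
            value = value + math.log(p) if p > 0.0 else -math.inf
        if best_links is None or value > best_value:
            best_links, best_value = links, value
    return best_links, best_value
```

For the path with links $\{1,2\}$, $\{2,3\}$, $\{3,4\}$ and a single transition in which nodes 2 and 3 are infected and node 1 becomes infected, the likelihood is $\beta_T(1 + \hat{a}_{13})$, so the search returns $\hat{a}_{13} = 1$; $\hat{a}_{14}$ does not affect this likelihood, and the first maximiser found keeps $\hat{a}_{14} = 0$.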

Lemma 6 states that solving the reduced-size SIS network reconstruction (11) is equivalent to solving a zero-one UQP with particular coefficients:

Lemma 6 (Reduced-Size SIS Network Reconstruction as Zero-One UQP).

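For suitable natural numbers (see Appendix A), define the coefficients $\tilde{q}_{ij}$ and $\tilde{c}_i$ by

(12)
(13)

where three constants, given by the equations (32), (33) and (34), respectively, enter (12) and (13). For any coefficients $\tilde{q}_{ij}$ and $\tilde{c}_i$ given by (12) and (13) and for any connected adjacency matrix $A \in \mathcal{A}$, there is a viral state sequence $x[0], \ldots, x[n]$ from time $k = 0$ to a finite time $k = n$ such that the reduced-size SIS network reconstruction problem (11) becomes:

$$\max_{\hat{a}_{13}, \ldots, \hat{a}_{1N} \in \{0, 1\}} \; \sum_{i=3}^{N} \tilde{c}_i\, \hat{a}_{1i} + \sum_{i=3}^{N} \sum_{j=i+1}^{N} \tilde{q}_{ij}\, \hat{a}_{1i} \hat{a}_{1j} \tag{14}$$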
Proof.

Appendix A. ∎

Comparing the objective function of (14) to the objective function of the zero-one UQP (6) shows that they are of the same form²: the binary variables $y_i$ in (6) correspond to $\hat{a}_{1i}$, and the coefficients $c_i$ and $q_{ij}$ in (6) are replaced by $\tilde{c}_i$ and $\tilde{q}_{ij}$ in (14), respectively.

² The reduced-size SIS network reconstruction (11) for a graph with $N$ nodes results in a zero-one UQP (6) with the $N-2$ optimisation variables $\hat{a}_{13}, \ldots, \hat{a}_{1N}$. Strictly speaking, to obtain the zero-one UQP (6) with $N$ optimisation variables, one has to consider the reduced-size SIS network reconstruction (11) for graphs with $N+2$ nodes. For ease of exposition, we ignore this difference in the number of optimisation variables between the two optimisation problems (6) and (11).

As stated at the beginning of Section 4, a crucial condition for the NP-hardness of the zero-one UQP (6) is that its coefficients lie in the sets (9) and (10). To show the NP-hardness of the zero-one UQP (14), we have to show that the coefficients $\tilde{q}_{ij}$ and $\tilde{c}_i$ can also attain any value in the sets (9) and (10), respectively. As stated by (12), the coefficients $\tilde{q}_{ij}$ may attain either value in the set (9). However, the remaining condition, that the coefficients $\tilde{c}_i$ given by (13) may attain any value in the set (10) exactly, does not hold in general. Nevertheless, the coefficients $\tilde{c}_i$ may approach any value in the set (10) arbitrarily closely, as stated by Lemma 7.

Lemma 7 (Coefficients Approach Any Number).

The coefficients $\tilde{c}_i$ of the optimisation problem (14), given by (13), may approach any numbers $c_i \in \{0, 1, \ldots, N-1\}$, $i = 3, \ldots, N$, arbitrarily closely for suitably chosen natural numbers in (13):

(15)
Proof.

Appendix B. ∎

If the deviation $\tilde{c}_i - c_i$ is positive and not greater than a threshold, then we can solve any instance of the maximum cut problem by solving an instance of the reduced-size SIS network reconstruction (11):

Lemma 8 (Sufficiently Small Error on the UQP Coefficients).

If $\tilde{c}_i - c_i > 0$ and $\tilde{c}_i - c_i$ is not greater than this threshold for all $i = 3, \ldots, N$, then the solution to the reduced-size SIS network reconstruction problem (14) is also a solution to the zero-one UQP (6).

Proof.

Appendix C. ∎

Lemma 6, Lemma 7 and Lemma 8 prove the NP-hardness of the reduced-size SIS network reconstruction (11). Lemma 9 states how to obtain the reduced-size SIS network reconstruction (11) from the original, full-size SIS network reconstruction problem (1).

Lemma 9 (From Full-Size to Reduced-Size SIS Network Reconstruction).

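For all connected adjacency matrices $A \in \mathcal{A}$ and all viral state sequences $x[0], \ldots, x[n]$, there is a viral state sequence $x'[0], \ldots, x'[n']$ such that the solution $\hat{A}_{\mathrm{ML}}$ to the full-size SIS network reconstruction (1) satisfies:

  1. The following elements of $\hat{A}_{\mathrm{ML}}$ equal the elements of the true adjacency matrix $A$: $\hat{a}_{12} = a_{12}$ and $\hat{a}_{ij} = a_{ij}$ for $2 \le i, j \le N$.

  2. The other elements $\hat{a}_{13}, \ldots, \hat{a}_{1N}$ of $\hat{A}_{\mathrm{ML}}$ are the solution to the reduced-size SIS network reconstruction problem (11) whose objective function is changed by an additive term:

    (16)

    In (16), the additive term involves the degree of node $i$ in the graph obtained by removing node 1 from the graph given by the adjacency matrix $A$, together with a natural number which is independent of the optimisation variables $\hat{a}_{13}, \ldots, \hat{a}_{1N}$.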

Proof.

Appendix D. ∎

The optimisation problem (16) resembles the reduced-size SIS network reconstruction (11), but the two objective functions differ by an additive term. We show in Appendix E that the additive term does not affect the difficulty: the NP-hardness of the reduced-size SIS network reconstruction (11) implies the NP-hardness of the optimisation problem (16). Since solving the full-size SIS network reconstruction problem (1) with the viral state sequence of Lemma 9 as input implies solving the NP-hard optimisation problem (16), we obtain the main theorem of this work:

Theorem 10 (SIS Network Reconstruction is NP-Hard).

For all connected adjacency matrices $A \in \mathcal{A}$, the SIS network reconstruction problem (1) is NP-hard.

Proof.

Appendix E. ∎

We emphasise that the NP-hardness holds for any class of connected adjacency matrices $A$, including simple topologies such as paths or star graphs.

5 Conclusions

This work considers the computational complexity of finding the ML estimate of the network topology from observing a sampled-time SIS viral state trace. Instead of reconstructing a network for a given viral state sequence, we considered the reverse problem of designing a viral state sequence such that estimating the presence or absence of links either becomes computationally difficult (Lemma 6) or easy (first statement of Lemma 9).

Specifically, we have shown that any instance of the NP-hard maximum cut problem can be reduced to an instance of the SIS network reconstruction problem, whereby an instance of the latter problem is given by a viral state sequence. Thus, we have proved that the ML network reconstruction for SIS processes is NP-hard. Hence, unless P = NP, the exact ML estimate of the network topology cannot, in general, be computed in polynomial time. The NP-hardness is a worst-case result, and we emphasise two points. Firstly, it may be possible to solve the ML network reconstruction for some classes of practical problems within a reasonable computation time. It remains to be studied which viral state sequences result (possibly on average) in a low computational complexity. Secondly, considering the inapproximability results for the maximum cut problem [15], one might be tempted to conclude that an accurate reconstruction of the network for SIS processes is not possible in polynomial time. However, a thorough analysis of the accuracy of the exact ML estimator of an unweighted (and hence discrete-valued) adjacency matrix remains an open question.

Acknowledgements

We are grateful to Jaron Sanders for helpful discussions on this material.

References

  • [1] B. Prasse and P. Van Mieghem, “Exact network reconstruction from complete SIS nodal state infection information seems infeasible,” submitted.
  • [2] H. L. Bodlaender, On the Complexity of the Maximum Cut Problem, 1991.
  • [3] P. Van Mieghem, Performance Analysis of Complex Networks and Systems. Cambridge University Press, 2014.
  • [4] K. Devriendt and P. Van Mieghem, “Tighter spectral bounds for the cut size, based on Laplacian eigenvectors,” submitted.
  • [5] P. Van Mieghem and K. Devriendt, “An epidemic perspective on the cut size in networks,” Delft University of Technology, vol. 1, no. 19, 2015.
  • [6] M. R. Garey, D. S. Johnson, and L. Stockmeyer, “Some simplified NP-complete graph problems,” Theoretical Computer Science, vol. 1, no. 3, pp. 237–267, 1976.
  • [7] T. H. Cormen et al., Introduction to Algorithms. MIT Press, 2009.
  • [8] A. Caprara, “Constrained 0–1 quadratic programming: Basic approaches and extensions,” European Journal of Operational Research, vol. 187, no. 3, pp. 1494–1503, 2008.
  • [9] E. Boros and P. L. Hammer, “Pseudo-Boolean optimization,” Discrete Applied Mathematics, vol. 123, no. 1–3, pp. 155–225, 2002.
  • [10] I. G. Rosenberg, “Reduction of bivalent maximization to the quadratic case,” Cahiers du Centre d’Études de Recherche Opérationnelle, vol. 17, pp. 71–74, 1975.
  • [11] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, New York, 1979.
  • [12] J.-C. Picard and H. D. Ratliff, “Minimum cuts and related problems,” Networks, vol. 5, no. 4, pp. 357–370, 1975.
  • [13] P. M. Pardalos and S. Jha, “Graph separation techniques for quadratic zero-one programming,” Computers & Mathematics with Applications, vol. 21, no. 6–7, pp. 107–113, 1991.
  • [14] F. Barahona, “A solvable case of quadratic 0–1 programming,” Discrete Applied Mathematics, vol. 13, no. 1, pp. 23–26, 1986.
  • [15] B. Gärtner and J. Matoušek, Approximation Algorithms and Semidefinite Programming. Springer, 2012.
  • [16] J. Edmonds and E. L. Johnson, “Matching, Euler tours and the Chinese postman,” Mathematical Programming, vol. 5, no. 1, pp. 88–124, 1973.

Appendix A Proof of Lemma 6

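The objective function of (11) equals

$$\log P\big[x[0], \ldots, x[n] \,\big|\, \hat{a}_{13}, \ldots, \hat{a}_{1N}\big] = \log P\big[x[0]\big] + \sum_{k=0}^{n-1} \log P\big[x[k+1] \,\big|\, x[k], \hat{a}_{13}, \ldots, \hat{a}_{1N}\big], \tag{17}$$

where the equality follows from the Markov property of the sampled-time SIS process. To reduce the zero-one UQP (6) to the reduced-size SIS network reconstruction problem (11), we show below that, for all adjacency matrices $A \in \mathcal{A}$, it is possible to construct a series of viral state transitions $x[k] \to x[k+1]$ for the time points $k = 0, \ldots, n-1$ such that the objective function of the latter problem is of the form

$$\sum_{i=3}^{N} v_i\, \hat{a}_{1i} + \sum_{i=3}^{N} \sum_{j=i+1}^{N} w_{ij}\, \hat{a}_{1i} \hat{a}_{1j} + \kappa, \tag{18}$$

with the coefficients $w_{ij}$ and $v_i$ and an additive term $\kappa$ which is constant with respect to the links $\hat{a}_{13}, \ldots, \hat{a}_{1N}$ and, hence, can be omitted in the optimisation problem (11). We prove Lemma 6 in five steps, which we elaborate on in Subsections A.1 to A.5.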

  1. We design a viral state transition $\tau_{ij}$ which sets the quadratic cost $w_{ij}$ of (18) to a negative value. In Subsection A.5, we show that the resulting coefficient $\tilde{q}_{ij}$ of (14) attains one of the two values in the set (9), depending on whether the viral state transition $\tau_{ij}$ occurs or not.

  2. We design a viral state transition $\tau_i^+$ which sets the linear cost $v_i$ of (18) to a positive value, namely $\log 2$.

  3. We design a viral state transition $\tau_i^-$ which sets the linear cost $v_i$ of (18) to a negative value, given by (25).

  4. We show how two of the transitions $\tau_{ij}$, $\tau_i^+$ and $\tau_i^-$ can be connected by constructing a suitable transition sequence.

  5. We show that it is possible to construct a viral state sequence which is composed of several of the three kinds of viral state transitions $\tau_{ij}$, $\tau_i^+$ and $\tau_i^-$. If the viral state transition $\tau_i^+$ occurs multiple times, then the value of the coefficient $v_i$ increases. On the other hand, if the viral state transition $\tau_i^-$ occurs multiple times, then the value of the coefficient $v_i$ decreases³. By choosing the multiplicity of the occurrences of the viral state transitions $\tau_{ij}$, $\tau_i^+$ and $\tau_i^-$, we show that the reduced-size SIS network reconstruction (11) becomes a zero-one UQP of the form (14).

³ Lemma 7 shows that the coefficient $\tilde{c}_i$ can be set arbitrarily close to any value in the set (10) by adjusting the number of occurrences of the transitions $\tau_i^+$ and $\tau_i^-$.

a.1 Setting the Quadratic Costs

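In order to set the coefficients $w_{ij}$ for $i = 3, \ldots, N$ and $j = i+1, \ldots, N$, corresponding to the terms $\hat{a}_{1i} \hat{a}_{1j}$ in the objective function (18), we construct the following special case of an infectious transition (3). The links $\hat{a}_{1i}$ and $\hat{a}_{1j}$ appear simultaneously in the probability of the infectious transition (3) if both node $i$ and node $j$ are infected at time $k$, i.e. $x_i[k] = x_j[k] = 1$, and node 1 becomes infected at time $k+1$, i.e. $x_1[k] = 0$ and $x_1[k+1] = 1$. We choose the viral state of node 2 as $x_2[k] = 1$⁴ and define the transition

$$\tau_{ij}: \quad x[k] = e_2 + e_i + e_j \;\to\; x[k+1] = e_1 + e_2 + e_i + e_j.$$

The elements of the vector $e_l$ are given by $(e_l)_m = \delta_{lm}$, where $\delta_{lm}$ is the Kronecker delta. The transition $\tau_{ij}$ is a special case of an infectious transition (3) and, since $\hat{a}_{12} = a_{12} = 1$ in the reduced-size SIS network reconstruction (11), its transition probability is given by

$$P\big[x[k+1] = e_1 + e_2 + e_i + e_j \,\big|\, x[k] = e_2 + e_i + e_j, \hat{a}_{13}, \ldots, \hat{a}_{1N}\big] = \beta_T \big(1 + \hat{a}_{1i} + \hat{a}_{1j}\big). \tag{19}$$

To compute the objective function according to (17), we express the logarithm of the transition probability (19) more compactly as

$$\log\big(\beta_T (1 + \hat{a}_{1i} + \hat{a}_{1j})\big) = \log \beta_T + \log(2)\, \big(\hat{a}_{1i} + \hat{a}_{1j}\big) + \log\!\big(\tfrac{3}{4}\big)\, \hat{a}_{1i} \hat{a}_{1j}, \tag{20}$$

which holds for all $\hat{a}_{1i}, \hat{a}_{1j} \in \{0, 1\}$.

If solely the transition $\tau_{ij}$ occurred once, then it follows from (20) that the quadratic cost of (18) would equal $w_{ij} = \log(3/4)$. We emphasise that the transitions $\tau_{ij}$ only need to occur for $3 \le i < j \le N$, since the quadratic coefficients in the objective function (18) only occur for those values of $i$ and $j$.

⁴ If nodes $i$ and $j$ were the only infected nodes at time $k$, then the transition probability (19) would equal zero if both elements $\hat{a}_{1i} = 0$ and $\hat{a}_{1j} = 0$. In that case, we would not be able to express the logarithm of the transition probability in the form (20) for all values of the elements $\hat{a}_{1i}, \hat{a}_{1j} \in \{0, 1\}$.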
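The decomposition (20) can be checked numerically for all four combinations of $\hat{a}_{1i}, \hat{a}_{1j} \in \{0, 1\}$. This Python sketch (function names hypothetical) compares $\log(\beta_T (1 + \hat{a}_{1i} + \hat{a}_{1j}))$, the logarithm of the probability that node 1 becomes infected when nodes 2, $i$ and $j$ are infected and $a_{12} = 1$, with the sum of a constant, the linear $\log 2$ terms and the quadratic $\log(3/4)$ term; the quadratic coefficient $\log(3/4)$ is negative.

```python
import math
from itertools import product

def log_infection_of_node1(beta_T, a_1i, a_1j):
    """Log-probability that node 1 gets infected while nodes 2, i, j are infected (a_12 = 1)."""
    return math.log(beta_T * (1 + a_1i + a_1j))

def decomposition_20(beta_T, a_1i, a_1j):
    """Constant + linear (log 2) + quadratic (log 3/4) form of the same logarithm."""
    return (math.log(beta_T)
            + math.log(2) * (a_1i + a_1j)
            + math.log(3 / 4) * a_1i * a_1j)

def check_20(beta_T):
    """True if both expressions agree for every binary pair (a_1i, a_1j)."""
    return all(
        math.isclose(log_infection_of_node1(beta_T, a, b), decomposition_20(beta_T, a, b))
        for a, b in product((0, 1), repeat=2)
    )
```

The only non-trivial case is $\hat{a}_{1i} = \hat{a}_{1j} = 1$, where $2 \log 2 + \log(3/4) = \log 3$.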

a.2 Setting the Linear Costs to a Positive Value

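In order to set the coefficients $v_i$, corresponding to the terms $\hat{a}_{1i}$ in the objective function (18), to a positive value $\log 2$, we construct the following special case of an infectious transition (3). The link $\hat{a}_{1i}$ appears in the probability of the infectious transition (3) if node $i$ is infected at time $k$, i.e. $x_i[k] = 1$, and node 1 becomes infected at time $k+1$, i.e. $x_1[k] = 0$ and $x_1[k+1] = 1$. Analogously to Subsection A.1, we choose the viral state of node 2 as $x_2[k] = 1$ and define the transition

$$\tau_i^+: \quad x[k] = e_2 + e_i \;\to\; x[k+1] = e_1 + e_2 + e_i.$$

The transition $\tau_i^+$ is a special case of an infectious transition (3). Since $\hat{a}_{12} = a_{12} = 1$ in the reduced-size SIS network reconstruction (11), the transition probability of $\tau_i^+$ is given by

$$P\big[x[k+1] = e_1 + e_2 + e_i \,\big|\, x[k] = e_2 + e_i, \hat{a}_{13}, \ldots, \hat{a}_{1N}\big] = \beta_T \big(1 + \hat{a}_{1i}\big). \tag{21}$$

To compute the objective function according to (17), we obtain the logarithm of the transition probability (21) as

$$\log\big(\beta_T (1 + \hat{a}_{1i})\big) = \log \beta_T + \log(2)\, \hat{a}_{1i}. \tag{22}$$

If solely the transition $\tau_i^+$ occurred once, then it follows from (22) that the linear cost of (18) would equal $v_i = \log 2$.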

a.3 Setting the Linear Costs to a Negative Value

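In order to set the coefficients $v_i$, corresponding to the terms $\hat{a}_{1i}$ in the objective function (18), to a negative value, we construct the following special case of a constant transition (4). The link $\hat{a}_{1i}$ appears in the probability of the constant transition (4) if node 1 is susceptible and node $i$ is infected ($x_1[k] = 0$ and $x_i[k] = 1$). Hence, we define the transition

$$\tau_i^-: \quad x[k] = e_i \;\to\; x[k+1] = e_i. \tag{23}$$

The transition $\tau_i^-$ is a special case of a constant transition (4), and its transition probability can be calculated as follows. From time $k$ to time $k+1$, the probability of the infection of a node $l \neq i$ is

$$P\big[x[k+1] = e_i + e_l \,\big|\, x[k] = e_i, \hat{a}_{13}, \ldots, \hat{a}_{1N}\big] = \beta_T\, \hat{a}_{li}.$$

The probability of an infection of any node at time $k$ is hence

$$\sum_{l \neq i} \beta_T\, \hat{a}_{li} = \beta_T\, \hat{a}_{1i} + \beta_T \sum_{l=2}^{N} a_{il},$$

since $\hat{a}_{il} = a_{il}$ for $2 \le l \le N$ in the reduced-size SIS network reconstruction (11). The probability of the curing (2) of node $i$ equals $\delta_T$. Thus, the probability of the constant transition (23) becomes

$$P\big[x[k+1] = e_i \,\big|\, x[k] = e_i, \hat{a}_{13}, \ldots, \hat{a}_{1N}\big] = \sigma_i - \beta_T\, \hat{a}_{1i}, \tag{24}$$

where

$$\sigma_i = 1 - \delta_T - \beta_T \sum_{l=2}^{N} a_{il}$$

is constant with respect to the links $\hat{a}_{13}, \ldots, \hat{a}_{1N}$ and does not have to be considered in the optimisation problem (11). It holds that $\sigma_i - \beta_T \hat{a}_{1i}$ is in $[0, 1]$ for all link estimates $\hat{a}_{1i} \in \{0, 1\}$, which implies that $\beta_T \le \sigma_i$. To compute the objective function according to (17), we obtain the logarithm of the transition probability (24) as

$$\log\big(\sigma_i - \beta_T\, \hat{a}_{1i}\big) = \log \sigma_i + \log\!\Big(1 - \frac{\beta_T}{\sigma_i}\Big)\, \hat{a}_{1i}. \tag{25}$$

If solely the transition $\tau_i^-$ occurred once, then it follows from (25) that the linear cost of (18) would equal $v_i = \log(1 - \beta_T / \sigma_i)$, which is negative.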
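Analogously, the decomposition (25) can be checked against the constant-transition probability computed directly from (4) for the state in which only node $i$ is infected. In the Python sketch below (names hypothetical, index 0 plays node 1), $\sigma_i = 1 - \delta_T - \beta_T \sum_{l=2}^{N} a_{il}$ is computed from the links that do not involve node 1, and the linear coefficient $\log(1 - \beta_T / \sigma_i)$ is confirmed to be negative.

```python
import math

def no_change_probability(A, i, beta_T, delta_T):
    """(4) for the state x = e_i: node i may cure, each neighbour l of i may get infected."""
    return 1.0 - delta_T - beta_T * sum(A[l][i] for l in range(len(A)) if l != i)

def check_25(A, i, beta_T, delta_T):
    """Compare log of (24) with the decomposition (25) for a_1i in {0, 1}.

    Index 0 plays node 1; sigma_i only uses links among indices 1..N-1.
    """
    sigma = 1.0 - delta_T - beta_T * sum(A[i][l] for l in range(1, len(A)))
    slope = math.log(1.0 - beta_T / sigma)  # linear cost, negative for beta_T > 0
    ok = slope < 0
    for a in (0, 1):
        A_hat = [row[:] for row in A]
        A_hat[0][i] = A_hat[i][0] = a  # vary only the link between node 1 and node i
        p = no_change_probability(A_hat, i, beta_T, delta_T)
        ok = ok and math.isclose(math.log(p), math.log(sigma) + slope * a)
    return ok
```

For a path on four nodes with $i$ the third node, $\beta_T = 0.05$ and $\delta_T = 0.1$, one gets $\sigma_i = 0.8$ and the probabilities $0.8$ and $0.75 = 0.8 \cdot 0.9375$ for $\hat{a}_{1i} = 0$ and $1$.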

a.4 Connecting Viral State Transitions

In order to set the coefficients $v_i$ and $w_{ij}$ for more than one node $i$ (or for more than one pair of nodes $i$ and $j$), the transitions $\tau_{ij}$, $\tau_i^+$ and $\tau_i^-$ must occur multiple times in the viral state sequence for different values of $i$ and $j$. Suppose that one of the transitions $\tau_{ij}$, $\tau_i^+$ or $\tau_i^-$ occurs from time $k_1$ to $k_1 + 1$ and that another (not necessarily different) of these transitions shall occur from time $k_2$ to $k_2 + 1$ for some $k_2 > k_1$. For any connected adjacency matrix $A \in \mathcal{A}$, there is a viral state sequence $x[k_1 + 1], \ldots, x[k_2]$ which transforms the viral state $x[k_1 + 1]$ at the end of the first transition into the viral state $x[k_2]$ at the beginning of the second transition, as we show in the three steps below.

  1. If the transition from time $k_1$ to $k_1 + 1$ is one of the infectious transitions $\tau_{ij}$ or $\tau_i^+$, then node 1 is infected at time $k_1 + 1$. In that case, we consider that node 1 cures from time $k_1 + 1$ to $k_1 + 2$. In the two steps below, formally replace time $k_1 + 1$ by $k_1 + 2$.

  2. The expressions (20), (22) and (25) influence the values of the coefficients $v_i$ and $w_{ij}$ in the objective function (18). In order to give explicit expressions for these coefficients, we want the viral state transitions from time $k_1 + 1$ to $k_2$ to have no influence on any of the coefficients $v_i$ and $w_{ij}$, such that their values are determined solely by the expressions (20), (22) and (25).

    The coefficients $v_i$ and $w_{ij}$ correspond to addends in the objective function (18) which include the links $\hat{a}_{13}, \ldots, \hat{a}_{1N}$, which are incident to node 1. A link $\hat{a}_{1l}$, which is incident to node 1, appears in the probability of a viral state transition of the sampled-time SIS process in exactly two cases. Firstly, it appears in the probability of an infectious transition (3) from time $k$ to $k+1$ only if node 1 is infected before or afterwards ($x_1[k] = 1$ or $x_1[k+1] = 1$). Secondly, the link $\hat{a}_{1l}$ may appear in the probability of a constant transition (4) from time $k$ to $k+1$. We thus exclude these two kinds of transitions from time $k_1 + 1$ to $k_2$.

    Hence, we construct the viral state transitions from time $k_1 + 1$ to $k_2$ such that node 1 is constantly susceptible ($x_1[k] = 0$ for $k = k_1 + 1, \ldots, k_2$) and, additionally, such that there is no constant transition (4) from time $k_1 + 1$ to $k_2$. Then, the coefficients $v_i$ and $w_{ij}$ in the objective function (18) are not affected by any of the viral state transitions from time $k_1 + 1$ to $k_2$ and are determined solely by the expressions (20), (22) and (25).

  3. The graph given by an adjacency matrix $A \in \mathcal{A}$ remains connected if node 1 is removed, as stated above Definition 5. Thus, there exist a time $k_2$ and a finite sequence of non-constant transitions of the SIS process which transforms the viral state $x[k_1 + 1]$ into any other viral state $x[k_2]$ under the constraint that node 1 is susceptible from time $k_1 + 1$ to $k_2$. The simplest such transition sequence consists of successive infections (3), resulting in all nodes except node 1 being infected, followed by the curing (2) of those nodes $l$ for which $x_l[k_2] = 0$ shall hold.
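The three steps above translate into a short procedure. The Python sketch below is a hypothetical helper, not the authors' code: index 0 plays node 1, the graph without node 0 is assumed to be connected, the target state is assumed to have node 0 susceptible, and the start state is assumed to contain an infected node other than node 0, which holds at the end of each transition constructed in Subsections A.1 to A.3.

```python
from collections import deque

def connect_states(adj, start, target):
    """Viral states from `start` to `target` following the three steps of Subsection A.4.

    adj maps each node 0..N-1 to its neighbours. Consecutive returned states differ
    in exactly one node, so no constant transition (4) occurs.
    """
    x = list(start)
    seq = [tuple(x)]
    if x[0] == 1:  # step 1: node 1 cures after an infectious transition
        x[0] = 0
        seq.append(tuple(x))
    # step 3, first part: infect every node except node 0 through infected neighbours
    queue = deque(v for v in adj if v != 0 and x[v] == 1)
    if not queue:
        raise ValueError("need an infected node other than node 0")
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v != 0 and x[v] == 0:
                x[v] = 1
                seq.append(tuple(x))
                queue.append(v)
    # step 3, second part: cure the nodes that are susceptible in the target state
    for v in adj:
        if v != 0 and target[v] == 0:
            x[v] = 0
            seq.append(tuple(x))
    return seq
```

Every step is a single curing (2) or infection (3), and node 1 stays susceptible after the first step, so none of the intermediate transitions depends on $\hat{a}_{13}, \ldots, \hat{a}_{1N}$.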

For a network of six nodes, Figure 1 illustrates how two infectious transitions can be connected by the viral state sequence described in the three steps above.

Figure 1: An illustration of connecting two viral state transitions, one from time $k_1$ to $k_1 + 1$ and one from time $k_2$ to $k_2 + 1$, for a connected network of six nodes by the procedure described in Subsection A.4. Above the blue arrows, the respective transition probabilities are stated. It holds that $\hat{a}_{12} = a_{12}$ and $\hat{a}_{ij} = a_{ij}$ for $2 \le i, j \le N$ in the optimisation problem (11), and thus the transition probabilities from time $k_1 + 1$ to $k_2$ can be stated without dependence on $\hat{a}_{13}, \ldots, \hat{a}_{1N}$. On the other hand, both connected transitions do depend on the elements $\hat{a}_{13}, \ldots, \hat{a}_{1N}$. Since the transition from time $k_1$ to $k_1 + 1$ is an infectious transition, node 1 cures from time $k_1 + 1$ to $k_1 + 2$ according to step one in Subsection A.4. Then, following steps two and three of Subsection A.4, every node except node 1 becomes infected. Subsequently, the nodes 3, 4 and 5 cure, such that the viral state at time $k_2$ is the first state of the second transition. In Subsection A.5, the connecting viral state sequence is also denoted by