Resilience Bounds of Sensing-Based Network Clock Synchronization

09/10/2018 ∙ by Rui Tan, et al.

Recent studies exploited external periodic synchronous signals to synchronize a pair of network nodes to address a threat of delaying the communications between the nodes. However, the sensing-based synchronization may yield faults due to nonmalicious signal and sensor noises. This paper considers a system of N nodes that will fuse their peer-to-peer synchronization results to correct the faults. Our analysis gives the lower bound of the number of faults that the system can tolerate when N is up to 12. If the number of faults is no greater than the lower bound, the faults can be identified and corrected. We also prove that the system cannot tolerate more than N-2 faults. Our results can guide the design of resilient sensing-based clock synchronization systems.


I Introduction

For distributed systems such as sensor networks, accurate clock synchronization among the distributed nodes is important. Correct timestamps give meaning to the sensed data; synchronized clocks enable punctual coordinated operations among the nodes. In contrast, desynchronized clocks undermine system performance and can even lead to physical damage and system disruptions in time-critical systems. However, various factors make it challenging to maintain resilient clock synchronization in distributed systems, such as large network sizes, deep embedding of the nodes into complex physical environments with various disturbances, and exposure of the systems to cybersecurity threats.

Network Time Protocol (NTP) [1] is the foremost means of clock synchronization that is widely known and adopted. Its design principle of estimating the offset between the clocks of a pair of nodes based on the network transmission delays of the synchronization packets is also a basis for many other clock synchronization protocols such as Precision Time Protocol (PTP) [2] for industrial Ethernets and RBS [3], TPSN [4], and FTSP [5] for sensor networks. However, as discussed in RFC 7384 [6], the NTP principle is susceptible to various cybersecurity threats. While most of the vulnerabilities can be solved by conventional security measures such as cryptographic authentication and encryption, a simple packet delay attack that delays the transmissions of the synchronization packets has remained as an open issue that cannot be solved by conventional security measures [6, 7, 8].

To address the packet delay attack, our previous studies [9, 10] have developed sensing-based clock synchronization approaches exploiting external periodic signals that are practically difficult for the attacker to tamper with or jam. Specifically, in [9], the minute fluctuations of the power grid voltage cycle lengths, which are similar across a geographic area served by the same power grid, are used as a time fingerprint to develop a clock synchronization approach that is secure against the packet delay attack. In [10], the power grid voltage phase, which is nearly identical at any time within a city-scale power grid, is integrated into the NTP principle and achieves security against the packet delay attack as long as a verifiable condition is satisfied.

These sensing-based approaches focus on the peer-to-peer (p2p) clock synchronization for a node pair. Although they address the cybersecurity concern regarding the packet delay attack well, they may be susceptible to the process noises of the external signals and sensor hardware noises/faults. For instance, as shown in [9], an insufficiently long time fingerprint may lead to faults in estimating the clock offset between a pair of nodes. In [10], when the round-trip time of an NTP synchronization session exceeds twice the power grid voltage cycle, the approach will yield multiple clock offset estimates, causing ambiguity. Given the criticality of trustworthy clock synchronization, it is important to develop methods with understood resilience bounds to deal with the nonmalicious synchronization faults of the sensing-based clock synchronization approaches.

In this paper, based on a general class of p2p sensing-based clock synchronization, we study the resilience of network clock synchronization for a network of $N$ nodes against the p2p synchronization faults. Upon the occurrence of a fault between a pair of nodes, the measured offset between the two nodes' clocks will have an error that is a multiple of the period of the external signal used. In the network clock synchronization, every node pair in the network performs a p2p clock synchronization session and returns the measured clock offset to a central node. Based on a total of $N(N-1)/2$ clock offset measurements, the central node uses an algorithm to estimate the offsets of all nodes' clocks from a selected reference node's clock, while accounting for the possible p2p synchronization faults. Specifically, each step of the algorithm assumes that $n$ out of the $N(N-1)/2$ p2p synchronization sessions are faulty, exhaustively tests all possible distributions of these $n$ faulty p2p synchronization sessions, and yields a solution once the estimated clock offsets and the estimated p2p clock synchronization faults agree with all the p2p clock offset measurements. Starting from $n = 0$, the algorithm increases $n$ by one in each step and terminates once a solution is found. Thus, this algorithm does not require any run-time knowledge about the p2p synchronization faults, including the number of the faults and their distribution among the p2p synchronization sessions.

$N$                                  4    5    6    7    8    9   10   11   12
Lower bound of tolerable faults      1    1    2    2    2    3    4    5    5
Lower bound of tolerance (%)        17   10   13   10    7    8    7    9    8

TABLE I: Lower bound of tolerable faults.

Based on the algorithm, we ask basic questions regarding the scaling laws of system resilience, such as how many p2p synchronization faults any $N$-node system can tolerate, in the sense that the algorithm will not give wrong estimates of the clock offsets and the p2p clock synchronization faults. Our analysis gives the lower bound of the number of p2p synchronization faults that any $N$-node system can tolerate when $N$ is up to 12. The result is given in Table I. If the number of faults is no greater than the lower bound, the faults can be identified and corrected by the algorithm. By defining the tolerance as the ratio of the number of tolerable faults to the total number of p2p synchronization sessions, the third row of Table I shows the lower bound of the tolerance. Our results can guide the design of network clock synchronization systems with potential p2p synchronization faults. Moreover, we prove that no $N$-node system can tolerate more than $N-2$ p2p synchronization faults.

When the number of faults is greater than the lower bound given in Table I and no greater than the upper bound, whether the system can tolerate the faults is still an open issue. It is of great interest for future research to explore the tight bound of the fault tolerance.

The remainder of this paper is organized as follows. Section II reviews related work. Section III introduces the background and states the problem. Section IV vectorizes the problem and analyzes the resilience under certain settings. Section V analyzes the resilience bounds. Section VI concludes the paper.

II Related Work

Highly stable time sources are often ill-suited for sensor networks. Despite an initial study of using chip-scale atomic clocks (CSACs) on sensor platforms [11], CSACs are still too expensive ($1,500 per unit [11]) for wide adoption. The Global Positioning System (GPS) and several timekeeping radio stations (e.g., WWVB in the U.S.) can provide highly stable global time. However, GPS and radio receivers have various limitations such as high power consumption, poor signal reception in indoor environments (e.g., 47% good time for WWVB [12]), and susceptibility to wireless spoofing attacks [13]. Thus, GPS and radio receivers are often used on a limited number of time masters with clear sky views, carefully installed antennas, and sufficient physical air gap to provide global time to a large number of slave nodes via some clock synchronization protocol (e.g., NTP). The resilience of this clock synchronization protocol between the master and the slaves is the focus of this paper.

Various sensing-based approaches exploit external periodic signals for clock synchronization [9, 10, 14], time fingerprinting [15, 16, 9, 17], and clock calibration [18, 19, 20, 21]. Time fingerprinting approaches focus on studying the global time information embedded in the sensing data such as microseisms [15], sunlight [16], and powerline electromagnetic radiation (EMR) [17]. They can be a basis for clock synchronization. For instance, the secure clock synchronization approach in [9] is based on the time fingerprints found in power grid voltage. These studies focus on the p2p synchronization. In this paper, we study the resilience bounds of network clock synchronization against p2p synchronization faults.

Different from clock synchronization, which ensures that the clocks have the same value, clock calibration ensures that different clocks advance at the same speed. The approaches presented in [18, 19, 20, 21] exploit powerline EMR, fluorescent lamp flickering, Wi-Fi beacons, and FM Radio Data System broadcasts to calibrate clocks. However, clock calibration does not address the resilience issues of clock synchronization. In particular, the sensing-based clock calibration is also prone to faults that can subvert the network clock synchronization.

The resilience of network clock synchronization against Byzantine clock faults has been studied [22, 23]. A Byzantine faulty clock gives an arbitrary clock value whenever it is read. It has been proved that, to guarantee the synchronization of non-faulty clocks in the presence of $m$ faulty clocks, a total of at least $3m+1$ clocks are needed. Different from the Byzantine faulty clock model, we consider faulty p2p synchronization sessions between clocks. Converting our problem to the Byzantine clock synchronization problem by treating either node involved in a faulty p2p synchronization session as a faulty clock is invalid, because the faulty clock after the conversion is not a Byzantine faulty clock unless all p2p synchronization sessions involving this clock are faulty. As our problem does not make this assumption, the resilience bound obtained in [22, 23] is not applicable to our problem.

III Background and Problem Statement

III-A Background and Preliminaries

Fig. 1: Principle of sensing-based p2p clock synchronization.

III-A1 Sensing-based p2p clock synchronization

This section describes the principle of the sensing-based p2p clock synchronization that exploits external periodic signals. Without loss of generality, we assume that the external periodic signals sensed by the two peers, nodes $n_i$ and $n_j$, are two synchronous Dirac combs with the same period $T$. Fig. 1 illustrates the two Dirac combs in the same Newtonian time frame. The objective of the sensing-based p2p clock synchronization is to estimate the offset between $n_i$'s and $n_j$'s clocks by using the Dirac combs.

To simplify the analysis of the clock offset estimation, we assume that the two peers' clocks advance at the same speed, such that the offset between the two clocks is constant within the time period of interest, before any clock is adjusted according to the estimated clock offset to achieve clock synchronization. In existing sensing-based p2p clock synchronization approaches [9, 10, 14], a synchronization session, i.e., the process of estimating the clock offset, takes a short time (e.g., tens of milliseconds in [10]). Typical crystal oscillators found in microcontrollers and personal computers have drift rates of 30 to 50 parts-per-million (ppm) [20]. Thus, the change of the clock offset during a synchronization session of 100 milliseconds is at most 5 microseconds, whereas the clock offset estimation errors of successful synchronization sessions are at sub-millisecond [9, 10] or millisecond levels [14] in practice. Thus, the clock offset estimation errors caused by signal noises are much larger than those caused by the two peers' different clock speeds.
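The 5-microsecond figure above is the product of the drift rate and the session length. A minimal Python sketch of this arithmetic, assuming the worst-case 50 ppm drift rate and a 100 ms session quoted in the text (the function name is illustrative):

```python
def max_offset_change(drift_ppm: float, session_s: float) -> float:
    """Worst-case change (in seconds) of a clock offset over one session."""
    # 1 ppm means the clock gains or loses 1e-6 seconds per second.
    return drift_ppm * 1e-6 * session_s


change_s = max_offset_change(drift_ppm=50, session_s=0.100)
print(f"{change_s * 1e6:.1f} microseconds")
```

The result is two to three orders of magnitude below the sub-millisecond to millisecond estimation errors cited above, which is why the constant-offset assumption is reasonable within one session.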

III-A2 Fault model

A synchronization session is successful (or non-faulty) if it identifies the correspondence between a Dirac impulse of one peer and a Dirac impulse of the other peer that occur at the same Newtonian time instant; otherwise, the synchronization session is faulty. Since the two Dirac combs are synchronous, a successful synchronization session gives a zero clock offset estimation error, whereas a faulty synchronization session gives a clock offset estimation error of $kT$, where $T$ is the period of the Dirac combs and $k$ is a non-zero integer.

III-A3 Other related issues

As shown in [9, 10], if the Dirac combs are practically difficult for the attacker to tamper with or jam, the sensing-based p2p clock synchronization can address the packet delay attack, which is an open issue that cannot be solved by conventional security measures [6, 7, 8]. However, due to process noises of the external signals and sensor hardware noises/faults, the sensing-based p2p synchronization can be faulty. For self-containment of this paper, Appendix A reviews the detailed reasons for the faults. In this paper, we focus on the fault tolerance of sensing-based synchronization. Built upon the secure p2p synchronization [9, 10], the clock synchronization approach presented in this paper is resilient against both the packet delay attacks and synchronization faults.

We note that, in practice, the two Dirac combs may not be perfectly synchronous. For instance, in [9, 10], the time displacement between the two Dirac combs is about 0.05% to 0.5% of the period $T$. This time displacement is the major source of the clock offset estimation error. As the time displacements are much smaller than the synchronization faults (i.e., non-zero integer multiples of $T$), we can easily classify successful and faulty synchronization sessions by comparing the clock offset estimation error with a threshold. For simplicity of exposition, we ignore the time displacement in our analysis regarding the system resilience against faulty synchronization sessions.
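The threshold-based classification described above can be sketched in a few lines of Python. This is only an illustration: the threshold of half a period, the 20 ms period in the example, and the function name are assumptions made here, not values taken from the paper.

```python
def classify_session(error_s: float, period_s: float) -> tuple[str, int]:
    """Classify a session from its clock offset estimation error.

    Returns a label and the nearest integer multiple of the period.
    """
    threshold_s = period_s / 2  # assumed threshold, chosen for illustration
    label = "faulty" if abs(error_s) > threshold_s else "successful"
    multiple = round(error_s / period_s)  # the integer k in an error of k * T
    return label, multiple


T = 0.02                  # assumed 20 ms period (e.g., a 50 Hz signal)
displacement = 0.005 * T  # 0.5% of the period, the upper end quoted above

ok = classify_session(displacement, T)
late = classify_session(2 * T + displacement, T)
early = classify_session(-T + displacement, T)
```

Because the displacement is at most a fraction of a percent of the period, any threshold well between the displacement and one full period separates the two classes.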

III-B Network Clock Synchronization

To improve the robustness of clock synchronization against p2p synchronization faults, this section proposes an approach to cross-check the p2p synchronization results among multiple nodes and correct the faults if present.

Consider a system of $N$ nodes: $n_0, n_1, \ldots, n_{N-1}$. Let $\delta_{ij}$ denote the offset between the clocks of $n_i$ and $n_j$, which is unknown and to be estimated. Specifically, $\delta_{ij} = c_i(t) - c_j(t)$, where $c_i(t)$ and $c_j(t)$ are the clock values of $n_i$ and $n_j$ at any given time instant $t$, respectively. As discussed in Section III-A, we assume that $\delta_{ij}$ is time-invariant. By designating $n_0$ as the reference node and writing $\delta_i = \delta_{i0}$, we have $\delta_{ij} = \delta_i - \delta_j$. Any pair of two nodes, $n_i$ and $n_j$, will perform a synchronization session using the sensing-based p2p clock synchronization to measure $\delta_{ij}$. Denote by $s_{ij}$ the synchronization session between $n_i$ and $n_j$. Denote by $d_{ij}$ the measured clock offset. If the synchronization session is successful, $d_{ij} = \delta_{ij}$; if the synchronization session is faulty, $d_{ij} = \delta_{ij} + e_{ij}$, where $e_{ij}$ is the p2p synchronization fault. Every node pair performs a p2p synchronization session. Thus, there will be a total of $N(N-1)/2$ p2p synchronization sessions.

All the clock offset measurements are transmitted to a central node, which runs a fault-tolerant network clock synchronization algorithm. Denote by $\hat{\delta}_i$ and $\hat{e}_{ij}$ the estimates for $\delta_i$ and $e_{ij}$, respectively. A general equation system assuming all the p2p synchronization sessions are faulty is

$$\hat{\delta}_i - \hat{\delta}_j + \hat{e}_{ij} = d_{ij}, \quad 0 \le i < j \le N-1, \quad \hat{\delta}_0 = 0. \tag{1}$$

The variables to be solved are the unknowns $\hat{\delta}_i$ and $\hat{e}_{ij}$, where $\hat{\delta}_i$ is the estimated clock offset between $n_i$ and the reference node $n_0$; $\hat{e}_{ij}$ is the estimated p2p clock synchronization fault between $n_i$ and $n_j$.

If the network clock synchronization algorithm considers that a total of $n$ p2p synchronization sessions are faulty, it keeps $n$ estimated p2p synchronization faults in Eq. (1) and removes the other estimated p2p synchronization faults. Thus, there will be $\binom{N(N-1)/2}{n}$ possible distributions of the $n$ estimated p2p synchronization faults among a total of $N(N-1)/2$ p2p synchronization sessions. Algorithm 1 shows the pseudocode of the algorithm. It starts by assuming there are no faults (i.e., $n = 0$). In each iteration, which increases $n$ by one, it solves Eq. (1) for all possible distributions of the estimated p2p synchronization faults. Once a solution is found, Algorithm 1 returns.

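Input: the measured clock offsets of all the $N(N-1)/2$ p2p synchronization sessions
Output: the estimated clock offsets and the estimated p2p synchronization faults
1:  $n = 0$
2:  while $n \le N(N-1)/2$ do
3:     for each distribution of the $n$ estimated p2p synchronization faults among the $N(N-1)/2$ p2p synchronization sessions do
4:         if the corresponding Eq. (1) has a solution then
5:            return the solution
6:         end if
7:     end for
8:     $n = n + 1$
9:  end while
Algorithm 1 Fault-tolerant network clock synchronization.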

Algorithm 1 requires neither the number nor the distribution of the actual p2p synchronization faults, and it is executed on a central node. Whether it can correct the faults and how many faults it can tolerate are the focus of this paper.
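A minimal Python sketch of Algorithm 1 follows, under stated assumptions: node 0 is the reference, the measurement for a pair (i, j) with i < j is modeled as offset_i - offset_j plus a fault, and a least-squares residual below a small tolerance stands in for "Eq. (1) has a solution". The function names, the sign convention, the tolerance, and the 20 ms period in the example are illustrative choices, not taken from the paper.

```python
from itertools import combinations

import numpy as np


def offset_matrix(N):
    """Coefficient matrix of the offsets of nodes 1..N-1 (node 0 is the reference)."""
    pairs = list(combinations(range(N), 2))
    A = np.zeros((len(pairs), N - 1))
    for row, (i, j) in enumerate(pairs):
        if i > 0:
            A[row, i - 1] = 1.0
        A[row, j - 1] = -1.0
    return pairs, A


def fault_tolerant_sync(N, measured, atol=1e-9):
    """measured maps each pair (i, j), i < j, to its measured clock offset."""
    pairs, A = offset_matrix(N)
    b = np.array([measured[p] for p in pairs], dtype=float)
    for n in range(len(pairs) + 1):                      # assumed number of faults
        for rows in combinations(range(len(pairs)), n):  # assumed faulty sessions
            B = np.zeros((len(pairs), n))
            for col, row in enumerate(rows):
                B[row, col] = 1.0
            M = np.hstack([A, B])
            x, *_ = np.linalg.lstsq(M, b, rcond=None)
            if np.allclose(M @ x, b, rtol=0.0, atol=atol):  # consistent system
                faults = {pairs[r]: x[N - 1 + c] for c, r in enumerate(rows)}
                return x[:N - 1], faults
    raise RuntimeError("unreachable: assuming every session faulty always fits")


# Example: 4 nodes, one faulty session of three periods between nodes 1 and 2.
T = 0.02
true_offsets = np.array([0.25, -1.5, 2.0])  # offsets of nodes 1..3 from node 0
delta = np.concatenate([[0.0], true_offsets])
measured = {(i, j): delta[i] - delta[j] for i, j in combinations(range(4), 2)}
measured[(1, 2)] += 3 * T
offsets, faults = fault_tolerant_sync(4, measured)
```

The exhaustive search over fault distributions grows combinatorially with the number of sessions, so this sketch is only practical for small networks.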

III-C Problem Statement

Definition 1 ($n$-resilience).

Let $n$ denote the number of faulty p2p synchronization sessions among a total of $N(N-1)/2$ sessions in an $N$-node system. The system with Algorithm 1 is $n$-resilient if the algorithm can correct any $n$ non-zero p2p synchronization faults.∎

From Algorithm 1, we define the $n$-resilience condition that can be used to check whether a system is $n$-resilient.

Definition 2 ($n$-resilience condition).

A system with Algorithm 1 is $n$-resilient if the following conditions are satisfied:

  1. When the number of estimated p2p synchronization faults is smaller than $n$, Eq. (1) constructed with any distribution of the $n$ actual p2p synchronization faults and any distribution of the estimated p2p synchronization faults has no solutions;

  2. When the number of estimated p2p synchronization faults equals $n$, for any distribution of the actual p2p synchronization faults and any distribution of the estimated p2p synchronization faults,

    1. if the distribution of the estimated p2p synchronization faults is identical to the distribution of the actual faults, Eq. (1) has a unique solution;

    2. otherwise, Eq. (1) has no solutions. ∎

Note that in the condition 2)-a) of Definition 2, the unique solution must give the correct estimates of the clock offsets and the p2p synchronization faults.

We aim at analyzing the following resilience bounds:

Definition 3 (Lower bound of maximum resilience).

A function $f(N)$ is a lower bound of maximum resilience if any $N$-node system with Algorithm 1 is $n$-resilient for $n \le f(N)$.

Definition 4 (Upper bound of maximum resilience).

A function $f(N)$ is an upper bound of maximum resilience if any $N$-node system with Algorithm 1 is not $n$-resilient for $n > f(N)$.

Definition 5 (Tight bound of maximum resilience).

A function $f(N)$ is a tight bound of maximum resilience if any $N$-node system with Algorithm 1 is $n$-resilient for $n \le f(N)$ and not $n$-resilient for $n > f(N)$.

IV Vectorization and $n$-Resilience

IV-A Vectorization

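We vectorize the representation of Eq. (1) that is solved by Line 4 of Algorithm 1. Define $\mathbf{x}$ composed of all clock offset estimates, i.e., the estimated offset of every non-reference node's clock from the reference node's clock. Define $\hat{\mathbf{e}}$ composed of the p2p synchronization fault estimates. Eq. (1) can be rewritten as $\mathbf{A}\mathbf{x} + \mathbf{B}\hat{\mathbf{e}} = \mathbf{d}$, where $\mathbf{A}$ and $\mathbf{B}$ are two matrices composed of -1, 0, and 1 containing coefficients corresponding to $\mathbf{x}$ and $\hat{\mathbf{e}}$, respectively; the vector $\mathbf{d}$ consists of all the measured clock offsets. To simplify notation, we define $\mathbf{M} = [\mathbf{A} \; \mathbf{B}]$ and $\mathbf{y} = [\mathbf{x}^\top \; \hat{\mathbf{e}}^\top]^\top$, such that the equation system is $\mathbf{M}\mathbf{y} = \mathbf{d}$. From the Rouché-Capelli theorem [24], the necessary and sufficient condition that $\mathbf{M}\mathbf{y} = \mathbf{d}$ has no solutions is $\operatorname{rank}(\mathbf{M}) < \operatorname{rank}([\mathbf{M} \,|\, \mathbf{d}])$, where $[\mathbf{M} \,|\, \mathbf{d}]$ is the augmented matrix.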
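The Rouché-Capelli test can be sketched with NumPy rank computations. The example below is an illustration under assumptions made here: a 3-node system with node 0 as the reference, sessions ordered as (0, 1), (0, 2), (1, 2), each measurement modeled as offset_i - offset_j plus a fault, and no estimated faults kept in the equation system. The function name and numeric values are hypothetical.

```python
import numpy as np


def has_solution(M, d):
    """Rouché-Capelli: solvable iff rank(M) equals the rank of the augmented matrix."""
    augmented = np.column_stack([M, d])
    return np.linalg.matrix_rank(M) == np.linalg.matrix_rank(augmented)


# Rows: sessions (0, 1), (0, 2), (1, 2). Columns: offsets of nodes 1 and 2.
A = np.array([[-1.0, 0.0],
              [0.0, -1.0],
              [1.0, -1.0]])
true_offsets = np.array([0.25, -1.5])

clean = A @ true_offsets  # all sessions successful
faulty = clean.copy()
faulty[2] += 0.02         # session (1, 2) off by one assumed 20 ms period

clean_ok = has_solution(A, clean)
faulty_ok = has_solution(A, faulty)
```

With fault-free measurements the ranks agree, so the system is solvable; a single fault raises the rank of the augmented matrix and the system with no estimated faults becomes inconsistent.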

IV-B $n$-Resilience under Certain Settings

This section presents the analysis on the $n$-resilience of an $N$-node system with Algorithm 1 under certain settings of $N$ and $n$. This analysis provides insights into the more general analysis of the lower/upper bounds of maximum resilience.

Proposition 1.

A 3-node system is not 1-resilient.

Proof.

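Consider a case where one p2p synchronization session is faulty. When $n = 0$ in Algorithm 1, the vectorized form of Eq. (1) contains only the clock offset estimates as unknowns.

Note that the matrix and the vector associated with the estimated p2p synchronization faults are empty. With a non-zero actual fault, Gaussian elimination shows that the rank of the coefficient matrix is smaller than the rank of the augmented matrix. Thus, the equation system has no solutions and Algorithm 1 will move on to the case of $n = 1$. The algorithm will attempt to test all the possible cases of a single faulty p2p synchronization session. For instance, when the algorithm assumes that a session other than the actual faulty one is faulty, the coefficient matrix is a $3 \times 3$ matrix.

The rank of the coefficient matrix equals the rank of the augmented matrix, and the coefficient matrix has full column rank. Thus, the equation system has a unique solution. Therefore, the condition 2)-b) of Definition 2 is not satisfied and the 3-node system is not 1-resilient. In fact, the unique solution must be a wrong solution, because it attributes the fault to a session that is not faulty. ∎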
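The argument of Proposition 1 can be checked numerically. The Python sketch below is an illustration under assumptions made here: node 0 is the reference, each measurement is modeled as offset_i - offset_j plus a fault, the faulty session is the one between nodes 1 and 2, and the fault equals one assumed 20 ms period. All names and values are hypothetical.

```python
from itertools import combinations

import numpy as np

N, T = 3, 0.02
pairs = list(combinations(range(N), 2))  # [(0, 1), (0, 2), (1, 2)]
A = np.zeros((len(pairs), N - 1))        # columns: offsets of nodes 1 and 2
for row, (i, j) in enumerate(pairs):
    if i > 0:
        A[row, i - 1] = 1.0
    A[row, j - 1] = -1.0

true_offsets = np.array([0.25, -1.5])
d = A @ true_offsets
d[pairs.index((1, 2))] += T  # only session (1, 2) is actually faulty

solutions = {}
ranks = {}
for row, pair in enumerate(pairs):
    B = np.zeros((len(pairs), 1))
    B[row, 0] = 1.0  # the algorithm assumes this session is faulty
    M = np.hstack([A, B])
    ranks[pair] = np.linalg.matrix_rank(M)
    # M is square; if it is invertible, the solution exists and is unique.
    solutions[pair] = np.linalg.solve(M, d)

for pair, sol in solutions.items():
    print(pair, "offsets:", sol[:2], "fault:", sol[2])
```

Every assumed faulty session yields a square, full-rank system, so each wrong assumption also produces a unique solution that fits all measurements. The measurements alone therefore cannot single out the faulty session.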

Proposition 2.

A 4-node system is 1-resilient.

We provide a sketch of the proof instead of the complete proof due to the space limit. Consider a case where one p2p synchronization session is faulty. When $n = 0$ in Algorithm 1, similar to Proposition 1, the equation system has no solutions and Algorithm 1 will move on to the case of $n = 1$. The algorithm will test all the possible cases of a single faulty p2p synchronization session. For instance, when the algorithm assumes that a session other than the actual faulty one is faulty, the vectorized equation system is

(2)

As the rank of the coefficient matrix is smaller than the rank of the augmented matrix, the equation system has no solutions. An exhaustive check shows that, only when the algorithm assumes that the actual faulty synchronization session is faulty, the equation system has a unique solution (i.e., the two ranks are equal and the coefficient matrix has full column rank). Thus, the algorithm can correct the fault. In fact, it can be verified that, for the 4-node system, no matter which p2p synchronization session is faulty, the algorithm can correct the fault. Therefore, the 4-node system is 1-resilient.

Proposition 3.

A 4-node system is not 2-resilient.

Proof.

Consider the 4-node system with two faulty p2p synchronization sessions. When $n = 0$, the equation system has no solutions. When $n = 1$, consider a case where a third session is assumed to be faulty by the algorithm. The vectorized equation system is

(3)

If the two actual faults do not satisfy a certain linear relationship, the equation system has no solutions. However, if they do, the coefficient matrix has full column rank and the equation system has a unique wrong solution. Although this counterexample against the 4-node system's 2-resilience is obtained under a certain condition on the actual faults, we can conclude that the 4-node system is not 2-resilient. ∎
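A numerical illustration of this kind of counterexample is sketched below in Python. The specific sessions are chosen here for illustration and may differ from those used in the proof: sessions (0, 1) and (0, 2) are faulty. The model assumes node 0 is the reference and each measurement equals offset_i - offset_j plus a fault; the solver is a simplified stand-in for Algorithm 1 that uses a least-squares residual test. All names and values are hypothetical.

```python
from itertools import combinations

import numpy as np


def run_algorithm1(N, d, atol=1e-9):
    """Return (offsets, faults) for the first consistent fault distribution."""
    pairs = list(combinations(range(N), 2))
    A = np.zeros((len(pairs), N - 1))
    for row, (i, j) in enumerate(pairs):
        if i > 0:
            A[row, i - 1] = 1.0
        A[row, j - 1] = -1.0
    for n in range(len(pairs) + 1):
        for rows in combinations(range(len(pairs)), n):
            B = np.zeros((len(pairs), n))
            for col, row in enumerate(rows):
                B[row, col] = 1.0
            M = np.hstack([A, B])
            x, *_ = np.linalg.lstsq(M, d, rcond=None)
            if np.allclose(M @ x, d, rtol=0.0, atol=atol):
                return x[:N - 1], {pairs[r]: x[N - 1 + c] for c, r in enumerate(rows)}


def measure(true_offsets, faults):
    """Measured offsets for all pairs, with the given faults added."""
    delta = np.concatenate([[0.0], true_offsets])
    pairs = combinations(range(len(delta)), 2)
    return np.array([delta[i] - delta[j] + faults.get((i, j), 0.0) for i, j in pairs])


T = 0.02
true_offsets = np.array([0.25, -1.5, 2.0])

# Two equal faults: a single wrongly placed fault explains all measurements.
off_equal, faults_equal = run_algorithm1(4, measure(true_offsets, {(0, 1): T, (0, 2): T}))

# Two unequal faults: no single fault fits, and the true faults are recovered.
off_unequal, faults_unequal = run_algorithm1(4, measure(true_offsets, {(0, 1): T, (0, 2): 2 * T}))
```

With equal faults, the search stops at one assumed fault on session (0, 3) and shifts every estimated offset by one period, which is a consistent but wrong answer. With unequal faults, the same search reaches two assumed faults and returns the correct result, showing that the failure depends on a relationship between the actual faults.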

To gain more insights, we also analyze a case of $n = 2$ in which one correctly positioned session and one incorrectly positioned session are assumed to be faulty by the algorithm. The vectorized equation system is

(4)

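As the rank of the coefficient matrix equals the rank of the augmented matrix and the coefficient matrix has full column rank, the equation system has a unique solution, which violates the 2-resilience condition. In fact, the equation system has a unique wrong solution that does not require any relationship between the two actual faults.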

Proposition 4.

A 5-node system is 1-resilient.

We provide a sketch of the proof instead of the complete proof due to the space limit. Consider a 5-node system with one p2p synchronization fault. The resilience is independent of how we name the nodes, so we can fix the names of the two nodes involved in the faulty synchronization session without loss of generality. An exhaustive check over all the possible cases for a single assumed faulty synchronization session shows that the 1-resilience condition is satisfied. Thus, the 5-node system is 1-resilient.
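The exhaustive checks mentioned in the proof sketches of Propositions 2 and 4 can be reproduced numerically. The Python sketch below is an illustration under assumptions made here: node 0 is the reference, each measurement equals offset_i - offset_j plus a fault, the single fault equals one assumed 20 ms period, and consistency is tested with a least-squares residual. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def offset_matrix(N):
    """Coefficient matrix of the offsets of nodes 1..N-1 (node 0 is the reference)."""
    pairs = list(combinations(range(N), 2))
    A = np.zeros((len(pairs), N - 1))
    for row, (i, j) in enumerate(pairs):
        if i > 0:
            A[row, i - 1] = 1.0
        A[row, j - 1] = -1.0
    return pairs, A


def solve_if_consistent(M, d, atol=1e-9):
    """Return a solution of M y = d, or None if the system is inconsistent."""
    y, *_ = np.linalg.lstsq(M, d, rcond=None)
    return y if np.allclose(M @ y, d, rtol=0.0, atol=atol) else None


def passes_one_fault_check(N, fault=0.02):
    """Check the two conditions of Definition 2 for a single actual fault."""
    pairs, A = offset_matrix(N)
    true_offsets = 0.25 * np.arange(1, N)
    for actual in range(len(pairs)):
        d = A @ true_offsets
        d[actual] += fault
        if solve_if_consistent(A, d) is not None:  # no assumed fault must fail
            return False
        for assumed in range(len(pairs)):
            B = np.zeros((len(pairs), 1))
            B[assumed, 0] = 1.0
            M = np.hstack([A, B])
            y = solve_if_consistent(M, d)
            if assumed == actual:
                unique = np.linalg.matrix_rank(M) == M.shape[1]
                expected = np.append(true_offsets, fault)
                if y is None or not unique or not np.allclose(y, expected):
                    return False
            elif y is not None:                    # a wrong position must fail
                return False
    return True


results = {N: passes_one_fault_check(N) for N in (3, 4, 5)}
print(results)
```

Because the equation system is linear, scaling a single fault does not change whether a given assumed position is consistent, so checking one non-zero fault value per position is representative for the single-fault case.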

Proposition 5.

A 5-node system is not 2-resilient.

Proof.

We consider a 5-node system, in which (i) two p2p synchronization sessions are faulty and (ii) two other p2p synchronization sessions are assumed by the algorithm to be faulty. The vectorized equation system is

(5)

If the two actual faults satisfy a certain condition, the equation system has a unique wrong solution, which violates the resilience condition. Thus, a 5-node system is not 2-resilient. ∎

IV-C Re-Vectorization

In Section IV-B, we adopt an approach of enumerating counterexamples to prove that a system is not -resilient. As shown in the proofs of Propositions 3 and 5, if the actual faults satisfy certain conditions, the rank of may change, presenting a pitfall to the approach of enumerating counterexamples. This motivates us to consider the actual faults as the variables of the equation system in Eq. (1). The following re-vectorization will be used in Section V-A to derive the lower bound of maximum resilience.

By defining a vector composed of the actual p2p synchronization faults, we can reformat to include the actual faults into the vector of unknowns:

(6)

is a matrix corresponding to , consists of the actual clock offsets.

The re-vectorization of the equation systems in Eqs. (2), (3), and (4) are respectively given by

(7)
(8)
(9)

In Eq. (7), the coefficient matrix has full column rank. Thus, Eq. (7) has a unique solution, in which the actual fault is zero. This is consistent with the observation in the proof sketch of Proposition 2 that the original equation system has no solutions if the actual fault is non-zero.

In Eq. (8), the coefficient matrix does not have full column rank. Thus, the re-vectorized equation system has an infinite number of solutions. Applying Gaussian elimination to Eq. (8) gives a relationship between the two actual faults, which are considered as variables in the re-vectorized equation system, not as constants in the original equation system. The above result means that there exist non-zero actual faults such that the solution of the original equation system is wrong.

In Eq. (9), the coefficient matrix does not have full column rank. Thus, the re-vectorized equation system has an infinite number of solutions. Applying Gaussian elimination to Eq. (9) gives the relationship derived in the proof of Proposition 3, where the two actual faults are considered as variables in the re-vectorized equation system, not as constants in the original equation system. The above result also shows that there exist non-zero actual faults such that the solution of the original equation system is wrong.

From the above examples, we can see that the solution to the re-vectorized equation system captures the condition that the actual faults need to satisfy for the original equation system to give wrong solutions.

V Bounds of Maximum Resilience

V-A Lower Bound of Maximum Resilience

In this section, we first develop two lemmas, Lemma 1 and Lemma 2. The proof of Lemma 2 uses Lemma 1. Then, we prove Proposition 6 using Lemma 2. Proposition 6 gives a sufficient condition for a system to be $n$-resilient. This condition can be used to compute the lower bound of maximum resilience for any $N$-node system.

Lemma 1.

always has one or more solutions. When has full column rank, the original either has no solutions or has a unique correct solution.

Proof.

The satisfying (i) , , (ii) , and (iii) must be a solution. We denote this solution as . As shown in previous examples, can have an infinite number of solutions. Therefore, always holds and always has one or more solutions.

When has full column rank, has a unique solution that must be . The in this solution means that the original does not allow any p2p synchronization fault. We now consider two cases. First, in the presence of any p2p synchronization fault, the must have no solutions; otherwise, the solution of conflicts with the unique solution of with . Second, in the absence of synchronization fault, the unique solution encompasses the unique correct solution of . ∎

We say that an estimated p2p synchronization fault is correctly positioned if the corresponding p2p synchronization session is truly faulty. For example, in Eq. (9), the is correctly positioned, but the is not correctly positioned.

Lemma 2.

When , where is the number of correctly positioned estimated p2p synchronization faults, the original either has no solutions or has a unique correct solution.

Proof.

We define three sets: (1) is the set of the subscripts of the estimated p2p synchronization faults, (2) is the set of the subscripts of the actual p2p synchronization faults, (3) is the set of the subscripts of the correctly positioned estimated p2p synchronization faults.

When , the given condition ensures that has full column rank. From Lemma 1, has either no solutions or a unique correct solution.

The rest of the proof considers . We now prove that the is the entire solution space of . First, clearly, is a solution subspace of , because it is the correct solution to a system with actual non-zero p2p synchronization faults and correct distribution of the estimated p2p synchronization faults. The dimension of is the cardinality of (i.e., ), because only the are the free variables. Second, as and the number of variables is , the dimension of the entire solution space is . From the above two statements, the solution subspace and the entire solution space of have the same dimension. From the uniqueness of the solution space of linear equation system, the is the entire solution space of .

The ’s condition , means that the original does not allow any actual p2p synchronization fault without a corresponding estimated p2p synchronization fault. In the absence of actual p2p synchronization faults, the unique solution encompasses the unique correct solution of . In the presence of actual p2p synchronization faults, there are two cases.

  1. If , is the unique correct solution of ;

  2. Otherwise, we must have . As a result, the must have no solutions, because otherwise the fact that allows non-zero actual p2p synchronization faults only conflicts with the fact that there are non-zero actual p2p synchronization faults.

Based on Lemma 2, the following proposition can be used to compute the lower bound of maximum resilience.

Proposition 6.

A system is -resilient if , for any distribution of the actual p2p synchronization faults and any distribution of the estimated p2p synchronization faults, , where is the number of correctly positioned estimated p2p synchronization faults.

Proof.

As , from Lemma 1, the original either has no solutions or has a unique correct solution. We now analyze the cases considered in Definition 2:

  1. When , since , the solution of cannot be correct. Thus, the has no solutions.

  2. When ,

    1. if the distribution of the estimated p2p synchronization faults is identical to the distribution of the actual synchronization faults, as the statement that has no solution must not be true (because the correct solution is a solution), must have a unique (and correct) solution.

    2. otherwise, since the distributions are different, the solution of cannot be correct. Thus, the has no solutions.

In summary, ensures that the -resilience condition is satisfied. ∎

Input: The number of nodes $N$
Output: A lower bound of maximum resilience
1:  $n = 0$
2:  while $n \le N-2$ do
3:     for each distribution of the $n$ actual p2p synchronization faults among the $N(N-1)/2$ p2p synchronization sessions do
4:         $n' = 0$
5:         while $n' \le n$ do
6:            for each distribution of the $n'$ estimated faults among the $N(N-1)/2$ p2p synchronization sessions do
7:               determine the number of correctly positioned estimated faults
8:               if the sufficient condition in Proposition 6 is not met then
9:                   return $n - 1$
10:               end if
11:            end for
12:            $n' = n' + 1$
13:         end while
14:     end for
15:     $n = n + 1$
16:  end while
Algorithm 2 Compute a lower bound of maximum resilience
(10)
 

Based on Proposition 6, Algorithm 2 computes a lower bound of maximum resilience for any $N$-node system. Specifically, by starting with no synchronization faults (i.e., $n = 0$), it increases $n$ by one in each step of the outer loop to check whether the $N$-node system is $n$-resilient. The condition of $n \le N-2$ in Line 2 is from Proposition 7 that the system is not $n$-resilient if $n > N-2$. The loops from Line 3 to Line 6 will generate all possible combinations of the distributions of actual and estimated synchronization faults. In Line 8, we check whether the sufficient condition in Proposition 6 is met. If not, the current value of $n$ has already exceeded the lower bound of maximum resilience. Thus, the algorithm returns $n - 1$ as the lower bound.

Table I shows the results computed by Algorithm 2 for $N$ up to 12. We can see that the lower bound of maximum resilience is a non-decreasing function of $N$, which is consistent with intuition. We also compute the lower bound of tolerance as the ratio of the lower bound of maximum resilience to the total number of sessions $N(N-1)/2$, i.e., the percentage of p2p synchronization sessions that can be faulty while correct network clock synchronization is still ensured. The last row of Table I shows the lower bound of tolerance.
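The tolerance defined above is a simple ratio. A minimal Python sketch, assuming the tolerance is the number of tolerable faults divided by the N(N-1)/2 sessions and expressed as a percentage (the function name is illustrative):

```python
def tolerance_percent(num_nodes: int, tolerable_faults: int) -> float:
    """Tolerable faults as a percentage of all p2p synchronization sessions."""
    sessions = num_nodes * (num_nodes - 1) // 2
    return 100.0 * tolerable_faults / sessions


# Two entries of Table I: 1 tolerable fault for 4 nodes, 5 for 12 nodes.
four_nodes = tolerance_percent(4, 1)
twelve_nodes = tolerance_percent(12, 5)
print(round(four_nodes), round(twelve_nodes))
```

Because the number of sessions grows quadratically with the number of nodes, the tolerance can shrink even while the number of tolerable faults grows.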

V-B Upper Bounds of Maximum Resilience

Proposition 7.

$f(N) = N-2$ is an upper bound of maximum resilience, i.e., any $N$-node system is not $n$-resilient when $n > N-2$.

Proof.

We prove by an counterexample where all the p2p synchronization sessions involving the node are faulty. The remaining faulty p2p synchronization sessions may occur between any other node pairs. Consider that Algorithm 1 is testing a distribution of the p2p synchronization faults that is identical to the actual distribution. Since the true clock offsets and the true p2p synchronization faults must form a valid solution to the equation system, we have .

The matrix of the vectorized equation system is given by Eq. (10). We add labels to help identify the unknown corresponding to each column and the p2p synchronization session corresponding to each row. In the first column of the matrix, which corresponds to the clock offset estimate of the node whose sessions are all faulty, the elements that correspond to the p2p synchronization sessions involving that node are non-zero; all other elements are zero. This column is a linear combination of the columns corresponding to the faults of those sessions. Thus, the matrix does not have full column rank. Therefore, the equation system has an infinite number of solutions, which violates the resilience condition. ∎
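The rank argument in the proof can be checked numerically on a small instance. The sketch below is built on assumptions that are not taken from the paper: every node pair runs one p2p session, the equation for session (i, j) has the form delta_i - delta_j + e_ij = measurement, the last node serves as the reference with zero offset, and e_ij is an unknown only for sessions hypothesized to be faulty. The function names are hypothetical. It illustrates only the rank deficiency, not the full resilience condition.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def build_matrix(n, faulty_sessions):
    """Coefficient matrix under the assumed model described in the lead-in.

    Columns: offsets of nodes 0..n-2 (node n-1 is the assumed reference),
    followed by one fault unknown per session in faulty_sessions.
    Rows: one per node pair (i, j) with i < j.
    """
    sessions = list(combinations(range(n), 2))
    faulty = [s for s in sessions if s in faulty_sessions]
    n_cols = (n - 1) + len(faulty)
    rows = []
    for (i, j) in sessions:
        row = [0] * n_cols
        row[i] = 1  # i < j, so i is never the reference node
        if j != n - 1:
            row[j] = -1
        if (i, j) in faulty_sessions:
            row[(n - 1) + faulty.index((i, j))] = 1
        rows.append(row)
    return rows, n_cols


n = 5
# All N-1 sessions involving node 0 are faulty: one more than N-2.
too_many = {(0, j) for j in range(1, n)}
a_bad, cols_bad = build_matrix(n, too_many)
# Only N-2 of node 0's sessions are faulty.
within_bound = {(0, j) for j in range(1, n - 1)}
a_ok, cols_ok = build_matrix(n, within_bound)
```

In this 5-node instance, `rank(a_bad)` is smaller than `cols_bad`, because the offset column of node 0 equals the sum of its fault columns, whereas `rank(a_ok)` equals `cols_ok`.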

VI Conclusion and Future Work

This paper studies how many p2p synchronization faults an N-node system can tolerate in achieving network clock synchronization. Table I gives the lower bound of maximum resilience for N up to 12. We also prove that N-2 is an upper bound of maximum resilience.

It is interesting to study the following issues not addressed in this paper:

  1. The tight bound of maximum resilience is still an open issue. However, even if the upper bound given by Proposition 7 is tight, the tolerance still decreases as N grows. This suggests that increasing the number of nodes is not beneficial in terms of fault tolerance. In future work, we will study how to reduce the number of p2p synchronization sessions and examine whether doing so can improve the fault tolerance.

  2. Algorithm 1 and our analysis do not exploit the property that each fault is a multiple of the period of the external periodic signal. If this discrete property is used, intuitively, the fault tolerance can be improved.
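The trend noted in the first item can be illustrated with a short calculation. The sketch below assumes that every pair of nodes runs one p2p synchronization session, so that an N-node system has N(N-1)/2 sessions, and that tolerance is the ratio of tolerable faults to sessions; both are assumptions of this example, and the function name is hypothetical.

```python
from fractions import Fraction


def upper_bound_tolerance(n: int) -> Fraction:
    """Fraction of sessions that may be faulty if the N-2 bound were tight.

    Assumes one p2p session per node pair, i.e. n * (n - 1) / 2 sessions.
    """
    if n < 3:
        raise ValueError("need at least three nodes")
    sessions = n * (n - 1) // 2
    return Fraction(n - 2, sessions)


if __name__ == "__main__":
    for n in range(3, 13):
        print(n, float(upper_bound_tolerance(n)))
```

Under these assumptions the ratio is 1/3 for 3 and 4 nodes and shrinks for every larger N, since the number of sessions grows quadratically while the upper bound grows only linearly.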

References

  • [1] D. L. Mills, “Internet time synchronization: the network time protocol,” IEEE Trans. Commun., vol. 39, no. 10, pp. 1482–1493, 1991.
  • [2] “IEEE standard for a precision clock synchronization protocol for networked measurement and control systems,” IEEE Std 1588-2008 (Revision of IEEE Std 1588-2002), pp. 1–300, July 2008.
  • [3] J. Elson, L. Girod, and D. Estrin, “Fine-grained network time synchronization using reference broadcasts,” ACM SIGOPS Operating Systems Review, vol. 36, no. SI, pp. 147–163, 2002.
  • [4] S. Ganeriwal, R. Kumar, and M. B. Srivastava, “Timing-sync protocol for sensor networks,” in SenSys.   ACM, 2003, pp. 138–149.
  • [5] M. Maróti, B. Kusy, G. Simon, and Á. Lédeczi, “The flooding time synchronization protocol,” in SenSys.   ACM, 2004, pp. 39–49.
  • [6] T. Mizrahi, “Security requirements of time protocols in packet switched networks,” 2014, https://tools.ietf.org/html/rfc7384.
  • [7] ——, “A game theoretic analysis of delay attacks against time synchronization protocols,” in International Symposium on Precision Clock Synchronization for Measurement Control and Communication, 2012.
  • [8] M. Ullmann and M. Vögeler, “Delay attacks – implication on NTP and PTP time synchronization,” in International Symposium on Precision Clock Synchronization for Measurement, Control and Communication, 2009.
  • [9] S. Viswanathan, R. Tan, and D. K. Yau, “Exploiting power grid for accurate and secure clock synchronization in industrial IoT,” in RTSS.   IEEE, 2016, pp. 146–156.
  • [10] D. Rabadi, R. Tan, D. K. Yau, and S. Viswanathan, “Taming asymmetric network delays for clock synchronization using power grid voltage,” in AsiaCCS.   ACM, 2017, pp. 874–886.
  • [11] A. Dongare, P. Lazik, N. Rajagopal, and A. Rowe, “Pulsar: A wireless propagation-aware clock synchronization platform,” in RTAS, 2017.
  • [12] Y. Chen, Q. Wang, M. Chang, and A. Terzis, “Ultra-low power time synchronization using passive radio receivers,” in IPSN, 2011.
  • [13] T. Nighswander, B. Ledvina, J. Diamond, R. Brumley, and D. Brumley, “GPS software attacks,” in CCS.   ACM, 2012, pp. 450–461.
  • [14] Z. Yan, Y. Li, R. Tan, and J. Huang, “Application-layer clock synchronization for wearables using skin electric potentials induced by powerline radiation,” in SenSys, 2017.
  • [15] M. Lukac, P. Davis, R. Clayton, and D. Estrin, “Recovering temporal integrity with data driven time synchronization,” in SenSys, 2009.
  • [16] J. Gupchup, R. Musăloiu-e, A. Szalay, and A. Terzis, “Sundial: Using sunlight to reconstruct global timestamps,” in EWSN, 2009, pp. 183–198.
  • [17] Y. Li, R. Tan, and D. K. Yau, “Natural timestamping using powerline electromagnetic radiation,” in IPSN, 2017, pp. 55–66.
  • [18] A. Rowe, V. Gupta, and R. R. Rajkumar, “Low-power clock synchronization using electromagnetic energy radiating from AC power lines,” in SenSys.   ACM, 2009, pp. 211–224.
  • [19] Z. Li, W. Chen, C. Li, M. Li, X.-Y. Li, and Y. Liu, “Flight: Clock calibration using fluorescent lighting,” in MobiCom.   ACM, 2012.
  • [20] T. Hao, R. Zhou, G. Xing, and M. Mutka, “Wizsync: Exploiting Wi-Fi infrastructure for clock synchronization in wireless sensor networks,” in RTSS, 2011, pp. 149–158.
  • [21] L. Li, G. Xing, L. Sun, W. Huangfu, R. Zhou, and H. Zhu, “Exploiting FM radio data system for adaptive clock calibration in sensor networks,” in MobiSys.   ACM, 2011, pp. 169–182.
  • [22] D. Dolev, J. Halpern, and H. R. Strong, “On the possibility and impossibility of achieving clock synchronization,” in PODC, 1984.
  • [23] L. Lamport and P. M. Melliar-Smith, “Synchronizing clocks in the presence of faults,” JACM, vol. 32, no. 1, pp. 52–78, 1985.
  • [24] I. R. Shafarevich and A. Remizov, Linear algebra and geometry.   Springer Science & Business Media, 2012.