Fundamental Limits of Invisible Flow Fingerprinting

09/23/2018 ∙ by Ramin Soltani, et al. ∙ University of Massachusetts Amherst

Network flow fingerprinting can be used to de-anonymize communications on anonymity systems such as Tor by linking the ingress and egress segments of anonymized connections. Assume Alice and Bob have access to the input and the output links of an anonymous network, respectively, and they wish to collaboratively reveal the connections between the input and the output links without being detected by Willie who protects the network. Alice generates a codebook of fingerprints, where each fingerprint corresponds to a unique sequence of inter-packet delays and shares it only with Bob. For each input flow, she selects a fingerprint from the codebook and embeds it in the flow, i.e., changes the packet timings of the flow to follow the packet timings suggested by the fingerprint, and Bob extracts the fingerprints from the output flows. We model the network as parallel M/M/1 queues where each queue is shared by a flow from Alice to Bob and other flows independent of the flow from Alice to Bob. The timings of the flows are governed by independent Poisson point processes. Assuming all input flows have equal rates and that Bob observes only flows with fingerprints, we first present two scenarios: 1) Alice fingerprints all the flows; 2) Alice fingerprints a subset of the flows, unknown to Willie. Then, we extend the construction and analysis to the case where flow rates are arbitrary as well as the case where not all the flows that Bob observes have a fingerprint. For each scenario, we derive the number of flows that Alice can fingerprint and Bob can trace by fingerprinting.


I Introduction

Given the presence of communication systems in daily life and their rapid growth (e.g., cellular networks and the Internet of Things), security and privacy have emerged as vital areas of research and development [2, 3, 4, 5, 6, 7, 8, 9]. For every communication system, security involves not only allowing authorized users to communicate a message in a way that protects the message content from unauthorized users, but also preventing access by malicious users. Hence, breaking the anonymity of users in an anonymous network such as Tor, Bitblinder, or Darknet plays a major role in preventing malicious use of technology.

Even if the messages are encrypted, traffic analysis can be used to infer sensitive information from the packet characteristics such as timing patterns, sizes, and packet rates. For instance, packet timings can reveal information about passwords sent over SSH channels [10]. Also, traffic analysis can discover stepping stone attacks where malicious users employ compromised computers to relay their traffic [11, 12]. Furthermore, it can be used to find correlations between input and output links of a network to reveal connections between the links [13].

Unlike passive traffic analysis, which involves only recording traffic characteristics such as packet timings, active traffic analysis involves both recording and modifying traffic characteristics to embed information in them. For instance, in flow watermarking [14, 15, 16], watermarks are embedded into flows by changing their packet timings according to a unique secret pattern. Therefore, each flow contains one bit of information indicating whether it contains the watermark. In flow fingerprinting, however, the embedded patterns are used to communicate information such as the identity of the party that performed the fingerprinting [17], the location in the network where the flow was fingerprinted [1], and the time when the fingerprint was embedded. A fingerprint therefore necessarily conveys more than one bit of information.

Active traffic analysis has emerged as a vibrant area of research recently. In [18], the authors propose detecting stepping stones using flow watermarking. Peng et al. [19] show that this method is detectable and propose attacks on it. Wang et al. [20] show that the anonymity of VoIP calls made over an anonymity network can be broken using watermarking methods. Kiyavash et al. [21] propose a multi-flow attack on interval-based watermarking methods, which delay packets of specific intervals based on the value of the watermarks. Houmansadr et al. propose RAINBOW watermarking [14] and SWIRL [15] which is a scalable traffic analysis method resilient against aggregated-flows attacks. They also study the capacity of flow watermarking [22] and propose a flow fingerprinting scheme allowing fingerprinting of millions of flows by perturbing the packet timings of relatively short lengths of flows [23]. Rezaei et al. [24] introduce an active fingerprinting method called TagIt that works by slightly delaying packets into secret time intervals.

(a) Setting 1: The network is modeled as independent parallel queues where each queue is shared between a flow from Alice to Bob (main flow) and other interfering flows that are independent of the main flow.

(b) Setting 2: The network is modeled as independent parallel queues with single input/output where each queue conveys a flow from Alice to Bob.

Fig. 1: Alice may fingerprint the flows, and Bob receives the fingerprinted flows after they pass through the network, which adds timing noise to the fingerprints. Willie, who is the warden of the network, protects the links from being traced; he wishes to determine whether Alice has fingerprinted flows.

Previous active traffic analysis methods do not offer theoretical guarantees on the trade-off between performance (number of flows) and invisibility, i.e., altering the packet timings so that the outcome is statistically indistinguishable from intact packet timings. When the traffic analyzer is the warden of the network who protects the links from being traced by anonymous users (e.g., for de-anonymization), invisibility of traffic analysis is important since attackers (anonymous users) can evade analysis if they are aware of the fingerprinting process. Even when the traffic analyzer is not the network warden, the invisibility of the traffic analysis is crucial in order to hide from the network warden.

In this paper, we consider invisible fingerprinting to trace the input and output links of a network in the presence of a network warden. Consider an anonymous network where connections between input and output links are unknown. We model the network as parallel work-conserving queues with Poisson arrivals and exponential service times (M/M/1 queues) and a First In First Out (FIFO) discipline. The queues are independent, and each queue is shared by a flow from the input of the network to the output of the network and other flows independent of the flow from Alice to Bob (see Fig. 1(a)). Alice has access to the input flows, and she can buffer and release packets when she desires. On the other side of the network, Bob has access to the output flows, so he can read the packet timings of the flows. Alice and Bob wish to perform fingerprinting to infer the connections between input links and output links without being detected by Willie, whose goal is to discover flow fingerprints.

We consider the following problem: in a time interval of given length, can Alice and Bob perform fingerprinting to link input and output flows of the network without being detected by Willie, and if so, how can they do it and what is the maximum number of flows that they can link reliably? For the case where the packet timings of each flow are an independent instantiation of a Poisson process, we present the construction and analysis, and calculate an asymptotic expression for the number of flows as a function of the length of the time interval. We first assume flow packet rates are equal and that Bob observes only flows with fingerprints, and consider two main scenarios: 1) Alice fingerprints all flows she observes; 2) Alice fingerprints a subset of the flows, and the subset is unknown to Willie. Then, we present the extensions to arbitrary flow rates as well as the case where Bob observes a set of flows in which not all flows are fingerprinted.

The remainder of the paper is organized as follows. We present the system model, definitions, and invisibility and reliability metrics employed in this paper in Section II. Then, in Sections III and IV, we present constructions and analyses for the two main fingerprinting scenarios. In Section V, we present the extensions of the main scenarios to arbitrary flow rates, and in Section VI, we present the extensions of the main scenarios to the case where Bob observes flows with and without fingerprints. Section VII discusses the results, and Section VIII discusses future work. We conclude in Section IX.

II System Model, Definitions, and Metrics

II-A System Model

We consider a set of flows between pairs of input and output links. We assume the links are known but not the pairings. Also present are two parties, Alice and Bob, whose goal is to identify some or all of the pairings by fingerprinting without a third party, Willie, detecting this identification. Moreover, Alice and Bob wish to do so within a time interval of fixed length. Alice, Bob, and Willie know that all packet timings are governed by Poisson processes, and they know the rate of each flow that they observe.

Alice has access to a subset of the input links where each link conveys a packet flow . She is allowed to buffer packets and release them from her buffer but no other operations (e.g., inserting packets, changing packet ordering). Willie is located between Alice and the network, and he watchfully observes all of the input links accessed by Alice () to detect whether or not Alice is fingerprinting flows (see Fig. 1). Willie is able to verify the sources and the order of the packets. Therefore, if Alice inserts a packet of her own or re-orders the packets on any of the links to transmit information to Bob, Willie will detect her immediately. Bob observes a subset of the output links where each link conveys a packet flow . He is only allowed to observe the time of the arrival of each of the packets in each flow. Bob and Willie cannot manipulate the flows (e.g., change packet timings, remove packets, insert packets, change packet ordering).

Prior to fingerprinting, Alice generates a codebook of fingerprints and shares it with Bob. The codebook is secret, and thus Willie does not have access to it. On the other side of the network, Bob uses the codebook to extract the fingerprints and identify the flows.

Each fingerprint (codeword) of the codebook corresponds to a sequence of inter-packet delays, which plays the role of a unique flow identifier. Alice embeds a unique fingerprint in each flow, i.e., she buffers packets of each flow and releases them according to timings associated with a fingerprint. We denote by the set of flows with fingerprints. In general, not every fingerprinted flow is observed by Bob. However, since our goal is to calculate the maximum number of flows that can be traced by Alice and Bob, we assume Bob observes all fingerprinted flows, i.e., .

As Willie is only able to read the channel, he cannot change packet timings; however, packet timings change after they pass through the network. Nevertheless, we present a construction where Bob can successfully identify the flows.

We model the network as parallel First In First Out (FIFO) queues with Poisson arrivals and exponential service times (M/M/1 queues). We consider two settings for the network:

  1. Setting 1: each queue is shared by the flow Alice and Bob are monitoring, which we refer to as the “main flow”, and other flows independent of the main flow, which we refer to as “interfering flows” (see Fig. 1(a)).

  2. Setting 2: each queue conveys just the flow Alice and Bob are monitoring (see Fig. 1(b)).

Denote by the queue, and by , , and the service rate, the input rate, and the sum of the rates of the interfering flows at , respectively. We term the effective service rate [25] of and we assume Alice knows the effective service time of all queues . The queues are stable, i.e., .

First, we consider Setting 1 (shown in Fig. 1(a)). Assuming the flow rates of the flows observed by Alice and Bob are the same and that Bob observes only the set of fingerprinted flows, we present two scenarios:

  • Scenario 1 (analyzed in Section III): Alice fingerprints all flows to which she has access.

  • Scenario 2 (analyzed in Section IV): Alice fingerprints a subset of the flows to which she has access.

Then, considering the same setting for the network (Setting 1, shown in Fig. 1(a)), we present Scenarios 3 and 4, which are extensions of Scenarios 1 and 2, respectively, to the case where flow rates are arbitrary. Scenarios 3 and 4 are analyzed in Sections V-A and V-B, respectively. Next, we consider Setting 2 (shown in Fig. 1(b)) and present Scenarios 5 and 6, which are extensions of Scenarios 1 and 2, respectively, to the case where Bob observes fingerprinted flows as well as other flows that are not fingerprinted. If Bob observes a flow that is not fingerprinted, the flow can come either from Alice or from other inputs of the network. Scenarios 5 and 6 are analyzed in Sections VI-A and VI-B, respectively. We show that in each scenario Alice can fingerprint the flows in a way that is invisible to Willie but distinguishable by Bob. In addition, we determine the number of flows that Alice and Bob can invisibly and reliably trace by fingerprinting.

Next, we present definitions and describe invisibility and reliability metrics.

II-B Definitions

Willie uses hypothesis testing to detect whether Alice is fingerprinting:

  • $H_0$: Alice is not fingerprinting.

  • $H_1$: Alice is fingerprinting.

Denote by $\mathbb{P}_{FA}$ the probability of rejecting $H_0$ when it is true (type I error or false alarm), and by $\mathbb{P}_{MD}$ the probability of rejecting $H_1$ when it is true (type II error or missed detection). To give more power to Willie, we assume he knows the probability that Alice is fingerprinting, $\mathbb{P}(H_1)$.

Similar to the definition of covertness [26, 27, 28, 29, 30], we define invisibility [1]:

Definition 1.

(Invisibility) Alice’s fingerprinting is invisible (covert) if and only if she can lower bound Willie’s probability of error, , by for any , as . We term the invisibility parameter.

Definition 2.

(Reliability) Alice’s fingerprinting is reliable if and only if for any and any flow, the probability of the failure event satisfies as . We term the reliability parameter. For a flow with a fingerprint the failure event occurs when one of the following events occurs:

  • Alice cannot successfully fingerprint the flow since she does not have a packet available to release when she needs one. We denote by the probability of this event.

  • Bob cannot extract the fingerprint successfully. We denote by the probability of this event.

For a flow without a fingerprint, the failure event occurs when Bob detects a fingerprint. We denote by the probability of this event.

Definition 3.

(Lambert-W function) The Lambert-W function $W(\cdot)$ is the inverse function of $f(W) = W e^{W}$.
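Since the Lambert-W function reappears in the flow-count analysis, a small numerical sketch may help make Definition 3 concrete. The Python snippet below is illustrative only and not part of the paper: the helper name `lambert_w`, the use of Newton's method, and the restriction to nonnegative arguments (principal branch) are all assumptions of this example.

```python
import math


def lambert_w(x, tol=1e-12, max_iter=100):
    """Principal branch for x >= 0: the w >= 0 that solves w * exp(w) = x.

    Hypothetical helper for illustration; uses plain Newton iteration.
    """
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # starting guess
    for _ in range(max_iter):
        ew = math.exp(w)
        # Newton step for g(w) = w * exp(w) - x, with g'(w) = exp(w) * (w + 1)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


if __name__ == "__main__":
    w = lambert_w(10.0)
    print(w, w * math.exp(w))  # the second value should be close to 10
```

Checking that `w * exp(w)` reproduces the input is the simplest way to validate any implementation of the inverse.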

We present results under the assumption that . We show in Appendix IX that this results in invisibility for the general case where . In this paper, we use standard Big-O, Little-o, Big-Omega, little-omega, and Big-Theta notations [31, Ch. 3].

III Scenario 1: All flows are fingerprinted, Setting 1

Consider Scenario 1: Alice fingerprints all flows she observes, and Bob observes only the fingerprinted flows. All flow rates are equal. We consider Setting 1 (see Fig. 1(a)), i.e., parallel M/M/1 queues where each queue is shared by a fingerprinted flow and other interfering flows independent of the fingerprinted flow. Alice fingerprints the input flows during the given time interval, and Bob extracts the fingerprints from the flows on the output links of the network to infer the connections between input and output flows.

Alice buffers packets and releases them according to a fingerprint. She uses a secret codebook where each codeword (fingerprint) is a unique flow identifier consisting of a sequence of inter-packet delays. Because the timings of packets that Alice receives as well as the codewords are random, Alice will face a causality problem: the need to send a packet before she receives it. We give an example of when Alice cannot successfully fingerprint a flow in Fig. 2.

Consider a flow and assume the inter-arrival times of this flow before Alice makes any changes are . Also assume Alice selects a fingerprint from her codebook. Note that the inter-arrival time between the first and second packets of the flow is but Alice has to alter the packet timings of the flow to achieve an inter-arrival of between the first and the second packets. In other words, she has to send the second packet before she receives it.


Fig. 2: The packet timings of the flow received by Alice and the packet timings suggested by the selected fingerprint are and , respectively. Alice faces a causality problem when she needs to send the second packet since she has to send it before she receives it.

To account for this, prior to fingerprinting, Alice invisibly slows down the flow in order to buffer packets [28, Section IV]. This ensures she will have a packet in her buffer to transmit at the appropriate times and can fingerprint the flow successfully.
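The causality problem and its remedy can be illustrated with a short check. The following Python sketch is not from the paper; the function name `first_causality_violation` and the `backlog` parameter (a stand-in for packets accumulated during buffering) are hypothetical. It assumes packets are released in arrival order, consistent with Alice not being allowed to re-order packets.

```python
from itertools import accumulate


def first_causality_violation(arrival_delays, fingerprint_delays, backlog=0):
    """Return the index of the first packet Alice would have to release before
    she holds it, or None if the fingerprint can be embedded.

    `backlog` packets are assumed to already sit in Alice's buffer at time 0
    (illustrative stand-in for the buffering phase).
    """
    # Times at which each packet becomes available to Alice (FIFO order).
    arrivals = [0.0] * backlog + list(accumulate(arrival_delays))
    # Times at which the fingerprint asks Alice to release each packet.
    releases = list(accumulate(fingerprint_delays))
    for k, release_time in enumerate(releases):
        if k >= len(arrivals) or release_time < arrivals[k]:
            return k
    return None


if __name__ == "__main__":
    # The second packet arrives at time 4.0 but must be released at time 2.5.
    print(first_causality_violation([1.0, 3.0], [1.5, 1.0]))             # 1
    # One buffered packet removes the violation.
    print(first_causality_violation([1.0, 3.0], [1.5, 1.0], backlog=1))  # None
```

In the first call the second packet is the problem, mirroring the example in the text; a single buffered packet is enough to resolve it in this toy case.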

We calculate the number of flows that Alice and Bob can trace by fingerprinting using this scheme, asymptotically as a function of .

Theorem 1.

Consider Setting 1 (see Fig. 1(a)). If Alice fingerprints all input flows whose rates are equal and Bob only observes fingerprinted flows, then Alice and Bob can invisibly and reliably trace flows in a time interval of length .

Construction: Per above, Alice uses a scheme consisting of two phases, a buffering phase and a fingerprinting phase, and employs a codebook of fingerprints to embed in the flows. The codebook construction is similar to the one adopted in [1, 28, 29]. In particular, Alice generates independent instantiations of a Poisson process over an interval whose length is that of the second phase, as follows. To generate each codeword, first a number of points is drawn from a Poisson distribution whose mean is the process rate multiplied by the length of the second phase, and then that many points are distributed randomly and uniformly in a time interval of that length [32] (see Fig. 4). Alice selects a fingerprint for each flow and applies the inter-packet delays of the chosen fingerprint to the packets of the flow. The codebook is shared with Bob but is not known to Willie.
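The codebook construction described above (draw a Poisson count, then place that many points uniformly) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the `seed` argument, and the way the Poisson count is sampled (by counting unit-rate exponential gaps) are assumptions of this example.

```python
import random


def poisson_count(mean, rng):
    """Draw N ~ Poisson(mean) by counting unit-rate exponential gaps in [0, mean]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def generate_codebook(num_fingerprints, rate, length, seed=None):
    """Each codeword is a sorted list of packet times in [0, length]."""
    rng = random.Random(seed)
    codebook = []
    for _ in range(num_fingerprints):
        n = poisson_count(rate * length, rng)  # number of packets in the codeword
        points = sorted(rng.uniform(0.0, length) for _ in range(n))
        codebook.append(points)
    return codebook


def inter_packet_delays(points):
    """Convert packet times into the inter-packet delays a fingerprint specifies."""
    return [b - a for a, b in zip([0.0] + points, points)]


if __name__ == "__main__":
    book = generate_codebook(num_fingerprints=3, rate=2.0, length=10.0, seed=0)
    for word in book:
        print(len(word), inter_packet_delays(word)[:3])
```

Conditioned on the count, uniformly placed points are distributed as the arrival times of a homogeneous Poisson process, which is why this two-step recipe yields valid Poisson sample paths.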

Alice divides the time interval of length into two phases (see Fig. 3):

  • Phase 1 (buffering phase) of length : Alice slows each flow from rate to rate to buffer packets, i.e., if she receives a packet at time , she transmits it at time . This allows her to build up a backlog of packets in her buffer which ensures that she will be able to fingerprint each flow during the next phase successfully.

  • Phase 2 (fingerprinting phase) of length : for each flow, she selects a fingerprint from her codebook and then alters the packet timings of the flow according to the selected fingerprint.
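As a rough illustration of the buffering phase, the sketch below counts how many packets remain in Alice's buffer at the end of Phase 1. It is an assumption-laden toy, not the paper's procedure: in particular, the slowdown is modeled here by stretching every arrival time by the factor rate / (rate - slowdown), and the function and parameter names are hypothetical.

```python
def buffered_after_phase_one(arrival_times, rate, slowdown, phase_one_length):
    """Count packets received during Phase 1 but not yet released at its end.

    Assumption of this sketch: a packet received at time t is released at
    time t * rate / (rate - slowdown), which lowers the output rate from
    `rate` to `rate - slowdown`.
    """
    if not 0 < slowdown < rate:
        raise ValueError("need 0 < slowdown < rate")
    stretch = rate / (rate - slowdown)
    received = [t for t in arrival_times if t <= phase_one_length]
    released = [t for t in received if t * stretch <= phase_one_length]
    return len(received) - len(released)


if __name__ == "__main__":
    # Evenly spaced arrivals at rate 1 over 10 time units, slowed to rate 0.5.
    arrivals = [k + 0.5 for k in range(10)]
    print(buffered_after_phase_one(arrivals, rate=1.0, slowdown=0.5,
                                   phase_one_length=10.0))
```

Under this model the backlog grows roughly as the slowdown times the phase length, which is the quantity the reliability analysis relies on.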

Fig. 3: Alice divides the time interval into two phases: a buffering phase, in which packets of each flow are slowed down, and a fingerprinting phase, in which Alice fingerprints the flows.

The lengths of the two phases are,

(1)
(2)

where is a constant defined later, and is the number of flows to be fingerprinted.

Fig. 4: Codebook generation: Alice generates a codebook whose codewords (fingerprints) specify the sequence of inter-packet delays to be embedded in the flows. Each codeword is an instantiation of a Poisson process over a time interval whose length equals that of the fingerprinting phase. For each codeword, first a random number of points is generated according to the Poisson distribution; then that many points are placed uniformly and randomly in the time interval. The codebook is shared with Bob, but it is unknown to Willie.

Analysis: (Invisibility) Similar to the analysis of covertness in [28, Theorem 2], we can show that Alice’s fingerprinting is invisible. Consider the first phase. We can show that for all , Alice can slow down the flows from rate to rate , and achieve (see the proof in Appendix IX)

(3)

where is Willie’s error probability. Thus, her buffering is invisible. In the second phase, the packet timings of each flow are an instantiation of a Poisson process with the same rate as the original flow, and hence the traffic pattern is indistinguishable from the pattern that Willie expects to observe. Hence, the scheme is invisible.
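To give a feel for why a small slowdown is hard to detect, the following Python sketch evaluates a standard Pinsker-type lower bound on the error probability of any detector that sees only packet counts. It is an illustration under stated assumptions, not a reproduction of (3): it assumes equal prior probabilities for the two hypotheses, independent flows, and Poisson packet counts, and the function names are hypothetical.

```python
import math


def kl_poisson(mean_p, mean_q):
    """Relative entropy D(Poisson(mean_p) || Poisson(mean_q)) in nats."""
    return mean_p * math.log(mean_p / mean_q) - mean_p + mean_q


def error_lower_bound(rate, slowdown, phase_one_length, num_flows):
    """Lower bound 1/2 - sqrt(D / 8) on the detector's error probability.

    Assumes equal priors and `num_flows` independent flows, each slowed from
    `rate` to `rate - slowdown` during a buffering phase of the given length.
    """
    per_flow = kl_poisson(rate * phase_one_length,
                          (rate - slowdown) * phase_one_length)
    total = num_flows * per_flow  # relative entropy adds over independent flows
    return max(0.0, 0.5 - math.sqrt(total / 8.0))


if __name__ == "__main__":
    for m in (1, 10, 100):
        print(m, error_lower_bound(rate=1.0, slowdown=1e-3,
                                   phase_one_length=100.0, num_flows=m))
```

The bound degrades as more flows are slowed by the same amount, which matches the intuition that the slowdown must shrink as the number of fingerprinted flows grows.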

(Reliability) Now, we show that Alice’s fingerprinting satisfies all of the conditions in Definition 2, and thus is reliable. Note that all flows have fingerprints. By the union bound:

(4)

Thus, to show the fingerprinting is reliable, it suffices to show that for all .

First, we show that as for each flow, i.e., Bob can successfully extract a fingerprint from each flow. Recall that Alice fingerprints all flows that she observes and Bob observes only the flows fingerprinted by Alice (). Therefore, , where denotes the cardinality of a set.

Without loss of generality, we assume that flow passes through the queue (). Denote by the capacity of for the transmission of information via packet timings. Recall that is an queue with multiple inputs and outputs. Then [25, Proposition 1]:

(5)

where is the sum of rates of the interfering flows passing through , and is the service rate of . From [32, Definition 1], the rate of the codebook is , and  [32, Definition 2], (5) implies that all transmission rates smaller than result in a decoding error probability that tends to zero as . Therefore, we require

(6)

for Bob to successfully extract the fingerprint from . Note that (6) holds for all . Hence, as long as

(7)

where

(8)

for each flow . Note that (2) implies that as . Therefore,

(9)
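The decoding condition can be made tangible with a back-of-the-envelope calculation. The sketch below uses the classical timing-capacity expression for a single-server queue with exponential service, namely rate times the natural logarithm of (service rate / rate). Treating the service rate minus the interfering rate as the effective service rate, and the function names themselves, are assumptions of this example rather than a restatement of (5).

```python
import math


def timing_capacity(flow_rate, effective_service_rate):
    """Timing capacity in nats per unit time at output rate `flow_rate`."""
    if not 0 < flow_rate < effective_service_rate:
        raise ValueError("need 0 < flow_rate < effective_service_rate")
    return flow_rate * math.log(effective_service_rate / flow_rate)


def max_log_codebook_size(flow_rate, service_rate, interfering_rate,
                          phase_two_length):
    """Upper limit (in nats) on log(number of fingerprints) for reliable decoding.

    Assumption: interfering traffic reduces the service rate seen by the main
    flow to `service_rate - interfering_rate`.
    """
    effective = service_rate - interfering_rate
    return phase_two_length * timing_capacity(flow_rate, effective)


if __name__ == "__main__":
    limit = max_log_codebook_size(flow_rate=1.0, service_rate=4.0,
                                  interfering_rate=1.0, phase_two_length=50.0)
    print(limit)  # log of the codebook size must stay below this value
```

Because the limit grows linearly in the length of the fingerprinting phase, the number of distinguishable fingerprints can grow exponentially in that length.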

Next, we show that , i.e., Alice can successfully fingerprint the flows. Recall that Alice accounts for the causality problem by buffering packets before she starts fingerprinting. Since in the first phase Alice slows down the packet rate from rate to rate , on average she can buffer

packets. Consequently, we can apply the weak law of large numbers (WLLN) to show that the probability that Alice buffers more than

packets tends to one, as tends to infinity. Now, we have to answer this question: noting that Alice has packets in her buffer, what is the probability that Alice cannot successfully fingerprint ?

Because Alice receives and transmits packets on each flow according to two independent Poisson processes of rate , and the Poisson process is memoryless, we model the process as a symmetric random walk on a 1-D grid to answer this question [28]. The location of the walker corresponds to the number of packets in Alice’s buffer. The walker goes from location to when Alice receives a packet, and goes from location to when Alice transmits a packet. Denote by the probability of the event that the walker starting from the location reaches the point , at least once, during the time . Then [28, Eq. (27)]:

(10)

Since Alice fingerprints the flows in the second phase, . Recall that the probability that Alice buffers more than packets tends to one, as . Therefore, we let . By (10), the probability that Alice runs out of packets for flow satisfies:

(11)

where the equality holds since following from (1) and (2). Note that (11) is independent of (index of the flow), and holds for all flows , . Let

(12)

By (12), (11) yields

(13)

Consequently, by (4), (9), (13), for all , when and thus Alice and Bob’s fingerprinting is reliable.
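The random-walk argument used for the buffer can be explored numerically. The sketch below is a discrete-time simplification, not the continuous-time bound in (10): it treats each event as an arrival or a departure with probability 1/2 and computes exactly, via the reflection principle, the probability that a buffer starting with a given backlog empties within a given number of events. The function names are hypothetical.

```python
from math import comb


def tail_probability(steps, level):
    """P(S_steps >= level) for a simple symmetric +/-1 walk S started at 0."""
    if level > steps:
        return 0.0
    k_min = max(0, (steps + level + 1) // 2)  # fewest up-steps giving S >= level
    favourable = sum(comb(steps, k) for k in range(k_min, steps + 1))
    return favourable / 2 ** steps


def empty_buffer_probability(backlog, steps):
    """P(a buffer holding `backlog` packets hits zero within `steps` events).

    Reflection principle: P(max_n >= r) = P(S_n >= r) + P(S_n >= r + 1).
    """
    if backlog <= 0:
        return 1.0
    return tail_probability(steps, backlog) + tail_probability(steps, backlog + 1)


if __name__ == "__main__":
    for backlog in (5, 20, 40):
        print(backlog, empty_buffer_probability(backlog, steps=400))
```

The probability drops quickly once the backlog exceeds a few multiples of the square root of the number of events, which is the scaling the two-phase design exploits.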

(Number of flows) By (7) and (2), we require

(14)

as (). We show that we can achieve (14) as long as

(15)

Consider the following fact:

Fact 1.

For , if , where is the Lambert-W function, then .

Proof. Assume . First, we show that . From the definition of the Lambert-W function, . Therefore, . Consequently,

(16)

Since , implies that . Because is an increasing function of , , and the proof is complete.

Next, for both cases and we show that , which implies (14). Consider . Note that (15) implies . Therefore, Fact 1 yields:

Since ,

Now, consider . Note that (15) implies that , which implies:

(17)

By (17) and Fact 1,

where the last inequality follows from . Consequently, (15) satisfies (14). Since for , , Alice and Bob can invisibly and reliably pair the end points of every flow, and thus break the anonymity of a network (Setting 1, shown in Fig. 1(a)) with flows.

IV Scenario 2: Alice fingerprints a subset of the flows, Setting 1

In Scenario 1, Willie is certain that if $H_1$ is true, i.e., Alice fingerprints, then all flows are slowed down in the first phase. In Scenario 2, we add uncertainty to Willie’s knowledge under $H_1$: Alice fingerprints a subset of the flows, and the subset is unknown to Willie. Therefore, Willie has to investigate a large set of flows to detect whether some are slowed down in the first phase as required for fingerprinting. We show that Willie’s uncertainty allows Alice to fingerprint more flows without being visible.

Alice fingerprints a subset of the flows she observes. For each flow, she selects a unique fingerprint from her codebook and alters the timings of that flow according to it. Similar to Scenario 1, Alice has a fixed amount of time, which she divides into two phases: a buffering phase, which ensures Alice can successfully fingerprint, and a fingerprinting phase. Bob, who has access to the fingerprint codebook and observes the set of fingerprinted flows, extracts the fingerprints from the flows. The fingerprint codebook is secret and Willie does not have access to it. The network is modeled by parallel M/M/1 queues with each queue shared by a flow from Alice to Bob (main flow) as well as other interfering flows independent of the main flow (Setting 1, shown in Fig. 1(a)). We calculate the number of flows that Alice can fingerprint using this scheme, asymptotically as a function of the length of the time interval.

Theorem 2.

Consider Setting 1 (see Fig. 1(a)). In a set containing flows with equal rates, if Bob observes only the fingerprinted flows, Alice and Bob can invisibly and reliably trace flows in a time interval of length , where

(18)

is given in (8), and arbitrary constants .

A more accurate characterization of with respect to is presented in (32) in the proof below.

Construction: The construction is similar to that of Scenario 1 except that Alice fingerprints a subset of the flows that she observes. Recall that all flows observed by Bob are also observed by Alice. Alice knows which set of her flows will be observed by Bob, and chooses them for fingerprinting. Note that Willie does not know which subset of Alice’s flows is fingerprinted. Alice generates a codebook of fingerprints (similar to Scenario 1) and shares it with Bob prior to fingerprinting, where is given in (18). Recall that we calculate the maximum number of flows that Alice and Bob can trace; therefore, we only consider the case which can be extended to trivially.

Alice’s scheme consists of two phases, a buffering phase of length , and a fingerprinting phase of length , where

(19)
(20)
(21)

Recall that and are given in (12) and (8), respectively, and is the invisibility parameter. Alice generates fingerprints for her codebook analogous to Scenario 1. The number of fingerprints in her codebook is .

Analysis: (Invisibility) For each phase, we show that all operations Alice performs on the flows are invisible. Consider the first phase where Alice slows down each flow from rate to rate with

(22)

From Willie’s perspective, the number of packets observed during the first phase is a sufficient statistic to detect Alice [28]. If Alice does not fingerprint ($H_0$), then the joint probability density function (pdf) of Willie’s observations is

where is the pdf of a Poisson random variable with mean . Note that Willie knows that out of flows observed by Alice is selected to be fingerprinted, but he does not know which set is selected. Therefore, from Willie’s point of view, if Alice chooses to fingerprint flows (), then each flow will contain a fingerprint with probability

(23)

Thus, the joint pdf of Willie’s observations when Alice fingerprints ($H_1$) is

where is the change in flow rate. Note that the change of rate differs from the one in Scenario 1. Suppose that Willie applies an optimal hypothesis test to minimize his probability of error. Then, we can obtain a lower bound on his probability of error [soltani2017covert, Eq. 1]:

(24)

where the divergence term is the Kullback–Leibler divergence (relative entropy) between the joint pdfs of Willie’s observations under the two hypotheses.

Alice’s scheme is invisible as long as she can make Willie’s detector operate as close as desired to the detector that disregards Willie’s observations and results in (see Definition 1). Next, we show that for arbitrary , , and then we apply (24) to prove Alice’s buffering is invisible. Observe:

(25)
(26)

where

follows from the chain rule for relative entropy 

[33, Eq. (2.67)], denotes expected value with respect to the pdf , follows from the definition of the Kullback–Leibler divergence, is true since , is true since

and follows from substituting the values of , , and given in (22), (23), and (19) respectively. By (24) and (25),

(27)

as , and thus Alice’s buffering is invisible.

The second phase is invisible because the fingerprints are samples of Poisson processes with rate . Combined with the invisibility of the first phase, Alice and Bob’s scheme is invisible.

(Reliability) The analysis is similar to that of Scenario 1. Since all flows observed by Bob are fingerprinted (), to show Alice and Bob’s scheme is reliable, it suffices to show that for each flow for all .

Upper bounding is similar to that of Scenario 1. The first difference is in the number of packets that Alice can buffer from each flow in the first phase. Here, since Alice slows down each flow from rate to , where is given in (22), the probability that Alice can buffer more than packets in the first phase tends to one as . Therefore, letting and in (10) yields:

(28)

The second difference in the analysis of is due to differences in the expressions for and . By (19) and (20), . Therefore, (28) yields:

(29)

where the last step is true since . By (12),

(30)

Now, consider Bob’s decoding error for each flow, . By (19) and (20), as . In order for Bob to be able to successfully extract the fingerprint from each flow, we require

(31)

as (). Substituting from (20) and re-arranging yields:

(32)

We show in Appendix IX that (32) holds asymptotically as , given the value of provided in (18).

Consequently,

(33)

By (4), (30), and (33), as . Thus, if , Alice can invisibly and reliably fingerprint flows in a time interval of length , and Bob can successfully extract the fingerprints, where is given in (8), and if , Alice can invisibly and reliably fingerprint all flows in a time interval of length , and Bob can successfully extract the fingerprints. ∎

In Scenario 2, we assumed that all flows observed by Bob are also observed by Alice and chosen for fingerprinting (). Although this is applicable in many schemes, we present results for the case where this assumption is relaxed in Section VI, i.e., Bob observes flows with and without fingerprints.

V Extension to arbitrary rates

In this section, we extend Theorems 1 and 2 to the case that the flow rates are arbitrary.

V-A Scenario 3: All flows are fingerprinted and flow rates are arbitrary, Setting 1

Consider Scenario 3, which is the extension of Scenario 1 to arbitrary rates: Alice fingerprints all of the flows she observes, and Bob observes only the fingerprinted flows. We consider Setting 1 (see Fig. 1(a)), i.e., parallel M/M/1 queues with multiple inputs and outputs, where each queue is shared between a flow from Alice to Bob (main flow) as well as other interfering flows independent of the main flow. Here the flow rates can be arbitrary, and each main flow has its own rate. Alice fingerprints the input flows of the network in the given time interval, and Bob extracts the fingerprints from the flows on the output links of the network to infer the connections between input and output flows.

Similar to Scenario 1, for each flow Alice selects a codeword (fingerprint) from her codebook and embeds it in the flow by changing the packet timings of the flow. She builds her codebook based on the minimum rate of the flows , and to embed a fingerprint (of rate ) in a flow of rate , she scales the fingerprint by a factor of to obtain a modified fingerprint of rate , and then embeds it in the flow. In addition, she uses a two-phase (buffering-fingerprinting) scheme similar to those of Scenarios 1 and 2.

We calculate the number of flows () that Alice and Bob can trace by fingerprinting using this scheme, asymptotically as a function of .

Theorem 3.1.

Consider Setting 1 (see Fig. 0(a)). If Alice fingerprints all input flows () whose rates are arbitrary and Bob observes only the set of fingerprinted flows (), then Alice and Bob can invisibly and reliably trace flows in a time interval of length .

Construction: Per above, Alice employs a two-phase scheme: a buffering phase of length and a fingerprinting phase of length (see Fig. 3), where and are given in (1) and (2). The codebook construction is similar to Scenario 1, but the rate of the fingerprints (codewords) is . To embed a fingerprint in a flow of rate , Alice selects a fingerprint and scales by a factor to generate a modified fingerprint of rate , . Since fingerprints are instantiations of a Poisson process of parameter (i.e., its inter-arrival times are instantiations of an exponential random variable of mean ), the modified fingerprint is an instantiation of a Poisson process of parameter . Next, Alice applies the inter-packet delays given by the modified fingerprint to each flow.
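The scaling step can be illustrated with a minimal sketch. It assumes that the scaling factor is the ratio of the codebook rate to the flow rate, so that exponential inter-packet delays drawn at the codebook rate become exponential delays at the flow rate. The function names and the numeric rates below are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import random

def make_fingerprint(rate, num_packets, rng):
    """Draw inter-packet delays of a Poisson process: i.i.d. exponential, mean 1/rate."""
    return [rng.expovariate(rate) for _ in range(num_packets)]

def scale_fingerprint(delays, base_rate, flow_rate):
    """Rescale delays drawn at base_rate so that they correspond to flow_rate.

    Assumption: the factor is base_rate / flow_rate, which shortens every
    delay when the flow is faster than the codebook rate.
    """
    factor = base_rate / flow_rate
    return [d * factor for d in delays]

rng = random.Random(0)   # fixed seed so the sketch is reproducible
base_rate = 2.0          # hypothetical codebook (minimum) rate
flow_rate = 5.0          # hypothetical rate of the flow being fingerprinted

fingerprint = make_fingerprint(base_rate, 20000, rng)
modified = scale_fingerprint(fingerprint, base_rate, flow_rate)

mean_original = sum(fingerprint) / len(fingerprint)   # close to 1 / base_rate
mean_modified = sum(modified) / len(modified)         # close to 1 / flow_rate
```

Multiplying an exponential random variable by a positive constant yields another exponential random variable, so the scaled delays still describe a Poisson process, now at the rate of the flow and over a proportionally shorter duration.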

Recall that Bob knows the rate of each flow. Upon observing , the flow with packet timings and rate , Bob seeks to answer the following question:

Question 1: Given that Alice used the codebook whose fingerprints are of rate , what is the index of the fingerprint that was selected by Alice, scaled to rate , and transmitted through to produce the output packet timings ?

Analysis: (Invisibility) Similar to Scenario 1, we analyze the invisibility of the first and second phases separately. In the first phase, Alice slows down each flow of rate to rate . Using arguments similar to those of Theorem 1, we can show that [28, Theorem 2]:

where is Willie’s error probability. Thus, this phase is invisible to Willie. In the second phase, since Alice embeds a modified fingerprint of rate in a flow of rate , the traffic pattern remains Poisson with rate indistinguishable from the pattern that Willie expects to observe. Hence, the scheme is invisible.

(Reliability) Similar to the reliability analysis in Scenario 1, we upper bound by , for all .

Recall that upon observing , Bob seeks the answer to Question 1. Note that the answer to this question is the same as the answer to the following question:

Question 2: Given that Alice used the codebook what is the index of the fingerprint that was selected by Alice and transmitted through to produce the output packet timings ?

In other words, although Alice generates a codebook whose fingerprints are of rate and then scales each fingerprint to adjust to rate of the flow, Bob’s decoding of each flow is equivalent to the case where Alice uses a codebook whose fingerprints are of rate and she does not scale the fingerprints; the only differences are in the number of fingerprints (codewords) and the time to transmit the fingerprint, as we will explain later. Therefore, from (6), Bob can successfully extract the fingerprint from the flow of rate as long as is large and

(34)

where is the time of the transmission of the fingerprint embedded in the flow of rate . Therefore,

(35)

Since the size of the codebook is , fingerprinting the flow corresponds to transmission of nats of information through the inter-packet delays of the flow . Note that scaling a fingerprint of rate to rate results in transmission of nats of information at a higher rate but over a shorter time.

Since (35) holds for all , we require

(36)

where

(37)

to achieve for each flow. Note that (2) implies that as . Therefore,

(38)

Now, consider . In the second phase, on each link Alice receives and transmits the packets according to two independent Poisson processes of equal rate. Thus, we employ a random walk analysis similar to that of Scenario 1 to show that

(39)

Consequently, by (4), (38) and (39), for all , and thus Alice and Bob’s fingerprinting is reliable.
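The random walk argument used above can be illustrated numerically. The sketch below assumes that the failure event is Alice's buffer running empty during the fingerprinting phase, and it models the buffer level as a symmetric random walk: when arrivals and departures are independent Poisson processes of equal rate, each event is an arrival or a departure with probability 1/2. The function names, buffer sizes, and event counts are hypothetical.

```python
import random

def runs_dry(buffered, num_events, rng):
    """Return True if the buffer level reaches zero within num_events events."""
    level = buffered
    for _ in range(num_events):
        # Equal-rate independent Poisson arrivals and departures: each event
        # is an arrival (+1) or a departure (-1) with probability 1/2.
        level += 1 if rng.random() < 0.5 else -1
        if level <= 0:
            return True
    return False

def estimate_failure(buffered, num_events, trials, seed=0):
    """Monte Carlo estimate of the probability that the buffer empties."""
    rng = random.Random(seed)
    failures = sum(runs_dry(buffered, num_events, rng) for _ in range(trials))
    return failures / trials

# A buffer that is small relative to sqrt(num_events) empties often;
# a buffer of several times sqrt(num_events) almost never does.
p_small = estimate_failure(buffered=5, num_events=400, trials=2000)
p_large = estimate_failure(buffered=60, num_events=400, trials=2000)
```

The comparison shows why the buffering phase matters: the walk typically drifts on the order of the square root of the number of events, so a buffer that is large relative to that scale makes failure unlikely.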

(Number of flows) The analysis is similar to that of Scenario 1. As , we require

(40)

which we can achieve as long as

(41)

Since for , , Alice and Bob can invisibly and reliably break the anonymity of a network (Setting 1 shown in Fig. 0(a)) with flows. ∎

V-B Scenario 4: Alice fingerprints a subset of the flows, Setting 1

Consider Scenario 4, which is the extension of Scenario 2 to arbitrary rates: Alice fingerprints a subset of the flows, and is unknown to Willie. Similar to Scenario 2, since Willie has to investigate a large set of flows to detect if some are slowed down in the first phase as required for fingerprinting, Alice can make more fingerprinted flows invisible.

For each flow in , she selects a unique fingerprint from her codebook and alters the timings of that flow according to the fingerprint. We consider Setting 1 (see Fig. 0(a)), i.e., parallel queues with multiple inputs and outputs, where each queue is shared between a flow from Alice to Bob (main flow) as well as other interfering flows independent of the main flow. Flow rates are , which can be arbitrary, and the main flow passing through the queue () has the rate of . Alice fingerprints the input flows of the network in the time interval , and Bob extracts the fingerprints from the flows on the output links of the network to infer the connections between input and output flows.

For each selected flow Alice selects a codeword from her codebook and embeds it in the flow by changing its packet timings according to the selected fingerprint. Since flow rates are arbitrary, similar to Scenario 3, she builds her codebook based on the minimum rate of the flows to be fingerprinted and scales each fingerprint based on the rate of the flow to be fingerprinted. Also, she uses a two-phase (buffering-fingerprinting) scheme.

We calculate the number of flows () that Alice fingerprints using this scheme, asymptotically as a function of .

Theorem 3.2.

Consider Setting 1 (see Fig. 0(a)). In a set containing flows with rates , if Bob observes only the fingerprinted flows (), Alice and Bob can invisibly and reliably trace flows in a time interval of length , where is given in (18), where is replaced with which is given in (37).

The construction and analysis follow from those of Scenario 2 with modifications due to arbitrary rates. The extension to arbitrary rates follows from that of Scenario 3. ∎

VI Mixing flows with and without fingerprints

We have previously assumed that Bob only observes the set of fingerprinted flows, i.e., . In practice, however, Bob might observe a set of flows in which some of the flows are not fingerprinted, and therefore he must be able to detect whether a flow contains a fingerprint. In this section, we consider Setting 2 (see Fig. 0(b)) and we present Scenarios 5 and 6, which are extensions of Scenarios 1 and 2, respectively, to the case where Bob observes a set of flows in which some are not fingerprinted. We present a detector for Bob that is able to detect whether a flow is fingerprinted.

VI-A Scenario 5: All flows are fingerprinted and Bob observes flows with and without fingerprints, Setting 2

Consider Scenario 5, which is the extension of Scenario 1 to the case where Bob observes flows with and without fingerprints (): Alice fingerprints all of the flows she observes (), flow rates are equal (), and Bob observes flows with and without fingerprints. We consider Setting 2 (see Fig. 0(b)), i.e., parallel queues with single input and output. Alice fingerprints the input flows of the network in the time interval , and Bob extracts the fingerprints from the flows on the output links of the network to infer the connections between input and output flows.

In contrast to Scenarios 1-4, Bob uses a detector which is able to distinguish whether a flow is fingerprinted. We calculate the number of flows () that Alice and Bob can trace by fingerprinting using this scheme, asymptotically as a function of .

Theorem 4.1.

Consider Setting 2 (see Fig. 0(b)). If Alice fingerprints all input flows () whose rates are equal () and Bob observes a set of flows with and without fingerprints (), then Alice and Bob can invisibly and reliably trace flows in a time interval of length .

Construction: The only difference between the construction of Scenarios 1 and 5 is that, for Scenario 5, Bob must use a detector which detects if a flow contains a fingerprint.

Here, Bob’s decoder is different from the maximum likelihood decoder proposed in [32, p. 9], which for each codeword calculates the service times that yield , removes the codewords that result in negative values of service times, and finally finds a unique codeword that corresponds to the minimum sum of service times. Instead, Bob’s decoder selects a threshold , applies a function on each codeword, and finds a unique codeword that generates an output for the function that is larger than .

Next, we describe Bob’s decoder in detail [34, p. 12]. For and , if is the sequence of packet timings before the flow passes through (inter-arrival times), then the pdf of the observed packet timings (inter-departure times) is:

where is the exponential pdf with mean , and is the waiting time, the amount of time that the queue waits until it receives the packet. Since the packet timings of the fingerprinted flow are an instantiation of a Poisson process of rate , the joint pdf of the inter-arrival times is . Consequently, the pdf of is:

(42)

Bob’s decoder finds a unique fingerprint (codeword) from that satisfies ; if such a unique codeword does not exist, it outputs "flow not fingerprinted".
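The decision rule of this threshold decoder can be sketched as follows. The sketch takes the per-codeword function outputs as given, since it does not attempt to reproduce the function that Bob applies to each codeword. The name threshold_decode and the example scores are hypothetical, and returning None stands in for the "flow not fingerprinted" output.

```python
from typing import Optional, Sequence

def threshold_decode(scores: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the unique codeword whose score exceeds threshold.

    Returns None (flow not fingerprinted) when no codeword, or more than
    one codeword, exceeds the threshold.
    """
    above = [i for i, s in enumerate(scores) if s > threshold]
    return above[0] if len(above) == 1 else None

# Hypothetical scores for a three-codeword codebook.
decoded = threshold_decode([0.1, 2.3, 0.4], threshold=1.0)     # exactly one above
ambiguous = threshold_decode([1.5, 2.3, 0.4], threshold=1.0)   # two above
none_found = threshold_decode([0.1, 0.3, 0.4], threshold=1.0)  # none above
```

Unlike a decoder that always returns the best-scoring codeword, this rule can decline to decode, which is what lets Bob flag flows that carry no fingerprint.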

Analysis: The analysis follows from that of Scenario 1. The only differences appear in the analysis of Bob’s decoding error probability. The auxiliary threshold decoder used in the analysis of the mismatched decoder in [34, pp. 413-417] provides what we need for our application. If Bob uses this detector, the decoding error probability of a fingerprinted flow will be:

(43)

which implies that if we generate independent instantiations of a Poisson process of rate on a time interval of length , , we select one of them and send a packet stream whose packet timings follow over the network, then the probability that at least one satisfies tends to zero, i.e.,

(44)

Consider the case where Bob observes a flow that is not fingerprinted. Recall that the packet timings of all the flows follow a Poisson process of rate . Denote by an instantiation of a Poisson process that corresponds to the packet timings of this flow before it passes through the network. If Bob detects a fingerprint, it must be that one of the fingerprints in the codebook resulted in . Hence,

(45)

Recalling that and are independent instantiations of a Poisson process of rate , (43) and (44) yield as . Thus, Alice and Bob’s fingerprinting is reliable. ∎

VI-B Scenario 6: Alice fingerprints a subset of the flows and Bob observes flows with and without fingerprints, Setting 2

Consider Scenario 6, which is the extension of Scenario 2 to the case where Bob observes flows with and without fingerprints (): Alice fingerprints a subset of the flows she observes (), flow rates are equal (), and Bob observes flows with and without fingerprints. We consider Setting 2 (see Fig. 0(b)), i.e., parallel queues with single input and output. Alice fingerprints the input flows of the network in the time interval , and Bob extracts the fingerprints from the flows on the output links of the network to infer the connections between input and output flows.

Similar to Scenario 5, Bob’s detector is able to distinguish whether a flow is fingerprinted. We calculate the number of flows () that Alice and Bob can trace by fingerprinting, asymptotically as a function of .

Theorem 4.2.

Consider Setting 2 (see Fig. 0(b)). In a set containing flows with equal rates (), if Bob observes flows with and without fingerprints (), Alice and Bob can invisibly and reliably trace flows in a time interval of length , where is given in (18), where is replaced with

(46)

Note that the replacement of with