Harvesting Time-Series Data from Service-Based Systems Hosted in MANETs

09/23/2018 ∙ by Petr Novotný, et al. ∙ Association for Computing Machinery ibm 0

We are concerned with reliably harvesting data collected from service-based systems hosted on a mobile ad hoc network (MANET). More specifically, we are concerned with time-bounded and time-sensitive time-series monitoring data describing the state of the network and system. The data are harvested in order to perform an analysis, usually one that requires a global view of the data taken from distributed sites. For example, network- and application-state data are typically analysed in order to make operational and maintenance decisions. MANETs are a challenging environment in which to harvest monitoring data, due to the inherently unstable and unpredictable connectivity between nodes, and the overhead of transferring data in a wireless medium. These limitations must be overcome to support time-series analysis of perishable and time-critical data. We present an epidemic, delay tolerant, and intelligent method to efficiently and effectively transfer time-series data between the mobile nodes of MANETs. The method establishes a network-wide synchronization overlay to transfer increments of the data over intermediate nodes in periodic cycles. The data are then accessible from local stores at the nodes. We implemented the method in Java EE and present evaluation on a run-time dependence discovery method for Web Service applications hosted on MANETs, and comparison to other four methods demonstrating that our method performs significantly better in both data availability and network overhead.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Mobile ad hoc networks (MANETs) are used to establish communication in difficult environments such as search and rescue, forest-fire fighting, and war zones. The increasing sophistication of end-user devices has led to an increase in the richness and complexity of the systems deployed on MANETs, including those structured as interconnected and interdependent (micro)services, what are referred to as service-based systems Esfahani et al. (2016); Jalaparti et al. (2013); Lund et al. (2007, 2010). Our interest is in developing methods to help manage those systems in MANET deployment environments.

The foundation of any such management method is the collection of run-time data describing the “health” state of the system. In service-based systems designed for and deployed in MANETs, these data change frequently, as the system attempts to adapt to the dynamicity of the underlying network. For example, the bindings between services can dynamically change according to the status of the nodes, links, and paths.

Time-varying, local data are observed by monitors situated at the nodes of the network and captured as a time series. The time-series data are later harvested from the nodes and presented to a management element located somewhere in the network so that it can perform a global state analysis, such as fault identification Novotny et al. (2015); Sapello et al. (2010), service discovery and composition Issarny et al. (2011), or service re-placement Novotny et al. (2015); Silvestri et al. (2015). As it turns out, harvesting is an especially difficult problem in MANETs due to the inherent resource limitations of wireless devices and the fact that mobility can lead to network instabilities, asymmetries,111An asymmetric network is one in which there is a path from node to node , but no path from to . Wireless media are subject to such communication asymmetries, sometimes even between neighbors. and partitions resulting from relative node movements (e.g., going in and out of range) and link properties (e.g., directionality, interference, and noise) Khandani et al. (2008).

Figure 1: Reachability of MANET nodes in two scenarios.

To appreciate the seriousness of the problem, consider Figure 1, which is a preview of experimental results presented in this paper. The figure shows how the reachability of a node, and therefore access to the data stored on it, can degrade with time in MANET environments. The experiment examines two different mobility behaviors under the same hypothetical service-based system. One behavior is characterized by random, independent movements of nodes, while the other is characterized by collective, grouped movements. The experiment models a management element, residing at some node in the network, needing access to the monitoring data stored on a specific subset of nodes in the network. Approximately 90% of those nodes are reachable at time zero. As time progresses and nodes move about, more nodes in that subset become unreachable, meaning that less monitoring data are available to the management element. Interestingly, we can see that random node movements can cause more problems than grouped node movements.

The general problem of harvesting distributed data is, of course, not new. Many mature solutions exist in the domain of wired, stable networks, where brute-force techniques exploit the high capacity, reliable network environment (e.g., IBM’s Tivoli). For example, the data can be continuously streamed to one or more repositories, or the individual nodes can be reliably contacted to provide their data on demand.

In the dynamic, unreliable, and resource-limited domain of MANETs, however, the key issue is guaranteeing the availability of relevant and timely data whenever and wherever it is needed, even if a node whose data is of interest is unreachable. Existing mechanisms for data dissemination in MANETs either ignore the problem of network asymmetries and partitioning, assume prior knowledge of the intended destination of the data, or focus on single aggregated values rather than the time-series data required for system management tasks (see Section 2).

In general, then, the harvesting method must be sensitive to the properties of time-series data and management tasks in MANETs. The data are time bounded, meaning that only some portion of a monitor’s stored data will be relevant to a given task, and time sensitive, meaning that their utility in a task degrades over time. Moreover, only data from some subset of the monitors may be of interest to a task. Finally, network instability and dynamics mean that the location of the management node within the network topology cannot be anticipated. In fact, the monitors may be unaware of whether, which, and where their data might eventually be used.

Conceptually, our idea is to spread the data in an epidemic, delay tolerant, and intelligent fashion, trading extra communication and storage for increased availability in the presence of node reachability problems. In particular, the method is:

  • epidemic in that it uses a gossip protocol Birman (2007) to create a network-wide data transmission and replication overlay. Instead of requiring a management node to obtain data directly from a monitored node using an end-to-end path, the data are transferred to and stored at intermediate nodes from which the data are then also available. Our algorithm for selecting peers from among neighbors at each cycle in the gossip protocol is designed to use connectivity metrics and a standard random selection process to account for the dynamic MANET topology and limited connectivity.

  • delay tolerant in that it opportunistically uses whichever neighbors happen to be in range at a gossip cycle, rather than relying on a fixed topology to reach some desired (albeit unknown) end point. This is an approach to moving data across an unstable network that probabilistically guarantees that the data will be in reach of a management node.

  • intelligent in that it is sensitive to the time-bounded and time-sensitive nature of time-series data. Moreover, it minimizes the data exchanged among peers through a synchronization and aggregation algorithm that takes account of the history of past encounters and exchanges.

We have implemented our method in Java EE and carried out an extensive set of validation experiments using the US Naval Research Laboratory’s Common Open Research Emulator (CORE) Extendable Mobile Ad-hoc Network Emulator (EMANE) facilities Ahrenholz (2010). We compare the performance of our harvesting method against four data harvesting methods, through a case study: the capture of service dependence data Novotny et al. (2015). The experiments explore a range of network and system dynamics, with the primary dependent variable being the accuracy of the dependence graphs produced from the data as compared to the ground truth.

The novelty of the work presented in this paper is two fold: (i) an extension of ideas developed in the domain of epidemic protocols to solve the problem of efficiently spreading time-series data and (ii) doing so in the context of the unstable and unreliable environment of MANETs. The method is specifically designed to address the needs of service-based systems, however, in generalized form it can be applied to harvest any type of time-series data in MANETs such as data collected from IoT sensors, system monitors and other sources.

We next review related work. In Section 3 we present the distributed monitoring architecture and gossip protocol underlying our method. Our experimental setup and evaluation are detailed in Sections 4 and 5, respectively. We conclude with a summary of our observations and a look at on-going and future work in Section 6.

2 Related Work

The problem of maintaining the availability of data in a MANET environment can be seen as a special case of the more general problem of data sharing in that environment. From this point of view, there are three broad areas of related work: peer-to-peer data access, data replication, and epidemic protocols. While these areas are not orthogonal, they represent the major foci of relevant work in the MANET literature.

Peer-to-peer data access. Several techniques have been proposed to create peer-to-peer overlays that can compensate for the dynamic nature of a mobile network Heer et al. (2006); Oliveira et al. (2005). The most advanced of these techniques are distributed hash tables (DHTs), which are designed to provide fast access to distributed data in the face of “churn” in the set of peers.

A DHT primarily serves as an efficient look-up service (“index”), not as a storage service per se Shah and Qian (2011). As such, DHTs help improve the availability of data only in terms of finding where in the network the data, and possibly redundant copies of those data, can be found, but not in directly addressing what can happen if the nodes storing the data are themselves unavailable. Moreover, although designed for churn, DHTs are not well suited to environments that experience especially high rates of churn, as can be the case in MANETs.

Data replication. The goal of data replication is to increase data availability by creating consistent replicas on multiple network nodes Derhab and Badache (2009); Padmanabhan et al. (2008). Most techniques make use of information provided by or known in advance about the users of the data Bellavista et al. (2005); Hara (2003); Hara et al. (2004); Hara (2001), aiming to create replicas that are topologically near to consumers. We make the weaker assumption that such information is unknown. Similarly, caching techniques Atsan and Öznur Özkasap (2013); Fiore et al. (2011); Hara and Madria (2006); Yu et al. (2005) replicate frequently used parts of data into temporary stores topologically closer to users. However, since requests for data are infrequent, and typically only refer to small subsets of the available data, caching is ineffective.

Some replication techniques try to optimize placement so as to minimize the amount of data movement, either by limiting the locations where data can be created Pacitti et al. (1999) or by delaying and bundling updates for transfer in bulk Moon and Cho (2004). These techniques are essentially transactional in nature, viewing the data as a database that is subject to updates, whereas data in our framework is in the nature of a time series.

Only a small number of data replication techniques explicitly address the problem of disconnected nodes. Huang et al. Huang et al. (2003) and Wang and Li Wang and Li (2002) use mobility and data usage patterns to place and redistribute replicas. Hauspie et al. Hauspie et al. (2002) attempt to predict network partitions, by measuring the quality of links and end-to-end paths, in order to decide where best to place replicas. Although these techniques do indeed increase data availability in partitioned networks, they introduce substantial additional network overhead to test links and discover paths.

Epidemic protocols. An epidemic protocol (and the related technique of gossiping) is a distributed algorithm that uses periodic interactions between nodes to propagate data through a network Birman (2007); Dimakis et al. (2010); Friedman et al. (2007); Reina et al. (2015). Particularly relevant to the problem addressed in this paper is that the protocols do not require reliable communication links. The peers (or neighbors) engaged in the protocol are usually chosen randomly. In the context of MANETs, epidemic protocols are typically used in conjunction with in-network computations, such as distributed signal processing Rabbat and Nowak (2004); Yu et al. (2004), to decrease the overall network load by propagating only aggregate data values. Our method makes heavy use of epidemic protocols and in-network computations, where those computations are instead meant to intelligently synchronize the time-series data traversing common network paths rather than perform simple data aggregations.

3 The Harvesting Method

We now introduce our harvesting method by first describing the basic monitoring architecture and then detailing how the data are disseminated through the network using a gossip protocol.

3.1 Monitoring architecture

At each (participating) node in the network, the harvesting method makes use of two kinds of components and a local store. One component is a monitor, situated at a node, responsible for gathering time-series data about the local state of the network and hosted applications at that node. Monitors store the time-series data according to time slots of some length and use a sliding expiration window such that only a limited history is maintained. A time slot represents either a single measurement or an aggregate series of measurements taken within the period of time represented by the time slot. The time series thus consists of a continuous sequence of time slots. What specific data are collected and how monitors manage to gain access to those data are not of concern here, as this is largely a domain-specific problem solvable in a variety of well-studied ways; in the case study of Section 4.1 we describe one particular mechanism and architecture.

Monitoring data are disseminated through the network using the second kind of component, synchronization agents, also situated at each node of the network. As their name implies, the agents are responsible for synchronizing the data shared with peer agents. In particular, the agent at a node passes local monitoring data, together with data received from other agents, to the node’s peers. Importantly, the agent also uses the local store to maintain a backup (i.e., history) of data received from other agents. In this way, the agents collectively form a network-wide data dissemination overlay.

The ultimate consumer of monitoring data is an analysis element supporting a network and/or service management task of some sort. We assume that the analysis element is itself located at a (mobile or fixed) node in the network. In fact, there may be several such analysis elements located across the network, but we do not assume that the location of an analysis element is a globally known property. Notice, however, that since the dissemination overlay formed by the synchronization agents includes the node hosting the analysis element, a query to a local store by the analysis element is all that is required to retrieve network-wide monitoring data.

3.2 Gossip protocol

Synchronization agents employ a gossip protocol to proactively disseminate the monitoring data throughout the network. Unlike previous gossip protocols, which are used mainly to compute and propagate aggregate values Birman (2007); Dimakis et al. (2010), our method is designed specifically to deal with time-series data whose utility is both time bounded and time sensitive (Section 1).

At a high level, each agent repeatedly performs the following three steps in cycles (Figure 2):

  1. Select peers. The first step is to select the peers with which to synchronize. In order to maintain the simplicity, yet efficiency, of data synchronization under network mobility and wireless link unreliability, we use a random selection of the peers from candidates within a certain network distance.

  2. Determine transfer datasets. For each peer, a dataset to be sent is calculated based on changes in the data since the last successful synchronization with that peer. Two criteria are used to determine what dataset to send to each peer: data completeness and data freshness.

  3. Transfer dataset. The dataset for each peer is placed into a space-efficient data structure and sent to that peer. Since network links are unreliable, the agent records whether the transfer was successful by updating a synchronization timestamp associated with that peer.

We now describe each of these steps in detail.

Figure 2: The gossip cycle.

Select peers. The goal of peer selection is to ensure an even dispersion of the data across the network in a simple yet effective manner. We use a random, memory-less peer selection process, in which the peers are selected regardless of which and when the data were previously sent Jelasity et al. (2007), since it is impractical to maintain a structured overlay topology across a mobile network. Friedman et al. Friedman et al. (2007) demonstrated that this strategy is effective for information dissemination in MANETs.

Using this process, each agent creates a candidate set of peers. The candidate set is chosen based on a connectivity metric

such that it contains only those with a high probability of successful data transfer. It is well documented that as the hop distance between two nodes increases in a MANET, the packet delivery ratio between those nodes decreases dramatically 

De Couto et al. (2005). Hence, we use the hop distance metric, which can be conveniently obtained from the local routing table, as the selection criterion. Specifically, a peer is added to the candidate set if the hop distance is within a certain threshold value (i.e., upper-limit parameter). A value of “1” is a neighbor in the network. Once the candidate set is established, a random subset of nodes in the set is selected as the actual peers to be sent the data. The number of selected peers is bounded by a configuration parameter, as is the hop-distance upper-limit.

An illustrative example of the peer selection mechanism is shown in Figure 3. In this example, the distance threshold for nodes to be included in the candidate set is set to 1 hop. The upper bound of the number of peers is set to 2.

In the first cycle, shown in Figure 3a, the candidate set will contain all nodes directly reachable from the center node. The direct reachability is shown with the dashed circle. The candidate set will thus contain nodes {1,2,3,4}. From this candidate set two peers {1,2} are randomly selected. In the next gossip cycle, shown in Figure 3b, the positions of the nodes have changed and now the candidate set contains nodes {2,3,7,5,8} and the randomly selected peers are {3,8}.

(a) (b)
Figure 3: Peer selections in two gossip cycles.

Determine transfer dataset. The data stored locally by a synchronization agent are conceptually organized into a table (Figure 4), with columns for each time series. The rows contain values for each time slot in the series. Once the peers are selected at the beginning of the gossip cycle, each agent determines the dataset to transfer to each peer individually. The goal of this process is to propagate the data as efficiently as possible, as well as to keep the data fresh. To do this, for each peer , and for each time series , the source agent maintains (and updates) the most recent time slot number, denoted by , in that was successfully transferred to . It then uses the following criteria to determine which time slots of each time series to send to each peer:

  1. Only the incremental changes in each time series since the last successful synchronization to the peer are sent, i.e., only data in time slots in time series  are sent to peer .

  2. Time series data have an aging limit, i.e., only the values in time slots no older than a maximum limit are transferred. This aging limit is a system configuration parameter, ensuring that obsolete data are not included in the synchronization process. Note, however, that eliminating data from a synchronization does not necessarily mean those data are dropped completely from the harvest, since agents could have received those data from other agents via other paths in the synchronization overlay.

  3. To reduce communication overhead, certain empty time slots, which indicate the absence of monitoring data, are not transferred to peer . In particular, those that would appear at the beginning or end of the dataset are dropped, while those appearing between values are kept. We do this to simplify how a time series can be reconstructed at a peer. Of course, more sophisticated compression techniques, both lossy and loss-less, could be considered, depending on how the monitoring data are to be used.

To enable the correct reconstruction of a time series, two additional pieces of meta information are also transferred along with the values of each time series: (i) unique identifying information for each time series, defined and provided by the monitors that generate the time series, and (ii) the timestamp of the last (newest) time slot included in the dataset.

Figure 4: Selection of time slots to include in a transfer dataset, sent from the agent to the peer. The transfer dataset (light shade) is extracted from the original data locally stored on the node of the source agent.

Figure 4 provides an example of the time-slot selection process at a given node as it attempts to synchronize with a peer. The figure shows five time series, D1–D5. The symbol “X” represents the presence of some data value in a slot, the dark shading indicates time slots that were successfully sent to the peer in previous gossip cycles, and the timings of the current and previous gossip cycles are shown as horizontal lines.

The light shading indicates the time slots selected for transfer in the current gossip cycle according to the criteria described above. For time series D1, three time slots will be transferred; the first and last are included because they contain some values, with an empty time slot in between. For D2, five time slots will be transferred, but in this case some slots older than the last gossip cycle are also included. These older slots were either received from some other node after the last gossip cycle or were not successfully transferred in a previous cycle.

Notice that the trailing (i.e., the most recent) empty slot is not included in the transfer dataset. Similarly only two time slots are included in the transfer dataset for D3, for which only those time slots older than the last gossip cycle are not empty. Finally, no time slot will be transferred for either D4 or D5: no new value is present in D4 since the last successful synchronization, and the time slot of new values in D5, while it was not previously transferred, is older than the age limit.

Transfer dataset. After the peer and transfer dataset have been determined, the dataset is sent to that peer. However, since the network links can be unreliable, the agent waits some predetermined timeout period for confirmation of successful receipt. Only once receipt is confirmed does the agent advance the timestamp of the last successfully synchronized time slot for that peer.

If, due to a network or other failure, the dataset transfer cannot be completed or confirmed, the agent will abandon the synchronization process with that peer in the current cycle. The confirmation timeout period is a system parameter, obviously no longer than the time between two successive gossip cycles, as this may lead to a race condition in which the agent repeatedly attempts to send data that might already have been received.

Note that the length of the time slots used by the gossip protocol may not correspond to the length of the time slots used by the monitors. Thus, the data maintained in the time slots by the monitors may be aggregated and/or split in order to fit into the transfer time slots. The aggregation of the data in the time slots used for transfer impacts the overhead imposed on the network. This impact is explored in Section 5.

Upon receipt of a dataset, an agent will reconstruct the transferred portion of the time series (including any empty slots) and append it to the time-series data in its local store. The data in the local store are retained for a limited period of time, with obsolete data pruned regularly to avoid extraneous use of resources.

4 Evaluation Methodology

Our purpose in the evaluation is to understand the performance of the harvesting method under various conditions. In particular, we are interested in how well the method can improve the reachability (i.e., availability) of monitoring data in the face of network dynamics.

In this section we describe the method we use to evaluate our approach. The method is based on a case study in which we experiment with the problem of discovering service dependencies. After describing the case study, we detail the tools, metrics, and scenarios used to conduct the experiments.

4.1 Case study: service dependence data

In previous work, we developed a technique for discovering dynamic dependencies among the distributed components of MANET-hosted applications that are structured as assemblies of (micro)services Novotny et al. (2015). The technique suffered from the problem that it assumes all nodes of interest are reachable, on demand, from the node where the dependence analysis is to be carried out. In fact, it is a common occurrence that not all nodes are reachable, which significantly reduced the effectiveness of the technique and inspired the design of our new harvesting method. In this case study, we evaluate how well the new method can improve the availability of the monitoring data and, thereby, the effectiveness of the dependence discovery technique.

In service-based systems, a dependence is a relation between services defined by the message flow, called a conversation, induced by a client request and normally ending with a response to that request. (A dependence is also the relation between a client and a service. Without loss of generality, we mainly focus here on relations among services.) When a dependence relation exists between two services, one is considered the source and the other the target. In general, sources issue requests (i.e., method calls) on targets, thus defining a directionality to the dependence. Targets are expected to provide replies (i.e., response messages) back to sources.

A dependence graph (DG) captures the run-time dependencies among services and is the output of a dependence discovery analysis tool. A DG can be used to represent the full set of dependence relations in the system, or can be restricted to a subset of those relations. Figure 5 depicts a simple example of several DGs rooted at clients. Highlighted in the figure are Client 2 and the service instances it employs, both directly (Service 2) and transitively (Services 4, 5, and 8). A DG can be combined with network and service failure data to perform global fault identification tasks Novotny et al. (2012); Tati et al. (2013). For example, we can probabilistically identify the root cause of a failed conversation.

To cope with network topology changes, service-based systems deployed in MANETs make use of dynamic service binding mechanisms Mian et al. (2009). This leads to time-varying dependencies, which in turn are represented as a time series of data points, each giving a snapshot of the dependencies at a given instant. For meaningful use in dependence discovery, relevant and timely dependence data must be available to the analysis tool.

Figure 5: Service dependence graphs.

Consistent with the generic monitoring architecture (Section 3.1), the dependence data are gathered by local monitors. In this case, the monitors are deployed within service containers to observe service-level message traffic (Figure 6). The dependence data for a particular client conversation, covering a specific time period, is provided for the benefit of a dependence discovery analysis element that then produces a corresponding DG. We assume that the dependence discovery element is a component that can be hosted in any arbitrary node or nodes (mobile or fixed) in the MANET, but that its specific location or locations is not globally known.

Figure 6: Monitoring and harvesting dependence data. Note that only two of possibly many nodes are shown.

Each time series represents the time-varying dependencies between a source and target, in which entries for each time slot are Boolean data about whether or not the given dependence occurred within that time slot. When the monitor detects the occurrence of a dependence, it signifies this by setting a 1-bit flag in the corresponding time slot. It also records identifying information about the source and target of the dependence. The set of time slots thus represents an aggregated time series of dependencies. The set of relevant time slots shifts as new time slots are added and obsolete ones removed, reflecting the changes in dependencies.

4.2 Experimental setup

As mentioned in Section 1, we conduct our experiments in the widely used CORE222http://cs.itd.nrl.navy.mil/work/core/ network and EMANE333http://cs.itd.nrl.navy.mil/work/emane/ mobility emulator frameworks. CORE provides a network experimentation environment using the container-based virtualization facility of the Linux platform, while EMANE provides real-time modeling of wireless-link and physical-layer connectivity. This combination provides high-fidelity, real-time emulation. A detailed description of the evaluation stack is provided elsewhere Novotny and Wolf (2016).

The application running within the MANET is built as a generic Web service system based on Java EE Glassfish,444https://glassfish.java.net/ the reference implementation of Java EE’s application platform, and Glassfish Metro,555https://metro.java.net/ a reference implementation of a standard Java Web services stack. The system is composed of two kinds of configurable components, a generic client application and a generic Web service, structured as a 2-tiered system. The first tier consists of client-facing, “front-end” services, while the second tier consists of interconnected “back-end” services. In our experiments, we use 50 clients, five front-end services, and 20 back-end services. When starting a conversation, each client invokes a method selected uniformly at random from all methods provided by the front-end services. The invocation triggers a cascade of message exchanges between the interconnected back-end services yielding a network traffic.

Monitors are implemented as Tubes in the Metro framework. Tubes are chained components, each responsible for part of the processing of incoming and outgoing service messages. Monitors intercept incoming and outgoing messages of the clients and services to extract the necessary information. The monitor extracts the dependence fields from the intercepted messages and records occurrences of dependencies.

Synchronization agents are implemented as Java applications that repeatedly synchronize with other agents, receive data from remote agents, and maintain in-memory backup stores of the received data. For purposes of evaluation, synchronization agents also record information about their activities in a trace file for later performance analysis (e.g., measurements of network overheads, success rates of synchronization attempts, and the like).

Finally, dependence discovery analysis elements, implemented as a library, are used by clients to discover DGs for the conversations they initiate (i.e., a series of message exchanges), by querying the dependence information found in their local stores as populated by our harvesting method.

Number of nodes 50
Mobility speed 3 - 6.6 km/h
WiFi standard 802.11b
WiFi unicast rate 11Mbps
WiFi multicast rate 1Mbps
Transmit power -15 dBm
Path loss mode 2ray
Routing protocol OLSR
Protocol stack TCP/IPv4
Table 1: Network-layer parameters.
Number of clients 50 (one per node)
Number of services 25
Size of dependence graph 4
Invokable methods per service 2
Workload (client request rate) 30s
Number of service replicas 5
Response timeout 60s
Table 2: Service-layer parameters.
Monitor time slot length 0.1s
Transfer dataset time slot length 0.1s
Transfer dataset response time 60s
Maximum age limit of time slot 300s
Maximum peer distance 1 hop
Number of gossip cycles 0-32
Number of peers 1-10
Table 3: Gossip protocol parameters.

Each (virtual) CORE node runs a client and/or a service, along with a monitor, a synchronization agent, and an analysis element. (The analysis element would not normally be deployed at all nodes, but we do so here to give us maximum flexibility in evaluating the dissemination of monitoring data.) The DGs are constructed on demand by the discovery element. The graphs are rooted at a given client, beginning at a given time instant, and for some time window. Each DG is constructed for a particular conversation; thus, the time window begins and ends with the start and end of that conversation.

In our experiments, one of the clients is chosen as the component requiring the dependence data, such as when it needs to perform a fault analysis as the initiator of a failed conversation Novotny et al. (2012). This reflects a realistic use case in which the location of the analysis element cannot be known a priori.

To observe the effects of network dynamics, we use two different node mobility patterns as scenarios. The first one, “military”, consists of a unit of 50 members, each carrying a mobile device. The members are collected into several subunits, each of which moves as a whole within an area of 2km x 2km, exhibiting a group mobility driven by the nomadic community mobility model Roy (2011). The second, “firefighting”, represents a pattern of independent mobility of 50 members in a 1km x 2km area, driven by the random waypoint mobility model Roy (2011). Other network and service parameters used in the experiments are summarized in tables 3 and 3. Most notably, the mobile nodes move at a walking-like speed in the range of 3 to 6.6km/h. We use WiFi standard 802.11b and OLSR as a routing protocol.

The configuration of the gossip protocol for our experiments is summarized in Table 3. The lengths of the monitor and transfer dataset time slots are both set to 0.1 seconds, except as part of the last experiment presented in Section 5 in which we vary the transfer time slot from 0.1 to 10 seconds. The resolution of 0.1 seconds reflects the need for high precision of the captured dependencies necessary to construct accurate DG. The monitor waits at most 60 seconds for confirmation of successful receipt of the dataset. The maximum age of a time slot to be transferred is limited to five minutes and the selection of candidate peers is limited to network neighbors (i.e., 1-hop distant). The number of gossip cycles within a given time period determines the frequency at which each agent synchronizes the data in its local store with that of its peers. The final parameter, number of peers, allows us to vary the maximum number of other agents with which each agent shares the data in its local store during each gossip cycle.

We collect our results from 40 minutes of execution after excluding 10 minutes of warm up. Each combination of parameters results in thousands of conversations during the 40-minute execution. The results given in the next section are averages over the data collected from these conversations, where each conversation is then a statistical sample subject to the random variables.

The primary evaluation question for our harvesting method is how well the dependence data of each conversation  has been propagated through the network after a certain period of time. More specifically, we measure the quality of the harvest in terms of the ratio of true positives (TP) in the dependence analysis result, defined as follows Novotny et al. (2015); Sapello et al. (2010):

where is the set of discovered dependencies, is the set of ground-truth dependencies, and true positives are in the intersection of these two sets (we assume and are non-empty). A good result for our method would be that it can transfer as many dependencies of as possible, while not decreasing data resolution due to aggregation.

The secondary evaluation question is the network overhead imposed by our harvesting method. We measure the overhead of a network node as an average of sum of all data sent and received by the synchronization agent hosted on the node within a unit of time (i.e. KB/s). The metric includes content (i.e. headers and payload) of control and data messages induced by the harvesting method in all peer-to-peer and client-to-peer exchanges. The dominant traffic in the network induced by the conversations between clients and services, as well as, the messages exchanged by the underlying routing protocol, are both excluded from the metric.

5 Experimental Evaluation

We focus the evaluation of our delay tolerant harvesting method on the following key issues:

  1. impact of the synchronization frequency;

  2. impact of the number of peers;

  3. impact of constraining the number of peers;

  4. comparison to four data harvesting methods;

  5. tradeoff between overhead and precision.

We study these issues in the context of the dependence discovery use case.

5.1 Impact of synchronization frequency

The frequency at which peers attempt to disseminate monitoring data is a fundamental tuning parameter in our method, as it induces a trade off between communication overhead, on the one hand, and increased availability of monitoring data on the other. To isolate the impact of this parameter, we fix the delay between the end of the client conversation and the start of the dependence analysis at five minutes, and vary the number of gossip cycles within that five-minute period from zero (no synchronization) to 32. At each gossip cycle, we have each node select at most one peer node at random from its neighbors in the network.

Figure 7: Availability of monitoring data, as measured by TP ratio, with different numbers of gossip cycles.

We hypothesize that as the number of cycles increases in the period between the end of the conversation and the start of the data analysis, so too should the availability of the monitoring data, resulting in an improved TP ratio. With no synchronization occurring before the data harvesting, the only monitoring data available to the harvesting node are those at its local monitor. However, with every next cycle, the amount of data arriving from remote monitors should increase.

The results are reported in Figure 7. Initially, with no gossip cycles, the only data available are from the client monitors and from the locally hosted services. As the number of gossip cycles increases, the nodes receive increasing amounts of monitoring data from remote monitors via the gossip protocol.

Comparing the two mobility scenarios, the availability of the data increases faster with an increasing number of gossip cycles under the military scenario than under firefighting. This is because, in the military scenario, the services involved in the conversations are with high probability hosted on nodes within the same, relatively stable, group of nodes as the client. On the other hand, in the firefighting scenario, the services involved in conversations in general should be drawn from greater distances, since the nodes are dispersed relatively evenly over a large area, requiring the monitoring data to be transferred between more nodes than in the military scenario and so requiring more gossip cycles to achieve an equivalent TP ratio. Eventually, with 32 cycles, the availability reaches 99.8% in the military scenario and 98.8% in the firefighting scenario.

5.2 Impact of number of peers

We now investigate the impact of the number of peers selected in each gossip cycle. Again, dependence analysis is started five minutes after the conversation ends and the candidate peers are limited to neighbors. We fix the synchronization frequency to one per minute, i.e., four gossip cycles in that five-minute period. Our hypothesis is that increasing the number of peers should increase the availability of monitoring data.

In practice, the number of selected peers is bounded not only by the upper-limit parameter, but also by the number of neighbors. Because the size of the candidate neighbor set depends on the node density and mobility of the network, the number of selected peers on average will be somewhat lower than the configured limit. Furthermore, due to the presence of poor-quality wireless links, the number of peers that successfully receive the data can be smaller than the number of selected peers. Especially when many peers are selected, congestion in the wireless medium hampers message transmission.

This gap can be seen in Figure 8, which shows a varying upper limit, the number of selected peers (“candidates”), and the number of peers that successfully receive the data under the two mobility scenarios. The gap between these numbers widens as the upper limit increases. Notice the subtle difference between the two scenarios. The gap consistently increases in the firefighting scenario, where the nodes are evenly dispersed. On the other hand, in the military scenario, there is little gap between the upper limit and the number of selected peers below a certain point (eight nodes), and then quickly widens beyond that. This is due to the nature of the military scenario, where nodes move in cohesive groups of certain sizes.

The availability of the monitoring data is reported in Figure 9. As expected, with increasing numbers of peers per gossip cycle, the availability of the data also increases. However, a threshold of about four peers is reached beyond which there is little further gain. This is explained by the gap between the upper-limit parameter and the number of peers successfully receiving data, as discussed above. In principle, this threshold should be taken into account in the configuration of the method to balance overhead against the expected quality of the monitoring data analysis result.

Figure 8: Upper limit on peers, number of candidate neighbors, and number of peers successfully receiving data.
Figure 9: Availability of monitoring data, as measured by TP ratio, with different upper limit on peers per gossip cycle.

5.3 Impact of constraining the number of peers

In the next experiment we return to the synchronization frequency, but remove the constraint on the number of peers. Unlike in the previous experiments, the number of selected peers is bounded only by the availability of the neighbors. Hence, the factor that determines the inclusion of nodes is the node density and mobility of the network. The dependence analysis is started five minutes after the conversation ends and we vary the number of gossip cycles within that period. We expect that increasing the number of gossip cycles should increase the availability of monitoring data.

When the number of peers is unconstrained, data are propagated one hop in each cycle from their source in all directions. Thus, the number of cycles needed to make the data available to an analysis element reflects the distance from the nodes on which those data are collected. This distance, however, is impacted by the dynamics induced by node mobility, as well as by the quality of the wireless links. Furthermore, congestion in the wireless medium may have a more significant impact on the transmissions than in the previous experiments, since the data are sent in contiguous sequences to multiple peers.

The results are reported in Figure 10. In the firefighting scenario, where there is a relatively even dispersion of nodes over a large area, the average number of 1-hop-distant peers is 7.2. In the military scenario, where nodes move in relatively stable groups, the average number of 1-hop-distant peers is 10.4. As with the one-peer-per-cycle configuration, increasing the number of cycles increases the availability of the data. A threshold of about two cycles is reached beyond which there is only a small gain from further cycles. The higher number of 1-hop-distant peers in the military scenario leads to a higher availability with same number of cycles than in the firefighting scenario. However, to achieve a data availability of 99% or more, at least four cycles are needed in either scenario to reach more distant nodes.

To compare the one-peer-per-cycle and unconstrained-peers configurations, we evaluate the network overhead imposed by each configuration to achieve a comparable data availability. In the firefighting scenario, the one-peer-per-cycle configuration requires 32 cycles to achieve 98.8% data availability, while the unconstrained-peers configuration resulting in about 7.2 peers-per-cycle requires four cycles to achieve a comparable 99.1% availability. The overhead of the one-peer-per-cycle configuration is 0.32 KB/s per node and 0.38 KB/s per node in the unconstrained-peers configuration. In the military scenario, the one-peer-per-cycle configuration requires 32 cycles to achieve a 99.8% data availability, while the unconstrained-peers configuration resulting in about 10.4 peers-per-cycle requires 8 cycles to achieve a comparable 99.3% availability. The overhead of the one-peer-per-cycle configuration is 0.19 KB/s per node and about 0.29 KB/s per node in the unconstrained-peers configuration. Hence, the unconstrained-peers configuration induces overall about a 30% to 50% higher network overhead than the one-peer-per-cycle configuration to achieve a comparable data availability.

Figure 10: Availability of monitoring data after four gossip cycles, as measured by TP ratio, with different numbers of gossip cycles and an unconstrained number of peers.

5.4 Comparison to data harvesting methods

To place our method in context, we next compare it to two baseline methods: Gossip protocol and DHT, and to two state of the art methods: DAFN and SCALAR.

5.4.1 Gossip protocol

A naïve approach to harvesting would make use of a generic gossip protocol that is not sensitive to the data already possessed by peer nodes, and thus passes all available data in each transmission regardless of past interactions.

We base the naïve method on the design of a standard gossip protocol Jelasity et al. (2007) and use a uniform random selection of peers, push propagation of data, with no confirmation of the success or failure of transmissions. Nevertheless, rationally, we constrain the maximum age of data to be passed.

5.4.2 Dht

A similarly naïve method to harvesting would make use of DHT with locations of time-series within the network, and attempt to transfer data with direct, on-demand data harvesting approach. In that approach, the analysis element attempts to communicate directly with individual monitoring nodes to obtain (“pull”) their data, rather than using a gossip protocol to disseminate (“push”) the data. The availability of the data is therefore limited by the network reachability of the nodes.

5.4.3 Dafn

Dynamic Access Frequency and Neighborhood (DAFN) method Hara and Madria (2006) increases data availability with collaborative message relaying and placing data replicas on mobile nodes based on data access frequency. Every node maintains a table of frequency of accessing data. When node requests data, a request message is broadcasted to its neighbors. Upon reception of the request message, node either responds with data from its local cache or forwards the request to its neighbors. Upon reception of the data, the node stores the data into the local cache. To eliminate redundant data replicas between neighbor nodes, each connected set of nodes elects a coordinator responsible for optimizing the allocation of replicas. The coordinator monitors the allocation of replicas between neighbor nodes and regularly prunes replicas on nodes with lower access frequencies.

5.4.4 Scalar

Scalable data Lookup And Replication framework (SCALAR) Atsan and Öznur Özkasap (2013) is a scalable method of data lookup and passive replication in MANETs. The method builds a dynamic virtual backbone between mobile nodes based on an approximation of minimum connected dominating set within a set of connected nodes. Thus, every node within a set of connected nodes is directly connected to at least one node of the virtual backbone. The virtual backbone is used to minimize the number nodes involved in relaying data request and response messages within the network. A scalable data lookup protocol uses the virtual backbone to relay data request and response messages between clients and data sources. A reactive replication mechanism uses the data lookup protocol to preload data from sources closer to clients. The mechanism preloads data based on monitoring of request frequency and the distance between requester and source of the data.

5.4.5 Comparison of data availability

Figure 11: Comparison of TP ratio of dependencies with DHT, DAFN, SCALAR, Gossip protocol and our Harvesting method (HM) in the firefighting (a) and military (b) scenarios.

Our aim in this experiment is to compare the data availability achieved by the methods under range of delays of starting data harvesting after the end of conversation.

Our method is configured with a single peer per cycle and a frequency of 32 cycles within the delay of data harvesting after the conversation and the Maximum age limit of time slot parameter is set to the length of the delay of data harvesting after the conversation. Since the naïve method uses the same fundamental gossiping mechanism, we configure the method same as our harvesting method, with a single peer and 32 cycles within the delay period. In the DAFN method, we do not constrain the size of cache available for storing the time-series data replicas on nodes. Thus, the method’s decision on reducing the number of data replicas between neighbor nodes is based solely on the frequency of data access. Similarly, we do not constrain the size of the cache available to data replicas in the SCALAR method. To achieve high data availability, we configure the method to actively replicate incremental changes in the time-series if there were more than one request within the delay period.

Figure 11 shows the availability of monitoring data using DHT, DAFN, SCALAR, Gossip protocol and our Harvesting method (HM), measured in terms of the TP ratio. The x-axis represents the delay between the end of a conversation and the beginning of the harvesting. Notice that the general trend of data availability of DHT, DAFN and SCALAR methods follows the reachability of nodes in the network shown in Figure 1 of Section 1. Initially, with no delay in issuing data harvesting requests after the conversation, data are highly available because the topology of the network has not significantly changed. Thus, methods which use the (“pull”) approach for obtaining data on request achieve high data availability. However, with increasing delay between end of conversation and the requests for data from clients, movements of nodes decrease the data availability.

The baseline DHT method, harvesting data directly on-demand over multiple hops achieves 90% and 69% TP ratios immediately after the conversation and declines to 75% and 45% TP ratios when requests for data are issued 16 minutes after the conversation in the military and firefighting scenarios, respectively. The DAFN method uses collaborative relaying of messages coupled with caching of obtained data on multiple replicas within the network. This approach increases the data availability to 93% and 82% TP ratios after the conversation and declines to 81% and 49% TP ratios at 16 minutes after the conversation. The SCALAR method based on relaying messages over virtual backbone, caching and limited (“push”) based replication yields similar data availability at 93% and 80% TP ratios after the conversation and declines to 84% and 55% TP ratios at 16 minutes after the conversation.

The baseline Gossip protocol and our Harvesting method both employ a (“push’) approach to disseminate time-series data through the network. Both methods achieve very high data availability. Immediately after the conversation, before any data can be disseminated in the network, both methods achieve 34% TP ratios in both scenarios, providing only data available in local stores. However, once the data start to be disseminated in the network, the availability increases to 99.9% and 99.6% TP ratios in the military and firefighting scenarios, respectively. The high availability somewhat declines to 98.5% and 97.3% TP ratios when requests are issued 16 minutes after conversation, caused by low frequency of disseminating data from migrating nodes.

5.4.6 Comparison of network overhead

Figure 12: Comparison of transfer overheads of DHT, DAFN, SCALAR, Gossip protocol and our Harvesting method (HM) in the firefighting (a) and military (b) scenarios.

Here we compare the network overhead of the methods. We hypothesize that the DHT based method will induce only a small overhead, while the overhead of the Gossip protocol will be significantly higher due to its sending of the same data multiple times. The DAFN method, flooding the network with request and control messages will also induce significant overhead. The SCALAR method, utilizing caches and limiting the passing of messages to a subset of virtual backbone nodes will likely be more efficient. Our method should ideally induce significantly less overhead than the Gossip protocol and DAFN methods. However, since in our method the same data may be sent along different links at each cycle, the improvement will depend on the topology and connectivity properties of the network, and will certainly induce greater transfer overhead than the DHT. Note, that in this experiment, the delay between the end of conversation and beginning of harvesting varies among the scenarios, however, the total number of analyzed conversations is constant.

The results are reported in Figure 12. The DHT based method yields an overhead of only about 0.04 KB/s and 0.06 KB/s per node in the two scenarios with one minute delay and decreases to 0.02 KB/s and 0.04 KB/s per node with 16 minutes delay. The DAFN method yields significantly higher overhead, about 4.41 KB/s and 7.21 KB/s, and 1.47 KB/s 2.08 KB/s with one and 16 minutes delays respectively, and the SCALAR method yields moderate overhead of about 0.72 KB/s and 0.67 KB/s, and 0.34 KB/s 0.25 KB/s with one and 16 minutes delays respectively.

The Gossip protocol yields substantially higher overhead of about 7.59 KB/s and 13.16 KB/s with one minute delay and decreases to about 1.22 KB/s and 2.21 KB/s with 16 minutes delay. Our method yields low about 0.87 KB/s and 1.24 KB/s and 0.07 KB/s and 0.11 KB/s transfer overhead to achieve the same levels of availability, placing it as expected between the DHT and DAFN as well as below SCALAR in all cases except in scenarios with one minute delay.

5.4.7 Storage requirements

The storage requirements are an artifact of the degree of data dissemination in the network and the number of nodes holding data backups. The time-series data are stored in the form of regularly pruned series of Boolean values. Overall, the storage requirements of the harvesting methods are very small. Our method uses all network nodes to proactively gossip all changes in the monitoring data throughout the network with each node holding backup of the data received. In the explored scenarios, the amount of data stored on the individual nodes was in the order of tens of KB after data older than 20 minutes are pruned. Similarly, DAFN uses all network nodes to relay data from monitors to clients, yet, with a lower degree of data dissemination. The method further actively reduces the number of data backups by eliminating neighbor replicas. The SCALAR method limits the nodes participating in data transfer and holding data backups to those of the virtual backbone. The DAFN and SCALAR methods required about 22 to 29% and 8 to 12% of storage space compared to our method, respectively. The Gossip protocol achieves the same degree of data dissemination as our method and requires same storage space, while the DHT based method does not store any backup data and therefore does not use any storage space.

5.4.8 Comparison summary

The transfer of incremental changes of the time-series data is a challenging task for the data transfer methods. The DHT and the Gossiping protocol represent the two extremes of minimum overhead with low data availability and very high overhead with high availability respectively. The DAFN method with very high overhead provides only somewhat higher availability than DHT. The method is repeatedly flooding the network with requests for latest additions of the time-series data and makes low utilization of caches due to data becoming quickly obsolete. The SCALAR method provides somewhat higher availability than DAFN with lower overhead due to utilization of virtual backbone and caching reducing the number of messages requesting and disbursing latest additions of the time-series data. Our Harvesting method, designed specifically to continuously and efficiently transfer incremental changes in the time-series is yielding very high data availability with low network overhead.

5.5 Tradeoff between overhead and precision

As a final point of evaluation we look at the important issue of overhead versus precision. The amount of data transmitted depends on the size of the system (i.e., the number of time series), the configuration of the method (i.e., gossip cycle frequency and number of peers per cycle), and the length of the time slot within the time series. The size of the system is an application-specific contextual property, while the configuration settings reflect operational requirements. The length of the data transfer time slot impacts overhead imposed on the network, as well as the resolution (compression) of the transmitted data. It thus impacts the precision of the resulting analysis applied to the monitoring data.

To understand this effect for our case study, we use the ratio of false positives (FP) Novotny et al. (2015); Sapello et al. (2010) in dependence graphs to indicate the impact of (im)precision in the dependence data, defined as follows:

where as for the TP ratio defined in Section 4.2, is the set of discovered dependencies and is the set of ground-truth dependencies, under the assumption that and are non-empty.

The FP ratio represents the fraction of dependence data not belonging to a conversation and so erroneously included in the result; a high FP ratio indicates poor precision. These irrelevant dependencies arise from a combination of the monitors aggregating the dependence data into time slots and the inherent behavioral effects of our harvesting method. We hypothesize that increasing the length of the time slot used in the transfer of data will decrease the communication overhead, but increase the FP ratio.

We present our results in Figure 13, where we show the impact of the time slot length on the overhead and FP ratio. Consistent with the previous experiments, the method is configured with a single peer per cycle and a frequency of 32 cycles within period of five minutes, yielding a high availability. The results confirm our hypothesis: with a 0.1 second time slot length, the same as that used by the monitor, the FP ratio is about 16% and 18% in the military and firefighting scenarios, respectively, while the overhead is 0.18 KB/s and 0.32 KB/s on average between pairs of peers. When increasing the length of the time slot to 10 seconds, the FP ratio increases to 35% and 36%, while the overhead decreases to 0.12 KB/s and 0.85 KB/s.

We observe that when the length of the time slot is decreased by 100 times, the FP ratio increases by only about two times, while the overhead decreases by 33% and 73%. This suggests that for applications able to tolerate a higher FP ratio, trading overhead over higher FP ratio might be a viable option. Note that on average a node sends only between 0.09 to 0.32 KB/s of data depending on the scenario and the length of time slot.

Figure 13: Impact of the transfer dataset time slot length on FP ratio and overhead.

5.6 Summary of results

The experimental results presented above establish the relationship between the configuration parameters of our method (namely, the synchronization frequency, the number of peers selected at each cycle, and the length of the transfer dataset time slot), and its ability to improve the availability of monitoring data.

In general, it is shown that selecting either a higher frequency with a lower number of peers, or a lower frequency with a higher number of peers, provide similar outcomes measured in terms of the availability of the monitoring data at remote nodes. Furthermore, selecting a shorter length for the transfer time slot provides higher precision than a longer time slot. However, the data transfer overhead is significantly higher with a shorter time slot length.

When considering network overhead, configurations using a lower number of peers and a higher number of cycles achieves comparable data availability with lower overhead, than configurations with a higher number of peers and a lower number of cycles.

In comparison to DHT, DAFN, SCALAR and Gossip protocol, our approach achieves significantly higher data availability and lower overhead, especially in dynamic network environments such as MANETs, where mobility can easily lead to nodes becoming unreachable, yet their data remaining critical for analysis.

6 Conclusion

We have presented a method to improve the availability of time-series monitoring data for managing service-based systems. To overcome the limited connectivity of the MANET nodes, the method transfers the data over intermediate nodes in successive gossip cycles. The method minimizes the network overhead caused by continuous and repeated data transfer by storing local information about previous transfers, imposing limits on the age of the data, and eliminating irrelevant data. Through an extensive set of emulation-based experiments, we have evaluated the capacity of the method to transfer data from monitors to management nodes in two types of MANET environments. Within the context of a case study, we demonstrated that the method improves the results of management analysis tasks due to the increased availability of monitoring data. Moreover, we have shown how to tune the method to minimize the network overhead in the resource-constrained MANET environment.

We have observed that the storage and communication overhead of the method is relatively small. Of course, the actual amount of data stored and transferred will depend on the nature of the time series of interest. Putting this into context, the overhead is negligible compared to the overhead of the service platform and services hosted within it.

We have compared the method to four other data harvesting methods and shown that harvesting time-series data in MANETs is a challenging task in which the method performs significantly better in both, data availability and network overhead.

An important issue that we have yet to explore is how to select the parameters when the fundamental nature of the network dynamics changes. This is a challenging problem because a good configuration is dependent upon a variety of factors, including: (i) network topology characteristics, such as node degree and network diameter, which affect the data propagation speed critical to the epidemic process; (ii) node mobility characteristics, which affect the patterns of node groupings and network partitioning; (iii) urgency of the data, which determines how quickly the data need to be available at remote nodes; and (iv) the communication bandwidth available for the epidemic process.

Ideally, we would like the method to be self-configuring, such that it can recognize the factors above and deduce appropriate configuration parameters. There are several approaches to consider incorporating into our design for this purpose:

  • a closed-loop process in which the current network status and the data propagation rate are piggy-backed within the data synchronization messages;

  • an advanced peer-selection process that considers the history of past dataset transfers and the full or partial knowledge of the current dataset stored in the peer candidates, rather than a statistically random selection process, as a way to increase the speed of data dispersion within the network;

  • a hybrid approach that combines the “push” epidemic data propagation method with the “pull” on-demand method, where the pull would be initiated only when the data from remote nodes are recognized to have not yet been pushed all the way to the management node;

  • a back-pressure-like protocol for data dissemination in which the synchronization agents can adjust their synchronization rates according to available communication bandwidth and the data availability at other nodes; and

  • a broadcasting based data dissemination approach to increase the rate of data dispersion in suitable cases (such as with high density of one hop-distant neighbors).

We are currently exploring these approaches in our on-going work, which should result in an adaptive framework for data propagation and harvesting in dynamic mobile networks.


This research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence and was accomplished under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the author(s) and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.



  • Esfahani et al. (2016) N. Esfahani, E. Yuan, K. R. Canavera, S. Malek, Inferring software component interaction dependencies for adaptation support, ACM Transactions on Autonomous and Adaptive Systems 10 (2016).
  • Jalaparti et al. (2013) V. Jalaparti, P. Bodik, S. Kandula, I. Menache, M. Rybalkin, C. Yan, Speeding up distributed request-response workflows, in: ACM SIGCOMM Conference, pp. 219–230.
  • Lund et al. (2007) K. Lund, A. Eggen, D. Hadzic, T. Hafsoe, F. T. Johnsen, Using web services to realize service oriented architecture in military communication networks, IEEE Communications Magazine 45 (2007) 47–53.
  • Lund et al. (2010) K. Lund, E. Skjervold, F. T. Johnsen, T. Hafsøe, A. Eggen, Robust web services in heterogeneous military networks, IEEE Communications Magazine 48 (2010) 78–83.
  • Novotny et al. (2015) P. Novotny, B. J. Ko, A. L. Wolf, On-demand discovery of software service dependencies in MANETs, IEEE Transactions on Network and Service Management 12 (2015) 278–292.
  • Sapello et al. (2010) A. Sapello, A. Sethi, M. Nodine, R. Chadha, Application of time series analysis to fault management in MANETs, in: International Conference on Network and Service Management, pp. 150–157.
  • Issarny et al. (2011) V. Issarny, N. Georgantas, S. Hachem, A. Zarras, P. Vassiliadist, M. Autili, M. A. Gerosa, A. B. Hamida, Service-oriented middleware for the future internet: State of the art and research directions, Journal of Internet Services and Applications 2 (2011) 23–45.
  • Novotny et al. (2015) P. Novotny, R. Urgaonkar, A. L. Wolf, B. J. Ko, Dynamic placement of composite software services in hybrid wireless networks, in: IEEE Military Communications Conference.
  • Silvestri et al. (2015) S. Silvestri, B. Holbert, P. Novotny, T. L. Porta, A. L. Wolf, A. Swami, Inferring network topologies in MANETs applied to service redeployment, in: 24 International Conference on Computer Communication and Networks.
  • Khandani et al. (2008) A. E. Khandani, J. Abounadi, E. Modiano, L. Zheng, Reliability and route diversity in wireless networks, IEEE Transactions on Wireless Communications 7 (2008) 4772–4776.
  • Birman (2007) K. Birman, The promise, and limitations, of gossip protocols, SIGOPS Operating Systems Review 41 (2007) 8–13.
  • Ahrenholz (2010) J. Ahrenholz, Comparison of CORE network emulation platforms, in: IEEE Military Communications Conference, pp. 166–171.
  • Heer et al. (2006) T. Heer, S. Gotz, S. Rieche, K. Wehrle, Adapting distributed hash tables for mobile ad hoc networks, in: 4 Annual IEEE International Conference on Pervasive Computing and Communications Workshops, pp. 173–178.
  • Oliveira et al. (2005) L. B. Oliveira, I. G. Siqueira, D. F. Macedo, A. A. F. Loureiro, H. C. Wong, J. M. Nogueira, Evaluation of peer-to-peer network content discovery techniques over mobile ad hoc networks, in: 6 IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks, pp. 51–56.
  • Shah and Qian (2011) N. Shah, D. Qian, An efficient unstructured P2P overlay over MANET using underlying proactive routing, in: 7 International Conference on Mobile Ad-hoc and Sensor Networks, pp. 248–255.
  • Derhab and Badache (2009) A. Derhab, N. Badache, Data replication protocols for mobile ad-hoc networks: A survey and taxonomy, IEEE Communications Surveys Tutorials 11 (2009) 33–51.
  • Padmanabhan et al. (2008) P. Padmanabhan, L. Gruenwald, A. Vallur, M. Atiquzzaman, A survey of data replication techniques for mobile ad hoc network databases, The VLDB Journal 17 (2008) 1143–1164.
  • Bellavista et al. (2005) P. Bellavista, A. Corradi, E. Magistretti, Comparing and evaluating lightweight solutions for replica dissemination and retrieval in dense MANETs, in: 10 IEEE Symposium on Computers and Communications, pp. 43–50.
  • Hara (2003) T. Hara, Replica allocation methods in ad hoc networks with data update, Mobile Network Applications 8 (2003) 343–354.
  • Hara et al. (2004) T. Hara, N. Murakami, S. Nishio, Replica allocation for correlated data items in ad hoc sensor networks, SIGMOD Record 33 (2004) 38–43.
  • Hara (2001) T. Hara, Effective replica allocation in ad hoc networks for improving data accessibility, in: 20 Annual Joint Conference of the IEEE Computer and Communications Societies, pp. 1568–1576.
  • Atsan and Öznur Özkasap (2013) E. Atsan, Öznur Özkasap, SCALAR: Scalable data lookup and replication protocol for mobile ad hoc networks, Computer Networks 57 (2013) 3654–3672.
  • Fiore et al. (2011) M. Fiore, C. Casetti, C. Chiasserini,

    Caching strategies based on information density estimation in wireless ad hoc networks,

    IEEE Transactions on Vehicular Technology 60 (2011) 2194–2208.
  • Hara and Madria (2006) T. Hara, S. K. Madria, Data replication for improving data accessibility in ad hoc networks, IEEE Transactions on Mobile Computing 5 (2006) 1515–1532.
  • Yu et al. (2005) H. Yu, P. Martin, H. Hassanein, Cluster-based replication for large-scale mobile ad-hoc networks, in: 2005 International Conference on Wireless Networks, Communications and Mobile Computing, volume 1, pp. 552–557 vol.1.
  • Pacitti et al. (1999) E. Pacitti, P. Minet, E. Simon, Fast algorithms for maintaining replica consistency in lazy master replicated databases, in: 25 International Conference on Very Large Data Bases, Morgan Kaufmann Publishers, Inc., 1999, pp. 126–137.
  • Moon and Cho (2004) A. Moon, H. Cho, Energy-efficient replication extended database state machine in mobile ad hoc network, in: IADIS International Conference on Applied Computing, pp. 224–228.
  • Huang et al. (2003) J.-L. Huang, M.-S. Chen, W.-C. Peng, Exploring group mobility for replica data allocation in a mobile environment, in: 12 International Conference on Information and Knowledge Management, pp. 161–168.
  • Wang and Li (2002) K. Wang, B. Li, Efficient and guaranteed service coverage in partitionable mobile ad-hoc networks, in: 21 Annual Joint Conference of the IEEE Computer and Communications Societies, volume 2, pp. 1089–1098.
  • Hauspie et al. (2002) M. Hauspie, D. Simplot, J. Carle, Replication decision algorithm based on link evaluation services in MANET, Technical Report, CNRS LIFL, University of Lille, 2002.
  • Dimakis et al. (2010) A. Dimakis, S. Kar, J. Moura, M. Rabbat, A. Scaglione, Gossip algorithms for distributed signal processing, Proceedings of the IEEE 98 (2010) 1847–1864.
  • Friedman et al. (2007) R. Friedman, D. Gavidia, L. Rodrigues, A. C. Viana, S. Voulgaris, Gossiping on MANETs: The beauty and the beast, SIGOPS Operating Systems Review 41 (2007) 67–74.
  • Reina et al. (2015) D. Reina, S. Toral, P. Johnson, F. Barrero, A survey on probabilistic broadcast schemes for wireless ad hoc networks, Ad Hoc Networks 25, Part A (2015) 263–292.
  • Rabbat and Nowak (2004) M. Rabbat, R. Nowak, Distributed optimization in sensor networks, in: 3 International Symposium on Information Processing in Sensor Networks, pp. 20–27.
  • Yu et al. (2004) Y. Yu, B. Krishnamachari, V. Prasanna, Energy-latency tradeoffs for data gathering in wireless sensor networks, in: 23 Annual Joint Conference of the IEEE Computer and Communications Societies, volume 1, pp. 244–255.
  • Jelasity et al. (2007) M. Jelasity, S. Voulgaris, R. Guerraoui, A.-M. Kermarrec, M. van Steen, Gossip-based peer sampling, ACM Transactions on Computer Systems 25 (2007).
  • De Couto et al. (2005) D. S. J. De Couto, D. Aguayo, J. Bicket, R. Morris, A high-throughput path metric for multi-hop wireless routing, Wireless Networks 11 (2005) 419–434.
  • Novotny et al. (2012) P. Novotny, A. L. Wolf, B. J. Ko, Fault localization in MANET-hosted service-based systems, in: 31 IEEE International Symposium on Reliable Distributed Systems, pp. 243–248.
  • Tati et al. (2013) S. Tati, P. Novotny, B. J. Ko, A. L. Wolf, A. Swami, T. La Porta, Diagnosing degradation of services in hybrid wireless tactical networks, in: SPIE Defense, Security, and Sensing.
  • Mian et al. (2009) A. N. Mian, R. Baldoni, R. Beraldi, A survey of service discovery protocols in multihop mobile ad hoc networks, IEEE Pervasive Computing 8 (2009) 66–74.
  • Novotny and Wolf (2016) P. Novotny, A. L. Wolf, Emulating Web Services-Based Systems Hosted in Ad Hoc Wireless Networks, Technical Report DTR-2016-7, Department of Computing, Imperial College London, 2016.
  • Roy (2011) R. R. Roy, Handbook of Mobile Ad Hoc Networks for Mobility Models, Springer-Verlag, 2011.