Communication system research relies on experiments. Accordingly, methods and tools, such as network simulators and their incorporated network models, emerged within the research community to enable controlled and repeatable experiments. There are numerous simulators and emulators available. These are tailored for different applications, underlying abstractions, and network models [2, 5, 16, 20, 21, 25, 29, 32, 35]. Controlled experiments with these execution environments have become essential in the process of designing and developing communication systems to provide early and recurring feedback.
During our work on designing and developing different communication systems, we noticed that we repeatedly implemented supporting infrastructure and tools to automate experiments and analyze results. The development of these tools typically started from scratch for every new research project. While we usually started with just a few scripts, the tooling evolved with the research project and eventually consumed a notable fraction of the overall research effort. Although developing such tools is straightforward, it distracts from the actual research and delays the project.
In this paper, we identify three recurring requirements for network experiment studies: i) the specification, management, and documentation of experiments with their dependent and independent control parameters, ii) scalable experiment execution, i.e., the parallel execution of a large set of experiments, and iii) the interactive analysis of the experiment results based on the previously specified control parameters. We argue that an integrated solution is indispensable to increase the efficiency of network experiments.
In the following, we present MACI, the first bespoke framework for the seamless management, scalable execution, and interactive analysis of a large number of experiments. MACI emerged from our experiences and the best practices we learned during various research projects, and evolved into a smart combination and integration of established tools to foster rigorous evaluations throughout the research process. MACI, for example, applies the concepts of interactive data analysis from the domains of business intelligence and data science to network experiments. MACI follows the zeitgeist of agile development and continuous integration by removing obstacles to fast iterations, which otherwise hinder research progress.
We discuss the benefits of MACI based on our experience with three research projects: i) an extensive DASH video streaming study, ii) the development of various Multipath TCP packet schedulers [12, 10], and iii) the tuning of a distributed topology graph pattern matching protocol.
We publicly release MACI together with tutorials at https://maci-research.net to enable other researchers to increase the efficiency of their work.
2 Requirement Analysis
To make the case for developing MACI, we start by analyzing recent observations and recurring requirements for conducting network experiments.
Req. 1: Improved Efficiency The driving requirement for an integrated network experiment framework is to improve research efficiency. This allows the researcher to focus on reasoning, questioning and improving the observed behavior.
Observation 1: Increasing Complexity While today’s modular, layered communication systems enable optimizations and reduce complexity per layer, research on communication systems has to consider complex cross-layer dependencies. The tuning of transport protocols and congestion control algorithms, for example, has to consider various network environments, application workloads, and configurations of the network stack. Similarly, the performance of DASH video streaming algorithms changes significantly when replacing the underlying TCP congestion control or transport protocol (e.g., replacing TCP with emerging protocols such as MPTCP and QUIC). The systematic analysis of cross-layer dependencies is indispensable even if only a single component is to be optimized.
Observation 2: Increasing Innovation Speed We notice an increasing speed of network innovation. The recently proposed QUIC transport protocol, for example, is designed with the explicit goal of enabling frequent iterative improvements. Hence, these iterative improvements have to be analyzed repeatedly with respect to their impact on application performance, e.g., on DASH video streaming as in the example above. Recent advances in network programmability, such as specification languages for congestion control and Multipath TCP schedulers [4, 12], will further increase innovation speed. Since these languages enable the rapid specification of novel communication system algorithms, we need support for rapid evaluations with systematic experiments.
Observation 3: Extensive Experiments We note an increasing number of extensive experiment studies in various communication system domains. These studies consist of a large number of individual emulation or simulation experiments. Kakhki et al. identify the need for rapid evaluations of protocols such as QUIC and present a rigorous comparison of QUIC protocol versions. Paasch et al. used an experimental design approach for Multipath TCP to evaluate dependencies between the protocol configuration, the network capacity, and the network delay. In [30, 34], the authors conducted extensive emulation-based studies of DASH video streaming. We found previously proposed experiment automation frameworks [3, 14, 19, 23, 24] to be limited to network simulators such as ns-3. Their deep integration makes these frameworks unsuitable for various use cases, including the DASH and MPTCP studies in Section 5. All these examples confirm the need for extensive experiments, but only contribute frameworks for their confined research domain. A general, reusable experiment framework for communication systems research is still missing.
Observation 4: Resource Availability Evaluations with more experiment repetitions usually yield more insight and higher confidence, but are time- and resource-consuming. Recent advances in infrastructure management pave the way for scalable experiment execution. Tools such as OpenStack enable private clouds to easily allocate and share computing resources, and public cloud providers offer seemingly unlimited computing resources.
Req. 2: Scalable, Parallel Experiment Execution The workload of network experiments with many configurations is embarrassingly parallel, as there are no dependencies between the experiments. Network experiment studies should leverage today’s available computing resources and the parallel nature of experiments to increase iteration speed. The framework should adapt to changing resource requirements during the research project lifecycle.
Req. 3: Modular Framework The framework has to be modular to customize and exchange major components. This includes APIs for additional components, e.g., to automatically trigger new evaluations based on previous results. Network experiments require an execution environment such as a simulator, an emulator, a hardware testbed, or a real-world infrastructure. Accordingly, it should be easy to integrate the variety of established execution environments.
Req. 4: Interactive Analysis To foster a systematic analysis of the experiment results, the framework has to manage the collection, aggregation, and analysis of results. Following best practices from the areas of data analytics, business intelligence, and data science, data should be visualized interactively. The researcher should interact with the data to filter and aggregate by configuration and environment parameters, and to trigger the evaluation of additional configurations.
Req. 5: Reproducibility The conducted scientific experiments must be reproducible. This is particularly important as research prototypes evolve quickly and previous experiments have to be reproducible with their implementations and configurations.
Req. 6: Coordination of Collaboration We notice that coordinating experiments and sharing results among researchers introduces overhead. Researchers tend to write just a small analysis script, as the development of reusable features is typically out of scope for the current research project.
3 Experiment-Driven Research
MACI is designed for experiment-driven research, which relies on recurring evaluations with implementations of systems, protocols, and algorithms. In the following, we present the design of MACI for seamless experiment execution and interactive analysis.
MACI supports the entire lifecycle of an iterative research process, including the initial execution and analysis of prototypes with a few varying parameters, the refinement of the underlying algorithms, protocols, and implementations, and the extensive evaluation of matured implementations. Therefore, MACI seamlessly integrates experiment management, scalable execution, and the interactive analysis of the experiment results, as shown in Fig. 1.
Manage Experiments MACI structures experiments by decoupling experiment study templates, experiment studies, and experiments to enable efficient management and reusability of experiments (Fig. 2). An experiment study template is a reusable template for a certain application domain. The experiment study exposes dependency variables to control configuration and environment conditions. Usually, evaluations compare the application performance in a certain environment depending on its configuration. Accordingly, MACI makes the differentiation between configuration and environment explicit to automatically prepare for meaningful analysis.
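To make the separation between templates, studies, and configuration versus environment parameters concrete, here is a minimal Python sketch. The class names, fields, and the `instantiate` method are hypothetical and only illustrate the idea; they are not MACI's actual data model.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class StudyTemplate:
    """Reusable template for one application domain (hypothetical model)."""
    name: str
    config_params: List[str]  # knobs of the system under test, e.g. the player
    env_params: List[str]     # network conditions, e.g. the bandwidth

    def instantiate(self, config: Dict[str, List[Any]],
                    env: Dict[str, List[Any]]) -> "ExperimentStudy":
        # Reject parameters the template does not declare, so that the
        # configuration/environment split stays explicit for later analysis.
        unknown = (set(config) - set(self.config_params)) | (set(env) - set(self.env_params))
        if unknown:
            raise ValueError(f"undeclared parameters: {sorted(unknown)}")
        return ExperimentStudy(template=self, config=config, env=env)


@dataclass
class ExperimentStudy:
    """Concrete instantiation of a template: a list of values per parameter."""
    template: StudyTemplate
    config: Dict[str, List[Any]]
    env: Dict[str, List[Any]]


if __name__ == "__main__":
    dash = StudyTemplate("dash", ["player", "segment_length_s"], ["bandwidth_mbps"])
    study = dash.instantiate({"player": ["DASH.JS", "Shaka"], "segment_length_s": [2, 6]},
                             {"bandwidth_mbps": [0.8, 5.0]})
    print(study.config, study.env)
```

Declaring the two parameter groups once, at the template level, is what allows an analysis to default to "compare configurations per environment" without further input from the researcher.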
An experiment study is a concrete instantiation of a template. The experiment study comprises the executable experiments, which result from the combinations of the specified configurations and environments. The execution of a single experiment yields various measurements, including target metrics and logging information. The experiment script specifies the control flow and experiment process, e.g., controlling tools such as ns-3, Mininet, or custom simulators. MACI keeps track of all meta information, such as version numbers and commit identifiers of the used implementations, to ensure reproducibility.
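A minimal sketch of how a study could be expanded into individual experiments: every combination of configuration and environment values becomes one experiment (a full factorial design), and each experiment is tagged with the Git commit of the code under test. The function names and the record layout are hypothetical, and MACI's own mechanism may differ; the example values are taken from Table 1.

```python
import itertools
import subprocess
from typing import Any, Dict, Iterator, List


def expand(config: Dict[str, List[Any]], env: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one experiment per combination of configuration and
    environment values (a full factorial design)."""
    cfg_names, env_names = list(config), list(env)
    for cfg_values in itertools.product(*(config[n] for n in cfg_names)):
        for env_values in itertools.product(*(env[n] for n in env_names)):
            yield {"config": dict(zip(cfg_names, cfg_values)),
                   "env": dict(zip(env_names, env_values))}


def code_version(repo: str = ".") -> str:
    """Commit id of the implementation under test, or 'unknown' if the
    directory is not a Git repository or Git is not installed."""
    try:
        out = subprocess.run(["git", "-C", repo, "rev-parse", "HEAD"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def build_experiments(config: Dict[str, List[Any]], env: Dict[str, List[Any]],
                      repo: str = ".") -> List[Dict[str, Any]]:
    # Every experiment records the code version it was generated for.
    commit = code_version(repo)
    return [{"id": i, "commit": commit, **exp}
            for i, exp in enumerate(expand(config, env))]


if __name__ == "__main__":
    exps = build_experiments(
        {"player": ["DASH.JS", "Shaka", "AStream"], "algo": ["Standard", "BOLA"]},
        {"bandwidth_mbps": [0.8, 2, 5, 7.5, 10]})
    print(len(exps), exps[0])  # 3 * 2 * 5 = 30 experiments
```

Recording the commit at generation time, rather than after execution, ties every stored result to the exact implementation it was produced with, which is the property reproducibility depends on.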
Scalable Execution In MACI, experiments are the smallest, atomic execution units. MACI controls the generation and parallel execution of experiments in a scalable worker infrastructure.
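Because experiments are independent, atomic units, a single-machine analogy of this execution model can be sketched with Python's `concurrent.futures`. This is only an analogy under stated assumptions: MACI distributes experiments across worker hosts rather than threads, and `run_experiment` here is a hypothetical placeholder with a toy metric.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def run_experiment(exp: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for one atomic experiment. A real worker would
    launch a simulator or emulator here; the 'throughput' is a toy formula."""
    return {"throughput_mbps": 0.9 * exp["env"]["bandwidth_mbps"]}


def run_safely(exp: Dict[str, Any]) -> Dict[str, Any]:
    # A failing experiment is recorded instead of aborting the whole study.
    try:
        return {"id": exp["id"], "status": "ok", **run_experiment(exp)}
    except Exception as err:
        return {"id": exp["id"], "status": "failed", "error": repr(err)}


def run_all(experiments: List[Dict[str, Any]], workers: int = 4) -> List[Dict[str, Any]]:
    """Execute independent experiments concurrently; results keep input order."""
    # Threads suffice when each experiment mostly waits on an external
    # process; CPU-bound in-process work would call for processes instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_safely, experiments))


if __name__ == "__main__":
    exps = [{"id": i, "env": {"bandwidth_mbps": bw}} for i, bw in enumerate([0.8, 2, 5, 7.5, 10])]
    exps.append({"id": 5, "env": {}})  # malformed on purpose: ends up as 'failed'
    for result in run_all(exps, workers=2):
        print(result)
```

Isolating failures per experiment matters at scale: in a study with thousands of runs, a single crashed simulator should cost one data point, not the whole batch.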
Interactive Data Analysis MACI provides various views to interactively analyze experiment results. These views become available automatically, based on the collected and provided data. In particular, the data model, e.g., the available configuration parameters, is automatically derived from the experiment specification in the management frontend.
MACI allows the selection of target metrics, as well as the specification of filters and aggregations based on configuration and environment parameters. The results of these operations are visualized, e.g., as box plots. The interactive analysis and visualization of the data distributions enable researchers to inspect sources of variance by changing filters and aggregations.
MACI provides additional analysis views, e.g., to analyze single experiments (drill-down) and to balance conflicting target metrics. The automatic generation of Pareto frontiers, for example, enables the researcher to inspect trade-offs between the throughput and latency of congestion control algorithms.
In the following, we present the modular implementation of MACI. The contribution of MACI lies less in these individual modules than in their seamless integration, which fosters the experiment-driven research process.
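The filter-and-aggregate step can be sketched with the Python standard library alone: result records are flat dictionaries, a predicate filters them on configuration or environment parameters, and every group is reduced to the five-number summary that a box plot displays. The record fields are hypothetical; in a pandas-based notebook, `DataFrame.groupby` followed by `describe()` serves a similar purpose.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Any, Callable, Dict, Iterable, List, Tuple


def box_stats(values: List[float]) -> Dict[str, float]:
    """Five-number summary that a box plot draws (no outlier rules)."""
    if len(values) < 2:
        q1 = med = q3 = values[0]
    else:
        q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": med, "q3": q3, "max": max(values)}


def aggregate(results: Iterable[Dict[str, Any]], metric: str, group_by: Tuple[str, ...],
              where: Callable[[Dict[str, Any]], bool] = lambda r: True
              ) -> Dict[Tuple[Any, ...], Dict[str, float]]:
    """Filter results with `where`, group them by the given parameters,
    and summarize the target metric per group."""
    groups: Dict[Tuple[Any, ...], List[float]] = defaultdict(list)
    for r in results:
        if where(r):
            groups[tuple(r[k] for k in group_by)].append(r[metric])
    return {key: box_stats(vals) for key, vals in sorted(groups.items())}


if __name__ == "__main__":
    # Made-up toy records, not measurements.
    rows = [{"player": "A", "bw_mbps": 2, "bitrate_kbps": v} for v in (900, 1100, 1200, 1500)]
    rows += [{"player": "B", "bw_mbps": 2, "bitrate_kbps": v} for v in (700, 1300)]
    rows += [{"player": "B", "bw_mbps": 5, "bitrate_kbps": 3000}]
    print(aggregate(rows, "bitrate_kbps", ("player",), where=lambda r: r["bw_mbps"] == 2))
```

Changing `where` or `group_by` and re-running is the programmatic counterpart of adjusting filters and aggregations in an interactive view.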
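A Pareto frontier over two target metrics is straightforward to compute. The sketch below assumes one record per configuration with hypothetical `throughput_mbps` (higher is better) and `latency_ms` (lower is better) fields and keeps exactly the points that no other point dominates; it is not necessarily how MACI computes its frontiers.

```python
from typing import Any, Dict, List


def dominates(a: Dict[str, Any], b: Dict[str, Any], maximize: str, minimize: str) -> bool:
    """True if `a` is at least as good as `b` in both metrics and strictly
    better in at least one."""
    no_worse = a[maximize] >= b[maximize] and a[minimize] <= b[minimize]
    better = a[maximize] > b[maximize] or a[minimize] < b[minimize]
    return no_worse and better


def pareto_front(points: List[Dict[str, Any]], maximize: str, minimize: str) -> List[Dict[str, Any]]:
    """Non-dominated points, sorted by the metric to minimize.
    Uses a quadratic pairwise check, adequate for moderate result counts."""
    front = [p for p in points
             if not any(dominates(q, p, maximize, minimize) for q in points)]
    return sorted(front, key=lambda p: p[minimize])


if __name__ == "__main__":
    # Made-up toy results for hypothetical congestion control settings.
    results = [
        {"name": "cc-a", "throughput_mbps": 10, "latency_ms": 50},
        {"name": "cc-b", "throughput_mbps": 8, "latency_ms": 20},
        {"name": "cc-c", "throughput_mbps": 7, "latency_ms": 30},  # dominated by cc-b
        {"name": "cc-d", "throughput_mbps": 12, "latency_ms": 80},
    ]
    print([p["name"] for p in pareto_front(results, "throughput_mbps", "latency_ms")])
    # ['cc-b', 'cc-a', 'cc-d']
```

Every remaining point is a defensible trade-off; choosing among them is left to the researcher, which is why the frontier is presented rather than a single "best" configuration.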
Manage Experiments The web frontend includes an editor and management features for all steps of the experiment lifecycle, i.e., the specification of the experiment and its configuration and environment parameters as well as the monitoring of running experiments. The frontend provides direct feedback, e.g., the total experiment duration, and automates recurring manual steps. To integrate and control established network simulators and emulators, MACI relies on Python scripts. The backend is implemented as a .NET Core server application, which provides a REST API and a ready-to-use Java interface.
Scalable Execution Experiment instances are executed in parallel to speed up the evaluation. MACI supports the manual management of worker instances (servers) as well as the integration with manageable infrastructures, i.e., AWS EC2 and Proxmox. The current implementation of MACI follows an Infrastructure as a Service cloud model, as many experiments require their own operating system modules (e.g., for transport protocol implementations such as MPTCP) and do not support multiple concurrent experiments per host. For experiments with fewer infrastructure dependencies, we envision more resource-efficient serverless computing, such as AWS Lambda.
Interactive Data Analysis For the data analysis, we rely on the established SciPy data science tool chain of Jupyter, NumPy, and pandas. We discarded commercial alternatives in favor of publicly available tools. MACI provides analysis template scripts that instantly offer interactive analysis features to explore experiments and drill down into them intuitively. These templates sit at the sweet spot between automation and flexibility, as researchers can easily extend them with the vast ecosystem of Python modules.
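The exact script interface is not described here, so the following is only a hypothetical sketch of a Python experiment script that wraps an external simulator or emulator: it turns one experiment's parameters into a command line, runs the tool with `subprocess.run`, and parses metrics that the tool is assumed to print as JSON on standard output. A one-line Python program stands in for the simulator so that the sketch runs anywhere; `build_command`, `run`, and the JSON convention are illustrative assumptions.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List

# Stand-in "simulator": a one-line Python program that reads the bandwidth
# from argv and prints its metrics as JSON. Replace with the real tool.
STAND_IN_SIMULATOR = (
    "import json, sys; "
    "bw = float(sys.argv[1]); "
    "print(json.dumps({'throughput_mbps': round(0.9 * bw, 3)}))"
)


def build_command(params: Dict[str, Any]) -> List[str]:
    """Translate one experiment's parameters into a command line."""
    return [sys.executable, "-c", STAND_IN_SIMULATOR, str(params["bandwidth_mbps"])]


def run(params: Dict[str, Any], timeout_s: float = 60.0) -> Dict[str, Any]:
    """Run the external tool once and return target metrics plus logs."""
    proc = subprocess.run(build_command(params), capture_output=True,
                          text=True, timeout=timeout_s, check=True)
    return {"params": params,
            "metrics": json.loads(proc.stdout),  # target metrics
            "log": proc.stderr}                  # logging information


if __name__ == "__main__":
    print(run({"bandwidth_mbps": 5})["metrics"])
```

Keeping the tool-specific part inside `build_command` is what makes swapping Mininet for ns-3 or a custom simulator a local change.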
Deployment To enable a rapid setup, we provide an optional docker-compose configuration, initiating and connecting all required system components, i.e., the MACI-backend, Jupyter/SciPy and a Mininet worker. Thus, a full MACI system can be deployed with a single command on any major OS.
5 Experiences and Results
In the following, we discuss our MACI experiences. We greatly benefited from MACI during the development and evaluation in recent research projects on Multipath TCP scheduling [11, 12, 10, 33], DASH video streaming, topology graph pattern matching algorithms, and the supervision of student theses. We further reproduced the results of a notable Multipath TCP experimental design study. Besides the evaluation setup needed to execute a single experiment instance, we added only six lines of code to benefit from all MACI features, such as the parallel experiment execution and the analysis with plots comparable to the original publication.
Learning Curve We provided MACI to students and found that MACI i) made them faster and more systematic by guiding them through the experiment lifecycle and ii) helped us monitor their progress.
Simulator/Emulator Integration While MACI was developed with the Mininet network emulator in mind, we integrated ns-3 and a custom Java-based simulator with minimal changes.
5.1 DASH Video Streaming Analysis
We used MACI for an extensive comparison of Dynamic Adaptive Streaming over HTTP (DASH) players and adaptation algorithms. While the results of this comparison are published separately, we discuss the contribution of MACI to this publication in the following.
DASH is a main enabler of adaptive video streaming in today’s Internet. By adapting the quality and size of the downloaded video segments, DASH copes with the wide range of fluctuating network conditions in today’s Internet. Various DASH players and video quality adaptation algorithms have been proposed to provide high video playback quality and to avoid playback stalls in these heterogeneous environments.
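Table 1: Configuration and environment parameters of the DASH study.

| Category | Parameter | Values |
|---|---|---|
| Config. | Player | DASH.JS, Shaka, AStream |
| | Adapt. Algo. | Standard, BOLA |
| | Segment Length | 1, 2, 6, 10, 15 [s] |
| | Target Buffer | Default, 5, 20 [s] |
| Env. (BW) | | 0.8, 2, 5, 7.5, 10 [Mbps] |
| | | 0, 0.8, 2, 5 |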
Experiment Setup We used MACI for a comprehensive DASH emulation study. We compared three major DASH player implementations with two playback quality adaptation algorithms and various player configurations, i.e., the video segment length and the target size of the playback buffer, in networks with varying characteristics (Table 1). For a detailed investigation, we collected various target metrics, including the achieved video quality, the experienced stalling events, and the network utilization.
Iterative Research Process We developed, tested, and improved the DASH-specific measurement features iteratively. The interactive analysis of the experiment results enabled us i) to quickly detect errors and inconsistencies in our measurements and implementations and ii) to identify regions of interest and add further metrics and configurations to investigate and question our findings during the process. MACI also supported interactive group analysis sessions in which we discussed and questioned hypotheses. The simple repetition of experiment studies with improved and extended implementations was crucial for our efficiency.
Scalable Execution As a single execution of all configurations in all environments requires more than 40 hours (120 s video playback per experiment), the parallel experiment execution significantly increased our iteration speed and enabled us to retrieve reliable results with dozens of repetitions.
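A back-of-the-envelope calculation illustrates why parallel execution matters here. Assuming (this is not stated in the study) that every combination in Table 1 is run, that the two environment rows vary independently, and that each experiment takes exactly the 120 s of video playback with no overhead, the full factorial design has 3 × 2 × 5 × 3 × 5 × 4 = 1,800 experiments, i.e., 60 hours per sequential pass, which is consistent with the "more than 40 hours" above. The actual number of experiments in the study may differ.

```python
from math import ceil, prod

# Number of values per parameter as listed in Table 1. Assumption: every
# combination is valid and the two environment rows vary independently.
LEVELS = {"player": 3, "adapt_algo": 2, "segment_length": 5,
          "target_buffer": 3, "env_row_1": 5, "env_row_2": 4}


def wall_clock_hours(n_experiments: int, seconds_each: float,
                     repetitions: int = 1, workers: int = 1) -> float:
    """Idealized duration: independent experiments, perfect load balancing,
    no setup or teardown overhead."""
    rounds = ceil(n_experiments * repetitions / workers)
    return rounds * seconds_each / 3600


if __name__ == "__main__":
    n = prod(LEVELS.values())                                     # 1800 experiments
    print(wall_clock_hours(n, 120))                               # 60.0 hours
    print(wall_clock_hours(n, 120, workers=20))                   # 3.0 hours
    print(wall_clock_hours(n, 120, repetitions=10, workers=50))   # 12.0 hours
```

Under these idealized assumptions, even ten repetitions of the whole study fit into half a day with 50 workers, whereas a single sequential pass already takes more than two days.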
5.2 MPTCP Scheduler Development
We used MACI for the development of five novel Multipath TCP (MPTCP) schedulers. While the MPTCP-specific details and evaluations are published in [12, 10], we discuss the contribution of MACI to the design of one exemplary scheduler in the following.
MPTCP is a recent TCP extension that uses multiple subflows to leverage multiple paths and network interfaces for a single connection. The mapping of packets to subflows, i.e., MPTCP scheduling, has a crucial impact on performance. The design of MPTCP schedulers has to consider complex dependencies between subflow and traffic flow characteristics.
Iterative Research Process Redundantly transmitting packets on multiple subflows proactively compensates for packet loss and promises to reduce flow completion times. Tuning a redundant scheduler, however, requires many design decisions, e.g., when to transmit a redundant or a fresh packet. We used MACI for a systematic comparison of these design decisions for various traffic patterns (e.g., flow sizes) in different network environments (e.g., loss rates and capacities). The interactive analysis of MACI with visualizations as shown in Fig. 3 enabled us to identify and overcome weaknesses of scheduler designs.
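Comparing design decisions across environments boils down to asking which variant wins under which conditions. A minimal sketch of that question, assuming flat result records with hypothetical `variant`, `loss`, and `fct_ms` fields, picks the variant with the lowest median flow completion time per environment. The variant names and numbers are made up for illustration and are not results from the study.

```python
from collections import defaultdict
from statistics import median
from typing import Any, Dict, Iterable, List, Tuple


def best_variant_per_env(results: Iterable[Dict[str, Any]], env_keys: Tuple[str, ...],
                         variant_key: str = "variant", metric: str = "fct_ms"
                         ) -> Dict[Tuple[Any, ...], Tuple[str, float]]:
    """For every environment, pick the design variant with the lowest
    median of `metric` (lower is better, e.g. flow completion time)."""
    samples: Dict[Tuple[Any, ...], List[float]] = defaultdict(list)
    for r in results:
        env = tuple(r[k] for k in env_keys)
        samples[(env, r[variant_key])].append(r[metric])

    best: Dict[Tuple[Any, ...], Tuple[str, float]] = {}
    for (env, variant), values in samples.items():
        m = median(values)
        if env not in best or m < best[env][1]:
            best[env] = (variant, m)
    return best


if __name__ == "__main__":
    # Made-up toy numbers for two hypothetical scheduler variants.
    rows = [
        {"variant": "always-redundant", "loss": 0.00, "fct_ms": 120},
        {"variant": "always-redundant", "loss": 0.00, "fct_ms": 130},
        {"variant": "fresh-first", "loss": 0.00, "fct_ms": 100},
        {"variant": "fresh-first", "loss": 0.00, "fct_ms": 110},
        {"variant": "always-redundant", "loss": 0.05, "fct_ms": 150},
        {"variant": "fresh-first", "loss": 0.05, "fct_ms": 300},
    ]
    print(best_variant_per_env(rows, ("loss",)))
```

When the winner changes between environments, as in this toy data, that boundary is exactly where a refined design decision (e.g., switching to redundancy only above some loss rate) is worth exploring.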
I prefer simulator foo and analysis tool bar. MACI focuses on a seamless experiment execution and evaluation process with established, publicly available components. As there is no optimal tool for all scenarios, the modular architecture of MACI enables the integration of additional components, such as simulators and analysis tools. For example, even though big data analysis frameworks were not required for our use cases so far, MACI supports their integration into the seamless research process.
Isn’t this just parameter sweeping? MACI differs from parameter tuning and performance analysis frameworks, as it covers the entire research process, including the refinement of the evaluated protocols, algorithms, implementations, and their environments and configurations (Fig. 1). MACI increases evaluation efficiency so that researchers can focus on analyzing research hypotheses and providing empirical evidence.
Isn’t this data dredging? The ease of conducting additional experiments and interactive analyses might tempt researchers to search for statistically significant yet obviously unreasonable relations. We claim, however, that researchers using MACI save time that they can invest in rigorous analysis and better models.
Isn’t A/B testing superior? A/B tests [18, 26] are indubitably superior to emulation and simulation studies. However, rigorous and meaningful A/B testing i) is reserved for a few leading companies and largely infeasible in academia and ii) requires systematic initial experiments which benefit from MACI.
In this paper, we presented MACI, a framework for the management, the scalable execution, and the interactive analysis of a large number of network experiments. MACI significantly reduced repetitive tasks and increased the quality of the obtained results in various application scenarios [10, 11, 12, 28, 31, 33]. MACI provided all functionality specific to the evaluation process and allowed us to focus on research. This paper provides only an overview of MACI; many additional helpful features can be found in the released version.
MACI is designed and evaluated with a focus on the experiences and requirements of researchers in the communication systems community. We believe that the significance of MACI and the idea of a seamless, integrated research process go beyond this domain. We released MACI at https://maci-research.net and hope that it is the starting point to i) increase research efficiency and quality and ii) integrate and establish more sophisticated evaluation methodologies in the communication system research process.
This work has been funded by the German Research Foundation (DFG) as part of the projects C2, C3, and B4 in the Collaborative Research Center (SFB) 1053 MAKI. This work was supported by the AWS Cloud Credits for Research program.
[1] Scripy 0.9.3: Python tools for manage system commands as replacement to bash script. Python Software Foundation. https://pypi.python.org/pypi/Scripy.
[2] Afanasyev, A., Moiseenko, I., Zhang, L., et al. ndnSIM: NDN simulator for NS-3. University of California, Los Angeles, Tech. Rep (2012).
[3] Andreozzi, M. M., Stea, G., and Vallati, C. A framework for large-scale simulations and output result analysis with ns-2. In SIMUTools (2009).
[4] Arashloo, T., Ghobadi, M., Rexford, J., and Walker, D. HotCocoa: Hardware Congestion Control Abstractions. In HotNets (2017).
[5] Chan, M.-C., Chen, C., Huang, J.-X., Kuo, T., Yen, L.-H., and Tseng, C.-C. OpenNet: A simulator for software-defined wireless local area network. In Wireless Communications and Networking Conference (WCNC) (2014), IEEE, pp. 3332–3336.
[6] Codd, E. F., Codd, S. B., and Salley, C. T. Providing OLAP (on-line analytical processing) to user-analysts: An IT mandate, 1993.
[7] Duplyakin, D., Brown, J., and Ricci, R. Active learning in performance analysis. In Proceedings of the IEEE Cluster Conference (Sept. 2016).
[8] Ford, A., Raiciu, C., Handley, M., and Bonaventure, O. TCP Extensions for Multipath Operation with Multiple Addresses. RFC 6824, 2013.
[9] Foster, I. Designing and Building Parallel Programs. Addison-Wesley, 1995.
[10] Frömmgen, A., Heuschkel, J., and Koldehofe, B. Multipath TCP Scheduling for Thin Streams: Active Probing and One-way Delay-awareness. In ICC (2018).
[11] Frömmgen, A., and Koldehofe, B. Demo: A Programming Model for Application-defined Multipath TCP Scheduling. In ACM/IFIP/USENIX Middleware (2017).
[12] Frömmgen, A., Rizk, A., Erbshäußer, T., Weller, M., Koldehofe, B., Buchmann, A., and Steinmetz, R. A Programming Model for Application-defined Multipath TCP Scheduling. In ACM/IFIP/USENIX Middleware (2017).
[13] Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., and Pirahesh, H. Data Cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data mining and knowledge discovery (1997), 29–53.
[14] Hallagan, A., Ward, B., and Perrone, L. F. An Experiment Automation Framework for NS-3. In SIMUTools (2010).
[15] Hamilton, R., Iyengar, J., Swett, I., and Wilk, A. QUIC: A UDP-based secure and reliable transport for HTTP/2, July 2016. IETF, Internet-Draft.
[16] Handigol, N., Heller, B., Jeyakumar, V., Lantz, B., and McKeown, N. Reproducible Network Experiments using Container-based Emulation. In CoNEXT (2012).
[17] Kakhki, A., Jero, S., Choffnes, D., Nita-Rotaru, C., and Mislove, A. Taking a Long Look at QUIC: An Approach for Rigorous Evaluation of Rapidly Evolving Transport Protocols. In IMC (2017).
[18] Langley, A., Riddoch, A., Wilk, A., Vicente, A., Krasic, C., Zhang, D., Yang, F., Kouranov, F., Swett, I., Iyengar, J., et al. The QUIC Transport Protocol: Design and Internet-Scale Deployment. In SIGCOMM (2017), ACM, pp. 183–196.
[19] Millman, E., Arora, D., and Neville, S. W. STARS: A Framework for Statistically Rigorous Simulation-Based Network Research. In IEEE Workshops of International Conference on Advanced Information Networking and Applications (2011), pp. 733–739.
[20] Netravali, R., Sivaraman, A., Das, S., Goyal, A., Winstein, K., Mickens, J., and Balakrishnan, H. Mahimahi: Accurate Record-and-Replay for HTTP. In USENIX ATC (2015), pp. 417–429.
[21] Osterlind, F., Dunkels, A., Eriksson, J., Finne, N., and Voigt, T. Cross-level sensor network simulation with Cooja. In LCN (2006), IEEE, pp. 641–648.
[22] Paasch, C., Khalili, R., and Bonaventure, O. On the Benefits of Applying Experimental Design to Improve Multipath TCP. In CoNEXT (2013), ACM, pp. 393–398.
[23] Perrone, L. F., Kenna, C. J., and Ward, B. C. Enhancing the credibility of wireless network simulations with experiment automation. In IEEE WiMob (2008), pp. 631–637.
[24] Perrone, L. F., Main, C. S., and Ward, B. C. SAFE: Simulation Automation Framework for Experiments. In Winter Simulation Conference (WSC) (2012).
[25] Riley, G. F., and Henderson, T. R. The ns-3 Network Simulator. Modeling and tools for network simulation (2010), 15–34.
[26] Schermann, G., Schöni, D., Leitner, P., and Gall, H. C. Bifrost: Supporting Continuous Deployment with Automated Enactment of Multi-Phase Live Testing Strategies. In ACM/IFIP/USENIX Middleware (2016), p. 12.
[27] Sodagar, I. The MPEG-DASH Standard for Multimedia Streaming Over the Internet. IEEE MultiMedia (2011), 62–67.
[28] Stein, M., Frömmgen, A., Kluge, R., Lin, W., Wilberg, A., Koldehofe, B., and Mühlhäuser, M. Scaling Topology Pattern Matching: A Distributed Approach. In SAC (2018), ACM.
[29] Stingl, D., Gross, C., Ruckert, J., Nobach, L., Kovacevic, A., and Steinmetz, R. PeerfactSim.KOM: A simulation framework for Peer-to-Peer systems. In IEEE HPCS (2011), pp. 577–584.
[30] Stohr, D., Frömmgen, A., Fornoff, J., Zink, M., Buchmann, A., and Effelsberg, W. QoE Analysis of DASH Cross-Layer Dependencies by Extensive Network Emulation. In Workshop on QoE-based Analysis and Management of Data Communication Networks (2016), ACM, pp. 25–30.
[31] Stohr, D., Frömmgen, A., Rizk, A., Zink, M., Steinmetz, R., and Effelsberg, W. Where are the Sweet Spots?: A Systematic Approach to Reproducible DASH Player Comparisons. In ACM Multimedia (2017), pp. 1113–1121.
[32] Varga, A., and Hornig, R. An Overview of the OMNeT++ Simulation Environment. In Simulation tools and techniques for communications, networks and systems & workshops (2008), p. 60.
[33] Viernickel, T., Frömmgen, A., Rizk, A., Koldehofe, B., and Steinmetz, R. Multipath QUIC: A Deployable Multipath Transport Protocol. In ICC (2018).
[34] Zabrovskiy, A., Kuzmin, E., Petrov, E., Timmerer, C., and Mueller, C. AdViSE: Adaptive Video Streaming Evaluation Framework for the Automated Testing of Media Players. In MMSys (2017), ACM, pp. 217–220.
[35] Zeng, X., Bagrodia, R., and Gerla, M. GloMoSim: a library for parallel simulation of large-scale wireless networks. In Workshop on Parallel and Distributed Simulation (1998), IEEE, pp. 154–161.