A continuous stream of designs, of computer-based services and of the distributed systems on which they run, is expected in our knowledge-based society. We use daily many distributed ecosystems whose designs appeared only a relatively short while ago, e.g., of GAFAM and BAT, and expect new designs that will lead to considerable economic growth and productivity [3, 4, 5]. As Figure 1 indicates, design is a common keyword in top scientific and industry venues, including ICDCS. Yet, as we show in this work, we should not take design for granted, and we should not assume that the current approaches will continue to deliver good results. Design problems keep getting more difficult to formulate, and their solutions more difficult to find and reason about. Existing design processes, from merely relying on intuition to classic [6, 7, 8] to emerging [9, 10], have significant shortcomings for designing distributed ecosystems [11, 1]. Instead, to address this grand challenge of the distributed systems community, we propose a vision toward establishing new theoretical and practical means to produce pragmatic and innovative designs.
Definition: “Design is the intentional solution of a problem, by the creation of plans for a new sort of thing, where the plans would not be immediately seen, by a reasonable person, as an inadequate solution.” [12, Loc.345]. Pragmatic design can be implemented, and evidence shows it can run in production-like settings. Innovative design “consists in novel solutions” [13, Loc.2353].
We are interested in a particular kind of design, for massivizing computer systems (MCS), that is, for production-ready distributed systems and ecosystems. As in our previous work, we see distributed ecosystems as composites of interconnected (distributed) systems and, recursively, ecosystems. Ecosystems fulfill functional requirements (FRs), such as responding to service-queries, or batch-processing big data and computation, and non-functional requirements (NFRs), such as predictable high performance and availability. They do so subject to Service Level Agreements (SLAs), and in doing so they experience dynamics, such as provisioning and releasing resources from an external cloud, and give rise to various phenomena that are difficult to foresee at design time, such as performance variability.
Vision: We envision a world of distributed ecosystems, based on pragmatic and innovative MCS designs, created by diverse designers using design philosophy, processes, patterns, and tools, together with scientists, engineers, and the society itself.
We see design as a major challenge for the field of MCS, and raise two key questions about it. How to find good designs and even good problems? The ever-increasing complexity of the field (contrast the relatively simple design of the earlier distributed system BitTorrent with the current ecosystems at Google, which can require the orchestration of hundreds of services and systems to produce meaningful results [14, 15]) makes it unlikely that good designs can be achieved from mere sparks of intuition of lonely designers, without good process and collaboration. Not only solving, but also finding problems is increasingly difficult, as is, for ecosystems, finding who should solve them; in contrast, in the 1960s, the core systems problems were well-known, and a small architectural team could direct the large team working on the IBM System/360 family.
How to design the processes and create the bodies of knowledge that increase the likelihood of good MCS design? It is challenging to select the design elements that could lead to a high likelihood of good MCS designs, from the hundreds of design patterns [17, 18, 10] and practical steps [19, 20, 21], and from the many development processes such as rational and agile. Students and even practitioners have rarely studied these systematically, which compounds the problem. But even if the designer had the experience and knowledge to select, these design elements make many unreasonable assumptions about how designers actually work [11, Ch.3], disregard modern design theory [12, Ch.1-2], and focus not on MCS but on engineering software services, software [17, 23], and hardware [8, 24].
Our vision aims to place design as a core research topic in distributed systems and ecosystems. We do not merely aim to provide a set of design patterns, which is a staple of software and of service design but not necessarily the key to design success in distributed systems or even in architecture [26, Loc.572] (Footnote 1: The approach based on design patterns in architecture, which has inspired generations of software engineers, was quickly dismissed by the architecture community, including by its author, as too limiting). We also want to steer away from heavyweight design processes, which stifle good design [11, p.233]. We aim to provide a framework for design, from understanding how to think about design in this field to finding and solving MCS design-problems, from design of distributed ecosystems to design supporting experiments of and publications about them, with a five-fold contribution:
We are the first to explicitly posit that design is a key area of research in distributed systems, and especially in MCS (in Section 2). As support, we offer qualitative and quantitative evidence.
We propose the AtLarge framework for design (Section 3). The framework starts from the central premise that design has a fundamentally different nature from science and engineering, which has not been formulated for the distributed systems field. It includes novel elements, focusing on MCS, about design thinking, problem-finding, and reporting, and for problem-solving it leverages the basic design cycle we have previously developed.
We identify 10 current challenges raised by MCS design (in Section 5). The challenges are grouped into the same four main categories as the core principles (central premise, systems, peopleware, and methodological) and give a broad scope of what the field could address in the next 5 years. Although, in doing so, the community and our own work will supersede the framework elements presented here, we envision the general structure of the framework will be long-lasting.
2 Why Focus on MCS Design?
We argue in this section for the timely and important need to focus on MCS design. Not only is (good) design needed (Section 2.1), but we identify an increasing need for good design (Section 2.2) and designers (Section 2.3). We also analyze what good design needs to address, that is, complex challenges from system design (Section 2.4) and from MCS design (Section 2.5).
2.1 Without (Good) Design
Similarly to how Brooks dismissed the idea that organizations can cope with increasing technical debt just by adding more person-months, in this work we want to dismiss the idea that organizations can cope with increasing system complexity (to parallel Brooks, design debt) just by hoping good design will simply emerge.
The consequences of not having good designs are well-known, but difficult to quantify. Lackluster design costs money, causes systems to under-perform and sometimes to fail, and delays the arrival of needed systems in the market. Organizations prioritizing working systems over good design effectively defer the moment when they will have to actually solve the problem. In many cases, careful monitoring and capable engineering teams (e.g., sysadmins or site reliability engineers) can help resolve the problems, and in particular avoid unscheduled downtime, poor performance, and the resulting bad reputation (Footnote 2: Despite the recent publication of books on best-practices for distributed systems design and on site reliability engineering [14, 15], both the Microsoft and the Google clouds have since suffered unscheduled downtime and the related bad publicity). However, monitoring only reveals what is measurable and measured, leaving organizations exposed to wicked problems (defined in Section 2.4) and complex ecosystems (Section 2.5).
If lackluster design is costly, bad design can be catastrophic. Design by committee is known to cause entire projects to fail [11, Ch.4], yet many organizations still rely on design by committee done by a central team for technology architecture. A particularly bad case of design by committee is when the entire community ignores the needs of the market and society; fiery arguments in this sense appeared in the databases and grid computing communities, around the start of the 2010s.
2.2 The Increasing Need for Good Design
Design articles are increasingly present in major distributed systems venues (Figure 2). Complementing the findings related to Figure 1, we ask: Is the presence of design articles in top distributed systems venues increasing?
We have extracted all design articles appearing in such venues over a period of nearly four decades (from 1980 to 2018), and counted them per venue and per 5-year block. Figure 2 depicts the count of design articles in selected systems venues, over contiguous 5-year periods starting with 1980. Some of the venues started earlier, so for them only censored data is available. The last period depicted in the figure, starting in 2015, is incomplete. Many of the venues, including ICDCS, have experienced an increasing accumulation of design articles, with a marked increase in design articles accepted for publication since 2000.
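The per-venue, per-5-year-block counting described above can be sketched as follows. This is only an illustrative sketch: the record format `(venue, year, is_design)`, the function names, and the toy records are assumptions, and the sketch does not reproduce how design articles were actually identified for Figure 2.

```python
from collections import Counter


def five_year_block(year, start=1980):
    """Return the first year of the contiguous 5-year block containing `year`."""
    return start + ((year - start) // 5) * 5


def count_design_articles(records, start=1980, end=2018):
    """Count design articles per (venue, 5-year block).

    `records` is an iterable of (venue, year, is_design) tuples; this
    format is hypothetical and chosen only for illustration.
    """
    counts = Counter()
    for venue, year, is_design in records:
        if is_design and start <= year <= end:
            counts[(venue, five_year_block(year, start))] += 1
    return counts


# Hypothetical toy records, not real publication data.
records = [
    ("ICDCS", 1999, True),
    ("ICDCS", 2001, True),
    ("ICDCS", 2003, True),
    ("ICDCS", 2004, False),   # not a design article, so not counted
    ("VenueX", 2016, True),   # falls in the incomplete block starting in 2015
]
counts = count_design_articles(records)
for (venue, block), n in sorted(counts.items()):
    print(f"{venue} {block}-{block + 4}: {n}")
```

Because the last block (starting in 2015) ends after the observation window, its count is a lower bound, which matches the remark above that this period is incomplete.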
2.3 The Increasing Need for Good Designers
We also anticipate an increasing need for good designers. We identify two main possible sources for good designers: professionals in the field and students about to become such professionals. We analyze here their capabilities, and conclude there is much room for improvement.
Some professionals produce good designs, but still many do not (Figure 3). We analyze for a top conference in large-scale distributed systems all the review-results in one year (Footnote 3: We anonymize the venue, but consider it relevant because its held year is after 2014, the venue is a conference, and its ranking is A in CORE18 and green in MSAR14. For comparison, ICDCS has these rankings too). For this conference, for each article we have collected whether it is a design article, the final status as accepted for publication in the conference or rejected, and, across the (3+) reviewers, the final scores for (i) the overall quality of the work (the merit), (ii) the quality of the approach (quality), and (iii) the fit with the topic of the conference (topic). Figure 3 depicts the final scores, using distributional (violin) plots. For merit, we find that (1) design articles have a slightly better distributional shape than non-design articles, with higher (better) median, mean, and IQR, and more of the distribution around an overall score of 2 or higher. Across merit and quality, we also find that: (2) a significant percentage of the design articles are not of high quality or high merit (scores significantly below 3). Finding (2) is surprising, because top-tier venues imply self-selection against submitting what the authors themselves consider insufficiently good work; few should merely try out submitting an article to, e.g., ICDCS. This indicates that many professionals still have trouble in both producing and self-assessing their designs.
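The summary statistics used in this comparison (median, mean, IQR, and the share of scores below a threshold) can be computed as in the sketch below. The scores are hypothetical placeholders, not the anonymized review data; the function names and the choice of the inclusive quartile method are likewise assumptions made for illustration.

```python
import statistics


def summarize(scores):
    """Return the median, mean, and inter-quartile range of a list of scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "median": statistics.median(scores),
        "mean": statistics.mean(scores),
        "iqr": q3 - q1,
    }


def share_below(scores, threshold):
    """Return the fraction of scores strictly below `threshold`."""
    return sum(1 for s in scores if s < threshold) / len(scores)


# Hypothetical merit scores for design articles, for illustration only.
design_scores = [1, 2, 3, 4, 5]
design_summary = summarize(design_scores)
design_share_low = share_below(design_scores, 3)
print(design_summary, design_share_low)
```

The same two functions would be applied to the non-design articles, so that the two groups can be compared statistic by statistic.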
Graduate students also need training in design thinking and design skills (Figure 4). We analyze here the results obtained from a class of nearly 100 students enrolled in a graduate-level Distributed Systems course (Footnote 4: We anonymize the university, but consider the course relevant because it is large, it took place after 2014, and the university is ranked in the top-150 (in computer science) in both the THE and the QS 2018 World University Rankings (out of nearly 1,000 universities), and in Webometrics of July 18 (out of over 28,000)); the course seems popular, as the typical class size is around 15 students. We teach in this course not only typical systems concepts from the field, but also concepts and a process for (MCS) design based on the AtLarge design framework (see Section 3). Throughout seven design sessions, students in groups of up to six are tasked to create several designs addressing given problems. Figure 4 depicts an early design, attempting to satisfice [31, p.27] the problem of scalable ecosystems for massivizing an online game. The figure represents, to a degree, the common submitted design (across all groups) in the same session, that is, what students know after a Bachelor's and some graduate courses, but before learning specifically about design. The figure raises many questions about the quality and even the meaning of the proposed design. Even though it is a simplified and high-level design, it still lacks a believable description for solving (even part of) the problem. For instance, important missing details are the interconnections, in the geo-distributed datacenter and between stakeholders. This design also lacks any layering, system packaging, or description of the (sub)components. The visual depiction designed by the students is also lacking.
2.4 New Thoughts on Traditional System Design
System design has gone through successive waves of (shifting) traditional challenges. The 1950s and 1960s system designers were operating in a world where the core problems seemed structured, and the core design approach could be entirely rational, aiming to optimize the result [11, Part I]. Well-structured problems have several important characteristics: (1) a criterion to automatically evaluate the result, (2) an unambiguous representation for the goal, and start and intermediate states of the problem, and legal transitions between them, (3) a clear representation of all domain knowledge, (4) if interfacing with the natural world, the interaction system-nature can be captured accurately, (5) the problem itself is tractable. By the 1970s, it had become apparent that core problems could further be ill-structured, that is, not have one or several of the characteristics of well-structured problems, or, worse, wicked problems, that is, without clear and final formulation, with no universally accepted criteria for success and clearly defined states due to involvement of various stakeholders with competing interests and views, and of various types of hardware and software with various degrees of autonomy and limited ability to sense their surroundings.
To address ill-defined and wicked problems, the design community has shifted to satisficing instead of optimizing designs, and to a process of co-evolving problem-designs [11, Loc.935]. A cycle of continuous reaction and adaptation triggers the co-evolution: clients change workloads and SLAs, or laws and standards change; in response, system designers evolve, adapt, and decommission parts of the ecosystem; this triggers another round in the cycle. Co-evolving problem-designs are typical in systems design [11, 1] and pose very significant challenges, in particular because the end-goal is unknowable. For example, Google’s datacenter networking evolved significantly over a decade, as did Google’s Spanner for over 5 years.
2.5 New Challenges in MCS Design
We identify three major trends and related challenges in distributed systems and ecosystems:
(C1) New ecosystem life-cycles: Whereas in the past many systems were developed and hosted in-house, over the past decade organizations have increasingly shifted operations to (public) cloud computing [3, 5], and thus bought into distributed ecosystems. Consequently, systems and workloads have become much more fragmented than in the past, requiring new approaches for (automatic) decomposition and orchestration. This leads to unexplored design directions in distributed systems, e.g., a strong drive to making them as flexible and composable as possible. This further raises many new challenges, e.g., the fundamental challenges of MCS [1, S2.2] are about the lack of: (1) operational laws and theories for ecosystems, (2) comprehensive means to maintain existing ecosystems, (3) means to explore credible future ecosystem designs, (4) qualified personnel, (5) adequate inter-disciplinary tools to assess and control the (unwanted) impact of ecosystems on society.
(C2) New ecosystem needs and phenomena: New design aspects appear when designing entire ecosystems or systems operating in ecosystems. In MCS, systems have many new NFRs, including various forms of elasticity, privacy, interoperability, and operational risk associated with them. Ecosystems are super-distributed: they are recursively distributed, with their constituents often being distributed (eco)systems; yet, FRs and NFRs in distributed systems are not known to be directly composable across ecosystems. Various dynamic phenomena appear in distributed ecosystems, seemingly unique situations that do not fit the patterns expected from current theory and practice; for example, vicissitude is a class of phenomena where several known bottlenecks appear seemingly at random in various parts of the system, performance variability is common in clouds, datacenter networks, and big data operations, and ecosystem owners spar with each other (e.g., in Jan 2019, Apple denied Facebook and Google access to its APIs, and Unity changed their Terms-of-Service and thus locked out small developers like SpatialOS).
(C3) New ecosystems, old parts: The evolution of distributed systems technology has generated many useful parts that are commonly used in today’s ecosystems, from simple mechanisms (e.g., caching, scheduling), to protocols (e.g., for multi-site data transfer) and policies (e.g., for autoscaling), to relatively simple systems (e.g., BitTorrent for file-sharing), to commonly used architectures (e.g., for web applications, for big data processing). A large number of legacy applications, using various generations of technology, still operate. Yet, this legacy technology and these applications were not designed for the new ecosystems; for example, they are not cloud-native. Fully replacing them could be prohibitively expensive in the short-term, which means MCS designers must innovate to keep them operational, efficiently.
3 The AtLarge Design Framework for MCS
In this section, we summarize the current theories about design as an activity, then focus on the AtLarge design framework. We give its central premise, explain the focus and main concerns, and focus on its key methods for design-space exploration, problem-finding, problem-solving, and reporting to the community and society. Overall, the key contribution of this framework is that it combines current theories about design thinking (Section 3.1) with MCS-focused design processes (Sections 3.2–3.6).
3.1 Designerly Ways of Thinking
Design, from engineering component to independence (Footnote 5: Computer and software engineering have traversed a similar process until emancipation in the mid-1960s [42, Part III], when detaching from mathematics. Interestingly, mathematics had to follow a similar process, to detach from philosophy; an important part of this process was Hilbert’s program.): Ever since the introduction of the concept of “designerly ways of thinking”, in the 1990s [26, p.68, concept by Cross], and possibly also earlier, the modern design community has held as a theoretical principle that design is based on specific, idiosyncratic ways of thinking. In 2017, Dorst described a theoretical model for reasoning [22, p.13] that includes design thinking, in which the reasoning universe consists of specific concepts (e.g., real people, software objects), which represent the “What?” of the problem to solve; of relationships between the concepts (e.g., laws of nature, principles of hardware operation, software patterns), which represent the “How?”; and of an outcome that combines the concepts and the relationships (e.g., into a real-world system, into an observable phenomenon).
Figure 5 depicts the Dorst reasoning model. In this model, deduction proceeds from given concepts and relationships, and reasons toward an outcome that can be observed (thus testing the deduction); for example, given a Turing machine and a deterministic algorithm designed for it (and its input), we can deduce its outcome. Induction follows another classical model from science. Abduction for problem solving (normal abduction in Dorst’s model) matches well the software engineering experience: given the architecture of a software system, determine the best software-design patterns, and the other software engineering concepts and objects, to realize the system that would act as predicted at design-time. Unreasoning, which we add to the Dorst model, simply states an extreme of reasoning where any concept, relationship, and outcome can be put together, for example, by an organization for which facts do not matter (one of “alternative facts”).
Design abduction: In contrast to the other reasoning approaches in Figure 5, design abduction begins with a desirable outcome, and the problem becomes one of finding the concepts and their relationships that lead to the outcome. Of course, an intractable or even infinite number of possible concepts and relationships can exist to consider, which is what makes the design problem rarely amenable to normal abduction (and normal engineering). This does not mean that design abduction must be purely creative, without process.
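The reasoning modes discussed above differ mainly in which of the three elements (the “What?”, the “How?”, and the outcome) are known at the start. The sketch below encodes this as a lookup. It is an illustrative reading, not part of the framework: the `Frame` class and function name are hypothetical, and the rows for induction and normal abduction follow the way Dorst's model is commonly presented, which the text above does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    """One reasoning situation; None marks an element that is unknown."""
    what: Optional[str]      # concepts (the "What?")
    how: Optional[str]       # relationships (the "How?")
    outcome: Optional[str]   # observable or desired result


def reasoning_mode(frame: Frame) -> str:
    """Classify a situation by which elements are known at the start."""
    known = (
        frame.what is not None,
        frame.how is not None,
        frame.outcome is not None,
    )
    modes = {
        (True, True, False): "deduction",         # what + how -> ?
        (True, False, True): "induction",         # what + ? -> outcome (assumed reading)
        (False, True, True): "normal abduction",  # ? + how -> outcome (assumed reading)
        (False, False, True): "design abduction", # ? + ? -> outcome
    }
    return modes.get(known, "not covered by this sketch")


# Hypothetical examples, for illustration only.
print(reasoning_mode(Frame("Turing machine and input", "deterministic algorithm", None)))
print(reasoning_mode(Frame(None, None, "scalable ecosystem for an online game")))
```

Design abduction is the only mode with two unknowns, which is one way to see why its search space grows so quickly compared with normal abduction.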
| Question | Aspect | Summary |
|---|---|---|
| Who? | Stakeholders | designers, scientists, engineers, |
| What? | Central Paradigm | design, different from science and engineering |
| | Focus | ecosystems, systems within; structure, organization, dynamics |
| | Concerns | functional and non-functional properties; phenomena, evolution |
| How? | Design Thinking | abductive thinking, processes, |
| | Exploration | design space, process to explore |
| | Problem-finding | structured, ill-defined, wicked |
| | Problem-solving | pragmatic, innovative, ethical |
| | Reporting | articles, software, data |
We give an overview of the AtLarge design framework and summarize its key properties in Table I: Who? What? How? are the questions addressed in this section.
Who? Stakeholders: The primary stakeholder of MCS design is the society; this is because designs in this field can have an unusually large impact, for a direct product of computing. The AtLarge design framework considers explicitly that designers fulfill a separate role from scientists and engineers, and, consequently, that students require explicit training in design.
What? The Central Premise: design is unique among intellectual activities. Like Cross, Dorst, and Parsons, the AtLarge framework considers design a unique intellectual activity, essentially different from science and engineering. This does not mean that scientists and engineers cannot design (theory and practice indicate all people can and do design naturally [12, Loc.275, theory by Victor Papanek]), but doing so proficiently and efficiently still requires professional expertise, much like engineering and science.
What? The Main Focus and Concerns: support for MCS design. This requires focusing on both the traditional challenges raised by system designs (see Section 2.4) and the new challenges raised by MCS (see Section 2.5). Two traditional problems of design are to identify the design space and to explore it efficiently; how to do so for MCS designs is an open challenge. Among the MCS-specific aspects, the AtLarge design framework considers explicitly, for every problem: the architecture of ecosystems and of systems operating in ecosystems; the structure, organization, and dynamics of ecosystems; functional and non-functional properties and their expression as implicit (that is, designer-given) or explicit (that is, client-given) SLAs; and known aspects of ecosystem phenomena, emergence, and evolution.
How? Designerly Thinking: Derived from its central premise, the AtLarge design framework considers designerly thinking as an essential ability of its practitioners. Among its core elements, this ability includes understanding, conducting, and managing design as co-evolving problem-solutions. Additional reasoning and practical skills related to science and engineering are also welcome.
How? Key Processes: Although in practice design is still largely an unstructured process, and attempts to impose a rigid structure cause negative reactions and even opposition in software engineering practice (Footnote 6: The agile manifesto, https://agilemanifesto.org/), the AtLarge design framework holds that there is still room for a (flexible) design process. As key to good design, the framework proposes not rigid steps, but a small number of flexible methods and processes for: design space exploration (in Section 3.3), problem-finding (in Section 3.4), a basic cycle for problem-solving (in Section 3.5), and for making the results available beyond the design team (in Section 3.6).
3.3 Free to Co-Evolving Design Exploration
A general, flexible approach to design space exploration for MCS: Figure 6 depicts several processes for design exploration. Following the Dorst design framework, the design abduction could be conducted freely, as pure exploration: the designer considers concepts and relationships at will, guided by their own intuition and shared community expertise. Although this approach can result in radically new designs, its likelihood of success is limited by the scale of the design space. In contrast, the AtLarge design process considers three other, more structured approaches for design space exploration. All three consider that there is a process for finding good problems, for example, the process described in Section 3.4. The Fix the What and Fix the How processes explore the same trade-off: they aim to improve the likelihood of obtaining satisficing designs by diminishing the likelihood that the design will be radically innovative. They both do this by limiting the options available to explore. The former does this by fixing the concepts at play and in particular the technology the designer can use; the latter, by fixing the kinds of relationships available to the designer (“(re-)framing” in traditional design [22, p.14]).
The third process, co-evolving, focuses on iterating designs by changing the problem itself, and further allows using any of the other exploration processes for solving the problem in the current iteration. The staple of this process is the co-evolving problem-solution, with which it can explore a potentially unlimited design space while having a satisficing solution available at each iteration (after the first iteration).
How does co-evolving design space exploration work, in practice? Figure 7 depicts an abstract but realistic example of co-evolving design. The Design Team (DT) is trying to create a pragmatic, innovative design. DT starts with a problem (Problem 1 in Figure 7 (a)). DT creates a design for it, which satisfices or even optimizes the problem (Solution 1). It is not too sophisticated, so DT agrees they could do better. DT tries to do better, and fails (Failure 1). DT learns from it, and produces a new design (Solution 2). Iterating through their design cycle, DT keeps traversing the design space, exploring several dimensions concurrently, and finds after much struggle (and failure) a couple more solutions. However, at this point DT concludes it is too difficult and/or costly to keep exploring. DT has learned enough in the process of design, possibly with help from their community and clients, and is ready to evolve the problem (Problem 2 in Figure 7 (b)); for example, DT could focus the design on a new ecosystem, replacing the old ecosystem that proved to be too limited for DT to solve the problem. (This does not mean the old ecosystem is not good for other design teams or for other design problems.) It turns out that, for this new problem, DT can find many new solutions relatively easily. The process is successful, and promises more success for the future.
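The narrative above can be sketched as a simple loop: explore designs for the current problem, record solutions and failures, then evolve the problem once the exploration budget is spent. This is a minimal sketch under stated assumptions; the function `co_evolve`, its parameters, and the toy scoring of designs are all hypothetical and stand in for the creative work of a real design team.

```python
def co_evolve(problem, propose, satisfices, evolve_problem, attempts_per_problem, rounds):
    """Iterate designs per problem, then evolve the problem; return the history."""
    history = []
    for _ in range(rounds):
        solutions, failures = [], []
        for attempt in range(attempts_per_problem):
            design = propose(problem, attempt)
            if satisfices(problem, design):
                solutions.append(design)
            else:
                failures.append(design)  # failures are kept: the team learns from them
        history.append({"problem": problem, "solutions": solutions, "failures": failures})
        problem = evolve_problem(problem, solutions, failures)
    return history


# Toy stand-ins (hypothetical): a design is a number, a problem sets a required score.
def propose(problem, attempt):
    return 2 * attempt


def satisfices(problem, design):
    return design >= problem["required_score"]


def evolve_problem(problem, solutions, failures):
    # E.g., refocus on a new ecosystem in which good designs are easier to reach.
    return {"name": "Problem 2", "required_score": 1}


problem_1 = {"name": "Problem 1", "required_score": 4}
history = co_evolve(problem_1, propose, satisfices, evolve_problem,
                    attempts_per_problem=4, rounds=2)
for step in history:
    print(step["problem"]["name"], step["solutions"], step["failures"])
```

In this toy run the evolved problem admits more satisficing designs than the original one, mirroring the observation that the design team finds solutions more easily after evolving the problem.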
3.4 Problem-Finding Process for MCS Design
Approach: It would be presumptuous to claim there exists a process for finding all the problems MCS designers can solve. Instead, inspired by how conferences in the field use Calls for Papers to steer the authors, the AtLarge design framework aims to focus the designer by proposing a set of problem archetypes (topics). The community could help expand and refine this set in the future. This approach seems highly successful in focusing designers—Figure 3 (right) indicates the designs submitted for evaluation match closely the topics proposed by the conference’s community, as proposed by the Program Chairs. Although none of the concepts used in the framework is new, synthesizing these aspects into a catalog, as we do in this work, is novel for the field and seems valuable (see Section 6).
What kinds of problems? Derived from Section 2.5, the AtLarge design framework proposes to focus on: (P1) problems in ecosystem life-cycle, including for new and emerging processes and services, and for new and emerging ecosystems; (P2) problems related to new and emerging needs of ecosystem-clients and -operators, addressing newly discovered, emerging, and recurring phenomena, and harnessing new technology (a special kind of phenomenon); (P3) problems related to leveraging and maintaining legacy components. Besides problems that lead to creating new technology, (P4) inspired by natural sciences, where understanding the morphology of natural ecosystems is valued, problems related to understanding how new and emerging technology actually works in practice or when placed in ecosystems, and what new phenomena appear related to ecosystem-operation; (P5) inspired by mathematics, where creating new abstractions can be important regardless of application, problems related to previously unexplored parts of the design space.
How to identify meaningful problems? Also here, the AtLarge design process tries to select from known approaches to identify problems. For addressing problems of types (P1)–(P3), the designer could try to collect and adapt problems from various sources: (S1) (peer-reviewed) qualitative and quantitative studies conducted on ecosystems and on systems within them; (S2) discussion with experts, own analysis of best-practices including reading of technical reports, tech blogs, and best-practice books; (S3) own thought and lab experiments concerning the key technology trends, known technical and other limitations, etc.
For P4, the designer could follow a process matching (empirical) science, but focusing on systems, leveraging the scientific process as finder of phenomena to be harnessed. This could include understanding how systems work through collection and analysis of data archives, where the data represents workloads (e.g., structure of jobs, job life-cycle events such as arrivals, migrations, and cancellations) and operations (e.g., utilization of specific components, (un)availability events). Here, an important set of problems relate to collecting meaningful data: the construction of the observation or measurement instrument, the design of a meaningful data-collection protocol, etc. Currently, these problems seem largely ignored in our field, leading to a dearth of meaningful data for experiments and, possibly, for discovering real problems.
3.5 Problem-Solving Process for MCS Design
Approach: Similar to problem-finding, problem-solving is too diverse to capture in any single process; moreover, stage-based processes can raise resistance from practitioners as too constraining. The AtLarge design framework aims to balance the pragmatic need to have a process with clear stages, which allows teams to synchronize about and during the process of design, and the need for innovation that is based on the flexibility to not stifle creativity. To this end, the framework includes an iterative process focusing on creative tasks, which in particular allows its practitioners to skip any step at each iteration. Unlike typical processes in the field, which focus either on hardware design [8, 24] or on software design [17, 23], or on higher-level processes for keeping the team agile, the AtLarge problem-solving process focuses first on system-level concerns. Pragmatically, this means it considers first the concepts, components, and challenges specific to MCS.
To manage the complexity of the problem of designing distributed systems and ecosystems, the AtLarge problem-solving process includes two core elements: (1) a Basic Design Cycle (BDC), which is a general process for solving problems, and (2) an Overall Process that combines several BDCs into a structure for decomposing and solving MCS design-problems. We have detailed our problem-solving process elsewhere, and only summarize it here.
The BDC is the core loop: The BDC process aims to solve any generic design problem through a structured process consisting of the following elements: (1) Formulate requirements, (2) Understand alternatives, (3) Bootstrap the creative process, (4) High-level and low-level design, (5) Implementation of mathematical analysis code, of simulators, of prototypes, etc. (6) Conceptual analysis of the design, (7) Experimental analysis of the design, (8) Result summarizing and dissemination. This approach is by design: it matches many classic design processes, and is recognizable to designers and engineers in the distributed systems field, yet each element includes key innovations [16, Table 1].
The Overall Process (OP) is executed iteratively. It operates as a BDC and, hierarchically, its more complex design stages can also operate as BDCs. This design of the OP allows the designers to partition into manageable parts the inherently complex process of solving the problems typical of MCS design, e.g., formulating requirements, creating believable designs. (That the result is believable is the core of the epistemological problem of design [12, Loc.972]. It is even more so for MCS designs, because such designs are unlikely to be analyzed experimentally to the full extent of their intended application; in other words, many designs will at best be shown as believable, through narrow laboratory experiments.) The hierarchical nature of the OP further facilitates learning the process by practitioners: once a practitioner has learned the BDC, they can apply it several times in the OP.
The OP has one more important feature: in each iteration, each of its stages can be skipped as needed. By not forcing the designer to traverse unnecessary elements, the OP allows each iteration to be tailored to the remaining parts of the problem to be solved, and to the remaining time and other resources. We conjecture this can lead to designerly thinking (see Section 3.1).
The OP elements: Figure 8 depicts the OP. Given a design problem, its BDC spans elements 1–8, with various groupings allowing for finer-grained iteration. In the overall BDC process, elements (5) and (7), which can include various types of prototype implementation and of experimentation, respectively, can be complex. When this complexity occurs, the designers need to expand them each into one BDC. Similarly, Element (8), on reporting, engineering, and public dissemination, can further expand into separate BDC processes for publishing articles, free open-access software (FOSS), and FAIR or free open-access data (FOAD); we explain this element in Section 3.6.
Stopping criteria: As with any iterative process, the BDC stops when it meets one of a predefined set of stopping criteria: (1) finding a single answer that satisfices [31, p.27], that is, gives solutions that are “good enough”, or, where possible, optimizes; (2) finding a few answers, forming a portfolio to allow a human reviewer (e.g., a client) to quickly select one; (3) finding many answers, forming a systematic design, that allows an expert reviewer or system to select one; (4) finding all answers, resulting in design space exhaustion and allowing experts across the community to discuss or select results; (5) running out of time or other resources (e.g., funding).
The BDC can lead to success, but does not guarantee it:
Because it admits stopping criterion (5), the BDC does not guarantee a result. In our experience so far, following the OP has a good probability of success, making pragmatic and innovative designs likely within the time- and resource-budget. We present experimental evidence for this in Section 6.
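The iterative structure described in this section, with skippable stages and explicit stopping criteria, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `run_bdc`, its callbacks, and the toy quality score are hypothetical, an iteration budget stands in for criterion (5), and only criteria (1) and (5) are modeled.

```python
# The eight BDC elements, named after the list in the text.
STAGES = [
    "formulate requirements",
    "understand alternatives",
    "bootstrap the creative process",
    "high-level and low-level design",
    "implementation",
    "conceptual analysis",
    "experimental analysis",
    "summarize and disseminate",
]


def run_bdc(run_stage, good_enough, budget, skip=lambda iteration, stage: False):
    """Iterate over the BDC stages until a stopping criterion is met.

    run_stage(iteration, stage, state) -> state  (hypothetical callback)
    good_enough(state) -> bool                   (criterion 1: satisficing)
    budget: maximum number of iterations         (stand-in for criterion 5)
    skip(iteration, stage) -> bool               (any stage may be skipped)
    """
    state = {}
    for iteration in range(budget):
        for stage in STAGES:
            if skip(iteration, stage):
                continue
            state = run_stage(iteration, stage, state)
        if good_enough(state):
            return "satisficed", iteration + 1, state
    return "out of resources", budget, state


def toy_stage(iteration, stage, state):
    """Record the visit; pretend each design stage improves quality by one."""
    visited = state.get("visited", []) + [(iteration, stage)]
    quality = state.get("quality", 0) + (1 if stage == STAGES[3] else 0)
    return {"visited": visited, "quality": quality}


def skip_early_experiments(iteration, stage):
    """In the first iteration, skip implementation and experimental analysis."""
    return iteration == 0 and stage in (STAGES[4], STAGES[6])


outcome, iterations, state = run_bdc(
    toy_stage,
    lambda s: s.get("quality", 0) >= 3,
    budget=5,
    skip=skip_early_experiments,
)
exhausted = run_bdc(toy_stage, lambda s: False, budget=2)
```

In this toy run the first call stops by satisficing, whereas the second exhausts its budget, which mirrors why admitting criterion (5) means a result is not guaranteed.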
3.6 Dissemination Processes for MCS Design
The AtLarge design framework also considers various forms of dissemination typical for MCS, related to reports, software, and data. Each of these means of dissemination is based on some form of design; for example, designing the reports to be published as peer-reviewed articles. Thus, for each, the framework proposes design-based processes; in essence, smaller versions of the framework itself, and in particular of the BDC (see Section 3.5).
The reason for this is similar to the reason to use more structured design processes for MCS: it should increase the likelihood of good designs. Although the dissemination of reports, software, and data can be achieved through much intuition, expertise, and by following best-practices in the respective fields (e.g., collaborative editing using a tool such as Overleaf; collaborative FOSS development using CI/CD tools such as Travis CI and customized solutions; and sharing code and data on archives such as GitHub and Zenodo, respectively), in practice many of these designs are poor (see Section 2.3).
4 Design Principles of MCS
|Type (Section)|Index|Key aspects|
|Highest (4.1)|P1|design of design|
|Systems (4.2)|P2|age of distributed ecosystems|
|Peopleware (4.3)|P5|education in design|
| |P6|pragmatic, innovative, ethical|
|Methodology (4.4)|P7|design science, practice, culture|
| |P8|evolution and emergence|
We introduce in this section a set of core principles for MCS design. Table II summarizes the principles.
4.1 Highest Principle
P1: Design needs design. We have argued for this principle in Section 2. The highest design principle holds that MCS design must be designed, not left only to intuition and selective experience.
4.2 Systems Principles
P2: This is the Age of Distributed Ecosystems. As stated in Section 2.5, the evolution of distributed systems into ecosystems led to important new problems and solutions. This principle argues for an approach to design where the designer is constantly aware of this fact.
P3: Dynamic non-functional properties and phenomena are first-class concerns.
P4: Resource Management and Scheduling, and its interplay with various sources of information to achieve local and global Self-Awareness, are key concerns.
Principles P3 and P4 are consequences of MCS problems always including dynamic and emergent elements. Good MCS designs must consider complex SLAs, emergent phenomena, information-rich decision-making, etc.
4.3 Peopleware Principles
Popular distributed ecosystems service hundreds of millions daily. It is not uncommon for a typical service to call into execution hundreds of hidden systems. This combination of high complexity and responsibility puts pressure on the human resources—the peopleware.
Inspired by the software industry’s struggle to manage and develop its human resources, we explicitly set principles about peopleware.
P5: Education practices for MCS must ensure the competence and integrity needed for experimenting, creating, and operating ecosystems. Because the complexity and responsibility of the job have increased considerably over the past couple of decades, high-quality design education should become a core principle of MCS. With proper training, the community will remain able to produce designs significantly better than early, student-like attempts (see Figure 4), and avoid a culture of hacking that does not work long-term. Education on the ethics of design is also a must, if the community is to avoid even the most basic traps, such as engendering bias and disregarding privacy.
P6: Design communities can foster and curate pragmatic, innovative, and ethical design practices. The community is already structured to foster and curate designs (see Section 2.2). This principle extends this structure to include shared tools and environments for developing and evolving designs: shared datasets and benchmarks, testing infrastructure available to many, common repositories of and documents about operational patterns, online virtual laboratories for global coursework and training, etc. These are elements that greatly facilitate design, and make it pragmatic by linking academia and industry. The community is also best-equipped to understand and explain the ethics of the field, and further to handle ethical risks.
4.4 Methodological Principles
P7: We understand and create together a science, practice, and culture of MCS design. So far, design has not been treated as a scientific subject in the field of distributed systems. However, design should become such a subject, because it meets the requirements explained by Denning [49, p.32]: (i) MCS design is a pervasive phenomenon, which we try to understand, use, and control; (ii) both artificial and natural processes are at play (designs lead to real-world artifacts); (iii) we aim to gain meaningful and non-trivial understanding of the phenomenon; (iv) we aim to make our findings reproducible, so that good designs become more likely, as a consequence of falsifiable theories and models; etc.
P8: We are aware of the history and evolution of MCS designs, key debates, and evolving patterns. Unlike other exact results in distributed systems, design is prescriptive, and often discursive. This makes it subject to debate and interdisciplinary expertise. To improve design, we need to make use also of the key instruments of empirical research, including exploring the history of the field, surveying the expert view, understanding the key debates and their ongoing resolution (as Tedre does for the whole field of computer science), etc.
5 Ten Challenges for MCS Design
Many challenges must be overcome before the principles in Section 4 can give us a solid basis for design. Known challenges begin with making the highest principle, of MCS design being based on a design rather than on intuition, a reality. Challenges appear also related to systems, peopleware, and methodological aspects. We give in the following a non-exhaustive list of ten challenges for MCS design.
|Type (Sec.)|Index|Key aspects|Pr.|
|Highest Principle (5.1)|C1|Design of design|P4.1|
| |C2|What is good design?|P4.1|
| |C3|Design space exploration|P4.1|
|Systems (5.2)|C4|Design for ecosystems|P4.2|
| |C5|Catalog for MCS design|P4.2–4.2|
|Methodology (5.4)|C9|Design in practice|P4.4|
5.1 Challenges Related to the Highest Principle
C1: The design of design. Creating processes that enable and facilitate pragmatic and innovative MCS designs. The diversity of already existing design processes (see also Section 7) can come as a surprise to the MCS designer, and even to the best of system designers [11, Part I]. Yet, the challenge of designing the MCS design remains open.
First, as we explain in Section 7, much exploration, combination, and innovation is still possible. The framework we propose in this work has been tested only by one research group, albeit large and long-lasting; new designs of (MCS) design could prove vastly superior.
Second, as the following challenges indicate, we have not yet understood the full extent of the problem raised by MCS design. We envision new aspects will become relevant, leading to a co-evolving problem-solution.
C2: Understand what is good design. Currently, the community relies largely on human experts to assess and curate designs. (In contrast, in hardware design, design space exploration has been largely automated.)
What is good design? We pose as an open challenge understanding (automatically) what is good design. This is not easy.
First, top venues use criteria such as “degree of innovation” and “quality of the approach”, but their discrete formulation may ask reviewers to overfit their assessment to a quantitative estimation. Consequently, as exemplified in Figure 3, many scores cluster around the middle of the given range, leading to difficulties in separating the better designs from their near-equivalents. What alternative approach could be used?
Second, reviewers often also introduce in their assessment other criteria that have never been analyzed thoroughly. For example, simple designs are valued, which seems reasonable because simple designs foster system maintainability; but evidence that simplicity is the right trade-off between the quality of the approach and maintainability, or even a common understanding of what makes a system simple, is lacking. Other criteria, such as balance of the approach or another (semi-)aesthetic aspect (e.g., “elegant design”), have also not been studied. This contrasts with real-world ecosystems, which are inherently messy, and which combine various designs created by different organizational cultures.
How to assess good design? An already existing, albeit incomplete and largely subjective body of work facilitates starting our work on this challenge.
First, Altshuller discusses five large levels of design (“Levels of Creativity” [50, Part 1-2]), evaluating them from the perspective of long-term, overall impact as: (1) trivial design, that is, using an existing design and minimally adapting it for local situations; (2) normal design, that is, selecting one of several designs, and adapting the selected design after careful reasoning; (3) novel design, that is, entailing significant adaptation of an existing design; (4) fundamental design, that is, development of a new design or important feature, or the complete adaptation of an existing design (e.g., big data, serverless computing); and (5) outstanding design: a completely new ecosystem leading to significant scientific/technical advance (e.g., Maxwell’s electricity laws and first use in practice, the Internet, the cloud). Alternatives to this set of levels exist, but roughly follow the same structure, e.g., the rating systems for top conferences roughly consider levels (1) through (4) in Altshuller’s taxonomy.
Second, Altshuller also discusses four levels of design from the perspective of performance against alternatives: versus random design, naïve design, current practice, and ideal or optimal alternatives. Other frameworks exist, but these levels are typically considered by reviewers when assessing the technical quality of the experimental setup.
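To make the comparison against alternatives concrete, the following Python sketch places a design relative to the four reference points named above. It assumes a hypothetical scalar metric where higher is better; the function `relative_standing` and all scores are invented for illustration and do not come from Altshuller or from any reviewing guideline.

```python
def relative_standing(design, baselines):
    """Place a design's score relative to the four reference points in the text.

    baselines maps "random", "naive", "current practice", and "ideal" to
    scores of a hypothetical metric where higher is better.
    """
    beaten = [name for name, score in baselines.items() if design > score]
    span = baselines["ideal"] - baselines["random"]
    # Fraction of the gap between random and ideal that the design closes.
    closed = (design - baselines["random"]) / span if span else float("nan")
    return beaten, closed


# Made-up scores, used only to exercise the function.
baselines = {"random": 10.0, "naive": 25.0, "current practice": 40.0, "ideal": 60.0}
beaten, closed = relative_standing(45.0, baselines)
```

Reporting both which alternatives are beaten and how much of the random-to-ideal gap is closed is one possible design choice; it keeps an improvement over current practice from hiding a large remaining distance to the ideal.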
Third, especially the academic community has proposed some quantitative measures for quantifying the creativity and effectiveness of designs in fields with rather narrow design spaces [51, 52]. How these or related metrics could be put in practice for MCS design, and what metrics remain to be invented, is unknown.
C3: Simulation-based approaches and experimentation for design space exploration. Calibration and reproducibility are key. In maze-solving, it is known that finding an exit is much harder when the alternatives are numerous than for a straight path. Yet, in our field, the complexity and the number of alternatives considered and eliminated before the design has emerged, or more broadly the characteristics of the design space, are rarely discussed in our articles or by their reviewers. How to characterize the broad and diverse design spaces available in MCS design?
5.2 Systems Challenges
C4: Design for MCS, not for individual systems. We see as the grand challenge of MCS design to understand how the resulting design will fit in an entire ecosystem. Typical questions include: How to enable and how to future-proof the design of systems that need to interoperate, especially dynamically, at runtime? For example, how to enable cross-cloud operation, service delegation and federated composition, and geo-distributed data use? Is this even achievable with high likelihood of success, when ecosystems combine organically designs from different organizations and business units, and thus suffer the consequences of Melvin Conway’s empirical law that designs “by committee” are likely to fail?
Current approaches already reveal patterns in the core topics pursued by the community. These include: (i) adaptation and self-awareness in ecosystems, (ii) ecosystem navigation: find and solve common problems of comparison, selection, composition, replacement, adaptation, and operation; (iii) discovering the new world: creating designs responding to new modes of use; (iv) the challenge to support non-functional requirements (see P3); (v) the ecosystem-scheduling challenge: design scheduling approaches to be flexible enough to represent MCS needs, diversity, and heterogeneity, and solve both the provisioning and the allocation problems.
Addressing this challenge could also start from understanding the workload and relative importance of individual components in current ecosystems. This could give quantitative evidence that some components are naturally more important than others, and thus focus the community efforts. One of the likely steps in this sense is to observe pragmatically which part of the current ecosystem is taking much engineering time, and re-design that part into “-as-a-Service”.
C5: Establish a catalog of components for MCS design. Such a catalog would consist of design principles, known architectural and operational patterns, etc. Useful catalogs are a known approach for settled fields [53, 54], but how to build a useful catalog for MCS designs?
5.3 Peopleware Challenges
C6: Create a teachable common body of knowledge for MCS designs, focusing on pragmatism, innovation, and ethics. Design effective teaching practices for this curriculum. How to teach design for MCS? From traditional courses on distributed systems, we know that students face many daunting technologies, such as clouds, clusters, and peer-to-peer, and should learn about many conceptual advances in very diverse topics, such as the functional aspects of consistency, synchronization, commit and agreement, etc., and the non-functional aspects of performance, dependability, security, etc. More recently, we have started to understand that students trained with the traditional approach lack an understanding of how the concepts have evolved [55, 56, 42]. We propose that the body of knowledge for MCS design should include the history and evolution of designs, key debates in design (e.g., end-to-end vs. local properties), evolving architectures and patterns, etc.
To avoid another “Image Crisis”, we also need to include in our curricula elements of ethics. However, one could argue that ethics courses have been added already into current curricula. We see these courses more as shoved in than added, and posit that ethics courses for this field should be presented from the perspective of experts in the MCS field, and not as abstract concepts presented from the remote lens of general philosophy.
The Distributed Systems Memex: In the 1940s, Vannevar Bush defined the concept of the personal memex as a person’s device for storing and accessing all information and communication involving that person. Bush identifies many benefits for archiving large amounts of personal data into the memex, including learning about and eradicating diseases, enabling more creative and thought-related time by eliminating tasks that can be automated, etc. (Bush does not spend much time on the drawbacks, which eventually led to the privacy-related regulations, e.g., GDPR.) Similarly, we have posited that archiving large amounts of operational traces collected from the distributed systems that currently underpin our society can be highly beneficial for MCS design. Even the design of such a Distributed Systems Memex is non-trivial, and may teach us much about the key operational principles of distributed systems and ecosystems. What should the Distributed Systems Memex include? What data? Which types of distributed systems? How can such a Memex be designed? What instruments could it use and how could it be implemented overall?
We see now an additional aspect of the Distributed Systems Memex: the preservation of original designs and of their origins. We are losing valuable heritage by not preserving the artifacts of design, the decisions that led to them, and the thoughts and discussions that led to these designs; capturing these may not be possible much later, as the generation that produced them in the 1980s, 1990s, and 2000s will start to retire.
C7: Create communities and environments for people to engage with the design and operation of ecosystems, to demonstrate and explain operations.
As demonstrated by the success of Facebook over its competitors, communities must be supported by proper interactive tools. Besides the tools for social networking, we envision for this challenge:
(1) support for showing and explaining the operation of ecosystems, and of their constituent systems, to all stakeholders, continuously;
(2) tools to demonstrate and explain the impact of various design principles used in the design of distributed ecosystems and of their systems, to a diverse community;
(3) tools to explore the impact of various design trade-offs used in MCS design, aiming to support a diverse community.
5.4 Methodological Challenges
C8: Design a formalism for documenting designs. How to trace the evolution of designs? An open process for design requires more than its final results and artifacts to be made public. Whatever the reason, many design-decisions happen behind closed doors and are never revealed. Designs also incorporate intangibles, such as the experience leading the designer to take specific design decisions. These factors make the provenance of design choices difficult to track. Compounding the problem, documenting the provenance requires formalisms and description languages that should not hamper the creative process, and in particular should not punish extensive and creative designs.
How to validate the views on the evolution of designs? In general, we have very limited examples documenting the evolution of designs. The most comprehensive in this sense are the early and heroic efforts by former employees of DEC and IBM, who tried to capture how the field of hardware computer systems has evolved [54, 8]. The two resulting books capture tens of designs, each in its own way: Bell et al. through a formalism, Blaauw and Brooks through a historical evolutionary graph. Our own recent work on serverless computing uses the latter approach to capture the technology leading to serverless computing over multiple decades; however, the community has still not started to debate these structured histories, and the possibly alternative views captured by them.
C9: Understand MCS design in practice. How and when do MCS practitioners design what they design? We know normal abduction is commonly used in engineering [13, Ch.5] [11, Part I], especially coupled with complex implementation and realization processes. However, the extent and approach of using design abduction in practice are not currently known. The “When?” is also important; for example, design used to be a static process done at the start of projects, but in some dynamic organizations it is now part of the weekly sprints and helps the DevOps teams respond quickly. In consultancy, design may encounter strict time constraints, and also need to address unusual requirements, as follows.
Design for datacenter operators: Many cloud operators use design to create ecosystems that are highly reliable but still flexible. Whereas requirements solicitation traditionally would take place only at the beginning of a project, such processes are now an integral part of cyclic design. New feature requests from customers and findings from operations drive cloud operators to revisit earlier designs and change them to accommodate new technology, reliability improvements, new features, etc. Design in cloud operations helps find solutions for NFRs, to solve challenges of reliability, maintainability, and security. Design used to be a static process at the start of projects, but it is now likely part of weekly sprints, and thus helps the DevOps teams respond quickly to a fast-changing IT landscape.
Design for consultancy is complex, as for many projects there is a rigid time constraint coupled with unusual requirements. Every design needs to be custom-tailored to the needs of each customer. The design process must cover the full spectrum of IT transformation: from choosing the right hardware to delivering working software. Unlike other businesses, a consultancy also has to design the organizational change required for a client to maintain and operate the finished product (peopleware: types of human resources, required skills, knowledge transfer, etc.). Attracting customers involves working at the cutting edge of technology; however, many customers have legacy systems that need to integrate with the new technology (see also Section 2.5). Likely, the design process at a consultancy begins with a well-formulated initial design that covers the requirements from all stakeholders, with short iterations after discussions with the client. The focus is on automation and on customer self-service, to best support requirements such as availability, reliability, maintainability, usability, and security. To support integration with various legacy systems and with new ecosystems, solutions must be flexible and API-driven. Infrastructure designs cover the entire spectrum from private cloud to hybrid and public cloud solutions.
C10: Organizational similarity in MCS design. Given how MCS designs are likely to span multiple designers and thus also organizational cultures, it is surprising to see how little detail appears in the published articles about the environment in which the designs were produced. It would be interesting to look for evidence of organizational similarity across the designs originating in largely similar organizations. Conversely, it could be valuable to consider the designs originating under different organizational cultures.
6 Experiments with the Design Framework
We have used the AtLarge design framework as our main approach to design for the past decade. Effectively, we have designed its various processes, conducted experiments with them, and refined them as we uncovered the many problems a research group faces in creating pragmatic and innovative MCS designs. (Our designs are also ethical, both in our view, and as assessed by our reviewers, institutions, and funding agencies.)
We summarize in Table IV our use of the AtLarge design framework for over a decade, for a broad range of MCS designs:
Co-evolving understanding, and protocol and system design for Peer-to-Peer systems (Section 6.1);
Design for online gaming ecosystems (Section 6.2), as an example of designing in rapidly changing ecosystems operating under strict NFRs;
Design for datacenter ecosystems (Section 6.3), as an example of evolving understanding of how the field’s reference architecture is emerging;
Design for serverless ecosystems (Section 6.4), as an example of how the AtLarge design process fosters collaboration between diverse teams;
The design of the Graphalytics ecosystem (Section 6.5), as an example of DevOps support;
Design for portfolio scheduling (Section 6.6), as an example of co-evolving a detailed system-level design;
The design of experiments in autoscaling (Section 6.7), as an example of designing both real-world and simulation-based experiments.
Overall, we conclude the AtLarge design framework passes the following criteria for success:
It allows us to co-evolve problems and their solutions, even for problems with very successful solutions, or for very challenging problems with no or few solutions;
It helps us identify “hot” problems, and make scientific discoveries with impact on the community;
It enables us to create pragmatic and innovative designs, as assessed by our own team and by the expert reviewers;
It keeps our design activity fit to receive competitive funding from academic and industrial funding organizations, and interesting and motivating to attract a diverse group of young researchers eager to challenge the new problems;
It results in publications accepted by high-quality venues, which we see as proxies of high-quality designs and results, and fosters other useful results ancillary to good design practices (e.g., publishing high-quality software and data artifacts).
We now address each of the seven types of MCS design activities from Table IV, in turn.
6.1 The Design of P2P Systems
We present in this section an overview of our design work in Peer-to-Peer (P2P) computing. The approach of co-evolving problem-solution led us to new insights into the operation of P2P systems and to innovative new features and systems, as summarized by Table V. It has also allowed us to participate in an exciting moment of high-paced evolution in understanding and designing distributed systems.
P2P computing is a paradigm under which participating entities in a distributed system (the peers) can use direct, two-way (peer-to-peer) communication to perform and/or receive some service. The ability to communicate directly allows peers with the desire to provide and/or use a specific service (similar interest) to group (swarm). Because peers can both perform and receive service, peer-to-peer systems promise to use all the available resources, be available as long as even one peer survives, and scale up with the resources volunteered by peers even during high-intensity periods (flashcrowds). Thus, P2P systems promise very desirable properties, such as high performance, availability, and efficiency, and also cost-effectiveness (as a near-zero-cost technology).
We trace the origins of our work in P2P to our collaboration with a team, Pouwelse et al., finalizing their 8-month investigation of the BitTorrent file-sharing network. At the time, there existed only a handful of measurement studies and only a few theoretical treatments of BitTorrent (BT); each had significant shortcomings. The study led by Pouwelse et al. was indeed seminal, providing longitudinal data about the global SuprNova BT-ecosystem, uncovering, for example, a real-world flashcrowd, and debunking theoretical assumptions such as Poisson arrivals.
The Pouwelse et al. study started a community frenzy, a race of a type common in natural sciences, to complete the first or the largest study of previously unknown phenomena. We joined this BT-related frenzy around October 2004, which was the time Pouwelse et al. were wrapping up their study. Our team was afforded generous, unfettered access to the original data collected by Pouwelse et al., resulting in a series of deepening studies, complemented by new measurements and by broader engagement of the community to create a Peer-to-Peer Trace Archive.
Our studies have uncovered several ecosystem-level phenomena, such as: (1) the 2005 analytics study, resulting in the discovery of the presence of aliased media, which is the presence of very similar media content in a variety of formats in the global BT-ecosystem, and the first characterization of aliased media operation; (2) the 2005 longitudinal study of the global PirateBay BT-ecosystem correlated with Internet-level measurements, which uncovered that the bandwidth capacity of BT-users has shifted to a large imbalance between upload and download (due to widespread adoption of ADSL technologies) and, in the race, remained the largest and most comprehensive BT study until 2009; (3) the 2010 deep study of the global BT-ecosystem, which collected nearly 1 billion samples across hundreds of trackers and over 10,000,000 BT-swarms, and revealed the existence of giant swarms of hundreds of thousands of concurrent users, of spam trackers inserted by unidentified entities to presumably mislead and track BT-users, and in general of a robust global BT-ecosystem; (4) the 2011 study of BT-flashcrowds, for which we developed a method to identify flashcrowds and the first comprehensive model of BT-flashcrowds, and showed evidence of important negative phenomena that occur only during flashcrowds. These studies also led us to (5) meta-analysis, that is, to study the systematic bias introduced by the measurement instruments, and to catalog and characterize various sources of bias. Our BT-studies also led us to create (6) the Peer-to-Peer Trace Archive for sharing data publicly, as FOAD (see Section 3.5). Last, and as a warning to young researchers, (7) these studies revealed to us the importance of good design for reporting: because our reporting skills had not yet been refined, and because we lacked a structured process to compensate, our work prior to 2010 got rejected repeatedly when submitted to major systems and networking conferences.
(Writing seems to be an important reason, because our discoveries are on par with the phenomena described in the accepted articles of the period, and are based on at least as deep and as comprehensive factual evidence.)
Each of these studies required the development of new systems for measurement and analysis, including MultiProbe  and BTWorld , which are both global-scale monitors for BT-ecosystems, the former also focusing on collecting Internet-tracing data (not possible anymore under GDPR laws), the latter focusing on efficient collection of aggregate-data. In 2014, while trying to analyze the full BTWorld-dataset, we developed a novel big data analytics pipeline ; the process allowed us to discover the phenomenon of vicissitude  (see Section 2.5).
The phenomenon of upload-download bandwidth asymmetry in BT-ecosystems led us to design 2fast, a BT-compatible protocol for collaborative downloads where the incentive to share does not need immediate repayment and thus can lead to efficient use of asymmetric bandwidth. In particular, we showed that 2fast not only serves a social function, but can also significantly improve the performance of BT-based file-sharing. Between 2005 and 2010, 2fast was one of the three main pillars of the first socially aware P2P system, Tribler, and is thus partially responsible for the nearly 500,000 downloads recorded by Tribler in that period.
6.2 Design for MMOG Ecosystems
We present in this section an overview of our design work in Massive Multiplayer Online Games (MMOG), which is a popular and lucrative application domain of MCS. Similarly to our design work for P2P systems (see Section 6.1), for MMOG we used the approach of co-evolving problem-solution for over a decade, understanding and improving the operation of MMOG ecosystems, as summarized in Table VI. But MMOGs are not yet another type of distributed system, for two main, systems-related reasons: they raise uniquely challenging NFRs, and they have evolved significantly over the past decade. Thus, this section gives evidence that the AtLarge design framework can deliver good designs in a challenging and rapidly evolving field.
MMOGs operate as very large and diverse ecosystems [85, 6.3], raising some of the strictest NFRs in distributed systems. (Around 2010, the popular MMOG World of Warcraft operated on a global distributed ecosystem of over 10 datacenters, with a total scale rivaling that of the computing grid supporting the Large Hadron Collider, one of the largest scientific experiments in the world.) They require continuous consistency, high-frequency updates, low performance variability, etc., while supporting scales of possibly millions of concurrent users connected to each other over the (performance-varying) Internet, and not losing the ability of a game operator to keep oversight of the game. (This is unlike many P2P systems, where completing the service correctly seems much more important than meeting NFRs.)
Our MMOG work coincided with a moment of great interest and diversification in gaming, experienced by both the systems community and the society at large. We know now that the typical MMOG ecosystem combines four broad functions [85, 6.3]: (1) the operation of the virtual world (V-World), which is the focus of classical systems work in online gaming but also raises numerous new challenges ; (2) gaming analytics, which combines the collection and analysis of big gaming data, and the creation of actionable systems- and business-level decisions; (3) procedural game-content generation (PGCG), which combines computational and game-design challenges; and (4) meta-gaming, which raises the challenge of operating a social network for gamers to share experiences, screenshots, and videos, and to discuss game tournaments and other issues. However, at the start of our work in MMOG, the overall Function (1) meant largely MMO Role-Playing Games (MMORPGs), and only later did large-scale First-Person Shooter (FPS) and Real-Time Strategy (RTS) games become MMOs, and did multiplayer online battle arena (MOBA) and online social (OS) games appear; Function (2) was in much use inside the major game operators, but there was relatively little use of big data technology and there were few other organizations doing gaming analytics; Function (3) was uncommon; and Function (4) was barely known, as the social networking market was still emerging and very fragmented.
Because of the vast design space, our design work for MMOGs can only be characterized as exploratory. We started with gaining a deep understanding of how these applications operate in practice, uncovering the short- and long-term dynamics of popular MMORPGs. We did this by tracing, from around 2005 to 2008, the operations of multiple MMORPGs, and in particular of one that became one of the most popular MMORPGs, Runescape. This work led us to design techniques for resource management and scheduling for cloud- and datacenter-based MMOG operations [71, 87]; our efforts were followed by independent designs going in the same direction. We also combined in our design both technology and business considerations. (Combining technology and non-technology considerations, such as business and creative, makes the resulting work better fit to solve problems, but causes considerable problems for reviewers. For example, our work combining the technology and business of MMOG was rejected repeatedly until finding a community willing to consider such diverse aspects.) Having understood from our work that MMORPGs based on clouds could scale elastically, almost “by credit-card”, we turned our attention to how MMOGs operated outside of their main Function (1), with innovative designs for PGCG, for which we invented the first distributed and parallel system to generate fresh and diverse content at scale (POGGI won a distinguished paper award from Euro-Par, in 2009), and for cloud-based analytics, for which we combined NoSQL and cloud technology to design one of the first systems for gaming analytics at scale (leading MMOG companies, such as Blizzard, started to discuss similar approaches publicly around 2016, nearly 7 years after our CAMEO publication).
Our early work with MMORPGs made us ask the important question of whether we could scale to MMORPG-like scales the existing RTS games, which have significantly more challenging NFRs (e.g., lower latency and stricter consistency). We started with trying to gain a deeper understanding of why existing RTS games fail to scale, and designed the first benchmark for this purpose, RTSenv. By applying RTSenv to one of the few RTS games available as FOSS, we discovered a new form of scalability, unique to MMOGs, that combines systems and game-design concepts. The consequence of this discovery is that simply scaling RTS games by scaling their technology, without taking into account the interactive details of how they are used (e.g., where units are located, how many actionable items appear on the same screen), is not sufficient. (Although the principles of computer-human interaction are well-understood, for the practice of MMOG design they give guidelines rather than quantitative, actionable information. In this, they play a similar role to how the laws of physics act on hardware design.) This made us change the focus of our work on scalability, from traditional problems, to scaling based on how (MMORTS) games are used. We understood we had to address not only the limitations revealed in the lab by RTSenv, but also new problems derived from how players actually interact with their games. To this end, we found out that the gaming community was already collecting data about real-world use, but for training purposes (professional, semi-professional, and amateur players learned from the best-performers by replaying their game sessions), in an early example of the complexity of meta-gaming operations.
By analyzing the game replays, we found out that RTS games, unlike MMORPGs, (i) have multiple points of interest, (ii) require careful management of up to tens of entities in some of the points, and (iii) require more casual management of up to hundreds of entities in the others; this resulted in the design of the Area of Simulation MMOG-technique and system , and, for cloud-based operation, of the Mirror system that can offload computation .
The discovery of MMORPG-related phenomena made us curious about exploring the MMOG universe more broadly, and about uncovering the properties of more, if not all, major types of MMOGs. (The parallel we can draw from the more conventional work on parallel and distributed systems is with the U.C.-Berkeley “views of” series [89, 90].) Over the next few years, we uncovered the short- and long-term dynamics of MOBA and OS games, and new, deeper phenomena occurring in emerging and more established game genres.
Among the deeper phenomena we have discovered are the implicit social-network forming in various kinds of game genres  and in meta-gaming . Their importance remains underestimated, and our work in design joins a small but emerging body of work focusing both on positive applications such as matchmaking [74, 91] and best-practice sharing , and on preventing negative situations such as online bullying and toxicity .
Because online gaming technology raises challenging NFRs, in our experience prototypes in this domain take much longer to develop than, for example, P2P designs. For example, implementing the Area of Simulation system  took over 6 months. To keep such software development projects under control, it was vital to turn to using good software design processes. We learned an important lesson. Prototypes are essential in testing NFRs, because current software design practices do not lead to sufficient guarantees of, e.g., low latency. Unfortunately, in our experience, reviewers of designs with strict NFRs do not have a good understanding of the challenges posed by prototyping, and tend to dismiss such prototypes as not important for the act of design especially for emerging application domains. We challenge this culture of reviewing designs through this work.
We learned another lesson through our design work. One of the key contributions a team can make to the field is, in our view, sharing workload and operational traces in a FAIR and/or FOAD archive, such as the Game Trace Archive. The design of such an archive is always pragmatic, but in terms of innovation it can only achieve as much as required by the type of data to be shared in the archive. We argue that reviewers should better understand the standards of innovation in creating data archives, and judge innovation here also by the novelty of the actual data shared through the archives.
6.3 Datacenters: Designing the Digital Factory
We present in this section the evolution of a reference architecture for the ecosystems operating in the datacenter. A reference architecture facilitates the design of systems, stacks, and platforms, allowing the designer to start from an overview of how the entire ecosystem works. Figure 9 depicts both our initial design focusing on the big data ecosystem, and the revised and extended design for the entire datacenter ecosystem.
Datacenters hold a crucial place at the heart of the Digital Economy, producing efficient, dependable services. They allow clients to run diverse workloads, including data processing pipelines, scientific simulations, and online gaming, all with the promise to achieve efficiency and near-optimal resource utilization. In addition to this challenge, we can identify another one that addresses the diversity of infrastructures. Datacenters appear in different scales and designs, from multi-cluster deployments like Amazon EC2 and Microsoft Azure to cloud-edge  micro-datacenters used for video trans-coding and streaming . This raises numerous scientific, design, and engineering challenges [1, 6.1].
Our initial design of a reference architecture for datacenter-based ecosystems started in 2011, with a drawing of big data ecosystems created jointly by the community in a high-profile Dagstuhl Seminar. For nearly 5 years, we refined that drawing and added to it our own understanding of the topic. Figure 9 (top) depicts the resulting reference architecture for big data. The four layers, High-Level Language, Programming Model, Execution Engine, and Storage Engine, are conceptual, but applications that run in this ecosystem typically use components across the full stack of layers (and more, as indicated in the figure). The highlighted components cover the minimum set of layers necessary for execution in the MapReduce ecosystem; the presence of several high-level languages indicates that the ecosystem has diverse users, with minimal expertise and ability in managing the ecosystem beyond the high-level language they know. This reference architecture was useful to our research, design, and engineering: with it as a guide, we have created the Fawkes elastic MapReduce system.
However useful, our original reference architecture has important limitations. How to include in it portals, Software-as-a-Service, and other application-level approaches where the users of the ecosystem barely need to know of its existence to conduct their work? How to include in it in-memory distributed file systems, and other software-based data management systems that span the memory, network, and storage boundaries? How to include in it the various DevOps tools?
To address these questions, during the course of 2016 we significantly revised the reference architecture and extended it to cover the entire datacenter ecosystem. The new architecture includes the layers in the original reference architecture, plus a variety of other layers and a new, broader structure. (We gave the first public talks on our new reference architecture in November 2016, at ICT with Industry 2016 and, the same day, to a plenary session of the Lorentz Center Highlights, http://www.lorentzcenter.nl/LCHighlights/abstracts.php?abstract=Iosup.) In this reference architecture, there are five core layers: (5) Front-end for the application-level functionality, (4) Back-end for task, resource, and service management on behalf of the application, (3) Resources for task, resource, and service management on behalf of the cloud operator, (2) Operations Service for basic services that are typically associated with (distributed) operating systems, and (1) Infrastructure for managing physical and virtual resources. An orthogonal layer, (6) DevOps, covers functions essential to operating the datacenter but orthogonal to the service provided to customers, such as monitoring, logging, and benchmarking. The sub-layering in Layers 4 and 5 helps classify the many emerging systems with finer granularity, and highlights the intense specialization that is currently emerging in this part of the ecosystem. Since 2016, we have mapped to the new reference architecture a large number of well-known industry ecosystems (e.g., Google, Facebook, Uber, Netflix, the broad collection of Apache projects). Our experience suggests the reference architecture does encompass these industry ecosystems.
We emphasize the difference between the two reference architectures through the example of a big data ecosystem based on MapReduce. As Figure 9 shows, the core ecosystem maps well to both our reference architectures. High-level languages like Pig and Hive are based on the MapReduce programming model. Execution and runtime management are left to Hadoop and HDFS, which distribute and execute MapReduce jobs. At a lower level, general-purpose resource allocation and scheduling in the datacenter is performed by Yarn or Mesos. Specific operations like the maintenance of configuration information for the upper layers can be performed by Zookeeper. This representation does not include the whole complexity of an industry datacenter stack, where there can be hundreds of additional components. Moreover, the old architecture, depicted in Figure 9 (top), does not capture in-memory file systems such as MemEFS and Pocket, high-performance network and storage engines such as Crail and FlashNet, DevOps tools such as Graphalytics and Granula, etc.
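The mapping described above can be written down as a small lookup structure. The following is a minimal sketch, not the content of Figure 9: the layer names come from the text, but the placement of Yarn, Mesos, and Zookeeper in the new architecture is inferred from the prose, and the names `OLD_ARCHITECTURE`, `NEW_LAYERS`, `NEW_PLACEMENT`, `uncaptured`, and `layer_of` are hypothetical.

```python
# Old (big data) reference architecture: four conceptual layers, as named in the text.
OLD_ARCHITECTURE = {
    "High-Level Language": {"Pig", "Hive"},
    "Programming Model": {"MapReduce"},
    "Execution Engine": {"Hadoop"},
    "Storage Engine": {"HDFS"},
}

# New (datacenter) reference architecture: five core layers plus the orthogonal DevOps layer.
NEW_LAYERS = {
    1: "Infrastructure",
    2: "Operations Service",
    3: "Resources",
    4: "Back-end",
    5: "Front-end",
    6: "DevOps",
}

# ASSUMPTION: placements inferred from the prose, not read off Figure 9.
NEW_PLACEMENT = {
    "Yarn": 3,
    "Mesos": 3,
    "Zookeeper": 2,
    "Graphalytics": 6,
    "Granula": 6,
}


def uncaptured(components, architecture):
    """Return the components that no layer of `architecture` lists, in input order."""
    placed = set().union(*architecture.values())
    return [c for c in components if c not in placed]


def layer_of(component):
    """Return the new-architecture layer name for a component, or None if unplaced."""
    return NEW_LAYERS.get(NEW_PLACEMENT.get(component))


missing = uncaptured(["Pig", "HDFS", "MemEFS", "Crail", "Granula"], OLD_ARCHITECTURE)
```

Here `missing` lists exactly the components from the sample that the four-layer architecture has no place for, which is the limitation the revised architecture is meant to address.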
6.4 Serverless: New Designs for FaaS
We present in this section an overview of our design work in serverless computing, which emphasizes two aspects related to using the AtLarge design framework in practice. First, approaching the rapidly evolving model of serverless computing as a co-evolving problem-solution enabled us to quickly gain insight into this domain. Second, the AtLarge research team has joined for this work with a variety of designers from academia and industry: a distributed team with background in performance engineering from the SPEC RG Cloud Group, another distributed team formed with serverless company Platform9, and, for control, a team from Stanford and IBM Research Zurich working on serverless independently. The latter tests the ability of the AtLarge design framework to help designers with different intellectual backgrounds and design approaches work together. Table VII summarizes our key contributions in this emerging field.
Serverless computing is part of a trend toward applications composed of many small, self-contained, and automatically managed components . Serverless computing is a set of (cloud) computing technologies that adhere to three principles: (1) operational logic is abstracted away from the users; (2) users only pay for the resources they need, with fine granularity; (3) the computing model is event-driven and operations are scaled elastically. Core to serverless computing, Function-as-a-Service (FaaS) allows developers to provide functions, for which the entire operational life-cycle is managed by the cloud provider.
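To make the FaaS model concrete, the sketch below shows a toy, in-process stand-in for a platform: the developer supplies only a function, while registration, event dispatch, and per-invocation metering belong to the "provider". Everything here is an assumption made for illustration (the class `TinyFaaS`, the event name `image-uploaded`, the function `thumbnail`); it is not the API of any real serverless platform, and it does not model elastic scaling.

```python
import time
from collections import defaultdict


class TinyFaaS:
    """Toy in-process stand-in for a FaaS platform (hypothetical, not a real API)."""

    def __init__(self):
        self._functions = {}                       # event type -> user function
        self.invocations = defaultdict(int)        # per-function invocation count
        self.billed_seconds = defaultdict(float)   # per-function metered run time

    def register(self, event_type, fn):
        """Bind a user-provided function to an event type."""
        self._functions[event_type] = fn

    def emit(self, event_type, payload):
        """Dispatch an event to its function and meter the invocation."""
        fn = self._functions[event_type]
        start = time.perf_counter()
        try:
            return fn(payload)
        finally:
            self.billed_seconds[fn.__name__] += time.perf_counter() - start
            self.invocations[fn.__name__] += 1


def thumbnail(event):
    """User code: the only part a developer would write in this model."""
    return {"name": event["name"], "size": (event["width"] // 2, event["height"] // 2)}


platform = TinyFaaS()
platform.register("image-uploaded", thumbnail)
result = platform.emit("image-uploaded", {"name": "cat.png", "width": 640, "height": 480})
```

The split between `thumbnail` and `TinyFaaS` mirrors principles (1) and (2): the function contains no operational logic, and cost is attributed per invocation rather than per provisioned server.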
In early 2017, during an investigation into improving orchestration of container-based micro-services, we learned about the early industry efforts in serverless computing. After an informal survey, we realized the benefits of this novel paradigm. Whereas micro-service architectures delegate most of the operations effort to the developer, the serverless model delegates most of the operational life-cycle to the cloud provider. This opens up many avenues for design to leverage this additional insight and control. Following the AtLarge design framework, we shifted our focus from improving micro-service architectures to the new problem of understanding the benefits and drawbacks of serverless computing.
As with any emerging domain led by industry efforts, serverless computing suffered (and, in some respects, still suffers) from a lack of a rigorous and scientific foundation: What is the definition of serverless? What are the characteristics of serverless technologies? What are the challenges and perspectives? How do these new technologies compare with each other and with traditional alternatives? How does it fit into or overlap with existing computing models? To get answers to these questions, we established an international team within the SPEC RG Cloud Group, partnering both industry and academic organizations. In our initial publication, we addressed terminology, challenges, and perspectives — key aspects for designers.
Following this initial high-level investigation, we narrowed the scope to the domain where we could leverage our existing expertise: that of performance, and of resource management and scheduling. We then revisited and expanded the performance challenges introduced in the initial vision paper, identifying promising approaches towards addressing them. To further separate hype from reality and leverage our decades of expertise in distributed systems, in another publication we reflected on the motivations, concepts, technologies, and practices that have led to the emergence of serverless computing. Our main finding was clear: though serverless technology leverages and overlaps with many historical efforts, its emergence could not have happened ten years ago.
Continuing the exploration of serverless computing, we further narrowed our scope to the specific research challenges for which we could best leverage our (technical) expertise. Within the SPEC RG Cloud Group, we focused on the core mission of the group: to perform quantitative system evaluation and analysis in distributed (eco)systems. Towards a representative benchmark of serverless platforms, we spent a year surveying nearly 50 open-source and closed-source serverless(-like) platforms. As a culmination of these efforts, in early 2019 we proposed a FaaS reference architecture and ecosystem that identifies the common processes and components in these seemingly widely varying systems. These processes and components are the focus of any good benchmark design for serverless computing.
Alongside this community effort, the AtLarge team started addressing the technical challenges associated with serverless computing, leveraging our expertise to introduce workflow-based serverless orchestration and serverless big data processing. For example, we have created an informal public-private partnership, working with US-based company Platform9 to develop a production-ready serverless computing platform. Specifically, we have co-created the Fission Workflows system (https://github.com/fission/fission-workflows), which acts as a workflow execution engine in the hierarchical Kubernetes-Fission ecosystem.
The AtLarge team is further exploring a (shared) storage architecture for serverless computing. As with any emerging field, the current serverless landscape gives us the opportunity to re-evaluate many of the basic design decisions (and trade-offs) present in current designs. For this, the AtLarge team has been joined by a new high-quality designer, who has already developed a recognized body of work in serverless computing with another team spanning Stanford and IBM Research Zurich [96, 104]. Their approach to design is compatible with the AtLarge problem-solving process (Figure 8). First, the team identified the problem, formulated the new requirements for temporary storage for serverless, and analyzed the available trade-offs. Then, they designed a complete system, with both high- and low-level components, and analyzed through detailed experiments the various design decisions and whether they met the original objectives.
Our work in serverless computing is just starting. With numerous open challenges in this space, our efforts continue to identify evolving patterns, combine fundamental distributed systems notions in the emerging ecosystem of serverless technologies, and evaluate existing state-of-the-art systems to identify further pragmatic problems.
6.5 Design of the Graphalytics Ecosystem
We present in this section, which appears ad literam in our previous publication [16, 6.1], an overview of the design of the Graphalytics ecosystem, which has emerged through multiple iterations of the AtLarge design process. The approach of co-evolving problem-solution has led to identifying new laws in the operation of graph-processing systems, to the development of an ecosystem of performance instruments and tools, and to meaningful and novel research directions. We discuss each in the following, and summarize the iterations in Table VIII.
Intrigued by a seminal analysis of open challenges in graph processing at scale, we started planning our own experiments to understand the real-world problems. We designed experiments focusing on multi-algorithm, multi-dataset analysis of a diverse set of graph-processing platforms, exploring the dependency of performance on the interaction between software-platform, algorithm, and dataset (the PAD triangle). It took us several years to get this curiosity-driven project done.
Finding that the PAD triangle existed (a law!) led us to a new problem, of providing the community with a benchmark that would not only support multiple “P”s, as the leading benchmarks already did at the time , but also multiple “A”s and “D”s. This bootstrapping led to the Graphalytics benchmark, the first tool in the emerging Graphalytics ecosystem of performance instruments and tools. The first solution was Graphalytics 0.1 , which is an engineered version of a subset of the initial PAD study—it takes much implementation effort to convert a scientific prototype into a production-ready software package. With community involvement, and in particular the collaboration of the LDBC community, we have continued this line of design, to Graphalytics 1.0 . Graphalytics allowed us to benchmark, in production and in the lab, over ten graph-processing platforms. This has led us to new problems: How to enable a global competition around the benchmark? How to share performance data? How to enable not only low-depth analysis, which is typical of benchmarks, but also deep results? How to use the deep results to obtain model systems, without (much) effort? Table VIII indicates our incipient answers to these questions. Equally important, the Graphalytics artifacts have become a part of the official LDBC benchmarks, and serve a growing community of industry developers of graph-processing systems.
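The structure of a multi-platform, multi-algorithm, multi-dataset experiment can be sketched as a sweep over the Cartesian product of the three dimensions. This is an illustrative sketch only, not the Graphalytics harness: `toy_platform`, the algorithm labels `BFS` and `MAXDEG`, and the two tiny graphs are hypothetical placeholders, and the two platform entries deliberately point at the same toy implementation.

```python
import itertools
import time
from collections import deque


def toy_platform(algorithm, graph):
    """Hypothetical single-threaded 'platform' supporting two toy algorithms."""
    if algorithm == "BFS":
        start = next(iter(graph))
        seen, queue = {start}, deque([start])
        while queue:
            for neighbor in graph[queue.popleft()]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return len(seen)  # number of vertices reached from the first vertex
    if algorithm == "MAXDEG":
        return max(len(neighbors) for neighbors in graph.values())
    raise ValueError(f"unsupported algorithm: {algorithm}")


def run_pad_sweep(platforms, algorithms, datasets):
    """Run every (platform, algorithm, dataset) combination once; record time and output."""
    results = {}
    combos = itertools.product(platforms.items(), algorithms, datasets.items())
    for (p_name, run), a_name, (d_name, graph) in combos:
        start = time.perf_counter()
        output = run(a_name, graph)
        results[(p_name, a_name, d_name)] = (time.perf_counter() - start, output)
    return results


PLATFORMS = {"toy-a": toy_platform, "toy-b": toy_platform}  # placeholders
ALGORITHMS = ["BFS", "MAXDEG"]
DATASETS = {
    "chain": {0: [1], 1: [0, 2], 2: [1]},
    "star": {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]},
}

results = run_pad_sweep(PLATFORMS, ALGORITHMS, DATASETS)
```

Keying the results by the full (platform, algorithm, dataset) triple, rather than averaging over any dimension, is what allows an analysis to expose interactions between the three.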
In parallel with tool-related problems, the Graphalytics ecosystem has spurred two new research directions. First, we have recently shown that for graph-processing platforms based on modern heterogeneous “H”ardware the entire HPAD performance-space is relevant; the PAD law is applicable only in special situations. Second, we are working to understand the performance impact of various emerging features in graph processing: we have been among the first to explore the performance of graph-processing platforms that are (i) GPU-based, (ii) based on parallel and distributed systems combined into a single working system, and (iii) elastic.
6.6 Design of Portfolio Schedulers
We present in this section, which appears ad literam in our previous publication [16, 6.2], an experiment in using the AtLarge design process to design datacenter schedulers. We started with a series of comprehensive experiments about the performance of online job schedulers in grid datacenters: for BoT- and workflow-based workloads, and for the predictive component of proactive schedulers. The main conclusion across all the studies was that no individual technique or policy was consistently better than all others. A new need emerged, to select (change) the policy online, based on the current system state. This led us to methods to select one policy among many, and ultimately to introduce portfolio scheduling in datacenters. Table IX captures this research development, which starts from a re-framing of the scheduling problem, triggered by a phenomenon found empirically.
We started with an exploration of the capabilities of portfolio scheduling across synthetic workloads with various computational properties . While conducting this investigation, we found a new problem: the time it took a portfolio scheduler to simulate all the alternatives could grow rapidly, proportionally with the number of policies. Compounding this problem, BoT- and workflow-based workloads are comprised of many more jobs in the same time-span than traditional parallel workloads, a phenomenon not predicted by theory and that we had found around the same time ; this means that simulators would have more to compute than predicted for previous approaches to (dynamic) online scheduling . Thus, the portfolio scheduler could no longer be used to run online. This is an example of how solving an existing problem (of making a scheduler address system dynamics) can lead to a new problem (of making a scheduler fast enough to work online).
We thus turned our attention to (1) problems of real-world online scheduling, for (2) real workloads, here, scientific computing. This has led us to design a new portfolio-scheduling approach, which could select a limited active set of policies. The key trade-off in this design is keeping the active set large enough to make good decisions, yet small enough to estimate online.
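The selection step of a portfolio scheduler, as described above, can be sketched as follows: simulate every policy in the active set on the current queue, score each outcome, and apply the best one. This is a minimal sketch under strong assumptions that are not in the source: a single machine, known job runtimes, mean waiting time as the only utility, and three textbook ordering policies (`fcfs`, `sjf`, `ljf`). All names are hypothetical.

```python
def fcfs(jobs):
    """First-come, first-served: keep arrival order."""
    return list(jobs)


def sjf(jobs):
    """Shortest job first."""
    return sorted(jobs, key=lambda job: job[1])


def ljf(jobs):
    """Longest job first."""
    return sorted(jobs, key=lambda job: -job[1])


def mean_wait(order):
    """Simulate one machine running jobs back to back; return the mean waiting time."""
    clock, total_wait = 0.0, 0.0
    for _, runtime in order:
        total_wait += clock   # this job waited until all earlier jobs finished
        clock += runtime
    return total_wait / len(order) if order else 0.0


def select_policy(queue, active_set):
    """Simulate each active policy on the queue; return (name, score) of the best one."""
    scored = [(mean_wait(policy(queue)), name) for name, policy in active_set.items()]
    score, name = min(scored)  # lowest mean wait wins; ties broken by name
    return name, score


PORTFOLIO = {"fcfs": fcfs, "sjf": sjf, "ljf": ljf}
queue = [("a", 5.0), ("b", 1.0), ("c", 3.0)]  # (job id, ASSUMED known runtime)

best_name, best_score = select_policy(queue, PORTFOLIO)

# A smaller active set costs fewer simulations but may miss the best policy.
restricted = {name: PORTFOLIO[name] for name in ("fcfs", "ljf")}
restricted_name, restricted_score = select_policy(queue, restricted)
```

The number of simulations in `select_policy` grows with the size of `active_set`, which is the online-cost side of the trade-off; the restricted run shows the decision-quality side, because the best policy for this queue is no longer available.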
But is portfolio scheduling generally capable? In successive iterations, we have tested portfolio scheduling on a variety of workloads and environments, which Table IX summarizes. Portfolio scheduling seems indeed general, but requires non-trivial adaptation to workload and environment. Independently, others have found portfolio scheduling useful for compute farms at Intel; this supports our claim that the AtLarge design process can lead to meaningful designs, but also emphasizes that (i) it gives a relatively small research team the ability to compete intellectually with larger R&D teams, and (ii) it led the research team to deeper and broader designs, as a consequence of the focus on the co-evolving problem-solution.
Are there open problems? Our latest study , on cluster-based big data workloads, indicates portfolio scheduling can make sub-optimal selections when the performance of the policy is difficult to predict. How to alleviate this problem remains open.
6.7 Design of Autoscaling Experiments
We present in this section, which appears ad litteram in our previous publication [16, 6.3], an overview of using the AtLarge experiment design for experiments on autoscaling in cloud environments. Autoscaling systems try to provision exactly as many resources as the workload demands, by provisioning on behalf of the user more or fewer resources. An autoscaler is an algorithm used by an autoscaling system to automate elasticity efficiently, subject to one or several common elasticity metrics. Overall, the process allowed us to conduct successful and deep experiments. The various aspects addressed in this section indicate both how complex the experiment design needs to be, to address various aspects of distributed systems and ecosystems, and that the process fosters successful experiment designs.
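To make the notion of an autoscaler and of an elasticity metric concrete, consider the following minimal sketch. It is an assumption-laden illustration, not one of the autoscalers or metrics evaluated in the cited experiments: the autoscaler is purely reactive (it supplies what was demanded one step earlier), and the two metrics are simplified averages of missing and surplus resources per time step. The names `reactive_autoscaler` and `provisioning_accuracy` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def reactive_autoscaler(demand: Sequence[int],
                        capacity_per_vm: int = 1) -> List[int]:
    """At each step, supply enough VMs for the demand seen one step earlier.

    Assumption: nothing is known before the first step, so supply starts at 0.
    """
    supply: List[int] = []
    previous_demand = 0
    for current_demand in demand:
        supply.append(math.ceil(previous_demand / capacity_per_vm))
        previous_demand = current_demand
    return supply


def provisioning_accuracy(demand: Sequence[int],
                          supply: Sequence[int]) -> Tuple[float, float]:
    """Return (under, over): mean missing and mean surplus resources per step.

    Simplified stand-ins for under- and over-provisioning accuracy;
    lower is better for both.
    """
    if len(demand) != len(supply):
        raise ValueError("demand and supply must cover the same time steps")
    if not demand:
        return 0.0, 0.0
    steps = len(demand)
    under = sum(max(d - s, 0) for d, s in zip(demand, supply)) / steps
    over = sum(max(s - d, 0) for d, s in zip(demand, supply)) / steps
    return under, over


if __name__ == "__main__":
    demand = [2, 4, 4, 1]
    supply = reactive_autoscaler(demand)
    print(supply, provisioning_accuracy(demand, supply))
```

For the demand trace `[2, 4, 4, 1]`, the reactive sketch lags by one step: it under-provisions while demand rises and over-provisions when demand drops, which is the behavior elasticity metrics are meant to expose.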
Key to our work in autoscaling was to understand how autoscalers perform in practice, for the emerging class of workflow-based cloud workloads. When we started this work, there existed already several autoscaling approaches, but none had been evaluated comprehensively. Motivated also by the emergence of a new set of elasticity metrics, we have designed and performed several experiments [126, 127, 128].
For the first set of experiments, we have designed a new morphological structure for autoscaling workflows, based on general and workflow-specific autoscalers. We have further selected in vitro emulation as the evaluation technique, ten elasticity metrics, and various environment and workload configurations. For our experiments, we have developed real-world software, implemented real-world policies, and built custom experimental tools to run them in the DAS multi-cluster system set to emulate a cloud. Last, we have designed and conducted experiments, and further designed two ranking methods to aggregate the results into head-to-head comparisons that answer the question: which policy is the best?
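The idea of aggregating per-metric results into head-to-head comparisons can be illustrated with a simple pairwise scheme. The sketch below is one possible aggregation, assumed here for illustration; it is not claimed to be either of the two ranking methods designed for the cited experiments. It assumes that every system reports the same metrics and that lower values are better; the system names and scores are made up.

```python
from itertools import combinations
from typing import Dict, List


def pairwise_ranking(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Rank systems by the number of head-to-head metric comparisons won.

    Assumptions: all systems report the same metrics, and a lower metric
    value is better. A tie on a metric gives half a win to each side.
    """
    wins = {name: 0.0 for name in scores}
    for a, b in combinations(scores, 2):
        for metric in scores[a]:
            if scores[a][metric] < scores[b][metric]:
                wins[a] += 1.0
            elif scores[b][metric] < scores[a][metric]:
                wins[b] += 1.0
            else:
                wins[a] += 0.5
                wins[b] += 0.5
    # Most wins first; sorted() is stable, so equal totals keep input order.
    return sorted(scores, key=lambda name: -wins[name])


if __name__ == "__main__":
    # Hypothetical autoscalers and metric values, for illustration only.
    example = {
        "A": {"under": 1.0, "over": 0.75},
        "B": {"under": 0.5, "over": 0.5},
        "C": {"under": 2.0, "over": 2.0},
    }
    print(pairwise_ranking(example))
```

In this made-up example, `B` wins all four of its comparisons, `A` wins two, and `C` none, so the ranking is `B`, `A`, `C`. A pairwise scheme of this kind ignores by how much one system beats another, which is one reason to complement it with a second, magnitude-aware method.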
Although the in vitro experiments were useful, they could not address many questions related to diverse workloads and environments, because the former would have been too expensive to experiment with, and for the latter we did not have access to different environments of the right scale. We have therefore designed and conducted in silico, simulation-based experiments. We found interesting discrepancies between the real-world software of the initial in vitro experiments and the software of the simulator, which we have developed independently; these discrepancies have allowed us to correct in time the real-world results, and emphasize the need for independent corroboration in the community.
We have then extended this work with a more comprehensive analysis of the results. The new analysis exemplifies the depth of stage (9) in the AtLarge approach: we added an analysis of traditional performance metrics next to the analysis of elasticity metrics, an analysis of cost metrics based on several real-world cost models, an analysis of introducing two types of deadline-based SLAs, and an analysis of the presence of performance variability in the behavior of autoscalers. We have also introduced a method to grade autoscalers, by combining their scores judiciously.
7 Related Work
In this section, we compare our and related work.
Overall novelty: The AtLarge design framework combines elements of 2010s design thinking with the specifics of MCS design. The former makes it unique among published design frameworks in distributed systems. For example, hardware design is a well-established field of design, but as noted by Brooks it has not adopted the new ways of design thinking [11, Part I]. The latter makes it unique among design frameworks. For example, works of similar scope address the design of mechanical systems [53, 131], but their physical properties make them radically different from distributed systems and ecosystems.
Contrast to design in computer systems: We distinguish here two design cultures. Hardware design has focused for over five decades on (instruction set) architecture as function, implementation of the system to solve in particular for cost-performance among NFRs, and realization to engineer the working system [132, 133, 8, 24]. Standardization and increased capabilities of simulation software have made design space exploration largely computer-driven, focusing on the optimization of fixed design-spaces. Post-Moore’s Law, we have seen a wave of innovative hardware designs, including heterogeneous CPUs (e.g., second-generation KNL), GPUs, FPGAs, and various ASICs, but so far the emergence of a new design process has not been reported. Software system design has been much less developed, caught perhaps between software engineering and hardware design. Thus, this area of design has focused mostly on reporting best-practices and rules-of-thumb for addressing NFRs, such as scalability, various other NFRs in cloud-based ecosystems [17, 10], and pragmatic operational issues [135, 136, 14, 15].
The AtLarge design process is not closely aligned with either of these approaches; following the field-wide critique of Brooks [11, Part I], it focuses on co-evolving problem-solutions, problem-solving and problem-finding, etc. For example, the traditional principles of system design [133, 8, 24] are not the same as the principles we propose for MCS, and the AtLarge approaches to problem-finding and problem-solving are distinctively more systematic than the published best-practices of software system design.
Contrast to design in software engineering: Software engineering has developed and keeps evolving sophisticated design methods. Elements of software design provide various analytical views [45, 137], software-oriented design patterns as problem-solution recipes [25, 138], documentation and maintenance of code, DevOps from a Dev’s perspective, etc. In contrast to these approaches, the AtLarge design process focuses on systems.
Contrast to design in general: To design our process, we have surveyed design processes and elements from mechanical engineering [53, 131], operations research and management, architecture [27, 141, 142], material and fashion design, graphic design, industrial and facility design, etc. In contrast to these approaches, the AtLarge design process provides different solutions due to the virtual, composite, and idiosyncratic nature of the distributed systems and ecosystems.
8 Conclusion
Responding to the needs of an increasingly digital and knowledge-based society, in this work we explicitly posit that design is a key area of research for distributed systems and ecosystems (MCS), and propose a vision to establish the theory and practice of MCS design.
We make a first attempt to understand the problem of MCS design. We give qualitative and quantitative evidence of the extent of the problem, and propose requirements derived from general design processes and from the specific needs of MCS.
We design the AtLarge design framework around the central premise that design is fundamentally different from science and engineering, requiring its own way of thinking and processes. Responding to requirements, the framework combines emerging theories about design thinking with several MCS-focused design processes, e.g., for co-evolving problem-designs, for problem-finding and -solving, and for disseminating the results.
We show how, in our experience, the framework can lead to pragmatic and innovative designs in fields such as P2P systems, datacenter ecosystems, ecosystems for the MMOG application-domain, serverless computing and FaaS cloud computing, DevOps ecosystems for performance analysis, system-level design of a portfolio scheduler for datacenters, and experiment design for analyzing autoscaling, etc.
Our vision also includes a set of core principles and challenges of MCS design, in the four broad categories related to the central premise, systems, peopleware, and method. We have started to address the research agenda formulated in this article. We hope this vision will stimulate a larger community to join us in improving design.
Work supported by the projects Vidi MagnaData and Commit. We thank all our collaborators, in particular, in the SPEC RG Cloud Group, at TUD, at VU and UV Amsterdam, at Platform9, at Oracle, at Intel Labs, etc.
-  Iosup et al., “Massivizing computer systems: A vision to understand, design, and engineer computer ecosystems through and beyond modern distributed systems,” in ICDCS, 2018.
-  The Economist, “Taming the titans,” Jan 20–26, 2018, pp. 11–12, https://www.economist.com/printedition/2018-01-20.
-  European Commission, “Uptake of cloud in europe. digital agenda for europe report,” Digital Agenda for Europe report. Publications Office of the European Union, Luxembourg., Sep 2014.
-  ——, “Big Data and data analytics,” EU Parliament, Sep 2016.
-  Gartner Inc., “Infrastructure and Operations (I&O) Leadership Vision for 2017, section CIO Technology Priorities,” Tech.Rep., 2017.
-  Royce, “Managing the development of large software systems,” in IEEE WESCON, 1970.
-  Boehm, “A spiral model of software development and enhancement,” IEEE Computer, vol. 21, 1988.
-  Blaauw and Brooks, Computer Architecture. Addison-Wesley, 1997.
-  Ramsin and Paige, “Process-centered review of object oriented software development methodologies,” ACM Comput. Surv., vol. 40, 2008.
-  Burns, Designing Distributed. O’Reilly, 2018.
-  F. P. Brooks, The Design of Design. Addison-Wesley / Pearson Education, 2010.
-  Parsons, The Philosophy of Design. Polity, 2015.
-  W. B. Arthur, The Nature of Technology. Free Press, 2009.
-  Beyer et al., Site Reliability Engineering. O’Reilly, 2016.
-  ——, Site Reliability Workbook. O’Reilly, 2018.
-  Iosup et al., “On the design of design: The atlarge design process for distributed systems and ecosystems,” 2019, (submitted).
-  Abbott and Fisher, The Art of Scalability. Addison-Wesley, 2015.
-  Erl et al., Cloud Computing Design Patterns. Prentice Hall, 2015.
-  Lidwell et al., Universal Principles of Design, Revised and Updated. Rockport, 2010.
-  Boeijen et al., Delft Design Guide. BIS Publishers, 2014.
-  Martin and Hanington, Universal Methods of Design. Rockport Publishers, 2012.
-  Dorst, Notes on Design. BIS Publishers, 2017.
-  Bass et al., DevOps. Addison-Wesley, 2015.
-  Hennessy and Patterson, Computer Architecture. Morgan Kaufmann, 2017.
-  Gamma et al., Design Patterns. Addison-Wesley, 1994.
-  B. Lawson, How Designers Think. Taylor and Francis, 2004.
-  Alexander et al., A Pattern Language. Oxford University Press, 1977.
-  T. DeMarco et al., Peopleware. Dorset House, 2012, 1st Ed. 1986.
-  Bouwers et al., “Getting what you measure,” Commun. ACM, vol. 55, 2012.
-  Conway, “How do committees invent?” Datamation, vol. 14, 1968.
-  H. A. Simon, The Sciences of the Artificial. MIT Press, 1996.
-  S. Shen et al., “Massivizing Multi-player Online Games on Clouds,” in CCGrid, 2013.
-  Simon, “The structure of ill structured problems,” Artif. Intell., vol. 4, 1973.
-  Rittel and Weber, “Dilemmas in a general theory of planning,” Policy Sciences, vol. 4, 1973.
-  Singh et al., “Jupiter rising: a decade of clos topologies and centralized control in google’s datacenter network,” Commun. ACM, vol. 59, 2016.
-  Corbett et al., “Spanner: Google’s globally distributed database,” ACM Trans. Comput. Syst., vol. 31, 2013.
-  Herbst et al., “Quantifying cloud performance and dependability: Taxonomy, metric design, and emerging challenges,” TOMPECS, vol. 3, 2018.
-  B. Ghit et al., “V for Vicissitude: The Challenge of Scaling Complex Big Data Workflows,” in CCGrid, 2014.
-  Iosup et al., “On the Performance Variability of Production Cloud Services,” in CCGrid, 2011.
-  Ballani et al., “Towards predictable datacenter networks,” in SIGCOMM, 2011.
-  Uta and Obaseki, “A performance study of big data workloads in cloud datacenters with network variability,” in ICPEW, 2018.
-  Tedre, The Science of Computing. CRC Press, 2015.
-  Franks, The Autonomy of Mathematical Knowledge: Hilbert’s Program Revisited. Cambridge University Press, 2009.
-  N. Cross, Design Thinking. Berg, 2011.
-  Rozanski and Woods, Software Systems Architecture, 2005.
-  Zwicky, “Morphological astronomy,” The Observatory, vol. 68, 1948.
-  Wilkinson et al., “The FAIR Guiding Principles for scientific data management and stewardship,” Nature SciData, vol. 3, 2016.
-  Widder et al., “I’m leaving you, travis: a continuous integration breakup story,” in MSR, 2018.
-  P. J. Denning, “The science in computer science,” Commun. ACM, vol. 56, 2013.
-  Altshuller, The Innovation Algorithm. Technical Innovation Center, Inc., 1999.
-  Shah et al., “Metrics for measuring ideation effectiveness,” Design studies, vol. 24, no. 2, pp. 111–134, 2003.
-  Sarkar & Chakrabarti, “Assessing design creativity,” Design studies, vol. 32, no. 4, pp. 348–383, 2011.
-  Pahl et al., Engineering Design. Springer-Verlag, 2007.
-  Bell et al., Computer Engineering. Digital Press, 1978.
-  Gal-Ezer & Harel, “What (else) should CS educators know?” Commun. ACM, vol. 41, no. 9, pp. 77–84, 1998.
-  Tekir, “Reading CS classics,” Commun. ACM, vol. 55, no. 4, pp. 32–34, 2012.
-  Vardi, “Are we having an ethical crisis in computing?” Commun. ACM, vol. 62, no. 1, p. 7, 2019.
-  Bush, “As we may think,” The Atlantic, https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/, Jul 1945.
-  Iosup, “Towards logging and preserving the entire history of distributed systems,” in Is the Future of Preservation Cloudy? (Dagstuhl Seminar 12472), ser. Dagstuhl Reports, Elmroth et al., Eds., vol. 2, no. 11. Dagstuhl, 2012, pp. 126–127. [Online]. Available: https://doi.org/10.4230/DagRep.2.11.102
-  van Eyk et al., “Serverless is More: From PaaS to Present Cloud Computing,” IEEE Internet Computing, vol. 22, 2018.
-  Iosup et al., “Analyzing BitTorrent: Three lessons from one peer-level view,” in ASCI, 2005.
-  ——, “Correlating topology and path characteristics of overlay networks and the internet,” in CCGRIDW, 2006.
-  Wojciechowski et al., “BTWorld: towards observing the global bittorrent file-sharing network,” in HPDC WS, 2010.
-  E. Zhang, Iosup, “The peer-to-peer trace archive: Design and comparative trace analysis,” Delft University of Technology, Tech. Rep. PDS-2010-003, April 2010, archive available at http://p2pta.ewi.tudelft.nl/ (Retrieved Feb 2019.). [Online]. Available: http://pds.twi.tudelft.nl/reports/2010/PDS-2010-003.pdf
-  Zhang et al., “Sampling bias in bittorrent measurements,” in Euro-Par, 2010.
-  ——, “Identifying, analyzing, and modeling flashcrowds in bittorrent,” in P2P, 2011.
-  Hegeman et al., “The BTWorld use case for big data analytics: Description, MapReduce logical workflow, and empirical evaluation,” in Big Data, 2013.
-  Garbacki et al., “2Fast: Collaborative Downloads in P2P Networks,” in P2P, 2006, pp. 23–30.
-  Pouwelse et al., “TRIBLER: a social-based peer-to-peer system,” CCPE, vol. 20, 2008.
-  ——, “The bittorrent P2P file-sharing system: Measurements and analysis,” in IPTPS, 2005.
-  Nae et al., “Efficient management of data center resources for massively multiplayer online games,” in SC, 2008.
-  Guo et al., “An analysis of online match-based games,” in HAVE, 2012.
-  Olteanu et al., “Towards a workload model for online social applications,” in ICPE, 2013.
-  A. Iosup et al., “Analyzing Implicit Social Networks in Multiplayer Online Games,” IEEE Internet Computing, vol. 18, 2014.
-  Jia et al., “When Game Becomes Life: The Creators and Spectators of Online Game Replays and Live Streaming,” TOMCCAP, vol. 12, 2016.
-  Shen et al., “Rtsenv: An experimental environment for real-time strategy games,” in NetGames, 2011.
-  Märtens et al., “Toxicity detection in multiplayer online games,” in NETGAMES, 2015.
-  Iosup, “POGGI: puzzle-based online games on grid infrastructures,” in Euro-Par, 2009.
-  Iosup et al., “CAMEO: enabling social networks for massively multiplayer online games through continuous analytics and cloud computing,” in NetGames, 2010.
-  Nae et al., “A new business model for massively multiplayer online games,” in ICPE, 2011.
-  Shen et al., “Area of simulation: Mechanism and architecture for multi-avatar virtual environments,” TOMCCAP, vol. 12, no. 1, pp. 8:1–24, 2015.
-  Jiang et al., “A mirroring architecture for sophisticated mobile games using computation-offloading,” CCPE, vol. 30, 2018.
-  Guo and Iosup, “The game trace archive,” in NetGames, 2012.
-  van der Sar et al., “Yardstick: A benchmark for Minecraft-like services,” in ICPE, 2019.
-  Iosup et al., “Massivizing computer systems: a vision to understand, design, and engineer computer ecosystems through and beyond modern distributed systems,” CoRR, vol. abs/1802.05465, 2018.
-  ——, “Massivizing online games using cloud computing: A vision,” in ICME WS, 2014.
-  Nae et al., “Dynamic Resource Provisioning in Massively Multiplayer Online Games,” TPDS, vol. 22, 2011.
-  Lee and Chen, “Is server consolidation beneficial to mmorpg? A case study of world of warcraft,” in CLOUD, 2010, pp. 435–442.
-  Asanovic et al., “A view of the parallel computing landscape,” Commun. ACM, vol. 52, no. 10, pp. 56–67, 2009.
-  Armbrust et al., “A view of cloud computing,” Commun. ACM, vol. 53, no. 4, pp. 50–58, 2010.
-  Lu Jia et al., “Socializing by Gaming: Revealing Social Relationships in Multiplayer Online Games,” TKDD, vol. 10, no. 11, 2015.
-  M. Satyanarayanan, “The Emergence of Edge Computing,” IEEE Computer, vol. 50, 2017.
-  G. Ananthanarayanan et al., “Real-time video analytics: The killer app for edge computing,” IEEE Computer, vol. 50, 2017.
-  B. Ghit et al., “Balanced resource allocations across multiple dynamic MapReduce clusters,” in SIGMETRICS, 2014.
-  Uta et al., “Memefs: A network-aware elastic in-memory runtime distributed file system,” Future Generation Comp. Syst., vol. 82, 2018.
-  Klimovic et al., “Pocket: Elastic ephemeral storage for serverless analytics,” in OSDI, 2018.
-  Stuedi et al., “Crail: A high-performance I/O architecture for distributed data processing,” IEEE Data Eng. Bull., vol. 40, 2017.
-  Trivedi et al., “Flashnet: Flash/network stack co-design,” TOS, vol. 14, 2018.
-  A. Iosup et al., “LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms,” PVLDB, vol. 9, 2016.
-  Ngai et al., “Granula: Toward fine-grained performance analysis of large-scale graph processing platforms,” in SIGMOD GRADES, 2017.
-  E. van Eyk et al., “The spec cloud group’s research vision on faas and serverless architectures,” in WoSC at Middleware 2017, 2017.
-  van Eyk et al., “A SPEC RG cloud group’s vision on the performance challenges of faas cloud architectures,” in ICPEW, 2018.
-  ——, “The SPEC-RG Reference Architecture for FaaS: From Microservices and Containers to Serverless Platforms,” submitted, Jan 2019.
-  Klimovic et al., “Understanding ephemeral storage for serverless analytics,” in USENIX ATC, 2018, pp. 789–794.
-  Y. Guo et al., “How Well Do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis,” in IPDPS, 2014, pp. 395–404.
-  Uta et al., “Exploring HPC and big data convergence: A graph processing study on intel knights landing,” in CLUSTER, 2018.
-  Capota et al., “Graphalytics: A big data benchmark for graph-processing platforms,” in SIGMOD GRADES, 2015.
-  Hegeman, “Experimental performance analysis of graph analytics frameworks,” Tech. Rep., 2018.
-  Y. Guo et al., “An Empirical Performance Evaluation of GPU-Enabled Graph-Processing Systems,” in CCGrid, 2015, pp. 423–432.
-  Guo et al., “Design and experimental evaluation of distributed heterogeneous graph-processing systems,” in CCGrid, 2016.
-  Uta et al., “Elasticity in graph analytics? a benchmarking framework for elastic graph processing,” in CLUSTER, 2018.
-  Lumsdaine et al., “Challenges in parallel graph processing,” Parallel Processing Letters, vol. 17, no. 1, 2007.
-  Bonifati et al., “A survey of benchmarks for graph-processing systems,” in Graph Data Management, Fundamental Issues and Recent Developments., 2018.
-  Deng et al., “A periodic portfolio scheduler for scientific computing in the data center,” in JSSPP, 2013.
-  ——, “Exploring portfolio scheduling for long-term execution of scientific workloads in iaas clouds,” in SC, 2013.
-  Shen et al., “Scheduling Jobs in the Cloud Using On-Demand and Reserved Instances,” in Euro-Par, 2013, pp. 242–254.
-  Shai et al., “Heuristics for resource matching in intel’s compute farm,” in JSSPP, 2013.
-  V. van Beek et al., “Self-Expressive Management of Business-Critical Workloads in Virtualized Datacenters,” IEEE Computer, vol. 48, pp. 46–54, 2015.
-  Ma et al., “Ananke: A q-learning-based portfolio scheduler for complex industrial workflows,” in ICAC, 2017.
-  Voinea et al., “POSUM: A portfolio scheduler for MapReduce workloads,” in BigData, 2018.
-  Iosup et al., “The performance of bags-of-tasks in large-scale distributed systems,” in HPDC, 2008.
-  Sonmez et al., “Performance analysis of dynamic workflow scheduling in multicluster grids,” in HPDC, 2010.
-  ——, “Trace-based evaluation of job runtime and queue wait time predictions in grids,” in HPDC, 2009.
-  A. Iosup et al., “Grid Computing Workloads,” IEEE Internet Computing, vol. 15, pp. 19–26, 2011.
-  Feitelson and Naaman, “Self-tuning systems,” IEEE Software, vol. 16, no. 2, 1999.
-  Ilyushkin et al., “An Experimental Performance Evaluation of Autoscaling Policies for Complex Workflows,” in ICPE, 2017, pp. 75–86.
-  A. Ilyushkin et al., “An Experimental Performance Evaluation of Autoscalers for Complex Workflows,” in TOMPECS (Best Paper Nominations from ICPE’17, revised and extended versions), 2018.
-  Versluis et al., “A trace-based performance study of autoscaling workloads of workflows in datacenters,” in CCGRID, 2018.
-  H. E. Bal et al., “A Medium-Scale Distributed System for Computer Science Research: Infrastructure for the Long Term,” IEEE Computer, vol. 49, pp. 54–63, 2016.
-  D. Feitelson, “Experimental computer science: The need for a cultural change,” Technical Report, http://www.cs.huji.ac.il/~feit/exp/, Dec 2006.
-  Ullman, The Mechanical Design Process. David Ullman LLC, 2017.
-  Buchholz, Planning a Computer System - Project Stretch. McGraw-Hill, 1962.
-  Blaauw, “Computer architecture,” Elektron Rechneranal., vol. 14, 1973.
-  Abbott and Fisher, Scalability Rules. Addison-Wesley, 2011.
-  Nygard, Release It! Pragmatic Bookshelf, 2007.
-  Keeling, Design It! Pragmatic Bookshelf, 2017.
-  Bass et al., Software Architecture in Practice. Addison-Wesley, 2012.
-  Alexandrescu et al., Modern C++ Design. Addison-Wesley, 2001.
-  Clements et al., Documenting Software Architectures. Addison-Wesley, 2010.
-  L. Bass, Super-Flexibility for Knowledge Enterprises. Addison-Wesley, 2015.
-  Rowe, Design Thinking. MIT Press, 1991.
-  Rybczynski, How Architecture Works. Farrar, Straus and Giroux, 2013.
-  Aspelund, The Design Process. Fairchild Books, 2014.
-  Meggs and Purvis, Meggs’ History of Graphic Design. Wiley, 2016.
-  Freeman, Behemoth. W. W. Norton & Company, 2018.