ANCHOR: logically-centralized security for Software-Defined Networks

11/09/2017 ∙ by Diego Kreutz, et al. ∙ University of Lisbon

While the logical centralization of functional properties of the network in Software-Defined Networking (SDN) brought advantages such as a faster pace of innovation, it also disrupted some of the natural defenses of traditional architectures against different threats. The literature on SDN has mostly been concerned with the functional side, despite some specific works concerning non-functional properties like 'security' or 'dependability'. Though addressing the latter in an ad-hoc, piecemeal way may work, it will most likely lead to efficiency and effectiveness problems. We claim that the enforcement of non-functional properties as a pillar of SDN robustness calls for a systemic approach. We further advocate, for its materialization, the re-iteration of the successful formula behind SDN: 'logical centralization'. As a general concept, we propose ANCHOR, a subsystem architecture that promotes the logical centralization of non-functional properties. To show the effectiveness of the concept, we focus on 'security' in this paper: we identify the current security gaps in SDNs and we populate the architecture middleware with the appropriate security mechanisms, in a global and consistent manner. ANCHOR sets out to provide essential security mechanisms such as strong entropy, secure device registration and association, among other crucial services. We claim and justify in the paper that centralizing such mechanisms is key for their effectiveness, by allowing us to: define and enforce global policies for those properties; ensure higher levels of robustness for critical services; foster interoperability of the non-functional property enforcement mechanisms; and finally, better foster the resilience of the architecture itself. We discuss design and implementation aspects, and we prove and evaluate our algorithms and mechanisms.




1. Introduction

Software-defined networking (SDN) moves the control functions out of the forwarding devices, logically centralizing the functional properties of the network. This decoupling of the control and data planes leads to greater flexibility and programmability of network control, enabling fast innovation. Deploying network applications as software artefacts that run on a logically centralized controller brings the agility of software evolution to networks, in contrast to the comparatively slow innovation dictated by progress in hardware. Moreover, as the forwarding devices are directly controlled by a centralized entity, control applications can reason on a global network view, enabling improved network operation. In spite of all these benefits, this decoupling, combined with a common southbound API (e.g., OpenFlow), has removed some important natural protections of traditional networks, namely the heterogeneity and diversity of configuration protocols, the closed (and proprietary) nature of the devices, and the distributed nature of the control plane. For instance, an attack on traditional forwarding devices would need to compromise several different protocol interfaces, whereas in SDN much harm can be done by attacking OpenFlow alone. Hence, from a security perspective, SDN introduces new attack vectors and radically changes the threat surface (Kreutz et al., 2013; Scott-Hayward et al., 2016; Dacier et al., 2017a).

So far, the SDN literature has been mostly concerned with functional properties, such as improved routing and traffic engineering (Jain et al., 2013; Alvizu et al., 2017). However, gaps in the enforcement of non-functional properties are critical to the deployment of SDN, especially at infrastructure/enterprise scale. For instance, insecure control plane associations or communications, network information disclosure, spoofing attacks, and hijacking of devices can easily compromise network operation; performance crises can escalate and affect QoS globally; and unavailability or lack of reliability of controllers, forwarding devices, or clock synchronization parameters can considerably degrade network operation (Kloti et al., 2013; Akhunzada et al., 2015; Scott-Hayward et al., 2016).

Addressing these problems in an ad-hoc, piecemeal way may work, but will inevitably lead to efficiency and effectiveness problems. Although several works concerning specific non-functional properties have recently appeared, e.g., in dependability (Botelho et al., 2016; Ros and Ruiz, 2014; Katta et al., 2015; Kreutz et al., 2015; Berde et al., 2014) or security (Porras et al., 2012; Shin et al., 2014, 2013; Scott-Hayward et al., 2016), the enforcement of non-functional properties as a pillar of SDN robustness calls, in our opinion, for a systemic approach. To materialize it, in this paper we argue for a re-iteration of the successful formula behind SDN: ‘logical centralization’.

In fact, the problematic scenarios exemplified above can best be avoided by logically centralizing the system-wide enforcement of non-functional properties, increasing the chances that the whole architecture inherits them in a more balanced and coherent way. The steps to achieve such a goal are to: (a) select the crucial properties to enforce (dependability, security, quality-of-service, etc.); (b) identify the current gaps that stand in the way of achieving such properties in SDNs; (c) design a logically-centralized subsystem architecture and middleware, with hooks to the main SDN architectural components, so that they can inherit the desired properties; and (d) populate the middleware with the appropriate mechanisms and protocols to enforce the desired properties/predicates, across controllers and devices, in a global and consistent manner.

Generally speaking, it is worth emphasizing that centralization has been proposed as a means to address different problems of current networks. For instance, the use of centralized cryptography schemes and centralized sources of trust to authenticate and authorize known entities has been pointed out as a solution for improving the security of Ethernet networks (Kiravuo et al., 2013). Similarly, recent research has suggested network security as a service as a means to provide the required security of enterprise networks (Scott-Hayward et al., 2016). However, centralization has its drawbacks, so let us explain why centralizing the enforcement of non-functional properties brings important gains to software-defined networking. We claim, and justify later in the paper, that it allows us to define and enforce global policies for those properties, reduce the complexity of networking devices, ensure higher levels of robustness for critical services, foster interoperability of the non-functional enforcement mechanisms, and better promote the resilience of the architecture itself.

To achieve these goals, we propose ANCHOR, a subsystem architecture that does not modify the essence of the current SDN architecture with its payload controllers and devices, but rather stands aside, ‘anchors’ (logically-centralizes) crucial functionality and properties, and ‘hooks’ to the latter components, in order to secure the desired properties. The reader will note that this design philosophy concerns any kind of non-functional property. To prove our point, in this paper we have chosen security as our use case and identified at least four gaps that stand in the way of achieving the former goals in current SDN systems: (i) security-performance gap; (ii) complexity-robustness gap; (iii) global security policies gap; and (iv) resilient roots-of-trust gap. The security-performance gap comes from the frequent conflict between mechanisms enforcing those two properties. The complexity-robustness gap represents the conflict between the current complexity of security and cryptography implementations, and the negative impact this has on robustness and hence correctness. The lack of global security policies leads to ad-hoc and discretionary solutions, creating weak spots in architectures. The lack of a resilient root-of-trust burdens controllers and devices with trust enforcement mechanisms that are ad-hoc, have limited reach and are often sub-optimal. We further elaborate in the paper on the reasons behind these gaps, their negative effects in SDN architectures, and how they can possibly be mitigated through a logically-centralized security enforcement architecture. That is, in this particular case study, the architecture middleware is populated with specific functionality whose main aim is to ensure the ‘security’ of control plane associations and of communication amongst controllers and devices.

In addition, in this paper we take first steps in addressing a long-standing problem: a single root-of-trust (like ANCHOR, but also like any other standard trusted third party, e.g., the CAs in X.509 PKI or the KDC in Kerberos) is a single point of failure (SPoF). There is nothing wrong with SPoFs, as long as they fail rarely and/or the consequences of failure can be mitigated, which is unfortunately not the common case. As such, we start by carefully promoting reliability in the design of ANCHOR, endowing its different modules with robust functions, in order to reduce the probability of failure/compromise. Moreover, the proposed architecture only requires symmetric key cryptography. This not only ensures very high performance, but also, with sufficiently long keys, makes the system secure against attacks by a quantum computer. Thus, the system is also post-quantum secure (Bernstein, 2009). Second, we mitigate the consequences of successful attacks by protecting past, pre-compromise communication, and by ensuring the quasi-automatic recovery of ANCHOR after detection, even in the face of total control by an adversary, achieving, respectively, perfect forward secrecy (PFS) and post-compromise security (PCS). Third, since protocol design is notoriously error-prone, we formalise our protocol using a symbolic model and verify its core properties using the Tamarin prover (Meier et al., 2013). Finally, our architecture promotes resilience, or the continued prevention of failure/compromise by automatic means. Though out of the scope of this paper, the resilience of ANCHOR using fault and intrusion tolerance techniques is part of our plans for future work, as we discuss in Section 8.
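As a back-of-the-envelope check on the post-quantum claim, assume the best known quantum attack on a symmetric key is Grover's search, which speeds up exhaustive key search only quadratically. The work factor for a k-bit key then drops from about 2^k to about 2^{k/2} evaluations, so the 256-bit keys listed in Table 1 would still leave roughly 128 bits of security:

```latex
W_{\text{classical}}(k) \approx 2^{k}, \qquad
W_{\text{Grover}}(k) \approx \sqrt{2^{k}} = 2^{k/2}, \qquad
W_{\text{Grover}}(256) \approx 2^{128}.
```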

To summarize, the key contributions of our work include the following:

  1. The concept of logical centralization of SDN non-functional properties provision.

  2. The blueprint of an architectural framework based on middleware composed of a central ‘anchor’, and local ‘hooks’ in controllers and devices, hosting whatever functionality needed to enforce these properties.

  3. A gap analysis concerning barriers in the achievement of non-functional properties in the security domain, as a proof-of-concept case study.

  4. Definition, design and implementation of the mechanisms and algorithms to populate the middleware in order to fill those gaps, and achieve a logically-centralized security architecture that is reliable and efficient.

  5. The enforcement of strong properties such as post-quantum security, perfect forward secrecy, and post-compromise recovery. As we discuss in Section 8, these properties are not ensured by previous work and thus represent a clear advance over the state-of-the-art on SDN security.

  6. A formalisation of the main protocol, and a formal verification of its correctness and core security properties, through symbolic modelling using the Tamarin prover.

  7. Evaluation of the proposed mechanisms.

We show that, compared to the state-of-the-art in SDN security, our solution preserves at least the same security functionality, but achieves higher levels of implementation robustness, by vulnerability reduction, while providing high performance. Whilst we try to prove our point with security, our contribution is generic enough to inspire further research concerning other non-functional properties (such as dependability or quality-of-service). It is also worth emphasizing that the architectural concept that we propose in this paper would require a greater effort to be deployed in traditional networks, due to the heterogeneity of the infrastructure and its vertical integration. This will be made clear throughout the paper.

We have structured the paper as follows. Section 2 gives the rationale and presents the generic logically-centralized architecture for the system-wide enforcement of non-functional properties, and explains its benefits and limitations. In Section 3, we discuss the challenges and requirements brought by the current gaps in security-related non-functional properties. Section 4 describes the logically-centralized security architecture that we propose, along with its mechanisms and algorithms. The main algorithms are co-designed with a formal model, and the formal verification of their security properties is presented in Section 5. Then, in Sections 6 and 7, we discuss design and implementation aspects of the architecture, and present evaluation results. In Sections 8 and 9, we give a brief overview of related work, discuss some challenges and justify some design options of our architecture. Finally, in Section 10, we conclude.

2. The ANCHOR Architecture

In this section we introduce ANCHOR, a general architecture for logically-centralized enforcement of non-functional properties – such as ‘security’, ‘dependability’, or ‘quality-of-service’ (Figure 1) – in SDN. The logical centralization of the provision of non-functional properties allows us to: (1) define and enforce global policies for those properties; (2) reduce the complexity of controllers and forwarding devices; (3) ensure higher levels of robustness for critical services; (4) foster interoperability of the non-functional property enforcement mechanisms; and finally (5) better promote the resilience of the architecture itself. Let us explain the rationale for these claims.

Define and enforce global policies for non-functional properties. One can enforce non-functional properties through piece-wise, partial policies. However, as SDN architectures attest with respect to functional properties, it is easier and less error-prone to enforce, e.g., security policies from a central trust point in a globally consistent way, especially when policies change during the system's lifetime.

Reduce the complexity of controllers and forwarding devices. One of the ideas of SDN was exactly to simplify the construction of devices, by stripping them of functionality, centralized on controllers. We are extending the scope of the concept, by relieving both controllers and devices from ad-hoc and redundant implementations of mechanisms that are bound to have a critical impact on the whole network.

Ensure higher levels of robustness for critical services. Enforcing non-functional properties like dependability or security has a critical scope, as it potentially affects the entire network. Unfortunately, the robustness of devices and controllers is still a concern, as they are becoming rather complex, which leads to several critical vulnerabilities, as amply exemplified in (Scott-Hayward et al., 2016). A centralized concept as we advocate might considerably improve on the situation, exactly because the enforcement of non-functional properties would be achieved through a specialized (carefully designed and verified) subsystem, minimally interfering with the SDN payload architecture.

Foster interoperability of the non-functional property enforcement mechanisms. Different controllers require different configurations today, and a potential lack of interoperability in terms of non-functional properties arises. Having global policies and mechanisms for non-functional property enforcement also creates an easier path to foster controller and device interoperability (e.g., East and Westbound APIs).

Better promote the resilience of the architecture itself. Having a specialized subsystem architecture already helps for a start, since for example, its operation is not affected by latency and throughput fluctuations of the (payload) control platforms themselves. However, the considerable advantage of both the decoupling and the centralization is that it becomes straightforward to design in security and dependability measures for the architecture itself, such as advanced techniques and mechanisms to tolerate faults and intrusions (and in essence overcome the main disadvantage of centralization, the potential single-point-of-failure risk).

Figure 1. ANCHOR general architecture

The outline of our architecture is depicted in Figure 1. The “logically-centralized” perspective of non-functional property enforcement is materialized through a subsystem architecture relying on an anchor of trust, a middleware whose main aim is to ensure that certain properties – e.g., the security of control plane associations and of communication amongst controllers and devices – are met throughout the architecture.

In a manner similar to traditional security services, such as Kerberos and RADIUS, ANCHOR is a set of services for the SDN architecture. It ‘anchors’ crucial functionality and properties, and ‘hooks’ to the SDN payload components (controllers and devices), in order to secure the desired properties. So, on the devices, we just need the local counterparts of the ANCHOR middleware mechanisms and protocols, or hooks, to interpret and follow ANCHOR’s instructions. In contrast to traditional services, however, ANCHOR targets SDN infrastructures; its advantage over existing systems is partly due to its specificity to this domain.

Having made the case for logically-centralized non-functional property enforcement in SDN, and presented the outline of our general architecture, in the next two sections we introduce the use case we elected to show in this paper, i.e., logically-centralized security. We start with a gap analysis that establishes the requirements for the architecture functionality in Section 3, and then, in Section 4, we show how to populate ANCHOR with the necessary mechanisms and protocols to meet those requirements.

3. Challenges and requirements for security

3.1. Security performance

The security-performance gap comes from the conflict between ensuring high performance and using secure primitives. This gap directly affects control plane communication, which is the crucial link between controllers and forwarding devices, allowing remote configuration of the data plane at runtime. Control channels need to provide high performance (high throughput and low latency) while keeping the communication secure.

The latency experienced by control plane communication is particularly critical for SDN operation. Increased latency is a problem per se, in terms of reduced responsiveness, but it may also limit control plane scalability, which can be particularly problematic in large datacenters (Benson et al., 2010a). Most existing commercial switches already have low control plane performance with TCP (e.g., a few hundred flows/s; see Kreutz et al., 2015, Section V.A). Adding security worsens the problem: previous works have demonstrated that the use of cryptographic primitives has a perceivable impact on the latency of sensitive communication, such as VoIP (Shen et al., 2012) (e.g., TLS incurs 166% additional CPU cycles compared to TCP), network operations protocols such as SNMP (Schonwalder and Marinov, 2011) and NTP (Dowling et al., 2016), OpenFlow-enabled networks (Kreutz et al., 2017, 2018), and HTTPS connections (Naylor et al., 2014). Perhaps not surprisingly, the number of SDN controllers and switching hardware supporting TLS (the protocol recommended by ONF to secure control plane communication (ONF, 2014, 2015)) is still low (Abdullaziz et al., 2016; Scott-Hayward et al., 2016). Recent research has indeed suggested that the security-performance trade-off may be one of the reasons for this slow adoption (Kreutz et al., 2017).

Ideally, we would have both security robustness and performance on control plane channels. Considering the current state of SDN, it therefore seems clear that there is a need to investigate lightweight alternatives for securing control plane communication. In the context of the security-performance gap, some directions that we point to in our architectural proposal ahead are, for instance, the careful selection of cryptographic primitives (Kreutz et al., 2017), and the adoption of cryptographic libraries exhibiting a good performance-security tradeoff, such as NaCl (Bernstein et al., 2012), or of mechanisms allowing per-message one-time-key distribution, such as iDVV (Kreutz et al., 2017, 2018). We return to these mechanisms later.
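To make the idea of per-message one-time keys concrete, the following is a minimal sketch, assuming a 256-bit secret is already shared between a controller and a switch. It derives a fresh MAC key for every message from the shared secret and a message counter using HMAC-SHA512. It illustrates the general pattern only and is not the iDVV construction of Kreutz et al. (2017); the function names `one_time_key`, `protect` and `verify` are hypothetical.

```python
import hashlib
import hmac
import secrets


def one_time_key(shared_secret: bytes, counter: int) -> bytes:
    """Derive a 256-bit key that is used for exactly one message."""
    info = b"per-message-key" + counter.to_bytes(8, "big")
    return hmac.new(shared_secret, info, hashlib.sha512).digest()[:32]


def protect(shared_secret: bytes, counter: int, payload: bytes) -> bytes:
    """Compute an HMAC-SHA512 tag over counter || payload with the one-time key."""
    key = one_time_key(shared_secret, counter)
    return hmac.new(key, counter.to_bytes(8, "big") + payload, hashlib.sha512).digest()


def verify(shared_secret: bytes, counter: int, payload: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(protect(shared_secret, counter, payload), tag)


if __name__ == "__main__":
    secret = secrets.token_bytes(32)             # assumed to be pre-shared
    msg = b"OFPT_FLOW_MOD example payload"
    tag = protect(secret, 1, msg)
    print(verify(secret, 1, msg, tag))           # True
    print(verify(secret, 2, msg, tag))           # False: counter 2 has its own key
```

Deriving keys with a keyed hash keeps the sketch within symmetric primitives only; a real receiver would additionally have to reject counters it has already accepted, to prevent replays.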

3.2. Complexity robustness

The complexity-robustness gap represents the conflict between the current complexity of cryptography-based security and of system implementations, and the negative impact this has on robustness and hence on correctness, ultimately hindering security itself.

In the past few years, several studies have recurrently shown critical misuse of the cryptographic APIs of different TLS implementations (Egele et al., 2013; Buhov et al., 2015; Razaghpanah et al., 2017). One of the main root causes of these problems is the inherent complexity of traditional solutions and the knowledge required to use them without compromising security. For instance, more than 80% of Android mobile applications make at least one mistake related to cryptographic APIs. Recent studies have also found different vulnerabilities in TLS implementations and have shown that long-standing implementations, such as OpenSSL¹, including its extensive cryptography, are unlikely to be completely verified in the near future (Beurdouche et al., 2015; Fan et al., 2016). To address this issue, a few projects, such as miTLS (Bhargavan et al., 2013) and Everest (Bhargavan et al., 2017), propose new, verified implementations of TLS. However, several challenges remain to be addressed before such solutions are ready for wide use (Bhargavan et al., 2017).

¹ OpenSSL suffers from different fundamental issues, such as too many legacy features accumulated over time, too many alternative modes resulting from trade-offs made during standardization, and too much focus on the web and DNS names.

While the problem persists, the number of security incidents remains non-negligible. Recent examples include vulnerabilities that allow recovering OpenSSL secret keys at low cost (Yarom and Benger, 2014), and timing attacks that exploit vulnerabilities in both PolarSSL and OpenSSL (Arnaud and Fouque, 2013; Brumley and Tuveri, 2011). In addition, failures in classical PKI-based authentication and authorisation subsystems have been happening persistently (Cromwell, 2017; PwC, CSO magazine and CERT/CMU, 2014; Hill, 2013), with the sheer complexity of those systems being considered one of the root causes behind these problems.

Similarly to the cryptographic APIs example, the leading cause of most security issues in systems – and this includes SDN controllers, operating systems, hypervisors, etc. – is the inherent complexity of their implementation and the amount of aggregated functions and services, resulting in challenging ecosystems in terms of security (Dacier et al., 2017b; Klein et al., 2009; Steinberg and Kauer, 2010; Ponemon Institute Research, 2018; Arbettu et al., 2016; Yoon et al., 2017; Secci et al., 2017; Lee et al., 2017; Singaravelu et al., 2006; Ho et al., 2003). It is recognized by the community that complexity reduction (e.g., by means of isolation, modularization, reduced and verifiable code bases, loosely coupled and well-defined micro-services) plays a vital role in ensuring the security of systems.

Considering the widely acknowledged principle that simplicity is key to robustness, especially for secure systems, we advocate, and try to demonstrate in this paper, that the complexity-robustness gap can be significantly reduced through less complex but equally secure alternative solutions. NaCl (Bernstein et al., 2012), which we mentioned in the previous section, can again serve as an example in this context: it is one of the first attempts to provide a less complex, efficient, yet secure alternative to OpenSSL-like implementations. Mechanisms simplifying key distribution, authentication and authorization, such as iDVVs (Kreutz et al., 2017), could help mitigate PKIs’ problems. Furthermore, simple and efficient protocols for ensuring the secure registration and association of devices are two other examples of reduced complexity when compared to traditional solutions (e.g., PKI/X.509 and TLS). By following this direction, we are applying the same principle of vulnerability reduction used in other systems, such as unikernels, where the idea is to reduce the attack surface by generating a smaller overall footprint of the operating system and applications (Williams and Koller, 2016).

3.3. Global security policies

The impact of the lack of global security policies can be illustrated with different examples. Although ONF describes data authenticity, confidentiality, integrity, and freshness as fundamental requirements to ensure the security of control plane communication in SDN, it does so in an abstract way, and these measures are often ignored, or implemented in an ad-hoc manner (Scott-Hayward et al., 2016). Another example is the lack of strong authentication and authorisation in the control plane. Recent reports show that widely used controllers, such as Floodlight and OpenDaylight, employ weak network authentication mechanisms (Wan et al., 2017; Scott-Hayward et al., 2016; Secci et al., 2017; Lee et al., 2017). This leads to any forwarding device being able to connect to any controller.

From a security perspective, it is non-controversial that device identification, authentication and authorization should be among the foremost requirements of any network. All data plane devices should be appropriately registered and authenticated within the network domain, with each association request between any two devices (e.g., between a switch and a controller) being strictly authorized by a security policy enforcement point. In addition, control traffic should be secured, since it is the fundamental vehicle for network control programmability. This raises the question: why are these mechanisms not employed in most deployments?
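The policy enforcement point described above can be sketched, in a minimal form, as a default-deny check, assuming a simple model where an association is allowed only if both endpoints are registered and the specific (device, controller) pair has been explicitly permitted. The class `PolicyEnforcementPoint` and its methods are hypothetical and only illustrate the policy logic, not how devices are authenticated.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PolicyEnforcementPoint:
    """Toy default-deny policy: registered endpoints plus whitelisted pairs."""
    registered: set[str] = field(default_factory=set)
    allowed_pairs: set[tuple[str, str]] = field(default_factory=set)

    def register(self, entity_id: str) -> None:
        self.registered.add(entity_id)

    def allow(self, device_id: str, controller_id: str) -> None:
        self.allowed_pairs.add((device_id, controller_id))

    def authorize_association(self, device_id: str, controller_id: str) -> bool:
        # Both endpoints must be registered AND the pair must be whitelisted;
        # anything not explicitly permitted is denied.
        return (device_id in self.registered
                and controller_id in self.registered
                and (device_id, controller_id) in self.allowed_pairs)


if __name__ == "__main__":
    pep = PolicyEnforcementPoint()
    pep.register("sw1")
    pep.register("ctrl1")
    pep.allow("sw1", "ctrl1")
    print(pep.authorize_association("sw1", "ctrl1"))   # True
    print(pep.authorize_association("sw2", "ctrl1"))   # False: sw2 not registered
```

The default-deny choice is the point of contrast with the weak authentication reported above, where any forwarding device can connect to any controller.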

A reason for the current state of affairs is the lack of awareness, guidance, and enforcement policies. It is therefore becoming crucial to define and establish global policies, and design, or adopt, the mechanisms needed to enforce them and meet the essential requirements (e.g., secure authentication and trustworthy authorization), to fill the policy gap.

3.4. Resilient roots-of-trust

A globally recognized, resilient root-of-trust could dramatically improve the global security of SDN, since current approaches to achieving trust are ad-hoc and partial (Abdullaziz et al., 2016). Closing this gap would certainly help foster global mechanisms to ensure trustworthy registration and association between devices, as discussed before, but the benefits would go beyond that. For instance, a root-of-trust can be used to provide fundamental mechanisms (e.g., sources of strong entropy or pseudo-random generators (PRGs)), which would serve as building blocks for specific security functions.

As a first example, modern cryptography relies heavily on strong keys and the ability to keep them secret. The core feature that defines a strong key is its randomness. However, the randomness of keys is still a widely neglected issue (Vassilev and Hall, 2014) and, not surprisingly, weak entropy, and weak random number generation have been the cause of several significant vulnerabilities (Kim et al., 2013). Even long-standing cryptographic libraries such as OpenSSL have been recurrently affected by this problem (Kim et al., 2013;, 2016). Importantly, recent research has shown that this problem also affects networking equipment (Heninger et al., 2012; Albrecht et al., 2015; Hastings et al., 2016). For instance, a common pattern found in low-resource devices, such as switches, is that the random number generator of the operating system may lack the input of external sources of entropy to generate reliable cryptographic keys.
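A minimal sketch of how a low-resource device might avoid relying solely on its own, possibly starved, random number generator, assuming it receives an entropy contribution from a trusted remote source over an already authenticated and confidential channel. Local and remote inputs are combined through SHA-512, so, under the usual assumption that SHA-512 behaves like a random function, the resulting seed stays unpredictable as long as at least one input is. The function name `mixed_seed` and the domain label are hypothetical.

```python
from __future__ import annotations

import hashlib
import os


def mixed_seed(remote_entropy: bytes, local_entropy: bytes | None = None) -> bytes:
    """Combine local and remote entropy into a 256-bit seed."""
    if local_entropy is None:
        local_entropy = os.urandom(32)           # possibly weak on starved devices
    h = hashlib.sha512()
    # Length-prefix every input so that different splits of the same bytes
    # cannot produce the same hash input.
    for part in (b"seed-mix-v1", local_entropy, remote_entropy):
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()[:32]


if __name__ == "__main__":
    remote = os.urandom(32)                      # stands in for the remote contribution
    seed = mixed_seed(remote)
    print(len(seed))                             # 32
```

Mixing rather than replacing the local input means a compromised or unavailable remote source degrades the seed only to the quality of the local entropy, not below it.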

Similarly, as a second example, sources of accurate time, such as the local clock and the network time protocol, have to be secured to avoid attacks that can compromise network operation, since time manipulation attacks (e.g., NTP attacks (Malhotra et al., 2015; Stenn, 2015)) can affect the operation of controllers and applications. For instance, a controller can be tricked into disconnecting forwarding devices if it wrongly perceives that heartbeat message timeouts have expired.
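The heartbeat example can be illustrated with a small sketch, assuming a controller that records the last heartbeat of each switch. If the timeout is computed from a settable wall clock, an attacker who shifts that clock (e.g., through a forged NTP response) can make live switches look expired. Measuring elapsed time with `time.monotonic`, which is not stepped when the system time is changed, removes that lever for local timeout checks; it is not a substitute for a secure global time service. The classes `HeartbeatMonitor` and `FakeClock` are hypothetical, and the injectable `clock` parameter exists only so the attack can be simulated.

```python
import time
from typing import Callable, Dict


class HeartbeatMonitor:
    """Tracks switch heartbeats and reports which ones have timed out."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock                       # time source used for all checks
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, switch_id: str) -> None:
        self.last_seen[switch_id] = self.clock()

    def expired(self, switch_id: str) -> bool:
        last = self.last_seen.get(switch_id)
        # Unknown switches are treated as expired.
        return last is None or self.clock() - last > self.timeout_s


class FakeClock:
    """Settable wall clock standing in for an NTP-disciplined system clock."""

    def __init__(self, start: float) -> None:
        self.t = start

    def __call__(self) -> float:
        return self.t


if __name__ == "__main__":
    wall = FakeClock(1_000.0)
    victim = HeartbeatMonitor(timeout_s=15.0, clock=wall)
    victim.heartbeat("sw1")
    wall.t += 3_600                              # attacker shifts the clock by one hour
    print(victim.expired("sw1"))                 # True: a live switch looks dead

    robust = HeartbeatMonitor(timeout_s=15.0)    # uses time.monotonic
    robust.heartbeat("sw1")
    print(robust.expired("sw1"))                 # False
```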

It is worth emphasizing that the resilient roots-of-trust gap lies precisely in the limited trust that can be placed in partial, ad-hoc implementations of critical functions by controller developers and device manufacturers, in contrast to a careful, once-and-for-all architectural approach that can be reinstantiated in different SDN deployments. Without claiming the list is exhaustive, we argue that strong sources of entropy, resilient, indistinguishable-from-random number generators, and accurate, non-forgeable global time services are fitting examples of such critical functions to be provided by logically-centralized roots-of-trust, helping close this gap.

4. Logically-centralized security

In this section we introduce the specialization of the ANCHOR architecture for logically-centralized enforcement of security properties (Figure 2), guided by the conclusions of the previous section. Our main goal is to provide security properties such as authenticity, integrity, and confidentiality for control plane communication. To achieve this goal, ANCHOR provides the mechanisms (e.g., registration, authentication, a source of strong entropy, a PRG) required to fulfill some of the major security requirements of SDNs.

As illustrated in Figure 2, we “anchor” the enforcement of security properties on ANCHOR, which provides all the necessary mechanisms and protocols to achieve the goal. It is also a central point for enforcing security policies by means of services such as device registration, device association, controller recommendation, or global time, thereby reducing the burden on controllers and forwarding devices, which just need the local hooks, i.e., protocol elements that interpret and follow ANCHOR’s instructions.
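To illustrate what a hook might look like, here is a minimal sketch, assuming each device shares a MAC key with ANCHOR from registration and that ANCHOR numbers its instructions with a strictly increasing counter. The hook follows an instruction only if its HMAC-SHA512 tag verifies and its counter is fresh, which rejects both forged and replayed instructions. The JSON message layout, the `Hook` class and the `anchor_sign` helper are hypothetical and do not reflect the paper's wire format.

```python
from __future__ import annotations

import hashlib
import hmac
import json


def anchor_sign(mac_key: bytes, counter: int, instruction: dict) -> dict:
    """Hypothetical ANCHOR side: attach a counter and an HMAC-SHA512 tag."""
    body = json.dumps({"ctr": counter, "ins": instruction}, sort_keys=True).encode()
    tag = hmac.new(mac_key, body, hashlib.sha512).hexdigest()
    return {"body": body.decode(), "tag": tag}


class Hook:
    """Device-side hook that only follows authentic, fresh ANCHOR instructions."""

    def __init__(self, mac_key: bytes):
        self.mac_key = mac_key
        self.last_ctr = -1

    def accept(self, msg: dict) -> dict | None:
        body = msg["body"].encode()
        expected = hmac.new(self.mac_key, body, hashlib.sha512).hexdigest()
        if not hmac.compare_digest(expected.encode(), msg["tag"].encode()):
            return None                          # forged or corrupted
        payload = json.loads(body)
        if payload["ctr"] <= self.last_ctr:
            return None                          # replayed or reordered
        self.last_ctr = payload["ctr"]
        return payload["ins"]


if __name__ == "__main__":
    key = b"k" * 32                              # assumed registration key
    hook = Hook(key)
    msg = anchor_sign(key, 1, {"op": "associate", "controller": "c1"})
    print(hook.accept(msg))                      # the instruction dict
    print(hook.accept(msg))                      # None: replay rejected
```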

Figure 2. Logically-centralized Security

Next, we review the components and essential security services provided by ANCHOR. We first illustrate, in Section 4.1, how we implement our strategy of improving the robustness of ANCHOR as a single root-of-trust, by hardening it in the face of failures. For example, to mitigate possible (though expectedly infrequent) security failures, we provide countermeasures such as Perfect Forward Secrecy (PFS) and Post-Compromise Security (PCS), protecting pre- and post-compromise communications in the presence of successful attacks. Next, we propose a source of strong entropy (Section 4.2) and a resilient pseudo-random generator (Section 4.3) for generating security-sensitive material. These are crucial components, as attested by the impact of vulnerabilities discovered in the recent past in sub-optimal implementations of such components in several software packages (Bernstein et al., 2016; Mimoso, 2016; Zetter, 2015; Schneier, 2012). We implement and evaluate the robustness of these mechanisms. We also leverage a recently proposed mechanism, the integrated device verification value (iDVV), to simplify authentication, authorization, and key generation amongst SDN components (Kreutz et al., 2017), which we review and put in the context of ANCHOR (Section 4.4). Namely, the iDVV protocol runs between ANCHOR and the hooks in controllers and switching devices. We implement and evaluate iDVV generators for OpenFlow-enabled control plane communication. After defining the system roles and setup in Section 4.5, we present two essential services for secure network operation, device registration (Section 4.6) and device association (Section 4.7), and we describe how the above mechanisms interact in our secure device-to-device communication approach (Section 4.9). The list of ANCHOR services is certainly not closed: one can think of other functionalities, such as tracking forwarding device associations, generating alerts in case of anomalous behaviour (e.g., recurrent reconnections), and so forth.

In what follows, we describe the main anchor services in detail. To help the reader follow our descriptions, Table 1 summarizes the most important notation.

Notation | Description | Example
H | Cryptographic hash function | SHA512
MAC | Message Authentication Code algorithm | Poly1305
X | One entity belonging to {A, D, M, C, F} | Device (e.g., switch)
$K^{e}_{XY}$ | Encryption secret key used between entities X and Y | 256-bit random key
$K^{m}_{XY}$ | MAC/HMAC secret key used between entities X and Y | 256-bit random key
$\{M\}_{K}$ | Encryption of message M using secret key K | AES
$[M]_{K}$ | Keyed-hash MAC of message M using secret key K | HMAC-SHA512
KDF | Key Derivation Function | OpenSSL PBKDF2

Table 1. Summary of notations

4.1. Hardening anchor

The compromise of a root-of-trust is of great concern, since crucial services normally depend on it being secure and dependable. As stated before, we have a long-term strategy towards the resilience of anchor. In the context of this paper, it starts by improving the inherent reliability of its simplex (non-replicated) version, hardening it in the face of failures. For instance, unlike traditional security services such as Kerberos and RADIUS, we still provide some security guarantees even after anchor has been compromised. In particular, we propose protocols that achieve two security properties, guaranteeing respectively the security of past (pre-compromise) communications and of future (post-recovery) communications. This is a significant improvement over other existing root-of-trust infrastructures.

The first security property is perfect forward secrecy (PFS), namely, the assurance that the compromise of all secrets in the current session does not compromise the confidentiality of communications in past sessions. PFS is enforced systematically in the algorithms we present next.

The second property is post-compromise security (PCS). While PFS protects past communications, PCS concerns how to automatically reinstate and re-establish secure communication channels for future communications. So far, this property has been considered only in the specific scenario of secure messaging (Yu and Ryan, 2015), and only a few works (Yu et al., 2017a, b) are available. Specifically, we require that when anchor has been compromised by an attacker (e.g., through the exploitation of software vulnerabilities) and then reinstated by the operator (e.g., by applying software patches and rebuilding servers), the system should have a way to automatically re-establish secure communications between anchor and all other participants, without having to reinstate these components (in this case, controllers and forwarding devices, whose shared secrets became compromised). In Section 4.10 we explain how to re-establish secure communication channels in a semi-automatic way after a complete failure of anchor.

In summary, even though anchor is a single root-of-trust in our system, we mitigate the associated risks by guaranteeing:

  • PFS: the compromise of anchor in the current session does not expose past communications;

  • PCS: when anchor is compromised and reinstated, anchor can automatically re-establish secure communication channels with all other participants in the system to protect the security of future communications.

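As a side note, since our system uses only symmetric-key cryptography, it does not rely on the hardness assumptions (e.g., factoring or discrete logarithms) that quantum computers are known to break; the best known generic quantum attack on symmetric keys, Grover's search, roughly halves the effective key length, which 256-bit keys tolerate. In this sense, our infrastructure is post-quantum secure (PQS).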

4.2. A source of strong entropy

Entropy still represents a challenge for modern computers because they have been designed to behave deterministically (Vassilev and Hall, 2014). Sources of true randomness (e.g., physical phenomena such as atmospheric noise) can be difficult to use because they work differently from a typical computer.

To avoid the pitfalls of weak sources of entropy, in particular in networking devices, anchor provides a source of strong entropy to ensure the randomness required to generate seeds, pseudorandom values, secrets, among other cryptographic material. The strong source of entropy has the following property:

Strong Entropy - Every value entropy returned by entropy_get is indistinguishable-from-random.

Algorithm 1 shows how the external (from other devices) and internal (from the local operating system) sources of entropy are kept updated and used to generate random bytes on each call to entropy_get(). The state of the internal and external entropy is initialized by calling entropy_setup(data). This function takes input data, which can combine the current system time, the process number, and bytes from special devices, among other things, and mixes it with random bytes from a local (deterministic) source of entropy (rand_bytes(), e.g., /dev/urandom) to initialize the state of the entropy generator. As we cannot assume anything regarding the predictability of the input data, we use it in conjunction with a rand_bytes() call (line 2). By default, a call to rand_bytes() is assumed to return 64 bytes of random data.

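1: function entropy_setup(data)
2:   e_entropy ← rand_bytes() ⊕ H(data)
3:   i_entropy ← rand_bytes() ⊕ e_entropy
4: function entropy_update()
5:   e_entropy ← H(P_i ∥ P_j) ⊕ i_entropy
6:   E_counter ← 0
7: function entropy_get()
8:   if E_counter >= MAX_LONG call entropy_update()
9:   i_entropy ← H(rand_bytes() ∥ E_counter)
10:  entropy ← e_entropy ⊕ i_entropy
11:  return entropy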
Algorithm 1 Source of strong entropy

Function entropy_update() uses the statistics of external sources and anchor's own packet arrival rate as input to update the external entropy. The noise (events) of the external sources of entropy is stored in 32 pools (P_0, P_1, P_2, …, P_31), as suggested by previous work (Ferguson et al., 2011). Each pool has an event counter, which is reset to zero once the pool is used to update the external entropy. At every update, two different pools of noise (P_i and P_j), which can for instance be randomly selected, are used as input to the hash function H. The output of this function is XORed with the internal entropy to generate the new state of the external entropy. Note that entropy_update() is called automatically when E_counter (the global event counter) reaches its maximum value, and can also be called whenever needed, i.e., the user can decide when to call it.

The resulting 64 bytes of indistinguishable-from-random entropy returned by entropy_get() are the outcome of an XOR operation between the external and internal entropy. While the external entropy provides the unpredictability required by strong entropy, the internal source provides a good, yet predictable (Vassilev and Hall, 2014), continuous source of entropy. Each time entropy_get() is called, the internal entropy is updated using a local source of random data, typically provided by a library or by the operating system itself, and the global number of events currently in the 32 pools of noise (E_counter). These two values are used as input to the hash function H.
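A minimal Python sketch of the data flow of Algorithm 1 follows, under stated assumptions: H is SHA-512 (as in Table 1), rand_bytes() is the operating system's CSPRNG via os.urandom, each noise pool is a running SHA-512 hash of the events fed into it, and both the add_event() helper and the MAX_LONG value are hypothetical. It illustrates the structure of the algorithm, not anchor's implementation (Section 6.1).

```python
import hashlib
import os
import secrets

NUM_POOLS = 32
MAX_LONG = 2**63 - 1  # assumed threshold that forces entropy_update()


def H(*parts: bytes) -> bytes:
    """SHA-512 over the concatenation (the ∥ operator) of its inputs."""
    return hashlib.sha512(b"".join(parts)).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def rand_bytes(n: int = 64) -> bytes:
    """Stand-in for rand_bytes(): the OS CSPRNG (/dev/urandom on Linux)."""
    return os.urandom(n)


class EntropySource:
    def __init__(self, data: bytes) -> None:
        self.pools = [b""] * NUM_POOLS      # each pool: running hash of its events
        self.pool_events = [0] * NUM_POOLS  # per-pool event counters
        self.e_counter = 0                  # global event counter E_counter
        # entropy_setup(data), lines 2-3
        self.e_entropy = xor(rand_bytes(), H(data))
        self.i_entropy = xor(rand_bytes(), self.e_entropy)

    def add_event(self, pool: int, event: bytes) -> None:
        """Hypothetical helper: absorb one external noise event into a pool."""
        self.pools[pool] = H(self.pools[pool], event)
        self.pool_events[pool] += 1
        self.e_counter += 1

    def entropy_update(self) -> None:
        # Lines 5-6: hash two distinct, randomly chosen pools, XOR with i_entropy.
        i, j = secrets.SystemRandom().sample(range(NUM_POOLS), 2)
        self.e_entropy = xor(H(self.pools[i], self.pools[j]), self.i_entropy)
        self.pool_events[i] = self.pool_events[j] = 0
        self.e_counter = 0

    def entropy_get(self) -> bytes:
        # Lines 8-10: refresh i_entropy, then combine it with e_entropy.
        if self.e_counter >= MAX_LONG:
            self.entropy_update()
        self.i_entropy = H(rand_bytes(), self.e_counter.to_bytes(8, "big"))
        return xor(self.e_entropy, self.i_entropy)
```

In a deployment, add_event() would be fed with observations such as packet inter-arrival timings at anchor; in this sketch the caller supplies arbitrary bytes.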

Such sources of strong entropy can be achieved in practice by combining different sources of noise, such as the unpredictability of network traffic (Greenberg et al., 2009), the unpredictability of idleness of links (Benson et al., 2010c), packet arrival rate of network controllers, and sources of entropy provided by operating systems. We provide implementation details in Section 6.1. A discussion about the correctness of Algorithm 1 can be found in Appendix A.

4.3. Pseudorandom generator (PRG)

A source of entropy is necessary but not sufficient. Most cryptographic algorithms are highly vulnerable to weaknesses in random generators (Dodis et al., 2013). For instance, nonces generated with weak pseudo-random generators can enable attacks that recover secret keys. Several security properties need to be ensured when building strong pseudo-random number generators (PRGs), such as resilience, forward security, backward security, and recovering security. The latter, recently proposed by Dodis et al. (2013), requires a PRG to regain security after its internal state has been compromised, once sufficient fresh entropy has been injected. We propose a PRG that uses our source of strong entropy and implements a refresh function to increase its resilience and recovery capability. The pseudo-random number generator has the following property:

Robust PRG - Every value nprd returned by the function PRG_next is indistinguishable-from-random.

A robust PRG needs three well-defined constructions, namely setup(), refresh() (or re-seed), and next(), as described in Algorithm 2. The internal state of our PRG is represented by three variables: the seed, the counter, and the next pseudo-random data nprd. The setup process generates a new seed using our strong source of entropy, which is used to initialize the internal state. In line 3, we initialize the counter from entropy_get(), converted by long_uint() into a long unsigned int value that will be used to re-seed and to generate the next pseudo-random value. In line 4, we call entropy_update() to make sure that the external entropy gets updated before calling entropy_get() once more. The first nprd is the outcome of an XOR operation between the newly generated seed and a further call to our source of entropy. Note that setting up the initial state of the PRG requires no intervention or interaction with the end user: we use strong and reliable entropy to set the initial values of all three variables. This makes our PRG insensitive to the initial state. In a traditional PRG, by contrast, the user could provide an initial seed, or other setup values, that compromise the quality of the generator's output. The counter, which is concatenated with nprd (lines 9 and 13), emulates an unbounded state space (Stark, 2017). This is possible because we use cryptographically strong primitives, namely the hash function H and the MAC function HMAC, which accept inputs of arbitrary length; thus, in theory, we can keep concatenating values to their input, yielding unbounded state spaces.

1: function PRG_setup()
2:   seed ← entropy_get()
3:   counter ← long_uint(entropy_get())
4:   call entropy_update()
5:   nprd ← seed ⊕ entropy_get()
6: function PRG_refresh()
7:   seed ← entropy_get()
8:   counter ← long_uint(entropy_get())
9:   nprd ← H(seed ∥ nprd ∥ counter)
10: function PRG_next()
11:  counter ← counter - 1
12:  if counter <= 0 call PRG_refresh()
13:  nprd ← HMAC(seed, nprd ∥ counter)
14:  return nprd
Algorithm 2 Pseudo-random number generator

The PRG_refresh() function updates the internal state, i.e., the seed, the counter, and nprd, using H to update nprd. Finally, the PRG_next() function outputs a new, indistinguishable-from-random stream of bytes by applying HMAC to the internal state. In this function, the counter is decremented by one. The idea is for the counter to start at a very large unsigned 8-byte value and to be used until it reaches zero, at which point PRG_refresh() is called to update the internal state of the generator. The newly generated nprd is the output of an HMAC function, with a size of 128 bits.
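The three constructions of Algorithm 2 can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: entropy_get() and entropy_update() are replaced by simple stand-ins (os.urandom and a no-op), H is SHA-512, and HMAC is HMAC-SHA512 (Table 1), so each output here is 64 bytes; a caller wanting the 128-bit values mentioned above could truncate to 16 bytes.

```python
import hashlib
import hmac
import os


def entropy_get() -> bytes:
    """Stand-in for anchor's entropy_get(): 64 bytes from the OS (assumption)."""
    return os.urandom(64)


def entropy_update() -> None:
    """Stand-in for entropy_update(); a no-op in this sketch (assumption)."""


def long_uint(b: bytes) -> int:
    # Interpret the first 8 bytes as an unsigned 64-bit counter.
    return int.from_bytes(b[:8], "big")


def H(*parts: bytes) -> bytes:
    return hashlib.sha512(b"".join(parts)).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class RobustPRG:
    def __init__(self) -> None:
        # PRG_setup(), lines 2-5: no user-supplied seed is accepted.
        self.seed = entropy_get()
        self.counter = long_uint(entropy_get())
        entropy_update()
        self.nprd = xor(self.seed, entropy_get())

    def refresh(self) -> None:
        # PRG_refresh(), lines 7-9: fresh seed and counter, nprd mixed through H.
        self.seed = entropy_get()
        self.counter = long_uint(entropy_get())
        self.nprd = H(self.seed, self.nprd, self.counter.to_bytes(8, "big"))

    def next(self) -> bytes:
        # PRG_next(), lines 11-14: decrement, re-seed at zero, then HMAC the state.
        self.counter -= 1
        if self.counter <= 0:
            self.refresh()
        msg = self.nprd + self.counter.to_bytes(8, "big")
        self.nprd = hmac.new(self.seed, msg, hashlib.sha512).digest()
        return self.nprd
```

Setting counter to 1 before a call to next() is a convenient way to exercise the refresh path, since the decrement then reaches zero.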

The main motivation for having a PRG along with a strong source of entropy is speed. Studies have shown that entropy generation can be rather slow, for instance when generating 128 bits of entropy (Mahu et al., 2015). While our source of entropy relies on external entropy and random bytes from special devices, the PRG only needs an HMAC computation per output, which yields fast and reliable generation of pseudo-random values.

Although any good PRG could be used to generate cryptographic material (e.g., keys, nonces), we introduce a PRG that works seamlessly with our strong source of entropy, which improves the quality of its output. In Section 6.2, we discuss the specifics of the implementation, and in Section 7.1 we evaluate the robustness and level of confidence of our algorithms. A discussion about the correctness of Algorithm 2 can be found in Appendix B.

4.4. Integrated device verification value (iDVV)

The design of our logically-centralized security architecture also includes the iDVV component (Kreutz et al., 2017). The iDVV idea was inspired by the iCVVs (integrated card verification values) used in credit cards to authenticate and authorize transactions in a secure and inexpensive way. In (Kreutz et al., 2017) the concept was applied to SDN, proposing a flexible method of generating iDVVs that can be safely used to secure communication between any two devices. As a result, iDVVs can be used to partially address two gaps of non-functional properties, security-performance and complexity-robustness.

An iDVV is a unique value generated by device A (e.g., forwarding device) which can be verified by device B (e.g., controller). An iDVV generator has essentially two interfaces. First, idvv_setup (seed, secret), which is used to set up the generator. It receives as input two secret, random and unique values, the seed and the (higher-level protocol dependent) secret. The source of strong entropy and the robust PRG are, amongst other things, used to bootstrap and keep the iDVV generators fresh. Second, the idvv_next() interface returns the next iDVV. This interface can be called as many times as needed.

So, iDVVs are sequentially generated to authenticate and authorize requests between two networking devices, and/or protect communication. Starting with the same seed and secret, the iDVV generator will generate, for example, at both ends of a controller-device association, the exact same sequence of values. In other words, it is a deterministic generator of authentication or authorization codes, or one-time keys, which are, however, indistinguishable from random. The main advantages of iDVVs are their low cost, which makes them even usable on a per-message basis, and the fact that they can be generated off-line, i.e., without having to establish any previous agreement.
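To illustrate the two-interface shape described above, idvv_setup(seed, secret) and idvv_next(), the following Python sketch builds a deterministic generator from an HMAC-SHA512 chain. The chain is our assumption for illustration only; it is not the iDVV construction analysed in (Kreutz et al., 2017). It does show the key property: two ends started from the same seed and secret produce the same sequence without exchanging any messages.

```python
import hashlib
import hmac


class IDVVGenerator:
    """Hypothetical iDVV-style generator: an HMAC-SHA512 chain keyed by the secret.

    Not the construction published by Kreutz et al. (2017).
    """

    def __init__(self, seed: bytes, secret: bytes) -> None:
        self.idvv_setup(seed, secret)

    def idvv_setup(self, seed: bytes, secret: bytes) -> None:
        self._state = seed
        self._secret = secret
        self._index = 0

    def idvv_next(self) -> bytes:
        self._index += 1
        msg = self._state + self._index.to_bytes(8, "big")
        value = hmac.new(self._secret, msg, hashlib.sha512).digest()
        # Roll the state forward one-way, so earlier values cannot be
        # recomputed from the current state (and secret) alone.
        self._state = hashlib.sha512(b"state" + value).digest()
        return value
```

For example, a controller and a forwarding device that both call IDVVGenerator(seed, secret) obtain identical values on their n-th call to idvv_next(), which is what lets them use these values as one-time keys or authorization codes.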

Correctness. The randomness and performance of the iDVV algorithm as deterministic generator of authentication or authorization codes, or one-time keys which are however indistinguishable from random, have been analyzed, and its properties proved, in (Kreutz et al., 2017). The performance study is complemented in Section 7.2. Overall, these analyses show that iDVVs are robust, achieve a high level of confidence and outperform traditional key generation and derivation functions without compromising the security.

4.5. System roles and setup

In our system we assume personnel with two different roles: the system administrator, who controls the operation of central services such as anchor, and the network administrator (a.k.a. manager), who is responsible for the operation of network devices. Every time a new network device (a forwarding device or a controller) is added to the network, it must be registered before it can operate.

In current practice, device registration is a manual process triggered by a network administrator through an out-of-band channel. Given the potentially large number of network devices in SDN, such a manual process is unsatisfactory. We therefore propose a protocol, described below, that provides a semi-automated device registration process which is efficient, secure, and requires minimal involvement of anchor. The anchor is first set up by the system administrator. Next, each network device is set up by anchor. Before that process, the network administrator has to share a secret key with the device. The setup of this key and the registration of devices are described in Section 4.6. Then, the devices can be registered automatically.

We now present the deployment, communication, and setup required for anchor (by the system administrator), the network administrator, and devices. Afterwards, we describe the device registration and association algorithms.

Deploying anchor. Currently, anchor is designed to work in a single domain with single ownership, such as a data center, enterprise, or university campus network. anchor supports deployments with multiple controller instances (Koponen et al., 2010), for scalability and availability of network control, as required in production systems (Jain et al., 2013). We plan to extend anchor's features and services to multiple domains with multiple ownership.

Connectivity infrastructure. anchor is designed to logically centralize non-functional properties of generic SDN deployments. As such, it is not restricted to OpenFlow: other southbound APIs, such as POF, ForCES, or P4, can be used. The anchor connectivity infrastructure, used for communication between SDN devices (controllers and networking gear) and anchor, can use in-band or out-of-band mechanisms (for instance, traditional routing protocols such as OSPF or IS-IS, as is common for control plane channels (Koponen et al., 2010)).

For simplicity and without loss of generality, in what follows we denote $\{M\}_{K}$ the encryption of M using encryption key K, and $[M]_{K}$ a message field M inside [], followed by an HMAC over the whole material within [], using MAC key K. Keys shared between entities X and Y are written $K^{e}_{XY}$ (encryption) and $K^{m}_{XY}$ (MAC), where $X, Y \in \{A, D, M, C, F\}$, for ANCHOR, Device, network adMinistrator (or Manager), Controller, and Forwarding device. anchor can generate strong keys using a suitable key derivation function (KDF) based on the high-entropy random material described in the previous sections.
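As a concrete reading of this bracket notation, the sketch below builds a message consisting of a nonce and an encrypted payload, followed by an HMAC-SHA512 tag over everything inside the brackets (encrypt-then-MAC); the tag is checked before anything is decrypted. To keep the example runnable with Python's standard library, the cipher is a toy keystream derived from HMAC-SHA512 in counter mode, standing in for AES or NaCl's cipher. The function names seal and open_sealed and the 16-byte nonce are hypothetical.

```python
import hashlib
import hmac
import os

TAG_LEN = 64  # HMAC-SHA512 tag size in bytes


def _keystream(k_enc: bytes, nonce: bytes, n: int) -> bytes:
    """Toy PRF-counter keystream (HMAC-SHA512); stands in for AES or NaCl."""
    out = b""
    block = 0
    while len(out) < n:
        out += hmac.new(k_enc, nonce + block.to_bytes(8, "big"), hashlib.sha512).digest()
        block += 1
    return out[:n]


def enc(k_enc: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encryption under k_enc; the same call decrypts (XOR keystream)."""
    ks = _keystream(k_enc, nonce, len(data))
    return bytes(p ^ s for p, s in zip(data, ks))


def seal(k_enc: bytes, k_mac: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Encrypt first, then HMAC everything inside the brackets (nonce + ciphertext)."""
    body = nonce + enc(k_enc, nonce, plaintext)
    return body + hmac.new(k_mac, body, hashlib.sha512).digest()


def open_sealed(k_enc: bytes, k_mac: bytes, msg: bytes, nonce_len: int = 16) -> bytes:
    body, tag = msg[:-TAG_LEN], msg[-TAG_LEN:]
    expected = hmac.new(k_mac, body, hashlib.sha512).digest()
    if not hmac.compare_digest(tag, expected):  # verify before decrypting
        raise ValueError("MAC check failed")
    nonce, ct = body[:nonce_len], body[nonce_len:]
    return enc(k_enc, nonce, ct)
```

Verifying the MAC before decryption is what makes encrypt-then-MAC robust: a tampered message is rejected without the cipher ever processing attacker-chosen ciphertext.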

Anchor setup. anchor needs two master recovery keys, namely a master recovery encryption key and a master recovery MAC key, which are fundamental for the post-compromise recovery steps described ahead. However, these two master recovery keys, held by the authority overseeing anchor (the system administrator), must never appear on the anchor server (if the system is to recover from a possible full server compromise); they are stored securely and used only offline. (To give a concrete feel, one possible implementation of this principle is: a pristine anchor server image is created; it boots offline in single-user mode; it generates the two master recovery keys through a strong KDF as discussed above; the keys are written to a USB device and then deleted; the first online boot proceeds.) Due to space constraints, we refer the reader to Appendix C for more information (including a visual representation) on the three phases of anchor, namely setup, normal operation, and recovery.

As we will present later, the master recovery keys are used in only two cases, namely (a) when a new network administrator is registered with anchor (i.e., the network administrator setup process); and (b) when anchor has been compromised and is reinstated into a trustworthy state (i.e., the post-compromise recovery process presented in Section 4.10). When either case occurs, the anchor authority only needs to use the master recovery keys once, to recursively compute the recovery keys of all devices and network administrators. The output of this computation is imported into the anchor server through an out-of-band channel (e.g., using a USB drive).

Network administrator setup. Each network administrator (or manager, denoted M) with identity M_ID is registered with anchor manually. This is the only manual process needed to initialize a new network administrator. Afterwards, all devices managed by this administrator can be registered with anchor through our device registration protocol.

During the network administrator registration phase, anchor locally generates an encryption key and a MAC key to be shared with M, which are manually imported into M through an out-of-band channel (again, using a USB drive, for example).

Further, M's two recovery keys (one for encryption and one for MAC), each computed with the hash function H, are also computed by anchor offline. M's recovery keys live essentially offline, since M needs them only for infrequent operations (e.g., upon device registration). Note that anchor does not store these recovery keys either, but can recompute them offline when the post-compromise recovery process is triggered, as we detail in Section 4.10.
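The recursive derivation can be sketched as follows. The exact inputs of H in the recovery key formulas are not recoverable from this text, so the sketch assumes, hypothetically, that each recovery key is the SHA-512 hash of its parent key concatenated with the child's identity: master keys to M_ID, then M's recovery keys to each device identity. What it illustrates is that every recovery key is recomputable offline from the two master keys alone, so anchor never needs to store them.

```python
import hashlib


def H(*parts: bytes) -> bytes:
    # Plain concatenation is unambiguous here because parent keys have a fixed length.
    return hashlib.sha512(b"".join(parts)).digest()


def manager_recovery_keys(mk_enc: bytes, mk_mac: bytes, m_id: bytes) -> tuple[bytes, bytes]:
    # Hypothetical: bind each master recovery key to the manager identity M_ID.
    return H(mk_enc, m_id), H(mk_mac, m_id)


def device_recovery_keys(rk_enc: bytes, rk_mac: bytes, d_id: bytes) -> tuple[bytes, bytes]:
    # Hypothetical: M derives its devices' recovery keys from its own recovery keys.
    return H(rk_enc, d_id), H(rk_mac, d_id)


def recompute_all(mk_enc: bytes, mk_mac: bytes,
                  managers: dict[bytes, list[bytes]]) -> dict[bytes, tuple[bytes, bytes]]:
    """Offline: rebuild every (encryption, MAC) recovery key pair from the master keys."""
    keys: dict[bytes, tuple[bytes, bytes]] = {}
    for m_id, devices in managers.items():
        rk = manager_recovery_keys(mk_enc, mk_mac, m_id)
        keys[m_id] = rk
        for d_id in devices:
            keys[d_id] = device_recovery_keys(*rk, d_id)
    return keys
```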

Device setup. A device D is either a forwarding device (F) or a controller (C), but we do not differentiate between them during the setup and registration processes. The first operation performed after a device is brought into the system is the setup, which, in the context of this paper, concerns the establishment of credentials for secure management access by the network administrator.

Upon request from M, anchor locally generates a pair of keys for each device D being set up: an encryption key and a MAC key to be shared between M and D for management. These keys are sent to M protected by the encryption and MAC keys that M shares with anchor. Then, the network administrator manually imports them into each D through an out-of-band channel.

4.6. Device registration

The device registration protocol is presented in Algorithm 3. We assume that the keys established during the setup steps described above are in place.

{Bootstrap for devices}
1. M A [()],
2. A for each , generate
3. A M [(())],
{For each device}
4. M H(); H().
5. M [()],
6. A [()],
7. A M [()],
8. A tag() = registered;
9. for , if Type()==t, then
10. , if tag() == registered is True
11. = H(); = H().
12. M [()],
13. M tag() = registered;
14. destroys ();
15. = H(); = H();
16. , if tag() == registered is True
17. = H(); = H().
18. = H(); = H().
Algorithm 3 Device registration

The first part concerns the bootstrap of the registration of a batch of devices with anchor (A) by a network administrator M. Consider the set of device identities that the administrator wants to register. M requests the registration from A (line 1), accompanying each device identity with a nonce. A computes its own nonce and keys for each device, and returns them encrypted to M (lines 2 and 3). The random nonces of M and A are used to prevent replay attacks.

The process then continues for each device D. First, M creates the device recovery keys (line 4) from its own recovery keys. Then M sends D the relevant cryptographic keys (line 5). D follows up with a confirmation to A, which closes the loop with M using the original nonce (lines 6 and 7). A then performs a set of operations (lines 8 to 11) to commit the registration of D, namely inserting it into the controller or forwarding device list (CList or FList, respectively) and updating several keys.
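The bookkeeping anchor performs around these steps can be sketched as below. This is a hypothetical illustration that omits all cryptography: it only shows how a nonce recorded in the bootstrap phase can be matched exactly once against the confirmation, so that a replayed or forged confirmation is rejected, and how a committed device is tagged and placed in CList or FList. The class and method names, and the assumption that the confirmation echoes the bootstrap nonce, are ours.

```python
from dataclasses import dataclass, field


@dataclass
class AnchorRegistry:
    """Hypothetical anchor-side bookkeeping for Algorithm 3 (no cryptography)."""
    pending: dict[str, bytes] = field(default_factory=dict)  # device id -> bootstrap nonce
    clist: set[str] = field(default_factory=set)
    flist: set[str] = field(default_factory=set)
    tags: dict[str, str] = field(default_factory=dict)

    def bootstrap(self, requests: dict[str, bytes]) -> None:
        # Lines 1-3: remember the nonce for every device in the batch.
        self.pending.update(requests)

    def commit(self, device_id: str, device_type: str, echoed_nonce: bytes) -> bool:
        # Lines 6-11: accept a confirmation once, and only with the original nonce.
        if self.pending.get(device_id) != echoed_nonce:
            return False  # unknown device, or replayed/forged confirmation
        del self.pending[device_id]
        self.tags[device_id] = "registered"
        (self.clist if device_type == "C" else self.flist).add(device_id)
        return True
```

Removing the entry from pending on success is what makes a second, replayed confirmation fail.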

Note that in Algorithm 3, the update of several shared keys (i.e., lines 11, 15, 17, 18) at the end of the registration steps at A, M, and D is used to provide PFS: when a key is updated, the old one is destroyed. Continuing, in line 12 the loop with M is closed, using the original nonce, finally confirming D's registration. Upon this step, the corresponding keys are updated as just mentioned.
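The key update used for PFS here (and again in Algorithms 4 and 5) is a one-way hash ratchet, K ← H(K). A minimal sketch, assuming SHA-512 truncated to the original key length so that 256-bit keys stay 256 bits (the truncation is our assumption): after update(), obtaining the previous key from the current one would require inverting SHA-512. Python cannot reliably zeroize memory, so a real implementation would also wipe the old key's buffer.

```python
import hashlib


class RatchetedKey:
    """Shared key updated as K <- H(K) after each protocol run."""

    def __init__(self, key: bytes) -> None:
        self._key = key

    @property
    def current(self) -> bytes:
        return self._key

    def update(self) -> None:
        # Overwrite the old key with its (truncated) hash; the old value is not kept.
        self._key = hashlib.sha512(self._key).digest()[: len(self._key)]
```

Both ends of a shared key call update() at the same protocol step, so they stay in sync while an attacker who later steals the current key cannot roll it back to decrypt earlier traffic.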

Note, first, that the device recovery keys are generated by M (line 4), though using the recovery keys M shares with anchor. This reduces the number of uses of the master recovery keys. However, as we will see, although anchor does not store any of these recovery keys, it can easily compute them offline if needed. Second, M's recovery keys are only used when new devices need to be registered, so they can usually be stored offline. This provides an extra layer of security.

4.7. Device association

The association service is required for authorizing control plane channels between any two devices, such as a forwarding device and a controller. A forwarding device has to request an association with a controller it wishes to communicate with. This association is mediated by the anchor.

The association process between two devices is performed by the sequence of steps detailed in Algorithm 4. Registered controllers and forwarding devices are inserted in CList and FList, respectively. Notation: As explained above, the registration process set in place shared secret keys between anchor (A) and any controller C or forwarding device F.

{Of forwarding device with controller}
1. F A [, F, GetCList],
2. A F [, F, (CList(F), )],
3. F C , GetAiD, F, C, (GetAiD, F, C, , )
4. C A [, GetAiD, F, C, (GetAiD, F, C, , ), (GetAiD, F, C, , )],
5. A C [, (, AiD), (, AiD)],
6. A destroys ()
7. C F , (, AiD), (SEED, )
8. F C , (SEED )
9. A, F = H(); = H()
10. A, C = H(); = H()
Algorithm 4 Device association

The device association implemented by Algorithm 4 has the following properties:

Controller Authorization - Any device F can only associate to a controller C authorized by the anchor.

Device Authorization - Any device F can associate to some controller, only if F is authorized by the anchor.

Association ID Secrecy - After termination of the algorithm, the association ID (AiD) is only known to F and C.

Seed Secrecy - After termination of the algorithm, the seed (SEED) is only known to F and C.

The coarse structure of the algorithm follows the original Needham-Schroeder (NS) authentication and key distribution algorithm (Needham and Schroeder, 1978), but adds anti-replay measures such as including participant IDs and a global initial nonce, as suggested in (Otway and Rees, 1987). Unlike NS, it uses encrypt-then-MAC to further prevent impersonation. Furthermore, it is specialized for device association, managing authorization lists and distributing a double secret at the end (the association ID and the seed). As explained below in Section 4.9, secure communication protocols running after association can use iDVVs on a key-per-message or key-per-session basis, rolling from the initial seed and the secret association ID.

The association process starts with a forwarding device (F) sending an association request to the anchor (A) (line 1 in Algorithm 4). This request contains a nonce, the identification of the device, and the operation request (GetCList, i.e., get the list of controllers). The request is also protected by an HMAC to avoid device impersonation attacks. The anchor checks whether F is in FList (registered devices) and, if so, replies (line 2) with the list of controllers (CList(F)) which F is authorized to associate with. The list of controllers (and the nonce) is encrypted using a key, set up during registration, shared between A and F. This protects the confidentiality of the list of controllers and ensures that the message is fresh, providing protection against replay attacks. A message authentication code also protects the integrity of the anchor's reply, avoiding impersonation attacks. Next, F sends an association request to the chosen controller C (line 3). The request contains a message encrypted using a key shared between F and A. This message contains the get association ID (GetAiD) request, the identities of the principals involved (F, C), and a fresh nonce, and binds them to the nonce of the request. The controller forwards this message to A (line 4), adding its own encrypted association request field, similar to F's but containing C's own nonce instead. This prevents impersonation of the controller, since only C would be able to encrypt its freshly generated nonce. In line 5, C trusts that A's reply is fresh because it contains C's nonce. The controller also trusts that the reply is genuine (from A) because it is protected with a key that only C and A share. As such, C endorses F as an authorized device and AiD as the association key for F. A future compromise of A should not represent any threat to established communication between C and F. To achieve this goal, A immediately destroys AiD (line 6), and C and F further share a seed not known by A (line 7).

C forwards both A's encrypted message and its seed to F (line 7). The forwarding device trusts that this message is fresh and correct because it contains F's nonce together with AiD under encryption, known only to F and C, and F then endorses AiD as the association key. F trusts that C is the correct correspondent, since otherwise A would not have proceeded to step 5. That being the case, future interactions will use AiD. F believes that SEED is genuine, as random entropy for future interactions, because it is encapsulated under AiD, known only to C and F. The forwarding device also trusts that the message is fresh because it contains its nonce. Finally (line 8), C trusts that it is associated with F (as identified in step 3 and confirmed by A in step 5) when F replies showing that it knows both AiD and SEED, by encrypting SEED XORed with the current nonce under AiD. Replay and impersonation attacks are avoided because all encrypted interactions depend on nonces, and therefore become void in the future. At the end of each device association protocol, all keys shared between a device (F or C) and anchor are updated to the hash of the key (lines 9 and 10). Again, this provides perfect forward secrecy. All nonces are random, i.e., unpredictable.

A discussion of the correctness of Algorithm 4 can be found in Appendix D.

4.8. Controller recommendation

Similarly to moving target defense strategies (Wang et al., 2014), devices (e.g., controllers) are hidden by default, i.e., only registered and authenticated devices can obtain information about other devices. Even if a forwarding device discovers the IP address of a controller, it can establish a connection with the controller only after being registered and authorized by the anchor.

Controllers can be recommended to forwarding devices based on different parameters, such as latency, load, or trustworthiness. When a forwarding device requests an association with one or more controllers, the anchor sends back a list of authorized controllers to connect with, and the forwarding device is restricted to associating with controllers on that list. In other words, forwarding devices are not allowed to establish connections with other (e.g., hostile or fake) controllers. Similarly, fake forwarding devices are, by default, prevented from setting up communication channels with non-compromised controllers.
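A recommendation policy of this kind might be sketched as below. The scoring formula (latency scaled by load, divided by trust) is purely a hypothetical example; the text does not specify how the parameters are combined. What the sketch does reflect is that only controllers in the forwarding device's authorized list are ever considered or returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControllerInfo:
    name: str
    latency_ms: float
    load: float   # fraction of capacity in use, 0.0-1.0
    trust: float  # operator-assigned trust score, 0.0-1.0


def recommend(authorized: set[str], candidates: list[ControllerInfo], k: int = 3) -> list[str]:
    """Return up to k authorized controllers, best (lowest score) first.

    Hypothetical score: lower latency and load are better, higher trust is better.
    """
    allowed = [c for c in candidates if c.name in authorized]  # never leak others
    allowed.sort(key=lambda c: c.latency_ms * (1 + c.load) / max(c.trust, 1e-6))
    return [c.name for c in allowed[:k]]
```

For example, with C1 at 5 ms and 90% load and C2 at 8 ms and 10% load (equal trust), the sketch ranks C2 first, while an unauthorized C3 is never returned regardless of its latency.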

4.9. Device-to-device communication

Communication between any two devices happens only after a successful association. Consider the end of an association establishment, as per Algorithm 4, e.g., between a controller C and a forwarding device F: at this point, both sides, and only they, hold the secret and unique material AiD and SEED. Using this material, they can bootstrap the iDVV protocol (see Section 4.4 above), which from then on can be used at will by any secure communication primitive. As explained earlier, iDVV generation is flexible and low-cost, allowing iDVVs to be generated: (a) on a per-message basis; (b) for a sequence of messages; (c) for a specific interval of time; or (d) for one communication session.
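Mode (a), per-message keys, can be sketched as follows. Under our assumptions, both ends bootstrap a generator from SEED and AiD, and each control message is authenticated with a MAC keyed by the next generated value. The generator is a hypothetical HMAC-SHA512 stand-in for the iDVV algorithm of (Kreutz et al., 2017), and HMAC-SHA512 stands in for NaCl's authenticator; the sketch authenticates but does not encrypt.

```python
import hashlib
import hmac


def idvv_stream(seed: bytes, secret: bytes):
    """Hypothetical iDVV stand-in: yields HMAC-SHA512(secret, seed || i) for i = 1, 2, ..."""
    i = 0
    while True:
        i += 1
        yield hmac.new(secret, seed + i.to_bytes(8, "big"), hashlib.sha512).digest()


class SecureChannel:
    """Each message consumes the next generated value as its MAC key."""

    def __init__(self, seed: bytes, aid: bytes) -> None:
        self._keys = idvv_stream(seed, aid)

    def protect(self, msg: bytes) -> bytes:
        return msg + hmac.new(next(self._keys), msg, hashlib.sha512).digest()

    def verify(self, packet: bytes) -> bytes:
        msg, tag = packet[:-64], packet[-64:]
        expected = hmac.new(next(self._keys), msg, hashlib.sha512).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("bad tag")
        return msg
```

Because both ends advance their generators in lockstep, the sketch assumes in-order, lossless delivery, as provided by the TCP connections that typically carry OpenFlow control channels; a receiver that fails verification has also consumed a key, so a real protocol would need a resynchronization rule.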

NaCl (Bernstein et al., 2012), as mentioned in previous sections, is a simple, efficient, and secure alternative to OpenSSL-like implementations, and is thus our choice for secure communication amongst devices.

Researchers have shown that NaCl is resistant to side-channel attacks (Almeida et al., 2013) and that its implementation is robust (Bernstein et al., 2012). Unlike other crypto libraries, NaCl keeps its API and implementation very simple, which contributes to its robustness. Through anchor, SDN communication channels are encrypted using symmetric-key ciphers provided by NaCl, keyed with the strong cryptographic material generated by our mechanisms, which allows secret codes per packet, session, time interval, or pre-defined ranges of packets.

4.10. Post-compromise recovery

As previously explained, when anchor is reinstated after a compromise, it is crucial to have a way to automatically re-establish the secure communication channels between anchor and all participants.

Algorithm 5 presents our solution to re-establish the secure communication channels when anchor has been compromised. Intuitively, since anchor's master recovery keys are stored securely offline, they are unknown to an attacker who has stolen all secrets from the anchor server. As described before, the recovery keys of all network administrators and all devices can be recursively computed offline from the master keys (line 1). Afterwards, the system administrator imports those keys into the anchor server. To continue the recovery process, anchor generates new random keys to be shared with all network administrators and all devices (line 2).

Then, anchor sends each network administrator M (line 3) a recovery message containing the new keys to be re-shared with the devices under M's control. The messages are encrypted with the corresponding recovery keys, and the new shared keys will be used to protect future communications. Note that in line 3 we add an extra MAC value over the entire message under the current MAC key shared between anchor and M. Since the recovery keys are stored offline, without this additional MAC value the network administrator would have to perform the verification offline, manually. Because it can be verified online efficiently, this additional MAC value prevents possible DoS attacks in which an attacker creates and sends fake recovery messages to network managers.
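The role of the additional MAC in line 3 can be illustrated by the following sketch, in which the inner, recovery-key-protected part of the message is treated as opaque bytes. The function names are hypothetical, and HMAC-SHA512 is assumed for the MAC, as in Table 1. M checks the outer tag online with the current MAC key and only forwards messages that pass to the costly offline verification.

```python
import hashlib
import hmac
from typing import Optional


def build_recovery_message(inner: bytes, k_mac_current: bytes) -> bytes:
    """anchor side: inner is already protected under M's recovery keys (opaque here)."""
    return inner + hmac.new(k_mac_current, inner, hashlib.sha512).digest()


def online_prefilter(message: bytes, k_mac_current: bytes) -> Optional[bytes]:
    """M side: cheap online check with the current MAC key.

    Returns the inner part for the offline verification step, or None so
    that forged recovery messages are dropped immediately.
    """
    inner, tag = message[:-64], message[-64:]
    expected = hmac.new(k_mac_current, inner, hashlib.sha512).digest()
    return inner if hmac.compare_digest(tag, expected) else None
```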

Each manager M then runs the recovery operation with each of the devices it manages (line 4). The new keys replace the possibly compromised keys at M and at each device (lines 5-6 and 9). Likewise, once the recovery process has been completed, the recovery keys are updated to their hash values (lines 7-8 and 10-11). As mentioned previously, this key update is used to provide perfect forward secrecy (PFS).

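For each manager M and its associated devices:
  1. A computes the recovery keys from the offline master keys;
  2. A generates new random keys;
  3. A → M: recovery message carrying the new keys, encrypted under the recovery keys, with an additional MAC over the entire message under the current MAC key;
  For each device D managed by M:
    4. M → D: recovery message carrying the new keys;
    5-6. M destroys the possibly compromised keys and installs the new keys;
    7-8. M updates the recovery keys to their hash values, H(·);
    9. D installs the new keys in place of the possibly compromised keys;
    10-11. D updates the recovery keys to their hash values, H(·).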
Algorithm 5 anchor recovery.
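The offline key bookkeeping behind Algorithm 5 can be sketched in Python under two assumptions that the text leaves open: the recursive derivation of recovery keys from a master key uses HMAC-SHA512 with textual labels, and H is SHA-512. Recovery keys for each manager, and below it for each of its devices, are derived from the master key; after every completed recovery, each recovery key is replaced by its hash. All function names and labels are hypothetical.

```python
import hashlib
import hmac


def derive_child(parent_key: bytes, label: bytes) -> bytes:
    """Hypothetical derivation step: child = HMAC-SHA512(parent, label)."""
    return hmac.new(parent_key, label, hashlib.sha512).digest()


def manager_recovery_key(master_key: bytes, manager_id: str) -> bytes:
    return derive_child(master_key, b"manager:" + manager_id.encode())


def device_recovery_key(master_key: bytes, manager_id: str, device_id: str) -> bytes:
    # Recursive: a device key sits below its manager's key in the derivation tree.
    parent = manager_recovery_key(master_key, manager_id)
    return derive_child(parent, b"device:" + device_id.encode())


def ratchet(key: bytes) -> bytes:
    """After each completed recovery, the recovery key is replaced by its hash."""
    return hashlib.sha512(key).digest()


def current_recovery_key(initial_key: bytes, completed_recoveries: int) -> bytes:
    """Offline recomputation: apply one hash step per recovery already completed."""
    key = initial_key
    for _ in range(completed_recoveries):
        key = ratchet(key)
    return key
```

Since SHA-512 is one-way, a recovery key stolen after a recovery does not reveal its earlier values, which is the forward-secrecy argument made above. The offline side only needs to remember how many recoveries have been completed to recompute the current keys.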

If the keys of a network administrator get compromised (e.g., if manager M loses its keys), they can be recovered using the recovery keys provided by anchor. Moreover, M can also re-establish its shared secrets with anchor and its devices in a way similar to Algorithm 5. However, the steps are executed only for a single manager instead of all managers, and with some differences, which we detail next. First, M obtains the recovery keys (line 1) from anchor through an out-of-band channel. Then, in lines 2-3, M obtains new keys, generated by anchor, only for managing its devices; the remaining keys of Algorithm 5 do not need to be changed. Finally, in line 4, the keys are sent to each device.

5. Security Analysis

We provide a formal, machine-checked verification of the core security properties of anchor using the Tamarin prover. In particular, we formalise the core protocols of anchor, namely the device registration, device association, and post-compromise recovery protocols, through symbolic modeling. For each of these protocols, we verify its correctness, message confidentiality, and perfect forward secrecy. We additionally verify the post-compromise security of anchor with the post-compromise recovery protocol.

The full model contains 1712 lines of code. In total, we have proved 33 properties: 23 are helper lemmas that guide the theorem prover; 4 are sanity lemmas that check the correctness of our protocols and of their formalisation; and 6 are the main security properties, which ensure the message confidentiality, perfect forward secrecy, and post-compromise security of anchor. All input files and the complete formal model required to understand and reproduce our security analysis are available at (Anchor, 2018).

5.1. Security properties

anchor achieves both classical and novel security properties. In the classical sense, it guarantees the confidentiality of communications between any two devices. In particular, anchor also provides perfect forward secrecy: if a device is compromised, all of its past communications remain secure.

As for the novel security guarantee, as mentioned before, rather than assuming that the trusted party cannot be compromised, as is the case for CAs in X.509 PKI or the KDC in Kerberos, we also consider that anchor itself might be compromised. In this case, we assume that there are means to detect the compromise, after which the system can be recovered through our post-compromise recovery protocol, which also guarantees perfect forward secrecy when anchor is compromised and recovered.

5.2. Formal analysis

We analyse the main security properties of the protocol using the Tamarin prover (Meier et al., 2013). The Tamarin prover is a symbolic analysis tool that can prove properties of security protocols for an unbounded number of instances and supports reasoning about protocols with mutable global state, which makes it suitable for our protocols. Protocols are specified using multiset rewriting rules, and properties are expressed in a guarded fragment of first-order logic that allows quantification over timepoints.

Tamarin is capable of automatic verification in many cases, and it also supports interactive verification by manual traversal of the proof tree. If the tool terminates without finding a proof, it returns a counter-example. Counter-examples are given as so-called dependency graphs, which are partially ordered sets of rule instances that represent a set of executions that violate the property. Counter-examples can then be used to refine the model, and give feedback to the implementer and designer.

5.3. Modeling aspects

As explained, we consider four protocol roles in anchor, namely A (anchor), M (network adMinistrator or Manager), F (Forwarding device), and C (Controller). To simplify our model, we consider an additional role D (Device) to represent any kind of network device, when it is irrelevant to distinguish its type (i.e., forwarding device or controller).

We model the above protocol roles by a set of rewrite rules. Our modeling of the roles follows the typical Tamarin models, and directly corresponds to the algorithm descriptions in the previous sections. Specifically, each rewrite rule typically models receiving a message, taking an appropriate action, and sending a response message. Tamarin provides built-in support for a Dolev-Yao style network attacker, i.e., one who is in full control of the network. We also specify rules that enable the attacker to compromise anchor and/or any device in the network, and learn all of their session keys.

5.4. Proof goals

We state several proof goals in Tamarin's syntax. Since Tamarin's property specification language is a fragment of first-order logic, it contains logical connectives (|, &, ==>, not, …) and quantifiers (All, Ex). In Tamarin, proof goals are marked as lemma. The #-prefix denotes timepoints, and “E @ #i” expresses that the event E occurs at timepoint #i. Due to space limitations, we only present a selection of examples from our full model to explain the core ideas. We refer the reader to the full model and detailed proof results available at (Anchor, 2018).

The first example goal is a check for executability that ensures that our model allows for the successful transmission of a message. The following example, which is a correctness lemma in the device registration protocol, shows how it is encoded in our proof.

lemma protocol_correctness [use_induction]: exists-trace
  "Ex A D Did k keAD #i1.
     SendSec(A, D, Did, k, keAD) @ #i1"

The property holds if the Tamarin model exhibits a behaviour in which a device D of any type with unique identity Did successfully exchanges with anchor A a message k encrypted using a secret keAD shared between D and A. This property mainly serves as a sanity check on the model: if it did not hold, our model would not capture the normal message flow, which could indicate a flaw in the model. Tamarin automatically proves this property and generates the expected trace as a graphical representation of the rule instantiations and the message flow. We additionally proved several other sanity-checking properties to minimize the risk of modeling errors.

The second example goal is the core secrecy property with respect to a classical attacker. When a controller C is associated with a forwarding device F, the following lemma expresses that, unless the attacker compromises either C or F, he cannot learn any message exchanged between them. Note that K(m) is a special event denoting that the attacker knows m at this time.

lemma message_secrecy [use_induction]:
 "All C F Did1 Did2  k seed #i.
   /* If a message k is exchanged */
      ( SendSec(C, F, Did1, Did2, k, seed) @ #i &
   /* without the adversary compromising any device */
      not (Ex #j.
        Compromise_Device(C, F, Did1, Did2, seed) @ #j)
      ) ==>
   /* then the adversary cannot know k */
     not ( Ex #j. K(k) @ #j) "

Tamarin also proves this property automatically. The result implies that if a forwarding device F with identity Did1 and a controller C with identity Did2 have exchanged a message k encrypted under a shared seed, and the attacker did not compromise either device at any time, then the attacker will not learn k.

Similarly, the following example expresses the perfect forward secrecy for the communications between two devices.

lemma message_forward_secrecy [use_induction]:
 "All C F Did1 Did2 k seed #i.
    /* If a message k is exchanged */
      ( SendSec(C, F, Did1, Did2, k, seed) @ #i &
    /* and no device was compromised before that exchange */
        not (Ex #j seed2.
          Compromise_Device(C, F, Did1, Did2, seed2) @ #j & #j < #i)
      ) ==>
    /* then the adversary cannot know k */
        not ( Ex #j. K(k) @ #j)"

Tamarin proves this property automatically, and the result additionally implies that the message is secure if the attacker did not compromise any device before the current communication session.

The final example property encodes the post-compromise security guarantees provided by anchor. In this example, if anchor was compromised, and then recovered through our protocol, then the confidentiality of communications between anchor and forwarding device F is guaranteed.

lemma message_secrecy_after_recovery [use_induction]:
 "All A M F C Did k enckey #i1 #i2 #i3.
          ( Comppromised_A(A) @ #i1 &
            Recovery_Done(A, M, F, C) @ #i2 & #i1 < #i2 &
            SendSec(A, F, Did, k, enckey) @ #i3 & #i2 < #i3 )
     ==>
     /* then the adversary cannot know k */
            not ( Ex #i4. K(k) @ #i4)"

The property states that if anchor was compromised at timepoint #i1, and the recovery has been completed afterwards, at timepoint #i2, then the confidentiality of any message k exchanged later (at #i3) between A and a forwarding device F is guaranteed.

The above properties are all proven automatically by the Tamarin prover on a PC (Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz, 16GB memory) within .

6. Implementation

A prototype of anchor has been implemented as envisioned in Figure 2. Our implementation uses the POX controller and CBench (an OpenFlow switch emulator), and has approximately 2k lines of Python code and 700 lines of C code (integration with CBench). CBench is the default and most widely used tool for benchmarking control plane performance (Shalimov et al., 2013; Khattak et al., 2014; Zhao et al., 2015). The implementation uses Google's protobuf (Google, 2017) for defining the protocols and efficiently serializing the data. In this section we give an overview of important implementation details. The evaluation of the different components of the architecture is presented in Section 7.

6.1. A source of strong entropy

We have 32 pools of events fed by four different sources: (1) the incoming packet rate sent by controllers; (2) the incoming packet rate of anchor; (3) network statistics of forwarding devices; and (4) random bytes from local systems. Each source feeds the pools in its own way. Sources (1) and (3) use a round-robin approach, whereas sources (2) and (4) randomly select the next pool to put the new event in. In this way, we have a diversity of approaches for feeding the pools of noise, making the “guessing task” of an attacker harder. Each pool needs to store only a SHA512 digest: the current digest and the newly arrived events are used as input to the hash function. Lastly, once a pool has been used by the source of strong entropy, it is reset to a new initial state, which consists of the digest of a hash function computed over random bytes from a local entropy source such as /dev/urandom.
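The pool bookkeeping just described can be sketched in Python as follows. This is a minimal illustration, not anchor's code: events are assumed to arrive as byte strings tagged with their source number (1-4), the class and method names are hypothetical, and each pool holds nothing but one SHA-512 digest, as in the text.

```python
import hashlib
import os
import secrets

NUM_POOLS = 32
ROUND_ROBIN_SOURCES = {1, 3}  # controller packet rate, forwarding-device statistics
RANDOM_SOURCES = {2, 4}       # anchor packet rate, local random bytes


def fresh_pool_state() -> bytes:
    # Initial/reset state: digest of bytes read from the local entropy source.
    return hashlib.sha512(os.urandom(64)).digest()


class EntropyPools:
    def __init__(self) -> None:
        self.pools = [fresh_pool_state() for _ in range(NUM_POOLS)]
        self.next_pool = {src: 0 for src in ROUND_ROBIN_SOURCES}

    def _select_pool(self, source: int) -> int:
        if source in ROUND_ROBIN_SOURCES:
            index = self.next_pool[source]
            self.next_pool[source] = (index + 1) % NUM_POOLS
            return index
        if source in RANDOM_SOURCES:
            return secrets.randbelow(NUM_POOLS)
        raise ValueError(f"unknown source {source}")

    def add_event(self, source: int, event: bytes) -> int:
        """Fold an event into one pool: new digest = SHA512(current digest || event)."""
        index = self._select_pool(source)
        self.pools[index] = hashlib.sha512(self.pools[index] + event).digest()
        return index

    def drain(self, index: int) -> bytes:
        """Hand a pool's digest to the entropy source and reset the pool."""
        digest = self.pools[index]
        self.pools[index] = fresh_pool_state()
        return digest
```

Keeping only a digest per pool bounds memory at 64 bytes per pool regardless of event volume, and the reset from the local source (here os.urandom) means a drained pool never restarts from a previously observed state.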

To implement the entropy_update() function (see Algorithm 1), we can use the pools of noise circularly (e.g., taking pairs of pools in a fixed cyclic order), in a combined circular and random way, or in a completely random fashion. The number of pools (32) and this diversity of approaches for using them make it very hard for an attacker to enumerate the possible values of the events used to update the generator's internal state (Ferguson et al., 2011).
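One way to realise the three pool-usage patterns is sketched below in Python. The pairing of pools, and the way the mixed strategy combines one pool taken in order with one drawn at random, are assumptions made for illustration; entropy_update() itself is defined in Algorithm 1 and is not reproduced here, and pool_schedule is a hypothetical name.

```python
import secrets
from typing import Iterator, Tuple

NUM_POOLS = 32


def pool_schedule(strategy: str) -> Iterator[Tuple[int, int]]:
    """Yield the pair of pool indices to consume on each call to entropy_update()."""
    position = 0
    while True:
        if strategy == "circular":
            # Consecutive pairs in a fixed cyclic order: (0, 1), (2, 3), ...
            pair = (position % NUM_POOLS, (position + 1) % NUM_POOLS)
            position += 2
        elif strategy == "mixed":
            # One pool taken in order, the other drawn at random.
            pair = (position % NUM_POOLS, secrets.randbelow(NUM_POOLS))
            position += 1
        elif strategy == "random":
            pair = (secrets.randbelow(NUM_POOLS), secrets.randbelow(NUM_POOLS))
        else:
            raise ValueError(f"unknown strategy {strategy!r}")
        yield pair
```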

Even if an attacker controls two or more external sources in a timely manner, it will be hard to guess the new state of the external entropy. First, the attacker needs to enumerate the events of the pools used in each update. This is hard in itself, since the attacker does not know the update sequence of these pools, i.e., which external sources are being used, and in which sequence, to update each pool. In other words, he/she would have to know all sources of noise and the sequence in which they are used to update the pools. It is also worth emphasizing that the external sources have a pre-defined maximum rate for sending heartbeats, i.e., compromised sources cannot send data at a higher frequency to influence subsequent updates of the external entropy. Second, the attacker would need additional knowledge of the internal entropy.
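The pre-defined maximum heartbeat rate can be enforced with a per-source limiter such as the minimal Python sketch below. The policy shown (accept at most one heartbeat per minimum interval and drop the rest) and the class name are assumptions for illustration; timestamps are passed in explicitly so the logic is deterministic and easy to test.

```python
from typing import Dict


class HeartbeatLimiter:
    """Drop heartbeats that a source sends faster than a pre-defined maximum rate."""

    def __init__(self, max_rate_hz: float) -> None:
        self.min_interval = 1.0 / max_rate_hz
        self.last_accepted: Dict[str, float] = {}

    def accept(self, source_id: str, timestamp: float) -> bool:
        last = self.last_accepted.get(source_id)
        if last is not None and timestamp - last < self.min_interval:
            return False  # too early: cannot be used to steer the next pool updates
        self.last_accepted[source_id] = timestamp
        return True
```

A compromised source flooding heartbeats thus contributes at most one event per interval, no more than an honest source.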

6.2. Pseudorandom generator (PRG)

Our PRG combines the implementation strengths of different solutions such as the PRF of SPINS (Perrig et al., 2002) (which is based on an HMAC function), provably secure constructions for building robust PRGs (Dodis et al., 2013; Ferguson et al., 2011), and unbounded state spaces through cryptographic primitives (Stark, 2017).

As the HASH function we have chosen SHA512. As the HMAC function, we have chosen the one-time authentication function crypto_onetimeauth() from NaCl (Bernstein et al., 2012). This function ensures security and performance while generating 16-byte outputs that are indistinguishable from random.
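To make the combination concrete, the sketch below shows a counter-mode PRG in the spirit of the SPINS PRF, with SHA-512 as the HASH function used to mix in fresh entropy. It substitutes HMAC-SHA512 from Python's standard library for NaCl's crypto_onetimeauth() (a Poly1305 one-time authenticator, which the standard library does not provide) and truncates each output to the same 16 bytes; the class and method names are hypothetical.

```python
import hashlib
import hmac

BLOCK_BYTES = 16  # crypto_onetimeauth() also produces 16-byte outputs


class HmacPRG:
    """Counter-mode PRG in the spirit of the SPINS PRF (HMAC stand-in for crypto_onetimeauth)."""

    def __init__(self, seed: bytes) -> None:
        self.key = hashlib.sha512(seed).digest()
        self.counter = 0

    def reseed(self, entropy: bytes) -> None:
        # Mix fresh entropy into the state with the HASH function (SHA-512).
        self.key = hashlib.sha512(self.key + entropy).digest()

    def next_block(self) -> bytes:
        block = hmac.new(self.key, self.counter.to_bytes(8, "big"), hashlib.sha512).digest()
        self.counter += 1
        return block[:BLOCK_BYTES]

    def read(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            out += self.next_block()
        return out[:n]
```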

PRG at the devices. As the controllers and forwarding devices do not have a source of strong entropy, we implemented a slightly modified version of the algorithm for these components to use this logically-centralized security service provided by the anchor. Essentially, we replace the entropy_get() function by entropy_remote(). Instead of using local data, this function makes an entropy request to the anchor to obtain a source of strong entropy. This function is essential to provide recovering security by refreshing, improving the resilience of the PRG.
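The device-side substitution of entropy_get() by entropy_remote() can be sketched as a PRG whose entropy source is pluggable. In this minimal Python sketch, request_from_anchor is a hypothetical hook standing in for the actual network request to anchor, the refresh period is an arbitrary illustrative parameter, and HMAC-SHA512 stands in for the PRF.

```python
import hashlib
import hmac
import os
from typing import Callable

EntropySource = Callable[[int], bytes]


def entropy_get(n: int) -> bytes:
    """Local entropy, as available on anchor itself (here: the OS generator)."""
    return os.urandom(n)


def make_entropy_remote(request_from_anchor: Callable[[int], bytes]) -> EntropySource:
    """Device-side replacement: ask anchor for strong entropy instead of reading local data."""
    def entropy_remote(n: int) -> bytes:
        data = request_from_anchor(n)  # hypothetical transport hook
        if len(data) != n:
            raise ValueError("anchor returned an unexpected amount of entropy")
        return data
    return entropy_remote


class DevicePRG:
    def __init__(self, entropy: EntropySource, refresh_every: int = 1024) -> None:
        self.entropy = entropy
        self.refresh_every = refresh_every
        self.key = hashlib.sha512(entropy(64)).digest()
        self.outputs = 0

    def next_block(self) -> bytes:
        if self.outputs and self.outputs % self.refresh_every == 0:
            # Recovering security by refreshing: fold in fresh entropy from the source.
            self.key = hashlib.sha512(self.key + self.entropy(64)).digest()
        block = hmac.new(self.key, self.outputs.to_bytes(8, "big"), hashlib.sha512).digest()
        self.outputs += 1
        return block
```

The same class runs on anchor with entropy_get and on a switch or controller with the remote variant, so only the entropy source differs between the two deployments.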

6.3. Secure cryptographic keys generators

Based on the algorithm proposed in (Kreutz et al., 2017), we have implemented an iDVV-based secure cryptographic key generator that supports seven different cryptographic primitives. Specifically, each of these primitives can be selected through the idvv_next(primitive_id) function, which generates the next key. In our implementation, we used the following primitives: MD5, SHA1, SHA512, SHA256, poly1305aes_authenticate, crypto_onetimeauth, and crypto_hash. The first four functions are provided by OpenSSL, while the last three are provided by an independent implementation of Poly1305-AES and by NaCl. As MD5 and SHA1 are deprecated, we use them only for performance comparison purposes.
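The shape of an idvv_next(primitive_id) interface can be sketched in Python with the four primitives available in hashlib. This is a simplified, hypothetical stand-in and not the published iDVV algorithm (Kreutz et al., 2017): here each new value is the hash of the shared seed, the previous value, and a block counter, expanded block by block so that 16-, 32-, 64-, and 128-byte keys can all be produced. The Poly1305-AES and NaCl primitives are omitted because they are not in Python's standard library.

```python
import hashlib

# Primitive identifiers available from hashlib (MD5 and SHA1 only for comparison).
PRIMITIVES = {0: "md5", 1: "sha1", 2: "sha256", 3: "sha512"}


class IdvvGenerator:
    """Hypothetical simplification: next value = H(seed || previous value || counter)."""

    def __init__(self, seed: bytes, initial_value: bytes) -> None:
        self.seed = seed
        self.value = initial_value

    def idvv_next(self, primitive_id: int, size: int = 32) -> bytes:
        name = PRIMITIVES[primitive_id]
        out = b""
        counter = 0
        # Expand with a block counter so keys longer than one digest can be produced.
        while len(out) < size:
            h = hashlib.new(name)
            h.update(self.seed + self.value + counter.to_bytes(4, "big"))
            out += h.digest()
            counter += 1
        self.value = out[:size]
        return self.value
```

Because both ends of an association hold the same seed and current value, they can step their generators in lockstep and obtain the same keys without exchanging them.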

To explain the rationale for our implementation, we first clarify the difference between our solution and traditional key derivation functions (KDFs). Both are used to generate secure cryptographic keys that can resist different types of attacks, such as exhaustive key search (Yao and Yin, 2005). KDFs share common design characteristics, such as the use of strong hash functions to compute digests of the raw key material. A secure KDF can be defined as $H^{c}(p \| s)$ (Yao and Yin, 2005), where $p$ is the input (e.g., a password), $s$ is a salt, and $H$ is a strong hash function such as SHA256 or SHA512. The exponent $c$ represents the number of iterations used to make the task of attackers harder. A common value for $c$ is . This exponent is particularly necessary if the entropy of the input (e.g., password, seed, key) is unknown. In practice, the input of a KDF is likely to have low entropy (Yao and Yin, 2005). While in some use cases a high exponent might be necessary to increase the cost of an attack trying to recover the key, it also significantly increases the cost of the key derivation function for high-performance, latency-sensitive applications.
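A minimal Python sketch of the iterated-hash construction follows, written as H^c(p || s) with SHA-512 as H, an input p, a salt s, and c iterations. The function name is hypothetical; Python's standard library also ships the standardized PBKDF2 variant as hashlib.pbkdf2_hmac.

```python
import hashlib


def iterated_kdf(p: bytes, s: bytes, c: int) -> bytes:
    """Compute H^c(p || s) with H = SHA-512: hash once, then re-hash c - 1 more times."""
    if c < 1:
        raise ValueError("c must be at least 1")
    digest = hashlib.sha512(p + s).digest()
    for _ in range(c - 1):
        digest = hashlib.sha512(digest).digest()
    return digest
```

Each extra iteration costs one more SHA-512 evaluation, so the derivation time grows linearly with c; this is precisely the cost that a latency-sensitive control plane would pay on every key derivation.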

Unlike a traditional key derivation scheme, our use of the iDVV generator in the context of anchor relies on high-entropy inputs. In other words, we do not need to resort to a large iteration count to compensate for a potentially low-entropy input. By using, by default, two 32-byte indistinguishable-from-random values in our generator, we make the task of an attacker very hard. It is also worth mentioning that iDVVs are essentially used on a per-association basis, i.e., they have a relatively short lifetime.
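By contrast, when the inputs are already two 32-byte values that are indistinguishable from random, a single keyed hash is enough: an attacker enumerating the inputs faces 2^512 candidates, so multiplying the cost of each guess by an iteration count changes nothing in practice while still costing the defender on every derivation. The Python sketch below assumes HMAC-SHA512 and a context label; it illustrates the argument only and does not reproduce the iDVV generator, and high_entropy_key is a hypothetical name.

```python
import hashlib
import hmac
import os

VALUE_BYTES = 32  # default in the text: two 32-byte indistinguishable-from-random values


def high_entropy_key(value_a: bytes, value_b: bytes, context: bytes) -> bytes:
    """Single HMAC-SHA512 call: no iteration count needed when inputs are already random."""
    if len(value_a) != VALUE_BYTES or len(value_b) != VALUE_BYTES:
        raise ValueError("expected two 32-byte random values")
    return hmac.new(value_a + value_b, context, hashlib.sha512).digest()[:32]


if __name__ == "__main__":
    a, b = os.urandom(VALUE_BYTES), os.urandom(VALUE_BYTES)
    print(high_entropy_key(a, b, b"association-1").hex())
```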

7. Evaluation

In this section we evaluate the essential security mechanisms and services of our architecture.

For the performance measurements, we used machines with two quad-core Intel Xeon E5620 2.4GHz, with 2x4x256KB L2 / 2x12MB L3 cache, 32GB SDIMM at 1066MHz, with hyper-threading enabled. These machines were interconnected by a Gigabit Ethernet switch and ran Ubuntu Server 14.04 LTS.

7.1. Source of entropy and PRGs

We empirically evaluate both the source of strong entropy and the PRGs through statistical methods and tools, following state-of-the-art recommendations (Bassham et al., 2010). To this end, we used NIST's Statistical Test Suite, STS (NIST, 2017). We generated one file containing 50MB of random bits per generator and used these files as input for STS. Our generators passed the vast majority of tests and sub-tests: they failed only 2 of 188 sub-tests (passing 146 of the 148 non-overlapping template matching sub-tests), as summarized in Table 2. This gives us a high level of confidence in our generators.

Test | Result
Block Frequency | passed
Cumulative Sums (forward) | passed
Cumulative Sums (backward) | passed
Longest Run of Ones | passed
Binary Matrix Rank | passed
Discrete Fourier Transform | passed
Non-overlapping Template Matching | 146/148
Approximate Entropy | passed
Random Excursions | 8/8
Random Excursions Variant | 18/18
Serial (first) | passed
Serial (second) | passed
Linear Complexity | passed
Table 2. STS: results of the single tests
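To give a flavor of what STS checks, the sketch below implements its first and simplest test, the frequency (monobit) test of NIST SP 800-22, in Python: it maps bits to ±1, sums them, and converts the normalized sum into a p-value with the complementary error function. A sequence is considered to pass when p ≥ 0.01, the suite's default significance level. This is only one of the suite's tests and is not a substitute for running STS; the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def monobit_p_value(bits: Sequence[int]) -> float:
    """NIST SP 800-22 frequency (monobit) test: p = erfc(|S_n| / sqrt(2n))."""
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one bit")
    s_n = sum(1 if b else -1 for b in bits)  # ones count +1, zeros count -1
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def bits_from_bytes(data: bytes) -> List[int]:
    """Unpack bytes into bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
```

For example, a perfectly balanced sequence has S_n = 0 and therefore p = 1, whereas a run of 100 ones gives a p-value far below 0.01 and fails.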

7.2. On the performance of key generation

In this section, we analyse the performance of our key generator, which is essential to provide low latency and high throughput control plane communication at a low cost.

Figure 4 shows the latency of the seven cryptographic primitives used with our generator. We tested each primitive by generating keys of different sizes (16, 32, 64, and 128 bytes). As expected, the best performance is achieved by the implementations based on SHA1 and MD5. However, these two implementations also have the worst serial correlation coefficient, as shown in (Kreutz et al., 2017). The generators that use SHA512 or Poly-OTP perform well, achieving a good security-performance tradeoff.

7.3. Device-to-device communication performance

Connection establishment. While a TLS connection takes around to be established, a device association using the anchor takes less than . This means that anchor can easily support large-scale data centers (e.g., 1k switches and 100k hosts (Greenberg et al., 2008; Al-Fares et al., 2008; Benson et al., 2010b)) while being orders of magnitude more efficient than traditional solutions for this particular metric. The scale of the improvement of our connection setup process when compared to the TLS handshake is due to three main factors. First, our algorithm has half the number of steps. Second, we use symmetric cryptography only. Third, we use the fast ciphering provided by NaCl.

Communications overhead. Figure 3 shows the results of control plane communications using OpenSSL, TCP, and two versions of ANCHOR. For communication with up to 128 forwarding devices, each sending 10k control messages, our solution requires only half of the resources and time of an OpenSSL-based implementation using AES256-SHA, the most widely available cipher suite, while offering stronger security guarantees (see below).

In Figure 3, we can also observe the overhead of confidentiality (TCP-ANCHOR-EMAC): compared to providing only authenticity and integrity (TCP-ANCHOR-MAC), adding confidentiality incurs an overhead of around 15%.

It is worth emphasizing that we achieved these results while also ensuring much stronger security, as we generated one secret key per packet. In contrast, the OpenSSL-based implementation used a single key (for symmetric encryption) for the entire communication session.

Figure 3. Control plane communication costs
Figure 4. Latency of different key generators

7.4. Traditional solutions vs. anchor

In Table 3 we provide a summarized comparison between traditional solutions and anchor. As traditional solutions, we considered EJBCA and OpenSSL, two popular implementations of PKI and TLS, respectively. As we have shown before, our bootstrap process (device registration and association) is much faster, and our connection latency is also significantly lower. In addition, our solution has nearly one order of magnitude fewer LOC and uses four times fewer external libraries. This makes a difference from a resilience perspective. For instance, formally verifying more than 717k LOC (EJBCA + OpenSSL) is in itself a tremendous challenge, and it gets considerably worse if we take into account eighty external libraries and eleven programming languages.

Our proposed architecture offers a functionally equivalent level of security with respect to properties such as authenticity, integrity, and confidentiality, when compared to traditional alternatives. Additionally, anchor offers a higher level of security by providing post-compromise security (PCS) and post-quantum security (PQS). The former is ensured through post-compromise recovery (see Section 4.10), and the latter is a consequence of using only symmetric cryptography. Further, the lightweight nature of our mechanisms, such as the iDVV, makes them suitable for securing communication on a per-message basis, increasing cryptographic robustness. Moreover, by having fewer LOC, we significantly reduce the attack surface.

Finally, it is worth emphasizing that the perfect forward secrecy (*) of traditional solutions, such as the different implementations of TLS, is not easy to enforce. First, although TLS provides cipher suites that offer PFS, in practice several cipher suites do not feature it (Sheffer et al., 2015). This means that not all implementations and deployments of TLS offer PFS, and some provide it only with a very low encryption grade (Huang et al., 2014; DigiCert Inc, 2017;, 2015). For example, widely deployed web servers such as Apache and Nginx (DigiCert Inc, 2017), as well as most DHE- and ECDHE-enabled servers, suffer from weak PFS configurations (Huang et al., 2014; Adrian et al., 2015; Springall et al., 2016).

Functionality | Traditional solutions | anchor
Typical software | EJBCA (PKI) + OpenSSL (TLS) | anchor + iDVV + NaCl
Device identification | based on certificates; costs = issue a certificate | based on unique IDs controlled by the anchor; costs = register the device (assign a unique ID)
Device registration | based on certificates; costs = certificate installation + security control policy/service | registration protocol; costs = register the device + iDVV bootstrap
Device association & KeyGen | 12-step mutual handshake + DH for session keys (incl. certificate validation; any two devices can establish an association) | 6-step trust establishment with anchor + iDVVs per message, session, interval of time, … (anchor has to authorize association)
Security properties:
  Authenticity | ✓ | ✓
  Integrity | ✓ | ✓
  Confidentiality | ✓ | ✓
  PFS | ✓(*) | ✓
  PCS | ✗ | ✓
  PQS | ✗ | ✓
 | symmetric cryptography (cipher: AES256-SHA) | symmetric cryptography (cipher: Salsa20)
TLS stack | highly configurable and complex (717k LOC) | easy to use, simple (85k LOC), and efficient

Table 3. Traditional solutions vs. anchor

8. Related Work in SDN security

Related work on SDN security (see (Scott-Hayward et al., 2016; Kreutz et al., 2015; Dacier et al., 2017a; Yoon et al., 2017) for broad surveys) focuses on securing specific components of the architecture. In particular, as most attacks exploit vulnerabilities of the control plane, the security of the controller and of the applications running on top of it has received special attention. For instance, the Rosemary controller (Shin et al., 2014) implements a network application containment and resilience strategy that addresses the problem of malicious applications leading to loss of network control. Similarly, FortNOX (Porras et al., 2012), a software extension for the NOX controller, is robust to adversarial applications by providing role-based authorization and security constraint enforcement. While orthogonal to our work, these solutions could take advantage of anchor to implement some of their services. For instance, Rosemary requires a PKI for application authorisation that could be replaced by anchor, inheriting its advantages.

Another line of work in SDN security is devoted to DoS/DDoS attack detection and prevention. As an example, the use of lightweight information hiding based authentication (by means of secrecy through obscurity) has been proposed as one way of protecting SDN controllers from this type of attack (Abdullaziz et al., 2016). The idea is to use a specific field in the IP protocol to hide the switch authentication ID. In order for the scheme to be workable, it is assumed that a look-up table and unique IDs are shared among devices through existing key distribution protocols. Again, this point solution could take advantage of anchor for this purpose.

Interestingly, not much attention has been paid to the security of control plane associations and communication between devices, one of the aspects we address in this paper. While TLS is the solution recommended by ONF, recent research discusses the strengths and weaknesses of this protocol as a means to provide authenticated and encrypted control channels (Samociuk, 2015), in line with many of the arguments we make here. As we explained, while TLS provides important security properties, it has an impact on control plane performance. Additionally, the complexity of infrastructure software has recurrently been pointed out as one of the main causes of a high number of reported vulnerabilities, which in many cases have led to security attacks (Zhou and Jiang, 2012; Markowsky, 2013; McGraw, 2004; Hoepman and Jacobs, 2007). As we argue in this paper, by logically centralizing crucial security mechanisms, anchor removes complexity from both controllers and switches, enhancing the robustness of the infrastructure while still achieving a gain in performance.

Finally, to protect control plane communications between controllers and forwarding devices our solution makes use of two existing mechanisms: iDVV (Kreutz et al., 2018, 2017), as a secure and low-cost method for generation of authentication codes, and NaCl (Bernstein et al., 2012), as a robust alternative to OpenSSL. We apply these solutions to SDN, but given their standalone nature they can be applied to different scenarios.

To our knowledge, an architectural approach such as the one we propose here (which ultimately led us to follow the SDN philosophy of “logical centralization”) was lacking. Importantly, this approach allowed us to gain a global perspective of the relevant gaps in SDN and of the limitations of existing solutions to the problem. This first step gave insight into one of the most relevant problems of SDN, as noted by the ONF and MEF security groups (ONF, 2017; MEF, 2017): the security of the associations and communications between devices, which, together with the architecture itself, is one of the main contributions of our paper.

9. Discussion

We briefly discuss how we filled the gaps identified in Section 3. We also show, in Appendix E, to what extent these solutions cover eleven of ONF's security requirements. We conclude the section with a critique of our choices and results.

9.1. Meeting the challenges

Security vs. performance? Control channels need to provide high performance (high throughput and low latency) while keeping the communication secure. However, as shown before, security primitives have a non-negligible impact on performance. To mitigate this problem, we selected appropriate cryptographic primitives (SHA512), libraries (NaCl), and key generation mechanisms (iDVV) to secure control plane communications while maintaining high performance. By logically centralizing the fundamental aspects of these mechanisms in anchor, the performance overhead introduced in forwarding devices and controllers is limited, as they require only minimal functionality to ‘hook’ to the anchor instructions.

Complexity vs. robustness? Traditional implementations of SSL/TLS, such as OpenSSL, have a large, complex code base that recurrently leads to the discovery of vulnerabilities. Similar problems are observed in PKI subsystems. It is well known that an effective means to achieve robustness is to reduce complexity. Hence our choice of the NaCl and iDVV mechanisms to help fill the gap: they are lightweight (small code base), efficient, yet secure alternatives to OpenSSL-like implementations. As such, they are a robust solution for providing authentication and authorisation material for the secure communication protocols we propose. They are also amenable to verification mechanisms aimed at assuring correctness, which are much harder to employ on very large code bases. Again, centralizing the non-functional mechanisms introduced in our solution is the key to reducing the complexity of networking devices, improving their robustness.

Global security policies? We have argued that controllers and network devices often employ suboptimal network authentication and secure communication mechanisms, despite recommendations to the contrary from ONF and other such organizations. This problem is very similar in nature to the state of affairs in networking before SDN. In traditional networks, enforcing relatively “simple” policies such as access control rules (Casado et al., 2007) or traffic engineering mechanisms (Jain et al., 2013) was either very hard or even impossible in practice. Given the current undesirable state of affairs, we believe the same to be true for non-functional properties, with security as a prominent example. Our logically centralized anchor architecture addresses this gap by providing a means for defining centralized policy rules, together with the mechanisms needed to enforce them, permeating the SDN architecture in a global and coherent way.

Resilient roots-of-trust? We argued that there is a (probably small) number of functions that should not be left to ad-hoc implementations, due to their criticality for system correctness. The list is not closed, but we hope to have shown that strong sources of entropy and resilient indistinguishable-from-random number generators are clear examples of difficult-to-get-right mechanisms that benefit from such a logically centralized approach. anchor addresses this issue by providing the motivation to design and verify careful and resilient once-and-for-all implementations of such root-of-trust mechanisms, which can then be reinstantiated in different SDN deployments.

9.2. Devil’s advocate analysis

Doesn’t the logical centralization of non-functional properties create a single point of failure?

The results of this paper already go a long way to improving robustness of a single root-of-trust, compared to the state of the art: lowering failure probability; mitigating and recovering from the consequences of failure. The logical next step would be to try and prevent failures in the first place. However, the failure of a simplex system of reasonable complexity cannot be prevented.

Nevertheless, note that logical centralization is not necessarily physical centralization. Indeed, our plan for future work, for which our architecture was drafted to pave the way, is to leverage state-of-the-art security and dependability mechanisms based on replication. For instance, to achieve tolerance of Byzantine faults, we can readily enhance anchor by replication, taking advantage of state machine replication libraries such as BFT-SMaRt (Bessani et al., 2014), replicating and diversifying components to prevent failure of this logically central point with the desired confidence. These concepts have been applied to root-of-trust-like configurations similar to anchor (Zhou et al., 2002; Cachin and Samar, 2004; Kreutz et al., 2014). Furthermore, systems designed with state machine replication in mind can also handle different types of threats, such as DoS, without compromising the operation of the service (Kreutz et al., 2016).
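As a worked example of what replication would cost, Byzantine fault-tolerant state machine replication protocols, including the one implemented by BFT-SMaRt, need n ≥ 3f + 1 replicas to tolerate f arbitrarily faulty replicas, and use quorums of ⌈(n + f + 1)/2⌉ replicas so that any two quorums overlap in at least one correct replica. The Python helpers below simply encode these bounds; their names are hypothetical.

```python
def bft_replicas(f: int) -> int:
    """Smallest replica group that tolerates f Byzantine replicas: n = 3f + 1."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def byzantine_quorum(n: int, f: int) -> int:
    """Quorum size ceil((n + f + 1) / 2); any two such quorums share a correct replica."""
    if n < bft_replicas(f):
        raise ValueError("n must be at least 3f + 1")
    return (n + f) // 2 + 1
```

For instance, tolerating a single compromised anchor replica requires four replicas, with quorums of three; tolerating two requires seven, with quorums of five.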

Won’t the natural hardware evolution be by itself enough to reduce the penalty imposed by cryptographic primitives? Recent experience seems to contradict this assertion: hardware evolution alone does not seem enough, for several reasons. First, new hardware architectures also benefit existing software-based solutions, so the relative overhead of heavier primitives persists. For instance, both NaCl and OpenSSL take advantage of hardware-based AES accelerators. Second, as is well known, the “free lunch” of ever-faster hardware seems to be coming to an end (IEEE Spectrum, 2015). This is evidenced by major IT companies, such as Google and Microsoft, redesigning existing software to cope with this problem (Livshits et al., 2015).

Aren’t traditional PKI and TLS implementations enough? Following what is increasingly advocated by many in industry and in the security community, we have argued that the simplicity and size of software and IT infrastructure matter (Cisco, 2014; Verizon, 2015). Higher complexity has been shown to lead inevitably to an increased likelihood of bugs and security incidents in software. Indeed, different implementations of PKI and TLS have recently been used as powerful “weapons” for cyber-attacks and cyber-espionage (PwC, CSO magazine and CERT/CMU, 2014; Bocek, 2015), raising concerns about their robustness. This does not mean that PKI and TLS are “broken”; we believe they remain fundamental to various IT infrastructures. However, since the main challenges of securing SDN are usually confined to a single network domain, we have come to understand that simpler, domain-specific solutions seem preferable in this environment to complex infrastructures such as the PKI and large code bases such as OpenSSL.

Wouldn’t the use of out-of-band control channels solve most problems? Out-of-band channels may be useful in some contexts, but they are not “intrinsically” secure. It is a common mistake to regard physical isolation, per se, as a form of security. Several studies have in fact argued the opposite: that out-of-band channels worsen the problem by making control plane management more complex and less flexible, endangering control plane communications (Edwards, 2014; Manousakis and Ellinas, 2015). We do not take a stance in this discussion, but real incidents, such as the NSA sniffing traffic on the links between Google’s data centers (Schneier, 2015), are clear examples that out-of-band channels are not, per se, enough.

9.3. Other use cases of ANCHOR

Using ANCHOR beyond control plane communications. As already alluded to in Section 8, ANCHOR can be extended to support other use cases. For instance, an application running on top of the SDN controller could be required to present proper credentials to identify itself. Once successfully authenticated, it would have access to a specific set of system attributes defined by the operator during registration (e.g., read, write, and notify, among other system calls (Ferguson et al., 2013; Aliyu et al., 2017)). To this end, different controllers could rely on authentication and authorization attributes globally enforced by ANCHOR. Another interesting use case for ANCHOR would be to offer security support for controller clustering. This is a timely problem: for example, the current release of OpenDaylight does not provide encryption or authentication of the control messages exchanged among controller instances (OpenDaylight Project, 2018). Since each controller instance would need to be registered with ANCHOR, the same security mechanisms and services granted to the southbound connection could be provided to secure the east-west communication between controllers.
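As an illustration of the kind of east-west protection such a registration could enable, the following sketch authenticates messages between two controller instances with HMAC-SHA256 and a monotonically increasing counter for replay protection. It assumes, hypothetically, that ANCHOR has already provisioned a shared 32-byte key to both instances; the names `seal` and `Receiver` and the message layout (8-byte counter, payload, 32-byte tag) are illustrative, not part of ANCHOR or OpenDaylight. It covers authentication only; confidentiality would additionally require an authenticated encryption scheme.

```python
import hashlib
import hmac
import os
import struct

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def seal(key: bytes, counter: int, payload: bytes) -> bytes:
    """Prefix a 64-bit big-endian counter and append an HMAC-SHA256 tag."""
    header = struct.pack(">Q", counter)
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


class Receiver:
    """Verifies sealed messages from one peer controller and rejects replays."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.last_counter = -1

    def open(self, message: bytes) -> bytes:
        if len(message) < 8 + TAG_LEN:
            raise ValueError("message too short")
        body, tag = message[:-TAG_LEN], message[-TAG_LEN:]
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        # Constant-time comparison avoids leaking how many tag bytes matched.
        if not hmac.compare_digest(tag, expected):
            raise ValueError("authentication failed")
        (counter,) = struct.unpack(">Q", body[:8])
        # Accept only strictly increasing counters: old messages are replays.
        if counter <= self.last_counter:
            raise ValueError("replayed or reordered message")
        self.last_counter = counter
        return body[8:]


if __name__ == "__main__":
    key = os.urandom(32)  # stands in for a key provisioned by ANCHOR (assumption)
    receiver = Receiver(key)
    message = seal(key, 1, b"cluster-state-update")
    print(receiver.open(message))  # b'cluster-state-update'
    try:
        receiver.open(message)  # same counter delivered again
    except ValueError as err:
        print(err)  # replayed or reordered message
```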

Addressing other non-functional properties of SDN. The design of ANCHOR is generic enough to accommodate non-functional properties beyond security, such as dependability or quality of service. Regarding dependability, ANCHOR could help modularize the problem of replicated control. Specifically, ANCHOR could be responsible for coordination between controller replicas, for instance by guaranteeing a strongly consistent view of the network across all instances. As in our security use case, the additional modularity of such a design would allow a clean separation of concerns that could simplify the design of the various components. Recent proposals (Botelho et al., 2016) have indeed started following a similar design choice. ANCHOR could also provide trusted measurement services to ensure a certain level of service even in the presence of malicious forwarding devices. For instance, once a malicious forwarding device is detected (Chi et al., 2015; Kamisiński and Fung, 2015), ANCHOR could automatically remove it from the list of legitimate devices and notify the network’s controllers, which would be registered to receive such events, forcing them to disconnect the device. The subsequent topology updates on the controllers would trigger automatic traffic re-routing to preserve the quality of service of applications.
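The revocation flow described above is essentially a publish/subscribe pattern. The sketch below models it under stated assumptions: `DeviceRegistry` and `Controller` are hypothetical classes, not ANCHOR’s actual interfaces, and the detection of the malicious device (for example, by the cited detectors) is taken as given.

```python
from typing import Callable, Dict, Set

RevocationHandler = Callable[[str], None]


class DeviceRegistry:
    """Hypothetical ANCHOR-side list of legitimate forwarding devices."""

    def __init__(self) -> None:
        self._legitimate: Set[str] = set()
        self._subscribers: Dict[str, RevocationHandler] = {}

    def register_device(self, device_id: str) -> None:
        self._legitimate.add(device_id)

    def subscribe(self, controller_id: str, handler: RevocationHandler) -> None:
        # Controllers register to be told when a device loses legitimacy.
        self._subscribers[controller_id] = handler

    def is_legitimate(self, device_id: str) -> bool:
        return device_id in self._legitimate

    def revoke(self, device_id: str) -> None:
        """Remove a device flagged as malicious and notify every controller."""
        if device_id not in self._legitimate:
            return
        self._legitimate.discard(device_id)
        for handler in self._subscribers.values():
            handler(device_id)


class Controller:
    """Toy controller that drops revoked devices from its topology view."""

    def __init__(self, name: str, registry: DeviceRegistry) -> None:
        self.name = name
        self.topology: Set[str] = set()
        registry.subscribe(name, self.on_revoked)

    def connect(self, registry: DeviceRegistry, device_id: str) -> bool:
        # Only devices currently listed as legitimate may join the topology.
        if not registry.is_legitimate(device_id):
            return False
        self.topology.add(device_id)
        return True

    def on_revoked(self, device_id: str) -> None:
        # Disconnect the device; a real controller would then recompute routes.
        self.topology.discard(device_id)
```

Keeping the authoritative list of legitimate devices in one place means every subscribed controller learns of a revocation at the same time, rather than each one maintaining its own blacklist.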

10. Concluding remarks

In this paper, we discussed the problem of enforcing non-functional properties, such as security or dependability, in SDN. Re-iterating the successful philosophy behind the inception of SDN itself, we advocate logically centralizing the provision of SDN non-functional properties, which we materialize as the blueprint of an architectural framework, ANCHOR.

Taking ‘security’ as a proof-of-concept use case, we have shown the effectiveness of our proposal. We performed a gap analysis of SDN security and proposed solutions, populating the ANCHOR middleware with crucial mechanisms and services that fill those gaps and enhance the security of SDN.

We evaluated the architecture with a particular focus on the security-performance tradeoff, giving proofs of the algorithms, cryptographic robustness analyses, and experimental performance evaluations. By resorting to lightweight yet secure primitives, we outperform the most widely used encryption configuration of OpenSSL by 50%, while providing a higher level of security. Our solution also fulfills eleven of the security requirements recommended by ONF.

The mechanisms we propose are certainly not the final answer to SDN security problems, nor is that our claim. We do believe, however, and hope to have justified in the paper, that an architecture that logically centralizes the non-functional properties of an SDN has the potential to address some of the most prominent unsolved problems regarding the robustness of the infrastructure. We thus hope our work will trigger an important discussion on these fundamental architectural aspects of SDN.


  • Abdullaziz et al. (2016) O. I. Abdullaziz, Y. J. Chen, and L. C. Wang. 2016. Lightweight Authentication Mechanism for Software Defined Network Using Information Hiding. In 2016 IEEE Global Communications Conference (GLOBECOM). 1–6.
  • Adrian et al. (2015) David Adrian, Karthikeyan Bhargavan, Zakir Durumeric, Pierrick Gaudry, Matthew Green, J. Alex Halderman, Nadia Heninger, Drew Springall, Emmanuel Thomé, Luke Valenta, Benjamin VanderSloot, Eric Wustrow, Santiago Zanella-Béguelin, and Paul Zimmermann. 2015. Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS ’15). ACM, New York, NY, USA, 5–17.
  • Akhunzada et al. (2015) Adnan Akhunzada, Ejaz Ahmed, Abdullah Gani, Muhammad Khurram Khan, Muhammad Imran, and Sghaier Guizani. 2015. Securing software defined networks: taxonomy, requirements, and open issues. IEEE Communications Magazine 53, 4 (2015), 36–44.
  • Al-Fares et al. (2008) Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. 2008. A Scalable, Commodity Data Center Network Architecture. SIGCOMM Comput. Commun. Rev. 38, 4 (Aug. 2008), 63–74.
  • Albrecht et al. (2015) Martin R Albrecht, Davide Papini, Kenneth G Paterson, and Ricardo Villanueva-Polanco. 2015. Factoring 512-bit RSA moduli for fun (and a profit of $9,000). (2015).
  • Aliyu et al. (2017) A. L. Aliyu, P. Bull, and A. Abdallah. 2017. A Trust Management Framework for Network Applications within an SDN Environment. In 2017 31st International Conference on Advanced Information Networking and Applications Workshops (WAINA). 93–98.
  • Almeida et al. (2013) J. Bacelar Almeida, Manuel Barbosa, Jorge S. Pinto, and Barbara Vieira. 2013. Formal verification of side-channel countermeasures using self-composition. Science of Computer Programming 78, 7 (2013), 796–812. Special section on Formal Methods for Industrial Critical Systems (FMICS 2009 + FMICS 2010) & Special section on Object-Oriented Programming and Systems (OOPS 2009), a special track at the 24th ACM Symposium on Applied Computing.
  • Alvizu et al. (2017) R. Alvizu, G. Maier, N. Kukreja, A. Pattavina, R. Morro, A. Capello, and C. Cavazzoni. 2017. Comprehensive survey on T-SDN: Software-defined Networking for Transport Networks. IEEE Communications Surveys & Tutorials PP, 99 (2017), 1–1.
  • Anchor (2018) Anchor. 2018. Tamarin models for ANCHOR. (2018).
  • Arbettu et al. (2016) R. K. Arbettu, R. Khondoker, K. Bayarou, and F. Weber. 2016. Security analysis of OpenDaylight, ONOS, Rosemary and Ryu SDN controllers. In 2016 17th International Telecommunications Network Strategy and Planning Symposium (Networks). 37–44.
  • Arnaud and Fouque (2013) Cyril Arnaud and Pierre-Alain Fouque. 2013. Timing Attack against Protected RSA-CRT Implementation Used in PolarSSL. In Topics in Cryptology - CT-RSA 2013, Ed Dawson (Ed.). Lecture Notes in Computer Science, Vol. 7779. Springer Berlin Heidelberg, 18–33.
  • Bassham et al. (2010) Lawrence E. Bassham, III, Andrew L. Rukhin, Juan Soto, James R. Nechvatal, Miles E. Smid, Elaine B. Barker, Stefan D. Leigh, Mark Levenson, Mark Vangel, David L. Banks, Nathanael Alan Heckert, James F. Dray, and San Vo. 2010. SP 800-22 Rev. 1a. A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications. Technical Report. Gaithersburg, MD, United States.
  • Benson et al. (2010a) Theophilus Benson, Aditya Akella, and David A. Maltz. 2010a. Network Traffic Characteristics of Data Centers in the Wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC ’10). ACM, New York, NY, USA, 267–280.
  • Benson et al. (2010b) Theophilus Benson, Aditya Akella, and David A. Maltz. 2010b. Network Traffic Characteristics of Data Centers in the Wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC ’10). ACM, New York, NY, USA, 267–280.
  • Benson et al. (2010c) Theophilus Benson, Ashok Anand, Aditya Akella, and Ming Zhang. 2010c. Understanding Data Center Traffic Characteristics. SIGCOMM Comput. Commun. Rev. 40, 1 (Jan. 2010), 92–99.
  • Berde et al. (2014) Pankaj Berde, Matteo Gerola, Jonathan Hart, Yuta Higuchi, Masayoshi Kobayashi, Toshio Koide, Bob Lantz, Brian O’Connor, Pavlin Radoslavov, William Snow, et al. 2014. ONOS: towards an open, distributed SDN OS. In Proceedings of the third workshop on Hot topics in software defined networking. ACM, 1–6.
  • Bernstein et al. (2012) Daniel J. Bernstein, Tanja Lange, and Peter Schwabe. 2012. The Security Impact of a New Cryptographic Library. In Progress in Cryptology - LATINCRYPT 2012, Alejandro Hevia and Gregory Neven (Eds.). Lecture Notes in Computer Science, Vol. 7533. Springer Berlin Heidelberg, 159–176.
  • Bernstein (2009) Daniel J. Bernstein. 2009. Introduction to post-quantum cryptography. Springer Berlin Heidelberg, Berlin, Heidelberg, 1–14.
  • Bernstein et al. (2016) Daniel J Bernstein, Tanja Lange, and Ruben Niederhagen. 2016. Dual EC: a standardized back door. In The New Codebreakers. Springer, 256–281.
  • Bessani et al. (2014) A. Bessani, J. Sousa, and E. E. P. Alchieri. 2014. State Machine Replication for the Masses with BFT-SMART. In 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks. 355–362.
  • Beurdouche et al. (2015) Benjamin Beurdouche, Karthikeyan Bhargavan, Antoine Delignat-Lavaud, Cédric Fournet, Markulf Kohlweiss, Alfredo Pironti, Pierre-Yves Strub, and Jean Karim Zinzindohoue. 2015. A messy state of the union: Taming the composite state machines of TLS. In 2015 IEEE Symposium on Security and Privacy. IEEE, 535–552.
  • Bhargavan et al. (2017) Karthikeyan Bhargavan, Barry Bond, Antoine Delignat-Lavaud, Cédric Fournet, Chris Hawblitzel, Catalin Hritcu, Samin Ishtiaq, Markulf Kohlweiss, Rustan Leino, Jay Lorch, et al. 2017. Everest: Towards a Verified, Drop-in Replacement of HTTPS. In LIPIcs-Leibniz International Proceedings in Informatics, Vol. 71. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
  • Bhargavan et al. (2013) Karthikeyan Bhargavan, Cédric Fournet, Markulf Kohlweiss, Alfredo Pironti, and Pierre-Yves Strub. 2013. Implementing TLS with verified cryptographic security. In Security and Privacy (SP), 2013 IEEE Symposium on. IEEE, 445–459.
  • Bocek (2015) Kevin Bocek. 2015. Infographic: How an Attack by a Cyber-espionage Operator Bypassed Security Controls. (Jan. 2015).
  • Botelho et al. (2016) Fábio Botelho, Tulio A Ribeiro, Paulo Ferreira, Fernando MV Ramos, and Alysson Bessani. 2016. Design and Implementation of a Consistent Data Store for a Distributed SDN Control Plane. In Dependable Computing Conference (EDCC), 2016 12th European. IEEE, 169–180.
  • Brumley and Tuveri (2011) Billy Bob Brumley and Nicola Tuveri. 2011. Remote Timing Attacks Are Still Practical. In Computer Security - ESORICS 2011, Vijay Atluri and Claudia Diaz (Eds.). Lecture Notes in Computer Science, Vol. 6879. Springer Berlin Heidelberg, 355–371.
  • Buhov et al. (2015) D. Buhov, M. Huber, G. Merzdovnik, E. Weippl, and V. Dimitrova. 2015. Network Security Challenges in Android Applications. In 2015 10th International Conference on Availability, Reliability and Security. 327–332.
  • Cachin and Samar (2004) C. Cachin and A. Samar. 2004. Secure distributed DNS. In International Conference on Dependable Systems and Networks, 2004. 423–432.
  • Casado et al. (2007) Martin Casado, Michael J. Freedman, Justin Pettit, Jianying Luo, Nick McKeown, and Scott Shenker. 2007. Ethane: Taking Control of the Enterprise. In Proceedings of the 2007 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM ’07).
  • Chi et al. (2015) Po-Wen Chi, Chien-Ting Kuo, Jing-Wei Guo, and Chin-Laung Lei. 2015. How to detect a compromised SDN switch. In Proceedings of the 2015 1st IEEE Conference on Network Softwarization (NetSoft). 1–6.
  • Cisco (2014) Cisco. 2014. Annual Security Report. (2014).
  • Cromwell (2017) Bob Cromwell. 2017. Massive Failures of Internet PKI. (2017).
  • Dacier et al. (2017a) M. C. Dacier, H. Konig, R. Cwalinski, F. Kargl, and S. Dietrich. 2017a. Security Challenges and Opportunities of Software-Defined Networking. IEEE Security & Privacy 15, 2 (March 2017), 96–100.
  • Dacier et al. (2017b) Marc C Dacier, Hartmut König, Radoslaw Cwalinski, Frank Kargl, and Sven Dietrich. 2017b. Security Challenges and Opportunities of Software-Defined Networking. IEEE Security & Privacy 15, 2 (2017), 96–100.
  • DigiCert Inc (2017) DigiCert Inc. 2017. Enabling Perfect Forward Secrecy. (2017).
  • Dodis et al. (2013) Yevgeniy Dodis, David Pointcheval, Sylvain Ruhault, Damien Vergniaud, and Daniel Wichs. 2013. Security Analysis of Pseudo-random Number Generators with Input: /dev/random is Not Robust. In Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security (CCS ’13). ACM, New York, NY, USA, 647–658.
  • Dowling et al. (2016) Benjamin Dowling, Douglas Stebila, and Greg Zaverucha. 2016. Authenticated Network Time Synchronization. In 25th USENIX Security Symposium (USENIX Security 16). USENIX Association, Austin, TX, 823–840.
  • Edwards (2014) Chris Edwards. 2014. Researchers probe security through obscurity. Commun. ACM 57, 8 (2014), 11–13.
  • Egele et al. (2013) Manuel Egele, David Brumley, Yanick Fratantonio, and Christopher Kruegel. 2013. An Empirical Study of Cryptographic Misuse in Android Applications. In Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security (CCS ’13). ACM, New York, NY, USA, 73–84.
  • Fan et al. (2016) Shuqin Fan, Wenbo Wang, and Qingfeng Cheng. 2016. Attacking OpenSSL Implementation of ECDSA with a Few Signatures. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 1505–1515.
  • Ferguson et al. (2013) Andrew D. Ferguson, Arjun Guha, Chen Liang, Rodrigo Fonseca, and Shriram Krishnamurthi. 2013. Participatory networking: an API for application control of SDNs. In Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM (SIGCOMM ’13). ACM, New York, NY, USA, 327–338.
  • Ferguson et al. (2011) Niels Ferguson, Bruce Schneier, and Tadayoshi Kohno. 2011. Cryptography engineering: design principles and practical applications. John Wiley & Sons.
  • Google (2017) Google. 2017. Protocol Buffers. (2017).
  • Greenberg et al. (2009) Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. 2009. VL2: A Scalable and Flexible Data Center Network. SIGCOMM Comput. Commun. Rev. 39, 4 (Aug. 2009), 51–62.
  • Greenberg et al. (2008) Albert Greenberg, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. 2008. Towards a Next Generation Data Center Architecture: Scalability and Commoditization. In Proceedings of the ACM Workshop on Programmable Routers for Extensible Services of Tomorrow (PRESTO ’08). ACM, New York, NY, USA, 57–62.
  • Hastings et al. (2016) Marcella Hastings, Joshua Fried, and Nadia Heninger. 2016. Weak Keys Remain Widespread in Network Devices. In Proceedings of the 2016 ACM on Internet Measurement Conference. ACM, 49–63.
  • Heninger et al. (2012) Nadia Heninger, Zakir Durumeric, Eric Wustrow, and J. Alex Halderman. 2012. Mining Your Ps and Qs: Detection of Widespread Weak Keys in Network Devices. In Proceedings of the 21st USENIX Conference on Security Symposium (Security’12). USENIX Association, Berkeley, CA, USA, 35–35.
  • Hill (2013) Brad Hill. 2013. Failures of Trust in the Online PKI Marketplace Cannot be Fixed by "Raising the Bar" on Certificate Authority Security. (2013).
  • Ho et al. (2003) Yu-Chi Ho, Qian-Chuan Zhao, and D. L. Pepyne. 2003. The no free lunch theorems: complexity and security. IEEE Trans. Automat. Control 48, 5 (2003), 783–793.
  • Hoepman and Jacobs (2007) Jaap-Henk Hoepman and Bart Jacobs. 2007. Increased Security Through Open Source. Commun. ACM 50, 1 (Jan. 2007), 79–83.
  • Huang et al. (2014) L. S. Huang, S. Adhikarla, D. Boneh, and C. Jackson. 2014. An Experimental Study of TLS Forward Secrecy Deployments. IEEE Internet Computing 18, 6 (Nov 2014), 43–51.
  • IEEE Spectrum (2015) IEEE Spectrum. 2015. Special Report: 50 Years of Moore’s Law. (2015).
  • Jain et al. (2013) Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh, Subbaiah Venkata, Jim Wanderer, Junlan Zhou, Min Zhu, Jon Zolla, Urs Hölzle, Stephen Stuart, and Amin Vahdat. 2013. B4: experience with a globally-deployed software defined wan. In Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM (SIGCOMM ’13). ACM, New York, NY, USA, 3–14.
  • Kamisiński and Fung (2015) Andrzej Kamisiński and Carol Fung. 2015. FlowMon: Detecting Malicious Switches in Software-Defined Networks. In Proceedings of the 2015 Workshop on Automated Decision Making for Active Cyber Defense (SafeConfig ’15). ACM, New York, NY, USA, 39–45.
  • Katta et al. (2015) Naga Katta, Haoyu Zhang, Michael Freedman, and Jennifer Rexford. 2015. Ravana: Controller Fault-tolerance in Software-defined Networking. In Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research (SOSR ’15).
  • Khattak et al. (2014) Z. K. Khattak, M. Awais, and A. Iqbal. 2014. Performance evaluation of OpenDaylight SDN controller. In 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS). 671–676.
  • Kim et al. (2013) Soo Hyeon Kim, Daewan Han, and Dong Hoon Lee. 2013. Predictability of Android OpenSSL’s Pseudo Random Number Generator. In Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security (CCS ’13). ACM, New York, NY, USA, 659–668.
  • Kiravuo et al. (2013) Timo Kiravuo, Mikko Sarela, and Jukka Manner. 2013. A Survey of Ethernet LAN Security. IEEE Communications Surveys & Tutorials 15, 3 (2013), 1477–1491.
  • Klein et al. (2009) Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. 2009. seL4: Formal Verification of an OS Kernel. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP ’09). ACM, New York, NY, USA, 207–220.
  • Kloti et al. (2013) Rowan Kloti, Vasileios Kotronis, and Paul Smith. 2013. OpenFlow: A security analysis. In Network Protocols (ICNP), 2013 21st IEEE International Conference on. IEEE, 1–6.
  • Koponen et al. (2010) Teemu Koponen, Martin Casado, Natasha Gude, Jeremy Stribling, Leon Poutievski, Min Zhu, Rajiv Ramanathan, Yuichiro Iwata, Hiroaki Inoue, Takayuki Hama, and Scott Shenker. 2010. Onix: a distributed control platform for large-scale production networks. In OSDI.
  • Kreutz et al. (2014) D. Kreutz, A. Bessani, E. Feitosa, and H. Cunha. 2014. Towards Secure and Dependable Authentication and Authorization Infrastructures. In 2014 IEEE 20th Pacific Rim International Symposium on Dependable Computing. 43–52.
  • Kreutz et al. (2016) Diego Kreutz, Oleksandr Malichevskyy, Eduardo Feitosa, Hugo Cunha, Rodrigo da Rosa Righi, and Douglas D.J. de Macedo. 2016. A cyber-resilient architecture for critical security services. Journal of Network and Computer Applications 63 (2016), 173–189.
  • Kreutz et al. (2015) D. Kreutz, F.M.V. Ramos, P. Esteves Verissimo, C. Esteve Rothenberg, S. Azodolmolky, and S. Uhlig. 2015. Software-Defined Networking: A Comprehensive Survey. Proc. IEEE 103, 1 (Jan 2015), 14–76.
  • Kreutz et al. (2013) Diego Kreutz, Fernando M.V. Ramos, and Paulo Verissimo. 2013. Towards Secure and Dependable Software-defined Networks. In Proceedings of the Second ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN ’13). ACM, New York, NY, USA, 55–60.
  • Kreutz et al. (2017) D. Kreutz, J. Yu, P. Esteves-Verissimo, C. Magalhaes, and F. M. V. Ramos. 2017. The KISS principle in Software-Defined Networking: An architecture for Keeping It Simple and Secure. ArXiv e-prints (Nov. 2017). arXiv:cs.NI/1702.04294
  • Kreutz et al. (2018) D. Kreutz, J. Yu, P. Esteves-Verissimo, C. Magalhaes, and F. M. V. Ramos. 2018. The KISS principle in Software-Defined Networking: a framework for secure communications. IEEE Security & Privacy (2018). Accepted for publication.
  • Lee et al. (2017) Seungsoo Lee, Changhoon Yoon, Chanhee Lee, Seungwon Shin, Vinod Yegneswaran, and Phillip Porras. 2017. DELTA: A security assessment framework for software-defined networks. In Proceedings of NDSS, Vol. 17.
  • Livshits et al. (2015) Benjamin Livshits, Manu Sridharan, Yannis Smaragdakis, Ondřej Lhoták, J. Nelson Amaral, Bor-Yuh Evan Chang, Samuel Z. Guyer, Uday P. Khedker, Anders Møller, and Dimitrios Vardoulakis. 2015. In Defense of Soundiness: A Manifesto. Commun. ACM 58, 2 (Jan. 2015), 44–46.
  • Mahu et al. (2015) D. Mahu, V. Dumitrel, and F. Pop. 2015. Secure Entropy Gatherer. In 2015 20th International Conference on Control Systems and Computer Science. 185–190.
  • Malhotra et al. (2015) Aanchal Malhotra, Isaac E Cohen, Erik Brakke, and Sharon Goldberg. 2015. Attacking the Network Time Protocol. IACR Cryptology ePrint Archive 2015 (2015), 1020.
  • Manousakis and Ellinas (2015) Konstantinos Manousakis and Georgios Ellinas. 2015. Attack-aware planning of transparent optical networks. Optical Switching and Networking (2015).
  • Markowsky (2013) G. Markowsky. 2013. Was the 2006 Debian SSL Debacle a system accident?. In 2013 IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS), Vol. 02. 624–629.
  • McGraw (2004) G. McGraw. 2004. Software security. IEEE Security & Privacy 2, 2 (Mar 2004), 80–83.
  • MEF (2017) MEF. 2017. MEF. (2017).
  • Meier et al. (2013) Simon Meier, Benedikt Schmidt, Cas Cremers, and David A. Basin. 2013. The TAMARIN Prover for the Symbolic Analysis of Security Protocols. In CAV 2013, Saint Petersburg, Russia, July 13-19, 2013. 696–701.
  • Mimoso (2016) Michael Mimoso. 2016. GPG Patches 18-Year-Old Libgcrypt RNG Bug. (2016).
  • (2015) 2015. Cipher Suites Configuration (and forcing Perfect Forward Secrecy). (2015).
  • Naylor et al. (2014) David Naylor, Alessandro Finamore, Ilias Leontiadis, Yan Grunenberger, Marco Mellia, Maurizio Munafo, Konstantina Papagiannaki, and Peter Steenkiste. 2014. The Cost of the "S" in HTTPS. In Proceedings of the Tenth ACM Conference on Emerging Networking Experiments and Technologies (CoNEXT ’14). ACM, New York, NY, USA, 7.
  • Needham and Schroeder (1978) Roger M. Needham and Michael D. Schroeder. 1978. Using Encryption for Authentication in Large Networks of Computers. Commun. ACM 21, 12 (Dec. 1978).
  • NIST (2017) NIST. 2017. NIST Statistical Test Suite. (2017).
  • ONF (2014) ONF. 2014. OpenFlow Switch Specification (Version 1.5.0). (Dec. 2014).
  • ONF (2015) ONF. 2015. Principles and Practices for Securing Software-Defined Networks. Technical Report. Open Networking Foundation. ONF TR-511.
  • ONF (2017) ONF. 2017. Open Networking Foundation. (2017).
  • OpenDaylight Project (2018) OpenDaylight Project. 2018. Security Considerations. (2018).
  • (2016) 2016. OpenSSL Security Advisory [10 Nov 2016]. (Nov. 2016).
  • Otway and Rees (1987) Dave Otway and Owen Rees. 1987. Efficient and Timely Mutual Authentication. SIGOPS Oper. Syst. Rev. 21, 1 (Jan. 1987).
  • Perrig et al. (2002) Adrian Perrig, Robert Szewczyk, J. D. Tygar, Victor Wen, and David E. Culler. 2002. SPINS: Security Protocols for Sensor Networks. Wirel. Netw. 8, 5 (Sept. 2002), 521–534.
  • Ponemon Institute Research (2018) Ponemon Institute Research. 2018. The Cost & Consequences of Security Complexity. (2018).
  • Porras et al. (2012) Philip Porras, Seungwon Shin, Vinod Yegneswaran, Martin Fong, Mabry Tyson, and Guofei Gu. 2012. A security enforcement kernel for OpenFlow networks. In HotSDN. ACM, 6.
  • PwC, CSO magazine and CERT/CMU (2014) PwC, CSO magazine and CERT/CMU. 2014. US cybercrime: Rising risks, reduced readiness. Technical Report. PwC. 21 pages.
  • Razaghpanah et al. (2017) Abbas Razaghpanah, Arian Akhavan Niaki, Narseo Vallina-Rodriguez, Srikanth Sundaresan, Johanna Amann, and Phillipa Gill. 2017. Studying TLS Usage in Android Apps. In Proceedings of the 13th ACM Conference on Emerging Networking Experiments and Technologies (CoNEXT ’17). ACM, New York, NY, USA, 7.
  • Ros and Ruiz (2014) Francisco Javier Ros and Pedro Miguel Ruiz. 2014. Five nines of southbound reliability in software-defined networks. In Proceedings of the third workshop on Hot topics in software defined networking. ACM, 31–36.
  • Samociuk (2015) Dominik Samociuk. 2015. Secure communication between OpenFlow switches and controllers. AFIN 2015 (2015), 39.
  • Schneier (2012) Bruce Schneier. 2012. Lousy Random Numbers Cause Insecure Public Keys. (Feb 2012).
  • Schneier (2015) Bruce Schneier. 2015. Data and Goliath: The hidden battles to collect your data and control your world. WW Norton & Company.
  • Schonwalder and Marinov (2011) J. Schonwalder and V. Marinov. 2011. On the Impact of Security Protocols on the Performance of SNMP. Network and Service Management, IEEE Transactions on 8, 1 (March 2011), 52–64.
  • Scott-Hayward et al. (2016) S. Scott-Hayward, S. Natarajan, and S. Sezer. 2016. A Survey of Security in Software Defined Networks. IEEE Communications Surveys & Tutorials 18, 1 (Firstquarter 2016), 623–654.
  • Secci et al. (2017) Stefano Secci, Kamel Attou, Dung Chi Phung, Sandra Scott-Hayward, Dylan Smyth, Suchitra Vemuri, and You Wang. 2017. ONOS Security and Performance Analysis: Report No. 1. Technical Report. ONOS Project.
  • Shalimov et al. (2013) Alexander Shalimov, Dmitry Zuikov, Daria Zimarina, Vasily Pashkov, and Ruslan Smeliansky. 2013. Advanced Study of SDN/OpenFlow Controllers. In Proceedings of the 9th Central and Eastern European Software Engineering Conference in Russia (CEE-SECR ’13). ACM, New York, NY, USA, Article 1, 6 pages.
  • Sheffer et al. (2015) Y. Sheffer, R. Holz, and P. Saint-Andre. 2015. Recommendations for Secure Use of Transport Layer Security (TLS) and Datagram Transport Layer Security (DTLS). RFC 7525. (May 2015).
  • Shen et al. (2012) C. Shen, E. Nahum, H. Schulzrinne, and C. P. Wright. 2012. The Impact of TLS on SIP Server Performance: Measurement and Modeling. Networking, IEEE/ACM Transactions on 20, 4 (Aug 2012), 1217–1230.
  • Shin et al. (2013) Seungwon Shin, Phillip Porras, Vinod Yegneswaran, Martin Fong, Guofei Gu, and Mabry Tyson. 2013. FRESCO: Modular Composable Security Services for Software-Defined Networks. In Internet Society NDSS.
  • Shin et al. (2014) Seungwon Shin, Yongjoo Song, Taekyung Lee, Sangho Lee, Jaewoong Chung, Phillip Porras, Vinod Yegneswaran, Jisung Noh, and Brent Byunghoon Kang. 2014. Rosemary: A Robust, Secure, and High-performance Network Operating System. In Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS). To appear.
  • Singaravelu et al. (2006) Lenin Singaravelu, Calton Pu, Hermann Härtig, and Christian Helmuth. 2006. Reducing TCB Complexity for Security-sensitive Applications: Three Case Studies. SIGOPS Oper. Syst. Rev. 40, 4 (April 2006), 161–174.
  • Springall et al. (2016) Drew Springall, Zakir Durumeric, and J. Alex Halderman. 2016. Measuring the Security Harm of TLS Crypto Shortcuts. In Proceedings of the 2016 Internet Measurement Conference (IMC ’16). ACM, New York, NY, USA, 33–47.
  • Stark (2017) Philip B. Stark. 2017. Don’t Bet on your Random Number Generator. (Mar 2017).
  • Steinberg and Kauer (2010) Udo Steinberg and Bernhard Kauer. 2010. NOVA: A Microhypervisor-based Secure Virtualization Architecture. In Proceedings of the 5th European Conference on Computer Systems (EuroSys ’10). ACM, New York, NY, USA, 209–222.
  • Stenn (2015) Harlan Stenn. 2015. Securing Network Time Protocol. Commun. ACM 58, 2 (Jan. 2015), 48–51.
  • Vassilev and Hall (2014) Apostol Vassilev and Timothy A. Hall. 2014. The Importance of Entropy to Information Security. Computer 47, 2 (2014), 78–81.
  • Verizon (2015) Verizon. 2015. 2015 Data Breach Investigations Report. Technical Report. Verizon.
  • Wan et al. (2017) T. Wan, A. Abdou, and P. C. van Oorschot. 2017. A Framework and Comparative Analysis of Control Plane Security of SDN and Conventional Networks. ArXiv e-prints (March 2017). arXiv:cs.NI/1703.06992
  • Wang et al. (2014) Huangxin Wang, Quan Jia, Dan Fleck, Walter Powell, Fei Li, and Angelos Stavrou. 2014. A moving target DDoS defense mechanism. Computer Communications 46 (2014), 10–21.
  • Williams and Koller (2016) Dan Williams and Ricardo Koller. 2016. Unikernel monitors: extending minimalism outside of the box. In 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 16). USENIX Association.
  • Yao and Yin (2005) Frances F. Yao and Yiqun Lisa Yin. 2005. Design and Analysis of Password-Based Key Derivation Functions. In Topics in Cryptology - CT-RSA 2005, Alfred Menezes (Ed.). Lecture Notes in Computer Science, Vol. 3376. Springer Berlin Heidelberg, 245–261.
  • Yarom and Benger (2014) Yuval Yarom and Naomi Benger. 2014. Recovering OpenSSL ECDSA Nonces Using the FLUSH+RELOAD Cache Side-channel Attack. IACR Cryptology ePrint Archive 2014 (2014), 140.
  • Yoon et al. (2017) Changhoon Yoon, Seungsoo Lee, Heedo Kang, Taejune Park, Seungwon Shin, Vinod Yegneswaran, Phillip Porras, and Guofei Gu. 2017. Flow Wars: Systemizing the Attack Surface and Defenses in Software-Defined Networks. IEEE/ACM Transactions on Networking 25, 6 (2017), 3514–3530.
  • Yu et al. (2017a) Jiangshan Yu, Mark Ryan, and Cas Cremers. 2017a. DECIM: Detecting Endpoint Compromise In Messaging. Cryptology ePrint Archive, Report 2015/486. (2017).
  • Yu et al. (2017b) Jiangshan Yu, Mark Ryan, and Cas Cremers. 2017b. DECIM: Detecting Endpoint Compromise In Messaging. IEEE Trans. Information Forensics and Security (2017).
  • Yu and Ryan (2015) Jiangshan Yu and Mark Dermot Ryan. 2015. Device Attacker Models: Fact and Fiction. In Security Protocols XXIII - 23rd International Workshop, Cambridge, UK, March 31 - April 2, 2015, Revised Selected Papers. 158–167.
  • Zetter (2015) Kim Zetter. 2015. Researchers Solve Juniper Backdoor Mystery; Signs Point to NSA. (Dec 2015).
  • Zhao et al. (2015) Y. Zhao, L. Iannone, and M. Riguidel. 2015. On the performance of SDN controllers: A reality check. In 2015 IEEE Conference on Network Function Virtualization and Software Defined Network (NFV-SDN). 79–85.
  • Zhou et al. (2002) Lidong Zhou, Fred B. Schneider, and Robbert Van Renesse. 2002. COCA: A Secure Distributed Online Certification Authority. ACM Trans. Comput. Syst. 20, 4 (Nov. 2002), 329–368.
  • Zhou and Jiang (2012) Y. Zhou and X. Jiang. 2012. Dissecting Android Malware: Characterization and Evolution. In 2012 IEEE Symposium on Security and Privacy. 95–109.

Appendix A A source of strong entropy

Correctness. We argue that Algorithm 1 has the properties required of a source of strong entropy.

Lemma 1.

If the initial values of rand_bytes() and H(data) are indistinguishable from random, then the resulting initial external entropy (e_entropy - line 2) is indistinguishable from random. Then, the initial internal entropy (i_entropy - line 3) will also be indistinguishable from random.

Proof: Assuming that rand_bytes() uses one of the strongest entropy pools of the operating system, such as /dev/urandom, its output will be indistinguishable from random. Assuming that H is a cryptographically strong hash function, the output of H(data) will be indistinguishable from random for every distinct input data. Consequently, the XOR of rand_bytes() and H(data) yields an indistinguishable-from-random initial e_entropy. Next, the XOR of a new rand_bytes() output and e_entropy yields an indistinguishable-from-random initial i_entropy. In other words, both internal and external entropy are initialized with indistinguishable-from-random values.
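The initialization analysed in Lemma 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: os.urandom stands in for rand_bytes(), SHA-256 instantiates H, entropy values are 32 bytes long, and the helper names (xor, entropy_init, ENTROPY_LEN) are hypothetical rather than taken from the ANCHOR implementation.

```python
import hashlib
import os

ENTROPY_LEN = 32  # assumption: 256-bit values, the size of a SHA-256 digest


def rand_bytes(n: int = ENTROPY_LEN) -> bytes:
    """Stand-in for rand_bytes(): os.urandom reads the OS pool (e.g., /dev/urandom)."""
    return os.urandom(n)


def H(data: bytes) -> bytes:
    """Assumed instantiation of the cryptographic hash H."""
    return hashlib.sha256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("XOR operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def entropy_init(data: bytes) -> tuple[bytes, bytes]:
    """Return the initial (e_entropy, i_entropy) pair."""
    e_entropy = xor(rand_bytes(), H(data))    # line 2: OS randomness XOR hash of data
    i_entropy = xor(rand_bytes(), e_entropy)  # line 3: new OS randomness XOR e_entropy
    return e_entropy, i_entropy


if __name__ == "__main__":
    e, i = entropy_init(b"hypothetical boot-time data")
    print(e.hex(), i.hex())
```

The proof relies on one operand of each XOR being a fresh, independent OS-random value: XOR with such a value permutes the output space, so the result is as unpredictable as that operand.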

Lemma 2.

If the two external entropy pools and i_entropy are indistinguishable from random, then the updated external entropy (e_entropy - line 5) will be indistinguishable from random.

Proof: As discussed before, the two entropy pools contain unpredictable events from external sources of entropy, such as network traffic and link idleness. Thus, assuming that H is a cryptographically strong hash function, the hash of the concatenation of the two pools will be indistinguishable from random. Lemma 1 shows that the internal entropy (i_entropy) is indistinguishable from random. Consequently, the updated external entropy (e_entropy - line 5), which is the output of an XOR operation between two indistinguishable-from-random values, will be indistinguishable from random.
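The line-5 update, together with the XOR step the proof relies on, can be sketched as follows. The pool names pool_a and pool_b, the event encoding in pool_from_events, and SHA-256 as H are hypothetical choices for illustration. The exhaustive check over single bytes shows that XOR with any fixed value permutes the output space, so a uniformly distributed operand yields a uniformly distributed result.

```python
import hashlib


def H(data: bytes) -> bytes:
    """Assumed instantiation of H (SHA-256)."""
    return hashlib.sha256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("XOR operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def pool_from_events(events: list[int]) -> bytes:
    """Hypothetical pool encoding: each measured event (e.g., a packet
    inter-arrival time in nanoseconds) serialized as 8 big-endian bytes."""
    return b"".join(e.to_bytes(8, "big") for e in events)


def external_entropy_update(pool_a: bytes, pool_b: bytes, i_entropy: bytes) -> bytes:
    """Line 5 (assumed form): e_entropy = H(pool_a || pool_b) XOR i_entropy."""
    return xor(H(pool_a + pool_b), i_entropy)


def xor_with_fixed_value_is_bijection() -> bool:
    """For every fixed byte h, the map u -> h ^ u hits all 256 byte values."""
    return all({h ^ u for u in range(256)} == set(range(256)) for h in range(256))
```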

Lemma 3.

If the initial value of rand_bytes() is indistinguishable from random, then the resulting internal entropy (i_entropy - line 7) is indistinguishable from random.

Proof: The proof of Lemma 1 establishes that the output of rand_bytes() is indistinguishable from random. Additionally, the internal counter is not known by external entities. Therefore, assuming that H is a cryptographically strong hash function, the new internal entropy (i_entropy) output by H on line 7 will be indistinguishable from random.

Theorem 1.

If e_entropy and i_entropy are indistinguishable from random, then the resulting entropy returned by entropy_get (line 8) will be indistinguishable from random.

Proof: Lemmata 1 and 2 show that the initial and updated external entropy are indistinguishable from random. Lemma 3 shows that the internal entropy generated in line 7 is indistinguishable from random. As a consequence, the value returned by entropy_get, as the output of an XOR operation between e_entropy and i_entropy (line 8), will be indistinguishable from random. This proves that Algorithm 1 satisfies property Strong Entropy.
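The two steps of entropy_get covered by Lemma 3 and Theorem 1 can be sketched as a small stateful class. This is a hypothetical sketch: the exact inputs to H on line 7 are assumed to be fresh OS randomness concatenated with the secret counter, and the class name, counter encoding, and 32-byte size are illustrative assumptions.

```python
import hashlib
import os

ENTROPY_LEN = 32  # assumption: 256-bit values


def H(data: bytes) -> bytes:
    """Assumed instantiation of H (SHA-256)."""
    return hashlib.sha256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("XOR operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


class EntropySource:
    """Holds the external and internal entropy state of the sketch."""

    def __init__(self, e_entropy: bytes, i_entropy: bytes) -> None:
        self.e_entropy = e_entropy
        self.i_entropy = i_entropy
        self._counter = 0  # hypothetical internal counter, never exposed outside

    def entropy_get(self) -> bytes:
        self._counter += 1
        # line 7 (assumed form): fresh OS randomness hashed with the secret counter
        self.i_entropy = H(os.urandom(ENTROPY_LEN) + self._counter.to_bytes(8, "big"))
        # line 8: the returned value combines external and internal entropy
        return xor(self.e_entropy, self.i_entropy)
```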

Appendix B Pseudorandom generator (PRG)

Correctness. We argue that Algorithm 2 is a source of indistinguishable-from-random pseudorandom values.

Lemma 4.

If entropy_get() returns an indistinguishable-from-random value, then the initial seed (line 2), the counter (line 3), and the first pseudorandom value (nprd - line 4) will be indistinguishable from random.

Proof: Theorem 1 establishes that the output of entropy_get() is indistinguishable from random. Thus, both the seed and the first nprd will be indistinguishable from random. Similarly, the conversion function on line 3, which takes the output of entropy_get() as input and, on most architectures, uses 64 bits to represent an unsigned long int, will return a counter value that is indistinguishable from random.
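The initial PRG state discussed in Lemma 4 can be sketched as follows. This is a sketch under assumptions: os.urandom stands in for Algorithm 1's entropy_get(), the line-3 conversion is modelled as reading the first 8 bytes as an unsigned 64-bit integer, and the first nprd is assumed to be drawn directly from entropy_get(). The helper names are hypothetical.

```python
import os


def entropy_get() -> bytes:
    """Stand-in for Algorithm 1's entropy_get (assumption: 32 bytes of OS randomness)."""
    return os.urandom(32)


def to_unsigned_long(value: bytes) -> int:
    """Hypothetical line-3 conversion: first 8 bytes as an unsigned 64-bit integer."""
    return int.from_bytes(value[:8], "big")


def prg_init() -> tuple[bytes, int, bytes]:
    """Return the initial (seed, counter, nprd) state of the PRG sketch."""
    seed = entropy_get()                       # line 2
    counter = to_unsigned_long(entropy_get())  # line 3
    nprd = entropy_get()                       # line 4 (assumed: taken directly from entropy_get)
    return seed, counter, nprd
```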

Lemma 5.

If entropy_get() returns a value indistinguishable from random, then the refreshed PRG internal state (lines 6-8) will lead to indistinguishable-from-random values for the seed, the counter, and nprd.

Proof: The proof follows the same argument as the proof of Lemma 4 for the seed and the counter. As for nprd, assuming that neither the seed nor the counter is known outside the PRG, and assuming that H is a cryptographically strong hash function, the output of H, having as input a concatenation of the new seed, the current nprd, and the new counter, will be indistinguishable from random.
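The reseeding step of Lemma 5 can be sketched as below. The assignment of seed, counter, and nprd to lines 6, 7, and 8 is assumed to mirror the initialization. SHA-256 as H, the 8-byte big-endian counter encoding, and the function name are hypothetical choices.

```python
import hashlib
import os


def entropy_get() -> bytes:
    """Stand-in for Algorithm 1's entropy_get (assumption: OS randomness)."""
    return os.urandom(32)


def H(data: bytes) -> bytes:
    """Assumed instantiation of H (SHA-256)."""
    return hashlib.sha256(data).digest()


def prg_reseed(current_nprd: bytes) -> tuple[bytes, int, bytes]:
    """Return the refreshed (seed, counter, nprd) state."""
    seed = entropy_get()                                # line 6: new seed
    counter = int.from_bytes(entropy_get()[:8], "big")  # line 7: new 64-bit counter
    # line 8: nprd = H(new seed || current nprd || new counter)
    nprd = H(seed + current_nprd + counter.to_bytes(8, "big"))
    return seed, counter, nprd
```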

Theorem 2.

If seed and nprd are indistinguishable-from-random values, then the next nprd returned by PRG_next (line 12) will be indistinguishable from random.

Proof: Lemmata 4 and 5 establish that both the seed and nprd are indistinguishable from random at all times, starting from the initial state. Assuming that HMAC is a cryptographically strong message authentication code, and that the counter is not known outside of the PRG, the output of HMAC, keyed by the seed and having as input a concatenation of nprd and the counter, will be indistinguishable from random. This proves that Algorithm 2 satisfies property Robust PRG.
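A PRG_next step matching the proof of Theorem 2 can be sketched with Python's hmac module. This is a sketch under assumptions: HMAC-SHA256 is the MAC, the counter is assumed to be incremented on every call and to wrap as an unsigned 64-bit value, and the class and method names are hypothetical.

```python
import hashlib
import hmac


class PRG:
    """Hypothetical PRG state: secret seed, secret counter, last output nprd."""

    def __init__(self, seed: bytes, counter: int, nprd: bytes) -> None:
        self.seed = seed
        self.counter = counter
        self.nprd = nprd

    def prg_next(self) -> bytes:
        # assumption: counter advanced on every call, wrapping as an unsigned 64-bit value
        self.counter = (self.counter + 1) % 2**64
        message = self.nprd + self.counter.to_bytes(8, "big")
        # HMAC keyed by the seed over nprd || counter
        self.nprd = hmac.new(self.seed, message, hashlib.sha256).digest()
        return self.nprd
```

Two instances with identical state produce identical output sequences. This is why the argument requires the seed and counter never to leave the PRG.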

Appendix C The three stages of ANCHOR

Figure 5 illustrates the three stages of ANCHOR, namely setup, normal operation, and post-compromise recovery (PCR). After both setup and post-compromise recovery, ANCHOR enters normal operation. The details of normal operation (e.g., device registration and association) are discussed in Sections 4.6 and 4.7. The complete post-compromise recovery protocol is presented in Section 4.10.

Figure 5. Setup, normal operation and PCR

C.1. Setup

During the setup, three things happen:

  1. Offline single-user-mode boot. The first boot should be offline so that the master recovery keys can be generated safely. These keys need to be generated only once and stored in a safe place.

  2. Store master recovery keys. The network admin should store the master recovery keys, for future use in case of a compromise, on an offline device (e.g., a USB stick). This device should be kept as secure as possible.

  3. Normal boot. After generating and safely storing the master recovery keys, the network admin can proceed with the normal boot of ANCHOR. This boot brings up all services and functionalities of ANCHOR and puts it online, ready for use.

C.2. Normal operation

Normal operation is the phase in which ANCHOR should spend most of its time, i.e., online and fully operational. ANCHOR enters normal operation either after a first boot (setup phase) or after recovery from a compromised state.

C.3. Recovery after a compromise

To recover ANCHOR after a compromise, the network admin has to:

  1. Compute the keys offline using the master recovery keys. The network admin must recursively generate the network manager recovery keys and the device recovery keys. These are special-purpose keys used to automatically and safely recover communications between ANCHOR and all other entities, i.e., without requiring additional procedures such as device re-registration. For details, see Section 4.10.

  2. Boot ANCHOR and copy the keys. After recursively computing the recovery keys of managers and devices from the master recovery keys, the network admin should proceed with a normal boot of the system and copy these keys into ANCHOR.
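The offline computation in step 1 can be illustrated with a hypothetical key-derivation sketch. The actual recovery-key scheme is defined in Section 4.10 and is not reproduced here. HMAC-SHA256 as the derivation function, a single 32-byte master key, the label paths, and the names derive_path and compute_recovery_keys are all assumptions. The sketch shows only the property the procedure relies on: every recovery key is a deterministic function of the master recovery key, so it can be recomputed offline after a compromise and then copied into ANCHOR.

```python
import hashlib
import hmac
import secrets


def derive_path(key: bytes, path: list[str]) -> bytes:
    """Recursively derive a child key, absorbing one label per HMAC-SHA256 step."""
    if not path:
        return key
    child = hmac.new(key, path[0].encode("utf-8"), hashlib.sha256).digest()
    return derive_path(child, path[1:])


def compute_recovery_keys(master_key: bytes, managers: list[str],
                          devices: list[str]) -> dict[str, bytes]:
    """Offline step: one recovery key per network manager and per device."""
    keys = {}
    for m in managers:
        keys[f"manager:{m}"] = derive_path(master_key, ["recovery", "manager", m])
    for d in devices:
        keys[f"device:{d}"] = derive_path(master_key, ["recovery", "device", d])
    return keys


if __name__ == "__main__":
    master = secrets.token_bytes(32)  # generated once, offline, during setup (C.1)
    for name, key in compute_recovery_keys(master, ["nm1"], ["sw1", "sw2"]).items():
        print(name, key.hex())
```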

Appendix D Correctness of Algorithm 4

Correctness. We now formalize and prove the properties of Algorithm 4.

As a result of the registration process, ANCHOR keeps lists of registered devices and controllers and, for each device, the list of controllers it is authorized to associate with.

Proposition 1.

Any device F can only associate with a controller C authorized by ANCHOR.

Proof: Forwarding devices will be able to associate only with controllers listed in the CList(F) provided by A (step 2 of Algorithm 4). If F tries to associate with a controller that is not authorized for F, A will not proceed past step 4 after being contacted by that controller, aborting the association. On the other hand, a rogue controller that poses to F as authorized in reply to step 3 cannot jump to step 6 and invent an association key that convinces F, since it does not know the key that A shares with F. This proves that Algorithm 4 satisfies property Controller Authorization.

Proposition 2.

Any device F can associate with some controller only if F is authorized by ANCHOR.

Proof: A device F will be able to associate with some registered controller only if it is legitimate, i.e., only if it is in the list of registered devices. A will not proceed past step 1 of Algorithm 4 after being contacted by a rogue device, aborting the association. On the other hand, a rogue device that poses to C as legitimate and authorized in step 3 will indeed make C proceed with step 4, but the request will be rejected by A. The rogue device's request is not recognizable by A, since it corresponds to no key shared with a legitimate device. The replay of an old (but legitimate) encrypted request in step 3 will also fail, since it is bound to the current nonces. This proves that Algorithm 4 satisfies property Device Authorization.
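The list checks behind Propositions 1 and 2 can be modelled with plain sets. This is a hypothetical sketch. It covers only the membership checks at steps 1, 2, and 4 of Algorithm 4, not the keys, nonces, and HMACs that defeat impersonation and replay, and the class and method names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class AnchorRegistry:
    """Hypothetical model of the lists ANCHOR keeps after registration."""

    devices: set[str] = field(default_factory=set)
    controllers: set[str] = field(default_factory=set)
    clist: dict[str, set[str]] = field(default_factory=dict)  # CList(F) per device

    def controller_list(self, device: str) -> set[str]:
        """Steps 1-2: only a registered device receives its CList(F)."""
        if device not in self.devices:
            raise PermissionError(f"unregistered device {device!r}: association aborted")
        return set(self.clist.get(device, set()))

    def authorize(self, device: str, controller: str) -> bool:
        """Step 4: A continues only for a registered device and a controller in CList(F)."""
        return (
            device in self.devices
            and controller in self.controllers
            and controller in self.clist.get(device, set())
        )
```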

Proposition 3.

At the end of Algorithm 4 execution, the association ID is only known to F and C.

Proof: A creates the association ID in step 5 and forgets it after sending it to C (see Section 4.7). The association ID is sent from A to C encrypted twice: under the key A shares only with F, and under the key A shares only with C. Because of the HMAC, C trusts that the message came from A, and hence that the two encrypted blocks contain the same value. C then forwards to F the copy encrypted under the key A shares with F. So, at the end of the execution of the algorithm, F and C, and only they, hold the association ID. This proves that Algorithm 4 satisfies property Association ID Secrecy.
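The message flow in this proof can be sketched as follows. The sketch is hypothetical throughout. A toy HMAC-SHA256 keystream cipher stands in for the authenticated encryption used in practice (e.g., NaCl), and the message layout, nonce handling, and all names are assumptions rather than the exact format of Algorithm 4. It illustrates why only F and C end up with the association ID: each copy is encrypted under a key A shares with exactly one of them, and C's HMAC check binds both copies to A.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """HMAC-SHA256 in counter mode as a keystream (toy cipher, illustration only)."""
    out, block = b"", 0
    while len(out) < n:
        out += hmac.new(key, b"enc|" + nonce + block.to_bytes(4, "big"), hashlib.sha256).digest()
        block += 1
    return out[:n]


def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt (for an XOR stream cipher the two operations coincide)."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))


def _tag(key: bytes, *parts: bytes) -> bytes:
    """HMAC over fixed-length message parts (domain-separated from the keystream)."""
    return hmac.new(key, b"mac|" + b"".join(parts), hashlib.sha256).digest()


def anchor_issue_aid(k_af: bytes, k_ac: bytes, nonce: bytes) -> tuple[bytes, bytes, bytes]:
    """Step 5 (sketch): A creates the association ID and keeps no copy of it."""
    aid = os.urandom(32)
    for_f = xor_crypt(k_af, nonce + b"F", aid)
    for_c = xor_crypt(k_ac, nonce + b"C", aid)
    tag = _tag(k_ac, nonce, for_f, for_c)  # lets C check that the message came from A
    return for_f, for_c, tag


def controller_receive(k_ac: bytes, nonce: bytes, for_f: bytes, for_c: bytes,
                       tag: bytes) -> tuple[bytes, bytes]:
    """C verifies A's HMAC, recovers the association ID, returns the block for F."""
    if not hmac.compare_digest(tag, _tag(k_ac, nonce, for_f, for_c)):
        raise ValueError("message not authenticated by A")
    return xor_crypt(k_ac, nonce + b"C", for_c), for_f


def device_receive(k_af: bytes, nonce: bytes, for_f: bytes) -> bytes:
    """F recovers the association ID with the key it shares with A."""
    return xor_crypt(k_af, nonce + b"F", for_f)
```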

Proposition 4.

At the end of Algorithm 4 execution, the seed is only known to F and C.

Proof: C creates the seed in step 7 and sends it to F encrypted under the association ID, which serves as an association key known only to C and F, as per Proposition 3. F then replies with the XOR of the seed and the current nonce, encrypted under the association ID. On receiving this reply, C trusts that F, and only F, holds the same seed. As per Proposition 3, only F could have decrypted the seed in the first place and encrypted the reply. This proves that Algorithm 4 satisfies property Seed Secrecy.
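The seed exchange and its confirmation can be sketched as below. This is a hypothetical sketch. A toy XOR cipher keyed by HMAC-SHA256 stands in for real authenticated encryption, the association ID is used directly as the key, nonces are assumed to be 32 bytes so that they can be XORed with the seed, and the function names are illustrative.

```python
import hashlib
import hmac
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def xor_crypt(key: bytes, label: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a 32-byte HMAC-SHA256 keystream (data must be 32 bytes)."""
    stream = hmac.new(key, b"enc|" + label, hashlib.sha256).digest()
    return xor_bytes(data, stream[: len(data)])


def controller_send_seed(aid: bytes, nonce: bytes) -> tuple[bytes, bytes]:
    """Step 7 (sketch): C creates the seed and sends it encrypted under the association ID."""
    seed = os.urandom(32)
    return seed, xor_crypt(aid, nonce + b"|seed", seed)


def device_confirm(aid: bytes, nonce: bytes, enc_seed: bytes) -> bytes:
    """F opens the seed and replies with (seed XOR nonce), encrypted under the association ID."""
    seed = xor_crypt(aid, nonce + b"|seed", enc_seed)
    return xor_crypt(aid, nonce + b"|reply", xor_bytes(seed, nonce))


def controller_check(aid: bytes, nonce: bytes, seed: bytes, reply: bytes) -> bool:
    """C accepts only if the reply decrypts to its own seed XOR the current nonce."""
    expected = xor_bytes(seed, nonce)
    return hmac.compare_digest(xor_crypt(aid, nonce + b"|reply", reply), expected)
```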

Appendix E ONF’s security requirements

Several security requirements should be fulfilled in control plane communications. Most of these requirements are enumerated in ONF’s best practice recommendations (ONF, 2015). In this appendix we go through the eleven (out of twenty-four) such requirements that are addressed by ANCHOR, iDVV, and NaCl.

Both communicating devices should be authenticated (REQ 4.1.1). With ANCHOR, all devices have to be properly registered and authenticated before performing any other operation.

Operations (e.g., association) of components should be authorized (REQ 4.1.2). ANCHOR explicitly authorizes associations between any two devices, and each association has a unique identification.

Devices should agree upon security associations (e.g., key materials) (REQ 4.1.3). By using ANCHOR and its mechanisms, such as the source of strong entropy, we ensure strong key materials. The iDVV mechanism is initialized by the two communicating devices once the association has been authorized by ANCHOR.

Integrity of packets should be ensured (REQ 4.1.4). We provide integrity and authenticity of packets through message authentication codes. By default, we generate one iDVV per packet, providing stronger security.
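Per-packet authentication with a fresh iDVV can be sketched as below. The iDVV generation itself is specified in the main text. Here a hypothetical HMAC-SHA256 chain keyed by a shared seed stands in for it, and the class and method names are illustrative. Both endpoints start from the same seed and advance in lockstep, so each packet is authenticated under a different value. The sketch assumes in-order, lossless delivery.

```python
import hashlib
import hmac


class IdvvEndpoint:
    """One side of an association; both sides start from the same seed."""

    def __init__(self, seed: bytes) -> None:
        self._seed = seed
        self._state = b"\x00" * 32
        self._index = 0

    def _next_idvv(self) -> bytes:
        # hypothetical stand-in for iDVV generation: HMAC chain over (state || index)
        self._index += 1
        msg = self._state + self._index.to_bytes(8, "big")
        self._state = hmac.new(self._seed, msg, hashlib.sha256).digest()
        return self._state

    def protect(self, packet: bytes) -> bytes:
        """Sender: append a MAC keyed by a fresh iDVV."""
        return packet + hmac.new(self._next_idvv(), packet, hashlib.sha256).digest()

    def verify(self, message: bytes) -> bytes:
        """Receiver: recompute the MAC with its own next iDVV; raise on mismatch.
        Assumes in-order, lossless delivery (a failed check still advances the state)."""
        packet, tag = message[:-32], message[-32:]
        expected = hmac.new(self._next_idvv(), packet, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("integrity check failed")
        return packet
```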

Each device should have a unique ID and other devices should be able to verify the identity (REQ 4.2.1). Devices are uniquely identified by ANCHOR. The unique IDs are assigned to the devices as soon as they are registered with ANCHOR.

Issues related to the lifecycle of IDs should be managed, such as generation, distribution, maintenance, and revocation (REQ 4.2.2). ANCHOR provides the services required for managing device IDs. IDs are assigned to devices during the registration phase. Revocation can be done by network administrators at any time.

Devices should be able to verify the integrity of each message (REQ 4.4.4). Any two communicating devices are able to verify the integrity of each message through message authentication codes.

Amplification effects should be taken into account, i.e., attackers should not be able to perform reflection attacks (REQ 4.4.5). Requests and replies exchanged between devices and ANCHOR have the same size, so ANCHOR cannot be used to amplify traffic in reflection attacks.
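The size-matching rule can be expressed as a simple server-side check. This is a hypothetical sketch: the 64-byte message size, zero padding, and function names are assumptions. Requiring every request to be as large as its reply keeps the amplification factor (reply bytes divided by request bytes) at 1. A spoofed request therefore gains an attacker no extra bandwidth toward the victim.

```python
from typing import Optional

MSG_SIZE = 64  # hypothetical fixed size shared by requests and replies


def pad(payload: bytes, size: int = MSG_SIZE) -> bytes:
    """Zero-pad a payload to the fixed message size."""
    if len(payload) > size:
        raise ValueError("payload larger than the fixed message size")
    return payload + b"\x00" * (size - len(payload))


def handle_request(request: bytes) -> Optional[bytes]:
    """Answer only full-size requests, so no reply is larger than its request."""
    if len(request) != MSG_SIZE:
        return None  # drop: answering a short request would amplify traffic
    return pad(b"reply")  # hypothetical reply payload


def amplification_factor(request: bytes, reply: bytes) -> float:
    """Bytes sent to the (possibly spoofed) source per byte received."""
    return len(reply) / len(request)
```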

Automated key/credential management should be implemented by default, allowing generation, distribution, and revocation of security credentials (REQ 4.8.3). We have automated mechanisms in place for refreshing credentials, such as refreshing the iDVV seed using ANCHOR’s source of strong entropy.

Data confidentiality, integrity, freshness, and authenticity are ensured by the integrated device verification value (iDVV). iDVVs are used to encrypt data and generate message authentication codes. They can also be used as nonces, ensuring data freshness.

Availability is ensured by recommending multiple controllers to the forwarding devices, which is one of the essential tasks of ANCHOR.

Lastly, while we do not meet all the security requirements of ONF’s guidelines, we do meet the fundamental ones. For instance, requirements REQ 4.4.2, REQ 4.4.3, REQ 4.7.1, REQ 4.7.2, and REQ 4.7.3 (ONF, 2015) are not yet covered by our architecture and protocols. However, most of these requirements relate to rate control of messages, additional signaling messages for dealing with future network attack types, and accountability and traceability. Such requirements can be added in the future without impairing our conceptual architecture. In fact, some of them, such as rate control of messages, are technical rather than conceptual, and can be addressed with sufficient engineering effort.