This work was supported in part by the Center for Cyber Security at New York University Abu Dhabi.
The Internet of Things (IoT) gubbi2013internet interconnects a huge number of ‘smart’ devices (such as mobile phones, sensors, routers, microcontrollers) alongside large data centers, and provides mechanisms for collection and processing of big data, communication, as well as cloud services. A closely related framework is that of Cyberphysical Systems (CPS) rajkumar2010cyber ; kim2012cyber , where smart agents that possess sensing, computation, communication, and control capabilities are internetworked to control physical entities and processes. Prominent applications include Intelligent Transportation Systems (ITS), smart grids, wireless sensor networks WSN , smart buildings, and mobile healthcare (mHealth).
At such large scale, existing control architectures are pushed far beyond their capacity to efficiently manage the coupling of the physical space with the cyberspace. Notable challenges that have to be accounted for in IoT and CPS applications include the need for adaptability, scalability, security, safety, and robustness to abrupt changes in the modus operandi of the network rajkumar2010cyber ; lee2008cyber . A key pathway is to design hybrid systems hybrid , in which the software controls are decoupled from the embedded components lee2011introduction .
Software Defined Systems (SDSys) come as a systematic paradigm to design such systems by abstracting the control laws from the hardware devices at the physical layer, and placing them in a software-defined control layer. Such decoupling is intended to provide reliable, cost-effective, and real-time control solutions for CPS and IoT. This concept extends the structured development of large-scale software, and was first introduced within the context of cognitive radios mitola2000cognitive . Subsequently, it has been used to develop Software Defined Networking (SDN) jain2013network as well as in several facets of IoT SDIoTAla ; al2016novel ; SDCloud .
In this paper, we utilize software-defined principles to propose a comprehensive architectural design for CPS and IoT systems. The proposed architecture is specifically intended for decentralized decision-making within the IoT, by leveraging the computational resources that are ubiquitous within the many entities it comprises. We specify how the proposed model reduces the control complexity and allows for flexible integration and adaptation in the cyberspace. Our architecture entails three main domains: the physical space, the cyberspace, and the structured control space, all being SDSys-described. The hierarchical and decentralized structure of the control space is carefully designed in a way that assigns the responsibilities of each agent within the IoT technology domain. The services of the middleware layer kim2013real of the proposed model are abstracted and enhanced to provide increased safety, reliability and performance in a highly dynamic environment, where the dynamics are primarily due to changes in the network topology (mainly resulting from agent mobility and battery drainage in portable devices). Moreover, we specify key requirements and software-defined solutions for achieving high quality-of-service (QoS) alongside cyber security. To this end, we combine a bottom-up and a top-down workflow for spreading information and actuation throughout the network. In addition, we identify several classes of potential cyber attacks and vulnerabilities across the multiple levels, and propose effective detection and recovery solutions. Finally, we have built an object-oriented simulator in Python using features and principles from general purpose SDSys simulators such as Mininet, Maxinet and Mininet-WiFi, and use it to test and evaluate several performance indicators of our proposed modules.
This paper extends our preliminary studies in SDCPSconf in several directions: a) we provide a more detailed description of several important attributes of the architecture and solidify the connections with IoT technology and fledgling applications; b) we explicitly discuss a software-defined design for cyber security in CPS and IoT applications; c) we devise an object-oriented simulator and verify the merits of our approach via various simulation studies.
The remainder of the paper is structured as follows. We discuss the main design challenges pertinent to CPS in Sec. 1.1, and recap the key concepts of Software Defined Systems (SDSys) in Sec. 1.2. Sec. 2 is designated for the proposed software-defined CPS architecture (SDCPS): we expose the model requirements (Sec. 2.1), the architectural overview (Sec. 2.2), the main elements (Sec. 2.3), the control architecture (Sec. 2.4), the middleware layer (Sec. 2.5), the work-flow (Sec. 2.6) and other important features (Sec. 2.7). We test aspects of our proposed solution in Sec. 3. Sec. 4 concludes the paper.
1.1 Design challenges in CPS
In order for CPS to truly emerge as the fourth industrial revolution they are envisioned to be rajkumar2010cyber , various attributes in modeling, communication, sensing, control, and cyber security have to be attended to. We present a brief overview of the main challenges and design requirements in this domain gungor2009industrial :
Large volumes of data: Sensors constitute an effective bridge between the cyberspace and the physical space. In large-scale CPS, such as transportation and sensor networks WSN , real-time sensing produces very large volumes of data. It is indispensable to provide algorithms for efficiently filtering and mining big data distances ; watermarking in real time RCS ; EUSIPCO .
Scalability: The large number of devices in CPS that come equipped with heterogeneous hardware and software is a primary aspect to tackle rajkumar2010cyber . It is essential to provide the right APIs for off-the-shelf integration and autonomous configuration, accompanied by new theories for scalable inference and control rksimax1 .
Real-Time Decisions: CPS operate subject to stringent real-time constraints in communication, computation and control. It is therefore crucial to account for deadlines in allocating resources and making decisions hou2009theory , to develop algorithms for online computing RCS ; EUSIPCO , as well as software that can facilitate smooth operation in real time kim2013real .
Security and Fault-Tolerance: The coupling between the cyber and the physical spaces opens the door to many vulnerabilities to attacks and component failures. In the regime of CPS, there is a need for new algorithms that relax the implicit assumption of benign agents and fault-free operation, and which can further provide theoretical guarantees in the presence of malevolent or Byzantine users cardenas2008secure ; SATS ; DSC .
New Theories: In light of overwhelming technological advances, there arises the need for new theories gupta2000capacity ; freris2011fundamental ; WSN ; kim2012cyber that approach CPS from a fundamental theoretical viewpoint and yield efficient, low-complexity algorithms with provable performance guarantees distances ; rksimax1 .
To summarize, modeling and algorithmic tools from system theory and software engineering associated with abstraction, wireless networking, system verification, control, and fault tolerance have to be invoked, but at the same time revisited. To this end, distributed, decentralized, and software-defined solutions can set the stage for building the engineering systems of the (not so remote) future.
1.2 Software Defined Systems (SDSys)
Software Defined Systems (SDSys) aim to decouple the physical space from the cyberspace by retracting controls from the embedded hardware and abstracting them into a software layer. There has been extensive work in several contexts: Radios (SDR) mitola2000cognitive , Networking (SDN) jain2013network , Security sdsec2 , Storage sdstor2 ; SDCache and Cloud SDCloud . Software-Defined Networking (SDN) defines a new way to control the process of forwarding packets in a network. In this context, open-source Python-based simulators are available to evaluate the performance of SDN-based protocols, such as Mininet Mininet , Mininet-WiFi fontes2015mininet , Maxinet Maxinet . A software-defined model for cloud management which may effectively mitigate several cyber-threats was proposed in SDCloud .
2 SDCPS : The proposed architecture for CPS
In this section, we illustrate our proposed architecture for cyberphysical systems and IoT applications, which builds upon software-defined principles. The abundant and ubiquitous communication and computation power of smart devices is exploited to introduce a light-weight, secure and reliable systemic control solution for real-time management of IoT systems.
2.1 Model Requirements
We define a set of requirements and classify them in two main categories: 1) quality-of-service (QoS) requirements, and 2) security requirements.
Resource Exploitation: Resources are plentiful in a CPS. An effective management mechanism is one that maximizes the benefits from exploiting as much as possible the available computational, communication, sensing, and actuation amenities in a coherent and concurrent manner.
Load Balancing: It is important to leverage resources in a fair manner. Service requests should be handled timely, while maintaining load balancing among the various system controllers. A key objective of the schedulers is to distribute the workload as evenly as possible so as to alleviate network congestion, minimize delays, and avoid bottlenecks (in the sense of communication or computation over-use) which may drain the batteries of remote smart devices and consequently drastically compromise system operation.
Real-time: Controlling physical entities such as cars, sensors, smart grids and medicare operations imposes rigid real-time constraints. It is therefore important to define deadlines for the completion of time-critical tasks and devise real-time schedulers hou2009theory ; kim2013real that provably honor them.
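The load-balancing requirement above can be illustrated with a minimal sketch. It assumes that each service request has a known cost and that all controllers are interchangeable; the function name `assign_requests`, the controller names, and the greedy largest-first rule are illustrative choices, not part of the proposed architecture.

```python
import heapq

def assign_requests(request_costs, controllers):
    """Greedily assign each request to the currently least-loaded controller."""
    # Heap of (current_load, controller_name); ties are broken by name.
    heap = [(0.0, name) for name in controllers]
    heapq.heapify(heap)
    assignment = {name: [] for name in controllers}
    # Placing the most expensive requests first tends to even out the final loads.
    for req_id, cost in sorted(request_costs.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(heap)
        assignment[name].append(req_id)
        heapq.heappush(heap, (load + cost, name))
    loads = {name: sum(request_costs[r] for r in reqs)
             for name, reqs in assignment.items()}
    return assignment, loads

# Hypothetical workload: four requests spread over two controllers.
requests = {"r1": 5.0, "r2": 3.0, "r3": 2.0, "r4": 2.0}
assignment, loads = assign_requests(requests, ["ctrl_a", "ctrl_b"])
print(assignment, loads)
```

A greedy rule of this kind does not guarantee an optimal split, but it is cheap enough to be re-run whenever requests arrive or controllers join and leave.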
Cyber security constitutes a chief concern in modern networked-control systems, where it is of vital importance to design and implement effective mechanisms to prevent, detect and recover from a range of cyber-attacks. This objective can be accommodated in multiple complementary ways such as: a) encryption, which yields both data privacy as well as security in packet-based communication DSC , b) control-based approaches tabuada ; SATS ; satchidanandan2017dynamic that seek to identify attacks as well as to handle ‘stealthy’ attacks (by assuring that undetectable attacks may not harm the system operation), c) software-based solutions, where different tasks and entities are dynamically assigned privileges by a dedicated security unit in the system.
In CPS, attack strategies are constantly evolving. Accordingly, the various tools for scanning, isolating, and resolving threats have to be constantly updated to meet their crucial mission. Several attack models apply in the regime of CPS and IoT systems (see also yampolskiy2013taxonomy ):
Trojans: Trojans are executable program files injected by malicious users into the Internet. The effects of a trojan are triggered when downloaded and executed by a user. There are several types: Sending Trojans, Remote-access Trojans, Proxy Trojans, Denial-of-Service (DoS) Trojans and many others.
DoS/DDoS attack: Denial-of-Service (DoS) refers to the action of preventing a user from accessing system resources. Distributed Denial-of-Service (DDoS) is a type of DoS attack where multiple compromised systems are used to target a single system by rendering a subset of resources inaccessible to it.
Packet Forging attack: This is also known as “packet injection” and entails creating seemingly normal packets to interrupt direct communication between users. Consequently, “man-in-the-middle” attacks can be launched, where an attacker secretly relays and possibly alters the messages between two parties under the perception of directly communicating with each other. There are several tools that an attacker can use to generate such attacks, e.g., TCPinject and packETH.
Fingerprinting attack: An attacker eavesdrops on the conversation (even when encrypted) between two users and obtains some critical features of the sender/receiver in order to identify the network status and analyze traffic patterns with the intention of deploying harmful actions.
Application Layer attack: This class is divided into four types: the first one uses HTTP requests to overwhelm a site, the second one targets the SMTP protocol, the third one targets the FTP protocol, and the last one concerns SNMP attacks intended to monitor and reconfigure the system. Detecting such attacks is typically much harder than detecting attacks on the network layer.
User attacks: In this type of attack, a malicious user seeks to trick the supervisor into granting it the same privileges as a legitimate user, e.g., by exploiting vulnerabilities in a local machine to create an account inside it. There are different forms of this attack, such as user-to-root (U2R) and remote-to-local (R2L) attacks.
Malevolent users exploit the heterogeneity of CPS to launch attacks on all system components; this leads to an additional taxonomy of attacks into:
Sensor-related attacks: An attacker tries to eavesdrop or alter sensed data in order to compromise the system operation.
Actuator-related attacks: An attacker seeks to change the control commands of an actuator.
Controller-related attacks: Attacks to high-level decision-making processes such as schedulers, dispatchers, and middleware services.
Communication-related attacks: Communication channels are principal targets of attackers. There are many ways to defend the channels, primarily based on cryptography and coding.
2.2 SDCPS: “A High Level View”
In this section, we present our SDCPS system architecture and illustrate how its spaces and elements are structured within IoT systems. A general, ‘high-level’ overview of the proposed software-defined model for CPS (SDCPS) is shown in Fig. 1.
The main aspects of the proposed architecture span three layers.
Physical space: Physical entities that need to be managed and controlled by IoT systems are enclosed in this space. Take as an example a smart home that comprises several devices like TVs, heating and AC, ovens, washing machines, doors, and many more entities that need to be controlled locally or remotely in an interconnected fashion. Another example is in Intelligent Transportation Systems (ITS), where the physical space comprises cars, traffic lights, sensors, etc. The physical space is organized in domains and subdomains which can interact with the cyberspace via sensors, actuators, and dispatchers.
Cyber space: This space encapsulates hardware and software designated for communication, sensing and information gathering and processing. It includes heterogeneous sensors, actuators and access-points. The quantity and quality of these devices depend on the IoT application under consideration. For instance, smart transportation systems may require more powerful processing units compared to smart homes. A formidable attribute of the abstraction in SDCPS is that no matter the type or number of devices or the application in place, the same control space can be installed and structured over the cyberspace to manage the physical space.
Control space: This is the heart of the proposed architecture, where all decision-making processes are initialized and taken. This space involves dispatchers, schedulers, security controllers and coordinators.
The distributed control layer is the key point behind our proposed model, as illustrated in Fig. 2. The physical space is structured into several zones and sub-domains, where each one is controlled and managed by a corresponding local controller. Local controllers communicate with one another to exchange information so as to carry out collaborative decisions. In transportation systems, for example, a given city can be divided into different areas, each one controlled by a local traffic control center, where the local controllers of different areas communicate information and control actions so as to improve experienced traffic conditions throughout the metropolitan area.
Fig. 3 presents the building block for a multi-layer composition in which different sub-domains may be integrated into a vertical (i.e., hierarchical) or horizontal (i.e., decentralized) fashion.
2.3 Architectural Elements
We present the main architectural components and elements of SDCPS along with their roles, responsibilities and interaction within IoT applications. To make an analogy with control systems, we use the language of linear dynamical systems (we use deterministic linear time-invariant systems with no loss in generality, for simplicity of exposition; for the same reason, we assume an undirected communication graph):
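$$x_i(t+1) = A_i x_i(t) + B_i u_i(t), \qquad y_i(t) = C_i x_i(t),$$
where $x_i(t)$, $y_i(t)$, and $u_i(t)$ denote the state, measurement, and control input for plant $i$ at time $t$. We further capture the communication network topology by an undirected graph $G(t) = (V, E(t))$, where two sub-systems $i, j$ may communicate at time $t$ if and only if $(i,j) \in E(t)$; the graph is time-varying, in general, due to agent mobility as well as the features of the wireless medium. For a node $i$, we define its neighborhood $N_i(t) = \{j \in V : (i,j) \in E(t)\}$.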
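To make the control analogy concrete, the following minimal sketch simulates one scalar linear time-invariant plant under static output feedback and computes a node's neighborhood from node positions. The disk communication model (two nodes are linked when within `comm_range` of each other), the numerical values, and all function names are assumptions made for illustration only.

```python
import math

def neighborhood(i, positions, comm_range):
    """Nodes within communication range of node i (assumed disk model)."""
    return {j for j in positions
            if j != i and math.dist(positions[i], positions[j]) <= comm_range}

def step(x, u, a, b):
    """One step of a scalar LTI plant: x(t+1) = a*x(t) + b*u(t)."""
    return a * x + b * u

# Hypothetical node positions; mobility would change them, and hence the graph.
positions = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (5.0, 0.0)}
n1 = neighborhood(1, positions, comm_range=2.0)

# Open-loop unstable plant (a > 1) stabilized by the feedback u = -k*y, y = c*x.
a, b, c, k = 1.2, 1.0, 1.0, 0.7
x = 1.0
for _ in range(20):
    y = c * x
    x = step(x, -k * y, a, b)
print(n1, x)
```

With these values the closed-loop multiplier is a - b*k*c = 0.5, so the state contracts by half at every step.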
Mobile node: An entity with time-varying location due to mobility (take for example vehicles in a transportation network).
Cluster: The system can be partitioned into several clusters consisting of multiple atoms (mobile nodes). The set inclusion relation of each node to a cluster varies over time, as mobile nodes may partake in different clusters, primarily based on their location.
Sensor: A sensor gathers information from its surroundings as prescribed by its sensing range. The resulting data is locally filtered and forwarded to the corresponding aggregate sensor. Each mobile node may possess several sensors. This corresponds to an entry (or subset of entries) of the measurement vector $y_i(t)$.
Aggregate sensor: Each mobile node has one aggregate sensor that summarizes the sensing information gathered from on-board sensors. This is precisely what we have denoted as the measurement vector $y_i(t)$.
Actuator: The information gathered is used by the system (through external upper control layers) and the local controller to synthesize the control input of the actuator, i.e., $u_i(t)$. Decentralized actions amount to determining individual entries of $u_i(t)$, while distributed control laws describe strategies for synthesizing $u_i(t)$ from the measurements $y_j(t)$ of neighboring nodes $j$.
Access point: Several sensors are linked to the system via wireless communication access points.
Local coordinator: This controller is responsible to take actions for a subset of neighboring nodes by collecting data from the corresponding agents. In the smart home example, we may organize the home into several local areas, i.e., rooms, each one managed by a local coordinator. In such case, several decisions like turning on/off the light in a room do not require any information from sensors in another room.
Cluster Coordinator: Each cluster is assigned to a coordinator that applies roles and actions to the associated nodes within its sphere of influence. The main roles are shaped to manage and control nodes, transfer information from/to other cluster coordinators, update the middleware (cf. Sec. 2.5) about the state of its associated nodes, and obtain and execute rules and instructions from upper layers through the middleware. As an example in the smart home application, several rooms are managed and controlled by a cluster controller to maintain balanced power supply among all of them.
Area coordinator: In the smart home application this can be the floor coordinator, where several clusters are managed and controlled by this coordinator. This entity is a meta-controller which sets the rules for distributed and decentralized strategies, e.g., selecting from a subset of rules and privileges at each decision instant (i.e., how to design the feedback matrices $\{K_{ij}\}$ in our canonical example).
The coordinators can take actions based on acquired information from several resources as shown in Fig. 4.
Self-Controller: Each mobile node, such as a vehicle in the transportation system, has an autonomous controller called the self-controller. It takes information from the aggregate sensors as input or, in some cases, may take higher-level commands from the local coordinator and controllers to launch a self-control decision.
In our running example, the self-controller may be an autonomous feedback control law, i.e., $u_i(t) = K_{ii}\, y_i(t)$, such as in maintaining a constant velocity of a given vehicle, as determined by other coordinators and controllers.
Local Controller: Local decisions that do not require permissions from upper-layer controllers are taken by the local controller. Information that is needed to take these decisions is acquired from aggregate sensors in the mobile node itself or from the ones in neighboring agents.
The distinction between the local coordinator and the local controller lies in that the coordinator may make a decision related to more than one node (the coordinator defines control policies that a controller is responsible for enforcing).
In control terminology, the local controller is a local feedback control of the form:
$$u_i(t) = \sum_{j \in N_i(t) \cup \{i\}} K_{ij}\, y_j(t).$$
Super-controller: It is to the local controller like the local controller is to the self-controller. It enables and supports the hierarchical decision-making process from top to bottom, while information flows from bottom to top.
In our example, the super-controller designs the feedback gains in a tree-like dependence.
Global Controller: This constitutes the engine in the proposed model, set in the root of the hierarchical model. All critical and high-level decisions are established at this controller. It has a holistic view of all remote nodes.
In the transportation system example, we may say that different parts of the car have their own controller like steering, engine and lights, where all such parts are controlled and managed by a local controller. The local controllers of all cars within a specific zone are controlled and managed by a super controller when there is a need to enforce a decision on that area, e.g., for collision avoidance purposes.
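The hierarchy described above, with information flowing bottom-up and decisions flowing top-down, can be sketched as a simple tree of controllers. The `Controller` class, its method names, and the example commands below are hypothetical and only illustrate the two directions of flow.

```python
class Controller:
    """A node in the control hierarchy (self, local, super, or global)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.reports = {}    # information received from below
        self.command = None  # last decision received from above
        if parent is not None:
            parent.children.append(self)

    def report(self, source, value):
        """Bottom-up: record a report and pass it on to the parent."""
        self.reports[source] = value
        if self.parent is not None:
            self.parent.report(source, value)

    def dispatch(self, command):
        """Top-down: apply a command here and at every descendant."""
        self.command = command
        for child in self.children:
            child.dispatch(command)

# Hypothetical transportation example: two cars in one zone.
global_c = Controller("global")
zone = Controller("zone_super", parent=global_c)
car1 = Controller("car1_local", parent=zone)
car2 = Controller("car2_local", parent=zone)

car1.report("car1", {"speed": 30})   # reaches zone and global controllers
zone.dispatch("slow_down")           # enforced on both cars, not on global
```

In this sketch every report is forwarded unchanged; a deeper hierarchy would typically summarize reports at each level to limit traffic toward the root.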
A set of design choices may be inferred to decide the number of layers and the depth of controller levels based on the specifics of the application under consideration. In this aspect, trade-offs are omnipresent: responsiveness vs. load balancing vs. security vs. complexity and so on. As an example, dividing the network into several clusters implies that the upper controller is responsible to control fewer lower level nodes. This may speed up the transfer of decisions and load balancing, but on the other hand it may increase the complexity of control synthesis and the vulnerability to a number of cyberattacks. Different scenarios may exist, no choices are absolute or clear in CPS, and everything has to be studied as part of a common whole.
Interface: The Middleware
Higher and lower coordinators and controllers are connected by a bridge, which is called the Middleware. It is the software residence for schedulers, services, dispatchers and several software-defined controllers.
The main challenge in designing the middleware lies in maximizing scalability, adaptability and reliability. Our proposed architecture borrows principles from the Etherware kim2008architecture , destined for network control due to its real-time capabilities kim2013real .
The Middleware is composed of two types of components: controllers and services. The former is a set of SD-controllers developed for control and coordination between the cyber and the physical, while the latter is responsible for managing the communication between the controllers and simplifying installation, development and execution. For real-time services, a real-time scheduler is a key component of the middleware: packets and commands are assigned different priorities and execution deadlines that have to be accommodated by means of scheduling with QoS guarantees hou2009theory . A more detailed description of the middleware layer will be provided in Sec. 2.5.
Fig. 6 summarizes the interactions and interconnections between system components.
2.4 Control architecture
In this section, we zoom in to discuss in further detail the units and components of the proposed model; see Fig. 7.
The physical structure of each controller consists of several processors, and each processor is mapped to multiple processing units. The controller can run several processes simultaneously in a multi-threading fashion. Moreover, CPUs and GPUs can be leveraged, and a one-pass controller is used, as it is typically faster and simpler than its multi-pass counterpart, and it helps improve reliability, security and system performance. Pipes between producer/consumer threads are used as safeguarded communication channels for security purposes.
A set of dedicated software-defined (sub-)controllers are installed in each controller, which communicate and collaborate with one another to maintain a smooth work-flow. Fig. 8 illustrates the logical view.
Each one comes equipped with several units which we present and define in the following. Precisely, we show how these units alongside their main responsibilities work together to control and manage IoT applications in a software-defined manner.
SDN_Controller: This controller is responsible for controlling and managing the networking part of the system. It maintains network information for all nodes within its range, which is used for forwarding-table generation. These tables are forwarded to the switches that are connected to this controller, for multi-hop communication purposes. Different algorithms can be followed to generate these tables (minimum spanning tree, shortest-path, etc.), depending on the target of each IoT application. For instance, in transportation systems, real-time decisions have to be taken, and in such situations algorithms that specifically target real-time scheduling will be advantageous.
The main units inside the SDN_Controller are:
Path calculation unit: Calculating the path from source to destination is the responsibility of this unit. Such a path is selected based on several criteria.
Forwarding table generation unit: All forwarding tables are generated inside this unit based on the path calculation unit outcome. These tables are generated and stored by this unit. It also gets feedback from the network status tracking unit to adjust the calculated path when needed, as explained next.
Network status tracking unit: Any change (bottlenecks, channel degradation or broken links) in the status of the network is tracked by this unit. For instance, in a transportation network, when a new road is built in a specific region this unit receives a notification about this update to take further actions.
Other units for enabling the joining and extraction of nodes are also available.
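As an illustration of how the path calculation and forwarding-table generation units could interact, the sketch below uses Dijkstra's shortest-path algorithm to build a next-hop table for one node, and rebuilds it after a link failure of the kind the network status tracking unit would report. The graph, link weights, and function name are hypothetical; shortest-path is only one of the possible criteria mentioned above.

```python
import heapq

def forwarding_table(graph, source):
    """Map each reachable destination to its next hop from source (Dijkstra)."""
    dist = {source: 0.0}
    next_hop = {}
    heap = [(0.0, source, None)]  # (distance, node, first hop on the path)
    visited = set()
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if hop is not None:
            next_hop[node] = hop
        for nbr, weight in graph.get(node, {}).items():
            nd = d + weight
            if nbr not in visited and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # The first hop is the neighbor itself when leaving the source.
                heapq.heappush(heap, (nd, nbr, nbr if node == source else hop))
    return next_hop, dist

# Hypothetical topology with symmetric link costs.
graph = {
    "s": {"a": 1, "b": 4},
    "a": {"s": 1, "b": 1, "c": 5},
    "b": {"s": 4, "a": 1, "c": 1},
    "c": {"a": 5, "b": 1},
}
table, dist = forwarding_table(graph, "s")

# Link a-b fails: remove it and regenerate the table.
del graph["a"]["b"], graph["b"]["a"]
table_after, dist_after = forwarding_table(graph, "s")
```

Before the failure all traffic from `s` leaves through `a`; afterwards, destinations `b` and `c` are reached through `b` at a higher cost.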
All information about IoT application smart devices is kept here. Information such as logical status and physical location is used in the coordination and management process.
Smart device information tracking unit: The role of this unit is to scan and track the smart devices for requests that have to be communicated to the various controllers.
Smart device status tracking unit: This is responsible to track the status of a smart device as relevant to communication and actuation (for example, busy, sending, receiving, ‘asleep’ in duty-cycling, low battery, etc.).
Smart device location tracking unit: As the mobile nodes are moving, the physical location of smart devices also varies. It is crucial to track the location continually and accurately to enable location-aware services and controls (such as set inclusion in area coordinators, and distributed communication, computation and control).
Smart device joining and extracting processes are also the responsibility of this controller.
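The tracking units above can be pictured as a small registry that records the status and location of each smart device and derives its cluster from its position. The sketch below is a simplified illustration; the class names, the status strings, the planar coordinates, and the nearest-center clustering rule are all assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class Device:
    device_id: str
    status: str = "idle"          # e.g. busy, sending, receiving, asleep, low_battery
    location: tuple = (0.0, 0.0)  # planar coordinates (assumption)

class DeviceRegistry:
    """Tracks status and location, and maps each device to the nearest cluster."""

    def __init__(self, cluster_centers):
        self.cluster_centers = dict(cluster_centers)  # cluster name -> (x, y)
        self.devices = {}

    def join(self, device):
        self.devices[device.device_id] = device

    def leave(self, device_id):
        self.devices.pop(device_id, None)

    def update(self, device_id, status=None, location=None):
        device = self.devices[device_id]
        if status is not None:
            device.status = status
        if location is not None:
            device.location = location

    def cluster_of(self, device_id):
        loc = self.devices[device_id].location
        return min(self.cluster_centers,
                   key=lambda c: math.dist(loc, self.cluster_centers[c]))

registry = DeviceRegistry({"north": (0.0, 10.0), "south": (0.0, -10.0)})
registry.join(Device("car1", location=(1.0, 8.0)))
before = registry.cluster_of("car1")
registry.update("car1", status="sending", location=(2.0, -7.0))
after = registry.cluster_of("car1")
```

Because cluster membership is recomputed from the latest location, a moving node changes cluster without any explicit re-registration.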
The responsibility of keeping a secure system is assigned to this controller. It has a set of units with specific mechanisms and tools that are interconnected to each other to detect, prevent and resolve several types of cyberattacks.
Auditing and Mapping unit: It is important to keep information about the switches, routers, access points and other network nodes for security and safety purposes. This unit takes the responsibility for auditing all relevant information about the network infrastructure nodes, such as their vendor, location, and type, in order to discover abnormal behaviors.
Knowledge-based unit: This unit is responsible to store and keep all discovered attacks in the system. In case a new attack is discovered, the inline mechanisms and tools are used to determine the required solution, while this attack is stored in the unit.
Scanning and Screening unit: It uses several mechanisms and tools to scan traffic and detect if there is a threat to trigger the resolution unit. Many scanning tools are available, each one responsible to scan a specific type of information.
Monitoring unit: The tools in this unit help network defenders discover and analyze anomalous activities in the network. For this purpose, there are many visualization tools, and an unceasing need to develop real-time monitoring methods.
Detection unit: This unit works with other controller units to detect any type of cyberattacks. There are a lot of existing tools for this purpose: MINDS, ADAM, NIDS being notable examples.
Prevention unit: This unit is responsible for preventing attacks from inducing any anomalous actions and from spreading over other CPS domains.
Handling unit: The tools of this unit are responsible for resolving and handling attacks in case they cannot be prevented, or when the detection was too late. It tries to eliminate and resolve the effects of an attack and then notifies the knowledge-based unit about this type of attack.
Policy unit: A set of security policies correlated to CPS are sustained inside this unit. Dedicated policies for each IoT application are defined and applied. For example, in a smart home, the user can set policies to keep the room temperature no more than a given threshold and increase it only in specific situations.
Security status tracking unit: This unit recaps the status of the system as related to security.
Encryption/Decryption unit: Encrypting traffic is considered an efficient way to protect packet-based communication from intruders. This unit is responsible for ascertaining data privacy by means of applying encryption/decryption methods on the data. Nonetheless, we note that encryption introduces storage and time overheads (transmission and processing) and cannot be used as a passepartout, in particular for time-critical applications.
Backup unit: Keeping backup versions of the system status facilitates its effective protection. This unit keeps frequent back-ups to restore the system when the effects of an attack cannot be resolved otherwise.
Up-To-Date unit: It has a set of procedures to update the existing software to tackle new attacks with new solutions.
Network computation resources like CPUs, GPUs, and RAM are controlled and managed by this controller. Dedicated tools for GPU, CPU and memory management are installed.
SDS_Controller: This controller manages storage devices and processes.
Data storing unit: This unit takes the responsibility of controlling the data storing process in the storage arrays.
Data caching unit: In large-scale CPS where the hosts are distributed over a large area, it is important to cache parts of the most frequently requested/used information locally SDCache .
Data de-duplication unit: In distributed database systems, it is important to maintain concurrency of information; this unit accounts for eliminating duplicate values so that any user gets access to the most up-to-date available information.
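For the data caching unit, one simple way to keep frequently used items close to the requester is a bounded cache with an eviction rule. The sketch below uses least-recently-used (LRU) eviction as a stand-in; this is an illustrative choice and not necessarily the policy used in SDCache, and the class name and keys are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Bounded local cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry

    def __contains__(self, key):
        return key in self._items

cache = LRUCache(capacity=2)
cache.put("zone_map", "v1")
cache.put("traffic_state", "v7")
cache.get("zone_map")           # zone_map becomes the most recent entry
cache.put("weather", "v3")      # evicts traffic_state
```

A frequency-based policy would match the "most frequently requested" criterion more literally, at the cost of maintaining access counters.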
Units shared by all types: Some common units that are available in almost any type of controller include:
I/O Unit: The input/output unit sends, receives and forwards packets from and to other units.
Organizer Unit: This unit assigns priorities to packets based upon predetermined QoS criteria.
Scheduling Unit: This unit schedules packet transmission taking into account the assigned priority; there is a great number of scheduling algorithms that can be used for this purpose, see for example hou2009theory .
Aggregate Unit: This unit is responsible for aggregating packets of a given flow before forwarding them to the next processing unit.
SDCPS_Controller: This controller comprises the main engine of the system. It is responsible for carrying out all of the functionality described above and for organizing the interplay of the physical and cyber space by overseeing and coordinating all other controllers in the system, either directly or indirectly.
2.5 Middleware Layer Architecture
The middleware layer is configured to accelerate real-time decision making and to facilitate communication and interaction between the system control layers. All services and entities are software-defined, which enables on-the-fly modification and component migration. Fig. 9 shows the structure of the middleware layer and its three spaces: controller space, kernel space, and services space.
Controller Space: Each controller is implemented as a component itself, and all components interact with each other as explained above.
Kernel Space: This space is responsible for scheduling packets based on their priority and time-criticality, as assigned by the tools of the real-time controller. For instance, a signal coming from an ambulance or fire truck should be served first. A wealth of scheduling algorithms hou2009theory can be implemented. Each IoT application has different QoS requirements, and the appropriate scheduling algorithm is selected based on these requirements.
After the scheduling decision, a packet is assigned to a specific queue based on its priority and on the physical positions of the mobile sender/receiver. It is therefore essential to provide a position tracking service for location-aware real-time scheduling. Scheduling queues in a network is well studied srikant2013communication , and extensions are possible for optimal scheduling subject to deadlines and priorities hou2009theory .
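The priority-first behavior of the kernel space can be sketched with a binary heap, as shown below. This is a plain static-priority queue with earliest-deadline tie-breaking, chosen here only for illustration; it is not the scheduling policy of hou2009theory, and the class name `PriorityScheduler` and the priority and deadline values are hypothetical.

```python
import heapq
import itertools


class PriorityScheduler:
    """Serve the most urgent packet first; break ties by earliest deadline."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # preserves FIFO order among equal keys

    def push(self, payload, priority, deadline):
        # Lower priority value means more urgent (assumed convention).
        heapq.heappush(self._heap, (priority, deadline, next(self._counter), payload))

    def pop(self):
        return heapq.heappop(self._heap)[-1]

    def __len__(self):
        return len(self._heap)


scheduler = PriorityScheduler()
scheduler.push("sensor reading", priority=2, deadline=5.0)
scheduler.push("ambulance signal", priority=0, deadline=1.0)
scheduler.push("traffic light update", priority=1, deadline=2.0)
order = [scheduler.pop() for _ in range(len(scheduler))]
print(order)  # ['ambulance signal', 'traffic light update', 'sensor reading']
```

A location-aware variant could add a position-derived term to the heap key, which is left out here to keep the sketch short.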
Service Space: Various services are encapsulated inside this space to enable real-time decision making and improve network communication:
Messenger Service: This service maintains smooth communication between all controllers at the same layer through the west/east APIs in Fig. 3; take for example controllers within the same floor in a smart building. Additionally, it handles the communication between control layers located at different floors through SDN.
Position Tracking Service: The positions of mobile nodes over time are tracked by this service. Note that node position may be changed as a result of a control decision (e.g., in a transportation system a local controller routes a car at an intersection).
Speed Stamp Service: Estimating and computing the speed of mobile nodes is useful for predicting their positions over time (e.g., by using tracking tools such as Kalman filtering), for example to predict when a car will reach a specific route so as to avoid collision or deadlock.
Fault-Tolerance Service: For large-scale IoT systems fault-tolerance is crucial. A set of solutions and policies are implemented inside this service to recover from hardware/software failures (e.g., by switching to a nearby controller in case the local controller fails).
Controller Registration Service: This service is responsible for registering a new controller along with its relevant information so as to keep the system updated.
Time-stamping service: A dedicated unit is implemented inside the middleware layer to record the times of past, present and future events such as sensed information and control actions.
Time-translation service: In a large-scale CPS, clocks do not agree freris2011fundamental , and it is fundamental to convert a packet's time to the controller's time, and vice versa.
Time synchronization service: Accurate clock synchronization is instrumental for distributed coordination in CPS. It affects both performance and safety freris2011fundamental , with examples including wireless protocols such as MAC, duty-cycling, formation control, concurrency in databases, and more. To this end, algorithms with high scalability and low communication and computation overhead are especially important to implement in the middleware RK ; RK2 ; nfrer_algocdc .
Resource Tracking Service: This service targets tracking the resources in the entire system with the intention of providing load balancing and fairness among users.
Emergency services: This set of services tracks components and takes actions when they become unavailable or fail.
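As an illustration of the time-translation service listed above, the sketch below converts timestamps between a node clock and a controller clock. It assumes an affine relation between the two clocks (a relative skew and an offset) and ignores communication delays; both are simplifying assumptions, and the function names are hypothetical.

```python
def fit_affine_clock(pairs):
    """Least-squares fit of t_controller ≈ skew * t_node + offset.

    `pairs` holds (t_node, t_controller) samples with at least two
    distinct t_node values.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    skew = cov_xy / var_x
    offset = mean_y - skew * mean_x
    return skew, offset


def to_controller_time(t_node, skew, offset):
    """Translate a node timestamp into the controller's time."""
    return skew * t_node + offset


def to_node_time(t_controller, skew, offset):
    """Translate a controller timestamp back into the node's time."""
    return (t_controller - offset) / skew


# Hypothetical samples: the node clock runs 100 ppm slow and starts 5 s behind.
samples = [(0.0, 5.0), (10.0, 15.001), (20.0, 25.002)]
skew, offset = fit_affine_clock(samples)
print(round(skew, 6), round(offset, 6))
```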
2.6 Communication & Control Planes
The proposed architecture features an interplay between two planes of communication and actuation, namely horizontal and vertical. Fig. 10 illustrates a global view of these planes, further elaborated in the sequel (note that we use the transportation application for demonstration purposes, which explains the cars in Fig. 10).
Vertical Plane (Control Flow): As explained above, control flows in the network in a hierarchical fashion. At the very top level resides the root controller, while all mobile nodes (such as vehicles in a transportation system) reside at the lowest level. The upper layer has coordination power over its immediate lower layer, and higher privileges in resolving conflicts. A middleware layer is responsible for combining and coordinating the controllers at the same layer and managing the communication across different layers. The number of layers in the vertical plane is closely related to the size of the network in the horizontal plane (e.g., it depends on the sizes of the area clusters such as area dispatchers in urban transportation).
Horizontal Plane (Data Flow): This plane reflects the communication between mobile agents and controllers within the same layer (for example direct communication among nearby vehicles, and among dispatchers at the same layer). Recall the trade-off between the partition sizes and the vertical plane depth: when nodes are horizontally clustered into smaller groups (areas), a larger number of cluster coordinators and area coordinators is required; this affects the accuracy, responsiveness, performance and security of the entire system, as well as infrastructure costs, all of which should be considered by system designers.
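The trade-off between cluster size and hierarchy depth can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that every coordinator supervises at most `cluster_size` entities of the layer beneath it; the function name `hierarchy` and the node counts are hypothetical.

```python
import math


def hierarchy(num_nodes, cluster_size):
    """Coordinators needed per layer, bottom-up, until a single root remains."""
    if cluster_size < 2:
        raise ValueError("cluster_size must be at least 2")
    layers = []
    count = num_nodes
    while count > 1:
        count = math.ceil(count / cluster_size)
        layers.append(count)
    return layers


large_clusters = hierarchy(1000, 10)
small_clusters = hierarchy(1000, 4)
print(large_clusters, sum(large_clusters))  # [100, 10, 1] 111
print(small_clusters, sum(small_clusters))  # [250, 63, 16, 4, 1] 334
```

Under this assumption, shrinking the clusters from 10 to 4 members raises the number of coordinator layers from 3 to 5 and roughly triples the number of coordinators.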
Decision Making Process
Decision making is performed in a systematic way by the proposed system, which promotes real-time operation, reliability, security, and scalability. The flow of control packets is briefly discussed in the following.
2.6.1 Centralized vs. Decentralized / Distributed
An effective control and management plane is key to system proliferation. Distributed control is integral for handling huge network traffic volumes via decentralized autonomy that achieves a scalable, sustainable, robust and adaptable system operation in a highly dynamic environment (take for example the failure of a traffic light or closure of a route due to an accident or natural disaster). However, “to distribute or not to distribute?” does not always have an easy answer: local communication and interactions may cause conflicts across several system levels, whose resolution requires supervisory control of a complexity that increases with the network size. The proposed architecture aims to leverage the benefits of both worlds (centralized and decentralized / distributed) in a coherent methodological fashion.
2.6.2 Flow of control packets
The proposed decision-making methodology fundamentally influences the control packet flow. Information is harvested from the underlying control layers and communicated to controllers in higher layers, which take higher-level control decisions and actions. Simultaneously, controllers in the underlying layers need to collaborate and communicate with each other; they are responsible for controlling all system entities assigned to them by taking the proper actions and for forwarding the information to super controllers at a higher layer. All in all, this is treated differently in different contexts: for instance, the auditing and screening security unit within the SDSecurity controller, previously described, is installed and implemented on all software-defined controllers, but does not perform the exact same operations in each component. To conclude, decision-making can be categorized (among others) into:
Self decisions: A given node can take a decision by itself without consulting other controllers or coordinators at the same or higher levels.
Coordinated decisions: Nodes are not fully autonomous in taking a certain decision locally, and coordination with higher layers is needed to take further actions.
This taxonomy applies inductively to all levels of the decision-making hierarchy; Fig. 11 illustrates the workflow in the system.
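The two categories of decisions and their inductive application across levels can be sketched as follows. The class `Controller`, the event names, and the rule that a controller escalates any event it cannot handle locally are assumptions made for illustration.

```python
class Controller:
    """Hypothetical controller that decides locally or escalates upward."""

    def __init__(self, name, can_decide, parent=None):
        self.name = name
        self.can_decide = set(can_decide)  # event types handled locally
        self.parent = parent

    def decide(self, event):
        """Return (name of the deciding controller, kind of decision)."""
        if event in self.can_decide:
            return self.name, "self"
        node = self.parent
        while node is not None:
            # Coordinated decision: climb the hierarchy one layer at a time.
            if event in node.can_decide:
                return node.name, "coordinated"
            node = node.parent
        raise LookupError(f"no controller can handle {event!r}")


root = Controller("root", {"reroute_district"})
area = Controller("area", {"reroute_street"}, parent=root)
local = Controller("local", {"change_light"}, parent=area)

print(local.decide("change_light"))      # ('local', 'self')
print(local.decide("reroute_district"))  # ('root', 'coordinated')
```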
Packet Structure: To support the proposed architecture, it is also important to define the packet attributes required for control, communication and scheduling; for example agent-specific positioning and timing information, packet priority and many others; cf. Fig. 12.
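A packet carrying the attributes mentioned above could be represented as in the following sketch. The field names and the JSON encoding are hypothetical choices for illustration; the actual packet layout is the one given in Fig. 12.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ControlPacket:
    """Hypothetical control packet with positioning, timing, and priority fields."""
    source: str
    destination: str
    priority: int                  # lower value = more urgent (assumed)
    timestamp: float               # sender clock, in seconds
    position: Tuple[float, float]  # sender coordinates
    payload: str = ""

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "ControlPacket":
        fields = json.loads(text)
        fields["position"] = tuple(fields["position"])  # JSON has no tuples
        return cls(**fields)


packet = ControlPacket("vehicle-17", "local-controller-3", priority=0,
                       timestamp=12.5, position=(24.52, 54.43),
                       payload="emergency")
restored = ControlPacket.from_json(packet.to_json())
print(restored == packet)  # True
```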
The setup of the system is presented in Algorithm 1.
2.7 The features of the proposed model
The main features of the proposed architecture are briefly summarized in what follows:
Real-time: The proposed system architecture promotes real-time decision making in three ways: (a) adding a middleware layer to facilitate and scale the communication and interaction between various system layers, (b) implementing services and controllers as self-contained components at the middleware, (c) applying QoS-based packet scheduling over multiple levels.
Reliability and Security: Service availability, in the presence of malicious users, can be guaranteed by prescribing ways of taking control actions and accessing data. This can, in turn, be achieved by sharing and distributing responsibilities over several system modules, and employing redundancies that improve resilience to single points of failure or attack.
Flexibility and Scalability: In volatile large-scale CPS, high-frequency control and management is required as some nodes may undergo failure or cyber attack. Such considerations can be served by our proposed solution in a fast, simple and self-configured programmable way. Moreover, our design combines decentralized distributed sensing, information retrieval, and control to guarantee scalability even for millions of mobile nodes.
3 Experimental results
In this section, we describe our emulation testbed and present several experimental findings that underline the merits of adopting software-defined control procedures for IoT applications, in terms of efficiency and adaptability.
3.1 Emulation Environment
Our testbed environment was implemented by installing Mininet 2.2.0rc1 VM Mininet over the Oracle Virtual Box and remotely linking it with the Ubuntu OS. We used Python as the programming language and extended the POX controller of Mininet (namely User_Switch) along with its host main classes so as to capture several elements of the proposed architecture.
3.2 Test Scenarios and Experiments
We have tested a range of scenarios for variable network sizes and topologies. The network topology is captured by a graph whose vertices are the controllers and end users, and whose edges prescribe the feasible communication among the system entities. We have adopted a tree of depth 3 in our experiments: the global controller vertex resides at the root, local controllers reside at the first level, switches (2 switches per local controller) reside at the second level and hosts (users) lie at the third; the number of users is variable. In all test cases, we chose packet flows where the source and destination nodes are sampled uniformly at random.
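The tree topology and the random choice of flows can be sketched in plain Python as follows. This is not the testbed code and does not use the Mininet API; the node naming scheme and the requirement that source and destination be distinct hosts are assumptions made for illustration.

```python
import random


def build_tree(num_controllers, hosts_per_switch, switches_per_controller=2):
    """Return (edges, hosts) of the depth-3 tree rooted at the global controller."""
    edges, hosts = [], []
    for c in range(num_controllers):
        controller = f"lc{c}"
        edges.append(("root", controller))
        for s in range(switches_per_controller):
            switch = f"{controller}-s{s}"
            edges.append((controller, switch))
            for h in range(hosts_per_switch):
                host = f"{switch}-h{h}"
                edges.append((switch, host))
                hosts.append(host)
    return edges, hosts


def sample_flow(hosts, rng):
    """Pick two distinct hosts uniformly at random as (source, destination)."""
    source, destination = rng.sample(hosts, 2)
    return source, destination


edges, hosts = build_tree(num_controllers=8, hosts_per_switch=8)
rng = random.Random(0)  # fixed seed so that runs are repeatable
flow = sample_flow(hosts, rng)
print(len(hosts), len(edges))  # 128 152
```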
We have studied four scenarios via altering a subset of parameters in our model while fixing others, as illustrated in Table 1: “Requests” represents the total number of served requests, “Controllers” refers to the number of local controllers in our topology, “Users” is the number of users for each switch, and “Time” denotes the accumulation of configuration time and test time.
Figure 13 demonstrates the effect of varying the number of local controllers and users on the configuration time for the first two scenarios. We use blue bars to show the required time for variable number of controllers with the hosts per switch being fixed to 8 (scenario 1). We use red bars to illustrate the time needed for variable number of hosts per switch with a fixed number of 8 local controllers (scenario 2). It was observed that increasing the number of controllers requires more configuration time compared to increasing the number of users in the system for larger numbers, whereas the opposite was observed for smaller numbers. Additionally, a balanced time was reported when the number of controllers equals the number of hosts per switch (both equal to 8).
Figure 14 shows the total number of requests served by our system for 8 local controllers and 8 hosts per switch over several time periods. It is noted that, by design, the system gives an equally likely execution time for each packet across several time periods so that the number of requests served increases linearly with time.
Figure 15 illustrates the total number of served requests and simulation test time across three different network configurations for a fixed total number of users (which equals the product Number of Local Controllers * Number of Switches per Local Controller * Number of Hosts per Switch). Observe that 8 local controllers with 8 hosts per switch serve slightly more requests than 16 local controllers with 4 hosts per switch, and do so in less time. This suggests that balanced topologies obtain the best performance at the lowest configuration cost.
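The product formula for the total number of users can be checked with a few lines of code. The sketch below assumes 2 switches per local controller, as in the adopted topology, and lists every pair of controller and host counts that yields the same total; the function names are hypothetical.

```python
def total_users(controllers, hosts_per_switch, switches_per_controller=2):
    """Total users = controllers * switches per controller * hosts per switch."""
    return controllers * switches_per_controller * hosts_per_switch


def configurations(total, switches_per_controller=2):
    """All (controllers, hosts_per_switch) pairs that yield `total` users."""
    if total % switches_per_controller != 0:
        return []
    product = total // switches_per_controller
    return [(c, product // c) for c in range(1, product + 1) if product % c == 0]


print(total_users(8, 8), total_users(16, 4))  # 128 128
print(configurations(128))
```

Both configurations compared in the text therefore serve the same 128 users, and the balanced pair (8, 8) is one of the candidates returned by `configurations(128)`.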
We have proposed a software-defined architecture for Cyberphysical Systems and IoT applications. We have specified the main requirements for different IoT applications in terms of performance, security, quality-of-service and real-time operation. We have demonstrated how the proposed model exploits the computational power of a great number of system components (coordinators, controllers, sensors, and portable devices) to control systems in a scalable and flexible way while prioritizing cyber security. All components are implemented as software-defined nodes inside the middleware layer, and control and information flow in both top-down and bottom-up fashions. Finally, we have built a simulation testbed tool in Python to measure the performance of the proposed model, and ran extensive experiments that reveal the main benefits of the proposed model.
- (1) Al-Ayyoub, M., Jararweh, Y., Benkhelifa, E., Vouk, M., Rindos, A., et al.: A novel framework for software defined based secure storage systems. Simulation Modelling Practice and Theory (2016)
- (2) Alur, R., Courcoubetis, C., Halbwachs, N., Henzinger, T.A., Ho, P.H., Nicollin, X., Olivero, A., Sifakis, J., Yovine, S.: The algorithmic analysis of hybrid systems. Theoretical Computer Science 138(1), 3–34 (1995)
- (3) Cardenas, A.A., Amin, S., Sastry, S.: Secure control: Towards survivable cyber-physical systems. In: 28th IEEE International Conference on Distributed Computing Systems Workshops (ICDCS), pp. 495–500 (2008)
- (4) Darabseh, A., Al-Ayyoub, M., Jararweh, Y., Benkhelifa, E., Vouk, M., Rindos, A.: SDSecurity: A software defined security experimental framework. In: IEEE International Conference on Communication Workshop (ICCW), pp. 1871–1876 (2015)
- (5) Darabseh, A., Al-Ayyoub, M., Jararweh, Y., Benkhelifa, E., Vouk, M., Rindos, A.: SDStorage: A software defined storage experimental framework. In: IEEE International Conference on Cloud Engineering (IC2E), pp. 341–346 (2015)
- (6) Darabseh, A., Freris, N.: A software defined architecture for cyberphysical systems. In: 4th IEEE International Conference on Software Defined Systems (SDS), pp. 54–60 (2017)
- (7) Darabseh, A., Freris, N., Jararweh, Y., Al-Ayyoub, M.: SDCache: Software defined data caching control for cloud services. In: 4th IEEE International Conference on Future Internet of Things and Cloud (FiCloud) (2016)
- (8) De Oliveira, R., Shinoda, A., Schweitzer, C., Rodrigues Prete, L.: Using Mininet for emulation and prototyping software-defined networks. In: IEEE Colombian Conference on Communications and Computing (COLCOM), pp. 1–6 (2014)
- (9) Duan, X., Freris, N., Cheng, P.: Secure clock synchronization under collusion attacks. In: Proceedings of the 54th Allerton Conference on Communication, Control and Computing, pp. 1142–1148 (2016)
- (10) Fawzi, H., Tabuada, P., Diggavi, S.: Secure estimation and control for cyber-physical systems under adversarial attacks. IEEE Transactions on Automatic Control 59(6), 1454–1467 (2014)
- (11) Fontes, R.R., Afzal, S., Brito, S.H., Santos, M.A., Rothenberg, C.E.: Mininet-WiFi: Emulating software-defined wireless networks. In: 11th International Conference on Network and Service Management (CNSM), pp. 384–389 (2015)
- (12) Freris, N., Borkar, V., Kumar, P.R.: A model-based approach to clock synchronization. In: Proceedings of the 48th IEEE Conference on Decision and Control (CDC), pp. 5744–5749 (2009)
- (13) Freris, N., Graham, S., Kumar, P.R.: Fundamental limits on synchronizing clocks over networks. IEEE Transactions on Automatic Control 56(6), 1352–1364 (2011)
- (14) Freris, N., Kowshik, H., Kumar, P.R.: Fundamentals of Large Sensor Networks: Connectivity, Capacity, Clocks, and Computation. Proceedings of the IEEE 98(11), 1828–1846 (2010)
- (15) Freris, N., Öçal, O., Vetterli, M.: Compressed sensing of streaming data. In: Proceedings of the 51st Allerton Conference on Communication, Control and Computing, pp. 1242–1249 (2013)
- (16) Freris, N., Patrinos, P.: Distributed computing over encrypted data. In: Proceedings of the 54th Allerton Conference on Communication, Control and Computing, pp. 1116–1122 (2016)
- (17) Freris, N., Zouzias, A.: Fast distributed smoothing of relative measurements. In: 51st IEEE Conference on Decision and Control (CDC), pp. 1411–1416 (2012)
- (18) Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M.: Internet of Things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems 29(7), 1645–1660 (2013)
- (19) Gungor, V.C., Hancke, G.P.: Industrial wireless sensor networks: Challenges, design principles, and technical approaches. IEEE Transactions on Industrial Electronics 56(10), 4258–4265 (2009)
- (20) Gupta, P., Kumar, P.R.: The capacity of wireless networks. IEEE Transactions on Information Theory 46(2), 388–404 (2000)
- (21) Hou, I.H., Borkar, V., Kumar, P.R.: A theory of QoS for wireless. In: IEEE INFOCOM, pp. 486–494 (2009)
- (22) Jain, R., Paul, S.: Network virtualization and software defined networking for cloud computing: a survey. IEEE Communications Magazine 51(11), 24–31 (2013)
- (23) Jararweh, Y., Al-Ayyoub, M., Darabseh, A., Benkhelifa, E., Rindos, A.: Software defined cloud: Survey, system and evaluation. Future Generation Computer Systems 58, 56–74 (2016)
- (24) Jararweh, Y., Al-Ayyoub, M., Darabseh, A., Benkhelifa, E., Vouk, M., Rindos, A.: SDIoT: a software defined based internet of things framework. Journal of Ambient Intelligence and Humanized Computing 6(4), 453–461 (2015)
- (25) Kim, K.D., Kumar, P.R.: Architecture and mechanism design for real-time and fault-tolerant etherware for networked control. In: Proceedings of the 17th IFAC World Congress, pp. 9421–9426 (2008)
- (26) Kim, K.D., Kumar, P.R.: Cyber–physical systems: A perspective at the centennial. Proceedings of the IEEE 100(Special Centennial Issue), 1287–1308 (2012)
- (27) Kim, K.D., Kumar, P.R.: Real-time middleware for networked control systems and application to an unstable system. IEEE Transactions on Control Systems Technology 21(5), 1898–1906 (2013)
- (28) Lee, E.A.: Cyber physical systems: Design challenges. In: 11th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing (ISORC), pp. 363–369 (2008)
- (29) Lee, E.A., Seshia, S.A.: Introduction to embedded systems: A cyber-physical systems approach. MIT Press (2011)
- (30) Mitola, J.: Cognitive radio—an integrated agent architecture for software defined radio. Ph.D. thesis, Royal Institute of Technology (KTH) (2000)
- (31) Rajkumar, R.R., Lee, I., Sha, L., Stankovic, J.: Cyber-physical systems: the next computing revolution. In: Proceedings of the 47th Design Automation Conference, pp. 731–736 (2010)
- (32) Satchidanandan, B., Kumar, P.R.: Dynamic watermarking: Active defense of networked cyber–physical systems. Proceedings of the IEEE 105(2), 219–240 (2017)
- (33) Sopasakis, P., Freris, N., Patrinos, P.: Accelerated reconstruction of a compressively sampled data stream. In: 24th European Signal Processing Conference (EUSIPCO) (2016)
- (34) Srikant, R., Ying, L.: Communication networks: an optimization, control, and stochastic networks perspective. Cambridge University Press (2013)
- (35) Vlachos, M., Freris, N., Kyrillidis, A.: Compressive mining: fast and optimal data mining in the compressed domain. The VLDB Journal 24(1), 1–24 (2014)
- (36) Wette, P., Draxler, M., Schwabe, A., Wallaschek, F., Zahraee, M., Karl, H.: Maxinet: Distributed emulation of software-defined networks. In: IFIP Networking Conference, pp. 1–9 (2014)
- (37) Yampolskiy, M., Horvath, P., Koutsoukos, X.D., Xue, Y., Sztipanovits, J.: Taxonomy for description of cross-domain attacks on CPS. In: Proceedings of the 2nd ACM International Conference on High Confidence Networked Systems, pp. 135–142 (2013)
- (38) Zoumpoulis, S., Vlachos, M., Freris, N., Lucchese, C.: Right-protected data publishing with provable distance-based mining. IEEE Transactions on Knowledge and Data Engineering 26(8), 2014–2028 (2014)
- (39) Zouzias, A., Freris, N.: Randomized Extended Kaczmarz for solving least squares. SIAM Journal on Matrix Analysis and Applications 34(2), 773–793 (2013)
- (40) Zouzias, A., Freris, N.: Randomized gossip algorithms for solving Laplacian systems. In: Proceedings of the 14th European Control Conference (ECC), pp. 1920–1925 (2015)