The increasing use of software has enabled considerable changes in business management and contributed to the growing revenues of companies all over the world. Software architecture concerns the structure and organization of components and subsystems and how they interact to form a system. Developers often start coding an application without a previously defined software architecture, that is, without a sense of how its components and connectors will communicate to form the new system. As a result, many projects default to the traditional layered software architecture, also known as n-tier architecture, which separates the source code into modules and packages.
Unfortunately, this practice often results in a disorganized collection of source code with modules that lack clear roles, responsibilities, and relationships. This poor software development practice generates solutions that are fragile, difficult to change, and costly to maintain.
The Internet of Things (IoT) brings additional complexity to software development due to its inherently distributed nature and the massive number of heterogeneous connected devices (sensors and actuators). The process of developing software architectures for IoT involves the interaction of a wide variety of components that play distinct roles. Although there are already initiatives to build these architectures, some challenges remain, such as [6, 7]: 1) connectivity: things need to be connected to follow the IoT paradigm; 2) languages and tools: although there are several programming languages, there is no tool capable of orchestrating thousands of devices in a real-life scenario; 3) natural dynamics of IoT systems: the emergence of new communication protocols, new devices, and new programming languages renders the development of IoT solutions more complex [7, 8].
The development process of IoT-based applications can be enhanced to meet new challenges by using patterns, where a pattern is a description of a set of predefined subsystems, written by experienced developers, that promotes best programming practices. In other words, when a particular type of recurring problem is solved by many developers in a similar manner and the solution is accepted by many other developers, it becomes a pattern. For example, Model-View-Controller (MVC), an architectural pattern widely used in web applications and implemented in several programming languages, proposes a clear separation into three layers: 1) Model: handles data processing and business rules; 2) View: renders the graphical components and notifies the Controller layer; 3) Controller: defines the application behavior regarding data input/output and application navigation. The use of patterns has several advantages, such as [6, 11]: a) software reuse becomes easier since patterns provide a vocabulary and increase the understanding of project solutions; b) standard names become part of a widespread design language, eliminating the need to explain a solution to a given problem with a long description; c) patterns become a means to document software architectures. In other words, patterns contribute to software development with defined properties.
There are several IoT software solutions in the literature, but no work classifies and justifies the components in order to facilitate software architecture decisions for IoT application developers. In general, existing works only show an ad hoc resolution of a specific problem, with no explanation or justification of the choices of software components and connectors. By organizing the components and connectors, as proposed in this paper, it is possible to have a clearer view of how potential components can connect and compose a software architecture to solve a specific problem.
This paper classifies architectural patterns into data ingestion, data interaction, data storage, data integration, data processing, data visualization, and data security. This classification gives the developer a clear view when choosing which software components are best suited for a specific IoT application domain, such as cities, agriculture, health, and industry. We exemplify the use of some identified components and connectors with three examples of applications in cities, buildings, and agriculture. Our primary purpose is to review the literature regarding software architectures for IoT and, based on it, propose a classification for components related to the development of IoT applications. This classification is important because, as previously stated, the choice of software components and of the connectors between them is an essential factor for better decision making, and it also makes the software architecture easier to view and understand. Therefore, this paper has the following contributions:
IoT Architectural Patterns: A classification of existing software architectural patterns and how they relate to the inherent characteristics of IoT.
Software Components for IoT Deployment: Based on the classification of IoT architectural patterns, a set of essential IoT software components and connectors is presented.
Case Studies: Three case studies demonstrate the contribution of patterns, components, and connectors to the literature of the area and to guide future developments of IoT-based applications.
The remainder of this paper is organized as follows. Section 2 reviews the literature, offering a discussion of IoT-related work, and Section 3 presents the methodology used in the development of this paper. Section 4 presents the classification of prominent architectural patterns related to IoT, and Section 5 presents a set of software components and connectors, based on the previous classification, which can be used to organize and improve the choices of a software developer or architect. Section 6 studies some software architectures known in the literature. Section 7 presents a study with three classes of IoT applications, for which software architecture patterns are extracted from the literature. Section 8 presents a discussion of lessons learned, as well as challenges for the future, while Section 9 presents the final remarks.
2 Background and Related Work
There are some widely accepted definitions of software architecture, such as the one given by Kruchten, where “software architecture encompasses the set of significant decisions about the organization of a software system.” More formally, software architecture can be defined as the set of structures needed to reason about the software system, which comprises the software elements, the relations between them, and the properties of both elements and relations. Software architectures also play a fundamental role as a bridge between requirements and implementation. The ISO 42010 standard defines requirements on the description of system architectures, focusing on views as a key integral part of the architecture description that addresses the concerns of stakeholders.
IoT introduces a new level of complexity for software development due to its inherently distributed nature with a vast number of devices and, therefore, a new breed of software architectures is required. The process of building software architectures for IoT involves the interaction of a variety of components with different roles. The IoT-A project proposed an architectural reference model and a preliminary set of building blocks to promote a fully interoperable and scalable vision of IoT. The foundation of the IoT-A Reference Model is the IoT Domain Model, which introduces the main concepts of the Internet of Things, such as Devices, IoT Services, and Virtual Entities (VE), as well as the relations between these concepts. The abstraction level of the IoT Domain Model has also been adopted in this work because its concepts are independent of specific technologies and use cases.
Building interoperable IoT services and applications requires a set of middleware components and system development and deployment tools for rapid software development. In order to avoid developing extremely focused and vertical IoT applications not able to interact with each other, common and generic middleware services used by different application domains become necessary. Razzaque et al.  identified a variety of requirements in different categories for an IoT Middleware, which are also aligned with the functionality groups identified by IoT-A. One of these IoT requirements is to offer a means for enabling the cooperation between objects and humans and creating awareness about the surrounding environment (context awareness) in a fully connected environment. Context-aware systems can be defined as systems that adapt their behavior to the current context conditions without explicit user intervention . Currently, with the advent of the Internet of Things and the countless applications in different areas, context-aware management has been increasingly used within the scope of big data analytics [20, 21, 22].
Lee et al.  proposed five design patterns related to the security of IoT systems, which are secure logger pattern, input validation pattern, secure directory pattern, secure adopter pattern, and exception manager pattern. Qanbari et al. 2016  proposed design patterns for applications that use edge computing: 1) edge computing pattern to handle the provision of all edge devices automatically; 2) source code deployment pattern for edge devices to handle the deployment of the code to all devices connected to the IoT system; 3) edge orchestration pattern to handle the automation of creation, monitoring, and deployment of resources in the IoT Environment. Brambilla et al.  proposed a set of design patterns for user interaction with IoT applications. Graphical user interface patterns are addressed, which involve the development of solutions for IoT, as well as an Interaction Flow Modeling Language (IFML) to express content, user interaction, and front-end behavior control of software applications. This paper recognizes the importance of design patterns for data visualization but addresses the theme more broadly.
Reinfurt et al.  studied many IoT solutions and extracted five lower-level design patterns for IoT devices: 1) device connection pattern on the network; 2) rule-based inference pattern, whose rule specification language exempts the user from knowing programming languages; 3) device activation trigger pattern, used to send messages to devices that are not connected to the server but can listen to the messages; 4) wake device pattern, which allows disconnecting devices from the network to reconnect them when needed; 5) remote lock and cleaning pattern, used to control stolen or missing devices. Koster  briefly discusses some design patterns for connected devices, IoT use cases, information models, interaction, application programming, infrastructure, and IoT security. These design patterns provide a better solution for building an architecture model for IoT. However, Koster only addressed IoT-related design patterns but did not relate design patterns to components of IoT software architectures, as presented in this paper.
Silva et al.  discuss the main software architecture projects being proposed in the literature, mainly an approach to the requirements involved in each project and their respective software architectures. The authors also discuss the importance of software requirements in Smart Cities. Although this paper discusses some components, it does not classify and associate them with patterns of existing software architectures. In comparison, in this paper, our discussion is focused on components and connectors of an IoT software architecture. Yin et al.  also address projects for smart cities, presenting an understanding of different smart city domains: government, citizens, business, and environment. They also analyze software architectures with particular attention to the data. They present some research challenges and propose a four-layer software architecture: data acquisition, data vitalization (the relationship between physical and virtual data), data-related services, and application domain. Yin et al. discuss theoretical aspects of smart cities and do not contextualize practices of choosing components and connectors of a software architecture.
Ray et al.  present the state of the art in IoT and the open problems associated with IoT software architectures, also discussing essential concepts behind them. The work focuses on specific architectures of the area of IoT applications, highlights the challenges and enables future research opportunities in software architectures and IoT as a whole. Mahmoud et al.  describe a three-layer architecture - perception, network, and application layers - and present different challenges related to security and IoT devices. They also present the IoT security state of the art and future work related to IoT. Here, the discussion is focused on components, and connectors of an IoT software architecture, whereas Mahmoud et al. discuss some components, but do not classify and associate them with existing architectural patterns.
Santana et al.  analyzed 23 software platforms for smart cities based on functional and non-functional requirements, and classified them into four categories: Cyber-Physical Systems, Internet of Things, Big Data, and Cloud Computing. As an outcome of this study, they proposed a reference architecture to guide the development of next generation smart cities platforms, highlighting a variety of system domains that may facilitate software development, such as urban mobility, air pollution and healthcare. While this paper has a role in the development of smart cities platforms, it follows a traditional approach based on requirements. On the other hand, here we focus on different architectural patterns for the development of IoT smart applications, not only for smart cities.
Finally, the well-known survey of Atzori et al.  presents algorithms, protocols, and solutions for IoT, as well as open challenges. They discuss different views of the IoT paradigm, according to the evolution up to 2010, describing the main classes of possible applications. However, they do not address software architectures and patterns for IoT application development because, at the time of their publication, the challenge was understanding potential applications, protocols, and technologies.
The papers mentioned above made significant contributions to literature and practice, focusing mostly on IoT devices and smart city concepts. However, they do not focus on building software architectures and development patterns for IoT applications, with a study that covers the interaction between components and their connectors. Instead, this paper addresses design patterns for developing IoT systems more broadly, including aspects such as data ingestion, interaction, storage, visualization, and processing. To the best of our knowledge, the existing literature does not jointly cover software architecture patterns, components, and connectors, together with examples of IoT applications. Our goal is to properly contextualize this area, which is likely to show significant growth in the coming years, and to guide software developers in building advanced IoT applications.
Since this paper seeks to present a classification of architectural patterns required to implement an IoT solution, a methodological approach based on hands-on learning gained from real IoT usage scenarios has been adopted. In particular, this work builds on the experience gained from two international projects in tool development: a) IMPReSS  and b) SWAMP .
The IMPReSS project focused on the efficient management of electricity in public buildings, but its results can also be applied in other scenarios that aim to make society smarter. This project created a platform that allows quick development of applications for context-sensitive scenarios. The platform comprises a variety of components that make the task of developing IoT-enabled applications more straightforward, including tools for agile development of user interfaces, data storage with recognition, context analysis and management, mixed-criticality management, and wireless IoT communication management. The experience with IMPReSS has brought two critical lessons. Firstly, arbitrary decisions about patterns, architectures, and software components do not necessarily meet the needs of IoT applications. Such decisions were carried over from experience with developing applications that have different characteristics from IoT, and the resulting solutions did not deliver the expected results. For example, achieving performance compatible with IoT applications with thousands or millions of sensors required many adjustments to the architecture and to the connection of the components. Secondly, the pattern of connection between software components has a significant influence on the performance and the scalability of solutions, which is often overlooked in the literature. For example, with the IMPReSS context-aware manager, many combinations of components and connectors were tested to attain a performance compatible with IoT needs.
More recently, the SWAMP project seeks to develop IoT-based methods and approaches for smart water management in the precision irrigation domain and to deploy the results obtained by the project in four places: two pilots in Europe (Italy and Spain) and two pilots in Brazil. Besides, this project aims to improve precision irrigation by monitoring field status (size, growth phase) based on crop and environment, and adjusting the irrigation plan. Water management pilots aim to ensure that the technological components are flexible enough to adapt to different contexts and to be replicable to different locations and configurations . The experience with SWAMP, still under development, is complementary to IMPReSS. Firstly, simpler architectures centered on the efficient data distribution (for example, using FIWARE Orion Context Broker), sharing data processing and storage among different software components, add strength and scalability to solutions. Secondly, the environment where IoT applications run is inherently distributed, involving different devices and locations for processing and use. The sensors and actuators are installed on the field (farms) and communicate with intermediate elements, called IoT gateways or radio gateways, which may or may not use fog or edge computing support. Data is sent to the cloud for processing according to the application’s models (irrigation) and the consequent generation of user interface services.
Hands-on experience with the development of IoT applications for smart buildings and precision irrigation has led to the need for extensive state-of-the-art study (still in an early state), experimentation with different technologies and concepts not yet fully consolidated and understanding how choices affect functionality and performance. In other words, best practices were understood and classified during the process of developing IoT applications. In general, the literature presents the use of IoT devices (sensors and actuators), device technologies (e.g., Arduino, Raspberry Pi, Gateways), wireless communication technologies (e.g., LoRaWAN), protocols (e.g., MQTT, CoAP), IoT platforms (e.g., FIWARE, AWS IoT, Google IoT) and data management and processing systems designed for applications such as online social networking (e.g., Kafka, Spark). Design choices and component connection patterns are often arbitrary, meaning that there is no apparent justification for the need, requirements, suitability, and trade-offs of existing solutions to address the challenges of the applications. The purpose of this paper is to shed some light on this discussion and provide insights to guide and justify decisions related to IoT application development projects.
Based on the lessons learned from these research projects, a classification for design patterns has been defined, consisting of seven classes of patterns, which should be analyzed, considered, and investigated during the structuring, design, and development of a solution for a real IoT problem.
4 Architectural Patterns
Here we explore the following aspects considered necessary for IoT software development, compiled from our experience and relevant literature in the area:
Data Ingestion: specifies the management of message input, which also involves output and transmission among components.
Data Interaction: explores how components exchange messages in a system.
Data Storage: relates to data storage and retrieval in a system.
Data Integration: involves computational techniques to combine data from different sources .
Data Processing: focuses on data processing, whether for decision making or transformation of data into information.
Data Visualization: explores techniques for the visualization of information by users.
Data Security: explores methods and techniques for enabling applications to protect their data and physical devices.
These aspects are detailed as follows in the context of IoT software architectures.
4.1 Data Ingestion
In most scenarios involving real applications, data input, or ingestion, plays a vital role. Data ingestion is characterized by the existence of many data sources, and processing becomes more complex as the number of data sources increases. Some of the main patterns for data ingestion are:
Multisource Extractor: refers to the ingestion of multiple data sources efficiently . The multisource extractor pattern is recommended in scenarios where large data collections are available in different application domains, and it is necessary to investigate these data sets generally found in databases that do not follow the relational model [40, 41]. Generally, in large data ingestion systems, enrichers are used to aggregate and clean the initial data. A reliable enricher transfers, validates, reduces noise, and compresses files and transforms them into a native format to deliver a representation that is easy to interpret . For example, Bashir and Gill  used the concept of enrichers when storing data coming from sensors in the Hadoop Distributed File System (HDFS) with Apache Flume as the message bus.
Protocol Converter: employs a mediation component to provide an abstraction for data received from different protocol layers. The protocol converter pattern is applicable in scenarios with a wide range of unstructured data from sources using different protocols and data formats. This situation often occurs in IoT, as sensors from different manufacturers with different communication technologies transfer data in different formats. When data sources use several different protocols, conversion is required to standardize the structures of the many different messages, making it possible to analyze the information using an analysis tool. This pattern is common in IoT middleware, which tends to convert several communication protocols transferring data from many sources, such as sensors. Marosi et al.  have created an IoT software architecture for collecting weather data, images, and soil data to enable precision agriculture. Protocol conversion allows storing the information coming from these many sources.
Multidestination: used in scenarios where processing components need to transport data to many storage destinations, such as Hadoop Distributed File System (HDFS), data lakes , or real-time analytical engines . The multidestination pattern is similar to the multisource ingestion pattern. Cenni et al. 
Just-in-Time Transformation: traditionally, large amounts of unstructured data are processed in batches using ETL (Extract, Transform, Load) tools and methods. In the just-in-time pattern, however, data is transformed only when needed, which saves computing time, as described in Section 6.1. Colmenares et al.  introduce a data storage system capable of ingesting sensor data at very high rates with query times suitable for large-scale smart applications. This work used global and local data structures to store data in storage components, with indexes updated as insertions occur. Thus, data transformation occurs as a data set is inserted.
Real-Time Streaming: some issues require instant analysis of the data at the moment it is generated. Under these circumstances, ingestion and real-time analysis of streaming data are required. Ta-Shma et al.  proposed an architecture for real-time data ingestion and historical data processing. Typically, IoT applications must respond to real-time events based on the knowledge of past events. The Real-Time Streaming design pattern is used to process events at a pace compatible with the needs of smart applications.
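To make the Protocol Converter pattern from the list above more concrete, the following is a minimal sketch rather than the implementation of any cited middleware. It assumes three hypothetical payload formats (a JSON document, a comma-separated text line, and a 7-byte binary frame) and a hypothetical common record with the fields `sensor_id`, `metric`, and `value`. All function names, field names, and metric codes are illustrative assumptions.

```python
import json
import struct

# Hypothetical numeric codes used by the binary payload format below.
METRIC_CODES = {1: "temperature", 2: "soil_moisture"}


def from_json(payload: bytes) -> dict:
    """Decode a JSON payload such as {"id": "dev-7", "type": "temperature", "v": 21.5}."""
    doc = json.loads(payload)
    return {"sensor_id": doc["id"], "metric": doc["type"], "value": float(doc["v"])}


def from_csv(payload: bytes) -> dict:
    """Decode a text payload such as b"dev-7,temperature,21.5"."""
    sensor_id, metric, value = payload.decode("utf-8").strip().split(",")
    return {"sensor_id": sensor_id, "metric": metric, "value": float(value)}


def from_binary(payload: bytes) -> dict:
    """Decode 7 big-endian bytes: device number (uint16), metric code (uint8), value (float32)."""
    number, code, value = struct.unpack(">HBf", payload)
    return {"sensor_id": f"dev-{number}", "metric": METRIC_CODES[code], "value": value}


# Registry of converters; a new source format only needs a new entry here.
CONVERTERS = {"json": from_json, "csv": from_csv, "binary": from_binary}


def convert(source_format: str, payload: bytes) -> dict:
    """Mediation step: route a payload to its converter and return the common record."""
    try:
        converter = CONVERTERS[source_format]
    except KeyError:
        raise ValueError(f"no converter registered for {source_format!r}") from None
    return converter(payload)
```

In this sketch, downstream components only ever see the common record, so adding a sensor family with a new format does not affect storage or analysis code.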
4.2 Data Interaction
Data interaction patterns describe how different components of a system interact and communicate with each other, including communication protocols. Data Interaction is different from Data Ingestion because the former focuses on the best way to establish a communication between two system components, whereas the latter aims at understanding the mechanisms used to enter data into the system. Some data interaction patterns are presented below.
proposed InterSCity, an open-source platform for smart cities based on microservices. Its goal is to provide a high-quality, modular, scalable, and reusable middleware infrastructure to support smart city solutions. All microservices offer communication via RESTful based on the Request/Response pattern. Almeida et al.  present the Thing Broker, a Web of Things platform that provides RESTful interfaces  using a set of abstractions to enable communication with Twitter based on the Request/Response pattern via the Twitter Streaming API.
propose a generic open-source architecture based on components to combine machine learning with data processing to predict complex events for IoT applications. Node-RED provides the ingestion of data from different sources, such as MQTT  or RESTful APIs. After the data is ingested via Node-RED, it is published to Apache Kafka  to be accessed by machine learning algorithms . Apache Kafka is an example of an asynchronous messaging component.
Publish/Subscribe: a data distribution pattern that reduces network traffic by allowing a publisher to send a message only once to an intermediary, usually called a broker, which in turn forwards the message to the subscribers. MQTT, one of the most commonly used IoT protocols today for collecting data from IP-connected devices, adopts the publish/subscribe communication model. Chen and Lin  implemented an MQTT Proxy in their RESTful architecture, comparing latency and performance between protocols. However, they do not discuss how to implement a RESTful-like functionality with MQTT. Zyrianoff et al.  used the MQTT protocol to send data from the sensors to the data fusion module.
Synchronous Messaging: allows software components to send messages synchronously in real-time to other software components. Zyrianoff et al.  conducted a study on the impact of storing data synchronously (the application waits for a response from the database) or asynchronously (the application queues data to be stored in the database by another service and resumes its activity). In other words, whenever an application waits for a confirmation from the database that the data was stored, the message exchange between the components is synchronous.
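The difference between synchronous and asynchronous storage described above can be illustrated with a small sketch. It is not the setup evaluated in the cited study; the `InMemoryStore` class stands in for a database, and the class and function names are assumptions made for illustration. In the synchronous path, the caller blocks until the store acknowledges the write. In the asynchronous path, the caller only enqueues the reading, and a separate worker thread performs the write.

```python
import queue
import threading


class InMemoryStore:
    """Stand-in for a database; save() returns an acknowledgement."""

    def __init__(self):
        self.rows = []

    def save(self, row: dict) -> bool:
        self.rows.append(row)
        return True


def store_sync(store: InMemoryStore, row: dict) -> bool:
    """Synchronous exchange: the caller waits for the store's confirmation."""
    return store.save(row)


class AsyncWriter:
    """Asynchronous exchange: callers enqueue rows and resume immediately."""

    def __init__(self, store: InMemoryStore):
        self._store = store
        self._queue = queue.Queue()
        # Daemon worker so the sketch does not block interpreter shutdown.
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, row: dict) -> None:
        self._queue.put(row)  # returns without waiting for the write

    def _drain(self) -> None:
        while True:
            row = self._queue.get()
            self._store.save(row)
            self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued row has been written."""
        self._queue.join()
```

The asynchronous variant trades an immediate confirmation for lower latency on the caller side, which is why `flush()` is needed whenever the application must know that all queued data has been persisted.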
4.3 Data Storage
The need to store large volumes of data from different sources has forced databases to follow new rules of relationships and integrity, which differ from the traditional relational database management systems (DBMS). Some Data Storage patterns that can be used in IoT systems are presented below.
SQL: relational databases rely on the Structured Query Language (SQL) to define and manipulate data. SQL is one of the most versatile and widely used options, a safe choice that is especially suited for complex queries. It requires predefined schemas (visual and logical architectures) to determine the structure of the data before working with it, and all the data must follow the same structure.
In most situations, SQL databases scale vertically: performance under increasing load is improved by upgrading the resources of a single server, such as CPU, RAM, or SSD. Phan et al.  studied different types of cloud databases, assessing and comparing them, as well as pointing out their differences in performance, usage, and complexity. Their focus was on assessing the most common types of IoT data, with extensive experiments using four prominent databases, including MySQL.
However, relational DBMSs follow the ACID rules of atomicity, consistency, isolation, and durability, which make the database reliable for its users. Storing and retrieving the large volumes of data generated by IoT applications conflicts with the ACID properties, since enforcing them can make the database process data more slowly than it is generated. To address these new challenges brought by IoT, new data storage patterns have been adopted.
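The predefined schema and the atomicity guarantee discussed above can be observed in a few lines using SQLite through Python's standard `sqlite3` module. This is only a sketch: the `readings` table and the `insert_batch` helper are hypothetical names, and SQLite stands in for any relational DBMS.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a fixed, predefined schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings ("
        " sensor_id TEXT NOT NULL,"
        " ts INTEGER NOT NULL,"
        " value REAL NOT NULL)"
    )
    return conn


def insert_batch(conn: sqlite3.Connection, rows: list) -> bool:
    """Insert all rows or none of them (atomicity)."""
    try:
        # The connection context manager commits on success and
        # rolls back the whole transaction if an exception is raised.
        with conn:
            conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

A batch containing one row that violates the schema (for example, a missing value) is rejected as a whole, which is the reliability that ACID provides and also part of the per-write cost that becomes noticeable at IoT ingestion rates.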
NoSQL: NoSQL brings together a range of databases that do not follow the relational model and thus eliminate the fixed schema, avoid joins, and facilitate scalability. In SQL, schemas are predefined to determine the structure of the data before working with it, and all the data must follow the same structure. NoSQL databases, on the other hand, have a dynamic schema for unstructured data and store data in many different ways. NoSQL stands for "Not Only SQL" or "Not SQL". While relational databases use SQL to store and retrieve data, NoSQL encompasses a wide range of database technologies to store structured, semi-structured, unstructured, and polymorphic data.
Key-value databases store data as simple key and value pairs. The keys are unique and have no restrictions. This technology does not embrace concepts such as foreign keys and referential integrity. Key-value stores are suitable for parallel searches because the data sources do not have relationships between them. Due to the lack of referential integrity, integrity must be managed by the applications using the data.
Column-oriented databases have many columns, each with its key, for each tuple. Related columns have a column family qualifier to enable joint retrieval during a search. Since each column also has a key, these databases are suitable for fast writes .
Document databases store text, media, JSON, or XML data. This type of NoSQL database is suitable for cases where it is necessary to search across many documents for a specific query.
Graph databases store data entities and the connections between them as vertices and edges, to which graph and social network algorithms and metrics may be applied, such as shortest paths and centrality. Cecchinel et al.  describe a software architecture that collects IoT data. In such a situation, the data is sent by the sensors and stored in storage components. This architecture faces several challenges, e.g., data storage, avoiding processing bottlenecks, sensor heterogeneity, and high productivity. To deploy the database, the authors opted for MongoDB, which is a NoSQL database.
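As an illustration of the graph data model mentioned above, the sketch below stores entities and connections as vertices and edges and computes two of the metrics cited in the text: a shortest path (using breadth-first search on an unweighted graph) and degree centrality. It uses plain Python structures rather than a graph database, and the device topology in the usage note is a hypothetical example.

```python
from collections import deque


def build_adjacency(edges: list) -> dict:
    """Turn a list of undirected (a, b) edges into an adjacency map."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(edges: list, start: str, goal: str):
    """Breadth-first search; returns the list of vertices or None if unreachable."""
    adjacency = build_adjacency(edges)
    previous = {start: None}  # also serves as the visited set
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                frontier.append(neighbor)
    return None


def degree_centrality(edges: list) -> dict:
    """Number of direct connections per vertex (unnormalized degree)."""
    return {node: len(neighbors) for node, neighbors in build_adjacency(edges).items()}
```

For a hypothetical topology in which three sensors reach a dashboard through two gateways and one broker, `shortest_path` returns the hops a reading traverses, and `degree_centrality` highlights the vertices whose failure would disconnect the most devices.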
HDFS: the Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop, supporting high-speed data transfer rates between compute nodes. It was initially closely linked to MapReduce, a programmatic framework for data processing [71, 72], and today it is also adopted by modern big data processing systems.
When HDFS collects data, it divides the information into separate blocks and distributes them to different nodes in a cluster, thus enabling a highly efficient parallel processing. Fault-tolerance was also a design guideline for HDFS. The file system replicates or copies each piece of data many times and distributes the copies to individual compute nodes, placing at least one copy in a server rack different from the others. As a result, data on unavailable compute nodes can be found elsewhere within a cluster, ensuring continuity of processing while data is retrieved [71, 72].
HDFS uses a master/slave architecture. In its early versions, each Hadoop cluster consisted of a single master (NameNode) that managed file system operations and supporting slaves (DataNodes) that managed data storage on individual compute nodes [71, 72, 73]. Miner and Sook present different design patterns involving MapReduce, including some that use HDFS to store data and distribute its processing.
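The block splitting and rack-aware replication described above can be illustrated with a minimal sketch. It is not HDFS code: the function names, the round-robin placement, and the assumption that every rack has at least `replicas - 1` nodes are simplifications introduced here for illustration only.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes_by_rack: dict[str, list[str]],
                   replicas: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replicas` distinct nodes spanning two racks.

    Hypothetical policy: one replica on a "home" rack, the others on the
    next rack, so that losing a whole rack never loses a block.
    """
    racks = sorted(nodes_by_rack)
    if len(racks) < 2:
        raise ValueError("need at least two racks")
    placement = {}
    for block in range(num_blocks):
        home = nodes_by_rack[racks[block % len(racks)]]
        remote = nodes_by_rack[racks[(block + 1) % len(racks)]]
        if len(remote) < replicas - 1:
            raise ValueError("remote rack has too few nodes")
        chosen = [home[block % len(home)]]
        for k in range(replicas - 1):
            chosen.append(remote[(block + k) % len(remote)])
        placement[block] = chosen
    return placement
```

With two racks of two nodes each, every block ends up on three different nodes, at least one of which sits in a different rack from the others.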
Polyglot: systems can use multiple types of storage, such as relational databases, files, HDFS, and NoSQL. Liu et al.  proposed a model for smart cities where data management uses ingestion tools to process large amounts of data. To create this ETL-based data ingestion tool, the authors used many storage components and deployed the Polyglot pattern.
This section discussed the major design patterns for data storage. Stored data must also be integrated, and the patterns for doing so are presented in the next section.
4.4 Data Integration
Data moves across systems but is not always in a standard format. Data integration aims to make data usable across all systems of interest so that they can be accessed and manipulated by their composing subsystems . Data integration patterns facilitate data usability by standardizing the integration process. Some data integration patterns are:
Migration: migration means moving a dataset from one system to another. A migration contains a source system in which data resides before execution, a criterion that determines the scope of the data to be migrated, a transformation that the dataset will go through, a destination system into which the data will be inserted, and a mechanism that captures the migration results so that the final state can be compared with the desired state. Migrations are essential for all data systems and are used extensively in any organization that has data operations.
Aggregation: aggregation is the act of collecting or receiving data from multiple systems and entering it into a single system. The aggregation pattern derives its value from extracting and processing data from multiple systems into a unified application . As an outcome, data is up-to-date at the exact moment it is needed, is not replicated, and can be processed to generate the desired dataset. The aggregation pattern is ideal for creating orchestration APIs to modernize legacy systems, especially for APIs that get data from multiple systems and then process the data into a single response. Bellini et al.  proposed a smart city architecture that includes a data aggregation layer using a multidomain ontology, creating a knowledge base for the city (some sort of expert system capable of inference).
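The two integration patterns above can be sketched as follows. The function names and the dictionary-based report are hypothetical and only mirror the elements listed in the text: a source, a criterion, a transformation, a destination, and a result capture for migration, and an on-demand merge of several systems for aggregation.

```python
def migrate(source, destination, criterion, transform):
    """Move the records selected by `criterion` from source to destination.

    Returns a small report so the final state can be compared with the
    desired state.
    """
    report = {"selected": 0, "written": 0}
    for record in source:
        if criterion(record):
            report["selected"] += 1
            destination.append(transform(record))
            report["written"] += 1
    return report


def aggregate(systems, device_id):
    """Query every system on demand and merge the answers into one response.

    `systems` maps a system name to a lookup function; nothing is copied
    or replicated ahead of time.
    """
    response = {"device": device_id}
    for name, lookup in systems.items():
        response[name] = lookup(device_id)
    return response
```

In the aggregation case the data stays in its original systems and is read only when the unified response is requested, which is what keeps it up to date.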
4.5 Data Processing
Data processing is the conversion of raw data into meaningful information. Data is manipulated to generate results that guide and help to solve a problem or improve an existing situation.
MapReduce: MapReduce is a model developed for programming parallel applications using the divide-and-conquer paradigm with two phases. In the first phase, called Map, the data is separated into key and value pairs, divided into fragments, and distributed to the compute nodes for processing. In the second phase, called Reduce, the results of the processing performed by the compute nodes are combined by a master node, which generates a single response to the request made by the user. MapReduce is an adequate solution for a wide range of domains, including data mining and machine learning, social media analysis, financial analysis, image retrieval and processing, simulation, website tracking, machine translation, and bioinformatics. Currently, MapReduce is considered one of the most important parallel programming models for distributed environments.
Apache Hadoop  is the most widely used open-source MapReduce implementation. It can be adopted for developing distributed and parallel applications using different programming languages. Hadoop relieves developers from having to deal with classical distributed computing issues, such as load balancing, fault tolerance, data locality, and network bandwidth saving .
Qureshi et al. propose storing data block replicas in edge components and running MapReduce to group the data processed by the edge, thus reducing both the network workload of moving large data sets and the latency on the network.
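The two MapReduce phases can be illustrated with a single-process sketch. This is not Hadoop code: the input format (`"sensor,temperature"` lines), the function names, and the choice of computing the maximum temperature per sensor are assumptions made for the example, and the "compute nodes" are simulated by plain list fragments.

```python
from collections import defaultdict


def map_phase(fragment):
    """Map: turn each raw line of one fragment into a (key, value) pair."""
    pairs = []
    for line in fragment:
        sensor, temperature = line.split(",")
        pairs.append((sensor, float(temperature)))
    return pairs


def shuffle(pairs):
    """Group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, max(values)


def map_reduce(fragments):
    """Run Map on every fragment, then Reduce once per key."""
    pairs = [pair for fragment in fragments for pair in map_phase(fragment)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster each fragment would be mapped on a different node and the grouped values would travel over the network before being reduced.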
Functional: this is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids mutable state and data. It emphasizes the application of functions, in contrast to imperative programming, which emphasizes changes in the program state [83, 84]. Functional programming is emerging as a paradigm for distributed data (big data) processing systems such as Apache Spark and Flink, which use functional interfaces to make it easy for programmers to write data applications in an uncomplicated and declarative manner. In functional programming, interfaces are specified as functions applied to the input from data sources. Compared to object-oriented programming, functional programming is more compact and intuitive for representing data-driven transformations and applications.
Feng et al.  propose an IoT-based processing system of hydrological rainfall data using Apache Flink . Experimental results show that the processing capacity of the hydrological data processing system using Apache Flink is much higher than the traditional multilayered architecture system based on Java EE (Enterprise Edition) or pure NoSQL databases. The authors consider this solution suitable for the automation of water conservation systems.
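A minimal sketch of the functional style, using only Python built-ins rather than Spark or Flink: the pipeline below is a chain of function applications with no variable being updated in place. The function name, the Fahrenheit input, and the validity range are assumptions chosen for the example.

```python
from functools import reduce


def mean_valid_celsius(readings_f):
    """Convert Fahrenheit readings, drop implausible ones, and average the rest."""
    celsius = map(lambda f: (f - 32) * 5 / 9, readings_f)
    valid = filter(lambda c: -40 <= c <= 60, celsius)
    # Fold the stream into a (sum, count) pair without mutating anything.
    total, count = reduce(lambda acc, c: (acc[0] + c, acc[1] + 1), valid, (0.0, 0))
    return total / count if count else None
```

The same map, filter, and reduce vocabulary is what the functional interfaces of distributed engines expose, which is why such pipelines tend to read as declarative descriptions of a transformation.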
Statistical: statistical programming refers to computing techniques that assist in data analysis. Statistical programming packages offer a wide variety of techniques for exploring large data sets and creating charts to improve understanding of the results. These packages support statistical techniques such as linear and nonlinear modeling, classification, grouping, and time series analysis .
Nesa et al. present an IoT architecture for detecting errors and events in a forest environment with the help of four statistical models, considering the spatial and temporal dependencies of the data. Simulation results show that the models can effectively detect both types of outliers, with an accuracy of up to 100% for error detection, and up to 98.51% for event detection.
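As a simple illustration of statistical processing on sensor data, the sketch below flags outliers with a z-score test. It is not one of the four models used by Nesa et al.; the function name and the default threshold are assumptions, and the test ignores spatial and temporal dependencies.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all readings identical, nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]
```

A reading flagged this way could be either a faulty measurement or a genuine event; telling the two apart requires additional context, such as the readings of neighboring sensors.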
SQL-Like: NoSQL databases address several issues related to data storage and management, but in many cases, they are not suitable for performing analyses. Although systems based on MapReduce can handle scalability issues and decrease query times, users with little knowledge of this approach spend considerable time performing simple operations such as computing a sum or an average. To address this scenario, some systems have been developed to facilitate access to the query resources of MapReduce-based systems, such as Hadoop, and the development of data analysis applications using a language similar to SQL.
Grover et al.  benchmarked several big data SQL-type technologies on the Hadoop Distributed File System (HDFS) used in clinical trial databases.
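The appeal of an SQL-like interface is that an average per sensor becomes a one-line declarative query. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for an SQL-on-Hadoop engine; the table name and columns are hypothetical.

```python
import sqlite3


def average_by_sensor(rows):
    """Load (sensor, temperature) rows and compute the average per sensor in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, temperature REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT sensor, AVG(temperature) FROM readings "
        "GROUP BY sensor ORDER BY sensor"
    ).fetchall()
    conn.close()
    return result
```

Expressing the same aggregation directly in MapReduce requires writing map, shuffle, and reduce logic by hand, which is the effort these SQL-like layers aim to remove.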
Data Fusion: this is a technique that combines data from multiple sources and associates them to increase accuracy and improve inferences when compared to the information obtained from only one data source [37, 91, 92]. Zyrianoff et al.  and Kamienski et al.  used data fusion to combine data from temperature sensors. This fusion was based on the average in a smart city scenario. By merging the data, it was possible to preprocess data before sending it to the next component of the software architecture in the smart city scenario.
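A minimal sketch of average-based fusion, in the spirit of the temperature example above. The tuple layout `(room, sensor_id, temperature)` and the function name are assumptions; the cited works use dedicated stream processing components rather than plain Python.

```python
def fuse_by_room(readings):
    """Fuse readings from several sensors into one average value per room.

    `readings` is an iterable of (room, sensor_id, temperature) tuples.
    """
    sums = {}
    for room, _sensor_id, temperature in readings:
        total, count = sums.get(room, (0.0, 0))
        sums[room] = (total + temperature, count + 1)
    return {room: total / count for room, (total, count) in sums.items()}
```

The fused value is what the next component receives, so a single noisy sensor has less influence than it would if its raw readings were forwarded directly.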
Rule Engine: this is a solution for managing business rules in constant evolution by separating the knowledge of the business rules from their deployment in the system of interest . Zyrianoff et al.  and Kamienski et al.  used a business rule inference engine to verify what action an actuator inserted in a smart city scenario should take based on the data processed by the data fusion component. Thus, business rules can be changed at any time because they have been separated from the logical deployment of the system to manage a smart city scenario.
Bashir et al. analyzed large amounts of smart building data using sensors to measure the concentration of oxygen and other gases in indoor environments. The authors propose a three-layer framework (IoT sensors, data management, and data analysis) and use a processing component that analyzes data stored in HDFS in real time. If the oxygen concentration level captured by the sensor is within a threshold preset as comfortable, no action is required. Otherwise, an oxygen pump is activated until oxygen levels are within a comfortable limit for the user.
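The separation between business rules and application code can be sketched with rules held as plain data and a small generic evaluator. This is not how Drools or any other production engine works internally; the rule format, the fact names, and the threshold values are all assumptions made for illustration.

```python
import operator

# Comparison operators a rule may refer to by name.
OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}

# Rules are data, so they can be edited or reloaded without touching the engine.
# Fact names and thresholds below are hypothetical.
RULES = [
    {"fact": "oxygen_pct", "op": "<", "value": 19.5, "action": "start_oxygen_pump"},
    {"fact": "temperature_c", "op": ">", "value": 26.0, "action": "turn_on_air_conditioner"},
]


def infer_actions(facts, rules=RULES):
    """Return the actions of every rule whose condition holds for `facts`."""
    actions = []
    for rule in rules:
        name = rule["fact"]
        if name in facts and OPS[rule["op"]](facts[name], rule["value"]):
            actions.append(rule["action"])
    return actions
```

Changing a threshold or adding a rule only modifies the `RULES` list, which is the property that makes a rule engine attractive when business rules evolve constantly.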
4.6 Data Visualization
Data visualization has changed in recent years, evolving from a simple visual representation into analysis techniques to aid the interpretation of data in a broader sense.
Mashup View: Mashup View is used to maximize query performance by storing a view of mashups aggregated in the storage layer . This data visualization pattern reduces analysis time by aggregating the result into a storage layer. Blackstock and Lea  propose a platform that provides a simple way for users to find, control, view, and share device data. This platform is based on mashups. Soukaras et al.  present a development platform with a toolkit called the IoTSuite, consisting of an editor, a compiler, and a development module connected to a mapper and a link connector. This development platform is based on mashups to connect all IoTSuite components.
Portal: an organization that has a corporate portal can follow this pattern and reuse it for big data visualization. Merlino et al.  propose an extension of OpenStack in a smart city scenario to manage sensors, analyze data, and provide a real-time sensor and data visualization panel.
4.7 Data Security
Implementing security measures is critical to ensure the proper operation of networks carrying data from IoT devices. Several challenges prevent the protection of IoT devices and end-to-end security in an IoT environment. Security was not always considered a top priority during the product design phase. Besides, since IoT is a new and expanding market, many product designers and manufacturers are more interested in getting their products to the market quickly than in taking the necessary steps to ensure security from the design phase onward [99, 100].
A significant vulnerability in IoT systems is the use of standard or embedded passwords, which may lead to security breaches. Even if passwords are changed, they are usually not strong enough to prevent hacking. Another common problem faced by IoT devices is the restriction of computational resources to implement strong security, which generates weaknesses across multiple devices. For example, sensors that monitor soil moisture or air temperature cannot handle advanced encryption or other security measures. Also, because many IoT devices are configured and forgotten - that is, they are put in the field and left there until the end of their life - they rarely receive security updates or patches [99, 101, 102].
The lack of industry-accepted standards also undermines security in IoT. Although there are many proposed IoT security approaches, there is no consensus on a preferred one. Large companies and organizations may have their specific standards, while specific segments, such as industrial IoT, have proprietary and incompatible industry-led standards. The diversity of standards makes it difficult not only to protect systems, but also to ensure interoperability between them. Some security patterns proposed by the literature are presented as follows:
Authentication: authentication enables the integration of different IoT devices and their deployment in many scenarios such as smart cities and smart agriculture. Authentication involves validation between peers of IoT devices before exchanging information to ensure that the data source is legitimate, i.e., devices of interest to the application [103, 104]. Gubbi et al.  focused on a standard authentication scheme for IoT between different layers and terminal nodes. The scheme is based on hashing and extraction of shared elements to prevent interference attacks. This scheme essentially provides an adequate security solution for authentication in IoT. The extraction procedure comprises some irreversible properties that guarantee security in the IoT domain.
Authorization: authorization involves allowing access rights to resources such as sensors or data. Data should be safe and accessible only to authorized users and systems. Gaur et al.  proposed the authentication of IoT sensor nodes that relies on a unique coding request and response scheme. The scheme uses a preshared matrix, applying a cipher of a variable when the communication involves many parties. Each communication (message exchange) between the parties is encrypted using a node key and identifier with a timestamp.
Physical Security: there is a concern about how to protect memory software and its vulnerabilities at runtime. Solutions to this problem may have to consider the specific programming languages used by IoT devices, such as the use of TinyOS . Using software management such as patching, software firmware, and remote updates can help to physically protect IoT devices [103, 104].
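The idea of validating a peer before exchanging information, using a pre-shared key, a device identifier, and a timestamp, can be illustrated with a generic HMAC-based challenge-response. This sketch is not the scheme of Gubbi et al. or Gaur et al.; the message layout, the function names, and the 30-second freshness window are assumptions.

```python
import hashlib
import hmac


def make_response(shared_key: bytes, device_id: str, challenge: bytes,
                  timestamp: int) -> str:
    """Device side: prove knowledge of the shared key for this challenge."""
    message = device_id.encode() + b"|" + challenge + b"|" + str(timestamp).encode()
    return hmac.new(shared_key, message, hashlib.sha256).hexdigest()


def verify_response(shared_key: bytes, device_id: str, challenge: bytes,
                    timestamp: int, response: str, now: int,
                    max_age: int = 30) -> bool:
    """Server side: reject stale messages, then compare in constant time."""
    if abs(now - timestamp) > max_age:
        return False
    expected = make_response(shared_key, device_id, challenge, timestamp)
    return hmac.compare_digest(expected, response)
```

A fresh random challenge per request (for example from `secrets.token_bytes`) together with the timestamp check limits replay of captured responses. HMAC-SHA256 is comparatively cheap, but whether it fits a given constrained node remains a deployment decision.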
Data privacy is another critical issue for IoT business procedures, and practical solutions remain a challenge. The privacy of user data must be ensured by design as users need maximum protection for their personal information. Transferring and ensuring data privacy between different nodes on a heterogeneous IoT is a challenging problem because different network nodes have different trust criteria. The European General Data Protection Regulation (GDPR) adds urgency to the need for providing strict privacy guarantees to users of IoT applications. Lu et al. propose aggregation of data for computing to perform source authentication on devices at the network edge in order to pre-filter false injected data.
The previous topics described subclasses of software architectural patterns for IoT. Based on this classification, the next section presents a set of components and connectors that can help software developers and architects to design a solution to a problem involving IoT. This classification allows developers to choose more clearly the components they will use. In the literature, there is a set of articles that propose the solution to a problem and mention which patterns they used. However, most authors only describe the architecture and components chosen, not mentioning, or barely mentioning, the grounds for such a choice. Thus, although new IoT application developers find references to architectures, patterns, and components used in different projects, there is no compilation or classification that simplifies learning about the development of applications for IoT. The idea here is to help developers choose components for these architectures.
5 Components for IoT Software Architectures
In this section, a set of components and connectors illustrates the design of IoT software architectures according to the pattern classification introduced in Section 4. Here, a component is understood as an independent element, which can be replaced, but is significant because it has a clear function in its specific context. We chose an IoT problem from a scenario based on Kamienski et al. , in which a public building has sensors and actuators that send messages to a context manager, where it can fuse data and infer rules that dictate how the actuators behave, to illustrate some relevant components.
In general, devices are sensors, actuators, and gateways, including mechanical, electronic, and computational components. Electronic devices that house sensors and actuators are known as constrained nodes and are classified into two categories: microcontroller-class devices and general-purpose devices . Microcontrollers often include RAM and on-chip code storage, offer limited support for general-purpose operating systems, and are generally used to deploy sensors and actuators. General-purpose devices, such as Raspberry Pi, often have RAM and SSD storage on separate chips, offer support for general purpose operating systems, and are used to deploy IoT Radio Gateways .
In IoT scenarios, sensors are data sources of ingestion patterns presented in section 4.1, particularly of the Multisource Extractor Pattern. They are essential components for any IoT scenario, as this component captures data to be processed and transformed into information. Actuators receive a command and execute an action in the context of the applications. For example, a lamp receives a command to turn on the light. The use of sensors, actuators, and gateways, as well as the constrained nodes that implement them, is a pattern applicable to all IoT applications.
In addition to microcontrollers and general-purpose devices, IoT applications use a wide variety of equipment that cannot be strictly classified as constrained nodes, such as smartphones, laptops, desktops, and servers. These devices also have restrictions, but at different levels than constrained nodes, and are used for different functions in an end-to-end view of a system.
5.2 Data Fusion
The definition of data fusion has been the subject of intense discussion among authors , but, in general, it is related to the process of integrating multiple data sources to generate information that is more consistent, accurate and useful than it would be if generated by only one single source. Thus, the information is of better quality or more relevant [115, 116], where quality and relevance depend on the application. Examples of software components that can be used to perform data fusion are Esper , Apache Flink , and Apache Spark . Esper was used in Kamienski et al.  to process data from sensors and report the average temperature of a classroom so that a Reasoning Engine could process actions on the devices, which, in this case, were air conditioners.
Some components, such as Apache NiFi, perform data fusion in addition to their primary function, which, in this case, is to act as a data pipeline. The data fusion component can be associated with the Data Processing pattern described in section 4.5, as it processes data and transforms it into information, and may reveal patterns in the analyzed data.
5.3 Reasoning Engine
A rule inference engine is a component that executes one or more business rules in a running production environment. The business rules system allows company policies and other operational decisions to be defined, executed, and separated from the application code. Typically, inference engines support rules, facts, punctuation, mutual exclusion, preconditions, and other functions. The most widely used inference engines are JBoss Drools, OpenRules, and ThingsBoard. This component can be associated with the Data Processing design pattern presented in section 4.5 and is used as an example in the works of Kamienski et al. and Pramudianto et al. It can be physically located in the cloud, on machines with high processing power, or on edge devices with low processing power. For example, Kamienski et al. used this component to process information coming from the fusion component and to take some action, such as turning off a device.
In the context of web services, the throttle is a component that limits connections arriving at the web server. In the context of IoT, the throttle is used to prevent large streams of data from being sent to the next components. When a component sends multiple identical messages to another component, a component with the throttle function may prevent the next component from becoming overloaded. It can be used to implement design patterns related to data analysis presented in section 4.5.
This component can be associated with the Data Processing design pattern presented in section 4.5. Hiromoto et al. discussed security in IoT as chain risk management. A throttle-based software architecture manages incoming and outgoing messages according to the security alerts generated by the security component. Zyrianoff et al. saw the need for a throttle component in IoT scenarios because vast amounts of messages are sent from the fusion component to the reasoning engine component, which processes them and sends several actions to devices; these actions are often repeated and overload the devices. Thus, the throttle is a component that limits the messages sent from the inference engine to the devices. Although this component is not implemented by Zyrianoff et al., the authors are aware of the utmost importance of using this component through hands-on experiences in the IMPReSS Project.
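A throttle that suppresses repeated commands to the same device can be sketched in a few lines. The class name, the `(device, command)` key, and the explicit `now` argument are assumptions made to keep the example deterministic; it is not the component used in the cited projects.

```python
class Throttle:
    """Let a (device, command) pair through at most once per `min_interval`."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last_sent = {}  # (device, command) -> time of last forwarded message

    def allow(self, device: str, command: str, now: float) -> bool:
        """Return True if the message should be forwarded to the device."""
        key = (device, command)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.min_interval:
            return False  # identical command sent too recently, drop it
        self._last_sent[key] = now
        return True
```

Placed between the reasoning engine and the devices, such a filter forwards the first "turn off" command and drops the identical ones that follow within the interval, while still letting different commands through.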
5.5 Ingestion Components
This component is responsible for obtaining, importing, and processing data for later use or for storage in a data repository. Data, especially unstructured data, is moved from the location where it originated to another component. This process usually involves altering individual files by editing their content and formatting their structure. This component can be associated with the Data Ingestion architecture pattern presented in section 4.1.
Data can be transferred in real-time or ingested in batches. When data is ingested in real-time, the data reaches its destination almost immediately after leaving the source. When data is ingested in batch, data blocks are consumed within a time interval.
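The difference between the two ingestion modes can be sketched with two small functions that deliver messages to a `sink` callable. The names are hypothetical, and batches are closed by size here for simplicity, whereas the text describes batches defined by a time interval.

```python
def ingest_stream(source, sink):
    """Real-time style: forward each message as soon as it arrives."""
    for message in source:
        sink([message])


def ingest_batches(source, sink, batch_size):
    """Batch style: accumulate messages and deliver them in blocks."""
    batch = []
    for message in source:
        batch.append(message)
        if len(batch) == batch_size:
            sink(batch)
            batch = []  # start a new list so delivered batches are not modified
    if batch:
        sink(batch)  # flush the last, possibly smaller, batch
```

Streaming minimizes the delay between source and destination at the cost of one delivery per message, while batching reduces the number of deliveries but makes data wait until its batch is closed.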
As the number of IoT devices increases, the volume and variety of data sources expand quickly. Thus, extracting data for use by a target system is a significant challenge in terms of time and resources. Other issues faced by data ingestion components are: 1) many new data sources use formats different from the relational model, and these formats change rapidly and often without warning; 2) data in different formats must be ingested at a reasonable speed and processed efficiently to support better business decisions; 3) changed data must be detected and captured despite the semi-structured or unstructured nature of the data and the low latency required in some real IoT scenarios [124, 125].
This data ingestion component was used in the works of Kamienski et al.  together with an MQTT Broker to ingest the messages coming from the sensors and to send these messages to the data fusion component.
5.6 Interaction Components
This component is responsible for taking data from one component to another, which implements the interaction patterns described in subsection 4.2.
In systems with multiple components, communication between them is usually through messages, which are transferred following some connector patterns. Thus, the connector is an essential component of software architectures and belongs to interaction patterns. Here are some types of connectors that can be used. All connectors presented here can implement the Publish/Subscribe, Asynchronous Messaging, Synchronous Messaging, and Request/Response interaction patterns presented in section 4.2.
5.6.1 Type of Connectors
Components are critical elements of a software system, but how they are connected can significantly affect the performance of a set of components in a scenario. The most common manner of communication between components is through message exchange. For two components to communicate, they must use a communication component. For example, a sensor needs a communication component to communicate with the data fusion component. Many components communicate through APIs, for example, a data fusion component communicating with a rule inference component. Some well-known ways of connecting components are:
Serial Connector: sends only one message from one component to another at a given time (Figure 1) and is therefore suitable for small volumes of messages.
Parallel Connector: sends multiple messages between components in the same period (Figure 2), usually implemented across multiple threads, which can be created on demand. It is also possible to create a pool of threads at startup that are already active when data transfer begins. When threads are created on demand, some messages may experience slight communication delays between two components while the operating system and the connector application create the thread. Creating the thread pool at startup slightly delays application initialization, which is compensated by the faster use of threads afterward.
Producer-Consumer: uses the computing technique known as Producer-Consumer to send messages between components (Figure 3). Unlike serial and parallel connectors, this connector makes the relationship asynchronous, so that variations in the arrival rate of the data at the producer are not blocked by a slower consumer service time. A queue buffers messages between the Producer and the Consumer. The Producer stores messages in the buffer, and the Consumer retrieves them when it is ready.
Data Pipeline: uses a high-performance communication component to take messages from one component to another (Figure 4), recommended in scenarios that require fault tolerance, or need to take messages to other components with delivery guarantees. The most well-known Data Pipeline connectors are Apache Kafka , Apache NiFi , Apache Flume [128, 129], and Apache Flink . They make it easy to transport data between multiple software components, adding features such as fault tolerance and guarantee that the recipient will receive messages.
API: Application Programming Interface (API) is a set of routines and standards set and documented by an application whose functionalities can be used by other applications without the need to know implementation details, for allowing interoperability between applications. RESTFul is currently a widely used standard for implementing web services APIs, based on the REST (Representational State Transfer) software architecture style . Other standards exist, such as GraphQL , which implements APIs with a query language.
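As an illustration of the parallel connector in the list above, the following sketch sends several messages concurrently through a reusable thread pool from Python's standard library. The class and method names are hypothetical, and `deliver` stands for whatever call actually hands a message to the receiving component.

```python
from concurrent.futures import ThreadPoolExecutor


class ParallelConnector:
    """Deliver messages to a receiving component using a pool of threads."""

    def __init__(self, deliver, workers: int = 4):
        self._deliver = deliver
        # The pool is created once; ThreadPoolExecutor starts worker threads
        # as work arrives (up to `workers`) and then reuses them.
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def send_all(self, messages):
        """Send messages concurrently; results come back in input order."""
        return list(self._pool.map(self._deliver, messages))

    def close(self):
        """Wait for pending deliveries and release the threads."""
        self._pool.shutdown(wait=True)
```

A serial connector would correspond to calling `deliver` in a plain loop; the pool only pays off when each delivery spends time waiting, for example on network I/O.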
Kamienski et al. used a data interaction (connector) component to transfer messages from a data fusion component to a Reasoning Engine component. To implement this communication, we used the producer-consumer connector detailed in section 5.6.1. This connector creates a thread pool when the application starts, which receives the messages and buffers them for another thread pool to consume.
FIWARE IoT Agent is a component that transforms data coming from sensors (and going to actuators) into the NGSI standard using different protocols, such as Ultralight 2.0 or LoRaWAN. Ultralight 2.0 is a lightweight text-based protocol for constrained devices and communications where device bandwidth and memory may be limited.
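A producer-consumer connector of this kind can be sketched with a bounded queue and a few consumer threads. This is a simplified illustration, not the connector used by Kamienski et al.: it has a single producer (the calling thread), and the function name, the sentinel object, and the default sizes are assumptions.

```python
import queue
import threading

_STOP = object()  # sentinel telling a consumer thread to finish


def run_connector(messages, handle, consumers: int = 2, buffer_size: int = 100):
    """Buffer `messages` in a queue and let consumer threads process them.

    `handle` is called once per message and must be safe to call from
    several threads.
    """
    buffer = queue.Queue(maxsize=buffer_size)

    def consume():
        while True:
            item = buffer.get()
            if item is _STOP:
                break
            handle(item)

    threads = [threading.Thread(target=consume) for _ in range(consumers)]
    for thread in threads:
        thread.start()
    for message in messages:   # producer side: blocks only when the buffer is full
        buffer.put(message)
    for _ in threads:          # one stop signal per consumer
        buffer.put(_STOP)
    for thread in threads:
        thread.join()
```

The queue decouples the two sides: bursts from the producer are absorbed by the buffer, and the consumers drain it at their own pace. Message order across consumers is not guaranteed.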
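To give a feel for how lightweight such a text protocol is, the sketch below parses a measure payload of the form `t|23.5|h|40`, that is, attribute names and values separated by the `|` character. This reflects the simplest payload form as commonly documented for Ultralight 2.0; the function name is hypothetical, other features of the protocol (such as multiple measure groups or commands) are not handled, and the official specification should be consulted before relying on it.

```python
def parse_ultralight(payload: str) -> dict[str, str]:
    """Parse a single 'key|value|key|value' measure payload into a dict.

    Values are kept as strings; interpreting their types is left to the
    component that maps them to the target data model.
    """
    fields = payload.split("|")
    if len(fields) % 2 != 0:
        raise ValueError("payload must contain key|value pairs")
    return dict(zip(fields[0::2], fields[1::2]))
```

A full JSON document carrying the same two measures would be several times longer, which matters when bandwidth and memory are limited.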
5.7 Storage Components
This component is responsible for storing the information of a system in many ways, such as files and SQL or NoSQL databases. Regarding the design patterns presented in subsection 4.3, IoT systems mainly use NoSQL databases due to the high volume, variety of formats, and speed of data insertion. Several works, such as Cai et al., Cai et al., Cure et al., Zhu et al., Ma et al., and Mallapuram et al., used storage components in their solutions.
These solutions involve storing data schemas for IoT semantics, logs, and dataset storage, as well as storing only information from services and entities of interest. Jiang et al.  used NoSQL storage to save logs and data from RFID devices. Orion is a core component of the FIWARE platform  that acts as a data distributor, managing data context life cycle, and can also be used as a temporary storage module for IoT entities .
5.8 Data Analytics
This component is responsible for using artificial intelligence, data mining, and machine learning techniques to analyze large amounts of data and provide information, which implements the analysis patterns presented in section 4.5. Several components, both open source and commercial, perform data analysis functions. An example is FIWARE Cosmos, used to analyze data in batches or streams to gain insight into that data by revealing new information that was hidden. Cosmos is a wrapper that interconnects FIWARE components with existing frameworks for big data analytics, especially Apache Flink. In IoT systems, time series databases are critical to store and subsequently process data, due to the constant arrival of data from large numbers of sensors. The FIWARE platform offers the Quantum Leap component with time series database storage (currently CrateDB) and provides an NGSI-based API for data entry and query.
5.9 Visualization Components
Based on the pattern presented in section 4.6, visualization components are responsible for displaying data in a user-friendly manner and are particularly crucial for system managers and administrators.
Using visualization components allows us to display assessment metrics in a clear and organized manner automatically and allows the user to switch between different contexts that require different subsets of sensors (e.g., comfort and energy saving) installed in the IoT scenario.
Ji et al. present a systemic analysis of requirements related to component visualization from the smart city perspective. Besides, they propose a new visual IoT architecture known as A-VIoT. The proposed system includes six main components: a) smart insight to detect complex environments; b) smart video analytics to reduce the amount of visual data; c) software-defined video to generate elastic visual streams; d) flexible controls to produce an ideal adaptation; e) cost-effective transmission to improve resource utilization, and; f) crowd coordination to improve cooperation performance .
6 Generic Architectural Patterns
This section presents some examples of generic software architectures that fit the design patterns described here and can be used for the development of smart IoT applications.
6.1 ETL vs. ELT
Extraction, Transformation, and Load (ETL) technologies and processes have emerged with the data warehouse concept and have now reached great maturity, remaining as the appropriate technique for Business Intelligence (BI) and Analytics solutions . The extraction phase is characterized by retrieving raw data from a set of unstructured data and migrating it to a temporary repository. The transformation phase structures, enriches, and converts raw data into a different content. Finally, the loading phase is the ingesting of structured data into a repository where it will be processed by analysis tools [148, 149]. Figure 5a) shows the data stream for ETL.
Although the ETL process offers a suitable solution for different applications, it also generates some problems of its own. An ETL process allows the Extract and Load phases to be performed at different times according to the source and destination maintenance windows so that neither source nor destination will be idle at all. With the emergence and widespread use of NoSQL databases and cloud technologies that ensure elasticity, availability, and high throughput, data can be loaded into data lakes, rendering it available to different data consumers and applications .
Thus, the ELT approach (inverting the Transform and Load phases) provides an alternative to ETL. Instead of transforming data before loading it, ELT leverages the target system to do the transformation. The data is copied to the destination and then transformed there. Figure 5b) shows this paradigm inversion in more detail.
ELT does not need its own transformation mechanism because this work is carried out by the target system. On the other hand, where the target system is not powerful enough for ELT, ETL may be more advantageous. In IoT scenarios, data from some entities is often stored in a data lake or a legacy system. A data lake is a data repository that keeps data in its raw form, with no need to worry about the structure of the data being ingested and stored. An ELT tool can help extract data from these legacy systems to be processed by analytical processing components.
Both ETL and ELT are architectures for data ingestion, as described in Section 4.1. In the extraction phase, both architectures use components of the Data Ingestion pattern, especially the multisource extractor pattern. In the transformation phase, both use components of the Data Processing pattern to adjust the data to the needs of the application that will consume it. In the load phase, a component of the Data Storage pattern stores the data, and components of the Data Interaction pattern move the data to the load phase.
The sheer volume of unstructured data generated by online social network services and their requirement for real-time updating has led to the need for new scalable data management architectures. Two examples of architecture stand out: Lambda and Kappa.
6.2 Lambda Architecture
Lambda is an architecture for processing large amounts of data that unifies online and batch processing into a single structure to balance latency, throughput, and fault tolerance. This pattern is suitable for applications in which data is collected with some delay and needs to be shown in dashboards afterward [151, 152]. The Lambda architecture also allows datasets to be processed in batch to find behavioral patterns according to the application needs.
Figure 6 shows the essential components of the Lambda architecture, organized in three layers: 1) a Batch layer to pre-compute large amounts of data; 2) a Speed layer to minimize response latency, performing calculations as data arrives; and 3) a Service layer to serve query results over the processed data. Lambda uses data ingestion patterns, especially the multisource pattern, to feed the Speed and Batch layers, data storage patterns to store the results of data processing, and communication and visualization patterns to access the data in the Service layer.
This architecture allows developers to optimize their data processing costs by understanding which parts of the data need to be processed in real time and which in batches. However, the need to develop and maintain two separate code bases for the Batch and Speed layers requires more work from the development team and increases the complexity of the solution. Nevertheless, Lambda is suitable for big data problems, especially when processing data from sensors or other sources that send data continuously. In such cases, the Speed layer can detect anomalies in the data, and the verified data can then be stored in databases. Finally, data can be periodically processed in batch (e.g., once a day, week, or month) to study and extract behavioral patterns.
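The interplay of the three layers can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a production design: the class name `LambdaPipeline`, the in-memory log, and the per-sensor event count used as the computed view are all hypothetical.

```python
from collections import defaultdict


class LambdaPipeline:
    """Toy Lambda architecture that counts events per sensor."""

    def __init__(self):
        self.master_log = []                   # append-only raw data
        self.batch_view = {}                   # output of the Batch layer
        self.realtime_view = defaultdict(int)  # output of the Speed layer

    def ingest(self, sensor_id):
        """New data feeds both the Batch layer (log) and the Speed layer."""
        self.master_log.append(sensor_id)
        self.realtime_view[sensor_id] += 1     # Speed-layer code path

    def run_batch(self):
        """Batch layer: recompute the view from the complete log."""
        view = defaultdict(int)
        for sensor_id in self.master_log:      # Batch-layer code path
            view[sensor_id] += 1
        self.batch_view = dict(view)
        # The real-time view only has to cover data newer than the batch view.
        self.realtime_view.clear()

    def query(self, sensor_id):
        """Service layer: merge the batch view and the real-time view."""
        return self.batch_view.get(sensor_id, 0) + self.realtime_view.get(sensor_id, 0)


pipeline = LambdaPipeline()
for sensor in ["a", "a", "b"]:
    pipeline.ingest(sensor)
pipeline.run_batch()      # periodic batch job
pipeline.ingest("a")      # arrives after the batch run
```

Even in this small sketch the counting logic appears twice, once in `ingest` and once in `run_batch`, which mirrors the maintenance cost of keeping two code paths for the Batch and Speed layers.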
6.3 Kappa Architecture
The Kappa architecture focuses only on processing data as a continuous stream, unifying the code of the Batch and Speed layers of the Lambda architecture. Although proposed as an alternative to Lambda that solves the problem of duplicate code, Kappa has specific use cases and does not replace Lambda in all scenarios. In Kappa, incoming data is processed by a Streaming layer, and the results are placed in the Service layer for queries. The idea is to handle real-time data processing and continuous reprocessing in a single stream processing engine. Reprocessing occurs from the stream: if the source code changes, a second stream process replays all previous data through the latest version of the real-time engine and overwrites the data stored in the presentation layer.
This architecture seeks simplicity by maintaining a single code base, rather than one for each of the Batch and Speed layers of the Lambda architecture (Section 6.2). The disadvantages of Kappa stem from the need to process all data as a stream, which is not suitable for all cases: handling duplicate events, cross-referencing events, or maintaining ordered operations is generally easier in batch processing.
Kappa architecture uses data ingestion patterns to feed the Streaming layer, data storage patterns to store results of the data processing, and communication and visualization patterns to access data in the Service layer.
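The reprocessing mechanism described above can be sketched as follows. The sketch assumes an in-memory list as the replayable event log; the class `KappaPipeline` and the two processor functions are hypothetical names introduced only for illustration.

```python
class KappaPipeline:
    """Toy Kappa architecture: one stream processor over a replayable log."""

    def __init__(self, processor):
        self.log = []              # append-only event log
        self.processor = processor # the single processing code base
        self.serving_view = {}     # results exposed for queries

    @staticmethod
    def _apply(processor, view, event):
        key, value = processor(event)
        view[key] = view.get(key, 0) + value

    def ingest(self, event):
        """Append the event to the log and process it as part of the stream."""
        self.log.append(event)
        self._apply(self.processor, self.serving_view, event)

    def reprocess(self, new_processor):
        """Replay the whole log through the new code and overwrite the view."""
        new_view = {}
        for event in self.log:
            self._apply(new_processor, new_view, event)
        self.processor = new_processor
        self.serving_view = new_view


def count_events(event):
    """Version 1 of the processing code: count events per sensor."""
    return event["sensor"], 1


def sum_values(event):
    """Version 2 of the processing code: sum reported values per sensor."""
    return event["sensor"], event["value"]


kappa = KappaPipeline(count_events)
kappa.ingest({"sensor": "a", "value": 5})
kappa.ingest({"sensor": "a", "value": 7})
view_v1 = dict(kappa.serving_view)
kappa.reprocess(sum_values)   # code changed: replay the stream
view_v2 = dict(kappa.serving_view)
```

Only one processing function is active at any time, so there is no second code path to keep in sync; the cost is that every change requires replaying the log.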
6.4 Data Analytics
Data analysis from a big data perspective differs from traditional analysis because it involves many types of unstructured data and is generally related to text analysis and natural language processing. Data analysis is part of a design pattern known as pattern recognition using machine learning algorithms. Pattern recognition can be defined as the classification of data based on knowledge already gained or on statistical information extracted from patterns and their representation. Statistical methods of pattern recognition have been widely applied in the field of artificial intelligence. Successful applications of these methods in computer vision include extracting low-level visual information from images, edge detection, extraction of shape information from shading, object segmentation, and object labeling [156, 157].
Among the numerous applications of pattern recognition, data analysis offers increasingly sophisticated methods for discovering complex structural regularities in large data sets, and it is used in many fields, such as the social and behavioral sciences. Classical statistical pattern recognition techniques include factor analysis, principal component analysis, cluster analysis, and multidimensional scaling. More sophisticated methods, such as artificial neural networks and graphical statistical models, form the basis of relevant tools for detecting structural regularities in data collected by social and behavioral scientists. Among pattern recognition techniques, machine learning gives systems the ability to learn and improve automatically from data without being explicitly programmed. Machine learning focuses on developing computer programs that can access data and use it to learn for themselves [156, 157].
The process of applying data analysis methods to specific areas involves defining data characteristics (such as volume, variety, and velocity) and data models (such as neural networks, classification, and clustering methods), and using efficient algorithms that match the characteristics of the data. Three factors make IoT data processing a challenge: 1) data characteristics: IoT generates a massive volume of data at very high speed and in varying formats. The data is frequently raw, with a low level of abstraction, which makes it difficult to analyze. Semantic techniques may be used to improve IoT data analysis, but they may require more effort to deal with data volume, velocity, and variability; 2) data privacy: protecting privacy is crucial because data collection processes may include personal, business, and other sensitive data; 3) algorithms: finding the model that best fits the data is one of the most important issues for pattern recognition and for better analysis of IoT data; the results yielded by these models and algorithms may be affected by noise and may be difficult to interpret.
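As an illustration of cluster analysis, one of the classical techniques mentioned above, the following sketch implements Lloyd's k-means algorithm for one-dimensional data in plain Python. The sensor readings and the initial centroids are hypothetical, and a real application would more likely rely on an established machine learning library.

```python
def kmeans_1d(values, centroids, max_iterations=100):
    """Cluster 1-D values with Lloyd's algorithm from given initial centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: attach each value to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                new_centroids.append(sum(members) / len(members))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assignments


# Hypothetical temperature readings forming two groups.
readings = [20.0, 21.0, 19.0, 80.0, 82.0, 81.0]
centers, labels = kmeans_1d(readings, centroids=[0.0, 100.0])
```

With these inputs the algorithm separates the low readings from the high ones; the result depends on the initial centroids, which is a known property of k-means.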
Cheng et al. introduced a systematic method for reviewing data mining knowledge and techniques in the most common IoT applications. In this study, they reviewed some data mining functions, such as classification, clustering, association analysis, and time series analysis in IoT scenarios. The authors also assigned more data mining methods to each type of IoT application and suggested a new data mining application using open source software.
Tsai et al. conducted research to address some of the challenges in preparing and processing data for IoT using data mining techniques. The authors describe IoT data and the challenges in this area, such as building mining models and algorithms. Among these challenges are: 1) showing that the data chosen to be processed will solve the IoT problem in question; and 2) choosing the best data analytics algorithm according to the data characteristics.
Li et al. present how to apply deep learning techniques to the IoT environment to improve learning performance and to reduce network traffic. The authors prepared an elastic model compatible with different learning models. Experimental results show that the proposed solution outperforms other IoT optimization methods.
7 Architectural Patterns for IoT Smart Applications
This section uses the categories, components, connectors, and architectures introduced earlier to provide three examples of smart applications for cities, buildings, and agriculture.
7.1 Parking Management for Smart Cities
In today’s cities, finding an available parking space is always tricky for drivers, and tends to become even more difficult as the number of cars on the streets increases. This situation is an opportunity for smart cities to take action to increase the efficiency of their parking resources, leading to a reduction in parking times, traffic jams, and accidents. Problems related to parking and traffic jams could be solved if drivers could be informed in advance about the availability of parking spaces at and around the intended destination. Figure 8 shows an example of a smart parking scenario.
This type of application addresses a traditional problem in large cities. Khanna et al. and Pham et al. dealt with this issue using a software architecture pattern, as presented in Figure 9. A device, typically a presence sensor, checks whether a car is in a parking space and sends messages to a communication component, usually an MQTT broker. Components with low computing capacity, such as a Raspberry Pi, subscribe to the MQTT broker, consume data from the sensors, calculate the number of available parking spaces, and display the results on a dashboard. Data processed in the low-capacity computing component is sent to a cloud-based high-processing server that calculates the payment for the use of the parking space and the distribution and occupancy of parking spaces among cars. Finally, this data is returned to the low-capacity computing component.
This IoT application uses the following patterns:
Data Ingestion: Real-Time Streaming Pattern to process and distribute which users use which parking spaces
Data Interaction: Publish/Subscribe Pattern to establish communication between sensors and low capacity processing components
Data Storage: SQL to store data coming from sensors and their respective processing and the NoSQL pattern to manage the geographic positioning of the parking spaces
Data Visualization: Portal Pattern to design an information portal for system administrators and managers
Connectors: Light Interaction to connect parking space sensors to the MQTT broker; API to connect the cloud to management applications
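The Publish/Subscribe interaction between the parking sensors and the low-capacity component can be sketched as follows. To keep the example self-contained, a small in-memory class stands in for the MQTT broker; the topic name, the message payload, and the class names are assumptions made for illustration and are not taken from the cited works.

```python
from collections import defaultdict


class InMemoryBroker:
    """Minimal stand-in for an MQTT broker (topic -> subscriber callbacks)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)


class ParkingCounter:
    """Low-capacity component that tracks occupancy and counts free spaces."""

    def __init__(self, total_spaces):
        self.total_spaces = total_spaces
        self.occupied = set()

    def on_message(self, message):
        # Hypothetical payload: {"space": <id>, "occupied": <bool>}
        if message["occupied"]:
            self.occupied.add(message["space"])
        else:
            self.occupied.discard(message["space"])

    def free_spaces(self):
        return self.total_spaces - len(self.occupied)


broker = InMemoryBroker()
counter = ParkingCounter(total_spaces=3)
broker.subscribe("parking/lot-1", counter.on_message)

# Presence sensors publish state changes; they do not know who consumes them.
broker.publish("parking/lot-1", {"space": 1, "occupied": True})
broker.publish("parking/lot-1", {"space": 2, "occupied": True})
broker.publish("parking/lot-1", {"space": 1, "occupied": False})
```

Because sensors and consumers only share a topic, a dashboard or a cloud forwarder could subscribe to the same topic without any change to the sensors.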
7.2 Energy Efficiency Management for Smart Buildings
The importance of energy in the contemporary world is growing steadily, but there are still many sources of inefficiency in its management, such as public buildings. The path to transforming public spaces into smart environments faces several challenges, such as energy management in buildings, which is designed to automate lighting and HVAC (heating, ventilation, and air conditioning) applications. Figure 10 shows an example of a smart building scenario. Some authors have addressed the problem of power management in smart buildings [165, 36, 93, 34, 166]. Many proposed solutions for building energy management require an element of high computational power to process, store, and infer data. Figure 11 shows a summary of this reference architecture: sensors send messages to a cloud server with high computing capacity, which in turn is responsible for storing, processing, and inferring decisions about changes in context, such as turning a piece of equipment or a device on or off, or increasing or decreasing its level of operation.
Existing energy management solutions for smart buildings use context-sensitive techniques. The architecture of these systems consists of sensors that send messages to a server that preprocesses and fuses data and makes decisions based on an inference process. As a result, commands are sent to actuators to change system behavior, such as turning the air conditioner or lights on or off. Zyrianoff et al., Kamienski et al., Kamienski et al., and Pramudianto et al. addressed similar situations.
This IoT application uses the following patterns:
Data Ingestion: Real Time Streaming Pattern for the system to capture and transfer data from sensors; Protocol Converter Pattern to convert different protocols used by different sensors
Data Interaction: Publish/Subscribe Pattern to send sensor data to a high-capacity computing component
Data Processing: Data Fusion and Data Analysis Pattern to process and manage context information
Data Storage: SQL Pattern to store data in a relational model
Data Visualization: RESTful API so the application can query information through an interaction component
Connectors: Light Interaction to connect sensors to the MQTT broker; API to connect the cloud to query and management applications
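The Data Fusion and inference steps of this application can be sketched as two small functions. The reading format, the rule set, and the 26 degree threshold are hypothetical; the cited systems use their own context models and inference engines.

```python
from statistics import mean


def fuse_readings(readings):
    """Data fusion: combine raw readings per room into one context record."""
    rooms = {}
    for r in readings:
        room = rooms.setdefault(r["room"], {"temps": [], "occupied": False})
        if r["kind"] == "temperature":
            room["temps"].append(r["value"])
        elif r["kind"] == "presence":
            room["occupied"] = room["occupied"] or bool(r["value"])
    return {
        name: {
            "temperature": mean(data["temps"]) if data["temps"] else None,
            "occupied": data["occupied"],
        }
        for name, data in rooms.items()
    }


def infer_commands(context, max_temp=26.0):
    """Inference: derive actuator commands from the fused context."""
    commands = []
    for room, state in sorted(context.items()):
        if not state["occupied"]:
            # Nobody in the room: save energy.
            commands.append((room, "lights", "off"))
            commands.append((room, "hvac", "off"))
        elif state["temperature"] is not None and state["temperature"] > max_temp:
            commands.append((room, "hvac", "on"))
    return commands


sample = [
    {"room": "A", "kind": "temperature", "value": 27.0},
    {"room": "A", "kind": "temperature", "value": 29.0},
    {"room": "A", "kind": "presence", "value": 1},
    {"room": "B", "kind": "temperature", "value": 22.0},
    {"room": "B", "kind": "presence", "value": 0},
]
context = fuse_readings(sample)
commands = infer_commands(context)
```

In a deployment following the architecture described above, `fuse_readings` and `infer_commands` would run on the high-capacity server, and the resulting commands would be sent to the actuators.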
7.3 Precision Irrigation for Smart Agriculture
Agricultural production plays a vital role in every nation’s economy and continually improves its processes and techniques. However, agriculture consumes most of the freshwater available in the world. With climate change, the introduction of IoT-based technologies in the field is essential to secure our future through precision irrigation. Water usage can be substantially reduced, but fear of a decrease in productivity due to water stress on plants leads farmers to over-irrigate, which wastes both the water that infiltrates into the soil and the energy used for irrigation. This waste is also a challenge for the sustainability of the planet. Hence the need for greater water control in the irrigation of crops.
Precision agriculture collects and interprets vast amounts of data for enhanced field management. The precise management of irrigation plays a significant role in the continuous increase of production. First, there is a need to identify tools for acquiring the enormous amount of data generated by sensors and other sources so that it can be analyzed and compared. Some of the difficulties in adopting precision irrigation are related to data transfer, handling, and processing, and to the high cost of investing in hardware solutions to store this massive amount of data. Kamienski et al., Liqiang et al., Shahanas et al., Ntuli et al., Rad et al., Robles et al., and Xiao et al. presented possible solutions for agricultural irrigation.
The corresponding figure shows this reference architecture: a device, typically a soil moisture sensor, sends messages to an interaction component, such as a LoRaWAN gateway, residing in a low-capacity computing element, such as a Raspberry Pi. From the gateway, data is sent by MQTT to the IoT platform and to cloud-based components that implement models to estimate water needs and optimize irrigation. As a result, an irrigation plan is sent to the irrigation system, which may involve controlling the actuators directly (such as valves, pumps, and sprinklers) or interacting with a third-party system through an API.
This IoT application uses the following patterns:
Data Ingestion: Real Time Streaming Pattern for the system to capture and transfer data from sensors; Protocol Converter Pattern to convert different protocols used by different sensors
Data Interaction: Publish/Subscribe Pattern to send sensor data to a high-capacity computing component
Data Processing: Data Fusion and Data Analysis Pattern to process and manage context information related to irrigation.
Data Storage: SQL Pattern to store data in a relational model. NoSQL pattern to store unstructured and semi-structured data. HDFS pattern to create a data lake and store the data to be processed.
Data Visualization: RESTful API so the application can query information through an interaction component
Connectors: Light Interaction to connect sensors to the MQTT broker or the LoRaWAN gateway; API to connect the cloud to query and management applications
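The step from fused soil moisture data to an irrigation plan can be sketched with a deliberately simple model. The target moisture, the conversion factor from moisture deficit to water depth, and the function name are hypothetical placeholders; real water-need estimation models are considerably more elaborate.

```python
def irrigation_plan(moisture_by_plot, target=0.30, mm_per_unit=100.0):
    """Return the water depth (mm) each plot needs to reach the target moisture.

    moisture_by_plot maps a plot id to volumetric soil moisture (0..1).
    target and mm_per_unit are hypothetical parameters for illustration.
    """
    plan = {}
    for plot, moisture in moisture_by_plot.items():
        deficit = max(0.0, target - moisture)
        if deficit > 0:
            plan[plot] = round(deficit * mm_per_unit, 1)
    return plan


# Hypothetical readings delivered by the IoT platform.
plan = irrigation_plan({"plot-1": 0.20, "plot-2": 0.35, "plot-3": 0.30})
```

Plots at or above the target are left out of the plan, which reflects the goal of avoiding over-irrigation; the plan could then be sent to actuators or to a third-party system through an API.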
8 Discussion and Challenges
8.1 Message exchanges in IoT
The use of data interaction and data ingestion component patterns supports message delivery, mediation between different communication protocols, and message consumption tracking. Large companies want to build solutions that analyze substantial amounts of data in a short time, but artificial intelligence algorithms need reliable data to work correctly [173, 174]. Obtaining data from different sources and moving it reliably is still a challenge, and existing batch-based solutions have not solved the problem [158, 175].
These systems are extremely important when large amounts of data need to be moved between components of the different categories described in Section 4. From an IoT perspective, data interaction and data ingestion component patterns are necessary because the volume of messages grows exponentially as more devices are connected to the Internet. Few IoT solutions propose using these components to move data massively between components in IoT scenarios [158, 176].
8.2 IoT Device Interoperability
Component connectors are essential for an IoT solution, as they can negatively influence the performance of the ecosystem. Zyrianoff et al. carried out performance tests on a smart building scenario, and the results show the influence of connectors and of the data fusion and data inference (data processing) components. An architecture that processes data in sequentially connected components is not always an adequate option because a message needs to arrive at one component before it can be transmitted to the next. This scheme generates processing bottlenecks throughout the component chain: if a message takes a while to be processed on a specific component, it delays the entire message exchange stream of the application. An alternative is to introduce a component that centralizes the sending of data and distributes it to the other components, such as FIWARE Orion. Thus, if any component delays message processing, it affects only part of the message processing and sending stream, since the components are not connected in sequence.
Using standardized protocols for sending and receiving messages is good practice when choosing the communication approach for a project. Although there are software components that convert protocols between two or more systems, using a single communication protocol facilitates integration in projects with multiple teams. An example is FIWARE, which facilitates the development of IoT solutions. All FIWARE components communicate via a single protocol, NGSI JSON, which simplifies the development of the solution as a whole.
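The effect of the two topologies on message delay can be made concrete with a small calculation. The sketch below assumes that each component has a fixed processing time per message and ignores network delay and queuing; the function names and the sample times are hypothetical.

```python
def chain_completion_times(processing_times):
    """Sequential chain: a component starts only after all predecessors finish."""
    times, elapsed = [], 0.0
    for t in processing_times:
        elapsed += t
        times.append(elapsed)
    return times


def hub_completion_times(processing_times, hub_time=0.0):
    """Central distributor: each component depends only on the hub and itself."""
    return [hub_time + t for t in processing_times]


# Hypothetical per-message processing times (seconds); the second one is slow.
times = [1.0, 5.0, 1.0]
chain = chain_completion_times(times)              # [1.0, 6.0, 7.0]
hub = hub_completion_times(times, hub_time=0.5)    # [1.5, 5.5, 1.5]
```

In the chain, the slow second component delays the third one as well, whereas with a central distributor only the slow component itself finishes late.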
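The benefit of a single message format can be sketched with a small protocol converter that maps heterogeneous device payloads to one NGSI-v2-style JSON entity. The two vendor payload formats and the entity identifiers are hypothetical, and the exact data model to be used should be checked against the FIWARE NGSI specification.

```python
import json


def ngsi_type(value):
    """Map a Python value to a conventional NGSI v2 attribute type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    return "Text"


def to_ngsi_entity(entity_id, entity_type, attributes):
    """Build an NGSI-v2-style entity (normalized representation) as a dict."""
    entity = {"id": entity_id, "type": entity_type}
    for name, value in attributes.items():
        entity[name] = {"value": value, "type": ngsi_type(value)}
    return entity


def from_vendor_a(payload):
    """Hypothetical vendor A format: {"dev": "t-17", "temp_c": 22.5}."""
    return to_ngsi_entity(
        "Sensor:" + payload["dev"], "Sensor", {"temperature": payload["temp_c"]}
    )


def from_vendor_b(payload):
    """Hypothetical vendor B format: {"id": "m-3", "reading": {"t": 19.0, "ok": True}}."""
    reading = payload["reading"]
    return to_ngsi_entity(
        "Sensor:" + payload["id"],
        "Sensor",
        {"temperature": reading["t"], "healthy": reading["ok"]},
    )


entity_a = from_vendor_a({"dev": "t-17", "temp_c": 22.5})
entity_b = from_vendor_b({"id": "m-3", "reading": {"t": 19.0, "ok": True}})
message = json.dumps(entity_a)  # what would be sent to the other components
```

Once every payload is converted at the edge of the system, downstream components only need to understand one representation, regardless of how many device vendors are involved.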
8.3 IoT-sourced Data Storage
With the decentralization of software architectures, new storage types, such as NoSQL, are emerging to enable the scalability of IoT solutions. NoSQL DBMSs are used for specific problems. Hills discusses a modeling approach that uses SQL and NoSQL to build software. In the future, new IoT platforms will use hybrid modeling to solve problems in real scenarios.
8.4 Future work
Future challenges include:
Understand the influence of creating hierarchies of low capacity communication and processing elements on a distributed architecture. Some IoT scenarios require the use of this type of knowledge, such as a scenario in which an ambulance is in an emergency and needs to pass through slow car traffic. The ambulance can communicate with edge (low capacity) devices that can make decisions even before sending data to software components located in the cloud.
Study and propose an insight into the legal approach to data protection and how far applications can collect and process data from users. There are few discussions regarding the issue of data privacy, and the legislation is gradually adapting to the technological reality. It is necessary to understand how data is collected, analyzed, and stored by IoT devices. Privacy policies should explain the data lifecycle. The lack of these policies is a vulnerability to the data.
Research and develop new IoT-specific security solutions that can defend systems against attacks from inside the network. IoT requires custom-made security solutions because, unlike a traditional computer, IoT elements usually: 1) have less processing capacity, memory, and power supply; 2) execute tasks collaboratively; and 3) predominantly run systems developed in C or in languages based on it. Since existing security proposals for the computers that make up the traditional Internet do not take these characteristics into account, they are not always suitable for IoT.
Propose new software architectures so that organizations can adapt legacy systems to the new near real-time data processing paradigm. For example, banking companies still use scripts that run during hours of low data processing and read files with financial records to insert data into storage components. This data movement could be treated as a stream and processed in near real time with Kappa architecture, for example.
Propose new software architectures capable of ingesting data even with the emergence of new data formats and increasing data volume brought by IoT. Create new data ingestion components from new software architectures.
This paper presented an overview of software architecture patterns related to the development of IoT applications, as well as a classification of the major component classes that can be used in IoT software solutions. This classification is expected to facilitate the development of IoT applications by software developers, who can save hours of study otherwise needed to understand how to propose a software architecture for IoT. In addition, this paper discussed the main difficulties in creating a software architecture for IoT, which will help future developers make better design decisions.
The authors would like to thank the SWAMP Project.
-  Philippe Kruchten, Henk Obbink, and Judith Stafford. The past, present, and future for software architecture. IEEE Software, 23(2):22–30, March 2006.
-  M. Shaw and P. Clements. The golden age of software architecture. IEEE Software, 23(2):31–39, March 2006.
-  Elisabeth Freeman, Eric Freeman, Bert Bates, and Kathy Sierra. Head First Design Patterns. O’Reilly & Associates, Inc., 2004.
-  Mark Richards. Software Architecture Patterns. O’Reilly Media, Inc., 2015.
-  Michael Weyrich and Christof Ebert. Reference architectures for the internet of things. IEEE Software, 33(1):112–116, Jan 2016.
-  Hidayet Aksu, Leonardo Babun, Mauro Conti, Gabriele Tolomei, and Selcuk Uluagac. Advertising in the iot era: Vision and challenges. IEEE Communications Magazine, PP, 01 2018.
-  Antero Taivalsaari and Tommi Mikkonen. A roadmap to the programmable world: Software challenges in the iot era. IEEE Software, 34(1):72–80, Jan 2017.
-  Rebeca Motta, Káthia M. de Oliveira, and Guilherme H. Travassos. On challenges in engineering iot software systems. In Proceedings of the XXXII Brazilian Symposium on Software Engineering, SBES ’18, pages 42–51, New York, NY, USA, 2018. ACM.
-  Martin Fowler. Patterns of Enterprise Application Architecture. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.
-  A. Leff and J. T. Rayfield. Web-application development using the model/view/controller design pattern. In Proceedings Fifth IEEE International Enterprise Distributed Object Computing Conference, pages 118–127, Sep. 2001.
-  Erich Gamma, Richard Helm, Ralph Johnson, and John M. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional, 1 edition, 1994.
-  David Garlan and Mary Shaw. An introduction to software architecture. Technical report, Pittsburgh, PA, USA, 1994.
-  D. Heuzeroth, T. Holl, G. Hogstrom, and W. Lowe. Automatic design pattern detection. In 11th IEEE International Workshop on Program Comprehension, 2003., pages 94–103, May 2003.
-  Philippe Kruchten. The Rational Unified Process: An Introduction, Second Edition. 01 2000.
-  Len Bass, Paul Clements, and Rick Kazman. Software Architecture In Practice. 01 2003.
-  Iso/iec/ieee systems and software engineering – architecture description. ISO/IEC/IEEE 42010:2011(E) (Revision of ISO/IEC 42010:2007 and IEEE Std 1471-2000), pages 1–46, Dec 2011.
-  Alessandro Bassi, Martin Bauer, Martin Fiedler, Thorsten Kramp, Rob van Kranenburg, Sebastian Lange, and Stefan Meissner. Enabling Things to Talk: Designing IoT Solutions with the IoT Architectural Reference Model. Springer Publishing Company, Incorporated, 1st edition, 2016.
-  Mohammad Abdur Razzaque, Marija Milojevic-Jevric, Andrei Palade, and Siobhán Clarke. Middleware for internet of things: a survey. 3:1–1, 01 2015.
-  Matthias Baldauf, Schahram Dustdar, and Florian Rosenberg. A survey on context-aware systems. Int. J. Ad Hoc Ubiquitous Comput., 2(4):263–277, June 2007.
-  M. V. Moreno, F. Terroso-Sáenz, A. González-Vidal, M. Valdés-Vela, A. F. Skarmeta, M. A. Zamora, and V. Chang. Applicability of big data techniques to smart cities deployments. IEEE Transactions on Industrial Informatics, 13(2):800–809, April 2017.
-  Eiman Al Nuaimi, Hind Al Neyadi, Nader Mohamed, and Jameela Al-Jaroodi. Applications of big data to smart cities. Journal of Internet Services and Applications, 6, 08 2015.
-  Carlos A. Kamienski, Fabrizio F. Borelli, Gabriela O. Biondi, Isaac Pinheiro, Ivan D. Zyrianoff, and Marc Jentsch. Context design and tracking for iot-based energy management in smart cities. IEEE Internet of Things Journal, 5(2):687–695, April 2018.
-  Wen-Tin Lee and Po-Jen Law. A case study in applying security design patterns for iot software system. In 2017 International Conference on Applied System Innovation (ICASI), pages 1162–1165, May 2017.
-  Soheil Qanbari, Samim Pezeshki, Rozita Raisi, Samira Mahdizadeh, Rabee Rahimzadeh, Negar Behinaein, Fada Mahmoudi, Shiva Ayoubzadeh, Parham Fazlali, Keyvan Roshani, Azalia Yaghini, Mozhdeh Amiri, Ashkan Farivarmoheb, Arash Zamani, and Schahram Dustdar. Iot design patterns: Computational constructs to design, build and engineer edge applications. In 2016 IEEE First International Conference on Internet-of-Things Design and Implementation (IoTDI), pages 277–282, April 2016.
-  Marco Brambilla, Eric Umuhoza, and Roberto Acerbis. Model-driven development of user interfaces for iot systems via domain-specific components and patterns. Journal of Internet Services and Applications, 8(1):14, Sep 2017.
-  Lukas Reinfurt, Uwe Breitenbücher, Michael Falkenthal, Frank Leymann, and Andreas Riegg. Internet of things patterns. In Proceedings of the 21st European Conference on Pattern Languages of Programs, EuroPlop ’16, pages 5:1–5:21, New York, NY, USA, 2016. ACM.
-  Michael Koster. Design patterns are reusable solutions to common problems, 2019.
-  Welington M. da Silva, Alexandre Alvaro, Gustavo Tomas, Ricardo Afonso, Kelvin Dias, and Vinicius Garcia. Smart cities software architectures: A survey. pages 1722–1727, 03 2013.
-  Chuantao Yin, Zhang Xiong, Hui Chen, Jingyuan Wang, Daven Cooper, and Bertrand David. A literature survey on smart cities. Science China Information Sciences, 58, 08 2015.
-  Partha Pratim Ray. A survey on internet of things architectures. Journal of King Saud University - Computer and Information Sciences, 30(3):291 – 319, 2018.
-  Rwan Mahmoud, Tasneem Yousuf, Fadi Aloul, and Imran Zualkernan. Internet of things (iot) security: Current status, challenges and prospective measures. In 2015 10th International Conference for Internet Technology and Secured Transactions (ICITST), pages 336–341, Dec 2015.
-  Eduardo Felipe Zambom Santana, Ana Paula Chaves, Marco Aurelio Gerosa, Fabio Kon, and Dejan Milojicic. Software platforms for smart cities: Concepts, requirements, challenges, and a unified reference architecture. ACM Computing Surveys, 2017.
-  Luigi Atzori, Antonio Iera, and Giacomo Morabito. The internet of things: A survey. Computer Networks, 54(15):2787 – 2805, 2010.
-  Carlos Alberto Kamienski, Mark Jentsch, Markus Eisenhauer, Juisse Kiljander, Enrico Ferrera, Eduardo Souto Walter Andrade Peter Rosengren, Peter Thestrup, and Djamel Sadok. Application development for the internet of things: A context-aware mixed criticality systems development platform. Computer Communications, 104:1–16, 2017.
-  C. Kamienski, J. Soininen, M. Taumberger, S. Fernandes, A. Toscano, T. S. Cinotti, R. F. Maia, and A. T. Neto. Swamp: an iot-based smart water management platform for precision irrigation in agriculture. In 2018 Global Internet of Things Summit (GIoTS), pages 1–6, June 2018.
-  Ivan Zyrianoff, Fabrizio F. Borelli, Alexandre Heideker, Gabriela O. Biondi, and Carlos Kamienski. Scalability of iot-enabled context-aware management systems for smart cities. In IEEE Symposium on Computers and Communications (ISCC), 2018.
-  D. L. Hall and J. Llinas. An introduction to multisensor data fusion. Proceedings of the IEEE, 85(1):6–23, Jan 1997.
-  Lin Qiao, Yinan Li, Sahil Takiar, Ziyang Liu, Narasimha Veeramreddy, Min Tu, Ying Dai, Issac Buenrostro, Kapil Surlaker, Shirshanka Das, and Chavdar Botev. Gobblin: Unifying data ingestion for hadoop. Proc. VLDB Endow., 8(12):1764–1769, August 2015.
-  Nitin Sawant and Himanshu Shah. Big Data Application Architecture Q & A: A Problem-Solution Approach. Apress, Berkeley, CA, 2013.
-  László Lengyel, Péter Ekler, Tamás Ujj, Tamás Balogh, and Hassan Charaf. Sensorhub: An iot driver framework for supporting sensor networks and data analysis. International Journal of Distributed Sensor Networks, 11(7):454379, 2015.
-  Sheng Huang, Yaoliang Chen, Xiaoyan Chen, Kai Liu, Xiaomin Xu, Chen Wang, Kevin Brown, and Inge Halilovic. The next generation operational data historian for iot based on informix. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD ’14, pages 169–176, New York, NY, USA, 2014. ACM.
-  Muhammad Rizwan Bashir and Asif Qumer Gill. Towards an iot big data analytics framework: Smart buildings systems. In 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pages 1325–1332, Dec 2016.
-  Attila Csaba Marosi, Attila Farkas, and Robert Lovas. An adaptive cloud-based iot back-end architecture and its applications. In 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), pages 513–520, March 2018.
-  Natalia Miloslavskaya and Alexander Tolstoy. Big data, fast data and data lake concepts. Procedia Computer Science, 88:300 – 305, 2016. 7th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2016, held July 16 to July 19, 2016 in New York City, NY, USA.
-  S. Verma, Y. Kawamoto, Z. M. Fadlullah, H. Nishiyama, and N. Kato. A survey on network methodologies for real-time analytics of massive iot data and open research issues. IEEE Communications Surveys Tutorials, 19(3):1457–1477, thirdquarter 2017.
-  Daniele Cenni, Paolo Nesi, Gianni Pantaleo, and Imad Zaza. Twitter vigilance: A multi-user platform for cross-domain twitter data analytics, nlp and sentiment analysis. In 2017 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pages 1–8, Aug 2017.
-  Juan Colmenares, Reza Dorrigiv, and Daniel Waddington. A single-node datastore for high-velocity multidimensional sensor data. In 2017 IEEE International Conference on Big Data (Big Data), pages 445–452, Dec 2017.
-  Paula Ta-Shma, Adnan Akbar, Guy Gerson-Golan, Guy Hadash, Francois Carrez, and Klaus Moessner. An ingestion and analytics architecture for iot applied to smart city use cases. IEEE Internet of Things Journal, 5(2):765–774, April 2018.
-  Andrew Tanenbaum and Maarten van Steen. Distributed Systems: Principles and Paradigms (2nd Edition). Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2006.
-  Mark Endrei, Jenny Ang, Ali Arsanjani, Sook Chua, Philippe Comte, Pål Krogdahl, Min Luo, and Tony Newling. Patterns: Service-oriented Architecture and Web Services. IBM Corp., Riverton, NJ, USA, 2004.
-  Gustavo Alonso, Fabio Casati, Harumi Kuno, and Vijay Machiraju. Web Services: Concepts, Architectures and Applications (Data-Centric Systems and Applications). Springer-Verlag Berlin Heidelberg, 2004.
-  Arthur de M Del Esposte, Fabio Kon, Fabio Costa, and Nelson Lago. Interscity: A scalable microservice-based open source platform for smart cities. In Proceedings of the 6th International Conference on Smart Cities and Green ICT Systems, pages 35–46, 2017.
-  Robert Battle and Edward Benson. Bridging the semantic web and web 2.0 with representational state transfer (rest). Journal of Web Semantics, 6(1):61–69, 2008. Semantic Web and Web 2.0.
-  Ricardo Aparecido Perez de Almeida, Michael Blackstock, Rodger Lea, Roberto Calderon, Antonio Francisco do Prado, and Helio Crestana Guardia. Thing broker: A twitter for things. In Proceedings of the 2013 ACM Conference on Pervasive and Ubiquitous Computing Adjunct Publication, UbiComp ’13 Adjunct, pages 1545–1554, New York, NY, USA, 2013. ACM.
-  Leonard Richardson and Sam Ruby. RESTful Web Services. O’Reilly, first edition, 2007.
-  A. Akbar, A. Khan, F. Carrez, and K. Moessner. Predictive analytics for complex iot data streams. IEEE Internet of Things Journal, 4(5):1571–1582, Oct 2017.
-  OpenJS Foundation. Node-red: Flow-based programming for the internet of things, 2019.
-  OASIS. Oasis committee specification 02, 2019.
-  N Garg. Apache kafka, 2013.
-  Sasu Tarkoma. Publish / Subscribe Systems: Design and Principles. Wiley Publishing, 1st edition, 2012.
-  Hsiang Wen Chen and Fuchun Joseph Lin. Converging mqtt resources in etsi standards based m2m platform. In 2014 IEEE International Conference on Internet of Things (iThings), and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom), pages 292–295, Sep. 2014.
-  Lynn Beighley. Head First SQL. O’Reilly, first edition, 2007.
-  S. Rautmare and D. M. Bhalerao. Mysql and nosql database comparison for iot application. In 2016 IEEE International Conference on Advances in Computer Applications (ICACA), pages 235–238, Oct 2016.
-  T. A. M. Phan, J. K. Nurminen, and M. Di Francesco. Cloud databases for internet-of-things data. In 2014 IEEE International Conference on Internet of Things (iThings), and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom), pages 117–124, Sep. 2014.
-  Vadim Tropashko and Donald Burleson. SQL Design Patterns: Expert Guide to SQL Programming. Rampant TechPress, 2007.
-  C. Lee and Y. Zheng. Sql-to-nosql schema denormalization and migration: A study on content management systems. In 2015 IEEE International Conference on Systems, Man, and Cybernetics, pages 2022–2026, Oct 2015.
-  Jing Han, Haihong E, Guan Le, and Jian Du. Survey on nosql database. In 2011 6th International Conference on Pervasive Computing and Applications, pages 363–366, Oct 2011.
-  Michael Stonebraker. Sql databases v. nosql databases. Commun. ACM, 53(4):10–11, April 2010.
-  Avi Silberschatz, Henry Korth, and S Sudarshan. Data models. ACM Comput. Surv., 28(1):105–108, March 1996.
-  C. Cecchinel, M. Jimenez, S. Mosser, and M. Riveill. An architecture to support the collection of big data in the internet of things. In 2014 IEEE World Congress on Services, pages 442–449, June 2014.
-  Mark Grover, Ted Malaska, Jonathan Seidman, and Gwen Shapira. Hadoop Application Architectures: Designing Real-World Big Data Applications. O’Reilly Media, 2015.
-  Benjamin Bengfort and Jenny Kim. Data Analytics with Hadoop: An Introduction for Data Scientists. O’Reilly Media, Inc., 1st edition, 2016.
-  Allae Erraissi, Abdessamad Belangour, and Abderrahim Tragha. A comparative study of hadoop-based big data architectures. International Journal of Web Applications, 9:129–137, 12 2017.
-  Donald Miner and Adam Shook. MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems. O’Reilly Media, Inc., 1st edition, 2012.
-  Xiufeng Liu, Alfred Heller, and Per Sieverts Nielsen. Citiesdata: a smart city data management framework. Knowledge and Information Systems, 2017.
-  Pierfrancesco Bellini, Paolo Nesi, Michela Paolucci, and Imad Zaza. Smart city architecture for data ingestion and analytics: Processes and solutions. In 2018 IEEE Fourth International Conference on Big Data Computing Service and Applications (BigDataService), pages 137–144, March 2018.
-  Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107–113, January 2008.
-  Thomas Cormen, Charles Leiserson, Ronald Rivest, and Clifford Stein. Introduction to Algorithms, Third Edition. The MIT Press, 3rd edition, 2009.
-  Dongyao Wu, Sherif Sakr, and Liming Zhu. Big Data Programming Models, pages 31–63. Springer International Publishing, Cham, 2017.
-  Loris Belcastro, Fabrizio Marozzo, and Domenico Talia. Programming models and systems for big data analysis. International Journal of Parallel, Emergent and Distributed Systems, 34(6):632–652, 2019.
-  Apache hadoop, 2019.
-  Nawab Muhammad Faseeh Qureshi, Isma Farah Siddiqui, Mukhtiar Ali Unar, Muhammad Aslam Uqaili, Choon Sung Nam, Dong Ryeol Shin, Jaehyoun Kim, Ali Kashif Bashir, and Asad Abbas. An aggregate mapreduce data block placement strategy for wireless iot edge nodes in smart grid. Wireless Personal Communications, 106(4):2225–2236, Jun 2019.
-  John Hughes. Why functional programming matters. Comput. J., 32(2):98–107, April 1989.
-  Paul Hudak. Conception, evolution, and application of functional programming languages. ACM Comput. Surv., 21(3):359–411, September 1989.
-  Apache spark, 2019.
-  Apache flink, 2019.
-  Feng Ye, Peng Zhang, Cheng Hu, Songjie Zhu, and Ling Li. The tentative research of hydrological iot data processing system based on apache flink. In Service-Oriented Computing – ICSOC 2018 Workshops, pages 161–168, Cham, 2019. Springer International Publishing.
-  Joshua Wiley and Larry Pace. Beginning R: An Introduction to Statistical Programming. Apress, Berkeley, CA, USA, 2nd edition, 2015.
-  N. Nesa, T. Ghosh, and I. Banerjee. Outlier detection in sensed data using statistical learning models for iot. In 2018 IEEE Wireless Communications and Networking Conference (WCNC), pages 1–6, April 2018.
-  A. Grover, J. Gholap, V. P. Janeja, Y. Yesha, R. Chintalapati, H. Marwaha, and K. Modi. Sql-like big data environments: Case study in clinical trial analytics. In 2015 IEEE International Conference on Big Data (Big Data), pages 2680–2689, Oct 2015.
-  Lawrence Klein. Sensor and Data Fusion Concepts and Applications. Society of Photo-Optical Instrumentation Engineers (SPIE), Bellingham, WA, USA, 2nd edition, 1999.
-  J. Llinas, D. L. Hall, and E. Waltz. Data fusion technology forecast for C³MIS. In 1989 Third International Conference on Command, Control, Communications and Management Information Systems, pages 148–158, May 1989.
-  Carlos Kamienski, Fabrizio Borelli, Gabriela Biondi, Wiliam Rosa, Isaac Pinheiro, Ivan Zyrianoff, Djamel Sadok, and Ferry Pramudianto. Context-aware energy efficiency management for smart buildings. In 2015 IEEE 2nd World Forum on Internet of Things (WF-IoT), pages 699–704, Dec 2015.
-  C. Nagl, F. Rosenberg, and S. Dustdar. Vidre–a distributed service-oriented business rule engine based on ruleml. In 2006 10th IEEE International Enterprise Distributed Object Computing Conference (EDOC’06), pages 35–44, Oct 2006.
-  Serge Abiteboul, Ohad Greenshpan, and Tova Milo. Modeling the mashup space. In Proceedings of the 10th ACM Workshop on Web Information and Data Management, WIDM ’08, pages 87–94, New York, NY, USA, 2008. ACM.
-  Michael Blackstock and Rodger Lea. Iot mashups with the wotkit. pages 159–166, 10 2012.
-  Dimitris Soukaras, Pankesh Patel, Hui Song, and Sanjay Chaudhary. Iotsuite: a toolsuite for prototyping internet of things applications. The 4th International Workshop on Computing and Networking for Internet of Things (ComNet-IoT), co-located with 16th International Conference on Distributed Computing and Networking (ICDCN), 2015.
-  G. Merlino, D. Bruneo, S. Distefano, F. Longo, and A. Puliafito. Stack4things: Integrating iot with openstack in a smart city context. In 2014 International Conference on Smart Computing Workshops, pages 21–28, Nov 2014.
-  Mauro Conti, Ali Dehghantanha, Katrin Franke, and Steve Watson. Internet of things security and forensics: Challenges and opportunities. Future Generation Computer Systems, 78:544–546, 2018.
-  Ioannis Andrea, Chrysostomos Chrysostomou, and George Hadjichristofi. Internet of things: Security vulnerabilities and challenges. In 2015 IEEE Symposium on Computers and Communication (ISCC), pages 180–187, July 2015.
-  J. Zhou, Z. Cao, X. Dong, and A. V. Vasilakos. Security and privacy for cloud-based iot: Challenges. IEEE Communications Magazine, 55(1):26–33, January 2017.
-  T. Choudhury, A. Gupta, S. Pradhan, P. Kumar, and Y. S. Rathore. Privacy and security of cloud-based internet of things (iot). In 2017 3rd International Conference on Computational Intelligence and Networks (CINE), pages 40–45, Oct 2017.
-  M. Hafiz, P. Adamczyk, and R. E. Johnson. Organizing security patterns. IEEE Software, 24(4):52–60, July 2007.
-  Fadele Ayotunde Alaba, Mazliza Othman, Ibrahim Abaker Targio Hashem, and Faiz Alotaibi. Internet of things security: A survey. In 2018 International Conference on Advanced Science and Engineering (ICOASE), pages 162–166, Oct 2018.
-  Jayavardhana Gubbi, Rajkumar Buyya, Slaven Marusic, and Marimuthu Palaniswami. Internet of things (iot): A vision, architectural elements, and future directions. Future Generation Computer Systems, 29(7):1645–1660, 2013.
-  Aditya Gaur, Bryan Scotney, Gerard Parr, and Sally McClean. Smart city architecture and its applications based on iot. Procedia Computer Science, 52:1089–1094, 2015. The 6th International Conference on Ambient Systems, Networks and Technologies (ANT-2015), the 5th International Conference on Sustainable Energy Information Technology (SEIT-2015).
-  R. V. Nehme, H. Lim, and E. Bertino. Fence: Continuous access control enforcement in dynamic data stream environments. In 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), pages 940–943, March 2010.
-  Alessio Botta, Walter de Donato, Valerio Persico, and Antonio Pescapé. Integration of cloud computing and internet of things: A survey. Future Generation Computer Systems, 56:684–700, 2016.
-  Laurent Eschenauer and Virgil Gligor. A key-management scheme for distributed sensor networks. In Proceedings of the 9th ACM Conference on Computer and Communications Security, CCS ’02, pages 41–47, New York, NY, USA, 2002. ACM.
-  Sandra Wachter. Normative challenges of identification in the internet of things: Privacy, profiling, discrimination, and the GDPR. Computer Law & Security Review, 34(3):436–449, June 2018.
-  R. Lu, K. Heung, A. H. Lashkari, and A. A. Ghorbani. A lightweight privacy-preserving data aggregation scheme for fog computing-enhanced iot. IEEE Access, 5:3302–3312, 2017.
-  Carsten Bormann, Mehmet Ersue, and Ari Keränen. Terminology for Constrained-Node Networks. RFC 7228, May 2014.
-  U. Raza, P. Kulkarni, and M. Sooriyabandara. Low power wide area networks: An overview. IEEE Communications Surveys & Tutorials, 19(2):855–873, Second Quarter 2017.
-  Eduardo Nakamura, Antonio Loureiro, and Alejandro Frery. Information fusion for wireless sensor networks: Methods, models, and classifications. ACM Computing Surveys (CSUR), 39:9, 09 2007.
-  Federico Castanedo. A review of data fusion techniques. The Scientific World Journal, 2013, 10 2013.
-  Lawrence A. Klein. Sensor and Data Fusion Concepts and Applications. Society of Photo-Optical Instrumentation Engineers (SPIE), Bellingham, WA, USA, 1993.
-  Esper, 2019.
-  Jboss drools, 2019.
-  Openrules, 2019.
-  Thingsboard 2.0, 2019.
-  Ferry Pramudianto, Markus Eisenhauer, Carlos Alberto Kamienski, Djamel Sadok, and Eduardo J. Souto. Connecting the internet of things rapidly through a model driven approach. In 3rd IEEE World Forum on Internet of Things, WF-IoT 2016, Reston, VA, USA, December 12-14, 2016, pages 135–140, 2016.
-  Cal Henderson. Building Scalable Web Sites: Building, Scaling, and Optimizing the Next Generation of Web Applications. O’Reilly Media, Inc., 2006.
-  R. E. Hiromoto, M. Haney, and A. Vakanski. A secure architecture for iot with supply chain risk management. In 2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), volume 1, pages 431–435, Sep. 2017.
-  S. Kaisler, F. Armour, J. A. Espinosa, and W. Money. Big data: Issues and challenges moving forward. In 2013 46th Hawaii International Conference on System Sciences, pages 995–1004, Jan 2013.
-  M. Marjani, F. Nasaruddin, A. Gani, A. Karim, I. A. T. Hashem, A. Siddiqa, and I. Yaqoob. Big iot data analytics: Architecture, opportunities, and open research challenges. IEEE Access, 5:5247–5261, 2017.
-  Apache kafka, 2019.
-  Apache nifi, 2019.
-  Apache flume, 2019.
-  Hari Shreedharan. Using Flume: Flexible, Scalable, and Reliable Data Streaming. O’Reilly Media, 2014.
-  Graphql: A query language for your api, 2019.
-  Ultralight 2.0, 2019.
-  Lorawan, 2019.
-  Fiware-iot-agent, 2019.
-  H. Cai, B. Xu, L. Jiang, and A. V. Vasilakos. Iot-based big data storage systems in cloud computing: Perspectives and challenges. IEEE Internet of Things Journal, 4(1):75–87, Feb 2017.
-  Hongming Cai, Li Da Xu, Cheng Xie, Shaojun Qin, and Lihong Jiang. Iot-based configurable information service platform for product lifecycle management. IEEE Transactions on Industrial Informatics, 10(2):1558–1567, May 2014.
-  Olivier Curé, Fadhela Kerdjoudj, David Faye, Chan Le Duc, and Myriam Lamolle. On the potential integration of an ontology-based data access approach in nosql stores. In 2012 Third International Conference on Emerging Intelligent Data and Web Technologies, pages 166–173, Sep. 2012.
-  Minbo Li, Zhu Zhu, and Guangyu Chen. A scalable and high-efficiency discovery service using a new storage. In 2013 IEEE 37th Annual Computer Software and Applications Conference, pages 754–759, July 2013.
-  Youzhong Ma, Jia Rao, Weisong Hu, Xiaofeng Meng, Xu Han, Yu Zhang, Yunpeng Chai, and Chunqiu Liu. An efficient index for massive iot data in cloud environment. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, pages 2129–2133, New York, NY, USA, 2012. ACM.
-  S. Mallapuram, N. Ngwum, F. Yuan, C. Lu, and W. Yu. Smart city: The state of the art, datasets, and evaluation platforms. In 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), pages 447–452, May 2017.
-  Lihong Jiang, Li Da Xu, Hongming Cai, Zuhai Jiang, Fenglin Bu, and Boyi Xu. An iot-oriented data storage framework in cloud computing platform. IEEE Transactions on Industrial Informatics, 10(2):1443–1451, May 2014.
-  Fiware, 2019.
-  Fiware orion, 2019.
-  Fiware cosmos, 2019.
-  Quantum leap, 2019.
-  Cratedb, 2019.
-  W. Ji, J. Xu, H. Qiao, M. Zhou, and B. Liang. Visual iot: Enabling internet of things visualization in smart cities. IEEE Network, 33(2):102–110, March 2019.
-  Ralph Kimball and Joe Caserta. The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming and Delivering Data. John Wiley & Sons, Inc., USA, 2004.
-  Srividya K. Bansal and Sebastian Kagemann. Integrating big data: A semantic extract-transform-load framework. Computer, 48(3):42–50, Mar 2015.
-  Panos Vassiliadis. A survey of extract-transform-load technology. International Journal of Data Warehousing and Mining, 5:1–27, 07 2009.
-  H. Fang. Managing data lakes in big data era: What’s a data lake and why has it became popular in data management ecosystem. In 2015 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), pages 820–824, June 2015.
-  Nathan Marz. Big data: principles and best practices of scalable realtime data systems. O’Reilly Media, [S.l.], 2013.
-  Mariam Kiran, Peter Murphy, Inder Monga, Jon Dugan, and Sartaj Singh Baveja. Lambda architecture for cost-effective batch and speed big data processing. In 2015 IEEE International Conference on Big Data (Big Data), pages 2785–2792, Oct 2015.
-  Questioning the lambda architecture, 2014.
-  Jie Lin, Wei Yu, Nan Zhang, Xinyu Yang, Hanlin Zhang, and Wei Zhao. A survey on internet of things: Architecture, enabling technologies, security and privacy, and applications. IEEE Internet of Things Journal, 4(5):1125–1142, Oct 2017.
-  Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg, 2006.
-  R.M. Golden. Statistical pattern recognition. In Neil J. Smelser and Paul B. Baltes, editors, International Encyclopedia of the Social & Behavioral Sciences, pages 15040–15044. Pergamon, Oxford, 2001.
-  Anil K. Jain, Robert Duin, and Jianchang Mao. Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Mach. Intell., 22:4–37, 01 2000.
-  Mohammad Saeid Mahdavinejad, Mohammadreza Rezvan, Mohammadamin Barekatain, Peyman Adibi, Payam Barnaghi, and Amit P. Sheth. Machine learning for internet of things data analysis: a survey. Digital Communications and Networks, 4(3):161–175, 2018.
-  Luca Roffia, Paolo Azzoni, Cristiano Aguzzi, Fabio Viola, Francesco Antoniazzi, and Tullio Cinotti. Dynamic linked data: A sparql event processing architecture. Future Internet, 10:36, 04 2018.
-  Feng Chen, Pan Deng, Jiafu Wan, Daqiang Zhang, Athanasios V. Vasilakos, and Xiaohui Rong. Data mining for the internet of things: Literature review and challenges. International Journal of Distributed Sensor Networks, 11(8):431047, 2015.
-  C. Tsai, C. Lai, M. Chiang, and L. T. Yang. Data mining for internet of things: A survey. IEEE Communications Surveys & Tutorials, 16(1):77–97, First Quarter 2014.
-  H. Li, K. Ota, and M. Dong. Learning iot in edge: Deep learning for the internet of things with edge computing. IEEE Network, 32(1):96–101, Jan 2018.
-  Abhirup Khanna and Rishi Anand. Iot based smart parking system. In 2016 International Conference on Internet of Things and Applications (IOTA), pages 266–270, Jan 2016.
-  T. N. Pham, M. Tsai, D. B. Nguyen, C. Dow, and D. Deng. A cloud-based smart-parking system based on internet-of-things technologies. IEEE Access, 3:1581–1591, 2015.
-  Balaji Kalluri, Clayton Miller, Bharath Seshadri, and Arno Schlueter. A cyber-physical middleware platform for buildings in smart cities. pages 645–652, 10 2018.
-  D. Minoli, K. Sohraby, and B. Occhiogrosso. Iot considerations, requirements, and architectures for smart buildings—energy optimization and next-generation building management systems. IEEE Internet of Things Journal, 4(1):269–283, Feb 2017.
-  Zhao Liqiang, Yin Shouyi, Liu Leibo, Zhang Zhen, and Wei Shaojun. A crop monitoring system based on wireless sensor network. Procedia Environmental Sciences, 11:558–565, 12 2011.
-  K. Mohammed Shahanas and P. Bagavathi Sivakumar. Framework for a smart water management system in the context of smart city initiatives in india. Procedia Computer Science, 92:142–147, 2016. 2nd International Conference on Intelligent Computing, Communication & Convergence, ICCC 2016, 24-25 January 2016, Bhubaneswar, Odisha, India.
-  Nonhlanhla Ntuli and Adnan Abu-Mahfouz. A simple security architecture for smart water management system. Procedia Computer Science, 83:1164–1169, 04 2016.
-  Ciprian-Radu Rad, Olimpiu Hancu, Ioana-Alexandra Takacs, and Gheorghe Olteanu. Smart monitoring of potato crop: A cyber-physical system architecture model in the field of precision agriculture. Agriculture and Agricultural Science Procedia, 6:73–79, 12 2015.
-  Tomás Robles, Ramón Alcarria, Diego Martín, Augusto Morales, Mariano Navarro, Rodrigo Calero, Sofia Iglesias, and Manuel López. An internet of things-based model for smart water management. In Proceedings of the 2014 28th International Conference on Advanced Information Networking and Applications Workshops, WAINA ’14, pages 821–826, Washington, DC, USA, 2014. IEEE Computer Society.
-  K.H. Xiao, D.Q. Xiao, and X.W. Luo. Smart water-saving irrigation system in precision agriculture based on wireless sensor network. Trans. Chin. Soc. Agric. Eng., 26:170–175, 11 2010.
-  Lina Zhou, Shimei Pan, Jianwu Wang, and Athanasios V. Vasilakos. Machine learning on big data: Opportunities and challenges. Neurocomputing, 237:350–361, 2017.
-  Samet Tonyali, Kemal Akkaya, Nico Saputro, A. Selcuk Uluagac, and Mehrdad Nojoumian. Privacy-preserving protocols for secure and reliable data aggregation in iot-enabled smart metering systems. Future Generation Computer Systems, 78:547–557, 2018.
-  Hai Wang, Zeshui Xu, Hamido Fujita, and Shousheng Liu. Towards felicitous decision making: An overview on challenges and trends of big data. Information Sciences, 367-368:747–765, 2016.
-  Z. Zheng, P. Wang, J. Liu, and S. Sun. Real-time big data processing framework: Challenges and solutions. Applied Mathematics and Information Sciences, 9:3169–3190, 01 2015.
-  Ted Hills. NoSQL and SQL Data Modeling: Bringing Together Data, Semantics, and Software. Technics Publications, New York, NY, USA, 2016.