The Internet of Things (IoT) has been gaining momentum in both the industry and research communities due to an explosion in the number of smart mobile devices and sensors and the potential applications of the data produced from a wide spectrum of domains. In their 2013 report, McKinsey note a 300% growth in connected IoT devices in the last five years and rate the potential economic impact of the IoT at $2.7 trillion to $6.2 trillion annually by 2025 (Manyika et al., 2013). These figures grew to $4 trillion and $11 trillion in 2015 (Manyika et al., 2015). A study of Gartner’s 2010 to 2017 hype cycle reports, which we aggregate in Fig. 1, shows the advent of the IoT, steady growth, expansion and creation of new technology areas like the IoT platform. Another interesting technology that exceeds the IoT in momentum on the hype cycle is that of big data, which the IoT serves as a source and sink of.
Big data is data that are too big (volume), too fast (velocity) and too diverse (variety) (Madden, 2012). In the context of the IoT, we see an example of volume in the DEBS 2014 Grand Challenge (Ziekow and Jerzak, 2014), where data from 40 houses with smart plugs produced 4 billion events in a month (Fernandez et al., 2014). Given that a 2011 census showed that there were 26.4 million households in the United Kingdom (Office of National Statistics, 2013), the projected data size of 2.64 quadrillion (short scale) events per month, if every house had a meter, is a good example of data that is too big. In the IoT use cases of intelligent transportation systems (van Nunen et al., 2012; O’Hara et al., 2012) and telecommunication, data streams can come in too fast for processing, representing a data velocity problem. Finally, too diverse is the catchall term used to describe the presence of heterogeneous data sources in the IoT that make it difficult for existing tools to analyse them. In a 2014 survey of data scientists, 71% of those interviewed said that analytics is becoming increasingly difficult due to the variety and types of data sources (Paradigm4, 2014). An example is in the personal health care use case of the IoT (Niewolny, 2013), where unstructured textual electronic health records, connected mobile devices and sensors (Amendola et al., 2014) all add to the variety problem.
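The volume projection above is a linear extrapolation that can be checked in a few lines. The sketch below assumes that every household generates events at the same average rate as the 40 houses in the challenge data set; the variable names are illustrative only.

```python
# Linear extrapolation of the DEBS 2014 event volume to all UK households.
# Assumption: every household emits events at the average per-house rate
# observed in the 40-house data set.
events_per_month = 4_000_000_000   # events from 40 houses in one month
houses_observed = 40
uk_households = 26_400_000         # 2011 census figure

events_per_house = events_per_month // houses_observed
projected_events = events_per_house * uk_households

print(f"{events_per_house:,} events per house per month")
print(f"{projected_events:,} events per month nationwide")
```

Under this assumption each house contributes 100 million events a month, which is what scales to the 2.64 quadrillion figure.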
Analytics is the science or method of using analysis to examine something complex (Oxford English Dictionary, 2017). When applied to data, analytics is the process of deriving (the analysis step) knowledge and insights from data (something complex). The evolution to the concept of analytics we see today can be traced back to 1962. Tukey first defined data analysis as procedures for analysing data, techniques for interpreting the results, data gathering that makes analysis easier, more precise and accurate and finally, all the related machinery and statistical methods used (Tukey, 1962). In 1996, Fayyad et al. published an article explaining Knowledge Discovery in Databases (KDD) as “the overall process of discovering useful knowledge from data” where data mining serves as a step in this process - “the application of specific algorithms for extracting patterns from data” (Fayyad et al., 1996). In 2006, Davenport introduced analytics as quantitative, statistical or predictive models to analyse business problems like financial performance or supply chains and stressed its emergence as a fact-based decision-making tool in businesses (Davenport, 2006). In 2009, Varian highlighted the ability to take data and “understand it, process it, extract value from it, visualise it and communicate it”, as a hugely important skill in the coming decade (Varian, 2009). In 2013, Davenport introduced the concepts of Analytics 1.0, traditional analytics, 2.0, the development of big data technology and 3.0 where this big data technology is integrated agilely with analytics, yielding rapid insights and business impact (Davenport, 2013).
To better understand each of these areas, the IoT, Big Data and Analytics, and their intersection, we look chronologically at the existing reviews and surveys on these topics. This will help to establish the need for our review from the new dimension of analytics on the IoT especially in big data scenarios. A summary of the reviews is shown in Table 1.
In 2010, Atzori et al. (Atzori et al., 2010) survey the vision of the IoT, the enabling technologies and potential applications while identifying three perspectives: Things, Semantics and Network. Sharma et al. (Sharma et al., 2010) study analytics applications in the industry and propose a framework of how business analytics can be applied to processes for organisations to gain a sustainable, competitive advantage.
In 2012, Miorandi et al. (Miorandi et al., 2012) survey the IoT mainly from the perspective of the key issues and research challenges and some initiatives going on to address them. Barnaghi et al. (Barnaghi et al., 2012) look at developments in the semantic web community, analysing the advantages of semantics but also highlighting the challenges they face and review work on applying semantics to the IoT. Chen et al. (Chen et al., 2012) study, using bibliometrics, some of the key research areas in business intelligence and analytics, some application areas and propose a framework to classify them.
In 2013, Sagiroglu and Sinanc (Sagiroglu and Sinanc, 2013) give an overview of the big data problem, methods to handle big data, analysis techniques and challenges. Vermesan and Friess (Vermesan and Friess, 2013) look at the vision, applications, governance and challenges of the IoT and some proposed solutions like semantics.
In 2014, Perera et al. (Perera et al., 2014) present a study of context-aware computing and discuss how it can be applied to the IoT. Zanella et al. (Zanella et al., 2014) survey the enabling infrastructure and architecture for the Internet of Things in an urban, connected, smart city scenario while Xu et al. (Xu et al., 2014a) review the development of IoT technologies for industries. Zhou et al. (Zhou et al., 2014) discuss the challenges brought to data analytics by big data from the perspective of various applications while Kambatla et al. (Kambatla et al., 2014) discuss trends with a focus on hardware and software platforms, virtualisation and application scopes for analytics. Another big data survey is done by Chen et al. (Chen et al., 2014a) who look at challenges and work done from each stage of “data generation, data acquisition, data storage, and data analysis”. They also look at applications of big data briefly, where one such area is the IoT. Finally, Stankovic (Stankovic, 2014) proposes a set of research directions and considerations for future research on the IoT.
In 2015, Granjal et al. (Granjal et al., 2015) survey existing protocols for protecting communications on the IoT, comparing against a set of fundamental security requirements, and highlight the open challenges and strategies for future research work. Al-Fuqaha et al. (Al-Fuqaha et al., 2015) focus on giving a thorough summary of protocols for the IoT and how they work together for applications in big data scenarios.
In 2016, Ray (Ray, 2016) surveys domain-specific architectures for the IoT providing a brief summary of whether cloud platforms in the IoT support data analytics. Razzaque et al. (Razzaque et al., 2016) survey middleware platforms for the IoT against a set of comprehensive service and architectural requirements.
In 2017, Akoka et al. (Akoka et al., 2017) perform a systematic mapping study, a method for structuring a research field, to classify big data academic research and identify trends in the research. Both analytics and the IoT were identified as popular topics. Reviews by Lin et al. (Lin et al., 2017) and Farahzadia et al. (Farahzadia et al., 2017) focus on specific IoT research areas of fog computing architectures and middleware for cloud computing platforms. Sethi and Sarangi (Sethi and Sarangi, 2017) take the approach of surveying IoT architectures, protocols and applications which help them organise a taxonomy of IoT research.
One can see that the vision of the IoT through these surveys is still very much about interconnecting physical objects with protocols. However, the introduction of Big Data and Analytics has broadened the focus from communications technologies to applications with impact, scalability and the use of context within cross-domain use cases like the smart city, while research has also coalesced around fog computing and edge technologies, middleware platforms and the cloud. Information rather than data is increasingly envisioned as the new language of the IoT, while infrastructure and enabling technologies have shifted towards dealing with Big Data use-cases with high scalability or within distributed systems.
Given the traction of big data analytics in the industry and the IoT’s potential to become a “dominant source” of big data (Chen et al., 2014b), while also a consumer of insights and optimisation drawn from analytics, we foresee that researchers will be looking to understand the process of deriving analytical insights from the IoT. This is further justified by the argument of Akoka et al. (Akoka et al., 2017) that “data of IoT is useful only when analyzed”. As we have noted in our chronological study of previous reviews, this particular combination of areas, with a focus on IoT analytics, to the best of our knowledge, has not been explored in depth. The contribution of this paper is then to:
review IoT analytics applications and research from a variety of domains,
propose a classification and taxonomy for IoT analytics to guide future work and
review the enabling infrastructure for analytics in the context of big data and examine the tradeoffs to shape research directions.
The methodology used and the organisation of the rest of the article are explained next (Section 2).
2. Methodology and Organisation of Article
Section 3 starts by introducing the IoT vision and application domains, highlighting how this motivates this paper, which is then followed by the main survey content of the paper. The approach employed for the survey follows that of an evidence-based systematic review (Khan et al., 2003). Firstly, two research questions (RQ) were framed:
RQ1: What IoT analytics research/applications are being published?
RQ2: What enabling infrastructure is required for big data IoT analytics applications?
Next, we identified relevant articles through a search on the Web of Science platform, which indexes an extensive list of multi-disciplinary journals and conferences across multiple databases. The search criteria included the keywords ‘big data’ or ‘analytics’, filtered by ‘internet of things’. A total of 460 articles from 2011 to 2015 were retrieved. This was updated with 311 articles from 2016 and 2017 when the paper was revised.
The articles were further screened manually following inclusion criteria that mandated they 1) were from original research, 2) described actual designs, implementations and results, 3) applied analytics and 4) served IoT use-cases. The highest ranked 6 papers were chosen from each of 5 IoT application domains determined from IoT literature, forming a high-quality pool of 30 papers according to the systematic review method. This addressed RQ1 and is presented in Section 4. The ranking was decided proportionately by the number of citations and a qualitative score from 0 to 5 of the technological complexity and completeness of the application (mitigating recency bias).
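The selection step can be sketched as follows. The text states only that the ranking combined citation counts and a qualitative 0 to 5 score proportionately; the equal weighting, the scaling of citations by the largest count in each domain, the `Paper` record and the sample entries are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    title: str
    domain: str
    citations: int
    quality: float  # qualitative score from 0 to 5


def select_top(papers, per_domain=6, weight=0.5):
    """Return the highest-ranked papers for each domain.

    Hypothetical scoring: citations are scaled to [0, 1] by the largest
    count in the domain, quality is scaled by its maximum of 5, and the two
    are blended with `weight` (0.5 = equal weighting, an assumption).
    """
    by_domain = defaultdict(list)
    for paper in papers:
        by_domain[paper.domain].append(paper)

    selected = {}
    for domain, group in by_domain.items():
        max_cites = max(p.citations for p in group) or 1  # avoid dividing by zero

        def score(p):
            return weight * p.citations / max_cites + (1 - weight) * p.quality / 5

        selected[domain] = sorted(group, key=score, reverse=True)[:per_domain]
    return selected


# Made-up records purely to exercise the function.
sample = [
    Paper("A", "Health", citations=120, quality=2.0),
    Paper("B", "Health", citations=10, quality=5.0),
    Paper("C", "Health", citations=60, quality=4.0),
    Paper("D", "Transport", citations=0, quality=3.0),
]
top = select_top(sample, per_domain=2)
print([p.title for p in top["Health"]])
```

Blending a qualitative score with citations is what lets a recent, lightly cited but complete application outrank an older, heavily cited one, which is the recency-bias mitigation the methodology describes.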
This understanding of IoT applications was combined with business analytics literature, which has successfully drawn insights from data to optimise business processes, to propose a classification for analytics in Section 5. This classification will help us to better define and target research through an IoT analytics taxonomy as part of the summarisation step of a systematic review.
Finally, we go on to review the current state-of-the-art in IoT infrastructure in Section 6 that answers RQ2. We used the survey and applications publications previously retrieved on the IoT and identified, by manual inspection, groups of work in cloud, middleware, distributed and fog computing and expanded the search through these keywords to retrieve relevant articles for IoT scenarios as part of the ‘interpreting the findings’ step. Our goal was to consider analytics infrastructure from the perspective of data generation, collection, integration, storage and compute.
3. The Internet of Things Vision and Application Domains
3.1. IoT Definition and Common Vision
Both the European Commission and the UK Government Office of Science have a similar vision of the IoT as “a world in which everyday objects are connected to a network so that data can be shared”, greatly impacting society (Walport, 2014; European Commission, 2015). The International Telecommunication Union (ITU) calls the IoT “a global infrastructure for the information society, enabling advanced services by interconnecting things based on existing and evolving interoperable information and communication technologies” (International Telecommunication Union, 2012) and from a broader perspective, “a vision with technological and societal implications”, which draws its language from a report by the World Economic Forum (World Economic Forum, 2012).
Common to each of these visions are four principles that are well-defined in IoT literature:
The motivation of this paper builds on the third and fourth principles to identify and understand how analytics can enable advanced services from shared and integrated IoT data. The goal then, from these findings, would be to develop various means to help determine what analytics need to be applied and what enabling infrastructure is necessary. First though, we need to define ‘advanced services’. The next section builds on previous literature to define a set of advanced services domains that help organise the survey of analytics and ensure the paper fulfils a broad coverage.
3.2. IoT Advanced Services and Application Domains
As the IoT develops, many more potential applications and use cases for the IoT will emerge, providing advanced services which offer positive externalities (Holler et al., 2014). A range of advanced service application areas were elicited from each of the surveys describing applications from the 22 in Table 1 in Section 1. They were then classified by their impact on the themes of environment, society and economy, which are the drivers of sustainable development used for analysing medium to long term development issues at a large scale (Giddings et al., 2002). Fig. 3 shows the categorisation of the various application areas according to their economic, environmental and societal impact.
From these application areas, a range of application domains including health, transport, living, environment and industry are used to group them, forming the hierarchical classification shown in Fig. 3. Certain IoT research topics like Smart Cities (Caragliu et al., 2011), Smart Transportation (Sill et al., 2011), Smart Buildings and Smart Homes (Chan et al., 2008) which impact multiple themes are also listed.
4. IoT Applications with Analytics
An important question to ask following our definition of the IoT and its vision is the advantage that connected ‘things’ offer over isolated devices. For example, what is the benefit of deploying a smart parking system as compared to having isolated sensors in a car park using visual signals of green or red on the ceiling to indicate whether a parking lot is empty or occupied? Analytics adds value to integrated data and context from the IoT, producing higher value insights. The analytics-powered smart parking system has a much wider observation space and also guides the user to the available parking lot efficiently, without human intervention, reducing traffic and pollution (Salpietro et al., 2015; Bagula et al., 2015).
Research publications of IoT applications that make use of analytics from 2011 to 2017 were surveyed and the top 6 from each application domain introduced in Section 3.2, selected using the systematic review methodology (Section 2), are presented. These are described below and summarised in Table 2, which includes the analytics techniques employed, data sources used, and the currency of the data. Currency refers to whether analytics was applied mainly on historical or real-time data.
4.1. Health: Ambient Assisted Living, Neo-natal care, Prognosis, Monitoring
Mukherjee et al. (Mukherjee et al., 2012) review the use of data analytics in healthcare information systems. Two analytics applications are Ambient Assisted Living (AAL) (Dohr et al., 2010) and neo-natal care. In AAL, rules are applied to IoT data collected from smart objects in the homes of elderly or chronic disease patients while advanced solutions take into consideration contextual information and apply inferencing using ontologies to give health advisories to users, update care-givers or contact the hospital in emergencies. By analysing contextual knowledge in connection with physiological data and being sensitive and adaptive to parameters that vary less frequently, such systems are able to provide descriptive analytics to care-givers and a form of discovery analytics to detect anomalies to trigger emergency warnings. Neo-natal care involves the care of newborn babies where data mining is applied to multiple data streams to find relationships and patterns and to diagnose any possible medical conditions in infants who are not able to give the doctors verbal feedback.
Analysing the content of video data to aid the elderly and visually handicapped for AAL and navigation respectively is another IoT healthcare application of analytics (Liu et al., 2013).
In their work, Chen et al. (Chen et al., 2016) design a smart clothing monitoring system with visualisations of wearable sensor data through a mobile application for use cases like baby, elderly and fitness monitoring. This data is also stored on a ‘health cloud’ integrated with a machine learning library for diagnostic and predictive analytics of medical conditions and users’ health trends respectively.
Hossain and Muhammad (Hossain and Muhammad, 2015) show how electro-cardiogram (ECG) and other healthcare data collected from wearable IoT devices and sensors can be watermarked to ensure integrity and sent to the cloud for analysis through feature extraction and classification with a support vector machine (SVM) in real-time. Abnormal patterns are discovered and healthcare professionals alerted.
Analytics can also be applied in the form of prognosis, the science of predicting the future medical condition of a patient, to help healthcare professionals make more informed decisions (Hunink et al., 2014). Health indicators collected from sensors of a patient can be compared with data of similar patients and combined with domain knowledge and medical research to make conjectures.
Banos et al. (Banos et al., 2016) developed a digital health and wellness framework that collects data streams of IoT health data forming a ‘life-log’ for each user and includes descriptive analytics visualisations of activities. A human activity recogniser combines signal processing, SVM and Gaussian Mixture Models to distinguish activities and recommends activities using rule-based reasoning.
4.2. Transport: Traffic Control and Routing, Pedestrian Detection, Smart Parking
Applying analytics on video content has a variety of applications in different fields. In their review paper, Liu et al. (Liu et al., 2013) looked at the latest technologies and applications of video analytics and intelligent video systems. Video analytics has been successfully applied in traffic control systems to detect traffic volume for planning, highlighting incidents and enhancing safety by enforcing traffic rules (Mak and Fan, 2006). Another set of applications is for intelligent vehicles to assist the driver. Danner et al. (Danner et al., 2016) introduce their Precedent-Aware Classification (PAC) technique, which combines information from previously travelled routes and minimal classification features from sensors for computer vision analytics in pedestrian and car detection on constrained IoT platforms.
Jara et al. (Jara et al., 2015) derive insights about human dynamics by analysing the correlation between traffic, temperature and time using IoT sensor data from the SmartSantander smart city testbed (Sanchez et al., 2011). They apply visual analytics to understand and discover insights on human behaviour and use a Poisson model to interpolate and predict traffic density. Liebig et al. (Liebig et al., 2014) go further by prescribing good routes in travel planning using analytical techniques (a spatiotemporal random field based on conditional random fields (Pereira et al., 2001) for traffic flow prediction and a Gaussian process model to fill in missing values in traffic data) to predict the future traffic flow and to estimate traffic flow in areas with limited sensor coverage. These were then used to provide the cost function for the A* search algorithm (Hart et al., 1968) that uses the combination of a search heuristic and cost function to prescribe optimal routes (provided the heuristic is admissible and predicted costs are accurate).
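To make the role of the cost function and heuristic in A* concrete, the following is a minimal sketch on a toy road network. It is not the implementation of Liebig et al.: the graph, the node coordinates, the predicted travel times and the straight-line heuristic are hypothetical stand-ins for the output of a traffic prediction model.

```python
import heapq
import math


def a_star(graph, coords, start, goal, max_speed=1.0):
    """Return (total_cost, path) for the cheapest route from start to goal.

    graph:  {node: {neighbour: predicted_travel_time}}
    coords: {node: (x, y)} used by the straight-line heuristic
    """
    def heuristic(node):
        # Straight-line distance at top speed never overestimates travel time,
        # so it is admissible as long as no edge is faster than max_speed.
        return math.dist(coords[node], coords[goal]) / max_speed

    frontier = [(heuristic(start), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, math.inf):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, travel_time in graph[node].items():
            new_cost = cost + travel_time
            if new_cost < best_cost.get(neighbour, math.inf):
                best_cost[neighbour] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic(neighbour), new_cost,
                     neighbour, path + [neighbour]),
                )
    return math.inf, []


# Toy network: the edge B -> D is predicted to be congested.
coords = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1)}
graph = {
    "A": {"B": 1.0, "C": 1.5},
    "B": {"D": 4.0},
    "C": {"D": 1.25},
    "D": {},
}
cost, route = a_star(graph, coords, "A", "D")
print(cost, route)
```

The route through C is chosen even though its first leg is slower, because the predicted congestion on B to D dominates the total cost. If the predicted travel times were wrong, the prescribed route would be optimal only with respect to the prediction, which is the caveat noted above.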
He et al. (He et al., 2014) develop a smart parking service that combines geographic location information, parking availability, traffic and reservation information. The parking process is modelled as a birth-death stochastic process which allows prediction and optimisation of parking availability. Piovesan et al. (Piovesan et al., 2016) describe the application of their unsupervised form of self-organising maps (SOM) clustering to the classification of parking spaces according to spatio-temporal patterns. This type of analytics automatically discovers outliers for sensor maintenance and usage anomalies.
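As an illustration of how a birth-death model supports predictions of parking availability, the sketch below computes the stationary occupancy distribution of a single car park. It is not necessarily the formulation used by He et al.: it assumes Poisson arrivals at `arrival_rate`, exponentially distributed stays with rate `departure_rate` per occupied space, and that arrivals to a full car park are turned away. The rates and capacity are invented.

```python
from math import factorial


def occupancy_distribution(arrival_rate, departure_rate, capacity):
    """Stationary probabilities of 0..capacity occupied spaces.

    The birth rate is arrival_rate while a space is free; the death rate in
    state n is n * departure_rate. Detailed balance gives pi_n proportional
    to (arrival_rate / departure_rate) ** n / n!.
    """
    load = arrival_rate / departure_rate
    weights = [load ** n / factorial(n) for n in range(capacity + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical car park: 20 cars/hour arrive, each stays 30 minutes on average.
pi = occupancy_distribution(arrival_rate=20.0, departure_rate=2.0, capacity=15)
p_full = pi[-1]
expected_occupied = sum(n * p for n, p in enumerate(pi))
print(f"P(full) = {p_full:.3f}, expected occupied spaces = {expected_occupied:.2f}")
```

The probability of finding the car park full is the kind of quantity a parking service could use when deciding where to direct a driver or whether to accept a reservation.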
4.3. Living: Cultural Behaviour, Public Safety, Smart Buildings, Memory Augmentation, Lifestyle Monitoring
Chianese et al. (Chianese et al., 2017) describe a system for cultural behaviour analysis. They combine models and proximity evaluation algorithms to classify movement in museums from sensors with semantic enrichment from knowledge bases of cultural exhibits and social media of cultural tourism to analyse cultural behaviour using visualisations within an associative model.
Visualisation that taps the human cognitive ability to recognise patterns has also been employed by Razip et al. (Razip et al., 2014) in helping law enforcement officers increase their situational awareness. Officers are equipped with mobile devices that tap into crime data and spatio-temporal sensor data to show interactive alerts of hotspots, risk profiles and on-demand chemical plume models.
Additionally, there are public safety and military applications that apply video analytics in detecting movement, intruders or targets. The public safety use case is elaborated on by Gimenez et al. (Gimenez et al., 2012), who discuss how, given the big data problem of having huge amounts of video footage, smart video analytics systems can proactively monitor, automatically recognise and bring to notice situations, flag suspicious people, trigger alarms and lock down facilities. They do so by recognising patterns, directional motion and faces, and by spotting potential problems through tracking, with multiple cameras, how people move in crowded scenes.
Ploennigs et al. (Ploennigs et al., 2014) show how analytics can be applied to energy monitoring used in heating for smart buildings. The system is able to diagnose anomalies in the building temperature, for example, breakdowns of the cooling system, high occupancy of rooms, or open windows causing air exchange with the external surroundings. Using a semantics-based approach, the Building Automation and Control Systems (BACS) (Aste et al., 2017) could, from the sensor definitions, automatically derive diagnosis rules and behaviour of a specific building, making it sensitive to new anomalies.
Guo et al. (Guo et al., 2011) look at discovering various insights from mining the digital traces left by IoT data from cameras, wearables, mobile phones and smart appliances. Resulting applications are life logging systems to augment human memory with recorded data, real world search for objects and interactions with people and a system to improve urban mobility systems by studying large-scale human mobility patterns.
Mukherjee and Chatterjee (Mukherjee and Chatterjee, 2014) present a fast algorithm for detecting anomalies and also for classifying high dimensional data. These were tested with accelerometer data from a wearable personal digital assistant to recognise human activity in real time but can be generalised to other types of high dimensional data. Such algorithms for detecting anomalies and discovering patterns to classify activity from sensor data are important analytical tools that form a basis for smart and intelligent devices and, in this example, for activity tracking and monitoring.
4.4. Environment: Disaster Detection & Response, Wind Forecasting, Smart Energy
Schnizler et al. (Schnizler et al., 2014) describe a disaster detection system that works on heterogeneous streams of sensor data. Their method includes Intelligent Sensor Agents (ISAs) that produce anomalies, low level events with location and time information e.g. an abnormal change in mobile phone connections at an ISA in a telecom cell or base station, a sudden decrease in traffic, increase in Twitter messages, change in water level or change in the volume of moving objects at a certain location. These anomaly events then enter Round Table (RT) components that fuse heterogeneous sources together by mapping them to a common incident ontology through feedback loops that might involve crowdsourcing, human-in-the-loop or adjusting parameters of other ISAs to find matches. The now homogeneous incident stream can then be processed by a Complex Event Processing (CEP) (Luckham, 2002) engine to complete the situation reconstruction by doing aggregation and clustering with higher-level semantic data, simulation and prediction of outcomes and damage. The resultant incident stream can provide early warning effecting early disaster response.
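One simple way a sensor-side component could turn a raw stream into low level anomaly events carrying time and location is a rolling z-score test. The sketch below is an illustrative stand-in and not the detection method of Schnizler et al.; the class name, window size, threshold, readings and the `gauge-7` location label are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flags readings that deviate strongly from the recent window."""

    def __init__(self, window=10, threshold=3.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, timestamp, location, value):
        """Return an anomaly event dict, or None if the reading looks normal."""
        event = None
        if len(self.history) == self.history.maxlen:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                event = {"time": timestamp, "location": location,
                         "value": value, "baseline": mu}
        # Only normal readings update the baseline, so a sustained anomaly
        # keeps being reported instead of being absorbed into the window.
        if event is None:
            self.history.append(value)
        return event


# Made-up water-level readings with one abrupt rise at t=12.
readings = [2.0, 2.1, 1.9, 2.0, 2.2, 2.1, 2.0, 1.9, 2.1, 2.0, 2.1, 2.0, 4.5, 2.1]
detector = RollingAnomalyDetector(window=10, threshold=3.0)
events = [e for t, v in enumerate(readings)
          if (e := detector.observe(t, "gauge-7", v)) is not None]
print(events)
```

Events of this shape, each tagged with a time and a location, are the kind of input that a downstream fusion or CEP stage could aggregate across sources.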
Xu et al. (Xu et al., 2017) also present a disaster detection system targeted instead at urban disasters. They utilise social media events from multi-modal microblog posts (videos, images and text) to mine semantic, spatiotemporal and visual information producing a story. This real-time story of urban emergencies unfolding serves to increase the situational awareness of emergency response teams.
Another environmental application is wind forecasting (Mukherjee et al., 2013). Data is collected from wind speed sensors in wind turbines and an Artificial Neural Network is used on this data and historical data to perform the forecasting. This is useful for energy provision and planning.
Ghosh et al. (Ghosh et al., 2013) have implemented a localised smart energy system that uses smart plugs and data analysis to actively monitor energy policy and, by performing pattern recognition analysis on accumulated data, spot additional opportunities to save energy. This resulted in savings on electricity bills, especially by reducing the amount of power wasted in non-office hours from appliances, desktops and printers.
Similar work by Alonso et al. (Alonso et al., 2013) uses machine learning and an expert system (rule-based) to provide personalised recommendations, based on energy usage data collected in Smart Homes, that help a user to more efficiently utilise energy. They go one step further to provide recommendations through predicting cheaper options by detecting similar patterns in big data collected from other homes. Ahmed (Ahmed, 2014) applies similar analysis on combined consumption data for use in organisations to help in energy policy planning. He develops a model to classify the energy efficiency of buildings and the seasonal shifts in this classification and, using more detailed appliance-specific data, forecasts future energy usage.
4.5. Industry: Supply Chain Management, Smart Farming, Chemical Process
Vargheese and Dahir (Vargheese and Dahir, 2014) propose a system that improves shoppers’ experience by enhancing the On the Shelf Availability (OSA) of products. Furthermore, the system also looks to forecast demand and provide insights on buyers’ behaviour. A multi-tiered approach is employed, where sensors like video cameras process video streams locally and analyse the products on the shelf. This data is then verified by other sensors like light, infra-red and RFID sensors and the metadata produced is sent to the cloud to be further processed. In the cloud, this real time data is combined with models from learning systems, data from enterprise Point of Sale (POS) systems and inventory systems to recommend action plans to maintain the OSA of products. The staff of the store are informed and action is taken to restock products. Weather data, local events and promotion details are then analysed with the current OSA to provide demand forecasting and to model buyers’ behaviour, which is fed back into the system.
Nechifor et al. (Nechifor et al., 2014) describe the use of real time data in analytics in a cold chain monitoring (Abad et al., 2009) process. Trucks are used for transporting perishable goods and drugs that require particular thermal and humidity conditions. Sensors measure the position and conditions in the truck and of each package, while actuators (air conditioning and ventilation) can be controlled automatically. On a larger scale, predictions can be made on delays in routes and, when necessary to satisfy the product condition needs, longer but faster routes (less congestion) might be selected.
Similarly, Verdouw et al. (Verdouw et al., 2013) and Robak et al. (Robak et al., 2013) examine supply chains, the integrated, physical flow from raw material to end products with a shared objective, and formulate a framework based on their virtualisation in the IoT. At its highest level, a virtual supply chain supports intelligent analysis and reporting. This is applied to floriculture and to a Fourth Party Logistics (4PL) integrator respectively, where business intelligence, data mining and predictive analytics can provide early warning in case of disruptions or unexpected deviations and advanced forecasting about consequences of the detected changes when the product reaches its destination.
In the above examples on product and supply chain management, we see a common theme of predictive analytics being applied to business processes. This predictive analytics is often powered by learning from data to discover models or through data mining for patterns in data. The effectiveness of these algorithms benefits from the big data of the IoT in providing a large observation space for discovering patterns and trends. Real time data from sensors then provide the information required to immediately control actuators to rectify problems like products being out of stock on the shelf or conditions in trucks being unsuitable for perishable food.
Kamilaris et al. (Kamilaris et al., 2017) describe the use of a Complex Event Processing (CEP) engine to discover significant events on semantically-enriched data streams from sensors within two smart farming scenarios. One scenario included detecting the fertility of cows from temperature readings and other information on a dairy farm to suggest the best insemination timings. The other was to adaptively control the soil conditions for crop cultivation.
The chemical process industry deploys inferential industrial IoT sensors in process monitoring chains (Chiang et al., 2017b). Some techniques applied by sensors include linear regression, artificial neural networks (ANN) and Gaussian process regression which predict variables using available process data. These predictions enable quality monitoring and advanced control systems in plants to automatically react and prescribe process modifications “to prevent off-grade products”.
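In its simplest form, an inferential sensor is a regression from readily available process measurements to a quality variable that is hard to measure online. The sketch below fits a one-variable ordinary least-squares line by hand. The temperature and purity values are invented, and `OFF_GRADE_LIMIT` is a hypothetical specification limit, so this illustrates the idea rather than any method from the cited work.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented training data: reactor temperature (deg C) vs lab-measured purity (%).
temperature = [180.0, 185.0, 190.0, 195.0, 200.0]
purity = [99.0, 98.5, 98.0, 97.5, 97.0]
slope, intercept = fit_line(temperature, purity)


def predict_purity(temp):
    """Estimate purity from a live temperature reading."""
    return slope * temp + intercept


OFF_GRADE_LIMIT = 97.2  # hypothetical specification limit
reading = 199.0
if predict_purity(reading) < OFF_GRADE_LIMIT:
    print("predicted off-grade: adjust process")
else:
    print("within specification")
```

The prediction is available as soon as the temperature reading arrives, so a control system can react before a laboratory measurement would confirm the deviation.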
5. Types of Analytics and their importance
Following the study of the current work in analytics in the IoT, we explore a classification of analytics that is applicable to these domains. We derive a categorisation of analytical capabilities from business analytics literature, which the term analytics comes from. Bertolucci et al. (Bertolucci, 2013) propose descriptive, predictive and prescriptive categories while Gartner (Kart, 2012) (Chandler et al., 2011) proposes the extra category of diagnostic analytics. Finally, Corcoran et al. (Corcoran, 2012) introduce the additional category of discovery analytics. We build upon these to form a comprehensive classification of analytic capabilities consisting of five categories: descriptive, diagnostic, discovery, predictive and prescriptive analytics. Each category is described in detail in Section 5.1 and we also summarise how each IoT application surveyed in the previous section is categorised in Table 3. Each application domain has applications which support multiple analytical capabilities. We also note that all the categories of capabilities are well-represented in the literature survey, while mature domains like the industrial IoT focus on high value analytics.
|Health||(Mukherjee et al., 2012; Chen et al., 2016)||(Chen et al., 2016; Liu et al., 2013)||(Mukherjee et al., 2012; Hossain and Muhammad, 2015; Banos et al., 2016)||(Hunink et al., 2014)||(Chen et al., 2016)|
|Transport||(Mak and Fan, 2006)||(Danner et al., 2016; Piovesan et al., 2016; Jara et al., 2015)||(He et al., 2014)||(Liebig et al., 2014)|
|Living||(Razip et al., 2014; Chianese et al., 2017)||(Ploennigs et al., 2014)||(Gimenez et al., 2012; Chianese et al., 2017; Guo et al., 2011; Mukherjee and Chatterjee, 2014)|
|Environment||(Xu et al., 2017)||(Schnizler et al., 2014; Ghosh et al., 2013)||(Mukherjee et al., 2013; Alonso et al., 2013; Ahmed, 2014)|
|Industry||(Verdouw et al., 2013; Robak et al., 2013)||(Nechifor et al., 2014; Verdouw et al., 2013; Chiang et al., 2017b)||(Vargheese and Dahir, 2014; Kamilaris et al., 2017; Chiang et al., 2017b)|
Fig. 4 looks at how each analytical capability fits within the Knowledge Hierarchy (Bernstein, 2011), which is a common framework used in the Knowledge Management domain. This categorisation of analytic capabilities enables us to establish what the aim of analysis is and allows us to relate to the vision of IoT deployment as often expressed in research roadmaps. The value of each capability is also highlighted in the figure. The knowledge hierarchy starts with data at the base, examples of which are facts, figures and observations (e.g. the raw data produced by IoT ‘things’). Information is interpreted data with context, for example, temperature as represented by descriptive analytics: an average over a month or a categorical description of the day being sunny and warm. Knowledge is information within a context with added understanding and meaning, for example, possible reasons for the high average temperature this month. Finally, wisdom is knowledge with insight, for example, discovering a particular trend in temperature and projecting it across future months while providing cost-saving energy management solutions for heating a smart home based on these predictions. Each component of the knowledge hierarchy builds on the previous tier and we can see something similar with analytical capabilities. To add a practical view from business management literature to our discussion, a review of organisations adopting analytics (Lavalle et al., 2010) categorised them as Aspirational, Experienced and Transformed. Aspirational organisations were seen to use analytics in hindsight as a justification for actions, utilising the data, information and knowledge tiers in the process. Experienced organisations utilised insights to guide decisions and transformed organisations were characterised by their ability to use analytics to prescribe their actions, effectively applying foresight in their decision-making process.
5.1. Five Categories of Analytics Capabilities
5.1.1. Descriptive Analytics
Descriptive analytics helps us to answer the question, “what happened?”. It can take the form of describing, summarising or presenting raw IoT data that have been gathered. Data are decoded, interpreted in context, fused and then presented so that they can be understood, for example as a chart, a report, statistics or some aggregation of information.
5.1.2. Diagnostic Analytics
Diagnostic analytics is the process of understanding why something has happened. This goes one step deeper than descriptive analytics in that we try to find the root cause of, and explanations for, the IoT data. Both descriptive and diagnostic analytics give us hindsight on what has happened and why.
5.1.3. Discovery in Analytics
Through the application of inference, reasoning or the detection of non-trivial information from raw IoT data, we have the capability of Discovery in Analytics. Given the acute problem of volume that big data presents, Discovery in Analytics is also very valuable in narrowing down the search space of analytics applications. Discovery in Analytics tries to answer the question of what happened that we don’t know about, and the outcome is insight into what happened. What differentiates this from the previous types of analytics is using the data to detect something new, novel or different (e.g. trends, exceptions or clusters) rather than describing or explaining it.
5.1.4. Predictive Analytics
For the final two categories of analytics, we move from hindsight and insight to foresight. Predictive Analytics tries to answer the question: “what is likely to happen?”. It uses past data and knowledge to predict future outcomes (Hair Jr, 2007) and provides methods to assess the quality of these predictions (Shmueli and Koppiu, 2010).
5.1.5. Prescriptive Analytics
Prescriptive analytics addresses the question of what should be done about what has happened or is likely to happen. It enables decision-makers not only to look into the future at opportunities (and issues) that are potentially out there, but also presents the best course of action to act on foresight in a timely manner (Basu, 2013) with the consideration of uncertainty. This form of analytical capability is closely coupled with optimisation, answering ‘what if’ questions so as to evaluate and present the best solution.
5.2. Specific Types of Analytics
Having looked at analytical capabilities which help to define the aims of analytics, we look at specific analytics that can guide stakeholders involved in the deployment of analytics on IoT applications. A summary of the specific types of analytics and their corresponding analytical capabilities can be found in Fig. 5.
5.2.1. Visual Analytics
Visual analytics combines interactive visualisations with data analytics techniques “for an effective understanding, reasoning and decision making on the basis of very large and complex data sets” (Keim et al., 2008). Hence, visual analytics can contribute not only to describing and diagnosing what happened but also to helping users discover new insights. In the work by Zhang et al. (Zhang), we see visual analytics being applied to health care data for description, through answering questions like “What is the distribution of pregnancy age?”; for diagnosis, through hypothesising two disease patterns because “diarrhoea” and “fever” are not correlated; and for discovery, through detecting the delayed outbreak of two diseases.
5.2.2. Data Mining
Data Mining is part of the Knowledge Discovery from Data (KDD) process in which interesting patterns and knowledge are discovered from large amounts of data (Han et al., 2012). The IoT is a source for a large amount of data in which the techniques of data mining can be applied. These include:
Multi-dimensional data summary is often associated with Online analytical processing (OLAP) operations that make use of background knowledge of the domain to allow presentation of data at different levels of abstraction. For example, you could drill-down and roll-up data to present it at different degrees of summarisation.
Association & correlation is the process of finding the relationship between two variables which vary according to some pattern. This could allow us to find out, with a degree of confidence and support, whether buying product A led to buying product B.
Classification is the process of finding some model or function that has the ability to distinguish between data classes or concepts.
Clustering is the process of grouping data objects into classes without labels. The clustered data objects have maximum similarity to in-class objects and minimum similarity between objects from other classes.
Pattern discovery is the process of detecting and extracting interesting patterns from data. An example is frequent item sets, sets of items that often appear together in a transactional data set.
Anomaly detection refers to the problem of “finding patterns in data that do not conform to expected behaviour” (Chandola et al., 2009).
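To make the notions of support and confidence used in association analysis concrete, the following is a minimal sketch under stated assumptions: the basket data and the function names `support` and `confidence` are hypothetical and only illustrate the standard definitions, not any particular system surveyed here.

```python
from typing import FrozenSet, List


def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


# Hypothetical shopping baskets, purely for illustration.
baskets = [
    frozenset({"A", "B"}),
    frozenset({"A", "B", "C"}),
    frozenset({"A", "C"}),
    frozenset({"B", "C"}),
]

rule_support = support(baskets, frozenset({"A", "B"}))
rule_confidence = confidence(baskets, frozenset({"A"}), frozenset({"B"}))
```

Here the rule "A implies B" is supported by half of the baskets, and two of the three baskets containing A also contain B.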
5.2.3. Content and Text Analytics
Content Analytics is the broad area in which analytical techniques are applied to digital content. Text analytics is the derivation of high quality information from unstructured text, for example, extracting named entities and relations, analysing sentiment, and extracting events and time series information.
5.2.4. Video Analytics
Video Analytics (VA) is about the use of specialised software and hardware “to analyse captured video and automatically identify specific objects, events, behaviour or attitudes in video footage in real-time” (Gimenez et al., 2012).
5.2.5. Trend Analytics
Trend analytics is concerned with looking at data and events across time, understanding them, making predictions about future trends and providing early warning systems. Trend analytics is also closely related to the analysis of time-series information (Chatfield, 2013), where, looking at a time-series, we try to find a ‘long-term change in the mean level’.
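One simple way to expose a long-term change in the mean level is to smooth the series with a moving average. The sketch below is illustrative only: the function name `moving_average`, the window length and the temperature readings are assumptions made for this example, and other smoothing or decomposition methods could equally be used.

```python
from typing import List, Sequence


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; one output per full window of observations."""
    if window <= 0:
        raise ValueError("window must be positive")
    if window > len(series):
        return []
    running = sum(series[:window])
    smoothed = [running / window]
    for i in range(window, len(series)):
        # Slide the window: add the newest value, drop the oldest one.
        running += series[i] - series[i - window]
        smoothed.append(running / window)
    return smoothed


# Hypothetical daily temperature readings with short-term fluctuation.
readings = [10, 12, 11, 13, 15, 14, 16]
trend = moving_average(readings, window=3)

# A positive difference between the last and first smoothed values
# suggests an upward trend in the mean level.
rising = trend[-1] > trend[0]
```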
5.2.6. Business Analytics
Business Analytics is the practice of using an organisation’s data to gain insights through analytical techniques that can better inform business decisions and automate and optimise business processes.
5.3. A Layered Taxonomy of Data, Analytics and Applications for the IoT
Fig. 6 shows a layered taxonomy of analytics for the IoT that summarises our survey with respect to analytics capabilities and specific analytics. There are three layers in the taxonomy: data, analytics and applications. Within each layer are various concepts, classes and techniques which are well-defined in background literature and gathered from reviews in each area.
In the analytics layer, visual analytics processes are defined by Keim et al. (Keim et al., 2010) while techniques for each data type are summarised in surveys (Mittelstadt et al., 2012; Sun et al., 2013; Aigner et al., 2008). Data mining (Liao et al., 2012; Goebel and Gruenwald, 1999; Shmueli et al., 2017), text analytics (Aggarwal and Zhai, 2012) and video analytics (Liu et al., 2013) each are well-described in the referenced authoritative texts. Time-series forecasting (Mahalakshmi et al., 2016), analysis and control (Box et al., 2015) have also been reviewed in detail. Literature also covers business analytics processes (Larson and Chang, 2016), prescriptive analytics (Basu, 2013) and techniques (Turban et al., 2014).
In the application layer, themes and domains are from Section 3.2 while the IoT applications from each domain surveyed in Section 4 are shown connected to their various analytics capabilities. Analytics techniques can then be referenced under each capability.
In the data layer, big data as defined in Section 1 is summarised along with terms used throughout the survey including currency, types of data and their sources. Two other terms for big data, veracity and variability are introduced for completeness. Veracity is concerned with the noise within data and how accurate the data is for whatever purpose it is to serve. Variability is concerned with data whose meaning changes due to differences in interpretation of data within a specific context. Finally, processes, distribution levels and distributed technologies for storage and compute are covered in Section 6 that follows this.
6. Enabling Infrastructure for IoT Analytics
In the previous section we looked at classifying analytics and building a taxonomy for understanding analytics. In this section, we will review work that enables analytics to be applied on IoT data.
Enabling infrastructure for analytics on the IoT are components, techniques and technology that contribute to the process whereby data is utilised in analytics applications. Fig. 7 shows the process of how data goes through the steps of generation and collection, aggregation and integration and finally is applied in analytics applications (Chen et al., 2014a). Storage and compute are abstract processes involved with each step of this data flow. In practice, data could be pipelined from one step to another, hence, need not necessarily be stored, physically, in a separate location. Compute could also be done on the device or in transit and need not imply a separate compute component.
The following sections elaborate on each step of the data flow in IoT analytics from data generation and collection to aggregation and integration with storage and compute alongside. Fig. 8 summarises the technologies covered.
6.1. Data Generation: Sensors and Tags, Hardware and OS, Power
A major share of the data in the IoT is generated by sensors, including many types of environmental, spatial and health sensors (Ray, 2015). Tags also generate data and can be passive, like QR code and barcode patterns which require a device to scan them, or active, like iBeacon (Apple, 2017) and UriBeacon (Google, 2017) technologies which project signals to mobile applications. RFID tags can be either passive or active, the active type requiring a power source to broadcast signals, and can be UHF (Ultra High Frequency), HF (High Frequency), or LF (Low Frequency). A list of hardware platforms for sensors or base stations receiving the generated data and a list of lightweight operating systems for the IoT are discussed in the surveys by Ray (Ray, 2016) and Razzaque et al. (Razzaque et al., 2016) respectively.
Remotely-deployed IoT sensors also require power especially for the energy consuming process of wirelessly transmitting data. Wolf (Wolf, 2017) describes a number of energy scavenging systems that harvest energy from the environment, while wireless charging technologies like ubeam (Ubeam, 2017) and motion charging like Ampy (Ampy, 2017) are alternatives. Data is then transmitted and collected as follows.
6.2. Data Collection: Discovery, Management, Transmission, Context and Fog
A significant amount of work on the IoT has been to develop middleware, the software layer that connects various components like the device, storage, compute and network together. Middleware in the IoT has functional requirements (Razzaque et al., 2016; Bandyopadhyay et al., 2011) including: 1) resource discovery, 2) resource management, 3) data management, 4) event management and 5) code management. Of these requirements, resource discovery and management fit within the collection step, data and event management fit within the aggregation and storage processes, and code management fits within the compute process.
There are a number of technologies for the IoT that support the discovery of devices: Multicast DNS (mDNS) (Internet Engineering Task Force, 2013), DNS Service Discovery (DNS-SD) (Cheshire, 2017), Micro Plug and Play (PnP) (Yang et al., 2015), Simple Service Discovery Protocol (SSDP) (Internet Engineering Task Force, 1999) and Multicast CoAP (MC-CoAP) (Internet Engineering Task Force, 2014). One means of managing the discovered resources is through Thing Directories that serve as catalogues of resources. HyperCat (Alliance, 2017), CoRE Resource Directory (Internet Engineering Task Force, 2017), Sensor Instance Registry (SIR) (Jirka and Nüst, 2010) and digrectory (Jara et al., 2013) are various implementations supporting resource lookup and search.
Another important process in data collection is the transmission of generated data. We divide the transmission technologies into those for communication within sensor networks, like Zigbee, and those for communication within gateway networks and the wider IoT, like LTE and GSM. These technologies are discussed in the survey by Ray (Ray, 2015) on IoT architectures. Network and transport layer protocols like IPv4/v6 and TCP/UDP are well-defined in literature. IPSec (Internet Engineering Task Force, 2005) is a security protocol suite for the network layer that authenticates and encrypts packet data while 1888.3 (IEEE, 2013) is a security standard for the IEEE Ubiquitous Green Community Control Network.
Context-based computing is a research area within the IoT that involves the detection, sharing and grouping of devices according to context. Context from the conceptual perspective, as described by Perera et al. (Perera et al., 2014), refers to the location, time, activity and identity related to the data collected. Grim et al. (Grim et al., 2012) design a data structure inspired by the Bloom filter (Bloom, 1970) that summarises this context and identifies set membership in a probabilistic way so that resources can be discovered and grouped. Perera et al. (Perera et al., 2013) implement resource search and management on a context-based framework. A Comparative Priority-based Weighted Index is generated for each resource, combining priorities like accuracy, reliability, energy, cost and availability, which optimises the selection process for the aggregation of data sources.
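For readers unfamiliar with probabilistic set membership, the following is a minimal sketch of a plain Bloom filter. It is not the context-summarising structure of Grim et al.; the class name, the bit-array size, the number of hash functions and the context strings are all assumptions chosen for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Plain Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Hypothetical context descriptors attached to a device.
summary = BloomFilter()
summary.add("location:room-12")
summary.add("activity:heating")
```

A lookup such as `summary.might_contain("location:room-12")` always returns `True` for an item that was added, while an item that was never added is usually, but not always, reported as absent.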
Chiang et al. (Chiang et al., 2017a) define fog computing as an “end-to-end horizontal architecture” for the IoT that distributes the compute and storage, control and communication planes nearer to users “along the cloud-to-thing continuum”. Aazam and Huh (Aazam and Huh, 2014) describe specifically how this vision can be realised in terms of additional security, storage, processing and monitoring sub-layers between the physical layer and the transport layer of an IoT architecture. Hence, Fog Computing extends to the data aggregation layer and can even extend to the analytics process.
6.3. Data Aggregation and Integration: Interoperability
Besides the functional requirements of middleware defined in the previous section, the survey by Razzaque et al. (Razzaque et al., 2016) also describes architectural requirements, design approaches and non-functional requirements of middleware, as shown in Fig. 8, along with a detailed review of various software and publications. Interoperability is one of the architectural requirements and is essential for the data aggregation and integration process. McKinsey (Manyika et al., 2015) estimate that such interoperability will unlock an additional 40 to 60 percent of the total projected future IoT market value.
Berrios et al. (Berrios et al., 2017) describe how various cross-industry consortia concerned with the IoT are converging on semantic interoperability within the application layer which they split into interoperability of business semantics, device semantics, unit of measure semantics and API and service standards. All the consortia involved are working on device semantics for interoperability while at least one consortium has defined standards for each of the home & buildings, retail, healthcare, transport & logistics and energy industries. The series of articles, co-authored by representatives from each consortia, also recommended a top-level ontology, an ontology representing the intersection of business and device semantics and a common data format.
Milenkovic (Milenkovic, 2015) also argues for a common representation for metadata that provides context to the data collected. Linked Data, which is defined as “a set of best practices for publishing data on the Web so that distributed structured data can be interconnected and made more useful by semantic queries” (Bizer et al., 2009), is seen as one means. Barnaghi et al. (Barnaghi et al., 2012) argue that Linked Data and semantic technologies can serve to facilitate interoperability, data abstraction, access and integration with other cyber, social or physical world data. Related technologies that allow metadata to be represented are RDFS, which inspired the popular schema.org vocabulary that allows persons, events, places and products to be defined on the web, and the Web Ontology Language (OWL) for complex modelling and non-trivial automated reasoning in ontologies. There are also proposals for other data models like YANG (Schonwalder et al., 2010), JSON Schema (Galiegue et al., 2013) and JSON Content Rules (Cordell and Newton, 2016) to be adopted.
6.4. Architectures for Storage and Compute
At a high level, architectures help to define how to build infrastructures and how to handle big IoT data in the storage and compute components for analytics. One such architecture is the lambda architecture by Marz et al. (Marz and Warren, 2014), which consists of a speed, a serving and a batch layer. The idea is that for huge datasets it is necessary to precompute batch views in the batch layer and update them in the serving layer. At the same time, a speed layer compensates for the high latency of the batch computations by looking at recent data and doing fast incremental updates.
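The interplay of the three layers can be sketched in a few lines. The example below is a single-process toy under stated assumptions, not an implementation from Marz and Warren: the class `LambdaCounter`, its method names and the per-sensor event count it maintains are hypothetical, and a real deployment would run each layer on separate distributed systems.

```python
from collections import Counter
from typing import List


class LambdaCounter:
    """Toy lambda-style pipeline that counts events per sensor."""

    def __init__(self) -> None:
        self.master_log: List[str] = []        # append-only master dataset
        self.batch_view: Counter = Counter()   # precomputed view, served to queries
        self.speed_view: Counter = Counter()   # incremental view of recent events

    def ingest(self, sensor_id: str) -> None:
        # Every event goes to the master dataset and to the speed layer.
        self.master_log.append(sensor_id)
        self.speed_view[sensor_id] += 1

    def run_batch(self) -> None:
        # Batch layer: recompute the view from the full master dataset.
        self.batch_view = Counter(self.master_log)
        # Events now covered by the batch view are dropped from the speed layer.
        self.speed_view = Counter()

    def query(self, sensor_id: str) -> int:
        # Serving: merge the (stale) batch view with the recent speed view.
        return self.batch_view[sensor_id] + self.speed_view[sensor_id]


pipeline = LambdaCounter()
for event in ["s1", "s1", "s2"]:
    pipeline.ingest(event)
before_batch = pipeline.query("s1")

pipeline.run_batch()
pipeline.ingest("s1")
after_batch = pipeline.query("s1")
```

Queries stay correct both before and after the batch recomputation because the speed view always covers exactly the events the batch view has not yet seen.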
This big data architecture is useful in providing us with a general idea of how analytics can scale to the volume of IoT data. Ye et al. (Ye et al., 2013) implement a service for big data analytics (using R and Hadoop for efficient parallel processing (Das et al., 2010)) in the batch layer to do data mining tasks like clustering. Products like Onix (Shtykh and Suzuki, 2014), which do analytics on streams, work on implementing solutions for the speed layer while industry players like MapR (MapR Technologies, 2014) have also proposed the Lambda Architecture as part of their data processing architecture. The Lambda Architecture has also been used in an IoT context by Villari et al. (Villari et al., 2014), who apply it to a Smart Environment use case.
Baldominos et al. (Baldominos et al., 2014) also propose a design that is similar in structure to the Lambda architecture and is another example of how an analytics system, for doing machine learning and recommendations in this case, can be implemented with this separation of batch (batch machine learning module/storage), speed (stream machine learning module) and serving (dashboard) layers.
The Hadoop and Spark ecosystems are two other big data processing architectures. Hadoop consists of two main parts, a Distributed File System (DFS) like HDFS and a distributed programming model like MapReduce. The Hadoop ecosystem (an overview is available from http://thebigdatablog.weebly.com/blog/the-hadoop-ecosystem-overview) consists of various technologies built on top of and around these two parts, including warehousing like Hive, NoSQL databases like HBase, data ingestion pipes like Flume, machine learning libraries like Mahout and a host of other technologies (a table is available from https://hadoopecosystemtable.github.io/).
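The MapReduce programming model itself can be illustrated without a cluster. The sketch below runs the map, shuffle and reduce phases in a single process; the function names, the record layout and the "maximum reading per sensor" task are assumptions for illustration and do not use the Hadoop API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]


def map_reading(record: Record) -> Iterator[Tuple[str, float]]:
    """Map phase: emit (key, value) pairs, here keyed by sensor id."""
    sensor_id, reading = record
    yield sensor_id, reading


def reduce_max(key: str, values: List[float]) -> Tuple[str, float]:
    """Reduce phase: collapse all values for one key into a single result."""
    return key, max(values)


def map_reduce(
    records: Iterable[Record],
    mapper: Callable[[Record], Iterator[Tuple[str, float]]],
    reducer: Callable[[str, List[float]], Tuple[str, float]],
) -> Dict[str, float]:
    groups: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Hypothetical (sensor id, temperature) records.
records = [("s1", 20.0), ("s2", 18.0), ("s1", 25.0)]
max_per_sensor = map_reduce(records, map_reading, reduce_max)
```

In a distributed setting the mapper and reducer calls would run on different nodes, with the shuffle moving each key's values to the node responsible for that key.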
The Spark ecosystem is built on Spark Core and consists of components like SparkSQL, Spark Streaming, MLlib and GraphX amongst others. Spark is described in more detail in Section 6.6.
The Lambda architecture, Hadoop and Spark ecosystems, however, are suited for big data systems in which compute and storage are in centralised or cloud-based clusters rather than decentralised fog and edge based computing. The next two sections describe storage and compute technologies which can be used for the IoT and big data analytics including fog computing technologies. Table 4 summarises the distributed storage and compute technologies and their references.
6.5. Storage Technologies
Storage file systems need to cope with the huge amount of data from the IoT, and work on ‘exascale’ file systems by Raicu et al. (Raicu et al., 2011) looks to address issues of scalability to millions of nodes and billions of concurrent input/output requests. The idea is to combine advances in non-volatile storage with those of distributed file systems. These include the management of distributed metadata, partitioning and knowledge of data access patterns to maximise data locality, resilience and high availability, data indexing and cooperative caching. An implementation exists in the form of FusionFS (Raicu et al., 2012), which implements a zero-hop distributed hash-table (ZHT) for metadata management.
Similar decentralised distributed file systems (DFS) like Ceph (Weil et al., 2006) and GlusterFS also manage metadata in a distributed way while other DFS like HDFS, which is part of the Hadoop ecosystem from Section 6.4, iRODS and Lustre (Lustre, 2015) are centralised with a single or replicated metadata servers. This group of DFS are classified as locally managed DFS and are compared in a survey by Depardon et al. (Depardon et al., 2013). Another group of DFS are remote access DFS like cloud storage from Google Cloud Storage (Google, 2015), S3 (Amazon Web Services, 2015) and Azure Blob (Microsoft, 2015). Another interesting dimension to remote access DFS is the emerging Container Storage Interface (CSI) specification (Hindman, 2017) for provisioning and managing storage, including cloud DFS like Quobyte (Quobyte, 2017), from container applications.
Bent et al. (Bent et al., 2008) have designed a distributed, federated database architecture, Gaian Databases, that uses biologically-inspired, self-organising principles to organise a network of heterogenous relational or flat file databases and enable queries across them through query flooding. The work has become part of IBM’s Smarter Planet (IBM, 2015b) project - an IoT-related vision of the planet that together with Edgware Fabric (IBM, 2015a) form a middleware layer for analytics and intelligence. The advantage of Gaian Databases is that through minimising network diameter and maximising connections to fit nodes, analytical queries on distributed data can be performed quickly and reliably.
Linked Data was seen as an approach to the aggregation step previously and work to access Linked Data across distributed sources has led to the area of federated querying. FedX (Schwarte et al., 2011), SPLENDID (Gorlitz and Staab, 2011), LHD (Wang et al., 2013) and DARQ (Quilitz and Leser, 2008) are all engines that optimise federated query performance. They achieve improved performance by optimising the join order in queries. FedX takes a heuristic approach while the other engines take statistical approaches. Saleem et al. (Saleem et al., 2014) and Hartig (Hartig, 2013) review and evaluate the systems. More specific performance bottlenecks like data distribution (Rakhmawati and Hausenblas, 2012) and other challenges (Rakhmawati et al., 2013) for federated engines have still to be addressed though.
A message broker is an intermediary that routes a message from publishers to subscribers. A message broker can serve as a storage and interoperability technology in distributed systems as it can provide a formal message protocol for publishing and subscribing, reliable storage and guaranteed message delivery. Log-structured storage has been utilised for high throughput distributed message brokers like Kafka (Jay Kreps, 2013) or the scalable data middleware for smart grids described by Yin et al. (Yin et al., 2011). MQTT (Locke, 2010), a lightweight publish-subscribe protocol for the IoT, ZeroMQ (Hintjens, 2013), a messaging protocol library and Edgware Fabric (IBM, 2015a), an IoT service bus, are other examples of technologies used in distributed message broker systems.
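The combination of publish-subscribe routing with log-structured storage can be sketched as follows. This is a toy in-memory model under stated assumptions, not the API of Kafka, MQTT or any other system named above: the class `LogBroker`, its methods and the topic and subscriber names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class LogBroker:
    """Toy broker: one append-only log per topic, one read offset per subscriber."""

    def __init__(self) -> None:
        self._logs: Dict[str, List[str]] = defaultdict(list)
        self._offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, message: str) -> int:
        """Append a message to the topic log and return its offset."""
        log = self._logs[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, topic: str, subscriber: str, max_messages: int = 10) -> List[str]:
        """Return unread messages for this subscriber and advance its offset."""
        key = (topic, subscriber)
        start = self._offsets[key]
        batch = self._logs[topic][start:start + max_messages]
        self._offsets[key] = start + len(batch)
        return batch


broker = LogBroker()
broker.publish("temperature", "21.5")
broker.publish("temperature", "21.7")

first_read = broker.poll("temperature", "dashboard")
second_read = broker.poll("temperature", "dashboard")
late_reader = broker.poll("temperature", "archiver")
```

Because messages are retained in the log rather than deleted on delivery, a subscriber that joins late can still read the full history from offset zero.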
Autonomous or self-driving database management systems like Peloton (Pavlo et al., 2017) integrate artificial intelligence components to automatically classify and forecast workloads so that the database can optimise physical storage, data location and partitioning in distributed or cloud-based environments, as well as runtime resources, configuration and query cost models. Panoply (Panoply, 2017) is a similar machine learning optimised autonomous data warehouse, which additionally allows the “self-preparation” (automated transformation and integration) of ingested semi-structured data.
Massive Parallel Processing (MPP) databases are built on top of shared-nothing MPP grids where data is sharded between nodes and nodes process computations, queries to retrieve and process data, in parallel. Greenplum (Pivotal Inc., 2015), which uses a master-segment approach with each segment a PostgreSQL database, and Teradata (Teradata, 2015) are examples of MPP databases. Volcano (Graefe, 1994) was an early system that presented research on parallelising query operators through the exchange of meta-operators.
6.6. Compute Technologies and IoT Analytics Applications
The elasticity of resources on the cloud is often considered an advantage for deploying horizontally-scalable parallel processing paradigms that work with big IoT data. Compute on the cloud can be divided into virtualisation, serverless computing and container technologies and orchestration. Major vendors like the Google Cloud Platform (Google, 2015), Amazon Web Services (Amazon Web Services, 2015) and Microsoft Azure (Microsoft, 2015) each have options for virtualisation, Compute Engine, EC2 and Azure Virtual Machines respectively, which allow a full server to be provisioned for compute and storage tasks. Each also supports the serverless execution of compute functions through Cloud Functions, Lambda and Azure Functions respectively. Finally, container technologies (Casalicchio, 2017) are becoming increasingly popular as they increase application portability and reduce dependencies, have lower overhead and faster launch times than virtual machines and the orchestration of containers allows the efficient provisioning, deployment and management of distributed compute clusters. Kubernetes, Docker Compose (Tosatto et al., 2015) and Mesosphere (Mesosphere, 2017) are such container orchestration technologies.
Various IoT infrastructures and deployments have implemented distributed cloud-based compute. Xu et al. (Xu et al., 2014b) have developed a cloud-based time-series analytics platform for the IoT that stores and indexes time series data, analyses and mines for patterns and allows searching on patterns and abnormal pattern discovery. Indexes specifically optimised for time-series data help achieve real time analytics at a lower latency (at the cost of increased storage space). Ding et al. (Ding et al., 2013) propose a means of doing statistical analysis on the cloud. Examples are spatial aggregation of the area in a city where the pollution level is above a certain threshold, or parameter aggregation to calculate the average pollution level at a certain time in a city. The novel part of this approach is that analytics is implemented within the database kernel itself, improving performance by reducing the transfer of data (to the master node for processing).
Nastic et al. (Nastic et al., 2013) have designed a high level programming model abstraction for the IoT running on the cloud. In the model, there are abstractions called Intents and Intent Scopes which describe a task and a group of ‘things’ respectively, from underlying distributed and heterogenous sources that share a common context. By coupling Intents and Intent Scopes with analytics operators, complex IoT applications can be designed, optimised on a distributed compute system, and run on the cloud. Guazzelli et al. (Guazzelli et al., 2009) make use of the Predictive Model Markup Language (PMML) (Data Mining Group, 2015), an XML based markup language to describe data mining models, to run analytics on the cloud. Web service calls can be made to instances on the cloud, submitting markup that then executes tasks like running regression models, clustering, learning based on artificial neural networks (ANN), decision trees, support vector machines or mining association rules.
Next, we briefly summarise specific distributed compute technologies, from the small programming constructs and components used to build distributed compute to the large big data systems made from these components, which we divide into in-memory and stream systems, parallel programming models, graph parallel models and edge/fog computing systems.
One way that nodes in a distributed compute system can communicate is through message passing. A Remote Procedure Call (RPC) is a form of message passing; examples are gRPC, a multiplexed, bi-directional streaming RPC protocol, and Thrift (Prunicki, 2009), an asynchronous RPC system. The Actor Model (Haller, 2012) is a message passing programming model supporting asynchronous communication in distributed compute systems that provides an abstraction enabling looser coupling among components, allows for behaviour reasoning, and provides a lightweight concurrency primitive across machines. Futures or Promises are another construct for asynchronous programming and are abstractions of values that will eventually become available. They are useful for message passing in distributed compute to reason about state changes when latency is a concern.
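A small sketch using Python's standard `concurrent.futures` module shows how a future stands in for a value that is not yet available. The function `read_sensor` is a hypothetical stand-in for a remote call and the sleep merely imitates network latency; a real deployment would issue an RPC here instead.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def read_sensor(node_id):
    """Stand-in for a remote call to a node; the delay mimics network latency."""
    time.sleep(0.01)
    return node_id, 20.0 + node_id


with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns immediately with a Future, a placeholder for the reply.
    futures = [pool.submit(read_sensor, n) for n in range(4)]
    # result() blocks only at the point where the value is actually needed.
    replies = dict(f.result() for f in futures)
```

Because all four requests are in flight before any result is awaited, the total wait is close to the latency of one call rather than the sum of four.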
Distributed in-memory databases like H-store (Kallman et al., 2008) and MemSQL (MemSQL Inc., 2015) allow low-latency stored procedures and interactive querying respectively on scale-out transactional databases, overcoming memory limitations by adding nodes. They are fast enough to be used for real-time compute tasks rather than just storage. Spark (Zaharia et al., 2012) is an in-memory data processing engine with two main abstractions: Resilient Distributed Datasets (RDDs), which are immutable, read-only collections of objects (as opposed to fine-grained Distributed Shared Memory (DSM)), and parallel operations represented as an acyclic data flow graph. SparkSQL includes an execution model that uses the Catalyst query optimiser (Armbrust et al., 2015) for both rule-based and cost-based optimisation to form a Spark data flow graph. D-streams is the Spark streaming abstraction, in which a streaming computation is treated as a series of deterministic batch computations on RDDs within small time intervals. This type of stream processing is called micro-batch processing, whereas Complex Event Processing (CEP) (Luckham, 2002) involves continuous operators on each tuple. Khare et al. (Khare et al., 2015) show how continuous operators can work on publish-subscribe IoT sensor streams with a Functional Reactive Programming (FRP) language. The system was tested on sensor data from a football match to aggregate running data for each player and create descriptive analytics heat maps for players.
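The difference between micro-batch processing and continuous operators can be sketched in plain Python. This is a conceptual illustration only and does not use the Spark or any CEP API; the stream contents, the one-second interval and the function names are assumptions.

```python
from itertools import groupby

# Hypothetical stream of (timestamp_seconds, value) tuples.
stream = [(0.1, 2), (0.4, 3), (1.2, 5), (1.9, 1), (2.5, 4)]


def micro_batch_sums(events, interval):
    """Micro-batch: group events into fixed intervals, run a batch sum on each."""
    keyed = sorted(events, key=lambda e: e[0])
    batches = groupby(keyed, key=lambda e: int(e[0] // interval))
    return [(window, sum(v for _, v in batch)) for window, batch in batches]


def continuous_sums(events):
    """Continuous operator: update and emit a running sum on every tuple."""
    total = 0
    for _, value in events:
        total += value
        yield total


batch_output = micro_batch_sums(stream, interval=1.0)  # one result per interval
tuple_output = list(continuous_sums(stream))           # one result per tuple
```

The micro-batch version emits a result only when an interval closes, so its latency is bounded below by the interval length, while the continuous version reacts to each tuple as it arrives.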
Data parallelism means that each node in a distributed compute system can perform independent calculations on a meaningful subset of data. MapReduce (Dean and Ghemawat, 2008), of which Hadoop MapReduce (Section 6.4) is an implementation, and Dryad (Isard et al., 2007) are both programming models for data parallel processing on big data. Hammond and Varde (Hammond and Varde, 2013) deploy analytics in the cloud using Hadoop MapReduce. The analytics techniques include text classification using Naive Bayes, a top-K recommendation engine based on similarity and a Random Forests classifier to categorise data as part of Decision Support Systems. MapReduce, however, does not scale easily for iterative graph algorithms as each iteration requires reading and writing results to disk. Graph parallel abstractions like those in GraphX (Xin et al., 2013), for graph transformations, and GraphLab (Low et al., 2012), for asynchronous computation, support these algorithms.
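The MapReduce programming model can be sketched on a single machine as three phases. This is an in-process illustration of the model, not Hadoop code; the input splits and function names are assumptions.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input splits, one per worker node.
splits = [
    ["temp high", "temp low"],
    ["humidity high"],
]


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in a split."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Each split is mapped independently, which is what makes the model data parallel.
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle(mapped))
```

An iterative algorithm would have to repeat this whole pipeline once per iteration, which in a disk-backed implementation means writing and re-reading the intermediate results each time.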
Finally, Fog or Edge Computing technologies like ANGELS (Mukherjee et al., 2014) and the scheduler designed by Dey et al. (Dey et al., 2013) propose utilising the idle computing resources of edge devices like smartphones through a scheduler in the cloud. The edge devices themselves keep track of their resource usage states, which are formed based on user behavioural patterns, and advertise free slot availability. The cloud servers receive analytics jobs and advertisements from edge devices and then schedule subtasks to these devices. Distributed stream processing within a fog computing network has also been implemented in the Eywa framework (Siow et al., 2017) using inverse-publish-subscribe for the control plane and workload pushdown to fog nodes for projections in the data plane. Cloudlets (Satyanarayanan et al., 2009) allow a mobile user to instantiate virtualised compute tasks on physically proximate cloudlet hardware. Cisco IOx (Cisco, 2015) is another platform that consists of a fog director, application host and management components, allowing fog computing tasks to be virtualised and executed on fog nodes.
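The advertise-then-schedule interaction between edge devices and the cloud can be sketched as follows. The greedy placement policy, the `EdgeDevice` class and the device names are assumptions chosen for brevity; they are not the scheduling algorithm of ANGELS or of Dey et al.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeDevice:
    name: str
    free_slots: int                       # advertised by the device itself
    assigned: list = field(default_factory=list)


def schedule(subtasks, devices):
    """Greedy placement: give each subtask to the device with most free slots."""
    unplaced = []
    for task in subtasks:
        target = max(devices, key=lambda d: d.free_slots)
        if target.free_slots == 0:
            unplaced.append(task)         # no capacity left; task stays in the cloud
            continue
        target.assigned.append(task)
        target.free_slots -= 1
    return unplaced


# Two hypothetical smartphones advertise their idle capacity to the cloud.
devices = [EdgeDevice("phone_a", 2), EdgeDevice("phone_b", 1)]
leftover = schedule(["t1", "t2", "t3", "t4"], devices)
```

A production scheduler would also need to handle devices that withdraw their slots when the user becomes active again, which this sketch leaves out.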
6.7. Levels of Distribution of Storage and Compute
The Internet of Things, as defined in Section 3, comprises smart and interconnected physical objects with varying storage and compute capabilities. Analytics processing can be done at different distribution levels depending on how far data from physical objects can and should travel and on the storage and compute capabilities at each of:
the device level, where devices act not just as data producers but as participants of the storage and compute process,
the network level, involving remote connections to fog computing nodes, hubs, base stations, gateways, routers and servers and
the cluster level, within a group of interconnected servers.
Enabling infrastructure and technologies are observed to address each of these levels of distribution of compute and storage to a different degree. A classification of the surveyed IoT enabling technologies is proposed in Fig. 9 along the axes of storage and compute distribution.
At the cluster level, we see storage systems that are distributed within locally managed clusters. Usually these clusters are located within data centres and connected by top-of-the-rack switches in a hierarchical fashion (intra- and inter-rack). Locally managed distributed file systems are an example. In-memory systems distribute both processing and storage in a centrally managed cluster, usually by partitioning the data across nodes and running on each node the processing that corresponds to the data held there. Similarly, Massively Parallel Processing Databases, Data Warehouses and Parallel/Distributed Databases are examples of systems with distributed storage and compute on each node. Examples of compute within clusters of distributed servers include Parallel Processing frameworks like MapReduce (Dean and Ghemawat, 2008).
Cloud Computing is usually divided into private, public or hybrid clouds. Private clouds share similarities and types of distributed storage and compute with those previously mentioned at the cluster level. Public clouds are remotely managed and hence belong to the network level of distribution, where Cloud Storage and Cloud Compute Engines serve storage and compute tasks respectively. Hybrid clouds bridge both public and private clouds. Similar to cloud storage are remote access distributed file systems. Message brokers, message queues and log-based systems are some other examples of network level, possibly remote access, storage systems.
At the device level of storage, we have technologies like federated Linked Data endpoints and Gaian databases, where data can reside on their respective devices but be accessed by other clients. At the device level of compute, the scheduling of compute tasks on fog or edge devices is an example. Finally, both compute and storage distribution from the network to the device level is present in Edge and Fog computing, and middleware is usually used to connect such edge systems together.
Table 4 summarises the surveyed literature on distributed storage and compute technologies to provide a point of reference for researchers on current state-of-the-art implementations.
This review of enabling infrastructure and technologies at each part of the data flow process, the classification of storage and compute distribution, and the examples of distributed storage and compute technologies form a basis for a direction of future work towards tackling the challenges of big data analytics on the IoT.
7. Research Challenges
As we have seen in our study of enabling infrastructure and present analytical applications in the IoT, there are still some challenges that we face in aligning the vision of the IoT with that of analytics. In particular, we argue that infrastructure for analytics in the IoT faces a tradeoff between:
Distribution & Interoperability, complicated by big data variety,
Performance, complicated by the volume and velocity of the big data problem,
and Analytical Value, which deals with how high the output of analytics applications is on the knowledge hierarchy from Fig. 4.
Fig. 10 depicts the tradeoffs that IoT infrastructure for analytics applications faces in terms of these three challenges. For example, the semantic technology community argues for its utility in the IoT (Barnaghi et al., 2012) to encourage semantic interoperability, while semantic ontologies provide analytical value and federation supports diverse, heterogeneous distributed sources. The performance of such systems, though, is still questionable (Saleem et al., 2014; Rakhmawati et al., 2013). Edge and fog computing is also an emerging area of distributed technologies that promises advantages in latency for real-time processing of streams and in efficiency due to its proximity to sources (Chiang et al., 2017a). However, cloud-based clusters and fast distributed OLTP in-memory processing still offer greater analytical value from combining big data, albeit without the advantages of IoT distribution and interoperability.
Variety has been a less researched aspect of the big data problem but is apparent in the IoT paradigm. Heterogeneous data sources in the IoT, combined with the need for analytics to also involve a wide range of multi-modal data sources like social media, Linked Data, image and video data, satellite and geospatial data, voice data, etc., make the variety problem highly analogous to the richness of insights and knowledge that can be derived in analytics applications. Predictive analytics can be made more accurate through corroboration of independent data sources, and prescription can be optimised with more and varying knowledge inputs. Solving the variety problem can be seen as an opportunity to enhance the value of current IoT applications.
Performance and scalability questions still exist in current systems because of the scale of the IoT. This is not only about scaling the storage of data or the communications layer but also about scaling the infrastructure that does analytics processing. We see distributed analytics as a plausible means of handling IoT-scale data (which is predicted to be larger and richer than web-scale data) and there is potential for more work in this area.
The Internet of Things (IoT) has huge potential to provide advanced services and applications across many domains, and the momentum that it has generated, together with its broad visions, makes it an ideal frontier for pushing technological innovation. We have shown that analytics plays a role in many applications designed for the IoT, across many domains, and will be even more important in the future as the enabling infrastructure develops and scales and the deployment of devices becomes truly ubiquitous. We have applied a systematic review of analytics applications in the IoT to the task of understanding analytics as it develops. This results in a layered taxonomy that defines and categorises analytics by their capabilities and application potential for research and application roadmaps. We then review the enabling infrastructure and discuss the technologies from different stages in the data flow for analytics. Finally, we look at some tradeoffs for analytics in the IoT that can shape research direction going forward.
- Aazam and Huh (2014) Mohammad Aazam and Eui Nam Huh. 2014. Fog Computing and Smart Gateway Based Communication for Cloud of Things. In Proceedings of the International Conference on Future Internet of Things and Cloud. https://doi.org/10.1109/FiCloud.2014.83
- Abad et al. (2009) Estefania Abad, Francisco Palacio, M Nuin, Alberto G Zárate, A Juarros, José María Gómez, and Santiago Marco. 2009. RFID Smart Tag For Traceability And Cold Chain Monitoring Of Foods: Demonstration In An Intercontinental Fresh Fish Logistic Chain. Journal of Food Engineering 93, 4 (2009), 394–399. https://doi.org/10.1016/j.jfoodeng.2009.02.004
- Aggarwal and Zhai (2012) Charu C Aggarwal and ChengXiang Zhai. 2012. Mining text data. Springer. https://dl.acm.org/citation.cfm?id=2669206
- Ahmed (2014) Hussnain Ahmed. 2014. Applying Big Data Analytics for Energy Efficiency. Masters Thesis. Aalto University. https://aaltodoc.aalto.fi/handle/123456789/13899
- Aigner et al. (2008) Wolfgang Aigner, Silvia Miksch, Wolfgang Müller, Heidrun Schumann, and Christian Tominski. 2008. Visual Methods For Analyzing Time-oriented Data. IEEE Transactions on Visualization and Computer Graphics 14, 1 (2008), 47–60. https://doi.org/10.1109/TVCG.2007.70415
- Akoka et al. (2017) Jacky Akoka, Isabelle Comyn-Wattiau, and Nabil Laoufi. 2017. Research on Big Data - A Systematic Mapping Study. Computer Standards & Interfaces 54, 2 (2017), 105–115. https://doi.org/10.1016/j.csi.2017.01.004
- Al-Fuqaha et al. (2015) Ala Al-Fuqaha, Mohsen Guizani, Mehdi Mohammadi, Mohammed Aledhari, and Moussa Ayyash. 2015. Internet of Things: A Survey on Enabling Technologies, Protocols and Applications. IEEE Communications Surveys and Tutorials 17, 4 (2015), 2347–2376. https://doi.org/10.1109/COMST.2015.2444095
- Alliance (2017) Hypercat Alliance. 2017. Hypercat. (2017). http://www.hypercat.io/
- Alonso et al. (2013) Ignacio González Alonso, María Rodríguez Fernández, Juan Jacobo Peralta, and Adolfo Cortés García. 2013. A Holistic Approach to Energy Efficiency Systems through Consumption Management and Big Data Analytics. International Journal on Advances in Software 6, 3 (2013), 261–271. http://digibuo.uniovi.es/dspace/bitstream/10651/35765/1/soft
- Amazon Web Services (2015) Amazon Web Services. 2015. AWS. (2015). http://aws.amazon.com/products/
- Amendola et al. (2014) Sara Amendola, Rossella Lodato, Sabina Manzari, Cecilia Occhiuzzi, and Gaetano Marrocco. 2014. RFID Technology for IoT-based Personal Healthcare in SmartSpaces. IEEE Internet of Things Journal PP, 2 (2014), 1–1.
- Ampy (2017) Ampy. 2017. Ampy Live Charged. (2017). http://www.getampy.com/
- Apple (2017) Apple. 2017. iBeacon. (2017). https://developer.apple.com/ibeacon/
- Armbrust et al. (2015) Michael Armbrust, Ali Ghodsi, Matei Zaharia, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, and Michael J. Franklin. 2015. Spark SQL: Relational Data Processing in Spark. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. https://doi.org/10.1145/2723372.2742797
- Aste et al. (2017) Niccolo Aste, Massimiliano Manfren, and Giorgia Marenzi. 2017. Building Automation and Control Systems and performance optimization: A framework for analysis. Renewable and Sustainable Energy Reviews 75, 2017 (2017), 313–330. https://doi.org/10.1016/j.rser.2016.10.072
- Atzori et al. (2010) Luigi Atzori, Antonio Iera, and Giacomo Morabito. 2010. The Internet of Things: A Survey. Computer Networks 54, 15 (oct 2010), 2787–2805. https://doi.org/10.1016/j.comnet.2010.05.010
- Bagula et al. (2015) Antoine Bagula, Lorenzo Castelli, and Marco Zennaro. 2015. On the Design of Smart Parking Networks in the Smart Cities: An Optimal Sensor Placement Model. Sensors 15, 7 (2015), 15443–67. https://doi.org/10.3390/s150715443
- Baldominos et al. (2014) Alejandro Baldominos, Esperanza Albacete, Yago Saez, and Pedro Isasi. 2014. A Scalable Machine Learning Online Service for Big Data Real-Time Analysis. In Computational Intelligence in Big Data. 1–8.
- Bandyopadhyay et al. (2011) Soma Bandyopadhyay, Munmun Sengupta, Souvik Maiti, and Subhajit Dutta. 2011. Role Of Middleware For Internet Of Things: A Study. International Journal of Computer Science & Engineering Survey 2, 3 (2011), 94–105. https://doi.org/10.5121/ijcses.2011.2307
- Banos et al. (2016) Oresti Banos, Muhammad Bilal Amin, Wajahat Ali Khan, Muhammad Afzal, Maqbool Hussain, Byeong Ho Kang, and Sungyong Lee. 2016. The Mining Minds digital health and wellness framework. BioMedical Engineering OnLine 15, 1 (jul 2016), 76. https://doi.org/10.1186/s12938-016-0179-9
- Barnaghi et al. (2012) Payam Barnaghi, Wei Wang, Cory Henson, and Kerry Taylor. 2012. Semantics for the Internet of Things: Early Progress and Back to the Future. International Journal on Semantic Web and Information Systems 8, 1 (2012), 1–21. https://doi.org/10.4018/jswis.2012010101
- Basu (2013) Atanu Basu. 2013. Five Pillars of Prescriptive Analytics Success. Analytics Magazine (2013), 8–12. http://analytics-magazine.org/executive-edge-five-pillars-of-prescriptive-analytics-success/
- Bent et al. (2008) Graham Bent, Patrick Dantressangle, David Vyvyan, Abbe Mowshowitz, and Valia Mitsou. 2008. A Dynamic Distributed Federated Database. In Proceedings of the 2nd Annual Conference of the International Technology Alliance.
- Bernstein (2011) Jay H Bernstein. 2011. The Data-Information-Knowledge-Wisdom Hierarchy and its Antithesis. NASKO 2.1 (2011), 68–75.
- Berrios et al. (2017) Victor Berrios, Richard Halter, Mark Harrison, Scott Hollenbeck, Elisa Kendall, Doug Migliori, and John Petze. 2017. Cross-industry Semantic Interoperability. (jul 2017). http://www.embedded-computing.com/semantic-interop/cross-industry-semantic-interoperability-part-two-application-layer-standards-and-open-source-initiatives
- Bertolucci (2013) Jeff Bertolucci. 2013. Big Data Analytics: Descriptive Vs. Predictive Vs. Prescriptive. (dec 2013). http://goo.gl/dyNDFV
- Bizer et al. (2009) Chris Bizer, Tom Heath, and Tim Berners-Lee. 2009. Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems 5 (2009), 1–22. https://eprints.soton.ac.uk/271285/
- Bloom (1970) Burton H Bloom. 1970. Space/time Trade-offs In Hash Coding With Allowable Errors. Commun. ACM 13, 7 (1970), 422–426. https://doi.org/10.1145/362686.362692
- Box et al. (2015) George Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. 2015. Time series analysis: forecasting and control. John Wiley & Sons. http://eu.wiley.com/WileyCDA/WileyTitle/productCd-1118675029.html
- Caragliu et al. (2011) Andrea Caragliu, Chiara Del Bo, and Peter Nijkamp. 2011. Smart Cities in Europe. Journal of Urban Technology 18, January 2015 (2011), 65–82. https://doi.org/10.1080/10630732.2011.601117
- Casalicchio (2017) Emiliano Casalicchio. 2017. Autonomic Orchestration of Containers: Problem Definition and Research Challenges. In Proceedings of the 10th EAI International Conference on Performance Evaluation Methodologies and Tools. https://doi.org/10.4108/eai.25-10-2016.2266649
- Chan et al. (2008) Marie Chan, Daniel Estève, Christophe Escriba, and Eric Campo. 2008. A review of smart homes-Present state and future challenges. Computer Methods and Programs in Biomedicine 91 (2008), 55–81. https://doi.org/10.1016/j.cmpb.2008.02.001
- Chandler et al. (2011) Neil Chandler, Bill Hostmann, Nigel Rayner, and Gareth Herschel. 2011. Gartner’s Business Analytics Framework. Technical Report. Gartner Inc. http://www.gartner.com/imagesrv/summits/docs/na/business-intelligence/gartners
- Chandola et al. (2009) Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly Detection. Comput. Surveys 41, 3 (jul 2009), 1–58. https://doi.org/10.1145/1541880.1541882
- Chatfield (2013) Chris Chatfield. 2013. The Analysis Of Time Series: An Introduction. CRC Press.
- Chen et al. (2012) Hsinchun Chen, Roger H L Chiang, and Veda Storey. 2012. Business Intelligence and Analytics: From Big Data To Big Impact. MIS Quarterly 36, 4 (2012), 1165–1188.
- Chen et al. (2016) Min Chen, Yujun Ma, Jeungeun Song, Chin Feng Lai, and Bin Hu. 2016. Smart Clothing: Connecting Human with Clouds and Big Data for Sustainable Health Monitoring. Mobile Networks and Applications 21, 5 (2016), 825–845. https://doi.org/10.1007/s11036-016-0745-1 arXiv:1312.4722
- Chen et al. (2014a) Min Chen, Shiwen Mao, and Yunhao Liu. 2014a. Big Data: A Survey. Mobile Networks and Applications 19 (2014), 171–209. https://doi.org/10.1007/s11036-013-0489-0
- Chen et al. (2014b) Min Chen, Shiwen Mao, Yin Zhang, and Victor Leung. 2014b. Big Data - Related Technologies , Challenges and Future Prospects. Springer. http://www.springer.com/gp/book/9783319062440
- Cheshire (2017) Stuart Cheshire. 2017. DNS Service Discovery. (2017). http://www.dns-sd.org/
- Chianese et al. (2017) Angelo Chianese, Fiammetta Marulli, Francesco Piccialli, Paolo Benedusi, and Jai E. Jung. 2017. An Associative Engines Based Approach Supporting Collaborative Analytics In The Internet Of Cultural Things. Future Generation Computer Systems 66 (2017), 187–198. https://doi.org/10.1016/j.future.2016.04.015
- Chiang et al. (2017b) Leo Chiang, Bo Lu, and Ivan Castillo. 2017b. Big Data Analytics in Chemical Engineering. Annual Review of Chemical and Biomolecular Engineering 8, 1 (2017), 63–85. https://doi.org/10.1146/annurev-chembioeng-060816-101555
- Chiang et al. (2017a) Mung Chiang, Sangtae Ha, Chih-Lin I, Fulvio Risso, and Tao Zhang. 2017a. Clarifying Fog Computing and Networking: 10 Questions and Answers. IEEE Communications Magazine 55, 4 (apr 2017), 18–20. https://doi.org/10.1109/MCOM.2017.7901470
- Cisco (2015) Cisco. 2015. IOX. (2015). https://developer.cisco.com/site/iox/
- Corcoran (2012) Michael Corcoran. 2012. The Five Types Of Analytics. Technical Report. Information Builders. 68–69 pages. http://www.informationbuilders.co.uk/sites/www.informationbuilders.com/files/intl/co.uk/presentations/four
- Cordell and Newton (2016) Pete Cordell and Andrew Newton. 2016. A Language for Rules Describing JSON Content. (2016). https://www.ietf.org/id/draft-newton-json-content-rules-08.txt
- Danner et al. (2016) Jay Danner, Linda Wills, Elbert M. Ruiz, and Lee W. Lerner. 2016. Rapid Precedent-Aware Pedestrian and Car Classification on Constrained IoT Platforms. Proceedings of the 14th ACM/IEEE Symposium on Embedded Systems for Real-Time Multimedia (2016), 29–36. https://doi.org/10.1145/2993452.2993562
- Das et al. (2010) Sudipto Das, Yannis Sismanis, Kevin S Beyer, Rainer Gemulla, Peter J Haas, and John McPherson. 2010. Ricardo: Integrating R and Hadoop. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. https://doi.org/10.1145/1807167.1807275
- Data Mining Group (2015) Data Mining Group. 2015. PMML 4.2 - General Structure. (2015). http://goo.gl/t2Xvy0
- Davenport (2006) Thomas Davenport. 2006. Competing on Analytics. Harvard Business Review 84, 1 (2006), 98–107. https://hbr.org/2006/01/competing-on-analytics
- Davenport (2013) Thomas Davenport. 2013. Analytics 3.0. (dec 2013). https://hbr.org/2013/12/analytics-30
- Dean and Ghemawat (2008) Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce : Simplified Data Processing on Large Clusters. Commun. ACM 51, 1 (2008), 1–13. arXiv:10.1.1.163.5292
- Depardon et al. (2013) Benjamin Depardon, Gaël Le Mahec, and Cyril Séguin. 2013. Analysis of Six Distributed File Systems. Technical Report. HAL. https://hal.inria.fr/hal-00789086
- Dey et al. (2013) Swarnava Dey, Arijit Mukherjee, Himadri Sekhar Paul, and Arpan Pal. 2013. Challenges Of Using Edge Devices In IoT Computation Grids. In Proceedings of the International Conference on Parallel and Distributed Systems. https://doi.org/10.1109/ICPADS.2013.101
- Ding et al. (2013) Zhiming Ding, Xu Gao, Jiajie Xu, and Hong Wu. 2013. IOT-StatisticDB: A General Statistical Database Cluster Mechanism For Big Data Analysis In The Internet Of Things. In Proceedings of the 2013 IEEE International Conference on Green Computing and Communications. https://doi.org/10.1109/GreenCom-iThings-CPSCom.2013.104
- Dohr et al. (2010) Angelika Dohr, R Modre-Opsrian, Mario Drobics, Dieter Hayn, and Günter Schreier. 2010. The Internet of Things for Ambient Assisted Living. In Proceedings of the 7th International Conference on Information Technology. 804–809. https://doi.org/10.1109/ITNG.2010.104
- European Commission (2015) European Commission. 2015. Digital Agenda for Europe: The Internet of Things. (2015). http://goo.gl/oNhYOP
- Farahzadia et al. (2017) Amirhossein Farahzadia, Pooyan Shams, Javad Rezazadeh, and Reza Farahbakhsh. 2017. Middleware Technologies for Cloud of Things - A Survey. Digital Communications and Networks 3, 4 (2017), 1–13. https://doi.org/10.1016/j.dcan.2017.04.005 arXiv:1705.00387
- Fayyad et al. (1996) Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. 1996. From Data Mining to Knowledge Discovery in Databases. AI Magazine 17, 3 (1996), 37–53. https://doi.org/10.1609/aimag.v17i3.1230
- Fernandez et al. (2014) Raul Castro Fernandez, Matthias Weidlich, Peter Pietzuch, and Avigdor Gal. 2014. Grand Challenge : Scalable Stateful Stream Processing for Smart Grids. (2014), 0–5. https://doi.org/10.1145/2611286.2611326
- Forni and Meulen (2016) Amy Ann Forni and Rob Meulen. 2016. Gartner’s 2016 Hype Cycle for Emerging Technologies. (2016).
- Galiegue et al. (2013) Francis Galiegue, Kris Zyp, and Others. 2013. JSON Schema: Core definitions and terminology. Internet Engineering Task Force (IETF) (2013). http://json-schema.org/latest/json-schema-core.html
- Ghemawat et al. (2003) Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google File System. ACM SIGOPS Operating Systems Review 37, 5 (2003), 29–43.
- Ghosh et al. (2013) Animikh Ghosh, Ketan A. Patil, and Sunil Kumar Vuppala. 2013. PLEMS: Plug Load Energy Management Solution for Enterprises. In Proceedings of the 27th IEEE International Conference on Advanced Information Networking and Applications. https://doi.org/10.1109/AINA.2013.45
- Giddings et al. (2002) Bob Giddings, Bill Hopwood, and Geoff O’Brien. 2002. Environment, Economy and Society: Fitting Them Together Into Sustainable Development. Sustainable Development 10 (2002), 187–196. https://doi.org/10.1002/sd.199
- Gimenez et al. (2012) Roberto Gimenez, Diego Fuentes, Emilio Martin, Diego Gimenez, Judith Pertejo, Sofia Tsekeridou, Roberto Gavazzi, Mario Carabaño, and Sofia Virgos. 2012. The Safety Transformation in the Future Internet Domain. The Future Internet (2012), 190–200. https://doi.org/10.1007/978-3-642-30241-1_17
- Goebel and Gruenwald (1999) Michael Goebel and Le Gruenwald. 1999. A Survey Of Data Mining And Knowledge Discovery Software Tools. ACM SIGKDD Explorations 1, 1 (1999), 20–33. https://doi.org/10.1145/846170.846172
- Google (2015) Google. 2015. Google Cloud Platform. (2015). https://cloud.google.com/
- Google (2017) Google. 2017. Eddystone Beacons. (2017). https://developers.google.com/beacons/
- Gorlitz and Staab (2011) Olaf Gorlitz and Steffen Staab. 2011. SPLENDID : SPARQL Endpoint Federation Exploiting VOID Descriptions. In Proceedings of the 2nd International Workshop on Consuming Linked Data. http://dl.acm.org/citation.cfm?id=2887354
- Graefe (1994) Goetz Graefe. 1994. Volcano - An Extensible And Parallel Query Evaluation System. IEEE Transactions on Knowledge and Data Engineering 6 (1994), 120–135. https://doi.org/10.1109/69.273032
- Granjal et al. (2015) Jorge Granjal, Edmundo Monteiro, and Jorge Sa Silva. 2015. Security for the Internet of Things: A Survey Of Existing Protocols and Open Research Issues. IEEE Communications Surveys and Tutorials 17, 3 (2015), 1294–1312. https://doi.org/10.1109/COMST.2015.2388550
- Grim et al. (2012) Evan Grim, Chien-liang Fok, and Christine Julien. 2012. Grapevine : Efficient Situational Awareness in Pervasive Computing Environments. In Proceedings of the 2012 IEEE International Conference on Pervasive Computing and Communications Workshops. http://ieeexplore.ieee.org/document/6197539/
- Guazzelli et al. (2009) Alex Guazzelli, Kostantinos Stathatos, and Michael Zeller. 2009. Efficient deployment of predictive analytics through open standards and cloud computing. ACM SIGKDD Explorations Newsletter 11, 1 (2009), 32. https://doi.org/10.1145/1656274.1656281
- Guo et al. (2011) Bin Guo, Daqing Zhang, and Zhu Wang. 2011. Living With Internet of Things: The Emergence of Embedded Intelligence. In Proceedings of the 2011 IEEE International Conferences on Internet of Things and Cyber, Physical and Social Computing. https://doi.org/10.1109/iThings/CPSCom.2011.11
- Hair Jr (2007) Joe F Hair Jr. 2007. Knowledge Creation in Marketing: The Role of Predictive Analytics. European Business Review 19 (2007), 303–315. https://doi.org/10.1108/09555340710760134
- Haller (2012) Philipp Haller. 2012. On The Integration Of The Actor Model In Mainstream Technologies. In Proceedings of the 2nd Edition On Programming Systems, Languages And Applications Based On Actors, Agents, And Decentralized Control Abstractions. ACM Press, New York, New York, USA. https://doi.org/10.1145/2414639.2414641
- Hammond and Varde (2013) Klavdiya Hammond and Aparna S Varde. 2013. Cloud Based Predictive Analytics Text Classification, Recommender Systems and Decision Support. In Proceedings of the 13th IEEE International Conference on Data Mining Workshops. https://doi.org/10.1109/ICDMW.2013.95
- Han et al. (2012) Manhyung Han, La The Vinh, Young-Koo Lee, and Sungyoung Lee. 2012. Comprehensive Context Recognizer Based On Multimodal Sensors In A Smartphone. Sensors 12, 9 (2012), 12588–12605. https://doi.org/10.3390/s120912588
- Hart et al. (1968) Peter E Hart, Nils J Nilsson, and Bertram Raphael. 1968. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics 4, 2 (1968), 100–107. http://ieeexplore.ieee.org/document/4082128/
- Hartig (2013) Olaf Hartig. 2013. An Overview on Execution Strategies for Linked Data Queries. Datenbank-Spektrum 13, 2 (2013), 89–99. https://doi.org/10.1007/s13222-013-0122-1
- He et al. (2014) Wu He, Gongjun Yan, and Li Da Xu. 2014. Developing Vehicular Data Cloud Services in the IoT Environment. IEEE Transactions on Industrial Informatics 10, 2 (2014), 1587–1595. https://doi.org/10.1109/TII.2014.2299233
- Hindman (2017) Benjamin Hindman. 2017. CSI: Towards A More Universal Storage Interface For Containers. (2017). https://mesosphere.com/blog/csi-towards-universal-storage-interface-for-containers/
- Hintjens (2013) Pieter Hintjens. 2013. ZeroMQ: Messaging for Many Applications. O’Reilly.
- Holler et al. (2014) Jan Holler, Vlasios Tsiatsis, Catherine Mulligan, Stefan Avesand, Stamatis Karnouskos, and David Boyle. 2014. From Machine-to-Machine to the Internet of Things: Introduction to a New Age. Academic Press. https://doi.org/10.1016/B978-0-12-407684-6.00014-0
- Hossain and Muhammad (2015) M Shamim Hossain and Ghulam Muhammad. 2015. Cloud-assisted Industrial Internet of Things (IIoT) - Enabled Framework for Health Monitoring. Computer Networks 101 (2015), 192–202. https://doi.org/10.1016/j.comnet.2016.01.009
- Hunink et al. (2014) Myriam Hunink, Milton Weinstein, Eve Wittenberg, Michael Drummond, Joseph Pliskin, John Wong, and Paul Glasziou. 2014. Decision Making in Health and Medicine: Integrating Evidence and Values. Cambridge University Press. http://jrsm.rsmjournals.com/cgi/doi/10.1258/jrsm.95.2.108-a
- IBM (2015a) IBM. 2015a. Edgware Fabric: A Service Bus For The Physical World. (2015). https://goo.gl/CH4U6W
- IBM (2015b) IBM. 2015b. Smarter Planet. (2015). https://goo.gl/vW0iLd
- IEEE (2013) IEEE. 2013. 1888.3-2013 - IEEE Standard for Ubiquitous Green Community Control Network: Security. (2013). http://ieeexplore.ieee.org/servlet/opac?punumber=6675753
- International Telecommunication Union (2012) International Telecommunication Union. 2012. Overview of the Internet of Things. Technical Report. International Telecommunication Union. http://www.itu.int/ITU-T/recommendations/rec.aspx?rec=11559
- Internet Engineering Task Force (1999) Internet Engineering Task Force. 1999. Simple Service Discovery Protocol/1.0. (oct 1999). https://tools.ietf.org/html/draft-cai-ssdp-v1-03
- Internet Engineering Task Force (2005) Internet Engineering Task Force. 2005. RFC 4301: Security Architecture for the Internet Protocol. (2005). https://tools.ietf.org/html/rfc4301
- Internet Engineering Task Force (2013) Internet Engineering Task Force. 2013. RFC 6762: Multicast DNS. (feb 2013). https://tools.ietf.org/html/rfc6762
- Internet Engineering Task Force (2014) Internet Engineering Task Force. 2014. The Constrained Application Protocol (CoAP). (jun 2014). https://tools.ietf.org/html/rfc7252
- Internet Engineering Task Force (2017) Internet Engineering Task Force. 2017. CoRE Resource Directory. (jul 2017). https://tools.ietf.org/html/draft-ietf-core-resource-directory-11
- Isard et al. (2007) Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. 2007. Dryad: distributed data-parallel programs from sequential building blocks. ACM SIGOPS Operating Systems Review (2007), 59–72. https://doi.org/10.1145/1272996.1273005
- Jara et al. (2013) Antonio Jara, Pablo Lopez, David Fernandez, Jose Castillo, Miguel Zamora, and Antonio Skarmeta. 2013. Mobile digcovery: A Global Service Discovery for the Internet of Things. In Proceedings of the 27th International Conference on Advanced Information Networking and Applications Workshops. https://doi.org/10.1109/WAINA.2013.261
- Jara et al. (2015) Antonio J Jara, Dominique Genoud, and Yann Bocchi. 2015. Big Data For Smart Cities With KNIME A Real Experience In The SmartSantander Testbed. Software: Practice and Experience 45, 8 (aug 2015), 1145–1160. https://doi.org/10.1002/spe.2274 arXiv:1008.1900
- Jay Kreps (2013) Jay Kreps. 2013. The Log: What Every Software Engineer Should Know About Real-time Data’s Unifying Abstraction. (2013). https://goo.gl/b07C4f
- Jirka and Nüst (2010) Simon Jirka and Daniel Nüst. 2010. OGC Sensor Instance Registry Discussion Paper. Technical Report. Open Geospatial Consortium. https://wiki.52north.org/SensorWeb/SensorInstanceRegistry
- Kallman et al. (2008) Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alexander Rasin, Stanley Zdonik, Evan P C Jones, Samuel Madden, Michael Stonebraker, Yang Zhang, John Hugg, and Daniel J Abadi. 2008. H-store: A High-performance, Distributed Main Memory Transaction Processing System. Proceedings of the VLDB Endowment 1, 2 (2008), 1496–1499. https://doi.org/10.1145/1454159.1454211
- Kambatla et al. (2014) Karthik Kambatla, Giorgos Kollias, Vipin Kumar, and Ananth Grama. 2014. Trends in Big Data Analytics. J. Parallel and Distrib. Comput. 74, 7 (2014), 2561–2573. https://doi.org/10.1016/j.jpdc.2014.01.003
- Kamilaris et al. (2017) Andreas Kamilaris, Feng Gao, Francesc X. Prenafeta-Boldu, and Muhammad Intizar Ali. 2017. Agri-IoT: A semantic framework for Internet of Things-enabled smart farming applications. In Proceedings of the 2016 IEEE 3rd World Forum on Internet of Things. 442–447. https://doi.org/10.1109/WF-IoT.2016.7845467
- Kart (2012) Lisa Kart. 2012. Advancing Analytics. Technical Report. Gartner Inc. http://meetings2.informs.org/analytics2013/AdvancingAnalytics
- Keim et al. (2008) Daniel Keim, Gennady Andrienko, Jean-daniel Fekete, and Guy Melançon. 2008. Visual Analytics: Definition, Process, and Challenges. In Information Visualization. Springer, 154–175. https://doi.org/10.1007/978-3-540-70956-5_7
- Keim et al. (2010) Daniel Keim, Jörn Kohlhammer, Geoffrey Ellis, and Florian Mansmann. 2010. Mastering the Information Age Solving Problems with Visual Analytics. EuroGraphics. https://doi.org/10.1016/j.procs.2011.12.035 arXiv:1011.1669v3
- Khan et al. (2003) Khalid S Khan, Regina Kunz, Jos Kleijnen, and Gerd Antes. 2003. Five Steps to Conducting a Systematic Review. Journal of the Royal Society of Medicine 96, 3 (2003), 118–121. https://doi.org/10.1258/jrsm.96.3.118
- Khare et al. (2015) Shweta Khare, Kyoungho An, and Aniruddha Gokhale. 2015. Functional Reactive Stream Processing for Data-centric Publish/Subscribe Systems. In 29th IEEE International Parallel & Distributed Processing Symposium.
- Kortuem et al. (2010) Gerd Kortuem, Fahim Kawsar, Daniel Fitton, and Vasughi Sundramoorthy. 2010. Smart Objects As Building Blocks For The Internet Of Things. IEEE Internet Computing 14 (2010), 44–51. https://doi.org/10.1109/MIC.2009.143
- Larson and Chang (2016) Deanne Larson and Victor Chang. 2016. A Review And Future Direction Of Agile, Business Intelligence, Analytics And Data Science. International Journal of Information Management 36, 5 (2016), 700–710. https://doi.org/10.1016/j.ijinfomgt.2016.04.013
- Lavalle et al. (2010) Steve Lavalle, Michael S Hopkins, Eric Lesser, Rebecca Shockley, and Nina Kruschwitz. 2010. Analytics: The New Path to Value. MIT Sloan Management Review (2010), 1–24. https://www-935.ibm.com/services/uk/gbs/pdf/Analytics
- Lee et al. (2013) Jung Hoon Lee, Marguerite Gong Hancock, and Mei Chih Hu. 2013. Towards An Effective Framework For Building Smart Cities: Lessons From Seoul And San Francisco. Technological Forecasting and Social Change 89 (2013), 80–99. https://doi.org/10.1016/j.techfore.2013.08.033
- Liao et al. (2012) Shu Hsien Liao, Pei Hui Chu, and Pei Yuan Hsiao. 2012. Data Mining Techniques And Applications - A Decade Review From 2000 To 2011. Expert Systems with Applications 39, 12 (2012), 11303–11311. https://doi.org/10.1016/j.eswa.2012.02.063 arXiv:1202.1112
- Liebig et al. (2014) Thomas Liebig, Nico Piatkowski, Christian Bockermann, and Katharina Morik. 2014. Predictive Trip Planning-Smart Routing in Smart Cities. In Proceedings of the Workshops of the EDBT/ICDT 2014 Joint Conference. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.429.2841
- Lin et al. (2017) Jie Lin, Wei Yu, Nan Zhang, Xinyu Yang, Hanlin Zhang, and Wei Zhao. 2017. A Survey on Internet of Things: Architecture, Enabling Technologies, Security and Privacy, and Applications. IEEE Internet of Things Journal (2017). https://doi.org/10.1109/JIOT.2017.2683200
- Liu et al. (2013) Honghai Liu, Shengyong Chen, and Naoyuki Kubota. 2013. Intelligent Video Systems and Analytics: A Survey. IEEE Transactions on Industrial Informatics 9, 3 (2013), 1222–1233. https://doi.org/10.1109/TII.2013.2255616
- Locke (2010) Dave Locke. 2010. MQ Telemetry Transport (MQTT) V3.1 Protocol Specification. (2010).
- Low et al. (2012) Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M Hellerstein. 2012. Distributed GraphLab: A Framework For Machine Learning And Data Mining In The Cloud. Proceedings of the VLDB Endowment 5, 8 (apr 2012), 716–727. https://doi.org/10.14778/2212351.2212354
- Luckham (2002) David Luckham. 2002. The Power Of Events: An Introduction To Complex Event Processing In Distributed Enterprise Systems. Addison-Wesley. https://doi.org/10.1007/978-3-540-88808-6_2
- Lustre (2015) Lustre. 2015. The Lustre Filesystem. (2015). http://lustre.opensfs.org/
- Madden (2012) Sam Madden. 2012. From Databases To Big Data. IEEE Internet Computing 16 (2012), 4–6. https://doi.org/10.1109/MIC.2012.50
- Mahalakshmi et al. (2016) Ganapathy Mahalakshmi, Sridevi Sureshkumar, and S Rajaram. 2016. A Survey On Forecasting Of Time Series Data. In Proceedings of the 2016 International Conference on Computing Technologies and Intelligent Data Engineering. https://doi.org/10.1109/ICCTIDE.2016.7725358
- Mak and Fan (2006) Chin Mak and Henry Fan. 2006. Heavy Flow-Based Incident Detection Algorithm Using Information From Two Adjacent Detector Stations. Journal of Intelligent Transportation Systems 10, 1 (2006), 23–31. https://doi.org/10.1080/15472450500455229
- Manyika et al. (2015) James Manyika, Michael Chui, Peter Bisson, Jonathan Woetzel, Richard Dobbs, Jacques Bughin, and Dan Aharon. 2015. The Internet of Things: Mapping the Value Beyond the Hype. Technical Report. McKinsey Global Institute. http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/the-internet-of-things-the-value-of-digitizing-the-physical-world
- Manyika et al. (2013) James Manyika, Michael Chui, and Jacques Bughin. 2013. Disruptive Technologies: Advances That Will Transform Life, Business, And The Global Economy. Technical Report. McKinsey Global Institute. https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/disruptive-technologies
- MapR Technologies (2014) MapR Technologies. 2014. Stream Processing with MapR. Technical Report. MapR Inc. https://mapr.com/resources/stream-processing-mapr/
- Marz and Warren (2014) Nathan Marz and James Warren. 2014. Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Manning. arXiv:1-933988-16-9
- MemSQL Inc. (2015) MemSQL Inc. 2015. MemSQL. (2015). http://www.memsql.com/
- Mesosphere (2017) Mesosphere. 2017. Mesosphere. (2017). https://mesosphere.com
- Microsoft (2015) Microsoft. 2015. Microsoft Azure. (2015). http://azure.microsoft.com/en-gb/
- Milenkovic (2015) Milan Milenkovic. 2015. A Case for Interoperable IoT Sensor Data and Meta-data Formats. Ubiquity 2015, November (2015), 1–7. https://doi.org/10.1145/2822643
- Miorandi et al. (2012) Daniele Miorandi, Sabrina Sicari, Francesco De Pellegrini, and Imrich Chlamtac. 2012. Internet of Things: Vision, Applications and Research Challenges. Ad Hoc Networks 10, 7 (2012), 1497–1516. https://doi.org/10.1016/j.adhoc.2012.02.016
- Mittelstadt et al. (2012) Sebastian Mittelstadt, Michael Behrisch, Stefan Weber, Tobias Schreck, Andreas Stoffel, Rene Pompl, Daniel Keim, Holger Last, and Leishi Zhang. 2012. Visual Analytics for the Big Data Era - A Comparative Review of State-of-the-Art Commercial Systems. In Proceedings of IEEE Conference on Visual Analytics Science and Technology. http://ieeexplore.ieee.org/document/6400554/
- Mukherjee et al. (2013) Arijit Mukherjee, Swarnava Dey, Himadri Sekhar Paul, and Batsayan Das. 2013. Utilising Condor for Data Parallel Analytics in an IoT Context - An Experience Report. In Proceedings of the 9th IEEE International Conference on Wireless and Mobile Computing, Networking and Communications. https://doi.org/10.1109/WiMOB.2013.6673380
- Mukherjee et al. (2012) Arijit Mukherjee, Arpan Pal, and Prateep Misra. 2012. Data Analytics in Ubiquitous Sensor-based Health Information Systems. In Proceedings of the 6th International Conference on Next Generation Mobile Applications, Services, and Technologies. https://doi.org/10.1109/NGMAST.2012.39
- Mukherjee et al. (2014) Arijit Mukherjee, Himadri Sekhar Paul, Swarnava Dey, and Ansuman Banerjee. 2014. ANGELS for Distributed Analytics In IoT. In Proceedings of IEEE World Forum on Internet of Things. https://doi.org/10.1109/WF-IoT.2014.6803230
- Mukherjee and Chatterjee (2014) Ujjal Kumar Mukherjee and Snigdhansu Chatterjee. 2014. Fast Algorithm for Computing Weighted Projection Quantiles and Data Depth for High-Dimensional Large Data Clouds. In Proceedings of the 2014 IEEE International Conference on Big Data. http://ieeexplore.ieee.org/document/7004358/
- Nastic et al. (2013) Stefan Nastic, Sanjin Sehic, Michael Vögler, Hong Linh Truong, and Schahram Dustdar. 2013. PatRICIA - A Novel Programming Model For IoT Applications On Cloud Platforms. In Proceedings of the 6th IEEE International Conference on Service-Oriented Computing and Applications. https://doi.org/10.1109/SOCA.2013.48
- Nechifor et al. (2014) Septimiu Nechifor, Anca Petrescu, Dan Puiu, and Bogdan Tarnauca. 2014. Predictive Analytics based on CEP for Logistic of Sensitive Goods. In Proceedings of the International Conference on Optimization of Electrical and Electronic Equipment. http://ieeexplore.ieee.org/document/6850965/
- Niewolny (2013) David Niewolny. 2013. How the Internet of Things Is Revolutionizing Healthcare. Technical Report. Freescale Semiconductor. 1–8 pages. http://cache.freescale.com/files/corporate/doc/white
- Office of National Statistics (2013) Office of National Statistics. 2013. Population and Household Estimates for the United Kingdom. Technical Report. https://goo.gl/dAUEjm
- O’Hara et al. (2012) Niall O’Hara, Marco Slot, Dan Marinescu, Jan Čurn, Dawei Yang, Mikael Asplund, Mélanie Bouroche, Siobhán Clarke, and Vinny Cahill. 2012. MDDSVsim: An Integrated Traffic Simulation Platform For Autonomous Vehicle Research. In Proceedings of the International Workshop on Vehicular Traffic Management for Smart Cities.
- Oxford English Dictionary (2017) Oxford English Dictionary. 2017. “analytics, n.”. (aug 2017). http://www.oed.com/view/Entry/273413
- Panetta (2017) Kasey Panetta. 2017. Top Trends in the Gartner Hype Cycle for Emerging Technologies, 2017. (2017). http://www.gartner.com/smarterwithgartner/top-trends-in-the-gartner-hype-cycle-for-emerging-technologies-2017/
- Panoply (2017) Panoply. 2017. Panoply Smart Data Warehouse. (2017). https://panoply.io/
- Paradigm4 (2014) Paradigm4. 2014. Leaving Data on the Table. Technical Report. http://goo.gl/6vBhk3
- Pavlo et al. (2017) Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang. 2017. Self-Driving Database Management Systems. In Proceedings of the 8th Biennial Conference on Innovative Data Systems Research. http://pelotondb.io/publications/
- Pereira et al. (2001) Fernando Pereira, John Lafferty, and Andrew Mccallum. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of 18th International Conference on Machine Learning. http://dl.acm.org/citation.cfm?id=655813
- Perera et al. (2013) Charith Perera, Arkady Zaslavsky, Peter Christen, Michael Compton, and Dimitrios Georgakopoulos. 2013. Context-aware Sensor Search, Selection And Ranking Model For Internet Of Things Middleware. In Proceedings of IEEE International Conference on Mobile Data Management. https://doi.org/10.1109/MDM.2013.46
- Perera et al. (2014) Charith Perera, Arkady Zaslavsky, Peter Christen, and Dimitrios Georgakopoulos. 2014. Context Aware Computing for the Internet of Things: A Survey. IEEE Communications Surveys and Tutorials 16, 1 (2014), 414–454. https://doi.org/10.1109/SURV.2013.042313.00197
- Pettey (2010) Christy Pettey. 2010. Gartner’s 2010 Hype Cycle Special Report. (2010). http://www.gartner.com/newsroom/id/1447613
- Pettey and Goasduff (2011) Christy Pettey and Laurence Goasduff. 2011. Gartner’s 2011 Hype Cycle Special Report. (2011). http://www.gartner.com/newsroom/id/1763814
- Pettey and van der Meulen (2012) Christy Pettey and Rob van der Meulen. 2012. Gartner’s 2012 Hype Cycle for Emerging Technologies. (2012). http://www.gartner.com/newsroom/id/2124315
- Piovesan et al. (2016) Nicola Piovesan, Leo Turi, Enrico Toigo, Borja Martinez, and Michele Rossi. 2016. Data Analytics For Smart Parking Applications. Sensors 16, 10 (2016), 1–25. https://doi.org/10.3390/s16101575
- Pivotal Inc. (2015) Pivotal Inc. 2015. Greenplum Database. (2015). http://pivotal.io/big-data/pivotal-greenplum-database
- Ploennigs et al. (2014) Joern Ploennigs, Anika Schumann, and Freddy Lécué. 2014. Adapting Semantic Sensor Networks for Smart Building Diagnosis. In Proceedings of the 13th International Semantic Web Conference. https://doi.org/10.1007/978-3-319-11915-1_20
- Prunicki (2009) Andrew Prunicki. 2009. Apache Thrift. Technical Report. Object Computing Inc. https://thrift.apache.org/
- Quilitz and Leser (2008) Bastian Quilitz and Ulf Leser. 2008. Querying Distributed RDF Data Sources With SPARQL. In Proceedings of the 5th European Semantic Web Conference. https://doi.org/10.1007/978-3-540-68234-9_39
- Quobyte (2017) Quobyte. 2017. Quobyte and XtreemFS. (2017). https://www.quobyte.com/containers
- Raicu et al. (2011) Ioan Raicu, Ian T. Foster, and Pete Beckman. 2011. Making a case for distributed file systems at Exascale. In Proceedings of the 3rd International Workshop on Large-scale System and Application Performance. 11. https://doi.org/10.1145/1996029.1996034
- Raicu et al. (2012) Ioan Raicu, Ian T Foster, and Pete Beckman. 2012. Making A Case For Distributed File Systems At Exascale. In Proceedings of the 3rd International Workshop On Large-scale System And Application Performance. https://doi.org/10.1145/1996029.1996034
- Rakhmawati and Hausenblas (2012) Nur Aini Rakhmawati and Michael Hausenblas. 2012. On the Impact of Data Distribution in Federated SPARQL Queries. In Proceedings of 6th IEEE International Conference on Semantic Computing. https://doi.org/10.1109/ICSC.2012.72
- Rakhmawati et al. (2013) Nur Aini Rakhmawati, Jürgen Umbrich, Marcel Karnstedt, Ali Hasnain, and Michael Hausenblas. 2013. Querying Over Federated SPARQL Endpoints - A State of the Art Survey. Technical Report. Digital Enterprise Research Institute. arXiv:1306.1723 https://arxiv.org/abs/1306.1723
- Ray (2015) Partha Pratim Ray. 2015. Towards An Internet Of Things Based Architectural Framework For Defence. In Proceedings of the 2015 International Conference on Control Instrumentation Communication and Computational Technologies. https://doi.org/10.1109/ICCICCT.2015.7475314
- Ray (2016) Partha Pratim Ray. 2016. A Survey on Internet of Things Architectures. Journal of King Saud University - Computer and Information Sciences (2016). https://doi.org/10.1016/j.jksuci.2016.10.003
- Razip et al. (2014) Ahmad Razip, Abish Malik, Shehzad Afzal, Matthew Potrawski, Ross Maciejewski, Yun Jang, Niklas Elmqvist, and David Ebert. 2014. A Mobile Visual Analytics Approach for Law Enforcement Situation Awareness. In Proceedings of the 2014 IEEE Pacific Visualization Symposium. https://doi.org/10.1109/PacificVis.2014.54
- Razzaque et al. (2016) Mohammad Abdur Razzaque, Marija Milojevic-Jevric, Andrei Palade, and Siobhán Cla. 2016. Middleware for Internet of Things: A Survey. IEEE Internet of Things Journal 3, 1 (2016), 70–95. https://doi.org/10.1109/JIOT.2015.2498900
- Rivera and Meulen (2013) Janessa Rivera and Rob Meulen. 2013. Gartner’s 2013 Hype Cycle for Emerging Technologies. (2013). http://www.gartner.com/newsroom/id/2575515
- Rivera and Meulen (2014) Janessa Rivera and Rob Meulen. 2014. Gartner’s 2014 Hype Cycle for Emerging Technologies. (2014). http://www.gartner.com/newsroom/id/2819918
- Rivera and Meulen (2015) Janessa Rivera and Rob Meulen. 2015. Gartner’s 2015 Hype Cycle for Emerging Technologies. (2015). http://www.gartner.com/newsroom/id/3114217
- Robak et al. (2013) Silva Robak, Bogdan Franczyk, and Marcin Robak. 2013. Applying Big Data and Linked Data Concepts in Supply Chains Management. In Proceedings of the Federated Conference on Computer Science and Information Systems. http://ieeexplore.ieee.org/document/6644169/
- Sagiroglu and Sinanc (2013) Seref Sagiroglu and Duygu Sinanc. 2013. Big Data: A Review. In International Conference on Collaboration Technologies and Systems. https://doi.org/10.1109/CTS.2013.6567202
- Saleem et al. (2014) Muhammad Saleem, Yasar Khan, Ali Hasnain, Ivan Ermilov, and Axel-Cyrille Ngonga Ngomo. 2014. A Fine-Grained Evaluation of SPARQL Endpoint Federation Systems. Semantic Web Journal 1 (2014), 1–5. http://www.semantic-web-journal.net/system/files/swj625.pdf
- Salpietro et al. (2015) Rosario Salpietro, Luca Bedogni, Marco Di Felice, and Luciano Bononi. 2015. Park Here! A Smart Parking System Based On Smartphones’ Embedded Sensors And Short Range Communication Technologies. In Proceedings of the 2015 IEEE World Forum on Internet of Things. https://doi.org/10.1109/WF-IoT.2015.7389020
- Sanchez et al. (2011) Luis Sanchez, Jose Antonio Galache, Veronica Gutierrez, Jose Manuel Hernandez, Jesus Bernat, Alex Gluhak, and Tomas Garcia. 2011. SmartSantander: The Meeting Point Between Future Internet Research and Experimentation and the Smart Cities. In Proceedings of the Future Network & Mobile Summit. http://ieeexplore.ieee.org/document/6095264/
- Satyanarayanan et al. (2009) Mahadev Satyanarayanan, Paramvir Bahl, Ramon Caceres, and Nigel Davies. 2009. The Case for VM-Based Cloudlets in Mobile Computing. Pervasive Computing 8 (2009), 14–23. https://doi.org/10.1109/MPRV.2009.82
- Schnizler et al. (2014) Francois Schnizler, Thomas Liebig, Shie Mannor, Gustavo Souto, Sebastian Bothe, and Hendrik Stange. 2014. Heterogeneous Stream Processing for Disaster Detection and Alarming. In Proceedings of the 2014 IEEE International Conference on Big Data. http://ieeexplore.ieee.org/document/7004323/
- Schonwalder et al. (2010) Jurgen Schonwalder, Martin Bjorklund, and Phil Shafer. 2010. Network Configuration Management Using NETCONF and YANG. IEEE Communications Magazine 48, 9 (2010), 166–173. https://doi.org/10.1109/MCOM.2010.5560601
- Schwarte et al. (2011) Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt. 2011. FedX: Optimization Techniques For Federated Query Processing On Linked Data. In Proceedings of the 10th International Semantic Web Conference. https://doi.org/10.1007/978-3-642-25073-6_38
- Sethi and Sarangi (2017) Pallavi Sethi and Smruti R Sarangi. 2017. Internet of Things: Architectures, Protocols, and Applications. Journal of Electrical and Computer Engineering 2017 (2017). https://doi.org/10.1155/2017/9324035
- Sharma et al. (2010) Rajeev Sharma, Peter Reynolds, Rens Scheepers, Peter B Seddon, and Graeme G Shanks. 2010. Business Analytics and Competitive Advantage: A Review and a Research Agenda. In Bridging the Socio-technical Gap in Decision Support Systems: Challenges for the Next Decade. IOS Press, 187–198.
- Shmueli et al. (2017) Galit Shmueli, Peter C Bruce, Inbal Yahav, Nitin R Patel, and Kenneth C Lichtendahl Jr. 2017. Data Mining for Business Analytics: Concepts, Techniques, and Applications in R. John Wiley & Sons. https://doi.org/978-1-118-87936-8
- Shmueli and Koppiu (2010) Galit Shmueli and Otto Koppiu. 2010. Predictive Analytics in Information Systems Research. Robert Smith Research (2010), 06–138. https://ai.arizona.edu/sites/ai/files/MIS611D/shmueli-2011-predictiveanalytics-is-research.pdf
- Shtykh and Suzuki (2014) Roman Y Shtykh and Toshihiro Suzuki. 2014. Distributed Data Stream Processing with Onix. In Proceedings of the 4th IEEE International Conference on Big Data and Cloud Computing. https://doi.org/10.1109/BDCloud.2014.54
- Sill et al. (2011) Steve Sill, Blake Christie, Ann Diephaus, Dan Garretson, Kay Sullivan, and Susan Sloan. 2011. Intelligent Transportation Systems (ITS) Standards Program Strategic Plan. Technical Report. U.S. Department of Transportation.
- Siow et al. (2017) Eugene Siow, Thanassis Tiropanis, and Wendy Hall. 2017. Ewya: An Interoperable Fog Computing Infrastructure with RDF Stream Processing. In Proceedings of the 4th International Conference on Internet Science. https://eprints.soton.ac.uk/412749/
- Stankovic (2014) John A Stankovic. 2014. Research Directions for the Internet of Things. IEEE Internet of Things Journal 1, 1 (2014), 3–9. https://doi.org/10.1109/JIOT.2014.2312291
- Sun et al. (2013) Guo-Dao Sun, Ying-Cai Wu, Rong-Hua Liang, and Shi-Xia Liu. 2013. A Survey Of Visual Analytics Techniques And Applications: State-of-the-art Research And Future Challenges. Journal of Computer Science and Technology 28, 5 (2013), 852–867. https://doi.org/10.1007/s11390-013-1383-8
- Teradata (2015) Teradata. 2015. Teradata Database. (2015). http://goo.gl/hLPwIV
- Tosatto et al. (2015) Andrea Tosatto, Pietro Ruiu, and Antonio Attanasio. 2015. Container-Based Orchestration in Cloud: State of the Art and Challenges. In Proceedings of the 9th International Conference on Complex, Intelligent and Software Intensive Systems. IEEE. https://doi.org/10.1109/CISIS.2015.35
- Tukey (1962) John W Tukey. 1962. The Future of Data Analysis. Annals of Mathematical Statistics 33, 1 (1962), 1–67. https://doi.org/10.1214/aoms/1177704711
- Turban et al. (2014) Efraim Turban, Ramesh Sharda, and Dursun Delen. 2014. Business Intelligence and Analytics: Systems for Decision Support. Pearson. http://catalogue.pearsoned.co.uk/educator/product/Business-Intelligence-and-Analytics-Systems-for-Decision-Support-Global-Edition/9781292009209.page
- Ubeam (2017) Ubeam. 2017. ubeam. (2017). http://ubeam.com/
- van Nunen et al. (2012) Ellen van Nunen, Maurice Kwakkernaat, Jeroen Ploeg, and Bart Netten. 2012. Cooperative Competition for Future Mobility. IEEE Transactions on Intelligent Transportation Systems 13, 3 (2012), 1018–1025. https://doi.org/10.1109/TITS.2012.2200475
- Vargheese and Dahir (2014) Rajesh Vargheese and Hazim Dahir. 2014. An IoT/IoE Enabled Architecture Framework for Precision On Shelf Availability. In Proceedings of the IEEE International Conference on Big Data. http://ieeexplore.ieee.org/document/7004418
- Varian (2009) Hal Varian. 2009. How the Web Challenges Managers. (jan 2009). http://www.mckinsey.com/industries/high-tech/our-insights/hal-varian-on-how-the-web-challenges-managers
- Verdouw et al. (2013) Cor Verdouw, Adrie Beulens, and Jack van der Vorst. 2013. Virtualisation Of Floricultural Supply Chains: A Review From An IoT Perspective. Computers and Electronics in Agriculture 99 (2013), 160–175. https://doi.org/10.1016/j.compag.2013.09.006
- Vermesan and Friess (2013) Ovidiu Vermesan and Peter Friess. 2013. Internet of Things: Converging Technologies for Smart Environments and Integrated Ecosystems. River Publishers. https://doi.org/10.2139/ssrn.2324902
- Vermesan and Friess (2014) Ovidiu Vermesan and Peter Friess. 2014. Internet of Things – From Research and Innovation to Market Deployment. Vol. 6. River Publishers. arXiv:1308.4501v1 https://www.riverpublishers.com/book
- Villari et al. (2014) Massimo Villari, Antonio Celesti, and Maria Fazio. 2014. AllJoyn Lambda: An Architecture for the Management of Smart Environments in IoT. In Proceedings of 2014 International Conference on Smart Computing Workshops. http://ieeexplore.ieee.org/document/7046676/
- Walport (2014) Mark Walport. 2014. The Internet of Things: Making the Most of the Second Digital Revolution. Technical Report. The United Kingdom Government Office for Science. https://www.gov.uk/government/publications/internet-of-things-blackett-review
- Wang et al. (2013) Xin Wang, Thanassis Tiropanis, and Hugh C Davis. 2013. LHD: Optimising Linked Data Query Processing Using Parallelisation. In Proceedings of the Workshop on Linked Data on the Web. https://eprints.soton.ac.uk/350719/
- Weil et al. (2006) Sage A Weil, Scott A Brandt, Ethan L Miller, and Darrell D E Long. 2006. Ceph: A Scalable, High-Performance Distributed File System. In Proceedings Of The 7th Symposium on Operating Systems Design and Implementation. https://dl.acm.org/citation.cfm?id=1298485
- Wolf (2017) Marilyn Wolf. 2017. The Physics of Event-Driven IoT Systems. IEEE Design and Test 34, 2 (2017), 87–90. https://doi.org/10.1109/MDAT.2016.2631082
- World Economic Forum (2012) World Economic Forum. 2012. The Global Information Technology Report 2012 Living in a Hyperconnected World. Technical Report. 441 pages.
- Xin et al. (2013) Reynold S Xin, Joseph E Gonzalez, Michael J Franklin, and Ion Stoica. 2013. GraphX: A Resilient Distributed Graph System on Spark. In Proceedings of the 1st International Workshop on Graph Data Management Experiences and Systems. ACM Press, New York, New York, USA. https://doi.org/10.1145/2484425.2484427 arXiv:1402.2394
- Xu et al. (2014a) Lida Xu, Wu He, and Shancang Li. 2014a. Internet of Things in Industries: A Survey. IEEE Transactions on Industrial Informatics PP, 4 (2014), 1–11. https://doi.org/10.1109/TII.2014.2300753
- Xu et al. (2014b) Xiaomin Xu, Sheng Huang, Yaoliang Chen, Kevin Brown, Inge Halilovic, and Wei Lu. 2014b. TSaaaS: Time Series Analytics As A Service On IoT. In Proceedings of the IEEE International Conference on Web Services. https://doi.org/10.1109/ICWS.2014.45
- Xu et al. (2017) Zheng Xu, Yunhuai Liu, Hui Zhang, Xiangfeng Luo, Lin Mei, and Chuanping Hu. 2017. Building the Multi-Modal Storytelling of Urban Emergency Events Based on Crowdsensing of Social Media Analytics. Mobile Networks and Applications 22, 2 (2017), 218–227. https://doi.org/10.1007/s11036-016-0789-2
- Yang et al. (2015) Fan Yang, Nelson Matthys, Rafael Bachiller, Sam Michiels, Wouter Joosen, and Danny Hughes. 2015. uPnP: Plug and Play Peripherals for the Internet of Things. In Proceedings of the 10th European Conference on Computer Systems. https://doi.org/10.1145/2741948.2741980
- Ye et al. (2013) Feng Ye, Zhi-Jian Wang, Fa-Chao Zhou, Ya-Pu Wang, and Yuan-Chao Zhou. 2013. Cloud-Based Big Data Mining & Analyzing Services Platform Integrating R. In Proceedings of the 2013 International Conference on Advanced Cloud and Big Data. https://doi.org/10.1109/CBD.2013.13
- Yick et al. (2008) Jennifer Yick, Biswanath Mukherjee, and Dipak Ghosal. 2008. Wireless Sensor Network Survey. Computer Networks 52 (2008), 2292–2330. https://doi.org/10.1016/j.comnet.2008.04.002
- Yin et al. (2011) Jian Yin, Anand Kulkarni, Sumit Purohit, Ian Gorton, and Bora Akyol. 2011. Scalable Real Time Data Management For Smart Grid. In Proceedings of the Middleware 2011 Industry Track Workshop. https://doi.org/10.1145/2090181.2090182
- Zaharia et al. (2012) Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, and Ankur Dave. 2012. Resilient Distributed Datasets: A Fault-tolerant Abstraction For In-memory Cluster Computing. Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (2012). https://doi.org/10.1111/j.1095-8649.2005.00662.x
- Zanella et al. (2014) Andrea Zanella, Nicola Bui, Angelo P Castellani, Lorenzo Vangelista, and Michele Zorzi. 2014. Internet of Things for Smart Cities. IEEE Internet of Things Journal 1, 1 (2014), 22–32. https://doi.org/10.1109/JIOT.2014.2306328
- Zhou et al. (2014) Zhihua Zhou, Nitesh V Chawla, Yaochu Jin, and Graham J Williams. 2014. Big Data Opportunities and Challenges: Discussions from Data Analytics Perspectives. IEEE Computational Intelligence Magazine 9, 4 (2014), 62–74. https://doi.org/10.1109/MCI.2014.2350953
- Ziekow and Jerzak (2014) Holger Ziekow and Zbigniew Jerzak. 2014. The DEBS 2014 Grand Challenge. Proceedings of the 8th ACM International Conference on Distributed Event-based Systems (2014). https://doi.org/10.1145/2611286.2611333