Scientific communities are experiencing a significant increase in the availability of empirical data, driven by the falling cost of sensors, the growing ease of deploying them, and the distribution of sensor data over the internet. The same communities face increasing pressure from a variety of stakeholders to have their empirical data consolidated, analyzed, and used to address a broad range of open scientific problems. However, consolidation and analysis are challenging because, with the data and the often limited annotations typically available today, scientists are not yet fully equipped to understand the quality and semantics of scientific measurements. Many voice a strong need for a comprehensive vocabulary capable of encoding metadata about empirical data and of supporting its systematic understanding, which would enable sound integration of empirical data.
We present the Human-Aware Sensor Network Ontology (HASNetO), a comprehensive alignment and integration of well-established ontologies for encoding scientific sensing infrastructures, scientific observations, and provenance. The integrated ontology is available at http://hadatac.org/ont/hasneto. Supporting ontologies for HASNetO, as well as previous versions of HASNetO, can also be found at http://hadatac.org. A comprehensive infrastructure for managing HASNetO-based knowledge bases, which is not discussed in this paper, is available at https://github.com/paulopinheiro1234/hadatac. One of the immediate benefits of HASNetO is its capability of describing comprehensive knowledge graphs about empirical data.
We have used these knowledge graphs to systematically annotate scientific measurements stored in database systems and to amplify their relevance, in support of three major projects: the Jefferson Project [jefferson], the Center for Architectural Sciences and Ecology’s Build Ecology Program for the City of New York (http://www.case.rpi.edu/page/academics.php), and Smart City activities in Fortaleza, Brazil [santos_contextual_2015]. Across these projects, more than eighty scientists in multiple disciplines are being introduced to a new generation of graph-enabled tools for retrieving their own data, data from other scientists, and combinations of both, and for understanding the meaning of data retrieved through complex queries, regardless of who measured the data.
The rest of this paper is organized as follows. In Section 2, we use a diagram to discuss a typical scenario where empirical data is generated and managed. Section 3 presents a categorization of knowledge related to scientific measurements that is often described as measurement metadata. In Section 4, we introduce the Human-Aware Sensor Network Ontology that provides concepts and relationships used to encode the knowledge discussed in Section 3. In Section 5, we compare our work on HASNetO with other initiatives. A more comprehensive discussion about the current impact of our HASNetO work including future work is described in Section 6. Finally, we summarize our work in Section 7.
2 A Typical Empirical Data Collection Scenario
Empirical data are often collected either with instruments that are manually operated by scientists or with sensor networks that operate automatically but are still deployed, calibrated, and maintained manually.
The three faces depicted in Fig. 1 represent human roles (or simply roles) in a data collection scenario. The Scientist role is connected to a Technician role, reflecting the fact that scientists interact with technicians to communicate how sensor networks need to be set up and maintained. The roles in Fig. 1 can be performed by any combination of people; for instance, a single person may perform more than one role.
3 Knowledge Behind Measurement Data
Figure 1 contains ten diamonds that label knowledge related to typical data collection scenarios. The knowledge identified by these diamonds, explained in this section, is often captured and recorded as metadata for the empirical data. One assumption behind such recording is that the collected metadata enable data understanding without requiring any explanation from the scientists directly involved in the data collection. The diamonds fall into four categories, each identified by the number inside the diamond. Below we describe each of these four categories.
3.1 Available Measurement Infrastructure
In Figure 1, Diamond Category “1” represents knowledge about the available measurement infrastructure. Scientists conceptually understand the configuration and capabilities of measurement infrastructures, including which instruments and detectors are available to them, on which platforms these instruments can be or are deployed, where stationary platforms are located, which paths mobile platforms take, and which physical, chemical, biological, and sociological properties the sensors are capable of measuring. Because knowledge about instruments and sensor networks may affect how empirical data are understood, scientists are expected to share their knowledge of the measurement infrastructure by encoding it as metadata of the measurement data.
3.2 Calibrations, Configurations and Deployments of Instruments and Detectors
In Figure 1, Diamond Category “2” represents knowledge about a broad range of human interventions that may affect the quality of measurement data. Knowledge about the measurement infrastructure is not nearly enough to explain the data generated by the instruments in that infrastructure: many factors and events that affect how measurements are performed are not part of the knowledge about the infrastructure itself. When scientists operate scientific instruments in isolation, the importance of documenting how the instruments were operated is evident. Explaining human interventions in sensor networks is more challenging, since sensor networks are often regarded as automated infrastructures for the collection of scientific data. For example, a badly deployed instrument, e.g., one that is not properly attached to the surface of the platform it is deployed on, can produce measurements that are off by a fixed amount or, even worse, may be unable to perform any measurement at all, e.g., because the cord supplying power to the instrument is not properly connected.
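To make the point concrete, the following minimal Python sketch assumes a hypothetical `DeploymentNote` record (not a HASNetO term) in which a technician logs how an instrument was attached and powered. The same raw readings receive a different interpretation depending on what the note says; without the note, a fixed bias or a missing power connection is invisible in the data itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class DeploymentNote:
    """Hypothetical log of a human intervention on a deployed instrument."""
    instrument_id: str
    properly_attached: bool
    power_connected: bool
    estimated_offset: Optional[float] = None  # fixed bias, if one was quantified


def interpret(readings: list[float], note: DeploymentNote) -> tuple[str, list[float]]:
    """Return a quality flag and the (possibly corrected) readings."""
    if not note.power_connected:
        # Disconnected power cord: no valid measurements exist for this period.
        return ("no-data-expected", [])
    if note.properly_attached:
        return ("ok", list(readings))
    if note.estimated_offset is not None:
        # Badly attached, but the fixed bias was quantified: remove it.
        return ("corrected", [r - note.estimated_offset for r in readings])
    # Badly attached and the bias is unknown: keep the values but flag them.
    return ("suspect", list(readings))


note = DeploymentNote("thermo-07", properly_attached=False,
                      power_connected=True, estimated_offset=1.5)
print(interpret([21.5, 22.0], note))  # ('corrected', [20.0, 20.5])
```

Whether a quantified offset is ever available is itself an assumption; the sketch mainly shows that the correct reading of the data depends on knowledge that lives outside the data.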
3.3 Scientific Annotations of Measurements
In Figure 1, Diamond Category “3” represents knowledge about what has been measured and how the measurements are represented in terms of units. Measurements are taken of physical, chemical, biological, cultural, and social properties of so-called entities of interest. For example, taking Air as an entity of interest, air temperature is a physical property of air, and the CO concentration of the air is a chemical property of the same entity of interest. For data understanding, it is important to know which properties are being measured, e.g., temperature and CO concentration, and which entities of interest these properties belong to, e.g., air. Moreover, it is important to understand the unit used to represent the measurements and the semantic context (e.g., the air is ‘outside air’ as opposed to air inside a room, a lab, or a shelter), which may affect, for instance, the actual measurements of both air temperature and CO concentration.
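The four pieces of knowledge named above (entity of interest, measured property, unit, and semantic context) can be sketched as a small Python record. The field names and the rule "two variables are directly comparable only if all four agree" are illustrative assumptions, not HASNetO property names.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Annotation:
    """Illustrative annotation of a measured variable (field names are hypothetical)."""
    entity: str          # entity of interest, e.g. "Air"
    characteristic: str  # measured property, e.g. "Temperature"
    unit: str            # unit used to represent the values, e.g. "degC"
    context: str         # semantic context, e.g. "outside" vs. "lab room"


def mismatches(a: Annotation, b: Annotation) -> list[str]:
    """Return the annotation fields on which two variables disagree."""
    return [f.name for f in fields(Annotation) if getattr(a, f.name) != getattr(b, f.name)]


def directly_comparable(a: Annotation, b: Annotation) -> bool:
    return not mismatches(a, b)


outdoor_t = Annotation("Air", "Temperature", "degC", "outside")
lab_t = Annotation("Air", "Temperature", "degC", "lab room")
outdoor_co = Annotation("Air", "CO concentration", "ppm", "outside")

print(mismatches(outdoor_t, lab_t))       # ['context']
print(mismatches(outdoor_t, outdoor_co))  # ['characteristic', 'unit']
```

The sketch treats any mismatch as blocking; in practice a unit mismatch may be resolved by a conversion, whereas a context mismatch (outside air versus lab air) generally cannot.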
3.4 Provenance of Sensor Network Activities
In Figure 1, Diamond Category “4” represents knowledge about the provenance of both human interventions and individual measurements. For each measurement, it is important to know when and where it was taken. Which combination of sensing devices was used to support the measurement? Were any configuration parameters provided to the sensing devices that made them operate the way they were operating at the time the measurements were taken?
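One way to read the questions above is as a checklist that a per-measurement provenance record should be able to answer. The following Python sketch uses a hypothetical `MeasurementProvenance` record (not part of HASNetO) and distinguishes "no configuration was provided" (an empty dict) from "we do not know" (`None`).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MeasurementProvenance:
    """Hypothetical per-measurement record mirroring the provenance questions above."""
    taken_at: datetime                  # when was the measurement taken?
    location: tuple[float, float]       # where? (latitude, longitude)
    devices: list[str]                  # which sensing devices were combined?
    # Configuration parameters in effect: None means "unknown",
    # an empty dict means "explicitly none were provided".
    configuration: Optional[dict[str, str]] = None

    def open_questions(self) -> list[str]:
        """List the provenance questions this record leaves unanswered."""
        gaps = []
        if self.taken_at.tzinfo is None:
            gaps.append("time zone of the timestamp")
        if not self.devices:
            gaps.append("sensing devices used")
        if self.configuration is None:
            gaps.append("configuration parameters")
        return gaps


rec = MeasurementProvenance(
    taken_at=datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc),
    location=(43.6, -73.6),
    devices=["instrument:thermo-07", "detector:probe-3"],
)
print(rec.open_questions())  # ['configuration parameters']
```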
4 HASNetO: The Human-Aware Sensor Network Ontology
HASNetO aims to provide the concepts and vocabularies needed to encode the metadata of empirical data identified and described in Section 3. HASNetO is built on top of three ontologies that were integrated and extended under the single name HASNetO: the Extensible Observation Ontology (OBOE) [oboe], the Virtual Solar Terrestrial Observatory (VSTO) [vsto], whose instrument portion is referred to by the vstoi namespace, and the W3C Provenance Ontology (PROV-O) [prov].
4.1 Encoding Knowledge about Sensor Networks and Individual Instruments
HASNetO contains content related to sensor networks, although it does not have a Sensor concept. We observe that the term sensor is used to refer to detectors, to instruments, and often to combinations of detectors and instruments. To avoid further confusion, HASNetO advocates the use of the terms detector and instrument, even though it may be difficult to tell which part of a device is a detector (or detectors) and which part is an instrument. For instance, a thermometer may include an embedded, non-detachable detector. HASNetO breaks down the elements of measuring infrastructures into three categories, platforms, instruments, and detectors, as shown in Fig. 2; a fourth concept, deployment, captures the activity that brings them together.
vstoi:Platform: An object that keeps the instrument at a specific location to ensure that it records data about the selected location. A platform may also provide overhead services, such as supplying power and a data connection to the instrument. A platform may be mobile, like a plane or a person, or stationary, like a weather station tower.
vstoi:Instrument: An object that receives sensed signals from detectors and processes these signals into numerical values. For example, consider a tipping bucket rain gauge. Inside the gauge is a magnet-based detector that detects when the bucket tips. However, for this signal to be meaningful, the detector needs a bucket with a known diameter, a funnel to direct water into the bucket, etc. Together, these make up the instrument.
vstoi:Detector: An object that is capable of sensing environmental properties by collecting physical signals about these properties, translating them into (most often electrical) signals, and forwarding these signals to instruments. Transducer is another name for detector. Detector metadata are collected because detectors may be interchangeable, that is, they can be removed from one instrument and plugged into another.
vstoi:Deployment: An activity of physically deploying an instrument and its attached detectors to a platform. This activity indicates that a single instrument is ready to start collecting data.
For more sophisticated devices, detectors and instruments are sometimes available as distinct hardware components and are thus easier to map into HASNetO concepts. For ordinary instruments, it may still be appropriate to make the attached detectors explicit, since properties such as measurement accuracy and measurement range are properties of detectors and are not listed as instrument properties.
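The platform/instrument/detector split can be illustrated with the tipping bucket rain gauge described above. In this Python sketch the classes only play the roles of the vstoi concepts; their fields, the `deploy` helper, and the conversion (each tip corresponds to one bucket volume of water spread over the funnel's collecting area) are our assumptions. Because detectors are interchangeable, the deployment record takes a snapshot of which detector was attached.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Detector:                    # role of vstoi:Detector
    id: str
    tips: int = 0                  # magnet-based detector: counts bucket tips

    def sense_tip(self) -> None:
        self.tips += 1


@dataclass
class Instrument:                  # role of vstoi:Instrument
    id: str
    bucket_volume_ml: float        # water volume that makes the bucket tip once
    funnel_area_cm2: float         # collecting area of the funnel
    detector: Detector             # interchangeable: may be swapped later

    def rainfall_mm(self) -> float:
        """Turn the detector's tip count into rainfall depth (1 mL = 1 cm^3; 1 cm = 10 mm)."""
        return self.detector.tips * self.bucket_volume_ml * 10.0 / self.funnel_area_cm2


@dataclass
class Platform:                    # role of vstoi:Platform
    id: str
    mobile: bool


@dataclass(frozen=True)
class Deployment:                  # role of vstoi:Deployment (an activity)
    instrument_id: str
    detector_id: str               # snapshot: detector attached at deployment time
    platform_id: str
    started_at: datetime


def deploy(instrument: Instrument, platform: Platform, when: datetime) -> Deployment:
    return Deployment(instrument.id, instrument.detector.id, platform.id, when)


gauge = Instrument("rain-gauge-1", bucket_volume_ml=4.0, funnel_area_cm2=200.0,
                   detector=Detector("reed-switch-A"))
tower = Platform("weather-tower-3", mobile=False)
dep = deploy(gauge, tower, datetime(2015, 5, 1, 9, 0))
for _ in range(12):
    gauge.detector.sense_tip()
print(gauge.rainfall_mm())  # 2.4

gauge.detector = Detector("reed-switch-B")  # detector swapped after deployment
print(dep.detector_id)      # reed-switch-A (the deployment record keeps the original)
```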
Fig. 2 also shows that OBOE provides concepts for describing entities of interest and their measured properties. More specifically, measurements are taken of properties of entities of interest, and these measured properties are called oboe:Characteristics. These OBOE terms are listed below along with their original definitions.
oboe:Entity “denotes a concrete or conceptual object that has been observed (e.g., a tree, a community, an ecological process).”
oboe:Characteristic “represents a property of an entity that can be measured (e.g., height, length, or color).”
4.2 Encoding Knowledge about Measurements
Imagine two data sets of “air temperature” measurements obtained from the same weather station thermometer, both using degrees Celsius to represent measured values. These measurements may still have been produced with different hardware and software configurations or calibrations of the platform and the observing agent (in this case, the weather station and the thermometer, respectively), making the measurements difficult to compare or use in combination. For example, during one use of the thermometer, it was calibrated to operate in the [0,20] range while the actual temperature was within that range. During another use, it was still calibrated to operate in the [0,20] range although the actual temperature was in the [-10,10] range. As a result of this poor calibration decision, the thermometer generated data that may be classified as low quality. OBOE acknowledges the impact of context on observation data management, which is why the ontology provides a context concept. The notion of context in OBOE is a starting point for encoding context; however, it does not describe what constitutes a context property and, more importantly, what does not.
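The thermometer example can be turned into a simple context check. This Python sketch assumes a hypothetical `CalibratedRange` record and an independently known ambient temperature range; it computes how much of that range the calibration covers and flags individual readings. Flagging recorded values assumes they roughly track the true temperature, whereas out-of-range readings may in practice be clipped or distorted, which is exactly why the calibration context needs to be recorded.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CalibratedRange:
    """Hypothetical context record: the range an instrument was calibrated to operate in."""
    low: float
    high: float

    def coverage(self, expected_low: float, expected_high: float) -> float:
        """Fraction of the expected ambient range that the calibration covers."""
        if expected_high <= expected_low:
            raise ValueError("expected range must have positive width")
        overlap = min(self.high, expected_high) - max(self.low, expected_low)
        return max(0.0, overlap) / (expected_high - expected_low)

    def flag(self, values: list[float]) -> list[str]:
        """Mark each recorded value as inside or outside the calibrated range."""
        return ["ok" if self.low <= v <= self.high else "out-of-range" for v in values]


cal = CalibratedRange(0.0, 20.0)
print(cal.coverage(0.0, 20.0))     # 1.0  (first use: ambient range fully covered)
print(cal.coverage(-10.0, 10.0))   # 0.5  (second use: half of the ambient range uncovered)
print(cal.flag([-4.0, 3.5, 9.0]))  # ['out-of-range', 'ok', 'ok']
```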
oboe:Measurement is an assertion that a characteristic of an entity was measured and/or recorded. A measurement is also composed of a value, a measurement standard, and a precision (associated with the measured value). Measurements also encapsulate characteristics that were recorded, but that were not necessarily measured in a physical sense. For example, the name of a location and a taxon can be captured through measurements.
oboe:Standard defines a reference for comparing or naming entities via a measurement. A standard can be defined intensionally (e.g., as in the case of units) or extensionally (by listing the values of the standard, e.g., for color these might be red, blue, yellow, etc.).
hasneto:DataCollection defines the technical activity of collecting empirically observed data. So far, the state of the art in semantics for observations and measurements has characterized this activity as an Observation. The HASNetO ontologies take the position that Observation is a scientific activity, whereas most, if not all, existing ontologies use Observation to describe the technical activity of data collection.
oboe:Observation represents an ‘observed entity’, that is, an entity that was observed by an observer. An observation often consists of measurements of one or more characteristics of the observed entity.
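The relationships among these terms can be sketched with Python dataclasses: a data collection (the technical activity) gathers observations of entities, each observation holds measurements, and each measurement refers to a characteristic, a value, a standard, and an optional precision. The nesting and field names are our assumptions, not the actual OBOE or HASNetO property names, and intensional standards are simplified to "any number".

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Standard:                          # oboe:Standard
    name: str
    # Extensional standards list their admissible values; None marks an
    # intensional standard such as a unit (simplified here to "any number").
    values: Optional[frozenset[str]] = None

    def admits(self, value: object) -> bool:
        if self.values is None:
            return isinstance(value, (int, float))
        return value in self.values


@dataclass(frozen=True)
class Measurement:                       # oboe:Measurement
    characteristic: str                  # oboe:Characteristic being measured
    value: object
    standard: Standard
    precision: Optional[float] = None


@dataclass
class Observation:                       # oboe:Observation: an observed entity
    entity: str                          # oboe:Entity, e.g. "Tree"
    measurements: list[Measurement] = field(default_factory=list)


@dataclass
class DataCollection:                    # hasneto:DataCollection: technical activity
    id: str
    observations: list[Observation] = field(default_factory=list)

    def nonconforming(self) -> list[Measurement]:
        """Measurements whose value is not admitted by their standard."""
        return [m for o in self.observations for m in o.measurements
                if not m.standard.admits(m.value)]


meter = Standard("meter")
color = Standard("color", frozenset({"red", "blue", "yellow"}))
tree = Observation("Tree", [Measurement("Height", 12.5, meter, precision=0.1),
                            Measurement("Color", "green", color)])
dc = DataCollection("collection-1", [tree])
print([m.characteristic for m in dc.nonconforming()])  # ['Color']
```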
4.3 Encoding Knowledge about Human Interventions
Provenance knowledge is an important part of contextual knowledge that is often not fully captured in scientific applications. HASNetO benefits greatly from the previous work of the provenance community in defining a truly general-purpose vocabulary for provenance, the W3C PROV language [prov, provxg]. For empirical data, we use provenance whenever technical activities in support of scientific activities may affect measurement data. For example, humans are often heavily involved in technical activities such as instrument deployment, platform maintenance, instrument and detector calibration, and so on. This human involvement in the scientific process is often not encoded, yet it can affect measurements and their interpretation.
Fig. 4 shows how vstoi:Deployment and hasneto:DataCollection, which are two of the most important technical activities related to empirical data, are defined as subclasses of prov:Activity. These two subclasses of prov:Activity have been discussed previously. Below, we briefly describe prov:Activity and its two complementary classes prov:Agent and prov:Entity.
prov:Activity is “how PROV entities come into existence and how their attributes change to become new entities, often making use of previously existing entities to achieve this.”
prov:Agent “takes a role in an activity such that the agent can be assigned some degree of responsibility for the activity taking place. An agent can be a person, a piece of software, an inanimate object, an organization, or other entities that may be ascribed responsibility. When an agent has some responsibility for an activity, PROV says the agent was associated with the activity, where several agents may be associated with an activity and vice-versa.” In HASNetO terms, some prov:Activity instances in support of data collection are mainly performed by humans, while others are mainly performed by machines. However, classifying these activities can be challenging and is in fact unnecessary, as long as we can fully describe the exact involvement of each agent in each activity, including whether the agent is a human or a machine.
prov:Entity is defined as “physical, digital, conceptual, or other kinds of thing.” In HASNetO, prov:Entity is used to represent, for instance, samples that have been collected and are going to be further analyzed in a lab, where the scientific measurements and data collections occur. prov:Entity is also used to specify any information fed into a platform, instrument, or detector that changes the behavior of these measuring devices. Finally, while an instance of prov:Entity may be an instance of oboe:Entity and vice versa, we prefer to treat the two classes separately, given their distinct roles in scientific activities.
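The subclass relationships above can be illustrated with a hand-rolled slice of RDFS subclass reasoning over plain Python tuples (a real deployment would use an RDF store and reasoner instead). prov:Person, prov:SoftwareAgent, and prov:wasAssociatedWith are PROV-O terms; the `ex:` individuals and the helper functions are hypothetical. Note that the human/machine distinction is carried by the agents' types, not by classifying the activities themselves.

```python
from __future__ import annotations

# (subject, predicate, object) triples written as CURIE strings. The vstoi/hasneto/prov
# classes and prov:wasAssociatedWith follow the text; the "ex:" individuals are hypothetical.
GRAPH = {
    ("vstoi:Deployment", "rdfs:subClassOf", "prov:Activity"),
    ("hasneto:DataCollection", "rdfs:subClassOf", "prov:Activity"),
    ("prov:Person", "rdfs:subClassOf", "prov:Agent"),
    ("prov:SoftwareAgent", "rdfs:subClassOf", "prov:Agent"),
    ("ex:deployment-42", "rdf:type", "vstoi:Deployment"),
    ("ex:collection-7", "rdf:type", "hasneto:DataCollection"),
    ("ex:technician-ana", "rdf:type", "prov:Person"),
    ("ex:logger-service", "rdf:type", "prov:SoftwareAgent"),
    ("ex:deployment-42", "prov:wasAssociatedWith", "ex:technician-ana"),
    ("ex:collection-7", "prov:wasAssociatedWith", "ex:logger-service"),
}


def superclasses(cls: str, graph: set[tuple[str, str, str]]) -> set[str]:
    """All classes reachable from cls via rdfs:subClassOf (transitive closure)."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def instances_of(cls: str, graph: set[tuple[str, str, str]]) -> set[str]:
    """Individuals typed with cls directly or with any of its subclasses."""
    return {s for s, p, o in graph
            if p == "rdf:type" and (o == cls or cls in superclasses(o, graph))}


def agents_of(activity: str, graph: set[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Agents associated with an activity, with their rdf:type(s) (human vs. machine)."""
    agents = {o for s, p, o in graph if s == activity and p == "prov:wasAssociatedWith"}
    return {a: {o for s, p, o in graph if s == a and p == "rdf:type"} for a in agents}


print(sorted(instances_of("prov:Activity", GRAPH)))
# ['ex:collection-7', 'ex:deployment-42']
print(agents_of("ex:deployment-42", GRAPH))
# {'ex:technician-ana': {'prov:Person'}}
```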
5 Related Work
Ongoing research in support of semantic sensor networks uses descriptions of instruments and detectors (often called just “sensors” in the literature) to maintain complex sensor networks while providing integration of the collected data. In [compton_survey_2009], twelve sensor network ontologies are studied and compared; the authors concluded that no ontology (or combination of ontologies) available at that time could describe the properties required for the stipulated capabilities of sensor networks. This work preceded the W3C’s Semantic Sensor Network Ontology (SSN) [compton_ssn_2012]. SSN aims to describe sensors, observations, and related concepts such as sensor capabilities, measurement processes, and deployments. SSN provides a vocabulary for annotating data in a way that makes it possible to determine whether data come from a certain sensor and whether a specific process was used to measure a certain property of an entity of interest. BOnSAI [stavropoulos_bonsai:_2012] and the SESAME Meter Data Ontology [fensel_sesame-s:_2012] are other sensor network ontologies, focused on smart buildings. Although they can describe the tracking of single measurements, these ontologies are not concerned with linking measurements to units or entities of interest. Moreover, although all of the ontologies mentioned above can describe the sensor networks used to collect data, SSN does not rely on standard provenance approaches such as the W3C’s PROV, and is thus limited when describing human interventions in sensor networks. In addition, SSN does not provide any software framework describing how the vocabulary should be used to manage empirical data. BOnSAI and SESAME are not science-centric ontologies: they cannot track human interventions in the network, such as deployments, calibrations, or sensor settings, nor can they explain the implications of these interventions for the quality of empirical data.
In the literature [quine_stimulus_1995] [stasch_stimulus-centric_2009] [probst_ontological_2006], observation data are treated as data obtained while sensing some property of a real-world entity. The result of an observation is a value for that property [usbeck_combining_2014]. Content annotation is crucial when dealing with observation data: it enables some level of interoperability and discoverability, making the data easier to use. To leverage this potential, several approaches exist both to model the infrastructure that generates the data and to describe data content and context.
O&M [cox_observations_2011] is an XML implementation from the Open Geospatial Consortium (OGC) that defines a schema for modeling observations and their results. In [kuhn_functional_2009], an observation and measurement ontology is proposed that makes use of OGC’s definitions. OBOE (The SEEK Extensible Observation Ontology) is an ontology focused primarily on ecology that provides a data model that can capture measurement semantics and that can be used to streamline data integration. To achieve this goal, the OBOE ontology contains concepts and relationships for describing observational datasets.
Among other initiatives to annotate scientific data, VSTO provides a data framework for ontology-based discovery of datasets from multiple repositories across the fields of solar physics, space physics, and solar-terrestrial physics.
6.1 Systematic Evaluation of HASNetO by Scientists
One strength of HASNetO comes from the fact that OBOE, PROV, and VSTO are mature, community-developed ontologies. For instance, OBOE was initiated by an NSF-funded project and has evolved through a number of sponsored research projects. PROV is a W3C Recommendation endorsed by academic organizations and industry. VSTOI is a by-product of the VSTO ontology [vsto], which was funded by NSF and NASA awards and has been influential in the development of Woods Hole’s BCO-DMO Ontology, currently used by a large oceanographic community [bcodmo].
HASNetO is currently being used by scientists in the following settings:
At the RPI Tetherless World Constellation, in support of the Jefferson Project, developed in collaboration between IBM, Rensselaer Polytechnic Institute (RPI), and The FUND for Lake George [jefferson];
At RPI’s Center for Architectural Sciences and Ecology, in support of large empirical observations and experiments in the area of urban ecology;
At the Universidade de Fortaleza’s Smart City Center, where scientific observations are conducted to understand the use of the city’s resources in support of mass transportation.
6.2 Future Work
The Human-Aware Science Ontologies (HAScO) form a family of ontologies. HAScO itself is a high-level ontology that describes scientific activities along with their supporting technical activities. Within HAScO, data collections are defined as technical activities in support of empirical and simulated data. HASNetO is the HAScO ontology that provides a vocabulary for encoding knowledge about empirical data collection. One overarching goal (and challenge) for HAScO is to provide a vocabulary small enough that domain scientists are comfortable using it, yet rich enough to explain the complex relationships involved in the combined use of empirical and computational scientific activities.
The Human-Aware Sensor Network Ontology (HASNetO) was established as an integrated and comprehensive vocabulary for encoding knowledge related to scientific measurements and their derived empirical data. HASNetO aligns three community-developed and community-maintained ontologies for observation, sensing, and provenance, and resolves the conflicts arising from their integration. Contributions include the identification of appropriate covering ontologies, the alignment between them, the gap analysis, and the gap filling. One key gap that HASNetO fills is the lack of terms for modeling human interventions related to empirical activities; sensor deployment and data collection are examples of such interventions. The interpretation of the Observation concept from OBOE, and its meaning in terms of data collection, was clarified by creating a HASNetO concept called DataCollection: the actual act of collecting data in the context of scientific activities such as an OBOE Observation or an empirical experiment. HASNetO also clarifies the use of the term “sensor” in describing sensing infrastructures, including in comparison with competing efforts.
Finally, human interventions in the measurements that generate empirical data are fully explained by the provenance of the empirical data, which is defined as the result of combinations of activities such as VSTO Deployment and HASNetO DataCollection. Both VSTO Deployment and HASNetO DataCollection are defined as specializations of PROV Activity.
The third author is supported by CNPq - Brazil - Science Without Borders scholarship.