IoT devices are becoming increasingly common in research laboratories, where they promise to transform experimentation by enabling the concurrent measurement of many experimental characteristics in real time. Integrating IoT data sources presents new opportunities to gain previously unachievable insights into the state of an ongoing experiment, and to steer and optimize instruments in real time. However, these opportunities also come with new challenges. The many different sensor types result in data streams with widely disparate rates, volumes, and velocities. These streams must be dynamically aggregated, transformed, and analyzed in order to perform in situ analysis, tasks that are infeasible for human operators.
Machine learning (ML) has been shown to be an effective tool for analyzing big IoT data streams. However, deploying and using ML models can require specialized methods, software, and environments. In addition, the deluge of data generated by IoT devices and the addition of new sensors can quickly exceed the processing capabilities of computing resources colocated with experimental apparatus. Achieving near-real-time analysis to enable online feedback can require the use of remote high performance computing (HPC) systems and specialized ML accelerators.
The Manufacturing Data and Machine Learning (MDML) platform standardizes the research and operational environment for advanced data analytics to enable automated, ML-driven optimization. MDML is designed to support in situ measurements for accelerating scalable materials manufacturing and to enable the integration of ML and HPC resources into the experimental process. MDML enables users to construct rich, data-oriented analysis pipelines that span disparate computational environments, from laptops and local servers to supercomputers and clouds. Finally, MDML leverages industry standard visualization and monitoring tools to create dynamic interfaces to experimental facilities and their processing pipelines.
We will demonstrate MDML and its application to the Manufacturing Engineering Research Facility’s (MERF) combustion synthesis research project, in which ML-guided steering is used to enable the high-throughput manufacturing of nanomaterials. In particular, we will showcase data being streamed from multiple combustion chamber sensors into MDML, and then processed in near real time to steer the experimental configuration. The demonstration will make use of local MERF servers and Argonne National Laboratory’s Theta supercomputer, while visualizing the state of the experiment and data processing in a user-friendly interface.
II The MDML Platform
The design of the MDML platform is motivated by the unique needs and challenging demands of MERF’s scale-up research and development. MDML is designed to enable scientists to dynamically integrate scientific IoT device data during high-throughput experiments to guide and automatically optimize experiments. As such, MDML simplifies the aggregation, analysis, and application of ML to IoT data streams with almost arbitrary data rates and volumes.
MDML uses Node-RED, designed for connected IoT assets, to integrate in situ measurements from the experimental stations at the MERF. MQ Telemetry Transport (MQTT) protocol message queues are used to deliver IoT data streams at disparate rates, which MDML then aggregates, preprocesses, and delivers to analysis tasks and ML models. Any number of sensors or instrument groups can be created inside MDML. Data sets are treated as time series data, and MDML’s data fusion capabilities enable diverse sources to be aggregated based on customizable batching rules. Thus, MDML is a powerful environment for reinforcement learning and for leveraging deep neural networks to sense key events and steer experiments.
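The idea of aggregating time series from sources with different rates under a batching rule can be illustrated with a minimal sketch. It is not MDML's implementation: the function name `fuse_by_window`, the fixed-window rule, the choice of the mean as the aggregate, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def fuse_by_window(streams, window_ms):
    """Group samples from several sensors into fixed time windows.

    streams: dict mapping sensor name -> list of (timestamp_ms, value) pairs.
    Returns one row per non-empty window, ordered by window start, holding
    the mean of each sensor's samples in that window (None if it had none).
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for sensor, samples in streams.items():
        for timestamp_ms, value in samples:
            window_index = timestamp_ms // window_ms
            buckets[window_index][sensor].append(value)

    rows = []
    for window_index in sorted(buckets):
        row = {"window_start_ms": window_index * window_ms}
        for sensor in streams:
            values = buckets[window_index].get(sensor)
            row[sensor] = mean(values) if values else None
        rows.append(row)
    return rows


# Hypothetical data: a fast sensor sampled every 50 ms, a slow one every second.
fast = [(i * 50, float(i)) for i in range(40)]  # 0 ms .. 1950 ms
slow = [(0, 10.0), (1000, 20.0)]
fused = fuse_by_window({"fast": fast, "slow": slow}, window_ms=1000)
```

Here each one-second window yields a single fused row, so a downstream analysis task sees the fast and slow sources on a common time base. Other batching rules (for example, count-based batches) would replace only the computation of `window_index`.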
MDML is designed as a portable environment that can be configured to use a wide variety of available computing resources, from edge to exascale, depending on computational demands and problem complexity. MDML can also use commercial cloud environments such as AWS. To achieve this, we have integrated the funcX function serving platform into MDML by implementing custom Node-RED palettes for funcX (and for Globus for data transfer) that can be used in MDML pipelines. We extend the MDML client with a Globus Auth native client to enable secure offloading of computational tasks to different computing resources. Globus Auth enables users to authenticate and to interact securely with MDML; MDML can then introspect the user’s tokens and acquire new tokens to request actions (such as data transfers and funcX invocations) on the user’s behalf.
Instrument data are packaged in envelopes when streamed to MDML via MQTT topics. These packages, which describe the experiment and data source, are used by MDML to route the data through the processing pipeline. Once processed, data are compressed and archived in a persistent object store. The archived datasets contain descriptions of the streamed data to inform collaborators and enable post situ analyses.
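The envelope, routing, and archiving steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than MDML's actual message format: the envelope fields, the `mdml/<experiment>/<device>` topic layout, the handler table, and the JSON-lines archive format are all hypothetical, and no real MQTT broker or object store is contacted.

```python
import gzip
import json


def make_envelope(experiment_id, device_id, payload, timestamp_s):
    """Wrap a reading with metadata naming the experiment and data source."""
    return {
        "experiment_id": experiment_id,
        "device_id": device_id,
        "timestamp_s": timestamp_s,
        "payload": payload,
    }


def topic_for(envelope):
    """Build an MQTT-style topic string (hypothetical layout) for an envelope."""
    return f"mdml/{envelope['experiment_id']}/{envelope['device_id']}"


def route(envelope, handlers):
    """Pass the payload to the handler registered for the envelope's device."""
    handler = handlers[envelope["device_id"]]
    return handler(envelope["payload"])


def archive(envelopes):
    """Serialize envelopes as JSON lines and gzip-compress the result."""
    lines = "\n".join(json.dumps(e, sort_keys=True) for e in envelopes)
    return gzip.compress(lines.encode("utf-8"))


def restore(blob):
    """Inverse of archive: decompress and parse the JSON lines."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


# Hypothetical usage: one PLIF reading checked by a simple threshold handler.
env = make_envelope("fsp_run_1", "plif", {"intensity": 0.82}, 1575072000.0)
handlers = {"plif": lambda payload: payload["intensity"] > 0.5}
result = route(env, handlers)
blob = archive([env])
```

Because the metadata travels with every message, the archived bytes remain self-describing: `restore(blob)` recovers both the readings and the experiment and device identifiers needed for later analysis.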
MDML provides dashboards as a gateway to data generated by an experiment and to support the building of analysis pipelines. MDML employs the industry-standard Grafana dashboard as a customizable platform to create dynamic and interactive visualization experiences. MDML users develop their own dashboards to visualize sensor measurements, the state of analysis tasks, and results as data are processed. This not only allows researchers to dissect their data in real time but also acts as a monitoring tool for critical experimental data.
Our demonstration will showcase MDML’s application to a combustion experiment for high-throughput manufacturing of nanomaterials at Argonne’s MERF. This experiment manufactures nanomaterials in high volumes using flame spray pyrolysis (FSP), a versatile process that allows for commodity-scale production of a broad range of nanomaterials. The FSP instrument includes multiple sensors: a Planar Laser Induced Fluorescence (PLIF) diagnostic system that uses a tunable laser light sheet to characterize flame chemistries, spectroscopy to determine the contents of the exhaust, and particle size distribution measurements of the resulting nanomaterials. These sensors generate data at vastly different volumes and rates. For example, PLIF data can be generated every 50 ms, whereas spectroscopy and particle size results are integrated every few minutes.
We will demonstrate the analysis pipeline used to perform near-real-time quality control of the flame’s stability. We will show how MDML enables scientists to both monitor the state of the experiment and steer the evolution of the flame. Using local resources for rapid quality control, HPC resources for large-scale analysis, and ML models to integrate diverse data types, we will use MDML to guide flame stability. Finally, we will visualize experiments in real time via an interactive interface built on the industry-standard Grafana platform.
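To make the rate disparity concrete, the short calculation below compares the two sensor classes. The 50 ms PLIF period comes from the text; the two-minute integration window is an assumed stand-in for "every few minutes" and is not a figure from the experiment.

```python
# PLIF period stated in the text.
plif_period_ms = 50

# Assumed integration window for spectroscopy and particle size results;
# the text says only "every few minutes".
integration_window_s = 120

# PLIF frames produced per second.
plif_rate_hz = 1000 / plif_period_ms  # 20.0

# PLIF frames that arrive while one slow result is being integrated.
frames_per_slow_result = integration_window_s * 1000 // plif_period_ms  # 2400
```

Under this assumption, thousands of PLIF frames arrive for every spectroscopy or particle size result, which is why the streams need to be batched onto a common time base before joint analysis.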
This work was supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357.
- (2016) Globus: recent enhancements and future plans. In XSEDE Conference, pp. 27.
- (2019) DLHub: model and data serving for science. In Intl Parallel and Distributed Processing Symp., pp. 283–292.
- (2019) Serverless supercomputing: high performance function as a service for science. arXiv preprint arXiv:1908.04907.
- (2017) Big IoT data analytics: architecture, opportunities, and open research challenges. IEEE Access 5, pp. 5247–5261.
- Node-RED (Website). https://nodered.org/. Accessed Nov 30, 2019.
- (2018) IoT for real-time measurement of high-throughput liquid dispensing in laboratory environments. SLAS Technology 23 (5), pp. 440–447.
- (2014) A survey of Internet-of-Things: future vision, architecture, challenges and services. In IEEE World Forum on Internet of Things (WF-IoT), pp. 287–292.
- (2010) Flame spray pyrolysis: an enabling technology for nanoparticles design and fabrication. Nanoscale 2 (8), pp. 1324–1347.
- (2016) Globus Auth: a research identity and access management platform. In 12th Intl Conf. on e-Science, pp. 203–212.