Independently of the rise of web technologies, another emerging trend is the rapid adoption of containerization. Containers enable compute portability in much the same sense as data portability: just as data can be moved from place to place, containerization allows the operations on those data to be moved as well.
To our knowledge, no web-based platform currently provides data- and compute-agnostic services, in particular the collection, management, and real-time sharing of medical data together with access to pipelines that process those data. Some services, such as CBRAIN and LONI, take conceptually similar approaches but lack deep connectivity to typical hospital database repositories. In this paper, we introduce CHIPS (Cloud Healthcare Image Processing Service), a novel web-based service for medical data storage and data-processing workflows that enforces strict data security while facilitating secure, real-time interactive collaboration over the Internet and internal intranets.
CHIPS can seamlessly collect data from typical hospital sources (such as Picture Archiving and Communication Systems, PACS) and easily export them to approved cloud storage. CHIPS not only manages data collection and organization, but also provides a large (and expanding) library of pipelines to analyze imported data, and its containerized compute can execute on a wide variety of remote resources. CHIPS keeps a persistent record of activity in feeds, supports the management of that activity, and provides powerful data visualization. In particular, for in-browser rendering and visualization of medical image data it uses the popular XTK toolkit, which was also developed by our team at the Fetal-Neonatal Neuroimaging and Developmental Science Center, Boston Children’s Hospital (http://fnndsc.babymri.org), and which can be freely downloaded from the web (http://goxtk.com).
2 Architectural Overview
The creation of CHIPS has been motivated by both clinical and research needs. On the clinical side, CHIPS was built to give clinicians easy access to large amounts of data (especially from hospital image databases such as Picture Archiving and Communication Systems, PACS), to support powerful collaboration, and to provide easy access to a library of analysis processes or pipelines. On the research side, CHIPS was designed to let computational researchers develop and test new image processing algorithms across heterogeneous platforms, while allowing life science researchers to focus on their research protocols and data processing without spending time on the minutiae of performing the analysis itself.
The system design is highly distributed: Figure 1 shows a CHIPS deployment connected to multiple input sources and multiple compute sources. Although the figure suggests a single, discrete central point, components of CHIPS actually reside at each input (PACS) and compute location.
2.2 Distributed Component Design
Architecturally, CHIPS is not a single monolithic system but a distributed collection of interconnected components (see Figure 2). These include a front-end web server and web-based UI; a core RESTful back-end central server that provides access to all data, feeds, users, etc.; a DICOM/PACS interface; a set of independent RESTful microservices that handle inter-network data I/O as well as remote process management; and a core cloud-based computational platform that orchestrates the offloading of image processing pipelines to remote cloud-based compute.
The top (red) box of Figure 2 contains the PACS node and represents the hospital image data repository. The second (blue) box, labeled Web-entry point and data hosting node, contains the main CHIPS back end and is depicted as residing in a “cloud” (i.e., some resource that is accessible from the Internet). Finally, the bottom (yellow) box is drawn in a separate “cloud” to emphasize that it is topologically distinct from the web-entry point.
The logical relationships between data (rectangles containing a tree structure) and compute elements (named hexagons) are shown by either data connectors (thick blue arrows) or control connections (single-line arrows). In the notation of the diagram, the stylized cloud icon touching some of the boxes denotes that these compute elements are controlled through a REST API, while the sphere icon denotes web access.
A remote compute is denoted by plugin, which is controlled by a manage component. In the most abstract sense, the plugin processes an input data structure and outputs a transformed data structure (the two tree graphs shown). File transfer between the data cloud and the compute cloud is performed by the file IO handler component. A query/retrieve process in the data cloud connects to an authentication process, auth, in the hospital network, while on-the-fly anonymization of DICOM images is handled by the anonymizer process, anon. Finally, the dispatcher determines which compute node (or cloud) is best suited to the analysis at hand. The circle icon attached to the manage and plugin icons indicates that the attached process can provide real-time feedback about the controlled process to other software agents via its own REST interface.
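The selection policy of the dispatcher is not specified here. As an illustration only, the following Python sketch assumes a simple policy: discard compute nodes that are offline or cannot satisfy a job's core and GPU requirements, then pick the node with the most free cores. The classes `ComputeNode` and `JobRequest` and the function `dispatch` are hypothetical names introduced for this sketch, not part of the CHIPS code base.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ComputeNode:
    """Hypothetical description of a remote compute resource."""
    name: str
    free_cores: int
    has_gpu: bool
    online: bool = True


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical resource requirements of one plugin run."""
    cores: int
    needs_gpu: bool = False


def dispatch(job: JobRequest, nodes: Sequence[ComputeNode]) -> Optional[ComputeNode]:
    """Return the node best suited for ``job``, or None if no node qualifies.

    Assumed policy: keep online nodes that satisfy the core and GPU
    requirements, then prefer the node with the most free cores.
    """
    candidates = [
        n for n in nodes
        if n.online
        and n.free_cores >= job.cores
        and (n.has_gpu or not job.needs_gpu)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.free_cores)


if __name__ == "__main__":
    nodes = [
        ComputeNode("local-cluster", free_cores=8, has_gpu=False),
        ComputeNode("cloud-gpu", free_cores=4, has_gpu=True),
        ComputeNode("offline-node", free_cores=64, has_gpu=True, online=False),
    ]
    print(dispatch(JobRequest(cores=2), nodes).name)                  # local-cluster
    print(dispatch(JobRequest(cores=2, needs_gpu=True), nodes).name)  # cloud-gpu
    print(dispatch(JobRequest(cores=16), nodes))                      # None
```

A real dispatcher would likely also weigh data locality and transfer cost, since moving large image sets between clouds can dominate the run time; the sketch leaves that out for brevity.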
2.3 Pervasive containerization
CHIPS is designed as a distributed system, and its underlying components are containerized (currently using Docker, https://www.docker.com). In Figure 2, the main CHIPS web interface and its associated back-end database are housed within a single container (https://github.com/FNNDSC/ChRIS_ultron_backEnd). Input data and processed results reside on the hosting node and are volume-mapped into this back end as appropriate. Other components of CHIPS in the web-entry node are similarly containerized. These include the manage block (https://github.com/FNNDSC/pman), which is responsible for spawning processes on the underlying system. Not only does manage provide the means to start and stop processes, but it also tracks the execution state, termination state, and standard output/error streams of each process. The manage component has a REST interface through which clients can start, stop, and query processes.
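To make the role of manage more concrete, the following Python sketch shows a toy, in-memory process manager that starts, stops, and queries local processes and records their execution state, termination state, and output streams. It is not the pman implementation or its REST API: `ProcessManager`, `Job`, and the fields of the returned status dictionary are hypothetical, and in CHIPS such operations are reached over HTTP rather than as method calls.

```python
import subprocess
import sys
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    """Book-keeping for one spawned process (hypothetical structure)."""
    cmd: List[str]
    proc: subprocess.Popen
    stdout: Optional[str] = None  # filled in once the process has been reaped
    stderr: Optional[str] = None
    stopped: bool = False         # True if terminated on request


class ProcessManager:
    """Toy, in-memory stand-in for the manage component (NOT pman's API)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def start(self, cmd: List[str]) -> str:
        """Spawn ``cmd`` and return an opaque job identifier."""
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = Job(cmd=cmd, proc=proc)
        return job_id

    def wait(self, job_id: str, timeout: Optional[float] = None) -> None:
        """Block until the job exits, then capture its output streams once."""
        job = self._jobs[job_id]
        if job.stdout is None:
            job.stdout, job.stderr = job.proc.communicate(timeout=timeout)

    def stop(self, job_id: str) -> None:
        """Terminate the job if it is still running, then reap it."""
        job = self._jobs[job_id]
        if job.proc.poll() is None:
            job.proc.terminate()
            job.stopped = True
        self.wait(job_id)

    def status(self, job_id: str) -> dict:
        """Report execution state, termination state, and captured output."""
        job = self._jobs[job_id]
        code = job.proc.poll()
        if code is None:
            state = "running"
        elif job.stopped:
            state = "stopped"
        else:
            state = "finished"
        return {"cmd": job.cmd, "state": state, "returncode": code,
                "stdout": job.stdout, "stderr": job.stderr}


if __name__ == "__main__":
    pm = ProcessManager()
    jid = pm.start([sys.executable, "-c", "print('hello from plugin')"])
    pm.wait(jid, timeout=30)
    print(pm.status(jid))
    long_id = pm.start([sys.executable, "-c", "import time; time.sleep(60)"])
    pm.stop(long_id)
    print(pm.status(long_id)["state"])  # stopped
```

A production service would stream output to files or logs rather than buffering it in pipes, because a process that writes a large amount of output before `wait` is called can block on a full pipe.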
Also containerized are the IO component (https://github.com/FNNDSC/pfioh), which can transfer entire directory trees across network boundaries from one system to another, and the dispatch component (https://github.com/FNNDSC/swarm), which can orchestrate multiple processing jobs handled by manage. The plugin container houses the particular compute to perform on a given set of data, and is spawned by the manage component under the direction of dispatch. Since the compute typically runs on a system separate from the data hosting node, the IO containers transmit the data to this compute system and retrieve the resulting data back to the data node, allowing the web container to present (and visualize) the results to the user.
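The wire protocol used by the IO component is not described here. One plausible way to move an entire directory tree across a network boundary, assumed purely for illustration, is to serialize it into a single zip archive, send the bytes, and unpack them on the far side. In this hypothetical Python sketch, `pack_tree` and `unpack_tree` stand in for the sending and receiving ends, and the network hop itself is omitted.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def pack_tree(root: Path) -> bytes:
    """Serialize every file under ``root`` into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to root so the receiver can re-root the tree.
                zf.write(path, arcname=path.relative_to(root).as_posix())
    return buf.getvalue()


def unpack_tree(payload: bytes, dest: Path) -> None:
    """Recreate a tree produced by ``pack_tree`` underneath ``dest``."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        zf.extractall(dest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        src_root = Path(src)
        (src_root / "series01").mkdir()
        (src_root / "series01" / "slice001.dcm").write_bytes(b"\x00" * 16)
        payload = pack_tree(src_root)  # these bytes would cross the network
        unpack_tree(payload, Path(dst) / "incoming")
        # prints ['incoming', 'incoming/series01', 'incoming/series01/slice001.dcm']
        print(sorted(p.relative_to(dst).as_posix() for p in Path(dst).rglob("*")))
```

Storing relative paths lets the receiving side place the data wherever its volume mapping requires. Note that this sketch does not preserve empty directories.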
3 UI Considerations
Figure 3 shows the home page displayed after logging into the system. Studies that have been “sent” to CHIPS each appear in their own “card” on the user’s home page, with a small visualization of a representative image set from the study. Various controls on this home page allow users to organize and tag cards into specific projects (or folders), remove cards, bookmark them for easy access, etc. New cards can be generated by clicking on the +⃝ icon and choosing an activity (such as PACS Query/Retrieve), and any card can be seamlessly shared with other users of the system.
On selecting a given feed, the core image data in that feed are visualized in a rich, web-based viewer (see Figure 4). Various tabs and elements of the feed view provide different perspectives on the data and allow users to add annotations, notes, or comments. As on the home page, a +⃝ icon is present; selecting it opens a ribbon of “plugins” (or “apps”) that can be run on the data contained in the feed. For example, a plugin might perform tissue segmentation and reconstruction of the brain surface (as a FreeSurfer plugin does).
The interface semantics within a feed are straightforward: a user clicks on the feed and enters the top-level data view. Once a plugin chosen via the +⃝ icon is applied, the feed data are processed accordingly. When the plugin completes, its output files are organized in the feed in a logical tree view (accessible via the “Data” tab on the left), in a manner akin to an email thread. In this way, a thread of execution of the form data → plugin → data is defined, in effect building a workflow.
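Threading each plugin's output beneath the data it consumed amounts to a tree: the root is the feed's initial data, every other node is a plugin run on its parent's output, and the path from the root to any node is the workflow that produced that node's data. The following Python sketch models this under that assumption; `PluginInstance`, its methods, and the plugin names in the example are hypothetical and do not reflect the actual CHIPS data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison: two runs of one plugin are distinct nodes
class PluginInstance:
    """One node in a feed's tree: a plugin run and the data it produced.

    The root node stands for the feed's initial (e.g. PACS-retrieved) data.
    """
    plugin: str
    parent: Optional[PluginInstance] = field(default=None, repr=False)
    children: List[PluginInstance] = field(default_factory=list)

    def apply(self, plugin: str) -> PluginInstance:
        """Run ``plugin`` on this node's output and thread the result beneath it."""
        child = PluginInstance(plugin, parent=self)
        self.children.append(child)
        return child

    def workflow(self) -> List[str]:
        """Return the chain of plugins from the feed root down to this node."""
        chain: List[str] = []
        node: Optional[PluginInstance] = self
        while node is not None:
            chain.append(node.plugin)
            node = node.parent
        return list(reversed(chain))


if __name__ == "__main__":
    feed = PluginInstance("pacs_pull")
    fs = feed.apply("freesurfer")
    stats = fs.apply("volume_stats")
    render = fs.apply("surface_render")
    print(stats.workflow())   # ['pacs_pull', 'freesurfer', 'volume_stats']
    print(render.workflow())  # ['pacs_pull', 'freesurfer', 'surface_render']
```

Because branches share their common prefix, re-running a downstream plugin with different settings only adds a sibling node rather than duplicating the upstream processing.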
Any visualized image can also be shared in real time using collaboration features built into the viewer library, which leverage the Google Drive API and the Google Realtime API.
4 Big Data Infrastructure
An important aim of CHIPS is to create a foundation suitable for future support of “data mining”. Recently, the term Big Data has come into common parlance, especially in the context of informatics [13, 16, 7]. Despite the word “Big”, the concept often refers to the use of predictive analytics and other advanced data analytics tools that extract meaning from data sets, and does not necessarily refer to the size of the data set.
In healthcare, big data analytics has had an impact in specific areas such as clinical risk intervention, the reduction of waste and care variability, and automated reporting. Biomedical imaging as a field, however, has not especially benefited from big data approaches, owing to the unstructured nature of image data, the complexity of analysis results in terms of data formats (again, usually unstructured), and basic quality issues such as noise in image acquisitions.
CHIPS constructs a framework that allows big data methods to be applied to imaging data. The incoming source data are DICOM images, which by their nature carry a large amount of metadata, most of which is not protected health information (PHI) and is left unchanged by the anonymization process. Information about the scanning/imaging protocol and acquisition parameters, as well as certain non-PHI demographics such as patient sex and age, can be meaningfully stored in a database. Moreover, applying an analysis pipeline to an image data set can in turn produce large amounts of meaningful data that can be stored and associated with the incoming source data. For example, FreeSurfer, which is dockerized as a CHIPS plugin, produces volumetric segmentations and surface reconstructions from raw T1-weighted MRI input data [3, 5, 1].
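As a minimal sketch of the kind of anonymization described above, the following Python function drops identifying attributes from a DICOM header (modeled here as a plain dictionary keyed by DICOM attribute keywords) while keeping non-PHI metadata such as PatientSex, PatientAge, and acquisition parameters. The PHI list is an illustrative subset, not a complete de-identification profile, and the keyed-hash `PseudonymID` entry is a hypothetical addition that lets studies of one subject be linked without exposing the original identifier. This is not a description of the CHIPS anonymizer.

```python
import hashlib
import hmac
from typing import Dict, Mapping

# Illustrative subset of identifying attributes (DICOM keywords). This is NOT a
# complete de-identification profile; DICOM PS3.15 Annex E defines one.
PHI_KEYWORDS = frozenset({
    "PatientName", "PatientID", "PatientBirthDate", "PatientAddress",
    "ReferringPhysicianName", "InstitutionName", "AccessionNumber",
})


def anonymize(header: Mapping[str, str], secret: bytes) -> Dict[str, str]:
    """Drop identifying attributes but keep non-PHI metadata for databasing.

    A keyed hash (HMAC-SHA256) of PatientID is stored under the hypothetical
    key "PseudonymID" so that studies of the same subject remain linkable.
    """
    clean = {k: v for k, v in header.items() if k not in PHI_KEYWORDS}
    if "PatientID" in header:
        digest = hmac.new(secret, header["PatientID"].encode(), hashlib.sha256)
        clean["PseudonymID"] = digest.hexdigest()[:16]
    return clean


if __name__ == "__main__":
    hdr = {"PatientName": "DOE^JANE", "PatientID": "123456", "PatientSex": "F",
           "PatientAge": "034Y", "MagneticFieldStrength": "3", "EchoTime": "2.3"}
    print(anonymize(hdr, secret=b"site-specific-key"))
```

Using a keyed hash rather than a plain hash matters here: without the secret, an attacker cannot recompute pseudonyms from guessed patient identifiers.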
Figure 5 shows the raw input DICOM data (purple block) and the processed output derived from them (green block). A DICOM tag extraction process extracts the image metadata and associates this information with the particular image record; because DICOM metadata is regularly formatted, it is easily extracted. For the output data, assuming for example a 3D surface reconstruction and tables of brain parcellation volumes, a structured analysis process regularizes all of this information into metadata that is added to the data associated with this image record. This processing lays the groundwork for data analytics to explore relations between (for example) input acquisition parameters and pipeline output results, or simply to mine across output results for hidden trends in data trajectories (for example, volumetric changes with age or sex).
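The extraction and regularization steps of Figure 5 can be sketched in Python as merging extracted DICOM tags with parsed pipeline output into one flat record per image, after which simple queries can mine across records. The two-column CSV volume table (`structure,volume_mm3`), the `vol_` key prefix, and the function names are assumptions made for illustration; real FreeSurfer statistics files use their own format and would need a dedicated parser. The volumes in the example are made-up numbers, not measurements.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def parse_volume_table(text: str) -> Dict[str, float]:
    """Parse a hypothetical two-column CSV table (structure,volume_mm3)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["structure"]: float(row["volume_mm3"]) for row in reader}


def build_record(dicom_meta: Dict[str, str], volume_table: str) -> Dict[str, object]:
    """Merge extracted DICOM tags and regularized pipeline output into one flat record."""
    record: Dict[str, object] = dict(dicom_meta)
    for structure, vol in parse_volume_table(volume_table).items():
        record[f"vol_{structure}"] = vol
    return record


def mean_by(records: List[Dict[str, object]], group_key: str,
            value_key: str) -> Dict[str, float]:
    """Average ``value_key`` over records grouped by ``group_key`` (e.g. volume by sex)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for r in records:
        if group_key in r and value_key in r:
            groups[str(r[group_key])].append(float(r[value_key]))
    return {g: mean(vals) for g, vals in groups.items()}


if __name__ == "__main__":
    header = "structure,volume_mm3\n"
    records = [
        build_record({"PatientSex": "F", "MagneticFieldStrength": "3"},
                     header + "Left-Hippocampus,4000\nRight-Hippocampus,4100\n"),
        build_record({"PatientSex": "M", "MagneticFieldStrength": "1.5"},
                     header + "Left-Hippocampus,4400\nRight-Hippocampus,4500\n"),
        build_record({"PatientSex": "F", "MagneticFieldStrength": "1.5"},
                     header + "Left-Hippocampus,3800\nRight-Hippocampus,3900\n"),
    ]
    print(mean_by(records, "PatientSex", "vol_Left-Hippocampus"))  # {'F': 3900.0, 'M': 4400.0}
```

Grouping by `MagneticFieldStrength` instead of `PatientSex` would give the acquisition-parameter view mentioned above, using the same function.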
5 Conclusion and Future Directions
CHIPS is a distributed system that provides a single cloud-based access point to a large family of services. These include: (a) securely accessing medical image data from participating institutions, with authenticated access and built-in anonymization of collected image data; (b) organizing collected data in a modern UI that allows for easy data management and sharing; (c) processing images by dispatching data to remote clouds and controlling/managing remote execution on these resources; (d) powerful real-time collaboration on images using secure third-party services (such as the Google Realtime API); and (e) intuitively constructing medical image processing workflows. CHIPS is not only a medical data management system; it also strives to improve the quality of healthcare by allowing clinical users to easily perform value-added processing and to share data and information. Current and future directions for CHIPS include facilitating the construction of big data frameworks and allowing users to easily construct experiments for data analytics and various machine learning pipelines.
All analysis and development involving the CHIPS system at Boston Children’s Hospital was conducted under the relevant Institutional Review Board approval, which governed access to image data and controlled the scope of sharing of such data.
- FreeSurfer. http://surfer.nmr.mgh.harvard.edu/
- Dale, A.M., Fischl, B., Sereno, M.I.: Cortical surface-based analysis I: Segmentation and surface reconstruction. NeuroImage 9, 179–194 (1999)
- Eckersley, P., Egan, G.F., De Schutter, E., Yiyuan, T., Novak, M., Sebesta, V., Matthiessen, L., Jaaskelainen, I.P., Ruotsalainen, U., Herz, A.V., et al.: Neuroscience data and tool sharing. Neuroinformatics 1(2), 149–165 (2003)
- Fischl, B., Sereno, M.I., Dale, A.M.: Cortical surface-based analysis II: Inflation, flattening, and a surface-based coordinate system. NeuroImage 9, 195–207 (1999)
- Ginsburg, D., Gerhard, S., Calle, J.E.C., Pienaar, R.: Realtime visualization of the connectome in the browser using WebGL. Frontiers in Neuroinformatics (2011)
- Greene, C.S., Tan, J., Ung, M., Moore, J.H., Cheng, C.: Big data bioinformatics. Journal of Cellular Physiology 229(12), 1896–1900 (2014), http://dx.doi.org/10.1002/jcp.24662
- Haehn, D., Rannou, N., Ahtam, B., Grant, E., Pienaar, R.: Neuroimaging in the browser using the X Toolkit. In: Frontiers in Neuroinformatics Conference Abstract: 5th INCF Congress of Neuroinformatics (Munich) (2014)
- Haehn, D., Rannou, N., Grant, P.E., Pienaar, R.: Slice:Drop: Collaborative medical imaging in the browser. In: ACM SIGGRAPH 2013 Computer Animation Festival. pp. 1–1. SIGGRAPH ’13, ACM, New York, NY, USA (2013), http://doi.acm.org/10.1145/2503541.2503645
- Millan, J., Yunda, L.: An open-access web-based medical image atlas for collaborative medical image sharing, processing, web semantic searching and analysis with uses in medical training, research and second opinion of cases. Nova 12(22), 143–150 (2014)
- Mwalongo, F., Krone, M., Reina, G., Ertl, T.: State-of-the-art report in web-based visualization. In: Computer Graphics Forum. vol. 35, pp. 553–575. Wiley Online Library (2016)
- Provost, F., Fawcett, T.: Data science and its relationship to big data and data-driven decision making. Big Data 1(1), 51–59 (Feb 2013), http://dx.doi.org/10.1089/big.2013.1508
- Rex, D.E., Ma, J.Q., Toga, A.W.: The LONI Pipeline Processing Environment. NeuroImage 19(3), 1033–1048 (Jul 2003), http://www.hubmed.org/display.cgi?uids=12880830
- Sherif, T., Rioux, P., Rousseau, M.E., Kassis, N., Beck, N., Adalat, R., Das, S., Glatard, T., Evans, A.C.: CBRAIN: a web-based, distributed computing platform for collaborative neuroimaging research. Frontiers in Neuroinformatics 8 (2014)
- Swan, M.: The quantified self: Fundamental disruption in big data science and biological discovery. Big Data 1(2), 85–99 (Jun 2013), http://dx.doi.org/10.1089/big.2012.0002
- Wood, D., King, M., Landis, D., Courtney, W., Wang, R., Kelly, R., Turner, J.A., Calhoun, V.D.: Harnessing modern web application technology to create intuitive and efficient data visualization and sharing tools. Frontiers in Neuroinformatics 8, 71 (2014)