From Data Processes to Data Products: Knowledge Infrastructures in Astronomy

09/03/2021
by Christine L. Borgman, et al.

We explore how astronomers take observational data from telescopes, process them into usable scientific data products, curate them for later use, and reuse data for further inquiry. Astronomers have invested heavily in knowledge infrastructures: robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds. Drawing upon a decade of interviews and ethnography, this article compares how three astronomy groups capture, process, and archive data, and for whom. The Sloan Digital Sky Survey is a mission with a dedicated telescope and instruments, while the Black Hole Group and Integrative Astronomy Group (both pseudonyms) are university-based, investigator-led collaborations. Findings are organized into four themes: how these projects develop and maintain their workflows; how they capture and archive their data; how they maintain and repair knowledge infrastructures; and how they use and reuse data products over time. We found that astronomers encode their research methods in software known as pipelines. Algorithms help to point telescopes at targets, remove artifacts, calibrate instruments, and accomplish myriad validation tasks. Observations may be reprocessed many times to become new data products that serve new scientific purposes. Knowledge production in the form of scientific publications is the primary goal of these projects, but the projects vary in their incentives and resources to sustain access to their data products. We conclude that software pipelines are essential components of astronomical knowledge infrastructures, but are fragile, difficult to maintain and repair, and often invisible. Reusing data products is fundamental to the science of astronomy, whether or not those resources are made publicly available. We make recommendations for sustaining access to data products in scientific fields such as astronomy.
