Structuring spreadsheets with ObjTables enables data quality control, reuse, and integration

05/11/2020
by Jonathan R. Karr, et al.

A central challenge in science is to understand how the behaviors of systems emerge from complex networks. This often requires aggregating, reusing, and integrating heterogeneous information. Spreadsheets published as supplements to articles are a key data source. Spreadsheets are popular because they are easy to read and write. However, they are often difficult to reanalyze because they capture data ad hoc, without schemas that define the objects, relationships, and attributes that they represent. To help researchers reuse and compose spreadsheets, we developed ObjTables, a toolkit that makes spreadsheets both human- and machine-readable by combining them with schemas and an object-relational mapping system. ObjTables includes a format for schemas; markup for indicating the class and attribute represented by each spreadsheet and column; numerous data types for scientific information; and high-level software that uses schemas to read, write, validate, compare, merge, version, and analyze spreadsheets. By making spreadsheets easier to reuse, ObjTables could enable unprecedented secondary meta-analyses. By making it easy to build new formats and associated software for new types of data, ObjTables can also accelerate emerging scientific fields.
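The general idea of pairing a spreadsheet with a schema can be illustrated with a minimal sketch. It uses only the Python standard library and does not use the ObjTables software or reproduce its markup: the `Gene` class, the `SCHEMA` mapping, the `!!class=` and `!` markers, the `read_table` helper, and the sample rows are all hypothetical stand-ins chosen for illustration.

```python
import csv
import io
from dataclasses import dataclass


# Hypothetical schema: one class with three typed attributes.
@dataclass
class Gene:
    id: str
    name: str
    length_bp: int


# Hypothetical mapping from column heading to (attribute name, parser).
SCHEMA = {
    "Id": ("id", str),
    "Name": ("name", str),
    "Length (bp)": ("length_bp", int),
}


def read_table(text):
    """Parse a marked-up CSV table into Gene objects and a list of errors."""
    rows = list(csv.reader(io.StringIO(text)))
    # Row 1 declares the class; row 2 declares the attribute of each column.
    if not rows or not rows[0] or rows[0][0] != "!!class=Gene":
        raise ValueError("table does not declare class Gene")
    headings = [h.lstrip("!") for h in rows[1]]
    unknown = [h for h in headings if h not in SCHEMA]
    if unknown:
        raise ValueError(f"columns not in schema: {unknown}")

    objects, errors, seen_ids = [], [], set()
    for line_no, row in enumerate(rows[2:], start=3):
        try:
            values = {}
            for heading, cell in zip(headings, row):
                attr, parse = SCHEMA[heading]
                values[attr] = parse(cell)  # type check via the parser
            obj = Gene(**values)  # fails if a column is missing
        except (ValueError, TypeError) as exc:
            errors.append(f"row {line_no}: {exc}")
            continue
        # Uniqueness check on the identifier column.
        if obj.id in seen_ids:
            errors.append(f"row {line_no}: duplicate id {obj.id!r}")
            continue
        seen_ids.add(obj.id)
        objects.append(obj)
    return objects, errors


# Made-up example data: one valid row, one type error, one duplicate id.
TABLE = """!!class=Gene
!Id,!Name,!Length (bp)
g1,geneA,100
g2,geneB,not-a-number
g1,geneC,250
"""

genes, problems = read_table(TABLE)
```

In this sketch the markup rows tell the reader which class and attributes the table represents, and the schema supplies the type and uniqueness checks that an ad hoc spreadsheet lacks. A real toolkit would also handle relationships between classes, which this sketch leaves out.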

research
05/11/2020

ObjTables: structured spreadsheets that promote data quality, reuse, and integration

A central challenge in science is to understand how systems behaviors em...
research
05/17/2022

Subdivisions and Crossroads: Identifying Hidden Community Structures in a Data Archive's Citation Network

Data archives are an important source of high quality data in many field...
research
06/08/2020

Lethe: A Tunable Delete-Aware LSM Engine

Data-intensive applications fueled the evolution of log structured merge...
research
09/30/2020

t-Resilient k-Immediate Snapshot and its Relation with Agreement Problems

An immediate snapshot object is a high level communication object, built...
research
12/15/2021

or2yw: Modeling and Visualizing OpenRefineHistories as YesWorkflow Diagrams

OpenRefine is a popular open-source data cleaning tool. It allows users ...
research
07/15/2021

Improving I/O Performance for Exascale Applications through Online Data Layout Reorganization

The applications being developed within the U.S. Exascale Computing Proj...
research
06/17/2018

Utilizing Provenance in Reusable Research Objects

Science is conducted collaboratively, often requiring the sharing of kno...
