CosmoHub: Interactive exploration and distribution of astronomical data on Hadoop

03/04/2020
by Pau Tallada, et al.

We present CosmoHub (https://cosmohub.pic.es), a web application based on Hadoop to perform interactive exploration and distribution of massive cosmological datasets. Modern cosmology seeks to unveil the nature of both dark matter and dark energy by mapping the large-scale structure of the Universe through the analysis of massive amounts of astronomical data, whose volume has grown steadily over recent decades, and will continue to grow, with the digitization and automation of experimental techniques. CosmoHub, hosted and developed at the Port d'Informació Científica (PIC), provides support to a worldwide community of scientists, without requiring the end user to know any Structured Query Language (SQL). It serves data from several large international collaborations such as the Euclid space mission, the Dark Energy Survey (DES), the Physics of the Accelerating Universe Survey (PAUS) and the Marenostrum Institut de Ciències de l'Espai (MICE) numerical simulations. While CosmoHub was originally developed as a web frontend to a PostgreSQL relational database, this work describes its current version, built on top of Apache Hive, which facilitates reading, writing and managing huge datasets at scale. As CosmoHub's datasets are seldom modified, Hive is a better fit. More than 60 TiB of catalogued information and 50 × 10^9 astronomical objects can be interactively explored using an integrated visualization tool which includes 1D histogram and 2D heatmap plots. In our current implementation, online exploration of datasets of 10^9 objects can be done on a timescale of tens of seconds. Users can also download customized subsets of data in standard formats, generated in a few minutes.
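
As a rough illustration of the 1D histogram exploration mentioned above, the following minimal sketch shows how an equal-width histogram could be expressed as a single SQL aggregation, so that only bin counts (not rows) leave the warehouse. It is not taken from CosmoHub: the table name `catalog`, the column `mag_i`, and both helper functions are hypothetical, and the query text is only assumed to be acceptable to a HiveQL-like engine.

```python
from collections import Counter
from math import floor


def histogram_query(table, column, lo, hi, bins):
    """Build an equal-width histogram query (hypothetical helper).

    The returned string groups rows by bin index and counts them;
    `table` and `column` are assumed to be trusted identifiers.
    """
    width = (hi - lo) / bins
    bin_expr = f"CAST(FLOOR(({column} - {lo}) / {width}) AS INT)"
    return (
        f"SELECT {bin_expr} AS bin, COUNT(*) AS n "
        f"FROM {table} "
        f"WHERE {column} >= {lo} AND {column} < {hi} "
        f"GROUP BY {bin_expr}"
    )


def histogram_local(values, lo, hi, bins):
    """Same binning rule applied in plain Python, for checking small samples."""
    width = (hi - lo) / bins
    counts = Counter(
        # Clamp guards against a floating-point result equal to `bins`.
        min(bins - 1, floor((v - lo) / width))
        for v in values
        if lo <= v < hi
    )
    return [counts.get(i, 0) for i in range(bins)]


if __name__ == "__main__":
    print(histogram_query("catalog", "mag_i", 18.0, 24.0, 6))
    print(histogram_local([0.1, 0.2, 0.6, 0.9, 1.5], 0.0, 1.0, 2))
```

A 2D heatmap could follow the same pattern by grouping on two bin expressions instead of one; whether the actual service does this is not stated in the abstract.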


