Metabolomics in the Cloud: Scaling Computational Tools to Big Data

04/04/2019
by Jianliang Gao, et al.

Background: Metabolomics datasets are becoming increasingly large and complex, with multiple types of algorithms and workflows needed to process and analyse the data. A cloud infrastructure with portable software tools can provide much-needed resources, enabling faster processing of much larger datasets than would be possible in any individual lab. The PhenoMeNal project has developed such an infrastructure, allowing users to run analyses on local or commercial cloud platforms. We have examined the computational scaling behaviour of the PhenoMeNal platform with two common metabolomics tools, using four different implementations across 1-1000 virtual CPUs (vCPUs). Results: Our results show that data which takes up to 4 days to process on a standard desktop computer can be processed in just 10 min on the largest cluster. Improved runtimes come at the cost of decreased efficiency, with all platforms falling below 80% efficiency at larger numbers of vCPUs. An economic analysis revealed that running on large-scale cloud platforms is cost-effective compared to traditional desktop systems. Conclusions: Overall, cloud implementations of PhenoMeNal show excellent scalability for standard metabolomics computing tasks on a range of platforms, making them a compelling choice for research computing in metabolomics.
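The trade-off between runtime and efficiency described in the abstract can be illustrated with the usual textbook definitions of speedup and parallel efficiency. The sketch below is not the authors' analysis code: the function names are hypothetical, and the 1-vCPU and 100-vCPU runtimes in the second example are made-up values chosen only to show what 80% efficiency means. The only figures taken from the abstract are the 4-day desktop runtime and the 10-minute cluster runtime.

```python
def speedup(t_ref: float, t_n: float) -> float:
    """Ratio of a reference runtime to the runtime on a larger system."""
    return t_ref / t_n


def parallel_efficiency(t_ref: float, n_ref: int, t_n: float, n: int) -> float:
    """Fraction of ideal scaling achieved when going from n_ref to n vCPUs.

    Ideal scaling means total vCPU-time (runtime * vCPUs) stays constant,
    so efficiency is the ratio of reference vCPU-time to observed vCPU-time.
    """
    return (t_ref * n_ref) / (t_n * n)


# Figures from the abstract: up to 4 days on a desktop vs. 10 min on the
# largest cluster. The desktop is not a 1-vCPU baseline, so only the raw
# speedup is computed here, not an efficiency.
desktop_minutes = 4 * 24 * 60
print(speedup(desktop_minutes, 10))

# Hypothetical runtimes (assumptions, not from the paper): 1000 min on
# 1 vCPU and 12.5 min on 100 vCPUs correspond to 80% efficiency.
print(parallel_efficiency(1000.0, 1, 12.5, 100))
```

Under these definitions, a platform can keep getting faster while its efficiency drops, because each added vCPU contributes less than the one before it.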

