ClimSim: An open large-scale dataset for training high-resolution physics emulators in hybrid multi-scale climate simulators

06/14/2023 ∙ by Sungduk Yu, et al.

Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. As a consequence, critical processes such as storms are predicted inaccurately and imprecisely. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher-fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts for lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations developed by a consortium of climate scientists and ML researchers, and consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed so that the resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim) are released openly to support the development of hybrid ML-physics and high-fidelity climate simulations for the benefit of science and society.
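
To make the regression framing concrete, the following is a minimal sketch of a deterministic baseline that maps input vectors to output vectors and scores each output with R². It is not the ClimSim code or any of the paper's baselines: the data are synthetic stand-ins, and the vector sizes, the ridge penalty, the 80/20 split, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def add_bias(x):
    """Append a column of ones so the linear model can learn an offset."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_linear_baseline(x, y, ridge=1e-6):
    """Closed-form multi-output ridge regression (the penalty also touches the bias)."""
    xb = add_bias(x)
    gram = xb.T @ xb + ridge * np.eye(xb.shape[1])
    return np.linalg.solve(gram, xb.T @ y)


def predict(weights, x):
    """Apply the fitted linear map to new input vectors."""
    return add_bias(x) @ weights


def r2_per_output(y_true, y_pred):
    """Coefficient of determination, computed separately for each output variable."""
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for (input, output) vector pairs; sizes are hypothetical
# and far smaller than anything in the real dataset.
rng = np.random.default_rng(0)
n_samples, n_in, n_out = 2000, 12, 4
x = rng.normal(size=(n_samples, n_in))
true_w = rng.normal(size=(n_in, n_out))
y = x @ true_w + 0.1 * rng.normal(size=(n_samples, n_out))

# Hold out the last 20% of samples for scoring.
split = int(0.8 * n_samples)
weights = fit_linear_baseline(x[:split], y[:split])
scores = r2_per_output(y[split:], predict(weights, x[split:]))
print(scores)
```

Scoring each output variable separately, rather than averaging, keeps poorly emulated variables visible instead of letting well-predicted ones mask them.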


