On Designing Data Models for Energy Feature Stores

05/09/2022
by Gregor Cerar, et al.

The digitization of the energy infrastructure enables new data-driven applications, often supported by machine learning (ML) models. However, domain-specific data transformations, pre-processing, and data management in modern data-driven pipelines have yet to be addressed. In this paper, we perform a first study of data models, energy feature engineering, and feature management solutions for developing ML-based energy applications. We first propose a taxonomy for designing data models suitable for energy applications, then analyze feature engineering techniques that transform the data model into features suitable for ML model training, and finally analyze available designs for feature stores. Using a short-term forecasting dataset, we show how designing richer data models and engineering the features benefit the performance of the resulting models. Finally, we benchmark three complementary feature management solutions, including an open-source feature store.
