Managed Geo-Distributed Feature Store: Architecture and System Design

05/31/2023
by Anya Li, et al.

Companies are using machine learning to solve real-world problems and are developing hundreds to thousands of features in the process. They build feature engineering pipelines as part of the MLOps life cycle to transform data from various sources and materialize the resulting features for future consumption. Without feature stores, teams across different business groups would maintain this process independently, which can lead to conflicting and duplicated features in the system. Data scientists find it hard to search for and reuse existing features, and maintaining version control is painful. Furthermore, feature correctness violations related to online (inference) versus offline (training) skew and data leakage are common. Although the machine learning community has extensively discussed the need for feature stores and their purpose, this paper aims to capture the core architectural components that make up a managed feature store and to share the design lessons learned in building such a system.
