Detect, Distill and Update: Learned DB Systems Facing Out of Distribution Data

10/11/2022
by   Meghdad Kurmanji, et al.

Machine Learning (ML) is changing DBs, as many DB components are being replaced by ML models. One open problem in this setting is how to update such ML models in the presence of data updates. We start this investigation by focusing on data insertions, which dominate updates in analytical DBs. We study how to update neural network (NN) models when new data follows a different distribution (i.e., it is "out-of-distribution", or OOD), rendering previously trained NNs inaccurate. A requirement in our problem setting is that learned DB components ensure high accuracy for tasks on both old and new data, e.g., for approximate query processing (AQP), cardinality estimation (CE), and synthetic data generation (DG). This paper proposes a novel updatability framework, DDUp. DDUp can provide updatability for different learned DB system components, even ones based on different NNs, without the high cost of retraining the NNs from scratch. DDUp comprises two components: first, a novel, efficient, and principled statistical-testing approach to detect OOD data; second, a novel model-updating approach, grounded in the principles of transfer learning with knowledge distillation, that updates learned models efficiently while still ensuring high accuracy. We develop and showcase DDUp's applicability for three different learned DB components, AQP, CE, and DG, each employing a different type of NN. A detailed experimental evaluation using real and benchmark datasets for AQP, CE, and DG demonstrates DDUp's performance advantages.
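The abstract does not spell out the statistical test or the distillation objective, so the following is only a minimal sketch of the general "detect, then distill" pattern under stated assumptions, not DDUp's actual procedure. It assumes that OOD detection compares the model's mean loss on newly inserted data against a bootstrap band built from losses on old data, and that the update uses a generic classification-style distillation loss that mixes a hard-label term with a softened teacher term. The names `ood_test`, `distillation_loss`, `n_sigma`, `alpha`, and `temperature` are hypothetical and chosen for illustration.

```python
import numpy as np

def ood_test(old_losses, new_losses, n_boot=1000, n_sigma=2.0, seed=0):
    """Assumed test: flag the new batch as OOD if its mean loss lies outside
    a bootstrap band of mean losses computed on old data."""
    rng = np.random.default_rng(seed)
    old_losses = np.asarray(old_losses, dtype=float)
    new_losses = np.asarray(new_losses, dtype=float)
    m = len(new_losses)
    # Bootstrap the mean loss over old-data samples of the same size as the new batch.
    boot_means = np.array([
        rng.choice(old_losses, size=m, replace=True).mean()
        for _ in range(n_boot)
    ])
    mu, sigma = boot_means.mean(), boot_means.std(ddof=1)
    return bool(abs(new_losses.mean() - mu) > n_sigma * sigma)

def softmax(z, temperature=1.0):
    """Numerically stable softmax over the last axis, with optional temperature."""
    z = np.asarray(z, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=2.0):
    """Assumed objective: alpha * hard-label cross-entropy
    + (1 - alpha) * cross-entropy between softened teacher and student outputs."""
    labels = np.asarray(labels)
    p_student = softmax(student_logits)
    hard = -np.log(p_student[np.arange(len(labels)), labels] + 1e-12).mean()
    q_teacher = softmax(teacher_logits, temperature)
    p_soft = softmax(student_logits, temperature)
    soft = -(q_teacher * np.log(p_soft + 1e-12)).sum(axis=-1).mean()
    return alpha * hard + (1 - alpha) * soft

# Usage: losses of the old model on old data vs. a shifted batch of new data.
rng = np.random.default_rng(1)
old = rng.normal(1.0, 0.1, size=5000)
shifted = rng.normal(2.0, 0.1, size=500)
if ood_test(old, shifted):
    # In a real system this is where the distillation-based update would run,
    # with the old model as teacher and the updated model as student.
    teacher = np.array([[10.0, 0.0], [0.0, 10.0]])
    print(distillation_loss(teacher, teacher, labels=[0, 1]))
```

In this sketch the old model plays the teacher, so the soft term pulls the updated model toward its previous behavior on old data while the hard term fits the new data; `alpha` trades off the two. Many distillation setups also scale the soft term by the squared temperature, which is omitted here for brevity.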


