FlashP: An Analytical Pipeline for Real-time Forecasting of Time-Series Relational Data

01/09/2021
by Shuyuan Yan, et al.

Interactive response time is important in analytical pipelines because it lets users explore a sufficient number of possibilities and make informed business decisions. We consider a forecasting pipeline over large volumes of high-dimensional time-series data. Real-time forecasting can be conducted in two steps. First, we specify the part of the data to focus on and the measure to be predicted by slicing, dicing, and aggregating the data. Second, a forecasting model is trained on the aggregated results to predict the trend of the specified measure. While a number of forecasting models are available, the first step is the performance bottleneck. A natural idea is to use sampling to obtain approximate aggregations in real time as the input for training the forecasting model. Our scalable real-time forecasting system, FlashP (Flash Prediction), is built on this idea, and this paper resolves two major challenges: first, how approximate aggregations affect the fitting of forecasting models and the forecasting results; and second, accordingly, which sampling algorithms to use to obtain these approximate aggregations and how large the samples should be. We introduce a new sampling scheme, called GSW sampling, and analyze error bounds for estimating aggregations using GSW samples. We also show how to construct compact GSW samples in the presence of multiple measures to be analyzed. Finally, we conduct experiments on real data to evaluate our solution and compare it with alternatives.
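The two-step pipeline described in the abstract (slice and aggregate, then fit a forecasting model on the aggregates, with sampling used to speed up the first step) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not FlashP's implementation: the fact table, its `region`/`sales` columns, and helpers such as `threshold_sample` are hypothetical; the sampler is plain threshold (probability-proportional-to-size) Poisson sampling with Horvitz-Thompson reweighting, swapped in for the paper's GSW scheme; and the forecasting model is an ordinary least-squares linear trend standing in for whatever model is actually trained.

```python
import random
from collections import defaultdict

# Hypothetical fact table: rows of (day, region, sales). Synthetic, not from the paper.
def make_rows(days=30, rows_per_day=500, seed=0):
    rng = random.Random(seed)
    rows = []
    for day in range(days):
        for _ in range(rows_per_day):
            region = rng.choice(["east", "west"])
            # Exponential noise around a mean that grows linearly with the day.
            sales = rng.expovariate(1.0) * (10 + 0.5 * day)
            rows.append((day, region, sales))
    return rows


def threshold_sample(rows, tau, seed=1):
    """Keep each row independently with probability min(1, sales / tau).

    This is threshold sampling, NOT the paper's GSW scheme. Larger values are
    more likely to be kept, and each kept row stores its inclusion probability
    so that aggregates can be reweighted later.
    """
    rng = random.Random(seed)
    sample = []
    for day, region, sales in rows:
        p = min(1.0, sales / tau)
        if rng.random() < p:
            sample.append((day, region, sales, p))
    return sample


def exact_daily_sum(rows, region):
    """Step 1 on the full data: slice on `region`, then SUM(sales) GROUP BY day."""
    totals = defaultdict(float)
    for day, r, sales in rows:
        if r == region:
            totals[day] += sales
    return totals


def estimate_daily_sum(sample, region):
    """Step 1 on the sample: Horvitz-Thompson estimate of the same aggregate."""
    totals = defaultdict(float)
    for day, r, sales, p in sample:
        if r == region:
            # A row kept with probability p contributes sales in expectation.
            totals[day] += sales / p
    return totals


def forecast_linear(series, horizon):
    """Step 2: fit y = a + b * t by least squares and extrapolate `horizon` steps."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


if __name__ == "__main__":
    days = range(30)
    rows = make_rows(days=len(days))
    sample = threshold_sample(rows, tau=200.0)

    exact_totals = exact_daily_sum(rows, "east")
    approx_totals = estimate_daily_sum(sample, "east")
    exact = [exact_totals.get(d, 0.0) for d in days]
    approx = [approx_totals.get(d, 0.0) for d in days]

    print(f"sampled {len(sample)} of {len(rows)} rows")
    print("forecast from exact aggregates:  ", [round(v, 1) for v in forecast_linear(exact, 3)])
    print("forecast from sampled aggregates:", [round(v, 1) for v in forecast_linear(approx, 3)])
```

Raising `tau` shrinks the sample and makes each daily estimate noisier, which is the kind of sample-size versus forecast-quality trade-off the abstract says FlashP has to reason about. The paper's error bounds and the GSW scheme itself are not reproduced here.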

research
05/04/2021

TimeGym: Debugging for Time Series Modeling in Python

We introduce the TimeGym Forecasting Debugging Toolkit, a Python library...
research
10/21/2019

You May Not Need Order in Time Series Forecasting

Time series forecasting with limited data is a challenging yet critical ...
research
08/05/2022

A Case-Study of Sample-Based Bayesian Forecasting Algorithms

For a Bayesian, real-time forecasting with the posterior predictive dist...
research
02/23/2023

Adaptive Sampling for Probabilistic Forecasting under Distribution Shift

The world is not static: This causes real-world time series to change ov...
research
02/03/2022

Review of automated time series forecasting pipelines

Time series forecasting is fundamental for various use cases in differen...
research
11/19/2018

Weighted Ensemble of Statistical Models

We present a detailed description of our submission for the M4 forecasti...
research
10/29/2021

A Demonstration of Benchmarking Time Series Management Systems in the Cloud

Time Series Management Systems (TSMS) are Database Management Systems th...
