LST-Bench: Benchmarking Log-Structured Tables in the Cloud

Log-Structured Tables (LSTs), also commonly referred to as table formats, have recently emerged to bring consistency and isolation to object stores. With the separation of compute and storage, object stores have become the default choice for highly scalable and durable storage. However, object stores lack the recovery and concurrency management that traditional database management systems provide. LSTs such as Delta Lake, Apache Iceberg, and Apache Hudi address these challenges by providing an automatic metadata layer that manages tables defined over object stores. This paradigm shift in system design calls for updated evaluation methodologies. In this paper, we examine the characteristics of LSTs and propose extensions to existing benchmarks, including workload patterns and metrics, to accurately capture their performance. We introduce our framework, LST-Bench, which enables users to execute benchmarks tailored for the evaluation of LSTs. Our evaluation demonstrates how these benchmarks can be used to evaluate the performance, efficiency, and stability of LSTs. The code for LST-Bench is open source and available at https://github.com/microsoft/lst-bench/.
