LEA: A Learned Encoding Advisor for Column Stores

05/18/2021
by Lujing Cen, et al.

Data warehouses organize data in a columnar format to enable faster scans and better compression. Modern systems offer a variety of column encodings that can reduce storage footprint and improve query performance. Selecting a good encoding scheme for a particular column is an optimization problem that depends on the data, the query workload, and the underlying hardware. We introduce Learned Encoding Advisor (LEA), a learned approach to column encoding selection. LEA is trained on synthetic datasets with various distributions on the target system. Once trained, LEA uses sample data and statistics (such as cardinality) from the user's database to predict the optimal column encodings. LEA can optimize for encoded size, query performance, or a combination of the two. Compared to the heuristic-based encoding advisor of a commercial column store on TPC-H, LEA achieves 19% lower query latency while using 26% less space.
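To make the size-versus-performance trade-off concrete, the following is a minimal sketch of how an advisor could pick one encoding from per-encoding predictions. It is not LEA's implementation: the `Prediction` record, the `choose_encoding` function, the `size_weight` parameter, the normalized weighted-sum objective, and the candidate numbers are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical model output for one candidate encoding of a column."""
    encoding: str
    size_bytes: float  # predicted encoded size
    scan_ms: float     # predicted scan latency


def choose_encoding(predictions, size_weight=0.5):
    """Return the encoding with the lowest weighted, normalized cost.

    size_weight=1.0 optimizes for size only, 0.0 for latency only,
    and values in between blend the two (an assumed objective).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if not 0.0 <= size_weight <= 1.0:
        raise ValueError("size_weight must be in [0, 1]")

    min_size = min(p.size_bytes for p in predictions)
    min_scan = min(p.scan_ms for p in predictions)
    if min_size <= 0 or min_scan <= 0:
        raise ValueError("predicted size and latency must be positive")

    def cost(p):
        # Normalize each metric by the best candidate so that the
        # two terms are unitless and comparable before weighting.
        return (size_weight * p.size_bytes / min_size
                + (1.0 - size_weight) * p.scan_ms / min_scan)

    return min(predictions, key=cost).encoding


# Made-up predictions for a single column; not measurements from the paper.
CANDIDATES = [
    Prediction("raw", size_bytes=4000, scan_ms=2.0),
    Prediction("dictionary", size_bytes=1200, scan_ms=2.5),
    Prediction("rle", size_bytes=800, scan_ms=6.0),
]

if __name__ == "__main__":
    for weight in (0.0, 0.5, 1.0):
        print(weight, choose_encoding(CANDIDATES, size_weight=weight))
```

With these made-up numbers, a weight of 0.0 favors the fastest candidate, 1.0 favors the smallest, and 0.5 picks a candidate that is near the best on both metrics. In a learned advisor, the predictions themselves would come from a trained model rather than being hard-coded.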

