ParPaRaw: Massively Parallel Parsing of Delimiter-Separated Raw Data

05/31/2019
by Elias Stehle, et al.

Parsing is essential for a wide range of use cases, such as stream processing, bulk loading, and in-situ querying of raw data. Yet this compute-intensive step often constitutes a major bottleneck in the data ingestion pipeline, since parsing inputs that require more involved parsing rules is challenging to parallelise. This work proposes a massively parallel algorithm for parsing delimiter-separated data formats on GPUs. Unlike the state of the art, the proposed approach does not require an initial sequential pass over the input to determine a thread's parsing context, that is, how a thread beginning somewhere in the middle of the input should interpret a certain symbol (e.g., whether a comma is a delimiter or part of a larger string enclosed in double quotes). Instead of tailoring the approach to a single format, we perform a massively parallel FSM simulation, which is more flexible and powerful and supports more expressive parsing rules with general applicability. Achieving a parsing rate of as much as 14.2 GB/s, our experimental evaluation on a GPU with 3584 cores shows that the presented approach is able to scale to thousands of cores and beyond. With an end-to-end streaming approach, we exploit the full-duplex capabilities of the PCIe bus and hide latency from data transfers. Considering end-to-end performance, the algorithm parses 4.8 GB in as little as 0.44 seconds, including data transfers.
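The abstract does not spell out how the FSM simulation avoids a sequential pass, so the following is only a minimal sketch of one well-known way to do it, not the paper's implementation. It assumes a hypothetical three-state automaton for a simplified CSV dialect (states `FIELD`, `QUOTED`, `QUOTE_SEEN`, with `""` as an escaped quote). Each chunk is simulated once from every possible start state, yielding a transition vector; because composing these vectors is associative, a prefix scan recovers each chunk's true starting context, after which every chunk can classify its commas independently. The helper names `step`, `transition_vector`, `compose`, and `find_delimiters` are illustrative.

```python
from __future__ import annotations

from itertools import accumulate

# Hypothetical FSM for a simplified CSV dialect (an assumption for illustration).
FIELD, QUOTED, QUOTE_SEEN = 0, 1, 2
NUM_STATES = 3


def step(state: int, ch: str) -> int:
    """Advance the FSM by one character."""
    if state == QUOTED:
        return QUOTE_SEEN if ch == '"' else QUOTED
    if state == QUOTE_SEEN:
        # '""' inside a quoted field is an escaped quote; anything else closes it.
        return QUOTED if ch == '"' else FIELD
    return QUOTED if ch == '"' else FIELD  # state == FIELD


def transition_vector(chunk: str) -> tuple[int, ...]:
    """Entry s is the state reached after `chunk` when starting in state s.

    Needs no knowledge of preceding chunks, so every chunk can be handled
    independently (e.g., one chunk per GPU thread)."""
    vec = []
    for start in range(NUM_STATES):
        state = start
        for ch in chunk:
            state = step(state, ch)
        vec.append(state)
    return tuple(vec)


def compose(first: tuple[int, ...], second: tuple[int, ...]) -> tuple[int, ...]:
    """Transition vector of `first` followed by `second`; associative."""
    return tuple(second[s] for s in first)


def find_delimiters(data: str, num_chunks: int) -> list[int]:
    """Return offsets of commas that act as field delimiters."""
    if not data:
        return []
    size = -(-len(data) // num_chunks)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Phase 1: per-chunk transition vectors (parallel in principle).
    vectors = [transition_vector(c) for c in chunks]
    # Phase 2: exclusive scan with the associative `compose`; a GPU version
    # would use a parallel prefix scan instead of this sequential accumulate.
    identity = tuple(range(NUM_STATES))
    prefixes = accumulate(vectors, compose, initial=identity)
    # Phase 3: each chunk now knows its start state and classifies its symbols.
    delimiters = []
    offset = 0
    for chunk, prefix in zip(chunks, prefixes):
        state = prefix[FIELD]  # the input as a whole starts in FIELD
        for i, ch in enumerate(chunk):
            # A comma seen right after a closing quote is also a delimiter.
            if ch == ',' and state in (FIELD, QUOTE_SEEN):
                delimiters.append(offset + i)
            state = step(state, ch)
        offset += len(chunk)
    return delimiters
```

For example, `find_delimiters('a,"b,c",d\n"x""y",z', 4)` reports the commas at offsets 1, 7, and 16 while skipping the comma inside `"b,c"`, and it gives the same answer for any chunk count, because the scan hands each chunk the exact context a sequential pass would have reached.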

research
06/02/2023

An Evaluation of Log Parsing with ChatGPT

Software logs play an essential role in ensuring the reliability and mai...
research
12/01/2022

PIZZA: A new benchmark for complex end-to-end task-oriented parsing

Much recent work in task-oriented parsing has focused on finding a middl...
research
10/15/2019

Text2Math: End-to-end Parsing Text into Math Expressions

We propose Text2Math, a model for semantically parsing text into math ex...
research
05/01/2017

Dependency Parsing with Dilated Iterated Graph CNNs

Dependency parses are an effective way to inject linguistic knowledge in...
research
10/19/2017

SLING: A framework for frame semantic parsing

We describe SLING, a framework for parsing natural language into semanti...
research
09/17/2018

Devil in the Details: Towards Accurate Single and Multiple Human Parsing

Human parsing has received considerable interest due to its wide applica...
research
08/04/2021

Multi-Round Parsing-based Multiword Rules for Scientific OpenIE

Information extraction (IE) in scientific literature has facilitated man...