TADOC: Text Analytics Directly on Compression

09/20/2020
by Feng Zhang, et al.

This article provides a comprehensive description of Text Analytics Directly on Compression (TADOC), which enables document analytics directly on compressed textual data. The article explains the concept of TADOC and the challenges to its effective realization. It then presents a series of guidelines and technical solutions that address those challenges, including the adoption of a hierarchical compression method and a set of novel algorithms and data structure designs. Experiments on six data analytics tasks of various complexities show that TADOC can save 90.8% storage space and 87.9% memory usage, while halving data processing times.
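As a rough illustration of why analytics on hierarchically compressed text can save work, the sketch below counts words over a toy grammar without expanding it. It assumes a grammar-style compression in which repeated word sequences become rules that may reference other rules. The grammar contents and the names `GRAMMAR`, `expand`, and `word_count` are hypothetical and are not taken from the authors' implementation.

```python
from collections import Counter

# Hypothetical toy grammar (an assumption for illustration only).
# Rule 0 is the root; integers reference other rules, strings are words.
GRAMMAR = {
    0: [1, "c", 1, 2],
    1: ["a", "b", 2],
    2: ["d", "e"],
}


def expand(grammar, rule=0):
    """Fully decompress a rule into its word sequence (baseline for comparison)."""
    words = []
    for symbol in grammar[rule]:
        if isinstance(symbol, int):
            words.extend(expand(grammar, symbol))
        else:
            words.append(symbol)
    return words


def word_count(grammar, root=0):
    """Count words by visiting each rule once and reusing its result."""
    memo = {}

    def counts(rule):
        if rule not in memo:
            total = Counter()
            for symbol in grammar[rule]:
                if isinstance(symbol, int):
                    # Reuse the counts of the referenced rule instead of expanding it.
                    total.update(counts(symbol))
                else:
                    total[symbol] += 1
            memo[rule] = total
        return memo[rule]

    return counts(root)


if __name__ == "__main__":
    # Both paths should agree; only the second avoids materializing the text.
    print(Counter(expand(GRAMMAR)))
    print(word_count(GRAMMAR))
```

In this sketch each rule is processed once no matter how often it is referenced, which is the kind of reuse that makes operating on the compressed form attractive. A real system would also need to handle word order, file boundaries, and much larger grammars.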

research
04/14/2023

GreedyGD: Enhanced Generalized Deduplication for Direct Analytics in IoT

Exponential growth in the amount of data generated by the Internet of Th...
research
06/13/2021

G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression

Text analytics directly on compression (TADOC) has proven to be a promis...
research
08/10/2021

A Survey on Deep Reinforcement Learning for Data Processing and Analytics

Data processing and analytics are fundamental and pervasive. Algorithms ...
research
03/08/2023

Change a Bit to save Bytes: Compression for Floating Point Time-Series Data

The number of IoT devices is expected to continue its dramatic growth in...
research
04/11/2022

Towards Understanding Analytics in Software Startups

Analytics plays a crucial role in the data-informed decision-making proc...
research
05/03/2020

An Algebraic Approach for High-level Text Analytics

Text analytical tasks like word embedding, phrase mining, and topic mode...
research
08/14/2023

3D Analytics: Opportunities and Guidelines for Information Systems Research

Progress in sensor technologies has made three-dimensional (3D) represen...
