Large Language Models as Data Preprocessors

08/30/2023
by   Haochen Zhang, et al.

Large Language Models (LLMs), typified by OpenAI's GPT series and Meta's LLaMA variants, have marked a significant advancement in artificial intelligence. Trained on vast amounts of text data, LLMs are capable of understanding and generating human-like text across a diverse range of topics. This study expands on the applications of LLMs, exploring their potential in data preprocessing, a critical stage in data mining and analytics applications. We delve into the applicability of state-of-the-art LLMs such as GPT-3.5, GPT-4, and Vicuna-13B for error detection, data imputation, schema matching, and entity matching tasks. Alongside showcasing the inherent capabilities of LLMs, we highlight their limitations, particularly in terms of computational expense and inefficiency. We propose an LLM-based framework for data preprocessing, which integrates cutting-edge prompt engineering techniques, coupled with traditional methods like contextualization and feature selection, to improve the performance and efficiency of these models. The effectiveness of LLMs in data preprocessing is evaluated through an experimental study spanning 12 datasets. GPT-4 emerged as a standout, achieving 100% accuracy or F1 score on 4 datasets, suggesting LLMs' immense potential in these tasks. Despite certain limitations, our study underscores the promise of LLMs in this domain and anticipates future developments to overcome current hurdles.
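As a rough illustration of how contextualization and feature selection might feed a prompt for one of the four tasks (entity matching), the sketch below serializes tabular records into text and assembles a yes/no question. The function names, the prompt wording, and the `N/A` placeholder for missing values are all assumptions made for this example; they are not taken from the paper, and no model call is shown.

```python
from typing import Dict, List, Optional

# A record maps attribute names to (possibly missing) string values.
Record = Dict[str, Optional[str]]


def contextualize(record: Record, features: Optional[List[str]] = None) -> str:
    """Serialize a record as 'attribute: value' pairs.

    If `features` is given, only those attributes are kept (a simple form of
    feature selection that also shortens the prompt). Missing or empty values
    are rendered as 'N/A' (an assumption of this sketch).
    """
    keys = features if features is not None else list(record.keys())
    parts = []
    for key in keys:
        value = record.get(key)
        parts.append(f"{key}: {value if value not in (None, '') else 'N/A'}")
    return "; ".join(parts)


def entity_matching_prompt(
    a: Record, b: Record, features: Optional[List[str]] = None
) -> str:
    """Build a hypothetical zero-shot prompt asking whether two records match."""
    return (
        "Do the following two records refer to the same real-world entity? "
        "Answer with 'yes' or 'no'.\n"
        f"Record A: {contextualize(a, features)}\n"
        f"Record B: {contextualize(b, features)}\n"
        "Answer:"
    )


def parse_answer(text: str) -> bool:
    """Interpret a model reply as a match decision (True for 'yes')."""
    return text.strip().lower().startswith("yes")


if __name__ == "__main__":
    left = {"title": "iPhone 13", "brand": "Apple", "price": None}
    right = {"title": "Apple iPhone 13 128GB", "brand": "Apple", "price": "799"}
    print(entity_matching_prompt(left, right, features=["title", "brand"]))
```

Restricting the prompt to a few informative attributes is one plausible way to reduce token usage, which relates to the computational expense the abstract mentions; the resulting string would be sent to whichever model is being evaluated, and `parse_answer` would map the reply to a label.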


research
09/05/2023

On the Planning, Search, and Memorization Capabilities of Large Language Models

The rapid advancement of large language models, such as the Generative P...
research
05/24/2023

GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking

Large language models (LLM) like ChatGPT have become indispensable to ar...
research
03/31/2023

Enhancing Large Language Models with Climate Resources

Large language models (LLMs) have significantly transformed the landscap...
research
06/11/2023

Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective

Molecule discovery plays a crucial role in various scientific fields, ad...
research
03/08/2023

Does Synthetic Data Generation of LLMs Help Clinical Text Mining?

Recent advancements in large language models (LLMs) have led to the deve...
research
09/03/2023

AutoML-GPT: Large Language Model for AutoML

With the emerging trend of GPT models, we have established a framework c...
