Automatic Data Transformation Using Large Language Model: An Experimental Study on Building Energy Data

09/05/2023
by Ankita Sharma, et al.

Existing approaches to automatic data transformation fall short of the requirements of many real-world settings, such as the building sector. First, they offer no convenient interface through which domain experts can supply domain knowledge. Second, they incur significant overhead for collecting training data. Third, their accuracy suffers under complicated schema changes. To bridge this gap, we present a novel approach that leverages the unique capabilities of large language models (LLMs) in coding, complex reasoning, and zero-shot learning to generate SQL code that transforms the source datasets into the target datasets. We demonstrate the viability of this approach by designing an LLM-based framework, termed SQLMorpher, which comprises a prompt generator that integrates the initial prompt with optional domain knowledge and historical patterns in external databases. It also implements an iterative prompt optimization mechanism that automatically improves the prompt based on flaw detection. The key contributions of this work include (1) pioneering an end-to-end LLM-based solution for data transformation, (2) developing a benchmark dataset of 105 real-world building energy data transformation problems, and (3) conducting an extensive empirical evaluation in which our approach achieved 96% accuracy. The results demonstrate the effectiveness of utilizing LLMs in complex, domain-specific challenges, highlighting their potential to drive sustainable solutions.
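The loop described in the abstract (generate a prompt, obtain SQL from an LLM, detect flaws, refine the prompt) can be illustrated with a minimal sketch. This is not the SQLMorpher implementation: the function names `build_prompt` and `transform`, the prompt wording, the use of SQLite, and the choice of "the SQL raised an execution error" as the only flaw detector are all assumptions made for illustration. The `llm` argument is a hypothetical callable that maps a prompt string to a single SQL statement; any real model client would have to be wrapped to fit it.

```python
import sqlite3
from typing import Callable, Optional


def build_prompt(source_schema: str, target_schema: str,
                 domain_knowledge: Optional[str] = None,
                 feedback: Optional[str] = None) -> str:
    """Assemble a prompt from the schemas plus optional hints (hypothetical wording)."""
    parts = [
        "Write one SQLite statement that inserts the source rows into the target table.",
        f"Source schema:\n{source_schema}",
        f"Target schema:\n{target_schema}",
    ]
    if domain_knowledge:
        parts.append(f"Domain knowledge:\n{domain_knowledge}")
    if feedback:
        # Flaw report from the previous round, fed back so the model can repair its SQL.
        parts.append(f"The previous attempt failed with:\n{feedback}\nPlease fix it.")
    return "\n\n".join(parts)


def transform(conn: sqlite3.Connection,
              llm: Callable[[str], str],
              source_schema: str,
              target_schema: str,
              domain_knowledge: Optional[str] = None,
              max_rounds: int = 3) -> str:
    """Ask the LLM for SQL, run it, and retry with the error message on failure.

    Returns the SQL statement that executed successfully.
    """
    feedback = None
    for _ in range(max_rounds):
        prompt = build_prompt(source_schema, target_schema, domain_knowledge, feedback)
        sql = llm(prompt)
        try:
            # The connection context manager commits on success and rolls back on error.
            with conn:
                conn.execute(sql)
            return sql
        except sqlite3.Error as exc:
            # Assumed flaw detector: any execution error counts as a flaw.
            feedback = f"{sql}\n-- error: {exc}"
    raise RuntimeError(f"no valid SQL after {max_rounds} rounds; last flaw: {feedback}")
```

A fuller flaw detector would also compare the produced rows against expected target data, since SQL that executes without error can still be semantically wrong; the sketch only catches statements that fail to run.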


