Leveraging LLMs for KPIs Retrieval from Hybrid Long-Document: A Comprehensive Framework and Dataset

05/24/2023
by Chongjian Yue, et al.

Large Language Models (LLMs) demonstrate exceptional performance in textual understanding and tabular reasoning tasks. However, their ability to comprehend and analyze hybrid text, which combines textual and tabular data, remains underexplored. In this research, we focus on harnessing the potential of LLMs to extract critical information from financial reports, which are hybrid long documents. We propose an Automated Financial Information Extraction (AFIE) framework that enhances LLMs' ability to comprehend and extract information from financial reports. To evaluate AFIE, we develop a Financial Reports Numerical Extraction (FINE) dataset and conduct an extensive experimental analysis. We validate the framework on GPT-3.5 and GPT-4, yielding average accuracy increases of 53.94 compared to a naive method. These results suggest that the AFIE framework improves the accuracy of automated numerical extraction from complex, hybrid documents.


research
06/14/2022

FETILDA: An Effective Framework For Fin-tuned Embeddings For Long Financial Text Documents

Unstructured data, especially text, continues to grow rapidly in various...
research
05/27/2023

Financial misstatement detection: a realistic evaluation

In this work, we examine the evaluation process for the task of detectin...
research
07/06/2019

Qwant Research @DEFT 2019: Document matching and information retrieval using clinical cases

This paper reports on Qwant Research contribution to tasks 2 and 3 of th...
research
09/01/2021

FinQA: A Dataset of Numerical Reasoning over Financial Data

The sheer volume of financial statements makes it difficult for humans t...
research
06/08/2023

Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models

Tabular data is often hidden in text, particularly in medical diagnostic...
research
02/18/2023

Form 10-K Itemization

Form 10-K report is a financial report disclosing the annual financial s...
research
05/14/2021

Extracting Variable-Depth Logical Document Hierarchy from Long Documents: Method, Evaluation, and Application

In this paper, we study the problem of extracting variable-depth "logica...
