AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification

09/18/2023
by Abdelrahman Abdallah, et al.

Key information extraction involves recognizing and extracting text from scanned receipts, retrieving the essential content, and organizing it into structured documents. This paper presents a novel multilingual dataset for receipt extraction that addresses key challenges in information extraction and item classification. The dataset comprises 47,720 samples, with annotations for item names, attributes such as price and brand, and classification into 44 product categories. We introduce the InstructLLaMA approach, which achieves an F1 score of 0.76 for key information extraction and an accuracy of 0.68 for item classification. Code, datasets, and checkpoints are available at https://github.com/Update-For-Integrated-Business-AI/AMuRD.
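The abstract reports an F1 score for key information extraction and an accuracy for item classification, but it does not state how predictions are matched against the annotations. As a minimal sketch, assuming each receipt item is annotated as a dictionary of attribute-value pairs (the field names used here, such as `name`, `price`, and `brand`, are hypothetical and not the AMuRD schema) and that a predicted pair counts only on exact match, micro-averaged F1 and category accuracy could be computed like this:

```python
from __future__ import annotations


def extraction_f1(gold: list[dict[str, str]], pred: list[dict[str, str]]) -> float:
    """Micro-averaged F1 over exact-match (attribute, value) pairs.

    Assumption: each item is a dict such as {"name": ..., "price": ...};
    this is an illustrative layout, not the dataset's actual format.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of items")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_pairs = set(g.items())
        p_pairs = set(p.items())
        tp += len(g_pairs & p_pairs)  # pairs predicted exactly right
        fp += len(p_pairs - g_pairs)  # predicted pairs not in the gold item
        fn += len(g_pairs - p_pairs)  # gold pairs the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def category_accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of items whose predicted category equals the gold category."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of items")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
```

For example, if a gold item `{"name": "milk", "price": "12.50", "brand": "acme"}` is predicted with `"brand": "acme co"` and a second item `{"name": "rice", "price": "30.00"}` is predicted exactly, there are 4 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all come out to about 0.8. Exact-match scoring is strict; lenient matching (for example, normalizing case and whitespace before comparison) would generally score higher.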

10/19/2020

The RELX Dataset and Matching the Multilingual Blanks for Cross-Lingual Relation Classification

Relation classification is one of the key topics in information extracti...
02/09/2023

Massively Multilingual Language Models for Cross Lingual Fact Extraction from Low Resource Indian Languages

Massive knowledge graphs like Wikidata attempt to capture world knowledg...
11/17/2022

GLAMI-1M: A Multilingual Image-Text Fashion Dataset

We introduce GLAMI-1M: the largest multilingual image-text classificatio...
04/20/2023

Prompt-Learning for Cross-Lingual Relation Extraction

Relation Extraction (RE) is a crucial task in Information Extraction, wh...
02/23/2023

Automated Extraction of Fine-Grained Standardized Product Information from Unstructured Multilingual Web Data

Extracting structured information from unstructured data is one of the k...
04/18/2022

Ingredient Extraction from Text in the Recipe Domain

In recent years, there has been an increase in the number of devices wit...
03/18/2021

ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction

Scanned receipts OCR and key information extraction (SROIE) represent th...
