Embedding-based Retrieval with LLM for Effective Agriculture Information Extracting from Unstructured Data

08/06/2023
by Ruoling Peng, et al.

Pest identification is a crucial aspect of pest control in agriculture. However, most farmers cannot accurately identify pests in the field, and few structured data sources are available for rapid querying. In this work, we explore using a domain-agnostic, general-purpose pre-trained large language model (LLM) to extract structured data from agricultural documents with minimal or no human intervention. We propose a methodology that first retrieves and filters relevant text using embedding-based retrieval, then applies LLM question answering to automatically extract entities and their attributes from the documents and transform them into structured data. Compared with existing methods, our approach consistently achieves better accuracy on the benchmark while maintaining efficiency.
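The two-stage pipeline described above (embedding-based retrieval and filtering, then LLM question answering per entity attribute) might be sketched roughly as follows. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the `top_k` and `min_score` values, the prompt template in `build_prompt`, and the `ask_llm` callable are all hypothetical placeholders for whatever model, settings, and prompts the paper actually uses.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy stand-in (assumption) for a real embedding model:
    an L2-normalised bag-of-words vector."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {w: v / norm for w, v in counts.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Both vectors are unit-length, so the dot product equals the cosine.
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def retrieve(query: str, chunks: List[str], top_k: int = 3,
             min_score: float = 0.1) -> List[Tuple[float, str]]:
    """Rank chunks by similarity to the query, keep the top_k,
    and filter out any chunk scoring below min_score (hypothetical values)."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks),
                    key=lambda sc: sc[0], reverse=True)
    return [(s, c) for s, c in scored[:top_k] if s >= min_score]


def build_prompt(entity: str, attribute: str, context: List[str]) -> str:
    """Hypothetical prompt template for the LLM question-answering step."""
    passages = "\n".join(f"- {c}" for c in context)
    return (f"Using only the passages below, what is the {attribute} of {entity}? "
            f"Answer 'unknown' if the passages do not say.\n{passages}")


def extract(entity: str, attributes: List[str], chunks: List[str],
            ask_llm: Callable[[str], str]) -> Dict[str, str]:
    """Fill one structured record: retrieve evidence per attribute, then ask the LLM."""
    record: Dict[str, str] = {"entity": entity}
    for attr in attributes:
        hits = retrieve(f"{entity} {attr}", chunks)
        if hits:
            record[attr] = ask_llm(build_prompt(entity, attr, [c for _, c in hits]))
        else:
            # No passage survived filtering, so skip the LLM call entirely.
            record[attr] = "unknown"
    return record


if __name__ == "__main__":
    docs = [
        "Fall armyworm host plants include maize and sorghum.",
        "Fall armyworm adults are grey-brown moths active at night.",
        "Rice blast is a fungal disease, not an insect pest.",
    ]
    # Stub LLM (assumption): echoes the highest-ranked passage instead of answering.
    stub_llm = lambda prompt: prompt.splitlines()[1].lstrip("- ")
    print(extract("fall armyworm", ["host plant"], docs, stub_llm))
```

Retrieving evidence separately for each attribute keeps every prompt short and focused, and the `min_score` cut-off is one simple way to realise the filtering step; when nothing clears the threshold, the sketch records `unknown` rather than letting the model guess.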
