Leveraging Large Language Models for Topic Classification in the Domain of Public Affairs

06/05/2023
by Alejandro Peña, et al.

The analysis of public affairs documents is crucial for citizens, as it promotes transparency, accountability, and informed decision-making. It allows citizens to understand government policies, participate in public discourse, and hold representatives accountable. It is equally crucial, and sometimes a matter of life or death, for companies whose operations depend on certain regulations. Large Language Models (LLMs) have the potential to greatly enhance the analysis of public affairs documents by effectively processing and understanding the complex language used in them. In this work, we analyze the performance of LLMs in classifying public affairs documents. Because a single document can address several topics at once, the task is naturally multi-label, which presents important challenges. We use a regex-powered tool to collect a database of public affairs documents with more than 33K samples and 22.5M tokens. Our experiments assess how well 4 different Spanish LLMs classify up to 30 different topics in the data under different configurations. The results show that LLMs can be of great use for processing domain-specific documents, such as those in the domain of public affairs.
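To make the multi-label setting concrete, the following is a minimal sketch and not the authors' pipeline. It assumes each document can carry any subset of a fixed topic list, encodes that subset as a 0/1 vector, and turns independent per-topic scores into a predicted topic set with a threshold. The topic names, the scores, and the 0.5 threshold are hypothetical values chosen for illustration; the abstract does not list the 30 topics or describe the decision rule.

```python
# Hypothetical topic names for illustration; the abstract does not list the real ones.
TOPICS = ["health", "economy", "education", "energy"]


def to_multi_hot(doc_topics, topics=TOPICS):
    """Encode the set of topics assigned to one document as a 0/1 vector."""
    assigned = set(doc_topics)
    return [1 if topic in assigned else 0 for topic in topics]


def predict_topics(scores, threshold=0.5, topics=TOPICS):
    """Keep every topic whose independent score reaches the threshold.

    Unlike single-label classification, no argmax is taken: zero, one,
    or several topics may be returned for the same document.
    """
    return [topic for topic, score in zip(topics, scores) if score >= threshold]


# A document annotated with two topics at once.
labels = to_multi_hot(["economy", "energy"])

# Made-up per-topic scores, standing in for the output of some classifier.
predicted = predict_topics([0.1, 0.8, 0.3, 0.6])

print(labels)
print(predicted)
```

Treating each topic as its own yes/no decision is one common way to handle multi-label data; whether the paper uses this exact formulation is an assumption here.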


research
05/20/2023

Re-visiting Automated Topic Model Evaluation with Large Language Models

Topic models are used to make sense of large text collections. However, ...
research
04/19/2023

Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes

A long standing goal of the data management community is to develop gene...
research
06/12/2023

Document Layout Annotation: Database and Benchmark in the Domain of Public Affairs

Every day, thousands of digital documents are generated with useful info...
research
05/04/2021

Towards Accountability in the Use of Artificial Intelligence for Public Administrations

We argue that the phenomena of distributed responsibility, induced accep...
research
08/10/2023

LLM As DBA

Database administrators (DBAs) play a crucial role in managing, maintain...
research
06/28/2016

Hierarchical Neural Language Models for Joint Representation of Streaming Documents and their Content

We consider the problem of learning distributed representations for docu...
research
06/27/2023

Paradigm Shift in Sustainability Disclosure Analysis: Empowering Stakeholders with CHATREPORT, a Language Model-Based Tool

This paper introduces a novel approach to enhance Large Language Models ...
