Next-Year Bankruptcy Prediction from Textual Data: Benchmark and Baselines

08/24/2022
by Henri Arno, et al.

Models for bankruptcy prediction are useful in several real-world scenarios, and multiple research contributions have been devoted to the task, based on structured (numerical) as well as unstructured (textual) data. However, the lack of a common benchmark dataset and evaluation strategy impedes the objective comparison between models. This paper introduces such a benchmark for the unstructured data scenario, based on novel and established datasets, in order to stimulate further research into the task. We describe and evaluate several classical and neural baseline models, and discuss benefits and flaws of different strategies. In particular, we find that a lightweight bag-of-words model based on static in-domain word representations obtains surprisingly good results, especially when taking textual data from several years into account. These results are critically assessed, and discussed in light of particular aspects of the data and the task. All code to replicate the data and experimental results will be released.
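As a rough illustration of what a bag-of-words model over static word representations could look like when several years of text are combined, the following is a minimal sketch and not the authors' implementation. The toy embedding table, the function names, and the choice of plain averaging over tokens and over years are all assumptions made for illustration; the abstract does not specify how the representations are pooled.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector], dim: int) -> Vector:
    """Element-wise mean; returns a zero vector when the input is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_document(tokens: Sequence[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Bag-of-words embedding: average the static vectors of known tokens."""
    known = [table[t] for t in tokens if t in table]  # unknown tokens are skipped
    return mean_vector(known, dim)


def embed_company(yearly_docs: Sequence[Sequence[str]],
                  table: Dict[str, Vector], dim: int) -> Vector:
    """Combine several years of text by averaging the per-year embeddings."""
    return mean_vector([embed_document(d, table, dim) for d in yearly_docs], dim)


# Hypothetical 2-dimensional embedding table (made-up values, illustration only).
TOY_TABLE: Dict[str, Vector] = {
    "loss": [1.0, 0.0],
    "debt": [0.8, 0.2],
    "growth": [0.0, 1.0],
    "profit": [0.1, 0.9],
}

# Two years of tokenized text for one hypothetical company.
reports = [["growth", "profit"], ["debt", "loss", "unknownword"]]
features = embed_company(reports, TOY_TABLE, dim=2)
```

The resulting fixed-length `features` vector could then be passed to any standard classifier, such as logistic regression, to predict next-year bankruptcy.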


