FiNER: Financial Numeric Entity Recognition for XBRL Tagging

03/12/2022
by Lefteris Loukas, et al.

Publicly traded companies are required to submit periodic reports with eXtensible Business Reporting Language (XBRL) word-level tags. Manually tagging the reports is tedious and costly. We therefore introduce XBRL tagging as a new entity extraction task for the financial domain and release FiNER-139, a dataset of 1.1M sentences with gold XBRL tags. Unlike typical entity extraction datasets, FiNER-139 uses a much larger label set of 139 entity types. Most annotated tokens are numeric, and the correct tag for each token depends mostly on its context rather than on the token itself. We show that subword fragmentation of numeric expressions harms BERT's performance, allowing word-level BiLSTMs to perform better. To improve BERT's performance, we propose two simple and effective solutions that replace numeric expressions with pseudo-tokens reflecting original token shapes and numeric magnitudes. We also experiment with FIN-BERT, an existing BERT model for the financial domain, and release our own BERT (SEC-BERT), pre-trained on financial filings, which performs best. Finally, through data and error analysis, we identify possible limitations to inspire future work on XBRL tagging.
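To make the pseudo-token idea concrete, here is a minimal Python sketch of replacing numeric expressions before subword tokenization. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the pseudo-token formats, so the `[NUM]` and `[XX.X]`-style tokens, the `NUMERIC` pattern, and the function names `num_token`, `shape_token`, and `mask_numbers` are all hypothetical.

```python
import re

# Assumption: a "numeric expression" is a single token made of digits with
# optional thousands separators and an optional decimal part,
# e.g. "9", "53.2", "40,200.5". Real filings may need a broader pattern.
NUMERIC = re.compile(r"^\d[\d,]*(\.\d+)?$")


def num_token(token: str) -> str:
    """Replace any numeric token with one generic pseudo-token (hypothetical format)."""
    return "[NUM]" if NUMERIC.match(token) else token


def shape_token(token: str) -> str:
    """Replace a numeric token with a pseudo-token that keeps its shape.

    Each digit becomes "X"; separators are kept, so the number of digits
    (a rough proxy for magnitude) survives. The format is hypothetical.
    """
    if not NUMERIC.match(token):
        return token
    return "[" + re.sub(r"\d", "X", token) + "]"


def mask_numbers(tokens, mapper=shape_token):
    """Apply a pseudo-token mapper to every token of a word-level sentence."""
    return [mapper(t) for t in tokens]


sentence = "Revenue increased by 53.2 million to 40,200.5 million".split()
print(mask_numbers(sentence, num_token))
print(mask_numbers(sentence, shape_token))
```

In this sketch each numeric expression maps to a single pseudo-token, which would have to be added to the model's vocabulary so the tokenizer does not split it into fragments. Because one word still corresponds to one token, the word-level tag alignment that XBRL tagging relies on is preserved.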

research
08/03/2022

KPI-BERT: A Joint Named Entity Recognition and Relation Extraction Model for Financial Reports

We present KPI-BERT, a system which employs novel methods of named entit...
research
03/21/2022

Neural Token Segmentation for High Token-Internal Complexity

Tokenizing raw texts into word units is an essential pre-processing step...
research
05/31/2022

FinBERT-MRC: financial named entity recognition using BERT under the machine reading comprehension paradigm

Financial named entity recognition (FinNER) from literature is a challen...
research
10/08/2022

Detecting Label Errors in Token Classification Data

Mislabeled examples are a common issue in real-world data, particularly ...
research
09/29/2021

EDGAR-CORPUS: Billions of Tokens Make The World Go Round

We release EDGAR-CORPUS, a novel corpus comprising annual reports from a...
research
06/06/2023

Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging

The U.S. Securities and Exchange Commission (SEC) mandates all public co...
research
11/11/2022

Towards automating Numerical Consistency Checks in Financial Reports

We introduce KPI-Check, a novel system that automatically identifies and...
