Multimodal Document Analytics for Banking Process Automation

by Christopher Gerling, et al.

In response to growing FinTech competition and the need for improved operational efficiency, this research examines the potential of advanced document analytics, particularly multimodal models, in banking processes. We perform a comprehensive analysis of the diverse banking document landscape, highlighting the opportunities for efficiency gains through automation and advanced analytics techniques in the customer business. Building on the rapidly evolving field of natural language processing (NLP), we illustrate the potential of models such as LayoutXLM, a cross-lingual, multimodal, pre-trained model, for analyzing diverse documents in the banking sector. This model performs text token classification on German company register extracts with an overall F1 score of around 80%. Our empirical evidence confirms the critical role of layout information in improving model performance and further underscores the benefits of integrating image information. Interestingly, our study shows that an F1 score of over 75% can be achieved with only 30% of the training data, demonstrating the efficiency of LayoutXLM. By addressing state-of-the-art document analysis frameworks, our study aims to enhance process efficiency and demonstrate the real-world applicability and benefits of multimodal models within banking.

