Field Extraction from Forms with Unlabeled Data

10/08/2021
by Mingfei Gao, et al.

We propose a novel framework to conduct field extraction from forms with unlabeled data. To bootstrap the training process, we develop a rule-based method for mining noisy pseudo-labels from unlabeled forms. Using the supervisory signal from the pseudo-labels, we extract a discriminative token representation from a transformer-based model by modeling the interaction between text in the form. To prevent the model from overfitting to label noise, we introduce a refinement module based on a progressive pseudo-label ensemble. Experimental results demonstrate the effectiveness of our framework.
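
The abstract does not spell out the mining rules, so the following is only a minimal sketch of what rule-based pseudo-label mining could look like, not the authors' implementation. It assumes each form has already been converted to phrases with coordinates, and that every field comes with a few key phrases and a value pattern. The `Phrase` type, the `FIELD_RULES` table, the `mine_pseudo_labels` function, and the nearest-candidate heuristic are all hypothetical names and choices introduced for illustration.

```python
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    """A piece of text from the form with the top-left corner of its box."""
    text: str
    x: float
    y: float


# Hypothetical rules (assumption): field -> (key phrases, value pattern).
FIELD_RULES = {
    "invoice_number": (("invoice no", "invoice #"), r"[A-Z0-9-]{4,}"),
    "total": (("total",), r"\$?\d+\.\d{2}"),
}


def mine_pseudo_labels(phrases, rules=FIELD_RULES):
    """Return a noisy field -> value mapping for one unlabeled form.

    For each field, find phrases containing one of its key phrases, then
    pick the pattern-matching phrase that lies closest to any such key.
    """
    labels = {}
    for field, (keys, pattern) in rules.items():
        anchors = [p for p in phrases if any(k in p.text.lower() for k in keys)]
        candidates = [p for p in phrases if re.fullmatch(pattern, p.text)]
        best = None  # (distance, value text)
        for anchor in anchors:
            for cand in candidates:
                if cand is anchor:
                    continue
                dist = math.hypot(cand.x - anchor.x, cand.y - anchor.y)
                if best is None or dist < best[0]:
                    best = (dist, cand.text)
        if best is not None:
            labels[field] = best[1]
    return labels


form = [
    Phrase("Invoice No", 10, 10),
    Phrase("INV-1042", 120, 10),
    Phrase("Total", 10, 200),
    Phrase("$318.50", 120, 200),
    Phrase("$12.00", 120, 150),
]
pseudo_labels = mine_pseudo_labels(form)
```

Labels mined this way are noisy by construction (a nearby amount that is not the total can still win), which is why the framework pairs them with a refinement step rather than treating them as ground truth.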

research
01/25/2022

AggMatch: Aggregating Pseudo Labels for Semi-Supervised Learning

Semi-supervised learning (SSL) has recently proven to be an effective pa...
research
10/08/2021

Robustness Evaluation of Transformer-based Form Field Extractors via Form Attacks

We propose a novel framework to evaluate the robustness of transformer-b...
research
06/09/2021

Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization

In this paper, we present a semi-supervised training technique using pse...
research
12/17/2019

Field Label Prediction for Autofill in Web Browsers

Automatic form fill is an important productivity related feature present...
research
04/23/2021

STRUDEL: Self-Training with Uncertainty Dependent Label Refinement across Domains

We propose an unsupervised domain adaptation (UDA) approach for white ma...
research
09/12/2020

DualLip: A System for Joint Lip Reading and Generation

Lip reading aims to recognize text from talking lip, while lip generatio...
