Lightly-supervised Representation Learning with Global Interpretability

We propose a lightly-supervised approach for information extraction, in particular named entity classification, which combines the benefits of traditional bootstrapping, i.e., use of limited annotations and interpretability of extraction patterns, with the robust learning approaches proposed in representation learning. Our algorithm iteratively learns custom embeddings for both the multi-word entities to be extracted and the patterns that match them from a few example entities per category. We demonstrate that this representation-based approach outperforms three other state-of-the-art bootstrapping approaches on two datasets: CoNLL-2003 and OntoNotes. Additionally, using these embeddings, our approach outputs a globally-interpretable model consisting of a decision list, by ranking patterns based on their proximity to the average entity embedding in a given class. We show that this interpretable model performs close to our complete bootstrapping model, proving that representation learning can be used to produce interpretable models with small loss in performance.
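The decision-list step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional embeddings are made-up toy values (in the paper they are learned iteratively), cosine similarity is assumed as the measure of "proximity", and the pattern strings with the `@ENTITY` placeholder, along with all function names, are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_decision_list(entity_vecs_by_class, pattern_vecs):
    """Assign each pattern to the class whose average entity embedding is
    closest, then rank all patterns by that similarity (highest first)."""
    centroids = {c: mean_vector(vs) for c, vs in entity_vecs_by_class.items()}
    rules = []
    for pattern, vec in pattern_vecs.items():
        label, score = max(
            ((c, cosine(vec, cen)) for c, cen in centroids.items()),
            key=lambda pair: pair[1],
        )
        rules.append((pattern, label, score))
    rules.sort(key=lambda rule: rule[2], reverse=True)
    return rules

def classify(matched_patterns, rules, default=None):
    """Return the label of the highest-ranked rule whose pattern matched."""
    for pattern, label, _ in rules:
        if pattern in matched_patterns:
            return label
    return default

# Toy embeddings (assumed values, for illustration only).
entity_vecs = {
    "PER": [[1.0, 0.0], [0.9, 0.1]],
    "LOC": [[0.0, 1.0], [0.1, 0.9]],
}
pattern_vecs = {
    "@ENTITY said": [1.0, 0.05],
    "born in @ENTITY": [0.05, 1.0],
    "mayor of @ENTITY": [0.3, 0.7],
}

rules = build_decision_list(entity_vecs, pattern_vecs)
for pattern, label, score in rules:
    print(f"{pattern!r} -> {label} (similarity {score:.3f})")
```

Because the resulting model is just an ordered list of (pattern, label) rules, it can be read and edited by hand, which is the sense of "globally interpretable" used in the abstract.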


