Evaluating the Utility of Hand-crafted Features in Sequence Labelling

08/28/2018
by Minghao Wu, et al.

Conventional wisdom is that hand-crafted features are redundant for deep learning models, as such models already learn adequate representations of text automatically from corpora. In this work, we test this claim by proposing a new method for exploiting hand-crafted features as part of a novel hybrid learning approach that incorporates a feature auto-encoder loss component. We evaluate on the task of named entity recognition (NER), where we show that including manual features for part-of-speech, word shapes and gazetteers can improve the performance of a neural CRF model. We obtain an F1 of 91.89 on the CoNLL-2003 English shared task, which significantly outperforms a collection of highly competitive baseline models. We also present an ablation study showing the importance of auto-encoding over using features as either inputs or outputs alone, and moreover show that including the auto-encoder components reduces training requirements to 60% while retaining the same predictive accuracy.
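
The abstract describes training a neural CRF tagger together with an auto-encoder that reconstructs the hand-crafted features. A minimal sketch of such a combined objective follows. It assumes the features are reconstructed with a per-token cross-entropy over a single categorical feature (e.g. a word-shape class) and that this term is added to the CRF negative log-likelihood with a hypothetical weight `feature_weight`. The function names are illustrative, the emission scores would come from a neural encoder that is not shown, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def crf_nll(emissions: List[List[float]], transitions: List[List[float]], tags: List[int]) -> float:
    """Negative log-likelihood of a gold tag sequence under a linear-chain CRF.

    emissions[t][j]   -- score of tag j at token t (assumed output of a neural encoder)
    transitions[i][j] -- score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    # Score of the gold path: emissions plus transitions between consecutive tags.
    gold = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        gold += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    # Forward algorithm: alpha[j] is the log-sum of scores of all prefixes ending in tag j.
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(n_tags)]) + emissions[t][j]
            for j in range(n_tags)
        ]
    log_partition = logsumexp(alpha)
    return log_partition - gold


def feature_reconstruction_loss(pred_probs: List[List[float]], gold_features: List[int]) -> float:
    """Cross-entropy of re-predicting one categorical hand-crafted feature per token
    (hypothetical stand-in for the feature auto-encoder's decoder loss)."""
    return -sum(math.log(probs[g]) for probs, g in zip(pred_probs, gold_features))


def hybrid_loss(emissions, transitions, tags, pred_probs, gold_features, feature_weight=1.0):
    """CRF tagging loss plus a weighted feature-reconstruction term (weighting is an assumption)."""
    return crf_nll(emissions, transitions, tags) + feature_weight * feature_reconstruction_loss(
        pred_probs, gold_features
    )
```

As a sanity check, with all-zero emission and transition scores over 2 tags and 3 tokens, every one of the 2^3 = 8 tag paths scores 0, so `crf_nll` returns log 8 (about 2.079) for any gold sequence. Keeping the reconstruction term as a separate additive loss makes it easy to switch it off, which mirrors the kind of ablation the abstract mentions.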
