LFTK: Handcrafted Features in Computational Linguistics

05/25/2023
by   Bruce W. Lee, et al.

Past research has identified a rich set of handcrafted linguistic features that can potentially assist various tasks. However, their sheer number makes it difficult to select and use existing handcrafted features effectively. Implementations are also inconsistent across research works, and there is no categorization scheme or generally accepted set of feature names, which creates unwanted confusion. In addition, most existing handcrafted feature extraction libraries are either not open-source or not actively maintained, so a researcher often has to build an extraction system from the ground up. We collect and categorize more than 220 popular handcrafted features grounded in past literature. We then conduct a correlation analysis on several task-specific datasets and report the potential use cases of each feature. Lastly, we devise a multilingual handcrafted linguistic feature extraction system designed to be systematically expandable. We open-source our system to give the public access to a rich set of pre-implemented handcrafted features. Our system is named LFTK and is the largest of its kind. Find it at github.com/brucewlee/lftk.
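To make the idea of correlating handcrafted features with a task label concrete, here is a minimal sketch in plain Python. It does not use LFTK or reproduce its API. The function names (`extract_features`, `pearson`, `correlate_features`), the five feature names, and the toy texts and labels are all hypothetical, chosen only for illustration, and Pearson correlation is an assumption because the abstract does not say which correlation measure is used.

```python
import math
import re


def extract_features(text):
    """Compute a few illustrative surface features for one text.

    These feature names are hypothetical and are NOT LFTK's feature keys.
    """
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_unique = len(set(words))
    return {
        "total_words": n_words,
        "unique_words": n_unique,
        "type_token_ratio": n_unique / n_words if n_words else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlate_features(texts, labels):
    """Correlate every extracted feature with a numeric task label."""
    rows = [extract_features(t) for t in texts]
    return {name: pearson([r[name] for r in rows], labels) for name in rows[0]}


if __name__ == "__main__":
    # Toy, made-up data: texts with an invented difficulty label.
    texts = [
        "The cat sat.",
        "The dog ran to the park and played.",
        "Researchers systematically categorize numerous linguistic properties.",
    ]
    labels = [1, 2, 3]
    for name, r in correlate_features(texts, labels).items():
        print(f"{name}: {r:+.2f}")
```

A feature whose correlation with the label is close to +1 or -1 on a given dataset is a candidate for that task, while one near 0 is probably less useful there. A real study would use far more texts and features than this toy example.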


