Developing a comprehensive framework for multimodal feature extraction

02/20/2017
by Quinten McNamara, et al.

Feature extraction is a critical component of many applied data science workflows. In recent years, rapid advances in artificial intelligence and machine learning have led to an explosion of feature extraction tools and services that allow data scientists to cheaply and effectively annotate their data along a vast array of dimensions, ranging from detecting faces in images to analyzing the sentiment expressed in coherent text. Unfortunately, the proliferation of powerful feature extraction services has been mirrored by a corresponding expansion in the number of distinct interfaces to those services. In a world where nearly every new service has its own API, documentation, and/or client library, data scientists who need to combine diverse features obtained from multiple sources are often forced to write and maintain ever more elaborate feature extraction pipelines. To address this challenge, we introduce a new open-source framework for comprehensive multimodal feature extraction. Pliers is an open-source Python package that supports standardized annotation of diverse data types (video, images, audio, and text), and is expressly designed with both ease of use and extensibility in mind. Users can apply a wide range of pre-existing feature extraction tools to their data in just a few lines of Python code, and can also easily add their own custom extractors by writing modular classes. A graph-based API enables rapid development of complex feature extraction pipelines that output results in a single, standardized format. We describe the package's architecture, detail its major advantages over previous feature extraction toolboxes, and use a sample application to a large functional MRI dataset to illustrate how pliers can significantly reduce the time and effort required to construct sophisticated feature extraction workflows while increasing code clarity and maintainability.
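
The idea of modular extractor classes that all emit one standardized result format can be illustrated with a minimal, self-contained sketch. This is not the pliers API: the names `Stim`, `Extractor`, `LengthExtractor`, `WordCountExtractor`, and `run_all` are hypothetical and chosen only for illustration, and the row layout is an assumption rather than the package's actual output format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stim:
    """Hypothetical input item: a modality tag plus its raw data."""
    modality: str
    data: str


class Extractor:
    """Hypothetical base class. Subclasses set `modality` and implement `_extract`."""
    modality = "text"

    def _extract(self, stim: Stim) -> Dict[str, float]:
        raise NotImplementedError

    def transform(self, stim: Stim) -> List[dict]:
        # Reject inputs of the wrong modality before extracting anything.
        if stim.modality != self.modality:
            raise TypeError(
                f"{type(self).__name__} expects {self.modality!r}, got {stim.modality!r}"
            )
        # Every extractor returns rows with the same three keys.
        return [
            {"extractor": type(self).__name__, "feature": name, "value": value}
            for name, value in self._extract(stim).items()
        ]


class LengthExtractor(Extractor):
    """Custom extractor: number of characters in a text input."""
    def _extract(self, stim: Stim) -> Dict[str, float]:
        return {"n_chars": float(len(stim.data))}


class WordCountExtractor(Extractor):
    """Custom extractor: number of whitespace-separated tokens."""
    def _extract(self, stim: Stim) -> Dict[str, float]:
        return {"n_words": float(len(stim.data.split()))}


def run_all(extractors: List[Extractor], stim: Stim) -> List[dict]:
    """Apply each extractor in turn and concatenate the standardized rows."""
    rows: List[dict] = []
    for extractor in extractors:
        rows.extend(extractor.transform(stim))
    return rows


rows = run_all(
    [LengthExtractor(), WordCountExtractor()],
    Stim(modality="text", data="hello multimodal world"),
)
```

Because every extractor in this sketch returns rows with the same keys, results from different tools can be concatenated without per-tool glue code, which is the kind of standardization the abstract describes.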

