A Hierarchical Approach to exploiting Multiple Datasets from TalkBank

06/21/2023
by Man Ho Wong, et al.

TalkBank is an online database that facilitates the sharing of linguistics research data. However, TalkBank's existing API has limited data filtering and batch processing capabilities. To overcome these limitations, this paper introduces a pipeline framework that employs a hierarchical search approach, enabling efficient selection of data under complex criteria. The approach begins with a quick preliminary screening of the corpora relevant to a researcher's needs, followed by an in-depth search within those corpora for target data that meet specific criteria. The identified files are then indexed, providing easier access for future analysis. Furthermore, the paper demonstrates how data from different studies curated with the framework can be integrated by standardizing and cleaning their metadata, allowing researchers to extract insights from a large, integrated dataset. Although designed for TalkBank, the framework can also be adapted to process data from other open-science platforms.
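The two-stage search and indexing described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not call TalkBank's API: the `Corpus` and `FileRecord` classes, the function names, the sample corpora, and the filter fields (language and participant age in months) are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical file-level metadata for one transcript."""
    path: str
    language: str
    age_months: int


@dataclass
class Corpus:
    """Hypothetical corpus-level metadata plus the files it contains."""
    name: str
    languages: set
    files: list = field(default_factory=list)


def screen_corpora(corpora, language):
    """Stage 1: quick screening that inspects only corpus-level metadata."""
    return [c for c in corpora if language in c.languages]


def search_files(corpora, language, min_age, max_age):
    """Stage 2: in-depth, file-level search within the screened corpora."""
    return [
        (c.name, f)
        for c in corpora
        for f in c.files
        if f.language == language and min_age <= f.age_months <= max_age
    ]


def build_index(matches):
    """Index matched files by corpus name for later access."""
    index = {}
    for corpus_name, record in matches:
        index.setdefault(corpus_name, []).append(record.path)
    return index


def hierarchical_search(corpora, language, min_age, max_age):
    """Run screening, then the detailed search, then indexing."""
    candidates = screen_corpora(corpora, language)
    return build_index(search_files(candidates, language, min_age, max_age))


# Made-up sample data (assumption: not real TalkBank corpora).
CORPORA = [
    Corpus("CorpusA", {"eng"}, [
        FileRecord("a/01.cha", "eng", 24),
        FileRecord("a/02.cha", "eng", 60),
    ]),
    Corpus("CorpusB", {"eng", "spa"}, [
        FileRecord("b/01.cha", "spa", 30),
        FileRecord("b/02.cha", "eng", 36),
    ]),
    Corpus("CorpusC", {"fra"}, [
        FileRecord("c/01.cha", "fra", 24),
    ]),
]

index = hierarchical_search(CORPORA, "eng", min_age=18, max_age=48)
print(index)
```

In this sketch the first stage discards `CorpusC` without inspecting any of its files, so the more expensive file-level filter only runs over corpora that could contain matches.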
