Named Entity Recognition for Audio De-Identification

04/26/2022
by Guillaume Baril, et al.

Data anonymization is often carried out by humans. Automating it would reduce the cost and time the task requires. This paper presents a pipeline that automates the anonymization of audio data in French. The pipeline takes audio files with their transcriptions and removes the named entities (NEs) present in the audio. It consists of a forced aligner, which aligns the words of an audio transcript with the speech, and a model that performs named entity recognition (NER). The audio segments that correspond to NEs are then replaced with silence to anonymize the audio. We compared forced aligners and NER models to find the best ones for our scenario. We evaluated our pipeline on a small hand-annotated dataset, achieving an F1 score of 0.769. This result shows that automating this task is feasible.
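
The final step described in the abstract, replacing NE audio segments with silence, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `AlignedWord` type, the `silence_entities` function, and the representation of audio as a plain list of samples are all assumptions made for this example. It presumes that a forced aligner has already produced word timestamps and that an NER model has already flagged which words are named entities.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AlignedWord:
    """Hypothetical record combining aligner and NER output for one word."""
    text: str
    start: float      # word start time in seconds (from the forced aligner)
    end: float        # word end time in seconds (from the forced aligner)
    is_entity: bool   # True if the NER model tagged this word as part of an NE


def silence_entities(
    samples: Sequence[float],
    sample_rate: int,
    words: List[AlignedWord],
) -> List[float]:
    """Return a copy of `samples` with every named-entity span set to silence."""
    out = list(samples)
    for word in words:
        if not word.is_entity:
            continue
        # Convert the time span to sample indices, clamped to the signal length.
        first = max(0, int(round(word.start * sample_rate)))
        last = min(len(out), int(round(word.end * sample_rate)))
        for i in range(first, last):
            out[i] = 0.0
    return out


# Toy usage: one second of constant signal at 10 samples per second.
words = [
    AlignedWord("bonjour", 0.0, 0.3, is_entity=False),
    AlignedWord("Marie", 0.3, 0.6, is_entity=True),
]
anonymized = silence_entities([1.0] * 10, sample_rate=10, words=words)
```

In this toy case only the samples covering "Marie" are zeroed, while the rest of the signal is left untouched. A real pipeline would likely need to handle alignment errors, for instance by padding each span slightly, but the abstract does not say how the authors treat that.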

research
03/17/2019

Audio De-identification: A New Entity Recognition Task

Named Entity Recognition (NER) has been mostly studied in the context of...
research
05/30/2018

End-to-end named entity extraction from speech

Named entity recognition (NER) is among SLU tasks that usually extract s...
research
04/25/2020

A Named Entity Based Approach to Model Recipes

Traditional cooking recipes follow a structure which can be modelled ver...
research
05/18/2020

A Semantically Enriched Dataset based on Biomedical NER for the COVID19 Open Research Dataset Challenge

Research into COVID-19 is a big challenge and highly relevant at the mom...
research
03/01/2023

DTW-SiameseNet: Dynamic Time Warped Siamese Network for Mispronunciation Detection and Correction

Personal Digital Assistants (PDAs) - such as Siri, Alexa and Google Assi...
research
05/22/2023

Taxonomy Expansion for Named Entity Recognition

Training a Named Entity Recognition (NER) model often involves fixing a ...
research
04/16/2023

EasyNER: A Customizable Easy-to-Use Pipeline for Deep Learning- and Dictionary-based Named Entity Recognition from Medical Text

Medical research generates a large number of publications with the PubMe...