Efficient Ensemble Architecture for Multimodal Acoustic and Textual Embeddings in Punctuation Restoration using Time-Delay Neural Networks

02/26/2023
by Xing Yi Liu, et al.

Punctuation restoration plays an essential role in post-processing the output of automatic speech recognition, and model efficiency is a key requirement for this task. To that end, we present EfficientPunct, an ensemble method with a multimodal time-delay neural network that outperforms the current best model by 1.0 F1 points while using less than a tenth of its parameters to process embeddings. We streamline a speech recognizer to efficiently output hidden layer latent vectors as audio embeddings for punctuation restoration, and we use BERT to extract meaningful text embeddings. By using forced alignment and temporal convolutions, we eliminate the need for multi-head attention-based fusion, which greatly increases computational efficiency while also raising performance. EfficientPunct sets a new state of the art in both performance and efficiency, with an ensemble that weights BERT's purely language-based predictions slightly more than the multimodal network's predictions.
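The ensemble step described in the abstract can be pictured as a convex combination of two per-token class distributions. The following is a minimal sketch under stated assumptions, not the authors' implementation: the label set `LABELS`, the function names `ensemble` and `predict`, and the default `bert_weight=0.6` are all hypothetical, chosen only to illustrate "slightly more" weight on BERT's predictions. The abstract does not state the actual weight or the punctuation classes.

```python
from typing import List, Sequence

# Hypothetical label set; the abstract does not list the punctuation classes.
LABELS = ["none", "comma", "period", "question"]


def ensemble(
    p_bert: Sequence[float],
    p_multimodal: Sequence[float],
    bert_weight: float = 0.6,  # assumed value, only "slightly more" than 0.5
) -> List[float]:
    """Return the weighted average of two class distributions for one token."""
    if len(p_bert) != len(p_multimodal):
        raise ValueError("both distributions must cover the same classes")
    if not 0.0 <= bert_weight <= 1.0:
        raise ValueError("bert_weight must lie in [0, 1]")
    return [
        bert_weight * b + (1.0 - bert_weight) * m
        for b, m in zip(p_bert, p_multimodal)
    ]


def predict(
    p_bert: Sequence[float],
    p_multimodal: Sequence[float],
    bert_weight: float = 0.6,
) -> str:
    """Pick the label with the highest ensembled probability."""
    probs = ensemble(p_bert, p_multimodal, bert_weight)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best]


# Illustrative inputs: the text-only model favors a comma, while the
# multimodal model leans toward a period.
p_text = [0.1, 0.6, 0.2, 0.1]
p_audio_text = [0.2, 0.3, 0.4, 0.1]
label = predict(p_text, p_audio_text)
```

Because the weights sum to one, the combined vector remains a valid probability distribution whenever both inputs are. In practice the weight would be tuned on held-out data rather than fixed by hand.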

