Siddharth Dalmia

09/19/2023

Multimodal Modeling For Spoken Language Identification

Spoken language identification refers to the task of automatically predi...

Shikhar Bharadwaj, et al.

04/10/2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitat...

Brian Yan, et al.

11/11/2022

Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation

The black-box nature of end-to-end speech translation (E2E ST) systems m...

Motoi Omachi, et al.

11/10/2022

A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding

Collecting sufficient labeled data for spoken language understanding (SL...

Yifan Peng, et al.

10/27/2022

Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

End-to-end spoken language understanding (SLU) systems are gaining popul...

Siddhant Arora, et al.

10/11/2022

CTC Alignments Improve Autoregressive Translation

Connectionist Temporal Classification (CTC) is a widely used approach fo...

Brian Yan, et al.

07/14/2022

Two-Pass Low Latency End-to-End Spoken Language Understanding

End-to-end (E2E) models are becoming increasingly popular for spoken lan...

Siddhant Arora, et al.

07/06/2022

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding

Conformer has proven to be effective in many speech processing tasks. It...

Yifan Peng, et al.

06/07/2022

LegoNN: Building Modular Encoder-Decoder Models

State-of-the-art encoder-decoder models (e.g. for machine translation (M...

Siddharth Dalmia, et al.

05/25/2022

FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

We introduce FLEURS, the Few-shot Learning Evaluation of Universal Repre...

Alexis Conneau, et al.

11/29/2021

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

Conversational bilingual speech encompasses three types of utterances: t...

Brian Yan, et al.

11/29/2021

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

As Automatic Speech Processing (ASR) systems are getting better, there i...

Siddhant Arora, et al.

09/27/2021

Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates

The multi-decoder (MD) end-to-end speech translation model has demonstra...

Hirofumi Inaguma, et al.

07/24/2021

Differentiable Allophone Graphs for Language-Universal Speech Recognition

Building language-universal speech recognition systems entails producing...

Brian Yan, et al.

07/01/2021

ESPnet-ST IWSLT 2021 Offline Speech Translation System

This paper describes the ESPnet-ST group's IWSLT 2021 submission in the ...

Hirofumi Inaguma, et al.

06/29/2021

Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding

Decomposable tasks are complex and comprise of a hierarchy of sub-tasks....

Siddhant Arora, et al.

05/02/2021

Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks

End-to-end approaches for sequence tasks are becoming increasingly popul...

Siddharth Dalmia, et al.

11/30/2020

Transformer-Transducers for Code-Switched Speech Recognition

We live in a world where 60 languages fluently. Members of these communi...

Siddharth Dalmia, et al.

02/26/2020

Universal Phone Recognition with a Multilingual Allophone System

Multilingual models can improve language processing, particularly for lo...

Xinjian Li, et al.

02/26/2020

Towards Zero-shot Learning for Automatic Phonemic Transcription

Automatic phonemic transcription tools are useful for low-resource langu...

Xinjian Li, et al.

11/09/2019

Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models

Inspired by modular software design principles of independence, intercha...

Siddharth Dalmia, et al.

08/02/2019

SANTLR: Speech Annotation Toolkit for Low Resource Languages

While low resource speech recognition has attracted a lot of attention f...

Xinjian Li, et al.

08/02/2019

Multilingual Speech Recognition with Corpus Relatedness Sampling

Multilingual acoustic models have been successfully applied to low-resou...

Xinjian Li, et al.

07/24/2019

Cross-Attention End-to-End ASR for Two-Party Conversations

We present an end-to-end speech recognition model that learns interactio...

Suyoun Kim, et al.

06/27/2019

Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion

We present a novel conversational-context aware end-to-end speech recogn...

Suyoun Kim, et al.

02/24/2019

The ARIEL-CMU Systems for LoReHLT18

This paper describes the ARIEL-CMU submissions to the Low Resource Human...

Aditi Chaudhary, et al.

02/20/2019

Phoneme Level Language Models for Sequence Based Low Resource ASR

Building multilingual and crosslingual models help bring different langu...

Siddharth Dalmia, et al.

07/28/2018

Domain Robust Feature Extraction for Rapid Low Resource ASR Development

Developing a practical speech recognizer for a low resource language is ...

Siddharth Dalmia, et al.

02/21/2018

Sequence-based Multi-lingual Low Resource Speech Recognition

Techniques for multi-lingual and cross-lingual speech recognition can he...

Siddharth Dalmia, et al.