SONAR: Sentence-Level Multimodal and Language-Agnostic Representations

08/22/2023
by Paul-Ambroise Duquenne, et al.

We introduce SONAR, a new multilingual and multimodal fixed-size sentence embedding space. Our single text encoder, covering 200 languages, substantially outperforms existing sentence embeddings such as LASER3 and LaBSE on the xsim and xsim++ multilingual similarity search tasks. Speech segments can be embedded in the same SONAR embedding space using language-specific speech encoders trained in a teacher-student setting on speech transcription data. Our encoders outperform existing speech encoders on similarity search tasks. We also provide a text decoder for 200 languages, which allows us to perform text-to-text and speech-to-text machine translation, including for zero-shot language and modality combinations. Our text-to-text results are competitive with the state-of-the-art NLLB 1B model, despite the fixed-size bottleneck representation. Our zero-shot speech-to-text translation results compare favorably with strong supervised baselines such as Whisper.
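
To make the similarity search evaluation concrete, here is a minimal sketch of an xsim-style error rate over aligned sentence embeddings. It is an illustration under stated assumptions, not the SONAR implementation: the function names (`cosine`, `xsim_error_rate`) and the 3-dimensional toy vectors are hypothetical, and plain cosine nearest-neighbor search stands in for whatever scoring the actual benchmark uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def xsim_error_rate(src, tgt):
    """Fraction of source embeddings whose nearest target embedding
    (by cosine similarity) is not the aligned one at the same index.

    Assumes src[i] and tgt[i] embed translations of the same sentence.
    """
    errors = 0
    for i, s in enumerate(src):
        # Index of the target embedding most similar to this source.
        best = max(range(len(tgt)), key=lambda j: cosine(s, tgt[j]))
        if best != i:
            errors += 1
    return errors / len(src)


# Toy stand-ins for fixed-size sentence embeddings in two languages
# (hypothetical values, far smaller than any real embedding).
src = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
tgt = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.9, 0.1, 0.3]]

print(xsim_error_rate(src, tgt))
```

A lower error rate means that translations land closer together in the shared space. Because speech segments are embedded in the same space, the same procedure applies when the source vectors come from a speech encoder and the target vectors from the text encoder.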


