Visual Grounding in Video for Unsupervised Word Translation

03/11/2020
by Gunnar A. Sigurdsson, et al.

There are thousands of actively spoken languages on Earth, but a single visual world. Grounding in this visual world has the potential to bridge the gap between all these languages. Our goal is to use visual grounding to improve unsupervised word mapping between languages. The key idea is to establish a common visual representation between two languages by learning embeddings from unpaired instructional videos, each narrated in its own native language. Given this shared embedding, we demonstrate that (i) we can map words between the languages, particularly the 'visual' words; (ii) the shared embedding provides a good initialization for existing unsupervised text-based word translation techniques, forming the basis for our proposed hybrid visual-text mapping algorithm, MUVE; and (iii) our approach achieves superior performance by addressing the shortcomings of text-based methods: it is more robust, handles datasets with less commonality, and is applicable to low-resource languages. We apply these methods to translate words from English to French, Korean, and Japanese, all without any parallel corpora and simply by watching many videos of people speaking while doing things.
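
The abstract names two steps that can be illustrated generically. First, once both languages share an embedding space, words can be mapped by nearest-neighbor retrieval. Second, that mapping can seed an existing text-based refinement procedure. The sketch below is not the authors' MUVE algorithm or their video model. It is a minimal illustration, assuming toy random vectors in place of learned embeddings, of cosine nearest-neighbor dictionary induction followed by iterative orthogonal Procrustes refinement. That refinement is the step used by text-based methods such as MUSE, shown here with plain cosine retrieval rather than MUSE's CSLS criterion. All function names (`normalize`, `nearest_neighbor_map`, `procrustes`, `refine`) and the toy data are hypothetical.

```python
import numpy as np


def normalize(E):
    # Scale each row to unit length so dot products become cosine similarities.
    return E / np.linalg.norm(E, axis=1, keepdims=True)


def nearest_neighbor_map(src, tgt):
    # For each source word (row of src), return the index of the target word
    # (row of tgt) with the highest cosine similarity.
    sims = normalize(src) @ normalize(tgt).T
    return sims.argmax(axis=1)


def procrustes(X, Y):
    # Orthogonal W minimizing ||X @ W - Y||_F, where row i of X is paired
    # with row i of Y. Closed form: W = U @ Vt, with U, S, Vt = svd(X.T @ Y).
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def refine(src, tgt, n_iters=5):
    # Assumption: src and tgt already lie in a roughly shared space, so the
    # identity map is a usable initialization. Each iteration induces a
    # dictionary by nearest neighbors, then re-fits an orthogonal map to it.
    W = np.eye(src.shape[1])
    for _ in range(n_iters):
        pairs = nearest_neighbor_map(src @ W, tgt)
        W = procrustes(src, tgt[pairs])
    return W, nearest_neighbor_map(src @ W, tgt)


# Toy data (hypothetical): 50 "source words" in 8 dimensions. The "target
# language" is a slightly rotated, shuffled copy, standing in for a shared
# space that is only approximately aligned.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 8))
Q, R = np.linalg.qr(np.eye(8) + 0.05 * rng.normal(size=(8, 8)))
Q = Q * np.sign(np.diag(R))  # flip column signs so Q stays close to identity
perm = rng.permutation(50)
tgt = (src @ Q)[perm]
inv = np.argsort(perm)  # ground truth: source word i translates to target inv[i]

W, mapping = refine(src, tgt)
print("accuracy:", np.mean(mapping == inv))
```

Starting from the identity map stands in for the "good initialization" that a shared space is meant to provide. From a poor starting point, the alternating loop can converge to a wrong dictionary, so the quality of the initial alignment matters.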
