Vishrav Chaudhary

research

∙ 05/23/2023

DUBLIN – Document Understanding By Language-Image Network

Visual document understanding is a complex task that involves analyzing ...

0 Kriti Aggarwal, et al. ∙

research

∙ 02/27/2023

Language Is Not All You Need: Aligning Perception with Language Models

A big convergence of language, multimodal perception, action, and world ...

0 Shaohan Huang, et al. ∙

research

∙ 01/27/2023

Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation

Language models have steadily increased in size over the past few years....

0 Jessica Huynh, et al. ∙

research

∙ 12/20/2022

A Length-Extrapolatable Transformer

Position modeling plays a critical role in Transformers. In this paper, ...

0 Yutao Sun, et al. ∙

research

∙ 11/23/2022

TorchScale: Transformers at Scale

Large Transformers have achieved state-of-the-art performance across man...

0 Shuming Ma, et al. ∙

research

∙ 11/16/2022

Holistic Evaluation of Language Models

Language models (LMs) are becoming the foundation for almost all major l...

21 Percy Liang, et al. ∙

research

∙ 10/26/2022

Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning

In this paper, we elaborate upon recipes for building multilingual repre...

0 Barun Patra, et al. ∙

research

∙ 10/13/2022

Language Model Decoding as Likelihood-Utility Alignment

A critical component of a successful language generation pipeline is the...

11 Martin Josifoski, et al. ∙

research

∙ 10/12/2022

Foundation Transformers

A big convergence of model architectures across language, vision, speech...

26 Hongyu Wang, et al. ∙

research

∙ 04/29/2022

How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?

A multilingual tokenizer is a fundamental component of multilingual neur...

2 Shiyue Zhang, et al. ∙

research

∙ 03/25/2022

Data Selection Curriculum for Neural Machine Translation

Neural Machine Translation (NMT) models are typically trained on heterog...

10 Tasnim Mohiuddin, et al. ∙

research

∙ 02/27/2022

OCR Improves Machine Translation for Low-Resource Languages

We aim to investigate the performance of current OCR systems on low reso...

0 Oana Ignat, et al. ∙

research

∙ 12/20/2021

Few-shot Learning with Multilingual Language Models

Large-scale autoregressive language models such as GPT-3 are few-shot le...

8 Xi Victoria Lin, et al. ∙

research

∙ 10/15/2021

Alternative Input Signals Ease Transfer in Multilingual Machine Translation

Recent work in multilingual machine translation (MMT) has focused on the...

0 Simeng Sun, et al. ∙

research

∙ 09/17/2021

Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications

Sentence-level Quality estimation (QE) of machine translation is traditi...

0 Shuo Sun, et al. ∙

research

∙ 06/07/2021

LAWDR: Language-Agnostic Weighted Document Representations from Pre-trained Models

Cross-lingual document representations enable language understanding in ...

0 Hongyu Gong, et al. ∙

research

∙ 06/06/2021

The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

One of the biggest challenges hindering progress in low-resource and mul...

0 Naman Goyal, et al. ∙

research

∙ 05/31/2021

Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data

The scarcity of parallel data is a major obstacle for training high-qual...

7 Wei-Jen Ko, et al. ∙

research

∙ 04/18/2021

AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages

Pretrained multilingual models are able to perform cross-lingual transfe...

12 Abteen Ebrahimi, et al. ∙

research

∙ 02/08/2021

Quality Estimation without Human-labeled Data

Quality estimation aims to measure the quality of translated content wit...

0 Yi-Lin Tuan, et al. ∙

research

∙ 10/21/2020

Beyond English-Centric Multilingual Machine Translation

Existing work in translation demonstrated the potential of massively mul...

11 Angela Fan, et al. ∙

research

∙ 10/09/2020

MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset

We present MLQE-PE, a new dataset for Machine Translation (MT) Quality E...

0 Marina Fomicheva, et al. ∙

research

∙ 10/05/2020

Self-training Improves Pre-training for Natural Language Understanding

Unsupervised pre-training has led to much recent progress in natural lan...

0 Jingfei Du, et al. ∙

research

∙ 08/02/2020

Multilingual Translation with Extensible Multilingual Pretraining and Finetuning

Recent work demonstrates the potential of multilingual pretraining of cr...

0 Yuqing Tang, et al. ∙

research

∙ 05/21/2020

Unsupervised Quality Estimation for Neural Machine Translation

Quality Estimation (QE) is an important component in making Machine Tran...

0 Marina Fomicheva, et al. ∙

research

∙ 11/10/2019

A Massive Collection of Cross-Lingual Web-Document Pairs

Cross-lingual document alignment aims to identify pairs of documents in ...

0 Ahmed El-Kishky, et al. ∙

research

∙ 11/05/2019

Unsupervised Cross-lingual Representation Learning at Scale

This paper shows that pretraining multilingual language models at scale ...

0 Alexis Conneau, et al. ∙

research

∙ 11/01/2019

CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data

Pre-training text representations have led to significant improvements i...

13 Guillaume Wenzek, et al. ∙

research

∙ 10/15/2019

Facebook AI's WAT19 Myanmar-English Translation Task Submission

This paper describes Facebook AI's submission to the WAT 2019 Myanmar-En...

0 Peng-Jen Chen, et al. ∙

research

∙ 07/10/2019

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

We present an approach based on multilingual sentence embeddings to auto...

0 Holger Schwenk, et al. ∙

research

∙ 06/20/2019

Low-Resource Corpus Filtering using Multilingual Sentence Embeddings

In this paper, we describe our submission to the WMT19 low-resource para...

0 Vishrav Chaudhary, et al. ∙

research

∙ 02/04/2019

Two New Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English

The vast majority of language pairs in the world are low-resource becaus...

0 Francisco Guzmán, et al. ∙

