SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation

by Daniel Cer, et al.

Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, and dialog and conversational systems. The STS shared task is a venue for assessing the current state of the art. The 2017 task focuses on multilingual and cross-lingual pairs, with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in all language tracks. We summarize performance and review a selection of well-performing methods. Analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017).

