Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations

08/31/2019
by Mingda Chen, et al.

Prior work on pretrained sentence embeddings and benchmarks focuses on the capabilities of stand-alone sentences. We propose DiscoEval, a test suite of tasks to evaluate whether sentence representations include broader context information. We also propose a variety of training objectives that make use of natural annotations from Wikipedia to build sentence encoders capable of modeling discourse. We benchmark sentence encoders pretrained with our proposed training objectives, as well as other popular pretrained sentence encoders, on DiscoEval and other sentence evaluation tasks. Empirically, we show that these training objectives help to encode different aspects of information in document structure. Moreover, BERT and ELMo demonstrate strong performance on DiscoEval, with individual hidden layers showing different characteristics.
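The evaluation setup the abstract describes, feeding frozen sentence embeddings to a lightweight probing classifier, can be sketched roughly as follows. This is a hypothetical illustration, not the paper's implementation: the hash-based `encode` is a toy stand-in for a real pretrained encoder (e.g. BERT or ELMo hidden states), and `pair_features` shows one common way to featurize a sentence pair for a discourse probe such as binary sentence ordering.

```python
import hashlib

DIM = 32  # toy embedding dimension; real encoders use hundreds of dims


def encode(sentence):
    """Toy stand-in for a frozen pretrained sentence encoder: a
    deterministic hash-based bag-of-words vector. In a real DiscoEval-style
    setup this would be replaced by encoder activations."""
    vec = [0.0] * DIM
    for tok in sentence.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    return vec


def pair_features(u, v):
    """Features for a sentence-pair probe (e.g. deciding whether two
    adjacent sentences appear in their original order): concatenation of
    both embeddings plus their elementwise product, a common lightweight
    choice when probing frozen representations."""
    return u + v + [a * b for a, b in zip(u, v)]


s1 = encode("The committee met on Tuesday.")
s2 = encode("It adjourned without a vote.")
feats = pair_features(s1, s2)
print(len(feats))  # 3 * DIM = 96
```

A linear classifier trained on such features (keeping the encoder frozen) then measures how much discourse information the representations already carry.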

Related research

Discourse Probing of Pretrained Language Models (04/13/2021)
Existing work on probing of pretrained language models (LMs) has predomi...

Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling (07/16/2023)
Modeling discourse – the linguistic phenomena that go beyond individual ...

Towards Structure-aware Paraphrase Identification with Phrase Alignment Using Sentence Encoders (10/11/2022)
Previous works have demonstrated the effectiveness of utilising pre-trai...

Discourse-Based Objectives for Fast Unsupervised Sentence Representation Learning (04/23/2017)
This work presents a novel objective function for the unsupervised train...

Randomized Deep Structured Prediction for Discourse-Level Processing (01/25/2021)
Expressive text encoders such as RNNs and Transformer Networks have been...

Higher-order Comparisons of Sentence Encoder Representations (09/01/2019)
Representational Similarity Analysis (RSA) is a technique developed by n...

CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals (03/16/2022)
We propose a framework to modularize the training of neural language mod...
