Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language Models

04/11/2021
by James Y. Huang, et al.

Pre-trained language models have achieved huge success on a wide range of NLP tasks. However, contextual representations from pre-trained models contain entangled semantic and syntactic information, and therefore cannot be directly used to derive useful semantic sentence embeddings for some tasks. Paraphrase pairs offer an effective way of learning the distinction between semantics and syntax, as they naturally share semantics and often vary in syntax. In this work, we present ParaBART, a semantic sentence embedding model that learns to disentangle semantics and syntax in sentence embeddings obtained by pre-trained language models. ParaBART is trained to perform syntax-guided paraphrasing, based on a source sentence that shares semantics with the target paraphrase, and a parse tree that specifies the target syntax. In this way, ParaBART learns disentangled semantic and syntactic representations from their respective inputs with separate encoders. Experiments in English show that ParaBART outperforms state-of-the-art sentence embedding models on unsupervised semantic similarity tasks. Additionally, we show that our approach can effectively remove syntactic information from semantic sentence embeddings, leading to better robustness against syntactic variation on downstream semantic tasks.
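The abstract gives no implementation details; the following is a minimal, hypothetical sketch of the dual-encoder/decoder setup it describes, using plain PyTorch Transformer layers in place of pre-trained BART. All module names, sizes, and the mean-pooling choice are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch of the ParaBART-style architecture described in the abstract:
# a semantic encoder over the source sentence, a separate syntactic encoder
# over a linearized target parse tree, and a decoder that generates the
# target paraphrase from both. Hypothetical names and sizes; plain PyTorch
# Transformer layers stand in for the pre-trained BART weights.
import torch
import torch.nn as nn


class ParaBartSketch(nn.Module):
    def __init__(self, vocab_size=30000, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Separate encoders, so semantics and syntax come from different inputs.
        self.semantic_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.syntax_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, parse_ids, tgt_ids):
        # Semantic sentence embedding: pooled representation of the source sentence
        # (mean pooling is an assumption here).
        sem = self.semantic_encoder(self.embed(src_ids))
        sem_emb = sem.mean(dim=1, keepdim=True)           # (B, 1, d_model)
        # Syntactic representation of the linearized target parse tree.
        syn = self.syntax_encoder(self.embed(parse_ids))  # (B, P, d_model)
        # The decoder must reconstruct the target paraphrase from the pooled
        # semantic embedding plus the syntax encoding, which pushes syntactic
        # detail out of the semantic sentence embedding.
        memory = torch.cat([sem_emb, syn], dim=1)
        tgt = self.embed(tgt_ids)
        causal = torch.triu(
            torch.full((tgt.size(1), tgt.size(1)), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(out), sem_emb.squeeze(1)


# Toy usage: random token ids stand in for a real tokenizer and parser.
model = ParaBartSketch()
src = torch.randint(0, 30000, (2, 12))    # source sentence
parse = torch.randint(0, 30000, (2, 20))  # linearized target parse tree
tgt = torch.randint(0, 30000, (2, 14))    # target paraphrase
logits, sentence_embedding = model(src, parse, tgt)
print(logits.shape, sentence_embedding.shape)  # (2, 14, 30000) (2, 256)
```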


Related research

04/02/2019
A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations
We propose a generative model for a sentence that uses two latent variab...

01/09/2021
Learning Better Sentence Representation with Syntax Information
Sentence semantic understanding is a key topic in the field of natural l...

06/04/2023
Sen2Pro: A Probabilistic Perspective to Sentence Embedding from Pre-trained Language Model
Sentence embedding is one of the most fundamental tasks in Natural Langu...

01/21/2023
Syntax-guided Neural Module Distillation to Probe Compositionality in Sentence Embeddings
Past work probing compositionality in sentence embedding models faces is...

09/16/2020
Mimic and Conquer: Heterogeneous Tree Structure Distillation for Syntactic NLP
Syntax has been shown useful for various NLP tasks, while existing work ...

03/16/2023
SemDeDup: Data-efficient learning at web-scale through semantic deduplication
Progress in machine learning has been driven in large part by massive in...

10/24/2022
The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative
Construction Grammar (CxG) is a paradigm from cognitive linguistics emph...
