Adapting Sentence Transformers for the Aviation Domain

05/16/2023
by Liya Wang et al.

Learning effective sentence representations is crucial for many Natural Language Processing (NLP) tasks, including semantic search, semantic textual similarity (STS), and clustering. While multiple transformer models have been developed for sentence embedding learning, these models may not perform optimally when dealing with specialized domains like aviation, which has unique characteristics such as technical jargon, abbreviations, and unconventional grammar. Furthermore, the absence of labeled datasets makes it difficult to train models specifically for the aviation domain. To address these challenges, we propose a novel approach for adapting sentence transformers to the aviation domain. Our method is a two-stage process consisting of pre-training followed by fine-tuning. During pre-training, we use the Transformer-based Sequential Denoising Auto-Encoder (TSDAE) with aviation text data as input to improve the initial model performance. Subsequently, we fine-tune our models on a Natural Language Inference (NLI) dataset in the Sentence Bidirectional Encoder Representations from Transformers (SBERT) architecture to mitigate overfitting. Experimental results on several downstream tasks show that our adapted sentence transformers significantly outperform general-purpose transformers, demonstrating the effectiveness of our approach in capturing the nuances of the aviation domain. Overall, our work highlights the importance of domain-specific adaptation in developing high-quality NLP solutions for specialized industries like aviation.
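The two-stage recipe maps naturally onto the open-source sentence-transformers library: TSDAE pre-training on unlabeled domain text, then supervised fine-tuning on NLI triplets in the SBERT setup. The sketch below illustrates that pipeline under stated assumptions; the base model, hyperparameters, toy sentences, and output path are illustrative choices, not the authors' exact configuration or data.

```python
# Minimal sketch of the two-stage adaptation using sentence-transformers.
# Assumptions: bert-base-uncased as the base encoder, toy in-line data, and
# illustrative hyperparameters (none of these come from the paper).
from torch.utils.data import DataLoader
from sentence_transformers import (
    SentenceTransformer, InputExample, models, datasets, losses, util,
)

# --- Stage 1: TSDAE pre-training on unlabeled aviation text ---------------
# Build a SentenceTransformer from a general-purpose encoder with CLS pooling.
word_embedding = models.Transformer("bert-base-uncased")
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_embedding, pooling])

# In practice this would be a large corpus of raw aviation sentences.
aviation_sentences = [
    "Aircraft experienced uncommanded yaw during climb.",
    "ATC cleared the flight for the ILS approach to runway 27L.",
]

# DenoisingAutoEncoderDataset corrupts each sentence (token deletion); the
# loss trains the encoder so a tied decoder can reconstruct the original.
train_data = datasets.DenoisingAutoEncoderDataset(aviation_sentences)
train_loader = DataLoader(train_data, batch_size=8, shuffle=True)
tsdae_loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path="bert-base-uncased", tie_encoder_decoder=True
)
model.fit(
    train_objectives=[(train_loader, tsdae_loss)],
    epochs=1,
    scheduler="constantlr",
    optimizer_params={"lr": 3e-5},
    weight_decay=0,
)

# --- Stage 2: supervised fine-tuning on NLI in the SBERT setup ------------
# Each example is (anchor, entailment, contradiction); the ranking loss pulls
# anchor/entailment embeddings together and pushes other pairs apart.
nli_examples = [
    InputExample(texts=[
        "A plane is taking off.",               # anchor
        "An airplane departs the runway.",      # entailment (positive)
        "The aircraft is parked at the gate.",  # contradiction (negative)
    ]),
]
nli_loader = DataLoader(nli_examples, batch_size=8, shuffle=True)
nli_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(nli_loader, nli_loss)], epochs=1)

model.save("aviation-sbert")  # hypothetical output path

# Quick sanity check on an STS-style downstream use: embed and compare.
emb = model.encode(["gear extension", "landing gear deployment"])
print(util.cos_sim(emb[0], emb[1]))
```

The CLS pooling and constant learning rate in stage 1 follow the settings recommended in the TSDAE paper; stage 2 fine-tuning on NLI triplets is the standard SBERT training setup that the abstract describes as a guard against overfitting to the small domain corpus.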

Related research:

08/19/2019 · Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models
Neural language representation models such as Bidirectional Encoder Repr...

06/09/2023 · FPDM: Domain-Specific Fast Pre-training Technique using Document-Level Metadata
Pre-training Transformers has shown promising results on open-domain and...

05/02/2022 · Paragraph-based Transformer Pre-training for Multi-Sentence Inference
Inference tasks such as answer sentence selection (AS2) or fact verifica...

04/14/2021 · TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
Learning sentence embeddings often requires large amount of labeled data...

04/28/2022 · Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers
Randomly masking and predicting word tokens has been a successful approa...

06/09/2021 · URLTran: Improving Phishing URL Detection Using Transformers
Browsers often include security features to detect phishing web pages. I...

12/09/2021 · Transferring BERT-like Transformers' Knowledge for Authorship Verification
The task of identifying the author of a text spans several decades and w...
