Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching

09/13/2022
by Kunbo Ding, et al.

Previous studies have shown that cross-lingual knowledge distillation can significantly improve the performance of pre-trained models on cross-lingual similarity matching tasks. However, the student model in this setup needs to be large; otherwise, its performance drops sharply, making it impractical to deploy on memory-limited devices. To address this issue, we delve into cross-lingual knowledge distillation and propose a multi-stage distillation framework for constructing a small but high-performing cross-lingual model. In our framework, contrastive learning, bottleneck, and parameter recurrent strategies are combined to prevent performance from being compromised during the compression process. The experimental results demonstrate that our method can compress the size of XLM-R and MiniLM by more than 50%, while the performance is only reduced by about 1%.
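To make the combination of distillation, contrastive learning, and a bottleneck more concrete, the following PyTorch sketch pairs a regression-style (MSE) distillation loss with an in-batch contrastive (InfoNCE) loss and a low-dimensional bottleneck projection in the student. It is a minimal illustration under our own assumptions (class and function names such as BottleneckStudent and distillation_step, the dimensions, mean pooling, and loss weights are hypothetical), not the authors' implementation of the full multi-stage framework, which additionally uses a parameter recurrent strategy.

```python
# Hedged sketch of cross-lingual knowledge distillation with a contrastive
# objective and a bottleneck projection. This is NOT the paper's released
# code; names, dimensions, and loss weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BottleneckStudent(nn.Module):
    """Small multilingual encoder followed by a low-dimensional bottleneck."""

    def __init__(self, encoder: nn.Module, enc_dim: int = 384, bottleneck_dim: int = 128):
        super().__init__()
        self.encoder = encoder                        # e.g. a truncated MiniLM (assumption)
        self.down = nn.Linear(enc_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, enc_dim)  # project back to the teacher's space

    def forward(self, **batch):
        hidden = self.encoder(**batch).last_hidden_state  # (B, T, enc_dim), HF-style output assumed
        pooled = hidden.mean(dim=1)                        # mean pooling over tokens
        return self.up(self.down(pooled))                  # (B, enc_dim)


def distillation_step(teacher, student, src_batch, tgt_batch,
                      tau: float = 0.05, alpha: float = 0.5):
    """One step on a batch of parallel sentences (src = teacher language,
    tgt = its translation): MSE distillation plus in-batch contrastive loss."""
    with torch.no_grad():
        t_emb = teacher(**src_batch)        # frozen teacher embeddings, shape (B, D)
    s_src = student(**src_batch)            # student embeddings for source sentences
    s_tgt = student(**tgt_batch)            # student embeddings for translations

    # Regression-style distillation: match the teacher in both languages.
    mse = F.mse_loss(s_src, t_emb) + F.mse_loss(s_tgt, t_emb)

    # In-batch contrastive (InfoNCE) loss: each translation should be closest
    # to the teacher embedding of its own source sentence.
    logits = F.normalize(s_tgt, dim=-1) @ F.normalize(t_emb, dim=-1).T / tau
    labels = torch.arange(logits.size(0), device=logits.device)
    contrastive = F.cross_entropy(logits, labels)

    return alpha * mse + (1 - alpha) * contrastive
```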

Related research

Multi-Level Contrastive Learning for Cross-Lingual Alignment (02/26/2022)
Cross-language pre-trained models such as multilingual BERT (mBERT) have...

Research on Multilingual News Clustering Based on Cross-Language Word Embeddings (05/30/2023)
Classifying the same event reported by different countries is of signifi...

FedACK: Federated Adversarial Contrastive Knowledge Distillation for Cross-Lingual and Cross-Model Social Bot Detection (03/10/2023)
Social bot detection is of paramount importance to the resilience and se...

Mutually-paced Knowledge Distillation for Cross-lingual Temporal Knowledge Graph Reasoning (03/27/2023)
This paper investigates cross-lingual temporal knowledge graph reasoning...

Empowering Dual-Encoder with Query Generator for Cross-Lingual Dense Retrieval (03/27/2023)
In monolingual dense retrieval, lots of works focus on how to distill kn...

Deep Neural Compression Via Concurrent Pruning and Self-Distillation (09/30/2021)
Pruning aims to reduce the number of parameters while maintaining perfor...

Improving Neural Cross-Lingual Summarization via Employing Optimal Transport Distance for Knowledge Distillation (12/07/2021)
Current state-of-the-art cross-lingual summarization models employ multi...
