CMT in TREC-COVID Round 2: Mitigating the Generalization Gaps from Web to Special Domain Search

11/03/2020 ∙ by Chenyan Xiong, et al. ∙ Microsoft, Tsinghua University, Carnegie Mellon University

Neural rankers based on deep pretrained language models (LMs) have been shown to improve many information retrieval benchmarks. However, these methods are affected by the correlation between the pretraining domain and the target domain, and they rely on massive fine-tuning relevance labels. Directly applying them to a specific domain, such as the COVID domain, may result in suboptimal search quality because of domain adaptation problems. This paper presents a search system that alleviates this special-domain adaptation problem. The system utilizes domain-adaptive pretraining and few-shot learning technologies to help neural rankers mitigate the domain discrepancy and label scarcity problems. Besides, we also integrate dense retrieval to alleviate traditional sparse retrieval's vocabulary mismatch obstacle. Our system performs the best among the non-manual runs in Round 2 of the TREC-COVID task, which aims to retrieve useful information from scientific literature related to COVID-19. Our code is publicly available.







1. Introduction

Recent years have witnessed continuous successes of neural ranking models in information retrieval (Pang et al., 2017; Dai et al., 2018; MacAvaney et al., 2019; Xiong et al., 2020). Most notably, deep pretrained language models (LMs) achieve state-of-the-art performance on several web search benchmarks (Yang et al., 2019; Nogueira and Cho, 2019; Craswell et al., 2020). Their success relies on the semantic information learned from general-domain corpora during language model pretraining (Craswell et al., 2020; Zhang et al., 2019).

However, ranking models in specific domains usually face a domain adaptation problem, which stems from two generalization gaps between the general and the specific domain. The first gap derives from the discrepancy of vocabulary distributions across domains. Taking the COVID domain as an example (Wang et al., 2020; Voorhees et al., 2020), the earliest related publications appeared at the end of 2019. Even pretrained LMs targeting the biomedical domain (Beltagy et al., 2019; Lee et al., 2020) are unfamiliar with new medical terms like COVID-19 because their pretraining corpora did not contain such new terminologies. The other gap is label scarcity: in specific search scenarios, such as the biomedical and scientific domains, large-scale relevance labels are a luxury.

In addition, most information retrieval (IR) systems use sparse ranking methods, such as BM25, in the first-stage retrieval; these rely on term-matching signals to calculate the relevance between query and document. Nevertheless, such systems may fail when queries and documents use different terms to describe the same meaning, which is known as the vocabulary mismatch problem (Furnas et al., 1987; Croft et al., 2010). The vocabulary mismatch problem of sparse retrieval has become an obstacle for existing IR systems, especially in specific domains that contain many in-domain terminologies.

This paper presents a solution that alleviates the specific-domain adaptation problem with three core techniques. The first conducts domain-adaptive pretraining (DAPT) (Gururangan et al., 2020) to help pretrained language models learn the semantics of special-domain terminologies and keep their language knowledge up to date. The second uses Contrast Query Generation (ContrastQG) and ReInfoSelect (Zhang et al., 2020) to mitigate the label scarcity problem in the specific domain: ContrastQG and ReInfoSelect generate and filter pseudo relevance labels, respectively, to further improve ranking performance. Finally, our system integrates dense retrieval to alleviate sparse retrieval's vocabulary mismatch bottleneck. Dense retrieval encodes queries and documents into dense vectors and measures the relevance between query and document in the latent semantic space (Karpukhin et al., 2020; Gao et al., 2020; Luan et al., 2020; Chang et al., 2020; Xiong et al., 2020).

Using the above technologies, our system achieves the best performance among non-manual groups in Round 2 of TREC-COVID (Voorhees et al., 2020), a COVID-domain TREC task that evaluates information retrieval systems for searching COVID-19-related literature.

The next section analyzes the generalization gaps and the vocabulary mismatch faced by COVID-domain search. Sec. 3 and Sec. 4 describe in detail how our system alleviates these problems. Sec. 5 shows the evaluation results and a hyperparameter study. In Sec. 6 and Sec. 7, we discuss our failed attempts and our concerns about the residual collection evaluation (Salton and Buckley, 1990) used in TREC-COVID.

2. Data Study

This section studies the generalization gaps from web to COVID domain, and the vocabulary mismatch problem of sparse retrieval.

Domain Discrepancy. Most existing pretrained language models split uncommon words into subwords to alleviate the out-of-vocabulary problem (Sennrich et al., 2015). As shown in Figure 1, the subword ratio of TREC-COVID queries is dramatically higher than that of the web-domain dataset MS MARCO (Bajaj et al., 2016). This shows that existing pretrained language models treat most COVID-domain terminologies as unfamiliar words, indicating a considerable discrepancy between existing pretraining corpora and the COVID domain.
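The subword-ratio measurement can be illustrated with a toy greedy longest-match tokenizer over a small, hypothetical WordPiece-style vocabulary (a real measurement would use an actual pretrained LM's vocabulary; the vocabulary below is made up for illustration):

```python
# Toy WordPiece-style greedy longest-match decomposition. The vocabulary
# is a hypothetical stand-in, not SciBERT's actual vocabulary.
VOCAB = {"the", "corona", "##virus", "co", "##vid", "##19",
         "vaccine", "immune", "response"}

def wordpiece(word, vocab=VOCAB):
    """Decompose a word greedily, longest match first, like BERT's WordPiece."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub          # continuation pieces carry "##"
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:                 # no vocabulary piece matches
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces

def subword_ratio(query):
    """Fraction of query words that decompose into more than one subword."""
    words = query.lower().split()
    return sum(len(wordpiece(w)) > 1 for w in words) / len(words)
```

A query whose terms all appear intact in the vocabulary gets a ratio of 0, while a query containing new terminologies like "covid19" gets a high ratio.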

Label Scarcity. The label scarcity in COVID-domain search is very prominent. Only 30 queries were judged in the second round of TREC-COVID. In contrast, medical MS MARCO, the medical subset of MS MARCO filtered by previous work (MacAvaney et al., 2020), contains more than 78,800 annotated queries.

Vocabulary Mismatch. We observed that BM25 covers only 35% of the relevant documents within its top 100 retrieved documents. This result reveals that retrieving documents based solely on term-matching signals hinders the search system's effectiveness.
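The 35% figure is a coverage (recall-style) statistic. A generic sketch of how such a number can be computed (not the official trec_eval tooling):

```python
def relevant_coverage(retrieved, relevant, k=100):
    """Fraction of judged-relevant documents appearing in the top-k
    retrieved list. The 35% observation above is this quantity at k=100,
    averaged over queries."""
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)
```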

Figure 1. The proportion of query words that are decomposed into subwords by the pretrained language model’s vocabulary.

3. System Description

Our system employs a two-stage retrieval architecture, which utilizes BM25 for base retrieval and SciBERT (Beltagy et al., 2019) for reranking. Domain-adaptive pretraining and two few-shot learning techniques are used to mitigate the generalization gaps faced by SciBERT in the COVID domain. Dense retrieval is also incorporated into our system to alleviate BM25’s vocabulary mismatch problem.
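The two-stage architecture can be sketched as follows; the scoring callables stand in for BM25 and the SciBERT reranker, which are far more involved in the real system:

```python
def two_stage_search(query, docs, first_stage_score, rerank_score, depth=100):
    """Retrieve-then-rerank: score everything cheaply, keep the top
    `depth` candidates, then reorder only those with the expensive model."""
    # Stage 1: cheap sparse scoring over the whole collection.
    candidates = sorted(docs, key=lambda d: first_stage_score(query, d),
                        reverse=True)[:depth]
    # Stage 2: expensive neural reranking over the candidate pool only.
    return sorted(candidates, key=lambda d: rerank_score(query, d),
                  reverse=True)
```

Only `depth` documents ever reach the neural model, which is why the reranking depth becomes an important hyperparameter (studied in Sec. 5.2).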

3.1. Domain-Adaptive Pretraining

SciBERT is used in our system since it is pretrained on scientific texts and biomedical publications. However, COVID is a new concept that did not appear in previous pretraining corpora. Therefore, we conduct domain-adaptive pretraining (DAPT) (Gururangan et al., 2020) for SciBERT. Our approach is straightforward: we continue training SciBERT on the CORD-19 corpus (Wang et al., 2020), a growing collection of scientific papers about COVID-19 and coronaviruses.
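DAPT continues SciBERT's masked-language-model objective on CORD-19. A minimal sketch of the MLM data preparation (masking tokens and keeping the originals as labels) might look like this; the 15% mask rate is the common BERT setting, and the token list below is illustrative:

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Replace ~mask_prob of tokens with [MASK]; the model is trained to
    recover the originals. A simplification of BERT's masking (which also
    sometimes substitutes random tokens or keeps the original)."""
    rng = rng or random.Random(0)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)        # position is scored against the original
        else:
            inputs.append(tok)
            labels.append(None)       # position not scored
    return inputs, labels
```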

3.2. Few-Shot Learning

We introduce two few/zero-shot learning methods, ContrastQG and ReInfoSelect (Zhang et al., 2020), to alleviate the label scarcity challenge when fine-tuning the neural ranking model. Specifically, we first use ContrastQG to generate weakly supervised data in a zero-shot manner and then use the weak-supervision data selection method ReInfoSelect to recognize high-quality training data.

ContrastQG is a zero-shot data synthesis method that generates queries to synthesize weakly supervised relevance signals. Unlike prior work (Ma et al., 2020), ContrastQG synthesizes a query from a relevant text pair rather than from a single related text; this captures the specificity between two documents and produces more meaningful queries instead of keyword-style ones.

The entire synthesis process uses two query generators, both implemented with standard GPT-2 (Radford et al., 2019): a standard generator, which produces a pseudo query from a single document, and a contrastive generator, which produces a pseudo query from a pair of documents. The standard generator is trained on medical MS MARCO's positive passage-query pairs following the previous method (Ma et al., 2020). The contrastive generator is trained on medical MS MARCO's triples by encoding the concatenated text of a positive and a negative passage to generate the corresponding query.

At inference time, we first use the standard generator to produce a seed query from a single COVID-domain document. We then use BM25 to retrieve, for that seed query, two related documents with different degrees of correlation to it (one more relevant, one less). Finally, the contrastive generator produces another query from the two contrastive documents. The resulting synthetic triple (query, positive document, negative document) is used as weakly supervised data to train the neural ranker.
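The three-step inference pipeline above can be sketched as a small function; the generator and retriever callables stand in for the GPT-2 models and BM25 used in the actual system:

```python
def contrast_qg(doc, generate, contrast_generate, retrieve):
    """Sketch of ContrastQG inference.
    generate:          single-document query generator (stand-in for GPT-2)
    contrast_generate: pair-of-documents query generator (stand-in for GPT-2)
    retrieve:          returns a (more relevant, less relevant) pair (BM25)"""
    seed_query = generate(doc)               # step 1: seed query from one doc
    d_pos, d_neg = retrieve(seed_query)      # step 2: contrastive doc pair
    query = contrast_generate(d_pos, d_neg)  # step 3: query from the pair
    return (query, d_pos, d_neg)             # weakly supervised triple
```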

ReInfoSelect (Zhang et al., 2020) uses reinforcement learning to select weak supervision data. It evaluates the neural ranker's performance on the target data and regards the NDCG difference as the reward. The reward signal from the target data is then propagated via the policy gradient to guide the data selector.
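A toy REINFORCE-style selector in the spirit of ReInfoSelect is sketched below: a one-parameter Bernoulli policy decides whether to keep each weak label, the reward is a (stand-in) NDCG change on target data, and the parameter is updated with the policy gradient. All components here are simplified placeholders, not the paper's actual architecture:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def select_and_update(theta, batch, ndcg_gain, lr=0.5, rng=None):
    """One REINFORCE step for a Bernoulli keep/drop policy over a batch of
    weak labels. ndcg_gain is a callable standing in for the NDCG change
    measured on target data after training on the selected items."""
    rng = rng or random.Random(0)
    p = sigmoid(theta)
    actions = [rng.random() < p for _ in batch]      # keep/drop each item
    reward = ndcg_gain([b for b, a in zip(batch, actions) if a])
    # REINFORCE: d/dtheta log pi(a) = a - p for a Bernoulli policy.
    grad = sum((1.0 if a else 0.0) - p for a in actions) * reward
    return theta + lr * grad, actions
```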

In our system, we use ContrastQG and medical MS MARCO to construct the weakly supervised data, and the annotated data of TREC-COVID Round 1 as the target data. The trial-and-error learning mechanism of ReInfoSelect selects proper weakly supervised data according to the neural ranker's performance in the target domain, which helps to further mitigate the domain discrepancy.

Run ID               Method                                       R1 (dev)           R2 (test)
                                                                  NDCG@10   P@5      NDCG@10   P@5
r2.fusion2           BM25 Fusion                                  0.6056    0.7200   0.5553    0.6800
covidex.t5           T5 Fusion                                    0.5124    0.6333   0.6250    0.7314
GUIR S2 run1         SciBERT Fusion                               0.6032    0.6867   0.6251    0.7486
SparseDenseSciBert*  SciBERT + DAPT + DenseRetrieval              0.7424    0.8933   0.6772    0.7600
ReInfoSelect*        SciBERT + DAPT + ContrastQG + ReInfoSelect   0.7134    0.8333   0.6259    0.6971
n.a.                 SciBERT + DAPT + ReInfoSelect                0.7061    0.8000   0.6210    0.6914
ContrastNLGSciBert*  SciBERT + DAPT + ContrastQG                  0.6830    0.8467   0.6138    0.7314
n.a.                 SciBERT + DAPT                               0.6775    0.7400   0.5880    0.6800
n.a.                 SciBERT                                      0.6598    0.7733   0.5828    0.6629
Table 1. Overall accuracy in Round 2 of TREC-COVID. The testing results of the baselines and of our three submitted runs (marked with an asterisk) are from official evaluations. The compared baselines are BM25 Fusion (base retrieval), T5 Fusion, and SciBERT Fusion.

3.3. Dense Retrieval

Dense retrieval maps queries and documents into the same distributed representation space and retrieves related documents based on the similarities between document vectors and query vectors (Karpukhin et al., 2020; Xiong et al., 2020).

Let each training instance contain a query q, a relevant (positive) document d+, and n irrelevant (negative) documents d-_1, ..., d-_n. Dense retrieval first encodes the query and all documents into dense vectors v_q and v_d. The similarity of q and d is then calculated as sim(q, d) = v_q · v_d. The training objective is to learn a distributed representation space in which the positive document has a higher similarity to the query than all negative documents, via the negative log likelihood:

    L(q, d+, d-_1, ..., d-_n) = -log [ exp(sim(q, d+)) / ( exp(sim(q, d+)) + Σ_{i=1..n} exp(sim(q, d-_i)) ) ],

where the similarity is the dot product between vectors.

4. Implementation Details

In this section, we describe the system’s implementation details.

Dataset. The testing data of TREC-COVID Round 2 contains the May 1, 2020 version of the CORD-19 document set (Wang et al., 2020) (59,851 COVID-related papers) and 35 queries written by biomedical professionals. Among these queries, the first 30 were judged in Round 1. In the experiments, we use TREC-COVID Round 1's annotated data as the development set (30 queries) and medical MS MARCO (MacAvaney et al., 2020) as the training data (78,895 queries).

System Setup. For data preprocessing, we concatenated the title and abstract to represent each document and removed stop words from all queries. Our system used the BM25 implementation from Anserini (Yang et al., 2017) for base retrieval and adopted the dense retrieval implementation provided by Gao et al. (2020). The SciBERT-based neural ranker (Beltagy et al., 2019) was used in both the dense retrieval and reranking stages (MacAvaney et al., 2020), with a learning rate of 2e-5 and a batch size of 32. We set the warm-up proportion to 0.1 and limited the maximum sequence length to 256. The NDCG@10 score on the development set, calculated every three training steps, is used to measure convergence. Our system is based on PyTorch, and its training can be run on a single GeForce RTX 2080 Ti.
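The warm-up proportion of 0.1 implies a schedule in which the learning rate ramps up linearly over the first 10% of steps; the linear decay afterwards is the common BERT fine-tuning choice and is an assumption here, not something specified above:

```python
def warmup_linear_lr(step, total_steps, peak_lr=2e-5, warmup_prop=0.1):
    """Linear warm-up to peak_lr over the first warmup_prop of steps,
    then linear decay to zero (the usual BERT fine-tuning schedule)."""
    warmup_steps = int(total_steps * warmup_prop)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```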

Figure 2. Round 2 testing results on each query for the baselines and our system’s best version (SciBERT + DAPT + Dense Retrieval). The X-axis denotes the query ID in Round 2 of TREC-COVID, and the Y-axis represents the NDCG@10 score. Note that queries 1-30 were annotated in Round 1, and queries 31-35 are newly added in Round 2.
Figure 3. The NDCG@10 gain of the top 10 feedback systems relative to BM25 Fusion system. ‘All’ represents the average gain of all queries in TREC-COVID Round 2, ‘Old’ and ‘New’ mean the annotated queries in Round 1 and the newly added queries in Round 2, respectively.

5. Evaluation Results

This section presents evaluation results and hyperparameter studies.

Rerank Depth NDCG@10 P@5 Top 10 Hole Rate
20 0.6545 0.7429 0.03
50 0.6853 0.7714 0.07
100 0.6838 0.7543 0.12
500 0.6044 0.6971 0.23
1000 0.5826 0.6686 0.26
Table 2. Dev results of SciBERT with different reranking depths in Round 2 of TREC-COVID. The top 10 hole rate denotes the unlabeled proportion of the top 10 reranked results.

5.1. Overall Results

Table 1 shows the overall performance of different models in the TREC-COVID task. Three top systems during Round 2 evaluation and several variants of our systems are compared.

Our system achieved the best performance in Round 2 of TREC-COVID. The detailed experimental results show that our method significantly improves the ranking performance of SciBERT in the COVID domain. Domain-adaptive pretraining (DAPT) helps improve SciBERT, which illustrates that learning the semantics of new terminologies is crucial for language models. ContrastQG and ReInfoSelect further improve the system's performance by about 6.5% NDCG@10: ContrastQG generates many pseudo relevance labels, providing more training guidance for neural rankers in the specific domain, while ReInfoSelect further boosts the model with more fine-grained selected supervision. The most significant improvement comes from the fusion of dense retrieval, which increases the P@5 score by 11.8%. This result shows that dense retrieval can significantly improve retrieval effectiveness by alleviating sparse retrieval's vocabulary mismatch problem.

5.2. Hyperparameter Study

Among all hyperparameters, we found that the reranking depth significantly impacts the neural ranking model's effectiveness. As shown in Table 2, SciBERT's performance is significantly limited at a shallow reranking depth (20), mainly because of the low ranking accuracy of BM25. As the reranking depth increases to 50 and 100, the neural ranker shows stable performance and achieves its best results. Nevertheless, reranking accuracy begins to drop as the depth increases further. A possible reason is that the neural ranker is not strong enough to distinguish truly relevant documents when more noisy documents are included.

5.3. Query Analysis

Figure 2 shows the testing results for each query. The first 30 queries were judged in Round 1, and the others (queries 31-35) are newly added in Round 2. Our system outperforms the baselines on most queries with previous annotations. Besides, our system is also comparable to the T5 Fusion system on the new queries and avoids the sharp drops of the SciBERT Fusion system (e.g., on the 34th query), which shows our system's robustness.

6. Failed Attempts

This section discusses some of our failed attempts and experience.

Manual Labeling. A straightforward approach to mitigating label scarcity is to manually annotate more in-domain data. We recruited three medical students who compiled 50 COVID-related queries and assigned relevance labels to the top 20 documents retrieved by BM25 for each query. However, our annotations did not reach good agreement with TREC-COVID's annotations.

Corpus Filtering. MacAvaney et al. (2020) proposed to narrow the retrieval scale by filtering out documents published before 2020. Nevertheless, our analysis found that this method excluded more than 80% of the documents in the Round 2 corpus, dropping a large amount of useful COVID-related literature, such as studies of SARS and MERS. Thus, we did not adopt this method in our system.

Neural Reranker. We also attempted two other neural ranking models besides SciBERT for document reranking, including BERT (Devlin et al., 2019) and Conv-KNRM (Dai et al., 2018). Our experimental results show that BERT-Large has no obvious advantage over SciBERT-Base and Conv-KNRM performs the worst. The main reason for the poor performance of Conv-KNRM is that we did not use its subword version (Hofstätter et al., 2019), which led to a severe out-of-vocabulary problem.

Fusion Attempts. We tried two fusion methods to integrate dense retrieval into our system. One combines dense retrieval with BM25 in the base retrieval stage; the other fuses dense retrieval directly into SciBERT's reranking process. The second method worked better in our limited attempts.
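One generic way to fuse two ranked lists (e.g., a dense-retrieval list and a reranked list) is reciprocal rank fusion. This is a common technique sketched for illustration, not necessarily the fusion rule used in our submitted runs:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each document scores 1/(k + rank) in
    every list it appears in; documents are returned by total score.
    k=60 is the conventional smoothing constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Rank-based fusion like this needs no score calibration between the two systems, which is convenient when sparse and dense scores live on different scales.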

7. Concerns on Residual Evaluation

This section discusses our observations about the residual collection evaluation used in the TREC-COVID task. In residual collection evaluation, test queries can be divided into old queries and new queries. The old queries were annotated in previous rounds, but their annotated documents are removed from the collection before scoring. TREC-COVID allows IR systems to use the old queries' relevance judgments and classifies such systems as feedback runs.

Figure 3 shows the evaluation results of the top 10 feedback systems in Round 2 of TREC-COVID. Although these systems performed closely in overall scores, they show significant differences between the old and new queries. For example, the 2nd system performs considerably better on the new queries than on the old ones. In contrast, some systems' ranking accuracy on the new queries is considerably lower than on the old queries, and even worse than the base-retrieval BM25 Fusion system, e.g., the 3rd-5th and 9th systems.

A powerful search system should achieve balanced performance on known and unknown queries. However, this result shows that residual collection evaluation may bias systems toward previously seen queries, which are much easier than the unseen queries that dominate real production scenarios.


We thank Luyu Gao for sharing the implementation of dense retrieval, the track organizers for hosting this track, Sean MacAvaney for releasing the medical MS MARCO filter, and Jimmy Lin and the Anserini project for open-sourcing the well-rounded BM25 first-stage retrieval.


  • P. Bajaj, D. Campos, N. Craswell, L. Deng, J. Gao, X. Liu, R. Majumder, A. McNamara, B. Mitra, T. Nguyen, et al. (2016) MS MARCO: a human generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268. Cited by: §2.
  • I. Beltagy, K. Lo, and A. Cohan (2019) SciBERT: a pretrained language model for scientific text. In Proceedings of EMNLP-IJCNLP, pp. 3606–3611. Cited by: §1, §3, §4.
  • W. Chang, F. X. Yu, Y. Chang, Y. Yang, and S. Kumar (2020) Pre-training tasks for embedding-based large-scale retrieval. arXiv preprint arXiv:2002.03932. Cited by: §1.
  • N. Craswell, B. Mitra, E. Yilmaz, D. Campos, and E. M. Voorhees (2020) Overview of the TREC 2019 deep learning track. arXiv preprint arXiv:2003.07820. Cited by: §1.
  • W. B. Croft, D. Metzler, and T. Strohman (2010) Search engines: information retrieval in practice. Vol. 520, Addison-Wesley Reading. Cited by: §1.
  • Z. Dai, C. Xiong, J. Callan, and Z. Liu (2018) Convolutional neural networks for soft-matching n-grams in ad-hoc search. In Proceedings of WSDM, pp. 126–134. Cited by: §1, §6.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL, pp. 4171–4186. Cited by: §6.
  • G. W. Furnas, T. K. Landauer, L. M. Gomez, and S. T. Dumais (1987) The vocabulary problem in human-system communication. Communications of the ACM 30 (11), pp. 964–971. Cited by: §1.
  • L. Gao, Z. Dai, Z. Fan, and J. Callan (2020) Complementing lexical retrieval with semantic residual embedding. arXiv preprint arXiv:2004.13969. Cited by: §1, §4.
  • S. Gururangan, A. Marasović, S. Swayamdipta, K. Lo, I. Beltagy, D. Downey, and N. A. Smith (2020) Don’t stop pretraining: adapt language models to domains and tasks. arXiv preprint arXiv:2004.10964. Cited by: §1, §3.1.
  • S. Hofstätter, N. Rekabsaz, C. Eickhoff, and A. Hanbury (2019) On the effect of low-frequency terms on neural-ir models. In Proceedings of SIGIR, pp. 1137–1140. Cited by: §6.
  • V. Karpukhin, B. Oğuz, S. Min, L. Wu, S. Edunov, D. Chen, and W. Yih (2020) Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906. Cited by: §1, §3.3.
  • J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, and J. Kang (2020) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36 (4), pp. 1234–1240. Cited by: §1.
  • Y. Luan, J. Eisenstein, K. Toutanova, and M. Collins (2020) Sparse, dense, and attentional representations for text retrieval. arXiv preprint arXiv:2005.00181. Cited by: §1.
  • J. Ma, I. Korotkov, Y. Yang, K. Hall, and R. McDonald (2020) Zero-shot neural retrieval via domain-targeted synthetic query generation. arXiv preprint arXiv:2004.14503. Cited by: §3.2, §3.2.
  • S. MacAvaney, A. Cohan, and N. Goharian (2020) SLEDGE: a simple yet effective baseline for coronavirus scientific knowledge search. arXiv preprint arXiv:2005.02365. Cited by: §2, §4, §4, §6.
  • S. MacAvaney, A. Yates, A. Cohan, and N. Goharian (2019) CEDR: contextualized embeddings for document ranking. In Proceedings of SIGIR, pp. 1101–1104. Cited by: §1.
  • R. Nogueira and K. Cho (2019) Passage re-ranking with bert. arXiv preprint arXiv:1901.04085. Cited by: §1.
  • L. Pang, Y. Lan, J. Guo, J. Xu, J. Xu, and X. Cheng (2017) Deeprank: a new deep architecture for relevance ranking in information retrieval. In Proceedings of CIKM, pp. 257–266. Cited by: §1.
  • A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever (2019) Language models are unsupervised multitask learners. Cited by: §3.2.
  • G. Salton and C. Buckley (1990) Improving retrieval performance by relevance feedback. Journal of the American society for information science 41 (4), pp. 288–297. Cited by: §1.
  • R. Sennrich, B. Haddow, and A. Birch (2015) Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909. Cited by: §2.
  • E. Voorhees, T. Alam, S. Bedrick, D. Demner-Fushman, W. R. Hersh, K. Lo, K. Roberts, I. Soboroff, and L. L. Wang (2020) TREC-covid: constructing a pandemic information retrieval test collection. arXiv preprint arXiv:2005.04474. Cited by: §1, §1.
  • L. L. Wang, K. Lo, Y. Chandrasekhar, R. Reas, J. Yang, D. Eide, K. Funk, R. Kinney, Z. Liu, W. Merrill, et al. (2020) CORD-19: the covid-19 open research dataset. arXiv preprint arXiv:2004.10706. Cited by: §1, §3.1, §4.
  • L. Xiong, C. Xiong, Y. Li, K. Tang, J. Liu, P. Bennett, J. Ahmed, and A. Overwijk (2020) Approximate nearest neighbor negative contrastive learning for dense text retrieval. arXiv preprint arXiv:2007.00808. Cited by: §1, §1, §3.3.
  • P. Yang, H. Fang, and J. Lin (2017) Anserini: enabling the use of lucene for information retrieval research. In Proceedings of SIGIR, pp. 1253–1256. Cited by: §4.
  • W. Yang, H. Zhang, and J. Lin (2019) Simple applications of bert for ad hoc document retrieval. arXiv preprint arXiv:1903.10972. Cited by: §1.
  • H. Zhang, X. Song, C. Xiong, C. Rosset, P. N. Bennett, N. Craswell, and S. Tiwary (2019) Generic intent representation in web search. In Proceedings of SIGIR, pp. 65–74. Cited by: §1.
  • K. Zhang, C. Xiong, Z. Liu, and Z. Liu (2020) Selective weak supervision for neural information retrieval. In Proceedings of WWW, pp. 474–485. Cited by: §1, §3.2, §3.2.