OpenMatch: An Open-Source Package for Information Retrieval

01/30/2021
by Zhenghao Liu, et al.

Information Retrieval (IR) is an important task that underpins many applications. Neural IR (Neu-IR) models overcome the vocabulary mismatch problem of sparse retrievers and thrive in the ranking pipeline through semantic matching. Recent progress in IR mainly focuses on Neu-IR models, including efficient dense retrieval, advanced neural architectures, and robust training for few-shot IR scenarios that lack training data. To integrate these advances for researchers and engineers to use and build on, OpenMatch provides a variety of functional neural modules based on PyTorch that remain extensible, making it easy to build customized and higher-capacity IR systems. In addition, OpenMatch includes sophisticated optimization tricks, various sparse/dense retrieval methods, and advanced few-shot training methods, freeing users from redundant work on baseline reimplementation and neural model fine-tuning. With OpenMatch, we achieve reasonable performance on various ranking datasets, rank first in the automatic group of TREC-COVID (Round 2), and rank at the top of the MS MARCO Document Ranking leaderboard. The library, experimental methodologies, and results of OpenMatch are all publicly available at https://github.com/thunlp/OpenMatch.

research
12/10/2019

Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-Ranking Results

In this paper we look beyond metrics-based evaluation of Information Ret...
research
06/08/2023

RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit

Although Large Language Models (LLMs) have demonstrated extraordinary ca...
research
01/24/2019

Neural IR Meets Graph Embedding: A Ranking Model for Product Search

Recently, neural models for information retrieval are becoming increasin...
research
11/03/2020

CMT in TREC-COVID Round 2: Mitigating the Generalization Gaps from Web to Special Domain Search

Neural rankers based on deep pretrained language models (LMs) have been ...
research
09/17/2019

Revealing the Importance of Semantic Retrieval for Machine Reading at Scale

Machine Reading at Scale (MRS) is a challenging task in which a system i...
research
05/03/2021

SmoothI: Smooth Rank Indicators for Differentiable IR Metrics

Information retrieval (IR) systems traditionally aim to maximize metrics...
research
07/02/2023

BioCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval

Information retrieval (IR) is essential in biomedical knowledge acquisit...
