IR-BERT: Leveraging BERT for Semantic Search in Background Linking for News Articles

by Anup Anand Deshmukh, et al.

This work describes our two approaches to the background linking task of the TREC 2020 News Track. The main objective of this task is to recommend a list of relevant articles that the reader should refer to in order to understand the context of the query article and gain background information on it. Our first approach focuses on building an effective search query by combining weighted keywords extracted from the query document, and uses BM25 for retrieval. The second approach leverages the capability of SBERT (Nils Reimers et al.) to learn contextual representations of the query in order to perform semantic search over the corpus. We empirically show that employing a language model helps our approach capture the context as well as the background of the query article. The proposed approaches are evaluated on the TREC 2018 Washington Post dataset, and our best model outperforms the TREC median as well as the highest-scoring model of 2018 in terms of the nDCG@5 metric. We further propose a diversity measure to evaluate how effectively the various approaches retrieve a diverse set of documents. This could motivate researchers to work on introducing diversity into their recommended lists. We have open-sourced our implementation on GitHub and plan to submit our runs for the background linking task in TREC 2020.
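As a rough illustration of the first approach described above (a weighted keyword query scored with BM25), the following is a minimal sketch and not the authors' implementation. The function name `bm25_weighted_scores`, the toy corpus, the keyword weights, and the parameter values `k1=1.2` and `b=0.75` are all assumptions made for this example; the abstract does not specify how keywords are extracted or weighted.

```python
import math
from collections import Counter


def bm25_weighted_scores(query_weights, docs, k1=1.2, b=0.75):
    """Score tokenized documents against a weighted keyword query using Okapi BM25.

    query_weights maps each keyword to a (hypothetical) importance weight,
    which simply scales that keyword's BM25 contribution.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents containing each query keyword.
    doc_freq = {t: sum(1 for d in docs if t in d) for t in query_weights}

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term, weight in query_weights.items():
            if tf[term] == 0:
                continue
            # Smoothed IDF variant that is always non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += weight * idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus invented for illustration, already tokenized.
docs = [
    "senate passes budget bill after long debate".split(),
    "local team wins championship game".split(),
    "budget deficit grows as senate debate stalls".split(),
]
# Hypothetical weighted keywords standing in for those extracted from a query article.
query_weights = {"budget": 2.0, "senate": 1.5, "debate": 1.0}

scores = bm25_weighted_scores(query_weights, docs)
# Document indices ordered from most to least relevant.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this sketch the weights act as simple multipliers on each keyword's BM25 term, so a keyword judged more central to the query article pulls matching documents further up the ranking. A document sharing no keywords with the query receives a score of zero.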


