
-
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling
A vital step towards the widespread adoption of neural retrieval models ...
-
A Replication Study of Dense Passage Retriever
Text retrieval using learned dense representations has recently emerged ...
-
Investigating the Limitations of Transformers with Simple Arithmetic Tasks
The ability to perform arithmetic tasks is a remarkable trait of human i...
-
Significant Improvements over the State of the Art? A Case Study of the MS MARCO Document Ranking Leaderboard
Leaderboards are a ubiquitous part of modern research in applied machine...
-
Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations
Pyserini is an easy-to-use Python toolkit that supports replicable IR re...
-
The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models
We propose a design pattern for tackling text ranking problems, dubbed "...
-
Inserting Information Bottlenecks for Attribution in Transformers
Pretrained transformers achieve the state of the art across tasks in nat...
-
Scientific Claim Verification with VERT5ERINI
This work describes the adaptation of a pretrained sequence-to-sequence ...
-
Distilling Dense Representations for Ranking using Tightly-Coupled Teachers
We present an approach to ranking with dense representations that applie...
-
Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network
Long Short-Term Memory Networks (LSTMs) have been applied to daily disch...
-
Pretrained Transformers for Text Ranking: BERT and Beyond
The goal of text ranking is to generate an ordered list of texts retriev...
-
Howl: A Deployed, Open-Source Wake Word Detection System
We describe Howl, an open-source wake word detection toolkit with native...
-
To Paraphrase or Not To Paraphrase: User-Controllable Selective Paraphrase Generation
In this article, we propose a paraphrase generation technique to keep th...
-
Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset
We present Covidex, a search engine that exploits the latest neural rank...
-
Generalized Optimal Sparse Decision Trees
Decision tree optimization is notoriously difficult from a computational...
-
Query Reformulation using Query History for Passage Retrieval in Conversational Search
Passage retrieval in a conversational context is essential for many down...
-
SegaBERT: Pre-training of Segment-aware BERT for Language Understanding
Pre-trained language models have achieved state-of-the-art results in va...
-
Showing Your Work Doesn't Always Work
In natural language processing, a recently popular line of work explores...
-
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
Large-scale pre-trained language models such as BERT have brought signif...
-
Rapidly Bootstrapping a Question Answering Dataset for COVID-19
We present CovidQA, the beginnings of a question answering dataset speci...
-
Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned
We present the Neural Covidex, a search engine that exploits the latest ...
-
Semantics of the Unwritten
The semantics of a text is manifested not only by what is read, but also...
-
Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models
This paper presents an empirical study of conversational question reform...
-
TTTTTackling WinoGrande Schemas
We applied the T5 sequence-to-sequence model to tackle the AI2 WinoGrand...
-
Supporting Interoperability Between Open-Source Search Engines with the Common Index File Format
There exists a natural tension between encouraging a diverse ecosystem o...
-
Document Ranking with a Pretrained Sequence-to-Sequence Model
This work proposes a novel adaptation of a pretrained sequence-to-sequen...
-
Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents
Techniques for automatically extracting important content elements from ...
-
A Prototype of Serverless Lucene
This paper describes a working prototype that adapts Lucene, the world's...
-
Navigation-Based Candidate Expansion and Pretrained Language Models for Citation Recommendation
Citation recommendation systems for the scientific literature, to help a...
-
The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives
The Archives Unleashed project aims to improve scholarly access to web a...
-
The Proper Care and Feeding of CAMELS: How Limited Training Data Affects Streamflow Prediction
Accurate streamflow prediction largely relies on historical records of b...
-
Exploiting Token and Path-based Representations of Code for Identifying Security-Relevant Commits
Public vulnerability databases such as CVE and NVD account for only 60 s...
-
Attentive Student Meets Multi-Task Teacher: Improved Knowledge Distillation for Pretrained Models
In this paper, we explore the knowledge distillation approach under the ...
-
What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning
Pretrained transformer-based language models have achieved state of the ...
-
Cross-Lingual Relevance Transfer for Document Retrieval
Recent work has shown the surprising ability of multi-lingual BERT to se...
-
Explicit Pairwise Word Interaction Modeling Improves Pretrained Transformers for English Semantic Similarity Tasks
In English semantic similarity tasks, classic word embedding-based appro...
-
Multi-Stage Document Ranking with BERT
The advent of deep neural networks pre-trained via language modeling tas...
-
The Performance Envelope of Inverted Indexing on Modern Hardware
This paper explores the performance envelope of "traditional" inverted i...
-
Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors
We demonstrate three approaches for adapting the open-source Lucene sear...
-
Aligning Cross-Lingual Entities with Multi-Aspect Information
Multilingual knowledge graphs (KGs), such as YAGO and DBpedia, represent...
-
Two Birds, One Stone: A Simple, Unified Model for Text Generation from Structured and Unstructured Data
A number of researchers have recently questioned the necessity of increa...
-
Critically Examining the "Neural Hype": Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models
Is neural IR mostly hype? In a recent SIGIR Forum article, Lin expressed...
-
The Simplest Thing That Can Possibly Work: Pseudo-Relevance Feedback Using Text Classification
Motivated by recent commentary that has questioned today's pursuit of ev...
-
DocBERT: BERT for Document Classification
Pre-trained language representation models achieve remarkable state of t...
-
Document Expansion by Query Prediction
One technique to improve the retrieval effectiveness of a search engine ...
-
Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering
Recently, a simple combination of passage retrieval using off-the-shelf ...
-
Simple BERT Models for Relation Extraction and Semantic Role Labeling
We present simple BERT-based models for relation extraction and semantic...
-
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
In the natural language processing literature, neural networks are becom...
-
Simple Applications of BERT for Ad Hoc Document Retrieval
Following recent successes in applying BERT to question answering, we ex...
-
Matching Entities Across Different Knowledge Graphs with Graph Embeddings
This paper explores the problem of matching entities across different kn...