Jimmy Lin

research

∙ 09/14/2023

MMEAD: MS MARCO Entity Annotations and Disambiguations

MMEAD, or MS MARCO Entity Annotations and Disambiguations, is a resource...

0 Chris Kamphuis, et al. ∙

research

∙ 09/10/2023

Unsupervised Chunking with Hierarchical RNN

In Natural Language Processing (NLP), predicting linguistic structures, ...

0 Zijun Wu, et al. ∙

research

∙ 08/29/2023

Vector Search with OpenAI Embeddings: Lucene Is All You Need

We provide a reproducible, end-to-end demonstration of vector search wit...

0 Jimmy Lin, et al. ∙

research

∙ 08/14/2023

Approximating Human-Like Few-shot Learning with GPT-based Compression

In this work, we conceptualize the learning process as information compr...

0 Cynthia Huang, et al. ∙

research

∙ 07/31/2023

HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution

The rise of large language models (LLMs) had a transformative impact on ...

0 Ehsan Kamalloo, et al. ∙

research

∙ 07/19/2023

SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

Traditionally, sparse retrieval systems relied on lexical representation...

0 Nandan Thakur, et al. ∙

research

∙ 06/13/2023

Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard

BEIR is a benchmark dataset for zero-shot evaluation of information retr...

0 Ehsan Kamalloo, et al. ∙

research

∙ 06/02/2023

GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration

Noticing the urgent need to provide tools for fast and user-friendly qua...

0 Aleksandra Piktus, et al. ∙

research

∙ 05/19/2023

How Does Generative Retrieval Scale to Millions of Passages?

Popularized by the Differentiable Search Index, the emerging paradigm of...

0 Ronak Pradeep, et al. ∙

research

∙ 05/14/2023

SmartProbe: A Virtual Moderator for Market Research Surveys

Market research surveys are a powerful methodology for understanding con...

0 Josh Seltzer, et al. ∙

research

∙ 05/10/2023

Evaluating Embedding APIs for Information Retrieval

The ever-increasing size of language models curtails their widespread ac...

0 Ehsan Kamalloo, et al. ∙

research

∙ 05/03/2023

Zero-Shot Listwise Document Reranking with a Large Language Model

Supervised ranking methods based on bi-encoder or cross-encoder architec...

0 Xueguang Ma, et al. ∙

research

∙ 04/24/2023

Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes

Anserini is a Lucene-based toolkit for reproducible information retrieva...

0 Xueguang Ma, et al. ∙

research

∙ 04/04/2023

AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation

This paper presents the AToMiC (Authoring Tools for Multimedia Content) ...

0 Jheng-Hong Yang, et al. ∙

research

∙ 04/03/2023

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval

The advent of multilingual language models has generated a resurgence of...

0 Jimmy Lin, et al. ∙

research

∙ 02/28/2023

Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face

We present Spacerini, a modular framework for seamless building and depl...

0 Christopher Akiki, et al. ∙

research

∙ 02/15/2023

How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

Various techniques have been developed in recent years to improve dense ...

0 Sheng-Chieh Lin, et al. ∙

research

∙ 02/13/2023

Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction

Recent progress in information retrieval finds that embedding query and ...

0 Xinyu Zhang, et al. ∙

research

∙ 02/13/2023

SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes

This paper introduces a method called Sparsified Late Interaction for Mu...

0 Minghan Li, et al. ∙

research

∙ 01/17/2023

Which Model Shall I Choose? Cost/Quality Trade-offs for Text Classification Tasks

Industry practitioners always face the problem of choosing the appropria...

0 Shi Zong, et al. ∙

research

∙ 12/27/2022

Building a Culture of Reproducibility in Academic Research

Reproducibility is an ideal that no researcher would dispute "in the abs...

0 Jimmy Lin, et al. ∙

research

∙ 12/20/2022

Precise Zero-Shot Dense Retrieval without Relevance Labels

While dense retrieval has been shown effective and efficient across task...

0 Luyu Gao, et al. ∙

research

∙ 12/19/2022

Less is More: Parameter-Free Text Classification with Gzip

Deep neural networks (DNNs) are often used for text classification tasks...

0 Zhiying Jiang, et al. ∙

research

∙ 12/10/2022

Improving Precancerous Case Characterization via Transformer-based Ensemble Learning

The application of natural language processing (NLP) to cancer pathology...

0 Yizhen Zhong, et al. ∙

research

∙ 11/21/2022

SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale

End-to-end automatic speech recognition systems represent the state of t...

0 Raphael Tang, et al. ∙

research

∙ 11/18/2022

CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval

Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) ...

0 Minghan Li, et al. ∙

research

∙ 11/01/2022

On the Interaction Between Differential Privacy and Gradient Compression in Deep Learning

While differential privacy and gradient compression are separately well-...

0 Jimmy Lin, et al. ∙

research

∙ 10/17/2022

VoxelCache: Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks

Real-time 3D mapping is a critical component in many important applicati...

0 Sankeerth Durvasula, et al. ∙

research

∙ 10/13/2022

Query Expansion Using Contextual Clue Sampling with Language Models

Query expansion is an effective approach for mitigating vocabulary misma...

0 Linqing Liu, et al. ∙

research

∙ 10/11/2022

Better Than Whitespace: Information Retrieval for Languages without Custom Tokenizers

Tokenization is a crucial step in information retrieval, especially for ...

0 Odunayo Ogundepo, et al. ∙

research

∙ 10/10/2022

What the DAAM: Interpreting Stable Diffusion Using Cross Attention

Large-scale diffusion neural networks represent a substantial milestone ...

1 Raphael Tang, et al. ∙

research

∙ 07/31/2022

Aggretriever: A Simple Approach to Aggregate Textual Representation for Robust Dense Passage Retrieval

Pre-trained transformers have proven successful in many NLP tasks. One...

0 Sheng-Chieh Lin, et al. ∙

research

∙ 07/31/2022

Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers

There exists a wide variety of efficiency methods for natural language p...

0 Ji Xin, et al. ∙

research

∙ 06/23/2022

Few-Shot Non-Parametric Learning with Deep Latent Variable Model

Most real-world problems that machine learning algorithms are expected t...

0 Zhiying Jiang, et al. ∙

research

∙ 06/20/2022

A Dense Representation Framework for Lexical and Semantic Matching

Lexical and semantic matching capture different successful approaches to...

0 Sheng-Chieh Lin, et al. ∙

research

∙ 05/23/2022

Domain Adaptation for Memory-Efficient Dense Retrieval

Dense retrievers encode documents into fixed dimensional embeddings. How...

9 Nandan Thakur, et al. ∙

research

∙ 04/30/2022

To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers

Current pre-trained language model approaches to information retrieval c...

0 Hang Li, et al. ∙

research

∙ 04/05/2022

Towards Best Practices for Training Multilingual Dense Retrieval Models

Dense retrieval models using a transformer-based bi-encoder design have ...

0 Xinyu Zhang, et al. ∙

research

∙ 03/21/2022

Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval

With the recent success of dense retrieval methods based on bi-encoders,...

0 Wei Zhong, et al. ∙

research

∙ 03/11/2022

Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval

Recent rapid advancements in deep pre-trained language models and the in...

0 Luyu Gao, et al. ∙

research

∙ 01/26/2022

Can Old TREC Collections Reliably Evaluate Modern Neural Retrieval Models?

Neural retrieval models are generally regarded as fundamentally differen...

0 Ellen M. Voorhees, et al. ∙

research

∙ 12/17/2021

Sparsifying Sparse Representations for Passage Retrieval by Top-k Masking

Sparse lexical representation learning has demonstrated much progress in...

0 Jheng-Hong Yang, et al. ∙

research

∙ 12/13/2021

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study

Pseudo-Relevance Feedback (PRF) utilises the relevance signals from the ...

0 Hang Li, et al. ∙

research

∙ 12/09/2021

Densifying Sparse Representations for Passage Retrieval by Representational Slicing

Learned sparse and dense representations capture different successful ap...

0 Sheng-Chieh Lin, et al. ∙

research

∙ 10/22/2021

Wacky Weights in Learned Sparse Representations and the Revenge of Score-at-a-Time Query Evaluation

Recent advances in retrieval models based on learned sparse representati...

0 Joel Mackenzie, et al. ∙

research

∙ 10/04/2021

Encoder Adaptation of Dense Passage Retrieval for Open-Domain Question Answering

One key feature of dense passage retrievers (DPR) is the use of separate...

0 Minghan Li, et al. ∙

research

∙ 10/04/2021

A Proposed Conceptual Framework for a Representational Approach to Information Retrieval

This paper outlines a conceptual framework for understanding recent deve...

0 Jimmy Lin, et al. ∙

research

∙ 08/19/2021

Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval

We present Mr. TyDi, a multi-lingual benchmark dataset for mono-lingual ...

0 Xinyu Zhang, et al. ∙

research

∙ 06/28/2021

A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques

Recent developments in representational learning for information retriev...

0 Jimmy Lin, et al. ∙

research

∙ 05/09/2021

MS MARCO: Benchmarking Ranking Models in the Large-Data Regime

Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public ...

0 Nick Craswell, et al. ∙
