When writing programs, people have the ability to tackle a new complex t...
We study (differentially) private federated learning (FL) of language mo...
Cross-encoder models, which jointly encode and score a query-item pair, ...
Dual encoder models are ubiquitous in modern classification and retrieva...
The impressive generalization performance of modern neural networks is a...
Large neural models (such as Transformers) achieve state-of-the-art perf...
Many practical applications, such as recommender systems and learning to...
Privacy noise may negate the benefits of using adaptive optimizers in di...
Large language models (LLMs) have led to a series of breakthroughs in na...
A fundamental ability of an intelligent web-based agent is seeking out a...
Efficient k-nearest neighbor search is a fundamental task, foundational ...
OntoNotes has served as the most important benchmark for coreference res...
Many modern high-performing machine learning models such as GPT-3 primar...
We revisit the problem of learning mixtures of spherical Gaussians. Give...
We introduce ART, a new corpus-level autoencoding approach for training ...
Knowledge and language understanding of models evaluated through questio...
Question answering (QA) over real-world knowledge bases (KBs) is challen...
Adaptive optimization methods have become the default solvers for many m...
Mean rewards of actions are often correlated. The form of these correlat...
In contrast to SGD, adaptive gradient methods like Adam allow robust tra...
One of the central problems in auction design is developing an incentive...
Meta-, multi-task, and federated learning can be all viewed as solving s...
Scaling neural networks to "large" sizes, with billions of parameters, h...
We propose AdaTS, a Thompson sampling algorithm that adapts sequentially...
We study Thompson sampling (TS) in online decision-making problems where...
It is often challenging for a system to solve a new complex problem from...
Hierarchical clustering is a critical task in numerous domains. Many app...
In many domains, relationships between categories are encoded in the kno...
Efficient exploration in multi-armed bandits is a fundamental online lea...
Users of recommender systems often behave in a non-stationary fashion, d...
In many sequence learning tasks, such as program synthesis and document ...
Large Transformer models have achieved impressive performance in many na...
Federated Learning (FL) is a distributed learning paradigm which scales ...
Bottom-up algorithms such as the classic hierarchical agglomerative clus...
A case-based reasoning (CBR) system solves a new problem by retrieving `...
High-quality dialogue-summary paired data is expensive to produce and do...
In this paper, we study bidirectional LSTM networks for the task of text ...
Transformer-based models, such as BERT, have been one of the most succe...
We present a surprisingly simple yet accurate approach to reasoning in k...
A latent bandit problem is one in which the learning agent knows the arm...
Off-policy learning is a framework for evaluating and optimizing policie...
We study a contextual bandit setting where the learning agent has access...
Recently, there has been a surge of interest in representation learning ...
Learning continuous representations of discrete objects such as text, us...
Federated learning is a distributed machine learning paradigm in which a...
We present a modular neural network architecture, Main, that learns algori...
We consider the task of answering complex multi-hop questions using a co...
We learn bandit policies that maximize the average reward over bandit in...
Federated learning aims to jointly learn statistical models over massive...