As large language models (LLMs) are continuously being developed, their ...
We study whether multiple large language models (LLMs) can autonomously ...
The surprising ability of Large Language Models (LLMs) to perform well o...
Recent work has shown that large language models are capable of generati...
How reliably can we trust the scores obtained from social bias benchmark...
Few-shot prompting is a surprisingly powerful way to use Large Language ...
We study the task of prompting large-scale language models to perform mu...
Question-answering datasets require a broad set of reasoning skills. We ...
Considerable progress has been made recently in open-domain question ans...
Humans often solve complex problems by interacting (in natural language)...
To build challenging multi-hop question answering datasets, we propose a...
Is it possible to use natural language to intervene in a model's behavio...
While day-to-day questions come with a variety of answer types, the curr...
We present the ARC-DA dataset, a direct-answer ("open response", "freefo...
A key limitation in current datasets for multi-hop reasoning is that the...
Humans often have to read multiple documents to address their informatio...
While large-scale language models are extremely effective when directly ...
Existing works on temporal reasoning among events described in text focu...
While language embeddings have been shown to have stereotyping biases, h...
A common approach to solve complex tasks is by breaking them down into s...
The measurement of true progress in multihop question-answering has been...
Question answering (QA) tasks have been posed using a variety of formats...
State-of-the-art models for multi-hop question answering typically augme...
While recent models have achieved human-level scores on many NLP dataset...
Composing knowledge from multiple pieces of text is a key challenge in ...
Multi-hop textual question answering requires combining information from...
AI has achieved remarkable mastery over games such as Chess, Go, and Pok...
We propose a novel method for exploiting the semantic structure of text ...
Question Answering (QA) naturally reduces to an entailment problem, name...
Recent systems for natural language understanding are strong at overcomi...
We focus on the task of multi-hop reading comprehension where a system i...
We present a new kind of question answering dataset, OpenBookQA, modeled...
Most textual entailment models focus on lexical gaps between the premise...
We consider the problem of learning Relational Logistic Regression (RLR)...
We consider the problem of learning textual entailment models with limit...
We present a new question set, text corpus, and baselines assembled to e...
While there has been substantial progress in factoid question-answering ...
Answering science questions posed in natural language is an important AI...
Our goal is to answer elementary-level science questions using knowledge...