Recent work in vision-and-language pretraining has investigated supervis...
While pretraining on large-scale image-text data from the Web has facili...
The ability to discriminate between large and small quantities is a core...
People rely heavily on context to enrich meaning beyond what is literall...
Large pre-trained models have proved to be remarkable zero- and
(prompt-...
Vision-and-language (V L) models pretrained on large-scale multimodal ...
Building models that can be rapidly adapted to numerous tasks using only...
Large language models have shown impressive performance on many natural
...
Multimodal image-language transformers have achieved impressive results ...
Recently multimodal transformer models have gained popularity because th...
Children learn word meanings by tapping into the commonalities across
di...
We apply a generative segmental model of task structure, guided by narra...
There are thousands of actively spoken languages on Earth, but a single
...
We propose a new dataset for evaluating question answering models with
r...
Modern convolutional neural networks (CNNs) are able to achieve human-le...
Recent work has attempted to characterize the structure of semantic memo...
Children can use the statistical regularities of their environment to le...
People exhibit a tendency to generalize a novel noun to the basic-level ...
Recent empirical and modeling research has focused on the semantic fluen...