We propose Unified-IO, a model that performs a large variety of AI tasks...
As humans, we understand events in the visual world contextually, perfor...
We propose PIGLeT: a model that learns physical commonsense knowledge th...
Despite major advances in open-ended text generation, there has been lim...
Multimodal disinformation, from 'deepfakes' to simple edits that deceive...
Conditional text generation often requires lexical constraints, i.e., wh...
Vision, as a central component of human perception, plays a fundamental ...
There is a fundamental gap between how humans understand and use languag...
Large neural models have demonstrated human-level performance on languag...
To apply eyeshadow without a brush, should I use a cotton swab or a toot...
Recent progress in natural language generation has raised dual-use conce...
Recent work by Zellers et al. (2018) introduced a new task of commonsens...
Visual understanding goes well beyond object recognition. With one glanc...
Given a partial description like "she opened the hood of the car," human...
We investigate the problem of producing structured graph representations...
In this paper, we investigate large-scale zero-shot activity recognition...
People are sharing their opinions, stories and reviews through online vi...