We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for...
We introduce OpenFlamingo, a family of autoregressive vision-language mo...
Surprising videos, e.g., funny clips, creative performances, or visual i...
Performant vision-language (VL) models like CLIP represent captions usin...
Weird, unusual, and uncanny images pique the curiosity of observers beca...
The common practice for training commonsense models has gone from-human-...
As humans, we understand events in the visual world contextually, perfor...
Image captioning has conventionally relied on reference-based automatic ...
Images can give us insights into the contextual meanings of words, but c...
Modeling expressive cross-modal interactions seems crucial in multimodal...
Pretraining from unlabelled web videos has quickly become the de facto m...
Instructional videos get high traffic on video sharing platforms, and pr...
Images and text co-occur everywhere on the web, but explicit links betwe...
Controversial posts are those that split the preferences of a community,...
Multimodal machine learning algorithms aim to learn visual-textual corre...
The content of today's social media is becoming richer and richer, incr...
We examine the possibility that recent promising results in automatic ca...