An implementation of the "Guess who?" game using CLIP

11/30/2021
by Arnau Martí Sarri, et al.

CLIP (Contrastive Language-Image Pretraining) is an efficient method for learning computer vision tasks from natural language supervision that has powered a recent breakthrough in deep learning due to its zero-shot transfer capabilities. By training on image-text pairs available on the internet, the CLIP model transfers non-trivially to most tasks without the need for any dataset-specific training. In this work, we use CLIP to implement the engine of the popular game "Guess who?", so that the player interacts with the game using natural language prompts and CLIP automatically decides whether an image on the game board fulfills that prompt or not. We study the performance of this approach by benchmarking different ways of prompting the questions to CLIP, and show the limitations of its zero-shot capabilities.
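The decision step described above (does a board image fulfill a prompt?) can be illustrated with a minimal sketch. It assumes the image and two text prompts, an affirmative one and its negation, have already been embedded by CLIP. The vectors below are toy placeholders rather than real CLIP outputs, and the names `fulfills_prompt`, `filter_board`, and the `logit_scale` of 100 are assumptions made for illustration, not the authors' implementation or prompting scheme.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fulfills_prompt(image_emb, yes_emb, no_emb, logit_scale=100.0):
    """Compare an image embedding against an affirmative and a negated prompt.

    logit_scale is an assumed temperature; returns (decision, probability of "yes").
    """
    logits = [
        logit_scale * cosine(image_emb, yes_emb),
        logit_scale * cosine(image_emb, no_emb),
    ]
    p_yes, _ = softmax(logits)
    return p_yes > 0.5, p_yes


def filter_board(board, yes_emb, no_emb):
    """Return the names of the faces for which the prompt is judged true."""
    return [
        name
        for name, emb in board.items()
        if fulfills_prompt(emb, yes_emb, no_emb)[0]
    ]


# Toy embeddings (hypothetical): stand-ins for CLIP image and text features.
board = {"anna": [0.9, 0.1, 0.2], "bob": [0.1, 0.9, 0.3]}
yes_emb = [1.0, 0.0, 0.1]  # e.g. "a photo of a person with glasses"
no_emb = [0.0, 1.0, 0.1]   # e.g. "a photo of a person without glasses"

matching = filter_board(board, yes_emb, no_emb)
print(matching)
```

In a real pipeline the placeholder vectors would be replaced by features from a CLIP image encoder and text encoder; how the affirmative and negated prompts are worded is exactly the kind of choice the benchmarking in this work examines.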


research
04/18/2021

Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation

Traditional computer vision models are trained to predict a fixed set of...
research
06/16/2022

Know your audience: specializing grounded language models with the game of Dixit

Effective communication requires adapting to the idiosyncratic common gr...
research
04/12/2023

RECLIP: Resource-efficient CLIP by Training with Small Images

We present RECLIP (Resource-efficient CLIP), a simple method that minimi...
research
02/26/2021

Learning Transferable Visual Models From Natural Language Supervision

State-of-the-art computer vision systems are trained to predict a fixed ...
research
03/21/2022

CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning

Gameplay videos contain rich information about how players interact with...
research
10/11/2022

CLIP also Understands Text: Prompting CLIP for Phrase Understanding

Contrastive Language-Image Pretraining (CLIP) efficiently learns visual ...
research
12/17/2021

Data Efficient Language-supervised Zero-shot Recognition with Optimal Transport Distillation

Traditional computer vision models are trained to predict a fixed set of...
