Towards the Human Global Context: Does the Vision-Language Model Really Judge Like a Human Being?

07/18/2022
by Sangmyeong Woh, et al.

As computer vision and NLP make progress, Vision-Language (VL) is becoming an important area of research. Despite its importance, evaluation metrics for this research domain are still at a preliminary stage of development. In this paper, we propose a quantitative metric, the "Equivariance Score", and an evaluation dataset, "Human Puzzle", to assess whether a VL model understands an image as a human does. We observed that VL models do not interpret the overall context of an input image but instead show biases toward a specific object or shape that forms the local context. We aim to quantitatively measure a model's performance in understanding context. To test the capability of existing VL models, we sliced the original input image into pieces and rearranged them at random, distorting the global context of the image. Our paper discusses each VL model's level of interpretation of global context and addresses how structural characteristics influence the results.
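The slicing-and-shuffling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `shuffle_tiles`, the square `grid` parameter, the default of 3, and the nested-list image representation are all assumptions made for illustration. The abstract does not define the Equivariance Score, so the sketch does not compute it.

```python
import random


def shuffle_tiles(image, grid=3, seed=None):
    """Cut an image into grid x grid tiles and reassemble them in random order.

    `image` is assumed to be a list of rows, each row a list of pixels.
    Returns the shuffled image and the permutation that was applied.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // grid, width // grid

    # Cut tiles in row-major order; leftover border pixels are dropped.
    tiles = [
        [row[c * tile_w:(c + 1) * tile_w]
         for row in image[r * tile_h:(r + 1) * tile_h]]
        for r in range(grid)
        for c in range(grid)
    ]

    # order[i] is the index of the original tile placed at position i.
    order = list(range(grid * grid))
    rng.shuffle(order)

    # Stitch the permuted tiles back into a single image, one tile row at a time.
    shuffled = []
    for r in range(grid):
        row_tiles = [tiles[order[r * grid + c]] for c in range(grid)]
        for y in range(tile_h):
            shuffled.append([px for tile in row_tiles for px in tile[y]])
    return shuffled, order
```

Each tile stays intact, so local content (objects, shapes) is preserved while the global arrangement is destroyed. Under the assumptions above, comparing a model's output on `image` and on the shuffled result is one way to probe how much it relies on global context.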


