Introduction to the 1st Place Winning Model of OpenImages Relationship Detection Challenge

11/01/2018
by   Ji Zhang, et al.

This article describes the model we built that won 1st place in the OpenImages Visual Relationship Detection Challenge on Kaggle. Three key factors contribute most to our success: 1) language bias is a powerful baseline for this task: we build the empirical distribution P(predicate|subject,object) from the training set and use it directly at test time. This baseline alone reached 2nd place when submitted; 2) spatial features are as important as visual features, especially for spatial relationships such as "under" and "inside of"; 3) fusing different features is highly effective when done by building a separate module for each feature, then adding their output logits before the final softmax layer. Our ablation study shows that each factor improves performance to a non-trivial extent, and the model performs best when all three are combined.
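The language-bias baseline can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes training annotations are available as (subject, predicate, object) tuples and estimates P(predicate|subject, object) by simple counting, which is then used directly as the prediction at test time.

```python
from collections import Counter, defaultdict

def build_frequency_baseline(training_triplets):
    """Estimate P(predicate | subject, object) by counting triplets."""
    counts = defaultdict(Counter)
    for subj, pred, obj in training_triplets:
        counts[(subj, obj)][pred] += 1
    baseline = {}
    for pair, pred_counts in counts.items():
        total = sum(pred_counts.values())
        baseline[pair] = {p: c / total for p, c in pred_counts.items()}
    return baseline

# Toy training annotations (subject, predicate, object) -- illustrative only
triplets = [
    ("man", "holds", "cup"),
    ("man", "holds", "cup"),
    ("man", "drinks_from", "cup"),
    ("cup", "on", "table"),
]
baseline = build_frequency_baseline(triplets)
# e.g. baseline[("man", "cup")]["holds"] is 2/3 in this toy example
```

At test time, the highest-probability predicate for a detected (subject, object) pair is returned; pairs never seen in training need a fallback (e.g. a uniform or marginal predicate distribution), which the sketch above omits.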
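The late-fusion scheme in point 3) can be sketched as below: each module (visual, spatial, frequency) produces its own logits over the predicate classes, and the logits are summed before a single softmax. This is a hedged sketch of the idea, with hypothetical module outputs rather than the paper's actual architecture.

```python
import numpy as np

def fuse_logits(visual_logits, spatial_logits, frequency_logits):
    """Late fusion: add per-module logits, then apply one softmax."""
    summed = visual_logits + spatial_logits + frequency_logits
    exp = np.exp(summed - summed.max())  # subtract max for numerical stability
    return exp / exp.sum()

# Hypothetical per-module logits over 3 predicate classes
visual = np.array([2.0, 0.5, -1.0])
spatial = np.array([0.5, 1.5, 0.0])
frequency = np.array([1.0, 0.0, -0.5])
probs = fuse_logits(visual, spatial, frequency)
```

Summing logits before the softmax (rather than averaging per-module probabilities) lets each module veto or reinforce a predicate multiplicatively in probability space, which is what makes this fusion effective.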

