Top1 Solution of QQ Browser 2021 Ai Algorithm Competition Track 1 : Multimodal Video Similarity

10/30/2021
by Zhuoran Ma, et al.

In this paper, we describe our solution to Track 1 of the QQ Browser 2021 Ai Algorithm Competition (AIAC). We use a multi-modal transformer model to extract video embeddings. In the pretraining phase, we train the model on three tasks: (1) Video Tag Classification (VTC), (2) Masked Language Modeling (MLM), and (3) Masked Frame Modeling (MFM). In the fine-tuning phase, we train the model on video similarity using rank-normalized human labels. Our full pipeline, after ensembling several models, scores 0.852 on the leaderboard, placing 1st in the competition. The source code has been released on GitHub.
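The abstract does not spell out how the human labels are rank normalized or which loss is used for fine-tuning, so the following is only a minimal sketch under stated assumptions. It assumes that rank normalization means replacing each label with its rank scaled to [0, 1] (tied labels share their average rank), and that the predicted similarity of a video pair is the cosine similarity of the two embeddings, compared to the normalized label with a mean squared error. The function names `rank_normalize`, `cosine_similarity`, and `mse_loss` are hypothetical and are not taken from the released code.

```python
import math
from typing import List, Sequence


def rank_normalize(labels: Sequence[float]) -> List[float]:
    """Map labels to [0, 1] by rank; tied labels share their average rank.

    Assumption: this is one plausible reading of "rank normalized" labels.
    """
    n = len(labels)
    if n == 0:
        return []
    if n == 1:
        return [0.5]
    # Indices of the labels in ascending label order.
    order = sorted(range(n), key=lambda i: labels[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the group while the next label ties with the current one.
        while end + 1 < n and labels[order[end + 1]] == labels[order[start]]:
            end += 1
        avg_rank = (start + end) / 2.0
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return [r / (n - 1) for r in ranks]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mse_loss(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between predicted similarities and target labels."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Toy example: three video pairs with raw human similarity labels.
raw_labels = [0.1, 0.9, 0.5]
targets = rank_normalize(raw_labels)

# Hypothetical 2-d embeddings for each pair (real embeddings would be larger).
pairs = [([1.0, 0.0], [0.0, 1.0]), ([1.0, 0.0], [1.0, 0.0]), ([1.0, 1.0], [1.0, 0.0])]
predictions = [cosine_similarity(a, b) for a, b in pairs]
loss = mse_loss(predictions, targets)
```

Rank-based targets depend only on the ordering of the labels, not on their raw scale, which is one reason such a normalization can suit a leaderboard metric that is itself rank based. Whether the authors used this exact scheme is not stated in the abstract.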


