FusedLSTM: Fusing frame-level and video-level features for Content-based Video Relevance Prediction

09/29/2018
by Yash Bhalgat, et al.

This paper describes my two best-performing approaches to the Content-based Video Relevance Prediction challenge. In the FusedLSTM-based approach, the inception-pool3 and C3D-pool5 features are combined using an LSTM and a dense layer to form embeddings, with the objective of minimizing the triplet loss function. In the second approach, an Online Kernel Similarity Learning method is proposed to learn a non-linear similarity measure that adheres to the relevance training data. The last section gives a complete comparison of all the approaches implemented during this challenge, including the one presented in the baseline paper.
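The triplet loss objective mentioned above can be illustrated with a minimal sketch. The abstract does not state the distance function or the margin, so the Euclidean distance and `margin=1.0` below are assumptions, and the function names are hypothetical. This is not the paper's implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss for a single (anchor, positive, negative) triplet.

    Assumption: Euclidean distance and a margin of 1.0; the abstract specifies neither.
    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

In this sketch, `anchor` would be the embedding of a query video, `positive` the embedding of a video marked relevant to it, and `negative` the embedding of a non-relevant video. Minimizing the loss pulls relevant pairs closer together than non-relevant ones.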
