Log In Sign Up

Smoothed Gaussian Mixture Models for Video Classification and Recommendation

by   Sirjan Kafle, et al.

Cluster-and-aggregate techniques such as Vector of Locally Aggregated Descriptors (VLAD), and their end-to-end discriminatively trained equivalents like NetVLAD have recently been popular for video classification and action recognition tasks. These techniques operate by assigning video frames to clusters and then representing the video by aggregating residuals of frames with respect to the mean of each cluster. Since some clusters may see very little video-specific data, these features can be noisy. In this paper, we propose a new cluster-and-aggregate method which we call smoothed Gaussian mixture model (SGMM), and its end-to-end discriminatively trained equivalent, which we call deep smoothed Gaussian mixture model (DSGMM). SGMM represents each video by the parameters of a Gaussian mixture model (GMM) trained for that video. Low-count clusters are addressed by smoothing the video-specific estimates with a universal background model (UBM) trained on a large number of videos. The primary benefit of SGMM over VLAD is smoothing which makes it less sensitive to small number of training samples. We show, through extensive experiments on the YouTube-8M classification task, that SGMM/DSGMM is consistently better than VLAD/NetVLAD by a small but statistically significant margin. We also show results using a dataset created at LinkedIn to predict if a member will watch an uploaded video.


page 1

page 2

page 3

page 4


EGMM: an Evidential Version of the Gaussian Mixture Model for Clustering

The Gaussian mixture model (GMM) provides a convenient yet principled fr...

A sparse negative binomial mixture model for clustering RNA-seq count data

Clustering with variable selection is a challenging but critical task fo...

No frame left behind: Full Video Action Recognition

Not all video frames are equally informative for recognizing an action. ...

ClustRank: a Visual Quality Measure Trained on Perceptual Data for Sorting Scatterplots by Cluster Patterns

Visual quality measures (VQMs) are designed to support analysts by autom...

Pluralistic Image Completion with Probabilistic Mixture-of-Experts

Pluralistic image completion focuses on generating both visually realist...

A GMBCG Galaxy Cluster Catalog of 55,424 Rich Clusters from SDSS DR7

We present a large catalog of optically selected galaxy clusters from th...

Apple Counting using Convolutional Neural Networks

Estimating accurate and reliable fruit and vegetable counts from images ...