Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

by Biagio Brattoli, et al.
University of Heidelberg
California Institute of Technology

Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem. ZSL trains a model once and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. We outperform the state of the art by a wide margin. Our code, evaluation procedure and model weights are publicly available.
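To make the zero-shot setting concrete, the sketch below shows one common way a trained model can label a video from a class it never saw during training: the video is mapped to a vector in a semantic space, and the prediction is the unseen class whose embedding is closest. The abstract does not describe the inference step, so this nearest-neighbor rule, the cosine similarity, the function names, and the toy 3-dimensional vectors are all illustrative assumptions rather than the authors' method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(video_embedding, class_embeddings):
    """Return the class name whose embedding is most similar to the video.

    class_embeddings maps class name -> semantic vector. These classes
    need not appear in the training set (hypothetical helper).
    """
    return max(
        class_embeddings,
        key=lambda name: cosine(video_embedding, class_embeddings[name]),
    )

# Toy semantic vectors for three unseen test classes (made-up values).
unseen_classes = {
    "archery": [0.9, 0.1, 0.0],
    "surfing": [0.0, 0.2, 0.9],
    "juggling": [0.1, 0.9, 0.1],
}

# Stand-in for the output of a trained video network on one test clip.
video_embedding = [0.1, 0.3, 0.8]

predicted = zero_shot_classify(video_embedding, unseen_classes)
print(predicted)
```

In a real system the video vector would come from the trained network and the class vectors from a language model or similar source; only the class list changes between test datasets, so no retraining is needed.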

Code Repositories


Zero-shot video classification by end-to-end training of 3D convolutional neural networks
