Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers

08/26/2021
by Nikita Dvornik, et al.

In this work, we consider the problem of sequence-to-sequence alignment for signals containing outliers. Assuming the absence of outliers, the standard Dynamic Time Warping (DTW) algorithm efficiently computes the optimal alignment between two (generally) variable-length sequences. While DTW is robust to temporal shifts and dilations of the signal, it fails to align sequences in a meaningful way in the presence of outliers that can be arbitrarily interspersed in the sequences. To address this problem, we introduce Drop-DTW, a novel algorithm that aligns the common signal between the sequences while automatically dropping the outlier elements from the matching. The entire procedure is implemented as a single dynamic program that is efficient and fully differentiable. In our experiments, we show that Drop-DTW is a robust similarity measure for sequence retrieval and demonstrate its effectiveness as a training loss on diverse applications. With Drop-DTW, we address temporal step localization on instructional videos, representation learning from noisy videos, and cross-modal representation learning for audio-visual retrieval and localization. In all applications, we take a weakly- or unsupervised approach and demonstrate state-of-the-art results under these settings.
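The contrast between standard DTW and an alignment that may drop outliers can be illustrated with a small sketch. This is a simplified, single-table dynamic program written for illustration only; it is not the recursion from the paper, and it uses a hard `min`, so it is not the differentiable variant the abstract describes. The function name `align_cost`, the absolute-difference match cost, and the single scalar `drop_cost` are all assumptions made for this example. Setting `drop_cost` to infinity disables dropping and recovers plain DTW.

```python
import math


def align_cost(x, z, drop_cost=math.inf):
    """Illustrative alignment cost between two 1-D sequences.

    Each element is either matched (paying |x_i - z_j|) or dropped
    (paying drop_cost). With drop_cost = inf, dropping is never chosen
    and the recursion reduces to standard DTW.
    Hypothetical helper, not the authors' implementation.
    """
    n, m = len(x), len(z)
    # D[i][j]: minimal cost of processing the first i elements of x
    # and the first j elements of z.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    # A prefix can only be consumed against an empty sequence by dropping it.
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + drop_cost
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + drop_cost

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair_cost = abs(x[i - 1] - z[j - 1])
            # Standard DTW step: match x_i with z_j, extending any of the
            # three neighbouring partial alignments.
            match = pair_cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
            # Extra steps: skip x_i or z_j at a fixed penalty.
            drop_x = D[i - 1][j] + drop_cost
            drop_z = D[i][j - 1] + drop_cost
            D[i][j] = min(match, drop_x, drop_z)
    return D[n][m]


clean = [0.0, 1.0, 2.0]
noisy = [0.0, 1.0, 100.0, 2.0]  # 100.0 is an interspersed outlier

dtw_cost = align_cost(clean, noisy)                  # dropping disabled
drop_cost_result = align_cost(clean, noisy, 1.0)     # dropping allowed
print(dtw_cost, drop_cost_result)
```

In this toy case plain DTW is forced to match the outlier to some element of the clean sequence and pays 98.0 for it, while the variant with a drop penalty of 1.0 skips the outlier and returns a total cost of 1.0. The choice of drop penalty controls how readily elements are treated as outliers; here it is a fixed constant purely for simplicity.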


research
05/11/2021

Representation Learning via Global Temporal Alignment and Cycle-Consistency

We introduce a weakly supervised method for representation learning base...
research
11/10/2021

Space-Time Memory Network for Sounding Object Localization in Videos

Leveraging temporal synchronization and association within sight and sou...
research
06/13/2021

Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning

Cross-modal correlation provides an inherent supervision for video unsup...
research
08/03/2017

Unsupervised Representation Learning by Sorting Sequences

We present an unsupervised representation learning approach using videos...
research
11/07/2018

Y^2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences

A recent method employs 3D voxels to represent 3D shapes, but this limit...
research
09/03/2023

Semi-supervised 3D Video Information Retrieval with Deep Neural Network and Bi-directional Dynamic-time Warping Algorithm

This paper presents a novel semi-supervised deep learning algorithm for ...
research
08/10/2023

Stabilizing Training with Soft Dynamic Time Warping: A Case Study for Pitch Class Estimation with Weakly Aligned Targets

Soft dynamic time warping (SDTW) is a differentiable loss function that ...
