An Order Preserving Bilinear Model for Person Detection in Multi-Modal Data

12/20/2017
by   Oytun Ulutan, et al.
0

We propose a new order preserving bilinear framework that exploits low-resolution video for person detection in a multi-modal setting using deep neural networks. In this setting cameras are strategically placed such that less robust sensors, e.g. geophones that monitor seismic activity, are located within the field of views (FOVs) of cameras. The primary challenge is being able to leverage sufficient information from videos where there are less than 40 pixels on targets, while also taking advantage of less discriminative information from other modalities, e.g. seismic. Unlike state-of-the-art methods, our bilinear framework retains spatio-temporal order when computing the vector outer products between pairs of features. Despite the high dimensionality of these outer products, we demonstrate that our order preserving bilinear framework yields better performance than recent orderless bilinear models and alternative fusion methods.

READ FULL TEXT

page 2

page 3

page 6

research
11/29/2016

Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce

Classifying products into categories precisely and efficiently is a majo...
research
09/16/2018

Towards Good Practices for Multi-modal Fusion in Large-scale Video Classification

Leveraging both visual frames and audio has been experimentally proven e...
research
06/03/2022

A unifying framework for tangential interpolation of structured bilinear control systems

In this paper, we consider the structure-preserving model order reductio...
research
03/31/2020

X-Linear Attention Networks for Image Captioning

Recent progress on fine-grained visual recognition and visual question a...
research
05/24/2018

Backpropagation with N-D Vector-Valued Neurons Using Arbitrary Bilinear Products

Vector-valued neural learning has emerged as a promising direction in de...
research
11/25/2018

Temporal Bilinear Networks for Video Action Recognition

Temporal modeling in videos is a fundamental yet challenging problem in ...
research
05/05/2023

A technical note on bilinear layers for interpretability

The ability of neural networks to represent more features than neurons m...

Please sign up or login with your details

Forgot password? Click here to reset