Learning Joint Representation of Human Motion and Language

10/27/2022
by Jihoon Kim et al.

In this work, we present MoLang (a Motion-Language connecting model) for learning a joint representation of human motion and language, leveraging both unpaired and paired datasets of the motion and language modalities. To this end, we propose a motion-language model trained with contrastive learning, enabling it to learn more generalizable representations of the human motion domain. Empirical results show that our model learns strong representations of human motion data by leveraging the language modality. The proposed method performs both action recognition and motion retrieval with a single model, outperforming state-of-the-art approaches on a number of action recognition benchmarks.
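The abstract does not specify the form of the contrastive objective, so the sketch below is only an illustration of the general idea: a symmetric InfoNCE-style loss that pulls matched motion/text embedding pairs together and pushes mismatched pairs apart. The function name, the temperature value, and the encoder outputs are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def motion_language_contrastive_loss(motion_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired motion/text embeddings.

    motion_emb, text_emb: (batch, dim) tensors from the two encoders.
    Matched pairs share the same batch index; all other pairings in the
    batch serve as negatives.
    """
    # L2-normalize so dot products are cosine similarities.
    motion_emb = F.normalize(motion_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix, scaled by the temperature.
    logits = motion_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: motion-to-text and text-to-motion.
    loss_m2t = F.cross_entropy(logits, targets)
    loss_t2m = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_m2t + loss_t2m)
```

A shared embedding space trained this way supports both tasks the abstract mentions: action recognition reduces to comparing a motion embedding against embeddings of label texts, and motion retrieval to ranking motion embeddings by similarity to a query text.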

