Understanding Human-Centric Images: From Geometry to Fashion

12/14/2015
by Edgar Simo-Serra, et al.

Understanding humans from photographs has always been a fundamental goal of computer vision. In this thesis we develop a hierarchy of tools that cover a wide range of topics, with the objective of understanding humans from monocular RGB images: from low-level feature point descriptors to high-level fashion-aware conditional random field models. Building these high-level models requires a battery of robust and reliable low- and mid-level cues. Along these lines, we propose two low-level keypoint descriptors: one based on the theory of heat diffusion on images, and another that uses a convolutional neural network to learn discriminative image patch representations. We also introduce distinct low-level generative models for representing human pose: in particular, a discrete model based on a directed acyclic graph and a continuous model that consists of poses clustered on a Riemannian manifold. As mid-level cues we propose two 3D human pose estimation algorithms: one that estimates the 3D pose given a noisy 2D estimate, and another that estimates the 2D and 3D pose simultaneously. Finally, we formulate higher-level models, built upon the low- and mid-level cues, for understanding humans from single images. Concretely, we focus on two tasks in the context of fashion: semantic segmentation of clothing, and predicting the fashionability of images with metadata in order to ultimately provide fashion advice to the user. For all of the presented approaches we provide extensive results and comparisons against the state of the art, showing significant improvements across the variety of tasks we tackle.
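
As an illustration of the second low-level descriptor, the following is a minimal PyTorch sketch of training a CNN patch descriptor with a Siamese hinge loss: matching patch pairs are pulled together in descriptor space and non-matching pairs are pushed apart. The architecture, descriptor dimensionality, margin, and the names PatchDescriptor and siamese_hinge_loss are illustrative assumptions, not the exact formulation used in the thesis.

```python
# Hypothetical sketch: learning discriminative patch descriptors with a
# small CNN and a Siamese hinge loss (details assumed, not the thesis's).
import torch
import torch.nn as nn

class PatchDescriptor(nn.Module):
    """Small CNN mapping a 64x64 grayscale patch to a 128-D unit-norm descriptor."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, dim)

    def forward(self, x):
        z = self.features(x).flatten(1)
        z = self.fc(z)
        return nn.functional.normalize(z, dim=1)  # L2-normalised descriptors

def siamese_hinge_loss(d1, d2, same, margin=1.0):
    """Pull matching pairs together; push non-matching pairs beyond the margin."""
    dist = (d1 - d2).pow(2).sum(dim=1).sqrt()
    pos = same * dist                                        # matching pairs
    neg = (1 - same) * torch.clamp(margin - dist, min=0.0)   # non-matching pairs
    return (pos + neg).mean()

# Toy usage: random tensors stand in for real keypoint patches.
net = PatchDescriptor()
p1, p2 = torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64)
labels = torch.randint(0, 2, (8,)).float()   # 1 = same 3D point, 0 = different
loss = siamese_hinge_loss(net(p1), net(p2), labels)
loss.backward()
```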
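
For the mid-level cue that lifts a noisy 2D pose estimate to 3D, one simple way to convey the idea is hypothesise-and-score: generate candidate 3D poses from some prior and keep the one whose reprojection best explains the noisy 2D joints. The NumPy sketch below (pinhole projection, Gaussian noise model, and the hypothetical helpers project and best_hypothesis) is intended only to build intuition and does not reproduce the thesis algorithm.

```python
# Hypothetical sketch: selecting a 3D pose hypothesis that best explains
# a noisy 2D detection via reprojection error.
import numpy as np

def project(pose_3d, f=1000.0, cx=320.0, cy=240.0):
    """Pinhole projection of J x 3 joints (camera coordinates) to J x 2 pixels."""
    z = np.clip(pose_3d[:, 2], 1e-3, None)
    u = f * pose_3d[:, 0] / z + cx
    v = f * pose_3d[:, 1] / z + cy
    return np.stack([u, v], axis=1)

def best_hypothesis(pose_2d_noisy, hypotheses_3d, sigma=10.0):
    """Pick the 3D hypothesis whose reprojection best matches the noisy 2D joints."""
    best, best_err = None, np.inf
    for h in hypotheses_3d:
        err = np.sum((project(h) - pose_2d_noisy) ** 2) / (2 * sigma ** 2)
        if err < best_err:
            best, best_err = h, err
    return best, best_err

# Toy usage: 14 joints, a random candidate set standing in for a learned pose prior.
rng = np.random.default_rng(0)
candidates = [rng.normal([0.0, 0.0, 3.0], 0.2, size=(14, 3)) for _ in range(50)]
noisy_2d = project(candidates[7]) + rng.normal(0.0, 5.0, size=(14, 2))
pose, err = best_hypothesis(noisy_2d, candidates)
```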
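
At the high level, semantic segmentation of clothing is framed as inference in a conditional random field. The toy sketch below shows only the generic structure: unary costs per superpixel plus a Potts smoothness term between neighbours, minimised with iterated conditional modes. The fashion-aware potentials of the actual model are far richer; the functions crf_energy and icm and all parameters here are assumptions made for illustration.

```python
# Hypothetical sketch: a tiny pairwise CRF over superpixels with a
# Potts smoothness term, solved greedily by iterated conditional modes.
import numpy as np

def crf_energy(labels, unary, edges, w_pair=1.0):
    """labels: (N,), unary: (N, L) costs, edges: list of neighbouring pairs (i, j)."""
    e = unary[np.arange(len(labels)), labels].sum()               # data term
    e += w_pair * sum(labels[i] != labels[j] for i, j in edges)   # Potts smoothness
    return e

def icm(unary, edges, n_iters=10, w_pair=1.0):
    """Greedy label updates, one superpixel at a time."""
    N, L = unary.shape
    nbrs = [[] for _ in range(N)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    labels = unary.argmin(axis=1)
    for _ in range(n_iters):
        for i in range(N):
            costs = unary[i] + w_pair * np.array(
                [sum(l != labels[j] for j in nbrs[i]) for l in range(L)])
            labels[i] = costs.argmin()
    return labels

# Toy usage: 6 superpixels, 3 clothing labels, a chain of neighbours.
rng = np.random.default_rng(1)
unary = rng.random((6, 3))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
seg = icm(unary, edges)
```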


