SiamParseNet: Joint Body Parsing and Label Propagation in Infant Movement Videos
General movement assessment (GMA) of infant movement videos (IMVs) is an effective method for the early detection of cerebral palsy (CP) in infants. Automated body parsing is a crucial step towards computer-aided GMA, in which infant body parts are segmented and tracked over time for movement analysis. However, acquiring fully annotated data for video-based body parsing is particularly expensive due to the large number of frames in IMVs. In this paper, we propose a semi-supervised body parsing model, termed SiamParseNet (SPN), that jointly learns single-frame body parsing and label propagation between frames. The Siamese-structured SPN consists of a shared feature encoder followed by two separate branches: one for intra-frame body part segmentation and one for inter-frame label propagation. The two branches are trained jointly, taking pairs of frames from the same video as input. We propose an adaptive training process that alternates between two training modes: one using input pairs of only labeled frames, and one using pairs that mix labeled and unlabeled frames. During testing, we employ a multi-source inference mechanism, in which the final result for a test frame is obtained either from the segmentation branch or by propagation from a nearby key frame. We conduct extensive experiments on a partially labeled IMV dataset, where SPN outperforms all prior art, demonstrating the effectiveness of the proposed method.
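The alternating training modes and the multi-source inference step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the training modes are scheduled or how a test frame is routed to one branch or the other. The fixed alternation period, the `max_gap` distance threshold, and the function names `training_mode`, `nearest_key_frame`, and `multi_source_inference` are all assumptions made for illustration, and `segment` and `propagate` are hypothetical callables standing in for the two SPN branches.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

Mask = List[List[int]]  # per-pixel body-part labels for one frame


def training_mode(step: int, period: int = 2) -> str:
    """Pick the training mode for a step.

    Assumption: modes alternate on a fixed period. The abstract only says
    the process alternates between the two modes.
    """
    return "labeled_only" if step % period == 0 else "labeled_and_unlabeled"


def nearest_key_frame(
    t: int, key_frames: Sequence[int], max_gap: int
) -> Optional[int]:
    """Return the key frame index closest to t within max_gap, else None."""
    candidates = [k for k in key_frames if abs(k - t) <= max_gap]
    if not candidates:
        return None
    # Ties are broken in favour of the earlier frame.
    return min(candidates, key=lambda k: (abs(k - t), k))


def multi_source_inference(
    t: int,
    frames: Sequence[Any],
    key_masks: Dict[int, Mask],
    segment: Callable[[Any], Mask],
    propagate: Callable[[Any, Mask, Any], Mask],
    max_gap: int = 2,
) -> Mask:
    """Produce a mask for frame t from one of two sources.

    Assumed routing rule: reuse a known key-frame mask if t is a key frame,
    propagate from a key frame within max_gap if one exists, and otherwise
    fall back to the segmentation branch.
    """
    if t in key_masks:
        return key_masks[t]
    k = nearest_key_frame(t, sorted(key_masks), max_gap)
    if k is None:
        return segment(frames[t])
    return propagate(frames[k], key_masks[k], frames[t])
```

In this sketch a frame far from every key frame is handled by the segmentation branch alone, while frames close to a key frame inherit its labels through propagation. A real system might instead combine both sources or choose between them with a learned criterion.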