Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery

by Yong-Hao Long, et al.

Automatic surgical gesture recognition is fundamentally important for enabling intelligent cognitive assistance in robotic surgery. With recent advances in robot-assisted minimally invasive surgery, rich information including surgical videos and robotic kinematics can be recorded, which provides complementary knowledge for understanding surgical gestures. However, existing methods either adopt uni-modal data only or directly concatenate multi-modal representations, and therefore cannot sufficiently exploit the informative correlations inherent in visual and kinematics data to boost gesture recognition accuracy. To address this, we propose a novel multimodal relational graph network (MRG-Net) that dynamically integrates visual and kinematics information through interactive message propagation in the latent feature space. Specifically, we first extract embeddings from video and kinematics sequences with temporal convolutional networks and LSTM units. Next, we identify multiple relations among these multi-modal features and model them through a hierarchical relational graph learning module. The effectiveness of our method is demonstrated by state-of-the-art results on the public JIGSAWS dataset, where it outperforms current uni-modal and multi-modal methods on both the suturing and knot-tying tasks. Furthermore, we validated our method on in-house visual-kinematics datasets collected with da Vinci Research Kit (dVRK) platforms at two centers, achieving consistently promising performance.
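
To make the idea of message propagation between modalities concrete, the following is a minimal sketch of one round of relation-specific message passing between a visual embedding and a kinematics embedding. It is not the authors' MRG-Net implementation: the node names, the relation names (`self`, `kin_to_vis`, `vis_to_kin`), the mean aggregation, the ReLU activation, and the hand-written two-dimensional embeddings are all assumptions chosen for illustration. In the approach described above, the embeddings would instead come from temporal convolutional networks and LSTM units.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def relational_update(features, edges, weights):
    """One round of relation-specific message passing (illustrative only).

    features: dict mapping node name -> embedding (list of floats)
    edges:    list of (source, target, relation) triples
    weights:  dict mapping relation name -> square weight matrix
    Each target node averages the transformed messages it receives,
    then applies a ReLU.
    """
    dim = len(next(iter(features.values())))
    incoming = {node: [0.0] * dim for node in features}
    counts = {node: 0 for node in features}
    for src, dst, rel in edges:
        # Each relation type has its own transformation of the source embedding.
        msg = matvec(weights[rel], features[src])
        incoming[dst] = [a + m for a, m in zip(incoming[dst], msg)]
        counts[dst] += 1
    return {
        node: relu([v / max(counts[node], 1) for v in incoming[node]])
        for node in features
    }


# Hypothetical embeddings for one time step (assumed values, not real data).
features = {"visual": [1.0, 2.0], "kinematics": [0.5, -1.0]}

# Assumed relation types: a self-loop plus one relation per cross-modal direction.
edges = [
    ("visual", "visual", "self"),
    ("kinematics", "visual", "kin_to_vis"),
    ("kinematics", "kinematics", "self"),
    ("visual", "kinematics", "vis_to_kin"),
]

# Hand-picked weights; in a trained model these would be learned parameters.
weights = {
    "self": [[1.0, 0.0], [0.0, 1.0]],
    "kin_to_vis": [[0.5, 0.0], [0.0, 0.5]],
    "vis_to_kin": [[0.0, 1.0], [1.0, 0.0]],
}

updated = relational_update(features, edges, weights)
print(updated)
```

After the update, each node's embedding mixes its own features with a relation-specific transformation of the other modality's features, which is the basic contrast with simply concatenating the two representations.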




Joint Surgical Gesture and Task Classification with Multi-Task and Multimodal Learning

We propose a novel multi-modal and multi-task architecture for simultane...

Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment

Automated surgical gesture recognition is of great importance in robot-a...

Multimodal and self-supervised representation learning for automatic gesture recognition in surgical robotics

Self-supervised, multi-modal learning has been successful in holistic re...

Gesture Recognition in Robotic Surgery: a Review

Objective: Surgical activity recognition is a fundamental step in comput...

Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries

Accurate segmentation of surgical instrument tip is an important task fo...

Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures

Recent advancements in surgical computer vision applications have been d...

Surgical Gesture Recognition Based on Bidirectional Multi-Layer Independently RNN with Explainable Spatial Feature Extraction

Minimally invasive surgery mainly consists of a series of sub-tasks, whi...
