Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery

by Yong-Hao Long, et al.

Automatic surgical gesture recognition is fundamentally important for enabling intelligent cognitive assistance in robotic surgery. With recent advances in robot-assisted minimally invasive surgery, rich information including surgical videos and robotic kinematics can be recorded, providing complementary knowledge for understanding surgical gestures. However, existing methods either rely solely on uni-modal data or directly concatenate multi-modal representations, and therefore cannot sufficiently exploit the informative correlations inherent in visual and kinematics data to boost gesture recognition accuracy. To address this, we propose a novel multimodal relational graph network (MRG-Net) that dynamically integrates visual and kinematics information through interactive message propagation in the latent feature space. Specifically, we first extract embeddings from video and kinematics sequences with temporal convolutional networks and LSTM units. Next, we identify multiple relations among these multi-modal features and model them through a hierarchical relational graph learning module. The effectiveness of our method is demonstrated by state-of-the-art results on the public JIGSAWS dataset, outperforming current uni-modal and multi-modal methods on both the suturing and knot-tying tasks. Furthermore, we validated our method on in-house visual-kinematics datasets collected with da Vinci Research Kit (dVRK) platforms in two centers, achieving consistently promising performance.
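
To make the idea of message propagation between modality embeddings concrete, the following is a minimal sketch of one round of relation-specific message passing between a visual node and a kinematics node. It is not the MRG-Net implementation: the two-node graph, the relation names (`vis_to_kin`, `kin_to_vis`), the random weights, the feature dimension, and the update rule (self transform plus summed relation-specific messages, followed by ReLU) are all assumptions chosen for illustration, and the embeddings are placeholder vectors rather than outputs of a temporal convolutional network or LSTM.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, xi) for xi in x]


def add(*vecs):
    """Element-wise sum of one or more equal-length vectors."""
    return [sum(vals) for vals in zip(*vecs)]


def random_matrix(rows, cols, rng):
    """Small random weight matrix (stand-in for learned parameters)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def relational_update(nodes, edges, W_self, W_rel):
    """One round of relation-specific message passing.

    nodes:  dict mapping node name -> feature vector
    edges:  list of (source, target, relation) triples
    W_self: weight matrix applied to each node's own features
    W_rel:  dict mapping relation name -> weight matrix for its messages
    """
    updated = {}
    for name, h in nodes.items():
        # Collect messages from every edge that points at this node.
        messages = [
            matvec(W_rel[rel], nodes[src])
            for src, dst, rel in edges
            if dst == name
        ]
        updated[name] = relu(add(matvec(W_self, h), *messages))
    return updated


rng = random.Random(0)
dim = 4  # hypothetical embedding size

# Placeholder per-frame embeddings for the two modalities.
nodes = {
    "visual": [rng.uniform(-1, 1) for _ in range(dim)],
    "kinematics": [rng.uniform(-1, 1) for _ in range(dim)],
}

# Two directed relations, each with its own (hypothetical) weight matrix.
edges = [
    ("visual", "kinematics", "vis_to_kin"),
    ("kinematics", "visual", "kin_to_vis"),
]
W_self = random_matrix(dim, dim, rng)
W_rel = {
    "vis_to_kin": random_matrix(dim, dim, rng),
    "kin_to_vis": random_matrix(dim, dim, rng),
}

fused = relational_update(nodes, edges, W_self, W_rel)
```

In this sketch each node's updated feature depends on the other modality through a relation-specific transform, in contrast to plain concatenation, where the two embeddings are simply stacked with no interaction before classification.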




Joint Surgical Gesture and Task Classification with Multi-Task and Multimodal Learning

We propose a novel multi-modal and multi-task architecture for simultane...

Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment

Automated surgical gesture recognition is of great importance in robot-a...

Gesture Recognition in Robotic Surgery: a Review

Objective: Surgical activity recognition is a fundamental step in comput...

Multimodal and self-supervised representation learning for automatic gesture recognition in surgical robotics

Self-supervised, multi-modal learning has been successful in holistic re...

Symmetric Dilated Convolution for Surgical Gesture Recognition

Automatic surgical gesture recognition is a prerequisite of intra-operat...

Multi-modal Entity Alignment in Hyperbolic Space

Many AI-related tasks involve the interactions of data in multiple modal...

Automatic Gesture Recognition in Robot-assisted Surgery with Reinforcement Learning and Tree Search

Automatic surgical gesture recognition is fundamental for improving inte...