EchoCoTr: Estimation of the Left Ventricular Ejection Fraction from Spatiotemporal Echocardiography

09/09/2022
by Rand Muhtaseb, et al.

Learning spatiotemporal features is an important task for efficient video understanding, especially in medical imaging modalities such as echocardiography. Convolutional neural networks (CNNs) and, more recently, vision transformers (ViTs) are the most commonly used methods, each with its own limitations: CNNs capture local context well but fail to learn global information across video frames, while vision transformers can incorporate global details and long sequences but are computationally expensive and typically require more data to train. In this paper, we propose a method that addresses the limitations typically faced when training on medical video data such as echocardiographic scans. The proposed algorithm (EchoCoTr) combines the strengths of vision transformers and CNNs to estimate the left ventricular ejection fraction (LVEF) from ultrasound videos. We demonstrate that the proposed method outperforms state-of-the-art work to date on the EchoNet-Dynamic dataset, with an MAE of 3.95 and an R^2 of 0.82, a noticeable improvement over all published research. In addition, we present extensive ablations and comparisons with several algorithms, including ViT and BERT. The code is available at https://github.com/BioMedIA-MBZUAI/EchoCoTr.
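The results above are reported as mean absolute error (MAE) and the coefficient of determination (R^2), both computed on predicted versus ground-truth ejection-fraction percentages. A minimal sketch of these two metrics in plain Python; the example values are hypothetical and not taken from the paper:

```python
def mae(y_true, y_pred):
    """Mean absolute error between true and predicted EF values (percent)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical ejection-fraction values (percent), for illustration only.
truth = [55.0, 62.0, 40.0, 70.0, 35.0]
preds = [52.0, 65.0, 44.0, 68.0, 38.0]
print(round(mae(truth, preds), 2))        # → 3.0
print(round(r_squared(truth, preds), 3))  # → 0.946
```

Lower MAE is better (EF is expressed in percentage points), while R^2 closer to 1 indicates that predictions explain more of the variance in the ground truth.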


Related research

- 03/31/2023 · Hierarchical Vision Transformers for Cardiac Ejection Fraction Estimation: "The left ventricular ejection fraction is one of the most important m..."
- 01/12/2022 · Uniformer: Unified Transformer for Efficient Spatiotemporal Representation Learning: "It is a challenging task to learn rich and multi-scale spatiotemporal se..."
- 03/19/2022 · CNNs and Transformers Perceive Hybrid Images Similar to Humans: "Hybrid images is a technique to generate images with two interpretations..."
- 12/12/2022 · Masked autoencoders are effective solution to transformer data-hungry: "Vision Transformers (ViTs) outperforms convolutional neural networks (CN..."
- 10/21/2022 · Boosting vision transformers for image retrieval: "Vision transformers have achieved remarkable progress in vision tasks su..."
- 02/06/2021 · CMS-LSTM: Context-Embedding and Multi-Scale Spatiotemporal-Expression LSTM for Video Prediction: "Extracting variation and spatiotemporal features via limited frames rema..."
- 11/17/2022 · UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer: "Learning discriminative spatiotemporal representation is the key problem..."
