Convolutional Nets Versus Vision Transformers for Diabetic Foot Ulcer Classification

11/12/2021
by Adrian Galdran, et al.

This paper compares well-established Convolutional Neural Networks (CNNs) with the more recently introduced Vision Transformers on the task of Diabetic Foot Ulcer Classification, in the context of the DFUC 2021 Grand-Challenge, where this work ranked first. Comprehensive experiments show that modern CNNs can still outperform Transformers in a low-data regime, likely owing to their ability to better exploit spatial correlations. In addition, we show empirically that the recent Sharpness-Aware Minimization (SAM) optimization algorithm considerably improves the generalization capability of both kinds of models. For this task, the combination of CNNs and SAM outperforms every other approach considered.
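For readers unfamiliar with SAM, the following is a minimal sketch of its two-step update on a toy quadratic loss, not the authors' training code. The names `sam_step`, `toy_loss`, and `toy_grad`, as well as the values of `lr` and `rho`, are illustrative assumptions; the abstract does not state the hyperparameters used.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update (sketch): ascend to a nearby 'sharp' point, then
    descend from the original weights using the gradient found there."""
    g = grad_fn(w)
    # Perturbation of norm rho along the gradient direction
    # (small constant avoids division by zero).
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Gradient evaluated at the perturbed weights.
    g_sharp = grad_fn(w + eps)
    # Apply that gradient to the ORIGINAL weights.
    return w - lr * g_sharp


# Toy problem (assumption for illustration): L(w) = 0.5 * ||w||^2, so grad = w.
def toy_loss(w):
    return 0.5 * float(np.sum(w ** 2))


def toy_grad(w):
    return w


w = np.array([1.0, -2.0])
initial_loss = toy_loss(w)
for _ in range(50):
    w = sam_step(w, toy_grad)
final_loss = toy_loss(w)
```

In a deep-learning setting the same pattern requires two forward/backward passes per batch, which roughly doubles the cost of each training step compared with a plain gradient update.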

