DAF:re: A Challenging, Crowd-Sourced, Large-Scale, Long-Tailed Dataset For Anime Character Recognition

01/21/2021
by Edwin Arkel Rios, et al.

In this work we tackle the challenging problem of anime character recognition, where anime refers to animation produced in Japan and to works derived from or inspired by it. For this purpose we present DAF:re (DanbooruAnimeFaces:revamped), a large-scale, crowd-sourced, long-tailed dataset with almost 500K images spread across more than 3000 classes. Additionally, we conduct experiments on DAF:re and similar datasets using a variety of classification models, including CNN-based ResNets and the self-attention-based Vision Transformer (ViT). Our results give new insights into the generalization and transfer learning properties of ViT models on datasets from domains substantially different from those used for upstream pre-training, including the influence of batch and image size on their training. We also share our dataset, source code, pre-trained checkpoints, and results as Animesion, the first end-to-end framework for large-scale anime character recognition: https://github.com/arkel23/animesion


