Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

by Shawn Tan, et al.

We release the largest public ECG dataset of continuous raw signals for representation learning, containing 11 thousand patients and 2 billion labelled beats. Our goal is to enable the development of semi-supervised ECG models and the discovery of unknown subtypes of arrhythmia and anomalous ECG signal events. To this end, we propose an unsupervised representation learning task, evaluated in a semi-supervised fashion, and provide a set of baselines for different feature extractors that can be built upon. Additionally, we perform qualitative evaluations of PCA embeddings, in which we identify some clustering of known subtypes, indicating the potential of representation learning for arrhythmia subtype discovery.
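
As a rough illustration of the qualitative PCA evaluation mentioned in the abstract, the sketch below projects fixed-length signal windows onto their top two principal components. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `pca_embed`, the window length of 128 samples, and the synthetic "narrow" and "wide" pulses standing in for two beat types are all hypothetical, and no Icentia11K data is loaded.

```python
import numpy as np


def pca_embed(windows, n_components=2):
    """Project fixed-length signal windows onto their top principal components."""
    X = np.asarray(windows, dtype=float)
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(X_centered, full_matrices=False)
    components = vt[:n_components]
    # Variance captured by each kept component (sample variance convention).
    variance = (s ** 2)[:n_components] / (X.shape[0] - 1)
    return X_centered @ components.T, components, variance


# Synthetic stand-ins for two beat morphologies (assumption, not real ECG).
rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 128)
narrow = np.exp(-(t / 0.05) ** 2)
wide = np.exp(-(t / 0.20) ** 2)
windows = np.vstack([
    narrow + 0.05 * rng.standard_normal((100, t.size)),
    wide + 0.05 * rng.standard_normal((100, t.size)),
])

embedding, components, variance = pca_embed(windows)
print(embedding.shape)  # one 2-D point per window
```

Plotting `embedding[:, 0]` against `embedding[:, 1]` and colouring points by a known label is one way to check visually whether known subtypes form clusters, which is the kind of inspection the abstract describes.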


