Granular Learning with Deep Generative Models using Highly Contaminated Data
This paper details an approach that uses recent normalizing flow models, a class of deep generative models, for anomaly detection in a granular (continuous) sense on a real-world image dataset with quality issues, with implications for many other applications, domains, and data types. The approach is completely unsupervised (no annotations are available) but is shown qualitatively to provide accurate semantic labeling of images via heatmaps of the scaled log-likelihood overlaid on the images. When the images are sorted by their per-image median values, clear trends in quality are observed. Furthermore, downstream classification is shown to be possible and effective via a weakly supervised approach that uses the log-likelihood output of a normalizing flow model as a training signal for a feature-extracting convolutional neural network (CNN). The pre-linear dense layer outputs of the CNN are shown to disentangle high-level representations and to cluster various quality issues efficiently. Thus, an entirely non-annotated (fully unsupervised) approach is shown to be possible for accurate estimation and classification of quality issues.
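The scaling and median-based sorting described in the abstract can be illustrated with a minimal sketch. It assumes a normalizing flow has already produced a per-pixel (or per-patch) log-likelihood map for each image, stored in an array of shape `(n_images, height, width)`. The function names `scale_log_likelihood` and `rank_by_median` and the min-max scaling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def scale_log_likelihood(ll_maps):
    """Min-max scale log-likelihood maps to [0, 1] over the whole dataset.

    Assumption: the abstract only says "scaled"; min-max scaling is one
    plausible choice and may differ from what the paper uses.
    """
    ll_maps = np.asarray(ll_maps, dtype=float)
    lo, hi = ll_maps.min(), ll_maps.max()
    if hi == lo:
        # All values identical: return zeros to avoid dividing by zero.
        return np.zeros_like(ll_maps)
    return (ll_maps - lo) / (hi - lo)


def rank_by_median(ll_maps):
    """Return image indices sorted by ascending median scaled log-likelihood."""
    scaled = scale_log_likelihood(ll_maps)
    # Flatten each map so the median is taken over all of its pixels.
    medians = np.median(scaled.reshape(scaled.shape[0], -1), axis=1)
    order = np.argsort(medians)
    return order, medians


# Hypothetical example: three 4x4 maps with constant log-likelihoods.
example_maps = np.stack([
    np.full((4, 4), -5.0),
    np.full((4, 4), -1.0),
    np.full((4, 4), -3.0),
])
order, medians = rank_by_median(example_maps)
```

Under these assumptions, images at the start of `order` have the lowest median likelihood and would be the first candidates for quality issues. The scaled maps themselves could be overlaid on the images as heatmaps.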