Clustering Indices based Automatic Classification Model Selection

05/23/2023
by Sudarsun Santhiappan, et al.

Classification model selection is the process of identifying a suitable model class for a given classification task on a dataset. Traditionally, model selection relies on cross-validation, meta-learning, and user preferences, which are often time-consuming and resource-intensive. The performance of any machine learning classification task depends on the choice of model class, the learning algorithm, and the characteristics of the dataset. Our work proposes a novel method for automatic classification model selection from a set of candidate model classes: we determine the empirical model-fitness for a dataset based only on its clustering indices. Clustering indices measure the ability of a clustering algorithm to induce good-quality neighborhoods with similar data characteristics. For each candidate model class, we pose a regression task in which the clustering indices of a dataset form the features and the dependent variable is the expected classification performance. To recommend a suitable model class for a new dataset, we compute its clustering indices and directly predict the expected classification performance of each candidate model class using the corresponding learned regressor. We evaluate our model selection method through cross-validation on 60 publicly available binary-class datasets and show that our top-3 model recommendation is accurate for over 45 of the 60 datasets. We also propose an end-to-end Automated ML system for data classification based on our model selection method. We evaluate this end-to-end system against popular commercial and noncommercial Automated ML systems on a separate collection of 25 public-domain binary-class datasets, and show that the proposed system outperforms the other methods with an average rank of 1.68.
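The recommendation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the clustering indices of each dataset are already available as a numeric vector, and it substitutes a simple nearest-neighbour average for the per-model-class regressor, whose actual form the abstract does not state. The function names (`knn_predict`, `recommend`), the model class names, and all numbers are hypothetical toy values, not results from the paper.

```python
from math import dist


def knn_predict(train_X, train_y, x, k=2):
    """Stand-in regressor: average the targets of the k training points nearest to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def recommend(meta_X, meta_scores, new_indices, top_k=3):
    """Rank candidate model classes by predicted performance on a new dataset.

    meta_X:      clustering-index vectors of previously seen datasets
    meta_scores: {model_class: [observed score on each seen dataset]}
    new_indices: clustering-index vector of the dataset to classify
    """
    # One regression problem per candidate model class.
    predicted = {
        name: knn_predict(meta_X, scores, new_indices)
        for name, scores in meta_scores.items()
    }
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    return ranked[:top_k], predicted


# Hypothetical toy meta-dataset: two clustering indices per dataset (illustrative only).
meta_X = [(0.1, 0.9), (0.2, 0.8), (0.8, 0.2), (0.9, 0.1)]
meta_scores = {
    "linear": [0.60, 0.62, 0.90, 0.92],
    "tree": [0.80, 0.78, 0.70, 0.72],
    "knn": [0.85, 0.83, 0.65, 0.63],
    "naive_bayes": [0.50, 0.52, 0.55, 0.53],
}

top3, predicted = recommend(meta_X, meta_scores, new_indices=(0.85, 0.15))
print(top3)
```

The point of the design is that no candidate model is trained on the new dataset at recommendation time; only its clustering indices are computed and passed to the previously learned regressors.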

research
07/18/2018

Cross Validation Based Model Selection via Generalized Method of Moments

Structural estimation is an important methodology in empirical economics...
research
12/24/2020

Leave Zero Out: Towards a No-Cross-Validation Approach for Model Selection

As the main workhorse for model selection, Cross Validation (CV) has ach...
research
05/15/2019

Automatic Model Selection for Neural Networks

Neural networks and deep learning are changing the way that artificial i...
research
09/29/2022

Dataset Complexity Assessment Based on Cumulative Maximum Scaled Area Under Laplacian Spectrum

Dataset complexity assessment aims to predict classification performance...
research
11/16/2020

Automatic selection of clustering algorithms using supervised graph embedding

The widespread adoption of machine learning (ML) techniques and the exte...
research
11/11/2020

A Survey and Implementation of Performance Metrics for Self-Organized Maps

Self-Organizing Map algorithms have been used for almost 40 years across...
research
05/05/2013

Efficient Estimation of the number of neighbours in Probabilistic K Nearest Neighbour Classification

Probabilistic k-nearest neighbour (PKNN) classification has been introdu...
