Understanding Concept Identification as Consistent Data Clustering Across Multiple Feature Spaces

01/13/2023
by Felix Lanfermann, et al.

Identifying meaningful concepts in large data sets can provide valuable insights into engineering design problems. Concept identification aims to identify non-overlapping groups of design instances that are similar in the joint space of all features, but that are also similar when only subsets of features are considered. These subsets usually comprise features that characterize a design with respect to one specific context, for example, constructive design parameters, performance values, or operation modes. It is desirable to evaluate the quality of design concepts by considering several of these feature subsets in isolation. In particular, meaningful concepts should not only identify dense, well-separated groups of data instances, but also provide non-overlapping groups of data that persist when pre-defined feature subsets are considered separately. In this work, we propose to view concept identification as a special form of clustering with a broad range of potential applications beyond engineering design. To illustrate the differences between concept identification and classical clustering algorithms, we apply a recently proposed concept identification algorithm to two synthetic data sets and show the differences in the identified solutions. In addition, we introduce mutual information as a metric to evaluate whether solutions return consistent clusters across relevant feature subsets. To support this novel understanding of concept identification, we consider a simulated data set from a decision-making problem in the energy management domain and show that the identified clusters are more interpretable with respect to relevant feature subsets than clusters found by common clustering algorithms, and are thus more suitable for supporting a decision maker.
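The role of mutual information as a consistency measure can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mutual_information` and the label lists are hypothetical, and the sketch assumes that each feature subset has already been clustered separately, so that every design instance carries one cluster label per subset. The mutual information between two such labelings is high when the groupings agree and zero when they are statistically independent.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    if n == 0:
        raise ValueError("label sequences must not be empty")

    count_a = Counter(labels_a)                  # marginal counts, labeling A
    count_b = Counter(labels_b)                  # marginal counts, labeling B
    count_ab = Counter(zip(labels_a, labels_b))  # joint counts

    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical cluster labels for six designs, one labeling per feature subset.
labels_parameters = [0, 0, 1, 1, 2, 2]
labels_performance = [1, 1, 0, 0, 2, 2]  # same grouping, different label names
labels_unrelated = [0, 1, 0, 1, 0, 1]    # grouping independent of the first one

consistent = mutual_information(labels_parameters, labels_performance)
inconsistent = mutual_information(labels_parameters, labels_unrelated)
print(consistent, inconsistent)
```

In this toy case the first pair of labelings describes the same three equally sized groups, so the mutual information equals the logarithm of the number of groups, whereas the second pair is independent and yields zero. The measure is invariant to renaming of cluster labels, which is why it suits comparisons across separately clustered feature subsets.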


research
06/14/2023

Identification of Energy Management Configuration Concepts from a Set of Pareto-optimal Solutions

Optimizing building configurations for an efficient use of energy is inc...
research
07/31/2020

Identifying meaningful clusters in malware data

Finding meaningful clusters in drive-by-download malware data is a parti...
research
04/08/2020

Search Result Clustering in Collaborative Sound Collections

The large size of nowadays' online multimedia databases makes retrieving...
research
02/03/2023

A Novel Fuzzy Bi-Clustering Algorithm with AFS for Identification of Co-Regulated Genes

The identification of co-regulated genes and their transcription-factor ...
research
11/10/2022

DiSC: Differential Spectral Clustering of Features

Selecting subsets of features that differentiate between two conditions ...
research
01/22/2016

When is Clustering Perturbation Robust?

Clustering is a fundamental data mining tool that aims to divide data in...
research
11/25/2015

A Short Survey on Data Clustering Algorithms

With rapidly increasing data, clustering algorithms are important tools ...
