An Unsupervised Method for Estimating Class Separability of Datasets with Application to LLMs Fine-Tuning

05/24/2023
by   Najah Ghalyan, et al.

This paper proposes an unsupervised method that leverages topological characteristics of data manifolds to estimate the class separability of a dataset without requiring labels. Experiments on several datasets demonstrate a clear correlation and consistency between the class separability estimated by the proposed method and supervised metrics, such as the Fisher Discriminant Ratio (FDR) and cross-validated classifier performance, both of which require labels. This enables learning paradigms that draw on both labeled and unlabeled data, such as semi-supervised and transductive learning, and is particularly useful when labeled data are limited but a relatively large unlabeled dataset is available to enhance learning. The proposed method is applied to language model fine-tuning with an automated stopping criterion that monitors the class separability of the embedding-space manifold in an unsupervised setting. The methodology was first validated on synthetic data, where the results show clear consistency between the class separability estimated by the proposed method and that computed by FDR. It was then applied to both public and internal data. The results show that, by quantifying the class separability of the embedding manifold, the proposed method can effectively inform, without the need for labels, the decision of when to stop or continue fine-tuning a language model and which fine-tuning iteration is expected to achieve maximum classification performance.
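The abstract contrasts a supervised separability metric (FDR) with a label-free estimate used as a fine-tuning stopping signal. A minimal sketch of that workflow is below; note that the paper's actual topological estimator is not given in this abstract, so the unsupervised proxy here (silhouette score on k-means clusters of the embeddings) is a stand-in assumption, and `should_stop` is an illustrative patience-based rule, not the authors' criterion.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def fisher_discriminant_ratio(X, y):
    """Supervised FDR: between-class scatter over within-class scatter.
    Requires labels y; higher values mean better-separated classes."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = sum(
        X[y == c].shape[0] * np.sum((X[y == c].mean(axis=0) - overall_mean) ** 2)
        for c in classes
    )
    within = sum(
        np.sum((X[y == c] - X[y == c].mean(axis=0)) ** 2) for c in classes
    )
    return between / within


def unsupervised_separability(X, n_clusters=2, seed=0):
    """Label-free proxy: cluster the embedding vectors, then score
    cluster cohesion/separation. A stand-in for the paper's
    topological estimator, which this abstract does not specify."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    return silhouette_score(X, labels)


def should_stop(history, patience=2):
    """Illustrative stopping rule: halt fine-tuning once the separability
    estimate has failed to improve for `patience` consecutive evaluations."""
    if len(history) <= patience:
        return False
    best_so_far = max(history[:-patience])
    return all(h <= best_so_far for h in history[-patience:])
```

In use, one would embed the (unlabeled) dataset after each fine-tuning iteration, append `unsupervised_separability(embeddings)` to a history list, and stop when `should_stop(history)` returns `True`; the correlation with FDR reported in the paper is what justifies reading the unsupervised score as a proxy for downstream classification performance.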


Related research:

- Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning (06/02/2023)
- Big Self-Supervised Models are Strong Semi-Supervised Learners (06/17/2020)
- Competitive Learning Enriches Learning Representation and Accelerates the Fine-tuning of CNNs (04/26/2018)
- Prompt-driven efficient Open-set Semi-supervised Learning (09/28/2022)
- Towards Realistic Unsupervised Fine-tuning with CLIP (08/24/2023)
- 3D Human Keypoints Estimation From Point Clouds in the Wild Without Human Labels (06/07/2023)
- Estimating Diffusion With Compound Poisson Jumps Based On Self-normalized Residuals (02/12/2018)
