The Evolution of User-Selected Passwords: A Quantitative Analysis of Publicly Available Datasets

04/11/2018
by Theodosis Mourouzis, et al.

The aim of this work is to study the evolution of password selection among users. We investigate whether users follow best practices when selecting passwords and identify areas in need of improvement. Four distinct publicly available password datasets (obtained from security breaches, compiled by security experts, and designated as containing bad passwords) are employed. Because these datasets were released at different times, differences between the distributions that characterize them suggest a chronological evolution of password selection. A similarity metric based on Levenshtein distance is used to compare the passwords in each dataset against the designated benchmark of bad passwords. The resulting distributions of normalized similarity scores are then compared with each other. The comparison reveals an overall increase in the mean of the similarity distributions for more recent datasets, implying a shift away from the use of bad passwords. This conclusion is corroborated by the passwords' clustering behavior. An encoding that captures best practices maps passwords to a high-dimensional space, over which a k-means clustering analysis (with the silhouette coefficient) is performed. Cluster comparison and character frequency analysis indicate an improvement in password selection over time with respect to certain features (length, mixing of character types), yet certain discouraged practices (name inclusion, selection bias) persist.
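As a rough illustration of the benchmark comparison described above, the following Python sketch scores a password by its Levenshtein distance to the nearest entry in a list of bad passwords. The abstract does not state how the scores are normalized, so dividing by the length of the longer string is an assumption, as are the function names and the sample benchmark list. Under this assumed normalization a higher score means the password is further from every benchmark entry, which matches the abstract's reading that a higher mean indicates a shift away from bad passwords.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using two rolling rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Assumed normalization: divide by the longer length, giving [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def distance_to_benchmark(password: str, benchmark: list[str]) -> float:
    """Score a password by its closest match in the bad-password list."""
    return min(normalized_distance(password, bad) for bad in benchmark)


# Hypothetical benchmark entries, for illustration only.
bad_passwords = ["password", "123456", "qwerty"]
score = distance_to_benchmark("passw0rd", bad_passwords)
```

Applying `distance_to_benchmark` to every password in a dataset yields one score distribution per dataset, and the means of those distributions can then be compared across release dates.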


