Exploring the Impact of Password Dataset Distribution on Guessing
Leaks from password datasets are a regular occurrence. An organization may defend a leak with reassurances that just a small subset of passwords were taken. In this paper we show that the leak of a relatively small number of text-based passwords from an organizations' stored dataset can lead to a further large collection of users being compromised. Taking a sample of passwords from a given dataset of passwords we exploit the knowledge we gain of the distribution to guess other samples from the same dataset. We show theoretically and empirically that the distribution of passwords in the sample follows the same distribution as the passwords in the whole dataset. We propose a function that measures the ability of one distribution to estimate another. Leveraging this we show that a sample of passwords leaked from a given dataset, will compromise the remaining passwords in that dataset better than a sample leaked from another source.
READ FULL TEXT