Lost in Disclosure: On The Inference of Password Composition Policies

03/12/2020 ∙ by Saul Johnson, et al. ∙ Teesside University

Large-scale password data breaches are becoming increasingly commonplace, enabling researchers to produce a substantial body of password security research utilising real-world password datasets, which often contain tens or even hundreds of millions of records. While much research has examined how password composition policies (sets of rules that a user must abide by when creating a password) influence the distribution of user-chosen passwords on a system, much less has been done on inferring the password composition policy under which a given set of user-chosen passwords was created. In this paper, we state the problem with the naive approach to this challenge and suggest a simple approach that produces more reliable results. We also present pol-infer, a tool that implements this approach, and demonstrate its use in inferring password composition policies.


I Introduction

When cybercriminals compromise a user credential database and release its contents into the public arena, a number of different interested parties might seek to obtain and use the data it contains, with varying goals in mind. These might include, for instance, other groups of cybercriminals seeking to employ the data in credential stuffing attacks [1], and security researchers seeking to understand user password choice on the system concerned [19, 14, 18]. In particular, the latter group may be interested in the password composition policy under which the passwords in the database were created, in order to better understand how these rules around password creation affect the distribution of user password choices.

Security researchers may find themselves confounded in this endeavour, however, because when the breached user credential database is released to the public, information about the password composition policy in place at the time of the breach is often not included. This could be because the party behind the breach does not think it relevant, wishes to keep their methods as secret as possible, or never sought this information out in the first place—after all, the password composition policy is of comparatively little interest to malicious actors seeking to directly employ the credentials in the database to criminal ends. The only other party known to have this information is the organisation that was the victim of the data breach in the first place, who by this point may be unable or unwilling to disclose any information regarding their security practices. Reasons for this might include, for example:

  • The organisation may have ceased to exist entirely before the research in question is conducted. There are several real-world examples of this, such as the now-defunct Christian dating site singles.org [16], which ceased to exist sometime after its entire user credential database was compromised in plaintext in 2009.

  • The organisation might be understandably reluctant to disclose any information regarding their security practices for fear of being further targeted or incriminating themselves by confessing to having taken inadequate measures to safeguard user data. This is especially the case in Europe, where tightening legislation around data protection [5] might make the latter point of particular concern.

If we cannot obtain a description of the password composition policy from any of the organisations involved in the breach, this information has been lost in disclosure—that is, lost somewhere in the process of the transfer of data between parties. We are therefore forced to turn to the data that we do have to attempt to infer as much of that lost information as we can.

Dataset | Policy | Size
---|---|---
RockYou [4] | minimum length 5 | 32,603,048
Yahoo [10] | minimum length 6 | 453,492
000webhost [15] | minimum length 6, at least one digit | 15,271,208
LinkedIn [2] | minimum length 6 | 172,428,238
TABLE I: The four real-world breached password datasets studied in this work, alongside their corresponding policies according to [8, 13], and numbers of passwords within them.

There is no shortage of breached user credential databases available online. Arguably the best-known of these, the RockYou set [4], like many others (e.g. the Yahoo [10] and 000webhost [15] sets), contains passwords that do not comply with the password composition policy in place when the breach happened (see Tables I and II). The reasons for this “noise” vary, but include:

  • Multiple password composition policies per dump—the RockYou set, for example, is an aggregate made up of at least two tables: one containing passwords to the main web application and one containing passwords used to log in to “partner services” (e.g. MySpace) which may enforce different policies [4]. Passwords created under old policies may also be present. RockYou, for instance, changed their policy after their data breach in 2009 from minimum 5 characters in length [8] to a stronger policy [13, 7]. In this case, our methodology gives the password composition policy that the majority of passwords were created under, though there is scope for improving upon this in future work (see Section VI).

  • Formatting errors—when the raw data is being processed by the exfiltrating party, errors may be introduced if their data processing scripts are not robust. For example, passwords containing spaces may be read as two separate data points.

  • Intentional padding—if cybercriminals initially offer the data for sale, the price that they are capable of obtaining is often contingent on the number of records it contains. It is therefore possible that the dataset may be intentionally padded with extra records, some of which might contain non-compliant passwords.

Dataset | Compliant | Non-compliant
---|---|---
RockYou [4] | 32,524,461 | 78,587 (0.24%)
Yahoo [10] | 444,942 | 8,550 (1.89%)
000webhost [15] | 14,936,872 | 334,336 (2.19%)
LinkedIn [2] | 172,409,689 | 18,549 (0.01%)
TABLE II: A breakdown of the number of compliant and non-compliant passwords present in each dataset listed in Table I, according to [8, 13].

With “noisy” data like this, we cannot, for example, simply check for the shortest password in the database to determine the minimum password length constraint specified by the policy. In fact, the authors of one published work [12] note that the presence of “non-password artifacts” in the RockYou dataset factored into their choice of research methods, at least in part due to the difficulty of filtering these out. This motivates us to search for a simple, easy-to-implement method for inferring password composition policy rules from a password dataset, which would make filtering out at least some of these artifacts trivial. The remainder of this work outlines an alternative approach that we have found success with.

Contribution

We make the following concrete contributions in this work: (i) for the first time, we draw attention to the problem of “noise” in publicly-available breached password datasets in the form of passwords that do not comply with the password composition policy in place when the breach occurred; (ii) we suggest an easy-to-implement approach to filtering out this noise by converting the problem to one of outlier detection, without consulting any organisation involved in the breach; and (iii) we make available pol-infer [11], the tool used to produce the data and visualisations in our results (Sections IV and V), which can be downloaded at https://sr-lab.github.io/pol-infer/.

Outline

We have introduced and motivated the work in this section (Section I). We describe related work in Section II. In Section III we describe our approach in detail, and in Section IV we show the results we are able to obtain from the four password datasets listed in Table II. In Section V we apply our methodology to synthetic datasets created to simulate both intentional padding and processing with error-prone data processing scripts. We conclude in Section VI, discussing the limitations of our approach and potential future work.

II Related Work

We are not aware of any existing published work that explores the automation of password composition policy inference from large datasets. Previous research has involved determining the password composition policies used by active services. A study by Florêncio and Herley [7] gathered password composition policy information by creating an account on the service, where possible, and performing web searches otherwise. This study was later replicated by Mayer et al. in [13]. In [8], Golla and Dürmuth make extensive use of password data dumps where the password composition policy is known.

III Methodology

Our approach is applicable to any numerically-typed password attribute, that is, any function that maps a password to a non-negative integer by extracting some password property (e.g. length). By default, pol-infer supports the password attributes in Table III, which are sufficient to capture the policies used in the study by Shay et al. [17], with the exception of the dictionary check in the comprehensive8 policy, which cannot be expressed as an attribute of this type.

Attribute | Description
---|---
length | The number of characters in the password.
words | The number of words in the password. We define “words” in the same way as in [17], as “letter sequences separated by a nonletter sequence”.
lowers | The number of lowercase letters in the password.
uppers | The number of uppercase letters in the password.
digits | The number of digits in the password.
symbols | The number of non-alphanumeric characters in the password.
classes | The number of character classes in the password. We recognise the four character classes of the popular LUDS scheme: lowercase, uppercase, digits and symbols.
TABLE III: Password attributes usable with pol-infer by default. Any attribute appearing in this table can be used by the tool to infer password composition policies.
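The attributes in Table III can be sketched as plain functions from a password string to a count. This is a minimal Python illustration, not pol-infer's own code: the function names mirror the table, while the `ATTRIBUTES` registry and the ASCII-only treatment of letters and digits are assumptions made here for simplicity (under them, an accented letter counts as a symbol).

```python
import re
from typing import Callable, Dict

# ASSUMPTION: letters and digits are judged on ASCII only; every other
# character (including accented letters) is counted as a symbol.

def length(pw: str) -> int:
    """The number of characters in the password."""
    return len(pw)

def lowers(pw: str) -> int:
    """The number of lowercase ASCII letters."""
    return sum(1 for c in pw if "a" <= c <= "z")

def uppers(pw: str) -> int:
    """The number of uppercase ASCII letters."""
    return sum(1 for c in pw if "A" <= c <= "Z")

def digits(pw: str) -> int:
    """The number of ASCII digits."""
    return sum(1 for c in pw if "0" <= c <= "9")

def symbols(pw: str) -> int:
    """The number of non-alphanumeric characters (everything else)."""
    return len(pw) - lowers(pw) - uppers(pw) - digits(pw)

def words(pw: str) -> int:
    """Letter sequences separated by non-letter sequences, as defined in [17]."""
    return len(re.findall(r"[A-Za-z]+", pw))

def classes(pw: str) -> int:
    """How many of the four LUDS classes appear at least once."""
    counts = (lowers(pw), uppers(pw), digits(pw), symbols(pw))
    return sum(1 for n in counts if n > 0)

# Hypothetical registry keyed by the attribute names used in Table III.
ATTRIBUTES: Dict[str, Callable[[str], int]] = {
    "length": length, "words": words, "lowers": lowers, "uppers": uppers,
    "digits": digits, "symbols": symbols, "classes": classes,
}
```

Under the [17] definition, `words("Passw0rd!")` is 2, because the digit splits the letters into the sequences `Passw` and `rd`.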

For instance, suppose we wish to infer the minimum length constraint specified by the policy under which the 000webhost set [15] was created (that is, using the length attribute). In this case, previous research [8] has established that the answer is 6, and yet the data in Table IV would seem to contradict this: there are passwords shorter than this present in the data.

x (length) | f_x | C_x | m_x
---|---|---|---
1 | 306 | 306 | 6.03
2 | 1540 | 1846 | 1.42
3 | 775 | 2621 | 1.47
4 | 1221 | 3842 | 1.66
5 | 2456 | 6388 | 137.23
6 | 870209 | 876597 | 2.38
7 | 1208092 | 2084689 |
TABLE IV: Frequencies f_x of passwords of different lengths x in the 000webhost set [15], alongside their cumulative frequencies C_x and the multiplier m_x required to reach the cumulative frequency of the next length x + 1.

It is readily apparent how the data in Table IV may be used to determine the minimum length constraint in the 000webhost policy. By observing the outlying value of 137.23 in the multiplier column, we can see that we now have an outlier detection problem. In Table IV, for every length x, with f_x the number of passwords of length x:

$$C_x = \sum_{i \le x} f_i, \qquad m_x = \frac{C_{x+1}}{C_x}$$

We can infer the minimum password length enforced by the password composition policy under which this data was created by looking for the outlying “sudden increase” in the multiplier m_x, taking the minimum length to be x^* + 1, where:

$$x^* = \operatorname*{arg\,max}_{x}\, m_x$$

For the 000webhost data, this gives us the correct answer of 6, since the outlying multiplier in Table IV (137.23) occurs at length 5. By examining the number of digits in a password rather than its length (that is, using the digits attribute instead of the length attribute), we are also able to determine that the 000webhost policy demands that passwords contain at least one digit (see Section IV).

By setting a lower threshold on the multiplier, we are able to specify a cutoff point below which we assume there is no constraint in place on the attribute in question. We have found success using a single fixed multiplier value as this threshold, which pol-infer [11] applies as its default cutoff. For example, consider that the 000webhost policy does not demand that any uppercase letters be present in passwords.

x (uppercase letters) | f_x | C_x | m_x
---|---|---|---
0 | 12366006 | 12366006 | 1.08
1 | 1049727 | 13415733 | 1.02
2 | 315637 | 13731370 | 1.02
3 | 267042 | 13998412 | 1.02
4 | 260061 | 14258473 | 1.02
5 | 241305 | 14499778 | 1.02
6 | 220202 | 14719980 | 1.01
7 | 187806 | 14907786 |
TABLE V: Frequencies f_x, cumulative frequencies C_x and multipliers m_x of passwords containing different numbers x of uppercase letters in the 000webhost set [15].

As no multiplier in Table V rises above the default cutoff point (the largest, at zero uppercase letters, is only 1.08), we conclude that there was likely no constraint on the minimum number of uppercase letters in the password policy when the dataset was created.
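Adding the cutoff to the arg-max rule gives a version that can also report the absence of a constraint. This self-contained sketch is hypothetical; in particular, `HYPOTHETICAL_CUTOFF = 2.0` is a placeholder chosen only for illustration, since the default value used by pol-infer is not stated in this text. With that placeholder, the uppercase counts from Table V yield no constraint, while the length counts from Table IV still yield 6.

```python
from typing import Dict, Optional

# HYPOTHETICAL placeholder cutoff; the value pol-infer actually uses is not given here.
HYPOTHETICAL_CUTOFF = 2.0

def infer_minimum_or_none(freqs: Dict[int, int], cutoff: float) -> Optional[int]:
    """Return x* + 1 for x* = argmax_x m_x, or None if even the largest
    multiplier is below `cutoff` (read as "no constraint on this attribute")."""
    observed = sorted(x for x, f in freqs.items() if f > 0)
    if len(observed) < 2:
        return None  # fewer than two distinct values: no multiplier to inspect
    lo, hi = observed[0], observed[-1]
    cumulative: Dict[int, int] = {}
    running = 0
    for x in range(lo, hi + 1):
        running += freqs.get(x, 0)  # absent values assumed to have f_x = 0
        cumulative[x] = running
    m = {x: cumulative[x + 1] / cumulative[x] for x in range(lo, hi)}
    x_star = max(m, key=m.get)
    return None if m[x_star] < cutoff else x_star + 1

# Uppercase-letter counts f_x from Table V (000webhost, 0 to 7 uppercase letters).
TABLE_V = {0: 12366006, 1: 1049727, 2: 315637, 3: 267042,
           4: 260061, 5: 241305, 6: 220202, 7: 187806}
# Length counts f_x from Table IV (000webhost, lengths 1 to 7).
TABLE_IV = {1: 306, 2: 1540, 3: 775, 4: 1221, 5: 2456, 6: 870209, 7: 1208092}

print(infer_minimum_or_none(TABLE_V, HYPOTHETICAL_CUTOFF))   # None
print(infer_minimum_or_none(TABLE_IV, HYPOTHETICAL_CUTOFF))  # 6
```

Returning `None` rather than 0 keeps "no constraint inferred" distinct from "a minimum of zero", which matters when the result feeds a later filtering step.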

IV Results: Real Data

We present a set of results demonstrating the success of our approach when used to infer the minimum password length specified by the policies under which four different datasets were created.

  • RockYou—breached in plaintext from an online gaming service of the same name circa 2009 [4]. The policy in place at the time enforced a minimum length of 5 characters, with no other constraints [8]. Contains a total of 32,603,048 passwords.

  • Yahoo—breached from the Yahoo Voice VoIP service circa 2012 [10]. The policy in place at the time of the breach enforced a minimum length of 6 characters with no other requirements [13]. Contains 453,492 passwords.

  • 000webhost—breached from the web hosting service of the same name circa 2015 [15]. The policy in place at the time of the breach enforced a minimum length of 6 characters, with at least one numeric digit [8]. Contains 15,271,208 passwords.

  • LinkedIn—breached from the professional social networking site of the same name circa 2012; the true extent of this breach, which was much larger than initially made public, was uncovered in 2016 [2]. Unsalted SHA-1 password hashes were extracted, many of which have since been cracked, and it is these cracked passwords that we use in this work. The policy in place at the time of the breach enforced a minimum length of 6 characters with no other requirements [13]. Contains 172,428,238 passwords.

The results that follow were produced using pol-infer—a tool we make available [11] for inferring password composition policies from large datasets using the approach we describe in Section III.

IV-A The RockYou Set (2009)

Previous research has established that the majority of the RockYou set [4] was created under a policy enforcing a minimum length of 5 with no other requirements [8].

Fig. 1: Passwords of different lengths in the RockYou set [4], plotted against the multiplier required to reach the cumulative frequency of the next length.

The outlying point in Figure 1 indicates that the password composition policy under which most of the passwords in the set were created enforces a minimum length of 5. This aligns with existing literature [8].

IV-B The Yahoo Set (2012)

Previous research has established that the majority of the Yahoo set [10] was created under a policy enforcing a minimum length of 6 with no other requirements [13].

Fig. 2: Passwords of different lengths in the Yahoo set [10], plotted against the multiplier required to reach the cumulative frequency of the next length.

The outlying point in Figure 2 indicates that the password composition policy under which most of the passwords in the set were created enforces a minimum length of 6. This aligns with existing literature [13].

Fig. 3: Passwords containing different numbers of digits in the Yahoo set [10], plotted against the multiplier required to reach the cumulative frequency of the next digit count.

IV-B1 Inferring the Absence of Constraints

As no points in Figure 3 lie above the default pol-infer [11] cutoff point, the tool indicates that there was likely no constraint on the minimum number of digits in the password policy when the Yahoo dataset was created. This aligns with existing literature [13].

IV-C The 000webhost Set (2015)

Previous research has established that the majority of the 000webhost set [15] was created under a policy enforcing a minimum length of 6, with the additional requirement that passwords must contain at least one digit [8].

Fig. 4: Passwords of different lengths in the 000webhost set [15], plotted against the multiplier required to reach the cumulative frequency of the next length.

The outlying point in Figure 4 indicates that the password composition policy under which most of the passwords in the set were created enforces a minimum length of 6. This aligns with existing literature [8].

Fig. 5: Passwords containing different numbers of digits in the 000webhost set [15], plotted against the multiplier required to reach the cumulative frequency of the next digit count.

The outlying point in Figure 5 indicates that the password composition policy under which most of the passwords in the set were created enforces a minimum of one digit in passwords.

IV-D The LinkedIn Set (2016)

Previous research has established that the majority of the LinkedIn set [2] was created under a policy enforcing a minimum length of 6 with no other requirements [8].

Fig. 6: Passwords of different lengths in the LinkedIn set [2], plotted against the multiplier required to reach the cumulative frequency of the next length.

The outlying point in Figure 6 indicates that the password composition policy under which most of the passwords in the set were created enforces a minimum length of 6. This aligns with existing literature [8].

V Results: Synthetic Data

In order to simulate the effect of some of the circumstances mentioned in Section I that could potentially create non-compliant “noise” in real-world password datasets, we created the following synthetic datasets:

  • 2word12_linkedin_padded—The LinkedIn dataset [2] filtered according to a 2word12 policy (at least 12 characters long, at least 2 letter sequences separated by a non-letter sequence) to leave 1,511,786 passwords. This has then been combined with the singles.org dataset [16] (16,248 passwords), elitehacker dataset (1000 passwords), hak5 dataset [3] (2987 passwords), and faithwriters dataset [9] (9709 passwords). This is designed to simulate intentional padding of a dataset created under one policy with several other smaller datasets in order to increase its resale value.

  • 2class8_linkedin_errors—The LinkedIn dataset [2] filtered according to a 2class8 policy (at least 8 characters long, with at least 2 character classes present from lowercase, uppercase, digits and symbols) to leave 65,271,156 passwords. Every password in this dataset containing a space or a comma was then split into two or more separate strings along these characters, creating 404,547 additional records. This simulates the type of formatting error that might be introduced by processing scripts after the dataset has been exfiltrated.
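The synthetic datasets described above could be produced along the following lines. This is a hypothetical Python sketch, not the scripts actually used to build 2word12_linkedin_padded or 2class8_linkedin_errors: the function names are illustrative, letters and digits are judged on ASCII only, and discarding the empty fragments produced by adjacent or leading separators is an assumption.

```python
import re
from typing import Iterable, List

def _classes(pw: str) -> int:
    """Count the LUDS classes present (ASCII-only; anything else is a symbol)."""
    has_lower = any("a" <= c <= "z" for c in pw)
    has_upper = any("A" <= c <= "Z" for c in pw)
    has_digit = any("0" <= c <= "9" for c in pw)
    has_symbol = any(not (c.isascii() and c.isalnum()) for c in pw)
    return has_lower + has_upper + has_digit + has_symbol

def complies_2class8(pw: str) -> bool:
    """At least 8 characters and at least 2 LUDS classes."""
    return len(pw) >= 8 and _classes(pw) >= 2

def complies_2word12(pw: str) -> bool:
    """At least 12 characters and at least 2 letter sequences."""
    return len(pw) >= 12 and len(re.findall(r"[A-Za-z]+", pw)) >= 2

def pad(base: Iterable[str], *extras: Iterable[str]) -> List[str]:
    """Simulate intentional padding: append other (smaller) dumps to a base set."""
    padded = list(base)
    for extra in extras:
        padded.extend(extra)
    return padded

def split_on_separators(passwords: Iterable[str]) -> List[str]:
    """Simulate a fragile processing script that splits records on spaces and commas."""
    out: List[str] = []
    for pw in passwords:
        if " " in pw or "," in pw:
            # ASSUMPTION: empty fragments from adjacent separators are dropped.
            out.extend(part for part in re.split(r"[ ,]+", pw) if part)
        else:
            out.append(pw)
    return out

base = ["correct horse battery", "Password123", "password"]
print(split_on_separators(p for p in base if complies_2class8(p)))
# ['correct', 'horse', 'battery', 'Password123']
```

The space in `correct horse battery` counts as a symbol, so that password passes the 2class8 filter before being broken into three records, which is exactly the kind of short, non-compliant artifact the inference has to see past.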

V-A Intentional Padding

Figure 7 and Table VI show the use of our methodology to recover the original password composition policy of 2word12_linkedin_padded (2word12). The outlying points in Figure 7 and Table VI give us a minimum length of 12 and a minimum word count of 2, respectively (in Table VI, the outlying multiplier of 39.39 occurs at one word).

Fig. 7: Passwords of different lengths in the 2word12_linkedin_padded synthetic dataset, plotted against the multiplier required to reach the cumulative frequency of the next length.

x (words) | f_x | C_x | m_x
---|---|---|---
0 | 2500 | 2500 | 11.18
1 | 25460 | 27960 | 39.39
2 | 1073513 | 1101473 | 1.17
3 | 190996 | 1292469 | 1.07
4 | 89916 | 1382385 |
TABLE VI: Frequencies f_x, cumulative frequencies C_x and multipliers m_x of passwords containing different numbers x of words in the 2word12_linkedin_padded synthetic dataset.

V-B Formatting Errors

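Figure 8 and Table VII show the use of our methodology to recover the original password composition policy of 2class8_linkedin_errors (2class8). The outlying points in Figure 8 and Table VII give us a minimum length of 8 and a minimum class count of 2, respectively (in Table VII, the outlying multiplier of 84.87 occurs at one character class).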

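Fig. 8: Passwords of different lengths in the 2class8_linkedin_errors synthetic dataset, plotted against the multiplier required to reach the cumulative frequency of the next length.

x (character classes) | f_x | C_x | m_x
---|---|---|---
1 | 591820 | 591820 | 84.87
2 | 49637360 | 50229180 | 1.27
3 | 13401629 | 63630809 | 1.03
4 | 2044894 | 65675703 |
TABLE VII: Frequencies f_x, cumulative frequencies C_x and multipliers m_x of passwords containing different numbers x of character classes in the 2class8_linkedin_errors synthetic dataset.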

VI Conclusion

In this work, we have demonstrated a simple, easy-to-implement methodology for inferring the password composition policy under which a password data dump was created without the need to interact with any of the parties involved in its disclosure. Once we have done this, we are able to trivially filter out non-compliant passwords if we so wish. We make pol-infer, the tool implementing this methodology that we used to produce the results in Sections IV and V, freely available [11]. We show that results obtained by this tool agree with existing literature on several real-world password datasets, and that it is effective on datasets generated to mimic those that might arise as a result of intentional padding or buggy data processing.
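Once a policy has been inferred, filtering out non-compliant passwords amounts to checking each inferred minimum in turn. The following Python sketch assumes the inferred policy is represented as a mapping from attribute name to minimum value; that representation, the small `ATTRIBUTES` subset and the `filter_compliant` helper are hypothetical, not pol-infer's interface. The example uses the 000webhost minimums reported in Section IV.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical subset of attribute extractors (ASCII-only counting).
ATTRIBUTES: Dict[str, Callable[[str], int]] = {
    "length": len,
    "digits": lambda pw: sum(1 for c in pw if "0" <= c <= "9"),
    "uppers": lambda pw: sum(1 for c in pw if "A" <= c <= "Z"),
}

def filter_compliant(passwords: Iterable[str], policy: Dict[str, int]) -> List[str]:
    """Keep only passwords meeting every inferred minimum in `policy`."""
    return [
        pw for pw in passwords
        if all(ATTRIBUTES[name](pw) >= minimum for name, minimum in policy.items())
    ]

# Minimums inferred for 000webhost in Section IV: length 6, at least one digit.
policy = {"length": 6, "digits": 1}
sample = ["abc1", "abcdef", "abcde1", "abcdef12"]
print(filter_compliant(sample, policy))  # ['abcde1', 'abcdef12']
```

Attributes for which no constraint was inferred are simply left out of the mapping, so they place no restriction on which passwords are kept.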

Limitations

While our approach is capable of approximately inferring password composition policies that place constraints on specific password attributes, it cannot offer a guarantee that the inferred policy is accurate or complete. As an example of a password composition policy rule that would be very difficult to infer, consider a rule that limits password length to a maximum of 1024 characters. As very few user-chosen passwords would be in violation of this rule even in its absence, its impact on user password choice would be very limited, making its inference very difficult.

Future work

Where the time and date of account creation are available in password data dumps, it may be possible to detect, with some accuracy, when any password composition policy changes occurred, offering new insight into the organisation’s internal security practices. This may require pol-infer to become more modular, acting as a framework capable of hosting different inference algorithms. Work on pol-infer is planned to make policy inference more automated and comprehensive (e.g. inference of dictionary checks), with an option to generate password composition policy names in the style used by [17]. We plan to make use of pol-infer and the methodology we propose in this work to help prepare password data for use in research into other aspects of password security, such as formally verified password composition policy enforcement software [6].

References