"Guess Who ?" Large-Scale Data-Centric Study of the Adequacy of Browser Fingerprints for Web Authentication

05/19/2020 ∙ by Nampoina Andriamilanto, et al. ∙ Irisa

Browser fingerprinting consists of collecting attributes from a web browser to build a browser fingerprint. In this work, we assess the adequacy of browser fingerprints as an authentication factor on a dataset of 4,145,408 fingerprints composed of 216 attributes, collected over 6 months from a general-audience browser population. We identify, formalize, and assess the properties that browser fingerprints need to be usable and practical as an authentication factor. We notably evaluate their distinctiveness, their stability through time, their collection time, and their size in memory. We show that considering a large surface of 216 fingerprinting attributes leads to an 81.8% unicity rate. Browser fingerprints are known to evolve, but we observe that between consecutive fingerprints, more than 90% of the attributes remain unchanged after nearly 6 months. Fingerprints are also affordable: on average, they weigh a dozen kilobytes and are collected in a few seconds. We conclude that browser fingerprints are a promising additional web authentication factor.


1 Introduction

Web authentication widely relies on identifier-password pairs. Passwords are easy to use, but suffer from severe security flaws. Indeed, users choose common passwords, paving the way to brute-force or guessing attacks Bonneau (2012). They also reuse similar passwords across websites Wang et al. (2018), which increases the impact of attacks. Phishing is another major threat to passwords: over the course of a year, Thomas et al. (2017) managed to retrieve  million credentials stolen by phishing kits. These flaws gave rise to multi-factor authentication Bonneau et al. (2015), where each additional authentication factor provides an additional security barrier. However, this usually comes at the cost of usability (i.e., users have to remember, possess, or do something).

Meanwhile, browser fingerprinting has gained attention. The Panopticlick study Eckersley (2010) highlights the possibility of building a browser fingerprint by collecting attributes from a web browser. In addition to being widely used for web tracking purposes Englehardt and Narayanan (2016) (which raises legal and ethical issues), browser fingerprints are used as an authentication factor in real life. Browser fingerprints are indeed a good candidate for an authentication factor thanks to their distinctive power, their frictionless deployment (e.g., no additional software), and their usability (no secret to remember, no additional object to possess, and no supplementary action to carry out). As a result, companies like MicroFocus (https://www.microfocus.com/media/white-paper/device-fingerprinting-for-low-friction-authentication-wp.pdf) or SecureAuth (https://docs.secureauth.com/pages/viewpage.action?pageId=33063454) include browser fingerprints within their authentication mechanisms (see Figure 1 for an example of such a mechanism).

Figure 1: Simplified browser fingerprinting web authentication mechanism.

Related works. To the best of our knowledge, no large-scale study rigorously evaluates the adequacy of browser fingerprints as an authentication factor. Most works about their use for authentication concentrate on the design of authentication mechanisms Unger et al. (2013); Preuveneers and Joosen (2015); Laperdrix et al. (2019); Rochet et al. (2019), and empirical studies on browser fingerprints focus on their efficacy as a web tracking tool Eckersley (2010); Laperdrix et al. (2016); Gómez-Boix et al. (2018). This mismatch between the currently poor understanding of browser fingerprints for authentication and their ongoing adoption in real life seriously harms the security of web users. The lack of documentation from existing tools (e.g., about the attributes used, and the distinctiveness and stability of the resulting fingerprints) only adds to the current state of ignorance, although security through obscurity contradicts the most fundamental security principles.

Our contributions. We conduct the first large-scale data-centric empirical study of the fundamental properties of browser fingerprints when used as an additional authentication factor. We base our findings on an in-depth analysis of a real-life fingerprint dataset collected over 6 months, which contains 4,145,408 fingerprints composed of 216 attributes. We formalize, and assess on our dataset, the properties needed to pave the way for elaborate browser fingerprinting authentication mechanisms. The selected properties are usually used to evaluate biometric characteristics for authentication Maltoni et al. (2003). We stress that we make no assumption on the inner workings of the authentication mechanism, and consequently on the adversarial strategy. Our properties aim at characterizing the adequacy and practicability of browser fingerprints, independently of their use within future authentication mechanisms. In particular, we measure the size of browser anonymity sets through time, and show that 81.8% of our fingerprints are unique. Moreover, we measure the proportion of identical attributes between two observations of the fingerprint of a browser, and show that % of attributes remain unchanged after nearly 6 months. Finally, we measure the collection time and the size of fingerprints. We show that, on average, they weigh a dozen kilobytes and are collected in a few seconds.

The rest of the paper is organized as follows. Section 2 presents and formalizes the properties evaluated in our analysis. Section 3 describes the dataset analyzed in this study. Section 4 presents our experimental results. Finally, Section 5 synthesizes the results and concludes.

2 Authentication Factor Properties

The “Handbook of Fingerprint Recognition” Maltoni et al. (2003) summarizes the properties that a biometric characteristic requires to be usable as an authentication factor (here, usable refers to the adequacy of the characteristic to be used for authentication, rather than to its ease of use), and the additional properties required for a biometric authentication scheme to be practical. We draw a parallel between the fingerprints used to recognize persons and the browser fingerprints used to recognize browsers, and evaluate browser fingerprints according to these properties to assess their adequacy for web authentication. In this section, we list these properties, formalize how to measure some of them, and explain why the others are not addressed in this study.

The four properties needed for an anatomical or a behavioral characteristic to be usable as a biometric authentication factor are described below.

  • Universality: the characteristic should be present in everyone.

  • Distinctiveness: two distinct persons should have different characteristics.

  • Permanence: the same person should have the same characteristic over time. We use the term stability instead.

  • Collectibility: the characteristic should be collectible and measurable.

The three properties that a biometric authentication scheme requires to be practical are the following.

  • Performance: the scheme should consume few resources, and be robust against environmental changes.

  • Acceptability: users should be willing to use the scheme in their daily lives.

  • Circumvention: it should be difficult for an attacker to deceive the scheme.

The properties that we study are the distinctiveness, the stability, and the performance. We consider that universality and collectibility are satisfied, as the HTTP headers that browsers send automatically already constitute a fingerprint. However, we stress that distinctiveness decreases when no JavaScript attribute is available. Regarding circumvention, we refer the reader to Laperdrix et al. (2019), who analyzed the security of an authentication mechanism based on browser fingerprints. We leave the evaluation of acceptability to future work, but we stress that such mechanisms are already used in a rudimentary form (https://support.google.com/accounts/answer/1144110).

2.1 Distinctiveness

To satisfy the distinctiveness property, browser fingerprints should allow two different browsers to be distinguished. The distinctiveness depends on the attributes used and on the fingerprinted browser population. The two extreme cases are every browser sharing the same fingerprint, which makes them indistinguishable from each other, and no two browsers sharing the same fingerprint, which makes every browser distinguishable.

Our dataset entries are composed of a fingerprint, the source browser, and the time of collection in the form of a Unix timestamp in milliseconds. We denote  the domain of the unique identifiers (UIDs), and  the timestamp domain. The fingerprint dataset is denoted  and is formalized as:

(1)

We use the size of browser anonymity sets to quantify the distinctiveness, as browsers belonging to the same anonymity set are indistinguishable. We denote  a function returning the set of browsers that provide the fingerprint  in the dataset . It is formalized as:

(2)
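To make Eq. (2) concrete, here is a minimal Python sketch of the anonymity-set function. It assumes a hypothetical in-memory layout in which each entry is a (fingerprint, UID, Unix timestamp in milliseconds) triple and a fingerprint is any hashable value, such as a tuple of attribute values; `Entry` and `anonymity_set` are illustrative names, not the authors' code.

```python
from collections.abc import Hashable
from typing import NamedTuple


class Entry(NamedTuple):
    """One dataset entry (hypothetical layout)."""
    fingerprint: Hashable  # e.g. a tuple of attribute values
    uid: str               # unique identifier of the source browser
    timestamp: int         # Unix timestamp in milliseconds


def anonymity_set(fingerprint: Hashable, dataset: list[Entry]) -> set[str]:
    """Set of browsers (UIDs) that provide `fingerprint` in `dataset`."""
    return {entry.uid for entry in dataset if entry.fingerprint == fingerprint}


dataset = [
    Entry(("Firefox", "1920x1080"), "b1", 1_575_000_000_000),
    Entry(("Firefox", "1920x1080"), "b2", 1_575_000_100_000),
    Entry(("Chrome", "1366x768"), "b3", 1_575_000_200_000),
]
print(sorted(anonymity_set(("Firefox", "1920x1080"), dataset)))  # ['b1', 'b2']
```

The anonymity set size of a fingerprint is then simply the length of the returned set.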

We denote  a function providing the set of fingerprints having an anonymity set size of  (i.e., being shared by  browsers) in the dataset . It is formalized as:

(3)
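A similar sketch for Eq. (3), under the same assumed (fingerprint, UID, timestamp) layout: the hypothetical helper `fingerprints_with_as_size` collects the UIDs of each fingerprint, then keeps the fingerprints provided by exactly k browsers. Because UIDs are stored in a set, a browser that provides the same fingerprint several times is counted once.

```python
from collections import defaultdict
from collections.abc import Hashable


def fingerprints_with_as_size(k: int, dataset: list[tuple[Hashable, str, int]]) -> set[Hashable]:
    """Fingerprints shared by exactly `k` distinct browsers in `dataset`.

    `dataset` holds (fingerprint, uid, timestamp_ms) triples (hypothetical layout).
    """
    browsers = defaultdict(set)  # fingerprint -> UIDs of the browsers providing it
    for fingerprint, uid, _timestamp in dataset:
        browsers[fingerprint].add(uid)
    return {fp for fp, uids in browsers.items() if len(uids) == k}


dataset = [("A", "b1", 1), ("A", "b2", 2), ("A", "b1", 3), ("B", "b3", 4)]
print(fingerprints_with_as_size(1, dataset))  # {'B'}
print(fingerprints_with_as_size(2, dataset))  # {'A'}: b1 is counted once
```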

We measure the anonymity set sizes on the fingerprints currently in use by each browser, not on their whole history. To do so, we simulate datasets composed of the last fingerprint seen for each browser at a given time. Let  be the simulated dataset originating from  that represents the state of the fingerprints after  days. With  the last timestamp of this day, we have:

(4)

2.2 Stability

Browser fingerprints have the particularity of evolving through time, due to changes in the web environment such as software updates or user configuration changes. We measure the stability as the mean similarity between two consecutive fingerprints of a browser, as a function of the time elapsed between them. The two extreme cases are every browser keeping the same fingerprint throughout its life, and the fingerprint of a browser changing completely at each observation.

We denote  a function providing the set of consecutive fingerprints in  that are separated by a time difference within the  time range. It is formalized as:

(5)
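A minimal sketch of Eq. (5), assuming the same (fingerprint, UID, timestamp) triples: each browser's history is sorted chronologically and every adjacent pair whose time difference falls in the range is kept. Whether the range is open or closed at each end is not recoverable from the text; the half-open interval [low, high) used by the hypothetical `consecutive_pairs` helper is our choice.

```python
from collections import defaultdict


def consecutive_pairs(dataset, low_ms: int, high_ms: int):
    """Pairs of consecutive fingerprints of the same browser whose time difference
    lies in [low_ms, high_ms). The half-open range is an assumption.

    `dataset` holds (fingerprint, uid, timestamp_ms) triples.
    """
    histories = defaultdict(list)  # uid -> [(timestamp, fingerprint), ...]
    for fingerprint, uid, timestamp in dataset:
        histories[uid].append((timestamp, fingerprint))
    pairs = []
    for history in histories.values():
        history.sort(key=lambda item: item[0])  # chronological order
        for (t1, fp1), (t2, fp2) in zip(history, history[1:]):
            if low_ms <= t2 - t1 < high_ms:
                pairs.append((fp1, fp2))
    return pairs


dataset = [("A", "b1", 0), ("B", "b1", 100), ("C", "b1", 1_000), ("X", "b2", 50)]
print(consecutive_pairs(dataset, 0, 500))      # [('A', 'B')]
print(consecutive_pairs(dataset, 500, 1_000))  # [('B', 'C')]
```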

We use the Kronecker delta, which equals 1 when its two arguments are equal and 0 otherwise. We denote  the value taken by the attribute  for the fingerprint . Let  be a simple similarity function between fingerprints, formalized as:

(6)
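The similarity of Eq. (6) can be sketched as the proportion of attributes whose values are identical, that is, the sum of Kronecker deltas over the attributes divided by their number. The normalization is an assumption, consistent with Section 4.2 reporting similarities as percentages; fingerprints are modeled as dicts sharing the same (illustrative) attribute names.

```python
def kronecker_delta(x, y) -> int:
    """1 if both values are equal, 0 otherwise."""
    return 1 if x == y else 0


def similarity(fp1: dict, fp2: dict) -> float:
    """Proportion of attributes having identical values in `fp1` and `fp2`.

    Fingerprints are assumed to be dicts over the same attribute names.
    """
    return sum(kronecker_delta(fp1[name], fp2[name]) for name in fp1) / len(fp1)


before = {"userAgent": "Firefox/72.0", "timezone": "-60", "screen": "1920x1080", "plugins": ""}
after = {"userAgent": "Firefox/73.0", "timezone": "-60", "screen": "1920x1080", "plugins": ""}
print(similarity(before, after))  # 0.75: only the user agent changed
```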

We define the function  providing the mean similarity of the consecutive fingerprints, for a given time range  and a dataset , as:

(7)
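Eq. (7) then averages this similarity over the pairs selected for one time range. The sketch below restates the assumed normalized similarity so that it runs on its own; `mean_similarity` is a hypothetical name, and its input is assumed to be the pairs produced by the selection of Eq. (5).

```python
from statistics import fmean


def similarity(fp1: dict, fp2: dict) -> float:
    """Proportion of attributes with identical values (sum of Kronecker deltas / count)."""
    return sum(fp1[name] == fp2[name] for name in fp1) / len(fp1)


def mean_similarity(pairs: list[tuple[dict, dict]]) -> float:
    """Mean similarity over the consecutive-fingerprint pairs of one time range."""
    if not pairs:
        raise ValueError("no pair of consecutive fingerprints in this time range")
    return fmean(similarity(fp1, fp2) for fp1, fp2 in pairs)


pairs = [
    ({"a": 1, "b": 2}, {"a": 1, "b": 3}),                                  # similarity 0.5
    ({"a": 1, "b": 2, "c": 3, "d": 4}, {"a": 1, "b": 2, "c": 3, "d": 5}),  # similarity 0.75
]
print(mean_similarity(pairs))  # 0.625
```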

2.3 Performance

To evaluate the performance of browser fingerprints used in an authentication context, we consider three aspects. The first two are the consumption of time and memory resources. The third is the loss of efficacy (i.e., of distinctiveness and stability) across device types.

The collection time of fingerprints depends only on JavaScript attributes, as HTTP headers are transmitted anyway. We therefore measure the collection time of our fingerprints composed of  JavaScript attributes.

The size of fingerprints depends on their storage format. For example, a canvas image Mowery and Shacham (2012) can be stored as a base64 string or as a hash. We stress that compressing the complete fingerprint into a single hash is impractical due to the evolution of fingerprints: a change in any single attribute changes the whole hash, which prevents measuring the similarity between two observations. As there is no standard size for attributes, we measure the size of the fingerprints of our dataset.
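The following sketch illustrates both storage options discussed in this paragraph, under illustrative assumptions (a toy canvas payload, JSON as serialization, SHA-256 as hash): hashing a canvas data URL shrinks it to a 64-character digest that stays comparable attribute by attribute, whereas a single hash over the whole fingerprint changes entirely when one attribute changes. The helper names are hypothetical.

```python
import base64
import hashlib
import json


def hash_attribute(value: str) -> str:
    """SHA-256 hex digest of a (potentially large) attribute value, e.g. a canvas data URL."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def whole_fingerprint_hash(fp: dict) -> str:
    """Single SHA-256 digest over the whole fingerprint (canonical JSON)."""
    return hashlib.sha256(json.dumps(fp, sort_keys=True).encode("utf-8")).hexdigest()


# Toy canvas payload encoded as a base64 data URL (illustrative, not a real rendering).
canvas = "data:image/png;base64," + base64.b64encode(bytes(range(256)) * 40).decode("ascii")
fp_v1 = {"userAgent": "Firefox/72.0", "canvas": hash_attribute(canvas)}
fp_v2 = {"userAgent": "Firefox/73.0", "canvas": hash_attribute(canvas)}  # browser update

print(len(canvas), len(fp_v1["canvas"]))  # 13678 64
# Stored per attribute, the unchanged canvas remains comparable across observations...
print(fp_v1["canvas"] == fp_v2["canvas"])  # True
# ...whereas a single digest hides that only the user agent changed.
print(whole_fingerprint_hash(fp_v1) == whole_fingerprint_hash(fp_v2))  # False
```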

Previous works showed that mobile and desktop devices differ in the properties of their browser fingerprints Spooren et al. (2015); Laperdrix et al. (2016); Gómez-Boix et al. (2018): mobile browsers usually have less distinctive fingerprints. Following these findings, we assess whether the distinctiveness and the stability of the fingerprints of these two groups are similar.

3 Fingerprint Dataset

To study the properties of browser fingerprints on a real-world browser population, we launched a fingerprint collection experiment. It was performed in collaboration with the authors of Gómez-Boix et al. (2018), and with an industrial partner that controls one of the top  French websites according to Alexa (https://www.alexa.com/topsites/countries/FR). The authors of Gómez-Boix et al. (2018) kept the attributes of their previous work Laperdrix et al. (2016) and focused on web tracking, whereas we used our own attributes and focused on web authentication.

3.1 Fingerprint Collection

We compiled  JavaScript properties and  HTTP header fields, and designed a fingerprinting probe that collects these attributes. We integrated the probe into two general-audience web pages of our industrial partner, whose subjects are political news and weather. The probe collected fingerprints from December , , to June , . Only the visitors who consented to cookies were fingerprinted, in compliance with the European directives 2002/58/CE and 2009/136/CE in effect at the time. To differentiate browsers, we assigned each of them a unique identifier (UID) stored in a -months cookie. Similarly to Eckersley (2010); Laperdrix et al. (2016), we coped with cookie deletion by storing a one-way hash of the IP address, computed by a secure cryptographic hash function.
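The text only states that a secure cryptographic hash function was used. A minimal sketch, assuming a keyed hash (HMAC-SHA256 from Python's standard library) with a server-side secret, which is our choice rather than the authors' documented method: an unkeyed hash of an IPv4 address could be reversed by hashing all 2^32 possible addresses. `IP_HASH_KEY` and `hash_ip` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Assumption: a secret key kept apart from the dataset. Without a key, an unkeyed hash
# of an IPv4 address could be reversed by hashing all 2**32 possible addresses.
IP_HASH_KEY = secrets.token_bytes(32)


def hash_ip(ip_address: str, key: bytes = IP_HASH_KEY) -> str:
    """One-way, keyed hash (HMAC-SHA256) of an IP address, stored instead of the address."""
    return hmac.new(key, ip_address.encode("ascii"), hashlib.sha256).hexdigest()


print(hash_ip("203.0.113.7") == hash_ip("203.0.113.7"))  # True: same IP, same hash
print(hash_ip("203.0.113.7") == hash_ip("203.0.113.8"))  # False
```

The same key must be reused for the whole collection, otherwise the hashes of a given IP address would no longer match across visits.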

Previous datasets were collected through dedicated websites, and are biased towards privacy-aware and technically skilled persons Eckersley (2010); Laperdrix et al. (2016). Our population is closer to a general audience, but the website audience consists mainly of French-speaking users, which biases our dataset towards this population. The timezone is set to for % of browsers, % of them have daylight saving time enabled, and fr is present in % of the Accept-Language HTTP header values.

| | PTC Eckersley (2010) | AIU Laperdrix et al. (2016) | HITC Gómez-Boix et al. (2018) | This study |
|---|---|---|---|---|
| Collection period | 3 weeks | 3-4 months* | 6 months | 6 months |
| Number of attributes | 8 | 17 | 17 | 216 |
| Number of browsers | - | - | - | 1,989,365 |
| Number of fingerprints | 470,161 | 118,934 | 2,067,942 | 4,145,408 |
| Number of distinct fingerprints | 409,296 | 142,023† | - | 3,578,196 |
| Proportion of desktop fingerprints | - | 0.890* | 0.879 | 0.805 |
| Proportion of mobile fingerprints | - | 0.110* | 0.121 | 0.134 |
| Unicity of global fingerprints | 0.836 | 0.894 | 0.336 | 0.818 |
| Unicity of mobile fingerprints | - | 0.810 | 0.185 | 0.399 |
| Unicity of desktop fingerprints | - | 0.900 | 0.357 | 0.884 |

Table 1: Dataset comparison between Panopticlick (PTC), AmIUnique (AIU), Hiding in the Crowd (HITC), and this study. - denotes missing information. * denotes deduced information. Attribute counts only include original attributes, and fingerprints are counted after data preprocessing.
† This number is provided in Figure  as the number of distinct fingerprints, but it also corresponds to the number of raw fingerprints. Every fingerprint would be unique if the numbers of distinct and collected fingerprints were equal, hence we are not confident in this number, but it is the one provided by the authors.

3.2 Dataset Filtering and Preprocessing

Given the experimental nature of fingerprinting and the scale of our collection, the raw dataset contained erroneous or irrelevant samples. We remove  entries that have a wrong format (e.g., empty or truncated data), that are duplicated, or that come from a robot.

Cookies are an unreliable identification method, hence we perform a resynchronization similar to Eckersley (2010). We consider that the entries sharing the same (fingerprint, IP address hash) pair come from the same browser, and assign them the same UID. Similarly to Eckersley (2010), we do not synchronize interleaved UIDs, that is, pairs whose entries alternate between UIDs (one UID, then another, then the first one again). This method replaces  UIDs.
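The resynchronization can be sketched as follows, assuming entries are (fingerprint, IP hash, UID, timestamp) tuples. The hypothetical `resynchronize` helper groups entries by (fingerprint, IP hash) pair, leaves a group untouched if its chronological UID sequence is interleaved, and otherwise maps every UID of the group to the earliest one; choosing the earliest UID as the replacement is our assumption.

```python
from collections import defaultdict
from itertools import groupby


def is_interleaved(uids: list[str]) -> bool:
    """True if a UID reappears after another one, as in a, b, a."""
    runs = [uid for uid, _run in groupby(uids)]  # collapse consecutive repeats
    return len(runs) != len(set(runs))


def resynchronize(entries) -> dict[str, str]:
    """Map UIDs sharing a (fingerprint, ip_hash) pair to a single replacement UID.

    `entries` are (fingerprint, ip_hash, uid, timestamp_ms) tuples (hypothetical layout).
    Returns {replaced_uid: replacement_uid}; interleaved groups are left untouched.
    """
    groups = defaultdict(list)  # (fingerprint, ip_hash) -> [(timestamp, uid), ...]
    for fingerprint, ip_hash, uid, timestamp in entries:
        groups[(fingerprint, ip_hash)].append((timestamp, uid))
    replacements = {}
    for group in groups.values():
        group.sort(key=lambda item: item[0])  # chronological order
        uids = [uid for _timestamp, uid in group]
        if is_interleaved(uids):
            continue  # e.g. a, b, a: not synchronized, as in Eckersley (2010)
        first = uids[0]
        for uid in set(uids) - {first}:
            replacements.setdefault(uid, first)  # keep the first replacement found
    return replacements


entries = [
    ("A", "h1", "u1", 0), ("A", "h1", "u2", 10),
    ("B", "h2", "u3", 0), ("B", "h2", "u4", 5), ("B", "h2", "u3", 9),
]
print(resynchronize(entries))  # {'u2': 'u1'}
```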

To avoid counting multiple entries of identical fingerprints coming from the same browser, the usual way is to ignore them during collection Eckersley (2010); Laperdrix et al. (2016). Our probe collects a fingerprint on each visit, so, to stay consistent with common methodologies, we deduplicate the fingerprints afterwards. For each browser, we keep the first entry having a given fingerprint, and ignore the following consecutive entries that have this same fingerprint. For example, if a browser has the entries , we only keep the entries . Deduplication is the biggest cut in our dataset, filtering out more entries than any other preprocessing step.
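A sketch of the deduplication, assuming (fingerprint, UID, timestamp) triples: within each browser's chronological history, only the first entry of each run of identical consecutive fingerprints is kept, so interleaved identical fingerprints survive, as noted for the work dataset in Section 3.3. This run-based reading and the `deduplicate` name are our interpretation.

```python
from collections import defaultdict
from itertools import groupby


def deduplicate(entries):
    """For each browser, keep the first entry of every run of identical consecutive
    fingerprints. `entries` are (fingerprint, uid, timestamp_ms) tuples.
    """
    histories = defaultdict(list)  # uid -> entries of that browser
    for entry in entries:
        histories[entry[1]].append(entry)
    kept = []
    for history in histories.values():
        history.sort(key=lambda entry: entry[2])  # chronological order
        for _fingerprint, run in groupby(history, key=lambda entry: entry[0]):
            kept.append(next(run))  # first entry of the run
    return kept


entries = [("A", "b1", 1), ("A", "b1", 2), ("B", "b1", 3), ("A", "b1", 4)]
print(deduplicate(entries))  # [('A', 'b1', 1), ('B', 'b1', 3), ('A', 'b1', 4)]
```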

We extract  additional attributes from  original attributes. The extracted attributes are of two types. The first type consists of parts of original attributes, like the screen resolution that is split into its width and height values. The second type consists of information derived from an original attribute, like the number of plugins extracted from the list of plugins.
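The two types of extracted attributes can be illustrated with a hypothetical helper. The attribute names `screenResolution` and `plugins`, and the `WIDTHxHEIGHT` format, are assumptions made for the example, not the probe's actual output.

```python
def extract_attributes(fp: dict) -> dict:
    """Derive extracted attributes from original ones (illustrative names and formats)."""
    extracted = {}
    # Type 1: parts of an original attribute (the screen resolution split in two).
    width, height = fp["screenResolution"].split("x")
    extracted["screenWidth"] = int(width)
    extracted["screenHeight"] = int(height)
    # Type 2: information derived from an original attribute (the number of plugins).
    extracted["pluginCount"] = len(fp["plugins"])
    return extracted


print(extract_attributes({"screenResolution": "1920x1080", "plugins": ["PDF Viewer"]}))
# {'screenWidth': 1920, 'screenHeight': 1080, 'pluginCount': 1}
```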

3.3 Work Dataset

The work dataset obtained after preprocessing contains  entries (comprising identical fingerprints of a given browser when they are interleaved), with 4,145,408 fingerprints (identical fingerprints of a browser being counted once), of which 3,578,196 are distinct. The fingerprints are composed of 216 original attributes and  extracted ones, for a total of  attributes. They come from 1,989,365 browsers, % of which have multiple fingerprints. Table 1 compares the datasets of Panopticlick Eckersley (2010), AmIUnique Laperdrix et al. (2016), Hiding in the Crowd Gómez-Boix et al. (2018), and this study.

4 Empirical Evaluation of Browser Fingerprints Properties

4.1 Distinctiveness

Figure 2 presents the size of the anonymity sets (AS) alongside the frequency of browser arrivals for the daily-partitioned datasets. We call unicity rate the proportion of fingerprints that belong to an AS of size one. Our fingerprints have a stable unicity rate of approximately % in the long run, and at least % of fingerprints are shared by  browsers or fewer. However, the fingerprints of the mobile group are more uniform than those of the desktop group, with a unicity rate of approximately % in the long run.
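The unicity rate can be computed on a daily-partitioned dataset. This sketch assumes one (fingerprint, UID, timestamp) entry per browser, so the number of occurrences of a fingerprint equals its anonymity set size, and divides the number of unique fingerprints by the number of browsers in the partition; `unicity_rate` is a hypothetical name.

```python
from collections import Counter


def unicity_rate(snapshot) -> float:
    """Proportion of the snapshot's fingerprints whose anonymity set has size one.

    `snapshot` holds one (fingerprint, uid, timestamp_ms) entry per browser.
    """
    as_sizes = Counter(fingerprint for fingerprint, _uid, _ts in snapshot)
    unique = sum(1 for fingerprint, _uid, _ts in snapshot if as_sizes[fingerprint] == 1)
    return unique / len(snapshot)


snapshot = [("A", "b1", 1), ("A", "b2", 2), ("B", "b3", 3), ("C", "b4", 4)]
print(unicity_rate(snapshot))  # 0.5
```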

Figure 2: Anonymity set sizes and frequency of browser arrivals through partitioned datasets obtained after each day.

New browsers are encountered continually, but starting from the th day, the arrival frequency stabilizes around  new browsers per day. Before this stabilization, the arrival frequency varies, with some major spikes. These spikes seem to correspond to events in France that led to more visits. For example, the spike on the th day corresponds to a live political debate on TV, and the spike on the rd day correlates with the announcement of a cold snap.

Figure 3: Proportion of unique fingerprints for overall, mobile, and desktop groups, through partitioned datasets obtained after each day.

Figure 3 presents the proportion of unique fingerprints through partitioned datasets for the overall, mobile, and desktop groups. The proportion of unique fingerprints is stable for desktop browsers, with a slight increase of  points from the th day to the th, from % to %. The unicity rate of the mobile group is lower than that of the desktop group, and decreases slightly by  points over the same period, from % to %.

4.2 Stability

Two fingerprints of a given browser can be linked, as only a small portion of the attributes is expected to change, even after several months. Figure 4 displays the mean similarity between consecutive fingerprints as a function of the time difference. The ranges  are expressed in days, so that day  on the x-axis represents the fingerprints separated by  days. We ignore the time ranges having fewer than  compared pairs, or a time difference higher than the limit of our experiment ( days), which account for less than % of each category. Our stability results are a lower bound: after deduplication, consecutive fingerprints are necessarily different (i.e., their similarity is strictly lower than ), so observations where the fingerprint of a browser did not change produce no compared pair.
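The bucketing behind Figure 4 can be sketched as follows, assuming pairs are given with their time difference in milliseconds and fingerprints are dicts over the same attributes. `min_pairs` stands for the minimum number of pairs per time range, whose value is elided from the text; the helper name and the use of whole days as ranges are assumptions.

```python
from collections import defaultdict

MS_PER_DAY = 86_400_000


def similarity_by_day(pairs, min_pairs: int) -> dict[int, float]:
    """Mean similarity of consecutive fingerprints, bucketed by whole days of difference.

    `pairs` are (fp1, fp2, delta_ms) tuples; days with fewer than `min_pairs`
    pairs are dropped.
    """
    buckets = defaultdict(list)  # day -> similarities of the pairs of that day
    for fp1, fp2, delta_ms in pairs:
        same = sum(fp1[name] == fp2[name] for name in fp1) / len(fp1)
        buckets[delta_ms // MS_PER_DAY].append(same)
    return {
        day: sum(values) / len(values)
        for day, values in sorted(buckets.items())
        if len(values) >= min_pairs
    }


pairs = [
    ({"a": 1, "b": 2}, {"a": 1, "b": 3}, 3_600_000),       # day 0, similarity 0.5
    ({"a": 1, "b": 2}, {"a": 9, "b": 3}, 7_200_000),       # day 0, similarity 0.0
    ({"a": 1, "b": 2}, {"a": 1, "b": 3}, 2 * MS_PER_DAY),  # day 2, similarity 0.5
]
print(similarity_by_day(pairs, min_pairs=2))  # {0: 0.25}
print(similarity_by_day(pairs, min_pairs=1))  # {0: 0.25, 2: 0.5}
```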

Figure 4: Mean similarity between consecutive fingerprints as a function of the time difference, with the number of compared pairs.

We have a total of  compared pairs for the overall group,  pairs for the desktop group, and  pairs for the mobile group. A fingerprint is expected to keep at least % of its attributes unchanged after  days. A few attributes among those included in our script are highly unstable. Removing these attributes could reduce the distinctiveness of fingerprints, but would improve their stability.

The fingerprints of the mobile group are generally more stable than those of the desktop group, as their respective similarity curves suggest. However, this seems to hold only for high time differences: in the range of  days difference, the fingerprints of the mobile group are slightly less stable than those of the desktop group, with a mean similarity of % against %.

4.3 Performance

Time Resource Consumption

Our script takes several seconds to collect the attributes composing the fingerprints. Figure 5 displays the cumulative distribution of the collection time of fingerprints in seconds, with outliers removed. We measure it as the time difference between the start of the script and the sending of the fingerprint. Some values range from several hours to days, which can result from a web page left in the background or accessed again after a long time. We limit our population to the fingerprints that take less than  seconds to collect, and consider higher values as outliers. Outliers account for less than % of each group.

Figure 5: Cumulative distribution of the collection time of fingerprints in seconds.

We present the collection time of fingerprints in seconds as (th percentile, median, th percentile). Our script collects most fingerprints within a few seconds, with values (, , ). A difference occurs between desktop browsers (, , ) and mobile browsers (, , ). The median collection time is lower than the estimated median time taken by web pages to load completely (https://httparchive.org/reports/loading-speed#ol), which was  seconds for desktop browsers and  seconds for mobile browsers as of February 1, 2020.
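The outlier filtering and percentile summary described above can be sketched as below. The outlier threshold and the two outer percentiles are parameters because their values are elided from the text, and the nearest-rank percentile definition is our choice, since the text does not specify an interpolation method. Both helper names are hypothetical.

```python
import math


def percentile(sorted_values: list[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of a list sorted in ascending order."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def collection_time_summary(times_s, max_s: float, low_p: float, high_p: float):
    """(low percentile, median, high percentile) of the collection times in seconds,
    after discarding the outliers that take `max_s` seconds or more.
    """
    kept = sorted(t for t in times_s if t < max_s)
    if not kept:
        raise ValueError("every collection time was discarded as an outlier")
    return percentile(kept, low_p), percentile(kept, 50), percentile(kept, high_p)


# Illustrative values: the last one mimics a page left in the background for a day.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 86_400.0]
print(collection_time_summary(times, max_s=60, low_p=20, high_p=80))  # (1.0, 3.0, 4.0)
```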

Memory Resource Consumption

Each fingerprint weighs about a dozen kilobytes, a size easily handled by current storage and bandwidth capacities. Figure 6 displays the cumulative distribution of fingerprint size in bytes, with outliers removed and canvases stored as SHA-256 hashes. The mean fingerprint size is  bytes, and the standard deviation is  . We remove  fingerprint from a desktop browser, considered an outlier because its size is greater than  .

Figure 6: Cumulative distribution of the size of fingerprints in bytes.

Half of our fingerprints take less than  bytes, and % less than  kilobytes, which is negligible given current storage and bandwidth capacities. We observe a difference between the fingerprints of mobile and desktop browsers, with % of fingerprints weighing less than  bytes and  bytes, respectively. This is because attributes that are heavy on desktop browsers, like the lists of plugins or MIME types, are lighter on mobile browsers, where they are most of the time empty.
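A sketch of how a fingerprint's size in bytes could be measured, assuming (our choice, since the storage format is not stated) compact UTF-8 JSON serialization with canvases replaced by their SHA-256 hex digest. The `fingerprint_size` helper and the attribute names are hypothetical.

```python
import hashlib
import json


def fingerprint_size(fp: dict, hashed=("canvas",)) -> int:
    """Size in bytes of `fp` serialized as compact UTF-8 JSON, with the attributes
    listed in `hashed` (string values, e.g. canvas data URLs) replaced by their
    SHA-256 hex digest.
    """
    stored = {
        name: hashlib.sha256(value.encode("utf-8")).hexdigest() if name in hashed else value
        for name, value in fp.items()
    }
    serialized = json.dumps(stored, separators=(",", ":"), ensure_ascii=False)
    return len(serialized.encode("utf-8"))


small = {"canvas": "data:image/png;base64,AAAA", "plugins": ""}
large = {"canvas": "data:image/png;base64," + "A" * 100_000, "plugins": ""}
print(fingerprint_size(small), fingerprint_size(large))  # 90 90
```

Once hashed, the canvas contributes a fixed 64 characters whatever its original size, so the remaining size is driven by attributes such as the plugin and MIME type lists.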

5 Synthesis of Results and Conclusion

In this study, we evaluate the properties offered by browser fingerprints as an additional web authentication factor, through the analysis of a large-scale real-life fingerprint dataset. We show that browser fingerprints offer a satisfying distinctiveness, as 81.8% of our fingerprints are shared by only one browser. Moreover, fingerprints are stable: at least % of the attributes are expected to stay identical between two observations of the fingerprint of a browser, even when they are separated by nearly 6 months. We validate that fingerprints offer high performance, as they weigh only a dozen kilobytes and take a few seconds to collect. We conclude that browser fingerprints provide satisfying properties for an additional web authentication factor, and can strengthen password-based systems without a major loss of usability.

Acknowledgements.
We want to thank Benoît Baudry and David Gross-Amblard for their valuable comments, together with Alexandre Garel for his work on the experiment.

Bibliography

  • J. Bonneau, C. Herley, P. C. van Oorschot, and F. Stajano (2015) Passwords and the Evolution of Imperfect Authentication. Communications of the ACM.
  • J. Bonneau (2012) The science of guessing: analyzing an anonymized corpus of 70 million passwords. In Symposium on Security and Privacy.
  • P. Eckersley (2010) How unique is your web browser? In Privacy Enhancing Technologies.
  • S. Englehardt and A. Narayanan (2016) Online Tracking: A 1-million-site Measurement and Analysis. In Conference on Computer and Communications Security.
  • A. Gómez-Boix, P. Laperdrix, and B. Baudry (2018) Hiding in the Crowd: an Analysis of the Effectiveness of Browser Fingerprinting at Large Scale. In The Web Conference.
  • P. Laperdrix, G. Avoine, B. Baudry, and N. Nikiforakis (2019) Morellian analysis for browsers: making web authentication stronger with canvas fingerprinting. In Conference on Detection of Intrusions and Malware & Vulnerability Assessment.
  • P. Laperdrix, W. Rudametkin, and B. Baudry (2016) Beauty and the Beast: Diverting modern web browsers to build unique browser fingerprints. In Symposium on Security and Privacy.
  • D. Maltoni, D. Maio, A. K. Jain, and S. Prabhakar (2003) Handbook of fingerprint recognition. Springer-Verlag.
  • K. Mowery and H. Shacham (2012) Pixel perfect: fingerprinting canvas in HTML5. In W2SP.
  • D. Preuveneers and W. Joosen (2015) SmartAuth: Dynamic Context Fingerprinting for Continuous User Authentication. In Annual Symposium on Applied Computing.
  • F. Rochet, K. Efthymiadis, F. Koeune, and O. Pereira (2019) SWAT: seamless web authentication technology. In The World Wide Web Conference.
  • J. Spooren, D. Preuveneers, and W. Joosen (2015) Mobile Device Fingerprinting Considered Harmful for Risk-based Authentication. In European Workshop on System Security.
  • K. Thomas, F. Li, A. Zand, J. Barrett, J. Ranieri, L. Invernizzi, Y. Markov, O. Comanescu, V. Eranti, A. Moscicki, et al. (2017) Data breaches, phishing, or malware? Understanding the risks of stolen credentials. In Conference on Computer and Communications Security.
  • T. Unger, M. Mulazzani, D. Frühwirt, M. Huber, S. Schrittwieser, and E. Weippl (2013) SHPF: Enhancing HTTP(S) Session Security with Browser Fingerprinting. In International Conference on Availability, Reliability and Security, pp. 255–261.
  • C. Wang, S. T. Jan, H. Hu, D. Bossart, and G. Wang (2018) The next domino to fall: empirical analysis of user passwords across online services. In Conference on Data and Application Security and Privacy.