Web authentication widely relies on identifier-password pairs. Passwords are easy to use, but suffer from severe security flaws. Indeed, users choose common passwords, paving the way for brute-force or guessing attacks Bonneau (2012). They also reuse similar passwords across websites Wang et al. (2018), which increases the impact of attacks. Phishing attacks are another major threat to passwords. Over the course of a year, Thomas et al. (2017) retrieved million credentials stolen by phishing kits. These flaws gave rise to multi-factor authentication Bonneau et al. (2015), where each additional authentication factor provides an additional security barrier. However, this usually comes at the cost of usability (i.e., users have to remember, possess, or do something).
In the meantime, browser fingerprinting has been gaining attention. The Panopticlick study Eckersley (2010) highlights the possibility of building a browser fingerprint by collecting attributes from a web browser. In addition to being widely used for web tracking purposes Englehardt and Narayanan (2016) (raising legal and ethical issues), browser fingerprints are used as an authentication factor in real life. Browser fingerprints are indeed good candidates for an authentication factor thanks to their distinctive power, their frictionless deployment (e.g., no additional software), and their usability (no secret to remember, no additional object to possess, and no supplementary action to carry out). As a result, companies like MicroFocus (https://www.microfocus.com/media/white-paper/device-fingerprinting-for-low-friction-authentication-wp.pdf) or SecureAuth (https://docs.secureauth.com/pages/viewpage.action?pageId=33063454) include browser fingerprints within their authentication mechanisms (see Figure 1 for an example of such a mechanism).
Related works. To the best of our knowledge, no large-scale study rigorously evaluates the adequacy of browser fingerprints as an authentication factor. Most works about their use for authentication concentrate on the design of authentication mechanisms Unger et al. (2013); Preuveneers and Joosen (2015); Laperdrix et al. (2019); Rochet et al. (2019), and empirical studies on browser fingerprints focus on their efficacy as a web tracking tool Eckersley (2010); Laperdrix et al. (2016); Gómez-Boix et al. (2018). Such a mismatch between the understanding of browser fingerprints for authentication, which is currently poor, and their ongoing adoption in real life is a serious threat to the security of web users. The lack of documentation of the existing tools (e.g., about the attributes used, and the distinctiveness and stability of the resulting fingerprints) only adds to the current state of ignorance, even though security by obscurity contradicts the most fundamental security principles.
Our contributions. We conduct the first large-scale data-centric empirical study of the fundamental properties of browser fingerprints when used as an additional authentication factor. We base our findings on an in-depth analysis of a real-life fingerprint dataset collected over 6 months, which contains 4,145,408 fingerprints composed of 216 attributes. We formalize, and assess on our dataset, the properties needed to pave the way for elaborate browser fingerprinting authentication mechanisms. The selected properties are those usually used to evaluate biometric characteristics for authentication Maltoni et al. (2003). We stress that we make no assumption about the inner workings of the authentication mechanism, and consequently about the adversarial strategy. Our properties aim at characterizing the adequacy and practicability of browser fingerprints, independently of their use within future authentication mechanisms. In particular, we measure the size of browser anonymity sets through time, and show that 81.8% of our fingerprints are unique. Moreover, we measure the proportion of identical attributes between two observations of the fingerprint of a browser, and show that % of attributes remain unchanged after nearly 6 months. Finally, we measure the collection time and the size of fingerprints. We show that, on average, they weigh about a dozen kilobytes and are collected in a few seconds.
2 Authentication Factor Properties
The “Handbook of Fingerprint Recognition” Maltoni et al. (2003) summarizes the properties that a biometric characteristic requires to be usable as an authentication factor (here, usable refers to the adequacy of the characteristic to be used for authentication, rather than to its ease of use by users), and the additional properties required for a biometric authentication scheme to be practical. We draw a parallel between the fingerprints used to recognize persons and the browser fingerprints used to recognize browsers. We therefore evaluate browser fingerprints according to these properties to assess their adequacy for web authentication. In this section, we list these properties, formalize how to measure some of them, and explain why the others are not addressed in this study.
The four properties needed for an anatomical or a behavioral characteristic to be usable as a biometric authentication factor are described below.
Universality: the characteristic should be present in everyone.
Distinctiveness: two distinct persons should have different characteristics.
Permanence: the same person should have the same characteristic over time. We use the term stability instead.
Collectibility: the characteristic should be collectible and measurable.
The three properties that a biometric authentication scheme requires to be practical are the following.
Performance: the scheme should consume few resources, and be robust against environmental changes.
Acceptability: users should be willing to use the scheme in their daily lives.
Circumvention: it should be difficult for an attacker to deceive the scheme.
To satisfy the distinctiveness property, browser fingerprints should make any two different browsers distinguishable. The distinctiveness depends on the attributes used and on the fingerprinted browser population. The two extreme cases are every browser sharing the same fingerprint, which makes them indistinguishable from each other, and no two browsers sharing the same fingerprint, which makes every browser distinguishable.
Our dataset entries are composed of a fingerprint, the source browser, and the time of collection in the form of a Unix timestamp in milliseconds. We denote $F$ the fingerprint domain, $U$ the domain of the unique identifiers (UIDs), and $T$ the timestamp domain. The fingerprint dataset is denoted $D$ and is formalized as:

$$D = \{(f, u, t) \mid f \in F,\ u \in U,\ t \in T\}$$

We use the size of browser anonymity sets to quantify the distinctiveness, as browsers belonging to the same anonymity set are indistinguishable. We denote $B(f, D)$ the function returning the set of browsers that provide the fingerprint $f$ in the dataset $D$. It is formalized as:

$$B(f, D) = \{u \in U \mid \exists t \in T,\ (f, u, t) \in D\}$$

We denote $A(k, D)$ the function providing the set of fingerprints having an anonymity set size of $k$ (i.e., being shared by $k$ browsers) in the dataset $D$. It is formalized as:

$$A(k, D) = \{f \in F \mid \mathrm{card}(B(f, D)) = k\}$$

We measure the anonymity set sizes on the fingerprints currently in use by each browser, and not on their whole history. We do so by simulating datasets composed of the last fingerprint seen for each browser at a given time. Let $S_d(D)$ be the simulated dataset originating from $D$ that represents the state of the fingerprints after $d$ days. With $\tau_d$ the last timestamp of this day, we have:

$$S_d(D) = \{(f, u, t) \in D \mid t \le \tau_d \wedge \nexists (f', u, t') \in D,\ t < t' \le \tau_d\}$$
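The anonymity set definitions above (the browsers providing a fingerprint, the fingerprints shared by exactly k browsers, and the daily snapshot of the last fingerprint of each browser) can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code. The entry layout `(fingerprint, uid, timestamp_ms)`, the function names, and the requirement that a fingerprint be hashable (e.g., a tuple of attribute values) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

# Hypothetical entry layout: (fingerprint, uid, timestamp_ms), where the
# fingerprint is hashable (e.g., a tuple of attribute values).
Entry = Tuple[Hashable, str, int]


def browsers_of(f: Hashable, dataset: List[Entry]) -> Set[str]:
    """Set of browsers (UIDs) that provided fingerprint f in the dataset."""
    return {u for (g, u, _t) in dataset if g == f}


def anonymity_set_sizes(dataset: List[Entry]) -> Dict[Hashable, int]:
    """Map each fingerprint to the number of distinct browsers sharing it."""
    owners: Dict[Hashable, Set[str]] = defaultdict(set)
    for f, u, _t in dataset:
        owners[f].add(u)
    return {f: len(uids) for f, uids in owners.items()}


def fingerprints_with_as_size(k: int, dataset: List[Entry]) -> Set[Hashable]:
    """Fingerprints shared by exactly k browsers."""
    return {f for f, size in anonymity_set_sizes(dataset).items() if size == k}


def snapshot(dataset: List[Entry], tau: int) -> List[Entry]:
    """Last entry of each browser whose timestamp is at most tau."""
    last: Dict[str, Entry] = {}
    for entry in dataset:
        _f, u, t = entry
        if t <= tau and (u not in last or t > last[u][2]):
            last[u] = entry
    return list(last.values())
```

For example, with `D = [("fpA", "b1", 10), ("fpB", "b1", 50), ("fpA", "b2", 20), ("fpC", "b3", 30)]`, `snapshot(D, 40)` keeps `b1` at `fpA`, so `fpA` has an anonymity set of size 2 and `fpC` is unique. At `tau = 60`, `b1` has moved to `fpB` and all three fingerprints become unique.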
Browser fingerprints have the particularity of evolving through time, due to changes in the web environment such as a software update or a change in the user configuration. We measure the stability by the mean similarity between two consecutive fingerprints of a browser, as a function of the elapsed time between them. The two extreme cases are every browser keeping the same fingerprint throughout its life, and the fingerprint of a browser changing completely at each observation.
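We denote $C(\Delta, D)$ the function providing the set of pairs of consecutive fingerprints in $D$ that are separated by a time difference comprised in the time range $\Delta$. It is formalized as:

$$C(\Delta, D) = \{(f, f') \mid (f, u, t) \in D,\ (f', u, t') \in D,\ t < t',\ t' - t \in \Delta,\ \nexists (f'', u, t'') \in D,\ t < t'' < t'\}$$

We consider the Kronecker delta $\delta(x, y)$, being $1$ if $x$ equals $y$, and $0$ otherwise. We denote $f[a]$ the value taken by the attribute $a$ for the fingerprint $f$, and $\Omega$ the set of attributes. Let $\mathrm{sim}(f, f')$ be a simple similarity function between fingerprints, formalized as:

$$\mathrm{sim}(f, f') = \frac{1}{\mathrm{card}(\Omega)} \sum_{a \in \Omega} \delta(f[a], f'[a])$$

We define the function $\mathrm{avsim}(\Delta, D)$ providing the mean similarity of the consecutive fingerprints, for a given time range $\Delta$ and a dataset $D$, as:

$$\mathrm{avsim}(\Delta, D) = \frac{1}{\mathrm{card}(C(\Delta, D))} \sum_{(f, f') \in C(\Delta, D)} \mathrm{sim}(f, f')$$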
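The stability measure (consecutive pairs within a time range, attribute-wise similarity as a mean Kronecker delta, and the mean similarity over the pairs) can be sketched in Python. This is a minimal sketch under our own assumptions, not the authors' implementation. Fingerprints are dictionaries over the same set of attributes, entries are `(fingerprint, uid, timestamp_ms)` tuples, and a time range is a half-open interval `[lo, hi)` in milliseconds. The names `consecutive_pairs`, `sim`, and `avsim` are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: a fingerprint maps attribute names to values, and an
# entry is (fingerprint, uid, timestamp_ms).
Fingerprint = Dict[str, object]
Entry = Tuple[Fingerprint, str, int]


def consecutive_pairs(dataset: List[Entry], lo: int, hi: int) -> List[Tuple[Fingerprint, Fingerprint]]:
    """Consecutive fingerprints of the same browser whose time difference
    lies in [lo, hi)."""
    by_browser: Dict[str, List[Entry]] = {}
    for entry in dataset:
        by_browser.setdefault(entry[1], []).append(entry)
    pairs = []
    for entries in by_browser.values():
        entries.sort(key=lambda e: e[2])  # chronological order
        # Only directly successive observations of a browser form a pair.
        for (f1, _, t1), (f2, _, t2) in zip(entries, entries[1:]):
            if lo <= t2 - t1 < hi:
                pairs.append((f1, f2))
    return pairs


def sim(f1: Fingerprint, f2: Fingerprint) -> float:
    """Mean Kronecker delta over the attributes: share of identical values."""
    return sum(1 for a in f1 if f1[a] == f2[a]) / len(f1)


def avsim(dataset: List[Entry], lo: int, hi: int) -> float:
    """Mean similarity of the consecutive fingerprints in [lo, hi)."""
    pairs = consecutive_pairs(dataset, lo, hi)
    if not pairs:
        raise ValueError("no consecutive fingerprints in this time range")
    return sum(sim(f1, f2) for f1, f2 in pairs) / len(pairs)
```

Pairs are kept in a list rather than a set. Two identical pairs coming from different browsers are therefore each counted in the mean.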
To evaluate the performance of browser fingerprints used in an authentication context, we consider three aspects. The first two are the consumption of time and memory resources. The third is the loss of efficacy (i.e., distinctiveness and stability) across device types.
The size of fingerprints depends on their storage format. For example, a canvas Mowery and Shacham (2012) image can be stored as a base64 string or as a hash. We stress that compressing the complete fingerprint into a single hash is impractical due to the evolution of fingerprints: a single changed attribute yields a completely different hash, which prevents any comparison between two observations. As the size of the attributes is not specified, we measure the size of the fingerprints of our dataset.
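The point about hashing the complete fingerprint can be illustrated with a small Python sketch. The attribute names and values are hypothetical, and JSON with sorted keys is only our choice of canonical serialization. After a browser update changes a single attribute, the hash of the whole fingerprint no longer matches. An attribute-level comparison, however, still shows that most values are unchanged.

```python
import hashlib
import json


def whole_fingerprint_hash(fp: dict) -> str:
    """SHA-256 of a canonical JSON serialization of the complete fingerprint."""
    canonical = json.dumps(fp, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def attribute_similarity(fp1: dict, fp2: dict) -> float:
    """Share of attributes whose values are identical in both observations."""
    return sum(fp1[a] == fp2[a] for a in fp1) / len(fp1)


before = {"userAgent": "Firefox/72.0", "timezone": -60, "language": "fr"}
after = dict(before, userAgent="Firefox/73.0")  # e.g., after a browser update

print(whole_fingerprint_hash(before) == whole_fingerprint_hash(after))  # False
print(attribute_similarity(before, after))  # 0.6666666666666666
```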
Previous works showed that mobile and desktop devices present differences in the properties of their browser fingerprints Spooren et al. (2015); Laperdrix et al. (2016); Gómez-Boix et al. (2018). Mobile browsers usually have less distinctive fingerprints. Following these findings, we check whether the distinctiveness and the stability of the fingerprints of these two groups are similar.
3 Fingerprint Dataset
To study the properties of browser fingerprints on a real-world browser population, we launched a fingerprint collection experiment. It was performed in collaboration with the authors of Gómez-Boix et al. (2018) and an industrial partner that controls one of the top French websites according to Alexa (https://www.alexa.com/topsites/countries/FR). The authors of Gómez-Boix et al. (2018) kept the attributes of their previous work Laperdrix et al. (2016) and focused on web tracking, whereas we collected attributes and focused on web authentication.
3.1 Fingerprint Collection
Previous datasets were collected through dedicated websites, and are biased towards privacy-aware and technically skilled persons Eckersley (2010); Laperdrix et al. (2016). Our population is oriented towards a more general audience, but the website audience consists mainly of French-speaking users, which biases our dataset towards this population. The timezone is set to for % of browsers, % of them have daylight saving time enabled, and fr is present in % of the Accept-Language HTTP header values.
| | PTC Eckersley (2010) | AIU Laperdrix et al. (2016) | HITC Gómez-Boix et al. (2018) | This study |
|---|---|---|---|---|
| Collection period | 3 weeks | 3-4 months* | 6 months | 6 months |
| Number of attributes | 8 | 17 | 17 | 216 |
| Number of browsers | - | - | - | 1,989,365 |
| Number of fingerprints | 470,161 | 118,934 | 2,067,942 | 4,145,408 |
| Number of distinct fingerprints | 409,296 | 142,023† | - | 3,578,196 |
| Proportion of desktop fingerprints | - | 0.890* | 0.879 | 0.805 |
| Proportion of mobile fingerprints | - | 0.110* | 0.121 | 0.134 |
| Unicity of global fingerprints | 0.836 | 0.894 | 0.336 | 0.818 |
| Unicity of mobile fingerprints | - | 0.810 | 0.185 | 0.399 |
| Unicity of desktop fingerprints | - | 0.900 | 0.357 | 0.884 |

† This number is given as the number of distinct fingerprints in a figure of Laperdrix et al. (2016), but it also corresponds to the number of raw fingerprints. Every fingerprint would be unique if the numbers of distinct and collected fingerprints were equal. Hence we are not confident in this number, but it is the one provided by the authors.
3.2 Dataset Filtering and Preprocessing
Given the experimental nature of fingerprints and the scale of our collection, the raw dataset contained erroneous or irrelevant samples. We remove the entries that have a wrong format (e.g., empty or truncated data), that are duplicated, or that come from a robot.
Cookies are an unreliable identification method, hence we perform a resynchronization similar to Eckersley (2010). We consider the entries that have the same (fingerprint, IP address hash) pair to come from the same browser, and assign them the same UID. Similarly to Eckersley (2010), we do not synchronize the interleaved UIDs, that is, the pairs associated with a UID value $u_1$, then $u_2$, then $u_1$ again. We replace UIDs with replacement UIDs using this method.
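The resynchronization can be sketched in Python as follows. The entry layout `(fingerprint, ip_hash, uid, timestamp_ms)` is hypothetical. Two choices are also ours and are not specified by the text. First, merged UIDs are combined with a union-find structure so that merges propagate across (fingerprint, IP address hash) groups. Second, the smallest UID of a merged set is taken as its replacement UID.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical entry layout: (fingerprint, ip_hash, uid, timestamp_ms).
Entry = Tuple[str, str, str, int]


def is_interleaved(uids: List[str]) -> bool:
    """True if a UID comes back after another UID was seen (e.g., a, b, a)."""
    seen = set()
    previous = None
    for uid in uids:
        if uid != previous and uid in seen:
            return True
        seen.add(uid)
        previous = uid
    return False


def resynchronize(entries: List[Entry]) -> Dict[str, str]:
    """Return a mapping {original UID: replacement UID}."""
    parent: Dict[str, str] = {}

    def find(u: str) -> str:
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    # Group the entries by (fingerprint, IP address hash) pair.
    groups: Dict[Tuple[str, str], List[Entry]] = defaultdict(list)
    for entry in entries:
        groups[(entry[0], entry[1])].append(entry)

    for group in groups.values():
        uids = [e[2] for e in sorted(group, key=lambda e: e[3])]
        if is_interleaved(uids):
            continue  # interleaved UIDs are not synchronized
        for uid in uids[1:]:
            root_a, root_b = find(uids[0]), find(uid)
            if root_a != root_b:
                parent[max(root_a, root_b)] = min(root_a, root_b)
    return {e[2]: find(e[2]) for e in entries}
```

In this sketch, a pair seen with UIDs `a` then `b` maps `b` to `a`. A pair seen with `c`, `d`, then `c` again is left untouched.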
To avoid counting multiple entries of identical fingerprints coming from the same browser, the usual way is to ignore them during collection Eckersley (2010); Laperdrix et al. (2016). Our probe collects a fingerprint on each visit, so to stay consistent with common methodologies, we deduplicate the fingerprints afterward. For each browser, we keep the first entry having a given fingerprint, and ignore the consecutive entries that follow it with the same fingerprint. For example, if a browser has the entries , we only keep the entries . The deduplication constitutes the biggest cut in our dataset, with entries filtered out.
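A Python sketch of the deduplication follows. It assumes, as the description of the work dataset suggests, that only runs of identical consecutive fingerprints are collapsed. A fingerprint that reappears after a different one is therefore kept. The entry layout `(fingerprint, uid, timestamp_ms)` is hypothetical.

```python
from itertools import groupby
from typing import Hashable, List, Tuple

# Hypothetical entry layout: (fingerprint, uid, timestamp_ms).
Entry = Tuple[Hashable, str, int]
_NONE = object()  # sentinel that differs from any fingerprint


def deduplicate(entries: List[Entry]) -> List[Entry]:
    """Keep the first entry of each run of identical consecutive fingerprints."""
    kept: List[Entry] = []
    ordered = sorted(entries, key=lambda e: (e[1], e[2]))  # by browser, then time
    for _uid, browser_entries in groupby(ordered, key=lambda e: e[1]):
        previous = _NONE
        for entry in browser_entries:
            if entry[0] != previous:
                kept.append(entry)
            previous = entry[0]
    return kept
```

For a browser observed with the fingerprints A, A, B, A, this sketch keeps A, B, A.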
We extract additional attributes from the original attributes, and these extracted attributes are of two types. The first type consists of parts of original attributes, like the screen resolution that is split into its width and height values. The second type consists of information derived from an original attribute, like the number of plugins extracted from the list of plugins.
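Both types of extracted attributes can be sketched in Python as follows. The attribute names are hypothetical, and so are the `"1920x1080"` format of the screen resolution and the representation of the plugins as a list. The actual formats of the collected attributes are not specified here.

```python
def extract_attributes(fingerprint: dict) -> dict:
    """Return a copy of the fingerprint enriched with extracted attributes."""
    enriched = dict(fingerprint)
    # Type 1: parts of an original attribute (resolution -> width and height).
    width, height = fingerprint["screenResolution"].split("x")
    enriched["screenWidth"] = int(width)
    enriched["screenHeight"] = int(height)
    # Type 2: information derived from an original attribute (plugin count).
    enriched["pluginCount"] = len(fingerprint["plugins"])
    return enriched
```

The original attributes are kept alongside the extracted ones, so both can be used in the later analyses.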
3.3 Work Dataset
The work dataset obtained after the preprocessing step contains entries (including identical fingerprints of a given browser if they are interleaved), with 4,145,408 fingerprints (no identical fingerprint counted for the same browser), among which 3,578,196 are distinct. The fingerprints are composed of original attributes and extracted ones, for a total of 216 attributes. They come from 1,989,365 browsers, % of which have multiple fingerprints. Table 1 presents a comparison between the datasets of Panopticlick Eckersley (2010), AmIUnique Laperdrix et al. (2016), Hiding in the Crowd Gómez-Boix et al. (2018), and this study.
4 Empirical Evaluation of Browser Fingerprints Properties
Figure 2 presents the size of the anonymity sets (AS) alongside the frequency of browser arrival for the daily-partitioned datasets. We call unicity rate the proportion of fingerprints that belong to an AS of size one. Our fingerprints have a stable unicity rate of approximately % in the long run, and at least % of fingerprints are shared by browsers or fewer. However, the fingerprints of the mobile group are more uniform than those of the desktop group, with a unicity rate of approximately % in the long run.
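The unicity rate, and more generally the share of fingerprints shared by at most k browsers, can be computed on a daily snapshot as in the Python sketch below. We assume the proportion is taken over the entries of the snapshot, that is, one current fingerprint per browser. The `(fingerprint, uid)` layout and the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Tuple

# Hypothetical snapshot layout: one (fingerprint, uid) pair per browser, i.e.,
# the last fingerprint seen for each browser at a given day.
Snapshot = List[Tuple[Hashable, str]]


def share_with_as_size_at_most(snapshot: Snapshot, k: int) -> float:
    """Proportion of snapshot entries whose fingerprint is shared by <= k browsers."""
    # Each browser appears once, so the number of entries carrying a fingerprint
    # equals the size of its anonymity set.
    as_size = Counter(f for f, _uid in snapshot)
    return sum(1 for f, _uid in snapshot if as_size[f] <= k) / len(snapshot)


def unicity_rate(snapshot: Snapshot) -> float:
    """Proportion of fingerprints that belong to an anonymity set of size one."""
    return share_with_as_size_at_most(snapshot, 1)
```

The mobile and desktop rates are obtained by applying the same functions to the snapshot filtered on each device group.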
New browsers are encountered continually, but starting from the th day, the arrival frequency stabilizes at around new browsers per day. Before this stabilization, the arrival frequency varies, with some major spikes. They seem to correspond to events that happened in France and led to more visits. For example, the spike on the th day corresponds to a live political debate on TV, and the spike on the rd day correlates with the announcement of a cold snap.
Figure 3 presents the proportion of unique fingerprints through the partitioned datasets for the overall, mobile, and desktop groups. The proportion of unique fingerprints is stable for the desktop browsers, with a slight increase of points from the th day to the th, from % to %. The unicity rate of the fingerprints of the mobile group is lower than that of the desktop group, and slightly decreases by points over the same period, from % to %.
Two fingerprints of a given browser can be linked, as only a small portion of the attributes is expected to change, even after several months. Figure 4 displays the mean similarity between consecutive fingerprints as a function of the time difference. The ranges are expressed in days, so that day on the x-axis represents the fingerprints separated by days. We ignore the time ranges having fewer than pairs, or with a time difference higher than the limit of our experiment ( days), which account for less than % of each category. Our stability results are a lower bound: because of the deduplication, consecutive fingerprints are necessarily different (i.e., their similarity is strictly lower than 1).
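The binning behind such a curve can be sketched in Python. Each pair of consecutive fingerprints is assigned to a one-day range of time difference. Ranges holding too few pairs, or lying beyond the limit of the experiment, are dropped. The one-day binning and the parameters `min_pairs` and `max_day` are our assumptions, since the exact thresholds are not given here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DAY_MS = 86_400_000  # milliseconds in one day


def similarity_by_day(pairs: List[Tuple[int, float]], min_pairs: int, max_day: int) -> Dict[int, float]:
    """Mean similarity per day of time difference.

    pairs holds (time difference in ms, similarity) for each pair of
    consecutive fingerprints. Days with fewer than min_pairs pairs, or beyond
    max_day, are ignored.
    """
    bins: Dict[int, List[float]] = defaultdict(list)
    for delta_ms, similarity in pairs:
        day = delta_ms // DAY_MS  # day 0 gathers differences in [0, 1) day
        if day <= max_day:
            bins[day].append(similarity)
    return {day: sum(v) / len(v) for day, v in sorted(bins.items()) if len(v) >= min_pairs}
```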
We have a total of compared pairs for the overall group, pairs for the desktop group, and pairs for the mobile group. A fingerprint is expected to have at least % of its attributes with an identical value after days. A few of the attributes included in our script are highly unstable. Removing these attributes could reduce the distinctiveness of fingerprints, but would improve their stability.
The fingerprints of the mobile group are generally more stable than those of the desktop group, as suggested by their respective similarity curves. However, this seems to hold only for large time differences. In the range of days difference, the fingerprints of the mobile group are slightly less stable than those of the desktop group, with a mean similarity of % against %.
Time Resource Consumption
Our script takes several seconds to collect the attributes composing the fingerprints. Figure 5 displays the cumulative distribution of the collection time of fingerprints in seconds, with the outliers removed. We measure it as the time difference between the start of the script and the sending of the fingerprint. Some collections take from several hours to days, which can be caused by a web page put in the background or accessed after a long time. We limit our population to the fingerprints that take less than seconds to collect, and consider higher values as outliers. Outliers account for less than % of each group.

We present the collection time of fingerprints in seconds as (th percentile, median, th percentile). Our script collects most fingerprints within a few seconds, with values (, , ). A difference occurs between the desktop browsers (, , ) and the mobile browsers (, , ). The median collection time is less than the estimated median time taken by web pages to load completely (https://httparchive.org/reports/loading-speed#ol), which is seconds for the desktop browsers and seconds for the mobile browsers, as of February 1, 2020.
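The summary statistics above can be sketched in Python. The cutoff and the two percentile levels are parameters because their values are not given here. The nearest-rank definition of percentiles is our choice. With this definition, the median of an even number of values is the lower middle value rather than the average of the two middle values.

```python
import math
from typing import List, Sequence, Tuple


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, for p in (0, 100], of an ascending sequence."""
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def collection_time_summary(durations_s: List[float], cutoff_s: float,
                            low_p: float, high_p: float) -> Tuple[Tuple[float, float, float], float]:
    """Return ((low percentile, median, high percentile), share of outliers).

    durations_s holds, for each fingerprint, the time in seconds between the
    start of the script and the sending of the fingerprint; values of at least
    cutoff_s are treated as outliers (e.g., pages left in the background).
    """
    kept = sorted(d for d in durations_s if d < cutoff_s)
    outlier_share = 1 - len(kept) / len(durations_s)
    summary = (percentile(kept, low_p), percentile(kept, 50), percentile(kept, high_p))
    return summary, outlier_share
```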
Memory Resource Consumption
Our script consumes about a dozen kilobytes per fingerprint, a size easily handled by current storage and bandwidth capacities. Figure 6 displays the cumulative distribution of the fingerprint size in bytes, with the outliers removed and the canvases stored as SHA-256 hashes. The mean fingerprint size is bytes, and the standard deviation is. We remove a fingerprint from a desktop browser, considered an outlier because its size is greater than .
Half of our fingerprints take less than bytes, and % less than kilobytes. We observe a difference between the fingerprints of mobile and desktop browsers, with % of fingerprints weighing respectively less than bytes and bytes. This is because some heavy attributes, such as the lists of plugins or MIME types, are lighter on mobile browsers, where they are empty most of the time.
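A possible way to measure the size of a fingerprint with its canvases stored as SHA-256 hashes is sketched below in Python. The compact UTF-8 JSON serialization and the attribute names are our assumptions. The actual storage format of the collected fingerprints is not described here.

```python
import hashlib
import json


def fingerprint_size(fp: dict, canvas_keys=("canvas",)) -> int:
    """Size in bytes of the compact UTF-8 JSON serialization of a fingerprint,
    with canvas images replaced by their SHA-256 hex digest."""
    stored = dict(fp)  # leave the input fingerprint untouched
    for key in canvas_keys:
        if key in stored:
            stored[key] = hashlib.sha256(stored[key].encode("utf-8")).hexdigest()
    return len(json.dumps(stored, separators=(",", ":")).encode("utf-8"))
```

With a base64-encoded canvas of several kilobytes, the canvas attribute shrinks to a 64-character digest. The remaining size then comes from the other attributes.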
5 Synthesis of Results and Conclusion
In this study, we evaluate the properties offered by browser fingerprints as an additional web authentication factor, through the analysis of a large-scale real-life fingerprint dataset. We show that browser fingerprints offer a satisfying distinctiveness, as % of our fingerprints are shared by only one browser. Moreover, fingerprints are stable: at least % of the attributes are expected to stay identical between two observations of the fingerprint of a browser, even when they are separated by nearly 6 months. We validate that fingerprints offer high performance, as they only weigh about a dozen kilobytes and take a few seconds to collect. We conclude that browser fingerprints provide satisfying properties for an additional web authentication factor, and can strengthen password-based systems without a major loss of usability.
Acknowledgements. We want to thank Benoît Baudry and David Gross-Amblard for their valuable comments, together with Alexandre Garel for his work on the experiment.
- Bonneau et al. (2015). Passwords and the Evolution of Imperfect Authentication. Communications of the ACM.
- Bonneau (2012). The science of guessing: analyzing an anonymized corpus of 70 million passwords. In Symposium on Security and Privacy.
- Eckersley (2010). How unique is your web browser? In Privacy Enhancing Technologies.
- Englehardt and Narayanan (2016). Online Tracking: A 1-million-site Measurement and Analysis. In Conference on Computer and Communications Security.
- Gómez-Boix et al. (2018). Hiding in the Crowd: an Analysis of the Effectiveness of Browser Fingerprinting at Large Scale. In The Web Conference.
- Laperdrix et al. (2019). Morellian analysis for browsers: making web authentication stronger with canvas fingerprinting. In Conference on Detection of Intrusions and Malware & Vulnerability Assessment.
- Laperdrix et al. (2016). Beauty and the Beast: Diverting modern web browsers to build unique browser fingerprints. In Symposium on Security and Privacy.
- Maltoni et al. (2003). Handbook of fingerprint recognition. Springer-Verlag.
- Mowery and Shacham (2012). Pixel perfect: fingerprinting canvas in HTML5. In W2SP.
- Preuveneers and Joosen (2015). SmartAuth: Dynamic Context Fingerprinting for Continuous User Authentication. In Annual Symposium on Applied Computing.
- Rochet et al. (2019). SWAT: seamless web authentication technology. In The World Wide Web Conference.
- Spooren et al. (2015). Mobile Device Fingerprinting Considered Harmful for Risk-based Authentication. In European Workshop on System Security.
- Thomas et al. (2017). Data breaches, phishing, or malware?: understanding the risks of stolen credentials. In Conference on Computer and Communications Security.
- Unger et al. (2013). SHPF: Enhancing HTTP(S) Session Security with Browser Fingerprinting. In 2013 International Conference on Availability, Reliability and Security, pp. 255–261.
- Wang et al. (2018). The next domino to fall: empirical analysis of user passwords across online services. In Conference on Data and Application Security and Privacy.