
SSOPrivateEye: Timely Disclosure of Single Sign-On Privacy Design Differences

The number of login options on websites has increased since the introduction of web single sign-on (SSO) protocols. SSO services allow users to grant websites or relying parties (RPs) access to their personal profile information from identity provider (IdP) accounts. When prompting users to select an SSO login option, many websites do not provide any privacy information that could help users make informed choices. Moreover, privacy differences in permission requests across available login options are largely hidden from users and are time consuming to manually extract and compare. In this paper, we present an empirical study of popular RP implementations supporting three major IdP login options (Facebook, Google, and Apple) and categorize RPs in the top 300 sites into four client-side code patterns. Our findings suggest a relatively uniform distribution in three code patterns. We select RPs in one of these patterns as target sites for the design and implementation of SSOPrivateEye (SPEye), a browser extension prototype that extracts comparative data on SSO login options in RPs covering the three IdPs. Our evaluation of SPEye demonstrates the viability of extracting privacy information that can inform SSO login choices in the majority of our target sites.


1. Introduction

Single Sign-On (SSO) systems are widely used, including by many popular web applications. Relying party (RP) sites that rely on SSO services such as “Sign in with Google” authenticate users by integrating their apps with identity providers (IdPs) such as Google or Facebook. This is convenient for users, who can log into multiple services using a single account and manage only a single set of credentials. OAuth 2.0 (oauth2rfc), a standard authorization protocol, enables SSO users to authenticate and grant RPs API access to personal information from their IdP accounts. User data accessed through these APIs enables RPs to offer additional functionality (e.g., managing personal calendars). These APIs raise privacy concerns (balash2022security) as RPs could gain access to extensive user data (potentially built up by IdP account use over several years), typically without fully explaining or justifying the requested access to users. The API access is not limited to the time of login or the active use of the RP site; it can include user data from the past, and can extend into silent continuous monitoring of future activities.

To engage a wider set of users, sites often offer SSO login options from more than one identity provider. The choice of login option can impact the user’s privacy differently depending on three primary factors. First, each provider exposes APIs giving access to user data that are relevant to its own app ecosystem. For example, Google offers the Gmail API for email retrieval by RPs; Facebook offers APIs for apps to access a user’s photos and videos. Second, API access is granted for broadly specified user data types and the API does not restrict access to data from a specific time range. This means that the amount of user data an API releases to the site depends on user factors such as how long ago the user’s account was created and how much user activity is associated with the account. And third, user privacy depends on the actual user data APIs a site is designed to access through the SSO option chosen by the user.

The number of login choices offered by RP sites has steadily increased over the past decade (jarpehult2022longitudinal). Most sites that offer multiple SSO logins request different types of personal data from each SSO, often with one option more privacy-friendly than others (morkonda2021empirical). It is also observed that RPs offer fewer but more privacy-friendly SSO options to users in the EU compared to non-EU users, likely due to stricter privacy laws such as the GDPR (morkonda2021empirical). These privacy differences are often not visible as current SSO UIs are designed to only inform the user about the one SSO option they select. Current UI workflows do not support easy comparison of requested data between the login choices offered by RP sites. In cases where there are multiple alternate (especially more privacy-friendly) SSO options, users might make choices that reveal more information than desired. Moreover, a manual comparison requires the user to first complete login with each SSO option before seeing the requested permissions.

In this work, we design and build a browser extension to provide a privacy comparison when users are presented with one or more SSO logins. Specifically, our extension lists all the data resources a site requests through each SSO login. This privacy comparison is compiled and presented to users as they navigate to a site’s login page, and particularly before the user commits to a login choice. Our work makes the following contributions:

  • We identify through empirical extraction, and explain, four software patterns used by popular sites to implement SSO logins. For each of the four patterns, we provide strategies allowing a tool to automatically extract RP site data useful for privacy and security analysis.

  • We design and build SSOPrivateEye (SPEye) as a browser extension for one of the patterns, offering real-time privacy information to users (when prompted to make a login choice) by comparing user data a site requests through different IdPs.

    • This provides privacy-related comparative information, up-to-date with respect to the current relying party site API that extracts user information from identity providers.

    • The approach takes into account privacy differences in RP versions shown to users in locations where different privacy laws apply.

  • We evaluate our approach by testing its effectiveness on a fresh set of target RP sites. The results show that our approach extracts a privacy comparison from the majority of these target sites.

We plan to make our Chrome extension code open-source. We believe that it can be expanded to identify security weaknesses and warn users about risks prior to committing to login decisions.

In this work, we demonstrate the usefulness of our prototype by building it to recognize SSO logins with the top three identity providers: Facebook, Google, and Apple. We designed the tool such that additional IdPs (that use standard SSO protocols) can be added with minimal effort. Our design is limited to RP sites that initiate SSO requests from HTML code. While our current design does not consider SSO requests generated exclusively in JavaScript, we discuss the challenges involved and provide strategies for increasing the prototype’s coverage (to different RP implementations) in a future version.

2. Background and motivation

Our privacy tool extracts information on SSO user data a given RP intends to access through each SSO login option listed on the site. This section provides background on OAuth-based SSO protocols and describes the privacy issues that motivate our work.

2.1. SSO protocol background

Single Sign-On protocols commonly supported by major identity providers (Google, Facebook, and Apple) include OAuth-based protocols. The OAuth 2.0 (oauth2rfc) authorization protocol is designed to give an RP delegated access (within a specified scope) to user data in an IdP account. OpenID Connect 1.0 (openIdConnect1Spec) is an adaptation of OAuth 2.0 designed specifically for user authentication. Although they are different specifications, they are closely related and both are offered by major IdPs. We use the following definitions to describe the protocols.

The user is the owner of the data that is access-controlled. The OAuth 2.0 specification refers to this data as protected resources and the user as the resource owner. The identity provider or IdP is responsible for authenticating the resource owner and maintaining access control to the protected resources. The relying party or RP is a site or an app requesting access to protected resources. It relies on the identity provider to perform user authentication and return access tokens that grant access to protected resources.

OAuth 2.0

OAuth 2.0 (oauth2rfc) is an authorization protocol for granting RPs access to user resources protected by an IdP without disclosing the user’s credentials to the RP. This is achieved through access tokens which are confidential strings issued by an IdP to allow the RP to access protected resources. Access tokens are obtained by executing one of several authorization procedures called OAuth grant types or flows. OAuth 2.0 has four main flows designed to support authorization from different resource owners such as SSO users and app services. RP sites that use OAuth for user SSO login primarily follow the authorization code flow or the implicit flow. Although the two flows differ in security properties (oauth2rfc), extracting protocol data from these flows involves an identical approach for our work. For this reason, we describe only the authorization code flow (Figure 1) in more detail and highlight shared features of these flows relevant to our work.

Figure 1. Overview of OAuth 2.0 authorization code flow.

In Step 1, the user selects SSO login at the RP site, which initiates the flow by sending an authorization request. This includes several query parameters that inform the IdP about the RP’s request to access user data. In particular, the scope parameter specifies the data resources (i.e., IdP APIs) the RP wants to access. Other relevant parameters include a client ID (a unique string issued to the RP site during registration with the IdP), a response type (the response_type parameter, which indicates the OAuth flow), and a redirect URI that specifies the endpoint to which the IdP should redirect the user agent once the request has been authorized (or denied) by the user.
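The parameters of the initial request can be illustrated with a minimal sketch. The parameter names follow RFC 6749 (Section 4.1.1); the endpoint, client ID, redirect URI, scope names, and the helper name `buildAuthorizationRequest` are hypothetical placeholders and not taken from any RP in our dataset.

```javascript
// Sketch: assemble an OAuth 2.0 authorization request URL.
// All concrete values passed in below are made-up placeholders.
function buildAuthorizationRequest({ endpoint, clientId, redirectUri, scopes, state }) {
  const url = new URL(endpoint);
  url.searchParams.set("response_type", "code");      // selects the authorization code flow
  url.searchParams.set("client_id", clientId);        // issued to the RP at registration
  url.searchParams.set("redirect_uri", redirectUri);  // where the IdP sends the user agent back
  url.searchParams.set("scope", scopes.join(" "));    // space-delimited list of requested resources
  url.searchParams.set("state", state);               // opaque value echoed back by the IdP
  return url.toString();
}

const requestUrl = buildAuthorizationRequest({
  endpoint: "https://idp.example/authorize",
  clientId: "rp-client-123",
  redirectUri: "https://rp.example/callback",
  scopes: ["profile", "email"],
  state: "af0ifjsldkj",
});
console.log(requestUrl);
```

The scope value is the part of this URL that matters most for a privacy comparison, since it names the data resources the RP asks for.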

In Step 2, the RP redirects the user agent to the IdP’s authorization server endpoint where the user is prompted to login with their IdP credentials. Then, the user is asked (in Step 3) if they want to grant the RP access to the requested data resources. Once the user completes login at the IdP (with or without granting access), the IdP redirects the user agent (Step 4) back to the RP along with a fresh authorization code value. Before redirection to the RP, the IdP must verify that the redirection URI specified in the initial request matches a value provided by the RP during its registration with the IdP. This ensures that the authorization codes are delivered to the correct RP endpoint.

As a final step, the RP needs to exchange the authorization code for an access token. In Step 5, the RP sends the authorization code (along with an optional client password for RP authentication) to the IdP’s token exchange endpoint where the request is verified. If the authorization code (and the client password) is valid, the IdP issues a fresh access token and returns it to the RP site (Step 6). The obtained access token grants the RP access to data resources approved by the user. It is also possible for the RP to obtain fresh access tokens and extend previously granted access by exchanging a refresh token without involving the user for each new exchange.
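The code-for-token exchange in Step 5 can be sketched as a form-encoded request body. The field names follow RFC 6749 (Section 4.1.3); all values and the helper name `buildTokenRequestBody` are hypothetical. The sketch places the client password in the body for brevity, although the RFC also allows (and prefers) HTTP Basic authentication for this purpose.

```javascript
// Sketch: form-encoded body of the token request sent by the RP backend.
// Values are placeholders; a real RP would send this over TLS from its server.
function buildTokenRequestBody({ code, redirectUri, clientId, clientSecret }) {
  const body = new URLSearchParams();
  body.set("grant_type", "authorization_code"); // names the flow being completed
  body.set("code", code);                        // authorization code received in Step 4
  body.set("redirect_uri", redirectUri);         // must match the value in the initial request
  body.set("client_id", clientId);
  if (clientSecret !== undefined) {
    body.set("client_secret", clientSecret);     // optional RP authentication
  }
  return body.toString();
}

const tokenBody = buildTokenRequestBody({
  code: "hypothetical-auth-code",
  redirectUri: "https://rp.example/callback",
  clientId: "rp-client-123",
  clientSecret: "hypothetical-secret",
});
console.log(tokenBody);
```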

OpenID Connect 1.0

The OAuth protocol was originally designed as an authorization protocol, but through custom changes RPs could use OAuth to perform user authentication in SSO login. The OpenID Connect 1.0 (openIdConnect1Spec) specification (OIDC) was developed as an authentication extension to OAuth 2.0. It introduces the ID token, a value issued by an IdP (also referred to as an OpenID Provider) to convey information about a user’s identity to the RP. Common data returned in an ID token include a unique identifier for the user, an expiry time for the token, and an IdP identifier. A standard ID token uses the JSON Web Token (JWT) (jwtrfc) data structure to represent a digitally signed JSON object that contains verifiable claims about an authentication event. Standard claims about the user include basic profile info such as name, email address, and profile picture. ID tokens enable the RP to verify a user’s identity based on the claims returned by the IdP. The claims contained in the ID tokens could reveal sensitive personal information depending on specific IdP implementations.
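To make the ID token structure concrete, the following sketch reads the claims from the payload segment of a JWT. It assumes a Node.js runtime (for `Buffer`), uses a made-up token with a dummy signature, and deliberately skips signature validation; an RP must verify the signature, issuer, audience, and expiry before trusting any claim. The helper names are hypothetical.

```javascript
// Sketch: decode the claims of an ID token WITHOUT verifying it.
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("expected header.payload.signature");
  }
  // The payload is base64url-encoded JSON.
  const json = Buffer.from(parts[1], "base64url").toString("utf8");
  return JSON.parse(json);
}

// Build a made-up token purely for demonstration; the signature is a dummy string.
const encodeSegment = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");
const sampleToken = [
  encodeSegment({ alg: "RS256", typ: "JWT" }),
  encodeSegment({
    iss: "https://idp.example", // IdP identifier
    sub: "user-42",             // unique identifier for the user
    exp: 1700000000,            // expiry time of the token
    email: "user@example.com",  // example profile claim
  }),
  "dummy-signature",
].join(".");

const claims = decodeJwtPayload(sampleToken);
console.log(claims.sub, claims.iss, claims.email);
```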

OIDC extends OAuth 2.0 to define a set of six OIDC flows for RPs to obtain ID tokens, and access tokens when requested. Similar to OAuth 2.0, each OIDC flow begins with an initial authentication request from RP to IdP with the response_type parameter indicating the flow type. Authentication requests use the same parameters as authorization requests in OAuth 2.0 but with special values. For example, the OIDC specification uses the scope parameter with the value “openid” to indicate an authentication request. Each flow then redirects the user agent to the IdP’s login page where the user is prompted to grant the RP access to basic information such as their name and email address. If the user consents, the IdP redirects the user agent back to the RP (at the specified redirect URI). Depending on the OIDC flow, the IdP might redirect to the RP with an authorization code (e.g., as in the authorization code flow) or directly return the ID token (e.g., in the implicit flow).
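A tool that inspects initial requests can distinguish authentication requests and their flow from the scope and response_type parameters alone. The sketch below is an illustration under that assumption: the mapping follows the response_type table in the OIDC Core specification (plus the plain OAuth 2.0 value "token"), and the function name and sample URLs are hypothetical.

```javascript
// Sketch: classify an initial request by its scope and response_type values.
const FLOW_BY_RESPONSE_TYPE = new Map([
  ["code", "authorization code"],
  ["token", "implicit"],              // plain OAuth 2.0 implicit flow
  ["id_token", "implicit"],
  ["id_token token", "implicit"],
  ["code id_token", "hybrid"],
  ["code token", "hybrid"],
  ["code id_token token", "hybrid"],
]);

function classifyInitialRequest(requestUrl) {
  const params = new URL(requestUrl).searchParams;
  const scopes = (params.get("scope") || "").split(/\s+/).filter(Boolean);
  // response_type values are order-independent, so sort them before the lookup.
  const responseType = (params.get("response_type") || "")
    .split(/\s+/)
    .filter(Boolean)
    .sort()
    .join(" ");
  return {
    isAuthenticationRequest: scopes.includes("openid"),
    flow: FLOW_BY_RESPONSE_TYPE.get(responseType) || "unknown",
    scopes,
  };
}

const codeFlow = classifyInitialRequest(
  "https://idp.example/authorize?response_type=code&scope=openid%20email&client_id=hypothetical"
);
console.log(codeFlow.isAuthenticationRequest, codeFlow.flow);
```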

Figure 6. Lack of transparent information at time of (a) user login prompt. When signing into Rakuten.com with (b) Google or (c) Facebook, the user is informed about permission requests only for the selected choice (typically after the user has committed to using that SSO). In (d), an attempt to login using Facebook without revealing the email address raises an insufficient permissions error on the RP site. Note the lack of justification on why data access is needed; to access this information, a user would need to navigate to and search a secondary page such as the RP’s privacy policy. Panels: (a) Rakuten.com (RP) login; (b) Google (IdP) authorization form; (c) Facebook (IdP) authorization form; (d) insufficient permissions error.

2.2. Privacy issues in current SSO UI design

Our work aims to address SSO design-related privacy issues associated with RP requests to access IdP user data. In the discussion below, we first describe the typical UI design used in SSO procedures to inform users about permission requests. We then highlight privacy issues in this design that motivate our work.

As observed in recent studies (jarpehult2022longitudinal; morkonda2021empirical), many RPs offer multiple SSO options to support users across popular IdPs, including Facebook, Google, and Apple. User data accessed through OAuth APIs is useful for RPs wanting to give a more personalized user experience and offer extended functionality. For example, an RP site might autofill forms using personal information from the IdP or offer tools to edit photos in the user’s IdP photo library. In some cases, this API data is optional, meaning that the RP site can function as designed when a user authenticates using SSO login but denies access to a subset of requested data. In other sites, RP services such as sending emails on behalf of the user require access to the user’s IdP data (e.g., email account). This distinction between the user’s IdP data that is essential and data that is optional for provisioning RP services is often unclear based on information given to inform users. We illustrate this with the sample SSO UI below.

A typical RP login form lists one or more SSO login choices along with traditional username and password fields for non-SSO (i.e., site-specific) accounts. If a user picks an SSO login, their browser window redirects to the IdP where they are prompted to enter their IdP account credentials (if they are not already signed in with the IdP). Then, the IdP informs the user of the data access requested by the RP. Figure 6 highlights the UI design in a typical SSO login flow from the RP to the individual IdPs. For the SSO choices in Figure 6(a), the figure illustrates the UI prompts (Figure 6(b) and (c)) where information on permission requests is displayed. This design raises two main questions about the transparency of permission requests:

  • Is the access necessary to use the service? Both the RP and IdP UIs lack information on whether the requested access is essential for the RP to offer its services. For example, the Google login dialog in Figure 6(b) informs users that Rakuten.com wants access to the user’s email messages, but it is not clear whether this data is required for the site to function properly. Note that it is also possible to use the site with the Facebook (or the Apple) login option, which only requires the user to disclose their name and email address (Figure 6(c)). The secondary prompt in the Facebook permissions UI (in Figure 6(c)) suggests an opt-out option for the email address. However, we find that an attempt to log in without revealing the email address raises an error (Figure 6(d)) on the RP site asking the user to retry using their email address.

  • Which login option is more privacy friendly? The second issue is the lack of visibility across the available SSO choices. The OAuth APIs exposed by an IdP depend on the type of user data it stores and the IdP’s privacy policy for sharing user data with RPs. Users may not be aware of data requested by other SSO choices as they are only informed (after completing authentication) about the permissions requested with the SSO option they select. If a user is aware that an alternate SSO choice reveals less personal data compared to other login options, they might choose differently. In current SSO UI design, the user would need to log in with each SSO choice and manually compare the permission requests to be fully informed about the privacy choices. This is time-consuming as it could require entering credentials (and completing two-factor verification where enabled) with each IdP login.

We highlight that at the key decision point of selecting an SSO option, which occurs in Figure 6(a), the user lacks the knowledge necessary to make an informed comparison of options. These privacy issues can lead to SSO users making privacy decisions without full information. Our work aims to address this through a browser-based extension that automatically extracts and compares authorization requests across the available SSO login options. It provides the comparison before the user commits to a particular login choice. Our comparison addresses the second issue above by enabling users to identify which options are better aligned with their privacy preferences, thus offering better control over their privacy when using SSO logins.
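The idea of comparing authorization requests across login options can be sketched as follows. This is a simplified illustration rather than SPEye's actual comparison logic: the request URLs are made up (they are not Rakuten.com's requests), the function names are hypothetical, and ordering options by the number of requested scopes is only a crude proxy for privacy impact, since individual scopes differ greatly in sensitivity.

```javascript
// Sketch: list the scopes requested through each login option.
function extractScopes(authorizationUrl) {
  const raw = new URL(authorizationUrl).searchParams.get("scope") || "";
  // Scopes are usually space-delimited; Facebook also accepts commas, so split on both.
  return raw.split(/[\s,]+/).filter(Boolean);
}

function compareLoginOptions(requestsByIdp) {
  return Object.entries(requestsByIdp)
    .map(([idp, url]) => ({ idp, scopes: extractScopes(url) }))
    .sort((a, b) => a.scopes.length - b.scopes.length); // fewest requested scopes first
}

// Hypothetical initial requests for two login options on the same RP.
const comparison = compareLoginOptions({
  google:
    "https://accounts.google.com/o/oauth2/v2/auth?client_id=hypothetical&response_type=code" +
    "&scope=openid%20email%20https%3A%2F%2Fmail.google.com%2F",
  facebook:
    "https://www.facebook.com/dialog/oauth?client_id=hypothetical&response_type=code" +
    "&scope=email,public_profile",
});

for (const { idp, scopes } of comparison) {
  console.log(idp, scopes.join(", "));
}
```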

Addressing the first issue is more challenging because it is difficult to predict at which point in its service workflow an RP might request extra permissions. For example, an RP that requests only the user’s name from the IdP might at a later point (in an RP UI, as in Figure 6(d)) ask the user to enter their email address. The RP can also make extra permission requests after the user has completed login with an IdP, in a subsequent UI unrelated to login. While our tool does not directly address this issue, we hope that our comparison before the login prompt increases the user’s ability to make an informed choice.

3. Empirical study of client-side SSO code patterns

RP implementations of OAuth and OIDC protocols differ significantly across RP sites, and can include variations within an RP site for the implementation of individual IdP login options. For example, some RPs use an IdP-provided SDK to exchange protocol messages with the IdP while others use custom code to manage the OAuth flows (either JavaScript code or through backend RP server code such as Java or PHP). These variations complicate automated security and privacy analysis of OAuth systems.

Understanding the variations in RP implementations is important for developing tools that address privacy issues discussed in Section 2.2. To guide the design of our SPEye tool (described in Section 4), we examined the implementations of initial RP authorization (in OAuth 2.0) and authentication (in OIDC) requests in the top 300 sites of the Tranco (lepochat2019tranco) list. (For simplicity, we refer to requests in both OAuth 2.0 and OIDC as initial requests.) We focused our empirical analysis on the RPs that implemented login with Facebook, Google, and Apple since these are the most common SSO providers. We found 101 RPs using SSO services of these IdPs, and identified four client-side code patterns for RP implementations of the initial requests: 26% HTML-based; 36% JavaScript-based; 35% IdP SDK-based; and 3% used a combination of the three patterns.

Authorization requests are triggered by the user, typically by clicking a button or link from the RP site. As described in Section 2.1, authorization requests redirect users to the IdP with several protocol parameters relevant to security and privacy analysis. Identifying recurring RP code patterns can be used to guide the design of automated tools such as ours as they help build coverage requirements. Next we describe the four patterns that we identified based on our empirical study. We also discuss ideas for automatically extracting protocol data from sites in each pattern.

1) HTML-based SSO. In RP sites with this code pattern, the initial requests for each SSO login are embedded directly into SSO-related HTML elements. An example of this pattern is shown in Listing 1. When the user selects an SSO login button, a request is sent to the link in the element’s href attribute. In most sites, we observed that this link leads to the RP server code, which responded with an HTTP 302 code to redirect the user agent to the IdP endpoint. A small number of sites included the IdP link and the OAuth parameters directly in HTML; to avoid vulnerabilities, the OAuth state parameter must not be reused across requests (it should be a non-guessable nonce, to protect against CSRF attacks (oauth2rfc)).

```html
<div class="sso-logins">
  <a id="sso-google" href="https://example.com/sso/google">
    <div>Sign in with Google</div>
  </a>
</div>
```
Listing 1. Sample code that implements SSO in HTML.

In our dataset of 101 RPs, we found 26 implementations with initial requests in the HTML code. We observed one RP, ok.ru, which directly included the IdP endpoint in HTML; all other RPs implemented authorization requests through redirects from their backend servers. These authorization requests can be extracted by finding the initial requests and sending GET requests to the RP. For each valid request, the backend server returns a response to redirect to the IdP endpoint. The OAuth protocol parameters are included in the redirection URL as query strings. We distinguished requests to endpoints of each targeted IdP using a set of URLs associated with each IdP. We identified implementations with this code pattern by scanning the HTML of RP login pages for href attributes and endpoints of RP and IdP servers.
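The last two extraction steps (recognizing the IdP from the redirection URL and reading the protocol parameters from its query string) can be sketched as below. The sketch assumes that the Location header of the RP's redirect response has already been obtained; how it is obtained is outside the sketch. The host list is an assumed, non-exhaustive stand-in for the set of IdP URLs mentioned above, and the function name is hypothetical.

```javascript
// Assumed hostnames of IdP authorization endpoints (illustrative, not exhaustive).
const IDP_HOSTS = new Map([
  ["accounts.google.com", "Google"],
  ["www.facebook.com", "Facebook"],
  ["appleid.apple.com", "Apple"],
]);

// Sketch: interpret the redirection URL returned by the RP backend.
function parseIdpRedirect(locationHeader) {
  let url;
  try {
    url = new URL(locationHeader);
  } catch {
    return null; // relative or malformed URL: not a direct redirect to an IdP
  }
  const idp = IDP_HOSTS.get(url.hostname);
  if (!idp) {
    return null; // redirect to some other site
  }
  return {
    idp,
    endpoint: url.origin + url.pathname,
    params: Object.fromEntries(url.searchParams), // OAuth parameters from the query string
  };
}

const extracted = parseIdpRedirect(
  "https://accounts.google.com/o/oauth2/v2/auth?client_id=hypothetical-id" +
    "&response_type=code&scope=openid%20email&redirect_uri=https%3A%2F%2Fexample.com%2Fcb"
);
console.log(extracted.idp, extracted.params.scope);
```

A redirect that first passes through another RP endpoint would yield null here; following such chains is left out of the sketch.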

```html
<div class="sso-logins">
  <button id="sso-fb" value="Login with Facebook" onclick="sso()">
  </button>
</div>
<script>
  function sso() {
    // Ask the RP backend to start the SSO flow for the chosen IdP.
    const req = new XMLHttpRequest();
    req.open("POST", "https://example.com/sso");
    req.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    req.send("ssoWith=facebook");
  }
</script>
```
Listing 2. Sample code that implements SSO in JavaScript.

2) JavaScript-based SSO. In this code pattern (illustrated in Listing 2), DOM events such as mouse clicks trigger RP JavaScript code that generates initial requests for SSO options in the RP login page. We found that most sites sent the initial requests to their backend servers before redirecting to the IdP endpoints. This design could be useful when the RP wants to initialize and maintain per-user state information (separate from the OAuth state parameter) in its backend server. If the RP is implemented entirely in the front end, the authorization request is generated in JavaScript with no backend server communication. The OAuth state parameter (returned by the IdP after login) allows the RP’s callback code to link the IdP response to the initial request. Scanning implementations of this pattern by analyzing (but not executing) loaded HTML page elements is challenging as JavaScript code often includes values that are combined from multiple variables interpreted during runtime.

We found that 36 of 101 implementations sent the initial requests from JavaScript code. Although we did not exhaustively search every RP script (from code visible at the client), we found that many sites do not include authorization parameters in JavaScript code. Instead, the parameters are located in server responses to dynamically constructed requests. In these implementations, extracting information (including protocol parameters) related to authorization requests may not be possible solely by searching code visible at the client (e.g., searching for pattern matches to static strings); rather it might require executing the RP scripts using browser automation tools such as Selenium (seleniumWebDriver). However, this is not suitable in tools for users (such as ours) as it may increase latency or require custom setup of various tools designed for expert users.

To identify implementations with this pattern, we monitored the network traffic after clicking each SSO button on the RP’s login page. If a request is sent to the RP’s server, and if the link is not embedded in HTML, we searched the JavaScript in RP HTML pages for traces of the request URL. This pattern can also be identified by searching for listener functions (e.g., specified in onclick attributes) that are triggered when an SSO-related element is clicked.
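The second identification strategy (searching for listener functions in onclick attributes of SSO-related elements) can be approximated with a static scan of the page's HTML. The sketch below is a rough illustration only: it uses regular expressions on raw HTML, which is fragile compared to querying the DOM, it misses listeners attached with addEventListener, and the keyword list and function name are assumptions rather than the criteria used in our study.

```javascript
// Assumed keywords hinting that an element relates to SSO login.
const SSO_HINT = /(sso|login|sign[-_ ]?in|google|facebook|apple)/i;

// Sketch: find inline click handlers on SSO-looking elements in raw HTML.
function findInlineSsoHandlers(html) {
  const handlers = [];
  const tagPattern = /<(button|a|div|span)\b([^>]*)>/gi;
  for (const match of html.matchAll(tagPattern)) {
    const attrs = match[2];
    const onclick = /\bonclick\s*=\s*"([^"]*)"/i.exec(attrs);
    // Keep the element only if it has a handler and its attributes mention SSO.
    if (onclick && SSO_HINT.test(attrs)) {
      handlers.push({ tag: match[1].toLowerCase(), handler: onclick[1] });
    }
  }
  return handlers;
}

const samplePage = `
  <div class="sso-logins">
    <button id="sso-fb" value="Login with Facebook" onclick="sso()"></button>
  </div>
  <button id="search" onclick="find()"></button>
`;
const found = findInlineSsoHandlers(samplePage);
console.log(found);
```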

3) IdP’s SDK-based SSO. IdP services often provide software development kits (SDKs) for RPs to integrate their SSO services. As shown in Listing 3, RP implementations that use these SDKs import the SDK library into their app and manage authorization requests and responses through the library functions. Although these libraries are designed to make it easier to integrate IdP services, these SDKs often make implicit security assumptions that might not be understood by RP developers (wang2013explicating). These libraries also contain functions that allow the IdP to provide a consistent user experience across different RP and IdP sites. For example, the Google Sign-In library for JavaScript apps (googleApiAuth) offers the “One Tap” feature which allows the RP’s landing page to include Google’s popup overlay for prompting the user to login using Google.

```html
<div class="sso-logins">
  <button id="sso-fb" value="Login with Facebook" onclick="ssoFB()">
  </button>
  <script>
    function ssoFB() {
      FB.login(function(response) {
        // handler function for IdP response after login
      }, {scope: 'user_friends,user_likes'});
    }
  </script>
</div>
<script src="https://connect.facebook.net/en_US/sdk.js"></script>
```
Listing 3. Sample implementation using Facebook SDK (facebookLoginSDK).

We found that 35 of 101 RPs used IdP SDKs to manage authorization requests. Unlike the other code patterns, all these SDK-using sites send requests directly to the IdP. Therefore, the authorization request parameters (specified as arguments to the SDK functions) are available in RP’s JavaScript code. Analyzing the RP’s client-side code could help to identify SDK function calls. Searching and extracting these parameters might be simple if the arguments are included directly in the function calls. However, if the arguments are formed by combining other variables, extracting them might require more advanced methods such as data-flow analysis. These implementations can be identified by searching for the presence of IdP libraries which are typically imported into HTML using <script> tags.
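Both ideas in this paragraph (detecting imported IdP libraries and extracting arguments that appear literally in SDK calls) can be sketched with simple pattern matching. The script URL patterns below are an assumed, non-exhaustive list; the function names are hypothetical; and the scope extraction only covers the simple case where the argument is a string literal, not values assembled from other variables.

```javascript
// Assumed URL patterns of IdP SDK scripts (illustrative, not exhaustive).
const SDK_SCRIPT_HINTS = [
  { idp: "Facebook", pattern: /connect\.facebook\.net\/[^"']*sdk\.js/i },
  { idp: "Google", pattern: /accounts\.google\.com\/gsi\/client|apis\.google\.com\/js\/(platform|api)\.js/i },
  { idp: "Apple", pattern: /appleid\.cdn-apple\.com\/[^"']*appleid\.auth\.js/i },
];

// Sketch: report which IdP SDKs a page imports through <script src="..."> tags.
function detectIdpSdks(html) {
  const sources = [...html.matchAll(/<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi)]
    .map((m) => m[1]);
  return SDK_SCRIPT_HINTS
    .filter(({ pattern }) => sources.some((src) => pattern.test(src)))
    .map(({ idp }) => idp);
}

// Sketch: extract a scope argument only when it is written as a string literal.
function findLiteralScopes(script) {
  const match = /\bscope\s*:\s*["']([^"']+)["']/.exec(script);
  return match ? match[1].split(/[\s,]+/).filter(Boolean) : [];
}

const sdkPage = `
  <script>
    function ssoFB() {
      FB.login(function(response) {}, {scope: 'user_friends,user_likes'});
    }
  </script>
  <script src="https://connect.facebook.net/en_US/sdk.js"></script>
`;
console.log(detectIdpSdks(sdkPage), findLiteralScopes(sdkPage));
```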

4) Mixed SSO. We found three sites in our dataset that implemented the initial requests using more than one pattern. We observed that etsy.com implemented SSO with Google and Facebook using IdP SDKs. Although Apple offers its own SDK (appleSignInDocumentation), the site had implemented Apple SSO by including the IdP URL directly in its HTML. (We confirmed that the IdP link contained different state values on each visit.) We are not sure why sites might choose to implement SSO options using different patterns. Perhaps some developers (adding a new IdP option) simply prefer a different code pattern than colleagues who implemented previous options.

To analyze implementations with mixed patterns, the first step is to identify the code patterns (e.g., using criteria similar to ours) used in the site. Then, a combination of analysis techniques might be necessary to scan the RP’s HTML page for OAuth parameters.

4. SSO-Private-Eye tool (SPEye)

SPEye is our prototype extension for the Chrome browser that aims to inform users about the privacy implications of using SSO to log into an RP by providing a comparison of the available SSO choices. In this section, we present its design and implementation.

4.1. Scope overview

In Section 3, we identified four client-side implementation patterns that RP sites use to integrate SSO services. This dataset reveals a relatively uniform distribution of RP sites across three patterns. For practicality, SPEye targets only HTML-based RP implementations. SPEye provides a foundation from which to learn, and to build tools that extract comparative information about privacy (extendable to security) implications of SSO services for display to users.

4.2. Design requirements

We designed SPEye to scan (after page loading) a given site’s HTML page for SSO services and generate comparative information for available SSO login options. This choice to implement SPEye as a browser extension is motivated by three main design requirements: real-time comparison, user-location-specific comparison and usability.

Real-time comparison

By extracting comparative information at the time of login prompt, SPEye gives users an up-to-the-minute privacy comparison of the available SSO choices. RP sites often update the SSO services they offer. For example, after the introduction of “Sign in with Apple” in 2019, Apple quickly overtook Twitter as the third most popular IdP (jarpehult2022longitudinal). User tools that aim to inform SSO users about security and privacy practices should ideally use recent data to take into account recent RP changes. An informational website could offer a similar comparison by regularly collecting and maintaining an RP dataset; this however would not be “in-flow” with the user’s current focus on visiting a target RP.

User-location-specific comparison

Some RPs present different versions of their sites, e.g., within different countries, and request varying amounts of user information through the individual SSO login options (morkonda2021empirical). This practice might be a result of complying with stricter regional privacy laws such as the EU’s GDPR and California’s CCPA in the US. Privacy tools that collect and analyze RP data from a central location might offer limited insight as RP privacy policies could differ from one location to another. SPEye uses local scripts within the user’s browser application, and therefore extracts information from the RP site relative to the user’s current location.

Usability

We designed SPEye to run its analysis without navigating the user away from the current RP page to minimize disruption to their workflow. Tools that are not targeted for end-users might not be subject to this constraint. Several previous tools targeted for researchers (e.g., (yang2016model; ghasemisharif2018single; morkonda2021empirical; jarpehult2022longitudinal)) have instead extracted the protocol data using browser automation tools to open a new window and simulate user actions in the SSO workflow. SPEye’s constraints differ for two reasons: (a) our tool is meant for end-users, so disrupting the user’s view of the RP page (e.g., for IdP redirection) would limit usability, and (b) our tool does not require simulating the entire OAuth login flow as it only needs to extract the initial authorization request to a specified IdP.

4.3. Implementation approach

To guide SPEye’s implementation, we used 13 of the 26 HTML-based RP sites (Section 3) as the training set. As discussed later in Section 5, we use the remaining 13 RP sites as the testing set. Relying only on a subset of sites during development allows us to test the effectiveness of SPEye using a fresh set of sites that may or may not be covered by our approach.

Heuristics

SPEye relies on heuristics to identify HTML pages that contain SSO login options and URLs that point to IdP authorization servers. We developed these based on our dataset in Section 3. A heuristics-based approach is similarly used by comparable tools (e.g., (zhou2014ssoscan)) to identify SSO login buttons. SPEye identifies SSO login options by searching the DOM for elements with attribute values that match specific CSS selector patterns. Our list of match patterns is not exhaustive, but instead serves as a proof-of-concept illustrating the feasibility of SPEye’s approach to extract protocol data from a variety of RP sites. We describe these RP differences and our approach in the next subsection.

Figure 7. Overview of SPEye architecture and its workflow.

4.4. Browser extension

Figure 7 shows the high-level interactions between SPEye’s components and an RP website stack. SPEye contains three components: (a) a popup script (with UI) triggered when the user clicks on the browser extension icon to open it, (b) a content script that runs on the visited HTML page when the interface is opened, and (c) a background script that monitors browser traffic for redirection requests to IdP endpoints. SPEye processes the HTML page only when the extension interface is opened. This prevents SPEye from interfering with user-triggered login requests during which the popup interface remains closed. It also eliminates unnecessary background processing of all the pages the user visits (while the extension is active) and limits performance impact. The following tasks are performed each time the SPEye interface is opened (Step 1 in Figure 7).

A) Search RP HTML page. Using the Chrome runtime API (https://developer.chrome.com/docs/extensions/mv3/messaging/), the popup script sends a message to the content script (Step 2) to search the current HTML page for SSO login options. The content script uses CSS selector strings (see Appendix A.1) to identify potential matches for SSO buttons, login forms and login URLs (Step 3). When a matching element is found, the script searches its attributes (e.g., href, onclick) for URLs to RP and IdP servers. In the common match case (as observed in our dataset), the script extracts the URL from an href attribute and makes an XMLHttpRequest for each matching element (Step 4).
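The attribute search in Step 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not SPEye's actual code: the helper name `extractLoginUrl`, the onclick parsing rule, and the example selector in the closing comment are hypothetical. The function only needs an object with a `getAttribute()` method, so it works on a DOM element in a content script.

```javascript
// Hypothetical helper: returns an absolute URL found in an element's href or
// onclick attribute, or null if neither contains one.
function extractLoginUrl(el, pageUrl) {
  const href = el.getAttribute("href");
  if (href && !href.startsWith("#") && !href.startsWith("javascript:")) {
    return new URL(href, pageUrl).href; // resolve relative links against the page
  }
  // Assumed fallback: take the first quoted URL-like string in an onclick handler.
  const onclick = el.getAttribute("onclick") || "";
  const match = onclick.match(/['"]((?:https?:\/\/|\/)[^'"]+)['"]/);
  return match ? new URL(match[1], pageUrl).href : null;
}

// In a content script, candidate elements could come from a selector query, e.g.
//   document.querySelectorAll("[href*='login/google']")
// and each extracted URL would then be requested (Step 4).
```

Resolving against the page URL matters because many RP login links are relative paths.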

If a link is not found, the matching element could be part of an HTML form, identified by the presence of attribute types such as “input” and “submit”. For these matches, the script searches the DOM to identify the element’s parent form and extracts the path to which the form will be submitted, along with other form data linked to a specific SSO choice. As an example of form data observed in our dataset, selecting login with Facebook on fandom.com involved a custom attribute value="facebook".

The content script then makes an XMLHttpRequest to submit the form with custom attributes (for each SSO login) along with other internal parameters (e.g., nonce values) in the form (Step 4).
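The form-based case can be sketched as below, under stated assumptions: the function name `buildFormRequest`, the plain-object shape of `form`, and the field name `provider` used in the usage note are hypothetical. Only the `value="facebook"` form value comes from our dataset. In a content script, `action`, `method` and the hidden fields would be read from the parent form element.

```javascript
// Hypothetical sketch: build the request that submitting `form` for one SSO
// choice would send. `form.fields` holds [name, value] pairs already present
// in the form (e.g., hidden nonce inputs).
function buildFormRequest(form, pageUrl, ssoField) {
  const body = new URLSearchParams();
  for (const [name, value] of form.fields) body.append(name, value);
  body.append(ssoField.name, ssoField.value); // e.g., value="facebook"
  return {
    url: new URL(form.action, pageUrl).href,      // resolve a relative action path
    method: (form.method || "GET").toUpperCase(), // HTML forms default to GET
    body: body.toString(),
  };
}
```

For example, a form with `action="/auth"`, `method="post"` and a hidden `nonce` field, combined with `{ name: "provider", value: "facebook" }`, yields a POST to the RP's `/auth` path with both parameters URL-encoded in the body.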

B) Extract authorization request parameters. When an RP server accepts the SSO request initiated by the content script, the server responds with an HTTP redirect code (Step 5). SSO design requires RPs to redirect the user to the IdP endpoint with authorization request parameters in the URL. SPEye’s background script uses the Chrome webRequest API (https://developer.chrome.com/docs/extensions/reference/webRequest/) to observe traffic with HTTP redirect codes (Step 6). Each redirection URL is compared against a set of regular expressions (listed in Appendix A.2) derived from IdP authorization server URLs. If the observed redirection URL is to a known IdP server, the background script copies and sends the URL to the popup script for further processing (Step 7).

Although the extension’s background scripts are always active (e.g., listening), these redirect URLs are only received by the popup script when the SPEye user interface is opened by the user. Other redirects, such as user-triggered SSO logins during which the SPEye interface remains closed, are unaffected by the background script.
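Steps 6 and 7 can be sketched as a small background-script handler. This is an illustration under assumptions, not SPEye's implementation: `makeRedirectHandler`, the message shape, and the simplified URL test are hypothetical (the patterns SPEye uses are listed in Appendix A.2). We also assume here that a message sent while the popup is closed simply finds no receiver, which is one way the behaviour described above could arise.

```javascript
// Hypothetical handler factory: forwards a redirect URL only if it points to a
// known IdP. `details.redirectUrl` is the field Chrome's webRequest API passes
// to onBeforeRedirect listeners.
function makeRedirectHandler(isIdpUrl, sendToPopup) {
  return function onRedirect(details) {
    if (!isIdpUrl(details.redirectUrl)) return false; // unrelated redirect: ignore
    sendToPopup({ type: "idp-redirect", url: details.redirectUrl });
    return true;
  };
}

// Registration only runs inside an extension, where `chrome` is defined.
if (typeof chrome !== "undefined" && chrome.webRequest) {
  const handler = makeRedirectHandler(
    // Simplified stand-in for the regular expressions in Appendix A.2.
    (url) => /^https:\/\/[^/]*\.(facebook|google|apple)\.com\//.test(url),
    // With the popup closed there is no receiver, so the promise rejects.
    (msg) => chrome.runtime.sendMessage(msg).catch(() => {})
  );
  chrome.webRequest.onBeforeRedirect.addListener(handler, { urls: ["<all_urls>"] });
}
```

The handler only observes and forwards URLs. It does not block or modify requests, which is consistent with user-triggered logins being unaffected.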

theguardian.com. This site illustrates a typical variation we found across RP implementations: some RPs redirect through multiple endpoints before sending the user agent to the IdP server. For example, theguardian.com’s backend server responded to SPEye’s SSO requests by returning a redirect to an intermediate (RP) endpoint, which then returned the IdP redirection URL. In such instances, SPEye follows the sequence of redirects to observe the eventual IdP endpoint.
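The logic of following a redirect chain can be sketched as below. This is a simplified model under assumptions: `followToIdp`, the `nextHop` callback, and the hop limit are hypothetical. In a browser the redirects are followed by the network stack and observed one at a time, rather than resolved in a loop like this.

```javascript
// `nextHop(url)` is a hypothetical callback that returns the redirect target
// observed for `url`, or null when the response is not a redirect.
function followToIdp(startUrl, nextHop, isIdpUrl, maxHops = 5) {
  let url = startUrl;
  for (let hop = 0; hop < maxHops; hop++) {
    const target = nextHop(url);
    if (target === null) return null;    // non-redirect response: nothing to extract
    if (isIdpUrl(target)) return target; // reached the IdP authorization endpoint
    url = target;                        // intermediate RP endpoint: keep following
  }
  return null; // give up rather than follow an unbounded chain
}
```

A `null` result corresponds to cases where the RP returns a non-redirect response, so no authorization request can be extracted.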

nytimes.com. This RP’s backend server responded to SPEye as expected for Facebook and Google, but provided a non-redirect response for Apple. Although the server responses were different, the three SSO options were implemented identically by the site’s front-end. As a result, SPEye’s comparison on this site (and on another site with similar implementation) is limited to two of the three SSO login options.

C) Display privacy comparison. SPEye’s popup (Figure 8) is the main UI where the privacy comparison of RP’s SSO login options is presented. When the popup script receives an authorization request URL, it extracts the values in the scope parameter. We reviewed IdP documentation to map each scope value to descriptive text explaining the associated permissions. Where scope values from different IdPs refer to the same category of user information (e.g., user’s IdP profile info), we modify the text provided by the IdPs to provide a consistent description. In the final step, SPEye’s popup shows a privacy comparison of data attribute descriptions for each IdP login choice identified by the content script.
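The scope extraction step can be sketched as follows. The scope names are real values used by the three IdPs, but the description strings, the `SCOPE_TEXT` table, and the function name `describeScopes` are illustrative assumptions, not the text SPEye displays.

```javascript
// Illustrative descriptions only. Scope values from different IdPs that refer
// to the same category of data share one description.
const SCOPE_TEXT = {
  public_profile: "Basic profile info (name, profile picture)", // Facebook
  profile: "Basic profile info (name, profile picture)",        // Google
  name: "Name",                                                 // Apple
  email: "Email address",
  openid: "Account identifier",
  user_birthday: "Birthday",                                    // Facebook
};

function describeScopes(authUrl) {
  const raw = new URL(authUrl).searchParams.get("scope") || "";
  // OAuth 2.0 separates scope values with spaces; commas are also seen in practice.
  const scopes = raw.split(/[\s,]+/).filter(Boolean);
  return scopes.map((scope) => ({
    scope,
    text: SCOPE_TEXT[scope] || `Unrecognized permission (${scope})`,
  }));
}
```

Unknown scope values are kept and labelled rather than dropped, so a request for unfamiliar data still shows up in the comparison.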

Figure 8. SPEye’s UI showing a privacy comparison of SSO login options on imdb.com. To get such info without SPEye, the user must log in with each SSO option to view, manually collect, and compare the personal data requested by each SSO. Items not marked mandatory can be opted out of at the IdP login prompt. In the case of Apple SSO, SPEye indicates an IdP privacy feature (not shown in this image but available on the IdP UI) that allows the user to modify/hide information released to the RP.

4.5. User Interface design

Figure 8 shows SPEye’s popup UI containing a comparison of SSO options for a given RP’s login page. For each SSO login, the UI lists the information requested by the RP through the IdP’s user data APIs. For a subset of attributes, SPEye’s description informs users about the amount of data that will be released to the RP if the associated SSO choice is selected. For example, if the user’s IdP profile privacy settings are set to only reveal their age, the birthday API might only reveal the user’s year of birth to the RP. By enabling comparison across SSO choices, SPEye provides users with information to make informed decisions aligned with their privacy preferences. This design can also inform users about specific privacy features. For example, Sign in with Apple (https://support.apple.com/en-us/HT210425) offers Apple users the option to hide their email address by generating a unique per-RP email address which relays emails between the RP and the user’s email inbox.

5. Evaluation

We now evaluate SPEye’s effectiveness in extracting comparative information from different RP sites. The training set of our HTML-based SSO implementations was used to devise heuristics and here we test SPEye’s effectiveness on the testing set. We evaluated SPEye on a VM running Ubuntu 22.04 and Chrome version 102.0.5005.

SPEye’s approach avoids false matches

SPEye uses specific match patterns (Appendix A.1) to identify SSO login buttons. More general search criteria (e.g., identify all elements with one or more attributes containing the string “facebook”) could lead to wider coverage but also to false matches and unintended activity on the user’s IdP profile. For instance, many sites include non-SSO related buttons (with attributes similar to SSO buttons) such as an option to share a post to the user’s IdP profile. To prevent SPEye from making unintended requests, we restrict match patterns to SSO login buttons. In our testing, we did not find any false matches that led to SPEye making non-SSO related requests. After evaluating SPEye using the testing set, we built in additional match patterns observed in the testing set to improve its coverage.

SPEye misses

SPEye failed to extract data in two of 13 sites from the testing set. We now discuss these two cases in more detail.

unsplash.com. Although SPEye extracted and sent the initial request to the correct URL, the RP’s backend server returned a non-redirect response. Closer inspection of the site’s HTML page reveals <meta> tags containing custom CSRF token parameters (owaspCSRFinDOM). Extracting these is beyond scope in the current version of SPEye as they might be added to the login requests from the RP’s JavaScript code. However, to confirm our findings, we modified SPEye to resubmit the request that included these parameters, and the RP server responded with a redirect request for the selected choice.

themeforest.net. SPEye’s requests to this RP’s backend server were initially blocked. To identify why, we compared a request sent by SPEye with the same request triggered through the SSO button on the RP page. The requests were identical except for two HTTP request headers (sec-fetch-dest and sec-fetch-mode) typically added automatically by the browser. The values in these headers indicate the request’s mode to the server to inform if the request was initiated by a user clicking a link. SPEye cannot modify these parameters as they are read-only (https://developer.mozilla.org/en-US/docs/Web/API/Request/mode) and managed by the browser.

SPEye coverage

Our goal in using the testing set is to evaluate SPEye’s ability to extract protocol data from a variety of implementations. While SSO button identification is part of SPEye’s workflow, we focus less on its ability to automatically identify SSO buttons as our main evaluation criterion is protocol data extraction. To expand SPEye’s button coverage, additional match patterns can be added by scanning SSO buttons in sites beyond the 101 RPs in this study.

With additional match patterns for identifying SSO buttons, SPEye successfully extracted data from 11 of 13 sites in the testing set. For four of these, the existing match patterns (based on the first 13 sites) were sufficient for SPEye to identify SSO buttons. Extra match patterns were needed for the remaining seven test sites. Appendix A.1 lists these match patterns.

In total, after including match patterns from both the training and the testing sets, SPEye’s approach to extract authorization requests succeeded in 22 of 26 target RP sites. SPEye also partially covered one of the four remaining sites by extracting protocol data from two of three SSO options. These results demonstrate that it is feasible—given the correct match patterns for SSO buttons—to extract authorization requests that allow privacy comparison of SSO login options.

SPEye limitations

Although SPEye achieves reasonable success in target RP implementations, the current prototype has limitations:

  • SPEye’s targets are limited to HTML-based code. While it is thus not a general-purpose tool that provides data across all RPs, as a proof-of-concept we believe the findings in Section 3 provide the basis from which to build an open-source tool capable of handling a variety of RP sites.

  • In our dataset of the 101 SSO-supporting RPs (in top 300 sites) (Section 3), we found 23 unique IdPs. SSO comparison in our current prototype covers the top three IdPs. This does not provide a full comparison for sites that offer login with other IdPs (which might request more or less personal data). It is difficult to cover all IdPs; for example, there are at least 81 OAuth providers on a Wikipedia IdP list (wikipediaOAuthProviders). We expect future versions of SPEye to support additional IdPs for a wider comparison of SSO choices.

  • SPEye’s extraction of privacy information shows what data an RP requests, but cannot determine how the requested data is used. Users might want to weigh the asserted purpose before granting access to personal data. While an RP’s privacy policy might offer information on intended use, it is difficult to ascertain actual usage (or even intent) without access to RP systems or developers.

6. Discussion

This section discusses additional privacy considerations relating to SSO and how these complement SPEye’s goals. We also discuss a potential security extension to our work.

Permissions requested after initial login

SPEye scans a given RP’s login page to extract the permissions requested through the individual SSO login options. This approach provides comparison of permissions requested at the initial login prompt (where permissions are not usually visible to users), but does not include data the RP might request at a later time (e.g., after user login). Once the user has completed login with an IdP, it is possible for the RP to request additional permissions. A privacy-unfriendly RP might request additional permissions gradually at various points in its workflow (escaping SPEye’s scans), perhaps across many different interactions with the RP, leading to a form of ‘permission creep’ — a progressive increase in permissions — as the user performs different tasks. The UI workflow for these extra requests explicitly prompts the user to grant or deny the additional permissions ‘just-in-time’, without having to complete login or make a new login choice.

IdP guidelines (e.g., (facebookOauthPermissions)) suggest RPs should request additional permissions in the context where the data is needed. Although this design aims to give users better control over what data is requested and how it might be used, it withholds from users overall context about how much other data is already being collected or how this new permission fits into this overall picture. In a type of dark pattern (narayanan2020dark), an RP could request minimal information during the initial login (and thus appear privacy-friendly in SPEye) and then gradually request additional permissions, hoping that the user will not realize the extent of data collected, or will feel compelled to stick with their decision having invested time and effort into it.

SPEye provides comparative information necessary to make an informed initial login choice and offers an example of the type of information that users need to make informed decisions. Future research may explore how to keep users informed throughout their relationship with an RP.

RP handling of opted-out permissions

Except for basic information such as the user’s name and profile picture, major IdPs (e.g., Google and Facebook) classify other permissions as ‘optional’ and allow users to decline such permissions requested by an RP. The apparent intent by the IdPs is to give users control over their data. However, as described in Section 2.2 and Figure (a)a(d), some RP sites use tactics to persuade users to grant permissions despite being labelled as optional on the IdP’s login prompts (if users want to use the given IdP). Although RPs may have legitimate uses for the data, such workarounds suggest the potential presence of a misleading or ‘dark’ pattern (narayanan2020dark) in RP implementations by bypassing the IdP’s intended privacy controls. SPEye is unable to detect such RP workarounds because they occur outside of the SSO login interaction. Conflicting cues in existing SSO UIs (e.g., whether access to specific user data is mandatory (or not) for successful login at the RP site), as visible in Figure (a)a(c) and (d), may confuse users and consequently reduce user trust in the privacy of SSO systems. Tools like SPEye can help clarify which user data items are available for opt-out at the IdP UIs.

SSO privacy in mobile apps

The OAuth 2.0 spec (oauth2rfc) originally targeted web apps and is less suited for implementation in mobile apps. The mobile ecosystem is sufficiently different that it has its own issues separate from web implementations. Platform differences in mobile OSs make it challenging to securely implement OAuth 2.0 in mobile apps. A 2014 study (chen2014oauth) of mobile apps found a significant number of vulnerable RP implementations, primarily caused by custom solutions for storing or delivering secrets. For example, they found that due to the absence of a secure redirection mechanism (in iOS and Android) from an IdP site (on a mobile browser) to an RP mobile app, access tokens could not be delivered safely. Furthermore, UI constraints in mobile apps require a novel approach to inform mobile users. Our work analyzed web SSO implementations and built SPEye to automatically extract protocol data from RPs in desktop web browsers; addressing the parallel but unique problem in the mobile ecosystem and with mobile apps as RPs is equally valuable but beyond our current scope.

SPEye-like tool to inform security decisions

SPEye demonstrates the feasibility of extracting protocol data from RP sites; we believe our approach could also be used to convey security information as well as privacy information. Security-related information extracted using methods from previous security tools can be presented to users in an SPEye-like comparative fashion (but now showing security rather than privacy-related info). For example, a security scan could use the information available through SPEye’s existing design to flag RP client-side code that uses OAuth implicit flows (known to be weak) or that lacks CSRF protection (applied through the OAuth state parameter). In this way, an augmented version of SPEye could warn users about vulnerable RP implementations before they commit to SSO login decisions.
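Such a check could operate directly on the authorization request URL that SPEye already extracts. The sketch below is hypothetical and not part of the current prototype: the function name `securityWarnings` and the warning labels are assumptions. It relies only on the standard OAuth 2.0 `response_type` and `state` request parameters.

```javascript
// Hypothetical check on an extracted authorization request URL.
function securityWarnings(authUrl) {
  const params = new URL(authUrl).searchParams;
  const warnings = [];
  // response_type may hold several space-separated values; "token" means an
  // access token is returned in the redirect itself (implicit flow).
  const responseTypes = (params.get("response_type") || "").split(/\s+/);
  if (responseTypes.includes("token")) warnings.push("implicit-flow");
  // A missing state parameter suggests the request lacks CSRF protection.
  if (!params.get("state")) warnings.push("missing-state");
  return warnings;
}
```

This treats a missing `state` value as a warning sign only. An RP could protect against CSRF by other means that are not visible in the request URL.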

7. Related Work

This section discusses related work under four headings, the first of which is closest to our work.

SSO measurements

Existing tools used in privacy measurements primarily rely on browser automation to scan RP sites and execute OAuth and OpenID flows. In a longitudinal measurement study, Järpehult et al. (jarpehult2022longitudinal) developed a Selenium-based tool to track differences in RP permission requests and IdP usage over a nine-year period. Morkonda et al. (morkonda2021empirical) developed OAuthScope based on Selenium to extract differences in permission requests among SSO login choices offered in RP sites. OpenWPM (englehardt2016online) is a comprehensive privacy measurement framework that relies on Selenium to instrument user tracking techniques employed by site operators.

SPEye’s approach differs from these tools as our end-user focus results in constraints (Section 4.2) leading our design to explicitly avoid browser automation. Instead, SPEye statically analyzes loaded RP HTML pages to search for SSO-related code fragments, and then makes login requests to RP backend servers, and from responses extracts SSO protocol data to provide privacy comparisons.

Shehab et al. (shehab2011roauth) proposed ROAuth as an extension to OAuth 2.0 and built a Firefox browser extension to allow Facebook users to configure (and possibly limit) requested permissions in SSO logins with Facebook. Li et al. (li2019oauthguard) developed OAuthGuard, a Chrome browser extension that monitors HTTP traffic to detect vulnerable OAuth 2.0 implementations.

While the above extensions extract protocol data by intercepting user-triggered OAuth requests, our tool automatically extracts the data once the extension is active, as the primary goal of SPEye is to provide comparative privacy information before the user chooses a login provider. We repeat that SPEye extracts data from multiple SSO login options as opposed to only one SSO login after its selection by the user.

Automated SSO security testing

Zhou and Evans (zhou2014ssoscan) built SSOScan to test the security of Facebook’s SSO implementation in RP sites by automating OAuth login flows. Sun and Beznosov (sun2012devil) performed a security analysis of OAuth 2.0 implementations across three IdPs and 96 RP sites. Drakonakis et al. (drakonakis2020cookie) developed XDriver to scan web login implementations (including RPs that offer login with Facebook and Google) for authentication and authorization flaws related to exposed cookies. Ghasemisharif et al. (ghasemisharif2018single) evaluated the feasibility of access revocation in RP sites following an IdP account compromise and presented the resulting security implications. Similar to SPEye, several tools (e.g., (ghasemisharif2018single; zhou2014ssoscan; drakonakis2020cookie; morkonda2021empirical)) rely on custom heuristics to search RP HTML pages for CSS attributes and forms related to SSO login buttons.

SSO analysis approaches

Mainka et al. (mainka2017sok) categorized SSO testing approaches used by previous research and discussed the benefits and limitations of each analysis approach. Based on an attacker IdP approach, they built PrOfESSOS, a security analysis tool for testing OpenID Connect implementations. Fett et al. (fett2016comprehensive) utilized formal analysis techniques to evaluate security properties offered by the OAuth 2.0 protocol. Rahat et al. (rahat2022cerberus) developed Cerberus to test OAuth implementations for vulnerabilities and compliance with security best practices. They also developed OAuthLint (al2019oauthlint) which uses a similar approach to test OAuth code in mobile apps. Yang et al. (yang2016model) incorporated a model of the OAuth protocol in their OAuthTester tool to evaluate the security of OAuth RP and IdP implementations. They use fuzzing to tamper with security-critical parameters including the state and redirect URI values. Bai et al. (bai2013authscan) developed AuthScan to model RP implementations and automatically extract protocol specifications from SSO systems. Using formal verification techniques, Wang et al. (wang2013explicating) modelled security assumptions made by OAuth app developers in the use of IdP SDKs. Their findings highlight RP implementation flaws due to a disconnect between implicit security assumptions in the SDKs and those made by app developers.

SSO privacy and usability

A recent survey on Google SSO by Balash et al. (balash2022security) revealed several privacy issues including user concerns on granting access to sensitive information such as personal emails and contacts. Their findings also reveal considerable variations in users’ understanding of the necessity of specific permission requests, suggesting lack of transparency in current SSO systems. Sun et al. (sun2011makes) conducted a user study with SSO users and found similar privacy concerns in granting RPs access to personal user data. Robinson and Bonneau (robinson2014cognitive) surveyed SSO users and found that text descriptions of permission requests in Facebook SSO were ineffective in informing users. Alaca and Van Oorschot (alaca2020comparative) compared 14 web SSO systems on protocol features including security, privacy, and usability. Kelley et al. (kelley2010standardizing) introduced “privacy nutrition labels” for sites across the web and evaluated their usability in informing web users about privacy policies. Recently, Apple used a similar approach and introduced privacy labels for iOS apps (applePrivacyLabels). The AppCensus tool, by Reardon et al. (reardon201950), automatically instruments the privacy behaviour of Android apps to generate privacy labels.

8. Concluding remarks

A general design goal is to provide privacy tools that allow users to better control access to their online personal data. Current SSO workflows do not support informed login decisions based on users’ privacy preferences. On an RP site, users must commit to a login choice before they can view or compare the permissions requested. This lack of transparency risks leading to choices that unintentionally reveal more information than desired.

Our SPEye Chrome extension scans RP sites and extracts information to inform SSO privacy decisions by providing a comparison of permission requests across multiple IdP options. We illustrate SPEye’s approach through a comparison of 101 popular RP sites allowing identification of four client-side code patterns, highlighting similarities and variations across RP implementations. These patterns inform the design of SPEye and future like-minded tools by identifying code features to target when extracting protocol data from RP implementations. We demonstrated SPEye’s ability to extract and display comparative data from a variety of RP sites.

The number of SSO login options supported by RP sites is expected to grow. We hope that our work on SPEye may help move the community towards enabling users to “Sign in with privacy”.

References

Appendix A Identifying SSO components

This appendix illustrates typical SSO implementation fragments observed in RP sites related to our analysis in Section 4.

A.1. Match patterns for SSO buttons

Listing A.1 provides a list of CSS selectors for identifying DOM elements related to SSO buttons. These were derived from 26 RP sites and divided into two sets. The first set includes match patterns identified in the training set of 13 RP sites. SPEye’s evaluation in Section 5 involved additional search patterns in the testing set.

    // CSS selector patterns from Training set
    "#idp_form"
    "[aria-label='Sign in with idp']"
    "#social-login-idp"
    "[data-testid*='idp-login']"
    "[data-cy*='idp-sign-in']"
    "[href*='signin/idp']"
    "#js-idp-oauth-login"
    "[href*='connect/idp']"
    "[href*='signin?openid']"
    ".fm-sns-item.idp"
    ".idp-button", "#signin_idp_btn"
    "[data-test*='provider-button-idp']"
    "[action*='auth/idp']"

    // CSS selector patterns from Testing set
    "[href*='connector/idp']"
    "[href*='login/idp']"
    "[data-provider*='idp']"
    "[href*='idp/auth']"
    "[href*='client=idp']"
    ".iVatvW", ".dwhcjJ"
    "[href*='third_party=idp']"
    "[class*='loginform-btn--idp']"
    "[href*='sso/idp']"

Listing A.1. CSS selectors used to identify SSO login buttons.
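The selector patterns appear to use `idp` as a placeholder for an IdP name. Under that assumption (ours, not stated in the listing), the patterns could be instantiated per IdP as in this sketch, where `selectorsFor` is a hypothetical helper name.

```javascript
// The three IdPs covered by SPEye.
const IDPS = ["facebook", "google", "apple"];

// Assumption: the token "idp" in a pattern stands for the IdP name.
// Patterns without the token (e.g., "[href*='signin?openid']") pass through unchanged.
function selectorsFor(templates, idp) {
  return templates.map((template) => template.replaceAll("idp", idp));
}

// In a content script, the instantiated selectors could be combined into a
// single query, e.g.
//   document.querySelectorAll(selectorsFor(templates, "google").join(", "))
```

Keeping the patterns as templates means one list serves all supported IdPs, and a new IdP could be added by extending `IDPS`.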

A.2. Identity Provider URLs

Each IdP assigns one or more endpoints to receive authorization and authentication requests in SSO logins. SSO server endpoints for the top three IdPs can be identified using regular expression strings in Listing A.2. This relates to the work discussed in Section 4.4.

    // Facebook
    "https://(.*)\\.facebook\\.com/login(.*)"
    + "|https://(.*)\\.facebook\\.com/oauth(.*)"
    + "|https://graph\\.facebook\\.com/(.*)"
    + "|https://(.*)\\.facebook\\.com/(.*)/oauth(.*)"

    // Google
    + "|https://(.*)\\.google\\.com/(.*)/oauth(.*)"
    + "|https://oauth2\\.googleapis\\.com/(.*)"
    + "|https://openidconnect\\.googleapis\\.com/(.*)"
    + "|https://googleapis\\.com/oauth(.*)"

    // Apple
    + "|https://(.*)\\.apple\\.com/auth(.*)";

Listing A.2. Regular expression patterns for finding IdP URLs.