Hacking Google reCAPTCHA v3 using Reinforcement Learning

03/03/2019 ∙ by Ismail Akrout, et al. ∙ University of Toronto

We present a Reinforcement Learning (RL) methodology to bypass Google reCAPTCHA v3. We formulate the problem as a grid world in which the agent learns how to move the mouse and click on the reCAPTCHA button to receive a high score. We study the performance of the agent as we vary the cell size of the grid world and show that performance drops when the agent takes big steps toward the goal. Finally, we use a divide and conquer strategy to defeat the reCAPTCHA system at any grid resolution. Our proposed method achieves a success rate of 97.4%.




I Introduction

Artificial Intelligence (AI) has been experiencing unprecedented success in recent years thanks to the progress accomplished in Machine Learning (ML), and more specifically Deep Learning (DL). These advances raise several questions about AI safety and ethics [1].

In this work, we do not provide an answer to these questions but we show that AI systems based on ML algorithms such as reCAPTCHA v3 [2] are still vulnerable to automated attacks.

Google’s reCAPTCHA system is the most widely used defense mechanism for telling bots and humans apart. Its purpose is to protect websites against automated agents, attacks, and spam. Previous versions of Google’s reCAPTCHA (v1 and v2) present tasks (images, letters, audio) easily solved by humans but challenging for computers.
reCAPTCHA v1 presented a distorted text that the user had to type correctly to pass the test. This version was defeated by Bursztein et al. [3] with 98% accuracy using an ML-based system to segment and recognize the text. As a result, image-based and audio-based reCAPTCHAs were introduced as a second version. Researchers have also succeeded in breaking these versions using ML and, more specifically, DL. For example, the authors in [4] designed an AI-based system called UnCAPTCHA to break Google’s most challenging audio reCAPTCHAs.
On 29 October 2018, the official third version was released [5], removing any user interface. Google’s reCAPTCHA v3 uses ML to return a risk assessment score between 0.0 and 1.0. This score characterizes the trustability of the user; a score close to 1.0 means that the user is likely human.

In this work, we introduce an RL formulation to solve this reCAPTCHA version. Our approach is programmatic: first, we propose a plausible formalization of the problem as a Markov Decision Process (MDP) solvable by state-of-the-art RL algorithms; then, we introduce a new environment for interacting with the reCAPTCHA system; finally, we analyze how the RL agent learns or fails to defeat Google reCAPTCHA.

Experimental results show that the RL agent passes the reCAPTCHA test with a 97.4% success rate. To our knowledge, this is the first attempt to defeat reCAPTCHA v3 using RL. In summary, this paper makes the following distinct contributions:

  • We show how to formulate the user’s mouse movement as a learning task in an RL environment;

  • We present an RL agent capable of defeating the newest version of reCAPTCHA;

  • We develop an environment to simulate the user experience with websites using the reCAPTCHA system;

  • We propose a scalable and efficient method to defeat reCAPTCHA at different environment sizes.

The rest of the paper is organized as follows: we present related work in section II. The RL formulation is detailed in section III. Sections IV and V present the obtained results and an empirical analysis.

II Related Work

Researchers have intensively investigated the security of reCAPTCHA systems. Early studies used a two-fold process consisting of a segmentation step followed by a recognition algorithm [6]. This approach was used to solve text-based reCAPTCHAs. The work in [3] unified this process into a single-step method that uses ML to perform segmentation and recognition jointly, with a “human in the loop” technique to train the character recognizer. However, this technique is not efficient for modern versions since they contain complex backgrounds or overlapping letters. Therefore, researchers resorted to DL techniques: [7] trained a Convolutional Neural Network (CNN) to solve textual reCAPTCHAs and proposed an active learning technique to overcome the limited amount of available data. More recently, the authors in [8] adopted a Generative Adversarial Network (GAN) based approach, learning a synthesizer to train a classifier on synthetic reCAPTCHAs and then fine-tuning the classifier on a small set of real reCAPTCHA images. The work in [9] proposed Style Area Captcha (SACaptcha), based on neural style transfer techniques, to attack image-based reCAPTCHA schemes. Another example of applying DL to defeating reCAPTCHAs is [10], in which the authors implemented a 2D RNN-LSTM that achieved a high success rate on merged-type textual reCAPTCHAs. Regarding audio-based reCAPTCHAs, [4] succeeded in defeating the most challenging audio tests using their framework UnCAPTCHA.

Unlike these studies, we focus on defeating the newest version, v3, of reCAPTCHA. As mentioned before, this version is more challenging than the previous ones since it is completely invisible to users and does not require any interaction with them. In the previous versions, the user is provided with a text, an image, or an audio clip, which could be exploited as input to train neural networks. Solving the reCAPTCHA problem using RL as advocated in this paper is therefore an important departure. We believe that RL is the most suitable approach for such applications.

III Method

III-A Preliminaries

An agent interacting with an environment is modeled as a Markov Decision Process (MDP) [11]. An MDP is defined as a tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$ where $\mathcal{S}$ and $\mathcal{A}$ are the sets of possible states and actions respectively, $\mathcal{P}$ is the transition probability between states, and $\mathcal{R}$ is the reward function. Our objective is to find an optimal policy $\pi^*$ that maximizes the future expected rewards.

Policy-based methods directly learn $\pi$. Let’s assume that the policy is parameterized by a set of weights $\theta$ such that $\pi = \pi_\theta(a|s)$. Then, the objective is defined as:

$$J(\theta) = \mathbb{E}_{\pi_\theta}\Big[\sum_{t=0}^{T} \gamma^t r_t\Big] \quad (1)$$

where $\gamma \in [0, 1]$ is the discount factor and $r_t$ is the reward at time $t$.

Thanks to the policy gradient theorem and the log-derivative trick [12], the Reinforce algorithm estimates gradients using:

$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\Big[\sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t|s_t)\, G_t\Big] \quad (2)$$

where $G_t$ is the future discounted return at time $t$, defined as $G_t = \sum_{k=t}^{T} \gamma^{k-t} r_k$, where $T$ marks the end of an episode.

Usually, equation (2) is formulated as the gradient of a loss function $L(\theta)$ defined as follows:

$$L(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{t=0}^{T_i} \log \pi_\theta(a_t^i|s_t^i)\, G_t^i \quad (3)$$

where $N$ is the number of collected episodes.
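The loss above can be sketched numerically. The following is an illustrative NumPy sketch (the function names are ours, not the paper’s), shown for a single episode ($N = 1$):

```python
import numpy as np

def discounted_returns(rewards, gamma):
    """G_t = sum_{k=t}^{T} gamma^(k-t) * r_k, computed backwards in time."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def reinforce_loss(log_probs, rewards, gamma):
    """Single-episode Reinforce loss: -sum_t log pi(a_t|s_t) * G_t."""
    G = discounted_returns(rewards, gamma)
    return -float(np.sum(np.asarray(log_probs) * G))
```

With a terminal reward only (as in the reCAPTCHA setting, where feedback arrives at the end of an episode), the returns decay geometrically from the final step back to the first.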

III-B Settings

To pass the reCAPTCHA test, a human user moves the mouse from an initial position, performs a sequence of steps until reaching the reCAPTCHA check-box, and clicks on it. Depending on this interaction, the reCAPTCHA system rewards the user with a score.

In this work, we model this process as an MDP where the state space $\mathcal{S}$ is the set of possible mouse positions on the web page and the action space is $\mathcal{A} = \{up, down, left, right\}$. Using these settings, the task becomes similar to a grid world problem.

Fig. 1: The agent’s mouse movement in an MDP

As shown in Figure 1, the starting point is the initial mouse position and the goal is the position of the reCAPTCHA check-box in the web page. A grid is then constructed where each pixel between these two points is a possible position for the mouse.

We assume that a normal user will not necessarily move the mouse pixel by pixel. Therefore, we define a cell size $c$, which is the number of pixels between two consecutive positions. For example, if the agent is at position $(x, y)$ and takes the action left, the next position is $(x - c, y)$.
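The transition rule can be sketched as follows. This is a minimal Python sketch; the action names and the boundary clamping are our assumptions, not the paper’s code:

```python
# Illustrative grid-world transition: each action moves the cursor by
# cell_size pixels, clamped so the mouse stays on the page.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(position, action, cell_size, width, height):
    """Return the next mouse position after taking `action`."""
    dx, dy = ACTIONS[action]
    x = min(max(position[0] + dx * cell_size, 0), width - 1)
    y = min(max(position[1] + dy * cell_size, 0), height - 1)
    return (x, y)
```

Setting `cell_size = 1` recovers pixel-by-pixel movement; larger values produce the bigger jumps studied in section VI.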

One of our technical contributions is the ability to simulate the same user experience as any normal reCAPTCHA user. This was challenging since the reCAPTCHA system uses different methods to detect fake or headless browsers, inorganic mouse behavior, etc. Our environment overcomes all these problems. For more details about the environment implementation, please refer to section VII.

At each episode, a browser page opens with the mouse at a random position, and the agent takes a sequence of actions until reaching the reCAPTCHA or hitting the time limit. Once the episode ends, the agent receives the feedback of the reCAPTCHA algorithm as would any normal user.

IV Experimental results

We trained a Reinforce agent on a grid world of fixed size. Our approach simply applies the trained policy to choose actions in the reCAPTCHA environment. The results presented are success rates across test runs. We consider that the agent successfully defeated the reCAPTCHA if it obtained a score of 0.9, the highest score we observed (see Table I). The policy network was a vanilla network with two fully connected layers. In our experiments, the discount factor, learning rate, and batch size were held fixed.
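A sketch of such a policy network follows; the layer sizes, the tanh activation, and the softmax head are illustrative assumptions, not the paper’s exact architecture:

```python
import numpy as np

class TwoLayerPolicy:
    """Vanilla policy with two fully connected layers and a softmax output
    over the four grid-world actions."""
    def __init__(self, n_states, n_hidden, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_states, n_hidden))
        self.w2 = rng.normal(0.0, 0.1, (n_hidden, n_actions))

    def action_probs(self, state_onehot):
        """Map a one-hot encoded mouse position to action probabilities."""
        h = np.tanh(state_onehot @ self.w1)
        logits = h @ self.w2
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        return exp / exp.sum()
```

During training, actions are sampled from `action_probs` and the weights are updated by the gradient of the Reinforce loss of section III-A.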

Table I shows the results for this grid. Our method successfully passed the reCAPTCHA test with a success rate of 97.4%.

reCAPTCHA v3 reward | 0.0 | 0.1 | 0.3 | 0.7 | 0.9
Percentage (%)      | 0.1 | 2.4 | 0.1 | 0.1 | 97.4
TABLE I: The distribution of rewards over test runs

Next, we consider testing our method on bigger grid sizes. If we increase the size of the grid, the state space grows quadratically with the grid side length, and training a Reinforce agent on such a high-dimensional state space quickly becomes infeasible. For example, doubling both sides of the grid multiplies the number of states, and hence the dimension of the policy’s input, by four. This is another challenge that we address in this paper: how can we attack the reCAPTCHA system at different resolutions without training a separate agent for each resolution?

V An efficient solution to any grid size

In this section, we propose a divide and conquer technique to defeat the reCAPTCHA system at any grid size without retraining the RL agent. The idea consists in dividing the grid into sub-grids whose size matches the grid the agent was trained on, and then applying the trained agent on these sub-grids to find a strategy for the bigger screen (see Figure 2). Figure 3 shows that this approach is effective across all the tested sizes.
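The tiling step can be sketched as follows; the function and the tile representation are our illustration of Figure 2, not the authors’ code. The large grid is cut into tiles the size of the trained agent’s grid, and the agent runs sequentially on the diagonal tiles:

```python
def diagonal_subgrids(width, height, tile):
    """Return the top-left corners of the diagonal tiles the agent visits,
    moving one tile right and one tile down at each step (as in Figure 2).
    Off-diagonal tiles are never explored."""
    corners = []
    x = y = 0
    while x < width and y < height:
        corners.append((x, y))
        x += tile
        y += tile
    return corners
```

Within each tile, the trained policy navigates from the tile’s entry corner toward its opposite corner, so concatenating the per-tile trajectories yields a path across the full-resolution grid.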

Fig. 2: Illustration of the divide and conquer approach: the agent runs sequentially on the diagonal grid worlds in purple. The grid worlds in red are not explored.
Fig. 3: Success rate of the RL agent on different grid resolutions

VI Effect of cell size

Here, we study the sensitivity of our approach to the cell size as illustrated in Figure 4.

(a) Grid world with a cell size of 1×1 pixel
(b) Grid world with a cell size of 3×3 pixels
Fig. 4: The grid world with (a) a cell size of 1×1 and (b) a cell size of 3×3

Figure 5 illustrates the obtained performance. We observe that as the cell size increases, the success rate of the agent decreases. For the largest tested cell size, the RL agent is detected as a bot on a large fraction of the test runs. We believe this decline is explained by the fact that, with a big cell size, the agent’s trajectory contains larger jumps, which may be considered non-human behavior by the reCAPTCHA system.

Fig. 5: Success rates for different cell sizes

VII Details of the reCAPTCHA environment

In this section, we provide detailed technical information about the reCAPTCHA environment.

Most of the previous works (e.g. [4]) used the browser automation software Selenium [13] to simulate interactions with the reCAPTCHA system. At the beginning, we adopted the same approach, but we observed that the reCAPTCHA system always returned low scores, suggesting that the browser was detected as fake. After investigating the headers of the HTTP queries, we found an automated header in the webdriver and some additional variables that are not defined in a normal browser, indicating that the browser is controlled by a script. This was confirmed when we observed that the reCAPTCHA system always returned a low score even when a human user interacted with it through Selenium.

This problem can be solved in two different ways. The first consists in creating a proxy to remove the automated header; the second is to launch a browser from the command line and control the mouse using dedicated Python packages such as the PyAutoGUI library [14]. We adopted the second option since we cannot control the mouse with Selenium. Hence, unlike previous approaches, our environment does not use browser automation tools.
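As an illustration of this second option, the sketch below converts an action sequence into pixel waypoints and replays them with PyAutoGUI’s real `moveTo`/`click` calls. The helper names are ours, and `replay` requires a live display:

```python
def actions_to_waypoints(start, actions, cell_size):
    """Turn the agent's action sequence into the pixel positions to visit."""
    deltas = {"up": (0, -cell_size), "down": (0, cell_size),
              "left": (-cell_size, 0), "right": (cell_size, 0)}
    x, y = start
    points = [(x, y)]
    for a in actions:
        dx, dy = deltas[a]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

def replay(points):
    """Drive the real OS cursor along the trajectory, then click.
    Needs a display server, so the import is kept local to this function."""
    import pyautogui
    for x, y in points:
        pyautogui.moveTo(x, y, duration=0.05)  # paced motion, not a teleport
    pyautogui.click()
```

Because PyAutoGUI moves the operating system’s cursor rather than scripting the browser, no webdriver flags or automation headers are exposed to the page.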

Another attempt, using Tor [15] to change the IP address, did not pass the reCAPTCHA test and also resulted in low scores. It is possible that the reCAPTCHA system uses an API service such as ExoneraTor [16] to determine whether an IP address was part of the Tor network on a specific date.

We also discovered that simulations running in a browser with a connected Google account receive higher scores than when no Google account is associated with the browser.

To summarize, in order to simulate a human-like experience, our reCAPTCHA environment (1) does not use browser automation tools, (2) is not connected through a proxy or VPN, and (3) is not logged in with a Google account.

VIII Conclusion and future work

This paper proposes an RL formulation to successfully defeat the most recent version of Google’s reCAPTCHA. The main idea consists in modeling the reCAPTCHA test as finding an optimal path in a grid. We show that our approach achieves high success rates on various resolutions using a divide and conquer strategy.
This paper should be considered a first attempt to pass the reCAPTCHA test using RL techniques. Next, we will deploy our approach on multiple pages and verify whether the reCAPTCHA adaptive risk analysis engine can detect the pattern of attacks more accurately by looking at activities across different pages of a website.

IX Acknowledgments

We thank Douglas Tweed for his valuable feedback and helpful discussions.


  • [1] D. Amodei, C. Olah, J. Steinhardt, P. F. Christiano, J. Schulman, and D. Mané, “Concrete problems in ai safety,” CoRR, 2016.
  • [2] Google, “reCAPTCHA v3’s website,” https://developers.google.com/recaptcha/docs/v3, 2018, [Online; accessed 3-March-2019].
  • [3] E. Bursztein, J. Aigrain, A. Moscicki, and J. Mitchell, “The end is nigh: Generic solving of text-based captchas,” USENIX Workshop on Offensive Technologies, 2014.
  • [4] K. Bock, D. Patel, G. Hughey, and D. Levin, “uncaptcha: A low-resource defeat of recaptcha’s audio challenge,” USENIX Workshop on Offensive Technologies, 2017.
  • [5] Google, “reCAPTCHA v3’s official announcement,” https://webmasters.googleblog.com/2018/10/introducing-recaptcha-v3-new-way-to.html, 2018, [Online; accessed 15-February-2019].
  • [6] K. Chellapilla and P. Simard, “Using machine learning to break visual human interaction proofs (hips),” Advances in Neural Information Processing Systems, 2004.
  • [7] F. Stark, C. Hazirbas, R. Triebel, and D. Cremers, “Captcha recognition with active deep learning,” GCPR Workshop on New Challenges in Neural Computation, 2015.
  • [8] G. Ye, Z. Tang, D. Fang, Z. Zhu, Y. Feng, P. Xu, X. Chen, and Z. Wang, “Yet another text captcha solver: A generative adversarial network based approach,” Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018.
  • [9] M. Tang, H. Gao, Y. Zhang, Y. Liu, P. Zhang, and P. Wang, “Research on deep learning techniques in breaking text-based captchas and designing image-based captcha,” IEEE Transactions on Information Forensics and Security, 2018.
  • [10] C. Rui, Y. Jing, H. Rong-gui, and H. Shu-guang, “A novel lstm-rnn decoding algorithm in captcha recognition,” 2013 Third International Conference on Instrumentation, Measurement, Computer, Communication and Control, 2013.
  • [11] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming.   John Wiley & Sons, 2014.
  • [12] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction.   MIT press, 2018.
  • [13] Selenium, https://www.seleniumhq.org/, [Online; accessed 3-March-2019].
  • [14] PyAutoGUI, https://pyautogui.readthedocs.io/en/latest/, [Online; accessed 3-March-2019].
  • [15] Tor, https://www.torproject.org/, [Online; accessed 3-March-2019].
  • [16] ExoneraTor, https://metrics.torproject.org/exonerator.html, [Online; accessed 3-March-2019].