How to Bypass CAPTCHAs with Puppeteer Effortlessly

Web scraping can be a frustrating and challenging task. If you have some experience with web scraping or automation, you probably know how disruptive the “I’m not a robot” challenge can be.

CAPTCHAs are specifically designed to prevent bots from accessing websites, so they can be a real obstacle when you are scraping. That raises the question: how do you bypass CAPTCHAs with Puppeteer?

In this guide, we’ll go over how you can use Puppeteer, a popular Node.js library, to detect and bypass CAPTCHAs. Without further ado, let’s jump in.

What is a CAPTCHA?

Before diving into how to bypass CAPTCHAs with Puppeteer, let’s start by understanding what a CAPTCHA is in the first place. CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.”

It’s a test that helps verify whether a user is human. CAPTCHAs come in many forms; popular ones include distorted text, image recognition, and simple logic puzzles.

Many websites use CAPTCHAs to protect their content from automated bots that might be spamming, scraping, or, in the worst case, attacking them. As a result, CAPTCHAs pose real challenges for web scraping.

Here are two ways CAPTCHAs cause problems:

  • Technical complexities: A typical automation script can’t solve CAPTCHAs on its own, because they are built to require image processing and pattern recognition. To make it tougher, CAPTCHAs keep evolving and becoming more sophisticated, which makes it hard for automation tools to keep up.
  • Automation roadblocks: CAPTCHAs can break your web scraping script, which wastes resources and leads to frustration. Picture this: you spend a hefty amount of time building the perfect script, only to have it stopped by a CAPTCHA challenge.

What is Puppeteer?

Puppeteer is a Node.js library that can become your primary tool for web scraping tasks, including dealing with CAPTCHAs. It lets you control a Chrome browser programmatically (headless by default), which makes web scraping relatively easy. Some of Puppeteer’s most useful features are:

  • Navigating to web pages and interacting with elements
  • Taking screenshots and generating PDFs
  • Capturing network requests and responses
  • Injecting and evaluating JavaScript code
  • Automating form filling and clicking
  • And many more features useful for web crawling
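
To illustrate a few of these features together, here is a minimal sketch of a helper that records network responses, takes a screenshot, and evaluates JavaScript in the page. The helper name capturePageArtifacts and the output file example.png are illustrative assumptions, not part of Puppeteer; it expects a page created with browser.newPage().

```javascript
// Sketch: exercise several Puppeteer features on an existing page.
// `capturePageArtifacts` is a hypothetical helper, not a Puppeteer API.
async function capturePageArtifacts(page, url) {
  const responses = [];

  // Capture network responses as the page receives them
  page.on('response', (response) => {
    responses.push({ url: response.url(), status: response.status() });
  });

  // Navigate to the page
  await page.goto(url);

  // Take a screenshot of the visible viewport and save it to disk
  await page.screenshot({ path: 'example.png' });

  // Inject and evaluate JavaScript inside the page
  const title = await page.evaluate(() => document.title);

  return { title, responses };
}
```

Inside an async function, you could call it as const { title, responses } = await capturePageArtifacts(page, 'https://www.example.com');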

In addition to these features, Puppeteer gives you the building blocks to detect and get past CAPTCHAs. Let’s start by setting up Puppeteer and then see how to use it for that.

Installing and setting up Puppeteer

You can install Puppeteer as a project dependency with the npm or Yarn package manager. First, create a new folder, open a terminal in that directory, run npm init -y to initialize a project, and then run npm install puppeteer (or yarn add puppeteer if you use Yarn).

Basic Puppeteer Setup:

Here’s a basic Puppeteer script you can use to make sure everything is working:

const puppeteer = require('puppeteer');

(async () => {
  console.log("Launching browser...");
  const browser = await puppeteer.launch();
  console.log("Opening new page...");
  const page = await browser.newPage();
  console.log("Navigating to https://www.example.com...");
  await page.goto('https://www.example.com');
  console.log("Closing browser...");
  await browser.close();
  console.log("Script completed successfully!");
})();

Save the script as index.js and run it. The output should look something like this:

node index.js
Launching browser...
Opening new page...
Navigating to https://www.example.com...
Closing browser...
Script completed successfully!

Now you have a basic Puppeteer script, but the real challenge lies in detecting and bypassing CAPTCHAs.

Let’s see how we can do that.

Detecting CAPTCHAs with Puppeteer

Puppeteer doesn’t include a dedicated CAPTCHA detector, but it gives you several ways to inspect and analyze a page’s HTML, and you can also run your own JavaScript inside the page to check for CAPTCHAs. Detecting a CAPTCHA is the first step toward bypassing it. Here are some ways to detect them:
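
As a sketch of the custom-JavaScript approach, the helper below uses page.evaluate to scan the page’s iframes for URL fragments that commonly belong to CAPTCHA providers. The helper name hasCaptchaFrame and the hint list are assumptions for illustration; providers can change where they serve their widgets, so treat the list as a starting point rather than a complete one.

```javascript
// Sketch: detect CAPTCHA widgets by running custom JavaScript in the page.
// The URL fragments are assumptions about where common providers serve
// their widgets (reCAPTCHA, hCaptcha, Cloudflare Turnstile); not exhaustive.
const CAPTCHA_FRAME_HINTS = ['recaptcha', 'hcaptcha', 'challenges.cloudflare.com'];

async function hasCaptchaFrame(page, hints = CAPTCHA_FRAME_HINTS) {
  // The callback runs in the browser, so it cannot see Node.js variables;
  // the hint list is passed in as an argument instead.
  return page.evaluate((frameHints) => {
    return Array.from(document.querySelectorAll('iframe')).some((frame) =>
      frameHints.some((hint) => (frame.src || '').includes(hint))
    );
  }, hints);
}
```

Inside an async function, if (await hasCaptchaFrame(page)) { console.log('CAPTCHA detected!'); } would log a message when a matching iframe is present.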

Detecting CAPTCHAs using Element Inspection

Perhaps the easiest way to start is to use your browser’s Inspect Element tool (DevTools) on a page that shows a CAPTCHA. Look for specific classes or tags that indicate a CAPTCHA is present, and then check for those from your script.

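Puppeteer provides a page.$eval method that selects an element matching a given selector and then evaluates a function on it. However, page.$eval throws an error when no element matches, so for a simple presence check page.$ is a better fit: it resolves to an element handle, or to null when nothing matches.

Here is how you can do it:

// '.captcha-container' is a placeholder; use the class you found while inspecting the page
const captchaElement = await page.$('.captcha-container');
// Check if the element exists
if (captchaElement) {
  // Log a message if a CAPTCHA was detected
  console.log('CAPTCHA detected!');
}
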
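To check for several CAPTCHA providers at once, you could loop over a list of selectors with page.$, which resolves to null when nothing matches. The selector list below is an assumption based on the embed markup that reCAPTCHA, hCaptcha, and Cloudflare Turnstile commonly use; sites can rename or wrap these elements, so adjust it for the site you target. The helper name findCaptchaSelector is hypothetical.

```javascript
// Sketch: check a page against several selectors that commonly mark CAPTCHA widgets.
// These selectors are assumptions drawn from common providers' embed markup.
const CAPTCHA_SELECTORS = [
  '.g-recaptcha',             // Google reCAPTCHA v2 container
  '.h-captcha',               // hCaptcha container
  '.cf-turnstile',            // Cloudflare Turnstile container
  'iframe[src*="recaptcha"]', // reCAPTCHA iframe
];

// Returns the first selector that matches, or null if none do.
async function findCaptchaSelector(page, selectors = CAPTCHA_SELECTORS) {
  for (const selector of selectors) {
    // page.$ resolves to null (instead of throwing) when nothing matches
    const handle = await page.$(selector);
    if (handle) {
      // Release the handle; we only needed to know the element exists
      await handle.dispose();
      return selector;
    }
  }
  return null;
}
```

Returning the matched selector, rather than just true or false, tells you which kind of widget you are dealing with.
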
Detecting CAPTCHAs using Attribute Checks

Another way to look for CAPTCHAs is to check for HTML attributes, rather than classes, that indicate a CAPTCHA is present.

Attributes such as data-sitekey or data-captcha-id are often used to set up CAPTCHA widgets. This time, you will use the page.$$eval method. Unlike page.$eval, which runs your function on the first element that matches a selector, page.$$eval passes an array of all matching elements to your function, and that array is simply empty when nothing matches.

Here’s how you can use it:

// Count all elements that carry a data-sitekey or data-captcha-id attribute.
// The attribute is often on a <div> rather than an <input>, so match any tag.
const captchaCount = await page.$$eval('[data-sitekey], [data-captcha-id]', (elements) => elements.length);
// Check if there are any matching elements
if (captchaCount > 0) {
  // Log a message if a CAPTCHA is detected based on its attributes
  console.log('CAPTCHA detected based on data-sitekey or data-captcha-id attributes!');
}
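
If you want to know which widget a page embeds, for logging or debugging, you can read the attribute values instead of only counting elements. This is a minimal sketch; the helper name getCaptchaSiteKeys is hypothetical.

```javascript
// Sketch: collect the values of data-sitekey attributes on the page.
// page.$$eval passes an array of all matching elements to the callback
// (an empty array when nothing matches), so it does not throw on "no match".
async function getCaptchaSiteKeys(page) {
  return page.$$eval('[data-sitekey]', (elements) =>
    elements
      .map((el) => el.getAttribute('data-sitekey'))
      .filter((key) => key) // drop empty attribute values
  );
}
```

An empty array means no element on the page carries a data-sitekey attribute.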

Bypassing CAPTCHAs with Puppeteer

Now that we have seen how Puppeteer can detect CAPTCHAs, let’s look at how to get past them.

Solving CAPTCHAs

The most direct way to get through a CAPTCHA is to use a CAPTCHA-solving service such as 2Captcha. You send the CAPTCHA to the service, it solves the challenge on its side, and it returns the answer to your Puppeteer script, which then uses that answer to complete the challenge and access the website.

First, sign up for the CAPTCHA-solving service and get your API key. Then integrate the service’s API into your Puppeteer script.
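
Rather than hard-coding the API key in your script, you could read it from an environment variable. This is a small sketch; the variable name TWOCAPTCHA_API_KEY and the helper getSolverApiKey are hypothetical choices, not something 2Captcha requires.

```javascript
// Sketch: read the solver API key from an environment variable instead of
// hard-coding it. TWOCAPTCHA_API_KEY is a hypothetical variable name.
function getSolverApiKey(env = process.env) {
  const key = env.TWOCAPTCHA_API_KEY;
  if (!key) {
    // Fail early with a clear message instead of sending an empty key
    throw new Error('Set the TWOCAPTCHA_API_KEY environment variable first.');
  }
  return key;
}
```

You would then pass getSolverApiKey() wherever the script uses 'YOUR_API_KEY'.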

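Here’s what the integration can look like for a simple image CAPTCHA:

// Import Puppeteer and the 2Captcha client.
// Assumption: this uses the `2captcha` npm package, whose Solver exposes
// imageCaptcha(base64) and resolves to an object with a `data` field.
// Client libraries and versions differ, so check your client's documentation.
const puppeteer = require('puppeteer');
const Captcha = require('2captcha');

// Define an async function to solve an image CAPTCHA with 2Captcha
async function solveCaptchaWith2Captcha() {
  // Initialize the 2Captcha client with your API key
  const solver = new Captcha.Solver('YOUR_API_KEY');

  // Launch Puppeteer
  const browser = await puppeteer.launch();
  try {
    // Create a new page and navigate to the website
    const page = await browser.newPage();
    await page.goto('https://www.example.com');

    // Find the CAPTCHA image ('.captcha-image' and '#captcha-input'
    // are placeholders for the selectors used by the target site)
    const captchaImage = await page.$('.captcha-image');
    if (!captchaImage) {
      throw new Error('No CAPTCHA image found on the page');
    }

    // Capture the image as base64 data; an <img> src is usually just a URL,
    // not the image data the solver needs
    const imageBase64 = await captchaImage.screenshot({ encoding: 'base64' });

    // Send the image to 2Captcha and wait for the answer
    const result = await solver.imageCaptcha(imageBase64);

    // Type the returned text into the CAPTCHA input field
    await page.type('#captcha-input', result.data);
  } finally {
    // Close the browser even if a step above fails
    await browser.close();
  }
}

// Call the function and report any errors
solveCaptchaWith2Captcha().catch(console.error);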

Using Browser Automation Techniques to Bypass CAPTCHAs

A more sophisticated approach is to use techniques that make your automated browser look less like a bot, such as routing traffic through rotating proxies, changing the user agent, or applying a stealth plugin.

These techniques don’t solve a CAPTCHA. Instead, they make your requests look more like they come from ordinary visitors, so the site’s bot detection is less likely to show a CAPTCHA in the first place.
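
Changing the user agent can be sketched with page.setUserAgent, picking a string from a small pool for each new page. The user-agent strings below are illustrative examples with arbitrary version numbers, and the helper names are hypothetical; you would want to keep your own list current.

```javascript
// Sketch: give each new page a user agent picked from a small pool.
// These strings are illustrative examples; keep your own list up to date.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];

// Pick one entry at random; `random` is injectable to make testing easy
function pickUserAgent(pool = USER_AGENTS, random = Math.random) {
  return pool[Math.floor(random() * pool.length)];
}

// Open a new page and override the user agent it reports
async function newPageWithRandomUserAgent(browser) {
  const page = await browser.newPage();
  await page.setUserAgent(pickUserAgent());
  return page;
}
```

Inside an async function, const page = await newPageWithRandomUserAgent(browser); would replace a plain browser.newPage() call.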

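To do this, you combine Puppeteer launch options with plugins that change how the browser looks and behaves. Here’s one way to use rotating proxies together with the stealth plugin from puppeteer-extra:

// Assumes: npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// The stealth plugin patches common signs of an automated headless browser
puppeteer.use(StealthPlugin());

// Placeholder proxy list: replace with the endpoints from your proxy provider.
// Chrome's --proxy-server flag does not accept credentials in the URL,
// so the server address and the username/password are kept separately.
const proxies = [
  { server: 'http://host1:port', username: 'user', password: 'pass' },
  { server: 'http://host2:port', username: 'user', password: 'pass' },
];

// Define an async function to bypass CAPTCHA with rotating proxies
async function bypassCaptchaWithRotatingProxies() {
  // Pick a random proxy so each run can use a different one
  const proxy = proxies[Math.floor(Math.random() * proxies.length)];

  // Launch the browser through the chosen proxy
  const browser = await puppeteer.launch({
    headless: true,
    args: [`--proxy-server=${proxy.server}`],
  });

  try {
    // Create a new page and supply the proxy credentials
    const page = await browser.newPage();
    await page.authenticate({ username: proxy.username, password: proxy.password });

    // Navigate to website and perform actions
    await page.goto('https://www.example.com');
    // ...
  } finally {
    // Close the browser even if a step above fails
    await browser.close();
  }
}

// Call the function
bypassCaptchaWithRotatingProxies().catch(console.error);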
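
If you want to go through the proxy list in order and retry on the next proxy when one fails, a small rotator with a retry loop is one way to do it. This is a sketch; createProxyRotator and withProxyRetries are hypothetical helpers, not part of Puppeteer or puppeteer-extra, and the default of three attempts is arbitrary.

```javascript
// Sketch: rotate through a proxy list in order and retry a task on the
// next proxy when it fails. These helpers are hypothetical, not library APIs.
function createProxyRotator(proxies) {
  if (proxies.length === 0) throw new Error('Proxy list is empty');
  let index = 0;
  return {
    // Return the next proxy, wrapping around at the end of the list
    next() {
      const proxy = proxies[index];
      index = (index + 1) % proxies.length;
      return proxy;
    },
  };
}

// Run task(proxy) up to `attempts` times, switching proxy after each failure.
async function withProxyRetries(rotator, task, attempts = 3) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    const proxy = rotator.next();
    try {
      return await task(proxy);
    } catch (err) {
      // Remember the failure and move on to the next proxy
      lastError = err;
    }
  }
  throw lastError;
}
```

In practice, the task would launch Puppeteer with the proxy passed to Chrome’s --proxy-server argument, visit the page, and close the browser before returning.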

Best Practices to Bypass CAPTCHAs with Puppeteer

There are certain practices you can follow to bypass CAPTCHAs with Puppeteer smoothly and successfully.

Here are some of them:

Avoiding Aggressive Automation:

Avoid sending excessive requests, as they can trigger CAPTCHAs or even overload the server.

Use delays and timeouts to control the speed and frequency of your automation. You can also route your traffic through a proxy service so that the requests appear to come from different devices.

You can use Petaproxy as a safe and trustworthy proxy service. It offers many kinds of proxies, and for this particular case you would want rotating proxies, which Petaproxy also provides.
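
The delays-and-timeouts advice can be sketched as a crawl loop that visits pages one at a time, gives up on slow navigations, and pauses for a random interval between pages. The helper names and the 2 to 5 second default range are arbitrary examples, not recommended values.

```javascript
// Sketch: pause for a random interval (resolves with the delay in ms).
function randomDelay(minMs = 2000, maxMs = 5000) {
  const ms = minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
  return new Promise((resolve) => setTimeout(() => resolve(ms), ms));
}

// Visit URLs one at a time with a navigation timeout and a pause between pages.
async function politeCrawl(page, urls, { timeoutMs = 30000, minDelayMs = 2000, maxDelayMs = 5000 } = {}) {
  const titles = [];
  for (const url of urls) {
    // Reject the navigation if it takes longer than timeoutMs
    await page.goto(url, { timeout: timeoutMs });
    titles.push(await page.title());
    // Wait before the next request so the crawl runs at a modest, irregular pace
    await randomDelay(minDelayMs, maxDelayMs);
  }
  return titles;
}
```

Visiting pages sequentially, rather than firing requests in parallel, is what keeps the request rate bounded here.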

Maintain Ethical Practices: 

Also make sure you don’t violate the website’s terms of service. Always respect the website owner’s rights and interests when bypassing CAPTCHAs with Puppeteer.

Stay Updated with the Latest Trends:

Since CAPTCHAs are always evolving, make sure you keep the Puppeteer library up to date. Keep an eye on the latest web scraping news and CAPTCHA detection trends on our blog to keep leveling up your web scraping skills.

Conclusion

To sum up, Puppeteer gives you practical tools for dealing with CAPTCHAs while web scraping. You can use its DOM-query methods to detect CAPTCHAs and connect your scripts to solving services like 2Captcha. Combining stealth techniques such as rotating proxies and changing user agents also makes CAPTCHAs less likely to appear in the first place. As long as you follow best practices and keep up with new CAPTCHA technologies, your web scraping can run smoothly and effectively.

Philipp Neuberger

Managing Director

Philipp, a key writer at Petaproxy, specializes in internet security and proxy services. His background in computer science and network security informs his in-depth, yet accessible articles. Known for clarifying complex technical topics, Philipp’s work aids both IT professionals and casual internet users.