How to Effortlessly Bypass CAPTCHAs with Selenium and Python

Philipp Neuberger

March 2, 2024
8:40 pm


CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. If you use a browser every day, you know how frustrating CAPTCHAs can be. At its core, a CAPTCHA is a challenge that a website uses to verify that a visitor is a human and not a bot. CAPTCHAs come in many forms, including visual, textual, and audio. In this article, we take a closer look at how to bypass CAPTCHAs with Selenium and Python.

If you are a web scraper, that frustration can become a nightmare, because CAPTCHAs can block your scripts entirely. This blog post covers some of the techniques and tools you can use to bypass CAPTCHAs with Selenium and Python.

To bypass CAPTCHAs, we will use Selenium and Python. If you are new to Selenium, it’s a popular framework for automating web browser interactions such as clicking, typing, and navigating. We will also cover some best practices that you can follow to make your web scraping process smooth, so make sure to read until the very end.

First, let me explain what Python is:

What is Python?

Python is a dynamically typed language designed to make code simple to read. Guido van Rossum began designing Python in the late 1980s. Due to its flexibility and ease of use, it has become one of the most popular languages worldwide. Python is also the language we use throughout this article to bypass CAPTCHAs with Selenium.

Python’s simple design lets developers express concepts in fewer lines than many other languages. Code blocks are delimited by indentation rather than curly braces or keywords, which keeps code visually consistent and easier to read. Python’s extensive standard library includes modules for reading and writing files, networking, data handling, and more, which makes it adaptable.

PyPI, the Python Package Index, hosts a large collection of third-party Python packages, so developers can find specialized tools and libraries for almost every application. Python’s support for functional, object-oriented, and procedural programming is one of its finest features: a developer can choose the style that best fits each situation.

Python provides many tools and frameworks for web development, data analysis, machine learning, AI, scientific computing, and automation, so it is widely used in those fields. Django and Flask are popular web frameworks, whereas NumPy, pandas, and Matplotlib are popular packages for data analysis and visualization.

Now that you know what Python is, let’s dive into learning more about Selenium:

What is Selenium?

Selenium is a powerful and well-known tool for automating the testing of web applications. It gives testers and developers the tools they need to interact with web page elements programmatically, simulate user actions, and test online apps.

Selenium works with Python, Java, C#, and JavaScript, which makes it flexible and accessible to many developers. Its components are Selenium WebDriver, IDE, Grid, and Standalone Server, and they work together to cover different testing needs. Selenium WebDriver, the best-known component, lets users control Chrome, Firefox, Safari, and Edge from code. Using scripts in any supported language, testers can automate online tasks such as clicking buttons, filling out forms, and checking content.

Selenium IDE records and plays back user interactions with the browser. Its simple interface lets beginners and prototypers create test cases quickly without having to know how to code. Selenium Grid lets test scripts run in parallel on multiple browsers and machines, which speeds up testing web applications. Selenium Standalone Server handles the communication between Selenium WebDriver and web browsers, so automation works consistently across environments.

Now that we have covered the basics of Python and Selenium, let’s look at different techniques to bypass CAPTCHAs with Selenium and Python:

Bypassing CAPTCHA with Selenium and Undetected-Chromedriver

One of the simplest ways to bypass CAPTCHAs and start web scraping with Selenium is to use undetected-chromedriver. This is a Python package that wraps and patches ChromeDriver and provides features that help you avoid detection by anti-bot systems. Just make sure you have Python set up on your device.

Installation

To install undetected-chromedriver, you can use pip, the package manager for Python.

				
					pip install undetected-chromedriver

Another thing that you will need at this point is the requests library in Python (which can help make HTTP requests).

Again, use the pip command to install it.

				
					pip install requests

Here’s how you can use undetected-chromedriver to load a CAPTCHA-protected page. For this example, we will use Google’s reCAPTCHA demo page, https://www.google.com/recaptcha/api2/demo, which has a Google reCAPTCHA v2 challenge.

# Import libraries
import undetected_chromedriver as uc

# Set Chrome options
options = uc.ChromeOptions()
# Run in headless mode (the options.headless setter was removed in newer Selenium 4 releases)
options.add_argument('--headless=new')
options.add_argument('--no-sandbox') # Disable the Chrome sandbox (often needed in containers)
options.add_argument('--disable-dev-shm-usage') # Overcome limited resource problems

# Create browser instance
driver = uc.Chrome(options=options)

try:
    # Navigate to target website
    driver.get('https://www.google.com/recaptcha/api2/demo')

    # Take a screenshot for verification
    driver.save_screenshot('screenshot.png')
finally:
    # Close browser instance, even if a step above fails
    driver.quit()

Note: This code uses undetected_chromedriver to automate the browser. It sets Chrome options for headless mode and for disabling the sandbox, creates a WebDriver instance, navigates to the page, takes a screenshot so you can verify what was loaded, and then closes the browser.

Key Considerations When Bypassing CAPTCHAs with Selenium and Python
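Instead of inspecting the screenshot by eye, you may want to check the page HTML for signs of a CAPTCHA. The following is a minimal sketch, not a definitive detector: the function name `page_has_captcha` and the marker patterns are assumptions chosen for illustration, and real pages may embed CAPTCHAs in other ways.

```python
import re

# Hypothetical markers for illustration; real pages may differ.
CAPTCHA_MARKERS = (
    r'google\.com/recaptcha',   # reCAPTCHA script or iframe URL
    r'class="g-recaptcha"',     # common reCAPTCHA v2 widget container
    r'hcaptcha\.com',           # hCaptcha script or iframe URL
)


def page_has_captcha(html: str) -> bool:
    """Return True if any of the marker patterns appears in the HTML."""
    return any(re.search(pattern, html, re.IGNORECASE) for pattern in CAPTCHA_MARKERS)


# Small self-contained demonstration with two sample documents
sample_with = '<iframe src="https://www.google.com/recaptcha/api2/anchor"></iframe>'
sample_without = '<html><body><h1>Hello</h1></body></html>'

print(page_has_captcha(sample_with))
print(page_has_captcha(sample_without))
```

With Selenium, you could pass `driver.page_source` to this function before calling `driver.quit()`.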

The undetected-chromedriver package is very useful for bypassing some basic anti-bot measures, such as user-agent checks, cookie checks, and JavaScript checks. It also allows you to set Chrome options, for example running in headless mode, which can speed up the scraping process.

However, it has several limitations:

IP address reputation: If your IP address has been blacklisted or has a low reputation, you will still encounter CAPTCHAs or, in the worst case, be blocked by the website.

Headless mode detection: Some websites use modern anti-bot measures that can detect headless mode and then trigger CAPTCHAs. In that case, you have to use other tricks, such as spoofing the window size, the navigator object, or the WebGL vendor.

Advanced anti-bot systems: Some websites use more sophisticated anti-bot systems, such as reCAPTCHA v3, hCaptcha, or BotDetect, which use more advanced algorithms to assign you a score or present a challenge.

You can mitigate the IP reputation problem by using proxies with IP rotation.

Use Proxies with IP Rotation

IP rotation means periodically changing your IP address, which helps you stay anonymous and undetected. It can also help you bypass CAPTCHAs with Selenium & Python, because each fresh IP address has little or no request history associated with it.

One of the main ways to implement IP rotation is to use a proxy service. A proxy service is a third-party provider that gives you a pool of IP addresses through which you access the web. There are many types of proxies, including data centre, residential, and mobile proxies. The best proxy service you can use is Petaproxy, which has a wide variety of proxies available to suit your needs.

To start using the proxy service, create your account and obtain your credentials; then you can use the requests library to configure and authenticate your proxy.

There is also another method to bypass CAPTCHAs with Selenium and Python, which we show next:
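As a rough illustration of proxy configuration and rotation, the sketch below builds a requests-style `proxies` mapping with the credentials embedded in the proxy URL, and cycles through a list of endpoints in round-robin order. The helper names `build_proxies` and `next_proxies` and the `proxy1.example.com` endpoints are hypothetical; substitute the hosts, ports, and credentials your provider gives you.

```python
from itertools import cycle
from urllib.parse import quote


def build_proxies(host: str, port: int, username: str, password: str) -> dict:
    """Build a requests-style proxies mapping with credentials in the URL."""
    # Percent-encode credentials so characters like '@' or ':' stay valid
    user = quote(username, safe='')
    pwd = quote(password, safe='')
    url = f'http://{user}:{pwd}@{host}:{port}'
    return {'http': url, 'https': url}


# Hypothetical endpoints for illustration only
PROXY_ENDPOINTS = [
    ('proxy1.example.com', 8080),
    ('proxy2.example.com', 8080),
]
rotation = cycle(PROXY_ENDPOINTS)


def next_proxies(username: str, password: str) -> dict:
    """Return the proxies mapping for the next endpoint in round-robin order."""
    host, port = next(rotation)
    return build_proxies(host, port, username, password)


first = next_proxies('user', 'p@ss')
second = next_proxies('user', 'p@ss')
third = next_proxies('user', 'p@ss')  # wraps around to the first endpoint
print(first['http'])
print(second['http'])
```

Each mapping can then be passed to requests, for example `requests.get(url, proxies=next_proxies('user', 'p@ss'), timeout=30)`. Many providers rotate IPs for you behind a single endpoint, in which case the list would have only one entry.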

Bypassing CAPTCHA with Web Unblocker

Another method by which you can bypass CAPTCHAs with Selenium and Python is using a web unblocker. A web unblocker is a proxy solution that provides a simple and scalable way to access web content without being blocked. It handles all the proxy management and provides IP rotation, JS rendering, and CAPTCHA solving, so you can focus on your scraping logic.

Setup

To start using a web unblocker, you must create an account and obtain your credentials. Just sign up for a free trial on the provider’s website.

Once you have your credentials, you can integrate the web unblocker with the requests library. You need to set your proxy configuration and authentication parameters. With requests, proxy credentials go into the proxy URL itself, because the auth argument authenticates against the target website rather than the proxy. Here is how you can set it up:

# Import requests library
import requests

# Replace with your credentials
username = 'username'
password = 'password'

# Set proxy configuration (proxy credentials are part of the proxy URL)
proxy = {
    'http': f'http://{username}:{password}@webunblocker.ai:80',
    'https': f'https://{username}:{password}@webunblocker.ai:80'
}

Now that the proxy is set up, we can use it to request the same page as before:

# Import requests library
import requests

# Replace with your credentials
username = 'username'
password = 'password'

# Set proxy configuration (proxy credentials are part of the proxy URL)
proxy = {
    'http': f'http://{username}:{password}@webunblocker.ai:80',
    'https': f'https://{username}:{password}@webunblocker.ai:80'
}

# Make a GET request to the target website through the proxy
response = requests.get('https://www.google.com/recaptcha/api2/demo', proxies=proxy, timeout=30)

# Check the status code
if response.status_code == 200:
    # Print the response body as text
    print(response.text)
else:
    # Print the error message
    print(f'Error: {response.status_code}')

That’s it, now you’re able to bypass CAPTCHAs with Selenium and Python!

Additional Techniques and Practices to Bypass CAPTCHAs with Selenium and Python

Besides using undetected-chromedriver and Web Unblocker, there are some other techniques and best practices that you can use to bypass CAPTCHA with Selenium and Python.

These include:

CAPTCHA solving services: If you encounter a complex CAPTCHA type, such as reCAPTCHA v3, hCaptcha, or BotDetect, you may need to use a CAPTCHA solving service. A CAPTCHA solving service is a third-party provider that can solve CAPTCHAs for you, either manually or automatically, for a fee.

Some examples of CAPTCHA solving services are:

2Captcha
Anti-Captcha
DeathByCaptcha

Audio CAPTCHAs: Some CAPTCHAs offer an audio option, where you can listen to a voice and type what you hear. You can use this option to bypass image-based CAPTCHAs, especially if they are blurry or distorted. To handle audio CAPTCHAs, you can use a speech-to-text service, such as:

Google Cloud Speech-to-Text
IBM Watson Speech-to-Text
Amazon Transcribe

Ethical considerations: When you bypass CAPTCHAs with Selenium and Python, you should always follow ethical and responsible web scraping practices.

Respecting the websites’ terms of service: Before you scrape a website, check its terms of service to see whether it allows web scraping. You should also respect the robots.txt file, which specifies the rules for web crawlers.
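One way to check robots.txt rules from Python is the standard library’s `urllib.robotparser` module. The sketch below parses a sample robots.txt string so that it runs without network access; the sample rules, the `is_allowed` helper, and the `my-scraper` user agent are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration; a real site publishes its own at /robots.txt
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, 'my-scraper', 'https://example.com/public/page'))
print(is_allowed(SAMPLE_ROBOTS_TXT, 'my-scraper', 'https://example.com/private/data'))
```

For a live site, you could instead call `parser.set_url(...)` with the site’s robots.txt URL followed by `parser.read()`, and then use `can_fetch` in the same way.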

Using responsible scraping rates: Avoid sending too many requests to a website in a short period of time, as this could overload the servers and degrade the website’s performance. Use a random delay between your requests to mimic human behavior and avoid detection, and make sure to set a timeout in your web scraping script so that a slow response cannot hang it.
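A random delay between requests can be sketched as follows. The helper names `polite_delay` and `crawl` and the default delay bounds are assumptions for illustration; choose values that suit the site you are working with.

```python
import random
import time


def polite_delay(min_seconds: float = 1.0, max_seconds: float = 3.0) -> float:
    """Sleep for a random duration within the given bounds and return it."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError('expected 0 <= min_seconds <= max_seconds')
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


def crawl(urls, fetch, min_seconds: float = 1.0, max_seconds: float = 3.0) -> list:
    """Call fetch(url) for each URL, pausing randomly between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request, only between requests
            polite_delay(min_seconds, max_seconds)
        results.append(fetch(url))
    return results
```

Here `fetch` is any callable that takes a URL, for example `lambda url: requests.get(url, timeout=10)`, where the `timeout` argument keeps a slow server from blocking the script indefinitely.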

Conclusion

In this article, we covered how to get around CAPTCHAs with Selenium and Python, focusing on techniques and best practices. We looked at tools like undetected-chromedriver and Web Unblocker that can help you get past CAPTCHAs. We also discussed more advanced techniques, such as CAPTCHA solving services, and stressed the importance of scraping ethically. If you want to know how to bypass CAPTCHAs with Puppeteer, see our separate article on that topic.

Philipp Neuberger

Managing Director

Philipp, a key writer at Petaproxy, specializes in internet security and proxy services. His background in computer science and network security informs his in-depth, yet accessible articles. Known for clarifying complex technical topics, Philipp’s work aids both IT professionals and casual internet users.
