Updated on June 28, 2024

Proxy in Selenium: 3 Setup Methods Explained 

Selenium is a powerful tool for web scraping and automated web testing. However, one of the main challenges web scrapers face is dealing with issues such as IP blocking, anti-scraping measures, and geo-restrictions. These obstacles can significantly slow down your workflow and impact your end products or services.

This is where Selenium proxies become invaluable. Using a Selenium proxy simply means routing the traffic of the Selenium-driven browser through a proxy service. Proxies mask your IP address, allowing you to scrape dynamic data stealthily while automating interactions that require page reloads. They also help you bypass geo-restrictions, rotate IP addresses to circumvent rate limits, and simulate multiple users in testing scenarios. Additionally, managing cookies and interacting with iFrames becomes seamless with proxy integration in Selenium.

In this article, we will discuss how to use a single proxy, a list of proxies, and a rotating proxy in your Selenium scripts. To cover a broad range of usage, we will provide examples in three different programming languages, ensuring you know how to use Selenium effectively with each language.

Prerequisites

The following are the prerequisites needed for all three of these methods.

  • Python and pip - Install Python and its package installer from the official website.
  • Java JDK - Install the Java JDK from the official Java website.
  • .NET SDK - Install the .NET SDK from the Microsoft .NET website.
  • Selenium - Install the Selenium package. For Python, use pip (pip install selenium). For Java, include Selenium in your project using Maven or Gradle. For C#, use the NuGet package manager (dotnet add package Selenium.WebDriver).
  • WebDriver Setup - Download and set up the WebDriver for the browser you intend to use (e.g., ChromeDriver for Chrome, GeckoDriver for Firefox).

For this article, we'll be using Chrome, so the code examples rely on the Chrome WebDriver.

Proxy server information

When using authenticated proxies, it is essential to have the following proxy server details.

  • Proxy Server Address: The IP address or hostname of the proxy server.
  • Port: The port number on which the proxy server is listening.
  • Authentication: If the proxy server requires authentication, you will need the username and password.

When you sign up with Webshare.io, you will receive access to 10 free proxy servers, and you can also request access to rotating proxies.

Method 1: Using Python

In our first example, we will show you how to use a single HTTP/HTTPS proxy in Selenium. Using a single proxy with Python is very straightforward. A single HTTP proxy is typically sufficient for basic scraping and testing activities where you do not need to frequently switch IP addresses or handle extensive anti-scraping measures. Let's see how this works.


from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Proxy details
proxy_host = "proxy_ip"
proxy_port = "port"
proxy_user = "username"
proxy_pass = "password"

# Define custom options for the Selenium driver
options = Options()
proxy_server_url = f"http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}"
options.add_argument(f'--proxy-server={proxy_server_url}')
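# Note: some Chrome versions ignore credentials embedded in --proxy-server and
# show a login prompt instead; if that happens, use an IP-authorized proxy or
# Selenium Wire (mentioned below) to handle the authentication.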

# Create the ChromeDriver instance with custom options
driver = webdriver.Chrome(
    service=ChromeService(ChromeDriverManager().install()),
    options=options
)

# Visit the target site
driver.get("https://www.scrapethissite.com")

# Take a screenshot
screenshot_path = "scrape_screenshot.png"
driver.save_screenshot(screenshot_path)
print(f"Screenshot saved to {screenshot_path}")

driver.quit()

In this example, we use a single HTTP proxy with authentication to access a website and take a screenshot of it. First, we import the necessary modules from Selenium, including the WebDriver and Chrome-specific configurations. Then we define the proxy details, including the host, port, username, and password, which are then used to create a proxy server URL.

Custom options for the Selenium Chrome driver are configured to use this proxy server URL via the --proxy-server argument. The ChromeDriver instance is created with these custom options using the ChromeDriverManager to handle the WebDriver installation. Once the WebDriver is set up, it navigates to the specified target site, "https://www.scrapethissite.com".

After the page loads, a screenshot is taken and saved to a file named "scrape_screenshot.png". The script prints the path of the saved screenshot to the console and then gracefully quits the WebDriver session, closing the browser.

If you need additional control over automation testing, including built-in support for authenticated proxies, you can install Selenium Wire. This is a Python library that extends Selenium WebDriver so you can capture and inspect the network traffic generated by the browser during test automation.
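
As a brief illustration, the following is a minimal sketch of the same single-proxy setup using Selenium Wire (installed with pip install selenium-wire). The proxy URL is a placeholder, and Selenium Wire takes care of proxy authentication so no login prompt appears.


from seleniumwire import webdriver

# Placeholder proxy details - replace with your own values
proxy_url = "http://username:password@proxy_ip:port"

seleniumwire_options = {
    "proxy": {
        "http": proxy_url,
        "https": proxy_url,
        "no_proxy": "localhost,127.0.0.1",
    }
}

# Selenium Wire routes traffic through a local proxy, so the
# authenticated upstream proxy works without a browser prompt
driver = webdriver.Chrome(seleniumwire_options=seleniumwire_options)
driver.get("https://www.scrapethissite.com")
print(driver.last_request)  # inspect the captured network traffic
driver.quit()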

Method 2: Using Java

Sometimes using a single HTTP proxy is not enough, especially when you try to access a website repeatedly and risk getting blocked. In such cases, you need to have access to several proxies and switch between them periodically. Our next example will show you how to do that using Java.

To get started, you will need to install the Selenium WebDriver Java bindings, which you can download from the Selenium official website. Once you have the necessary bindings installed, you can implement the following script.


import org.openqa.selenium.OutputType;
import org.openqa.selenium.Proxy;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.io.FileHandler;

import java.io.File;  
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class SeleniumProxyExample {
    public static void main(String[] args) {
        // List of proxies
        List<String> proxyList = Arrays.asList(
            "proxy1:port1",
            "proxy2:port2",
            "proxy3:port3"
        );

        for (String proxyAddress : proxyList) {
            // Set up proxy
            Proxy proxy = new Proxy();
            proxy.setHttpProxy(proxyAddress)
                 .setSslProxy(proxyAddress);

            ChromeOptions options = new ChromeOptions();
            options.setProxy(proxy);
            WebDriver driver = new ChromeDriver(options);

            try {  
                driver.get("https://www.scrapethissite.com/pages/simple/");
                File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
                FileHandler.copy(screenshot, new File("screenshot_" + proxyAddress.replace(":", "_") + ".png"));
            } catch (IOException e) { 
                e.printStackTrace();
            } finally {
                driver.quit();
            }
        }
    }
}

In this example, we demonstrate how to use multiple proxies in Selenium with Java. Unlike the Python example, this Java implementation includes a list of proxies that are iterated over, configuring the WebDriver to use a different proxy for each iteration. This is achieved by creating a Proxy object for each proxy address in the list and setting it in ChromeOptions.

For each proxy, the WebDriver navigates to the specified site and takes a screenshot, which is saved with a filename that includes the current proxy address. This ensures that each request appears to come from a different IP address, mitigating the risk of getting blocked. Additionally, the screenshots are saved with unique names to distinguish between the different proxies used.

Method 3: Using C#

When conducting extensive web scraping or automated testing, using rotating residential proxies can be significantly more effective than cycling through a static list of proxies. Rotating proxies automatically switch IP addresses at defined intervals, which helps distribute requests across a larger pool of addresses. This reduces the likelihood of IP bans and enhances anonymity. For this example, we will use C# and a rotating proxy from Webshare.io to show you how to implement rotating residential proxies.


using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using System;
using System.Net;

class SeleniumRotatingProxyExample
{
    static void Main()
    {
        // Define the rotating proxy details
        string proxyAddress = "p.webshare.io:80";
        string proxyUsername = "your_username";
        string proxyPassword = "your_password";

        // Configure the proxy credentials. Note: ChromeDriver does not consume this
        // WebProxy object, so if the endpoint requires a username and password you
        // will need IP authorization (or an authentication extension) instead.
        var proxy = new WebProxy(proxyAddress)
        {
            Credentials = new NetworkCredential(proxyUsername, proxyPassword)
        };

        // Point Chrome at the rotating proxy endpoint (options.Proxy and the
        // --proxy-server argument are two ways of applying the same setting)
        ChromeOptions options = new ChromeOptions();
        options.Proxy = new Proxy
        {
            HttpProxy = proxyAddress,
            SslProxy = proxyAddress
        };
        options.AddArgument("--proxy-server=" + proxyAddress);

        // Create the ChromeDriver instance with custom options
        IWebDriver driver = new ChromeDriver(options);

        try
        {
            // Visit the target site
            driver.Navigate().GoToUrl("https://www.scrapethissite.com/pages/simple/");

            // Take a screenshot
            Screenshot screenshot = ((ITakesScreenshot)driver).GetScreenshot();
            string screenshotPath = "scrape_screenshot.png";
            screenshot.SaveAsFile(screenshotPath, ScreenshotImageFormat.Png);
            Console.WriteLine($"Screenshot saved to {screenshotPath}");

            // Perform additional web browser interactions if necessary
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
        finally
        {
            // Quit the driver
            driver.Quit();
        }
    }
}

In this example, we configure a rotating residential proxy in Selenium using C#. The proxy details, including the address, username, and password, are defined and wrapped in a WebProxy object with credentials. Note that ChromeDriver does not forward these credentials on its own, so the endpoint should be authorized by IP address; otherwise the browser will prompt for a login.

These proxy settings are then applied to Chrome options, which are used to initialize the ChromeDriver. This setup ensures that each request sent by the WebDriver uses the rotating proxy, helping to avoid IP bans and maintain anonymity.

The WebDriver goes to the specified site, "https://www.scrapethissite.com/pages/simple/", and takes a screenshot, saving it to a file. This shows how rotating proxies can be integrated into a Selenium workflow in C# to enhance the efficiency and reliability of web scraping tasks.

Advanced tips

While the methods above let you combine Selenium with proxies, there are a few additional tips worth considering to get the most out of your setup.

Proxy rotation

Implement a rotation mechanism to switch between proxies periodically, either by cycling through your own list of proxies (as in the Java example) or by using a rotating proxy endpoint (as in the C# example). This can help you avoid detection and IP blocking.

Another option is to manage a pool of proxies yourself and switch between them before each request, as sketched below. This approach keeps your scraping activities under the radar and reduces the chances of being blocked.
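
The following is a minimal Python sketch of this idea. The proxy addresses and URLs are placeholders, and a fresh Chrome session is started for each proxy because Chrome cannot change its proxy mid-session.


import itertools

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Placeholder pool of proxies - replace with your own host:port values
proxy_pool = itertools.cycle([
    "proxy1:port1",
    "proxy2:port2",
    "proxy3:port3",
])

urls = [
    "https://www.scrapethissite.com/pages/simple/",
    "https://www.scrapethissite.com/pages/forms/",
]

for url in urls:
    proxy = next(proxy_pool)  # pick the next proxy in the pool
    options = Options()
    options.add_argument(f"--proxy-server=http://{proxy}")

    # A new browser session is required for each proxy change
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        print(f"Fetched {url} via {proxy}")
    finally:
        driver.quit()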

Preventing JavaScript and WebRTC IP leaks

Even when you use proxies in Selenium, Web Real-Time Communication (WebRTC) remains enabled by default in the browser. This technology facilitates real-time, peer-to-peer communication for audio, video, and data directly within web browsers. Although WebRTC improves the functionality of web applications, it can potentially reveal your real IP address even when you are behind a proxy.

To prevent WebRTC IP leaks in Selenium, you can restrict how WebRTC handles IP addresses through the browser settings. Here's one way to do it for Chrome in Python (flag and preference support can vary between Chrome versions).


from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()

# Restrict WebRTC to the proxied connection so it cannot leak the real IP
options.add_argument('--force-webrtc-ip-handling-policy=disable_non_proxied_udp')
options.add_experimental_option('prefs', {
    'webrtc.ip_handling_policy': 'disable_non_proxied_udp',
    'webrtc.multiple_routes_enabled': False,
    'webrtc.nonproxied_udp_enabled': False,
})

driver = webdriver.Chrome(options=options)

Proxy quality

Using a reputable proxy provider is crucial to ensure quality and reliability for your workflow. Make sure to test these proxies before deployment to ensure that they meet your performance and security requirements. 
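
A simple way to test a proxy before wiring it into Selenium is to route a single request through it and check the exit IP. The sketch below assumes the requests library is installed and uses https://httpbin.org/ip as an example echo service; the proxy URL is a placeholder.


import requests

# Placeholder values - replace with your proxy details
proxy_url = "http://username:password@proxy_ip:port"
proxies = {"http": proxy_url, "https": proxy_url}

try:
    # The response echoes back the IP address the request came from
    response = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10)
    print("Proxy is working, exit IP:", response.json()["origin"])
except requests.RequestException as error:
    print("Proxy check failed:", error)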

Fixing common issues

Like any other technology, using a proxy with Selenium can introduce some issues. Most of them can be solved with the right code and practices, so let's take a look at how to fix some common issues with proxies in Selenium.

Incorrect proxy format

This is a very common issue among proxy users and typically results from not specifying the proxy correctly in the Selenium launch arguments. The format should generally be `http://username:password@hostname:port` (or `http://hostname:port` for proxies without credentials); when implemented, the relevant part of the script looks like the snippet below.
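
This is a minimal sketch with placeholder values; substitute your own proxy details.


from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Placeholder values - replace with your proxy details
proxy_url = "http://username:password@hostname:port"

options = Options()
options.add_argument(f"--proxy-server={proxy_url}")
driver = webdriver.Chrome(options=options)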

Browser-specific configuration

While the examples in this article mainly used Chrome as the browser, some users might prefer other browsers. However, different browsers require different configurations for setting up proxies. Let's take a look at two other popular browsers: Firefox and Edge.

To use the Firefox browser, you should configure the proxy through Firefox Options. You can do so with the following code.


from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Placeholder proxy details - replace with your own values
proxy_host = "proxy_ip"
proxy_port = 8080  # port preferences must be integers

firefox_options = Options()
firefox_options.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
firefox_options.set_preference("network.proxy.http", proxy_host)
firefox_options.set_preference("network.proxy.http_port", proxy_port)
firefox_options.set_preference("network.proxy.ssl", proxy_host)
firefox_options.set_preference("network.proxy.ssl_port", proxy_port)

driver = webdriver.Firefox(options=firefox_options)
driver.get("https://www.scrapethissite.com/pages/simple/")

For proxy authentication in Firefox, you might need to use an extension, manually handle the authentication dialog, or route traffic through Selenium Wire, which also supports Firefox.

To set up Edge, you can configure Edge Options as follows.


from selenium import webdriver
from selenium.webdriver.edge.options import Options

edge_options = Options()
# Placeholder values - replace with your proxy host and port
edge_options.add_argument('--proxy-server=http://proxy_ip:port')

driver = webdriver.Edge(options=edge_options)
driver.get("https://www.scrapethissite.com/pages/simple/")

Since Edge is Chromium-based, proxy configuration and authentication work much the same way as they do in Chrome.

Authentication issues

When using proxy authentication in Selenium, you might encounter issues, particularly when the credentials are either missing or incorrect in your Selenium setup. This can result in errors such as '407 Proxy Authentication Required'.

In Python, you can embed the credentials in the proxy URL passed to the Chrome options. If Chrome still shows a login prompt (some versions do not accept inline credentials), fall back to an IP-authorized proxy or Selenium Wire.


from selenium import webdriver
from selenium.webdriver.chrome.options import Options

proxy = "username:password@proxy_address:proxy_port"
chrome_options = Options()
chrome_options.add_argument(f'--proxy-server=http://{proxy}')

driver = webdriver.Chrome(options=chrome_options)
driver.get('https://www.scrapethissite.com/pages/simple/')

Timeout errors

Timeout errors occur when a page or script fails to respond within the configured time limit. With proxies, they often stem from slow proxy servers or network issues. While there is a range of solutions, the simplest and most effective one is to increase the timeout settings.

By increasing the timeout settings you can give the operations more time to complete, especially if your Selenium proxy or network connection is slow.

You can implement timeouts in your Python scripts as shown below.


driver = webdriver.Chrome(options=chrome_options)
driver.set_page_load_timeout(30)  # wait up to 30 seconds for a page to load
driver.set_script_timeout(30)  # wait up to 30 seconds for asynchronous scripts

Conclusion

Proxies are a valuable tool for web scraping and application development. By pairing them with Selenium, you can enjoy the benefits of proxies while web scraping, automating web actions, testing, and more. Additionally, you can extend this functionality by configuring your User Agents for browser emulation and compatibility testing. However, while proxies provide a level of anonymity, it's important to abide by privacy laws and website terms of service to avoid legal implications.
