Updated on October 14, 2024

Wget with Proxy: 3 Setup Methods Explained

Wget is a command-line utility for downloading files from the internet. It supports the HTTP, HTTPS, and FTP protocols. Using wget with a proxy is often recommended, particularly in professional environments or when privacy is a priority. This guide explores three reliable ways to configure and use proxies with wget, covering HTTP, HTTPS, and SOCKS5 proxies. We'll also look at more complex proxy setups and common problems related to proxy usage.

What is a Wget proxy?

A wget proxy is a proxy server that wget sends its requests through instead of connecting to websites directly. It sits between you and the target server, which can improve privacy and security.

When you use wget with a proxy, your request first goes to the proxy server. The proxy forwards the request to the target website, receives the response, and sends it back to you. Because the website sees the proxy's IP address instead of yours, this intermediary step adds a layer of privacy.

Prerequisites

Before trying the methods, make sure you have the following:

  • A working installation of wget. You can install wget with a package manager such as yum, apt, brew, or choco, using the command that matches your system:

sudo yum install wget       # On CentOS/RHEL
sudo apt-get install wget   # On Debian/Ubuntu
brew install wget           # On macOS
choco install wget          # On Windows (from an administrator shell)

  • Proxy server information. You need the details of the proxy server you want to use: its IP address or hostname, its port, and authentication credentials if required. If you don't have a proxy on hand, you can take advantage of Webshare's offer of 10 free proxies.

Wget syntax

The syntax for wget, as shown by the output of the wget -h help command, is as follows.


wget [OPTION]... [URL]...

  • [OPTION]...: Optional flags that customize wget's behavior.
  • [URL]...: One or more URLs of the files you want to download.

Running the help command (wget -h) lists all available options. Here are some of the most frequently used ones:

  • -c: Resumes a previously paused or interrupted download.
  • -O <filename>: Specifies the name of the downloaded file.
  • -r: Recursively downloads files from the specified URL.

Method 1: Using HTTP proxy

Using an HTTP proxy with wget is straightforward. You can specify the proxy directly on the command line or configure it in the wget configuration file (wgetrc).

Using a single proxy - HTTP/HTTPS example

To use an HTTP proxy for a single wget command, pass the proxy settings with the -e option, which runs a wgetrc command from the command line. The examples in this article use a proxy service from webshare.io, which offers free proxies that are easy to set up.


wget -e use_proxy=yes \
     -e http_proxy=http://username:password@proxy.webshare.io:8080 \
     http://accessFile.com/file

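Replace username, password, proxy.webshare.io, 8080, and http://accessFile.com/file with your proxy credentials, the proxy's host and port, and the URL of the file you want to download. For https:// URLs, wget uses the https_proxy setting instead, so add -e https_proxy=http://username:password@proxy.webshare.io:8080 as well.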

Proxy authentication

If your proxy server requires authentication, you can put the proxy username and password directly in the proxy URL, as shown above. Written on a single line, the command looks like this:


wget -e use_proxy=yes -e http_proxy=http://username:password@proxy.webshare.io:8080 http://File1.com/file1

Alternatively, you can set the proxy settings in the wgetrc configuration file.


echo "use_proxy = on" >> ~/.wgetrc
echo "http_proxy = http://username:password@proxy.webshare.io:8080" >> ~/.wgetrc
echo "https_proxy = http://username:password@proxy.webshare.io:8080" >> ~/.wgetrc

This configuration will apply to all wget commands run by the current user.
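If you'd rather not keep the password inside the proxy URL, wgetrc also accepts proxy_user and proxy_password commands. The sketch below is a hypothetical helper, write_proxy_wgetrc (not part of wget), that writes such a file with owner-only permissions. It overwrites the target file, so point it at a dedicated file rather than an existing ~/.wgetrc you want to keep; wget reads an alternative startup file named in the WGETRC environment variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write_proxy_wgetrc FILE PROXY_URL USER PASS
# Writes a wgetrc that sends HTTP and HTTPS traffic through PROXY_URL and
# keeps the credentials out of the URL. Overwrites FILE if it exists.
write_proxy_wgetrc() {
  local file="$1" proxy="$2" user="$3" pass="$4"
  # Create the file (if new) with owner-only permissions, then enforce 600
  # in case it already existed with looser permissions.
  ( umask 077 && : > "$file" )
  chmod 600 "$file"
  cat > "$file" <<EOF
use_proxy = on
http_proxy = $proxy
https_proxy = $proxy
proxy_user = $user
proxy_password = $pass
EOF
}

# Usage (placeholders from this article):
#   write_proxy_wgetrc "$HOME/.wgetrc-proxy" http://proxy.webshare.io:8080 username password
#   WGETRC="$HOME/.wgetrc-proxy" wget http://accessFile.com/file
```

Keeping the proxy settings in a separate file means your main ~/.wgetrc stays untouched, and you opt in to the proxy per command by setting WGETRC.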

Method 2: Using a SOCKS5 proxy

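SOCKS5 proxies are an alternative to HTTP proxies. They work at a lower level of the network stack and relay arbitrary TCP connections rather than only HTTP requests, which lets them handle a wider variety of internet traffic.
However, GNU Wget has no built-in SOCKS support. Its documented proxy settings (http_proxy, https_proxy, ftp_proxy, and no_proxy) cover HTTP-style proxies only, so it does not recognize a socks_proxy setting, and no command-line option selects a SOCKS server. The usual workaround is to run wget under a SOCKS wrapper such as proxychains-ng, which redirects wget's network connections through the SOCKS5 proxy.

Routing a single command through a SOCKS5 proxy

To send a single wget command through a SOCKS5 proxy, prefix it with proxychains4 and use the -f option to point it at a proxychains-ng configuration file.


proxychains4 -f ~/.proxychains-wget.conf wget http://example.com/file

Replace http://example.com/file with your target URL.

To create the configuration file, run the following lines. Replace 203.0.113.10, 1080, username, and password with your SOCKS5 proxy's IP address, port, and credentials. Some proxychains-ng versions accept only numeric IP addresses in the [ProxyList] section, so use the proxy's IP address rather than its hostname if the hostname is rejected.


echo "strict_chain" > ~/.proxychains-wget.conf
echo "proxy_dns" >> ~/.proxychains-wget.conf
echo "[ProxyList]" >> ~/.proxychains-wget.conf
echo "socks5 203.0.113.10 1080 username password" >> ~/.proxychains-wget.conf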

Method 3: Rotating residential proxies

Rotating residential proxies help improve privacy and work around IP-based restrictions. They route your traffic through IP addresses belonging to residential connections and switch between them automatically, for example on every request or at predefined intervals, which makes them well suited to web scraping and crawling. They are usually provided by dedicated vendors with large pools of IP addresses, so your own IP address stays hidden and IP-based bans or rate limits have less impact.

Using rotating residential proxies

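Many providers expose rotating residential proxies through a single gateway address that changes the outgoing IP address for you; in that case, configure it like the HTTP proxy in Method 1. If you instead have a list of proxy endpoints, you can rotate through them yourself by changing the proxy server for each request in a script. Here's an example: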


PROXY_LIST=("proxy1.webshare.io:8080" "proxy2.webshare.io:8080" "proxy3.webshare.io:8080")
URL_LIST=("http://accessFile.com/file1" "http://accessFile.com/file2" "http://accessFile.com/file3")
USERNAME="username"
PASSWORD="password"
# Pair each URL with the proxy at the same index in PROXY_LIST
for i in "${!URL_LIST[@]}"; do
  PROXY=${PROXY_LIST[$i]}
  URL=${URL_LIST[$i]}
  wget -e use_proxy=yes \
       -e "http_proxy=http://$USERNAME:$PASSWORD@$PROXY" \
       -e "https_proxy=http://$USERNAME:$PASSWORD@$PROXY" \
       "$URL"
done

In this script, PROXY_LIST contains the proxies, and URL_LIST contains the URLs to download. The script iterates over the URLs and uses a different proxy for each request.
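The script above pairs each URL with the proxy at the same index, so it assumes both lists have the same length. A small variation, sketched below with placeholder hosts, cycles through the proxies round-robin so any number of URLs can share a shorter proxy list. pick_proxy and download_all are hypothetical helper names, and PROXY_USER and PROXY_PASS are assumed environment variables holding your credentials.

```shell
#!/usr/bin/env bash
# Placeholder proxy endpoints (host:port); replace with your provider's values.
PROXY_LIST=("proxy1.webshare.io:8080" "proxy2.webshare.io:8080" "proxy3.webshare.io:8080")

# pick_proxy N: print the proxy for request number N (0-based), wrapping
# around to the start of PROXY_LIST once every proxy has been used.
pick_proxy() {
  local count=${#PROXY_LIST[@]}
  printf '%s\n' "${PROXY_LIST[$1 % count]}"
}

# download_all URL...: download each URL through the next proxy in turn.
# Credentials come from the PROXY_USER and PROXY_PASS environment variables.
download_all() {
  local i=0 url proxy
  for url in "$@"; do
    proxy="http://${PROXY_USER}:${PROXY_PASS}@$(pick_proxy "$i")"
    wget -e use_proxy=yes -e "http_proxy=$proxy" -e "https_proxy=$proxy" "$url"
    i=$((i + 1))
  done
}

# Usage:
#   PROXY_USER=username PROXY_PASS=password download_all \
#     http://accessFile.com/file1 http://accessFile.com/file2 \
#     http://accessFile.com/file3 http://accessFile.com/file4
```

With three proxies, the fourth URL goes back through proxy1, the fifth through proxy2, and so on.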

Advanced proxy configuration

In addition to basic proxy settings, wget allows for more advanced configurations, such as modifying HTTP headers and handling different authentication schemes.

Modifying request HTTP headers

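You can add custom HTTP headers to wget's requests with the --header option. This can be useful when a proxy or website expects specific headers. For example, setting the User-Agent header lets you mimic different browsers, which helps when websites treat requests differently based on the user agent. Note that the Authorization header in the example below authenticates you to the target website; to authenticate to the proxy itself, use credentials in the proxy URL or wget's --proxy-user and --proxy-password options.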


wget --header="User-Agent: FileDownloader" \
     --header="Authorization: Basic $(printf '%s' 'username:password' | base64)" \
     http://accessFile.org/samplefile.zip
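Building the Basic credentials inline works, but two details are easy to get wrong: a plain echo appends a newline that gets encoded along with the credentials, and GNU base64 wraps its output after 76 characters, which would split a long header value. The hypothetical helper below (basic_auth is not a wget feature) handles both.

```shell
#!/usr/bin/env bash
# Hypothetical helper: basic_auth USER PASS
# Prints the value of an HTTP Basic authorization header: "Basic base64(USER:PASS)".
basic_auth() {
  local encoded
  # printf (unlike plain echo) adds no trailing newline to the credentials;
  # tr removes the line breaks GNU base64 inserts into long output.
  encoded=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')
  printf 'Basic %s\n' "$encoded"
}

# Usage:
#   wget --header="Authorization: $(basic_auth username password)" \
#        http://accessFile.org/samplefile.zip
```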

Advanced Tip 1: Use environment variables

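Wget also reads proxy settings from the http_proxy and https_proxy environment variables (and no_proxy, a list of hosts that should bypass the proxy). This is useful for applying a proxy configuration to every wget command in the current shell session. Here is an example: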


export http_proxy=http://username:password@proxy.webshare.io:1080 
export https_proxy=http://username:password@proxy.webshare.io:1080 
wget http://downloads.website.com/software.zip
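Exported variables stay in effect for every later command in that shell session. To scope a proxy to one command instead, you can prefix the command with the variable assignments. The hypothetical with_proxy function below wraps that pattern for the two variables wget consults for http:// and https:// URLs; it assumes the wrapped command is an external program such as wget.

```shell
#!/usr/bin/env bash
# Hypothetical helper: with_proxy PROXY_URL COMMAND [ARGS...]
# Runs COMMAND (an external program) with http_proxy and https_proxy set only
# for that command, leaving the current shell's environment unchanged.
with_proxy() {
  local proxy="$1"
  shift
  http_proxy="$proxy" https_proxy="$proxy" "$@"
}

# Usage:
#   with_proxy http://username:password@proxy.webshare.io:1080 \
#     wget http://downloads.website.com/software.zip
```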

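Advanced Tip 2: Reload pages by requesting them again

When you work with dynamic content or automated tasks like web scraping, you may need to fetch the same page again. Wget does not render pages or keep them open, so it has no page.reload() method; that method belongs to browser-automation libraries such as Puppeteer and Playwright. With wget, reloading a page simply means requesting it again. For flaky pages or proxies, the --tries, --waitretry, and --retry-connrefused options make wget retry failed attempts automatically, and --no-cache asks proxies and servers not to return a cached copy.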

Security considerations for Wget proxy

When using a wget proxy, it's crucial to protect your proxy credentials. Avoid putting the proxy username and password directly in the wget command, where they can end up in your shell history and be visible to other users in the process list. Instead, configure wget through environment variables or the wgetrc configuration file. For example, you can set the proxy with an environment variable:


export http_proxy=http://username:password@proxy.webshare.io:1080

Restrict the wgetrc file's permissions (for example, chmod 600 ~/.wgetrc) so that other users cannot read the credentials stored in it. Use HTTPS URLs so that data is encrypted between wget and the target site, and use proxies that require authentication so that others cannot use them. Managing your proxy configuration this way keeps your username and password from being exposed.
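To check whether a wgetrc (or any file holding credentials) is readable by other users, you can inspect its permission bits. The hypothetical wgetrc_is_private function below reads them from ls -l output, which has the same mode layout with GNU and BSD ls, and succeeds only when the group and other permission fields are empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wgetrc_is_private FILE
# Succeeds only if FILE exists and grants no permissions to group or others.
wgetrc_is_private() {
  local mode
  # Characters 5-10 of the ls -l mode string are the group and other bits,
  # e.g. "------" for a file with mode 600. A missing file yields "".
  mode=$(ls -ld -- "$1" 2>/dev/null | cut -c5-10)
  [ "$mode" = "------" ]
}

# Usage:
#   wgetrc_is_private ~/.wgetrc || chmod 600 ~/.wgetrc
```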

Fixing common issues

When using proxies with wget, you might encounter several common issues. Here are some tips to fix them:

Incorrect proxy format

Make sure the proxy URL is formatted correctly. It should include the scheme (usually http://), the proxy username and password, and the server and port, in the form http://username:password@proxy.webshare.io:8080.
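If the username or password contains reserved characters such as @, :, or /, they must be percent-encoded in the proxy URL (for example, @ becomes %40); otherwise wget cannot tell where the credentials end and the host begins. The sketch below is a hypothetical urlencode function in pure bash. It assumes ASCII credentials and encodes every character except letters, digits, and the unreserved characters . ~ _ -.

```shell
#!/usr/bin/env bash
# Hypothetical helper: urlencode STRING
# Percent-encodes STRING for use as a username or password in a proxy URL.
# Assumes ASCII input; letters, digits, and . ~ _ - are left unchanged.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    # For other characters, printf "'c" yields the character code, printed as %XX.
    case "$c" in
      [[:alnum:].~_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Usage:
#   PASS=$(urlencode 'p@ss:word')   # -> p%40ss%3Aword
#   wget -e use_proxy=yes -e "http_proxy=http://username:$PASS@proxy.webshare.io:8080" \
#        http://accessFile.com/file
```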

Authentication issues

If you encounter authentication issues, double-check your proxy credentials. Some common error codes are given below.

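  • 403 Forbidden: The proxy or the target site refused the request. With a proxy, this can mean your credentials are not authorized for that proxy or your IP address is not allowed to use it.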
  • 407 Proxy Authentication Required: This indicates that authentication is required but not provided or incorrect.

Proxy connection refused

If the proxy server refuses the connection, ensure that the specified proxy server is running and accessible from your machine. Verify the IP address and port.
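Before digging into wget itself, it can help to confirm that something is listening at the proxy's address and port. The sketch below, a hypothetical proxy_reachable function, assumes bash (for its /dev/tcp redirection) and the timeout command from GNU coreutils. It only checks that a TCP connection can be opened, not that the server behaves as a proxy or accepts your credentials.

```shell
#!/usr/bin/env bash
# Hypothetical helper: proxy_reachable HOST PORT [SECONDS]
# Succeeds if a TCP connection to HOST:PORT opens within SECONDS (default 5).
proxy_reachable() {
  local host="$1" port="$2" secs="${3:-5}"
  # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
  # timeout stops the attempt if the host does not answer in time.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage:
#   proxy_reachable proxy.webshare.io 8080 && echo "port open" || echo "unreachable"
```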

Timeout errors

Timeout errors can occur if the proxy server is slow or unreachable. You can increase the timeout (in seconds) with wget's --timeout option and set how many attempts wget makes with --tries (the default is 20).


wget --timeout=60 --tries=3 http://accessFile.com/file

Conclusion

Using proxies with wget can improve your privacy, help you get around IP-based restrictions, and let you access content from different regions. This article covered three methods for using proxies with wget: HTTP/HTTPS proxies, SOCKS5 proxies, and rotating residential proxies. We also discussed advanced proxy configuration and common issues you might encounter. With these methods and tips, you can configure wget to download files smoothly and efficiently through a proxy server.