
Web Scraping With cURL Impersonate


This guide explains how to use cURL Impersonate to mimic browser behavior for web scraping.

What Is cURL Impersonate?

cURL Impersonate is a specialized cURL build designed to mimic major browsers (Chrome, Edge, Safari, and Firefox). This tool performs TLS and HTTP handshakes that closely resemble those of real browsers.

You can use this HTTP client either through the curl-impersonate command-line tool, similar to regular curl, or as a library in Python.

These browsers can be impersonated:

Browser         Simulated OS    Wrapper Script
Chrome 99       Windows 10      curl_chrome99
Chrome 100      Windows 10      curl_chrome100
Chrome 101      Windows 10      curl_chrome101
Chrome 104      Windows 10      curl_chrome104
Chrome 107      Windows 10      curl_chrome107
Chrome 110      Windows 10      curl_chrome110
Chrome 116      Windows 10      curl_chrome116
Chrome 99       Android 12      curl_chrome99_android
Edge 99         Windows 10      curl_edge99
Edge 101        Windows 10      curl_edge101
Firefox 91 ESR  Windows 10      curl_ff91esr
Firefox 95      Windows 10      curl_ff95
Firefox 98      Windows 10      curl_ff98
Firefox 100     Windows 10      curl_ff100
Firefox 102     Windows 10      curl_ff102
Firefox 109     Windows 10      curl_ff109
Firefox 117     Windows 10      curl_ff117
Safari 15.3     macOS Big Sur   curl_safari15_3
Safari 15.5     macOS Monterey  curl_safari15_5

Each supported browser has a specific wrapper script that configures curl-impersonate with the appropriate headers, flags, and settings to simulate that browser.

How curl-impersonate Works

When sending an HTTPS request, a TLS handshake occurs. During this process, details about the HTTP client are shared with the web server, creating a unique TLS fingerprint.

Standard HTTP clients have configurations different from browsers, resulting in a TLS fingerprint that easily reveals automated requests. This allows anti-bot systems to detect and block your scraping attempts.

cURL Impersonate solves this by modifying the standard curl tool to match real browsers' TLS fingerprints through:

  • TLS library modification: For Chrome versions, curl is compiled with BoringSSL, Google's TLS library. For Firefox versions, it uses NSS, Firefox's TLS library.
  • Configuration adjustments: It modifies cURL's TLS extensions and SSL options to mimic browser settings and adds support for browser-specific TLS extensions.
  • HTTP/2 handshake customization: It aligns cURL's HTTP/2 connection settings with real browsers.
  • Non-default flags: It runs curl with specific non-default flags, such as --ciphers and --curves, plus custom headers, to further mimic browser behavior.

This makes curl-impersonate requests appear as if they come from a real browser, helping bypass many bot detection mechanisms.
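
To see the difference this makes, you can fetch a TLS fingerprinting service with and without impersonation and compare what the server reports. The sketch below uses the curl_cffi Python binding covered later in this guide; the endpoint URL is an assumption and its response format may change:

from curl_cffi import requests

# ask a fingerprinting service what TLS fingerprint it observes
# (assumed endpoint; any JA3/TLS echo service works the same way)
response = requests.get("https://tls.browserleaks.com/json", impersonate="chrome")

# the JSON describes the TLS handshake the server saw, which should
# now match a real Chrome browser rather than a stock HTTP client
print(response.json())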

curl-impersonate: Command Line Tutorial

Follow these steps to use cURL Impersonate from the command line.

Note: Multiple installation methods are shown, but you only need one. Docker is recommended.

Installation From Pre-Compiled Binaries

Download pre-compiled binaries for Linux and macOS from the GitHub releases page. Before using them, install:

  • NSS (Network Security Services): Libraries supporting cross-platform security-enabled applications.
  • CA certificates: Digital certificates authenticating server and client identities.

To meet prerequisites on Ubuntu:

sudo apt install libnss3 nss-plugin-pem ca-certificates

On Red Hat, Fedora, or CentOS, run:

sudo yum install nss nss-pem ca-certificates

On Arch Linux, run:

sudo pacman -S nss ca-certificates

On macOS, run:

brew install nss ca-certificates

Also make sure zlib is installed on your system, since the pre-compiled binary packages are gzipped.

Installation through Docker

Docker images with curl-impersonate are available on Docker Hub, based on Alpine Linux and Debian.

Chrome images (*-chrome) can impersonate Chrome, Edge, and Safari. Firefox images (*-ff) can impersonate Firefox.

To download a Docker image:

For Chrome version on Alpine Linux:

docker pull lwthiker/curl-impersonate:0.5-chrome

For Firefox version on Alpine Linux:

docker pull lwthiker/curl-impersonate:0.5-ff

For Chrome version on Debian:

docker pull lwthiker/curl-impersonate:0.5-chrome-slim-buster

For Firefox version on Debian:

docker pull lwthiker/curl-impersonate:0.5-ff-slim-buster

Once downloaded, execute curl-impersonate using a docker run command.

Installation From Distro Packages

On Arch Linux, install through the AUR package curl-impersonate-bin.

On macOS, install the unofficial Homebrew package:

brew tap shakacode/brew

brew install curl-impersonate

Basic Usage

Execute a curl-impersonate command using:

curl-impersonate-wrapper [options] [target-url]

Or with Docker:

docker run --rm lwthiker/curl-impersonate:[curl-impersonate-version] curl-impersonate-wrapper [options] [target-url]

Where:

  • curl-impersonate-wrapper is your chosen wrapper (e.g., curl_chrome116, curl_edge101)
  • options are optional cURL flags
  • target-url is the web page URL

Be cautious when adding custom options, as some flags (such as --ciphers) can change the TLS fingerprint and break the impersonation.

The wrappers automatically set default HTTP headers, which you can customize by modifying the scripts.

Example: Request the Wikipedia homepage using Chrome:

curl_chrome110 https://www.wikipedia.org

With Docker:

docker run --rm lwthiker/curl-impersonate:0.5-chrome curl_chrome110 https://www.wikipedia.org

Result:

<html lang="en" class="no-js">
  <head>
    <meta charset="utf-8">
    <title>Wikipedia</title>
    <meta name="description" content="Wikipedia is a free online encyclopedia, created and edited by volunteers around the world and hosted by the Wikimedia Foundation.">
<!-- omitted for brevity... -->

The server returns the HTML as if you were using a browser.

curl-impersonate: Python Tutorial

While the command line is great for quick tests, web scraping projects are typically written in languages like Python.

You can use cURL Impersonate in Python through curl-cffi, a Python binding for curl-impersonate.

Prerequisites:

  • Python 3.8+
  • A Python project with virtual environment setup
  • Optionally, a Python IDE like Visual Studio Code

Installation:

Install via pip:

pip install curl_cffi

Usage:

Typically, you want to use the requests-like API. To do this, import requests from curl_cffi and make a GET request, specifying the browser to impersonate:

from curl_cffi import requests

response = requests.get("https://www.wikipedia.org", impersonate="chrome")

Print the response HTML with:

print(response.text)

Put it all together, and you will get:

from curl_cffi import requests

# make a GET request to the target page with
# the Chrome version of curl-impersonate
response = requests.get("https://www.wikipedia.org", impersonate="chrome")

# print the server response
print(response.text)

Running this script prints:

<html lang="en" class="no-js">
  <head>
    <meta charset="utf-8">
    <title>Wikipedia</title>
    <meta name="description" content="Wikipedia is a free online encyclopedia, created and edited by volunteers around the world and hosted by the Wikimedia Foundation.">
<!-- omitted for brevity... -->
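
curl_cffi also lets you pin a specific browser version and reuse connections and cookies across requests through a requests-style Session. Below is a minimal sketch under those assumptions; the target URL and header are just examples:

from curl_cffi import requests

# a session persists cookies and reuses connections across requests
session = requests.Session()

# impersonate a specific Chrome build and add a custom header
response = session.get(
    "https://www.wikipedia.org",
    impersonate="chrome110",
    headers={"Accept-Language": "en-US"},
)
print(response.status_code)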

cURL Impersonate Advanced Usage

Proxy Integration

Browser fingerprint simulation alone might not be enough against sophisticated anti-bot solutions, which also track and block IP addresses. Proxies help by routing your requests through fresh IPs.

To use a proxy with cURL Impersonate from the command line, pass the -x (or --proxy) option to your chosen wrapper:

curl_chrome110 -x http://84.18.12.16:8888 https://httpbin.org/ip

In Python:

from curl_cffi import requests

# route both HTTP and HTTPS traffic through the proxy
proxies = {"http": "http://84.18.12.16:8888", "https": "http://84.18.12.16:8888"}

response = requests.get("https://httpbin.org/ip", impersonate="chrome", proxies=proxies)
print(response.text)  # should report the proxy's IP address

Libcurl Integration

libcurl-impersonate is a compiled version of libcurl with the cURL Impersonate patches applied, plus an extended API for configuring TLS details and header settings.

Install it using the pre-compiled packages. It makes it possible to integrate cURL Impersonate into libraries written in various programming languages.
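
As an illustration, the patched library exposes an extended curl_easy_impersonate() function. The ctypes sketch below is a rough outline of calling it from Python; the shared library name (libcurl-impersonate-chrome.so) and its availability on the loader path are assumptions about your install:

import ctypes

# load the patched libcurl (name assumed; may be versioned, e.g. .so.4)
libcurl = ctypes.CDLL("libcurl-impersonate-chrome.so")

# declare the signatures we rely on so pointers are not truncated
libcurl.curl_easy_init.restype = ctypes.c_void_p
libcurl.curl_easy_setopt.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_char_p]
libcurl.curl_easy_impersonate.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_int]
libcurl.curl_easy_perform.argtypes = [ctypes.c_void_p]
libcurl.curl_easy_cleanup.argtypes = [ctypes.c_void_p]

CURLOPT_URL = 10002  # CURLOPTTYPE_STRINGPOINT + 2, from curl.h

curl = libcurl.curl_easy_init()
libcurl.curl_easy_setopt(curl, CURLOPT_URL, b"https://www.wikipedia.org")

# the extended API call: impersonate Chrome 110 and send its default headers
libcurl.curl_easy_impersonate(curl, b"chrome110", 1)

# with no write callback set, libcurl prints the response body to stdout
libcurl.curl_easy_perform(curl)
libcurl.curl_easy_cleanup(curl)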

Conclusion

Note that advanced anti-bot solutions like Cloudflare may still detect automated requests. For a comprehensive solution, consider Bright Data's Scraper API, which handles browser fingerprinting, CAPTCHA solving, and IP rotation.

Register for a free trial of Bright Data's web scraping infrastructure!